I am currently responsible for developing bespoke scripts that help clients manage workflows, backups, and other unstructured data needs; training client staff on the Starfish CLI; helping clients discover use cases for Starfish in their environments; managing Starfish servers and agents in client environments; and providing various other professional services. I am also responsible for developing internal documentation of clients' infrastructure and workflows, as well as documentation of common and surprising things that can be done with Starfish.
- Engineered systemd-based copy loops that maximize disaster-recovery throughput, providing near-real-time data redundancy (see the first sketch after this list)
- Developed a proof of concept (POC) for automated metadata extraction from SEG-Y files for oil and gas clients (see the second sketch after this list)
- Collaborated on the architecture of data movement strategies for multi-petabyte client environments
- Collaborated on the development of storage retention workflows to align with regulatory requirements
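To make the copy-loop bullet concrete, here is a minimal sketch of the pattern: a long-running sync loop meant to be supervised by systemd. The paths, interval, and rsync options are hypothetical placeholders, not the actual client configuration.

```python
#!/usr/bin/env python3
"""Minimal sketch of a DR copy loop intended to run under a systemd
service. Source/destination paths and the interval are hypothetical."""
import subprocess
import sys
import time

SRC = "/mnt/primary/data/"   # hypothetical source path
DST = "/mnt/dr/data/"        # hypothetical DR target
INTERVAL = 60                # seconds between sync passes

def sync_once() -> int:
    # rsync -a preserves metadata; --delete keeps the DR copy in lockstep.
    result = subprocess.run(
        ["rsync", "-a", "--delete", SRC, DST],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Log to stderr; under systemd this lands in the journal.
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(INTERVAL)
```

A matching unit file would set Restart=always so systemd revives the loop after any failure, which is what keeps the DR copy near real time.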
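And for the SEG-Y bullet, a sketch of the kind of header parsing such a POC might perform: SEG-Y files open with a 3200-byte EBCDIC textual header followed by a 400-byte binary header, and a few well-known binary-header fields carry the survey's vital statistics. The file name is hypothetical, and the real POC's full scope is not shown here.

```python
#!/usr/bin/env python3
"""Sketch of SEG-Y metadata extraction: textual header plus a few
binary-header fields, per the SEG-Y rev 1 layout."""
import struct

def read_segy_metadata(path: str) -> dict:
    with open(path, "rb") as f:
        textual = f.read(3200)   # 40 "card images" x 80 chars, EBCDIC
        binary = f.read(400)     # binary file header

    # EBCDIC decode; some vendors write ASCII here, so fall back if needed.
    try:
        text = textual.decode("cp500")
    except UnicodeDecodeError:
        text = textual.decode("ascii", errors="replace")
    cards = [text[i:i + 80].rstrip() for i in range(0, 3200, 80)]

    # Byte positions below are 1-indexed in the SEG-Y standard.
    sample_interval_us, = struct.unpack(">h", binary[16:18])  # bytes 3217-3218
    samples_per_trace, = struct.unpack(">h", binary[20:22])   # bytes 3221-3222
    format_code, = struct.unpack(">h", binary[24:26])         # bytes 3225-3226

    return {
        "textual_header": cards,
        "sample_interval_us": sample_interval_us,
        "samples_per_trace": samples_per_trace,
        "data_format_code": format_code,
    }

if __name__ == "__main__":
    meta = read_segy_metadata("survey_line_001.sgy")  # hypothetical file
    print("\n".join(meta["textual_header"][:5]))
    print("Sample interval (us):", meta["sample_interval_us"])
```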
Sr. Sysadmin
-
Seitel Inc
November 2021 – September 2024
My responsibilities at Seitel included a small HPC cluster and its infrastructure, a couple of TrueNAS file servers handling ~1.5 PiB of raw storage, a Proxmox VE hyperconverged environment with GPU passthrough, remote access for our entire workforce, the odd bit of networking, managing the BYOD policy via M365, and monitoring for all of those bits and everything else.
- Architected and implemented a TrueNAS storage solution
- Planned and executed move from in-house datacenter to colocation
- Implemented Parallels RAS for remote access to Windows desktops and applications
- Architected and implemented a Proxmox solution for remote 3D Linux workstations
- Responsible for directly supporting all users at all levels of support
- Standardized enterprise-issued devices to make support easier in a work from home environment
- Implemented NetBox for IPAM and network visualization (see the first sketch after this list)
- Implemented CheckMK for monitoring across the infrastructure (see the second sketch after this list)
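As a flavor of the NetBox work, here is a sketch of IPAM automation assuming the pynetbox client. The URL, token, prefix, and descriptions are hypothetical; the point is replacing spreadsheet IPAM with API-driven assignment.

```python
#!/usr/bin/env python3
"""Sketch of NetBox-as-IPAM automation via pynetbox (placeholders only)."""
import pynetbox

nb = pynetbox.api("https://netbox.example.com", token="REDACTED")

# Look up a managed prefix and record a new address under it.
prefix = nb.ipam.prefixes.get(prefix="10.10.20.0/24")
if prefix is None:
    raise SystemExit("prefix not in NetBox; add it first")

# Ask NetBox for the next free address so the spreadsheets stay retired.
ip = prefix.available_ips.create({"description": "truenas01 mgmt"})
print("assigned:", ip.address)

# Inventory-style query: every IP recorded under the prefix.
for addr in nb.ipam.ip_addresses.filter(parent="10.10.20.0/24"):
    print(addr.address, addr.description)
```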
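And for the CheckMK bullet, a sketch of a local check, the agent-side escape hatch for monitoring things CheckMK has no built-in check for. This one watches ZFS pool capacity; the thresholds and zpool-based logic are illustrative assumptions rather than the exact checks deployed.

```python
#!/usr/bin/env python3
"""Sketch of a CheckMK local check for ZFS pool capacity. Scripts like
this live in /usr/lib/check_mk_agent/local/ on a monitored host.
Thresholds are hypothetical."""
import subprocess

WARN, CRIT = 80, 90  # percent used

def main() -> None:
    # -H: no header, -p: parseable values, -o: only the columns we need.
    out = subprocess.run(
        ["zpool", "list", "-Hp", "-o", "name,capacity"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        name, cap = line.split()
        pct = int(cap.rstrip("%"))
        state = 0 if pct < WARN else 1 if pct < CRIT else 2
        # Local check format: <state> <service> <perfdata> <detail>
        print(f"{state} zpool_{name} cap={pct};{WARN};{CRIT} "
              f"pool {name} at {pct}% capacity")

if __name__ == "__main__":
    main()
```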
At Downunder Geosolutions (DUG) I was a core part of the team that managed a complex infrastructure spanning four geographically disparate sites, plus over 8,000 physical nodes local to Houston. The biggest challenge was applying established practice at a much larger scale without overloading the infrastructure. All of the Linux folks were responsible for everything in the environment, whether that was the diskless boot images, monitoring, or infrastructure projects like Proxmox VE.
- Implemented a global FreeIPA solution to replace an aging OpenLDAP implementation (see the first sketch after this list)
- Architected a hyperconverged Proxmox VE solution for business critical virtual machines and containers
- Developed a proof of concept using Wazuh for security monitoring
- Responsible for level 3 escalations of HPC cluster issues
- Implemented a complete DNS solution providing ad/malware/phishing blocking and serving over 100,000 internal DNS records, using Ansible for deployment and Git for change management (see the second sketch after this list)
- Member of the team responsible for maintaining the diskless boot image for over 12,000 compute nodes, using an internally developed PXE deployment system built on BitTorrent to allow easy scaling
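For the FreeIPA bullet, a sketch of the bulk-provisioning glue a migration like this typically needs. FreeIPA's own `ipa migrate-ds` handles the bulk of an OpenLDAP migration; a wrapper like the one below covers entries that need manual cleanup. The CSV layout is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of bulk user provisioning during an OpenLDAP-to-FreeIPA
migration. Requires a valid Kerberos ticket (kinit admin) first."""
import csv
import subprocess

def add_user(login: str, first: str, last: str) -> None:
    # `ipa user-add` is the standard FreeIPA CLI for creating users.
    subprocess.run(
        ["ipa", "user-add", login,
         f"--first={first}", f"--last={last}"],
        check=True,
    )

if __name__ == "__main__":
    with open("stragglers.csv", newline="") as f:  # hypothetical export
        for row in csv.DictReader(f):
            add_user(row["login"], row["first"], row["last"])
```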
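For the DNS bullet, an illustrative sketch only: the real pipeline rendered and deployed records with Ansible templates from a Git-managed repository, while plain Python below shows the same render-records-from-versioned-data idea. The zone and file names are hypothetical.

```python
#!/usr/bin/env python3
"""Illustration of generating zone-file A records from Git-tracked host
data. The production system used Ansible for templating and deployment."""
import csv

ZONE = "internal.example.com"  # hypothetical zone

def render_records(hosts_csv: str) -> str:
    # hosts.csv rows: name,ip  (tracked in Git; changes reviewed via merges)
    lines = []
    with open(hosts_csv, newline="") as f:
        for row in csv.DictReader(f):
            lines.append(f"{row['name']}.{ZONE}. IN A {row['ip']}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_records("hosts.csv"))
```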
Sr. HPC Architect
-
PCPC Direct
February 2014 – June 2019
PCPC Direct was a VAR whose business was primarily serving the oil and gas industry. My responsibilities varied from project to project based on current needs. It was a fast-paced environment that demanded constant learning, whether that meant mastering a new technology for an RFQ or maintaining certifications. This is where I earned my RHCSA as well as several smaller Red Hat certifications. A lot of the work involved travel, and I truly enjoyed being on the road, meeting new people, and seeing new datacenters.
- Technical lead for the development of a 3D accelerated, remote workstation and collaboration solution utilizing Red Hat Virtualization, Red Hat Cloud Suite, and Nvidia vGPU technology
- Team lead for teams managing multiple remote clusters in a managed-services environment, including all systems administration tasks, maintaining and tuning schedulers/resource managers, performance tuning, and system monitoring. Responsible for deploying multiple clusters at client sites, ranging from single-rack systems to clusters utilizing hundreds of nodes
- Responsible for onsite testing of non-managed clusters, at both local and remote sites, prior to hand-off to clients
- Developed an automated system using xCAT to deploy clusters, collect inventory, run burn-in, and validate results, drastically reducing the labor hours needed to complete the integration and testing process (see the sketch after this list)
- Architected a burn-in suite utilizing FOSS tools to simulate various HPC workloads, reducing the number of failures after cluster delivery
- Responsible for software stack architecture for all HPC related RFPs including OS, scheduler/resource manager, development tools, monitoring, and applications for all hardware vendors
- Developed cluster administration documentation for clients as well as in-house use
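To illustrate the xCAT automation bullet above, here is a sketch of the deploy-then-burn-in flow. nodeset, rpower, and xdsh are real xCAT commands; the node range, osimage name, and burn-in script path are hypothetical placeholders, and the real system chained inventory collection and result validation in between.

```python
#!/usr/bin/env python3
"""Sketch of an xCAT deploy-and-burn-in pipeline (placeholders only)."""
import subprocess

NODERANGE = "node001-node100"        # hypothetical xCAT noderange
OSIMAGE = "rhel-compute-osimage"     # hypothetical image name

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def deploy() -> None:
    # Point the nodes at the install image, then PXE-boot them.
    run("nodeset", NODERANGE, f"osimage={OSIMAGE}")
    run("rpower", NODERANGE, "boot")

def burn_in() -> None:
    # xdsh fans a command out to every node; the script path stands in
    # for the FOSS burn-in suite mentioned above.
    run("xdsh", NODERANGE, "/opt/burnin/run_suite.sh")

if __name__ == "__main__":
    deploy()
    burn_in()
```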
MD Anderson is likely the best cancer research hospital in the United States, if not the world. While the work was roughly the same as other HPC jobs, it directly supported cancer research. It was in this position that I gained familiarity with R and its intricacies. Unlike the oil and gas industry, where everything is standardized, each researcher had their own preferences for applications and languages. This was a fun challenge, as multiple versions of Python, R, and Perl had to be maintained along with standard genetics packages like TopHat and Bowtie. With a stable cluster, most of the team's time was spent either installing new tools or updating existing installations. The central apps repository was the largest I have worked with.
- Responsible for maintaining a 336-node / 8,064-core HPC cluster using HP CMU
- Developed cluster node images
- Responsible for maintaining centralized installs of Perl, Python, R, and various NGS processing packages (see the sketch after this list)
- Project lead for converting the cluster from CentOS 5.5 to RHEL 5.5 and from RHEL 5.5 to RHEL 6.2
- Designed and deployed a cluster health monitoring system using Icinga and a cluster metric gathering system using Ganglia
- Participated in the development and execution of a move of all computing resources from an older datacenter to a newer facility
- Team lead for migration of the research and development environment to a new filesystem
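To illustrate the centralized-installs bullet above, a sketch under an assumption: the text does not name the tool used, but versioned install prefixes exposed through Environment Modules are the common pattern for maintaining multiple Perl, Python, and R versions side by side. All paths and versions below are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: generate a Tcl modulefile per versioned install prefix,
assuming the Environment Modules approach. Paths are hypothetical."""
from pathlib import Path

APPS_ROOT = Path("/apps")                 # hypothetical central apps repo
MODULES_ROOT = Path("/apps/modulefiles")  # hypothetical modulefile tree

TEMPLATE = """#%Module1.0
set root {root}
prepend-path PATH $root/bin
prepend-path LD_LIBRARY_PATH $root/lib
prepend-path MANPATH $root/share/man
"""

def write_modulefile(app: str, version: str) -> Path:
    root = APPS_ROOT / app / version       # e.g. /apps/R/3.0.1
    target = MODULES_ROOT / app / version
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(TEMPLATE.format(root=root))
    return target

if __name__ == "__main__":
    # Users would then run e.g. `module load R/3.0.1`.
    print(write_modulefile("R", "3.0.1"))
```

With this layout, each researcher's preferred interpreter version coexists with the others, and switching is a single `module load` rather than a PATH-editing exercise.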