I currently work as Sr. Sysadmin for Seitel Inc. My responsibilities include a small HPC cluster and its infrastructure, a couple of TrueNAS file servers that handle ~1.5 PiB of raw storage, a handful of physical and virtual RHEL workstations, remote access for our entire workforce, the odd bit of networking, and monitoring for all of those bits and everything else.
- Architected and implemented TrueNAS storage solution
- Planned and executed move from in-house datacenter to colocation
- Implemented Parallels RAS for remote access to Windows desktops and applications
Senior Linux admin for a large team spread across four global sites.
- Implemented a global FreeIPA solution to replace an aging OpenLDAP implementation
- Architected a hyperconverged Proxmox solution for business-critical virtual machines and containers
- Developed proof of concept using Wazuh for security monitoring
- Responsible for level 3 escalations of HPC cluster issues
- Implemented a complete DNS solution for ad/malware/phishing blocking that also serves over 100,000 internal DNS entries, using Ansible for deployment and Git for management
- Member of the team responsible for maintaining the diskless boot image for over 12,000 compute nodes, using an internally developed PXE deployment system built on BitTorrent to allow easy scaling
Sr. HPC Architect - PCPC Direct
February 2014 – June 2019
Designed, deployed, and maintained dozens of HPC clusters for clients in diverse markets.
- Technical lead for the development of a 3D-accelerated remote workstation and collaboration solution utilizing Red Hat Virtualization, Red Hat Cloud Suite, and Nvidia vGPU technology
- Team lead for teams managing multiple remote clusters in a managed services environment, including all systems administration tasks, maintaining and tuning schedulers/resource managers, performance tuning, and system monitoring. Responsible for deploying multiple clusters at client sites, ranging from single-rack clusters to clusters with hundreds of nodes
- Responsible for onsite testing of non-managed clusters prior to hand-off to clients
- Developed an automated system using xCAT to deploy clusters, collect inventory, run burn-in, and validate results, drastically reducing the hours needed to complete the integration and testing process
- Architected a burn-in suite utilizing FOSS tools to simulate various HPC workloads in order to reduce the number of failures after cluster delivery
- Responsible for software stack architecture for all HPC-related RFPs, including OS, scheduler/resource manager, development tools, and applications for all hardware vendors
- Developed cluster administration documentation for in-house use as well as for client use
Member of the HPC team at MD Anderson, a leading cancer research hospital.
- Responsible for maintaining a 336 node/8064 core HPC cluster using HP CMU
- Developed cluster node images
- Responsible for maintaining centralized installs for Perl, Python, R, and various NGS processing packages
- Project lead for converting the cluster from CentOS 5.5 to RHEL 5.5 and from RHEL 5.5 to RHEL 6.2
- Designed and deployed a cluster health monitoring system using Icinga and a cluster metric gathering system using Ganglia
- Participated in the planning and execution of a move of all computing resources from an old data center to a newer facility
- Team lead for migration of the research and development environment to a new filesystem