PCPC Direct Ltd

Team Lead for teams managing multiple remote clusters in a managed services environment, covering all systems administration tasks, maintenance and tuning of schedulers/resource managers, performance tuning, and system monitoring. Deployed multiple clusters at client sites, ranging from single-rack clusters to clusters utilizing hundreds of nodes. Responsible for on-site testing of non-managed clusters prior to hand-off to clients. Developed an automated system using xCAT to deploy clusters, collect inventory, run burn-in, and validate results, drastically reducing the man-hours needed to complete the integration and testing process. Architected a burn-in suite utilizing FOSS tools to simulate various HPC workloads in order to reduce the number of failures after cluster delivery. Responsible for software stack architecture for all HPC-related RFPs, including OS, scheduler/resource manager, development tools, and applications, for all hardware vendors. Documented internal processes for cluster deployment, testing, burn-in, and validation. Developed cluster administration documentation for in-house use as well as for clients.