The Swiss National Supercomputing Centre (CSCS) develops and operates cutting-edge, high-performance computing (HPC) systems as an essential service facility for science. The centre enables world-class research through its scientific user lab, which is available to domestic and international researchers in academia, industry, and the business sector. The centre is operated by ETH Zurich and has offices in Lugano (headquarters) and Zurich.
Project background
As HPC and cloud technologies converge, CSCS strives to improve its service portfolio, which focuses on large, diverse scientific and engineering applications. Effectively managing a large number of HPC infrastructure resources to support diverse HPC and AI platforms is complex and challenging.
To this end, CSCS has an open position at its office in Lugano for an HPC systems infrastructure engineer.
Job description
The main goal of this position is to develop and manage HPC/AI infrastructure services for a multi-tenant infrastructure. As a systems infrastructure engineer, you will directly contribute to the design, implementation, maintenance, and documentation of infrastructure services to support HPC and AI platforms.
Your work will directly enhance the overall system functionality and efficiency.
Your responsibilities:
* Investigating, troubleshooting, and debugging infrastructure management services and infrastructure resources,
* Developing, maintaining, and supporting infrastructure management tools and pipelines to support a geographically redundant infrastructure,
* Developing automations to provision, test, deploy, and monitor infrastructure resources to support the needs of HPC and AI platforms,
* Supporting, documenting, and sharing knowledge of the infrastructure, tools, and procedures.
Profile
We are looking for a professional with more than five years of experience in the operation and management of HPC, AI, or Cloud infrastructures.
Expected qualifications:
* Experience in the deployment of HPC, AI, or Cloud infrastructures,
* Management of HPC/AI hardware, including compute, storage, and high-speed network components,
* Working knowledge of automation tools and frameworks, including CI/CD processes and ecosystem,
* Linux administration skills.
Experience with the following is preferred, though there will be ample opportunities to learn and gain more experience with all of these skills on the job:
* Maintaining Kubernetes infrastructure and debugging of microservices
* Performance monitoring and diagnostic tools for HPC/AI hardware
* Infrastructure as Code tools such as Terraform and Ansible
You should have a bachelor’s or higher degree in computer engineering, computer science, a relevant technical field, or equivalent practical experience.
We offer
* ETH Zurich is a family-friendly employer with excellent working conditions.
* You can look forward to an exciting working environment, cultural diversity and attractive offers and benefits.
* Remote working within Switzerland is available for up to 4 days per week.
* We value the diversity of our team and, to further enhance the diversity of our workforce, we particularly encourage women to apply.
We value diversity
In line with our values, ETH Zurich encourages an inclusive culture. We promote equality of opportunity, value diversity and nurture a working and learning environment in which the rights and dignity of all our staff and students are respected.
About ETH Zurich
ETH Zurich is one of the world’s leading universities specialising in science and technology. We are renowned for our excellent education, cutting-edge fundamental research and direct transfer of new knowledge into society. Over 30,000 people from more than 120 countries find our university to be a place that promotes independent thinking and an environment that inspires excellence.