VITO, the Flemish Institute for Technological Research, is committed to expediting the shift towards a sustainable world. Our research covers themes such as Energy, Materials, Chemistry, Health, and Land Use.
You will join the Data and Analytics (D&A) team, a transversal team that supports research groups within VITO on topics related to data, software, and machine learning. To support VITO projects, the team manages a central High Performance Computing (HPC) platform. This platform plays a pivotal role in making research at VITO more data-driven and aims to provide a cohesive, scalable, and universally accessible computing cluster across VITO. By leveraging state-of-the-art hardware accelerators like NVIDIA A100 GPUs, it caters to a diverse range of data science applications, statistical analyses, and machine learning model training.
As an HPC engineer in the D&A team, your role will be instrumental in managing and enhancing this platform. You will focus on refining its functionalities, optimizing user experiences, promoting widespread adoption, and implementing robust system monitoring measures. Your contributions will play a crucial part in advancing our capabilities and ensuring the seamless operation of this critical piece of infrastructure.
We are looking for someone who will take the initiative and lead the continued development of the platform. You will act as a bridge between the various IT teams, the D&A team, and platform users. You are comfortable diving deep into technical problems with IT, and equally comfortable discussing with researchers how to take full advantage of the HPC platform for their research.
• Oversee the day-to-day management and maintenance of our HPC cluster.
• Monitor system performance, troubleshoot issues, and implement necessary upgrades to ensure uninterrupted HPC services.
• Provide user support for HPC systems and applications.
• Collaborate with researchers and IT teams to understand computational needs and provide tailored solutions.
• Conduct training sessions and tutorials to empower users in utilizing HPC resources effectively.
• Keep track of the latest developments and technologies in the field to ensure our HPC environment remains state-of-the-art.
• Continuously assess and implement improvements to enhance cluster efficiency and user experience.
• Act as a bridge between Data and Analytics (D&A) and IT teams.
Must have
• Proven track record as a Linux System Administrator.
• Working knowledge of Python, Conda, and Jupyter Notebooks.
• Experience managing access privileges with Active Directory and authentication mechanisms (e.g. Kerberos and NFS security).
• Excellent problem-solving and communication skills.
• Enthusiastic about further developing the platform independently.
• Proactive, takes initiative and follows through on opportunities.
• Keeps up with the latest HPC technologies to drive our computing capabilities forward.
• Comfortable working in an R&D environment.
Nice to have
• A master’s degree in computer science or a relevant field.
• Proficiency in configuring and optimizing Slurm-based HPC clusters.
• Experience with Grafana and Prometheus monitoring tools.
• Experience with Docker/Singularity containers and orchestrators like Kubernetes.
• Previous experience as a researcher utilizing HPC clusters.
• Proficiency in compiling and installing scientific software from source.
• Familiarity with configuration-as-code tools like Puppet and Ansible.
• Experience with deep learning and GPU computing.
• Knowledgeable about network and storage at the hardware level.
Not sure if you meet ALL the qualifications, but you recognize yourself in the majority of them? Do not hesitate to apply anyway.
Offer