PhD position Query Execution over Personal Genomics

Job description

Are you an MSc graduate with a background in computer science and an interest in life sciences, specifically personal genomics?

We are looking for a highly motivated individual to join our team in developing privacy-enabled Personal Genome Repositories for use in clinical and healthcare settings. You will have the opportunity to work on cutting-edge technology and research, creating a proof-of-concept Personal Genome Data Repository using the Solid ecosystem. The system will be designed to give individuals control over their own data and the ability to share it on their own terms. You will work on securing the repository using state-of-the-art encryption and authentication methods, and on facilitating data exchange through standard formats while ensuring compatibility with existing genome analysis tools. You will also have the opportunity to develop new tools and algorithms for genome analysis, with a focus on making the data accessible and understandable for non-experts.

Not only will you be making a significant impact on the field of personal genomics, but you will also have the opportunity to pave the way for personal genome projects across European countries and provide an ecosystem for academic and industry partners using personal genome data in a responsible manner. Apply today and join our team in shaping the future of personal genomics.

This position is part of a strategic collaboration between University of Ghent and VITO, in which the candidate will have access to both institutions' expertise.

Please feel free to contact Gokhan Ertaylan, PhD (VITO) or Ruben Taelman, PhD (UGent-imec).

We offer:
- a 4-year PhD-research project (candidates will be encouraged to apply for FWO grants in 2023/4);
- a competitive remuneration during the PhD period;
- participation in an exciting project with great potential for societal impact;
- a career development opportunity with access to data derived from cutting edge technologies and analysis platforms;
- membership of two dynamic and multi-disciplinary research teams from VITO and University of Ghent.


Application Procedure:

Please submit your application through the VITO application portal by uploading your motivation letter (max. 1 page), a short CV (max. 3 pages), a transcript from the MSc degree, and the names of three referees.

After being selected as a PhD candidate, you will prepare a formal PhD application and defend it before a VITO jury. The next PhD jury will be organised on 2 June 2023 (registration deadline: 29 March 2023).

---------------------------------------------------------------------------------------------------------------------

Background and Aims:

Personal Genomics is now becoming a reality, thanks to the steadily increasing availability of affordable sequencing technology. It is evolving from its initial niche in disease-specific research into the broader areas of health and wellbeing, both in the research community and the general population. Despite the great promise of genomics for personal health and precision medicine, most institutions, as well as individuals, are hesitant to share genomic data. The principal reason is the risk of unethical handling of the data, which could, for example, lead to higher insurance costs for individuals. Hence, building trust for sharing this highly sensitive, identifiable personal information is critical at the individual, national, and international levels. Currently, genomic data sharing happens mostly i) in an 'all or none' fashion, where individuals share either the data of their whole genome or none of it, and ii) often in an irreversible manner, where access is difficult to revoke once permission has been granted and access has been gained.

In essence, we propose to research existing methodologies and create a Proof-of-Concept: the Personal Genome Data Repository, where the Solid ecosystem is employed to create individual Genome Pods (personal online datastores). The system design will be based on the Solid standards, which allow individuals to have control over their own data and share it with others on their own terms. Each individual will have their own Genome Pod, which will serve as the personal repository for their genome data.

The Genome Pods will be secured using state-of-the-art encryption, authentication, and authorization methods to ensure that only authorized individuals have access to the data. Data exchange will be facilitated through the use of standard data exchange formats (RDF) while ensuring compatibility with existing genome analysis tools and data formats (FASTA, VCF). Querying data from a single individual and from a cohort of individuals will be an important challenge to address in this project. 
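As a rough illustration of how a VCF record might be exposed as RDF, the sketch below turns the first five fixed VCF columns (CHROM, POS, ID, REF, ALT) into N-Triples lines. The vocabulary namespace, the pod URL layout, and the function name are hypothetical and chosen only for this example; the project may well use an established ontology instead. Literal escaping and multi-allelic records are deliberately ignored.

```python
from typing import List

# Hypothetical vocabulary namespace, used for illustration only.
VOCAB = "https://example.org/genome-vocab#"


def vcf_record_to_ntriples(line: str, pod_base: str) -> List[str]:
    """Convert one tab-separated VCF data line into N-Triples statements.

    Only the first five fixed VCF columns are used; values are emitted
    as plain string literals without escaping (a simplification).
    """
    chrom, pos, variant_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    # Hypothetical IRI layout for a variant stored in a Genome Pod.
    subject = f"<{pod_base}variants/{chrom}-{pos}-{ref}-{alt}>"
    fields = {
        "chromosome": chrom,
        "position": pos,
        "identifier": variant_id,
        "reference": ref,
        "alternate": alt,
    }
    return [
        f'{subject} <{VOCAB}{name}> "{value}" .'
        for name, value in fields.items()
    ]


# Illustrative record with made-up values.
record = "1\t12345\trs0000\tA\tG\t.\tPASS\t."
triples = vcf_record_to_ntriples(record, "https://alice.example/genome/")
for triple in triples:
    print(triple)
```

Keeping the original VCF file in the pod next to such a derived RDF view would be one way to stay compatible with existing analysis tools while making individual variants queryable.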

The data processing will be done in a federated manner, meaning that the genome data will be distributed among multiple nodes rather than being stored in a central location. This approach should keep the loss of efficiency in data analysis to a minimum while increasing security and privacy for the individuals whose data is being analyzed.

The main challenge here will be to design query algorithms that can handle the massive scale of data distribution that is caused by the personal nature of these Genome Pods. These query algorithms will need to be able to i) discover all data relevant to a given query, ii) plan the query execution based on the organization of the discovered data, and iii) execute the query so as to produce the most relevant results as early as possible, in an iterative manner.
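The three steps above can be sketched on a toy example. The code below is a minimal illustration, not the project's algorithm: the pods are simulated as an in-memory dictionary, and the pod names, the link structure, and the "smallest pod first" planning heuristic are all assumptions made for this example. Real Genome Pods would be reached over HTTP and queried with a proper query language.

```python
from collections import deque
from typing import Dict, Iterator, List, Set

# Hypothetical in-memory stand-in for a network of Genome Pods:
# each pod lists the variant identifiers it holds and links to other pods.
PODS: Dict[str, dict] = {
    "pod:alice": {"variants": {"rs429358", "rs7412"}, "links": ["pod:bob"]},
    "pod:bob": {"variants": {"rs7412"}, "links": ["pod:carol", "pod:alice"]},
    "pod:carol": {"variants": {"rs429358"}, "links": []},
}


def discover(seed: str) -> List[str]:
    """Step i: follow links breadth-first from a seed pod to find sources."""
    seen: Set[str] = set()
    found: List[str] = []
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        if url in seen or url not in PODS:
            continue
        seen.add(url)
        found.append(url)
        frontier.extend(PODS[url]["links"])
    return found


def plan(pods: List[str]) -> List[str]:
    """Step ii: order the sources; here, smallest pods first (naive heuristic)."""
    return sorted(pods, key=lambda url: (len(PODS[url]["variants"]), url))


def execute(pods: List[str], variant: str) -> Iterator[str]:
    """Step iii: yield each matching pod as soon as it has been checked."""
    for url in pods:
        if variant in PODS[url]["variants"]:
            yield url


sources = discover("pod:alice")
carriers = list(execute(plan(sources), "rs429358"))
print(carriers)
```

Because `execute` is a generator, a caller can start consuming results before all pods have been visited, which mirrors the goal of returning results iteratively.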

The project will also include the development of new tools and algorithms for genome analysis, with a focus on making the data accessible and understandable for non-experts.

The same platform can also enable the sharing of infectious disease genomic variants (SARS-CoV-2, HIV, etc.) isolated from various projects across Europe in a personalized manner.

We believe that the proposed architecture where the provider/owner of the data (the individual patient or citizen) can also benefit from the sharing of their genome (e.g., by having access to specific analyses or personally tailored advice in exchange) will pave the way for personal genome projects across European countries and provide an ecosystem for academic and industry partners using personal genome data in a responsible manner.

Job requirements

We are looking for a highly motivated and scientifically excellent candidate with:

  • a background in computer science, genomics or bioinformatics;
  • a problem-solving attitude and the ability to work in an international, collegial environment;
  • strong communication skills, complemented by innovative and analytical thinking.