This is an exciting opportunity to work at the interface between data management and generative AI. The project aims to develop new analytical infrastructure paradigms to support semantic querying and inference over heterogeneous and distributed data sources in the pharmaceutical domain.
The vision behind the project is to extend the state of the art towards infrastructures that allow the creation of data-based digital twins, where end-users (domain experts, analysts) can perform seamless natural language (NL) analytical queries over semantically integrated data sources, with an emphasis on relational and knowledge graph (KG) data sources.
Project Objectives
* To develop a declarative/neuro-symbolic inference engine that enables the implementation of a digital twin based on semantically integrated datasets.
* To establish and evaluate the complementary analytical value of integrating the controlled use of large language models (LLMs) with ontology/KG-based representations.
The project will be developed in close collaboration with the research lab of a large pharmaceutical company. The PhD candidate will have the opportunity to work closely with data analysts, software engineers, and domain experts to develop Minimum Viable Products (MVPs) towards a Self-Service Insights platform.
This project requires a self-motivated candidate who can address complex problems.
Mandatory Requirements
* A Bachelor's and a Master's degree in Computer Science or a related area.
* Fully comfortable developing complex systems in Python, with evidence from previous industrial or academic projects.
* Previous academic or industrial project experience in NLP or data management.
* The candidate is expected to spend some days on site at the pharmaceutical company.
Desirable Requirements
* Previously published papers.
* Previous exposure to knowledge graphs and ontologies.
We strive to create a diverse and inclusive environment and welcome applications from individuals of all backgrounds.