Career Profile
I’m currently a researcher at the Erasmus University Medical Center, at the Medical Informatics department in the biosemantics group. My PhD focusses on using biomedical knowledge graphs to support data analyses and biomedical research in general. My promotor is Johan van der Lei, and my daily supervisors are Erik van Mulligen and Rein Vos (visiting professor from Maastricht University).
Knowledge extracted from biomedical literature and datasets is semantically integrated into a knowledge graph, after which this comprehensive body of biomedical knowledge is used for knowledge-intensive tasks ranging from identifying migraine biomarkers to predicting the efficacy of drugs. The elements within these knowledge graphs are subject-predicate-object triples, each stored along with its provenance. Triples also form the basic elements of linked (open) data. However, while linked data adheres to specific standards (e.g. URIs), knowledge graphs do not necessarily adhere to these standards. For our research, Euretos has provided us access to their knowledge graph.
Our most recent research efforts have led us to analyze the complex data extracted from knowledge graphs with machine learning. As future topics, I hope to investigate temporal aspects of knowledge graphs and to study in detail the provenance underlying triples.
I have collaborated in both EU and national research projects, namely the EU FP7 Eureca project and the NWO ODEX project, in which the FAIR principles and supporting tools were further developed.
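To make the triple-plus-provenance idea concrete, here is a minimal sketch in Python. It is only an illustration: the class name `Triple`, the helper functions, and all identifiers in the example data are hypothetical, and they do not reflect the schema of the Euretos knowledge graph.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement plus the sources it came from."""
    subject: str
    predicate: str
    obj: str
    provenance: Tuple[str, ...]  # hypothetical source identifiers


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


def sources_for(triples: List[Triple], subject: str, predicate: str, obj: str) -> List[str]:
    """Collect the provenance entries recorded for one specific statement."""
    sources: List[str] = []
    for t in triples:
        if (t.subject, t.predicate, t.obj) == (subject, predicate, obj):
            sources.extend(t.provenance)
    return sources


# Made-up example data; none of these identifiers refer to real records.
EXAMPLE_TRIPLES = [
    Triple("compound:example-1", "is_associated_with", "disease:migraine",
           ("literature:hypothetical-article-1", "database:hypothetical-db")),
    Triple("compound:example-2", "is_associated_with", "disease:migraine",
           ("literature:hypothetical-article-2",)),
    Triple("drug:example-3", "treats", "disease:migraine",
           ("database:hypothetical-db",)),
]
```

Calling `sources_for(EXAMPLE_TRIPLES, "compound:example-1", "is_associated_with", "disease:migraine")` returns both sources attached to that statement, which is the kind of lookup a provenance study would start from.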
Employers and projects
During the final two years of my four-year period at the Erasmus Medical Center, I'm finishing my research and writing my papers without being assigned to any specific project.
During the first two years of my PhD at the Erasmus Medical Center I was part of the [NWO-ODEX](https://www.dtls.nl/categorie/odex/) project, led by Barend Mons. The goals of this project were 1) to solve a number of industry-relevant life-science use cases using a knowledge graph, and 2) to develop a FAIR infrastructure that supports transforming data to the FAIR format. The results are currently being written up, and will be published as soon as possible.
During my 1+ year at the VUA I was part of the Knowledge Representation and Reasoning group, under the supervision of Annette ten Teije and Frank van Harmelen. I was assigned to the [Eureca](www.eurecaproject.eu) project, an EU FP7 project, where I worked on the contextualization of patient data. The goal was to use patient data to search for relevant information, based on a small number of clinical use cases. A simple web interface was developed that, as a first step, offered pre-programmed use cases. A theoretical underpinning for a more flexible approach was also worked out, but there was no time to pursue it further. As contributions to the Eureca project I wrote two chapters of deliverables, and contributed to many more.
Publications
Here I list my publications, along with their abstracts. For each, a link is provided.
Automated extraction of potential migraine biomarkers using a semantic graph
PROBLEM Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers.
METHOD We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance.
RESULTS Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, with 547 out of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that were not in the top of the list, an extensive error analysis has been performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974.
DISCUSSION Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
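As a rough illustration of the ranking idea in the METHOD paragraph above, the following sketch scores each candidate compound by the number of disease-related concepts it is directly connected to. This simple neighbour count is an assumption chosen for illustration, not the scoring used in the paper, and the function name `rank_candidates` and all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_candidates(
    edges: Iterable[Tuple[str, str]],
    disease_concepts: Set[str],
    candidates: Iterable[str],
) -> List[Tuple[str, int]]:
    """Rank candidates by how many disease-related concepts they touch.

    `edges` are treated as undirected concept-concept links.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Score = number of distinct disease-related neighbours of the candidate.
    scores = {c: len(neighbours[c] & disease_concepts) for c in candidates}

    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up toy graph; the identifiers do not refer to real concepts.
EXAMPLE_EDGES = [
    ("compound:X", "concept:m1"),
    ("compound:X", "concept:m2"),
    ("compound:Y", "concept:m1"),
    ("compound:Z", "concept:other"),
]
EXAMPLE_DISEASE_CONCEPTS = {"concept:m1", "concept:m2"}
EXAMPLE_RANKING = rank_candidates(
    EXAMPLE_EDGES,
    EXAMPLE_DISEASE_CONCEPTS,
    ["compound:X", "compound:Y", "compound:Z"],
)
```

In this toy graph `compound:X` ranks first because it links to both disease-related concepts, while `compound:Z` ranks last with none.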
Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets
BACKGROUND Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins on their isoelectric point and apparent molecular mass and quantify changes in protein expression. Abundantly available proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins.
RESULTS We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.
CONCLUSIONS Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which especially the low-abundant protein spots remain to be identified.
Dengue in travellers; applicability of the 1975-1997 and the 2009 WHO classification system of dengue fever
OBJECTIVES The aim of this study was to assess the applicability and benefits of the new WHO dengue fever guidelines in clinical practice, for returning travellers.
METHODS We compared differences in specificity and sensitivity between the old and the new guidelines for diagnosing dengue and assessed the usefulness in predicting the clinical course of the disease. Also, we investigated whether hypertension, diabetes or allergies, ethnicity or high age influenced the course of disease.
RESULTS In our setting, the old classification, compared with the new, had a marginally higher sensitivity for diagnosing dengue. The new classification had a slightly higher specificity and was less rigid. Patients with dengue who had warning signs as postulated in the new classification were admitted more often than those who had no warning signs (RR, 8.09 [1.80-35.48]). We did not find ethnicity, age, hypertension, diabetes mellitus or allergies to be predictive of the clinical course.
CONCLUSIONS In our cohort of returned travellers, the new classification system did not differ in sensitivity and specificity from the old system to a clinically relevant degree. The guidelines did not improve identification of severe disease.
Proteomic analysis of HIV-T cell interaction; an update
This mini-review summarizes techniques applied in, and results obtained with, proteomic studies of human immunodeficiency virus type 1 (HIV-1)-T cell interaction. Our group previously reported on the use of two-dimensional differential gel electrophoresis (2D-DIGE) coupled to matrix assisted laser-desorption time of flight peptide mass fingerprint analysis, to study T cell responses upon HIV-1 infection. Only one in three differentially expressed proteins could be identified using this experimental setup. Here we report on our latest efforts to test models generated by this data set and extend its analysis by using novel bioinformatic algorithms. The 2D-DIGE results are compared with other studies including a pilot study using one-dimensional peptide separation coupled to MS(E), a novel mass spectrometric approach. It can be concluded that although the latter method detects fewer proteins, it is much faster and less labor intensive. Last but not least, recent developments and remaining challenges in the field of proteomic studies of HIV-1 infection and proteomics in general are discussed.