My university profile provides a nice overview of some of my research interests, particularly using machine learning to do clever things with biomedical text. Below are a few recent projects.


The pandemic has caused an incredible surge in research on different aspects of the virus and its effects, and it is very challenging to navigate this fire hose of papers. This portal makes it more manageable to find the important papers on a variety of topics, from risk factors and forecasting to vaccines and psychological impacts. The research uses a supervised learning approach, with a set of classifiers trained on topic annotations for several thousand papers. It was published in PNAS. The machine learning code is on GitHub and the data is on Zenodo.
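To give a flavour of what "a set of classifiers trained on topic annotations" can look like, here is a minimal sketch that trains one yes/no classifier per topic, so a paper can receive several topics at once. It is only an illustration under stated assumptions: the naive Bayes model, the toy titles, the topic names and the helper names (`train_topic_models`, `predict_topics`) are all made up for this example and are not taken from the published code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single yes/no topic (illustrative only)."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
            self.n_docs[label] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.counts[label].values())
        # Laplace-smoothed class prior
        score = math.log((self.n_docs[label] + 1) / (sum(self.n_docs.values()) + 2))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (self.counts[label][tok] + 1) / (total + len(self.vocab))
                )
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical annotated titles; a real training set would hold thousands.
TRAIN = [
    ("Age and diabetes as risk factors for severe disease", {"Risk Factors"}),
    ("Obesity is a risk factor for hospital admission", {"Risk Factors"}),
    ("Forecasting case counts with a compartmental model", {"Forecasting"}),
    ("Short-term forecasting of hospital demand", {"Forecasting"}),
    ("Vaccine efficacy in a randomized trial", {"Vaccines"}),
    ("Antibody response after vaccine booster dose", {"Vaccines"}),
    ("Risk factors for poor antibody response to vaccine", {"Risk Factors", "Vaccines"}),
]


def train_topic_models(examples):
    """Train one binary classifier per topic found in the annotations."""
    docs = [tokenize(text) for text, _ in examples]
    topics = sorted(set().union(*(labels for _, labels in examples)))
    return {
        topic: BinaryNaiveBayes().fit(docs, [topic in labels for _, labels in examples])
        for topic in topics
    }


def predict_topics(models, text):
    """Return the set of topics whose classifier votes yes for this text."""
    tokens = tokenize(text)
    return {topic for topic, model in models.items() if model.predict(tokens)}


models = train_topic_models(TRAIN)
print(predict_topics(models, "Forecasting model of case counts"))
```

Training each topic separately keeps the topics independent, so adding a new topic only means annotating for it and training one more classifier.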


Pharmacogenomics studies the effect of genetic variation on drug response. The daunting task of curating the entire biomedical literature for relevant knowledge is taken on by the PharmGKB team at Stanford University. To assist in this task, I created the PGxMine tool, which identifies important pharmacogenomics knowledge in the literature and prioritises papers for curation. This data is now integrated into the PharmGKB resource (under Automated Annotations) and is being used by curators. The data is viewable online and the work was published at the Pacific Symposium on Biocomputing.


CIViCmine aids curation of the CIViC database of known cancer biomarkers for diagnosis, prognosis, predisposition and drug resistance. This knowledge is invaluable for personalized cancer projects, where it helps select treatments for individual patients. To assist in curation and to provide a high-quality knowledge base in this area, cancer biomarkers have been mined from abstracts and full-text papers. The resulting data can be viewed with the associated web viewer. The work has been published in Genome Medicine.


CancerMine uses text mining to extract known drivers, oncogenes and tumor suppressors discussed in the literature. Understanding the role of different genes in different cancer types is essential for precision cancer efforts. The project data can be viewed with the associated web viewer and downloaded from Zenodo. This work has been published in Nature Methods and a preprint is available on bioRxiv.


Kindred is our relation extraction tool that uses a supervised learning approach. The code and associated paper are freely available. It is the successor to VERSE, our tool that won the BioNLP’16 Shared Task.