NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction

Dustin Wright, Yannis Katsis, Raghav Mehta, Chun-Nan Hsu

Published in Automated Knowledge Base Construction, 2019

Biomedical knowledge bases are crucial in modern data-driven biomedical sciences, but automated biomedical knowledge base construction remains challenging. In this paper, we consider the problem of disease entity normalization, an essential task in constructing a biomedical knowledge base. We present NormCo, a deep coherence model which considers the semantics of an entity mention, as well as the topical coherence of the mentions within a single document. NormCo models entity mentions using a simple semantic model which composes phrase representations from word embeddings, and treats coherence as a disease concept co-mention sequence using an RNN rather than modeling the joint probability of all concepts in a document, which requires NP-hard inference. To overcome the issue of data sparsity, we use distantly supervised data and synthetic data generated from priors derived from the BioASQ dataset. Our experimental results show that NormCo outperforms state-of-the-art baseline methods on two disease normalization corpora in terms of (1) prediction quality and (2) efficiency, and is at least as performant in terms of accuracy and F1 score on tagged documents.
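The two ingredients named in the abstract, a phrase representation composed from word embeddings and a recurrent pass over the sequence of mentions in a document, can be illustrated with a toy sketch. This is not the released NormCo code: the vectors, concept identifiers, scalar weights, the plain Elman-style recurrence, and the summed-distance scoring rule are all assumptions chosen only to make the idea concrete.

```python
import math

# Toy 2-d word and concept vectors. All values and identifiers here are
# hypothetical, invented purely for illustration.
WORD_VECS = {
    "breast": [1.0, 0.0],
    "cancer": [0.0, 1.0],
    "tumor": [0.1, 0.9],
}
CONCEPT_VECS = {
    "C_BREAST_CANCER": [1.0, 1.0],
    "C_NEOPLASM": [0.0, 1.0],
}
DIM = 2


def embed_mention(mention):
    """Compose a phrase vector by summing the vectors of its known words."""
    vec = [0.0] * DIM
    for word in mention.lower().split():
        for i, x in enumerate(WORD_VECS.get(word, [0.0] * DIM)):
            vec[i] += x
    return vec


def rnn_context(mention_vecs, w_in=0.5, w_rec=0.5):
    """Elman-style recurrence with scalar weights (an assumption):
    h_t = tanh(w_in * x_t + w_rec * h_{t-1}), applied element-wise."""
    h = [0.0] * DIM
    states = []
    for x in mention_vecs:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def normalize(mentions):
    """Map each mention in a document to the concept that minimizes the sum of
    its semantic distance and its coherence (context) distance."""
    vecs = [embed_mention(m) for m in mentions]
    contexts = rnn_context(vecs)
    results = []
    for v, c in zip(vecs, contexts):
        best = min(
            CONCEPT_VECS,
            key=lambda cid: distance(v, CONCEPT_VECS[cid])
            + distance(c, CONCEPT_VECS[cid]),
        )
        results.append(best)
    return results


if __name__ == "__main__":
    print(normalize(["breast cancer", "tumor"]))
```

Because each mention is scored in a single left-to-right pass, the cost grows linearly with the number of mentions, which is the practical appeal of treating coherence as a sequence instead of a joint assignment over all concepts in a document.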

Download paper here

View code

Recommended bibtex:

@inproceedings{wright2019normco,
  title={NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction},
  author={Wright, Dustin and Katsis, Yannis and Mehta, Raghav and Hsu, Chun-Nan},
  booktitle={Automated Knowledge Base Construction},