Cancer Publication Portal: Identifying Gene-Cancer Associations from Biomedical Literature

Presenter: Michael Gargano

Research Category: Interdisciplinary Topics, Centers and Institutes
Student Type: Graduate
PI: Garrett Dancik
Award Winner Category: Interdisciplinary Topics

We describe a web application that summarizes gene-cancer associations reported in PubMed, an online database of biomedical research articles in which the number of gene-related cancer articles has grown exponentially over the past 15 years. Cancer is a deadly genetic disease in which gene mutations result in abnormal cellular functions. Mutations may include substitutions of one base for another, insertions or deletions of varying sizes, or DNA rearrangements. Often, gene expression is drastically affected by these mutations, which can drive abnormal cell activity. This research uses a two-step approach to summarize associations between genes and tumor types. The first step involves collecting, parsing, and interpreting PubMed data for five cancer types using the scripting language Python, the statistical programming language R, and the database program MongoDB. The second step involves developing a web application (the Cancer Publication Portal) that reports the most commonly studied genes across tumor types, identifies genes that have been studied in a tumor-specific way, and allows a user to query specific genes. Overall, this work provides a useful summary of genes and their association with tumors.
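
As an illustration of the kind of summary described above, the following is a minimal Python sketch, not the portal's actual implementation. It assumes the PubMed records have already been parsed into simple dictionaries; the record layout, the placeholder tumor and gene labels, the function names, and the 0.8 "tumor-specific" threshold are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

# Toy, made-up records standing in for parsed PubMed articles.
# Field names and labels are hypothetical, not taken from the portal.
ARTICLES = [
    {"pmid": "1", "cancer": "tumor_1", "genes": ["GENE_A", "GENE_B"]},
    {"pmid": "2", "cancer": "tumor_1", "genes": ["GENE_A"]},
    {"pmid": "3", "cancer": "tumor_2", "genes": ["GENE_A", "GENE_C"]},
    {"pmid": "4", "cancer": "tumor_2", "genes": ["GENE_C", "GENE_C"]},
    {"pmid": "5", "cancer": "tumor_1", "genes": ["GENE_B"]},
]


def count_genes_by_cancer(articles):
    """Count, per cancer type, how many articles mention each gene."""
    counts = defaultdict(Counter)
    for article in articles:
        # set() ensures a gene is counted at most once per article.
        for gene in set(article["genes"]):
            counts[article["cancer"]][gene] += 1
    return counts


def total_counts(counts):
    """Sum article counts for each gene across all cancer types."""
    total = Counter()
    for gene_counts in counts.values():
        total.update(gene_counts)
    return total


def most_studied(counts, n=3):
    """Return the n genes with the most articles across all cancer types."""
    return total_counts(counts).most_common(n)


def tumor_specific(counts, min_share=0.8):
    """Genes whose articles fall mostly (>= min_share) in one cancer type.

    The threshold is an assumption chosen for illustration only.
    """
    total = total_counts(counts)
    result = {}
    for cancer, gene_counts in counts.items():
        for gene, k in gene_counts.items():
            if k / total[gene] >= min_share:
                result.setdefault(cancer, []).append(gene)
    return result


def query_gene(counts, gene):
    """Return the article count for one gene in each cancer type it appears in."""
    return {cancer: c[gene] for cancer, c in counts.items() if c[gene] > 0}


counts = count_genes_by_cancer(ARTICLES)
```

With these toy records, `most_studied(counts, 1)` reports the gene mentioned in the most articles overall, `tumor_specific(counts)` lists genes concentrated in a single tumor type, and `query_gene(counts, "GENE_A")` gives the per-tumor article counts for one gene, mirroring the three reporting functions named in the abstract.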