
2023 (3)
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing.
Yao, Z.; Cao, Y.; Yang, Z.; and Yu, H.
March 2023.
AMIA 2023 Informatics Summit, Seattle WA
@misc{yao_context_2023, address = {Seattle WA, USA}, title = {Context {Variance} {Evaluation} of {Pretrained} {Language} {Models} for {Prompt}-based {Biomedical} {Knowledge} {Probing}}, url = {http://arxiv.org/abs/2211.10265}, abstract = {Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric makes BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blank problems (e.g., cloze tests) are a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors, such as prompt-based probing biases, make the LAMA benchmark unreliable and unstable, and this problem is more prominent in BioLAMA. The severe long-tailed vocabulary distribution and large-N-M relations keep the performance gap between LAMA and BioLAMA notable. To address these issues, we introduce context variance into prompt generation and propose a new rank-change-based evaluation metric. Departing from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, our context variance prompts and our Understand-Confuse-Misunderstand (UCM) metric make BioLAMA friendlier to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".
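As a concrete illustration of the cloze-style probing described above, a minimal probe against a masked LM looks roughly like the sketch below. The model name and the example triple are placeholders, and this is not the BioLAMA or UCM implementation.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"  # placeholder; the paper probes 12 different PLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

# Cloze prompt for a (subject, relation, object) triple; the object is masked.
prompt = f"Ibuprofen is a treatment for {tokenizer.mask_token}."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top_ids = torch.topk(logits[0, mask_pos], k=10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
# Top-k accuracy asks whether the gold object appears among these k predictions.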
Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt.
Yang, Z.; Kwon, S.; Yao, Z.; and Yu, H.
February 2023.
AAAI 2023, Washington DC
@misc{yang_multi-label_2023, address = {Washington DC USA}, title = {Multi-label {Few}-shot {ICD} {Coding} as {Autoregressive} {Generation} with {Prompt}}, url = {http://arxiv.org/abs/2211.13813}, abstract = {Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail challenge - Many ICD codes are infrequently assigned yet infrequent ICD codes are important clinically. This study addresses the long-tail challenge by transforming this multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective to generate free text diagnoses and procedure using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting the high dimensional space of ICD codes, our model generates the lower dimension of text descriptions, which then infer ICD codes. Third, we designed a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model with the benchmark of all code assignment (MIMIC-III-full) and few shot ICD code assignment evaluation benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model performs with a marco F1 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model (marco F1 4.3) and the model specifically designed for few/zero shot setting (marco F1 18.7). Finally, we design a novel ensemble learner, a cross attention reranker with prompts, to integrate previous SOTA and our best few-shot coding predictions. Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.}, urldate = {2022-12-18}, publisher = {arXiv}, author = {Yang, Zhichao and Kwon, Sunjae and Yao, Zonghai and Yu, Hong}, month = feb, year = {2023}, note = {AAAI 2023, Washington DC}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail challenge: many ICD codes are infrequently assigned, yet infrequent ICD codes are clinically important. This study addresses the long-tail challenge by transforming this multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective to generate free-text diagnoses and procedures using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, from which ICD codes are then inferred. Third, we designed a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model on the all-code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model (macro F1 4.3) and the model specifically designed for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate the previous SOTA and our best few-shot coding predictions. Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.
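The results above hinge on the difference between macro and micro F1. The toy multi-label example below (scikit-learn; not the paper's evaluation code) shows why missing a single rare code barely moves micro F1 but sharply lowers macro F1.

import numpy as np
from sklearn.metrics import f1_score

# Rows = notes, columns = ICD codes; the third code is rare (one occurrence).
y_true = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]])  # misses the rare code

print("micro F1:", f1_score(y_true, y_pred, average="micro"))  # ~0.91: barely penalized
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.67: the rare code counts as one third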
H4H: A Comprehensive Repository of Housing Resources for Homelessness.
Osebe, S.; Tsai, J.; and Yu, H.
In Seattle WA, USA, March 2023.
AMIA 2023 Informatics Summit, Seattle WA
@inproceedings{osebe_h4h_2023, address = {Seattle WA, USA}, title = {{H4H}: {A} {Comprehensive} {Repository} of {Housing} {Resources} for {Homelessness}}, author = {Osebe, Samuel and Tsai, Jack and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, }
2022 (26)
Geographic Disparities in Prevalence of Opioid Use Disorders in US Veterans.
Li, W.; Leon, C.; Liu, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
In Boston MA, November 2022.
APHA 2022 Annual Meeting and Expo
@inproceedings{li_geographic_2022, address = {Boston MA}, title = {Geographic {Disparities} in {Prevalence} of {Opioid} {Use} {Disorders} in {US} {Veterans}}, author = {Li, Weijun and Leon, Casey and Liu, Weisong and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo}, }
Prevalence of Frailty and Associations with Oral Anticoagulant Prescribing in Atrial Fibrillation.
Sanghai, S. R.; Liu, W.; Wang, W.; Rongali, S.; Orkaby, A. R.; Saczynski, J. S.; Rose, A. J.; Kapoor, A.; Li, W.; Yu, H.; and McManus, D. D.
Journal of General Internal Medicine, 37(4): 730–736. March 2022.
@article{sanghai_prevalence_2022, title = {Prevalence of {Frailty} and {Associations} with {Oral} {Anticoagulant} {Prescribing} in {Atrial} {Fibrillation}}, volume = {37}, issn = {1525-1497}, url = {https://doi.org/10.1007/s11606-021-06834-1}, doi = {10.1007/s11606-021-06834-1}, abstract = {Frailty is often cited as a factor influencing oral anticoagulation (OAC) prescription in patients with non-valvular atrial fibrillation (NVAF). We sought to determine the prevalence of frailty and its association with OAC prescription in older veterans with NVAF.}, language = {en}, number = {4}, urldate = {2022-12-13}, journal = {Journal of General Internal Medicine}, author = {Sanghai, Saket R. and Liu, Weisong and Wang, Weijia and Rongali, Subendhu and Orkaby, Ariela R. and Saczynski, Jane S. and Rose, Adam J. and Kapoor, Alok and Li, Wenjun and Yu, Hong and McManus, David D.}, month = mar, year = {2022}, keywords = {atrial fibrillation, frailty, oral anticoagulation}, pages = {730--736}, }
Frailty is often cited as a factor influencing oral anticoagulation (OAC) prescription in patients with non-valvular atrial fibrillation (NVAF). We sought to determine the prevalence of frailty and its association with OAC prescription in older veterans with NVAF.
Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition.
Cai, P.; Wan, H.; Liu, F.; Yu, M.; Yu, H.; and Joshi, S.
In Seattle WA, USA, July 2022.
NAACL 2022
@inproceedings{cai_learning_2022, address = {Seattle WA, USA}, title = {Learning as {Conversation}: {Dialogue} {Systems} {Reinforced} for {Information} {Acquisition}}, shorttitle = {{NAACL} 2022}, url = {https://www.semanticscholar.org/reader/ea6b152a07dcd2e4ff6c4646d8efe1314346793c}, author = {Cai, Pengshan and Wan, Hui and Liu, Fei and Yu, Mo and Yu, Hong and Joshi, Sachindra}, month = jul, year = {2022}, note = {NAACL 2022}, }
Using data science to improve outcomes for persons with opioid use disorder.
Hayes, C. J.; Cucciare, M. A.; Martin, B. C.; Hudson, T. J.; Bush, K.; Lo-Ciganic, W.; Yu, H.; Charron, E.; and Gordon, A. J.
Substance Abuse, 43(1): 956–963. 2022.
@article{hayes_using_2022, title = {Using data science to improve outcomes for persons with opioid use disorder}, volume = {43}, issn = {1547-0164}, url = {https://pubmed.ncbi.nlm.nih.gov/35420927/}, doi = {10.1080/08897077.2022.2060446}, abstract = {Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using "big data" (e.g., electronic health record data, claims data mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.}, language = {eng}, number = {1}, journal = {Substance Abuse}, author = {Hayes, Corey J. and Cucciare, Michael A. and Martin, Bradley C. and Hudson, Teresa J. and Bush, Keith and Lo-Ciganic, Weihsuan and Yu, Hong and Charron, Elizabeth and Gordon, Adam J.}, year = {2022}, pmid = {35420927 PMCID: PMC9705076}, keywords = {Opioid-related disorders, big data, machine learning}, pages = {956--963}, }
Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using "big data" (e.g., electronic health record data, claims data, mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.
Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context.
Yao, Z.; Cao, Y.; Yang, Z.; Deshpande, V.; and Yu, H.
In Washington DC USA, November 2022.
AMIA Annual Symposium
@inproceedings{yao_extracting_2022, address = {Washington DC USA}, title = {Extracting {Biomedical} {Factual} {Knowledge} {Using} {Pretrained} {Language} {Model} and {Electronic} {Health} {Record} {Context}}, url = {https://arxiv.org/ftp/arxiv/papers/2209/2209.07859.pdf}, abstract = {Language Models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted some experiments to use prompt methods to extract knowledge from LMs as new knowledge Bases (LMs as KBs). However, prompting can only be used as a low bound for knowledge extraction, and perform particularly poorly on biomedical domain KBs. In order to make LMs as KBs more in line with the actual application scenarios of the biomedical domain, we specifically add EHR notes as context to the prompt to improve the low bound in the biomedical domain. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by those language models can distinguish the correct knowledge from the noise knowledge in the EHR notes, and such distinguishing ability can also be used as a new metric to evaluate the amount of knowledge possessed by the model.}, language = {en}, author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Deshpande, Vijeta and Yu, Hong}, month = nov, year = {2022}, note = {AMIA Annual Symposium}, }
Language models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted experiments using prompt methods to extract knowledge from LMs as new knowledge bases (LMs as KBs). However, prompting can only establish a lower bound for knowledge extraction, and it performs particularly poorly on biomedical-domain KBs. To make LMs as KBs better match the actual application scenarios of the biomedical domain, we add EHR notes as context to the prompt to raise this lower bound in the biomedical domain. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by those language models can distinguish correct knowledge from noisy knowledge in the EHR notes, and that this distinguishing ability can also serve as a new metric to evaluate the amount of knowledge possessed by the model.
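A minimal sketch of the context-augmented prompting idea follows, assuming a generic masked LM and an invented note sentence; it is not the Dynamic-Context-BioLAMA implementation.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

# Hypothetical EHR-style sentence prepended as context to the cloze prompt.
context = "The patient was started on metformin for glycemic control."
cloze = f"Metformin is a treatment for {tokenizer.mask_token}."
inputs = tokenizer(context + " " + cloze, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
print(tokenizer.decode(torch.topk(logits[0, mask_pos], k=5).indices))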
Generation of Patient After-Visit Summaries to Support Physicians.
Cai, P.; Liu, F.; Bajracharya, A.; Sills, J.; Kapoor, A.; Liu, W.; Berlowitz, D.; Levy, D.; Pradhan, R.; and Yu, H.
In Proceedings of the 29th International Conference on Computational Linguistics, pages 6234–6247, Gyeongju, Republic of Korea, October 2022. International Committee on Computational Linguistics
@inproceedings{cai_generation_2022, address = {Gyeongju, Republic of Korea}, title = {Generation of {Patient} {After}-{Visit} {Summaries} to {Support} {Physicians}}, url = {https://aclanthology.org/2022.coling-1.544}, abstract = {An after-visit summary (AVS) is a summary note given to patients after their clinical visit. It recaps what happened during their clinical visit and guides patients' disease self-management. Studies have shown that a majority of patients found after-visit summaries useful. However, many physicians face excessive workloads and do not have time to write clear and informative summaries. In this paper, we study the problem of automatic generation of after-visit summaries and examine whether those summaries can convey the gist of clinical visits. We report our findings on a new clinical dataset that contains a large number of electronic health record (EHR) notes and their associated summaries. Our results suggest that generation of lay language after-visit summaries remains a challenging task. Crucially, we introduce a feedback mechanism that alerts physicians when an automatic summary fails to capture the important details of the clinical notes or when it contains hallucinated facts that are potentially detrimental to the summary quality. Automatic and human evaluation demonstrates the effectiveness of our approach in providing writing feedback and supporting physicians.}, urldate = {2022-12-18}, booktitle = {Proceedings of the 29th {International} {Conference} on {Computational} {Linguistics}}, publisher = {International Committee on Computational Linguistics}, author = {Cai, Pengshan and Liu, Fei and Bajracharya, Adarsha and Sills, Joe and Kapoor, Alok and Liu, Weisong and Berlowitz, Dan and Levy, David and Pradhan, Richeek and Yu, Hong}, month = oct, year = {2022}, pages = {6234--6247}, }
An after-visit summary (AVS) is a summary note given to patients after their clinical visit. It recaps what happened during their clinical visit and guides patients' disease self-management. Studies have shown that a majority of patients found after-visit summaries useful. However, many physicians face excessive workloads and do not have time to write clear and informative summaries. In this paper, we study the problem of automatic generation of after-visit summaries and examine whether those summaries can convey the gist of clinical visits. We report our findings on a new clinical dataset that contains a large number of electronic health record (EHR) notes and their associated summaries. Our results suggest that generation of lay language after-visit summaries remains a challenging task. Crucially, we introduce a feedback mechanism that alerts physicians when an automatic summary fails to capture the important details of the clinical notes or when it contains hallucinated facts that are potentially detrimental to the summary quality. Automatic and human evaluation demonstrates the effectiveness of our approach in providing writing feedback and supporting physicians.
Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding.
Yang, Z.; Wang, S.; Rawat, B. P. S.; Mitra, A.; and Yu, H.
In Abu Dhabi, United Arab Emirates, December 2022.
Findings of the Association for Computational Linguistics: EMNLP 2022
@inproceedings{yang_knowledge_2022, address = {Abu Dhabi, United Arab Emirates}, title = {Knowledge {Injected} {Prompt} {Based} {Fine}-tuning for {Multi}-label {Few}-shot {ICD} {Coding}}, url = {https://arxiv.org/pdf/2210.03304.pdf}, author = {Yang, Zhichao and Wang, Shufan and Rawat, Bhanu Pratap Singh and Mitra, Avijit and Yu, Hong}, month = dec, year = {2022}, note = {Findings of the Association for Computational Linguistics: EMNLP 2022}, }
ScAN: Suicide Attempt and Ideation Events Dataset.
Rawat, B. P. S.; Kovaly, S.; Pigeon, W. R.; and Yu, H.
July 2022.
NAACL 2022
@misc{rawat_scan_2022, address = {Seattle WA, USA}, title = {{ScAN}: {Suicide} {Attempt} and {Ideation} {Events} {Dataset}}, shorttitle = {{ScAN}}, url = {http://arxiv.org/abs/2205.07872}, abstract = {Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12k+ EHR notes with 19k+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a prediction module to identify the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient's hospital-stay, respectively. ScAN and ScANER are publicly available.}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Rawat, Bhanu Pratap Singh and Kovaly, Samuel and Pigeon, Wilfred R. and Yu, Hong}, month = jul, year = {2022}, note = {NAACL 2022}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning}, }
Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideation (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI is frequently documented in electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and prediction of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built the Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning 12k+ EHR notes with 19k+ annotated SA and SI events. The annotations also contain attributes such as the method of suicide attempt. We also provide a strong baseline model, ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all the relevant suicidal behavioral evidence from the EHR notes of a hospital stay and a prediction module to identify the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidence, and macro F1-scores of 0.78 and 0.60 for classification of SA and SI for the patient's hospital stay, respectively. ScAN and ScANER are publicly available.
MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score.
Kwon, S.; Yao, Z.; Jordan, H. S.; Levy, D. A.; Corner, B.; and Yu, H.
December 2022.
arXiv:2210.05875 [cs]. The 2022 Conference on Empirical Methods in Natural Language Processing
@misc{kwon_medjex_2022, address = {Abu Dhabi, United Arab Emirates}, title = {{MedJEx}: {A} {Medical} {Jargon} {Extraction} {Model} with {Wiki}'s {Hyperlink} {Span} and {Contextualized} {Masked} {Language} {Model} {Score}}, shorttitle = {{MedJEx}}, url = {http://arxiv.org/abs/2210.05875}, abstract = {This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (\$MedJ\$). Then, we introduce a novel medical jargon extraction (\$MedJEx\$) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Kwon, Sunjae and Yao, Zonghai and Jordan, Harmon S. and Levy, David A. and Corner, Brian and Yu, Hong}, month = dec, year = {2022}, note = {Number: arXiv:2210.05875 arXiv:2210.05875 [cs] The 2022 Conference on Empirical Methods in Natural Language Processing}, keywords = {Computer Science - Computation and Language}, }
This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (MedJ). Then, we introduce a novel medical jargon extraction (MedJEx) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.
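The contextualized masked LM score mentioned above can be approximated by masking each token of a candidate span in context and averaging the model's log-probability of the original tokens. The sketch below is a reconstruction under that assumption, not the released MedJEx code; it also assumes the span tokenizes identically in and out of context.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def span_mlm_score(sentence: str, span: str) -> float:
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    span_ids = tok(span, add_special_tokens=False)["input_ids"]
    # Locate the first occurrence of the span's tokens within the sentence.
    start = next(i for i in range(len(ids) - len(span_ids) + 1)
                 if ids[i:i + len(span_ids)].tolist() == span_ids)
    log_probs = []
    for offset, gold in enumerate(span_ids):
        masked = ids.clone()
        masked[start + offset] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(input_ids=masked.unsqueeze(0)).logits
        log_probs.append(torch.log_softmax(logits[0, start + offset], dim=-1)[gold].item())
    return sum(log_probs) / len(log_probs)  # lower = less familiar to the LM

print(span_mlm_score("The patient shows bilateral pleural effusion.", "pleural effusion"))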
An Investigation of Social Determinants of Health in UMLS.
Rawat, B. P. S.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022
@inproceedings{rawat_investigation_2022, address = {Houston TX USA}, title = {An {Investigation} of {Social} {Determinants} of {Health} in {UMLS}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022}, }
Prediction of Alzheimer's Disease a Decade Prior to Clinical Diagnoses Using Machine Learning on Longitudinal Electronic Health Records of US Military Veterans.
Li, R.; Wang, X.; Hu, W.; Keating, H.; Goodwin, R.; Liu, W.; Berlowitz, D.; Silver, B.; and Yu, H.
December 2022.
@misc{li_prediction_2022, address = {Rochester, NY}, type = {{SSRN} {Scholarly} {Paper}}, title = {Prediction of {Alzheimer}'s {Disease} a {Decade} {Prior} to {Clinical} {Diagnoses} {Using} {Machine} {Learning} on {Longitudinal} {Electronic} {Health} {Records} of {US} {Military} {Veterans}}, url = {https://papers.ssrn.com/abstract=4298145}, doi = {10.2139/ssrn.4298145}, abstract = {Background: Prediction of Alzheimer’s Disease (AD) prior to clinical diagnoses affords the opportunity for early intervention and treatment. We investigate whether machine learning of clinical notes in longitudinal electronic health records (EHRs) could help early prediction of AD.Methods: We conducted an incidence-based case-control design study using longitudinal EHRs from the U.S. Veterans Health Administration (VHA) from 2006 to 2021. The study case population was defined as VHA patients who were diagnosed with AD after 1/1/2016 based on ICD-10-CM codes. Patients were matched by age, sex and hospital utilization. A panel of AD-related keywords were expert-curated, and their occurrences in patient’s longitudinal EHRs were used as predictors for AD prediction using four machine learning models. Subgroup analyses were conducted based on age, sex, and race. Validation was conducted on a hold-out VHA station group.Findings: The AD case and control groups comprised 16,701 and 39,097 patients, respectively. In a randomly sampled subset of 4,076 patients including 1,112 with AD, the best machine learning model reached Precision=1, Recall=0·87, F1=0·93, Accuracy=0·96, ROCAUC=0·997 and PRAUC=0·990 for making predictions ten years prior to ICD-based AD diagnoses. The model performed similarly well in all subgroups, and in the hold-out VHA station group (1670 patients including 493 with AD). The model failed to make competitive predictions using only the structured data of longitudinal EHRs.Interpretation: Signs and symptoms of early AD are reported in EHR notes many years before a clinical diagnosis is made and the frequency of these signs and symptoms, approximated by AD-related keywords in this study, increases the closer one is to the diagnosis. The AD-related keyword-based approach can capture these signs and symptoms to predict people who are likely to be diagnosed with AD in the future.Funding Information: This research is in part supported by funding from the University of Massachusetts Lowell (UML).Declaration of Interests: The authors have no conflicts of interest.Ethics Approval Statement: The study was approved by the Institutional Review Board at the VHA Bedford Healthcare System, which also approved the waiver of documentation of informed consent.}, language = {en}, urldate = {2022-12-14}, author = {Li, Rumeng and Wang, Xun and Hu, Wen and Keating, Heather and Goodwin, Raelene and Liu, Weisong and Berlowitz, Dan and Silver, Brian and Yu, Hong}, month = dec, year = {2022}, keywords = {Alzheimer's disease, electronic health records, machine learning}, }
Background: Prediction of Alzheimer's Disease (AD) prior to clinical diagnoses affords the opportunity for early intervention and treatment. We investigate whether machine learning of clinical notes in longitudinal electronic health records (EHRs) could help early prediction of AD. Methods: We conducted an incidence-based case-control study using longitudinal EHRs from the U.S. Veterans Health Administration (VHA) from 2006 to 2021. The study case population was defined as VHA patients who were diagnosed with AD after 1/1/2016 based on ICD-10-CM codes. Patients were matched by age, sex, and hospital utilization. A panel of AD-related keywords was expert-curated, and their occurrences in patients' longitudinal EHRs were used as predictors for AD prediction using four machine learning models. Subgroup analyses were conducted based on age, sex, and race. Validation was conducted on a hold-out VHA station group. Findings: The AD case and control groups comprised 16,701 and 39,097 patients, respectively. In a randomly sampled subset of 4,076 patients including 1,112 with AD, the best machine learning model reached Precision=1, Recall=0·87, F1=0·93, Accuracy=0·96, ROCAUC=0·997 and PRAUC=0·990 for making predictions ten years prior to ICD-based AD diagnoses. The model performed similarly well in all subgroups and in the hold-out VHA station group (1,670 patients including 493 with AD). The model failed to make competitive predictions using only the structured data of longitudinal EHRs. Interpretation: Signs and symptoms of early AD are reported in EHR notes many years before a clinical diagnosis is made, and the frequency of these signs and symptoms, approximated by AD-related keywords in this study, increases the closer one is to the diagnosis. The AD-related keyword-based approach can capture these signs and symptoms to predict people who are likely to be diagnosed with AD in the future. Funding Information: This research is in part supported by funding from the University of Massachusetts Lowell (UML). Declaration of Interests: The authors have no conflicts of interest. Ethics Approval Statement: The study was approved by the Institutional Review Board at the VHA Bedford Healthcare System, which also approved the waiver of documentation of informed consent.
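A minimal sketch of the keyword-frequency feature approach described above, with hypothetical keywords and synthetic notes rather than the VHA pipeline:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

keywords = ["memory loss", "confusion", "wandering"]  # hypothetical expert-curated terms

def featurize(notes):
    # One count feature per keyword, mirroring "keyword occurrences as predictors".
    return np.array([[note.lower().count(k) for k in keywords] for note in notes])

notes = ["Reports memory loss and confusion.", "Routine follow-up, no concerns.",
         "Episodes of wandering; memory loss worsening.", "Annual physical, healthy."]
labels = np.array([1, 0, 1, 0])  # synthetic case/control labels

X = featurize(notes)
clf = LogisticRegression().fit(X, labels)
print("ROC AUC:", roc_auc_score(labels, clf.predict_proba(X)[:, 1]))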
Generating Coherent Narratives with Subtopic Planning to Answer How-to Questions.
Cai, P.; Yu, M.; Liu, F.; and Yu, H.
In Abu Dhabi, December 2022.
The GEM Workshop at EMNLP 2022
@inproceedings{cai_generating_2022, address = {Abu Dhabi}, title = {Generating {Coherent} {Narratives} with {Subtopic} {Planning} to {Answer} {How}-to {Questions}}, author = {Cai, Pengshan and Yu, Mo and Liu, Fei and Yu, Hong}, month = dec, year = {2022}, note = {The GEM Workshop at EMNLP 2022}, }
Parameter Efficient Transfer Learning for Suicide Attempt and Ideation Detection.
Rawat, B. P. S.; and Yu, H.
In Abu Dhabi, December 2022.
LOUHI 2022
@inproceedings{rawat_parameter_2022, address = {Abu Dhabi}, title = {Parameter {Efficient} {Transfer} {Learning} for {Suicide} {Attempt} and {Ideation} {Detection}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = dec, year = {2022}, note = {LOUHI 2022}, }
UMass A&P: An Assessment and Plan Reasoning System of UMass in the 2022 N2C2 Challenge.
Kwon, S.; Yang, Z.; and Yu, H.
November 2022.
2022 n2c2 Workshop, Washington DC
@misc{kwon_umass_2022, address = {Washington DC USA}, title = {{UMass} {A}\&{P}: {An} {Assessment} and {Plan} {Reasoning} {System} of {UMass} in the 2022 {N2C2} {Challenge}}, author = {Kwon, Sunjae and Yang, Zhichao and Yu, Hong}, month = nov, year = {2022}, note = {2022 n2c2 Workshop, Washington DC}, }
Racial differences in receipt of medications for opioid use disorder before and during the COVID-19 pandemic in the Veterans Health Administration.
Sung, M. L.; Li, W.; León, C.; Reisman, J.; Liu, W.; Kerns, R. D.; Yu, H.; and Becker, W. C.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
@misc{sung_racial_2022, address = {Boston MA, USA}, title = {Racial differences in receipt of medications for opioid use disorder before and during the {COVID}-19 pandemic in the {Veterans} {Health} {Administration}}, author = {Sung, Minhee L. and Li, Wenjun and León, Casey and Reisman, Joel and Liu, Weisong and Kerns, Robert D. and Yu, Hong and Becker, William C.}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
Using Machine Learning to Predict Opioid Overdose Using Electronic Health Record.
Wang, X.; Li, R.; Druhl, E.; Li, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
@misc{wang_using_2022, address = {Boston MA, USA}, title = {Using {Machine} {Learning} to {Predict} {Opioid} {Overdose} {Using} {Electronic} {Health} {Record}}, author = {Wang, Xun and Li, Rumeng and Druhl, Emily and Li, Wenjun and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
Automatically Detecting Opioid-Related Aberrant Behaviors from Electronic Health Records.
Wang, X.; Li, R.; Lingeman, J. M.; Druhl, E.; Li, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
@misc{wang_automatically_2022, address = {Boston MA, USA}, title = {Automatically {Detecting} {Opioid}-{Related} {Aberrant} {Behaviors} from {Electronic} {Health} {Records}}, author = {Wang, Xun and Li, Rumeng and Lingeman, Jesse M. and Druhl, Emily and Li, Wenjun and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
An Investigation of the Representation of Social Determinants of Health in the UMLS.
Rawat, B. P. S.; Keating, H.; Goodwin, R.; Druhl, E. B.; and Yu, H.
In Washington, D.C., November 2022.
AMIA 2022 Annual Symposium
@inproceedings{rawat_investigation_2022-1, address = {Washington, D.C.}, title = {An {Investigation} of the {Representation} of {Social} {Determinants} of {Health} in the {UMLS}}, author = {Rawat, Bhanu Pratap Singh and Keating, Heather and Goodwin, Raelene and Druhl, Emily B. and Yu, Hong}, month = nov, year = {2022}, note = {AMIA 2022 Annual Symposium}, }
Pretraining of Patient Representations On Structured Electronic Health Records for Patient Outcome Prediction: case study as self-harm screening tool.
Yang, Z.; and Yu, H.
In Washington DC USA, June 2022.
ARM2022
@inproceedings{yang_pretraining_2022, address = {Washington DC USA}, title = {Pretraining of {Patient} {Representations} {On} {Structured} {Electronic} {Health} {Records} for {Patient} {Outcome} {Prediction}: case study as self-harm screening tool}, shorttitle = {{ARM} 2022}, author = {Yang, Zhichao and Hong, Yu}, month = jun, year = {2022}, note = {ARM2022}, }
Risk Factors Associated with Nonfatal Opioid Overdose Leading to Intensive Care Unit Admission: A Cross-Sectional Study.
Mitra, A.; Ahsan, H.; Li, W.; Liu, W.; Kerns, R. D.; Tsai, J.; Becker, W. C.; Smelson, D. A.; and Yu, H.
In Washington DC USA, June 2022.
ARM 2022
@inproceedings{mitra_risk_2022, address = {Washington DC USA}, title = {Risk {Factors} {Associated} with {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}: {A} {Cross}-{Sectional} {Study}}, shorttitle = {{ARM} 2022}, author = {Mitra, Avijit and Ahsan, Hiba and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Tsai, Jack and Becker, William C. and Smelson, David A. and Yu, Hong}, month = jun, year = {2022}, note = {ARM 2022}, }
SBDH and Suicide: A Multi-Task Learning Framework for SBDH Detection in Electronic Health Records Using NLP.
Mitra, A.; Rawat, B. P. S.; Druhl, E. B.; Keating, H.; Goodwin, R.; Hu, W.; Liu, W.; Tsai, J.; Smelson, D. A.; and Yu, H.
In Washington DC USA, June 2022.
ARM 2022
@inproceedings{mitra_sbdh_2022, address = {Washington DC USA}, title = {{SBDH} and {Suicide}: {A} {Multi}-{Task} {Learning} {Framework} for {SBDH} {Detection} in {Electronic} {Health} {Records} {Using} {NLP}}, shorttitle = {{ARM} 2022}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and Druhl, Emily B. and Keating, Heather and Goodwin, Raelene and Hu, Wen and Liu, Weisong and Tsai, Jack and Smelson, David A. and Yu, Hong}, month = jun, year = {2022}, note = {ARM 2022}, }
Studying Association of Traumatic Brain Injury and Posttraumatic Stress Disorder Diagnoses with Hospitalized Self-Harm Among US Veterans, 2008-2017.
Rawat, B. P. S.; Reisman, J.; Rongali, S.; Liu, W.; Yu, H.; and Carlson, K.
In Washington DC USA, June 2022.
ARM 2022 (Poster)
@inproceedings{rawat_studying_2022, address = {Washington DC USA}, title = {Studying {Association} of {Traumatic} {Brain} {Injury} and {Posttraumatic} {Stress} {Disorder} {Diagnoses} with {Hospitalized} {Self}-{Harm} {Among} {US} {Veterans}, 2008-2017}, shorttitle = {{ARM} 2022}, author = {Rawat, Bhanu Pratap Singh and Reisman, Joel and Rongali, Subendhu and Liu, Weisong and Yu, Hong and Carlson, Kathleen}, month = jun, year = {2022}, note = {ARM 2022 (Poster)}, }
NLP and Annie App for Social Determinants of Health.
Mahapatra, S.; Chen, H.; Tsai, J.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022
@inproceedings{mahapatra_nlp_2022, address = {Houston TX USA}, title = {{NLP} and {Annie} {App} for {Social} {Determinants} of {Health}}, author = {Mahapatra, Sneha and Chen, Huan-Yuan and Tsai, Jack and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022}, }
EASE: A Tool to Extract Social Determinants of Health from Electronic Health Records.
Rawat, B. P. S.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022 (System Demo)
@inproceedings{rawat_ease_2022, address = {Houston TX USA}, title = {{EASE}: {A} {Tool} to {Extract} {Social} {Determinants} of {Health} from {Electronic} {Health} {Records}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022 (System Demo)}, }
The association of prescribed long-acting versus short-acting opioids and mortality among older adults.
Sung, M.; Smirnova, J.; Li, W.; Liu, W.; Kerns, R. D.; Reisman, J. I.; Yu, H.; and Becker, W. C.
In Society of General Internal Medicine Annual National Meeting, Orlando, Florida, USA, April 2022.
@inproceedings{sung_association_2022, address = {Orlando, Florida, USA}, title = {The association of prescribed long-acting versus short-acting opioids and mortality among older adults}, booktitle = {Society of {General} {Internal} {Medicine} {Annual} {National} {Meeting}}, author = {Sung, Minhee and Smirnova, Jimin and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Reisman, Joel I. and Yu, Hong and Becker, William C.}, month = apr, year = {2022}, }
EHR Cohort Development Using Natural Language Processing For Identifying Symptoms Of Alzheimer's Disease.
Yu, H.; Mitra, A.; Keating, H.; Liu, W.; Hu, W.; Xia, W.; Morin, P.; Berlowitz, D. R.; Bray, M.; Monfared, A.; and Zhang, Q.
In Barcelona, Spain (Online), March 2022.
AD/PD 2022
@inproceedings{yu_ehr_2022, address = {Barcelona, Spain (Online)}, title = {{EHR} {Cohort} {Development} {Using} {Natural} {Language} {Processing} {For} {Identifying} {Symptoms} {Of} {Alzheimer}'s {Disease}}, shorttitle = {{AD}/{PD} 2022}, author = {Yu, Hong and Mitra, Avijit and Keating, Heather and Liu, Weisong and Hu, Wen and Xia, Weiming and Morin, Peter and Berlowitz, Dan R. and Bray, Margaret and Monfared, Amir and Zhang, Quanwu}, month = mar, year = {2022}, note = {AD/PD 2022}, }
2021 (9)
Evaluating the effectiveness of NoteAid in a community hospital setting: randomized trial of electronic health record note comprehension interventions with patients.
Lalor, J. P.; Hu, W.; Tran, M.; Wu, H.; Mazor, K. M.; and Yu, H.
Journal of Medical Internet Research, 23(5). 2021.
@article{lalor_evaluating_2021, title = {Evaluating the effectiveness of noteaid in a community hospital setting: randomized trial of electronic health record note comprehension interventions with patients}, volume = {23}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8160802/}, doi = {10.2196/26354}, abstract = {Background: Interventions to define medical jargon have been shown to improve electronic health record (EHR) note comprehension among crowdsourced participants on Amazon Mechanical Turk (AMT). However, AMT participants may not be representative of the general population or patients who are most at-risk for low health literacy. Objective: In this work, we assessed the efficacy of an intervention (NoteAid) for EHR note comprehension among participants in a community hospital setting. Methods: Participants were recruited from Lowell General Hospital (LGH), a community hospital in Massachusetts, to take the ComprehENotes test, a web-based test of EHR note comprehension. Participants were randomly assigned to control (n=85) or intervention (n=89) groups to take the test without or with NoteAid, respectively. For comparison, we used a sample of 200 participants recruited from AMT to take the ComprehENotes test (100 in the control group and 100 in the intervention group). Results: A total of 174 participants were recruited from LGH, and 200 participants were recruited from AMT. Participants in both intervention groups (community hospital and AMT) scored significantly higher than participants in the control groups (P{\textless}.001). The average score for the community hospital participants was significantly lower than the average score for the AMT participants (P{\textless}.001), consistent with the lower education levels in the community hospital sample. Education level had a significant effect on scores for the community hospital participants (P{\textless}.001). Conclusions: Use of NoteAid was associated with significantly improved EHR note comprehension in both community hospital and AMT samples. Our results demonstrate the generalizability of ComprehENotes as a test of EHR note comprehension and the effectiveness of NoteAid for improving EHR note comprehension.}, number = {5}, journal = {Journal of Medical Internet Research}, author = {Lalor, John P and Hu, Wen and Tran, Matthew and Wu, Hao and Mazor, Kathleen M and Yu, Hong}, year = {2021}, pmid = {33983124}, pmcid = {8160802}, }
Background: Interventions to define medical jargon have been shown to improve electronic health record (EHR) note comprehension among crowdsourced participants on Amazon Mechanical Turk (AMT). However, AMT participants may not be representative of the general population or patients who are most at-risk for low health literacy. Objective: In this work, we assessed the efficacy of an intervention (NoteAid) for EHR note comprehension among participants in a community hospital setting. Methods: Participants were recruited from Lowell General Hospital (LGH), a community hospital in Massachusetts, to take the ComprehENotes test, a web-based test of EHR note comprehension. Participants were randomly assigned to control (n=85) or intervention (n=89) groups to take the test without or with NoteAid, respectively. For comparison, we used a sample of 200 participants recruited from AMT to take the ComprehENotes test (100 in the control group and 100 in the intervention group). Results: A total of 174 participants were recruited from LGH, and 200 participants were recruited from AMT. Participants in both intervention groups (community hospital and AMT) scored significantly higher than participants in the control groups (P<.001). The average score for the community hospital participants was significantly lower than the average score for the AMT participants (P<.001), consistent with the lower education levels in the community hospital sample. Education level had a significant effect on scores for the community hospital participants (P<.001). Conclusions: Use of NoteAid was associated with significantly improved EHR note comprehension in both community hospital and AMT samples. Our results demonstrate the generalizability of ComprehENotes as a test of EHR note comprehension and the effectiveness of NoteAid for improving EHR note comprehension.
MIMIC-SBDH: A Dataset for Social and Behavioral Determinants of Health.
Ahsan, H.; Ohnuki, E.; Mitra, A.; and Yu, H.
Proceedings of Machine Learning Research, 149: 391–413. August 2021.
@article{ahsan_mimic-sbdh_2021, title = {{MIMIC}-{SBDH}: {A} {Dataset} for {Social} and {Behavioral} {Determinants} of {Health}}, volume = {149}, issn = {2640-3498}, shorttitle = {{MIMIC}-{SBDH}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8734043/}, abstract = {Social and Behavioral Determinants of Health (SBDHs) are environmental and behavioral factors that have a profound impact on health and related outcomes. Given their importance, physicians document SBDHs of their patients in Electronic Health Records (EHRs). However, SBDHs are mostly documented in unstructured EHR notes. Determining the status of the SBDHs requires manually reviewing the notes which can be a tedious process. Therefore, there is a need to automate identifying the patients' SBDH status in EHR notes. In this work, we created MIMIC-SBDH, the first publicly available dataset of EHR notes annotated for patients' SBDH status. Specifically, we annotated 7,025 discharge summary notes for the status of 7 SBDHs as well as marked SBDH-related keywords. Using this annotated data for training and evaluation, we evaluated the performance of three machine learning models (Random Forest, XGBoost, and Bio-ClinicalBERT) on the task of identifying SBDH status in EHR notes. The performance ranged from the lowest 0.69 F1 score for Drug Use to the highest 0.96 F1 score for Community-Present. In addition to standard evaluation metrics such as the F1 score, we evaluated four capabilities that a model must possess to perform well on the task using the CheckList tool (Ribeiro et al., 2020). The results revealed several shortcomings of the models. Our results highlighted the need to perform more capability-centric evaluations in addition to standard metric comparisons.}, language = {eng}, journal = {Proceedings of Machine Learning Research}, author = {Ahsan, Hiba and Ohnuki, Emmie and Mitra, Avijit and Yu, Hong}, month = aug, year = {2021}, pmid = {35005628}, pmcid = {PMC8734043}, pages = {391--413}, }
Social and Behavioral Determinants of Health (SBDHs) are environmental and behavioral factors that have a profound impact on health and related outcomes. Given their importance, physicians document SBDHs of their patients in Electronic Health Records (EHRs). However, SBDHs are mostly documented in unstructured EHR notes. Determining the status of the SBDHs requires manually reviewing the notes, which can be a tedious process. Therefore, there is a need to automate identifying the patients' SBDH status in EHR notes. In this work, we created MIMIC-SBDH, the first publicly available dataset of EHR notes annotated for patients' SBDH status. Specifically, we annotated 7,025 discharge summary notes for the status of 7 SBDHs as well as marked SBDH-related keywords. Using this annotated data for training and evaluation, we evaluated the performance of three machine learning models (Random Forest, XGBoost, and Bio-ClinicalBERT) on the task of identifying SBDH status in EHR notes. The performance ranged from the lowest 0.69 F1 score for Drug Use to the highest 0.96 F1 score for Community-Present. In addition to standard evaluation metrics such as the F1 score, we evaluated four capabilities that a model must possess to perform well on the task using the CheckList tool (Ribeiro et al., 2020). The results revealed several shortcomings of the models. Our results highlighted the need to perform more capability-centric evaluations in addition to standard metric comparisons.
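For orientation, a per-label baseline of the kind evaluated above might look like the sketch below (TF-IDF features with a Random Forest on toy notes; not the MIMIC-SBDH benchmark code). It is trained and scored on the same four notes purely to show the shape of the task.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score

notes = ["Patient lives alone, drinks daily.", "Lives with family, denies alcohol.",
         "Currently unemployed, smokes a pack a day.", "Employed, never smoker."]
alcohol_use = [1, 0, 0, 0]  # one binary status per SBDH; the others are handled the same way

X = TfidfVectorizer().fit_transform(notes)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, alcohol_use)
print("Alcohol Use F1:", f1_score(alcohol_use, clf.predict(X)))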
Risk Factors Associated With Nonfatal Opioid Overdose Leading to Intensive Care Unit Admission: A Cross-sectional Study.
Mitra, A.; Ahsan, H.; Li, W.; Liu, W.; Kerns, R. D.; Tsai, J.; Becker, W.; Smelson, D. A.; and Yu, H.
JMIR medical informatics, 9(11): e32851. November 2021.
@article{mitra_risk_2021, title = {Risk {Factors} {Associated} {With} {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}: {A} {Cross}-sectional {Study}}, volume = {9}, issn = {2291-9694}, shorttitle = {Risk {Factors} {Associated} {With} {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}}, url = {https://pubmed.ncbi.nlm.nih.gov/34747714/}, doi = {10.2196/32851}, abstract = {BACKGROUND: Opioid overdose (OD) and related deaths have significantly increased in the United States over the last 2 decades. Existing studies have mostly focused on demographic and clinical risk factors in noncritical care settings. Social and behavioral determinants of health (SBDH) are infrequently coded in the electronic health record (EHR) and usually buried in unstructured EHR notes, reflecting possible gaps in clinical care and observational research. Therefore, SBDH often receive less attention despite being important risk factors for OD. Natural language processing (NLP) can alleviate this problem. OBJECTIVE: The objectives of this study were two-fold: First, we examined the usefulness of NLP for SBDH extraction from unstructured EHR text, and second, for intensive care unit (ICU) admissions, we investigated risk factors including SBDH for nonfatal OD. METHODS: We performed a cross-sectional analysis of admission data from the EHR of patients in the ICU of Beth Israel Deaconess Medical Center between 2001 and 2012. We used patient admission data and International Classification of Diseases, Ninth Revision (ICD-9) diagnoses to extract demographics, nonfatal OD, SBDH, and other clinical variables. In addition to obtaining SBDH information from the ICD codes, an NLP model was developed to extract 6 SBDH variables from EHR notes, namely, housing insecurity, unemployment, social isolation, alcohol use, smoking, and illicit drug use. We adopted a sequential forward selection process to select relevant clinical variables. Multivariable logistic regression analysis was used to evaluate the associations with nonfatal OD, and relative risks were quantified as covariate-adjusted odds ratios (aOR). RESULTS: The strongest association with nonfatal OD was found to be drug use disorder (aOR 8.17, 95\% CI 5.44-12.27), followed by bipolar disorder (aOR 2.69, 95\% CI 1.68-4.29). Among others, major depressive disorder (aOR 2.57, 95\% CI 1.12-5.88), being on a Medicaid health insurance program (aOR 2.26, 95\% CI 1.43-3.58), history of illicit drug use (aOR 2.09, 95\% CI 1.15-3.79), and current use of illicit drugs (aOR 2.06, 95\% CI 1.20-3.55) were strongly associated with increased risk of nonfatal OD. Conversely, Blacks (aOR 0.51, 95\% CI 0.28-0.94), older age groups (40-64 years: aOR 0.65, 95\% CI 0.44-0.96; {\textgreater}64 years: aOR 0.16, 95\% CI 0.08-0.34) and those with tobacco use disorder (aOR 0.53, 95\% CI 0.32-0.89) or alcohol use disorder (aOR 0.64, 95\% CI 0.42-1.00) had decreased risk of nonfatal OD. Moreover, 99.82\% of all SBDH information was identified by the NLP model, in contrast to only 0.18\% identified by the ICD codes. CONCLUSIONS: This is the first study to analyze the risk factors for nonfatal OD in an ICU setting using NLP-extracted SBDH from EHR notes. We found several risk factors associated with nonfatal OD including SBDH. SBDH are richly described in EHR notes, supporting the importance of integrating NLP-derived SBDH into OD risk assessment. 
More studies in ICU settings can help health care systems better understand and respond to the opioid epidemic.}, language = {eng}, number = {11}, journal = {JMIR medical informatics}, author = {Mitra, Avijit and Ahsan, Hiba and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Tsai, Jack and Becker, William and Smelson, David A. and Yu, Hong}, month = nov, year = {2021}, pmid = {34747714}, keywords = {electronic health records, intensive care unit, natural language processing, opioids, overdose, risk factors, social and behavioral determinants of health}, pages = {e32851}, }
BACKGROUND: Opioid overdose (OD) and related deaths have significantly increased in the United States over the last 2 decades. Existing studies have mostly focused on demographic and clinical risk factors in noncritical care settings. Social and behavioral determinants of health (SBDH) are infrequently coded in the electronic health record (EHR) and usually buried in unstructured EHR notes, reflecting possible gaps in clinical care and observational research. Therefore, SBDH often receive less attention despite being important risk factors for OD. Natural language processing (NLP) can alleviate this problem. OBJECTIVE: The objectives of this study were two-fold: First, we examined the usefulness of NLP for SBDH extraction from unstructured EHR text, and second, for intensive care unit (ICU) admissions, we investigated risk factors including SBDH for nonfatal OD. METHODS: We performed a cross-sectional analysis of admission data from the EHR of patients in the ICU of Beth Israel Deaconess Medical Center between 2001 and 2012. We used patient admission data and International Classification of Diseases, Ninth Revision (ICD-9) diagnoses to extract demographics, nonfatal OD, SBDH, and other clinical variables. In addition to obtaining SBDH information from the ICD codes, an NLP model was developed to extract 6 SBDH variables from EHR notes, namely, housing insecurity, unemployment, social isolation, alcohol use, smoking, and illicit drug use. We adopted a sequential forward selection process to select relevant clinical variables. Multivariable logistic regression analysis was used to evaluate the associations with nonfatal OD, and relative risks were quantified as covariate-adjusted odds ratios (aOR). RESULTS: The strongest association with nonfatal OD was found to be drug use disorder (aOR 8.17, 95% CI 5.44-12.27), followed by bipolar disorder (aOR 2.69, 95% CI 1.68-4.29). Among others, major depressive disorder (aOR 2.57, 95% CI 1.12-5.88), being on a Medicaid health insurance program (aOR 2.26, 95% CI 1.43-3.58), history of illicit drug use (aOR 2.09, 95% CI 1.15-3.79), and current use of illicit drugs (aOR 2.06, 95% CI 1.20-3.55) were strongly associated with increased risk of nonfatal OD. Conversely, Blacks (aOR 0.51, 95% CI 0.28-0.94), older age groups (40-64 years: aOR 0.65, 95% CI 0.44-0.96; >64 years: aOR 0.16, 95% CI 0.08-0.34) and those with tobacco use disorder (aOR 0.53, 95% CI 0.32-0.89) or alcohol use disorder (aOR 0.64, 95% CI 0.42-1.00) had decreased risk of nonfatal OD. Moreover, 99.82% of all SBDH information was identified by the NLP model, in contrast to only 0.18% identified by the ICD codes. CONCLUSIONS: This is the first study to analyze the risk factors for nonfatal OD in an ICU setting using NLP-extracted SBDH from EHR notes. We found several risk factors associated with nonfatal OD including SBDH. SBDH are richly described in EHR notes, supporting the importance of integrating NLP-derived SBDH into OD risk assessment. More studies in ICU settings can help health care systems better understand and respond to the opioid epidemic.
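The covariate-adjusted odds ratios (aOR) with 95% CIs reported above come from multivariable logistic regression. The sketch below reproduces that computation with statsmodels on synthetic data (a hypothetical exposure and one covariate, not the study's cohort or covariate set).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
drug_use = rng.integers(0, 2, n)               # hypothetical binary exposure
age = rng.normal(50, 10, n)                    # covariate to adjust for
logit = -2 + 1.5 * drug_use + 0.01 * (age - 50)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # synthetic outcome

X = sm.add_constant(np.column_stack([drug_use, age]))
fit = sm.Logit(y, X).fit(disp=0)
aor = np.exp(fit.params[1])                    # adjusted odds ratio for the exposure
lo, hi = np.exp(fit.conf_int()[1])             # 95% CI on the odds-ratio scale
print(f"aOR {aor:.2f}, 95% CI {lo:.2f}-{hi:.2f}")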
SBDH and Suicide: A Multi-task Learning Framework for SBDH in Electronic Health Records.
Mitra, A.; Rawat, B. P. S.; Druhl, E.; Keating, H.; Goodwin, R.; Hu, W.; Liu, W.; Tsai, J.; Smelson, D. A.; and Yu, H.
Online, October 2021.
SciNLP 2021
link bibtex
@inproceedings{mitra_sbdh_2021, address = {Online}, title = {{SBDH} and {Suicide}: {A} {Multi}-task {Learning} {Framework} for {SBDH} in {Electronic} {Health} {Records}}, shorttitle = {{SciNLP} 2021}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and Druhl, Emily and Keating, Heather and Goodwin, Raelene and Hu, Wen and Liu, Weisong and Tsai, Jack and Smelson, David A. and Yu, Hong}, month = oct, year = {2021}, note = {SciNLP 2021}, }
Membership Inference Attack Susceptibility of Clinical Language Models.
Jagannatha, A.; Rawat, B. P. S.; and Yu, H.
CoRR, abs/2104.08305. 2021.
arXiv: 2104.08305
Paper
link
bibtex
abstract
@article{jagannatha_membership_2021, title = {Membership {Inference} {Attack} {Susceptibility} of {Clinical} {Language} {Models}}, volume = {abs/2104.08305}, url = {https://arxiv.org/abs/2104.08305}, abstract = {Deep Neural Network (DNN) models have been shown to have high empirical privacy leakages. Clinical language models (CLMs) trained on clinical data have been used to improve performance in biomedical natural language processing tasks. In this work, we investigate the risks of training-data leakage through white-box or black-box access to CLMs. We design and employ membership inference attacks to estimate the empirical privacy leaks for model architectures like BERT and GPT2. We show that membership inference attacks on CLMs lead to non-trivial privacy leakages of up to 7\%. Our results show that smaller models have lower empirical privacy leakages than larger ones, and masked LMs have lower leakages than auto-regressive LMs. We further show that differentially private CLMs can have improved model utility on clinical domain while ensuring low empirical privacy leakage. Lastly, we also study the effects of group-level membership inference and disease rarity on CLM privacy leakages.}, journal = {CoRR}, author = {Jagannatha, Abhyuday and Rawat, Bhanu Pratap Singh and Yu, Hong}, year = {2021}, note = {arXiv: 2104.08305}, }
Deep Neural Network (DNN) models have been shown to have high empirical privacy leakages. Clinical language models (CLMs) trained on clinical data have been used to improve performance in biomedical natural language processing tasks. In this work, we investigate the risks of training-data leakage through white-box or black-box access to CLMs. We design and employ membership inference attacks to estimate the empirical privacy leaks for model architectures like BERT and GPT2. We show that membership inference attacks on CLMs lead to non-trivial privacy leakages of up to 7%. Our results show that smaller models have lower empirical privacy leakages than larger ones, and masked LMs have lower leakages than auto-regressive LMs. We further show that differentially private CLMs can have improved model utility in the clinical domain while ensuring low empirical privacy leakage. Lastly, we also study the effects of group-level membership inference and disease rarity on CLM privacy leakages.
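To make the attack setting concrete, here is a minimal sketch of a simple loss-threshold membership inference attack (in the spirit of Yeom et al., not the paper's exact design); the per-example losses are simulated rather than computed from a real clinical language model.

import numpy as np

# Simulated per-example LM losses: member (training) texts tend to
# have lower loss than non-member texts. All values are synthetic.
rng = np.random.default_rng(1)
member_loss = rng.normal(2.0, 0.5, 1000)
nonmember_loss = rng.normal(2.4, 0.5, 1000)

# Attack rule: predict "member" when the loss falls below a threshold
# (here the pooled mean, a common simple choice).
threshold = np.mean(np.concatenate([member_loss, nonmember_loss]))
tpr = np.mean(member_loss < threshold)     # true positive rate
fpr = np.mean(nonmember_loss < threshold)  # false positive rate
print(f"attack advantage (empirical leakage): {tpr - fpr:.3f}")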
Guideline-discordant dosing of direct-acting oral anticoagulants in the veterans health administration.
Rose, A. J.; Lee, J. S.; Berlowitz, D. R.; Liu, W.; Mitra, A.; and Yu, H.
BMC Health Services Research, 21(1): 1351. December 2021.
Paper
doi
link
bibtex
abstract
@article{rose_guideline-discordant_2021, title = {Guideline-discordant dosing of direct-acting oral anticoagulants in the veterans health administration}, volume = {21}, issn = {1472-6963}, url = {https://doi.org/10.1186/s12913-021-07397-x}, doi = {10.1186/s12913-021-07397-x}, abstract = {Clear guidelines exist to guide the dosing of direct-acting oral anticoagulants (DOACs). It is not known how consistently these guidelines are followed in practice.}, number = {1}, urldate = {2022-01-24}, journal = {BMC Health Services Research}, author = {Rose, Adam J. and Lee, Jong Soo and Berlowitz, Dan R. and Liu, Weisong and Mitra, Avijit and Yu, Hong}, month = dec, year = {2021}, keywords = {Anticoagulants, Atrial fibrillation, Medication therapy management, Quality of health care}, pages = {1351}, }
Clear guidelines exist to guide the dosing of direct-acting oral anticoagulants (DOACs). It is not known how consistently these guidelines are followed in practice.
Improving Formality Style Transfer with Context-Aware Rule Injection.
Yao, Z.; and Yu, H.
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1561–1570, Online, August 2021. Association for Computational Linguistics
Paper
doi
link
bibtex
abstract
@inproceedings{yao_improving_2021, address = {Online}, title = {Improving {Formality} {Style} {Transfer} with {Context}-{Aware} {Rule} {Injection}}, url = {https://aclanthology.org/2021.acl-long.124}, doi = {10.18653/v1/2021.acl-long.124}, abstract = {Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST) by injecting multiple rules into an end-to-end BERT-based encoder and decoder model. CARI is able to learn to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models' performance on several tweet sentiment analysis tasks. Our contributions are as follows: 1.We propose a new method, CARI, to integrate rules for pre-trained language models. CARI is context-aware and can trained end-to-end with the downstream NLP applications. 2.We have achieved new state-of-the-art results for FST on the benchmark GYAFC dataset. 3.We are the first to evaluate FST methods with extrinsic evaluation and specifically on sentiment classification tasks. We show that CARI outperformed existing rule-based FST approaches for sentiment classification.}, urldate = {2021-09-21}, booktitle = {Proceedings of the 59th {Annual} {Meeting} of the {Association} for {Computational} {Linguistics} and the 11th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({Volume} 1: {Long} {Papers})}, publisher = {Association for Computational Linguistics}, author = {Yao, Zonghai and Yu, Hong}, month = aug, year = {2021}, pages = {1561--1570}, }
Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST) by injecting multiple rules into an end-to-end BERT-based encoder and decoder model. CARI is able to learn to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models' performance on several tweet sentiment analysis tasks. Our contributions are as follows: 1. We propose a new method, CARI, to integrate rules for pre-trained language models. CARI is context-aware and can be trained end-to-end with downstream NLP applications. 2. We have achieved new state-of-the-art results for FST on the benchmark GYAFC dataset. 3. We are the first to evaluate FST methods with extrinsic evaluation and specifically on sentiment classification tasks. We show that CARI outperformed existing rule-based FST approaches for sentiment classification.
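One way to picture rule injection is as input augmentation: each applicable rule produces a candidate rewrite that is appended to the source sentence, and the encoder-decoder learns from context which candidate to trust. A minimal sketch with illustrative rules and an invented separator token (not CARI's actual interface):

# Minimal sketch of rule injection as input augmentation. The rule
# table and the [RULE] separator are illustrative assumptions.
RULES = {"u": "you", "r": "are", "gonna": "going to"}

def inject_rules(sentence: str, sep: str = " [RULE] ") -> str:
    tokens = sentence.split()
    candidates = []
    for i, tok in enumerate(tokens):
        if tok.lower() in RULES:
            rewritten = tokens[:i] + [RULES[tok.lower()]] + tokens[i + 1:]
            candidates.append(" ".join(rewritten))
    # The augmented string would be fed to the encoder; the model can
    # attend to whichever candidate fits the context.
    return sentence + "".join(sep + c for c in candidates)

print(inject_rules("u r gonna love it"))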
Relation Classification for Bleeding Events From Electronic Health Records Using Deep Learning Systems: An Empirical Study.
Mitra, A.; Rawat, B. P. S.; McManus, D. D.; and Yu, H.
JMIR Medical Informatics, 9(7): e27527. July 2021.
Paper
doi
link
bibtex
abstract
@article{mitra_relation_2021, title = {Relation {Classification} for {Bleeding} {Events} {From} {Electronic} {Health} {Records} {Using} {Deep} {Learning} {Systems}: {An} {Empirical} {Study}}, volume = {9}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work ("first published in the Journal of Medical Internet Research...") is properly cited with original URL and bibliographic citation information. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.}, shorttitle = {Relation {Classification} for {Bleeding} {Events} {From} {Electronic} {Health} {Records} {Using} {Deep} {Learning} {Systems}}, url = {https://medinform.jmir.org/2021/7/e27527}, doi = {10.2196/27527}, abstract = {Background: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities. Objective: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event–related relation classification. Methods: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT). Results: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4\% (P\<.001) and 7.9\% (P\<.001), respectively. Conclusions: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation}, language = {EN}, number = {7}, urldate = {2021-07-02}, journal = {JMIR Medical Informatics}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and McManus, David D. and Yu, Hong}, month = jul, year = {2021}, note = {Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e27527}, }
Background: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities. Objective: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event–related relation classification. Methods: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT). Results: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4% (P<.001) and 7.9% (P<.001), respectively. Conclusions: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation.
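The "target entity representation" mentioned in the conclusions is commonly realized by wrapping the two candidate entities in marker tokens and reading the markers' hidden states. A minimal sketch under that assumption, using generic bert-base-uncased and invented marker tokens rather than the authors' models:

# Minimal sketch of entity-marker relation classification; the marker
# tokens, example text, and label count are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
tok.add_special_tokens({"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]})
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.resize_token_embeddings(len(tok))

text = "Patient developed [E1] GI bleeding [/E1] after [E2] warfarin [/E2] ."
enc = tok(text, return_tensors="pt")
hidden = bert(**enc).last_hidden_state[0]   # (seq_len, 768)

# Take the hidden state at each opening marker as the entity vector.
e1 = hidden[enc.input_ids[0] == tok.convert_tokens_to_ids("[E1]")][0]
e2 = hidden[enc.input_ids[0] == tok.convert_tokens_to_ids("[E2]")][0]
pair = torch.cat([e1, e2])                  # target entity representation
logits = torch.nn.Linear(2 * 768, 3)(pair)  # e.g., 3 relation labels (untrained head)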
Epinoter: A Natural Language Processing Tool for Epidemiological Studies.
Liu, W.; Li, F.; Jin, Y.; Granillo, E.; Yarzebski, J.; Li, W.; and Yu, H.
In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, volume 5, pages 754–761, February 2021.
link bibtex
@inproceedings{liu_epinoter_2021, title = {Epinoter: {A} {Natural} {Language} {Processing} {Tool} for {Epidemiological} {Studies}.}, volume = {5}, booktitle = {Proceedings of the 14th {International} {Joint} {Conference} on {Biomedical} {Engineering} {Systems} and {Technologies}}, author = {Liu, Weisong and Li, Fei and Jin, Yonghao and Granillo, Edgard and Yarzebski, Jorge and Li, Wenjun and Yu, Hong}, month = feb, year = {2021}, pages = {754--761}, }
2020
(15)
Inferring ADR causality by predicting the Naranjo Score from Clinical Notes.
Rawat, B. P. S.; Jagannatha, A.; Liu, F.; and Yu, H.
In AMIA Fall Symposium, pages 1041–1049, 2020.
Paper
link
bibtex
abstract
@inproceedings{rawat_inferring_2020, title = {Inferring {ADR} causality by predicting the {Naranjo} {Score} from {Clinical} {Notes}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075501/}, abstract = {Clinical judgment studies are an integral part of drug safety surveillance and pharmacovigilance frameworks. They help quantify the causal relationship between medication and its adverse drug reactions (ADRs). To conduct such studies, physicians need to review patients’ charts manually to answer Naranjo questionnaire1. In this paper, we propose a methodology to automatically infer causal relations from patients’ discharge summaries by combining the capabilities of deep learning and statistical learning models. We use Bidirectional Encoder Representations from Transformers (BERT)2 to extract relevant paragraphs for each Naranjo question and then use a statistical learning model such as logistic regression to predict the Naranjo score and the causal relation between the medication and an ADR. Our methodology achieves a macro-averaged f1-score of 0.50 and weighted f1-score of 0.63.}, booktitle = {{AMIA} {Fall} {Symposium}}, author = {Rawat, Bhanu Pratap Singh and Jagannatha, Abhyuday and Liu, Feifan and Yu, Hong}, year = {2020}, pmcid = {PMC8075501}, pmid = {33936480}, pages = {1041--1049}, }
Clinical judgment studies are an integral part of drug safety surveillance and pharmacovigilance frameworks. They help quantify the causal relationship between medication and its adverse drug reactions (ADRs). To conduct such studies, physicians need to review patients’ charts manually to answer the Naranjo questionnaire. In this paper, we propose a methodology to automatically infer causal relations from patients’ discharge summaries by combining the capabilities of deep learning and statistical learning models. We use Bidirectional Encoder Representations from Transformers (BERT) to extract relevant paragraphs for each Naranjo question and then use a statistical learning model such as logistic regression to predict the Naranjo score and the causal relation between the medication and an ADR. Our methodology achieves a macro-averaged f1-score of 0.50 and weighted f1-score of 0.63.
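A minimal sketch of the two-stage pipeline: score note paragraphs for relevance to a Naranjo question, then feed question-level evidence features to a simple classifier. TF-IDF similarity stands in for the BERT retriever here, and all texts, features, and labels are toy values, not the paper's data:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stage 1 (toy retriever): pick the paragraph most similar to a question.
question = "Did the adverse reaction improve when the drug was discontinued?"
paragraphs = ["Rash resolved after stopping amoxicillin.",
              "Vitals stable overnight."]
vec = TfidfVectorizer().fit([question] + paragraphs)
sims = (vec.transform(paragraphs) @ vec.transform([question]).T).toarray().ravel()
evidence = paragraphs[int(np.argmax(sims))]

# Stage 2 (toy classifier): per-note features (e.g., max similarity per
# question) -> causality category, trained on labeled notes.
X = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.6]])
y = np.array([1, 0, 1, 0])  # 1 = probable ADR (synthetic labels)
clf = LogisticRegression().fit(X, y)
print(evidence, clf.predict([[0.8, 0.2]]))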
Calibrating Structured Output Predictors for Natural Language Processing.
Jagannatha, A.; and Yu, H.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pages 2078–2092, July 2020.
NIHMSID: NIHMS1661932
Paper
doi
link
bibtex
abstract
@inproceedings{jagannatha_calibrating_2020, title = {Calibrating {Structured} {Output} {Predictors} for {Natural} {Language} {Processing}.}, volume = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics}, url = {https://aclanthology.org/2020.acl-main.188}, doi = {10.18653/v1/2020.acl-main.188}, abstract = {We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if the system is to be deployed in a safety-critical domain such as healthcare. However, the output space of such structured prediction models is often too large to adapt binary or multi-class calibration methods directly. In this study, we propose a general calibration scheme for output entities of interest in neural-network based structured prediction models. Our proposed method can be used with any binary class calibration scheme and a neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirements. We show that our method outperforms current calibration techniques for named-entity-recognition, part-of-speech and question answering. We also improve our model's performance from our decoding step across several tasks and benchmark datasets. Our method improves the calibration and model performance on out-of-domain test scenarios as well.}, booktitle = {2020 {Annual} {Conference} of the {Association} for {Computational} {Linguistics} ({ACL})}, author = {Jagannatha, Abhyuday and Yu, Hong}, month = jul, year = {2020}, pmcid = {PMC7890517}, pmid = {33612961}, note = {NIHMSID: NIHMS1661932}, pages = {2078--2092}, }
We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if the system is to be deployed in a safety-critical domain such as healthcare. However, the output space of such structured prediction models is often too large to adapt binary or multi-class calibration methods directly. In this study, we propose a general calibration scheme for output entities of interest in neural-network based structured prediction models. Our proposed method can be used with any binary class calibration scheme and a neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirements. We show that our method outperforms current calibration techniques for named-entity-recognition, part-of-speech and question answering. We also improve our model's performance from our decoding step across several tasks and benchmark datasets. Our method improves the calibration and model performance on out-of-domain test scenarios as well.
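The core recipe, reduced to its simplest form: collect (confidence, correctness) pairs for predicted entities of interest and fit a binary calibrator that maps raw confidence to a calibrated probability. The features and data below are toy stand-ins for the paper's richer scheme:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated, deliberately miscalibrated predictor: its raw confidence
# overstates how often the predicted entity is actually correct.
rng = np.random.default_rng(0)
conf = rng.random(2000)                    # raw confidence per entity
correct = rng.random(2000) < conf ** 2     # true correctness labels

# Any binary-class calibration scheme can be plugged in; logistic
# (Platt-style) calibration is shown as the simplest choice.
calibrator = LogisticRegression().fit(conf.reshape(-1, 1), correct)
print(calibrator.predict_proba([[0.9]])[0, 1])  # calibrated P(correct)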
Conversational machine comprehension: a literature review.
Gupta, S.; Rawat, B. P. S.; and Yu, H.
arXiv preprint arXiv:2006.00671, 2739–2753. December 2020.
COLING 2020
Paper
doi
link
bibtex
abstract
@article{gupta_conversational_2020, title = {Conversational machine comprehension: a literature review}, shorttitle = {Conversational machine comprehension}, url = {https://aclanthology.org/2020.coling-main.247}, doi = {10.18653/v1/2020.coling-main.247}, abstract = {Conversational Machine Comprehension (CMC), a research track in conversational AI, expects the machine to understand an open-domain natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. While most of the research in Machine Reading Comprehension (MRC) revolves around single-turn question answering (QA), multi-turn CMC has recently gained prominence, thanks to the advancement in natural language understanding via neural language models such as BERT and the introduction of large-scale conversational datasets such as CoQA and QuAC. The rise in interest has, however, led to a flurry of concurrent publications, each with a different yet structurally similar modeling approach and an inconsistent view of the surrounding literature. With the volume of model submissions to conversational datasets increasing every year, there exists a need to consolidate the scattered knowledge in this domain to streamline future research. This literature review attempts at providing a holistic overview of CMC with an emphasis on the common trends across recently published models, specifically in their approach to tackling conversational history. The review synthesizes a generic framework for CMC models while highlighting the differences in recent approaches and intends to serve as a compendium of CMC for future researchers.}, journal = {arXiv preprint arXiv:2006.00671}, author = {Gupta, Somil and Rawat, Bhanu Pratap Singh and Yu, Hong}, month = dec, year = {2020}, note = {COLING 2020}, pages = {2739--2753}, }
Conversational Machine Comprehension (CMC), a research track in conversational AI, expects the machine to understand an open-domain natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. While most of the research in Machine Reading Comprehension (MRC) revolves around single-turn question answering (QA), multi-turn CMC has recently gained prominence, thanks to the advancement in natural language understanding via neural language models such as BERT and the introduction of large-scale conversational datasets such as CoQA and QuAC. The rise in interest has, however, led to a flurry of concurrent publications, each with a different yet structurally similar modeling approach and an inconsistent view of the surrounding literature. With the volume of model submissions to conversational datasets increasing every year, there exists a need to consolidate the scattered knowledge in this domain to streamline future research. This literature review attempts at providing a holistic overview of CMC with an emphasis on the common trends across recently published models, specifically in their approach to tackling conversational history. The review synthesizes a generic framework for CMC models while highlighting the differences in recent approaches and intends to serve as a compendium of CMC for future researchers.
Bleeding Entity Recognition in Electronic Health Records: A Comprehensive Analysis of End-to-End Systems.
Mitra, A.; Rawat, B. P. S.; McManus, D.; Kapoor, A.; and Yu, H.
In AMIA Annu Symp Proc, pages 860–869, 2020.
Paper
link
bibtex
abstract
@inproceedings{mitra_bleeding_2020, title = {Bleeding {Entity} {Recognition} in {Electronic} {Health} {Records}: {A} {Comprehensive} {Analysis} of {End}-to-{End} {Systems}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075442/}, abstract = {A bleeding event is a common adverse drug reaction amongst patients on anticoagulation and factors critically into a clinician's decision to prescribe or continue anticoagulation for atrial fibrillation. However, bleeding events are not uniformly captured in the administrative data of electronic health records (EHR). As manual review is prohibitively expensive, we investigate the effectiveness of various natural language processing (NLP) methods for automatic extraction of bleeding events. Using our expert-annotated 1,079 de-identified EHR notes, we evaluated state-of-the-art NLP models such as biLSTM-CRF with language modeling, and different BERT variants for six entity types. On our dataset, the biLSTM-CRF surpassed other models resulting in a macro F1-score of 0.75 whereas the performance difference is negligible for sentence and document-level predictions with the best macro F1-scores of 0.84 and 0.96, respectively. Our error analyses suggest that the models' incorrect predictions can be attributed to variability in entity spans, memorization, and missing negation signals.}, booktitle = {{AMIA} {Annu} {Symp} {Proc}}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and McManus, David and Kapoor, Alok and Yu, Hong}, year = {2020}, pmid = {33936461 PMCID: PMC8075442}, pages = {860--869}, }
A bleeding event is a common adverse drug reaction amongst patients on anticoagulation and factors critically into a clinician's decision to prescribe or continue anticoagulation for atrial fibrillation. However, bleeding events are not uniformly captured in the administrative data of electronic health records (EHR). As manual review is prohibitively expensive, we investigate the effectiveness of various natural language processing (NLP) methods for automatic extraction of bleeding events. Using our expert-annotated 1,079 de-identified EHR notes, we evaluated state-of-the-art NLP models such as biLSTM-CRF with language modeling, and different BERT variants for six entity types. On our dataset, the biLSTM-CRF surpassed other models resulting in a macro F1-score of 0.75 whereas the performance difference is negligible for sentence and document-level predictions with the best macro F1-scores of 0.84 and 0.96, respectively. Our error analyses suggest that the models' incorrect predictions can be attributed to variability in entity spans, memorization, and missing negation signals.
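For orientation, a minimal biLSTM token tagger over BIO labels is sketched below; the paper's strongest system adds a CRF output layer and language-model pretraining, both omitted here, and all dimensions and the label set are toy values:

import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    # Toy sizes; label set assumed to be {O, B-BLEED, I-BLEED}.
    def __init__(self, vocab=1000, emb=64, hidden=64, n_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)  # per-token tag logits (CRF would decode these)

logits = BiLSTMTagger()(torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # (batch, seq_len, n_tags)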
Neural Multi-Task Learning for Adverse Drug Reaction Extraction.
Liu, F.; Zheng, X.; Yu, H.; and Tjia, J.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2020: 756–762. 2020.
Paper
link
bibtex
abstract
@article{liu_neural_2020, title = {Neural {Multi}-{Task} {Learning} for {Adverse} {Drug} {Reaction} {Extraction}}, volume = {2020}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075418/pdf/110_3417286.pdf}, abstract = {A reliable and searchable knowledge database of adverse drug reactions (ADRs) is highly important and valuable for improving patient safety at the point of care. In this paper, we proposed a neural multi-task learning system, NeuroADR, to extract ADRs as well as relevant modifiers from free-text drug labels. Specifically, the NeuroADR system exploited a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, where interactions among the learned deep encoder representations from different subtasks are explored. Different from the conventional HMTL approach, NeuroADR adopted a novel task decomposition strategy to generate auxiliary subtasks for more inter-task interactions and integrated a new label encoding schema for better handling discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.}, language = {eng}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Liu, Feifan and Zheng, Xiaoyu and Yu, Hong and Tjia, Jennifer}, year = {2020}, pmid = {33936450}, pmcid = {PMC8075418}, keywords = {Data Mining, Databases, Factual, Deep Learning, Drug-Related Side Effects and Adverse Reactions, Humans, Machine Learning}, pages = {756--762}, }
A reliable and searchable knowledge database of adverse drug reactions (ADRs) is highly important and valuable for improving patient safety at the point of care. In this paper, we proposed a neural multi-task learning system, NeuroADR, to extract ADRs as well as relevant modifiers from free-text drug labels. Specifically, the NeuroADR system exploited a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, where interactions among the learned deep encoder representations from different subtasks are explored. Different from the conventional HMTL approach, NeuroADR adopted a novel task decomposition strategy to generate auxiliary subtasks for more inter-task interactions and integrated a new label encoding schema for better handling discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.
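A minimal sketch of the hierarchical multi-task idea: a shared encoder feeds an NER head, and the relation head consumes both the encoder states and the NER predictions, so gradients from both tasks shape the shared representation. Sizes and heads are illustrative, not NeuroADR's architecture:

import torch
import torch.nn as nn

class HMTLSketch(nn.Module):
    def __init__(self, vocab=1000, d=64, n_tags=5, n_rels=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * d, n_tags)
        self.rel_head = nn.Linear(2 * d + n_tags, n_rels)

    def forward(self, ids):
        h, _ = self.encoder(self.emb(ids))
        ner_logits = self.ner_head(h)
        # Higher-level task consumes encoder states plus lower-task output.
        rel_in = torch.cat([h, ner_logits.softmax(-1)], dim=-1)
        rel_logits = self.rel_head(rel_in.mean(dim=1))  # pooled per sentence
        return ner_logits, rel_logits

ner, rel = HMTLSketch()(torch.randint(0, 1000, (2, 10)))
print(ner.shape, rel.shape)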
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.
Jin, Y.; Li, F.; and Yu, H.
In 2020 Annual Conference of the Association for Computational Linguistics (ACL), pages 95–100, July 2020.
NIHMSID: NIHMS1644629
Paper
doi
link
bibtex
abstract
@inproceedings{jin_bento_2020, title = {{BENTO}: {A} {Visual} {Platform} for {Building} {Clinical} {NLP} {Pipelines} {Based} on {CodaLab}.}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7679080/}, doi = {10.18653/v1/2020.acl-demos.13}, abstract = {CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools which are usually scattered in different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphic user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI interface enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible to be tailored to any other domains.}, booktitle = {2020 {Annual} {Conference} of the {Association} for {Computational} {Linguistics} ({ACL})}, author = {Jin, Yonghao and Li, Fei and Yu, Hong}, month = jul, year = {2020}, pmcid = {PMC7679080}, pmid = {33223604}, note = {NIHMSID: NIHMS1644629}, pages = {95--100}, }
CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In the clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools which are usually scattered in different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphic user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI interface enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible to be tailored to any other domains.
ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.
Li, F.; and Yu, H.
In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), pages 8180–8187, New York City, New York, February 2020.
doi link bibtex
@inproceedings{li_icd_2020, address = {New York City, New York}, title = {{ICD} {Coding} from {Clinical} {Text} {Using} {Multi}-{Filter} {Residual} {Convolutional} {Neural} {Network}.}, shorttitle = {{AAAI} 2020}, doi = {10.1609/AAAI.V34I05.6331}, booktitle = {The {Thirty}-{Fourth} {AAAI} {Conference} on {Artificial} {Intelligence} ({AAAI}-20)}, author = {Li, Fei and Yu, Hong}, month = feb, year = {2020}, keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning}, pages = {8180--8187}, }
Improved Pretraining for Domain-specific Contextual Embedding Models.
Rongali, S.; Jagannatha, A.; Rawat, B. P. S.; and Yu, H.
CoRR, abs/2004.02288. 2020.
arXiv: 2004.02288
Paper
link
bibtex
@article{rongali_improved_2020, title = {Improved {Pretraining} for {Domain}-specific {Contextual} {Embedding} {Models}}, volume = {abs/2004.02288}, url = {https://arxiv.org/abs/2004.02288}, journal = {CoRR}, author = {Rongali, Subendhu and Jagannatha, Abhyuday and Rawat, Bhanu Pratap Singh and Yu, Hong}, year = {2020}, note = {arXiv: 2004.02288}, }
Neural data-to-text generation with dynamic content planning.
Chen, K.; Li, F.; Hu, B.; Peng, W.; Chen, Q.; Yu, H.; and Xiang, Y.
Knowledge-Based Systems, 106610. November 2020.
Paper
doi
link
bibtex
abstract
@article{chen_neural_2020, title = {Neural data-to-text generation with dynamic content planning}, issn = {0950-7051}, url = {http://www.sciencedirect.com/science/article/pii/S0950705120307395}, doi = {10.1016/j.knosys.2020.106610}, abstract = {Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP 2 2This work was completed in cooperation with Baidu Inc.for abbreviation. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on ROTOWIRE and NBAZHN datasets, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP in most of time. And using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.}, language = {en}, urldate = {2020-12-29}, journal = {Knowledge-Based Systems}, author = {Chen, Kai and Li, Fayuan and Hu, Baotian and Peng, Weihua and Chen, Qingcai and Yu, Hong and Xiang, Yang}, month = nov, year = {2020}, keywords = {Data-to-text, Dynamic content planning, Reconstruction mechanism}, pages = {106610}, }
Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP for short (this work was completed in cooperation with Baidu Inc.). The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on the ROTOWIRE and NBAZHN datasets, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.
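A minimal sketch of the dynamic selection step: the decoder's running summary of the generated text scores the encoded table entries, and the highest-scoring entry is chosen as the next content to verbalize. Shapes and the scoring function are illustrative, not the NDP architecture:

import torch
import torch.nn as nn

d = 32
entries = torch.randn(6, d)    # encoded records of a structured table (toy)
dec_state = torch.randn(d)     # summary of the text generated so far (toy)
score = nn.Linear(2 * d, 1)    # illustrative scorer over (entry, state) pairs

pairs = torch.cat([entries, dec_state.expand(6, d)], dim=-1)
weights = score(pairs).squeeze(-1).softmax(-1)  # attention over entries
next_entry = entries[weights.argmax()]          # dynamically selected entry
print(weights)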
Generating Medical Assessments Using a Neural Network Model: Algorithm Development and Validation.
Hu, B.; Bajracharya, A.; and Yu, H.
JMIR Medical Informatics, 8(1): e14971. 2020.
Paper
doi
link
bibtex
abstract
@article{hu_generating_2020, title = {Generating {Medical} {Assessments} {Using} a {Neural} {Network} {Model}: {Algorithm} {Development} and {Validation}}, volume = {8}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Generating {Medical} {Assessments} {Using} a {Neural} {Network} {Model}}, url = {https://medinform.jmir.org/2020/1/e14971/}, doi = {10.2196/14971}, abstract = {Background: Since its inception, artificial intelligence has aimed to use computers to help make clinical diagnoses. Evidence-based medical reasoning is important for patient care. Inferring clinical diagnoses is a crucial step during the patient encounter. Previous works mainly used expert systems or machine learning–based methods to predict the International Classification of Diseases - Clinical Modification codes based on electronic health records. We report an alternative approach: inference of clinical diagnoses from patients’ reported symptoms and physicians’ clinical observations. Objective: We aimed to report a natural language processing system for generating medical assessments based on patient information described in the electronic health record (EHR) notes. Methods: We processed EHR notes into the Subjective, Objective, Assessment, and Plan sections. We trained a neural network model for medical assessment generation (N2MAG). Our N2MAG is an innovative deep neural model that uses the Subjective and Objective sections of an EHR note to automatically generate an “expert-like” assessment of the patient. N2MAG can be trained in an end-to-end fashion and does not require feature engineering and external knowledge resources. Results: We evaluated N2MAG and the baseline models both quantitatively and qualitatively. Evaluated by both the Recall-Oriented Understudy for Gisting Evaluation metrics and domain experts, our results show that N2MAG outperformed the existing state-of-the-art baseline models. Conclusions: N2MAG could generate a medical assessment from the Subject and Objective section descriptions in EHR notes. Future work will assess its potential for providing clinical decision support. [JMIR Med Inform 2020;8(1):e14971]}, language = {en}, number = {1}, urldate = {2020-04-07}, journal = {JMIR Medical Informatics}, author = {Hu, Baotian and Bajracharya, Adarsha and Yu, Hong}, year = {2020}, pmid = {31939742 PMCID: PMC7006435}, note = {Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e14971}, }
Background: Since its inception, artificial intelligence has aimed to use computers to help make clinical diagnoses. Evidence-based medical reasoning is important for patient care. Inferring clinical diagnoses is a crucial step during the patient encounter. Previous works mainly used expert systems or machine learning–based methods to predict the International Classification of Diseases - Clinical Modification codes based on electronic health records. We report an alternative approach: inference of clinical diagnoses from patients’ reported symptoms and physicians’ clinical observations. Objective: We aimed to report a natural language processing system for generating medical assessments based on patient information described in the electronic health record (EHR) notes. Methods: We processed EHR notes into the Subjective, Objective, Assessment, and Plan sections. We trained a neural network model for medical assessment generation (N2MAG). Our N2MAG is an innovative deep neural model that uses the Subjective and Objective sections of an EHR note to automatically generate an “expert-like” assessment of the patient. N2MAG can be trained in an end-to-end fashion and does not require feature engineering and external knowledge resources. Results: We evaluated N2MAG and the baseline models both quantitatively and qualitatively. Evaluated by both the Recall-Oriented Understudy for Gisting Evaluation metrics and domain experts, our results show that N2MAG outperformed the existing state-of-the-art baseline models. Conclusions: N2MAG could generate a medical assessment from the Subjective and Objective section descriptions in EHR notes. Future work will assess its potential for providing clinical decision support. [JMIR Med Inform 2020;8(1):e14971]
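As a rough stand-in for this setup, any generic encoder-decoder can be fine-tuned to map concatenated Subjective and Objective text to an Assessment; the sketch below uses off-the-shelf BART purely for illustration and is not N2MAG's architecture or weights:

# Illustrative only: BART as a generic encoder-decoder stand-in.
from transformers import BartTokenizer, BartForConditionalGeneration

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Source sequence: Subjective + Objective sections (hypothetical note).
source = "SUBJECTIVE: chest pain on exertion. OBJECTIVE: BP 150/90, ECG normal."
ids = tok(source, return_tensors="pt").input_ids
out = model.generate(ids, max_length=40)  # target would be the Assessment
print(tok.decode(out[0], skip_special_tokens=True))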
Dynamic Data Selection for Curriculum Learning via Ability Estimation.
Lalor, J. P.; and Yu, H.
In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 545–555, Online, November 2020. Association for Computational Linguistics
Paper
link
bibtex
abstract
@inproceedings{lalor_dynamic_2020, address = {Online}, title = {Dynamic {Data} {Selection} for {Curriculum} {Learning} via {Ability} {Estimation}}, url = {https://www.aclweb.org/anthology/2020.findings-emnlp.48}, abstract = {Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.}, urldate = {2020-11-29}, booktitle = {Findings of the {Association} for {Computational} {Linguistics}: {EMNLP} 2020}, publisher = {Association for Computational Linguistics}, author = {Lalor, John P. and Yu, Hong}, month = nov, year = {2020}, pmid = {33381774 PMCID: PMC7771727}, pages = {545--555}, }
Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.
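The selection rule itself is compact: given a learned IRT difficulty for each example and a probed model ability for the current epoch, train only on examples the model is estimated to be ready for. A minimal sketch with simulated values (the full method also estimates ability by probing each epoch, which is omitted here):

import numpy as np

# Simulated latent difficulties (in the full method these are learned
# IRT parameters, not random draws).
rng = np.random.default_rng(0)
difficulty = rng.normal(0, 1, 10000)

ability = 0.3  # assumed: model ability probed at the current epoch
selected = np.where(difficulty <= ability)[0]  # this epoch's curriculum
print(f"training on {selected.size} of {difficulty.size} examples")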
Generating Accurate Electronic Health Assessment from Medical Graph.
Yang, Z.; and Yu, H.
In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3764–3773, Online, November 2020. Association for Computational Linguistics
NIHMSID: NIHMS1658452
Paper
link
bibtex
abstract
@inproceedings{yang_generating_2020, address = {Online}, title = {Generating {Accurate} {Electronic} {Health} {Assessment} from {Medical} {Graph}}, url = {https://www.aclweb.org/anthology/2020.findings-emnlp.336}, abstract = {One of the fundamental goals of artificial intelligence is to build computer-based expert systems. Inferring clinical diagnoses to generate a clinical assessment during a patient encounter is a crucial step towards building a medical diagnostic system. Previous works were mainly based on either medical domain-specific knowledge, or patients' prior diagnoses and clinical encounters. In this paper, we propose a novel model for automated clinical assessment generation (MCAG). MCAG is built on an innovative graph neural network, where rich clinical knowledge is incorporated into an end-to-end corpus-learning system. Our evaluation results against physician generated gold standard show that MCAG significantly improves the BLEU and rouge score compared with competitive baseline models. Further, physicians' evaluation showed that MCAG could generate high-quality assessments.}, urldate = {2020-11-29}, booktitle = {Findings of the {Association} for {Computational} {Linguistics}: {EMNLP} 2020}, publisher = {Association for Computational Linguistics}, author = {Yang, Zhichao and Yu, Hong}, month = nov, year = {2020}, pmcid = {PMC7821471}, pmid = {33491009}, note = {NIHMSID: NIHMS1658452}, pages = {3764--3773}, }
One of the fundamental goals of artificial intelligence is to build computer-based expert systems. Inferring clinical diagnoses to generate a clinical assessment during a patient encounter is a crucial step towards building a medical diagnostic system. Previous works were mainly based on either medical domain-specific knowledge, or patients' prior diagnoses and clinical encounters. In this paper, we propose a novel model for automated clinical assessment generation (MCAG). MCAG is built on an innovative graph neural network, where rich clinical knowledge is incorporated into an end-to-end corpus-learning system. Our evaluation results against physician generated gold standard show that MCAG significantly improves the BLEU and rouge score compared with competitive baseline models. Further, physicians' evaluation showed that MCAG could generate high-quality assessments.
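A single graph-convolution step over a medical concept graph, the kind of building block such a model might stack, can be sketched as follows; the adjacency matrix and sizes are toy values, not MCAG's graph or parameters:

import torch
import torch.nn as nn

# Toy concept adjacency with self-loops (3 medical concepts).
A = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
deg = A.sum(-1, keepdim=True)
H = torch.randn(3, 16)                  # initial concept embeddings
W = nn.Linear(16, 16, bias=False)
H_next = torch.relu((A / deg) @ W(H))   # one mean-aggregation GCN layer
print(H_next.shape)                     # updated concept representations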
Neural Data-to-Text Generation with Dynamic Content Planning.
Chen, K.; Li, F.; Hu, B.; Peng, W.; Chen, Q.; and Yu, H.
arXiv:2004.07426 [cs]. April 2020.
arXiv: 2004.07426
Paper
link
bibtex
abstract
@article{chen_neural_2020-1, title = {Neural {Data}-to-{Text} {Generation} with {Dynamic} {Content} {Planning}}, url = {http://arxiv.org/abs/2004.07426}, abstract = {Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP for abbreviation. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on ROTOWIRE dataset, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP in most of time. And using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.}, urldate = {2020-12-29}, journal = {arXiv:2004.07426 [cs]}, author = {Chen, Kai and Li, Fayuan and Hu, Baotian and Peng, Weihua and Chen, Qingcai and Yu, Hong}, month = apr, year = {2020}, note = {arXiv: 2004.07426}, keywords = {Computer Science - Computation and Language}, }
Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP for short. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on the ROTOWIRE dataset, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.
Jin, Y; Li, F; and Yu, H
In AMIA Fall Symposium, 2020.
link bibtex
@inproceedings{jin_bento_2020-1, title = {{BENTO}: {A} {Visual} {Platform} for {Building} {Clinical} {NLP} {Pipelines} {Based} on {CodaLab}.}, booktitle = {{AMIA} {Fall} {Symposium}}, author = {Jin, Y and Li, F and Yu, H}, year = {2020}, }
Learning Latent Space Representations to Predict Patient Outcomes: Model Development and Validation.
Rongali, S.; Rose, A. J.; McManus, D. D.; Bajracharya, A. S.; Kapoor, A.; Granillo, E.; and Yu, H.
Journal of Medical Internet Research, 22(3): e16374. 2020.
Paper
doi
link
bibtex
abstract
@article{rongali_learning_2020, title = {Learning {Latent} {Space} {Representations} to {Predict} {Patient} {Outcomes}: {Model} {Development} and {Validation}}, volume = {22}, shorttitle = {Learning {Latent} {Space} {Representations} to {Predict} {Patient} {Outcomes}}, url = {https://www.jmir.org/2020/3/e16374/}, doi = {10.2196/16374}, abstract = {Background: Scalable and accurate health outcome prediction using electronic health record (EHR) data has gained much attention in research recently. Previous machine learning models mostly ignore relations between different types of clinical data (ie, laboratory components, International Classification of Diseases codes, and medications). Objective: This study aimed to model such relations and build predictive models using the EHR data from intensive care units. We developed innovative neural network models and compared them with the widely used logistic regression model and other state-of-the-art neural network models to predict the patient’s mortality using their longitudinal EHR data. Methods: We built a set of neural network models that we collectively called as long short-term memory (LSTM) outcome prediction using comprehensive feature relations or in short, CLOUT. Our CLOUT models use a correlational neural network model to identify a latent space representation between different types of discrete clinical features during a patient’s encounter and integrate the latent representation into an LSTM-based predictive model framework. In addition, we designed an ablation experiment to identify risk factors from our CLOUT models. Using physicians’ input as the gold standard, we compared the risk factors identified by both CLOUT and logistic regression models. Results: Experiments on the Medical Information Mart for Intensive Care-III dataset (selected patient population: 7537) show that CLOUT (area under the receiver operating characteristic curve=0.89) has surpassed logistic regression (0.82) and other baseline NN models (\<0.86). In addition, physicians’ agreement with the CLOUT-derived risk factor rankings was statistically significantly higher than the agreement with the logistic regression model. Conclusions: Our results support the applicability of CLOUT for real-world clinical use in identifying patients at high risk of mortality. Trial Registration: [J Med Internet Res 2020;22(3):e16374]}, language = {en}, number = {3}, urldate = {2020-04-07}, journal = {Journal of Medical Internet Research}, author = {Rongali, Subendhu and Rose, Adam J. and McManus, David D. and Bajracharya, Adarsha S. and Kapoor, Alok and Granillo, Edgard and Yu, Hong}, year = {2020}, pmid = {32202503 PMCID: PMC7136840}, note = {Company: Journal of Medical Internet Research Distributor: Journal of Medical Internet Research Institution: Journal of Medical Internet Research Label: Journal of Medical Internet Research Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e16374}, }
Background: Scalable and accurate health outcome prediction using electronic health record (EHR) data has gained much attention in research recently. Previous machine learning models mostly ignore relations between different types of clinical data (ie, laboratory components, International Classification of Diseases codes, and medications). Objective: This study aimed to model such relations and build predictive models using the EHR data from intensive care units. We developed innovative neural network models and compared them with the widely used logistic regression model and other state-of-the-art neural network models to predict the patient’s mortality using their longitudinal EHR data. Methods: We built a set of neural network models that we collectively call long short-term memory (LSTM) outcome prediction using comprehensive feature relations, or CLOUT for short. Our CLOUT models use a correlational neural network model to identify a latent space representation between different types of discrete clinical features during a patient’s encounter and integrate the latent representation into an LSTM-based predictive model framework. In addition, we designed an ablation experiment to identify risk factors from our CLOUT models. Using physicians’ input as the gold standard, we compared the risk factors identified by both CLOUT and logistic regression models. Results: Experiments on the Medical Information Mart for Intensive Care-III dataset (selected patient population: 7537) show that CLOUT (area under the receiver operating characteristic curve=0.89) has surpassed logistic regression (0.82) and other baseline NN models (<0.86). In addition, physicians’ agreement with the CLOUT-derived risk factor rankings was statistically significantly higher than the agreement with the logistic regression model. Conclusions: Our results support the applicability of CLOUT for real-world clinical use in identifying patients at high risk of mortality. Trial Registration: [J Med Internet Res 2020;22(3):e16374]
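A minimal sketch of the CLOUT idea: project two clinical feature types into a shared latent space, encourage the projections to agree (a cosine term stands in here for the correlational objective), and run an LSTM over per-visit latent vectors to predict mortality. All sizes and data are simulated; this is not the authors' model:

import torch
import torch.nn as nn

B, T, d_lab, d_med, d_z = 4, 10, 32, 24, 16
labs = torch.randn(B, T, d_lab)   # simulated per-visit lab features
meds = torch.randn(B, T, d_med)   # simulated per-visit medication features

f_lab, f_med = nn.Linear(d_lab, d_z), nn.Linear(d_med, d_z)
z_lab, z_med = f_lab(labs), f_med(meds)
# Alignment term: a simple stand-in for the correlational objective.
align = 1 - nn.functional.cosine_similarity(z_lab, z_med, dim=-1).mean()

lstm = nn.LSTM(d_z, d_z, batch_first=True)
h, _ = lstm((z_lab + z_med) / 2)
risk = torch.sigmoid(nn.Linear(d_z, 1)(h[:, -1])).squeeze(-1)  # P(mortality)
labels = torch.randint(0, 2, (B,)).float()                      # toy outcomes
loss = align + nn.functional.binary_cross_entropy(risk, labels)
print(loss.item())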
2019
(18)
Improving electronic health record note comprehension with NoteAid: randomized trial of electronic health record note comprehension interventions with crowdsourced workers.
Lalor, J. P.; Woolf, B.; and Yu, H.
Journal of Medical Internet Research, 21(1): e10793. 2019.
Paper
doi
link
bibtex
abstract
@article{lalor_improving_2019, title = {Improving electronic health record note comprehension with noteaid: randomized trial of electronic health record note comprehension interventions with crowdsourced workers}, volume = {21}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Improving electronic health record note comprehension with noteaid}, url = {https://www.jmir.org/2019/1/e10793/}, doi = {10.2196/jmir.10793}, abstract = {Background: Patient portals are becoming more common, and with them, the ability of patients to access their personal electronic health records (EHRs). EHRs, in particular the free-text EHR notes, often contain medical jargon and terms that are difficult for laypersons to understand. There are many Web-based resources for learning more about particular diseases or conditions, including systems that directly link to lay definitions or educational materials for medical concepts. Objective: Our goal is to determine whether use of one such tool, NoteAid, leads to higher EHR note comprehension ability. We use a new EHR note comprehension assessment tool instead of patient self-reported scores. Methods: In this work, we compare a passive, self-service educational resource (MedlinePlus) with an active resource (NoteAid) where definitions are provided to the user for medical concepts that the system identifies. We use Amazon Mechanical Turk (AMT) to recruit individuals to complete ComprehENotes, a new test of EHR note comprehension. Results: Mean scores for individuals with access to NoteAid are significantly higher than the mean baseline scores, both for raw scores (P=.008) and estimated ability (P=.02). Conclusions: In our experiments, we show that the active intervention leads to significantly higher scores on the comprehension test as compared with a baseline group with no resources provided. In contrast, there is no significant difference between the group that was provided with the passive intervention and the baseline group. Finally, we analyze the demographics of the individuals who participated in our AMT task and show differences between groups that align with the current understanding of health literacy between populations. This is the first work to show improvements in comprehension using tools such as NoteAid as measured by an EHR note comprehension assessment tool as opposed to patient self-reported scores. [J Med Internet Res 2019;21(1):e10793]}, language = {en}, number = {1}, urldate = {2019-01-31}, journal = {Journal of Medical Internet Research}, author = {Lalor, John P. and Woolf, Beverly and Yu, Hong}, year = {2019}, pmid = {30664453 PMCID: 6351990}, pages = {e10793}, }
Background: Patient portals are becoming more common, and with them, the ability of patients to access their personal electronic health records (EHRs). EHRs, in particular the free-text EHR notes, often contain medical jargon and terms that are difficult for laypersons to understand. There are many Web-based resources for learning more about particular diseases or conditions, including systems that directly link to lay definitions or educational materials for medical concepts. Objective: Our goal is to determine whether use of one such tool, NoteAid, leads to higher EHR note comprehension ability. We use a new EHR note comprehension assessment tool instead of patient self-reported scores. Methods: In this work, we compare a passive, self-service educational resource (MedlinePlus) with an active resource (NoteAid) where definitions are provided to the user for medical concepts that the system identifies. We use Amazon Mechanical Turk (AMT) to recruit individuals to complete ComprehENotes, a new test of EHR note comprehension. Results: Mean scores for individuals with access to NoteAid are significantly higher than the mean baseline scores, both for raw scores (P=.008) and estimated ability (P=.02). Conclusions: In our experiments, we show that the active intervention leads to significantly higher scores on the comprehension test as compared with a baseline group with no resources provided. In contrast, there is no significant difference between the group that was provided with the passive intervention and the baseline group. Finally, we analyze the demographics of the individuals who participated in our AMT task and show differences between groups that align with the current understanding of health literacy between populations. This is the first work to show improvements in comprehension using tools such as NoteAid as measured by an EHR note comprehension assessment tool as opposed to patient self-reported scores. [J Med Internet Res 2019;21(1):e10793]
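The headline comparison in this abstract is a two-sample test on group mean scores. As a worked illustration only, here is that computation in SciPy on synthetic scores; the choice of Welch's t-test and the numbers are assumptions, not the paper's analysis (which also reports IRT-based ability estimates).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
noteaid_scores = rng.normal(0.74, 0.12, size=120)   # synthetic comprehension scores
baseline_scores = rng.normal(0.66, 0.12, size=120)

# Welch's t-test (unequal variances) on the group means
t, p = stats.ttest_ind(noteaid_scores, baseline_scores, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```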
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.
Li, F.; Jin, Y.; Liu, W.; Rawat, B. P. S.; Cai, P.; and Yu, H.
JMIR Medical Informatics, 7(3): e14830. September 2019.
Paper
doi
link
bibtex
@article{li_fine-tuning_2019, title = {Fine-{Tuning} {Bidirectional} {Encoder} {Representations} {From} {Transformers} ({BERT})–{Based} {Models} on {Large}-{Scale} {Electronic} {Health} {Record} {Notes}: {An} {Empirical} {Study}}, volume = {7}, issn = {2291-9694}, shorttitle = {Fine-{Tuning} {Bidirectional} {Encoder} {Representations} {From} {Transformers} ({BERT})–{Based} {Models} on {Large}-{Scale} {Electronic} {Health} {Record} {Notes}}, url = {http://medinform.jmir.org/2019/3/e14830/}, doi = {10.2196/14830}, language = {en}, number = {3}, urldate = {2019-10-07}, journal = {JMIR Medical Informatics}, author = {Li, Fei and Jin, Yonghao and Liu, Weisong and Rawat, Bhanu Pratap Singh and Cai, Pengshan and Yu, Hong}, month = sep, year = {2019}, pmid = {31516126 PMCID: PMC6746103}, pages = {e14830}, }
Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance.
Chen, J.; Lalor, J.; Liu, W.; Druhl, E.; Granillo, E.; Vimalananda, V. G.; and Yu, H.
Journal of Medical Internet Research, 21(3). March 2019.
Paper
doi
link
bibtex
abstract
@article{chen_detecting_2019, title = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}: {Using} {Cost}-{Sensitive} {Learning} and {Oversampling} to {Reduce} {Data} {Imbalance}}, volume = {21}, issn = {1439-4456}, shorttitle = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431826/}, doi = {10.2196/11990}, abstract = {Background Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. Methods An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80\%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.}, number = {3}, urldate = {2019-12-29}, journal = {Journal of Medical Internet Research}, author = {Chen, Jinying and Lalor, John and Liu, Weisong and Druhl, Emily and Granillo, Edgard and Vimalananda, Varsha G and Yu, Hong}, month = mar, year = {2019}, pmid = {30855231 PMCID: PMC6431826}, }
Background Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. Methods An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.
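A minimal sketch of the cost-sensitive learning described above, assuming scikit-learn: class_weight="balanced" reweights errors on the rare positive class. The toy messages and TF-IDF features stand in for the paper's annotated threads and knowledge-driven features, and the SMOTE ensemble is not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# toy stand-ins for annotated secure-message threads (3.8% positive in the paper)
messages = ["felt shaky and sweaty after taking insulin",
            "requesting a refill of my metformin"] * 10
labels = [1, 0] * 10

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    # cost-sensitive learning: weight classes inversely to their frequency
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
auc = cross_val_score(clf, messages, labels, cv=5, scoring="roc_auc")
print(auc.mean())
```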
Automatic Detection of Hypoglycemic Events From the Electronic Health Record Notes of Diabetes Patients: Empirical Study.
Jin, Y.; Li, F.; Vimalananda, V. G.; and Yu, H.
JMIR Medical Informatics, 7(4): e14340. 2019.
Paper
doi
link
bibtex
abstract
@article{jin_automatic_2019, title = {Automatic {Detection} of {Hypoglycemic} {Events} {From} the {Electronic} {Health} {Record} {Notes} of {Diabetes} {Patients}: {Empirical} {Study}}, volume = {7}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Automatic {Detection} of {Hypoglycemic} {Events} {From} the {Electronic} {Health} {Record} {Notes} of {Diabetes} {Patients}}, url = {https://medinform.jmir.org/2019/4/e14340/}, doi = {10.2196/14340}, abstract = {Background: Hypoglycemic events are common and potentially dangerous conditions among patients being treated for diabetes. Automatic detection of such events could improve patient care and is valuable in population studies. Electronic health records (EHRs) are valuable resources for the detection of such events. Objective: In this study, we aim to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event or not. We used this annotated corpus to train and evaluate HYPE, the high-performance NLP system for hypoglycemia detection. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that neural network models outperformed the SVM model. The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by small and highly imbalanced data, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and population studies in diabetes patients. [JMIR Med Inform 2019;7(4):e14340]}, language = {en}, number = {4}, urldate = {2019-11-10}, journal = {JMIR Medical Informatics}, author = {Jin, Yonghao and Li, Fei and Vimalananda, Varsha G. and Yu, Hong}, year = {2019}, pmid = {31702562 PMCID: PMC6913754}, keywords = {adverse events, convolutional neural networks, hypoglycemia, natural language processing}, pages = {e14340}, }
Background: Hypoglycemic events are common and potentially dangerous conditions among patients being treated for diabetes. Automatic detection of such events could improve patient care and is valuable in population studies. Electronic health records (EHRs) are valuable resources for the detection of such events. Objective: In this study, we aim to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event or not. We used this annotated corpus to train and evaluate HYPE, the high-performance NLP system for hypoglycemia detection. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that neural network models outperformed the SVM model. The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by small and highly imbalanced data, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and population studies in diabetes patients. [JMIR Med Inform 2019;7(4):e14340]
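As a concrete reference for the CNN classifier the abstract reports, here is a Kim-style convolutional sentence classifier in PyTorch; the filter widths, embedding size, and binary head are illustrative assumptions rather than HYPE's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Parallel convolutions over word embeddings, max-pooled per filter width."""
    def __init__(self, vocab, emb=100, filters=100, widths=(3, 4, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(nn.Conv1d(emb, filters, w) for w in widths)
        self.out = nn.Linear(filters * len(widths), 2)  # hypoglycemia vs. not

    def forward(self, tokens):                      # tokens: (batch, seq)
        x = self.embed(tokens).transpose(1, 2)      # (batch, emb, seq)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))

logits = SentenceCNN(vocab=5000)(torch.randint(0, 5000, (8, 20)))
```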
Learning to detect and understand drug discontinuation events from clinical narratives.
Liu, F.; Pradhan, R.; Druhl, E.; Freund, E.; Liu, W.; Sauer, B. C.; Cunningham, F.; Gordon, A. J.; Peters, C. B.; and Yu, H.
Journal of the American Medical Informatics Association, 26(10): 943–951. October 2019.
Paper
doi
link
bibtex
abstract
@article{liu_learning_2019, title = {Learning to detect and understand drug discontinuation events from clinical narratives}, volume = {26}, url = {https://academic.oup.com/jamia/article/26/10/943/5481540}, doi = {10.1093/jamia/ocz048}, abstract = {AbstractObjective. Identifying drug discontinuation (DDC) events and understanding their reasons are important for medication management and drug safety survei}, language = {en}, number = {10}, urldate = {2019-12-29}, journal = {Journal of the American Medical Informatics Association}, author = {Liu, Feifan and Pradhan, Richeek and Druhl, Emily and Freund, Elaine and Liu, Weisong and Sauer, Brian C. and Cunningham, Fran and Gordon, Adam J. and Peters, Celena B. and Yu, Hong}, month = oct, year = {2019}, pmid = {31034028 PMCID: PMC6748801}, pages = {943--951}, }
Objective: Identifying drug discontinuation (DDC) events and understanding their reasons are important for medication management and drug safety surveillance.
Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0).
Jagannatha, A.; Liu, F.; Liu, W.; and Yu, H.
Drug Safety, (1): 99–111. January 2019.
doi link bibtex abstract
@article{jagannatha_overview_2019, title = {Overview of the {First} {Natural} {Language} {Processing} {Challenge} for {Extracting} {Medication}, {Indication}, and {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes} ({MADE} 1.0)}, issn = {1179-1942}, doi = {10.1007/s40264-018-0762-z}, abstract = {INTRODUCTION: This work describes the Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) corpus and provides an overview of the MADE 1.0 2018 challenge for extracting medication, indication, and adverse drug events (ADEs) from electronic health record (EHR) notes. OBJECTIVE: The goal of MADE is to provide a set of common evaluation tasks to assess the state of the art for natural language processing (NLP) systems applied to EHRs supporting drug safety surveillance and pharmacovigilance. We also provide benchmarks on the MADE dataset using the system submissions received in the MADE 2018 challenge. METHODS: The MADE 1.0 challenge has released an expert-annotated cohort of medication and ADE information comprising 1089 fully de-identified longitudinal EHR notes from 21 randomly selected patients with cancer at the University of Massachusetts Memorial Hospital. Using this cohort as a benchmark, the MADE 1.0 challenge designed three shared NLP tasks. The named entity recognition (NER) task identifies medications and their attributes (dosage, route, duration, and frequency), indications, ADEs, and severity. The relation identification (RI) task identifies relations between the named entities: medication-indication, medication-ADE, and attribute relations. The third shared task (NER-RI) evaluates NLP models that perform the NER and RI tasks jointly. In total, 11 teams from four countries participated in at least one of the three shared tasks, and 41 system submissions were received in total. RESULTS: The best systems F1 scores for NER, RI, and NER-RI were 0.82, 0.86, and 0.61, respectively. Ensemble classifiers using the team submissions improved the performance further, with an F1 score of 0.85, 0.87, and 0.66 for the three tasks, respectively. CONCLUSION: MADE results show that recent progress in NLP has led to remarkable improvements in NER and RI tasks for the clinical domain. However, some room for improvement remains, particularly in the NER-RI task.}, language = {eng}, number = {1}, journal = {Drug Safety}, author = {Jagannatha, Abhyuday and Liu, Feifan and Liu, Weisong and Yu, Hong}, month = jan, year = {2019}, pmid = {30649735 PMCID: PMC6860017}, pages = {99--111}, }
INTRODUCTION: This work describes the Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) corpus and provides an overview of the MADE 1.0 2018 challenge for extracting medication, indication, and adverse drug events (ADEs) from electronic health record (EHR) notes. OBJECTIVE: The goal of MADE is to provide a set of common evaluation tasks to assess the state of the art for natural language processing (NLP) systems applied to EHRs supporting drug safety surveillance and pharmacovigilance. We also provide benchmarks on the MADE dataset using the system submissions received in the MADE 2018 challenge. METHODS: The MADE 1.0 challenge has released an expert-annotated cohort of medication and ADE information comprising 1089 fully de-identified longitudinal EHR notes from 21 randomly selected patients with cancer at the University of Massachusetts Memorial Hospital. Using this cohort as a benchmark, the MADE 1.0 challenge designed three shared NLP tasks. The named entity recognition (NER) task identifies medications and their attributes (dosage, route, duration, and frequency), indications, ADEs, and severity. The relation identification (RI) task identifies relations between the named entities: medication-indication, medication-ADE, and attribute relations. The third shared task (NER-RI) evaluates NLP models that perform the NER and RI tasks jointly. In total, 11 teams from four countries participated in at least one of the three shared tasks, and 41 system submissions were received in total. RESULTS: The best systems F1 scores for NER, RI, and NER-RI were 0.82, 0.86, and 0.61, respectively. Ensemble classifiers using the team submissions improved the performance further, with an F1 score of 0.85, 0.87, and 0.66 for the three tasks, respectively. CONCLUSION: MADE results show that recent progress in NLP has led to remarkable improvements in NER and RI tasks for the clinical domain. However, some room for improvement remains, particularly in the NER-RI task.
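The F1 scores quoted above are computed at the entity and relation level. As a small illustration, the helper below scores exact-match spans against gold annotations; the official MADE evaluation script may differ in details such as partial matching.

```python
def span_f1(gold_spans, pred_spans):
    """Entity-level P/R/F1 over (start, end, label) spans, exact match."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy example: the medication span matches, the ADE span does not
print(span_f1({(0, 2, "Medication"), (10, 12, "ADE")},
              {(0, 2, "Medication"), (20, 21, "ADE")}))
```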
Naranjo Question Answering using End-to-End Multi-task Learning Model.
Rawat, B. P.; Li, F.; and Yu, H.
25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2547–2555. 2019.
doi link bibtex abstract
@article{rawat_naranjo_2019, title = {Naranjo {Question} {Answering} using {End}-to-{End} {Multi}-task {Learning} {Model}}, doi = {10.1145/3292500.3330770}, abstract = {In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians’ annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model with a good margin, achieving a macro-weighted f-score between 0.3652 – 0.5271 and micro-weighted f-score between 0.9523 – 0.9918.}, journal = {25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)}, author = {Rawat, Bhanu P and Li, Fei and Yu, Hong}, year = {2019}, pmid = {31799022 NIHMSID: NIHMS1058295 PMCID:PMC6887102}, pages = {2547--2555}, }
In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians’ annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model with a good margin, achieving a macro-weighted f-score between 0.3652 – 0.5271 and micro-weighted f-score between 0.9523 – 0.9918.
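A minimal sketch of the multi-task setup the abstract describes: a shared encoder with one classification head per Naranjo question, trained on the sum of per-question losses. The LSTM encoder, pooling, and sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNaranjo(nn.Module):
    def __init__(self, vocab_size, n_questions, n_answers=3, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        # one head per Naranjo question (the multi-task part)
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden, n_answers) for _ in range(n_questions))

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))
        rep = states.mean(dim=1)              # mean-pool the token states
        return [head(rep) for head in self.heads]

model = MultiTaskNaranjo(vocab_size=5000, n_questions=4)
logits = model(torch.randint(0, 5000, (2, 30)))           # batch of 2 notes
gold = [torch.randint(0, 3, (2,)) for _ in range(4)]      # per-question answers
loss = sum(F.cross_entropy(l, g) for l, g in zip(logits, gold))  # joint loss
```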
A neural abstractive summarization model guided with topic sentences.
Chen, C.; Hu, B.; Chen, Q.; and Yu, H.
In ICONIP, 2019.
link bibtex
@inproceedings{chen_neural_2019, title = {A neural abstractive summarization model guided with topic sentences}, booktitle = {{ICONIP}}, author = {Chen, Chen and Hu, Baotian and Chen, Qingcai and Yu, Hong}, year = {2019}, }
An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models.
Li, F.; and Yu, H.
Journal of the American Medical Informatics Association, 26(7): 646–654. July 2019.
Paper
doi
link
bibtex
abstract
@article{li_investigation_2019, title = {An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models}, volume = {26}, url = {https://academic.oup.com/jamia/article/26/7/646/5426087}, doi = {10.1093/jamia/ocz018}, abstract = {AbstractObjective. We aim to evaluate the effectiveness of advanced deep learning models (eg, capsule network [CapNet], adversarial training [ADV]) for single-}, language = {en}, number = {7}, urldate = {2019-12-09}, journal = {Journal of the American Medical Informatics Association}, author = {Li, Fei and Yu, Hong}, month = jul, year = {2019}, pages = {646--654}, }
Objective: We aim to evaluate the effectiveness of advanced deep learning models (eg, capsule network [CapNet], adversarial training [ADV]) for single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes.
Anticoagulant prescribing for non-valvular atrial fibrillation in the Veterans Health Administration.
Rose, A. J.; Goldberg, R.; McManus, D. D.; Kapoor, A.; Wang, V.; Liu, W.; and Yu, H.
Journal of the American Heart Association. 2019.
doi link bibtex abstract
@article{rose_anticoagulant_2019, title = {Anticoagulant prescribing for non-valvular atrial fibrillation in the {Veterans} {Health} {Administration}}, doi = {10.1161/JAHA.119.012646}, abstract = {Background Direct acting oral anticoagulants (DOACs) theoretically could contribute to addressing underuse of anticoagulation in non-valvular atrial fibrillation (NVAF). Few studies have examined this prospect, however. The potential of DOACs to address underuse of anticoagulation in NVAF could be magnified within a healthcare system that sharply limits patients' exposure to out-of-pocket copayments, such as the Veterans Health Administration (VA). Methods and Results We used a clinical data set of all patients with NVAF treated within VA from 2007 to 2016 (n=987 373). We examined how the proportion of patients receiving any anticoagulation, and which agent was prescribed, changed over time. When first approved for VA use in 2011, DOACs constituted a tiny proportion of all prescriptions for anticoagulants (2\%); by 2016, this proportion had increased to 45\% of all prescriptions and 67\% of new prescriptions. Patient characteristics associated with receiving a DOAC, rather than warfarin, included white race, better kidney function, fewer comorbid conditions overall, and no history of stroke or bleeding. In 2007, before the introduction of DOACs, 56\% of VA patients with NVAF were receiving anticoagulation; this dipped to 44\% in 2012 just after the introduction of DOACs and had risen back to 51\% by 2016. Conclusions These results do not suggest that the availability of DOACs has led to an increased proportion of patients with NVAF receiving anticoagulation, even in the context of a healthcare system that sharply limits patients' exposure to out-of-pocket copayments.}, journal = {Journal of the American Heart Association}, author = {Rose, AJ and Goldberg, R and McManus, DD and Kapoor, A and Wang, V and Liu, W and Yu, H}, year = {2019}, pmid = {31441364 PMCID:PMC6755851}, }
Background Direct acting oral anticoagulants (DOACs) theoretically could contribute to addressing underuse of anticoagulation in non-valvular atrial fibrillation (NVAF). Few studies have examined this prospect, however. The potential of DOACs to address underuse of anticoagulation in NVAF could be magnified within a healthcare system that sharply limits patients' exposure to out-of-pocket copayments, such as the Veterans Health Administration (VA). Methods and Results We used a clinical data set of all patients with NVAF treated within VA from 2007 to 2016 (n=987 373). We examined how the proportion of patients receiving any anticoagulation, and which agent was prescribed, changed over time. When first approved for VA use in 2011, DOACs constituted a tiny proportion of all prescriptions for anticoagulants (2%); by 2016, this proportion had increased to 45% of all prescriptions and 67% of new prescriptions. Patient characteristics associated with receiving a DOAC, rather than warfarin, included white race, better kidney function, fewer comorbid conditions overall, and no history of stroke or bleeding. In 2007, before the introduction of DOACs, 56% of VA patients with NVAF were receiving anticoagulation; this dipped to 44% in 2012 just after the introduction of DOACs and had risen back to 51% by 2016. Conclusions These results do not suggest that the availability of DOACs has led to an increased proportion of patients with NVAF receiving anticoagulation, even in the context of a healthcare system that sharply limits patients' exposure to out-of-pocket copayments.
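The trend figures above are per-year proportions. A short pandas sketch of that computation on hypothetical prescription records:

```python
import pandas as pd

# hypothetical per-prescription records
rx = pd.DataFrame({
    "year":       [2011, 2011, 2013, 2016, 2016, 2016],
    "drug_class": ["warfarin", "DOAC", "DOAC", "DOAC", "warfarin", "DOAC"],
})

# share of anticoagulant prescriptions that are DOACs, by year
doac_share = rx.groupby("year")["drug_class"].apply(lambda s: (s == "DOAC").mean())
print(doac_share)
```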
Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds.
Lalor, J. P.; Wu, H.; and Yu, H.
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4240–4250, Hong Kong, China, November 2019. Association for Computational Linguistics
NIHMSID: NIHMS1059054
Paper
doi
link
bibtex
abstract
@inproceedings{lalor_learning_2019, address = {Hong Kong, China}, title = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}: {Item} {Response} {Theory} with {Artificial} {Crowds}}, shorttitle = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}}, url = {https://www.aclweb.org/anthology/D19-1434}, doi = {10.18653/v1/D19-1434}, abstract = {Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.}, urldate = {2019-11-11}, booktitle = {Proceedings of the 2019 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing} and the 9th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({EMNLP}-{IJCNLP})}, publisher = {Association for Computational Linguistics}, author = {Lalor, John P. and Wu, Hao and Yu, Hong}, month = nov, year = {2019}, pmcid = {PMC6892593}, pmid = {31803865}, note = {NIHMSID: NIHMS1059054}, pages = {4240--4250}, }
Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.
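For readers unfamiliar with IRT, the sketch below fits a two-parameter logistic (2PL) model to a binary response matrix by joint maximum likelihood with autograd; rows play the role of the artificial crowd of DNNs. The paper's estimation procedure is more principled (eg, variational inference), so treat this only as a shape-of-the-data illustration.

```python
import torch
import torch.nn.functional as F

# synthetic response matrix: 500 "subjects" (DNN ensemble members) x 40 items
R = (torch.rand(500, 40) < 0.7).float()

theta = torch.zeros(500, requires_grad=True)   # subject ability
a = torch.ones(40, requires_grad=True)         # item discrimination
b = torch.zeros(40, requires_grad=True)        # item difficulty
opt = torch.optim.Adam([theta, a, b], lr=0.05)

for _ in range(300):
    opt.zero_grad()
    logits = a * (theta.unsqueeze(1) - b)      # 2PL: P(correct) = sigmoid(a(theta - b))
    loss = F.binary_cross_entropy_with_logits(logits, R)
    loss.backward()
    opt.step()
```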
Clinical Question Answering from Electronic Health Records.
Singh, B.; Li, F.; and Yu, H.
In The MLHC 2019 research track proceedings, 2019.
Paper
link
bibtex
@inproceedings{singh_clinical_2019, title = {Clinical {Question} {Answering} from {Electronic} {Health} {Records}}, url = {https://static1.squarespace.com/static/59d5ac1780bd5ef9c396eda6/t/5d472f54d73cd5000124d13c/1564946262055/Rawat.pdf}, booktitle = {The {MLHC} 2019 research track proceedings}, author = {Singh, Bhanu and Li, Fei and Yu, Hong}, year = {2019}, }
Comparing Human and DNN-Ensemble Response Patterns for Item Response Theory Model Fitting.
Lalor, J.; Wu, H.; and Yu, H.
2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), The Workshop on Cognitive Modeling and Computational Linguistics (CMCL). 2019.
Paper
link
bibtex
@article{lalor_comparing_2019, title = {Comparing {Human} and {DNN}-{Ensemble} {Response} {Patterns} for {Item} {Response} {Theory} {Model} {Fitting}}, url = {http://jplalor.github.io/pdfs/cmcl19_irt.pdf}, journal = {2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), The Workshop on Cognitive Modeling and Computational Linguistics (CMCL)}, author = {Lalor, John and Wu, Hao and Yu, Hong}, year = {2019}, }
QuikLitE, a Framework for Quick Literacy Evaluation in Medicine: Development and Validation.
Zheng, J.; and Yu, H.
Journal of Medical Internet Research, 21(2): e12525. 2019.
Paper
doi
link
bibtex
abstract
@article{zheng_quiklite_2019, title = {{QuikLitE}, a {Framework} for {Quick} {Literacy} {Evaluation} in {Medicine}: {Development} and {Validation}}, volume = {21}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {{QuikLitE}, a {Framework} for {Quick} {Literacy} {Evaluation} in {Medicine}}, url = {https://www.jmir.org/2019/2/e12525/}, doi = {10.2196/jmir.12525}, abstract = {Background: A plethora of health literacy instruments was developed over the decades. They usually start with experts curating passages of text or word lists, followed by psychometric validation and revision based on test results obtained from a sample population. This process is costly and it is difficult to customize for new usage scenarios. Objective: This study aimed to develop and evaluate a framework for dynamically creating test instruments that can provide a focused assessment of patients’ health literacy. Methods: A health literacy framework and scoring method were extended from the vocabulary knowledge test to accommodate a wide range of item difficulties and various degrees of uncertainty in the participant’s answer. Web-based tests from Amazon Mechanical Turk users were used to assess reliability and validity. Results: Parallel forms of our tests showed high reliability (correlation=.78; 95\% CI 0.69-0.85). Validity measured as correlation with an electronic health record comprehension instrument was higher (.47-.61 among 3 groups) than 2 existing tools (Short Assessment of Health Literacy-English, .38-.43; Short Test of Functional Health Literacy in Adults, .34-.46). Our framework is able to distinguish higher literacy levels that are often not measured by other instruments. It is also flexible, allowing customizations to the test the designer’s focus on a particular interest in a subject matter or domain. The framework is among the fastest health literacy instrument to administer. Conclusions: We proposed a valid and highly reliable framework to dynamically create health literacy instruments, alleviating the need to repeat a time-consuming process when a new use scenario arises. This framework can be customized to a specific need on demand and can measure skills beyond the basic level. [J Med Internet Res 2019;21(2):e12525]}, language = {en}, number = {2}, urldate = {2019-02-22}, journal = {Journal of Medical Internet Research}, author = {Zheng, Jiaping and Yu, Hong}, year = {2019}, pmid = {30794206 PMCID: 6406229}, pages = {e12525}, }
Background: A plethora of health literacy instruments was developed over the decades. They usually start with experts curating passages of text or word lists, followed by psychometric validation and revision based on test results obtained from a sample population. This process is costly and it is difficult to customize for new usage scenarios. Objective: This study aimed to develop and evaluate a framework for dynamically creating test instruments that can provide a focused assessment of patients’ health literacy. Methods: A health literacy framework and scoring method were extended from the vocabulary knowledge test to accommodate a wide range of item difficulties and various degrees of uncertainty in the participant’s answer. Web-based tests from Amazon Mechanical Turk users were used to assess reliability and validity. Results: Parallel forms of our tests showed high reliability (correlation=.78; 95% CI 0.69-0.85). Validity measured as correlation with an electronic health record comprehension instrument was higher (.47-.61 among 3 groups) than 2 existing tools (Short Assessment of Health Literacy-English, .38-.43; Short Test of Functional Health Literacy in Adults, .34-.46). Our framework is able to distinguish higher literacy levels that are often not measured by other instruments. It is also flexible, allowing customizations to fit the test designer’s focus on a particular interest in a subject matter or domain. The framework is among the fastest health literacy instruments to administer. Conclusions: We proposed a valid and highly reliable framework to dynamically create health literacy instruments, alleviating the need to repeat a time-consuming process when a new use scenario arises. This framework can be customized to a specific need on demand and can measure skills beyond the basic level. [J Med Internet Res 2019;21(2):e12525]
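The reported reliability is a correlation between parallel test forms. As a toy illustration of that statistic (synthetic data, not the study's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_literacy = rng.normal(0, 1, 200)
form_a = true_literacy + rng.normal(0, 0.5, 200)   # scores on parallel form A
form_b = true_literacy + rng.normal(0, 0.5, 200)   # scores on parallel form B

r, p = stats.pearsonr(form_a, form_b)              # parallel-forms reliability
print(f"r = {r:.2f}")
```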
Towards Drug Safety Surveillance and Pharmacovigilance: Current Progress in Detecting Medication and Adverse Drug Events from Electronic Health Records.
Liu, F.; Jagannatha, A.; and Yu, H.
Drug Safety. January 2019.
Paper
doi
link
bibtex
@article{liu_towards_2019, title = {Towards {Drug} {Safety} {Surveillance} and {Pharmacovigilance}: {Current} {Progress} in {Detecting} {Medication} and {Adverse} {Drug} {Events} from {Electronic} {Health} {Records}}, issn = {1179-1942}, shorttitle = {Towards {Drug} {Safety} {Surveillance} and {Pharmacovigilance}}, url = {https://doi.org/10.1007/s40264-018-0766-8}, doi = {10.1007/s40264-018-0766-8}, language = {en}, urldate = {2019-01-31}, journal = {Drug Safety}, author = {Liu, Feifan and Jagannatha, Abhyuday and Yu, Hong}, month = jan, year = {2019}, pmid = {30649734}, }
Generating Classical Chinese Poems from Vernacular Chinese.
Yang, Z.; Cai, P.; Feng, Y.; Li, F.; Feng, W.; Chiu, E. S.; and Yu, H.
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6156–6165, Hong Kong, China, November 2019. Association for Computational Linguistics
Paper
doi
link
bibtex
abstract
@inproceedings{yang_generating_2019, address = {Hong Kong, China}, title = {Generating {Classical} {Chinese} {Poems} from {Vernacular} {Chinese}}, url = {https://www.aclweb.org/anthology/D19-1637}, doi = {10.18653/v1/D19-1637}, abstract = {Classical Chinese poetry is a jewel in the treasure house of Chinese culture. Previous poem generation models only allow users to employ keywords to interfere the meaning of generated poems, leaving the dominion of generation to the model. In this paper, we propose a novel task of generating classical Chinese poems from vernacular, which allows users to have more control over the semantic of generated poems. We adapt the approach of unsupervised machine translation (UMT) to our task. We use segmentation-based padding and reinforcement learning to address under-translation and over-translation respectively. According to experiments, our approach significantly improve the perplexity and BLEU compared with typical UMT models. Furthermore, we explored guidelines on how to write the input vernacular to generate better poems. Human evaluation showed our approach can generate high-quality poems which are comparable to amateur poems.}, urldate = {2019-11-11}, booktitle = {Proceedings of the 2019 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing} and the 9th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({EMNLP}-{IJCNLP})}, publisher = {Association for Computational Linguistics}, author = {Yang, Zhichao and Cai, Pengshan and Feng, Yansong and Li, Fei and Feng, Weijiang and Chiu, Elena Suet-Ying and yu, hong}, month = nov, year = {2019}, pages = {6156--6165}, }
Classical Chinese poetry is a jewel in the treasure house of Chinese culture. Previous poem generation models only allow users to employ keywords to interfere the meaning of generated poems, leaving the dominion of generation to the model. In this paper, we propose a novel task of generating classical Chinese poems from vernacular, which allows users to have more control over the semantic of generated poems. We adapt the approach of unsupervised machine translation (UMT) to our task. We use segmentation-based padding and reinforcement learning to address under-translation and over-translation respectively. According to experiments, our approach significantly improve the perplexity and BLEU compared with typical UMT models. Furthermore, we explored guidelines on how to write the input vernacular to generate better poems. Human evaluation showed our approach can generate high-quality poems which are comparable to amateur poems.
Method for Meta-Level Continual Learning.
Yu, H.; and Munkhdalai, T.
January 2019.
Paper
link
bibtex
abstract
@patent{yu_method_2019, title = {Method for {Meta}-{Level} {Continual} {Learning}}, url = {https://patents.google.com/patent/US20190034798A1/en}, abstract = {Classification of an input task data set by meta level continual learning includes analyzing first and second training data sets in a task space to generate first and second meta weights and a slow weight value, and comparing an input task data set to the slow weight to generate a fast weight. The first and second meta weights are parameterized with the fast weight value to update the slow weight value, whereby a value is associated with the input task data set, thereby classifying the input task data set by meta level continual learning.}, nationality = {US}, assignee = {University Of Massachusetts Medical School}, number = {US20190034798A1}, urldate = {2019-04-10}, author = {Yu, Hong and Munkhdalai, Tsendsuren}, month = jan, year = {2019}, keywords = {loss, meta, slow, task, weight}, }
Classification of an input task data set by meta level continual learning includes analyzing first and second training data sets in a task space to generate first and second meta weights and a slow weight value, and comparing an input task data set to the slow weight to generate a fast weight. The first and second meta weights are parameterized with the fast weight value to update the slow weight value, whereby a value is associated with the input task data set, thereby classifying the input task data set by meta level continual learning.
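A schematic PyTorch reading of the fast/slow-weight mechanism in the claim language: slow weights are ordinary parameters learned across tasks, while a small meta network emits per-example fast weights that parameterize the slow weights at prediction time. This is an interpretive sketch, not the patented method.

```python
import torch
import torch.nn as nn

class FastSlowClassifier(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.slow = nn.Linear(dim, n_classes)        # slow weights, updated across tasks
        self.meta = nn.Linear(dim, n_classes * dim)  # emits per-example fast weights

    def forward(self, x):                            # x: (batch, dim)
        fast = self.meta(x).view(-1, self.slow.out_features, x.size(-1))
        # parameterize the slow weights with the fast weights for this input
        w = self.slow.weight.unsqueeze(0) + fast
        return torch.einsum("bci,bi->bc", w, x) + self.slow.bias

logits = FastSlowClassifier(dim=16, n_classes=3)(torch.randn(4, 16))
```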
Advancing Clinical Research Through Natural Language Processing on Electronic Health Records: Traditional Machine Learning Meets Deep Learning.
Liu, F.; Weng, C.; and Yu, H.
In Richesson, R. L.; and Andrews, J. E., editor(s), Clinical Research Informatics, of Health Informatics, pages 357–378. Springer International Publishing, Cham, 2019.
Paper
doi
link
bibtex
abstract
@incollection{liu_advancing_2019, address = {Cham}, series = {Health {Informatics}}, title = {Advancing {Clinical} {Research} {Through} {Natural} {Language} {Processing} on {Electronic} {Health} {Records}: {Traditional} {Machine} {Learning} {Meets} {Deep} {Learning}}, isbn = {978-3-319-98779-8}, shorttitle = {Advancing {Clinical} {Research} {Through} {Natural} {Language} {Processing} on {Electronic} {Health} {Records}}, url = {https://doi.org/10.1007/978-3-319-98779-8_17}, abstract = {Electronic health records (EHR) capture “real-world” disease and care processes and hence offer richer and more generalizable data for comparative effectiveness research than traditional randomized clinical trial studies. With the increasingly broadening adoption of EHR worldwide, there is a growing need to widen the use of EHR data to support clinical research. A big barrier to this goal is that much of the information in EHR is still narrative. This chapter describes the foundation of biomedical language processing and explains how traditional machine learning and the state-of-the-art deep learning techniques can be employed in the context of extracting and transforming narrative information in EHR to support clinical research.}, language = {en}, urldate = {2019-04-09}, booktitle = {Clinical {Research} {Informatics}}, publisher = {Springer International Publishing}, author = {Liu, Feifan and Weng, Chunhua and Yu, Hong}, editor = {Richesson, Rachel L. and Andrews, James E.}, year = {2019}, doi = {10.1007/978-3-319-98779-8_17}, keywords = {Biomedical natural language processing, Clinical research, Deep learning, Electronic health records, Machine learning, Rule-based approach}, pages = {357--378}, }
Electronic health records (EHR) capture “real-world” disease and care processes and hence offer richer and more generalizable data for comparative effectiveness research than traditional randomized clinical trial studies. With the increasingly broadening adoption of EHR worldwide, there is a growing need to widen the use of EHR data to support clinical research. A big barrier to this goal is that much of the information in EHR is still narrative. This chapter describes the foundation of biomedical language processing and explains how traditional machine learning and the state-of-the-art deep learning techniques can be employed in the context of extracting and transforming narrative information in EHR to support clinical research.
2018
(13)
A natural language processing system that links medical terms in electronic health record notes to lay definitions: system development using physician reviews.
Chen, J.; Druhl, E.; Polepalli Ramesh, B.; Houston, T. K.; Brandt, C. A.; Zulman, D. M.; Vimalananda, V. G.; Malkani, S.; and Yu, H.
Journal of Medical Internet Research, 20(1): e26. January 2018.
doi link bibtex abstract
@article{chen_natural_2018, title = {A natural language processing system that links medical terms in electronic health record notes to lay definitions: system development using physician reviews}, volume = {20}, issn = {1438-8871}, shorttitle = {A natural language processing system that links medical terms in electronic health record notes to lay definitions}, doi = {10.2196/jmir.8669}, abstract = {BACKGROUND: Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. OBJECTIVE: The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. METHODS: NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. We developed innovative computational methods, including an adapted distant supervision algorithm to prioritize medical terms important for EHR comprehension to facilitate the effort of building CoDeMed. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. RESULTS: Physician feedback was mixed. Positive feedback on NoteAid included (1) Easy to use, (2) Good visual display, (3) Satisfactory system speed, and (4) Adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on different contexts, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid's user interface and a number of definitions, and added 4502 more definitions in CoDeMed. CONCLUSIONS: Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool that has the potential to improve patient EHR comprehension and experience using patient portals. Future ongoing work will develop algorithms to handle ambiguous medical terms and test and evaluate NoteAid with patients.}, language = {eng}, number = {1}, journal = {Journal of Medical Internet Research}, author = {Chen, Jinying and Druhl, Emily and Polepalli Ramesh, Balaji and Houston, Thomas K. and Brandt, Cynthia A. and Zulman, Donna M. and Vimalananda, Varsha G. and Malkani, Samir and Yu, Hong}, month = jan, year = {2018}, pmid = {29358159 PMCID: PMC5799720}, keywords = {computer software, consumer health informatics, electronic health records, natural language processing, usability testing}, pages = {e26}, }
BACKGROUND: Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. OBJECTIVE: The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. METHODS: NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. We developed innovative computational methods, including an adapted distant supervision algorithm to prioritize medical terms important for EHR comprehension to facilitate the effort of building CoDeMed. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. RESULTS: Physician feedback was mixed. Positive feedback on NoteAid included (1) Easy to use, (2) Good visual display, (3) Satisfactory system speed, and (4) Adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on different contexts, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid's user interface and a number of definitions, and added 4502 more definitions in CoDeMed. CONCLUSIONS: Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool that has the potential to improve patient EHR comprehension and experience using patient portals. Future ongoing work will develop algorithms to handle ambiguous medical terms and test and evaluate NoteAid with patients.
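The core linking step described above, matching medical terms in note text to lay definitions, can be sketched as longest-match-first dictionary lookup; the two-entry lexicon below is a toy stand-in for CoDeMed.

```python
import re

codemed = {  # toy lay-definition lexicon standing in for CoDeMed
    "hypertension": "high blood pressure",
    "dyspnea": "shortness of breath",
}

def link_terms(note: str, lexicon: dict) -> str:
    # longest-match-first so multiword terms would beat their substrings
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.I)
    return pattern.sub(lambda m: f"{m.group(0)} [{lexicon[m.group(0).lower()]}]", note)

print(link_terms("Reports dyspnea; history of hypertension.", codemed))
# Reports dyspnea [shortness of breath]; history of hypertension [high blood pressure].
```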
Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning.
Munkhdalai, T.; Liu, F.; and Yu, H.
JMIR public health and surveillance, 4(2): e29. April 2018.
doi link bibtex abstract
@article{munkhdalai_clinical_2018, title = {Clinical {Relation} {Extraction} {Toward} {Drug} {Safety} {Surveillance} {Using} {Electronic} {Health} {Record} {Narratives}: {Classical} {Learning} {Versus} {Deep} {Learning}}, volume = {4}, issn = {2369-2960}, shorttitle = {Clinical {Relation} {Extraction} {Toward} {Drug} {Safety} {Surveillance} {Using} {Electronic} {Health} {Record} {Narratives}}, doi = {10.2196/publichealth.9361}, abstract = {BACKGROUND: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS: Our results show that the SVM model achieved the best average F1-score of 89.1\% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72\%) as well as the rule induction baseline system (F1-score of 7.47\%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35\%. CONCLUSIONS: It shows that classical learning models (SVM) remains advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate a great potential of significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.}, language = {eng}, number = {2}, journal = {JMIR public health and surveillance}, author = {Munkhdalai, Tsendsuren and Liu, Feifan and Yu, Hong}, month = apr, year = {2018}, pmid = {29695376 PMCID: PMC5943628}, keywords = {drug-related side effects and adverse reactions, electronic health records, medical informatics applications, natural language processing, neural networks}, pages = {e29}, }
BACKGROUND: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS: Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. CONCLUSIONS: Our results show that classical learning models (SVM) remain advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate great potential for significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.
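The macro-averaged metrics reported above are straightforward to reproduce. A minimal sketch, treating each relation instance as a single-label classification decision (the relation labels below are invented for illustration, not the corpus's actual label set):

```python
from collections import defaultdict

def macro_prf1(gold, pred, types):
    """Macro-averaged precision, recall, and F1 across relation types."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    prec, rec, f1 = [], [], []
    for t in types:
        p_t = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        r_t = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        prec.append(p_t)
        rec.append(r_t)
        f1.append(2 * p_t * r_t / (p_t + r_t) if p_t + r_t else 0.0)
    n = len(types)
    return sum(prec) / n, sum(rec) / n, sum(f1) / n

# Toy usage with hypothetical relation labels:
gold = ["med-ADE", "med-dosage", "severity-ADE", "med-ADE"]
pred = ["med-ADE", "med-ADE", "severity-ADE", "med-dosage"]
print(macro_prf1(gold, pred, ["med-ADE", "med-dosage", "severity-ADE"]))
```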
A hybrid Neural Network Model for Joint Prediction of Presence and Period Assertions of Medical Events in Clinical Notes.
Li, R.; Jagannatha, A. N.; and Yu, H.
AMIA Annual Symposium Proceedings, 2017: 1149–1158. April 2018.
Paper link bibtex abstract
@article{rumeng_hybrid_2018, title = {A hybrid {Neural} {Network} {Model} for {Joint} {Prediction} of {Presence} and {Period} {Assertions} of {Medical} {Events} in {Clinical} {Notes}}, volume = {2017}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977733/}, abstract = {In this paper, we propose a novel neural network architecture for clinical text mining. We formulate this hybrid neural network model (HNN), composed of recurrent neural network and deep residual network, to jointly predict the presence and period assertion values associated with medical events in clinical texts. We evaluate the effectiveness of our model on a corpus of expert-annotated longitudinal Electronic Health Records (EHR) notes from Cancer patients. Our experiments show that HNN improves the joint assertion classification accuracy as compared to conventional baselines.}, urldate = {2018-10-01}, journal = {AMIA Annual Symposium Proceedings}, author = {Rumeng, Li and Abhyuday N, Jagannatha and Hong, Yu}, month = apr, year = {2018}, pmid = {29854183}, pmcid = {PMC5977733}, pages = {1149--1158}, }
In this paper, we propose a novel neural network architecture for clinical text mining. We formulate this hybrid neural network model (HNN), composed of recurrent neural network and deep residual network, to jointly predict the presence and period assertion values associated with medical events in clinical texts. We evaluate the effectiveness of our model on a corpus of expert-annotated longitudinal Electronic Health Records (EHR) notes from Cancer patients. Our experiments show that HNN improves the joint assertion classification accuracy as compared to conventional baselines.
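To make the joint prediction concrete: a shared sentence encoder can feed two classification heads whose cross-entropy losses are summed. This is only a minimal sketch of the joint setup, not the paper's HNN (which combines a recurrent network with a deep residual network); all sizes are invented:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointAssertionModel(nn.Module):
    """Shared encoder with separate presence and period heads (sketch)."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128,
                 n_presence=2, n_period=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.presence_head = nn.Linear(2 * hidden, n_presence)
        self.period_head = nn.Linear(2 * hidden, n_period)

    def forward(self, token_ids):
        _, h = self.encoder(self.emb(token_ids))   # h: (2, batch, hidden)
        sent = torch.cat([h[0], h[1]], dim=-1)     # (batch, 2*hidden)
        return self.presence_head(sent), self.period_head(sent)

# Joint training simply sums the two task losses:
model = JointAssertionModel(vocab_size=1000)
x = torch.randint(0, 1000, (4, 12))                # 4 toy sentences
pres, per = model(x)
y_pres = torch.zeros(4, dtype=torch.long)          # dummy gold labels
y_per = torch.zeros(4, dtype=torch.long)
loss = F.cross_entropy(pres, y_pres) + F.cross_entropy(per, y_per)
```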
Assessing Readability of Medical Documents: A Ranking Approach.
Zheng, J.; and Yu, H.
JMIR Medical Informatics. March 2018.
doi link bibtex abstract
@article{zheng_assessing_2018, title = {Assessing {Readability} of {Medical} {Documents}: {A} {Ranking} {Approach}.}, doi = {DOI: 10.2196/medinform.8611}, abstract = {BACKGROUND: The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. OBJECTIVE: Our objective was to develop a machine learning-based system to assess readability levels of complex documents such as EHR notes. METHODS: We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. RESULTS: Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). CONCLUSIONS: We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. It is not constrained to a predefined set of readability levels, a common design in many machine learning-based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge.}, journal = {The Journal of Medical Internet Research Medical Informatics}, author = {Zheng, JP and Yu, H}, month = mar, year = {2018}, pmid = {29572199 PMCID: PMC5889493}, }
BACKGROUND: The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. OBJECTIVE: Our objective was to develop a machine learning-based system to assess readability levels of complex documents such as EHR notes. METHODS: We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. RESULTS: Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). CONCLUSIONS: We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. It is not constrained to a predefined set of readability levels, a common design in many machine learning-based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge.
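The Kendall coefficient of concordance used in the evaluation is a closed-form statistic over rank data. A small sketch for the no-ties case (the crowdsourced ratings in the study would first be converted to ranks):

```python
def kendalls_w(rankings):
    """Kendall's W for m raters each ranking the same n items (no ties).
    rankings: list of m lists, each giving rank 1..n per item."""
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean = sum(rank_sums) / n
    s = sum((rs - mean) ** 2 for rs in rank_sums)   # spread of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))         # 1 = perfect agreement

# Three toy raters ranking four documents (1 = easiest):
print(kendalls_w([[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]))  # ~0.78
```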
Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study.
Lalor, J.; Wu, H.; Munkhdalai, T.; and Yu, H.
In EMNLP, 2018.
Paper doi link bibtex abstract
@inproceedings{lalor_understanding_2018, title = {Understanding {Deep} {Learning} {Performance} through an {Examination} of {Test} {Set} {Difficulty}: {A} {Psychometric} {Case} {Study}}, url = {https://arxiv.org/abs/1702.04811v3}, doi = {DOI: 10.18653/v1/D18-1500}, abstract = {Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.}, booktitle = {{EMNLP}}, author = {Lalor, John and Wu, Hao and Munkhdalai, Tsendsuren and Yu, Hong}, year = {2018}, }
Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.
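For context, the psychometric machinery behind such difficulty estimates is the logistic item response function. A sketch of the common three-parameter form (the parameter values below are arbitrary, not fitted values from the paper):

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL IRT model: probability that a respondent of ability theta answers
    an item with discrimination a, difficulty b, and guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A hard item (b = 2) is rarely answered correctly at average ability:
print(p_correct(theta=0.0, b=2.0))   # ~0.12
print(p_correct(theta=2.5, b=2.0))   # ~0.62
```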
Soft Label Memorization-Generalization for Natural Language Inference.
Lalor, J.; Wu, H.; and Yu, H.
In 2018.
Paper link bibtex abstract
@inproceedings{lalor_soft_2018, title = {Soft {Label} {Memorization}-{Generalization} for {Natural} {Language} {Inference}.}, url = {https://arxiv.org/abs/1702.08563v3}, abstract = {Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03\% of training set size) we can improve generalization performance over several baselines.}, author = {Lalor, John and Wu, Hao and Yu, Hong}, year = {2018}, }
Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
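The central training signal here is a cross-entropy computed against the crowd's label distribution instead of a one-hot target. A hedged sketch of that loss (not the exact SLMG objective, which combines soft-label fine-tuning with standard training):

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_targets):
    """Cross-entropy against a distribution over labels, e.g. the empirical
    distribution of crowdsourced annotations for each example."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Annotators split 70/20/10 over {entailment, neutral, contradiction}:
logits = torch.randn(1, 3)
soft = torch.tensor([[0.7, 0.2, 0.1]])
print(soft_label_loss(logits, soft))
```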
Sentence Simplification with Memory-Augmented Neural Networks.
Vu, T.; Hu, B.; Munkhdalai, T.; and Yu, H.
In North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.
doi link bibtex abstract
@inproceedings{vu_sentence_2018, title = {Sentence {Simplification} with {Memory}-{Augmented} {Neural} {Networks}}, doi = {DOI:10.18653/v1/N18-2013}, abstract = {Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.}, booktitle = {North {American} {Chapter} of the {Association} for {Computational} {Linguistics}: {Human} {Language} {Technologies}}, author = {Vu, Tu and Hu, Baotian and Munkhdalai, Tsendsuren and Yu, Hong}, year = {2018}, }
Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.
Recent Trends In Oral Anticoagulant Use and Post-Discharge Complications Among Atrial Fibrillation Patients With Acute Myocardial Infarction.
Amartya Kundu; Kevin O'Day; Darleen M. Lessard; Joel M. Gore; Steven A. Lubitz; Hong Yu; Mohammed W. Akhter; Daniel Z. Fisher; Robert M. Hayward Jr.; Nils Henninger; Jane S. Saczynski; Allan J. Walkey; Alok Kapoor; Jorge Yarzebski; Robert J. Goldberg; and David D. McManus
Journal of Atrial Fibrillation, 2018.
doi link bibtex abstract
@inproceedings{amartya_kundu_recent_2018, title = {Recent {Trends} {In} {Oral} {Anticoagulant} {Use} and {Post}-{Discharge} {Complications} {Among} {Atrial} {Fibrillation} {Patients} {With} {Acute} {Myocardial} {Infarction}}, doi = {DOI: 10.4022/jafib.1749}, abstract = {BACKGROUND: Atrial fibrillation (AF) is a common complication of acute myocardial infarction (AMI).The CHA2DS2VAScand CHADS2risk scoresare used to identifypatients with AF at risk for strokeand to guide oral anticoagulants (OAC) use, including patients with AMI. However, the epidemiology of AF, further stratifiedaccording to patients' risk of stroke, has not been wellcharacterized among those hospitalized for AMI. METHODS: We examined trends in the frequency of AF, rates of discharge OAC use, and post-discharge outcomes among 6,627 residents of the Worcester, Massachusetts area who survived hospitalization for AMI at 11 medical centers between 1997 and 2011. RESULTS: A total of 1,050AMI patients had AF (16\%) andthe majority (91\%)had a CHA2DS2VAScscore {\textgreater}2.AF rates were highest among patients in the highest stroke risk group.In comparison to patients without AF, patients with AMI and AF in the highest stroke risk category had higher rates of post-discharge complications, including higher 30-day re-hospitalization [27 \% vs. 17 \%], 30-day post-discharge death [10 \% vs. 5\%], and 1-year post-discharge death [46 \% vs. 18 \%] (p {\textless} 0.001 for all). Notably, fewerthan half of guideline-eligible AF patientsreceived an OACprescription at discharge. Usage rates for other evidence-based therapiessuch as statins and beta-blockers,lagged in comparison to AMI patients free from AF. CONCLUSIONS: Our findings highlight the need to enhance efforts towards stroke prevention among AMI survivors with AF.}, publisher = {Journal of Atrial Fibrillation}, author = {{Amartya Kundu} and {Kevin O ’Day} and {Darleen M. Lessard} and {Joel M. Gore1} and {Steven A. Lubitz} and {Hong Yu} and {Mohammed W. Akhter} and {Daniel Z. Fisher} and {Robert M. Hayward Jr.} and {Nils Henninger} and {Jane S. Saczynski} and {Allan J. Walkey} and {Alok Kapoor} and {Jorge Yarzebski} and {Robert J. Goldberg} and {David D. McManus}}, year = {2018}, pmid = {29988239 PMCID: PMC6006973}, }
BACKGROUND: Atrial fibrillation (AF) is a common complication of acute myocardial infarction (AMI). The CHA2DS2-VASc and CHADS2 risk scores are used to identify patients with AF at risk for stroke and to guide oral anticoagulant (OAC) use, including patients with AMI. However, the epidemiology of AF, further stratified according to patients' risk of stroke, has not been well characterized among those hospitalized for AMI. METHODS: We examined trends in the frequency of AF, rates of discharge OAC use, and post-discharge outcomes among 6,627 residents of the Worcester, Massachusetts area who survived hospitalization for AMI at 11 medical centers between 1997 and 2011. RESULTS: A total of 1,050 AMI patients had AF (16%) and the majority (91%) had a CHA2DS2-VASc score >2. AF rates were highest among patients in the highest stroke risk group. In comparison to patients without AF, patients with AMI and AF in the highest stroke risk category had higher rates of post-discharge complications, including higher 30-day re-hospitalization [27% vs. 17%], 30-day post-discharge death [10% vs. 5%], and 1-year post-discharge death [46% vs. 18%] (p < 0.001 for all). Notably, fewer than half of guideline-eligible AF patients received an OAC prescription at discharge. Usage rates for other evidence-based therapies such as statins and beta-blockers lagged in comparison to AMI patients free from AF. CONCLUSIONS: Our findings highlight the need to enhance efforts towards stroke prevention among AMI survivors with AF.
ComprehENotes: An Instrument to Assess Patient Reading Comprehension of Electronic Health Record Notes: Development and Validation.
Lalor, J.; Wu, H.; Chen, L.; Mazor, K.; and Yu, H.
The Journal of Medical Internet Research. April 2018.
doi link bibtex abstract
@article{lalor_comprehenotes:_2018, title = {{ComprehENotes}: {An} {Instrument} to {Assess} {Patient} {EHR} {Note} {Reading} {Comprehension} of {Electronic} {Health} {Record} {Notes}: {Development} and {Validation}}, doi = {DOI: 10.2196/jmir.9380}, abstract = {BACKGROUND: Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE: The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS: We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS: Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS: We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. The final set of questions is the first test of EHR note comprehension.}, journal = {The Journal of Medical Internet Research}, author = {Lalor, J and Wu, H and Chen, L and Mazor, K and Yu, H}, month = apr, year = {2018}, pmid = {29695372 PMCID: PMC5943623}, }
BACKGROUND: Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE: The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS: We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS: Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS: We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. The final set of questions is the first test of EHR note comprehension.
Detecting Hypoglycemia Incidence from Patients’ Secure Messages.
Chen, J.; and Yu, H.
In 2018.
link bibtex
@inproceedings{chen_detecting_2018, title = {Detecting {Hypoglycemia} {Incidence} from {Patients}’ {Secure} {Messages}}, author = {Chen, J and Yu, H}, year = {2018}, }
Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning.
Li, F.; Liu, W.; and Yu, H.
JMIR medical informatics, 6(4): e12159. November 2018.
doi link bibtex abstract
@article{li_extraction_2018, title = {Extraction of {Information} {Related} to {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes}: {Design} of an {End}-to-{End} {Model} {Based} on {Deep} {Learning}}, volume = {6}, issn = {2291-9694}, shorttitle = {Extraction of {Information} {Related} to {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes}}, doi = {10.2196/12159}, abstract = {BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps-named entity recognition and relation extraction-our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9\%), which is significantly higher than that (F1=61.7\%) of the best system in the MADE1.0 challenge. HardMTL further improved the F1 by 0.8\%, boosting the F1 to 66.7\%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.}, language = {eng}, number = {4}, journal = {JMIR medical informatics}, author = {Li, Fei and Liu, Weisong and Yu, Hong}, month = nov, year = {2018}, pmid = {30478023 PMCID: PMC6288593}, keywords = {adverse drug event, deep learning, multi-task learning, named entity recognition, natural language processing, relation extraction}, pages = {e12159}, }
BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps-named entity recognition and relation extraction-our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9%), which is significantly higher than that (F1=61.7%) of the best system in the MADE1.0 challenge. HardMTL further improved the F1 by 0.8%, boosting the F1 to 66.7%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.
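Of the three MTL variants, hard parameter sharing is the simplest to picture: both tasks read the same encoder and only the output layers differ. A toy sketch under that assumption (invented sizes, and no CRF layer, unlike the paper's BiLSTM-CRF tagger):

```python
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """One shared BiLSTM encoder; separate NER and relation heads."""
    def __init__(self, vocab=5000, emb=100, hidden=128,
                 n_tags=19, n_relations=8):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.shared = nn.LSTM(emb, hidden, batch_first=True,
                              bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, n_tags)       # per token
        self.rel_head = nn.Linear(4 * hidden, n_relations)  # per entity pair

    def forward(self, ids, pair):
        h, _ = self.shared(self.emb(ids))                   # (B, T, 2*hidden)
        tag_logits = self.ner_head(h)                       # NER task
        i, j = pair                                         # head/tail positions
        rel_logits = self.rel_head(torch.cat([h[:, i], h[:, j]], dim=-1))
        return tag_logits, rel_logits

model = HardSharedMTL()
tags, rels = model(torch.randint(0, 5000, (2, 20)), pair=(3, 7))
```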
Reference Standard Development to Train Natural Language Processing Algorithms to Detect Problematic Buprenorphine-Naloxone Therapy.
Celena B Peters; Fran Cunningham; Adam Gordon; Hong Yu; Cedric Salone; Jessica Zacher; Ronald Carico; Jianwei Leng; Nikolh Durley; Weisong Liu; Chao-Chin Lu; Emily Druhl; Feifan Liu; and Brian C Sauer
In VA Pharmacy Informatics Conference 2018, 2018.
Paper link bibtex
@inproceedings{celena_b_peters_reference_2018, title = {Reference {Standard} {Development} to {Train} {Natural} {Language} {Processing} {Algorithms} to {Detect} {Problematic} {Buprenorphine}-{Naloxone} {Therapy}}, url = {https://vapharmacytraining.remote-learner.net/mod/resource/view.php?id=13218}, booktitle = {{VA} {Pharmacy} {Informatics} {Conference} 2018}, author = {{Celena B Peters} and {Fran Cunningham} and {Adam Gordon} and {Hong Yu} and {Cedric Salone} and {Jessica Zacher} and {Ronald Carico} and {Jianwei Leng} and {Nikolh Durley} and {Weisong Liu} and {Chao-Chin Lu} and {Emily Druhl} and {Feifan Liu} and {Brian C Sauer}}, year = {2018}, }
Inadequate diversity of information resources searched in US-affiliated systematic reviews and meta-analyses: 2005-2016.
Pradhan, R.; Garnick, K.; Barkondaj, B.; Jordan, H. S.; Ash, A.; and Yu, H.
Journal of Clinical Epidemiology, 102: 50–62. October 2018.
doi link bibtex abstract
@article{pradhan_inadequate_2018, title = {Inadequate diversity of information resources searched in {US}-affiliated systematic reviews and meta-analyses: 2005-2016}, volume = {102}, issn = {1878-5921}, shorttitle = {Inadequate diversity of information resources searched in {US}-affiliated systematic reviews and meta-analyses}, doi = {10.1016/j.jclinepi.2018.05.024}, abstract = {OBJECTIVE: Systematic reviews and meta-analyses (SRMAs) rely upon comprehensive searches into diverse resources that catalog primary studies. However, since what constitutes a comprehensive search is unclear, we examined trends in databases searched from 2005-2016, surrounding the publication of search guidelines in 2013, and associations between resources searched and evidence of publication bias in SRMAs involving human subjects. STUDY DESIGN: To ensure comparability of included SRMAs over the 12 years in the face of a near 100-fold increase of international SRMAs (mainly genetic studies from China) during this period, we focused on USA-affiliated SRMAs, manually reviewing 100 randomly selected SRMAs from those published in each year. After excluding articles (mainly for inadequate detail or out-of-scope methods), we identified factors associated with the databases searched, used network analysis to see which resources were simultaneously searched, and used logistic regression to link information sources searched with a lower chance of finding publication bias. RESULTS: Among 817 SRMA articles studied, the common resources used were Medline (95\%), EMBASE (44\%), and Cochrane (41\%). Methods journal SRMAs were most likely to use registries and grey literature resources. We found substantial co-searching of resources with only published materials, and not complemented by searches of registries and the grey literature. The 2013 guideline did not substantially increase searching of registries and grey literature resources to retrieve primary studies for the SRMAs. When used to augment Medline, Scopus (in all SRMAs) and ClinicalTrials.gov (in SRMAs with safety outcomes) were negatively associated with publication bias. CONCLUSIONS: Even SRMAs that search multiple sources tend to search similar resources. Our study supports searching Scopus and CTG in addition to Medline to reduce the chance of publication bias.}, language = {eng}, journal = {Journal of Clinical Epidemiology}, author = {Pradhan, Richeek and Garnick, Kyle and Barkondaj, Bikramjit and Jordan, Harmon S. and Ash, Arlene and Yu, Hong}, month = oct, year = {2018}, pmid = {29879464}, pmcid = {PMC6250602}, keywords = {Evidence synthesis, Grey literature, Literature databases, Meta-analysis, Publication bias, Systematic review, Trial registries}, pages = {50--62}, }
OBJECTIVE: Systematic reviews and meta-analyses (SRMAs) rely upon comprehensive searches into diverse resources that catalog primary studies. However, since what constitutes a comprehensive search is unclear, we examined trends in databases searched from 2005-2016, surrounding the publication of search guidelines in 2013, and associations between resources searched and evidence of publication bias in SRMAs involving human subjects. STUDY DESIGN: To ensure comparability of included SRMAs over the 12 years in the face of a near 100-fold increase of international SRMAs (mainly genetic studies from China) during this period, we focused on USA-affiliated SRMAs, manually reviewing 100 randomly selected SRMAs from those published in each year. After excluding articles (mainly for inadequate detail or out-of-scope methods), we identified factors associated with the databases searched, used network analysis to see which resources were simultaneously searched, and used logistic regression to link information sources searched with a lower chance of finding publication bias. RESULTS: Among 817 SRMA articles studied, the common resources used were Medline (95%), EMBASE (44%), and Cochrane (41%). Methods journal SRMAs were most likely to use registries and grey literature resources. We found substantial co-searching of resources with only published materials, and not complemented by searches of registries and the grey literature. The 2013 guideline did not substantially increase searching of registries and grey literature resources to retrieve primary studies for the SRMAs. When used to augment Medline, Scopus (in all SRMAs) and ClinicalTrials.gov (in SRMAs with safety outcomes) were negatively associated with publication bias. CONCLUSIONS: Even SRMAs that search multiple sources tend to search similar resources. Our study supports searching Scopus and CTG in addition to Medline to reduce the chance of publication bias.
2017
(10)
Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach.
Chen, J.; Jagannatha, A. N.; Fodeh, S. J.; and Yu, H.
JMIR medical informatics, 5(4): e42. October 2017.
doi link bibtex abstract
@article{chen_ranking_2017, title = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach}, volume = {5}, issn = {2291-9694}, shorttitle = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes}, doi = {10.2196/medinform.8531}, abstract = {BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation-that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P{\textless}.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.}, language = {eng}, number = {4}, journal = {JMIR medical informatics}, author = {Chen, Jinying and Jagannatha, Abhyuday N. and Fodeh, Samah J. and Yu, Hong}, month = oct, year = {2017}, pmid = {29089288}, pmcid = {PMC5686421}, keywords = {Information extraction, electronic health records, lexical entry selection, natural language processing, transfer learning}, pages = {e42}, }
BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation-that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
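The "feature space augmentation" named above is commonly implemented as Daumé III's frustratingly easy domain adaptation, where every feature vector is copied into a shared block plus a block for its own domain so one classifier can learn both shared and domain-specific weights. A sketch under that reading (an assumption on my part, not the paper's published code):

```python
def augment(features, domain):
    """EasyAdapt-style feature duplication for two domains."""
    zeros = [0.0] * len(features)
    return (features                                       # shared copy
            + (features if domain == "source" else zeros)  # source copy
            + (features if domain == "target" else zeros)) # target copy

print(augment([1.0, 0.5], "source"))  # [1.0, 0.5, 1.0, 0.5, 0.0, 0.0]
print(augment([1.0, 0.5], "target"))  # [1.0, 0.5, 0.0, 0.0, 1.0, 0.5]
```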
Meta Networks.
Munkhdalai, T.; and Yu, H.
In ICML, volume 70, pages 2554–2563, Sydney, Australia, August 2017.
link bibtex abstract
@inproceedings{munkhdalai_meta_2017, address = {Sydney, Australia}, title = {Meta {Networks}}, volume = {70}, abstract = {Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6\% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.}, booktitle = {{ICML}}, author = {Munkhdalai, Tsendsuren and Yu, Hong}, month = aug, year = {2017}, pmid = {31106300; PMCID: PMC6519722}, pages = {2554--2563}, }
Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.
Neural Semantic Encoders.
Munkhdalai, T.; and Yu, H.
In European Chapter of the Association for Computational Linguistics 2017 (EACL), volume 1, pages 397–407, April 2017.
Paper link bibtex abstract
@inproceedings{munkhdalai_neural_2017, title = {Neural {Semantic} {Encoders}}, volume = {1}, url = {https://arxiv.org/pdf/1607.04315v2.pdf}, abstract = {We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read\vphantom{\{}\}, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publically available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.}, booktitle = {European {Chapter} of the {Association} for {Computational} {Linguistics} 2017 ({EACL})}, author = {Munkhdalai, T and Yu, Hong}, month = apr, year = {2017}, pmid = {29081578 PMCID: PMC5657452}, pages = {397--407}, }
We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
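One read-compose-write step can be sketched as attention over memory slots followed by a write-back. This toy version writes hard to the single most-attended slot, whereas the published model uses learned projection matrices and soft writes:

```python
import torch
import torch.nn.functional as F

def nse_step(x_t, memory, compose):
    """One simplified read-compose-write step over an external memory."""
    z = F.softmax(memory @ x_t, dim=0)      # read: attention over slots
    m_t = z @ memory                        # retrieved memory vector
    c_t = compose(torch.cat([x_t, m_t]))    # compose input with memory
    memory = memory.clone()
    memory[int(torch.argmax(z))] = c_t      # write (hard, for simplicity)
    return c_t, memory

compose = torch.nn.Linear(2 * 8, 8)         # toy dimensions
mem = torch.randn(5, 8)                     # 5 slots of size 8
out, mem = nse_step(torch.randn(8), mem, compose)
```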
Detecting Opioid-Related Aberrant Behavior using Natural Language Processing.
Lingeman, J. M.; Wang, P.; Becker, W.; and Yu, H.
AMIA Annual Symposium Proceedings, 2017: 1179–1185. 2017.
link bibtex abstract
@article{lingeman_detecting_2017, title = {Detecting {Opioid}-{Related} {Aberrant} {Behavior} using {Natural} {Language} {Processing}}, volume = {2017}, issn = {1942-597X}, abstract = {The United States is in the midst of a prescription opioid epidemic, with the number of yearly opioid-related overdose deaths increasing almost fourfold since 20001. To more effectively prevent unintentional opioid overdoses, the medical profession requires robust surveillance tools that can effectively identify at-risk patients. Drug-related aberrant behaviors observed in the clinical context may be important indicators of patients at risk for or actively abusing opioids. In this paper, we describe a natural language processing (NLP) method for automatic surveillance of aberrant behavior in medical notes relying only on the text of the notes. This allows for a robust and generalizable system that can be used for high volume analysis of electronic medical records for potential predictors of opioid abuse.}, language = {eng}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Lingeman, Jesse M. and Wang, Priscilla and Becker, William and Yu, Hong}, year = {2017}, pmid = {29854186 PMCID: PMC5977697}, pages = {1179--1185}, }
The United States is in the midst of a prescription opioid epidemic, with the number of yearly opioid-related overdose deaths increasing almost fourfold since 2000. To more effectively prevent unintentional opioid overdoses, the medical profession requires robust surveillance tools that can effectively identify at-risk patients. Drug-related aberrant behaviors observed in the clinical context may be important indicators of patients at risk for or actively abusing opioids. In this paper, we describe a natural language processing (NLP) method for automatic surveillance of aberrant behavior in medical notes relying only on the text of the notes. This allows for a robust and generalizable system that can be used for high volume analysis of electronic medical records for potential predictors of opioid abuse.
CIFT: Crowd-Informed Fine-Tuning to Improve Machine Learning Ability.
Lalor, J.; Wu, H.; and Yu, H.
In February 2017.
link bibtex abstract
@inproceedings{lalor_cift:_2017, title = {{CIFT}: {Crowd}-{Informed} {Fine}-{Tuning} to {Improve} {Machine} {Learning} {Ability}.}, abstract = {tem Response Theory (IRT) allows for measuring ability of Machine Learning models as compared to a human population. However, it is difficult to create a large dataset to train the ability of deep neural network models (DNNs). We propose Crowd-Informed Fine-Tuning (CIFT) as a new training process, where a pre-trained model is fine-tuned with a specialized supplemental training set obtained via IRT model-fitting on a large set of crowdsourced response patterns. With CIFT we can leverage the specialized set of data obtained through IRT to inform parameter tuning in DNNs. We experiment with two loss functions in CIFT to represent (i) memorization of fine-tuning items and (ii) learning a probability distribution over potential labels that is similar to the crowdsourced distribution over labels to simulate crowd knowledge. Our results show that CIFT improves ability for a state-of-the-art DNN model for Recognizing Textual Entailment (RTE) tasks and is generalizable to a large-scale RTE test set.}, author = {Lalor, J and Wu, H and Yu, H}, month = feb, year = {2017}, }
Item Response Theory (IRT) allows for measuring ability of Machine Learning models as compared to a human population. However, it is difficult to create a large dataset to train the ability of deep neural network models (DNNs). We propose Crowd-Informed Fine-Tuning (CIFT) as a new training process, where a pre-trained model is fine-tuned with a specialized supplemental training set obtained via IRT model-fitting on a large set of crowdsourced response patterns. With CIFT we can leverage the specialized set of data obtained through IRT to inform parameter tuning in DNNs. We experiment with two loss functions in CIFT to represent (i) memorization of fine-tuning items and (ii) learning a probability distribution over potential labels that is similar to the crowdsourced distribution over labels to simulate crowd knowledge. Our results show that CIFT improves ability for a state-of-the-art DNN model for Recognizing Textual Entailment (RTE) tasks and is generalizable to a large-scale RTE test set.
Assessing Electronic Health Record Readability.
Zheng, J.; and Yu, H.
In 2017.
link bibtex
@inproceedings{zheng_assessing_2017, title = {Assessing {Electronic} {Health} {Record} {Readability}.}, author = {Zheng, J and Yu, H}, year = {2017}, }
Reasoning with memory augmented neural networks for language comprehension.
Munkhdalai, T.; and Yu, H.
5th International Conference on Learning Representations (ICLR). 2017.
Paper link bibtex abstract
@article{munkhdalai_reasoning_2017, title = {Reasoning with memory augmented neural networks for language comprehension.}, url = {https://arxiv.org/abs/1610.06454}, abstract = {Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieve the state-of-the-art results showing an absolute improvement of 1.2\% to 2.6\% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.}, urldate = {2017-06-02}, journal = {5th International Conference on Learning Representations (ICLR)}, author = {Munkhdalai, Tsendsuren and Yu, Hong}, year = {2017}, }
Hypothesis testing is an important cognitive process that supports human reasoning. In this paper, we introduce a computational hypothesis testing approach based on memory augmented neural networks. Our approach involves a hypothesis testing loop that reconsiders and progressively refines a previously formed hypothesis in order to generate new hypotheses to test. We apply the proposed approach to language comprehension task by using Neural Semantic Encoders (NSE). Our NSE models achieve the state-of-the-art results showing an absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by single and ensemble systems on standard machine comprehension benchmarks such as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.
Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study.
Zheng, J.; and Yu, H.
Journal of Medical Internet Research, 19(3): e59. 2017.
Paper doi link bibtex abstract
@article{zheng_readability_2017, title = {Readability {Formulas} and {User} {Perceptions} of {Electronic} {Health} {Records} {Difficulty}: {A} {Corpus} {Study}}, volume = {19}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Readability {Formulas} and {User} {Perceptions} of {Electronic} {Health} {Records} {Difficulty}}, url = {https://www.jmir.org/2017/3/e59/}, doi = {10.2196/jmir.6962}, abstract = {Background: Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients’ comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. Objective: The objective of our study was to explore the relationship between the aforementioned readability formulas and the laypeople’s perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas’ appropriateness and generalizability on predicting difficulty levels of highly complex technical documents. Methods: We collected 140 Wikipedia articles on diabetes and 242 EHR notes with diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate difficulty levels of the documents. Correlations between laypeople’s perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts of the 2 genres of text. Results: The distributions of both readability formulas’ scores (P{\textless}.001) and laypeople’s perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople’s perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P{\textless}.001). Word usage in the 2 related genres still differed significantly (P{\textless}.001). Conclusions: Our findings suggested that the readability formulas’ predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers’ perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes. [J Med Internet Res 2017;19(3):e59]}, language = {en}, number = {3}, urldate = {2017-03-06}, journal = {Journal of Medical Internet Research}, author = {Zheng, Jiaping and Yu, Hong}, year = {2017}, pmid = {28254738 PMCID: PMC5355629}, pages = {e59}, }
Background: Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients’ comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. Objective: The objective of our study was to explore the relationship between the aforementioned readability formulas and laypeople’s perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas’ appropriateness and generalizability for predicting difficulty levels of highly complex technical documents. Methods: We collected 140 Wikipedia articles on diabetes and 242 EHR notes with a diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate the difficulty levels of the documents. Correlations between laypeople’s perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts of the 2 genres of text. Results: The distributions of both readability formulas’ scores (P<.001) and laypeople’s perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople’s perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P<.001). Word usage in the 2 related genres still differed significantly (P<.001). Conclusions: Our findings suggested that the readability formulas’ predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers’ perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes. [J Med Internet Res 2017;19(3):e59]
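For readers who want to see how the three formulas named above are computed, here is a minimal sketch; the vowel-group syllable counter is a crude stand-in (an assumption for illustration), not the tooling used in the study.

import re

def count_syllables(word):
    # Crude heuristic: count vowel groups; production tools use dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_scores(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    smog = 1.0430 * (complex_words * 30 / sentences) ** 0.5 + 3.1291
    gfi = 0.4 * ((n_words / sentences) + 100 * complex_words / n_words)
    return {"FKGL": fkgl, "SMOG": smog, "GFI": gfi}

print(readability_scores("The patient presented with dyspnea. Amiodarone was continued."))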
Neural Tree Indexers for Text Understanding.
Munkhdalai, T.; and Yu, H.
In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 11–21, Valencia, Spain, April 2017. Association for Computational Linguistics
Paper
link
bibtex
abstract
@inproceedings{munkhdalai_neural_2017-1, address = {Valencia, Spain}, title = {Neural {Tree} {Indexers} for {Text} {Understanding}}, url = {http://www.aclweb.org/anthology/E17-1002}, abstract = {Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic treebased recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.}, urldate = {2017-04-02}, booktitle = {Proceedings of the 15th {Conference} of the {European} {Chapter} of the {Association} for {Computational} {Linguistics}: {Volume} 1, {Long} {Papers}}, publisher = {Association for Computational Linguistics}, author = {Munkhdalai, Tsendsuren and Yu, Hong}, month = apr, year = {2017}, pages = {11--21}, }
Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, recursive networks explicitly model the compositionality and the recursive structure of natural language. However, current recursive architectures are limited by their dependence on a syntactic tree. In this paper, we introduce a robust, syntactic-parsing-independent tree-structured model, Neural Tree Indexers (NTI), that provides a middle ground between sequential RNNs and syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. An attention mechanism can then be applied to both the structure and the node function. We implemented and evaluated a binary-tree model of NTI, showing that the model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
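A minimal sketch of the bottom-up tree composition the abstract describes, specialized to the binary-tree case; the compose function is a placeholder (the paper learns an LSTM-style node function), and padding odd levels by carrying the last node up is a simplification.

import numpy as np

def compose(left, right):
    # Placeholder node function; NTI learns this composition.
    return np.tanh(left + right)

def nti_encode(token_vectors):
    # Build a binary tree over the sequence bottom-up; the root summarizes the text.
    level = list(token_vectors)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # carry the last node up on odd levels
        level = [compose(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

vectors = [np.random.randn(8) for _ in range(5)]
print(nti_encode(vectors).shape)  # (8,) root representation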
Generating a Test of Electronic Health Record Narrative Comprehension with Item Response Theory.
Lalor, J; Wu, H; Chen, L; Mazor, K; and Yu, H
In November 2017.
link bibtex abstract
@inproceedings{lalor_generating_2017, title = {Generating a {Test} of {Electronic} {Health} {Record} {Narrative} {Comprehension} with {Item} {Response} {Theory}.}, abstract = {In this work, we report the development of a new instrument to test patients' ability to comprehend EHR notes. Our instrument comprises of a test set of question and answer pairs that are based on the semantic content of EHR notes and selected using the psychometrics method Item Response Theory.}, author = {Lalor, J and Wu, H and Chen, L and Mazor, K and Yu, H}, month = nov, year = {2017}, }
In this work, we report the development of a new instrument to test patients' ability to comprehend EHR notes. Our instrument comprises a test set of question-and-answer pairs that are based on the semantic content of EHR notes and selected using the psychometric method Item Response Theory.
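As background on Item Response Theory, a sketch of the two-parameter logistic (2PL) item characteristic curve often used for item selection; the paper's exact IRT model is not stated in this abstract, so the 2PL choice here is an assumption.

import math

def p_correct_2pl(theta, a, b):
    # Probability that a reader of ability theta answers an item with
    # discrimination a and difficulty b correctly (2PL model).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An easy item vs. a hard item, for a reader of average ability (theta = 0).
print(p_correct_2pl(0.0, a=1.5, b=-1.0))  # ~0.82
print(p_correct_2pl(0.0, a=1.5, b=2.0))   # ~0.05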
2016
(7)
Structured prediction models for RNN based sequence labeling in clinical text.
Jagannatha, A. N.; and Yu, H.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 2016, pages 856–865, November 2016.
link bibtex abstract
@inproceedings{jagannatha_structured_2016, title = {Structured prediction models for {RNN} based sequence labeling in clinical text}, volume = {2016}, abstract = {Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In clinical domain one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain, presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.}, language = {eng}, booktitle = {Proceedings of the {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing}}, author = {Jagannatha, Abhyuday N. and Yu, Hong}, month = nov, year = {2016}, pmid = {28004040 PMCID: PMC5167535}, keywords = {Computer Science - Computation and Language}, pages = {856--865}, }
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves the extraction of medical entities such as medication, indication, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with recurrent neural networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
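As background for the LSTM-CRF family discussed above, a self-contained Viterbi decoder over per-token emission scores (which an LSTM would produce) and pairwise tag-transition potentials; this is the standard linear-chain case, not the paper's skip-chain approximation.

import numpy as np

def viterbi(emissions, transitions):
    # emissions: (T, K) token-by-tag scores; transitions: (K, K) pairwise potentials.
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]

emissions = np.random.randn(6, 3)    # 6 tokens, 3 tags (e.g., B/I/O)
transitions = np.random.randn(3, 3)
print(viterbi(emissions, transitions))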
RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism.
Choi, E.; Bahadori, M. T.; Sun, J.; Kulas, J.; Schuetz, A.; and Stewart, W.
In Advances in Neural Information Processing Systems, pages 3504–3512, 2016.
Paper
link
bibtex
@inproceedings{choi_retain:_2016, title = {{RETAIN}: {An} {Interpretable} {Predictive} {Model} for {Healthcare} using {Reverse} {Time} {Attention} {Mechanism}}, shorttitle = {{RETAIN}}, url = {http://papers.nips.cc/paper/6321-retain-an-interpretable-predictive-model-for-healthcare-using-reverse-time-attention-mechanism}, urldate = {2017-01-12}, booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, author = {Choi, Edward and Bahadori, Mohammad Taha and Sun, Jimeng and Kulas, Joshua and Schuetz, Andy and Stewart, Walter}, year = {2016}, pages = {3504--3512}, }
Learning to Rank Scientific Documents from the Crowd.
Lingeman, J. M; and Yu, H.
arXiv:1611.01400. November 2016.
Paper
link
bibtex
abstract
@article{lingeman_learning_2016, title = {Learning to {Rank} {Scientific} {Documents} from the {Crowd}}, url = {https://arxiv.org/pdf/1611.01400v1.pdf}, abstract = {Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However biomedical knowledge discovery is hypothesis-driven. The most related articles may not be ones with the highest text similarities. In this study, we first develop an innovative crowd-sourcing approach to build an expert-annotated document-ranking corpus. Using this corpus as the gold standard, we then evaluate the approaches of using text similarity to rank the relatedness of articles. Finally, we develop and evaluate a new supervised model to automatically rank related scientific articles. Our results show that authors' ranking differ significantly from rankings by text-similarity-based models. By training a learning-to-rank model on a subset of the annotated corpus, we found the best supervised learning-to-rank model (SVM-Rank) significantly surpassed state-of-the-art baseline systems.}, journal = {arXiv:1611.01400}, author = {Lingeman, Jesse M and Yu, Hong}, month = nov, year = {2016}, }
Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related or not. However, biomedical knowledge discovery is hypothesis-driven. The most related articles may not be the ones with the highest text similarities. In this study, we first develop an innovative crowd-sourcing approach to build an expert-annotated document-ranking corpus. Using this corpus as the gold standard, we then evaluate approaches that use text similarity to rank the relatedness of articles. Finally, we develop and evaluate a new supervised model to automatically rank related scientific articles. Our results show that authors' rankings differ significantly from rankings by text-similarity-based models. By training a learning-to-rank model on a subset of the annotated corpus, we found that the best supervised learning-to-rank model (SVM-Rank) significantly surpassed state-of-the-art baseline systems.
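The pairwise reduction that underlies SVM-Rank fits in a few lines; the sketch below uses synthetic features and relevance grades (placeholders, not the study's corpus) and a linear SVM standing in for the SVM-Rank package.

import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    # Turn graded relevance into binary classification on feature differences.
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.extend([X[i] - X[j], X[j] - X[i]])
                labels.extend([1, 0])
    return np.array(diffs), np.array(labels)

X = np.random.randn(20, 5)             # 20 candidate articles, 5 features
y = np.random.randint(0, 3, size=20)   # graded relevance from annotators
Xp, yp = pairwise_transform(X, y)
w = LinearSVC().fit(Xp, yp).coef_.ravel()
print(np.argsort(-(X @ w))[:5])        # top-5 ranked articles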
Learning for Biomedical Information Extraction: Methodological Review of Recent Advances.
Liu, F.; Chen, J.; Jagannatha, A.; and Yu, H.
arXiv:1606.07993. June 2016.
Paper
link
bibtex
abstract
@article{liu_learning_2016, title = {Learning for {Biomedical} {Information} {Extraction}: {Methodological} {Review} of {Recent} {Advances}}, url = {https://arxiv.org/ftp/arxiv/papers/1606/1606.07993.pdf}, abstract = {Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and therefore it has been an active research. Unlike existing reviews covering a holistic view on BioIE, this review focuses on mainly recent advances in learning based approaches, by systematically summarizing them into different aspects of methodological development. In addition, we dive into open information extraction and deep learning, two emerging and influential techniques and envision next generation of BioIE.}, journal = {arXiv:1606.07993}, author = {Liu, Feifan and Chen, Jinying and Jagannatha, Abhyuday and Yu, Hong}, month = jun, year = {2016}, }
Biomedical information extraction (BioIE) is important to many applications, including clinical decision support, integrative biology, and pharmacovigilance, and it has therefore been an active research area. Unlike existing reviews covering a holistic view of BioIE, this review focuses mainly on recent advances in learning-based approaches, systematically summarizing them into different aspects of methodological development. In addition, we dive into open information extraction and deep learning, two emerging and influential techniques, and envision the next generation of BioIE.
Citation Analysis with Neural Attention Models.
Munkhdalai, M; Lalor, J; and Yu, H
In Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 69–77, Austin, TX, November 2016. Association for Computational Linguistics
Paper
doi
link
bibtex
@inproceedings{munkhdalai_citation_2016, address = {Austin, TX}, title = {Citation {Analysis} with {Neural} {Attention} {Models}}, url = {http://www.aclweb.org/anthology/W/W16/W16-6109.pdf}, doi = {10.18653/v1/W16-6109}, booktitle = {Proceedings of the {Seventh} {International} {Workshop} on {Health} {Text} {Mining} and {Information} {Analysis} ({LOUHI}) ,}, publisher = {Association for Computational Linguistics}, author = {Munkhdalai, M and Lalor, J and Yu, H}, month = nov, year = {2016}, pages = {69--77}, }
Condensed Memory Networks for Clinical Diagnostic Inferencing.
Prakash, A.; Zhao, S.; Hasan, S. A.; Datla, V.; Lee, K.; Qadir, A.; Liu, J.; and Farri, O.
arXiv:1612.01848 [cs]. December 2016.
arXiv: 1612.01848
Paper
link
bibtex
abstract
@article{prakash_condensed_2016, title = {Condensed {Memory} {Networks} for {Clinical} {Diagnostic} {Inferencing}}, url = {http://arxiv.org/abs/1612.01848}, abstract = {Diagnosis of a clinical condition is a challenging task, which often requires significant medical investigation. Previous work related to diagnostic inferencing problems mostly consider multivariate observational data (e.g. physiological signals, lab tests etc.). In contrast, we explore the problem using free-text medical notes recorded in an electronic health record (EHR). Complex tasks like these can benefit from structured knowledge bases, but those are not scalable. We instead exploit raw text from Wikipedia as a knowledge source. Memory networks have been demonstrated to be effective in tasks which require comprehension of free-form text. They use the final iteration of the learned representation to predict probable classes. We introduce condensed memory neural networks (C-MemNNs), a novel model with iterative condensation of memory representations that preserves the hierarchy of features in the memory. Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks to predict the most probable diagnoses given a complex clinical scenario.}, urldate = {2017-01-12}, journal = {arXiv:1612.01848 [cs]}, author = {Prakash, Aaditya and Zhao, Siyuan and Hasan, Sadid A. and Datla, Vivek and Lee, Kathy and Qadir, Ashequl and Liu, Joey and Farri, Oladimeji}, month = dec, year = {2016}, note = {arXiv: 1612.01848}, keywords = {Computer Science - Computation and Language}, }
Diagnosis of a clinical condition is a challenging task, which often requires significant medical investigation. Previous work related to diagnostic inferencing problems mostly considers multivariate observational data (e.g., physiological signals, lab tests, etc.). In contrast, we explore the problem using free-text medical notes recorded in an electronic health record (EHR). Complex tasks like these can benefit from structured knowledge bases, but those are not scalable. We instead exploit raw text from Wikipedia as a knowledge source. Memory networks have been demonstrated to be effective in tasks which require comprehension of free-form text. They use the final iteration of the learned representation to predict probable classes. We introduce condensed memory neural networks (C-MemNNs), a novel model with iterative condensation of memory representations that preserves the hierarchy of features in the memory. Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks in predicting the most probable diagnoses given a complex clinical scenario.
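As background, one memory-network hop in NumPy; C-MemNN's actual contribution, the iterative condensation of memory representations, is omitted, so this shows only the vanilla read-and-update mechanism.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(query, memory):
    attn = softmax(memory @ query)   # match each memory slot against the query
    read = attn @ memory             # attention-weighted sum of slots
    return query + read              # updated internal state

memory = np.random.randn(10, 16)     # e.g., 10 encoded Wikipedia passages
state = np.random.randn(16)
for _ in range(3):                   # multiple hops, as in memory networks
    state = memory_hop(state, memory)
print(state.shape)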
Finding Important Terms for Patients in Their Electronic Health Records: A Learning-to-Rank Approach Using Expert Annotations.
Chen, J.; Zheng, J.; and Yu, H.
JMIR medical informatics, 4(4): e40. November 2016.
doi link bibtex abstract
@article{chen_finding_2016, title = {Finding {Important} {Terms} for {Patients} in {Their} {Electronic} {Health} {Records}: {A} {Learning}-to-{Rank} {Approach} {Using} {Expert} {Annotations}}, volume = {4}, shorttitle = {Finding {Important} {Terms} for {Patients} in {Their} {Electronic} {Health} {Records}}, doi = {10.2196/medinform.6373}, abstract = {BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care. OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients. METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learn-to-rank algorithm. We explored rich learning features, including distributed word representation, Unified Medical Language System semantic type, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems. RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen's kappa annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P{\textless}.001). Rich learning features contributed to FOCUS's performance substantially. CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.}, language = {eng}, number = {4}, journal = {JMIR medical informatics}, author = {Chen, Jinying and Zheng, Jiaping and Yu, Hong}, month = nov, year = {2016}, pmid = {27903489}, pmcid = {PMC5156821}, keywords = {Information extraction, Learning to rank, Supervised learning, electronic health records, natural language processing}, pages = {e40}, }
BACKGROUND: Many health organizations allow patients to access their own electronic health record (EHR) notes through online patient portals as a way to enhance patient-centered care. However, EHR notes are typically long and contain abundant medical jargon that can be difficult for patients to understand. In addition, many medical terms in patients' notes are not directly related to their health care needs. One way to help patients better comprehend their own notes is to reduce information overload and help them focus on medical terms that matter most to them. Interventions can then be developed by giving them targeted education to improve their EHR comprehension and the quality of care. OBJECTIVE: We aimed to develop a supervised natural language processing (NLP) system called Finding impOrtant medical Concepts most Useful to patientS (FOCUS) that automatically identifies and ranks medical terms in EHR notes based on their importance to the patients. METHODS: First, we built an expert-annotated corpus. For each EHR note, 2 physicians independently identified medical terms important to the patient. Using the physicians' agreement as the gold standard, we developed and evaluated FOCUS. FOCUS first identifies candidate terms from each EHR note using MetaMap and then ranks the terms using a support vector machine-based learn-to-rank algorithm. We explored rich learning features, including distributed word representation, Unified Medical Language System semantic type, topic features, and features derived from consumer health vocabulary. We compared FOCUS with 2 strong baseline NLP systems. RESULTS: Physicians annotated 90 EHR notes and identified a mean of 9 (SD 5) important terms per note. The Cohen's kappa annotation agreement was .51. The 10-fold cross-validation results show that FOCUS achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.940 for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FOCUS for identifying important terms from EHR notes was 0.866 AUC-ROC. Both performance scores significantly exceeded the corresponding baseline system scores (P<.001). Rich learning features contributed to FOCUS's performance substantially. CONCLUSIONS: FOCUS can automatically rank terms from EHR notes based on their importance to patients. It may help develop future interventions that improve quality of care.
2015
(5)
Translating Electronic Health Record Notes from English to Spanish: A Preliminary Study.
Liu, W.; Cai, S.; Balaji, R.; Chiriboga, G.; Knight, K.; and Yu, H.
In ACL-IJCNLP, page 134, Beijing, China, July 2015.
Paper
doi
link
bibtex
@inproceedings{liu_translating_2015, address = {Bei Jing, China}, title = {Translating {Electronic} {Health} {Record} {Notes} from {English} to {Spanish}: {A} {Preliminary} {Study}}, url = {http://aclweb.org/anthology/W/W15/W15-3816.pdf}, doi = {10.18653/v1/W15-3816}, booktitle = {{ACL}-{IJCNLP}}, author = {Liu, Weisong and Cai, Shu and Balaji, Ramesh and Chiriboga, German and Knight, Kevin and Yu, Hong}, month = jul, year = {2015}, pages = {134}, }
Figure-Associated Text Summarization and Evaluation.
Polepalli Ramesh, B.; Sethi, R. J.; and Yu, H.
PLOS ONE, 10(2): e0115671. February 2015.
Paper
doi
link
bibtex
@article{polepalli_ramesh_figure-associated_2015, title = {Figure-{Associated} {Text} {Summarization} and {Evaluation}}, volume = {10}, issn = {1932-6203}, url = {http://dx.plos.org/10.1371/journal.pone.0115671}, doi = {10.1371/journal.pone.0115671}, language = {en}, number = {2}, urldate = {2015-02-26}, journal = {PLOS ONE}, author = {Polepalli Ramesh, Balaji and Sethi, Ricky J. and Yu, Hong}, editor = {Sarkar, Indra Neil}, month = feb, year = {2015}, pmid = {25643357 PMCID: PMC4313946}, pages = {e0115671}, }
DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures.
Yin, X.; Yang, C.; Pei, W.; Man, H.; Zhang, J.; Learned-Miller, E.; and Yu, H.
PLoS ONE, 10(5). May 2015.
Paper
doi
link
bibtex
abstract
@article{yin_detext:_2015, title = {{DeTEXT}: {A} {Database} for {Evaluating} {Text} {Extraction} from {Biomedical} {Literature} {Figures}}, volume = {10}, issn = {1932-6203}, shorttitle = {{DeTEXT}}, url = {http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4423993/}, doi = {10.1371/journal.pone.0126200}, abstract = {Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.}, number = {5}, urldate = {2015-06-03}, journal = {PLoS ONE}, author = {Yin, Xu-Cheng and Yang, Chun and Pei, Wei-Yi and Man, Haixia and Zhang, Jun and Learned-Miller, Erik and Yu, Hong}, month = may, year = {2015}, pmid = {25951377 PMCID: PMC4423993}, }
Hundreds of millions of figures are available in the biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: a database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high-quality, and large-scale figure-text dataset, with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally, we lay out the challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.
Methods for Linking EHR Notes to Education Materials.
Zheng, J.; and Yu, H.
AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science, 2015: 209–215. 2015.
link bibtex abstract
@article{zheng_methods_2015, title = {Methods for {Linking} {EHR} {Notes} to {Education} {Materials}}, volume = {2015}, issn = {2153-4063}, abstract = {It has been shown that providing patients with access to their own electronic health records (EHR) can enhance their medical understanding and provide clinically relevant benefits. However, languages that are difficult for non-medical professionals to comprehend are prevalent in the EHR notes, including medical terms, abbreviations, and domain-specific language patterns. Furthermore, limited average health literacy forms a barrier for patients to understand their health condition, impeding their ability to actively participate in managing their health. Therefore, we are developing a system to retrieve EHR note-tailored online consumer-oriented health education materials to improve patients' health knowledge of their own clinical conditions. Our experiments show that queries combining key concepts and other medical concepts present in the EHR notes significantly outperform (more than doubled) a baseline system of using the phrases from topic models.}, language = {eng}, journal = {AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science}, author = {Zheng, Jiaping and Yu, Hong}, year = {2015}, pmid = {26306273}, pmcid = {PMC4525231}, pages = {209--215}, }
It has been shown that providing patients with access to their own electronic health records (EHR) can enhance their medical understanding and provide clinically relevant benefits. However, language that is difficult for non-medical professionals to comprehend is prevalent in EHR notes, including medical terms, abbreviations, and domain-specific language patterns. Furthermore, limited average health literacy forms a barrier for patients to understand their health condition, impeding their ability to actively participate in managing their health. Therefore, we are developing a system to retrieve EHR-note-tailored, consumer-oriented online health education materials to improve patients' health knowledge of their own clinical conditions. Our experiments show that queries combining key concepts and other medical concepts present in the EHR notes significantly outperform (more than doubling performance) a baseline system that uses phrases from topic models.
Identifying Key Concepts from EHR Notes Using Domain Adaptation.
Zheng, J.; Yu, H.; and Bedford, M. A.
In SIXTH INTERNATIONAL WORKSHOP ON HEALTH TEXT MINING AND INFORMATION ANALYSIS (LOUHI), pages 115, 2015.
Paper
link
bibtex
@inproceedings{zheng_identifying_2015, title = {Identifying {Key} {Concepts} from {EHR} {Notes} {Using} {Domain} {Adaptation}}, url = {http://www.anthology.aclweb.org/W/W15/W15-26.pdf#page=127}, urldate = {2017-02-23}, booktitle = {{SIXTH} {INTERNATIONAL} {WORKSHOP} {ON} {HEALTH} {TEXT} {MINING} {AND} {INFORMATION} {ANALYSIS} ({LOUHI})}, author = {Zheng, Jiaping and Yu, Hong and Bedford, M. A.}, year = {2015}, pages = {115}, }
2014
(4)
Learning to Rank Figures within a Biomedical Article.
Liu, F.; and Yu, H.
PLoS ONE, 9(3): e61567. March 2014.
Paper
doi
link
bibtex
abstract
@article{liu_learning_2014, title = {Learning to {Rank} {Figures} within a {Biomedical} {Article}}, volume = {9}, issn = {1932-6203}, url = {http://dx.plos.org/10.1371/journal.pone.0061567}, doi = {10.1371/journal.pone.0061567}, abstract = {Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. This ever-increasing sheer volume has made it difficult for scientists to effectively and accurately access figures of their interest, the process of which is crucial for validating research facts and for formulating or testing novel research hypotheses. Current figure search applications can't fully meet this challenge as the "bag of figures" assumption doesn't take into account the relationship among figures. In our previous study, hundreds of biomedical researchers have annotated articles in which they serve as corresponding authors. They ranked each figure in their paper based on a figure's importance at their discretion, referred to as "figure ranking". Using this collection of annotated data, we investigated computational approaches to automatically rank figures. We exploited and extended the state-of-the-art listwise learning-to-rank algorithms and developed a new supervised-learning model BioFigRank. The cross-validation results show that BioFigRank yielded the best performance compared with other state-of-the-art computational models, and the greedy feature selection can further boost the ranking performance significantly. Furthermore, we carry out the evaluation by comparing BioFigRank with three-level competitive domain-specific human experts: (1) First Author, (2) Non-Author-In-Domain-Expert who is not the author nor co-author of an article but who works in the same field of the corresponding author of the article, and (3) Non-Author-Out-Domain-Expert who is not the author nor co-author of an article and who may or may not work in the same field of the corresponding author of an article. Our results show that BioFigRank outperforms Non-Author-Out-Domain-Expert and performs as well as Non-Author-In-Domain-Expert. Although BioFigRank underperforms First Author, since most biomedical researchers are either in- or out-domain-experts for an article, we conclude that BioFigRank represents an artificial intelligence system that offers expert-level intelligence to help biomedical researchers to navigate increasingly proliferated big data efficiently.}, language = {en}, number = {3}, urldate = {2015-02-26}, journal = {PLoS ONE}, author = {Liu, Feifan and Yu, Hong}, editor = {Preis, Tobias}, month = mar, year = {2014}, pmid = {24625719 PMCID: PMC3953065}, pages = {e61567}, }
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. This ever-increasing sheer volume has made it difficult for scientists to effectively and accurately access figures of their interest, the process of which is crucial for validating research facts and for formulating or testing novel research hypotheses. Current figure search applications can't fully meet this challenge as the "bag of figures" assumption doesn't take into account the relationship among figures. In our previous study, hundreds of biomedical researchers have annotated articles in which they serve as corresponding authors. They ranked each figure in their paper based on a figure's importance at their discretion, referred to as "figure ranking". Using this collection of annotated data, we investigated computational approaches to automatically rank figures. We exploited and extended the state-of-the-art listwise learning-to-rank algorithms and developed a new supervised-learning model BioFigRank. The cross-validation results show that BioFigRank yielded the best performance compared with other state-of-the-art computational models, and the greedy feature selection can further boost the ranking performance significantly. Furthermore, we carry out the evaluation by comparing BioFigRank with three-level competitive domain-specific human experts: (1) First Author, (2) Non-Author-In-Domain-Expert who is not the author nor co-author of an article but who works in the same field of the corresponding author of the article, and (3) Non-Author-Out-Domain-Expert who is not the author nor co-author of an article and who may or may not work in the same field of the corresponding author of an article. Our results show that BioFigRank outperforms Non-Author-Out-Domain-Expert and performs as well as Non-Author-In-Domain-Expert. Although BioFigRank underperforms First Author, since most biomedical researchers are either in- or out-domain-experts for an article, we conclude that BioFigRank represents an artificial intelligence system that offers expert-level intelligence to help biomedical researchers to navigate increasingly proliferated big data efficiently.
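One standard listwise objective of the kind the abstract refers to is ListNet's top-one cross-entropy; the sketch below is illustrative and may differ from the exact loss used in BioFigRank.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def listnet_top1_loss(predicted_scores, true_scores):
    # Cross-entropy between the top-one probability distributions induced
    # by the model's figure scores and the annotators' figure scores.
    p_true = softmax(np.asarray(true_scores, dtype=float))
    p_pred = softmax(np.asarray(predicted_scores, dtype=float))
    return float(-(p_true * np.log(p_pred)).sum())

# Figures of one article: annotator-derived scores vs. model scores.
print(listnet_top1_loss([2.0, 0.5, 1.0], [3.0, 1.0, 2.0]))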
Computational Approaches for Predicting Biomedical Research Collaborations.
Zhang, Q.; and Yu, H.
PLoS ONE, 9(11): e111795. November 2014.
Paper
doi
link
bibtex
abstract
@article{zhang_computational_2014, title = {Computational {Approaches} for {Predicting} {Biomedical} {Research} {Collaborations}}, volume = {9}, issn = {1932-6203}, url = {http://dx.plos.org/10.1371/journal.pone.0111795}, doi = {10.1371/journal.pone.0111795}, abstract = {Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous works of collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profile and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations, similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.}, language = {en}, number = {11}, urldate = {2015-02-26}, journal = {PLoS ONE}, author = {Zhang, Qing and Yu, Hong}, editor = {Smalheiser, Neil R.}, month = nov, year = {2014}, pmid = {25375164 PMCID: PMC4222920}, pages = {e111795}, }
Biomedical research is increasingly collaborative, and successful collaborations often produce high-impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous work on collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both semantic features extracted from author research interest profiles and author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations and similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression, with an ROC AUC ranging from 0.766 to 0.980 on different datasets. To our knowledge, we are the first to study in depth how research interests and productivity can be used for collaboration prediction. Our approach is computationally efficient, scalable, and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
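A minimal sketch of the evaluation setup described above, logistic regression scored by cross-validated AUC; the feature matrix is random placeholder data standing in for the profile-similarity and network-topology features.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Rows are candidate author pairs; columns would mix semantic features
# (e.g., abstract similarity) with network features (e.g., common neighbors).
X = np.random.rand(200, 4)
y = np.random.randint(0, 2, size=200)  # 1 = the pair later co-authored

auc = cross_val_score(LogisticRegression(), X, y, scoring="roc_auc", cv=5)
print(auc.mean())                      # ~0.5 on random data, by construction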
Automatically Recognizing Medication and Adverse Event Information From Food and Drug Administration’s Adverse Event Reporting System Narratives.
Polepalli Ramesh, B.; Belknap, S. M; Li, Z.; Frid, N.; West, D. P; and Yu, H.
JMIR Medical Informatics, 2(1): e10. June 2014.
Paper
doi
link
bibtex
@article{polepalli_ramesh_automatically_2014, title = {Automatically {Recognizing} {Medication} and {Adverse} {Event} {Information} {From} {Food} and {Drug} {Administration}’s {Adverse} {Event} {Reporting} {System} {Narratives}}, volume = {2}, issn = {2291-9694}, url = {http://medinform.jmir.org/2014/1/e10/}, doi = {10.2196/medinform.3022}, language = {en}, number = {1}, urldate = {2015-05-02}, journal = {JMIR Medical Informatics}, author = {Polepalli Ramesh, Balaji and Belknap, Steven M and Li, Zuofeng and Frid, Nadya and West, Dennis P and Yu, Hong}, month = jun, year = {2014}, pmid = {25600332}, pmcid = {PMC4288072}, pages = {e10}, }
A robust data-driven approach for gene ontology annotation.
Li, Y.; and Yu, H.
Database: The Journal of Biological Databases and Curation, 2014: bau113. 2014.
Paper
doi
link
bibtex
abstract
@article{li_robust_2014, title = {A robust data-driven approach for gene ontology annotation}, volume = {2014}, issn = {1758-0463}, url = {http://database.oxfordjournals.org/cgi/doi/10.1093/database/bau113}, doi = {10.1093/database/bau113}, abstract = {Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation became a major bottleneck of database curation. BioCreative IV GO annotation task aims to evaluate the performance of system that automatically assigns GO terms to genes based on the narrative sentences in biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3\% in exact match and 32.5\% in relaxed match. In the post-submission experiment, we obtained 22.1\% and 35.7\% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, RDE-based method achieved over 20\% relative improvement on F1 and AUC performance against classical supervised learning methods, e.g. support vector machine and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8\% F1 and 22.2\% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6\% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both the two tasks.}, language = {eng}, journal = {Database: The Journal of Biological Databases and Curation}, author = {Li, Yanpeng and Yu, Hong}, year = {2014}, pmid = {25425037}, pmcid = {PMC4243380}, note = {00000 }, pages = {bau113}, }
Gene ontology (GO) and GO annotation are important resources for biological information management and knowledge discovery, but the speed of manual annotation has become a major bottleneck of database curation. The BioCreative IV GO annotation task aims to evaluate the performance of systems that automatically assign GO terms to genes based on narrative sentences in the biomedical literature. This article presents our work in this task as well as the experimental results after the competition. For the evidence sentence extraction subtask, we built a binary classifier to identify evidence sentences using the reference distance estimator (RDE), a recently proposed semi-supervised learning method that learns new features from around 10 million unlabeled sentences, achieving an F1 of 19.3% in exact match and 32.5% in relaxed match. In the post-submission experiment, we obtained 22.1% and 35.7% F1 performance by incorporating bigram features in RDE learning. In both development and test sets, the RDE-based method achieved over 20% relative improvement in F1 and AUC performance against classical supervised learning methods, e.g., support vector machines and logistic regression. For the GO term prediction subtask, we developed an information retrieval-based method to retrieve the GO term most relevant to each evidence sentence using a ranking function that combined cosine similarity and the frequency of GO terms in documents, and a filtering method based on high-level GO classes. The best performance of our submitted runs was 7.8% F1 and 22.2% hierarchy F1. We found that the incorporation of frequency information and hierarchy filtering substantially improved the performance. In the post-submission evaluation, we obtained a 10.6% F1 using a simpler setting. Overall, the experimental analysis showed our approaches were robust in both tasks.
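A toy version of the GO-term ranking described above, combining cosine similarity with how often each GO term appears in documents; the terms, counts, and the log1p mixing are illustrative assumptions, not the paper's exact ranking function.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

go_terms = ["protein binding", "DNA repair", "signal transduction"]
doc_freq = np.array([900, 120, 400])   # made-up assignment counts per GO term

sentence = ["the mutant shows defects in repair of damaged DNA"]
vec = TfidfVectorizer().fit(go_terms + sentence)
sims = cosine_similarity(vec.transform(sentence), vec.transform(go_terms)).ravel()

scores = sims * np.log1p(doc_freq)     # blend similarity with term frequency
print(go_terms[int(scores.argmax())])  # -> "DNA repair"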
2013
(2)
Systems for Improving Electronic Health Record Note Comprehension.
Polepalli Ramesh, B.; and Yu, H.
In ACM SIGIR Workshop on Health Search & Discovery, 2013.
Paper
link
bibtex
abstract
@inproceedings{polepalli_ramesh_systems_2013, title = {Systems for {Improving} {Electronic} {Health} {Record} {Note} {Comprehension}}, url = {https://research.nuance.com/wp-content/uploads/2014/12/Systems-for-Improving-Electronic-Health-Record-Note-Comprehension.pdf}, abstract = {Allowing patients access to their physicians’ notes has the potential to enhance their understanding of disease and improve medication adherence and healthcare outcomes. However, a recent study involving over ten thousand patients showed that allowing patients to read their electronic health record (EHR) notes caused confusion, especially for the vulnerable (e.g., lower literacy, lower income) groups. This finding is not surprising as EHR notes contain medical jargon that may be difficult for patients to comprehend. To improve patients’ EHR note comprehension, we are developing a biomedical natural language processing system called NoteAid (http://clinicalnotesaid.org), which translates medical jargon into consumer-oriented lay language. The current NoteAid implementations link EHR medical terms to their definitions and other related educational material. Our evaluation has shown that all NoteAid implementations improve self-rated EHR note comprehension by 23\% to 40\% of lay people.}, booktitle = {{ACM} {SIGIR} {Workshop} on {Health} {Search} \& {Discovery}}, author = {Polepalli Ramesh, Balaji and Yu, Hong}, year = {2013}, }
Allowing patients access to their physicians’ notes has the potential to enhance their understanding of disease and improve medication adherence and healthcare outcomes. However, a recent study involving over ten thousand patients showed that allowing patients to read their electronic health record (EHR) notes caused confusion, especially for the vulnerable (e.g., lower literacy, lower income) groups. This finding is not surprising as EHR notes contain medical jargon that may be difficult for patients to comprehend. To improve patients’ EHR note comprehension, we are developing a biomedical natural language processing system called NoteAid (http://clinicalnotesaid.org), which translates medical jargon into consumer-oriented lay language. The current NoteAid implementations link EHR medical terms to their definitions and other related educational material. Our evaluation has shown that all NoteAid implementations improve self-rated EHR note comprehension by 23% to 40% of lay people.
CiteGraph: A Citation Network System for MEDLINE Articles and Analysis.
Qing, Z.; and Hong, Y.
Studies in Health Technology and Informatics, 832–836. 2013.
Paper
doi
link
bibtex
abstract
@article{qing_citegraph:_2013, title = {{CiteGraph}: {A} {Citation} {Network} {System} for {MEDLINE} {Articles} and {Analysis}}, copyright = {©2013 © IMIA and IOS Press.}, issn = {0926-9630}, shorttitle = {{CiteGraph}}, url = {http://www.medra.org/servlet/aliasResolver?alias=iospressISSNISBN&issn=0926-9630&volume=192&spage=832}, doi = {10.3233/978-1-61499-289-9-832}, abstract = {This paper details the development and implementation of CiteGraph, a system for constructing large-scale citation and co-authorship networks from full-text biomedical articles. CiteGraph represents articles and authors by uniquely identified nodes, and connects those nodes through citation and co-authorship relations. CiteGraph network encompasses over 1.65 million full-text articles and 6.35 million citations by 1.37 million unique authors from the Elsevier full-text articles. Our evaluation shows 98\% 99\% F1-score for mapping a citation to the corresponding article and identifying MEDLINE articles. We further analyzed the characteristics of CiteGraph and found that they are consistent with assumptions made using small-scale bibliometric analysis. We also developed several novel network-based methods for analyzing publication, citation and collaboration patterns. This is the first work to develop a completely automated system for the creation of a large-scale citation network in the biomedical domain, and also to introduce novel findings in researcher publication histories. CiteGraph can be a useful resource to both the biomedical community, and bibliometric research.}, urldate = {2016-11-30}, journal = {Studies in Health Technology and Informatics}, author = {Qing, Zhang and Hong, Yu}, year = {2013}, pmid = {23920674}, pages = {832--836}, }
This paper details the development and implementation of CiteGraph, a system for constructing large-scale citation and co-authorship networks from full-text biomedical articles. CiteGraph represents articles and authors by uniquely identified nodes, and connects those nodes through citation and co-authorship relations. The CiteGraph network encompasses over 1.65 million full-text articles and 6.35 million citations by 1.37 million unique authors from the Elsevier full-text articles. Our evaluation shows 98%–99% F1-scores for mapping a citation to the corresponding article and identifying MEDLINE articles. We further analyzed the characteristics of CiteGraph and found that they are consistent with assumptions made in small-scale bibliometric analyses. We also developed several novel network-based methods for analyzing publication, citation, and collaboration patterns. This is the first work to develop a completely automated system for the creation of a large-scale citation network in the biomedical domain, and also to introduce novel findings on researcher publication histories. CiteGraph can be a useful resource for both the biomedical community and bibliometric research.
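A tiny CiteGraph-style example using networkx; the article identifiers are made up and only illustrate the node-and-edge representation the abstract describes.

import networkx as nx

G = nx.DiGraph()                        # nodes: articles; edges: citations
G.add_edges_from([("pmid:1", "pmid:2"), ("pmid:1", "pmid:3"),
                  ("pmid:4", "pmid:2"), ("pmid:3", "pmid:2")])

most_cited = max(G.in_degree, key=lambda kv: kv[1])
print(most_cited)                       # ('pmid:2', 3)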
2012
(2)
Beyond Captions: Linking Figures with Abstract Sentences in Biomedical Articles.
Bockhorst, J. P.; Conroy, J. M.; Agarwal, S.; O’Leary, D. P.; and Yu, H.
PLoS ONE, 7(7): e39618. July 2012.
Paper
doi
link
bibtex
@article{bockhorst_beyond_2012, title = {Beyond {Captions}: {Linking} {Figures} with {Abstract} {Sentences} in {Biomedical} {Articles}}, volume = {7}, issn = {1932-6203}, shorttitle = {Beyond {Captions}}, url = {http://dx.plos.org/10.1371/journal.pone.0039618}, doi = {10.1371/journal.pone.0039618}, language = {en}, number = {7}, urldate = {2016-11-30}, journal = {PLoS ONE}, author = {Bockhorst, Joseph P. and Conroy, John M. and Agarwal, Shashank and O’Leary, Dianne P. and Yu, Hong}, editor = {Ouzounis, Christos A.}, month = jul, year = {2012}, pmid = {22815711}, pmcid = {PMC3399876}, pages = {e39618}, }
Automatic discourse connective detection in biomedical text.
Ramesh, B. P.; Prasad, R.; Miller, T.; Harrington, B.; and Yu, H.
Journal of the American Medical Informatics Association: JAMIA, 19(5): 800–808. October 2012.
doi link bibtex abstract
@article{ramesh_automatic_2012, title = {Automatic discourse connective detection in biomedical text}, volume = {19}, issn = {1527-974X}, doi = {10.1136/amiajnl-2011-000775}, abstract = {OBJECTIVE Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. MATERIALS AND METHODS Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles ({\textasciitilde}112,000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank ({\textasciitilde}1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. RESULTS AND CONCLUSION Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.}, number = {5}, journal = {Journal of the American Medical Informatics Association: JAMIA}, author = {Ramesh, Balaji Polepalli and Prasad, Rashmi and Miller, Tim and Harrington, Brian and Yu, Hong}, month = oct, year = {2012}, pmid = {22744958}, keywords = {Knowledge Bases, NLP, analysis, automated learning, controlled terminologies and vocabularies, discovery, display, image representation, knowledge acquisition and knowledge management, knowledge representations, natural language processing, ontologies, processing, text and data mining methods}, pages = {800--808}, }
OBJECTIVE Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at discourse level. A first step in identifying discourse relations involves the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text. MATERIALS AND METHODS Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (~112,000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (~1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores. RESULTS AND CONCLUSION Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance: 0.761 F1 score. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.
2011
(12)
AskHERMES: An online question answering system for complex clinical questions.
Cao, Y.; Liu, F.; Simpson, P.; Antieau, L.; Bennett, A.; Cimino, J. J; Ely, J.; and Yu, H.
Journal of Biomedical Informatics, 44(2): 277–288. April 2011.
Paper
doi
link
bibtex
abstract
@article{cao_askhermes_2011, title = {{AskHERMES}: {An} online question answering system for complex clinical questions}, volume = {44}, issn = {1532-0480}, shorttitle = {{AskHERMES}}, url = {http://www.ncbi.nlm.nih.gov/pubmed/21256977}, doi = {10.1016/j.jbi.2011.01.004}, abstract = {{\textless}AbstractText Label="OBJECTIVE" NlmCategory="OBJECTIVE"{\textgreater}Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="DESIGN" NlmCategory="METHODS"{\textgreater}This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="MEASUREMENT" NlmCategory="METHODS"{\textgreater}We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="RESULTS" NlmCategory="RESULTS"{\textgreater}AskHERMES allows physicians to enter a question in a natural way with minimal query formulation and allows physicians to efficiently navigate among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and the knowledge resource is limited compared with Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate systems.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS"{\textgreater}AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.{\textless}/AbstractText{\textgreater}}, number = {2}, urldate = {2011-03-25}, journal = {Journal of Biomedical Informatics}, author = {Cao, Yonggang and Liu, Feifan and Simpson, Pippa and Antieau, Lamont and Bennett, Andrew and Cimino, James J and Ely, John and Yu, Hong}, month = apr, year = {2011}, pmid = {21256977 PMCID: PMC3433744}, keywords = {Algorithms, Clinical Medicine, Databases, Factual, Information Storage and Retrieval, Online Systems, Software, expert systems, natural language processing}, pages = {277--288}, }
OBJECTIVE: Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers. DESIGN: This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles. MEASUREMENT: We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance. RESULTS: AskHERMES allows physicians to enter a question in a natural way with minimal query formulation and allows physicians to efficiently navigate among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and the knowledge resource is limited compared with Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate systems. CONCLUSIONS: AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.
BioN∅T: A searchable database of biomedical negated sentences.
Agarwal, S.; Yu, H.; and Kohane, I.
BMC Bioinformatics, 12(1): 420. 2011.
Paper
doi
link
bibtex
@article{agarwal_biont_2011, title = {{BioN}∅{T}: {A} searchable database of biomedical negated sentences}, volume = {12}, issn = {1471-2105}, shorttitle = {{BioN}∅{T}}, url = {http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-420}, doi = {10.1186/1471-2105-12-420}, language = {en}, number = {1}, urldate = {2016-11-30}, journal = {BMC Bioinformatics}, author = {Agarwal, Shashank and Yu, Hong and Kohane, Isaac}, year = {2011}, pmid = {22032181 PMCID: PMC3225379}, pages = {420}, }
Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain.
Liu, F.; Antieau, L. D.; and Yu, H.
Journal of Biomedical Informatics, 44(6): 1032–1038. December 2011.
Paper
doi
link
bibtex
abstract
@article{liu_toward_2011, title = {Toward automated consumer question answering: {Automatically} separating consumer questions from professional questions in the healthcare domain}, volume = {44}, issn = {15320464}, shorttitle = {Toward automated consumer question answering}, url = {http://linkinghub.elsevier.com/retrieve/pii/S1532046411001353}, doi = {10.1016/j.jbi.2011.08.008}, abstract = {OBJECTIVE: Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers. DESIGN: We obtained two sets of consumer questions ({\textasciitilde}10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features. RESULTS: The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset. CONCLUSION: Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.}, language = {en}, number = {6}, urldate = {2016-11-30}, journal = {Journal of Biomedical Informatics}, author = {Liu, Feifan and Antieau, Lamont D. and Yu, Hong}, month = dec, year = {2011}, pmid = {21856442 PMCID: PMC3226885}, keywords = {Artificial Intelligence, Consumer Participation, Databases, Factual, Delivery of Health Care, Humans, Information Dissemination, Information Storage and Retrieval, Internet, Point-of-Care Systems, Semantics, natural language processing}, pages = {1032--1038}, }
OBJECTIVE: Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers. DESIGN: We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features. RESULTS: The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset. CONCLUSION: Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.
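For a concrete picture of the classification task described above, here is a minimal scikit-learn sketch; the TF-IDF bigram features and toy questions are stand-ins, not the paper's linguistic and statistical feature set or its Yahoo/PointCare/OnlinePractice data:

```python
# Illustrative consumer-vs-professional question classifier; toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

questions = [
    "What is the best treatment for my child's ear infection?",          # consumer
    "Efficacy of amoxicillin vs. azithromycin for acute otitis media?",  # professional
    "Is it safe to take ibuprofen while pregnant?",                      # consumer
    "Contraindications for ACE inhibitors in renal artery stenosis?",    # professional
]
labels = ["consumer", "professional", "consumer", "professional"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(questions, labels)
print(clf.predict(["Dosing of warfarin with concurrent amiodarone?"]))
```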
Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions.
Agarwal, S.; Liu, F.; and Yu, H.
BMC Bioinformatics, 12(Suppl 8): S10. 2011.
Paper
doi
link
bibtex
abstract
@article{agarwal_simple_2011, title = {Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions}, volume = {12}, issn = {1471-2105}, url = {http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-S8-S10}, doi = {10.1186/1471-2105-12-S8-S10}, abstract = {BACKGROUND: Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named "Simple Classifier", and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named "OntoNorm". RESULTS: We evaluated our system in the context of BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8\% F1-score for identifying relevant documents, and 52.3\% F1-score for mapping articles to interaction method ontology. CONCLUSION: Our results show that domain-independent machine learning frameworks can perform competitively well at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles.}, language = {en}, number = {Suppl 8}, urldate = {2016-11-30}, journal = {BMC Bioinformatics}, author = {Agarwal, Shashank and Liu, Feifan and Yu, Hong}, year = {2011}, pmid = {22151701 PMCID: PMC3269933}, pages = {S10}, }
BACKGROUND: Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain-independent features to develop two open-source machine learning frameworks. One performs binary classification to determine whether a given article is PPI-relevant or not, named "Simple Classifier", and the other maps PPI-relevant articles to corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named "OntoNorm". RESULTS: We evaluated our system in the context of the BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining a 60.8% F1-score for identifying relevant documents and a 52.3% F1-score for mapping articles to the interaction method ontology. CONCLUSION: Our results show that domain-independent machine learning frameworks can perform competitively at the tasks of detecting PPI-relevant articles and identifying the methods that were used to study the interaction in such articles.
Parsing citations in biomedical articles using conditional random fields.
Zhang, Q.; Cao, Y.; and Yu, H.
Computers in Biology and Medicine, 41(4): 190–194. April 2011.
Paper
doi
link
bibtex
abstract
@article{zhang_parsing_2011, title = {Parsing citations in biomedical articles using conditional random fields}, volume = {41}, issn = {00104825}, url = {http://linkinghub.elsevier.com/retrieve/pii/S0010482511000291}, doi = {10.1016/j.compbiomed.2011.02.005}, abstract = {Citations are used ubiquitously in biomedical full-text articles and play an important role for representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithms Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). With a subset of html format open-access PubMed Central articles, we report an overall 97.95\% F1-score. The citation parser can be accessed at: http://www.cs.uwm.edu/∼qing/projects/cithit/index.html.}, language = {en}, number = {4}, urldate = {2016-11-30}, journal = {Computers in Biology and Medicine}, author = {Zhang, Qing and Cao, Yong-Gang and Yu, Hong}, month = apr, year = {2011}, pmid = {21419403 PMCID: PMC3086470}, pages = {190--194}, }
Citations are used ubiquitously in biomedical full-text articles and play an important role in representing both the rhetorical structure and the semantic content of the articles. As a result, text mining systems will significantly benefit from a tool that automatically extracts the content of a citation. In this study, we applied the supervised machine-learning algorithm Conditional Random Fields (CRFs) to automatically parse a citation into its fields (e.g., Author, Title, Journal, and Year). With a subset of HTML-format open-access PubMed Central articles, we report an overall 97.95% F1-score. The citation parser can be accessed at: http://www.cs.uwm.edu/∼qing/projects/cithit/index.html.
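The paper's parser is CRF-based; as a contrast, the sketch below shows what a crude regex baseline for one common citation layout looks like. The pattern and sample string are a toy illustration, not the published system:

```python
# Naive regex baseline for citation field parsing: one fixed layout only.
# A CRF tagger, as in the paper, generalizes far better across formats.
import re

PATTERN = re.compile(
    r"(?P<authors>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^,]+),\s+"
    r"(?P<volume>\d+)\((?P<issue>[^)]+)\):\s+"
    r"(?P<pages>[\d\u2013-]+)\.\s+"
    r"(?P<year>\d{4})\."
)

citation = ("Zhang Q, Cao Y, Yu H. Parsing citations in biomedical articles "
            "using conditional random fields. Computers in Biology and "
            "Medicine, 41(4): 190-194. 2011.")
m = PATTERN.match(citation)
print(m.groupdict() if m else "no match")
```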
Figure Text Extraction in Biomedical Literature.
Kim, D.; and Yu, H.
PLoS ONE, 6(1): e15338. January 2011.
Paper
doi
link
bibtex
@article{kim_figure_2011, title = {Figure {Text} {Extraction} in {Biomedical} {Literature}}, volume = {6}, issn = {1932-6203}, url = {http://dx.plos.org/10.1371/journal.pone.0015338}, doi = {10.1371/journal.pone.0015338}, language = {en}, number = {1}, urldate = {2016-11-30}, journal = {PLoS ONE}, author = {Kim, Daehyun and Yu, Hong}, editor = {Uversky, Vladimir N.}, month = jan, year = {2011}, pmid = {21249186 PMCID: PMC3020938}, pages = {e15338}, }
Automatic figure classification in bioscience literature.
Kim, D.; Ramesh, B. P.; and Yu, H.
Journal of Biomedical Informatics, 44(5): 848–858. October 2011.
Paper
doi
link
bibtex
@article{kim_automatic_2011, title = {Automatic figure classification in bioscience literature}, volume = {44}, issn = {15320464}, url = {http://linkinghub.elsevier.com/retrieve/pii/S1532046411000943}, doi = {10.1016/j.jbi.2011.05.003}, language = {en}, number = {5}, urldate = {2016-11-30}, journal = {Journal of Biomedical Informatics}, author = {Kim, Daehyun and Ramesh, Balaji Polepalli and Yu, Hong}, month = oct, year = {2011}, pmid = {21645638 PMCID: PMC3176927}, pages = {848--858}, }
An investigation into the feasibility of spoken clinical question answering.
Miller, T.; Ravvaz, K.; Cimino, J. J.; and Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2011: 954–959. 2011.
Paper
link
bibtex
abstract
@article{miller_investigation_2011, title = {An investigation into the feasibility of spoken clinical question answering}, volume = {2011}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243288/}, abstract = {Spoken question answering for clinical decision support is a potentially revolutionary technology for improving the efficiency and quality of health care delivery. This application involves many technologies currently being researched, including automatic speech recognition (ASR), information retrieval (IR), and summarization, all in the biomedical domain. In certain domains, the problem of spoken document retrieval has been declared solved because of the robustness of IR to ASR errors. This study investigates the extent to which spoken medical question answering benefits from that same robustness. We used the best results from previous speech recognition experiments as inputs to a clinical question answering system, and had physicians perform blind evaluations of results generated both by ASR transcripts of questions and gold standard transcripts of the same questions. Our results suggest that the medical domain differs enough from the open domain to require additional work in automatic speech recognition adapted for the biomedical domain.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Miller, Tim and Ravvaz, Kourosh and Cimino, James J. and Yu, Hong}, year = {2011}, pmid = {22195154}, pmcid = {PMC3243288}, keywords = {Decision Support Systems, Clinical, Feasibility Studies, Humans, Information Storage and Retrieval, Speech Recognition Software, natural language processing}, pages = {954--959}, }
Spoken question answering for clinical decision support is a potentially revolutionary technology for improving the efficiency and quality of health care delivery. This application involves many technologies currently being researched, including automatic speech recognition (ASR), information retrieval (IR), and summarization, all in the biomedical domain. In certain domains, the problem of spoken document retrieval has been declared solved because of the robustness of IR to ASR errors. This study investigates the extent to which spoken medical question answering benefits from that same robustness. We used the best results from previous speech recognition experiments as inputs to a clinical question answering system, and had physicians perform blind evaluations of results generated both by ASR transcripts of questions and gold standard transcripts of the same questions. Our results suggest that the medical domain differs enough from the open domain to require additional work in automatic speech recognition adapted for the biomedical domain.
Apixaban versus warfarin in patients with atrial fibrillation.
Granger, C. B.; Alexander, J. H.; McMurray, J. J. V.; Lopes, R. D.; Hylek, E. M.; Hanna, M.; Al-Khalidi, H. R.; Ansell, J.; Atar, D.; Avezum, A.; Bahit, M. C.; Diaz, R.; Easton, J. D.; Ezekowitz, J. A.; Flaker, G.; Garcia, D.; Geraldes, M.; Gersh, B. J.; Golitsyn, S.; Goto, S.; Hermosillo, A. G.; Hohnloser, S. H.; Horowitz, J.; Mohan, P.; Jansky, P.; Lewis, B. S.; Lopez-Sendon, J. L.; Pais, P.; Parkhomenko, A.; Verheugt, F. W. A.; Zhu, J.; Wallentin, L.; ARISTOTLE Committees; and Investigators
The New England Journal of Medicine, 365(11): 981–992. September 2011.
Paper
doi
link
bibtex
abstract
@article{granger_apixaban_2011, title = {Apixaban versus warfarin in patients with atrial fibrillation}, volume = {365}, issn = {1533-4406}, url = {http://www.nejm.org/doi/full/10.1056/NEJMoa1107039}, doi = {10.1056/NEJMoa1107039}, abstract = {BACKGROUND: Vitamin K antagonists are highly effective in preventing stroke in patients with atrial fibrillation but have several limitations. Apixaban is a novel oral direct factor Xa inhibitor that has been shown to reduce the risk of stroke in a similar population in comparison with aspirin. METHODS: In this randomized, double-blind trial, we compared apixaban (at a dose of 5 mg twice daily) with warfarin (target international normalized ratio, 2.0 to 3.0) in 18,201 patients with atrial fibrillation and at least one additional risk factor for stroke. The primary outcome was ischemic or hemorrhagic stroke or systemic embolism. The trial was designed to test for noninferiority, with key secondary objectives of testing for superiority with respect to the primary outcome and to the rates of major bleeding and death from any cause. RESULTS: The median duration of follow-up was 1.8 years. The rate of the primary outcome was 1.27\% per year in the apixaban group, as compared with 1.60\% per year in the warfarin group (hazard ratio with apixaban, 0.79; 95\% confidence interval [CI], 0.66 to 0.95; P{\textless}0.001 for noninferiority; P=0.01 for superiority). The rate of major bleeding was 2.13\% per year in the apixaban group, as compared with 3.09\% per year in the warfarin group (hazard ratio, 0.69; 95\% CI, 0.60 to 0.80; P{\textless}0.001), and the rates of death from any cause were 3.52\% and 3.94\%, respectively (hazard ratio, 0.89; 95\% CI, 0.80 to 0.99; P=0.047). The rate of hemorrhagic stroke was 0.24\% per year in the apixaban group, as compared with 0.47\% per year in the warfarin group (hazard ratio, 0.51; 95\% CI, 0.35 to 0.75; P{\textless}0.001), and the rate of ischemic or uncertain type of stroke was 0.97\% per year in the apixaban group and 1.05\% per year in the warfarin group (hazard ratio, 0.92; 95\% CI, 0.74 to 1.13; P=0.42). CONCLUSIONS: In patients with atrial fibrillation, apixaban was superior to warfarin in preventing stroke or systemic embolism, caused less bleeding, and resulted in lower mortality. (Funded by Bristol-Myers Squibb and Pfizer; ARISTOTLE ClinicalTrials.gov number, NCT00412984.).}, language = {eng}, number = {11}, journal = {The New England Journal of Medicine}, author = {Granger, Christopher B. and Alexander, John H. and McMurray, John J. V. and Lopes, Renato D. and Hylek, Elaine M. and Hanna, Michael and Al-Khalidi, Hussein R. and Ansell, Jack and Atar, Dan and Avezum, Alvaro and Bahit, M. Cecilia and Diaz, Rafael and Easton, J. Donald and Ezekowitz, Justin A. and Flaker, Greg and Garcia, David and Geraldes, Margarida and Gersh, Bernard J. and Golitsyn, Sergey and Goto, Shinya and Hermosillo, Antonio G. and Hohnloser, Stefan H. and Horowitz, John and Mohan, Puneet and Jansky, Petr and Lewis, Basil S. and Lopez-Sendon, Jose Luis and Pais, Prem and Parkhomenko, Alexander and Verheugt, Freek W. A. 
and Zhu, Jun and Wallentin, Lars and {ARISTOTLE Committees and Investigators}}, month = sep, year = {2011}, pmid = {21870978}, keywords = {Aged, Anticoagulants, Atrial Fibrillation, Double-Blind Method, Factor Xa Inhibitors, Female, Follow-Up Studies, Hemorrhage, Humans, International Normalized Ratio, Kaplan-Meier Estimate, Male, Middle Aged, Pyrazoles, Pyridones, Stroke, Thromboembolism, Treatment Outcome, Warfarin}, pages = {981--992}, }
BACKGROUND: Vitamin K antagonists are highly effective in preventing stroke in patients with atrial fibrillation but have several limitations. Apixaban is a novel oral direct factor Xa inhibitor that has been shown to reduce the risk of stroke in a similar population in comparison with aspirin. METHODS: In this randomized, double-blind trial, we compared apixaban (at a dose of 5 mg twice daily) with warfarin (target international normalized ratio, 2.0 to 3.0) in 18,201 patients with atrial fibrillation and at least one additional risk factor for stroke. The primary outcome was ischemic or hemorrhagic stroke or systemic embolism. The trial was designed to test for noninferiority, with key secondary objectives of testing for superiority with respect to the primary outcome and to the rates of major bleeding and death from any cause. RESULTS: The median duration of follow-up was 1.8 years. The rate of the primary outcome was 1.27% per year in the apixaban group, as compared with 1.60% per year in the warfarin group (hazard ratio with apixaban, 0.79; 95% confidence interval [CI], 0.66 to 0.95; P<0.001 for noninferiority; P=0.01 for superiority). The rate of major bleeding was 2.13% per year in the apixaban group, as compared with 3.09% per year in the warfarin group (hazard ratio, 0.69; 95% CI, 0.60 to 0.80; P<0.001), and the rates of death from any cause were 3.52% and 3.94%, respectively (hazard ratio, 0.89; 95% CI, 0.80 to 0.99; P=0.047). The rate of hemorrhagic stroke was 0.24% per year in the apixaban group, as compared with 0.47% per year in the warfarin group (hazard ratio, 0.51; 95% CI, 0.35 to 0.75; P<0.001), and the rate of ischemic or uncertain type of stroke was 0.97% per year in the apixaban group and 1.05% per year in the warfarin group (hazard ratio, 0.92; 95% CI, 0.74 to 1.13; P=0.42). CONCLUSIONS: In patients with atrial fibrillation, apixaban was superior to warfarin in preventing stroke or systemic embolism, caused less bleeding, and resulted in lower mortality. (Funded by Bristol-Myers Squibb and Pfizer; ARISTOTLE ClinicalTrials.gov number, NCT00412984.).
Figure summarizer browser extensions for PubMed Central.
Agarwal, S.; and Yu, H.
Bioinformatics, 27(12): 1723–1724. June 2011.
Paper
doi
link
bibtex
@article{agarwal_figure_2011, title = {Figure summarizer browser extensions for {PubMed} {Central}}, volume = {27}, issn = {1367-4803, 1460-2059}, url = {https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btr194}, doi = {10.1093/bioinformatics/btr194}, language = {en}, number = {12}, urldate = {2016-11-30}, journal = {Bioinformatics}, author = {Agarwal, S. and Yu, H.}, month = jun, year = {2011}, pages = {1723--1724}, }
The biomedical discourse relation bank.
Prasad, R.; McRoy, S.; Frid, N.; Joshi, A.; and Yu, H.
BMC Bioinformatics, 12(1): 188. May 2011.
Paper
doi
link
bibtex
abstract
@article{prasad_biomedical_2011, title = {The biomedical discourse relation bank}, volume = {12}, copyright = {2011 Prasad et al; licensee BioMed Central Ltd.}, issn = {1471-2105}, url = {http://www.biomedcentral.com/1471-2105/12/188/abstract}, doi = {10.1186/1471-2105-12-188}, abstract = {Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.}, language = {en}, number = {1}, urldate = {2013-05-23}, journal = {BMC Bioinformatics}, author = {Prasad, Rashmi and McRoy, Susan and Frid, Nadya and Joshi, Aravind and Yu, Hong}, month = may, year = {2011}, pmid = {21605399}, pages = {188}, }
Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource.
Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions.
Liu, F.; Tur, G.; Hakkani-Tür, D.; and Yu, H.
Journal of the American Medical Informatics Association: JAMIA, 18(5): 625–630. October 2011.
Paper
doi
link
bibtex
abstract
@article{liu_towards_2011, title = {Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions}, volume = {18}, issn = {1527-974X}, shorttitle = {Towards spoken clinical-question answering}, url = {http://www.ncbi.nlm.nih.gov/pubmed/21705457}, doi = {10.1136/amiajnl-2010-000071}, abstract = {OBJECTIVE To evaluate existing automatic speech-recognition (ASR) systems to measure their performance in interpreting spoken clinical questions and to adapt one ASR system to improve its performance on this task. DESIGN AND MEASUREMENTS The authors evaluated two well-known ASR systems on spoken clinical questions: Nuance Dragon (both generic and medical versions: Nuance Gen and Nuance Med) and the SRI Decipher (the generic version SRI Gen). The authors also explored language model adaptation using more than 4000 clinical questions to improve the SRI system's performance, and profile training to improve the performance of the Nuance Med system. The authors reported the results with the NIST standard word error rate (WER) and further analyzed error patterns at the semantic level. RESULTS Nuance Gen and Med systems resulted in a WER of 68.1\% and 67.4\% respectively. The SRI Gen system performed better, attaining a WER of 41.5\%. After domain adaptation with a language model, the performance of the SRI system improved 36\% to a final WER of 26.7\%. CONCLUSION Without modification, two well-known ASR systems do not perform well in interpreting spoken clinical questions. With a simple domain adaptation, one of the ASR systems improved significantly on the clinical question task, indicating the importance of developing domain/genre-specific ASR systems.}, number = {5}, urldate = {2011-12-13}, journal = {Journal of the American Medical Informatics Association: JAMIA}, author = {Liu, Feifan and Tur, Gokhan and Hakkani-Tür, Dilek and Yu, Hong}, month = oct, year = {2011}, pmid = {21705457}, pages = {625--630}, }
OBJECTIVE To evaluate existing automatic speech-recognition (ASR) systems to measure their performance in interpreting spoken clinical questions and to adapt one ASR system to improve its performance on this task. DESIGN AND MEASUREMENTS The authors evaluated two well-known ASR systems on spoken clinical questions: Nuance Dragon (both generic and medical versions: Nuance Gen and Nuance Med) and the SRI Decipher (the generic version SRI Gen). The authors also explored language model adaptation using more than 4000 clinical questions to improve the SRI system's performance, and profile training to improve the performance of the Nuance Med system. The authors reported the results with the NIST standard word error rate (WER) and further analyzed error patterns at the semantic level. RESULTS Nuance Gen and Med systems resulted in a WER of 68.1% and 67.4% respectively. The SRI Gen system performed better, attaining a WER of 41.5%. After domain adaptation with a language model, the performance of the SRI system improved 36% to a final WER of 26.7%. CONCLUSION Without modification, two well-known ASR systems do not perform well in interpreting spoken clinical questions. With a simple domain adaptation, one of the ASR systems improved significantly on the clinical question task, indicating the importance of developing domain/genre-specific ASR systems.
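Word error rate (WER), the metric reported above, is the word-level edit distance between the reference transcript and the ASR hypothesis, normalized by the reference length. A self-contained implementation of the standard dynamic-programming computation:

```python
# Word error rate: (substitutions + insertions + deletions) / reference length.
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    # Edit-distance table over word sequences.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(r)][len(h)] / len(r)

print(wer("what is the dose of warfarin", "what is dose of warfarin"))  # 1/6
```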
2010
(7)
Lancet: a high precision medication event extraction system for clinical text.
Li, Z.; Liu, F.; Antieau, L.; Cao, Y.; and Yu, H.
Journal of the American Medical Informatics Association: JAMIA, 17(5): 563–567. October 2010.
Paper
doi
link
bibtex
abstract
@article{li_lancet_2010, title = {Lancet: a high precision medication event extraction system for clinical text}, volume = {17}, issn = {1527-974X}, shorttitle = {Lancet}, url = {http://www.ncbi.nlm.nih.gov/pubmed/20819865}, doi = {10.1136/jamia.2010.004077}, abstract = {OBJECTIVE: This paper presents Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries. DESIGN: Lancet incorporates three supervised machine-learning models: a conditional random fields model for tagging individual medication names and associated fields, an AdaBoost model with decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines disambiguation model for identifying the context style (narrative or list). MEASUREMENTS: The authors, from the University of Wisconsin-Milwaukee, participated in the third i2b2 shared-task for challenges in natural language processing for clinical data: medication extraction challenge. With the performance metrics provided by the i2b2 challenge, the micro F1 (precision/recall) scores are reported for both the horizontal and vertical level. RESULTS: Among the top 10 teams, Lancet achieved the highest precision at 90.4\% with an overall F1 score of 76.4\% (horizontal system level with exact match), a gain of 11.2\% and 12\%, respectively, compared with the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4\% from 76.4\% to 79.0\%. CONCLUSIONS: Supervised machine-learning systems with minimal external knowledge resources can achieve a high precision with a competitive overall F1 score.Lancet based on this learning framework does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.}, number = {5}, urldate = {2010-09-21}, journal = {Journal of the American Medical Informatics Association: JAMIA}, author = {Li, Zuofeng and Liu, Feifan and Antieau, Lamont and Cao, Yonggang and Yu, Hong}, month = oct, year = {2010}, pmid = {20819865 PMCID: PMC2995682}, pages = {563--567}, }
OBJECTIVE: This paper presents Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries. DESIGN: Lancet incorporates three supervised machine-learning models: a conditional random fields model for tagging individual medication names and associated fields, an AdaBoost model with the decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines disambiguation model for identifying the context style (narrative or list). MEASUREMENTS: The authors, from the University of Wisconsin-Milwaukee, participated in the third i2b2 shared-task for challenges in natural language processing for clinical data: medication extraction challenge. With the performance metrics provided by the i2b2 challenge, the micro F1 (precision/recall) scores are reported for both the horizontal and vertical level. RESULTS: Among the top 10 teams, Lancet achieved the highest precision at 90.4% with an overall F1 score of 76.4% (horizontal system level with exact match), a gain of 11.2% and 12%, respectively, compared with the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4%, from 76.4% to 79.0%. CONCLUSIONS: Supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. Lancet, based on this learning framework, does not rely on expensive manually curated rules. The system is available online at http://code.google.com/p/lancet/.
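The abstract's second component, AdaBoost over decision stumps, is a standard scikit-learn configuration. The sketch below shows that learner on hypothetical pairwise features (e.g., token distance between a medication name and a candidate field); the feature encoding and data are invented, not Lancet's:

```python
# AdaBoost with depth-1 decision trees ("stumps"), deciding whether a
# candidate field belongs to the same medication event as a drug name.
# Features per pair (all hypothetical): [token distance, same line?, field code].
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[2, 1, 0], [35, 0, 1], [4, 1, 2], [60, 0, 0], [1, 1, 1], [48, 0, 2]]
y = [1, 0, 1, 0, 1, 0]  # 1 = same medication event

stump = DecisionTreeClassifier(max_depth=1)
# Note: scikit-learn versions before 1.2 call this parameter base_estimator.
model = AdaBoostClassifier(estimator=stump, n_estimators=50)
model.fit(X, y)
print(model.predict([[3, 1, 1], [52, 0, 0]]))
```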
Identifying discourse connectives in biomedical text.
Ramesh, B. P.; and Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2010: 657–661. November 2010.
Paper
link
bibtex
abstract
@article{ramesh_identifying_2010, title = {Identifying discourse connectives in biomedical text}, volume = {2010}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041460/}, abstract = {Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of the supervised machine-learning classifiers with conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain 1 million token Penn Discourse Tree Bank (PDTB). We performed cross validation on biomedical articles (approximately 100K word tokens) that we annotated. The results show that the classifier trained on PDTB data attained a 0.55 F1-score for identifying discourse connectives in biomedical text, while the cross-validation results in the biomedical text attained a 0.69 F1-score, a much better performance despite a much smaller training size. Our preliminary analysis suggests the existence of domain-specific features, and we speculate that domain-adaption approaches may further improve performance.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Ramesh, Balaji Polepalli and Yu, Hong}, month = nov, year = {2010}, pmid = {21347060 PMCID: PMC3041460}, keywords = {Algorithms, Artificial Intelligence, Databases, Factual, Humans, Pilot Projects, Supervised Machine Learning, natural language processing}, pages = {657--661}, }
Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of supervised machine-learning classifiers with conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain 1-million-token Penn Discourse Tree Bank (PDTB). We performed cross validation on biomedical articles (approximately 100K word tokens) that we annotated. The results show that the classifier trained on PDTB data attained a 0.55 F1-score for identifying discourse connectives in biomedical text, while the cross-validation results in the biomedical text attained a 0.69 F1-score, a much better performance despite a much smaller training size. Our preliminary analysis suggests the existence of domain-specific features, and we speculate that domain-adaptation approaches may further improve performance.
Biomedical negation scope detection with conditional random fields.
Agarwal, S.; and Yu, H.
Journal of the American Medical Informatics Association: JAMIA, 17(6): 696–701. November 2010.
PMID: 20962133 PMCID: PMC3000754
Paper
doi
link
bibtex
abstract
@article{agarwal_biomedical_2010, title = {Biomedical negation scope detection with conditional random fields}, volume = {17}, issn = {1527-974X}, url = {http://www.ncbi.nlm.nih.gov/pubmed/20962133}, doi = {10.1136/jamia.2010.003228}, abstract = {{\textless}AbstractText Label="OBJECTIVE" NlmCategory="OBJECTIVE"{\textgreater}Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="DESIGN" NlmCategory="METHODS"{\textgreater}Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available BioScope corpus.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="MEASUREMENT" NlmCategory="METHODS"{\textgreater}The performance of the CRF models was evaluated on identifying the negation cue phrases and their scope by calculating recall, precision and F1-score. The models were compared with four competitive baseline systems.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="RESULTS" NlmCategory="RESULTS"{\textgreater}The best CRF-based model performed statistically better than all baseline systems and NegEx, achieving an F1-score of 98\% and 95\% on detecting negation cue phrases and their scope in clinical notes, and an F1-score of 97\% and 85\% on detecting negation cue phrases and their scope in biological literature.{\textless}/AbstractText{\textgreater} {\textless}AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS"{\textgreater}This approach is robust, as it can identify negation scope in both biological and clinical text. To benefit text mining applications, the system is publicly available as a Java API and as an online application at http://negscope.askhermes.org.{\textless}/AbstractText{\textgreater}}, number = {6}, urldate = {2011-03-25}, journal = {Journal of the American Medical Informatics Association: JAMIA}, author = {Agarwal, Shashank and Yu, Hong}, month = nov, year = {2010}, note = {00033 PMID: 20962133 PMCID: PMC3000754}, keywords = {Humans, natural language processing}, pages = {696--701}, }
OBJECTIVE: Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task. DESIGN: Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available BioScope corpus. MEASUREMENT: The performance of the CRF models was evaluated on identifying the negation cue phrases and their scope by calculating recall, precision and F1-score. The models were compared with four competitive baseline systems. RESULTS: The best CRF-based model performed statistically better than all baseline systems and NegEx, achieving an F1-score of 98% and 95% on detecting negation cue phrases and their scope in clinical notes, and an F1-score of 97% and 85% on detecting negation cue phrases and their scope in biological literature. CONCLUSIONS: This approach is robust, as it can identify negation scope in both biological and clinical text. To benefit text mining applications, the system is publicly available as a Java API and as an online application at http://negscope.askhermes.org.
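To make the baseline comparison above concrete, here is a deliberately simple rule-based negation detector in the spirit of NegEx; the cue list and the "scope runs from the cue to the end of the sentence" rule are crude assumptions for illustration, not the paper's CRF method:

```python
# Toy rule-based negation cue and scope detector (a NegEx-style baseline).
import re

CUES = [r"\bno\b", r"\bnot\b", r"\bwithout\b", r"\bdenies\b", r"\babsence of\b"]

def negation_scopes(sentence: str):
    scopes = []
    for cue in CUES:
        for m in re.finditer(cue, sentence, flags=re.IGNORECASE):
            # Naive scope: everything from the cue to the end of the sentence.
            scopes.append((m.group(0), sentence[m.start():]))
    return scopes

print(negation_scopes("Patient denies chest pain and has no fever."))
```

The CRF approach replaces these hand-written rules with learned token-level cue and scope labels, which is what gives it the robustness across biological and clinical text reported above.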
Automatic Figure Ranking and User Interfacing for Intelligent Figure Search.
Yu, H.; Liu, F.; and Ramesh, B. P.
PLoS ONE, 5(10): e12983. October 2010.
Paper
doi
link
bibtex
@article{yu_automatic_2010, title = {Automatic {Figure} {Ranking} and {User} {Interfacing} for {Intelligent} {Figure} {Search}}, volume = {5}, issn = {1932-6203}, url = {http://dx.plos.org/10.1371/journal.pone.0012983}, doi = {10.1371/journal.pone.0012983}, language = {en}, number = {10}, urldate = {2016-11-30}, journal = {PLoS ONE}, author = {Yu, Hong and Liu, Feifan and Ramesh, Balaji Polepalli}, editor = {Wong, Kelvin Kian Loong}, month = oct, year = {2010}, pmid = {20949102 PMCID: PMC2951344}, keywords = {balaji-ramesh, bioinformatics, feifan-liu, hong-yu}, pages = {e12983}, }
An IR-aided machine learning framework for the BioCreative II.5 Challenge.
Cao, Y.; Li, Z.; Liu, F.; Agarwal, S.; Zhang, Q.; and Yu, H.
IEEE/ACM Transactions on Computational Biology and Bioinformatics / IEEE, ACM, 7(3): 454–461. September 2010.
Paper
doi
link
bibtex
abstract
@article{cao_ir-aided_2010, title = {An {IR}-aided machine learning framework for the {BioCreative} {II}.5 {Challenge}}, volume = {7}, issn = {1557-9964}, url = {http://www.ncbi.nlm.nih.gov/pubmed/20671317}, doi = {10.1109/TCBB.2010.56}, abstract = {The team at the University of Wisconsin-Milwaukee developed an information retrieval and machine learning framework. Our framework requires only the standardized training data and depends upon minimal external knowledge resources and minimal parsing. Within the framework, we built our text mining systems and participated for the first time in all three BioCreative II.5 Challenge tasks. The results show that our systems performed among the top five teams for raw F1 scores in all three tasks and came in third place for the homonym ortholog F1 scores for the INT task. The results demonstrated that our IR-based framework is efficient, robust, and potentially scalable.}, number = {3}, urldate = {2010-09-21}, journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics / IEEE, ACM}, author = {Cao, Yonggang and Li, Zuofeng and Liu, Feifan and Agarwal, Shashank and Zhang, Qing and Yu, Hong}, month = sep, year = {2010}, pmid = {20671317}, keywords = {Text mining, bioinformatics (genome or protein) databases, information search and retrieval, systems and software}, pages = {454--461}, }
The team at the University of Wisconsin-Milwaukee developed an information retrieval and machine learning framework. Our framework requires only the standardized training data and depends upon minimal external knowledge resources and minimal parsing. Within the framework, we built our text mining systems and participated for the first time in all three BioCreative II.5 Challenge tasks. The results show that our systems performed among the top five teams for raw F1 scores in all three tasks and came in third place for the homonym ortholog F1 scores for the INT task. The results demonstrated that our IR-based framework is efficient, robust, and potentially scalable.
Automatically extracting information needs from complex clinical questions. Best Paper in International Medical Informatics Association (IMIA) Yearbook 2011.
Cao, Y.; Cimino, J. J; Ely, J.; and Yu, H.
Journal of Biomedical Informatics. July 2010.
Paper
doi
link
bibtex
abstract
@article{cao_automatically_2010, title = {Automatically extracting information needs from complex clinical questions. {Best} {Paper} in {International} {Medical} {Informatics} {Association} ({IMIA}) {Yearbook} 2011}, issn = {1532-0480}, url = {http://www.ncbi.nlm.nih.gov/pubmed/20670693}, doi = {10.1016/j.jbi.2010.07.007}, abstract = {OBJECTIVE: Clinicians pose complex clinical questions when seeing patients, and identifying the answers to those questions in a timely manner helps improve the quality of patient care. We report here on two natural language processing models, namely, automatic topic assignment and keyword identification, that together automatically and effectively extract information needs from ad hoc clinical questions. Our study is motivated in the context of developing the larger clinical question answering system AskHERMES (Help clinicians to Extract and aRrticulate Multimedia information for answering clinical quEstionS). DESIGN AND MEASUREMENTS: We developed supervised machine-learning systems to automatically assign predefined general categories (e.g. etiology, procedure, and diagnosis) to a question. We also explored both supervised and unsupervised systems to automatically identify keywords that capture the main content of the question. RESULTS: We evaluated our systems on 4654 annotated clinical questions that were collected in practice. We achieved an F1 score of 76.0\% for the task of general topic classification and 58.0\% for keyword extraction. Our systems have been implemented into the larger question answering system AskHERMES. Our error analyses suggested that inconsistent annotation in our training data have hurt both question analysis tasks. CONCLUSION: Our systems, available at http://www.askhermes.org, can automatically extract information needs from both short (the number of word tokens {\textless}20) and long questions (the number of word tokens {\textgreater}20), and from both well-structured and ill-formed questions. We speculate that the performance of general topic classification and keyword extraction can be further improved if consistently annotated data are made available.}, urldate = {2010-09-21}, journal = {Journal of Biomedical Informatics}, author = {Cao, Yong-Gang and Cimino, James J and Ely, John and Yu, Hong}, month = jul, year = {2010}, pmid = {20670693}, keywords = {Keyword extraction, Question analysis, Question answering, natural language processing}, }
OBJECTIVE: Clinicians pose complex clinical questions when seeing patients, and identifying the answers to those questions in a timely manner helps improve the quality of patient care. We report here on two natural language processing models, namely, automatic topic assignment and keyword identification, that together automatically and effectively extract information needs from ad hoc clinical questions. Our study is motivated in the context of developing the larger clinical question answering system AskHERMES (Help clinicians to Extract and aRticulate Multimedia information for answering clinical quEstionS). DESIGN AND MEASUREMENTS: We developed supervised machine-learning systems to automatically assign predefined general categories (e.g. etiology, procedure, and diagnosis) to a question. We also explored both supervised and unsupervised systems to automatically identify keywords that capture the main content of the question. RESULTS: We evaluated our systems on 4654 annotated clinical questions that were collected in practice. We achieved an F1 score of 76.0% for the task of general topic classification and 58.0% for keyword extraction. Our systems have been implemented into the larger question answering system AskHERMES. Our error analyses suggested that inconsistent annotation in our training data has hurt both question analysis tasks. CONCLUSION: Our systems, available at http://www.askhermes.org, can automatically extract information needs from both short (fewer than 20 word tokens) and long (more than 20 word tokens) questions, and from both well-structured and ill-formed questions. We speculate that the performance of general topic classification and keyword extraction can be further improved if consistently annotated data are made available.
Detecting hedge cues and their scope in biomedical text with conditional random fields.
Agarwal, S.; and Yu, H.
Journal of Biomedical Informatics, 43(6): 953–961. December 2010.
doi
link
bibtex
abstract
@article{agarwal_detecting_2010, title = {Detecting hedge cues and their scope in biomedical text with conditional random fields}, volume = {43}, issn = {1532-0480}, doi = {10.1016/j.jbi.2010.08.003}, abstract = {OBJECTIVE: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. It is important for text-mining applications to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, due to the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that would automatically detect hedge cues and their scope in biomedical literature. METHODOLOGY: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models to detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the performance of the CRF models in identifying hedge cue phrases and their scope by calculating recall, precision and F1-score. We compared our models with three competitive baseline systems. RESULTS: Our best CRF-based model performed statistically better than the baseline systems, achieving an F1-score of 88\% and 86\% in detecting hedge cue phrases and their scope in biological literature and an F1-score of 93\% and 90\% in detecting hedge cue phrases and their scope in clinical notes. CONCLUSIONS: Our approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system to detect hedge cues and their scope in biomedical literature.}, language = {eng}, number = {6}, journal = {Journal of Biomedical Informatics}, author = {Agarwal, Shashank and Yu, Hong}, month = dec, year = {2010}, pmid = {20709188}, pmcid = {PMC2991497}, keywords = {Algorithms, Artificial Intelligence, Data Mining, Natural Language Processing, Pattern Recognition, Automated, Vocabulary, Controlled}, pages = {953--961}, }
OBJECTIVE: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. It is important for text-mining applications to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, due to the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that would automatically detect hedge cues and their scope in biomedical literature. METHODOLOGY: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models to detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the performance of the CRF models in identifying hedge cue phrases and their scope by calculating recall, precision and F1-score. We compared our models with three competitive baseline systems. RESULTS: Our best CRF-based model performed statistically better than the baseline systems, achieving an F1-score of 88% and 86% in detecting hedge cue phrases and their scope in biological literature and an F1-score of 93% and 90% in detecting hedge cue phrases and their scope in clinical notes. CONCLUSIONS: Our approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system to detect hedge cues and their scope in biomedical literature.
2009
(8)
Using the Weighted Keyword Model to Improve Information Retrieval for Answering Biomedical Questions.
Yu, H.; and Cao, Y.
Summit on translational bioinformatics, 2009: 143. 2009.
link
bibtex
abstract
@article{yu_using_2009, title = {Using the {Weighted} {Keyword} {Model} to {Improve} {Information} {Retrieval} for {Answering} {Biomedical} {Questions}}, volume = {2009}, abstract = {Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Condition Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data.}, journal = {Summit on translational bioinformatics}, author = {Yu, Hong and Cao, Yong-Gang}, year = {2009}, pmid = {21347188 PMCID: PMC3041568}, pages = {143}, }
Physicians ask many complex questions during the patient encounter. Information retrieval systems that can provide immediate and relevant answers to these questions can be invaluable aids to the practice of evidence-based medicine. In this study, we first automatically identify topic keywords from ad hoc clinical questions with a Conditional Random Field model that is trained over thousands of manually annotated clinical questions. We then report on a linear model that assigns query weights based on their automatically identified semantic roles: topic keywords, domain-specific terms, and their synonyms. Our evaluation shows that this weighted keyword model improves information retrieval from the Text Retrieval Conference Genomics track data.
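As a toy illustration of the weighted-keyword idea described above, the sketch below scores documents by summing per-term weights assigned by semantic role; the roles, weights, query, and documents are all invented for illustration, not the paper's learned model or data:

```python
# Toy weighted-keyword retrieval: query terms carry role-based weights.
ROLE_WEIGHTS = {"topic": 3.0, "domain": 2.0, "synonym": 1.0}  # invented values

query = [("ACE inhibitor", "topic"), ("cough", "topic"),
         ("adverse effect", "domain"), ("side effect", "synonym")]

def score(document: str) -> float:
    text = document.lower()
    return sum(ROLE_WEIGHTS[role] for term, role in query if term.lower() in text)

docs = ["Cough is a well-known adverse effect of ACE inhibitor therapy.",
        "Beta blockers are first-line therapy for hypertension."]
print(sorted(docs, key=score, reverse=True)[0])
```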
Investigating and annotating the role of citation in biomedical full-text articles.
Yu, H.; Agarwal, S.; and Frid, N.
In Bioinformatics and Biomedicine Workshop, pages 308–313, November 2009. IEEE
Paper
doi
link
bibtex
abstract
@inproceedings{yu_investigating_2009, title = {Investigating and annotating the role of citation in biomedical full-text articles}, isbn = {978-1-4244-5121-0}, url = {http://ieeexplore.ieee.org/document/5332080/}, doi = {10.1109/BIBMW.2009.5332080}, abstract = {Citations are ubiquitous in scientific articles and play important roles for representing the semantic content of a full-text biomedical article. In this work, we manually examined full-text biomedical articles to analyze the semantic content of citations in full-text biomedical articles. After developing a citation relation schema and annotation guideline, our pilot annotation results show an overall agreement of 0.71, and here we report on the research challenges and the lessons we've learned while trying to overcome them. Our work is a first step toward automatic citation classification in full-text biomedical articles, which may contribute to many text mining tasks, including information retrieval, extraction, summarization, and question answering.}, urldate = {2016-11-30}, booktitle = {Bioinformatics and {Biomedicine} {Workshop}}, publisher = {IEEE}, author = {Yu, Hong and Agarwal, Shashank and Frid, Nadya}, month = nov, year = {2009}, pmid = {21170175 PMCID: PMC3003334}, pages = {308--313}, }
Citations are ubiquitous in scientific articles and play important roles in representing the semantic content of a full-text biomedical article. In this work, we manually examined full-text biomedical articles to analyze the semantic content of their citations. After developing a citation relation schema and annotation guideline, our pilot annotation shows an overall agreement of 0.71, and here we report on the research challenges and the lessons we've learned while trying to overcome them. Our work is a first step toward automatic citation classification in full-text biomedical articles, which may contribute to many text mining tasks, including information retrieval, extraction, summarization, and question answering.
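The agreement figure above can be illustrated in code. The paper does not name the statistic behind its "overall agreement", so Cohen's kappa below is an assumption, and the citation-relation labels are invented for the example.

# Sketch: inter-annotator agreement on citation-relation labels.
# Kappa is one common choice; the paper reports only an "overall
# agreement" figure, so the statistic and labels here are assumptions.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["background", "method", "comparison", "background", "method"]
annotator_b = ["background", "method", "background", "background", "method"]
print(cohen_kappa_score(annotator_a, annotator_b))  # kappa for these toy labels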
Evaluating the weighted-keyword model to improve clinical question answering.
Cao, Y.; Ely, J.; and Yu, H.
In Bioinformatics and Biomedicine Workshop, pages 331–335, November 2009. IEEE
INSPEC Accession Number: 10975550
Paper
doi
link
bibtex
abstract
@inproceedings{cao_evaluating_2009, title = {Evaluating the weighted-keyword model to improve clinical question answering}, isbn = {978-1-4244-5121-0}, url = {http://ieeexplore.ieee.org/document/5332084/}, doi = {10.1109/BIBMW.2009.5332084}, abstract = {Physicians ask many complex questions during their encounters with patients. Question answering systems provide immediate and direct answers to ad hoc clinical questions, and because these systems might aid in the practice of evidence-based medicine, we are developing the clinical question answering system, AskHERMES, to generate answers to such questions. In this study, we report the evaluation of a new weighted-keyword model for improving our question answering system. As part of this development, a physician manually examined AskHERMES' answers to 20 ad hoc clinical questions created with and without the weighted-keyword model. The results show that the weighted-keyword model improves quality in question answering. AskHERMES can be accessed at http://www.AskHERMES.org.}, urldate = {2016-11-30}, booktitle = {Bioinformatics and {Biomedicine} {Workshop}}, publisher = {IEEE}, author = {Cao, Yong-Gang and Ely, John and Yu, Hong}, month = nov, year = {2009}, note = {INSPEC Accession Number: 10975550}, pages = {331--335}, }
Physicians ask many complex questions during their encounters with patients. Question answering systems provide immediate and direct answers to ad hoc clinical questions, and because these systems might aid in the practice of evidence-based medicine, we are developing the clinical question answering system AskHERMES to generate answers to such questions. In this study, we report the evaluation of a new weighted-keyword model for improving our question answering system. As part of this development, a physician manually examined AskHERMES' answers, generated with and without the weighted-keyword model, to 20 ad hoc clinical questions. The results show that the weighted-keyword model improves the quality of question answering. AskHERMES can be accessed at http://www.AskHERMES.org.
Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.
Agarwal, S.; and Yu, H.
Bioinformatics, 25(23): 3174–3180. December 2009.
Paper
doi
link
bibtex
@article{agarwal_automatically_2009, title = {Automatically classifying sentences in full-text biomedical articles into {Introduction}, {Methods}, {Results} and {Discussion}}, volume = {25}, issn = {1367-4803, 1460-2059}, url = {https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btp548}, doi = {10.1093/bioinformatics/btp548}, language = {en}, number = {23}, urldate = {2016-11-30}, journal = {Bioinformatics}, author = {Agarwal, S. and Yu, H.}, month = dec, year = {2009}, pmid = {21347163}, pmcid = {PMC3041564}, pages = {3174--3180}, }
Are figure legends sufficient? Evaluating the contribution of associated text to biomedical figure comprehension.
Yu, H.; Agarwal, S.; Johnston, M.; and Cohen, A.
Journal of Biomedical Discovery and Collaboration, 4(1): 1. 2009.
Paper
doi
link
bibtex
abstract
@article{yu_are_2009, title = {Are figure legends sufficient? {Evaluating} the contribution of associated text to biomedical figure comprehension}, volume = {4}, issn = {1747-5333}, shorttitle = {Are figure legends sufficient?}, url = {http://www.j-biomed-discovery.com/content/4/1/1}, doi = {10.1186/1747-5333-4-1}, abstract = {BACKGROUND:Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension.METHODS:Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of text summaries with the text-summarization evaluation method the ROUGE score.RESULTS:Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39-68\% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30\% of the information. When the full-text article is available, figure comprehension increased to 86-97\%; this indicates that researchers felt that only 3-14\% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles.CONCLUSION:We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond figure legend. Our work provides important guidance to the figure mining systems that extract information only from figure and figure legend.}, number = {1}, urldate = {2009-03-03}, journal = {Journal of Biomedical Discovery and Collaboration}, author = {Yu, Hong and Agarwal, Shashank and Johnston, Mark and Cohen, Aaron}, year = {2009}, pmid = {19126221}, pmcid = {PMC2631451}, pages = {1}, }
BACKGROUND: Biomedical scientists need to access figures to validate research facts and to formulate or to test novel research hypotheses. However, figures are difficult to comprehend without associated text (e.g., figure legend and other reference text). We are developing automated systems to extract the relevant explanatory information along with figures extracted from full text articles. Such systems could be very useful in improving figure retrieval and in reducing the workload of biomedical scientists, who otherwise have to retrieve and read the entire full-text journal article to determine which figures are relevant to their research. As a crucial step, we studied the importance of associated text in biomedical figure comprehension. METHODS: Twenty subjects evaluated three figure-text combinations: figure+legend, figure+legend+title+abstract, and figure+full-text. Using a Likert scale, each subject scored each figure+text according to the extent to which the subject thought he/she understood the meaning of the figure and the confidence in providing the assigned score. Additionally, each subject entered a free-text summary for each figure-text. We identified missing information using indicator words present within the text summaries. Both the Likert scores and the missing information were statistically analyzed for differences among the figure-text types. We also evaluated the quality of the text summaries with the ROUGE text-summarization evaluation metric. RESULTS: Our results showed statistically significant differences in figure comprehension when varying levels of text were provided. When the full-text article is not available, presenting just the figure+legend left biomedical researchers lacking 39-68% of the information about a figure as compared to having complete figure comprehension; adding the title and abstract improved the situation, but still left biomedical researchers missing 30% of the information. When the full-text article is available, figure comprehension increased to 86-97%; this indicates that researchers felt that only 3-14% of the necessary information for full figure comprehension was missing when full text was available to them. Clearly there is information in the abstract and in the full text that biomedical scientists deem important for understanding the figures that appear in full-text biomedical articles. CONCLUSION: We conclude that the texts that appear in full-text biomedical articles are useful for understanding the meaning of a figure, and an effective figure-mining system needs to unlock the information beyond the figure legend. Our work provides important guidance to figure-mining systems that extract information only from the figure and figure legend.
Evaluation of the clinical question answering presentation.
Cao, Y.; Ely, J.; Antieau, L.; and Yu, H.
In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 171, 2009. Association for Computational Linguistics
Paper
doi
link
bibtex
@inproceedings{cao_evaluation_2009, title = {Evaluation of the clinical question answering presentation}, isbn = {978-1-932432-30-5}, url = {http://portal.acm.org/citation.cfm?doid=1572364.1572388}, doi = {10.3115/1572364.1572388}, language = {en}, urldate = {2016-11-30}, booktitle = {Proceedings of the {Workshop} on {Current} {Trends} in {Biomedical} {Natural} {Language} {Processing}}, publisher = {Association for Computational Linguistics}, author = {Cao, Yong-Gang and Ely, John and Antieau, Lamont and Yu, Hong}, year = {2009}, pages = {171}, }
FigSum: automatically generating structured text summaries for figures in biomedical literature.
Agarwal, S.; and Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2009: 6–10. November 2009.
Paper
link
bibtex
abstract
@article{agarwal_figsum:_2009, title = {{FigSum}: automatically generating structured text summaries for figures in biomedical literature}, volume = {2009}, issn = {1942-597X}, shorttitle = {{FigSum}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815407/}, abstract = {Figures are frequently used in biomedical articles to support research findings; however, they are often difficult to comprehend based on their legends alone and information from the full-text articles is required to fully understand them. Previously, we found that the information associated with a single figure is distributed throughout the full-text article the figure appears in. Here, we develop and evaluate a figure summarization system - FigSum, which aggregates this scattered information to improve figure comprehension. For each figure in an article, FigSum generates a structured text summary comprising one sentence from each of the four rhetorical categories - Introduction, Methods, Results and Discussion (IMRaD). The IMRaD category of sentences is predicted by an automated machine learning classifier. Our evaluation shows that FigSum captures 53\% of the sentences in the gold standard summaries annotated by biomedical scientists and achieves an average ROUGE-1 score of 0.70, which is higher than a baseline system.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Agarwal, Shashank and Yu, Hong}, month = nov, year = {2009}, pmid = {20351812}, pmcid = {PMC2815407}, keywords = {Algorithms, Artificial Intelligence, Medical Illustration, Periodicals as Topic}, pages = {6--10}, }
Figures are frequently used in biomedical articles to support research findings; however, they are often difficult to comprehend based on their legends alone and information from the full-text articles is required to fully understand them. Previously, we found that the information associated with a single figure is distributed throughout the full-text article the figure appears in. Here, we develop and evaluate a figure summarization system - FigSum, which aggregates this scattered information to improve figure comprehension. For each figure in an article, FigSum generates a structured text summary comprising one sentence from each of the four rhetorical categories - Introduction, Methods, Results and Discussion (IMRaD). The IMRaD category of sentences is predicted by an automated machine learning classifier. Our evaluation shows that FigSum captures 53% of the sentences in the gold standard summaries annotated by biomedical scientists and achieves an average ROUGE-1 score of 0.70, which is higher than a baseline system.
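ROUGE-1, the metric used to score FigSum summaries above, is simply the unigram recall of a system summary against a reference. A minimal sketch:

# Sketch: ROUGE-1 recall -- the fraction of reference unigrams that
# also appear in the system summary, clipped by count.
from collections import Counter

def rouge_1_recall(system, reference):
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(n, sys_counts[tok]) for tok, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

ref = "the classifier assigns each sentence to one IMRaD category"
hyp = "each sentence is assigned one IMRaD category by the classifier"
print(round(rouge_1_recall(hyp, ref), 2))  # 0.78 for this toy pair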
Hierarchical image classification in the bioscience literature.
Kim, D.; and Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2009: 327–331. November 2009.
Paper
link
bibtex
abstract
@article{kim_hierarchical_2009, title = {Hierarchical image classification in the bioscience literature}, volume = {2009}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815366/}, abstract = {Our previous work has shown that images appearing in bioscience articles can be classified into five types: Gel-Image, Image-of-Thing, Graph, Model, and Mix. For this paper, we explored and analyzed features strongly associated with each image type and developed a hierarchical image classification approach for classifying an image into one of the five types. First, we applied texture features to separate images into two groups: 1) a texture group comprising Gel Image, Image-of-Thing, and Mix, and 2) a non-texture group comprising Graph and Model. We then applied entropy, skewness, and uniformity for the first group, and edge difference, uniformity, and smoothness for the second group to classify images into specific types. Our results show that hierarchical image classification accurately divided images into the two groups during the initial classification and that the overall accuracy of the image classification was higher than that of our previous approach. In particular, the recall of hierarchical image classification was greatly improved due to the high accuracy of the initial classification.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Kim, Daehyun and Yu, Hong}, month = nov, year = {2009}, pmid = {20351874}, pmcid = {PMC2815366}, keywords = {Classification, Information Storage and Retrieval, Medical Illustration, Pattern Recognition, Automated}, pages = {327--331}, }
Our previous work has shown that images appearing in bioscience articles can be classified into five types: Gel-Image, Image-of-Thing, Graph, Model, and Mix. For this paper, we explored and analyzed features strongly associated with each image type and developed a hierarchical image classification approach for classifying an image into one of the five types. First, we applied texture features to separate images into two groups: 1) a texture group comprising Gel Image, Image-of-Thing, and Mix, and 2) a non-texture group comprising Graph and Model. We then applied entropy, skewness, and uniformity for the first group, and edge difference, uniformity, and smoothness for the second group to classify images into specific types. Our results show that hierarchical image classification accurately divided images into the two groups during the initial classification and that the overall accuracy of the image classification was higher than that of our previous approach. In particular, the recall of hierarchical image classification was greatly improved due to the high accuracy of the initial classification.
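The first-stage texture features named above (entropy, skewness, uniformity) can all be computed from a grayscale histogram. A minimal sketch assuming 8-bit grayscale input; the paper's exact feature definitions and decision thresholds are not reproduced.

# Sketch: histogram-based texture features of the kind used in the
# first classification stage.
import numpy as np
from scipy.stats import skew

def texture_features(gray):
    """gray: 2-D array of 8-bit grayscale pixel values."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                  # normalized histogram
    nonzero = p[p > 0]
    return {
        "entropy": -np.sum(nonzero * np.log2(nonzero)),
        "skewness": skew(gray.ravel()),
        "uniformity": np.sum(p ** 2),      # a.k.a. energy
    }

image = np.random.randint(0, 256, size=(64, 64))  # stand-in for a figure
print(texture_features(image))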
2008
(3)
Translating biology: text mining tools that work.
Cohen, K. B.; Yu, H.; Bourne, P. E.; and Hirschman, L.
In Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, volume 13, pages 551, 2008.
NIHMSID: NIHMS92147
Paper
link
bibtex
@inproceedings{cohen_translating_2008, title = {Translating biology: text mining tools that work}, volume = {13}, url = {http://psb.stanford.edu/psb-online/proceedings/psb08/textmining.pdf}, booktitle = {Pacific {Symposium} on {Biocomputing}. {Pacific} {Symposium} on {Biocomputing}}, author = {Cohen, K Bretonnel and Yu, Hong and Bourne, Philip E and Hirschman, Lynette}, year = {2008}, pmcid = {PMC2934913}, pmid = {20827444}, note = {NIHMSID: NIHMS92147}, pages = {551}, }
A pilot annotation to investigate discourse connectivity in biomedical text.
Yu, H.; Frid, N.; McRoy, S.; Prasad, R.; Lee, A.; and Joshi, A.
In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 92–93, 2008. Association for Computational Linguistics
Paper
link
bibtex
@inproceedings{yu_pilot_2008, title = {A pilot annotation to investigate discourse connectivity in biomedical text}, url = {https://www.aclweb.org/anthology/W/W08/W08-0614.pdf}, booktitle = {Proceedings of the {Workshop} on {Current} {Trends} in {Biomedical} {Natural} {Language} {Processing}}, publisher = {Association for Computational Linguistics}, author = {Yu, Hong and Frid, Nadya and McRoy, Susan and Prasad, Rashmi and Lee, Alan and Joshi, Aravind}, year = {2008}, pages = {92--93}, }
Automatically extracting information needs from Ad Hoc clinical questions.
Yu, H.; and Cao, Y.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 96–100. November 2008.
Paper
link
bibtex
abstract
@article{yu_automatically_2008, title = {Automatically extracting information needs from {Ad} {Hoc} clinical questions}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655957/}, abstract = {Automatically extracting information needs from ad hoc clinical questions is an important step towards medical question answering. In this work, we first explored supervised machine-learning approaches to automatically classify an ad hoc clinical question into general topics. We then evaluated different methods for automatically extracting keywords from an ad hoc clinical question. Our methods were evaluated on the 4,654 clinical questions maintained by the National Library of Medicine. Our best systems or methods showed F-score of 76\% for the task of question-topic classification and an average F-score of 56\% for extracting keywords from ad hoc clinical questions.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Yu, Hong and Cao, Yong-Gang}, month = nov, year = {2008}, pmid = {18999100}, pmcid = {PMC2655957}, keywords = {Algorithms, Artificial Intelligence, Communication, Decision Support Systems, Clinical, Information Dissemination, Internet, Pattern Recognition, Automated, Point-of-Care Systems, Remote Consultation, User-Computer Interface, natural language processing}, pages = {96--100}, }
Automatically extracting information needs from ad hoc clinical questions is an important step towards medical question answering. In this work, we first explored supervised machine-learning approaches to automatically classify an ad hoc clinical question into general topics. We then evaluated different methods for automatically extracting keywords from an ad hoc clinical question. Our methods were evaluated on the 4,654 clinical questions maintained by the National Library of Medicine. Our best systems achieved an F-score of 76% for question-topic classification and an average F-score of 56% for extracting keywords from ad hoc clinical questions.
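The F-scores reported above follow the standard definition, the harmonic mean of precision and recall. A minimal sketch for the keyword-extraction task, with invented keyword sets:

# Sketch: F-score of extracted keywords against a gold annotation.
def f_score(extracted, gold):
    tp = len(set(extracted) & set(gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(set(extracted)), tp / len(set(gold))
    return 2 * precision * recall / (precision + recall)

gold = {"ace inhibitor", "cough", "alternative"}
extracted = {"ace inhibitor", "cough", "patient"}
print(round(f_score(extracted, gold), 2))  # 0.67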
2007
(5)
Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians.
Yu, H.; Lee, M.; Kaufman, D.; Ely, J.; Osheroff, J. A.; Hripcsak, G.; and Cimino, J.
Journal of Biomedical Informatics, 40(3): 236–251. June 2007.
Paper
doi
link
bibtex
@article{yu_development_2007, title = {Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians}, volume = {40}, issn = {15320464}, url = {http://linkinghub.elsevier.com/retrieve/pii/S1532046407000202}, doi = {10.1016/j.jbi.2007.03.002}, language = {en}, number = {3}, urldate = {2016-11-30}, journal = {Journal of Biomedical Informatics}, author = {Yu, Hong and Lee, Minsuk and Kaufman, David and Ely, John and Osheroff, Jerome A. and Hripcsak, George and Cimino, James}, month = jun, year = {2007}, pmid = {17462961}, keywords = {Algorithms Attitude of Health Personnel Attitude to Computers *Cognition Databases, Bibliographic Databases, Factual *Decision Support Techniques Humans Information Storage and Retrieval Information Systems Internet Logical Observation Identifiers Names and Codes Online Systems *Physicians PubMed Research Design Software}, pages = {236--251}, }
Using MEDLINE as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles.
Yu, H.; Kim, W.; Hatzivassiloglou, V.; and Wilbur, W. J.
Journal of Biomedical Informatics, 40(2): 150–159. April 2007.
Paper
doi
link
bibtex
abstract
@article{yu_using_2007, title = {Using {MEDLINE} as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles}, volume = {40}, issn = {15320464}, url = {http://linkinghub.elsevier.com/retrieve/pii/S1532046406000621}, doi = {10.1016/j.jbi.2006.06.001}, abstract = {Biomedical abbreviations and acronyms are widely used in biomedical literature. Since many of them represent important content in biomedical literature, information retrieval and extraction benefits from identifying the meanings of those terms. On the other hand, many abbreviations and acronyms are ambiguous, it would be important to map them to their full forms, which ultimately represent the meanings of the abbreviations. In this study, we present a semi-supervised method that applies MEDLINE as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles. We first automatically generated from the MEDLINE abstracts a dictionary of abbreviation-full pairs based on a rule-based system that maps abbreviations to full forms when full forms are defined in the abstracts. We then trained on the MEDLINE abstracts and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in a semi-supervised fashion. We report up to 92\% prediction precision and up to 91\% coverage.}, language = {en}, number = {2}, urldate = {2016-11-30}, journal = {Journal of Biomedical Informatics}, author = {Yu, Hong and Kim, Won and Hatzivassiloglou, Vasileios and Wilbur, W. John}, month = apr, year = {2007}, pmid = {16843731}, keywords = {*Artificial Intelligence Database Management Systems Information Storage and Retrieval/*methods *Medline *Natural Language Processing Pattern Recognition, Automated/*methods *Periodicals *Terminology}, pages = {150--159}, }
Biomedical abbreviations and acronyms are widely used in the biomedical literature. Since many of them represent important content, information retrieval and extraction benefit from identifying the meanings of these terms. On the other hand, many abbreviations and acronyms are ambiguous, so it is important to map them to their full forms, which ultimately represent the meanings of the abbreviations. In this study, we present a semi-supervised method that applies MEDLINE as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles. We first automatically generated from the MEDLINE abstracts a dictionary of abbreviation-full pairs based on a rule-based system that maps abbreviations to full forms when the full forms are defined in the abstracts. We then trained on the MEDLINE abstracts and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in a semi-supervised fashion. We report up to 92% prediction precision and up to 91% coverage.
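The dictionary-building step, mapping abbreviations to full forms defined in abstracts, can be approximated with the well-known parenthetical back-matching heuristic (in the spirit of Schwartz and Hearst); the sketch below is illustrative and is not the paper's actual rule set.

# Sketch: extract (full form, abbreviation) pairs from patterns like
# "conditional random field (CRF)". A crude heuristic, not the
# paper's rule-based system.
import re

def find_pairs(text):
    pairs = []
    for m in re.finditer(r"\(([A-Za-z][\w-]{1,9})\)", text):
        abbrev = m.group(1)
        words = text[:m.start()].split()
        # take as many preceding words as the abbreviation has characters
        candidate = words[-len(abbrev):]
        if candidate and candidate[0][0].lower() == abbrev[0].lower():
            pairs.append((" ".join(candidate), abbrev))
    return pairs

print(find_pairs("We trained a conditional random field (CRF) model."))
# [('conditional random field', 'CRF')]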
The efficacy and safety of apixaban, an oral, direct factor Xa inhibitor, as thromboprophylaxis in patients following total knee replacement.
Lassen, M. R.; Davidson, B. L.; Gallus, A.; Pineo, G.; Ansell, J.; and Deitchman, D.
Journal of Thrombosis and Haemostasis, 5(12): 2368–2375. December 2007.
Paper
doi
link
bibtex
abstract
@article{lassen_efficacy_2007, title = {The efficacy and safety of apixaban, an oral, direct factor {Xa} inhibitor, as thromboprophylaxis in patients following total knee replacement}, volume = {5}, issn = {15387933, 15387836}, url = {http://doi.wiley.com/10.1111/j.1538-7836.2007.02764.x}, doi = {10.1111/j.1538-7836.2007.02764.x}, abstract = {BACKGROUND: Heparins and warfarin are currently used as venous thromboembolism (VTE) prophylaxis in surgery. Inhibition of factor (F) Xa provides a specific mechanism of anticoagulation and the potential for an improved benefit-risk profile. OBJECTIVES: To evaluate the safety and efficacy of apixaban, a potent, direct, oral inhibitor of FXa, in patients following total knee replacement (TKR), and to investigate dose-response relationships. PATIENTS/METHODS: A total of 1238 patients were randomized to one of six double-blind apixaban doses [5, 10 or 20 mg day(-1) administered as a single (q.d.) or a twice-daily divided dose (b.i.d.)], enoxaparin (30 mg b.i.d.) or open-label warfarin (titrated to an International Normalized Ratio of 1.8-3.0). Treatment lasted 10-14 days, commencing 12-24 h after surgery with apixaban or enoxaparin, and on the evening of surgery with warfarin. The primary efficacy outcome was a composite of VTE (mandatory venography) and all-cause mortality during treatment. The primary safety outcome was major bleeding. RESULTS: A total of 1217 patients were eligible for safety and 856 patients for efficacy analysis. All apixaban groups had lower primary efficacy event rates than either comparator. The primary outcome rate decreased with increasing apixaban dose (P = 0.09 with q.d./b.i.d. regimens combined, P = 0.19 for q.d. and P = 0.13 for b.i.d. dosing).A significant dose-related increase in the incidence of total adjudicated bleeding events was noted in the q.d. (P = 0.01) and b.i.d. (P = 0.02) apixaban groups; there was no difference between q.d. and b.i.d. regimens. CONCLUSIONS: Apixaban in doses of 2.5 mg b.i.d. or 5 mg q.d. has a promising benefit-risk profile compared with the current standards of care following TKR.}, language = {en}, number = {12}, urldate = {2016-11-30}, journal = {Journal of Thrombosis and Haemostasis}, author = {Lassen, M. R. and Davidson, B. L. and Gallus, A. and Pineo, G. and Ansell, J. and Deitchman, D.}, month = dec, year = {2007}, pmid = {17868430}, pages = {2368--2375}, }
BACKGROUND: Heparins and warfarin are currently used as venous thromboembolism (VTE) prophylaxis in surgery. Inhibition of factor (F) Xa provides a specific mechanism of anticoagulation and the potential for an improved benefit-risk profile. OBJECTIVES: To evaluate the safety and efficacy of apixaban, a potent, direct, oral inhibitor of FXa, in patients following total knee replacement (TKR), and to investigate dose-response relationships. PATIENTS/METHODS: A total of 1238 patients were randomized to one of six double-blind apixaban doses [5, 10 or 20 mg day(-1) administered as a single (q.d.) or a twice-daily divided dose (b.i.d.)], enoxaparin (30 mg b.i.d.) or open-label warfarin (titrated to an International Normalized Ratio of 1.8-3.0). Treatment lasted 10-14 days, commencing 12-24 h after surgery with apixaban or enoxaparin, and on the evening of surgery with warfarin. The primary efficacy outcome was a composite of VTE (mandatory venography) and all-cause mortality during treatment. The primary safety outcome was major bleeding. RESULTS: A total of 1217 patients were eligible for safety and 856 patients for efficacy analysis. All apixaban groups had lower primary efficacy event rates than either comparator. The primary outcome rate decreased with increasing apixaban dose (P = 0.09 with q.d./b.i.d. regimens combined, P = 0.19 for q.d. and P = 0.13 for b.i.d. dosing). A significant dose-related increase in the incidence of total adjudicated bleeding events was noted in the q.d. (P = 0.01) and b.i.d. (P = 0.02) apixaban groups; there was no difference between q.d. and b.i.d. regimens. CONCLUSIONS: Apixaban in doses of 2.5 mg b.i.d. or 5 mg q.d. has a promising benefit-risk profile compared with the current standards of care following TKR.
A cognitive evaluation of four online search engines for answering definitional questions posed by physicians.
Yu, H.; and Kaufman, D.
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 328–339. 2007.
Paper
link
bibtex
abstract
@article{yu_cognitive_2007, title = {A cognitive evaluation of four online search engines for answering definitional questions posed by physicians}, issn = {2335-6936}, url = {http://psb.stanford.edu/psb-online/proceedings/psb07/yu.pdf}, abstract = {The Internet is having a profound impact on physicians' medical decision making. One recent survey of 277 physicians showed that 72\% of physicians regularly used the Internet to research medical information and 51\% admitted that information from web sites influenced their clinical decisions. This paper describes the first cognitive evaluation of four state-of-the-art Internet search engines: Google (i.e., Google and Scholar.Google), MedQA, Onelook, and PubMed for answering definitional questions (i.e., questions with the format of "What is X?") posed by physicians. Onelook is a portal for online definitions, and MedQA is a question answering system that automatically generates short texts to answer specific biomedical questions. Our evaluation criteria include quality of answer, ease of use, time spent, and number of actions taken. Our results show that MedQA outperforms Onelook and PubMed in most of the criteria, and that MedQA surpasses Google in time spent and number of actions, two important efficiency criteria. Our results show that Google is the best system for quality of answer and ease of use. We conclude that Google is an effective search engine for medical definitions, and that MedQA exceeds the other search engines in that it provides users direct answers to their questions; while the users of the other search engines have to visit several sites before finding all of the pertinent information.}, language = {ENG}, journal = {Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing}, author = {Yu, Hong and Kaufman, David}, year = {2007}, pmid = {17990503}, keywords = {Adult, Computational Biology, Female, Humans, Internet, Male, Middle Aged, Physicians, PubMed}, pages = {328--339}, }
The Internet is having a profound impact on physicians' medical decision making. One recent survey of 277 physicians showed that 72% of physicians regularly used the Internet to research medical information and 51% admitted that information from web sites influenced their clinical decisions. This paper describes the first cognitive evaluation of four state-of-the-art Internet search engines: Google (i.e., Google and Scholar.Google), MedQA, Onelook, and PubMed for answering definitional questions (i.e., questions with the format of "What is X?") posed by physicians. Onelook is a portal for online definitions, and MedQA is a question answering system that automatically generates short texts to answer specific biomedical questions. Our evaluation criteria include quality of answer, ease of use, time spent, and number of actions taken. Our results show that MedQA outperforms Onelook and PubMed in most of the criteria, and that MedQA surpasses Google in time spent and number of actions, two important efficiency criteria. Our results show that Google is the best system for quality of answer and ease of use. We conclude that Google is an effective search engine for medical definitions, and that MedQA exceeds the other search engines in that it provides users direct answers to their questions; while the users of the other search engines have to visit several sites before finding all of the pertinent information.
Frontiers of biomedical text mining: current progress.
Zweigenbaum, P.; Demner-Fushman, D.; Yu, H.; and Cohen, K. B.
Briefings in Bioinformatics, 8(5): 358–75. September 2007.
Paper
doi
link
bibtex
abstract
@article{zweigenbaum_frontiers_2007, title = {Frontiers of biomedical text mining: current progress}, volume = {8}, issn = {1477-4054}, shorttitle = {Frontiers of biomedical text mining}, url = {http://www.ncbi.nlm.nih.gov/pubmed/17977867}, doi = {10.1093/bib/bbm045}, abstract = {It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or 'BioNLP' in general, focusing primarily on papers published within the past year.}, number = {5}, urldate = {2009-03-03}, journal = {Briefings in Bioinformatics}, author = {Zweigenbaum, Pierre and Demner-Fushman, Dina and Yu, Hong and Cohen, Kevin B}, month = sep, year = {2007}, pmid = {17977867}, keywords = {Abstracting and Indexing as Topic, Biology, Databases, Bibliographic, Forecasting, Periodicals as Topic, Vocabulary, Controlled}, pages = {358--75}, }
It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or 'BioNLP' in general, focusing primarily on papers published within the past year.
2006
(7)
The semantics of a definiendum constrains both the lexical semantics and the lexicosyntactic patterns in the definiens.
Yu, H.; and Wei, Y.
In Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology at HLT-NAACL, pages 1–8, New York, USA, 2006.
Paper
link
bibtex
@inproceedings{yu_semantics_2006, address = {New York, USA}, title = {The semantics of a definiendum constrains both the lexical semantics and the lexicosyntactic patterns in the definiens}, url = {https://dl.acm.org/citation.cfm?id=1567621}, booktitle = {Proceedings of the {BioNLP} {Workshop} on {Linking} {Natural} {Language} {Processing} and {Biology} at {HLT}-{NAACL}}, author = {Yu, H. and Wei, Y.}, year = {2006}, pages = {1--8}, }
Towards answering biological questions with experimental evidence: automatically identifying text that summarize image content in full-text articles.
Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 834–838. 2006.
Paper
link
bibtex
abstract
@article{yu_towards_2006, title = {Towards answering biological questions with experimental evidence: automatically identifying text that summarize image content in full-text articles}, issn = {1942-597X}, shorttitle = {Towards answering biological questions with experimental evidence}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839512/}, abstract = {Images (i.e., figures) are important experimental evidence that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or to test novel research hypotheses. We propose to build a biological question answering system that provides experimental evidences as answers in response to biological questions. As a first step, we develop natural language processing techniques to identify sentences that summarize image content.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Yu, Hong}, year = {2006}, pmid = {17238458}, pmcid = {PMC1839512}, keywords = {Algorithms, Biological Science Disciplines, Information Storage and Retrieval, Medical Illustration, natural language processing}, pages = {834--838}, }
Images (i.e., figures) are important experimental evidence that is typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or to test novel research hypotheses. We propose to build a biological question answering system that provides experimental evidence as answers in response to biological questions. As a first step, we develop natural language processing techniques to identify sentences that summarize image content.
Accessing bioscience images from abstract sentences.
Yu, H.; and Lee, M.
Bioinformatics, 22(14): e547–e556. July 2006.
PMID: 16873519
Paper
doi
link
bibtex
abstract
@article{yu_accessing_2006, title = {Accessing bioscience images from abstract sentences}, volume = {22}, issn = {1367-4803, 1460-2059}, url = {http://bioinformatics.oxfordjournals.org/content/22/14/e547}, doi = {10.1093/bioinformatics/btl261}, abstract = {Images (e.g., figures) are important experimental results that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or to test novel research hypotheses. On the other hand, biologists live in an age of information explosion. As thousands of biomedical articles are published every day, systems that help biologists efficiently access images in literature would greatly facilitate biomedical research. We hypothesize that much of image content reported in a full-text article can be summarized by the sentences in the abstract of the article. In our study, more than one hundred biologists had tested this hypothesis and more than 40 biologists had evaluated a novel user-interface BioEx that allows biologists to access images directly from abstract sentences. Our results show that 87.8\% biologists were in favor of BioEx over two other baseline user-interfaces. We further developed systems that explored hierarchical clustering algorithms to automatically identify abstract sentences that summarize the images. One of the systems achieves a precision of 100\% that corresponds to a recall of 4.6\%. Contact:hongyu@uwm.edu or hy52@columbia.edu}, language = {en}, number = {14}, urldate = {2013-12-31}, journal = {Bioinformatics}, author = {Yu, Hong and Lee, Minsuk}, month = jul, year = {2006}, note = {00039 PMID: 16873519}, pages = {e547--e556}, }
Images (e.g., figures) are important experimental results that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or to test novel research hypotheses. On the other hand, biologists live in an age of information explosion. As thousands of biomedical articles are published every day, systems that help biologists efficiently access images in the literature would greatly facilitate biomedical research. We hypothesize that much of the image content reported in a full-text article can be summarized by the sentences in the abstract of the article. In our study, more than one hundred biologists tested this hypothesis, and more than 40 biologists evaluated a novel user interface, BioEx, that allows biologists to access images directly from abstract sentences. Our results show that 87.8% of the biologists were in favor of BioEx over two other baseline user interfaces. We further developed systems that explored hierarchical clustering algorithms to automatically identify abstract sentences that summarize the images. One of the systems achieves a precision of 100% at a recall of 4.6%.
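A sketch of the clustering idea: represent abstract sentences as TF-IDF vectors and cut an agglomerative dendrogram into groups. The features, linkage, and sentences below are illustrative assumptions, not the paper's configuration.

# Sketch: hierarchically cluster abstract sentences by lexical similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

sentences = [
    "Protein ABC1 localizes to the nucleus.",
    "Nuclear localization of protein ABC1 is shown by immunofluorescence.",
    "Knockdown of the kinase reduces cell growth.",
]
vectors = TfidfVectorizer().fit_transform(sentences).toarray()
tree = linkage(vectors, method="average", metric="cosine")
print(fcluster(tree, t=2, criterion="maxclust"))
# e.g., [1 1 2] -- the two localization sentences group together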
Beyond information retrieval–medical question answering.
Lee, M.; Cimino, J.; Zhu, H. R.; Sable, C.; Shanker, V.; Ely, J.; and Yu, H.
AMIA ... Annual Symposium proceedings. AMIA Symposium, 469–473. 2006.
Paper
link
bibtex
abstract
@article{lee_beyond_2006, title = {Beyond information retrieval--medical question answering}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839371/}, abstract = {Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user's query. Frequently the number of returned documents is large and makes physicians' information seeking "practical only 'after hours' and not in the clinical settings". Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians' information needs and described the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long term goal is to enable MedQA to answer all types of medical questions, currently, we implemented MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., "What is X?"). MedQA can be accessed at http://www.dbmi.columbia.edu/{\textasciitilde}yuh9001/research/MedQA.html.}, language = {ENG}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Lee, Minsuk and Cimino, James and Zhu, Hai R. and Sable, Carl and Shanker, Vijay and Ely, John and Yu, Hong}, year = {2006}, pmid = {17238385}, pmcid = {PMC1839371}, keywords = {Decision Support Techniques, Humans, Information Storage and Retrieval, Internet, MEDLINE, Physicians, Pilot Projects, expert systems, natural language processing}, pages = {469--473}, }
Physicians have many questions when caring for patients, and frequently need to seek answers to their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user's query. Frequently the number of returned documents is large and makes physicians' information seeking "practical only 'after hours' and not in the clinical settings". Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians' information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, currently MedQA integrates information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., "What is X?"). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html.
BioEx: a novel user-interface that accesses images from abstract sentences.
Yu, H.; and Lee, M.
In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 189–192, 2006. Association for Computational Linguistics
Paper
link
bibtex
@inproceedings{yu_bioex:_2006, title = {{BioEx}: a novel user-interface that accesses images from abstract sentences}, url = {http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=EC9B0313E8DBD1F5C07A06DA15ED2944?doi=10.1.1.95.5862&rep=rep1&type=pdf}, booktitle = {Proceedings of the {Human} {Language} {Technology} {Conference} of the {NAACL}, {Companion} {Volume}: {Short} {Papers}}, publisher = {Association for Computational Linguistics}, author = {Yu, Hong and Lee, Minsuk}, year = {2006}, pages = {189--192}, }
Exploring supervised and unsupervised methods to detect topics in biomedical text.
Lee, M.; Wang, W.; and Yu, H.
BMC bioinformatics, 7: 140. March 2006.
Paper
doi
link
bibtex
abstract
@article{lee_exploring_2006, title = {Exploring supervised and unsupervised methods to detect topics in biomedical text}, volume = {7}, issn = {1471-2105}, url = {http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-140}, doi = {10.1186/1471-2105-7-140}, abstract = {BACKGROUND: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks including information retrieval, text summarization and question answering; and is a necessary step towards the building of an information system that provides an efficient way for biologists to seek information from an ocean of literature. RESULTS: We have explored the methods of Topic Spotting, a task of text categorization that applies the supervised machine-learning technique naïve Bayes to assign automatically a document into one or more predefined topics; and Topic Clustering, which apply unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect topics of more than fifteen thousand of articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag of words as the features. Additionally, we have explored semantic features; namely, the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records, and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate the tasks of topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection and the naïve Bayes has the highest accuracy, 66.4\%, for predicting the topic of an OMIM article as one of the total twenty-five topics. CONCLUSION: Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantages of being robust and applicable in real world settings.}, language = {ENG}, journal = {BMC bioinformatics}, author = {Lee, Minsuk and Wang, Weiqing and Yu, Hong}, month = mar, year = {2006}, pmid = {16539745}, pmcid = {PMC1472693}, keywords = {Abstracting and Indexing as Topic, Abstracting and Indexing/methods *Artificial Intelligence Humans *Medline *Natural Language Processing Pattern Recognition, Artificial Intelligence, Automated/*methods *Periodicals *Terminology *Vocabulary, Controlled, Humans, MEDLINE, Pattern Recognition, Automated, Periodicals as Topic, Terminology as Topic, Vocabulary, Controlled, natural language processing}, pages = {140}, }
BACKGROUND: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks, including information retrieval, text summarization and question answering, and is a necessary step towards building an information system that provides an efficient way for biologists to seek information from an ocean of literature. RESULTS: We have explored the methods of Topic Spotting, a task of text categorization that applies the supervised machine-learning technique naïve Bayes to automatically assign a document into one or more predefined topics, and Topic Clustering, which applies unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect the topics of more than fifteen thousand articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag of words as the features. Additionally, we have explored semantic features, namely the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection, and that naïve Bayes has the highest accuracy, 66.4%, for predicting the topic of an OMIM article as one of twenty-five topics. CONCLUSION: Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantage of being robust and applicable in real-world settings.
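The feature combination described above can be sketched by appending tagged MeSH tokens to the word features before a multinomial naïve Bayes classifier. The documents, MeSH terms, and topic labels below are invented for illustration.

# Sketch: topic spotting with naive Bayes over bag-of-words plus
# MeSH-term features (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    ("enzyme kinetics of hexokinase", ["Kinetics", "Hexokinase"], "biochemistry"),
    ("alpha helix packing in membranes", ["Protein Structure"], "protein structure"),
    ("substrate binding and catalysis", ["Catalysis"], "biochemistry"),
]
# Append MeSH terms as distinct tokens so they extend the word features.
texts = [text + " " + " ".join("MESH_" + t.replace(" ", "_") for t in mesh)
         for text, mesh, _ in docs]
labels = [topic for _, _, topic in docs]

vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)
test = "catalysis by hexokinase MESH_Catalysis"
print(clf.predict(vectorizer.transform([test])))  # ['biochemistry']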
Exploring text and image features to classify images in bioscience literature.
Rafkind, B.; Lee, M.; Chang, S.; and Yu, H.
In Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pages 73, 2006. Association for Computational Linguistics
Paper
doi
link
bibtex
@inproceedings{rafkind_exploring_2006, title = {Exploring text and image features to classify images in bioscience literature}, url = {http://portal.acm.org/citation.cfm?doid=1654415.1654428}, doi = {10.3115/1654415.1654428}, language = {en}, urldate = {2016-11-30}, booktitle = {Proceedings of the {HLT}-{NAACL} {BioNLP} {Workshop} on {Linking} {Natural} {Language} and {Biology}}, publisher = {Association for Computational Linguistics}, author = {Rafkind, Barry and Lee, Minsuk and Chang, Shih-Fu and Yu, Hong}, year = {2006}, pages = {73}, }
2004
(2)
Using MEDLINE as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles.
Yu, H.; Kim, W.; Hatzivassiloglou, V.; and Wilbur, W. J.
In Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS 2004), pages 27–32, June 2004. IEEE
doi link bibtex abstract
@inproceedings{yu_using_2004, title = {Using {MEDLINE} as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles}, isbn = {0-7695-2104-5}, doi = {10.1109/CBMS.2004.1311686}, abstract = {Biomedical abbreviations and acronyms are widely used in biomedical literature. Since many abbreviations represent important content in biomedical literature, information retrieval and extraction benefits from identifying the meanings of biomedical abbreviations. Since many abbreviations are ambiguous, it would be important to map abbreviations to their full forms, which ultimately represent the meanings of the abbreviations. In this study, we present a novel unsupervised method that applies MEDLINE records as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles. We first automatically generated from MEDLINE records a knowledge source or dictionary of abbreviation-full pairs. We then trained on MEDLINE records and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in an unsupervised fashion. We report up to 92\% prediction precision and up to 91\% coverage.}, booktitle = {Computer-{Based} {Medical} {Systems}, 2004. {CBMS} 2004. {Proceedings}. 17th {IEEE} {Symposium} on}, publisher = {IEEE}, author = {Yu, Hong and Kim, Won and Hatzivassiloglou, Vasileios and John Wilbur, W}, month = jun, year = {2004}, pages = {27--32}, }
Biomedical abbreviations and acronyms are widely used in the biomedical literature. Since many abbreviations represent important content, information retrieval and extraction benefit from identifying their meanings. Since many abbreviations are ambiguous, it is important to map abbreviations to their full forms, which ultimately represent the meanings of the abbreviations. In this study, we present a novel unsupervised method that applies MEDLINE records as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles. We first automatically generated from MEDLINE records a knowledge source, or dictionary, of abbreviation-full pairs. We then trained on MEDLINE records and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in an unsupervised fashion. We report up to 92% prediction precision and up to 91% coverage.
GeneWays: a system for extracting, analyzing, visualizing, and integrating molecular pathway data.
Rzhetsky, A.; Iossifov, I.; Koike, T.; Krauthammer, M.; Kra, P.; Morris, M.; Yu, H.; Duboué, P. A.; Weng, W.; Wilbur, W. J.; Hatzivassiloglou, V.; and Friedman, C.
Journal of Biomedical Informatics, 37(1): 43–53. February 2004.
doi link bibtex abstract
@article{rzhetsky_geneways:_2004, title = {{GeneWays}: a system for extracting, analyzing, visualizing, and integrating molecular pathway data}, volume = {37}, issn = {1532-0464}, shorttitle = {{GeneWays}}, doi = {10.1016/j.jbi.2003.10.001}, abstract = {The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction of information using natural-language processing, information visualization, and generation of specialized knowledge bases for molecular biology. GeneWays is an integrated system that combines several such subtasks. It analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks. GeneWays is designed as an open platform, allowing researchers to query, review, and critique stored information.}, language = {eng}, number = {1}, journal = {Journal of Biomedical Informatics}, author = {Rzhetsky, Andrey and Iossifov, Ivan and Koike, Tomohiro and Krauthammer, Michael and Kra, Pauline and Morris, Mitzi and Yu, Hong and Duboué, Pablo Ariel and Weng, Wubin and Wilbur, W. John and Hatzivassiloglou, Vasileios and Friedman, Carol}, month = feb, year = {2004}, pmid = {15016385}, note = {00251 }, keywords = {*Artificial Intelligence Comparative Study Computer Graphics Database Management Systems Databases, Artificial Intelligence, Bioinformatics, Computer Graphics, Controlled, Database, Database Management Systems, Databases, Factual, Documentation, Factual Documentation/methods Gene Expression Regulation/physiology Information Storage and Retrieval/*methods Internet Metabolism/*physiology *Natural Language Processing *Periodicals Research Support, Gene Expression Regulation, Information Storage and Retrieval, Information extraction, Internet, Knowledge engineering, Metabolism, Molecular interactions, Molecular networks, Non-P.H.S. Research Support, P.H.S. Signal Transduction/physiology *Software *User-Computer Interface Vocabulary, Periodicals as Topic, Signal Transduction, Software, Text mining, U.S. Gov't, User-Computer Interface, Vocabulary, Controlled, natural language processing}, pages = {43--53}, }
The immense growth in the volume of research literature and experimental data in the field of molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, or automated extraction of information using natural-language processing, information visualization, and generation of specialized knowledge bases for molecular biology. GeneWays is an integrated system that combines several such subtasks. It analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks. GeneWays is designed as an open platform, allowing researchers to query, review, and critique stored information.
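GeneWays is described here as an integrated system rather than a single algorithm, so no code can be taken from the abstract; the toy Python sketch below only illustrates its stated idea of inferring a consensus view from redundant interaction statements gathered from multiple sources. The triples and the vote threshold are invented for illustration.

from collections import Counter

# Toy interaction statements, one per extracting source; repeated
# statements act as independent votes for the same interaction.
statements = [
    ("RAF1", "phosphorylates", "MEK1"),
    ("RAF1", "phosphorylates", "MEK1"),
    ("RAF1", "binds", "MEK1"),
    ("MEK1", "phosphorylates", "ERK2"),
]

def consensus(statements, min_votes=2):
    """Keep interactions asserted by at least `min_votes` sources,
    a toy stand-in for consensus over redundant extractions."""
    votes = Counter(statements)
    return [triple for triple, n in votes.items() if n >= min_votes]

print(consensus(statements))
# [('RAF1', 'phosphorylates', 'MEK1')]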
2003
(2)
Extracting synonymous gene and protein terms from biological literature.
Yu, H.; and Agichtein, E.
Bioinformatics, 19(Suppl 1): i340–i349. July 2003.
Paper
doi
link
bibtex
abstract
@article{yu_extracting_2003, title = {Extracting synonymous gene and protein terms from biological literature}, volume = {19}, issn = {1367-4803, 1460-2059}, url = {https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btg1047}, doi = {10.1093/bioinformatics/btg1047}, abstract = {MOTIVATION: Genes and proteins are often associated with multiple names. More names are added as new functional or structural information is discovered. Because authors can use any one of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms of the same substance. RESULTS: We have explored four complementary approaches for extracting gene and protein synonyms from text, namely the unsupervised, partially supervised, and supervised machine-learning techniques, as well as the manual knowledge-based approach. We report results of a large scale evaluation of these alternatives over an archive of biological journal articles. Our evaluation shows that our extraction techniques could be a valuable supplement to resources such as SWISSPROT, as our systems were able to capture gene and protein synonyms not listed in the SWISSPROT database.}, language = {en}, number = {Suppl 1}, urldate = {2016-11-30}, journal = {Bioinformatics}, author = {Yu, H. and Agichtein, E.}, month = jul, year = {2003}, pmid = {12855479}, keywords = {Abstracting and Indexing/*methods/standards Acetaminophen Algorithms Biology/methods/standards Computational Biology/methods/standards Database Management Systems *Databases, Bibliographic Documentation *Genes Information Storage and Retrieval/methods/standards *Natural Language Processing *Periodicals *Proteins Research Support, Controlled, Non-P.H.S. *Terminology Vocabulary, U.S. Gov't}, pages = {i340--i349}, }
MOTIVATION: Genes and proteins are often associated with multiple names. More names are added as new functional or structural information is discovered. Because authors can use any one of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms of the same substance. RESULTS: We have explored four complementary approaches for extracting gene and protein synonyms from text, namely the unsupervised, partially supervised, and supervised machine-learning techniques, as well as the manual knowledge-based approach. We report results of a large scale evaluation of these alternatives over an archive of biological journal articles. Our evaluation shows that our extraction techniques could be a valuable supplement to resources such as SWISSPROT, as our systems were able to capture gene and protein synonyms not listed in the SWISSPROT database.
Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences.
Yu, H.; and Hatzivassiloglou, V.
In Proceedings of the 2003 conference on Empirical methods in natural language processing, volume 10, pages 129–136, 2003. Association for Computational Linguistics
Paper
doi
link
bibtex
@inproceedings{yu_towards_2003, title = {Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences}, volume = {10}, shorttitle = {Towards answering opinion questions}, url = {http://portal.acm.org/citation.cfm?doid=1119355.1119372}, doi = {10.3115/1119355.1119372}, language = {en}, urldate = {2016-11-30}, booktitle = {Proceedings of the 2003 conference on {Empirical} methods in natural language processing}, publisher = {Association for Computational Linguistics}, author = {Yu, Hong and Hatzivassiloglou, Vasileios}, year = {2003}, pages = {129--136}, }
2002
(3)
Mapping Abbreviations to Full Forms in Biomedical Articles.
Yu, H.
Journal of the American Medical Informatics Association, 9(3): 262–272. May 2002.
Paper
doi
link
bibtex
@article{yu_mapping_2002, title = {Mapping {Abbreviations} to {Full} {Forms} in {Biomedical} {Articles}}, volume = {9}, issn = {10675027, 1527974X}, url = {http://jamia.oxfordjournals.org/cgi/doi/10.1197/jamia.M0913}, doi = {10.1197/jamia.M0913}, number = {3}, urldate = {2016-11-30}, journal = {Journal of the American Medical Informatics Association}, author = {Yu, H.}, month = may, year = {2002}, pmcid = {PMC344586 PMID: 11971887}, keywords = {*Abbreviations Databases *Proteins Research Support, Non-P.H.S. Research Support, P.H.S. *Software Terminology, U.S. Gov't}, pages = {262--272}, }
Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.
Yu, H.; Hatzivassiloglou, V.; Friedman, C.; Rzhetsky, A.; and Wilbur, W. J.
Proceedings. AMIA Symposium, 919–923. 2002.
Paper
link
bibtex
abstract
@article{yu_automatic_2002, title = {Automatic extraction of gene and protein synonyms from {MEDLINE} and journal articles}, issn = {1531-605X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244511/}, abstract = {Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefits from identifying these synonymous names. We have developed a method to extract automatically synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes the patterns and extracts from MEDLINE abstracts and full-text journal articles candidate synonymous terms. SGPE then applies a sequence of filters that automatically screen out those terms that are not gene and protein names. We evaluated our method to have an overall precision of 71\% on both MEDLINE and journal articles, and 90\% precision on the more suitable full-text articles alone}, language = {ENG}, journal = {Proceedings. AMIA Symposium}, author = {Yu, Hong and Hatzivassiloglou, Vasileios and Friedman, Carol and Rzhetsky, Andrey and Wilbur, W. John}, year = {2002}, pmid = {12463959}, pmcid = {PMC2244511}, keywords = {Automatic Data Processing, Genes, Information Storage and Retrieval, MEDLINE, Names, Pattern Recognition, Automated, Periodicals as Topic, Proteins, Software}, pages = {919--923}, }
Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefits from identifying these synonymous names. We have developed a method to extract automatically synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes the patterns and extracts from MEDLINE abstracts and full-text journal articles candidate synonymous terms. SGPE then applies a sequence of filters that automatically screen out those terms that are not gene and protein names. We evaluated our method to have an overall precision of 71% on both MEDLINE and journal articles, and 90% precision on the more suitable full-text articles alone.
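The abstract names SGPE's two stages, pattern matching followed by a filter cascade, without reproducing either. As a hedged illustration only, the Python sketch below pairs two invented synonym-listing patterns with a crude symbol filter; SGPE's actual pattern set and filters are richer and are not given in the abstract.

import re

# Illustrative synonym-listing patterns; the real SGPE pattern set is
# not reproduced in the abstract.
SYNONYM_PATTERNS = [
    re.compile(r"(\w[\w-]*) \(also (?:known as|called) (\w[\w-]*)\)"),
    re.compile(r"(\w[\w-]*), also (?:known as|called) (\w[\w-]*)"),
]

def looks_like_gene_name(term):
    """Crude stand-in for a filter cascade: keep short tokens containing
    at least one uppercase letter, as gene/protein symbols often do."""
    return 2 <= len(term) <= 10 and any(c.isupper() for c in term)

def extract_synonyms(text):
    """Return (name, synonym) pairs that match a pattern and pass the filter."""
    pairs = []
    for pattern in SYNONYM_PATTERNS:
        for a, b in pattern.findall(text):
            if looks_like_gene_name(a) and looks_like_gene_name(b):
                pairs.append((a, b))
    return pairs

print(extract_synonyms("TNFRSF6 (also known as FAS) mediates apoptosis."))
# [('TNFRSF6', 'FAS')]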
Automatically identifying gene/protein terms in MEDLINE abstracts.
Yu, H.; Hatzivassiloglou, V.; Rzhetsky, A.; and Wilbur, W. J.
Journal of Biomedical Informatics, 35(5-6): 322–330. December 2002.
Paper
link
bibtex
abstract
@article{yu_automatically_2002, title = {Automatically identifying gene/protein terms in {MEDLINE} abstracts}, volume = {35}, issn = {1532-0464}, url = {http://www.sciencedirect.com/science/article/pii/S1532046403000327}, abstract = {MOTIVATION: Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, the identification of terms corresponding to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As a part of marking up process, we also generated automatically a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database. Therefore our methods may also be used for automatic lexicon generation. RESULTS: GPmarkup has 73\% recall and 93\% precision in identifying and marking up gene/protein terms in MEDLINE abstracts. AVAILABILITY: A random sample of gene/protein symbols and full names and a sample set of marked up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/. Contact. hy52ATcolumbia.edu. Voice: 212-939-7028; fax: 212-666-0140.}, language = {ENG}, number = {5-6}, journal = {Journal of Biomedical Informatics}, author = {Yu, Hong and Hatzivassiloglou, Vasileios and Rzhetsky, Andrey and Wilbur, W. John}, month = dec, year = {2002}, pmid = {12968781}, keywords = {*Genes, *Medline, *Proteins, *Terminology, Abstracting, Abstracting and Indexing as Topic, Automation, Chromosome, Chromosome Mapping, Genes, MEDLINE, Mapping/methods, Proteins, Terminology as Topic, and, indexing}, pages = {322--330}, }
MOTIVATION: Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, the identification of terms corresponding to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As part of the markup process, we also automatically generated a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database. Therefore, our methods may also be used for automatic lexicon generation. RESULTS: GPmarkup has 73% recall and 93% precision in identifying and marking up gene/protein terms in MEDLINE abstracts. AVAILABILITY: A random sample of gene/protein symbols and full names and a sample set of marked up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/. Contact. hy52ATcolumbia.edu. Voice: 212-939-7028; fax: 212-666-0140.
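The LARD example above suggests aligning a symbol against the word-initial letters of its candidate full name. The Python sketch below implements that alignment as a toy check; it is an assumption for illustration, not GPmarkup's actual pairing procedure.

def symbol_matches_full_name(symbol, full_name):
    """Toy check: can `symbol` be read off the word-initial letters of
    `full_name`, in order? GPmarkup's real matching is more involved.
    """
    initials = [word[0].lower() for word in full_name.split()]
    pos = 0
    for ch in symbol.lower():
        try:
            pos = initials.index(ch, pos) + 1
        except ValueError:
            return False
    return True

print(symbol_matches_full_name("LARD", "lymphocyte associated receptor of death"))
# True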
2001
(2)
Knowledge-based disambiguation of abbreviations.
Yu, H.
In Proceedings of the AMIA Symposium, page 1067, 2001.
Paper
link
bibtex
@inproceedings{yu_knowledge-based_2001, title = {Knowledge-based disambiguation of abbreviations}, url = {https://pantherfile.uwm.edu/hongyu/www/files/articles/D010001419.pdf}, booktitle = {Proceedings of the {AMIA} {Symposium}}, author = {Yu, Hong}, year = {2001}, pmcid = {PMC2243340}, pages = {1067}, }
GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles.
Friedman, C.; Kra, P.; Yu, H.; Krauthammer, M.; and Rzhetsky, A.
Bioinformatics, 17(Suppl 1): S74–S82. June 2001.
Paper
doi
link
bibtex
@article{friedman_genies:_2001, title = {{GENIES}: a natural-language processing system for the extraction of molecular pathways from journal articles}, volume = {17}, issn = {1367-4803, 1460-2059}, shorttitle = {{GENIES}}, url = {https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/17.suppl_1.S74}, doi = {10.1093/bioinformatics/17.suppl_1.S74}, language = {en}, number = {Suppl 1}, urldate = {2016-11-30}, journal = {Bioinformatics}, author = {Friedman, C. and Kra, P. and Yu, H. and Krauthammer, M. and Rzhetsky, A.}, month = jun, year = {2001}, pmid = {11472995}, note = {00521 }, keywords = {Artificial Intelligence, Computational Biology, Molecular Biology, Periodicals as Topic, Pilot Projects, natural language processing}, pages = {S74--S82}, }
2000
(1)
A large scale, cross-disease family health history data set.
Yu, H.; and Hripcsak, G.
Proceedings of the AMIA Symposium, 1162. 2000.
PMC2243911