Keywords: human blood, lead concentration, quantitative concentration data, literature review, PubMed, risk assessment, toxicoinformatics, automated information retrieval, text mining, search engines, answer set, knowledge base, toxicology literature, open source, online databases
Automated information retrieval for quantitative risk assessment data
The volume of toxicologic literature can be so large that it presents significant challenges to risk assessors tasked with identifying key studies. As a new approach to managing such information, an information specialist and a toxicologist developed an open source text mining computer program consisting of knowledge bases and search algorithms. Quantitative toxicologic data, such as dose levels or risk numbers, are often presented in the abstracts of scientific articles, and online database records, in turn, include full or partial abstracts. We chose to examine records containing human blood lead concentration (HBLC) data. The resulting program (HBLCFinder) searches a record's abstract for lead concentration data, then determines the record's relevance to human blood. After several iterative modifications, we achieved recall (sensitivity), specificity, and precision of 86%, 99%, and 96%, respectively. The approach may be of use to risk assessors who need to identify quantitative data in online database records.
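The two-stage search described above (find lead concentration data in an abstract, then judge relevance to human blood) and the three reported measures can be illustrated with a minimal Python sketch. It is not the HBLCFinder implementation: the regular expressions, the small term sets standing in for knowledge bases, and the function names `has_lead_concentration`, `is_human_blood_relevant`, `classify`, and `evaluate` are all assumptions made for illustration.

```python
import re

# Assumed pattern: a number followed by a mass-per-volume unit such as ug/dL.
CONCENTRATION = re.compile(
    r"\d+(?:\.\d+)?\s*(?:[\u00b5\u03bc]g|ug|mcg|mg)\s*/\s*(?:dL|dl|L|l|mL|ml)\b"
)
# Assumed lead vocabulary; note that "lead" used as a verb also matches.
LEAD = re.compile(r"\b(?:lead|Pb)\b", re.IGNORECASE)

# Hypothetical miniature knowledge bases (a real one would be far larger).
BLOOD_TERMS = {"blood", "serum", "plasma"}
HUMAN_TERMS = {"human", "humans", "children", "adults", "workers",
               "patients", "women", "men", "infants"}


def has_lead_concentration(abstract: str) -> bool:
    """Stage 1: the abstract mentions lead and a concentration value."""
    return bool(LEAD.search(abstract)) and bool(CONCENTRATION.search(abstract))


def is_human_blood_relevant(abstract: str) -> bool:
    """Stage 2: the abstract contains both a blood term and a human term."""
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    return bool(words & BLOOD_TERMS) and bool(words & HUMAN_TERMS)


def classify(abstract: str) -> bool:
    """Flag a record only if it passes both stages."""
    return has_lead_concentration(abstract) and is_human_blood_relevant(abstract)


def evaluate(predicted, actual):
    """Compute recall, specificity, and precision from boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)

    def ratio(numerator, denominator):
        # Undefined measures are reported as NaN rather than raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "recall": ratio(tp, tp + fn),       # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "precision": ratio(tp, tp + fp),    # TP / (TP + FP)
    }
```

In this sketch, an abstract such as "Mean blood lead levels in children were 12.5 ug/dL." passes both stages, whereas "Blood lead in rats was 20 ug/dL." passes the first stage but fails the second because no human term is present.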