Keywords: nuclear knowledge management, hierarchical document clustering, frequent item set clustering, taxonomy, concept hierarchy, text mining, concept sets, semantic relationships, Brazil
A semi-automatic method for extracting a taxonomy for nuclear knowledge using hierarchical document clustering based on concept sets
In this paper, we present a text mining approach for the semi-automatic extraction of a taxonomy of concepts for nuclear knowledge and evaluate the results it achieves. Taxonomies are a fundamental part of any knowledge management strategy or framework. We propose a method for hierarchical document clustering based on the notion of frequent concept sets. Most clustering algorithms treat documents as bags of words and ignore important relationships between words, such as synonymy. Our method instead takes the semantic relationships between words into account and uses a domain thesaurus (ETDE/INIS) to identify concepts. To validate the method, we conducted a case study in which we implemented a prototype that generates a taxonomy for nuclear knowledge, with the goal of conceptually mapping the scientific production of the Brazilian Nuclear Energy Commission (CNEN).
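To make the notion of frequent concept sets concrete, the following is a minimal sketch and not the prototype described in this paper. It assumes a tiny hypothetical thesaurus (the `THESAURUS` dictionary stands in for ETDE/INIS and its entries are invented for illustration), naive whitespace tokenization, and brute-force enumeration of concept sets in place of a proper frequent item set miner. The function names `to_concepts` and `frequent_concept_sets` are likewise assumptions.

```python
from itertools import combinations

# Hypothetical toy thesaurus (assumption): surface term -> preferred concept.
# A real system would look terms up in a domain thesaurus such as ETDE/INIS.
THESAURUS = {
    "reactor": "nuclear reactors",
    "reactors": "nuclear reactors",
    "pile": "nuclear reactors",
    "uranium": "uranium",
    "u-235": "uranium",
    "dosimetry": "dosimetry",
    "dose": "dosimetry",
}


def to_concepts(text):
    """Map a document to the set of thesaurus concepts it mentions."""
    tokens = text.lower().split()
    return {THESAURUS[t] for t in tokens if t in THESAURUS}


def frequent_concept_sets(docs, min_support, max_size=2):
    """Return {concept set: indices of covering documents} for every
    concept set whose share of documents is at least min_support."""
    concept_docs = [to_concepts(d) for d in docs]
    vocabulary = sorted(set().union(*concept_docs))
    result = {}
    # Brute-force enumeration (assumption): fine for a toy vocabulary only.
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            cover = [i for i, c in enumerate(concept_docs) if set(combo) <= c]
            if len(cover) / len(docs) >= min_support:
                result[frozenset(combo)] = cover
    return result


docs = [
    "reactor fuel uses uranium",
    "pile loaded with u-235",
    "dose measurements near reactors",
    "dosimetry survey",
]
clusters = frequent_concept_sets(docs, min_support=0.5)
```

In this sketch the first two documents share no thesaurus term ("reactor" and "uranium" versus "pile" and "u-235"), yet both map to the same concept set, which is the kind of synonym relationship a bag-of-words representation would miss. Each frequent concept set can then serve as a candidate cluster label, with larger sets nested under their subsets to suggest a hierarchy.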