Published

2008-01-01

AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN

Keywords:


Knowledge Management, Information Extraction, Ontologies, Fuzzy String Searching, Word Sense Disambiguation, Semantic Relatedness

Authors

  • SERGIO G. JIMÉNEZ V. Ing. National University of Colombia - Branch Bogota
  • FABIO A. GONZÁLEZ O. PhD. National University of Colombia - Branch Bogota
This paper presents an information extraction method for data-rich documents that is based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it handle non-standardized acronyms and inconsistent abbreviation styles. We propose a new prefix-sensitive character-level edit distance, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy that uses an ontology-based semantic relatedness measure resolves the inherent ambiguity of some entities. The WSD module finds a combination of senses over the whole document that optimizes the document's semantic coherence. Our approach seems suitable for extracting information from data-rich documents that describe only one main object (i.e., a product) per document. The results showed a precision of 78.9% with 99.5% recall on documents and an ontology from the laptop computer domain.

How to Cite

APA

JIMÉNEZ V., S. G. and GONZÁLEZ O., F. A. (2008). AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN. Avances en Sistemas e Informática, 5(1). https://revistas.unal.edu.co/index.php/avances/article/view/9972
