Published

2008-01-01

AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN

Keywords:

Knowledge Management, Information Extraction, Ontologies, Fuzzy String Searching, Word Sense Disambiguation, Semantic Relatedness

Authors

  • SERGIO G. JIMÉNEZ V., Ing., National University of Colombia - Branch Bogota
  • FABIO A. GONZÁLEZ O., PhD, National University of Colombia - Branch Bogota
This paper presents an information extraction method, suitable for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it cope with non-standardized acronyms and inconsistent abbreviation styles. We propose a new character-level edit distance that is sensitive to prefixes, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy using an ontology-based semantic relatedness measure resolves the inherent ambiguity of some entities. The WSD module finds a sense combination over the whole document that optimizes the document's semantic coherence. Our approach seems suitable for extracting information from data-rich documents that describe only one main object (i.e., a product) per document. The results showed a precision of 78.9% with 99.5% recall using documents and an ontology from the laptop computer domain.
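
The abstract does not define the root distance, so the following is only a rough illustration of what "an edit distance sensitive to prefixes" could look like. It is a generic Levenshtein-style distance in which edits near the start of a string cost more than edits near the end. The function name `prefix_weighted_distance`, the `decay` parameter, and the geometric weighting are all assumptions made for this sketch and are not taken from the paper.

```python
def prefix_weighted_distance(a: str, b: str, decay: float = 0.7) -> float:
    """Levenshtein-style distance where edits near the start cost more.

    Hypothetical illustration only: this is NOT the paper's root distance.
    An edit at 0-based position p costs decay**p, so with decay < 1
    a change in the prefix is penalized more than a change in the suffix.
    """
    def cost(pos: int) -> float:
        return decay ** pos

    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(i - 1)      # delete a[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(j - 1)      # insert b[j-1]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = a[i - 1] == b[j - 1]
            substitute = 0.0 if same else cost(min(i, j) - 1)
            d[i][j] = min(
                d[i - 1][j] + cost(i - 1),        # deletion
                d[i][j - 1] + cost(j - 1),        # insertion
                d[i - 1][j - 1] + substitute,     # match or substitution
            )
    return d[n][m]


if __name__ == "__main__":
    # Compare a truncated form and a suffix-only form of the same term.
    for candidate in ("giga", "byte"):
        print(candidate, prefix_weighted_distance("gigabyte", candidate))
```

Under this weighting, a truncated form that keeps the prefix ("giga" against "gigabyte") scores as closer than a form that keeps only the suffix ("byte"), which is the kind of behavior one might want when matching inconsistently abbreviated terms.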

How to cite

APA

JIMÉNEZ V., S. G., & GONZÁLEZ O., F. A. (2008). AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN. Avances en Sistemas e Informática, 5(1). https://revistas.unal.edu.co/index.php/avances/article/view/9972
