Published

2008-01-01

AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN

Keywords:


Knowledge Management, Information Extraction, Ontologies, Fuzzy String Searching, Word Sense Disambiguation, Semantic Relatedness

Authors

  • SERGIO G. JIMÉNEZ V., Ing., National University of Colombia, Bogotá Branch
  • FABIO A. GONZÁLEZ O., PhD, National University of Colombia, Bogotá Branch

This paper presents an information extraction method for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it cope with non-standardized acronyms and inconsistent abbreviation styles. We propose a new prefix-sensitive character-level edit distance, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy based on an ontology-based semantic relatedness measure resolves the inherent ambiguity of some entities. The WSD module selects a combination of senses across the entire document that optimizes the document's semantic coherence. Our approach appears to be suitable for extracting information from data-rich documents that each describe only one main object (i.e., a product). Using documents and an ontology from the laptop computer domain, the results showed a precision of 78.9% with a recall of 99.5%.
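The WSD step summarized above, choosing one sense per ambiguous mention so that the document as a whole is as coherent as possible, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy ontology in `PARENT`, the path-length relatedness `1 / (1 + edges)`, the use of summed pairwise relatedness as the "coherence" score, and the exhaustive search over sense combinations are all assumptions made for illustration, and every concept name is hypothetical.

```python
from itertools import product

# Hypothetical toy ontology: each concept maps to its parent; "root" is the top.
PARENT = {
    "hardware": "root",
    "software": "root",
    "processor": "hardware",
    "memory": "hardware",
    "cpu_intel_core": "processor",
    "ram_ddr2": "memory",
    "os_windows": "software",
    "app_office": "software",
}


def ancestors(concept):
    """Return the path from a concept up to the root, including the concept."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def relatedness(a, b):
    """Assumed path-length relatedness: 1 / (1 + number of edges between a and b)."""
    depth_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(ancestors(a)):
        if c in depth_in_b:  # first shared ancestor on a's upward path
            return 1.0 / (1 + i + depth_in_b[c])
    return 0.0  # no shared ancestor (cannot happen in a single-rooted tree)


def disambiguate(mentions):
    """Pick one sense per mention, maximizing the sum of pairwise relatedness.

    mentions: a list with one list of candidate senses per term mention.
    Exhaustive search (an assumption): only viable for a few ambiguous mentions.
    """
    best_combo, best_score = None, float("-inf")
    for combo in product(*mentions):
        score = sum(
            relatedness(combo[i], combo[j])
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


if __name__ == "__main__":
    # The second mention is ambiguous; its neighbor decides which sense wins.
    print(disambiguate([["cpu_intel_core"], ["ram_ddr2", "app_office"]]))
    print(disambiguate([["os_windows"], ["ram_ddr2", "app_office"]]))
```

In this toy setting, the ambiguous second mention resolves to `ram_ddr2` when it appears next to a processor sense, and to `app_office` when it appears next to an operating-system sense. Because the number of combinations grows exponentially with the number of ambiguous mentions, a practical system would need a smarter search; the abstract does not state which optimization strategy the authors use.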

How to Cite

APA

JIMÉNEZ V., S. G., & GONZÁLEZ O., F. A. (2008). AN ONTOLOGY-BASED INFORMATION EXTRACTOR FOR DATA-RICH DOCUMENTS IN THE INFORMATION TECHNOLOGY DOMAIN. Avances en Sistemas e Informática, 5(1). https://revistas.unal.edu.co/index.php/avances/article/view/9972
