dc.contributor.author: Cruz Díaz, Noa Patricia
dc.contributor.author: Maña López, Manuel Jesús
dc.identifier.citation: Cruz Díaz, N.P., Maña López, M.J.: "An analysis of biomedical tokenization: problems and strategies". In: Sixth International Workshop on Health Text Mining and Information Analysis (Louhi), pages 40–49. Lisbon, Portugal, 17 September 2015 [en_US]
dc.description.abstract: Choosing the right tokenizer is a non-trivial task, especially in the biomedical domain, where it poses additional challenges which, if not resolved, propagate errors through the successive stages of the Natural Language Processing analysis pipeline. This paper aims to identify these problematic cases and analyze the output that a representative and widely used set of tokenizers produces on them. This work will aid the decision-making process of choosing the right strategy according to the downstream application. In addition, it will help developers create accurate tokenization tools or improve existing ones. A total of 14 problematic cases were described, with biomedical samples shown for each of them. The outputs of 12 tokenizers were provided and discussed in relation to the level of agreement among tools.
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.title: An analysis of biomedical tokenization: problems and strategies [en_US]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain

Copyright © 2008-2010. ARIAS MONTANO. Repositorio Institucional de la Universidad de Huelva