MARC: Multi-word term extraction from comparable corpora by combining contextual and constituent clues

Multi-word term extraction from comparable corpora by combining contextual and constituent clues

In this paper we present an approach to automatically extract and align multi-word terms from an English-Slovene comparable health corpus. First, the terms are extracted from the corpus for each language separately using a list of user-adjustable morphosyntactic patterns and a term weighting measure...


Permalink:	http://skupnikatalog.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:318229/Details
Host publication:	Proceedings of the Workshop on Building and Using Comparable Corpora (BUCC’12) Istanbul : 2012
Main authors:	Ljubešić, Nikola, computer scientist (-), Vintar, Špela (Author), Fišer, Darja
Material type:	Article
Language:	eng


LEADER	02457naa a2200253uu 4500
008	131111s2012 xx 1 eng\|d
035			\|a (CROSBI)616813
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|9 445 \|a Ljubešić, Nikola, \|c informatičar
245	1	0	\|a Multi-word term extraction from comparable corpora by combining contextual and constituent clues / \|c Ljubešić, Nikola ; Vintar, Špela ; Fišer, Darja.
246	3		\|i Naslov na engleskom: \|a Multi-word term extraction from comparable corpora by combining contextual and constituent clues
300			\|a 143-147 \|f str.
520			\|a In this paper we present an approach to automatically extract and align multi-word terms from an English-Slovene comparable health corpus. First, the terms are extracted from the corpus for each language separately using a list of user-adjustable morphosyntactic patterns and a term weighting measure. Then, the extracted terms are aligned in a bag-of-equivalents fashion with a seed bilingual lexicon. In the extension of the approach we also show that the small general seed lexicon can be enriched with domain-specific vocabulary by harvesting it directly from the comparable corpus, which significantly improves the results of multi-word term mapping. While most previous efforts in bilingual lexicon extraction from comparable corpora have focused on mapping of single words, the proposed technique successfully augments them in that it is able to deal with multi-word terms as well. Since the proposed approach requires minimal knowledge resources, it is easily adaptable for a new language pair or domain, which is one of its biggest advantages.
536			\|a Projekt MZOS \|f 130-1301679-1380
536			\|a Projekt MZOS \|f FP7-248347
546			\|a ENG
690			\|a 5.04
693			\|a bilingual term extraction, comparable corpora, multi-word expressions, constituent clues \|l hrv \|2 crosbi
693			\|a bilingual term extraction, comparable corpora, multi-word expressions, constituent clues \|l eng \|2 crosbi
700	1		\|a Vintar, Špela \|4 aut
700	1		\|a Fišer, Darja \|4 aut
773	0		\|a 5th Workshop on Building and Using Comparable Corpora (BUCC 2012) (26.5.2012. ; Istanbul, Turska) \|t Proceedings of the Workshop on Building and Using Comparable Corpora (BUCC’12) \|d Istanbul : 2012 \|n Rapp, Reinhard ; Tadić, Marko ; Sharoff, Serge ; Zweigenbaum, Pierre \|g str. 143-147
942			\|c RZB \|u 2 \|v Recenzija \|z Znanstveni - Predavanje - CijeliRad \|t 1.08
999			\|c 318229 \|d 318227

