MARC: Building and using comparable corpora for domain-specific bilingual lexicon extraction

This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related...

Permalink:	http://skupnikatalog.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:317597/Details
Parent publication:	4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web. Portland : Association for Computational Linguistics, 2011
Main authors:	Fišer, Darja (-), Vintar, Špela (Author), Pollak, Senja, Ljubešić, Nikola, computer scientist
Material type:	Article
Language:	eng
Online access:	http://aclweb.org/anthology-new/W/W11/W11-1200.pdf


LEADER	02353naa a2200265uu 4500
008	131111s2011 xx 1 eng\|d
035			\|a (CROSBI)552816
040			\|a HR-ZaFF \|b hrv \|c HR-ZaFF \|e ppiak
100	1		\|a Fišer, Darja
245	1	0	\|a Building and using comparable corpora for domain-specific bilingual lexicon extraction / \|c Fišer, Darja ; Ljubešić, Nikola ; Vintar, Špela ; Pollak, Senja.
246	3		\|i Naslov na engleskom: \|a Building and using comparable corpora for domain-specific bilingual lexicon extraction
300			\|a 19-26 \|f str.
520			\|a This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation of the results shows that a 2-way translation of context vectors significantly improves precision of the extracted translation equivalents. We also show that it is sufficient to increase the corpus for one language in order to obtain a higher recall, and that the increase of the number of new words is linear in the size of the corpus. Finally, we demonstrate that by lowering the frequency threshold for context vectors, the drop in precision is much slower than the increase of recall.
536			\|a Projekt MZOS \|f 130-1301679-1380
546			\|a ENG
690			\|a 5.04
693			\|a comparable corpora, bilingual lexicon extraction, domain lexicons \|l hrv \|2 crosbi
693			\|a comparable corpora, bilingual lexicon extraction, domain lexicons \|l eng \|2 crosbi
700	1		\|a Vintar, Špela \|4 aut
700	1		\|a Pollak, Senja \|4 aut
700	1		\|9 445 \|a Ljubešić, Nikola, \|c informatičar \|4 aut
773	0		\|a 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web (24.7.2011. ; Portland, SAD) \|t 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web \|d Portland : Association for Computational Linguistics, 2011 \|g str. 19-26
856			\|u http://aclweb.org/anthology-new/W/W11/W11-1200.pdf
942			\|c RZB \|u 2 \|v Recenzija \|z Znanstveni - Predavanje - CijeliRad \|t 1.08
999			\|c 317597 \|d 317595
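
The abstract in field 520 says the lexicon was extracted "by comparing context vectors in the two languages". The record gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes raw co-occurrence counts as vector weights, cosine as the similarity measure, a fixed context window, and a small seed dictionary for translating source-language context words; all function names and the toy data are hypothetical. It shows translation in one direction only and does not reproduce the 2-way translation the abstract evaluates.

```python
from collections import Counter
from math import sqrt


def context_vectors(tokens, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = {}
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def translate_vector(vector, seed):
    """Map context words into the target language; drop words not in the seed."""
    translated = Counter()
    for word, count in vector.items():
        if word in seed:
            translated[seed[word]] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(source_word, src_vectors, tgt_vectors, seed):
    """Rank target-language words by similarity to the translated source vector."""
    translated = translate_vector(src_vectors[source_word], seed)
    scores = [(cosine(translated, vec), tgt) for tgt, vec in tgt_vectors.items()]
    return sorted(scores, reverse=True)


# Toy data (assumed for illustration; not taken from the paper's corpus).
src_vectors = {"heart": Counter({"blood": 2, "doctor": 1})}
tgt_vectors = {
    "srce": Counter({"kri": 2, "zdravnik": 1}),
    "hiša": Counter({"vrata": 3}),
}
seed = {"blood": "kri", "doctor": "zdravnik"}
ranking = rank_candidates("heart", src_vectors, tgt_vectors, seed)
```

In this toy setup the top-ranked candidate for "heart" is "srce", because its context words match the translated context of the source word. In a realistic setting the vectors would come from `context_vectors` applied to each side of the comparable corpus, with a frequency threshold deciding which words receive a vector.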
