Bootstrapping bilingual lexicons from comparable corpora for closely related languages
In this paper we present an approach to bootstrap a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in...
| Permalink: | http://skupnikatalog.nsk.hr/Record/ffzg.KOHA-OAI-FFZG:312925/Details |
|---|---|
| Parent publication: | Text, Speech and Dialogue: 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011: Proceedings (Lecture Notes in Computer Science) |
| Main authors: | Ljubešić, Nikola, computer scientist; Fišer, Darja (Author) |
| Material type: | Article |
| Language: | eng |
| Online access: | http://www.springerlink.com/content/n5m86t5h212h2753/ |
| LEADER | 02036naa a2200253uu 4500 | ||
|---|---|---|---|
| 008 | 131111s2011 xx eng|d | ||
| 020 | |a 978-3-642-23537-5 | ||
| 035 | |a (CROSBI)552910 | ||
| 040 | |a HR-ZaFF |b hrv |c HR-ZaFF |e ppiak | ||
| 100 | 1 | |9 445 |a Ljubešić, Nikola, |c computer scientist | |
| 245 | 1 | 0 | |a Bootstrapping bilingual lexicons from comparable corpora for closely related languages / |c Ljubešić, Nikola ; Fišer, Darja. |
| 246 | 3 | |i Title in English: |a Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages | |
| 300 | |a 91-98 |f pp. | ||
| 520 | |a In this paper we present an approach to bootstrap a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7%, we were able to improve the baseline precision, measured as the mean reciprocal rank of the ten top-ranking translation candidates, from 0.597 to 0.731, with a recall of 50.4% on a gold standard of 500 entries. | ||
| 536 | |a MZOS project |f 130-1301679-1380 | ||
| 546 | |a ENG | ||
| 690 | |a 5.04 | ||
| 693 | |a comparable corpora, bilingual lexicon extraction, bootstrapping |l hrv |2 crosbi | ||
| 693 | |a comparable corpora, bilingual lexicon extraction, bootstrapping |l eng |2 crosbi | ||
| 700 | 1 | |a Fišer, Darja |4 aut | |
| 773 | 0 | |t Text, Speech and Dialogue : 14th International Conference, TSD 2011, Pilsen, Czech Republic, September 1-5, 2011. : Proceedings |d Berlin / Heidelberg : Springer, 2011 |k Lecture Notes in Computer Science |n Habernal, Ivan ; Matoušek, Václav |z 978-3-642-23537-5 |g pp. 91-98 |a International Conference, TSD 2011 (14 ; 2011 ; Pilsen, Czech Republic) | |
| 856 | |u http://www.springerlink.com/content/n5m86t5h212h2753/ | ||
| 942 | |c RZB |t 1.08 |u 2 |z Scientific |v International peer review | ||
| 999 | |c 312925 |d 312923 | ||
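The method summarised in the abstract (field 520) follows the general context-vector approach to lexicon extraction from comparable corpora: source-language context vectors are mapped into the target language through a seed lexicon and compared with target-language context vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes raw co-occurrence counts in a ±2-token window, cosine similarity, and a seed lexicon built from identical words only. The extension with context-based cognates and frequent-word translations is not reproduced. All function names and the toy sentences are hypothetical. The MRR@10 helper averages over every gold entry; this record does not state how the paper treats gold entries that receive no candidates.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within +/- `window` tokens.

    Assumption: raw counts; the paper's association measure is not given in this record.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def identical_seed_lexicon(src_vocab, tgt_vocab):
    """Seed lexicon built only from words spelled identically in both vocabularies."""
    return {w: w for w in src_vocab & tgt_vocab}


def translate_vector(vec, seed):
    """Project a source context vector into the target language via the seed lexicon.

    Context words not covered by the seed lexicon are dropped.
    """
    out = Counter()
    for word, count in vec.items():
        if word in seed:
            out[seed[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed, k=10):
    """Top-k target words ranked by similarity to the translated context vector of `word`."""
    query = translate_vector(src_vecs.get(word, Counter()), seed)
    # Sort by descending similarity; ties are broken alphabetically for determinism.
    ranked = sorted(tgt_vecs, key=lambda t: (-cosine(query, tgt_vecs[t]), t))
    return ranked[:k]


def mean_reciprocal_rank(ranked, gold, k=10):
    """MRR@k over all gold entries: 1/rank of the first correct candidate, 0 if none in the top k."""
    if not gold:
        return 0.0
    total = 0.0
    for word, correct in gold.items():
        for rank, cand in enumerate(ranked.get(word, [])[:k], start=1):
            if cand in correct:
                total += 1.0 / rank
                break
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical toy "comparable corpora" (tokenised sentences), not data from the paper.
    hr = [["premijer", "vlada", "parlament"], ["stara", "kuća"]]
    sl = [["premier", "vlada", "parlament"], ["stara", "hiša"]]
    hr_vecs, sl_vecs = context_vectors(hr), context_vectors(sl)
    # Identical words shared by both toy corpora: vlada, parlament, stara.
    seed = identical_seed_lexicon(set(hr_vecs), set(sl_vecs))
    ranked = {w: rank_candidates(w, hr_vecs, sl_vecs, seed) for w in ("premijer", "kuća")}
    gold = {"premijer": {"premier"}, "kuća": {"hiša"}}
    print(ranked["premijer"][0], ranked["kuća"][0], mean_reciprocal_rank(ranked, gold))
    # prints: premier hiša 1.0
```

Raw counts and cosine keep the sketch short; weighted association measures such as log-likelihood or TF-IDF are common alternatives in this line of work, and the record does not say which one the authors used.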


