Number of results: 6
Research data
Tags: web corpus
The Finnish web corpus fiWaC was built by crawling the .fi top-level domain in 2015, collecting both Finnish and English documents. The corpus was tokenised naively (on whitespace), near-deduplicated at the paragraph level and paragraph-shuffled. Each paragraph contains metadata on the URL and language identificat ...
Year: 2016 Source: CLARIN.si
Research data
Tags: parallel corpus; web corpus
The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bite ...
Year: 2016 Source: CLARIN.si
Research data
Tags: parallel corpus; web corpus; multilingual
The srenWaC corpus consists of sentence-aligned parallel Serbian-English texts crawled from the .rs top-level domain for Serbia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for b ...
Year: 2016 Source: CLARIN.si
Research data
Tags: parallel corpus; web corpus
The fienWaC corpus version 1.0 consists of parallel Finnish-English texts crawled from the .fi top-level domain for Finland. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitex ...
Year: 2016 Source: CLARIN.si
Research data
Tags: parallel corpus; web corpus; multilingual
The slenWaC corpus version 1.0 consists of parallel Slovene-English texts crawled from the .si top-level domain for Slovenia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bite ...
Year: 2016 Source: CLARIN.si
Research data
Tags: parallel corpus; tourism; multilingual
A sentence-aligned parallel corpus built by automatically crawling 25 websites in the tourism domain.
Year: 2016 Source: CLARIN.si
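The fiWaC record above names three processing steps: naive whitespace tokenisation, paragraph-level near-deduplication and paragraph shuffling. The Python sketch below illustrates those steps under stated assumptions; it is not the fiWaC pipeline. The listing does not say which near-duplicate method was used, so Jaccard similarity over word 3-gram shingles is a stand-in chosen here, and the function names, shingle size (3), threshold (0.8) and random seed are all hypothetical.

```python
import random


def tokenize(paragraph: str) -> list[str]:
    # Naive tokenisation: split on whitespace only; punctuation stays attached.
    return paragraph.split()


def shingles(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    # Word n-gram "shingles" (n=3 is an assumption); short paragraphs
    # fall back to a single shingle made of all their tokens.
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    # Overlap of two shingle sets, from 0.0 (disjoint) to 1.0 (identical).
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_and_shuffle(paragraphs: list[str], threshold: float = 0.8,
                      seed: int = 0) -> list[str]:
    # Hypothetical stand-in for paragraph-level near-deduplication:
    # a paragraph is dropped if it is too similar to one already kept.
    kept: list[tuple[str, set]] = []
    for p in paragraphs:
        tokens = [t.lower() for t in tokenize(p)]
        if not tokens:
            continue  # skip empty paragraphs
        s = shingles(tokens)
        if all(jaccard(s, ks) < threshold for _, ks in kept):
            kept.append((p, s))
    result = [p for p, _ in kept]
    # Paragraph shuffling with a fixed seed so runs are reproducible.
    random.Random(seed).shuffle(result)
    return result
```

For example, passing the paragraphs "The corpus was crawled in 2015." and "the corpus was crawled in 2015." together with an unrelated third paragraph keeps only the first of the two near-identical ones. The pairwise comparison is quadratic in the number of paragraphs, which is acceptable for a sketch; a corpus of web scale would need a hashing-based index instead.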