Number of hits: 16
Research data
Tags: lexicon;morphology;inflection
hrLex is a large inflectional lexicon of the Croatian language where each entry consists of a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping.
Year: 2016 Source: CLARIN.si
Research data
Tags: lexicon;morphology;inflection
srLex is a large inflectional lexicon of the Serbian language where each entry consists of a (wordform, lemma, MSD) triple. The MSD tagset follows the revised MULTEXT-East V4 tagset for Croatian and Serbian, available at https://github.com/ffnlp/sethr/blob/master/mte4r-upos.mapping.
Year: 2016 Source: CLARIN.si
Research data
Tags: web corpus
The Croatian web corpus hrWaC was built by crawling the .hr top-level domain in 2011 and again in 2014. The corpus was near-deduplicated on paragraph level, normalised via diacritic restoration, morphosyntactically annotated and lemmatised. The corpus is shuffled by paragraphs. Each paragraph contai ...
Year: 2016 Source: CLARIN.si
Research data
Tags: web corpus
The Bosnian web corpus bsWaC was built by crawling the .ba top-level domain in 2014. The corpus was near-deduplicated on paragraph level, normalised via diacritic restoration, morphosyntactically annotated and lemmatised. The corpus is shuffled by paragraphs. Each paragraph contains metadata on the ...
Year: 2016 Source: CLARIN.si
Research data
Tags: web corpus;lemmatisation
The Serbian web corpus srWaC was built by crawling the .rs top-level domain in 2014. The corpus was near-deduplicated on paragraph level, normalised via diacritic restoration, morphosyntactically annotated and lemmatised. The corpus is shuffled by paragraphs. Each paragraph contains metadata on the ...
Year: 2016 Source: CLARIN.si
Research data
Tags: morphology;inflection
srLex is a large inflectional lexicon of the Serbian language where each entry consists of a (wordform, lemma, MSD, frequency, per-million frequency) 5-tuple. The (wordform, lemma, MSD) triple frequencies are calculated on the srWaC v1.2 corpus. The MSD tagset follows the MULTEXT-East V5 tagset for Bosn ...
Year: 2016 Source: CLARIN.si
Research data
Tags: part-of-speech tagging;dependency treebank;parsing;named entities;tokenisation;manual annotation;TEI;semantic role labelling
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of ...
Year: 2018 Source: CLARIN.si
Research data
Tags: computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...
Year: 2017 Source: CLARIN.si
Research data
Tags: computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
ReLDI-NormTag-sr 1.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...
Year: 2017 Source: CLARIN.si
Research data
Tags: parallel corpus;web corpus
The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain for Croatia. The corpus was built with Spidextor (https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing used for crawling and Bitextor used for bite ...
Year: 2016 Source: CLARIN.si