Number of hits: 9
                    
                    
                
                    
                        
                        
                        
Research data
Tags: news corpus; news discourse
A comprehensive corpus of news articles on the topic of language, published in major Slovenian daily newspapers and news portals in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideolog ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: news comments; computer-mediated communication
A comprehensive corpus of user comments on online news articles on the topic of language from major Slovenian daily newspapers and news portals, published in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about lang ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: news corpus; news discourse
A comprehensive corpus of news articles on the topic of language, published in major Croatian daily newspapers and news portals in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologi ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: news comments; computer-mediated communication
A comprehensive corpus of user comments on online news articles on the topic of language from major Croatian daily newspapers and news portals, published in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about langu ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: news corpus; news discourse
A comprehensive corpus of news articles on the topic of language, published in major Serbian daily newspapers and news portals in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about language’), linguistic ideologie ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: news comments; computer-mediated communication
A comprehensive corpus of user comments on online news articles on the topic of language from major Serbian daily newspapers and news portals, published in the five-year period of January 1, 2015 - January 1, 2020. The corpus is designed to facilitate research on metalanguage (‘language about langua ...
Year: 2020
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: part-of-speech tagging; dependency treebank; parsing; named entities; tokenisation; manual annotation; TEI
The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects) of the corpus are documented in the teiHeader and ...
Year: 2018
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: part-of-speech tagging; dependency treebank; parsing; named entities; tokenisation; manual annotation; TEI; semantic role labelling
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation and named entities. About half of the corpus is also manually annotated with syntactic dependencies. Furthermore, about a fifth of ...
Year: 2018
Source: CLARIN.si
                        
                     
                
                    
                        
                        
                        
Research data
Tags: computer-mediated communication; tokenisation; word normalisation; part-of-speech tagging; lemmatisation; named entities; manual annotation; TEI
ReLDI-NormTagNER-hr 2.1 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Croatian. Each tweet ...
Year: 2019
Source: CLARIN.si