Number of hits: 37

Video and other learning materials
Tags: humanities; linguistics; lexicography; social sciences; society; computer science
With the rise of digital media in recent decades, many language-related discussions have found a home on various forums and social media such as Facebook, where users can join shared-interest groups to discuss language use, problems and resources. The posts in these groups are formulated ...
Year: 2018
Source: videolectures.net
                        
Video and other learning materials
Tags: humanities; linguistics
Automatic collocation extraction relies primarily on computing statistical word co-occurrences in a text corpus, but not all candidates extracted in this way are appropriate. To define what counts as a legitimate statistical collocation on the one hand and a lexicographically relevant collocation on the other, we prepared a training ...
Year: 2018
Source: videolectures.net
                        
Published scientific conference contribution
Tags: large language models; responsible artificial intelligence; safety datasets; Slovene
In this paper, we present the initial preparatory phase of compiling a Slovene safety dataset containing harmful or offensive prompts and safe responses to them. The dataset will be used to fine-tune Slovene large language models in order to prevent unwanted model behavior and misuse by ...
Year: 2024
Source: Fakulteta za računalništvo in informatiko (UL FRI)
                        
Video and other learning materials
Tags: humanities; linguistics
Year: 2018
Source: videolectures.net
                        
Independent scientific component part or chapter in a monograph
Tags: web texts; corpus dissemination; copyright; personal data protection; free and open access
Corpora of web texts are useful for compiling language reference works, for corpus-linguistic research and for developing language technologies. Although the texts are directly accessible, building such corpora is complex and expensive, so it is very important to make them as widely available as possible ...
Year: 2015
Source: Filozofska fakulteta (UL FF)
                        
Independent scientific component part or chapter in a monograph
Tags: corpus construction; online Slovene; computer-mediated communication; user-generated content; language of social networks
Both worldwide and in Slovenia, web texts account for a growing share of language production, while user-generated online content is becoming an increasingly important source of knowledge and also influences the further development of language. To exploit this potential, we must thoroughly study the online segment of language ...
Year: 2015
Source: Filozofska fakulteta (UL FF)
                        
Research data
Tags: computer-mediated communication; tokenisation; word normalisation; tagging; lemmatisation; manual annotation; TEI
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Slovene. As the corpus has been ...
Year: 2016
Source: CLARIN.si
                        
Research data
Tags: computer-mediated communication; tokenisation; word normalisation; manual annotation; TEI
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation and word normalisation of non-standard Slovene. The corpus is also automatically annotated with morphosyntactic ...
Year: 2016
Source: CLARIN.si
                        
Research data
Tags: computer-mediated communication; tokenisation; word normalisation; manual annotation; TEI
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation and word normalisation of non-standard Slovene. As the corpus has been carefully manually annotated, it is a ...
Year: 2016
Source: CLARIN.si