dc.contributor.author |
Daðason, Jón Friðrik |
dc.contributor.author |
Loftsson, Hrafn |
dc.date.accessioned |
2022-09-30T18:21:45Z |
dc.date.available |
2022-09-30T18:21:45Z |
dc.date.issued |
2022-09-30 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/297 |
dc.description |
IceEval is a benchmark for evaluating and comparing the quality of pre-trained language models. Models are evaluated on four NLP tasks for Icelandic: part-of-speech tagging (using the MIM-GOLD corpus), named entity recognition (using the MIM-GOLD-NER corpus), dependency parsing (using the IcePaHC-UD corpus), and automatic text summarization (using the IceSum corpus). IceEval includes scripts for downloading the datasets, splitting them into training, validation, and test sets, and training and evaluating models for each task. The benchmark uses the Transformers, DiaParser, and TransformerSum libraries for fine-tuning and evaluation.
IceEval is a tool for evaluating and comparing pre-trained language models. The models are evaluated on four language technology tasks for Icelandic: tagging (with the MIM-GOLD corpus), named entity recognition (with the MIM-GOLD-NER corpus), parsing (with the IcePaHC-UD corpus), and automatic summarization (with the IceSum corpus). IceEval includes scripts for downloading the datasets, splitting them into training and test data, and fine-tuning and evaluating models for each task. The Transformers, DiaParser, and TransformerSum libraries are used to fine-tune the models. |
dc.language.iso |
isl |
dc.publisher |
Reykjavik University |
dc.rights |
Apache License 2.0 |
dc.rights.uri |
https://opensource.org/license/apache2-0-php/ |
dc.rights.label |
PUB |
dc.subject |
benchmark |
dc.subject |
language model |
dc.subject |
named entity recognition |
dc.subject |
pos-tagging |
dc.subject |
dependency parsing |
dc.subject |
summarization |
dc.title |
IceEval - Icelandic Natural Language Processing Benchmark 22.09 |
dc.type |
toolService |
metashare.ResourceInfo#ContentInfo.detailedType |
tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent |
true |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Jón Friðrik Daðason; jond19@ru.is; Reykjavik University |
sponsor |
Icelandic Ministry of Education, Science and Culture; Semantic analysis - BERT-type language models (I8); Language Technology Programme for Icelandic 2019-2023; nationalFunds |
files.size |
142592 |
files.count |
1 |