dc.contributor.author |
Daðason, Jón Friðrik |
dc.contributor.author |
Loftsson, Hrafn |
dc.date.accessioned |
2022-09-30T18:21:45Z |
dc.date.available |
2022-09-30T18:21:45Z |
dc.date.issued |
2022-09-30 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/297 |
dc.description |
IceEval is a benchmark for evaluating and comparing the quality of pre-trained language models. The models are evaluated on four NLP tasks for Icelandic: part-of-speech tagging (using the MIM-GOLD corpus), named entity recognition (using the MIM-GOLD-NER corpus), dependency parsing (using the IcePaHC-UD corpus), and automatic text summarization (using the IceSum corpus). IceEval includes scripts for downloading the datasets, splitting them into training, validation, and test sets, and training and evaluating models on each task. The benchmark uses the Transformers, DiaParser, and TransformerSum libraries for fine-tuning and evaluation.
IceEval is a tool for evaluating and comparing pre-trained language models. The models are evaluated on four language technology tasks for Icelandic: tagging (with the MIM-GOLD corpus), named entity recognition (with the MIM-GOLD-NER corpus), parsing (with the IcePaHC-UD corpus), and automatic summarization (with the IceSum corpus). IceEval includes scripts for downloading the datasets, splitting them into training and test data, and fine-tuning and evaluating models for each task. The Transformers, DiaParser, and TransformerSum libraries are used to fine-tune the models. |
dc.language.iso |
isl |
dc.publisher |
Reykjavik University |
dc.rights |
Apache License 2.0 |
dc.rights.uri |
https://opensource.org/license/apache2-0-php/ |
dc.rights.label |
PUB |
dc.subject |
benchmark |
dc.subject |
language model |
dc.subject |
named entity recognition |
dc.subject |
pos-tagging |
dc.subject |
dependency parsing |
dc.subject |
summarization |
dc.title |
IceEval - Icelandic Natural Language Processing Benchmark 22.09 |
dc.type |
toolService |
metashare.ResourceInfo#ContentInfo.detailedType |
tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent |
true |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Jón Friðrik Daðason; jond19@ru.is; Reykjavik University |
sponsor |
Icelandic Ministry of Education, Science and Culture; Semantic analysis - BERT-type language models (I8); Language Technology Programme for Icelandic 2019-2023; nationalFunds |
files.size |
142592 |
files.count |
1 |
Files in this item
This item is Publicly Available and licensed under: Apache License 2.0
- Name: IceEval.zip
- Size: 139.25 KB
- Format: application/zip
- Description: Unknown
- MD5: a19ffef0b11fa77b3f8cfd9709ce46dc
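The MD5 checksum and file size listed in this record can be used to check that a downloaded copy of IceEval.zip is intact. The following is a minimal sketch, not part of IceEval itself: the function names `md5_of_file` and `verify_download` are hypothetical, the local path `IceEval.zip` is an assumed download location, and `files.size` is assumed to be a byte count (142592 bytes is consistent with the listed 139.25 KB).

```python
import hashlib
from pathlib import Path

# Values copied from this record; files.size is assumed to be in bytes.
EXPECTED_MD5 = "a19ffef0b11fa77b3f8cfd9709ce46dc"
EXPECTED_SIZE = 142592


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file matches both the expected size and MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    archive = Path("IceEval.zip")  # assumed local download location
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

MD5 is suitable here only as an integrity check against accidental corruption; it does not protect against deliberate tampering.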
Preview
- IceEval
  - download_datasets.py (8 kB)
  - mim-gold-ner-splits.json (23 kB)
  - readme.txt (2 kB)
  - run_ner.py (26 kB)
  - requirements.txt (143 B)
  - finetune.py (9 kB)
  - get_results.py (3 kB)
  - lib
    - diaparser
      - cmds
        - biaffine_dependency.py (3 kB)
        - __init__.py (0 B)
        - cmd.py (1 kB)
      - utils
        - parallel.py (1 kB)
        - __init__.py (694 B)
        - fn.py (3 kB)
        - vocab.py (2 kB)
        - alg.py (24 kB)
        - data.py (7 kB)
        - logging.py (1 kB)
        - transform.py (27 kB)
        - config.py (1 kB)
        - embedding.py (1 kB)
        - metric.py (5 kB)
        - corpus.py (5 kB)
        - field.py (15 kB)
        - common.py (87 B)
      - catalog
        - catalog-1.json (3 kB)
        - upload-models.sh (2 kB)
        - __init__.py (143 B)
        - catalog.py (6 kB)
      - modules
        - __init__.py (394 B)
        - lstm.py (7 kB)
        - scalar_mix.py (1 kB)
        - matrix_tree_theorem.py (1 kB)
        - dropout.py (3 kB)
        - mlp.py (1 kB)
        - bert.py (8 kB)
        - char_lstm.py (2 kB)
        - affine.py (1 kB)
      - parsers
        - biaffine_dependency.py (11 kB)
        - parser.py (14 kB)
        - __init__.py (178 B)
      - __init__.py (76 B)
      - models
        - __init__.py (117 B)
        - dependency.py (15 kB)
    - transformersum
      - extractive.py (65 kB)
      - classifier.py (8 kB)
      - __init__.py (0 B)
      - abstractive.py (48 kB)
      - pooling.py (4 kB)
      - data.py (42 kB)
      - convert_to_extractive.py (28 kB)
      - poly_lr_decay.py (1 kB)
      - predict.py (2 kB)
      - helpers.py (16 kB)
      - main.py (18 kB)