Show simple item record

 
dc.contributor.author Daðason, Jón Friðrik
dc.contributor.author Loftsson, Hrafn
dc.date.accessioned 2022-09-30T18:21:45Z
dc.date.available 2022-09-30T18:21:45Z
dc.date.issued 2022-09-30
dc.identifier.uri http://hdl.handle.net/20.500.12537/297
dc.description IceEval is a benchmark for evaluating and comparing the quality of pre-trained language models. Models are evaluated on four NLP tasks for Icelandic: part-of-speech tagging (MIM-GOLD corpus), named entity recognition (MIM-GOLD-NER corpus), dependency parsing (IcePaHC-UD corpus) and automatic text summarization (IceSum corpus). IceEval includes scripts for downloading the datasets, splitting them into training, validation and test sets, and fine-tuning and evaluating models on each task. The benchmark uses the Transformers, DiaParser and TransformerSum libraries for fine-tuning and evaluation.
dc.language.iso isl
dc.publisher Reykjavik University
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.subject benchmark
dc.subject language model
dc.subject named entity recognition
dc.subject pos-tagging
dc.subject dependency parsing
dc.subject summarization
dc.title IceEval - Icelandic Natural Language Processing Benchmark 22.09
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
contact.person Jón Friðrik Daðason; jond19@ru.is; Reykjavik University
sponsor Icelandic Ministry of Education, Science and Culture; Semantic analysis - BERT-type language models (I8); Language Technology Programme for Icelandic 2019-2023; nationalFunds
files.size 142592
files.count 1
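
The description above mentions splitting each dataset into training, validation and test sets. The following is a minimal sketch of a reproducible split, not IceEval's own code: the function name `split_dataset`, the 80/10/10 ratio and the fixed seed are all assumptions made for illustration, since this record does not state how the benchmark actually splits its data.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items deterministically and cut them into three parts.

    The fractions and the seed are illustrative assumptions, not values
    taken from IceEval.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave room for a non-empty test set")

    shuffled = list(items)
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test
```

Fixing the seed means two runs over the same input produce the same split, which is what makes results from different models comparable.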


 Files in this item

This item is publicly available and licensed under: Apache License 2.0
Name: IceEval.zip
Size: 139.25 KB
Format: application/zip
Description: Unknown
MD5: a19ffef0b11fa77b3f8cfd9709ce46dc
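
The size and MD5 checksum listed for IceEval.zip can be used to check a downloaded copy. A minimal sketch using only the Python standard library follows; the helper names `md5_of_file` and `verify_download` and the local path in the usage note are hypothetical, while the expected values are the ones given in this record (files.size 142592 and the MD5 above).

```python
import hashlib
from pathlib import Path

# Values copied from this record.
EXPECTED_MD5 = "a19ffef0b11fa77b3f8cfd9709ce46dc"
EXPECTED_SIZE = 142592  # bytes, i.e. 139.25 KB


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file has both the expected size and MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

With the archive saved locally, `verify_download("IceEval.zip")` should return True for an intact copy. MD5 only guards against accidental corruption here; it is not a defence against deliberate tampering.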
 File Preview  
  • IceEval
    • download_datasets.py (8 kB)
    • mim-gold-ner-splits.json (23 kB)
    • readme.txt (2 kB)
    • run_ner.py (26 kB)
    • requirements.txt (143 B)
    • finetune.py (9 kB)
    • get_results.py (3 kB)
    • lib
      • diaparser
        • cmds
          • biaffine_dependency.py (3 kB)
          • __init__.py (0 B)
          • cmd.py (1 kB)
        • utils
          • parallel.py (1 kB)
          • __init__.py (694 B)
          • fn.py (3 kB)
          • vocab.py (2 kB)
          • alg.py (24 kB)
          • data.py (7 kB)
          • logging.py (1 kB)
          • transform.py (27 kB)
          • config.py (1 kB)
          • embedding.py (1 kB)
          • metric.py (5 kB)
          • corpus.py (5 kB)
          • field.py (15 kB)
          • common.py (87 B)
        • catalog
          • catalog-1.json (3 kB)
          • upload-models.sh (2 kB)
          • __init__.py (143 B)
          • catalog.py (6 kB)
        • modules
          • __init__.py (394 B)
          • lstm.py (7 kB)
          • scalar_mix.py (1 kB)
          • matrix_tree_theorem.py (1 kB)
          • dropout.py (3 kB)
          • mlp.py (1 kB)
          • bert.py (8 kB)
          • char_lstm.py (2 kB)
          • affine.py (1 kB)
        • parsers
          • biaffine_dependency.py (11 kB)
          • parser.py (14 kB)
          • __init__.py (178 B)
        • __init__.py (76 B)
        • models
          • __init__.py (117 B)
          • dependency.py (15 kB)
      • transformersum
        • extractive.py (65 kB)
        • classifier.py (8 kB)
        • __init__.py (0 B)
        • abstractive.py (48 kB)
        • pooling.py (4 kB)
        • data.py (42 kB)
        • convert_to_extractive.py (28 kB)
        • poly_lr_decay.py (1 kB)
        • predict.py (2 kB)
        • helpers.py (16 kB)
        • main.py (18 kB)
