dc.contributor.advisor
dc.contributor.author Ármannsson, Bjarki
dc.contributor.author Ingimundarson, Finnur Ágúst
dc.contributor.author Sigurðsson, Einar Freyr
dc.date.accessioned 2024-09-26T13:24:39Z
dc.date.available 2024-09-26T13:24:39Z
dc.date.issued 2024-09-25
dc.identifier.uri http://hdl.handle.net/20.500.12537/342
dc.description This package contains a benchmarking data set for evaluating the grammatical knowledge and linguistic ability of Large Language Models (LLMs) in Icelandic. It is meant to help LLM developers improve their models' Icelandic proficiency in a measurable way. The published benchmark set contains 1160 hand-written items spread over 19 subcategories of syntax, morphology and semantics, tested with five different methods. It also includes a set of translation tasks that test a model's language understanding and its grammatical capabilities when producing Icelandic text. See the README for further information.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject Large Language Models
dc.subject LLMs
dc.subject Linguistic Benchmarks
dc.subject Grammaticality tests
dc.subject Anaphora resolution
dc.subject Word formation
dc.subject Coreference resolution
dc.subject Wug tests
dc.subject Fragment answering
dc.subject Word sense disambiguation
dc.title Icelandic Linguistic Benchmark for LLMs 24.09
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Bjarki Ármannsson bjarki.armannsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Culture and Business Affairs Benchmarking data sets for LLMs (G12) Language Technology for Icelandic nationalFunds
files.size 34630
files.count 2


Files in this item (33.82 KB in total)

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name README.md
Size 5.7 KB
Format Unknown
Description README
MD5 fc3c64333ae7e2b2d54f00d1686a2617

Name ice_llm_linguistic_benchmark.zip
Size 28.11 KB
Format application/zip
Description ice_llm_linguistic_benchmark
MD5 327bd834946095222cff8e5d58d3d61a
Contents
    • ice_benchmark_set.jsonl (260 kB)
    • translation_tasks.jsonl (29 kB)
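
The MD5 digests listed for the two files can be used to check a download before working with the data. The following is a minimal sketch, not part of the published package: the helper names (`md5_of`, `verify`, `iter_jsonl`) are hypothetical, the JSONL files are assumed to be UTF-8 encoded, and archive members are matched by file-name suffix because the record does not state their exact paths inside the zip. No field names are assumed for the records; see the README for the actual schema.

```python
import hashlib
import io
import json
import zipfile
from pathlib import Path

# MD5 digests copied from the file listing in this record.
EXPECTED_MD5 = {
    "README.md": "fc3c64333ae7e2b2d54f00d1686a2617",
    "ice_llm_linguistic_benchmark.zip": "327bd834946095222cff8e5d58d3d61a",
}


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Map each listed file to True/False, or None if it is not present."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


def iter_jsonl(archive_path, suffix):
    """Yield one parsed JSON value per non-empty line of each archive
    member whose name ends with `suffix` (assumes UTF-8 text)."""
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.endswith(suffix):
                with archive.open(member) as raw:
                    for line in io.TextIOWrapper(raw, encoding="utf-8"):
                        if line.strip():
                            yield json.loads(line)
```

With both files saved in one directory, `verify(".")` reports whether each digest matches, and `iter_jsonl("ice_llm_linguistic_benchmark.zip", "ice_benchmark_set.jsonl")` yields the benchmark items one at a time without unpacking the archive.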
