dc.contributor.author Ármannsson, Bjarki
dc.contributor.author Ingimundarson, Finnur Ágúst
dc.contributor.author Sigurðsson, Einar Freyr
dc.date.accessioned 2024-10-30T14:38:20Z
dc.date.available 2024-10-30T14:38:20Z
dc.date.issued 2024-10-30
dc.identifier.uri http://hdl.handle.net/20.500.12537/348
dc.description This package contains a benchmark data set for evaluating the grammatical knowledge and linguistic ability of Large Language Models (LLMs) in Icelandic. It is meant to help LLM developers improve their models' Icelandic proficiency in a measurable way. The published benchmark set contains 1160 hand-written items spread over 19 subcategories of syntax, morphology and semantics, tested with 5 different methods. The package also includes a set of translation tasks that test a model's language understanding and its grammatical capabilities when producing Icelandic text. See the README for further information.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.subject Icelandic
dc.subject Large Language Models
dc.subject LLMs
dc.subject Linguistic Benchmark
dc.subject Word Sense Disambiguation
dc.subject Grammaticality Judgments
dc.subject Anaphora Resolution
dc.subject Coreference Resolution
dc.subject Wug test
dc.title Icelandic Linguistic Benchmark for LLMs 24.10
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Bjarki Ármannsson bjarki.armannsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Culture and Business Affairs; Benchmarking data sets for LLMs (G12); Language Technology for Icelandic; nationalFunds
files.size 30713
files.count 1


Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name ice_llm_linguistic_benchmark.zip
Size 29.99 KB
Format application/zip
Description ice_llm_linguistic_benchmark
MD5 fda4d705cd395ca2bb8b08a89954dd48
File Preview
    • ice_benchmark_set.jsonl (260 kB)
    • README.md (5 kB)
    • translation_tasks.jsonl (29 kB)
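
The record publishes an MD5 checksum for the archive and lists two JSONL files inside it, so a downloaded copy can be checked before use. The following is a minimal sketch, not part of the package itself: the helper names `md5_of_file` and `count_jsonl_records` are hypothetical, and the sketch assumes that each non-empty line of a `.jsonl` file holds one JSON record and that the member path passed in matches the path stored in the archive (which may include a folder prefix). It makes no assumption about the fields inside each record; see the README for the actual schema.

```python
import hashlib
import json
import zipfile

# MD5 value copied from the repository record for ice_llm_linguistic_benchmark.zip.
PUBLISHED_MD5 = "fda4d705cd395ca2bb8b08a89954dd48"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_jsonl_records(zip_path, member):
    """Count the non-empty lines of a JSONL member, parsing each as JSON.

    Raises json.JSONDecodeError if a line is not valid JSON, and KeyError
    if `member` is not present in the archive.
    """
    count = 0
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line:
                    json.loads(line)  # validates the record; the result is not used
                    count += 1
    return count
```

For a local copy of the archive, comparing `md5_of_file("ice_llm_linguistic_benchmark.zip")` with `PUBLISHED_MD5` checks the download against the record, and `zipfile.ZipFile(...).namelist()` shows the exact member paths to pass to `count_jsonl_records`. The description states 1160 items for the benchmark set, which gives a figure to compare the count against, provided the file really stores one item per line.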
