dc.contributor.author | Ármannsson, Bjarki |
dc.contributor.author | Ingimundarson, Finnur Ágúst |
dc.contributor.author | Sigurðsson, Einar Freyr |
dc.date.accessioned | 2024-10-30T14:38:20Z |
dc.date.available | 2024-10-30T14:38:20Z |
dc.date.issued | 2024-10-30 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/348 |
dc.description | This package contains a benchmark data set for evaluating the grammatical knowledge and linguistic ability of Large Language Models (LLMs) in Icelandic. It is meant to help LLM developers improve their models' Icelandic proficiency in a measurable way. The published benchmark set contains 1160 hand-written items spread over 19 subcategories of syntax, morphology and semantics, tested with 5 different methods. It also includes a set of translation tasks that test a model's language understanding and its grammatical ability when producing Icelandic text. See the README for further information. |
dc.language.iso | isl |
dc.publisher | The Árni Magnússon Institute for Icelandic Studies |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | Icelandic |
dc.subject | Large Language Models |
dc.subject | LLMs |
dc.subject | Linguistic Benchmark |
dc.subject | Word Sense Disambiguation |
dc.subject | Grammaticality Judgments |
dc.subject | Anaphora Resolution |
dc.subject | Coreference Resolution |
dc.subject | Wug test |
dc.title | Icelandic Linguistic Benchmark for LLMs 24.10 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Bjarki Ármannsson, bjarki.armannsson@arnastofnun.is, The Árni Magnússon Institute for Icelandic Studies |
sponsor | Ministry of Culture and Business Affairs; Benchmarking data sets for LLMs (G12); Language Technology for Icelandic; nationalFunds |
files.size | 30713 |
files.count | 1 |
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: ice_llm_linguistic_benchmark.zip
- Size: 29.99 KB
- Format: application/zip
- Description: ice_llm_linguistic_benchmark
- MD5: fda4d705cd395ca2bb8b08a89954dd48
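
The size and MD5 values listed above can be used to check that a downloaded copy of the archive is intact. The following is a minimal sketch in Python, not part of the package itself: the helper names `md5_of_file` and `verify_download` are hypothetical, and the constants are copied from the `files.size` and MD5 fields of this record.

```python
import hashlib
import os

# Values copied from this record (files.size and the MD5 field).
EXPECTED_MD5 = "fda4d705cd395ca2bb8b08a89954dd48"
EXPECTED_SIZE = 30713  # bytes


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file's size and MD5 digest both match the expected values.

    Pass expected_size=None to skip the size check.
    """
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Assuming the archive was saved under its listed name in the current directory, `verify_download("ice_llm_linguistic_benchmark.zip")` should return `True` for an unmodified copy. MD5 is suitable here only as a check against accidental corruption, not as protection against deliberate tampering.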