dc.contributor.author | Ingólfsdóttir, Svanhvít Lilja |
dc.contributor.author | Snæbjarnarson, Vésteinn |
dc.date.accessioned | 2022-06-01T09:11:58Z |
dc.date.available | 2022-06-01T09:11:58Z |
dc.date.issued | 2022-05-31 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/217 |
dc.description | The Icelandic Error Corpus (http://hdl.handle.net/20.500.12537/73) was used to fine-tune the Icelandic language model IceBERT-xlmr-ic3 for token classification. The objective was to train a grammatical error detection model that classifies whether a token range contains a particular error type. The model can mark tokens as belonging to one of the following issue categories: coherence, grammar, orthography, other, style, and vocabulary. The overall F1 score is 71; the scores reported for individual categories are coherence: 0; grammar: 63; orthography: 86; other: 0; vocabulary: 15.2. |
dc.description | The Icelandic Error Corpus (http://hdl.handle.net/20.500.12537/73) was used to fine-tune the Icelandic language model IceBERT-xlmr-ic3 for classification of tokens/words. The goal was to train a model that can detect whether a word contains a particular error type. The model can tag words with one of the following labels: coherence, grammar, orthography, other, style, and vocabulary. The overall F1 score is 71. |
dc.language.iso | isl |
dc.publisher | Miðeind ehf |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.subject | ged |
dc.subject | grammatical error detection |
dc.title | Error Classifier (Icelandic Error Corpus categories) for Tokens (22.05) |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Svanhvít Lilja Ingólfsdóttir; svanhvit@mideind.is; Miðeind ehf |
sponsor | Ministry of Education, Science and Culture; Spell and grammar checking with neural networks (L14); Language Technology for Icelandic 2019-2023; nationalFunds |
files.size | 853250188 |
files.count | 1 |
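
The description above says the checkpoint is a transformers model fine-tuned for token classification over six categories. The following is a minimal sketch of how such a checkpoint might be queried, not a documented usage recipe. It assumes the archive has been extracted to a local directory named `icebert-xlmr-ic3-iec`, that the Hugging Face `transformers` library and PyTorch are installed, and that the checkpoint loads through the generic `token-classification` pipeline. The record does not document the exact label strings, so `strip_bio_prefix`, `keep_error_spans`, and `detect_errors` are hypothetical helper names, and the `B-`/`I-` handling is only a guard against one possible labelling scheme.

```python
from typing import Dict, List

# Category names as listed in the record's description.
CATEGORIES = ("coherence", "grammar", "orthography", "other", "style", "vocabulary")


def strip_bio_prefix(label: str) -> str:
    """Drop a leading 'B-' or 'I-' if present (assumption: labels may use BIO tags)."""
    if len(label) > 2 and label[:2] in ("B-", "I-"):
        return label[2:]
    return label


def keep_error_spans(predictions: List[Dict]) -> List[Dict]:
    """Keep only predictions whose label matches one of the six categories."""
    kept = []
    for pred in predictions:
        # The pipeline reports 'entity_group' when aggregating, 'entity' otherwise.
        raw = pred.get("entity_group") or pred.get("entity") or ""
        category = strip_bio_prefix(raw).lower()
        if category in CATEGORIES:
            kept.append({
                "category": category,
                "start": pred.get("start"),
                "end": pred.get("end"),
                "text": pred.get("word"),
            })
    return kept


def detect_errors(text: str, model_dir: str = "icebert-xlmr-ic3-iec") -> List[Dict]:
    """Run the extracted checkpoint on `text` (needs transformers and torch)."""
    # Imported lazily so the helpers above work without the library installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_dir,  # assumed local path to the extracted archive
        aggregation_strategy="simple",  # merge adjacent sub-word pieces into spans
    )
    return keep_error_spans(tagger(text))
```

Calling `detect_errors("...")` with an Icelandic sentence would return a list of flagged character ranges with their category. Given the per-category scores in the description, predictions for orthography and grammar are likely to be more dependable than those for the other categories.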
Files in this item
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: icebert-xlmr-ic3-iec.tar.gz
- Size: 813.72 MB
- Format: application/gzip
- Description: transformers model checkpoint
- MD5: 626e94e7d34f65f67b6dd44160c71bd0
- icebert-xlmr-ic3-iec
  - config.json (1 kB)
  - training_args.bin (2 kB)
  - unigram.json (14 MB)
  - special_tokens_map.json (239 B)
  - tokenizer_config.json (398 B)
  - tokenizer.json (8 MB)
  - pytorch_model.bin (1 GB)
  - trainer_state.json (6 kB)
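
The record lists a byte size (`files.size`) and an MD5 digest for the archive, which makes it possible to check a download before extracting it. The following is a small sketch of such a check using only the Python standard library. It assumes the archive was saved locally under its listed name and that `files.size` refers to that single file; `md5_of_file` and `matches_record` are hypothetical helper names.

```python
import hashlib
import os

# Values copied from this record (files.size and the MD5 field).
EXPECTED_SIZE = 853250188
EXPECTED_MD5 = "626e94e7d34f65f67b6dd44160c71bd0"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the archive is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, size: int = EXPECTED_SIZE, md5: str = EXPECTED_MD5) -> bool:
    """Return True when both the byte size and the MD5 digest match."""
    # The size comparison is cheap, so do it before hashing the whole file.
    if os.path.getsize(path) != size:
        return False
    return md5_of_file(path) == md5.lower()
```

For example, `matches_record("icebert-xlmr-ic3-iec.tar.gz")` should return `True` for an intact download. The listed 813.72 MB is the byte count 853250188 divided by 1024², so the size is expressed in binary megabytes. MD5 is suitable here for detecting a corrupted transfer, not for guarding against deliberate tampering.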