dc.contributor.author | Ingólfsdóttir, Svanhvít Lilja |
dc.contributor.author | Ragnarsson, Pétur Orri |
dc.contributor.author | Snæbjarnarson, Vésteinn |
dc.date.accessioned | 2022-01-26T10:18:35Z |
dc.date.available | 2022-01-26T10:18:35Z |
dc.date.issued | 2022-01-25 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/183 |
dc.description | The Icelandic Error Corpus (IEC) was used to fine-tune the Icelandic language model IceBERT for sentence classification. The objective was to train a grammatical error detection model that classifies whether a sentence contains a particular error type. The model can flag a sentence with one or more of the following labels: coherence, grammar, orthography, other, style and vocabulary. The overall F1 score is a modest 64%. |
dc.language.iso | isl |
dc.publisher | Miðeind ehf |
dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0/ |
dc.rights.label | PUB |
dc.subject | iec |
dc.subject | ged |
dc.subject | grammatical error detection |
dc.subject | icelandic error corpus |
dc.title | Multilabel Error Classifier (Icelandic Error Corpus categories) for Sentences (22.01) |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Vésteinn Snæbjarnarson, vesteinn@mideind.is, Miðeind ehf |
sponsor | Ministry of Education, Science and Culture; Spell and grammar checking with neural networks (L14); Language Technology for Icelandic 2019-2023; nationalFunds |
files.size | 459882463 |
files.count | 1 |
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: IceBERT-ged-sentence-supercats-multilabel.zip
- Size: 438.58 MB
- Format: application/zip
- Description: Unknown
- MD5: d3dabf9a8285862fa3fd295f5f551996
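The MD5 value listed for the archive can be used to check a downloaded copy. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_record` and the local file path are illustrative assumptions, not part of the record or of the bundled files.

```python
import hashlib

# MD5 value copied from this record's file listing.
EXPECTED_MD5 = "d3dabf9a8285862fa3fd295f5f551996"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for an archive of several
    hundred megabytes.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Example call (the path is an assumption about where the file was saved):
# matches_record("IceBERT-ged-sentence-supercats-multilabel.zip")
```

MD5 is suitable here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.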
- IceBERT-ged-sentence-supercats-multilabel
  - README.md (378 B)
  - pytorch_model.bin (474 MB)
  - tokenizer_config.json (1 kB)
  - merges.txt (581 kB)
  - vocab.json (912 kB)
  - config.json (1 kB)
  - run_inference.py (679 B)
  - tokenizer.json (1 MB)
  - special_tokens_map.json (772 B)
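The description says a sentence can receive one or more of six labels, which is multilabel classification: each label is typically scored independently with a sigmoid and kept if it clears a threshold. The sketch below illustrates that decoding step. It is not the bundled `run_inference.py`, whose contents are not shown in this record. The label order, the 0.5 threshold, and the assumption that the archive loads as a Hugging Face Transformers sequence-classification checkpoint (suggested by the file names, not confirmed) are all assumptions.

```python
import math

# Label names are taken from the record's description. Their ORDER is an
# assumption; the actual mapping should be read from the model's config.json.
LABELS = ["coherence", "grammar", "orthography", "other", "style", "vocabulary"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose independent sigmoid score meets the threshold.

    Unlike single-label (softmax) decoding, zero, one, or several labels
    may be returned for a sentence. The 0.5 threshold is an assumption.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [lab for lab, logit in zip(labels, logits) if sigmoid(logit) >= threshold]


def classify(sentence, model_dir):
    """Hypothetical end-to-end call, assuming a Transformers checkpoint.

    Requires the third-party packages torch and transformers; model_dir is
    the unpacked archive directory.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # Prefer the label names stored in the checkpoint over the assumed order.
    labels = [model.config.id2label[i] for i in range(len(logits))]
    return decode_multilabel(logits, labels)
```

For example, `decode_multilabel([2.0, -1.0, 0.0, -3.0, 0.5, -0.2])` keeps the first, third and fifth labels under the assumed order. Given the reported overall F1 of 64%, the predictions are best treated as hints rather than definitive judgments.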