Error Classifier (Icelandic Error Corpus categories) for Tokens (22.05)

Ingólfsdóttir, Svanhvít Lilja; Snæbjarnarson, Vésteinn

dc.contributor.author	Ingólfsdóttir, Svanhvít Lilja
dc.contributor.author	Snæbjarnarson, Vésteinn
dc.date.accessioned	2022-06-01T09:11:58Z
dc.date.available	2022-06-01T09:11:58Z
dc.date.issued	2022-05-31
dc.identifier.uri	http://hdl.handle.net/20.500.12537/217
dc.description	The Icelandic Error Corpus (http://hdl.handle.net/20.500.12537/73) was used to fine-tune the Icelandic language model IceBERT-xlmr-ic3 for token classification. The objective was to train a grammatical error detection model that classifies whether a token range contains a particular error type. The model can mark tokens as belonging to one of the following issue categories: coherence, grammar, orthography, other, style, and vocabulary. The overall F1 score is 71; the reported per-category F1 scores are coherence: 0; grammar: 63; orthography: 86; other: 0; vocabulary: 15.2.
dc.description	The Icelandic Error Corpus (http://hdl.handle.net/20.500.12537/73) was used to fine-tune the Icelandic language model IceBERT-xlmr-ic3 for classification of tokens/words. The goal was to train a model that can detect whether a word contains a particular error type. The model can tag words with one of the following labels: coherence, grammar, orthography, other, style, and vocabulary. The overall F1 is 71.
dc.language.iso	isl
dc.publisher	Miðeind ehf
dc.rights	Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.label	PUB
dc.subject	ged
dc.subject	grammatical error detection
dc.title	Error Classifier (Icelandic Error Corpus categories) for Tokens (22.05)
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
has.files	yes
branding	Clarin IS Repository
contact.person	Svanhvít Lilja Ingólfsdóttir; svanhvit@mideind.is; Miðeind ehf
sponsor	Ministry of Education, Science and Culture; Spell and grammar checking with neural networks (L14); Language Technology for Icelandic 2019-2023; nationalFunds
files.size	853250188
files.count	1

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)

Name: icebert-xlmr-ic3-iec.tar.gz
Size: 813.72 MB
Format: application/gzip
Description: transformers model checkpoint
MD5: 626e94e7d34f65f67b6dd44160c71bd0
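
The size and MD5 digest listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch, assuming the file was saved under its listed name in the current directory; the helper names `md5_of_file` and `matches_record` are hypothetical and not part of this record.

```python
import hashlib
from pathlib import Path

# Values copied from this record.
ARCHIVE_NAME = "icebert-xlmr-ic3-iec.tar.gz"
EXPECTED_MD5 = "626e94e7d34f65f67b6dd44160c71bd0"
EXPECTED_SIZE = 853250188  # bytes, from the files.size field


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """True if both the byte size and the MD5 digest match the record."""
    path = Path(path)
    return path.stat().st_size == expected_size and md5_of_file(path) == expected_md5


# Hypothetical usage: only runs if the archive is present in the working directory.
if Path(ARCHIVE_NAME).exists():
    print("checksum OK" if matches_record(ARCHIVE_NAME) else "checksum mismatch")
```

Reading in chunks keeps memory use flat for a file of this size. MD5 here only guards against a corrupted or truncated download; it is not a security guarantee.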

File Preview

icebert-xlmr-ic3-iec
- config.json (1 kB)
- training_args.bin (2 kB)
- unigram.json (14 MB)
- special_tokens_map.json (239 B)
- tokenizer_config.json (398 B)
- tokenizer.json (8 MB)
- pytorch_model.bin (1 GB)
- trainer_state.json (6 kB)
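
Since the file is described as a transformers model checkpoint, one plausible way to try it is through the Hugging Face `transformers` token-classification pipeline. The sketch below rests on several assumptions that this record does not confirm: that the archive unpacks to a directory named `icebert-xlmr-ic3-iec`, that `AutoTokenizer` and `AutoModelForTokenClassification` can load the files as shipped, and that the label names in `config.json` follow the categories in the description with `O` marking tokens without an error. The helper names are hypothetical.

```python
import tarfile
from pathlib import Path

ARCHIVE = Path("icebert-xlmr-ic3-iec.tar.gz")  # file name from this record
MODEL_DIR = Path("icebert-xlmr-ic3-iec")       # assumed top-level directory in the archive


def extract_checkpoint(archive=ARCHIVE, dest=Path(".")):
    """Unpack the archive (only do this for a copy you trust) and return the model directory."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return Path(dest) / MODEL_DIR.name


def load_tagger(model_dir=MODEL_DIR):
    """Build a token-classification pipeline from the extracted checkpoint."""
    # Imported lazily so the other helpers work without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForTokenClassification.from_pretrained(str(model_dir))
    return pipeline("token-classification", model=model, tokenizer=tokenizer)


def flagged_tokens(predictions, min_score=0.5):
    """Keep (word, label, score) for predictions above min_score.

    Assumes 'O' is the no-error label; check config.json for the real label set.
    """
    return [
        (p["word"], p["entity"], round(float(p["score"]), 3))
        for p in predictions
        if p["entity"] != "O" and p["score"] >= min_score
    ]
```

With the checkpoint extracted and `transformers` and `torch` installed, usage would look like `tagger = load_tagger(extract_checkpoint())` followed by `flagged_tokens(tagger(text))` for an Icelandic input string `text`. Given the per-category scores in the description, labels other than orthography and grammar should be treated with caution.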
