Multilabel Error Classifier (Icelandic Error Corpus categories) for Sentences (22.01)

Ingólfsdóttir, Svanhvít Lilja; Ragnarsson, Pétur Orri; Snæbjarnarson, Vésteinn

dc.contributor.author	Ingólfsdóttir, Svanhvít Lilja
dc.contributor.author	Ragnarsson, Pétur Orri
dc.contributor.author	Snæbjarnarson, Vésteinn
dc.date.accessioned	2022-01-26T10:18:35Z
dc.date.available	2022-01-26T10:18:35Z
dc.date.issued	2022-01-25
dc.identifier.uri	http://hdl.handle.net/20.500.12537/183
dc.description	The Icelandic Error Corpus (IEC) was used to fine-tune the Icelandic language model IceBERT for sentence classification. The objective was to train a grammatical error detection model that classifies whether a sentence contains a particular type of error. The model can tag a sentence with one or more of the following categories: coherence, grammar, orthography, other, style and vocabulary. The overall F1 score is a modest 64%.
dc.language.iso	isl
dc.publisher	Miðeind ehf
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	https://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.subject	iec
dc.subject	ged
dc.subject	grammatical error detection
dc.subject	icelandic error corpus
dc.title	Multilabel Error Classifier (Icelandic Error Corpus categories) for Sentences (22.01)
dc.type	toolService
metashare.ResourceInfo#ContentInfo.detailedType	tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent	true
has.files	yes
branding	Clarin IS Repository
contact.person	Vésteinn Snæbjarnarson; vesteinn@mideind.is; Miðeind ehf
sponsor	Ministry of Education, Science and Culture; Spell and grammar checking with neural networks (L14); Language Technology for Icelandic 2019-2023; nationalFunds
files.size	459882463
files.count	1
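
The description says the model can tag a sentence with one or more of six categories. A common way to turn a classifier's raw outputs into such a multilabel decision is to apply a sigmoid to each output independently and keep every category whose probability clears a threshold. The sketch below illustrates that idea only. The label order, the 0.5 threshold, and the example logits are assumptions made for illustration; the record does not state how the bundled run_inference.py decodes its outputs.

```python
import math

# Category names are taken from the item description. Their order here is an
# assumption and may differ from the order stored in the model's config.json.
LABELS = ["coherence", "grammar", "orthography", "other", "style", "vocabulary"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold.

    Unlike softmax-based single-label decoding, each label is decided on its
    own, so zero, one, or several labels can be returned for one sentence.
    The 0.5 threshold is an assumed default, not a value from the record.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one sentence (not produced by the actual model).
example_logits = [-2.1, 1.3, 0.4, -3.0, -0.2, -1.5]
print(decode_multilabel(example_logits))
```

With these made-up logits, only the grammar and orthography scores map to probabilities above 0.5, so those two labels are returned. Raising the threshold trades recall for precision, which matters for a model whose overall F1 is reported as 64%.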

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name: IceBERT-ged-sentence-supercats-multilabel.zip
Size: 438.58 MB
Format: application/zip
Description: Unknown
MD5: d3dabf9a8285862fa3fd295f5f551996
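
The MD5 value listed above can be used to check that the archive downloaded intact. The following is a minimal sketch, assuming the file was saved under its listed name in the current directory; the function names are hypothetical helpers, not part of the distributed package.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed in this record.
EXPECTED_MD5 = "d3dabf9a8285862fa3fd295f5f551996"
ARCHIVE_NAME = "IceBERT-ged-sentence-supercats-multilabel.zip"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that the roughly 440 MB archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path(ARCHIVE_NAME)  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is adequate for detecting a corrupted or truncated download, but it is not a safeguard against deliberate tampering.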


File Preview

IceBERT-ged-sentence-supercats-multilabel
- README.md (378 B)
- pytorch_model.bin (474 MB)
- tokenizer_config.json (1 kB)
- merges.txt (581 kB)
- vocab.json (912 kB)
- config.json (1 kB)
- run_inference.py (679 B)
- tokenizer.json (1 MB)
- special_tokens_map.json (772 B)
