dc.contributor.author |
Sigurðardóttir, Helga Svala |
dc.date.accessioned |
2021-10-25T16:39:28Z |
dc.date.available |
2021-10-25T16:39:28Z |
dc.date.issued |
2021-10-01 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/158 |
dc.description |
A corpus of:
* 70,000 sentences taken from general news text, both before normalization and after normalization with the Regína normalizer
* 70,000 sentences taken from sports news, both before normalization and after normalization with the Regína normalizer
* 40,000 manually normalized sentences taken from all domains |
dc.language.iso |
isl |
dc.publisher |
Reykjavik University |
dc.relation.isreferencedby |
https://aclanthology.org/2021.nodalida-main.45.pdf |
dc.relation.replaces |
http://hdl.handle.net/20.500.12537/155 |
dc.rights |
Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri |
https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label |
PUB |
dc.source.uri |
https://github.com/cadia-lvl/regina_normalizer |
dc.subject |
text-normalization |
dc.subject |
normalization |
dc.title |
Text Normalization Corpus 21.10 (2021-10-25) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
no |
branding |
Clarin IS Repository |
contact.person |
Helga Svala Sigurðardóttir; helgas@ru.is; Reykjavik University |
sponsor |
Ministry of Education, Science and Culture; Text Normalization Corpus (T9); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
6 files |
files.size |
0 |
files.count |
0 |