dc.contributor.author |
Sigurðardóttir, Helga Svala |
dc.date.accessioned |
2021-10-11T15:21:07Z |
dc.date.available |
2021-10-11T15:21:07Z |
dc.date.issued |
2021-10-01 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/155 |
dc.description |
A corpus of:
* 70,000 sentences taken from general text, both before normalization and after normalization with the Regína normalizer
* 70,000 sentences taken from sports news, both before normalization and after normalization with the Regína normalizer
* 40,000 sentences taken from all domains, manually normalized
Textasafn sem samanstendur af:
* 70,000 setningum af almennum fréttum, bæði fyrir normun og eftir normun með Regínu normara
* 70,000 setningum af íþróttafréttum, bæði fyrir normun og eftir normun með Regínu normara
* 40,000 handnormuðum setningum úr alls konar texta |
dc.language.iso |
isl |
dc.publisher |
Reykjavik University |
dc.relation.isreferencedby |
https://aclanthology.org/2021.nodalida-main.45.pdf |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/158 |
dc.rights |
Icelandic Gigaword Corpus |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus |
dc.rights.label |
PUB |
dc.source.uri |
https://github.com/cadia-lvl/regina_normalizer |
dc.subject |
text-normalization |
dc.subject |
normalization |
dc.title |
Text Normalization Corpus 21.10 |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Helga Svala Sigurðardóttir, helgas@ru.is, Reykjavik University |
sponsor |
Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið); Máltækniáætlun; nationalFunds |
size.info |
6 files |
files.size |
17796824 |
files.count |
1 |
withdrawn.reason |
Duplication of http://hdl.handle.net/20.500.12537/158 |