dc.contributor.author |
Símonarson, Haukur Barri |
dc.contributor.author |
Snæbjarnarson, Vésteinn |
dc.contributor.author |
Þorsteinsson, Vilhjálmur |
dc.date.accessioned |
2020-09-29T15:37:24Z |
dc.date.available |
2020-09-29T15:37:24Z |
dc.date.issued |
2020-09-28 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/70 |
dc.description |
Synthetic parallel corpus of back-translated training data for neural machine translation. The GreynirT2T Transformer network generated the corpus by translating English and Icelandic sentences. The English sentences (44.7 million) are drawn from the Wikipedia, Newscrawl and Europarl corpora. The Icelandic sentences (31.3 million) are drawn from the Icelandic Gigaword Corpus (Risamálheildin). |
dc.language.iso |
isl |
dc.language.iso |
eng |
dc.publisher |
Miðeind ehf. |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/127 |
dc.rights |
Icelandic Gigaword Corpus Part1 |
dc.rights.uri |
https://repository.clarin.is/repository/xmlui/page/license-gigaword-corpus-p1 |
dc.rights.label |
PUB |
dc.source.uri |
https://github.com/mideind/GreynirT2T |
dc.subject |
parallel corpus |
dc.subject |
machine translation |
dc.subject |
back translation |
dc.subject |
neural machine translation |
dc.title |
En-Is Synthetic Parallel Corpus (20.09) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Vilhjálmur Þorsteinsson; clarin@mideind.is; Miðeind ehf. |
sponsor |
Ministry of Education, Science and Culture; Back-translation data selection and filtering (V2b); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
76000000 sentences |
files.size |
7528925378 |
files.count |
1 |