
 
dc.contributor.author Snæbjarnarson, Vésteinn
dc.contributor.author Símonarson, Haukur Barri
dc.contributor.author Ragnarsson, Pétur Orri
dc.contributor.author Jónsson, Haukur Páll
dc.contributor.author Ingólfsdóttir, Svanhvít Lilja
dc.contributor.author Þorsteinsson, Vilhjálmur
dc.date.accessioned 2021-09-26T20:11:55Z
dc.date.available 2021-09-26T20:11:55Z
dc.date.issued 2021-09-23
dc.identifier.uri http://hdl.handle.net/20.500.12537/125
dc.description Provided are general-domain translation models for the IS-EN and EN-IS directions, developed by Miðeind ehf. They are based on a multilingual BART model (mBART, https://arxiv.org/pdf/2001.08210.pdf) and fine-tuned for translation on parallel and back-translated data. The models were trained with Fairseq, a sequence modeling toolkit built on PyTorch. Included are the model files, a SentencePiece subword tokenization model, and dictionary files for running the models locally. The scripts infer_enis.sh and infer_isen.sh can be used to test the models by translating sentences on the command line. To translate documents and evaluate the results, binarize the data with fairseq-preprocess and translate it with fairseq-generate. See the Fairseq documentation for further information on running a pre-trained model: https://fairseq.readthedocs.io/en/latest/
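The batch workflow described above can be sketched in Python as follows. This is a minimal, hypothetical sketch: the function names, the `en`/`is` file suffixes and any paths are assumptions not taken from this record, and the input files are assumed to be SentencePiece-encoded already (for example with the `spm_encode` tool and `sentence.bpe.model`). The `fairseq-preprocess` options used here follow the Fairseq mBART example. The `fairseq-generate` options these models need (task, language tags) are not documented in this record, so check `infer_enis.sh` and `infer_isen.sh` before running generation. The second helper collects translations from `fairseq-generate` output, assuming its usual `H-<id>`/`D-<id>` line format.

```python
def preprocess_command(test_prefix, dict_path, dest_dir, src="en", tgt="is", workers=4):
    r"""Build a fairseq-preprocess argument list that binarizes a test set.

    Hypothetical layout: expects f"{test_prefix}.{src}" and f"{test_prefix}.{tgt}"
    to contain SentencePiece-encoded text. One shared dictionary is passed for
    both sides, since the record lists the dictionaries as identical.
    """
    return [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--testpref", str(test_prefix),
        "--destdir", str(dest_dir),
        "--srcdict", str(dict_path),
        "--tgtdict", str(dict_path),
        "--workers", str(workers),
    ]


def hypotheses_from_generate(lines, prefix="D"):
    r"""Return translations from fairseq-generate output, ordered by sentence id.

    Assumes the usual output format, e.g. "H-3\t-0.41\t<text>" (hypothesis) and
    "D-3\t-0.41\t<text>" (detokenized hypothesis). fairseq-generate batches
    sentences by length, so output order differs from input order; sorting by
    id restores the original order.
    """
    tag = prefix + "-"
    found = {}
    for line in lines:
        if not line.startswith(tag):
            continue
        head, _score, text = line.rstrip("\n").split("\t", 2)
        # Keep the first hypothesis per id, i.e. the best one when --nbest > 1.
        found.setdefault(int(head[len(tag):]), text)
    return [found[i] for i in sorted(found)]
```

For example, passing `preprocess_command("data/test.spm", "data-bin_enis/dict.en.txt", "data-bin/test_enis")` to `subprocess.run(..., check=True)` would binarize `data/test.spm.en` and `data/test.spm.is` (all paths hypothetical). Use `prefix="H"` to collect the raw hypotheses instead of the detokenized ones.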
dc.language.iso isl
dc.language.iso eng
dc.publisher Miðeind ehf
dc.relation.isreferencedby https://arxiv.org/abs/2109.07343
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://velthyding.is
dc.subject machine translation
dc.subject neural machine translation
dc.subject model
dc.title GreynirTranslate - mBART25 NMT models for Translations between Icelandic and English (1.0)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
demo.uri https://velthyding.is
contact.person Vésteinn Snæbjarnarson vesteinn@mideind.is Mideind ehf
sponsor Ministry of Education, Science and Culture; Back-translation data selection and filtering (V2b); Language Technology for Icelandic 2019-2023; nationalFunds
files.size 18763034650
files.count 6


 Files in this item

 Total size of all files in this item: 17.47 GB
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name                Size       Format           MD5                               Description
data-bin_enis.zip   168 bytes  application/zip  cfe021da42831f1fff330a99c5a7e152  BPE dictionaries (all identical)
infer_enis.sh       340 bytes  Unknown          6a97202a2c3c1cb88dee77505b141eb9  Inference script for the English to Icelandic direction
sentence.bpe.model  4.83 MB    Unknown          bf25eb5120ad92ef5c7d8596b5dc4046  SentencePiece subword tokenization model
mbart_nmt_isen.pt   8.73 GB    Unknown          2edc01e257219b5144301d65d37ce30a  NMT model file for the Icelandic to English direction
mbart_nmt_enis.pt   8.73 GB    Unknown          de9966bc87df775ead03d1214d802a7e  NMT model file for the English to Icelandic direction
infer_isen.sh       339 bytes  Unknown          af59d6b248d9b25a59781549d21bda33  Inference script for the Icelandic to English direction
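The MD5 checksums listed for each file can be used to verify the multi-gigabyte downloads before loading them. Below is a minimal sketch using only the Python standard library; the function names `md5_of` and `check_downloads` are illustrative, and it assumes all six files were downloaded (and not unzipped) into one directory under the names listed above.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "data-bin_enis.zip": "cfe021da42831f1fff330a99c5a7e152",
    "infer_enis.sh": "6a97202a2c3c1cb88dee77505b141eb9",
    "sentence.bpe.model": "bf25eb5120ad92ef5c7d8596b5dc4046",
    "mbart_nmt_isen.pt": "2edc01e257219b5144301d65d37ce30a",
    "mbart_nmt_enis.pt": "de9966bc87df775ead03d1214d802a7e",
    "infer_isen.sh": "af59d6b248d9b25a59781549d21bda33",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large model files never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Return {filename: status}, where status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) != checksum:
            results[name] = "mismatch"
        else:
            results[name] = "ok"
    return results
```

For example, `check_downloads("downloads")` reports any file that is missing or whose checksum does not match. Hashing both 8.73 GB model files reads roughly 17 GB from disk, so expect it to take a while.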
