dc.contributor.author | Snæbjarnarson, Vésteinn |
dc.contributor.author | Símonarson, Haukur Barri |
dc.contributor.author | Ragnarsson, Pétur Orri |
dc.contributor.author | Jónsson, Haukur Páll |
dc.contributor.author | Ingólfsdóttir, Svanhvít Lilja |
dc.contributor.author | Þorsteinsson, Vilhjálmur |
dc.date.accessioned | 2021-09-26T20:11:55Z |
dc.date.available | 2021-09-26T20:11:55Z |
dc.date.issued | 2021-09-23 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/125 |
dc.description | This item provides general-domain IS-EN and EN-IS translation models developed by Miðeind ehf. They are based on a multilingual BART model (https://arxiv.org/pdf/2001.08210.pdf) and fine-tuned for translation on parallel and back-translated data. The models were trained with Fairseq, a sequence modeling toolkit built on PyTorch. The item includes the model files, a SentencePiece subword tokenization model, and dictionary files for running the models locally. The scripts infer_enis.sh and infer_isen.sh show how to test a model by translating sentences on the command line. To translate documents and evaluate the results, binarize the data with fairseq-preprocess and translate with fairseq-generate. For more on running a pre-trained model, see the Fairseq documentation: https://fairseq.readthedocs.io/en/latest/ |
dc.language.iso | isl |
dc.language.iso | eng |
dc.publisher | Miðeind ehf |
dc.relation.isreferencedby | https://arxiv.org/abs/2109.07343 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | PUB |
dc.source.uri | https://velthyding.is |
dc.subject | machine translation |
dc.subject | neural machine translation |
dc.subject | model |
dc.title | GreynirTranslate - mBART25 NMT models for Translations between Icelandic and English (1.0) |
dc.type | toolService |
metashare.ResourceInfo#ContentInfo.detailedType | tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent | true |
has.files | yes |
branding | Clarin IS Repository |
demo.uri | https://velthyding.is |
contact.person | Vésteinn Snæbjarnarson vesteinn@mideind.is Mideind ehf |
sponsor | Ministry of Education, Science and Culture; Back-translation data selection and filtering (V2b); Language Technology for Icelandic 2019-2023; nationalFunds |
files.size | 18763034650 |
files.count | 6 |
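The description's document-translation workflow (binarize with fairseq-preprocess, then translate with fairseq-generate) can be sketched as two command lines assembled in Python. This is a minimal sketch, not the authors' pipeline: the flags are the ones used in Fairseq's published mBART example, while the language codes (`en_XX`, `is_IS`), the `--langs` value, the dictionary file name, and all paths are assumptions made for illustration. The bundled infer_enis.sh and infer_isen.sh scripts are the place to check the actual task and language settings for these models.

```python
import shlex


def build_preprocess_cmd(src, tgt, test_prefix, dict_path, dest_dir):
    """Build the argv for fairseq-preprocess.

    test_prefix is the path prefix of SentencePiece-encoded files, e.g.
    "data/test.spm" for data/test.spm.<src> and data/test.spm.<tgt>.
    """
    return [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--testpref", str(test_prefix),
        "--srcdict", str(dict_path),   # the same dictionary is reused for
        "--tgtdict", str(dict_path),   # both sides, as in the mBART example
        "--destdir", str(dest_dir),
        "--workers", "4",
    ]


def build_generate_cmd(data_dir, model_path, spm_model, src, tgt, langs):
    """Build the argv for fairseq-generate on the binarized test subset."""
    return [
        "fairseq-generate", str(data_dir),
        "--path", str(model_path),
        "--task", "translation_from_pretrained_bart",  # assumption: mBART task
        "--gen-subset", "test",
        "-s", src,
        "-t", tgt,
        "--bpe", "sentencepiece",
        "--sentencepiece-model", str(spm_model),
        "--remove-bpe", "sentencepiece",
        "--langs", langs,
        "--batch-size", "16",
    ]


def example_commands():
    """Return both commands as shell strings, using hypothetical values."""
    src, tgt = "en_XX", "is_IS"   # assumption: mBART-style language codes
    langs = "en_XX,is_IS"         # assumption: must match the training setup
    preprocess = build_preprocess_cmd(
        src, tgt, "data/test.spm", "data-bin_enis/dict.txt", "data-bin_enis"
    )
    generate = build_generate_cmd(
        "data-bin_enis", "mbart_nmt_enis.pt", "sentence.bpe.model",
        src, tgt, langs,
    )
    return shlex.join(preprocess), shlex.join(generate)
```

Calling `example_commands()` returns two strings that can be inspected or pasted into a shell once the placeholder values have been replaced with the ones from the inference scripts.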
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0). Total download size: 17.47 GB.
Name | Size | Format | Description | MD5 |
data-bin_enis.zip | 168 bytes | application/zip | BPE dictionaries (all the same) | cfe021da42831f1fff330a99c5a7e152 |
infer_enis.sh | 340 bytes | Unknown | Inference script for the English to Icelandic translation direction | 6a97202a2c3c1cb88dee77505b141eb9 |
sentence.bpe.model | 4.83 MB | Unknown | SentencePiece subword vocabulary file | bf25eb5120ad92ef5c7d8596b5dc4046 |
mbart_nmt_isen.pt | 8.73 GB | Unknown | NMT model file for the Icelandic to English direction | 2edc01e257219b5144301d65d37ce30a |
mbart_nmt_enis.pt | 8.73 GB | Unknown | NMT model file for the English to Icelandic direction | de9966bc87df775ead03d1214d802a7e |
infer_isen.sh | 339 bytes | Unknown | Inference script for the Icelandic to English translation direction | af59d6b248d9b25a59781549d21bda33 |
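Because the two model files are 8.73 GB each, it may be worth checking the downloads against the MD5 values listed above before loading them. The following is a minimal sketch using only Python's standard library; the function names and the assumption that all six files sit in a single directory are illustrative, not part of the item.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "data-bin_enis.zip": "cfe021da42831f1fff330a99c5a7e152",
    "infer_enis.sh": "6a97202a2c3c1cb88dee77505b141eb9",
    "sentence.bpe.model": "bf25eb5120ad92ef5c7d8596b5dc4046",
    "mbart_nmt_isen.pt": "2edc01e257219b5144301d65d37ce30a",
    "mbart_nmt_enis.pt": "de9966bc87df775ead03d1214d802a7e",
    "infer_isen.sh": "af59d6b248d9b25a59781549d21bda33",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so that
    multi-gigabyte model files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists in `directory`
    and its MD5 matches the listed value, otherwise False."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        status = "OK" if ok else "MISMATCH or missing"
        print(f"{status:<20} {name}")
```

Run from the download directory, the script prints one status line per file; a mismatch usually points to an interrupted or corrupted download.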