
 
dc.contributor.author Símonarson, Haukur Barri
dc.contributor.author Snæbjarnarson, Vésteinn
dc.date.accessioned 2022-05-09T15:48:32Z
dc.date.available 2022-05-09T15:48:32Z
dc.date.issued 2022-05-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/212
dc.description This is a pipeline for creating GreynirSeq domain-aware translation models. A valid checkpoint of a base translation model based on mBART25 can be fine-tuned into a domain translation model. The resulting model can then be queried with a label for the requested domain. The English-Icelandic translation models at https://repository.clarin.is/repository/xmlui/handle/20.500.12537/125 are recommended as base models. The included preprocessing script expects the training corpus as a .tsv file with three fields (domain, English, Icelandic). The script finetune.sh fine-tunes the model until convergence, and evaluate.sh then computes BLEU over the Flores development set. See the README file for further details on setting up an environment and fetching data.
dc.description This software downloads an Icelandic-English translation model and can adapt it by training on parallel data labeled by domain. The code supports adapting models comparable to mBART25. For translation between English and Icelandic, the models at https://repository.clarin.is/repository/xmlui/handle/20.500.12537/125 are recommended. The scripts in the package include preprocess.sh, which preprocesses training data in .tsv format with three columns (domain, English, Icelandic). The script finetune.sh runs training until convergence, and finally evaluate.sh can be run to measure BLEU on the Flores evaluation corpus. Further instructions are in the README file.
dc.publisher Miðeind ehf
dc.rights The MIT License (MIT)
dc.rights.uri https://opensource.org/licenses/mit-license.php
dc.rights.label PUB
dc.source.uri https://github.com/mideind/domain-translation-pipeline
dc.subject nmt
dc.subject machine translation
dc.subject neural machine translation
dc.title GreynirSeq Domain Translation Pipeline (22.06)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent false
has.files yes
branding Clarin IS Repository
demo.uri https://velthyding.is
contact.person Haukur Barri Símonarson haukur@mideind.is Miðeind ehf
sponsor Ministry of Education, Science and Culture Machine translation V4b Language Technology for Icelandic 2019-2023 nationalFunds
files.size 4757242
files.count 2
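The description states that preprocessing expects a .tsv training corpus with three fields (domain, English, Icelandic). As a rough illustration of that input shape, the Python sketch below parses such rows and counts sentence pairs per domain. It is not the repository's preprocess.py or preprocess.sh; the absence of a header row, the single-label domain field, and the names `read_domain_tsv`, `domain_counts`, and `Example` are assumptions made for illustration, so check data/dummy.tsv and the README for the actual format.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, NamedTuple


class Example(NamedTuple):
    """One row of the training .tsv: a domain label and an English/Icelandic pair."""
    domain: str
    english: str
    icelandic: str


def read_domain_tsv(lines: Iterable[str]) -> List[Example]:
    """Parse tab-separated rows of (domain, english, icelandic).

    ASSUMPTIONS (not taken from the repository): there is no header row,
    each non-blank line has exactly three fields, and the domain field is a
    single label. Malformed rows raise ValueError instead of being skipped.
    """
    examples: List[Example] = []
    # QUOTE_NONE keeps quote characters inside sentences as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # blank line
        if len(row) != 3:
            raise ValueError(
                f"line {reader.line_num}: expected 3 fields, got {len(row)}"
            )
        domain, english, icelandic = (field.strip() for field in row)
        examples.append(Example(domain, english, icelandic))
    return examples


def domain_counts(examples: Iterable[Example]) -> Counter:
    """Count sentence pairs per domain label."""
    return Counter(ex.domain for ex in examples)


if __name__ == "__main__":
    # Hypothetical domain labels and sentences, for illustration only.
    sample = io.StringIO(
        "general\tGood morning.\tGóðan daginn.\n"
        "general\tThank you.\tTakk fyrir.\n"
        "medical\tTake one tablet.\tTaktu eina töflu.\n"
    )
    rows = read_domain_tsv(sample)
    print(domain_counts(rows))
```

Failing loudly on rows without exactly three fields is a deliberate choice here: a stray tab inside a sentence would otherwise silently shift the English and Icelandic columns.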


 Files in this item

This item is publicly available and licensed under the MIT License (MIT).
Name: domain-translation-pipeline-main.zip
Size: 4.54 MB
Format: application/zip
Description: Domain translation pipeline
MD5: 87c350e4352f99627423ce3e7bbd4453
Contents of domain-translation-pipeline-main.zip:
  • domain-translation-pipeline-main
    • environment.yaml (1 kB)
    • dicts
      • dict.txt (3 MB)
      • dict.en_XX.txt (3 MB)
      • dict.is_IS.txt (3 MB)
    • README.md (1 kB)
    • .gitignore (1 kB)
    • finetune.sh (1 kB)
    • reshape_checkpoint_embeddings.py (2 kB)
    • data
      • dummy.tsv (225 B)
    • fairseq_user_dir
      • transl_domain_bart.py (10 kB)
      • __init__.py (33 B)
    • preprocess.sh (2 kB)
    • LICENSE (1 kB)
    • evaluate.sh (1 kB)
    • preprocess.py (2 kB)
Name: README.md
Size: 1.66 KB
Format: Unknown
Description: README
MD5: 8cae4610810e498e3feb89076ed2642e
