dc.contributor.author Jasonarson, Atli
dc.contributor.author Steingrímsson, Steinþór
dc.contributor.author Sigurðsson, Einar Freyr
dc.contributor.author Daðason, Jón Friðrik
dc.date.accessioned 2022-09-27T14:08:38Z
dc.date.available 2022-09-27T14:08:38Z
dc.date.issued 2022-10-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/273
dc.description ENGLISH: This Universal Dependencies parser for Icelandic was trained with Diaparser [1] on IcePaHC [2] and UD_Icelandic-Modern [3]. The latter treebank was revised before training to remove duplicate sentences. The parser uses information from an ELECTRA language model [4]. Its UAS (unlabeled attachment score) is 89.52 and its LAS (labeled attachment score) is 86.23.
dc.description REFERENCES:
[1] Diaparser: https://github.com/Unipisa/diaparser
[2] IcePaHC: https://github.com/UniversalDependencies/UD_Icelandic-IcePaHC/
[3] UD_Icelandic-Modern: https://github.com/UniversalDependencies/UD_Icelandic-Modern/
[4] electra-base-igc-is: https://huggingface.co/jonfd/electra-base-igc-is
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.isreplacedby http://hdl.handle.net/20.500.12537/302
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.subject universal dependencies
dc.subject parsing
dc.title Biaffine-based UD Parser 22.10
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
contact.person Steinþór Steingrímsson steinthor.steingrimsson@arnastofnun.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture (Mennta- og menningarmálaráðuneytið) I5 – Parsers Language Technology for Icelandic 2019-2023 nationalFunds
files.size 871982003
files.count 2
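
The UAS and LAS figures quoted in the description are attachment scores: UAS counts tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The following is a minimal sketch of that calculation, not the evaluation script used for this parser. It assumes each token is represented as a hypothetical `(head, deprel)` pair, and the toy sentence is invented for illustration. Real evaluation scripts may differ in details such as how punctuation or relation subtypes are handled.

```python
from typing import List, Tuple

# Hypothetical token representation: (index of the head token, dependency label).
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned gold and predicted tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the dependency label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Invented four-token example: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

LAS can never exceed UAS, since every token counted for LAS also has the correct head. The reported scores (89.52 and 86.23) are consistent with that.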


 Files in this item

 Download all files in item (831.59 MB)
This item is Publicly Available and licensed under: Apache License 2.0
Name
biaffine-dependency-parser.zip
Size
831.58 MB
Format
application/zip
Description
Biaffine-dependency-parser
MD5
467f558f6aa6b7e5a6b36ebddff43d57
 File Preview  
  • biaffine-dependency-parser
    • transformer_models
      • electra-base-igc-is
        • config.json (466 B)
        • README.md (666 B)
        • tokenizer_config.json (73 B)
        • special_tokens_map.json (112 B)
        • pytorch_model.bin (422 MB)
        • .gitattributes (1 kB)
        • vocab.txt (253 kB)
    • parse_file.py (607 B)
    • Tokenizer
      • src
        • tokenizer
          • __init__.py (2 kB)
          • tokenizer.py (125 kB)
          • __pycache__
            • abbrev.cpython-38.pyc (8 kB)
            • __init__.cpython-38.pyc (2 kB)
            • tokenizer.cpython-38.pyc (62 kB)
            • definitions.cpython-38.pyc (15 kB)
            • version.cpython-38.pyc (194 B)
          • abbrev.py (13 kB)
          • definitions.py (27 kB)
          • version.py (22 B)
          • py.typed (0 B)
          • main.py (9 kB)
          • Abbrev.conf (46 kB)
      • setup.py (3 kB)
      • .gitignore (1 kB)
      • README.rst (39 kB)
      • setup.cfg (64 B)
      • test
        • toktest_large.txt (559 kB)
        • test_tokenizer_tok.py (18 kB)
        • toktest_normal.txt (12 kB)
        • test_index_calculation.py (20 kB)
        • toktest_normal_gold_expected.txt (13 kB)
        • toktest_edgecases.txt (6 kB)
        • toktest_edgecases_gold_expected.txt (6 kB)
        • test_detokenize.py (2 kB)
        • Overview.txt (34 kB)
        • toktest_large_gold_perfect.txt (576 kB)
        • toktest_large_gold_acceptable.txt (583 kB)
        • example.txt (3 kB)
        • toktest_sentences.txt (21 kB)
        • toktest_edgecases_diff.txt (751 B)
        • test_tokenizer.py (102 kB)
      • .github
      • .git
        • logs
        • info
          • exclude (240 B)
        • config (261 B)
        • packed-refs (2 kB)
        • index (2 kB)
        • HEAD (23 B)
        • refs
        • description (73 B)
        • hooks
          • push-to-checkout.sample (2 kB)
          • applypatch-msg.sample (478 B)
          • commit-msg.sample (896 B)
          • pre-push.sample (1 kB)
          • pre-rebase.sample (4 kB)
          • post-update.sample (189 B)
          • pre-receive.sample (544 B)
          • pre-applypatch.sample (424 B)
          • update.sample (3 kB)
          • pre-commit.sample (1 kB)
          • pre-merge-commit.sample (416 B)
          • fsmonitor-watchman.sample (4 kB)
          • prepare-commit-msg.sample (1 kB)
        • objects
          • pack
            • pack-258487a22aa4bdca999801128b1c799f60c4d907.idx (74 kB)
            • pack-258487a22aa4bdca999801128b1c799f60c4d907.pack (988 kB)
          • info
          • branches
          • release.sh (251 B)
          • LICENSE (1 kB)
          • MANIFEST.in (74 B)
        • electra-ud-parser
          • parser.train.log (99 kB)
          • parser (474 MB)
        • requirements.txt (36 B)
        • test_file.txt (103 B)
Name
README.md
Size
1.97 KB
Format
Unknown
Description
Readme
MD5
bffde57167f046cffc8bd2bf8b1e5552
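
Both files in this item are listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The sketch below is one way to do that with Python's standard library. It assumes the files were saved under their listed names in the current directory. The helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the distributed package.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Checksums as listed in this item record.
EXPECTED_MD5 = {
    "biaffine-dependency-parser.zip": "467f558f6aa6b7e5a6b36ebddff43d57",
    "README.md": "bffde57167f046cffc8bd2bf8b1e5552",
}


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for the large zip archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Union[str, Path] = ".") -> Dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

MD5 is suitable here only as an integrity check against accidental corruption. It does not provide protection against deliberate tampering.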
