dc.contributor.author Friðriksdóttir, Steinunn Rut
dc.contributor.author Jasonarson, Atli
dc.date.accessioned 2021-08-12T13:07:14Z
dc.date.available 2021-08-12T13:07:14Z
dc.date.issued 2021-08-12
dc.identifier.uri http://hdl.handle.net/20.500.12537/123
dc.description ALEXIA is a command-line corpus tool for comparing a given vocabulary with that of a larger corpus or set of corpora. Maintaining lexicons, dictionaries and terminologies requires systematically going through large amounts of text that is considered representative of the language or category in question, in order to find potential gaps in the data. ALEXIA provides an easy way to generate such candidate lists. To run ALEXIA, the user runs main.py from the command line. The script offers two interface languages, Icelandic and English, and guides the user through a series of options, including the necessary set-up of the SQL databases. The user is greeted with a welcome message and asked whether to create the default databases for the demo version of the program or to provide their own lexicon files. If the default set-up is chosen, the user must indicate whether to use the Database of Icelandic Morphology (DIM; Icelandic: Beygingarlýsing íslensks nútímamáls, BÍN) or A Dictionary of Contemporary Icelandic (DCI; Icelandic: Nútímamálsorðabókin, NMO) as input; its vocabulary is then compared with that of the Icelandic Gigaword Corpus (IGC; Icelandic: Risamálheildin, RMH). Once the set-up is complete, the user is offered the option of continuing to the main program. A number of filters are applied to limit distortion in the results. The linked video gives a detailed description of the tool's use.
dc.language.iso isl
dc.publisher The Árni Magnússon Institute for Icelandic Studies
dc.relation.replaces http://hdl.handle.net/20.500.12537/95
dc.rights Apache License 2.0
dc.rights.uri https://opensource.org/license/apache2-0-php/
dc.rights.label PUB
dc.source.uri https://github.com/steinunnfridriks/ALEXIA
dc.subject lexicon acquisition
dc.subject lexicon acquisition tool
dc.subject corpus harvesting
dc.subject gigaword corpus
dc.subject dmii
dc.subject islex dictionary
dc.subject corpus tool
dc.title ALEXIA: Lexicon Acquisition Tool for Icelandic (Orðtökutólið Alexía) 3.0 (21.08)
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent true
has.files yes
branding Clarin IS Repository
demo.uri https://www.youtube.com/watch?v=RLwgnak4-rU
contact.person Steinunn Rut Friðriksdóttir srf2@hi.is The Árni Magnússon Institute for Icelandic Studies
contact.person Atli Jasonarson atlijas@simnet.is The Árni Magnússon Institute for Icelandic Studies
sponsor Ministry of Education, Science and Culture Lexicon Acquisition Tool (I2) Language Technology for Icelandic 2019-2023 nationalFunds
files.size 794245
files.count 1
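
The comparison summarised in dc.description (a lexicon's vocabulary checked against a corpus to produce a candidate list of missing words) can be illustrated with a minimal sketch. This is not ALEXIA's own code: the tool works through SQL databases and its own filter files, whereas the hypothetical function `candidate_list` below, its parameters and the sample words are assumptions chosen purely for illustration, and everything is held in memory.

```python
from collections import Counter


def candidate_list(lexicon, corpus_lemmas, filters=frozenset(), min_freq=1):
    """Return (lemma, frequency) pairs for corpus lemmas missing from the lexicon.

    lexicon       -- iterable of words already covered (hypothetical input)
    corpus_lemmas -- iterable of lemmas taken from a corpus
    filters       -- words to exclude from the candidates, e.g. a stop list
    min_freq      -- minimum corpus frequency for a candidate to be kept
    """
    known = {word.lower() for word in lexicon}
    blocked = {word.lower() for word in filters}

    # Count only purely alphabetic tokens; numbers and punctuation are skipped.
    counts = Counter(
        lemma.lower() for lemma in corpus_lemmas if lemma.isalpha()
    )

    candidates = [
        (lemma, freq)
        for lemma, freq in counts.items()
        if lemma not in known and lemma not in blocked and freq >= min_freq
    ]
    # Most frequent candidates first; ties are broken alphabetically.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    lexicon = ["hestur", "hús"]
    corpus = ["hestur", "tölva", "tölva", "hús", "snjallsími", "og", "2021"]
    for lemma, freq in candidate_list(lexicon, corpus, filters={"og"}):
        print(lemma, freq)
```

A set lookup keeps the membership test cheap for small inputs; for a corpus the size of the IGC, a database-backed lookup such as the one the description mentions would be the more realistic choice.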


 Files in this item

This item is publicly available and licensed under: Apache License 2.0
Name ALEXIA.zip
Size 775.63 KB
Format application/zip
Description zipped folder
MD5 b905f5ab469ffe3b8fcbe6d82a03aa1e
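
The MD5 value listed for ALEXIA.zip can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` and the local path `ALEXIA.zip` are assumptions for illustration and are not part of the record or of ALEXIA.

```python
import hashlib

# Checksum as listed in this record for ALEXIA.zip.
EXPECTED_MD5 = "b905f5ab469ffe3b8fcbe6d82a03aa1e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys

    # Assumed location of the downloaded archive; pass another path if needed.
    archive = sys.argv[1] if len(sys.argv) > 1 else "ALEXIA.zip"
    print("checksum OK" if matches_record(archive) else "checksum MISMATCH")
```

A mismatch most likely indicates an incomplete or corrupted download. MD5 is adequate for that purpose but is not a safeguard against deliberate tampering.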
 File Preview  
  • ALEXIA
    • lexicon.txt
    • README.md
    • output
      • DIM
        • IGC_wordform_plus_lemma.freq
        • IGC_lemma_colloc.freq
        • IGC_lemma.freq
        • IGC_texttypes.csv
        • IGC_wordform.freq
        • IGC_lemma_plus_wordform.freq
      • user_defined
        • lexicon_frequencylist.freq
        • lexicon_collocations.freq
      • DCI
        • IGC_lemma_colloc.freq
        • IGC_lemma.freq
        • IGC_texttypes.csv
        • IGC_lemma_plus_wordform.freq
    • alexia
      • IGC_extractor.py
      • __pycache__
        • txt_to_data.cpython-38.pyc
        • collocation_output.cpython-38.pyc
        • prepare_data.cpython-38.pyc
        • request_file.cpython-38.pyc
        • find_texttype_freqs.cpython-38.pyc
        • rename_directories.cpython-38.pyc
        • make_directories.cpython-38.pyc
        • IGC_extractor.cpython-38.pyc
        • dci_extractor.cpython-38.pyc
        • base_plus_other.cpython-38.pyc
        • base_output.cpython-38.pyc
      • dci_extractor.py
      • prepare_data.py
      • islensk_utgafa
        • find_texttype_freqs.py
        • base_plus_other.py
        • collocation_output.py
        • txt_to_data.py
        • base_output.py
        • setup_icelandic.py
        • run_icelandic.py
        • __pycache__
          • run_icelandic.cpython-38.pyc
          • base_plus_other.cpython-38.pyc
          • setup_icelandic.cpython-38.pyc
          • find_texttype_freqs.cpython-38.pyc
          • collocation_output.cpython-38.pyc
          • txt_to_data.cpython-38.pyc
          • base_output.cpython-38.pyc
      • request_file.py
      • rename_directories.py
      • english_version
        • find_texttype_freqs.py
        • base_plus_other.py
        • run.py
        • collocation_output.py
        • txt_to_data.py
        • setup.py
        • base_output.py
        • __pycache__
          • base_plus_other.cpython-38.pyc
          • find_texttype_freqs.cpython-38.pyc
          • collocation_output.cpython-38.pyc
          • txt_to_data.cpython-38.pyc
          • run.cpython-38.pyc
          • base_output.cpython-38.pyc
          • setup.cpython-38.pyc
      • sql
        • corpus_to_sql.py
        • sql_lookup.py
        • __pycache__
          • corpus_to_sql.cpython-38.pyc
          • sql_lookup.cpython-38.pyc
          • word_to_db.cpython-38.pyc
        • word_to_db.py
    • databases
      • .gitkeep
    • IGC_filters.txt
    • LESTU.md
    • requirements.txt
    • .git
      • logs
      • info
        • exclude
      • config
      • packed-refs
      • index
      • HEAD
      • refs
      • description
      • hooks
        • applypatch-msg.sample
        • pre-push.sample
        • commit-msg.sample
        • post-update.sample
        • pre-rebase.sample
        • pre-receive.sample
        • update.sample
        • pre-applypatch.sample
        • pre-commit.sample
        • pre-merge-commit.sample
        • fsmonitor-watchman.sample
        • prepare-commit-msg.sample
      • objects
        • pack
          • pack-061f5191c22ec0596acca63bd26751c3006e858e.idx
          • pack-061f5191c22ec0596acca63bd26751c3006e858e.pack
        • info
        • branches
        • main.py
        • LICENSE
        • corpora