
 
dc.contributor.author Arnardóttir, Þórunn
dc.date.accessioned 2024-08-29T15:00:16Z
dc.date.available 2024-08-29T15:00:16Z
dc.date.issued 2024-09-01
dc.identifier.uri http://hdl.handle.net/20.500.12537/336
dc.description Icelandic Gigaword Corpus JSONL Converter is a tool for converting the unannotated version of the Icelandic Gigaword Corpus (IGC; http://hdl.handle.net/20.500.12537/253) to JSONL format. The converter takes the original XML files from IGC and converts them to JSONL, adding information on each subcorpus's quality and domain. This information is obtained from an attached file created by the Árni Magnússon Institute for Icelandic Studies. For further information on the output format, see the attached README.
dc.publisher Miðeind ehf.
dc.rights The MIT License (MIT)
dc.rights.uri https://opensource.org/licenses/mit-license.php
dc.rights.label PUB
dc.subject igc
dc.subject converter
dc.subject jsonl
dc.title Icelandic Gigaword Corpus JSONL Converter
dc.type toolService
metashare.ResourceInfo#ContentInfo.detailedType tool
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent false
has.files yes
branding Clarin IS Repository
contact.person Þórunn Arnardóttir; thorunn@mideind.is; Miðeind ehf.
sponsor Ministry of Education, Science and Culture; Conversion of IGC format for LLMs (G10); Language Technology for Icelandic 2019-2023; nationalFunds
files.size 13136
files.count 2
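
The conversion summarized in dc.description can be illustrated with a minimal sketch. It is not the code shipped in IGC-converter.zip, and it does not reproduce the output format documented in the README. Everything specific here is an assumption made for illustration: that text sits in `p` elements of the XML, that the attached TSV file has the columns subcorpus, quality, and domain in that order, and the JSON field names `text`, `subcorpus`, `quality`, and `domain`. The function names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def load_subcorpus_info(tsv_text):
    """Map subcorpus name -> quality/domain labels.

    Assumed column order (hypothetical): subcorpus, quality, domain.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row[0]: {"quality": row[1], "domain": row[2]}
        for row in reader
        if len(row) >= 3
    }


def xml_to_jsonl_line(xml_text, subcorpus, info):
    """Turn one XML document into one JSONL line (a single JSON object)."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for element in root.iter():
        # Compare local names so that namespaced tags such as "{ns}p" match too.
        if isinstance(element.tag, str) and element.tag.rsplit("}", 1)[-1] == "p":
            # Collapse internal whitespace in the paragraph text.
            text = " ".join("".join(element.itertext()).split())
            if text:
                paragraphs.append(text)
    record = {
        "text": "\n".join(paragraphs),
        "subcorpus": subcorpus,
        **info.get(subcorpus, {}),  # adds quality and domain when known
    }
    # ensure_ascii=False keeps Icelandic characters readable in the output.
    return json.dumps(record, ensure_ascii=False)
```

Each call returns one line, so writing the results of many calls to a file, one per line, yields a JSONL file. Newlines inside the text are escaped by `json.dumps`, which keeps every record on a single line.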


 Files in this item (12.83 KB in total)

This item is publicly available and licensed under the MIT License (MIT).

Name: README
Size: 3.23 KB
Format: Unknown
Description: The tool's README
MD5: 1a9e6cf67cb8e6defc24592e07eeff85

Name: IGC-converter.zip
Size: 9.6 KB
Format: application/zip
Description: A zip file containing all files relevant to the converter
MD5: a8ba66210ae4030e291ce24324819add
 File preview of IGC-converter.zip

  • IGC-converter
    • Flokkun.tsv (5 kB)
    • scripts
      • convert_xml.py (22 kB)
      • __init__.py (45 B)
    • README (3 kB)
    • requirements.txt (16 B)
    • convert_IGC.py (2 kB)
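
The MD5 values listed for the two files can be used to check a download. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_record` are hypothetical, and the sketch assumes the files were saved under the names shown in this record.

```python
import hashlib

# MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "README": "1a9e6cf67cb8e6defc24592e07eeff85",
    "IGC-converter.zip": "a8ba66210ae4030e291ce24324819add",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected_md5):
    """Compare a file's MD5 digest with an expected value, ignoring case."""
    return md5_of_file(path) == expected_md5.lower()


# Example use, assuming IGC-converter.zip is in the working directory:
# matches_record("IGC-converter.zip", EXPECTED_MD5["IGC-converter.zip"])
```

A mismatch suggests an incomplete or altered download. MD5 is suitable for detecting accidental corruption but not for guarding against deliberate tampering.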
