dc.contributor.author |
Arnardóttir, Þórunn |
dc.date.accessioned |
2024-08-29T15:00:16Z |
dc.date.available |
2024-08-29T15:00:16Z |
dc.date.issued |
2024-09-01 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/336 |
dc.description |
Icelandic Gigaword Corpus JSONL Converter is a tool for converting the unannotated version of the Icelandic Gigaword Corpus (IGC; http://hdl.handle.net/20.500.12537/253) to JSONL format. The converter takes the original XML files from IGC as input and writes them out as JSONL, adding information on each subcorpus's quality and domain. That information comes from an attached file created by the Árni Magnússon Institute for Icelandic Studies. For further information on the output format, see the attached README. |
dc.publisher |
Miðeind ehf. |
dc.rights |
The MIT License (MIT) |
dc.rights.uri |
https://opensource.org/licenses/mit-license.php |
dc.rights.label |
PUB |
dc.subject |
igc |
dc.subject |
converter |
dc.subject |
jsonl |
dc.title |
Icelandic Gigaword Corpus JSONL Converter |
dc.type |
toolService |
metashare.ResourceInfo#ContentInfo.detailedType |
tool |
metashare.ResourceInfo#ResourceComponentType#ToolServiceInfo.languageDependent |
false |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Þórunn Arnardóttir; thorunn@mideind.is; Miðeind ehf.
sponsor |
Ministry of Education, Science and Culture; Conversion of IGC format for LLMs (G10); Language Technology for Icelandic 2019-2023; nationalFunds
files.size |
13136 |
files.count |
2 |
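
The conversion summarised in dc.description (XML in, JSONL out, with quality and domain attached per subcorpus) can be illustrated with a minimal Python sketch. This is not the converter's actual code. The XML element name `p`, the output field names (`document`, `subcorpus`, `quality`, `domain`), the `SUBCORPUS_INFO` table, and the sample values are all assumptions made for illustration; the real output format is described in the attached README.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical lookup table. The real converter reads quality and domain
# from an attached file whose layout is not described in this record.
SUBCORPUS_INFO = {
    "example_subcorpus": {"quality": "A", "domain": "news"},
}


def local_name(tag):
    """Reduce a namespaced tag such as '{http://...}p' to 'p'."""
    return tag.rsplit("}", 1)[-1]


def xml_to_record(xml_text, subcorpus):
    """Build one JSON-serialisable record from one XML document."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for elem in root.iter():
        # Assumption: running text is held in <p> elements.
        if local_name(elem.tag) == "p":
            # Join all nested text and collapse runs of whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                paragraphs.append(text)
    info = SUBCORPUS_INFO.get(subcorpus, {})
    return {
        "document": "\n".join(paragraphs),
        "subcorpus": subcorpus,
        "quality": info.get("quality"),  # None if the subcorpus is unknown
        "domain": info.get("domain"),
    }


def to_jsonl(records):
    """Serialise records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up sample input, not taken from IGC.
SAMPLE_XML = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Fyrsta málsgrein.</p>
    <p>Önnur   málsgrein.</p>
  </body></text>
</TEI>"""

record = xml_to_record(SAMPLE_XML, "example_subcorpus")
print(to_jsonl([record]))
```

Passing `ensure_ascii=False` keeps Icelandic characters such as þ and ð readable in the output instead of escaping them, which seems the natural choice for a text corpus, though the actual tool may behave differently.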