| dc.contributor.author |
Þorsteinsson, Vilhjálmur |
| dc.contributor.author |
Óladóttir, Hulda |
| dc.date.accessioned |
2022-01-21T12:51:29Z |
| dc.date.available |
2022-01-21T12:51:29Z |
| dc.date.issued |
2022 |
| dc.identifier.uri |
http://hdl.handle.net/20.500.12537/176 |
| dc.description |
Icegrams is a Python 3 package that encapsulates a large trigram library for Icelandic. 14 million unique trigrams and their frequency counts are heavily compressed using radix tries and quasi-succinct indices employing Elias-Fano encoding. This enables the ~43 megabyte compressed trigram file to be mapped directly into memory, with no ex ante decompression, for fast queries (typically ~10 microseconds per lookup). More information at: https://github.com/icelandic-lt/Icegrams |
| dc.language.iso |
isl |
| dc.publisher |
Miðeind ehf. |
| dc.relation.replaces |
http://hdl.handle.net/20.500.12537/80 |
| dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/368 |
| dc.rights |
The MIT License (MIT) |
| dc.rights.uri |
https://opensource.org/licenses/mit-license.php |
| dc.rights.label |
PUB |
| dc.source.uri |
https://github.com/icelandic-lt/Icegrams |
| dc.subject |
language model |
| dc.subject |
trigrams |
| dc.subject |
ngrams |
| dc.title |
Icegrams v1.1.1 |
| dc.type |
languageDescription |
| metashare.ResourceInfo#ContentInfo.detailedType |
other |
| metashare.ResourceInfo#ContentInfo.mediaType |
text |
| has.files |
yes |
| branding |
Clarin IS Repository |
| contact.person |
Vilhjálmur Þorsteinsson, mideind@mideind.is, Miðeind ehf. |
| sponsor |
Ministry of Education, Science and Culture; Word lists and language models (L4); Language Technology for Icelandic 2019-2023; nationalFunds |
| size.info |
14M trigrams |
| files.size |
154247 |
| files.count |
2 |
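
The description field says the trigram counts are stored in quasi-succinct indices that use Elias-Fano encoding. The Python sketch below illustrates that encoding in general terms. It is not Icegrams' actual code, and the class and method names (`EliasFano`, `size_in_bits`) are hypothetical.

How the encoding works:

- It stores a non-decreasing sequence of n non-negative integers drawn from a universe of size u.
- Each value is split into l = ⌊log2(u/n)⌋ low bits, which are stored verbatim.
- The remaining high bits are written in unary into an upper bit vector.
- To read element i, the code finds the i-th set bit in the upper vector (a "select" operation). For simplicity this sketch does that with a linear scan.

```python
from typing import List


class EliasFano:
    """Minimal Elias-Fano encoding of a non-decreasing list of non-negative ints.

    Illustrative sketch only (hypothetical names): select is a linear scan,
    whereas production implementations add auxiliary indices for fast select.
    """

    def __init__(self, values: List[int]) -> None:
        if any(b < a for a, b in zip(values, values[1:])):
            raise ValueError("values must be non-decreasing")
        if values and values[0] < 0:
            raise ValueError("values must be non-negative")
        self.n = len(values)
        universe = values[-1] + 1 if values else 1
        # Low bits per element: floor(log2(universe / n)), never negative.
        self.low_bits = max(0, (universe // self.n).bit_length() - 1) if self.n else 0
        low_mask = (1 << self.low_bits) - 1
        # Pack the low bits of all elements into one integer, low_bits apiece.
        self.lower = 0
        for i, v in enumerate(values):
            self.lower |= (v & low_mask) << (i * self.low_bits)
        # Upper bits in unary: element i sets bit (v >> low_bits) + i.
        self.upper = 0
        for i, v in enumerate(values):
            self.upper |= 1 << ((v >> self.low_bits) + i)

    def __len__(self) -> int:
        return self.n

    def _select1(self, i: int) -> int:
        """Return the position of the i-th (0-based) set bit in the upper vector."""
        count, pos, bits = -1, 0, self.upper
        while bits:
            if bits & 1:
                count += 1
                if count == i:
                    return pos
            bits >>= 1
            pos += 1
        raise IndexError(i)

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.n:
            raise IndexError(i)
        high = self._select1(i) - i  # unary gap decoding
        low = (self.lower >> (i * self.low_bits)) & ((1 << self.low_bits) - 1)
        return (high << self.low_bits) | low

    def size_in_bits(self) -> int:
        """Approximate encoded size: packed low bits plus the upper bit vector."""
        return self.n * self.low_bits + self.upper.bit_length()


if __name__ == "__main__":
    values = [3, 4, 7, 13, 14, 15, 21, 43]
    ef = EliasFano(values)
    print([ef[i] for i in range(len(ef))])  # decodes back to the input list
    print(ef.low_bits, ef.size_in_bits())
```

For the sample sequence, the sketch uses 2 low bits per element and 34 bits in total. Storing every value in a fixed 6-bit field would take 48 bits. Production implementations usually add auxiliary select indices so that lookups take constant time; that part is left out here.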