dc.contributor.author |
Þorsteinsson, Vilhjálmur |
dc.contributor.author |
Óladóttir, Hulda |
dc.date.accessioned |
2020-09-30T16:29:42Z |
dc.date.available |
2020-09-30T16:29:42Z |
dc.date.issued |
2020-09-25 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12537/80 |
dc.description |
Icegrams is a Python 3 package that encapsulates a large trigram library for Icelandic. Its 14 million unique trigrams and their frequency counts are heavily compressed using radix tries and quasi-succinct indices based on Elias-Fano encoding. The resulting ~43 megabyte compressed trigram file can be mapped directly into memory, with no prior decompression, for fast queries (typically ~10 microseconds per lookup). More information at: https://github.com/mideind/Icegrams |
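The description mentions quasi-succinct indices based on Elias-Fano encoding. The following is a minimal, illustrative sketch of that general technique for a sorted list of integers. It is not the Icegrams implementation or its on-disk layout; the function names `ef_encode` and `ef_access` are hypothetical, and a Python integer stands in for the bit vector that a real implementation would store in packed bytes.

```python
def ef_encode(values, universe):
    """Elias-Fano encode a sorted list of ints in the range [0, universe).

    Each value is split into `low_bits` low bits, stored verbatim, and a
    high part, stored in a unary-coded bit vector (here a Python int).
    """
    n = len(values)
    # low_bits = floor(log2(universe / n)), clamped to be non-negative
    low_bits = max(0, (universe // n).bit_length() - 1) if n else 0
    low_mask = (1 << low_bits) - 1
    lows = [v & low_mask for v in values]
    high = 0
    for i, v in enumerate(values):
        # Element i with high part h sets bit number h + i
        high |= 1 << ((v >> low_bits) + i)
    return n, low_bits, lows, high


def ef_access(encoded, i):
    """Return the i-th value (0-based) from an Elias-Fano encoding."""
    n, low_bits, lows, high = encoded
    if not 0 <= i < n:
        raise IndexError(i)
    # Linear-scan "select": find the position of the i-th set bit.
    # Real implementations use auxiliary select structures instead.
    pos, seen = -1, -1
    while seen < i:
        pos += 1
        if (high >> pos) & 1:
            seen += 1
    # The high part is the set-bit position minus the element index
    return ((pos - i) << low_bits) | lows[i]


sample = [3, 4, 7, 13, 14, 15, 21, 43]
encoded = ef_encode(sample, universe=44)
decoded = [ef_access(encoded, i) for i in range(len(sample))]
print(decoded == sample)
```

The encoding keeps random access to any element without decompressing the whole sequence, which is the property that makes querying a memory-mapped file directly feasible.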
dc.language.iso |
isl |
dc.publisher |
Miðeind ehf. |
dc.relation.replaces |
http://hdl.handle.net/20.500.12537/55 |
dc.relation.isreplacedby |
http://hdl.handle.net/20.500.12537/176 |
dc.rights |
The MIT License (MIT) |
dc.rights.uri |
https://opensource.org/licenses/mit-license.php |
dc.rights.label |
PUB |
dc.source.uri |
https://github.com/mideind/Icegrams/releases/tag/1.0.2 |
dc.subject |
language model |
dc.subject |
trigrams |
dc.subject |
ngrams |
dc.title |
Icegrams (2020-09-30) |
dc.type |
languageDescription |
metashare.ResourceInfo#ContentInfo.detailedType |
other |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
Clarin IS Repository |
contact.person |
Vilhjálmur Þorsteinsson; mideind@mideind.is; Miðeind ehf. |
sponsor |
Ministry of Education, Science and Culture; Word lists and language models (L4); Language Technology for Icelandic 2019-2023; nationalFunds |
size.info |
14M trigrams |
files.size |
43217919 bytes |
files.count |
3 |