Files in this item
- Name
 - README.txt
 - Size
 - 7.47 KB
 - Format
 - Text file
 - Description
 - README
 - MD5
 - 5e2fd58fd910c640c779738262d5111b
 
*******************************************************************************
*************THE ICELANDIC GIGAWORD CORPUS 2 IN JSONL FORMAT ******************
************ http://hdl.handle.net/20.500.12537/335        ********************
*******************************************************************************
This package contains the subcorpora of the Icelandic Gigaword Corpus, version 
22.10 (http://hdl.handle.net/20.500.12537/253), that were published under a 
restricted licence. They are provided in JSONL format, which is suitable for 
LLM training.
-----------------------------------------------------------------------------
ABOUT THE ICELANDIC GIGAWORD CORPUS (IGC):
Version 22.10 can be downloaded here: http://hdl.handle.net/20.500.12537/253
The Icelandic Gigaword Corpus (IGC) contains 8 corpora, totalling almost 2.4 
billion words:
Open licence:
 IGC-Journals   20.9 million words
 IGC-Law        53.3 million words
 IGC-News1     396.7 million words
 IGC-Parla     254.1 million words
 IGC-Social    724.0 million words
 IGC-Wik . . .
                                            
- Name
 - IGC2_jsonl.zip
 - Size
 - 2.91 GB
 - Format
 - application/zip
 - Description
 - IGC2_jsonl
 - MD5
 - 7579c71ee2833a202d2e91b1f3865a35
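
The MD5 value listed above can be used to check that the archive downloaded intact. The following is a minimal sketch, assuming the archive was saved locally as `IGC2_jsonl.zip` in the working directory; the helper names `md5_of_file` and `verify` are hypothetical and not part of the package.

```python
import hashlib
from pathlib import Path

# MD5 of IGC2_jsonl.zip as shown in the file listing above.
EXPECTED_MD5 = "7579c71ee2833a202d2e91b1f3865a35"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("IGC2_jsonl.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
```

Reading in chunks matters here because the archive is close to 3 GB and should not be loaded into memory at once.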
 
- IGC2
  - README.txt (7 kB)
  - converted-corpora
    - IGC-News2
      - IGC-News2-baendabladid.jsonl (86 MB)
      - IGC-News2-dfs.jsonl (15 MB)
      - IGC-News2-bbl.jsonl (56 MB)
      - IGC-News2-stundin_serblad.jsonl (23 kB)
      - IGC-News2-fotbolti.jsonl (644 MB)
      - IGC-News2-433.jsonl (69 MB)
      - IGC-News2-morgunbladid.jsonl (3 GB)
      - IGC-News2-bb.jsonl (39 MB)
      - IGC-News2-skessuhorn.jsonl (186 MB)
      - IGC-News2-mbl.jsonl (2 GB)
      - IGC-News2-dv_is.jsonl (819 MB)
      - IGC-News2-fjardarpostur.jsonl (3 MB)
      - IGC-News2-bondi.jsonl (3 MB)
      - IGC-News2-pressan.jsonl (13 MB)
      - IGC-News2-bleikt.jsonl (44 MB)
      - IGC-News2-stundin.jsonl (125 MB)
      - IGC-News2-stundin_blad.jsonl (67 MB)
      - IGC-News2-kylfingur.jsonl (26 MB)
      - IGC-News2-frettatiminn.jsonl (33 MB)
      - IGC-News2-kjarninn_blad.jsonl (5 MB)
      - IGC-News2-kjarninn.jsonl (114 MB)
      - IGC-News2-frettatiminn_bl.jsonl (86 MB)
      - IGC-News2-eyjan.jsonl (125 MB)
      - IGC-News2-vf.jsonl (177 MB)
    - IGC-Books
      - IGC-Books.jsonl (134 MB)
  - IGC-News2
  - userlicense_igc_restricted.pdf (61 kB)
  - example.py (2 kB)
  - datasets-info
    - IGC-Books.jsonl (152 B)
    - IGC-News2.jsonl (3 kB)
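
Since every corpus file in the archive is a `.jsonl` file, each can be streamed one record per line without loading the whole file. The sketch below is an illustration only: it is not the bundled `example.py`, the helper names `iter_jsonl` and `peek_keys` are hypothetical, the path is an assumption about where the archive was unpacked, and no field names are assumed because the record schema is not shown in this listing.

```python
import json
from itertools import islice
from pathlib import Path


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


def peek_keys(path, limit=3):
    """Return the sorted keys of the first `limit` records, to inspect the schema."""
    keys = []
    for record in islice(iter_jsonl(path), limit):
        # Records are expected to be JSON objects; fall back to the type name otherwise.
        keys.append(sorted(record) if isinstance(record, dict) else type(record).__name__)
    return keys


if __name__ == "__main__":
    # Assumed location after unpacking; adjust to the actual extraction directory.
    sample = Path("IGC2/converted-corpora/IGC-Books/IGC-Books.jsonl")
    if sample.exists():
        print(peek_keys(sample))
```

Streaming line by line keeps memory use flat, which matters for the multi-gigabyte files such as `IGC-News2-morgunbladid.jsonl`.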