dc.contributor.author | Skarphéðinsson, Njáll |
dc.contributor.author | Guðmundsson, Breki |
dc.contributor.author | Smári, Steinar |
dc.contributor.author | Lárusdóttir, Marta |
dc.contributor.author | Einarsson, Hafsteinn |
dc.contributor.author | Kahn, Abuzar |
dc.contributor.author | Nyberg, Eric |
dc.contributor.author | Loftsson, Hrafn |
dc.date.accessioned | 2023-05-03T11:37:08Z |
dc.date.available | 2023-05-03T11:37:08Z |
dc.date.issued | 2022 |
dc.identifier.uri | http://hdl.handle.net/20.500.12537/311 |
dc.description | The Reykjavik University Question-Answering Dataset (RUQuAD) is a corpus of question/answer (QA) pairs. Each answered question is associated with a paragraph in which the answer to that question is marked. Version 22.02 of RUQuAD consists of approximately 20,800 questions and 12,700 answers. The corpus data was collected in 2021-2022 by about 1,000 crowd-workers using the GameQA mobile app platform. The answers were drawn from five sources in four separate domains: the Icelandic Wikipedia, the Icelandic Web of Science ("Vísindavefurinn"), the news websites mbl.is and visir.is, and the Icelandic Government Information website ("Stjórnarráðið"). The corpus is intended as a training corpus for developing QA models for Icelandic. Note that the answers in this version of RUQuAD have NOT been post-edited/standardized. The corpus is divided into two parts: RUQuAD-1 and RUQuAD-2. RUQuAD-1 contains paragraphs from all the above sources EXCEPT the Icelandic Web of Science and can be used under a CC BY license. The paragraphs in RUQuAD-2 are collected ONLY from the Icelandic Web of Science and can be used under a special license (developed by the Arni Magnusson Institute for Icelandic Studies) which allows the underlying data to be used for linguistic research and the development of language technology, but cannot be licensed under CC BY. This part is RUQuAD-2. It contains 2,273 questions. RUQuAD-1 is available at http://hdl.handle.net/20.500.12537/310. |
dc.description | [ICELANDIC, in English translation] The Reykjavik University Question-Answering Dataset (RUQuAD) is a corpus consisting of pairs of questions and answers (QA). Every question that has an answer is linked to a specific paragraph in which the answer to the question is marked. Version 22.02 of RUQuAD consists of approx. 20,800 questions and 12,700 answers. The corpus data was collected in 2021-2022 by approx. 1,000 "workers" using the GameQA app. The answers were obtained from five websites in four different domains: the Icelandic Wikipedia, Vísindavefurinn, the news sites mbl.is and visir.is, and Stjórnarráðið. The corpus is intended as a training corpus for Icelandic QA models. Note that in this version the answers have not been corrected/standardized afterwards. The corpus is divided into two parts, RUQuAD-1 and RUQuAD-2. RUQuAD-1 contains paragraphs from all the websites EXCEPT Vísindavefurinn and is released under a CC BY license. Paragraphs in RUQuAD-2 come ONLY from Vísindavefurinn and are released under a special license (drawn up by Árnastofnun) which permits use of the underlying data in linguistic research and language technology development; this part, however, cannot be released under a CC BY license. This part is RUQuAD-2, which contains 2,273 questions/answers. RUQuAD-1 can be found at http://hdl.handle.net/20.500.12537/310. |
dc.language.iso | isl |
dc.publisher | Reykjavik University |
dc.relation.isreferencedby | https://aclanthology.org/2023.eacl-demo.18.pdf |
dc.rights | License for the Use of a Corpus in Research and Development |
dc.rights.uri | https://repository.clarin.is/licenses/license_for_corpus_in_research_and_developement.pdf |
dc.rights.label | PUB |
dc.source.uri | https://gameqa.app/ |
dc.subject | Questions |
dc.subject | Answers |
dc.subject | Question-Answering dataset |
dc.title | Reykjavik University Question-Answering Dataset 2 (RUQuAD-2) - version 22.02 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | Clarin IS Repository |
contact.person | Hrafn Loftsson hrafn@ru.is Reykjavik University |
size.info | 2273 items |
files.size | 864501 |
files.count | 1 |
Files in this item
Name | RUQuAD2.zip |
Size | 844.24 KB |
Format | application/zip |
Description | Unknown |
MD5 | 561c4b51ffce5e30f410aeb1b5a611e4 |
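
The MD5 checksum and the byte size (files.size) listed in this record can be used to check that a downloaded copy of RUQuAD2.zip is intact. The following is a minimal Python sketch of such a check, using only the standard library. The function names `md5_of_file` and `verify_download` are hypothetical helpers introduced here for illustration, and the local file path is an assumption; neither is part of the repository or the dataset.

```python
import hashlib
from pathlib import Path

# Values copied from this record (MD5 and files.size fields).
EXPECTED_MD5 = "561c4b51ffce5e30f410aeb1b5a611e4"
EXPECTED_SIZE = 864501  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Hypothetical helper: True if both size and MD5 match the expected values."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

Assuming the archive was saved in the current directory under its listed name, calling `verify_download("RUQuAD2.zip")` would return `True` only when both the size and the checksum agree with the record. MD5 is suitable here as an integrity check against corrupted downloads, not as protection against deliberate tampering.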