
 
dc.contributor.author Skarphéðinsson, Njáll
dc.contributor.author Guðmundsson, Breki
dc.contributor.author Smári, Steinar
dc.contributor.author Lárusdóttir, Marta
dc.contributor.author Einarsson, Hafsteinn
dc.contributor.author Kahn, Abuzar
dc.contributor.author Nyberg, Eric
dc.contributor.author Loftsson, Hrafn
dc.date.accessioned 2023-05-03T11:37:08Z
dc.date.available 2023-05-03T11:37:08Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/20.500.12537/311
dc.description The Reykjavik University Question-Answering Dataset (RUQuAD) is a corpus which contains question/answer (QA) pairs. Each answered question is associated with a paragraph in which the answer to the given question is marked. Version 22.02 of RUQuAD consists of approximately 20,800 questions and 12,700 answers. The corpus data was collected in 2021-2022 by about 1,000 crowd-workers using the GameQA mobile app platform. The answers were sourced from five sources in four separate domains: The Icelandic Wikipedia, The Icelandic Web of Science ("Vísindavefurinn"), the news websites mbl.is and visir.is, and The Icelandic Government Information website ("Stjórnarráðið"). The corpus is intended as a training corpus for developing QA models for Icelandic. Note that the answers in this version of RUQuAD have NOT been post-edited/standardized. The corpus is divided into two parts: RUQuAD-1 and RUQuAD-2. RUQuAD-1 contains paragraphs from all the above sources EXCEPT The Icelandic Web of Science and can be used with a CC BY license. The paragraphs in RUQuAD-2 are ONLY collected from The Icelandic Web of Science and can be used with a special license (developed by the Arni Magnusson Institute for Icelandic Studies) which allows the underlying data to be used for linguistic research and the development of language technology, but cannot be licensed with CC BY. This part is RUQuAD-2. It contains 2,273 questions. RUQuAD-1 is available at http://hdl.handle.net/20.500.12537/310.
dc.description [Translated from Icelandic] The Reykjavik University Question-Answering Dataset (RUQuAD) is a corpus consisting of question/answer (QA) pairs. Every question that has an answer is linked to a specific paragraph in which the answer to the question is marked. Version 22.02 of RUQuAD consists of approximately 20,800 questions and 12,700 answers. The corpus data was collected in 2021-2022 by approximately 1,000 "workers" using the GameQA app. The answers were drawn from five websites in four different domains: the Icelandic Wikipedia, Vísindavefurinn, the news sites mbl.is and visir.is, and Stjórnarráðið. The corpus is intended as a training corpus for Icelandic QA models. Note that in this version the answers have not been corrected/standardized after collection. The corpus is divided into two parts, RUQuAD-1 and RUQuAD-2. RUQuAD-1 contains paragraphs from all the websites EXCEPT Vísindavefurinn and is released under a CC BY license. Paragraphs in RUQuAD-2 come ONLY from Vísindavefurinn and are released under a special license (drafted by Árnastofnun) that allows the underlying data to be used in linguistic research and language technology development; this part, however, cannot be released under a CC BY license. This part is RUQuAD-2, which contains 2,273 questions/answers. RUQuAD-1 can be found at http://hdl.handle.net/20.500.12537/310.
dc.language.iso isl
dc.publisher Reykjavik University
dc.relation.isreferencedby https://aclanthology.org/2023.eacl-demo.18.pdf
dc.rights License for the Use of a Corpus in Research and Development
dc.rights.uri https://repository.clarin.is/licenses/license_for_corpus_in_research_and_developement.pdf
dc.rights.label PUB
dc.source.uri https://gameqa.app/
dc.subject Questions
dc.subject Answers
dc.subject Question-Answering dataset
dc.title Reykjavik University Question-Answering Dataset 2 (RUQuAD-2) - version 22.02
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding Clarin IS Repository
contact.person Hrafn Loftsson hrafn@ru.is Reykjavik University
size.info 2273 items
files.size 864501
files.count 1


 Files in this item

This item is Publicly Available and licensed under: License for the Use of a Corpus in Research and Development
Name RUQuAD2.zip
Size 844.24 KB
Format application/zip
Description Unknown
MD5 561c4b51ffce5e30f410aeb1b5a611e4

 File Preview
  • __MACOSX
    • RUQuAD2
      • ._README.txt
      • ._.DS_Store
  • RUQuAD2
    • test.json
    • README.txt
    • .DS_Store
    • train.json
    • full_dataset.json
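
The record lists an MD5 checksum and a byte size (files.size) for RUQuAD2.zip, and the preview shows macOS metadata entries next to the data files. The following is a minimal sketch of how a downloaded copy might be checked and its JSON members located. It assumes the archive has been saved locally as RUQuAD2.zip. The function names are hypothetical, and the JSON schema is not described in this record, so the sketch only parses the files without interpreting their fields.

```python
import hashlib
import json
import zipfile
from pathlib import Path

# Values copied from this record (MD5 field and files.size, in bytes).
EXPECTED_MD5 = "561c4b51ffce5e30f410aeb1b5a611e4"
EXPECTED_SIZE = 864501


def verify_archive(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file at `path` matches the expected size and MD5."""
    data = Path(path).read_bytes()
    return len(data) == expected_size and hashlib.md5(data).hexdigest() == expected_md5


def data_members(path):
    """List the .json members of the archive, skipping macOS metadata entries."""
    with zipfile.ZipFile(path) as zf:
        return [
            name
            for name in zf.namelist()
            if name.endswith(".json")
            and not name.startswith("__MACOSX/")
            and not Path(name).name.startswith(".")
        ]


def load_member(path, member):
    """Parse one JSON member; its structure is not documented in this record."""
    with zipfile.ZipFile(path) as zf:
        with zf.open(member) as fh:
            return json.load(fh)


if __name__ == "__main__":
    archive = "RUQuAD2.zip"  # assumed local path of the downloaded file
    if Path(archive).exists():
        print("checksum and size match:", verify_archive(archive))
        for member in data_members(archive):
            # Only the top-level JSON type is reported, since the schema is unknown here.
            print(member, type(load_member(archive, member)).__name__)
```

The README.txt inside the archive is the place to look for the actual field layout of train.json, test.json and full_dataset.json.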
