Files in this item

Total size of all files: 305.47 KB
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Name                               Size      Format     Description                         MD5
data-05.jsonl                      61.76 KB  Unknown    Split 5                             12937f1f7eddeee1c0dc60acdf5e49df
data-01.jsonl                      58.36 KB  Unknown    Split 1                             31bfcc3e5497341e437b26e2c7ffc205
data-02.jsonl                      60.58 KB  Unknown    Split 2                             0e33703c2895ad8bd2dd3375b0c6ab7c
data-03.jsonl                      61.4 KB   Unknown    Split 3                             9435ec65648e27d6ac48e4496912f12b
create_cross_validation_splits.py  1.19 KB   Unknown    Train validation generation script  57c58221f4c0c5dabf164e3151f11b26
data-04.jsonl                      60.65 KB  Unknown    Split 4                             e162e5e96e9e01172055dffbf30a37fa
README.txt                         1.53 KB   Text file  Readme                              9d00689c7d56430f48a06cae4cbb744a
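
The MD5 values in the file listing can be used to check that each download arrived intact. The following is a minimal sketch, not part of the published item: the function names md5_of_file and verify_downloads are hypothetical, and it assumes the files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this item.
EXPECTED_MD5 = {
    "data-01.jsonl": "31bfcc3e5497341e437b26e2c7ffc205",
    "data-02.jsonl": "0e33703c2895ad8bd2dd3375b0c6ab7c",
    "data-03.jsonl": "9435ec65648e27d6ac48e4496912f12b",
    "data-04.jsonl": "e162e5e96e9e01172055dffbf30a37fa",
    "data-05.jsonl": "12937f1f7eddeee1c0dc60acdf5e49df",
    "create_cross_validation_splits.py": "57c58221f4c0c5dabf164e3151f11b26",
    "README.txt": "9d00689c7d56430f48a06cae4cbb744a",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Calling verify_downloads(".") from the download directory returns one status per file; anything other than "ok" suggests the file should be downloaded again.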
 File Preview  
The Icelandic WinoGrande dataset v. 1.0

The WinoGrande dataset (Sakaguchi et al., 2020), used for evaluating the common-sense capabilities of neural language models, is inspired by the original Winograd Schema Challenge dataset (Levesque et al., 2012), but its problems are designed to minimize biases that models may rely on when solving them. We systematically went through the WinoGrande test set (1767 examples), translating and adapting the sentences to work in Icelandic. While the English WinoGrande problems are not always constructed as pairs, our adaptation creates sentence pairs wherever feasible. Some examples proved culture-specific, subjective, or otherwise unsuitable for translation; those were either adjusted or skipped. The result is a dataset of 1095 examples, which makes the Icelandic dataset closest in size to the small variant of the English dataset (640 examples). Included in the dataset is a five-fold split and a Python script that should be used . . .
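
One common way to use a five-fold split is to hold out each split in turn for validation and train on the remaining four. The sketch below illustrates that idea under stated assumptions; it is not the bundled create_cross_validation_splits.py, whose contents are not shown in this preview. The helper names read_jsonl and cross_validation_folds are hypothetical, and the code makes no assumption about the fields inside each JSONL record.

```python
import json
from pathlib import Path

# File names as they appear in the item's file listing.
SPLIT_FILES = [f"data-{i:02d}.jsonl" for i in range(1, 6)]


def read_jsonl(path):
    """Read a JSON Lines file into a list of parsed records."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def cross_validation_folds(directory, split_files=SPLIT_FILES):
    """Yield (training, validation) record lists, one pair per split.

    Assumption: each split serves as the validation set exactly once,
    and the other splits are concatenated to form the training set.
    """
    splits = [read_jsonl(Path(directory) / name) for name in split_files]
    for held_out, validation in enumerate(splits):
        training = [
            record
            for index, split in enumerate(splits)
            if index != held_out
            for record in split
        ]
        yield training, validation
```

Iterating over cross_validation_folds("path/to/download") then gives five training/validation pairs, so a model can be evaluated on every example once.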