Files in this item
Download all files in item (305.47 KB)
This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
- Name: data-05.jsonl
  Size: 61.76 KB
  Format: Unknown
  Description: Split 5
  MD5: 12937f1f7eddeee1c0dc60acdf5e49df
- Name: data-01.jsonl
  Size: 58.36 KB
  Format: Unknown
  Description: Split 1
  MD5: 31bfcc3e5497341e437b26e2c7ffc205
- Name: data-02.jsonl
  Size: 60.58 KB
  Format: Unknown
  Description: Split 2
  MD5: 0e33703c2895ad8bd2dd3375b0c6ab7c
- Name: data-03.jsonl
  Size: 61.4 KB
  Format: Unknown
  Description: Split 3
  MD5: 9435ec65648e27d6ac48e4496912f12b
- Name: create_cross_validation_splits.py
  Size: 1.19 KB
  Format: Unknown
  Description: Train validation generation script
  MD5: 57c58221f4c0c5dabf164e3151f11b26
- Name: data-04.jsonl
  Size: 60.65 KB
  Format: Unknown
  Description: Split 4
  MD5: e162e5e96e9e01172055dffbf30a37fa
- Name: README.txt
  Size: 1.53 KB
  Format: Text file
  Description: Readme
  MD5: 9d00689c7d56430f48a06cae4cbb744a
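The MD5 values in the file listing above can be used to check that a download is intact. The following is a minimal sketch, not part of the distributed files: the helper names `md5_of_file` and `find_mismatches` are hypothetical, and the code assumes all seven files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "data-01.jsonl": "31bfcc3e5497341e437b26e2c7ffc205",
    "data-02.jsonl": "0e33703c2895ad8bd2dd3375b0c6ab7c",
    "data-03.jsonl": "9435ec65648e27d6ac48e4496912f12b",
    "data-04.jsonl": "e162e5e96e9e01172055dffbf30a37fa",
    "data-05.jsonl": "12937f1f7eddeee1c0dc60acdf5e49df",
    "create_cross_validation_splits.py": "57c58221f4c0c5dabf164e3151f11b26",
    "README.txt": "9d00689c7d56430f48a06cae4cbb744a",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected=EXPECTED_MD5):
    """Return the names of files that are missing or whose MD5 differs."""
    directory = Path(directory)
    mismatches = []
    for name, checksum in expected.items():
        path = directory / name
        if not path.is_file() or md5_of_file(path) != checksum:
            mismatches.append(name)
    return mismatches
```

Calling `find_mismatches(".")` in the download directory should return an empty list if every file matches its listed checksum.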
The Icelandic WinoGrande dataset v. 1.0

The WinoGrande dataset (Sakaguchi et al., 2020), used for evaluating the common-sense capabilities of neural language models, is inspired by the original Winograd dataset (Levesque et al., 2012), but its problems are designed to minimize the biases that models may rely on when solving them.

We systematically went through the WinoGrande test set (1767 examples), translating and adapting the sentences to work in Icelandic. While the English WinoGrande problems are not always constructed as pairs, our adaptation creates sentence pairs wherever feasible. We also found some examples to be culture-specific, subjective, or otherwise unsuitable for translation; those examples were either adjusted or skipped. The result is a dataset of 1095 examples. In size, the Icelandic dataset is closest to the small variant of the English dataset (640 examples).

Included in the dataset is a five-fold split and a Python script that should be used . . .
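As an illustration of how a five-fold split is typically consumed, the sketch below builds train/validation pairs by holding out one split file at a time. It is not the distributed `create_cross_validation_splits.py`, whose contents are not shown here. The function names `load_jsonl` and `cross_validation_folds` are hypothetical, and the code assumes only that each non-empty line of a `.jsonl` file is one JSON object; it makes no assumption about field names.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def cross_validation_folds(split_paths):
    """Yield (held_out_path, train_examples, validation_examples) per split.

    Each split serves once as the validation set, while the remaining
    splits are concatenated into the training set.
    """
    splits = [load_jsonl(path) for path in split_paths]
    for i, validation in enumerate(splits):
        train = [
            example
            for j, split in enumerate(splits)
            if j != i
            for example in split
        ]
        yield split_paths[i], train, validation
```

With the five files in the current directory, iterating over `cross_validation_folds([f"data-0{i}.jsonl" for i in range(1, 6)])` would produce five folds, each holding out a different split for validation.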