LINDAT/CLARIN repository About and Policies



Mission Statement


The ultimate objective of CLARIN ERIC (which LINDAT/CLARIN is part of) is to advance research in humanities and social sciences by giving researchers unified single sign-on access to a platform which integrates language-based resources and advanced tools at a European level. This shall be implemented by the construction and operation of a shared distributed infrastructure that aims at making language resources, technology and expertise available to the humanities and social sciences (henceforth abbreviated HSS) research communities at large. See more information about LINDAT/CLARIN.

To learn more about CLARIN ERIC, see CLARIN-ShortGuide.pdf


Terms of Service

To fulfil our mission, we set out some ground rules in the Terms of Service. By accessing or using any data or services provided by the Repository, you agree to abide by the Terms contained in that document.

Data in the LINDAT/CLARIN repository are made available under the licence attached to each resource. Where there is no licence, data are made freely available for access, printing and download for the purposes of non-commercial research or private study. In any publication, users must acknowledge the Deposited Work using its persistent identifier (see Citing Data), its original author(s)/creator(s), and any publisher where applicable. Full items must not be harvested by robots except transiently for full-text indexing or citation analysis. Unless the attached licence explicitly permits it, full items must not be sold commercially without formal permission of the copyright holders.


About Repository

The repository works like a library for linguistic data and tools.

  • Search for data and tools and easily download them.
  • Deposit your data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).

About UFAL

The Institute of Formal and Applied Linguistics (UFAL) at the Computer Science School, Faculty of Mathematics and Physics, Charles University, Czech Republic, was established in 1990. It continues the research and teaching activities carried out since the early 1960s by the former Laboratory of Algebraic Linguistics, first at the Faculty of Philosophy and later at the Faculty of Mathematics and Physics, Charles University in Prague. UFAL is primarily a research department working on many topics in Computational Linguistics and on many national and international research projects. It is also a regular department in the sense that it runs a comprehensive teaching programme for both the Master's degree (Mgr., or MSc.) and the doctorate (Ph.D.) in Computational Linguistics. Both programmes are taught in Czech and English. The Institute is also a member of the double-degree "Master's LCT programme" of the EU. Students can also take advantage of the Erasmus programme, typically for semester-long stays at partner universities abroad.


License Agreement and Contracts

At the moment, UFAL distinguishes three types of contracts.

  • For every deposit, we enter into a standard contract with the submitter, the so-called "Distribution License Agreement", in which we describe our rights and duties. In it, the submitter acknowledges that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf.
  • Everyone who downloads data is bound by the licence assigned to the item. To download protected data, a user has to be authenticated and has to sign the licence electronically. A list of available licences in our repository can be found here.
  • Submitters can assign custom licences to items during the submission workflow.

Intellectual Property Rights

As mentioned in the section License Agreement and Contracts, we require the depositor of data or tools to sign a Distribution License Agreement, which specifies that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf. This means that depositors are solely responsible for resolving IPR issues before publishing data or tools by submitting them to us.
Anyone who suspects that a dataset or tool in our repository violates Intellectual Property Rights should contact us immediately at our help desk.


Privacy Policy

Read our Privacy Policy in order to learn how we manage personal data collected by the LINDAT/CLARIN repository and services.


Metadata Policy

Deposited content must be accompanied by sufficient metadata describing its content, provenance and formats in order to support its preservation and dissemination. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we reserve the right to be informed about commercial usage of metadata from the LINDAT/CLARIN repository; please send a description of your use case to the Help Desk.


Preservation Policy

LINDAT/CLARIN is committed to the long-term care of items deposited in the repository, to preserving the research and to helping keep it replicable, and it strives to adopt current best practice in digital preservation. See the Mission Statement. We follow best practice guidelines, standards and regulations set forth by CLARIN, OAIS and/or Charles University.

In order to remain a reliable and trustworthy repository, we undergo periodic assessments by CLARIN and CTS/DSA.

To fulfil these commitments, the repository ensures that datasets are ingested and distributed in accordance with their licence (see agreements and contracts). For licences that do not permit public access, this means that only authorized users can access the dataset.

The submission workflow (described in deposit) and the work of our editors ensure discoverability by requiring accurate metadata, which are available via our search engine, externally through OAI-PMH, and in page metadata for certain web crawlers. Metadata are freely accessible.
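
As an illustration of external access through OAI-PMH, the following minimal sketch builds a standard ListRecords harvesting request. The verb and the metadataPrefix value oai_dc come from the OAI-PMH protocol itself; the function name and the base URL are hypothetical placeholders and are not taken from the repository's documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is an assumption: substitute the OAI-PMH endpoint
    published by the repository you want to harvest.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
print(list_records_url("https://repository.example.org/oai", from_date="2020-01-01"))
```

The resulting URL can be fetched with any HTTP client; the response is an XML document containing the metadata records.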

Various automated procedures, including fixity checks, ensure the integrity of the submitted datasets and the completeness of metadata. At the system level, we employ various on-site and off-site backup strategies and hardware monitoring. The datasets are accessible online.
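
A fixity check, in general terms, recomputes a checksum for each stored file and compares it with the value recorded at ingest. The sketch below shows the idea under stated assumptions: the choice of SHA-256, the manifest layout (relative path mapped to a recorded digest) and the function names are illustrative and do not describe the repository's actual implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict, root: Path) -> list:
    """Return the relative paths that are missing or whose digest changed.

    manifest maps a relative file path to the digest recorded at ingest
    (a hypothetical layout chosen for this example).
    """
    failures = []
    for relative_path, recorded_digest in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != recorded_digest:
            failures.append(relative_path)
    return failures
```

An empty result means every file still matches its recorded checksum; any returned path would need to be restored from a backup copy.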

We view data and tools as primary research outputs. Each submission receives a Persistent IDentifier for reference, and users are guided to use it. Changes to a dataset after it has been published are not permitted; a new submission is required instead. The old and new submissions are linked through their metadata (see faq for more details).

Through regular participation in CLARIN activities, Open Repositories and various other meetings, schools and conferences, the repository staff is informed of new developments in technologies and/or initiatives.

The various export options offered by the repository system (DSpace) ensure that data and their metadata are not locked in and can be moved to a different repository system.

The repository encourages the usage of specific file formats as recommended by CLARIN. The preferred file formats will change over time, in which case the repository will make every effort to migrate to other formats, while keeping originals intact for reproducibility purposes (i.e. a migrated item will be a new repository record linked to the old one). The guiding principles for format selection are:

  • open standards are preferred over proprietary standards;
  • formats should be well-documented, verifiable and proven;
  • text-based formats are preferred over binary formats where possible;
  • when an analogue signal is digitized, lossless or no compression is recommended.

In the case of a withdrawal of funding, the repository's content would be transferred to another CLARIN centre. While the legal aspects of relocating data to another institution are under way, the hosting institute (UFAL) offers a timeframe of at least 10 years in which it will provide access to the data.