Deliverable D6.1, entitled “Overview of Datasets for the Sign Languages of Europe” and authored by Maria Kopf, Marc Schulder, and Thomas Hanke (University of Hamburg), was submitted to the European Commission for review on July 29, 2021.
The document was developed as part of the work of the Resources Harmonization Working Group. It identifies 26 linguistic corpora that can be explored within the project as high-quality training data for automatic translation, in contrast to loosely aligned broadcast data. For each dataset, the document specifies which parts of the data are available and under what access conditions. It also lists 26 elicitation formats that are used in several corpora, in order to identify those parts of the available corpora that could be explored to build multilingual resources. To support the construction of an interlingual index across European sign languages, the document additionally lists 41 available lexical resources (lexical databases and dictionaries) and their characteristics.