Title DiscAn: Towards a Discourse Annotation system for Dutch language corpora
Project coordinator Prof. dr. T.J.M. Sanders
Clarin Centre  
Budget € 80K
Abstract Although discourse is a crucial level in language and communication, existing corpora of the Dutch language lack annotation at this level. The DiscAn project takes the first step towards changing this situation for Dutch, in line with international tendencies. It has five main goals: 1) to standardize and open up an existing set of Dutch corpus analyses of coherence relations and discourse connectives; 2) to develop the foundations for a discourse annotation system that can be used in Dutch natural language corpora; 3) to improve the metadata within CLARIN by investigating existing CMDI profiles or adding a new CMDI profile specially suited for this type of analysis; 4) to inventory the required discourse categories and investigate to what extent these could be included in the ISOcat categories for discourse that are currently being developed; 5) to further develop an interdisciplinary discourse community of linguists, corpus linguists and computational linguists in The Netherlands and Belgium, in order to initiate further research on cross-linguistic comparison in a European context.
Title FESLI: Functional Elements in Specific Language Impairment
Project coordinator Prof. dr. F. Weerman
Clarin Centre Meertens Instituut
Budget € 79K
Abstract Specific Language Impairment (SLI) is a developmental language disorder that is particularly visible in the acquisition of functional elements. To understand it better, the profile of functional elements has to be compared across several groups of learners, both qualitatively and quantitatively. To this end, the FESLI project will make available Dutch corpora from monolingual and bilingual children with SLI, and corpora from typically developing bilingual children. Complemented with the existing CHILDES corpora from typically developing monolingual children, these will provide evidence from all relevant learner groups. Based on the technology from the COAVA project, FESLI will develop additional tools that allow quantitative and qualitative comparisons of functional elements.
Title GrNe: Grieks-Nederlands woordenboek (Greek-Dutch dictionary)
Project coordinator Prof. dr. I. Sluiter
Clarin Centre INL
Budget € 55K
Abstract This project combines a resource curation project and a demonstrator project. It concerns the new (ancient) Greek-Dutch dictionary, currently under construction at the Leiden University Classics Department.
[Resource curation:] For this resource (originally developed as a book), a DTD will be developed and a substantial part of the data will be converted to XML. The purpose is to make the data visible, uniquely referable and accessible via the web. Proper documentation will be provided. Since the dictionary is still under construction, it cannot be put online completely within the 12 months of the project.
[Demonstrator:] However, in the second half of the project, we will develop an interface with the search functions ultimately envisaged for the whole online dictionary. These will include searches for Greek lemmata; searches for declined or conjugated Greek word forms that lead to the correct lemma (‘parser’); searches for Dutch words leading to different Greek lemmata; and etymological searches.
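The search functions listed above can be illustrated with a minimal sketch. It is not the project's implementation: the entries, the field names (`lemma`, `forms`, `dutch`) and the helper functions are hypothetical, and the real dictionary data and its DTD are not shown here. Strings are normalized to Unicode NFC so that differently encoded accents in polytonic Greek still match.

```python
import unicodedata
from collections import defaultdict


def nfc(text):
    """Normalize to NFC so visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy entries, for illustration only.
ENTRIES = [
    {"lemma": "λόγος", "forms": ["λόγος", "λόγου", "λόγον"], "dutch": ["woord", "rede"]},
    {"lemma": "ἔπος", "forms": ["ἔπος", "ἔπους"], "dutch": ["woord", "vers"]},
]


def build_indexes(entries):
    """Build a word-form -> lemma index and a Dutch word -> Greek lemmata index."""
    form_to_lemma = {}
    dutch_to_lemmata = defaultdict(list)
    for entry in entries:
        lemma = nfc(entry["lemma"])
        for form in entry["forms"]:
            form_to_lemma[nfc(form)] = lemma
        for word in entry["dutch"]:
            dutch_to_lemmata[word.lower()].append(lemma)
    return form_to_lemma, dutch_to_lemmata


FORM_INDEX, DUTCH_INDEX = build_indexes(ENTRIES)


def lemma_for_form(form):
    """Return the lemma for a declined or conjugated form, or None if unknown."""
    return FORM_INDEX.get(nfc(form))


def lemmata_for_dutch(word):
    """Return all Greek lemmata glossed with the given Dutch word."""
    return list(DUTCH_INDEX.get(word.lower(), []))
```

In this sketch the ‘parser’ is only a lookup table of listed forms; a real morphological analyser would have to handle forms that are not enumerated in the entry.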
Title Language Archive of Insular South East Asia and West New Guinea
Project coordinator Dr. Marian Klamer
Clarin Centre MPI
Budget € 74K
Abstract The geographical region of insular South East Asia and New Guinea is well-known as an area of mega-biodiversity. Less well-known is the extreme linguistic diversity in this area: over a quarter of the world’s 6000 languages are spoken here. As small minority languages, most of these will cease to be spoken within the next few generations. This project will ensure the preservation of unique records of languages and the cultures encapsulated by them in the region. The language resources have been gathered by twenty linguists at, or in collaboration with, Dutch universities over the last 40 years, and will be compiled and archived in collaboration with The Language Archive (TLA) in Nijmegen. The resulting archive will constitute an unrivaled collection of multimedia materials and written documents from over 50 languages in Insular South East Asia and West New Guinea. At TLA, the data will be archived according to state-of-the-art standards (TLA holds the Data Seal of Approval): the component metadata infrastructure CMDI will be used, and all metadata categories as well as relevant units of annotation will be linked to the ISO data category registry ISOcat. This guarantees the proper integration of the language resources into the CLARIN framework. Through the archive, future speaker communities and researchers will be able to plumb the materials for answers to their own questions, even if they do not themselves know the language, and even if the language dies.
Title PILNAR: Pilgrimage Narratives: creating a corpus for studying the profile of the modern pilgrim
Project coordinator Prof. dr. P. Post
Clarin Centre Meertens Instituut
Budget € 79K
Abstract Pilgrimage narratives, especially travel accounts, have long been a favorite source for research into ritual and religious dynamics. This project is intended to establish a core corpus of modern pilgrimage narratives. It will consist of Dutch texts written after ca. 2000 that present the thoughts and impressions of pilgrims to Santiago de Compostela. This source has hardly been used, if at all, for contemporary research in the cultural sciences.
In previous exploratory research (Post 1992, 1994ab, 1998), it has become clear that the corpus of stories intended here is an excellent source for research into the profile (or, better, profiles) of the modern pilgrim. To open up this source and map these profiles, the heuristic tool “Fields of the Sacred” will be used (Post 2010, 2011).

Title BILAND: Towards a flexible and stable CLARIN-supported web-application for bilingual historical analysis of discourses in news media
Project coordinator Prof. dr. T. Pieters
Clarin Centre Huygens ING
Budget € 120K
Abstract This project aims to use and populate the basic CLARIN infrastructure to enable advanced bilingual and biscriptural forms of discourse analysis in large historical datasets. The challenge is to convert a specific text mining technology into an accessible, CLARIN-compliant web application that addresses the research questions of the intended user group of historians. The demonstrator will build further on the text mining tools developed in previous CLARIN projects. The interdisciplinary project team will tailor existing tools to the language-specific needs of comparative historical research, with a special focus on the identity, intensity and location of discourses about heredity, genetics and eugenics in Dutch and German news media between 1863 and 1940. The challenge is to incorporate the semantics of two different languages and cultures. The development of this bilingual demonstrator will help to draw up an inventory of the requirements the CLARIN infrastructure should meet, and of the desiderata it should preferably offer, within European contexts.
Title Cornetto-LMF-RDF: Curated Cornetto database in LMF and RDF
Project coordinator Prof. dr. P. Vossen
Clarin Centre INL
Budget € 117K
Abstract This is a combined curation and demonstrator project in which the Dutch Cornetto database is converted to LMF and RDF and made available at a CLARIN Centre for efficient querying. Because Cornetto is a semantic resource in which words and concepts are interlinked within the data and with other databases (e.g. wordnets in other languages and ontologies), the project will address many issues concerning the representation of meaning and user queries over these data, such as the complex data structure (semantic and structural) and semantic linkage, for example hypernym chains of concepts or semantic typing of words. The project will combine a new release of Cornetto (version 2) with the data from DutchSemCor (a semantic annotation of text corpora) and a Dutch sentiment lexicon. The results are presented in LMF, and the wordnet part also in RDF and SKOS. This bridges the standardization and metadata requirements of ISO and W3C.
Title D-LUCEA: Database of the Longitudinal Utrecht Collection of English Accents
Project coordinator Dr. H. Quené
Clarin Centre MPI
Budget € 67K
Abstract The proposed D-LUCEA project concerns the curation of a database of existing speech recordings of L1 and L2 speakers of English. The recorded speakers are students from an international student community where English is used as a lingua franca. These students are being recorded longitudinally throughout their 3-year period on campus, using read and spontaneous speech in L1 and in L2 English (or in L1 English only). The proposed project aims to make the presently existing recordings compatible with CLARIN standards, and to make the resulting database accessible to researchers worldwide. This is achieved by creating, developing and verifying metadata, creating documentation, and issuing persistent identifiers to primary data and metadata. The resulting database will be of interest for research and development in linguistics, language education (pronunciation training), speech technology (foreign accent detection, language recognition, speech recognition), and sociophonetics.
Title EMIT-X: Early-Modern Image and Text eXchange
Project coordinator Prof. dr. E. Stronks
Clarin Centre Huygens ING
Budget € 20K
Abstract In the Early-Modern period images and texts were thought of as closely related. In books, paintings and buildings they were often used jointly in an effort to speak both to the mind and to the senses. A prime example of this bimedial culture is the emblem book. Research into the intertwining of visual and textual traditions in the emblem has profited immensely from digitization efforts, for digital editions facilitate complex searches in large corpora of emblems. The Emblem Project Utrecht (EPU) digitized Dutch emblem books, and other corpora were published by groups elsewhere in Europe and the US. In a community effort coordinated in the Open Emblem Group, an exchange format has been designed to facilitate the aggregation of material from individual projects. EMIT-X will implement this exchange format in an OAI data provider, facilitating sharing and re-use of the EPU data and preparing the way for related projects elsewhere.
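As an illustration of how an aggregator might address an OAI data provider, the following minimal sketch builds an OAI-PMH `ListRecords` request URL. The base URL is a placeholder and the function name is hypothetical; the metadata prefix under which the emblem exchange format would be offered is not given here, so the usage example falls back on `oai_dc`, the Dublin Core format that every OAI-PMH repository has to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a placeholder for the data provider's endpoint;
    set_spec optionally restricts harvesting to one OAI set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not the actual EPU data provider.
example_url = list_records_url("https://example.org/oai", "oai_dc")
```

A harvester would fetch this URL and follow any resumption tokens in the response to retrieve the full record list.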
Title MIGMAP: Interactive migration maps for the 20th century
Project coordinator Dr. G. Bloothooft
Clarin Centre Meertens Instituut
Budget € 54K
Abstract People migrate and take their social-cultural-linguistic identities with them. Since this leads to interactions in their new environment, knowledge of migration is of great interest for the understanding of, for instance, sociolinguistic and dialect diffusion processes. Based on the availability of places of birth and residence (in 2006) of the Dutch population (16 million alive, 6 million deceased but included) and their family relations from the Civil Registration, migration patterns between municipalities (and immigration from abroad) can be presented over three generations in the 20th century. The project will develop a web application in which the user first chooses generation (forward or backward in time) and gender, after which the migration map of The Netherlands related to an interactively selected municipality (or other aggregation unit) is shown. The existing map-making software module "Kaart" of the Meertens Institute will be transformed into a generic, standards-based tool for the creation and presentation of maps with complex spatio-temporal diffusion data in a user-friendly and interactive way.
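The kind of aggregation behind such a map can be sketched as follows. This is an illustration only, not the project's software: the records, the field names (`birth`, `residence`) and both functions are hypothetical, and the sketch ignores generations, gender and family relations.

```python
from collections import Counter

# Hypothetical toy records: municipality of birth and of residence.
PEOPLE = [
    {"birth": "Utrecht", "residence": "Amsterdam"},
    {"birth": "Utrecht", "residence": "Amsterdam"},
    {"birth": "Groningen", "residence": "Amsterdam"},
    {"birth": "Amsterdam", "residence": "Amsterdam"},
    {"birth": "Amsterdam", "residence": "Utrecht"},
]


def migration_flows(people):
    """Count moves per (birth municipality, residence municipality) pair.

    People who still live in their municipality of birth are left out.
    """
    return Counter(
        (p["birth"], p["residence"])
        for p in people
        if p["birth"] != p["residence"]
    )


def inflow(flows, municipality):
    """Return {origin: count} for everyone who moved into the municipality."""
    return {
        origin: count
        for (origin, destination), count in flows.items()
        if destination == municipality
    }


FLOWS = migration_flows(PEOPLE)
AMSTERDAM_INFLOW = inflow(FLOWS, "Amsterdam")
```

Selecting a municipality on the map would then correspond to calling `inflow` (or an analogous outflow function) for that municipality and colouring the other municipalities by count.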
Title Multicon: Multilayer Concordance Functions in ELAN and ANNEX
Project coordinator Dr. O.A. Crasborn
Clarin Centre MPI
Budget € 55K
Abstract Collocations generated by concordancers are a standard instrument in the exploitation of text corpora for the analysis of language use. Multimodal corpora show similar types of patterns, activities that frequently occur together, but there is no CLARIN-compliant tool that offers facilities for visualising such patterns. Examples include timing of eye contact with respect to speech, and the alignment of activities of the two hands in signed languages. This project proposes to enhance the standard CLARIN tools ELAN and ANNEX for multimodal annotation to address these needs, first of all by improving the query and concordancing functions, and secondly by generating visualisations of multilayer collocations that allow for intuitive explorations and analyses of such data. This will provide a boost to the linguistic fields of gesture and sign language studies, as it will improve the exploitation of multimodal corpora.
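What a multilayer collocation amounts to can be shown with a minimal sketch that counts which labels on two time-aligned annotation layers overlap in time. It does not use the ELAN or ANNEX software or file formats; the tuples `(start_ms, end_ms, label)`, the tier contents and the function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical annotations as (start_ms, end_ms, label) on two layers.
GAZE_TIER = [(0, 500, "gaze:addressee"), (500, 900, "gaze:away")]
SPEECH_TIER = [(100, 400, "speech"), (450, 600, "speech"), (950, 1000, "speech")]


def overlaps(a, b):
    """True if two half-open intervals [start, end) share any time."""
    return a[0] < b[1] and b[0] < a[1]


def cooccurrences(tier_a, tier_b):
    """Count label pairs whose annotations overlap across the two tiers."""
    counts = Counter()
    for a in tier_a:
        for b in tier_b:
            if overlaps(a, b):
                counts[(a[2], b[2])] += 1
    return counts


COUNTS = cooccurrences(GAZE_TIER, SPEECH_TIER)
```

The pairwise comparison is quadratic in the number of annotations; for large corpora, sorting both tiers by start time and sweeping through them once would be the more usual choice.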
Title Namescape: Mapping the Landscape of Names in Modern Dutch Literature
Project coordinator Prof. dr. K. van Dalen-Oskam
Clarin Centre Huygens ING
Budget € 120K
Abstract Recent research has conclusively proven that names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names, either in the same text or in related material (the onymic landscape or "namescape"). Research on large corpora is needed to gain a better understanding of, for example, what is characteristic of a certain period, genre, author or cultural region. The data necessary for research on this scale simply do not exist yet. The proposed project aims to fill this need by annotating a substantial number of literary works with a rich tag set, thereby enabling the participating parties to perform their research in more depth than previously possible. Several exploratory visualization tools will help the scholar to answer old questions and uncover many new ones, which can be addressed using the demonstrator. The main tools will be made available as CLARIN-compliant web services for use in other contexts.
Title PoliMedia: Interlinking multimedia for the analysis of media coverage of political debates
Project coordinator Prof. dr. H. Beunders
Clarin Centre Beeld en Geluid
Budget € 111K
Abstract Analysing media coverage across several types of media outlets is a challenging task for (media) historians. Up until now, the focus has been on newspaper articles: being generally available in digital, computer-readable format, these can be studied relatively easily. Cross-media comparisons between different types of media outlets have, however, rarely been undertaken, even though such comparisons have top priority on the wish list of (media) historians, as they could give better insight into the choices that different media outlets make. A specific example of media coverage research investigates the coverage of political debates and how the representation of topics and people changes over time. The PoliMedia project aims to showcase the potential of cross-media analysis for research in the humanities, by (i) curating automatically detected semantic links between four data sets of different media types, and (ii) developing a demonstrator application that allows researchers to deploy such an interlinked collection for quantitative and qualitative analysis of media coverage of debates in the Dutch parliament.
Title VK: Verrijkt Koninkrijk (Enriched Kingdom)
Project coordinator Prof. dr. K. Ribbens
Clarin Centre DANS
Budget € 111K
Abstract Dr Loe de Jong's Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog remains the most appealing history of German-occupied Dutch society (1940-1945). Published between 1969 and 1991, the 30 volumes still combine the qualities of an authoritative work for a general audience and an inevitable point of reference for scholars. The aim of this project is twofold. In the demonstrator part of the project, advanced tools and techniques are applied to gather data on De Jong's perception of the much-debated issue of pillarization (Dutch: 'verzuiling') and group identity. In the resource curation part of the project, the corpus will be enriched and made available to the CLARIN community for further research. The overall budget for the project is € 119,993 and the partners are: NIOD Institute for War, Holocaust and Genocide Studies (NIOD), University of Amsterdam (UvA), Vrije Universiteit Amsterdam (VUA), Meertens Institute and Data Archiving and Networked Services (DANS-KNAW).