LOT Winterschool 2015: CLARIN course description
Course title: Linguistic Research using CLARIN
Teacher: Jan Odijk and guest teachers
Lectures take place every day (Jan 12 - Jan 16) from 9:00 to 11:00.
Address: Trans 10, 3512 JK Utrecht
Email address: firstname.lastname@example.org
Course description: CLARIN is a research infrastructure for humanities researchers who work, or want to work, with digital language data. This course introduces the CLARIN infrastructure and some of its components and services that are relevant to linguists.
Why is it important to you?
CLARIN enables you:
- to easily find data, tools and services that you can use in your research;
- to automatically enrich your own corpus with sophisticated linguistic annotations, such as part-of-speech tags and full syntactic structures, and then to search, browse and analyze the enriched corpus using these annotations;
- to store the data and tools resulting from your research, ensuring their long-term preservation and their availability to you and to other researchers.
The course covers specific services and applications in the CLARIN infrastructure that contribute to these goals, with a focus on contributions made by the Dutch CLARIN-NL project. Each session consists of a lecture, and some sessions also include a hands-on part in which you learn to work with CLARIN tools. Most of the tools illustrated operate on Dutch-language data.
The first and last days consist of updated versions of the corresponding lectures in the CLARIN for Linguists course given in summer 2014. The other days introduce data and tools from the CLARIN infrastructure that were not covered in summer 2014.
Note (2014-11-07): the day-to-day programme has changed. The contents of day 2 and day 3 have been swapped, so the MIMORE guest lecture by Sjef Barbiers is now on Wednesday (as in the programme below).
Note (2015-01-07): the day-to-day programme has changed again. The contents of day 2 have moved to day 4, and day 2 is now devoted to Search and Analysis using OpenSONAR.
The day-to-day programme is as follows:
Day 1: Introduction – Jan Odijk, Utrecht University
- Part 1: introduction to CLARIN and its context; searching for data with CLARIN (the Virtual Language Observatory and Metadata Search); overview of the whole course.
- Part 2: I will use a concrete linguistic research question to illustrate how CLARIN, and the data and tools it contains, can be used to improve the empirical basis of linguistic research.
Day 2: Search and Analysis with OpenSONAR – Jan Odijk, Utrecht University
I will illustrate how the OpenSONAR interface to the SONAR corpus of Dutch texts (500 million tokens) enables you to search for examples containing specific words or combinations of words, taking their morpho-syntactic properties into account.
Day 3: MIMORE (Microcomparative Morphosyntax Research Tool) – Sjef Barbiers, Meertens Institute / Utrecht University
The MIMORE tool enables researchers to investigate morphosyntactic variation in the Dutch dialects by searching three related databases with a common on-line search engine. The search results can be visualized on geographic maps and exported for statistical analysis. The three databases involved are DynaSAND (the dynamic syntactic atlas of the Dutch dialects), DiDDD (Diversity in Dutch DP Design) and GTRP (Goeman, Taeldeman, van Reenen Project).
Course Material: MIMORE educational module
Day 4: Enrich your Own Corpora – Jan Odijk, Utrecht University
I will illustrate how you can enrich your own (Dutch) data with various kinds of linguistic annotation: spelling corrections, part-of-speech codes, morpho-syntactic features, full syntactic structure, co-reference relations, and more. I will also show how you can load the output of these tools into search engines so that you can search, browse and analyze your enriched data. In the hands-on part of this lecture you will be able to experiment with the tools yourself.
Day 5: CLARIN-compatibility and Wrap-up – Jan Odijk, Utrecht University
How can you make your data or tools CLARIN-compatible, and why would you want to do so in the first place? How can you store your data and tools in the CLARIN infrastructure? The role of CLARIN centres, and the types of CLARIN centres in the Netherlands. Concluding overview.
- Odijk, J. (2014), "The CLARIN infrastructure in the Netherlands: What is it and how can you use it?", unpublished ms., Utrecht University. [pdf]
- Oostdijk, N., Reynaert, M., Hoste, V. and Schuurman, I. (2013), "The Construction of a 500 Million Word Reference Corpus of Contemporary Written Dutch", in: Spyns, P. and Odijk, J. (eds.), Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme, Springer Verlag. [pdf] (Open Access)
- TTNWW: Handleiding TST tools voor het Nederlands als Web services in een Workflow [Manual: language and speech technology tools for Dutch as web services in a workflow], Meertens Institute, 2013. [pdf]
- Odijk, J. and van Hessen, A. (2011), "Sharing Resources in CLARIN-NL", Proceedings of the Language Resources, Technology and Services in the Sharing Paradigm workshop at IJCNLP 2011, Chiang Mai, Thailand, November 2011, pp. 98-106. [pdf]
- Odijk, J. (2014), "The CLARIN infrastructure in the Netherlands: Design and Construction", unpublished ms., Utrecht University. [pdf]
- Uytvanck, D. van, Stehouwer, H. and Lampen, L. (2012), "Semantic metadata mapping in practice: the Virtual Language Observatory", in: Calzolari, N., Choukri, K., Declerck, T., Dogan, M.U., Maegaard, B., Mariani, J., Odijk, J. and Piperidis, S. (eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey: European Language Resources Association (ELRA), pp. 1029-1034. [pdf]