Motivation - SIGUL

Porting an NLP system (for instance a speech recognition system or a syntactic parser) to a lesser-resourced language requires techniques that go far beyond basic re-training of the models.

Indeed, processing a new language often raises new challenges (unusual phonetic and phonological systems, word segmentation problems, fuzzy grammatical structure, unwritten languages, etc.).

The lack of resources, for its part, calls for innovative data collection methodologies (for instance, via community sourcing), models in which information is shared between languages (e.g. multilingual acoustic models), or even approaches that do not need annotated data (e.g. zero-resource or zero-shot methods).

In addition, some social and cultural aspects of the context of the targeted language bring further problems: languages with many dialects spread across different regions, code-switching phenomena, and a large proportion of non-native speakers.

It is also important to bridge the gap between language experts, native speakers and technology experts.

Finally, Digital Humanities offer new opportunities to work on ancient languages, which are inherently under-resourced.

Therefore, the main goal of SIGUL will be to increase interaction among researchers interested in all the above topics.