Expires
2014-04-23
Title
Inclusive-MT
Inclusive Machine Translation for the EU
Inclusive-MT focuses on the role of language in the two-way interactions between technology and society, aiming to close a gap that excludes minority languages and morphologically complex languages. Inclusive-MT achieves this by implementing rule-based machine translation (MT) tools that are appropriate for populations in the EU currently underserved by prevailing statistical MT resources.
Theme/Activity/Topic
ICT
ICT 17-2014: Cracking the language barrier
Organization
UiT The Arctic University of Norway
Department of Language and Linguistics
Country
Norway
Contact
http://www.ideal-ist.eu/ps-no-89152
Laura Janda
laura.janda@uit.no
Description
Proposal Outline:
Language is a crucial factor in human identity and culture, with fundamental implications for economic and civil security. People who speak languages overlooked by mainstream services face injustices in access to information and connectivity. Thus opportunities for social mobility, trade, and the building of mutual trust are lost. Inclusive-MT delivers tools that remove information-age inequities and thereby promotes fundamental democratic values in European society. We focus primarily on the Barents and Baltic regions where the challenges are acute, although other parts of Europe are also in our purview.
Inclusive-MT is a multidisciplinary research project that delivers seamless machine translation services for underserved populations in the European digital market. Linguists, computer scientists, programmers, and SMEs collaborate to study the use of language resources and to provide machine translation coverage that is extendable to any EU language, regardless of number of speakers, grammatical structure, and lexical complexity. Inclusive-MT builds on the ground-breaking successes of Giellatekno (http://giellatekno.uit.no/english.html), the Saami Language Technology Center at the Arctic University of Norway, and its partners at the University of Tartu (Estonia), the University of Helsinki (Finland), and the University of Alacant (Spain). Partner SMEs may include Morphologic (Estonia), Prompsit (Spain), and Kaldera (Norway). The project places special focus on minority languages, such as the Saami languages, and on morphologically complex languages, such as Estonian, Finnish, and Russian, which as a consequence of that complexity have “weak or no” machine translation support (cf. META-NET language white papers).
The “small” languages of the EU are particularly poorly served by current MT systems since the training data they require cannot be feasibly obtained and the grammatical structures of minority languages are often highly complex and radically different from English and other benchmark languages of such systems. Inclusive-MT thus has an ideal testing ground for developing translation systems that overcome the challenges of extreme linguistic differences and small, underserved populations.
Inclusive-MT serves language pairs that are not represented in the Europarl matrix, breaking ground in two dimensions: a) by including Russian, we provide MT for a major neighbor to the EU, and b) by providing MT for morphologically complex languages, we move away from the West European bias of using English as a hub language. Examples of the types of linguistic challenges that Inclusive-MT solves include radical differences in the structure of gender, aspect, case, and verbal agreement. We represent minority and morphologically complex languages on their own terms and provide our tools free of charge, thus leveling the playing field for all users regardless of size, linguistic features, or economic resources.
Inclusive-MT studies language resource behaviors, particularly on mobile devices, and strategically targets usage domains in implementation, such as:
Use of social media in both private and corporate communication and adaptation of MT tools for these environments;
Civil status of Russians living in Estonia and other EU countries;
Visibility and status for the Saami languages in Norway, Sweden, Finland, and Russia.
The solutions provided by Inclusive-MT are portable to any language, regardless of its morphological and syntactic complexity and divergence from lingua franca languages like English. We tackle typical translation obstacles such as compounding, word order, and differences in terms of analytic vs. synthetic packaging of meaning. Rule-based MT does not demand huge parallel corpora as a prerequisite, thus removing a limitation that has kept minority languages shut out of the machine translation market. Thanks to the plasticity of this project and a commitment to provide open-source and open-access products, Inclusive-MT can play a major role in protecting the linguistic heritage and rights of EU citizens and their global neighbors.
Inclusive-MT is a value-added translation system that additionally supports language users, learners, and researchers. Because rule-based machine translation undertakes grammatical and lexical analysis, the resulting analyses can serve as input to electronic dictionaries and learning modules. Lemmatized electronic dictionaries are essential for languages with complex morphology, where the inflected form of a word may differ radically from its dictionary headword; for example, Giellatekno’s online dictionary of North Saami can locate a noun from input of any of its 130 inflected forms. The linguistic analysis of a rule-based system can also feed Intelligent Computer Assisted Language Learning (ICALL) resources to support real human communication across language borders. Multipurposing thus makes rule-based MT the most efficient choice for integration of digital and live communication. Furthermore, this analysis can be used by linguists to extract significant trends from language corpora, thus strengthening language research.
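The idea of a lemmatized dictionary lookup can be illustrated with a minimal sketch in Python. This is a toy illustration only, not Giellatekno's implementation: the hand-written table FORM_TO_LEMMA is an assumption that stands in for the output of a rule-based morphological analyser, the names FORM_TO_LEMMA, ENTRIES, and lookup are hypothetical, and a few Finnish noun forms are used as sample data.

```python
from typing import Optional, Tuple

# Assumption: this hand-written table stands in for a rule-based
# morphological analyser, which would compute the lemma instead of
# listing every inflected form explicitly.
FORM_TO_LEMMA = {
    "käsi": "käsi",     # nominative singular
    "käden": "käsi",    # genitive singular (stem changes to käde-)
    "kättä": "käsi",    # partitive singular
    "kädessä": "käsi",  # inessive singular
    "talo": "talo",     # nominative singular
    "talon": "talo",    # genitive singular
    "talossa": "talo",  # inessive singular
}

# Dictionary entries are stored once, under the headword (lemma).
ENTRIES = {"käsi": "hand", "talo": "house"}


def lookup(form: str) -> Optional[Tuple[str, str]]:
    """Return (headword, gloss) for an inflected form, or None if unknown."""
    lemma = FORM_TO_LEMMA.get(form.lower())
    if lemma is None:
        return None
    return lemma, ENTRIES[lemma]
```

A user who types the inflected form "käden" is taken to the entry for the headword "käsi", even though the two strings share only their first two letters. A real system would replace the table with morphological analysis so that all forms of every word are covered.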
Inclusive-MT is not a proof-of-concept project. It continues the trajectory of success laid out by its partners, who have already developed functional bidirectional MT tools for Norwegian (both Bokmål and Nynorsk) and North Saami and have parallel tools under development for Estonian, Finnish, and Russian. The track record of Giellatekno and its partners guarantees that results will be achieved, yielding robust MT systems that service the languages targeted by this project.
Keywords:
inclusion, user empowerment, communication, rule-based machine translation, identity
PARTNER PROFILE SOUGHT
Required skills and Expertise:
We are open to partnerships with experts in both rule-based and statistical machine translation.
It is also possible to partake in more general projects involving equity of access and human identity in relation to connectivity and communication.
Description of work to be carried out by the partner(s) sought:
Research comparing access and use of machine translation among minority vs. majority populations.
Research on use of language and its relation to identity and culture.
Research on linguistic differences between minority and majority languages in Europe.
Type of partner(s) sought:
Both academic partners and SMEs, provided they are committed to open-source and open-access products.
Looking for a Coordinator for your proposal:
Yes