Expires
2014-04-23
Title
[EuroLangNet 21+3] European network 21+3 for HLT support of
Multilingual Knowledge Based Processes
Theme/ Activity/ Topic
ICT
ICT 17-2014: Cracking the language barrier
Organization
Slovak University of Technology
Country
Slovakia
Contact
http://www.ideal-ist.eu/ps-sk-89303
stefan.svetsky@stuba.sk
Description
The pilot network is intended to be robust and to create a shared platform for multilingual resources, tools and applications, and for benchmarking activities tailored to the MT&LR area within multilingual knowledge based processes in the fields of materials science and technology, industrial testing, technical standardization, occupational health and safety, environmental protection and general R&D topics. In the context of the MT&LR area, the focus will be on investigating how to transfer and integrate the content of global and EU multilingual datasets and national corpora into the personalized knowledge based processes that European individuals perform daily to remain sustainable and competitive in their jobs. This is to be achieved by monitoring the state of the art, exchanging best practice, assisting with multidisciplinary MT research, and testing and evaluating existing personalized solutions and adapting them to individual users, all in close synergy with the major multilingual European platforms and networks and with the national policies and programs of the Consortium members.
A particular focus will be on developing categories of keyword sets that enable “switching” between all 24 European languages through batch Internet knowledge retrieval and advanced multilingual search in the various categories of knowledge based processes that require automatic or semi-automatic MT&LR support, when running on (i) a personal cloud, (ii) desktop client computers and (iii) a university server. The intention is to address the current lack of interoperability between technical and non-technical fields (individuals naturally work across both), and to support harmonization within multidisciplinary research.
A special focus will be on testing a new paradigm of multilingual support based on a prepared patent application, including the implementation of a personal software tool developed in-house. This will make it possible to integrate basic elements of human language technologies, such as natural language processing, speech technology, machine translation, and information and knowledge processing, into knowledge based processes. This bottom-up integration of human language technology into individuals’ processes, especially in relation to implementing higher-quality automatic MT, should add synergistic value to human-computer interaction and, more generally, to the knowledge economy. It also raises issues such as whether existing European datasets and corpora are suitable for individuals managing multilingual knowledge in real technical practice under global market conditions (e.g. this requires researchers to explore their compatibility with technical standards terminology, patent classification systems, etc.).
PROJECT DESCRIPTION
Proposal Outline:
The title 21+3 means that the network will cover 21 referenced languages + 3 languages with a high level of MT (for the exchange of best practice and experience), i.e. our intention is to have at least 21+3 participants in the project Consortium. The concrete idea is to work in “multi-pairs”, as described in the following text. From the point of view of the Slovak language, all common knowledge based processes are in principle multilingual, given global or European market conditions (see http://www.meta-net.eu/whitepapers/e-book/english.pdf). Moreover, from the ICT point of view, these processes are uncertain and work with unstructured data. According to our findings in research on technology-enhanced learning (including activities for the FP7 Consortiums KEPLER /2007 and L3Pulse /2013), automating such processes requires three autonomous categories to be solved in parallel: (i) modelling the processes so that they are computerizable, (ii) developing tools and applications, here automatic machine translation, knowledge processing in natural language, text-to-speech and speech recognition, and (iii) automating work on desktop client computers and networks (clouds, servers), for instance adaptation to the operating system, data transfer, conversion of text, image and multimedia formats, etc. In addition, one should consider that in real practice all multilingual knowledge based processes consist of sequences of sub-processes / steps, but only some of them require automatic or semi-automatic MT. Thus, a combination of the issues mentioned should be built into the workpackage structure when planning any project.
In view of the above, we consider the following potential workpackage structure as a benchmark background for, or part of, a research or innovation project (e.g. as an autonomous workpackage), with a focus on automatic and semi-automatic translation within multilingual knowledge based processes:
WP1 State-of-the-art in MT&LR / HLT (Human Language Technologies)
• Exploring existing multilingual processes in the global market (categories, performers, application areas)
• Exploring available multilingual applications/services, datasets, national corpora and European databases and their suitability for individuals
• Evaluation and selection of the main processes to be addressed in the workpackages
WP2 Modelling / modifying processes using multilingual knowledge to be suitable for automation
• Multilingual knowledge base design in relation to domain content
• Unification and design of processes to be computerisable
• Evaluation of multilingual resources, knowledge base and processes for subsequent computerisation (predominantly via MT)
WP3 Case studies/Modelling/Testing Informatics Tools and Applications for MT&LR (HLT in general)
• Cloud computing applications practice
• Combined off-line and online applications (testing the batch knowledge processing paradigm)
• Recommended system for suitable large/big datasets
• Design of knowledge sets in multi-formats for self-regulated processes (note: combined text – image – audio – video formats)
• Exploring the suitability of MT, text-to-speech, speech recognition and emerging technologies for personal processes
• Testing existing / developed MT solutions within the automation of processes running on personal computers, clouds and networks
WP4 Automation of Multilingual Knowledge Based Processes in Natural Language (MT&LR)
• Testing computerisation of unified processes and domain content
• Implementing cloud- and server-based multilingual applications
• HLT resources: performing multilingual web monitoring, advanced search and retrieval, including testing Internet services
• Transfer of scientific heritage via multilingual approach into education and training
• Comparison of suitability and quality level of MT, text-to-speech, speech recognition and emerging technologies
WP5 Multilingual Benchmarking Portal
• Design of cooperation portal
• Benchmarking, communication, best practices and information exchange
• Results dissemination
WP6 Project management
Our ICT 17 project idea is based on a 1 + 23 approach, i.e. on automatic MT from Slovak to all other EU languages and back (and similarly, e.g., automatic MT from Cz to 23 languages), with the intention of transferring positive experience from the three major MT languages (En, Fr, Es).
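To make the 1 + 23 idea concrete, the following minimal sketch counts translation directions. It assumes the 24 official EU languages identified by their ISO 639-1 codes; the function name `hub_directions` is hypothetical and serves only as an illustration, not as part of any existing tool.

```python
from itertools import permutations

# Assumption: the 24 official EU languages, as ISO 639-1 codes.
EU_LANGUAGES = [
    "bg", "hr", "cs", "da", "nl", "en", "et", "fi", "fr", "de", "el", "hu",
    "ga", "it", "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "es", "sv",
]


def hub_directions(hub, languages):
    """Return all (source, target) directions between one hub language and the rest."""
    others = [lang for lang in languages if lang != hub]
    outbound = [(hub, lang) for lang in others]   # e.g. sk -> en
    inbound = [(lang, hub) for lang in others]    # e.g. en -> sk
    return outbound + inbound


slovak_directions = hub_directions("sk", EU_LANGUAGES)
all_directions = list(permutations(EU_LANGUAGES, 2))

print(len(slovak_directions))  # 46 directions for a single "1 + 23" hub
print(len(all_directions))     # 552 directions if every pair were covered directly
```

A single hub language therefore needs 46 translation directions, whereas direct coverage of every pair would need 552, which is one way to read the motivation for working per language in “multi-pairs”.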
We are at the beginning of developing a multilingual “Switcher”, which will switch from one European language to the other 23 languages. This will enable us to automate multilingual knowledge based processes, with automatic machine translation as one part of the overall automation. The work will focus on the technical fields mentioned above (materials science and technology, technical standards and industrial testing, environmental protection, occupational safety and health, R&D activities,…). It assumes three categories of selected multilingual resources for automatic MT: (i) European and global technical databases, datasets and repositories (e.g. patent and standards databases, chemical databases, etc.), (ii) official multilingual datasets such as CEF and national corpora, and possibly also so-called “Big Data” in connection with the ICT 15 call results, and (iii) general EU non-technical sources such as CORDIS or europa.eu (e.g. environmental laws are already available in all EU languages). Of course, a wide range of other mono- or bilingual sources is also available.
Note: Example of a concrete action. One example of using the “Switcher” in a knowledge based process: “Proposing an invention and a patent application”.
This process requires sequences of sub-processes to be automated. For example, an individual patent search in the European database Espacenet covers more than 80 million documents; external retrievals from some world patent databases are made in parallel by the National Patent Office or a scientific-technical information center; the first result is commonly hundreds of patent abstracts in English, so a Slovak researcher must read and evaluate them in English and judge whether his invention is “new” (a very high cognitive load); finally he must write the report in English as well, and if the patent application is intended for the EU it must be translated, etc. This is one of the simplest examples of how the “Switcher” could be used. More complicated cases arise when switching from Slovak to 23 EU languages and searching in, e.g., ten significant recommended EU datasets, corpora and Internet databases (the result is 23 x 10 = 230 language-source combinations, each with its own hits). This example shows a very important point: automatic MT for an individual user must be strictly tailor-made. If it is not, the number of hits would be extremely high and unsuitable for common use. Therefore a system of keywords must be developed for each MT application area. Personal support is thus radically different from the case where a research group develops a solution such as a national corpus, and it requires a different ICT approach. The example also demonstrates that in real practice one must investigate whether the existing multilingual European datasets, corpora and databases are suitable for sharing within automatic MT, and explore how these sources and repositories could be used for personalized automatic MT.
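The 23 x 10 = 230 figure in the example can be illustrated with a minimal sketch that enumerates one search per target language and source, and then narrows the plan with a per-domain profile. Everything named here is an assumption for illustration only: the source names are placeholders, `plan_searches` and `tailor` are hypothetical helpers, and the sketch does not translate the keywords or contact any real database.

```python
from itertools import product

# Assumption: 23 target languages (all official EU languages except Slovak).
TARGET_LANGUAGES = [
    "bg", "hr", "cs", "da", "nl", "en", "et", "fi", "fr", "de", "el", "hu",
    "ga", "it", "lv", "lt", "mt", "pl", "pt", "ro", "sl", "es", "sv",
]

# Placeholder names standing in for ten recommended datasets, corpora and databases.
SOURCES = [f"source_{i}" for i in range(1, 11)]


def plan_searches(keywords, languages, sources):
    """Build one search job per (target language, source) combination."""
    return [
        {"language": lang, "source": src, "keywords": tuple(keywords)}
        for lang, src in product(languages, sources)
    ]


def tailor(jobs, languages, sources):
    """Keep only the jobs matching a user's language and source profile."""
    return [
        job for job in jobs
        if job["language"] in languages and job["source"] in sources
    ]


jobs = plan_searches(["corrosion protection"], TARGET_LANGUAGES, SOURCES)
print(len(jobs))  # 230 untailored searches

# Hypothetical profile for one application area: three languages, two sources.
tailored = tailor(jobs, {"en", "de", "fr"}, {"source_1", "source_2"})
print(len(tailored))  # 6 searches after tailoring
```

The untailored plan already contains 230 searches before a single hit is returned, while a profile tied to one application area reduces it to a handful, which is the motivation for a keyword system per MT application area.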
Contact person:
Stefan Svetsky +421 949 541835
Keywords:
Human Language Technologies, automatic translation, technology-enhanced learning, Digital content, learning analytics, speech recognition, text processing, knowledge mining, knowledge processing, Human Computer Interaction
PARTNER PROFILE SOUGHT
Required skills and Expertise:
[Project leader]
Skills and expertise in project management, including scientific, financial, risk, IPR and gender management, and all related issues according to the Horizon 2020 manuals.
Skills and expertise in the field of Human Language Technologies, targeting MT&LR, with the ability to lead and coordinate an ICT 17 CSA or a multidisciplinary research project or innovation project (in which the robust EuroLangNet 1+23 network or our team could assist, e.g. by leading a workpackage in line with the subject and project description).
[Partners for consortium members]
At least one partner from each EU country with skills and expertise for supporting network activities according to the subject and project description.
Description of work to be carried out by the partner(s) sought:
The work to be carried out by the partners is described in the subject and project description and depends on the draft workpackage structure. Each partner should thus find its place or perform similar activities. It also relates to the partners’ previous activities, conditions and references; see, for example, our case:
We have experience mostly with CSAs (FP5 to FP7) on scientific-technical topics, especially with multilingual issues within knowledge based processes performed in the global industrial market, as well as in the educational area (within didactics-driven technology-enhanced learning we had to implement several IT disciplines). We also work in national ISO/CEN standardization committees and in an ICT society, where terminological language issues are resolved (En / De / Fr to Sk), and we have skills in translating ISO/CEN standards into Slovak standards in the engineering field (e.g. corrosion protection).
We are able to work on and lead individual workpackages or topics within a research / innovation project or CSA. However, we do not have the capacity and skills for writing complete proposals and for project management. On the other hand, we need HLT support, e.g. automatic MT within multilingual knowledge based processes performed daily. For this purpose, we have developed an infrastructure, tools and in-house software, and have tested all categories of HLT in these processes. For instance, we cover knowledge processing in natural language, but we found that current speech technologies are not yet suitable for common personal use (e.g. as a Slovak I have very low success with speech recognition of spoken English, and therefore I do not use Nuance software at all).
Concerning MT, I have developed simple thesaurus-based support for writing scientific papers, and we also use Google Translate and Systran (www.systran.fr), although the latter has no Slovak translation. Making a simple word-by-word translation is not a problem; it is only a question of bilingual datasets. But why should we develop it when many local or global solutions exist?
Type of partner(s) sought:
R&D institutes, universities, high-tech companies, international consortiums, etc. that focus on human language technologies, especially on automatic machine translation, with an interest in implementing it into multilingual knowledge based processes (technical, educational, research, innovative, market activities and mental processes of experts and common users in general).
Although this EoI is written for a CSA, we are looking for a leader / coordinator who intends to write one of the following for ICT 17:
a) research project
b) innovation project
c) CSA.
Looking for a Coordinator for your proposal:
Yes