Niklas Laxström

From devsummit
Tags Machine Translation, Translation
Primary Session Next Steps for Languages and Cross Project Collaboration
Secondary Sessions

Translation as a way to grow and connect our communities

The Wikimedia movement depends a lot on translation, but I believe we are not currently using the full potential of it. This affects us in many ways -€“ most importantly: - language barriers isolate communities -€“ but we all need to work together, - our content is not accessible to every human, - our movement is massively multilingual, but not the forerunner in using translation and other language technology.

We should improve our translation tools and leverage machine translation in a sustainable way. Translation should be a core part of our infrastructure and integrate into our projects seamlessly. It will help our communities to grow, as demonstrated by the Content Translation tool. I suggest three focus areas.

  1. 1 Find partners to build high quality open-source machine translation

Our projects run on free software. Currently, we depend a lot on proprietary data-driven (statistical) machine translation. For translation to be an essential part of our infrastructure, then this is neither sustainable nor acceptable. We already use expert-driven (rule-based) open-source machine translation software, e.g. Apertium, which provides some high quality language pairs. However, the proprietary services cover a lot more language pairs, albeit with lower quality. Building machine translation engines is hard work, therefore we should find partners to pursue both data-driven and expert-driven engines. The impact of this could be big and extend beyond our movement.

  1. 2 Bring translation everywhere

We already have good translation tools, but we need to move beyond user interface and Wikipedia pages. We should integrate translation tools into our discussion systems to support multilingual discussions as well as to understand discussions in foreign languages. This should be combined with summarizing tools.

We have a lot of (structured) content that can be translated but doesn't have a proper tooling for translation, e.g. Wikidata and Commons image description, labels in SVG files. We should adapt and integrate our existing translation tools to support these types of content.

We should also make language selection available to all users, including those not logged-in in our multilingual projects, such as Wikidata, to show the translations.

  1. 3 Improve our translation tools

Our translation tools have serious issues that result in slower translations or not being translated at all.

Our translation memory is not working well. It often fails to suggest good matches. This is apparent when translating the Weekly Tech News. Translators' time is wasted when they need to re-translate (introducing inconsistencies) or searching previous translation manually. Without improvement our translation memory is not suitable for use in Content Translation either.

When translating documentation pages, announcements, etc. using the Translate extension, a significant amount of extra markup is added to the wikitext. Editors find this markup inconvenient and justifiably resist using this tool. This feature should be improved so that it works with Visual Editor and doesn't require additional mark-up in the wikitext.