Sunday, April 19, 2015

Google Machine Translation

Aziz writes to inform me that the machine translation from Albanian into English provided by Google is very poor, while translations from other languages (Italian, for example) are more fluent.
 
As for the contrast, it is my understanding that machine translation relies on AI, and that Google Translate has had much less Albanian text to learn from. It will be interesting to see whether the results improve for Albanian and other "less used" languages. People familiar with Google Translate have noticed that as the cursor passes over the text of a translation, Google asks for corrections and alternative phrasings. Evidently, this is one way that Google Translate "learns."

Here is a description of the process from Wikipedia:

Google Translate does not apply grammatical rules, since its algorithms are based on statistical analysis rather than traditional rule-based analysis . . . Google Translate does not translate from one language to another (L1 → L2). Instead, it often translates first to English and then to the target language (L1 → EN → L2). However, because English, like all human languages, is ambiguous and depends on context, this can cause translation errors. For example, translating vous from French to Russian gives vous → you → ты OR Вы/вы. If Google were using an unambiguous, artificial language as the intermediary, it would be vous → you → Вы/вы OR tu → thou → ты. Such a suffixing of words disambiguates their different meanings. Hence, publishing in English, using unambiguous words, providing context, using expressions such as "you all" often make a better one-step translation . . . [A] solid base for developing a usable statistical machine translation system for a new pair of languages from scratch would consist of a bilingual text corpus (or parallel collection) of more than a million words, and two monolingual corpora each of more than a billion words.[33] Statistical models from these data are then used to translate between those languages . . . To acquire this huge amount of linguistic data, Google used United Nations documents. The UN typically publishes documents in all six official UN languages, which has produced a very large 6-language corpus . . . When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation. By detecting patterns in documents that have already been translated by human translators, Google Translate makes intelligent guesses (AI) as to what an appropriate translation should be.
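The vous example in the passage above can be made concrete with a toy sketch. This is only an illustration of why an ambiguous intermediary loses information, not how Google Translate is actually implemented; the tiny dictionaries and the function name pivot_translate are invented for the example.

```python
# Toy word lists, invented for illustration only (not real translation data).
# With plain English as the pivot, French "vous" and "tu" both become "you".
FR_TO_EN = {"vous": ["you"], "tu": ["you"]}
EN_TO_RU = {"you": ["ты", "вы"]}

# A hypothetical unambiguous pivot that keeps the distinction ("you" vs "thou").
FR_TO_PIVOT = {"vous": ["you"], "tu": ["thou"]}
PIVOT_TO_RU = {"you": ["вы"], "thou": ["ты"]}


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Return every target word reachable from `word` through the pivot language."""
    candidates = []
    for pivot_word in src_to_pivot.get(word, []):
        for target_word in pivot_to_tgt.get(pivot_word, []):
            if target_word not in candidates:
                candidates.append(target_word)
    return candidates


# Ambiguous pivot: two candidates come back, so the system has to guess.
print(pivot_translate("vous", FR_TO_EN, EN_TO_RU))

# Unambiguous pivot: exactly one candidate for each source word.
print(pivot_translate("vous", FR_TO_PIVOT, PIVOT_TO_RU))
print(pivot_translate("tu", FR_TO_PIVOT, PIVOT_TO_RU))
```

With the English pivot, both Russian forms come back for vous and the system must pick one from context; with the disambiguated pivot, each French word maps to a single Russian word.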
 
To see the entire article, please click HERE.
