I read an interesting article today in the San Francisco Chronicle about machine translation. We’ve all used Google Translate (or perhaps another online machine translation tool such as AltaVista) at one time or another, and your mileage can certainly vary. Some similar languages play nicely with one another – Spanish to Italian, for example. However, translate Chinese to English and you start getting some pretty odd-sounding results.
This is mainly down to the nature of language itself: the vast majority of languages are so idiomatic, having undergone slow but steady change over a long period of time, that translating from one to another just isn’t as simple as replacing each word in a sentence with its counterpart in the target language. This is why professional translation agencies still enjoy a roaring trade, despite the ease and convenience of machine translation services.
However, machine translation isn’t as simple as all that. Contextual translation has come on in leaps and bounds over the past decade or so. Translation algorithms now analyze a huge corpus of texts (aided in recent years by the vast amounts of text available on the internet) and work out the patterns in which the target language is actually used. Thus homonyms such as “strike” (a word with 88 different definitions in some dictionaries) can be translated contextually by cross-referencing texts written by humans, so the machine picks the equivalent that fits the surrounding words instead of simply guessing at the correct one in the target language.
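To make the idea concrete, here is a toy sketch in Python of picking a translation for “strike” from its context. It is not how Google Translate or any real system works. The six-sentence corpus, the stopword list, the function names and the choice of two Spanish equivalents (“huelga” for a labor strike, “golpear” for hitting something) are all invented for illustration. The only point is to show how counting the words that appear alongside each human-chosen translation lets a program prefer one equivalent over another.

```python
from collections import Counter, defaultdict

# Toy "parallel corpus" (invented for illustration): English sentences
# containing "strike", each paired with the Spanish word a human
# translator might have chosen for it.
CORPUS = [
    ("the union workers voted to strike over pay", "huelga"),
    ("the factory strike ended after the union agreed on pay", "huelga"),
    ("workers went on strike at the factory", "huelga"),
    ("the boxer tried to strike his opponent with a fist", "golpear"),
    ("do not strike the ball with your hand", "golpear"),
    ("he raised his fist to strike the door", "golpear"),
]

# Small hand-picked list of words to ignore (an assumption, not a standard list).
STOPWORDS = {"the", "to", "a", "on", "at", "with", "his", "your",
             "do", "not", "he", "over", "after", "strike"}


def context_words(sentence):
    """Return the lowercase words of a sentence, minus the stopwords."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]


def build_counts(corpus):
    """Count how often each context word appears with each translation."""
    counts = defaultdict(Counter)
    for sentence, translation in corpus:
        counts[translation].update(context_words(sentence))
    return counts


def choose_translation(sentence, counts):
    """Pick the translation whose known contexts overlap most with the sentence.

    Ties go to whichever translation was seen first in the corpus.
    """
    words = context_words(sentence)
    scores = {t: sum(c[w] for w in words) for t, c in counts.items()}
    return max(scores, key=scores.get)


COUNTS = build_counts(CORPUS)
print(choose_translation("the workers at the factory will strike", COUNTS))
print(choose_translation("he will strike the ball", COUNTS))
```

With this tiny corpus the first sentence leans towards “huelga” because “workers” and “factory” only ever appear alongside that translation, while the second leans towards “golpear” because of “ball”. A real system works from vastly more text and far more sophisticated statistics, but the principle of letting human-written examples settle the ambiguity is the same.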
This has enabled services such as Google Translate to perform well enough to do basic translation of foreign-language websites – you may still find plenty of broken English, but at least you can understand the gist of the text.
While you may not think that machine translation performing at the level of a 10-year-old is particularly impressive, it is certainly leaps and bounds ahead of what we had only a decade ago:
The algorithm’s understanding of language “has moved from a 2-year-old infant to something close to an 8 or 10-year-old child,” said Amit Singhal, a Google Fellow, an honorific reserved for the company’s top engineers. “They’re still not approaching the conversations you’d have as a teenager.”
You can read the whole article here.