The Dawn of Machine Translation
Machine translation (MT) has come a long way since its inception, from an obscure technology whose shortcomings nearly caused it to be dropped, to its use in thousands of communication applications worldwide. It has enabled the globalization of businesses on a scale and at a speed that could never have been envisaged even 30 years ago. How has this happened, when at first the technology showed so little promise? In this article, we will look in greater detail at the evolution of MT’s development.
The First Machine Translation System
Back in 1949, the mathematician and scientist Warren Weaver published a memorandum titled ‘Translation’. It was this text that set in motion the revolution that was to become machine translation (MT) and influenced many researchers in the years to come. In it, he put forward four ways that computers could be used for translation purposes:
- Using context for homonyms (words with more than one meaning – such as pen or park), in order to determine which definition to use.
- Using a set of premises from which the computer could use logic to come to a conclusion.
- Using cryptography to decipher texts.
- Proposing that all languages have linguistic universals, e.g., they all have an alphabet made up of vowels and consonants and all sentences have a structure which contains such parts as verbs and nouns.
It was this belief in a universality of language that led to what has probably become the most quoted paragraph in the history of MT:
“Think, by analogy, of individuals living in a series of tall closed towers, all erected over a common foundation. When they try to communicate with one another, they shout back and forth, each from his own closed tower. It is difficult to make the sound penetrate even the nearest towers, and communication proceeds very poorly indeed. But, when an individual goes down his tower, he finds himself in a great open basement, common to all the towers. Here he establishes easy and useful communication with the persons who have also descended from their towers.”1
One of those to be influenced by Weaver’s memorandum was Yehoshua Bar-Hillel. Bar-Hillel not only became the first permanent MT researcher at the Massachusetts Institute of Technology (MIT), he also organized the first conference on machine translation there. During the conference, he stated that MT needed to embrace syntactic parsing in order for it to succeed and that “completely automatic and autonomous mechanical translation … is, in general, practically excluded, even with respect to scientific texts.”2 He reasoned that, with the technology available at the time, MT could only be an aid to translation and not much else. Even so, by the end of the conference, the majority of the attendees were optimistic about the future of MT. It was proposed that one of them, Leon Dostert of Georgetown University, should go ahead and build a simple system that could prove to the world that MT was indeed a viable proposition for future funding.
Dostert returned to Georgetown and set up a partnership with IBM, the technological giant. By 1954, they had developed a working machine and demonstrated its effectiveness to the public at IBM’s headquarters in New York. The demonstration was only on a small scale, using a vocabulary of just 250 words to translate more than 60 sentences from Russian into English. However, it was a huge success and prompted considerable interest, not only from the public and media but also from governmental organizations. In the following years, MT research programs began in Russia, Japan, and France, and in the United States, the United States Air Force started to use the ‘Automatic Translator Mark I’.
These first translation machines worked by translating one language into another, word by word, using bilingual dictionaries. This, of course, led to many problems with accuracy, especially when it came to sentence structure and homonyms.
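The word-by-word approach described above amounts to little more than a dictionary lookup per token. A minimal sketch (with an invented, transliterated toy vocabulary) makes the limitation obvious: each word is translated in isolation, so context and sentence structure are ignored entirely.

```python
# A minimal sketch of 1950s-style word-for-word translation using a
# bilingual dictionary. The vocabulary here is hypothetical, for
# illustration only. Each source word maps to exactly one target word,
# so homonyms and word order cannot be handled.

ru_to_en = {
    "ja": "I",
    "ljublju": "love",
    "knigi": "books",
}

def word_for_word(sentence: str, dictionary: dict) -> str:
    """Translate each word independently; unknown words pass through."""
    return " ".join(dictionary.get(word, word) for word in sentence.split())

print(word_for_word("ja ljublju knigi", ru_to_en))  # I love books
```

Because each lookup is independent, a homonym such as ‘pen’ would always receive the same translation regardless of context, which is exactly the accuracy problem the early systems ran into.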
The next few years promised much but, in fact, delivered little and in 1964 the Automatic Language Processing Advisory Committee (ALPAC) was set up with the purpose of discovering what progress had been made in the field of MT. Its report, in 1966, was damning – not only was MT inaccurate, but it was also slower and more costly than its human counterpart. Such an adverse judgment halted future funding for the majority of MT research in many countries, and it looked as though the concept of machine translation was all but dead in the water.
All was not yet lost, however: Canada, France, and Germany continued to improve the accuracy of MT systems and, in 1968, a former Georgetown MT man, Peter Toma, set up a company called LATSEC (Language Automated Translation System and Electronic Communications) in La Jolla, California. This company would develop a system that was better than its predecessors at translation and called it SYSTRAN (Systems Analysis Translator). This was later to become the name of the company itself.
The Influence of Geopolitics
During this time, the USSR and the United States were locked in an ever-increasing spiral of paranoia. Ever since the end of World War II, the two sides had become suspicious of each other’s motives. Incidents such as the Cuban Missile Crisis in 1962 and the Vietnam War, which started in 1965 and ended, for the US, in 1973, only served to heighten fears of a new, more devastating conflict. This ‘Cold War’, as it became known, created a need for a machine which could translate vast amounts of scientific and technological documents from Russian into passable English. The sheer quantity of the information meant that there were not enough human translators available who could do the job quickly enough.
By 1970, Toma had developed the SYSTRAN program to a point where its Russian-to-English translation was good enough to be used by the United States Air Force Foreign Technology Division. Later it was also used by NASA for the Russo-American Apollo-Soyuz space mission. Not content with one bilingual pair, Toma then developed SYSTRAN so that it could translate between certain European languages. Once that had been accomplished, the system was adopted by the Commission of the European Communities.
The quality of SYSTRAN’s translation, although better than what had come before, was nevertheless only just adequate for the needs of its users. It still was not sophisticated enough to cater to the growing requirements of the commercial world, where translation was in higher demand due to the expansion in international trade. The next huge step forward in MT, rule-based machine translation (RBMT), was definitely a technological advance, but was it good enough to satisfy the ever-increasing need for fast and accurate MT?
To find out, let’s now explore a little further down the road of machine translation, in particular the newly developed rule-based and statistical machine translation models.
Rule-Based and Statistical Machine Translation
In the early days of machine translation, as we have seen, further development nearly didn’t survive the withdrawal of funding, which came about because the first systems lacked accuracy and fluency and were simply not cost-effective. Let’s now explore the next generations of MT systems and ask whether they were any better.
Searching for the Holy Grail of Machine Translation
During the 70s, the demand for translation grew due to an increase in international trade. There was a renewed interest in MT, and tech companies and academic institutions once again found themselves funded to research the subject, often through government grants. The Holy Grail was to create a system that was not only faster and more accurate than its predecessors but also cost-efficient.
Many countries now had their own research projects. In Canada, in 1976, TAUM (Traduction Automatique à l’Université de Montréal) produced Météo which translated weather reports between French and English (and still does). In addition, there was SUSY (Saarbrücker ÜbersetzungsSYstem), developed in Germany, GETA-Ariane from France, and Rosetta from Holland. In the States, Carnegie-Mellon University was working on a knowledge-based system, while in Japan, the Mu project explored the translation of Japanese to English.
These new forms of translation, although still only bilingual, no longer relied on dictionaries alone. Instead, algorithms were developed that set down rules for how the machine should handle the syntax and semantics of different languages. In particular, rule-based machine translation (RBMT), as it was called, was found to be very useful for translating texts from a specific discipline – medicine, for example – as the same words and phrases tended to occur again and again. It was a significant improvement on the old dictionary-style MT systems, but it still had numerous flaws. The RBMT system, although better than those before it, was not accurate enough for the needs of commercial customers, and it was also incredibly costly due to the number of linguists needed to write the rules, maintain dictionaries, and correct errors. Nevertheless, research into MT persisted and took the field into a new and more promising area.
Statistical Machine Translation Enters the World
During the late 1980s and 90s, huge strides were also being made in the world of computing. Computer processing units became ever more powerful and could handle larger volumes of data more quickly than ever before. These advances also benefitted MT scientists, who were able to harness this progress to develop a new kind of system, one that used statistics and probability rather than rules to decide how a sentence should be translated from one language into another.
The first experimental translating machine using statistical methods was developed in the 1990s. It was called Candide and was the brainchild of IBM. Statistical Machine Translation relies on having a large body (corpus) of parallel texts – the same narrative already translated into both languages. It then uses what it has learned from the contents to translate a new document using probability to determine what words mean and how they are placed in a sentence.
The first SMT machines, like Candide, were word based – that is, they concentrated on translating one word at a time. As with the rule-based system, this created many problems with accuracy, as they could not always detect idioms or compound words, or even determine the context. To get around this issue, new SMTs were developed which translated small sections of text at a time instead of just a single word. This became known as phrase-based translation, and machines based on it are still used today. In turn, the fluency of phrase-based SMT was improved through the development of syntax-based translation and hierarchical phrase-based translation (a combination of phrase- and syntax-based SMT).
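The advantage of translating chunks rather than single words can be sketched with a toy phrase table. The entries below are invented for illustration, and the real probabilistic scoring that SMT decoders perform (weighing translation and language-model probabilities) is omitted; the sketch shows only the longest-match lookup that lets idioms beat word-by-word output.

```python
# A toy sketch of phrase-based translation: instead of single words, the
# system looks up the longest matching chunk in a phrase table. Entries
# are invented for illustration; real systems score many candidate
# segmentations probabilistically rather than matching greedily.

phrase_table = {
    ("kick", "the", "bucket"): "die",       # idiom translated as a unit
    ("the", "bucket"): "the pail",
    ("kick",): "strike",
}

def translate(words):
    out, i = [], 0
    while i < len(words):
        # Try the longest phrase first so idioms beat word-by-word output.
        for length in range(len(words) - i, 0, -1):
            chunk = tuple(words[i:i + length])
            if chunk in phrase_table:
                out.append(phrase_table[chunk])
                i += length
                break
        else:
            out.append(words[i])  # unknown word passes through unchanged
            i += 1
    return " ".join(out)

print(translate(["kick", "the", "bucket"]))  # die
```

A word-based system would render the same input as something like “strike the pail”, which is exactly the idiom problem phrase-based translation was designed to address.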
Moses and Google Translate
A famous example of a purely statistical system was the first Moses engine. Launched in 2007, it was used extensively in the industrial and academic fields and, before the older version of Google Translate came online, was the most popular open-source SMT.3
Another was, of course, the aforementioned Google Translate, in the version that operated pre-November 2016. Unlike the current version, which uses deep neural networks, the old Google Translate used phrase-based SMT technology. Because of the popularity of the search engine, it was widely used as a free online translator, but it was notoriously inaccurate, as several webpages point out with hilarious examples.
One such example was from 2012, when the Catalan government decided to translate its website for an international readership. Instead of spending money to hire a professional translator, it opted to use Google Translate. Imagine their red faces when it was pointed out to them that Google had translated the name of the former president, Josep Tarradellas, as ‘George Washington’, and that of the agriculture minister, Josep Maria Pelegri, as ‘Joseph and Mary Pilgrim’.4
As this example shows, SMT has not been without its difficulties. For a start, finding large enough samples of parallel texts in every language is not always easy. SMT has also had trouble recognizing ‘rare words’ – those that do not appear in common vocabularies, as well as dealing with idioms and different word orders. Nevertheless, over the years, SMT has seen many improvements – the introduction of the hierarchical model and linguistic meta-data to name just a couple, and the results have been quite impressive, albeit still far from perfect. A hybrid system that uses both RBMT and SMT improves fluency significantly and has proved popular within the language industry.
A couple of years ago, that would have been the end of this story, but since 2015, the rise of MT in both the private and public sector has been meteoric, all thanks to one thing – the neural network.
The Rise of Neural Networks in Translation
We have looked at how machine translation (MT) developed from very basic dictionary translation through to rule-based algorithms and then to statistical systems. Statistical machine translation (SMT), although slowly increasing in accuracy, was still not progressing quickly enough for the needs of a global online population. One thing was to change all that in 2015 – the introduction of neural networks to the language industry. However, before we get to what’s happening now, we first need to understand the development of the field of neural networks, commonly known as artificial intelligence (AI).
AI – the Artificial Brain
Artificial intelligence is everywhere – in your cell phone, your car, your computer, and even in your home. It seems to have arrived so suddenly, and yet AI – the basis of neural machine translation (NMT) – has been in the background all along. In fact, the first recognized work on artificial intelligence was done by McCulloch and Pitts as early as 1943.
The term ‘artificial intelligence’ was coined by computer scientist John McCarthy in 1956 at a conference that he organized. Out of this, Allen Newell, Herbert Simon, and Cliff Shaw developed a computer program known as Logic Theorist. It was designed to act in the same way as a human brain would when solving problems, and it managed to do quite well, finding proofs for 38 of 52 mathematical theorems. Along with the men mentioned above, Marvin Minsky of MIT and Arthur Samuel of IBM were the pioneers of this new field of science, and they and their students went on to create and improve programs that could play chess and even speak English.
By the mid-sixties, at the height of the Cold War, research into AI became of great importance to the US Department of Defense as, just as with machine translation, America urgently needed new technology to stay ahead of the Russians. However, prototype AI programs, although promising much, did not live up to their promises. Soon, funding for research was significantly reduced, leading to what became known as the ‘AI winter’.
Throughout the decades that followed, the fortunes of AI resembled a rollercoaster. Advances were made that encouraged new funding, only to be followed by more setbacks. Nevertheless, research persisted and, with ever-increasing computing power and data to learn from, a new and advanced technique called ‘deep learning’ enabled AI to be useful in multiple situations.
The application of AI, or neural networks (NN) as it is also known, to machine translation started in 2003, when a language program using NN was set up by researchers from the University of Montreal. In 2013 a great leap forward was made when Nal Kalchbrenner and Phil Blunsom developed an end-to-end encoder-decoder model capable of deep learning, and this model is generally considered to be the first of the true neural machine translators (NMTs).
Neural Machine Translation Takes on SMT
So, how does NMT work? To explain the system as simply as possible, it mimics the neural structures of the brain, with nodes of data known as neurons and the pathways between them called synapses. Unlike statistical machine translators (SMTs), today’s neural networks do not work in a linear plane but instead have many layers. The input language travels through the encoder and then traces a pathway through the nodes, sometimes looping back or moving sideways as it picks up the information it needs. It also learns from previous examples that have been stored for future reference. Once it has found the solution, it moves to the decoder, where the text is translated into the target language. Instead of just a word or a short phrase, NMT translates whole sentences at once, no matter how long, which improves contextual accuracy far beyond what SMT could achieve.
Of course, it is far more complicated than that, and neural networks can take weeks or months of training on parallel texts to achieve any sense of accuracy. Also, different types of neural networks are better at different tasks – for example, convolutional networks work well with image recognition and recurrent networks with speech recognition – but the two can be combined to create a phenomenal system that can translate both written text (including handwriting) and speech. A program of such complexity means that, instead of only working with language pairs, it can translate from any learned language to another. This is called zero-shot translation. Of course, NMT is only as good as the amount of material it has to learn from, and so some languages are still not as easy to translate accurately as others.
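The encoder-decoder idea described above can be sketched in a few lines of NumPy. This is only a schematic forward pass with tiny, invented vocabulary sizes and untrained random weights, so the output token ids are meaningless; the point is the data flow: the encoder compresses the whole source sentence into one fixed-size vector, and the decoder generates target tokens from it one at a time.

```python
import numpy as np

# Schematic forward pass of an RNN encoder-decoder (seq2seq), the
# architecture behind early NMT systems. Weights are random and
# untrained; vocabulary sizes and hidden size are toy values chosen
# for illustration only.

rng = np.random.default_rng(0)
V_src, V_tgt, H = 10, 12, 8          # toy vocab sizes and hidden size

E_src = rng.normal(size=(V_src, H))  # source word embeddings
W_enc = rng.normal(size=(H, H)) * 0.1
E_tgt = rng.normal(size=(V_tgt, H))  # target word embeddings
W_dec = rng.normal(size=(H, H)) * 0.1
W_out = rng.normal(size=(H, V_tgt)) * 0.1

def encode(src_ids):
    """Read the source sentence left to right into one fixed-size vector."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(E_src[t] + W_enc @ h)
    return h

def decode(h, max_len=5, bos=0):
    """Generate target token ids one at a time from the sentence vector."""
    out, prev = [], bos
    for _ in range(max_len):
        h = np.tanh(E_tgt[prev] + W_dec @ h)
        prev = int(np.argmax(W_out.T @ h))   # greedy next-token choice
        out.append(prev)
    return out

print(decode(encode([1, 4, 2])))     # a list of 5 target token ids
```

Real NMT systems add attention, much larger vocabularies, and weeks of training on parallel corpora, but the encode-then-decode structure is the same.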
NMT All Around Us
NMT can now be found at the basis of many translation programs, especially those connected with social media. One of the giants in this world has been Google Translate. Originally an online translation engine using SMT (often with incomprehensible results), it now uses its own NMT, and accuracy has improved dramatically. Microsoft’s translation technology (with Bing as its online equivalent of Google’s service) is very close behind. Skype (another part of the Microsoft family), Facebook, Amazon, and eBay all use deep neural networks to translate both written words and speech, and fluency is improving year on year.
Although NMT is still an emergent technology, compared to the development of other forms of MT in the past, the pace of its advances and adoption has been exponential. It is improving and learning all the time, sometimes in ways that its developers don’t even understand. It does, however, still have some challenges to overcome. For a start, it requires more training time than SMT, the same word may be translated inconsistently, and it still has trouble recognizing words that are not contained in its vocabulary.
Man Vs Machine Translation: Who Won?
In early 2017 in South Korea, the International Interpretation Translation Association (IITA) hosted an event in which a group of four professional translators competed against three machine translation (NMT) programs.
On the technology side there were Google Translate, the Papago translation app from Naver (South Korea’s largest internet provider), and Systran.
Both teams received four articles:
- English: Fox Business
- English: Thank You For Being Late, Thomas Friedman (selection)
- Korean: Opinion piece by Kim Seo-ryung
- Korean: Part of Mothers & Daughters, Kang Kyeong-ae
The articles were assigned to both teams at random, with an allotted time of 50 minutes to translate them from Korean to English and English to Korean. The results were judged on accuracy, language expression, and logic and organization.
As it turned out, the machines were fast, very fast, producing translations in a few minutes, while the human translators took nearly an hour. When the scores were compiled, Google scored 28 out of 60, Papago 17, and Systran 15. The human translators scored 49 out of 60.
While the professional translators may have clearly won this competition, there is more to it than meets the eye. Some proponents of AI-powered MT would argue that machine translation by itself is not where the magic happens. What this competition demonstrates, in our opinion, is two-fold. First, we agree that humans are not going to be replaced anytime soon. As one of the commentators of the competition noted:
“Though research shows machine translators can perform about 85 to 90 percent of human experts, they still make absurd mistakes. Today’s event is more about how much humans can benefit from using these machines and boost work efficiency of professional translators.”
The second point is that these MT solutions reinforce the notion that the role of the translator is even more critical, albeit as human editors. While replacing the human is many years away, depending on the task, machines, when coupled with the right team of human editors during post editing, will be a force to be reckoned with. In certain legal, medical and literary circumstances, the machine may be useful, but could also prove to either add more work in the editing process or be so wrong that it could be a liability. Perhaps this is a matter of project management and workflow optimization?5 6
So What Next?
The rapid evolution of NMT has, understandably, caused concern amongst professional translators. The fear that artificial intelligence will take over people’s jobs and lives one day seems like it is no longer in the realm of science fiction, but is, in fact, just looming over the horizon. Those who work in the language industry already work with machine translation, as well as computer-aided translation tools, but these are part of the process and not the whole. Translators are often still needed to pre- or post-edit pieces of text to make them accurate.
In an interview on the Marketing Tips for Translators blog, translator Gwenydd Jones described such machines as an aid to being more productive rather than as something about to oust her from her job:
“You can train the engine to make it more accurate and post edit that translation, and that puts it up to a very high standard so that it is a completely human written text, but the time that you would have spent in the drafting phase of that translation has been shortened, and so your daily productivity grows.”7
She did not consider the advances in MT to be a threat to human translators, at least not in the short term. But, she added, it was always wise for those in the profession to stay informed about new developments.
So, even with the rapid advances in NMT, a fully autonomous and accurate translation machine has not yet been achieved and probably will not be for the foreseeable future. Linguistics is such a complicated field, with nuances and even body language that a computer would struggle to pick up, let alone translate. On the other hand, the human brain is already fine-tuned for such purposes. For the best of all worlds, it may be that Gwenydd is right – collaboration between human and machine may yet prove to be the most productive and yet most accurate way to move into the future.
Recent Updates on MT & NMT
Although we are nowhere near the stage where human translators will become redundant, it seems that the level of accuracy of neural machine translation has reached a new milestone. In January 2018, Dr. Antonio Toral, Assistant Professor at the Computational Linguistics group of the University of Groningen, and Andy Way, the Professor in Computing and Deputy Director of the Adapt Center for Digital Content Technology, submitted a research paper to Arxiv.org regarding the level of accuracy of NMT on a series of literary works.8
This experiment took 12 popular literary classics and ran them through both an NMT system and a phrase-based statistical machine translation system to translate them from English to Catalan – a language which poses challenges yet also has a great deal of training material. The novels, which included Ulysses by Joyce, Harry Potter and the Deathly Hallows by Rowling, and Lord of the Flies by Golding, were all written after the 1920s, as novels written any earlier would have contained styles and vocabularies too different from contemporary language.
The results were promising. Using the BLEU (Bilingual Evaluation Understudy) method to score the outcome, they found that NMT scored consistently higher than PBSMT for accuracy. In addition, human translators whose native language was Catalan but who were also fluent in English, evaluated sections of the MT translation in three of the books. Once again, the NMT outperformed its rival. It was estimated that “between 17% and 34% of the translations … are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.”9
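The BLEU metric mentioned above can be sketched in a few lines. Real BLEU uses modified n-gram precision up to 4-grams, multiple references, and smoothing; this simplified version keeps only the core idea, up to bigrams, with the brevity penalty that punishes translations shorter than the reference.

```python
import math
from collections import Counter

# Simplified BLEU: geometric mean of modified n-gram precisions
# (here up to bigrams) multiplied by a brevity penalty. Real BLEU
# goes up to 4-grams and applies smoothing; this is a sketch only.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each n-gram's count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty: short candidates are penalized exponentially.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu(ref, ref))                       # 1.0 (perfect match)
print(bleu("the cat sat".split(), ref))     # a much lower score
```

A score of 1.0 means a perfect match with the reference; scores shrink as n-gram overlap drops or the candidate gets shorter, which is why BLEU works best as a relative comparison between systems, as in the study above.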
While this percentage of accuracy still in no way threatens the job of a human translator, it does provide them with something that could assist them in their work, especially when it comes to post-editing. With up to a third of sentences needing no correction, this would increase both the efficiency and speed of translation work.
- “Warren Weaver Memorandum, July 1949”, MT News International, no. 22, July 1999, pp. 5-6, 15.
- Hutchins, John, “Milestones in Machine Translation”, Language Today, no. 13, October 1998, pp. 12-13.
- According to http://www.statmt.org/moses/?n=moses.overview, 2013. Accessed 2017/11/14.
- “Lost in Translation”, The Olive Press Newspaper, issue 137, 2012/06/12, p. 42, accessed at https://issuu.com/theolivepress/docs/issue137, 2017/12/15.
- Kroulek, Alison, “A Translation Showdown: Man vs Machine Translation”, February 28, 2017, http://www.k-international.com/blog/human-translation-vs-machine-translation-contest/
- Kim Han-joo, “Humans beat AI in language translation”, Yonhap News Agency, February 21, 2017, http://english.yonhapnews.co.kr/news/2017/02/21/0200000000AEN20170221012500320.html
- Episode 110, “Here today, here tomorrow – machine translation and us – Interview with Gwenydd Jones”, Marketing Tips for Translators, broadcast 2016/11/21, accessed at http://marketingtipsfortranslators.com/episode-110-today-tomorrow-machine-translation-us-interview-gwenydd-jones/ 2017/12/14.
- Diño, Gino, “Machine Translates Literature and About 25% Was Flawless, Research Claims”, 01/19/2018, https://slator.com/technology/machine-translates-literature-and-about-25-was-flawless-research-claims/, accessed on 01/30/2018.