Before I joined The CJK Group to organize foreign language document reviews for US-based electronic discovery, I often heard that the Western world finds the Japanese language confusing. As a native Japanese speaker, I initially found this hard to process. But now, after several years in the E-Discovery industry, I understand why. In this article, I examine two perplexing Japanese terms with a rich and convoluted history. It does not discuss any client-specific information. Rather, it analyzes key Japanese terminology and how its use has changed over time. I find this fascinating not only as a matter of linguistics but also because of how it intersects with the advanced machine learning tools used in the E-Discovery industry.
The question I want to pose first is whether interpreting and translating certain Japanese terminology is hard. (If you're interested in a historical deep dive into interpreters and translators, read my colleagues' article, “The Role of The Interpreter & Translator Throughout History.”) Returning to the question of whether translating ambiguous Japanese terminology is hard: surely this is a loaded question. It depends, right? Allow me to be a bit more precise. In legal disputes, where precision is critical and meaning must be understood, how do you reconcile seemingly ambiguous terminology? Rather than wax philosophical about this, let's take a real-world example. In this series, I will examine two Japanese terms: “Sontaku” and “Tokusai.”
Disclaimer & Goals
While I do not intend to provide comprehensive knowledge of the Japanese language, I will explain why Japanese communication, within its unique context, alters meaning. Drawing on recent coverage in the press both in and outside of Japan, I'm confident this will provoke more questions than it answers. At the very least, it will highlight the importance of this context for professional service providers that regularly work with foreign language data sets.
Most Japanese people did not know the word Sontaku until earlier this year, when it appeared in the mainstream press for the first time. This was due to Prime Minister Shinzo Abe's political affair, the Toshiba debacle, and other private sector scandals that became public in Japan. It was during the public airing of these scandals that Sontaku suddenly became famous; it was even selected as one of the two buzzwords of the year for 2017.1
As reported by the Asahi Shimbun, “Sontaku refers to the pre-emptive, placatory following of an order that has not been given.”2 To unpack this explanation, let me quote another part of the same article, which illustrates Sontaku by way of the Prime Minister's political affair:
“The Osaka-based educational corporation acquired state-owned property for a fraction of its appraised value. There are suspicions this came about because the bureaucrats involved in the transaction practiced sontaku to accommodate what they believed were the wishes of the Prime Minister’s Office, as well as those of Prime Minister Shinzo Abe’s wife Akie, who had been named honorary principal of an elementary school to be constructed on the site and run by Moritomo Gakuen. Those bureaucrats in question were attached to the Finance Ministry, the Ministry of Land, Infrastructure, Transport and Tourism, and the Osaka prefectural government. It is reasonable to suspect that all these offices practiced sontaku.”3
Our purpose is not to evaluate the issues surrounding Prime Minister Shinzo Abe, but rather to provide an illustration by way of example. My point is to demonstrate how certain events or activities can occur without verbal or written orders.
According to the Japanese linguist Hiroaki Iima, the word Sontaku has been in use since the 10th century. In many ways, the practice of sontaku is hard-wired in Japanese people. In fact, when I look back at my career in Japan, I certainly practiced a form of sontaku without even realizing it. We also have similar expressions, such as “空気を読む” (reading the air) and “以心伝心” (understanding each other without exchanging any words). What these expressions share is a situation in which an activity occurs without a clear order. In some ways, sontaku is almost like the English term “plausible deniability.” While I'm fluent in English, I'm not 100% certain the two are substantively equivalent. My general understanding, however, is that plausible deniability obscures who actually participated, or at least muddies the waters.
Why This Matters
One of the main problems with this way of thinking is that it is difficult to identify where responsibility lies. In legal disputes and other adversarial settings, this is critical. Asserting that someone did X usually requires showing that the person actually had knowledge of, or participated in, a particular activity. This kind of precision gives a clearer picture of what happened than general assertions do. To give a clearer sense of this term, permit me to put forth a historical example.
During WWII, a purpose-built, rocket-powered, human-guided kamikaze attack aircraft called the Ohka was invented and deployed. It is said that a second lieutenant in the Japanese Navy came up with the idea of a kamikaze attack glider in 1944, and that the concept never went through the Navy's proper weapons development procedure; it was simply his idea. However, the Yokosuka Naval Air Technical Arsenal believed that the idea had come, unofficially, from high-ranking government officials. Acting on that belief, the Arsenal took the lead on the concept and developed the Ohka without a clear decision-making process and without following any systematic procedure.4
All of this occurred before the Tokubetsu Kōgekitai (“Special Attack Unit”), which officially formalized and systematically carried out kamikaze attacks. This is one of many examples of Sontaku. If you're interested in other instances of Sontaku, review the records of the International Military Tribunal for the Far East (IMTFE). My conjecture is that prosecutors such as Joseph Keenan and Arthur Strettell Comyns Carr struggled to identify the actual chain of decision-makers.
When it comes to ascertaining who did what, and where, sontaku-type scenarios conceal (though not necessarily deliberately) certain critical elements. I am careful, however, not to suggest that anything nefarious is afoot or to advance some type of conspiracy theory. Instead, I'm interested in how this affects fact development and translation, both for lawyers reviewing the electronic record and for professional translators. On a related note, there has also been debate about how the word “Mokusatsu” (黙殺) was translated and understood by the Allied forces during WWII. (Read The Worst Translation in History? Translation in Perilous Times.)
While Sontaku is not about translation per se, it bears on how one pieces together the facts about who originated the initial order or command. Is there even a direct chain of responsibility? This makes responsibility very difficult to pin down precisely. The translation and interpretation question merely adds to the confusion when you are looking at a large volume of data documenting decision-making, knowledge and behavior. Let's now turn to another set of Japanese terms that prove challenging not only for attorneys but also for translators, and particularly for advanced machine translation tools.
Translation, E-Discovery & Machines
A few months ago, a Facebook machine translation error landed on the pages of newspapers around the world. I initially thought it would become a topic of discussion in e-discovery circles, and surely among those who closely follow machine translation. I was wrong. There was a good amount of coverage in the mainstream news, but not in the legal or language service provider (LSP) communities. Here are the facts as presented by The Guardian:
“Facebook has apologised after an error in its machine-translation service saw Israeli police arrest a Palestinian man for posting “good morning” on his social media profile. The man, a construction worker in the West Bank settlement of Beitar Illit, near Jerusalem, posted a picture of himself leaning against a bulldozer with the caption “يصبحهم”, or “yusbihuhum”, which translates as “good morning”. But Facebook’s artificial intelligence-powered translation service, which it built after parting ways with Microsoft’s Bing translation in 2016, instead translated the word into “hurt them” in English or “attack them” in Hebrew.”5
The AI tool in question is part of a new wave of more advanced translation tools known as neural machine translation, or “NMT.” Machine translation itself has a long history going back to the 1950s, although early systems were of course not based on neural networks. To learn more about this burgeoning field, read The Evolution of Machine Translation. For our purposes, NMT simply refers to an algorithm that “mimics the neural structures of the brain, with nodes of data known as neurons and the pathways between them called synapses.”6
While I believe the technology is rapidly growing in sophistication, its capabilities are often overstated. We seldom hear about its mistakes or misuse.
A similar incident occurred on WeChat, China's most popular messaging application. Developed by Tencent, WeChat is one of the most multifaceted applications in the world. According to China Internet Watch, it has 980 million monthly active users (MAU), who send over 38 billion messages every day on its chat platform.7
In the context of machine learning and translation, this is a rich source of potentially valuable information. Like Facebook, WeChat has developed its own proprietary NMT software, and like Facebook, it has faced an embarrassing moment involving that software.
Ann James, Machine-learning Biases and WeChat Translation
Ann James is originally from Texas and lives in Shanghai.8 She is involved in theater and runs a Shanghai-based production company called Dreamweaver (formerly known as Urban Aphrodite). She also stars in China's highest-grossing film of all time.9
The translation incident aside, she's rather a big deal in China. Like others who live there, she relies heavily on the country's home-grown chat applications, which in practice means WeChat.
In a WeChat group chat with her friends, the application automatically translated a message saying she was running late to an event. WeChat's integrated NMT software, trained on large volumes of data, presumably generated this translation on its own. To her shock and dismay, the app translated the words “foreigner” and “black” into the n-word, automatically rendering the message as “the N-word still late.”
WeChat apologized for its translation of the Chinese term “黑老外” (“hei laowai”). In Ann James' own words:
“Dear WeChat. Can you please change the translation of “hei laowai” from “nigger” to what it actually means, which is in fact, “black foreigner.” Jesus”10
In her reply to her friend in the group chat, she wrote, “I know it's not you. It's something in the programming.” Indeed, the algorithm failed to account for cultural mores, attitudes and racism.11 According to WeChat's official press release, the software has since been adjusted, but the episode underscores the underlying point: technology needs human editors who are deeply involved and continually intervene, especially on matters of critical importance involving culture, criminal culpability, cross-border legal discovery disputes, and medicine.
It is not unusual for companies to consider using machine translation. This makes sense, because foreign language document reviews (in CJK languages, but also in German, Arabic and Russian) are significantly more expensive than English-language reviews. The vast volumes of data push clients to aggressively seek technology-oriented solutions that help reduce costs. At the same time, this needs to be done with care and attention to quality. As I discussed above regarding sontaku, it is not simply a matter of pushing data into a translation queue and expecting the output to be readily understood. This applies to human editors and machines alike.
Now consider the second term. The word “Tokusai” (トクサイ) is a shortened version of 特別採用, pronounced “Tokubetsu Saiyo.” As you can see, we Japanese simply took “Toku” from “Tokubetsu” and “Sai” from “Saiyo” to create the new word “Tokusai.”
The full term is composed of two elements: Tokubetsu (特別), meaning “special,” and Saiyo (採用), which generally means “adoption” or “hiring.”
In the Japanese metal manufacturing industry, “Tokusai” was and still is used as technical jargon. It basically works like this: when finished products such as aluminum or steel do not meet the standards required by the client but are still good enough for their intended use, the metal manufacturer labels them “Tokusai,” notifies the client, and ships them only after obtaining the client's approval. This is the original meaning of “Tokusai” in the industry.12
Over time, however, “Tokusai” came to describe something else: when finished products did not meet the criteria set by clients, manufacturers falsified the test data and shipped the products without the clients' consent.13
Getting interesting, right? With respect to e-discovery, I believe there are two issues that should give us pause when dealing with scenarios like this.
First, machine translation cannot even translate this word. As of November 6, 2017, Google Translate rendered トクサイ as “Tokushi,” which is not even an English word! You might think that when we come across words we do not know, we can simply look up what they mean. Naturally, I then googled “Tokusai,” and my results returned 特使 (pronounced “Tokushi”), a Japanese word meaning “special envoy.”
Second, the meaning of “Tokusai” has shifted over time within the industry. This semantic shift is more than a question of meaning: it complicates any attempt to piece together the factual events. As reported in the Japanese business press referenced above, substandard products were shipped with falsified data, and the client was not notified.
In a nutshell, the exact same word has two different meanings. Even if machine translation could replace the Japanese word with an English word, that English word would not necessarily convey the intended meaning.
Here are a few more examples to illustrate this point. Japanese politicians use “konjac” (a rubbery, jelly-like food made from the corm of the konjac plant) to refer to 1 million yen. Similarly, Japan's three biggest banks are referred to by color: red, green and blue.
Japanese is full of technical jargon whose meaning proves elusive without insight into the cultural, company-specific and historical background of a particular word. Many of these words are not understood even by native Japanese speakers, and there are countless examples like this.
The accuracy of prevailing machine translation systems diminishes when it comes to complex foreign language documents like those discussed above. Managing costs is important, but evaluating evidence requires a certain degree of precision. Technology can aid in this process, but it can also make serious mistakes that cost a fortune, affect a person's life, and even derail a business transaction during due diligence. In legal electronic discovery review, even teams of highly skilled multilingual attorneys could miss the smoking gun, let alone machine translation tools.
- 4 Fujita, Motonobu (藤田 元信). 特攻兵器「桜花」は、日本軍の「忖度」が生んだ哀しい失敗作だった (“The Kamikaze Attack Weapon Ohka Was a Sad Failure Born of the Japanese Military's ‘Sontaku’”). https://gendai.ismedia.jp/articles/-/52362?page=2
- Interview with Ann James on SoundCloud. https://soundcloud.com/mister-bizu/nnp-spotlight-ann-james-part-1
- Jiji Press. “Tokusai & Business Practice Abuse = Data Falsification, Arrogance & Lack of Self-Reliance.” https://www.jiji.com/jc/article?k=2017112901327