Linguists and machines working in harmony

How technology is learning to recognise sense from nonsense, happiness from anger and European Directives from National Media editorials.

The notion that everything can be fluently translated from any language into any other language by a machine is up there with the paperless office and the self-driving car. Everybody thinks it ought to be possible. Everybody thinks that it will happen eventually. In the meantime, as we fly in a plane, lie on an operating table or press the brakes in our car, we don’t want to think the instructions were translated by a machine. We feel reassured that a human being who knows what they’re talking about had a hand in it.

But not all of life is about potentially life-threatening situations. If text is spelt incorrectly then it’s probably not a crisis, and it might only be necessary to get the broadest gist of an email from a supplier. A legal document from the German office could be treated differently from a note about their annual leave, but how do we know which is which?

This White Paper explains how we all stand at the brink of a revolution that will provide faster, more accurate and more cost-effective translation: the right translation for the right situation done in the right way – because machines and humans will finally learn how to work it out together.

Everyone will Speak the Same Language
Along with all those other myths is the notion that eventually, everyone will speak the same language – a global tongue. English is often put forward as the candidate for this, which understandably annoys many other nationalities. The good news for the objectors is that there are few signs of it happening. The best-known attempt to develop a global language came in 1887, when Esperanto was launched. Despite nearly 130 years having elapsed, there are still estimated to be fewer than one million Esperanto speakers, and no country has adopted it officially.

Instead, there are still nearly 7,000 languages in the world and interest in retaining old languages, as opposed to letting them die out, is actually growing. Not all of these are in common usage, however, and a modern language solution company will typically translate in and out of around 400 languages on a daily basis.

The Translation Start Point
Translation doesn’t start with a translator, or a computer, or a French/English dictionary. It starts with a source document in a language you do or don’t understand, with a desire to end up with the same document in a language you do or don’t understand. Here is the paradox of translation. There is always one end of the process that the buyer doesn’t understand. In that respect it is very different from almost anything else. In most business processes, procurement is a clear process of understanding a start and end point and being clear on how to measure both. Translation is not that straightforward. So where do translation’s complexities start? Again, at the source…

84 words from a European Commission Directive: The fact that only the words “Regulation 44” (and not “Regulation 44/03”) would be now indicated in the amended directive was source of concerns for the NL delegation, as from the Series 02 to the Series 03 of amendments to R44 a significant improvement was made (during the 1990’s) in the development of R44, so important that Directive 2003/20/EC (amending Directive 91/671/EEC) was de facto rendering obsolete (and not anymore allowed for use) CRS approved to the standards of Regulation 44/00, /01 or /02.

The two examples here are both in English. They are both the same length but they read very differently. The excessive sentence length, sentence complexity, word choice, excessive punctuation, use of Latin and unexplained acronyms make the lower example almost unintelligible when compared with the way the other example is written. This is clear to see when both items are in English. One may be much more difficult to understand but it is easy to assess which is which and – to a great extent – what topic each covers.
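As a rough illustration of how the surface signals named above (sentence length, punctuation, acronyms) could be measured automatically, here is a minimal Python sketch. The function name, the choice of signals and both sample texts are assumptions made for illustration only; they do not describe any particular LSP's assessment tool.

```python
import re

def complexity_features(text):
    """Return a few crude surface signals of how hard a text is to read."""
    # Rough heuristic: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z0-9/']+", text)
    # Treat runs of two or more capital letters as acronyms (e.g. "NL", "CRS").
    acronyms = re.findall(r"\b[A-Z]{2,}\b", text)
    punctuation = re.findall(r'[(),;:"“”/]', text)
    return {
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "acronyms": len(acronyms),
        "punctuation_per_100_words": 100 * len(punctuation) / max(len(words), 1),
    }

# Both sample texts are invented for this sketch.
simple = "Happiness is hard to define. Most people know it when they feel it."
dense = ("The NL delegation noted that CRS approved to Regulation 44/00, /01 or /02 "
         "(and not 44/03) were de facto obsolete under Directive 2003/20/EC.")

print(complexity_features(simple))
print(complexity_features(dense))
```

Signals like these need no understanding of the language itself, which is why they might remain useful when the assessor cannot read the source text.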

Once overlaid with another language the assessment problems increase exponentially. Imagine being presented with these examples in a language you can’t read. How would you know which was the more complex? Which one is a European directive and which refers to the subject of happiness?

“It is this level of assessment that is so important in achieving the most accurate, speedy and cost-effective translation.”

Understanding Translation Accuracy
If an organisation needs a document translating, let’s say from a source document in Spanish to a target document in French, what’s the most important requirement: cost, time or accuracy?

There isn’t actually an answer to this and, in fact, changing any one of these priorities will have an impact on the others.

The most important requirement for any translation outcome is likely to change from document to document. A company ordering translation of a manual for operating an aircraft would place accuracy at the top, with time and cost being a much lower consideration. A business updating its website information on an hourly basis would consider time as the driver and a ‘fast-sell’ catalogue may prioritise cost.

“There are three drivers for everybody who buys translation services: cost, time and accuracy. These are mutually exclusive; trying to do a translation cheaper or quicker will impact on accuracy. Maximising accuracy will take longer.”

Where is ‘Perfection’?
So if translation accuracy is being sacrificed in the face of time and cost constraints, how inaccurate are we prepared to be? On a scale where 1 is the source language and 10 is the perfect translation into the target language, where does accuracy lie, and how measurable is it?

Manufacturing and engineering companies, as an example, are extremely good at assessing quality but their approach, which looks at specific measurement and often focuses on absence of defects, does not necessarily transfer into language. Language can be factually correct but still sound wrong because fluency, feel, colloquialisms and common usage are all factors that we subconsciously note when we read or hear our own language.

“A man who now makes his living as an interpreter and as an expert at placing any speaker in Europe by listening to their speech speaks to Eliza Doolittle. He deems her English too perfect for an English woman and announces that Eliza must be a foreign princess. – Synopsis of Pygmalion: George Bernard Shaw”

At present, all Language Service Providers (LSPs) face a stumbling block. They know that some documents can be translated perfectly adequately by a machine. Others need human translation with the use of translation memory. Others are so important that they need solely human translation with one or more reviews by other human translators. Deciding which document can go down which route is a process of assessment.

The consequences of getting that decision wrong could be serious so LSPs use a variety of methods for the assessment of source documentation. Some use Project Managers whereas others send sample documents straight to linguists for assessment or use a combination of the two. Whichever option is chosen, it’s a time-consuming operation right at the start of the translation process.

The best approach to a translation, taking into account how much time is required to deliver an appropriate level of accuracy at an acceptable cost, often cannot be decided ahead of the assessment stage. This makes it challenging for LSPs to quote for projects and to project turnaround times accurately, which is understandably frustrating for the LSP and the customer alike.

Human assessment of source documentation is also a highly subjective process. A Project Manager or linguist might check a few sentences, find them reasonably straightforward and use those to judge a large amount of text. On other occasions the text may seem clear but actually refers to a series of images (say on a website) that haven’t been supplied, which creates misunderstandings further down the process.

A lack of context within the source document may lead to a raft of queries later on, for example when translating into a language that uses masculine and feminine genders.

Also problematic is subject matter. The majority of linguists specialise in specific subject areas, for example legal, automotive or marketing. Ideally, they will only be asked to translate documents in those fields, but in reality linguists often receive documents for translation outside their specialisms. This is partly because project managers become accustomed to receiving specific types of work from specific clients, such as legal documents from law firms. When a piece of marketing copy suddenly comes through in (say) German, they don’t recognise it as being different.

“Assessment failures that cause increased queries and incorrect routing of documents lead to increased risk of inaccurate translations”

Both of these issues – increased queries and incorrect document routing – are effectively assessment failures which lead to increased risk of poor-quality translations. The risk is heightened within the translation industry due to two key factors:

1. The industry’s extensive use of freelance labour
2. The inability of the industry’s customers to effectively measure the product.

Virtually every LSP in the world uses the same body of linguists, most of whom are freelance workers. As such, linguists are effectively piece-workers, paid by the word. Anything that requires them to raise queries or turn down work that’s outside their specialism costs them money. This opens up a clear conflict between some translators’ professional integrity and their need to earn.

This would be less of a concern if the customer could adequately review and evaluate the result. But in translation that is rarely possible. The customer has paid for translation simply because they are unable to do it themselves. There is always one end of the operation, source or target, that they don’t understand. Their faith is placed in the linguist and the LSP.

That faith will be rewarded by the LSP that is able to establish an objective, cost-effective, automated assessment process – and that process will be built on Computational Linguistics.

Already, Computational Linguistics is making its mark in the literary world. U.S.-based Patrick Juola is a computational forensic linguist who made his name by outing J.K. Rowling as the author of “The Cuckoo’s Calling”, a book she penned under another name. His software runs thousands of microanalyses of tiny, subconscious data points that individually seem meaningless: like how frequently a writer uses the word “too,” or whether he or she refers to a piece of furniture as a “sofa” or a “couch”. But when compiled, these micro-patterns form an elaborate map of an individual’s linguistic patterns.
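The idea of compiling many small word-choice signals into a profile can be sketched in a few lines of Python. This is not Juola's software or method: the marker list, the sample sentences and the use of cosine similarity are all assumptions chosen to keep the illustration small, and a real system would track far more features over far more text.

```python
import math
import re
from collections import Counter

# Hypothetical marker words chosen for this sketch only.
MARKERS = ["too", "very", "sofa", "couch", "whilst", "while"]

def profile(text):
    """Relative frequency of each marker word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[m] / total for m in MARKERS]

def cosine(a, b):
    """Cosine similarity between two profiles (closer to 1.0 means more alike)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented sample sentences standing in for whole books.
known = "It was too late, whilst the sofa was very old and too worn."
candidate_a = "She sat on the sofa whilst it was too dark to read."
candidate_b = "He sat on the couch while it was very dark."

print(cosine(profile(known), profile(candidate_a)))
print(cosine(profile(known), profile(candidate_b)))
```

Here the first candidate scores higher because it shares the “sofa”, “whilst” and “too” habits of the known text, which is the kind of pattern the paragraph above describes.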

In the field of criminality, linguists from the University of Essex in Colchester, UK, and the Center for Mind/Brain Sciences in Trento, Italy, are working together to develop a system to spot lies. This seeks out the overuse of linguistic hedges such as “to the best of my knowledge”, or overzealous expressions such as “I swear to God”. The researchers claim that the system is already nearly 75% accurate at indicating whether a defendant or witness is being deceptive.
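A very reduced sketch of the first step, counting how often such phrases occur, might look like the following in Python. The two phrases come from the paragraph above; the function name, the sample statements and the per-100-words measure are assumptions, and nothing here reflects how the researchers' actual system works or how it reaches its reported accuracy.

```python
# The two phrases quoted in the text; a real system would use a much longer list.
MARKER_PHRASES = ["to the best of my knowledge", "i swear to god"]

def marker_rate(statement):
    """Number of marker phrases per 100 words of the statement."""
    lowered = statement.lower()
    hits = sum(lowered.count(phrase) for phrase in MARKER_PHRASES)
    n_words = max(len(lowered.split()), 1)
    return 100 * hits / n_words

# Invented statements for illustration.
plain = "I was at home all evening."
hedged = "To the best of my knowledge I was at home. I swear to God I never saw him."

print(marker_rate(plain))
print(marker_rate(hedged))
```

A count like this would only ever be one signal among many; deciding what level counts as “overuse” is the hard part and is not attempted here.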

The LSP and the Customer
Though it seems far from the realm of courtroom lies and award-winning authors, the issues affecting translation in the business world are not that different. Time, accuracy and cost can spell the difference between success and failure, and when businesses complain about LSPs, their criticisms virtually always fall into one of these three categories.

There has long been an understanding in the Information Technology space of the notion of ‘garbage in-garbage out’ and translation is no different. To a significant degree, the quality of a finished translation will depend on the quality of the source material, the format in which it is presented and the accompanying support reference material that enables a linguist to place it in context.

“It is very rare that I see source text and think ‘This is really well written’ – Professional Linguist”

Linguists are clear, and largely in agreement, about the issues that create the greatest translation challenges. These fall broadly into three categories: inaccurate (wrong content to start with), technical (wrong formatting / file presentation or conversion), and misleading (unexpected content e.g. marketing copy when legal was expected).

There are also assumptions made by non-linguists about how linguists work. As an example, it is often much more difficult to translate a few words than a large piece of text because of the difficulty of placing words in the right context. Particular problems can be created by a list, such as machine parts. These will need to be put into different genders in some languages and, with an indiscriminate list, this is virtually impossible. In addition, source documentation is often provided in English written by someone who isn’t actually a native English speaker.

Supporting reference material is also vital. So much that needs translation these days is visual; it refers to marketing literature, to web pages, to images. And it always targets specific audiences. The more information that a linguist has on the purpose of the text, the better job he or she will do.

Lastly, acronyms, in-house jargon or industry abbreviations are dreaded by all translators and can end up delaying an important piece of translation for hours or even days.

As first-line assessment improves, the LSP/Customer partnership will aim to address all the issues that impact on the three areas that typically create errors in the translation process.

Future View – Predictions & Angry Tweets
In the future, all documents may or may not be translated by machines, but the reality is likely to be more exciting than that, as we use machines to filter out the nonsense, tell us when people are angry with us and enable us to focus on what we should be reading.

In the language services industry, the future will be about people and machines working together in a way they’ve never done before. As professional linguists work through documents, machine translators will learn from them dynamically and then use algorithms to predict the most likely next word, saving them thousands of keystrokes. Translation memories will use statistics to predict and auto-complete whole screeds of text.
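One simple way to picture statistical next-word prediction is a bigram model: count which word most often follows each word in text that has already been translated, then offer the most frequent followers as suggestions. The Python sketch below assumes a tiny invented “memory” of four segments and hypothetical function names; production tools are considerably more sophisticated.

```python
from collections import Counter, defaultdict

def train_bigrams(segments):
    """Count which word follows which in previously translated segments."""
    following = defaultdict(Counter)
    for segment in segments:
        words = segment.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(following, word, k=2):
    """Return up to k of the most frequent next words seen after `word`."""
    counts = following.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# Invented segments standing in for a translation memory.
memory = [
    "press the brake pedal",
    "press the brake pedal firmly",
    "release the brake lever",
    "check the brake pedal",
]
model = train_bigrams(memory)
print(suggest(model, "brake"))  # ['pedal', 'lever']
```

Each accepted suggestion saves the linguist typing the word, and every newly confirmed segment can be fed back into the counts, which is the “learning dynamically” part.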

Research is also well advanced on ‘sentiment analysis’. This will enable a machine to classify feelings by reading, for example, a million tweets and, by using key words and sentence structures, identify which ones are angry tweets. This uses a combination of machine translation and a naïve Bayes classifier as a form of Artificial Intelligence.
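To make the classifier step concrete, here is a minimal naïve Bayes sketch in Python, written from scratch with add-one smoothing. It covers only the classification stage and assumes the tweets are already in a single language; the four training tweets, the labels and the function names are invented for illustration, whereas a real system would train on a very large labelled set.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Pick the label with the highest log-probability under naive Bayes."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # prior for this label
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data for this sketch.
tweets = [
    ("this delay is a disgrace", "angry"),
    ("worst service ever i am furious", "angry"),
    ("thanks for the quick delivery", "not_angry"),
    ("great service very happy", "not_angry"),
]
word_counts, doc_counts = train(tweets)
print(classify("furious about the delay", word_counts, doc_counts))
print(classify("very happy with the delivery", word_counts, doc_counts))
```

The classifier is “naïve” because it treats every word as independent of its neighbours, yet with enough examples that simplification is often good enough to separate angry messages from the rest.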

As a reviewer is reviewing text in the future, a machine is likely to be reviewing it too, and training itself to know which sentences the reviewer is likely to want to edit. In doing so, it will learn the reviewer’s likely behaviour, which can be fed back to the human translator. This leads to the question of whether the reviewer will make a change, which in turn leads to predictive analytics and a risk-weighted decision on whether to use machine translation for a particular sentence.

A future in which everyone speaks the same language is not only implausible but strangely undesirable. Inextricably linked with culture, tradition, history, geography, and close human associations, language is something to cherish. The complexities of language are its strength and the reason why it’s taken until now for computing power to really start to make an impact.