Blog post dated Feb 4, 2019

Machine Translation for Companies

"I've read that neural machine translations have reached the quality of human translations. From now on, we should have everything translated automatically and then merely do some post-editing."

Have you heard views like these expressed in your company? The subject of machine translation (MT) has given rise to extremely polarized opinions and is highly susceptible to speculation, misunderstandings, and myths.

A company that seriously considers introducing machine translation as part of its workflow needs to initiate a comprehensive project in order to tackle this challenge in a realistic and professional manner.

An Article by

Flurina Schwendimann
Freelance Translator

Christian Weih
Management Board, Across Systems

Before making a decision, a number of factors should be taken into account.

The translation industry is changing at an ever-increasing rate and depends heavily on the advancement of technology. For this reason, the first article, "Machine Translation and now? Part 1 for Translators", focused on translators and how they can effectively prepare for the new market situation.

Readers from corporate settings are encouraged to read this article as well in order to gain a better understanding of how translators are affected. For the successful introduction of a machine translation system in the company, it is vital for both parties to cooperate.

This article provides some general information, but cannot replace professional advice on MT systems. Therefore, enterprises would do well to get assistance from a professional MT consultant.

Generic vs. Customizable Systems

Generic systems include Google Translate, DeepL, Microsoft Translator, and Amazon Translate. The main characteristic of these systems is that they have been trained with huge amounts of data from various subject areas. As a result, the translations are very fluent, but the terminology may not suit every subject area, or a mistranslation may even occur in a particular domain due to the lack of training data. Therefore, generic systems are more suitable for enterprises that do not use highly specialized terminology.

By contrast, customizable systems are trained with customer-specific data in order to take both the terminology and the corporate language into consideration in the translations. These engines deliver better raw translations that require less post-editing.

For this, two different approaches are used: Some providers use a single engine for each language, which automatically detects the various domains. Other providers recommend a separate customized engine for each language and domain. Both approaches take the corporate terminology into consideration. After all, different terms are often used for the same concept in different domains, e.g. in contracts and in technical documentation.

Proper maintenance of the translation memories and terminology databases is a key precondition for customization.

The providers of customizable systems include SYSTRAN, SmartMATE, KantanMT, and Omniscien.

Extra Tip

The article "Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences" by Adam Geitgey can help you understand the technical basics of machine translation, even if you have no background knowledge.

The Right Cooperation Partner

Currently, there are more than 100 companies that offer machine translation and related services.

The "Machine Translation Market Report 2017", prepared by TAUS (Translation Automation User Society), provides an overview of the market. In the report (Joscelyne et al. 2017: 27), the providers are divided into six different categories:

  • MT Pure Players: These companies develop their own MT technology and sell it either on a license basis or on a software-as-a-service (SaaS) basis. Examples include SYSTRAN, PROMT, KantanMT, and Omniscien.
  • Corporate MT Users: This is a rapidly growing group of large translation buyers that build their own MT capabilities, often on the basis of an open-source MT technology such as OpenNMT or Marian.
  • Language Service Providers (LSP): An increasing number of language service providers are also developing MT systems. Examples in this area include RWS Moravia and Capita Translation.
  • Value-added resellers: These are translation technology providers that, in addition to their own services, resell MT engines of various providers. Examples include Memsource, Lingotek, and Lingo24.
  • Professional services: Some companies specialize in consulting services for the various systems, provision of training data, staff training, or evaluation of machine translations. Examples include Datamundi, CrossLang, and Appen. Some companies, such as berns language consulting, do not offer their own machine translation, but perform services such as the analysis of the company's texts, extraction of terminology, training of engines of various providers with personalized data, evaluation of the results, and integration of machine translation.
  • Free MT providers: Users can have text translated free of charge in order to get an overview of foreign-language content. Examples include Google Translate and DeepL.

Before opting for a cooperation partner, you should intensively examine various providers and analyze the pros and cons. You should be aware that the introduction of an MT system is a protracted process that does not guarantee quick ROI. A good cooperation partner will draw your attention to this issue and will not make any unrealistic promises.

The Quality Differences

As explained in the first part of this article, stylistically demanding texts are rather unsuitable for machine translation. By contrast, MT is ideal for short, standardized sentences.

Nevertheless, a number of providers have specialized in the machine translation of marketing texts. The company should check on an individual basis whether the delivered quality meets its requirements.

The quality of the raw translations can be improved by means of pre-editing. A study conducted by the because Group in this area revealed the following:

Using Google Translate, a cooking recipe was translated from English into German. The error rate was 9.24 percent (50 errors in a text of 542 words). Following a revision of the source text in Simplified Technical English, the error rate dropped to 7.43 percent. In a last step, the source text was adjusted on the basis of rules. In this way, an error rate of 3.95 percent was achieved.
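The figures above can be put into perspective with a little arithmetic. The following is a minimal sketch, not part of the study itself; the function names are illustrative assumptions. It computes an error rate as errors per 100 words and the relative reduction between two reported rates. With the reported counts (50 errors in 542 words), the baseline works out to about 9.2 percent, in line with the figure quoted above.

```python
def error_rate_percent(errors: int, words: int) -> float:
    """Return the number of errors per 100 words of text."""
    if words <= 0:
        raise ValueError("words must be positive")
    return 100.0 * errors / words


def relative_reduction_percent(before: float, after: float) -> float:
    """Return how much lower `after` is than `before`, in percent of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# Error rates as reported in the text (percent).
RAW_SOURCE = 9.24
SIMPLIFIED_TECHNICAL_ENGLISH = 7.43
RULE_BASED_PRE_EDITING = 3.95

baseline_from_counts = error_rate_percent(50, 542)
reduction_ste = relative_reduction_percent(RAW_SOURCE, SIMPLIFIED_TECHNICAL_ENGLISH)
reduction_rules = relative_reduction_percent(RAW_SOURCE, RULE_BASED_PRE_EDITING)

print(f"Baseline from counts: {baseline_from_counts:.2f} %")
print(f"Reduction with Simplified Technical English: {reduction_ste:.1f} %")
print(f"Reduction with rule-based pre-editing: {reduction_rules:.1f} %")
```

Seen this way, the rule-based revision of the source text removed more than half of the errors relative to the unedited source, while the Simplified Technical English revision alone removed roughly a fifth.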

The Training Data

As the saying goes in IT, "garbage in, garbage out". This is also true in the field of machine translation. Therefore, large subject-specific corpora are needed for customizing an MT engine. Globalese, a platform on which customized engines can be created, specifies a minimum size of 100,000 segments per engine and domain.

At this point, the question arises whether machine translation is actually only suitable for enterprises with large translation memories. The answer is yes and no.

The selected cooperation partner will show you different ways of gaining access to the needed training data. Still, internal translation memories and terminology databases are a must for the customization.

As the amount of internal data is usually not sufficient, internal translation memories are often enriched with external, domain-specific corpora for the training. A distinction can be made between free and paid corpora:

  • Texts of the MT service provider: Your cooperation partner usually has corpora of various domains. Depending on the type of contract, these corpora may be available directly or may be offered for purchase.
  • Publicly available corpora: Free-of-charge corpora from various subject areas are available online and can be used to train MT systems. For instance, corpora such as those of the European Parliament or Wikipedia can be downloaded from the OPUS project website.
  • Paid corpora: These are curated data sets from various domains. The most important points of contact for these corpora are the Data Cloud and Matching Data of TAUS, which comprise more than 35 billion words in 600 language pairs.
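To estimate how much external data might be needed, the segment counts of the internal translation memories can be compared with the minimum size per engine and domain. The following is a minimal sketch under stated assumptions: the threshold of 100,000 segments is the Globalese figure quoted above, while the domain names and segment counts are purely hypothetical and serve only as an illustration.

```python
# Minimum number of segments per engine and domain, as quoted for Globalese.
MIN_SEGMENTS_PER_DOMAIN = 100_000


def segment_shortfall(segments_per_domain, minimum=MIN_SEGMENTS_PER_DOMAIN):
    """Return, per domain, how many segments are missing to reach the minimum."""
    return {
        domain: max(0, minimum - count)
        for domain, count in segments_per_domain.items()
    }


# Hypothetical segment counts of an internal translation memory.
internal_tm = {
    "contracts": 40_000,
    "technical documentation": 130_000,
}

shortfall = segment_shortfall(internal_tm)
for domain, missing in shortfall.items():
    if missing:
        print(f"{domain}: {missing} segments to be added from external corpora")
    else:
        print(f"{domain}: internal data meets the minimum size")
```

A count of this kind only addresses quantity. Whether the segments are of sufficient quality and truly domain-specific still needs to be checked separately.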

Extra Tip

TAUS is one of the first and most important points of contact when doing research on machine translation. The website features numerous articles and e-books with plenty of information.

The data needed for the development of neural systems must be of a high quality and domain-specific.

The Implementation Costs

If you think that you will immediately save money as soon as you start using machine translation, you will be sorely disappointed. Of course, machine translation will eventually pay off for enterprises that translate millions of words every year.

Initially, however, money needs to be invested: An additional project manager may need to be hired, the provider of the MT system needs to be paid for its services, the training data need to be purchased, and the translators and post-editors need to be trained and paid.

The Collaboration with Translators

For the project to be successful, it is important to collaborate closely with the translators and post-editors at the end of the supply chain, as post-editing is indispensable for the quality assurance of machine-translated texts. Although the quality of neural machine translation is progressing from day to day, the quality of human translation cannot be achieved at the document level.

Post-editing is not a skill that a translator automatically has or can develop overnight. A translator must learn how to make decisions and perform corrections speedily. To work productively, a post-editor needs to be able to post-edit about 7,000 words a day, compared to an average of 2,000 words a day for a "normal" specialized translation.

For this reason, it might be good to offer regular translators training in the field of post-editing. As your company is investing a lot of money in innovative technology, it is vital to have the needed expertise at the end of the supply chain in order to make effective use of the new possibilities.

Fair compensation of the freelance translators is another precondition to achieve success. One of the main reasons why translators might be reluctant to accept post-editing projects is the meager payment they often receive for editing low-quality raw translations.

The reason for this is that in recent years, enterprises have increasingly created translations with generic engines and then submitted the raw results for post-editing. Due to the large number of errors, the texts often need to be retranslated, for compensation that amounts to merely a third of the normal rate per word.

For this reason, open and honest communication between the client and the provider is strongly recommended. If you know that the engine you use generates a lot of errors, you should inform the (potential) translator of this and pay a higher rate. If the engine produces fewer errors, the rate per word can be adjusted accordingly.

However, as the editing overhead cannot be estimated accurately in advance, it might be good to introduce an hourly rate for post-editing.
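The effect of the per-word rate on a translator's income can be illustrated with the figures mentioned in this article: 2,000 words a day for a specialized translation, about 7,000 words a day for post-editing, and a post-editing rate of a third of the normal rate. The following is a minimal sketch; income is expressed in multiples of the normal per-word rate so that no actual prices need to be assumed, and the function names are illustrative.

```python
from fractions import Fraction

# Figures taken from the text above.
TRANSLATION_WORDS_PER_DAY = 2_000
POST_EDITING_WORDS_PER_DAY = 7_000
POST_EDITING_RATE_FACTOR = Fraction(1, 3)  # a third of the normal rate per word


def daily_income(words_per_day, rate_factor=Fraction(1)):
    """Daily income in multiples of the normal per-word rate."""
    return words_per_day * rate_factor


def break_even_words(rate_factor, baseline_words=TRANSLATION_WORDS_PER_DAY):
    """Words per day needed at a reduced rate to match normal translation income."""
    return baseline_words / rate_factor


translation = daily_income(TRANSLATION_WORDS_PER_DAY)
post_editing = daily_income(POST_EDITING_WORDS_PER_DAY, POST_EDITING_RATE_FACTOR)
# Poor raw output: the text is effectively retranslated at translation speed.
retranslation = daily_income(TRANSLATION_WORDS_PER_DAY, POST_EDITING_RATE_FACTOR)

print(f"Translation:            {float(translation):.0f} x normal rate")
print(f"Post-editing:           {float(post_editing):.0f} x normal rate")
print(f"Retranslation at 1/3:   {float(retranslation):.0f} x normal rate")
print(f"Break-even throughput:  {float(break_even_words(POST_EDITING_RATE_FACTOR)):.0f} words/day")
```

Under these assumptions, a post-editor has to process 6,000 words a day just to match the income from normal translation. If the raw output is so poor that the text has to be retranslated, income drops to a third. This is one reason why an hourly rate can be the fairer model when the editing effort cannot be predicted.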

Extra Tip

If you are interested in machine translation and have further questions about its introduction, we offer extensive training on your premises. Further information can be found on the website Introduction of MT with Across.