Machine translation for companies

"I've read that neural machine translations have reached the quality of human translations. From now on, we should have everything translated automatically and then just do a little post-editing!"

Have you heard views like that expressed in your company? The subject of machine translation (MT) has given rise to polarized opinions and is highly susceptible to speculation, misunderstandings, and myths.

If your company is seriously considering integrating machine translation into its workflow, getting the most out of this tool requires a comprehensive, well-planned project.

The pace of change in the translation industry keeps accelerating and is driven largely by advances in technology. This article provides some general information but cannot replace professional advice on MT systems. It is a good idea to consult an expert, especially for extensive projects.

Before integrating machine translation into the translation process, various factors must be taken into account.

Generic vs. customizable machine translation systems

Google Translate, DeepL, Microsoft Translator, Amazon Translate, and the like are generic machine translation systems. The key feature of these systems is that they have been trained with huge amounts of data from a wide spectrum of subject areas. This makes the translations very fluent, but the terminology may not suit every subject area, and domain-specific content may even be mistranslated due to a lack of relevant training data. Generic systems are therefore more suitable for enterprises that do not use highly specialized terminology.
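As a minimal illustration of how such a generic system is typically consumed, the following Python sketch requests a raw translation from Amazon Translate via the boto3 SDK. The region and example sentence are placeholders, and AWS credentials are assumed to be configured in the environment.

```python
# Minimal sketch: requesting a raw translation from a generic MT system
# (Amazon Translate via the boto3 SDK). Region and text are placeholders.
import boto3

translate = boto3.client("translate", region_name="eu-central-1")

response = translate.translate_text(
    Text="The pump must be vented before initial start-up.",
    SourceLanguageCode="en",
    TargetLanguageCode="de",
)

# The raw machine translation; the terminology may or may not match your
# corporate wording, which is exactly the limitation of generic engines.
print(response["TranslatedText"])
```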

By contrast, customizable machine translation systems are trained with the customer's own data in order to take both the terminology and the company's unique language into consideration in the translations. These engines deliver better raw translations that require less post-editing.

There are two different approaches to the use of customizable machine translation systems. Some providers use a single engine for each language, with the engine then detecting the various domains automatically. Other providers recommend a separate customized engine for each language and domain. With both approaches, the company's own terminology is taken into consideration in the training of the engine. After all, a variety of different terms are often used for the same concept in different domains, such as in contracts and in technical documentation.

Appropriate maintenance of translation memories and terminology databases is a key prerequisite for this customization.
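Full engine training is handled differently by each provider. As a lightweight, hedged illustration of feeding corporate terminology into an engine, the following sketch uses Amazon Translate's custom terminology feature; the glossary content, names, and region are made up for the example.

```python
# Lightweight illustration only: feed a small corporate glossary to Amazon
# Translate as a "custom terminology". Names and content are hypothetical.
import boto3

translate = boto3.client("translate", region_name="eu-central-1")

# CSV format: one header row with language codes, then one term pair per line.
glossary_csv = "en,de\ntorque wrench,Drehmomentschluessel\nball valve,Kugelhahn\n"

translate.import_terminology(
    Name="corporate-terms-en-de",
    MergeStrategy="OVERWRITE",
    TerminologyData={"File": glossary_csv.encode("utf-8"), "Format": "CSV"},
)

response = translate.translate_text(
    Text="Tighten the ball valve with a torque wrench.",
    SourceLanguageCode="en",
    TargetLanguageCode="de",
    TerminologyNames=["corporate-terms-en-de"],
)
print(response["TranslatedText"])
```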

The providers of customizable systems include SYSTRAN, SmartMATE, KantanMT, and Omniscien.

Extra tip:
The article "Machine learning is fun part 5: Language translation with deep learning and the magic of sequences" by Adam Geitgey can help you to better understand the fundamentals of machine translation at the technical level even if you do not have any background knowledge.

Choosing the right machine translation provider

There are now more than 100 companies offering machine translation and related services.

The "Machine Translation Market Report 2017", prepared by TAUS (Translation Automation User Society), provides an overview of the market. In the report (Joscelyne et al. 2017: 27), the providers are divided into six different categories:

  • Pure MT providers: These companies develop their own MT technology and sell it either on a license basis or on a software-as-a-service (SaaS) basis. Examples include SYSTRAN, PROMT, KantanMT, TextShuttle, and Omniscien.
  • Corporate MT systems: This is a rapidly growing group of large translation buyers that build their own MT capabilities, often on the basis of an open-source MT technology such as OpenNMT or Marian.
  • Language service providers (LSP): An increasing number of language service providers are also developing MT systems. Examples in this area include RWS Moravia and Capita Translation.
  • Value-added resellers: These are translation technology providers that, in addition to their own services, resell MT engines of various providers. Examples include Memsource, Lingotek, and Lingo24.
  • Professional services: Some companies specialize in consulting services for the various systems, provision of training data, staff training, or evaluation of machine translations. Examples include Datamundi, CrossLang, and Appen. Some language service providers, such as blc, do not offer their own machine translation but instead perform services such as analysis of a company's texts, extraction of terminology, training of engines of various providers with customized data, evaluation of the results, and integration of machine translation into the process.
  • Free MT providers: Users can have text translated free of charge in order to get an overview of foreign-language content. Examples include Google Translate and DeepL.

Before opting for a partner, you should thoroughly examine various MT engines and weigh their pros and cons. Be aware that implementing an MT system is an extensive process that does not guarantee a quick return on investment. A good partner will draw your attention to this and will not make unrealistic promises. In the long term, investing in machine translation is a meaningful upgrade of your translation process.

The quality of machine translations

As also explained in the first article on machine translation, stylistically demanding texts are not particularly well suited for machine translation alone. By contrast, MT is ideal for short, standardized sentences.

Nevertheless, a number of providers have specialized in the machine translation of marketing texts. Companies should check in each case whether the delivered quality meets their requirements.

The quality of the raw translations can be improved through pre-editing. A study conducted in this area by the because Group revealed the following: Using Google Translate, a recipe was translated from English into German. The error rate was 9.24 percent (50 errors in a text of 542 words). After the source text was revised into Simplified Technical English, the error rate dropped to 7.43 percent. In a final step, the source text was adjusted according to defined rules, which brought the error rate down to 3.95 percent.

Whether you work with statistical or neural machine translation, the quality always falls short of what can be achieved with human translation. In addition to pre-editing, post-editing is an essential part of the translation process that ultimately ensures the quality of the translated content. For example, defined quality criteria can be checked and the use of proper terminology ensured during post-editing.

The post-editing overhead depends on the result of the machine translation. Obviously, machine translations can only be as good as the content used to train the engine. A database containing your corporate terminology is just as important here as a sufficiently large volume of relevant texts.

Training data for the MT systems

As the saying goes in IT, "garbage in, garbage out". This is also true in the field of machine translation. Therefore, large subject-specific corpora are needed for customizing an MT engine.

This raises the question of whether machine translation is really only suitable for organizations with large translation memories. The answer is yes and no. Your chosen partner can show you various ways of gaining access to the needed training data. Still, internal translation memories and terminology databases are a must for customization.

During the period of active development of statistical systems, the accepted rule of thumb was that the more data was fed in, the better a machine translation system would get. For neural systems, this premise is no longer fully applicable. While neural machine translation also requires vast amounts of data (millions of words), that data must be of sufficiently high quality and domain-specific.

As a company's internal data is usually not sufficient on its own, internal translation memories often have to be supplemented with external, domain-specific corpora, that is, additional datasets, for training (a minimal data-preparation sketch follows the list below). A distinction can be made between free and fee-based corpora.

  • Texts provided by the MT service provider: Your MT provider usually has corpora from various domains. Depending on the type of contract, these corpora may be made available to you directly or may be offered for separate purchase.
  • Publicly available corpora: Free-of-charge corpora from various subject areas are available online and can be used to train machine translation systems. For instance, corpora such as the European Parliament proceedings or Wikipedia can be downloaded from the OPUS project website.
  • Fee-based corpora: These are curated datasets from various domains. The most important point of contact for these corpora is the data cloud of TAUS, which comprises more than 35 billion words in 600 language pairs.
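As a minimal data-preparation sketch, the following Python snippet converts a TMX export from a translation memory into plain parallel text files, the form most MT training pipelines expect. File names and language codes are illustrative, and real TMX exports may require more careful handling of inline markup.

```python
# Minimal sketch: turn a TMX export into plain parallel text files.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_parallel(tmx_path, src_lang="en", tgt_lang="de"):
    """Extract aligned segment pairs for the given language pair."""
    tree = ET.parse(tmx_path)
    pairs = []
    for tu in tree.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text around inline markup (bpt/ept/ph).
                text = " ".join("".join(seg.itertext()).split())
                if text:
                    segs[lang.split("-")[0]] = text
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

# Hypothetical file names for illustration.
pairs = tmx_to_parallel("internal_tm_export.tmx")
with open("train.en", "w", encoding="utf-8") as f_src, \
     open("train.de", "w", encoding="utf-8") as f_tgt:
    for src, tgt in pairs:
        f_src.write(src + "\n")
        f_tgt.write(tgt + "\n")
```

The resulting train.en and train.de files can then be deduplicated, cleaned, and handed to the provider or toolkit that trains the engine.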

The cost of machine translation

If you think that you will immediately save money as soon as you start using machine translation, you will be sorely disappointed. Of course, machine translation will eventually pay off for enterprises that translate millions of words every year. First, however, money needs to be invested: the machine translation system itself costs money, and training data needs to be procured.
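To make this reasoning tangible, here is a purely illustrative break-even sketch in Python. Every figure is a placeholder assumption, not a benchmark, and should be replaced with your own quotes and measured post-editing effort.

```python
# Purely illustrative break-even sketch with hypothetical figures.
words_per_year      = 2_000_000   # annual translation volume (assumption)
human_rate_per_word = 0.18        # EUR, full human translation (assumption)
pe_rate_per_word    = 0.09        # EUR, MT plus post-editing (assumption)
mt_setup_cost       = 60_000      # EUR, licensing, engine training, data, PM (assumption)
mt_running_cost     = 15_000      # EUR per year (assumption)

human_cost    = words_per_year * human_rate_per_word
mt_cost_y1    = words_per_year * pe_rate_per_word + mt_setup_cost + mt_running_cost
annual_saving = words_per_year * (human_rate_per_word - pe_rate_per_word) - mt_running_cost

print(f"Year 1: human {human_cost:,.0f} EUR vs. MT {mt_cost_y1:,.0f} EUR")
print(f"Setup pays for itself after about {mt_setup_cost / annual_saving:.1f} years")
```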

You should also factor in internal resources: to fully integrate machine translation into your workflow, consider appointing a dedicated project manager. No matter what form of MT you choose, human resources will be needed for post-editing. You can either cover this internally or hire external translators. Keep in mind that post-editing has to be done separately for each language.

The greatest savings potential is achieved when MT is combined with human translation and pre- or post-editing. If machine translation is integrated into existing technologies, such as a translation management system, you can take advantage of your complete databases and get your investment to pay for itself much more quickly.
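The following conceptual sketch shows one common way such an integration works during pre-translation: exact matches from the translation memory are reused, and only the remaining segments are sent to the MT engine and flagged for post-editing. All names and the mt_translate() hook are hypothetical.

```python
# Conceptual sketch of a pre-translation cascade (all names hypothetical).
from typing import Callable, Dict, List

def pretranslate(segments: List[str],
                 tm: Dict[str, str],
                 mt_translate: Callable[[str], str]) -> List[dict]:
    results = []
    for seg in segments:
        if seg in tm:
            # Exact TM match: reuse the stored human translation unchanged.
            results.append({"source": seg, "target": tm[seg], "origin": "TM"})
        else:
            # No match: fall back to the MT engine and mark for post-editing.
            results.append({"source": seg, "target": mt_translate(seg),
                            "origin": "MT", "needs_post_editing": True})
    return results
```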

For this reason, it might be good to offer your regular translators training in the field of post-editing. As your company is investing a lot of money in innovative technology, it is vital to have the needed expertise at the end of the supply chain in order to make effective use of new opportunities.

Fair compensation of freelance translators is another precondition for success. One of the main reasons why translators might be reluctant to accept post-editing projects is the meager payment they often receive for editing low-quality raw translations.

The reason for this is that in recent years, enterprises have increasingly created translations with generic engines and then sent off the untouched results for post-editing. Due to the high rate of errors, the texts often need to be retranslated, but at only a third of the normal per-word rate.

Collaborating with the translators

The rollout of machine translation can only succeed through close cooperation with the translators and post-editors at the end of the supply chain. After all, post-editing is indispensable for managing the quality of machine-translated texts. Although the quality of neural machine translation is improving day by day, it does not reach the quality of human translation at the document level.

Post-editing is not a skill that a translator automatically has or can develop overnight. Post-editing requires making quick decisions and corrections. To work productively, a post-editor needs to be able to process about 7,000 words a day, compared to an average of 2,000 words a day for a "conventional" specialized translation.

For this reason, open and honest communication between customers and providers is urgently recommended. If you know that the engine you use generates a lot of errors, you should inform the language service provider of this and pay a higher rate. If the engine produces fewer errors, the rate per word can be adjusted accordingly.

However, as the editing overhead cannot be estimated accurately in advance, it might be a good idea to introduce an hourly rate for post-editing. In the last step, the final translation should be reviewed once more in order to eliminate any remaining errors. The principle of dual control should also be followed for machine-translated texts, especially if companies need to prepare translations according to established quality standards.

Conclusion: Tapping into the full potential

The use of new technologies and artificial intelligence holds great potential for accelerating translation processes. However, you should plan the rollout of machine translation comprehensively to achieve the best results. After all, there are several stumbling blocks that need to be cleared out of the way through appropriate preparation.

Whether you choose statistical or neural machine translation, you need a sufficient volume of training data. This data has a significant influence on the translation quality and also has a direct impact on turnaround times and costs for services such as post-editing. In addition, you will usually make the most of MT's potential if your machine translation system is linked to a translation management system. On this basis, everyone involved in the supply chain has direct access to your databases and your quality criteria. The Across Language Server also makes it possible for you to seamlessly integrate machine translation into your translation process.