Artificial intelligence and machine translation

Across develops own MT system

News from the world of AI

For anyone interested in the topic of artificial intelligence (AI), 2022 was a momentous year. It is remarkable what can now be done with AI. For example, it was announced in September that game designer Jason Allen had won first place in an art competition in the digital art/digitally enhanced photography category with an AI-generated artwork. He produced the piece with the AI software Midjourney, a process that took about 80 hours.

DALL·E 2 from OpenAI operates according to a very similar principle. Unveiled in April 2022, the software caught the public's attention because it can also artificially create visual images such as photographs, drawings, paintings, 3D visualizations, and more. The software has been freely available for everyone to use since September 2022. For anyone who hasn't yet seen what the new AI can do, here is a small collection of images created with DALL·E 2:

The way the software works is quite simple: Users enter text to describe what they want to see, and the AI quickly delivers four (original!) image versions of the desired motif. The more precisely you describe the subject matter you want to see, the more complex the image becomes.

OpenAI's latest project has been available to the general public for just a few days (as of December 7, 2022) and is at least as ambitious as DALL·E 2. It is a text generator that works in a manner reminiscent of other chatbots, yet is completely different: ChatGPT. You can ask ChatGPT all sorts of questions, and the answers it provides are surprisingly good in most cases. The text generator can even output entire scripts that solve a wide variety of problems in a programming language of your choice.

As an example, we asked ChatGPT to explain neural machine translation. Our submission to ChatGPT: "Explain the advantages and disadvantages of neural machine translation. Write a short article about it." The output is as follows, and you can judge the quality for yourself:

  • Neural machine translation (NMT) offers several advantages over traditional machine translation. One of the most important benefits is the improved translation quality achieved through the use of neural networks. Neural networks are powerful tools that make it possible to identify complex patterns and relationships in data and thus improve translations.
  • Another advantage of NMT is its ability to translate entire sentences, not just individual words or phrases. This results in smoother and more natural translation that is more similar to the work done by human translators.
  • However, there are also some disadvantages to using NMT. One of the biggest drawbacks is that the technology is still very expensive and complex, making it difficult for many companies and organizations to implement. In addition, NMT is still not perfect and can sometimes output errors or unnatural translations.
  • Overall, NMT offers improved translation quality and the ability to translate entire sentences, but its high cost and potential errors still make it difficult for many users to use.

The future of machine translation

This brief excursion into the news of the world of artificial intelligence is meant to underscore our view on the translation industry: In the future, (nearly) every translation will be supported by machine translation (MT), with language experts involved to a greater or lesser extent.

Before neural machine translation came along, few people would have agreed with this statement, but it is now widely accepted that the quality of MT is good enough to add real value to the translation process.

According to a Gartner study, 75% of the global translation volume will be machine translated by 2025, and all of us in the translation industry will need to adapt to this new reality. However, two key aspects will determine the efficiency of machine translation:

  1. The perfect interaction of man and machine.
  2. The size and quality of the translation memory (TM) and terminology database (TB) as well as how they are integrated into the MT system.

In the area of artificial intelligence and machine translation, a lot has also happened at Across this year that will help face these challenges. In addition to the expansion of interfaces with existing providers, we will be rolling out our own machine translation system at the beginning of 2023: AcrossMT.

The perfect interaction of man and machine

The output of MT engines is getting better and better, but in many cases, what they produce cannot be used without modification. This is where post-editors come in: they improve the quality of the machine translation. A distinction is made between light and full post-editing. Further information on this topic is available in the article "Post-editing—better quality for machine translation".

We know how difficult and tedious post-editing can sometimes be, because the way the work is done today is no longer in keeping with the times. To take full advantage of the possibilities of machine translation, new tools are needed that are not yet available on the market. More on that later.

Size and quality are what matter

Most of us have worked with MT at some point by now, even if only for testing purposes. In most cases, we were probably using a generic engine. Although the good results of generic engines cannot be ignored, a customized engine is the preferred solution in most use cases.

When an engine is trained with a company's large translation memory and a terminology database is then imported, the quality of the output improves even further. On this basis, translators and post-editors can work better, faster, and with greater accuracy.

Precisely this data has been available in many companies for years. Companies that are currently using their existing data only for translation memory matches have a great deal of untapped potential at their disposal: The data is a goldmine, as an MT engine trained with such information is extremely valuable. The results can be further customized with additional metadata: style, gender, domain, subject, sentence length, language variant—everything is possible.
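As a rough illustration of how existing translation memory data might be prepared as training material, the following Python sketch reads segment pairs from a translation memory and applies a simple length filter. It assumes the translation memory has been exported in the TMX exchange format. The helper name `read_tmx_pairs`, the sample data, and the `max_words` threshold are hypothetical choices made for this example, not part of AcrossMT or any Across product.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang, max_words=80):
    """Return (source, target) segment pairs from a TMX document string.

    Hypothetical helper for illustration only. Translation units that lack
    one of the two languages, contain an empty segment, or whose source
    segment is longer than max_words are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        src = segments.get(src_lang.lower(), "")
        tgt = segments.get(tgt_lang.lower(), "")
        if src and tgt and len(src.split()) <= max_words:
            pairs.append((src, tgt))
    return pairs


# Minimal, made-up sample; a real TMX header carries more attributes.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Tighten the screw.</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Ziehen Sie die Schraube fest.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Untranslated segment.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE_TMX, "en-US", "de-DE")
```

In this sketch, the second translation unit is dropped because it has no German segment. Metadata such as domain or language variant could be used as further filter criteria in the same way, depending on what the export actually contains.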

The Across solution

At Across, we have experienced this paradigm shift firsthand in recent years, and we have also seen it play out among our customers. In interviews, our customers report clear trends in their companies: orders are getting bigger, efficiency and cost pressures are increasing, translations are being made into more and more languages, and so on. The majority of the customers we interviewed are also certain that MT is the future. However, many still say that they do not use MT in their translation processes.

We asked ourselves: What is the problem? The answers we received were clear: companies have too little time to deal with the topic, the topic is too complex, it is hard to get an overview, and support is needed to move forward.

Across conducted intensive research to identify potential areas of opportunity for bringing about the perfect interplay of human and machine. We have seen that the potential is definitely there, but in order to take full advantage of it, it is necessary to develop new features for machine translation that MT vendors have not yet brought to the marketplace. One thing is clear: In the future, a translation management system or a CAT tool will only be successful if it is extensively linked to an MT system. However, without our own MT or a close partnership with a specific provider, we will not be able to achieve our goals.

We weighed all the options and alternatives, but it was clear to us at the end of our research process: We must and will develop our own MT system—AcrossMT. Why? We want to bring the data in-house, we don't want a black box or dependencies, and we want to have an impact on data quality and connect the MT to our own systems. This approach will accelerate our processes and ensure better translation quality for our customers.

However, our system is not meant to be just another one of many that you connect to the translation management system (TMS) via an application programming interface (API). Instead, we are developing a completely new product with the MT system at its core. By the time this product is launched, we will already be able to offer features that set us apart: For example, data will be regularly exchanged between AcrossMT and the Across Language Server. Based on this data, the engines will be continuously retrained. This ensures that all available data is used and that the engines get progressively better.

First details about AcrossMT

AcrossMT is a customized MT solution with a primary focus on the technical industry that will be trained with customer data. According to our tests, the results of AcrossMT are measurably better than those of generic MT solutions. AcrossMT also offers a high degree of data security, as no data leaves the company, and the pricing is fair (based only on volume).

As a long-standing customer of Across, you are perfectly positioned: thanks to crossTank and crossTerm, you can use a high-quality MT engine that is trained with your own data. Since Across relies on its own MT system that is extensively integrated into the TMS, you are in very good hands, both today and in the future.

AcrossMT will be available at the beginning of 2023, but feel free to contact us today, as we would be happy to advise you.