Tip 3: Understand Machine Translation
As mentioned previously, you can become a post-editing expert by knowing as much as possible about machine translation. For example, you should know how the various systems work and how enterprises can prepare their texts for machine translation.
Though neural machine translation is increasingly becoming the system of choice, it is good to know the basics of rule-based and statistical systems.
Rule-based Machine Translation
The rule-based approach is the classic MT method. Developing a rule-based system is very costly and time-consuming, as every linguistic peculiarity has to be entered manually, which is why this approach is gradually being abandoned. It is worth noting, however, that rule-based machine translation delivers good terminology suggestions, because corporate terminology is entered systematically into the system's dictionaries. Moreover, the translations are always complete, and the results are predictable. The main disadvantage is that the translations sound very mechanical and the sentence structure often reads unnaturally.
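To make the idea more concrete, here is a minimal, purely illustrative Python sketch of a transfer-style rule-based system: a hand-maintained dictionary plus an explicit restructuring rule. The words and the single rule are invented for the example; real systems contain many thousands of such entries, which is exactly what makes them so expensive to build.

```python
# Toy sketch of the rule-based (transfer) approach: a hand-built lexicon plus
# explicit structural rules. All entries below are invented for illustration.

LEXICON = {                       # bilingual dictionary incl. fixed terminology
    "der": "the",
    "drucker": "printer",
    "druckt": "prints",
    "nicht": "not",
}

def translate_rule_based(sentence: str) -> list[str]:
    # Step 1: word-by-word lookup in the dictionary.
    tokens = [LEXICON.get(word, word) for word in sentence.lower().split()]
    # Step 2: structural rule -- English negation needs "does not" + bare verb,
    # so the literal "prints not" is rewritten to "does not print".
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "prints" and i + 1 < len(tokens) and tokens[i + 1] == "not":
            out += ["does", "not", "print"]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(" ".join(translate_rule_based("Der Drucker druckt nicht")))
# -> "the printer does not print"
```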
Statistical Machine Translation
Statistical machine translation generates translations on the basis of probability calculations. The information required for this is extracted from bilingual corpora. Because sentence structures and terminology differ from corpus to corpus, the output may suffer from a lack of consistency, which can impair readability. Moreover, these systems can produce incomplete translations, add spurious information, or make capitalization and spelling mistakes.
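The probability calculation can be illustrated with a toy example. The sketch below estimates translation probabilities as relative frequencies over invented phrase pairs; it is not a real SMT decoder, but it shows why competing translations of the same term lead to the inconsistency a post-editor has to watch for.

```python
# Minimal sketch of the statistical idea: translation probabilities are
# estimated from how often phrase pairs co-occur in a bilingual corpus.
# The phrase pairs below are invented purely for illustration.

from collections import Counter

# Hypothetical aligned phrase pairs extracted from a bilingual corpus.
phrase_pairs = [
    ("vertrag", "contract"), ("vertrag", "contract"),
    ("vertrag", "agreement"),          # the same word gets competing translations
    ("kuendigen", "terminate"), ("kuendigen", "cancel"),
]

pair_counts = Counter(phrase_pairs)
source_counts = Counter(src for src, _ in phrase_pairs)

def translation_probability(src: str, tgt: str) -> float:
    """Relative-frequency estimate of p(tgt | src)."""
    return pair_counts[(src, tgt)] / source_counts[src]

print(translation_probability("vertrag", "contract"))   # ~0.67
print(translation_probability("vertrag", "agreement"))  # ~0.33
# Both options keep a non-zero probability, so different sentences may end up
# with different terms for the same concept.
```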
Neural Machine Translation
Neural machine translation is based on an artificial neural network (ANN) that mimics the neural connections in the brain. This approach, too, analyzes parallel corpora to produce a translation. The difference is that the ANN implicitly takes the grammatical context of a sentence into account: texts are translated at sentence level rather than phrase level, which greatly improves readability. To date, the greatest challenge of neural machine translation is the limited vocabulary the models can process (currently 50,000 to 80,000 words), so post-editing needs to focus more on the lexicon than on the grammar. Otherwise, the disadvantages of this approach are much the same as those of statistical machine translation, and because translations created with NMT read so fluently, there is a risk of overlooking errors. Nevertheless, neural machine translation is the approach that currently delivers the best results, and a more careful review at the lexical level is usually worth the effort.
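The vocabulary limit can be illustrated with a small, hypothetical sketch: any word that falls outside the model's known tokens is mapped to a placeholder (or split into subword pieces), and that is exactly where lexical errors tend to appear. The tiny vocabulary below is invented for the example.

```python
# Sketch of why a fixed vocabulary matters for NMT: a model only "knows" its
# 50,000-80,000 most frequent tokens; everything else becomes <unk>.
# The miniature vocabulary here is invented for illustration.

vocab = {"<unk>": 0, "the": 1, "printer": 2, "does": 3, "not": 4, "print": 5}

def encode(sentence: str) -> list[int]:
    """Map each word to its vocabulary id, falling back to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

print(encode("the printer does not print"))        # [1, 2, 3, 4, 5]
print(encode("the Farbdruckwerk does not print"))  # [1, 0, 3, 4, 5]
# The rare compound "Farbdruckwerk" falls outside the vocabulary, so the model
# never sees it -- the kind of lexical gap post-editors need to catch.
```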