Across runs several text preprocessing algorithms to prepare text for translation, such as:
- splitting text into sentences and words ("tokenization")
- detecting abbreviations
- detecting and converting the format of dates, times, and numbers

These preprocessing functions are language-specific and can be modified in the system settings of the Across Language Server.
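To show why number handling depends on the language, here is a minimal sketch of converting a number between two formatting conventions. It is not Across's implementation. The `NUMBER_FORMATS` table and the `convert_number` function are hypothetical, the separators are assumed values for illustration, and the sketch only handles unsigned numbers without checking where grouping separators are placed.

```python
# Illustrative per-language conventions: (decimal separator, thousands separator).
# Assumed values for this sketch; Across keeps its own rules in the server settings.
NUMBER_FORMATS = {
    "en-US": (".", ","),
    "de-DE": (",", "."),
}


def convert_number(text: str, source: str, target: str) -> str:
    """Rewrite an unsigned number from the source language's format into the target's."""
    src_dec, src_group = NUMBER_FORMATS[source]
    tgt_dec, tgt_group = NUMBER_FORMATS[target]

    # Split at the source decimal separator and drop the source grouping characters.
    integer, _, fraction = text.partition(src_dec)
    digits = integer.replace(src_group, "")
    if not digits.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError(f"not an unsigned {source} number: {text!r}")

    # Regroup the integer part in blocks of three digits, counting from the right.
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    result = tgt_group.join(reversed(groups))
    return result + tgt_dec + fraction if fraction else result


print(convert_number("1,234,567.89", "en-US", "de-DE"))  # 1.234.567,89
print(convert_number("1.234,5", "de-DE", "en-US"))       # 1,234.5
```

The same digits read as different values depending on the convention ("1.234" is one thousand two hundred thirty-four in German but about one and a quarter in English), which is why these rules are configured per language.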
The Text Preprocessing API makes these preprocessing functions available to other applications, based on individually defined rules.
The current version of the API supports splitting text into sentence ranges, converting numeric fields, and temporarily adding and deleting abbreviations.
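The interplay between sentence ranges and abbreviations can be sketched as follows. This is a local stand-in under stated assumptions, not the Across API: the `SentenceSplitter` class and its `add_abbreviation`, `delete_abbreviation`, and `sentence_ranges` methods are hypothetical names, the splitting rule (a run of `.`, `!`, or `?` followed by whitespace or the end of the text) is a deliberately simple heuristic, and "temporary" here just means the abbreviation list lives in memory and is never persisted.

```python
import re
from typing import Iterable, List, Set, Tuple


class SentenceSplitter:
    """Hypothetical stand-in for a sentence-range service; not the Across API."""

    def __init__(self, abbreviations: Iterable[str]) -> None:
        # In-memory only, so additions and deletions last for this object's lifetime.
        self.abbreviations: Set[str] = set(abbreviations)

    def add_abbreviation(self, abbr: str) -> None:
        self.abbreviations.add(abbr)

    def delete_abbreviation(self, abbr: str) -> None:
        self.abbreviations.discard(abbr)

    def sentence_ranges(self, text: str) -> List[Tuple[int, int]]:
        """Return (start, end) character offsets of each sentence in text."""
        ranges: List[Tuple[int, int]] = []
        start = 0
        # Candidate boundary: terminal punctuation followed by whitespace or end of text.
        for match in re.finditer(r"[.!?]+(?=\s|$)", text):
            end = match.end()
            # The token that ends at this punctuation, e.g. "Dr." or "arrived.".
            token = text[start:end].split()[-1]
            if token in self.abbreviations:
                continue  # a known abbreviation does not end the sentence
            ranges.append((start, end))
            # Skip whitespace so the next range starts at a non-space character.
            start = end
            while start < len(text) and text[start].isspace():
                start += 1
        # Any trailing text without terminal punctuation forms a final range.
        if text[start:].strip():
            ranges.append((start, len(text.rstrip())))
        return ranges


text = "Dr. Smith arrived. He left at 5 p.m. today."
splitter = SentenceSplitter({"Dr.", "p.m."})
print([text[a:b] for a, b in splitter.sentence_ranges(text)])
# ['Dr. Smith arrived.', 'He left at 5 p.m. today.']

splitter.delete_abbreviation("p.m.")
print([text[a:b] for a, b in splitter.sentence_ranges(text)])
# ['Dr. Smith arrived.', 'He left at 5 p.m.', 'today.']
```

Returning character offsets rather than substrings lets the caller map each sentence back onto the original text, and the second call shows how removing a single abbreviation changes where sentences are split.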