Removal of Duplicates

Remove sentence duplicates

Sentence duplicates are segments that occur two or more times in identical form. Normally, a segment should occur only once in crossTank.

The Remove sentence duplicates action deletes all sentence duplicates in crossTank. In addition to exact sentence duplicates, the action also deletes sentence duplicates that differ only in date, time, or number formats or in placeables, tags, or formatting. Translation duplicates that may be created during the deletion of sentence duplicates are removed as well.

The action distinguishes between upper and lower case. Thus, the two segments "Danach Höhe neu einstellen" and "danach Höhe neu einstellen" would not be treated as sentence duplicates.
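The case-sensitive comparison can be illustrated with a minimal sketch. It is not the Across implementation; the function name group_exact_duplicates and the plain list of segment texts are assumptions made for illustration only.

```python
from collections import defaultdict


def group_exact_duplicates(segments):
    """Group the positions of segments whose text is exactly identical.

    Hypothetical helper: the comparison is case-sensitive, so no case
    folding is applied before the texts are compared.
    """
    groups = defaultdict(list)
    for index, text in enumerate(segments):
        groups[text].append(index)
    # Only texts that occur more than once count as duplicates.
    return {text: positions for text, positions in groups.items() if len(positions) > 1}


segments = [
    "Danach Höhe neu einstellen",
    "danach Höhe neu einstellen",  # differs only in case: not a duplicate
    "Danach Höhe neu einstellen",
]
duplicates = group_exact_duplicates(segments)
```

In this sketch, only the first and third segments form a duplicate group; the lowercase variant is kept as a separate segment.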

Generally, segments and translation units are adjusted or changed automatically only if two crossTank options are enabled: the option for auto-changes of date, time, and number formats and the option for auto-adjustment of placeables, formatting, and tags.

Both options are enabled by default. You will find the options under Tools > System Settings > General > crossTank.

Remove sentence duplicates by text

The action Remove sentence duplicates by text enables you to quickly and easily remove duplicates of a source-language segment.

If, for example, the fuzzy search of crossDesk or the concordance search of the crossTank Manager shows many identical or seemingly identical translation units for a sentence, you can use this action to remove these duplicates.

To do this, enter in the input field the source-language sentence for which sentence duplicates may exist. If necessary, select the source and target languages in which the duplicates are to be removed.

The following types of duplicates are taken into consideration and deleted by the action:
  • Sentence duplicates
  • Sentence duplicates with redundant elements
  • Translation duplicates

The action distinguishes between upper and lower case. Thus, the two segments "Danach Höhe neu einstellen" and "danach Höhe neu einstellen" would not be treated as sentence duplicates.

Remove translation duplicates

From a technical perspective, a crossTank entry is a connection between two segments (a source segment and a target segment) that are stored separately in the crossTank database. Translation duplicates are multiple identical crossTank entries, i.e. multiple connections between the same source and target segments. Thus, they are redundant and are not needed.

The action Remove translation duplicates deletes translation duplicates from the crossTank database. Translation duplicates usually originate from program or database errors. The action deletes the superfluous entries, so that only one crossTank entry is left after the deletion.

The deletion of translation duplicates is subject to the following rules:
  • All existing values of single-value system attributes must be identical. If one or several single-value attributes of two translation units differ, the duplicates will not be removed.
  • The values of multiple-value system attributes are merged.
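The two rules above can be sketched as follows. This is a simplified model, not the Across implementation: the Entry class, the attribute names "status" and "projects", and the strict comparison of all single-value attributes are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical model of a crossTank entry (source/target connection)."""
    source: str
    target: str
    single: dict = field(default_factory=dict)  # single-value attributes
    multi: dict = field(default_factory=dict)   # multiple-value attributes: name -> set


def remove_translation_duplicates(entries):
    """Keep one entry per source/target pair when the rules allow it.

    Assumption: two entries are merged only if all of their single-value
    attributes are identical; multiple-value attributes are then unioned.
    """
    kept = []
    for entry in entries:
        for existing in kept:
            same_pair = (existing.source, existing.target) == (entry.source, entry.target)
            if same_pair and existing.single == entry.single:
                # Rule 2: merge the values of multiple-value attributes.
                for name, values in entry.multi.items():
                    existing.multi.setdefault(name, set()).update(values)
                break
        else:
            # Rule 1 not satisfied for any kept entry: keep this one as well.
            kept.append(Entry(entry.source, entry.target, dict(entry.single),
                              {name: set(values) for name, values in entry.multi.items()}))
    return kept


entries = [
    Entry("Bolt", "Schraube", {"status": "ok"}, {"projects": {"P1"}}),
    Entry("Bolt", "Schraube", {"status": "ok"}, {"projects": {"P2"}}),
    Entry("Bolt", "Schraube", {"status": "draft"}, {"projects": {"P3"}}),
]
result = remove_translation_duplicates(entries)
```

In this sketch, the first two entries collapse into one entry whose "projects" attribute holds both values, while the third entry remains because its single-value attribute differs.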

Sentence duplicates with redundant elements

By default, Across stores segments in which only certain elements differ as one segment. These elements may be date, time, or number formats or placeables, formatting, or tags.

If possible, this stored segment is auto-adjusted when, for example, a segment that contains the same text but different numbers needs to be translated, so that a 100% match is available in this case as well.

Due to handling or program errors, crossTank may contain several segments (duplicates) that only differ in terms of the date, time, or number formats or placeables, formatting, or tags.

If the segments contain tags, duplicates are removed both when the segments differ only in their tag names and when they differ in tag attributes or attribute values.

The following examples demonstrate the properties of this special type of duplicate:

Example 1: Duplicates with redundant numbers

  • Segment #1: "Bolt 4.5 x 60 mm"
  • Segment #2: "Bolt 5.0 x 60 mm"

Normally, only one crossTank segment would exist. Across would automatically change the contained numbers if necessary.
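Why the two segments count as duplicates can be shown with a minimal sketch in which every number is replaced by a neutral marker before the texts are compared. The regular expression, the "{num}" marker, and the function name number_key are assumptions for illustration; they do not describe how Across recognizes numbers internally.

```python
import re

# Assumed number pattern: digits with optional "." or "," separated groups.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def number_key(segment):
    """Return the segment text with every number replaced by a marker."""
    return NUMBER.sub("{num}", segment)


first = number_key("Bolt 4.5 x 60 mm")
second = number_key("Bolt 5.0 x 60 mm")
same_segment = first == second
```

Both segments reduce to the same key in this sketch, which is why only one of them would need to be stored.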

Example 2: Duplicates with redundant tags

  • Segment #1: "This is a <b>test</b>."
  • Segment #2: "This is a <i>test</i>."

Normally, only one crossTank segment would exist. Across would automatically adjust the contained tags if necessary.
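The same idea can be sketched for tags: tag names and attributes are replaced by a neutral marker so that only the tag positions and the text remain. The regular expression and the function name tag_key are assumptions for illustration, not the Across tag handling.

```python
import re

# Assumed tag pattern: "<", an optional "/", then anything up to ">".
TAG = re.compile(r"<(/?)[^<>]*>")


def tag_key(segment):
    """Replace every tag by a neutral opening or closing marker."""
    return TAG.sub(lambda match: "<" + match.group(1) + "tag>", segment)


bold = tag_key("This is a <b>test</b>.")
italic = tag_key("This is a <i>test</i>.")
with_attribute = tag_key('This is a <span class="x">test</span>.')
```

In this sketch, all three variants yield the same key, which covers both differing tag names and differing attributes or attribute values.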