Splitting Paragraphs

By default, crossDesk displays the source texts, and thus the target texts, as paragraphs, and you edit them paragraph by paragraph. Alternatively, you can select sentence-based display and editing. In this case, paragraphs comprising several sentences are split into individual sentences that are displayed separately. Splitting paragraphs makes it considerably easier to process documents with many long paragraphs in crossDesk.

Additionally, you can specify that paragraphs exceeding a certain number of characters are split.

You can select sentence-based paragraph splitting for each individual format and, by using different document settings templates, for specific documents. To do so, click the Splitting Settings button under Tools > System Settings > Document Settings > desired document template.

Splitting must be selected before a document is checked in and cannot be undone in a document that has already been checked in. To obtain the document without splitting, you must check it in again without splitting.

The settings defined in the document settings templates, e.g. those for locking paragraphs, are applied before the splitting. Thus, the paragraphs will first be locked and then split.

The splitting is not applied to paragraphs with a length restriction in ML formats (HTML, SGML, and XML).

Sentence-Based Splitting of Paragraphs

Sentence-based splitting of paragraphs can be selected using a corresponding option in the splitting settings of the respective document settings template.

The following example uses a paragraph consisting of three sentences to show how sentence-based splitting works:

  • Sentence-based splitting (screenshot: cDesk_source-view_segmentierung-satzbasiert): The paragraph has been split into three sentences. Each sentence is displayed separately in crossDesk.
  • Paragraph-based splitting (screenshot: cDesk_source-view_segmentierung-absatzbasiert): All three sentences of the paragraph are displayed in one paragraph in crossDesk.

When sentence-based splitting is used, no sentence detection is performed in the Target Editor, as every source-language paragraph contains only one sentence.

Across performs the sentence-based paragraph splitting on the basis of the sentence detection settings. These can be accessed in the system settings under Tools > System Settings > General > Language Settings.
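
The effect of sentence-based splitting can be illustrated with a minimal Python sketch. It is not the Across implementation: the actual splitting follows the sentence detection settings, whereas the sketch assumes a simplified rule (a period, exclamation mark, or question mark followed by whitespace ends a sentence). The names SENTENCE_END and split_into_sentences are hypothetical.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# Across uses its configurable sentence detection settings instead.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(paragraph: str) -> list[str]:
    """Return the sentences of a paragraph as separate segments."""
    return [s for s in SENTENCE_END.split(paragraph.strip()) if s]


paragraph = (
    "This is the first sentence. Is this the second one? "
    "Yes, and this is the third!"
)

# Each sentence becomes its own segment, as in the sentence-based display.
for number, sentence in enumerate(split_into_sentences(paragraph), start=1):
    print(number, sentence)
```

A rule this simple would also split after abbreviations such as "e.g.", which is one reason why a configurable sentence detection is needed in practice.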

Trimming of Leading and Trailing Whitespace Characters

An additional option allows you to specify that leading and trailing whitespace characters of the sentences to be translated are trimmed in crossDesk during the translation of the document.

By default, when paragraphs are split into sentences, the whitespace characters between the sentences of the original paragraph are displayed in crossDesk (except for a single normal space, which is always hidden automatically). If the option for trimming leading and trailing whitespace characters is activated, these characters are trimmed automatically in crossDesk. During check-out, the trimmed whitespace characters are automatically reinserted in the target document.

The option mainly affects the storage of the translation units in crossTank: If the option is activated, the translation units are stored in crossTank without leading and trailing whitespace characters. As a result, the match rates of the crossTank search hits improve, and fewer translation units are stored.

Example

The following example uses a paragraph consisting of two sentences separated by a tab to show how the option works:

  • Whitespace trimming option deactivated (screenshot: cDesk_source-view_ohne-whitespace-tilgung): The tab has not been removed and is therefore displayed before the second sentence.
  • Whitespace trimming option activated (screenshot: cDesk_source-view_mit-whitespace-tilgung): The tab has been removed and is therefore not displayed before the second sentence.

Apart from normal spaces (0x0020), the trimmed whitespace characters may also be tabs or no-break spaces. The option takes into account all characters defined as whitespace characters in the language settings.

By default, the following characters are defined as whitespace characters (screenshot: sysset_allg_spracheinstellungen_wildcards):
  • NULL character
  • Horizontal tab
  • Spaces
  • Carriage return
  • Line feed
  • Vertical tab (soft return)
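
The trimming and the later reinsertion of whitespace characters can be sketched as follows. This is an illustration under assumptions, not the Across implementation: the constant WHITESPACE only models the default characters listed above (with "Spaces" reduced to the normal space), and the functions trim_segment and restore_segment are hypothetical names.

```python
# Assumption: the default whitespace set, i.e. NULL character, horizontal tab,
# normal space, carriage return, line feed, and vertical tab.
WHITESPACE = "\x00\t \r\n\x0b"


def trim_segment(segment: str) -> tuple[str, str, str]:
    """Split a segment into (leading whitespace, core text, trailing whitespace)."""
    without_leading = segment.lstrip(WHITESPACE)
    leading = segment[: len(segment) - len(without_leading)]
    core = without_leading.rstrip(WHITESPACE)
    trailing = without_leading[len(core):]
    return leading, core, trailing


def restore_segment(leading: str, translated_core: str, trailing: str) -> str:
    """Reinsert the trimmed whitespace around the translation (as at check-out)."""
    return leading + translated_core + trailing


# Second sentence of a paragraph whose sentences are separated by a tab.
leading, core, trailing = trim_segment("\tThis is the second sentence.")
print(repr(core))  # the text without the tab, as it would be stored
print(repr(restore_segment(leading, "Dies ist der zweite Satz.", trailing)))
```

Because only the core text would be stored, the same sentence yields the same translation unit regardless of the whitespace that surrounds it in a particular document.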

Splitting Paragraphs after Certain Number of Characters

Splitting of paragraphs after a certain number of characters can be selected using a corresponding option in the splitting settings of the respective document settings template. After activating the option, enter the number of characters (including spaces) after which the splitting is to be applied.

The splitting of paragraphs that are longer than the specified number of characters is governed by the following rules (example: paragraph splitting after 400 characters):
  • If the paragraph consists of several sentences, it will be split after the end of the last complete sentence before the defined number of characters is reached.
  • If the paragraph consists of a single sentence, it will be split according to the defined number of characters. Words are not split. If the defined number of characters is reached within a word, the paragraph will be split before this word.
  • If the paragraph is several times longer than the defined number of characters, it will be split into several paragraphs accordingly.
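
The rules above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the Across implementation: sentence ends are detected with a simplified rule (period, exclamation mark, or question mark followed by whitespace), the whitespace at each split position is dropped, and a single word that is longer than the limit is left unsplit. The names SENTENCE_END and split_by_length are hypothetical.

```python
import re

# Assumption: simplified sentence-end rule; Across uses its own
# sentence detection settings.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_by_length(paragraph: str, limit: int) -> list[str]:
    """Split a paragraph into parts of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be positive")
    parts = []
    rest = paragraph.strip()
    while len(rest) > limit:
        # Rule 1: split after the last complete sentence that fits the limit.
        fitting = [m for m in SENTENCE_END.finditer(rest) if m.start() <= limit]
        if fitting:
            last = fitting[-1]
            parts.append(rest[: last.start()])
            rest = rest[last.end():]
            continue
        # Rule 2: no sentence end fits, so split before the word in which
        # the limit is reached. Words are never split.
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            break  # one unbroken word longer than the limit stays as it is
        parts.append(rest[:cut])
        rest = rest[cut:].lstrip()
    # Rule 3 follows from the loop: long paragraphs yield several parts.
    if rest:
        parts.append(rest)
    return parts


text = (
    "Short one. Another short one. This final sentence is considerably "
    "longer than forty characters in total."
)
for part in split_by_length(text, 40):
    print(len(part), part)
```

The example uses a limit of 40 instead of 400 characters to keep the sample text short. The first part ends after the second sentence, and the long third sentence is then split between words.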

The following example shows how the splitting after a certain number of characters works:

  • No splitting (screenshot: cDesk_source-view_kein-splitting-nach-400-zeichen): The paragraph has not been split and is displayed as one paragraph in crossDesk.
  • Split paragraph after 400 characters (screenshot: cDesk_source-view_splitting-nach-400-zeichen): The paragraph contains more than 400 characters and has therefore been split after the last sentence before the number of 400 characters is reached.