Sentence detection
The sentence detection used by Across is rule-based, i.e. Across uses rules to determine where a sentence ends and a new sentence begins.
You can import or export language settings in XML format via Import or Export.
Sentence rules are structured as follows:
Part | Function | Example |
1 | Specifies which separator the rule handles | [?] |
2 | Type of rule, i.e. whether the rule defines the end of a sentence (+) or not (-). | + or - |
3 | The actual rule | [?^_] |
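The three-part structure can be illustrated with a short Python sketch that splits a rule string into its parts. This is only an illustration of the notation, not how Across itself reads rules; the names `SentenceRule` and `parse_rule` are hypothetical, and the sketch assumes that the separator in part 1 is always a single character, as in all rules shown on this page.

```python
import re
from typing import NamedTuple


class SentenceRule(NamedTuple):
    separator: str       # part 1: the character the rule handles
    ends_sentence: bool  # part 2: True for "+", False for "-"
    pattern: str         # part 3: the actual rule, placeholders left as written


# "[" separator "]", then "+" or "-", then "[" pattern "]"
_RULE_RE = re.compile(r"^\[(.)\]([+-])\[(.+)\]$")


def parse_rule(text: str) -> SentenceRule:
    """Split a rule such as "[?]+[?^_]" into its three parts."""
    match = _RULE_RE.match(text)
    if match is None:
        raise ValueError(f"not a sentence rule: {text!r}")
    separator, sign, pattern = match.groups()
    return SentenceRule(separator, sign == "+", pattern)
```

For example, `parse_rule("[?]+[?^_]")` yields the separator `?`, the type "end of sentence", and the pattern `?^_`.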
Default sentence rules
By default, Across uses the following sentence rules (Standard language set > Sentence rules):
Rule | Function |
[!]+[!^_] | An exclamation mark followed by a white space is interpreted as the end of a sentence. |
[!]-[!^_^a] | An exclamation mark followed by a white space and a lower case letter is not interpreted as the end of a sentence. |
[.]+[.^_] | A period followed by a white space is interpreted as the end of a sentence. |
[.]-[.^_^a] | A period followed by a white space and a lower case letter is not interpreted as the end of a sentence. |
[.]-[^_^n.] | A white space followed by a one-digit number and a period is not interpreted as the end of a sentence. Multi-digit numbers require additional rules with one placeholder ^n per digit, for example [.]-[^_^n^n.] for a two-digit number. |
[?]+[?^_] | A question mark followed by a white space is interpreted as the end of a sentence. |
[?]-[?^_^a] | A question mark followed by a white space and a lower case letter is not interpreted as the end of a sentence. |
[n]+[.\n] | A period followed by a backslash and the letter n is interpreted as the end of a sentence. The background to this rule is that the character sequence \n represents a line break, especially in the localization of software resources. In the following string, for example, the sentence ends after \n according to the rule: Cannot load file.\nError: 0x%x |
[n]+[!\n] | An exclamation mark followed by a backslash and the letter n is interpreted as the end of a sentence. |
[n]+[?\n] | A question mark followed by a backslash and the letter n is interpreted as the end of a sentence. |
[t]+[.\t] | A period followed by a backslash and the letter t is interpreted as the end of a sentence. The background to this rule is that the character sequence \t represents a horizontal tab, especially in the localization of software resources. In the following string, for example, the sentence ends after \t according to the rule: &Find...\tCtrl+F |
[t]+[!\t] | An exclamation mark followed by a backslash and the letter t is interpreted as the end of a sentence. |
[t]+[?\t] | A question mark followed by a backslash and the letter t is interpreted as the end of a sentence. |
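The interplay of the default rules can be explored with a small Python model. It is a sketch under stated assumptions, not the Across implementation: the precedence (a matching "-" rule overrides any matching "+" rule), the choice to end the sentence directly after the separator character, and all names (`compile_rule`, `matches`, `split_sentences`) are assumptions made for illustration. Only the placeholders ^_, ^a and ^n described on this page are supported.

```python
from typing import Callable, List, NamedTuple

# Placeholders documented on this page; any other ^x raises KeyError here.
PLACEHOLDERS = {
    "_": str.isspace,  # ^_ white space
    "a": str.islower,  # ^a lower case letter
    "n": str.isdigit,  # ^n digit
}

DEFAULT_RULES = [
    "[!]+[!^_]", "[!]-[!^_^a]",
    "[.]+[.^_]", "[.]-[.^_^a]", "[.]-[^_^n.]",
    "[?]+[?^_]", "[?]-[?^_^a]",
    r"[n]+[.\n]", r"[n]+[!\n]", r"[n]+[?\n]",
    r"[t]+[.\t]", r"[t]+[!\t]", r"[t]+[?\t]",
]


class Rule(NamedTuple):
    separator: str
    ends_sentence: bool
    tests: List[Callable[[str], bool]]  # one test per character of the pattern
    anchor: int                         # position of the separator in tests


def compile_rule(text: str) -> Rule:
    separator, sign, pattern = text[1], text[3], text[5:-1]
    tests, anchor, i = [], None, 0
    while i < len(pattern):
        if pattern[i] == "^" and i + 1 < len(pattern):
            tests.append(PLACEHOLDERS[pattern[i + 1]])
            i += 2
        else:
            literal = pattern[i]
            # Assumption: the first literal separator marks the rule position.
            if literal == separator and anchor is None:
                anchor = len(tests)
            tests.append(lambda c, literal=literal: c == literal)
            i += 1
    if anchor is None:
        raise ValueError(f"separator missing from pattern: {text!r}")
    return Rule(separator, sign == "+", tests, anchor)


def matches(rule: Rule, text: str, pos: int) -> bool:
    """Check the rule with its separator aligned to text[pos]."""
    start = pos - rule.anchor
    if start < 0 or start + len(rule.tests) > len(text):
        return False
    return all(test(text[start + k]) for k, test in enumerate(rule.tests))


def split_sentences(text: str, rule_texts=DEFAULT_RULES) -> List[str]:
    rules = [compile_rule(r) for r in rule_texts]
    sentences, begin = [], 0
    for pos, char in enumerate(text):
        hits = [r for r in rules if r.separator == char and matches(r, text, pos)]
        # Assumption: a matching "-" rule overrides any matching "+" rule.
        if hits and all(r.ends_sentence for r in hits):
            sentences.append(text[begin:pos + 1].strip())
            begin = pos + 1
    tail = text[begin:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

In this model, `split_sentences(r"Cannot load file.\nError: 0x%x")` splits after the literal `\n`, and passing `DEFAULT_RULES + ["[.]-[^_^n^n.]"]` keeps "See item 12. Then stop." in one piece, which mirrors the additional rule suggested for two-digit numbers.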
Example:
[.]+[.^_] | Defines the end of a sentence: A period (.) followed by a white space (^_) is interpreted as the end of a sentence. The underscore _ stands for a white space. The ^ character before the underscore marks the underscore as a placeholder. Without the ^ character, the underscore would be interpreted as a normal character, i.e. as an actual underscore, and not as a wildcard. |
[.]-[.^_^a] | Defines an exception to the rule above: If a period is followed by a white space and a lower case letter (^a), the sentence has not ended. |
In the text "This is a sentence. This is another sentence.", the first period constitutes the end of a sentence because it is followed by a white space and then a capital letter. In the text "But not. this!", however, it does not, because the period is followed by a white space and a lower case letter.
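The two example rules can be checked against both texts with ordinary regular expressions. The translation below is done by hand and is an approximation only: it assumes that ^_ corresponds to `\s` and ^a to `[a-z]` (ASCII lower case letters), and that the exception takes precedence over the end-of-sentence rule. The function name `period_ends_sentence` is hypothetical.

```python
import re

END_RULE = re.compile(r"\.\s")             # hand translation of [.]+[.^_]
EXCEPTION_RULE = re.compile(r"\.\s[a-z]")  # hand translation of [.]-[.^_^a]


def period_ends_sentence(text: str, pos: int) -> bool:
    """Apply both rules to the period located at text[pos]."""
    if EXCEPTION_RULE.match(text, pos):
        return False  # exception: white space plus lower case letter follows
    return END_RULE.match(text, pos) is not None


first = "This is a sentence. This is another sentence."
second = "But not. this!"
print(period_ends_sentence(first, first.index(".")))
print(period_ends_sentence(second, second.index(".")))
```

The first call reports an end of sentence and the second does not, in line with the explanation above.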
Abbreviations
The definition of abbreviations represents a special case of the sentence rules: An abbreviation in a source text will only be identified as such and not taken to represent the end of a sentence if duly defined as an abbreviation.
Uppercase and lowercase spelling is not taken into consideration for abbreviations. The abbreviation "max." will be identified as such even if a sentence contains "Max." (at the beginning of the sentence).
In the list of abbreviations, the abbreviations are sorted in ascending order according to the ASCII code of the characters. Abbreviations with an initial umlaut or accent are displayed at the end of the list of abbreviations.
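The case handling and the sort order described above can be modelled in a few lines of Python. This is a sketch with hypothetical names and sample data. It assumes that "ASCII code" order corresponds to sorting by character code, which is what Python's `sorted` does for strings and which places initial umlauts after the letters a to z. Whether the list in Across treats upper and lower case differently when sorting is not stated here, so that part is an assumption as well.

```python
from typing import Iterable, List


def is_abbreviation(token: str, abbreviations: Iterable[str]) -> bool:
    """Case-insensitive lookup: "Max." matches the stored entry "max."."""
    known = {entry.casefold() for entry in abbreviations}
    return token.casefold() in known


def sort_abbreviations(abbreviations: Iterable[str]) -> List[str]:
    """Sort by character code; initial umlauts and accents end up last."""
    return sorted(abbreviations)


# Hypothetical sample list, not taken from an Across installation.
sample = ["z.B.", "österr.", "max.", "Abb."]
print(is_abbreviation("Max.", sample))
print(sort_abbreviations(sample))
```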
Abbreviations added while editing the sentence detection in crossDesk are automatically added to the language settings.