Sentence splitting
Across uses an internal algorithm to split text into sentences and words. This functionality can be accessed via the /api/v1/TextRanges endpoint. It returns sentence, word, and field (date, time, number) ranges for a given input text.
The sentence splitting rules are read from the Across Server settings. These settings are specific to each language and sub-language and can be changed in the system settings of the Across Client under General > Language Settings.
URL | Method | Description |
/api/v1/TextRanges | POST | Parses an input text and returns ranges for sentences, words, and fields based on language-specific rules. |
The detectMode parameter specifies which information is returned. Several values can be combined in a comma-separated list:
- SentenceRanges: ranges of sentences in input text
- WordRanges: ranges of words in input text
- FieldRanges: ranges and types of numeric fields (dates, times, numbers)
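A request to this endpoint could be assembled as in the following minimal Python sketch. The base URL is the one used in the examples in this section. The helper names `build_text_ranges_request` and `fetch_text_ranges` are hypothetical, and the `Content-Type` header is an assumption. Authentication is not covered in this section and is therefore left out.

```python
import json
import urllib.request

# Base URL taken from the examples in this section; adjust it for your server.
BASE_URL = "http://localhost/across/textpreprocessing"

# Detect modes listed in this section.
DETECT_MODES = ("SentenceRanges", "WordRanges", "FieldRanges")


def build_text_ranges_request(text, language_id, detect_modes=DETECT_MODES,
                              field_values_language_id=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/TextRanges."""
    unknown = set(detect_modes) - set(DETECT_MODES)
    if unknown:
        raise ValueError(f"unknown detect mode(s): {sorted(unknown)}")
    payload = {
        "text": text,
        "languageId": language_id,
        # Several modes are combined in one comma-separated string.
        "detectMode": ",".join(detect_modes),
    }
    if field_values_language_id is not None:
        payload["fieldValuesLanguageId"] = field_values_language_id
    return urllib.request.Request(
        url=f"{base_url}/api/v1/TextRanges",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the endpoint expects a JSON body announced by this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_text_ranges(request):
    """Send the request and decode the JSON response (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made, for example `fetch_text_ranges(build_text_ranges_request("This is sentence 2.0.", 9))`.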
Example 1: Tokenization
This example performs a tokenization of an input text, meaning it requests sentence, word, and field ranges for a text.
POST http://localhost/across/textpreprocessing/api/v1/TextRanges
{
  "text": "This is the first sentence. This is sentence 2.0.",
  "languageId": 9,
  "detectMode": "SentenceRanges,WordRanges,FieldRanges"
}
Response
{
  "sentences": [
    { "begin": 0, "end": 27 },
    { "begin": 28, "end": 49 }
  ],
  "words": [
    { "begin": 0, "end": 4 },
    { "begin": 5, "end": 7 },
    { "begin": 8, "end": 11 },
    { "begin": 12, "end": 17 },
    { "begin": 18, "end": 26 },
    { "begin": 28, "end": 32 },
    { "begin": 33, "end": 35 },
    { "begin": 36, "end": 44 },
    { "begin": 45, "end": 46 },
    { "begin": 47, "end": 48 }
  ],
  "fields": [
    { "begin": 45, "end": 48, "type": "Number", "convertedValue": null }
  ]
}
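In this response, `begin` is the zero-based offset of the first character and `end` is the offset just past the last one, so each range can be used directly as a string slice. The following Python sketch applies the ranges from Example 1 to the input text. The helper `slice_ranges` is hypothetical, and the sketch assumes that the offsets count characters the same way Python string indices do, which holds for this ASCII sample.

```python
text = "This is the first sentence. This is sentence 2.0."

# Response data copied from Example 1.
response = {
    "sentences": [{"begin": 0, "end": 27}, {"begin": 28, "end": 49}],
    "words": [
        {"begin": 0, "end": 4}, {"begin": 5, "end": 7},
        {"begin": 8, "end": 11}, {"begin": 12, "end": 17},
        {"begin": 18, "end": 26}, {"begin": 28, "end": 32},
        {"begin": 33, "end": 35}, {"begin": 36, "end": 44},
        {"begin": 45, "end": 46}, {"begin": 47, "end": 48},
    ],
    "fields": [
        {"begin": 45, "end": 48, "type": "Number", "convertedValue": None},
    ],
}


def slice_ranges(text, ranges):
    """Return the substring covered by each half-open [begin, end) range."""
    return [text[r["begin"]:r["end"]] for r in ranges]


sentences = slice_ranges(text, response["sentences"])
words = slice_ranges(text, response["words"])
fields = [(f["type"], text[f["begin"]:f["end"]]) for f in response["fields"]]

print(sentences)  # ['This is the first sentence.', 'This is sentence 2.0.']
print(words)      # ['This', 'is', 'the', 'first', 'sentence', 'This', 'is', 'sentence', '2', '0']
print(fields)     # [('Number', '2.0')]
```

Note that the number 2.0 is reported as two words ("2" and "0") but as a single field.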
Example 2: Numeric field conversion
This example retrieves field ranges for a number and converts this number from English to German formatting, using the fieldValuesLanguageId parameter.
POST http://localhost/across/textpreprocessing/api/v1/TextRanges
{
  "text": "This is sentence 2.0.",
  "languageId": 9,
  "detectMode": "SentenceRanges,WordRanges,FieldRanges",
  "fieldValuesLanguageId": 7
}
Response
{
  "sentences": [
    { "begin": 0, "end": 21 }
  ],
  "words": [
    { "begin": 0, "end": 4 },
    { "begin": 5, "end": 7 },
    { "begin": 8, "end": 16 },
    { "begin": 17, "end": 18 },
    { "begin": 19, "end": 20 }
  ],
  "fields": [
    { "begin": 17, "end": 20, "type": "Number", "convertedValue": "2,0" }
  ]
}
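One possible use of `convertedValue` is to write the converted fields back into the text. The Python sketch below does this for the response from Example 2. The helper `apply_converted_fields` is hypothetical and not part of the API. It assumes that field ranges do not overlap and that the offsets match Python string indices.

```python
def apply_converted_fields(text, fields):
    """Replace each field range with its convertedValue, where one is present."""
    result = text
    # Work from right to left so earlier offsets stay valid when a
    # replacement has a different length than the original field.
    for field in sorted(fields, key=lambda f: f["begin"], reverse=True):
        converted = field.get("convertedValue")
        if converted is None:
            continue  # no conversion was requested or available for this field
        result = result[:field["begin"]] + converted + result[field["end"]:]
    return result


# Field data copied from the response in Example 2.
fields = [{"begin": 17, "end": 20, "type": "Number", "convertedValue": "2,0"}]

print(apply_converted_fields("This is sentence 2.0.", fields))
# This is sentence 2,0.
```

Fields whose `convertedValue` is null, as in Example 1 where no fieldValuesLanguageId was sent, are left unchanged.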