Sentence splitting

Across uses an internal algorithm to split text into sentences and words. This functionality can be accessed via the /api/v1/TextRanges endpoint. It returns sentence, word, and field (date, time, number) ranges for a given input text.

The sentence splitting rules are read from the Across Server settings. These settings are specific to each language and sublanguage and can be changed in the system settings of the Across Client under General > Language Settings.

| URL | Method | Description |
| /api/v1/TextRanges | POST | Parses an input text and returns ranges for sentences, words, and fields based on language-specific rules. |

The detectMode parameter specifies which information is returned. Several values can be combined in a comma-separated list:

  • SentenceRanges: ranges of sentences in input text
  • WordRanges: ranges of words in input text
  • FieldRanges: ranges and types of numeric fields (dates, times, numbers)
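
As an illustration of how the detectMode values combine, the following Python sketch builds a POST request for the endpoint using only the standard library. The base URL and the comma-separated detectMode form are taken from the examples on this page. The helper names, the Content-Type header, and the absence of any authentication are assumptions and may need adjusting for a real installation.

```python
import json
import urllib.request

# Base URL as used in the examples on this page; adjust for your installation.
BASE_URL = "http://localhost/across/textpreprocessing"

# The three detect modes listed above.
DETECT_MODES = ("SentenceRanges", "WordRanges", "FieldRanges")


def build_text_ranges_request(text, language_id, modes=DETECT_MODES, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/TextRanges.

    Hypothetical helper, not part of any Across SDK.
    """
    unknown = [m for m in modes if m not in DETECT_MODES]
    if unknown:
        raise ValueError(f"unknown detect mode(s): {unknown}")
    payload = {
        "text": text,
        "languageId": language_id,
        "detectMode": ",".join(modes),  # e.g. "SentenceRanges,FieldRanges"
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/TextRanges",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not confirmed by the source
        method="POST",
    )


def fetch_text_ranges(request):
    """Send the request and decode the JSON response (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Request only sentence and field ranges for an English text (languageId 9).
request = build_text_ranges_request(
    "This is sentence 2.0.", 9, modes=("SentenceRanges", "FieldRanges")
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before it reaches the server.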

Example 1: Tokenization

This example tokenizes an input text: it requests sentence, word, and field ranges in a single call.

POST http://localhost/across/textpreprocessing/api/v1/TextRanges

{
  "text": "This is the first sentence. This is sentence 2.0.",
  "languageId": 9,
  "detectMode": "SentenceRanges,WordRanges,FieldRanges"
}

Response

{
  "sentences": [
    {
      "begin": 0,
      "end": 27
    },
    {
      "begin": 28,
      "end": 49
    }
  ],
  "words": [
    {
      "begin": 0,
      "end": 4
    },
    {
      "begin": 5,
      "end": 7
    },
    {
      "begin": 8,
      "end": 11
    },
    {
      "begin": 12,
      "end": 17
    },
    {
      "begin": 18,
      "end": 26
    },
    {
      "begin": 28,
      "end": 32
    },
    {
      "begin": 33,
      "end": 35
    },
    {
      "begin": 36,
      "end": 44
    },
    {
      "begin": 45,
      "end": 46
    },
    {
      "begin": 47,
      "end": 48
    }
  ],
  "fields": [
    {
      "begin": 45,
      "end": 48,
      "type": "Number",
      "convertedValue": null
    }
  ]
}
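
The offsets in this response appear to be zero-based with an exclusive end: the first sentence spans 0 to 27, and character 27 is the space that follows it. Under that reading, which is inferred from the sample rather than stated by the API, each range maps directly onto a Python slice. The sketch below uses a hypothetical helper, slice_ranges, and the data from the response above. It also assumes that offsets count characters the way Python strings do, which holds for plain ASCII text like this sample but is not confirmed for other input.

```python
def slice_ranges(text, ranges):
    """Return the substring for each range, treating "end" as exclusive (assumption)."""
    return [text[r["begin"]:r["end"]] for r in ranges]


text = "This is the first sentence. This is sentence 2.0."

# Ranges copied from the response above.
response = {
    "sentences": [{"begin": 0, "end": 27}, {"begin": 28, "end": 49}],
    "words": [
        {"begin": 0, "end": 4}, {"begin": 5, "end": 7}, {"begin": 8, "end": 11},
        {"begin": 12, "end": 17}, {"begin": 18, "end": 26}, {"begin": 28, "end": 32},
        {"begin": 33, "end": 35}, {"begin": 36, "end": 44}, {"begin": 45, "end": 46},
        {"begin": 47, "end": 48},
    ],
    "fields": [{"begin": 45, "end": 48, "type": "Number", "convertedValue": None}],
}

print(slice_ranges(text, response["sentences"]))
# ['This is the first sentence.', 'This is sentence 2.0.']
print(slice_ranges(text, response["words"]))
# ['This', 'is', 'the', 'first', 'sentence', 'This', 'is', 'sentence', '2', '0']
print(slice_ranges(text, response["fields"]))
# ['2.0']
```

In this sample the number 2.0 is reported twice: as two separate word ranges ("2" and "0") and as a single field range of type Number.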

Example 2: Numeric field conversion

This example retrieves the field range for a number and converts the number from English to German formatting. The request sets languageId to 9 (English) and fieldValuesLanguageId to 7 (German), so the number 2.0 is returned as 2,0 in convertedValue.

POST http://localhost/across/textpreprocessing/api/v1/TextRanges

{
  "text": "This is sentence 2.0.",
  "languageId": 9,
  "detectMode": "SentenceRanges,WordRanges,FieldRanges",
  "fieldValuesLanguageId": 7
}

Response

{
  "sentences": [
    {
      "begin": 0,
      "end": 21
    }
  ],
  "words": [
    {
      "begin": 0,
      "end": 4
    },
    {
      "begin": 5,
      "end": 7
    },
    {
      "begin": 8,
      "end": 16
    },
    {
      "begin": 17,
      "end": 18
    },
    {
      "begin": 19,
      "end": 20
    }
  ],
  "fields": [
    {
      "begin": 17,
      "end": 20,
      "type": "Number",
      "convertedValue": "2,0"
    }
  ]
}
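
One possible use of convertedValue is to write the converted number back into the text. The following Python sketch does this with a hypothetical helper, apply_converted_fields, using the field data from the response above. It assumes that begin and end are zero-based character offsets with an exclusive end, which matches the sample but is not stated explicitly by the API.

```python
def apply_converted_fields(text, fields):
    """Replace each field range with its convertedValue; skip fields without one."""
    result = text
    # Work from the last field to the first so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    for field in sorted(fields, key=lambda f: f["begin"], reverse=True):
        converted = field.get("convertedValue")
        if converted is None:
            continue
        result = result[:field["begin"]] + converted + result[field["end"]:]
    return result


text = "This is sentence 2.0."
fields = [{"begin": 17, "end": 20, "type": "Number", "convertedValue": "2,0"}]

print(apply_converted_fields(text, fields))
# This is sentence 2,0.
```

Fields whose convertedValue is null, as in Example 1, are left unchanged.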