Statistical Lexica

Statistical lexica form the basis for working with the various functions of crossMining. They are created automatically in several steps and are based mainly on the crossTank data of an Across Language Server. Optionally, the existing terminology in crossTerm can also be taken into account when creating lexica.

Statistical lexica can also be created from the phrase tables of Moses, a free system for statistical machine translation (SMT).
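For orientation, a Moses phrase table is a plain-text file in which each line holds a source phrase, a target phrase and a list of scores, separated by `|||`. The Python sketch below parses such lines. It assumes the common Moses layout with four scores (inverse phrase probability φ(f|e), inverse lexical weight lex(f|e), direct phrase probability φ(e|f), direct lexical weight lex(e|f)); the helper names are hypothetical, and this is not how crossMining itself imports phrase tables.

```python
from typing import NamedTuple


class PhraseEntry(NamedTuple):
    source: str
    target: str
    scores: list[float]


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Parse one line of a Moses-style phrase table (hypothetical helper).

    Fields are separated by '|||'. Fields after the scores
    (word alignment, counts) are ignored here.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source, target, score_field = fields[0], fields[1], fields[2]
    scores = [float(value) for value in score_field.split()]
    return PhraseEntry(source, target, scores)


def direct_probability(entry: PhraseEntry) -> float:
    """Return phi(e|f), i.e. P(target | source).

    Assumes the four-score order phi(f|e), lex(f|e), phi(e|f), lex(e|f).
    """
    return entry.scores[2]


# Example line with made-up scores
line = "das Haus ||| the house ||| 0.8 0.6 0.7 0.5 ||| 0-0 1-1 ||| 10 12 8\n"
entry = parse_phrase_table_line(line)
print(entry.source, "->", entry.target, direct_probability(entry))
```

If a phrase table uses a different score layout (older Moses versions, for example, appended a fifth constant phrase penalty), the index in `direct_probability` has to be adjusted accordingly.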

Statistical lexica have the file extension DIC and are created for a specific language pair. In the other crossMining functions, a lexicon can only be used in one direction, i.e. the language direction selected during its creation.
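Why the direction matters can be illustrated with simple relative-frequency estimates. The Python sketch below is a simplified stand-in, not crossMining's actual algorithm: from a few toy English-German word pairs (illustrative data only), it estimates P(target | source) in each direction. Because "the" is paired with three different German articles, P(der | the) is 1/3, whereas P(the | der) is 1.0, so a table estimated for one direction cannot simply be reused for the other.

```python
from collections import Counter


def conditional_probabilities(pairs):
    """Estimate P(target | source) from (source, target) word pairs.

    Uses plain relative frequencies: count(s, t) / count(s).
    """
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }


# Toy English-German word alignments (illustrative data only)
aligned = [("the", "der"), ("the", "die"), ("the", "das")]

en_to_de = conditional_probabilities(aligned)
de_to_en = conditional_probabilities([(t, s) for s, t in aligned])

print(en_to_de[("the", "der")])  # P(der | the)
print(de_to_en[("der", "the")])  # P(the | der)
```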

Attention

Before you use the statistical lexica for the other functions of crossMining, you should thoroughly test lexicon creation with your specific data, if necessary with professional help, in order to determine the most suitable values and settings for your data.

crossMining needs a certain amount of data (translation units) to work effectively. The less data available for calculating probabilities, the poorer the results. As a rule of thumb, about 10,000 translation units per language pair should be available, although good results can sometimes be achieved with fewer.
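To check an inventory against this rule of thumb before starting a lengthy lexicon creation, the TUs can simply be counted per language pair. The Python sketch below assumes, hypothetically, that each translation unit is available as a `(source_language, target_language)` tuple; the function name and the sample data are illustrative only. Falling below the threshold is treated as a warning, not a hard limit.

```python
from collections import Counter

# Rule of thumb from the text: about 10,000 TUs per language pair.
RECOMMENDED_TUS = 10_000


def pairs_below_guideline(translation_units, minimum=RECOMMENDED_TUS):
    """Return {(source, target): count} for pairs with fewer TUs than `minimum`.

    `translation_units` is assumed to be an iterable of
    (source_language, target_language) tuples, one per translation unit.
    """
    counts = Counter(translation_units)
    return {pair: n for pair, n in counts.items() if n < minimum}


# Hypothetical inventory exported from a translation memory
units = [("en-US", "de-DE")] * 12_500 + [("en-US", "fi-FI")] * 3_200
print(pairs_below_guideline(units))  # {('en-US', 'fi-FI'): 3200}
```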

The quality of the results also depends on the language or language combination. Languages with a simpler morphological structure, such as English, allow good results even with a relatively small amount of data. In contrast, for highly inflected languages such as Finnish, probabilities can only be determined satisfactorily from a larger amount of training data. The language direction also plays a role.
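One simplified way to see the effect of morphology is the ratio of running words (tokens) to distinct word forms (types). The Python sketch below uses three toy phrases: English expresses "in/from/into the house" with recurring words, whereas Finnish uses a single case-inflected form for each (talossa, talosta, taloon). Each Finnish form is therefore observed less often, so more text is needed for comparable statistics. This measure is only an illustration and is not taken from crossMining.

```python
def tokens_per_word_form(sentences):
    """Average number of occurrences per distinct word form (tokens / types).

    A lower value means each word form is observed less often, so
    probabilities for it rest on fewer examples.
    """
    tokens = [word.lower() for sentence in sentences for word in sentence.split()]
    return len(tokens) / len(set(tokens))


# Toy phrases: "in / from / into the house"
english = ["in the house", "from the house", "into the house"]
finnish = ["talossa", "talosta", "taloon"]

print(tokens_per_word_form(english))  # 1.8
print(tokens_per_word_form(finnish))  # 1.0
```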

Because lexicon creation is very resource-intensive, it may take a considerable time depending on the data volume. You should therefore only run it when the computer has little or no other work to do.

Tip

Statistical lexica can be created as often as necessary. Creating new lexica is recommended especially when the crossTank data have changed substantially, e.g. after a large translation memory has been imported or a major translation project has been completed. Some users may prefer to create lexica at regular intervals, e.g. once a month.
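If lexicon creation is scheduled externally, the recommendations above (recreate after substantial crossTank changes or at regular intervals, and only run when the computer is otherwise idle) can be combined into a simple policy. The Python sketch below is hypothetical: the thresholds, the off-peak window and the function `should_rebuild` are assumptions for illustration, not crossMining settings.

```python
from datetime import datetime, timedelta

# Hypothetical policy values; adjust to your own data and workload.
GROWTH_THRESHOLD = 0.20          # rebuild after 20 percent more translation units
MAX_AGE = timedelta(days=30)     # or at least roughly once a month
OFF_PEAK_HOURS = range(0, 6)     # run between 00:00 and 05:59


def should_rebuild(tus_at_last_build, tus_now, last_build, now):
    """Decide whether to recreate a statistical lexicon (hypothetical policy)."""
    if now.hour not in OFF_PEAK_HOURS:
        return False  # lexicon creation is resource-intensive; wait for off-peak time
    if tus_at_last_build == 0:
        return tus_now > 0
    growth = (tus_now - tus_at_last_build) / tus_at_last_build
    return growth >= GROWTH_THRESHOLD or now - last_build >= MAX_AGE


last = datetime(2024, 1, 1, 2, 0)
print(should_rebuild(10_000, 13_000, last, datetime(2024, 1, 10, 3, 0)))   # True: 30 percent growth
print(should_rebuild(10_000, 10_500, last, datetime(2024, 1, 10, 3, 0)))   # False: small change, recent build
print(should_rebuild(10_000, 10_500, last, datetime(2024, 2, 5, 3, 0)))    # True: older than 30 days
print(should_rebuild(10_000, 13_000, last, datetime(2024, 1, 10, 14, 0)))  # False: working hours
```

Checking the time window first keeps resource-intensive runs out of working hours regardless of how much the data have changed.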

See also
Import of Moses SMT Phrase Tables