Statistics in Reports
The following presentation explains the figures contained in Across reports, which may be used as the basis for calculating the cost of translation projects.
After you generate a report, and especially if it is going to be used as the basis for billing a translation, we recommend checking it for correctness and plausibility.
Across reports are created on the basis of the source text. The source text analysis starts at the first source text paragraph.
When counting words in a source text, Across applies certain rules. Other applications that also offer functions for counting words, such as MS Word and other CAT tools, may apply different rules. Thus, the word count result in Across may differ from that delivered by these other tools.
Across endeavors to make its counting method transparent and intelligible for users. The following sections therefore document in detail the individual figures contained in Across reports.
Information in Individual Rows of a Report
Details (Total)
Presents all statistics without any deductions. The total number of translation units is calculated from the net number (Details (Net)) plus locked units, hidden units, repetitions, as well as context matches and 100% matches from the crossTank analysis results, sorted by languages.
Repetitions
The sum of repetitions without the first occurrence of a segment to be translated.
- In documents: The repetitions are counted individually in each document. The subtotals are added up to give the specified value.
- In project: The repetitions are counted for the entire project. Repetitions between the documents are also taken into consideration.
Document 1 | Document 2 |
Repetitions in individual documents | |
Repetitions in doc 1: 1 repetition "AAA" + 1 repetition "BBB" = 2 repetitions | Repetitions in doc 2: 1 repetition "CCC" = 1 repetition |
= 3 repetitions in documents | |
Repetitions in the project | |
1 repetition "AAA" + 2 repetitions "BBB" + 2 repetitions "CCC" = 5 repetitions | |
= 5 repetitions in the project |
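The difference between the two counting modes can be sketched in a few lines of Python. The segment lists below are assumptions made for illustration (they are chosen so that the totals agree with the example above), and `count_repetitions` is a hypothetical helper, not an Across function.

```python
from collections import Counter

def count_repetitions(segments):
    """Count every occurrence of a segment except its first one."""
    counts = Counter(segments)
    return sum(n - 1 for n in counts.values())

# Assumed document contents, chosen to match the totals in the example.
doc1 = ["AAA", "AAA", "BBB", "BBB", "CCC"]
doc2 = ["CCC", "CCC", "BBB"]

# "In documents": count each document separately, then add the subtotals.
in_documents = count_repetitions(doc1) + count_repetitions(doc2)

# "In project": count across all documents at once, so repetitions
# between the documents are included as well.
in_project = count_repetitions(doc1 + doc2)

print(in_documents, in_project)  # prints: 3 5
```

The project-wide value is larger because "BBB" and "CCC" each occur in both documents, and those cross-document occurrences only count as repetitions when the documents are analyzed together.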
Please note that the information about project-wide repetitions is determined regardless of the individual source and target languages and may therefore be incorrect in some cases. If, for example, a project involves the translation of a document from English to German and an almost identical document from English to French, the values for the project-wide repetitions will be determined from the two English source documents. However, as the target languages of the two translations are different, the values are irrelevant. Therefore, separate reports should be generated for documents with the same source and target languages.
Please also note that the Across reports merely contain the repetitions at project and document level, but not separately for individual partitions. If necessary, the number of repetitions per partition must be determined manually.
Apart from normal repetitions, Across reports also display match repetitions.
crossTank Analysis Results
Lists the results of the crossTank analysis by match rate, sorted by languages. Only match rates for which hits are available are displayed.
- Matches in documents: The matches are counted individually in each document; repetitions within the individual documents are taken into consideration. However, repetitions between the documents are not taken into consideration (see the example above under Repetitions). The counting results are added up to give the specified value.
- Matches in project: The matches are counted for all documents; repetitions between the documents are also taken into consideration (see the example above under Repetitions).
- Inserted from bilingual document: The segments to be translated have been prefilled with translations from a bilingual XLIFF document. Apart from the source text, XLIFF files may also already contain translated segments.
- Tip
If crossTank hits (100% matches, fuzzy matches, etc.) also exist for the prefilled segments, these matches will not be shown separately in the respective reporting categories.
- 100%: Matches whose content and style fully correspond to the segments to be translated.
- 90-99%, 80-89%, etc.: Due to differences from the segments to be translated in terms of content and style, the match rate of these fuzzy matches is below 100%. The percentage differences between 100% matches and fuzzy matches are the result of content differences and penalties applied.
- Tip
The grading of matching ranges corresponds to the specifications in the reporting settings under Tools > System Settings > General > Reporting.
- No match: For these segments, no hits were found in crossTank or the match rate was below the defined threshold.
- 100% match (MT): 100% matches that were inserted through the connection of machine translation systems. MT matches are listed separately and are not contained in the 100% matches.
- 100% match/not inserted (multiple matches): 100% matches that were not inserted in the translation because there were several 100% matches for the sentence. These matches are already included in the 100% matches and must not be added again when calculating Details (Total).
- Tip
You can determine that 100% matches are to be inserted in the translation even if there are several 100% matches for a sentence. To do this, enable the option Insert pre-translation when multiple 100% matches exist under Tools > System Settings > General > crossTank > Pre-Translation Settings.
- 100% match / partial match (paragraph context): 100% matches in paragraphs for which there were no 100% matches for one or several of the sentences of the respective paragraph. These 100% matches are already included in the 100% matches and must not be added again when calculating Details (Total).
- Tip
By default, these 100% matches are not inserted in the respective paragraphs. For the 100% matches to be inserted in the translation, activate the option Insert placeholders when not all segments can be pre-translated (under Tools > System Settings > General > crossTank > Pre-translation). The placeholder ##NO_MATCH## will thus be inserted in the translation for every segment for which there is no 100% match.
- Match / not inserted (paragraph validation failed): Matches that were not inserted in the translation because the target paragraph validation would otherwise have failed. These matches are already included in the 100%, 90-99%, 80-89%, etc. match ranges and must not be added again when calculating Details (Total).
Explanation: During pre-translation of files in markup formats such as HTML, SGML, and XML, existing matches might not be inserted in the translation because doing so would cause the validation of the respective target paragraph to fail. For example, a match is not inserted if the target paragraph would lack a closing tag for an opening tag or if the length restriction for an element would be exceeded. Elements of HTML, SGML, and XML files may carry length restrictions that limit the translation of these elements to a certain number of characters. In this case, the translation must not be longer than the specified number of characters.
- In the following cases, the paragraph validation may fail, resulting in the listing of such paragraphs under the respective category:
- Tagged HTML, Tagged SGML, Tagged XML: If a length restriction exists and has not been complied with.
- Tagged XML: If the well-formedness check by the respective QM criterion fails.
- Visual XML: If a length restriction exists and has not been complied with and if the structure check by the respective QM criterion fails.
- QuickSilver: If the prefix component was not stored at the beginning of the paragraph.
- Resource files: If the paragraphs Product version and File version in the version information section contain incorrect version data. The data must be separated by commas.
- Context match: A match for which not only the properties of a normal 100% match – i.e., 100% correspondence between the content and style of the current sentence and a crossTank hit – but also the context corresponds. The context considered here comprises the preceding and subsequent sentences. Context matches are listed separately and are not contained in the 100% matches.
- Structure match: A match for which not only the properties of a normal 100% match – i.e., 100% correspondence between the content and style of the current sentence and a crossTank hit – but also the structure attribute corresponds. Structure matches are listed separately and are not contained in the 100% matches.
- Context and structure matches: A match for which not only the properties of a normal 100% match – i.e., 100% correspondence between the content and style of the current sentence and a crossTank hit – but also the context and structure attribute corresponds. Context and structure matches are listed separately and are not contained in the 100% matches.
- Protected matches: A match that was pre-translated with a released crossTank entry and inserted in the Target Editor. During this process, the match was protected and thus cannot be edited. Protected matches are listed separately and are therefore not contained in the 100% matches.
Match repetitions (100%, 90-99%, No match repetitions)
In addition to the normal repetitions that are presented regardless of any search hits in crossTank (matches), the match repetitions are displayed under consideration of any existing crossTank matches.
Match repetitions are relevant if a customer is to be billed for repetitions without any existing crossTank matches, but not for repetitions with 100% matches.
- Across reports display match repetitions as follows:
- 100% match repetitions (100% repetitions)
- Attention
Please note that apart from repetitions of 100% matches, the 100% repetitions also contain repetitions of the other subcategories of 100% matches (context matches, structure matches, and context and structure matches). Moreover, the 100% repetitions also contain repetitions of MT matches and repetitions of prefilled paragraphs from bilingual XLIFF documents.
- Repetitions for the various match ranges, for example, 90-99%, 80-89%, etc.
- Repetitions for which no matches exist (No match repetitions)
The sum of all match repetitions in a document or project equals the total number of repetitions in the document or project. Match repetitions are therefore not an additional figure; they are already included in the normal repetitions.
Example
A document contains sentence A twice, and a 75% match is available for this sentence in crossTank. The document also contains sentence B three times, with a 100% match in crossTank.
Repetitions: In total, the document contains three repetitions for sentences A and B (one of sentence A and two of sentence B).
Match repetitions: The document has one 70-79% match repetition (of sentence A) and two 100% match repetitions (of sentence B).
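The example can be reproduced with a short Python sketch. The 10-point range labels follow the ranges named in this section; the `threshold` default of 50, the function names, and the order of the segments are assumptions for illustration, and the actual ranges depend on the reporting settings.

```python
from collections import Counter

def match_range(rate, threshold=50):
    """Map a match rate (0-100) to a range label. The threshold is an assumed value."""
    if rate >= 100:
        return "100%"
    if rate < threshold:
        return "No match"
    low = rate // 10 * 10
    return f"{low}-{low + 9}%"

def match_repetitions(segments, rates):
    """Count repetitions (all occurrences after the first) per match range."""
    seen = set()
    result = Counter()
    for segment in segments:
        if segment in seen:
            result[match_range(rates.get(segment, 0))] += 1
        else:
            seen.add(segment)
    return result

# Sentence A occurs twice (75% match), sentence B three times (100% match).
segments = ["A", "B", "A", "B", "B"]
rates = {"A": 75, "B": 100}

reps = match_repetitions(segments, rates)
print(reps["70-79%"], reps["100%"], sum(reps.values()))  # prints: 1 2 3
```

The sum over all ranges equals the number of normal repetitions, which is why match repetitions must not be added on top of them.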
Details (Net)
Calculated from the total number of translation units (Details (Total)) minus locked units, hidden units, repetitions, as well as context matches and 100% matches from the crossTank analysis results, sorted by languages.
- In documents: Values based on the categories in documents.
- In project: Values based on the categories in project.
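The relationship between Details (Total) and Details (Net) can be written as a simple subtraction. The following Python sketch uses hypothetical parameter names and invented sample figures; it only restates the rule given above and is not taken from Across itself.

```python
def details_net(total, locked, hidden, repetitions, context_matches, full_matches):
    """Net units = total units minus all categories that are deducted."""
    return total - locked - hidden - repetitions - context_matches - full_matches

# Invented sample figures (for example, words) for a single target language.
net = details_net(
    total=1000,
    locked=50,
    hidden=20,
    repetitions=130,
    context_matches=40,
    full_matches=260,   # 100% matches
)
print(net)  # prints: 500
```

For the "In documents" value, the repetitions and matches counted per document would be passed in; for the "In project" value, the project-wide figures would be used instead.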
Information in Individual Columns of a Report
Standard Lines
The number of standard lines is calculated from the total number of characters contained in a text divided by the number of characters of the standard line. The number of standard lines is displayed with two decimal places. By default, one standard line in Across is 50 characters.
You can change the setting for a standard line under Tools > System Settings > General > Reporting.
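As a minimal sketch, the calculation can be expressed in Python as follows. The function name is hypothetical, and the use of Python's built-in rounding is an assumption; the exact rounding behavior in Across may differ.

```python
def standard_lines(char_count, line_length=50):
    """Number of standard lines, shown with two decimal places."""
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    return round(char_count / line_length, 2)

print(f"{standard_lines(1234):.2f}")      # prints: 24.68
print(f"{standard_lines(1000, 55):.2f}")  # prints: 18.18
```

Because the value is derived by division and then rounded, adding up the standard lines of several documents can differ slightly from the standard lines calculated for the combined character count.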
Words
Words are text segments that are separated from another text segment by at least one separator (space or punctuation mark).
Details and examples
Hyphenated compounds | Each word in a compound is considered a separate word (e.g., day-to-day = three words). |
Internet addresses (without hyperlink) | Each word that is a component of an Internet address is counted (e.g., www.across.net = three words). In contrast, Internet addresses with hyperlinks are counted as fields. |
E-mail addresses (without hyperlink) | Each word component of an e-mail address is considered a separate word (e.g., john.smith@across.net = four words). In contrast, e-mail addresses with hyperlinks are counted as fields. |
Numbers | As a general rule, numbers are not counted as words but are listed under Digits and Numbers. |
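The examples in the table can be approximated with a regular expression that treats every run of letters as a word. This is only a rough sketch under assumptions: the pattern, the function name, and the handling of digits as separators are illustrative and do not represent the actual Across tokenizer.

```python
import re

# Runs of Unicode letters. Spaces, punctuation marks, digits, and
# underscores all act as separators in this simplified model.
WORD_PATTERN = re.compile(r"[^\W\d_]+")

def count_words(text):
    """Count letter runs separated by at least one separator."""
    return len(WORD_PATTERN.findall(text))

print(count_words("day-to-day"))             # prints: 3
print(count_words("www.across.net"))         # prints: 3
print(count_words("john.smith@across.net"))  # prints: 4
print(count_words("Chapter 12"))             # prints: 1
```

The last call shows that the number is skipped, in line with the rule that numbers are not counted as words.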
Fields and Placeables
A field stands for information that is displayed in the form of gray fields (placeables) or green fields in the source text and that must be inserted in the target text (e.g., dates, index fields, hyperlinks, tags, or automatically generated tables of contents).
Details and examples
Internet addresses (with hyperlink) | |
E-mail addresses (with hyperlink) |
Letters
A letter is any character, including special characters (for example, ä, ö, ü, ß, à, and ê), that renders a sound or a sound combination. This also includes syllabaries such as hiragana and katakana, in which each syllable is counted as one letter. Logograms such as Chinese characters are listed under Asian characters.
For languages that use different writing systems, we recommend checking the Characters column.
Numbers
A "number" is a contiguous string of numerals. Numbers are displayed with a blue overline in Across crossDesk. This category includes data such as numbers and dates.
Numbers are counted under consideration of the particular language settings (see Tools > System Settings > General > Language Settings). Numbers that meet the requirements of the language settings and that Across recognizes as valid numbers are counted differently from invalid numbers that do not comply with the language settings.
Example
On the basis of dates in an English source text, the following example demonstrates how numbers are counted.
Date 1 | Date 2 |
Valid vs. invalid date specification | |
The date was recognized as a valid date format, as it corresponds to one of the specifications for English dates (YYYY-MM-DD) in the language settings. | In contrast, the following date was not recognized as a valid date format, as it does not correspond to either of the two specifications for English dates (YYYY-MM-DD or MM/DD/YYYY) in the language settings. |
Number of Numbers Presented in the Across Report | |
1 number | 3 numbers |
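The counting difference between a valid and an invalid date can be illustrated with the following Python sketch. The two patterns model the English date specifications named above; the sample dates, the function name, and the rule that an unrecognized date is counted once per contiguous digit run are assumptions made for illustration. The sketch checks only the format, not whether the month or day values are plausible.

```python
import re

# Assumed patterns for the two English date specifications.
VALID_DATE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}"),  # YYYY-MM-DD
    re.compile(r"\d{2}/\d{2}/\d{4}"),  # MM/DD/YYYY
]

def count_numbers(token):
    """Return 1 for a recognized date, otherwise one per contiguous digit run."""
    if any(pattern.fullmatch(token) for pattern in VALID_DATE_PATTERNS):
        return 1
    return len(re.findall(r"\d+", token))

print(count_numbers("2024-03-15"))  # prints: 1
print(count_numbers("15.03.2024"))  # prints: 3
```

The second date uses periods as separators, matches neither specification, and is therefore split into three separate numbers.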
Punctuation Marks
A punctuation mark is a graphical symbol for structuring or organizing a sentence or text. Apart from concluding punctuation marks (full stop, question mark, exclamation mark, and semicolon), there are structuring punctuation marks (comma and colon) and other marks (quotation marks, parentheses, and dashes).
Standard punctuation marks include the following: . ! ? ; , : „ “ " ( ) { } [ ] < >
User-defined punctuation marks are not considered.
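A simple way to picture this category is a fixed set of characters against which each character of the text is checked. The following Python sketch uses exactly the standard punctuation marks listed above; the names are hypothetical.

```python
# The standard punctuation marks listed above.
STANDARD_PUNCTUATION = set('.!?;,:„“"(){}[]<>')

def count_punctuation(text):
    """Count characters that belong to the standard punctuation set."""
    return sum(1 for ch in text if ch in STANDARD_PUNCTUATION)

print(count_punctuation("Hello, world! (Test)"))  # prints: 4
```

Characters outside this set are not counted here, which mirrors the statement that user-defined punctuation marks are not considered.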
Asian Characters
Under Asian characters, only the characters contained in the following Unicode blocks are listed. All other Asian characters, such as Hiragana and Katakana, are listed under Letters.
CJK Unified Ideographs (Unicode characters 4E00-9FFF) | An overview of characters is available e.g. at www.unicode.org/charts/PDF/U4E00.pdf, ca. 32 MB. |
CJK Compatibility Ideographs (Unicode characters F900-FAFF) | An overview of characters is available e.g. at www.unicode.org/charts/PDF/UF900.pdf, ca. 1 MB. |
Hangul Syllables (Unicode characters AC00-D7A3) | An overview of characters is available e.g. at www.unicode.org/charts/PDF/UAC00.pdf, ca. 4 MB. |
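The three code point ranges can be checked directly. The following Python sketch uses the ranges from the table above; the function names are hypothetical.

```python
# Code point ranges from the table above (inclusive).
ASIAN_BLOCKS = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
]

def is_asian_character(ch):
    """True if the character falls into one of the listed Unicode blocks."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ASIAN_BLOCKS)

def count_asian_characters(text):
    return sum(1 for ch in text if is_asian_character(ch))

# Three kanji are counted; the hiragana and katakana characters are not.
print(count_asian_characters("日本語のテキスト"))  # prints: 3
```

The hiragana and katakana characters in the sample string fall outside the listed ranges and would therefore appear under Letters.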
Other Characters
This category comprises all characters that do not belong to one of the above-mentioned categories.
- Important notes
- The reports are created on the basis of the source documents. Thus, invoices generated on the basis of the reports are also based on the source documents – not on the translated documents.
- Counting takes place in the same manner for all document formats (Excel, Word, FrameMaker, etc.).
- The number of standard lines is a derived value and may be subject to rounding differences. All other figures are determined by counting.
- The counting takes place in the same way for languages using the Latin, Greek, Cyrillic, and Arabic alphabets. The characters are not listed separately, but together in one column (e.g., Characters). In contrast, Asian characters are listed separately (see Asian characters).
- Comments in Word are not taken into consideration.
- Headers, footers, and footnotes are taken into consideration and are counted in the respective categories.
- Tabs and soft line breaks are considered as spaces and are therefore counted as separators.
- Hard line breaks are not taken into consideration.
- Image objects are not taken into consideration.
- WordArt objects are taken into consideration.
For every target language, the reporting-relevant settings determined at the creation of the report, such as the penalties for crossTank hits, are listed in the reports. Only penalties whose value is > 0 are displayed.
If the rules for sentence detection are changed (e.g., by adding "-" as a new separator) only after a document has been checked in, the resulting changes are not taken into consideration in the counting results, as the sentence structure is analyzed and (unchangeably) saved during the check-in process. However, if the sentence detection rules are changed prior to check-in, the resulting changes are taken into account in the counting results.