Term Extraction
Stopwords or Terms
When the terminologist responsible for the term extraction opens the task, a term candidate list is displayed.
By means of a mouse click, the terminologist determines which term candidates are terms and which ones are stopwords.
To mark stopwords, the user must have the right to manage stopwords (see the right Stopwords in the System Settings section of the user group rights). Terminologists have this right by default.
Stopwords are words that are filtered out during term extraction and are not offered as term candidates. Typical stopwords are articles, expletives, and conjunctions. The more comprehensive the list of stopwords, the more precise the results of the term extraction will be.
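The filtering principle can be illustrated with a minimal sketch. It is not the Across implementation: the function name extract_candidates, the sample stopword list, and the simple ASCII word pattern are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; Across maintains
# its own list in the system settings.
STOPWORDS = {"the", "a", "an", "and", "or", "of"}

def extract_candidates(text, stopwords):
    """Count single-word candidates, skipping every word on the stopword list."""
    # Simplification: only lowercase ASCII letters are treated as word characters.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

candidates = extract_candidates("The valve and the pump of the valve unit", STOPWORDS)
```

In this sketch, every word added to the stopword list disappears from the resulting candidate count, which is why a longer stopword list leaves fewer irrelevant candidates to review.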
The context often plays a major role in deciding whether a word is a term or a stopword. Therefore, the icon above the term candidate list can be used to display the context of the respective term candidate.
Please note that the extraction of terms from source documents in Asian languages is not possible due to the morphological structure of the languages.
Learning system
Term candidates for which crossTerm entries already exist are displayed in blue and bold type in the list. Thus, the terminologist can concentrate on what matters: terms that are new and that have not yet been translated. In addition, once the term extraction task has been completed, all stopwords are saved in a list and are no longer displayed as term candidates. The more often and the more intensively you use the term extraction feature, the more valuable it will become to you as a translation tool.
Words highlighted in bold and in blue font in the term candidate list already exist as terms in crossTerm. They only need to be selected as terms and translated if crossTerm does not yet contain target-language equivalents for them.
For words already marked as stopwords in Across, the checkbox is activated and grayed out.
When you double-click a term candidate, it is highlighted in color in the Source View. When you double-click it again, the display jumps to the next occurrence of the term candidate in the Source View.
Editing Term Candidates
Term candidates may need to be edited, e.g. to change a plural noun to singular. To do this, click the selected term candidate and make the required changes. To save the changes, press Enter or switch to another term candidate.
Source-language terms can no longer be modified during the term translation that follows the term extraction. Therefore, any changes to source-language terms must be made during the term extraction.
List of term candidates
The term candidate list can be customized. For example, it can be sorted alphabetically or by frequency by clicking the respective column header. Furthermore, various filter functions can be used to hide the following elements from the list:
- Terms: All term candidates that are already marked as terms by activating the respective checkbox are hidden.
- Non-terms: All term candidates not yet marked as terms are hidden. Accordingly, all words marked as terms are displayed.
- Words whose frequency is below a defined threshold.
- Words whose number of characters is below a defined threshold.
- Single words: All term candidates consisting of only one word are hidden.
- Three-word combinations: All term candidates consisting of three words are hidden.
- Stopwords: All term candidates already marked as stopwords are hidden.
In addition to the filter functions, you can use the icon to add individually selected words in the Source View to the term candidate list.
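The way the filter functions combine can be sketched as follows. This is a conceptual illustration under assumptions, not the Across implementation: the Candidate class, the visible function, and all parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    frequency: int
    is_term: bool = False
    is_stopword: bool = False

def visible(candidates, hide_terms=False, hide_non_terms=False,
            min_frequency=0, min_length=0, hide_single_words=False,
            hide_three_word=False, hide_stopwords=False):
    """Return the candidates that remain after all active filters are applied."""
    result = []
    for c in candidates:
        word_count = len(c.text.split())
        if hide_terms and c.is_term:
            continue
        if hide_non_terms and not c.is_term:
            continue
        if c.frequency < min_frequency:      # frequency threshold
            continue
        if len(c.text) < min_length:         # character-count threshold
            continue
        if hide_single_words and word_count == 1:
            continue
        if hide_three_word and word_count == 3:
            continue
        if hide_stopwords and c.is_stopword:
            continue
        result.append(c)
    return result

rows = [
    Candidate("valve", 5, is_term=True),
    Candidate("the", 12, is_stopword=True),
    Candidate("pressure relief valve", 2),
    Candidate("pump unit", 1),
]
shown = visible(rows, hide_stopwords=True, min_frequency=2)
```

Here, hiding stopwords and setting a frequency threshold of 2 leaves only "valve" and "pressure relief valve" in the list. Each filter only hides entries; none of them changes how a candidate is marked.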
Finish task
Upon completion of the term extraction, i.e. after you have selected all desired term candidates as terms or stopwords, you can finish the task by clicking the icon in the crossDesk toolbar.
After a term extraction task is finished, all term candidates marked as stopwords are automatically added to the respective stopword list under Tools > System Settings > Terminology > Stopwords.
Term candidates marked as terms are offered for translation in the subsequent term translation.
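What happens when the task is finished can be summarized in a short conceptual sketch. The function finish_extraction and its data structures are hypothetical and only model the behavior described above. They do not reflect how Across stores stopwords or terms internally.

```python
def finish_extraction(decisions, stopword_list):
    """Split the terminologist's decisions into terms and an updated stopword list.

    decisions maps each candidate to "term", "stopword", or None (undecided).
    """
    terms_to_translate = [c for c, d in decisions.items() if d == "term"]
    new_stopwords = {c for c, d in decisions.items() if d == "stopword"}
    # The existing list is extended, never replaced.
    return terms_to_translate, stopword_list | new_stopwords

terms, stopwords = finish_extraction(
    {"valve": "term", "the": "stopword", "pump": None},
    {"and"},
)
```

In this model, "valve" is passed on to the term translation, "the" joins the existing stopword list, and the undecided candidate "pump" is neither translated nor suppressed in later extractions.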