Bilingual Term Extraction
In addition to the normal monolingual term extraction within Across, crossMining enables an additional term extraction in which the source-language term candidates are already proposed with their potential target-language equivalents.
After a term candidate pair is automatically extracted and proposed, the user can confirm it as a term pair. Upon confirmation, the term pair is automatically sent to crossTerm, where it is created as a new terminology entry. The terms are set to Unreleased. The user currently logged in to crossMining is registered as the creator of the entries and terms. The new entries can therefore be searched for systematically afterwards (e.g. by using the filter for unreleased terms and/or for terms created by a particular user), edited, and released.
Proceed as follows to perform a bilingual term extraction:
- Start the bilingual term extraction via the corresponding icon in the crossMining toolbar or via the menu item Tools > Terminology Harvesting.
- The terminology harvesting dialog appears. The bilingual term extraction can be performed in the Extract new terminology tab.
- To select the statistical lexicon that you want to use as the basis for the term extraction, click File > Load lexicon.
- A dialog window lists all statistical lexica stored in the output folder. Click Select path to select a different folder.
- Select the lexicon you would like to use.
- Specify the minimum frequency and the minimum probability that a term pair must reach in order to be proposed, as well as the stopwords that are to be ignored.
- Click OK to start the term extraction.
- crossMining now proposes term pairs.
In addition to the proposed source and target-language terms, the probability of correspondence between the source and target-language terms, the co-occurrence (i.e. common occurrence of two terms) count, and the IDs of the respective entries in crossTerm are displayed.
Click one of the column headers of the table to change the table sorting on the basis of the selected column.
You can narrow down the list of displayed source and target terms by entering one or several characters in the filter input fields. Only the source/target terms beginning with these characters will be displayed. To limit the list to terms ending in particular characters, place an asterisk (*) in front of them (e.g. *ion to display only terms ending in "ion"). To limit the list to terms containing one or several characters, place an asterisk at the beginning and at the end of the filter string (e.g. *r* to display only terms containing the letter "r").
Moreover, the context in which the source and target-language terms are used in crossTank entries is displayed.
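The filter behavior described above (prefix match by default, asterisk as a wildcard) can be illustrated with a short Python sketch. This is not crossMining's implementation: the function name `matches_filter`, the case-insensitive comparison, and the sample terms are assumptions made purely for illustration.

```python
import re


def matches_filter(term: str, pattern: str) -> bool:
    """Return True if `term` passes the filter string `pattern` (hypothetical helper)."""
    if not pattern:
        return True  # an empty filter field hides nothing
    if "*" not in pattern:
        # No asterisk: only terms beginning with the entered characters are shown.
        # Ignoring case is an assumption, not documented behavior.
        return term.lower().startswith(pattern.lower())
    # Each asterisk stands for any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, term, flags=re.IGNORECASE) is not None


# Invented sample terms, used only to exercise the three filter forms.
terms = ["translation", "terminology", "lexicon", "extraction"]

starting_with_te = [t for t in terms if matches_filter(t, "te")]
ending_in_ion = [t for t in terms if matches_filter(t, "*ion")]
containing_r = [t for t in terms if matches_filter(t, "*r*")]
```

With these sample terms, the filter te keeps only "terminology", *ion keeps "translation" and "extraction", and *r* keeps every term except "lexicon".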
- Select the term pair(s) (or the respective table rows) you want to add to crossTerm.
Use the Ctrl or Shift key for multiple term selection.
If necessary, you can manually correct the proposed terms.
- Click Add new entry and confirm the subsequent message with Yes.
The term pair(s) is/are sent to crossTerm and created as new entries. The entries are created in the Across instance determined for this purpose in the terminology harvesting settings (under Tools > Settings > Terminology harvesting).
Every term is assigned the picklist values and text fields that are also defined in the terminology harvesting settings. Furthermore, the terms are set to Unreleased. The user currently logged in to crossMining will be entered as the author of the terms.
- A message confirms the successful creation of the entries. Click OK.
- The terms that were just created in crossTerm are removed from the list. Continue until you have added all desired term pairs to crossTerm.
Click File > Close to finish the bilingual term extraction.
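The effect of the minimum frequency, minimum probability, and stopword settings in the procedure above can be pictured with the following Python sketch. It is a conceptual illustration only, not how crossMining works internally. The `CandidatePair` class, the `propose_pairs` function, the inclusive thresholds, the use of the co-occurrence count as the frequency, and all sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Hypothetical stand-in for one row of the proposal table."""
    source: str
    target: str
    probability: float  # probability of correspondence between the two terms
    cooccurrence: int   # co-occurrence count (treated as the frequency here)


def propose_pairs(candidates, min_frequency, min_probability, stopwords):
    """Keep only pairs that reach both thresholds and contain no stopword."""
    stop = {word.lower() for word in stopwords}
    return [
        c for c in candidates
        if c.cooccurrence >= min_frequency      # inclusive threshold (assumption)
        and c.probability >= min_probability    # inclusive threshold (assumption)
        and c.source.lower() not in stop
        and c.target.lower() not in stop
    ]


# Invented sample data, for illustration only.
candidates = [
    CandidatePair("Getriebe", "gearbox", 0.92, 14),
    CandidatePair("und", "and", 0.99, 250),     # dropped: stopword
    CandidatePair("Welle", "shaft", 0.35, 9),   # dropped: probability too low
    CandidatePair("Lager", "bearing", 0.81, 2), # dropped: frequency too low
]

proposed = propose_pairs(candidates, min_frequency=3, min_probability=0.5,
                         stopwords=["und", "and"])
```

In this example only the pair Getriebe/gearbox remains. Raising either threshold shortens the proposal list; lowering them yields more candidates to review manually.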