Fuzzy search

A fuzzy search function is available for crossTerm. The fuzzy (or spelling-tolerant) search finds hits even if the search term and the hit are not identical but merely similar.

To activate the fuzzy search, select the corresponding item in the drop-down list under the input field of the crossTerm search.

In the following example, the misspelled word "devise" is searched for. Thanks to the fuzzy search, the (correctly spelled) term "device" is found nevertheless:

[Screenshot: crossTerm search bar showing the fuzzy search hit]

The menu item Tools > Fuzzy Search Similarity Threshold lets you select the minimum percentage of similarity that a possible hit must have with the search string.

A higher similarity threshold returns more precise search results, because fewer deviations from the search term are tolerated.

Example

Let us assume that crossTerm contains the term Schraubendreher (German for screwdriver). If you search for Shraubendräher, Schraubendreher will be found if the similarity threshold is 80%, but not if the similarity threshold is 90%.

Explanation

Whether a term meets the similarity threshold is determined on the basis of the Levenshtein distance.

Tip

The Levenshtein distance between two character strings is the minimum number of single-character editing operations (insertions, deletions, or replacements) needed to transform the first character string into the second.

In the above example, two editing operations are needed to transform Shraubendräher into Schraubendreher: inserting the missing "c" and replacing "ä" with "e". The Levenshtein distance is therefore 2.
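The definition above can be illustrated with a short Python sketch. It uses the standard dynamic-programming approach to the Levenshtein distance; the function name levenshtein is chosen here for illustration, and nothing in this sketch is taken from crossTerm's actual implementation. It compares characters case-sensitively and assumes that "ä" is stored as a single character.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and replacements needed to turn source into target."""
    # previous[j] is the distance between the part of source processed
    # so far and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of source into "" takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # replace s_char (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Shraubendräher", "Schraubendreher"))  # 2
print(levenshtein("devise", "device"))                   # 1
```

The first call reproduces the distance of 2 from the Schraubendreher example; the second shows that the misspelled "devise" is one replacement away from "device".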

Now let us assume that a similarity threshold of 80% has been selected for the fuzzy search. When you search for Shraubendräher in crossTerm with the fuzzy search activated, the following formula gives the maximum Levenshtein distance that a term may have from the search term and still be found:

Number of letters of the search term x (1 - similarity threshold)

In other words, the number of letters of the search term multiplied by the result of 1 minus the similarity threshold.

The result of the calculation is rounded to the nearest whole number.

The result of the formula for the above example is as follows (the search term Shraubendräher has 14 letters):

14 x (1-0.8) = 14 x 0.2 = 2.8

Schraubendreher is found because the Levenshtein distance is 2, which does not exceed the allowed distance of 3 (2.8 rounded) for the selected similarity threshold.

In contrast, the calculation for a similarity threshold of 90% is as follows:

14 x (1-0.9) = 14 x 0.1 = 1.4

In this case, Schraubendreher is not found: the Levenshtein distance is again 2, but the allowed distance is only 1 (1.4 rounded) for the selected similarity threshold.
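The decision described above can be sketched in a few lines of Python. The function names allowed_distance and is_hit are hypothetical, the threshold is passed as a whole-number percentage, and the handling of exact halves (rounded up here) is an assumption, since the text only states that the result is rounded to the nearest whole number. This is an illustration of the formula, not crossTerm's actual code.

```python
def allowed_distance(search_term: str, threshold_percent: int) -> int:
    """Number of letters of the search term x (1 - similarity threshold),
    rounded to the nearest whole number."""
    # Integer arithmetic avoids floating-point rounding surprises.
    # Adding 50 before the floor division rounds exact halves up,
    # which is an assumption of this sketch.
    return (len(search_term) * (100 - threshold_percent) + 50) // 100


def is_hit(search_term: str, distance: int, threshold_percent: int) -> bool:
    """A term is found if its Levenshtein distance from the search term
    does not exceed the allowed distance."""
    return distance <= allowed_distance(search_term, threshold_percent)


term = "Shraubendräher"
distance = 2  # Levenshtein distance to "Schraubendreher", as stated above

print(allowed_distance(term, 80), is_hit(term, distance, 80))  # 3 True
print(allowed_distance(term, 90), is_hit(term, distance, 90))  # 1 False
```

The two printed lines reproduce the outcomes of the worked example: the term is found at a threshold of 80% and not found at 90%.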