Lemma and Unicode normalization
AI Search normalizes inflected words and Unicode glyphs during indexing and at search query time. Normalization improves search recall and enables users to find content with variant forms of their search query terms.
Normalization features are automatically enabled and aren't configurable.
Lemma normalization
Many languages include inflected forms of terms, such as plural nouns or verb tenses. AI Search normalizes inflected terms found in indexed content and search queries. Normalization enables matching based on a root form, such as the singular for a plural noun or the base form for a conjugated verb. This root form is called a lemma, and this process is referred to as lemma normalization.
For example, when a source record includes the conjugated verb selling, AI Search expands the indexed term to include the lemma form sell in addition to selling. When a user searches for the past-tense conjugated form sold, AI Search expands the search query term to include the lemma form sell as well as sold. Because the indexed term and the search query term include matching forms, the user's search returns the selling record as a result.
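The expansion described above can be illustrated with a minimal Python sketch. It is not AI Search's implementation: the `LEMMAS` table and the `expand` and `matches` helpers are hypothetical names used only for illustration, and a real system would rely on a language-specific lemmatizer rather than a hand-written dictionary.

```python
# Hypothetical toy lemma table (an assumption for illustration only).
LEMMAS = {"selling": "sell", "sold": "sell", "sells": "sell"}


def expand(term: str) -> set[str]:
    """Return the term together with its lemma, if the toy table has one."""
    term = term.lower()
    forms = {term}
    lemma = LEMMAS.get(term)
    if lemma:
        forms.add(lemma)
    return forms


def matches(indexed_term: str, query_term: str) -> bool:
    """Two terms match when their expanded forms share at least one entry."""
    return bool(expand(indexed_term) & expand(query_term))


print(sorted(expand("selling")))   # ['sell', 'selling']
print(matches("selling", "sold"))  # True
```

Because both the indexed term and the query term are expanded to include `sell`, the two sets overlap even though `selling` and `sold` are different strings.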
Decompounding
For German, Danish, Hungarian, Korean, Norwegian (Bokmål), and Swedish, AI Search also decompounds terms: in addition to normalizing lemmas, it indexes compound words together with their individual component words. For example, when indexing a German record that contains the compound word Humanressourcen, AI Search indexes the component terms Human and ressourcen in addition to the compound term.
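As a rough sketch of the idea, the following Python splits a compound into two components when both appear in a known vocabulary. The `VOCAB` set and the `decompound` function are hypothetical, and the simple two-way split is an assumption made for brevity; it does not describe how AI Search actually segments compounds.

```python
# Hypothetical component vocabulary (an assumption for illustration only).
VOCAB = {"human", "ressourcen"}


def decompound(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Return the compound itself plus its two components, if a split exists."""
    terms = [word]
    lower = word.lower()
    for i in range(1, len(lower)):
        head, tail = lower[:i], lower[i:]
        if head in vocab and tail in vocab:
            # Keep the original casing of each component.
            terms.extend([word[:i], word[i:]])
            break
    return terms


print(decompound("Humanressourcen"))
# ['Humanressourcen', 'Human', 'ressourcen']
```

The compound stays in the list of index terms, so a search for the full word and a search for either component can both reach the record.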
Unicode normalization
AI Search performs Unicode normalization on indexed terms and search query terms. This normalization makes alphabetical Unicode glyphs searchable using their nearest equivalent characters.
For example, when indexing a record containing the term resumé, AI Search expands the term to also include the non-accented form resume. This record appears as a search result when users search for either resume or resumé.
Unicode normalization includes NFKD (compatibility decomposition) and NFKC (compatibility composition) stages. For more information on these normalization forms, see the Unicode Standard Annex #15, https://www.unicode.org/reports/tr15/.
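The effect of these normalization forms can be tried out with Python's standard `unicodedata` module. The sketch below is only an approximation under stated assumptions: `unicode_variants` is a hypothetical helper, and removing combining marks after NFKD decomposition is an assumed way to derive the unaccented form, not a documented description of AI Search internals.

```python
import unicodedata


def unicode_variants(term: str) -> set[str]:
    """Return the NFKC form of a term plus an accent-stripped variant."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and replaces compatibility glyphs (such as ligatures) with equivalents.
    decomposed = unicodedata.normalize("NFKD", term)
    # Assumption: dropping combining marks yields the unaccented form.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return {
        unicodedata.normalize("NFKC", term),
        unicodedata.normalize("NFKC", stripped),
    }


print(sorted(unicode_variants("resum\u00e9")))  # ['resume', 'resumé']
print(sorted(unicode_variants("\ufb01le")))     # ['file']
```

The second call shows compatibility normalization on a ligature: the single glyph U+FB01 is replaced by the two letters `fi`.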
Interaction with other search features
The following table describes interactions between normalization and other search features.
| Feature | Interaction with lemma and Unicode normalization |
|---|---|
| Genius Results | Search query terms added by lemma or Unicode normalization can't trigger Genius Result configurations with Term trigger conditions. |
| Result improvement rules | A search query term added by lemma or Unicode normalization can trigger a result improvement rule if it matches the rule's Query trigger. |
| Stop words | If a search query term is defined as a stop word, AI Search removes that term without normalizing it. |
| Synonyms | If a search query term is defined as a synonym, AI Search doesn't normalize it. |
| Typo handling | AI Search performs lemma and Unicode normalization on auto-corrected search query terms. |
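The stop word, synonym, and typo handling rows of the table can be sketched as a small query pipeline. Everything here is hypothetical: the word lists, the `process_query` function, and in particular the order of the steps are assumptions for illustration, since the documentation states the outcomes but not the internal sequence.

```python
# Hypothetical configuration data (assumptions for illustration only).
STOP_WORDS = {"the", "a"}
SYNONYM_TERMS = {"laptop"}
CORRECTIONS = {"seling": "selling"}
LEMMAS = {"selling": "sell", "sold": "sell"}


def process_query(query: str) -> list[dict]:
    """Return each surviving query term with the forms normalization added."""
    processed = []
    for raw in query.lower().split():
        if raw in STOP_WORDS:
            continue  # removed without being normalized
        if raw in SYNONYM_TERMS:
            processed.append({"term": raw, "added": []})  # not normalized
            continue
        term = CORRECTIONS.get(raw, raw)  # auto-correct first, then normalize
        lemma = LEMMAS.get(term)
        added = [lemma] if lemma and lemma != term else []
        processed.append({"term": term, "added": added})
    return processed


print(process_query("the seling laptop"))
# [{'term': 'selling', 'added': ['sell']}, {'term': 'laptop', 'added': []}]
```

Keeping the added forms in a separate `added` list mirrors the distinction the table draws: a feature can choose to consider only the original terms (as with Genius Results Term trigger conditions) or the added forms as well (as with result improvement rules).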