Data normalization

  • Release version: Yokohama
  • Updated January 30, 2025

    Certain types of data that Document Intelligence (DocIntel) extracts from documents are converted into a standard format so that they appear the same across all fields.

    This process increases the usefulness of the data by enabling it to be grouped and analyzed more easily. It also supports integration with other applications on the ServiceNow AI Platform.

    Field types

    The following field types are converted to support data normalization:

    • Date: Standard date format. For example, YYYY-MM-DD.
    • Reference field: A field that uses a field in another table as a standard. DocIntel matches the extracted data to the standard.

      For example, a use case has a reference field called Vendor that points to the Name column in the Company table as the reference. When processing a document task, DocIntel extracts “Degas Dairy Products, Inc” from the document and fills the Vendor field with that value. DocIntel compares the value to the company names in the reference table and finds “Degas Dairy Products, Inc” as a match. In the document task, “Degas Dairy Products, Inc” is matched to “Degas Dairy Products, Inc” in the reference.

      [Figure: Reference field flow.]

    • Integer: Whole number. For example, 12.
    • Decimal: Number with up to two decimal places. For example, 12.5 or 12.55.
    • Floating point number: Number with up to seven decimal places. For example, 12.0 to 12.0000000.
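The matching step described above for reference fields can be pictured with a small sketch. This is not DocIntel's implementation, which the documentation does not describe. The in-memory list standing in for the Company table, the second company name, the function names, and the choice to ignore case, spacing, and punctuation when comparing are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the Name column of the Company table.
# "Acme Freight" is an invented entry used only to show a non-matching row.
COMPANY_NAMES = [
    "Degas Dairy Products, Inc",
    "Acme Freight",
]


def _key(name: str) -> str:
    """Build a comparison key: lowercase, punctuation removed, single spaces."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def match_reference(extracted: str, reference_names: list[str]) -> Optional[str]:
    """Return the reference name that matches the extracted text, or None."""
    wanted = _key(extracted)
    for name in reference_names:
        if _key(name) == wanted:
            return name
    return None
```

Under these assumptions, `match_reference("Degas Dairy Products, Inc", COMPANY_NAMES)` returns the standard name from the list, and a value with no counterpart returns `None`, which a caller could treat as a field that needs review.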

    To set the field type, see Create a field for data extraction.
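As a rough illustration of what the numeric field types imply, the sketch below converts extracted text to a number with at most 0, 2, or 7 decimal places. The type labels, the function name, and the decision to round values that carry more decimal places than the type allows are assumptions; the documentation does not say how DocIntel treats such values.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Maximum decimal places per field type, taken from the field type list above.
# The dictionary keys are hypothetical labels, not ServiceNow identifiers.
MAX_PLACES = {"integer": 0, "decimal": 2, "float": 7}


def normalize_number(raw: str, field_type: str) -> str:
    """Convert extracted text to a normalized number string.

    Raises ValueError when the text is not a finite number.
    """
    places = MAX_PLACES[field_type]
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {raw!r}")
    # Assumption: values with too many decimal places are rounded half up.
    if -value.as_tuple().exponent > places:
        value = value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
    return format(value, "f")
```

`Decimal` is used instead of `float` so that a value such as 12.55 is kept exactly as written rather than as a binary approximation.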

    Display

    A completed data extraction field shows the converted value next to it.

    [Figure: Data extraction integer field and its converted value field.]

    [Figure: Data extraction date field and its converted value field.]

    You can adjust the converted date value by selecting Edit.

    Note:
    In some cases, the data extracted from the document is not in a valid format and cannot be converted. For example, if DocIntel reads the letter O instead of the number 0 in a date field (11.12.2o23), the value is not converted. In this case, edit the field to correct the value.
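The situation in the note can be reproduced with a strict date parser: text that contains a misread character fails to parse, so no converted value exists. The sketch below is illustrative only. The function name and the month-first input pattern are assumptions, not DocIntel behavior.

```python
from datetime import datetime
from typing import Optional


def convert_date(extracted: str, input_format: str = "%m.%d.%Y") -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the text cannot be converted.

    input_format is an assumed month-first pattern for dot-separated dates.
    """
    try:
        parsed = datetime.strptime(extracted.strip(), input_format)
    except ValueError:
        # Misread characters such as the letter "o" end up here.
        return None
    return parsed.date().isoformat()
```

With this sketch, `convert_date("11.12.2o23")` returns `None`, while the corrected text `"11.12.2023"` converts to a YYYY-MM-DD value.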

    Ambiguous data

    If a value in a document can be read in more than one way, DocIntel interprets it according to the default selected for it in the use case configuration. DocIntel must settle on one interpretation before it can convert the value to the normalized format.

    For example, a use case has a Date field, and Month first is selected as the default order to interpret ambiguous dates. When a document containing the date 1/2/2024 is processed for the use case, DocIntel interprets that date as January 2, not February 1, when it extracts that value and converts it.

    In such cases, the user completing a document task may need to confirm or correct the conversion of ambiguous values. Depending on the field’s configuration in the use case, automated document processing may be interrupted to ensure the conversion is accurate.
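The month-first example above can be sketched as follows. The function and type names are hypothetical, the slash-separated input is assumed, and the `ambiguous` flag is only one possible way to mark values that a user might be asked to confirm.

```python
from datetime import date
from typing import NamedTuple


class ConvertedDate(NamedTuple):
    value: date
    ambiguous: bool  # True when the other order would give a different valid date


def interpret_date(text: str, month_first: bool = True) -> ConvertedDate:
    """Interpret a slash-separated date using the configured default order."""
    first, second, year = (int(part) for part in text.strip().split("/"))
    # Both orders are possible only if both leading numbers can be a month
    # and swapping them would change the result.
    ambiguous = first <= 12 and second <= 12 and first != second
    month, day = (first, second) if month_first else (second, first)
    if month > 12:
        # Only one order can be valid, so the default does not apply.
        month, day = day, month
    return ConvertedDate(date(year, month, day), ambiguous)
```

With Month first as the default, `interpret_date("1/2/2024")` yields January 2, 2024 and marks the value as ambiguous. Passing `month_first=False` yields February 1, 2024 instead. A date such as 13/2/2024 has only one valid reading, so it is not marked.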