Preparing for Instance Data Replication

Release version: Yokohama

Updated January 30, 2025

6 minutes to read

Summarize

Summarized using AI

Summary of Preparing for Instance Data Replication

This guide helps ServiceNow customers prepare their instances for Instance Data Replication (IDR) to ensure successful and efficient data synchronization between producer and consumer instances. It covers important considerations on geographic placement, table and column selection, data integrity, and configuration best practices to optimize replication performance and avoid common issues.

Show full answer Show less

Preparing Your Instances

Geographic proximity: To reduce replication latency, keep producer and consumer instances in the same geographic region and under the same customer. Cross-region replication is possible but may cause delays.
Data source control: Treat the producer instance as the exclusive source of truth. Avoid concurrent data imports or updates on the consumer instance that can cause duplicate records or mismatches.

Deciding Which Tables and Columns to Replicate

Tables: Avoid replicating system tables (e.g., those prefixed with sys) to prevent insert/update failures. Avoid continuous replication of CMDB tables due to performance risks; if necessary, replicate on a schedule or with filtered conditions.
Table hierarchies: Replicate all related parent and child tables to maintain data integrity and preserve table hierarchies.
Table relationships: Include referenced tables in replication sets to ensure reference fields are populated correctly on the consumer instance.
Columns: Replicate only necessary columns. Exclude frequently updated system columns to avoid excessive replication activity and potential capacity impacts.

Organizing Replication Sets

Single table per replication set: Allows granular control, easier error isolation, and parallel replication jobs.
Grouped tables per replication set: Easier setup and management, but replication issues affect all tables in the group. Limit groups to five or fewer tables for manageability.

Preparing Target Tables on the Consumer Instance

Ensure target tables are empty or aligned with producer sysid values before replication starts to prevent duplicates and mismatches.
If pre-existing data with differing sysids cannot be avoided, either clean the consumer tables or use a custom Coalesce field to uniquely identify records.

Reviewing Business Rules, ACLs, and Data Policies

Business rules: Verify that rules on the consumer instance do not block the idr.system user from inserting, updating, or deleting records. Errors appear in the Replication Payload Error table.
Access Control Lists (ACLs): Confirm idr.system has necessary roles and permissions on target tables to allow replication.
Data policies: Ensure replicated data complies with any data policies on the consumer instance to prevent replication failures.

Practical Outcomes

Reduced replication latency and improved reliability by geographic and instance preparation.
Maintained data integrity through careful table, column, and hierarchy selection.
Flexible replication management via organized replication sets.
Minimized replication errors by preparing consumer tables and reviewing business logic and security settings.

Before you set up Instance Data Replication (IDR), analyze the tables and columns in the producer and consumer instances to replicate data successfully.

Preparing your instances

Reduce latency by ensuring that the producer and consumer instances are in the same geographic region and belong to the same customer. You can replicate data between instances across different geographic regions. However, replication latency can occur when instances are located in data centers in separate regions.

Deciding which tables to replicate

Before you create a producer replication set, decide which tables to replicate. Avoid replicating system tables (tables with a sys_ prefix). If you replicate system tables (like sys_user, sys_user_group, or sys_user_grmember) that have existing data in the consumer instance, insert and update failures can occur during replication. If you decide to replicate these tables, you may have additional work afterwards to clean these tables.

Avoid continuous replication of CMDB tables. Replicating CMDB data as changes occur can create performance issues or unforeseen consequences with replication due to the number of records involved. If you must replicate CMDB tables, consider scheduling replication or use conditions to constrain the count of replicated records and ensure all required columns are included in the replication set.

For a list of tables to avoid replicating, see Excluded tables in Instance Data Replication.

Analyzing table hierarchies

For each table you want to replicate, determine if the table is a parent or child table. If the table belongs to a parent-child hierarchy, decide if you want to preserve the hierarchy and what strategy you want to use to move the hierarchy from the producer to the consumer.

Important:

To maintain data integrity and ensure that reference fields are populated on the consumer as you'd expect, you must replicate all of the parent and child tables in the table hierarchy.

See Preserving table hierarchy in Instance Data Replication.

Analyzing table relationships

For each table you want to replicate, determine if the table has reference fields that point to other tables. If you are replicating a table with a reference field that points to another table, but you don’t include that table in your replication set, the reference field will be empty in the record on the consumer instance. Replicating table relationships maintains data integrity and ensures that reference fields are populated on the consumer as you'd expect.

Organizing replication sets

As part of the planning phase, determine how you want to organize your replication sets and the tables they contain.

Create a single replication set for each table. This option requires more setup, configuration, and time management, but allows you to pause replication for a single table without impacting other tables.
- Replication is performed in parallel using multiple jobs.
- Throughput metrics are easier to track with separate replication sets.
- When an error occurs, replication for a single table is affected instead of all multiple tables.
Create a replication set for a group of related tables. This option is easier to set up and manage, but oversight and potential issues impact all tables in the group.
- If you need find information about replication status, the logical name you use for this replication set might be easier to find and understand compared to using distinct replication sets for each table.
- If you want to pause replication, you can do so for the entire group of tables by pausing replication from one replication set.
- If a replication issue occurs on one table and replication stops, this stops replication for all tables in the replication set.
If you decide to group tables in a single replication set, try to limit it to five or fewer tables. If you have more than five tables to replicate, use multiple replication sets with five or fewer tables in each set.

Deciding which columns to replicate

For each table you want to replicate, decide which columns to include. Avoid including every column in your replication set by default. Instead, decide which columns are necessary, and exclude any sys_ columns or other columns that are updated by scripts automatically.

For example, if you include a column updated by the system automatically on a frequent basis, IDR may replicate data more often than necessary. When the system replicates this data, it can negatively impact the capacity subscription. Periodically check the IDR License and Usage Details dashboard to monitor the message count that is generated from the producer.

Preparing target tables on the consumer instance

By default, IDR uses a record's sys_id field as the lookup value to keep data synchronized between the producer and consumer instance. If the target table contains existing data or records from prior data imports, the sys_id values in the consumer don't match the sys_ids from the producer instance.

Always consider the producer instance as the source of truth. For optimal replication results, follow these guidelines:

Make the producer instance the exclusive data source for the consumer instance.
Prior to replication, ensure that the target table in the consumer instance is empty. Ideally, the initial records in the target instance are created from data sent exclusively from the producer via a replication set or a clone.
Ensure that there are no other ongoing data imports or updates to target tables on the consumer instance after replication starts.

For example, if users from LDAP are imported to the consumer instance using a different data source, the sys_ids for those records don't match the sys_id values in the producer instance. In this scenario, existing user records in the target table on the consumer aren't updated and duplicate records are created.

In situations where this cannot be avoided, clean the tables on the consumer instance prior to replication. You must delete the records in the target table or ensure the sys_id values in producer and consumer are the same.

Alternatively, you can use a custom Coalesce field to identify unique records (instead of the default sys_id column) for replication. Use a custom Coalesce column when records on the consumer instance have a different sys_id for the same records on the producer instance. For more information on using a custom Coalesce, see Custom coalescing.

Reviewing business rules

You can use business rules to trigger workflows after replication, such as sending a notification or validating the replicated data. Review any business rules on the consumer instance that might prevent the idr.system user from inserting, updating, or deleting records in the target table.

If failures related to business rules occur, they appear in the Instance Data Replication > Replication Payload Error table on the consumer instance. View the Error Message field for details on the script that is causing the failure.

With bidirectional replication, records created on the producer instance are replicated to a consumer instance and vice versa. When the record is inserted on the consumer instance and it triggers a business rule that updates the record, that update is not replicated back to the producer instance.

Reviewing ACLs

Review the ACLs in place on the target table in the consumer instance to ensure that IDR can replicate data successfully. Confirm that the idr.system user has the appropriate roles for the target table.

If failures related to ACLs occur, they appear in the Instance Data Replication > Replication Payload Error table on the consumer instance. View the Error Message field for details on the ACL that is causing the failure.

Reviewing data policies

Review any data policies in place on the target table in the consumer instance to ensure that IDR can replicate data successfully. Confirm that the data you are replicating satisfies the data policy.

If failures related to data policies occur, they appear in the Instance Data Replication > Replication Payload Error table on the consumer instance. View the Error Message field for details on the data policy that is causing the failure.

Preparing for IDR

Yokohama ServiceNow AI Platform Capabilities