Datasets
A dataset is a curated collection of structured data used to develop, deploy, and monitor AI systems in line with organizational policies, regulations, and ethical standards.
The AI dataset supports governance objectives by capturing key information about AI models, including risk assessments, compliance status, ownership, audit trails, and performance metrics. It also enables effective oversight, accountability, and decision-making within the organization. The quality and composition of a dataset directly affect the performance, fairness, and accuracy of the AI model. Well-curated datasets help ensure that models learn meaningful patterns and produce reliable outputs in real-world scenarios.
Each dataset should be evaluated for completeness, accuracy, and relevance to the intended use case. Bias in datasets can lead to unfair or inaccurate model predictions, so it should be identified and mitigated. Tracking data lineage helps ensure traceability, transparency, and accountability in how datasets are used and maintained.
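Completeness and representation checks of the kind described above can be sketched in a few lines. The following is a minimal sketch, assuming records are plain Python dictionaries; the 95% completeness threshold, the field names, and the `region` attribute are hypothetical illustrations, not product settings. Group shares are only a rough first signal of imbalance, not a full bias assessment.

```python
from collections import Counter

def completeness(records, fields):
    """Return the fraction of records with a non-missing value for each field."""
    total = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total if total else 0.0
        for f in fields
    }

def group_shares(records, attribute):
    """Return each group's share of records, a rough signal of representation imbalance."""
    counts = Counter(r.get(attribute) for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical sample records; None and "" are treated as missing values.
records = [
    {"age": 34, "region": "EU", "label": 1},
    {"age": None, "region": "EU", "label": 0},
    {"age": 51, "region": "US", "label": 1},
    {"age": 29, "region": "EU", "label": ""},
]

scores = completeness(records, ["age", "region", "label"])
# Hypothetical policy (assumption): flag any field that is less than 95% complete.
flagged = [f for f, s in scores.items() if s < 0.95]
print(scores)   # {'age': 0.75, 'region': 1.0, 'label': 0.75}
print(flagged)  # ['age', 'label']
print(group_shares(records, "region"))  # {'EU': 0.75, 'US': 0.25}
```

In practice, such checks would run as part of the regular dataset reviews, with thresholds set by the organization's own data policies.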
Datasets must comply with data protection regulations, including privacy laws and organizational data handling policies. Regular reviews and updates help maintain dataset quality and keep datasets aligned with evolving data standards and business needs.
sn_risk_advanced.migrate_to_advanced_risk) under .
Aggregated risk score consolidates individual risks, such as bias, drift, and security risks, into departmental or enterprise-level AI risk profiles, providing higher-level visibility and oversight. For example, signs of bias in several customer-facing AI models can together amount to an organizational risk. With the aggregated risk score, the AI Risk and Compliance team gets a consolidated view of AI risks across multiple models, teams, and business units instead of relying on fragmented risk assessments.
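One way to picture how an aggregated risk score rolls individual risks up is a weighted sum per model, then a summary per business unit and for the enterprise. The sketch below is illustrative only: the category weights, the 0 to 10 scale, the high-risk threshold, and the model and unit names are assumptions, and this section does not describe how the product actually computes the score.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical category weights (assumption): the product's actual
# scoring method is not described in this section.
WEIGHTS = {"bias": 0.5, "drift": 0.25, "security": 0.25}

def model_risk(risks):
    """Weighted sum of a model's per-category risk scores (assumed 0-10 scale)."""
    return sum(w * risks.get(category, 0.0) for category, w in WEIGHTS.items())

def aggregate(models, threshold=6.0):
    """Roll model-level risk up to business units and to the enterprise."""
    by_unit = defaultdict(list)
    for m in models:
        by_unit[m["unit"]].append(model_risk(m["risks"]))
    units = {
        unit: {
            "mean": mean(scores),
            "max": max(scores),
            # Count models at or above the (hypothetical) high-risk threshold.
            "high_risk_models": sum(s >= threshold for s in scores),
        }
        for unit, scores in by_unit.items()
    }
    enterprise = mean([s for scores in by_unit.values() for s in scores])
    return units, enterprise

# Hypothetical models: two customer-facing models showing signs of bias.
models = [
    {"name": "support-chatbot", "unit": "Customer Service",
     "risks": {"bias": 8, "drift": 4, "security": 4}},
    {"name": "product-recommender", "unit": "Customer Service",
     "risks": {"bias": 7, "drift": 2, "security": 2}},
    {"name": "fraud-detector", "unit": "Finance",
     "risks": {"bias": 2, "drift": 6, "security": 8}},
]

units, enterprise = aggregate(models)
print(units["Customer Service"])  # {'mean': 5.25, 'max': 6.0, 'high_risk_models': 1}
print(enterprise)                 # 5.0
```

Reporting the maximum and a high-risk count next to the mean keeps a single severe model from being averaged away at the business-unit level.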
Related AI assets
The Related AI assets section lists the following for an AI dataset:
- AI systems: The AI systems that use this AI dataset.
- AI models: The AI models that use this AI dataset.
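The relationships listed in the Related AI assets section amount to a reverse lookup from a dataset to the assets that reference it. The following is a minimal sketch, assuming a hypothetical in-memory model in which an AI system counts as using a dataset if it references the dataset directly or through one of its AI models; the classes, the `related_assets` helper, and all asset names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class AIModel:
    name: str
    datasets: set[str] = field(default_factory=set)

@dataclass
class AISystem:
    name: str
    models: list[AIModel] = field(default_factory=list)
    # Datasets the system uses directly, outside any of its models (assumption).
    datasets: set[str] = field(default_factory=set)

def related_assets(dataset, systems):
    """List the AI systems and AI models that use the given dataset."""
    models = sorted({m.name for s in systems for m in s.models if dataset in m.datasets})
    system_names = sorted(
        s.name for s in systems
        if dataset in s.datasets or any(dataset in m.datasets for m in s.models)
    )
    return {"AI systems": system_names, "AI models": models}

claims = AIModel("claims-classifier", {"claims-2023"})
triage = AIModel("triage-ranker", {"tickets-2024"})
systems = [
    AISystem("Claims Assistant", [claims]),
    AISystem("Service Desk Bot", [triage], {"claims-2023"}),
]

print(related_assets("claims-2023", systems))
# {'AI systems': ['Claims Assistant', 'Service Desk Bot'], 'AI models': ['claims-classifier']}
```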