Azure Data Factory metadata collector
The Azure Data Factory metadata collector provides read-only access to metadata from an external Azure Data Factory account.
Use this collector to harvest metadata from ADF, including pipelines, datasets, dataflows, linked services, triggers, integration runtimes, and global parameters. It gathers lineage information between ADF datasets and between ADF and external sources such as Snowflake.
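As a rough illustration of where this metadata lives, the sketch below builds the Azure Resource Manager REST URLs that list each object type within a factory. It assumes the public ARM REST API (API version `2018-06-01`). This page does not describe the collector's internal calls, so the helper `list_url` and the mapping `RESOURCE_SEGMENTS` are hypothetical names used only for this example.

```python
from urllib.parse import quote

ARM_BASE = "https://management.azure.com"
API_VERSION = "2018-06-01"  # assumed: public ARM API version for Microsoft.DataFactory

# Hypothetical mapping from the object types named above to the
# child-resource path segments of the ARM REST API.
RESOURCE_SEGMENTS = {
    "pipelines": "pipelines",
    "datasets": "datasets",
    "dataflows": "dataflows",
    "linked services": "linkedservices",
    "triggers": "triggers",
    "integration runtimes": "integrationRuntimes",
    "global parameters": "globalParameters",
}


def list_url(subscription_id, resource_group, factory_name, object_type):
    """Build the 'list by factory' URL for one object type (no request is sent)."""
    segment = RESOURCE_SEGMENTS[object_type]
    return (
        f"{ARM_BASE}/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name, safe='')}"
        f"/{segment}?api-version={API_VERSION}"
    )


if __name__ == "__main__":
    # Placeholder identifiers; substitute real values for an actual factory.
    for kind in RESOURCE_SEGMENTS:
        print(kind, "->", list_url("my-subscription", "my-rg", "my-factory", kind))
```

Each URL would be requested with a bearer token that has read access to the factory, which matches the read-only nature of the collector.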
Metadata cataloged
The Azure Data Factory collector catalogs the following information.
| Object | Information cataloged |
|---|---|
| Factory | ID, Name, ETag, Location, Create Time, Provisioning State, Version, Public Network Access, Factory Tags, Repository configuration (Account name, Collaboration Branch, Repository Name, Disable Publish, Root Folder, Host Name, Client ID, Project Name, Last Commit ID, Tenant ID, Repo Configuration Type). |
| Pipeline | ID, Name, Description, Etag, Concurrency, Folder, Parameters, Metric Policy Duration, Variables |
| Pipeline Activity | Name, Description, Type, Inactivity Status, State, User Properties, Activity Policy (Retry, Timeout, Retry Interval In Secs, Secure Input, Secure Output) |
| Linked Service | ID, Name, Description, Type, Etag, Connection String, Domain, Parameters. Note: Harvesting the Connection String is not supported for SFTP linked services. |
| Dataset | ID, Name, Etag, Type, Database, Schema, Table, Folder, Container, File Name, Parameters |
| Dataflow | ID, Name, Etag, Type, Description, Folder |
| Trigger | ID, Name, Etag, Type, State, Description, Frequency, Interval, Start time, End time |
| Integration Runtime | ID, Etag, Name, Type, Description, State, Compute Properties (Node Size, Number of Nodes, Max Parallel Execution Per Node, Core Count, Compute Type, Clean up, Number of External Nodes, Number of Pipeline Nodes), SSIS Properties (Catalog Server Endpoint, Catalog Admin Username, Catalog Pricing Tier, License Type, Dual Standby Pair Name, Edition) |
| Global Parameter | ID, Name, Value, Type |
| ADF Table | ID, Name |
| ADF Column | ID, Name, Type, Precision, Scale |
| Pipeline Activity | Query |
Relationships between objects
Catalog pages show relationships between the following data asset types:
| Data asset page | Relationship |
|---|---|
| Factory | Contains Global Parameter, Contains Pipeline, Contains Dataset, Contains Dataflow, Contains Trigger, Contains Integration Runtime |
| Pipeline | Has Tag (also known as Annotation), Contains Activity |
| Activity | Belongs to Pipeline, Contains Activity, Depends on Activity, Uses Linked Service, Uses Integration Runtime, Uses Dataset |
| Linked Service | Uses Integration Runtime, Has Tag (also known as Annotation), Connects to database |
| Dataset | Uses Linked Service, Has Tabular Datasource, Has Tag (also known as Annotation) |
| Dataflow | Uses Dataflow, Imports Data From Linked Service, Exports Data From Linked Service, Imports Data From Dataset, Exports Data From Dataset, Has Tag (also known as Annotation) |
| Integration Runtime | Uses Integration Runtime, Uses Linked Service |
| Trigger | Triggers Pipeline, Has Tag (also known as Annotation) |
Lineage for Azure Data Factory
Collected lineage information:
| Object | Lineage available |
|---|---|
| Dataset | The collector identifies the source or sink of the dataset. |
| ADF table | The collector identifies the associated table in the external system that the data is read from (source) or written to (sink). |
| ADF column | The collector identifies the associated column in the external system that the data is read from (source) or written to (sink). |
Supported data sources for cross-system lineage:
- Snowflake
- Databricks
Authentication types supported
The Azure Data Factory collector authenticates using an Azure service principal.
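For orientation, service principal authentication typically means an OAuth 2.0 client credentials request to the Microsoft identity platform. The sketch below only constructs such a request and does not send it. The function name `build_token_request` is hypothetical, and the collector's own implementation is not documented on this page, so treat this as an assumption about the general flow rather than a description of the collector.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scope for Azure Resource Manager, which hosts the Data Factory APIs.
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # application (client) ID of the service principal
            "client_secret": client_secret,  # secret created for the app registration
            "scope": ARM_SCOPE,
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values only; never hard-code real secrets.
    req = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
    print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON document whose `access_token` field is then passed as a bearer token on subsequent read requests. The service principal needs at least read access to the target factory.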