

Bulk Create Snowflake Data Sources

Private preview

This feature is only available to select accounts. Reach out to your Immuta representative to enable this feature.

Requirements

  • Snowflake Enterprise Edition
  • A Snowflake X-Large or Large warehouse is strongly recommended

Create Snowflake data sources

  1. Set the default subscription policy to None before bulk data source creation. This simplifies the process by ensuring that no subscription policies are automatically applied to the new data sources.

  2. Make a request to the Immuta V2 API create data source endpoint, since the Immuta UI does not support creating more than 1,000 data sources. To get the maximum performance benefit of bulk data source creation, the following options must be specified in your request; a request sketch follows the option descriptions below. The Skip Stats Job tag is only required if you are using specific policies that require stats; otherwise, Snowflake data sources automatically skip the stats job.

    "options": {
        "disableSensitiveDataDiscovery": true,
        "tableTags": [
            "Skip Stats Job"
        ]
    }
    

Setting disableSensitiveDataDiscovery to true ensures that sensitive data discovery is not applied when the new data sources are created in Immuta, regardless of how it is configured for the Immuta tenant. Disabling sensitive data discovery improves performance during data source creation.

Applying the Skip Stats Job tag through the tableTags option ensures that jobs not vital to data source creation are skipped, specifically the fingerprint and high cardinality check jobs.
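
For reference, a create request with these options might look like the following sketch. The endpoint path (/api/v2/data), the placeholder URL, and the token are assumptions shown for illustration only; example_payload.json would contain your Snowflake connection details and data source definitions along with the options block shown above, as described in the V2 API reference.

# Hypothetical bulk create request; the endpoint path, URL, and token are placeholders.
curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer <your-api-token>" \
    --data @example_payload.json \
    https://your-immuta-url.com/api/v2/data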

When the Snowflake bulk data source creation feature is configured, the create data source endpoint operates asynchronously and responds immediately with a bulkId that can be used for monitoring progress.
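
For example, the immediate response might look like the following sketch; the bulkId value here is a placeholder, and any other fields in the real response are omitted.

    {
      "bulkId": "a1b2c3d4-0000-0000-0000-000000000000"
    }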

Monitor progress

To monitor the progress of the background jobs for the bulk data source creation, make the following request using the bulkId from the response of the previous step:

curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer dea464c07bd07300095caa8" \
    --data @example_payload.json \
    https://your-immuta-url.com/jobs?bulkId=<your-bulkId>

The response will contain a list of job states and the number of jobs currently in each state. If errors were encountered during processing, a list of errors will be included in the response:

    {
      "total":"99893",
      "completed":"99892",
      "failed":"0",
      "pending":"1",
      "errors":null
    }

With these recommended configurations, bulk creating 100,000 Snowflake data sources will take between six and seven hours for all associated jobs to complete.
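
Because the associated jobs can run for several hours, you may prefer to poll the monitoring endpoint periodically rather than check it by hand. The following is a minimal sketch, assuming jq is installed and reusing the placeholder URL, token, payload file, and bulkId from the request above; the polling interval is arbitrary.

# Poll the bulk job status until no jobs remain pending.
while true; do
    response=$(curl --silent \
        --request POST \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer dea464c07bd07300095caa8" \
        --data @example_payload.json \
        "https://your-immuta-url.com/jobs?bulkId=<your-bulkId>")
    pending=$(echo "$response" | jq -r '.pending')    # pending count is returned as a string
    echo "Jobs still pending: $pending"
    [ "$pending" = "0" ] && break
    sleep 300    # wait five minutes between checks
done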