Databricks Project Workspaces Pre-Configuration Details
Audience: Project members
Content Summary: This page outlines prerequisites and provides an overview of the integration process for Databricks project workspaces.
See the Overview page for information on the utility of project workspaces and see the Configuration page for installation instructions.
Prerequisites
- Databricks integration configured.
- Databricks workspace configured.
- Databricks tables registered in Immuta.
- External IDs have been mapped for Databricks.
- Cluster configuration: Before creating a workspace, the cluster must send its configuration to Immuta. To do this, run a simple query on the cluster (e.g., show tables); otherwise, an error message will occur when you attempt to create a workspace.
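For example, any simple query run from a notebook cell attached to the cluster satisfies this prerequisite:

```sql
-- Running a lightweight query forces the cluster to report its
-- configuration to Immuta before a workspace is created.
SHOW TABLES;
```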
Project Workspace Workflow
- An Immuta user with the CREATE_PROJECT permission creates a new project with Databricks data sources.
- The Immuta Project Owner enables Project Equalization, which equalizes every project member's access to the data.
- The Immuta Project Owner creates a Databricks Project Workspace, which automatically generates a subfolder under the root path specified by the Application Admin and a remote database associated with the project.
- The Immuta Project Members query equalized data within the context of the project, collaborate, and write data back to Immuta, all within Databricks.
- The Immuta Project Members register the newly written derived tables in Immuta as derived data sources. These derived data sources inherit the necessary Immuta policies so they can be securely shared outside of the project.
Root Directory Details
- Immuta only supports a single root location, so all projects write to a subdirectory under this single root location.
- If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.
- Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.
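As a sketch of the shape such an entry takes, a Hadoop-style core-site.xml property looks like the following. Note that the property name below is a placeholder, not the actual Immuta key; consult your Immuta configuration reference for the real name.

```xml
<!-- Placeholder property name: substitute the key documented by Immuta. -->
<property>
  <name>example.immuta.workspace.unavailable</name>
  <value>true</value>
</property>
```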
Read and Write Data
- When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").
- To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table (rather than a directory) in the workspace when creating the derived data source.
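As an illustrative sketch (all database and table names here are placeholders, not names from this guide), a derived Delta table can be created with a CREATE TABLE ... AS SELECT statement in the project's database, which gives Immuta a table, rather than a directory, to register:

```sql
-- Placeholder names: write derived results as a Delta table so it
-- can be registered in Immuta as a derived data source.
CREATE TABLE project_db.derived_summary
USING DELTA
AS
SELECT region, COUNT(*) AS record_count
FROM project_db.equalized_source
GROUP BY region;
```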