Querying Databricks Data

Audience: Data Users

Content Summary: This page offers a tutorial on how to query data within the Databricks integration.

Prerequisites:

Databricks integration configured with Immuta

Databricks tables registered as Immuta data sources

Query Data with Python

Create a new workspace.
Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query. It should look something like this:

```
df = spark.sql('select * from database.table_name')
df.show()
```
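When the database or table name comes from a variable, building the `database.table_name` reference in one place can help. The following is a minimal sketch, not part of the Immuta or Databricks APIs: `qualified_name` and `preview` are hypothetical helper names, and it assumes the `spark` session that Databricks notebooks provide.

```python
def qualified_name(database: str, table_name: str) -> str:
    """Build a backtick-quoted `database`.`table_name` identifier (hypothetical helper)."""
    def quote(part: str) -> str:
        # Spark SQL escapes a literal backtick inside a quoted identifier by doubling it.
        return "`" + part.replace("`", "``") + "`"
    return f"{quote(database)}.{quote(table_name)}"


def preview(spark, database: str, table_name: str, limit: int = 10):
    """Return a DataFrame with at most `limit` rows from the table (hypothetical helper)."""
    # Same query shape as above, with the table reference built from its two parts.
    query = f"select * from {qualified_name(database, table_name)}"
    return spark.sql(query).limit(limit)
```

In a notebook cell, `preview(spark, "database", "table_name").show()` would then display the first rows of the table.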


Query Data with SQL

Create a new workspace.
Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.
Run your query. It should look something like this:
```
select * from database.table_name;
```


Query Data with SparkR

Establish the User's Identity

Create a new workspace.
Run:
```
library(SparkR)
```

Run a Query

In the same workspace, but in a different cell, query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query. It should look something like this:

```
df <- SparkR::sql("select * from database.table_name")
SparkR::head(df)
```


Query Data with Scala

Query the Immuta-protected data, which takes the form of database.table_name:
1. Database: The database that houses the backing tables of your Immuta data sources.
2. Table Name: The name of the table backing your Immuta data sources.

Run your query. It should look something like this:

```
val sqlDF = spark.sql("select * from database.table_name")
sqlDF.show()
```
