You are viewing documentation for Immuta version 2024.1.

Immuta CDH Integration Installation

Audience: System Administrators

Content Summary: The Immuta CDH integration installation consists of the following components:

  • Immuta NameNode plugin
  • Immuta Hadoop Filesystem plugin
  • Immuta Spark 2 Vulcan service

This page outlines the installation steps required to successfully deploy these components on your CDH cluster.

Prerequisites: Follow the Immuta CDH Integration Prerequisites to prepare for installation.

Installation

Begin installation by transferring the Immuta .parcel file and its associated .parcel.sha file to your Cloudera Manager node and placing them in /opt/cloudera/parcel-repo. Once the files are copied, ensure that their owner and group are both set to cloudera-scm:

chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
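
Before distributing the parcel, it can help to confirm that the copied .parcel file matches its .parcel.sha companion. The following is a minimal sketch, not an Immuta-provided tool. It assumes that the .parcel.sha file begins with the hex SHA-1 digest of the parcel and that sha1sum is installed; the function name verify_parcel_sha is hypothetical.

```shell
# Hypothetical helper: compare a parcel's SHA-1 digest with its .sha file.
# Assumption: <parcel>.sha starts with the hex SHA-1 digest of <parcel>.
verify_parcel_sha() {
    parcel="$1"
    # The first whitespace-delimited field of the .sha file is the digest.
    expected=$(awk 'NR == 1 { print $1 }' "$parcel.sha") || return 2
    [ -n "$expected" ] || return 2
    # sha1sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    if [ "$expected" = "$actual" ]; then
        echo "OK: $parcel"
    else
        echo "MISMATCH: $parcel" >&2
        return 1
    fi
}
```

Call the function with the full path of the .parcel file you placed in /opt/cloudera/parcel-repo. A non-zero exit status indicates a mismatch or a missing .parcel.sha file.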

Next, transfer the Immuta CSD (.jar file) to /opt/cloudera/csd, and ensure that its owner and group are also set to cloudera-scm:

chown -R cloudera-scm:cloudera-scm /opt/cloudera/csd

Restart the Cloudera Manager server so that it picks up the CSD. Run whichever of the following commands matches your host's init system (systemctl on systemd hosts, service otherwise):

systemctl restart cloudera-scm-server
service cloudera-scm-server restart

Follow Cloudera's instructions for distributing and activating the IMMUTA parcel.

Once the parcel has been successfully activated, you can add the IMMUTA service:

  1. From Cloudera Manager, select Add Service.
  2. Choose Immuta.
  3. Click Continue.
  4. Select the nodes to install the services on. Your options are:
    • For maximum redundancy, choose all.
    • Choose a single node.
    • Choose a few nodes. Set up a Load Balancer in front of the instances to distribute load. Contact Immuta support for more details.
  5. Proceed to the end of the workflow.

Configure HDFS

After adding the Immuta service to your CDH cluster, complete the following configuration.

If your cluster is configured with Kerberos, the default configuration expects Immuta services to run as the immuta principal. If you need to use a different Kerberos principal, see Running as a Non-Default User for detailed instructions. After completing those steps, you may need to manually run the Create Immuta User Home Directory command from the Actions menu for the Immuta service.

For more details on Immuta's HDFS configuration, please see Hadoop Cluster Configuration for Immuta.

NameNode-Only Configuration

Warning

The following settings should be written only to the configuration on the NameNode. Setting these values on DataNodes has security implications, so be sure that they are set in the NameNode-only section of Cloudera Manager. For optimal performance, set these configuration options only in the NameNode Role Config Group that controls the namespace where Immuta data resides.

Under the HDFS service of Cloudera Manager, Configuration tab, search for key:

NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml

and, using "View as XML", add/set the value(s) similar to:

<property>
    <name>dfs.namenode.authorization.provider.class</name>
    <value>com.immuta.hadoop.ImmutaAuthorizationProvider</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.fallback.class</name>
    <value>org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.allow.fallback</name>
    <value>false</value>
    <final>true</final>
</property>
<property>
    <name>immuta.system.api.key</name>
    <value>0ec28d3f-a8a2-4960-b653-d7ccfe4803b3</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.users.to.ignore</name>
    <value>hdfs,yarn,hive,impala,llama,mapred,spark,oozie,hue,hbase,livy,immuta</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.paths.to.enforce</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>immuta.permission.source.cache.enabled</name>
    <value>false</value>
    <final>true</final>
</property>

Best Practice: Configuration Values

Immuta recommends that all Immuta configuration values be marked final.

See Hadoop Cluster Configuration for Immuta for details about each individual configuration value.
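
The immuta.system.api.key value shown above is only an example. If you generated a token while following the prerequisites, use that value. Otherwise, the sketch below shows one way to produce a random value in the same UUID format and print it as a property marked final. It assumes that a random UUID is an acceptable key, as the example value suggests. The function names new_api_key and print_property are hypothetical.

```shell
# Hypothetical helper: print a random lowercase UUID.
# Prefers uuidgen; falls back to the Linux kernel's UUID source.
new_api_key() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen | tr 'A-Z' 'a-z'
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

# Hypothetical helper: print a safety-valve <property> element, marked final.
# $1 = property name, $2 = value. No XML escaping is performed.
print_property() {
    printf '<property>\n    <name>%s</name>\n    <value>%s</value>\n    <final>true</final>\n</property>\n' "$1" "$2"
}
```

For example, print_property immuta.system.api.key "$(new_api_key)" prints a block that can be pasted into the safety valve using "View as XML". Record the generated value, because the same key must be used wherever immuta.system.api.key is configured.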

Shared Configuration

The following configuration items should be set for both the NameNode processes and the DataNode processes. They are used by both the Immuta FileSystem and the Immuta NameNode plugin.

Under the HDFS service of Cloudera Manager, Configuration tab, search for key:

Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml

and, using "View as XML", add/set the value(s) similar to:

<property>
    <name>immuta.base.url</name>
    <value>https://immuta.hostname</value>
    <final>true</final>
</property>
<property>
    <name>immuta.spark.partition.generator.user</name>
    <value>immuta</value>
    <final>true</final>
</property>
<property>
    <name>immuta.credentials.dir</name>
    <value>/user</value>
    <final>true</final>
</property>
<property>
    <name>immuta.visibility.cache.timeout.seconds</name>
    <value>600</value>
    <final>true</final>
</property>
<property>
    <name>fs.immuta.impl</name>
    <value>com.immuta.hadoop.ImmutaFileSystem</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.immuta.hosts</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.immuta.users</name>
    <value>*</value>
    <final>true</final>
</property>
<property>
    <name>hadoop.proxyuser.immuta.groups</name>
    <value>*</value>
    <final>true</final>
</property>

Best Practice: Configuration Values

Immuta recommends that all Immuta configuration values be marked final.

Make sure that user directories underneath immuta.credentials.dir are readable only by the owner of the directory. If a user's directory does not exist, Immuta creates it with permissions set to 700.

See Hadoop Cluster Configuration for Immuta for details about each individual configuration value.
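
To audit the directory permissions described above, you can filter a directory listing for entries that grant any access to group or other. The following is a minimal sketch that reads the output of hdfs dfs -ls on standard input. It assumes the usual listing layout (permission string first, path last) and paths without spaces; the function name insecure_dirs is hypothetical.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# paths of directories whose group or other permission bits are set.
# Characters 5-10 of the permission string hold the group and other bits.
insecure_dirs() {
    awk '$1 ~ /^d/ && substr($1, 5, 6) != "------" { print $NF }'
}
```

With immuta.credentials.dir set to /user as in the example, running hdfs dfs -ls /user | insecure_dirs lists the user directories that still need their permissions tightened. No output means every listed directory is accessible only to its owner.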

Enable TLS for the Immuta Vulcan Service

You can enable TLS on the Immuta Vulcan service by configuring it to use a keystore in JKS format.

Server-side TLS Configuration

Under the Immuta service of Cloudera Manager, Configuration tab, search for key:

Immuta Spark 2 Vulcan Server Advanced Configuration Snippet (Safety Valve) for session/generator.xml

and, using "View as XML", add/set the value(s) similar to:

<property>
    <name>immuta.secure.partition.generator.keystore</name>
    <value>/etc/immuta/keystore.jks</value>
    <final>true</final>
</property>
<property>
    <name>immuta.secure.partition.generator.keystore.password</name>
    <value>secure_password</value>
    <final>true</final>
</property>
<property>
    <name>immuta.secure.partition.generator.keymanager.password</name>
    <value>secure_password</value>
    <final>true</final>
</property>

Best Practice: Configuration Values

Immuta recommends that all Immuta configuration values be marked final.

Detailed Explanation:

  • immuta.secure.partition.generator.keystore
    • Specifies the path to the Immuta Vulcan service keystore.
    • Example: /etc/immuta/keystore.jks
  • immuta.secure.partition.generator.keystore.password
    • Specifies the password for the Immuta Vulcan service keystore. Because this password is publicly available in the configuration, use file permissions to ensure that only the user running the service can read the keystore file.
    • Example: secure_password
  • immuta.secure.partition.generator.keymanager.password
    • Specifies the KeyManager password for the Immuta Vulcan service keystore. Because this password is publicly available in the configuration, use file permissions to ensure that only the user running the service can read the keystore file. This property is not always required.
    • Example: secure_password

Best Practice: Secure Keystore with File Permissions

Immuta recommends using file permissions to secure the keystore from improper access:

chown immuta:immuta /etc/immuta/keystore.jks
chmod 600 /etc/immuta/keystore.jks
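
After applying the commands above, you can confirm that the keystore is readable and writable only by its owner. The following is a minimal sketch that relies on GNU stat, which is standard on Linux hosts; the function name keystore_mode_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the file's mode is exactly 600.
# Uses GNU stat; '%a' prints the permission bits in octal.
keystore_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 2
    [ "$mode" = "600" ]
}
```

For example, keystore_mode_ok /etc/immuta/keystore.jks && echo "keystore permissions OK" prints the message only when the mode is 600.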

Client-side TLS Configuration

You must also set the following property in the client configuration:

For Spark 2, under the Immuta service of Cloudera Manager, Configuration tab, search for key:

Immuta Client Advanced Configuration Snippet (Safety Valve) for immuta-conf/session/generator.xml

and, using "View as XML", add/set the value(s) similar to:

<property>
    <name>immuta.secure.partition.generator.keystore</name>
    <value>true</value>
    <final>true</final>
</property>

Best Practice: Configuration Values

Immuta recommends that all Immuta configuration values be marked final.

Detailed Explanation:

  • immuta.secure.partition.generator.keystore
    • Set to true to enable TLS
    • Default: true

Impala Configuration

You must give the service principal that the Immuta Web Service is configured to use permission to delegate in Impala. To accomplish this, add the Immuta Web Service principal to authorized_proxy_user_config in the Impala daemon command line arguments.

Under the Impala service of Cloudera Manager, Configuration tab, search for key:

Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)

and add/set the value(s) similar to:

-authorized_proxy_user_config=<IMMUTA_SERVICE_PRINCIPAL>=*

Note

If the authorized_proxy_user_config parameter is already present for other services, append the Immuta configuration value to the end:

-authorized_proxy_user_config=hue=*;<IMMUTA_SERVICE_PRINCIPAL>=*
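
The two cases above (no existing value, or an existing value to append to) can be sketched as a small helper that builds the argument. This is an illustration only; the function name proxy_user_flag is hypothetical, and the principal passed in stands for your Immuta Web Service principal.

```shell
# Hypothetical helper: build the -authorized_proxy_user_config argument.
# $1 = existing value (may be empty), $2 = Immuta Web Service principal.
proxy_user_flag() {
    existing="$1"
    principal="$2"
    if [ -n "$existing" ]; then
        # Append the Immuta entry after the entries already configured.
        printf '%s\n' "-authorized_proxy_user_config=${existing};${principal}=*"
    else
        printf '%s\n' "-authorized_proxy_user_config=${principal}=*"
    fi
}
```

For example, proxy_user_flag 'hue=*' immuta prints -authorized_proxy_user_config=hue=*;immuta=*. Keep the existing value in single quotes when calling the helper so that the shell does not interpret the semicolon or the asterisk.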

Spark 2 Configuration

No additional configuration is required.

Note: Immuta will work with any Spark 2 version you may have already installed on your cluster.

Immuta Vulcan Service Configuration

The Immuta Vulcan service requires the same system API key that is configured for the Immuta NameNode plugin. Be sure that the value of immuta.system.api.key is consistent across your configuration.

For Spark 2, under the IMMUTA service of Cloudera Manager, Configuration tab, search for key:

Immuta Spark 2 Vulcan Server Advanced Configuration Snippet (Safety Valve) for session/generator.xml

and, using "View as XML", add/set the value(s) similar to:

<property>
    <name>immuta.system.api.key</name>
    <value>0ec28d3f-a8a2-4960-b653-d7ccfe4803b3</value>
    <final>true</final>
</property>

Best Practice: Configuration Values

Immuta recommends that all Immuta configuration values be marked final.
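
Because the Vulcan service and the NameNode plugin must share the same key, a quick comparison of the two snippets can catch copy-and-paste mistakes. The following is a minimal sketch that assumes you have saved each safety-valve snippet to a local file with one element per line, as in the examples on this page. The parsing is line-oriented rather than a full XML parser, and the function names get_property and api_keys_match are hypothetical.

```shell
# Hypothetical helper: print the <value> that follows <name>$1</name> in file $2.
# Assumes one XML element per line, as in the snippets on this page.
get_property() {
    awk -v name="$1" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$2"
}

# Succeed only if both files define the same non-empty immuta.system.api.key.
api_keys_match() {
    a=$(get_property immuta.system.api.key "$1")
    b=$(get_property immuta.system.api.key "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Call api_keys_match with the two saved snippet files as arguments. A non-zero exit status means the keys differ or one of them is missing.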

Immuta Web Service Configuration

The Immuta Web Service needs to be configured to support the HDFS plugin. You can set this configuration using the Immuta Configuration UI.

Configuring these values through the Application Settings of the Web UI is generally sufficient. If an Immuta representative recommends it, you can instead use a YAML snippet such as the following example as an alternative to the Immuta Configuration UI.

client:
    kerberosRealm: YOURCOMPANY.COM
plugins:
    hdfsHandler:
        hdfsSystemToken: 0ec28d3f-a8a2-4960-b653-d7ccfe4803b3
kerberos:
    ticketRefreshInterval: 43200000
    username: immuta
    keyTabPath: /etc/immuta/immuta.keytab
    krbConfigPath: /etc/krb5.conf
    krbBinPath: /usr/bin/

Detailed Explanation:

  • client
    • kerberosRealm
      • Specifies the default realm to use for Kerberos authentication.
      • Example: YOURCOMPANY.COM
  • plugins
    • hdfsHandler
      • hdfsSystemToken
        • Token used by the NameNode plugin to authenticate with the Immuta REST API. This must equal the value set in immuta.system.api.key. Use the value of HDFS_SYSTEM_TOKEN generated earlier.
        • Example: 0ec28d3f-a8a2-4960-b653-d7ccfe4803b3
  • kerberos
    • ticketRefreshInterval
      • Time in milliseconds to wait between kinit executions. This should be lower than the ticket refresh interval required by the Kerberos server.
      • Default: 43200000
    • username
      • User principal used for kinit.
      • Default: immuta
    • keyTabPath
      • The path to the keytab file on disk to be used for kinit.
      • Default: /etc/immuta/immuta.keytab
    • krbConfigPath
      • The path to the krb5 configuration file on disk.
      • Default: /etc/krb5.conf
    • krbBinPath
      • The path to the Kerberos installation binary directory.
      • Default: /usr/bin/
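
Because ticketRefreshInterval is expressed in milliseconds, it is easy to misplace a factor of 1000 when changing it. The default of 43200000 corresponds to 12 hours. The following sketch shows the conversion in both directions using integer arithmetic; the function names are hypothetical.

```shell
# Hypothetical helpers for converting between whole hours and milliseconds.
hours_to_ms() { echo $(( $1 * 60 * 60 * 1000 )); }
ms_to_hours() { echo $(( $1 / 1000 / 60 / 60 )); }
```

For example, hours_to_ms 12 prints 43200000, which matches the default value.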

Additionally, you must upload a keytab for the immuta user as well as a krb5.conf configuration file to the Immuta Web Service. This can also be done via the Immuta Configuration UI.