FAQ Series_101 — Azure Databricks Security Controls
Azure Databricks is a mature analytics platform with features that cover batch, streaming, and machine learning workloads. In my experience, architects and customers are often still unsure about the security controls that Databricks provides. This post lists the most common questions I am asked during the initial phases of engagements.
1. What is the security architecture of Azure Databricks?
The Azure Databricks architecture is divided into two parts: (1) the control plane, which resides in a Microsoft-managed subscription, and (2) the data plane, which can be hosted within your own VNet. Please refer to this for more details.
2. How do I secure my Control Plane?
The control plane can be secured using an IP access list; refer to more details here. Please note that this feature is only available in Premium workspaces.
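To make the IP access list idea concrete, here is a minimal sketch of the request bodies involved, assuming you manage the lists through the Databricks REST API (`PATCH /api/2.0/workspace-conf` to switch the feature on, `POST /api/2.0/ip-access-lists` to add a list). The label and the IP ranges are made-up placeholders, and the helper function name is my own. The code only builds and validates the payloads; sending them with your workspace URL and token is left out on purpose.

```python
import ipaddress
import json

# Body for PATCH /api/2.0/workspace-conf, which turns the feature on.
ENABLE_FEATURE_BODY = {"enableIpAccessLists": "true"}


def build_ip_access_list_request(label, entries, list_type="ALLOW"):
    """Build the JSON body for POST /api/2.0/ip-access-lists.

    `entries` may contain single IP addresses or CIDR ranges.
    """
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError("list_type must be 'ALLOW' or 'BLOCK'")
    for entry in entries:
        # Raises ValueError if the entry is not a valid IP or CIDR range.
        ipaddress.ip_network(entry, strict=False)
    return {"label": label, "list_type": list_type, "ip_addresses": list(entries)}


if __name__ == "__main__":
    # Placeholder values: replace with your own office / VPN egress ranges.
    body = build_ip_access_list_request(
        "corp-office", ["203.0.113.0/24", "198.51.100.7"]
    )
    print(json.dumps(ENABLE_FEATURE_BODY))
    print(json.dumps(body, indent=2))
```

Validating the ranges locally before calling the API is a cheap way to catch typos that could otherwise lock people out of the workspace.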
3. Is it possible to access the Databricks workspace over private IP?
This was not available at the time of writing this blog, but access can be restricted using an IP access list. Please refer to #2.
4. How do I secure Data Plane to make sure my data is not leaving my virtual network boundary?
You can configure VNet injection so that the Azure Databricks data plane spawns its VMs within a virtual network that you manage. Please refer to the details here.
5. How do I remove the public IP addresses of the worker VMs in an Azure Databricks cluster?
Use Secure Cluster Connectivity (SCC); see the reference.
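Questions 4 and 5 are usually settled together when the workspace is deployed. As a rough sketch, assuming you deploy the workspace with an ARM template, the snippet below builds the custom parameters of the `Microsoft.Databricks/workspaces` resource that control VNet injection and SCC. The VNet resource ID and subnet names are placeholders, and the helper function is hypothetical. Please check the current template reference before relying on it.

```python
import json


def workspace_network_parameters(vnet_id, public_subnet, private_subnet,
                                 no_public_ip=True):
    """Return the 'parameters' section of a Databricks workspace ARM resource.

    customVirtualNetworkId / custom*SubnetName drive VNet injection;
    enableNoPublicIp turns on Secure Cluster Connectivity.
    """
    return {
        "customVirtualNetworkId": {"value": vnet_id},
        "customPublicSubnetName": {"value": public_subnet},
        "customPrivateSubnetName": {"value": private_subnet},
        "enableNoPublicIp": {"value": no_public_ip},
    }


if __name__ == "__main__":
    # Placeholder resource ID and subnet names.
    params = workspace_network_parameters(
        "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/"
        "Microsoft.Network/virtualNetworks/<vnet-name>",
        "databricks-public",
        "databricks-private",
    )
    print(json.dumps(params, indent=2))
```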
6. How do I restrict which users can access the Azure Databricks workspace?
Authentication is handled by Azure Active Directory, and location-based access can be restricted using an IP access list.
7. How do I connect from an Azure Databricks workspace to an on-premises network?
Please refer to this detailed documentation.
8. My cluster is configured using VNet injection, but when I run a notebook it only shows a refreshing icon. Why?
One of the reasons could be that your corporate firewall is blocking traffic based on domain. HTTPS and WebSocket traffic needs to be allowed. Please refer to this.
9. How do I enable / disable user access to Azure Databricks Workspace Objects — Tables, Clusters, Notebooks etc.?
Using the Azure Databricks ACL framework, admins can apply ACLs on workspace objects. Please refer to this.
10. What is credential passthrough and how can I enable it?
Credential passthrough allows users to authenticate automatically to ADLS Gen1 and Gen2 with the same Azure Active Directory identity they use to log in to Azure Databricks, without providing explicit credentials. Please refer to this.
11. How do I allow or prevent users from creating / managing clusters, or restrict them to specific VM types for worker nodes?
Use Databricks ACLs to control permissions on Databricks clusters (refer to this), and refer to this for setting ACLs on cluster pools.
Use cluster policies to enforce specific VM types for clusters (refer to this).
Note: ACLs are available in Premium workspaces only.
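To illustrate the cluster policy part of question 11, here is a minimal sketch of a policy definition that limits worker nodes to an allowlist of VM sizes. It assumes the policy is created through the Cluster Policies API (`POST /api/2.0/policies/clusters/create`), where the definition is passed as a JSON string. The policy name, the chosen VM sizes, and the auto-termination limit are illustrative assumptions only.

```python
import json

# Policy definition: each key is a cluster attribute, each value is a rule.
POLICY_DEFINITION = {
    # Only these VM sizes may be picked for worker nodes.
    "node_type_id": {
        "type": "allowlist",
        "values": ["Standard_DS3_v2", "Standard_DS4_v2"],
        "defaultValue": "Standard_DS3_v2",
    },
    # Clusters must auto-terminate within an hour of inactivity.
    "autotermination_minutes": {
        "type": "range",
        "maxValue": 60,
        "defaultValue": 30,
    },
}


def build_policy_request(name, definition):
    """Build the body for POST /api/2.0/policies/clusters/create.

    The API expects 'definition' as a JSON-encoded string, not an object.
    """
    return {"name": name, "definition": json.dumps(definition)}


if __name__ == "__main__":
    # "small-workers-only" is a made-up policy name.
    print(json.dumps(build_policy_request("small-workers-only",
                                          POLICY_DEFINITION), indent=2))
```

Once such a policy exists, you grant users permission to use the policy rather than unrestricted cluster creation, so they can only create clusters that satisfy it.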
12. Is it possible to set Column Level Security (CLS) and Row Level Security (RLS) on Databricks Tables?
CLS and RLS are not supported out of the box, but this functionality can be implemented using views and functions, starting with Databricks Runtime 7.3 LTS. Refer to these details.
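The views-and-functions approach can be sketched roughly as follows, assuming the Databricks SQL functions `is_member()` and `current_user()` are available on your cluster. The table, view, column, and group names are all hypothetical. The helper only builds the SQL text; in a notebook you would pass the result to `spark.sql(...)`.

```python
def masked_view_sql(view_name, table_name, privileged_group):
    """Return SQL for a view that masks a column and filters rows.

    Column masking: 'email' is shown only to members of privileged_group.
    Row filtering: other users see only rows where they are the owner.
    Column names (order_id, email, region, owner) are hypothetical.
    """
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n"
        f"  order_id,\n"
        f"  CASE WHEN is_member('{privileged_group}') THEN email\n"
        f"       ELSE 'REDACTED' END AS email,\n"
        f"  region\n"
        f"FROM {table_name}\n"
        f"WHERE is_member('{privileged_group}') OR owner = current_user()"
    )


if __name__ == "__main__":
    sql = masked_view_sql("sales.orders_secure", "sales.orders", "auditors")
    print(sql)
    # In a Databricks notebook: spark.sql(sql)
```

Users are then granted SELECT on the view only, not on the underlying table, so the masking and filtering cannot be bypassed.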
13. How do I make sure my credentials are not being misused while working with Databricks Notebooks?
In scenarios where credentials need to be specified in code (e.g. connecting to external sources, mounting folders in Databricks, or calling APIs with keys), users can store the credentials in a Databricks secret scope backed by Azure Key Vault. Refer to this.
14. How do I prevent exposing the credentials by accidentally printing them in notebooks or logs?
Please use Databricks Utilities (dbutils) to work with secrets. Secret values read this way are redacted in notebook output. Refer to this.
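As a small illustration of questions 13 and 14, the sketch below reads credentials through `dbutils.secrets.get(scope=..., key=...)` instead of hard-coding them. The scope and key names are placeholders. Because `dbutils` only exists inside Databricks, the helper takes it as an argument, and a tiny stand-in class (my own, purely for local runs) makes the example executable outside a notebook.

```python
def jdbc_properties(dbutils, scope, user_key, password_key):
    """Build JDBC connection properties from a secret scope.

    Nothing sensitive appears in the notebook source; values are
    fetched at run time from the (Key Vault backed) secret scope.
    """
    return {
        "user": dbutils.secrets.get(scope=scope, key=user_key),
        "password": dbutils.secrets.get(scope=scope, key=password_key),
    }


class _FakeSecrets:
    """Local stand-in for dbutils.secrets; NOT part of Databricks."""

    def __init__(self, values):
        self._values = values

    def get(self, scope, key):
        return self._values[(scope, key)]


class FakeDbutils:
    """Local stand-in for dbutils so the sketch runs outside a notebook."""

    def __init__(self, values):
        self.secrets = _FakeSecrets(values)


if __name__ == "__main__":
    # Placeholder scope / key names and dummy values.
    fake = FakeDbutils({
        ("kv-scope", "sql-user"): "svc_account",
        ("kv-scope", "sql-password"): "dummy-password",
    })
    props = jdbc_properties(fake, "kv-scope", "sql-user", "sql-password")
    print(sorted(props))  # print only the property names, never the values
```

In a real notebook you would call `jdbc_properties(dbutils, ...)` with the built-in `dbutils` object and pass the result to your JDBC reader.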
15. Does Databricks encrypt the Databricks File System (DBFS)?
By default, DBFS is encrypted using 256-bit AES encryption, which is FIPS 140-2 compliant. Double encryption is also possible. Please refer to these details.
16. How do I monitor access to data and other Databricks resources, e.g. workspace access and cluster start / stop / resize?
Use audit logging to get an overview of the activities performed; refer to this document.