A 2-Minute Guide to Unit Testing Your Databricks Notebooks

Amit Damle
3 min read · Dec 8, 2021

While working on Databricks, I have noticed that most developers (data engineers and data scientists) struggle to unit test their notebooks. Recently, I came across Nutter, a Git repository from Microsoft architects that provides a more elegant option for unit testing Databricks notebooks with minimal prerequisites.

The following is a 2-minute quick start guide to Nutter.

Why Unit Test Databricks Notebooks?

There is no special reason behind the “why”: as with any other code, unit testing helps identify logic issues early in the development cycle and helps maintain quality.

What are the Available Options for Unit Testing?

  1. Convert Databricks notebook cells into a Python library and unit test it
  2. Use pytest or doctest
  3. Databricks Connect

These options need some setup as a prerequisite; in other words, they are not so straightforward, at least for #1 and #2 (views may differ here).

Nutter is a simple-to-use Python library that helps unit test Databricks notebooks, either from the CLI or from a test notebook. It can also be easily integrated with a DevOps pipeline.

How do I start?

The simplest way to get familiar is to create a test notebook and install Nutter from PyPI on the Databricks cluster. That's it.

But what do I need to know before I start testing?

You need to know 3 things:

  1. How to Install: Use the Databricks Python library installation to install Nutter (the PyPI package is named nutter), or see the Nutter repository for the CLI
  2. Nutter Fixture: Your test class inherits from NutterFixture. A fixture is a collection of one or more test cases. Test cases are executed when the execute_tests() function is called, which returns the test result object
  3. Test Cases: Each test case has 1 required function (assertion_<testname>) and 3 optional functions (before_<testname>, run_<testname>, and after_<testname>)

Multiple test cases are executed in alphabetical order by name.

State-sharing attributes set in the constructor of the fixture class are available across test cases (please see the sample notebook in the dbc archive: test_nycdata_proc_notebook).
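To make the naming convention, the alphabetical ordering, and the shared state concrete, here is a minimal sketch in plain Python. It is a toy stand-in and not Nutter's implementation: MiniFixture, TripFixture, and the test names a_load and b_total are hypothetical, and the only things borrowed from Nutter are the before_, run_, assertion_, and after_ method prefixes and the execute_tests() name.

```python
# Toy stand-in for illustration only; this is NOT how Nutter is implemented.
class MiniFixture:
    """Finds test cases by method-name prefix and runs them in name order."""

    PREFIX = "assertion_"

    def execute_tests(self):
        # A test case exists wherever an assertion_<name> method is defined.
        names = sorted(
            attr[len(self.PREFIX):]
            for attr in dir(self)
            if attr.startswith(self.PREFIX) and callable(getattr(self, attr))
        )
        results = {}
        for name in names:
            try:
                # before_ and run_ are optional; assertion_ is required.
                for stage in ("before_", "run_", "assertion_"):
                    step = getattr(self, stage + name, None)
                    if step is not None:
                        step()
                results[name] = "PASSED"
            except AssertionError:
                results[name] = "FAILED"
            finally:
                # Simplification: this sketch always runs the optional cleanup.
                cleanup = getattr(self, "after_" + name, None)
                if cleanup is not None:
                    cleanup()
        return results


class TripFixture(MiniFixture):
    def __init__(self):
        # Attributes set in the constructor are visible to every test case.
        self.rows = []
        self.order = []

    def before_a_load(self):
        self.rows = [("driver_1", 2), ("driver_2", 3)]

    def assertion_a_load(self):
        self.order.append("a_load")
        assert len(self.rows) == 2

    def run_b_total(self):
        # Relies on self.rows populated by the earlier test case a_load.
        self.total = sum(passengers for _, passengers in self.rows)

    def assertion_b_total(self):
        self.order.append("b_total")
        assert self.total == 5


fixture = TripFixture()
results = fixture.execute_tests()
print(results)
```

In a Databricks test notebook, the equivalent class would inherit from NutterFixture (the Nutter README imports it from runtime.nutterfixture), and a run_ method would typically invoke the notebook under test.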

About The Sample

I have a sample Databricks notebook that processes the NYC data (sample data included) and performs the following:

  1. Counts records
  2. Adds a months column
  3. Calculates the number of passengers served by a driver in a given month

A notebook can either have functions that are called from different cells, or it can create a view (global, for sharing) or a table. The test notebook tests each of these scenarios.

Note: I am assuming that users are familiar with mounting storage in a Databricks workspace; if not, please refer to this guide
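As a rough illustration of the function-based scenario, the three steps of the sample can be sketched in plain Python with assertions that play the role of the test cases. This is only a sketch under assumptions: the real notebook works on Spark DataFrames, and the column names (driver_id, pickup_date, passenger_count), the function names, and the rows below are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows standing in for the NYC sample data.
trips = [
    {"driver_id": "d1", "pickup_date": date(2021, 1, 5), "passenger_count": 2},
    {"driver_id": "d1", "pickup_date": date(2021, 1, 20), "passenger_count": 1},
    {"driver_id": "d2", "pickup_date": date(2021, 2, 3), "passenger_count": 4},
]


def count_records(rows):
    """Step 1: count the records."""
    return len(rows)


def add_month(rows):
    """Step 2: return new rows with a month column derived from the pickup date."""
    return [{**row, "month": row["pickup_date"].month} for row in rows]


def passengers_by_driver_month(rows):
    """Step 3: total passengers served per (driver, month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["driver_id"], row["month"])] += row["passenger_count"]
    return dict(totals)


with_month = add_month(trips)
summary = passengers_by_driver_month(with_month)

# Checks of the kind an assertion_ method in the test notebook would make.
assert count_records(trips) == 3
assert all("month" in row for row in with_month)
assert summary == {("d1", 1): 3, ("d2", 2): 4}
```

Keeping each step in its own function is what makes it callable from a separate test notebook; for the view and table scenarios, the assertions would query the view or table instead.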

References —

# Nutter Repository on GitHub

# Sample

Disclaimer: Ideas / views expressed are personal opinions
