Databricks

Auto Loader is one of the standout features in Databricks, and this post introduces why you’d want to use it to address common data ingestion challenges.

This post is for anyone who is unaware that interactive Databricks clusters can be deleted 30 days after termination, unless the cluster is “pinned”.

In part 4, the final part of this beginner’s mini-series on handling bad data, we will look at how we can retain the flexibility to capture bad data and proceed uninterrupted.

Specifically, we’ll use the “badRecordsPath” option in Azure Databricks, which has been available since Azure Databricks runtime 3.0.

In the 3rd instalment of this 4-part mini-series, we will look at how we can handle bad data using PERMISSIVE mode. It is the default mode when reading data using the DataFrameReader, but there’s a bit more to it than simply replacing bad data with NULLs.

In the second part, we’ll continue to focus on the DataFrameReader class and look at the DROPMALFORMED option for removing bad data.

Receiving bad data is often a case of “when” rather than “if”, so the ability to handle bad data is critical in maintaining the robustness of data pipelines.

In this beginner’s 4-part mini-series, we’ll look at how we can use the Spark DataFrameReader to handle bad data and minimise disruption in Spark pipelines. There are many other creative methods beyond those discussed here, and I invite you to share them if you’d like.

Restoring a table is one of the most common restore scenarios. In this post, we’ll look at how one of Delta Lake’s neat features allows us to accomplish fast and simple table restores to previous versions.

Secret redaction within Databricks is a great feature that helps prevent unintentional exposure of your secrets. This post walks through a short demo of why we need to remain cautious about secret exposure, even with secret redaction in place.

This post shows how to quickly set up a managed identity for Databricks activities in Azure Data Factory (ADF) to eliminate the need to manage credentials.

Why you want Databricks Auto Loader

Pin Databricks Clusters

Part 4 - Bad records path

Part 3 - Permissive

Part 2 - Dropmalformed

Part 1 - Failfast

Delta Lake table restore

Secret redaction caution

Databricks managed identity setup in ADF