<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kimani Mbugua - Data and Technology blog</title><link>http://kimanimbugua.com/</link><description>Recent content on Kimani Mbugua - Data and Technology blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 05 Mar 2023 00:00:00 +0000</lastBuildDate><atom:link href="http://kimanimbugua.com/rss.xml" rel="self" type="application/rss+xml"/><item><title>Automating repo scaffolding with Azure DevOps</title><link>http://kimanimbugua.com/post/using-azure-devops-to-automate-cookiecutter-scaffolding/</link><pubDate>Sun, 05 Mar 2023 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/using-azure-devops-to-automate-cookiecutter-scaffolding/</guid><description>&lt;p&gt;Integrating Cookiecutter with yaml pipelines in Azure DevOps to automatically scaffold repos provides a simple, repeatable workflow and further minimises manual effort. This post will show how we can set up this automation in Azure DevOps via yaml pipelines.&lt;/p&gt;</description></item><item><title>Using Azure DevOps to run Cookiecutter templates</title><link>http://kimanimbugua.com/post/using-azure-devops-to-run-cookiecutter-templates/</link><pubDate>Sun, 19 Feb 2023 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/using-azure-devops-to-run-cookiecutter-templates/</guid><description>&lt;p&gt;Ever wondered how you can scaffold repos in a yaml pipeline? 
This post will show how to do this in Azure DevOps.&lt;/p&gt;</description></item><item><title>Using cookiecutter hooks to enhance code scaffolding</title><link>http://kimanimbugua.com/post/using-cookiecutter-hooks-to-enhance-code-scaffolding/</link><pubDate>Sun, 29 Jan 2023 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/using-cookiecutter-hooks-to-enhance-code-scaffolding/</guid><description>&lt;p&gt;Using Cookiecutter to scaffold code repositories offers a useful way to kick-start projects. To enhance the user experience even more, this post will look at using hooks to perform actions such as input validation and clean-up activities.&lt;/p&gt;</description></item><item><title>Scaffolding repos with cookiecutter</title><link>http://kimanimbugua.com/post/scaffolding-repos-with-cookiecutter/</link><pubDate>Sun, 15 Jan 2023 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/scaffolding-repos-with-cookiecutter/</guid><description>&lt;p&gt;For code development projects, we often end up creating code repositories (repos) that have similar structures or components to what we have previously used.&lt;/p&gt;
&lt;p&gt;Cookiecutter is a tool we can use to scaffold the creation of our repos, and this post will guide you through how to do this and more.&lt;/p&gt;</description></item><item><title>Why you want Databricks Auto Loader</title><link>http://kimanimbugua.com/post/databricks-autoloader-why-you-want/</link><pubDate>Sat, 27 Feb 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/databricks-autoloader-why-you-want/</guid><description>&lt;p&gt;Auto Loader is one of the standout features in Databricks, and this post will explain why you&amp;rsquo;d want to use it to address common data ingestion challenges.&lt;/p&gt;</description></item><item><title>Part 3 - Pre-commit hooks - SQL Linting</title><link>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-3/</link><pubDate>Sat, 13 Feb 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-3/</guid><description>&lt;p&gt;It can be a challenge to keep code formatted consistently, and without consistency, errors soon follow.&lt;/p&gt;
&lt;p&gt;In part 3 of this pre-commit hooks series, we&amp;rsquo;ll focus on how we can use pre-commit hooks in Azure git repos to automatically check for stylistic and programmatic errors in SQL scripts.&lt;/p&gt;</description></item><item><title>Part 2 - Detect secrets in Azure repos</title><link>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-2/</link><pubDate>Sat, 30 Jan 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-2/</guid><description>&lt;p&gt;Even with the advent of cloud computing and all manner of technology enhancements, exposing secrets seems to be a problem that won&amp;rsquo;t go away.&lt;/p&gt;
&lt;p&gt;Without the right controls in place, developers can leak secrets that can cause financial and reputational damage to an organisation.&lt;/p&gt;
&lt;p&gt;In part 2, we&amp;rsquo;ll look at how we can use a pre-commit hook to try and detect secrets in our code.&lt;/p&gt;</description></item><item><title>Part 1 - Pre-commit hooks in Azure repos</title><link>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-1/</link><pubDate>Sat, 23 Jan 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/azure-repos-pre-commit-hooks-part-1/</guid><description>&lt;p&gt;Having standards for code development is a necessity, but making sure those standards are followed can be a challenge.&lt;/p&gt;
&lt;p&gt;As human beings, we make mistakes and can overlook standards at the very moment we need to apply them.&lt;/p&gt;
&lt;p&gt;Central to that challenge is making sure standards are applied before changes are committed.&lt;/p&gt;
&lt;p&gt;In this series, we&amp;rsquo;ll look at taking on that challenge with pre-commit hooks. We&amp;rsquo;ll explore what pre-commit hooks are, why we might want to use them and how they work.&lt;/p&gt;</description></item><item><title>Moving to a Hugo static site</title><link>http://kimanimbugua.com/post/moving-blog-to-hugo-static-site/</link><pubDate>Fri, 31 Dec 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/moving-blog-to-hugo-static-site/</guid><description>&lt;p&gt;In this post, I&amp;rsquo;ll share my motivations for, and experiences of, moving my blog from Wix to a Hugo static site towards the end of 2021.&lt;/p&gt;</description></item><item><title>Pin Databricks Clusters</title><link>http://kimanimbugua.com/post/pin-databricks-clusters/</link><pubDate>Mon, 23 Aug 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/pin-databricks-clusters/</guid><description>&lt;p&gt;This post is for anyone who is unaware that interactive Databricks clusters can be deleted 30 days after termination, unless the cluster is &amp;ldquo;pinned&amp;rdquo;.&lt;/p&gt;</description></item><item><title>Part 4 - Bad records path</title><link>http://kimanimbugua.com/post/handling-bad-data-part-4-bad-records-path/</link><pubDate>Mon, 14 Jun 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/handling-bad-data-part-4-bad-records-path/</guid><description>&lt;p&gt;In part 4, the final part of this beginner’s mini-series on how to handle bad data, we will look at how we can retain the flexibility to capture bad data and proceed uninterrupted.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll look specifically at the “&lt;a href="https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/handling-bad-records"&gt;badRecordsPath&lt;/a&gt;” option in Azure Databricks, which has been available since Azure Databricks Runtime 3.0.&lt;/p&gt;</description></item><item><title>Part 3 - Permissive</title><link>http://kimanimbugua.com/post/handling-bad-data-part-3-permissive/</link><pubDate>Sun, 30 May 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/handling-bad-data-part-3-permissive/</guid><description>&lt;p&gt;In the 3rd instalment of this 4-part mini-series, we will look at how we can handle bad data using PERMISSIVE mode. It is the default mode when reading data using the DataFrameReader, but there’s a bit more to it than simply replacing bad data with NULLs.&lt;/p&gt;</description></item><item><title>Part 2 - Dropmalformed</title><link>http://kimanimbugua.com/post/handling-bad-data-part-2-dropmalformed/</link><pubDate>Mon, 17 May 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/handling-bad-data-part-2-dropmalformed/</guid><description>&lt;p&gt;In the second part, we’ll continue to focus on the DataFrameReader class and look at the &lt;strong&gt;DROPMALFORMED&lt;/strong&gt; mode to &lt;strong&gt;remove&lt;/strong&gt; bad data.&lt;/p&gt;</description></item><item><title>Part 1 - Failfast</title><link>http://kimanimbugua.com/post/handling-bad-data-part-1-failfast/</link><pubDate>Mon, 10 May 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/handling-bad-data-part-1-failfast/</guid><description>&lt;p&gt;Receiving bad data is often a case of “when” rather than “if”, so the ability to handle bad data is critical in maintaining the robustness of data pipelines.&lt;/p&gt;
&lt;p&gt;In this 4-part beginner’s mini-series, we’ll look at how we can use the Spark DataFrameReader to handle bad data and minimise disruption in Spark pipelines. There are many other creative methods beyond those discussed here, and I invite you to share them if you’d like.&lt;/p&gt;</description></item><item><title>Delta Lake table restore</title><link>http://kimanimbugua.com/post/delta-lake-fast-simple-easy-restores/</link><pubDate>Mon, 19 Apr 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/delta-lake-fast-simple-easy-restores/</guid><description>&lt;p&gt;One of the most common restore scenarios is restoring a single table. In this post, we’ll look at how one of Delta Lake’s neat features lets us perform fast and simple restores of a table to a previous version.&lt;/p&gt;</description></item><item><title>Key vault secrets in ADF pipelines</title><link>http://kimanimbugua.com/post/take-care-when-passing-key-vault-secrets-in-adf-pipeline-activities/</link><pubDate>Thu, 18 Mar 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/take-care-when-passing-key-vault-secrets-in-adf-pipeline-activities/</guid><description>&lt;p&gt;This short post looks at some considerations when using key vault secrets in Data Factory to securely pass information in pipeline activities. This is not an exhaustive list, but do take note.&lt;/p&gt;</description></item><item><title>Secret redaction caution</title><link>http://kimanimbugua.com/post/secret-redaction-caution/</link><pubDate>Thu, 18 Mar 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/secret-redaction-caution/</guid><description>&lt;p&gt;Secret redaction within Databricks is a great feature that helps prevent unintentional exposure of your secrets. 
This post walks through a short demo of why we need to remain cautious about secret exposure, even with secret redaction in place.&lt;/p&gt;</description></item><item><title>Databricks managed identity setup in ADF</title><link>http://kimanimbugua.com/post/set-up-azure-databricks-managed-identity-in-data-factory/</link><pubDate>Wed, 17 Feb 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/set-up-azure-databricks-managed-identity-in-data-factory/</guid><description>&lt;p&gt;This post shows how to quickly set up a managed identity for Databricks activities in Data Factory (ADF) to eliminate the need to manage credentials.&lt;/p&gt;</description></item><item><title>Cutting code and architects</title><link>http://kimanimbugua.com/post/how-much-technical-depth-should-architects-have/</link><pubDate>Mon, 15 Feb 2021 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/how-much-technical-depth-should-architects-have/</guid><description>&lt;p&gt;Over the years working on data platforms, I have seen architects writing less and less code, to the point where, in more recent times, a seemingly growing number of architects (in data projects) are writing no code at all.&lt;/p&gt;</description></item><item><title>Concurrency defaults in ADF</title><link>http://kimanimbugua.com/post/concurrency-defaults-in-adf/</link><pubDate>Mon, 09 Nov 2020 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/concurrency-defaults-in-adf/</guid><description>&lt;p&gt;In this short post, we&amp;rsquo;ll look at default concurrency values in ADF and the implications of changing them, or not.&lt;/p&gt;</description></item><item><title>ADF activity policy</title><link>http://kimanimbugua.com/post/adf-activity-policy/</link><pubDate>Mon, 26 Oct 2020 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/adf-activity-policy/</guid><description>&lt;p&gt;In this post, we&amp;rsquo;ll explore the Azure Data Factory (ADF) activity policy, its configuration and the implications of its default behaviour.&lt;/p&gt;</description></item><item><title>Specify dynamic JSON content in ADF</title><link>http://kimanimbugua.com/post/specify-dynamic-content-and-reference-key-vault/</link><pubDate>Sun, 11 Oct 2020 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/post/specify-dynamic-content-and-reference-key-vault/</guid><description>&lt;p&gt;This article shows how to use the JSON editor and key vault secret references in Azure Data Factory (ADF) to provide an alternative experience for linked service connectors that do not have built-in parameterisation support.&lt;/p&gt;</description></item><item><title/><link>http://kimanimbugua.com/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/about/</guid><description>&lt;h5 id="hi-im-kimani"&gt;Hi. I&amp;rsquo;m Kimani.&lt;/h5&gt;
&lt;p&gt;I help companies with the strategy, design and implementation of their data projects.&lt;/p&gt;
&lt;p&gt;Alongside more general subjects, I particularly enjoy sharing my &lt;a href="https://credentials.databricks.com/211b3ea2-310a-460f-98e8-31179384ca84"&gt;experience&lt;/a&gt; with data and cloud technologies.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m based in Sussex, England and outside of work you can find me playing cricket, hanging out with family or on the occasional scuba dive.&lt;/p&gt;
&lt;p&gt;&lt;img src="http://kimanimbugua.com/images/km-about-me.jpg" alt="kimani mbugua - about me"&gt;&lt;/p&gt;</description></item><item><title/><link>http://kimanimbugua.com/contact/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>http://kimanimbugua.com/contact/</guid><description>&lt;iframe src="https://docs.google.com/forms/d/e/1FAIpQLSeEMwIu9_yEcr2hQEoyB-dt1oU9uBnrFxUx4vE8A6Yojc731w/viewform?embedded=true" width="640" height="900" frameborder="0" marginheight="0" marginwidth="0"&gt;Loading…&lt;/iframe&gt;</description></item></channel></rss>