export default {
    id: '2020-03-31',
    year: 2020,
    month: 3,
    date: 31,
    title: `Developer Story: Dealing with Dirty Data`,
    blog_url: `https://keithvictordawson.medium.com/developer-story-dealing-with-dirty-data-48bea1a61856`,
    image_url: `https://images.unsplash.com/photo-1549299513-83dceea1f48b`,
    image_caption: `Photo by <a class="text--primary" href="https://unsplash.com/@markusspiske">Markus Spiske</a> on <a class="text--primary" href="https://unsplash.com/s/photos/data-management">Unsplash</a>`,
    contents: [
        {
            type: 'text',
            content: `One of the first things I did when I began working on my personal project at the beginning of October last year was to search for a source of data with which to seed the database that would serve as the backbone of the entire system that I was architecting. As the original goal of my personal project was to build a language learning system, the most important data that I could possibly use for seeding was a set of dictionary data or the closest equivalent. Luckily, I was able to find such a source relatively quickly after beginning my search. It turned out that the challenge was not in finding a source of data, but in whipping that data into a shape that could meet my exacting standards. As a software engineer, I have a deeply held and almost obsessive need for order in the software applications that I build. One of the places where that need holds most true is the database, where the lifeblood of a software system truly resides.`,
        },
        {
            type: 'text',
            content: `For about the first month of working on my personal project, almost all of my time was spent on the grueling task of sifting through the set of data that I had chosen to use. This was no easy task: there was literally a dictionary’s worth of data to work through, with nearly eighty thousand individual lines of text. That figure should not be confused with the number of entries, as many lines contained multiple entries. Cleaning up this rough set of data required a systematic approach to ensure that the end result was a clean and consistent dataset that I could load into a series of migration scripts to be executed against my system database. Throughout that first month, I developed an approach that I kept refining as I worked through the mountain of data, since each new kind of problem that I encountered also had to be looked for and cleaned up in all of the data that I had already covered.`,
        },
        {
            type: 'text',
            content: `By the end of that first month, I had two reasons for shifting my primary focus from cleaning up the data to actual software development. The first reason was that I had finally finished sufficiently cleaning up the first piece of the dataset. When I first chose the dataset, I knew that I would have to divide it across multiple migration scripts because there would be far too much data to put into a single one. I had concluded early on that the best way to make this division was to separate the data based on the first character of each entry. Once I finished cleaning up the first of these divisions, I knew that I had enough data with which to create the first migration script for the database. I also knew that the data in this first migration script would be enough to work with while developing the initial versions of the applications that would make up my overall software system.`,
        },
        {
            type: 'text',
            content: [
                {
                    type: 'text',
                    content: `The second reason for shifting focus from cleaning up data to software development was that I knew that I needed to complete work on preliminary versions of the applications for my software system so that I could have a canvas on which to expand my vision for the overall project. This canvas was one of the major milestones that I discussed in one of my `,
                },
                {
                    type: 'internal_link',
                    year: 2020,
                    month: 2,
                    date: 25,
                    content: `previous`,
                },
                {
                    type: 'text',
                    content: ` developer story entries. I did not want to wait any longer to begin work on those software applications, so I put the dataset aside and moved on to full-time software development. As I took this step forward, I knew full well that I would need to circle back to the dataset at some point in order to finish cleaning it up and getting it into a set of migration scripts for the database.`,
                },
            ],
        },
        {
            type: 'text',
            content: `After reaching the major milestone a little over a month ago, I decided that it was time to return to the task of cleaning up the dataset. After all, it could not be avoided forever, and the voice in the back of my head that constantly reminded me about the dirty data prevented me from fully focusing on the critical task of dreaming up new and interesting directions in which to take my personal project. So after another few weeks of cleaning up the dataset, I am much closer to having a set of data that is clean enough to be loaded into migration scripts and inserted into the database.`,
        },
        {
            type: 'text',
            content: `This whole data cleanup process has certainly taken far longer than I had planned at the beginning of my personal project, but I fully believe that completing the task and having all of that data in the system database will make the system far more useful and valuable than if I were to leave some of the data out. Without a full language learning system implemented in the overall cultural learning system that I am creating, the end goal that I had envisioned for this project would lose a lot of the core value that I had always intended it to have. That is unacceptable to me, which is why I have continued pushing forward through this admittedly mind-numbing at times but ultimately very important process. But just as at the beginning of my work on this project, I will not allow myself to get bogged down in the data cleanup for too long. I will be returning to work on the software applications shortly. Please stay tuned for more developer story entries as I move back to software development and continue to make progress on my personal project.`,
        },
    ],
}