Thursday, October 29, 2015

The DBA’s guide to the [new] galaxy: Part II

Part 2: Adapt to the change

In the previous post we talked about how the database world has changed and keeps changing. With this in mind, let’s continue the discussion.

Knowledge & Expertise
Let’s take me as an example -
I’ve been working with SQL Server for many years (and still am). I love this platform, I develop tools that make it more fun to work with, I’m active in the great SQL community, and I make sure to stay up to date with the latest builds coming out.

For many years, I’ve spent a lot of my free time extending my knowledge of SQL Server, while some of it came as part of my day-to-day roles.
Just like with any other technology, the knowledge and expertise one can gather (in SQL Server, for example) is almost endless - given enough time dedicated to it.

But when you try to look at this from 30,000ft you come to wonder -
At which point does knowing the really deep-dive material (like memorizing trace flags, knowing the internals of resource allocation and lock acquisition, mastering the in-depth tricks to force the server to build a specific execution plan, and the list goes on…) become a classic case of diminishing returns?

I argue that in most cases, knowledge gained beyond the orange marker is rarely required in practice.
Of course, given unlimited learning time and resources, go ahead and acquire all possible knowledge. But when this is not the case - maybe this time should be spent learning new things?
Instead of a DBA, why not become a multicultural DBA?

(And just to be clear - getting to the orange marker takes many years of experience!)

Getting out of your comfort zone

So to rephrase, what I’m trying to say is the following:
Given the data stores available today, and given the limited amount of learning time an average full-time employee has (say, per week) -
There is a higher probability that you will need to understand how to work with, say, Apache Spark than to know how to force an execution plan to use parallelism in SQL Server. Again, just an example.
Expanding your knowledge of various data storage/processing/retrieval engines, for at least some of the leading new platforms available today, is extremely important!

Please read the two lines ^above^ again. This is pretty much the essence of this article.

Being familiar with various data stores, as well as understanding theories like the “CAP Theorem” (below), is important for being able to choose the right platform for your next project.

Where should I begin?

451 Research created a beautiful map of data platforms:
Besides the actual content, it is apparent that the list is huge! Not only that - it is rapidly growing (consider that the report was created sometime around last year, so the list is bigger today).
It is practically impossible to dive into all of these products, let alone master every one of them.
So the approach I suggest is as follows:

  • Understand the concept and the key points of different data stores, as well as where they would fit in a high-level theory (like the CAP theorem above).
    So, assuming you already know ‘relational’ pretty well - focus on non-relational stores: key-value, document-based, and distributed frameworks.
  • Choose a leading platform from each area. This can be a popular/trending platform (such as Spark, Hadoop, MongoDB, Redis, Cassandra and others)
    or a framework that is already being used within your organization (which can help you connect the learning material with practical implementation).
  • Practice:
    Reading is fine, but hands-on experience is extremely important. If you are a Windows user you may need to install a Linux VM, as many of the new platforms are Linux-native.
    Some vendors offer a ready-to-go image. Some offer online simulators. Use those!

To emphasize the last item: if your natural [os] habitat is Windows, be aware that the vast majority of new systems are not Windows-native.
In fact, it makes sense that open-source services run on a free operating system. It’s time to refresh your Linux skills!

One more thing - Consider your existing knowledge as a great advantage!

For example, SQL (in its different variations) is still one of the most popular languages for querying data.
You will find a lot of SQL implementations well integrated into newer technologies (Hive and Spark SQL, to name only a few).
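
To make that point a bit more concrete, here is a minimal sketch of how far a plain SQL statement can travel. It uses Python's built-in sqlite3 module only as a stand-in engine, so it runs without installing anything. The `orders` table, its sample rows, and the `total_per_customer` helper are hypothetical names made up for this illustration.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration only.
ORDERS = [
    ("alice", 120.0),
    ("bob", 35.5),
    ("alice", 80.0),
    ("carol", 15.0),
]

# Plain, standard-looking SQL: nothing in it is specific to one engine.
TOTAL_PER_CUSTOMER = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""


def total_per_customer(rows):
    """Load rows into an in-memory SQLite table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(TOTAL_PER_CUSTOMER).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for customer, total in total_per_customer(ORDERS):
        print(customer, total)
```

Assuming a Spark session with a DataFrame registered as a temporary view named `orders`, the same statement text is the kind of thing you could hand to `spark.sql(...)`. Dialects do differ in the details, so treat this as an illustration of how much of your SQL carries over, not as a portability guarantee.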
Opening up to learning new data products will probably boost your résumé, but this is only a side effect. The main advantage is that it will help you choose the right data platform(s) for your next project!

More about that in Part 3...