Part 2: Adapt to the change
In the previous post we talked about how the database world has changed and is constantly changing. With this in mind, let's continue the discussion.
Knowledge & Expertise
Take me, for example: I've been working with SQL Server for many years (and still am). I love this platform, I develop tools that make it more fun to work with, I'm active in the great SQL community, and I make sure to always stay up to date with the latest builds.
For many years I've spent a lot of free time extending my knowledge of SQL Server, though some of it also came as part of my day-to-day roles.
Just like with any other technology, the knowledge and expertise one can gather (in SQL Server, for example) is almost endless, given enough time dedicated to it.
But when you look at this from 30,000 ft, you begin to wonder: at which point does knowing the really deep-dive material (memorizing trace flags, knowing the internals of resource allocation and lock acquisition, mastering the in-depth tricks that force the server to build a specific execution plan, and the list goes on…) become a classic case of diminishing returns?
I'd argue that knowledge gained beyond the orange marker is rarely required in practice.
Of course, given unlimited learning time and resources, go ahead and acquire all the knowledge you can. But when that isn't the case, maybe the time would be better spent learning new things?
Instead of a DBA, why not become a multicultural DBA?
(And just to be clear: getting to the orange marker takes many years of experience!)
Getting out of your comfort zone
To rephrase, here is what I'm trying to say: given the data stores available today, and the limited learning time an average full-time employee has (say, per week), there is a higher probability that you will need to understand how to work with Apache Spark, for example, than to know how to force an execution plan to use parallelism in SQL Server. Again, just an example.
Expanding your knowledge of data storage, processing, and retrieval engines, at least for some of the leading new platforms available today, is extremely important!
Please read the sentence above again. It is pretty much the essence of this article.
Being familiar with various data stores, as well as understanding theories like the CAP theorem (which says that when a network partition occurs, a distributed data store has to trade consistency against availability), is key to being able to choose the right platform for your next project.
Where should I begin?
451 Research created a beautiful map of data platforms:
https://451research.com/images/Marketing/dataplatformsmapoctober2014.pdf
Beyond the actual content, one thing is apparent: the list is huge! Not only that, it is growing rapidly (consider that the map was created around last year, so the list is even longer today).
It is practically impossible to dive into all of these products, let alone master them all.
So here is the approach I'd suggest:
- Understand the concepts and key points of the different data stores, as well as where they fit in a high-level theory (like the CAP theorem above). Assuming you already know relational databases pretty well, focus on non-relational stores: key-value, document-based, and distributed frameworks.
- Choose a leading platform from each area. This can be a popular or trending platform (such as Spark, Hadoop, MongoDB, Redis, Cassandra and others) or a framework that is already used within your organization (which can help you connect the learning material with a practical implementation).
- Practice: reading is fine, but hands-on experience is extremely important. If you are a Windows user, you may need to install a Linux VM, as many of the new platforms are Linux-native. Some vendors offer a ready-to-go image; some offer online simulators. Use those!
To emphasize the last item: if your natural [OS] habitat is Windows, be aware that the vast majority of new systems are not Windows-native. In fact, it makes sense that open-source services run on a free operating system. It's time to refresh your Linux skills!
One more thing: consider your existing knowledge a great advantage!
For example, SQL (in its different variations) is still one of the most popular languages for querying data. You will find many SQL implementations well integrated into newer technologies (Hive and Spark SQL, to name only a few).
Opening up to new data products will probably boost your résumé, but that is only a side effect. The main advantage is that it will help you choose the right data platform(s) for your next project!
More about that in Part 3...