Tuesday, October 20, 2015

The DBA’s guide to the [new] galaxy

| Prologue

I’ve been wanting to write this article for quite a while. Consider it a high-level summary of today’s data world, given from my own personal perspective.
Specifically, it is an overview for all of my fellow SQL DBAs (admins, developers, architects, warehouse/BI specialists) who see new data platforms being built around them,
then quietly wonder how to adjust to these changes and what the right approach should be. (Here’s a quick tl;dr hint: denial is not the right one!)

Are you ready? Let’s go!

| Part 1: A quick “12-steps” phase to acknowledge the data world is changing

Until not too long ago, a typical organization would hold its entire back-end data stack in one or more relational databases.
To handle the data, that typical organization would hire one or more DBAs responsible for tasks such as storage planning, administration, development, tuning, DR and more.
Also, a typical DBA would usually specialize in one (usually relational) database.

While this is still relevant, reality has forced changes on the traditional data world; here’s a brief summary of what changed and why:

Over the years, the volume of collected data has been growing exponentially!
There’s an ever-growing need to collect and store more data, while still keeping historical data.
The new data is often richer in content and structure, or sometimes intentionally lacks any formal pre-defined structure.
Storage prices are constantly dropping, especially for commodity storage,
which is becoming a popular alternative to a very high-end server connected to high-end storage.

The need to meet these ever-growing requirements with minimal cost friction has led to new platforms, services and frameworks being built, whether cloud-based or on-premises.

So, the data world has changed, dramatically!

Let’s look at these changes from a different angle:
  •  New products at new startups, as well as at existing organizations, will not necessarily (gently put) choose a highly priced relational data store (not naming names, but you know which ones), unless the business model specifically requires one.
  •  There are a *lot* of new technologies, mostly open-source, that have already matured enough for many organizations to trust them as their production source of record.
  •  Often, regardless of pricing, the required solution does not even fit a relational model. As a result, unlike a decade ago, you will see fewer and fewer relational databases being bent to handle workloads they were never designed for. Need some examples? Key-value stores, document databases, unstructured data, true scale-out (shared-nothing) architectures, queues, graph data and more.
  •  In addition to cutting down licensing costs, scaling out the data (both storage and processing) reduces the hardware cost. A high-end server can easily cost as much as a new house!

Given the above, once an organization is assured that a certain *free technology can handle its data services better than its costly rivals (or even just ‘good enough’), it is very likely to choose that technology.

*This “free” has its costs, but I’ll get to that later in the article.

Do you see this happening in your organization? If not, it’s just a question of when, not if.
If any of the above is news to you, or if you knew ‘something was going on’ but were gently ignoring it, you may feel a bit of discomfort. (If you do, that is totally fine.)

First, it is important to be aware of what’s out there.
Second, it is also important to keep in mind that relational databases are still very strong and dominant, and will stay that way for many good reasons.
In fact, for structured data with constraints and relationships to other data objects, where everything needs to be consistent, isolated and transaction-safe, there is no better solution than the relational model.
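
To make that last point a bit more concrete, here is a minimal sketch of what “constraints” and “transaction-safe” buy you. It uses Python’s built-in sqlite3 module purely as a stand-in for any relational engine; the `customers` and `accounts` tables and the `transfer` helper are hypothetical names made up for this illustration.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory schema (hypothetical tables for illustration)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE accounts (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               balance INTEGER NOT NULL CHECK (balance >= 0)
           )"""
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)", [(1, 1, 100), (2, 1, 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failed debit leaves work to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

The point of the sketch: the database itself, not the application code, refuses the overdraft and the orphaned row, and a half-finished transfer never becomes visible. Many of the newer platforms leave some or all of these guarantees to you.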

Let’s have a look at some graphs, shall we?

 (Note: these reflect the date of this article, so 2015-ish.)

So, breathe in, breathe out! Here are some “perspective” graphs:

The first one, from the DB-Engines website’s ‘popularity and trends’ ranking, shows that the commercial relational databases are still flying high above everyone else.
However, their scores are holding steady at best, if not slowly decreasing, while the newer “players” gradually rise.

The second graph is coming from Google Trends:

The Trends graph shows the same picture: while the “SQL Server” search term used here is still above the typical newer databases and services, the trend is pretty clear.
We can reasonably assume that search popularity roughly tracks actual usage.

OK, so the data world has changed. Now what?

Continued in Part 2