Wednesday, November 18, 2015

The DBA’s guide to the [new] galaxy: Part III

| Part 3: Use the force, Luke!

Choosing the right architecture --

Experience with a wide variety of data stores helps not only with choosing the right architecture, but also with vetoing bad designs confidently.
This goes both ways. Consider the following (intentionally exaggerated) scenarios:

Scenario 1:
A new service which keeps track of your company’s orders.
Data specs: about 100 orders per day are expected, and the information must be immediately consistent. Orders relate to products and customers.
Occasionally, some ad-hoc reports are needed, mostly to aggregate orders by customer, product or date range.
Selected solution:
The development team decided to save the orders to Redis, then use a queue to move the data into an HDFS store,
from which MapReduce jobs return the relevant data.

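So yes, the dev team got their hands on some new trending technologies, and they can add “NoSQL” and “Big Data” to their résumés, but in fact this is a very inappropriate solution for the problem. At about 100 orders per day, with relational data and a strict consistency requirement, a single relational database covers both the writes and the ad-hoc reports.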
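To make the contrast concrete, here is a minimal sketch of Scenario 1 on a plain relational store. It uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table names, columns and sample rows are assumptions, not part of the original scenario, and a production system would use whatever relational database the company already runs.

```python
import sqlite3

# In-memory database for illustration only; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product_id  INTEGER NOT NULL REFERENCES products(id),
        order_date  TEXT    NOT NULL,  -- ISO date, e.g. 2015-11-18
        amount      REAL    NOT NULL
    );
""")

# Writes happen inside one transaction, so readers see a consistent state.
with conn:
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(1, "Widget"), (2, "Gadget")])
    conn.executemany(
        "INSERT INTO orders (customer_id, product_id, order_date, amount) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, "2015-11-01", 10.0),
         (1, 2, "2015-11-02", 25.0),
         (2, 1, "2015-11-02", 10.0)],
    )


def orders_per_customer(conn, start, end):
    """Ad-hoc report: order count and total amount per customer in a date range."""
    return conn.execute(
        """
        SELECT c.name, COUNT(*), SUM(o.amount)
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE o.order_date BETWEEN ? AND ?
        GROUP BY c.name
        ORDER BY c.name
        """,
        (start, end),
    ).fetchall()


report = orders_per_customer(conn, "2015-11-01", "2015-11-30")
print(report)
```

The whole reporting requirement is a single `GROUP BY` query; no queue, no HDFS and no MapReduce jobs are involved.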

Here’s another example -

Scenario 2:
Track visitor interactions with the company’s app in real time on a heat map. In addition, show geolocation trends over time.
Data specs: the app is extremely popular, generating ~2500 interactions per second
Proposed solution:
Use a relational database on a very strong, high-end machine to be able to support the 2500/sec OLTP rate
Create additional jobs that run every few minutes to aggregate the time frames and perform IP lookups.

Although this might work, it is again not the preferred solution.
In this case a real scale-out architecture will perform better at a lower cost.
A typical ELK stack (Elasticsearch, Logstash, Kibana), for example, could handle this scenario more naturally.
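A quick back-of-the-envelope calculation shows how far apart the two scenarios are. The sketch below uses only the rates quoted in the data specs, and assumes (this is not stated in the scenarios) that those rates hold constant around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Rates taken from the two scenarios' data specs.
orders_per_day = 100             # Scenario 1
interactions_per_second = 2500   # Scenario 2

# Assumption: the rates are constant, with no peaks or quiet hours.
orders_per_year = orders_per_day * 365
interactions_per_day = interactions_per_second * SECONDS_PER_DAY
ratio = interactions_per_day // orders_per_day

print(f"Scenario 1: {orders_per_year:,} orders per year")
print(f"Scenario 2: {interactions_per_day:,} interactions per day")
print(f"Scenario 2 writes {ratio:,} times as many rows per day")
```

Scenario 1 produces 36,500 rows in a year, while Scenario 2 produces 216,000,000 in a single day. That gap of more than six orders of magnitude is why the same platform is unlikely to be the natural fit for both.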

Can you see where this is going? Knowledge is what lets you steer the architecture in the right direction.
Different specs require different platforms, and in many cases a mix of them.

Inappropriate platform selection can come from many sources, including:

  • Developers who want to experiment with different technologies
  • PMs/POs lacking the detailed knowledge
  • CTOs wishing to brag about using some new technology (preferably while playing golf with other CTOs)

Choosing the right platform for a given solution is extremely important, even when the recommended technology is not the trending one.

| Epilogue

The world of data is constantly changing. The variety of solutions for storing & retrieving data in various shapes and sizes is rapidly growing.
Relational databases are strong, and will stay strong - for storing and retrieving relational data, usually when no scale-out is required. For all the other types - the world has already changed.
If there’s one thing you should take away, it is that, looking forward, you should embrace these new technologies, get to know them and understand the architectural position of the dominant ones.

Links to Part 1, Part 2