Say no to NoSQL AKA NewSQL is the future!
Nosql image by John hoho (Own work) [CC BY-SA 4.0], via Wikimedia Commons
The rise of Big Data™
With the rise of Big Data™ and IoT we saw a large wave of NoSQL™ proponents. Everyone jumped on the bandwagon and the hype train, adopting these technologies for their persistence layers. It became a mantra, and it seemed like the end was nigh for relational (SQL) databases. Some companies were steadfast in using only NoSQL™ stores, claiming that was all their data needed, and would not hear another word about using a traditional "SQL" database. Unsurprisingly, SQL is not dead, but persistence layers have become more complex for many types of applications.
A small digression into NoSQL™ marketing, what it actually means, and why it really rubs me the wrong way. NoSQL™ is a catch-all term for non-relational databases. It began as "No SQL" to indicate casting off relational databases, which are queried using, at minimum, the ANSI SQL standard. NoSQL™ stores often use an implementation-specific query language or mechanism. In a relational database, data can relate to other data, generally through foreign keys and joins. A canonical example might be a User of your application who can have a set of Roles. Users are unique and Roles are shared among these users. This illustrates a Many-to-Many relationship between Users and Roles: every user may have some number of (shared) roles, and every role may belong to many users. A non-relational database does not inherently have this ability to relate data. As the movement has matured, some have taken to saying NoSQL™ means "Not only SQL", but I see through you. Let's just stop referring to databases as NoSQL™ and instead say non-relational, use their name, or name their specific database type.
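To make the User and Role example concrete, here is a minimal sketch using Python's built-in sqlite3 module. Since each user can hold several roles and each role is shared by several users, one common way to model this is a join table with foreign keys. The table names, column names, and sample rows are all assumptions made up for illustration, and SQLite is just a convenient stand-in for whatever relational database you actually run.

```python
import sqlite3

# Hypothetical schema: every table, column, and row here is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Join table: one row per (user, role) pairing.
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO roles (id, name) VALUES (?, ?)",
                 [(1, "admin"), (2, "editor")])
# alice holds both roles; the editor role is shared with bob.
conn.executemany("INSERT INTO user_roles (user_id, role_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])
conn.commit()


def roles_for(user_name):
    """Return the sorted role names held by one user, resolved via joins."""
    rows = conn.execute(
        """
        SELECT r.name
        FROM users u
        JOIN user_roles ur ON ur.user_id = u.id
        JOIN roles r ON r.id = ur.role_id
        WHERE u.name = ?
        ORDER BY r.name
        """,
        (user_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling roles_for("alice") walks the join table to collect her roles. The foreign keys also mean the database itself refuses a user_roles row that points at a user or role that does not exist, which is the "ability to relate data" in practice.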
Data will outlive you and is worth as much as your first child
Your data is almost without a doubt the most valuable part of your application
If your application is going to production, don't just jump at the hottest database, or any particular database, or only ONE type of database. As you scale up and gather more and more data, a properly designed database can mean the difference between enterprise data at your fingertips and indefinite wait times. The data in your application will almost definitely outlive your stay at your current employer and may even outlive you and your kids' kids. Sometimes, it's more valuable than you. Protect it.
Carefully consider the correct database(s) for your problem domain
If you don't understand which database you need, seek help and research how others have solved similar problems. Seems like a no-brainer, but common sense and smart practices are worth repeating. With so many different databases and database types (key-value, column, document, relational, graph, etc.), it's important to make sure you're using the right tool... or a combination of right tools. Please don't use HBase to build RBAC for your app and don't use MSSQL to store massively streaming time-series data. Leverage the right database for the data models in your problem domain.
Databases for Big Data™
Be honest with yourself: are you ACTUALLY going to have multiple billions of rows in your DB? If not, you're probably better off using PostgreSQL. If you are streaming a fire-hose of data, then make sure you evaluate the Big Data™ landscape to create a robust data pipeline. Most likely, your data lake or data warehouse won't be the only database servicing your applications. Work to have single sources of truth and different databases for different needs.
Backup, recovery, syncing, scaling strategies
HAVE A PLAN
- How and where are you backing up your database(s)?
- How will you recover when there is a failure (or many)?
- How do you sync across databases or regions?
- How do you scale your databases as load increases?
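As a small illustration of the first two questions, here is a sketch of "back up, then prove the backup restores" using Python's built-in sqlite3 module and its online backup API. The events table, the helper name backup_and_verify, and the in-memory "replica" are assumptions for the demo. A real plan for PostgreSQL, HBase, or anything else would use that system's own tooling, but the habit of verifying the copy carries over.

```python
import sqlite3


def backup_and_verify(source, target, table):
    """Copy source into target, then check that the copy is usable."""
    source.backup(target)  # sqlite3's online backup API (Python 3.7+)
    # A backup only counts if it restores: check integrity and row counts.
    intact = target.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    count = f"SELECT COUNT(*) FROM {table}"  # table name is trusted input here
    same_rows = (source.execute(count).fetchone()[0]
                 == target.execute(count).fetchone()[0])
    return intact and same_rows


# Hypothetical primary database with a few rows.
primary = sqlite3.connect(":memory:")
with primary:  # commits on success
    primary.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
    primary.executemany("INSERT INTO events (body) VALUES (?)",
                        [("signup",), ("login",), ("logout",)])

# Stand-in for wherever the backup really lives (another disk, another region).
replica = sqlite3.connect(":memory:")
backup_ok = backup_and_verify(primary, replica, "events")
```

If backup_ok ever comes back false, you want to find out during a routine check and not during an outage.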
Schema and design
I can't stress enough that unless you're making a proof of concept, you should take some time to design your database. Hot-spotting is a real concern, particularly in non-relational databases, which need a strong key design in order to perform optimally. Migrating data or changing schemas is costly and time-consuming. Consider whether you really have semi-structured data or truly "schema-less" data. Chances are good you have some kind of schema; you just might not know it yet because you don't yet understand the problem you are solving.
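To show what key design has to do with hot-spotting, here is a rough sketch of one well-known mitigation, key salting, for stores that partition data by sorted key range. Everything in it is an assumption for illustration: the sensor-N device ids, the bucket count of 8, the key layout, and the helper names naive_key and salted_key. It is not any particular database's API.

```python
import hashlib

NUM_BUCKETS = 8  # assumed bucket count; in practice sized to the cluster


def naive_key(device_id, timestamp_ms):
    # Timestamp first: every new write sorts right after the previous one,
    # so a range-partitioned store sends them all to the same partition.
    return f"{timestamp_ms:013d}#{device_id}"


def salted_key(device_id, timestamp_ms, buckets=NUM_BUCKETS):
    # Derive a stable bucket from the device id and put it first, so
    # concurrent writers spread across `buckets` separate key ranges.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}#{device_id}#{timestamp_ms:013d}"


# 1000 hypothetical devices all writing at the same instant.
now_ms = 1512068040000
keys = [salted_key(f"sensor-{i}", now_ms) for i in range(1000)]
buckets_used = {key.split("#")[0] for key in keys}
```

The trade-off is on the read side: one device's history still lives in a single bucket, but a scan over a time range across all devices now has to query every bucket. That is exactly the kind of decision that is cheap at design time and expensive to change after the data has piled up.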
NoSQL is dead, long live NewSQL
NewSQL is the answer to everything forever! Have the best of all worlds! Many of these databases are trying to provide a relational model with high scalability. As with all newer technologies, strive to understand what these systems provide and the problems they aim to solve. Please take a measured approach to their longevity and usefulness.
NoSQL - your data is counting on you
Stop saying NoSQL, because it's a dumb term. But more importantly, take care of your data. Store it and manage it appropriately and everyone will appreciate it. Only you can prevent timeouts and data mishaps that make everyone angry. Don't make everyone angry.
Interesting to peek over the transom at DB goings on some years after I retired from the IT business. Though the names have changed, the arguments seem familiar. I wonder if there remains any call for a hierarchical schema or higher-order normalization. Or has modern computing power reduced such ancient concepts to the ash heap? Actually, I'm hoping many of these problems have proven unsolvable, lest my oft-hacked personal data find its way into disadvantageous usage.
(Yes, I am related to Metaltoad's founder. It isn't his fault.)
Thu, 11/30/2017 - 18:54