Skip to main content

When to go from SQL to NoSQL databases?

While SQL has become lingua franca of relational database systems, SQL <> RDBMS. SQL is overwhelmingly  query language on RDBMS, terms, SQL database and RDBMS are used interchangeably.  I developed SQL features in RDBMS at Sybase, Informix & IBM. So, I feel good about it.

On the other hand, the rebels use NoSQL as an antonym to SQL.  While the popular expansion of NoSQL was Not-Only-SQL, it usually means the database does not support SQL... at least, it won't support good bit of SQL operations like joins, aggregations, etc.

Let’s assume, you have an application using SQL database.  Your boss returns from another big data conference — where they’ve invariably pitched the power of NoSQL — and asks you: “Josh,  why aren’t you moving to NoSQL?”.

While — volume, velocity, variety — give quick reasons to get to NoSQL, let’s look at the reasons and scenarios in bit more detail.

Let’s look at each criteria:

1. Critical Stable Application:   You’re running a mission critical application developed in-house that hasn’t changed in years. The database is reliable. Backup scripts work fine.  Report queries work reliably.
Backup & restore scripts are verified. Hot stand by is always ready. Your hardware can store & handle the data for next two years. And all the users you expect.

So, there is really no obvious reason to consider the alternative approach. 

2. A Very Dynamic Application: You’re attending meetings after meetings schema re-design?  You’re planning additional columns to the table in case it’s needed later on.  You add, unnamed_int, unnamed_varchar32, etc so when the request comes to add a new type, you can instantly provide that just by renaming the column.  You’re a hero!  However, just to decide the need for a particular column, there are numerous meetings.

Application is adjusting to market requirements.  Hence you need to change the data stored in tables — new data, new forms of data.  Any time you change the application, you need downtime.  And the changes never end. 

This is definitely a major symptom to consider NoSQL databases which usually have flexible schema. You only create the table and don’t specify the columns and types in NoSQL.  

In fact,in MongoDB, you don’t need to create the database or table ahead of time.
Simply start USEing a database, start to SAVE data into a table.  Database and tables are automatically created for you.  No column names to decide, no type lengths to think about.

3. Dynamic data sources: Many applications like single view of customer need to integrate new data sources: e.g. twitter feed of the customer or sales from the company you just acquired.  Changing data sources means changing the schema. Changing schema does not only mean you’re adding the columns — It also means, there could be conflicts that cannot be resolved. 

One data source has customer-id defined as integer and another as alphanumeric. That would force you to keep all the data in one superset data type — in this case, alphanumeric.  It would also mean, any existing data should be migrated.  Then, you need to change the SQL on this schema. Testing everything.

4. Scale-up and Scale-out:  I was watching shark-tank yesterday.  One of the contestants said, "year to date sales is 2 million dollars and we expect to close the year with 7 million dollars". Even the experienced sharks were surprised.  The reason for this was 75% of their sales come during christmas season.

Each business has variability due to seasons and events.  If you don't handle increased demand seamlessly, you'll risk losing revenue at the point of greatest profitability.

RDBMS can scale.  They've scaled from single processor systems to tens or even hundreds of processor-cores in a single machine, known as SMP -- symmetric multi-processor machines.    This is scale-up.  Every time you scale-up, you need planning, downtime, data migration and power-on procedures.  This is too costly, slow and risky.  

NoSQL systems were designed to scale-out.  When you reach system capacity, you simply add new systems into the cluster.  Data and workload gets distributed to the new configuration seamlessly.  No application downtime!   This is highly suited for modern commodity clusters and instant provisioning on the cloud.

5. TCO -Total cost of Ownership & Operation: Roughly, RDBMS vendors charge PER CORE.  Ivy Bridge from Intel has  12-core and 15-core model. NoSQL vendors charge PER NODE (machine within certain configuration limits).  The list price per node (defined as machine with up to 24-cores)  is typically the price of 2-cores.  So, 12 times difference right there!

So, it should be an easy call, right?  Unfortunately, not.  There is a huge feature difference between SQL and NoSQL systems. 

RDBMS does more than simple get and put of the row. Once you design a good schema, the number of SQL you can issue on this schema is infinite.  Data architects understand the data relationships and general application requirements and design the schema. Applications have flexibility on the queries they issue on this.  

In NoSQL, you're often encouraged to use the primary-key.  Often (e.g. HBase), primary-key is the only way to access the data efficiently. 

While RDBMS provides rich functionality, using the 80/20 hypothesis, majority of applications are using under 20% of the features.  Even the 20% of the RDBMS features used by these apps are often considered advanced NoSQL features!  However, many of them have workarounds that make migration worthwhile. When you consider the cost difference, some work arounds are worth it!

6. Read-Write Patterns: RDBMS generally assumes you write the data once and read it many-may times.  Assuming 80% SELECT and 20% INSERT/UPDATE/DELETE the typical data pattern.  Hence, a slower WRITE throughput on the system isn't penal. There are data warehousing situations where you're loading large amounts of data.  And there are many tools and techniques to do this.  Special loaders which load without the overhead of transactional log is one of them.  Other is to create a separate table, prepare the data in a separate table and then simply ATTACH that table into a single RDBMS table online.


NoSQL databases are designed to handle continuous stream of data at a very high speed.  And since you're advised to create minimal additional indices, the udpates will be eve faster.   The WRITE patterns in NoSQL suitable applications is typically much higher than 20%. 

SQL or NoSQL?

Going from a SQL to NoSQL system is a lengthy process. The pros-cons should be considered carefully. NoSQL systems offer exciting new technologies to solve many business problems now and will eventually mature to support complex operations and applications. 

Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

iQ Interactive: Cool Things for Developers on Couchbase Capella iQ

  The landscape of software development is ever-evolving with the advent of new technologies. As we venture into 2023, natural language processing ( NLP ) is rapidly emerging as a pivotal aspect of programming. Unlike previous generations of tools that primarily aimed at enhancing coding productivity and code quality, the new generation of Artificial Intelligence ( GenAI ) tools, like iQ, is set to revolutionize every facet of a developer's workflow. This encompasses a wide range of activities: Reading, writing, and rewriting specifications Designing, prototyping, and coding Reviewing, refactoring, and verifying software Going through the iterative cycle of deploying, debugging, and improving the software Create a draft schema and sample data for any use case Natural language queries. Generate sample queries on a given dataset Fix the syntax error for a query Don't stop here. Let your imagination fly. Although the insights garnered from iQ are preliminary and should be treated ...