Skip to main content

When to go from SQL to NoSQL databases?

While SQL has become lingua franca of relational database systems, SQL <> RDBMS. SQL is overwhelmingly  query language on RDBMS, terms, SQL database and RDBMS are used interchangeably.  I developed SQL features in RDBMS at Sybase, Informix & IBM. So, I feel good about it.

On the other hand, the rebels use NoSQL as an antonym to SQL.  While the popular expansion of NoSQL was Not-Only-SQL, it usually means the database does not support SQL... at least, it won't support good bit of SQL operations like joins, aggregations, etc.

Let’s assume, you have an application using SQL database.  Your boss returns from another big data conference — where they’ve invariably pitched the power of NoSQL — and asks you: “Josh,  why aren’t you moving to NoSQL?”.

While — volume, velocity, variety — give quick reasons to get to NoSQL, let’s look at the reasons and scenarios in bit more detail.

Let’s look at each criteria:

1. Critical Stable Application:   You’re running a mission critical application developed in-house that hasn’t changed in years. The database is reliable. Backup scripts work fine.  Report queries work reliably.
Backup & restore scripts are verified. Hot stand by is always ready. Your hardware can store & handle the data for next two years. And all the users you expect.

So, there is really no obvious reason to consider the alternative approach. 

2. A Very Dynamic Application: You’re attending meetings after meetings schema re-design?  You’re planning additional columns to the table in case it’s needed later on.  You add, unnamed_int, unnamed_varchar32, etc so when the request comes to add a new type, you can instantly provide that just by renaming the column.  You’re a hero!  However, just to decide the need for a particular column, there are numerous meetings.

Application is adjusting to market requirements.  Hence you need to change the data stored in tables — new data, new forms of data.  Any time you change the application, you need downtime.  And the changes never end. 

This is definitely a major symptom to consider NoSQL databases which usually have flexible schema. You only create the table and don’t specify the columns and types in NoSQL.  

In fact,in MongoDB, you don’t need to create the database or table ahead of time.
Simply start USEing a database, start to SAVE data into a table.  Database and tables are automatically created for you.  No column names to decide, no type lengths to think about.

3. Dynamic data sources: Many applications like single view of customer need to integrate new data sources: e.g. twitter feed of the customer or sales from the company you just acquired.  Changing data sources means changing the schema. Changing schema does not only mean you’re adding the columns — It also means, there could be conflicts that cannot be resolved. 

One data source has customer-id defined as integer and another as alphanumeric. That would force you to keep all the data in one superset data type — in this case, alphanumeric.  It would also mean, any existing data should be migrated.  Then, you need to change the SQL on this schema. Testing everything.

4. Scale-up and Scale-out:  I was watching shark-tank yesterday.  One of the contestants said, "year to date sales is 2 million dollars and we expect to close the year with 7 million dollars". Even the experienced sharks were surprised.  The reason for this was 75% of their sales come during christmas season.

Each business has variability due to seasons and events.  If you don't handle increased demand seamlessly, you'll risk losing revenue at the point of greatest profitability.

RDBMS can scale.  They've scaled from single processor systems to tens or even hundreds of processor-cores in a single machine, known as SMP -- symmetric multi-processor machines.    This is scale-up.  Every time you scale-up, you need planning, downtime, data migration and power-on procedures.  This is too costly, slow and risky.  

NoSQL systems were designed to scale-out.  When you reach system capacity, you simply add new systems into the cluster.  Data and workload gets distributed to the new configuration seamlessly.  No application downtime!   This is highly suited for modern commodity clusters and instant provisioning on the cloud.

5. TCO -Total cost of Ownership & Operation: Roughly, RDBMS vendors charge PER CORE.  Ivy Bridge from Intel has  12-core and 15-core model. NoSQL vendors charge PER NODE (machine within certain configuration limits).  The list price per node (defined as machine with up to 24-cores)  is typically the price of 2-cores.  So, 12 times difference right there!

So, it should be an easy call, right?  Unfortunately, not.  There is a huge feature difference between SQL and NoSQL systems. 

RDBMS does more than simple get and put of the row. Once you design a good schema, the number of SQL you can issue on this schema is infinite.  Data architects understand the data relationships and general application requirements and design the schema. Applications have flexibility on the queries they issue on this.  

In NoSQL, you're often encouraged to use the primary-key.  Often (e.g. HBase), primary-key is the only way to access the data efficiently. 

While RDBMS provides rich functionality, using the 80/20 hypothesis, majority of applications are using under 20% of the features.  Even the 20% of the RDBMS features used by these apps are often considered advanced NoSQL features!  However, many of them have workarounds that make migration worthwhile. When you consider the cost difference, some work arounds are worth it!

6. Read-Write Patterns: RDBMS generally assumes you write the data once and read it many-may times.  Assuming 80% SELECT and 20% INSERT/UPDATE/DELETE the typical data pattern.  Hence, a slower WRITE throughput on the system isn't penal. There are data warehousing situations where you're loading large amounts of data.  And there are many tools and techniques to do this.  Special loaders which load without the overhead of transactional log is one of them.  Other is to create a separate table, prepare the data in a separate table and then simply ATTACH that table into a single RDBMS table online.


NoSQL databases are designed to handle continuous stream of data at a very high speed.  And since you're advised to create minimal additional indices, the udpates will be eve faster.   The WRITE patterns in NoSQL suitable applications is typically much higher than 20%. 

SQL or NoSQL?

Going from a SQL to NoSQL system is a lengthy process. The pros-cons should be considered carefully. NoSQL systems offer exciting new technologies to solve many business problems now and will eventually mature to support complex operations and applications. 

Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

Accelerating Insights With Couchbase Columnar

Insights come from observing and analyzing data from all sources. Couchbase Columnar helps quickly analyze data from a variety of data sources with zero-ETL. The purpose of computing is insight, not numbers. -  Richard Hamming Capella  Columnar  is an advanced real-time analytics database service from  Couchbase , targeted for real-time data processing, offering SQL++ for processing  JSON  (semi-structured) data and more. This service enables data to be managed locally and streamed continuously from both relational and NoSQL databases, or simply process data on S3. The columns or fields of the source are directly mapped to a field in the JSON document at the destination automatically. This is really a zero-ETL operation. A key feature of this system is its ability to continuously stream data, making it immediately available for querying, thus ensuring near real-time data processing.  At the heart of Columnar's architecture is a Massively Parallel Proce...