Skip to main content

Couchbase N1QL Transaction: An Elastic and Scalable Distributed Transaction

 

SQL is the only 22nd-century language available for developers today.

Abstract

In relational database systems, SQL is more than a declarative query language. It includes procedural language (T-SQL, PL/SQL, etc) and defines transactions and their semantics. SQL as a query language has been unreasonably effective even in NoSQL database systems. However, few NoSQL database systems support transactions. The ones that support come with a long list of limitations and/or were unable to support SQL operations within the transaction.  We introduce and explain transactions in Couchbase N1QL: SQL for JSON.  N1QL transactions are multi-everything: multi-document, multi-bucket, multi-scope, multi-collection, and multi-DML-statement.

N1QL Transactions is now generally available with Couchbase 7.0. You can download it here. See the documentation here.

Introduction

Couchbase Cluster infographic.

N1QL is a declarative language to manipulate JSON. Couchbase stores all the documents in the data service.  The query service orchestrates the query execution optimizing the query, creating an execution plan, and then executing it using data, indexing, and FTS.  The Couchbase SDK and query interaction protocol are built via REST over HTTP/S. N1QL DML statements include SELECTINSERTUPDATEUPSERTDELETE, and MERGE.  

N1QL Transactions

Here’s an example of transactions in RDBMS and Couchbase N1QL:

Transactions
MySQL Database (Statements are same/similar in Oracle, SQL Server, Informix & DB2)
Couchbase Database (7.0)
Insert data. Tuples in MySQL, JSON documents in Couchbase
INSERT INTO customer(cid, name, balance) VALUES(4872, “John Doe”, 724.23);
INSERT INTO customer(cid, name, balance) VALUES(1924, “Bob Stanton”, 2735.48);
INSERT INTO customer
VALUES(“cx4872”, {“cid”: 4872, “name”:”John Doe”, “balance”:724.23});
INSERT INTO customer
VALUES(“cx1924”, {“cid”: 1924, “name”:”Bob Stanton”, “balance”:2735.48});
Simple transaction, debit, and credit. Intermediate selects have to be read their own updates (RYOW)
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
COMMIT ;
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from the customer;
COMMIT ;
The second transaction with partial rollback.
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
SAVEPOINT s1;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
ROLLBACK WORK TO SAVEPOINT s1;
SELECT cid, name, balance from customer;
COMMIT ;
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
SAVEPOINT s1;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
ROLLBACK WORK TO SAVEPOINT s1;
SELECT cid, name, balance from customer;
COMMIT ;

If you didn’t see much difference, that’s because there isn’t. 

N1QL Transactions Statements 

N1QL transactions are a set of transactions that include any of the DML statements in all forms: no restrictions. Transactional protection is issued from new statements: BEGIN/START, COMMIT, ROLLBACK, and SAVEPOINT.

Start Transaction (Same as the Begin Work Statement)

This statement starts a new transaction, assigns a new transaction ID, and returns the transaction ID to the caller. There are two rules the SDKs, tools (e.g. CBQ shell) follow to successfully execute the rest of the transaction.

  1. Send this transaction ID as a parameter for every subsequent statement within the transaction. This is how the query service knows the statement should be run as part of a particular transaction.
  2. Couchbase can have multiple query service nodes, but a single transaction is executed on a single query node. You can start a new transaction in ANY QUERY NODE. However, the rest of the statements FOR THAT SINGLE TRANSACTION should be sent to the VERY SAME query node.

Commit Transaction or Commit Work

This commits all the changes in the transaction to the data store. This is a distributed commit of the transaction into the Couchbase key-value data store. The commit supports all of the Couchbase durability options. This is still a distributed system — just like astronomy, rare things happen often. On any failure, the complete transaction is rolled back automatically and the application needs to retry the transaction. Failures can occur for various reasons: network failure, node failure, node overloaded, and write-write conflict.  Just like direct Couchbase WRITEs are optimistic and failures can occur due to concurrent writes resulting in CAS conflicts, N1QL transactions can also fail due to write conflicts. We implement a form of optimistic concurrency approach to transactions.

Rollback Transaction or Rollback Work

On an application-issued rollback, all of the modifications done within the transaction are rolled back.

As you’ve seen from the examples above, N1QL also supports savepoints and rollbacks to the savepoints within the transaction.  From the application perspective, these work the same as the RDBMS counterparts.

Transactional Features 

The transaction is more than just the statements — it’s all about the semantics and guarantees.  Hence the ACID definition. We talked about atomicity earlier wrt to COMMIT.  Let’s talk a bit more about this.

  • ATOMICITY: This is required for both the whole transaction and each statement. The DML statements will atomically roll back on any failure, but the transaction itself is open and can be continued. An example of a failure is a document key conflict on insert.
  • CONSISTENCY: This ensures the constraints are applied consistently for each statement. The only constraint in Couchbase is the unique constraint on the document key.  N1QL checks for the pre-existence of each of the keys inserted and rolls back the statement on any conflict.  Remember we use optimistic concurrency control.  That means, even after the INSERT is successful,  the commit stage can still run into a write-write conflict because some other session could have been inserted between the insert and commit. You’ll have to retry the transaction on such failures.
  • ISOLATION: We support the COMMITTED READ isolation level. All the data that’s read and evaluated is committed data in the index and data store. By default, we use the stringent request_plus consistency on the index reads.  This means, for a given predicate, we use the latest data in the index to qualify the documents to qualify select/update/delete. We then go the extra step to fetch the documents from the KV store and re-apply the predicates to ensure the latest committed version of the document is qualified and updated.

If performance wasn’t a consideration, everyone would have used serializable transactions :-). You can change the scan consistency to unbounded for improved index scan performance.

Durability

N1QL supports all of the durability options and features with the Couchbase data store to ensure durability on our distributed database.

ConcurrencyConcurrency infographic.


Broadly speaking, database transactions use either pessimistic or optimistic concurrency control. Traditional single-node databases follow the pessimistic concurrency control to avoid conflicts.  This approach is also applied to some of the multi-node implementations like Oracle RAC, DB2 Sysplex. Multi-node implementations are possible but require expensive Infiniband, custom hardware, etc.

Optimistic concurrency control version each base unit of the tuple (rows in RDBMS, documents in Couchbase), remember the version they read to modify, and check if the version has changed during the write.  If there’s indeed a conflict, the whole transaction has to be retried. The advantage of this approach is, in a well-designed application, there should be few conflicts: you won’t withdraw cash and transfer money between accounts in the same nanosecond.  On the rare occasion you do, the retry is tolerated.

Concurrency in the N1QL Query Service

Couchbase N1QL uses optimistic concurrency control. Each transaction reads the documents it needs to update, updates them, and keeps the updated documents in its private per-transaction cache. When you issue a subsequent statement, the query service is aware of the updated documents within the transactions and uses that version instead of the older version reflected in the index/data.  This is how it provides READ-YOUR-OWN-WRITE support.  

This is modeled so all the DML statements, all operations (select, join, project, aggregate, nest, unnesst, etc) will all get this RYOW benefit providing a consistent and crucial feature of the transaction. Even while the application is doing transactions (doing both reads and writes), within the transaction we’re reading and caching the updates until the commit time. This is the READ phase of the transaction, And because of this approach, there’s no coordination between multiple transactions or multiple query nodes within a transaction until commit (WRITE phase). This ensures the performance and scalability of distributed transactions in Couchbase. And, before you ask,  all these work concurrently with Couchbase distributed transactions we released in 6.5.

Coordination is the bane of scalable systems — Peter Bailis

Next Steps

This is a short overview of what’s coming up with Couchbase Transactions.  In the upcoming series of articles, we’ll dive further into implementation, usage, SDK support, Lambda Transactions, Spring support, etc, and more.

Acknowledgments

N1QL transactions are the result of intense work and collaboration in Couchbase query, SDK, and QE teams to design and implement. Thank you!

Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

iQ Interactive: Cool Things for Developers on Couchbase Capella iQ

  The landscape of software development is ever-evolving with the advent of new technologies. As we venture into 2023, natural language processing ( NLP ) is rapidly emerging as a pivotal aspect of programming. Unlike previous generations of tools that primarily aimed at enhancing coding productivity and code quality, the new generation of Artificial Intelligence ( GenAI ) tools, like iQ, is set to revolutionize every facet of a developer's workflow. This encompasses a wide range of activities: Reading, writing, and rewriting specifications Designing, prototyping, and coding Reviewing, refactoring, and verifying software Going through the iterative cycle of deploying, debugging, and improving the software Create a draft schema and sample data for any use case Natural language queries. Generate sample queries on a given dataset Fix the syntax error for a query Don't stop here. Let your imagination fly. Although the insights garnered from iQ are preliminary and should be treated ...