Skip to main content

Introduction to Couchbase for Oracle Developers & Experts: Part 1: Architecture


INTRODUCTION

You know Oracle, but need to a good overview of Couchbase, a modern NoSQL database? You're in the right place. This will speed up your understanding of Couchbase.  It’s not intended to evaluate all the features or performance.  Since Oracle is a relational database with SQL as the main query language, it's normal for developers to view other databases from SQL point of view. In this article, we'll compare and contrast Oracle with Couchbase from SQL point of view.  You'll get a good feel for the architecture and capabilities of Couchbase. We won't do a deep dive into high-availability, scale-out and sizing the databases in this series.  

This blog series will cover the following topics, comparing Oracle and Couchbase from a developer perspective.

  1. Architecture
  2. Database Objects
  3. Data types
  4. Data Modeling
  5. Statements and Features
  6. Indexes
  7. Optimizer
  8. Transaction
Here’s the overview comparison from logic to language, transactions to tooling.   In rest of the blogs in the series, we’ll cover each of the topics.

Both Oracle and Couchbase were developed to handle enterprise transactional and analytical workloads.  They coexist in enterprises.  It’s with this sensibility, we’re providing a comparison of these products to help Oracle developers to understand what’s Couchbase.


ARCHITECTURE

SINGLE NODE DEPLOYMENT
Oracle Single node
COUCHBASE

CUSTER DEPLOYMENT
Oracle Cluster
ORACLE RAC (CLUSTER) deployment (SHARED DISK)
-------------
Couchbase Cluster
COUCHBASE CLUSTER deployment (SHARED NOTHING)
-----------------
Couchbase MDS deployment
COUCHBASE Multi-dimensional deployment (SHARED NOTHING)
-----------------


TopicOracleCouchbase
Single Node DeploymentOracle was built  and optimized for deploying in a single node database (in the deployment shown in 2).  It was then enhanced to run on SMP (symmettric multi-socket/core processing) machines.  All the layers of a database (logical, physical space management, locking, logging, are all managed for a single machine).   The code executes in a single thread (or cooperating threads) with resources like locking designed for a single machine.

This configuration is the majority of the development and production deployment. When you run out of disk capacity, you add capacity.  When you run out of CPU capacity, you migrate the workload to a larger machine with more sockets, more cores.  Or these days, bigger VMs or containers.

The APP for server communication uses proprietary oracle protocol. Higher level SDKs work on top of this.

Couchbase server can be installed on a single server or a single node or a VM/container for development: example MacOS, Linux or Windows.  All of the database functions are abstracted into services (data, index, query, search, analytics) which work in concert to provide the database functionality. These services co-ordinate to expose the services via REST API.  The applications use Couchbase SDKs or tools to create indexes, run N1QL queries, get results and visualize data.

This configuration is used mainly for development on your laptop, cloud, VM or a container. Single node deployment is rarely for production. In fact, Couchbase gives you a warning when you’ve deployed on a single node there’s only one copy of your data!  High availability is built into the core.

The API for server communication is via REST API and Memcached binary protocol.  All SDKs use REST or Memcached protocols underneath.

LanguagesSQL, PL/SQL, XQuery, SES (Secure Enterprise Search)N1QL (SQL for JSON), Javascript, Python functions (with N1QL), Direct data access API, FTS (full text search).
ConnectionsA persistent connection between user process and database process/thread running on behalf of the client application.HTTP over TCP/IP connection.
Connection for Memcached. Each request for data fetch, and modification, query is a REST call.
SessionA session starts from CONNECT to QUIT. Each application “session” has a corresponding session on the Oracle server. Within a session, you can execute a 1 query or one million queries spanning multiple transactions.Just like connection, in Couchbase, each request in a distinct request. Each request has to carry both bind parameters and any context parameters (e.g. timeout). All the N1QL statements for a single transaction execute on the same query node; concurrent transactions can execute on any number of nodes.
User, authenticationSupports OS user, database user. Authentication can be done by OS, database, third party like LDAP, kerberos.Supports OS user, database user. Authentication can be done by OS, database, third party like LDAP, kerberos.
DatabaseOracle instance
->Databases
    ->Schema
       -> Tables
           -> Rows
               -> Columns

One Oracle instance can have multiple databases.
One Oracle database can have multiple schema  and each schema can have multiple tables. Each table can have multiple partitions, indexes,  constraints, triggers, etc.

Cluster
-> Buckets
     -> Scopes
         -> Collections
              -> JSON documents
                  -> Fields
One Couchbase instance can have up to 30 BUCKETS.  Each bucket can have many scopes, each scope many collections.  Within each scope, you can also create functions (in SQL, Javascript, Python).

Each JSON document will have a user provided unique document key.

E.g: “cust.x817.022.4u2”
“cust.x2317.402.2742”
“parts.lkfh38.sldkv”
“parts.lkfh38.sldkv”
“parts.lkfh38.sldkv”

TableTableCollection
ColumnColumnField
All the documents in Couchbase have to be JSON documents conforming to http://json.org/
Consider a simple 1-level JSON document.
Document Key: “cust:2984”
Doc: {“a”:1, “b”: True, “c”: “Hello”}
In your mind map, you can think of this document as a row, individual attributes, “a”, “b”, “c” as columns, the document key as a primary key.
This document contains simple scalar values.JSON can contain arrays and objects, arrays of objects, objects containing arrays. Attribute names are referenced from top down (think of objects in Oracle.This will be discussed in details in “Database Types” section.
JSON support   JSON is a type for a columnJSON is the model for the whole document
SDKshttp://bit.ly/20TCGn0

http://bit.ly/2FoiFP8

https://www.couchbase.com/downloads  (scroll down to SDKs) 
Multi-node architecture
(Homogeneous deployment).
Multi Dimensional DeploymentNot Available. Each node in Oracle RAC (Real Application Cluster) has full Oracle SQL capability including the transactional and administrative capability. 

Each node in the cluster can have one or more combination of the services: Data, Query, Index, Search,  Analytics and eventing. The cluster manager (co-ordinator) is aware of the distribution and availability of the services and will inform the services about each other.

The SDK is also aware of the data distribution, query and FTS nodes. SDK tries to balance the workload among different query nodes for query workload, data nodes for data/KV workload, FTS nodes for search workload.

System ArchitectureSingle node deployment is suited for SMP.
Multi-node deployment share the same disk/storage. Hence, Shared Disk System. You can scale up to a point by adding more compute and sharing the same disk. Eventually, shared disk and IO throughput becomes the bottleneck. Transactions is supported by distributed lock manager which reaches threshort with a small number of nodes. Developer is oblivious (for most part) to the instances: single node or multi-node RAC. The SQL supported are the same and transactions work seamlessly in a multi-node environment.Oracle has developed a sophisticated distributed lock manager, buffer pool synchronization, etc to achieve this. It does require the expensive infiniband to maintain the speed and performance.

Despite this, it’s not uncommon to partition the workload between multiple RAC nodes to minimize the conflicts in locking, etc.

Uses shared nothing architecture from the ground up enabling Couchbase to scale out. Each node can scale up to use the resources in each node; database functionality is implemented as co-operating multiple services running on a single system. Coordination is done via message passing, even on a single system.

This services (data, index, query, search, analytics, and eventing) lend themselves to scale out seamlessly. You can have all of the services in each node and simply add new nodes with the same services. All the services will understand the multi-node topology.  This makes Couchbase elastic.

Applications don’t use all of the services uniformly. The bottlenecks could be on data, indexer of the query. So, in Couchbase, you size and add new nodes simply to run the bottlenecked services. This will provide optimal resource utilization and better performance compared to a homogeneous deployment. It’ll gives you flexibility and reduce your cost. The multi-dimensional scaling is easy to deploy as any other configuration and manageability remains the same.

QueryOracle has Full fledged SQL support.
The SQL extensions include support for JSON and text search.
Oracle also has XML DB, supporting XML, XQuery, SQL/XML, etc.
N1QL (SQL for JSON) provides declarative language that's an enhanced version of SQL for JSON. It comes with indexes and a full fledged query engine, patented cost based optimizer, transaction support, UDF (userd defined functions - aka stored procedures) to execute queries efficiently. Details at : https://query.couchbase.com
Same language is supported for operational workload (OLTP) and analytical workload (OLAP), just like RDBMS. Developers also get a simple GET and SET API for each document in a collection.

Couchbase also supports builtin SQL and JavaScript functions in the query service to support equivalent of PL/SQL functionality. 

Couchbase FTS (Full Text Search) helps you create text index and search.  This is fully integrated with N1QL
http://bit.ly/2vbcbOF
High AvailabilityOracle’s MAA (Maximum Availability Architecture) recommends: “On the Cluster configuration screen, in the Disk Group Details section, MAA recommends choosing Oracle Automatic Storage Management (ASM) HIGH redundancy for ALL (DATA and RECO) disk groups for best protection and highest operational simplicity”

Oracle has facility to create hot standby servers, schema based replication, change capture and probably more.

Within a single cluster, you can have multiple copies of data and index just by specifying the number of copies you’d like to have. You can have up to 3 copies of the data and any number of index copies. No additional setup required.

Across-multiple clusters, you can replicate the data with built-in XDCR (cross-data center replication).

TransactionsMulti-statement, ACID with multiple isolation levelsSupports multi-statement ACID transactions with READ-COMMITTED isolation level.  Since these are distributed transactions, it comes with configurable durability as well.  Oracle uses pessimistic locking; Couchbase uses optimistic locking (using CAS) for its simple updates as well as multi-statement transactions.  
DriversJDBC, ODBC, .NET, LINQ and moreCouchbase SDK (Java, .NET, LINQ, PHP, Python, Go), Simba JDBC/ODBC, JDBC driver for tableau
Data ModelSupports both normalized and denormalized data model.Denormalized (aggregated) JSON model.
Relationships between multiple type of documents (e.g. Orders to Customers) can be represented and processed. Either the child (orders) or the parent (customer) can store the primary key of the related documents and then JOIN them. This relationship is implied, not enforced by a constraint (e.g. Foreign Key Constraint) in Couchbase. 
High level SQL-N1QL Comparison:

Here's a sneak peak of the comparison between the Oracle's SQL model versus Couchbase's NoSQL(JSON) model for objects, language and query processing. We'll discuss these in detail in rest of this blog series.


SQL N1QL comparison
(Click on the image to view PDF)


Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

iQ Interactive: Cool Things for Developers on Couchbase Capella iQ

  The landscape of software development is ever-evolving with the advent of new technologies. As we venture into 2023, natural language processing ( NLP ) is rapidly emerging as a pivotal aspect of programming. Unlike previous generations of tools that primarily aimed at enhancing coding productivity and code quality, the new generation of Artificial Intelligence ( GenAI ) tools, like iQ, is set to revolutionize every facet of a developer's workflow. This encompasses a wide range of activities: Reading, writing, and rewriting specifications Designing, prototyping, and coding Reviewing, refactoring, and verifying software Going through the iterative cycle of deploying, debugging, and improving the software Create a draft schema and sample data for any use case Natural language queries. Generate sample queries on a given dataset Fix the syntax error for a query Don't stop here. Let your imagination fly. Although the insights garnered from iQ are preliminary and should be treated ...