Skip to main content

Introduction to Couchbase for Oracle Developers and Experts: Part 7: Optimizer

To database what Yoda is to Star Wars, optimizer is. - Yoda

Separating the HOW (logical, physical representations, access methods) from WHAT (the SQL query) was the genius of the relational model. SQL has been unreasonably effective for relational model and beyond. A good optimizer is critical for SQL irrespective of the data model. The job of an optimizer is to produce an efficient execution algorithm, commonly known as the query plan or simply the plan.  Oracle is a RDBMS, used a rule based optimizer(RBO) for many years before  switching to Selinger style cost-based optimizer.  Couchbase N1QL is a declarative query language that is SQL for JSON. Couchbase N1QL also started with a rule based optimizer.  In 2021, we added a cost based optimizer(CBO) for the query service. CBO for the analytics service is on the roadmap.  JSON is flexible, supports complex objects, arrays in documents and predicates.   Couchbase's patented cost based optimizer builds on and extends the traditional CBO concepts to JSON’s flexible, nested model. Just like Oracle, RBO and CBO coexist in Couchbase.   This article describes the query optimizer in the query service of Couchbase.  Couchbase analytics service still uses the rule based optimizer. 


*See the full intro to Couchbase for Oracle developer series here


Below is a typical flow of query execution. 


The job of the optimizer is to transform the query into a query plan.:

To achieve that, at a high level, query optimizer does the following steps: 

  1. Rewrite a  query to its logical equivalent to make the optimizer easier/better

    1. E.g. Expression rewrites, subquery flattening

  2. Join reordering (aka join enumeration)

    1. Determine the most efficient join order

  3. Choose efficient access method for each data source and join method

    1. Choose the right index, predicate push downs,  join type (e.g. hash join or nested loop join) for each join

  4. Create a physical plan for the query engine to execute.


Oracle Resources on Optimizer: 

SQL Maria has great byte sized videos explaining the Oracle Optimizer.  Much of the Oracle features and descriptions in this article from them.  Oracle’s SQL tuning guide is also a great resource for learning about the Oracle optimizer. Optimizer is the key to the success of SQL and performance of your queries.


Let’s deep dive into each of the optimizers and compare the features. 


ORACLE

Couchbase

RESOURCES


Optimizer Concepts

Optimizer Statistics

Optimizer hints

Advisor

Oracle CBO Book


Rule Based Optimizer

CBO documentation

CBO Blog and Stats blog

The CBO Talk


QUERY TRANSFORMATION (REWRITES) (chapter 5)


INLIST handling

IN list handling.

We’ve seen queries with 25,000 values in the IN list! N1QL optimizer handles various forms of IN list efficiently by using multiple index lookups on both static and dynamic list by creating a hash map which has O(1) lookup complexity during IN clause evaluation. This was added in Couchbase 6.5.

Subqueries:

Significant subquery optimizations via query rewrites. See chapter 5.4


Subqueries. There are simple optimizations for both correlated and non-correlated suqueries. Subquery unnesting (flattening) or rewriting subqueries as joins isn’t done yet.

Materialized view rewrites. Used mainly in data warehouse workload.

Couchbase does have materialized views (simply called views) that execute map-reduce functions. These views have a separate API to access and are not used by N1QL.  

Star join transformations and others.

Query service is targeted for OLTP like workload. Many of the transformations like star transformations are useful for data warehousing workload. 

JOIN METHODS


Nested loop

Nested loop

Hash join

Hash join. With RBO, you need to specify the build or the probe side of the has join for one of the collection; CBO determines probe and build sides based on the statistics.  The query service does limit the number of items for the hash table to 16 million to avoid the memory bloat. .

Sort-merge join

Unavailable

Join reordering is an important step in optimization; Join order can change a query execution time from hours to minutes and vice versa. It’s based on the cardinality estimates create a plan with the lowest cost.

Couchbase 7.1 adds cost based join-reordering optimization (aka, join enumeration). Prior to 7.1, the join order is the same as the order specified in the FROM clause of the SELECT statement.

ACCESS METHODS


Full table scan; Reads all the rows in the table in without any logical order. Rarely done in an OLTP application and use only where index paths are unavailable.

Couchbase collection is a hash distributed table and doesn’t support the exact full table (collection) scan.  However, you can create a primary index and that provides equivalent functionality. In addition, you can filter the data based on the document key and you can also exploit the key-ordering in the index 

E.g. FROM t ORDER BY META(t).id

The primary index can also be used to efficiently filter and paginate the result sets. 

FROM t 

WHERE META(t).id LIKE “CA::%”

ORDER BY META(t).id

OFFSET 100

LIMIT 10; 

Pro tip: Consider using keyset pagination.

Table access by rowid

Document key is the equivalent of rowid. Index scans return a set of document keys to query service which uses a look up by rowid to retrieve the document for further processing.  The applications can also retrieve documents by specifying one or more keys


SELECT * FROM t USE KEYS [“k1”]

SELECT * FROM t USE KEYS [“k1”, “k2”, “K8”, “K4”]

Index unique scan

Index scan with limit 1

Index range scan

Index range scan is the main workhorse for the query service to use the indexer. Query specifies the spans for the index to filter and return the results.  This range scan can also skip the keys in case of a composite index.  E.g. index on (c1, c2, c3) can be used with index spans on c1 and c3 for WHERE c1 = 20 and c3 > 40; 

Index skip scan

The optimizer does a statistical analysis of the  leading columns to scan and filter values in non-leading columns.  This can reduce the number of indexes you need in the syste. 

Unavailable.

Full Index Scan

Full index scan are used depending on the predicate and cost.  

E.g.   WHERE c1 IS NOT MISSING ORDER BY c1; 

Fast Full Index Scan

[All the columns are in the index and used for order by]

This is simply ordinary index scan with optimizer matching the order by expressions with index key order; Additional optimization by pushing down the pagination clauses (OFFSET, LIMIT) to the index scan

Index join

In Couchbase parlance, it’s called multi-index scan. Multiple indexes are used to evaluate a single collection, results of which can be joined with other indexes. 

Bitmap indexes

Not available

Array indexes are known as multivalue indexes in Oracle. They need to have a a special code remove duplicates from the qualified rowids returned from the index scan. 

Array index scan is similar in Couchbase.  For arrays, there could be multiple index entries for a single document. Hence duplicate document keys returned by the index scan needs to be removed. Couchbase array indexes can itself have an array constructor, complex expressions and therefore is more flexible. 



Statistics Collection


ANALYZE statement

ANALYZE  statement

Same as

UPDATE STATISTICS statement

Automatic statistics collection via DBMS package. 

Unavailable

EXPLAIN


Multiple ways to generate the query plan. The plan is tabular with indentation to represent the order.

  1. EXPLAIN PLAN

  2. V$SQL_PLAN (Plan for statement in cursor cache)

See the full guide here.

Single way to generate the query plan. The plan generated is in JSON and therefore the query execution tree is correctly represented without having to guess.

  1. EXPLAIN. Explained here and here.


Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c

Graphical explain plan via query work bench. 

Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c

When you execute the query, the profile will give the same plan with different shades of color to give you a sense of the expense. Clicking on each box will give you stats on that iterator. 

Hints: Extensive hints are available.

Hints: Extensive hints are available. N1QL hints are similar to the other Oracle database, MySQL than Oracle database. 


Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

iQ Interactive: Cool Things for Developers on Couchbase Capella iQ

  The landscape of software development is ever-evolving with the advent of new technologies. As we venture into 2023, natural language processing ( NLP ) is rapidly emerging as a pivotal aspect of programming. Unlike previous generations of tools that primarily aimed at enhancing coding productivity and code quality, the new generation of Artificial Intelligence ( GenAI ) tools, like iQ, is set to revolutionize every facet of a developer's workflow. This encompasses a wide range of activities: Reading, writing, and rewriting specifications Designing, prototyping, and coding Reviewing, refactoring, and verifying software Going through the iterative cycle of deploying, debugging, and improving the software Create a draft schema and sample data for any use case Natural language queries. Generate sample queries on a given dataset Fix the syntax error for a query Don't stop here. Let your imagination fly. Although the insights garnered from iQ are preliminary and should be treated ...