Skip to main content

Introduction to Couchbase for Oracle Developers & Experts: Part 6: indexing


"Use the Index, Luke!"  -- https://use-the-index-luke.com


ORACLE

Couchbase

OVERVIEW


Index Documentation

Index Documentation

Types of Indexes: 

Primary & secondary Index (B-tree)

Bitmap Index

Partial Index

Partitioned Index

Function-based index

Spatial index

Search indexes (full text search)

Types of Indexes: 

Primary & secondary index (lock-free skiplist)

Partial index

Partitioned index

Functional-key index

Array index

Flex indexes

Search index

Spatial index


Indexing data structures

B-Tree

Bitmap

Spatial

Inverted tree

Indexing data structures

Lock-free skiplist

Inverted tree (text)

Z-curve (spatial)

CLASSES of INDEXES


Oracle indexes which can be large and is managed thru bufferpools.  Oracle text index can be creed with inmemory option.

Couchbase has two classes of indexes: Standard secondary index which can be large and relevant entries are paged in based on usage, memory optimized index, optimized for performance is entirely kept in memory and a write is done to the disk for recovery purposes.

INDEXING FEATURES


Updates;

Indexes are updated synchronously.  Changes are visible within the transaction (read your own writes).

Updates:

The indexes are updated asynchronously. However, within each transaction, changes are visible immediately (read your own writes) for all the access methods (including index scan).  Yes, we do magic!

Table Scans

For SQL indexes are optional. You need it for improving query latency, throughput and meeting SLAs. Tables have internal mechanisms to scan the entire table, partitioned or otherwise.   

Collection Scans

Couchbase collections are hash partitioned distributed table/collection. You can retrieve a document iff you have the document key. There isn’t a full scan access method just in the collection. You should build a PRIMARY INDEX which provides the equivalent of table scan for collection. You can issue arbitrary queries on it.

Index consistency

Index updates are synchronous and are visible based on the isolation level. 

Index consistency:

Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place.  These subtle features are taken for granted in a single system and transactional RDBMS.  These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

TYPES OF INDEXES


PRIMARY KEY index

CREATE TABLE t1(c1 int primary key)

CREATE TABLE t1(c1 int, constraint c1pk PRIMARY KEY(c1))

You can create a primary key one one or more columns and index is automatically created to enforce this primary key constraint.

PRIMARY KEY index

CREATE PRIMARY INDEX ON t1;

CREATE PRIMARY INDEX ip1 ON t1; 

Each JSON document you insert into Couchbase has a separate, user generated, unique per collection document key that can be up to 250 bytes. The primary key is simply an index on the document keys.  The uniqueness is enforced by the collection without the need for primary key.

SECONDARY index

CREATE INDEX i1 ON t(c1)

CREATE INDEX i1 ON t(c1, c2, c3)

CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)

These are the workhorses for an OLTP workload. These form the kernel of the index and can be combined with most other features to form sophisticated, sometimes complex indexes to power the workload. This is Luke’s lightsaber. 

SECONDARY index

CREATE INDEX i1 ON t(c1)

CREATE INDEX i1 ON t(c1, c2, c3)

CREATE INDEX i1 ON t(c1 ASC, c2 DESC, c3 DESC)

The secondary indexes are similar to Oracle at a high level. Couchbase indexes do not support reverse scans. If you do need them, you need to specify the DESC order in the index definition. You can define that for each key.  Each type of index will be used for all types of range scans, it only makes a difference in performance for order-by query optimization. 

PARTIAL index

Partial Index gives us the ability to create both local and global indexes on only a subset of partitions within a partitioned table. Prior to Oracle 12c , you could not create indexes on selective partitions; Indexes always meant on all of the data.” 

PARTIAL INDEX

The concept here is similar to PostgreSQL than Oracle.

You can create indexes on any arbitrary subset of documents and the optimizer will choose the index automatically and when appropriate.

CREATE INDEX i1 ON t(c1) where c2 = “USA”;

CREATE INDEX i2 ON t(c1, c2) where c3 IN [“C”, 23, 24];

CREATE INDEX i3 ON t(c1) WHERE c4 LIKE “xyz%”

FUNCTION based index

Instead of indexing the column value AS-IS, you index the result of a function or an expression on it.

CREATE INDEX i1 ON t(LOWER(c1));

CREATE INDEX i2 ON t(c1 + (c2 * c4));


FUNCTION based index

The functionality is similar.

CREATE INDEX i1 ON t(LOWER(c1));

CREATE INDEX i2 ON t(c1 + (c2 * c4));


PARTITION index

Oracle has the widest functionality and support for partitioning for table and therefore Index: range, list, hash, interval, reference and all. All of these combined with other features makes it powerful. This is mainly targeted for data warehousing where you’re potentially analyzing large sets of data and “logically pruning” partitions to scan for the query has significant benefit.

PARTITION index

Couchbase collections are always hash partitioned. By default, the index is global and in a single partition. You’ll have defined the partitioned index as part of CREATE INDEX. Couchbase supports hash partitioned index, but this can be on any arbitrary expression.  Just like any hash partitioned object, queries with equality or IN expression on the partitioning key will benefit from partition pruning; others will get the benefit from parallel scans.

For partitioning by range, you’ll have to use partial index syntax and create multiple indexes.  Oracle-like range or interval index syntaxes are unavailable. 

DOMAIN indexes (Search)

You can index and query text for language awareness, stemming, etc using Oracle text.  These indexes are used by the optimizer when you have CONTAINS, CATSEARCH, or MATCHES predicates. Oracle text supports only character types and cannot index numerical or datetime data types. 

Full Text Search (Search)

Text search is similar to Oracle. Couchbase FTS can index text(string), numbers, booleans and datetimes making the search usable in a larger number of use cases.  Couchbase FTS has an elaborate query language and all of its features can be used by N1QL using SEARCH() predicate.  N1QL also exploits the FTS index for its FLEX indexing optimization for the queries. Flex index is a technology where a single index in FTS can support arbitrarily complex predicates in N1QL.  This helps when users are given flexibility to choose custom predicates for their reports. 

Use cases:

  1. Flexible query support: Index on fields (a, b, c) can have predicates (a= 10 and (b = 20 or c between 30 and 40)); It can also have predicates ((a = 10 or c = 20) or (b between 10 and 20); etc, etc.

  2. Dynamic query support: when you create this flex index with (dynamic property set to true), FTS will index all the fields in the document, including the ones unknown at the time of index creation. This takes flexibility to new heights. Index evolves as the schema evolves. 

  3. All the cases above support searching in addition to range queries.

  4. In addition to search, FTS allows simple aggregation known as facets.

  5. FTS also helps efficient processing of multiple array predicates because it can index any number of arrays in a single index efficiently

The full text search has a fuller functionality to enable modern agile development and effective search.

DOMAIN indexes (Spatial)

These indexes non-tradition types like point, line, polygon, etc with ability to issues filters like overlaps, contains and nearest neighbor.  

Couchbase SPATIAL index

Couchbase spatial index uses z-curve data structure and is integrated into FTS indexing and querying. This also enables you to index and query scalars, arrays, text and spatial using a single index!  You can learn more here, here and here 

ARRAY index

Oracle calls this MULTIVALUE index – essentially, the index will have multiple entries pointing to the same doing. A normal index will has one index entry per row.


CREATE MULTIVALUE INDEX mvi_1 ON mytable t

      (t.jcol.credit_score.numberOnly());



ARRAY index

Array is THE difference between the relational model and the JSON model.   — Gerald Sangudi

As we saw in the data model section, it’s easy to store array, but difficult to index it.  Couchbase supports the most generalized array indexing in databases. From simple array indexes on a field, set of fields, and expressions on each one of those. As mentioned above, Using FTS, Couchbase can index multiple array fields in a single index and pushdown multiple query predicates to the index scan to improve performance. 

  

BITMAP index

Unavailable. 

Index High Availability

Oracle allows creation of multiple indexes with the same index definition. So, if one of the indexes is unavailable (redbuild/etc), the other can be used for queries.

Index High Availability

You can simply create additional copies of the index by specifying the num_replica parameter to CREATE INDEX statement. The query engine automatically distributes the the workload between the indexes based on the performance and workload. 

Index consistency

Index updates are synchronous and are visible based on the isolation level. 

Index consistency:

Index updates are asynchronous and are visible based on index scan consistency. In Couchbase transactions, the index scan levels are request_plus by default and read-your-own-writes is always in place.  These subtle features are taken for granted in a single system and transactional RDBMS.  These options are available in modern distributed systems so applications can trade off consistency for availability and performance.

INDEX ADVISOR

Oracle’s SQL advisor includes index advisor among other things.

INDEX ADVISOR

Couchbase has ADVISE statement, ADVISOR function and advisor service. 



With so many types of indexes, the choices for the optimizer is plenty. In the next article, we compare the two optimizers!



Comments

Popular posts from this blog

Swami Vivekananda: The Monk That Nobody Sent to Chicago

  There’s a saying in Chicago: “We don’t want nobody that nobody sent.” This was the cold reception Swami Vivekananda faced when he arrived in the windy city in July 1893, determined to attend the World Parliament of Religions that September. He belonged to no organization, carried no letter of recommendation, his countrymen were nobody, and represented an alien religion to the Western world. As the days passed, his hope of attending the parliament dwindled. With money running out and the odds stacked against him, he left the Windy City and went to Boston, praying for a glimmer of opportunity.  Swamiji came to America to share India’s most profound gift: the wisdom of the Hindu sages, preserved through centuries of oral tradition and embodied by its monks. This was 1893, not 1993—India was under the British grip, its resources drained, and its spirit subdued. Swamiji’s mission was not just a cultural exchange; it was a bold step toward envisioning a future where India could re...

Why Should Databases Go Natural?

From search to CRM, applications are adopting natural language and intuitive interactions. Should databases follow? This article provides a strategic perspective. Amid the many technological evolutions in software and hardware (CISC/RISC, Internet, Cloud, and AI), one technology has endured:  Relational Database Systems   (RDBMS), aka SQL databases. For over 50 years, RDBMS has survived and thrived, overcoming many challenges. It has evolved and adopted beneficial features from emerging technologies like object-relational databases and now competes robustly with   NoSQL databases .  Today, RDBMS dominates the market, with four of the top five databases and seven of the top ten being relational. RDBMS has smartly borrowed ideas, like JSON support, from NoSQL, while NoSQL has also borrowed from RDBMS. NoSQL no longer rejects SQL. From a user perspective, all modern databases have SQL-inspired query language and a set of APIs. All applications manage the respective data...

iQ Interactive: Cool Things for Developers on Couchbase Capella iQ

  The landscape of software development is ever-evolving with the advent of new technologies. As we venture into 2023, natural language processing ( NLP ) is rapidly emerging as a pivotal aspect of programming. Unlike previous generations of tools that primarily aimed at enhancing coding productivity and code quality, the new generation of Artificial Intelligence ( GenAI ) tools, like iQ, is set to revolutionize every facet of a developer's workflow. This encompasses a wide range of activities: Reading, writing, and rewriting specifications Designing, prototyping, and coding Reviewing, refactoring, and verifying software Going through the iterative cycle of deploying, debugging, and improving the software Create a draft schema and sample data for any use case Natural language queries. Generate sample queries on a given dataset Fix the syntax error for a query Don't stop here. Let your imagination fly. Although the insights garnered from iQ are preliminary and should be treated ...