Monday, 30 June 2025

Oracle Indexes: What BLEVEL, LEAF_BLOCKS, and CLUSTERING_FACTOR Really Tell You

BLEVEL: The GPS Navigation System
Imagine you're trying to find a book in a massive library. BLEVEL tells you how many floors you need to climb to reach your destination.

What BLEVEL really means:
- 0: You're already on the right floor (tiny indexes)
- 1: One elevator ride to the correct section (most common)
- 2+: Multiple elevator rides and stair climbs (very large tables)

Real Story:  BLEVEL is your index's "navigation depth." It counts how many levels Oracle must traverse from the root block to reach your data. Think of it as the number of "hops" your query makes through the index structure.
When to Worry:  A BLEVEL that suddenly jumps from 1 to 3 without a corresponding data increase might signal index fragmentation. But don't panic—modern Oracle versions handle this pretty well.
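A quick data-dictionary query can keep an eye on this (the threshold here is a rule of thumb, not a hard limit, and assumes your statistics are current):

```sql
-- List indexes with unusually deep navigation; BLEVEL >= 3 on a
-- modest-sized index is worth a second look
SELECT index_name, blevel, leaf_blocks, num_rows
FROM   user_indexes
WHERE  blevel >= 3
ORDER  BY blevel DESC;
```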

LEAF_BLOCKS: The Storage Bill
LEAF_BLOCKS is straightforward: it's how much disk space your index is consuming at the leaf level.
The Simple Truth:  More data = more leaf blocks = bigger index size.

Performance Reality Check:
- Single-row lookups: Don't care much about leaf blocks (you only read one)
- Range scans: Care a lot (you read multiple blocks)

Pro Tip: If your LEAF_BLOCKS is growing faster than your table data, you might be over-indexing or indexing the wrong columns.
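One way to spot this is to compare each index's leaf-level footprint against its table's size — a rough heuristic that assumes fresh statistics:

```sql
-- Compare each index's leaf blocks to its table's blocks; a ratio
-- near (or above) 1 suggests a very wide index key relative to the row
SELECT i.index_name,
       i.leaf_blocks,
       t.blocks AS table_blocks,
       ROUND(i.leaf_blocks / t.blocks, 2) AS index_to_table_ratio
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  t.blocks > 0
ORDER  BY index_to_table_ratio DESC;
```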

CLUSTERING_FACTOR: The Performance Crystal Ball
This is where things get interesting. CLUSTERING_FACTOR is like a fortune teller for your query performance.
CLUSTERING_FACTOR = Number of times Oracle switches table blocks when reading index entries in order

What the Numbers Mean:
- Low CF (close to table BLOCKS): Your data is well-organized—like books sorted by author on library shelves
- High CF (close to table NUM_ROWS): Your data is scattered—like books randomly placed throughout the library

The Performance Impact: Range scans become expensive when Oracle has to jump between hundreds of different table blocks instead of reading consecutive blocks.
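You can turn those rules of thumb into a quick health check against the dictionary (the thresholds below are illustrative cutoffs, not Oracle-defined values):

```sql
-- Classify each index's clustering quality relative to its table
SELECT i.index_name,
       i.clustering_factor,
       t.blocks   AS table_blocks,
       t.num_rows,
       CASE
         WHEN i.clustering_factor <= t.blocks * 2     THEN 'WELL CLUSTERED'
         WHEN i.clustering_factor >= t.num_rows * 0.9 THEN 'POORLY CLUSTERED'
         ELSE 'SOMEWHERE IN BETWEEN'
       END AS verdict
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
ORDER  BY i.clustering_factor DESC;
```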

## Let's Build a Real Example

Step 1: Create Our Test Environment
CREATE TABLE employees_test (
    employee_id NUMBER(6) PRIMARY KEY,
    first_name VARCHAR2(20),
    last_name VARCHAR2(25),
    email VARCHAR2(25),
    phone_number VARCHAR2(20),
    hire_date DATE,
    job_id VARCHAR2(10),
    salary NUMBER(8,2),
    commission_pct NUMBER(2,2),
    manager_id NUMBER(6),
    department_id NUMBER(4)
);

-- Populate with 100,000 employees
BEGIN
    FOR i IN 1..100000 LOOP
        INSERT INTO employees_test VALUES (
            i,                                    -- Sequential employee_id (good clustering)
            'FirstName' || i,
            'LastName' || i,
            'email' || i || '@example.com',
            '555-123-' || LPAD(i, 4, '0'),
            SYSDATE - DBMS_RANDOM.VALUE(1, 365*5),
            CASE MOD(i, 5) WHEN 0 THEN 'IT_PROG' ELSE 'SA_REP' END,
            ROUND(DBMS_RANDOM.VALUE(3000, 15000), 2),
            ROUND(DBMS_RANDOM.VALUE(0, 0.4), 2),
            CASE WHEN MOD(i, 100) = 0 THEN NULL 
                 ELSE ROUND(DBMS_RANDOM.VALUE(100, 200)) END,
            CASE MOD(i, 3) WHEN 0 THEN 90 
                           WHEN 1 THEN 60 
                           ELSE 30 END            -- Scattered department_id (poor clustering)
        );
    END LOOP;
    COMMIT;
END;
/

-- Create our test index
CREATE INDEX idx_emp_dept_id ON employees_test (department_id);
-- Gather fresh statistics
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        OWNNAME => USER,
        TABNAME => 'EMPLOYEES_TEST',
        CASCADE => TRUE
    );
END;
/

Step 2: The Moment of Truth
Now let's see what our indexes look like:
-- Check our table stats
SELECT table_name, blocks, num_rows FROM user_tables WHERE table_name = 'EMPLOYEES_TEST';
TABLE_NAME                           BLOCKS   NUM_ROWS
-------------------------------- ---------- ----------
EMPLOYEES_TEST                         1467     100000

-- Check our index stats
SELECT index_name, table_name, clustering_factor, blevel, leaf_blocks, num_rows, distinct_keys FROM user_indexes WHERE table_name = 'EMPLOYEES_TEST';
INDEX_NAME                       TABLE_NAME                       CLUSTERING_FACTOR     BLEVEL LEAF_BLOCKS   NUM_ROWS DISTINCT_KEYS
-------------------------------- -------------------------------- ----------------- ---------- ----------- ---------- -------------
SYS_C008470                      EMPLOYEES_TEST                                1464          1         187     100000        100000
IDX_EMP_DEPT_ID                  EMPLOYEES_TEST                                4392          1         196     100000             3

## Decoding the Results
Good Guy:  Primary Key Index (SYS_C008470)
BLEVEL = 1:  Perfect! One hop to reach any employee ID
LEAF_BLOCKS = 187:  Reasonable size for 100,000 unique values
CLUSTERING_FACTOR = 1464:  Excellent! Very close to table blocks (1467)
Why This Rocks:  Employee IDs were inserted sequentially (1, 2, 3...), so the physical table order matches the index order. When you query `WHERE employee_id BETWEEN 1000 AND 1050`, Oracle reads consecutive table blocks—like reading pages in a book.
Troublemaker:  Department Index (IDX_EMP_DEPT_ID)
BLEVEL = 1: Same efficient navigation
LEAF_BLOCKS = 196: Slightly larger (non-unique values)
CLUSTERING_FACTOR = 4392: Uh-oh! Much higher than table blocks
Why This Hurts: Department IDs are scattered (90, 60, 30, 90, 60, 30...). A query like `WHERE department_id = 90` will find all matching rows in the index, but then jump all over the table to fetch them—like trying to read a book where every page is in a different room.

## Performance Showdown
Let's see these differences in action with real explain plans:
### Fast Query (Good Clustering)
SELECT * FROM employees_test WHERE employee_id BETWEEN 1000 AND 1050;
Explain Plan:
Plan hash value: 240947081
------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |    52 |  5200 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES_TEST |    52 |  5200 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_C008470    |    52 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
   2 - access("EMPLOYEE_ID">=1000 AND "EMPLOYEE_ID"<=1050)

Scorecard:
- Cost: 3 - Lightning fast
- INDEX RANGE SCAN - Oracle loves this index
- 52 rows retrieved efficiently

### Slow Query (Poor Clustering)

SELECT * FROM employees_test  WHERE department_id = 90;
Explain Plan:
Plan hash value: 304096605
------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                | 33333 |  3255K|   400   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES_TEST | 33333 |  3255K|   400   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
   1 - filter("DEPARTMENT_ID"=90)

Scorecard:
- Cost: 400 - 133x slower!
- TABLE ACCESS FULL - Oracle completely ignored the index!
- 33,333 rows processed with full table scan

Plot Twist:  Why Oracle Ignored the Index
Here's the fascinating part:  Oracle's optimizer looked at the department index and said, "No thanks, I'll do a full table scan instead."

Math Behind the Decision:
- Department 90: ~33,333 rows (1/3 of 100,000)
- Clustering Factor: 4,392 (very high)
- Table Blocks: 1,467
The optimizer prorates the clustering factor by the predicate's selectivity: 4,392 × 1/3 ≈ 1,464 single-block table visits just for department 90—roughly the entire table, fetched one random block at a time. A full table scan covers the same 1,467 blocks with efficient multiblock reads, so it was actually cheaper!

Lesson: High clustering factors can make indexes so inefficient that the optimizer prefers full table scans. This is why monitoring CLUSTERING_FACTOR is crucial.
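If a range-scanned index genuinely suffers from a high clustering factor, one blunt fix is to rebuild the table physically ordered by that key. A sketch with a hypothetical staging table name—note you can only favor one ordering per table, and you would also need to recreate constraints, grants, and other indexes before swapping:

```sql
-- Rebuild the table ordered by department_id (hypothetical swap table)
CREATE TABLE employees_test_sorted AS
SELECT * FROM employees_test ORDER BY department_id;

-- After swapping the tables, recreate the index and refresh stats
CREATE INDEX idx_emp_dept_id ON employees_test_sorted (department_id);
```

After this reorganization, the department index's clustering factor should drop toward the table's block count, since rows with the same department_id now sit in consecutive blocks.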

Sunday, 15 June 2025

Global Hash-Partitioned Index

What is a Global Hash-Partitioned Index?
A global index is a single index structure that spans across all partitions of a partitioned table. It's not tied to the table's partitioning strategy.

Hash Partitioning:
The index itself is divided into a fixed number of partitions using a hash algorithm on the index key. This aims to distribute index entries evenly across these index partitions.
Independence from Table Partitioning: The partitioning of a global index does not necessarily align with the partitioning of the underlying table. You can have a range-partitioned table with a hash-partitioned global index, for example.
Supports Unique Constraints: Global indexes can enforce unique constraints across the entire table, even if the table is partitioned. This is a key advantage over local indexes in certain scenarios where the unique key doesn't include the table's partition key.
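As a sketch of the DDL involved (table and index names here are made up for illustration):

```sql
-- A hypothetical orders table whose primary key is backed by a
-- global hash-partitioned index spread over 8 partitions (a power of 2)
CREATE TABLE orders (
    order_id   NUMBER       NOT NULL,
    order_date DATE,
    amount     NUMBER(10,2)
);

CREATE UNIQUE INDEX orders_pk_gidx ON orders (order_id)
    GLOBAL PARTITION BY HASH (order_id) PARTITIONS 8;

ALTER TABLE orders ADD CONSTRAINT orders_pk
    PRIMARY KEY (order_id) USING INDEX orders_pk_gidx;
```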

How do Global Hash-Partitioned Indexes Improve Performance?
Global hash-partitioned indexes are particularly beneficial in specific scenarios, primarily by:
Reducing Contention for "Hot Blocks" (Especially during Inserts):
In OLTP (Online Transaction Processing) environments, where there are a high number of concurrent inserts into a table, especially when the indexed column is a sequence-generated primary key (e.g., an IDENTITY column or a sequence NEXTVAL), a standard B-tree index can become a "hot spot." This is because new entries are constantly being added to the "rightmost" leaf block of the index, leading to contention for that single block.

Hash partitioning distributes these new index entries across multiple index partitions (and thus multiple physical blocks) based on the hash of the key. This spreads the insert workload across several insertion points instead of one, reducing contention and "buffer busy waits" and improving overall insert throughput. (The underlying pattern is the right-hand index growth that also produces 90-10 leaf block splits on ascending keys.)
Improving Point Queries:
For equality queries (e.g., WHERE index_column = :value), the database can quickly calculate the hash of the value and go directly to the specific index partition where that value's entry is located. This allows for efficient access to data, often with fewer logical I/Os compared to a non-partitioned index on a very large table, as the index tree to traverse within a partition is smaller.
Facilitating Parallelism:
While not as inherently parallel for range scans as local non-prefixed indexes, global hash-partitioned indexes can still benefit from parallel query execution for certain operations, as the workload can be distributed across the index partitions.
Managing Large Indexes:
Partitioning a very large index into smaller, more manageable pieces can simplify maintenance tasks (e.g., rebuilding a single index partition instead of the entire index). Note, though, that global indexes need extra care when table partitions are dropped or added: the global index can become unusable and must either be rebuilt or be kept valid with an UPDATE INDEXES clause on the partition operation.

When to Consider Using Global Hash-Partitioned Indexes:

High-volume OLTP systems: Especially when suffering from "hot block" contention on indexes due to sequential key generation.
Indexes on non-partitioning key columns: When you need a unique constraint or fast access on columns that are not part of the table's partitioning key.
Queries primarily involving equality (=) or IN list predicates: These types of queries can leverage partition pruning on the index. Range predicates (like BETWEEN, >, <) generally won't benefit from hash partition pruning on the index side.
When even data distribution is crucial: Hash partitioning aims to distribute data evenly, which can be beneficial for load balancing. For optimal distribution, the number of hash partitions should ideally be a power of 2 (e.g., 2, 4, 8, 16, 32).
When the partitioning key column has high cardinality and is fairly unique.
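For instance, assuming an orders table with a global hash-partitioned index on order_id, the two predicate styles behave very differently:

```sql
-- Equality: hashing the literal 12345 identifies exactly one index
-- partition, so only that partition is probed
SELECT * FROM orders WHERE order_id = 12345;

-- Range: hash order is unrelated to key order, so every index
-- partition may have to be probed (no pruning)
SELECT * FROM orders WHERE order_id BETWEEN 100 AND 200;
```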


Potential Drawbacks and Considerations:

No Partition Pruning for Range Scans: A significant limitation is that global hash-partitioned indexes typically do not benefit from partition pruning for range queries. If your queries frequently use BETWEEN, >, or < on the indexed column, a different index strategy (such as a global range-partitioned index or, where applicable, a local index) might be more suitable.


Maintenance Overhead: When table partitions are dropped or truncated, global indexes (both hash and range) can become UNUSABLE and require rebuilding, unless the partition operation includes an UPDATE INDEXES clause. This can mean downtime for large indexes. Local indexes are often preferred for tables where partitions are frequently added or dropped (e.g., historical data).
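As a sketch with hypothetical table, partition, and index names:

```sql
-- Keep global indexes valid while dropping a table partition
ALTER TABLE sales_history DROP PARTITION sales_2019 UPDATE INDEXES;

-- Alternatively, rebuild a global index partition that went UNUSABLE
ALTER INDEX idx_sales_global REBUILD PARTITION sys_p1;
```

The UPDATE INDEXES route keeps the index online but makes the partition operation itself slower; the rebuild route is faster to drop but leaves the index unusable until the rebuild finishes.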
Choosing the Right Partitioning Key: Just like with table partitioning, selecting an appropriate partitioning key for the index is crucial for achieving even data distribution and performance benefits. A poor choice can lead to skewed data and "hot" index partitions, negating the benefits.
Complexity: Managing partitioned indexes adds a layer of complexity to database administration.