Saturday, 31 May 2025

Understanding GV$INSTANCE_PING: RAC Interconnect Performance Monitoring

In an Oracle Real Application Clusters (RAC) environment, maintaining optimal interconnect performance is crucial for database operations. The GV$INSTANCE_PING view serves as a vital tool for Database Administrators (DBAs) to monitor and analyze the latency between RAC instances. This blog post explores how this view helps DBAs ensure optimal RAC performance.

What is GV$INSTANCE_PING?
GV$INSTANCE_PING is a dynamic performance view that provides detailed information about the measured latency of the interconnect between all instances in an Oracle RAC environment. Every few seconds, the PING process of each instance checks the response time of the interconnect to all other instances in the same database.

Key Features and Measurements
The view tracks two types of messages:
1. 500-byte messages (500B): smaller test messages
2. 8-kilobyte messages (8K): larger test messages

For each message type, the view provides comprehensive metrics:
Current Performance Metrics
- CURRENT_500B and CURRENT_8K: Latest response times in microseconds
- AVERAGE_500B and AVERAGE_8K: Average response times since instance startup
- MAX_500B and MAX_8K: Maximum observed response times

Cumulative Statistics
- COUNT_500B and COUNT_8K: Number of measurements taken
- WAIT_TIME_500B and WAIT_TIME_8K: Total accumulated response times
- WAIT_TIME_SQUARED_500B and WAIT_TIME_SQUARED_8K: Sum of squared response times (useful for statistical analysis)
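For example, because the view exposes both the sum and the sum of squares of the response times, you can derive the latency variance for each path from the identity Var(X) = E[X²] − E[X]². The query below is only a sketch and assumes WAIT_TIME_8K and WAIT_TIME_SQUARED_8K are expressed in consistent units; check the reference documentation for your release:
SELECT inst_id,
       instance,
       wait_time_8k / count_8k AS mean_8k,
       (wait_time_squared_8k / count_8k)
         - POWER(wait_time_8k / count_8k, 2) AS variance_8k
FROM   gv$instance_ping
ORDER  BY variance_8k DESC;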

Let's analyze an actual output from a 4-node RAC environment:
SQL> select * from GV$INSTANCE_PING;
   INST_ID   INSTANCE CURRENT_500B AVERAGE_500B   MAX_500B COUNT_500B WAIT_TIME_500B WAIT_TIME_SQUARED_500B CURRENT_8K AVERAGE_8K     MAX_8K   COUNT_8K WAIT_TIME_8K WAIT_TIME_SQUARED_8K     CON_ID
---------- ---------- ------------ ------------ ---------- ---------- -------------- ---------------------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------
         2          1            7            7        250     370458        2924730                   1253          8          8        941     370458      3380206                 2547          0
         2          2            3            3       2249     370458        1217885                   5695          4          4        551     370458      1539529                  881          0
         2          3            7            7        304     370457        2887515                   1393          8          8      14780     370457      3389103               220212          0
         2          4            7            7        349     370458        2847595                   1298         10          8        515     370458      3313461                 1551          0
         3          1            8            7        227     370454        2952602                   1285          8          8        919     370454      3370483                 2855          0
         3          2            8            7      17647     370454        2901210                 314283          8          8       2206     370454      3416102                 6594          0
         3          3            3            3        531     370454        1226169                    760          4          3        611     370454      1557469                 1527          0
         3          4            7            7      17321     370454        2922328                 326255          8          8        617     370454      3322511                 1796          0
         4          1            7            7       1967     370454        2905185                   4652          8          8       1683     370454      3344914                 4081          0
         4          2            8            7        443     370454        2877140                   1129         22          8        709     370454      3396211                 1514          0
         4          3            7            7       1478     370454        2893488                   3060          8          9        367     370454      3372591                 1441          0
         4          4            3            3        749     370455        1211323                    790          4          4        199     370455      1517458                  421          0
         1          1            4            3        458     370459        1215589                    732          4          3        569     370459      1539772                 1330          0
         1          2            7            7        208     370458        2921436                   1109         10          9        561     370458      3448708                 1981          0
         1          3           15            7        151     370458        2929392                   1054          8          8        214     370458      3420401                 1464          0
         1          4            7            7        234     370459        2902809                   1181          8          8        203     370459      3362650                 1415          0

16 rows selected.


Key Observations from the Data:
1. Instance 2's Communication Patterns
   - Current 500B latency to Instance 1: 7 microseconds
   - Current 500B latency to Instance 2: 3 microseconds (self-ping)
   - Current 500B latency to Instance 3: 7 microseconds
   - Current 500B latency to Instance 4: 7 microseconds

2. Performance Characteristics
   - Average 500B latency is consistently 7 microseconds for most connections
   - Self-ping (Instance 2 to 2) shows better performance at 3 microseconds
   - Maximum latencies show some spikes:
     * 2249 microseconds to Instance 2
     * 304 microseconds to Instance 3
     * 349 microseconds to Instance 4

3. 8K Message Performance
   - Current 8K latencies range from 4 to 10 microseconds
   - Maximum 8K latency shows significant variation:
     * 941 microseconds to Instance 1
     * 551 microseconds to Instance 2
     * 14780 microseconds to Instance 3 (notable spike)
     * 515 microseconds to Instance 4

4. Measurement Volume
   - Approximately 370,458 measurements taken for each connection
   - Consistent measurement count across instances indicates stable monitoring

Why is it Important for DBAs?
1. Performance Monitoring
   - Helps identify interconnect latency issues
   - Enables proactive detection of performance degradation
   - Provides historical data for trend analysis

2. Troubleshooting
   - Assists in diagnosing RAC performance problems
   - Helps identify instance-to-instance communication issues
   - Enables comparison of current vs. historical performance
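As a practical starting point, the sketch below (the thresholds are illustrative, not Oracle recommendations) flags interconnect paths whose worst-case 8K latency is far above their long-term average:
SELECT inst_id,
       instance,
       current_8k,
       average_8k,
       max_8k
FROM   gv$instance_ping
WHERE  inst_id <> instance          -- ignore self-pings
AND    max_8k > 50 * average_8k     -- arbitrary spike threshold
ORDER  BY max_8k DESC;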

Wednesday, 14 May 2025

Oracle Database 19c: Lost Write Protection Gets Smarter with AUTO

One of the most under-the-radar but important changes introduced in Oracle Database 19c Release Update (RU) 19.26 is the new default setting for the parameter DB_LOST_WRITE_PROTECT.
🔄 From NONE to AUTO: A Quiet but Powerful Shift

Previously, the default setting for DB_LOST_WRITE_PROTECT was:
DB_LOST_WRITE_PROTECT = NONE

This meant lost write detection was completely disabled by default unless explicitly turned on.

Starting with RU 19.26, Oracle has made a significant shift:
DB_LOST_WRITE_PROTECT = AUTO

This change is also reflected in Oracle Database 23ai. It’s more than just a default toggle — it's a smarter, adaptive mechanism that brings better protection with minimal performance trade-offs.
💡 What Is a Lost Write?

A Lost Write is a critical data corruption event where:
    Oracle writes a data block to storage,
    The storage subsystem acknowledges the write as successful,
    But the data block is never actually written to persistent storage.

These can be caused by memory errors, firmware bugs, OS issues, or storage controller failures. Lost writes are notoriously difficult to detect and often surface during recovery or standby failovers.

Oracle defines a lost write as:
    "A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage."

🧠 Why AUTO Matters: Intelligent Lost Write Detection
The new AUTO value for DB_LOST_WRITE_PROTECT changes how Oracle handles lost write detection:
▶️ On a Primary Database:
    Oracle only logs buffer cache reads in the redo log if:
        Physical standby databases exist, and
        Real-time redo apply is enabled.

▶️ On a Standby Database:
    Oracle performs lost write detection only if the apply lag is within reasonable bounds.
    If apply lag exceeds:
        60 seconds, or
        2/3 of the FSFO apply lag threshold (if Fast-Start Failover is enabled),
    Then lost write detection is temporarily skipped to reduce overhead and ensure smooth role transitions.

⚙️ Comparison of DB_LOST_WRITE_PROTECT Modes
Value         Description                                                 Performance Impact
NONE          No lost write detection.                                    🔋 Low
TYPICAL       Logs reads for read/write tablespaces.                      ⚖️ Moderate
FULL          Logs reads for all tablespaces (read-only + read/write).    🛠️ High
AUTO (NEW)    Adapts based on real-time redo and apply lag.               ⚙️ Balanced
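If you prefer to pin the behaviour explicitly instead of relying on the new default, the parameter remains settable; a minimal sketch (TYPICAL is shown only as an example value, and SID='*' assumes an spfile-based setup):
ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE=BOTH SID='*';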


🚨 Real-World Impact: Why It Matters
With older RUs and the default NONE, lost write protection was disabled by default — even in Data Guard environments. This left many systems unprotected by mistake, simply because DBAs weren’t aware or didn’t configure it explicitly.

Now, AUTO intelligently enables protection only when it's needed, helping DBAs avoid the all-or-nothing trade-off between performance and data integrity.
🧪 What Should You Do?

Here are a few actions to take if you’ve recently applied Oracle 19.26 or higher:
✅ 1. Check the Current Setting
SHOW PARAMETER DB_LOST_WRITE_PROTECT;

If you see AUTO, you're on the new default.
✅ 2. Monitor Redo Generation
You may notice an increase in redo generation, especially if:
    You have a physical standby.
    Real-time redo apply is enabled.
See MOS Note: 37260974
“Database Generating Large Amount Of Redo After Applying Jan 2025 DBRU”
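As a rough check, you can also track the cumulative redo generated since instance startup; a simple sketch (an AWR comparison taken before and after the RU is more precise):
SELECT name, value
FROM   v$sysstat
WHERE  name = 'redo size';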


✅ 3. Understand the Error ORA-00752

If you ever encounter:
ORA-00752: recovery detected a lost write of a data block
Do not attempt to fix this manually. Instead:
    Shut down the primary database.
    Fail over to the standby.
    Open a support SR with Oracle immediately.


Thursday, 24 April 2025

Starting DBRU 19.27 and 23.8 Small Pages Are Not Allowed for RDBMS SGA In Exadata

If you are planning to apply the latest DBRU 19.27 or DBRU 23.8, please be aware of a significant change impacting SGA memory management. Small pages are no longer permitted for the System Global Area (SGA) on Exadata systems.

Attempting to start an Oracle instance configured to use small pages for the SGA will likely fail, and you might encounter the following error:
ORA-27138: unable to allocate large pages with current parameter setting

Your database instance alert log may also report errors related to the use_large_pages parameter, indicating that FALSE, TRUE, or AUTO are no longer supported settings for databases on Exadata.

Why this change?

DBRU 19.27 and 23.8 introduce this restriction to prevent performance and stability issues. Using small 4 KB memory pages for the SGA can lead to:
    Page table bloat within the virtual machine.
    RDMA resource memory bloat on both the VM and the hypervisor.
    Potential instability affecting the database node and all instances running on it.

Resolution:
To ensure successful database startup and optimal performance after applying these DBRUs, please verify and adjust your large page configuration at the operating system level. For your SGA configuration, we strongly recommend the following:
    Remove memory_target and memory_max_target.
    Set sga_target to your desired SGA size.
    Set use_large_pages=ONLY. This is the Maximum Availability Architecture (MAA) recommended value and ensures your entire SGA resides in HugePages on Linux-based systems.
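A minimal sketch of those parameter changes follows; the SGA size is a placeholder, the RESET statements assume the memory_* parameters are currently set in the spfile, and the OS must already have enough HugePages configured:
ALTER SYSTEM RESET memory_target SCOPE=SPFILE SID='*';
ALTER SYSTEM RESET memory_max_target SCOPE=SPFILE SID='*';
ALTER SYSTEM SET sga_target = 32G SCOPE=SPFILE SID='*';         -- placeholder size, adjust to your environment
ALTER SYSTEM SET use_large_pages = 'ONLY' SCOPE=SPFILE SID='*';
-- Restart the instance(s) so the SPFILE-only changes take effect.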

📚 Further Reading & Reference Materials:
    My Oracle Support Note 401749.1: HugePages / Large Pages on Linux
    My Oracle Support Note 361323.1: Checklist for Valid HugePages Setup
    My Oracle Support Note 1392497.1: How to Calculate Recommended HugePages/LargePages Size for Oracle Database
    Oracle Documentation (Doc ID 3081878.1): Starting DBRU 19.27 and 23.8 Small Pages Are Not Allowed for RDBMS SGA In Exadata.

Please take the necessary steps to review your configurations and implement these recommendations to maintain the stability and performance of your Exadata environment.


Sunday, 13 April 2025

Smarter Patching in Oracle Database 23ai: Two-Stage Rolling Updates and Local Rolling Maintenance

Downtime is the eternal nemesis of enterprise systems, especially those powering critical workloads. With the release of Oracle Database 23ai, Oracle has introduced a set of intelligent patching and maintenance features that drastically reduce downtime and improve database availability during upgrades and patching.

Oracle RAC Two-Stage Rolling Updates
Starting with Oracle Database 23ai, the Oracle RAC two-stage rolling patches feature enables you to apply previously non-rolling patches in a rolling fashion.

Oracle RAC two-stage rolling patches are new types of patches, which you can apply in a rolling fashion in stages. Once the patch is applied on the first node, the second node is patched, and so on. When all the nodes are patched, you can enable the patches. Fixes that you apply using this feature are disabled by default.

You can view the patches applied via this method using:
SELECT * FROM V$RAC_TWO_STAGE_ROLLING_UPDATES;

Local Rolling Database Maintenance:
Another standout feature in 23ai is Local Rolling Database Maintenance—an enhancement designed to keep node-level downtime invisible to users during rolling patching. 

What It Does
During a rolling patch, Oracle can now start a second instance on the same node and relocate sessions to it, reducing the patching impact on connected applications.
This technique:
    Enables session failover on the same node, minimizing CPU and network overhead
    Reduces or eliminates application interruptions during patching
    Works great when paired with (Transparent) Application Continuity
Requirements
    The node must have enough CPU and memory resources to run two instances simultaneously.
    DBAs need to manage new ORACLE_HOME paths and instance configurations.
To prepare and perform local rolling maintenance:
srvctl modify database -d <dbname> -o $NEW_HOME --localrolling
srvctl transfer instance -d <dbname>
This makes it easier to patch a single node while keeping user sessions online and preventing workload relocation to other nodes in the cluster.

Thursday, 10 April 2025

How to Resolve "CheckActiveFilesAndExecutables" Failure in Oracle OPatch

When applying or rolling back a patch using Oracle OPatch, you might encounter the following error:
Prerequisite check "CheckActiveFilesAndExecutables" failed.
OPatch failed with error code 73
This typically happens when some files or libraries in the Oracle Home directory are currently being used by running processes.

Here is the opatch rollback attempt:
[oracle@prdracdb01 ~]$ opatch rollback -id 35739076
Oracle Interim Patch Installer version 12.2.0.1.45
Copyright (c) 2025, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/19.0.0.0/dbhome_1
Central Inventory : /app/oraInventory
   from           : /u01/app/oracle/product/19.0.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.45
OUI version       : 12.2.0.7.0
Log file location : /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_10-47-49AM_1.log
Patches will be rolled back in the following order:
   35739076
Prerequisite check "CheckActiveFilesAndExecutables" failed.
The details are:
Following active files/executables/libs are used by ORACLE_HOME :/u01/app/oracle/product/19.0.0.0/dbhome_1
/u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1

UtilSession failed: Prerequisite check "CheckActiveFilesAndExecutables" failed.
Log file location: /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_10-47-49AM_1.log

OPatch failed with error code 73

[oracle@prdracdb01 ~]$

The rollback failed with: UtilSession failed: Prerequisite check "CheckActiveFilesAndExecutables" failed.

Analyzing the OPatch Log:
[oracle@prdracdb01 ~]$ cat /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_10-47-49AM_1.log
[Apr 10, 2025 10:47:49 AM] [INFO]   CAS Dynamic Loading :
[Apr 10, 2025 10:47:49 AM] [INFO]   CUP_LOG: Trying to load HomeOperations object
[Apr 10, 2025 10:47:49 AM] [INFO]   CUP_LOG: HomeOperations object created. CUP1.0 is enabled
[Apr 10, 2025 10:47:49 AM] [INFO]   OPatch invoked as follows: 'rollback -id 35739076 -invPtrLoc /u01/app/oracle/product/19.0.0.0/dbhome_1/oraInst.loc '
[Apr 10, 2025 10:47:49 AM] [INFO]   Runtime args: [-Xverify:none, -Xmx3072m, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch, -DCommonLog.LOG_SESSION_ID=, -DCommonLog.COMMAND_NAME=rollback, -DOPatch.ORACLE_HOME=/u01/app/oracle/product/19.0.0.0/dbhome_1, -DOPatch.DEBUG=false, -DOPatch.MAKE=false, -DOPatch.RUNNING_DIR=/u01/app/oracle/product/19.0.0.0/dbhome_1/OPatch, -DOPatch.MW_HOME=, -DOPatch.WL_HOME=, -DOPatch.COMMON_COMPONENTS_HOME=, -DOPatch.OUI_LOCATION=/u01/app/oracle/product/19.0.0.0/dbhome_1/oui, -DOPatch.FMW_COMPONENT_HOME=, -DOPatch.OPATCH_CLASSPATH=, -DOPatch.WEBLOGIC_CLASSPATH=, -DOPatch.SKIP_OUI_VERSION_CHECK=, -DOPatch.NEXTGEN_HOME_CHECK=false, -DOPatch.PARALLEL_ON_FMW_OH=]
[Apr 10, 2025 10:47:49 AM] [INFO]   Heap in use : 112 MB
                                    Total memory: 1963 MB
                                    Free memory : 1850 MB
                                    Max memory  : 2731 MB
[Apr 10, 2025 10:47:49 AM] [INFO]   Oracle Home       : /u01/app/oracle/product/19.0.0.0/dbhome_1
                                    Central Inventory : /app/oraInventory
                                       from           : /u01/app/oracle/product/19.0.0.0/dbhome_1/oraInst.loc
                                    OPatch version    : 12.2.0.1.45
                                    OUI version       : 12.2.0.7.0
                                    OUI location      : /u01/app/oracle/product/19.0.0.0/dbhome_1/oui
                                    Log file location : /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_10-47-49AM_1.log
[Apr 10, 2025 10:47:49 AM] [INFO]   Patch history file: /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch_history.txt
[Apr 10, 2025 10:47:51 AM] [INFO]   [OPSR-TIME] Loading raw inventory
[Apr 10, 2025 10:47:51 AM] [INFO]   [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 153 (MB)
[Apr 10, 2025 10:47:51 AM] [INFO]   [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 174 (MB)
[Apr 10, 2025 10:47:51 AM] [INFO]   [OPSR-TIME] Raw inventory loaded successfully
[Apr 10, 2025 10:47:51 AM] [INFO]   NRollback::no CAS enabled, OPatch runs with legacy process.
[Apr 10, 2025 10:47:51 AM] [INFO]   opatch-external.jar is in /u01/app/oracle/product/19.0.0.0/dbhome_1/OPatch/jlib/opatch-external.jar
[Apr 10, 2025 10:47:53 AM] [INFO]   [OPSR-TIME] Loading cooked inventory
[Apr 10, 2025 10:47:53 AM] [INFO]   [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 215 (MB)
[Apr 10, 2025 10:47:55 AM] [INFO]   [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 253 (MB)
[Apr 10, 2025 10:47:55 AM] [INFO]   [OPSR-TIME] Cooked inventory loaded successfully
[Apr 10, 2025 10:48:00 AM] [INFO]   [OPSR-TIME] buildFilesConflict begins
[Apr 10, 2025 10:48:00 AM] [INFO]   [OPSR-TIME] checkFileVersionConflict begins
[Apr 10, 2025 10:48:00 AM] [INFO]   Alias feature is enable?false
[Apr 10, 2025 10:48:00 AM] [INFO]   [OPSR-TIME] checkFileVersionConflict begins
[Apr 10, 2025 10:48:00 AM] [INFO]   [OPSR-TIME] buildFilesConflict ends
[Apr 10, 2025 10:48:00 AM] [INFO]   Subset Patch 29517242 remain inactive due to active superset patch 35643107
[Apr 10, 2025 10:48:00 AM] [INFO]   Subset Patch 29585399 remain inactive due to active superset patch 35655527
[Apr 10, 2025 10:48:00 AM] [INFO]   OPatchSessionHelper::sortOnOverlay() Sorting is not needed
[Apr 10, 2025 10:48:02 AM] [INFO]   Patches will be rolled back in the following order:
                                       35739076
[Apr 10, 2025 10:48:02 AM] [INFO]   Running prerequisite checks...
[Apr 10, 2025 10:48:02 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/oracle at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/oracle at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:02 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/extjob at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/extjob at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:02 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/extjobo at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/extjobo at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:02 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:02 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/setasmgid at Thu Apr 10 10:48:02 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/setasmgid at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:03 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/kfod at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/kfod at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:03 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/renamedg at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/bin/renamedg at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:03 AM] [INFO]   Start fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1 at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   Finish fuser command /sbin/fuser /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1 at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   SKIP_FUSER_WARNINGS is set to true (flag was set in opatch.properties)
[Apr 10, 2025 10:48:03 AM] [INFO]   Files in use by a process: /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1 PID( 42574 92627 )
[Apr 10, 2025 10:48:03 AM] [INFO]   Printing more details of active processes:
[Apr 10, 2025 10:48:03 AM] [INFO]   START PARENT PROCESS DETAILS
                                    PID COMMAND
                                    83924 bash
                                    END PARENT PROCESS DETAILS
[Apr 10, 2025 10:48:03 AM] [INFO]   START CHILD PROCESS DETAILS FOR PARENT PROCESS: 83924
                                    PID COMMAND
                                    92627 rman
                                    END CHILD PROCESS DETAILS FOR PARENT PROCESS: 83924
[Apr 10, 2025 10:48:03 AM] [INFO]   START PARENT PROCESS DETAILS
                                    PID COMMAND
                                    42548 Standby_sync.sh
                                    END PARENT PROCESS DETAILS
[Apr 10, 2025 10:48:03 AM] [INFO]   START CHILD PROCESS DETAILS FOR PARENT PROCESS: 42548
                                    PID COMMAND
                                    42574 python
                                    END CHILD PROCESS DETAILS FOR PARENT PROCESS: 42548
[Apr 10, 2025 10:48:03 AM] [INFO]   Following active files/executables/libs are used by ORACLE_HOME :/u01/app/oracle/product/19.0.0.0/dbhome_1
                                    /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1
[Apr 10, 2025 10:48:03 AM] [INFO]   Prerequisite check "CheckActiveFilesAndExecutables" failed.
                                    The details are:                      
                                    
                                    Following active files/executables/libs are used by ORACLE_HOME :/u01/app/oracle/product/19.0.0.0/dbhome_1
                                    /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1
[Apr 10, 2025 10:48:03 AM] [SEVERE] OUI-67073:UtilSession failed: Prerequisite check "CheckActiveFilesAndExecutables" failed.
[Apr 10, 2025 10:48:03 AM] [INFO]   Finishing UtilSession at Thu Apr 10 10:48:03 PDT 2025
[Apr 10, 2025 10:48:03 AM] [INFO]   Log file location: /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_10-47-49AM_1.log
[Apr 10, 2025 10:48:03 AM] [INFO]   Stack Description: java.lang.RuntimeException: Prerequisite check "CheckActiveFilesAndExecutables" failed.
                                        at oracle.opatch.OPatchSessionHelper.runRollbackPrereqs(OPatchSessionHelper.java:5253)
                                        at oracle.opatch.opatchutil.NRollback.legacy_process(NRollback.java:762)
                                        at oracle.opatch.opatchutil.NRollback.process(NRollback.java:217)
                                        at oracle.opatch.opatchutil.OUSession.nrollback(OUSession.java:1154)
                                        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                                        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                                        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                                        at java.lang.reflect.Method.invoke(Method.java:498)
                                        at oracle.opatch.UtilSession.process(UtilSession.java:355)
                                        at oracle.opatch.OPatchSession.process(OPatchSession.java:2640)
                                        at oracle.opatch.OPatch.process(OPatch.java:888)
                                        at oracle.opatch.OPatch.main(OPatch.java:945)
                                    Caused by: oracle.opatch.PrereqFailedException: Prerequisite check "CheckActiveFilesAndExecutables" failed.
                                        ... 12 more
[oracle@prdracdb01 ~]$

The log revealed that the rollback failed because the shared library libclntsh.so.19.1 was in use by running processes.
The log details also confirmed the active processes holding the file:
Files in use by a process:
/u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1 PID( 42574 92627 )
...
PID COMMAND
92627 rman
42574 python

Identify and Kill the Processes:
Check which processes were using libclntsh.so.19.1:
[oracle@prdracdb01 bin]$ lsof | grep libclntsh.so.19.1
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
lsof: WARNING: can't stat() bpf file system /opt/sentinelone/ebpfs/bpf_mount
      Output information may be incomplete.
python    42574               oracle  mem       REG              252,0  82204624          25219363 /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1
rman      92627               oracle  mem       REG              252,0  82204624          25219363 /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1

Kill both sessions:
[oracle@prdracdb01 bin]$ kill -9 42574
[oracle@prdracdb01 bin]$ lsof | grep libclntsh.so.19.1
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
lsof: WARNING: can't stat() bpf file system /opt/sentinelone/ebpfs/bpf_mount
      Output information may be incomplete.
rman      92627               oracle  mem       REG              252,0  82204624          25219363 /u01/app/oracle/product/19.0.0.0/dbhome_1/lib/libclntsh.so.19.1
[oracle@prdracdb01 bin]$ ps -ef | grep rman
oracle   11618 88116  0 11:02 pts/4    00:00:00 grep --color=auto rman
oracle   92627 83924  0 Jan06 pts/1    00:00:05 rman
[oracle@prdracdb01 bin]$ kill -9 92627

Retried the opatch rollback, and this time it completed successfully:
[oracle@prdracdb01 OPatch]$ opatch rollback -id 35739076
Oracle Interim Patch Installer version 12.2.0.1.45
Copyright (c) 2025, Oracle Corporation.  All rights reserved.
Oracle Home       : /u01/app/oracle/product/19.0.0.0/dbhome_1
Central Inventory : /app/oraInventory
   from           : /u01/app/oracle/product/19.0.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.45
OUI version       : 12.2.0.7.0
Log file location : /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_11-03-11AM_1.log
Patches will be rolled back in the following order:
   35739076
The following patch(es) will be rolled back: 35739076

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/19.0.0.0/dbhome_1')

Is the local system ready for patching? [y|n]
y
User Responded with: Y
Rolling back patch 35739076...
RollbackSession rolling back interim patch '35739076' from OH '/u01/app/oracle/product/19.0.0.0/dbhome_1'
Patching component oracle.rdbms, 19.0.0.0.0...
Patching component oracle.rdbms.rsf, 19.0.0.0.0...
RollbackSession removing interim patch '35739076' from inventory
Log file location: /u01/app/oracle/product/19.0.0.0/dbhome_1/cfgtoollogs/opatch/opatch2025-04-10_11-03-11AM_1.log
OPatch succeeded.
[oracle@prdracdb01 OPatch]$

 

Sunday, 6 April 2025

Log Generation Rate in Azure SQL Database Hyperscale Pool

What is Log Generation Rate?
Log generation rate refers to the speed at which transaction logs are produced in a database. In Hyperscale Pool, log generation is closely monitored and regulated to prevent overloading the system. Azure implements log rate governance to ensure that log generation stays within defined limits, keeping the system stable and performing efficiently.
Log Rate Governance in Hyperscale
By default, Hyperscale databases have a log generation rate limit of 105 MB/s, irrespective of the compute size; when the system is healthy, sustained log generation can reach around 100 MiB/s. This limit is designed to ensure that logs are consistently processed and replicated without overwhelming system resources.
However, there may be situations where Azure needs to temporarily reduce the log generation rate. This happens when a secondary replica or page server falls behind in applying the transaction logs. The system will then throttle the log generation rate to allow the lagging components to catch up, ensuring the overall stability of the database.
When Does Log Generation Rate Get Reduced?
Log generation rate may be reduced for several reasons:
    Delayed log consumption by a page server or replica.
    A geo-secondary replica might be lagging in applying logs.
    Slow database checkpointing could delay log processing on the page server.
    Migration or reverse migration from Hyperscale to a non-Hyperscale database can also cause temporary delays in log consumption.
Monitoring Log Generation Rate with sys.dm_hs_database_log_rate
Azure provides the sys.dm_hs_database_log_rate dynamic management function (DMF) to monitor and troubleshoot log generation rates in Hyperscale. This function returns detailed information on which components are limiting the log generation rate, including:
    Current log rate limit
    Catch-up rate of components (bytes per second)
    Component-specific delays and logs that are behind
Key Columns in the DMF:
    current_max_log_rate_bps: Maximum log rate limit, in bytes per second.
    catchup_rate_bps: The rate at which lagging components are catching up, in bytes per second.
    catchup_distance_bytes: The amount of log data that must be processed to catch up.
    role_desc: Describes the role of the component affecting log rate, such as a page server, replica, or geo-replica.
This tool helps you quickly identify any components causing delays and allows you to take corrective actions if needed.
How to Check Log Generation Rate in Your Database
To check the log generation rate for a specific database, use the following query:
SELECT current_max_log_rate_bps, role_desc, catchup_rate_bps, catchup_distance_bytes
FROM sys.dm_hs_database_log_rate(DB_ID(N'YourDatabaseName'));

For databases within an elastic pool, you can use NULL to get results for all databases in the pool:
SELECT current_max_log_rate_bps, role_desc, catchup_rate_bps, catchup_distance_bytes
FROM sys.dm_hs_database_log_rate(NULL);

Wait types appear in sys.dm_os_wait_stats when the log rate is reduced:
Wait type                     Reason
RBIO_RG_STORAGE               Delayed log consumption by a page server
RBIO_RG_DESTAGE               Delayed log consumption by the long-term log storage
RBIO_RG_REPLICA               Delayed log consumption by an HA secondary replica or a named replica
RBIO_RG_GEOREPLICA            Delayed log consumption by a geo-secondary replica
RBIO_RG_LOCALDESTAGE          Delayed log consumption by the log service
RBIO_RG_STORAGE_CHECKPOINT    Delayed log consumption by a page server due to a slow database checkpoint
RBIO_RG_MIGRATION_TARGET      Delayed log consumption by the non-Hyperscale database during reverse migration
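To check whether any of these throttling waits have actually occurred on your database, you can query the wait statistics DMV directly; a minimal sketch:
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'RBIO_RG%'
ORDER BY wait_time_ms DESC;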
 


 

Wednesday, 2 April 2025

Understanding IPv4 and IPv6

Have you ever wondered how your computer knows where to send that email or how Netflix finds your device to stream your favorite show? 
The answer lies in something called Internet Protocol addresses - the postal system of the digital world.

What Are IP Addresses?
Think of IP addresses like street addresses for the internet. Just as your home needs a unique address for mail delivery, every device connected to the internet needs a unique identifier so data can find its way to the right destination.

IPv4: The Current Standard
IPv4 (Internet Protocol version 4) has been the backbone of internet communication since the 1980s. Here's what makes it tick:

The Format
IPv4 addresses look like this: 192.168.1.1 or 8.8.8.8. These are four numbers (each between 0-255) separated by dots. Behind the scenes, your computer sees this as a 32-bit binary number.

The Problem
Here's where things get interesting - and problematic. IPv4 can only create about 4.3 billion unique addresses. That sounds like a lot, but with billions of smartphones, computers, smart TVs, and IoT devices, we've essentially run out of new IPv4 addresses.


Types You Might Recognize
Public IP: Your internet-facing address (like 74.125.224.72)
Private IP: Your device's address on your home network (usually starts with 192.168)
Localhost: 127.0.0.1 - your computer talking to itself

IPv6: The Future is Here
IPv6 (Internet Protocol version 6) is the solution to IPv4's limitations, and it's already being rolled out worldwide.

The Format
IPv6 addresses are much longer: 2001:0db8:85a3:0000:0000:8a2e:0370:7334. They use hexadecimal (0-9, a-f) and are separated by colons instead of dots.

The Solution
IPv6 provides approximately 340 undecillion addresses. To put that in perspective, that's enough addresses to give every grain of sand on Earth its own IP address - with plenty left over!

What's Better About IPv6?
Vastly more addresses - no more address exhaustion in practice
Better security - built-in encryption capabilities
Faster routing - simplified packet headers
Auto-configuration - devices can set up their own addresses

The Transition Challenge
So why aren't we all using IPv6 yet? 
The transition is like trying to upgrade every road in the world simultaneously while traffic keeps flowing. Most networks now run "dual-stack" - supporting both IPv4 and IPv6 - allowing for a gradual transition.

What This Means for You
As an everyday internet user, this transition is mostly invisible. Your devices likely already support both protocols, and your internet service provider is handling the complexity behind the scenes.
However, understanding these concepts helps you appreciate the incredible engineering that makes our connected world possible. Every time you send a message, stream a video, or browse the web, you're participating in one of humanity's greatest collaborative achievements - the internet.

Looking Forward
IPv6 isn't just about solving today's address shortage; it's about enabling tomorrow's innovations. From smart cities with millions of connected sensors to personal devices we haven't even imagined yet, IPv6 provides the foundation for our digital future.
The next time you effortlessly connect to the internet, remember the elegant addressing system working behind the scenes - and marvel at how 32 bits evolved into 128 bits to keep our world connected.

Sunday, 30 March 2025

VECTOR_DISTANCE

What is VECTOR_DISTANCE?
The VECTOR_DISTANCE function calculates the distance between two vectors (represented as expr1 and expr2). Depending on the context, the vectors can represent various types of data, such as images, text, or numbers.
Key Points:
    Purpose: Calculates the distance between two vectors.
    Optional Metric: You can specify a distance metric. If not specified:
        The default metric is Cosine Distance for general vectors.
        For binary vectors, the default is Hamming Distance.
If you do not specify a distance metric, Cosine Distance is used by default for most cases, and Hamming Distance for binary vectors.
Shorthand Functions for Common Distance Metrics
To make it easier to calculate distances, the VECTOR_DISTANCE function comes with shorthand functions for common distance metrics. These are equivalent to the more detailed functions, providing a more compact way to express vector distance calculations.
Here are the shorthand functions:
    L1_DISTANCE: Manhattan (L1) distance.
    L2_DISTANCE: Euclidean (L2) distance.
    COSINE_DISTANCE: Cosine similarity distance.
    INNER_PRODUCT: Dot product of the two vectors (equivalent to -1 * VECTOR_DISTANCE with the DOT metric).
    HAMMING_DISTANCE: Hamming distance for binary vectors.
    JACCARD_DISTANCE: Jaccard distance for binary vectors.

Distance Metrics Available:
    COSINE: Measures the cosine of the angle between two vectors, useful for high-dimensional data like text.
    DOT: Calculates the negated dot product of two vectors, useful for measuring similarity.
    EUCLIDEAN: Measures the straight-line (L2) distance between two vectors, commonly used in spatial data.
    EUCLIDEAN_SQUARED: Euclidean distance without taking the square root, often used in optimization tasks.
    HAMMING: Counts the number of differing dimensions between two binary vectors, typically used in error correction.
    MANHATTAN: Also known as L1 distance, calculates the sum of absolute differences between vector components, useful for grid-based problems.
    JACCARD: Measures dissimilarity between binary vectors based on the ratio of the intersection to the union of the vectors.
    
Shorthand Operators for Distance Metrics:
Instead of specifying the distance metric explicitly, you can use shorthand operators for quicker calculations. These are especially handy when writing queries or performing similarity searches:
    <->: Equivalent to L2_DISTANCE (Euclidean distance).
    <=>: Equivalent to COSINE_DISTANCE (Cosine similarity).
    <#>: Equivalent to -1 * INNER_PRODUCT (Negative dot product).
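To make this concrete, here is a small hypothetical similarity search; the docs table, its embedding vector column, and the :query_vec bind variable are illustrative and not part of the original post:
SELECT id,
       VECTOR_DISTANCE(embedding, :query_vec, COSINE) AS dist
FROM   docs
ORDER  BY dist
FETCH FIRST 5 ROWS ONLY;

-- The same query using the shorthand operator:
SELECT id,
       embedding <=> :query_vec AS dist
FROM   docs
ORDER  BY dist
FETCH FIRST 5 ROWS ONLY;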

Tuesday, 25 March 2025

Connecting to a Schema in an Oracle 23ai PDB

In non-CDBs, we can connect directly to a schema using username/password. However, in PDBs, we must use a service name alias to connect to the database.
1. Connect to PDB
[oracle@poclab ~]$ sql
SQL*Plus: Release 23.0.0.0.0 - Production on Wed Mar 26 03:53:56 2025
Version 23.7.0.25.01
Connected to:
Oracle Database 23ai Free Release 23.0.0.0.0 - Develop, Learn, and Run for Free
SQL> alter session set container=freepdb1;
SQL> show con_name;
CON_NAME
------------------------------
FREEPDB1

2. Connecting to a User or Schema with password
SQL> conn aivector/aivector
ERROR:
ORA-01017: invalid credential or not authorized; logon denied

This error occurs because, unlike non-CDBs, PDBs require you to use a service name alias to specify the pluggable database in the connection string.
3. Correct Connection to the PDB Using Service Name Alias
SQL> conn aivector/aivector@//localhost:1521/freepdb1
SQL> show user
USER is "AIVECTOR"
 

Identifying Your Container: CDB or PDB in Oracle 23ai

In Oracle databases, particularly when working with Multitenant Architecture, it's essential to understand the distinction between the Container Database (CDB) and Pluggable Databases (PDBs). These are the core components that make up the Multitenant model, which is one of the highlights of modern Oracle database systems. But sometimes, it can be tricky to track whether you're working in a CDB or a PDB. Let's break it down based on a real-world session in Oracle Database 23ai.
Understanding CDB and PDB
    CDB (Container Database): CDB is the primary container that holds the system metadata and the necessary infrastructure for managing multiple PDBs. It has one root container (CDB$ROOT) and potentially many PDBs.
    PDB (Pluggable Database): A PDB is a self-contained, portable database that runs inside a CDB. Each PDB can have its own data, schemas, and users, but shares the same infrastructure and system resources as the CDB.

Let's take a look at an example session in Oracle 23ai. This will help us understand how we can identify where we are, whether in the CDB$ROOT or a PDB.
Step 1: Connecting to the CDB
Upon first logging into Oracle, you typically connect to the CDB as shown below:
[oracle@poclab ~]$ sql
SQL*Plus: Release 23.0.0.0.0 - Production on Wed Mar 26 03:04:12 2025
Version 23.7.0.25.01
Connected to:
Oracle Database 23ai Free Release 23.0.0.0.0 - Develop, Learn, and Run for Free
Version 23.7.0.25.01
Once logged in, you can check the current instance by querying v$instance:
SQL> select instance_name, version, status, con_id from v$instance;

INSTANCE_NAME    VERSION           STATUS           CON_ID
---------------- ----------------- ------------ ----------
FREE             23.0.0.0.0        OPEN            0
Note that CON_ID = 0 here does not identify a container: in V$INSTANCE (as in other V$ views), CON_ID = 0 means the row applies to the entire CDB rather than to a specific container.

Now, let’s confirm the current container:
SQL> show con_id
CON_ID
------------------------------
1
Here, CON_ID = 1 corresponds to the root container, CDB$ROOT.
SQL> show con_name
CON_NAME
------------------------------
CDB$ROOT

Step 2: Switching to a PDB
To move from the CDB to a specific PDB, you can connect to the PDB directly. In this example, let's connect to FREEPDB1:
SQL> conn sys/pwd@//localhost:1521/freepdb1 as sysdba
Connected.
Now, let's check the instance information for FREEPDB1:
SQL> select instance_name, version, status, con_id from v$instance;
INSTANCE_NAME    VERSION           STATUS           CON_ID
---------------- ----------------- ------------ ----------
FREE             23.0.0.0.0        OPEN            0
Again, V$INSTANCE reports CON_ID = 0, because that row applies to the entire CDB; it does not tell us which container we are connected to.
Confirm the current container name:
SQL> show con_id
CON_ID
------------------------------
3
Here, CON_ID = 3 refers to the FREEPDB1 pluggable database:
SQL> show con_name
CON_NAME
------------------------------
FREEPDB1

Step 3: Switching Back to the CDB
Once inside the PDB, you might want to switch back to the CDB$ROOT container. You can do this by using the alter session command:
SQL> alter session set container=CDB$ROOT;
Session altered.
Now, let's check the container ID and name:
SQL> show con_id
CON_ID
------------------------------
1
And the container name confirms you're back in the root container:
SQL> show con_name
CON_NAME
------------------------------
CDB$ROOT
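A tool-independent alternative to the SHOW commands is to ask the session context directly, which returns the same container information from any client:
SELECT SYS_CONTEXT('USERENV', 'CON_NAME') AS con_name,
       SYS_CONTEXT('USERENV', 'CON_ID')   AS con_id
FROM   dual;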

Monday, 24 March 2025

Common Blocking Scenarios in Azure SQL Database: Causes & Resolutions

Blocking happens when one session (SPID) holds a lock on a resource, preventing another session from accessing it. Unlike deadlocks, where two or more processes are stuck indefinitely, blocking can eventually resolve—but it can still lead to performance bottlenecks.
Common Blocking Scenarios & Their Resolutions
Scenario   Wait Type   Open Transactions   Status     Resolves?
1          NOT NULL    ≥ 0                 Runnable   ✅ Yes, when the query finishes.
2          NULL        > 0                 Sleeping   ❌ No, but the SPID can be killed.
3          NULL        ≥ 0                 Runnable   ❌ No, won’t resolve until the client fetches all rows or closes the connection. Killing the SPID may take up to 30 seconds.
4          Varies      ≥ 0                 Runnable   ❌ No, won’t resolve until the client cancels queries or closes connections. Killing the SPIDs may take up to 30 seconds.
5          NULL        > 0                 Rollback   ✅ Yes.
6          NULL        > 0                 Sleeping   ⏳ Eventually. When Windows NT detects inactivity, the connection will break.

How to Identify Blocking in Azure SQL Database
1. Identify Blocked and Blocking Sessions
SELECT blocking_session_id, session_id, wait_type, wait_time, wait_resource  
FROM sys.dm_exec_requests  
WHERE blocking_session_id <> 0;
2. Check Open Transactions
SELECT session_id, open_transaction_count, status  
FROM sys.dm_exec_sessions  
WHERE open_transaction_count > 0;
3. Analyze Query Execution Details
SELECT r.session_id, s.host_name, s.program_name, r.command, r.wait_type, r.wait_time  
FROM sys.dm_exec_requests r  
JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id  
WHERE r.blocking_session_id <> 0;

How to Resolve Blocking in Azure SQL Database
✅ Scenarios that resolve automatically
    Scenario 1: Query completes, releasing locks.
    Scenario 5: Rollback operation finishes.
    Scenario 6: Windows NT eventually disconnects the session.

❌ Scenarios requiring manual intervention
If blocking does not resolve, consider the following approaches:
1. Kill the Blocking SPID
If a transaction is stuck, you can terminate it:
KILL <session_id>;
Use this cautiously, as it may cause rollbacks.
2. Optimize Long-Running Queries
    Index Optimization: Ensure proper indexing to reduce query execution time.
    Query Tuning: Review actual execution plans to find and optimize slow queries.
    Batch Processing: Process data in smaller batches to prevent long locks.
3. Handle Open Transactions Properly
    Regularly check sys.dm_tran_active_transactions for long-running transactions.
    Ensure all transactions explicitly COMMIT or ROLLBACK when completed.
4. Improve Connection Management
    Ensure clients properly fetch all rows or close connections.
    Avoid unnecessary long-running transactions that hold locks.
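When a blocking chain builds up, it also helps to find the head blocker (a session that blocks others but is not itself blocked). A sketch using the same DMVs as above:
SELECT s.session_id, s.status, s.host_name, s.program_name, s.open_transaction_count
FROM sys.dm_exec_sessions s
WHERE s.session_id IN (SELECT blocking_session_id
                       FROM sys.dm_exec_requests
                       WHERE blocking_session_id <> 0)
  AND s.session_id NOT IN (SELECT session_id
                           FROM sys.dm_exec_requests
                           WHERE blocking_session_id <> 0);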

Saturday, 22 March 2025

Deadlocks with Bitmap Indexes in Oracle

Oracle's bitmap index is an efficient indexing method, particularly useful for columns with low cardinality (few distinct values). While it can significantly enhance query performance in read-heavy environments, it presents unique challenges in systems with heavy DML operations. One of the most significant challenges is the risk of deadlocks, which stems from the way bitmap indexes work.

In this blog, we'll explore the mechanics of bitmap indexes, how they work in Oracle, and why they can cause deadlocks and locking issues when there's heavy DML activity.
What is a Bitmap Index?
In a bitmap index, data is organized as a series of bitmaps (binary representations of 0s and 1s) that represent the presence or absence of a particular value for rows in the indexed column. Each entry in the bitmap index corresponds to a unique value in the indexed column and contains information about which rows in the table have that value.
The structure of a bitmap index involves:

  •     Key Value: The actual value in the indexed column.
  •     Low-Rowid: The starting rowid in the range of rows that this bitmap entry applies to.
  •     High-Rowid: The ending rowid in the range of rows that this bitmap entry applies to.
  •     Bitmap: A string of 0s and 1s, where each bit corresponds to a row in the table (within the specified range). A '1' means the value is present in that row, and a '0' means the value is not.

Deadlocks Due to Bitmap Index Updates
Let’s consider a scenario where DML operations occur and multiple transactions interact with the same bitmap index, causing locking issues.
Scenario 1: Updating a Record in PRODUCT
Let’s assume you have the following data in your PRODUCT table and bitmap index
CREATE TABLE product (
    product_id NUMBER,
    product_name VARCHAR2(100),
    category_id NUMBER
);
INSERT INTO product (product_id, product_name, category_id) VALUES (1001, 'Widget A', 5);
INSERT INTO product (product_id, product_name, category_id) VALUES (2002, 'Widget B', 8);
INSERT INTO product (product_id, product_name, category_id) VALUES (3003, 'Widget C', 5);
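The scenario assumes a bitmap index exists on CATEGORY_ID; for completeness, a minimal sketch of creating one (the index name is illustrative):
CREATE BITMAP INDEX product_category_bix ON product (category_id);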

Your bitmap index might look like this:
CATEGORY_ID    LOW-ROWID    HIGH-ROWID    BITMAP
5              aaadf1000    aaadf1050    01010101010101
5              aaadf1060    aaadf1100    11010101010101
8              aaadf1200    aaadf1250    10101010101010

In this case, each bitmap entry represents a category (e.g., CATEGORY_ID = 5 or CATEGORY_ID = 8). The LOW-ROWID and HIGH-ROWID represent the range of rows that the bitmap entry applies to. The bitmap string (e.g., 01010101010101) corresponds to the product rows in that range, indicating which rows belong to that category (where "1" means the product belongs to the category, and "0" means it does not).
Let’s now assume you execute the following update:
UPDATE product SET category_id = 8 WHERE product_id = 1001;
This update changes the category of Widget A (product ID 1001) from category 5 to category 8. The bitmap index needs to be updated:
    The bitmap entry for CATEGORY_ID = 5 will remove the "1" at the position where Widget A (row 1001) was found.
    The bitmap entry for CATEGORY_ID = 8 will add a "1" at the position where Widget A (row 1001) is now moved.
At this point, the bitmap index entries for both CATEGORY_ID = 5 and CATEGORY_ID = 8 are locked by your transaction, since both bitmap entries need to be updated.
Scenario 2: A Conflicting Update
Now, assume another transaction tries to execute the following update:
UPDATE product SET category_id = 5 WHERE product_id = 2002;
This transaction is attempting to change Widget B (product ID 2002) from category 8 to category 5. Since Widget B is currently in category 8, the bitmap entry for CATEGORY_ID = 8 needs to be updated to remove the "1" for Widget B (row 2002), and the bitmap entry for CATEGORY_ID = 5 needs to be updated to add a "1" for Widget B (row 2002).
At this point, a deadlock can occur. Here’s why:
    The first transaction has already locked the bitmap entries for both CATEGORY_ID = 5 (to remove the "1" for Widget A) and CATEGORY_ID = 8 (to add the "1" for Widget A).
    The second transaction is attempting to update the same bitmap entries: it wants to remove the "1" from CATEGORY_ID = 8 (for Widget B) and add a "1" to CATEGORY_ID = 5 (for Widget B).
    Since both transactions are trying to update the same bitmap entries simultaneously (in this case, for both category 5 and category 8), they block each other, leading to a deadlock.
This occurs because both transactions are competing to modify the same bitmap index entries that represent overlapping rows in the PRODUCT table.
    

 

 

Thursday, 13 March 2025

How to Identify MAXDOP Value for Running/Completed Queries

 To find the MAXDOP (Maximum Degree of Parallelism) used by running queries in SQL Server, you can use Dynamic Management Views (DMVs) such as sys.dm_exec_requests and sys.dm_exec_query_profiles. These views provide details about query execution, including parallelism levels.
1. Checking MAXDOP for Running Queries
SELECT  
    r.session_id,  
    r.request_id,  
    r.start_time,  
    r.status,  
    r.cpu_time,  
    r.total_elapsed_time,  
    r.logical_reads,  
    r.writes,  
    r.dop AS MAXDOP,  -- Degree of Parallelism
    st.text AS sql_text  
FROM sys.dm_exec_requests r  
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st  
WHERE r.dop > 1  -- Filtering only parallel queries
ORDER BY r.start_time DESC;

Explanation:
    r.dop: Shows the degree of parallelism (i.e., the number of CPU cores used for execution).
    r.session_id: Identifies the session running the query.
    r.status: Shows the execution status (e.g., running, suspended).
    st.text: Displays the actual SQL query text.
    Note: If dop = 1, the query is running serially without parallelism.
2. Checking MAXDOP for Completed Queries
SELECT  
    qs.execution_count,  
    qs.total_worker_time / qs.execution_count AS avg_worker_time,  
    qs.max_dop,  -- MAXDOP used
    st.text AS sql_text  
FROM sys.dm_exec_query_stats qs  
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st  
ORDER BY qs.total_worker_time DESC;

3. Checking MAXDOP for Running Query Execution Plans
SELECT  
    er.session_id,  
    qp.query_plan  
FROM sys.dm_exec_requests er  
CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) qp  
WHERE er.dop > 1;

Look for Parallelism (Gather Streams) in the query plan XML to confirm parallel execution.

4. Checking the MAXDOP Setting at the Server and Database Level
EXEC sp_configure 'show advanced options', 1;  
RECONFIGURE;  
EXEC sp_configure 'max degree of parallelism';

To check the database-level MAXDOP setting in Azure SQL Database:
SELECT *  
FROM sys.database_scoped_configurations  
WHERE name = 'MAXDOP';
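If you decide to change the database-level value, it is controlled through the same database-scoped configuration; a sketch (choose a value appropriate for your workload):
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;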

5. Checking MAXDOP for Index Operations

SELECT  
    r.session_id,  
    r.command,  
    r.dop,  
    st.text  
FROM sys.dm_exec_requests r  
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st  
WHERE r.command LIKE '%INDEX%';

Wednesday, 12 March 2025

Intelligent Query Processing (IQP) in SQL Databases

Efficient query performance is crucial for modern applications, as databases handle massive amounts of data. Traditionally, query optimization relied on static cost-based estimations, which sometimes led to suboptimal execution plans due to incorrect assumptions.
To address this, modern databases—particularly Microsoft SQL Server—have introduced Intelligent Query Processing (IQP). IQP enhances query execution by automatically adapting, optimizing, and learning from past executions. This minimizes performance issues without requiring code changes.
What is Intelligent Query Processing (IQP)?
Intelligent Query Processing (IQP) is a set of advanced query optimization features in SQL Server (starting from SQL Server 2017 and significantly expanded in SQL Server 2019 and later).
IQP enhances query performance dynamically by making real-time adjustments based on execution statistics, feedback loops, and AI-driven techniques.

How is IQP different from Traditional Query Processing?

Aspect                       Traditional Query Processing     Intelligent Query Processing (IQP)
Optimization Stage           Static, before execution         Dynamic, adjusts during execution
Query Plan Adjustments       Based on fixed statistics        Adapts based on real-time data
Handling Plan Regression     Requires manual intervention     Automatically detects & corrects
Performance Tuning           DBA-driven tuning required       Minimal or no code changes needed
Machine Learning Influence   None                             Uses feedback loops & AI

Why Do We Need Intelligent Query Processing?
Traditional query optimization relies on cardinality estimation—predicting the number of rows a query will process. However, real-world queries often face:
✅ Bad Cardinality Estimates – Outdated statistics or complex predicates lead to poor execution plans.
✅ Query Plan Regressions – A once-efficient query suddenly slows down due to a bad plan.
✅ Memory Allocation Issues – Queries either over-allocate (wasting resources) or under-allocate (causing spills to disk).
✅ Suboptimal Join Strategies – Poor join selection (Nested Loop instead of Hash Join) causes performance degradation.
IQP fixes these problems automatically, reducing the need for manual performance tuning.


🚀 Key Features of Intelligent Query Processing
IQP introduces a range of powerful enhancements that improve query performance dynamically. Let’s explore some of its most impactful features.

1️⃣ Batch Mode on Rowstore
📌 What it does:
Originally available only for Columnstore indexes, Batch Mode Execution improves the performance of queries running on rowstore tables (traditional tables with B-tree indexes).
📈 Benefits:
    Uses vectorized execution, reducing CPU usage.
    Drastically improves performance for aggregations, joins, and large scans.
    No changes needed—SQL Server automatically enables it when beneficial.
💡 Example:
SELECT CustomerID, COUNT(*)  FROM Sales.Orders  GROUP BY CustomerID;
Without batch mode, this query processes one row at a time. With batch mode, SQL Server processes thousands of rows at once, leading to faster execution.
2️⃣ Adaptive Joins
📌 What it does:
Instead of selecting a Nested Loop Join, Hash Join, or Merge Join at compile time, Adaptive Joins allow SQL Server to switch the join strategy dynamically at runtime.
📈 Benefits:
    Prevents bad join choices due to incorrect row estimates.
    Ensures optimal join selection for varying input sizes.
💡 Example:
If SQL Server expects 100 rows but actually gets 10 million rows, it will switch from a Nested Loop Join to a Hash Join automatically.
3️⃣ Adaptive Memory Grants
📌 What it does:
Allocates just the right amount of memory for query execution instead of over- or under-allocating.
📈 Benefits:
    Prevents out-of-memory issues for large queries.
    Reduces spilling to tempdb, which slows down execution.
💡 Example:
A complex report query initially requests 500MB but actually needs 5GB. SQL Server dynamically adjusts memory allocation for future executions.
4️⃣ Interleaved Execution for Multi-Statement Table-Valued Functions (MSTVFs)
📌 What it does:
Traditional table-valued functions (TVFs) always assumed fixed row estimates. This often led to poor query plans.
With Interleaved Execution, SQL Server delays optimization until runtime to get an accurate row estimate.
📈 Benefits:
    Prevents underestimating or overestimating TVF outputs.
    Optimizes execution plans based on real row counts.
💡 Example:
SELECT * FROM dbo.GetCustomerOrders(@CustomerID);
Before IQP, SQL Server guessed a default row count. Now, it waits until the function runs and then optimizes the query plan dynamically.
5️⃣ Table Variable Deferred Compilation
📌 What it does:
Table variables previously used fixed row estimates, often leading to poor execution plans. IQP defers their compilation until runtime, allowing SQL Server to optimize based on actual data size.
📈 Benefits:
    Improves performance of queries using table variables.
    Prevents incorrect join and index choices.
💡 Example:
DECLARE @TempTable TABLE (ID INT, Value VARCHAR(50));  
INSERT INTO @TempTable SELECT ID, Value FROM LargeTable;  
SELECT * FROM @TempTable t JOIN AnotherTable a ON t.ID = a.ID;  -- the table variable needs an alias to be referenced in the join

SQL Server waits until the actual row count is known before optimizing the execution plan.


SQL Server Extended Events: Monitoring Queries Running Longer Than X Minutes

What Are Extended Events in SQL Server?
Extended Events provide a flexible and lightweight framework to capture detailed performance data in SQL Server. They help in diagnosing slow-running queries, deadlocks, waits, and other issues affecting database performance.
Why Use Extended Events Instead of SQL Profiler?
Low Overhead: Uses fewer system resources.
More Powerful: Captures granular event data.
Better Filtering: Allows precise filtering on execution time, database, users, etc.
Replaces SQL Trace/Profiler: Profiler is deprecated in newer SQL Server versions.

Step-by-Step: Configuring Extended Events for Queries Running More Than 5 Minutes
1. Create an Extended Events Session
We will create an Extended Events session to capture queries that take longer than 300 seconds (5 minutes) to execute.
Using SSMS GUI:
    Open SQL Server Management Studio (SSMS).
    Expand Management > Extended Events > Sessions.
    Right-click Sessions and choose New Session....
    Provide a name, e.g., Long_Running_Queries.
    Under Events, click "Add Event", search for sql_statement_completed, and add it.
    Under the Global Fields (Actions) tab, select:
        sql_text (to capture the query text)
        session_id (to track the session)
        database_id (to identify the database)
    Apply a Filter (Predicate):
        Click Configure, then Filter (Predicate).
        Select duration, set it to >= 300000000 (300 seconds expressed in microseconds).
    Configure Data Storage:
        Choose Event File as the target.
        Specify a file path for saving captured events.
    Click OK, then right-click the session and select Start Session.

Using T-SQL:
Alternatively, use the following T-SQL script to create the session:

CREATE EVENT SESSION [Long_Running_Queries]  
ON SERVER  
ADD EVENT sqlserver.sql_statement_completed (  
    WHERE duration >= 300000000  -- 300 seconds (5 minutes) in microseconds  
)  
ADD TARGET package0.event_file (  
    SET filename = 'C:\Temp\LongRunningQueries.xel', max_file_size = 50  -- file size in MB  
)  
WITH (STARTUP_STATE = ON);  
GO  

2. Viewing and Analyzing the Captured Events
Using SSMS:
    Expand Management > Extended Events > Sessions.
    Right-click your session (Long_Running_Queries) and choose Watch Live Data.
    Execute long-running queries and monitor captured events in real-time.


Using T-SQL to Read the Event File:
To analyze captured events from the event file:
SELECT  
    event_data.value('(event/@name)[1]', 'VARCHAR(100)') AS event_name,  
    event_data.value('(event/data[@name="sql_text"]/value)[1]', 'NVARCHAR(MAX)') AS sql_text,  
    event_data.value('(event/data[@name="duration"]/value)[1]', 'BIGINT') / 1000000 AS duration_seconds  
FROM  
(  
    SELECT CAST(event_data AS XML) AS event_data  
    FROM sys.fn_xe_file_target_read_file('C:\Temp\LongRunningQueries*.xel', NULL, NULL, NULL)  
) AS xevents  
ORDER BY duration_seconds DESC;
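To confirm that the session is actually started and collecting events, you can check the list of active Extended Events sessions:
SELECT name, create_time
FROM sys.dm_xe_sessions
WHERE name = 'Long_Running_Queries';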

To stop the session:
ALTER EVENT SESSION [Long_Running_Queries] ON SERVER STATE = STOP;

To drop (delete) the session:
DROP EVENT SESSION [Long_Running_Queries] ON SERVER;