Saturday, 31 May 2025

Understanding GV$INSTANCE_PING:RAC Interconnect Performance Monitoring

In an Oracle Real Application Clusters (RAC) environment, maintaining optimal interconnect performance is crucial for database operations. The GV$INSTANCE_PING view serves as a vital tool for Database Administrators (DBAs) to monitor and analyze the latency between RAC instances. This blog post explores how this view helps DBAs ensure optimal RAC performance.

What is GV$INSTANCE_PING?
GV$INSTANCE_PING is a dynamic performance view that provides detailed information about the measured latency of the interconnect between all instances in an Oracle RAC environment. Every few seconds, the PING process of each instance checks the response time of the interconnect to all other instances in the same database.

Key Features and Measurements
The view tracks two types of messages:
1. **500-byte messages (500B) Smaller test messages
2. **8-kilobyte messages (8K) Larger test messages

For each message type, the view provides comprehensive metrics:
Current Performance Metrics
- CURRENT_500B and CURRENT_8K: Latest response times in microseconds
- AVERAGE_500B and AVERAGE_8K: Average response times since instance startup
- MAX_500B and MAX_8K: Maximum observed response times

Cumulative Statistics
- COUNT_500B and COUNT_8K: Number of measurements taken
- WAIT_TIME_500B and WAIT_TIME_8K: Total accumulated response times
- WAIT_TIME_SQUARED_500B and WAIT_TIME_SQUARED_8K: Sum of squared response times (useful for statistical analysis)

Let's analyze an actual output from a 4-node RAC environment:
SQL> select * from GV$INSTANCE_PING;
   INST_ID   INSTANCE CURRENT_500B AVERAGE_500B   MAX_500B COUNT_500B WAIT_TIME_500B WAIT_TIME_SQUARED_500B CURRENT_8K AVERAGE_8K     MAX_8K   COUNT_8K WAIT_TIME_8K WAIT_TIME_SQUARED_8K     CON_ID
---------- ---------- ------------ ------------ ---------- ---------- -------------- ---------------------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------
         2          1            7            7        250     370458        2924730                   1253          8          8        941     370458      3380206                 2547          0
         2          2            3            3       2249     370458        1217885                   5695          4          4        551     370458      1539529                  881          0
         2          3            7            7        304     370457        2887515                   1393          8          8      14780     370457      3389103               220212          0
         2          4            7            7        349     370458        2847595                   1298         10          8        515     370458      3313461                 1551          0
         3          1            8            7        227     370454        2952602                   1285          8          8        919     370454      3370483                 2855          0
         3          2            8            7      17647     370454        2901210                 314283          8          8       2206     370454      3416102                 6594          0
         3          3            3            3        531     370454        1226169                    760          4          3        611     370454      1557469                 1527          0
         3          4            7            7      17321     370454        2922328                 326255          8          8        617     370454      3322511                 1796          0
         4          1            7            7       1967     370454        2905185                   4652          8          8       1683     370454      3344914                 4081          0
         4          2            8            7        443     370454        2877140                   1129         22          8        709     370454      3396211                 1514          0
         4          3            7            7       1478     370454        2893488                   3060          8          9        367     370454      3372591                 1441          0
         4          4            3            3        749     370455        1211323                    790          4          4        199     370455      1517458                  421          0
         1          1            4            3        458     370459        1215589                    732          4          3        569     370459      1539772                 1330          0
         1          2            7            7        208     370458        2921436                   1109         10          9        561     370458      3448708                 1981          0
         1          3           15            7        151     370458        2929392                   1054          8          8        214     370458      3420401                 1464          0
         1          4            7            7        234     370459        2902809                   1181          8          8        203     370459      3362650                 1415          0

16 rows selected.


Key Observations from the Data:
1.Instance 2's Communication Patterns
   - Current 500B latency to Instance 1: 7 microseconds
   - Current 500B latency to Instance 2: 3 microseconds (self-ping)
   - Current 500B latency to Instance 3: 7 microseconds
   - Current 500B latency to Instance 4: 7 microseconds

2.Performance Characteristics
   - Average 500B latency is consistently 7 microseconds for most connections
   - Self-ping (Instance 2 to 2) shows better performance at 3 microseconds
   - Maximum latencies show some spikes:
     * 2249 microseconds to Instance 2
     * 304 microseconds to Instance 3
     * 349 microseconds to Instance 4

3.8K Message Performance
   - Current 8K latencies range from 4 to 10 microseconds
   - Maximum 8K latency shows significant variation:
     * 941 microseconds to Instance 1
     * 551 microseconds to Instance 2
     * 14780 microseconds to Instance 3 (notable spike)
     * 515 microseconds to Instance 4

4.Measurement Volume
   - Approximately 370,458 measurements taken for each connection
   - Consistent measurement count across instances indicates stable monitoring

Why is it Important for DBAs?
1.Performance Monitoring
   - Helps identify interconnect latency issues
   - Enables proactive detection of performance degradation
   - Provides historical data for trend analysis

2.Troubleshooting
   - Assists in diagnosing RAC performance problems
   - Helps identify instance-to-instance communication issues
   - Enables comparison of current vs. historical performance

Wednesday, 14 May 2025

Oracle Database 19c: Lost Write Protection Gets Smarter with AUTO

One of the most under-the-radar but important changes introduced in Oracle Database 19c Release Update (RU) 19.26 is the new default setting for the parameter DB_LOST_WRITE_PROTECT.
๐Ÿ”„ From NONE to AUTO: A Quiet but Powerful Shift

Previously, the default setting for DB_LOST_WRITE_PROTECT was:
DB_LOST_WRITE_PROTECT = NONE

This meant lost write detection was completely disabled by default unless explicitly turned on.

Starting with RU 19.26, Oracle has made a significant shift:
DB_LOST_WRITE_PROTECT = AUTO

This change is also reflected in Oracle Database 23ai. It’s more than just a default toggle — it's a smarter, adaptive mechanism that brings better protection with minimal performance trade-offs.
๐Ÿ’ก What Is a Lost Write?

A Lost Write is a critical data corruption event where:
    Oracle writes a data block to storage,
    The storage subsystem acknowledges the write as successful,
    But the data block is never actually written to persistent storage.

These can be caused by memory errors, firmware bugs, OS issues, or storage controller failures. Lost writes are notoriously difficult to detect and often surface during recovery or standby failovers.

Oracle defines a lost write as:
    "A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage."

๐Ÿง  Why AUTO Matters: Intelligent Lost Write Detection
The new AUTO value for DB_LOST_WRITE_PROTECT changes how Oracle handles lost write detection:
▶️ On a Primary Database:
    Oracle only logs buffer cache reads in the redo log if:
        Physical standby databases exist, and
        Real-time redo apply is enabled.

▶️ On a Standby Database:
    Oracle performs lost write detection only if the apply lag is within reasonable bounds.
    If apply lag exceeds:
        60 seconds, or
        2/3 of the FSFO apply lag threshold (if Fast-Start Failover is enabled),
    Then lost write detection is temporarily skipped to reduce overhead and ensure smooth role transitions.

⚙️ Comparison of DB_LOST_WRITE_PROTECT Modes
Value           Description                                                                Performance Impact
NONE         No lost write detection.                                                ๐Ÿ”‹ Low
TYPICAL    Logs reads for read/write tablespaces.                       ⚖️ Moderate
FULL           Logs reads for all tablespaces (read-only + read/write).    ๐Ÿ› ️ High
AUTO (NEW)    Adapts based on real-time redo and apply lag.    ⚙️ Balanced


๐Ÿšจ Real-World Impact: Why It Matters
With older RUs and the default NONE, lost write protection was disabled by default — even in Data Guard environments. This left many systems unprotected by mistake, simply because DBAs weren’t aware or didn’t configure it explicitly.

Now, AUTO intelligently enables protection only when it's needed, helping DBAs avoid the all-or-nothing trade-off between performance and data integrity.
๐Ÿงช What Should You Do?

Here are a few actions to take if you’ve recently applied Oracle 19.26 or higher:
✅ 1. Check the Current Setting
SHOW PARAMETER DB_LOST_WRITE_PROTECT;

If you see AUTO, you're on the new default.
✅ 2. Monitor Redo Generation
You may notice an increase in redo generation, especially if:
    You have a physical standby.
    Real-time redo apply is enabled.
See MOS Note: 37260974
“Database Generating Large Amount Of Redo After Applying Jan 2025 DBRU”


✅ 3. Understand the Error ORA-00752

If you ever encounter:
ORA-00752: recovery detected a lost write of a data block
Do not attempt to fix this manually. Instead:
    Shutdown the primary database.
    Failover to the standby.
    Open a support SR with Oracle immediately.