...: May 2025

Saturday, 31 May 2025

Understanding GV$INSTANCE_PING:RAC Interconnect Performance Monitoring

In an Oracle Real Application Clusters (RAC) environment, maintaining optimal interconnect performance is crucial for database operations. The GV$INSTANCE_PING view serves as a vital tool for Database Administrators (DBAs) to monitor and analyze the latency between RAC instances. This blog post explores how this view helps DBAs ensure optimal RAC performance.

What is GV$INSTANCE_PING?
GV$INSTANCE_PING is a dynamic performance view that provides detailed information about the measured latency of the interconnect between all instances in an Oracle RAC environment. Every few seconds, the PING process of each instance checks the response time of the interconnect to all other instances in the same database.

Key Features and Measurements
The view tracks two types of messages:
1. **500-byte messages (500B) Smaller test messages
2. **8-kilobyte messages (8K) Larger test messages

For each message type, the view provides comprehensive metrics:
Current Performance Metrics
- CURRENT_500B and CURRENT_8K: Latest response times in microseconds
- AVERAGE_500B and AVERAGE_8K: Average response times since instance startup
- MAX_500B and MAX_8K: Maximum observed response times

Cumulative Statistics
- COUNT_500B and COUNT_8K: Number of measurements taken
- WAIT_TIME_500B and WAIT_TIME_8K: Total accumulated response times
- WAIT_TIME_SQUARED_500B and WAIT_TIME_SQUARED_8K: Sum of squared response times (useful for statistical analysis)

Let's analyze an actual output from a 4-node RAC environment:
SQL> select * from GV$INSTANCE_PING;
INST_ID INSTANCE CURRENT_500B AVERAGE_500B MAX_500B COUNT_500B WAIT_TIME_500B WAIT_TIME_SQUARED_500B CURRENT_8K AVERAGE_8K MAX_8K COUNT_8K WAIT_TIME_8K WAIT_TIME_SQUARED_8K CON_ID
---------- ---------- ------------ ------------ ---------- ---------- -------------- ---------------------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------
2 1 7 7 250 370458 2924730 1253 8 8 941 370458 3380206 2547 0
2 2 3 3 2249 370458 1217885 5695 4 4 551 370458 1539529 881 0
2 3 7 7 304 370457 2887515 1393 8 8 14780 370457 3389103 220212 0
2 4 7 7 349 370458 2847595 1298 10 8 515 370458 3313461 1551 0
3 1 8 7 227 370454 2952602 1285 8 8 919 370454 3370483 2855 0
3 2 8 7 17647 370454 2901210 314283 8 8 2206 370454 3416102 6594 0
3 3 3 3 531 370454 1226169 760 4 3 611 370454 1557469 1527 0
3 4 7 7 17321 370454 2922328 326255 8 8 617 370454 3322511 1796 0
4 1 7 7 1967 370454 2905185 4652 8 8 1683 370454 3344914 4081 0
4 2 8 7 443 370454 2877140 1129 22 8 709 370454 3396211 1514 0
4 3 7 7 1478 370454 2893488 3060 8 9 367 370454 3372591 1441 0
4 4 3 3 749 370455 1211323 790 4 4 199 370455 1517458 421 0
1 1 4 3 458 370459 1215589 732 4 3 569 370459 1539772 1330 0
1 2 7 7 208 370458 2921436 1109 10 9 561 370458 3448708 1981 0
1 3 15 7 151 370458 2929392 1054 8 8 214 370458 3420401 1464 0
1 4 7 7 234 370459 2902809 1181 8 8 203 370459 3362650 1415 0

16 rows selected.

Key Observations from the Data:
1.Instance 2's Communication Patterns
- Current 500B latency to Instance 1: 7 microseconds
- Current 500B latency to Instance 2: 3 microseconds (self-ping)
- Current 500B latency to Instance 3: 7 microseconds
- Current 500B latency to Instance 4: 7 microseconds

2.Performance Characteristics
- Average 500B latency is consistently 7 microseconds for most connections
- Self-ping (Instance 2 to 2) shows better performance at 3 microseconds
- Maximum latencies show some spikes:
* 2249 microseconds to Instance 2
* 304 microseconds to Instance 3
* 349 microseconds to Instance 4

3.8K Message Performance
- Current 8K latencies range from 4 to 10 microseconds
- Maximum 8K latency shows significant variation:
* 941 microseconds to Instance 1
* 551 microseconds to Instance 2
* 14780 microseconds to Instance 3 (notable spike)
* 515 microseconds to Instance 4

4.Measurement Volume
- Approximately 370,458 measurements taken for each connection
- Consistent measurement count across instances indicates stable monitoring

Why is it Important for DBAs?
1.Performance Monitoring
- Helps identify interconnect latency issues
- Enables proactive detection of performance degradation
- Provides historical data for trend analysis

2.Troubleshooting
- Assists in diagnosing RAC performance problems
- Helps identify instance-to-instance communication issues
- Enables comparison of current vs. historical performance

Wednesday, 14 May 2025

Oracle Database 19c: Lost Write Protection Gets Smarter with AUTO

One of the most under-the-radar but important changes introduced in Oracle Database 19c Release Update (RU) 19.26 is the new default setting for the parameter DB_LOST_WRITE_PROTECT.
🔄 From NONE to AUTO: A Quiet but Powerful Shift

Previously, the default setting for DB_LOST_WRITE_PROTECT was:
DB_LOST_WRITE_PROTECT = NONE

This meant lost write detection was completely disabled by default unless explicitly turned on.

Starting with RU 19.26, Oracle has made a significant shift:
DB_LOST_WRITE_PROTECT = AUTO

This change is also reflected in Oracle Database 23ai. It’s more than just a default toggle — it's a smarter, adaptive mechanism that brings better protection with minimal performance trade-offs.
💡 What Is a Lost Write?

A Lost Write is a critical data corruption event where:
    Oracle writes a data block to storage,
    The storage subsystem acknowledges the write as successful,
    But the data block is never actually written to persistent storage.

These can be caused by memory errors, firmware bugs, OS issues, or storage controller failures. Lost writes are notoriously difficult to detect and often surface during recovery or standby failovers.

Oracle defines a lost write as:
    "A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage."

🧠 Why AUTO Matters: Intelligent Lost Write Detection
The new AUTO value for DB_LOST_WRITE_PROTECT changes how Oracle handles lost write detection:
▶️ On a Primary Database:
    Oracle only logs buffer cache reads in the redo log if:
        Physical standby databases exist, and
        Real-time redo apply is enabled.

▶️ On a Standby Database:
    Oracle performs lost write detection only if the apply lag is within reasonable bounds.
    If apply lag exceeds:
        60 seconds, or
        2/3 of the FSFO apply lag threshold (if Fast-Start Failover is enabled),
    Then lost write detection is temporarily skipped to reduce overhead and ensure smooth role transitions.

⚙️ Comparison of DB_LOST_WRITE_PROTECT Modes
Value   Description                                                                Performance Impact
NONE   No lost write detection.      🔋 Low
TYPICAL    Logs reads for read/write tablespaces.   ⚖️ Moderate
FULL           Logs reads for all tablespaces (read-only + read/write).    🛠️ High
AUTO (NEW)    Adapts based on real-time redo and apply lag.    ⚙️ Balanced

🚨 Real-World Impact: Why It Matters
With older RUs and the default NONE, lost write protection was disabled by default — even in Data Guard environments. This left many systems unprotected by mistake, simply because DBAs weren’t aware or didn’t configure it explicitly.

Now, AUTO intelligently enables protection only when it's needed, helping DBAs avoid the all-or-nothing trade-off between performance and data integrity.
🧪 What Should You Do?

Here are a few actions to take if you’ve recently applied Oracle 19.26 or higher:
✅ 1. Check the Current Setting
SHOW PARAMETER DB_LOST_WRITE_PROTECT;

If you see AUTO, you're on the new default.
✅ 2. Monitor Redo Generation
You may notice an increase in redo generation, especially if:
You have a physical standby.
Real-time redo apply is enabled.
See MOS Note: 37260974
“Database Generating Large Amount Of Redo After Applying Jan 2025 DBRU”

✅ 3. Understand the Error ORA-00752

If you ever encounter:
ORA-00752: recovery detected a lost write of a data block
Do not attempt to fix this manually. Instead:
    Shutdown the primary database.
    Failover to the standby.
    Open a support SR with Oracle immediately.

labels

Saturday, 31 May 2025

Understanding GV$INSTANCE_PING:RAC Interconnect Performance Monitoring

Wednesday, 14 May 2025

Oracle Database 19c: Lost Write Protection Gets Smarter with AUTO