Friday, October 1, 2021

Wait Events in Oracle

db file sequential read:

Possible Causes :


1. Use of an unselective index
2. Fragmented indexes
3. High I/O on a particular disk or mount point
4. Bad application design
5. Index read performance can be affected by a slow I/O subsystem and/or a poor database file layout, which results in a higher average wait time

 

Actions :


1. Check indexes on the table to ensure that the right index is being used

2. Check the column order of the index against the WHERE clause of the top SQL statements

3. Rebuild indexes with a high clustering factor (see the sketch below)

4. Use partitioning to reduce the number of blocks being visited

5. Make sure optimizer statistics are up to date

6. Inspect the execution plans of the SQL statements that access data through indexes
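A quick sketch of actions 3 and 5 (MY_TABLE is a placeholder, not from the original post): a clustering factor close to the table's block count is good, while one close to NUM_ROWS suggests the index is a rebuild candidate.

-- Compare clustering factor against table blocks and rows:
select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from   user_indexes i, user_tables t
where  i.table_name = t.table_name
and    t.table_name = 'MY_TABLE';     -- MY_TABLE is a placeholder

-- Refresh optimizer statistics (action 5), including index statistics:
exec dbms_stats.gather_table_stats(user, 'MY_TABLE', cascade => TRUE);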

 

Query to find the objects behind current db file sequential read waits:

 

-- Map the P1 (file#) and P2 (block#) of current waits to an object,
-- checking both object_id and data_object_id against x$bh
select b.sid,
       nvl(substr(a.object_name,1,30),
           'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj = a.object_id(+)
and    b.p1  = c.file#(+)
and    b.p2  = c.dbablk(+)
and    b.event = 'db file sequential read'
union
select b.sid,
       nvl(substr(a.object_name,1,30),
           'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj = a.data_object_id(+)
and    b.p1  = c.file#(+)
and    b.p2  = c.dbablk(+)
and    b.event = 'db file sequential read'
order by 1;

 

db file scattered read:

 

Possible Causes :


1. The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read into the SGA from disk
2. Full table scans
3. Fast full index scans

 

Actions :


1. Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT

2. Use partition pruning to reduce the number of blocks visited

3. Consider the usage of multiple buffer pools and cache frequently used indexes/tables in the KEEP pool (see the sketch below)

4. Optimize the SQL statements that initiated most of the waits. The goal is to minimize the number of physical and logical reads.

5. Should the statement access the data by a full table scan or index fast full scan? Would an index range or unique scan be more efficient?
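A minimal sketch of actions 1 and 3, assuming a hypothetical table MY_BIG_TABLE that is scanned frequently; the KEEP pool must be sized first via DB_KEEP_CACHE_SIZE:

-- Check the current multi-block read setting (action 1):
show parameter db_file_multiblock_read_count

-- Pin a frequently scanned segment in the KEEP buffer pool (action 3):
alter table my_big_table storage (buffer_pool keep);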

 

SQL> select * from v$system_event where event = 'db file scattered read';

SQL> select *
from   v$session_event
where  event = 'db file scattered read'
order by time_waited;

select a.sid, b.name, a.value
from   v$sesstat a, v$statname b
where  a.statistic# = b.statistic#
and    a.value     <> 0
and    b.name = 'table scan blocks gotten'
order by 3,1;

 

 

log file parallel write:

 

Possible Causes :


1. LGWR waits while writing the contents of the redo log buffer to the online redo log files on disk
2. I/O waits on the subsystem holding the online redo log files

 

Actions :


1. Reduce the amount of redo being generated

2. Do not leave tablespaces in hot backup mode for longer than necessary

3. Do not use RAID 5 for redo log files

4. Use faster disks for redo log files

5. Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention

6. Consider using NOLOGGING or UNRECOVERABLE options in SQL statements (see the sketch below)
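As an illustration of action 6, a direct-path load into a NOLOGGING table generates minimal redo. The table and source names below are placeholders, and note that NOLOGGING objects must be backed up afterwards since the load is unrecoverable:

-- Create a staging table with minimal redo generation:
create table my_stage nologging
as select * from my_source where 1 = 0;

-- Direct-path insert; combined with NOLOGGING, only minimal redo is written:
insert /*+ append */ into my_stage
select * from my_source;
commit;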

 

SQL> select event, time_waited, average_wait
     from   v$system_event
     where  event in ('log file parallel write','log file sync');

EVENT                               TIME_WAITED AVERAGE_WAIT
----------------------------------- ----------- ------------
log file parallel write                    9884         7.76
log file sync                              1043        25.44

 

To find out who is committing frequently, run the following query:

 

SQL> select sid, value
     from   v$sesstat
     where  statistic# = (select statistic#
                          from   v$statname
                          where  name = 'user commits')
     order by value;

       SID      VALUE
---------- ----------
         2          0
        92          0
        89          0
        94          0

  

A side effect of excessive commits is high redo wastage.

 

SQL> select b.name, a.value, round(sysdate - c.startup_time) days_old
     from   v$sysstat a, v$statname b, v$instance c
     where  a.statistic# = b.statistic#
     and    b.name      in ('redo wastage','redo size');

NAME                  VALUE   DAYS_OLD
------------- ------------- ----------
redo size         117105528          0
redo wastage         329192          0

 

Query to find the average number of redo log blocks per write and the average LGWR I/O size in bytes:

SQL> select round((a.value / b.value) + 0.5,0) as avg_redo_blks_per_write,
            round((a.value / b.value) + 0.5,0) * c.lebsz as avg_io_size
     from   v$sysstat a, v$sysstat b, x$kccle c
     where  c.lenum = 1
     and    a.name  = 'redo blocks written'
     and    b.name  = 'redo writes';

AVG_REDO_BLKS_PER_WRITE AVG_IO_SIZE
----------------------- -----------
                    186       95232

 

log file sync:


Possible Causes :


1. Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete

Actions :


1. Tune LGWR to get good throughput to disk, e.g. do not put redo logs on RAID 5

2. Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations (a sketch follows below)
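A minimal PL/SQL sketch of action 2: committing once per batch instead of once per row. The table and column names are illustrative only:

declare
  l_cnt pls_integer := 0;
begin
  for r in (select rowid rid from my_queue where processed = 'N') loop
    update my_queue set processed = 'Y' where rowid = r.rid;
    l_cnt := l_cnt + 1;
    if mod(l_cnt, 1000) = 0 then
      commit;                  -- one log file sync per 1000 rows, not per row
    end if;
  end loop;
  commit;                      -- final commit for the remaining rows
end;
/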

 

Guidelines For Resolving 'Log File Sync' Waits

 

If CPU starvation is an issue, then reducing CPU starvation is the correct step to resolve it.

If the commit rate is high, then decreasing commits is the correct step; in the few cases where that is not possible, increasing the priority of LGWR (using nice) or changing the priority class of LGWR to RT might provide some relief.

Solid State Disk devices can also be used if the redo volume is extreme. In some cases it is also preferable to decrease the redo size.

Check the redo log switch rate: normally it should be around 4-5 switches per hour. Sometimes the redo logs are simply too small; Oracle recommends at most about 8 switches per hour, though this can vary from environment to environment.

The user commit % and user rollback % in the AWR report can also help in troubleshooting the issue.

Redo logs that are undersized or oversized are both problematic, so size them properly.
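To check the switch rate mentioned above, a query like the following counts log switches per hour over the last day:

select to_char(first_time, 'YYYY-MM-DD HH24') hr, count(*) switches
from   v$log_history
where  first_time > sysdate - 1
group by to_char(first_time, 'YYYY-MM-DD HH24')
order by 1;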

 

buffer busy waits:

 

Possible Causes :


1. Buffer busy waits are common in an I/O-bound Oracle system.
2. The two main cases where this can occur are: another session is reading the block into the buffer, or another session holds the buffer in an incompatible mode to our request.
3. These waits indicate read/read, read/write, or write/write contention.
4. The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.

 

Actions :


1. The main way to reduce buffer busy waits is to reduce the total I/O on the system

2. Depending on the block type, the actions will differ. For data blocks:

3. Eliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes.

4. Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.

5. Check for 'right-hand indexes' (indexes that get inserted into at the same point by many processes).

6. Increase INITRANS and MAXTRANS and reduce PCTUSED. This will make the table less dense (see the sketch after this list).
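A sketch of actions 4 and 6, assuming a hypothetical hot table MY_HOT_TABLE and index MY_HOT_IDX. Note that ALTER ... INITRANS affects only newly formatted blocks, so a MOVE/REBUILD is needed to apply it to existing ones:

alter table my_hot_table initrans 10;          -- affects newly formatted blocks only
alter table my_hot_table move pctfree 30;      -- rebuilds existing blocks
alter index my_hot_idx rebuild initrans 10;    -- indexes must be rebuilt after a MOVE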

SQL> select * from v$system_event where event like '%wait%';

SQL> select * from v$waitstat;

select p1 "File #", p2 "Block #", p3 "Reason Code" from v$session_wait where event = 'buffer busy waits';

 

free buffer waits:

 

Possible Causes :


1. This means we are waiting for a free buffer but there are none available in the cache because there are too many dirty buffers in the cache

2. Either the buffer cache is too small or DBWR is slow in writing modified buffers to disk

3. DBWR is unable to keep up with the write requests

4. Checkpoints are happening too fast, maybe due to high database activity and undersized online redo log files

5. Large sorts and full table scans are filling the cache with modified blocks faster than DBWR is able to write them to disk

6. If the number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed

 

Actions :


1. Reduce checkpoint frequency by increasing the size of the online redo log files

2. Examine the size of the buffer cache; consider increasing the size of the buffer cache in the SGA

3. Set disk_asynch_io = true. If not using asynchronous I/O, increase the number of database writer processes or DBWR slaves (see the sketch below). Ensure hot spots do not exist by spreading datafiles over disks and disk controllers. Pre-sorting or reorganizing data can help.
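For action 3, a sketch of checking and raising the number of database writers. DB_WRITER_PROCESSES is a static parameter, so the change only takes effect after an instance restart:

show parameter db_writer_processes
alter system set db_writer_processes = 4 scope = spfile;   -- requires restart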

 

SQL> select * from v$sysstat where name in ('free buffer requested','free buffer inspected');

STATISTIC# NAME                       CLASS      VALUE    STAT_ID     CON_ID
---------- ----------------------- -------- ---------- ---------- ----------
       230 free buffer requested          8      53196 3411924934          0
       234 free buffer inspected          8          0  941676439          0

 

SQL> select * from v$system_event where event ='free buffer waits';

 

no rows selected

 

enqueue waits

 

Possible Causes :


1. This wait event indicates a wait for a lock that is held by another session (or sessions) in an incompatible mode to the requested mode.

TX Transaction Lock

2. Generally due to table or application set-up issues

3. This indicates contention for a row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.

4. This usually is an application issue.

TM DML enqueue lock

5. Generally due to application issues, particularly if foreign key constraints have not been indexed.

ST lock

6. Database actions that modify the UET$ (used extent) and FET$ (free extent) tables require the ST lock, which includes actions such as drop, truncate, and coalesce.

7. Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary-managed tablespaces

 

Actions :


1. Reduce waits and wait times

2. The action to take depends on the lock type which is causing the most problems

3. Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and whether there are multiple waiters for the same resource

4. Waits for the TM enqueue in mode 3 are primarily due to unindexed foreign key columns

5. Create indexes on foreign key columns (especially critical in releases prior to 10g); see the sketch after this list

6. Following are some of the things you can do to minimize ST lock contention in your database:

7. Use locally managed tablespaces

8. Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE ... TEMPFILE ... command.
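A simplified sketch for action 5: listing foreign key columns that have no matching index column in the same position. It does not fully handle multi-column key ordering, so treat the output as candidates to review:

select cc.table_name, cc.constraint_name, cc.column_name
from   user_cons_columns cc, user_constraints c
where  c.constraint_name = cc.constraint_name
and    c.constraint_type = 'R'          -- referential (foreign key) constraints
and    not exists (
         select 1
         from   user_ind_columns ic
         where  ic.table_name      = cc.table_name
         and    ic.column_name     = cc.column_name
         and    ic.column_position = cc.position);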

 

SQL>  SELECT * FROM V$RESOURCE_LIMIT WHERE RESOURCE_NAME IN ('dml_locks', 'enqueue_resources');

To see how long sessions have been waiting for certain enqueues since the database was started:

SQL> SELECT * FROM
     (SELECT SUBSTR(EVENT, 1, 50) EVENT, TIME_WAITED
      FROM V$SYSTEM_EVENT
      WHERE EVENT LIKE 'enq: %'
      ORDER BY TIME_WAITED DESC)
     WHERE ROWNUM <= 10;

 

EVENT                                TIME_WAITED
------------------------------------ -----------
enq: CR - block range reuse ckpt             390
enq: JG - queue lock                         229
enq: PR - contention                          41
enq: RO - fast object reuse                   28
enq: CF - contention                           4
enq: PV - syncstart                            3

6 rows selected.

To see which tables are most responsible for TX enqueue waits:

SQL> SELECT * FROM
     (SELECT OBJECT_NAME, SUBSTR(STATISTIC_NAME, 1, 30), VALUE
      FROM V$SEGMENT_STATISTICS
      WHERE STATISTIC_NAME = 'ITL waits' OR
            STATISTIC_NAME = 'row lock waits'
      ORDER BY VALUE DESC)
     WHERE ROWNUM <= 10;

 

Cache buffer chain latch

 

Possible Causes :


1. Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache

2. The cache buffers lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.

3. Competition for the cache buffers lru chain latch is symptomatic of intense buffer cache activity caused by inefficient SQL statements. Statements that repeatedly scan large unselective indexes or perform full table scans are the prime culprits.

4. Heavy contention for this latch is generally due to heavy buffer cache activity, caused for example by repeatedly scanning large unselective indexes

 

Actions :


1. Contention on this latch can be avoided by implementing multiple buffer pools or by increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (the default value is generally sufficient for most systems).

2. It is possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache, thereby reducing the rate at which new blocks are introduced into the buffer cache.
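To see whether contention is concentrated on a few child latches (and therefore a few hot blocks), the sleepiest children can be listed with a sketch like this:

select * from (
  select addr, gets, misses, sleeps
  from   v$latch_children
  where  name = 'cache buffers chains'
  order by sleeps desc)
where rownum <= 5;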

 

SQL> select latch# from v$latch
     where name = 'cache buffers chains';

    LATCH#
----------
       327

 

SQL> select count(1) cnt from v$latch_children
     where latch# = 327;   -- use the LATCH# returned by the previous query

 

Direct Path Reads

 

Possible Causes :


1. These waits are associated with direct read operations which read data directly into the session's PGA, bypassing the SGA

2. The "direct path read" and "direct path write" wait events are related to operations that are performed in the PGA, such as sorting, GROUP BY operations, and hash joins

3. In DSS-type systems, or during heavy batch periods, waits on "direct path read" are quite normal. However, for an OLTP system these waits are significant

4. These wait events can occur during sorting operations, which is not surprising as direct path reads and writes usually occur in connection with temporary segments

5. SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA

Actions :


1. Ensure the OS asynchronous I/O is configured correctly.

2. Check for I/O-heavy sessions / SQL and see if the amount of I/O can be reduced. Ensure no disks are I/O bound.

3. Set PGA_AGGREGATE_TARGET to an appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO), or set the *_area_size parameters manually (like SORT_AREA_SIZE, in which case you have to set WORKAREA_SIZE_POLICY = MANUAL). See the sketch after this list. Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.

4. Make sure the optimizer selects the right driving table. Check to see if the composite index's columns can be rearranged to match the ORDER BY clause to avoid the sort entirely.

Also, consider automating the SQL work areas by using PGA_AGGREGATE_TARGET in Oracle9i Database.
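For action 3, the PGA advisory gives an idea of what a larger PGA_AGGREGATE_TARGET would buy; a sketch:

select round(pga_target_for_estimate / 1024 / 1024) target_mb,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
from   v$pga_target_advice
order  by pga_target_for_estimate;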

Query V$SESSTAT to identify sessions with high "physical reads direct". The query below identifies parallel query parent and child sessions:

SQL> select decode(ownerid,2147483644,'PARENT','CHILD') stmt_level,
            audsid,
            sid,
            serial#,
            username,
            osuser,
            process,
            sql_hash_value,
            sql_address
     from   v$session
     where  type <> 'BACKGROUND'
     and    audsid in (select audsid
                       from   v$session
                       group by audsid
                       having count(*) > 1)
     order by audsid, stmt_level desc, sid, username, osuser;

 

select a.username,
       a.osuser,
       to_char(a.logon_time,'MMDD/HH24MISS') as logon_time,
       a.sid,
       to_char(sysdate,'MMDD/HH24MI') as sample_time,
       b.event,
       a.sql_hash_value
from   v$session a, v$session_wait b
where  a.sid   = b.sid
and    b.event = 'direct path read';


Direct Path Writes:

 

Possible Causes :


1. These are waits that are associated with direct write operations that write data from users' PGAs to data files or temporary tablespaces
2. Direct load operations (e.g. Create Table As Select (CTAS)) may use this
3. Parallel DML operations
4. Sort I/O (when a sort does not fit in memory)

 

Actions :


1. If the file indicates a temporary tablespace, check for unexpected disk sort operations (see the sketch after this list).
2. Ensure the parameter DISK_ASYNCH_IO is TRUE. This is unlikely to reduce wait times from the wait event timings but may reduce the session's elapsed time (as synchronous direct I/O is not accounted for in wait event timings).
3. Ensure the OS asynchronous I/O is configured correctly. Ensure no disks are I/O bound.
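For action 1, a quick way to gauge disk sort activity is to compare memory and disk sorts; disk sorts show up as direct path I/O against the temporary tablespace:

select name, value
from   v$sysstat
where  name in ('sorts (memory)', 'sorts (disk)');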

 

SQL> select a.name, b.sid, b.value
     from   v$statname a, v$sesstat b
     where  a.statistic# = b.statistic#
     and    b.value      > 0
     and    a.name       = 'physical writes direct'
     order by b.value;

NAME                              SID      VALUE
-------------------------- ---------- ----------
physical writes direct             39          7
physical writes direct             31        124

 


 

Latch Free Waits


Possible Causes :


1. This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
2. When you see a latch free wait event in the V$SESSION_WAIT view, it means the process failed to obtain the latch in the willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. When processes compete heavily for latches, they will also consume more CPU resources because of spinning. The result is a higher response time.

 

Actions :


1. If the TIME spent waiting for latches is significant, then it is best to determine which latches are suffering from contention (see the sketch after the query below).

Remark:

2. A latch is a kind of low-level lock. Latches apply only to memory structures in the SGA. They do not apply to database objects. An Oracle SGA has many latches, and they exist to protect various memory structures from potential corruption by concurrent access.

3. The time spent on latch waits is an effect, not a cause; the cause is that you are doing too many block gets, and block gets require cache buffers chains latching.

select a.total_waits, b.sum_of_sleeps
from   (select total_waits from v$system_event where event = 'latch free') a,
       (select sum(sleeps) sum_of_sleeps from v$latch) b;
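To determine which latches are suffering from contention (action 1), the latches can be ranked by sleeps; a simple sketch:

select * from (
  select name, gets, misses, sleeps
  from   v$latch
  order by sleeps desc)
where rownum <= 10;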

 

Library cache latch

 

Possible Causes :


1. The library cache latches protect the cached SQL statements and object definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache.

2. The application is making heavy use of literal SQL; the use of bind variables will reduce contention on this latch considerably

 

Actions :


1. The goal is to ensure that the application reuses SQL statement representations as much as possible. Use bind variables whenever possible in the application.

2. You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter.

3. Consider increasing the shared pool. A quick check of the hard parse rate is sketched below.
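A high ratio of hard parses to executions suggests literal SQL; a quick check:

select name, value
from   v$sysstat
where  name in ('parse count (total)', 'parse count (hard)', 'execute count');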

 

SQL> select name from v$latch where lower(name) like '%library%';

 

NAME

----------------------------------------------------------------

library cache load lock

 

SQL> select name from v$event_name where name like '%library%';

 

NAME

----------------------------------------------------------------

library cache pin

library cache lock

library cache load lock

library cache: mutex X

library cache: bucket mutex X

library cache: dependency mutex X

library cache: mutex S

OSD IPC library

library cache revalidation

library cache shutdown

 

10 rows selected.

 



Shared pool latch

 

Possible Causes :


1. The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool. Contention for the shared pool and library cache latches is mainly due to intense hard parsing.

2. A hard parse applies to new cursors and to cursors that have aged out and must be re-executed.

The cost of parsing a new SQL statement is expensive both in terms of CPU requirements and in the number of times the library cache and shared pool latches may need to be acquired and released.

 

Actions :


1. The way to reduce the shared pool latch is to avoid hard parses when possible: parse once, execute many.

2. Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared pool and the use of MTS (shared server option) also greatly influence the shared pool latch.

3. The workaround is to set the initialization parameter CURSOR_SHARING to FORCE (see the sketch below). This allows statements that differ in literal values but are otherwise identical to share a cursor, and therefore reduces latch contention, memory usage, and hard parsing.
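A sketch of action 3; CURSOR_SHARING can be changed dynamically, and the effect on hard parses can then be watched via V$SYSSTAT as in the library cache section:

alter system set cursor_sharing = FORCE scope = both;
show parameter cursor_sharing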

 

 

 

Row cache objects latch

 

Possible Causes :


1. This latch comes into play when user processes are attempting to access the cached data dictionary values.

 

Actions :


1. It is not common to have contention on this latch; the only way to reduce contention for it is to increase the size of the shared pool (SHARED_POOL_SIZE). To see which part of the dictionary cache is involved, see the sketch after this list.

2. Use locally managed tablespaces for your application objects, especially indexes. Review and amend your database logical design; a good example is to merge or decrease the number of indexes on tables with heavy inserts.
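To see which part of the dictionary cache is being hit hardest, V$ROWCACHE can be ranked by misses:

select parameter, gets, getmisses
from   v$rowcache
order  by getmisses desc;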

 
