Wait Events in Oracle
db file sequential read:
Possible Causes :
1. Use of an unselective index
2. Fragmented indexes
3. High I/O on a particular disk or mount point
4. Bad application design
5. Index read performance can be affected by a slow I/O subsystem and/or poor database file layout, which results in a higher average wait time
Actions :
1. Check indexes on the table to ensure that the right index is being used
2. Check the column order of the index against the WHERE clause of the top SQL statements
3. Rebuild indexes with a high clustering factor
4. Use partitioning to reduce the amount of blocks being visited
5. Make sure optimizer statistics are up to date
6. Inspect the execution plans of the SQL statements that access data through indexes
Query to find the sessions and objects waiting on db file sequential read:
select b.sid,
       nvl(substr(a.object_name,1,30),
           'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj   = a.object_id(+)
and    b.p1    = c.file#(+)
and    b.p2    = c.dbablk(+)
and    b.event = 'db file sequential read'
union
select b.sid,
       nvl(substr(a.object_name,1,30),
           'P1='||b.p1||' P2='||b.p2||' P3='||b.p3) object_name,
       a.subobject_name,
       a.object_type
from   dba_objects a, v$session_wait b, x$bh c
where  c.obj   = a.data_object_id(+)
and    b.p1    = c.file#(+)
and    b.p2    = c.dbablk(+)
and    b.event = 'db file sequential read'
order by 1;
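For action 3 above, candidate indexes can be spotted by comparing each index's clustering factor with the row and block counts of its table. This is only a sketch; the 200000-row filter is an arbitrary illustrative threshold, adjust it for your system.

-- Clustering factor close to table rows = poorly clustered,
-- close to table blocks = well clustered
select i.owner,
       i.index_name,
       i.clustering_factor,
       t.blocks   table_blocks,
       t.num_rows table_rows
from   dba_indexes i, dba_tables t
where  i.table_owner = t.owner
and    i.table_name  = t.table_name
and    t.num_rows    > 200000        -- arbitrary threshold
order by i.clustering_factor desc;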
db file scattered read:
Possible Causes :
1. The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read into the SGA from disk.
2. Full table scans
3. Fast full index scans
Actions :
1. Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT
2. Use partition pruning to reduce the number of blocks visited
3. Consider the usage of multiple buffer pools and cache frequently used indexes/tables in the KEEP pool
4. Optimize the SQL statements that initiated most of the waits. The goal is to minimize the number of physical and logical reads.
5. Question whether the statement should access the data by a full table scan or index fast full scan at all, or whether an index range or unique scan would be more efficient.
SQL> select * from v$system_event where event = 'db file scattered read';

SQL> select * from v$session_event where event = 'db file scattered read' order by time_waited;

select a.sid, b.name, a.value
from   v$sesstat a, v$statname b
where  a.statistic# = b.statistic#
and    a.value <> 0
and    b.name = 'table scan blocks gotten'
order by 3,1;
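To tie these waits back to specific statements (action 4 above), one approach is to rank cached SQL by physical reads. This is a sketch against standard V$SQL columns; the ROWNUM cut-off of 10 is arbitrary.

-- Top cached SQL by disk reads
select *
from  (select sql_id, disk_reads, executions, buffer_gets, sql_text
       from   v$sql
       order by disk_reads desc)
where rownum <= 10;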
log file parallel write:
Possible Causes :
1. LGWR waits while writing the contents of the redo log buffer to the online redo log files on disk
2. I/O waits on the subsystem holding the online redo log files
Actions :
1. Reduce the amount of redo being generated (the query after this list identifies the top redo-generating sessions)
2. Do not leave tablespaces in hot backup mode for longer than necessary
3. Do not use RAID 5 for redo log files
4. Use faster disks for redo log files
5. Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention
6. Consider using NOLOGGING or UNRECOVERABLE options in SQL statements
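As referenced in action 1, the sessions generating the most redo can be listed from V$SESSTAT; a sketch using the standard 'redo size' statistic:

select s.sid, s.username, st.value redo_size
from   v$sesstat st, v$statname sn, v$session s
where  st.statistic# = sn.statistic#
and    sn.name       = 'redo size'
and    st.sid        = s.sid
and    st.value      > 0
order by st.value desc;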
SQL> select event, time_waited, average_wait
     from v$system_event
     where event in ('log file parallel write','log file sync');

EVENT                        TIME_WAITED AVERAGE_WAIT
---------------------------- ----------- ------------
log file parallel write             9884         7.76
log file sync                       1043        25.44
Find out who is committing frequently with the following query:

SQL> select sid, value
     from v$sesstat
     where statistic# = (select statistic#
                         from v$statname
                         where name = 'user commits')
     order by value;

       SID      VALUE
---------- ----------
         2          0
        92          0
        89          0
        94          0
A side effect of excessive commits is high redo wastage.
SQL> select b.name, a.value, round(sysdate - c.startup_time) days_old
     from v$sysstat a, v$statname b, v$instance c
     where a.statistic# = b.statistic#
     and b.name in ('redo wastage','redo size');

NAME                    VALUE   DAYS_OLD
---------------- ------------ ----------
redo size           117105528          0
redo wastage           329192          0
Query to find the average number of redo log blocks per write and the average LGWR I/O size in bytes:

SQL> select round((a.value / b.value) + 0.5,0) as avg_redo_blks_per_write,
            round((a.value / b.value) + 0.5,0) * c.lebsz as avg_io_size
     from v$sysstat a, v$sysstat b, x$kccle c
     where c.lenum = 1
     and a.name = 'redo blocks written'
     and b.name = 'redo writes';

AVG_REDO_BLKS_PER_WRITE AVG_IO_SIZE
----------------------- -----------
                    186       95232
log file sync:
Possible Causes :
1. Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete
Actions :
1. Tune LGWR to get good throughput to disk, eg: do not put redo logs on RAID 5
2. Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations
Guidelines For Resolving 'Log File Sync' Waits
If CPU starvation is an issue, then reducing CPU starvation is the correct step to resolve it.
If the commit rate is high, then decreasing commits is the correct step; in the few cases where that is not possible, increasing the priority of LGWR (using nice) or raising the priority class of LGWR to RT might provide some relief.
Solid state disk devices can also be used if the redo volume is extreme. In some cases it is also preferable to decrease the redo volume.
Check redo log switches per hour; normally there should be around 4-5 switches per hour. Sometimes the redo logs are simply too small. Oracle recommends at most about 8 switches per hour, though this can vary from environment to environment (the query below counts switches per hour).
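Redo log switch frequency can be checked from V$LOG_HISTORY; a minimal sketch that simply counts switches per hour:

select to_char(first_time,'YYYY-MM-DD HH24') switch_hour,
       count(*)                              switches
from   v$log_history
group by to_char(first_time,'YYYY-MM-DD HH24')
order by 1;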
User commit % and user rollback % can be seen in the AWR report, which can help to troubleshoot the issue.
Redo logs that are undersized or oversized are both problematic, so size them appropriately.
buffer busy waits:
Possible Causes :
1. Buffer busy waits are common in an I/O-bound Oracle system.
2. The two main cases where this can occur are:
3. Another session is reading the block into the buffer
4. Another session holds the buffer in an incompatible mode to our request
5. These waits indicate read/read, read/write, or write/write contention.
6. The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
Actions :
1. The main way to reduce buffer busy waits is to reduce the total I/O on the system
2. Depending on the block type, the actions will differ. For data blocks:
3. Eliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes.
4. Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.
5. Check for 'right-hand indexes' (indexes that get inserted into at the same point by many processes).
6. Increase INITRANS and MAXTRANS and reduce PCTUSED. This will make the table less dense.
SQL> select * from v$system_event where event like '%wait%';

SQL> select * from v$waitstat;

select p1 "File #", p2 "Block #", p3 "Reason Code"
from   v$session_wait
where  event = 'buffer busy waits';
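The file and block numbers returned above can be mapped back to a segment with DBA_EXTENTS; a sketch, where &file and &block stand for the P1 and P2 values observed:

select owner, segment_name, segment_type
from   dba_extents
where  file_id = &file
and    &block between block_id and block_id + blocks - 1;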
free buffer waits:
Possible Causes :
1. This means we are waiting for a free buffer but there are none available in the cache because there are too many dirty buffers in the cache
2. Either the buffer cache is too small or DBWR is slow in writing modified buffers to disk
3. DBWR is unable to keep up with the write requests
4. Checkpoints are happening too fast – maybe due to high database activity and undersized online redo log files
5. Large sorts and full table scans are filling the cache with modified blocks faster than DBWR is able to write them to disk
6. If the number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed
Actions :
1. Reduce checkpoint frequency – increase the size of the online redo log files
2. Examine the size of the buffer cache – consider increasing the size of the buffer cache in the SGA
3. Set DISK_ASYNCH_IO = TRUE. If asynchronous I/O cannot be used, increase the number of database writer processes or DBWR slaves.
4. Ensure hot spots do not exist by spreading datafiles over disks and disk controllers.
5. Pre-sorting or reorganizing data can help.
SQL> select * from v$sysstat where name in ('free buffer requested','free buffer inspected');

STATISTIC# NAME                        CLASS      VALUE    STAT_ID     CON_ID
---------- ----------------------- --------- ---------- ---------- ----------
       230 free buffer requested           8      53196 3411924934          0
       234 free buffer inspected           8          0  941676439          0

SQL> select * from v$system_event where event = 'free buffer waits';

no rows selected
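DBWR activity can be cross-checked against the free buffer statistics with a few more V$SYSSTAT counters; a sketch, noting that statistic names can vary slightly between versions:

select name, value
from   v$sysstat
where  name in ('DBWR checkpoint buffers written',
                'dirty buffers inspected',
                'physical writes from cache');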
enqueue waits:
Possible Causes :
1. This wait event indicates a wait for a lock that is held by another session (or sessions) in an incompatible mode to the requested mode.
2. TX Transaction Lock
3. Generally due to table or application set up issues
4. This indicates contention for row-level locks. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.
5. This usually is an application issue.
TM DML enqueue lock
6. Generally due to application issues, particularly if foreign key constraints have not been indexed.
ST lock
7. Database actions that modify the UET$ (used extent) and FET$ (free extent) tables require the ST lock, which includes actions such as drop, truncate, and coalesce.
8. Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary managed tablespaces
Actions :
1. Reduce waits and wait times
2. The action to take depends on the lock type which is causing the most problems
3. Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and whether there are multiple waiters for the same resource (see the blocker query after this list)
4. Waits for the TM enqueue in mode 3 are primarily due to unindexed foreign key columns
5. Create indexes on foreign keys (especially important in releases before 10g)
6. Following are some of the things you can do to minimize ST lock contention in your database:
7. Use locally managed tablespaces
8. Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE TEMPFILE… command.
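For action 3, blockers and their waiters can be listed directly from V$SESSION (10g and later); a minimal sketch:

select sid, serial#, blocking_session, event, seconds_in_wait
from   v$session
where  blocking_session is not null
order by blocking_session;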
SQL> SELECT * FROM V$RESOURCE_LIMIT
     WHERE RESOURCE_NAME IN ('dml_locks', 'enqueue_resources');
How long have sessions been waiting for certain enqueues since the database was started?
SQL> SELECT * FROM
     (SELECT SUBSTR(EVENT, 1, 50) EVENT, TIME_WAITED
      FROM V$SYSTEM_EVENT
      WHERE EVENT LIKE 'enq: %'
      ORDER BY TIME_WAITED DESC)
     WHERE ROWNUM <= 10;

EVENT                                    TIME_WAITED
---------------------------------------- -----------
enq: CR - block range reuse ckpt                 390
enq: JG - queue lock                             229
enq: PR - contention                              41
enq: RO - fast object reuse                       28
enq: CF - contention                               4
enq: PV - syncstart                                3

6 rows selected.
Which tables are most responsible for TX enqueue waits?
SQL> SELECT * FROM
     (SELECT OBJECT_NAME, SUBSTR(STATISTIC_NAME, 1, 30), VALUE
      FROM V$SEGMENT_STATISTICS
      WHERE STATISTIC_NAME = 'ITL waits'
      OR STATISTIC_NAME = 'row lock waits'
      ORDER BY VALUE DESC)
     WHERE ROWNUM <= 10;
Cache buffer chain latch
Possible Causes :
1. Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache
2. The cache buffers lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.
3. Contention for the cache buffers lru chain latch is symptomatic of intense buffer cache activity caused by inefficient SQL statements. Statements that repeatedly scan large unselective indexes or perform full table scans are the prime culprits.
4. Heavy contention for this latch is generally due to heavy buffer cache activity, which can be caused, for example, by repeatedly scanning large unselective indexes
Actions :
1. Contention in this latch can be avoided by implementing multiple buffer pools or by increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (the default value is generally sufficient for most systems).
2. It is also possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into the buffer cache.
SQL> select latch# from v$latch where name = 'cache buffers chains';

    LATCH#
----------
       327

Use the LATCH# returned above (327 in this example) to count its child latches:

SQL> select count(1) cnt from v$latch_children where latch# = 327;
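The busiest child latches (and hence the hottest chains) can be ranked by sleeps; a sketch against V$LATCH_CHILDREN, with an arbitrary ROWNUM cut-off:

select *
from  (select child#, gets, misses, sleeps
       from   v$latch_children
       where  name = 'cache buffers chains'
       order by sleeps desc)
where rownum <= 10;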
Direct Path Reads
Possible Causes :
1. These waits are associated with direct read operations which read data directly into the session's PGA, bypassing the SGA
2. The "direct path read" and "direct path write" wait events are related to operations that are performed in the PGA, like sorting, group by operations, and hash joins
3. In DSS-type systems, or during heavy batch periods, waits on "direct path read" are quite normal; however, for an OLTP system these waits are significant
4. These wait events can occur during sorting operations, which is not surprising as direct path reads and writes usually occur in connection with temporary segments
5. SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA
Actions :
1. Ensure the OS asynchronous I/O is configured correctly.
2. Check for I/O-heavy sessions / SQL and see if the amount of I/O can be reduced. Ensure no disks are I/O bound.
3. Set PGA_AGGREGATE_TARGET to an appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO), or set the *_AREA_SIZE parameters manually (like SORT_AREA_SIZE) and then set WORKAREA_SIZE_POLICY = MANUAL. Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.
4. Make sure the optimizer selects the right driving table. Check to see if the composite index's columns can be rearranged to match the ORDER BY clause to avoid the sort entirely.
Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database.
Query V$SESSTAT to identify sessions with high "physical reads direct"; a sketch of such a query follows, and the query after it lists parent/child (parallel query) sessions.
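A sketch of the V$SESSTAT query, using the standard 'physical reads direct' statistic:

select s.sid, s.username, st.value physical_reads_direct
from   v$sesstat st, v$statname sn, v$session s
where  st.statistic# = sn.statistic#
and    sn.name       = 'physical reads direct'
and    st.sid        = s.sid
and    st.value      > 0
order by st.value desc;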
SQL> select decode(ownerid,2147483644,'PARENT','CHILD') stmt_level,
            audsid,
            sid,
            serial#,
            username,
            osuser,
            process,
            sql_hash_value,
            sql_address
     from v$session
     where type <> 'BACKGROUND'
     and audsid in (select audsid
                    from v$session
                    group by audsid
                    having count(*) > 1)
     order by audsid, stmt_level desc, sid, username, osuser;
select a.username,
a.osuser,
to_char(a.logon_time,'MMDD/HH24MISS') as logon_time,
a.sid,
to_char(sysdate,'MMDD/HH24MI') as sample_time,
b.event,
a.sql_hash_value
from v$session a, v$session_wait b
where a.sid = b.sid
and b.event = 'direct path read';
Direct Path Writes:
Possible Causes :
1. These are waits that are associated with direct write operations that write data from users' PGAs to data files or temporary tablespaces
2. Direct load operations (eg: Create Table As Select (CTAS) may use this)
3. Parallel DML operations
4. Sort I/O (when a sort does not fit in memory)
Actions :
1. If the file indicates a temporary tablespace, check for unexpected disk sort operations.
2. Ensure the parameter DISK_ASYNCH_IO is TRUE. This is unlikely to reduce wait times from the wait event timings but may reduce the sessions' elapsed times (as synchronous direct I/O is not accounted for in wait event timings).
3. Ensure the OS asynchronous I/O is configured correctly. Ensure no disks are I/O bound.
SQL> select a.name, b.sid, b.value
     from v$statname a, v$sesstat b
     where a.statistic# = b.statistic#
     and b.value > 0
     and a.name = 'physical writes direct'
     order by b.value;

NAME                             SID      VALUE
------------------------- ---------- ----------
physical writes direct            39          7
physical writes direct            31        124
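For action 1, current temporary segment usage per session can be checked with V$TEMPSEG_USAGE (named V$SORT_USAGE in older releases); a minimal sketch:

select username, session_num, tablespace, segtype, blocks
from   v$tempseg_usage
order by blocks desc;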
Latch Free Waits
Possible Causes :
1. This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
2. When you see a latch free wait event in the V$SESSION_WAIT view, it means the process failed to obtain the latch in the willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. When processes compete heavily for latches, they will also consume more CPU resources because of spinning. The result is a higher response time.
Actions :
1. If the TIME spent waiting for latches is significant then it is best to determine which latches are suffering from contention.
Remark:
2. A latch is a kind of low level lock. Latches apply only to memory structures in the SGA. They do not apply to database objects. An Oracle SGA has many latches, and they exist to protect various memory structures from potential corruption by concurrent access.
3. The time spent on latch waits is an effect, not a cause; the cause is that you are doing too many block gets, and block gets require cache buffers chains latching.
select a.total_waits, b.sum_of_sleeps
from  (select total_waits from v$system_event where event = 'latch free') a,
      (select sum(sleeps) sum_of_sleeps from v$latch) b;
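To see which latches are actually suffering, a sketch that ranks latches by sleeps, with an arbitrary ROWNUM cut-off:

select *
from  (select name, gets, misses, sleeps
       from   v$latch
       order by sleeps desc)
where rownum <= 10;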
Library cache latch
Possible Causes :
1. The library cache latches protect the cached SQL statements and object definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache.
2. The application is making heavy use of literal SQL – use of bind variables will reduce this latch contention considerably
Actions :
1. This latch exists to ensure that the application reuses SQL statement representations as much as possible. Use bind variables whenever possible in the application.
2. You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter.
3. Consider increasing the shared pool.
SQL> select name from v$latch where lower(name)
like '%library%';
NAME
----------------------------------------------------------------
library cache load lock
SQL> select name from v$event_name where name
like '%library%';
NAME
----------------------------------------------------------------
library cache pin
library cache lock
library cache load lock
library cache: mutex X
library cache: bucket mutex X
library cache: dependency mutex X
library cache: mutex S
OSD IPC library
library cache revalidation
library cache shutdown
10 rows selected.
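Heavy literal SQL can often be spotted by grouping cached statements on FORCE_MATCHING_SIGNATURE (V$SQL, 10g Release 2 and later); a sketch, where the HAVING threshold is arbitrary:

select force_matching_signature, count(*) copies
from   v$sql
where  force_matching_signature <> 0
group by force_matching_signature
having count(*) > 20          -- arbitrary threshold
order by copies desc;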
Shared pool latch
Possible Causes :
1. The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool. Contention for the shared pool and library cache latches is mainly due to intense hard parsing.
2. A hard parse applies to new cursors and cursors that are aged out and must be re-executed.
The cost of parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released.
Actions :
1. Ways to reduce the shared pool latch are to avoid hard parses when possible: parse once, execute many.
2. Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared pool and the use of MTS (shared server option) also greatly influence the shared pool latch.
3. A workaround is to set the initialization parameter CURSOR_SHARING to FORCE. This allows statements that differ in literal values but are otherwise identical to share a cursor, and therefore reduces latch contention, memory usage, and hard parsing.
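Parse activity can be quantified from V$SYSSTAT; a high ratio of hard parses to executions points at the problems described above. A sketch using standard statistic names:

select name, value
from   v$sysstat
where  name in ('parse count (total)',
                'parse count (hard)',
                'execute count');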
Row cache objects latch
Possible Causes :
1. This latch comes into play when user processes are attempting to access the cached data dictionary values.
Actions :
1. It is not common to have contention on this latch, and the only way to reduce contention for it is by increasing the size of the shared pool (SHARED_POOL_SIZE).
2. Use locally managed tablespaces for your application objects, especially indexes.
3. Review and amend your database logical design; a good example is to merge or decrease the number of indexes on tables with heavy inserts.
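Dictionary cache activity behind this latch can be examined through V$ROWCACHE; a sketch ranking the caches by gets and misses:

select parameter, gets, getmisses
from   v$rowcache
order by gets desc;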