Common Oracle Wait Events & Actions
db file sequential reads
Possible Causes :
· Use of an unselective index
· Fragmented Indexes
· High I/O on a particular disk or mount point
· Bad application design
· Index read performance can be affected by a slow I/O subsystem and/or poor database file layout, which results in a higher average wait time
Actions :
· Check indexes on the table to ensure that the right index is being used
· Check the column order of the index against the WHERE clause of the top SQL statements
· Rebuild indexes with a high clustering factor
· Use partitioning to reduce the number of blocks being visited
· Make sure optimizer statistics are up to date
· Relocate ‘hot’ datafiles
· Consider the usage of multiple buffer pools and cache frequently used indexes/tables in the KEEP pool
· Inspect the execution plans of the SQL statements that access data through indexes
· Is it appropriate for the SQL statements to access data through index lookups?
· Would full table scans be more efficient?
· Do the statements use the right driving table?
· The optimization goal is to minimize both the number of logical and physical I/Os.
Remarks:
· The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
· Significant db file sequential read wait time is most likely an application issue.
· If the DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered. This is desirable.
· However, if the clustering factor approaches the number of rows in the table, it means the rows in the table are randomly ordered, and thus more I/Os are required to complete the operation. You can improve the index’s clustering factor by rebuilding the table so that rows are ordered according to the index key, and rebuilding the index thereafter.
· The OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING initialization parameters can influence the optimizer to favour the nested loops operation and choose an index access path over a full table scan.
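As a quick check of the clustering-factor remark above, you can compare an index’s clustering factor against the table’s block and row counts. The schema and table names below are placeholders for your own objects:

```sql
-- CLUSTERING_FACTOR close to BLOCKS   -> rows well ordered for this index
-- CLUSTERING_FACTOR close to NUM_ROWS -> rows randomly ordered; range scans cost more I/O
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   dba_indexes i
       JOIN dba_tables t
         ON  t.owner      = i.table_owner
         AND t.table_name = i.table_name
WHERE  i.table_owner = 'SCOTT'      -- placeholder schema
AND    i.table_name  = 'EMP';       -- placeholder table
```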
db file scattered reads
Possible Causes :
· The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read into the SGA from disk.
· Full Table scans
· Fast Full Index Scans
Actions :
· Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT
· Use partition pruning to reduce the number of blocks visited
· Consider the usage of multiple buffer pools and cache frequently used indexes/tables in the KEEP pool
· Optimize the SQL statement that initiated most of the waits. The goal is to minimize the number of physical and logical reads.
· Should the statement access the data by a full table scan or index FFS? Would an index range or unique scan be more efficient? Does the query use the right driving table?
· Are the SQL predicates appropriate for hash or merge join?
· If full scans are appropriate, can parallel query improve the response time?
· The objective is to reduce the demands for both the logical and physical I/Os, and this is best achieved through SQL and application tuning.
· Make sure all statistics are representative of the actual data. Check the LAST_ANALYZED date.
Remarks:
· If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there hasn’t been a code change, check whether one or more indexes has been dropped or become unusable, or whether the statistics have gone stale.
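The two checks in the remark above can be sketched as follows, again with placeholder schema and table names:

```sql
-- 1. How old are the optimizer statistics?
SELECT table_name, last_analyzed
FROM   dba_tables
WHERE  owner = 'SCOTT' AND table_name = 'EMP';   -- placeholders

-- 2. Have any indexes on the table become unusable?
SELECT index_name, status
FROM   dba_indexes
WHERE  table_owner = 'SCOTT'
AND    table_name  = 'EMP'
AND    status <> 'VALID';
```

For partitioned indexes the per-partition status lives in DBA_IND_PARTITIONS rather than DBA_INDEXES.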
log file parallel write
Possible Causes :
· LGWR waits while writing contents of the redo log buffer cache to the online log files on disk
· I/O wait on sub system holding the online redo log files
Actions :
· Reduce the amount of redo being generated
· Do not leave tablespaces in hot backup mode for longer than necessary
· Do not use RAID 5 for redo log files
· Use faster disks for redo log files
· Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention
· Consider using NOLOGGING or UNRECOVERABLE options in SQL statements
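Before trying to reduce redo, it helps to see where it is coming from. One option is to rank sessions by the 'redo size' statistic (a sketch; run as a privileged user):

```sql
SELECT s.sid,
       s.username,
       st.value AS redo_bytes
FROM   v$sesstat   st
       JOIN v$statname sn ON sn.statistic# = st.statistic#
       JOIN v$session  s  ON s.sid = st.sid
WHERE  sn.name = 'redo size'
AND    st.value > 0
ORDER  BY st.value DESC;
```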
log file sync
Possible Causes :
· Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete
Actions :
· Tune LGWR to get good throughput to disk eg: Do not put redo logs on RAID5
· Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations
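The commit-batching advice can be sketched in PL/SQL. The table and column names here are purely illustrative; the point is the position of the COMMIT:

```sql
BEGIN
  FOR r IN (SELECT order_id FROM orders_staging) LOOP   -- illustrative table
    UPDATE orders                                        -- illustrative table
    SET    status = 'PROCESSED'
    WHERE  order_id = r.order_id;
    -- A COMMIT here would trigger one log file sync wait per row
  END LOOP;
  COMMIT;   -- one commit for the whole batch: a single log file sync wait
END;
/
```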
buffer busy waits
Possible Causes :
· Buffer busy waits are common in an I/O-bound Oracle system.
· The two main cases where this can occur are:
· Another session is reading the block into the buffer
· Another session holds the buffer in an incompatible mode to our request
· These waits indicate read/read, read/write, or write/write contention.
· The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
· This wait can be intensified by a large block size, as more rows can be contained within the block
· This wait happens when a session wants to access a database block in the buffer cache but it cannot, as the buffer is “busy”
· It is also often due to several processes repeatedly reading the same blocks (eg: lots of people scan the same index or data block)
Actions :
· The main way to reduce buffer busy waits is to reduce the total I/O on the system
· Depending on the block type, the actions will differ
Data Blocks
· Eliminate HOT blocks from the application. Check for repeatedly scanned / unselective indexes.
· Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.
· Check for ‘right-hand indexes’ (indexes that get inserted into at the same point by many processes).
· Increase INITRANS and MAXTRANS and reduce PCTUSED. This will make the table less dense.
· Reduce the number of rows per block
Segment Header
· Increase the number of FREELISTs and FREELIST GROUPs
Undo Header
· Increase the number of Rollback Segments.
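Since the action differs by block type, a first step is to see which block class accounts for the waits, for example via V$WAITSTAT:

```sql
-- 'data block'     -> hot blocks / unselective indexes
-- 'segment header' -> freelist contention
-- 'undo header'    -> too few rollback segments
SELECT class, count, time
FROM   v$waitstat
ORDER  BY time DESC;
```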
free buffer waits
Possible Causes :
· This means we are waiting for a free buffer but there are none available in the cache because there are too many dirty buffers in the cache
· Either the buffer cache is too small or the DBWR is slow in writing modified buffers to disk
· DBWR is unable to keep up with the write requests
· Checkpoints happening too fast – maybe due to high database activity and under-sized online redo log files
· Large sorts and full table scans are filling the cache with modified blocks faster than the DBWR is able to write to disk
· If the number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed
Actions :
Reduce checkpoint frequency – increase the size of the online redo log files
Examine the size of the buffer cache – consider increasing the size of the buffer cache in the SGA
Set DISK_ASYNCH_IO = TRUE
If not using asynchronous I/O, increase the number of DB writer processes or DBWR slaves
Ensure hot spots do not exist by spreading datafiles over disks and disk controllers
Pre-sorting or reorganizing data can help
enqueue waits
Possible Causes :
· This wait event indicates a wait for a lock that is held by another session (or sessions) in an incompatible mode to the requested mode.
TX Transaction Lock
· Generally due to table or application set up issues
· This indicates contention for a row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.
· This usually is an application issue.
TM DML enqueue lock
· Generally due to application issues, particularly if foreign key constraints have not been indexed.
ST lock
· Database actions that modify the UET$ (used extent) and FET$ (free extent) tables require the ST lock; this includes actions such as drop, truncate, and coalesce.
· Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary managed tablespaces
Actions :
· Reduce waits and wait times
· The action to take depends on the lock type which is causing the most problems
· Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and whether there are multiple waiters for the same resource
· Waits for TM enqueue in Mode 3 are primarily due to unindexed foreign key columns. Create indexes on foreign keys (< 10g)
· Following are some of the things you can do to minimize ST lock contention in your database:
· Use locally managed tablespaces
· Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE TEMPFILE… command.
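For the TX case, a common way to find the blocker and its waiters is to look for BLOCK = 1 in V$LOCK and match waiters requesting the same resource (ID1/ID2):

```sql
-- BLOCK = 1   -> this session holds a lock that others are waiting on
-- REQUEST > 0 -> this session is waiting; match it to the holder on ID1/ID2
SELECT sid, type, id1, id2, lmode, request, block
FROM   v$lock
WHERE  type IN ('TX', 'TM')
AND    (block = 1 OR request > 0)
ORDER  BY id1, id2, block DESC;
```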
Cache buffers lru chain latch
Possible Causes :
· Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache
· The cache buffers lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.
· Competition for the cache buffers lru chain latch is symptomatic of intense buffer cache activity caused by inefficient SQL statements. Statements that repeatedly scan large unselective indexes or perform full table scans are the prime culprits.
· Heavy contention for this latch is generally due to heavy buffer cache activity, which can be caused, for example, by repeatedly scanning large unselective indexes
Actions :
Contention in this latch can be avoided implementing multiple buffer pools or increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (The default value is generally sufficient for most systems).
It’s possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into the buffer cache.
Direct Path Reads
Possible Causes :
· These waits are associated with direct read operations which read data directly into the sessions PGA bypassing the SGA
· The “direct path read” and “direct path write” wait events are related to operations that are performed in the PGA, like sorting, group by operations, and hash joins
· In DSS-type systems, or during heavy batch periods, waits on “direct path read” are quite normal. However, for an OLTP system these waits are significant
· These wait events can occur during sorting operations, which is not surprising as direct path reads and writes usually occur in connection with temporary segments
· SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA
Actions :
Ensure the OS asynchronous IO is configured correctly.
Check for IO heavy sessions / SQL and see if the amount of IO can be reduced.
Ensure no disks are IO bound.
Set your PGA_AGGREGATE_TARGET to an appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO), or set *_area_size manually (like SORT_AREA_SIZE) and then set WORKAREA_SIZE_POLICY = MANUAL
Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.
Make sure the optimizer selects the right driving table. Check to see if the composite index’s columns can be rearranged to match the ORDER BY clause to avoid sort entirely.
Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database.
Query V$SESSTAT to identify sessions with high “physical reads direct”
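The V$SESSTAT query mentioned above can look like this (a sketch; run as a privileged user):

```sql
SELECT s.sid,
       s.username,
       st.value AS phys_reads_direct
FROM   v$sesstat   st
       JOIN v$statname sn ON sn.statistic# = st.statistic#
       JOIN v$session  s  ON s.sid = st.sid
WHERE  sn.name = 'physical reads direct'
AND    st.value > 0
ORDER  BY st.value DESC;
```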
Remark:
· Default size of HASH_AREA_SIZE is twice that of SORT_AREA_SIZE
· A larger HASH_AREA_SIZE will influence the optimizer to go for hash joins instead of nested loops
· The hidden parameter DB_FILE_DIRECT_IO_COUNT can impact direct path read performance. It sets the maximum I/O buffer size of direct read and write operations. The default is 1M in 9i
Direct Path Writes
Possible Causes :
· These are waits that are associated with direct write operations that write data from users’ PGAs to data files or temporary tablespaces
· Direct load operations (eg: Create Table as Select (CTAS) may use this)
· Parallel DML operations
· Sort IO (when a sort does not fit in memory)
Actions :
If the file indicates a temporary tablespace check for unexpected disk sort operations.
Ensure the DISK_ASYNCH_IO parameter is TRUE. This is unlikely to reduce wait times from the wait event timings but may reduce sessions’ elapsed times (as synchronous direct IO is not accounted for in wait event timings).
Ensure the OS asynchronous IO is configured correctly.
Ensure no disks are IO bound
Latch Free Waits
Possible Causes :
· This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
· When you see a latch free wait event in the V$SESSION_WAIT view, it means the process failed to obtain the latch in the
willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. When processes compete heavily for latches, they will also consume more CPU resources because of spinning. The result is a higher response time
Actions :
· If the TIME spent waiting for latches is significant then it is best to determine which latches are suffering from contention.
Remark:
· A latch is a kind of low level lock. Latches apply only to memory structures in the SGA. They do not apply to database objects. An Oracle SGA has many latches, and they exist to protect various memory structures from potential corruption by concurrent access.
· The time spent on latch waits is an effect, not a cause; the cause is that you are doing too many block gets, and block gets require cache buffer chain latching
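To determine which latches are suffering from contention, one option is to rank them by sleeps in V$LATCH:

```sql
-- MISSES = willing-to-wait gets that did not succeed immediately
-- SLEEPS = misses that had to go to sleep after spinning
SELECT name, gets, misses, sleeps
FROM   v$latch
WHERE  sleeps > 0
ORDER  BY sleeps DESC;
```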
Library cache latch
Possible Causes :
· The library cache latches protect the cached SQL statements and objects definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache.
· Application is making heavy use of literal SQL – use of bind variables will reduce this latch considerably
Actions :
· This latch ensures that the application is reusing SQL statement representations as much as possible. Use bind variables whenever possible in the application.
· You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter.
· Consider increasing the shared pool.
Remark:
· Larger shared pools tend to have long free lists and processes that need to allocate space in them must spend extra time scanning the long free lists while holding the shared pool latch
· If your database is not yet on Oracle9i Database, an oversized shared pool can increase the contention for the shared pool latch.
Shared pool latch
Possible Causes :
The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool
Contention for the shared pool and library cache latches is mainly due to intense hard parsing. A hard parse applies to new cursors and cursors that are aged out and must be re-executed.
The cost of parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released.
Actions :
· Ways to reduce shared pool latch contention: avoid hard parses when possible; parse once, execute many.
· Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared pool and use of MTS (shared server option) also greatly influence the shared pool latch.
· The workaround is to set the initialization parameter CURSOR_SHARING to FORCE. This allows statements that differ in literal values but are otherwise identical to share a cursor, and therefore reduces latch contention, memory usage, and hard parsing.
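A simple way to gauge how much hard parsing is going on system-wide before and after such a change:

```sql
-- A high ratio of 'parse count (hard)' to 'parse count (total)'
-- suggests literal SQL / poor cursor reuse
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');
```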
Row cache objects latch
Possible Causes :
This latch comes into play when user processes are attempting to access the cached data dictionary values.
Actions :
· It is not common to have contention in this latch and the only way to reduce contention for this latch is by increasing the size of the shared pool (SHARED_POOL_SIZE).
· Use Locally Managed tablespaces for your application objects, especially indexes
· Review and amend your database logical design; a good example is to merge or decrease the number of indexes on tables with heavy inserts
Remark:
· Configuring the library cache to an acceptable size usually ensures that the data dictionary cache is also properly sized. So tuning Library Cache will tune Row Cache indirectly.
Reference: https://samadhandba.wordpress.com
===========================================================
Whatever changes you make in the database, the changes are recorded in the online redo log files. These are important files for recovery in the event of a crash. There must be at least two redo log groups in the database. The online redo log files are used in a circular fashion, i.e., when the current redo log file is full the changes are recorded in the next available online redo log file. When the last redo log file is full the changes are then recorded in the first redo log file, overwriting the already available redo changes. If you have enabled archiving then the online redo log file must have been archived before it is overwritten.
A log switch occurs when the current online redo log file is full. It enables the LGWr process to close the current redo log file, open the next available redo log file, and start writing the changes to that file. A checkpoint occurs during the log switch, which signals the DBWr to flush the dirty buffers to the data files.
When the redo log file size is small, the log file gets filled frequently, causing log switches to occur more frequently. When a user session waits on the log file switch completion wait event, it means the LGWr has not completed its work.
Tuning Option
Increase the size of the online redo log file. Check the v$log_history view to see how often the log switch has taken place. Size your log files so that a log switch occurs every 30 minutes. For example, if your current log file size is 50MB and a log switch occurs every 5 minutes, then increase the file size to 300MB.
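The V$LOG_HISTORY check mentioned above can be written as a switches-per-hour summary:

```sql
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;
-- Aim for roughly 2 switches per hour (one every ~30 minutes)
```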
The cause for this wait event is the same as described above for log file switch completion. When you see the log file switch completion wait event you will most likely also see the checkpoint incomplete wait event. During the log switch a checkpoint occurs. The checkpoint signals the DBWr to write the dirty buffers to the data files.
The difference between log file switch completion and log file switch completion (Checkpoint Incomplete) is that in the former, users wait for the Log Writer background process (LGWr) to complete its work (the log switch), while in the latter, users wait for the Database Writer background process (DBWr) to complete its work (the checkpoint).
Tuning option
Increase the size of redo log files. Increase the number of redo log groups.
When a user session requires free buffers, the server process scans the LRU list to get free buffer space. After scanning the LRU list up to a threshold, if the server process could not get free space, it requests the DBWr to write the dirty buffers from the LRU list to disk. While the DBWr process writes the dirty buffers, the session waits on 'Free Buffer Waits'.
Tuning Options
Poor SQL Statements--
Query the V$SQL view for statements that have high DISK_READS. Tune the statements to reduce the physical reads. The poorly written SQL Statements are the main cause of this wait event.
DBWr Processes--
Increase the DBWr processes (or)
Decrease the Buffer Cache (or)
Decrease the FAST_START_MTTR_TARGET parameter.
Delayed Block Cleanout---
The delayed block cleanout will cause the free buffer wait events. To avoid delayed block cleanout perform a full table scan on a table that has been loaded with a lot of rows before it is released to the application.
Small Buffer Cache----
Increase the size of Buffer Cache if you feel that the buffer cache is under sized and check for the wait event.
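The V$SQL check suggested under "Poor SQL Statements" can be sketched as a top-N query by physical reads (the ROWNUM form works on older releases):

```sql
SELECT *
FROM  (SELECT disk_reads, executions, sql_text
       FROM   v$sql
       ORDER  BY disk_reads DESC)
WHERE  rownum <= 10;
```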
Slow IO
log file sync wait
When a user issues a commit or rollback, the redo data in the redo buffer is written to the online redo log file. The user session waits for this event to finish before continuing with other processing. This wait time is represented as the log file sync wait event.
A number of people have asked what the difference is between log file parallel write and log file sync.
The difference is....
log file parallel write occurs when LGWR writes redo records from the redo buffer to the online redo log file. This may take place very frequently, when any one of the following conditions is met:
1. Once in every three seconds.
2. _LOG_IO_SIZE threshold is met.
3. 1MB worth of redo entries are buffered.
4. Commit.
5. Rollback.
6. When DBWr requests.
The user sessions will never experience the log file parallel write wait event.
When the user session issue commit or rollback then it leads to log file sync wait event, which the user will experience by response time.
When a user issues a commit or rollback command, the redo data in the redo buffer is written to the online redo log file. This write is known as a sync write. During this synchronization the user process waits on the log file sync event, while LGWr waits on the log file parallel write event.
The log file sync wait is normally very fast and goes unnoticed by end users. However, in certain cases you may see very high time waited on this event. The main causes are as follows:
Too many commits
If you notice high waits at the session level, they may be due to batch processes that commit inside a loop. In that case, change the application logic to eliminate unnecessary commits and reduce the commit frequency.
If you notice high waits at the system level, they may be due to short transactions. OLTP databases usually have short transactions and therefore high log file sync waits. In this case, about the only thing you can do to improve performance is to use a faster I/O subsystem (for example, raw devices) for the redo logs.
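A sketch of the batch-loop change (table and column names are illustrative):
SQL> begin
  for rec in (select order_id from orders where status = 'OPEN') loop
    update orders set status = 'CLOSED' where order_id = rec.order_id;
    -- commit;  -- committing here causes one log file sync wait per row
  end loop;
  commit;       -- one sync write for the whole batch
end;
/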
Large Log buffer
Redo entries move from the buffer to the log files either through sync writes, as explained earlier, or through background writes (every three seconds, buffer one-third full, 1 MB of redo, and so on). When the redo log buffer is very large, more redo accumulates before the background writes kick in, so when a user issues a commit or rollback the sync write has more redo to flush and takes longer.
The 'Buffer Busy Waits' event occurs for the following reasons:
1. A user wants to read or modify a data block. The block is present in the buffer cache but is locked by another session, so the user must wait until the other session releases its lock on that block.
2. A user wants to read or modify a data block that is not present in the buffer cache. The block has to be read from the data files into the buffer cache, but the same block is already being read by another session, so the user waits for the other session's I/O to complete. Prior to Oracle 10g this wait was reported as buffer busy waits; from Oracle 10g onward it is reported as the 'read by other session' wait.
Tuning Options,
Run the following query to find whether any block or range of blocks is repeatedly responsible for buffer busy waits:
SQL> select p1 "File #", p2 "Block #", p3 "Reason Code"
from v$session_wait
where event = 'buffer busy waits';
Use the following query to find the segment the block belongs to:
SQL> select owner, segment_name, segment_type
from dba_extents
where file_id = &file#
and &block# between block_id and block_id + blocks - 1;
Once the segment name is identified, use the V$SEGMENT_STATISTICS view to monitor the statistics of that segment.
SQL>select * from v$segment_statistics
where owner like 'RACFIN'
and statistic_name like 'buffer busy waits'
and object_name like 'IBM_PARTY_BRANCH' ;
Use the following query to find what kind of contention is causing the buffer busy waits.
SQL> Select * from v$waitstat;
The output shows the count and total time of all waits for each class of block, such as data block, segment header, undo header, and so on.
To avoid buffer busy waits:
1. Increase PCTFREE (and adjust PCTUSED accordingly) to reduce the number of rows per block and relieve data block contention.
2. Increase the INITRANS value to avoid data block contention.
3. Increase the FREELISTS and FREELIST GROUPS values to avoid freelist and segment header block contention (these apply to manual segment space management; ASSM manages free space automatically).
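For example (segment name and values are illustrative; FREELISTS applies only under manual segment space management, and PCTFREE/INITRANS changes affect newly formatted blocks only):
SQL> alter table ibm_party_branch pctfree 20 initrans 4;
SQL> alter table ibm_party_branch storage (freelists 4);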
An enqueue is a locking mechanism that controls access to a shared resource; an enqueue wait occurs when a session must wait to acquire an enqueue held by another session. Enqueues can be held and requested in various modes.
The following query lists the sessions holding and waiting for locks, along with the lock type and mode:
SQL> select decode(request, 0, 'Holder: ', 'Waiter: ') || sid sess, id1, id2, lmode, request, type
FROM V$LOCK
WHERE (id1, id2, type) IN (SELECT id1, id2, type FROM V$LOCK WHERE request>0)
ORDER BY id1, request ;
The most common enqueue waits are discussed below,
TYPE: TM (Table Lock)
LMODE: 3
CAUSE: Unindexed Foreign Key
SOLUTION: The holding session has to issue a commit or rollback. To avoid this kind of lock in the first place, create indexes on the foreign key columns. The ID1 column value in V$LOCK is the object ID of the child table; look it up in the DBA_OBJECTS dictionary view to get the object name, then create the index on the foreign key column.
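A sketch of the lookup and fix (the index, table, and column names are illustrative):
SQL> select owner, object_name from dba_objects where object_id = &id1;
SQL> create index child_fk_idx on child_table (parent_id);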
TYPE: TX (Row level lock)
LMODE: 6
CAUSE: Updating or deleting rows that are currently locked by another transaction.
SOLUTION: This is an application issue. The lock is released when the holding session issues a commit or rollback. Killing the holding session will roll back its transaction.
RESOURCE LOCKED: Issue the following query to find the resource that is locked.
SQL> select c.sid waiter_sid, a.object_name, a.object_type
from dba_objects a, v$session b, v$session_wait c
where (a.object_id = b.row_wait_obj# or a.data_object_id = b.row_wait_obj#)
and b.sid = c.sid
and chr(bitand(c.p1, -16777216)/16777215) || chr(bitand(c.p1, 16711680)/65535) = 'TX'
and c.event = 'enqueue';
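In 10g the blocker can also be found directly from V$SESSION, and killed as a last resort (the SID and serial# values are placeholders):
SQL> select sid, serial#, blocking_session from v$session where blocking_session is not null;
SQL> alter system kill session '<sid>,<serial#>';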
TYPE: TX (ITL Shortage)
LMODE: 4
CAUSE: i) ITL (Interested Transaction List) Shortage. ii) Unique Key Enforcement. iii) Bitmap index Entry.
SOLUTION: i) To see whether the wait is due to ITL shortage, dump the data block and check how many ITL slots are being used:
SQL> alter system dump datafile <file#> block <block#>;
If it is indeed due to ITL shortage, then increase the INITRANS value of the object. Also increase the PCTFREE value of the objects.
ii) Unique key enforcement: if more than one session inserts the same value into a column with a unique or primary key constraint, the later inserts must wait. If the first session that inserted the value commits, the waiting session receives a unique constraint violation error; if the first session rolls back, the second session succeeds.
iii) Bitmap Index Entry: A bitmap entry covers a range of ROWIDs. When a bitmap entry is locked all the ROWIDs that correspond to the bitmap entry are locked. When multiple users attempt to delete or update different rows that have the same bitmap entry then a wait for TX in mode 4 will occur.
It is difficult to tell from the V$LOCK view alone whether the wait is due to unique key enforcement or a bitmap index entry. You have to capture the SQL statements that the holder and waiter issued: if the statements are inserts, the wait is due to unique key enforcement; if they are updates or deletes, the wait is due to the bitmap index entry.
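A sketch of the ITL fix in (i) (names and values are illustrative; existing blocks keep their old settings, so a rebuild is needed to apply them everywhere):
SQL> alter table hot_table initrans 10 pctfree 20;
SQL> alter table hot_table move;
SQL> alter index hot_table_pk rebuild;  -- indexes become unusable after a move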
The control file parallel write wait event occurs during operations that cause the control file to be updated, such as:
1. Log switches by the LGWR process.
2. Adding a datafile.
3. Removing a datafile.
4. Checkpoint information written by the CKPT process.
5. Archive log information written by the ARCH process.
To find which sessions cause writes to the control file, issue the following statement:
SQL> select a.sid,decode(a.type, 'BACKGROUND', 'BACKGROUND-' || substr
(a.program,instr(a.program,'(',1,1)), 'FOREGROUND') type, b.time_waited,
round(b.time_waited/b.total_waits,4) average_wait, round((sysdate - a.logon_time)*24) hours_connected
from v$session_event b, v$session a
where a.sid = b.sid
and b.event = 'control file parallel write'
order by type, time_waited;
The output of the above statement shows which background process is writing to the control file frequently. For example, if LGWR has the most time_waited, log switches are frequent. If foreground processes have high time_waited, many changes to the database are forcing the SCN in the control file to be updated.
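Log switch frequency itself can be checked from V$LOG_HISTORY, for example per hour over the last day:
SQL> select to_char(first_time, 'YYYY-MM-DD HH24') hour, count(*) switches
from v$log_history
where first_time > sysdate - 1
group by to_char(first_time, 'YYYY-MM-DD HH24')
order by 1;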
'controlfile sequential read' occurs while reading the control file (during backups, when sharing information from the control file between instances, and so on). The parameters in V$SESSION_WAIT are as follows:
P1 - The file# of control file from which the session is reading.
P2 – The block# from which the session starts reading.
P3 – The no. of blocks the session is trying to read.
'controlfile parallel write' occurs while writing to all the control files. The parameters in V$session_wait are as follows,
P1 – No. of control files being updated.
P2 – No. of blocks that are being updated.
P3 – No. of IO requests.
Tuning Options: Use Asynchronous IO if possible. Move the controlfile to a different disk or use faster disk.
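A sketch of relocating a control file (the paths are illustrative; the copy must be done while the instance is down):
SQL> select name from v$controlfile;  -- current locations
SQL> alter system set control_files = '/u01/ctl/control01.ctl', '/u05/ctl/control02.ctl' scope = spfile;
SQL> shutdown immediate
-- copy the control file to the new location at the OS level, then:
SQL> startup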
'db file parallel read' occurs during recovery. The data blocks that need to be changed are read from the various datafiles and placed into non-contiguous buffers; the server process waits until all of the blocks have been read into the buffer cache.
Tuning options - same as db file sequential read.
'db file parallel write' occurs when the database writer (DBWr) performs parallel writes to files and blocks. Check AVERAGE_WAIT in V$SYSTEM_EVENT; if it is greater than 10 milliseconds, it signals slow I/O throughput.
Tuning options - The main bottleneck for this wait event is the OS I/O subsystem, so use OS monitoring tools (sar -d, iostat) to check write performance. To improve the average wait time, consider the following: if the data files reside on raw devices, use asynchronous writes; if they reside on cooked file systems, use synchronous writes with direct I/O.
Note: If the average_wait time for db file parallel write is high then you may see that the system waits on free buffer waits event.
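The average wait can be checked as follows (TIME_WAITED is in centiseconds, so more than 1 per wait means more than 10 ms):
SQL> select event, total_waits, time_waited,
round(time_waited / total_waits, 2) avg_wait_cs
from v$system_event
where event = 'db file parallel write';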
'db file sequential read' occurs during single-block reads (reading index blocks, fetching a row by ROWID).
Tuning Options
1. Find the top SQL with high physical reads (AWR or Statspack).
Analyze the objects for better Execution plans.
Use a more selective index.
Rebuild indexes that are fragmented.
Use partitioning if possible.
2. Find the I/O Statistics
Check hot disks using V$filestat.
Move datafiles to avoid contention to a single disk.
3. Try to increase the Buffer Cache
In 9i use the buffer cache advisory, and in 10g use ASMM (Automatic Shared Memory Management), to determine the optimal size for the buffer cache.
Check for hot segments and place them in the KEEP pool.
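In 10g the top statements by physical reads can be listed straight from V$SQL, for example:
SQL> select * from (
select sql_id, disk_reads, executions,
substr(sql_text, 1, 60) sql_text
from v$sql
order by disk_reads desc )
where rownum <= 10;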
'db file scattered read' occurs during multiblock reads (full table scans, index fast full scans).
Tuning Options
1. Check for SQL that performs Full scans. Tune for optimal plans.
2. If the multiblock scans come from optimal plans, increase the init parameter DB_FILE_MULTIBLOCK_READ_COUNT (up to 9i); in 10g, leave the parameter unset so that Oracle tunes it automatically.
3. Use Partitions if possible.
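In 10g, statements whose cached plans contain a full table scan can be found through V$SQL_PLAN, for example:
SQL> select distinct sql_id, object_owner, object_name
from v$sql_plan
where operation = 'TABLE ACCESS'
and options = 'FULL'
and object_owner not in ('SYS', 'SYSTEM');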
When you find a session waiting on either sequential or scattered reads, it can be useful to identify which object is being accessed, for further tuning.
To find the object and the block number the session is accessing,
SQL> select sid, event, p1 file#, p2 block#, p3 "Blocks Fetched",
wait_time, seconds_in_wait, state
from v$session_wait
where sid in (select sid from v$session where osuser != 'oracle'
and status = 'ACTIVE');
From the above query get the file# and the block#.
To find the name of the file, issue the following query.
SQL> SELECT tablespace_name, file_name FROM dba_data_files
WHERE file_id = &File#;
To find the object, issue the following query.
SQL> SELECT owner , segment_name , segment_type, partition_name
FROM dba_extents
WHERE file_id = &File#
AND &Block# BETWEEN block_id AND block_id + blocks -1 ;
The 'log file parallel write' event is caused by the log writer (LGWR) process. LGWR writes the redo buffer to the online redo log files by issuing a series of write calls to the system I/O, then waits on log file parallel write for the writes to complete. A slow LGWR process can introduce log file sync waits, which users experience as wait time during commit or rollback. The log file parallel write and log file sync wait events are therefore interrelated and must be dealt with together.
If the average_wait time is high (above 10 milliseconds), it indicates that the system I/O throughput is slow. To improve the average_wait time, follow the same techniques as for the db file parallel write wait event.
Tuning options:
1. Avoid running hot backups during peak hours.
2. Check for high commit sessions and try to change the application logic to commit less frequently. Use the following queries to find high commit sessions,
SQL> select sid, value from v$sesstat
where statistic# = (select statistic# from v$statname where name = 'user commits')
order by 2 desc;
High redo wastage also indicates frequent commits:
SQL> select b.name, a.value, round(sysdate - c.startup_time) days_old
from v$sysstat a, v$statname b, v$instance c
where a.statistic# = b.statistic#
and b.name in ('redo wastage','redo size');