Applications getting Lock Timeout errors
SQL0911N The current transaction has been rolled back because of a deadlock or timeout
1.1
Error message
SQL0911N The current
transaction has been rolled back because of a deadlock or timeout. Reason code 68.
Reason code 68 occurs when transaction rolled back due to lock timeout.
To see this explanation use: db2 ? SQL0911N
1.2 Possible causes
Usually, this problem happens when an application hold locks in an object for too long. Another possibility is that value of LOCKTIMEOUT database configuration parameter is too low.
1.3 Investigating
Check if the monitor switches for LOCK, STATEMENT, TIMESTAMP and UOW are ON:
db2 get monitor switches
If any of these are OFF turn them ON using:
db2 update monitor switches using monitor ON
Step 1 : take an snapshot for locks
db2 get snapshot for locks \
on dbname | grep -E "Object Type|Table Schema|Table Name|Mode"
On the output, start looking for exclusive locks (X) on tables, for example:
Output:
Table Schema = SIVIEW
Table Name = FHCDATAHS_DATA
Mode = X
Save a snapshot for locks for the database:
db2 get snapshot for locks on dbname > slocks.out
Check with the user if he/she is trying to use any of the tables with X locks that you see on the report. If he/she says yes – investigate the application that is using that table. If the user doesn’t know which table is in use, search for the tables that have X locks on the report you saved before (slocks.out) and make notes of Application handle for that tables. For example, for table FHCDATAHS_DATA we have on slocks.out:
Output:
Application handle = 1104
Application ID = GA0A0221.A84E.050654183150
Sequence number = 0005
Application name = hswtdgmg
CONNECT Authorization ID = SIVIEW/FHCDATAHS_DATA
Object Type = Internal Variation Lock
Mode = S
Lock Name = 0x001A0002000000000000000054
Lock Attributes = 0x00000000
Release Flags = 0x00000004
Lock Count = 1
Hold Count = 0
Lock Object Name = 2
Object Type = Table
Tablespace Name = TBSFHCDATAHS_DATA
Table Schema = SIVIEW
Table Name = FHCDATAHS_DATA
Mode = X
You must investigate the application that is using the same table that the user is trying to use or the application that is holding locks for more time (check the output of db2 list application show detail). For this example, let’s consider that the culprit application is 1104.
Step 2: Check if the application holding locks is hanging or taking too long
db2 get snapshot for application agentid 1104 | grep -i idle
Output:
Application idle time = 40 minutes and 10 seconds
If you
enabled only monitor switches TIMESTAMP or UOW, you will not get the real Application
idle time.
db2 get snapshot for application agentid 1104 | grep -i row
Output:
Internal rows deleted = 0
Internal rows inserted = 0
Internal rows updated = 0
Rows deleted = 0
Rows inserted = 0
Rows updated = 0
Rows selected = 2119
Rows read = 5684397
Rows written = 25
Rows read = 5684396
Rows written = 25
Rows deleted = 0
Rows updated = 0
Rows inserted = 0
Rows fetched = 2119
Repeat this last snapshot a few times and check if the numbers are changing or not. If they are changing the application is not hanging, it can be progressing slowly.
If at this point you confirmed that the application is hanging or progressing slowly, save some snapshots that will be useful if the customer asks for an RCA (Root Cause Analysis) for the problem.
db2 get snapshot for application agentid 1104 > sapp1104.out
db2 get snapshot for locks for application agentid 1104 > slocks1104.out
If the application is hanging or slow go to Step 3. If not, go to Step 4.
1.4 Resolution
Step 3: You confirmed that the application is hanging or progressing slowly.
Check with user if the application can be forced down. If he/she agrees, force the
application. BE VERY CAREFULL with FORCE APPLICATION command
db2 “force application agentid (1104)”
This command will disconnect the application from the database. If the application is in Executing state, a rollback will be done, which may take some time.
Step 4: Applications holding locks will complete in a few minutes. On this case, the problem might be that the database configuration parameter LOCKTIMEOUT is set too low. Check the value for it using:
db2 get db cfg for <dbname> | grep LOCKTIMEOUT
Discuss with the application team how long the transactions usually take to complete for your application. Check for the recommendations for this parameter on the Information Center. If the decision is to change this parameter (the change should be done in a change window) use:
db2 update db cfg for <dbname> using LOCKTIMEOUT <value>
The new value for the parameter will take effect after the database is re-activated.