SQL2005: cpu and disk resource logging

Luke Breuer
2010-04-17 01:09 UTC

introduction
When users complain that a database server is operating slowly, there can be many causes. Here are a few that are specific to the DB server itself:
  1. CPU is pegged (by other users)
  2. disk I/O is pegged (by other users)
  3. there is locking contention
  4. the user is pulling lots of data over a slow connection
  5. the actual query is slow
Discerning which issue dominates is not always easy, which is why I created zs_Connection_Resource_Usage_Log. It helps identify problems 1, 2, and 5; zs_Bandwidth_Log helps with problem 4. I have not yet written any logging functionality that targets problem 3 in particular.
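For an ad-hoc check of problems 1 and 2 before any logging is in place, a query along these lines against sys.dm_exec_sessions (the same DMV the logging script below draws from) shows which sessions have accrued the most CPU and I/O so far; the top 20 and the ordering are arbitrary choices.
select  top 20
        s.session_id,
        s.login_name,
        s.cpu_time,        -- cumulative CPU time (ms) for the session
        s.reads,
        s.writes,
        s.logical_reads,
        s.last_request_start_time
from    sys.dm_exec_sessions s
where   s.is_user_process = 1
order by s.cpu_time desc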
caveats
The data inserted into zs_Connection_Resource_Usage_Log has two peculiarities:
  1. row_count can actually decrease for a given connection
  2. it stores continually-accruing values, such that a fancy self-join is needed to pull out incremental changes
#1 is probably due to the connection pooling issue mentioned below, but I have not yet looked into the details.

#2 is not strictly required: the "fancy self-join" is also done in the script that populates the table, so the incremental calculation could have been done there. I did not do this because:
  1. I was not sure whether these values might decrease.
  2. when the insert runs for the first time (or after a long delay), the first incremental values could be very large and would look out of place

Unlike the situation with zs_Bandwidth_Log, the decrease in #1 has actually occurred. I do not yet know why, but I am guessing it has something to do with connection pooling. Since #1 alone justifies storing the cumulative values as-is, #2 is moot.
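
To check how often the decrease in #1 actually shows up, a diagnostic query along these lines (the same row_number self-join pattern used further down) lists the rows where row_count went backwards for a connection:
with Data as (
    select  connection_id,
            input_date,
            row_count,
            row_number = row_number() over (partition by connection_id order by input_date)
    from    zs_Connection_Resource_Usage_Log
)
select  cur.connection_id,
        cur.input_date,
        previous_row_count = prev.row_count,
        current_row_count  = cur.row_count
from    Data cur
inner join Data prev on prev.connection_id = cur.connection_id and prev.row_number = cur.row_number - 1
where   cur.row_count < prev.row_count -- the cumulative value decreased
order by cur.input_date desc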
SQL code
table definition
drop table dbo.zs_Connection_Resource_Usage_Log 
create table dbo.zs_Connection_Resource_Usage_Log (
    connection_id  uniqueidentifier  not null,
    
    cpu_time       int               not null,
    memory_usage   int               not null,
    logical_reads  bigint            not null,
    reads          bigint            not null,
    writes         bigint            not null,
    row_count      bigint            not null,
    
    input_date     datetime          not null,
    
    primary key clustered (input_date, connection_id)
)
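
SQL 2005 has no "drop table if exists", so if the script needs to rerun cleanly even when the table does not exist yet, the drop at the top can be guarded with object_id (a minor variation, not part of the original script):
if object_id('dbo.zs_Connection_Resource_Usage_Log', 'U') is not null
    drop table dbo.zs_Connection_Resource_Usage_Log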
insert statement
(This can be put in a job that runs every minute.)
with Snapshot as (
    select  c.connection_id,
            s.cpu_time,
            s.memory_usage,
            s.logical_reads,
            s.reads,
            s.writes,
            s.row_count
    from    sys.dm_exec_connections c
    inner join sys.dm_exec_sessions s on s.session_id = c.session_id
), Existing as (
    select  connection_id,
            cpu_time,
            memory_usage,
            logical_reads,
            reads,
            writes,
            row_count,
            input_date,         
            row_number = row_number() over (partition by connection_id order by input_date desc)
    from    zs_Connection_Resource_Usage_Log
    where   input_date > getdate() - 7 -- with datediff here, we would not get an index seek
)
insert  zs_Connection_Resource_Usage_Log
select  s.*, getdate()
from    Snapshot s
left join Existing e on e.connection_id = s.connection_id and e.row_number = 1
where   e.connection_id is null
    or  e.cpu_time      < s.cpu_time
    or  e.logical_reads < s.logical_reads
    or  e.reads         < s.reads
    or  e.writes        < s.writes
-- the WHERE clause above is required because sometimes these values _do_ decrease;
-- exactly why is unclear, but it might have to do with connection pooling
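
As noted above, the insert can be scheduled to run every minute; one way is a SQL Agent job along these lines, where the job name and database name are placeholders and the insert is assumed to have been wrapped in a (hypothetical) stored procedure or pasted into @command directly.
use msdb
exec sp_add_job         @job_name = 'log connection resource usage'
exec sp_add_jobstep     @job_name = 'log connection resource usage',
                        @step_name = 'run insert',
                        @subsystem = 'TSQL',
                        @database_name = 'YourDatabase',    -- placeholder
                        @command = 'exec dbo.zs_Log_Connection_Resource_Usage' -- hypothetical wrapper proc
exec sp_add_jobschedule @job_name = 'log connection resource usage',
                        @name = 'every minute',
                        @freq_type = 4,             -- daily
                        @freq_interval = 1,
                        @freq_subday_type = 4,      -- subday unit: minutes
                        @freq_subday_interval = 1   -- every 1 minute
exec sp_add_jobserver   @job_name = 'log connection resource usage' -- targets the local server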
usage by minute
with Data as (
    select  input_date,
            cpu_time,
            logical_reads,
            reads,
            writes,
            row_count,
            row_number = row_number() over (partition by connection_id order by input_date),
            connection_id
    from    zs_Connection_Resource_Usage_Log 
), Grouped as (
    select  cur.input_date,
            cpu_time = sum(cur.cpu_time - isnull(prev.cpu_time,  0)),
            logical_reads = sum(cur.logical_reads - isnull(prev.logical_reads,  0)),
            reads = sum(cur.reads - isnull(prev.reads,  0)),
            writes = sum(cur.writes - isnull(prev.writes,  0)),
            row_count = sum(cur.row_count - isnull(prev.row_count,  0))
    from    Data cur
    left join Data prev on prev.connection_id = cur.connection_id and prev.row_number = cur.row_number - 1
    group by cur.input_date
)
select  top 600 *
from    Grouped
where   input_date > '20100410' -- 2010-04-10, in a format that is safe regardless of dateformat settings
order by input_date desc
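usage by connection
A per-connection rollup over a window is a natural companion to the per-minute view, showing which connections account for the most resources. The same incremental self-join (with the imprecision caused by caveat #1) applies; the date below is just an example window.
with Data as (
    select  connection_id,
            input_date,
            cpu_time,
            logical_reads,
            reads,
            writes,
            row_count,
            row_number = row_number() over (partition by connection_id order by input_date)
    from    zs_Connection_Resource_Usage_Log
)
select  cur.connection_id,
        cpu_time      = sum(cur.cpu_time      - isnull(prev.cpu_time,      0)),
        logical_reads = sum(cur.logical_reads - isnull(prev.logical_reads, 0)),
        reads         = sum(cur.reads         - isnull(prev.reads,         0)),
        writes        = sum(cur.writes        - isnull(prev.writes,        0)),
        row_count     = sum(cur.row_count     - isnull(prev.row_count,     0))
from    Data cur
left join Data prev on prev.connection_id = cur.connection_id and prev.row_number = cur.row_number - 1
where   cur.input_date > '20100410'
group by cur.connection_id
order by sum(cur.cpu_time - isnull(prev.cpu_time, 0)) desc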