Data Engineering with Avishkar: 4 W's of Production DBA

July 13, 2010

4 W's of Production DBA

Why My server is hung?
A SQL Report outage. Why did this happen?
Who is responsible for outage?
What steps you will do to resolve it ASAP?
What steps you need to do so that it should not happen again?
When issue will resolved?

Most of the production DBAs daily face one of these 4 W's Why,What,Who,When.

Below is a simple guide to tackle these 4 W's :

1. Identify Factors Affecting Performance

Application
Database Design
Microsoft SQL Server
Operating System
Hardware

Now you got answer of Who

2. Resolve Performance bottleneck:

SQL Coding Issues:
Excessive Scanning
Poor use of indexes/missing Indexes
Out of Date Table Statistics
‘Where’ clause not limiting record set enough
Excessive Recompiles
Long blocking durations (multi-connections)

Where:
Limit number of Columns
Avoid <>, OR, and not
Indexed fields
Avoid calculated columns
Avoid functions
Avoid redundant criteria

Joins
Avoid Join Hints
Always Join on indexed columns
Use selectivity from table with fewest rows
Avoid Bookmark lookups

Nested Loop Join
Common When One Table is VerySmall (Candidate outer table) and Other is Large and Indexed on Join Columns (Candidate inner table)

Merge Join
Method for Joins with Sorted Inputs, Usually Inputs are Sequenced on Join Key
Common When Both Tables are Indexed on join key ,Common When Join Sequence Needs to be Preserved on Output.

Hash Join
Join of Disproportionately Sized Tables
Does Not Require Tables to beProcessed in Join Column Sequence
Does Not Produce Ordered Result
Highly Applicable to Ad-hoc Queries
Indexes Can’t be Anticipated

Hardware Issues:

Typical CPU scaling issues
Symptoms:

Plan compilation and recompilations
Plan reuse < 90% is bad
Parallel queries
Parallel wait type cxpacket > 10% of total waits
High runnable tasks or sos_scheduler_yield waits

Causes:

Queries not parameterized
Inefficient Query plan
Not enough stored procedures
MAXDOP is not set to 1
Statistics not updated
Table scan, range scan
SET option changes within SP

Typical IO Scaling Issues:

Symptoms:
High average disk seconds per read (> 10 msec) and write (> 2 msec)

for spindle based devices
Top 2 values for wait stats are one of - ASYNCH_IO_COMPLETION,

IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x

Causes:

Aggravated by Big IOs such as table scans (bad query plans)
Non covering indexes
Sharing of storage backend – combine OLTP and DW workloads
TempDB bottleneck
Too few spindles, HBA’s

Typical Blocking Issues

Symptoms:

High average row lock or latch waits
Will show up in
sp_configure “blocked process threshold” and Profiler “Blocked process

Report”
Top wait statistics are LCK_x. See sys.dm_os_wait_stats.

Causes:
Higher isolation levels
Index contention
Lock escalation
Slow I/O
Sequence number problem

Typical Memory Issues
Symptoms:

Page life expectancy < 300 secs
SQL Cache hit ratio < 99%
Lazy writes/sec constantly active
Out of memory errors

Causes:

Too many large scans (I/O)
Bad query plans
External (other process) pressure

1 comment:

Anonymous said...: I would like to exchange links with your site www.blogger.com
Is this possible?; August 2, 2010 at 4:13 AM

Data Engineering with Avishkar

July 13, 2010

4 W's of Production DBA

1 comment:

Creating DataFrames from CSV in Apache Spark

Search This Blog