November 17, 2008

Performance tuning : Beginner

The slow performance of stored procedures might have a lot of causes, like hardware, server memory, OS, network etc.
Primarily four parameters to monitor.
CPU : The amount of CPU time in milliseconds.
READS : A scalar counter, which shows number of logical reads performed by the server.
WRITES : Number of physical disk writes performed by the server.
DURATION : Amount of time taken to execute the batch or SP.
there are no specific rules to tune SQL statements. The ultimate goal is to reduce time and system resources.

Open the Management Studio and the stored procedure that needed to tune. Before running Profiler, execute the two commands below.
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
These two commands clears the machine cache and cleaned the buffers. This allows to get the Profiler readings from scratch.

We should take readings from SET STATISTICS IO ON and SET STATISTICS TIME ON in Management Studio.

  • Performance tuning few things:
    i) We should be careful about creating temporary tables, dynamic SQL and joining. Those things are the gold mine for the tuners as eliminating them can make your procedure much faster.
    ii) Be very careful about the functionality of the procedure as your procedure should return the same results after tuning as before.
  • iii) It is not enough that SSMS is returning the "Command(s) completed successfully." message. We should check all the tables used in the procedures. Whether they have the same number of rows and whether the rows are contain the same data. These are the kinds of checks we should perform after tuning.
    iv) We always want to see the execution plan for each DML statement. The execution plan is the road layout, based on which the optimizer will decide which path will take less time and resources.
    v) Try and avoid table scans. If the query optimizer chooses a table scan, it will scan the whole table, which is far more time consuming than index scan. If possible, create indexes where table scan is used (though based on the size of the table, optimizer sometimes decide to go for a table scan though there are indexes present for the column, but in this case table should contains a small number of records).
    vi) Always try to write all the SQL statements in ANSI format. Generally, ANSI syntax not reduces the logical Reads, but it is more helpful to understand.

November 13, 2008

SQL Server tough Interview questions

1. What is the output of the following query? SELECT 1 UNION SELECT NULL UNION SELECT '1'

2. What do you expect the result of the following query to be? No cheating, don't run until you've answered!WITH DATA (Numbers) AS(SELECT NULL UNION ALL SELECT NULL UNION ALLSELECT NULL UNION ALLSELECT 1 UNION ALL SELECT 2 UNION SELECT 3)SELECT COUNT(ALL Numbers) AS NULLNumberCount FROM DATA WHERE Numbers IS NULL

3. You are trying to run some scripts using SQLCMD but keep getting errors that your password has expired. SSMS is not installed on this machine. What should you do?

4. What does sp_validname do?

5. You receive some data from UI and need to populate table Customer if the name received from UI does not exist in database. Which of options below should be used for this purpose:1.IF NOT EXISTS ( SELECT * FROM Customer WHERE Name = @CustomerName ) INSERT INTO [Customer] ( [Name]) VALUES ( @CustomerName)2.INSERT INTO [Customer] ( [Name])SELECT @CustomerNameWHERE NOT EXISTS ( SELECT * FROM Customer WHERE Name = @CustomerName )

6. What will be the count against each category? Each row in the result is shown as a pair in the answers.CREATE TABLE #CATEGORY(CATID INT ,VAL1 INT)INSERT INTO #CATEGORY VALUES(1,NULL)INSERT INTO #CATEGORY VALUES(2,1)INSERT INTO #CATEGORY VALUES(3,2)CREATE TABLE #DATA(VAL1 INT)INSERT INTO #DATA VALUES(1)INSERT INTO #DATA VALUES(1)INSERT INTO #DATA VALUES(1)INSERT INTO #DATA VALUES(2)INSERT INTO #DATA VALUES(2)INSERT INTO #DATA VALUES(3)SELECT C.CATID, COUNT(*) FROM #DATA D INNER JOIN #CATEGORY C ON C.VAL1 = D.VAL1 OR C.VAL1 IS NULLGROUP BY C.CATID

7. Cursor optimization we all know that cursors are slow on SQL Server and should be avoided when possible. However, there are a few cases where they can't be avoided, for instance because an external process has to be invoked for each row, or because you run into one of the very rare problems where a cursor actually runs faster than a set-based solution.So, suppose you find yourself in a situation where you do need a cursor - which cursor option must be specified in order to assure maximum performance, assuming the cursor fits in cache?

8. You are developing security policy for your SQL Servers and have all of the data entry clerks needing access to a series of tables. You create a Data Entry role and assign the proper permissions as well as add the users.
You then find out that Bob is a part of the HR group because of cross training and needs to use the same objects from the same application except for the Vacation table, to which he should not have access. No column permissions are assigned. What should you do?

9. What is the difference between a WHERE clause and a HAVING clause?

10. A developer is building an application against SQL Server 2000 and comes to see you regarding his design. This portion of the application is providing reporting on a series of clients that have been selected. His design is as follows:
Display a list of clients that can be selected
Loop through the list of clients and call a stored procedure to get data for each client (same data for each
Asemble this information and display a grid to the user To improve performance, what suggestions would you give the developer?

11. I want to append output data from a query into an existing file, which command prompt tool will help me?

12. If I create a database in simple recovery mode in SQL Server 2005, then backup the database, then add 1000 records, then change the recovery mode to full, then add 500 more. Can I recover using the log backups?

13. In SQL Server 2008, what is the maximum number of table sources that you can have in a single SELECT query?

14. What does the CURRENT_TIMESTAMP function return?

November 03, 2008

SQL Server Memory

Buffer Pool (BPool) Consists of several fragmented regions (up to 32) of address space used by SQL Server. This area is dynamic (the size can change) and is used by SQL Server for many internal processes, such as storing compiled and execution plans, creating indexes, and allocating for cursors.
MemToLeave Consists of a single contiguous region of addresses within the SQL Server process space that is set aside at startup. This area is static (the size cannot change) and is left unallocated by SQL Server for the use by components in the SQL Server process space, such as extended procedures, COM/OLE Automation objects, and linked servers. Memory allocations by SQL Server larger than 8 KB are also made from the MemToLeave area.
Reserved A region of address space that is set aside for future use of a process, and the OS has not used any physical RAM for this allocation.
Committed A region of address space that is currently being used by a process, and the OS has provided either physical RAM and/or paging file space for this allocation.
Allocation The act of giving the memory resource to the consumer.

SQL Server grows and shrinks its memory usage dynamically by committing and de-committing buffers from the buffers reserved at startup. The LazyWriter process is responsible for growing and shrinking the BPool. A committed bitmap array is maintained to track the commit or de-commit of buffers.
Growing the BPool :
When the LazyWriter wakes up, it checks if SQL Server needs more memory. If so, then the LazyWriter checks the committed bitmap to find a buffer with the bit set to off (0 or not committed) and locates the buffer. The buffer is then committed, and finally the bit in the committed bitmap is flipped to on (1 or committed).
Shrinking the BPool:
When the LazyWriter wakes up, it checks if the OS has sufficient memory available to service the memory requests from other applications. If not, then the LazyWriter looks for buffers that can be de-committed. If it finds such a buffer, it is de-committed and the 8 KB page is returned to SQL Server's reserved address space, which can be used in the OS.

DBCC MEMORYSTATUS
The DBCC MEMORYSTATUS command provides a snapshot of the current memory status of SQL Server. The output from this command is useful in troubleshooting issues that relate to the memory consumption of SQL Server or to specific out-of-memory errors

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...