June 18, 2009

Varchar Vs NVarchar

Varchar stores single-byte characters, while nvarchar stores double-byte (Unicode) characters.
The obvious point is that the same text takes up twice as much space in the database when stored as nvarchar, because the Unicode character set uses 2 bytes per character. So unless you need international characters, it is better to use varchar. This is not just about database space: all data pulled back to the client is doubled up too, so 100 characters of varchar is about 100 bytes, but the same text as nvarchar is about 200 bytes, doubling the network load.
There is also a performance hit when SQL Server implicitly converts your data to nchar or nvarchar for you, which happens when you have enabled Unicode support in your data sources.
If you filter a char or varchar column in your database with a Unicode value held in an nchar or nvarchar variable, the execution plan uses a Clustered Index Scan, since every value in the table must be converted before it can be compared, i.e. CONVERT_IMPLICIT(nvarchar(50),[myvarchar],0)=[@myparam].
With a varchar variable, the same query uses a Clustered Index Seek.
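A minimal sketch of the difference; the table, column and index names below are assumptions for illustration only:

-- Hypothetical table with a varchar column and an index on it
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    LastName   varchar(50) NOT NULL
)
CREATE NONCLUSTERED INDEX IX_Customers_LastName ON dbo.Customers (LastName)
GO

-- nvarchar variable against a varchar column: SQL Server has to
-- CONVERT_IMPLICIT every row, so the plan tends to show a scan
DECLARE @nparam nvarchar(50)
SET @nparam = N'Smith'
SELECT CustomerID FROM dbo.Customers WHERE LastName = @nparam
GO

-- varchar variable matches the column type, so the index can be used for a seek
DECLARE @vparam varchar(50)
SET @vparam = 'Smith'
SELECT CustomerID FROM dbo.Customers WHERE LastName = @vparam
GO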

June 04, 2009

What is a Bookmark Lookup in Sql Server?

A bookmark lookup is the process of finding the actual data in the SQL table, based on an entry found in a non-clustered index.
When you search for a value in a non-clustered index, and your query needs more fields than are part of the index leaf node (all the index fields, plus any possible INCLUDE columns), then SQL Server needs to go retrieve the actual data page(s) - that's what's called a bookmark lookup.
In some cases, that's really the only way to go. But if your query requires just one more field (not a whole bunch of them), it might be a good idea to INCLUDE that field in the non-clustered index. In that case, the leaf-level node of the non-clustered index would contain all the fields needed to satisfy your query (a "covering" index), and a bookmark lookup would no longer be necessary.
It's a NESTED LOOP which joins a non-clustered index with the table itself on a row pointer.
It happens for queries like this, if you have an index on col2:

SELECT col1
FROM table
WHERE col2 BETWEEN 1 AND 10

The index on col2 contains pointers to the indexed rows.
So, in order to retrieve the value of col1, the engine needs to scan the index on col2 for the key values from 1 to 10 and, for each index leaf, refer to the table itself using the pointer contained in the leaf to find the value of col1.
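A minimal sketch of that scenario (table and index names are assumptions for illustration); the plan for the SELECT shows an Index Seek on IX_demo_col2 joined by a nested loop to a Key Lookup (the bookmark lookup), and the covering index at the end removes the lookup:

-- Hypothetical table with a non-clustered index on col2 only
CREATE TABLE dbo.demo
(
    id   int IDENTITY(1,1) PRIMARY KEY,  -- clustered key used as the "bookmark"
    col1 int NOT NULL,
    col2 int NOT NULL
)
CREATE NONCLUSTERED INDEX IX_demo_col2 ON dbo.demo (col2)
GO

-- col1 is not in the index, so each matching leaf entry triggers a lookup
SELECT col1
FROM dbo.demo
WHERE col2 BETWEEN 1 AND 10
GO

-- Covering the query with an INCLUDE column avoids the bookmark lookup
CREATE NONCLUSTERED INDEX IX_demo_col2_covering ON dbo.demo (col2) INCLUDE (col1)
GO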
The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

When a non-indexed column is required in the SELECT list, the actual table row must be accessed to retrieve the column. This is called a bookmark lookup, and it can be a very expensive operation on a large table. I have often seen bookmark lookups as the bottleneck on a poor performing query. So, depending on the queries in the application, it could be beneficial to have quite a few columns in the index to avoid bookmark lookups.
Avoiding bookmark lookups and table scans are two reasons you may have for wanting extra columns as part of the index. Another thing to consider is that indexes take up substantial disk space and must be updated whenever data changes are made to the table. Indexes must be finely tuned and not just arbitrarily created to avoid a negative impact on performance.

To avoid a Bookmark Lookup, use Included Columns in SQL Server 2005
Key Column Size Limitations:
The key columns of an index may not exceed 16 columns or a key length of 900 bytes. Using more than 16 columns in an index seems excessive to me, but maybe there are some business or performance reasons to do this based on the application. The 900 byte limitation means that the sum of the maximum possible width of each of the columns in the index must not exceed 900 bytes. For example, an int column takes up 4 bytes. The number of bytes in a char or varchar column is equal to the number of characters the column can hold. The number of bytes in an nchar or nvarchar column is twice the number of characters.

By adding columns that cannot be part of the index key as Included Columns, you can overcome the index key size limitations. Included columns are not counted in the 16-column or 900-byte limits. You can also add columns of the new large value data types, such as varchar(max), as an Included Column. Included Columns are not stored at the intermediate levels of the index, only at the leaf level. Even though they are not stored at the intermediate levels, Included Columns still improved the performance of several queries I tested when the column was used in the WHERE clause along with the first column in the index. Bookmark lookups are avoided when the Included Columns cover the SELECT list.
A column of any data type except text, ntext and image can be an Included Column. Computed columns can also be used. The column cannot be dropped while it is an Included Column.
USE AdventureWorks
GO
--Add a new large column to the table
ALTER TABLE Sales.SalesOrderDetail
ADD TestIndex char(500) NULL
GO

--Update the column with some data
UPDATE Sales.SalesOrderDetail
SET TestIndex = replicate(cast(SalesOrderID as varchar) +
cast(SalesOrderDetailID as varchar) +
coalesce(CarrierTrackingNumber,''),5)
GO

--Drop the index on the ProductID
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )
GO
exec sp_spaceused 'Sales.SalesOrderDetail'
--Recreate the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)
GO
--Before the new column is added to the index
exec sp_spaceused 'Sales.SalesOrderDetail'
GO

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
--Add the new column to the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID],
[TestIndex]
)
GO

exec sp_spaceused 'Sales.SalesOrderDetail'

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO

--Add the column as an included column
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
) INCLUDE ([TestIndex])


exec sp_spaceused 'Sales.SalesOrderDetail'

GO
--Reverse all changes
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
ALTER TABLE Sales.SalesOrderDetail DROP COLUMN TestIndex
GO
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)

http://www.codersrevolution.com/index.cfm/2009/2/14/SQL-Server-Performance-How-Are-My-Indexes-Being-Using

http://www.sql-server-performance.com/articles/clustering/create_nonkey_columns_p1.aspx

Microsoft SQL Server 2005: MIRRORING, CLUSTERING, OR REPLICATION

Microsoft SQL Server 2005 offers four solutions to mitigate the risk of hardware failure. In order of increasing data availability and minimized data loss, your choices are log shipping, replication, mirroring, and clustering.

Log shipping :
In log shipping, log backups from a primary database server are ‘shipped’ to a secondary database server at an alterable, pre-determined interval. As the logs arrive at the secondary server, they are ‘applied’ to the database. Before a failover can occur, all unrestored logs must first be manually applied to the secondary server.
The advantages of log shipping are that the same logs can be applied to multiple standby servers for multiple purposes, and that it is an integral component of any disaster recovery plan. The disadvantages are that failover cannot be automated, downtime can be lengthy, and there is a distinct possibility of data loss depending on the shipping interval.
Log shipping is supported in all editions of SQL Server 2005 except the ‘Express’ and ‘Compact’ editions.
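At its core, log shipping is an automated backup/copy/restore of the transaction log. A minimal manual sketch of the idea (database name and file paths are assumptions for illustration):

-- On the primary server: back up the transaction log
BACKUP LOG SalesDB
TO DISK = 'C:\LogShip\SalesDB_20090604.trn'

-- Copy the .trn file to the secondary server (file share, robocopy, etc.)

-- On the secondary server: apply the log, leaving the database either
-- restoring (NORECOVERY) or readable between restores (STANDBY)
RESTORE LOG SalesDB
FROM DISK = 'D:\LogShip\SalesDB_20090604.trn'
WITH STANDBY = 'D:\LogShip\SalesDB_undo.dat'

-- At failover time, after the last log has been applied, bring the secondary online
RESTORE DATABASE SalesDB WITH RECOVERY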
Replication :
Replication is an automated distribution of data to one or more receiving nodes. Microsoft refers to this relationship as a publish-subscribe model—a Publisher (the primary database) pushes its information to a secondary database server called a Subscriber.
Subscriber nodes are available for reporting, data warehousing, and backup purposes, and should be considered for any application where the strain of these tasks should not be performed by the primary online node.
For online applications that are very lookup-intensive (e.g., searches), connections can be sent to a farm of Subscriber nodes. In that same scenario, database writes (e.g., ecommerce purchases) can be sent to the Publisher for processing. This application architecture ensures proper scalability in the database layer, and is a very effective load-balancing mechanism.
SQL Server 2005 supports three types of replication: snapshot, transactional, and merge. Snapshot replication is a good choice when data changes infrequently or the dataset is small. Merge replication is appropriate when multiple environments work independently and on separate sets of data, which later need to be re-integrated (merged). Transactional replication is the best choice when data changes frequently and needs to be pushed to subscribers in near-real-time.
Replication is supported by all editions of SQL Server 2005, but ‘publishing’ is not available in the ‘Express’ or ‘Compact’ editions.

Mirroring :
Database mirroring is new in SQL Server 2005. Mirroring keeps an exact, real-time copy of the database on a second server. This setup requires three servers: a Principal, a Mirror, and a Witness. The Witness brokers the relationship between the Principal and the Mirror, and initiates a seamless automated failover when necessary.
The advantage of mirroring is that it is a cost-effective solution when only one or very few databases need to be maintained, and the entire dataset fits onto the servers’ disk drives. The drawback is that the Mirror cannot be accessed for reporting or backups (only database snapshots). Additionally, mirroring is a database-level technology, not a server-level one. This means that the Mirror server must separately be given all the proper logins, SQL Agent jobs, SQL Server Integration Services packages, and other supporting components and configurations. Lastly, mirroring is not appropriate for inter-dependent databases, as only one database might fail over, breaking the dependencies.
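A hedged sketch of the core T-SQL for establishing a mirroring session (server addresses, port, and database name are assumptions for illustration; the database must first be restored on the mirror WITH NORECOVERY):

-- On each participating server: create a database mirroring endpoint
CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL)
GO

-- On the mirror server: point at the principal
ALTER DATABASE SalesDB SET PARTNER = 'TCP://principal.corp.local:5022'

-- On the principal server: point at the mirror, then (optionally) at the witness
ALTER DATABASE SalesDB SET PARTNER = 'TCP://mirror.corp.local:5022'
ALTER DATABASE SalesDB SET WITNESS = 'TCP://witness.corp.local:5022'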

Failover Clustering:
Failover clustering is as much a function of the underlying operating system as it is SQL Server. In failover clustering for SQL Server 2005, the advantages and caveats flip-flop when compared to database mirroring. Failover clustering has higher hardware and resource requirements—such as shared external storage (a SAN)—but can accommodate a much higher volume of data and databases.

Windows Server 2003, with Microsoft Cluster Service (MSCS), supports Active/Passive and Active/Active cluster topologies. The “Standard” edition of SQL Server 2005 supports only two nodes in the cluster, whereas “Enterprise” supports up to eight active nodes. An Active/Passive cluster supports only one instance of a database. Should the Active node fail, the Passive node will take over the workload. Until then, the Passive node sits around idle with nothing to do, except for receiving transaction updates from the Active server. However, in Active/Passive clustering the financial benefit is that the Passive node does not require a separate database license, yet it can be used for backup and reporting purposes against the idle database.
Active/Active clustering supports multiple database instances, meaning that individual nodes in the cluster can be online and actively collecting and manipulating data, and also failover to another node in the cluster. In an Active/Active cluster all nodes need to be licensed.

· Replication is primarily suited to online-offline data synchronization scenarios (laptops, mobile devices, remote servers).
· Log shipping can be used to maintain a failover server with manual switching.
· Database Mirroring is an automatic failover technique at the database level.
· Failover Clustering provides automatic failover for an entire SQL Server instance.

Failover clustering is an availability technology that provides redundancy at the hardware level and is built on top of Windows Clustering technology, i.e. it is not specific to SQL Server.
For example, the processor blows up on Server A. Fortunately Server A is part of a SQL Server Cluster and so Server B takes over the job of providing the SQL Server Service, within a matter of seconds. All of this occurs automatically and is transparent to the database users and or application being served by the cluster.
The main difference between Database Mirroring and clustering is that SQL Clustering provides redundancy at the instance level whereas database mirroring provides redundancy at the database level.
Log shipping is considered more of a redundancy technology.
For example, it can be used to provide a full copy of your primary environment, typically used as a warm standby that can be manually brought online. This can be used to provide additional redundancy to your backup strategy. Log shipping can also be used to offload reporting from a primary server by creating a read only copy of the production database at an alternative location/server.
Replication is quite a diverse technology and can be used to cater for a number of different scenarios, the choice of which will determine the specific type of replication that is implemented.
For example, merge replication can be used to support distributed processing by spreading the workload of an application across several servers, i.e. distributed processing architectures.
Merge replication often requires an application that is relatively aware of its environment. Techniques such as conflict resolution also have to be taken into consideration in order to ensure data consistency across the entire integrated environment.
Transactional Replication can be used in a similar fashion to log shipping however you can limit the specific objects that are replicated to the subscriber. This can be useful if only a subset of tables is required for reporting purposes.

A failover cluster is a combination of one or more physical disks in a Microsoft Cluster Service (MSCS) cluster group, known as a resource group, hosted by the participating nodes of the cluster. The resource group is configured as a failover clustered instance that hosts an instance of SQL Server. A SQL Server failover clustered instance appears on the network as if it were a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable.
Failover clusters provide high-availability support for an entire Microsoft SQL Server instance, in contrast to database mirroring, which provides high-availability support for a single database. Database mirroring works between failover clusters and, also, between a failover cluster and a nonclustered host.

Typically, when mirroring is used with clustering, the principal server and mirror server both reside on clusters, with the principal server running on the failover clustered instance of one cluster and the mirror server running on the failover clustered instance of a different cluster. You can establish a mirroring session in which one partner resides on the failover clustered instance of a cluster and the other partner resides on a separate, unclustered computer, however.
If a cluster failover makes a principal server temporarily unavailable, client connections are disconnected from the database. After the cluster failover completes, clients can reconnect to the principal server on the same cluster, or on a different cluster or an unclustered computer, depending on the operating mode.
When deciding how to configure database mirroring in a clustered environment, the operating mode you use for mirroring is significant.
High-Safety mode Session with Automatic Failover
If you intend to mirror a database in high-safety mode with automatic failover, a two-cluster configuration is recommended for the partners. This configuration provides maximum availability. The witness can reside either on a third cluster or on an unclustered computer.
If the node running the current principal server fails, automatic failover of the database begins within a few seconds, while the cluster is still failing over to another node. The database mirroring session fails over to the mirror server on the other cluster or unclustered computer, and the former mirror server becomes the principal server. The new principal server rolls forward its copy of the database as quickly as possible and brings it online as the principal database. After the cluster failover completes, which typically takes several minutes, the failover clustered instance that was formerly the principal server becomes the mirror server.
As an illustration, consider an automatic failover between clusters in a mirroring session running in high-safety mode with a witness (which supports automatic failover).
The three server instances in the mirroring session reside on three distinct clusters: Cluster_A, Cluster_B, and Cluster_C. On each cluster, a default instance of SQL Server is running as a SQL Server failover clustered instance. When the mirroring session starts, the failover clustered instance on Cluster_A is the principal server, the failover clustered instance on Cluster_B is the mirror server, and the failover clustered instance on Cluster_C is the witness in the mirroring session. Eventually, the active node on Cluster_A fails, which causes the principal server to become unavailable.
Before the cluster has time to fail over, the loss of the principal server is detected by the mirror server, with the help of the witness. The mirror server rolls forward its database and brings it online as the new principal database as quickly as possible. When Cluster_A finishes failing over, the former principal server is now the mirror server, and it synchronizes its database with the current principal database on Cluster_B.
High-Safety Mode Session Without Automatic Failover
If you are mirroring a database in high-safety mode without automatic failover, another node in the cluster will act as the principal server if the node running the current principal server fails. Note that while the cluster is unavailable, the database is unavailable.
High-Performance Mode Session
If you intend to mirror a database in high-performance mode, consider placing the principal server on the failover clustered instance of a cluster and placing the mirror server on an unclustered server in a remote location. If the cluster fails over to a different node, the failover clustered instance will continue as the principal server in the mirroring session. If the entire cluster has problems, you can force service onto the mirror server (with possible data loss).
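A hedged sketch of how the operating mode is set and how service can be forced (the database name is an assumption for illustration):

-- High-safety (synchronous) mode; with a witness this enables automatic failover
ALTER DATABASE SalesDB SET PARTNER SAFETY FULL

-- High-performance (asynchronous) mode; no automatic failover
ALTER DATABASE SalesDB SET PARTNER SAFETY OFF

-- Manual failover, run on the principal in high-safety mode when the partners are synchronized
ALTER DATABASE SalesDB SET PARTNER FAILOVER

-- If the principal is lost in high-performance mode, force service on the mirror (possible data loss)
ALTER DATABASE SalesDB SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS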

June 03, 2009

SQL Server performance tips

A great set of tips for SQL Server performance:

1. Does your SQL statement have a WHERE clause?
I know this sounds obvious, but don't retrieve more data than you need. However, less obvious is that even if your SELECT statement retrieves the same quantity of data without a WHERE clause, it may run faster with one.

2. Is SELECT DISTINCT being used properly?
Again, pretty obvious, but using SELECT DISTINCT where no duplicate records are being returned is an unnecessary performance hit. If you are getting duplicate records, first double check your table joins as this is often the cause and only use the DISTINCT clause if you really need it.

3. Are you using UNION instead of UNION ALL?
A UNION statement effectively does a SELECT DISTINCT on the result set. If you know that all the records returned by your union are unique, use UNION ALL instead; it is much quicker.
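For example (the table names are assumptions for illustration):

-- UNION removes duplicates, which adds an extra sort/distinct step
SELECT CustomerID FROM Sales.OnlineOrders
UNION
SELECT CustomerID FROM Sales.StoreOrders

-- If the two result sets cannot overlap (or duplicates are acceptable),
-- UNION ALL skips the duplicate-removal work
SELECT CustomerID FROM Sales.OnlineOrders
UNION ALL
SELECT CustomerID FROM Sales.StoreOrders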

4. Are your stored procedures prefixed with 'sp_'?
Any stored procedure prefixed with 'sp_' is first searched for in the Master database rather than the database it is created in. This causes a delay in the stored procedure being executed.

5. Are all stored procedures referred to as dbo.sprocname?
When calling a stored procedure you should include the owner name in the call, i.e. use EXEC dbo.spMyStoredProc instead of EXEC spMyStoredProc. Prefixing the stored procedure with the owner when executing it will stop SQL Server from placing a COMPILE lock on the procedure while it determines if all objects referenced in the code have the same owners as the objects in the current cached procedure plan.

6. Are you using temporary tables when you don't need to?
Although there is sometimes a benefit to using temporary tables, generally they are best eliminated from your stored procedure. Don't assume that retrieving data multiple times is always less efficient than getting the data once and storing it in a temporary table, as often it isn't. Consider using a sub-query or derived table instead of a temporary table (see the examples below). If you are using a temporary table in lots of JOINs in your stored procedure and it contains loads of data, it might be beneficial to add an index to your temporary table, as this may also improve performance.
An example of a derived table instead of a temporary table

SELECT A.COLUMN1, D.COLUMN2, D.COUNTOFCOL3
FROM A_TABLE A
INNER JOIN (SELECT B.ID,
                   B.COLUMN2,
                   COUNT(C.COLUMN3) AS COUNTOFCOL3
            FROM B_TABLE B
            INNER JOIN C_TABLE C ON B.ID = C.ID
            GROUP BY B.ID, B.COLUMN2
           ) D ON A.ID = D.ID
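And a sketch of the other suggestion above: when a large temporary table really is needed and is joined repeatedly, indexing it can help (the table and column names are assumptions carried over from the example above):

-- Materialize the aggregate once into a temp table
SELECT B.ID, B.COLUMN2, COUNT(C.COLUMN3) AS COUNTOFCOL3
INTO #Totals
FROM B_TABLE B
INNER JOIN C_TABLE C ON B.ID = C.ID
GROUP BY B.ID, B.COLUMN2

-- Index the join column if the temp table is large and reused in several JOINs
CREATE CLUSTERED INDEX IX_Totals_ID ON #Totals (ID)

SELECT A.COLUMN1, T.COLUMN2, T.COUNTOFCOL3
FROM A_TABLE A
INNER JOIN #Totals T ON A.ID = T.ID

DROP TABLE #Totals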

7. Are you using Cursors when you don't need to?
Cursors of any kind slow down SQL Server's performance. While in some cases they are unavoidable, often there are ways to remove them from your code. Consider using any of these options instead of a cursor, as they are all faster (a set-based sketch follows the list):
Derived tables
Sub-queries
CASE statements
Multiple queries
Temporary tables
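For instance, a row-by-row cursor update can often be replaced by a single set-based statement (the table and column names are assumptions for illustration):

-- Cursor version: updates one row at a time
DECLARE @ID int
DECLARE order_cursor CURSOR FOR
    SELECT ID FROM A_TABLE WHERE STATUS = 'OPEN'
OPEN order_cursor
FETCH NEXT FROM order_cursor INTO @ID
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE A_TABLE SET STATUS = 'CLOSED' WHERE ID = @ID
    FETCH NEXT FROM order_cursor INTO @ID
END
CLOSE order_cursor
DEALLOCATE order_cursor

-- Set-based version: one statement, no cursor
UPDATE A_TABLE SET STATUS = 'CLOSED' WHERE STATUS = 'OPEN'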

8. Are your Transactions being kept as short as possible?
If you use SQL transactions, try to keep them as short as possible. This will help database performance by reducing the number and duration of locks. Remove anything that doesn't specifically need to be within the transaction, like setting variables, SELECT statements, etc.
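A small sketch of the idea (the table and values are assumptions for illustration): do the look-ups and variable set-up before BEGIN TRAN so locks are held only for the actual data change.

DECLARE @NewPrice money

-- Do the look-up outside the transaction
SELECT @NewPrice = ListPrice * 1.10
FROM dbo.ProductPriceList
WHERE ProductID = 707

-- Keep the transaction to the data modification only
BEGIN TRAN
    UPDATE dbo.ProductPriceList
    SET ListPrice = @NewPrice
    WHERE ProductID = 707
COMMIT TRAN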

9. Is SET NOCOUNT ON being used?
By default, every time a stored procedure is executed, a message is sent from the server to the client indicating the number of rows that were affected by the stored procedure. You can reduce network traffic between the server and the client if you don't need this feature by adding SET NOCOUNT ON at the beginning of your stored procedure.
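For example (the procedure and table names are assumptions for illustration):

CREATE PROCEDURE dbo.spGetOpenOrders
AS
BEGIN
    SET NOCOUNT ON   -- suppress the "(n row(s) affected)" messages

    SELECT ID, STATUS
    FROM A_TABLE
    WHERE STATUS = 'OPEN'
END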

10. Are you using IN or NOT IN when you should be using EXISTS or NOT EXISTS?
If you are using IN or NOT IN in a WHERE clause that contains a sub-query, you should re-write it to use either EXISTS, NOT EXISTS or a LEFT OUTER JOIN. This is because the NOT IN statement in particular offers really poor performance. The example below probably better explains what I mean. e.g. This SQL statement:
SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE A_TABLE.COLUMN2 NOT IN (SELECT A_TABLE2.COLUMN2 FROM A_TABLE2)
Could be re-written like this:
SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE NOT EXISTS (SELECT A_TABLE2.COLUMN2 FROM A_TABLE2
WHERE A_TABLE.COLUMN2 = A_TABLE2.COLUMN2
)

11. Do you have a function that acts directly on a column used in a WHERE clause?
If you apply a function to a column used in the WHERE clause of your SQL statement, it is unlikely that the SQL statement will be able to make use of any indexes applied to that column. e.g.
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE SUBSTRING(FIRSTNAME, 1, 1) = 'm'
Could be re-written:
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE FIRSTNAME LIKE 'm%'

Where you have a choice of using the IN or the BETWEEN clause, use the BETWEEN clause as it is much more efficient. e.g. This SQL statement:
SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER IN (100, 101, 102, 103)
Should be re-written like this:
SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER BETWEEN 100 AND 103

12. Are you doing excessive string concatenation in your stored procedure?
Where possible, avoid doing loads of string concatenation as it is not a fast process in SQL Server.

13. Have you checked the order of WHERE clauses when using AND?
If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written (assuming that no parentheses have been used to change the order of evaluation). You may want to consider one of the following when using AND:
Locate the AND expression that is least likely to be true first. This way, if that expression is false, the clause ends immediately, saving time.
If both parts of an AND expression are equally likely to be false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

14. Have you checked that you are using the most efficient operators?
Often you don't have much of a choice of which operator you use in your SQL statement. However, sometimes there is an alternative way to re-write your SQL statement to use a more efficient operator. Below is a list of operators in their order of performance (with the most efficient first).
=
>, >=, <, <=
LIKE
<>
