June 04, 2009

What is a Bookmark Lookup in SQL Server?

A bookmark lookup is the process of finding the actual data in the SQL table, based on an entry found in a non-clustered index.
When you search for a value in a non-clustered index, and your query needs more fields than are part of the index leaf node (all the index fields, plus any possible INCLUDE columns), then SQL Server needs to go retrieve the actual data page(s) - that's what's called a bookmark lookup.
In some cases, that's really the only way to go. But if your query needs just one more field (not a whole bunch of 'em), it might be a good idea to INCLUDE that field in the non-clustered index. In that case, the leaf-level node of the non-clustered index would contain all fields needed to satisfy your query (a "covering" index), and thus a bookmark lookup wouldn't be necessary anymore.
In the execution plan, a bookmark lookup is a nested loops join between the non-clustered index and the table itself on a row pointer.
It happens for queries like this one, if you have an index on col2:

SELECT col1
FROM mytable
WHERE col2 BETWEEN 1 AND 10

The index on col2 contains pointers to the indexed rows.
So, in order to retrieve the value of col1, the engine needs to scan the index on col2 for the key values from 1 to 10 and, for each matching leaf entry, follow the pointer stored in that entry back to the table itself to find the value of col1.
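
As a minimal sketch of the covering-index fix for a query of this shape, assume a hypothetical table dbo.DemoTable with columns col1 and col2. The table name, index names, and column types below are illustrative assumptions, not part of the original example.

```sql
-- Hypothetical demo table; names and types are assumptions for illustration.
CREATE TABLE dbo.DemoTable
(
    id   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    col1 varchar(50) NOT NULL,
    col2 int NOT NULL
);
GO

-- Index on col2 only: if the optimizer uses it, reading col1 requires
-- a lookup into the table for every matching index entry.
CREATE NONCLUSTERED INDEX IX_DemoTable_col2
    ON dbo.DemoTable (col2);
GO

-- Covering alternative: col1 is stored at the leaf level of the index,
-- so the query below can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_DemoTable_col2_incl_col1
    ON dbo.DemoTable (col2)
    INCLUDE (col1);
GO

SELECT col1
FROM dbo.DemoTable
WHERE col2 BETWEEN 1 AND 10;
GO
```

Whether the optimizer actually chooses either index depends on row counts and statistics, so it is worth comparing the plans on realistic data.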
The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.
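
One way to see the operator for yourself is to request the text plan for a query that needs a column missing from the index. The sketch below assumes the AdventureWorks sample database with its default index on ProductID. The value 776 is an arbitrary ProductID chosen for illustration. Later versions of SQL Server generally display this step as a Key Lookup or RID Lookup rather than under the name Bookmark Lookup, and the optimizer may choose a scan instead if it estimates that to be cheaper.

```sql
USE AdventureWorks;
GO

-- SET SHOWPLAN_TEXT must be the only statement in its batch.
-- While it is ON, queries are not executed; only the plan is returned.
SET SHOWPLAN_TEXT ON;
GO

-- CarrierTrackingNumber is not part of the index on ProductID, so a plan
-- that seeks that index has to look the column up in the table.
SELECT ProductID, CarrierTrackingNumber
FROM Sales.SalesOrderDetail
WHERE ProductID = 776;   -- arbitrary value, for illustration only
GO

SET SHOWPLAN_TEXT OFF;
GO
```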

When a non-indexed column is required in the SELECT list, the actual table row must be accessed to retrieve the column. This is called a bookmark lookup, and it can be a very expensive operation on a large table. I have often seen bookmark lookups as the bottleneck in a poorly performing query. So, depending on the queries in the application, it could be beneficial to have quite a few columns in the index to avoid bookmark lookups.
Avoiding bookmark lookups and avoiding table scans are two reasons you may have for wanting extra columns as part of the index. On the other hand, indexes take up substantial disk space and must be updated whenever data changes are made to the table. Indexes must be finely tuned and not just arbitrarily created, or they will have a negative impact on performance.

To avoid Bookmark Lookup use Included Columns in SQL Server 2005
Key Column Size Limitations:
The key columns of an index may not exceed 16 columns or a key length of 900 bytes. Using more than 16 columns in an index seems excessive to me, but maybe there are some business or performance reasons to do this based on the application. The 900 byte limitation means that the sum of the maximum possible width of each of the columns in the index must not exceed 900 bytes. For example, an int column takes up 4 bytes. The number of bytes in a char or varchar column is equal to the number of characters the column can hold. The number of bytes in an nchar or nvarchar column is twice the number of characters.
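
To check how close existing indexes are to these limits, the declared widths can be summed from the catalog views. This is a rough sketch, assuming the AdventureWorks sample database. It counts only the columns listed explicitly as index keys, so it ignores any clustering key that SQL Server adds to a non-clustered index internally. Note also that SQL Server 2016 and later raise the limits for non-clustered indexes to 32 key columns and 1700 bytes, so the thresholds to compare against depend on your version.

```sql
USE AdventureWorks;
GO

-- Sum the declared maximum byte width of the key columns of each index.
-- sys.columns.max_length is reported in bytes, so nvarchar(50) shows as 100.
SELECT  i.name            AS index_name,
        COUNT(*)          AS key_column_count,
        SUM(c.max_length) AS max_key_bytes
FROM    sys.indexes AS i
JOIN    sys.index_columns AS ic
        ON ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
JOIN    sys.columns AS c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'Sales.SalesOrderDetail')
  AND   ic.key_ordinal > 0          -- key columns only, not included columns
GROUP BY i.name
ORDER BY max_key_bytes DESC;
GO
```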

By adding columns that cannot be added to the index key as Included Columns, you can overcome the index key size limitations. Included Columns are not counted in the 16 column or 900 byte limits. You can also add columns of the new large value data types, such as varchar(max), as Included Columns. The Included Columns are not stored at the intermediate levels of the index, only at the leaf level. Even though they are not stored at the intermediate levels, Included Columns still improved the performance of several queries I tested when the column was used in the WHERE clause along with the first column in the index. Bookmark lookups are avoided when every column the query needs is either a key column or an Included Column of the index.
A column of any data type except for text, ntext and image can be an Included Column. Computed columns can also be used. The column cannot be dropped while it is an Included Column.
USE AdventureWorks
GO
--Add a new large column to the table
ALTER TABLE Sales.SalesOrderDetail
ADD TestIndex char(500) NULL
GO

--Update the column with some data
UPDATE Sales.SalesOrderDetail
SET TestIndex = replicate(cast(SalesOrderID as varchar) +
cast(SalesOrderDetailID as varchar) +
coalesce(CarrierTrackingNumber,''),5)
GO

--Drop the index on the ProductID
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )
GO
exec sp_spaceused 'Sales.SalesOrderDetail'
--Recreate the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)
GO
--Before the new column is added to the index
exec sp_spaceused 'Sales.SalesOrderDetail'
GO

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
--Add the new column to the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID],
[TestIndex]
)
GO

exec sp_spaceused 'Sales.SalesOrderDetail'

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO

--Add the column as an included column
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)INCLUDE ( [TestIndex])
GO

exec sp_spaceused 'Sales.SalesOrderDetail'
GO
--Reverse all changes
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
ALTER TABLE Sales.SalesOrderDetail DROP COLUMN TestIndex
GO
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)
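
At any step of the script above, the catalog views can confirm whether a column ended up in the index key or as an Included Column. This is a small sketch that assumes the index name used in the script.

```sql
USE AdventureWorks;
GO

-- List the columns of the index, key columns first.
-- key_ordinal is 0 and is_included_column is 1 for Included Columns.
SELECT  i.name  AS index_name,
        c.name  AS column_name,
        ic.key_ordinal,
        ic.is_included_column
FROM    sys.indexes AS i
JOIN    sys.index_columns AS ic
        ON ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
JOIN    sys.columns AS c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'Sales.SalesOrderDetail')
  AND   i.name = N'IX_SalesOrderDetail_ProductID'
ORDER BY ic.is_included_column, ic.key_ordinal;
GO
```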

http://www.codersrevolution.com/index.cfm/2009/2/14/SQL-Server-Performance-How-Are-My-Indexes-Being-Using

http://www.sql-server-performance.com/articles/clustering/create_nonkey_columns_p1.aspx
