June 24, 2010

Generally we backed up tables and SPs for safer side with date suffix.
Below is cleaup script for those objects which might not be used after successfull deployment.

-- To drop all tables suffix with dates ( backup unused tables)

SELECT 'DROP TABLE '+s.name+'.'+o.name
FROM sys.schemas s
JOIN sys.objects o on s.schema_id=o.schema_id
AND o.type='U'
WHERE SUBSTRING(o.name,PATINDEX('%[^' + 'a-z._' + ']%',o.name),1) !=''

-- To drop all procedures suffix with dates

SELECT 'drop procedure '+s.name+'.'+o.name
FROM sys.schemas s
JOIN sys.objects o on s.schema_id=o.schema_id
AND o.type='P'
WHERE SUBSTRING(o.name,PATINDEX('%[^' + 'a-z._' + ']%',o.name),1) !=''

June 17, 2010

The difference between estimated and actual plans

Optimization takes place before execution, so in one sense, any query plan is an estimated plan. But when you request an actual plan, SQL Server will actually execute each statement as it displays the plan, so the plan for subsequent statements can change after the earlier statements are executed. In addition, SQL Server adds additional information to the actual plan, after the execution takes place. As mentioned last time, the actual plan includes the number of processors used to execute the query. It will also include the number of rows returned from each step of the plan as it is executed, and the number of times each step is executed. If you are executing a plan that has already been cached and is now being reused, the actual plan will include both the parameter values the plan was originally compiled with and the parameter values used during execution.

i) You cannot use estimated plans for temp tables, you will get an error because the plan for the SELECT cannot be generated when the temp table does not exist

ii) One of the main reasons that the actual plan may different from the estimated plan is because of data changes to your data. In general (and assuming you have the option 'auto update statistics' enabled) if more than 20% of the data in a table changes, the optimizer will detect stale statistics and update them automatically. The updated statistics will then trigger a recompile.

iii) parameterised stored procedure - The first time you run the procedure, a estimated plan will be built based on the first parameter. The second time you run the procedure, the original plan will be used, and the statistics should show that SQL Server is performing more reads than there are pages in the table.
Seeing the different values used for compilation and execution can lead you to suspect an issue with parameter sniffing.

You can request that SQL Server display ESTIMATED plans without executing the query with any of the following options:

SET SHOWPLAN_TEXT ON

SET SHOWPLAN_ALL ON

SET SHOWPLAN_XML ON

For graphical estimated plans, you can use the Query | Display Estimated Execution Plan menu option. This can be invoked with a toolbar button, or with Cntl-L.

You can request that SQL Server display actual plans with any of the following options:

SET STATISTICS PROFILE ON

SET STATISTICS XML ON

June 16, 2010

Indexes in one-to-many relationship between two tables

Whenever you have a one-to-many relationship between two tables, you'll have a foreign key reference in the table on the many side of the relationship.

In most database systems, the primary key is automatically indexed; that's not always the case with the foreign key.

To ensure best performance on joins, make sure that every foreign key in the database is indexed. The next candidate for indexing is any column that will be used for sorting — any column that will consistently be used in an ―order by‖ clause of a SQL query.

You'll also want to index columns that will be used for restricting the returned data set, such as those that consistently appear in ―where clauses.

June 08, 2010

SQL Server 2008 HIERARCHYID Data Type

Very often while a designing database you may need to create a table structure to store hierarchical data (Hierarchal data is defined as a set of data items related to one another in a hierarchy, that is, a parent node has a child and so forth). For example,
• To represent organizational hierarchy
• To represent a family tree
• To store BOM (Bills of Material) information which represents collection of sub-components which make up a final component.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place.

Until SQL Server 2005, we had to create a self-referencing table to store hierarchical information, which gradually became more difficult and tedious to work with when hierarchical level grew to more than 2 or 3levels. The more levels, the more difficult to query as it required lot of joins, complex recursive logic. Though with the introduction of CTEs (Common Table Expression) in SQL Server 2005, we were not required to write iterative logic anymore to get a node (i.e. a Manager) and all its child nodes (i.e. all subordinates), but the performance of the CTE method is not that good.




Now the question is given a manager (ReportsTo) ID, get all employees who are directly or indirectly reporting to him, let's see how we can do this in different versions of SQL Server.

--SQL Server 2005
--Get all employees who directly/indirectly report to EmpID = 2
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 2
WHERE emp.EmpID = 2
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy

--Get all employees who directly/indirectly report to EmpID = 4
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 4
WHERE emp.EmpID = 4
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy

Even though SQL Server 2005, using CTE, simplified the process of retrieving hierarchical data from the SQL Server but still it has performance penalty. Not only this, the major drawback was still existent, I mean it was a nightmare to move child/sub-child records from one parent to another because of some organizational restructuring/realignment. Luckily with HIERARCHYID in SQL Server 2008 we would not be having those problems anymore. So now let's turn our attention to learn about this new system data type in details.

The new HIERARCHYID data type has been introduced to SQL Server 2008.
The following are the methods in the SQL Server 2008 Database Engine to support HierarchyID data type.

GetAncestor()

This method is useful to find the (nth ancestor of the given child node.

Syntax: child.GetAncestor ( n )

GetDescendant()

This method is very useful to get the descendant of a given node. It has a great significance in terms of finding the new descendant position get the descendants etc.

Syntax: parent.GetDescendant ( child1 , child2 )

This function returns one child node that is a descendant of the parent.

1.If parent is NULL, returns NULL.
2.If parent is not NULL, and both child1 and child2 are NULL, returns a child of parent.
3.If parent and child1 are not NULL, and child2 is NULL, returns a child of parent greater than child1.
4.If parent and child2 are not NULL and child1 is NULL, returns a child of parent less than child2.
5.If parent, child1, and child2 are all not NULL, returns a child of parent greater than child1 and less than child2.
6.If child1 or child2 is not NULL but is not a child of parent, an exception is raised.
7.If child1 >= child2, an exception is raised.

GetLevel()

This method is useful to find the Level of the current node.

Syntax: node.GetLevel ( )

This function will return an integer that represents the depth of this node in the current tree.

GetRoot()

This method will return the root of the hierarchy tree and this is a static method if you are using it within CLR.

Syntax: hierarchyid::GetRoot ( )

It will return the data type hierarchyID.

IsDescendant()

This method returns true/false (BIT) if the node is a descendant of the parent.

Syntax: parent.IsDescendant ( child )

Parse()

Parse converts the canonical string representation of a hierarchyid to a hierarchyid value. Parse is called implicitly when a conversion from a string type to hierarchyid occurs. Acts as the opposite of ToString(). Parse() is a static method.

Syntax: hierarchyid::Parse ( input )

Read()

Read reads binary representation of SqlHierarchyId from the passed-in BinaryReader and sets the SqlHierarchyId object to that value. Read cannot be called by using Transact-SQL. Use CAST or CONVERT instead.

Syntax: void Read( BinaryReader r )

Reparent()

This is a very useful method which helps you to reparent a node i.e. suppose if we want to align an existing node to a new parent or any other existing parent then this method is very useful.



Syntax: node.Reparent ( oldRoot, newRoot )

ToString()



This method is useful to get the string representation of the HierarchyID. The method returns a string that is a nvarchar(4000) data type.

Syntax: node.ToString ( )

Write()

Write writes out a binary representation of SqlHierarchyId to the passed-in BinaryWriter. Write cannot be called by using Transact-SQL. Use CAST or CONVERT instead.

Syntax: void Write( BinaryWriter w )


--First I will create a table which uses new data type
CREATE TABLE Employees_Ver2008
(
EmpID INT PRIMARY KEY NOT NULL IDENTITY,
EmpName VARCHAR(255) NOT NULL,
Title VARCHAR(255) NOT NULL,
HierarchicalLevel HIERARCHYID --Note the new data type here
)
GO
--Insert the Root node first in the table
--HIERARCHYID::GetRoot() is static method which returns the root node of a hierarchy
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('CTO', 'Chief Technology Officer', HIERARCHYID::GetRoot())
GO
--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position]
FROM Employees_Ver2008
GO


--Insert the first child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Dev1', 'Managar -Dev', @CTONode.GetDescendant(NULL, NULL))
GO

--Now let's insert the second child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @CTONode
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Test1', 'Manager- Test', @CTONode.GetDescendant(@LastChildPosition, NULL))
GO

--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position] FROM Employees_Ver2008
GO

CREATE PROCEDURE AddEmployee_Ver2008
(@ReportsToID INT, @EmpName VARCHAR(255), @Title VARCHAR(255))
AS
BEGIN
-- Get the root node we wish to insert a descendant of
DECLARE @ReportsToNode HIERARCHYID
SELECT @ReportsToNode = HierarchicalLevel FROM Employees_Ver2008
WHERE EmpID = @ReportsToID
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @ReportsToNode

INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES (@EmpName, @Title, @ReportsToNode.GetDescendant(@LastChildPosition, NULL))
END
GO
--Let's add remaining nodes of tree using created stored procedure
EXECUTE AddEmployee_Ver2008 2, 'Lead_D1', 'Team Lead- Dev'
and so on...


--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employees_Ver2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employees_Ver2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1




Now let's turn our attention from inserting records to retrieve it. The problem statement is:

Get all employees who directly/indirectly report to a particular Employee.
Get all managers of a given employee in all levels in the management chain until the root node.
--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1

--Get all managers of a given employee(EmpID=4) in all levels in
--the management chain until the root node
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE @ReportsToID.IsDescendantOf(HierarchicalLevel) = 1


Now let's see the last example of HIERARCHYID here. The requirement is Lead_D1 has left the job so all his direct reports now will report to new Lead_D3. To accomplish this task you need to write your script which will look something like this.

--Adding new Dev Lead
EXECUTE AddEmployees_Ver20082, 'Lead_D1','DEV Leader'
--Get the HierarchyID of New DEV Lead
DECLARE @NewDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = @@IDENTITY)
--Get the HierarchyID of old DEV Lead
DECLARE @OldDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = 4)
UPDATE Employee_2008
SET HierarchicalLevel = HierarchicalLevel.GetReparentedValue(@OldDevLead, @NewDevLead)
WHERE HierarchicalLevel.IsDescendantOf(@OldDevLead)= 1
--Parent is considered its own descendant, so you need to exclude it
AND HierarchicalLevel <> @OldDevLead


HIERARCHYID data type does not ensure uniqueness; I mean there's nothing stopping you from inserting a new row into the Employee table with the same position within the hierarchy as the other nodes unless you define unique constraint on the column.

The HIERARCHYID column can be indexed in two ways

Depth First Strategy: By default when you create an index on HIERARCHYID data type column, it uses Depth First Strategy. In a depth-first index strategy, rows in a sub tree are stored near each other. For example, all employees that report through a manager are stored near their managers' record.

Breadth First Strategy: A breadth-first strategy stores the rows at each level of the hierarchy together. For example, the records of employees who directly report to the same manager are stored near each other. Also to create a bread first index, the system needs to know the level of each record in table; for that purpose you can use GetLevel function of HIERARCHYID.

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...