November 09, 2011

IDENT_CURRENT,@@IDENTITY and SCOPE_IDENTITY

IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope.

@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes.

SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope.

An INSERT trigger is defined on T1. When a row is inserted to T1, the trigger fires and inserts a row in T2. This scenario illustrates two scopes: the insert on T1, and the insert on T2 by the trigger.

Assuming that both T1 and T2 have identity columns, @@IDENTITY and SCOPE_IDENTITY will return different values at the end of an INSERT statement on T1. @@IDENTITY will return the last identity column value inserted across any scope in the current session. This is the value inserted in T2. SCOPE_IDENTITY() will return the IDENTITY value inserted in T1. This was the last insert that occurred in the same scope. The SCOPE_IDENTITY() function will return the null value if the function is invoked before any INSERT statements into an identity column occur in the scope.

Failed statements and transactions can change the current identity for a table and create gaps in the identity column values. The identity value is never rolled back even though the transaction that tried to insert the value into the table is not committed. For example, if an INSERT statement fails because of an IGNORE_DUP_KEY violation, the current identity value for the table is still incremented.

Be cautious about using IDENT_CURRENT to predict the next generated identity value. The actual generated value may be different from IDENT_CURRENT plus IDENT_INCR because of insertions performed by other sessions.


CREATE TABLE T1 (
ID int IDENTITY(1,1)PRIMARY KEY,
Name varchar(20) NOT NULL
)

INSERT T1
VALUES ('A')
INSERT T1
VALUES ('B')
INSERT T1
VALUES ('C')

CREATE TABLE T2 (
T2ID int IDENTITY(1,1)PRIMARY KEY,
ID int
)

CREATE TRIGGER Trig_T1
ON T1
FOR INSERT AS
BEGIN
INSERT into T2
select ID from inserted
END

FIRE the trigger and determine what identity values you obtain with the @@IDENTITY,IDENT_CURRENT and SCOPE_IDENTITY functions.*/
INSERT T1 VALUES ('Rosalie')

SELECT SCOPE_IDENTITY() AS [SCOPE_IDENTITY]
GO
SELECT @@IDENTITY AS [@@IDENTITY]
GO
SELECT IDENT_CURRENT ('T1') AS T1_Identity;
GO
SELECT IDENT_CURRENT ('T2') AS T2_Identity;

November 07, 2011

Hive - Data Warehousing & Analytics on Hadoop

Hive is an open source, peta-byte scale date warehousing framework based on Hadoop that was developed by the Data Infrastructure Team at Facebook that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.


Hive architecture:
Hive organizes data in tables and partitions. A good partitioning scheme allows Hive to prune data while processing a query and that has a direct impact on how fast a result of the query can be produced. Behind the scenes, Hive stores partitions and tables into directories in Hadoop File System (HDFS).


Hive comprises of the following major components:

-Metastore: To store the meta data.
-Query compiler and execution engine: To convert SQL queries to a sequence of map/reduce jobs that are then executed on Hadoop.
- SerDe and ObjectInspectors: Programmable interfaces and implementations of common data formats and types.
-UDF and UDAF: Programmable interfaces and implementations for user defined functions (scalar and aggregate functions).
-Clients: Command line client similar to Mysql command line and a web UI.

Data Flow into Hadoop Cloud:




For more Information:
http://www.vldb.org/pvldb/2/vldb09-938.pdf

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...