April 25, 2016

Heap Table

Tables without a clustered index are called heaps.
Their pages are scattered on disk wherever SQL Server can find a spot, and the rows are not stored in any order whatsoever. This can make for really fast inserts – SQL Server can just throw the data down – but slow selects, updates, and deletes.
These tables have an index_id value of 0 in the sys.indexes catalog view.
Cases where heaps perform better than tables with clustered indexes:

1. Fast inserts.
2. Since there is no clustered index, no additional time is needed to maintain the index, and no additional space is needed to store the clustered index tree.
3. If your heap table only ever has INSERTs occurring, it will not become fragmented, since only new data is written.

Drawbacks of heaps:
1. Specific data cannot be retrieved quickly unless there are also non-clustered indexes.
2. Data is not stored in any particular order.
3. Fragmentation:
Resolving fragmentation for a heap is less straightforward, since there is no clustered index to rebuild or reorganize. You can create a new table and insert the data from the heap into it in some sort order, or export the data, truncate the table, and import the data back in. On SQL Server 2008 and later you can also rebuild the heap in place with ALTER TABLE ... REBUILD.
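On SQL Server 2008 and later the in-place rebuild is a one-liner; a minimal sketch (dbo.MyHeap is a placeholder name, and note that rebuilding the heap also rebuilds its non-clustered indexes):

```sql
-- SQL Server 2008+: rebuild the heap in place, compacting pages
-- and removing forwarded records.
ALTER TABLE dbo.MyHeap REBUILD;
```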


In the following scenarios, heap tables can be a good fit:
  • Insert-heavy OLTP transaction tables
  • Data warehouse fact tables
  • Log tables of all kinds
  • Telco Call Detail Records (CDRs)
  • Events from web servers and social media
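The definition above can be checked directly from the catalog views; a minimal sketch (run in the database you want to inspect):

```sql
-- List every heap in the current database: heaps have index_id = 0 in sys.indexes.
SELECT o.name AS table_name
FROM sys.indexes AS i
JOIN sys.objects AS o
  ON o.object_id = i.object_id
WHERE i.index_id = 0        -- 0 = heap, 1 = clustered index
  AND o.type = 'U';         -- user tables only
```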

March 31, 2016

How do you set a default value for a MySQL Datetime column?

In SQL Server, it's getdate(). What is the equivalent for MySQL?

The DEFAULT value clause in a data type specification indicates a default value for a column. With one exception, the default value must be a constant; it cannot be a function or an expression. This means, for example, that you cannot set the default for a date column to the value of a function such as NOW() or CURRENT_DATE. The exception is that you can specify CURRENT_TIMESTAMP as the default for a TIMESTAMP column (and, as of MySQL 5.6.5, for a DATETIME column as well).

1. Create table

CREATE TABLE test
(
  id        INTEGER,
  CreatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Alternatively, work around the restriction with a trigger that sets the value every time a new row is inserted:

CREATE TRIGGER TRIG_test_CreatedOn
BEFORE INSERT ON test
FOR EACH ROW
SET NEW.CreatedOn = NOW();

2. What about an existing DATETIME column?

CREATE TRIGGER TRIG_test2_CreatedOn
BEFORE INSERT ON test2
FOR EACH ROW
SET NEW.CreatedOn = NOW();

Or convert the column to TIMESTAMP:

ALTER TABLE test2 CHANGE COLUMN CreatedOn CreatedOn TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
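As of MySQL 5.6.5 the restriction described above no longer applies to DATETIME, so on current versions neither the trigger nor the TIMESTAMP conversion is needed. A sketch (the table name test3 is a placeholder):

```sql
-- MySQL 5.6.5+: DATETIME may default to CURRENT_TIMESTAMP directly.
CREATE TABLE test3 (
  id        INT,
  CreatedOn DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UpdatedOn DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                     ON UPDATE CURRENT_TIMESTAMP
);
```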

March 17, 2016

QlikView Vs Qlik Sense


Qlik Sense is not just a new release of QlikView; it is a different product. Still, there are many similarities between the two.

The two products share the same analysis engine, and old scripts and old formulae will (almost) always work exactly as before. Both have the same green-white-gray selection logic; both use the same calculation engine; both have roughly the same response times; and the same data-modelling considerations apply to both.

Differences

1. QlikView is used for guided analytics; Qlik Sense is used for self-service BI.
2. Qlik Sense has a responsive UI on any platform, which QlikView does not.

Common string operations in SQL Server 2008

1. Find the last occurrence of a character within a string in SQL Server 2008

DECLARE @StringA varchar(100), @pos int, @FindChar char(1) = ','
SET @StringA = 'DirectoryA,'

-- Note the +1: it converts the offset found in the reversed string
-- into a position in the original string.
SET @pos = LEN(@StringA) - CHARINDEX(@FindChar, REVERSE(@StringA)) + 1
SELECT @pos   -- 11

2. How to remove all occurrences of a character from a string in SQL Server 2008

select replace(@stringA,',','')

3. How to remove the last character from a string in SQL Server 2008

Select SUBSTRING (@stringA, 1, len(@stringA)-1)

4. How to remove a trailing comma from a string in SQL Server 2008

SELECT CASE
  WHEN RIGHT(RTRIM(@StringA), 1) = ','
    THEN SUBSTRING(RTRIM(@StringA), 1, LEN(RTRIM(@StringA)) - 1)
  ELSE @StringA
END
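Item 4 only strips a comma when it is the very last character. A sketch that removes the last occurrence of the character wherever it appears, using STUFF (the sample value is illustrative):

```sql
DECLARE @s varchar(100) = 'a,b,c';
DECLARE @p int = LEN(@s) - CHARINDEX(',', REVERSE(@s)) + 1;  -- position of last comma
SELECT CASE WHEN CHARINDEX(',', @s) > 0
            THEN STUFF(@s, @p, 1, '')   -- removes the comma at @p: 'a,bc'
            ELSE @s
       END;
```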

March 16, 2016

Issues with MS SQL Server Standard edition in AWS


1. Database mirroring and index maintenance
MS SQL Server Standard edition (as deployed in AWS) allows synchronous mirroring only. In synchronous mirroring, the additional log being generated can overload the network link between the principal and the mirror. This slows down the hardening of regular transactions in the mirror's transaction log, and thus leads to a decrease in transaction throughput on the principal.

Solution: log stream compression

SQL Server 2008 introduced a feature called database mirroring log stream compression. With SQL Server 2008, the outgoing log stream from the principal to the mirror is compressed, thereby minimizing the network bandwidth used by database mirroring. In a network constrained for bandwidth, compressing the log stream helps improve application performance and throughput. Compression rates of at least 12.5% are typically achieved.


2. You can’t cluster in EC2… with SQL Server 2008 R2
You can’t cluster in EC2 because there is no shared storage: EBS volumes can only be attached to a single server at a time. You could share the drives, but you really shouldn’t. You could use some Windows-based SAN system that mounts drives and pretends to be a SAN, but without any guarantees of performance or longevity, why would you want to?


HA options: log shipping, mirroring, and transactional replication.

3. Scaling SQL Server in EC2: scaling up

There’s a finite limit to how much you can scale up SQL Server in EC2. That limit is currently 8 cores and 68.4 GB of RAM. Those 8 cores are currently Intel Xeon X5550s clocked at 2.66 GHz, but that will have changed by the time you’re reading this.

4. Backup or restore database

Backup or restore activity on AWS SQL Server Standard edition with EBS disks has the following issues:
1. Connections time out.
2. I/O requests take longer than usual.
3. The timeouts come not just from the database being restored but from all databases, as the EBS drives get clogged up, causing I/O to bunch up severely.


Some recommendations:
1. It looks as though the disk reads max out the EBS connection, so double-check the EC2 tuning settings.
2. Choose an EBS-optimized instance that provides more dedicated EBS throughput than your application needs; otherwise, the connection between Amazon EBS and Amazon EC2 can become a performance bottleneck. You can enable EBS optimization when you launch an instance, or after it is already running.

Database Mirroring and index maintenance in AWS


With database mirroring, the database has to be in the full recovery model, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

MS SQL Server Standard edition (as deployed in AWS) allows synchronous mirroring only. In synchronous mirroring, the additional log being generated can overload the network link between the principal and the mirror. This slows down the hardening of regular transactions in the mirror's transaction log, and thus leads to a decrease in transaction throughput on the principal.

Basically, the way mirroring works is by shipping all the transaction log from the principal to the mirror, and then replaying the log records on the mirror database. Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

Index creation time varies with the mirroring safety level: creating an index on a large table takes longer with safety FULL than with safety OFF. The performance impact of synchronous mirroring is more pronounced when you create a clustered index than a nonclustered index, because creating a clustered index generates much more transaction log.

Performance under database mirroring is highly dependent upon the network performance and the log I/O performance. If either the network performance or the log I/O performance is a bottleneck, the performance of database mirroring for the workload can degrade significantly with the safety FULL operation.

The network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput.

Solution: log stream compression

SQL Server 2008 introduced a feature called database mirroring log stream compression. With SQL Server 2008, the outgoing log stream from the principal to the mirror is compressed, thereby minimizing the network bandwidth used by database mirroring. In a network constrained for bandwidth, compressing the log stream helps improve application performance and throughput.

Compression rates of at least 12.5% are typically achieved.

The downside is that compression comes at a cost in CPU. The obvious increase comes from compressing the transaction log on the principal and then decompressing it again on the mirror before hardening it in the log. The non-obvious increase comes because the principal and mirror should end up being able to process more transactions per second, which itself needs more CPU.

This extra CPU could be a problem for systems that are already pegged in terms of CPU load, and they may actually see a drop in performance when log stream compression is enabled. It is on by default when you upgrade to SQL Server 2008, but there is a way to turn it off.
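To the best of my knowledge, the off switch referred to above is trace flag 1462; treat the flag number as an assumption to verify against the documentation for your build before using it in production:

```sql
-- Assumed: trace flag 1462 disables database mirroring log stream compression.
DBCC TRACEON (1462, -1);     -- -1 applies the flag instance-wide
-- DBCC TRACEOFF (1462, -1); -- revert to the default (compression on)
```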

January 27, 2016

Amazon RDS - Backup and Restore



Amazon RDS - Backup

FAQ: Is it possible to create a backup of a database running on an Amazon RDS instance and restore it on a local machine?

Ans:

1. You can’t currently create a .bak file from Amazon RDS.

2. Use the Azure migration wizard (SQLAzureMW) against Amazon RDS to copy the RDS database to an EC2 instance or a local server. Once that is done you can create a .bak file from the SQL Server running on the EC2 instance. If you have the bandwidth, or your database is small, you may be able to run the migration tool directly on your target machine.

You can also create a backup locally from AWS RDS using SQL Server Management Studio: right-click your database > Tasks > Export Data.

Import/Export wizard or Bulk Copy Program (BCP) for SQL Server:
1) Use the SQLAzureMW tool.
2) For databases of 1 GB or larger, it is more efficient to script only the database schema and then use the Import and Export Wizard or the bulk copy feature of SQL Server to transfer the data.
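For the BCP route, the export/import pair looks roughly like this; every name here (database, table, host, credentials, file) is a placeholder:

```shell
# Export the table from RDS in native format, then load it into the local server.
bcp MyDb.dbo.MyTable out MyTable.dat -n -S myinstance.rds.amazonaws.com -U admin -P '***'
bcp MyDb.dbo.MyTable in  MyTable.dat -n -S localhost -T
```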


Backup using the SQLAzureMW tool
[screenshot]

The BCP script that is automatically generated
[screenshot]

Result summary
[screenshot]

Amazon RDS - Restore

Apply the script to the target server
[screenshot]

Tables migrated
[screenshot]
Interesting Facts About Database Indexes

  • Create Clustered Indexes on the tables’ natural keys. Natural keys are the fields that best identify the row’s data. For example, the primary key of a table might be a unique identifier or a BIGINT, but you might access the data through a combination of columns; for a car those columns could be Year, Make, Model and VIN. When you create the Clustered Index, order the columns by their selectivity and use this same order when you query for the information.
  • Create Non-Clustered Indexes on your Primary Keys
  • Create Non-Clustered Indexes for all Foreign Keys
  • Create Non-Clustered Indexes for columns that are used in WHERE, ORDER BY, MERGE, JOIN and other clauses that require matching data.
  • Create Filtered INDEXES to create highly selective sets of keys for columns that may not have a good selectivity otherwise.
  • Use Covering INDEXES to reduce the number of bookmark lookups required to gather data that is not present in the other INDEXES.
  • Covering INDEXES can be used to physically maintain the data in the same order as is required by the queries’ result sets reducing the need for SORT operations.
  • Covering INDEXES have an increased maintenance cost, therefore you must see if performance gain justifies the extra maintenance cost.
  • If a CLUSTERED INDEX is present on the table, then NONCLUSTERED INDEXES will use its key instead of the table ROW ID.
  • To reduce the size consumed by the NONCLUSTERED INDEXES it’s imperative that the CLUSTERED INDEX KEY is kept as narrow as possible.
  • Physical reorganization of the CLUSTERED INDEX does not physically reorder NONCLUSTERED INDEXES.
  • SQL Database can JOIN and INTERSECT INDEXES in order to satisfy a query without having to read data directly from the table.
  • Favor many narrow NONCLUSTERED INDEXES that can be combined or used independently over wide INDEXES that can be hard to maintain.
  • NONCLUSTERED INDEXES can reduce blocking by having SQL Database read from NONCLUSTERED INDEX data pages instead of the actual tables.
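Two of the bullets above (covering and filtered indexes) can be illustrated together; dbo.Orders and its columns are hypothetical:

```sql
-- Covering index: the INCLUDE columns let the query be answered from the
-- index alone, avoiding bookmark lookups.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);

-- Filtered index: a highly selective key set over an otherwise
-- low-selectivity column.
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (OrderStatus)
    WHERE OrderStatus = 'Open';
```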

January 21, 2016

Business Intelligence - Comparison Matrix

BI Product: Tableau | Qlik (QlikView) | SAS | MicroStrategy
Company: Tableau Software | QlikTech | SAS Institute | MicroStrategy
Product URL: http://www.tableau.com/ | http://www.qlik.com/ | https://www.sas.com/ | www.microstrategy.com
Segment: Leaders | Leaders | Leaders | Leaders
Strengths: Tableau – users can leverage the power of self-service analytics with cool visualizations, dragging and dropping objects, measures and dimensions. Qlik – scripting has the ability to clean data and do the best data modelling; patented associative technology; schema and SQL engine uniqueness.
Weaknesses: Tableau – data integration can be quite complex, and without a clean data source the software isn't much use. Qlik – the GUI is still not as good as Tableau or Qlik Sense; development is weak in both Desktop and Web, and many options are missing on the web.
Deployment Platforms: Windows | Windows | Windows, Linux | Windows, Linux
Self-service tools: Yes | Yes | Yes | Yes
Mobility: Yes | Yes | Yes | Yes

January 13, 2016

Data Warehouse in 2016


Data warehouse vendors in 2016 will distinguish themselves with innovations and feature enhancements in the areas of:
  • Integration with in-memory architectures to enable real-time analytics
  • Integration with Hadoop to support larger ingestion and transformation
  • Leveraging native data compression capabilities to secure sensitive data
  • Ability to simplify integration via data virtualization
  • Enabling in-database analytics to support sophisticated requirements
  • Analytic data platforms – real-time, ready-to-use tools: native SQL, integration with the R programming language, and data mining algorithms
  • Modern data types: mobile devices, social media traffic, networked sensors (i.e. the Internet of Things)

Data Warehouse - Comparison Matrix

DW Product: IBM DB2 Data Warehouse | Oracle Exadata | Teradata

Company
  • IBM: IBM has four major businesses: hardware, software, services, and financing. Data warehousing is part of the data management business, which is part of the software business.
  • Oracle: Oracle has three businesses: database, applications, and consulting. Its database business is the largest by far and currently represents 80 percent of Oracle’s new license revenues.
  • Teradata: NCR has four businesses: data warehousing, financial self-service, retail store automation, and customer services. The Teradata Division is responsible for the data warehousing business.

Product URL
  • IBM: www.ibm.com/DataWarehousing
  • Oracle: https://www.oracle.com/database/data-warehouse/index.html
  • Teradata: www.teradata.com/

Strengths
  • IBM: rich and flexible data partitioning capabilities; strong analytic functionality in OLAP and data mining; market presence; data models; hardware bundle; partner network; strong services arm (IBM GSA).
  • Oracle: Intelligent Storage Grid; Hybrid Columnar Compression; Smart Flash Cache.
  • Teradata: massively parallel, partitioned, shared-nothing database server architecture; simple and highly automated physical data warehouse implementation; a set of indexing approaches that enable fast access to data; scalable hybrid-storage capabilities; Teradata has buddied up with all enterprise Hadoop distro providers and enabled new analytic workloads on Teradata systems (JSON, geospatial, 3D geospatial and others).

Weaknesses
  • IBM: complex physical implementation; lack of integration with multidimensional OLAP; high cost of ownership; DB2 on the open systems platforms continues to suffer from locking problems.
  • Oracle: (none listed)
  • Teradata: closed systems that can’t easily ride the cost curves associated with commodity hardware; expensive fault-tolerant solution compared with Exadata; incomplete visual tools for build and manage functionality; proprietary hardware; costly to maintain and upgrade; limited skilled implementation expertise.

Deployment Platforms
  • IBM: IBM AIX, Microsoft Windows, Linux, Sun Solaris
  • Oracle: IBM AIX, Hewlett-Packard HP-UX, Linux, Microsoft Windows, Sun Solaris
  • Teradata: NCR SVR4 UNIX MP-RAS, Microsoft Windows

Server Architecture
  • IBM: server platform with a single processor; single database partition on a server platform with multiple processors; multiple partition configurations (shared-nothing, multiple server platforms, server interconnect, any number of readers and writers).
  • Oracle: single server platform; distributed database; Real Application Clusters (RAC) with shared, partitioned data, multiple server platforms, server interconnect, and any number of readers and writers.
  • Teradata: single- and multiple-node organization, where a node is a hardware and software platform specialized and dedicated to data warehousing; Teradata Warehouse is a shared-nothing architecture in both its single- and multiple-node configurations.

Data Type Support
  • IBM: SQL types (numeric, binary, character, date/time); DATALINK; XML; large objects (max 2 gigabytes); user-defined types (distinct – renamed SQL types, structured – object oriented, reference – hierarchies of built-in types).
  • Oracle: built-in SQL types (numeric, binary, character, date/time); large objects (max 2 gigabytes); user-defined types (object-oriented types, object identifier types, arrays, nested tables); Oracle-supplied types (Spatial, Media, Text, XML).
  • Teradata: SQL types (numeric, binary, character, date/time); large objects.

Physical Design Recommendation
  • IBM: neutral on the physical design of data warehouses.
  • Oracle: neutral on the physical design of data warehouses.
  • Teradata: neutral, but recommends a third-normal-form physical design to maximize flexibility, with denormalized structures implemented as views or redundant structures (logical data marts or special-purpose tables).

Physical Implementation
  • IBM: manual.
  • Oracle: manual; template-based via templates and the Database Configuration Assistant (DBCA) tool; automated via Oracle managed files.
  • Teradata: automated.

Custom Transformations
  • IBM: may be written in SQL, Java, or C++.
  • Oracle: may be written in SQL or PL/SQL.
  • Teradata: may be written in SQL or C++.

Summary Table Support
  • IBM: materialized query tables automate the creation and management of summary tables; a materialized query table stores the results of a query in a table.
  • Oracle: materialized views automate the creation and management of summary tables; a materialized view stores the results of a query in a table.
  • Teradata: the OLAP transformations of Teradata Warehouse Miner can create and manage summary tables.

SQL Extensions
  • IBM: CUBE and ROLLUP in SELECT; functions: aggregate, numeric, statistical, correlation, random number generation, regression, date/time, user-defined.
  • Oracle: CUBE and ROLLUP in SELECT; functions: ranking, window aggregate, reporting aggregate, lag/lead, linear regression, inverse percentile, hypothetical rank and distribution, first/last, numeric, date/time, user-defined.
  • Teradata: QUALIFY, SAMPLE, and WITH in SELECT; functions and operators: aggregate, numeric, date/time, OLAP.

OLAP
  • IBM: DB2 provides OLAP build and manage capabilities, relational OLAP on DB2 tables, and multidimensional and hybrid OLAP on a combination of DB2 tables and external multidimensional structures. DB2 OLAP Server is a separately priced and packaged product that is an external, but tightly integrated, multidimensional OLAP facility that IBM OEMs from Hyperion Solutions.
  • Oracle: Oracle OLAP is a separately packaged and priced product that provides OLAP functionality.
  • Teradata: provides relational OLAP on Teradata Warehouse tables.

Data Mining
  • IBM: DB2 Intelligent Miner is bundled with DB2 Data Warehouse Enterprise Edition.
  • Oracle: Oracle Data Mining is a separately priced and packaged product.
  • Teradata: Teradata Warehouse Miner is a separately packaged and priced product that is tightly integrated with Teradata Warehouse.

Data Warehouse - Teradata

Company - DW Product: Teradata
Company: NCR has four businesses: data warehousing, financial self-service, retail store automation, and customer services. The Teradata Division is responsible for the data warehousing business.
Product URL: www.teradata.com/
Strengths: massively parallel, partitioned, shared-nothing database server architecture; simple and highly automated physical data warehouse implementation; a set of indexing approaches that enable fast access to data; scalable hybrid-storage capabilities; Teradata has buddied up with all enterprise Hadoop distro providers and enabled new analytic workloads on Teradata systems (JSON, geospatial, 3D geospatial and others).
Weaknesses: incomplete visual tools for build and manage functionality; proprietary hardware; costly to maintain and upgrade; limited skilled implementation expertise.
Deployment Platforms: NCR SVR4 UNIX MP-RAS, Microsoft Windows
Server Architecture: single- and multiple-node organization, where a node is a hardware and software platform specialized and dedicated to data warehousing; Teradata Warehouse is a shared-nothing architecture in both its single- and multiple-node configurations.
Data Type Support: SQL types (numeric, binary, character, date/time); large objects.
Physical Design Recommendation: neutral, but recommends a third-normal-form physical design to maximize flexibility, with denormalized structures implemented as views or redundant structures (logical data marts or special-purpose tables).
Physical Implementation: automated.
Custom Transformations: may be written in SQL or C++.
Summary Table Support: the OLAP transformations of Teradata Warehouse Miner can create and manage summary tables.
SQL Extensions: QUALIFY, SAMPLE, and WITH in SELECT; functions and operators: aggregate, numeric, date/time, OLAP.
OLAP: provides relational OLAP on Teradata Warehouse tables.
Data Mining: Teradata Warehouse Miner is a separately packaged and priced product that is tightly integrated with Teradata Warehouse.

Oracle Data Warehouse - Exadata

Company - DW Product: Oracle - Exadata
Company: Oracle has three businesses: database, applications, and consulting. Its database business is the largest by far and currently represents 80 percent of Oracle’s new license revenues.
Product URL: https://www.oracle.com/database/data-warehouse/index.html
Strengths: Intelligent Storage Grid; Hybrid Columnar Compression; Smart Flash Cache.
Weaknesses: (none listed)
Deployment Platforms: IBM AIX, Hewlett-Packard HP-UX, Linux, Microsoft Windows, Sun Solaris
Server Architecture: single server platform; distributed database; Real Application Clusters (RAC) with shared, partitioned data, multiple server platforms, server interconnect, and any number of readers and writers.
Data Type Support: built-in SQL types (numeric, binary, character, date/time); large objects (max 2 gigabytes); user-defined types (object-oriented types, object identifier types, arrays, nested tables); Oracle-supplied types (Spatial, Media, Text, XML).
Physical Design Recommendation: neutral on the physical design of data warehouses.
Physical Implementation: manual; template-based via templates and the Database Configuration Assistant (DBCA) tool; automated via Oracle managed files.
Custom Transformations: may be written in SQL or PL/SQL.
Summary Table Support: materialized views automate the creation and management of summary tables; a materialized view stores the results of a query in a table.
SQL Extensions: CUBE and ROLLUP in SELECT; functions: ranking, window aggregate, reporting aggregate, lag/lead, linear regression, inverse percentile, hypothetical rank and distribution, first/last, numeric, date/time, user-defined.
OLAP: Oracle OLAP is a separately packaged and priced product that provides OLAP functionality.
Data Mining: Oracle Data Mining is a separately priced and packaged product.

Creating DataFrames from CSV in Apache Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Example").getOrCreate()
sc = spark.sparkContext
Sp...