February 15, 2014

Essential Guidelines for Choosing a Business Intelligence Solution

Implementing a Business Intelligence (BI) solution can lead to substantial operational improvements, with benefits that outweigh the investment of time, money, and personnel needed to select, deploy, and maintain such an application.
The right tool should fit the company like a glove.  Among other things, it must be adaptable, scalable, secure, and affordable, and it must be able to report on a wide range of business trends.  It should shed light on business practices and identify opportunities so that end users can make educated decisions.
Consolidation of data is the main driver in the development of business intelligence.  Data deduplication is a growing necessity as mobility increases.  Organizations must now deal with increasing demands, higher degrees of accuracy, firmer controls, and tighter timelines, and they must find a way to digest large data sets.  Automated reporting and notifications are essential for staying on top of market trends.  The right business intelligence solution will include the ability to report on, plan around, monitor, and analyze large volumes of data, which in turn allows for more strategic and informed business decisions.
The information can be processed and distributed company-wide with even the simplest of templates and visuals.  Today it is essential for all employees to engage in the analysis of information to some extent.  Reporting capabilities should be flexible and display actionable data sets.  The ultimate objective is to develop long-term goals based on the data produced through business intelligence.
 
The Business Intelligence Technology Stack
To build a Business Intelligence solution, enterprises will need to consider new investments in, and upgrades to, current technology across the BI technology stack.
Storage and computing hardware - To implement BI, firms will need to invest in or upgrade their data storage infrastructure.
Applications and data sources - To develop an effective BI solution, source data will need to be scrubbed and organized. The challenge is that source data can come from any number of applications, most of which use proprietary data formats and application-specific data structures.
Data integration - Extraction, transformation and loading (ETL) tools pull data from multiple sources, transform it into a consistent format, and load it into a data warehouse. The trend in data integration, and in Enterprise Application Integration in general, is toward standardization through XML and web services.
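The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source exports, their column names, the "sales" table, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts: two applications export the same kind of
# record with different column names and delimiters (illustrative data only).
CRM_EXPORT = """customer,region,amount
Acme Corp,north,1200.50
Globex,south,830.00
"""

BILLING_EXPORT = """CustName;Territory;Total
acme corp ;North;99.5
Initech;East;410
"""


def extract(text, delimiter):
    """Extract: read the rows of one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(row, mapping):
    """Transform: map source columns to a shared schema and normalize values."""
    return (
        row[mapping["customer"]].strip().title(),
        row[mapping["region"]].strip().lower(),
        float(row[mapping["amount"]]),
    )


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse table."""
    conn.executemany(
        "INSERT INTO sales (customer, region, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def run_etl(conn):
    """Run all three steps for every configured source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)"
    )
    sources = [
        (CRM_EXPORT, ",", {"customer": "customer", "region": "region", "amount": "amount"}),
        (BILLING_EXPORT, ";", {"customer": "CustName", "region": "Territory", "amount": "Total"}),
    ]
    for text, delimiter, mapping in sources:
        load(conn, [transform(r, mapping) for r in extract(text, delimiter)])


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_etl(warehouse)
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    for record in warehouse.execute(query):
        print(record)
```

Keeping the column mapping as data rather than code means a new source application can be added by describing its export format, which is one reason standardized formats make integration cheaper.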

Relational databases and data warehouses - Firms will need a data warehouse to store and organize tactical or historical information in a relational database. Organizing data in this way allows the user to extract and assemble specific data elements from a complete dataset to perform a variety of analyses.

OLAP applications and analytic engines - Online analytical processing (OLAP) applications provide a layer of separation between the storage repository and the end user's analytic application of choice. Their role is to perform special analytical functions that require high-performance processing power and more specialized analytical skills.

Analytic applications - Analytic applications are the programs used to run queries against the data. They support "slice-and-dice" and "drill-down" analysis of historical data, in which the user moves from summary figures to the underlying detail, as well as more predictive analyses. For example, a customer intelligence application might enable a historical analysis of customer orders and payment history. Alternatively, users could model how changing a price might affect future sales in a specific region.
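As a rough illustration of the historical analysis named above, the following Python sketch filters a small set of order records along one or more dimensions and then totals them at different levels of detail. The records, field names, and the two helper functions are made up for the example; a real analytic application would run comparable operations against the warehouse or an OLAP engine.

```python
from collections import defaultdict

# Illustrative order records; the field names and figures are invented.
ORDERS = [
    {"year": 2013, "quarter": "Q1", "region": "north", "product": "widget", "sales": 100},
    {"year": 2013, "quarter": "Q1", "region": "south", "product": "widget", "sales": 80},
    {"year": 2013, "quarter": "Q2", "region": "north", "product": "gadget", "sales": 50},
    {"year": 2013, "quarter": "Q2", "region": "north", "product": "widget", "sales": 70},
    {"year": 2012, "quarter": "Q4", "region": "south", "product": "gadget", "sales": 40},
]


def slice_dice(rows, **filters):
    """Keep rows matching fixed values on one dimension (slice) or several (dice)."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def rollup(rows, *dims):
    """Total sales grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


if __name__ == "__main__":
    # Summary level: sales per year.
    print(rollup(ORDERS, "year"))
    # Drill down: break the 2013 figure out by quarter and region.
    print(rollup(slice_dice(ORDERS, year=2013), "quarter", "region"))
    # Dice: fix two dimensions at once (one region, one product).
    print(rollup(slice_dice(ORDERS, region="north", product="widget"), "year"))
```

Drilling down here simply means grouping the same rows by more dimensions, which is why these operations become expensive as data volumes and the number of dimensions grow.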

Information presentation and delivery products - The results of a query can be returned to the user in a variety of ways. Many tools provide presentation through the analytic application itself and offer dashboard formats to aggregate multiple queries. Also, enterprises can purchase packaged or custom reporting products, such as Crystal Reports. An important trend in BI presentation is leveraging XML to deliver analyses through a portal or any other Internet-enabled interface, such as a personal digital assistant (PDA).
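To make the XML delivery idea concrete, here is a minimal Python sketch that serializes query results as XML so that a portal or other client could render them. The element names ("report", "row"), the function name, and the sample figures are assumptions made for illustration and do not follow any particular reporting product's format.

```python
import xml.etree.ElementTree as ET


def results_to_xml(report_name, columns, rows):
    """Serialize tabular query results as a small XML document.

    Column names are used as element names, so they must be valid XML names.
    """
    root = ET.Element("report", name=report_name)
    for row in rows:
        item = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(item, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical result of a "sales by region" query.
    print(results_to_xml("sales_by_region", ["region", "total"],
                         [("north", 220), ("south", 120)]))
```

Because the output is plain XML, the same result can feed a dashboard, a portal page, or a handheld client without the analytic application knowing which one is consuming it.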
 
 
1. Is the package a complete solution?
A complete BI solution is one that lets you quickly leverage your own data for reporting, analytics, and visualization, and that integrates easily with a variety of disparate data sources.  It should go beyond predefined queries to support ad hoc queries and dynamic selection by the user.
2. Is the solution easy to use and administer?
It’s important that ongoing use of the selected system minimizes the amount of IT involvement required. One of the biggest burdens on IT is the mandatory data warehouse that many BI solutions require.
3. Investment in hardware and resources
Can existing technology, hardware, and people be reused?
Given the investment in hardware, software, training, and opportunity costs, it’s vital that the selected technology offer as short a path to productivity as possible. To attain a sufficient ROI, users will need to derive value very quickly.
4. Will the solution scale?
Even the most successful BI implementation can run into difficulties when faced with increased data volumes and usage loads. Workloads increase for a number of reasons, including natural data volume growth, selecting additional dimensions for analysis, and incorporating new data sources.
5. Self-service BI is about more than interactivity
Organizations constantly struggle with their data. Integrating, managing, and verifying data sources are continuous exercises for businesses looking to increase their competitive advantage and understand what is happening in their daily operations.
Why Do So Many BI Projects Fail?
1. A confusing product landscape - Buyers confuse distinct requirements, for example reporting versus analytics.
2. BI cost model - The cost model complicates the job of estimating a return on investment for the substantial BI outlays.
3. Operational limitations - Some BI software requires very lengthy deployments. Users often complain of bloated feature sets, static reports and queries, and a built-in dependence on IT when it’s time to modify or expand the inventory of queries and reports. On the other hand, point solutions regularly suffer from considerable amounts of missing functionality.
 

Creating DataFrames from CSV in Apache Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Example").getOrCreate()
sc = spark.sparkContext
Sp...