Wednesday, March 30, 2016

Presentation and Visualization Methods

Today, with the amount of data being stored and processed in databases, an effective way of presenting and visualizing that data has become equally important, because it is what lets us extract meaningful information from the data we store. Most business intelligence tools today provide features that business owners can use to create useful metrics, reports and dashboards. These let executives see analytics presented visually, so that they can grasp useful insights and identify new patterns. For the purpose of this blog, I would like to use the following three vignettes to demonstrate some commonly used types of visualization. The three industries that I have selected are:

  • Transportation
  • E-commerce
  • Financial Services

Transportation:

The transportation industry offers some interesting opportunities for data visualizations that convey what numbers alone cannot. Take the example of the following visualization made by BUCS Analytics, a Kansas City-based performance consultancy, which used Spotfire to create it:

Image Credits: data-informed.com
This graph is a scatter plot. This type of graph is useful when a large number of data points need to be plotted against a measure such as distance. The graph in this image compares the cost of owning trucks with the cost of buying that capacity, by distance. Visualizing the data enabled the company’s management to see that greater profit margins were possible on routes run by trucks they owned when the distance was less than 1,500 miles.

E-commerce:

The e-commerce industry is unique in that, unlike owners of brick-and-mortar stores, e-commerce business owners cannot observe their customers in person. Hence it is important to track, analyze and view customer behavior through BI tools. Consider the image shown below, a typical dashboard for an e-commerce business owner:


Image Credits: hbr.org
This image shows sales trends across various dimensions, such as day of the week, crowd size and the activity rate in sales. The dashboard uses both bar graphs and line charts, which give a clear representation of the data and neatly compare the values of each dimension we are interested in tracking. A dashboard like this can be used to track demographic data and customer engagement, to improve the customer experience, and even to support predictive analysis.

Financial Services:

Visualization tools can bring transparency and clarity to financial data. Day-to-day as well as historical transactions can be organized into interactive visualizations and reports. The image below shows how transactions in financial organizations can be neatly categorized to reveal meaningful trends.

Image Credit: finance.strands.com
The visualization shown above is a bubble chart, a type commonly used to create colorful, overlapping graphs. A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of the data is represented by the size of the bubbles. The graph shown above helps users track income and expenses in a more engaging and intuitive way. The view can also be customized by time period, across multiple accounts or by category level.
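To make the size encoding concrete, here is a minimal sketch in Python (standard library only) that renders a bubble chart as SVG text. It assumes a common charting convention: a bubble's area, rather than its radius, is proportional to the value, so a category with four times the spending gets a bubble with four times the area. The category names, coordinates and amounts are hypothetical. Nothing here reflects how the Strands tool is actually implemented.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius whose circle AREA (pi * r^2) is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def bubble_chart_svg(points, width=400, height=300):
    """points: list of (label, x, y, size), with x and y already in pixel units."""
    max_size = max(size for _, _, _, size in points)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for label, x, y, size in points:
        r = bubble_radius(size, max_size)
        # Semi-transparent fill keeps overlapping bubbles visible.
        parts.append(
            f'<circle cx="{x}" cy="{y}" r="{r:.1f}" fill="steelblue" '
            f'fill-opacity="0.5"><title>{label}</title></circle>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical spending categories: (label, x, y, amount spent)
categories = [("Groceries", 100, 150, 400), ("Rent", 250, 100, 1600), ("Dining", 320, 220, 100)]
svg = bubble_chart_svg(categories)
print(svg)
```

Scaling the radius directly would exaggerate large values, because area grows with the square of the radius. Taking the square root keeps the visual impression in line with the data.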

Thus, as we have seen, business intelligence tools provide various types of graphs for effective data visualization. The information derived from these visualizations enables business users to leverage their data in ways that raw numbers cannot. Hence presentation and visualization of data is an important feature of any BI tool.

Reference:

  1. http://data-informed.com/visualizations-from-erp-data-brings-clarity-to-decision-makers/
  2. http://data-informed.com/optimize-ecommerce-analytics-visualization/
  3. http://finance.strands.com/products/pfm-personal-financial-management/


Wednesday, March 2, 2016

Big Unstructured Data v/s Structured Relational Data

As we know by now, data is any information that has been translated into a form that is convenient to move and process. Data can exist in many forms, such as numbers or text on a piece of paper, or bits and bytes stored in computer memory. In this blog we try to understand how data is classified based on how it is stored in a computer system.



Structured Data

We can think of structured data as any data that resides in a fixed field within a file or record. For example, data stored in the rows and columns of an Excel sheet, or data systematically stored within a relational database, are both examples of structured data.

As a basic definition, structured data is data stored with a high degree of organization, so that it can be easily accessed, searched or operated upon. Typically this type of data is stored in relational databases, in ordered rows and columns with fixed data types such as VARCHAR (for alphanumeric text), BOOLEAN and numeric types. For this, a data model must first be defined based on the business requirements. Next come the data types, data constraints such as referential integrity, and the metadata for the relational database. Data is retrieved and managed using Structured Query Language (SQL), the most popular method for doing so. Operations such as insert, update and delete are performed on structured data using SQL.
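As an illustration of these steps, the following Python sketch uses the built-in sqlite3 module as a stand-in for a relational database. It first defines a small data model with typed columns and a referential-integrity constraint, then performs insert, update and delete operations in SQL. The table and column names are hypothetical. SQLite accepts type names such as VARCHAR and BOOLEAN but maps them onto its looser type affinities. It also enforces foreign keys only when the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces referential integrity only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Step 1: define the data model (tables, typed columns, constraints).
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    is_member   BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      DECIMAL(10, 2) NOT NULL CHECK (amount >= 0)
);
""")

# Step 2: insert and update rows with parameterized SQL.
conn.execute("INSERT INTO customer (customer_id, name) VALUES (?, ?)", (1, "Alice"))
conn.execute("INSERT INTO orders (order_id, customer_id, amount) VALUES (?, ?, ?)", (10, 1, 25.50))
conn.execute("UPDATE customer SET is_member = 1 WHERE customer_id = ?", (1,))

# Referential integrity: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.00)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Step 3: delete a row.
conn.execute("DELETE FROM orders WHERE order_id = ?", (10,))
```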

Unstructured Data

This type of data generally includes text and multimedia such as images, audio, video and web pages. It is called unstructured because it typically does not fit into a conventional database. It is estimated that 80 to 90% of the data in an organization is unstructured, and with the advent of advanced computing the volume of unstructured data keeps growing.

Unstructured data in its original form does not give any meaningful insight. It first needs to be extracted and prepared for processing before it can yield meaningful information for an organization.
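Here is a minimal sketch of that extraction step, assuming the unstructured data is short free-text customer feedback. Regular expressions pull an order number and a rating out of each message and turn it into a structured record that could then be loaded into a table. The message format and field names are hypothetical. Real pipelines often rely on more robust text-processing or NLP tooling.

```python
import re

# Hypothetical free-text customer feedback (unstructured data).
feedback = [
    "Order #4821 arrived late. Rating: 2/5.",
    "Loved it! order #4907, rating: 5/5",
    "No order number here, just a general comment.",
]

ORDER_RE = re.compile(r"order\s*#(\d+)", re.IGNORECASE)
RATING_RE = re.compile(r"rating:\s*([1-5])/5", re.IGNORECASE)


def extract_record(text):
    """Pull structured fields out of one free-text message; missing fields become None."""
    order = ORDER_RE.search(text)
    rating = RATING_RE.search(text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "rating": int(rating.group(1)) if rating else None,
        "raw_text": text,  # keep the original for later reprocessing
    }


records = [extract_record(t) for t in feedback]
for r in records:
    print(r["order_id"], r["rating"])
```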

Here's a simple table that shows the difference between structured and unstructured data:



Until recently, organizations were overwhelmed by the large volume of unstructured data. Lately, tools and techniques have emerged to manage and organize this data efficiently. They can be broadly classified as follows:
  • Big data: Tools like Hadoop help store, process and structure data that is extremely complex and volatile in nature.
  • Business Intelligence: Tools such as IBM Watson help organizations analyze data and present information visually through dashboards.
  • Search & Indexing tools: These tools help retrieve useful information from unstructured files such as web pages and Word documents.
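Search and indexing tools are typically built around an inverted index, which maps each word to the documents that contain it. The Python sketch below is a toy version that assumes plain text has already been extracted from the files. The file names and contents are hypothetical. Production search engines add stemming, ranking and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric words; real engines also stem and drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical unstructured files, already converted to plain text.
docs = {
    "memo.docx": "Quarterly sales fell in the Midwest region.",
    "faq.html": "How do I return an item? Returns are accepted within 90 days.",
    "report.txt": "Midwest sales recovered after the holiday promotion.",
}
index = build_index(docs)
print(sorted(search(index, "midwest sales")))  # ['memo.docx', 'report.txt']
```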
From a data warehousing point of view, some of the common types of data present in any large organization are as follows:
  1. Metadata: Simply put, metadata is defined as data about data. It contains the information required for extraction, transformation and loading of data from various source systems into the data warehouse.
  2. Historic data: Organizations have large volumes of historic data that can provide insights into various aspects of the company.
  3. Derived Data: Derived data is obtained from existing information through mathematical operations or data transformations. Such data can be generated at run time, and it is sometimes also stored as part of the database schema.
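To illustrate the two options for derived data, here is a small sqlite3 sketch with hypothetical sales figures. Profit and margin are derived from revenue and cost in two ways. The first is a view, computed at run time on every query. The second is a separate table, computed once and stored. Updating a source row then shows the trade-off: the view stays current, while the stored copy has to be refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, revenue REAL, cost REAL);
INSERT INTO sales VALUES (1, 200.0, 150.0), (2, 100.0, 90.0);

-- Derived at run time: recomputed every time the view is queried.
CREATE VIEW sales_margin_v AS
SELECT sale_id, revenue - cost AS profit, (revenue - cost) / revenue AS margin
FROM sales;

-- Derived and stored: computed once and persisted as its own table.
CREATE TABLE sales_margin_t AS SELECT * FROM sales_margin_v;
""")

# Change the source data, then compare the two kinds of derived data.
conn.execute("UPDATE sales SET cost = 180.0 WHERE sale_id = 1")
view_profit = conn.execute("SELECT profit FROM sales_margin_v WHERE sale_id = 1").fetchone()[0]
stored_profit = conn.execute("SELECT profit FROM sales_margin_t WHERE sale_id = 1").fetchone()[0]
print(view_profit, stored_profit)  # 20.0 50.0
```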

Data Warehousing

Data warehousing allows an organization to store, in one place, large data sets accumulated from different departments or areas of the organization. Data is extracted from different OLTP applications and other sources so that it can then be used by analytical applications and user queries. A data warehouse helps collect and process data that can later be presented to business users.
Here's a short video explaining what data warehousing is and how it works.
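The extract, transform and load (ETL) flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions. Two hypothetical OLTP sources, a store point-of-sale system and a web order system, use different column names, units and date formats. A metadata mapping (the kind of information described earlier as metadata) tells the transform step how each source column maps onto a single warehouse table. An in-memory sqlite3 database stands in for the warehouse.

```python
import sqlite3

# Extract: hypothetical rows pulled from two OLTP sources with different conventions.
store_pos_rows = [{"txn": 1, "sku": "A1", "amt_cents": 1999, "dt": "2016-03-01"}]
web_orders_rows = [{"order_no": "W-7", "product": "A1", "total": 19.99, "order_date": "03/02/2016"}]

# Metadata: how each source's columns map onto the warehouse schema.
MAPPINGS = {
    "pos": {"txn": "source_id", "sku": "product_code", "amt_cents": "amount", "dt": "sale_date"},
    "web": {"order_no": "source_id", "product": "product_code", "total": "amount", "order_date": "sale_date"},
}


def transform(source, row):
    """Rename columns via the metadata mapping and normalize units and formats."""
    out = {target: row[col] for col, target in MAPPINGS[source].items()}
    out["source_id"] = str(out["source_id"])
    if source == "pos":
        out["amount"] = out["amount"] / 100  # cents -> dollars
    if source == "web":
        month, day, year = out["sale_date"].split("/")
        out["sale_date"] = f"{year}-{month}-{day}"  # MM/DD/YYYY -> ISO 8601
    out["source_system"] = source
    return out


def load(conn, rows):
    """Load: append the cleaned rows to the warehouse fact table."""
    conn.executemany(
        "INSERT INTO fact_sales (source_system, source_id, product_code, amount, sale_date) "
        "VALUES (:source_system, :source_id, :product_code, :amount, :sale_date)",
        rows,
    )


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (source_system TEXT, source_id TEXT, "
    "product_code TEXT, amount REAL, sale_date TEXT)"
)
staged = [transform("pos", r) for r in store_pos_rows] + [transform("web", r) for r in web_orders_rows]
load(warehouse, staged)
print(warehouse.execute("SELECT * FROM fact_sales").fetchall())
```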





Limitations of Data Warehousing

While data warehousing is a very useful way of managing large volumes of data from different sources, it has some limitations:
  • Ownership of data is lost once it moves from its original source into the data warehouse, so security, privacy and accountability of the data can become a concern.
  • The initial implementation of a data warehouse usually takes a long time and comes with high costs.
  • Once the data warehouse is fully set up, it becomes difficult to incorporate changes in data types, source schemas, indexes and queries.
  • Data warehousing has high maintenance costs, because any change in business requirements or source data leads to significant changes in the data warehousing process.


Future of Data Warehousing 

Data warehousing has been around for decades, and it is now evolving into the analytic warehouse. More and more vendors offer data warehouses with advanced statistical capabilities for analytics and forecasting. Emerging platforms such as Hadoop, which provides distributed file storage and processing, make it easier to process large volumes of unstructured data. With cloud computing and mobile computing already popular, and with the emergence of the Internet of Things, the volume of data is going to increase exponentially. Processing data and performing analytics in the cloud is predicted to become the norm, which will make warehousing simpler and more convenient. With cloud-based data warehousing, the cost of traditional on-premises offerings, as well as management overhead, will be significantly lowered. We can also expect structured and unstructured data from back-end systems to be brought into the data warehouse in real time and near-real time.
The ability to combine big data techniques, analytics technologies, back-end systems and traditional data warehouses could change the economics of data warehousing in the future.


References:

  • http://www.webopedia.com/TERM/U/unstructured_data.html
  • http://www.webopedia.com/TERM/S/structured_data.html
  • http://deloitte.wsj.com/cio/2013/07/17/the-future-of-data-warehouses-in-the-age-of-big-data/
  • https://www.youtube.com/watch?v=cmQomHNZW4g



Thursday, February 18, 2016

Business Intelligence in Retail Industry

The retail industry is a highly competitive market, with new players entering the fray every so often. With rapidly changing customer demands and mounting pressure from suppliers, effective management of information is imperative for making good business decisions.


Target Corporation is the second-largest discount retailer in the US, selling through storefronts as well as digital channels. It differentiates itself from other discount retailers such as Walmart and Kmart by offering more trend-forward, upscale merchandise at low prices. A typical Target store offers clothing, beauty products, electronics, health products, groceries, and home and hardware supplies.
Along with popular brands, Target also sells its own labels, such as Archer Farms, Market Pantry and Simply Balanced; Sutton & Dodge, its premium meat line; Threshold, its premium furniture line; and the electronics brand Trutech. Target also runs a separate e-commerce operation through the target.com domain. From its aesthetically designed storefronts to an award-winning iPhone app, Target aims to make its customers’ shopping experience memorable and unique.

With operations across 1,801 locations throughout the United States, Target has large volumes of data, from its supply chains to its warehouses and store operations. With this wealth of information at its disposal, Target wishes to leverage the data to develop actionable insights and deliver value. The company's CEO, Brian Cornell, wants to use Business Intelligence to gain insight into the various facets of his business. At a very basic level, he wants to see the company's performance from the perspective of the following parameters:
  • Customer: What types of customers typically prefer to shop at Target stores? What is their average annual income? How many customers are enrolled in reward programs, and how can they be segmented based on their spending? What is the age group of people shopping at the stores, and how many prefer storefronts versus online shopping?
  • Sales Channel: Which channels are most profitable for the company? While its standalone retail stores are its main source of income, Target is also interested in alternate channels such as online shopping, TV shopping networks and mall stores.
  • Promotional events/Discounts: How much do sales increase during special promotional events such as Black Friday or Cyber Monday? How much discount should be offered as part of the reward program, on which products, and for how long? These are some of the questions the CEO would like to look into.
  • Vendor/Supplier Payments: Which vendors and suppliers provide the best goods, and at what price? How have vendors performed in terms of timely delivery, and has the quality of their products been consistent?
  • Employee Payments: Employees include checkout clerks, warehouse workers, delivery workers, and managerial and administrative staff. The number of employees at each store, the customer-to-employee ratio, the salaries paid to employees at different locations and employees' working hours are some of the parameters the CEO would be interested in.

How Dimensional Modelling Helps

Dimensional modelling will help Target understand its revenue model better by giving valuable insight into the sales it generates. By defining dimensions such as store, product, date and promotion, the CEO will be able to understand sales at a fine level of granularity. For example, dimensional modelling will show what the daily or monthly sales were for a particular product at a particular store. The numeric measures in the fact tables, which typically hold a very large number of rows, give a clear measure of each business event, such as the dollar sales of a particular transaction by a customer. Additive and semi-additive facts can also be aggregated to give stakeholders the level of granularity they need.
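A minimal star schema along these lines can be sketched with Python's sqlite3 module. The tables, stores, products and figures below are hypothetical and are not Target's actual model. The first query answers the "this product, at this store, on this day" question at the finest grain. The second rolls the additive units_sold measure up to monthly totals by store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each sale.
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

-- Fact table: one row per product per store per day, with additive measures.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    units_sold   INTEGER,
    dollar_sales REAL
);

INSERT INTO dim_store   VALUES (1, 'Minneapolis', 'MN'), (2, 'Chicago', 'IL');
INSERT INTO dim_product VALUES (1, 'Coffee', 'Grocery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date    VALUES (20160201, '2016-02-01', '2016-02'), (20160202, '2016-02-02', '2016-02');
INSERT INTO fact_sales  VALUES
    (20160201, 1, 1, 10, 79.90), (20160202, 1, 1, 5, 39.95), (20160201, 2, 2, 2, 59.98);
""")

# Finest grain: daily sales of one product at one store.
daily = conn.execute("""
SELECT d.full_date, f.units_sold, f.dollar_sales
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_store s   ON f.store_key = s.store_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE s.city = 'Minneapolis' AND p.name = 'Coffee'
ORDER BY d.full_date
""").fetchall()

# Roll up the additive units_sold measure to monthly totals per store.
monthly = conn.execute("""
SELECT s.city, d.month, SUM(f.units_sold)
FROM fact_sales f
JOIN dim_date d  ON f.date_key = d.date_key
JOIN dim_store s ON f.store_key = s.store_key
GROUP BY s.city, d.month
ORDER BY s.city
""").fetchall()
print(daily)
print(monthly)
```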

Accumulating Snapshot Dimensional Model:
Considering that there are many milestones in the process of obtaining inventory, storing it in warehouses and delivering it to stores across the country, an accumulating snapshot would be the ideal type of dimensional model for understanding the sales revenue of Target Corporation.
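In an accumulating snapshot, each row represents one instance of a pipeline (here, a shipment of inventory) and has a date column for each milestone. The same row is updated as each milestone is reached, and lag measures, such as the number of days from vendor order to warehouse receipt, can be computed from it. The Python sketch below models one such row in memory. The milestone names and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical milestones in the inventory pipeline.
MILESTONES = ("ordered_from_vendor", "received_at_warehouse", "shipped_to_store", "on_shelf")


@dataclass
class InventorySnapshotRow:
    """One row per shipment; milestone dates are filled in as the shipment progresses."""
    shipment_id: str
    product_key: int
    store_key: int
    quantity: int
    milestone_dates: dict = field(default_factory=lambda: {m: None for m in MILESTONES})

    def record(self, milestone: str, when: date) -> None:
        # Accumulating snapshots UPDATE the existing row rather than inserting a new one.
        if milestone not in self.milestone_dates:
            raise ValueError(f"unknown milestone: {milestone}")
        self.milestone_dates[milestone] = when

    def lag_days(self, start: str, end: str) -> Optional[int]:
        """Days between two milestones, or None if either has not happened yet."""
        a, b = self.milestone_dates[start], self.milestone_dates[end]
        return (b - a).days if a and b else None


row = InventorySnapshotRow("SH-001", product_key=1, store_key=1, quantity=500)
row.record("ordered_from_vendor", date(2016, 2, 1))
row.record("received_at_warehouse", date(2016, 2, 8))
print(row.lag_days("ordered_from_vendor", "received_at_warehouse"))  # 7
print(row.lag_days("received_at_warehouse", "on_shelf"))             # None
```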
  
Below is a sample dimensional model that can be used for the retail industry. Some of the key dimensions are the product, payment and store dimensions.




Thus, through dimensional modelling for data warehousing, key stakeholders can get an idea of how Target is performing across the various parameters described above.

References:

  • http://flylib.com/books/en/4.65.1.20/1/
  • https://www2.microstrategy.com/download/files/whitepapers/open/Business-Intelligence-and-Retail.pdf
  • https://en.wikipedia.org/wiki/Target_Corporation


Thursday, February 4, 2016

Business Intelligence & Analysis Tools

In today’s fast-paced business world, organizations use Business Intelligence (BI) tools to make better business decisions. These tools enable business users and analysts to access company data in an efficient and intuitive manner. Many BI tools have come to market over the past few years. Some are traditional stand-alone tools, while others are enterprise-wide, decentralized tools. This blog looks into some of the leading, well-known BI tools used by organizations and compares them using a weighted scoring model built on key evaluation criteria.

From the various BI tools available in the market, 5 tools have been selected for evaluation. These are as follows:

1)   IBM Cognos:


IBM Cognos provides an integrated BI platform with good flexibility and the ability to query packages in an ad-hoc way. It is ideal for large, typically centrally managed deployments across an enterprise.
IBM Cognos Analytics also supports good collaboration across teams, organizations and ecosystems, and offers good scalability, governance, security and overall performance.

Unique selling proposition:

IBM Cognos Business Intelligence is a web-based, integrated business intelligence suite that provides tools ranging from reporting and analysis to scorecarding and event monitoring.

2)    SAS:

The SAS Business Intelligence and Analytics suite provides excellent predictive modeling with innovative visualization. Its interactive capabilities and diverse set of use cases cater to analytic requirements ranging from enterprise-level organizations to traditional users such as data scientists, IT developers and power users within those organizations.
License cost can sometimes be a concern for SAS customers, especially for wider deployments across an entire enterprise.

Unique selling proposition:

SAS meets both the enterprise needs of IT and the self-service needs of the business. This flexibility differentiates SAS from other vendors in the market.

3)   Microsoft BI and analytics:

The Microsoft BI and analytics suite supports different BI use cases and analytic requirements. Scalability is one of its major strengths: it ranks very well for data volume accessed, with an average data size of 62 TB, which is higher than any of its competitors. It also provides cloud-based services and collaboration capabilities through a subscription-based model using Office 365.

Microsoft BI and analytics deployments also have an average of 6,000 end users, which is higher than for the other products.

Unique selling proposition:

With a strong customer base, the solution provides built-in connectivity to on-premises SQL Server Analysis Services, which allows organizations to leverage existing data assets without having to move them to the cloud.

4)   Tableau

One of the most popular BI visualization tools, Tableau is the perceived market leader among BI tools. Tableau supports a wide range of data sources, from SQL to MDX, as well as a number of Hadoop distributions, Google BigQuery, Salesforce and Google Analytics.


Tableau also provides a robust mobile client. Visualizations are optimized for mobile devices, and touch-optimized controls make accessing and viewing data easy and intuitive.

Unique selling proposition:

Tableau's easy-to-use, visual data discovery capabilities enable business users and analysts to explore data even without extensive skills or training on any previous BI platform.

5)   QlikView

QlikView, much like Tableau, provides ease of use for both IT professionals and non-technical users. It provides a self-contained, tightly integrated development platform for building intuitive, interactive dashboard applications. It offers highly interactive dashboards, centralized dashboard application development for the enterprise, and features that support governed data discovery.

Unique selling proposition:

Its key strength is its ability to combine data from diverse sources such as Salesforce.com, Oracle, SQL Server, SAP and Excel. QlikView also ranks high for customer loyalty and performance, and is known for its product quality and overall market position.

Comparative Analysis

In order to compare these BI tools and see how they stand against each other, we have used a weighted scoring model to analyze their strengths and weaknesses. To do so, we identified the following evaluation criteria on which to base our comparison.

  1. Reporting: Reporting helps users convert data into actionable knowledge. Reports let users better understand the analysis they contain, and the underlying data it is based on, to support better decision-making. Basic features should include drilling down through reports and slice-and-dice OLAP analysis.
  2. Data Source Support and Integration: This criterion assesses which data sources the tool supports and its ability to provide user-defined measures, sets, groups and hierarchies.
  3. Analytic Dashboards & Content: The quality of the visualization and exploration capabilities, and the tool's ability to create highly interactive dashboards and content that others can easily consume.
  4. Platform: The platforms the tool can run on, and its ability to deliver content interactively in multiple computing environments, such as cloud, web-based, mobile or stand-alone deployments.
  5. Customer Support: This criterion covers the ways customers receive technical or account support. It compares the types of support available (e.g. email, phone, FAQ), the availability of user groups, service-level agreements, etc.
Based on these parameters, the weighted analysis table looks like this. Each evaluation criterion is given a weight based on its importance in any standard BI tool. Each vendor is scored on a scale of 1 to 5, where 1 = poor, 2 = average, 3 = good, 4 = excellent and 5 = outstanding.

Criteria                            | Weight | IBM Cognos | SAS | Microsoft | Tableau | QlikView
Reporting                           | 30%    | 4          | 3   | 3         | 4       | 4
Data Source Support and Integration | 20%    | 3          | 4   | 4         | 5       | 5
Analytic Dashboards                 | 30%    | 4          | 4   | 3         | 5       | 4
Platform                            | 10%    | 3          | 4   | 4         | 5       | 4
Customer Support                    | 10%    | 2          | 3   | 3         | 3       | 4
Score                               | 100%   | 3.5        | 3.6 | 3.3       | 4.5     | 4.2
Rank                                |        | 4          | 3   | 5         | 1       | 2

We scored each criterion based on how strong each tool is in that area. We also based our scoring on the Customer Survey Metrics from Gartner's Magic Quadrant for Business Intelligence and Analytics Platforms.

Once these scores were assigned, we calculated the weighted score for each of the five BI tools. Based on this analysis, Tableau has the highest score of 4.5, closely followed by QlikView with a score of 4.2.
SAS, IBM Cognos and Microsoft BI have similar overall scores (3.6, 3.5 and 3.3) and are ranked 3, 4 and 5 respectively.
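The weighted score for each tool is the sum of each criterion score multiplied by that criterion's weight. The short Python sketch below reproduces the Score and Rank rows of the comparison table from its criterion scores and weights. The dictionary layout is just one convenient way to hold the data.

```python
# Criterion weights from the comparison table (they must sum to 100%).
WEIGHTS = {
    "Reporting": 0.30,
    "Data Source Support and Integration": 0.20,
    "Analytic Dashboards": 0.30,
    "Platform": 0.10,
    "Customer Support": 0.10,
}

# Scores on the 1-5 scale, listed in the same criterion order as WEIGHTS.
SCORES = {
    "IBM Cognos": [4, 3, 4, 3, 2],
    "SAS":        [3, 4, 4, 4, 3],
    "Microsoft":  [3, 4, 3, 4, 3],
    "Tableau":    [4, 5, 5, 5, 3],
    "QlikView":   [4, 5, 4, 4, 4],
}


def weighted_score(scores, weights):
    """Sum of score x weight over all criteria."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    return sum(s * w for s, w in zip(scores, weights))


weights = list(WEIGHTS.values())
# Round to 2 decimals to hide floating-point noise such as 3.5000000000000004.
totals = {tool: round(weighted_score(s, weights), 2) for tool, s in SCORES.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, tool in enumerate(ranking, start=1):
    print(rank, tool, totals[tool])
# 1 Tableau 4.5
# 2 QlikView 4.2
# 3 SAS 3.6
# 4 IBM Cognos 3.5
# 5 Microsoft 3.3
```

For example, Tableau's score works out to 4(0.30) + 5(0.20) + 5(0.30) + 5(0.10) + 3(0.10) = 1.2 + 1.0 + 1.5 + 0.5 + 0.3 = 4.5.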

Final Verdict

Based on the selected features and the product scores given for each of these features we can see that Tableau is clearly the winner. As written in Gartner's Magic Quadrant report,
"Tableau has clearly defined the market in terms of data discovery, with a focus on helping people see and understand their data. It is currently the perceived market leader with most vendors viewing Tableau as the competitor they most want to be like and to beat"


References:
  1. http://www.gartner.com/
  2. https://www.wikipedia.org/
  3. https://www.yellowfinbi.com/YFCommunityNews-6-Key-Features-of-any-Business-Intelligence-Solution-100207
  4. https://www.yurbi.com/blog/straight-talk-review-of-tableau-software-the-pros-and-cons/