Wednesday, March 30, 2016

Presentation and Visualization Methods

With the amount of data that is now stored and processed in databases, an effective way of presenting and visualizing that data has become equally important. It is needed to extract meaningful information from the data that we store. Most business intelligence tools today provide features that business owners can use to create useful metrics, reports and dashboards. This lets executives see analytics presented visually, so that they can grasp useful insights and identify new patterns. In this blog post, I would like to use the following three vignettes to demonstrate some of the commonly used types of visualization. The three industries that I have selected are:

  • Transportation
  • E-commerce
  • Financial Services

Transportation:

Transportation industry data provides some interesting opportunities for visualizations that convey what numbers alone cannot. Take the example of a visualization made by BUCS Analytics, a Kansas City-based performance consultancy. They used Spotfire to create the following visualization:

Image Credits: data-informed.com
This type of graph is called a scatter plot. It is useful when a large number of data points has to be represented against a measure such as distance. The graph shown in this image compares the cost structure of owning trucks versus buying that capacity, by distance. Visualizing the data enabled the company’s management to see that greater profit margins were possible on routes run by trucks they owned when the distance was less than 1,500 miles.

E-commerce:

The e-commerce industry is unique in that, unlike owners of brick-and-mortar stores, e-commerce business owners cannot observe their customers directly. Hence it is important to track, analyze, and view customer behavior through different BI tools. Consider the image shown below as a typical dashboard for an e-commerce business owner:


Image Credits: hbr.org
This image shows the trend in sales across various dimensions such as day of the week, crowd size and the activity rate in sales. The dashboard uses both bar graphs and line charts, which give a clear representation of the data. It neatly compares the attribute values of the dimensions that we are interested in tracking. For example, the dashboard can be used to track demographic data and customer engagement, to enhance the customer experience, and even to make predictive analysis possible.

Financial Services:

Visualization tools can be used to bring transparency and clarity to financial data. Day-to-day as well as historical transactions can be organized into interactive visualizations and reports. The image below shows how transactions in financial organizations can be neatly categorized to reveal meaningful trends.

Image Credit: finance.strands.com
The visualization shown above is a bubble chart, which is commonly used to create colorful graphs with overlapping bubbles. A bubble chart is a variation of a scatter chart in which the data points are replaced with bubbles, and an additional dimension of the data is represented by the size of the bubbles. The graph shown above helps users track income and expenses in a more engaging and intuitive manner. The view can also be customized by time period, by account or by category level.
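To make the "size as an extra dimension" idea concrete, here is a minimal sketch of how a value could be mapped to a bubble radius. The expense categories and amounts are hypothetical, and the function name `bubble_radii` is only an illustration, not something taken from the tool shown above. The sketch scales the bubble's area (rather than its radius) with the value, which is one common choice so that large values do not look exaggerated.

```python
import math


def bubble_radii(values, max_radius=40.0):
    """Map non-negative values to radii so that bubble AREA is proportional to value."""
    largest = max(values)
    if largest <= 0:
        return [0.0 for _ in values]
    # Area grows with radius squared, so take the square root of the value ratio.
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical monthly expenses, used only for illustration.
expenses = {"Rent": 1200.0, "Groceries": 300.0, "Transport": 75.0}
radii = dict(zip(expenses, bubble_radii(list(expenses.values()))))

for category, radius in radii.items():
    print(f"{category}: radius {radius:.1f}")
```

With these made-up numbers, a category that is a quarter of the largest expense gets a bubble with half the radius, so its area is a quarter as large.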

Thus, as we have seen, business intelligence tools provide various types of graphs for effective visualization of data. The information that can be derived from these visualizations enables business users to leverage their data in a way that raw numbers cannot. Hence presentation and visualization of data is an important feature of any BI tool.

References:

  1. http://data-informed.com/visualizations-from-erp-data-brings-clarity-to-decision-makers/
  2. http://data-informed.com/optimize-ecommerce-analytics-visualization/
  3. http://finance.strands.com/products/pfm-personal-financial-management/


Wednesday, March 2, 2016

Big Unstructured Data vs. Structured Relational Data

As we know by now, data is by definition any information that has been translated into a form that is convenient to move and process. Data can exist in many forms, such as numbers or text on a piece of paper, or bits and bytes stored in computer memory. In this blog post we try to understand how data is classified based on how it is stored in a computer system.



Structured Data

We can think of structured data as any data that resides in fixed fields within a file or record. For example, data stored in an Excel sheet with rows and columns, or data systematically stored in a relational database, is structured data.

As a basic definition, we can say that structured data is data stored with a high degree of organization, so that it can be easily accessed, searched or operated upon. Typically this type of data is stored in relational databases with ordered rows and columns and fixed data types such as varchar (for alphanumeric values) and boolean. For this, a data model must first be defined based on the business requirements, followed by the data types, data constraints such as referential integrity, and metadata for the relational database. Data is most commonly retrieved and managed using Structured Query Language (SQL). Operations such as insert, update and delete are performed on structured data using SQL.
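As a small illustration of the points above, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The `customer` and `purchase` tables, their columns and the sample values are all hypothetical. It defines fixed column types and a referential integrity constraint, then runs insert, update, select and delete statements.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical data model: fixed columns with declared data types.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name VARCHAR(50) NOT NULL,"
    " is_active BOOLEAN NOT NULL)"
)
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL)"
)

# Insert and update rows using SQL.
conn.execute("INSERT INTO customer (id, name, is_active) VALUES (?, ?, ?)", (1, "Asha", True))
conn.execute("INSERT INTO purchase (id, customer_id, amount) VALUES (?, ?, ?)", (1, 1, 25.0))
conn.execute("UPDATE purchase SET amount = ? WHERE id = ?", (30.0, 1))

# Referential integrity: a purchase for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase (id, customer_id, amount) VALUES (?, ?, ?)", (2, 99, 10.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Retrieve data by joining the two tables.
rows = conn.execute(
    "SELECT c.name, p.amount FROM customer c JOIN purchase p ON p.customer_id = c.id"
).fetchall()

# Delete a row and count what is left.
conn.execute("DELETE FROM purchase WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
```

Because every row has the same columns and types, the database can check constraints and answer queries without any extra preparation of the data.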

Unstructured Data

This type of data generally includes text and multimedia such as images, audio, video and web pages. It is usually called unstructured data because it typically does not fit into a conventional database. It is estimated that 80 to 90% of the data in an organization is unstructured, and with the advent of advanced computing the volume of unstructured data keeps growing.

Unstructured data in its original form does not readily give meaningful insight. It first needs to be extracted and prepared for processing before it can yield meaningful information for an organization.
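As a rough sketch of what "extract and prepare" can mean in practice, the example below pulls a few fields out of free-form text with regular expressions. The sample messages, the patterns and the `extract_record` helper are hypothetical and deliberately simple; real text extraction usually needs far more robust handling.

```python
import re

# Hypothetical patterns: an order number and a dollar amount inside free text.
ORDER_PATTERN = re.compile(r"order\s+#?(\d+)", re.IGNORECASE)
AMOUNT_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_record(text):
    """Turn one free-text message into a record with fixed fields."""
    order = ORDER_PATTERN.search(text)
    amount = AMOUNT_PATTERN.search(text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "amount": float(amount.group(1)) if amount else None,
    }


# Made-up customer messages, used only for illustration.
messages = [
    "Hi, my order #1042 arrived damaged. I paid $59.99 for it.",
    "Can someone call me back about Order 77?",
]
records = [extract_record(m) for m in messages]

for record in records:
    print(record)
```

Fields that cannot be found are left as `None`, so each message still produces a record with the same shape, which is what makes the result easy to load into a table.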

Here's a simple table that shows the difference between structured and unstructured data:



Until recently, organizations were overwhelmed by the large volume of unstructured data. Lately, tools and techniques have emerged to manage and organize this data efficiently. They can be broadly classified as follows:
  • Big data: Tools like Hadoop help in processing and structuring data that is extremely complex and volatile in nature.
  • Business Intelligence: Tools such as IBM Watson help organizations analyze data and visualize information through dashboards.
  • Search & Indexing tools: These tools help in retrieving useful information from unstructured files such as web pages, Word documents etc.
From a data warehousing point of view, some of the common types of data present in any large organization are as follows:
  1. Metadata: Simply put, metadata is data about data. It contains the information required for extraction, transformation and loading of data from various source systems into the data warehouse.
  2. Historic data: Organizations have large volumes of historic data that can be useful in providing insights into various aspects of the company.
  3. Derived data: Derived data is obtained from existing information through mathematical operations or data transformations. Such data can be generated at run time and can sometimes also be stored as part of the database schema.
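The two options for derived data mentioned in the list above can be sketched as follows, again using Python's built-in sqlite3 module. The `sales` table and its figures are hypothetical. Profit is derived from revenue and cost, first at run time inside a query and then as a column stored in the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical base data: revenue and cost per region.
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales (region, revenue, cost) VALUES (?, ?, ?)",
    [("North", 200.0, 150.0), ("South", 100.0, 90.0)],
)

# Option 1: derive the value at run time, inside the query itself.
at_run_time = conn.execute(
    "SELECT region, revenue - cost AS profit FROM sales ORDER BY region"
).fetchall()

# Option 2: store the derived value as part of the schema.
conn.execute("ALTER TABLE sales ADD COLUMN profit REAL")
conn.execute("UPDATE sales SET profit = revenue - cost")
stored = conn.execute("SELECT region, profit FROM sales ORDER BY region").fetchall()

print(at_run_time)
print(stored)
```

Deriving at run time always reflects the latest base data, while a stored column is quicker to read but has to be refreshed whenever revenue or cost changes.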

Data Warehousing

Data warehousing allows an organization to store large data sets from different departments or areas of the organization together in one place. Data is extracted from different OLTP applications and other sources, and can then be used by analytical applications and user queries. This helps to collect and process data that can later be presented to business users.
Here's a short video explaining the whats and hows of data warehousing.





Limitations of Data Warehousing

While data warehousing is a very useful method of effectively managing large volumes of data from different sources, it has some limitations:
  • Ownership of data can be lost once it moves from the original data source into the data warehousing system. Security, privacy and accountability of the data can become a concern in this case.
  • The initial implementation of a data warehouse usually takes a long time and comes with high costs.
  • It becomes difficult to incorporate changes in data types, data source schemas, indexes and queries once the data warehouse is completely set up.
  • Data warehousing has a high maintenance cost. This is because any change in the business requirements or source data leads to significant changes in the data warehousing process, which increases the cost.


Future of Data Warehousing 

Data warehousing has been around for decades now, and it is evolving into an analytic warehouse. More and more vendors are coming up with data warehouses that have advanced statistical capabilities for performing analytics and forecasting. Emerging platforms such as Hadoop act as distributed storage and processing frameworks that make it easier to process large volumes of unstructured data. With cloud computing and mobile computing already popular, and with the emergence of the internet of things, the volume of data is expected to grow rapidly. Processing data and performing data analytics in the cloud is predicted to become the norm, which will make warehousing simpler and more convenient. Compared with traditional on-premises offerings, cloud-based data warehousing is expected to significantly lower both infrastructure costs and management overhead. We can also expect structured and unstructured data from back end systems to be brought into the data warehouse in real time and near-real time.
The ability to incorporate big data techniques, analytics technologies, back end systems, and traditional data warehouses will potentially change the economics of data warehousing in the future.


References:

  • http://www.webopedia.com/TERM/U/unstructured_data.html
  • http://www.webopedia.com/TERM/S/structured_data.html
  • deloitte.wsj.com/cio/2013/07/17/the-future-of-data-warehouses-in-the-age-of-big-data/
  • https://www.youtube.com/watch?v=cmQomHNZW4g