Apr 5, 2012

How to Simplify the Complexities of Big Data

Until recently, query response time determined a lot of what could or could not be analysed. For example, a company recently wanted to run ‘intuitive analytics’ on financial data for the month, a set of around 30-40 million GL records. ‘Intuitive’ refers to a financial analyst feeling that something is not right with a number and wanting to dive deeper into the database, with self-service capabilities, to check their assumptions. To quote the user: ‘I used to start a navigation step and I could go for coffee, talk to a friend and come back to still see the hourglass running. You must realize that it takes 20 to 40 navigation steps to find the truth.’ We tracked it, and each step took anywhere from 500 to 1,300 seconds to run on 30-40 million records. When the company found a financial fraud they wanted these reports to run faster, because the current performance was a disincentive for analysts to conduct such research. After acceleration the report ran in 4 seconds.
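
To put those numbers in perspective, here is a quick back-of-the-envelope calculation of what a single investigation costs an analyst before and after acceleration. It is only a sketch in Python, using nothing but the figures quoted above:

    # Analyst time for one 'intuitive analytics' investigation, using the
    # figures quoted above: 20-40 navigation steps, 500-1,300 seconds per
    # step before acceleration, roughly 4 seconds per step after.
    steps = (20, 40)                 # navigation steps to "find the truth"
    before_per_step = (500, 1_300)   # seconds per step on 30-40 million GL records
    after_per_step = 4               # seconds per step after acceleration

    before_hours = (steps[0] * before_per_step[0] / 3600,
                    steps[1] * before_per_step[1] / 3600)
    after_minutes = (steps[0] * after_per_step / 60,
                     steps[1] * after_per_step / 60)

    print(f"Before: {before_hours[0]:.1f} to {before_hours[1]:.1f} hours per investigation")
    print(f"After:  {after_minutes[0]:.1f} to {after_minutes[1]:.1f} minutes per investigation")

That is roughly 3 to 14 hours of waiting per investigation before acceleration, versus a couple of minutes after it. It is not hard to see why analysts stopped looking.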

·         Now remember we are only talking of finance reports, and only for a quarter. What if analysing the last 5 years of data would truly add value to the analysis? Suddenly we are talking of 700 million to a few billion records (a rough scaling sketch follows this list). What if we are talking of cost and profitability analysis, where we not only need to look at a few billion records but also run cost and profit calculations on them? And we are still talking only of financial data within our enterprise walls.

·         The world and global competitiveness are changing fast. While we talk of a few to tens of billions of records and the ability to handle them, our BI infrastructure today starts to groan and creak at 100 million records.

·         So we are faced with a decision dilemma. On one side we need to make globally comprehensive decisions to make our company more competitive. On the other hand, the data volumes for such analysis tend to be rather large. Add to this that competitors are starting to mine social network data to analyse emotional scores for the company’s products and services, along with industry-specific data now available from large subscription datasets. So from tens of terabytes we suddenly find ourselves entering data caverns in the thousands of terabytes and beyond, towards exabytes.
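
A rough scaling of those GL volumes, sketched in Python and assuming the 30-40 million records quoted earlier represent one reporting quarter, shows how quickly a 5-year window reaches the figures above:

    # Projecting the 30-40 million GL records per quarter across a 5-year window.
    # (If the set were monthly rather than quarterly, multiply the results by three.)
    records_per_quarter = (30e6, 40e6)
    quarters = 5 * 4

    low = records_per_quarter[0] * quarters
    high = records_per_quarter[1] * quarters
    print(f"5-year GL footprint: {low/1e9:.1f} to {high/1e9:.1f} billion records")
    # Cost and profitability allocations fan each GL line out into many derived
    # records, which is what pushes this into the multi-billion range.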

If we apply a scientific methodology to all this madness, here is an actionable roadmap with steps for advancement:

·         Step 1: Ensure you have a documented and communicated foundation of a global methodology in the form of a ‘Global BI Cookbook’ before you do anything else. Proceeding without this is a waste of time. Check 1: if your naming standards are ‘Z’ and ‘Y’ you have sub-optimal standards, and these need to be changed.
·         Step 2: Clean up your current BI / DW environment. Close to 90% of DWs have redundant objects, and close to 80% of DW environments are architected and modelled sub-optimally. The first step is to clean all this unnecessary baggage out of your existing DW/BI environment.
·         Step 3: Benchmark your BI environments and selectively push objects into optimal environments, i.e. never put all your cubes into BWA, but select the ones that should be there (an illustrative selection sketch follows this list).
·         Step 4: Conduct an alternatives analysis for each step, i.e. don’t simply replicate your Cognos reports on a new BOBJ 4.x deployment, don’t demand only WebI reports because you think they are great, and don’t assume BWA or HANA will solve all your problems; review Hadoop for social network data reduction.
·         Step 5: When thinking HANA, think BWA and you are closer to the truth than you realize.
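
For Step 3, the selection can be made mechanical rather than political. The sketch below is purely illustrative: the cube names, statistics and threshold are invented, and in practice the inputs would come from your BW query statistics, but it shows the kind of ranking that decides which cubes earn a place in BWA or HANA.

    # Hypothetical selection logic for Step 3: rank cubes by the total time users
    # spend waiting on them each day, and shortlist only the worst offenders for
    # acceleration. All names and numbers below are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class CubeStats:
        name: str
        queries_per_day: int       # how often users hit the cube
        avg_runtime_s: float       # average query runtime today

    def shortlist_for_accelerator(cubes, min_daily_wait_s=3_600):
        """Keep cubes whose users collectively wait more than an hour per day."""
        scored = sorted(((c.queries_per_day * c.avg_runtime_s, c) for c in cubes),
                        key=lambda pair: pair[0], reverse=True)
        return [c for wait, c in scored if wait >= min_daily_wait_s]

    cubes = [
        CubeStats("GL_ACTUALS",   queries_per_day=400, avg_runtime_s=90),
        CubeStats("COPA_DETAIL",  queries_per_day=60,  avg_runtime_s=600),
        CubeStats("HR_HEADCOUNT", queries_per_day=20,  avg_runtime_s=5),
    ]

    for cube in shortlist_for_accelerator(cubes):
        print(f"Candidate for acceleration: {cube.name}")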

Modern systems create unbelievable amounts of data. Modern companies planning to track their business with a 360° view will have to deal with extremely large volumes of data.

Let’s talk internal data. General ledger data can run anywhere from 40 to 500 million records, and for large companies it can run deeper. Costing and profitability analytics on these systems runs into many billions of records. Smart meters produce millions of records every ten or fifteen seconds, so analysing a quarter’s trends means billions of records. Retail outlets can create millions of records an hour. Airplane engine manufacturers want to conduct in-flight analytics so that spares are on site even before the planes have landed; this creates billions of records every hour. Homeland security needs to watch millions of passengers travelling into and out of the country; this is billions of records every hour.
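
The smart-meter case alone illustrates the scale. A quick sketch of the arithmetic, assuming (purely for illustration) one million readings arriving every 15 seconds:

    # Illustrative quarterly volume for a smart-meter feed, assuming one million
    # readings per 15-second interval; actual feed rates vary by utility.
    readings_per_interval = 1e6
    interval_s = 15
    days_in_quarter = 90

    intervals = days_in_quarter * 24 * 3600 / interval_s
    total = readings_per_interval * intervals
    print(f"Quarterly smart-meter volume: {total/1e9:.0f} billion records")

Even at that modest feed rate, a single quarter accumulates around 500 billion records, which is exactly the territory where conventional reporting infrastructure gives up.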

In almost all the above cases the speed of information is critical to the goals. The general trend has been to either filter the data or view very small subsets of it, because the technology was simply not there to analyse very large volumes. Now the technology has finally arrived: we can analyse 40 billion or even 200 billion records in less than 10 seconds. Such performance changes the competitive advantage of individuals, companies and nations. This speed enables trade promotions to be analysed on a daily basis, cost and profitability to be visualized as events happen, the status of every engine across the planet to be seen at this point in time, or a visual view of threats to the homeland with second-to-second accuracy.
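
It is worth pausing on what that claim implies for the underlying engine. A rough sketch, with the record width an assumption for illustration only:

    # What scanning 200 billion records in under 10 seconds implies.
    # The 100-byte record width is an assumed figure; columnar storage,
    # compression and partition pruning reduce the bytes actually touched.
    records = 200e9
    seconds = 10
    bytes_per_record = 100

    records_per_s = records / seconds
    scan_tb_per_s = records_per_s * bytes_per_record / 1e12
    print(f"Required rate: {records_per_s/1e9:.0f} billion records/s, "
          f"roughly {scan_tb_per_s:.0f} TB/s of raw data")

Sustaining tens of billions of records per second is only plausible with in-memory, columnar, massively parallel engines, which is precisely why BWA and HANA (and Hadoop for the unstructured side) feature in the roadmap above.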

The digital economy is already here, and now we can harness its power.

Lead this revolution with the following roadmap:

·         Most critical: Build global methodologies and standards for harmonized data management
·         Collect and harness internal data for analysis
·         Then move to collecting many types of structured and unstructured data from outside the organization
·         Measure competitive advantage, for example through emotional analysis of a new product launch or behaviour predictability
·         Use this knowledge to create proactive marketing targeted to synchronize with global emotions
