Jul 1, 2012

Before leaping into Big Data, ensure your data is solid

Proactive executives and managers need the latest information to drive intelligent decisions that assure business success. Decisions that are better informed than the competition's mean more revenue, less risk, decreased cost, and improved operational control for business agility and global competitiveness. In our fast-paced, technology-driven business processes, organizations continually struggle with growing data volumes and complexity just to use their own data efficiently. Now that we are bombarded with 'big data', the issue of global harmonization of data quality becomes all the more relevant.

Constrained by a globally competitive, flattening world and new data complexities, COOs, IT managers and business consultants are actually asking for fewer analytics and reports while simultaneously expecting easier access to smarter, faster, decision-promoting informatics. They need information that is highly visual, surgically accurate, of extremely high quality, up to date and as close to true real time as possible, personalized and secure. Most companies want their analytics on the go, i.e. on their smartphones and tablets, available for instant access. Information consumers want their information literally at their fingertips no matter where they are. There is a growing need to untether business information from the confines of the desk or cubicle. In a globally interconnected world all this is now possible, but only if we scientifically plan to do it right the first time.
Firstly: Rewind your memory to each company where you have implemented DW, BI or MDM and we may recognize that we have probably never seen a company that runs all of its business from a single source system. I have personally seen large companies with as many as 2,500-plus global data source systems and as many as 1,500 reporting applications across the corporation. Medium-sized global enterprises could have as many as 1,000 data sources globally and as many as 500 reporting applications. Even the smallest ERP company would have 50 or so data sources and as many as 20 to 30 reporting applications. A flat file is a data source, as is a vendor input from their system. If any report is generated from an external DW or a PC, that becomes a reporting application.
Note: Scientific research has clearly demonstrated that the greatest risk to data quality is at the point of exchange or transformation.
Secondly: In almost every one of these companies it has been noticed that each of their master data elements, like customer, vendor or product, may contain over 4,000 fields in the system of original records. For example, the ECC Customer Master table contains more than 4,500 usable fields, as does the material master data table.
Focus 1: Let's look at Product, as it is the most critical interface between value chain partners, the company, its customers and its vendors. Each product may contain numerous attributes like height, length, packing and packaging dimensions, weights and storage requirements. In some industries these could be as few as 30, in others as numerous as 350. Each of these attributes is a data element, an independent field in some system, preferably a single system of record that governs data quality across the enterprise. Each of these attributes is part of the master data entity. If your company is a pipe manufacturer the entities could be as few as 10, if retail or wholesale they could be close to 100, and if pharmaceutical this number could be close to 500.
Focus 2: Even in our current state, i.e. when we are looking at data within the walls of the enterprise and where our largest data warehouses are in the 50-terabyte range, we are barely able to keep our global data quality under control. In most cases each of the disparate data source systems could have its own interpretation of a data element or a KPI, or manage it like a local asset with little regard for global compliance. We continue to see data quality issues even when all the data is so much in our control.
Companies have barely been able to grasp their data quality issues within the walls of their enterprise, so opening the floodgates to 10 exabytes and above, for example by accessing Facebook data prematurely or through a process that lacks scientific de-cluttering, may logically cloud the muddy waters all the more. We may traverse from our data corruption environment right into a data anarchy situation.
Focus 3: Add to this mix the fact that business today is dominated by acquisitions and new product launches, and the proactive and reactive process of global data governance becomes all the more imperative.
As stated by Claude Viman, the global Enterprise Data Steward for J&J, "Proactive is always better than reactive." He continues, "However, a strong data governance process has both" – but only if planned in a scientific manner.
The impact of bad data is more than familiar to all companies, especially in the reports and analytics being churned out to business users and decision makers, which we view as DW or BI. However, we must never forget that BI and DW are technologies to be leveraged to enhance the business decisions and operational performance of enterprises, and are never an end in themselves. According to Experian QAS research, close to 20 percent of customer contact data at most companies remains flawed due to data entry errors, and 33% of the data becomes naturally flawed or outdated within a year. Such inaccuracies in just the customer data can sway close to 18% of corporate budgets and forecasts.
Now, as SAP customers expand their analytics, customer and product lines across and outside their physical boundaries, data foundations and data governance need to become a much higher priority for customers to ensure information accuracy. As Dan Everett rightly stated in Forbes, in EIM solution marketing information governance is the elephant in the room. He continues, "To realize business value from big data, companies need to have strong information governance, and few people seem to be talking about this." This translates into the fact that despite the big elephant standing in our BI rooms, we seem to pretend it either does not exist or we simply do not know where to start.
Viman from J&J has advice for this dilemma: "Unfortunately, not too many companies realize the importance of data governance in advance, and then they have to learn it the hard way."

So what is the difference between Data and Information Governance?

While it is clear that data is the foundation of all information, and we have all too often heard 'Garbage in, garbage out', these are simply kindergarten statements for modern BI environments and systems that often merge data from hundreds of sources for corporate analytics. There is Master Data Management, rules and regulations for data quality, localized and global TQM (Total Quality Management) and the whole of IDCM (Information Demand and Consumption Management), all of which together constitute the base for data governance. On top of this pyramid of data foundation stands information, which needs to be governed on its own account.

The question that must be asked is, 'In an environment where all the base data is 100% clean, can we still have erroneous reports?' and the answer is a resounding yes. This is because many information errors occur at the transformation layers, and unless there is a high degree of information governance there will be errors in information. Just as an example, if there are no naming standards each developer could churn out their own interpretation of a KPI or metric, as the sketch below illustrates. If there is no change control in place, a new developer could alter an existing KPI or metric for a new request while an older user continues to assume that the numbers represent the older interpretation of the information. Each of these examples leads to information error.
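To make the point concrete, here is a minimal, hypothetical sketch in Python: two developers compute "the same" gross-margin KPI from identical, perfectly clean rows, but without a shared definition one of them silently includes freight in cost, and the published numbers diverge. The figures and field names are invented purely for illustration.

```python
# Hypothetical sketch: clean data, two undocumented KPI definitions, two answers.

clean_rows = [
    # (product, revenue, cost_of_goods, freight)
    ("A", 1000.0, 600.0, 40.0),
    ("B", 2500.0, 1500.0, 90.0),
]

def gross_margin_v1(rows):
    """Developer 1: margin excludes freight."""
    revenue = sum(r[1] for r in rows)
    cogs = sum(r[2] for r in rows)
    return (revenue - cogs) / revenue

def gross_margin_v2(rows):
    """Developer 2: silently redefined the KPI to include freight in cost."""
    revenue = sum(r[1] for r in rows)
    cogs = sum(r[2] + r[3] for r in rows)
    return (revenue - cogs) / revenue

print(f"Gross margin, definition 1: {gross_margin_v1(clean_rows):.1%}")  # 40.0%
print(f"Gross margin, definition 2: {gross_margin_v2(clean_rows):.1%}")  # 36.3%
```

Both answers are computed from flawless base data; the error lives entirely in the ungoverned transformation.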

Information, it has to be realized, is not a supply management process but a demand and consumption management one.

Data governance is the convergence of data quality standards and guidelines with rule-based governance: clear definitions of what type of data is being accessed from which systems, what the DQS (Data Quality Score) of each data source is, which information elements need to be stored, what the true system of record is, whether global data has been physically or logically cleansed, which systems will store what kind of data, and how data exchange will be accomplished in order to assure that no 'terrorist' data elements enter the core information repository, along with all the security mechanisms in place. A simple roll-up along these lines is sketched below. Data governance is the foundation for information governance, as without strict rule-based data governance guidelines our information will always be erroneous. The key to data governance is managing master data and its attributes.
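The article only names the DQS concept; as an illustration, here is a minimal sketch of one way such a score per source system could be rolled up, using assumed quality dimensions (completeness, validity, timeliness) and assumed weights rather than any standard formula.

```python
# Minimal, assumed DQS roll-up per source system; dimensions and weights are illustrative.

from dataclasses import dataclass

@dataclass
class SourceProfile:
    name: str
    completeness: float  # share of mandatory fields populated (0..1)
    validity: float      # share of values passing format/range rules (0..1)
    timeliness: float    # share of records updated within the agreed window (0..1)

WEIGHTS = {"completeness": 0.4, "validity": 0.4, "timeliness": 0.2}

def dqs(p: SourceProfile) -> float:
    """Weighted composite score on a 0-100 scale."""
    score = (WEIGHTS["completeness"] * p.completeness
             + WEIGHTS["validity"] * p.validity
             + WEIGHTS["timeliness"] * p.timeliness)
    return round(100 * score, 1)

sources = [
    SourceProfile("ECC customer master", 0.98, 0.95, 0.90),
    SourceProfile("Legacy vendor flat file", 0.75, 0.60, 0.40),
]

for s in sources:
    print(f"{s.name}: DQS = {dqs(s)}")  # 95.2 and 62.0 respectively
```

Even a crude score like this makes the point: once each source carries a number, rules such as "no source below DQS 80 feeds the core repository" become enforceable rather than aspirational.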

Who should own data in an enterprise?
One of the frustrating problems in any organization is assigning ownership for data quality. According to SOX definitions, business owns data definitions and data guidelines, as they know best what each data element represents and how every transformation must be conducted.

Every time IT owns data in isolation from business participants, it leads to a maddening game of "Whose information definition is right?" at meetings. The larger the enterprise, the more maddening this delta becomes, until we lose a global definition of any business attribute.

In almost every meeting, when we ask respondents "How is your data quality?" we receive a consistent "Fine, I guess." If we follow this question with "How was your last BI initiative?" it often leads to "It was an IT success, but a business failure" in varying flavors and interpretations. All of this is a global, slowly escalating time bomb.

The final solution is a scientific TDQM (Total Data Quality Management) initiative that consists of business users who understand business needs and definitions, SMEs (Subject Matter Experts) who understand the configurations in the source systems, and Master Data Controllers whose sole job is to manage global master data and change control for all master data elements across the enterprise.

Just as an example, companies like Johnson and Johnson have 16 full-time employees dedicated to enterprise master data maintenance. But the overall accountability for data must lie within the TDQM group as defined above.

Part of the TDQM process should encompass acquisitions. Typically the new company has to be integrated, and products normally start shipping out of the gate within 2 to 4 months. During this time each product, which may have anywhere from 100 to 400 attributes, has to be integrated into the operational systems. From an executive and management perspective, each of these products has to be aligned or merged into product and information groups and existing global analytics.

Overall, the TDQM group must also deploy six-sigma checks and reporting to ensure that the level of data accuracy across the enterprise is maintained at 99% and above. A minimal accuracy check along these lines is sketched below.
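As an illustration of what such a check might look like, here is a minimal sketch that samples master data records, counts defective field values, and reports the accuracy rate and defects per million opportunities (DPMO) against a 99% floor. The sample sizes and defect counts are assumed figures, not measurements from the article.

```python
# Assumed six-sigma-style accuracy check over a sample of master data records.

def accuracy_report(records_sampled: int, fields_per_record: int, defects_found: int,
                    accuracy_floor: float = 0.99) -> dict:
    """Treat every sampled field value as one defect opportunity and score the sample."""
    opportunities = records_sampled * fields_per_record
    accuracy = 1 - defects_found / opportunities
    dpmo = 1_000_000 * defects_found / opportunities
    return {
        "opportunities": opportunities,
        "accuracy": round(accuracy, 4),
        "dpmo": round(dpmo, 1),
        "meets_floor": accuracy >= accuracy_floor,
    }

# Example: 5,000 sampled product records, 150 governed attributes each, 9,300 bad values.
print(accuracy_report(5_000, 150, 9_300))
# -> {'opportunities': 750000, 'accuracy': 0.9876, 'dpmo': 12400.0, 'meets_floor': False}
```

Published on a regular cadence, a report like this turns "Fine, I guess" into a number the TDQM group can act on.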

If my information is bad, why are we accelerating it?

This was a question asked at a meeting about a BI deployment where the fundamental reports were not meeting business needs and expectations, and the SI was recommending a BW Accelerator to speed up query response. One of the business stakeholders asked the critical question: "If the information does not meet our requirements, why are we wasting all this effort making bad data and information more efficient?" This is a question that organizations must consistently ask themselves before taking a leap of faith into newer technologies with assumed benefits that later turn into IT successes and business failures to varying degrees.

Now here comes HANA

As with all other information delivery systems and applications, the basic foundation of data remains critical. The rule of 'Garbage in, garbage out' still applies.

If HANA is deployed in a scientific and planned manner, its advantages can be many.

1. Multi Source: HANA allows mixing of data from more than just your SAP BW. Unlike the BW Accelerator (BWA), which could only accelerate BW queries, HANA accelerates all of the data and transformations.

2. Direct Source ELT: A standalone HANA runs off direct extracts from source tables, e.g. the ECC CO-PA tables. In all such cases the issues of data quality and data redundancy are eliminated instantly. In traditional BI and DW environments we often ended up with multiple versions of the same data, and each point of data exchange and transformation represents a potential DQ failure point. By eliminating the multiplicity of data copies, HANA shrinks the DQ exposure in proportion to the copies required by older models. For example, if we take a single G/L account: copy 1 is in the transaction; copy 2 is in the ledger; copy 3 is in the extractor; copy 4 is in the PSA; copy 5 is in the raw G/L DSO; copy 6 is in a reporting DSO (freight costs, for example); copy 7 is in the InfoCube; and so on. Each of these copies is technically subject to transformations and interpretations – or DQ compromises. With HANA we can potentially eliminate DQ issues once and for all. The only control point is the modeling and transformation tool in HANA – by simply maintaining that, we assure the highest data quality. The rough arithmetic of this exposure is sketched after this list.

3. BW on HANA: Even with BW on HANA, the advantage is that we can either accelerate all the reports and data on our current BW, or use a proprietary 'HANA Safe Passage' methodology to deploy HANA only for those BW objects that will truly benefit from HANA acceleration and true real-time analytics – select InfoProviders only.

4. Changing DW Fundamentals: The big question is whether HANA and similar technologies have the potential to fundamentally eliminate traditional DW concepts, as for the first time SAP allows transformations and models to be created directly in the HANA database. This is a 'net new' privilege that most legacy technocrats have not yet fully wrapped their methodologies around. The impact of this single functionality is tremendous, to say the least.
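Returning to point 2 above, the rough arithmetic of copy-induced DQ exposure can be sketched as follows: if each copy or transformation point carries some small, independent chance of introducing a defect, the chance that a value arrives compromised compounds with the number of copies. The 1% per-hop defect rate below is an assumed figure for illustration only.

```python
# Assumed per-hop defect rate; the point is the compounding, not the specific number.

def p_compromised(copy_points: int, p_defect_per_point: float = 0.01) -> float:
    """Probability that at least one copy/transformation point corrupts the value."""
    return 1 - (1 - p_defect_per_point) ** copy_points

# Traditional layered flow for one G/L amount (transaction, ledger, extractor, PSA,
# raw DSO, reporting DSO, InfoCube) versus a single modeled view on the source table.
for label, hops in [("7-copy layered DW flow", 7), ("1 modeled view on source", 1)]:
    print(f"{label}: {p_compromised(hops):.1%} chance of a DQ compromise")
# -> 7-copy layered DW flow: 6.8% chance of a DQ compromise
# -> 1 modeled view on source: 1.0% chance of a DQ compromise
```

Under these assumptions, collapsing seven copy points into one modeled view cuts the exposure by a factor of roughly seven, which is the intuition behind the Direct Source ELT argument.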

How big is HANA, am I the bleeding-edge customer?

No, HANA is huge. It is just barely a year old and already boasts over 358 customers, 159 implementations, 65,000 competitive users getting their reports faster, and has already crossed $250 million in revenues.

On the application side there are already over 33 'Powered by HANA' applications and many RDS (Rapid Deployment Solutions) that can be deployed in a few weeks. SAP is targeting having the whole Business Suite portfolio enabled on HANA by the end of 2012.

According to Bill McDermott, SAP expects over 1,000 customers to be on HANA by the end of 2012, directly impacting revenue growth for SAP and its global HANA partners. Bernstein Research predicts that by 2015 HANA could be a $4.4 billion market for SAP and its partners. As of now there are over 1,800 HANA-trained and certified consultants – a number that continues to grow and will keep growing as HANA moves from being a standalone appliance to an SAP database platform for all SAP applications. By 2015 HANA will have permeated all facets of SAP technology landscapes, database management and business processes.

So, as a customer, one is fairly safe to start considering HANA as a possible future option. The HANA methodology should be business-led and undertaken in a scientifically planned manner that is quality-enhancing and cost-mitigating at the same time.

It is no longer a question of whether your company will be migrating to HANA; it is seemingly becoming a question of when.
