Jun 21, 2015

Data Scientist Demystified - part 3



Part 1 of this discussion covered what makes a data scientist, and part 2 covered the critical attributes of a data scientist. This is the third blog on the topic, as we learn more and mature in our understanding of what works and what to avoid.

Most corporations continue to be on the hunt for good data scientists, the modern ‘unicorns’ of Big Data. Some people have not heard of them, others claim to have read about them in articles, others seem to have seen them at Google, LinkedIn and Facebook, while some have worked with them. Either way, the internet is filled with ‘Seeking Data Scientist’ discussions. The reason is that data science is the most sought-after skill in the new digital economy driven by data, where companies like Google and Amazon are the new digital alchemists, turning data into gold. Alchemy today is no longer a chemical concept but a digital reality.

One of the biggest changes we have recently seen in LinkedIn profiles is the addition of the words ‘Data Scientist’. Suddenly we are finding a bunch of members who morphed into data scientists overnight by adding the title to their profile. It’s simply a demand-supply reality. Because current demand is exceptionally high, with salaries anywhere from 30% to 100% higher than normal, it becomes beneficial to attach the tag of ‘data scientist’ to your profile in order to get a bunch of emails from the other side of the world asking if you are looking for a job, many of them from recruiters who may themselves not understand the basics of data science. So the first task is to filter out the nouveau data scientists from the true ones. Data scientists are thus becoming not only more and more rare but also mysterious creatures on the periphery of our needs. And because everyone wants to compete in the new data science economy and mine their own gold, this is a rarefied but still crowded arena where employers have to tread very carefully.

Big Data has exactly the same issues that the world of BI had in the past. In BI we had the technical folks who still believe that technology can answer all the questions; Gartner places the success rate of their methodology at 30%. We also had our business value architects, who sought business benefits above all, an approach recommended by Gartner since 2009. Similarly, data scientists also come in two varieties.
The first and most common variety is the technocrats: wizards of statistics, math and big data, as described in my earlier blog on the critical attributes of a data scientist, but with little actual experience of business or its needs. This group understands math, statistics and data but still needs experience in applied sciences. They currently lack the business benefit and end-user UI parts of data science.
The second is the Business Solution Architect: people who understand business and business needs along with BI, data science and Big Data technology. They become the critical glue between (a) the business stakeholders, with their needs and benefits, and (b) the technical data scientists, who are akin to the BI developers of the past: people who can build all the algorithms and code but lack adequate experience turning data into actionable gold.

Big data projects are like cavernous black holes, with so much data that traditional modelers could easily hyperventilate just grasping the magnitude of the Volume, Variety and Velocity of these domains. Traditional SAP BW developers are used to a few terabytes of data, and the largest SAP BW I have personally worked with was a 107 TB single instance; the world of big data dwells in the petabytes and zettabytes. The NSA datacenter is starting to use words like yottabytes today. This is like comparing a tablespoon of water with a swimming pool. In this environment the data scientist is envisioned as a wizard of the data mists. Where everybody else sees a humongous, senseless, misty cloud of data, the data scientist can wave an algorithmic wand, combine it with statistical chants, and presto: the mist clears and you get a glimpse of true business benefits. Without a data scientist we could be lost in the misty black hole of big data forever. However, without a Business Value Architect your projects could produce a lot of islands of information that your business does not need (70%, by Gartner’s reports).

Over the last three years of working with a lot of data scientists, we have come up with three definitions of the data scientist. The first is the Silicon Valley data scientist: this is where some of the world’s greatest data scientists are reputed to exist, in companies like Palantir, Google, Facebook, Target, etc. The second is the technocratic data scientist: these are analysts with degrees in math and statistics and all the algorithmic knowledge under their belt. The third is the Business Benefit data scientist: these, in our findings, are the true unicorns. They are people who have solid BI, KPI and metrics knowledge but, more importantly, have dealt with very large volumes of data and understand the business benefit side of data optimization more than the technology variants. They are the data scientists who sit at the cusp between the second group and the business stakeholders. Our research indicates that data scientists and technology leaders alone cannot deliver high business value, and neither can the Business Benefit experts. This is the time for co-innovation and teaming to consistently deliver exceptional value in the form of digital gold to your information consumers.

When we search for what the world is looking for, we see the following rankings in a word cloud:


This word cloud is made by analyzing recent job postings and what companies are looking for in data scientists. What I see clearly missing, though we see a glimpse of it at the top, is Business Benefits. It is just as I have been writing since 2009: there are two ways to build BI solutions, [1] the technocratic way, with a firm belief that technology alone can answer all business questions; and [2] business-benefit-focused architecture and design, leveraging internal company resources who know the needs, the data and the business relevance to research, discover, model, filter and analyze.
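
As a rough illustration of how rankings like those in such a word cloud could be derived, the Python sketch below counts how many job postings mention each skill term. The postings, the skill list and the function name `skill_frequencies` are hypothetical placeholders, not the data or method behind the word cloud discussed here; a real analysis would need a far larger corpus and better text normalization.

```python
import re
from collections import Counter

# Hypothetical skill terms to look for; not taken from the original analysis.
SKILLS = ["python", "sql", "hadoop", "statistics", "machine learning", "business"]


def skill_frequencies(postings, skills=SKILLS):
    """Count the number of postings that mention each skill term."""
    counts = Counter()
    for text in postings:
        lowered = text.lower()
        for skill in skills:
            # Match whole words or phrases only, so "sql" does not match "nosql".
            if re.search(r"\b" + re.escape(skill) + r"\b", lowered):
                counts[skill] += 1
    return counts


# Made-up postings, for illustration only.
postings = [
    "Data scientist with Python, SQL and machine learning experience.",
    "Seeking statistics expert; Hadoop and Python required.",
    "Data scientist who understands business benefits and SQL.",
]

# Print terms from most to least frequently mentioned.
for skill, n in skill_frequencies(postings).most_common():
    print(f"{skill}: {n}")
```

Counts like these could be fed to a word-cloud renderer to size each term. Adding business-oriented terms to the skill list is one way to check how often postings ask for business benefit experience at all.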

There are far more global resources who know Hadoop, ABAP, SQL and Python than there are people who understand your enterprise and your company’s business. There are a few people who have worked on the Business Benefit side of BI. There are fewer true data scientists, and they remain a rarity. The key is in teaming your people who understand your business goals, can handle big data volumes, and can help build your enterprise’s competitive differentiators: your true gold from data.

So what we are left with is co-innovation. Find partners that can build the critical glue for strategic success by creating a team of your internal business stakeholders, technical Business Value experts, and IT technical resources with the precise skills to fit the roles identified to meet business expectations. Team members with such diverse skills will inspire and enrich the overall capabilities of the team in discovery, design and realization, bringing new capabilities and insights that deliver true business benefits to the enterprise decision machine.

What we say to SAP HANA customers is the following:

  1. The strategic goal of HANA is big-data analytics; plan accordingly
  2. HANA is a business solution and not just another SAP technical install
  3. Without business in business intelligence, BI is dead (Gartner 2010)
  4. Do not start your HANA journey without a professional ‘Road to HANA’ workshop
     

What we say to Big Data customers (Big Data being the strategic goal of all HANA customers) is the following:

  1. Big Data is a business solution and not just another technical install
  2. Plan your work and only then work your plan
  3. Do not start your Big-Data journey without first identifying business goals and benefits. Think about security very carefully
  4. Design your Tactical, Mid-Term and Strategic Goals, then align every step to the long term goal