At the turn of the millennium, in the late 1990s, organizations struggled with the question of who gets access to what information, applications, and data. As enterprises amassed more applications and data and made them available to employees, partners, and customers, security concerns escalated, and the first single sign-on applications emerged alongside many other authentication and security technologies. And rightfully so. That era saw rapid security innovation and the first appointments of Chief Security Officers (CSOs), who enabled these applications to proliferate across the enterprise with beneficial results.
What were some of these turn-of-the-century innovations? “Identity management,” for example, emerged as a new category, including Oblix, led by Symantec founder Gordon Eubanks and acquired by Oracle in 2005, and so began the delivery of new suites of software designed to authenticate identity, especially in Web services deployments. Industry watchers such as Elise Ackerman of the San Jose Mercury News described these early identity checkers that controlled “who gets access to what” as dot-com bust survivors.
Now we face a new challenge: the proliferation of data across the enterprise. Making data accessible to those who need it is absolutely necessary to realize the promise of big data analytics, but it comes with a challenge that is leading to the appointment of yet another C-level position, the Chief Data Officer (CDO). Based on our dealings with Fortune 1000 companies over the last year, and the buzz around data lakes at Strata + Hadoop World in San Jose this week, a more challenging conundrum lies ahead in the new Data 2.0 age: enforcing stricter data governance while enabling wider data accessibility across large global enterprises.
The complexities and technical challenges of the Data 2.0 age include the emerging reality of data lakes, in which all types of data are poured into a massive data hub. This so-called “hub” is essentially a cost-effective central collecting point for data from a variety of sources, and it makes a number of governance questions unavoidable: what is relevant for analysis; what is not relevant but should be stored for compliance reasons anyway; how fresh the data is; where it came from; when it was last updated; who has access to it; who gets to “see” the real information in it and who does not; what should be masked yet remain accessible in aggregate form; and how that governance process will be meticulously managed and audited to ensure data is used appropriately, not misused in ways that lead to the wrong conclusions. The number of governance considerations is immense, and we are only at the start of a growing list.
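To make the masking and access questions above concrete, here is a minimal sketch of field-level governance in Python, assuming a simple role-to-fields policy. The role names, field names, and policy structure are hypothetical illustrations, not any particular product’s API.

```python
# Hypothetical policy: which roles may see which fields in raw form.
# Everything else is masked before the record reaches the user.
RAW_ACCESS = {
    "analyst": {"region", "purchase_total"},
    "compliance": {"region", "purchase_total", "customer_email"},
}

def apply_policy(record, role):
    """Return a copy of `record` with fields the role may not see masked."""
    visible = RAW_ACCESS.get(role, set())
    return {
        field: (value if field in visible else "***MASKED***")
        for field, value in record.items()
    }

record = {"customer_email": "a@example.com", "region": "EMEA", "purchase_total": 120.0}
print(apply_policy(record, "analyst"))
# The analyst sees region and purchase_total raw; the email is masked.
```

A real governance layer would also log who accessed what and when, so the audit trail the CDO needs is produced as a side effect of every read.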
We know three things for sure: more data is good, speed of access is a necessity, and democratization of data in data lakes is essential to making better business decisions. Realizing this, however, will require elevating the importance of a strict data governance model. Taken in its entirety, this is a large undertaking and responsibility for CDOs. It is access and security again, but with a far more complex technical challenge, since data is bigger, more fluid, and ever changing, whereas the last era of security, the CSOs’ challenge, was a more bounded set of problems.
The overall Data 2.0 mission today includes three considerations:
1) The selection of the data lake platform itself – typically Apache Hadoop, via a distribution such as Hortonworks or Cloudera – which becomes the central repository for data streamed in from various sources in different formats and combined for deeper data intelligence. While business users can examine, dive into, or preview information in data lakes, the new CDO challenge will be securing and governing data usage so that only the right people get approved access to particular sets of data in a lake.
2) The desire of business users in different departments to be more self-reliant, with the ability to explore data themselves because they are the domain experts. That requires a fast, intuitive way to surface insights – new tools that eliminate the need to be IT-savvy – so business leaders can get better answers, explore data analysis, and collaborate with peers in real time.
3) The blending and harmonization of diverse data sets from diverse sources and formats – structured, unstructured, or semi-structured – to reach a holistic insight that answers bigger, deeper business questions. This process includes otherwise hard-to-capture, hard-to-wrangle data coming from an exploding variety of devices and smart sensors connected to an emerging Internet of Everyday Things.
The companies and industries moving fastest to realize the Data 2.0 advantages are those that compete aggressively against their peers day to day. They are accessing more data, faster, as a means of staying ahead of their competitors. They include retailers, consumer-packaged goods (CPG) companies, pharmaceuticals, media and entertainment companies, insurance, automotive, and any industry where consumers now hold all the power and every day is about attracting and keeping them.
Read the original guest blog post on Wired Insights: http://insights.wired.com/profiles/blogs/data-governance-rises-in-importance-for-data-2-0-age#axzz3SbFPOYc4