If You Want To Deal With Data Silos, Embrace Them

Written by: Kendall Clark, Founder & CEO of Stardog, the leading Enterprise Knowledge Graph company.

In 2017, Kobe Steel announced it had been shipping improperly certified raw products to 500 customers for more than a decade. Industry analysts blamed rigid, vertical data silos for allowing product divisions to operate outside higher management oversight.

Jump to 2019: The EU's GDPR and California's CCPA effectively required companies to unify personal data residing in multiple places. The cost of noncompliance is high; in the case of GDPR, a fine of up to 4% of annual global revenue or 20 million euros, whichever is greater. In this new world of regulation, what you don’t know as an enterprise can hurt you, literally.

There are other less immediate, but no less serious, consequences of data silos. Consider the challenges inherent in digital transformation: incumbents must either learn to monetize data effectively or lose ground to well-funded startups that have already solved data monetization.

In fact, many large companies are hugely inefficient at making use of data. Data scientists spend as much as 80% of their time wrangling data, a necessary but insufficient step toward understanding it. Time lost struggling with data silos is time not spent discovering new products and buying patterns, achieving cost efficiencies or streamlining processes.

The solution: Embrace your enterprise data silos

You're probably expecting me to tell you that, to win, you have to get rid of your enterprise data silos as soon as possible. But I'm going to recommend the opposite; rather than eliminating them, I suggest embracing them.

First, it's just realism. In my 20-year IT career, I've found that data silos exist for very good reasons. They allow for local control and governance where a particular part of your business needs it, and they help you comply with regulations: some data must legally be stored apart from other data, and some is kept separate simply for legacy business reasons. Other data is just too essential to business operations to bear the risk of consolidating, eliminating or modernizing it.

Second, the nature of enterprise data is unlikely to change anytime soon. Enterprise data is:

• Diverse, and it's only becoming more diverse as unstructured data grows at a rapid pace.

• Distributed across multiple systems in different places, particularly as hybrid and multi-cloud computing become necessary.

• Controlled by business leaders who may have competing interests.

• Enabled by vendors who want to lock you into their solution.

Rather than fighting your data silos, learn to embrace them and use other tools that can help you connect the data inside them to generate insight.

Where data management solutions have failed

Making sense of data silos is not uncharted territory. However, many of the current solutions that attempt to disassemble your silos don't work. Let's evaluate a few and see where they've failed.

Enterprise data warehouse (EDW)

One way to connect data in different enterprise silos is to copy the data out of the silos and into a single system where it can be analyzed. This is the approach underlying the EDW, whether on-premises or in the cloud. Chief among the EDW's disadvantages is that it's difficult to build a single relational schema that describes the complexity of the enterprise. Hence, few EDWs include semi-structured or unstructured data, or data that changes often.
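To make the schema problem concrete, here is a minimal Python sketch of an EDW-style load, using an in-memory SQLite table as a stand-in for the warehouse. The records, field names and fixed column list are hypothetical; the point is only that a schema chosen up front has no place for a semi-structured field such as `preferences`, so the copy quietly leaves it behind.

```python
import sqlite3

# Hypothetical extracts from two silos. The CRM record carries a nested,
# semi-structured "preferences" field; the ERP record carries a credit limit.
crm_rows = [{"customer_id": "C1", "name": "Acme Corp", "preferences": {"channel": "email"}}]
erp_rows = [{"customer_id": "C1", "name": "ACME Corporation", "credit_limit": 50000}]

# The warehouse schema is fixed up front (illustrative column list).
WAREHOUSE_COLUMNS = ("customer_id", "name", "credit_limit")


def load(conn: sqlite3.Connection, rows: list, source: str) -> set:
    """Copy rows into the warehouse table, keeping only schema columns.

    Returns the names of fields that had no column to land in.
    """
    dropped = set()
    for row in rows:
        dropped |= set(row) - set(WAREHOUSE_COLUMNS)
        values = [row.get(col) for col in WAREHOUSE_COLUMNS]  # missing -> NULL
        conn.execute(
            "INSERT INTO customers (source, customer_id, name, credit_limit) "
            "VALUES (?, ?, ?, ?)",
            [source, *values],
        )
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (source TEXT, customer_id TEXT, name TEXT, credit_limit INTEGER)"
)
dropped = load(conn, crm_rows, "crm") | load(conn, erp_rows, "erp")
print(sorted(dropped))  # ['preferences']
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

Every new source or schema change means revisiting that fixed column list, which is why fast-changing or loosely structured data tends to stay out of the warehouse.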

Master data management (MDM)

Master data is the most highly governed data within an organization, touted as a "single source of truth." Records from various sources are matched and merged to resolve discrepancies between them. In theory, any data can be mastered; in practice, nearly all MDM solutions focus on either customer or product master data. Externally sourced data and datasets that are highly heterogeneous, machine-generated or unstructured are difficult to master.
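Match and merge can be sketched in a few lines of Python. This is a minimal illustration, not how any particular MDM product works: the record fields, the match rule (normalized email address) and the survivorship rule (the most recently updated non-empty value wins) are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CustomerRecord:
    source: str            # which silo the record came from
    email: str
    name: str
    phone: Optional[str]
    updated: int           # sortable date, e.g. 20190315


def match_key(rec: CustomerRecord) -> str:
    # Assumed match rule: records with the same normalized email are one customer.
    return rec.email.strip().lower()


def merge(records: List[CustomerRecord]) -> Dict[str, dict]:
    """Group matching records and build one "golden" record per customer.

    Assumed survivorship rule: for each field, keep the value from the most
    recently updated record that has a non-empty value.
    """
    groups: Dict[str, List[CustomerRecord]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)

    golden = {}
    for key, recs in groups.items():
        newest_first = sorted(recs, key=lambda r: r.updated, reverse=True)
        golden[key] = {
            "email": key,
            "name": next((r.name for r in newest_first if r.name), ""),
            "phone": next((r.phone for r in newest_first if r.phone), None),
            "sources": sorted(r.source for r in recs),
        }
    return golden


records = [
    CustomerRecord("crm", "Jane.Doe@Example.com", "Jane Doe", None, 20180101),
    CustomerRecord("billing", "jane.doe@example.com ", "J. Doe", "555-0100", 20190315),
]
golden = merge(records)
print(golden["jane.doe@example.com"])
# {'email': 'jane.doe@example.com', 'name': 'J. Doe', 'phone': '555-0100', 'sources': ['billing', 'crm']}
```

The hard part in practice is choosing match and survivorship rules, which is easy for a well-understood entity like a customer and much harder for heterogeneous or machine-generated data.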

Data lakes

Data lakes serve as vast collections of raw enterprise data. Using them involves migrating or copying raw data — structured, unstructured and semi-structured — as well as transformed data for specific purposes, such as analytics, visualization and reporting.

Data lakes once held the promise of making it easier to ingest, combine and analyze diverse data in service of machine learning and artificial intelligence (AI) efforts. In reality, as data lakes host more and more data, it becomes difficult for users to know what's in them and how the data is connected. In essence, a data lake is raw data in a large distributed file system, organized about as well as the folders on your PC. So the whole enterprise spends as much time looking for things in the data lake as each of us does searching our own laptops.

Turning enterprise data into knowledge

You may not realize it, but you're probably already familiar with knowledge graphs, since they're used extensively by Google, Facebook, Apple, LinkedIn, Uber and many others. Knowledge graphs connect data based on what it means, without duplicating or copying the data. This effectively allows companies to act on their data as if the silos don't exist. It also avoids complex ETL (extract, transform, load) jobs and saves the cloud hosting costs of duplicated data.
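The idea of connecting data without copying it can be illustrated with a toy Python sketch. It assumes two hypothetical silos, an orders database (simulated with in-memory SQLite) and a CRM export, each with a mapping function that describes its rows as subject-predicate-object triples only when a query asks for them. This is not Stardog's API or a real RDF/SPARQL engine, which would add shared vocabularies, indexing and query planning; it only shows a cross-silo question answered without first loading either silo into a central store.

```python
import sqlite3
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Silo 1: an operational orders database (hypothetical, simulated in memory).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_id TEXT, cust TEXT, product TEXT)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [("O1", "C1", "P9"), ("O2", "C2", "P9")]
)

# Silo 2: a CRM export that stays where it is (hypothetical records).
crm_records = [{"id": "C1", "region": "EU"}, {"id": "C2", "region": "US"}]


def orders_mapping() -> Iterator[Triple]:
    # Read rows at query time and describe them with shared identifiers.
    for order_id, cust, product in orders_db.execute(
        "SELECT order_id, cust, product FROM orders"
    ):
        yield (f"order:{order_id}", "placedBy", f"customer:{cust}")
        yield (f"order:{order_id}", "contains", f"product:{product}")


def crm_mapping() -> Iterator[Triple]:
    for rec in crm_records:
        yield (f"customer:{rec['id']}", "locatedIn", f"region:{rec['region']}")


MAPPINGS = [orders_mapping, crm_mapping]


def match(predicate: str) -> Iterator[Triple]:
    """Stream triples with the given predicate from every silo; nothing is stored."""
    for mapping in MAPPINGS:
        for triple in mapping():
            if triple[1] == predicate:
                yield triple


def orders_from_region(region: str) -> List[str]:
    # Join across silos on the shared "customer:..." identifier.
    in_region = {s for s, _, o in match("locatedIn") if o == f"region:{region}"}
    return sorted(s for s, _, o in match("placedBy") if o in in_region)


print(orders_from_region("EU"))  # ['order:O1']
```

Because the mappings read each silo on demand, the silos keep their local control, and adding a new source means adding a mapping rather than another copy job.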

Having worked with knowledge graph technology for the past 15 years, I believe the most important practice for successfully launching a knowledge graph is to start small. Begin by identifying a pressing business question that can only be answered by pulling data from many places inside your enterprise. Seek out problems that, to date, have required an army of Excel ninjas working on ad hoc reports. Identify and unify the data needed to answer this question, then incorporate additional data to answer follow-on questions. With your answer — and a business process for arriving at this answer — in hand, you'll be able to build momentum within your organization to connect more data across more silos.

The path to digital transformation lies in embracing data silos rather than attempting to destroy them. Data is the most important raw asset in your organization — you just need to turn it into knowledge.
