
Data Fabric: A Wet Blanket, or A Resilient Bedspread Over Your Data?

Introduction

What if there were one place where you could go to get all the data you need? No matter what kind of data you are looking for, you would find it there. It could be your company’s master data, transactional data, analytics data, IoT data or even a document or a video. Sure, you would need to authenticate yourself and have the rights to access the data, but if that is the case, then you only need to look in one place. And best of all, you could rely on the data presented there being accurate. Like a bedspread wrapped over all your data assets, it would give shelter while providing a common interface to the surrounding world. Wouldn’t that be wonderful? Well, that is the main idea behind the concept of “Data Fabric”.

What is Data Fabric?

Data Fabric as a concept has developed over the past few years as an answer to the increasing challenge of getting the full benefit out of enterprise-wide data, where:

  • Companies are consuming IT services from multiple cloud platforms such as AWS, Azure and GCP and are, of course, using SaaS offerings more and more frequently.
  • The amount of data generated by IoT, apps and social media is exploding.
  • More and more business models depend on easy access to data and on sharing more data with end consumers.
  • End consumers treat the digital services attached to products and services as a key differentiator in their final selection.
  • Data regulations such as GDPR, DPP and the AI Act focus on what specific data companies must share with regulatory authorities and end consumers.
  • Many companies still struggle with data in silos and locked-in legacy systems.

The ideas behind Data Fabric have been in development for many years. It started with collecting data in data warehouses, data lakes and BI platforms, and was further developed by adding integration, security, data lineage and master data aspects. In the beginning, this was branded as Enterprise Information Hubs or sometimes even as API platforms, but that branding was unfortunately driven by software vendors who had a hard time delivering on their promises. So the more general architectural pattern of Data Fabric was born, though the definition is somewhat blurry and depends on who you ask.

The figure below shows a conceptual architecture of Data Fabric in relation to other established concepts.

Fig 1: Conceptual architecture of Data Fabric in relation to other established concepts

The table below gives an interpretation of how different vendors in this area define Data Fabric:

Table 1: Interpretation of Data Fabric by different vendors

As the diverse interpretations in the table above show, gaining a comprehensive understanding of the Data Fabric concept is not straightforward, as it involves multiple perspectives from various sources and vendors. Still, analyzing the existing discussions surrounding the definition of a Data Fabric reveals a consensus that it serves as a mechanism for generating data pipelines and integrations from diverse sources within a unified platform. There are, however, divergent views on whether a Data Fabric is an architecture or a broader concept encompassing various technologies and architectures.

Furthermore, when it comes to what constitutes a Data Fabric, some suggestions push the idea that it is built around metadata and metadata analysis. In a metadata-driven Data Fabric, metadata is “activated”: it is pushed towards users as they create pipelines, and new metadata is suggested when data arrives from external sources. The metadata would also be enriched with semantics, giving it meaning and context through knowledge graphs. Artificial intelligence and machine learning could then be applied on top of these knowledge graphs, arriving at the concept of active metadata. Active metadata, analyzed using semantics, knowledge graphs, artificial intelligence and machine learning, is considered one of the key features in achieving a Data Fabric architecture.
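
To make the idea of active metadata slightly more concrete, the sketch below shows a toy metadata store held as knowledge-graph triples that can itself propose metadata for newly arriving data. It is a minimal illustration only: the names (Triple, MetadataGraph, suggest_domain) and the example datasets are our own assumptions, not taken from any vendor platform, and a real implementation would rely on a graph database and trained models rather than simple set overlap.

    # Toy sketch of "active metadata": metadata is held as knowledge-graph
    # triples, and the graph itself proposes metadata for newly arriving data.
    # All names and example datasets here are hypothetical.
    from dataclasses import dataclass
    from typing import Optional, Set

    @dataclass(frozen=True)
    class Triple:
        subject: str    # e.g. a dataset
        predicate: str  # e.g. "hasDomain" or "hasColumn"
        obj: str        # e.g. "Customer" or a column name

    class MetadataGraph:
        def __init__(self) -> None:
            self.triples: Set[Triple] = set()

        def add(self, subject: str, predicate: str, obj: str) -> None:
            self.triples.add(Triple(subject, predicate, obj))

        def suggest_domain(self, new_columns: Set[str]) -> Optional[str]:
            """'Activate' the metadata: propose a business domain for a new,
            untagged dataset by comparing its columns with known datasets."""
            known = {}
            for t in self.triples:
                entry = known.setdefault(t.subject, {"domain": None, "columns": set()})
                if t.predicate == "hasDomain":
                    entry["domain"] = t.obj
                elif t.predicate == "hasColumn":
                    entry["columns"].add(t.obj)
            best_domain, best_overlap = None, 0
            for entry in known.values():
                overlap = len(new_columns & entry["columns"])
                if entry["domain"] and overlap > best_overlap:
                    best_domain, best_overlap = entry["domain"], overlap
            return best_domain

    graph = MetadataGraph()
    graph.add("crm.customers", "hasDomain", "Customer")
    for col in ("customer_id", "email", "country"):
        graph.add("crm.customers", "hasColumn", col)

    # A new source arrives; the graph suggests metadata instead of waiting
    # for someone to classify it manually.
    print(graph.suggest_domain({"customer_id", "email", "signup_date"}))  # -> Customer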

Market participants hold a few contrasting views of how Data Fabric should be defined. Some call it a design concept, which could be interpreted as an architecture, while others view the Data Fabric as a “solution”, thereby interpreting it as an instantiated architecture. Interestingly, most market participants agree that data management should be an integral part of the Data Fabric definition; one even goes as far as viewing Data Fabric as a data management approach, a broad view that leaves a lot of room for interpretation.

What challenges are the Data Fabric envisioned to solve?

One of the challenges in organizations is the diversity of sources and systems dealing with data. Data is generated at an increasing pace with the development of new technologies, regulations and business needs. Growing data volumes and numbers of data sources make the landscape more complex, and finding the right data and understanding it in context therefore becomes more difficult. Additionally, with different systems and user groups, it is not unusual that data becomes siloed. Each system or user group will have access to and understand the data within their respective silos, but their knowledge about data outside of their organizational unit will be limited.

This may also lead to difficulties in harmonizing data and establishing consistent data categorization across the organization. Instances may arise where identical data objects exist in multiple locations but with varying identification formats or, worse, different taxonomies. A prime example is product or customer information, which may be scattered across numerous systems and often lacks consistency or is even contradictory.

Another common problem is that the depth of the data architecture needed is underestimated. Large companies tend to simplify the complexity of their own operations in order to achieve a higher degree of freedom in their business processes. Consequently, processes are not adequately documented, and data is not captured and stored with the requisite granularity and quality. When these companies are faced with higher requirements on data quality from external parties such as end customers, regulatory authorities or business partners, it can be a painful wake-up call. Attempting to impose better order on the base data of a fully operating business is comparable to performing engine repairs on an airplane while it is airborne. Frequently, the solution involves implementing a new IT platform, such as a more robust ERP or master data platform, so that the enhancement of data quality coincides with the implementation of the new platform.

Yet another problem is simplifying data architecture work by putting it in the hands of large, commercially available off-the-shelf platforms. To avoid tedious data management work, organizations rely on the data architecture presented to them by large solution platforms. The argument is that these platforms serve many kinds of customers and hence have probably already thought all data architecture aspects through. It is then not uncommon for these organizations to discover the hard way that the complexity of their own business does not fit into this architecture. Costly adjustments are made to the solution platforms, data lakes are installed to bridge the gaps, and analytical tools are installed to try to understand the data. If these mistakes are repeated frequently across the organization, the end state will be characterized by disorder and disintegration. Data Fabric has been presented as the cure for these problems, but the question managers should ask themselves is:

Will a Data Fabric really solve the problems, or will it rather attempt to minimize the damage done?

Well, a Data Fabric focused on connecting all data in a business through one easily accessible platform will mostly amount to damage control. To increase data quality and reliability, it is not enough to simply connect the data to its surroundings. This is where a structured approach to cleansing, normalizing and analyzing the data on the fly can make a real difference. However, even if we could attain a higher level of data quality by incorporating capabilities for “on-the-fly data quality improvement” such as AI, active metadata and machine learning, it is essential to remain skeptical about the reliability of the output. Data-driven decision-making assumes that the data is as accurate as possible. However, would you entirely trust machine-generated data to make critical decisions?

Given this rationale, it is probable that Data Fabric architectures will initially be implemented in domains characterized by high-volume public data flows. In these contexts, data can be interpreted and acted upon with a lower risk of serious consequences in the event of faulty actions. However, as algorithms improve, Data Fabric architectures are likely to move up the value chain, eventually becoming a substantial data source in executive decision-making.

Another common problem is establishing and operating a working data governance organization. Many consider data governance a tedious and time-consuming task, and it often does not receive adequate attention. Furthermore, even when prioritized, it is frequently assigned to individuals with limited understanding of the potential business consequences of poor or inaccurate data. By implementing a sophisticated Data Fabric architecture with a self-healing active metadata layer, some of the problems generated by poor data governance could potentially be addressed as well. While it may seem utopian at this moment, accomplishing this feat would yield significant benefits.

What Data Fabric-related technologies are on the market?

When it comes to technologies related to Data Fabric, these can be classified into different categories:

  • To start with, there are metadata technologies, such as data catalogs. Providers of data catalogs, such as Atlan and Informatica, tend to view metadata as a central part of the Data Fabric. These companies also seem to have come the farthest in AI capabilities related to Data Fabrics.
  • Other companies like TIBCO, IBM and Microsoft tend to view data pipeline platforms or data integration platforms as central to Data Fabrics.
  • Companies like Denodo and Cloudera, which have a legacy in data virtualization across several cloud platforms, provide offerings that tackle the issue by handling data across multi-cloud environments.

Not surprisingly, the different vendors tend to fit the Data Fabric concept as far as possible into their own domains. In the table below we highlight some of the vendors in this domain and their current offerings in the Data Fabric area.

Table 2: Offerings of different vendors in the Data Fabric area

How, and to what extent, do these technologies address the challenges above?

Depending on how we view Data Fabrics, and what we need to get out of them, we could look at the following three components:

  • A data discovery part, which in essence means a metadata model that can push information to the user.
  • A data pipelining/integration part, where data is pulled from the source system, prepared and served to the user who needs it.
  • An active metadata part, where the organization might cater and compensate for some of the worst data quality issues in the underlying data by actively and virtually creating the metadata from inbound data, instead of matching inbound data against static metadata in the background.

However, merely focusing on data discovery and automation of data pipelines would only address part of the issue outlined earlier. The common perspective is that a Data Fabric platform should encompass data discovery, a data catalog and hybrid integration tools to effectively tackle these challenges.
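
As a thought experiment, the sketch below expresses these three parts as minimal Python interfaces and shows how they could be composed when a user asks for data. The interface and function names are our own illustrative assumptions and do not correspond to any specific product.

    # Hypothetical decomposition of a Data Fabric into the three parts above.
    # Interface and function names are illustrative only.
    from typing import Any, Dict, Iterable, List, Protocol

    class DataDiscovery(Protocol):
        def search(self, term: str) -> List[str]:
            """Push metadata to the user: return names of matching datasets."""
            ...

    class DataPipeline(Protocol):
        def serve(self, dataset: str) -> Iterable[Dict[str, Any]]:
            """Pull data from the source system, prepare it, and serve it."""
            ...

    class ActiveMetadata(Protocol):
        def tag(self, record: Dict[str, Any]) -> Dict[str, Any]:
            """Enrich inbound records with metadata derived on the fly."""
            ...

    def consume(discovery: DataDiscovery, pipeline: DataPipeline,
                metadata: ActiveMetadata, term: str) -> List[Dict[str, Any]]:
        """Compose the three parts: discover a dataset, pull it, enrich it."""
        matches = discovery.search(term)
        if not matches:
            return []
        return [metadata.tag(row) for row in pipeline.serve(matches[0])]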

Some providers offer AI capabilities to achieve the active metadata setup and argue that, without this, a platform cannot really be considered a Data Fabric platform. The main argument is that if we know which data assets exist and what level of quality they hold, then we can address data quality issues in real time, using AI and machine learning to provide a shield of automated data improvements that results in better quality than the underlying sources. Nevertheless, to make decisions based on this aggregated data, you must have a high level of trust in your algorithms.

Working with metadata has traditionally been a time-consuming and relatively static process, involving the analysis of required attributes and characteristics of data objects. However, in the future, advancements in technology could enable machines to interpret metadata dynamically, analyzing input data and utilizing AI algorithms to meta-tag data based on similarities with previous instances. This would undoubtedly represent a paradigm shift in data interpretation, capture, and analytics, unlocking the potential for advanced data handling in a fraction of the time compared to current methods.
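
A very simplified illustration of this kind of similarity-based meta-tagging is sketched below. It is a toy example under our own assumptions: the tag names and columns are invented, and a real fabric would rely on trained models or embeddings rather than plain string similarity.

    # Illustrative only: tag a new column by finding the most similar column
    # we have already classified. Tags and column names are invented.
    from difflib import SequenceMatcher
    from typing import Optional

    KNOWN_TAGS = {
        "customer_email": "PII.Email",
        "cust_phone_number": "PII.Phone",
        "order_total_amount": "Finance.Amount",
    }

    def suggest_tag(column_name: str, threshold: float = 0.6) -> Optional[str]:
        """Propose a metadata tag based on similarity with previous instances."""
        best_tag, best_score = None, 0.0
        for known, tag in KNOWN_TAGS.items():
            score = SequenceMatcher(None, column_name.lower(), known).ratio()
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag if best_score >= threshold else None

    print(suggest_tag("customer_e_mail"))  # likely "PII.Email"
    print(suggest_tag("sensor_reading"))   # likely None: no similar known column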

Some providers are taking this one step further and are integrating AI tools to create a spoken natural-language interface to the data discovery module, making the data even more accessible.

 

What do we see as challenges with Data Fabric?

Implementing a seamless data layer on top of a large and scattered data landscape brings its fair share of challenges. Organizations with complex IT and data landscapes are likely those with the most to gain from investing in a Data Fabric architecture. On the other hand, implementing such a fabric in those landscapes could be an arduous and time-consuming process, which increases the risk of failure.

While legacy issues such as data accessibility and presenting the data in a consumer-friendly format remain, modern hybrid platforms have built-in tools to address them.

If a company has successfully implemented technologies for data access and publication, the primary challenge moving forward will be ensuring the trustworthiness of the outcomes. It is tempting to see an enterprise-wide Data Fabric as a cure for a bad underlying structure. What you actually achieve by implementing a Data Fabric on top of this mess is a centralized mess where you can easily access the underlying bad data. The poor data quality becomes more visible instead of being confined to a backbone legacy environment, which means business decisions are likely to be taken based on incorrect data!

So the real concern will be data quality. The most effective approach to addressing this concern is to improve data quality at the source through processes such as data cleansing and enrichment. Further, an understanding of metadata and data structures will be required. AI can assist in automating manual and time-consuming tasks, not only in tidying up data sources but also during real-time data consumption.

What are our conclusions and recommendations?

To summarize, we feel the following observations are noteworthy:

  • Consider Data Fabric to be an architecture rather than a tool.
  • Currently, we do not observe any vendor in the market offering a comprehensive tool or platform for building a Data Fabric architecture. However, several vendors provide platforms and tools that could serve as essential components of your overall Data Fabric architecture.
  • Investing in a Data Fabric architecture must be seen as a research proposition where parts of the investment can be attributed to building knowledge and gaining experience. Prepare for the need to replace components in your architecture later, as the technologies mature.

Data Fabric architecture will find its first successful and cost-saving implementations in organizations with a high volume of data exchange with the external world, where the quality level of that exchange is not critical, for example the distribution of reviews of hotels, products and restaurants to many different sites and platforms.


Authors

Hans Bergström, Mattias Gustrin & Eric Wallhoff

Hans Bergström is an experienced Strategic Advisor and Partner at Opticos. He is also heading the Opticos Service Offering on Enterprise Architecture & Technology.
Mattias Gustrin is a Senior Advisor within Data Management, Advanced Analytics, and technology strategies across multiple industries.
Eric Wallhoff is a Senior Consultant with experience in Data Management and Data Analytics. He is also part of the Strategic IT capability.

 
