The Answer is Meta-Data

Position paper, submitted to the Eighth (1999) High Performance Transaction Systems Workshop

Philip A. Bernstein

Microsoft Research

Now that so much of the world’s data and transaction processing services are available on the Internet, there is a growing need for better ways to integrate those data and services for development, management, and end-user access. The heart of such integration is meta-data that describes the things to be integrated. Unfortunately, vendors and researchers aren’t giving meta-data the level of attention it needs for this integration to happen.

In the traditional IT world, data warehouse construction tools are currently the main driver of meta-data requirements. These tools need schemas for the source and target of transformations for data scrubbing and integration. They also need detailed semantic descriptions of those transformations, both for data lineage analysis and for generating code that performs the transformations. Vendors of these tools have proprietary meta-data repositories, and the large database vendors are competing vigorously to integrate these third-party tools into their own repository environments. Progress is being made, both in tool capability and in meta-data integration. But even with the best tools, it’s time-consuming and therefore expensive to scrub data for a warehouse. In effect, this is the static, batch-oriented version of the semantic data integration problem that has interested database researchers for decades. Researchers could help here, for example, by developing more powerful meta-data-driven functionality for data transformations.
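To make the idea of meta-data-driven transformation concrete, here is a minimal sketch in which the mapping between source and target schemas is itself data, so one declarative spec can both drive the transformation and answer lineage queries. All field names and the `MAPPING` structure are invented for illustration; no product works exactly this way.

```python
# The mapping spec is meta-data: each target field records its source
# field(s) and a conversion, declaratively. The same spec serves two
# purposes: executing the transformation and answering lineage queries.
MAPPING = {
    "customer_name": {"source": ["first", "last"],
                      "transform": lambda f, l: f.strip() + " " + l.strip()},
    "zip":           {"source": ["postal_code"],
                      "transform": lambda z: z.zfill(5)},
}

def transform(record):
    """Apply the declarative mapping to one source record."""
    return {tgt: spec["transform"](*(record[s] for s in spec["source"]))
            for tgt, spec in MAPPING.items()}

def lineage(target_field):
    """Data-lineage analysis answered from the same meta-data."""
    return MAPPING[target_field]["source"]

row = transform({"first": " Ada ", "last": "Lovelace", "postal_code": "7"})
# row["customer_name"] is "Ada Lovelace"; lineage("zip") is ["postal_code"]
```

A code generator could just as easily compile `MAPPING` into a standalone scrubbing program; the point is that the transformation’s semantics live in inspectable meta-data rather than in opaque code.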

Despite its great promise, integrated CASE based on shared meta-data continues to be a slow-growth field. Although some enterprise-oriented tools are becoming more repository-centric, most desktop development tools are still strongly file-oriented. Even many high-end configuration management tools use files, not databases. The benefits of using a database-based repository for fine-grained, versioned data sharing between development tools remain compelling: better reuse, better management of versioned configurations, impact analysis, automatic change propagation, inter-tool navigation (test, bug tracking, design models, code), etc. Developers of electronic commerce applications would be major beneficiaries, given the many types of objects that need to be assembled in such applications and the different tools that must be used for these various object types. But products are still far from realizing all these benefits.
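The repository benefits listed above reduce, at bottom, to storing artifacts as versioned, typed, linked objects rather than flat files. The toy sketch below (all names hypothetical) shows why impact analysis becomes a simple graph traversal once inter-artifact dependencies are recorded as meta-data.

```python
from collections import defaultdict

class Repository:
    """Toy repository: versioned artifacts plus typed dependency links."""
    def __init__(self):
        self.versions = defaultdict(list)    # artifact id -> version history
        self.depends_on = defaultdict(set)   # edges: dependent -> dependencies

    def check_in(self, artifact_id, content):
        self.versions[artifact_id].append(content)
        return len(self.versions[artifact_id])   # new version number

    def link(self, dependent, dependency):
        self.depends_on[dependent].add(dependency)

    def impacted_by(self, artifact_id):
        """Impact analysis: everything transitively depending on artifact_id."""
        hit, frontier = set(), {artifact_id}
        while frontier:
            nxt = {d for d, deps in self.depends_on.items()
                   if deps & frontier} - hit
            hit |= nxt
            frontier = nxt
        return hit

repo = Repository()
repo.check_in("design_model", "v1")
repo.link("code", "design_model")   # code derives from the design model
repo.link("test", "code")           # tests exercise the code
# impacted_by("design_model") finds both "code" and "test"
```

With files, the same question requires ad hoc parsing of each tool’s formats; with a shared repository, change propagation and inter-tool navigation use the same dependency graph.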

In the past, poor performance and low functionality were deterrents to using commercial database products for the meta-data supporting design applications, such as environments for electronic commerce or data warehousing. These weaknesses are largely fixed by the latest generation of repository products and by the power of today’s desktop systems, fixed well enough that they are no longer the main impediments to progress. It’s time for another run at using database technology to improve the productivity of designers and developers, and to improve the integration of the artifacts they produce.

In a better world, data and transaction servers would be more self-describing, making meta-data a more central aspect of the development of transaction processing applications and the databases they produce. This better world would have the following characteristics:

Many of these scenarios require that meta-data expand beyond its traditional design-time usage into run-time scenarios. Some of the large application suites use meta-data repositories in this way today. Easy application integration requires that this usage style be expanded to all of the world’s on-line databases and applications.
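One way to picture run-time use of meta-data is a service that publishes its own interface description alongside its operations, so an integration tool can discover and validate calls at run time rather than relying on design-time documentation. The sketch below is purely illustrative; the service, operation, and parameter names are invented.

```python
class OrderService:
    """A hypothetical self-describing on-line service."""
    meta = {
        "service": "OrderService",
        "operations": {
            "place_order": {"params": {"sku": "string", "qty": "int"},
                            "returns": "order_id:string"},
        },
    }

    def describe(self):
        """Run-time meta-data: the service's own interface schema."""
        return self.meta

    def place_order(self, sku, qty):
        return f"ord-{sku}-{qty}"

# A generic integration client can check a call against the published
# meta-data before invoking it, with no out-of-band documentation:
svc = OrderService()
op = svc.describe()["operations"]["place_order"]
assert set(op["params"]) == {"sku", "qty"}
order_id = svc.place_order("A1", 2)
```

Today this role is played piecemeal by interface repositories and type libraries; the scenarios above ask for it to be uniform across all on-line databases and applications.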

Many of the above scenarios could be attained with increased vendor investment using state-of-the-art technology. However, many of them require research. For example, automating the integration of applications will require richer semantic models that can be processed with predictable and acceptable performance. This could be done using algorithms for merging, mapping, and transforming heterogeneous and evolving transaction and database services. Like transaction processing, query processing, or access methods, meta-data management is a field, not just a problem. Database researchers should focus on it, in a systematic way, with the goal of converging on a standard meta-data architecture, powerful general-purpose meta-data management tools, and an orderly approach to applying these to the integration of Internet services and databases.
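As a taste of one building block in that research agenda, consider matching two heterogeneous schemas. The fragment below uses only normalized attribute names; real schema-matching algorithms would also exploit types, structure, and instance data, so this is an illustrative toy, not a proposed technique.

```python
def normalize(name):
    """Crude name normalization: case- and separator-insensitive."""
    return name.lower().replace("_", "").replace("-", "")

def match_schemas(schema_a, schema_b):
    """Return pairs of attributes whose normalized names coincide."""
    index = {normalize(b): b for b in schema_b}
    return {a: index[normalize(a)]
            for a in schema_a if normalize(a) in index}

# Two independently designed order schemas, names invented:
m = match_schemas(["Cust_ID", "OrderDate", "Total"],
                  ["custid", "order_date", "shipped"])
# m pairs Cust_ID with custid and OrderDate with order_date;
# "Total" and "shipped" remain for a human (or a smarter matcher).
```

Scaling from this toy to merging and mapping evolving services, with predictable performance, is exactly the kind of open problem the paragraph above calls out.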