The Lowell Database Research Self-Assessment Meeting

Lowell, Massachusetts, 4-6 May 2003

Summary

Senior database researchers gather every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus. This report summarizes the discussion and conclusions of the sixth ad-hoc meeting held May 4-6, 2003 in Lowell, Mass. It observes that information management continues to be a critical component of most complex software systems. It recommends that the database research field focus on a number of important topics, which include: integration of text, data, code, and streams; information fusion of heterogeneous data sources; reasoning about uncertain data; unsupervised data mining for interesting correlations; information privacy; and self-adaptation and repair.

Introduction

Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus. This report follows a number of earlier reports with similar goals, including: Laguna Beach, Calif. in 1989 [1], Palo Alto, Calif. (“Lagunita”) in 1990 [2] and 1995 [3], Cambridge, Mass. in 1996 [4], and Asilomar, Calif. in 1998 [5]. Continuing this tradition, 25 senior database researchers, representing a broad cross section of the field in terms of research interests, affiliations, and geography, gathered in Lowell, Mass. in early May 2003 for two days of intensive discussion on where the database field is and where it should be going. Several important observations came out of this meeting.

Our community focuses on information storage, organization, management, and access, and it is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself. The nature and sources of information are changing. Everyone is aware that the Internet, the Web, science, and eCommerce are enormous sources of information and information-processing demands. Another big source is coming soon: cheap microsensor technology that will enable most things of material value to report their location and/or state in real time. This information will support applications whose main purpose is to monitor the state and/or location of material objects. The world of sensor-information processing will raise many of the most interesting database issues in a new environment, with a new set of constraints and opportunities.

In the area of applications, the Internet is currently the main driving force, particularly by enabling “cross enterprise” applications. Historically, applications were intra-enterprise and could be specified and optimized entirely within one administrative domain. However, most enterprises are interested in interacting with their suppliers and customers to share information and provide better customer support. Such applications are fundamentally cross-enterprise and require stronger facilities for security and information integration. They generate new issues for the Database Management System (DBMS) community to deal with.

A second application area of growing importance is the sciences (notably the physical sciences, biological sciences, and health sciences and engineering), which are generating large and complex data sets that need more advanced database support than current products provide. They too need information integration mechanisms. In addition, they need help with managing the pipeline of data products produced by data analysis, storing and querying “ordered” data (e.g., time series, image analysis, computational meshes, and geographic information), and integrating with the world-wide data grid.

In addition to these new information-management challenges, we face major changes in the traditional DBMS topics such as data models, access methods, query processing algorithms, concurrency control, recovery, query languages, and user interfaces to DBMSs. These topics have been well studied in the past. However, technology keeps changing the rules. For example, disks and RAM are getting much larger and much cheaper per bit of storage. Access times and bandwidths are improving too, but they are not improving as rapidly as capacity and cost. These changing ratios require us to reassess storage management and query-processing algorithms. In addition, processor caches have exploded in size and have added levels, requiring DBMS algorithms to be cache-aware. These are but two examples of technological change inducing a reassessment of previous algorithms in light of the new state of affairs.

Another driver of database research is the maturation of related technologies. For example, over the past decade, data-mining technology has become an important component of database systems. Web search engines have made information retrieval a commodity that needs to be integrated with classical database search techniques. Many areas of artificial intelligence are producing components that could be combined with database techniques; these components allow us to handle speech, natural language, reasoning with uncertainty, and machine learning, for example.

Participants noted that it is a popular undertaking these days to propose “grand challenges” for various fields of computer science. Each grand challenge is a problem that cannot be solved easily, and is intended as a “call to action” for a given field, such as The Information Utility [5] and Building Systems With Billions of Parts [6]. We all agreed that we could define more grand challenges. In fact, we discussed a few, notably the personal information manager: a database that could store, organize, and provide access to all of a person’s digitally encoded information for a lifetime. But in the end, we decided that focusing on a single grand challenge was inappropriate, since information management technology is a critical component in most, if not all, of the proposed computer-science grand challenges. Moreover, many of those information-management challenges are well beyond the state of the art. The existing grand challenges are a full-employment act for the database community; we decided not to add any more.

During the two days, we noted many new applications, technology trends, and synergies with related fields that affect information management. In aggregate, these issues require a new information-management infrastructure that is different from the one used today. Hence, Section 2 describes the components of this infrastructure. Section 3 then gives a short discussion of the topics that generated controversy during the meeting, and a statement of next steps that can be taken to move the new information-management infrastructure closer to reality.