Transaction Processing Concepts and Techniques

Western Institute of Computer Science @ Stanford

Aug 2-6, 1999

Apology: this page was built with Netscape Composer 4.61 (Netscape cannot read Office 2000 output at all), but it seems that Netscape Navigator only sometimes reads the PowerPoint files.
If you have problems, please try viewing the pages with IE4 or IE5.

 
 
Days: M: Aug 2, T: Aug 3, W: Aug 4, T: Aug 5, F: Aug 6
9:00 AM: Reuter, Reuter, Gray, Reuter, Reuter
11:00 AM: Gray, Gray, Gray, Hope, Reuter
1:30 PM: Gray, Gray, Malaika, Harkey (CORBA/EJB), Mohan
3:30 PM: Reuter, Gawlick, Reuter, Advanced TM, Bernstein
6:00 PM: Reception, Reuter, Gray, party, play

Transaction Processing Concepts and Techniques August 2-6

This course covers both the theoretical and pragmatic issues addressed by transaction processing systems. The premise of the course is that RPC is the key to structuring distributed computations, and that transactional RPC is the best way to handle the inevitable exceptions that arise. This generalizes the transaction concept from its traditional database domain to the broader context of client-server computing. More generally, the course discusses how these ideas apply to the modern world of HTTP servers (a kind of RPC), Object Request Brokers, and workflow systems.

The course begins by defining basic terminology and concepts, then turns to empirical measures of system failure and the various approaches to dealing with such failures. This leads to a discussion of transaction programming styles and generalizations of the transaction concept to handle workflow applications. The role of a transaction processing system in application design, implementation, and operation is covered in the abstract. Then, specific systems are related to this framework. With this high-level view in place, subsequent lectures cover the theory and practice of implementing locking, logging, and the more generic topic of implementing transactional resource managers. As an extended example, the implementation of transactional files, records, and access paths is covered in detail. The course includes "guest" lectures by specialists in workflow, CORBA/EJB, Internet servers, COM+/MTS, replication, and performance metrics.

Topics include:

Instructors:

JIM GRAY is a Senior Researcher at Microsoft, working on scalable computing. He worked on many database and transaction processing systems at IBM, Tandem, Digital, and Microsoft. With Andreas Reuter, he co-authored the book Transaction Processing Concepts and Techniques. He recently received the ACM A.M. Turing Award for his contributions to transaction processing. 
ANDREAS REUTER is the Scientific Director of the European Media Laboratory (EML) in Heidelberg and Dean of the School of Information Technology at the International University in Germany at Bruchsal. He has been an independent consultant, a Professor at Kaiserslautern and at Stuttgart where he founded the Institute of Parallel and Distributed High Performance Systems. He was Computer Science Dean and later Vice-President of Stuttgart University.
PHIL BERNSTEIN is a senior researcher in the Microsoft Research Database Group and an architect of the Microsoft Repository. His research is in the areas of databases, particularly on repository systems (object databases, information models, version and configuration management) and transaction processing (concurrency control and recovery). He coauthored the books Principles of Transaction Processing and Concurrency Control and Recovery in Database Systems, and teaches at U. Washington.
DIETER GAWLICK is an architect in Oracle's Database Server development team. He focuses on extending the database technology to support messaging and EAI (Enterprise Application Integration). Before joining Oracle, Dieter developed a workflow system at Digital, and worked on high performance I/O technology at Amdahl. During his time at IBM, Dieter developed OLTP and database technology with the focus on high performance and high availability. Dieter is the inventor and architect of IBM's IMS Fast Path.
DAN HARKEY, along with Robert Orfali and Jeri Edwards, is co-author of the best-selling books Client/Server Survival Guide and Client/Server Programming with Java and CORBA. Dan also heads the CORBA/Java distributed objects master's program and lab at San Jose State University and is a distributed objects consultant for IBM.
GREG HOPE is an architect at Microsoft working on the COM+, MTS, and DTC technologies (http://www.microsoft.com/com). Prior to joining Microsoft's "viper" project, Greg built a variety of distributed OLTP systems, including co-founding Prologic in 1984, and implementing PROBE and Ovation, a PC-based (MS-DOS / Windows NT) distributed TP monitor and retail banking system in production at over 300 banks in 20 countries (http://www.prologiccorp.com).
CHARLES LEVINE is a Program Manager in the SQL Server Performance Group at Microsoft, focused on benchmark and ISV performance issues.  Charles has been active in the Transaction Processing Performance Council (TPC) since 1989, contributing to the definitions of TPC-A, B, and C.  For the last four years, Charles has served as Chairman of the TPC. 
C. MOHAN joined IBM Research in 1981. He was named an IBM Fellow in 1997 for contributions to transaction management. He is an IBM Master Inventor with 32 patents. His research results are implemented in numerous IBM and non-IBM products. He is the primary inventor of the ARIES family of recovery and locking methods, and the industry-standard Presumed Abort commit protocol. 
SUSAN MALAIKA is a senior software engineer at IBM's Santa Teresa DB2 development group. She specializes in distributed DB2, stored procedures, XML and the Web. Before joining DB2 in 1997, Susan was an Internet specialist in the UK and initiated projects that provide Web access to IBM systems. Susan also worked in the CICS transaction processing development group in the area of recovery, long running transactions, interfaces to database management systems, and distributed applications. 


Transaction Processing:  Concepts and Techniques

August 5-9, 1996


Text:

Transaction Processing: Concepts and Techniques, Gray and Reuter

For Whom/Prerequisites:

Anyone interested in distributed computer systems, web servers, database systems, or transaction processing systems. The course combines both the theory and practice of such systems, so there is something here for both the academic and the practitioner. A degree in Computer Science or four years of industrial experience implementing applications or systems should be sufficient background to grasp most of the material.

Course Outline

This course presents the basic concepts and implementation techniques of transaction processing systems. The key message is: Transaction processing is a prerequisite for mastering the complexity of distributed, heterogeneous systems. As such, it is the enabling technology for client-server computing.

1. INTRODUCTION

Historical Perspective; What is a Transaction Processing System as viewed by administrator, programmer, user; Transaction Processing System Feature List; Application Development Features; Repository Features; TP Monitor Features; Data Base Features; Client-Server and Network Features; Operations Features; Education and Testing Features

2. BASIC COMPUTER SCIENCE TERMINOLOGY

Basic Hardware; Basic Software - Address Spaces, Processes, Sessions; Clients and Servers; Naming; Authentication/Authorization; Scheduling and Performance; Files; Performance; Transaction Processing Standards

3. FAULT TOLERANCE

Definitions; Empirical Studies; Typical Module Failure Rates; Hardware Approaches to Fault Tolerance; N-Plex Idea; Failfast vs. Failsoft; Software Fault Tolerance; N-Version Programming and Software Fault Tolerance; Transactions and Software Fault Tolerance; Fault Model and Software Fault Masking; Storage, Processes, Messages; General Principles.

4. TRANSACTION MODELS

Atomic Actions and Flat Transactions; Spheres of Control; A Notation for Explaining Transaction Models; Flat Transaction With Savepoints (Chained Transactions); Nested Transactions; Distributed Transactions; Multi-Level Transactions; Open Nested Transactions; Long-Lived Transactions; Transactional Remote Procedure Calls; Resource Managers; Interfaces Between Resource Managers and the TP-Monitor; Functional Principles of the TP-Monitor; Managing Request and Response Queues; Other Tasks of the TP-Monitor (Load Balancing, Authentication and Authorization, Restart Processing, Context Management)

5. TRANSACTION PROCESSING MONITORS and OBJECT REQUEST BROKERS - An Overview

The Role of TP Monitors in Transaction Systems; The Transaction-Oriented Computing Style; The Transaction Processing Services; TP System Process Structure; The Structure of a TP Monitor; Transactional Remote Procedure Calls; Examples of the Transaction-Oriented Programming Style; Relationship to Object Request Brokers

6. QUEUED TRANSACTION PROCESSING AND WORKFLOW

This part of the seminar covers the basic concepts of messaging: messages, queues, and propagation. Topics include: Operational interfaces, Administrative interfaces, Message payloads, Scenarios/applications, Standards, and Products.

7. ISOLATION CONCEPTS

First and Second Laws of Concurrency Control; The Dependency Model of Isolation; Isolation: The Application Programmer's View; Isolation Theorems; Degrees of Isolation Theorem; SQL and Degrees of Isolation; Phantoms and Predicate Locks; Granular Locks; Key-Range Locking; The DAG Locking Protocol; Locking Heuristics; Nested Transaction Locking; Scheduling and Deadlock; Probability of Deadlock; Exotics; Field Calls, Escrow, Optimistic and Timestamp Locking
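
As a small taste of the granular locking topic, here is a minimal sketch of a lock-mode compatibility check. It assumes the commonly presented five modes (IS, IX, S, SIX, X) and omits update-mode locks; the names `COMPATIBLE` and `can_grant` are illustrative and not taken from the course materials.

```python
# Compatibility of granular lock modes, as commonly presented:
# intention share (IS), intention exclusive (IX), share (S),
# share + intention exclusive (SIX), and exclusive (X).
# COMPATIBLE[held] lists the modes another transaction may be granted
# on the same node while `held` is held.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode held by others."""
    return all(requested in COMPATIBLE[held] for held in held_modes)
```

For instance, `can_grant("S", ["IX"])` is false under this matrix: a share lock on a file cannot coexist with another transaction's intention to update records inside it.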

8. LOCK MANAGER IMPLEMENTATION

The Need For Parallelism Within the Lock Manager; Semaphores; Lock Manager (externals, internals); Conversion and Escalation; Savepoints, Commit, and Rollback; Locking at System Restart; Deadlock Detection

9. LOG AND RESOURCE MANAGER CONCEPTS

Uses of the Log; Log Tables; Public Interface to the Log; Implementation Details of Log Reads and Writes; Careful Writes, Serial or Ping-Pong; Group Commit, Batching, Boxcaring; WADS Writes; Multiple Logs per Transaction Manager; Log Restart; Archiving the Log; Copy Aside vs. Copy Forward; Electronic Vaulting and Change Accumulation; Logging in a Client-Server Architecture; Transaction Manager Interfaces; Transaction Manager Functions; Transaction Rollback; Restart; Media Recovery; Transactional Resource Manager Concepts; The Do-Undo-Redo Protocol; Communication Session Recovery; Real Operations; Idempotence and Testable; Logging Styles; Value Logging; Logical Logging; Shadows; Physiological Logging; The One-Bit Resource Manager; Logging Rules; The Fix Rule; Write-Ahead Log (WAL); Force-Log-at-Commit; Compensation Log Records; Idempotence of Physiological REDO; Physiological Logging and Shadows Compared; Two-Phase Commit - Making Computations Atomic; Centralized System Case; Distributed Transactions and Two-Phase Commit; Incoming and Outgoing Transactions; In-Doubt Transactions
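
To make the value logging and do-undo-redo ideas concrete, the following is a toy sketch, not the implementation taught in the course. It assumes an in-memory list stands in for the durable log, that each update record carries both old and new values, and that locking keeps uncommitted updates isolated from other transactions. `ToyStore` and `recover` are hypothetical names.

```python
class ToyStore:
    """Toy resource manager: a dict of values plus an append-only log."""

    def __init__(self):
        self.log = []    # stands in for the durable log
        self.pages = {}  # stands in for the (volatile) data pages

    def update(self, txn, key, new_value):
        old_value = self.pages.get(key)
        # Write-ahead rule: the log record, with undo (old) and redo (new)
        # values, is appended before the data itself is changed.
        self.log.append(("update", txn, key, old_value, new_value))
        self.pages[key] = new_value

    def commit(self, txn):
        # A real system would force the log to stable storage here.
        self.log.append(("commit", txn))


def recover(log):
    """Rebuild state from the log: redo everything, then undo uncommitted work."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    pages = {}
    # Redo pass: reapply every update in log order.
    for rec in log:
        if rec[0] == "update":
            _, _txn, key, _old, new = rec
            pages[key] = new
    # Undo pass: roll back updates of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            if old is None:
                pages.pop(key, None)
            else:
                pages[key] = old
    return pages
```

If transaction T1 sets `a` to 1 and commits, and T2 then sets `a` to 2 and `b` to 5 without committing, `recover` returns a state in which only T1's update survives.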

10. TRANSACTION MANAGER: CONCEPTS AND IMPLEMENTATION

Normal Processing; Transaction Manager Data Structures; Begin_Work(); Remote Commit_Work(), Prepare() and Commit(); Save_Work() and Read_Context(); Rollback_Work(); Checkpoint; System Restart; Resource Manager Restart; The Two-Checkpoint Approach; Why Restart Works; Distributed Transaction Resolution - 2 Phase Commit at Restart; Accelerating Restart; Archive Recovery; Heterogeneous Commit Coordinators; Closed vs. Open Transaction Managers; Interoperating with a Closed Transaction Manager; Writing a Gateway to an Open Transaction Manager; Highly Available (Non-blocking) Commit Coordinators; Heuristic Decisions Resolve Blocked Transaction Commit; Transfer-of-Commit; Optimizations of Two-Phase Commit; Read-Only Commit Optimization; Lazy Commit Optimization; Linear Commit Optimization; Disaster Recovery at a Remote Site; System Pair Takeover; Session Switching at Takeover; 1-Safe, 2-Safe, and Very Safe; Catchup After Failure
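
The prepare and commit steps listed above can be sketched roughly as follows. This is a simplified, single-process illustration that ignores logging, timeouts, and restart; the `Participant` class and `two_phase_commit` function are hypothetical names chosen for the example, not interfaces from the course text.

```python
class Participant:
    """Hypothetical resource manager that votes during the prepare phase."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "active"

    def prepare(self):
        # A real resource manager would force its log before voting yes.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the decision, "commit" or "abort"."""
    # Phase 1: ask each participant to prepare; a single "no" vote aborts.
    decision = "commit"
    for p in participants:
        if not p.prepare():
            decision = "abort"
            break
    # A real coordinator would force the decision to its own log here.
    # Phase 2: tell every participant the outcome.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.rollback()
    return decision
```

With two participants that both vote yes, the function returns "commit" and both end in the committed state; if either votes no, all participants are rolled back.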

11. CICS TRANSACTION PROCESSING AND THE INTERNET

CICS history, interfaces, components, and an example; Web application history, interfaces, components, and an example system. System components of CICS; Transaction, task and session model; A CICS application; Performance; Transaction Processing and the Internet; History of Web applications; External interfaces; System components for Web applications; Transaction, task and session model; A Web application; Performance; Recent extensions in CICS; Recent extensions for Web applications; Integrating traditional transaction processing applications with the Web; Comparing transaction processing and Web applications

12. FILE AND BUFFER MANAGEMENT

The Role of the File System in the Overall System Architecture; External Storage vs Main Memory; Levels of Abstraction in a Transactional File and Database Manager; Media and File Management; Objects and Operations of the Basic File System; Managing Disk Space; Catalog Management For Low-Level File Systems; Buffer Management; Functional Principles of the Database Buffer; Logging and Recovery from the Buffer's Perspective; Optimizing Buffer Manager Performance; Exotics; Side Files; Single Level Storage

13. COM+/MTS

History of COM/DTC/MTS/COM+; Programming model; Context and interception; Servers; Transactions; Administration; Security; Queuing; Events; Load Balancing; In-memory Database; Performance and Scalability; Integration; Interoperability; Futures.

14. CORBA/EJB

The Enterprise JavaBeans (EJB) Framework; The EJB CORBA connection; EJB transactions; Writing and deploying your first EJB; EJB Server and Tool Vendors: Meet the players

15. DATABASE REPLICATION STRATEGIES

Performance and availability goals; Replicated data vs. processing; Synchronous update propagation; Handling communication failures; Single-master, primary-copy replication; Quorum consensus; Multi-master replication; One-copy serializability; Wingman algorithm; Example - SQL Server 7.0 replication

16. TUPLE-ORIENTED FILE SYSTEMS

Mapping Tuples into Pages; Internal Organization of Pages; Free Space Administration in a File; Tuple Identification; Physical Tuple Management; Representing Attributes & Tuples; Tuple Fragmentation; Complex Tuples & Long Attributes; Multi-valued Attributes and Repeating Groups; File Organization; System-Sequenced Files; Entry-Sequenced Files; Relative Files; Key-Sequenced Files and Hash Files; Clustered Files; Accessing Tuples via Scans; Partitioned Files; Using Transactions to Maintain the File System

17. ACCESS PATHS

Techniques to Implement Associative Access Paths; Hashing; B-Trees; Synchronization on B-Trees; Recovering Operations on B-Trees; Sample Implementation; Exotics; Extendible Hashing; The Grid File; Holey Brick B-Trees

18. THE EVOLUTION OF GROUPWARE FOR TP/BUSINESS APPLICATIONS

Case Study with Lotus Domino/Notes; Web server functionality and internet standards compliance; Scalability and recovery improvements; Enterprise integration with relational, ERP and legacy sources; Semi-structured data management; Workflow management; Asynchronous and synchronous collaboration; CORBA, OLE and Java enablement; Replication and high availability; Agent technology.

19. TP AND DB PERFORMANCE METRICS

What makes a good benchmark? History of TPC; TPC-C overview, transactions, schema, workflow, ACID, rules of thumb, and competitive landscape; TPC-D overview, schema, scaling, query set, update streams, metrics, competitive landscape; futures of TPC-C, TPC-D, TPC-W.