Jim Gray Summary Home Page

Microsoft eScience Group

As you may be aware, Jim Gray has gone missing.


We (his colleagues in Microsoft Research) have heard from many of his collaborators who had projects and collaborations underway with him and are unsure how to proceed. If you find yourself in this situation, please email grayproj@microsoft.com and we will follow up with you to find the best way forward.

Jim Gray is a researcher and manager of Microsoft Research's eScience Group. His primary research interests are in databases and transaction processing systems, with a particular focus on using computers to make scientists more productive. He and his group work in the areas of astronomy, geography, hydrology, oceanography, biology, and health care. He continues a long-standing interest in building supercomputers from commodity components, thereby reducing the cost of storage, processing, and networking by factors of 10x to 1000x over low-volume solutions. This includes work on building fast networks, on building huge web servers from CyberBricks, and on building very inexpensive, very high-performance storage servers.

Jim is also working with the astronomy community to build the world-wide telescope and has been active in building online databases such as http://terraService.Net and http://skyserver.sdss.org. When the entire world's astronomy data is on the Internet and accessible as a single distributed database, the Internet will be the world's best telescope. This is part of the larger agenda of getting all information online and easily accessible (digital libraries, digital government, online science, ...). More generally, he is working with the science community (oceanography, hydrology, environmental monitoring, ...) to build the world-wide digital library that integrates all the world's scientific literature and data in one easily accessible collection. He is active in the research community, is an ACM, NAE, NAS, and AAAS Fellow, and received the ACM Turing Award for his work on transaction processing. He also edits a series of books on data management.

What's new? 

  “Performance of a Sun X4500 under Windows, NTFS, and SQL Server 2005” (pdf). Sun loaned this storage/compute brick to JHU for some of the eScience Internet services we are building there. This preliminary performance report shows it to be a balanced system (4 CPUs, 16 GB RAM, 48 disks, 24 TB, all in 4U using 800 W). Here is the spreadsheet with the numbers for the graphs, and here is a zip of the test tools and scripts.

A radical view of Flash Disks: document and talk.

 “SkyServer Traffic Report – The First Five Years” is a study of the traffic on Skyserver.sdss.org, an eScience website. Done jointly with Vik Singh, Alex Szalay, Ani Thakar, Jordan Raddick, Bill Boroski, Svetlana Lebedeva, and Brian Yanny, it analyzes the traffic to see how people and programs use the site, the data, and the batch job system.

“Cross-Matching Multiple Spatial Observations and Dealing with Missing Data,” with Alex Szalay, Tas Budavári, Robert Lupton, Maria Nieto-Santisteban, and Ani Thakar, explains how to spatially correlate observations of the same area (of the sky, the earth, ...).
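The paper's algorithms run inside the database; as a toy illustration of the underlying idea only (not the paper's zone-based SQL method), here is a minimal Python sketch that pairs sources from two catalogs by angular separation and reports None for sources with no counterpart, the "missing data" case:

```python
import math

def ang_sep(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sky positions (in degrees)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dr = r2 - r1
    # atan2 form is numerically stable at small separations
    num = math.hypot(math.cos(d2) * math.sin(dr),
                     math.cos(d1) * math.sin(d2)
                     - math.sin(d1) * math.cos(d2) * math.cos(dr))
    den = math.sin(d1) * math.sin(d2) + math.cos(d1) * math.cos(d2) * math.cos(dr)
    return math.degrees(math.atan2(num, den))

def cross_match(cat_a, cat_b, radius_deg=1.0 / 3600):
    """Pair each (ra, dec) source in cat_a with its nearest cat_b source
    within radius_deg; unmatched sources get None (the missing-data case)."""
    matches = []
    for ia, (ra_a, dec_a) in enumerate(cat_a):
        best, best_sep = None, radius_deg
        for ib, (ra_b, dec_b) in enumerate(cat_b):
            s = ang_sep(ra_a, dec_a, ra_b, dec_b)
            if s <= best_sep:
                best, best_sep = ib, s
        matches.append((ia, best))
    return matches
```

A real cross-match over hundreds of millions of sources replaces the inner loop with a spatial index (zones or HTM trixels); this quadratic scan only shows the matching rule.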

“Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service,” with Katalin Szlavecz, Andreas Terzis, Razvan Musaloiu-E, Joshua Cogan, Sam Small, Stuart Ozer, Randal Burns, and Alex Szalay of JHU, describes an end-to-end soil monitoring system we built and deployed in a Baltimore urban forest. Soil moisture and temperature readings from the sensors are stored and calibrated in a database. The measurement database is published through Web Services interfaces. In addition, analysis tools let scientists analyze current and historical data and help manage the sensor network.

“GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management,” with Naga K. Govindaraju, Ritesh Kumar, and Dinesh Manocha of UNC, describes a sorter that uses the Graphics Processing Unit (GPU) to sort very fast. I helped with the IO and with writing this report so that I could read it :). GPUs have 10x the memory bandwidth and processing power of the CPU, and the gap is widening, so we have to learn how to use them. This is my first experience in this new world -- it's a vector coprocessor, it's a SIMD machine, it's really different -- and so a lot of fun. You get to rethink all your assumptions.
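GPU sorters of this era are built on sorting networks, whose fixed, data-independent compare-exchange pattern maps naturally onto SIMD hardware. As an illustration of that pattern (a CPU sketch of the classic bitonic network, not GPUTeraSort's actual implementation):

```python
def bitonic_sort(a):
    """In-place bitonic sorting network; len(a) must be a power of two.
    Every (k, j) stage applies the same compare-exchange to n/2 independent
    pairs, which is why the pattern is SIMD/GPU friendly."""
    n = len(a)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:              # compare-exchange distance within a stage
            for i in range(n):
                l = i ^ j         # partner index
                if l > i:
                    # (i & k) == 0 means this pair sorts ascending
                    if ((i & k) == 0 and a[i] > a[l]) or \
                       ((i & k) != 0 and a[i] < a[l]):
                        a[i], a[l] = a[l], a[i]
            j //= 2
        k *= 2
    return a
```

On a GPU each inner `for i` loop runs as one fully parallel pass, since none of the pairs at a given (k, j) stage overlap.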

“Empirical Measurements of Disk Failure Rates and Error Rates,” with Catharine van Ingen, describes moving two petabytes using inexpensive computers and reports the errors we observed -- SATA uncorrectable read errors happen, but they are not the main problem.

Three papers on doing a modern Finite Element Analysis System using off-the-shelf database and visualization tools for data management and data analysis:
“Supporting Finite Element Analysis with a Relational Database Backend, Part I: There is Life beyond Files.” Gerd Heber (Cornell Theory Center) explains how to represent FEA metadata and data using an SQL database.
“Supporting Finite Element Analysis with a Relational Database Backend, Part II: Database Design and Access.” Gerd Heber explains the details of getting data into SQL, representing the mesh, and doing point-in-tetrahedron and other standard queries.
“Supporting Finite Element Analysis with a Relational Database Backend, Part III: OpenDX – Where the Numbers Come Alive,” Gerd Heber, Chris Pelkie, Andrew Dolgert, Jim Gray, David Thompson, MSR-TR-2005-151, November 2005. Explains how the visualization tools work and how they connect to SQL.
I helped Gerd with some DB and spatial DB things, and helped write this paper, but it is really Gerd's work. There is one more installment in the pipeline.
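The point-in-tetrahedron query mentioned in Part II reduces to sign checks on determinants. Here is a minimal sketch of that standard signed-volume test (an illustration of the geometry, not the papers' SQL implementation):

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron (a, b, c, d): the 3x3
    determinant of the edge vectors from a."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz), (dx, dy, dz) = a, b, c, d
    m = [(bx - ax, by - ay, bz - az),
         (cx - ax, cy - ay, cz - az),
         (dx - ax, dy - ay, dz - az)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def point_in_tetrahedron(p, a, b, c, d):
    """p lies inside (or on the boundary of) tet (a, b, c, d) iff replacing
    each vertex in turn by p yields four sub-tetrahedra with the same
    orientation as the original."""
    ref = signed_volume(a, b, c, d)
    vols = [signed_volume(p, b, c, d), signed_volume(a, p, c, d),
            signed_volume(a, b, p, d), signed_volume(a, b, c, p)]
    return all(v * ref >= 0 for v in vols)
```

The same four determinants also give the barycentric coordinates of p, which the interpolation queries need anyway.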

“Petascale Computational Systems: Balanced CyberInfrastructure in a Data-Centric World” (pdf) is a letter from Gordon Bell, Alex Szalay, and me to the NSF Cyberinfrastructure Directorate. It argues that computational science is becoming data intensive. NSF should support balanced systems: not just CPU farms, but also petascale IO and networking. NSF should allocate resources to support a balanced Tier-1 through Tier-3 national cyberinfrastructure.

Alex Szalay and Gyorgy Fekete of JHU, Bonnie Freiberg of SQL Server, and I worked to get a spatial indexing sample into SQL Server 2005. This is a public-domain implementation of the HTM algorithms documented in “Indexing the Sphere with the Hierarchical Triangular Mesh,” pdf (MSR-TR-2005-123). The library itself, with examples, is described in “Using Table Valued Functions in SQL Server 2005 To Implement a Spatial Data Library,” pdf (MSR-TR-2005-122).
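The HTM idea is to recursively split each spherical triangle at its edge midpoints, appending a child index per level to form a trixel ID. A toy Python sketch of that subdivision (the child ordering and starting triangle here are illustrative, not the library's actual encoding):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def midpoint(a, b):
    # Edge midpoint, projected back onto the unit sphere
    return normalize(tuple(x + y for x, y in zip(a, b)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def inside(p, tri, eps=-1e-12):
    """p is in the spherical triangle iff it is on the inner side of all
    three great-circle edge planes (vertices ordered counterclockwise)."""
    v0, v1, v2 = tri
    return (dot(p, cross(v0, v1)) >= eps and
            dot(p, cross(v1, v2)) >= eps and
            dot(p, cross(v2, v0)) >= eps)

def trixel_path(p, tri, depth):
    """Child indices (0-3) chosen at each subdivision level -- the digits
    of an HTM-style trixel ID for unit vector p within triangle tri."""
    path = []
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for i, child in enumerate(children):
            if inside(p, child):
                path.append(i)
                tri = child
                break
    return path
```

The real library starts from the eight octahedron faces and packs these per-level digits into a 64-bit integer, so a range of trixel IDs corresponds to a connected patch of sky.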

Twenty years ago today, the Datamation article “A Measure of Transaction Processing” appeared. Charles Levine and I thought it was time to benchmark a PC -- my 2-year-old TabletPC, to be exact. We ran TPC-B (DebitCredit without the message handling) and got about 8 ktps (!). The 4-page report and the 6-page script that builds the database and runs the benchmark in half an hour is “Thousands of DebitCredit Transactions-Per-Second: Easy and Inexpensive.” Abstract: A $2k computer can execute about 8k transactions per second. This is 80x more than one of the largest US banks' 1970s traffic -- it approximates the total US 1970s financial transaction volume. Very modest modern computers can easily solve yesterday's problems. A second paper with a broader perspective, “A Measure of Transaction Processing 20 Years Later,” appeared as MSR-TR-2005-57 and in the June 2005 IEEE Data Engineering Bulletin.

eScience: My involvement with the astronomers continues to be fun. We have built a batch system for long-running queries: “Batch is Back: CasJobs, Serving Multi-TB Data on the Web,” William O’Mullane, Nolan Li, Maria A. Nieto-Santisteban, Ani Thakar, Alexander S. Szalay, Jim Gray, February 2005. We have been musing about where scientific data management is going in “Scientific Data Management in the Coming Decade,” Jim Gray, David T. Liu, Maria A. Nieto-Santisteban, Alexander S. Szalay, Gerd Heber, David DeWitt; writing down our lessons learned from the SkyServer in “Where the Rubber Meets the Sky: Bridging the Gap between Databases and Science,” Jim Gray, Alexander S. Szalay; and experimenting with big spatial search queries using databases rather than file systems (and so going 50x faster): “When Database Systems Meet the Grid,” María A. Nieto-Santisteban, Alexander S. Szalay, Aniruddha R. Thakar, William J. O’Mullane, Jim Gray, James Annis. A paper describing the spatial data algorithms I have been developing with Alex Szalay et al., “There Goes the Neighborhood: Relational Algebra for Spatial Data Search,” is interesting, and there are three more such papers in the pipeline.

There are lots more, of course. These are some thought pieces in preparation for LSST and the World-Wide Telescope (Virtual Observatory): Web Services for the Virtual Observatory describes how we hope to use web services to build the world-wide telescope. Petabyte Scale Data Mining: Dream or Reality? is a thought experiment, scaling our current efforts 100-fold to see if we can handle petabyte databases in the year 2007. Online Scientific Data Curation, Publication, and Archiving explores the issues of publishing and curating scientific data. The SkyQuery (http://www.skyquery.net) web service, built by Malik, Budavari, and Szalay at Johns Hopkins, federates 15 different astronomy archives and uses SdssCutout as a component. I continue to work on the SkyServer: The SDSS SkyServer – Public Access to the Sloan Digital Sky Survey Data summarizes the SkyServer website design, database design, and website usage. Data Mining the SDSS SkyServer Database describes our astronomy data mining efforts.

Storage Architecture: Peter Kukol and others have been working on moving bulk data. The goal is to move 1 gigabyte per second from CERN (Geneva, Switzerland) to Pasadena, California, so that the physicists in California can see the data as it comes out of the Large Hadron Collider (LHC), which will come online in 2008. Many other science disciplines need this as well. “Sequential File Programming Patterns and Performance with .NET,” Peter Kukol, Jim Gray, shows how to do local IO fast: it describes and measures programming patterns for sequential file access in the .NET Framework. The default behavior provides excellent performance on a single disk -- 50 MBps both reading and writing. Using large request sizes and file pre-allocation has quantifiable benefits. .NET unbuffered IO delivers 800 MBps on a 16-disk array, but buffered IO delivers about 12% of that performance. Consequently, high-performance file and database utilities are still forced to use unbuffered IO for maximum sequential performance. The report is accompanied by downloadable source code that demonstrates the concepts and the code used to obtain these measurements. With Caltech (Harvey Newman et al.), CERN, AMD, Newisys, and the Windows™ networking group, we have been working to move data from CERN to Caltech (11,000 km) at 1 GBps (one gigabyte per second). We have not succeeded yet. Our progress is reported in Gigabyte Bandwidth Enables Global Co-Laboratories (4.2 MB MS Word; pdf of slides + transcript, 2.4 MB), a presentation Harvey Newman and I gave at the Windows Hardware Engineering Conference, Seattle, WA, 3 May 2004. Peter Kukol’s “Sequential Disk IO Tests for GBps Land Speed Record” tells how we move data the first and last meter at about 2 GBps.
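The pattern the report measures is easy to reproduce in miniature: sequential transfers with a fixed request size, timed separately for write and read. A rough Python sketch (portable Python cannot bypass the OS cache the way unbuffered .NET IO does, so the read figure will be flattered by caching):

```python
import os
import time

def sequential_throughput(path, total_mb=64, request_kb=1024):
    """Write, then read, total_mb MB sequentially in request_kb-KB requests.
    Returns (write_MBps, read_MBps). fsync keeps the write timing honest;
    the read may still be served from the OS page cache."""
    block = b"\0" * (request_kb * 1024)
    n = total_mb * 1024 // request_kb
    t0 = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(n):
            f.write(block)
        os.fsync(f.fileno())        # force data to the disk before stopping the clock
    t1 = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(len(block)):
            pass
    t2 = time.perf_counter()
    return total_mb / (t1 - t0), total_mb / (t2 - t1)
```

Sweeping `request_kb` from 4 up to a few thousand reproduces the paper's main observation in miniature: throughput climbs with request size until the disk, not the software, is the bottleneck.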

TerraServer: Our investigation of CyberBricks continues, with various whitepapers about our experiences. “TerraServer Bricks – A High Availability Cluster Alternative,” Tom Barclay, Wyman Chong, Jim Gray, describes the migration of the TerraServer to a brick hardware design and our experience operating it over the last year. It makes an interesting contrast to “TerraServer Cluster and SAN Experience,” Tom Barclay, Jim Gray, which describes our experience operating the TerraServer SAN cluster as a “classic” enterprise configuration for three years. TerraService.NET: An Introduction to Web Services tells how Tom Barclay converted the TerraServer to a web service, and how the USDA uses that web service. “A Quick Look at SATA Disk Performance,” Tom Barclay, Wyman Chong, Jim Gray, investigates the use of storage bricks: low-cost, commodity components for multi-terabyte SQL Server databases. One issue has been the shortcomings of Parallel ATA (PATA) disks; Serial ATA (SATA) drives address many of these problems. This article evaluates SATA drive performance and reliability. Each disk delivers about 50 MBps sequential, and about 75 read IOps and 130 write IOps on random IO. It is the sequel to “TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange,” which describes the storage bricks we use for data interchange, archiving, and backup/restore, and gives price, performance, and some rationale.

Deep Thought :) : An extended abstract of the keynote talk at ACM SIGMOD 2004, Paris, France, “The Revolution in Database Architecture,” enumerates the enormous changes happening to database system architecture.
“Consensus on Transaction Commit,” Jim Gray, Leslie Lamport, MSR-TR-2003-96, 32 pp. The distributed transaction commit problem requires reaching agreement on whether a transaction is committed or aborted. The classic Two-Phase Commit protocol blocks if the coordinator fails. Fault-tolerant consensus algorithms also reach agreement, but do not block whenever any majority of the processes is working. Running a Paxos consensus algorithm on the commit/abort decision of each participant yields a transaction commit protocol that uses 2F+1 coordinators and makes progress if at least F+1 of them are working. In the fault-free case, this algorithm requires one extra message delay but has the same stable-storage write delay as Two-Phase Commit. The classic Two-Phase Commit algorithm is obtained as the special F=0 case of the general Paxos Commit algorithm.
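The quorum arithmetic in that abstract is the heart of it: with 2F+1 acceptors, at most one value can be recorded by F+1 of them, so each participant's registered vote stays unambiguous even with F failures. A toy model of just the decision rule (ignoring ballots, leaders, and the actual message flow of the protocol):

```python
from collections import Counter

def paxos_commit_outcome(acceptor_logs, f):
    """acceptor_logs maps each participant to the list of vote values
    ('commit' / 'abort') recorded so far by its 2f+1 acceptors.
    A vote is chosen once f+1 acceptors record it; since f+1 is a majority
    of 2f+1, at most one value per participant can ever be chosen.
    The transaction commits iff every participant's chosen vote is 'commit'."""
    decisions = []
    for logs in acceptor_logs.values():
        counts = Counter(logs)
        if counts.get('abort', 0) >= f + 1:
            decisions.append('abort')
        elif counts.get('commit', 0) >= f + 1:
            decisions.append('commit')
        else:
            decisions.append('undecided')    # quorum not yet reached
    if 'abort' in decisions:
        return 'abort'
    return 'commit' if all(d == 'commit' for d in decisions) else 'undecided'
```

Note how an 'undecided' participant stalls the outcome without blocking it permanently: once any F+1 of its acceptors respond, the decision becomes determined, which is exactly the non-blocking property the paper proves.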

Grid Computing: Distributed Computing Economics discusses the economic tradeoffs of doing Grid-scale distributed computing (WAN rather than LAN clusters). It argues that a computation must be nearly stateless and use more than 10 hours of CPU time per GB of network traffic before outsourcing it makes economic sense. This is part of the more general discussion of Grid computing. My views are presented in the memo Microsoft and Grid Computing, a PowerPoint presentation, Web Services, Large Databases, and What Microsoft Is Doing in the Grid Computing Space, and an interview, Microsoft and Grid Computing, for Grid-Middleware Spectra.
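The 10-hour figure falls out of simple price arithmetic. With illustrative 2003-era prices (roughly $1 to ship a gigabyte across the WAN versus around $0.10 for a CPU-hour; these defaults are assumptions for the sketch, not the paper's exact price table):

```python
def breakeven_cpu_hours_per_gb(wan_dollars_per_gb=1.0, cpu_dollars_per_hour=0.10):
    """Outsourcing a computation only pays off if the CPU cost saved per GB
    shipped exceeds the network cost of shipping that GB; the ratio of the
    two prices is the break-even CPU-hours per GB."""
    return wan_dollars_per_gb / cpu_dollars_per_hour
```

At these prices a job must burn at least 10 CPU-hours per gigabyte moved before sending it across the WAN beats running it locally; the break-even point shifts as either price changes, which is the paper's larger point.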