The World-Wide Telescope: Building the virtual astronomy observatory of the future (4/5/2002)
Astronomers are collecting huge quantities of data, and they are
starting to federate it. They held a Virtual Observatory
conference in Pasadena to discuss the scientific and technical aspects
of building a virtual observatory that would give anyone, anywhere, access to all
the online astronomy data. My contribution (ppt, doc, pdf) was a computer
science technology forecast. The Virtual Observatory will create a "virtual"
telescope on the sky (with great response time): information at your fingertips
for astronomers, and for everyone else. A single-node prototype is at http://skyserver.sdss.org/.
More recently, Tanu Malik, Tamas Budavari, Ani Thakar, and Alex Szalay have
built a 3-observatory SkyQuery
(http://SkyQuery.net/) federation using .Net web services (I
helped a little).
Alex and I have been writing papers about this. A "general audience" piece on
the World-Wide Telescope appeared in Science Magazine, V.293 pp. 2037-2038, 14 Sept
2001 (MS-TR-2001-77, word or pdf). More recently we wrote two papers describing
the SkyServer. The first describes how the SkyServer is built and how it is used:
"The SDSS SkyServer - Public Access to the Sloan Digital Sky Survey Data."
A second paper (read only if you loved the first one) goes into gory detail
about the SQL queries we used in data mining; it is MSR TR 2002-01: "Data
Mining the SDSS SkyServer Database." I have been giving lots of
talks about this.
Tom Barclay, Alex Szalay, and I gave an
overview talk at the Microsoft Faculty Summit that sketches this idea.
I gave a talk on computer technology, arguing for online disks (rather than
nearline tape), cheap processor and storage CyberBricks, and heavy use of
automatic parallelism via database technology. The talk's slides are
PowerPoint (330KB), and an extended abstract of the talk is available as
Word (330KB) and
pdf (200KB). The genesis of my interest in this is documented in the
paper: "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan
Digital Sky Survey", Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray
MS Word (220KB) or
PDF (230 KB).
The Future of super-computing and computers (Gordon Bell was the principal author) (8/1/2001)
Gordon Bell assesses the state of supercomputing every 5 years or so. This time I helped,
and argued with him a bit. This discussion focuses on technical computing, not
AOL or Google or Yahoo! or MSN, each of which would be in the top 10 of the
Top500 if they cared to enter. After 50 years of building high performance
scientific computers, two major architectures exist: (1) clusters of
“Cray-style” vector supercomputers; (2) clusters of scalar uni- and
multi-processors. Clusters are in transition from (a) massively parallel
computers and clusters running proprietary software to (b) proprietary clusters
running standard software, and (c) do-it-yourself Beowulf clusters built from
commodity hardware and software. In 2001, only five years after its
introduction, Beowulf has mobilized a community around a standard architecture
and tools. Beowulf’s economics and sociology are poised to kill off the other
two architectural lines – and will likely affect traditional super-computer
centers as well. Peer-to-peer and Grid communities provide significant
advantages for embarrassingly parallel problems and sharing vast numbers of
files. The Computational Grid can federate systems into supercomputers far
beyond the power of any current computing center. The centers will become
super-data and super-application centers. While these trends make
high-performance computing much less expensive and much more accessible, there
is a dark side. Clusters perform poorly on applications that require large
shared memory. Although there is vibrant computer architecture activity on
microprocessors and on high-end cellular architectures, we appear to be
entering an era of super-computing mono-culture. Investing in next generation
software and hardware supercomputer architecture is essential to improve the
efficiency and efficacy of systems.
Digital Immortality doc or pdf (10/1/2000)
Gordon and I wrote a piece on the immortality spectrum of passing
knowledge on to future generations: one-way immortality at one end, and, at the
other, actually interacting with future generations via two-way immortality, where
part of you moves to cyberspace and continues to learn and evolve. It is a
thought-piece for a "special" CACM issue.
A River System (Tobias Mayr of Cornell) (12/14/2000)
Data rivers are a good abstraction for processing large numbers
(billions) of records in parallel. Tobias Mayr, a PhD student at Cornell
who visited BARC in the fall of 2000, designed and started building a river
system. This small web
site describes the current status of that work.
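Tobias's implementation is described on that site; purely as an illustration (the names below are hypothetical, not his API), the core of the abstraction, many sources pumping records through a shared bounded queue into parallel consumers, can be sketched in a few lines of Python:

```python
import threading
import queue

def run_river(sources, transform, num_workers=4):
    """Pump records from several sources through parallel workers.

    Records flow through a shared bounded queue (the 'river'); each
    worker applies `transform` independently, so output order is not
    preserved - the hallmark of the river abstraction.
    """
    river = queue.Queue(maxsize=1024)   # bounded: applies back-pressure
    results, lock = [], threading.Lock()
    SENTINEL = object()

    def worker():
        while True:
            rec = river.get()
            if rec is SENTINEL:
                break
            out = transform(rec)
            with lock:
                results.append(out)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for src in sources:                  # producers feed the river
        for rec in src:
            river.put(rec)
    for _ in workers:                    # one sentinel per worker
        river.put(SENTINEL)
    for w in workers:
        w.join()
    return results

# Two sources, records doubled in parallel:
out = run_river([range(3), range(3, 6)], lambda x: 2 * x)
print(sorted(out))  # [0, 2, 4, 6, 8, 10]
```

A real river system adds record framing, flow control across machines, and fault handling, but the producer/queue/consumer shape is the same.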
The 10,000$ Terabyte, and
IO studies of Windows2000 (with Leonard Chung) (6/2/2000)
Leonard Chung (an intern from UC Berkeley) and I studied the
performance of modern disks (SCSI and IDE) in comparison to the 1997 study of
Erik Riedel. The conclusions are interesting: IDE disks (with their
controllers) deliver good performance at less than 1/2 the price. One can
package them in servers (8 to a box) and deliver very impressive performance.
Using 40 GB IDE drives, we can deliver a served Terabyte for about 10,000$
(packaged and powered and networked). Raid costs about 2x more. This is
approximately the cost of an un-raided SCSI terabyte. The details are at
IO Studies.
The 1,000$ Terabyte is here with
TeraScale Sneakernet .
This work is now ongoing with our plans to re-build the TerraServer with SATA CyberBricks.
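The arithmetic behind the served terabyte is straightforward; the drive and server prices below are illustrative round numbers for the era, not the actual quotes from the study:

```python
DRIVE_GB = 40          # capacity per IDE drive (as in the study)
DRIVE_COST = 250       # illustrative year-2000 price per drive, $
DRIVES_PER_BOX = 8     # drives packaged per server box
BOX_COST = 1500        # illustrative cost of CPU, RAM, NIC, power, $

drives = 1000 // DRIVE_GB                 # 25 drives per raw terabyte
boxes = -(-drives // DRIVES_PER_BOX)      # ceiling division: 4 boxes
total = drives * DRIVE_COST + boxes * BOX_COST
print(f"{drives} drives, {boxes} boxes, ${total:,}")
# 25 drives, 4 boxes, $12,250 -- the same order as the 10,000$ figure
```

Halving the per-gigabyte drive price (the SATA trend the TerraServer rebuild counts on) is what pushes this toward the 1,000$ terabyte.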
4 PetaBumps (2/15/1999)
This extends work we did last fall (0.5 PetaBumps). With U.
Washington, Research TV, Windows2000, Juniper, Alteon, SysKonnect, NTON, DARPA,
Qwest, Nortel Networks, Pacific Northwest GigaPOP, and SC99, we demonstrated
1.3 Gbps (gigabits per second) desktop-to-desktop end-user performance over a
LAN, MAN (30 km), and WAN (300 km) using commodity hardware and software,
standard WinSock + tcp/ip, and 5 tcp/ip streams. Here are: the
press release, the white paper
word (210KB) or
PDF (780KB), and a
PowerPoint Presentation (500KB) (12/20/99)
Rules of Thumb in Data Engineering (12/15/99)
Scalability Terminology: Farms, Clones, Partitions, and Packs: RACS and RAPS
Large Spatial Databases: I have been investigating large
databases like the TerraServer which is documented in two Microsoft technical
reports: We have been operating the TerraServer (http://TerraService.Net/)
since June 1998. At this point we have served over 4 billion web hits and 20
terabytes of geospatial images. We are working with Alex Szalay of Johns
Hopkins on a similar system to make the Sloan Digital Sky Survey images
available on the web as they arrive over the next six years. Our research
plan for handling these 40 Terabytes of data over the next five years is
described in the report Designing
and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey
, Alexander S. Szalay, Peter Kunszt, Ani Thakar, Jim Gray. MSR-TR-99-30. In addition, here are some
interesting air photos of the Microsoft Redmond Campus. This is a 1995 proposed
Alternate Architecture for EOS DIS (the 15 PB database NASA is
building). Here is my PowerPoint summary of the report (250KB).
WindowsClusters: I believe you can build supercomputers as a
cluster of commodity hardware and software modules. A cluster is a collection
of independent computers that is as easy to use as a single computer. Managers
see it as a single system, programmers see it as a single system, and users see
it as a single system. The software spreads data and computation among the
nodes of the cluster. When a node fails, other nodes provide the services and
data formerly provided by the missing node. When a node is added or repaired,
the cluster software migrates some data and computation to that node. My personal (1995) research plan is contained in the document:
Clusters95.doc. It has evolved to a larger enterprise involving many
groups within Microsoft, and many of our hardware and software partners. My
research is a small (and independent) fragment of the larger NTclusters effort
led by Rod Gamache in the NT group:
Wolfpack_Compcon.doc (500KB), and a PowerPoint presentation of Wolfpack
Clusters by Mark Wood:
WolfPack Clusters.ppt (7/3/97). That effort is now called
Microsoft Cluster Services and has its own
web site. Researchers at Cornell University, the MSCS team, and the
BARC team wrote a joint paper summarizing MSCS for the Fault Tolerant Computing
Symposium. Here is a copy of that paper
MSCS_FTCS98.doc (144KB). We demonstrated SQL Server failover on NT Clusters:
SQL_Server_Availability.ppt (3MB). The WindowsNT failover time is about
15 seconds; SQL Server failover takes longer if the transaction log contains a
lot of undo/redo work. Here is a white-paper describing our design:
SQL_Server_Clustering_Whitepaper.doc.
In 1997, Microsoft showed off many scalability solutions: a one-node terabyte
geo-spatial database server (the TerraServer), and a 45-node cluster doing a
billion transactions per day. Also shown were SAP + SQL + NT-Cluster failover
demos, a 50 GB mail store, a 50k-user POP3 mail server, a 100
million-hits-per-day web server, and a 64-bit addressing SQL Server. Here are
some white papers related to that event (5/24/97). A 1998 revision of the SQL
Server Scalability white paper is
SQL_Scales.doc (800 KB) or the zip version:
SQL_Scales.zip (300 KB). There is much more about this at the Microsoft site
http://www.microsoft.com/ntserver/ProductInfo/Enterprise/scalability.asp
I wrote a short paper on storage metrics (joint with Goetz Graefe) discussing
optimal page sizes, buffer pool sizes, and DRAM/disk tradeoffs, to appear in
SIGMOD RECORD: 5_min_rule_SIGMOD.doc
(.3MB Office97 MS Word file).
Erik Riedel of CMU, Catharine van Ingen, and I have been investigating the best
ways to move bulk data on an NT file system. Our experimental results and a
paper describing them is at
Sequential_IO. (7/28/98) You may also find the
PennySort.doc (400 KB) paper interesting -- how to do IO cheaply!
Database Systems: Database systems
provide an ideal application to drive the scalability and availability
techniques of clustered systems. The data is partitioned and replicated among
the nodes. A high-level database language gives a location independent
programming interface to the data. If there are many small requests, as in
transaction processing systems, then there is natural parallelism within the
computation. If there are a few large requests, then the database compiler can
translate the high-level database program into a parallel execution plan.
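For instance (a toy sketch, not how any particular SQL engine implements it), a SUM over a partitioned table decomposes into independent per-partition partial sums that run in parallel, followed by a final combine:

```python
from concurrent.futures import ThreadPoolExecutor

# A table of 1000 records partitioned round-robin across four
# hypothetical nodes.
partitions = [list(range(i, 1000, 4)) for i in range(4)]

def partial_sum(part):
    # Each "node" scans only its own partition.
    return sum(part)

# The parallel plan: scan the partitions concurrently, then combine
# the partial results into the final answer.
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, partitions))

print(total)  # 499500, the same as sum(range(1000))
```

The point is that the programmer only wrote "SUM"; the compiler supplied the partitioned scan and the combine step.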
CacmParallelDB.doc.
Performance: I helped define the
early database and transaction processing benchmarks (TPC A, B, and C). I edited
the Benchmark Handbook for Databases and Transaction Processing which is now online as a website at
http://www.benchmarkresources.com/handbook/,
managed by Brian Butler. (12/12/98)
I am an enthusiastic follower
of the emerging database benchmarks of the Transaction
Processing Performance Council.
I am the web master for the Sort-Benchmark
web site. For 1998, Chris Nyberg and I did the first PennySort
benchmark PennySort.doc
(400 KB). Transaction Processing: Andreas
Reuter and I wrote the book
Transaction Processing Concepts and Techniques. Here are the errata for
the 5th printing:
TP_Book_Errata_9.htm (17KB) or in word
TP_Book_Errata_9.doc (50KB) (5/20/2001) I am working with
Microsoft's Viper team that built distributed transactions into NT. Andreas and
I taught two courses from the book at Stanford Summer Schools (with many other
instructors). The course notes are at
WICS99 and WICS96.
I helped organize the High
Performance Transaction Processing Workshop at Asilomar (9/5/99). The
web site makes interesting reading.
In February 1999, U. Washington (Steve Corbato and others),
ISI-East (Terry Gibbons and others), QWest, Pacific Northwest Gigapop,
DARPA's SuperNet, and Microsoft (Ahmed Talat, Maher Saba, Stephen Dahl,
Alessandro Forin, and I) collaborated to set a "land speed record" for tcp/ip,
winning the first
Internet2 Land Speed Record. The experiment connected two workstations
with SysKonnect Gigabit Ethernet via 10 SuperNet hops (Arlington, NYC, San
Francisco, Seattle, Redmond). The systems delivered 750 mbps in a single stream
tcp/ip (28 GB sent in 5 minutes) and about 900 Mbps when a second stream was
used. This was over a distance of 5600 km, and so gives the metric 3 PetaBumps
(petabit meters per second). It was "standard" tcp/ip but with two settings:
"jumbo" frames in the routers (4470 bytes rather than 1500 bytes), which give the
endpoints fewer interrupts, and a window size of 20 MB (since
the round-trip time was 97 ms, you need a window that large to hold the
in-flight bits). The details are described in the submissions to the Internet2
committee.
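The window arithmetic is just the bandwidth-delay product: a tcp/ip sender can have at most one window of unacknowledged data in flight per round trip, so a single stream's throughput is capped at window/RTT. Checking the record's numbers:

```python
rtt = 0.097                      # measured round-trip time, seconds
window = 20 * 2**20              # the 20 MB window setting, in bytes

# Maximum single-stream throughput = window / RTT
max_bps = window * 8 / rtt
print(f"{max_bps / 1e9:.2f} Gbps ceiling")   # ~1.73 Gbps, comfortably above 750 Mbps

# Conversely, the minimum window needed to sustain 750 Mbps on this path:
needed = 750e6 * rtt / 8
print(f"{needed / 2**20:.1f} MB needed")     # ~8.7 MB minimum
```

With the default 64 KB window the same path would top out at about 5 Mbps, which is why the large-window setting (and the fewer interrupts from jumbo frames) mattered.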
The single-stream submission:
Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
The multi-stream submission:
Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
The code: speedy.htm
, speedy.h,
speedy.c
And a PowerPoint presentation about it:
Windows2000_WAN_Speed_Record.ppt (500KB)
A paper with Prashant Shenoy, titled "Rules of Thumb in Data
Engineering," revisits Amdahl's laws and Gilder's laws, and investigates the
economics of caching disk and internet data.
With Bill Devlin, Bill Laing, and George Spix, I wrote a short piece
trying to define a vocabulary for scalable systems: Geoplexes, Farms,
Clones, RACS, RAPS, Partitions, and Packs. The paper defines each
of these terms and discusses the design tradeoffs of using clones, partitions,
and packs.
It was written just as the TerraServer was going online and describes the original
design.