SCHEDULE: NOV 10-16, 2007
Coordinated Fault Tolerance in High-end Computing Environments
Session:
Coordinated fault tolerance in high-end computing environments
Event Type:
Birds of a Feather
Time:
12:15pm - 1:15pm
Session Chair:
Peter Beckman
Leader(s):
Pete Beckman, Rinku Gupta, Al Geist
Location:
A3 / A4
Abstract:
The ability to detect and recover from faults on large HPC systems would be greatly aided by a standardized interface to exchange fault information. A standard framework where any component of the software stack can report or be notified of faults through a common interface enables coordinated fault tolerance and recovery. This BOF will present the draft design of such an interface for comment by the HPC community, both users and vendors.
The objectives of this BOF session are:
(1) To have an open discussion about the usefulness, impact, and adoption of a comprehensive fault-tolerance framework in enterprise and research environments
(2) To better understand the fault management and fault-tolerance challenges being faced in today's environments
(3) To bring together individuals dealing with high-end, petascale computing infrastructures who have an interest in developing fault tolerance in high-end computing environments
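To make the idea in the abstract concrete, the following is a minimal sketch of a common interface through which any component can report faults or be notified of them. It is not the draft design presented at the BOF: the names `FaultEvent`, `FaultBus`, `subscribe`, and `publish`, the event fields, and the category strings are all hypothetical, and the sketch runs in a single process rather than across an HPC software stack.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class FaultEvent:
    """A fault report. Every field name here is a hypothetical choice."""
    source: str    # component reporting the fault, e.g. "monitor"
    category: str  # kind of fault, e.g. "node_failure"
    detail: str    # free-form description


Handler = Callable[[FaultEvent], None]


class FaultBus:
    """Toy in-process publish/subscribe hub for fault events (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, category: str, handler: Handler) -> None:
        # Register a component's callback for one fault category.
        self._subscribers[category].append(handler)

    def publish(self, event: FaultEvent) -> int:
        # Deliver the event to every handler registered for its category
        # and return how many handlers were notified.
        handlers = list(self._subscribers.get(event.category, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


# Usage: a job scheduler listens for node failures reported by a monitor.
bus = FaultBus()
seen: List[FaultEvent] = []
bus.subscribe("node_failure", seen.append)
delivered = bus.publish(FaultEvent("monitor", "node_failure", "node 17 unresponsive"))
```

In this sketch the reporting component and the notified component never reference each other directly; both depend only on the shared bus, which is the property the abstract attributes to a common interface.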
Chair/Leader Details:
Peter Beckman (Chair)
Argonne National Laboratory
Pete Beckman
Argonne National Laboratory
Rinku Gupta
Argonne National Laboratory
Al Geist
Oak Ridge National Laboratory