The IBM® Blue Gene®/P Solution is a follow-on product to the successful Blue Gene/L Solution and adheres to the key design points originally set forth for Blue Gene/L. Specifically, the Blue Gene family of supercomputers has been designed to deliver ultra-scale performance within a standard programming environment while remaining efficient in power, cooling, and floor-space consumption. Blue Gene/P extends that performance through higher density and clock frequency, enhanced functionality with 4-way SMP, scalability to petaflop performance, and the aggressive power management for low power consumption first established with the Blue Gene Solution.
Comet on Blue Gene/P
A schematic overview of the proposed framework is presented above. The framework is composed of five key components, which are introduced below.
CometCloud
CometCloud is the core component that provides the key services required for the HPC-as-a-service abstraction, such as resource provisioning and management. It also handles the scheduling and mapping of application tasks and manages failures at the application and system levels.
Blue Gene Agent
Blue Gene Agent is the layer responsible for allocating partitions on Blue Gene/P and connecting them back to the BG Comet Worker, which executes the given task on the compute nodes. Blue Gene Agent is also responsible for releasing unused resources, thus providing an elastic cloud that scales with the number of tasks currently in Comet Space. The agent is connected to Deep Cloud.
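The elastic behavior described above can be illustrated with a minimal sketch. It is not the actual Blue Gene Agent code: the class name `ElasticPartitionPolicy`, the `tasksPerPartition` ratio, and the `maxPartitions` cap are all assumptions made for illustration. The sketch only shows one plausible way to derive how many partitions to allocate or release from the number of pending tasks.

```java
/**
 * Hypothetical sketch of an elastic sizing policy.
 * All names and parameters are illustrative assumptions,
 * not part of the real Blue Gene Agent.
 */
public class ElasticPartitionPolicy {
    private final int tasksPerPartition; // assumed: tasks one partition can serve
    private final int maxPartitions;     // assumed: cap set by the user's reservation

    public ElasticPartitionPolicy(int tasksPerPartition, int maxPartitions) {
        if (tasksPerPartition <= 0 || maxPartitions < 0) {
            throw new IllegalArgumentException("invalid policy parameters");
        }
        this.tasksPerPartition = tasksPerPartition;
        this.maxPartitions = maxPartitions;
    }

    /** Number of partitions wanted for the given count of pending tasks. */
    public int targetPartitions(int pendingTasks) {
        if (pendingTasks <= 0) {
            return 0; // nothing queued: hold no partitions
        }
        // Ceiling division, written to avoid integer overflow.
        int needed = (pendingTasks - 1) / tasksPerPartition + 1;
        return Math.min(needed, maxPartitions);
    }

    /**
     * Positive result: allocate that many partitions.
     * Negative result: release that many partitions.
     * Zero: the current allocation already matches demand.
     */
    public int delta(int pendingTasks, int currentPartitions) {
        return targetPartitions(pendingTasks) - currentPartitions;
    }
}
```

With this policy, an agent loop would periodically read the pending-task count, call `delta`, and then allocate or release partitions accordingly. A real agent would also have to account for partition boot time and for tasks that are already running.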
Deep Cloud
Deep Cloud is a reservation-based system and pricing model currently being developed at the IBM T.J. Watson Research Center. It manages demand and supercomputing resources to give users an abstraction of unlimited resources and to maximize their satisfaction. Users reserve their supercomputing resources through Deep Cloud. Blue Gene Agent communicates with the Deep Cloud API to obtain information about the resources allocated to a given user at a given time.
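The kind of query the agent makes, "what is allocated to this user at this time", can be sketched with an in-memory stand-in. This is not the Deep Cloud API, whose interface is not described here. The `ReservationBook` and `Reservation` names, the node-count field, and the half-open time interval are assumptions chosen for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory stand-in for a reservation lookup.
 * It does not reproduce the real Deep Cloud API.
 */
public class ReservationBook {
    /** One reservation: a user holds `nodes` compute nodes during [start, end). */
    public static final class Reservation {
        final String user;
        final int nodes;
        final Instant start;
        final Instant end;

        public Reservation(String user, int nodes, Instant start, Instant end) {
            if (!start.isBefore(end)) {
                throw new IllegalArgumentException("start must precede end");
            }
            this.user = user;
            this.nodes = nodes;
            this.start = start;
            this.end = end;
        }

        /** True if time t falls inside the half-open interval [start, end). */
        boolean covers(Instant t) {
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    private final List<Reservation> reservations = new ArrayList<>();

    public void add(Reservation r) {
        reservations.add(r);
    }

    /** Total nodes reserved for the user at time t, or 0 if none. */
    public int nodesAllocated(String user, Instant t) {
        int total = 0;
        for (Reservation r : reservations) {
            if (r.user.equals(user) && r.covers(t)) {
                total += r.nodes;
            }
        }
        return total;
    }
}
```

Treating the end time as exclusive means back-to-back reservations never double-count a node at the boundary instant. Whether the real system makes the same choice is not stated in the source.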
DISCOVER
DISCOVER (Distributed Interactive Steering and Collaborative Visualization Environment) is a generic framework that enables interactive steering of scientific applications and allows collaborative visualization of the data sets generated by such simulations. DISCOVER is an optional tool that scientists can use to monitor and steer simulations on the HPC cloud.
Blue Gene Java API
A Java library links desktop or mobile applications to the Blue Gene/P supercomputer. Java was chosen to provide a universal, platform-independent binary that can be deployed directly and linked to any software. A documented API is provided so that users can link the library to their applications. The library handles communication with the Blue Gene/P supercomputer through SSH tunneling. It can be extended to support new applications introduced by users.
Key components layout.
Together these components provide HPC-as-a-service. Their stack layout is shown in Figure 2. Users reserve their resources through Deep Cloud and run their applications using the Java API. The system then automatically uses the allocated resources efficiently through the Blue Gene Agent and CometCloud. Meanwhile, users can log in to DISCOVER to monitor and steer their simulations in real time.
Weather Forecast: Cloudy with a Chance of Supercomputing
Motivation
- Personal computers and handheld devices are abundant, but have limited computing power.
- Public clouds are ineffective for HPC applications.
- Supercomputers are powerful, but not widely available.
- Grid-based private clouds are expensive.
- The current HPC model is costly and far from ideal.
Goal
The goal of this work is to provide a framework that can harness the power of a supercomputer efficiently while providing the ease and availability of a personal computer.
Features
- HPC-as-a-service using the IBM Blue Gene/P supercomputer.
- Easily accessible through desktop and mobile systems.
- Transform Blue Gene/P into an elastic cloud.
- Public cloud integration, when HPC is not required.
- Efficient utilization of Blue Gene/P’s resources.
- Maximizing the user experience.
- Simplify the deployment, execution, monitoring and steering of HPC applications from a personal computer, with no impact on performance, while increasing productivity.