MONDAY, MAY 12, 2008
1:00 – 5:00 pm
Using Grid Engine and UniCluster Express: A Primer and Sneak Peek
Speakers: Chris Dagdigian, Principal Consultant, BioTeam Inc. and Rich Wellner, Jr., Vice President of Professional Services, Univa UD
This special session, brought to you by Univa UD, will be a sneak peek of an upcoming advanced user tutorial series co-hosted by BioTeam and Univa UD. It offers an introduction to, and live demonstration of, UniCluster Express and Grid Engine, with step-by-step instruction on the installation, configuration and basic use of a compute cluster running Grid Engine and UniCluster Express. The tutorial requires a rudimentary knowledge of Linux, Linux system administration, networking, and basic programming principles. Chances are, if you've come across this tutorial, you already have that skill set.
Globus Primer: An Introduction to Globus Software
Speaker: Lee Liming, Technology Analyst, Argonne National Laboratory and the University of Chicago
The Globus Toolkit is a collection of software solutions to many of the integration challenges that come up in Grid system and application development. Recommended for first-time GlobusWORLD attendees, this afternoon tutorial provides an introduction to the Globus Toolkit and its most common uses in science and engineering applications. It provides an overview of what a new attendee can expect throughout the week of the conference. The open source Dev.Globus community develops, distributes, and supports the Globus Toolkit and a variety of other software projects. This tutorial provides answers to critical questions for Grid project planners and product developers, including: What is Globus? What can the Globus Toolkit do for me? Where does Globus software fit in a Grid system or application? Where should I get started learning about Globus? What resources are available to help me when I use Globus? How have others succeeded using Globus software?
5:30 – 7:30 pm
Welcome Reception sponsored by Sun Microsystems
Please join us on the top floor of the Marriott City Center for dazzling views, hors d'oeuvres and a cool beverage as we kick off the conference.
TUESDAY, MAY 13, 2008
8:30 – 10:00 am
Welcome and State of The Union
Join conference organizers Fritz Ferstl, Ian Foster and Philip Papadopoulos for the conference opening session.
10:30 – 12:00 pm
GRID ENGINE #1
Sun/Community State of the Union
Speaker: Fritz Ferstl, Director of Grid Engineering, Sun Microsystems
New features in Sun Grid Engine 6.2
Speakers: Lubomír Petrík, Software Developer, Sun Microsystems Czech and Roland Dittel, Sun Microsystems Germany
This presentation will give an overview of the new features
and enhancements in Sun Grid Engine (SGE) 6.2. The topics covered
are Advance Reservations, improved Interactive Job Support, improved
Array Job Dependencies, support for SMF and Service Tags, JMX support,
and Performance and Scalability improvements.
ROCKS #1
Workshop Goals and Rocks 5.0 Enhancements
Speakers: Greg Bruno and Mason Katz
This session will focus on the new features of Rocks 5.0 (V) including support for Xen-based virtual machines, enhancements/changes in the Rocks command structure, and general support for Version 5.0 of CentOS/RHEL. In addition, participants will have an opportunity to shape follow-on sessions for this year's workshop. A refresher (or introduction) to the Rocks configuration graph will be given. Participants are expected to have familiarity with previous versions of Rocks.
GLOBUSWORLD #1
Globus WS Core and Tools
Speakers: Rachana Ananthakrishnan, Senior Software Developer, Argonne National Laboratory;
Ashish Sharma, PhD, Research Scientist, Ohio State University; Ravi Madduri, Senior Software Developer, Argonne National Laboratory
The Java WS Core component in Globus Toolkit contains an implementation of the Web Services Resource Framework (WSRF) and Web Services Notification (WSN) family of specifications, and provides a container to build and deploy web services based on these specifications. This session will include presentations on:
- GT Java WS Core: Features and Roadmap (Rachana Ananthakrishnan): An overview of supported features, including new features in GT 4.2 such as dynamic deployment support, HTTP/S connection caching and WS Enumeration support, followed by a discussion of the latest ongoing and planned work in this area.
- Authoring Services using Introduce (Ashish Sharma): An overview of Introduce, a GUI tool to author services using Java WS Core that enables service developers to focus on the business logic and automates the generation of web service pieces.
- Grid Remote Application Virtualization Interface (gRAVI) (Ravi Madduri): A tool that leverages Introduce and facilitates publishing of arbitrary applications as web services.
INTRODUCTION & OVERVIEW #1: CLUSTER STACKS
Intel Cluster Ready
Speaker: Clem Cole, Manager, Architecture and Development of Intel Cluster Ready
Clusters are a cost-effective deployment platform for high performance computing. However, until recently each cluster tended to be a tad different. These differences, while often conceptually minor, are a major inhibitor to developing applications that can run with high confidence on many different clusters. Similarly, application providers cannot be confident that their applications will run in harmony with applications from other providers. In this talk, I will describe the Intel Cluster Ready program and how it helps make building, deploying and maintaining clusters easy for end users, application developers and administrators, as well as for ISVs, component providers and platform integrators.
Introduction to Univa UD UniCluster Express: A Simple, Fully Integrated Cluster Stack
Speaker: Bill Bryce, Director Product Management-HPA, Univa UD
One of the most important decisions in building an HPC cluster is choosing the software that will deploy, monitor, manage and operate the system. In the past, each software component was installed and configured separately, leading to a high degree of complexity that required deep HPC knowledge and system administration skills. Today, software stacks such as UniCluster Express simplify many of the difficult and complex tasks involved in creating and managing an HPC cluster. The result is that users can focus on running their applications and spend much less time on the cluster software infrastructure.
Solaris HPC Stack
Speaker: Daniel Templeton, Strategic Liaison Manager, Sun Grid Engine, Sun Microsystems
Sun has recently launched a new HPC community on the OpenSolaris.org
site. One of the purposes of that community is to facilitate
collaboration in building out an HPC stack for the OpenSolaris operating
system. The end goal of that HPC stack is to gather together HPC
expertise from both inside and outside Sun into a complete and
integrated software stack addressing the needs of both developers and
administrators. This presentation will introduce the project and the
community and discuss plans for the stack. Ample time will be left for
feedback from attendees.
12:00 – 1:30 pm
LUNCH (PROVIDED)
1:30 – 2:00 pm
SPONSOR TALK
Using OGF Standards for Grid and HPC
Speaker: Chris Smith, Vice President of Standards, Open Grid Forum
For a number of years, the OGF has been working on specifications intended to address common use cases in Grid and High Performance Computing. We now have a sufficient body of specifications to realize some of these use cases. This talk will provide a snapshot of where we are with respect to OGF specifications such as DRMAA and the HPC Basic Profile, and the status of implementations of these specifications.
1:30 – 3:00 pm
GRID ENGINE #2
Grid Engine at the Texas Advanced Computing Center
Speaker: Roland Dittel, Sun Microsystems Germany
With "Ranger", the Texas Advanced Computing Center deploys the largest
computing system in the world for open science research. The resource
management for job scheduling in this cluster is provided by Sun Grid
Engine (SGE). This presentation will give an overview of the cluster
setup and the scalability improvements implemented to utilize the 3,936
nodes and 62,976 processing cores.
Fun with Grid Engine XML
Speaker: Chris Dagdigian, Principal Consultant, BioTeam, Inc.
Organizations needing to programmatically monitor the state and status
of Grid Engine systems prior to SGE 6.0 were often required to parse
spool files or manipulate the "human readable" text output from
commands such as qconf and qstat. The introduction of XML output
option flags in Grid Engine has opened the floodgates for far more
interesting and powerful tools to be developed. This talk will center
on methods for obtaining, searching and transforming Grid Engine
XML data into various output formats. XML technologies including XPath
and XSLT will be lightly covered using the code from http://xml-qstat.org
as examples. This will be a moderately technical talk aimed at
audiences with little prior exposure to XML transformation and
processing.
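As a minimal illustration of the "search and transform" idea, the Python sketch below uses the standard library's ElementTree, which supports only a limited XPath subset (and no XSLT), on a small hypothetical sample modeled loosely on `qstat -xml` output. The element names (`job_list`, `JB_job_number`, `JB_name`, `JB_owner`) and the helper functions are assumptions for illustration; check them against the XML your Grid Engine version actually emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, modeled loosely on `qstat -xml` output.
# Element and attribute names are assumptions for illustration only.
SAMPLE = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <JB_name>blast_run</JB_name>
      <JB_owner>alice</JB_owner>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <JB_name>assembly</JB_name>
      <JB_owner>bob</JB_owner>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>"""


def jobs_in_state(xml_text: str, state: str) -> list[dict[str, str]]:
    """Search: select every job_list element whose state attribute matches."""
    root = ET.fromstring(xml_text)
    jobs = []
    # ElementTree's XPath subset supports descendant search and attribute predicates.
    for job in root.findall(f".//job_list[@state='{state}']"):
        jobs.append({
            "id": job.findtext("JB_job_number", default=""),
            "name": job.findtext("JB_name", default=""),
            "owner": job.findtext("JB_owner", default=""),
        })
    return jobs


def to_csv(jobs: list[dict[str, str]]) -> str:
    """Transform: render the selected jobs as simple CSV text."""
    lines = ["id,name,owner"]
    for job in jobs:
        lines.append(f"{job['id']},{job['name']},{job['owner']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_csv(jobs_in_state(SAMPLE, "running")))
```

Running the script prints a CSV header followed by the single running job (`101,blast_run,alice`). A full XSLT pipeline, as used by xml-qstat, would express the same selection and transformation declaratively in a stylesheet instead.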
Grid Engine Future Plans
Speaker: Daniel Templeton, Strategic Liaison Manager, Sun Grid Engine, Sun Microsystems
Sun Grid Engine software is one of the top distributed resource
management software packages in the industry. As both a licensed
product and an open source project, Sun Grid Engine has very broad
adoption across a wide range of industry segments, government
facilities, and educational institutions. The next release of the Sun
Grid Engine software will happen late this summer, but what then? In
this talk, we'll peek ahead at what we're thinking about for the future.
GLOBUSWORLD #2
What's New in the Data Area? (45 minutes)
Speakers: Raj Kettimuthu, Argonne/UC; Ann Chervenak, USC/ISI; Rob Schuler, USC/ISI
We will present an overview of recent developments and future plans for major Globus components. For GridFTP, we will discuss the use of GridFTP over UDT; GridFTP with SSH; Multicasting in GridFTP; providing resource management for GridFTP transfers; and recent optimizations to support good performance for lots of small files. We will also discuss the planned work on automatic firewall traversal for GridFTP transfers. For replication services, we will discuss the embedded database backend for the Replica Location Service and the pure Java interface for RLS. We will also discuss new work on policy-driven data placement services and their relationship to workflow management systems. Finally, we will discuss recent developments in the OGSA Data Access and Integration System.
GridFTP and Cluster Meltdown: When No Means 'Maybe Later' (45 minutes)
Speaker: John Bresnahan, Argonne/UC
High-speed wide-area data transfer requires quite a bit of compute resources, not only in terms of network bandwidth and disk space, but also in endpoint system memory and processing power. Too often, system administrators inadvertently allow clients to 'overclock' their cluster's GridFTP servers by failing to protect them from clients that try to transfer too many files too fast and all at once. This ultimately acts as a denial of service, causing thrashing and extremely suboptimal results. In this session we will explain how to properly configure Globus data transfer services, including GridFTP and RFT. Attendees will learn when and how to choose between the two services in order to protect hardware resources and achieve the optimal results that a given set of hardware can provide.
TUTORIAL #1: (ROCKS)
Introduction to Clusters and Rocks Overview
Speaker: Mason Katz
This session will cover the basic types and designs of clusters (from Beowulf, to Tiled Walls, to High-Performance Supers). The basic philosophy and first-level design of Rocks will be presented, as will comparisons with some competing methods. Getting started on Rocks will include building real and virtual machines in Rocks 5.0.
INTRODUCTION AND OVERVIEW #2: ENABLING SOFTWARE FOR DISTRIBUTED COMPUTING
Towards a Common Communication Infrastructure for Clusters and Grids
Speaker: Darius Buntinas, PhD, Assistant Computer Scientist, Argonne National Laboratory
Communication infrastructure for Clusters and Grids has traditionally been dealt with in a decoupled manner. For many years, cluster communication systems have focused on various optimization aspects, relying on hardware protocol offload, RDMA, OS bypass and many other high-performance features. For Grids, on the other hand, TCP and UDP continue to be the dominant communication protocols of choice. As high-speed lambda connectivity between different sites becomes common, it is increasingly important to have a common communication infrastructure that can match the demands of both Cluster and Grid environments. Such a common communication infrastructure needs to provide features such as low latency, high bandwidth and reduced CPU usage that application scientists have come to expect of most cluster interconnects. At the same time, this infrastructure should also be capable of meeting the various demands of wide-area communication, such as efficiently utilizing high-bandwidth communication pipes in lambda grids and maintaining backward compatibility with existing infrastructure. In this talk, we will present different advances in communication technologies that have taken place in this area. Specifically, we will focus on two popular network technologies, InfiniBand and 10-Gigabit Ethernet (TCP/UDP offload engines, iWARP, MX), and present their latest advances in this area. We will also give an overview of different solutions available today and point out the pros and cons of these technologies.
MPICH
Speaker: Darius Buntinas, PhD, Assistant Computer Scientist, Argonne National Laboratory
Open MPI and Sun HPC ClusterTools: A Technical Overview
Speaker: Leonard Wisniewski, PhD, Engineering Manager, Sun Microsystems / Software Developer Tools and Services
Open MPI was established four years ago as a clean slate implementation
of the MPI-1 and MPI-2 specifications. The goals of the Open MPI
project are to 1) create a free, open source, peer-reviewed,
production-quality complete MPI-2 implementation, 2) provide extremely
high, competitive performance, 3) directly involve the HPC community
with external development and feedback, 4) provide a stable platform for
3rd party research and commercial development, 5) help prevent the "forking problem" common to other MPI projects, and 6) support a wide
variety of HPC platforms and environments. Sun joined the Open MPI
community two years ago to add experience and expertise applied
previously to the proprietary Sun HPC ClusterTools product.
This talk will present an overview of the Open MPI architecture and what
hardware and software platforms it supports. Further, we examine the
Open MPI goals and highlight how these goals have been achieved to
date. We also provide details on how Open MPI has been used as the
basis of the "new" Sun HPC ClusterTools and how Sun has enhanced Open
MPI with its contributions to support Sun software such as Sun Grid
Engine and Sun Studio.
Stateless Provisioning with Perceus
Speaker: Greg Kurtzer, CTO, Infiscale.com
Stateless operating system management has many benefits in both enterprise and high-performance cluster computing. Perceus, like its predecessor Warewulf, facilitates provisioning industry-standard operating systems in a stateless manner, turning bare metal systems into production-ready servers almost indistinguishable to the user from fully-installed boxes, but in a fraction of the time and with little to no administrative effort. Already distribution-neutral, and working toward full operating system neutrality, Perceus can be used with almost any hardware infrastructure or cluster software stack. All architectural decisions are up to the administrator or integrator of the system, and making changes on a thousand systems is as easy as making a change on a single system. Scaling to tens of thousands of nodes without compromising usability, Perceus now manages clusters of all sizes, from small ad-hoc home-brew systems to ten-thousand-node behemoths. Leveraging our partnerships with software and hardware vendors, Perceus and its companion projects combine to form the only 100% free and open source solution available today that is certified as Intel Cluster Ready(tm). In this presentation, we will give an overview of Perceus, provide general usage and examples, and field audience questions.
3:00 – 3:15 pm
BREAK
3:15 – 4:00 pm
KEYNOTE PRESENTATION: How Open Source Drives Standards: Making HPC Clusters Simple and Affordable
Speaker: Gary Tyreman, General Manager of HPC, Univa UD
“I invented nothing new. I simply assembled into a car the discoveries
of other men.” - Henry Ford
Henry Ford once said “the way to make automobiles is to make one
automobile like another automobile, to make them all alike.” A
visionary in time and motion business practices, Ford understood that
the key to mass market acceptance of the automobile was accessibility,
affordability and safety. The adoption of interchangeable parts,
a mainstay in the typewriter and clock industries for decades, was
precisely the catalyst required to drive volume and lower costs for
the nascent automotive industry. The HPC industry, like the clock,
typewriter and automobile markets before it, is ready to adopt a
standardized design and leverage interchangeable parts.
This keynote will discuss the powerful impact of open source software
on accelerating the commoditization of HPC Linux clusters, and
how Univa UD, the leader in open source cluster and grid solutions,
will “assemble the discoveries of other men” into a simple-to-use
cluster software stack for the mass market.
4:00 – 4:30 pm
BREAK
4:30 – 6:00 pm
ROCKS #2
Xen VMs, Virtual Clusters and Programmatic Partitioning
Speakers: Mason Katz; Greg Bruno; Philip Papadopoulos, PhD, Program Director, San Diego Supercomputer Center at UC-San Diego; Anoop Rajendra
The internals of Xen support in Rocks will be presented and dissected in detail. A preliminary roadmap for enhanced support for completely virtualized clusters (frontends and slave nodes) will be given. New for Rocks 5.0 is the ability to fully program how a node partitions its local hard drives so that any partitioning policy can be implemented. Methods, techniques and examples of partitioning schemes will be presented.
GLOBUSWORLD #3
Grid Information Management Using MDS
Speakers: Laura Pearlman, USC/ISI, MDS Project Chair; JP Navarro, ANL; Yusuke Tanimura, AIST

Globus Monitoring and Discovery Services (MDS) allow for the monitoring
of the state of the grid and for discovery of available resources. In
this session we discuss the overall design, latest developments, and
future plans for these services and describe some user experiences with
them. We will focus on new developments and use cases involving the MDS
Index and Trigger Services, the WebMDS interface, and the components
used to publish information via MDS. This session will be structured as
a general overview of MDS topics, followed by a case study of MDS use in
TeraGrid and a discussion of S-MDS, a semantic modeling and discovery
system based on MDS.
TUTORIAL #2: (GRID ENGINE)
Using the New Features of Grid Engine 6.2
Speakers: Roland Dittel, Sun Microsystems Germany and Lubomír Petrík, Software Developer, Sun Microsystems Czech
With the upcoming Sun Grid Engine 6.2 release, many new features will
be introduced, including new CLIs, APIs and usage concepts. This
tutorial will show how to use and administer these new features and what
their benefits are.
INTRODUCTION AND OVERVIEW #3: GRID ENGINE TOPICS
Synopsys Use of Sun Grid Engine in EDA
Speakers: Joe Fu, Technical Manager, Synopsys, Inc. and Bogdan Vasiliu, Sun Microsystems, Inc.
Electronic Design Automation (EDA) applications stress computer systems in every imaginable way. They can be processor-, memory- and I/O-intensive; basically nothing is spared. Managing thousands of EDA compute jobs daily (nightly builds, regression runs, tests, benchmarks, interactive jobs, etc.) on geographically distributed grids, each consisting of hundreds to thousands of nodes, is a daunting task not for the faint-hearted. To make things even more difficult, each of these grids may have its own access policies, restrictions and special configurations. Specialized tools and special skills are required to handle this type of job. This talk will focus on how Synopsys, the largest independent EDA software vendor, utilizes Sun Grid Engine (SGE) to efficiently manage the flow and execution of its internal EDA compute jobs. The presentation will cover the technical aspects of managing and configuring SGE at Synopsys (the setup and configuration of local grids, queues, complexes, access policies, etc.) and various challenges and solutions for this type of large-scale grid installation.
Grid Heating: Dynamic Thermal Allocation via Grid Engine Tools
Speaker: Paul Brenner, PhD, Scientist, University of Notre Dame Center for Research Computing
From 2006 to 2011, the national cost of powering and cooling IT servers is estimated to grow from 4.5 to 7.4 billion dollars, according to a recent EPA study that accounted for current efficiency improvement trends. With growing national concern for energy efficiency and environmental stewardship, current power utilization trends in HPC and data centers cannot continue to scale with computational demands. I introduce a new grid heating framework to promote the efficient growth and sustainment of commercial, academic, and government computation capabilities. Grid Heating removes cooling expenditures while providing dynamic distributed heating benefits to target heat sinks. In this presentation I will introduce the grid heating framework and share experimental results from heating a municipal botanical garden using Grid Engine tools to remotely harness HPC resources. Additional grid heating challenges and opportunities are discussed with regard to development, implementation, and deployment.
LSF vs Grid Engine
Speaker: Chris Dagdigian, Principal Consultant, BioTeam, Inc.
As an independent consultant with years of Grid Engine and Platform
LSF experience, Chris Dagdigian has often been asked to help clients
with IT purchasing decisions. Often this includes assisting with
evaluation and selection of a distributed resource management ("DRM")
solution. Using past projects as examples, the background methodology
for making "Grid Engine vs. Platform LSF" deployment decisions will be
explained.
6:00 – 8:00 pm
Sponsor Reception
Join us for hors d'oeuvres and drinks by the sponsor tables and take this opportunity to view their displays and thank them for supporting the conference.
WEDNESDAY, MAY 14, 2008
8:30 – 10:00 am
GRID ENGINE #3
Service Domain Manager – Basics and Concepts
Speakers: Richard Hierlmeier and Ryszard Macidlowski
Service Domain Manager (SDM) is an upcoming product from Sun that will
allow administrators to configure policies to automatically reassign
resources from one service to another based on service level objectives
and the changing load conditions. The Sun Grid Engine 6.2 software will
include an early version of SDM that will allow multiple Sun Grid Engine
clusters to dynamically share resources to maximize utilization across
the entire grid.
This presentation will explore the SDM features that are exposed in Sun
Grid Engine 6.2. Topics covered will include the various SDM components
and the basic SDM concepts, such as services, resources, the spare pool,
etc. The presentation will also look ahead to what features the full
SDM release will provide.
Managing Multiple Grid Engine Clusters with Service Domain Manager
Speakers: Richard Hierlmeier and Ryszard Macidlowski
Service Domain Manager (SDM) is an upcoming product from Sun that will
allow administrators to configure policies to automatically reassign
resources from one service to another based on service level objectives
and the changing load conditions. The Sun Grid Engine 6.2 software will
include an early version of SDM that will allow multiple Sun Grid Engine
clusters to dynamically share resources to maximize utilization across
the entire grid.
In this presentation, the speakers will present a concrete use case for
SDM. The presentation will walk through assigning a resource to a Sun
Grid Engine server, automating resource assignment through service level
objectives, automatically discovering resources, and mapping Sun Grid
Engine complexes to SDM resource properties.
Accounting and Reporting Console Multi-Cluster Support
Speaker: Jana Olivova, Sun Microsystems
Grid applications produce large amounts of accounting data, and users face the challenge of sorting through the data and generating meaningful statistical business reports. Data about load averages, CPU and memory usage, average throughput or the number of jobs completed are often needed for statistical evaluation of Grid processing. With the ever-increasing need for statistical data analysis, databases play a crucial part in fulfilling the data management requirements of Grid applications due to their advanced data mining capabilities. Sun Grid Engine Accounting and Reporting Console (ARCo) addresses these needs: it gathers and stores Grid accounting data in a standard relational database (PostgreSQL, Oracle, MySQL) and provides access to it through an online graphical user interface. The online console contains a set of predefined SQL queries covering the most frequent statistical inquiries. Users are able to create custom queries, display the tabular data in a graphical representation or pivot table, store result snapshots and export data in PDF or CSV format. This presentation familiarizes users with ARCo and explains its multi-cluster support functionality.
ROCKS #3
Customizing Rocks through Rolls. How to Develop Your Own
Speakers: Tim McIntire, President, Clustercorp; Anoop Rajendra; Greg Bruno
Rolls are the primary mechanism for customizing Rocks installations while enabling reproducibility across any number of clusters. Rolls can be commercial or open-source. Clustercorp has produced several rolls and will describe their techniques and issues. Techniques for building and testing Linux-based rolls at UCSD will also be covered, and an introduction to the Rocks changes needed to support Solaris and Rocks-on-Solaris will be presented.
Building on Open Source Rocks: 3rd Party Rolls for Rocks (T. McIntire)
One of the great things about the Rocks Cluster Distribution is the ability to extend, tweak, and replace functionality by leveraging the Rocks framework to build “Rolls”. This has allowed 3rd parties to build off the base Rocks solution, while the open source team remains focused on core functionality and new features on the cluster management side. In this presentation, Tim McIntire, President, Clustercorp will discuss a variety of Rolls that are available from 3rd parties, including the OFED Roll, PBS/Torque Roll, and compiler Rolls for Intel, AMD, and PGI.
Using What Your Momma Gave You: Leveraging the Rocks Framework (TBD)
Doing things the “right” way is critical in maintaining an efficient Rocks-based cluster. Many system administrators (including myself) have horror stories of early experiences in cluster configuration and maintenance that include a litany of custom scripts and hacks that keep a system and its users up and running with all the necessary components. Rocks provides a built-in mechanism, “Rolls”, for building software stacks directly into the cluster distribution. Leveraging Rolls for the complete configuration of your cluster will ensure that redeployment of compute nodes, or even a complete rebuild from the head node up, will be a simple, repeatable process. While there is a learning curve to developing Rolls and working within the Rocks framework, the long-term benefits greatly outweigh the short-term overhead.
Moving Beyond the Womb: An Overview of Currently Available 3rd Party Rolls (TBD)
A brief but complete overview of available Rolls with 3rd party contributions, including absoft, amd, apbs, bio, cisco-ofed, condor, intel, opal, moab, pbs, pgi, pvfs2, qlogic-ib and voltaire-ib. Subsequently, we’ll go into detail on two 3rd party Rolls with an open-source bent: PBS/Torque from the University of Tromso and Cisco-OFED from Clustercorp.
TUTORIAL #3: (GLOBUSWORLD)
Configuring and Deploying GridFTP for Managing Data Movement in Grid/HPC Environments
Speaker: Raj Kettimuthu, Argonne / UC
One of the foundational issues in HPC computing is the ability to move large (multi-gigabyte, and even terabyte) data files between sites. Simple file transfer mechanisms such as FTP and SCP are not sufficient from either a reliability or a performance perspective. The Globus implementation of GridFTP is the most widely used open source, production-quality data mover available today. Key features of Globus GridFTP include:
Performance: GridFTP typically provides order-of-magnitude performance improvements over standard FTP. Its ability to use non-TCP protocols such as UDT, and parallel streams to minimize bottlenecks inherent in TCP/IP, allows it to achieve good performance.
Cluster-to-cluster data movement: GridFTP can do coordinated data transfer utilizing multiple computer nodes at source and destination. This can increase performance by another order of magnitude.
Reliability: GridFTP provides support for reliable and restartable data transfers.
Multicasting: Globus GridFTP is capable of doing one source to many destination transfers.
Multiple Security options: Globus GridFTP framework supports various security alternatives. It supports Grid Security Infrastructure, SSH based security, anonymous access, username and password based security.
Modular: The XIO-based Globus GridFTP framework makes it easy to plug in alternate transport protocols. The Data Storage Interface (DSI) allows for easier integration with various storage systems.
Third-Party Control: GridFTP also allows secure 3rd party clients to initiate transfers between remote sites.
Partial File Transfer: In many cases in the scientific community it is expedient to download only portions of a large file instead of the entire file. GridFTP supports this capability by specifying the byte position in the file at which to begin the transfer.
Negotiation of TCP buffer/window sizes: GridFTP employs FTP command and data channel extensions to support both automatic and manual negotiation of TCP buffer sizes to obtain optimal performance.
In this tutorial, we will quickly walk through the steps required for setting up GridFTP on Linux/Unix machines. Then we will explore the advanced capabilities of GridFTP such as striping, and a set of best practices for obtaining maximal file transfer performance with GridFTP.
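To make the partial file transfer feature concrete: the transfer starts at a requested byte offset rather than at byte zero. The Python sketch below shows only that local seek-and-read step; `read_range` is a hypothetical helper, not part of GridFTP or any Globus API, and a real GridFTP client sends the offset to the server rather than reading a local file.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Return up to `length` bytes of the file at `path`, starting at byte `offset`.

    Hypothetical helper illustrating the idea behind a partial transfer:
    jump directly to the requested byte position instead of reading from byte 0.
    """
    with open(path, "rb") as f:
        f.seek(offset)         # jump to the starting byte position
        return f.read(length)  # may return fewer bytes near end of file


if __name__ == "__main__":
    # Build a small throwaway file and fetch a slice from the middle of it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    try:
        print(read_range(path, 3, 4))  # b'3456'
    finally:
        os.remove(path)
```

The same offset mechanism is what makes restartable transfers possible: a client that knows how many bytes arrived can resume from that position instead of starting over.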
INTRODUCTION AND OVERVIEW #4: GLOBUS TOPICS
GridWay: The Open Source Metascheduling Technology for Grid Computing
Speaker: Ruben S. Montero, PhD, Associate Professor, Universidad Complutense de Madrid
GridWay is a widely-used metascheduling technology that performs job execution management and resource brokering, allowing unattended, reliable, and efficient execution of jobs, job arrays, and workflows on heterogeneous and dynamic Globus grids. GridWay performs all the job scheduling and submission steps transparently to the end user and adapts job execution to changing Grid conditions by providing dynamic scheduling, fault recovery mechanisms, migration on-request and opportunistic migration. The GridWay metascheduler is a Globus product, released under Apache license v2.0, welcoming code and support contributions from individuals and corporations around the world. GridWay provides the following benefits to the different stakeholders involved in a Grid environment: (i) for project and infrastructure directors, GridWay is an open-source community project, adhering to Globus philosophy and guidelines for collaborative development; (ii) for system integrators, GridWay is highly modular, allowing adaptation to different grid infrastructures, and supports several OGF standards; (iii) for system managers, GridWay gives a scheduling framework similar to that found on local DRM systems, supporting resource accounting and the definition of scheduling policies; (iv) for application developers, GridWay implements the DRMAA API (C and JAVA bindings) OGF standard, assuring compatibility of applications with LRM systems that implement the standard, such as SGE, Condor or Torque; and (v) for end users, GridWay provides a LRM-like CLI for submitting, monitoring, synchronizing and controlling jobs that could be described using the JSDL OGF standard. The presentation consists of two parts. The first part is a description of the state of the technology: main benefits and major features, alternatives for scheduling infrastructures, relevant use cases, and project status and roadmap. 
This part will focus on GridWay's state-of-the-art functionality, such as the new scheduling policies, which comprise job prioritization policies (fixed priority, urgency, share, deadline and waiting time) and resource prioritization policies (fixed priority, usage, failure and rank). The second part of the presentation demonstrates GridWay's main functionality on production infrastructures, showing how it can simultaneously access distinct middleware (GT pre-WS, GT WS and EGEE services), which enables Grid interoperability and eases the transition to new Globus versions.
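One way to combine several job prioritization policies is to normalize each policy's contribution and add them with per-policy weights. The sketch below illustrates that idea for the five job policies named above; the weights, field names and normalization rules are hypothetical and are not GridWay's actual configuration or formula.

```python
from dataclasses import dataclass

# Hypothetical weights for each job prioritization policy (not GridWay defaults).
WEIGHTS = {"fixed": 1.0, "urgency": 2.0, "share": 1.0, "deadline": 3.0, "waiting": 0.5}

@dataclass
class Job:
    name: str
    fixed_priority: float       # administrator-assigned, already normalized to [0, 1]
    urgent: bool                # flagged as urgent by a privileged user
    share_deficit: float        # how far the owner is below its fair share, in [0, 1]
    seconds_to_deadline: float
    seconds_waiting: float

def contributions(job, max_wait, horizon):
    """Normalize each policy's contribution to the range [0, 1]."""
    # The closer the deadline (within the horizon), the larger the contribution.
    deadline = 1.0 - min(job.seconds_to_deadline, horizon) / horizon
    return {
        "fixed": job.fixed_priority,
        "urgency": 1.0 if job.urgent else 0.0,
        "share": job.share_deficit,
        "deadline": deadline,
        "waiting": min(job.seconds_waiting, max_wait) / max_wait,
    }

def dispatch_order(jobs, max_wait=3600.0, horizon=86400.0):
    """Return jobs sorted by weighted priority, highest first."""
    def score(job):
        c = contributions(job, max_wait, horizon)
        return sum(WEIGHTS[k] * v for k, v in c.items())
    return sorted(jobs, key=score, reverse=True)
```

With these weights an urgent job whose deadline is an hour away is dispatched ahead of an otherwise identical job with no urgency and a distant deadline; changing the weights changes which policy dominates.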
Using Taverna to Orchestrate Grid Services in a Workflow
Speaker: Ravi Madduri, Senior Software Developer, Argonne National Laboratory, University of Chicago
caGrid is a service-based grid software infrastructure that brings together distributed data and analytic resources into a virtual collaborative platform for cancer research. In caGrid, many of the tasks involved in analyzing and aggregating cancer-related data rely on “canned” solutions, or workflows. As a result, caGrid services need to be orchestrated through a workflow language and tooling. This presentation first summarizes the rationale for selecting Taverna as the primary candidate for workflow authoring and invocation. It then introduces Taverna plug-in development in general and explains how to extend Taverna for use with caGrid services. Finally, it details a real-world example and the lessons learned from our research and experiments. To provide a full-fledged, grid-enabled workflow solution, future work includes: 1) support for Taverna version 2.0 (T2), which is to be released soon; 2) support for secure grid services; 3) support for semantic-based service discovery in the scavenger; and 4) support for stateful grid services.
MyProxy-Based Short-Lived Credential CA Service at NERSC
Speaker: Shreyas Cholia, Computer Systems Engineer, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
This session will discuss how the National Energy Research Scientific Computing Center (NERSC) uses a MyProxy-based Certification Authority (CA) to issue short-lived grid certificates to its users. PKI X.509 certificates form the backbone of grid services in the Globus universe. However, acquiring these certificates is often cumbersome and can deter new grid users who are trying to automate and manage their workflows. Moreover, security concerns make the proliferation of long-lived certificate files on compute systems across the grid undesirable. Credential repositories like MyProxy allow centralized management of proxy credentials that can then be reused across the grid, but they rely on users to manage their own certificates and involve delegating trust to an entity external to the PKI. To address these challenges, MyProxy now supports a CA feature that generates and signs short-lived end-entity certificates. This allows the identity provider to manage the issuance of short-lived certificates for practical grid use directly. The MyProxy-based NERSC Online CA aims to remove the burden of grid certificate generation and management from the user. It ties the authentication of users receiving certificates to the existing identity management infrastructure at NERSC. NERSC maintains a user database called the NERSC Information Management (NIM) system, which contains records for all NERSC users. NERSC users have already been vetted by an accounts-and-allocations process, which includes PI verification or face-to-face contact. This kind of process is common in U.S. research communities (for example, the NSF and DOE ID-vetting processes). Once a user has been vetted, there is enough information to establish password-based authentication to the NERSC infrastructure. We wish to leverage this infrastructure to issue grid certificates.
The NIM database is exported through an LDAP directory tree, including an MD5 hash of the user’s password. The user connects to the MyProxy CA using a standard myproxy-logon client and enters her NIM password when prompted; the password is sent to the MyProxy CA over an SSL-encrypted connection. The MyProxy CA verifies the password against the LDAP-exported NIM record (using PAM). If verification succeeds, the MyProxy CA issues a short-lived (12 hours to 11 days) user certificate to the client. This certificate has a unique and persistent DN for a given user, generated from information in the NIM record (a combination of the user’s full name and UID). The system can also plug into a portal framework using the Commodity Globus kits, enabling web-based grid applications. The NERSC Online CA is based on the IGTF SLCS profile and uses an Aladdin eToken hardware security module to store the CA’s signing key. It provides users with identity credentials for job and file management while leveraging the identity framework and authentication systems already in place at NERSC. This allows NERSC to enforce explicit authentication controls on credential generation while greatly simplifying certificate acquisition for the user.
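The two policy points in this flow, a persistent DN derived from the user's full name and UID and a lifetime bounded to the 12-hour to 11-day window, can be sketched in a few lines. The DN prefix and the exact "name plus UID" layout below are hypothetical; the abstract does not give the NERSC DN format, and this is not MyProxy configuration.

```python
import re
from datetime import timedelta

MIN_LIFETIME = timedelta(hours=12)
MAX_LIFETIME = timedelta(days=11)

# Hypothetical DN prefix; the real NERSC Online CA DN layout is not stated.
DN_PREFIX = "/DC=org/DC=example/OU=People"

def user_dn(full_name: str, uid: int) -> str:
    """Build a unique, persistent DN from a full name and numeric UID.

    Including the UID keeps two users with the same name distinct, and the
    DN stays the same every time the same user obtains a new certificate.
    """
    # Drop characters that are awkward inside a DN component.
    clean = re.sub(r"[^A-Za-z0-9 .'-]", "", full_name).strip()
    return f"{DN_PREFIX}/CN={clean} {uid}"

def clamp_lifetime(requested: timedelta) -> timedelta:
    """Keep the certificate lifetime inside the 12-hour to 11-day window."""
    return max(MIN_LIFETIME, min(requested, MAX_LIFETIME))
```

For example, `user_dn("Jane Doe", 12345)` yields `/DC=org/DC=example/OU=People/CN=Jane Doe 12345`, and a request for a 30-day certificate is cut back to 11 days.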
10:00 – 10:30 am
Break
10:30 – 12:00 pm
GRID ENGINE #4
Making Grid Engine Highly Available with Open High Availability Cluster and OpenSolaris
Speaker: Ashutosh Tripathi, Senior Software Engineer, Sun Microsystems
The Grid Engine job scheduling software, while highly scalable, is not highly available “out of the box.” Although Grid Engine can handle failures of individual nodes in the compute cluster, the Master Host itself is a single point of failure. Grid Engine does provide a Shadow Master host mechanism to increase availability. However, for the highest availability, the Master Host itself should be run on a high availability cluster. Open High Availability Cluster, the open-source version of Solaris Cluster, Sun's enterprise high availability product suite, is the first open-source HA Cluster based on a major proprietary HA Cluster. Open HA Cluster tightly connects multiple physical nodes to provide a high availability platform for off-the-shelf software applications. This presentation will introduce the concept of high-availability clusters, and will show how Open High Availability Cluster can run the Grid Engine Master Host on OpenSolaris in a highly available fashion, providing quick recovery times, integrated highly available NFS and IP addresses, and configurable service dependencies.
HPC Visualization on the Grid
Speakers: Linda Fellingham, PhD, Manager, Visualization and Graphics, Sun Microsystems, and W. Dean Stanton, Senior Staff Engineer, Sun Microsystems
While the performance of 3D graphics hardware has been increasing at astounding rates, in excess of Moore's law, the graphics pipeline is only one part of an effective solution for compute-intensive visual applications. Many interactive visual problems require huge memories, large numbers of CPUs, and/or high-speed access to vast amounts of data storage. These problems are well suited to execution in the server room, where secure, professionally managed systems and high-speed interconnects are commonly available as shared resources. Running visual applications on the grid and displaying the images over the network on ordinary, low-cost systems makes it possible to tackle larger problems and to provide access to many more users, even over wide distances. Many challenges must be met: allocating and sharing graphics resources among users; transparently transforming applications designed for a single user on a single desktop into applications that can be used by one or more remote users and scaled across multiple nodes with multiple graphics devices; providing interactive capability through grid interfaces better suited to batch environments; and facilitating reconfiguration of scalable visualization middleware and applications (and their associated complicated configuration files and scripts) for maximum simultaneous utilization of resources. In addition, GPU computing is coming to the fore for many algorithms that can take advantage of the massively parallel, stream-computing model. Many of the same hardware and software solutions that address visualization on the grid can be leveraged to facilitate grid-based GPU computing. This presentation will describe how Sun Grid Engine, Sun Shared Visualization and Sun Scalable Visualization software work together to give users seamless access to high-performance visualization applications and GPU computing resources on the grid.
It will describe deployments of this visual computation model, and discuss the problems that can be effectively addressed by this technology now and in the future.
PluS: An Advance Reservation Plug-in for Sun Grid Engine
Speaker: Hidemoto Nakada, PhD, Senior Research Scientist, National Institute of Advanced Industrial Science and Technology
Advance Reservation is an important technology for making resource co-allocation possible in the Grid environment. This presentation introduces an Advance Reservation plug-in for Sun Grid Engine called 'PluS'. Although Sun Grid Engine recently gained Advance Reservation capability, PluS still offers several advantages:
- PluS provides a policy mechanism for accepting reservation requests, which allows site administrators to set up site-local policies, such as group-wise priority settings or history-based acceptance. For policy description, we employed the ClassAd language from the Condor project, a well-formalized and powerful language.
- PluS supports a two-phase commit protocol, which is important when modifying co-allocated reservations.
- PluS can execute specific jobs immediately before and after the reserved jobs. This capability turned out to be important for changing network settings for co-allocated jobs.
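Two-phase commit matters for co-allocation because a reservation spanning several sites must either take effect everywhere or nowhere. The sketch below shows the generic protocol for a request (for example, the new node counts of a modified co-allocation): every site first tentatively holds capacity and votes, and only if all vote yes do they commit; otherwise every holder aborts. The `Site` class and its methods are hypothetical and do not represent PluS's interfaces.

```python
class Site:
    """A hypothetical site-local reservation manager taking part in 2PC."""

    def __init__(self, name, free_nodes):
        self.name = name
        self.free_nodes = free_nodes
        self.held = 0        # nodes tentatively held during the prepare phase
        self.committed = 0   # nodes committed to the co-allocation

    def prepare(self, nodes):
        """Phase 1: tentatively hold capacity; vote yes only if it fits."""
        if nodes <= self.free_nodes:
            self.free_nodes -= nodes
            self.held = nodes
            return True
        return False

    def commit(self):
        """Phase 2 (all voted yes): make the held capacity permanent."""
        self.committed += self.held
        self.held = 0

    def abort(self):
        """Phase 2 (someone voted no): release whatever was held."""
        self.free_nodes += self.held
        self.held = 0

def co_allocate(request):
    """request maps Site -> node count; either all sites commit or none do."""
    prepared = []
    for site, nodes in request.items():
        if site.prepare(nodes):
            prepared.append(site)
        else:
            # One "no" vote rolls back every site that already voted "yes".
            for p in prepared:
                p.abort()
            return False
    for site in prepared:
        site.commit()
    return True
```

A real implementation would also need timeouts and persistent logs so that a coordinator crash between the two phases cannot leave capacity held forever; the sketch omits both.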
The principal PluS commands are the following:
plus_reserve - makes a resource reservation and returns a reservation ID.
plus_cancel - cancels a reservation.
plus_modify - modifies an existing reservation.
plus_status - lists existing reservations.
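The bookkeeping behind these four operations can be illustrated with a small in-memory table: a reservation is accepted (or modified) only if, at every moment of its time window, the nodes it asks for plus the nodes already reserved fit in the cluster. The class and method names below merely mirror the command names; they are a hypothetical Python sketch, not the PluS API, and they ignore policies, queues and persistence.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Reservation:
    rid: int
    start: int   # start time, e.g. seconds since the epoch
    end: int     # end time (exclusive)
    nodes: int

class ReservationTable:
    """Toy analogue of plus_reserve / plus_cancel / plus_modify / plus_status."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.reservations = {}
        self._ids = itertools.count(1)

    def _fits(self, start, end, nodes, ignore=None):
        overlapping = [r for r in self.reservations.values()
                       if r.rid != ignore and r.start < end and start < r.end]
        # Usage inside [start, end) can only rise at `start` or at the start
        # of an overlapping reservation, so checking those instants suffices.
        points = {start} | {r.start for r in overlapping if r.start > start}
        for t in points:
            used = sum(r.nodes for r in overlapping if r.start <= t < r.end)
            if used + nodes > self.total_nodes:
                return False
        return True

    def reserve(self, start, end, nodes):
        """Return a new reservation ID, or None if the request does not fit."""
        if not self._fits(start, end, nodes):
            return None
        rid = next(self._ids)
        self.reservations[rid] = Reservation(rid, start, end, nodes)
        return rid

    def cancel(self, rid):
        return self.reservations.pop(rid, None) is not None

    def modify(self, rid, start, end, nodes):
        # The reservation's own old nodes are ignored when checking the new shape.
        if rid not in self.reservations or not self._fits(start, end, nodes, ignore=rid):
            return False
        self.reservations[rid] = Reservation(rid, start, end, nodes)
        return True

    def status(self):
        return sorted(self.reservations.values(), key=lambda r: r.start)
```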
PluS has two operation modes:
1) It represents each reservation as a queue and controls the queues through an external interface.
2) It completely replaces the scheduling module of the queuing manager.
The former approach is easy to deploy, since it does not change the existing scheduling module of Sun Grid Engine at all. The latter approach is more capable, since it can potentially provide any capability the administrator wants. PluS also serves as an easy-to-use Java toolkit for constructing a scheduling module that replaces sge_sched. It provides an easy-to-use Java API to retrieve information from Sun Grid Engine and to control node allocation and job execution. The Java API talks to the sge_master via a proxy module written in C, called operatord, that translates SGE's native protocol, GDI, into a plain-text XML notation. The API makes it easy to implement novel scheduling algorithms on Sun Grid Engine; a FIFO-based round-robin toy scheduler can be written in 70 lines. PluS is available from http://www.g-lambda.net/plus.
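The core policy of a FIFO-based round-robin toy scheduler fits in a few lines. The 70-line example mentioned above is written against the PluS Java API; the sketch below is not that code, just a standalone Python illustration of the policy itself, assuming one job per assignment and ignoring node capacity and job state.

```python
from collections import deque
from itertools import cycle

def fifo_round_robin(jobs, nodes):
    """Assign jobs in arrival (FIFO) order, cycling through nodes round-robin.

    jobs:  job IDs in submission order
    nodes: node names
    Returns a list of (job, node) pairs.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    queue = deque(jobs)          # oldest job at the left
    node_cycle = cycle(nodes)    # n1, n2, ..., n1, n2, ...
    schedule = []
    while queue:
        schedule.append((queue.popleft(), next(node_cycle)))
    return schedule
```

For instance, three jobs on two nodes are placed on `n1`, `n2`, then `n1` again. A real replacement for `sge_sched` would also have to track free slots and only dispatch when a node can accept work.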
GLOBUSWORLD #4
Globus Execution Services
Speakers: Stuart Martin, Senior Software Developer, Argonne National Laboratory, University of Chicago; Kate Keahey, Mathematics & CS Division, Argonne National Laboratory Computation Institute, University of Chicago; Suresh Marru, Indiana University; Ruben S. Montero, PhD, Associate Professor, Universidad Complutense de Madrid; Ioan Raicu, University of Chicago
Globus execution management services provide the capability to submit, monitor, and cancel jobs on Grid computing resources. Remote jobs may require coordinated staging of data and credentials onto the resource before execution and off the resource after execution.
- What's New in 4.0 and 4.2 GRAM, What's Planned for the Future (Stuart Martin): An overview of the latest developments and future plans for the Globus GRAM service, including optimizations for high throughput, auditing support, support for the OGSA BES standard, SAML authorization support, alternative clients (Java CoG, Condor-G, and others), grid-enabled MPI (MPIg), and dynamic service startup/task execution (Condor GlideIn and Falkon). We focus in particular on recent enhancements and new features in the GT4.0 and GT4.2 releases.
- Virtual Machine Management Services (Kate Keahey): An overview of the Globus Toolkit Workspace Service that allows an authorized Grid user to provision and manage environments (currently implemented as virtual machines) in the Grid. The talk will provide an introduction to the cloud computing talk later in the week.
- Experiences with the use of GRAM in the LEAD portal (Suresh Marru): An overview of the Linked Environments for Atmospheric Discovery (LEAD) portal, which provides researchers, educators and students with access to meteorological data, forecast models, and analysis and visualization tools. The focus will be on LEAD's experiences with GRAM and other Globus components during the Spring 2008 weather forecast challenge.
- The GridWay metascheduler (Ruben S. Montero): An overview of the GridWay metascheduler and its integration with Globus components like GRAM and MDS.
- Swift and Falkon (Ioan Raicu): An overview of Swift, a system that bridges scientific workflows with parallel computing, and of Falkon, a lightweight task execution service for optimized task throughput and resource efficiency when executing many independent jobs on large compute clusters.
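The workload Falkon targets, many small independent tasks, is typically handled by keeping a fixed pool of already-running workers and streaming tasks to them rather than submitting each task as a separate batch job. The in-process sketch below illustrates only that dispatch pattern with Python's standard thread pool; it is not Falkon's API, and Falkon's executors run on remote cluster nodes rather than as local threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_many_tasks(task, inputs, workers=8):
    """Dispatch many independent tasks to a fixed pool of workers.

    The pool is created once, so per-task overhead is a queue hand-off rather
    than a full job submission. Results come back in input order, whatever
    order the tasks finish in.
    """
    results = [None] * len(inputs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, x): i for i, x in enumerate(inputs)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```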
TUTORIAL #4: (ROCKS)
Basic Management and Customization
Speaker: Greg Bruno
While Rocks clusters are turnkey, users always need to manage and customize their clusters. This session will introduce the Rocks configuration graph and cover how to add new packages and configuration. Other common customization scenarios will also be described.
INTRODUCTION AND OVERVIEW #5: INNOVATIVE USES OF ROCKS
A Case Study on Building Faster, Easier HPC Clusters with Rocks at Stanford University
Speaker: Steve Jones, Manager, High Performance Computing Center, Stanford University
In just 11 days during 2007, the Stanford University High-Performance Computing Center was able to fully implement a 1,696-core cluster solution by leveraging the certification methodology from the Intel Cluster Ready Program. In addition to rapid deployment, the system nearly doubled the performance of the center’s existing compute system. The new Stanford solution leverages Dell, Clustercorp and Panasas technologies, giving the Center unprecedented flexibility to meet its ever-expanding computational and application requirements and enabling Stanford researchers to achieve faster time-to-results. Steve Jones, the founder and manager of the Stanford HPC Center, will discuss his experiences in the design and deployment of this system.
Mission: CFD on Demand
The goal of the expansion was simple: acquire sufficient compute power to support School of Engineering coursework and research efforts and to support the university’s industrial affiliates program. Key research programs include the Department of Energy Advanced Simulation and Computing (ASC) program, sponsored by the National Nuclear Security Administration, and the next-generation Predictive Science Academic Alliance Program (PSAAP). The system had to accommodate over 200 researchers. Two groups in particular required large-scale, massively parallel computing resources for their work with the ASC program. Researchers in the mechanical engineering and aeronautics and astronautics departments use the HPC Center resources to analyze the details of flow and acoustics created by helicopters in forward flight. Critical applications include two major in-house simulation codes: Stanford University multiblock (SUmb) and CDP, named for the late Charles David Pierce. Commercial applications include ANSYS, Gaussian, MATLAB, and VASP.
Result: 11-Day Deployment Delivers 2-14X Performance Improvement
The entire deployment, including the implementation of an entirely new power and cooling infrastructure, took a total of 11 days. Dell’s Enterprise Deployment team played an integral role in this feat, coordinating the efforts of all participating vendors. The power, cooling, and system build-out were completed in parallel. We used the Rocks+ Linux cluster distribution to configure the master and compute nodes, and by day 11 researchers were able to submit jobs, which executed flawlessly and ran their scientific codes with unprecedented fidelity. The new cluster easily handles ten times the workload of the original 48-node configuration. Testing showed a performance of 15.8 teraflops, compared with 1.1 teraflops for the smaller cluster.
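The upper end of the headline's 2-14X range is consistent with the two measured throughput figures:

```latex
\frac{15.8\ \text{TFLOPS}}{1.1\ \text{TFLOPS}} \approx 14.4
```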
Extending Rocks for the Creation and Management of Grid Systems for Biomedical Research
Speaker: Vicky Rowley, UCSD
The Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project, an NIH-funded project, was launched in 2001 with the goal of fostering collaborations. With a focus on data and tool sharing for biomedical science (Grethe et al., 2005), the BIRN infrastructure is designed around a flexible large-scale grid model, combined with conventional IT infrastructure to support the deployment of web servers, application servers, database servers and authentication mechanisms. The result is a complete computing environment that facilitates biomedical research. To date, this system supports a production environment with over 25 fielded sites, with separate staging and development environments for black-box and white-box testing.
A separate and distinct grid, including separate production, staging and development environments, has been established using the same software stack and deployment mechanisms, for the National Database for Autism Research (NDAR) project. NDAR is a collaborative bioinformatics system being created by the National Institutes of Health (NIH) to support research in autism spectrum disorder (ASD) and to help accelerate scientific discovery.
The scientific software integrated to date includes:
- web-based frontend software for end users
- application software for a wide variety of purposes, including web applications
- image processing software specific to medical imaging data and adapted to run on large computational clusters
- database applications using Oracle, MySQL and Postgres database engines
- “Point-of-Presence” servers, which connect an individual site’s data to the rest of the grid
Managing and deploying hundreds of servers over dozens of sites, including instantiating multiple environments involving several server types, would not be possible without the high level of automation provided by the Rocks-based framework used by the BIRN Coordinating Center. Rocks builds upon Red Hat Linux’s use of RPMs and kickstart files, allowing customized, flexible, yet highly automated installation of Linux servers. Rocks allows server functionality and parameters unique to each server (e.g., IP address, hostname, timezone) to be established at install time from extremely minimal inputs. In contrast, conventional kickstart and so-called “golden image” installations leave the server with its software configured as it was on the server from which the kickstart file or image was made; the server then has to be reconfigured with the correct information. Also, unlike the golden image method, kickstart installations allow the use of diverse hardware, a quality that Rocks installations inherit.
The ability to custom-install a server based on both its unique parameters and its required functionality in a repeatable, highly automated way has been key to our ability to establish these grids quickly and repeatably. The Rocks software is designed for fast, repeatable deployment of computational clusters. For the BIRN project, the BIRN Coordinating Center (BCC) extended the Rocks software to support additional frontend types, including the web servers, application servers and database servers mentioned above. In addition, to facilitate updating the grid software and to support additional software distribution paths, the software repository used for server installation was extended to support YUM (Yellowdog Updater, Modified).
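The difference from a golden image is that per-host values and role-specific packages are filled in at install time from a few inputs, instead of being copied from whichever server the image came from. The sketch below renders a small kickstart fragment that way; the `network`, `timezone` and `%packages` directives are standard kickstart syntax, but the template, the role-to-package mapping and the function name are hypothetical and are not what Rocks actually generates.

```python
from string import Template

# Per-host part of a kickstart file (hypothetical template, real directives).
KS_TEMPLATE = Template(
    "network --bootproto=static --ip=$ip --netmask=$netmask --hostname=$hostname\n"
    "timezone $timezone\n"
)

# Hypothetical mapping from server role to the packages it needs.
ROLE_PACKAGES = {
    "web": ["httpd"],
    "database": ["postgresql-server"],
    "compute": [],
}

def render_kickstart(hostname, ip, role, netmask="255.255.255.0",
                     timezone="America/Los_Angeles"):
    """Fill per-host values and role packages into a shared template."""
    body = KS_TEMPLATE.substitute(hostname=hostname, ip=ip,
                                  netmask=netmask, timezone=timezone)
    packages = "".join(f"{pkg}\n" for pkg in ROLE_PACKAGES[role])
    return body + "%packages\n" + packages + "%end\n"
```

Installing a web server and a compute node then differs only in the three arguments passed in, which is the "extremely minimal inputs" property described above.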
Rocks based Virtual Cluster Management System: GriVon
Speaker: Takahiro Hirofuchi, PhD, Research Scientist, National Institute of Advanced Industrial Science and Technology
We introduce a virtual cluster management system based on Rocks, called 'Grivon'. A virtual cluster is a virtual environment constructed on real, physical clusters. It provides better abstraction than mere virtual machines, with virtualized networks and virtualized storage. The virtual cluster networks are logically isolated from the real networks to provide better security. Grivon leverages Rocks, the cluster provisioning system, to maintain the virtual cluster. Users reserve virtual clusters by specifying a time slot, resource requirements, required Rolls and required appliances. When the specified time arrives, the system automatically creates a virtual cluster using Rocks and provides it to the users. Behind the scenes, at the specified time, the system sets up virtual networks and storage, creates virtual machine configuration files, and then starts a virtual front end as one of the virtual machines. Thanks to the 'lights-out' installation capability of Rocks, installation is performed without any human interaction. Virtual cluster settings, such as the number of each appliance type and their MAC addresses, are automatically injected into the database on the virtual front end via Rescue Rolls, which are created from users' reservation requests and the resource allocation. When the virtual front end installation completes, the system starts the other virtual nodes so that they are installed from the virtual front end, which distributes packages according to the information injected into its database. Thus, a completely configured Rocks cluster is automatically installed in the virtual environment. Grivon uses VMware Server for computer resource virtualization, VLANs for network virtualization, and iSCSI with the Logical Volume Manager (LVM) for storage virtualization.
The system can also host a virtual cluster across multiple sites, which allows more flexible management and higher resource utilization. Inter-site communication is carried over a VPN to keep it private. Note also that the Grivon system itself is implemented as a Rocks Roll, allowing easy installation. Grivon will be contributed to the community as a series of Rolls in the near future.
In this presentation, we give an overview of the project and present the VMware Roll released in May 2008.
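Before the virtual front end is installed, the per-node settings that go into its database (how many nodes of each appliance, and their MAC addresses) have to be worked out from the reservation. The sketch below expands a reservation such as `{"compute": 3}` into hostname and MAC pairs, using the Rocks `<appliance>-<rack>-<rank>` naming style. Deriving MACs from a hash and marking them as locally administered is an assumption made for illustration; the abstract does not say how Grivon assigns MAC addresses, and VMware Server has its own assignment ranges.

```python
import hashlib

def virtual_mac(cluster: str, index: int) -> str:
    """Derive a stable, locally administered unicast MAC for a virtual node.

    First octet 0x02: the locally administered bit is set and the multicast
    bit is clear. The remaining five octets come from a hash, so the same
    (cluster, index) always maps to the same address.
    """
    digest = hashlib.sha256(f"{cluster}:{index}".encode()).digest()
    octets = [0x02] + list(digest[:5])
    return ":".join(f"{o:02x}" for o in octets)

def plan_virtual_cluster(cluster: str, appliances: dict) -> list:
    """Expand a reservation like {"compute": 3} into (hostname, MAC) rows.

    Hostnames follow the Rocks <appliance>-<rack>-<rank> style, all in rack 0.
    """
    rows, index = [], 0
    for appliance, count in appliances.items():
        for rank in range(count):
            rows.append((f"{appliance}-0-{rank}", virtual_mac(cluster, index)))
            index += 1
    return rows
```

Because the addresses are deterministic, rebuilding the same reservation reproduces the same node identities, which keeps the injected database rows and the started virtual machines consistent.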
12:00 – 1:30 pm
LUNCH (PROVIDED)
1:30 – 3:00 pm
ROCKS #4:
Extending Functionality Through the Rocks Command Line and Roll Screen Development
Speakers: Nadya Williams, Grid Specialist, University of Zurich; Mason Katz; Anoop Rajendra
As an extension of the previous session, roll developers can add new installation screens and have them integrated seamlessly. Nadya Williams will describe her test harness, which significantly improves the development of installation screens. The Rocks command line is the way rolls extend the Rocks command structure. The Rocks Viz Roll will be used as a key example of roll-based extension, in this case to support tiled-display clusters. The Solaris command set (currently under development) will illustrate how Rocks commands can work across different architectures.
Roll Screen Development (Nadya Williams)
The Rocks Clusters distribution has a powerful method for adding software packages: the roll. Many rolls for diverse applications have been developed by the Rocks group and by scientists who want to add their applications to Rocks clusters. Packaging a scientific application as a roll makes it easy to install and update the application and convenient to share its installation, configuration and updates with others. Rocks provides a mechanism for building your own roll with Rocks-specific tools, making a roll’s integration into the cluster seamless and automatic. Some software, especially grid middleware, requires collecting user input during the cluster install, which is done via a screen forms mechanism. Creating a roll with a screen enabled requires writing additional pieces of software. In addition, testing the roll’s screen requires building a cluster frontend to view and test the screen. We present an example of how the roll’s screen can be tested and debugged “online” without building the frontend. The idea is that a developer can iteratively build the screen, test it and validate it without leaving the roll development directory. This approach speeds up the roll development cycle by providing a way to visualize and validate the roll’s screen in situ.
GLOBUSWORLD #5
Globus Security: Features and Roadmap & Building Secure VOs using Globus Toolkit

Speakers: Frank Siebenlist, Argonne National Laboratory; Rachana Ananthakrishnan, Senior Software Developer, Argonne National Laboratory; Kunal Modi, Security Solutions Architect, Ekagra Software Technologies / Center for Bio-Informatics and Information Technology (CBIIT) - NCI; Tom Scavo, Lead Developer of GridShib Project, National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign
Globus' security framework and services ensure the integrity, privacy and policy enforcement of communication and resource usage on the Grid. We will report on a number of exciting new features in the GT4.2 release that allow fine-grained policy enforcement for services and resources. Its sophisticated attribute-based framework allows different attribute-collecting policy information points to be plugged in, as well as co-located and external policy decision point implementations. Furthermore, a number of Grid projects will discuss the security components and services they are contributing to the Globus community at large, such as the cancer grid project caBIG (caGrid's Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)), Earth System Grid ("easy" PKI based on MyProxy's online CA and auto-provisioning), and TeraGrid/GridShib (SAML/Shib attribute services).
- GT4.2 Security update and futures & ESG's easy PKI (Frank Siebenlist)
- GT4.2 Security update and futures (Rachana Ananthakrishnan)
- GAARDS: caGrid's Grid Authentication and Authorization with Reliably Distributed Services (Kunal Modi)
- Attribute-based Authorization for Science Gateways Using GridShib (Tom Scavo)
A TeraGrid Science Gateway is an intermediary between a browser user and one or more TeraGrid resource providers. The Gateway typically provides a domain-specific portal interface that hides the details of the computing environment from the user and hence provides a more usable and productive computing experience.
TeraGrid is asking Science Gateways to include user attributes in all job submissions. This enables TeraGrid resource providers (RPs) to perform two important tasks that are currently difficult to implement: first, RPs would be able to track individual gateway users and their activities; second, RPs would be able to block malicious or runaway behavior by specific gateway users without disabling the entire gateway.
A new grid authorization model has been proposed that meets the TeraGrid security requirements for attribute-based auditing and authorization for Science Gateways. The proposed model incorporates GridShib SAML Tools at the Gateway and GridShib for Globus Toolkit at the RP. With these two software components installed, the Gateway passes user information that the RP can use for fine-grained access control, auditing and incident response.
This presentation will provide some background on the GridShib Project, outline the features and general availability of the various GridShib software components, and show how the software is being used to address the TeraGrid Science Gateway use case.
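The key property of the model is that the RP's decision can depend on the individual end user asserted by the gateway, not just on the gateway's own credential, so one misbehaving user can be blocked while everyone else keeps working, and every decision is auditable per user. The policy object below sketches that decision logic; the class names, DN strings and data layout are hypothetical and are not GridShib's configuration format or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    gateway_dn: str     # identity of the gateway's community credential
    gateway_user: str   # end-user attribute asserted by the gateway (e.g. in SAML)

class ResourceProviderPolicy:
    """Hypothetical RP-side policy: trust gateways, but block individual users."""

    def __init__(self, trusted_gateways):
        self.trusted_gateways = set(trusted_gateways)
        self.blocked = set()    # (gateway_dn, gateway_user) pairs
        self.audit_log = []     # (gateway_dn, gateway_user, allowed) tuples

    def block(self, gateway_dn, gateway_user):
        """Incident response: block one user without disabling the gateway."""
        self.blocked.add((gateway_dn, gateway_user))

    def authorize(self, req: Request) -> bool:
        allowed = (req.gateway_dn in self.trusted_gateways
                   and (req.gateway_dn, req.gateway_user) not in self.blocked)
        # Record every decision together with the end-user attribute for auditing.
        self.audit_log.append((req.gateway_dn, req.gateway_user, allowed))
        return allowed
```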
TUTORIAL #5: (GRID ENGINE)
Service Domain Manager Installation
Speakers: Richard Hierlmeier and Ryszard Macidlowski
Service Domain Manager (SDM) is an upcoming product from Sun that will allow administrators to configure policies to automatically reassign resources from one service to another based on service level objectives and the changing load conditions. The Sun Grid Engine 6.2 software will include an early version of SDM that will allow multiple Sun Grid Engine clusters to dynamically share resources to maximize utilization across the entire grid.
This tutorial will walk participants through the process of installing and configuring the SDM version included with Sun Grid Engine 6.2 to manage multiple Sun Grid Engine clusters. After this tutorial, attendees should have a clear idea of the steps needed to both install SDM and to configure it to manage a Sun Grid Engine cluster.
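The kind of policy SDM automates, moving hosts from a cluster that is meeting its service level objective to one that is not, can be illustrated with a toy rebalancing step. Everything below is hypothetical: the cluster data layout, the rule that a cluster violates its objective when it has pending jobs and no idle host, the one-host-per-pending-job assumption, and the `min_free` reserve. It is not SDM's configuration model or algorithm.

```python
def rebalance(clusters, min_free=1):
    """Move idle hosts from clusters meeting their SLO to clusters that are not.

    clusters: name -> {"hosts": int, "pending": int, "idle": int}
    A cluster is treated as violating its (hypothetical) SLO when it has
    pending jobs and no idle host; each pending job is assumed to need one host.
    Donors always keep at least `min_free` idle hosts for their own load.
    Mutates the host counts and returns the list of (donor, target) moves.
    """
    moves = []
    needy = [n for n, c in clusters.items() if c["pending"] > 0 and c["idle"] == 0]
    for target in needy:
        need = clusters[target]["pending"]
        for donor, c in clusters.items():
            while need > 0 and donor != target and c["idle"] > min_free:
                c["idle"] -= 1
                c["hosts"] -= 1
                clusters[target]["hosts"] += 1
                moves.append((donor, target))
                need -= 1
    return moves
```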
INTRODUCTION AND OVERVIEW #6: VIRTUAL MACHINES AND CLUSTERS
Grid Engine within the Amazon EC2 Cloud
Speaker: Chris Dagdigian, Principal Consultant, BioTeam, Inc.
Using real-world examples taken from BioTeam consulting projects, this talk will center on the questions of "when" and "how" to move Grid Engine-managed workflows into the Amazon Web Services ("AWS") cloud. Core AWS services, including the Elastic Compute Cloud (EC2) and the Simple Storage Service (S3), will be covered.
EUCALYPTUS: An Open Source Service Infrastructure for Elastic Computing Research
Speaker: Rich Wolski, PhD, Computer Science Department, University of California-Santa Barbara
Elastic Computing has emerged as a popular SLA-based utility computing paradigm that has seen rapid uptake in the small and medium business e-commerce marketplace. Since Amazon.com first deployed it, a number of service hosting enterprises and technology providers have developed utility, cloud, or elastic product and/or service offerings. In this talk, we will present EUCALYPTUS (Elastic Utility Computing Architecture for Linking Your Programs to Transiently Useful Systems), an open source service overlay that implements elastic computing and a hosted service using existing resources. The goal of EUCALYPTUS is to allow sites with existing clusters and server infrastructure to co-host an elastic computing service that is interface-compatible with Amazon's EC2. The talk will focus on three specific innovations that are necessary to provide an easy-to-install, easy-to-maintain EC service: network provisioning, user account management, and system administration interfaces. Because EUCALYPTUS is designed to function as an overlay, it must be able to incorporate resources from different clusters or pools without requiring their reconfiguration or repurposing. For example, EUCALYPTUS allows its administrator to set up a cloud that lets users run virtualized OS instances on a number of clusters transparently. Enabling the necessary network interconnectivity in a way that is secure and portable is one novel feature of EUCALYPTUS. Another stems from its interface compatibility with the existing Amazon EC2 service: EUCALYPTUS users can develop using their own local resources and then move some or all of their functionality directly to EC2. Finally, a key requirement of EUCALYPTUS is that it be able to serve as a research platform for elastic computing. To this end, its design makes two significant contributions. The first concerns the use of scarce network resources in a structured way.
A EUCALYPTUS allocation can function equally well in an environment in which all processors have externally routable IP addresses (e.g., Amazon's current environment) and in one in which only a certain head instance is externally routable (as is the case with most academic research clusters today). Second, EUCALYPTUS leverages the extensive Linux packaging and deployment support that is currently available while requiring minimal modification to the existing installed OS base. Specifically, the target resources need only run a standard Xen-enabled kernel with Xen 3.1 or later hypervisor support. All other functionality installs directly, without the need for kernel patching or module additions to the local host OS domain. In the talk we will outline the design of EUCALYPTUS, discuss its architectural and administrative characteristics, and detail the degree to which it succeeds in implementing elastic computing as a service overlay.
OpenNebula: The Open Source Virtual Machine Manager for Cluster Computing
Speaker: Rubén S. Montero, PhD, Associate Professor, Universidad Complutense de Madrid
The aim of the OpenNebula technology is to transform a physical infrastructure into a virtual infrastructure by dynamically overlaying VMs over physical resources. Computing services, such as working nodes managed by existing LRMs (Local Resource Managers) like SGE and OpenPBS, can then be executed on top of the virtual infrastructure, allowing a physical cluster to dynamically execute multiple virtual clusters. The separation of resource provisioning, managed by OpenNebula, from job execution management, managed by existing LRMs, provides the following benefits:
- Cluster consolidation, because multiple virtual working nodes can run on a single physical resource, reducing the number of physical systems and hence the space, administration, power and cooling requirements. The allocation of physical resources to virtual nodes can be dynamic, depending on their computing demands, by leveraging the migration functionality provided by existing VMMs.
- Cluster partitioning, because the physical resources of a cluster can be used to execute working nodes bound to different virtual clusters.
- Support for heterogeneous workloads with multiple (even conflicting) software requirements, allowing the execution of software with strict requirements, such as jobs that will only run with a specific version of a library, or legacy applications.
Consequently, this approach provides the flexibility required to allow Grid sites to execute on-demand VO-specific working nodes and to isolate and partition the physical resources. Additionally, the architecture offers other benefits to the cluster administrator, such as high availability, support for planned maintenance and changing capacity availability, performance partitioning, and protection against malicious use of resources. The OpenNebula Virtual Infrastructure Engine differs from existing VM manager systems in its highly modular and open architecture, designed to meet the requirements of Grid site and high-performance cluster administrators. The OpenNebula Engine provides a command line interface for monitoring and controlling Xen VMs and physical resources that is quite similar to the interfaces provided by well-known LRMs. This interface allows its integration with third-party tools, such as LRMs, service adapters and VM image managers, providing a complete solution for the deployment of flexible and efficient computing clusters. The presentation will include a description of the architecture of the new open-source engine for virtual infrastructures and experimental results of its performance when managing a dynamic workload of VMs running execution hosts of an SGE cluster. The second part of the presentation will demonstrate its main functionality in the dynamic management of a computing cluster.
SPONSOR TALK
Innovation and Openness Creating New Opportunities in HPC/Grid - An Update from Sun
Speaker: Bjorn Andersson, Sun Microsystems, Inc.
Learn how Sun innovation and openness (open source, open standards and open platforms) are making it simpler to solve your complex business problems faster. The talk will cover Sun's innovative "Designed for Performance" end-to-end HPC cluster portfolio which brings new levels of performance, efficiency and scalability to your infrastructure. Learn how Sun can assist you in achieving breakthrough economics plus the power to run more simulations, perform intricate business analysis faster, or bring new products to market faster.
3:00 – 3:15 pm
Break
3:15 – 4:00 pm
Keynote Presentation sponsored by Sun Microsystems:
Call for Participation in Open Source Efforts in Grid Computing and HPC
Speaker: Bob Porras, Vice President Solaris Data, Availability, Scalability & HPC, Sun Microsystems, Inc.
This keynote will introduce open source communities such as OpenSolaris, OpenStorage, Lustre, Grid Engine, openXVM, Fortress, Open MPI, Crossbow, GRID.org and others. It will show Sun's participation in these communities and demonstrate how Sun addresses Grid, Cluster and HPC requirements with open software stacks, as well as how the audience can engage in these efforts.
4:00 – 4:30 pm
Break
4:30 – 6:00 pm
GRID ENGINE #5:
Voice of the Community BoF
Speakers: Miha Ahronovitz, Line Product Manager, Grid Computing Software Division, Sun Microsystems; Fritz Ferstl, Director of Grid Engineering, Sun Microsystems; Daniel Templeton, Strategic Liaison Manager, Sun Grid Engine, Sun Microsystems
A Birds of a Feather (BOF) session that gives the expanded Grid Engine community of users and developers face-to-face exposure to the product team. The panel, made up entirely of Grid Engine product team members, will answer questions and facilitate dialogs on Grid Engine and broader topics, such as best practices and open standards, and on what more we can do to take the project to new levels of participation.
ROCKS #5
Talk Back to the Rocks Developers
Panel: Philip Papadopoulos, Mason Katz, Greg Bruno
This is the "open-mike" session of the Rocks Workshop where users can give direct feedback about the workshop, what they see as key issues that should be addressed and generally ask the developers any question on their mind. This session has often led to important changes in the Rocks infrastructure to better meet the community needs.
TUTORIAL #6: (GLOBUS WORLD)
Virtualization and Cloud Computing with Globus
Speakers: Kate Keahey, Mathematics & CS Division, Argonne National Laboratory Computation Institute, University of Chicago and Tim Freeman, Argonne/UC
One of the primary obstacles users face in grid computing is that while Grids provide access to many diverse resources, their applications often require a very specific, customized environment. This disconnect can lead to resource underutilization, user frustration, and much wasted effort spent on bridging the gap between applications and resources. Virtual workspaces describe the environment required for the execution of an application and can be dynamically deployed across a variety of resources, creating a working and consistent platform for grid applications.
This tutorial will introduce the Globus Toolkit workspace service that implements workspaces as Xen virtual machines and enables authorized grid clients to dynamically deploy them and manage their resources. Further, we will describe and demonstrate the workspace "cloudkit" that provides a user-friendly interface on top of the workspace service allowing authorized users to easily provision and run VMs on the available community clouds. Finally, we will describe how the process of contextualization can be used to provide on-demand functioning clusters and give examples of its use by applications.
INTRODUCTION AND OVERVIEW #7: DATA RESOURCE MANAGERS
Storage Resource Management: Grid Technology for Dynamic Storage Allocation and Uniform Access

Speaker: Arie Shoshani, PhD, Senior Staff Scientist, Lawrence Berkeley National Laboratory
The Storage Resource Management (SRM) Grid technology was developed in response to the growing need to manage large datasets on a variety of storage systems. In general, an SRM can be defined as a middleware component that manages the dynamic use and content of a storage resource in a distributed system. This means that space can be allocated dynamically to a client, and that the decision of which files to keep in the storage space is controlled dynamically by the SRM. This technology increases users' productivity by eliminating the tedious and time-consuming tasks of managing storage, performing robust file movement using various transfer protocols, and dealing with security requirements at various storage sites. SRMs are used for dynamic space reservation and management, streaming data for analysis, and automatically removing unused replicated files. Unlike a centralized approach to storage access, where a single implementation is designed to interface to all storage systems, the SRM approach is to define powerful, well-defined standard interfaces that can be implemented by various groups. This approach has already shown its power: there are already multiple SRM implementations for various storage systems in the US and Europe. It eliminates dependence on a single implementation, a very important aspect of Grid middleware products. The SRM concept of a storage resource is flexible. Storage resources range from simple disk systems to complex multi-disk and parallel file systems, as well as hierarchical mass storage systems such as HPSS. Adapting to and managing access to a variety of storage systems (some local and some remote) is a complex task for users. SRMs address such issues by providing a software layer, based on the standard interface, in front of various storage systems.
Thus, a data movement module for robustly moving terabytes of data can rely on SRMs to take care of coordinating storage allocation, streaming the data between sites, and supporting the security requirements of the storage systems. Similarly, a data analysis or computation task can request storage space dynamically before launching, to ensure that the task will not fail because of a lack of storage space. Another important problem that SRMs address is storage clogging. Storage clogging is an important issue for large-scale clusters, since the removal of files after they are used is not automated. SRMs help unclog temporary storage systems by providing lifetime management of accessed files. In this talk, we will describe the principles that underlie the design of the SRM standard interfaces, illustrate several SRM implementations that currently interoperate, and discuss several projects that currently use SRMs successfully.
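The lifetime management idea described above can be illustrated with a minimal Python sketch. It assumes a single space reservation in which each file is pinned for a fixed lifetime and becomes eligible for removal once that lifetime expires; the class and method names are hypothetical and are not part of the SRM interface specification.

```python
from dataclasses import dataclass, field


@dataclass
class PinnedFile:
    path: str
    size_bytes: int
    expires_at: float  # absolute time at which the file's lifetime ends


@dataclass
class LifetimeSpace:
    """Hypothetical space reservation whose files carry lifetimes (not the SRM API)."""
    capacity_bytes: int
    files: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(f.size_bytes for f in self.files.values())

    def release_expired(self, now: float) -> list:
        """Remove every file whose lifetime has ended; return the removed paths."""
        expired = [p for p, f in self.files.items() if f.expires_at <= now]
        for p in expired:
            del self.files[p]
        return expired

    def put(self, path: str, size_bytes: int, lifetime_s: float, now: float) -> bool:
        """Admit a file with a lifetime, reclaiming expired files first.

        Returns False when the space is still full; the caller would then
        wait or request more space.
        """
        self.release_expired(now)
        if self.used() + size_bytes > self.capacity_bytes:
            return False
        self.files[path] = PinnedFile(path, size_bytes, now + lifetime_s)
        return True
```

Passing the current time explicitly keeps the sketch deterministic; a real service would read a clock and would also let clients extend or release lifetimes early.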
dCache, Managed Grid Storage
Speaker: Patrick Fuhrmann, PhD, Physicist, Deutsches Elektronen-Synchrotron (DESY)
At the end of 2008, the Large Hadron Collider (LHC) at CERN, the largest High Energy Physics experiment ever, is expected to go online. A sustained stream of about 300 MB per second of valuable data will then be produced by the four attached experiments. Besides storing this data at CERN, which by definition is the Tier-0 in the LHC data grid model, the data will be further distributed in real time among the 11 Tier-I centers around the world. Subsequently, parts of this data will be sent to hundreds of associated Tier-II centers. Tier-0 and Tier-I centers, in contrast to Tier-IIs, are responsible for storing data safely on tape media and as such build the global backup of this precious information. In order to process the data, massive CPU farms at the various sites need random, POSIX-like access. The resulting datasets are again distributed upstream and downstream, using the same mechanisms. The actual data storage endpoints at the different Tiers within the LHC middleware stack are so-called Storage Elements (SEs). In order to separate the SE implementation from its functionality, SEs are required to support a set of protocol classes. These classes cover streamed and local, POSIX-like data transfer protocols (http, gridFtp, dCap, xroot), a storage management protocol (SRM), as well as information publishing protocols. This presentation will introduce dCache, an LHC SE implementation jointly provided by DESY (Deutsches Elektronen-Synchrotron) in Hamburg, Fermilab in Batavia (near Chicago) and the Nordic Data Grid Facility (NDGF) in northern Europe. dCache is currently installed at 8 of the 11 Tier-I centers and at about 60 Tier-II centers in 22 countries, and is going to manage the largest share of LHC data outside CERN. The complete dCache source code is publicly available at the dCache.org web site, including all mechanisms to build the most recent official dCache version.
Besides briefly touching on the various data transfer, control and information provider protocols supported, the presentation will discuss the technology used with dCache to manage local and remote storage on the order of tens of petabytes. This includes features like automatic dataset replication, triggered either by detection of data access hot spots or by configured rules. We will show how dCache can inter-communicate with different tape back-end systems, as well as how a single dCache instance serves the storage of four different countries in northern Europe. Last but not least, the presentation will focus on dissemination and future work. An analysis of the grid storage demands of non-HEP communities, e.g. Astro, Biomed and Climate, has revealed that secure, standard protocols for local POSIX access seem to be crucial to attract those communities. Therefore, the dCache team has already spent significant effort implementing the NFS4.1 protocol in dCache. The main advantage is that NFS4 clients are already available for Linux, Solaris and other OSs. Moreover, NFS4.1 is the first open and standard network file system protocol honoring the fact that the data of a single logical storage instance may be distributed among a large set of data servers, which is the case for dCache.
Grid Data Scheduling with Stork
Speaker: Tevfik Kosar, PhD, Assistant Professor, Louisiana State University
Modern scientific applications and experiments are becoming increasingly data-intensive. Large experiments, such as high-energy physics simulations, genome mapping, and climate modeling, generate data volumes reaching hundreds of terabytes. In order to process these data, scientists are turning towards distributed resources owned by the collaborating parties to provide them the computing power and storage capacity needed to push their research forward. But sharing, disseminating, and processing large data sets over widely distributed resources poses new challenges. The systems managing these resources must provide robust scheduling and allocation of storage resources, as well as reliable and efficient management of data movement. Traditional distributed computing systems closely couple data handling and computation. They consider data resources as second-class entities, and access to data as a side effect of computation. Data placement (i.e. access, retrieval, and/or movement of data) is either embedded in the computation, causing the computation to be delayed, or performed by simple scripts that do not have the privileges of a job. The inadequacy of traditional systems and existing CPU-oriented schedulers in dealing with the complex data handling problem has given rise to a new era: that of data-aware schedulers. One of the first examples of such schedulers is the Stork data placement scheduler that we have developed. Stork implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources. Stork can interact with higher-level planners and workflow managers such as Pegasus and Condor DAGMan. This allows users to schedule both CPU resources and storage resources together. Stork also acts as an I/O control system between the user applications and the underlying protocols and data storage servers.
It provides complete modularity and extensibility, so that users can very easily add support for their favorite storage system, data transport protocol, or middleware. Some of the most recent work we have been doing on Stork concerns the capability to predict the optimum number of parallel streams for wide-area data transfers. With the improvements we have made to existing mathematical prediction models, we were able to predict peak parallelism levels for wide-area GridFTP transfers using little historical or instant information. With this information at hand, Stork can optimize each individual transfer with little information to be gathered and with minimal overhead. In this talk, we will discuss the limitations of traditional CPU-oriented batch schedulers in handling the challenging data management problem of large-scale distributed applications, give our vision for the new paradigm in data-intensive scheduling, and elaborate on our case study: the Stork data placement scheduler, its current status and future plans.
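The abstract does not give the prediction model itself. As an assumption, not a description of Stork's actual implementation, the sketch below uses a model that appears in the wide-area parallel TCP literature: throughput with n streams is modeled as Th(n) = n / sqrt(a*n^2 + b*n + c). Rearranged as n^2 / Th^2 = a*n^2 + b*n + c, the model is linear in a, b and c, so three measured transfers determine it, and setting dTh/dn = 0 gives the peak at n = -2c/b. The function names are hypothetical.

```python
import math


def fit_stream_model(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) from three (streams, throughput) samples.

    n^2 / Th^2 = a*n^2 + b*n + c is linear in (a, b, c), so three samples
    with distinct stream counts determine the coefficients (Cramer's rule).
    """
    rows = [(n * n, n, 1.0) for n, _ in samples]
    rhs = [n * n / (th * th) for n, th in samples]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("need three distinct stream counts")
    coeffs = []
    for col in range(3):
        # Replace column `col` of the system matrix with the right-hand side.
        m = [list(r) for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def optimal_streams(a, b, c):
    """Stream count where Th(n) peaks: dTh/dn = 0 gives n = -2c/b.

    Returns None when the fitted model has no positive interior peak.
    """
    if b >= 0 or c <= 0:
        return None
    return -2.0 * c / b
```

In practice the result would be rounded to a whole number of streams, and more than three samples would call for a least-squares fit rather than an exact solve.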
THURSDAY, MAY 15, 2008
8:30 – 10:00 am
GLOBUSWORLD #6
Innovative Grid Applications
Three speakers describe large-scale grid deployments that have made big contributions to science or (in one case) biomedical research.
-The Earth System Grid: Don Middleton, National Center for Atmospheric Research
-The Southern California Earthquake Center: Phil Maechling, University of Southern California
-The Children's Oncology Grid and MEDICUS: Stephan Erberich, Children's Hospital Los Angeles
SUBMITTED TALKS #2
Statistical Virtualization of Resources for Grid Computing
Speaker: Dan Nurmi, University of California-Santa Barbara
Grid computing users generally rely on application concurrency to achieve performance, which in turn depends on the availability of many distributed computational resources. Modern users can draw from a vast array of distributed resources due to the ever-increasing quality of the software and networks connecting these resources. However, as the pool of resources available to users grows, so does the level of resource heterogeneity and performance response dynamism. In this talk, we will discuss a new technique that uses statistical methodologies to manage resource performance dynamism, and virtualization techniques to abstract away resource heterogeneity. In particular, we will show how we have successfully applied the idea of statistical virtualization to the management of dynamism found in both provisioning delay and availability of Grid resources, and have been able to provide our solution to the Grid community as a set of generally applicable services (QBETS: batch queue job delay prediction service, and VARQ: statistical advance reservation and co-allocation service). Both of these systems are in operation on production HPC resources today and operate entirely as overlays atop existing Grid and local software/administrative policies. Finally, we will outline the next steps that are required before a fully statistically virtualized Grid resource can be realized, and discuss some of the challenges we face in pursuit of this goal.
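The abstract describes QBETS only as a batch queue delay prediction service. One standard nonparametric way to bound queue delay, assumed here rather than taken from the talk, is to report an upper confidence bound on a chosen quantile of observed wait times: the k-th smallest of n observations exceeds the true q-quantile with probability P(Binomial(n, q) <= k - 1), so the bound is the smallest order statistic for which that probability reaches the desired confidence. The function name and defaults are hypothetical.

```python
from math import comb


def quantile_upper_bound(waits, q=0.95, confidence=0.95):
    """Nonparametric upper confidence bound on the q-quantile of queue waits.

    Returns the smallest order statistic x_(k) with
    P(Binomial(n, q) <= k - 1) >= confidence, or None if there are too few
    observations to reach that confidence.
    """
    xs = sorted(waits)
    n = len(xs)
    cumulative = 0.0
    for k in range(1, n + 1):
        j = k - 1
        # Running CDF of Binomial(n, q) evaluated at k - 1.
        cumulative += comb(n, j) * q**j * (1 - q) ** (n - j)
        if cumulative >= confidence:
            return xs[k - 1]
    return None
```

This bound assumes the observed waits are independent draws from one stable distribution; tracking changes in queue behavior over time would need extra machinery not shown here.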
Massive Deployment of Kerrighed Virtual SMP Cluster using Diskless Remote Boot Linux
Speakers: Rock Kuo, Assistant Researcher of Grid Technology Division, National Center for High-performance Computing and Jazz Yao-Tsung Wang, Assistant Researcher of Grid Technology Division, National Center for High-performance Computing
Kerrighed provides a virtual view that merges the CPUs and memory of a cluster into an SMP machine, but building a Single System Image for your cluster requires skill. Diskless Remote Boot Linux (DRBL) is a network booting mechanism that automatically configures an NFS root file system for each diskless client. By integrating Kerrighed with DRBL, you can massively deploy Kerrighed to every cluster node and easily reconfigure the host Linux OS in the future. You can also use DRBL commands to add and remove cluster nodes dynamically.
Virtual Cluster Development Environment
Speakers: Thamarai Selvi Somasundaram, Professor, MIT Campus, Anna University and Kannan Govindarajan, Project Associate, MIT Campus, Anna University
This presentation is about forming virtual clusters on the fly using the Xen hypervisor. The proposed work creates a virtual cluster based on the user's requirements. The Globus Toolkit is also automatically installed and configured on the fly on the newly created head node of the cluster. The virtual information service, virtual cluster formation service and execution services have been written to submit sequential as well as parallel jobs.
TUTORIAL #7 (GRID ENGINE)
Multi-Cluster Accounting with the Accounting and Reporting Console (ARCo)
Speaker: Jana Olivova, Sun Microsystems
This workshop is designed primarily for users who already know SGE ARCo and would like to gain more hands-on experience, but users who wish to gain insight into this application are also encouraged to attend. During the workshop, ARCo multi-cluster support and the benefits of consolidating multiple databases under one database with multiple schemas will be introduced. The participants will migrate an existing ARCo database to a different schema, while also learning some useful database terminology and SQL commands. The ARCo installation, schema configuration in a PostgreSQL database, Derived Values, Deletion Rules, cross-cluster SQL queries, and the ARCo GUI will also be explored.
INTRODUCTION AND OVERVIEW #8: CLUSTER FILE SYSTEMS
PVFS: Past, Present, and Future
Speaker: Walter B. Ligon III, Associate Professor, Clemson University
PVFS has been around for 14 years, starting life as a research prototype and now in use as a production parallel file system. In this talk we will give a brief history of the development, developers, and goals of PVFS. We will provide an overview of the PVFS architecture, features, and current status. Finally, we will discuss future directions in both development and research.
Sun Open Storage and File Systems for Grids
Speakers: Robert Read, Senior Staff Engineer, Sun Microsystems and Brian Wong, Distinguished Engineer, Sun Microsystems
This talk will give an overview of open source file system technology that is applicable to grid computing. We will look at ZFS, pNFS and Lustre, as well as other technologies.
Comparison of Gfarm- and Lustre-based Storage Systems -- From a Geosciences Applications Perspective
Speaker: Yusuke Tanimura, Researcher, National Institute of Advanced Industrial Science and Technology
This presentation will report users' experience with large-scale data storage systems built with open-source, scalable parallel filesystem software. In particular, Gfarm and Lustre are selected and compared with regard to performance and operational expenses. Target applications on the systems are from the geosciences, which require storing hundreds of terabytes to several petabytes of satellite observation data and performing data conversion/analysis in parallel. Performance measurements using one of the data conversion programs on both storage systems will be presented. In the geosciences, an IT infrastructure for storing all past and future satellite observation data, and for allowing geoscientists to access them quickly and easily, has a great impact on various research fields such as disaster prevention and environment monitoring. Since 2005, AIST has operated a Gfarm-based distributed storage system for ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) data, consisting of 24 commodity PC nodes connected by Gigabit Ethernet. Gfarm is open-source software for building a parallel filesystem from the local disks of PCs, and has the unique feature that each filesystem node also acts as a client of the Gfarm filesystem, so that distributed access by the filesystem nodes yields super-scalable I/O performance. Currently, the system stores more than 150 TB of images. The system's overview and the requirements of geosciences applications will be introduced at the beginning of the presentation. For sensors with higher resolution than ASTER, which will appear in the near future, AIST is planning a petabyte-scale storage system. One approach is to extend the current storage system; however, other approaches using different storage middleware are worth considering, and Lustre was chosen as a candidate.
Compared with Gfarm, Lustre has an architecture more typical of other parallel filesystems and relies on a high-speed network between client nodes and storage nodes in order to achieve high performance. The evaluation was performed with 30 Sun Fire X4500 nodes. The Gfarm-based storage system used 20 of these nodes and provides 256.5 TB of capacity. The Lustre-based storage system used the other 10 nodes and provides 135 TB of capacity. In the latter case, 16 Sun Fire X4600 nodes were additionally used for data processing, and both storage and compute servers were connected by InfiniBand. In the basic performance test, Gfarm fit well with the dynamic striping of Solaris ZFS, and aggregate I/O throughput increased almost linearly as the number of filesystem nodes increased. On the other hand, Lustre achieved much more scalable metadata access than Gfarm. In the experiment using the ASTER data conversion program, both storage systems achieved similar performance when 80% of the disk capacity was filled. Consequently, system reliability and ease of maintenance are important factors in choosing a storage system. Functional redundancy and self-management to reduce daily administrative work are required in a petabyte-scale distributed storage system.
10:00 – 10:30 am
BREAK
10:30 – 12:00 pm
SUBMITTED TALKS #3
Building Shared High Performance Computing Infrastructure for the Biomedical Sciences
Speaker: Marcos Athanasoulis, PhD, Chair of the Biomedical High Performance Computing Leadership Summit and Director of Client Services and Research Information Technology, Harvard Medical School
In recent years high performance computing has moved from the sidelines to the mainstream of biomedical research. Increasingly, researchers are employing computational methods to facilitate their wet lab research. Some emerging laboratories and approaches are based on a 100% computational framework. While there are many lessons to be learned from the computational infrastructure put into place for the physical and mechanical sciences, the character, nature and demands of biomedical computing differ from the needs of the other sciences. Biomedical computational problems, for example, tend to be less computationally intensive but more “bursty” in their needs. This creates both an opportunity (it is easier to meet capacity needs) and a challenge (job scheduling rules are more complicated to accommodate the bursts). Harvard Medical School provides one of the most advanced shared high performance research computing centers at an academic medical center. In 2007, Harvard convened the first Biomedical High Performance Computing Leadership Summit to explore the issues in creating shared computing infrastructure for the biomedical sciences. We brought together over 100 leaders in the field to exchange ideas and approaches. Through special sessions and direct participant surveys, a number of themes emerged around best practices in deploying shared computational infrastructure for the biomedical sciences. Based on prior experience and the summit findings, this workshop summarizes approaches and ideas for providing a technical and process blueprint for organizations wishing to provide shared research computing resources for groups small or large, from a few hundred CPUs and terabytes of data to thousands of CPUs and a petabyte or more of data. The session also covers the opportunities and obstacles for connecting special purpose biomedical clusters with open grid technologies like Globus.
Building the Next Generation of Grid Middleware
Speaker: Arnie Miles, Grid Middleware Architect, Georgetown University
Project Thebes is a community-based consortium project targeted at designing scalable and secure grid middleware. The current Grid Security Infrastructure (GSI) library is based on public key infrastructure (PKI) technology, using X.509 certificates for both application identity verification and user (subject) identification. The need for an organization's users to adopt PKI in order to access grid services has become a barrier to entry, preventing groups such as small institutions and commercial organizations from participating. Efforts to aggregate users have added extra layers of complexity while obscuring important attribute information. These issues can be addressed by designing and building a next-generation implementation of this infrastructure with an abstract subject interface at its core, which acts as an interface between one of several authentication mechanisms and the various services. Existing GSI services can be ported to the new subject interface, and additional services can be developed using convergent technologies, resulting in lower costs.
Managing and Executing Loosely Coupled Large Scale Applications on Clusters, Grids, & Supercomputers
Speaker: Ioan Raicu, University of Chicago
With the advances in e-Science and the growing complexity of scientific analyses, more and more scientists and researchers are relying on workflow systems for process coordination, derivation automation, provenance tracking, and bookkeeping. Workflows are not a new concept and have been around for decades; in the scientific community, several systems for scientific programming and computation have emerged.
We present Swift, a system that bridges scientific workflows with parallel computing. It is a parallel programming tool for rapid and reliable specification, execution, and management of large-scale science workflows. Swift takes a structured approach to workflow specification, scheduling and execution. It consists of a simple scripting language, SwiftScript, for concise specifications of complex parallel computations based on dataset typing and iterations, and dynamic dataset mappings for accessing large-scale datasets represented in diverse data formats. The runtime system relies on the CoG Karajan workflow engine for efficient scheduling and load balancing, and it integrates the Falkon lightweight task execution service for optimized task throughput and resource efficiency when executing many independent jobs on large compute clusters. Falkon combines three techniques to achieve this goal: (1) multi-level scheduling, which enables separate treatment of resource provisioning and the dispatch of user tasks to those resources; (2) a streamlined task dispatcher able to achieve order-of-magnitude higher task dispatch rates than conventional schedulers; and (3) data caching and a data-aware scheduler that leverage co-located computational and storage resources to minimize the use of shared storage infrastructure.
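To make the third technique concrete, here is a minimal Python sketch of data-aware dispatch. It assumes each task reads one input file, each executor caches every file it has fetched, and ties are broken by current load; the function name and data layout are hypothetical and do not reflect Falkon's actual interfaces.

```python
def data_aware_dispatch(tasks, executors):
    """Assign tasks to executors, preferring executors that already cache the input.

    tasks: list of (task_id, input_file); executors: list of executor ids.
    Returns (plan, shared_reads), where plan is a list of (task_id, executor)
    and shared_reads counts inputs that had to come from shared storage.
    """
    cache = {e: set() for e in executors}  # files each executor holds locally
    load = {e: 0 for e in executors}       # tasks assigned so far
    plan, shared_reads = [], 0
    for task_id, input_file in tasks:
        holders = [e for e in executors if input_file in cache[e]]
        if holders:
            # Cache hit: run where the data already is, least-loaded first.
            target = min(holders, key=lambda e: load[e])
        else:
            # Cache miss: pick the least-loaded executor and fetch the input once.
            target = min(executors, key=lambda e: load[e])
            shared_reads += 1
            cache[target].add(input_file)
        load[target] += 1
        plan.append((task_id, target))
    return plan, shared_reads
```

The sketch ignores cache eviction and lets cache hits override load balance entirely; a production dispatcher would need to weigh data locality against queue length.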
We believe the synergy between Swift and Falkon can address issues such as scalability, reliability, scheduling and monitoring, data management, collaboration, workflow provenance, and workflow evolution. The science community is demanding both specialized, domain-specific languages to improve productivity and efficiency in writing concurrent programs and coordination tools, and generic platforms and infrastructures for the execution and management of large-scale scientific applications, where scalability and performance are major concerns. High performance computing support has become an indispensable piece of such workflow languages and systems, as there is no other viable way to get around the large storage and computing problems emerging in every discipline of 21st century e-science.
Both Swift and Falkon have been Globus Incubator
projects since 2007, and have already attracted many users. Swift and
Falkon have been used in a variety of environments, from clusters (e.g.,
TeraPort), to multi-site Grids (e.g., Open Science Grid and
TeraGrid), to specialized large machines (SiCortex), to
supercomputers (e.g., IBM BlueGene/P). Large-scale applications
from many domains (e.g., astronomy, medicine,
chemistry, pharmaceuticals, and economics) have
been run at scales of tens of thousands of jobs on thousands of
processors, with an order of magnitude larger scale on the horizon.
Swift and Falkon are being actively developed, primarily at the University
of Chicago in the Computer Science Department and the Computation
Institute, with funding from NSF, DOE, and NASA.
SUBMITTED TALK #4
Issues in Logging and Troubleshooting Grid Applications and Middleware
Speakers: Brian Tierney, Staff Scientist, Lawrence Berkeley National Laboratory; Dan Gunter, Scientist, Lawrence Berkeley National Laboratory; Shreyas Cholia, Computer Systems Engineer, National Energy Research Scientific Computing Center
Troubleshooting Grid middleware is quite difficult due to the large number of interconnected components. For example, a single action, such as reliably transferring a directory of files, can require the coordination of a wide suite of loosely coupled software tools. These include security software to handle the certificates, check permissions, perform delegation, and possibly encrypt the message streams; file transfer tools to check the disk space, set up the connections, and transfer data between resources; and reliability software that must understand retry policies, track transfer status, and react to failures. Each of these systems may suffer from various forms of failure, which may or may not be reported. When failures are reported, it is typically via a log file, and logging styles vary from component to component. Combining log information from several components in order to understand what caused a given failure can be challenging. However, this is exactly what is needed to troubleshoot a problem as it cascades from one component into the next. In this proposed session we explore some of the issues related to Grid troubleshooting and describe tools and techniques we have been developing to address this problem.
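The sketch below is a rough illustration of the log-combining problem. It assumes that each component writes `key=value` lines carrying a timestamp `ts` and a shared request identifier `req`. These field names and this format are hypothetical and do not come from any tool described in the talk. The code merges the per-component logs into one time-ordered view of a single request.

```python
import heapq
from typing import Iterable, Iterator


def parse(line: str) -> dict[str, str]:
    """Parse a 'key=value key=value' log line into a dict (assumed format)."""
    return dict(item.split("=", 1) for item in line.split())


def events_for(component: str, lines: Iterable[str], req: str) -> Iterator[dict[str, str]]:
    """Yield this component's records for one request, tagged with the component."""
    for line in lines:
        record = parse(line)
        if record.get("req") == req:
            record["component"] = component
            yield record


def correlate(logs: dict[str, Iterable[str]], req: str) -> list[dict[str, str]]:
    """Merge per-component logs (each already sorted by ts) into one timeline."""
    streams = [events_for(c, lines, req) for c, lines in logs.items()]
    # heapq.merge performs a lazy k-way merge of already-sorted inputs.
    return list(heapq.merge(*streams, key=lambda r: float(r["ts"])))


logs = {
    "gridftp": ["ts=3.0 req=42 event=transfer.start",
                "ts=5.5 req=42 event=transfer.error msg=disk_full"],
    "rft": ["ts=1.0 req=42 event=request.received",
            "ts=2.0 req=7 event=request.received",
            "ts=6.0 req=42 event=retry"],
    "gsi": ["ts=2.5 req=42 event=delegation.ok"],
}
timeline = correlate(logs, "42")
for e in timeline:
    print(e["ts"], e["component"], e["event"])
```

The approach depends on every component recording a common identifier and reasonably synchronized clocks. Without a shared identifier, correlating events becomes guesswork.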
TUTORIAL #8 (GLOBUS)
Globus Administration Overview
Speaker: Charles Bacon, Software Engineer, Argonne National Laboratory Computation Institute of the University of Chicago
This session aims to give hands-on people an overview of setting up and using the major services available in the Globus Toolkit. After the session, attendees should be able to set up and use the toolkit at home, impressing friends and colleagues. The focus is on the practical, with live demonstrations of installing, configuring, and running services and clients.
GRILL THE GURUS LAB
This session gives attendees the opportunity to talk informally with
gurus from various open source communities. It is the perfect
opportunity to have your hard questions answered, discuss future
directions and features, and learn more about the communities. The
session will comprise a number of stations, with each station focused
on a particular topic and staffed by gurus on that topic. Topics
include:
- Globus Execution Management (Stuart Martin, Kate Keahey, Tim
Freeman): Come by and ask questions about GRAM2, GRAM4, auditing, GARS
- Advance Reservation, Nimbus Cloud Computing, Virtual Machine
technology
- Globus Data Management (Raj Kettimuthu, Ann Chervenak, Rob Schuler,
John Bresnahan): Come by and ask questions about GridFTP, RFT, RLS,
DRS, XIO
- Globus Java WS Core and Security (Rachana Ananthakrishnan)
- Grid Information Management (Laura Pearlman)
- Joining the Globus Community - Dev.Globus (Stuart Martin): Come find
out how you can participate in the development of Globus software:
start a new incubator project, contribute to existing projects,
monitor the roadmap and progress of specific projects, etc.
- Service Oriented Science (Ravi Madduri, Joshua Boverhof): Walk
through creation of a secure grid service using Introduce and wrapping
a legacy application as a grid service using gRAVI in less than 10 minutes
- Sun Grid Engine 6.2
- Sun Service Domain Manager
- HA Grid Engine Clusters
- Rocks Consulting (Philip Papadopoulos, Anoop Rajendra, Mason Katz)
- UniCluster Express
INTRODUCTION AND OVERVIEW #9: MISCELLANEOUS
Berkeley Laboratory Checkpoint Restart
Speaker: Eric Roman, Lawrence Berkeley National Laboratory
In this talk we describe Berkeley Laboratory Checkpoint Restart (BLCR) for Linux. BLCR is a kernel module that saves (checkpoints) process state to a context file and restarts the process from the information in the context file. In the first part of this talk, we describe how BLCR implements checkpoint and restart for simple processes, multithreaded processes, and shell scripts with multiple processes. In the second part, we describe how BLCR interacts with the Open MPI library and the Torque Resource Manager to support checkpoint/restart of parallel jobs in a batch execution environment.
GridGain - Java Grid Computing Made Simple
Speaker: Nikita Ivanov, President, GridGain Systems
This presentation covers GridGain, the fastest growing open source Java grid computing framework, and how its focus on elegant simplicity and Enterprise Java integration is helping to revolutionize grid computing for Java in the same way that Spring and JBoss changed the Enterprise Java landscape.
Optimised Scientific Dataset Transfers in the LHC Grids – dCache and Fast Data Transfer
Speakers: Julian Bunn, Caltech and Kamran Soomro, Caltech
The high energy physics community will be meeting an unprecedented computing challenge in 2008: how to manage global terabyte- to petabyte-scale scientific data transfers in the worldwide Grids set up for CERN’s LHC (Large Hadron Collider) physics experiments. The experiments have adopted a globally distributed, hierarchical, tiered model for managing the data. Grid software is used to provide the LHC applications with access to the global resources. In this hierarchy, the Tier-1 centers and the majority of Tier-2 centers in the U.S.A. have deployed the dCache storage system for managing storage-related operations. dCache is a storage management system that provides access to huge amounts of data stored on heterogeneous nodes or pools via a virtual filesystem interface. It provides features such as space management, replication, and failure detection/recovery, as well as the ability to use a tertiary storage solution alongside it. The I/O performance of the existing network transfer tools that interact with dCache clusters suffers due to a combination of factors, including a single-file, single-TCP-connection design. FDT (http://monalisa.cern.ch/FDT) is an open source, high performance data transfer application based on a multithreaded engine and asynchronous communication channels. It continuously streams a set of files, using independent threads for each physical device, to efficiently transport data across wide area networks using one or several TCP channels. It uses a managed pool of buffers to dynamically optimize the throughput based on the disk I/O capabilities and the available bandwidth, which is monitored in real time. Tests at the Supercomputing 2006 and 2007 conferences have shown that FDT is capable of stable bidirectional throughput matching the read/write speed of the disks over trans- and intercontinental distances.
In this paper, we describe the performance gains we have achieved by incorporating FDT into dCache, illustrated with real-world demonstrations that set new storage-to-storage file transfer records in the LHC Grids.
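The managed buffer pool idea can be sketched in a few lines of Python. This is an illustrative sketch, not FDT's implementation. A fixed number of reusable buffers, set by the hypothetical constants `POOL_SIZE` and `BUFFER_SIZE`, bounds the memory in flight. One reader thread fills buffers from the source, and the writer returns each buffer to the pool after draining it. A local in-memory copy stands in for the network, and FDT's per-device threads and multiple TCP channels are not modeled.

```python
import io
import queue
import threading

BUFFER_SIZE = 4  # bytes per buffer; deliberately tiny for the example
POOL_SIZE = 3    # number of reusable buffers, which bounds memory in flight


def stream_copy(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy src to dst through a fixed pool of recycled buffers; return bytes copied."""
    free: queue.Queue = queue.Queue()
    filled: queue.Queue = queue.Queue()
    for _ in range(POOL_SIZE):
        free.put(bytearray(BUFFER_SIZE))

    def reader() -> None:
        while True:
            buf = free.get()          # blocks until the writer returns a buffer
            n = src.readinto(buf)
            filled.put((buf, n))      # n == 0 signals end of stream
            if not n:
                return

    t = threading.Thread(target=reader)
    t.start()
    total = 0
    while True:
        buf, n = filled.get()
        if n == 0:
            break
        dst.write(buf[:n])
        total += n
        free.put(buf)                 # recycle the buffer instead of allocating
    t.join()
    return total


data = b"terabytes in miniature"
dst = io.BytesIO()
copied = stream_copy(io.BytesIO(data), dst)
print(copied, dst.getvalue() == data)
```

Because the reader can only run `POOL_SIZE` buffers ahead of the writer, a slow destination automatically throttles the source without any unbounded queueing.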
SPONSOR TALK
Univa UD HPC Product Roadmap
Speaker: Bill Bryce, Director Product Management-HPC, Univa UD
Over the past five years, High Performance Computing environments have quickly transitioned from expensive dedicated hardware to clusters built from economical commodity servers. The result of this dramatic change is a rise in the complexity of cluster and grid computing environments. Univa UD’s Grid MP and UniCluster Express products address these challenges and simplify the complex nature of cluster and grid computing environments. The roadmaps for both products further advance the Univa UD vision of simplifying HPC, from small-scale workgroup clusters to enterprise grids.
12:00 – 1:30 pm
LUNCH (PROVIDED)
1:30 – 3:00 pm
SUBMITTED TALKS #5
Monitoring User-Level Grid Functionality and Performance using Inca
Speaker: Shava Smallen, Developer, San Diego Supercomputer Center
The primary goal in the creation of Grids is to provide unified and coherent access to distributed computing, data storage and analysis, instruments, and other resources to advance scientific exploration. Grids combine multiple complex and interdependent systems that may span several administrative domains. This complexity poses challenges both for the administrators who build and maintain Grid resources and for the scientists who use them. For example, a scientist may find it difficult to migrate an application from one resource of a Grid to another due to inconsistencies or misconfiguration in the installed software, differences in user environment, or authorization/permission problems. Grid monitoring is used to collect information on the state and health of software. It can help users decide where to execute their applications and assist Grid managers in ensuring that resource providers support a stable and consistent environment for users. While other Grid monitoring tools provide system-level information on the utilization of Grid resources, the Inca system provides user-level Grid monitoring via periodic, automated testing of the software and services required to support Grid operation. Inca is highly configurable and flexible, providing easy-to-use mechanisms to manage and collect a large number and variety of results across Grid resources. It archives results and provides Web page views that include summary status charts, detailed descriptions of specific test results, and graphs that depict changes in test results over time. These views allow Grid operators and system administrators to identify, analyze, and troubleshoot user-level Grid failures, thereby improving Grid stability. This talk will describe the motivation for Inca, its architecture, and the features of the current Inca release. It will also describe our future work plans and the use of Inca on systems such as TeraGrid and GLEON.
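The general pattern of automated user-level tests with an archive of results can be sketched as follows. The names `Result`, `run_suite`, and `summary` are hypothetical and do not reflect Inca's actual reporter interface. A scheduler (not shown) would call `run_suite` periodically. `summary` reduces the archive to a per-resource status, similar to what a summary status chart might display.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    resource: str
    test: str
    passed: bool
    detail: str
    timestamp: float


def run_suite(resource: str, tests: dict[str, Callable[[], None]],
              archive: list[Result]) -> None:
    """Run each user-level test once on a resource and archive the outcome."""
    for name, check in tests.items():
        try:
            check()
            archive.append(Result(resource, name, True, "ok", time.time()))
        except Exception as exc:  # a failing test is recorded, not raised
            archive.append(Result(resource, name, False, str(exc), time.time()))


def summary(archive: list[Result]) -> dict[str, str]:
    """Per resource: 'pass' only if the latest run of every test passed."""
    latest: dict[tuple[str, str], Result] = {}
    for r in archive:  # later entries overwrite earlier ones
        latest[(r.resource, r.test)] = r
    status: dict[str, str] = {}
    for (resource, _), r in latest.items():
        if not r.passed:
            status[resource] = "fail"
        else:
            status.setdefault(resource, "pass")
    return status


def compiler_present() -> None:
    pass  # a real check might compile and run a tiny program


def gridftp_reachable() -> None:
    raise RuntimeError("connection refused")


archive: list[Result] = []
run_suite("siteA", {"compiler": compiler_present}, archive)
run_suite("siteB", {"compiler": compiler_present, "gridftp": gridftp_reachable}, archive)
print(summary(archive))
```

Keeping every result in the archive, rather than only the latest, is what makes views of how test results change over time possible.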
Bridging the High Performance Computing Gap: the OurGrid Experience
Speakers: Walfredo Cirne, PhD, Faculty, and Francisco Brasileiro, PhD, Professor, Universidade Federal de Campina Grande (UFCG), Brazil
eScience is rapidly changing the way we do research. As a result, many research labs now need non-trivial computational power. Grids based on the notion of Virtual Organizations (VOs) and voluntary computing are well-established solutions to this problem. However, not all labs can effectively benefit from these technologies. In particular, small and medium research labs (which are the majority of labs in the world) have a hard time using these technologies, as they demand highly qualified infrastructure support personnel and, sometimes, high-visibility projects. This talk is about OurGrid, a system designed to fill this gap. OurGrid (see http://www.ourgrid.org/) is middleware that supports the creation of open, free-to-join, cooperative grids in which labs donate their idle computational resources in exchange for access to other labs’ idle resources when needed. The vision is that OurGrid enables labs to combine their resources into a massive worldwide computing platform. Unlike voluntary computing platforms, an OurGrid system is also open from the resource consumer’s perspective: any participant is free to use the resources to execute its own applications. Among its many features, OurGrid includes: i) an incentive mechanism that makes it in the best interest of participants to collaborate with the system; ii) scheduling algorithms that perform well even in the absence of complete and accurate information about the application and the infrastructure; iii) a security portfolio that caters to the varying requirements of both resource providers and consumers; and iv) a distributed peer-to-peer Grid Information Service that performs multi-attribute range queries efficiently in a large system.
OurGrid has been used to support the OurGrid Community, a public free-to-join grid that has been in production since December 2004 (see http://status.ourgrid.org/), and ShareGrid, a collaborative project coordinated by TOPIX in the framework of the Innovation Development Program and funded by Regione Piemonte in Italy (see http://ramses.di.unipmn.it:8080/status/WebStatusServlet). Although it targets open grids, OurGrid can also be used in private grids as a way to better accommodate newcomers and less resourceful sites. In this case, one can leverage OurGrid’s native sharing incentive mechanism to provide seed resources with processing power shared in a secure and fair way. These resources can be used, for example, by new users to test their applications before joining a VO, or by new VOs that are not yet supported by the e-Infrastructure in grids that use traditional grid middleware such as Globus and gLite. The talk will discuss how the OurGrid middleware has been used to support the execution of a number of applications. It will also present the main benefits brought by the system and the difficulties faced by the system developers, users, and managers of OurGrid installations.
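The abstract does not detail the incentive mechanism. OurGrid publications describe a favor-accounting scheme called the Network of Favors, and the sketch below is a simplified, hypothetical model in that spirit, not OurGrid's code. Each peer keeps a local, non-negative balance for every other peer: favors received from that peer minus favors given to it. The peer then offers idle resources first to the requesters with the highest balance. The class `FavorLedger` and its methods are illustrative names only.

```python
from collections import defaultdict


class FavorLedger:
    """One peer's local record of favors exchanged with other peers."""

    def __init__(self) -> None:
        # balance[p] = favors received from p minus favors given to p, floored at 0
        self.balance: defaultdict[str, float] = defaultdict(float)

    def received_from(self, peer: str, amount: float) -> None:
        self.balance[peer] += amount

    def gave_to(self, peer: str, amount: float) -> None:
        # Flooring at zero makes a newcomer look the same as a free rider,
        # so abandoning an identity and rejoining gains nothing.
        self.balance[peer] = max(0.0, self.balance[peer] - amount)

    def allocate(self, requesters: list[str], free_slots: int) -> list[str]:
        """Offer idle slots to the requesters we owe the most; ties broken by name."""
        ranked = sorted(requesters, key=lambda p: (-self.balance[p], p))
        return ranked[:free_slots]


ledger = FavorLedger()
ledger.received_from("labA", 10)
ledger.received_from("labB", 3)
ledger.gave_to("labB", 5)  # balance floors at 0 rather than going negative
print(ledger.allocate(["labC", "labB", "labA"], 2))
```

Because each ledger is purely local, no peer has to trust reports from others. This is one reason such a scheme suits an open, free-to-join grid.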
Orchestrating Production Computer Algebra Components into Portable Computational Grids Programs
Speaker: Abdallah Al Zain, PhD, Research Associate, Heriot-Watt University
In this talk we demonstrate that it is possible to obtain good, scalable parallel performance by coordinating multiple instances of unaltered sequential computational algebra systems to deliver a single parallel system. The paper presents the first substantial parallel performance results for SymGrid-Par, a system that orchestrates computational algebra components into a high-performance parallel application. We show that SymGrid-Par is capable of exploiting different parallel/multicore architectures without any change to the computational algebra component. Ultimately, our intention is to extend our system so that it is capable of orchestrating heterogeneous computations across a high-performance computational Grid. For now, we illustrate our approach with two unmodified production computational algebra systems, GAP and Maple, running on two common commodity architectures: a homogeneous cluster and an eight-core system. Computational algebra applications are large, specialized, and symbolic, rather than the more commonly studied numerical applications. They also exhibit a high degree of irregularity, at multiple levels. We demonstrate that good parallel speedup is possible relative to sequential GAP and Maple systems running on a single processor/core.
TUTORIAL #9: (ROCKS)
Introduction to Building Your Own Roll
Speaker: Philip Papadopoulos, PhD, Program Director, San Diego Supercomputer Center at UC-San Diego
Rolls are the way to customize Rocks. This session defines how Rolls are implemented and presents the available levels of customization. A detailed example of building a straightforward Roll will be worked out during the session.
TUTORIAL #11: (GlobusWorld)
Porting Applications with Globus GridWay
Speaker: Javier Fontán, Grid Technology Engineer, Universidad Complutense de Madrid
GridWay is a widely used metascheduling technology that performs job execution management and resource brokering, allowing unattended, reliable, and efficient execution of jobs, job arrays, and workflows on heterogeneous and dynamic Globus Grids. The GridWay metascheduler is a Globus product, released under the Apache License v2.0, that implements several OGF standards, such as DRMAA and JSDL. The aim of the tutorial is to provide a global overview of the process of installing, configuring, and using GridWay. The tutorial also focuses on the development of codes using the C and Java bindings of the DRMAA OGF standard. Developing codes with DRMAA ensures that applications are compatible with other management systems that implement the standard, such as SGE, Condor, and Torque. During the tutorial, participants will receive a practical overview of the following topics and will have the opportunity to exercise GridWay functionality with examples on a real grid infrastructure:
1. An Introduction to the GridWay Metascheduler
2. Installation and Basic Configuration
3. Submission, Monitoring and Control of Jobs
4. Programming with the DRMAA OGF standard
User-level knowledge and skills in Unix or Linux systems and C or Java programming are recommended.
INTRODUCTION AND OVERVIEW #10: DATA GRIDS
Project Convergence: Integrating Data Grids with Compute Grids
Speakers: Victoria Livschitz, Founder and CEO, Grid Dynamics and Eugene Steinberg, PhD, CTO and Director of Engineering, Grid Dynamics
Project Convergence is an open source project aimed at providing interoperability between computational grids and data grids, enabling data-aware job scheduling. Computational grid systems are used to run parallel applications over a grid of physical or virtual machines. A common challenge for modern parallel applications running on any computational grid is fast access to shared data stored in a database or a file system. Data Grids are commonly used to deliver data into the memory of the server grid, creating a large, distributed, globally coherent application memory. The biggest performance gain often comes from the ability to run a job on the same server that already has the correct data pre-loaded in its memory. This is called “data-aware scheduling”, and it requires the job scheduler managed by the Compute Grid to understand the data partitioning topology managed by the Data Grid. Since Compute Grids and Data Grids are two different middleware products, interoperability between them is a common problem. Project Convergence solves the problem by providing a new service external to both the Compute Grid and the Data Grid. This service, called the Monitor, uses the public API of the Data Grid to learn about the data topology. The Monitor then uses the public APIs of the Compute Grid to “instruct” the scheduler which nodes to schedule the jobs on. Such a decoupled architecture allows Convergence to support any Compute Grid and Data Grid middleware. Convergence 1.0 supports plug-ins for DataSynapse’s GridServer as a Compute Grid and GigaSpaces XAP as a Data Grid. On the roadmap is support for other grid systems, including Sun Grid Engine, MS Windows Compute Cluster Server, Platform LSF, GridGain, Condor, Oracle Coherence, GemStone, and others. The talk will introduce the concepts behind Convergence, present the latest features of the project, and use a live demo to show the impact of data-aware scheduling on the performance of typical computational jobs.
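The Monitor's role can be sketched as a small adapter between two APIs. In this Python sketch all classes are hypothetical stand-ins, not the Convergence, GridServer, or GigaSpaces APIs. `DataGridTopology` assumes simple hash partitioning with a known owner node per partition. `Monitor` turns that topology into per-job node hints that a compute scheduler could consume.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    data_key: str  # key of the record the job will read


class DataGridTopology:
    """Stand-in for a data grid's public API: which node owns which partition."""

    def __init__(self, partition_owners: list[str]) -> None:
        self.partition_owners = partition_owners

    def partition_of(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted str hash.
        return zlib.crc32(key.encode()) % len(self.partition_owners)

    def owner_of(self, key: str) -> str:
        return self.partition_owners[self.partition_of(key)]


class Monitor:
    """Reads the data topology and turns it into node hints for the scheduler."""

    def __init__(self, topology: DataGridTopology) -> None:
        self.topology = topology

    def placement_hints(self, jobs: list[Job]) -> dict[str, str]:
        return {job.job_id: self.topology.owner_of(job.data_key) for job in jobs}


topology = DataGridTopology(["node-1", "node-2", "node-3"])
jobs = [Job("j1", "trade:42"), Job("j2", "trade:42"), Job("j3", "trade:7")]
hints = Monitor(topology).placement_hints(jobs)
print(hints)
```

Jobs that read the same key always receive the same node hint, so they can run where that data already sits in memory. Keeping the Monitor separate from both sides mirrors the decoupling the abstract describes.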
Trends in Open Source Data Grids
Speakers: Victoria Livschitz, Founder and CEO, Grid Dynamics; Nikita Ivanov, President, GridGain Systems; Nati Shalom, Founder and CTO, Gigaspaces; Daniel Templeton, Strategic Liaison Manager, Sun Grid Engine, Sun Microsystems; Doug Cutting, Creator of Hadoop
This session brings together the leading open source grid vendors to discuss trends in stateful, scalable service grids and debate the merits of various distributed caching architectures. Each panelist will give a short 10-minute presentation introducing their respective products, followed by a moderated Q&A discussion.
3:30 – 5:00 pm
SUBMITTED TALKS #7
Implementing a Philosophy for Transitioning Existing Software to Network Processing
Speaker: Peter Raeth, PhD, Research Engineer, Ball Aerospace & Technologies Corp.
Over the years that Ball Aerospace & Technologies Corp. has been satisfying the ever-expanding computational throughput needs of our customers, we have identified two key customer themes: 1) production code modules are certified and therefore cannot be modified, and 2) hardware is expensive and difficult to purchase, so it is essential to take advantage of existing infrastructure. Working from that perspective, we have developed and implemented a computational throughput enhancement philosophy that allows us to move production software onto existing networks for multi-processing, cluster processing, and distributed processing without requiring extensive re-engineering of the software. Our transition philosophy has been implemented for codes written in C++, IDL, and MATLAB for network processing applications running on Linux and Windows networks. MPI handles all message passing for cluster processing, Condor forms the basis for distributed processing, and pthreads is our choice for multi-processing. A modular architecture allows us to keep the software generic, making it independent of the operating system and network configuration. The modular approach also makes subsequent software transitions relatively easy and reduces cost through the use of open-source products. Opticks, an open-source sensor data processing tool developed by Ball Aerospace & Technologies Corp., is the user interface to some of our applications, and plug-ins for Opticks can be written to take advantage of networked resources. As our customers experience increasing data volumes, deeper data analysis algorithms, expanding model resolution, and finer simulation sampling, we have successfully employed our transition philosophy to bring them the throughput they require, while laying the groundwork for infrastructure expansion as needed. This talk will outline our approach to providing an open-source solution to our customers' throughput needs.
Approaching the Challenge of Grid-enabling Applications
Speakers: Patrick Choi, Senior Software Engineer, and Kieran Nolan, Principal Software Engineer
Software packages that demand resources beyond those available on a single computer system prompt consideration of a grid infrastructure to provide the necessary horsepower. The challenge often lies in the fact that such software has been developed to run within a particular environment, not within a grid infrastructure. This session focuses on the options for adapting these software solutions to the grid application interface. It will start with an introduction to the problem, including an evaluation of the benefits of such work and the associated challenges. Some criteria for choosing which applications are candidates for grid-enabling will also be presented, along with a high-level overview of how to approach grid-enabling applications.
A specific example from the scientific computing arena, based on complex numerical algorithms, will be presented: simulating dynamic implicit surfaces using the level-set method. The challenges associated with calculating the signed-distance functions and solving the various partial differential equations will be considered. This example will be used to demonstrate how this type of algorithm meets the criteria for grid-enabling, along with one possible architectural approach to grid-enabling it. After the introduction and the level-set example have been presented, questions and open discussion on the concept of grid-enabling applications will be welcomed.
A Multilayer Approach to Simulate Large Multiscale Computational Mechanics Problems Using Grids
Speaker: Leopold Grinberg, Brown University
We present a new scalable approach for simulating large multiscale computational mechanics problems on a network of distributed computers, or grid. Specifically, we consider 3D simulation of blood flow in the human arterial tree using the spectral/hp element method. We employ a multilayer hierarchical approach whereby the problem is solved on two layers. On the inner layer, large tightly coupled problems are solved simultaneously on different supercomputers, while on the outer layer the loosely coupled problem is solved across distributed supercomputers, which involves considerable inter-machine communication. The heterogeneous communication topology (i.e., both intra- and inter-machine communication) was handled initially by MPICH-G2 and later by the recently developed MPIg libraries. MPIg's multithreaded architecture gives applications an opportunity to overlap computation and inter-site communication on multicore systems. Cross-site computations performed on TeraGrid's clusters demonstrate the benefit of MPIg over MPICH-G2.
TUTORIAL #10 (GLOBUS WORLD)
Service Oriented Science Tutorial
Speakers: Ravi Madduri, Senior Software Developer, University of Chicago and Joshua Boverhof, Software Engineer, Lawrence Berkeley National Laboratory
The need to make application code accessible as a Web Service arises frequently in scientific applications. Depending on context, this apparently simple task can introduce a wide range of requirements, including interface generation, authorization of requests, generation of code to dispatch calls to application code, monitoring and management of tasks, data management, and dynamic mapping of application tasks to processors in response to changing workloads. The Globus Alliance aims to provide solutions to the most persistent and vexing problems that come up in Grid projects and applications. Our solutions to date are collected in the Globus Toolkit, and these solutions are used in many Grid applications and systems. While the Globus Toolkit makes it easier to conduct Grid-based projects, the challenges are still far from easy, and the Globus Toolkit does not provide a “turnkey” solution. Success in a Grid project depends on a clear vision of the problem(s) to be solved, awareness of relevant tools (both within and beyond the Globus Toolkit), and a strategy for applying the technology. This tutorial provides answers to critical questions for Grid project planners and product developers, including:
• How can you wrap your application code into a grid service easily?
• What do you need besides the Globus Toolkit to have a useful solution to your problem?
INTRODUCTION AND OVERVIEW #11: OPEN SOURCE COMMUNITIES
On Balancing Open-Source Cluster Software Development between Industry and Research
Speaker: Leonard Wisniewski, PhD, Engineering Manager, Sun Microsystems / Software Developer Tools and Services
Two years ago, Sun Microsystems joined the Open MPI open-source effort, choosing the benefits of contributing to the community and leveraging innovation that happens elsewhere over endlessly re-inventing features to keep up with the state-of-the-art. Contrary to open-source efforts spawned from a single innovator or shepherd organization, Open MPI resulted initially from the merging of four separate efforts to implement MPI, and subsequently the melding of experience from a couple dozen other member and partner organizations from industry, academia, and research labs. This combination of perspective and organizational goals has resulted in a surprisingly diplomatic and mutually beneficial relationship which bridges the gap between practical product considerations and fresh innovative possibilities. One significant attribute of our involvement is a healthy tension between being a good team collaborator and engaging separately in our own activities of corporate self-interest. As a good community citizen, Sun engaged in activities to add a product perspective from our experience with our previously proprietary Sun HPC ClusterTools(TM) product. This included actively investing in quality evaluation and development of test infrastructure to promote greater stability and quality of the code base as it transitioned to becoming a commercially supported product by the various vendor members. As a corporate entity pursuing self-interests, Sun contributed modifications and new code to Open MPI to support its own Solaris(TM) platform on both SPARC® and x86/x64, Sun(TM) Studio compilers, and Sun Grid Engine resource manager. 
This presentation reflects on our first two years of participation in the Open MPI effort in the following ways: 1) an examination of data to identify how different types of development activity are distributed among the industrial, academic, and research lab members, 2) a conceptual abstraction of how self-interest drives this balance, 3) analysis of the unique attributes of the cluster-oriented development interactions within our community, and 4) conjecture about how to leverage the synergy of competing self-interest in an increasingly grid- and cluster-oriented development environment.
Introduction to Teaching Grid Computing
Speakers: Clayton Ferner, PhD, Associate Professor, University of North Carolina-Wilmington and Barry Wilkinson, Professor, University of North Carolina-Charlotte
The purpose of this session is to describe the topics that make up a Grid computing course at the senior undergraduate or first-year graduate level. This session is based upon recent developments we have made to a Grid computing course first taught across North Carolina in 2004.
GRID.org: Creating an Open Source Community of Communities
Speaker: Silona Bonewald, Open Source Evangelist for GRID.org
Grid and cluster computing has a long history of using open source components, with vibrant communities around many of these components. With the UniCluster Express stack, which integrates many such components into a single distribution, we found the need to create a GRID.org open source community that integrates and complements the communities of UniCluster's many components. In this talk, you will learn how GRID.org is employing a combination of open source tools, open data standards, and feeds to allow open source cluster users to collaborate on a new level.
FRIDAY, MAY 16, 2008
8:30 – 12:00 pm and 1:30 – 4:00 pm
Managing Grid Engine Clusters
This tutorial will take participants through the ins and outs of managing clusters running the Grid Engine software. Topics covered will include application license management, workflows, parallel environments, custom resources, and policies. The topics will be taught through lecture combined with hands-on exercises. Machines will be provided. Participants should ideally already have some experience with Grid Engine, but beginners are also welcome.
Instructor: Daniel Templeton, Strategic Liaison Manager, Sun Grid Engine, Sun Microsystems, Inc.
|