-
Expanding IceCube GPU computing into the Clouds
Authors:
Igor Sfiligoi,
Shava Smallen,
Frank Würthwein,
Nicole Wolter,
David Schultz,
Benedikt Riedel
Abstract:
The IceCube collaboration relies on GPU compute for many of its needs, including ray tracing simulation and machine learning activities. GPUs are, however, still a relatively scarce commodity in the scientific resource provider community, so we expanded the available resource pool with GPUs provisioned from commercial Cloud providers. The provisioned resources were fully integrated into the normal IceCube workload management system through the Open Science Grid (OSG) infrastructure, with CloudBank used for budget management. The result was an approximate doubling of GPU wall hours used by IceCube over a period of 2 weeks, adding over 3.1 fp32 EFLOP hours for a price tag of about $58k. This paper describes the setup used and the operational experience.
Submitted 8 July, 2021;
originally announced July 2021.
-
Building An Information System for a Distributed Testbed
Authors:
Warren Smith,
Shava Smallen
Abstract:
This paper describes an information system designed to support the large volume of monitoring information generated by a distributed testbed. This monitoring information is produced by several subsystems and consists of status and performance data that need to be federated, distributed, and stored in a timely and easy-to-use manner. Our approach differs from existing approaches because it federates and distributes information at a low architectural level via messaging, a natural match to many of the producers and consumers of information. In addition, a database is easily layered atop the messaging layer for consumers that want to query and search the information. Finally, a common language to represent information in all layers of the information system makes it significantly easier for users to consume information. Performance data show that this approach meets the significant needs of FutureGrid and would meet the needs of an experimental infrastructure twice the size of FutureGrid. This design also meets the needs of existing distributed scientific infrastructures.
Submitted 12 December, 2013;
originally announced December 2013.
-
GRAPPA: Grid Access Portal for Physics Applications
Authors:
D. Engh,
S. Smallen,
J. Gieraltowski,
L. Fang,
R. Gardner,
D. Gannon,
R. Bramley
Abstract:
Grappa is a Grid portal effort designed to provide physicists with convenient access to Grid tools and services. The ATLAS analysis and control framework, Athena, was used as the target application. Grappa provides basic Grid functionality such as resource configuration, credential testing, job submission, job monitoring, results monitoring, and preliminary integration with the ATLAS replica catalog system, MAGDA. Grappa uses Jython to combine the ease of scripting with the power of Java-based toolkits. This provides a powerful framework for accessing diverse Grid resources with uniform interfaces. The initial prototype system was based on the XCAT Science Portal developed at the Indiana University Extreme Computing Lab and was demonstrated by running Monte Carlo production on the U.S. ATLAS test-bed. The portal also communicated with a European resource broker on WorldGrid as part of the joint iVDGL-DataTAG interoperability project for the IST2002 and SC2002 demonstrations. The current prototype replaces the XCAT Science Portal with an xbooks Jetspeed portlet for managing user scripts.
Submitted 26 June, 2003;
originally announced June 2003.