
UPDATE: Deadline has been extended to 30 August.

Welcome to TARDIS2011

The International Workshop on fault Tolerant Architectures for Reliable Distributed Infrastructures and Services (TARDIS2011) will be held at the 4th IEEE International Conference on Utility and Cloud Computing (UCC 2011).

Online coverage

Twitter hashtag: #TARDIS2011

HPC in the Cloud: "Toward a Fault-Tolerant Cloud"

DSA-Research.org Blog: "Will you let the Sky fall down?"

Description

“Not letting the Sky fall down” [1,2,3]

[1] http://www.zdnet.co.uk/blogs/mapping-babel-10017967/aws-disrupted-by-us-east-coast-failure-10022283/
[2] http://justinsb.posterous.com/aws-down-why-the-sky-is-falling
[3] http://aws.amazon.com/message/65648/

Cloud computing has moved the center of gravity of distributed application execution by exploiting virtualization at different layers and by adding a level of complexity to the scheduling problem. While Cloud computing can bring more flexibility to the design of applications, it also raises new research challenges. Compared with the traditional method of dedicating one server to a single application, consolidation through virtualization can boost the resource utilization rate by aggregating workloads from separate machines onto a small number of servers: workloads can now be executed in a dense environment using far fewer machines, in which the impact of faults can be vastly magnified. For example, any single hardware failure will affect all the virtual servers on that physical machine, and under dynamic workloads it may be difficult to distinguish real faults from normal system behavior.

Revisiting fault tolerance concepts is fundamental when provisioning is left to public Cloud infrastructures, where an optimal budget must be met. Different strategies can be tailored to this end, from hybrid architectures to service distribution across cloud providers. Additionally, cloud providers typically establish Service Level Agreements (SLAs) with their customers, and must also enforce Quality of Service (QoS) in their infrastructures in an unreliable and highly dynamic environment.

Cloud computing is playing an increasingly important role in current distributed computing, which involves a wide community. The Cloud provides a scalable computational model in which users access services based on their requirements, without regard to where the services are hosted or how they are delivered: processing power, storage, network bandwidth or software usage can all be provided as services over the Internet. Consequently, applications developed over such on-demand infrastructures can be built upon more flexible principles, making them more fault tolerant, more resilient and more dynamic. Although past research on fault tolerance in distributed systems has generated a wide collection of algorithms for fault detection, identification and correction, these concepts will have to be revisited in the context of Cloud computing.

Papers on all aspects of fault tolerance and reliability in private, public and hybrid Clouds are welcome.

Topics

  • Application-level (including workflows or any other problem-solving environment), middleware-level, or virtual and physical resource-level fault tolerance techniques.
  • Programming models for Cloud computing that include fault tolerance.
  • Fault detection and identification techniques in Cloud computing.
  • Fault diagnosis systems in Cloud computing.
  • Fault Tolerance recovery techniques in Cloud computing.
  • Cloud Computing Fault Taxonomies.
  • Fault Prediction Techniques and Models in Cloud computing.
  • Fault tolerance in resource provisioning (SLA level and provisioning policies).
  • The relationship between Quality of Service and Fault tolerance in *aaS.
  • Fault tolerance solutions from distributed computing environments other than the Cloud that would benefit this paradigm.

Important Dates

  • Deadline for contribution submissions: 30 August 2011
  • Acceptance notification: 15 September 2011
  • Deadline for camera ready contributions: 25 September 2011

Acknowledgments

This workshop would not be possible without the support of the following projects: MEDIANET (Comunidad de Madrid S2009/TIC-1468), HPCcloud (MICINN TIN2009-07146) and TIN2010-17905.
