Architecting Resilience in Self-Adaptive Software Systems

Contact: Dr. Rogerio de Lemos


This project will look into the provision of assurances for self-adaptive software systems, and how this can be achieved through dynamic processes. Depending on the expected system resilience, evidence should be collected at run-time and associated with the evidence collected at development-time, in order to build up arguments that a system is acceptably resilient in a certain operational context. Processes should assume a key role, since they should be dynamically generated according to the level of resilience required of the system.

Extending the Concept of Dependability Case to Address Resilience

Contact: Dr. Rogerio de Lemos


A resilient system is a system that is able to deliver service that can justifiably be trusted when facing changes; in other words, the system has to remain dependable when facing changes. Similar to a dependability case, which is based on the concept of a safety case, a resilience case could be seen as a clear and defensible argument that a system is acceptably resilient in a certain operational context. What distinguishes the operational context of a dependable system from that of a resilient system are the changes that might occur to the environment of the system, to the system itself, and to its requirements. The challenge now is how to define a dynamic operational context, instead of the very particular operational context that has been the basis for producing safety cases. Relying on existing work on dependability and safety cases, the aim of this work is to define what a resilience argument should be, and how a collection of these could be used to build a resilience case.

Architecting resilience in self-organising systems

Contact: Dr. Rogerio de Lemos

Self-organising systems are those systems in which there is no explicit central control, either internal or external. In other words, these systems are decentralised in nature, implying that access to global information is limited or impossible; hence, adaptation should be based on interactions that occur locally (among neighbours) and on local information. On the other hand, resilience is related to the persistence of service delivery that can justifiably be trusted when facing changes. There are two issues that need to be considered in a resilient system: first, the ability of the system to provide resilience, and second, the ability to justify the provided resilience. This project aims to define an architectural abstraction for supporting resilience in self-organising systems.

Autonomy in the presence of failures

Contact: Dr. Rogerio de Lemos

The level of autonomy in embodied systems is essentially constrained by the intent of the system and the resources available to achieve that intent; thus, reasoning about failures in system resources should be an integral part of system design. In other words, the provision of autonomy should rely on how the system is built in terms of its hardware and software components, and how these can be exploited for the continued delivery of system services despite the presence of failures. The basis of the proposed approach is the definition of architectural abstractions that support adaptability, facilitate the handling of faults, and promote autonomy. These abstractions contain well-defined interfaces that express the different roles in which an abstraction might be involved, promote error confinement, and either enforce crash-failure semantics when enough resources are available or employ dynamic recovery techniques that exploit inherent system redundancies.

Dynamic processes for self-adaptive software systems

Contact: Dr. Rogerio de Lemos

The self-adaptation of a software system is a complex process that depends on the system requirements, its operational state and environment, which may change during the system's operational lifetime. Hence, the process for coordinating the self-adaptation should also be adaptable to changes that may occur during run-time. As a means for coordinating the self-adaptation process of software systems, a solution is to employ workflows that are dynamically generated to deal with the variability associated with the self-adaptation process. In this context, the aim of the project is to define and develop techniques for automatically generating workflows for coordinating the self-adaptation of software systems. Funding is available for this project.

Run-time validation for self-adaptive software systems

Contact: Dr. Rogerio de Lemos

Modern systems are increasingly required to be capable of adapting at run-time. How can we guarantee that a system continues to satisfy certain properties after self-adapting to changing requirements, resources and faults? Traditional validation techniques need to be complemented by run-time analysis techniques. Validation becomes a perpetual activity to be performed during system execution. In this context, self-adaptation of the validation techniques could help to improve the system dependability, and self-test or built-in tests should be considered for the provision of assurances. For that, new processes and techniques should be envisaged that would support testing during run-time. Funding is available for this project.

Architectural abstractions for dependable software architectures

Contact: Dr. Rogerio de Lemos

The aim of the proposed research is to investigate means and mechanisms to effectively incorporate the specification of exceptional behaviour in component-based software. Exception handling is a structuring technique that facilitates the design of dependable computing systems by encapsulating the process of error recovery. Component-based software is the concept of designing and implementing software from already available components. Some component-based approaches have introduced the notion of collaborations as a modelling abstraction for representing the complex interactions between components. The two main objectives of the research are the following:

Improving safety assurances for COTS-based software

Contact: Dr. Rogerio de Lemos

This project will investigate methodologies for assuring the safe use of commercial off-the-shelf (COTS) systems, especially COTS with a high software or complex hardware content. The aim of the project is to improve the efficiency of system safety assurance activities when deploying complex COTS components within the next generation of Air Traffic Management / Communication, Navigation and Surveillance (ATM/CNS) systems.

Agreement in distributed systems based on swarm techniques

Contact: Dr. Rogerio de Lemos

Existing distributed systems techniques do not scale to envisaged future systems, for example, decentralised peer-to-peer systems, self-organised ad hoc networks, and ubiquitous systems. Traditional approaches specify system properties in a top-down manner, which makes them amenable to rigorous or mathematical analysis. This project will investigate how nature-inspired decentralised control techniques can be used in building distributed systems. In these systems, a large population of entities independently gather information and decide how to behave; in general, the entities have simple functionality and limited communication. It is expected that the behaviour emerging from these entities would provide solutions to specific problems. In general, this project tries to bridge the gap between consensus in distributed systems and emergence from natural computation. While consensus is associated with protocols that reach agreement under particular failure loads, emergence is associated with a collection of simple, autonomous and self-sufficient entities that are able to adapt to changing environments. Funding is available for this project.
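As an illustration only, the idea of agreement emerging from simple local interactions can be sketched with randomised pairwise averaging (often called gossip averaging). This is not the technique the project proposes; it is a minimal sketch that assumes a ring topology, no failures, and a hypothetical function name gossip_average. Each entity holds one number, talks only to its right-hand neighbour, and the population drifts towards a common value without any central coordinator.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Randomised pairwise averaging on a ring (illustrative assumption).

    In each round one entity is picked at random and averages its value
    with its right-hand neighbour. No entity ever sees global state.
    """
    rng = random.Random(seed)      # seeded so that runs are repeatable
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        i = rng.randrange(n)       # entity that initiates the exchange
        j = (i + 1) % n            # its neighbour on the ring
        mean = (state[i] + state[j]) / 2.0
        state[i] = state[j] = mean # both adopt the local average
    return state


if __name__ == "__main__":
    initial = [float(v) for v in range(10)]
    final = gossip_average(initial, rounds=5000)
    # The spread between the largest and smallest value shrinks towards zero.
    print("spread after gossip:", max(final) - min(final))
```

Each exchange preserves the sum of the two values involved, so the population mean is conserved while the disagreement between entities shrinks. The sketch deliberately ignores failure loads, which is exactly the gap between emergence and classical consensus that the project description points to.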

Fragmentation, replication and scattering in swarms

Contact: Dr. Rogerio de Lemos

This project is about fault tolerance: the continued delivery of services by a system despite the presence of faults. The aim of the project is to exploit some basic principles of swarm intelligence for supporting fragmentation, replication and scattering of information. The objective is to evaluate the feasibility of swarm techniques in supporting the distribution of information among hundreds or thousands of computers. In the context of future ad hoc or P2P networks, the purpose of this project is to investigate how redundancies available in these environments can be exploited for supporting mass replication of information. Funding is available for this project.
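The three steps named in the title can be sketched as follows. This is a minimal, centralised illustration rather than a swarm-based design: the function names (fragment, scatter, reassemble), the random placement policy and the in-memory dictionaries standing in for computers are all assumptions made for the example.

```python
import random


def fragment(data, size):
    """Fragmentation: split a byte string into pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def scatter(fragments, node_count, replicas, seed=0):
    """Replication and scattering: place each fragment on `replicas`
    distinct nodes chosen at random (hypothetical placement policy)."""
    rng = random.Random(seed)
    nodes = {node_id: {} for node_id in range(node_count)}
    for index, piece in enumerate(fragments):
        for node_id in rng.sample(range(node_count), replicas):
            nodes[node_id][index] = piece
    return nodes


def reassemble(nodes, failed, fragment_count):
    """Rebuild the data from surviving nodes, or return None if some
    fragment has lost all of its replicas."""
    recovered = {}
    for node_id, store in nodes.items():
        if node_id in failed:
            continue               # a failed node contributes nothing
        recovered.update(store)
    if len(recovered) < fragment_count:
        return None
    return b"".join(recovered[i] for i in range(fragment_count))


if __name__ == "__main__":
    message = b"information scattered across many computers"
    pieces = fragment(message, size=8)
    placement = scatter(pieces, node_count=12, replicas=3)
    # With 3 replicas per fragment, any 2 node failures can be tolerated.
    restored = reassemble(placement, failed={0, 5}, fragment_count=len(pieces))
    print(restored == message)
```

Because every fragment is stored on three distinct nodes, the loss of any two nodes still leaves at least one copy of each fragment. A swarm-based approach would replace the central placement step with decisions taken locally by the nodes themselves.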