Air France Flight 447 was an international, long-haul passenger flight from Rio de Janeiro to Paris. On 1 June 2009 the aircraft crashed into the Atlantic Ocean, killing everybody on board. The aircraft is thought to have crashed due to temporary inconsistencies between airspeed measurements, caused by the aircraft’s pitot tubes being blocked by ice crystals. Ultimately, the autopilot disconnected and the crew reacted incorrectly, which led the aircraft into an aerodynamic stall from which it did not recover (BEA, 2012).
The accident resulted from a combination of factors relating to both the technology of the aircraft and the training of the crew (BEA, 2012). The technological failures were poor feedback mechanisms, unclear display of airspeed readings, confusing stall warnings, absence of visual information and poor indications by the Flight Director. Failures in training resulted in the crew not responding to the stall warning, not being trained in icing of the pitot tubes and lacking practical training in manually handling the aircraft. Moreover, incomprehension of the situation and poor management of emotions weakened the task-sharing ability of the co-pilots.
This accident has highlighted a number of human–automation issues in aviation. Automated flight-control functions can remove some danger from aviation; however, they also change the activities, workloads, situation awareness and skill levels of the operators, which can cause problems (Hodgson, Siemieniuch & Hubbard, 2013).
The first problem highlighted by this accident is the crew’s change of role from operator to monitor. Flight deck automation relies on the crew’s ability to perform a passive monitoring role, rather than an active operating role. One problem associated with this is a drop in vigilance (Mackworth, 1948), which is exacerbated when a system is highly reliable (Parasuraman, Molloy & Singh, 1993). However, accidents that follow from such lapses are not human operator errors; they are automation system design errors. More importantly, the crash of Flight 447 was partly attributed to a loss of situation awareness, possibly because the pilots had to perform a passive monitoring role. Monitoring roles can reduce awareness of the current “flying state” of the aircraft, as well as awareness of its predicted future behaviour (Sarter & Woods, 1995).
Lack of situation awareness can also be an outcome of complex automation, such as a complicated flight automation system whose poor interface design confuses the pilots. In the case of Flight 447 the BEA (2010) report shows that a poor Human Computer Interface played a major part in the crash. There were a number of reasons for this. The Flight Director display was inaccurate because of an altimeter error, which accounts for most of the incorrect pitch-up inputs. Airspeed inconsistencies that had been identified by the computers were not clearly displayed. Failure messages were generated but showed only the consequences, not the origin, of the problem. There was no indication of a blocked pitot tube on the flight displays. There was also an absence of Angle of Attack information, which is important in identifying and preventing a stall. This information was sent to the on-board computers, but there were no displays to convey it to the crew.
Furthermore, as the level and complexity of automation increase, so do the levels of experience and skill needed to recover from a failure or unexpected situation (Hodgson, Siemieniuch & Hubbard, 2013). This is because there is less time for the operator to become aware of and correct developing problems. For example, in Flight 447 the crew had less than three minutes to find the problem and take action.
Additionally, in the case of aircraft, the ability to recover from a failure or unexpected situation also relies on the crew’s manual flying abilities. However, pilots of highly automated aircraft experience a loss of manual flying skills (Wood, 2004). Fanjoy and Young (2005) found that training and airline policies on automation often lead to a lack of opportunities to practise, resulting in pilot complacency as well as the deterioration of flying skills. Similarly, Young, Fanjoy and Suckow (2006) found that crews who used the most flight deck automation had poorer manual flying skills than others. This has implications for abnormal situations in which the automation system disengages without prior warning, because the crew must then rely on their manual flying skills. Moreover, automation will maintain stability until it is no longer possible to do so, so the aircraft may already be going out of control as the flight crew take over. This makes good manual flying skills all the more necessary.
A further problem is that automation increases mental workload during high-load periods (Funk et al., 1999). The problem is compounded when a situation demands additional mental effort at a time when workload is already high. When the crew’s workload is high, developing failures of the automation system are more likely to be allowed to grow into a critical situation. For example, if damage has occurred or instrumentation has failed, the Flight Management System advice is often misleading or incorrect, and flight crews can be overloaded with a vast amount of information and alarms, making it difficult to identify what the problem is. The crew of Flight 447, for instance, were faced with more than 50 simultaneous alarms. “One alarm after another lit up the cockpit monitors. One after another, the autopilot, the automatic engine control system, and the flight computers shut themselves off” (Traufetter, 2010). As a result, they were unable to understand or identify the problem before it turned into a critical situation, ultimately ending in disaster.
The above problem could be due to automation being an inadequate crew member. Automation can act as a poorly trained, incommunicative member of the system’s crew. There is often poor interaction between crews and automation systems (Norman, 1990), yet there is a need for multisensory feedback to crews (Sarter, 1999). In order for a crew to achieve a safe level of shared situation awareness, the automated system must become part of the crew. It needs to do this by communicating its adjustments in order to maintain shared situation awareness. Current automated systems may indicate adjustments on a dial or screen, but they do not typically draw attention to them because they lack situation awareness of the “bigger picture.” Clear communication can prevent accidents. For example, in Flight 447, had there been a clear indication that the pitot tubes were frozen, this could have stopped the chain of events from unfolding.
To improve automation it is proposed that aircraft automation should be made into a more effective team player. A human–automation team should be defined as “the dynamic, interdependent coupling between one or more human operators and one or more automated systems requiring collaboration and coordination to achieve successful task completion” (Cuevas, Fiore, Caldwell & Strater, 2007). Current automation systems perform as very inadequate team members, leaving the human operators or crew unprepared when failure occurs or unusual events arise (Hodgson, Siemieniuch & Hubbard, 2013). To improve human–automation interaction, systems should be able to trade and share control so that interacting with a system is more like interacting with a teammate (Scerbo, 2007). Future systems, such as Free Flight, are envisioned to have human–automation teams sharing and trading tasks (Inagaki, 2003) as situational demands change (van Dongen & van Maanen, 2005). Such dynamic situations create occasions where human–automation teams can implicitly coordinate (Rico, Sanchez-Manzanares, Gil & Gibson, 2008) on an almost exclusively cognitive basis (Hoc, 2001). This would enable automation systems to become good team players. Furthermore, good team players make their activities observable to fellow team players and are easy to direct (Christoffersen & Woods, 2002). To be observable, automation activities should be presented in ways that capitalise on human strengths (Klein, 1998). For example, they should be event-based (representations need to highlight changes and events), future-oriented (human operators in dynamic systems need support for anticipating changes and knowing what to expect and where to look next) and pattern-based (operators must be able to quickly scan displays and pick up possible abnormalities without having to engage in difficult cognitive work). By relying on pattern-based representations, automation can change difficult mental tasks into straightforward perceptual ones.
Overall, changes in workload, reduced situation awareness, reduced operator skills, automation failures and unexpected behaviours have caused many accidents over the past three decades, including Flight 447. As a result of these factors, manual recovery when the automation system fails is often compromised. These issues may have been exacerbated by having a tightly coupled system. Tight coupling reduces the ability to recover from small failures before they expand into large ones, because tighter coupling between parts spreads effects through the system more rapidly. Problems therefore have greater and more complex effects, and these spread quickly. When automated partners are strong, silent, clumsy and difficult to direct, handling these demands becomes more difficult. The result is coordination failures and new forms of system failure. Currently it is argued that aircraft systems are only moderately tightly coupled. However, airlines, for financial reasons, are pressing for a reduction of flight crews from three (pilot, co-pilot and engineer) to two (pilot and co-pilot) on the grounds that computers and other devices reduce the engineering load. Adding more automation to the system and reducing the number of controllers will lead to much tighter coupling, leaving fewer resources for recovery from incidents (Perrow, 2011).
Now that the problems with the automation in Flight 447 have been identified, it is important to understand how safety models contributed to the understanding of the accident and what the implications are for managing safety in the future, to prevent history from repeating itself. The first safety model and safety management strategy is known as Safety-I. According to Safety-I, things go wrong due to technical, human and organisational causes such as failures and malfunctions, with humans being viewed as a main hazard. The safety management principle is to react when something goes wrong, by investigating and identifying the causes of the accident and then trying to eliminate the causes or improve barriers. Safety is thus defined as a condition where the number of adverse outcomes is as low as possible. The principles of Safety-I have been expressed by many different accident models, the best known being the Swiss cheese model (Reason, 1990).
This model posits that accidents occur due to multiple factors acting jointly. These factors align, creating a possible trajectory for an accident. They can be either latent conditions, such as problems in the design or management of the organisation that are present long before an incident is triggered, or active failures, which are mistakes made by human operators that, when combined with the latent conditions, result in an accident. The model states that no one failure, human or technical, is sufficient to cause an accident. Rather, an accident happens due to the unlikely and often unforeseeable combination of several contributing factors arising from different levels of the system.
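The alignment idea can be illustrated with a small calculation. The sketch below is only an illustration under strong assumptions: it treats each defensive layer as having a "hole" independently of the others, and the layer names and probabilities are invented for the example. None of the numbers come from the BEA report or from Reason (1990), and real layers are rarely independent.

```python
from math import prod

# Hypothetical probabilities that each defensive layer has a "hole" on a
# given flight. Names and values are illustrative assumptions, not data.
layer_hole_probability = {
    "organisation": 0.05,
    "technology": 0.02,
    "training": 0.10,
    "crew_response": 0.20,
}


def accident_probability(layers):
    """Probability that the holes in every layer line up.

    Assumes the layers fail independently, so the joint probability is
    simply the product of the individual probabilities.
    """
    return prod(layers.values())


def weakest_layer(layers):
    """Name of the layer most likely to have a hole."""
    return max(layers, key=layers.get)


if __name__ == "__main__":
    p = accident_probability(layer_hole_probability)
    print(f"Probability that all holes align: {p:.2e}")
    print(f"Weakest single layer: {weakest_layer(layer_hole_probability)}")
```

Under these assumptions the joint probability is far smaller than the failure probability of any single layer, which is the model's point that no one failure is sufficient on its own. It also shows why a Safety-I response tends to target the weakest layer.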
In the case of Flight 447 the model would allow each contributing factor to be identified. For example, the technical faults would be the Human Computer Interface, the pitot tubes, the controls not being linked between pilots and the misleading stall warnings. Human faults would be the co-pilot pulling back on the stick, poor management of the startle effect, poor communication and the captain leaving the cockpit. Organisational faults would be poor training, the delayed installation of new pitot tubes and poor design of the HCI. When put together, all of these factors played a part in causing the accident.
Looking for human errors after an event is a “safe” choice, as they can always be found in hindsight. Looking for and finding human errors makes it easier to decide who should be held accountable and where preventative measures should be aimed. However, when “the cause” has been attributed to individual error, the preventative measures are usually misaimed. Accidents arise from a combination of many factors, and by blaming the individual, people often assume that the system will be safe as soon as it gets rid of the “bad apples”.
More recently, however, a proactive model of safety has been suggested. Proactive safety management is part of the aim of Safety-II, which argues that focusing on cases of failure does not show how to improve safety and that, instead of looking at what goes wrong, there should be a focus on what goes right in order to understand how that happens. In hindsight after an accident, many weaknesses existing in organisations are usually revealed. For example, investigators detect the “deviations” from rules and regulations and find the “cause”. However, the fact that something did deviate from a prescribed rule is not necessarily a contributor to an accident or even an abnormal event. On the contrary, adaptations are often the norm rather than the exception (Reiman & Rollenhagen, 2011). It should be acknowledged that the everyday performance variability needed to respond to varying conditions is the reason why things go right. Humans are consequently seen as a resource necessary for system flexibility and resilience. The safety management principle is to continuously anticipate developments and events. When something goes wrong, we should begin by understanding how it usually goes right, instead of searching for specific causes that only explain the failure. This strategy posits that accidents are not resultant but emergent.
As a consequence, the definition of safety should be changed from ‘avoiding that something goes wrong’ to ‘ensuring that everything goes right’. The basis for safety and safety management must therefore be an understanding of why things go right, which means understanding everyday activities. Safety management must be proactive, so that interventions are made before something happens. In the case of Flight 447 safety management needs to ask: what could have been done before that flight to minimise the possible risks associated with it (McDonald & Ydalus, 2010)? The risks were built into the operational situation before take-off. Routine measures taken in advance could not only prevent this accident from happening again but also provide a more general preventive shield against a wide range of system accidents.
This has been explained in the FRAM analysis model (Hollnagel, 2004). In this model there is a need to understand the essential system functions, their variability and how these can resonate, in order to identify barriers for safety. Another way to understand why an accident occurred is to determine why the control structure was ineffective (Leveson, 2004). Preventing future accidents requires designing a control structure that will enforce the necessary constraints. In systems theory, systems are seen as hierarchical structures, where each level puts constraints on the activity of the level below. This means that constraints, or a lack of constraints, at a higher level allow or control behaviour at a lower level (Checkland, 1981). An accident is therefore viewed as the result of inadequate enforcement of constraints on behaviour at each level of a socio-technical system.
Leveson’s model has two basic hierarchical control structures, one for system development and one for system operation, with interactions between them. Between the hierarchical levels of each control structure, good communication channels are needed. A downward reference channel provides the information needed to apply constraints on the level below, and an upward measuring channel provides feedback about how effectively the constraints were applied. At each level, inadequate control may result from missing constraints, inadequately communicated constraints, or constraints that are not enforced correctly at a lower level (Leveson, 2011).
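The three kinds of inadequate control described above can be made concrete with a short sketch. This is a minimal illustration, not Leveson's own notation or tooling. The class names, the level names and the example constraints are all hypothetical, and the true/false flags are invented to exercise each branch rather than taken from the accident investigation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ControlFlaw(Enum):
    MISSING = "missing constraint"
    NOT_COMMUNICATED = "inadequately communicated constraint"
    NOT_ENFORCED = "constraint not enforced at the lower level"


@dataclass
class Constraint:
    description: str
    defined: bool       # does the higher level specify the constraint at all?
    communicated: bool  # did it reach the lower level via the reference channel?
    enforced: bool      # does feedback on the measuring channel confirm it?


@dataclass
class Level:
    name: str
    constraints: List[Constraint]  # constraints imposed on the level below


def diagnose(c: Constraint) -> Optional[ControlFlaw]:
    """Classify a constraint into one of the three flaw types, or None."""
    if not c.defined:
        return ControlFlaw.MISSING
    if not c.communicated:
        return ControlFlaw.NOT_COMMUNICATED
    if not c.enforced:
        return ControlFlaw.NOT_ENFORCED
    return None


def audit(hierarchy: List[Level]) -> List[Tuple[str, str, ControlFlaw]]:
    """Walk the hierarchy top-down and collect every control flaw found."""
    findings = []
    for level in hierarchy:
        for c in level.constraints:
            flaw = diagnose(c)
            if flaw is not None:
                findings.append((level.name, c.description, flaw))
    return findings


# Hypothetical hierarchy; the flags are illustrative assumptions only.
example_hierarchy = [
    Level("regulator", [Constraint("sensor icing requirement", False, False, False)]),
    Level("airline", [Constraint("manual handling practice", True, False, False)]),
    Level("flight crew", [
        Constraint("respond to stall warning", True, True, False),
        Constraint("cross-check instruments", True, True, True),
    ]),
]

if __name__ == "__main__":
    for level_name, description, flaw in audit(example_hierarchy):
        print(f"{level_name}: {description} -> {flaw.value}")
```

The design choice here is to check the three conditions in the order in which a constraint travels: it must first exist, then be communicated downward, and only then can its enforcement be confirmed through feedback.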
The implication for managing safety is therefore that Safety-I and Safety-II techniques should be combined, with a proactive focus on how everyday activities go right. Accidents could then be prevented by identifying organisational and societal problems and changing them before an accident happens, for example by making sure the right constraints are in place.
Overall, pilots are part of a complex human–automation system that can both increase and reduce the probability of an accident. Training, automation systems and cockpit procedures can be changed so that certain mistakes will not be made again. It may be that, with the inclusion of humans and their variability, there will always be the possibility of an accident. Nevertheless, turning automation systems into effective team players may transform aviation and prevent avoidable catastrophes. Furthermore, safety management strategies should be proactive, identifying potential accidents before they happen and recognising that variability and adjustments are part of what goes right in everyday performance.