Investigators blame factory workers for satellite accident
Posted: October 4, 2004
NASA investigators have issued their final report into last year's embarrassing accident in which the NOAA N-Prime weather satellite was significantly damaged after falling off a handling cart in the Lockheed Martin factory because workers failed to install two dozen bolts.
The executive summary is posted below. You can download the full report here.
Report executive summary
The operation scheduled for the day of the mishap, September 6, 2003, was to shim the Microwave Humidity Sounder (MHS) instrument by removing and replacing the instrument. This operation required the spacecraft to be rotated and tilted to the horizontal position using the turnover cart (TOC). The spacecraft fell to the floor as it reached 13 degrees of tilt while being rotated. The reason was clear from inspection of the hardware: the satellite fell because the TOC adapter plate was not secured to the TOC with the required 24 bolts.
Three days after the mishap, on September 9, 2003, Dr. Ghassem Asrar, NASA Associate Administrator for Earth Science, established the NOAA N-PRIME Mishap Investigation Board (MIB) in the public interest to gather information, conduct necessary analyses, and determine the facts of the mishap. To identify the root causes at work in the NOAA N-PRIME Mishap, the MIB undertook two approaches. The first was an extensive analysis of the sequence of events prior to and on the day of the mishap; the planned operational scenario vs. the actual execution; and the planning activities, including scheduling, crew assembly and test documentation preparation. The second approach was to utilize the Human Factors Analysis and Classification System (HFACS) (2000) to provide a comprehensive framework for identifying and analyzing human error. Evidence from a number of sources, including witness interviews, test and handling procedures, and project documents, was used to develop the accident scenarios and populate the HFACS model.
Proximate Cause: The NOAA N-PRIME satellite fell because the Lockheed Martin Space Systems Company (LMSSC) operations team failed to follow procedures to properly configure the TOC, such that the 24 bolts that were needed to secure the TOC adapter plate to the TOC were not installed.
The root causes are summarized below along the four levels of active or latent failures as ascribed by the HFACS framework.
The TOC adapter plate was not secured to the TOC because the LMSSC operations team failed to execute their satellite handling procedures.
The Responsible Test Engineer (RTE) did not "assure" the turnover cart configuration through physical and visual verification as required by the procedures but rather through an examination of paperwork from a prior operation. Had he followed the procedures, the unbolted TOC adapter plate would have been discovered and the mishap averted. Errors were also made by other team members, who were narrowly focused on their individual tasks and did not notice or consider the state of the hardware or the operation outside of those tasks. The Technician Supervisor even commented that there were empty bolt holes, but the rest of the team, and the RTE in particular, dismissed the comment and did not pursue the issue further. Finally, the lead technician and the Product Assurance (PA) inspector committed violations in signing off the TOC verification procedure step without personally conducting or witnessing the operation. The MIB found such violations were routinely practiced.
The LMSSC operations team's lack of discipline in following procedures evolved from complacent attitudes toward routine spacecraft handling, poor communication and coordination among the operations team, and poorly written or modified procedures.
It is apparent to the MIB that complacency impaired the team directly performing the operation and those providing supervision or oversight to this team. The operation was consistently characterized as routine and low risk, even though it involved moving the spacecraft. Several other adverse mental states, including fatigue and external constraints that limited the availability of portions of the crew to a half day, also may have had roles in the mishap. Incomplete coordination concerning ground equipment use and status, and late notification of operation schedules exacerbated the lack of rigor in handling operations. Standard operating procedures contained ambiguous terminology (e.g., "assure") and can be significantly modified using redlines for unique (one time only) operations. These practices were the preconditions or latent failures that promoted the mishap occurrence.
The preconditions within integration and test (I&T) operations described above existed because of unsafe supervision practices within the LMSSC project organization, including ad hoc planning of operations, inadequate oversight, failure to correct known problems, and supervisory violations.
The RTE and I&T manager failed to provide adequate supervision and repeatedly violated procedures when directing and monitoring their operations crews. Waiving of safety presence, late notification of government inspectors, poor test documentation, and misuse of procedure redlines were routinely permitted. Further, the MIB believes that planning for the lift/turnover operation was hurried and resulted in a hastily formed operations team. Although all team members were experienced and competent, this atypical mix of authority among the various roles created dynamics that were not conducive to open discussion and shared responsibility. The MIB concludes that the lack of enforcement and support by the supervisory chain concerning the roles and responsibilities of the operation team members and the hurried planning for this operation are factors in this mishap.
The unsafe supervision practices within the TIROS program had their roots in the LMSSC organization: the inadequate resources and emphasis provided for safety and quality assurance functions; the unhealthy mix of a dynamic I&T climate with a well-established program and routine operations; and the lack of standard, effective process guidelines and safeguards for operations all negatively influenced the project team and activities.
The MIB finds the LMSSC system safety program to be very ineffective. Few resources are allocated to system safety, few requirements for safety oversight exist and little programmatic supervision was provided for the safety representatives. The I&T environment within the TIROS program is engendered by routine operations for which schedules and specific activities are frequently optimized. Such an environment requires rigorous oversight and processes to prevent overconfidence and complacency. The MIB believes that LMSSC failed to provide the organizational safeguards to prevent this and other potential mishaps, especially in key areas that regulate operational tempo, operations planning, procedure development, use of redlines, and Ground Support Equipment (GSE) configurations.
The in-plant government representation, Defense Contract Management Agency (DCMA), and the GSFC Quality Assurance (QA)/safety function failed to provide adequate oversight to identify and correct deficiencies in LMSSC operational processes, and thus failed to address or prevent the conditions that allowed the mishap to occur.
The in-house Government Quality Assurance Representative (QAR) (acting as a DCMA agent) inappropriately waived a Mandatory Inspection Point during the Saturday morning operation. Although his presence may not have prevented the mishap, the MIB believes this waiver is indicative of a failed oversight process and barrier. The MIB finds that the government quality assurance and safety oversight at GSFC were also deficient, having become issue driven due to the maturity of the project. Once issues were brought to their attention, the QA/safety personnel worked their resolution but there was very little proactive oversight, audit, inspection, etc. of the LMSSC operations. The in-house Government QAR knew of some of the problems associated with procedure discipline and safety and program assurance oversight but did not communicate them to the NASA project. Given the prevalence of some of the contractor deficiencies identified in this investigation, however, it is the MIB's assessment that the government in-plant representative, DCMA, and the GSFC QA/Safety function should have identified and demanded correction for these deficiencies.
The Government's inability to identify and correct deficiencies in the TIROS operations and LMSSC oversight processes was due to inadequate resource management, an unhealthy organizational climate, and the lack of effective oversight processes.
Relative to resource management, the GSFC project, in working to deal with a declining workload and resources, allowed and even encouraged trade-offs between the schedules, staffing and milestones for the two remaining satellites in the Polar Operational Environmental Satellite (POES)/(TIROS) project. These constant and rapid trade-offs exacerbated the already fast operational tempo of the LMSSC I&T team. Organizational climate was found to be an issue, primarily in the government on-site structure. There is no Project in-plant civil servant government presence. The Project in-plant government representatives (one in quality assurance, two in I&T) were past employees of LMSSC and were hired as outside contractors by the GSFC Project. The MIB believes that their past associations with the company might precipitate undue complacency due to familiarity. Although the POES Project and the contractor track and trend closure of contractor-generated Non-Conformance Reports (NCRs) for timeliness, there is no process in place to analyze and trend NCRs for cause and to identify systemic problems. The MIB found no effective process in place to follow up on closure of DCMA-generated Corrective Action Requests (CARs), Supplier Assurance Contract (SAC) generated audit deficiencies, and action items from an external review (TIROS Anomaly Review). Likewise lacking is the government organizational oversight to monitor, verify, and audit the performance and effectiveness of the I&T processes and activities.
The MIB found the DCMA CAR assessment and reporting process and other DCMA audit processes to be deficient in identifying troubling trends in the LMSSC facility. Review of CARs indicates repeated requirement violations and bypassing of Mandatory Inspection Points by the contractor. The DCMA Technical Assessment Group (TAG) facility audits, the DCMA annual safety audits, and the DCMA facility summary reports of CARs prior to the mishap, however, all indicated a healthy facility environment, with no noteworthy problems reported. MIB recommendations to correct the findings/deficiencies above are provided in section 8 - Recommendations.
It is the MIB's assessment that many of the findings uncovered in this mishap investigation are not specific to this mishap but are systemic in nature. A separate follow-up investigation should be conducted to further examine and characterize these systemic problems.