Don't Just Fix It: Find It Before Your Customer Does

When mistakes happen in your company, who takes the blame?
It is human nature to find a single cause of a problem, fix it, and assume everything is fine. In fact, the use of a scapegoat is documented in writings from four thousand years ago: all of the sins of a community were laid on the head of a single goat, which was then driven into the wilderness, relieving everyone of their problems. Four millennia later, we still believe in the power of the scapegoat to take away problems.

When a product fails in the hands of a customer, triggering a recall or creating excessive warranty costs, companies quickly find a solution (or fire someone) and assume all is well. Mistakes are embarrassing, and it is easier to assume they won’t happen again. The GM recall case is a classic example. News articles focus on one or two issues: the engineer who signed off on the change or the automotive safety reporting requirements. Everyone is pointing fingers at everyone else.

The GM case is not unique. When a problem surfaces, firefighting teams typically investigate the production activities (e.g., suppliers, quality control, or production control). Rarely do organizations involve the original team in taking a hard look at all of the product development and launch activities. To fix the underlying causes, teams need to understand why the problem was allowed to be created and why it wasn’t prevented from escaping. In addition, companies need to take a critical look at the cultural blind spots that allow teams to overlook potential failures. If these issues aren’t addressed, companies end up playing the quality-whack-a-mole game (see my post on that topic).

To learn from failures, organizations need to shift their thinking from “which step failed?” to “every step failed!”

In product development, only a small percentage of resources is spent actually creating product and process definitions. In an ideal world, a single brilliant designer could define a problem, draw all of the parts and assemblies for a new product, and ship those drawings to a manufacturer. The manufacturer, in turn, would get the parts right the first time, assemble them correctly, and package and ship them. The product would then work flawlessly for its entire life. Unfortunately, this ideal world exists only in the minds of freshman engineering students (and sometimes marketing managers).

Instead, teams spend most of their product development resources on other activities. These activities define the problem being solved so everyone is on the same page, find and fix design weaknesses, prevent problems from being created during production, and ensure the project is on track. Most of them act as filters that find problems so they can be fixed before the product reaches a customer’s hands.

Figure: A small subset of the activities related to finding and fixing problems before they get to the customer.

For example, concept selection can have a significant impact on final product quality. If the wrong criteria are used in concept selection, or concepts are selected without rigorous analysis, the team will struggle with problems throughout the rest of the design process. In the Takata airbag recall, for instance, the choice of propellant created long-term quality and design problems. At the other end of the design process, if product testing does not model actual usage conditions, products can pass functional tests and still fail in the field. I wonder if GM would have found the ignition problem earlier if they had tested it with a very heavy key chain. The figure above lists a small subset of the activities related to finding and fixing problems before they get to the customer.

When a quality failure escapes, ALL of the product development activities have failed to catch it.

Let’s take a semi-hypothetical example (every one of my clients will think I am talking about them). A product has been in the field for a year when customers in one region suddenly begin to complain about its performance. Nothing appears to have changed. The QA team can’t find any changes in the incoming parts, everything is passing inspection, and the final functional tests are not finding the problem. After a long and painful investigation, the firefighting team figures out that a supplier changed its manufacturing process to decrease cycle time. The new process still has an acceptable Cpk, but the investigation shows that the original process had a much higher Cpk. The increased variability, combined with the higher temperatures in that region, reduced performance enough to trigger complaints. The team tightens the tolerance, imposes additional quality control, chastises the supplier, and assumes the problem is fixed. The next year, a completely unrelated problem occurs when unexpected wear triggers a failure. The team traces that problem to a new material with different mechanical properties. The sustaining engineering team fixes it with a running design change and replaces defective product as customers find problems.
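To see how a process can keep an “acceptable” Cpk and still produce field failures, the sketch below compares two hypothetical supplier processes. Cpk is the distance from the process mean to the nearer specification limit, divided by three standard deviations. All numbers here are invented for illustration: a 10.0 ± 0.3 dimension, a standard deviation of 0.05 for the original process versus 0.075 for the faster one, and an assumed +0.15 shift in the effective dimension in the hot region. The sketch also assumes normally distributed parts and that the specification limits coincide with the functional limits.

```python
"""Illustrative sketch: a lower (but still "acceptable") Cpk leaves less
margin when operating conditions shift. All numbers are hypothetical."""
from math import erf, sqrt


def cpk(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Process capability index: distance from the mean to the nearer
    specification limit, in units of three standard deviations."""
    return min(usl - mean, mean - lsl) / (3.0 * sigma)


def normal_cdf(x: float, mean: float, sigma: float) -> float:
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + erf((x - mean) / (sigma * sqrt(2.0))))


def fraction_out_of_spec(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Share of normally distributed parts below LSL or above USL."""
    below = normal_cdf(lsl, mean, sigma)
    above = 1.0 - normal_cdf(usl, mean, sigma)
    return below + above


# Hypothetical values; the article gives no actual dimensions or limits.
LSL, USL = 9.7, 10.3   # spec limits, assumed equal to the functional limits
NOMINAL = 10.0         # process mean at the supplier
HOT_SHIFT = 0.15       # assumed shift of the effective dimension in the hot region

for label, sigma in [("original process", 0.05), ("faster process", 0.075)]:
    c = cpk(NOMINAL, sigma, LSL, USL)
    p_hot = fraction_out_of_spec(NOMINAL + HOT_SHIFT, sigma, LSL, USL)
    print(f"{label}: Cpk = {c:.2f}, out of limits in hot region = {p_hot:.2%}")
```

With these assumed numbers, the faster process still meets a commonly used 1.33 Cpk target (the original is at 2.00), but the share of parts outside the limits under hot conditions rises from about 0.13% to about 2.3%, roughly seventeen times more. Nothing fails inspection at the supplier, because at nominal conditions both processes are comfortably within the limits.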

Why did both of these failures make it to the customer? On the surface, they look like very different problems created and solved by different parts of the organization. One was a supplier problem, the other a design engineering oversight. However, when you take the view of what filters failed, similarities emerge. In both cases:

  • The specification documents were very high level and didn’t describe the operating conditions in detail. Because the operating temperature for the first product and the actual loads on the second were not specified, the teams could not translate the requirements into the proper tolerances or analyze the risks to performance across operating conditions. The supplier would have caught the problem if the appropriate tolerance had been set in the specification. Testing at the proper loads would have exposed the material problem earlier.
  • Both concepts were locked in early because of tight schedules: there was pressure to commit to a concept so that tooling could be completed. Ironically, in both cases the schedules slipped anyway. The teams were forced to work with a concept that hadn’t been sufficiently evaluated and spent the “extra” time getting a sub-optimal idea to work.
  • Design reviews were essentially status meetings. Especially in the second case, the teams were not familiar with the new material and assumed it would be fine. Design teams were unable to critically evaluate their own designs and were blind to the risks. No external reviewers were brought in to provide additional perspectives. When concerns were raised, teams dismissed them by either hypothesizing why the issue wasn’t a problem or saying “we don’t have time to test everything.”
  • There was insufficient engineering analysis and user testing. The teams assumed that since the products worked well in bench tests and passed the standard suite of evaluations, the quality was sufficient. User testing was haphazard and didn’t cover all of the use conditions. Even when products failed, the failures were explained away. During pilot there were so many design changes that the teams assumed they had already fixed the failures that had been identified.
  • Design changes were made right up to the start of production. Many of the rules for the pilot process were circumvented to accommodate last-minute minor changes that only marginally improved performance. Late design changes were allowed to bypass testing and validation protocols.
  • A combination of schedule pressure, inexperience, language and cultural barriers, and time zone differences meant that design-for-manufacturability feedback from suppliers was either never given or never received. Suppliers were asked “can you make this?” and the answer was always yes.

While the technical root causes of the two failures look very different (one was a supplier change, the other a material specification problem), the activities (or lack of activities) that allowed them to escape to the customer had a lot in common. If the company continues with its existing product development approach, there is a high probability that another quality problem will emerge.

When there is a failure, organizations need to take a hard look at everyone’s culpability in contributing to the problem. Significant research shows that companies that are open about their failures and discuss them are less likely to repeat them. The laundry list of problems can seem overwhelming. However, catching the next quality problem requires only one of the filters to work.
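The claim that only one filter needs to work can be made concrete with a simple probability model. The sketch below treats each development activity as a filter with a hypothetical catch rate and assumes the filters act independently, so a defect escapes only if every filter misses it. The activity names and the 10% and 70% rates are illustrative assumptions, not data from the cases above.

```python
"""Illustrative sketch: development activities as independent defect filters.
Activity names and catch rates are hypothetical."""
from math import prod


def escape_probability(catch_rates):
    """Probability that a defect passes every filter, assuming each filter
    catches it independently with the given probability."""
    return prod(1.0 - rate for rate in catch_rates)


# Hypothetical, uniformly weak filters (each catches 10% of defects).
filters = {
    "specification": 0.1,
    "concept selection": 0.1,
    "design review": 0.1,
    "engineering analysis": 0.1,
    "user testing": 0.1,
    "supplier DFM feedback": 0.1,
}

# Fix just one filter: turn design reviews from status meetings into real reviews.
improved = {**filters, "design review": 0.7}

print(f"escape probability, all filters weak: {escape_probability(filters.values()):.1%}")
print(f"escape probability, one filter fixed: {escape_probability(improved.values()):.1%}")
```

Under these assumptions, raising a single filter’s catch rate from 10% to 70% cuts the escape probability from about 53% to about 18%. Real filters are rarely independent: a specification that omits the operating conditions weakens every downstream test at once, which is one reason the same escape pattern shows up behind seemingly unrelated failures.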

The next time your organization has a quality failure, don’t stop once the technical cause is found. Look at all of the reasons why the problem wasn’t caught and try to fix at least one of them. In subsequent articles, I will provide a framework for bringing teams together to systematically evaluate the whole process.

Anna Thornton earned her B.S.E. from Princeton University and her Ph.D. from Cambridge University, and was an Assistant Professor of Mechanical Engineering at MIT for six years. In 2000 she joined Analytics Operations Engineering, where she currently works with a wide variety of companies in multiple global industries to develop new tools, methods, and approaches to problems in product development, quality systems, and production systems. She is the author of Variation Risk Management and many other articles on product development and quality.
Other posts by Anna Thornton: Stop playing quality-whack-a-mole.