From One Test to Many: Why “How You Sample” Matters

In the early 2000s, Takata was one of the world’s largest airbag manufacturers. Their inflators were tested routinely, passed specifications, and shipped globally.

But over the next decade, millions of those airbags ruptured during deployment, sending metal fragments into vehicle cabins, causing deaths and injuries. It became one of the largest recalls in automotive history.

What’s most troubling is that Takata did test these inflators, and the data looked fine. The company ran validation tests, production audits, and qualification builds. They collected data, but their sampling didn’t represent the real variation in materials, humidity exposure, or propellant aging. Takata’s internal testing and reporting often pooled inflators from different production lots and environmental conditions. When results were summarized in aggregate form, the influence of those factors, and the real variation behind them, was hidden inside averages (see the Senate Commerce Committee report).

Aside from any cultural or political issues within the company, they were collecting the wrong kind of data and summarizing it poorly. Takata’s problem wasn’t a lack of testing; it was a failure to understand what their data represented. Even when using a variable measurement system and large sample sizes, how does a failure of this scale still occur? And if a risk like this can make it through a launch process, what should we be doing differently?

The Next Step on the Test Continuum – Variable Responses with Distributions

In the last article, we looked at “test to pass” systems that only record a binary outcome. This next stage looks similar on the surface, still evaluating against a pass criterion, but the change is that we now record variable data and collect a representative sample.

This critical shift brings the advantage that variation can now be studied, and studying variation is how we discover new knowledge. The move from outcomes to measurements allows for a change in perspective: everything exists on a gradient, variation is always present, and truth is never fully known. A single outcome is never the whole story. What we’re after is understanding the system, not simply conformance. The test method we’re examining in this article is better than the last one, but as you’ll see in the next few articles, we still have a way to go.

In this test method, sampling is leveraged to generate a distribution of results. Now the team can understand both the variation itself and how close performance sits to the threshold. Not all sampling is created equal, though, so next we’ll explore how to obtain those samples, along with the topic of sample size.
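To make that concrete, here’s a minimal sketch in Python using made-up measurements and an assumed lower threshold of 50 (both hypothetical, not from any real test). It summarizes the sample as a distribution and reports how much margin the typical result has over the threshold, in standard-deviation units, rather than a single pass/fail bit.

```python
import statistics

# Hypothetical measurements of a variable response (units arbitrary)
measurements = [54.2, 55.1, 53.8, 56.0, 54.7, 52.9, 55.4, 54.1]
threshold = 50.0  # assumed lower limit, for illustration only

mean = statistics.mean(measurements)
stdev = statistics.stdev(measurements)  # sample standard deviation

# How far the typical result sits above the threshold, in sigmas
margin_sigmas = (mean - threshold) / stdev

print(f"mean={mean:.2f}, stdev={stdev:.2f}, margin={margin_sigmas:.1f} sigma")
```

Two samples with the same pass rate can have very different margins; that margin is exactly the knowledge a binary record throws away.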


Why more data isn’t always better – Random vs. Rational Subgrouping

When planning a test, the common assumption is that more samples are always better. However, more doesn’t necessarily mean better, either from a business ROI standpoint or from a knowledge standpoint.

Random sampling’s goal is to obtain an unbiased representation of the entire population when the process is stable. This means selecting units from a population purely by chance, so every unit has an equal probability of being chosen. When considering product development or manufacturing processes, is a truly random sample even possible? I’d argue it isn’t, especially in analytical work like test engineering. In cases like this, the full population doesn’t even exist yet, isn’t in a location where logistics would allow proper random sampling, and, most important of all, doesn’t provide the right type of data for predicting future performance.
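For contrast, here’s what simple random sampling looks like mechanically, as a sketch that assumes the finished population already exists as a list of serialized units (which, as argued above, is rarely true in development work):

```python
import random

# Hypothetical population of serialized finished units. In development
# work this full population usually doesn't exist yet, which is the problem.
population = [f"unit_{i:04d}" for i in range(5000)]

# Simple random sample: every unit has an equal chance of being chosen.
sample = random.sample(population, k=30)
print(sample[:5])
```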

Rational subgrouping’s goal is to identify and separate sources of variation in a process. This means intentionally sampling according to theories about the sources of variation identified during process or product mapping activities. Notice that doing this requires subject-matter knowledge. I like that it forces you to be engaged with the process or design you’re trying to improve or validate. This “get to the gemba” concept is critical for knowing which subgrouping will be applicable. Assigning specific sources to either within-subgroup or between-subgroup variation allows for more meaningful cause-and-effect studies.

Both sampling methods can create a distribution, but only rational subgrouping can efficiently capture the full variation and then partition it into the various sources sampled across.
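Here’s a rough sketch of that payoff using invented numbers: when samples are subgrouped by a suspected source of variation (production lot, in this hypothetical), the overall spread can be split into within-subgroup and between-subgroup components.

```python
import statistics

# Hypothetical measurements rationally subgrouped by production lot
subgroups = {
    "lot_A": [55.0, 54.6, 55.3, 54.9],
    "lot_B": [51.2, 50.8, 51.5, 51.0],
    "lot_C": [53.1, 53.4, 52.8, 53.0],
}

# Within-subgroup variation: average of the per-lot sample variances
within_var = statistics.mean(
    statistics.variance(vals) for vals in subgroups.values()
)

# Between-subgroup variation: variance of the lot means
lot_means = [statistics.mean(vals) for vals in subgroups.values()]
between_var = statistics.variance(lot_means)

print(f"within-lot variance:  {within_var:.3f}")
print(f"between-lot variance: {between_var:.3f}")
# A large between-lot component points at lot-to-lot causes that a
# pooled random sample would have blended into one wide distribution.
```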

Is It a Good Test?

Referencing back to the first article of this series, I offered criteria for evaluating test approaches across a spectrum of sophistication. Let’s see how measuring a variable Y with a sample stacks up.

Serve a clear purpose: The purpose is there although still a bit simple. Verify whether a distribution meets a threshold.

Allow for inference that supports practical and statistical conclusions: No, still falling short here. There’s only a very limited ability to predict anything about system performance, and it’s heavily dependent on the test specifics and the level of subject-matter expertise.

Include variation that reflects the real customer environment: Not really. Again, it’s up to the specifics of the test to replicate a real-life worst-case environment. Only sample variation is represented; no environmental variation.

Connect directly to design or process decisions: A small amount of variation around process and product conditions can be learned. At this stage it can get dangerous or misleading, as we saw in the airbag recall example: the sample can bring a false sense of certainty if used improperly for decision making.

Provide a reasonable return on investment: Still very little knowledge is gained about true product, process, and environmental variation. The test cost would need to be very minimal for the knowledge-to-cost ratio to make sense.

Overall, although the question is now asked of a distribution, this approach still only answers “Does it pass?” and doesn’t come close enough to the better question: “How and why does it perform this way?”

Composite Flooring Example

An RV manufacturer is seeing a considerable rise in consumer demand and recognizes the volatility of its supply chain, so it wants to validate a new secondary supplier for a structural composite panel used in flooring. The company needs to ensure the new material meets a flexural strength standard of 50 MPa before approving it for production.

Rather than just testing a handful of panels and calling it a pass/fail check, the team decides to measure the flexural strength of several panels from two production lots. This generates continuous data they can analyze statistically: they can check process stability using control charts, build a larger inference space by sampling two production lots instead of one, and compare the data against the 50 MPa threshold.

Both Lot A and Lot B averages are above the threshold. A simple approach that looked only at averages would conclude success, but in reality, many individual panels would fail in use.

[Figures: control charts of panel flexural strength for Lot A and Lot B against the 50 MPa threshold]

Since the team took the time to collect and plot variable data from two separate material lots, they’re able to see that this supplier’s product is too near the threshold to be approved. The supplier’s process isn’t robust to material-lot variation. Lot A alone would have been approved, since its lower control limit is above the threshold, but we can clearly see that Lot B performed much worse. Improvements would be needed before using this supplier in production.
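As a rough sketch of the kind of check the team ran, the snippet below uses invented lot data in MPa (not the article’s actual results) to compute each lot’s mean, an approximate individuals-chart lower control limit, and, assuming rough normality, the estimated fraction of panels below the 50 MPa threshold.

```python
import math
import statistics

threshold = 50.0  # MPa, required flexural strength

# Hypothetical flexural strength results (MPa), not the article's data
lots = {
    "Lot A": [57.8, 58.4, 57.1, 59.0, 58.2, 57.6, 58.8, 57.9],
    "Lot B": [52.3, 50.9, 53.1, 51.6, 52.8, 50.4, 53.5, 51.2],
}

for name, vals in lots.items():
    mean = statistics.mean(vals)
    sd = statistics.stdev(vals)
    lcl = mean - 3 * sd  # rough lower control limit for an individuals chart

    # Normal-model estimate of the fraction of panels below the threshold
    z = (threshold - mean) / sd
    frac_below = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

    print(f"{name}: mean={mean:.1f} MPa, LCL={lcl:.1f} MPa, "
          f"est. {frac_below:.1%} below {threshold} MPa")
```

With numbers like these, both lot averages clear 50 MPa, yet Lot B’s lower control limit falls below the threshold and a few percent of its panels would be expected to fail, which is the pattern the charts above show.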

This approach demonstrates that by measuring variable data and applying some critical thinking and statistical methods, the team gains actionable knowledge about how and why the product performs the way it does, rather than simply whether it passes a threshold. It’s a small step beyond pass/fail testing, but it dramatically improves the insights available for decision-making.

Beyond Pass/Fail

Takata’s engineers eventually found that long-term exposure to high humidity degraded the propellant and caused overpressure during deployment. The variation had been there all along; it was just masked by how the data were grouped and summarized.

The lesson for any team isn’t about airbags. It’s about the shift from testing to confirm toward testing to learn. If your data don’t reflect the real variation in your system, the lessons they offer will be shallow.
