Big data gets the headlines, but most decisions are based on small data, and human beings are naturally bad with it.
by Jason Martin
The Army’s ability to generate data from many sources is world class. However, my experience is that the Army isn’t doing the best job of creating the right data and turning it into the information decision-makers need most. The reason is simple. As a whole, we have failed to recognize a weakness that scientists have shown nearly everyone shares, and we have failed to use the statistical methods that improve our ability to create and use small data.
Small data is the basis for many, if not most, of our acquisition decisions. There is no set definition for it, but it answers specific questions. It is gathered from planned experiments and tests (I’ll use these words interchangeably) such as laboratory bench experiments, system-level field tests, simulations and more. Small data can consist of fewer than 10 to hundreds, and possibly thousands, of data points, depending on the situation. When we consider all of the decisions we make based on small data, from basic research to making design tradeoffs to system fielding and sustainment and everything in between, it isn’t a stretch to think that the Army makes hundreds or even thousands of decisions involving small data every day.
Big data is a popular topic, and for good reason. The Army is putting significant emphasis on improving our ability to use it. To demonstrate why we must also focus on our abilities to work with small data, consider an unmanned aerial system that autonomously detects hostile forces. The system likely would be developed using big data in the form of pictures or videos to train the system to distinguish between civilians, friendly forces and hostile forces. But when it comes time to evaluate how well the system works and to make fielding decisions, small data from developmental and operational tests most likely will be used. Small data probably would be used in designing the system, as well. Both big and small data are important.
THE BEST LAID PLANS…
Here’s a real-world example demonstrating the challenges of small data and what can happen if we don’t understand it. I worked on a test in which we measured whether a vehicle could avoid detection by a sensor. If it did, then we recorded a success. If not, a failure.
During the test, we systematically controlled five variables to determine their effect on the ability of the vehicle to avoid detection. The variables included the distance from the sensor to the vehicle, the speed of the vehicle and the aspect of the vehicle relative to the sensor, among others.
The test plan we developed involved a few hundred data points that would enable us to learn not only the effects of the five variables, but also effects of all two-way combinations of the five variables. For example, we didn’t just want to understand how distance affected detection; we wanted to understand how the effect of distance changed as speed changed.
This is one of the underappreciated challenges of small data. The effects of combinations of variables are often important, and likely will become even more important as the systems we develop become more complex. Experiments must be planned appropriately to ensure we are able to understand these combinations, or interactions.
While it may take a little time to grasp the concept of an interaction fully (it did for me), interactions are very common. If you’re familiar with baking, you know that, to a point, adding salt can enhance other flavors. This is an example of an interaction between salt and another ingredient; the effect of the other ingredient on flavor depends on how much salt is added. More generally, in a scientific formula, every time you see two or more values multiplied or divided, that’s an interaction. Though common, interactions often are not considered when planning a test and analyzing data, and we learn less than we could as a result.
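To make the idea concrete, here is a small Python sketch with made-up ingredients and coefficients (nothing here comes from a real recipe or test). The point is only that the effect of one variable depends on the level of another.

```python
# A hedged sketch (made-up coefficients) of how an interaction works numerically.
# The "flavor" score depends on sugar, on salt, and on a salt*sugar interaction:
# the boost that sugar gives to flavor changes depending on how much salt is present.

def flavor_score(salt, sugar):
    # Hypothetical model: the 0.8 * salt * sugar term is the interaction.
    return 1.0 * salt + 2.0 * sugar + 0.8 * salt * sugar

# Effect of adding one unit of sugar with no salt vs. with one unit of salt:
print(flavor_score(0, 2) - flavor_score(0, 1))  # 2.0  (sugar's effect, no salt)
print(flavor_score(1, 2) - flavor_score(1, 1))  # 2.8  (sugar's effect, with salt)
```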
…CAN GO AWRY
After the test was complete, I was told that our original plan was modified midway through the test. An operator made an error when executing a move to avoid the sensor, and the vehicle avoided detection (a success) when it wasn’t expected to. So the same “error” was tried again, and the result was another success. After a dozen or more trials with the same “error,” there were more successes than experts expected, so they concluded that the change most likely improved the ability of the vehicle to avoid detection. This was the first mistake.
The rest of the test was altered to include the change with the expectation of more thoroughly demonstrating an important improvement. While changing a test is frequently necessary, engineers made a second mistake in the way the test was modified.
Unfortunately, the two mistakes prevented us from understanding whether the “error” was actually an improvement and, worse, from understanding the effects of the original five variables, the original goal of the test. I should mention that the engineers who made these mistakes are excellent engineers, among the best in their field. If they can make these mistakes, any of us can.
WHAT HAPPENED?
How could so much go wrong when some excellent engineers made a seemingly simple change to a test? Working with small data has challenges that are often underappreciated, and the testers weren’t aware of two common challenges and how to address them. It is important to point out that while this example involves modifying a test, these challenges are equally relevant when initially planning a test, and the same mistakes described below are often made at that time.
The first challenge is knowing how much data is necessary to make a decision. Though we can’t know for sure, for reasons that will be explained below, it appears the unexpected successes were not a result of the “error,” but just random occurrences. This is the same kind of randomness that allows you to get eight heads when you toss a coin 10 times and then get four heads in the next 10 tosses. It was as if the vehicle went on a lucky streak and avoided the sensor more times than expected, but the streak was mistakenly attributed to the “error.” The decision to change the test was made without enough data to distinguish a lucky streak from something meaningful. To avoid such mistakes, we have to collect enough data (but not so much that the test becomes overly costly) to determine with acceptable risk whether something happens for an identifiable reason or just randomly.
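To illustrate the point, here is a short Python sketch using purely hypothetical numbers: an assumed 50 percent baseline success rate, a dozen trials and nine successes, none of which come from the actual test. It shows that a streak like that is quite compatible with plain luck.

```python
# A hedged, illustrative sketch (the rates and counts are made up, not from the
# actual test): how often would chance alone produce a "streak" like the one
# that convinced the team the "error" helped?

import numpy as np
from scipy.stats import binomtest

expected_rate = 0.5     # assumed baseline chance of avoiding detection
n_trials = 12           # roughly "a dozen" repeated trials
observed_successes = 9  # hypothetical streak that looked impressive

# Exact test: probability of seeing a result at least this extreme by chance.
result = binomtest(observed_successes, n_trials, p=expected_rate, alternative="greater")
print(f"p-value: {result.pvalue:.2f}")  # about 0.07, which is weak evidence

# A simulation says the same thing: run many 12-trial tests with no real change.
rng = np.random.default_rng(1)
sims = rng.binomial(n_trials, expected_rate, size=100_000)
print(f"chance of >= {observed_successes} successes: {(sims >= observed_successes).mean():.2f}")
```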
The second challenge is planning or modifying a test in a way that allows us to understand clearly the effect of each variable on the result we are measuring. We wanted to understand how five variables, including their two-way interactions, affected the ability of a vehicle to avoid a sensor. In the middle of the test, we added the goal of understanding whether the “error” improved this ability.
While it is sometimes necessary to modify a test, this one was changed in a way that confounded the “error” with some of the other controlled variables, so we couldn’t tell what variables actually affected the vehicle’s ability to avoid detection. The people who made the change had no idea this had happened. The sidebar, “UNFORCED ERROR,” provides a simple example to explain what happens when a test is designed or modified incorrectly.
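A toy example, with invented runs and variable names rather than the real test matrix, shows what confounding looks like in the data: one column of the plan is an exact copy of another, so no analysis can separate their effects.

```python
# A hedged toy example (made-up runs, not the real test matrix) of confounding:
# suppose the "error" technique was only used on runs that also shared another
# setting, so its column is a copy of that variable's column.

import numpy as np

# Each row is a test run: [long_distance, high_speed, error_used]
runs = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])

# The "error_used" column is identical to the "long_distance" column...
print(np.array_equal(runs[:, 0], runs[:, 2]))  # True

# ...so a regression cannot separate the two effects: the design matrix
# (with an intercept column added) is rank-deficient.
X = np.column_stack([np.ones(len(runs)), runs])
print(np.linalg.matrix_rank(X), "of", X.shape[1], "columns are independent")  # 3 of 4
```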
A SYSTEMIC PROBLEM
Small data decisions are difficult for nearly everyone. In his book “Thinking, Fast and Slow,” Nobel Prize laureate, psychologist and behavioral economist Daniel Kahneman discusses the weakness humans have in our intuition about small data. We tend to think that small amounts of data tell us more about future events than they do. This appears to be what happened when engineers believed the “error” had an impact on the ability of the vehicle to avoid the sensor. Their intuitions about the meaning of the data they collected failed them, and they didn’t know the statistical methods that would help them avoid the mistake.
The engineers made mistakes when modifying the test because they didn’t understand the necessity or the use of the statistical methods, known as design of experiments, that were needed to modify (and initially create) the test correctly. They neither realized the need for those methods nor knew how to apply them; in the terms of the “UNFORCED ERROR” sidebar, they didn’t know how to make sure Bert and Ernie push from adjacent sides of the box.
The mistakes happened because, through no fault of their own, some excellent engineers did not understand a few fundamental statistical concepts.
It is tempting to think that engineers and scientists who are good with numbers are also naturally good at collecting and analyzing data. This is not true. Creating the right data to help answer a question and analyzing it in the most informative way requires an understanding of statistical methods that allow us to deal effectively with randomness and uncertainty. Intuition is completely insufficient.
While the Army certainly has individuals and groups with expertise in working with small data, mistakes with small data are systemic and partly a result of deficiencies in engineering and scientific curricula. Most college graduates in the sciences and engineering arrive in the workforce with an understanding of equations and theories, but with limited skills to deal with the random variability inherent in the real world. Statistical methods allow us to cope with this variability when deciding what data to create, when analyzing the data to develop useful information and when making decisions. Even those with academic training often struggle to apply statistical methods to complex military problems. And though on-the-job training in the effective use of statistical methods is available, few people receive it.
This lack of statistical knowledge has important consequences. Without using statistical methods to plan an experiment, it is more likely to provide too much data (overly costly), to provide too little data (not enough to answer questions accurately) or to have little hope of providing the data needed to answer the questions of interest. The latter is what happened when the test was modified in the sensor example above. A thorough explanation of these points may be useful, but it would also be a bit much to cover in this article. Suffice it to say that if people plan an experiment by thinking of interesting things to do, without using statistical methods to create and evaluate the plan, it’s easy to unknowingly make a mistake. Using design of experiments helps us leverage our knowledge to avoid these mistakes.
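As a simple illustration of what design of experiments provides, here is a sketch using generic two-level factors rather than the actual test variables. In a full factorial plan, the columns for the main effects and the two-way interactions are orthogonal, which is what lets each effect be estimated cleanly.

```python
# A hedged sketch of the design-of-experiments idea (generic two-level factors,
# not the actual test variables): a full factorial plan in three variables lets
# every main effect and every two-way interaction be estimated cleanly.

from itertools import product
import numpy as np

# Code each variable as -1 (low) / +1 (high); 2^3 = 8 runs covers all combinations.
design = np.array(list(product([-1, 1], repeat=3)))

# Build columns for the three main effects and the three two-way interactions.
a, b, c = design[:, 0], design[:, 1], design[:, 2]
X = np.column_stack([np.ones(8), a, b, c, a * b, a * c, b * c])

# In a well-planned design these columns are orthogonal (off-diagonal entries are
# zero), so each effect can be estimated without being tangled up with the others.
print((X.T @ X).astype(int))
```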
Once data is collected from a well-designed experiment, we should use statistical methods such as regression analysis to understand how the variables we controlled in our experiment (and sometimes those we didn’t) affect the results we measured. Where our intuition often fails us, statistical analysis allows us to understand better whether changes in our data were caused by changes to the controlled variables or by randomness. Furthermore, we can understand the uncertainty in our conclusions. Understanding our uncertainty is crucial to making decisions that appropriately consider risk. Take a look back at Figure 2. Without considering the uncertainty, it would be difficult to make good decisions that rely on knowing how hard Bert and Ernie are pushing the box. Just knowing the average values (the black dots) isn’t enough. We must consider the uncertainty because it is directly related to the risk of a wrong decision.
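For readers who want to see what such an analysis looks like, here is a hedged sketch using simulated pass/fail data and invented variable names. It fits a logistic regression with an interaction term and reports confidence intervals rather than just point estimates.

```python
# A hedged sketch (simulated data, invented variable names) of the kind of
# analysis described above: fit a model to pass/fail results, including an
# interaction, and report the uncertainty in each estimated effect.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
distance = rng.uniform(1, 10, n)   # notional standoff distance
speed = rng.uniform(5, 30, n)      # notional vehicle speed

# Simulated truth: detection depends on distance, speed and their interaction.
logit = -1.0 - 0.4 * distance + 0.08 * speed + 0.02 * distance * speed
detected = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame({"detected": detected, "distance": distance, "speed": speed})

# "distance * speed" expands to both main effects plus their interaction.
fit = smf.logit("detected ~ distance * speed", data=df).fit(disp=0)
print(fit.params)      # point estimates
print(fit.conf_int())  # 95% confidence intervals (the uncertainty that matters)
```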
Without appropriate statistical methods, it’s easy to plan an experiment, execute it, analyze the data, report results and make a decision without ever knowing mistakes were made. Such decisions are built on a house of cards that can be costly in terms of dollars, time or even lives.
A SYSTEMIC SOLUTION
The hole in our small data capabilities also presents a tremendous opportunity. For each of the thousands of small data decisions we make, we can learn to use statistical methods that help ensure that we 1) spend appropriate resources to collect the right amount of data, 2) collect the right data to most fully answer our questions and 3) perform analysis that most accurately quantifies what we believe in a way that communicates the uncertainty in conclusions. This will fundamentally change our abilities to most effectively use resources and take calculated risks.
QUESTIONS FOR LEADERSHIP
I know from experience that widespread adoption of the statistical methods we need is not likely to happen without strong leadership. Decision-makers must encourage it by asking the right questions and insisting that we use rigorous statistical processes to create and analyze data. We need leaders and decision-makers to know which questions to ask and how to recognize an adequate answer.
Here are examples of some important questions and information we should always know. The answers should be based on rigorous statistical methods, not opinions.
- Is that test the right size? Do we need more or fewer test runs? What assumptions were made to determine the size of the test and why? Please show me the (simple) results of calculations that support the plan. (A sketch of such a calculation appears after this list.)
- Exactly what do I expect to learn from this test and how much uncertainty can I expect to face when making a decision based on it?
- How do I know that we can understand the effect of every variable in the test on the outcome? Will I be able to understand how the effect of one variable changes as another variable changes? If not, why don’t we think that is important to know?
- How much uncertainty is there in the conclusions you have provided? (The answer should be quantitative, not just an opinion. For example, “We think the answer is 4, but statistical analysis indicates the answer is between 3 and 5 with 95 percent confidence.”)
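As an example of the kind of simple, quantitative support these questions call for, here is a short sketch with illustrative numbers only: a back-of-the-envelope sample-size calculation and an estimate reported with its confidence interval.

```python
# A hedged example (illustrative numbers only) of the simple calculations these
# questions ask for: roughly how many runs are needed, and what a quantified
# answer with a confidence interval looks like.

import numpy as np
from scipy import stats

# First question: about how many pass/fail runs to estimate a success rate to
# within +/- 10 percentage points at 95% confidence? (worst case p = 0.5)
z = stats.norm.ppf(0.975)
margin = 0.10
n_needed = (z**2 * 0.5 * 0.5) / margin**2
print(f"about {int(np.ceil(n_needed))} runs")  # about 97

# Last question: report the estimate with its uncertainty, not just a point value.
measurements = np.array([3.8, 4.1, 4.4, 3.6, 4.3, 4.0, 3.9, 4.2])  # notional data
mean = measurements.mean()
ci = stats.t.interval(0.95, len(measurements) - 1,
                      loc=mean, scale=stats.sem(measurements))
print(f"estimate {mean:.1f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
```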
These questions are straightforward; any good test plan and resulting data analysis addresses them. Understanding the high-level statistical concepts needed to ask them and to assess answers does not require in-depth knowledge of statistics. Anyone can learn with a reasonable amount of training. Asking these questions will encourage due diligence from those collecting data, performing analysis and creating information used to make decisions.
Providing appropriate answers to the questions above will certainly require more statistical knowledge than asking them and recognizing an adequate answer. Will this require everyone working with data to become a statistician? Not at all, but it is necessary that those planning experiments and analyzing data have an understanding of design of experiments and statistical analysis and know when to call someone more knowledgeable. Just as nearly everyone needs to understand Microsoft Excel or PowerPoint at an appropriate level to do a job, nearly everyone should understand statistical concepts at an appropriate level. Otherwise, we are expecting people to manage risk without the skills needed to understand and cope with the uncertainty that causes the risk.
Fortunately, there are already small groups in the Army, throughout DOD and in private industry that have in-depth knowledge of applying statistical methods to the development of military systems. Some have developed extensive training. We have a small but very capable base to grow from. In addition, commercial and free software tools have seen significant increases in capabilities over the last decade. Finally, while in-person training is often more effective, we have learned to work and train remotely over the past year, and training can be done more efficiently than ever. Everything is in place to make the needed improvements in our capabilities to plan tests, analyze data and create the information needed to best support decision-makers. We just need leaders to help us focus.
CONCLUSION
Though we have pockets of excellence, the Army has a systemic weakness in its ability to efficiently create small data and turn it into the most useful information for decision-makers. By recognizing and understanding this weakness, we create an opportunity to fundamentally change our ability to develop military systems. For every decision based on small data, our goal must be to create the information needed using the right statistical methods.
This will only happen if leaders make an effort to truly understand our current weaknesses, recognize the opportunity and begin to lead the change. Our ability to maintain military superiority may depend on it. For a given amount of resources, there is a significant risk we will achieve less than those who effectively apply statistical methods to small data, because day in and day out, they will make better decisions, both small and large. It will take time to develop the capabilities we need, but all of the necessary pieces are in place to begin to improve. To maintain our position as the world’s most powerful military, we need leadership to help us get started.
For more information on application of statistical methods to planning experiments, conducting appropriate analyses and providing the most useful information to decision makers, visit www.testscience.org.
JASON MARTIN has been test design and analysis lead for 10 years at the U.S. Army Combat Capabilities Development Command Aviation & Missile Center. He has an M.S. in statistics from Texas A&M University and an MBA and a B.S. in mechanical engineering, both from Auburn University. He is Level III certified in test and evaluation and in engineering.