PUTTING AI TO THE TEST: Soldiers assigned to the 6th Squadron, 8th Cavalry Regiment, and the Artificial Intelligence Integration Center, conduct drone test flights and software troubleshooting during Allied Spirit 24 at the Hohenfels Training Area, Joint Multinational Readiness Center, Germany, March 6, 2024. (Photo by Spc. Micah Wilson, Joint Multinational Readiness Center)
An approach through Agile development and model quality simulation.
by Capt. Hannah Fairfield, Capt. Dylan Hyde and Capt. John T. McCormick
The concept-development and acquisition communities have long treated artificial intelligence and machine learning (AI/ML) as speculative future technologies for next-generation military systems, but the Army can no longer ignore the problems of procuring and supplying AI/ML models in current military systems. Military transformation expert Peter W. Singer noted in his April 2024 article, “The AI Revolution is Already Here,” that “the battlefield applications of AI are quickly expanding from swarming drones to information warfare and beyond.” Though the U.S. Army has already started awarding contracts for AI, most of these contracts are for large monolithic platforms and unified systems such as Palantir Foundry, as reported by Lindsay Clark in her March 2024 article “Palantir wins U.S. Army contract for battlefield AI.” These systems do serve an essential purpose for the enterprise as a whole, but they also leave a critical gap in the acquisition of individual AI/ML models for narrowly scoped systems. The war in eastern Europe is revealing that the pace of adaptation in contemporary large-scale conflicts will require the rapid development and procurement of modular AI-enabled systems, such as first-person view drones and tactical dashboards, as described in Mick Ryan’s February 2024 article “Russia’s Adaptation Advantage.”
Based on our experience developing and deploying Griffin Analytics, a maintenance management and predictive logistics application currently employed by Army Aviation, we advocate two forward-thinking research and development processes that together will enable affordable and adaptable procurement of AI/ML systems. First, the Soldier-led Agile software development being pioneered by the Army Artificial Intelligence Integration Center (AI2C) and the Army Software Factory (ASWF) produces Army-owned code that can integrate AI/ML models while avoiding the vendor lock of monolithic systems. Second, the model-quality simulation framework being developed by cadets and faculty at the United States Military Academy Department of Mathematical Sciences translates AI/ML performance metrics into operational terms and establishes specific benchmarks for individual models. Together, these processes would make the procurement of AI/ML models more like ordering parts or components that improve existing Army processes than like purchasing major end items.
ARMY ARTIFICIAL INTELLIGENCE INTEGRATION CENTER
As a direct report to Army Futures Command (AFC), AI2C plays a pivotal role in integrating AI/ML technologies into Army operations, driving innovation and adaptability. AI2C executes the Soldier-led Agile software development process, which is crucial for several reasons:
- Army-Owned Code: By developing software in-house, AI2C ensures the Army retains ownership and control over its codebase. This reduces reliance on external vendors and mitigates the risk of vendor lock-in, which can limit the military’s flexibility in adapting to new technologies or changing operational needs.
- Modular AI/ML Systems: The Agile development approach enables the creation of modular AI/ML solutions that can be integrated into various military systems. This flexibility allows the Army to deploy AI/ML models that are narrowly scoped to specific tasks, such as predictive analytics or decision support, facilitating rapid response to evolving challenges.
- Agile Response: The Soldier-led Agile development model promotes iterative and incremental improvements, allowing AI2C to refine its solutions based on real-time feedback from Army Soldiers, noncommissioned officers and officers. This continuous feedback loop ensures that the solutions remain relevant and effective, addressing the immediate needs of those on the ground.
Griffin Analytics, one of AI2C’s flagship prototypes, exemplifies the success of the Soldier-led Agile process. It is currently employed by XVIII Airborne Corps, Army Reserve Aviation Command and CENTCOM to facilitate better tracking and management of rotary-wing assets.
Griffin is an aviation maintenance management application that uses AI/ML algorithms to predict maintenance needs and logistical requirements for Army Aviation assets. This proactive approach minimizes downtime, ensures mission readiness and reduces costs associated with unexpected repairs or delays. The application provides real-time, data-driven insights into the performance and condition of aviation assets, allowing personnel to make informed decisions. These insights can be used to optimize resource allocation, manage maintenance schedules and improve overall operational efficiency. Griffin’s modular design allows it to integrate data from other military systems, incorporate new AI/ML models and be hosted on Army platforms from the tactical edge to the enterprise cloud.
Despite the success of the prototype, one persistent issue that the development team has identified is the lack of consistent requirements for Army software, particularly the predictive components of AI-enabled systems. The system’s modular design has made it easier to incorporate new AI/ML models, but without strict criteria and a connection between operational results and model quality, it was initially unclear what level of predictive accuracy the new models would require. The ongoing research collaboration with faculty and cadets at West Point has started to address this gap through model-quality simulations and operational impact assessment.
WEST POINT DEPARTMENT OF MATHEMATICAL SCIENCES
Currently, cadets and faculty at West Point are researching ways to translate model quality to the Army’s vernacular through the perspective of unit predictive maintenance. If an AI/ML model can predict when a part breaks with a sensitivity of 90%, then what does that mean from the perspective of a maintenance officer? It is tempting to assume that if this model is used, the unit’s operational readiness rate will rise to at least 90%, but this is not generally true. Model sensitivity must be considered when translating an AI/ML model’s quality into operational readiness rates, but it is not the only metric that matters. For example, if a model is 90% sensitive but can only predict one operating hour into the future, then it provides almost no benefit to operational readiness, especially if parts are not on hand.
What is Sensitivity? Sensitivity is a quality metric that measures how well a model identifies the cases it is supposed to catch. In our predictive maintenance case, it is the number of times a model correctly predicts that a part will break, divided by the total number of times that the part actually breaks.
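The definition above can be illustrated with a short Python sketch. The function name and the sample records are hypothetical and exist only to show the arithmetic; they are not drawn from Griffin or from the West Point research.

```python
def sensitivity(predicted_failure, actual_failure):
    """Share of actual part failures that the model predicted.

    Both arguments are equal-length sequences of booleans, one entry per
    inspection record (hypothetical data layout, assumed for illustration).
    """
    pairs = list(zip(predicted_failure, actual_failure))
    true_positives = sum(1 for pred, act in pairs if pred and act)
    false_negatives = sum(1 for pred, act in pairs if not pred and act)
    actual_failures = true_positives + false_negatives
    if actual_failures == 0:
        raise ValueError("sensitivity is undefined when no part actually failed")
    return true_positives / actual_failures


# Hypothetical sample: 10 real failures, 9 of which the model flagged,
# plus 5 healthy parts, 1 of which was flagged by mistake.
actual = [True] * 10 + [False] * 5
predicted = [True] * 9 + [False] + [True] + [False] * 4

print(sensitivity(predicted, actual))  # 0.9
```

Note that the one false alarm on a healthy part does not change the result. Sensitivity only counts how many real failures were caught, which is one reason it cannot describe a model's operational value by itself.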
The framework for mapping an AI/ML model’s quality to a unit’s operational readiness rate has been developed by cadets and faculty at West Point. Creating this model-quality-to-readiness map requires historical information about the vehicle and its components, such as the number of vehicles in the fleet and how often the components of each vehicle fail. Given this information, a simulation maps the model quality to expected average operational readiness rates for a fleet of vehicles. Currently, model quality is represented by two parameters: the model’s sensitivity and how far into the future it can predict.
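To make the idea concrete, the following Python sketch shows one way such a simulation could be structured. It is not the West Point framework. It is a minimal Monte Carlo model built on assumptions we label plainly: failures arrive at random with a fixed mean time between failures, an unpredicted failure grounds the vehicle for the full parts lead time plus repair time, and a predicted failure lets the unit order parts early by the length of the prediction horizon. False alarms and inventory limits are ignored, and every parameter value is hypothetical.

```python
import random


def simulate_readiness(sensitivity, horizon_hours, *, fleet_size=20,
                       mtbf_hours=320.0, part_lead_hours=72.0,
                       repair_hours=8.0, sim_hours=20000.0, seed=0):
    """Estimate average fleet readiness as the share of vehicle-hours spent up.

    All parameters are illustrative assumptions, not Army data.
    """
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    up_hours = 0.0
    for _ in range(fleet_size):
        clock = 0.0
        while clock < sim_hours:
            # Time until the next failure (exponential, assumed).
            time_to_failure = rng.expovariate(1.0 / mtbf_hours)
            uptime = min(time_to_failure, sim_hours - clock)
            up_hours += uptime
            clock += uptime
            if clock >= sim_hours:
                break
            # The model catches this failure with probability = sensitivity.
            predicted = rng.random() < sensitivity
            if predicted:
                # Parts were ordered horizon_hours before the failure.
                wait_for_part = max(0.0, part_lead_hours - horizon_hours)
            else:
                wait_for_part = part_lead_hours
            clock += wait_for_part + repair_hours
    return up_hours / (fleet_size * sim_hours)


baseline = simulate_readiness(0.0, 0.0)        # no predictive model
short_warning = simulate_readiness(0.9, 1.0)   # 90% sensitive, 1 hour ahead
long_warning = simulate_readiness(0.9, 72.0)   # 90% sensitive, 72 hours ahead
print(f"baseline={baseline:.3f} short={short_warning:.3f} long={long_warning:.3f}")
```

Under these assumptions, the 90% sensitive model with a one-hour horizon barely moves readiness from the baseline, while the same sensitivity with a horizon that covers the parts lead time raises it substantially. Neither result equals 90%, which is the point of mapping model quality to readiness rather than reading readiness off the sensitivity figure.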
For example, in a fictional situation where a unit’s average operational readiness rate is 80%, historical data could be used to set the parameters of the simulation. Once these parameters are established, the simulation can identify the sensitivity and how far out an AI/ML model would need to predict in order to increase that unit’s average operational readiness rate to any given rate, such as 90%. This information can then be used to set the requirements for procuring an AI/ML model.
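The requirement-setting step in this fictional case can be sketched in Python as a simple search. To keep the example short, a closed-form steady-state approximation stands in for the simulation: readiness is taken as mean uptime divided by mean uptime plus expected downtime. The failure interval, parts lead time and repair time are hypothetical values chosen so that the baseline readiness is 80%, and the function names are ours.

```python
def expected_readiness(sensitivity, horizon_hours, mtbf_hours=320.0,
                       part_lead_hours=72.0, repair_hours=8.0):
    """Steady-state readiness under assumed, illustrative parameters."""
    # Warning time beyond the parts lead time gives no additional benefit.
    hours_saved = sensitivity * min(horizon_hours, part_lead_hours)
    expected_downtime = part_lead_hours + repair_hours - hours_saved
    return mtbf_hours / (mtbf_hours + expected_downtime)


def minimum_horizon(sensitivity, target, max_horizon_hours=168, **params):
    """Smallest whole-hour horizon that meets the target, or None if none does."""
    for horizon in range(max_horizon_hours + 1):
        if expected_readiness(sensitivity, horizon, **params) >= target:
            return horizon
    return None


print(expected_readiness(0.0, 0))  # baseline with no model: 0.8
for s in (0.6, 0.7, 0.8, 0.9):
    print(f"sensitivity {s:.0%}: minimum horizon = {minimum_horizon(s, 0.90)}")
# sensitivity 60%: minimum horizon = None
# sensitivity 70%: minimum horizon = 64
# sensitivity 80%: minimum horizon = 56
# sensitivity 90%: minimum horizon = 50
```

With these made-up numbers, a 90% readiness target could be written into a requirement as paired thresholds, for example at least 80% sensitivity with at least 56 hours of warning, while a 60% sensitive model could not meet the target at any horizon. A real requirement would come from the simulation run on the unit's historical data rather than from this approximation.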
Further work on this framework will need to implement metrics for inventory management. The use of AI/ML models has the potential to affect units’ budgets, storage space in Supply Support Activities and national supply chains. Because of this, it will be crucial to understand the impacts of AI/ML models on these areas prior to any large-scale implementation for the Army.
CONCLUSION
The transition from physical industrial products to digital software applications has led to major challenges for military technology development and acquisition. The integration of AI/ML models into these algorithmic tools will only exacerbate these challenges, not only because modern AI/ML is contingent on software, but also because the stochastic (random) nature of these models makes it difficult to determine what impact they will have on Army organizations and processes. Moreover, the state of the art for both software and AI/ML moves much faster than traditional technology for military products. As such, it will become increasingly important to maintain flexibility and adaptability in procuring AI/ML models. Additionally, while developing those models, it will be equally important to determine the level of performance required to achieve the desired operational impacts. The research and development approach described here addresses both of these challenges and can serve as a model for other parts of the Army Acquisition enterprise.
The Agile software development methodologies being leveraged by AI2C, the Army Software Factory and other AFC organizations offer a paradigm shift in military technology development. Army development teams, consisting of Army personnel with minimal contractor support, can create modular systems that are model agnostic, allowing for seamless integration of different AI/ML models as technology evolves. The AI2C Sustainment team has proven that iterative development with continuous employment by operational units is both possible and effective at delivering incremental value as models and technologies develop. Moreover, modular systems built on Agile principles are inherently flexible, enabling rapid customization and reconfiguration to meet specific mission needs. By leveraging in-house Agile software development, defense acquisition can overcome the challenges posed by the dynamic nature of AI/ML technologies, ensuring that military systems remain adaptable, responsive and future-proof.
In parallel, leveraging model quality simulations offers a strategic approach to determining the performance requirements for the AI/ML model. The research underway by the West Point Department of Mathematical Sciences validates this concept by developing performance benchmarks for individual AI/ML models, allowing AI2C to commoditize the predictive maintenance models being deployed by Griffin. As this collaboration matures, developers in AFC can continue to maintain the holistic AI-enabled systems while contracting out individual machine learning components, enabling the procurement processes to transition from acquiring end systems to purchasing modular AI/ML. Just as components in traditional manufacturing undergo stringent quality testing, AI/ML can be evaluated against predefined performance benchmarks defined in both machine learning and operational terms. By establishing clear requirements for individual AI/ML models using quality simulations, defense acquisition can ensure that AI-enabled systems deliver immediate value and meet the demands of future battlefields, all while streamlining the procurement process and avoiding vendor lock.
DISCLAIMER: The views expressed herein are those of the authors and do not reflect the position of the United States Military Academy, the Army Artificial Intelligence Integration Center, the Department of the Army or the Department of Defense.
For more information on our research and development collaboration, go to https://www.westpoint.edu/academics/departments/mathematical-sciences.
CAPT. HANNAH FAIRFIELD is a logistics officer currently serving as a data scientist in the Sustainment Portfolio of AI2C. A member of the second cohort of the Army Artificial Intelligence Scholars Program, she holds an M.S. in business intelligence & data analytics from Carnegie Mellon University and a B.S. in human geography from the United States Military Academy.
CAPT. DYLAN HYDE is a logistics officer currently serving as an instructor at the United States Military Academy Department of Mathematical Sciences. He has an M.S. in applied mathematics from the Naval Postgraduate School, an M.A. in international relations from the University of Oklahoma and a B.S. in information technology from the United States Military Academy.
CAPT. JOHN T. MCCORMICK is an Army ORSA (FA49) currently serving as a data scientist in the Sustainment Portfolio of AI2C. A member of the inaugural cohort of the Army Artificial Intelligence Scholars Program, he holds an M.S. in business intelligence and data analytics from Carnegie Mellon University and a B.S. in mathematics and military history from the United States Military Academy.