LOGISTICS FOR DATA


ON TRACK: As new data sources become available, so must methods to track data from those sources. (Photo by Anete Lusina, Pexels)

Data transportation and maintenance: Getting battlefield data to the right place at the right time.

Part three of a three-part series

by Thom Hawkins, Andrew Orechovesky and Sunny Zhang

Running out of toilet paper has a way of focusing your attention. In that way, the COVID-19 pandemic made many of us more familiar with logistics and the supply chain. We had to estimate the depletion rate of our supplies and the lead time to procure replacements. We discovered the convenience of delivery—not only books and household goods but also food, and even alcohol. The supply chain became so overwhelmed with demand that some of us were drafted into it, distributing misdelivered packages to our neighbors.

The first two entries in this series (“Logistics for Data,” Army AL&T Fall 2023 and “Logistics for Data,” Army AL&T Winter 2024) discussed the demand signal for data and the inventory and warehousing of data. This third and final entry will focus on data transportation, synchronization and provenance.

DATA TRANSPORTATION

Whether toilet paper and milk or fuel and ammunition, logistics ensures that something gets to where it needs to be when it needs to be there. On the battlefield, transporting data comes at a price: detectability. Data sent from one place to another via line-of-sight communication or through a satellite emits a signal. The more data transmitted, the bigger—or longer—the signal. While we want to capitalize on the promise of data, we must do so efficiently. If moving the data to the computing resources has its limitations, so does the alternative: moving the computing power to the data. A local data node that provides an on-premises means for processing data can increase costs because hardware must exist for each node, while size, weight and power restrictions limit capacity.

The Army has settled on a hybrid of a cloud node and multiple local data nodes for the mission command domain. The cloud node provides scalability and flexibility, offering expansive storage and computational power for machine learning and advanced data analytics. It excels in handling large volumes of data and adapts to fluctuating workloads. The cloud node is the central repository for relatively static data, such as personnel and equipment.

The local data nodes, in contrast, offer a more controlled environment. They are reliable and secure, especially when handling sensitive data or operations requiring strict data governance. Their physical proximity to the core operations often translates into faster data processing speeds, albeit with limited scalability compared to a cloud node.

Effective load balancing and using content delivery networks also can mitigate network lag by distributing the data load across multiple local nodes, reducing the strain on any single node and ensuring a more efficient data flow. When real-time data transfer is not critical, employing asynchronous data replication methods can help manage connectivity issues by allowing data to be queued and transmitted when the connection is stable. In short, managing data in a distributed architecture with network challenges involves a combination of technological solutions and strategic data-handling practices to ensure efficient, secure and reliable data flow.
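To make the asynchronous replication idea concrete, the sketch below shows a simple queue-and-flush pattern in Python: writes always succeed locally and are transmitted only when the link is stable. The link_is_up and send callables are placeholders for whatever transport a real system would use, not a fielded interface.

import queue

class AsyncReplicator:
    # Minimal sketch: local writes are queued and replicated to the remote
    # node only when the connection is stable. link_is_up and send are
    # hypothetical hooks standing in for the actual transport layer.
    def __init__(self, link_is_up, send):
        self.pending = queue.Queue()
        self.link_is_up = link_is_up   # returns True when the link is stable
        self.send = send               # transmits one record to the remote node

    def write(self, record):
        self.pending.put(record)       # the local write never waits on the network

    def flush(self):
        # Drain queued records only while the network remains available.
        while self.link_is_up() and not self.pending.empty():
            self.send(self.pending.get())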

TRUST BUT VERIFY: Data is only as trustworthy as its source—data provenance ensures information security. (Image by iStock/Getty Images)

DATA SYNCHRONIZATION

The key difference between a computer network in a tactical environment and a commercial one is availability. If your home internet connection lacks reliability, often the problem is calling from inside your house, and a router reset will put it right. If the issue is outside your control, you switch providers or ask for a refund because, doggone it, you paid for full service.

Resilient data management systems are essential in a degraded network environment with intermittent or unstable connectivity. Robust data synchronization protocols can handle connection interruptions and prioritize efficient data transmission when the network becomes available. Intelligent caching mechanisms can also store and update data locally, reducing the impact of network disruptions.
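As a rough illustration of such a caching mechanism, the sketch below keeps a local copy of each value, refreshes it only when it is stale and the link is up, and otherwise serves the last known value. The time-to-live setting and the fetch_remote hook are assumptions for illustration, not part of any fielded system.

import time

class LocalCache:
    # Sketch of an intelligent local cache for a degraded network: reads are
    # served locally; stale entries are refreshed only when the link allows.
    def __init__(self, fetch_remote, ttl_seconds=300):
        self.fetch_remote = fetch_remote   # hypothetical call to the authoritative node
        self.ttl = ttl_seconds
        self.store = {}                    # key -> (value, timestamp)

    def get(self, key, link_is_up):
        value, stamp = self.store.get(key, (None, 0.0))
        stale = (time.time() - stamp) > self.ttl
        if stale and link_is_up():
            value = self.fetch_remote(key)
            self.store[key] = (value, time.time())
        return value   # may be the last known (stale) value when the network is down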

This reality is another reason for the Army’s hybrid data architecture. Ideally, the data in the cloud and local nodes are synchronized. However, the volume of data may overwhelm the means of transporting it even under ideal circumstances, and under degraded conditions synchronization may slow to a stop.

Once the network is restored in a battlefield environment, establishing priorities for data updates becomes crucial. Not all data must be synchronized simultaneously; some information is more time-sensitive or critical than the rest. Clear priorities ensure that the most important data is updated first, enabling effective decision-making and operational coordination. For example, critical data might include information directly relevant to ongoing operations, situational awareness and imminent threats. Tactical intelligence is important but not as immediate as an imminent threat. Finally, less time-sensitive historical data, such as archives, past mission records and lessons learned, can be updated after the more critical data has been synchronized.
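One way to picture this ordering is a small prioritized backlog that drains operational data first, tactical intelligence next and historical records last once the link returns. The tier names and records below are illustrative assumptions, not an Army data standard.

# Illustrative priority tiers; lower numbers synchronize first.
PRIORITY = {"operational": 0, "tactical_intel": 1, "historical": 2}

def sync_backlog(backlog, transmit):
    # backlog: list of (tier, record) pairs accumulated during the outage.
    # transmit: hypothetical callable that sends one record over the restored link.
    for _, record in sorted(backlog, key=lambda item: PRIORITY[item[0]]):
        transmit(record)

# Example: an "operational" record queued after a "historical" one still goes out first.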

By establishing clear priorities for data updates, battlefield commanders can ensure that the most critical information is available right after network restoration. This facilitates swift decision-making and effective response in dynamic and rapidly changing environments.

CAUSE AND EFFECT: A representation of the causes and effects of network constraints. Resilient data management systems are essential in a degraded network environment with intermittent or unstable connectivity. (Graphic by the author)

PROVENANCE AND MAINTENANCE

As new data sources become available, so must methods to track data from those sources. Machine learning uses known data points to build a model that predicts data within the model’s parameters. For example, knowing where munitions fell relative to their target under certain conditions can help predict where munitions will fall under similar conditions, so that aim can be corrected to hit the target more precisely. Because data is only as trustworthy as its source, data provenance is necessary to ensure information security. This has always been important, but with the growth of data and the increasing complexity of models, trust now borders on faith that the output is correct.

An enemy could subtly inject false data that evades notice but impacts the result, such as targeting coordinates. Attackers inherently have the advantage of needing only a single vulnerability and a few moments to breach a system that defenders must always be ready to protect.
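A minimal sketch of what provenance tracking could look like in practice: each record carries its source, a timestamp and a digest of its contents, so a receiver can detect a record that was silently altered in transit. The field names and hashing scheme here are illustrative assumptions, not a fielded format, and a digest alone does not stop an adversary who can forge an entire record.

import hashlib
import json
import time

def tag_with_provenance(payload, source_id):
    # Wrap a record with its source, a timestamp and a content digest.
    body = {"source": source_id, "time": time.time(), "payload": payload}
    body["digest"] = hashlib.sha256(
        json.dumps({k: body[k] for k in ("source", "time", "payload")},
                   sort_keys=True).encode()).hexdigest()
    return body

def verify(record):
    # Recompute the digest; a mismatch means the record changed after tagging.
    body = {k: record[k] for k in ("source", "time", "payload")}
    recomputed = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return recomputed == record["digest"]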

The level of certainty in data, derived from its source, may not appear in the output, but it is a consideration for how we use that output. Just as we would likely avoid a medication that causes death in 30% of the people who take it, we should reject a targeting system with only 70% accuracy. Fusing data with varying degrees of certainty can be a complicated business. However, a second, independent source that confirms the original data can increase our certainty in it.
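A back-of-the-envelope illustration of why independent confirmation helps: if two sources err independently, the chance that both are wrong at once is the product of their individual error rates. The rates below are made-up numbers, and the arithmetic ignores correlated failures or two sources that happen to be wrong in the same way.

def confirmed_error_rate(e1, e2):
    # Probability that two independent sources are both wrong at once.
    return e1 * e2

# A source wrong 10% of the time, confirmed by an independent source wrong
# 20% of the time, leaves roughly a 2% chance that both err together.
print(confirmed_error_rate(0.10, 0.20))   # 0.02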

Data is also often subject to time constraints. You would avoid using a six-month-old weather report to decide what to wear outside: time-variable data eventually expires, so it must be regularly updated. Data also may need to be updated based on other parameters; for example, machine learning models are subject to drift over time and must be retrained when the drift reaches a particular threshold. As with fielded hardware, there must be a plan to maintain data to ensure its accuracy and quality.
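As a sketch of what such a drift trigger might look like, the check below compares the average of recent inputs against the training-time baseline and flags retraining when the shift exceeds a threshold. The baseline statistics, the window of recent values and the two-standard-deviation threshold are illustrative assumptions; real drift monitoring would use richer statistics.

from statistics import mean

def needs_retraining(recent_values, baseline_mean, baseline_std, threshold=2.0):
    # Flag retraining when the recent mean drifts more than `threshold`
    # baseline standard deviations from the training-time mean.
    drift = abs(mean(recent_values) - baseline_mean) / baseline_std
    return drift > threshold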

DOUBLING DOWN ON DATA: Managing data in a distributed architecture with network challenges involves a combination of technological solutions and strategic data-handling practices to ensure efficient, secure and reliable data flow. (Photo by Brett Sayles, Pexels)

CONCLUSION

This series has demonstrated that concepts underpinning logistics for a hardware system—such as demand, storage and transportation—are also relevant to data. Not only should we think about data in these terms, but we should also plan for it. As data models increasingly serve as battlefield tools, there is an unanswered question about whether logistics for data falls under sustainment or operations and maintenance from a funding perspective, or whether model updates should be a function of acquisition. This question must be answered to ensure the availability and integrity of our data.

Personnel are another necessary facet of the data infrastructure. Certainly, we need training, but we also need specialists, such as data engineers, who focus on getting data from one place to another, responding to demand, and ensuring quality and security.

This series began with the claim that “data is the new ammunition”—but it still doesn’t feel real in the way ammunition does. Data is not treated as fungible—the way a 9 mm round is a 9 mm round—so it’s often kept siloed, as if there were something special about our data versus yours. Logistics provides the equalizers: how to answer demand, how to make common data available and used, and how to get the trusted data we need from an outside source. If we get these right, we can treat data as a commodity, just like ammunition.

For more information, contact Thom Hawkins at jeffrey.t.hawkins10.civ@army.mil.

THOM HAWKINS is the team lead for data architecture and engineering with Project Manager Mission Command, assigned to the Program Executive Office for Command, Control and Communications – Tactical (PEO C3T) at Aberdeen Proving Ground, Maryland. He holds an M.S. in library and information science from Drexel University and a B.A. in English from Washington College.

 ANDREW ORECHOVESKY is a senior systems engineer for the data architecture and engineering team within Project Manager Mission Command, assigned to PEO C3T at Aberdeen Proving Ground. He holds a Doctor of Science from Capitol Technology University, an M.S. in cybersecurity from the University of Maryland, Baltimore County, and a B.S. in information technologies from the University of Phoenix.

SUNNY ZHANG is a systems engineer for the data architecture and engineering team within Project Manager Mission Command, assigned to PEO C3T at Aberdeen Proving Ground. He holds a B.A. in arts, technology and emerging communications from the University of Texas at Dallas.
