AI on the Frontline: Managing Speed, Stability, and Accountability in Combat

November 24, 2025
By Sarah Kreps

This article was published alongside the PWH Report “The Future of Artificial Intelligence Governance and International Politics.”

For decades, military innovation suffered from a chronic lag between technology availability and doctrine adaptation. During the Cold War, systems such as the early laser-guided bombs used in Vietnam and the Navy's first networked battle-management tools were technically viable well before they were widely used. In both cases, long testing cycles, inter-service disputes, and slow doctrinal adaptation meant the capabilities spent years in limited trials or niche roles before becoming central to operations.

Unlike earlier technologies that stalled for years, artificial intelligence (AI) has moved from research to combat employment at unprecedented speed. Analysts argue that the ability to shift an algorithm from lab to battlefield in months rather than years offers a strategic edge. Rapid AI deployment has already delivered operational gains in the form of faster sensor-to-shooter timelines, lighter analyst workloads, and tighter cross-domain integration. 

Recent examples show both the upside and the risk. In Ukraine, Operation Spider's Web saw AI-assisted first-person view (FPV) drone coordination move from prototype to battlefield within a single campaign. Israel's Gospel and Lavender systems generated targeting recommendations at unprecedented scale, accelerating operations but raising concerns about bias and misidentification. In India, the Akashteer AI-powered air-defense "war-cloud" was deployed in 2024 before procedures were fully codified, providing short-term advantage but leaving doctrinal adaptation behind.

These examples raise a central question: can militaries capture the advantages of rapid AI deployment without multiplying the risks of inconsistency, miscalculation, or failure under stress?

This memo evaluates current AI adoption pathways and their implications for military effectiveness. It categorizes adoption into sanctioned programs, exercise-driven experimentation, bottom-up “shadow” use, and iterative in-theater updates. It then considers doctrinal, training, oversight, and stress-testing gaps; examines operational impacts of uneven adoption; and concludes with policy proposals to align speed with reliability, interoperability, and accountability. 

Current Trajectories and Operational Use Cases 

Although public information on battlefield AI and decision-support systems is limited, several confirmed cases provide a window into current trajectories, which point to four broad modes of AI adoption. The first category consists of sanctioned, top-down programs, such as the flagship Project Maven, which illustrates how AI tools can be introduced directly into core intelligence workflows with senior-level sponsorship and dedicated funding. Maven's initial capability, automated object detection in full-motion video, progressed from algorithm training to live operational use in under a year. A similar dynamic appears in the Pentagon's frontier AI initiative, where senior leaders awarded contracts of up to $200 million to firms like OpenAI and Google, accelerating integration into defense missions while bypassing lengthy acquisition delays.

A second category is exercise-driven experimentation, which differs from top-down programs like Project Maven. Rather than flowing from senior sponsorship and earmarked budgets, these initiatives are embedded within sanctioned training environments, giving developers and operators space to trial tools under realistic but controlled conditions. NATO and U.S. joint drills have become testbeds for AI-enabled decision support—target identification, data fusion, and battle management—serving as a bridge between lab trials and operational deployment. For instance, during Exercise Talisman Sabre 2025, allied forces fielded decision-support tools like ConductorOS from BigBear.ai, which fused sensor data to streamline human-machine decision-making in live-fire, amphibious, air, and maritime operations. The payoff is rapid learning and operator feedback; the limitation is that results may overstate battlefield readiness if exercises lack the stress and unpredictability of adversarial environments.

A third category of bottom-up, or “shadow,” adoption consists of individual units or commands adapting commercially available or open-source AI tools for mission-specific purposes without going through formal acquisition channels. These ad hoc integrations can range from data-visualization platforms that support mission planning to language models used for drafting intelligence summaries. For example, during Pacific exercises, a U.S. soldier improvised by pairing a reconnaissance drone with an FPV drone to improve targeting, despite there being no formal directive or training for that tactic. Such innovations often deliver immediate tactical utility, but they can also embed systems that have not been vetted for security, reliability, or compliance with operational doctrine. Historical parallels exist in the rapid, informal uptake of commercial GPS devices by soldiers in the 1990s, which proved useful in the field but initially bypassed standardized training and maintenance structures. 

The final category consists of iterative in-theater updates, which differ from bottom-up shadow adoption in that they are sanctioned but evolve rapidly under battlefield conditions. Ukraine’s Delta system illustrates this trajectory. Originally a top-down program developed under NATO sponsorship and backed by the Ukrainian defense ministry, Delta moved into a wartime rhythm of constant iteration: developers pushed software refinements into the live system as units were fighting, adding features for data fusion and automated target recognition. This allowed commanders to integrate drone, satellite, and sensor inputs in real time, with updates arriving in days rather than months.  

What ties these modes together is the collapsing distance between innovation and use. Capabilities that once would have taken years to prototype, approve, and field are now appearing within months—or even weeks—of becoming technically feasible. The upside is speed: commanders and units gain tools that shape operations in real time. The downside is vulnerability: adoption races ahead of doctrine, training, and oversight, drawing forces onto systems that can introduce risks for which they have not prepared.

Risks  

Battlefield AI introduces risks that mirror its advantages: speed, distribution, and adaptability can all cut against coherence and reliability if not carefully managed. These risks appear along three main channels. First, centralized decision-making has long ensured unity of effort, doctrine, and accountability. Battlefield AI decision-support offers speed and awareness but risks fragmenting authority, either pushing decisions downward to operators acting on AI prompts or diffusing them across nodes generating competing recommendations. NATO officials at Naval Air Station Sigonella have acknowledged the challenge: member states are deploying AI-enabled systems, but each applies different confidence thresholds and operational rules.1 The result is a patchwork in which allies may interpret the same data differently, complicating coordination in high-stakes situations. 

Second, when deployment cycles shorten, training pipelines contract, narrowing the scenarios operators have rehearsed. In the Pacific exercise noted earlier, a U.S. soldier improvised by pairing a reconnaissance drone with an FPV drone for targeting, a tactic that worked in permissive conditions. The speed of such adoption leaves little time to test every possible degraded or contested environment, where adversaries are likely to employ jamming or spoofing. Historical precedent is instructive.

In 1988, the Aegis system on USS Vincennes misidentified and shot down Iran Air Flight 655, killing 290 civilians. Investigators concluded that while the system performed as designed, operators misread ambiguous signals under stress, a failure linked to insufficient preparation for degraded conditions. Without such training, which is in part a function of time, capable systems can still fail catastrophically.  

Third, commercial AI introduces another channel of risk. Platforms like Safe Pro Group’s imagery analysis tool, now under U.S. Army evaluation, can be fielded faster than acquisition processes normally allow. Like the consumer GPS devices of the 1990s, such tools may diffuse informally, bypassing security vetting and interoperability standards. Commanders risk losing oversight of which tools shape planning and execution. 

Across these examples, the through-line is not only a mismatch in pace but also a cluster of risks: fragmented authority, operators unprepared for degraded conditions, diffusion of unvetted commercial tools, and brittle systems untested against adversaries. Unless doctrine, training, oversight, and stress-testing accelerate in step with deployment, AI-driven dependencies could erode unity of command long before adversary action makes those weaknesses visible.

Consequences 

Variation in how units adopt and employ AI systems can directly affect operational tempo. Imagine a targeting exercise in which two allied teams receive the same intelligence, surveillance, and reconnaissance (ISR) feed but operate with different AI default settings. One team fires immediately on the system’s recommendation, while the other waits for human review. The result would be a delayed strike window, duplicated ISR coverage, and wasted aircraft fuel. In high-tempo operations, similar mismatches could translate into missed targets, squandered munitions, or avoidable exposure of friendly forces. 
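To make the mismatch concrete, the minimal sketch below applies two different default policies to the same stream of AI recommendations. The unit names, thresholds, and target identifiers are entirely hypothetical assumptions for illustration, not a depiction of any fielded system; the point is only that identical inputs plus divergent defaults yield divergent actions and timing.

# Hypothetical sketch: two units apply different default policies to the
# same AI recommendation stream. All names, thresholds, and target IDs
# are illustrative assumptions, not any fielded system's actual settings.
from dataclasses import dataclass

@dataclass
class Recommendation:
    target_id: str
    confidence: float   # model-reported confidence, 0.0-1.0
    minute: int         # minutes into the exercise window

def team_a_policy(rec: Recommendation) -> str:
    """Team A fires immediately on any recommendation above its default threshold."""
    return "STRIKE" if rec.confidence >= 0.70 else "HOLD"

def team_b_policy(rec: Recommendation) -> str:
    """Team B routes everything below a higher bar to human review, adding delay."""
    if rec.confidence >= 0.90:
        return "STRIKE"
    return "HUMAN_REVIEW (+15 min)"

shared_feed = [
    Recommendation("tgt-01", 0.78, minute=3),
    Recommendation("tgt-02", 0.92, minute=7),
    Recommendation("tgt-03", 0.64, minute=12),
]

for rec in shared_feed:
    print(rec.target_id, "| Team A:", team_a_policy(rec), "| Team B:", team_b_policy(rec))
# Identical inputs, divergent actions: tgt-01 is struck by Team A but held
# for review by Team B -- exactly the synchronization gap described above.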

Divergence in how units employ AI can come not only from technical settings but also from human judgment. Some operators may default to AI outputs for speed, while others insist on verification even at high confidence levels. These differences in practice break the synchronization on which sequenced fires or coordinated maneuvers depend, and, over time, they can erode coalition trust if allies act at different tempos despite sharing the same data.

As militaries begin relying on AI-enabled targeting, the absence of reliable logs or metadata makes accountability harder rather than easier. Accountability is not only a legal and ethical issue but also a cornerstone of military effectiveness: without records, commanders cannot diagnose mistakes, refine tactics, or translate lessons into improved performance. Israel's "Gospel" system illustrates the challenge. Public reporting does not clarify what data it ingests, though experts suggest it draws on drone feeds, intercepted communications, surveillance, and behavioral patterns. Such opacity is not unique to Gaza but reflects a broader problem. If commanders or coalition partners cannot trace which inputs shaped a recommendation, they cannot reliably explain why a strike occurred or whether it conformed to doctrine or law.

Shadow use of unvetted commercial tools compounds operational risks. In Ukraine, soldiers initially turned to messaging apps like WhatsApp and Telegram for tactical coordination because they were convenient and easy to use. But while encrypted, these apps still leaked metadata, fragmented communication records, and created vulnerabilities exploitable through traffic analysis or captured devices. By September 2024, the Ukrainian government banned Telegram on state-issued devices, warning that Russian forces could exploit it to track movements, spread phishing messages, or calibrate missile strikes. The episode illustrates how quickly tools adopted for speed and accessibility can turn into liabilities, a pattern that could be repeated if units reach for public AI services in the field. Just as with messaging apps, commanders may not know what data such AI tools expose or how their outputs shape decisions, eroding both security and oversight. 

These effects accumulate. Variance in adoption undermines operational tempo, over-reliance fractures unity of command, gaps in accountability erode legitimacy, shadow use exposes sensitive data, and insufficient stress testing magnifies vulnerability. Together, they produce strategic liabilities: wasted sorties, fractured coalition performance, diminished trust, and heightened political fallout. In other words, what begins as technical or procedural inconsistency can scale into operational incoherence and strategic risk. 

Policy Prescriptions and Conclusions 

The core challenge in battlefield AI adoption is aligning the operational advantages of speed with the stability that comes from deliberate integration. Moving quickly can deliver immediate gains in the form of higher tempo, reduced cognitive load, and options adversaries cannot easily match. For years, defense reformers have worked to shorten acquisition cycles, seeking to overcome decades-long delays in fielding new technology. AI now accelerates that dynamic even further. But if adoption outpaces doctrine, training, and oversight, the same speed that promises operational advantage can also fragment decision-making and erode unity of command. 

Policy responses should focus on anchoring rapid adoption within predictable, accountable practice. First, services should standardize how operators act on AI outputs—whether expressed as confidence scores, confidence bands, or other reliability indicators. For instance, high-confidence identifications could authorize immediate action; medium-confidence ones could require human review; and low-confidence ones would mandate further collection. Standardizing these thresholds across services and coalitions would reduce variance in practice and clarify how to adapt procedures in degraded or adversarial conditions. Second, services should require every AI-assisted recommendation to carry uniform metadata—timestamps, model versions, thresholds, and supporting evidence—so commanders and investigators can reliably reconstruct how a decision was made during after-action review and oversight. Because shadow use of commercial tools will not generate this data, these first two measures must be paired with steps that address unsanctioned adoption directly.
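A minimal sketch of these first two prescriptions follows: a standardized mapping from confidence bands to required handling, and a uniform metadata record attached to every AI-assisted recommendation so the decision can be reconstructed later. The field names, bands, and numerical thresholds are assumptions for illustration, not an established standard or any service's actual schema.

# Illustrative sketch of (1) standardized confidence bands and (2) a uniform
# metadata record for after-action reconstruction. All values are assumed.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

BANDS = [  # (minimum confidence, required handling) -- assumed values
    (0.90, "IMMEDIATE_ACTION_AUTHORIZED"),
    (0.60, "HUMAN_REVIEW_REQUIRED"),
    (0.00, "FURTHER_COLLECTION_REQUIRED"),
]

def required_handling(confidence: float) -> str:
    """Map a model confidence score onto the standardized handling bands."""
    for floor, handling in BANDS:
        if confidence >= floor:
            return handling
    return "FURTHER_COLLECTION_REQUIRED"

@dataclass
class RecommendationRecord:
    timestamp_utc: str
    model_name: str
    model_version: str
    confidence: float
    threshold_band: str
    supporting_evidence: list = field(default_factory=list)  # e.g. sensor/report IDs
    operator_action: str = ""  # filled in when the operator acts

record = RecommendationRecord(
    timestamp_utc=datetime.now(timezone.utc).isoformat(),
    model_name="target-id-model",        # hypothetical name
    model_version="2025.11.1",           # hypothetical version
    confidence=0.72,
    threshold_band=required_handling(0.72),
    supporting_evidence=["isr-feed-041", "sigint-report-117"],  # hypothetical IDs
)
record.operator_action = "HELD_FOR_REVIEW"

# The serialized record is what an after-action review or oversight body would
# read back to reconstruct why the recommendation was or was not acted on.
print(json.dumps(asdict(record), indent=2))

In this sketch the record travels with the recommendation rather than living only in the model's own logs, which is what allows commanders, investigators, and coalition partners to audit a decision without access to the underlying system.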

Third, commanders should reduce the incentive for shadow use of unvetted AI tools by providing secure, approved alternatives for common operational tasks, coupled with clear guidance on data handling. Where shadow use persists, units should flag it in reporting channels so commanders can at least maintain visibility into what tools are shaping decisions. Finally, commanders should use joint and coalition exercises to expose divergence in how units interpret AI outputs. By feeding identical inputs to multiple units and comparing their responses, commanders can identify inconsistent practices and recalibrate training or doctrine accordingly. 

These measures would not slow AI adoption but make it interoperable, auditable, and resilient. Historical experience shows that the gap between technological possibility and doctrinal assimilation creates both opportunity and vulnerability. In this cycle, the gap has shrunk to months instead of years, heightening the risks of uneven adoption. Speed alone is a necessary but not sufficient condition for advantage. To turn rapid adoption into lasting capability, militaries must pair it with the doctrine, training, and accountability that hold units together in combat. 

About the author

Sarah Kreps is the John L. Wetherill Professor of Government, an adjunct professor of law, and the director of the Tech Policy Institute at Cornell University.