Data: Supabase read-only | Retrieved: 106 | Live DeepSeek: not run | Supabase writes: not run

Radar

Usable radar list over the currently available retrieval evidence. It discloses source, freshness, uncertainty, review status, and citations before treating any item as report-ready signal.

Total retrieved items

106

Visible after filters

Included

101

needs_review

Excluded

Failed

Category tabs

Browse the visible public retrieval set by signal family.

Selected: agent

All 106 | research 50 | product update 22 | other 17 | open source 12 | agent 9 | model release 9 | opinion 8 | media interview 5

Filters

Query-param filters are applied server-side and do not change the retrieval source.
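As a rough illustration of that behavior, here is a minimal sketch of server-side filtering driven by a query string. The parameter names (`category`, `status`, `min_confidence`), the row fields, and the sample rows are all assumptions made for this example; the page does not document its actual parameters. The point is only that filtering returns a narrowed view and leaves the retrieved set untouched.

```python
from urllib.parse import parse_qs

# Hypothetical rows; the field names are assumptions for illustration only.
ROWS = [
    {"title": "A", "category": "agent", "status": "included", "confidence": 0.86},
    {"title": "B", "category": "agent", "status": "needs_review", "confidence": 0.22},
    {"title": "C", "category": "research", "status": "included", "confidence": 0.82},
]

def apply_filters(rows, query_string):
    """Return a filtered copy of rows; the input list is never modified."""
    params = parse_qs(query_string)
    category = params.get("category", [None])[0]
    status = params.get("status", [None])[0]
    min_conf = float(params.get("min_confidence", ["0"])[0])
    return [
        r for r in rows
        if (category is None or r["category"] == category)
        and (status is None or r["status"] == status)
        and r["confidence"] >= min_conf
    ]

visible = apply_filters(ROWS, "category=agent&status=included")
print([r["title"] for r in visible])  # prints ['A']
```

An empty query string returns every row, which is one plausible reading of what a reset would do.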


Caveats

Completeness: not claimed

Read-only Supabase public radar retrieval was used; no Supabase write path ran.
5 items are marked needs_review and require human confirmation before confident synthesis.
This surface shows available AI Radar evidence only; it is not a claim of complete current AI industry coverage.

Evidence rows

Dense rows keep source, status, confidence, timing, and citation visible next to the claim.

Visible items: 9

01 | included | Confidence: 86% | Overall: 0.91 | Tier: T1

The Scaling Laws of Skills in LLM Agent Systems

This study analyzes 15 frontier LLMs, 1,141 real-world skills, and over 3 million routing/execution decisions, identifying two coupled scaling laws in LLM agent systems: the routing law (single-step routing accuracy decays logarithmically with library size) and the execution law (correct execution improves difficult downstream decisions by about 4×). A single parameter b couples the two laws. Law-guided optimization raises held-out routing accuracy from 71.3% to 91.7%, reduces hijack from 22.4% to 4.1%, and improves pass rates on downstream benchmarks. Results show agent performance depends not only on model capability but also on skill library structure, granularity, and exposure policy.

Why it matters: May add technical evidence for future radar tracking: The Scaling Laws of Skills in LLM Agent Systems
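To make the routing law in the summary above concrete, the sketch below evaluates one possible logarithmic decay curve. The summary states only that accuracy decays logarithmically with library size; the specific form `a - b * ln(N)`, the use of `b` as the slope, and both coefficient values are assumptions for illustration and are not taken from the paper.

```python
import math

def routing_accuracy(library_size, a, b):
    """Logarithmic decay: accuracy = a - b * ln(library_size), clamped to [0, 1]."""
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    return max(0.0, min(1.0, a - b * math.log(library_size)))

# Hypothetical coefficients, chosen only to show the shape of the curve.
A, B = 0.95, 0.05
for n in (1, 10, 100, 1000):
    print(n, round(routing_accuracy(n, A, B), 3))
```

Under this form, each tenfold increase in library size costs the same fixed amount of accuracy, which is the defining property of logarithmic decay.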

02 | included | Confidence: 22% | Overall: 0.91 | Tier: T1

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

AgentStop is a lightweight efficiency supervisor for locally deployed LLM agents that predicts and terminates unlikely-to-succeed trajectories, reducing energy waste by 15-20% with minimal performance impact (<5% utility drop).

Why it matters: May add technical evidence for future radar tracking: AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices
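The general idea of a supervisor that ends unpromising trajectories early can be sketched as follows. This is not AgentStop's implementation: the function names, the threshold, the minimum step count, and the toy predictor are all hypothetical, and the summary does not say how the real predictor works.

```python
def run_with_supervisor(steps, predict_success, threshold=0.2, min_steps=2):
    """Run steps in order; stop early once predicted success drops below threshold.

    steps: list of callables, each returning an observation.
    predict_success: callable mapping the history so far to a probability.
    Returns (history, stopped_early).
    """
    history = []
    for step in steps:
        history.append(step())
        if len(history) >= min_steps and predict_success(history) < threshold:
            return history, True
    return history, False

# Toy example: each step reports an error count; the hypothetical predictor
# lowers its estimate as errors accumulate.
steps = [lambda: 0, lambda: 2, lambda: 3, lambda: 1]
predictor = lambda history: max(0.0, 1.0 - 0.25 * sum(history))
history, stopped = run_with_supervisor(steps, predictor)
print(len(history), stopped)  # prints 3 True
```

The fourth step never runs, which is where the energy saving would come from in this simplified picture.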

03 | included | Confidence: 82% | Overall: 0.91 | Tier: T1

SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

This arXiv cs.AI paper introduces SDOF, a framework that models multi-agent orchestration as a constrained state machine, using an online-RLHF intent router (trained via GRPO) and a state-aware dispatcher to enforce business stage constraints. Evaluated on a recruitment system (Beisen iTalent, 6000+ enterprises), the 7B model achieves 80.9% joint accuracy on an FSM-constrained benchmark (GPT-4o: 48.9%), end-to-end task completion rate of 86.5%, and blocks all 22 injection/illegal operations. Message-level blocking achieves 100% precision and 88% recall.

Why it matters: May add technical evidence for future radar tracking: SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
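A state-constrained dispatcher of the kind the summary describes can be pictured as a lookup of which intents are legal in the current stage. The sketch below is a minimal illustration under stated assumptions: the stage names, intents, and transition table are invented for this example and do not come from the paper or from the Beisen iTalent system.

```python
# Hypothetical recruitment stages and the intents allowed in each one.
ALLOWED = {
    "screening": {"view_resume", "schedule_interview"},
    "interview": {"record_feedback", "make_offer"},
    "offer": {"send_offer", "close_requisition"},
}
# Hypothetical stage transitions triggered by specific intents.
NEXT_STAGE = {"schedule_interview": "interview", "make_offer": "offer"}

def dispatch(stage, intent):
    """Return (accepted, new_stage); reject intents not allowed in this stage."""
    if intent not in ALLOWED.get(stage, set()):
        return False, stage
    return True, NEXT_STAGE.get(intent, stage)

print(dispatch("screening", "send_offer"))          # prints (False, 'screening')
print(dispatch("screening", "schedule_interview"))  # prints (True, 'interview')
```

Because the check is a plain table lookup, an out-of-stage request is rejected regardless of how the router classified it.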

04 | included | Confidence: 82% | Overall: 0.89 | Tier: T1

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

This paper identifies a compounding occupancy shift failure in sequential fine-tuning of multi-agent LLMs and proposes TeamTR, a trust-region framework that resamples trajectories and enforces per-agent divergence control, achieving 7.1% average improvement over baselines.

Why it matters: May change available building blocks for teams evaluating open implementations: TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

05 | included | Confidence: 86% | Overall: 0.87 | Tier: T1

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

This paper introduces OSCToM, an RL-guided approach for generating high-order Theory of Mind conflicts to improve LLMs' recursive reasoning in complex social settings. It achieves 76% accuracy on FANToM and is 6x more efficient in data synthesis.

Why it matters: May add technical evidence for future radar tracking: OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

06 | included | Confidence: 86% | Overall: 0.87 | Tier: T1

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

Proposes SOLAR, a self-optimizing lifelong autonomous reasoner that leverages parameter-level meta-learning and multi-level reinforcement learning for continual adaptation without gradient updates, outperforming strong baselines on commonsense, math, medical, coding, social, and logical reasoning tasks.

Why it matters: May add technical evidence for future radar tracking: SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

07 | included | Confidence: 86% | Overall: 0.87 | Tier: T1

ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

ReacTOD is a bounded neuro-symbolic architecture for zero-shot dialogue state tracking. It reformulates NLU as discrete tool calls within a self-correcting ReAct loop with deterministic validation. On MultiWOZ 2.1, it achieves 52.71% joint goal accuracy with gpt-oss-20B (14 points improvement) and 47.34% with Qwen3-8B. On SGD, Claude-Opus-4.6 achieves 80.68% JGA. The architecture improves accuracy by up to 9.3% over single-pass inference and achieves 93.1% self-correction rate on intercepted errors.

Why it matters: May add technical evidence for future radar tracking: ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking
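The pattern of discrete tool calls checked by a deterministic validator inside a bounded retry loop might look roughly like this. It is a sketch, not ReacTOD's code: the slot schema, the function names, the retry limit, and the toy model are hypothetical stand-ins.

```python
# Hypothetical slot schema: each slot name maps to its allowed values.
SCHEMA = {"hotel-area": {"north", "south", "centre"}, "hotel-stars": {"3", "4", "5"}}

def validate(call):
    """Deterministic check; returns an error message, or None if the call is valid."""
    slot, value = call
    if slot not in SCHEMA:
        return f"unknown slot {slot!r}"
    if value not in SCHEMA[slot]:
        return f"value {value!r} not allowed for {slot!r}"
    return None

def track_state(propose, max_attempts=3):
    """Ask propose(feedback) for a tool call, retrying with the validator's error."""
    feedback = None
    for _ in range(max_attempts):
        call = propose(feedback)
        feedback = validate(call)
        if feedback is None:
            return call
    return None  # bounded: give up after max_attempts

# Stand-in for the model: wrong on the first try, corrected once it sees feedback.
def toy_model(feedback):
    return ("hotel-area", "center") if feedback is None else ("hotel-area", "centre")

print(track_state(toy_model))  # prints ('hotel-area', 'centre')
```

The loop is bounded by `max_attempts`, so a model that never produces a valid call yields `None` instead of looping forever.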

08 | included | Confidence: 86% | Overall: 0.87 | Tier: T1

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

The paper introduces PQR, a framework for automatically generating diverse and realistic user queries that elicit failures (e.g., unhelpfulness, unsafety) in LLM-based QA agents. It operates via iterative interaction between a query refinement module and a prompt refinement module, producing failure-triggering queries that resemble real user intents. Evaluated on an e-commerce QA agent, PQR uncovers 23%-78% more unhelpful responses and generates more diverse and realistic queries than previous methods.

Why it matters: May add technical evidence for future radar tracking: PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

09 | included | Confidence: 86% | Overall: 0.87 | Tier: T1

DeepSlide: From Artifacts to Presentation Delivery

DeepSlide is a human-in-the-loop multi-agent system that supports the full presentation preparation process, from requirement elicitation and time-budgeted narrative planning to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. It integrates a controllable logical-chain planner, a lightweight content-tree retriever, Markov-style sequential rendering with style inheritance, and sandboxed execution. A dual-scoreboard benchmark separates static artifact quality from dynamic delivery excellence. Across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while achieving larger gains on delivery metrics such as narrative flow, pacing precision, slide-script synergy, and clearer attention guidance.

Why it matters: May add technical evidence for future radar tracking: DeepSlide: From Artifacts to Presentation Delivery

Visible citations

Source: arXiv cs.CL | Published: May 19, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

The Scaling Laws of Skills in LLM Agent Systems

https://arxiv.org/abs/2605.16508

Source: arXiv cs.LG | Published: May 18, 2026, 04:00 AM UTC | Status: included | Confidence: 22%

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

https://arxiv.org/abs/2605.15206

Source: arXiv cs.AI | Published: May 18, 2026, 04:00 AM UTC | Status: included | Confidence: 82%

SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

https://arxiv.org/abs/2605.15204

Source: arXiv cs.LG | Published: May 18, 2026, 04:00 AM UTC | Status: included | Confidence: 82%

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

https://arxiv.org/abs/2605.15207

Source: arXiv cs.AI | Published: May 21, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

https://arxiv.org/abs/2605.20423

Source: arXiv cs.AI | Published: May 21, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

https://arxiv.org/abs/2605.20189

Source: arXiv cs.CL | Published: May 20, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

https://arxiv.org/abs/2605.19077

Source: arXiv cs.CL | Published: May 19, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

https://arxiv.org/abs/2605.16551

Source: arXiv cs.AI | Published: May 18, 2026, 04:00 AM UTC | Status: included | Confidence: 86%

DeepSlide: From Artifacts to Presentation Delivery

https://arxiv.org/abs/2605.15202