Data: Supabase (read-only) | Retrieved: 106 | Live DeepSeek: not run | Supabase writes: not run

Radar

A usable radar list built from the currently available retrieval evidence. Each item discloses its source, freshness, uncertainty, review status, and citations before it is treated as a report-ready signal.

Total retrieved items: 106

Visible after filters: 9

Included: 101

needs_review: 5

Excluded: 0

Failed: 0

Categories

research 50, product update 22, other 17, open source 12, model release 9, agent 9, opinion 8, media interview 5

Source families

Research feeds 43, Other public sources 21, Open source 18, Company/lab 17, Analysis/media 7

Source tiers

T1 80, T2 18, T1.5 7, unreviewed 1

Sources

arXiv cs.CL 12, arXiv cs.CV 12, arXiv cs.LG 10, OpenAI News 9, arXiv cs.AI 9, Anthropic Python SDK 4, Lex Fridman 4, Hugging Face Transformers 3

Category tabs

Browse the visible public retrieval set by signal family.

Selected: agent

Filters

Query-param filters are applied server-side and do not change the retrieval source.

Caveats (completeness: not claimed)
  • Read-only Supabase public radar retrieval was used; no Supabase write path ran.
  • 5 items are marked needs_review and require human confirmation before they can be synthesized with confidence.
  • This surface shows available AI Radar evidence only; it is not a claim of complete current AI industry coverage.

Evidence rows

Dense rows keep source, status, confidence, timing, and citation visible next to the claim.

Visible items: 9
01 | included | Confidence 86% | Overall 0.91 | Tier T1

The Scaling Laws of Skills in LLM Agent Systems

This study analyzes 15 frontier LLMs, 1,141 real-world skills, and over 3 million routing and execution decisions, and identifies two coupled scaling laws in LLM agent systems: a routing law (single-step routing accuracy decays logarithmically with library size) and an execution law (correct execution improves difficult downstream decisions by about 4×). A single parameter, b, couples the two laws. Law-guided optimization raises held-out routing accuracy from 71.3% to 91.7%, reduces the hijack rate from 22.4% to 4.1%, and improves pass rates on downstream benchmarks. The results indicate that agent performance depends not only on model capability but also on the structure, granularity, and exposure policy of the skill library.

Why it matters: May add technical evidence for future radar tracking: The Scaling Laws of Skills in LLM Agent Systems
02 | included | Confidence 22% | Overall 0.91 | Tier T1

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

AgentStop is a lightweight efficiency supervisor for locally deployed LLM agents that predicts and terminates unlikely-to-succeed trajectories, reducing energy waste by 15-20% with minimal performance impact (<5% utility drop).

Why it matters: May add technical evidence for future radar tracking: AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices
03 | included | Confidence 82% | Overall 0.91 | Tier T1

SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch

This arXiv cs.AI paper introduces SDOF, a framework that models multi-agent orchestration as a constrained state machine, combining an online-RLHF intent router (trained via GRPO) with a state-aware dispatcher that enforces business-stage constraints. Evaluated on a recruitment system (Beisen iTalent, serving 6,000+ enterprises), the 7B model achieves 80.9% joint accuracy on an FSM-constrained benchmark (GPT-4o: 48.9%) and an end-to-end task completion rate of 86.5%, and it blocks all 22 injection or illegal operations. Message-level blocking achieves 100% precision and 88% recall.

Why it matters: May add technical evidence for future radar tracking: SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
04 | included | Confidence 82% | Overall 0.89 | Tier T1

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

This paper identifies a compounding occupancy-shift failure in the sequential fine-tuning of multi-agent LLMs and proposes TeamTR, a trust-region framework that resamples trajectories and enforces per-agent divergence control, achieving a 7.1% average improvement over baselines.

Why it matters: May change available building blocks for teams evaluating open implementations: TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination
05 | included | Confidence 86% | Overall 0.87 | Tier T1

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

This paper introduces OSCToM, an RL-guided approach for generating high-order Theory of Mind conflicts to improve LLMs' recursive reasoning in complex social settings. It achieves 76% accuracy on FANToM and is 6x more efficient in data synthesis.

Why it matters: May add technical evidence for future radar tracking: OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
06 | included | Confidence 86% | Overall 0.87 | Tier T1

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

This paper proposes SOLAR, a self-optimizing lifelong autonomous reasoner that uses parameter-level meta-learning and multi-level reinforcement learning for continual adaptation without gradient updates, and reports that it outperforms strong baselines on commonsense, math, medical, coding, social, and logical reasoning tasks.

Why it matters: May add technical evidence for future radar tracking: SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
07 | included | Confidence 86% | Overall 0.87 | Tier T1

ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking

ReacTOD is a bounded neuro-symbolic architecture for zero-shot dialogue state tracking. It reformulates NLU as discrete tool calls within a self-correcting ReAct loop with deterministic validation. On MultiWOZ 2.1, it achieves 52.71% joint goal accuracy (JGA) with gpt-oss-20B (a 14-point improvement) and 47.34% with Qwen3-8B. On SGD, Claude-Opus-4.6 achieves 80.68% JGA. The architecture improves accuracy by up to 9.3% over single-pass inference and achieves a 93.1% self-correction rate on intercepted errors.

Why it matters: May add technical evidence for future radar tracking: ReacTOD: Bounded Neuro-Symbolic Agentic NLU for Zero-Shot Dialogue State Tracking
08 | included | Confidence 86% | Overall 0.87 | Tier T1

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

The paper introduces PQR, a framework for automatically generating diverse and realistic user queries that elicit failures (e.g., unhelpful or unsafe responses) in LLM-based QA agents. It works through iterative interaction between a query refinement module and a prompt refinement module, producing failure-triggering queries that resemble real user intents. Evaluated on an e-commerce QA agent, PQR uncovers 23%-78% more unhelpful responses and generates more diverse and realistic queries than previous methods.

Why it matters: May add technical evidence for future radar tracking: PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures
09 | included | Confidence 86% | Overall 0.87 | Tier T1

DeepSlide: From Artifacts to Presentation Delivery

DeepSlide is a human-in-the-loop multi-agent system that supports the full presentation preparation process, from requirement elicitation and time-budgeted narrative planning to evidence-grounded slide-script generation, attention augmentation, and rehearsal support. It integrates a controllable logical-chain planner, a lightweight content-tree retriever, Markov-style sequential rendering with style inheritance, and sandboxed execution. A dual-scoreboard benchmark separates static artifact quality from dynamic delivery excellence. Across 20 domains and diverse audience profiles, DeepSlide matches strong baselines on artifact quality while achieving larger gains on delivery metrics such as narrative flow, pacing precision, slide-script synergy, and attention guidance.

Why it matters: May add technical evidence for future radar tracking: DeepSlide: From Artifacts to Presentation Delivery

Visible citations