Flecto

AI Papers

Curated research — 48 papers

LLM, Reasoning, RL
🔴 Advanced

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

Linhao Yu, Xingguang Ji, Zhenghan Chen et al.

What if the secret to better LLM reasoning is giving hints that are just enough — not too much, not too little? KnowRL breaks problems into atomic Knowledge Points and uses Constrained Subset Search to find the minimal hint that unblocks exploration without leaking answers. On a 1.5B model, it beats GRPO by +9.63 points across 8 benchmarks.

2026-04-16T00:00:00+00:00
Agent, Reasoning, Benchmark
🔴 Advanced

Toward Autonomous Long-Horizon Engineering for ML Research

Guoxin Chen, Jie Chen, Lei Chen et al.

AiScientist treats long-horizon ML research engineering as a systems problem: thin orchestrator control over thick durable state. The File-as-Bus workspace delivers +10.54 pts on PaperBench and 81.82 Any Medal% on MLE-Bench Lite.

2026-04-15
Agent, Security, LLM
🔴 Advanced

SoK: Agentic Skills — Beyond Tool Use in LLM Agents

Yanna Jiang, Delong Li, Haiyu Deng et al.

The first systematic map of the agentic skill layer — from formal definition to marketplace attacks. This SoK reveals 7 design patterns, introduces trust tiers for skill governance, and documents the ClawHavoc supply-chain attack that compromised 36.8% of marketplace users.

2026-02-24T00:00:00+00:00
Agent, Reasoning, LLM
🔴 Advanced

RAGEN-2: Reasoning Collapse in Agentic RL

Zhefei Yu, Sipeng Zheng, Kun Shao et al.

RL-trained LLM agents silently collapse into repetitive templates despite high entropy. Mutual information (+0.39 Spearman) beats entropy (-0.14) as a diagnostic, and SNR-Aware Filtering restores diverse reasoning across 4 environments.
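For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a Spearman rank correlation between a diagnostic signal and an outcome can be computed. It is not the paper's evaluation code: the function names and the toy numbers are hypothetical, and the summary does not state which outcome the correlations were measured against.

```python
# Minimal Spearman rank correlation in pure Python (illustrative only).
# All data below is made up; it is not taken from the paper.

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-run diagnostic values and outcomes.
signal = [0.1, 0.2, 0.3, 0.4, 0.5]
outcome = [0.2, 0.1, 0.4, 0.3, 0.5]
print(spearman(signal, outcome))
```

A value near +1 means the diagnostic ranks runs in almost the same order as the outcome, a value near 0 means the ranking carries little information, and a negative value means the ranking is inverted.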

2026-04-08T00:00:00+00:00
Agent, LLM, Benchmark
🟡 Intermediate

SkillNet: Create, Evaluate, and Connect AI Skills

Yuan Liang, Ruobin Zhong, Haoming Xu et al.

SkillNet introduces an open infrastructure for creating, evaluating, and connecting AI agent skills at scale, featuring a unified ontology over 200,000+ skills that boosts average rewards by 40% and cuts execution steps by 30% on ALFWorld, WebShop, and ScienceWorld.

2026-02-26T14:24:02+00:00
LLM, Memory, Agent
🔴 Advanced

MemOS: A Memory OS for AI System

Zhiyu Li, Shichao Song, Chenyang Xi et al.

What if LLMs had their own operating system for memory? MemOS unifies plaintext, KV cache, and model weights as schedulable resources—achieving state-of-the-art on all major memory benchmarks.

2025-07-04T17:21:46+00:00
Agent, Retrieval, LLM
🔴 Advanced

Learning to Retrieve from Agent Trajectories

Yuqi Zhou, Sunhao Dai, Changle Qu et al.

A new training paradigm for IR systems: learning to retrieve from agent trajectories bridges the gap between human-designed search and LLM-powered agent consumption.

2026-03-30T00:00:00+00:00
Audio, LLM, Diffusion
🟡 Intermediate

VibeVoice Technical Report

Zhiliang Peng, Jianwei Yu, Wenhui Wang et al.

VibeVoice synthesizes 90-minute, 4-speaker conversations using next-token diffusion with a 7.5 Hz tokenizer that compresses speech 80× vs Encodec — making long-form multi-speaker TTS feasible in a standard LLM context window.
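The context-window claim can be checked with quick arithmetic. The sketch below counts tokenizer frames for the duration and frame rate quoted above, under the assumption (not stated in the summary) that each frame occupies one position in the LLM sequence. The function name is hypothetical.

```python
# Back-of-the-envelope sequence length for long-form speech.
# Assumption: one sequence position per tokenizer frame.

FRAME_RATE_HZ = 7.5   # tokenizer frame rate quoted in the summary
DURATION_MIN = 90     # conversation length quoted in the summary

def speech_frames(duration_min: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_min * 60 * frame_rate_hz)

frames = speech_frames(DURATION_MIN, FRAME_RATE_HZ)
print(frames)  # 40500
```

Under that assumption, 90 minutes of audio takes about 40,500 positions, which is within reach of long-context LLMs.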

2025-08-26T17:09:12Z
Multimodal, Spatial Reasoning, Viewpoint
🟡 Intermediate

Token Warping Helps MLLMs Look from Nearby Viewpoints

Phillip Y. Lee, Chanho Park, Mingue Park et al.

Token warping — rearranging ViT image tokens rather than pixels — enables MLLMs to reason from nearby viewpoints without fine-tuning, consistently outperforming all baselines on the new ViewBench benchmark.

2026-04-03T00:00:00+00:00
LLM, Agent, Reasoning
🔴 Advanced

Self-Distilled RLVR

Chenxu Yang, Chuanyu Qin, Qingyi Si et al.

RLSD solves the information leakage problem of on-policy self-distillation by repurposing the teacher as a token-level magnitude evaluator, achieving state-of-the-art on 5 multimodal reasoning benchmarks.

2026-04-03T00:00:00+00:00
Multimodal, Audio, Vision
🟡 Intermediate

LTX-2: Efficient Joint Audio-Visual Foundation Model

Yoav HaCohen, Benny Brazowski, Nisan Chiprut et al.

A unified foundation model that jointly generates synchronized audio and video from text prompts, eliminating the need for separate audio and video pipelines.

2026-01-06T18:24:41+00:00
LLM, Reasoning
🔴 Advanced

Attention Residuals

Kimi Team, Guangyu Chen, Yu Zhang et al.

A simple architectural modification to Transformers that feeds attention outputs back as residuals, improving reasoning and long-context performance without additional parameters.

2026-03-16T09:32:21+00:00
Agent, LLM
🔴 Advanced

Natural-Language Agent Harnesses

Linyue Pan, Lexiao Zou, Shuo Guo et al.

This paper introduces Natural-Language Agent Harnesses (NLAHs), showing that agent control logic can be expressed in editable text rather than code — achieving a 55% performance boost when migrating from code to natural language.

2026-03-26T00:00:00Z
LLM, Harness Engineering, AutoML
🔴 Advanced

Meta-Harness: End-to-End Optimization of Model Harnesses

Aiden Grossman, Sanyam Kapoor, Arjun Desai et al.

A coding agent that automatically discovers better LLM harnesses—achieving rank #1 on TerminalBench-2 and +7.7 points over ACE on text classification, using filesystem access for causal diagnosis.

2026-03-28T17:59:04+00:00
LLM, Reasoning, Superintelligence
🔴 Advanced

Tool Building as a Path to "Superintelligence"

David Koplow, Tomer Galanti, Tomaso Poggio

Could AI achieve superintelligence by building its own tools? This paper argues yes — through the Diligent Learner framework combining test-time search with tool-building.

2026-02-25T00:00:00+00:00
Audio, Multimodal
🟡 Intermediate

Voxtral TTS

Alexander H. Liu, Alexis Tacnet, Andy Ehrenberg et al.

Voxtral TTS by Mistral AI generates highly natural multilingual speech from minimal data, setting a new standard for expressive text-to-speech.

2026-03-26T15:23:34+00:00