## From Chatbot to Autonomous Agent

We are proud to present Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's DeepResearch across a comprehensive suite of benchmarks. Tongyi DeepResearch demonstrates state-of-the-art results, scoring 32.9 on the academic reasoning task Humanity's Last Exam (HLE), 43.4 on BrowseComp and 46.7 on BrowseComp-ZH for extremely complex information-seeking tasks, and 75 on the user-centric xbench-DeepSearch benchmark, systematically outperforming all existing proprietary and open-source Deep Research agents.

Beyond the model, we share a complete and battle-tested methodology for creating such advanced agents. Our contribution details a novel data synthesis solution applied across the entire training pipeline, from Agentic Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT) for cold-starting, to the final Reinforcement Learning (RL) stage. For RL, we provide a full-stack solution, including algorithmic innovations, automated data curation, and robust infrastructure. For inference, the vanilla ReAct framework showcases the model's powerful intrinsic capabilities without any prompt engineering, while the advanced Heavy Mode (test-time scaling) demonstrates the upper limits of its complex reasoning and planning potential.

## Continual Pre-training and Post-training Empowered by Fully Synthetic Data

### Continual Pre-training Data

We introduce Agentic CPT to deep research agent training, creating powerful agentic foundation models for post-training. We propose AgentFounder, a systematic and scalable solution for large-scale data synthesis that creates a data flywheel fed by data from the post-training pipeline.

**Data Reorganization and Question Construction.**
We continuously collect data from various sources, including documents, publicly available crawled data, knowledge graphs, and historical trajectories and tool-invocation records (e.g., search results with links). As shown in the figure, these diverse sources are restructured into an entity-anchored open-world knowledge memory. Based on randomly sampled entities and their corresponding knowledge, we generate multi-style (question, answer) pairs.

**Action Synthesis.** Based on diverse problems and historical trajectories, we construct both first-order and higher-order action-synthesis data. Our method enables large-scale, comprehensive exploration of the potential reasoning-action space within offline environments, thereby eliminating the need for additional commercial tool API calls. Specifically, for higher-order action synthesis, we remodel trajectories as multi-step decision-making processes to strengthen the model's decision-making capabilities.

### Post-training Data

…
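To make the data-reorganization step concrete, here is a minimal sketch of entity-anchored question construction: facts from mixed sources are grouped by entity into a memory, then an entity and one of its facts are sampled to render a templated (question, answer) pair. All names, record schemas, and templates below are illustrative assumptions, not the actual AgentFounder pipeline.

```python
# Hedged sketch: entity-anchored knowledge memory -> sampled (question, answer) pairs.
# The (entity, attribute, value) schema and templates are illustrative stand-ins.
import random
from collections import defaultdict

def build_memory(records):
    """Group (entity, attribute, value) facts into an entity-anchored memory."""
    memory = defaultdict(dict)
    for entity, attribute, value in records:
        memory[entity][attribute] = value
    return memory

def synthesize_qa(memory, rng):
    """Sample an entity and one of its facts; render one of several question styles."""
    entity = rng.choice(sorted(memory))
    attribute, value = rng.choice(sorted(memory[entity].items()))
    templates = [  # "multi-style" questions over the same underlying fact
        f"What is the {attribute} of {entity}?",
        f"Regarding {entity}, state its {attribute}.",
    ]
    return rng.choice(templates), value

# Toy facts standing in for documents, crawled data, and knowledge graphs.
records = [
    ("Mount Everest", "height", "8849 m"),
    ("Mount Everest", "country", "Nepal/China"),
    ("Nile", "length", "6650 km"),
]
memory = build_memory(records)
question, answer = synthesize_qa(memory, random.Random(0))
print(question, "->", answer)
```

In a real pipeline the sampled knowledge would span multiple entities and hops; the point here is only the data-flywheel shape: restructure first, then sample and template at scale.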
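The vanilla ReAct inference setup mentioned earlier can be sketched as a plain Thought/Action/Observation loop. The scripted model and the `search` tool below are deterministic stand-ins for demonstration; they are assumptions, not Tongyi DeepResearch's actual interfaces.

```python
# Minimal ReAct-style loop: the model emits either a tool call ("Action: ...")
# or a terminal answer ("Final Answer: ..."); tool outputs are fed back as
# observations. The model and tools here are illustrative stand-ins.

def react_loop(llm, tools, question, max_steps=8):
    """Run a ReAct loop until the model emits a final answer or steps run out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model sees the full transcript so far
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted

# Deterministic scripted "model": search once, then answer from the observation.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return "Action: search capital of France"
    return "Final Answer: Paris"

tools = {"search": lambda query: "Paris is the capital of France."}
print(react_loop(scripted_llm, tools, "What is the capital of France?"))  # -> Paris
```

No prompt engineering appears in the loop itself, which matches the post's point: with a strong enough agentic model, the bare ReAct scaffold is sufficient, and Heavy Mode layers test-time scaling on top rather than changing this basic cycle.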
