## From Chatbot to Autonomous Agent

We are proud to present Tongyi DeepResearch, the first fully open-source Web Agent to achieve performance on par with OpenAI's DeepResearch across a comprehensive suite of benchmarks. Tongyi DeepResearch demonstrates state-of-the-art results, scoring 32.9 on the academic reasoning task Humanity's Last Exam (HLE), 43.4 on BrowseComp and 46.7 on BrowseComp-ZH for extremely complex information-seeking tasks, and 75 on the user-centric xbench-DeepSearch benchmark, systematically outperforming all existing proprietary and open-source Deep Research agents.

Beyond the model, we share a complete and battle-tested methodology for creating such advanced agents. Our contribution details a novel data synthesis solution applied across the entire training pipeline, from Agentic Continual Pre-training (CPT) and Supervised Fine-Tuning (SFT) for cold-starting, to the final Reinforcement Learning (RL) stage. For RL, we provide a full-stack solution, including algorithmic innovations, automated data curation, and robust infrastructure. For inference, the vanilla ReAct framework showcases the model's powerful intrinsic capabilities without any prompt engineering, while the advanced Heavy Mode (test-time scaling) demonstrates the upper limits of its complex reasoning and planning potential.

## Continual Pre-training and Post-training Empowered by Fully Synthetic Data

### Continual Pre-training Data

We introduce Agentic CPT to deep research agent training, creating powerful agentic foundation models for post-training. We propose AgentFounder, a systematic and scalable solution for large-scale data synthesis that creates a data flywheel fed by data from the post-training pipeline.

**Data Reorganization and Question Construction.**
We continuously collect data from various sources, including documents, publicly available crawled data, knowledge graphs, and historical trajectories and tool-invocation records (e.g., search results with links). As shown in the figure, these diverse sources are restructured into an entity-anchored open-world knowledge memory. Based on randomly sampled entities and their corresponding knowledge, we generate multi-style (question, answer) pairs.

**Action Synthesis.** Based on diverse problems and historical trajectories, we construct both first-order and higher-order action-synthesis data. Our method enables large-scale, comprehensive exploration of the potential reasoning-action space within offline environments, thereby eliminating the need for additional commercial tool API calls. Specifically, for higher-order action synthesis, we remodel trajectories as multi-step decision-making processes to strengthen the model's decision-making capabilities.

### Post-training Data

…
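To make the data-reorganization step concrete, here is a minimal sketch of entity-anchored question construction: facts from mixed sources are grouped by entity into a memory, then an entity and one of its facts are sampled to render a templated (question, answer) pair. All names, record schemas, and templates below are illustrative assumptions, not the actual AgentFounder pipeline.

```python
# Hedged sketch: entity-anchored knowledge memory -> sampled (question, answer) pairs.
# The (entity, attribute, value) schema and templates are illustrative stand-ins.
import random
from collections import defaultdict

def build_memory(records):
    """Group (entity, attribute, value) facts into an entity-anchored memory."""
    memory = defaultdict(dict)
    for entity, attribute, value in records:
        memory[entity][attribute] = value
    return memory

def synthesize_qa(memory, rng):
    """Sample an entity and one of its facts; render one of several question styles."""
    entity = rng.choice(sorted(memory))
    attribute, value = rng.choice(sorted(memory[entity].items()))
    templates = [  # "multi-style" questions over the same underlying fact
        f"What is the {attribute} of {entity}?",
        f"Regarding {entity}, state its {attribute}.",
    ]
    return rng.choice(templates), value

# Toy facts standing in for documents, crawled data, and knowledge graphs.
records = [
    ("Mount Everest", "height", "8849 m"),
    ("Mount Everest", "country", "Nepal/China"),
    ("Nile", "length", "6650 km"),
]
memory = build_memory(records)
question, answer = synthesize_qa(memory, random.Random(0))
print(question, "->", answer)
```

In a real pipeline the sampled knowledge would span multiple entities and hops; the point here is only the data-flywheel shape: restructure first, then sample and template at scale.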
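The vanilla ReAct inference setup mentioned earlier can be sketched as a plain Thought/Action/Observation loop. The scripted model and the `search` tool below are deterministic stand-ins for demonstration; they are assumptions, not Tongyi DeepResearch's actual interfaces.

```python
# Minimal ReAct-style loop: the model emits either a tool call ("Action: ...")
# or a terminal answer ("Final Answer: ..."); tool outputs are fed back as
# observations. The model and tools here are illustrative stand-ins.

def react_loop(llm, tools, question, max_steps=8):
    """Run a ReAct loop until the model emits a final answer or steps run out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model sees the full transcript so far
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted

# Deterministic scripted "model": search once, then answer from the observation.
def scripted_llm(transcript):
    if "Observation:" not in transcript:
        return "Action: search capital of France"
    return "Final Answer: Paris"

tools = {"search": lambda query: "Paris is the capital of France."}
print(react_loop(scripted_llm, tools, "What is the capital of France?"))  # -> Paris
```

No prompt engineering appears in the loop itself, which matches the post's point: with a strong enough agentic model, the bare ReAct scaffold is sufficient, and Heavy Mode layers test-time scaling on top rather than changing this basic cycle.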
