Ikuya Yamada
December 11, 2025

An Open and Reproducible Deep Research Agent for Long-Form Question Answering

Slides for an open and reproducible deep-research system, a winning entry in the NeurIPS 2025 MMU-RAG competition.

Transcript

  1. An Open and Reproducible Deep Research Agent for Long-Form Question Answering

    Ikuya Yamada (1,2), Wataru Ikeda (2), Ko Yoshida (2), Mengyu Ye (2), Hinata Sugimoto (2), Masatoshi Suzuki (1,2), Hisanori Ozaki (3), Jun Suzuki (2,4,5)
    (1) Studio Ousia, (2) Tohoku University, (3) DENTSU SOKEN INC., (4) RIKEN, (5) NII LLMC
  2. Open and Reproducible Deep Research

    • Open-weight models: Qwen3 LLM and Qwen3 Reranker
    • Reproducible search API built on an open corpus: DeepResearchGym API and ClueWeb22 corpus
    • Open-sourced code
  3. System Overview

    [Diagram: Question → Search Tool (Search API → retrieved documents → Reranker → reranked documents → Summarizer) → document summaries → Research Agent → Answer. Inference can span multiple turns.]

    The search tool generates summaries of web search results.

    The research agent manages the entire reasoning flow:
    • Decides whether additional searches are required at each turn
    • Invokes the search tool if additional search is needed
    • Generates the final answer at the end
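The control flow described on this slide can be sketched as a simple loop. This is a minimal illustration and not the authors' implementation: `AgentStep`, `run_research_agent`, the `policy` and `search_tool` callables, and the `max_turns` cap are hypothetical names introduced here, and the toy functions stand in for the LLM and the search tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AgentStep:
    """One decision by the research agent (hypothetical structure)."""
    query: Optional[str] = None   # set when the agent wants another search
    answer: Optional[str] = None  # set when the agent is ready to answer


def run_research_agent(
    question: str,
    policy: Callable[[str, List[str]], AgentStep],
    search_tool: Callable[[str], str],
    max_turns: int = 5,  # assumed cap; the slides do not state a limit
) -> str:
    """Decide at each turn whether to search again or to answer."""
    summaries: List[str] = []
    for _ in range(max_turns):
        step = policy(question, summaries)
        if step.answer is not None:
            return step.answer
        # The agent asked for more evidence: invoke the search tool.
        summaries.append(search_tool(step.query or question))
    # Turn budget exhausted: ask once more, fall back to an empty answer.
    final = policy(question, summaries)
    return final.answer or ""


# Toy stand-ins for the LLM policy and the search tool.
def toy_policy(question: str, summaries: List[str]) -> AgentStep:
    if not summaries:
        return AgentStep(query=question)
    return AgentStep(answer=f"Answer based on {len(summaries)} summary")


def toy_search_tool(query: str) -> str:
    return f"summary for: {query}"


result = run_research_agent(
    "what has the pandemic affected", toy_policy, toy_search_tool
)
```

With the toy policy, the agent searches once and then answers. In a real system the policy would be a prompted reasoning model that emits either a tool call or a final answer.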
  4. System Overview: Search Tool

    • Search API: ClueWeb22 API (provided by the competition organizers)
    • Reranker: Qwen3-Reranker-0.6B
    • Summarizer: Qwen3-Next-80B-A3B-Thinking (untuned)

    No training performed; off-the-shelf models used with prompting.
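The three stages of the search tool (retrieve, rerank, summarize) compose into a single function. The sketch below only illustrates that composition: the function signature, the `top_k` cutoff, and the toy retriever, scorer, and summarizer are assumptions made for this example and do not reflect the actual ClueWeb22 API or Qwen3 model calls.

```python
from typing import Callable, List


def search_tool(
    query: str,
    retrieve: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    summarize: Callable[[str, List[str]], str],
    top_k: int = 3,  # assumed cutoff; not given in the slides
) -> str:
    """Retrieve documents, rerank them by relevance, summarize the best ones."""
    docs = retrieve(query)
    reranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return summarize(query, reranked[:top_k])


# Toy stand-ins so the sketch runs without any model or network access.
corpus = ["remote work rose", "online learning expanded", "cats sleep a lot"]


def toy_retrieve(query: str) -> List[str]:
    return list(corpus)


def toy_score(query: str, doc: str) -> float:
    # Word overlap as a crude stand-in for a reranker relevance score.
    return float(len(set(query.split()) & set(doc.split())))


def toy_summarize(query: str, docs: List[str]) -> str:
    return " | ".join(docs)


summary = search_tool(
    "online learning", toy_retrieve, toy_score, toy_summarize, top_k=2
)
```

Passing the stages in as callables keeps the pipeline independent of any particular search backend or model, which mirrors the off-the-shelf, prompting-only setup described on the slide.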
  5. System Overview: Research Agent

    Built on a reasoning model (Qwen3-Next-80B-A3B-Thinking):
    • Prompt-based control for reasoning, tool use, citation generation, and answer generation

    Enhanced the reasoning quality by preference tuning:
    • Direct Preference Optimization (DPO) with LoRA
    • Synthetic preference data based on LLM-as-a-judge signals
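For reference, the standard DPO objective (Rafailov et al., 2023) is reproduced below. The slides do not state which variant or hyperparameters were used, so the symbols follow the general formulation rather than the authors' configuration: $\pi_\theta$ is the policy being tuned, $\pi_{\mathrm{ref}}$ the frozen reference policy, $\beta$ a temperature, $x$ the question, $y_w$ the chosen answer, and $y_l$ the rejected answer.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

When DPO is combined with LoRA, only the low-rank adapter weights inside $\pi_\theta$ are updated, while the base model weights stay fixed.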
  6. Citation Generation

    • Summarizer: instructed to insert citation markers in the summary
    • Research agent: instructed to place citation markers in the appropriate positions in the answer

    A simple prompt-based approach works well for citation generation.
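The example answers later in this deck cite sources with markers such as `(#e070,#de63)`. Assuming that shape (a `#` followed by hexadecimal characters), the cited IDs can be pulled out of an answer with a short regular expression. The pattern and the `extract_citations` helper are inferred from those examples for illustration only; they are not a documented format.

```python
import re
from typing import List

# Assumed marker shape, inferred from the deck's example answers:
# '#' followed by one or more lowercase hexadecimal characters.
MARKER = re.compile(r"#([0-9a-f]+)")


def extract_citations(text: str) -> List[str]:
    """Return the distinct citation IDs in order of first appearance."""
    seen: List[str] = []
    for marker_id in MARKER.findall(text):
        if marker_id not in seen:
            seen.append(marker_id)
    return seen


answer = (
    "Education shifted to online learning (#e070,#dc5a,#c4dc,#345a). "
    "Mental health challenges have become widespread (#e070,#8fc7)."
)
ids = extract_citations(answer)
```

A check like this is useful for verifying that every marker in the final answer also appears in one of the document summaries the agent received.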
  7. Preference Tuning: Metrics

    Three LLM-as-a-judge-based metrics for evaluating answer quality (Coelho et al., 2025):
    • Clarity: reflects logical coherence and linguistic fluency
    • Insightfulness: captures analytical nuance and depth of reasoning
    • Factuality: measures consistency with underlying facts

    All metrics are scored by OpenAI’s o3-mini.
  8. Preference Tuning: Input Questions

    A diverse set of factoid and non-factoid questions to reflect real-world diversity:
    • 500 questions from Researchy Questions: non-factoid questions sampled from Bing
    • 200 questions from Natural Questions: factoid questions sampled from Google
    • 300 questions from Competition Validation Set: questions designed to match the competition setting
  9. Preference Tuning: Dataset & Training

    Dataset
    • Rollout Generation: Generate 20 rollouts with the untuned model
    • Preference Score Evaluation: Compute the metrics for each answer
    • Preference Pair Construction:
        • Highest score → Chosen
        • Lowest score → Rejected
        • Tie → More search calls as Chosen; fewer search calls as Rejected

    Training
    • Direct Preference Optimization (DPO) with LoRA for one epoch
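The pair-construction rule above can be sketched in a few lines. The slide does not spell out how the three metrics are combined into one score or exactly how ties are resolved, so this example makes two assumptions: each rollout already carries a single aggregated `score`, and ties are broken by search count (more calls for the chosen side, fewer for the rejected side). `Rollout` and `build_preference_pair` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rollout:
    answer: str
    score: float       # aggregated judge score (the aggregation is assumed)
    search_calls: int  # number of search-tool invocations in this rollout


def build_preference_pair(rollouts: List[Rollout]) -> Tuple[Rollout, Rollout]:
    """Pick (chosen, rejected) from the rollouts for one question."""
    # Chosen: highest score; among equal scores, prefer more search calls.
    chosen = max(rollouts, key=lambda r: (r.score, r.search_calls))
    # Rejected: lowest score; among equal scores, prefer fewer search calls.
    rejected = min(rollouts, key=lambda r: (r.score, r.search_calls))
    return chosen, rejected


rollouts = [
    Rollout("a", 8.0, 1),
    Rollout("b", 8.0, 2),
    Rollout("c", 5.0, 3),
    Rollout("d", 5.0, 1),
]
chosen, rejected = build_preference_pair(rollouts)
```

Here rollout `b` is chosen over `a` because it used more searches at the same score, and `d` is rejected over `c` because it used fewer. If every rollout had an identical score and search count, both sides would be the same rollout, so a real pipeline would need to skip such questions.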
  10. Experimental Results

    • All LLM-based metrics improve after preference tuning
    • Search count slightly increased

    Results on 100 test samples from Researchy Questions:

    Model              Clarity   Insightfulness   Factuality   Search Count
    Vanilla            6.71      6.52             43.4%        1.03
    Preference-tuned   8.18      7.50             44.3%        1.20

    (Clarity, Insightfulness, and Factuality are the LLM-based metrics.)
  11. Question: what has the pandemic affected

    Response of Vanilla model:

    The pandemic has significantly impacted various aspects of society. Economically, lockdowns affected small and large businesses, leading to bankruptcies and increased unemployment (#e070,#de63,#0071). Socially, interactions between families and friends have decreased due to social distancing, with people enclosed in their homes and relying on digital platforms (#e070,#de63,#c4dc). Education shifted to online learning, causing challenges such as poor internet connectivity and reduced interaction between students and teachers (#e070,#dc5a,#c4dc,#345a). Health impacts include direct effects from the virus and indirect issues like obesity, heart disease, and stress from inactivity (#e070,#345a,#8fc7). Mental health challenges such as anxiety, depression, and social isolation have become widespread (#e070,#8fc7). The pandemic also influenced international relations, with countries collaborating on vaccine distribution and humanitarian aid (#e070). Additionally, it affected religion through increased individual spiritual practices but reduced collective gatherings (#e070), and strengthened nationalism through community volunteer efforts (#e070). Work environments transitioned to remote settings, though many lost jobs in non-essential sectors (#0071,#de63,#af16). These multifaceted consequences underscore the profound and far-reaching nature of the pandemic’s influence across nearly every dimension of human life, from individual well-being to global systems.

    Response of preference-tuned model:

    The pandemic has profoundly reshaped nearly every facet of human life and the natural world, with impacts spanning economic, social, educational, health, environmental, and geopolitical domains. Below is a detailed breakdown of these effects, supported by verified sources:

    Economically, lockdowns and restrictions caused widespread disruption, leading to business closures, bankruptcies, and a surge in unemployment. Small and large enterprises alike faced unprecedented challenges, with non-essential sectors experiencing severe job losses (#e070,#de63,#0071). This economic strain rippled through global supply chains and consumer markets, exacerbating inequalities and slowing recovery in many regions.

    Socially, physical distancing measures drastically reduced face-to-face interactions among families and friends, forcing communities to rely heavily on digital platforms for communication and connection. While technology enabled virtual gatherings, it also highlighted disparities in access to reliable internet and digital literacy, particularly in underserved populations (#e070,#de63,#c4dc).

    Education systems underwent a rapid shift to online learning, which introduced significant challenges. Students and educators grappled with poor internet connectivity, lack of access to devices, and diminished opportunities for collaborative learning. These issues disproportionately affected marginalized groups, widening educational gaps and raising concerns about long-term academic outcomes (#e070,#dc5a,#c4dc,#345a).

    Health impacts were both direct and indirect. Beyond the immediate threat of the virus itself, the pandemic exacerbated chronic conditions such as obesity and heart disease due to reduced physical activity and disrupted healthcare services. Stress from uncertainty, isolation, and economic instability further compounded physical health challenges (#e070,#345a,#8fc7).

    Mental health crises became widespread, with anxiety, depression, and social isolation affecting millions globally. The loss of routines, grief from bereavement, and fear of infection created a mental health epidemic that strained healthcare systems and highlighted the need for accessible psychological support (#e070,#8fc7).

    Internationally, the pandemic spurred unprecedented collaboration in vaccine development and distribution, with countries sharing resources and data to combat the virus. Humanitarian aid efforts also expanded to support vulnerable populations, though geopolitical tensions sometimes hindered global cooperation (#e070).

    Religious practices adapted significantly, with increased individual spiritual activities such as home-based prayers and virtual services, while collective gatherings like congregational worship were severely restricted. This shift reshaped community dynamics within faith-based organizations and altered traditional rituals (#e070).

    Nationalism was both reinforced and challenged during the pandemic. While some nations saw heightened community solidarity through volunteer efforts and mutual aid, others experienced political polarization and border closures that intensified nationalist sentiments (#e070).

    Work environments transitioned to remote setups for many, accelerating the adoption of digital tools and flexible work models. However, this shift was uneven, with essential workers in healthcare, retail, and logistics facing higher risks, while others lost jobs entirely in sectors like hospitality and tourism (#0071,#de63,#af16).

    Critically, the pandemic also had profound environmental consequences. Air quality improved dramatically in many regions, with nitrogen dioxide levels dropping by up to 40% in China, 30% in U.S. cities, and 60% in parts of the U.K. due to reduced industrial activity and transportation (#9759). Water pollution decreased in areas such as India’s Dal Lake and the San Francisco Bay, where suspended particulate matter fell by 15.9% (#9759). Greenhouse gas emissions declined globally as lockdowns curtailed energy use and travel (#170b,#8824,#9759,#e1e1,#064d). Wildlife returned to urban areas, and seismic noise—measured by ground vibrations—decreased by up to 50% in some regions, reflecting reduced human activity (#9759). However, these positive trends were largely temporary. As economies reopened, emissions rebounded rapidly, underscoring the fragility of short-term environmental gains (#9759,#064d).

    Negative impacts included a surge in medical waste and plastic pollution, with approximately 75% of pandemic-related plastic waste—such as masks, gloves, and packaging—ending up in landfills or oceans (#9759). Improper disposal of personal protective equipment (PPE) contaminated ecosystems, while reduced monitoring in protected areas led to increased illegal deforestation, fishing, and wildlife poaching (#8824,#064d). Waste management systems were overwhelmed, and recycling efforts declined due to safety concerns, exacerbating pollution challenges (#064d).

    Experts emphasize that while the pandemic demonstrated nature’s resilience and the potential for rapid environmental improvement through reduced human activity, these changes are not sustainable without systemic policy changes. Addressing root causes like habitat loss, pollution, and climate change remains critical to preventing future crises (#170b,#8824,#da1e).

    This multifaceted impact underscores the pandemic’s role as a catalyst for both disruption and reflection, revealing vulnerabilities in global systems while offering lessons for building more resilient and equitable societies in the future.

    The entire response becomes much longer and more detailed.
  12. Question: what has the pandemic affected

    Vanilla (education):
    Education shifted to online learning, causing challenges such as poor internet connectivity and reduced interaction between students and teachers (#e0,#dc,#c4,#34).

    Preference-tuned (education):
    Education systems underwent a rapid shift to online learning, which introduced significant challenges. Students and educators grappled with poor internet connectivity, lack of access to devices, and diminished opportunities for collaborative learning. These issues disproportionately affected marginalized groups, widening educational gaps and raising concerns about long-term academic outcomes (#e0,#dc,#c4,#34).

    Vanilla (work env.):
    Work environments transitioned to remote settings, though many lost jobs in non-essential sectors (#00,#de,#af).

    Preference-tuned (work env.):
    Work environments transitioned to remote setups for many, accelerating the adoption of digital tools and flexible work models. However, this shift was uneven, with essential workers in healthcare, retail, and logistics facing higher risks, while others lost jobs entirely in sectors like hospitality and tourism (#00,#de,#af).
  13. Question: what has the pandemic affected

    (Same education and work-environment comparison as the previous slide, with one observation added.)

    The explanation becomes more detailed and specific, noting broader social implications.
  14. Question: what has the pandemic affected

    Vanilla (opening):
    The pandemic has significantly impacted various aspects of society. Economically,...

    Preference-tuned (opening):
    The pandemic has profoundly reshaped nearly every facet of human life and the natural world, with impacts spanning economic, social, educational, health, environmental, and geopolitical domains. Below is a detailed breakdown of these effects, supported by verified sources: Economically…

    The opening becomes more detailed and better organized.

    Vanilla (ending):
    These multifaceted consequences underscore the profound and far-reaching nature of the pandemic’s influence across nearly every dimension of human life, from individual well-being to global systems.

    Preference-tuned (ending):
    This multifaceted impact underscores the pandemic’s role as a catalyst for both disruption and reflection, revealing vulnerabilities in global systems while offering lessons for building more resilient and equitable societies in the future.

    The ending becomes more interpretive and assertive.
  15. Summary

    We presented an open deep research system enhanced by DPO training with high-quality synthetic data and an improved search tool.

    Source Code: github.com/efficient-deep-research/efficient-deep-research
    Contact: [email protected] ikuyamada