Daily Research Briefing 2026-04-25

📅 Generated: 2026-04-25 22:50 (Asia/Shanghai) | Sources: arXiv · GitHub · Hacker News · tech media · big-tech blogs


📄 I. Latest arXiv Papers

1. Seeing Fast and Slow: Learning the Flow of Time in Videos

  • Area: arXiv/CV
  • Summary: How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in …
  • Why recommended: Treats the passage of time as a learnable visual concept, covering both detecting speed changes in a video and generating video at different speeds.
  • Link: https://arxiv.org/abs/2604.21931v1

2. Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

  • Area: arXiv/CV
  • Summary: Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framewor…
  • Why recommended: Explores 4D perception without cameras, reconstructing human motion and 3D scene layout from wearable sensors alone, which sidesteps the privacy and energy costs of vision.
  • Link: https://arxiv.org/abs/2604.21926v1

3. Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

  • Area: arXiv/LG
  • Summary: Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal partitioning. We argue that this temporal taskification step is not a neutral preprocessing choice, but a structural component of evaluation: different valid splits of the same stream can induce different CL regimes and therefore different benchmark conclusions. To study this…
  • Why recommended: Argues that the way a stream is split into tasks can change benchmark conclusions, a caveat for anyone comparing continual-learning results.
  • Link: https://arxiv.org/abs/2604.21930v1
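
To make the idea of temporal taskification concrete, here is a minimal toy sketch, not the paper's method. The `taskify` helper, the six-sample stream, and the boundary timestamps are all hypothetical; the only point is that two equally plausible partitions of one stream can yield tasks with different class structure.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def taskify(stream: Sequence[Tuple[float, str]],
            boundaries: Sequence[float]) -> List[List[str]]:
    """Split a stream of (timestamp, label) pairs into tasks.

    `boundaries` must be sorted; a sample with timestamp t goes to the
    task whose index equals the number of boundaries that are <= t.
    (Hypothetical helper for illustration only.)
    """
    tasks: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for timestamp, label in stream:
        tasks[bisect_right(boundaries, timestamp)].append(label)
    return tasks


# A toy stream in which the class drifts from "a" to "b" to "c" over time.
stream = [(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "c")]

# Split A: three tasks, each containing exactly one class.
split_a = taskify(stream, boundaries=[2, 4])

# Split B: two tasks that share class "b" across the boundary.
split_b = taskify(stream, boundaries=[3])

print([sorted(set(task)) for task in split_a])
print([sorted(set(task)) for task in split_b])
```

Under split A the tasks have disjoint classes, while under split B one class appears in both tasks, so a method evaluated on the two splits is effectively facing two different problems.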

4. Fine-Tuning Regimes Define Distinct Continual Learning Problems

  • Area: arXiv/LG
  • Summary: Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable parameter subspace, is itself a key evaluation variable. We formalize adaptation…
  • Why recommended: Argues that the fine-tuning regime (which parameters are trainable) is itself an evaluation variable in continual-learning benchmarks.
  • Link: https://arxiv.org/abs/2604.21927v1

5. When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

  • Area: arXiv/AI
  • Summary: Despite impressive progress in capabilities of large vision-language models (LVLMs), these systems remain vulnerable to hallucinations, i.e., outputs that are not grounded in the visual input. Prior work has attributed hallucinations in LVLMs to factors such as limitations of the vision backbone or the dominance of the language component, yet the relative importance of these factors remains unclea…
  • Why recommended: A representative example of a recent technical trend.
  • Link: https://arxiv.org/abs/2604.21911v1

6. From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation

  • Area: arXiv/AI
  • Summary: Scientific workflow systems automate execution – scheduling, fault tolerance, resource management – but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets…
  • Why recommended: Web and code agents are reaching real-world deployment faster; notable value for engineering practice.
  • Link: https://arxiv.org/abs/2604.21910v1

7. Evaluation of Automatic Speech Recognition Using Generative Large Language Models

  • Area: arXiv/CL
  • Summary: Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human perception, but decoder-based Large Language Models (LLMs) remain underexplored for this task. This paper evaluates their relevance through three approaches: (1) selecting the best hypothesis between tw…
  • Why recommended: Examines whether decoder-based LLMs can judge ASR output more meaningfully than WER, which ignores semantics.
  • Link: https://arxiv.org/abs/2604.21928v1
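
As background for the claim that WER is insensitive to meaning, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name and the example sentences are illustrative assumptions and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the patient has no fever"
flips_meaning = "the patient has a fever"  # one substitution
keeps_meaning = "patient has no fever"     # one deletion

print(word_error_rate(reference, flips_meaning))
print(word_error_rate(reference, keeps_meaning))
```

Both hypotheses are one word edit away from the five-word reference, so they receive the same score even though only the first one reverses the meaning.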

8. MathDuels: Evaluating LLMs as Problem Posers and Solvers

  • Area: arXiv/CL
  • Summary: As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they cast models solely as solvers of fixed problem sets. We introduce MathDuels, a self-play benchmark in which models occupy dual roles: each authors math problems under adversarial prompting and solves pr…
  • Why recommended: Proposes a self-play benchmark in which models both pose and solve problems, aimed at separating frontier models that saturate static math benchmarks.
  • Link: https://arxiv.org/abs/2604.21916v1

🌟 II. Trending GitHub Projects

1. kyegomez/OpenMythos

  • Stars: ⭐ 10,274 · Python
  • Description: A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
  • Why recommended: Community feedback suggests it has some value; worth a quick skim.
  • Link: https://github.com/kyegomez/OpenMythos

2. alchaincyf/huashu-design

  • Stars: ⭐ 6,366 · HTML
  • Description: Huashu Design · HTML-native design skill for Claude Code · high-fidelity prototypes / slides / animations + 20 design philosophies + 5-dimension review + MP4 export · Agent-agnostic
  • Why recommended: Packages high-fidelity HTML design work (prototypes, slides, animations) as a reusable agent skill.
  • Link: https://github.com/alchaincyf/huashu-design

3. ConardLi/garden-skills

  • Stars: ⭐ 1,182 · CSS
  • Description: An AI agent skill that transforms AI-generated web pages from “functional” to “stunning.” (Inspired by Claude Design)
  • Why recommended: An agent skill aimed at the visual polish of AI-generated web pages.
  • Link: https://github.com/ConardLi/garden-skills

4. leigest519/OpenGame

  • Stars: ⭐ 1,057 · TypeScript
  • Description: OpenGame: Open Agentic Coding for Games
  • Why recommended: Applies agentic coding to game development.
  • Link: https://github.com/leigest519/OpenGame

5. cosmicstack-labs/mercury-agent

  • Stars: ⭐ 891 · TypeScript
  • Description: Soul-driven AI agent with permission-hardened tools, token budgets, and multi-channel access. Runs 24/7 from CLI or Telegram.
  • Why recommended: Agents are a core direction in current AI engineering, marking the shift of large models from "conversation" to "execution".
  • Link: https://github.com/cosmicstack-labs/mercury-agent

6. ZeroZ-lab/cc-design

  • Stars: ⭐ 630 · JavaScript
  • Description: High-fidelity HTML design and prototype guidance skill for AI agents
  • Why recommended: A representative example of a recent technical trend.
  • Link: https://github.com/ZeroZ-lab/cc-design

7. GammaLabTechnologies/harmonist

  • Stars: ⭐ 521 · Python
  • Description: Portable AI agent orchestration with mechanical protocol enforcement. 186 agents, zero runtime dependencies.
  • Why recommended: Agents are a core direction in current AI engineering, marking the shift of large models from "conversation" to "execution".
  • Link: https://github.com/GammaLabTechnologies/harmonist

8. TheRealSeanDonahoe/agents-md

  • Stars: ⭐ 510
  • Description: Drop-in AGENTS.md that makes every coding agent behave like a senior engineer instead of an eager intern. Kills sycophancy, stops drive-by refactors, forces verification loops. Synthesizes Karpathy’s…
  • Why recommended: A drop-in AGENTS.md that targets sycophancy, drive-by refactors, and missing verification in coding agents.
  • Link: https://github.com/TheRealSeanDonahoe/agents-md

📰 III. AI Tech Media & Big-Tech Blogs

1. Meta’s loss is Thinking Machines’ gain

  • Source: TechCrunch AI · AI media
  • Summary: Meta has been poaching talent from Thinking Machines Lab. But it’s a two-way street.
  • Why recommended: Offers some fresh ideas and perspectives; useful for broadening your view.
  • Link: https://techcrunch.com/2026/04/24/metas-loss-is-thinking-machines-gain/

2. ComfyUI hits $500M valuation as creators seek more control over AI-generated media

  • Source: TechCrunch AI · AI media
  • Summary: ComfyUI, whose tools give creators more control over AI image, video, and audio generation, just raised $30 million.
  • Why recommended: Signals investor interest in tools that give creators finer control over AI image, video, and audio generation.
  • Link: https://techcrunch.com/2026/04/24/comfyui-hits-500m-valuation-as-creators-seek-more-control-over-ai-generated-media/

3. Google to invest up to $40B in Anthropic in cash and compute

  • Source: TechCrunch AI · AI media
  • Summary: Google plans up to $40B investment in Anthropic as AI rivals race to secure massive compute capacity, following the limited release of its powerful, cybersecurity-focused Mythos model.
  • Why recommended: A very large cash-and-compute commitment that illustrates the race among AI labs to secure compute capacity.
  • Link: https://techcrunch.com/2026/04/24/google-to-invest-up-to-40b-in-anthropic-in-cash-and-compute/

4. Discord Sleuths Gained Unauthorized Access to Anthropic’s Mythos

  • Source: Wired AI · AI media
  • Summary: Plus: Spy firms tap into a global telecom weakness to track targets, 500,000 UK health records go up for sale on Alibaba, Apple patches a revealing notification bug, and more.
  • Why recommended: Has sparked considerable discussion; worth reading for industry perspectives.
  • Link: https://www.wired.com/story/security-news-this-week-discord-sleuths-gained-unauthorized-access-to-anthropics-mythos/

5. Ace the Ping-Pong Robot Can Whup Your Ass

  • Source: Wired AI · AI media
  • Summary: Ace can read the trajectory of a ball, adjust the racket angle, and respond with strokes that keep the exchange alive with real players.
  • Why recommended: One of the current hot topics; no harm in taking a look.
  • Link: https://www.wired.com/story/ace-the-robot-wants-to-become-the-world-table-tennis-champion/

6. AI-Designed Drugs by a DeepMind Spinoff Are Headed to Human Trials

  • Source: Wired AI · AI media
  • Summary: Isomorphic Labs president Max Jaderberg said at WIRED Health in London that the startup has built a “broad and exciting pipeline of new medicines.”
  • Why recommended: Community feedback suggests it has some value; worth a quick skim.
  • Link: https://www.wired.com/story/wired-health-2026-how-ai-is-powering-drug-discovery-max-jaderberg/

7. Why are top university websites serving porn? It comes down to shoddy housekeeping.

  • Source: Ars Technica · AI media
  • Summary: Hundreds of subdomains from dozens of universities have been hijacked by scammers.
  • Why recommended: Community feedback suggests it has some value; worth a quick skim.
  • Link: https://arstechnica.com/security/2026/04/why-are-top-university-websites-serving-porn-it-comes-down-to-shoddy-housekeeping/

8. In a first, a ransomware family is confirmed to be quantum-safe

  • Source: Ars Technica · AI media
  • Summary: Technically speaking, there’s no practical benefit to use PQC. So why is it being used?
  • Why recommended: A security story on ransomware adopting post-quantum cryptography (PQC) despite the lack of practical benefit.
  • Link: https://arstechnica.com/security/2026/04/now-even-ransomware-is-using-post-quantum-cryptography/

9. Microsoft issues emergency update for macOS and Linux ASP.NET threat

  • Source: Ars Technica · AI media
  • Summary: When authentication fails, things can go very, very wrong.
  • Why recommended: Offers some fresh ideas and perspectives; useful for broadening your view.
  • Link: https://arstechnica.com/security/2026/04/microsoft-issues-emergency-update-for-macos-and-linux-asp-net-threat/

10. Three reasons why DeepSeek’s new model matters

  • Source: MIT Tech Review · AI media
  • Summary: On Friday, Chinese AI firm DeepSeek released a preview of V4, its long-awaited new flagship model. Notably, the model can process much longer prompts than its last generation, thanks to a new design t…
  • Why recommended: DeepSeek V4 is also the top Hacker News story in this briefing, which reflects strong interest from the developer community.
  • Link: https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/

11. The Download: supercharged scams and studying AI healthcare

  • Source: MIT Tech Review · AI media
  • Summary: This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. We’re in a new era of AI-driven scams. When ChatGPT was release…
  • Why recommended: A newsletter roundup covering AI-driven scams and AI in healthcare.
  • Link: https://www.technologyreview.com/2026/04/24/1136400/the-download-supercharged-scams-questionable-ai-healthcare/

12. Health-care AI is here. We don’t know if it actually helps patients.

  • Source: MIT Tech Review · AI media
  • Summary: I don’t need to tell you that AI is everywhere. Or that it is being used, increasingly, in hospitals. Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient rec…
  • Why recommended: Questions whether the AI tools now used in hospitals actually help patients.
  • Link: https://www.technologyreview.com/2026/04/24/1136352/health-care-ai-dont-know-actually-helps-patients/

🔥 IV. Hacker News: Top Stories of the Past 48 Hours

1. DeepSeek v4

  • Popularity: 1987 points · 💬 1516 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://api-docs.deepseek.com/news/news260424
  • HN discussion: https://news.ycombinator.com/item?id=47884971
  • Top comments:
    · hodgehog11: There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting. I have a collection of novel probability and statistics problems at the masters and PhD level with varying…
    · throwa356262: Seriously, why can’t huge companies like OpenAI and Google produce documentation that is half this good?? https://api-docs.deepseek.com/guides/thinking_mode No BS, just a concise description of exactly what I need to write my own agent.
    · orbital-decay: “we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead” Pretty cool, I think they’re the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it…

2. GPT-5.5

  • Popularity: 1552 points · 💬 1032 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://openai.com/index/introducing-gpt-5-5/
  • HN discussion: https://news.ycombinator.com/item?id=47879092
  • Top comments:
    · tedsanders: Just as a heads up, even though GPT-5.5 is releasing today, the rollout in ChatGPT and Codex will be gradual over many hours so that we can make sure service remains stable for everyone (same as our previous launches). You may not see it right away, and if you don’t, try again later in the day. We…
    · simonw: This doesn’t have API access yet, but OpenAI seem to approve of the Codex API backdoor used by OpenClaw these days… https://twitter.com/steipete/status/2046775849769148838 and https://twitter.com/romainhuet/status/2038699202834841962 And that backdoor API has GPT-5.5. So here’s a pelican:…
    · jfkimmes: Everyone talked about the marketing stunt that was Anthropic’s gated Mythos model with an 83% result on CyberGym. OpenAI just dropped GPT 5.5, which scores 82% and is open for anybody to use. I recommend anybody in offensive/defensive cybersecurity to experiment with this. This is the real data…

3. An update on recent Claude Code quality reports

  • Popularity: 925 points · 💬 710 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://www.anthropic.com/engineering/april-23-postmortem
  • HN discussion: https://news.ycombinator.com/item?id=47878905
  • Top comments:
    · dgreensp: This reveals a staggering level of incompetence, if that’s really all it is, and lack of transparency. They don’t have ANY product-level quality tests that picked this up? Many users did their own tests and published them. It’s not hard. And these users’ complaints were initially dismissed. I don’t…
    · 6keZbCECT2uB: “On March 26, we shipped a change to clear Claude’s older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem…
    · cmenge: Bit surprised about the amount of flak they’re getting here. I found the article seemed clear, honest and definitely plausible. The deterioration was real and annoying, and shines a light on the problematic lack of transparency of what exactly is going on behind the scenes and the somewhat…

4. I cancelled Claude: Token issues, declining quality, and poor support

  • Popularity: 905 points · 💬 536 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://nickyreinert.de/en/2026/2026-04-24-claude-critics/
  • HN discussion: https://news.ycombinator.com/item?id=47892019

5. Meta tells staff it will cut 10% of jobs

  • Popularity: 791 points · 💬 859 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://www.bloomberg.com/news/articles/2026-04-23/meta-tells-staff-it-will-cut-10-of-jobs-in-push-for-efficiency
  • HN discussion: https://news.ycombinator.com/item?id=47879986

6. Google plans to invest up to $40B in Anthropic

  • Popularity: 683 points · 💬 672 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://www.bloomberg.com/news/articles/2026-04-24/google-plans-to-invest-up-to-40-billion-in-anthropic
  • HN discussion: https://news.ycombinator.com/item?id=47892074

7. US special forces soldier arrested after allegedly winning $400k on Maduro raid

  • Popularity: 673 points · 💬 726 comments
  • Why recommended: A top HN story with very high engagement; strongly recommended.
  • Link: https://www.cnn.com/2026/04/23/politics/us-special-forces-soldier-arrested-maduro-raid-trade
  • HN discussion: https://news.ycombinator.com/item?id=47882645

8. Sabotaging projects by overthinking, scope creep, and structural diffing

  • Popularity: 476 points · 💬 115 comments
  • Why recommended: Lively HN discussion (115 comments); high community interest.
  • Link: https://kevinlynagh.com/newsletter/2026_04_overthinking/
  • HN discussion: https://news.ycombinator.com/item?id=47890799

📚 Recommended Deep Reads

Type | Title | Area | Notes | Link
📄 Paper | Seeing Fast and Slow: Learning the Flo… | CV | - | arXiv
📄 Paper | Seeing Without Eyes: 4D Human-Scene Un… | CV | - | arXiv
📄 Paper | Temporal Taskification in Streaming Co… | LG | - | arXiv
📄 Paper | Fine-Tuning Regimes Define Distinct Co… | LG | - | arXiv
📄 Paper | When Prompts Override Vision: Prompt-I… | AI | - | arXiv
🌟 Project | kyegomez/OpenMythos | GitHub | Python | GitHub
🌟 Project | alchaincyf/huashu-design | GitHub | HTML | GitHub
🌟 Project | ConardLi/garden-skills | GitHub | CSS | GitHub
🔥 Hot post | DeepSeek v4 | HN | 1987 pts | HN
🔥 Hot post | GPT-5.5 | HN | 1552 pts | HN
🔥 Hot post | An update on recent Claude Code qualit… | HN | 925 pts | HN
🔥 Hot post | I cancelled Claude: Token issues, decl… | HN | 905 pts | HN
🔥 Hot post | Meta tells staff it will cut 10% of jo… | HN | 791 pts | HN