February 25, 2026

TL;DR

  • A startup trained a computer-use model on 11 million hours of screen recordings. FDM-1 compresses 2 hours of video into 1M tokens. It does CAD, drives cars, and fuzzes websites.

  • Anthropic raised $30B at a $380B valuation. Run-rate revenue hit $14B. That's 10x annual growth for three consecutive years.

  • Your anonymous posts aren't anonymous anymore. Researchers showed LLM agents can match your Reddit and HN accounts to your real identity with high precision.

THE BIG ONE: Someone Finally Built a Computer Agent That Watches Video

Every computer-use agent you've seen works the same way. Take a screenshot. Send it to a vision model. Get back a click coordinate. Repeat.

It works for demos. It falls apart for anything that takes longer than 30 seconds.

SI Inc. just released FDM-1, and it breaks that pattern completely. Instead of processing screenshots one at a time, FDM-1 trains directly on video. Their custom video encoder compresses nearly 2 hours of 30fps footage into 1 million tokens. That's 100x more efficient than OpenAI's approach and 50x better than the previous best.
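
Those compression numbers are easy to sanity-check with back-of-envelope math (assuming exactly 2 hours at 30fps; the published figures are approximate):

```python
# Back-of-envelope: how aggressively FDM-1's encoder compresses video.
# Assumes exactly 2 hours of 30fps footage into a 1M-token budget.
frames = 2 * 3600 * 30            # 216,000 frames in 2 hours
tokens = 1_000_000                # context budget for the whole clip
tokens_per_frame = tokens / frames
print(f"{frames} frames -> ~{tokens_per_frame:.1f} tokens per frame")
# A single screenshot in a typical VLM costs on the order of 1,000+ tokens,
# so per-frame cost drops by roughly two orders of magnitude.
```

Under 5 tokens per frame is the whole trick: at that rate, long-horizon history stops being a luxury.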

The training data is wild. They built an 11-million-hour dataset of screen recordings, then trained an inverse dynamics model to label every keystroke and mouse movement automatically. No human annotators needed. Think of it like OpenAI's Video PreTraining for Minecraft, but for every application on a computer.
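
The inverse-dynamics trick is the same one VPT used: given frame t and frame t+1, predict the action in between, then run that model over unlabeled footage to generate labels for free. A toy illustration, with a hand-written rule on cursor positions standing in for the trained model:

```python
# Toy inverse dynamics: infer the action between two observations.
# Real systems train a neural net on (frame_t, frame_t+1) -> action pairs;
# here a hand-written rule on cursor x-positions stands in for that model.
def infer_action(x_before: float, x_after: float) -> str:
    """Label the transition between two consecutive 'frames'."""
    if x_after > x_before:
        return "move_right"
    if x_after < x_before:
        return "move_left"
    return "no_op"

# Auto-label an unlabeled recording by pairing up consecutive frames.
recording = [0.0, 0.2, 0.2, 0.1]   # cursor x-position over 4 frames
labels = [infer_action(a, b) for a, b in zip(recording, recording[1:])]
print(labels)  # ['move_right', 'no_op', 'move_left']
```

Replace the rule with a learned model and the loop with 11 million hours of footage, and you get labeled training data with zero annotators.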

The results speak for themselves. FDM-1 extrudes faces on an n-gon to make a gear in Blender. It drives a car using arrow keys after less than 1 hour of finetuning data. It fuzzes a mock banking app by exploring unique states to find bugs.

Here's why this matters for builders. Current computer-use agents (Claude's, OpenAI's) burn through context windows in seconds. They can't maintain state across a 10-minute task. FDM-1's 2-hour context window changes the math entirely. A model that can watch you work in Blender for 90 minutes understands your project. A model that sees 5 seconds of screenshots does not.
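
To make "burn through context in seconds" concrete, here's the comparison, assuming roughly 1,500 tokens per screenshot (my estimate for a typical VLM, not a published figure) and a 200K-token window:

```python
# How much 'watching time' fits in a 200K context window, two ways.
CONTEXT = 200_000
SCREENSHOT_TOKENS = 1_500          # assumed per-screenshot cost in a VLM
screenshots = CONTEXT // SCREENSHOT_TOKENS
print(f"screenshot loop: ~{screenshots} frames of history")

# FDM-1: ~1M tokens for 2h of 30fps video -> ~4.6 tokens per frame.
fdm_tokens_per_frame = 1_000_000 / (2 * 3600 * 30)
fdm_minutes = CONTEXT / fdm_tokens_per_frame / 30 / 60
print(f"video encoder: ~{fdm_minutes:.0f} minutes of footage")
```

Roughly 130 screenshots versus about 24 minutes of continuous footage in the same budget. Different tasks become possible at those scales.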

The catch: FDM-1 isn't open-source yet. SI Inc. is a small startup, and they're clearly positioning for enterprise deals in CAD, finance, and engineering. The demos are impressive but controlled.

I'm cautiously optimistic. The architecture is sound. Training on video instead of screenshots is obviously correct in hindsight.

But the bigger signal here is about data. The largest open computer-use dataset before this was 20 hours. SI Inc. built one that's 11 million hours. That's a 550,000x gap. If you're building computer-use agents on screenshot datasets, you're bringing a knife to a gunfight.

The VLM-screenshot-click loop had a good run. FDM-1 suggests the next generation of computer agents will think in video, not frames.

QUICK HITS

Anthropic's $30B War Chest

Anthropic closed a $30B Series G led by GIC and Coatue, valuing the company at $380B. Their run-rate revenue is $14B, growing 10x annually for three straight years. They also shipped Claude Opus 4.6 this month, claiming industry-leading performance on agentic coding and tool use. The money and the model together signal Anthropic is playing for keeps in the agent race.

Microsoft Quietly Sunsets AutoGen

If you're starting a new project with AutoGen, Microsoft now redirects you to their new "Agent Framework" instead. AutoGen (54.8k stars) will only receive bug fixes and security patches going forward; the repo's README says so plainly. That's a big deal for the large community that built on it. Migration guides exist, but the message is clear: Microsoft is consolidating its agent strategy.

LLM Agents Can Deanonymize You at Scale

Researchers demonstrated that LLM agents can match anonymous Hacker News and Reddit accounts to real LinkedIn profiles. The method combines embedding search with LLM reasoning and scales to tens of thousands of candidates. From a handful of comments, the system infers your location, job, and interests, then finds you on the web. The paper is at arxiv.org/abs/2602.16800. If you post anonymously, assume that's temporary.
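
The pipeline shape, per the paper's description, is: embed the anonymous account's text, retrieve the nearest candidate profiles, then let an LLM reason over the shortlist. A toy of the retrieval stage with hand-made vectors (real systems use learned text embeddings over thousands of candidates):

```python
import math

# Stage 1 of the deanonymization pipeline: narrow a huge candidate pool
# to a shortlist via embedding similarity. Vectors here are hand-made
# stand-ins for real text embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

anon_embedding = [0.9, 0.1, 0.4]   # embedding of the anonymous account's posts
profiles = {
    "profile_a": [0.8, 0.2, 0.5],
    "profile_b": [0.1, 0.9, 0.2],
    "profile_c": [0.4, 0.4, 0.4],
}
shortlist = sorted(profiles,
                   key=lambda p: cosine(anon_embedding, profiles[p]),
                   reverse=True)[:2]
print(shortlist)   # top candidates handed to the LLM for stage-2 reasoning
```

The scary part isn't the similarity search; it's that the LLM stage can cross-check inferred location, job, and hobbies against each shortlisted profile.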

Google Ships Gemini 3 to Enterprise + Managed MCP Servers

Gemini 3 is now available on Vertex AI and Gemini Enterprise. Alongside it, Google launched managed MCP servers for Cloud databases and the GEAR (Gemini Enterprise Agent Ready) program for building agents at scale. They also cut Vertex AI latency by 35% using GKE Inference Gateway. Google is clearly betting that enterprise agent infrastructure is the real battleground.

PAPER BREAKDOWN: DEEPSYNTH

Core insight, simply: Imagine asking an AI agent to figure out which of 67 countries had the highest renewable energy growth last quarter by pulling data from government websites, PDFs, and databases. That's what DEEPSYNTH tests. It's 120 tasks that require gathering information from multiple real-world sources, combining it, and reasoning to produce an answer. Not trivia. Actual analysis work.

The punchline: The best agents scored 8.97 F1. Out of 100. Eleven state-of-the-art models and deep research agents were tested. None cracked 18 on the LLM-judge metric. The tasks aren't impossible for humans. They're just hard enough to expose where agents actually break down: hallucinating sources, losing track of information across documents, and failing to reason over large data.
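
For readers unfamiliar with the metric: F1 here is on a 0-100 scale, the harmonic mean of precision and recall over the answer items the agent produces versus the gold answer. A minimal item-level version (my illustration; DEEPSYNTH's exact scoring likely differs):

```python
# Item-level F1 on a 0-100 scale: harmonic mean of precision and recall.
def f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)   # how much of the output is right
    recall = hits / len(gold)           # how much of the answer was found
    return 100 * 2 * precision * recall / (precision + recall)

# Agent names 4 countries, 2 correct; gold answer has 3 countries:
print(f1({"DE", "FR", "ES", "IT"}, {"DE", "FR", "PT"}))  # ~57.1
```

Against that yardstick, 8.97 means the best agents are recovering a small fraction of the required facts, and padding their answers with wrong ones.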

Why builders should care: If you're building agents that do research, analysis, or multi-step data gathering, this benchmark tells you exactly where the ceiling is right now. Your agent probably hallucinates when it needs to synthesize 5+ sources. DEEPSYNTH gives you 120 test cases to prove it (or fix it).

What you can do today: The benchmark is public. Run your agent against it. If you score above 20 F1, you're beating every frontier model tested. That's both a low bar and a useful one.

Time saved: 8 min read vs 45 min paper. 5.6x compression.

TOOL OF THE WEEK: CLIHub

MCP is great until you look at the token bill.

Every MCP session dumps the full tool catalog into your conversation as JSON Schema. With a typical setup (6 servers, 84 tools), that's 15,540 tokens before your agent does anything useful.

CLIHub converts MCP servers into CLIs. One command. Same tools, same OAuth, same API underneath. The difference: CLI lazy-loads tool definitions only when needed.

The numbers: MCP costs 15,540 tokens at session start. CLI costs 300. That's a 98% reduction upfront and 94% savings overall across typical usage.
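
Those figures check out as straight arithmetic on the article's numbers:

```python
# Token cost: MCP catalog dump vs lazy CLI discovery.
MCP_STARTUP = 15_540    # 6 servers, 84 tools, full JSON Schema up front
CLI_STARTUP = 300       # lazy-loaded CLI entry points
per_tool = MCP_STARTUP / 84
reduction = 1 - CLI_STARTUP / MCP_STARTUP
print(f"~{per_tool:.0f} tokens per tool definition")   # ~185
print(f"{reduction:.1%} startup reduction")            # 98.1%
```

About 185 tokens per tool definition is the hidden tax every MCP session pays, whether the agent uses that tool or not.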

It also beats Anthropic's Tool Search (which cuts tool-token usage by 85% but is Anthropic-only). The CLI approach works with any model.

# Install
npm install -g clihub

# Convert any MCP server to CLI
clihub convert @anthropic/mcp-notion --output notion

# Your agent discovers tools lazily
notion --help        # ~600 tokens, only when needed
notion search "q"    # ~6 tokens to execute

If you're running agents with more than 3 MCP servers, CLIHub pays for itself on the first call.

AGENT INDEX

Weekly star tracker for the frameworks that matter.

Framework            Stars      Trend
OpenClaw             229,402    Open-source leader
n8n                  176,390    Dominant in no-code
Dify                 130,362    Strong growth
LangChain            127,454    The incumbent
AutoGen              54,848     Maintenance mode now
Flowise              49,347     Steady
LlamaIndex           47,200     Pivoting to OCR/parsing
CrewAI               44,621     Enterprise push
Semantic Kernel      27,306     Microsoft's bet
LangGraph            25,108     LangChain's agent layer
Haystack             24,318     Quiet but solid
Vercel AI SDK        22,045     TypeScript-first
Mastra               21,406     Rising fast
OpenAI Agents SDK    19,139     New but growing
Strands SDK (AWS)    5,197      Early days

Notable moves: AutoGen entering maintenance mode is the biggest shift. Microsoft is pushing developers toward their new Agent Framework. Meanwhile, Mastra (21.4k) is quietly climbing as the TypeScript agent framework of choice alongside Vercel AI SDK. Strands SDK from AWS is still early at 5.2k stars but has native MCP support and multi-provider backing that could accelerate adoption.

HOT TAKE

The agent framework wars are already over. We just don't know the winner yet.

Look at the Agent Index. There are 15 frameworks on that list. Most of them do roughly the same thing: wrap an LLM, give it tools, run a loop. The differentiation is thin. LangChain vs CrewAI vs AutoGen vs Strands vs OpenAI Agents SDK. Pick your syntax flavor.

Honestly, the real winner will be whoever nails memory and state management first. Not tool calling. Not multi-agent orchestration. Memory. Because right now, every agent wakes up with amnesia every single session. The framework that solves persistent, queryable, cross-session memory becomes the default. LangChain's blog post this week about its Agent Builder memory system tells me they know it too.
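
What "persistent, queryable, cross-session memory" means in practice is surprisingly mundane. A minimal sketch using stdlib SQLite — illustrative only, not any framework's actual API:

```python
import sqlite3

# Minimal cross-session agent memory: facts survive process restarts
# and are queryable by keyword. Illustrative, not a framework API.
class Memory:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS facts (session TEXT, fact TEXT)")

    def remember(self, session: str, fact: str):
        self.db.execute("INSERT INTO facts VALUES (?, ?)", (session, fact))
        self.db.commit()

    def recall(self, keyword: str) -> list[str]:
        rows = self.db.execute(
            "SELECT fact FROM facts WHERE fact LIKE ?", (f"%{keyword}%",)
        ).fetchall()
        return [r[0] for r in rows]

mem = Memory()   # pass a file path to persist across real sessions
mem.remember("session-1", "user prefers TypeScript over Python")
mem.remember("session-2", "user's deploy target is Cloudflare Workers")
print(mem.recall("TypeScript"))   # a later session retrieves the earlier fact
```

A real implementation would swap the keyword LIKE for embedding search and add conflict resolution, but the persistence-plus-query shape is the whole game.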

The rest is plumbing.

I'm Nate Archer. AI engineer turned writer. I read the repos so you don't have to.
