Benchmarking LLM Inference: The Metrics That Actually Matter
People love to brag about “low latency” or “optimized inference,” but unless you’re clear about what you’re measuring, those numbers are basically mea...
From RAG to Agentic Retrieval
Large language models are great at reasoning, but terrible at remembering everything.
From DDP to ZeRO-2 and ZeRO-3
If you’ve ever trained or fine-tuned a large language model in PyTorch, you’ve probably started with Distributed Data Parallel (DDP).
Distributed Data Parallel (DDP) for Training Models
Training large models on a single GPU can be painfully slow. PyTorch's Distributed Data Parallel (DDP) is the standard way to scale training across mu...
Training LLMs 101
Large Language Models (LLMs) don’t start out as friendly assistants. They begin as vast, raw systems trained on enormous datasets—powerful but unpolis...
Ray for LLM Inference
Ray is a distributed execution engine. Its job is to take a messy cluster of machines and make it feel like one giant computer.
vLLM: LLM Inference That Doesn't Waste Your GPU
vLLM is a library for serving LLMs on GPUs, designed for high-throughput, memory-efficient inference.
Three Practical Ways to Detect Sensitive Data
Agents don’t just think — they move data between systems.
Evals: How to Evaluate Agents
Evaluating agents is messy. Traditional software is deterministic — same input, same output. Agents don’t work that way. They reason in loops, call to...
Why Multi-Agent Systems Matter
Multi-agent systems (MAS) are emerging as a serious pattern for tackling the limits of single agents.
From Zero to Agent: ReAct, Reflection, and Planning
We've covered a lot of topics in the past few posts, but one concept is still missing: agents.
How Agents Remember: On Memory and the Art of Context Engineering
When we talk about memory in LLM agents, we’re not talking about neurons or synapses — we’re talking about tokens, context windows, and clever hacks t...
Structured Outputs in Practice: Instructor vs PydanticAI vs BAML
In part one, I wrote about why structured outputs matter and why just asking an LLM to “return JSON” doesn’t cut it.
Structured Output
When you build with LLMs, you quickly run into a recurring issue:
Engineering Books
A list of technical books that I highly recommend (and have actually read).
