MIT licensed · open source · an Agent Skill

deckhand

Agent-native PowerPoint manipulation.

One CLI — deck.py — lets AI agents inspect, edit, create, and verify .pptx files with the fidelity of a human operator. Works in Claude Code, claude.ai, and anywhere an agent can run a shell command.

new: write a slide as HTML and compile it into your deck. The browser measures, deck.py writes.

> /plugin marketplace add EveryInc/deckhand
EveryInc/deckhand

§ 01 The problem

A .pptx is a zip of XML.
There were two bad ways in.

Agents that edit the XML directly hand-write OOXML — fragile, token-hungry, one namespace typo from a corrupt file. Agents that regenerate decks from scratch lose everything a template encodes. deckhand takes a third path.

Hand-write OOXML · fragile

Unzip, edit slide3.xml raw, rezip. Burns thousands of tokens on angle brackets, and a single bad namespace produces a file PowerPoint refuses to open.

Regenerate from scratch · lossy

Rebuild the deck with a generation library. Clean code — but the master, theme colors, layouts, and every designer decision the template encodes are gone.

Declarative patch · deckhand

The agent writes a JSON patch describing intent — replace this text, swap that image. The tool validates everything, then executes it atomically.

§ 02 The patch model

Say what should change.
Never touch the XML.

A patch is a list of operations against named shapes on numbered slides. One file, one apply, one atomic result.

patch.json
{"ops": [
  {"op": "replace-text",
   "scope": "master", "from": "Globex", "to": "Acme"},
  {"op": "set-text",
   "slide": 3, "shape": "s12", "text": ["Q3 results", "Tokens down 84%"]},
  {"op": "swap-image",
   "slide": 4, "shape": "s9", "image": "screenshot.png"},
  {"op": "duplicate",
   "slide": 5, "shape": "s31", "offset": [0, 1.2], "text": ["Fourth pillar"]}
]}

replace-text · scope: master

Rebrand the whole deck — masters, layouts, every slide — in one op.

set-text

Address a shape by its stable id. Each list item becomes a paragraph; formatting survives.

swap-image

New picture, same frame, same crop, same position. The layout never knows.

duplicate

Clone an existing styled shape, offset it in inches, rewrite its text. The template is the design system.

deck.py deck.pptx apply patch.json --fix --render img/
— validated first, applied atomically, linted after.
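
For agents that assemble patches in code rather than by hand, here is a minimal Python sketch. It builds the same four ops shown in patch.json above, writes the file, and assembles the apply command as an argument list. The helper names build_patch and apply_command are hypothetical; the op fields and flags are copied from the example above, and the command is only constructed here, not executed.

```python
import json
from pathlib import Path


def build_patch() -> dict:
    """Return the four-op patch from the example above as a plain dict."""
    return {"ops": [
        {"op": "replace-text", "scope": "master", "from": "Globex", "to": "Acme"},
        {"op": "set-text", "slide": 3, "shape": "s12",
         "text": ["Q3 results", "Tokens down 84%"]},
        {"op": "swap-image", "slide": 4, "shape": "s9", "image": "screenshot.png"},
        {"op": "duplicate", "slide": 5, "shape": "s31",
         "offset": [0, 1.2], "text": ["Fourth pillar"]},
    ]}


def apply_command(deck: str, patch_path: str, render_dir: str) -> list[str]:
    """Assemble the apply invocation as an argv list (hypothetical helper, not run here)."""
    return ["python", "deck.py", deck, "apply", patch_path,
            "--fix", "--render", render_dir]


if __name__ == "__main__":
    path = Path("patch.json")
    path.write_text(json.dumps(build_patch(), indent=2), encoding="utf-8")
    print(" ".join(apply_command("deck.pptx", str(path), "img/")))
```

Keeping the patch as data until the last moment means the agent can inspect or amend it before a single apply.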

§ 03 The differentiator

Built around how agents actually fail

The interesting part isn't that it's a CLI — it's that every design choice targets a known LLM failure mode.

/ FAILURE MODE: HALLUCINATED REFERENCES

Errors teach instead of scold

Agents guess shape ids. A bad reference doesn't return shape not found and force another round trip — it returns the slide's real shape inventory, so the agent self-corrects from the error message alone.

deck.py deck.pptx apply patch.json
PATCH REJECTED  2 validation error(s), nothing was modified:
  - op[0] set-text: shape 's9999' not found on slide 0.
shapes on slide 0:
  s16    PICTURE      [-1.25,-0.91 15.0x8.44in]  (image image3.png)
  s18    AUTO_SHAPE   [7.00,5.17 2.6x0.25in]  Session Management
  s19    TEXT_BOX     [0.60,1.00 4.0x0.4in]  USING CLAUDE CODE
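
The idea behind that message can be sketched in a few lines of Python. This is an illustration of the pattern, not deckhand's code: the Shape class and missing_shape_error function are hypothetical names, and the inventory is the one from the output above.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    shape_id: str
    kind: str
    label: str


def missing_shape_error(op_index: int, op_name: str, wanted: str,
                        slide: int, shapes: list[Shape]) -> str:
    """Build an error that names the bad reference and lists what really exists."""
    lines = [
        f"op[{op_index}] {op_name}: shape '{wanted}' not found on slide {slide}.",
        f"shapes on slide {slide}:",
    ]
    for s in shapes:
        # One line per real shape, so the caller can pick a valid id immediately.
        lines.append(f"  {s.shape_id:<6} {s.kind:<12} {s.label}")
    return "\n".join(lines)


if __name__ == "__main__":
    inventory = [
        Shape("s16", "PICTURE", "(image image3.png)"),
        Shape("s18", "AUTO_SHAPE", "Session Management"),
        Shape("s19", "TEXT_BOX", "USING CLAUDE CODE"),
    ]
    print(missing_shape_error(0, "set-text", "s9999", 0, inventory))
```

The error carries its own correction: the next patch can be written from the message without another read of the deck.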

/ FAILURE MODE: CASCADING PARTIAL STATE

All errors at once, atomically

A patch with nine mistakes returns nine actionable errors and writes zero bytes. Never a half-applied patch, never fix-one-rerun-find-the-next. The agent repairs everything in a single pass.
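
A minimal sketch of the validate-everything-then-write pattern, assuming a simplified inventory that maps slide numbers to sets of shape ids. The function names are hypothetical, and the output is a small JSON stand-in rather than a real .pptx. The point is the ordering: collect every error first, and write through a temp file plus rename only when the list is empty.

```python
import json
import os
import tempfile


def validate(ops: list[dict], inventory: dict[int, set[str]]) -> list[str]:
    """Check every op and return all problems, instead of stopping at the first."""
    errors = []
    for i, op in enumerate(ops):
        slide, shape = op.get("slide"), op.get("shape")
        if slide not in inventory:
            errors.append(f"op[{i}] {op.get('op')}: slide {slide} does not exist.")
        elif shape not in inventory[slide]:
            known = ", ".join(sorted(inventory[slide]))
            errors.append(f"op[{i}] {op.get('op')}: shape '{shape}' not found "
                          f"on slide {slide}; shapes there: {known}.")
    return errors


def apply_atomically(ops: list[dict], inventory: dict[int, set[str]],
                     out_path: str) -> list[str]:
    """Write the result only if validation is clean; otherwise touch nothing."""
    errors = validate(ops, inventory)
    if errors:
        return errors  # nothing was written
    # Write to a temp file in the target directory, then rename over the target,
    # so a reader never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(out_path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"applied": ops}, fh)  # stand-in for the real deck write
    os.replace(tmp, out_path)
    return []
```

A patch with two bad ops returns two errors and leaves the output path untouched; a clean patch produces the file in one step.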

/ FAILURE MODE: NO SPATIAL SENSE

The linter watches the agent's hands

Agents can't feel when a text box overflows or two shapes collide. After every apply, the linter reports only the geometry problems the patch introduced — exact inch values, and the exact command that fixes them. No noise about pre-existing quirks.
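
One way to report only what a patch introduced is to lint the slide before and after and keep the difference. The sketch below does that for collisions between axis-aligned boxes measured in inches. It is an assumption about the general technique, not deckhand's linter; the function names are hypothetical, and the geometry for s16 and s18 is borrowed from the inventory shown earlier.

```python
from itertools import combinations

Box = tuple[float, float, float, float]  # x, y, width, height in inches


def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share interior area (touching edges do not count)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def collisions(shapes: dict[str, Box]) -> set[frozenset]:
    """Every unordered pair of shape ids whose boxes overlap."""
    return {frozenset((i, j))
            for (i, a), (j, b) in combinations(shapes.items(), 2)
            if overlaps(a, b)}


def introduced(before: dict[str, Box], after: dict[str, Box]) -> set[frozenset]:
    """Collisions present after the patch that were not there before it."""
    return collisions(after) - collisions(before)


if __name__ == "__main__":
    before = {
        "s16": (-1.25, -0.91, 15.0, 8.44),  # full-bleed picture, overlaps everything
        "s18": (7.00, 5.17, 2.6, 0.25),
        "s19": (0.60, 1.00, 4.0, 0.4),
    }
    after = dict(before, s19=(6.00, 5.00, 4.0, 0.4))  # the patch moved s19
    for pair in introduced(before, after):
        print("new collision:", sorted(pair))
```

The background picture overlaps both text shapes before and after, so it never shows up in the report; only the pair the move created does.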

/ FAILURE MODE: AGREEABLE OVERCONFIDENCE

Repair is honest

fix re-measures the slide after repairing it. Anything still broken is reported as residue — never claimed as fixed. The tool refuses to tell the agent what it wants to hear.
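
The honest-repair loop can be sketched as measure, fix, measure again. In this toy model, a slide is a dict of shapes with a box height and the height the text needs, and the fix may only grow a box up to a limit. All names and the data model are hypothetical; what matters is that the report is built from the second measurement, not from what the fix intended.

```python
def measure(slide: dict) -> list[str]:
    """Ids of shapes whose text needs more height than the box provides."""
    return [sid for sid, s in slide.items() if s["text_height"] > s["box_height"]]


def fix(slide: dict, max_height: float) -> None:
    """Grow each overflowing box toward its text height, never past max_height."""
    for sid in measure(slide):
        slide[sid]["box_height"] = min(slide[sid]["text_height"], max_height)


def repair(slide: dict, max_height: float) -> dict:
    """Fix what can be fixed, then re-measure and report the rest as residue."""
    before = measure(slide)
    fix(slide, max_height)
    residue = measure(slide)  # re-measure; do not trust the fix
    fixed = [sid for sid in before if sid not in residue]
    return {"fixed": fixed, "residue": residue}


if __name__ == "__main__":
    slide = {
        "s12": {"box_height": 0.5, "text_height": 0.9},
        "s31": {"box_height": 1.0, "text_height": 3.0},
    }
    print(repair(slide, max_height=2.0))
```

Here s12 fits after the repair and is reported as fixed, while s31 still overflows at the height limit and is reported as residue.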

/ FAILURE MODE: BLIND EDITING

Verification is visual

Slides render to JPGs; the agent can crop and zoom any region to inspect a chart label or a kerning disaster up close. It looks at its work the way a human operator would — before anyone else has to.
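
Zooming in on one shape comes down to converting its inch geometry into a pixel box on the rendered image. A small sketch of that arithmetic follows, assuming the render covers the whole slide. The function name, the 10 x 5.625 inch slide size, and the 1920 x 1080 render size are illustrative assumptions; the result is in the (left, top, right, bottom) order that image libraries such as Pillow accept for cropping.

```python
import math


def crop_box(shape_in, slide_in, image_px, pad_in=0.1):
    """Map a shape's (x, y, w, h) in inches to a padded pixel box on the render."""
    x, y, w, h = shape_in
    sx = image_px[0] / slide_in[0]  # horizontal pixels per inch
    sy = image_px[1] / slide_in[1]  # vertical pixels per inch
    # Round outward so the padded region is never clipped, then clamp to the image.
    left = max(0, math.floor((x - pad_in) * sx))
    top = max(0, math.floor((y - pad_in) * sy))
    right = min(image_px[0], math.ceil((x + w + pad_in) * sx))
    bottom = min(image_px[1], math.ceil((y + h + pad_in) * sy))
    return left, top, right, bottom


if __name__ == "__main__":
    # Geometry of s18 from the inventory shown earlier; slide and render sizes assumed.
    print(crop_box((7.00, 5.17, 2.6, 0.25), (10.0, 5.625), (1920, 1080)))
```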

§ 04 The create path · new

Write the slide as a webpage.
Ship it as a slide.

Templates carry the brand, but free-form layout is where decks get designed — and HTML/CSS is the layout language agents are most fluent in. html2patch renders your HTML in a headless browser, reads back every measured box and computed style, and compiles it into an ordinary deck.py patch. The browser is a measuring engine. deck.py stays the only writer.

slide.html → patch.json → deck.pptx
<h1>The deck builds <em>itself</em> now.</h1>
<div class="card">…gradient, radius, flexbox…</div>
<img style="object-fit:cover" src="chart.png">

$ python html2patch.py slide.html --deck deck.pptx
  → patch.json: 23 ops (add-slide, add-shape, add-picture…)
$ python deck.py deck.pptx apply patch.json --render img/
  ✓ applied atomically · linted · rendered for your own eyes
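
The measuring step can be pictured as a unit conversion: a box the browser reports in CSS pixels becomes a box in slide inches, which then goes into an op. The sketch below assumes the page viewport maps onto the full slide. The function names and the "box" field are hypothetical, not html2patch's real output format; only the add-shape op name comes from the output above.

```python
def px_box_to_inches(box_px, viewport_px, slide_in):
    """Scale a measured (x, y, w, h) pixel box onto a slide measured in inches."""
    sx = slide_in[0] / viewport_px[0]  # inches per pixel, horizontal
    sy = slide_in[1] / viewport_px[1]  # inches per pixel, vertical
    x, y, w, h = box_px
    return [round(x * sx, 3), round(y * sy, 3), round(w * sx, 3), round(h * sy, 3)]


def to_op(slide: int, box_px, text, viewport_px=(960, 540), slide_in=(10.0, 5.625)):
    """Wrap a measured box as an add-shape op (the 'box' field name is an assumption)."""
    return {
        "op": "add-shape",
        "slide": slide,
        "box": px_box_to_inches(box_px, viewport_px, slide_in),
        "text": list(text),
    }


if __name__ == "__main__":
    # A heading the browser measured at x=96, y=48, 480 wide, 120 tall.
    print(to_op(1, (96, 48, 480, 120), ["The deck builds itself now."]))
```

With a 960 x 540 viewport on a 10 x 5.625 inch slide, the scale works out to 96 pixels per inch, which matches the CSS definition of a pixel.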

One writer

The output is a patch, not a second file format. A compiled slide is born with the same shape ids, lint coverage, fix loop, and diff as every edit — the agent can nudge one box seconds after creating it, with an ordinary op.

Creates into your template

The patch can add-slide with a layout from your branded master and place the HTML-measured shapes onto it. Free-form design inside the client's own deck — the thing generate-a-new-file architectures structurally cannot do.

Drift is caught, not hoped away

Browsers and PowerPoint wrap text slightly differently. The post-apply linter re-measures the real deck and reports any overflow with the exact fix — the existing safety net covers the create path with zero new machinery.

Gradients, tables, crops — compiled, not approximated

CSS gradients become native gradient fills. Tables keep per-cell fills and measured column widths. object-fit: cover becomes a true picture crop. <ol> numbers, rotation rotates, padding becomes text insets.
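
To make the object-fit: cover case concrete, here is the arithmetic as a short Python sketch. It assumes the default centered object-position and returns the crop as fractions of the source image trimmed from each side. The function name is hypothetical, and this is the general geometry, not necessarily how html2patch computes it.

```python
def cover_crop(image_w: float, image_h: float, frame_w: float, frame_h: float) -> dict:
    """Fractions of the image to trim per side so it fills the frame without distortion."""
    image_aspect = image_w / image_h
    frame_aspect = frame_w / frame_h
    if image_aspect > frame_aspect:
        # Image is relatively wider than the frame: keep full height, trim the sides.
        visible_w = image_h * frame_aspect
        side = (1 - visible_w / image_w) / 2
        return {"left": side, "right": side, "top": 0.0, "bottom": 0.0}
    # Image is relatively taller (or equal): keep full width, trim top and bottom.
    visible_h = image_w / frame_aspect
    edge = (1 - visible_h / image_h) / 2
    return {"left": 0.0, "right": 0.0, "top": edge, "bottom": edge}


if __name__ == "__main__":
    # A 1600x900 chart placed in a square 4in x 4in frame.
    print(cover_crop(1600, 900, 4.0, 4.0))
```

For the 1600 x 900 image in a square frame, only the middle 900 pixels of width stay visible, so 21.875% is trimmed from each side and nothing from the top or bottom.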

stress-tested against the browser, pixel by pixel — and head-to-head against the best-known HTML-to-PPTX engine on its own constraint set: closer to the browser's render, and where that engine reported success while silently dropping an entire table, deckhand carried it.

§ 05 Coverage

The whole lifecycle of a deck

Read

Full deck inventory — slides, shapes, geometry, text, formatting runs, speaker notes — as structured output an agent can reason over.

Edit

Text, images, formatting, position, borders, fills, hyperlinks, alt text, slide backgrounds, theme colors and fonts, document metadata — every mutation through the same validated patch pipeline.

Create

New slides from the template's own layouts, new shapes cloned from styled ones — or write the slide as HTML and compile it in. The brand survives by construction.

Structure

Add, delete, reorder, and duplicate slides; merge decks; add and remove table rows and columns. Work across masters and layouts, not just the surface.

Verify

Render to images, crop, zoom, lint geometry, diff before/after. Trust is earned per-edit, not assumed.

Escape hatch

When the patch model can't express it, drop to raw OOXML deliberately — with the tooling still validating the result.

Out of scope, honestly: native chart objects, animations, and slide transitions. deckhand does the 95% that client decks are actually made of, and says so.

§ 06 The comparison

Same job. Different machinery.

Anthropic ships a pptx skill, and parts of it are genuinely good: the design guidance reads like a designer wrote it. The difference is architectural. Their guardrails are advice to the model; deckhand's are machinery in the tool, and advice can be ignored on exactly the slide that matters. We didn't leave that as an opinion: we raced them, blind-judged, over four rounds (see § 07).

the moment · Anthropic's pptx skill · deckhand

Read the deck
  Anthropic's pptx skill: markitdown text dump; thumbnails for picking layouts. Geometry and formatting live in the raw XML.
  deckhand: Full inventory (ids, geometry in inches, formatting runs, image names, detected issues). --brief: one line per shape.

Make an edit
  Anthropic's pptx skill: Unpack the zip, hand-edit slide3.xml raw. The skill's own docs list the OOXML pitfalls to dodge while you type angle brackets.
  deckhand: Declare the op. set-text inherits the old text's formatting; swap-image keeps frame and crop. The XML is the tool's problem.

Get it wrong
  Anthropic's pptx skill: Validation runs at pack time, after all the editing. Errors arrive late, one repack per discovery, and the tree on disk is already half-edited.
  deckhand: Patch rejected before a byte is written, with every error at once and the slide's real shape inventory in the message. The agent repairs in one pass.

Text overflows its box
  Anthropic's pptx skill: Nothing notices. The deck ships with the overflow.
  deckhand: The linter reports it in inches with the exact fix command. After repair, anything still broken is reported as residue, never claimed fixed.

Re-theme the deck
  Anthropic's pptx skill: Unpack and hand-edit hex values across every slide's XML, repack, hope nothing else changed.
  deckhand: One replace-color op per palette mapping, one atomic patch. A missed color errors out listing the deck's actual palette, with counts.

Create from scratch
  Anthropic's pptx skill: Write PptxGenJS code. The output is a new file on a blank theme; your template's master, layouts, and brand aren't in the room.
  deckhand: Write the slide as HTML; html2patch compiles it into ordinary ops, including straight into your branded template's layouts.

Verify the result
  Anthropic's pptx skill: Thumbnail grids for template analysis; for real QA, assemble your own soffice + pdftoppm pipeline.
  deckhand: Built in: render any slide, crop and zoom any region, diff the structural changelog, lint the geometry. Trust is earned per edit.

Who owns it
  Anthropic's pptx skill: Proprietary license: no derivative works, no redistribution.
  deckhand: MIT. Fork it, extend it, ship it inside your own product, run it on any model, any platform.
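
The re-theme behavior described in the comparison (a missed color errors out and lists the deck's actual palette with counts) can be sketched as follows. This is an illustration under simplifying assumptions: the deck is reduced to a flat list of hex colors, and the function name and error wording are hypothetical rather than deck.py's real output.

```python
from collections import Counter


def replace_colors(colors: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Apply every mapping, or none: any source color with zero hits rejects the lot."""
    palette = Counter(c.upper() for c in colors)
    norm = {src.upper(): dst.upper() for src, dst in mapping.items()}
    misses = [src for src in norm if palette[src] == 0]
    if misses:
        # List what the deck really contains so the caller can correct the mapping.
        listing = ", ".join(f"{c} x{n}" for c, n in palette.most_common())
        errors = [f"replace-color: '{m}' matched nothing; deck palette: {listing}"
                  for m in misses]
        return colors, errors  # input returned unchanged
    # Colors are normalized to uppercase hex on the way out.
    return [norm.get(c.upper(), c.upper()) for c in colors], []


if __name__ == "__main__":
    deck_colors = ["1A1A1A", "1A1A1A", "FF5500"]
    print(replace_colors(deck_colors, {"ff5500": "0055FF"}))
    print(replace_colors(deck_colors, {"123456": "000000"}))
```

The first call recolors the accent; the second changes nothing and returns an error that names the two colors the deck does contain.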

If you're the agent

You never hold raw XML in context, never burn tokens re-reading a deck to find a shape, and every mistake comes back as a correction you can act on — in one round trip.

If you're the human

The file that comes back always opens, is never half-edited, and still looks like your deck — because the edits ran inside your template instead of around it.

If you're the platform

It's a CLI under an MIT license. Wire it into any agent product — Claude, or anything else that can run a shell command — and keep the verification loop.

§ 07 The benchmark · new

So we made them race

Claims are cheap, so we ran the experiment: same brief, same model, same plan-build-review budget — one agent on deckhand, one on Anthropic's pptx skill. Three from-scratch decks and one heavy edit task, every round judged blind by three independent judges with randomized labels and rotated viewing order. Every finding the judges produced shipped as a release — most within hours.

the agent on deckhand

works like a designer

One slide at a time: write it as HTML, compile it into the deck, render, look at it, patch what's wrong, look again. In one run that was 87 tool calls and 27 separate looks at its own renders — and when a fix was needed, it was a two-line patch op, never a rebuild.

per-slide loop · renders inspected: 27 · fixes: surgical ops

the agent on the pptx skill

works like a coder

One artifact: a 40KB build.js, written nearly blind, run once, then thumbnails at the end. Faster and cheaper when everything works — but every fix means editing the program and regenerating the deck, and what the thumbnails miss, ships.

single build script · writes: 2 · fixes: edit + regenerate

Round 1 · create
finding: one overridden lint flag
The judges' sharpest catch: one clipped serif headline. The linter had flagged it, but the agent overrode the flag as a "false positive" without zooming in. The safety net worked; the agent talked itself past it. That's a guidance bug, and guidance we control.
v2.2.0: text-under-picture lint (covered_by), wider serif re-wrap margins, and a rule that no flag is a "false positive" without a render --crop zoom.
Round 2 · create
judges: "near-flawless mechanics"
Zero layout defects: the round-1 failure class was already extinct. The judges' only catch was a single compiler bug: <br> inside a table cell compiled to nothing, so one table read "Who decideswhat runs next."
v2.2.1: <br> survives as a real line break, with a regression test.
Round 3 · create
judges: 2–1 deckhand, two CLEAR verdicts
9/9/9-grade grid discipline, no overflows, no collisions, "airtight" typography — won on exactly the dimension round 1 slipped on. The pptx-skill deck carried edge-bleed artifacts on three slides and a footer collision on its own closer — the same failure modes as rounds 1 and 2, because nothing in a prompt-guidance toolchain learns between rounds.
Round 4 · edit
forensic audit: "the cleaner edit by the strict letter" — grade A
Each agent re-themed its own deck to a 9-color palette and inserted a slide mid-narrative. Ours delivered zero collateral damage, coherent renumbering, and a recolor down to the macOS traffic-light dots. The finding that mattered: the agent had to hand-roll a recolor script, because deck.py had no color op.
v2.3.0: replace-color. A re-theme is now a handful of declarative ops in one atomic patch, with zero-hit errors that list the deck's actual palette.

Four rounds, one unmistakable pattern: every finding became machinery — a lint, a compiler fix, a new op — and that failure class never appeared again. The pptx skill's defects recurred every round, because advice to a model can't accumulate. Tools learn; prompts don't. That's the bet this whole project is built on.

§ 08 Provenance

We use this every week

deckhand wasn't built as a demo. We built it to make our own decks — every training Every Consulting runs ships with a branded deck, and our agents build them with deckhand: same master template, same brand, same review standards, with agents doing the assembly, the edits, and the visual QA.

And the hard parts were learned on client work. One client's deck production ran on human hours: each deck took a person hours, and across the team that added up to hundreds of hours. We ran an earlier version of this pipeline there. It surfaced every way agents fail at PPTX, and those failures became this tool's design. The failure-mode list above isn't theory; it's field notes. Now it's yours.

decks were one workflow. automating your team's work is literally what we do.

§ 09 Install

Three ways in. Pick yours.

deckhand is packaged as an Agent Skill — but underneath it's just a CLI, so any agent that can run a shell command can use it.

Claude Code

One command

Add the marketplace and the skill is available in every project, from any working directory.

> /plugin marketplace add EveryInc/deckhand
> /plugin install deckhand@deckhand

claude.ai

Upload the skill

Zip the skill folder and upload it under Settings → Capabilities. Claude picks it up whenever a deck shows up.

$ zip -r deckhand.zip skills/deckhand
# then upload in Settings → Capabilities

Any agent

It's just a CLI

Clone the repo, point your agent at the docs, and let it drive deck.py directly.

$ git clone https://github.com/EveryInc/deckhand
$ python deckhand/skills/deckhand/scripts/deck.py docs