The Marketplace SDK Dogfood Loop
Ten numbered runs hardened the Marketplace skills; Run 10 (PageShot) was the first to ship in Pages. Later apps dogfooded the same learnings and returned different signals: new patches, scale proof, or regression proof when the skills already matched reality.
I — The shape of the approach
Skills aren't trustworthy until something real has been built against them. Each numbered run (Run 01–Run 10 in the ledger below) was a fresh scaffold that exercised one extension point end-to-end, surfaced gaps as patches, then fed those patches into the next run.
Run 10 is PageShot — the last row in that ledger and the first production run in this narrative: the first app taken from PRD through /ship inside a real Sitecore Pages tenant, built against the patched Marketplace skills. The ships that followed — QuickCopy, Component Atlas, and Last-Edit Trail — are not Run 11–13. They build on Run 10 and everything before it (same skill files, same pipeline, new PRDs) and each one still dogfoods the process: a real scaffold, a real iframe, real tenant traffic.
Did they all add value? Yes — dogfood value is not only “new patches.” QuickCopy returned catalogue patches when production diverged from what tests encoded (the response-shape class of bugs). Component Atlas returned scale and coverage — two extension points, tenant-wide graphs, heavier xmc.agent.* use. Last-Edit Trail returned regression proof: zero new skill patches, meaning the skills already matched reality for that build. Those are different signals; all three strengthen the loop.
II — The runs, one by one
Every row is a real Marketplace app build. Each one had to do something new — a new extension point, a new scaffold, a new module — so the skills had something fresh to fail against.
III — Inside the loop
The phases above answer what ran. This section answers how — the mechanism a critic should be able to reproduce.
How a finding becomes a patch
Every run follows the same ten-step contract: prep → execute the skill as written → instrument every SDK call site with structured request / ok / error logs → agent-side gates (typecheck, build, lint, unit tests, no escape hatches) → user-side checklist → record in the matching catalogue → enqueue patch candidates → apply patches to the skill files → update ledger → next run.
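The skill files don't prescribe a single logging helper, so the sketch below is only an illustration of the instrumentation step: a hypothetical `logged` wrapper (the name and tag format are ours, not the skills') that emits the request / ok / error lines the live phase tails.

```ts
// Hypothetical instrumentation helper; illustrates the request / ok / error
// discipline. The real call shapes live in the skill files.
type SdkCall<T> = () => Promise<T>;

let seq = 0;

async function logged<T>(tag: string, call: SdkCall<T>): Promise<T> {
  const id = ++seq;
  console.log(`[${tag}] request #${id}`);
  try {
    const result = await call();
    console.log(`[${tag}] ok #${id}`);
    return result;
  } catch (err) {
    console.error(`[${tag}] error #${id}`, err);
    throw err; // never swallow: the error line is the signal the loop needs
  }
}

// Illustrative usage at a call site (names are stand-ins):
// const ctx = await logged("pages.context", () => somePagesSdkCall());
```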
The artefact is the patch. Each one gets an ID and lands in a specific section of a specific skill, with a one-line lesson. The ledger records which run applied it and which deferred — with reason. That is the audit trail for the "42 patches" headline number — they are addressable, not aggregate.
Graduation gates
Four explicit sequencing rules — none implicit, none "when it feels ready".
| Gate | Rule |
|---|---|
| Synthetic → Live | An M-run (synthetic) must be done and agent-verified before its matching L-run (live) starts. Mixing code-gap signal with runtime-gap signal kills causality. |
| One run per sitting | Execute, record, patch. No batching. The next run is the regression test for the previous one's patches. |
| Re-dogfood cuts the line | An SDK bump or a freshly-applied patch fires a re-dogfood trigger. The triggered run runs before the planned next run. |
| Patch budget | More than five patch candidates in a single run pauses the phase until they're applied — otherwise the causal trail breaks. |
What we sort findings into
Findings split three ways at recording time, so live signal stays separable from code signal:
- Agent-verified — passes typecheck / build / lint / unit tests; call shapes match skill files verbatim; no forbidden casts.
- User-verified — Christian confirmed the Sitecore-side effect in SitecoreAI (item renamed, field updated, canvas refreshed).
- Live-observed — only visible with Claude tailing the dev-server log while Christian drove the portal at the same time.
Patches themselves bin into six categories:
- doc-gap
- anti-pattern
- type sloppiness (wrapper or upstream fix)
- architectural assumption (often a multi-skill rewrite — Run 7 rewrote three)
- sandbox / runtime constraint (PageShot's iframe <a download> block)
- upstream SDK bug (tracked + deferred, never silently absorbed)
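Put together, a recorded finding carries both a verification level and a patch category. Here is a minimal data-structure sketch; the field names are ours for illustration, since the real catalogue schema isn't shown in the ledger excerpts:

```ts
// Verification level assigned at recording time.
type Verification = "agent-verified" | "user-verified" | "live-observed";

// The six patch categories named above.
type PatchCategory =
  | "doc-gap"
  | "anti-pattern"
  | "type-sloppiness"
  | "architectural-assumption"
  | "sandbox-runtime-constraint"
  | "upstream-sdk-bug";

// Hypothetical catalogue entry; field names are illustrative.
interface PatchCandidate {
  id: string;              // addressable, per the "42 patches" audit trail
  skillFile: string;       // the skill file the patch lands in
  section: string;         // the section of that skill
  lesson: string;          // the one-line lesson
  category: PatchCategory;
  verification: Verification;
  appliedInRun?: number;   // which run applied it; unset while deferred
}
```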
Who does what in the live phase
The topology is fixed and load-bearing.
- Claude starts npm run dev as a background process, tails the dev-server log, confirms the route is reachable, and reports each expected log line as it fires.
- Christian drives the Cloud Portal and Sitecore UI, performs the scripted actions one at a time, and copies devtools console lines into chat when needed.
- Both must agree a discrepancy is real before it becomes a patch candidate.
The harness UI standard (per-test card, sticky log panel, manual-check checkboxes, explicit Init / Reset / Destroy buttons) is what makes this observable in real time. Skipping it isn't a shortcut — it makes the live phase blind, and a future run has to re-do the work with the harness in place.
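To make the standard concrete, here is a minimal sketch of one per-test card, assuming a plain React + Tailwind setup; the component, props, and class names are ours for illustration, not the harness's actual code:

```tsx
import { useState } from "react";

// One card per test: explicit lifecycle buttons, manual-check boxes, and a
// log panel that mirrors what Claude reads in the dev-server tail.
interface TestCardProps {
  title: string;
  init: () => Promise<void>;
  reset: () => Promise<void>;
  destroy: () => Promise<void>;
  manualChecks: string[]; // e.g. "item renamed", "canvas refreshed"
}

export function TestCard({ title, init, reset, destroy, manualChecks }: TestCardProps) {
  const [log, setLog] = useState<string[]>([]);
  const [done, setDone] = useState<Set<number>>(new Set());

  // Wraps a lifecycle action in the request / ok / error log discipline.
  const run = (label: string, fn: () => Promise<void>) => async () => {
    setLog((l) => [...l, `[${title}] ${label}: request`]);
    try {
      await fn();
      setLog((l) => [...l, `[${title}] ${label}: ok`]);
    } catch (err) {
      setLog((l) => [...l, `[${title}] ${label}: error ${String(err)}`]);
    }
  };

  const toggle = (i: number) =>
    setDone((prev) => {
      const next = new Set(prev);
      if (next.has(i)) next.delete(i);
      else next.add(i);
      return next;
    });

  return (
    <section className="rounded border p-4">
      <h3 className="font-semibold">{title}</h3>
      <div className="my-2 flex gap-2">
        <button onClick={run("init", init)}>Init</button>
        <button onClick={run("reset", reset)}>Reset</button>
        <button onClick={run("destroy", destroy)}>Destroy</button>
      </div>
      <ul>
        {manualChecks.map((check, i) => (
          <li key={check}>
            <label>
              <input type="checkbox" checked={done.has(i)} onChange={() => toggle(i)} /> {check}
            </label>
          </li>
        ))}
      </ul>
      {/* Sticky log panel: the human-readable mirror of the structured logs */}
      <pre className="sticky bottom-0 max-h-40 overflow-auto text-xs">{log.join("\n")}</pre>
    </section>
  );
}
```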
Regression coverage, in two words
Re-dogfood. Every patch survives to the next qualifying run or it gets ripped out. A patch that doesn't survive a fresh scaffold is worse than no patch — it's a false sense of safety. That's why Run 2 exists. QuickCopy's scaffold pass showed zero friction and validated earlier patches at compile time — then production still surfaced new unwrap lessons, which is also the loop closing, one layer deeper.
IV — The production apps
Each row below went through the full /create-prd → /architect →
/task-breakdown → /implement → /code-review → /test → /document →
/ship pipeline and shipped inside a real Sitecore Pages editor against a real
tenant. PageShot is Run 10 — the first production dogfood ship and the bridge
out of the ten-run table in section II. QuickCopy, Component Atlas, and Last-Edit
Trail are later dogfood ships that apply the same learnings (patched skills,
conventions, CI gates); they are not extra numbered ledger rows, but they still
exercise the methodology every time — evidence over sprint cycles.
V — What it added up to
Patch and catalogue numbers are from the skill ledger snapshot 2026-04-27. The four production apps above are the narrative continuation — new ships can add patches or add none, but they all extend the same feedback loop.
VI — What the loop actually taught us
The patches are the artefact. These are the ideas that crystallised during the ten numbered runs and stayed true through the production apps that followed.
Skills aren’t real until something fails against them.
Run 1 produced 11 doc-gaps in a single scaffold. Reading the docs would never have surfaced any of them — you have to build against the skills with type checking on.
Re-dogfood is the contract.
Every applied patch fires a re-dogfood trigger on the next qualifying run. A patch that doesn’t survive a fresh scaffold is worse than no patch — it’s a false sense of safety.
Live co-execution catches what code-only can’t.
The Standalone umbrella model, the AI payload-size ceiling, and the iframe sandbox trap all required a human + agent in the portal at the same time. None would have shown up under a type-check or a unit test.
Wrong assumptions are the highest-value bugs.
Run 7’s “Standalone has no XMC” framing was wrong. Catching it before a customer did — and rewriting three skills as a consequence — was worth more than any of the syntax patches.
The harness pattern was reusable.
Per-test cards, sticky log panel, manual-check checkboxes, structured tag-prefixed logs — built once in Run 3, reused in every harness after. The pattern itself became part of the skill set.
Real products test the skills holistically.
PageShot crossed scaffold, SDK, Agent API, OAuth, Blok, Tailwind, Geist, iframe sandboxing, and Permissions-Policy in a single build. QuickCopy validated that the stack could ship again without scaffold friction; Component Atlas scaled surface area (two extension points, tenant-wide graphs); Last-Edit Trail showed the same pipeline could run end-to-end with no new skill patches — the ultimate regression pass.
Fast context switching needs explicit async discipline.
When pages.context fires faster than version fetches complete, UI can flash the wrong page's data. Last-Edit Trail fixed this with a request-id guard (ADR-0010) — a pattern worth copying anywhere subscription-driven panels overlap slow queries.
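ADR-0010's actual code isn't reproduced here, but the guard pattern it names is simple enough to sketch; the function names below are stand-ins, assuming a pages.context-style subscription and a slow version fetch:

```ts
// Request-id guard: only the response belonging to the *latest* page switch
// is allowed to render, so the panel never flashes a stale page's data.
let latestRequestId = 0;

function onPageContextChanged(pageId: string): void {
  const requestId = ++latestRequestId; // stamp this fetch

  fetchVersions(pageId).then((versions) => {
    if (requestId !== latestRequestId) return; // superseded; drop silently
    renderVersions(versions);
  });
}

// Hypothetical stand-ins for the app's real fetch and render functions.
declare function fetchVersions(pageId: string): Promise<unknown[]>;
declare function renderVersions(versions: unknown[]): void;
```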