Every WebSketch capture is a tree of nodes annotated with a fixed grammar. This page documents the roles, symbols, classification logic, and cleanup passes that produce the final output.
WebSketch maps every visible element to one of 22 UI primitives. The vocabulary is fixed: the same roles appear on every website, so an LLM can learn them once and apply them to any capture.
After classification, WebSketch runs a series of cleanup passes to minimize noise:
Transparent table traversal: Intermediate structural elements such as TR, TD, TH, and LI are skipped. Their children are promoted to the skipped element's parent, so the tree reflects content rather than markup structure.
Zero-content pruning: Empty, non-interactive, and invisible nodes are dropped entirely.
Wrapper collapsing: SECTION wrappers that have a single child and carry no information of their own are removed, pulling the child up one level.
Cascading prune: Hollow wrapper chains (nested containers with no actual content at any depth) are eliminated.
Label extraction: Visible text is pulled from links, buttons, headings, images (alt text), and inputs (placeholder or value) and placed in the "label" annotation.
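The label extraction rule can be illustrated with a small Python sketch. It is not WebSketch's implementation: the function name extract_label, the lowercase kind strings, and the attrs mapping are hypothetical. Preferring placeholder over value for inputs and collapsing whitespace are also assumptions made here for illustration.

```python
from typing import Mapping, Optional


def extract_label(kind: str, text: str = "",
                  attrs: Optional[Mapping[str, str]] = None) -> str:
    """Return the visible text to store in the "label" annotation.

    Hypothetical helper: the kind names and the attrs mapping are
    illustrative and do not reflect WebSketch's real data model.
    """
    attrs = attrs or {}
    if kind == "image":
        raw = attrs.get("alt", "")
    elif kind == "input":
        # Assumption: placeholder wins when both placeholder and value exist.
        raw = attrs.get("placeholder") or attrs.get("value", "")
    elif kind in {"link", "button", "heading"}:
        raw = text
    else:
        raw = ""
    # Assumption: runs of whitespace are collapsed to single spaces.
    return " ".join(raw.split())
```

For example, extract_label("input", attrs={"placeholder": "", "value": "alice"}) falls back to the value because the placeholder is empty.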
The result is the minimum set of nodes needed to understand the page.
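The structural passes described above (transparent traversal, pruning, and wrapper collapsing) can be sketched end to end on a toy tree. This is a minimal illustration under stated assumptions, not WebSketch's code: the Node class, its fields, and the function names are hypothetical, the kind field stands in for both HTML tags and roles, and every role name other than SECTION is a placeholder. The pruning rule here is one reading of the text: a node is dropped if it is invisible, or if it has no label, no children, and is not interactive.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Elements that are skipped while their children are promoted.
TRANSPARENT = {"TR", "TD", "TH", "LI"}


@dataclass
class Node:
    # Hypothetical node shape, for illustration only.
    kind: str
    label: str = ""
    interactive: bool = False
    visible: bool = True
    children: List["Node"] = field(default_factory=list)


def promote(children: List[Node]) -> List[Node]:
    """Replace each transparent child with its own promoted children."""
    out: List[Node] = []
    for child in children:
        child.children = promote(child.children)
        if child.kind in TRANSPARENT:
            out.extend(child.children)
        else:
            out.append(child)
    return out


def prune(node: Node) -> Optional[Node]:
    """Drop contentless nodes bottom-up, so hollow chains cascade away."""
    kept = (prune(child) for child in node.children)
    node.children = [child for child in kept if child is not None]
    if not node.visible:
        return None
    has_content = (bool(node.label.strip()) or node.interactive
                   or bool(node.children))
    return node if has_content else None


def collapse_wrappers(node: Node) -> Node:
    """Replace an unlabeled single-child SECTION with its only child."""
    node.children = [collapse_wrappers(child) for child in node.children]
    is_bare = not node.label and not node.interactive
    if node.kind == "SECTION" and is_bare and len(node.children) == 1:
        return node.children[0]
    return node


def cleanup(root: Node) -> Optional[Node]:
    """Run the three passes in order; the root is assumed non-transparent."""
    root.children = promote(root.children)
    pruned = prune(root)
    return collapse_wrappers(pruned) if pruned is not None else None


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines for inspection."""
    suffix = f' "{node.label}"' if node.label else ""
    lines = ["  " * depth + node.kind + suffix]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = Node("PAGE", children=[
    Node("TABLE", children=[
        Node("TR", children=[
            Node("TD", children=[Node("LINK", "Docs", interactive=True)]),
            Node("TD"),  # empty cell: contributes nothing once skipped
        ]),
    ]),
    # Hollow chain: three nested wrappers with no content at any depth.
    Node("SECTION", children=[Node("SECTION", children=[Node("SECTION")])]),
    Node("SECTION", children=[Node("BUTTON", "Sign in", interactive=True)]),
])

cleaned = cleanup(page)
print("\n".join(outline(cleaned)))
```

On this input, the TR and TD layers disappear so the link sits directly under the table, the hollow SECTION chain is removed, and the button replaces its single-child SECTION wrapper. Pruning bottom-up is a design choice in this sketch: a wrapper is evaluated only after its children, so one traversal handles both zero-content pruning and the cascading case.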