This document describes the four built-in tasks in CodeBatch’s full pipeline: what each task does, which output kinds it produces, and which languages it supports.
The full pipeline runs these tasks in order:
01_parse → 02_analyze → 03_symbols → 04_lint
Each task reads from snapshot files and/or prior task outputs, producing indexed records that can be queried.
Purpose: Parse source files and produce Abstract Syntax Trees (AST).
| Language | Parser | AST Mode | Notes |
|---|---|---|---|
| Python | stdlib ast | Full | Names preserved (FunctionDef.name, etc.) |
| JavaScript | tree-sitter* | Full | Real structural AST |
| TypeScript | tree-sitter* | Full | Full type annotation support |
| JavaScript | fallback | Token | Without tree-sitter (import count only) |
| TypeScript | fallback | Token | Without tree-sitter (import count only) |
| Other | None | Skip | No parsing, analysis still available |
*tree-sitter is optional: pip install codebatch[treesitter]
| Kind | Description | Fields |
|---|---|---|
| ast | Parsed AST stored in CAS | path, object, format, ast_mode |
Python AST (Full Mode):
{
  "type": "Module",
  "ast_mode": "full",
  "body": [
    {
      "type": "FunctionDef",
      "name": "calculate_total",
      "lineno": 1,
      "col_offset": 0,
      "args": {
        "type": "arguments",
        "args": [
          {"type": "arg", "arg": "items", "lineno": 1}
        ]
      },
      "body": [...]
    }
  ]
}
JavaScript/TypeScript AST (tree-sitter):
{
  "type": "program",
  "ast_mode": "full",
  "parser": "tree-sitter",
  "children": [
    {
      "type": "function_declaration",
      "name": "fetchData",
      "start_point": [0, 0],
      "end_point": [2, 1],
      "children": [...]
    }
  ]
}
JavaScript Fallback (Token Mode):
{
  "type": "token_summary",
  "ast_mode": "summary",
  "parser": "regex",
  "import_count": 5,
  "function_pattern_count": 3,
  "class_pattern_count": 1
}
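A regex-based fallback of this kind can be sketched as follows. This is only an illustration: the three patterns and the `token_summary` helper are hypothetical and are not taken from CodeBatch's source; only the output field names come from the record shown above.

```python
import re

# Hypothetical patterns for illustration; the real fallback patterns
# are not documented here and may differ.
IMPORT_RE = re.compile(r"^\s*import\b", re.MULTILINE)
FUNCTION_RE = re.compile(r"\bfunction\s+[A-Za-z_$][\w$]*")
CLASS_RE = re.compile(r"\bclass\s+[A-Za-z_$][\w$]*")


def token_summary(source: str) -> dict:
    """Count import, function, and class patterns without building an AST."""
    return {
        "type": "token_summary",
        "ast_mode": "summary",
        "parser": "regex",
        "import_count": len(IMPORT_RE.findall(source)),
        "function_pattern_count": len(FUNCTION_RE.findall(source)),
        "class_pattern_count": len(CLASS_RE.findall(source)),
    }


sample = """import fs from "fs";
import { join } from "path";

export function fetchData(url) { return url; }
class Cache {}
"""
summary = token_summary(sample)
```

Pattern counting like this cannot tell code from comments or strings, which is one reason a summary record carries counts rather than a tree.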
# List all AST outputs for a batch
codebatch query outputs --batch <id> --store ./store --kind ast
# Get Python files with full AST
codebatch query outputs --batch <id> --store ./store --kind ast --json | \
jq '.[] | select(.format == "json" and .ast_mode == "full")'
error output, not AST.
Purpose: Produce file-level metrics for all files in the snapshot.
| Metric | Languages | Source | Description |
|---|---|---|---|
| bytes | All | Snapshot | File size in bytes |
| loc | Text | File content | Lines of code (non-empty) |
| lang | All | Snapshot hint | Language identifier |
| complexity | Python | AST | Total cyclomatic complexity |
| max_complexity | Python | AST | Highest function complexity |
| function_count | Python | AST | Number of functions |
| class_count | Python | AST | Number of classes |
| import_count | Python | AST | Number of import statements |
| Kind | Description | Fields |
|---|---|---|
| metric | Single metric value | path, metric, value |
Complexity starts at 1 for each function and increments for:
| Construct | Contribution |
|---|---|
| if / elif | +1 each |
| for / while | +1 each |
| except | +1 each |
| and / or | +1 per operator |
| assert | +1 |
| Comprehensions | +1 each |
| Ternary (if expr) | +1 |
Example:
def process(items):          # base: 1
    if not items:            # +1
        return []
    result = []
    for item in items:       # +1
        if item > 0:         # +1
            result.append(item)
    return result
# Total complexity: 4
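The counting rules in the table can be sketched with the standard library `ast` module. This is a minimal illustration, not CodeBatch's implementation: `function_complexity` is a hypothetical helper, and counting each comprehension expression once (rather than each `for` clause inside it) is an assumption.

```python
import ast

# Assumption: each comprehension expression contributes +1, regardless of
# how many "for" clauses it contains.
COMPREHENSIONS = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)
BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.Assert, ast.IfExp)


def function_complexity(func: ast.AST) -> int:
    """Apply the table's rules to one function node (hypothetical helper)."""
    score = 1  # every function starts at 1
    for node in ast.walk(func):
        if isinstance(node, BRANCHES):
            score += 1  # an elif appears as a nested ast.If, so it is counted too
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # one per and/or operator
        elif isinstance(node, COMPREHENSIONS):
            score += 1
    return score


source = """
def process(items):
    if not items:
        return []
    result = []
    for item in items:
        if item > 0:
            result.append(item)
    return result
"""
func = ast.parse(source).body[0]
print(function_complexity(func))  # 4, matching the worked example
```

Because `ast.walk` descends into nested functions, this sketch would fold a nested function's branches into its parent's score; a per-function metric would need to stop at nested definitions.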
# Get all metrics for a file
codebatch query outputs --batch <id> --store ./store --kind metric --path src/main.py
# Find high-complexity files
codebatch query outputs --batch <id> --store ./store --kind metric --json | \
jq '.[] | select(.metric == "complexity" and .value > 10)'
# Get total lines of code
codebatch query outputs --batch <id> --store ./store --kind metric --json | \
jq '[.[] | select(.metric == "loc")] | map(.value) | add'
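The same aggregations can be done in Python once the records are loaded. The sketch below assumes, as the jq filters above imply, that `--json` emits a JSON array of records; the sample values are invented for illustration and both helper names are hypothetical.

```python
import json

# Invented sample data standing in for the output of
# "codebatch query outputs ... --kind metric --json".
raw = """[
  {"kind": "metric", "path": "src/main.py", "metric": "loc", "value": 120},
  {"kind": "metric", "path": "src/main.py", "metric": "complexity", "value": 15},
  {"kind": "metric", "path": "src/util.py", "metric": "loc", "value": 40},
  {"kind": "metric", "path": "src/util.py", "metric": "complexity", "value": 3}
]"""
records = json.loads(raw)


def total_loc(records: list) -> int:
    """Sum the loc metric across all files."""
    return sum(r["value"] for r in records if r.get("metric") == "loc")


def high_complexity_paths(records: list, threshold: int = 10) -> list:
    """Return paths whose complexity metric exceeds the threshold."""
    return [
        r["path"]
        for r in records
        if r.get("metric") == "complexity" and r["value"] > threshold
    ]
```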
Purpose: Extract named symbols (functions, classes, variables) and import edges.
| Language | Functions | Classes | Variables | Imports | Exports |
|---|---|---|---|---|---|
| Python | Yes | Yes | Yes | Yes | N/A |
| JavaScript | Yes* | Yes* | Yes* | Yes* | Yes* |
| TypeScript | Yes* | Yes* | Yes* | Yes* | Yes* |
*Requires tree-sitter for full support. Fallback mode uses regex patterns.
| Kind | Description | Fields |
|---|---|---|
| symbol | Named symbol definition | path, name, symbol_type, scope, line, column |
| edge | Dependency relationship | path, edge_type, source, target |
| Type | Description | Example |
|---|---|---|
| function | Function or method definition | def calculate() |
| class | Class definition | class ShoppingCart |
| variable | Variable assignment in function/method | total = 0 |
| parameter | Function/method parameter | def foo(x, y) |
| import | Imported name | from os import path |
| Type | Description | Example |
|---|---|---|
| imports | Module import dependency | import os → target: os |
| exports | Exported symbol (JS/TS only) | export function foo → foo |
Symbols include their enclosing scope:
# Input
class Cart:
    def add(self, item):
        price = item.price
{"name": "Cart", "symbol_type": "class", "scope": "module"}
{"name": "add", "symbol_type": "function", "scope": "Cart"}
{"name": "item", "symbol_type": "parameter", "scope": "add"}
{"name": "price", "symbol_type": "variable", "scope": "add"}
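Scope tracking of this kind can be sketched as a recursive walk that carries the enclosing name downward. This is an illustrative sketch rather than the task's actual code: `extract_symbols` is a hypothetical helper, and skipping `self` and `cls` is an assumption made so that the result matches the records shown above.

```python
import ast


def extract_symbols(source: str) -> list:
    """Collect class, function, parameter, and variable symbols with scope."""
    symbols = []

    def visit(node, scope, in_function):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                symbols.append({"name": child.name, "symbol_type": "class", "scope": scope})
                visit(child, child.name, False)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                symbols.append({"name": child.name, "symbol_type": "function", "scope": scope})
                for arg in child.args.args:
                    # Assumption: the implicit receiver is not reported.
                    if arg.arg not in ("self", "cls"):
                        symbols.append(
                            {"name": arg.arg, "symbol_type": "parameter", "scope": child.name}
                        )
                visit(child, child.name, True)
            elif isinstance(child, ast.Assign) and in_function:
                # Only assignments inside a function or method count as variables.
                for target in child.targets:
                    if isinstance(target, ast.Name):
                        symbols.append(
                            {"name": target.id, "symbol_type": "variable", "scope": scope}
                        )
            else:
                visit(child, scope, in_function)

    visit(ast.parse(source), "module", False)
    return symbols


cart_source = """
class Cart:
    def add(self, item):
        price = item.price
"""
cart_symbols = extract_symbols(cart_source)
```

The sketch omits `path`, `line`, and `column`; in a full record those would come from the file being processed and from each node's `lineno` and `col_offset`.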
# List all functions in a file
codebatch query outputs --batch <id> --store ./store --kind symbol --path src/main.py --json | \
jq '.[] | select(.symbol_type == "function")'
# Find all classes
codebatch query outputs --batch <id> --store ./store --kind symbol --json | \
jq '.[] | select(.symbol_type == "class") | .name'
# Get import graph edges
codebatch query outputs --batch <id> --store ./store --kind edge --json | \
jq '.[] | select(.edge_type == "imports")'
# Find a specific function by name
codebatch query outputs --batch <id> --store ./store --kind symbol --json | \
jq '.[] | select(.name == "calculate_total")'
import()) not tracked.
Purpose: Detect code quality issues through text-based and AST-aware rules.
| Code | Rule | Description |
|---|---|---|
| L001 | Trailing whitespace | Lines ending with spaces/tabs |
| L002 | Mixed indentation | Tabs and spaces in same file |
| L003 | Line too long | Lines exceeding 120 characters |
| L004 | No newline at end | File doesn’t end with newline |
| L005 | Multiple blank lines | More than 2 consecutive blank lines |
| Code | Rule | Description |
|---|---|---|
| L101 | Unused import | Import statement not referenced in code |
| L102 | Unused variable | Local variable assigned but never used |
| L103 | Variable shadowing | Inner scope shadows outer scope variable |
| Kind | Description | Fields |
|---|---|---|
| diagnostic | Code quality issue | path, code, message, severity, line, column |
| Severity | Meaning |
|---|---|
| error | Must be fixed (syntax errors, etc.) |
| warning | Should be fixed (unused code, etc.) |
| info | Style suggestion |
Detects imports that are never referenced in the code.
import os                # Used - os.path referenced below
import sys               # UNUSED - never referenced
from typing import List  # Used in type annotation

def example():
    return os.path.exists("/tmp")

items: List[int] = []
Diagnostic: L101: Unused import 'sys' at line 2
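An unused-import check along these lines can be sketched with the `ast` module: collect the names each import binds, then look for any reference to them. The helper name `find_unused_imports` is hypothetical, and the sketch makes no claim about how the L101 rule is actually implemented.

```python
import ast


def find_unused_imports(source: str) -> list:
    """Report imported names that are never referenced (illustrative only)."""
    tree = ast.parse(source)
    imported = {}  # bound name -> line of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported[alias.asname or alias.name] = node.lineno

    # Every ast.Name is a read or write of a plain identifier; attribute
    # chains such as os.path.exists start from a Name node for "os".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    return [
        {"code": "L101", "message": f"Unused import '{name}'", "line": line}
        for name, line in sorted(imported.items(), key=lambda item: item[1])
        if name not in used
    ]


example_source = """import os
import sys
from typing import List

def example():
    return os.path.exists("/tmp")

items: List[int] = []
"""
unused = find_unused_imports(example_source)
```

A check this simple ignores names re-exported through `__all__` and names used only inside string annotations, so it would over-report in those cases.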
Detects local variables that are assigned but never read.
def calculate(x):
    temp = x * 2     # UNUSED - never read
    result = x + 1   # Used - returned below
    return result
Diagnostic: L102: Unused variable 'temp' at line 2
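One way to approximate this rule is to compare, per function, the names that are stored against the names that are loaded. The sketch below is an assumption-laden illustration: `find_unused_variables` is a hypothetical helper, and skipping every name that starts with an underscore is a guess at the exemption, not a documented behavior.

```python
import ast


def find_unused_variables(source: str) -> list:
    """Flag local names that are assigned but never read (illustrative only)."""
    diagnostics = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        assigned = {}  # name -> first line where it is assigned
        loaded = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    assigned[node.id] = min(assigned.get(node.id, node.lineno), node.lineno)
                else:
                    loaded.add(node.id)
        for name, line in sorted(assigned.items(), key=lambda item: item[1]):
            # Assumption: underscore-prefixed names are treated as intentional.
            if name not in loaded and not name.startswith("_"):
                diagnostics.append(
                    {"code": "L102", "message": f"Unused variable '{name}'", "line": line}
                )
    return diagnostics


calc_source = """def calculate(x):
    temp = x * 2
    result = x + 1
    return result
"""
unused_vars = find_unused_variables(calc_source)
```

Parameters are `ast.arg` nodes rather than `ast.Name` stores, so they are never flagged here. Nested functions are walked both on their own and as part of their parent, which a production rule would need to handle.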
Note: Does not flag:
_ (intentionally unused)
Detects inner scope variables that shadow outer scope.
x = 10  # Outer scope

def example():
    x = 20  # Shadows outer 'x'
    return x
Diagnostic: L103: Variable 'x' shadows outer scope at line 4
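A shadowing check can be sketched by recording module-level assignments and then looking for assignments to the same names inside functions. This is a simplified illustration under stated assumptions: `find_shadowed_variables` is a hypothetical helper, and it only compares top-level functions against module scope, not arbitrary nesting.

```python
import ast


def find_shadowed_variables(source: str) -> list:
    """Flag function-local assignments that reuse a module-level name."""
    tree = ast.parse(source)

    # Names bound by plain assignments at module level.
    outer = set()
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    outer.add(target.id)

    diagnostics = []
    for stmt in tree.body:
        if not isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        reported = set()
        for node in ast.walk(stmt):
            if (
                isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Store)
                and node.id in outer
                and node.id not in reported
            ):
                reported.add(node.id)
                diagnostics.append(
                    {
                        "code": "L103",
                        "message": f"Variable '{node.id}' shadows outer scope",
                        "line": node.lineno,
                    }
                )
    return diagnostics


shadow_source = """x = 10

def example():
    x = 20
    return x
"""
shadowed = find_shadowed_variables(shadow_source)
```

The sketch does not account for `global` declarations, where an assignment inside the function rebinds the module-level name instead of shadowing it.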
# Get all diagnostics for a batch
codebatch query diagnostics --batch <id> --store ./store
# Filter by severity
codebatch query outputs --batch <id> --store ./store --kind diagnostic --json | \
jq '.[] | select(.severity == "warning")'
# Find unused imports
codebatch query outputs --batch <id> --store ./store --kind diagnostic --json | \
jq '.[] | select(.code == "L101")'
# Count diagnostics by code
codebatch query outputs --batch <id> --store ./store --kind diagnostic --json | \
jq 'group_by(.code) | map({code: .[0].code, count: length})'
# Get errors only
codebatch errors --batch <id> --store ./store
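For scripts that post-process diagnostics in Python rather than jq, the grouping step might look like the sketch below. It assumes the records have already been loaded as a list of dictionaries; the sample records are invented and `count_by_code` is a hypothetical helper.

```python
from collections import Counter

# Invented sample records with the diagnostic fields described above.
diagnostics = [
    {"kind": "diagnostic", "path": "src/main.py", "code": "L101", "severity": "warning"},
    {"kind": "diagnostic", "path": "src/main.py", "code": "L001", "severity": "info"},
    {"kind": "diagnostic", "path": "src/util.py", "code": "L101", "severity": "warning"},
]


def count_by_code(records: list) -> list:
    """Return [{code, count}] sorted by code, like the jq group_by filter."""
    counts = Counter(r["code"] for r in records)
    return [{"code": code, "count": counts[code]} for code in sorted(counts)]


def with_severity(records: list, severity: str) -> list:
    """Keep only records with the given severity."""
    return [r for r in records if r.get("severity") == severity]
```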
All output records include:
| Field | Type | Description |
|---|---|---|
| schema_version | int | Record schema version (1) |
| snapshot_id | string | Source snapshot identifier |
| batch_id | string | Execution batch identifier |
| task_id | string | Task that produced this output |
| shard_id | string | Shard that produced this output |
| path | string | Source file path |
| kind | string | Output type (ast, symbol, etc.) |
| ts | string | ISO timestamp |
{
  "kind": "ast",
  "path": "src/main.py",
  "object": "sha256:abc123...",
  "format": "json",
  "ast_mode": "full"
}
{
  "kind": "symbol",
  "path": "src/main.py",
  "name": "calculate_total",
  "symbol_type": "function",
  "scope": "module",
  "line": 10,
  "column": 0
}
{
  "kind": "edge",
  "path": "src/main.py",
  "edge_type": "imports",
  "source": "src/main.py",
  "target": "os"
}
{
  "kind": "metric",
  "path": "src/main.py",
  "metric": "complexity",
  "value": 15
}
{
  "kind": "diagnostic",
  "path": "src/main.py",
  "code": "L101",
  "message": "Unused import 'sys'",
  "severity": "warning",
  "line": 2,
  "column": 0
}
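When consuming these records, a small check against the per-kind field lists from the output tables can catch malformed input early. The sketch below is a convenience for readers, not part of CodeBatch: `REQUIRED_FIELDS` simply restates the Fields columns from this document and `missing_fields` is a hypothetical helper.

```python
# Field lists copied from the "Fields" column of each output-kind table.
REQUIRED_FIELDS = {
    "ast": ("path", "object", "format", "ast_mode"),
    "symbol": ("path", "name", "symbol_type", "scope", "line", "column"),
    "edge": ("path", "edge_type", "source", "target"),
    "metric": ("path", "metric", "value"),
    "diagnostic": ("path", "code", "message", "severity", "line", "column"),
}


def missing_fields(record: dict) -> list:
    """Return the kind-specific fields that a record lacks."""
    kind = record.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown output kind: {kind!r}")
    return [field for field in REQUIRED_FIELDS[kind] if field not in record]


metric_record = {
    "kind": "metric",
    "path": "src/main.py",
    "metric": "complexity",
    "value": 15,
}
print(missing_fields(metric_record))  # []
```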
Tasks are configured in the pipeline definition. The full pipeline uses
default settings:
PIPELINES = {
    "full": [
        {"task": "parse", "id": "01_parse"},
        {"task": "analyze", "id": "02_analyze", "depends": ["01_parse"]},
        {"task": "symbols", "id": "03_symbols", "depends": ["01_parse"]},
        {"task": "lint", "id": "04_lint", "depends": ["01_parse"]},
    ],
}
            ┌─────────────┐
            │  Snapshot   │
            └──────┬──────┘
                   │
            ┌──────▼──────┐
            │  01_parse   │
            └──────┬──────┘
                   │
      ┌────────────┼────────────┐
      │            │            │
┌─────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│02_analyze │ │03_symbols│ │ 04_lint  │
└───────────┘ └──────────┘ └──────────┘
All downstream tasks can read AST outputs from 01_parse via iter_prior_outputs().
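The `depends` lists are enough to derive a valid execution order. As a minimal sketch, assuming Python 3.9+ for the standard library `graphlib` module, the order can be computed with a topological sort; `execution_order` is a hypothetical helper and says nothing about how CodeBatch schedules tasks internally.

```python
from graphlib import TopologicalSorter

# The "full" pipeline definition as shown in this document.
FULL_PIPELINE = [
    {"task": "parse", "id": "01_parse"},
    {"task": "analyze", "id": "02_analyze", "depends": ["01_parse"]},
    {"task": "symbols", "id": "03_symbols", "depends": ["01_parse"]},
    {"task": "lint", "id": "04_lint", "depends": ["01_parse"]},
]


def execution_order(tasks: list) -> list:
    """Order task ids so that every task runs after its dependencies."""
    graph = {task["id"]: set(task.get("depends", [])) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(FULL_PIPELINE)
# 01_parse always comes first; the three downstream tasks depend only on
# it, so their relative order is not constrained by the graph.
```

Since 02_analyze, 03_symbols, and 04_lint share a single dependency, nothing in the definition itself prevents them from running in any order after 01_parse completes.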
| Feature | Python | JavaScript | TypeScript |
|---|---|---|---|
| Full AST | Yes | tree-sitter | tree-sitter |
| Symbol extraction | Yes | tree-sitter | tree-sitter |
| Import tracking | Yes | tree-sitter | tree-sitter |
| Complexity metrics | Yes | No | No |
| AST-aware linting | Yes | No | No |
| Text-based linting | Yes | Yes | Yes |
tree-sitter: Optional dependency. Install with pip install codebatch[treesitter]
Without tree-sitter, JavaScript/TypeScript files: