
Observability For Agents

Frisky ships with a CLI for inspecting a running cluster. The CLI is designed for AI agents: it is self-describing via --help and organized as an information funnel, from broad summaries at the top to raw details at the bottom.

Agents start here:

frisky observe --help

The help output shown here is regenerated by make docs-build before the site is built.

                                                                                                    
 Usage: frisky observe [OPTIONS] COMMAND [ARGS]...                                                  
                                                                                                    
 Inspect a running Frisky cluster — live state and timing in one funnel.

 Start at `overview` and drill down; each view points to the next. The command
 groups below run broad to specific.

 State views are live-only via --url (default http://localhost:8787) and take
 --json (stable shape for parsing). Timing views (overview/timeline/stragglers/
 spans/export) take a SOURCE — a URL (live) or a captured JSON file — so you can
 `observe spans > f.json` once and analyse offline (`observe spans` emits JSON
 directly; the others render text). All views auto-switch to plain text on
 non-TTY stdout, so pipes and subprocess.run get parseable output.

 Common span --name prefixes: worker.exec.* (gil/deserialize/call), worker.spill.*,
 worker.unspill.*, worker.transfer.*, scheduler.* (handle_update_graph, metrics),
 client.dask_*. Logs are separate: `frisky logs --help`.

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Start here ─────────────────────────────────────────────────────────────────────────────────────╮
│ overview    Live state + performance summary on one screen. START HERE.                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Breadth — orientation ──────────────────────────────────────────────────────────────────────────╮
│ cluster     Show a one-shot cluster summary: task counts, workers, queues, throughput.           │
│ versions    Show software versions on the cluster (python, frisky, imported packages).           │
│ workers     Show worker status and resource usage.                                               │
│ prefixes    Show task prefix state plus recent transfer/disk costs.                              │
│ progress    Show combined worker and prefix overview.                                            │
│ detail      Show detailed worker x prefix breakdown.                                             │
│ timeline    Render spans as a text-based intensity timeline (per-row sparklines).                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Ranked — find the problem ──────────────────────────────────────────────────────────────────────╮
│ erred       List Erred tasks, root failures first (sorted by blocking impact).                   │
│ blocked     List tasks with the most outstanding dependencies (most blocked first).              │
│ queued      List queued tasks (rootish tasks held back because workers are saturated).           │
│ transfers   Transfers: live in-flight view, or retrospective accounting breakdown.               │
│ stragglers  Rank workers by how much they differ from their peers (timing outliers).             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Inspect one entity ─────────────────────────────────────────────────────────────────────────────╮
│ task        One task: state, deps/dependents tree, event timeline.                               │
│ worker      Show detailed information for a single worker, including its tasks.                  │
│ deps        List immediate dependencies of a task (full list, paginated).                        │
│ dependents  List immediate dependents of a task (full list, paginated).                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Raw / last-mile ────────────────────────────────────────────────────────────────────────────────╮
│ events      Query the scheduler event log (transitions, placements, transfers, completions).     │
│ tasks       List tasks with optional filtering.                                                  │
│ spans       Raw spans as JSON (default) or a quick --table — the bottom of the funnel.           │
│ export      Export spans to Chrome trace format for chrome://tracing or Perfetto.                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
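The help text distinguishes two calling conventions: timing views take a positional SOURCE, while state views are addressed with --url and accept --json. A minimal Python sketch of how an agent might wrap both is shown here. The helper names (`observe_argv`, `observe`) and the exact ordering of options are assumptions for illustration, not part of Frisky; the `run` parameter exists only so the subprocess call can be swapped out.

```python
import json
import subprocess

# Timing views take a positional SOURCE (a URL or a captured JSON file),
# as listed in the `frisky observe --help` text.
TIMING_VIEWS = {"overview", "timeline", "stragglers", "spans", "export"}


def observe_argv(view, source="http://localhost:8787"):
    """Build the argument list for one `frisky observe` view (hypothetical helper)."""
    if view in TIMING_VIEWS:
        return ["frisky", "observe", view, source]
    # State views are live-only: pass the address via --url and request --json.
    # Assumption: the options may follow the view name in this order.
    return ["frisky", "observe", view, "--url", source, "--json"]


def observe(view, source="http://localhost:8787", run=subprocess.run):
    """Run a view and return parsed JSON where the help text promises JSON."""
    result = run(observe_argv(view, source), capture_output=True, text=True, check=True)
    if view in TIMING_VIEWS and view != "spans":
        # Timing views other than `spans` render text, so return it unparsed.
        return result.stdout
    return json.loads(result.stdout)
```

Views that need an extra argument, such as a task key, are left out of this sketch; check each subcommand's --help for its actual arguments.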

Running vs Historical Clusters

frisky observe works against a live cluster. Pass it your dashboard address:

frisky observe overview http://localhost:8787

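Alternatively, capture spans from a cluster once and review them offline. Only the timing views (overview, timeline, stragglers, spans, export) accept a captured file; state views are live-only.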

frisky observe spans http://localhost:8787 > cluster-data.json
frisky observe overview cluster-data.json
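The same capture-then-review flow can be scripted. The sketch below assumes only the two commands shown above; the function names (`capture_spans`, `overview_offline`) are hypothetical, and the shape of the spans JSON is deliberately not inspected because it is not documented here.

```python
import json
import subprocess
from pathlib import Path


def capture_spans(source, path, run=subprocess.run):
    """Save the output of `frisky observe spans SOURCE` to a file and return its path."""
    result = run(
        ["frisky", "observe", "spans", source],
        capture_output=True, text=True, check=True,
    )
    # Fail early if the output is not valid JSON, rather than saving a bad capture.
    json.loads(result.stdout)
    target = Path(path)
    target.write_text(result.stdout)
    return target


def overview_offline(path, run=subprocess.run):
    """Render `frisky observe overview` from a captured file and return the text."""
    result = run(
        ["frisky", "observe", "overview", str(path)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

An agent could call `capture_spans("http://localhost:8787", "cluster-data.json")` once and then run any number of offline views against the saved file without touching the cluster again.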

For Humans

For humans we recommend using the live dashboard.