Observability For Agents
Frisky ships with a CLI for inspecting a running cluster. The CLI is designed for AI agents: it is self-describing through --help and organized as an information funnel, from broad summaries at the top to raw details at the bottom.
Agents start here:
frisky observe --help
make docs-build refreshes this help output before building the site.
Usage: frisky observe [OPTIONS] COMMAND [ARGS]...

Inspect a running Frisky cluster — live state and timing in one funnel.

Start at `overview` and drill down; each view points to the next. The command
groups below run broad to specific.

State views are live-only via --url (default http://localhost:8787) and take
--json (stable shape for parsing). Timing views (overview/timeline/stragglers/
spans/export) take a SOURCE — a URL (live) or a captured JSON file — so you can
`observe spans > f.json` once and analyse offline (`observe spans` emits JSON
directly; the others render text). All views auto-switch to plain text on
non-TTY stdout, so pipes and subprocess.run get parseable output.

Common span --name prefixes: worker.exec.* (gil/deserialize/call), worker.spill.*,
worker.unspill.*, worker.transfer.*, scheduler.* (handle_update_graph, metrics),
client.dask_*. Logs are separate: `frisky logs --help`.

Options
  --help        Show this message and exit.

Start here
  overview      Live state + performance summary on one screen. START HERE.

Breadth — orientation
  cluster       Show a one-shot cluster summary: task counts, workers, queues, throughput.
  versions      Show software versions on the cluster (python, frisky, imported packages).
  workers       Show worker status and resource usage.
  prefixes      Show task prefix state plus recent transfer/disk costs.
  progress      Show combined worker and prefix overview.
  detail        Show detailed worker x prefix breakdown.
  timeline      Render spans as a text-based intensity timeline (per-row sparklines).

Ranked — find the problem
  erred         List Erred tasks, root failures first (sorted by blocking impact).
  blocked       List tasks with the most outstanding dependencies (most blocked first).
  queued        List queued tasks (rootish tasks held back because workers are saturated).
  transfers     Transfers: live in-flight view, or retrospective accounting breakdown.
  stragglers    Rank workers by how much they differ from their peers (timing outliers).

Inspect one entity
  task          One task: state, deps/dependents tree, event timeline.
  worker        Show detailed information for a single worker, including its tasks.
  deps          List immediate dependencies of a task (full list, paginated).
  dependents    List immediate dependents of a task (full list, paginated).

Raw / last-mile
  events        Query the scheduler event log (transitions, placements, transfers, completions).
  tasks         List tasks with optional filtering.
  spans         Raw spans as JSON (default) or a quick --table — the bottom of the funnel.
  export        Export spans to Chrome trace format for chrome://tracing or Perfetto.
Running vs Historical Clusters
The frisky observe command works against a live cluster. Give it your dashboard address:
frisky observe overview http://localhost:8787
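An agent driving the CLI from Python can read live state through the --url and --json options described in the help output. The following is a minimal sketch, not part of Frisky: it assumes frisky is on PATH and that the options are accepted after the view name. The helper names (state_view_argv, read_state_view) are hypothetical, and because the JSON shape of each view is not documented here, the code only parses the payload and leaves its structure to the caller.

```python
import json
import subprocess
from typing import Any, Callable, List

# Default dashboard address, as stated in `frisky observe --help`.
DEFAULT_URL = "http://localhost:8787"


def state_view_argv(view: str, url: str = DEFAULT_URL) -> List[str]:
    """Build the argv for a live state view with JSON output.

    Assumption: options may follow the view name on the command line.
    """
    return ["frisky", "observe", view, "--url", url, "--json"]


def _run_cli(argv: List[str]) -> str:
    """Run the CLI and return stdout; raises CalledProcessError on failure."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def read_state_view(
    view: str,
    url: str = DEFAULT_URL,
    run: Callable[[List[str]], str] = _run_cli,
) -> Any:
    """Fetch one state view and parse its JSON payload.

    `run` is injectable so the function can be exercised without a cluster.
    The structure of the returned object is not assumed here.
    """
    return json.loads(run(state_view_argv(view, url)))
```

Under those assumptions, read_state_view("workers") would run frisky observe workers --url http://localhost:8787 --json and return the parsed result.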
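Alternatively, capture span data from a cluster once and review it offline. Only the timing views (overview, timeline, stragglers, spans, export) accept a captured file; state views are live-only.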
frisky observe spans http://localhost:8787 > cluster-data.json
frisky observe overview cluster-data.json
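The same capture-then-replay workflow can be scripted. The sketch below is illustrative only: it assumes frisky is on PATH, the function names (capture_spans, offline_view) are hypothetical, and the set of views that accept a file is taken from the help text above rather than from Frisky's source.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Views that take a SOURCE (URL or captured JSON file), per the help text.
TIMING_VIEWS = {"overview", "timeline", "stragglers", "spans", "export"}


def _run_cli(argv: List[str]) -> str:
    """Run the CLI and return stdout; raises CalledProcessError on failure."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def capture_spans(
    source: str,
    dest: Path,
    run: Callable[[List[str]], str] = _run_cli,
) -> Path:
    """Save the JSON emitted by `frisky observe spans SOURCE` to `dest`."""
    dest.write_text(run(["frisky", "observe", "spans", source]))
    return dest


def offline_view(
    view: str,
    capture: Path,
    run: Callable[[List[str]], str] = _run_cli,
) -> str:
    """Render a timing view from a captured file instead of a live URL."""
    if view not in TIMING_VIEWS:
        # State views are live-only, so a captured file cannot serve them.
        raise ValueError(f"{view!r} is not a timing view")
    return run(["frisky", "observe", view, str(capture)])
```

A typical sequence would be capture_spans("http://localhost:8787", Path("cluster-data.json")) followed by offline_view("stragglers", Path("cluster-data.json")). Passing a state view such as "workers" raises ValueError before any command is run.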
For Humans
For humans we recommend using the live dashboard.