Integration

fsv-agent — the verification flywheel in 350 lines.

fsv-agent is a thin Python CLI that wraps any LLM with FullStackVibes Handshake retrieval and (optionally) records anonymous outcome data back to FSV. It is the smallest concrete demonstration of the architecture: a single command pulls a Precision Bundle, folds it into a system prompt, calls your model with your key, and closes the loop by tying retrieved windows to a per-session SUCCESS / FAILURE flag. Every recorded outcome is a citable proof of "this slice of the corpus was load-bearing for this query" — the empirical signal that nobody else's framework produces.

What this does that other agent runtimes don't

Every existing framework (LangChain, LlamaIndex, the Vercel AI SDK, Cursor's retrieval, Claude Code's MCP loop) treats retrieval as a one-way operation: pull context, use it, drop it. There is no path back from "the agent succeeded" to "which windows were load-bearing." Quality scores stay model-attested forever.

fsv-agent records the link. Each invocation that opts in writes:

  • The window IDs retrieved (no window content — the windows are public anyway)
  • The model + provider used
  • The latency
  • An eventual SUCCESS or FAILURE outcome (you mark it)

Aggregate across many sessions and the corpus develops an empirical quality_score per window: "windows pulled into N successful sessions across M models" beats "this window scored 0.84 on a single inference run" by a wide margin.
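The aggregation itself is simple. A minimal sketch, assuming a hypothetical session-record shape (window IDs plus a SUCCESS / FAILURE outcome) — not the actual FSV storage format:

```python
# Hypothetical sketch: turning per-session outcomes into an empirical
# quality score per window. The session-record shape here is an
# assumption, not FSV's real schema.
from collections import defaultdict

sessions = [
    {"windows": ["w1", "w2"], "outcome": "SUCCESS"},
    {"windows": ["w2", "w3"], "outcome": "FAILURE"},
    {"windows": ["w2"], "outcome": "SUCCESS"},
]

def empirical_quality(sessions):
    wins = defaultdict(int)    # successful sessions a window appeared in
    total = defaultdict(int)   # all sessions a window appeared in
    for s in sessions:
        for w in s["windows"]:
            total[w] += 1
            if s["outcome"] == "SUCCESS":
                wins[w] += 1
    return {w: wins[w] / total[w] for w in total}

# w2 appears in 3 sessions, 2 of them successful -> score 2/3
```

The score is grounded in "was this window present when the agent actually succeeded," across models, rather than a single model's self-assessment.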

Privacy posture

  • Prompts and responses never touch FSV. Your LLM call goes directly from your machine to your provider with your key.
  • The anonymous actor id is generated locally on first run and stored at ~/.fsv-agent/actor.txt. It's a UUID, never joined to any FSV identity.
  • Recording is opt-in per call. Add --no-verify to skip session recording entirely — the CLI still pulls the bundle and runs the agent, just doesn't phone home.
  • Window IDs that get recorded are public corpus pointers, not user content.

Run it without installing

The script declares its dependency in a PEP 723 inline script metadata block, so uv run resolves the (single) dependency for you:

uv run https://fullstackvibes.com/integrations/agent/fsv_agent.py \
    --space fintech \
    --include-owner-kept \
    "How should I handle TradingView webhooks safely?"

That's the whole install. No pip install, no virtualenv, no SDK pulls.
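For the curious, a PEP 723 metadata block is just a structured comment at the top of the script. The exact constraints in fsv_agent.py may differ; the shape looks like this:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["httpx"]
# ///
```

uv reads this block, builds an ephemeral environment with httpx, and runs the script in it — which is why no pip install or virtualenv is needed.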

Or save it locally

curl -sSL https://fullstackvibes.com/integrations/agent/fsv_agent.py \
    -o ~/.local/bin/fsv-agent
chmod +x ~/.local/bin/fsv-agent
fsv-agent --space fintech "your question"

Usage

The default subcommand is ask (you can omit it):

fsv-agent [--space SPACE] [--window-type TYPE] [--window-tag TAG]
          [--pattern-tag KIND:slug] [--quality-min FLOAT]
          [--max-chars N] [--max-windows N] [--include-owner-kept]
          [--provider {openrouter,anthropic,openai}] [--model NAME]
          [--system "extra system prompt"]
          [--no-handshake] [--no-verify] [--show-bundle]
          PROMPT...

All filter flags forward to the Handshake API — see the Handshake spec for semantics.
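To make "forward" concrete, here is a hedged sketch of how repeatable filter flags could map onto query parameters. The parameter names track the flag names; the actual endpoint and wire format are defined by the Handshake spec, not this sketch:

```python
# Hypothetical sketch: translating CLI filter flags into Handshake
# query parameters. Repeatable flags (--window-type, --window-tag)
# become repeated key/value pairs.
def bundle_params(space, window_types=(), quality_min=None,
                  include_owner_kept=False):
    params = [("space", space)]
    params += [("window_type", t) for t in window_types]
    if quality_min is not None:
        params.append(("quality_min", str(quality_min)))
    if include_owner_kept:
        params.append(("include_owner_kept", "true"))
    return params
```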

Provider keys

Set whichever you use:

  • OPENROUTER_API_KEY — default provider, accesses every model through one API
  • ANTHROPIC_API_KEY — for --provider anthropic
  • OPENAI_API_KEY — for --provider openai

Rate the outcome later

fsv-agent rate <session-id> --success
fsv-agent rate <session-id> --fail

The session id is printed at the end of every ask invocation that recorded a session. Rating is what makes the flywheel turn.
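What a rating sends is deliberately tiny. A sketch, assuming a hypothetical payload shape — only the SUCCESS / FAILURE outcome values come from the docs above, and the endpoint is whatever fsv_agent.py actually targets:

```python
# Hypothetical sketch of the rating payload. No prompt, no response,
# no user content -- just the session id and a binary outcome.
def rate_payload(session_id: str, success: bool) -> dict:
    return {
        "session_id": session_id,
        "outcome": "SUCCESS" if success else "FAILURE",
    }

# the CLI would then POST this to FSV, roughly:
#   httpx.post(RATE_URL, json=rate_payload(session_id, True))
```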

Worked example

$ export OPENROUTER_API_KEY=sk-or-v1-...
$ uv run https://fullstackvibes.com/integrations/agent/fsv_agent.py \
    --space fintech \
    --window-type SCHEMA --window-type ANTI_PATTERN \
    --include-owner-kept \
    "Give me a safe TradingView webhook payload schema"

» fetching Precision Bundle…
» bundle: 7 windows, 1284 chars
» calling openrouter/google/gemma-4-26b-a4b-it…
» 3201 ms

[the model's answer, citing the retrieved windows by parent artifact title]

[fsv-agent] session 0a8f3c7b-9d12-4e5f-9b21-77c4a6d2e0a1
[fsv-agent] rate it later: fsv_agent.py rate 0a8f3c7b-9d12-4e5f-9b21-77c4a6d2e0a1 --success

How it composes the system prompt

The bundle is folded into a system prompt with explicit framing: each window appears with its structural type, parent artifact title, and quality score, wrapped in a === BEGIN PRECISION BUNDLE === / === END PRECISION BUNDLE === envelope so the model can distinguish FSV-supplied context from later messages.

The preamble explicitly tells the model:

  • Treat the windows as authoritative reference material.
  • Cite parent artifact titles in parentheses when quoting.
  • The windows are typed — reason about which type addresses the user's need.

If you want to layer your own system prompt on top, pass --system "...". Your text is appended after the bundle.
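Put together, the composition step might look like this. The window field names and preamble wording are assumptions about fsv_agent.py's internals; the envelope markers and ordering follow the description above:

```python
# Sketch of folding a Precision Bundle into a system prompt.
# Field names ("type", "title", "quality", "content") are assumed.
PREAMBLE = (
    "Treat the windows below as authoritative reference material. "
    "Cite parent artifact titles in parentheses when quoting. "
    "Windows are typed; reason about which type addresses the need."
)

def compose_system_prompt(windows, extra=None):
    parts = [PREAMBLE, "=== BEGIN PRECISION BUNDLE ==="]
    for w in windows:
        # structural type + parent artifact title + quality score
        parts.append(f"[{w['type']}] {w['title']} (quality {w['quality']:.2f})")
        parts.append(w["content"])
    parts.append("=== END PRECISION BUNDLE ===")
    if extra:
        parts.append(extra)  # --system text lands after the bundle
    return "\n\n".join(parts)
```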

Comparing baseline (no FSV)

To compare bundle-augmented vs. baseline answers from the same model, run twice:

fsv-agent --no-handshake --provider openrouter --model google/gemma-4-26b-a4b-it \
    "your question"   # baseline

fsv-agent --space your-space --provider openrouter --model google/gemma-4-26b-a4b-it \
    "your question"   # FSV-augmented

This is also the simplest empirical evaluation harness for the corpus — if FSV-augmented answers don't measurably beat baseline on a domain question, the corpus needs sharper windows for that domain.
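Scoring the comparison can stay manual: judge each baseline/augmented pair yourself and tally. A minimal sketch of that tally (the judgment labels are an illustrative convention, not CLI output):

```python
# Minimal paired-comparison tally for baseline vs. FSV-augmented runs.
# Each entry is a human judgment on one question pair.
def win_rate(pairs):
    # pairs: list of "fsv" | "baseline" | "tie"
    fsv = sum(1 for p in pairs if p == "fsv")
    decided = sum(1 for p in pairs if p != "tie")
    return fsv / decided if decided else 0.0
```

Even a handful of pairs gives a directional signal on whether the corpus's windows are pulling their weight for a domain.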

Source

The whole runtime is one file:

  • fsv_agent.py — ~350 lines, MIT, single dependency (httpx) declared inline via PEP 723.

Read it before you run it. It's short on purpose.

What's next

The runtime is intentionally minimal. The next pieces that would make it more useful:

  • Streaming responses (currently non-streaming; the call blocks until completion).
  • Multi-turn conversations (currently each invocation is a single user prompt).
  • Outcome auto-detection (e.g. for code-output prompts, run the code and use exit code as a signal).
  • Editor integration via VSCode extension that calls the same backend endpoints.

None of these are required for the flywheel to start producing data. One --success rating is more empirically useful than zero, and that's all this CLI needs to demonstrate.