MAF — Runbook
Operational tasks, ordered by frequency. Each command is paste-ready.
Table of contents
- Preflight: am I healthy?
- Service management — systemd
- Manual start / restart (development)
- Wiring up trtools2 (TT2_API_KEY)
- Start the Kronos sidecar
- Watch / unwatch a symbol
- Verify the Kronos loop end-to-end
- Trigger an arena from the shell
- Tail live events
- Ollama rate-limit recovery
- Production gotchas
Preflight: am I healthy?
python -m maf doctor
Checks Redis · Ollama Cloud · arenas-load · streams-write · trtools2 ·
fomo2 · mirofish. Exits 0 only when every required check passes; each
failing check prints an actionable hint (see doctor.py).
The dashboard's status bar surfaces the same info as coloured pills — redis · ollama · trtools2 · fomo2 · mirofish · kronos_refresher · mirofish_refresher. The last two are red when only the dashboard is running (no worker); see service management below.
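Because doctor's exit code is the contract, deploy or cron scripts can gate on it. A minimal sketch; preflight_ok is a hypothetical helper, and it assumes maf is importable by the interpreter running the script (sys.executable):

```python
import subprocess
import sys
from typing import Sequence


def preflight_ok(cmd: Sequence[str] = (sys.executable, "-m", "maf", "doctor"),
                 timeout: float = 120.0) -> bool:
    """Run the doctor command and report whether it exited 0."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary missing, not executable, or doctor hung: treat as unhealthy.
        print(f"preflight could not run: {exc}", file=sys.stderr)
        return False
    if proc.returncode != 0:
        # Surface doctor's own per-check hints so the operator sees what failed.
        sys.stderr.write(proc.stdout + proc.stderr)
    return proc.returncode == 0
```

In shell the same gate is just python -m maf doctor && sudo systemctl restart maf-worker.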
Service management — systemd
Production runs two long-lived processes: the dashboard and the service-mode worker. Both are managed by systemd, start on boot, and have their logs rotated.
One-time install
sudo mkdir -p /var/log/maf
sudo chown trbck:trbck /var/log/maf
sudo cp deploy/systemd/maf-dashboard.service /etc/systemd/system/
sudo cp deploy/systemd/maf-worker.service /etc/systemd/system/
sudo cp deploy/logrotate.d/maf /etc/logrotate.d/maf
sudo systemctl daemon-reload
sudo systemctl enable --now maf-dashboard.service
sudo systemctl enable --now maf-worker.service
See deploy/README.md for the unit-file
breakdown.
Day-to-day ops
| Action | Command |
|---|---|
| Restart dashboard (after template / JS edits) | sudo systemctl restart maf-dashboard |
| Restart worker (after arena YAML edits) | sudo systemctl restart maf-worker |
| Restart both | sudo systemctl restart maf-dashboard maf-worker |
| Tail both | sudo journalctl -u maf-dashboard -u maf-worker -f |
| Status | systemctl status maf-dashboard maf-worker |
| Force log rotation (don't wait for cron) | sudo logrotate -f /etc/logrotate.d/maf |
Verifying the worker is up
Within ~60 s of maf-worker starting, the dashboard's
kronos_refresher and mirofish_refresher pills turn green; that is the
heartbeat key landing in Redis.
If they stay red, check journalctl -u maf-worker -n 100.
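How the heartbeat is stored isn't documented here. Assuming each refresher writes a Unix timestamp to its heartbeat key and the pill turns red once that timestamp is older than some threshold (120 s below is a hypothetical value), the pill logic reduces to this sketch (refresher_pill is illustrative, not dashboard code):

```python
import time
from typing import Optional


def refresher_pill(last_beat: Optional[float], now: Optional[float] = None,
                   max_age_s: float = 120.0) -> str:
    """'green' if the heartbeat timestamp is fresh enough, otherwise 'red'."""
    if last_beat is None:
        return "red"  # key missing: worker never started, or the key expired
    now = time.time() if now is None else now
    return "green" if now - last_beat <= max_age_s else "red"
```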
Manual start / restart (development)
If you're not using systemd (dev box, CI, ad-hoc):
# stop the old process
pkill -9 -f 'python -m maf' ; sleep 3
# dashboard only (no refreshers — fine for UI work)
nohup ./.venv/bin/python -m maf --dashboard \
--host 127.0.0.1 --port 8420 --log-level WARNING \
>/tmp/maf-dashboard.log 2>&1 &
disown
# service mode (refreshers + trigger dispatcher + control inbox)
nohup ./.venv/bin/python -m maf --log-level INFO \
>/tmp/maf-worker.log 2>&1 &
disown
Verify:
sleep 5
curl -sI http://127.0.0.1:8420/api/system/status | head -1
curl -sI https://maf.techvizier.com/ | head -1
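The fixed sleep 5 can be too short on a cold box. A sketch that polls with GET until the server answers, treating any status below 500 as "up" (a 5xx, e.g. from Cloudflare while the origin is down, keeps it polling); wait_for_http is a hypothetical helper:

```python
import time
import urllib.error
import urllib.request

# Talk to the port directly; a local health probe should not go through http_proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with a status below 500, or give up after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=5):
                return True  # 2xx/3xx: the process is up and serving
        except urllib.error.HTTPError as exc:  # must precede URLError (it is a subclass)
            if exc.code < 500:
                return True  # a 4xx is still an answer from a live server
            # 5xx: something answered, but the app behind it may not be ready yet
        except (urllib.error.URLError, OSError):
            pass  # connection refused / reset / timed out: not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage: wait_for_http("http://127.0.0.1:8420/api/system/status").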
Wiring up trtools2 (TT2_API_KEY)
MAF talks to trtools2's dashboard API (default port 8888) via the
trtools2_api adapter. trtools2 requires X-API-Key auth, so copy the
value from trtools2's config into MAF's .env:
TT2_KEY=$(grep -E '^TT2_API_KEY=' /home/trbck/workspace/trtools2/config/dashboard.env | cut -d= -f2-)
if grep -q '^TT2_API_KEY=' /home/trbck/workspace/MAF/.env 2>/dev/null; then
sed -i "s|^TT2_API_KEY=.*|TT2_API_KEY=${TT2_KEY}|" /home/trbck/workspace/MAF/.env
else
echo "TT2_API_KEY=${TT2_KEY}" >> /home/trbck/workspace/MAF/.env
fi
MAF loads .env through python-dotenv at startup, so restart the
dashboard and worker to pick up the key.
Verify:
curl -s 'http://127.0.0.1:8420/api/data/sources/alpaca_live/feed_health/sample' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('ok' if d['ok'] else 'fail')"
The dashboard's freshness badges on the alpaca_live arena's Setup
tab also flip to external · HTTP API once auth works.
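The grep/sed snippet works for typical keys, but a key containing & or | is mangled by the sed replacement, and echo >> glues onto the last line if .env lacks a trailing newline. A Python equivalent of the same upsert, with no escaping pitfalls (read_env_value and upsert_env_line are hypothetical helpers; neither handles quoted values or export prefixes):

```python
from typing import Optional


def read_env_value(text: str, key: str) -> Optional[str]:
    """Value of the first KEY=... line, like grep '^KEY=' | cut -d= -f2-."""
    prefix = f"{key}="
    for line in text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]  # keep any further '=' in the value
    return None


def upsert_env_line(text: str, key: str, value: str) -> str:
    """Replace every KEY=... line with KEY=value, or append one if none exists."""
    prefix = f"{key}="
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = f"{key}={value}"  # plain string: no sed metacharacters
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"  # always end with a newline
```

Read trtools2's dashboard.env, pass its text to read_env_value, refuse to continue if it returns None or an empty string (the shell version would silently write an empty key), then write upsert_env_line(existing, "TT2_API_KEY", key) back to MAF's .env.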
What if trtools2 is on a different host?
Override the base URL:
echo 'TRTOOLS2_API_URL=https://trtools2.internal:8888' >> /home/trbck/workspace/MAF/.env
Or per-binding in YAML:
- name: feed_health
  adapter: trtools2_api
  config:
    query_type: feed_stats
    base_url: https://trtools2.internal:8888
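The runbook doesn't state precedence when both overrides are set. Assuming a per-binding base_url wins over TRTOOLS2_API_URL, which wins over the local default on port 8888 (the 127.0.0.1 host is an assumption), resolution would look like this hypothetical helper:

```python
import os
from typing import Mapping, Optional

# Assumption: local host plus the documented default port.
DEFAULT_TRTOOLS2_URL = "http://127.0.0.1:8888"


def resolve_trtools2_url(binding_config: Mapping[str, object],
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Per-binding base_url > TRTOOLS2_API_URL > default (assumed precedence)."""
    env = os.environ if env is None else env
    url = (binding_config.get("base_url")
           or env.get("TRTOOLS2_API_URL")
           or DEFAULT_TRTOOLS2_URL)
    return str(url).rstrip("/")  # normalise so paths can be appended with '/'
```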
Start the Kronos sidecar
cd /home/trbck/workspace/MAF/services/kronos-svc
# one-time setup (only on a fresh box; ~1.1 GB venv after install)
python3 -m venv .venv
./.venv/bin/pip install -r requirements.txt
# start
PORT=5102 LOG_LEVEL=INFO nohup ./.venv/bin/python server.py \
>/tmp/kronos-svc.log 2>&1 &
disown
sleep 4
# health
curl -s http://127.0.0.1:5102/health
# {"status":"ok","model":"NeoQuasar/Kronos-small","loaded":false}
The model loads lazily on the first /forecast call (~12 s).
Subsequent calls are warm (~5–10 s for Kronos-small on CPU).
Source: server.py +
PredictorPool.
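server.py and PredictorPool aren't reproduced here. "Load on the first /forecast call, warm afterwards" is the classic lazy-initialisation pattern; a hypothetical thread-safe version (LazyModel is illustrative, not the sidecar's class):

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Build an expensive object on first use, exactly once, even with concurrent callers."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        # Analogous to the "loaded" flag /health reports.
        return self._model is not None

    def get(self) -> T:
        if self._model is None:  # fast path once warm: no lock taken
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()  # the slow first-call cost, paid once
        return self._model
```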
Watch / unwatch a symbol
Adding a symbol kicks off the
KronosRefresher
for that symbol (provided the service-mode worker is running). Watching
costs nothing until a fresh forecast is requested.
# watch
curl -X POST -H 'Content-Type: application/json' \
-d '{"target_id":"NVDA","kind":"symbol","ttl_seconds":21600}' \
http://localhost:8420/api/watch
# list
curl -s 'http://localhost:8420/api/watch?kind=symbol' | jq .
# unwatch
curl -X DELETE 'http://localhost:8420/api/watch/NVDA?kind=symbol'
Backing class: WatchList.
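ttl_seconds=21600 is six hours. WatchList's internals aren't shown here; assuming a watch simply expires ttl_seconds after it was last (re)registered, the semantics match this hypothetical in-memory version (TtlWatchList is illustrative, not MAF's class):

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlWatchList:
    """Entries expire ttl_seconds after the most recent watch() call."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[Tuple[str, str], float] = {}  # (kind, target_id) -> deadline

    def watch(self, target_id: str, kind: str, ttl_seconds: float) -> None:
        # Re-watching an entry pushes its deadline out again.
        self._expires_at[(kind, target_id)] = self._clock() + ttl_seconds

    def unwatch(self, target_id: str, kind: str) -> bool:
        return self._expires_at.pop((kind, target_id), None) is not None

    def active(self, kind: str) -> List[str]:
        now = self._clock()
        # Lazily evict anything whose deadline has passed.
        for key in [k for k, deadline in self._expires_at.items() if deadline <= now]:
            del self._expires_at[key]
        return sorted(target for (k, target) in self._expires_at if k == kind)
```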
Verify the Kronos loop end-to-end
A hermetic four-step test that proves real torch inference, the cache, the adapter, and the specialist contract work together, without depending on Ollama Cloud:
# make sure kronos-svc is running (see above)
cd /home/trbck/workspace/MAF
PYTHONPATH=src ./.venv/bin/python scripts/verify_kronos_loop.py
Expected output:
[1] sidecar health ✓ /health → model=NeoQuasar/Kronos-small loaded=True
[2] refresher tick ✓ cache populated: direction=NEUTRAL prob_up=0.0
[3] adapter read ✓ age=0.0s direction=NEUTRAL prob_up=0.0
[4] kronos_specialist ✓ AgentSignal: signal=NEUTRAL confidence=0.27 …
Source: verify_kronos_loop.py.
The script seeds 80 synthetic NVDA bars, runs one refresher tick,
then exercises the specialist with a stub LLM that follows the prompt's
deterministic mapping rule.
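This runbook doesn't say whether the script exits non-zero on a failed step. For CI, a hedged sketch that checks the printed report instead: every [n] line from 1 to 4 must be present and carry ✓ (loop_verified is a hypothetical helper and assumes a failed step prints without ✓):

```python
import re
from typing import Dict

STEP_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_steps(output: str) -> Dict[int, bool]:
    """Map step number -> passed? for every '[n] ...' line in the report."""
    steps: Dict[int, bool] = {}
    for line in output.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            steps[int(m.group(1))] = "✓" in m.group(2)
    return steps


def loop_verified(output: str, expected_steps: int = 4) -> bool:
    """True only if steps 1..expected_steps all appear and all passed."""
    steps = parse_steps(output)
    return sorted(steps) == list(range(1, expected_steps + 1)) and all(steps.values())
```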
Trigger an arena from the shell
Four ways. The control plane (A and B) is the production path; it is the same plumbing the dashboard uses.
A. Direct XADD (raw)
redis-cli XADD maf:control:in '*' data '{
"command": "run_arena",
"correlation_id": "demo-1",
"args": {
"arena": "market_pulse",
"target": {"ticker": "NVDA"},
"action_mode": "manual"
}
}'
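# wait for the ack (run this right away: '$' only returns entries added after XREAD starts; match correlation_id demo-1)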
redis-cli XREAD COUNT 1 BLOCK 60000 STREAMS maf:control:out '$'
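A sketch with redis-py that closes the gap between XADD and XREAD by recording the ack stream's last ID before sending, then filters acks by correlation_id. Assumptions, not confirmed by this runbook: each entry on maf:control:out has a data field holding JSON that echoes the request's correlation_id, and the client was created with decode_responses=True. run_arena_via_streams is a hypothetical helper:

```python
import json
import uuid
from typing import Any, Optional

IN_STREAM = "maf:control:in"
OUT_STREAM = "maf:control:out"


def run_arena_via_streams(r: Any, arena: str, target: dict,
                          action_mode: str = "manual",
                          block_ms: int = 60_000) -> Optional[dict]:
    """Send run_arena on the control stream; return the matching ack, or None on timeout."""
    correlation_id = f"shell-{uuid.uuid4().hex[:8]}"
    # Note where the ack stream ends *before* sending, so a fast ack can't slip past.
    newest = r.xrevrange(OUT_STREAM, count=1)
    last_id = newest[0][0] if newest else "0-0"
    message = {
        "command": "run_arena",
        "correlation_id": correlation_id,
        "args": {"arena": arena, "target": target, "action_mode": action_mode},
    }
    r.xadd(IN_STREAM, {"data": json.dumps(message)})
    while True:
        # Each XREAD blocks up to block_ms; unrelated acks restart the wait.
        response = r.xread({OUT_STREAM: last_id}, count=10, block=block_ms)
        if not response:
            return None  # timed out with no matching ack
        for _stream, entries in response:
            for entry_id, fields in entries:
                last_id = entry_id
                try:
                    ack = json.loads(fields.get("data", "null"))
                except json.JSONDecodeError:
                    continue  # not JSON: not an ack we understand
                if isinstance(ack, dict) and ack.get("correlation_id") == correlation_id:
                    return ack
```

Usage: run_arena_via_streams(redis.Redis(decode_responses=True), "market_pulse", {"ticker": "NVDA"}).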
B. Via the Python client (prettier)
import asyncio
from maf.control.client import ControlClient
async def main():
    client = ControlClient()
    ack = await client.send("run_arena", {
        "arena": "market_pulse",
        "target": {"ticker": "NVDA"},
        "action_mode": "manual",
    })
    print(ack["result"]["synthesis_verdict"],
          ack["result"]["synthesis_confidence"])
asyncio.run(main())
Source:
ControlClient +
ControlInbox.
C. Via the dashboard Run button
Open the dashboard, click Run on any arena card. The adaptive
modal collects target + mode, posts to POST /api/arenas/{name}/run,
and renders the verdict + reasoning inline. This is the path most
operators use.
D. Auto-firing via smart triggers
Use the Setup tab on the arena page to add a trigger rule. The
TriggerDispatcher
auto-fires the arena when a matching event hits the configured stream.
See config/trigger_templates.yaml for prebuilt rules.
Tail live events
python -m maf events # all events
python -m maf events --filter arena.complete
python -m maf events --arena market_pulse
Or in the browser: https://maf.techvizier.com/live.
Backend: EventBusReader → WS pump
at /ws/events.
Ollama rate-limit recovery
Symptom: many HTTP 429 — Too Many Requests lines in the logs, arenas
falling back to verdict=HOLD confidence=0.0.
Steps:
- Confirm it's a rate limit (not a billing issue):
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $OLLAMA_API_KEY" \
    https://ollama.com/v1/models
  200 means the account is live; 401 means the key is invalid (rotate it).
- Wait 3–5 minutes. The per-account RPM limit resets quickly. Re-run the arena.
- If the limit keeps biting, lower the burst by reducing max_react_steps on the noisier specialists. Each ReAct iteration is one LLM call.
- Switch the quick-profile model to a smaller one in PROFILE_MAP. glm-4.7 is small but may still hit the limit; try gemma3:4b for the quick profile. The heavy-reasoning profile can stay on gpt-oss:120b.
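If you script repeated arena runs, spacing retries out avoids re-triggering the limit. A generic sketch of exponential backoff with full jitter that honours a Retry-After value when the server sends one; RateLimited, backoff_delay and call_with_backoff are hypothetical names, and whether MAF retries internally isn't covered here. The 300 s cap lines up with the 3–5 minute wait in the steps above:

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by the wrapped call on HTTP 429; carries Retry-After seconds if known."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after


def backoff_delay(attempt: int, retry_after: Optional[float], base: float = 2.0,
                  cap: float = 300.0, rng: Callable[[], float] = random.random) -> float:
    if retry_after is not None:
        return min(retry_after, cap)  # the server said when to come back
    # base, 2*base, 4*base, ... capped, then "full jitter" so parallel callers spread out.
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep, **delay_kwargs) -> T:
    for attempt in range(max_attempts - 1):
        try:
            return fn()
        except RateLimited as exc:
            sleep(backoff_delay(attempt, exc.retry_after, **delay_kwargs))
    return fn()  # final attempt: let RateLimited propagate to the caller
```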
Production gotchas
Two clones, one inode
/home/trbck/wp/MAF and /home/trbck/workspace/MAF are the same
directory (bind-mounted). Editing one edits the other. No syncing
needed.
Refresher loops only run in service mode
The dashboard process (python -m maf --dashboard) does not start
the Kronos/MiroFish refreshers or the trigger dispatcher. Those live
in the full service mode (python -m maf with no flags). The
kronos_refresher / mirofish_refresher status pills in the
dashboard surface this: green means the heartbeat key is fresh; red
with "not running" means start the worker.
For one-off testing without a long-running worker:
python -c "
import asyncio
from maf.scheduler.kronos_refresher import KronosRefresher
import httpx
async def main():
r = KronosRefresher(sidecar_url='http://localhost:5102')
async with httpx.AsyncClient(timeout=120) as http:
await r._tick(http)
await r.aclose()
asyncio.run(main())
"
Config saved but didn't take effect
The Setup tab writes YAML atomically, but in-memory configs are loaded once at process start. Either:
- Restart the worker (sudo systemctl restart maf-worker) so arena and trigger changes take effect.
- Send a reload_config control command, which rebuilds the in-memory arena graph without a restart:
  redis-cli XADD maf:control:in '*' data '{"command":"reload_config","args":{}}'
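The Setup tab's writer isn't shown here. The usual way to write a file atomically is a temp file in the same directory followed by os.replace, so readers see either the old or the new YAML, never a half-written one. A minimal sketch (atomic_write_text is a hypothetical helper, not MAF's code):

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace `path` with `text` so readers never observe a partial file."""
    path = Path(path)
    # Same directory means same filesystem, which os.replace needs to be atomic.
    # Note: mkstemp creates the file with mode 0600; copy the old mode if it matters.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # bytes on disk before the rename
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Anything failed before the rename: remove the temp file, re-raise.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```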
Save returned 412 in the Setup tab
Another browser tab (or another operator) saved this arena's config while you were editing, so the on-disk YAML now holds their version. Your edits are still in the UI, just not persisted. Reload the Setup tab to pull the latest ETag, re-apply your edits, and save again.
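The 412 is standard HTTP optimistic concurrency: the save carries the ETag the tab loaded (If-Match), and the server refuses it if the file has changed since. A hypothetical file-backed sketch of that check; etag_for, read_with_etag and save_if_match are illustrative, and MAF's actual ETag scheme isn't documented here:

```python
import hashlib
import threading
from pathlib import Path
from typing import Tuple


class PreconditionFailed(Exception):
    """The caller's ETag is stale; an HTTP layer would answer 412."""


_save_lock = threading.Lock()  # guards check-and-write within one process only


def etag_for(text: str) -> str:
    # Content-derived ETag: any change to the YAML yields a new tag.
    return '"' + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16] + '"'


def read_with_etag(path: Path) -> Tuple[str, str]:
    text = Path(path).read_text(encoding="utf-8")
    return text, etag_for(text)


def save_if_match(path: Path, new_text: str, if_match: str) -> str:
    """Write new_text only if the file still matches if_match; return the new ETag."""
    path = Path(path)
    # Check and write under one lock, or two concurrent saves could both pass the check.
    with _save_lock:
        current = path.read_text(encoding="utf-8")
        if etag_for(current) != if_match:
            raise PreconditionFailed(f"stale ETag {if_match}; file is now {etag_for(current)}")
        path.write_text(new_text, encoding="utf-8")
    return etag_for(new_text)
```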
Disk pressure
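The kronos-svc venv is ~1.1 GB; the Docker image (when built) is ~3 GB. With 13 GB free, prefer the native venv path. To free space, prune Docker. This is destructive: -a removes every image not used by a container, and --volumes also removes unused volumes.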
docker system prune -af --volumes
Browser cache
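After any nav / CSS / JS change, hard-refresh the browser (Ctrl-Shift-R on Linux, Cmd-Shift-R on macOS). A hard refresh only bypasses the browser's own cache: Cloudflare, in front of the public domain, caches JS for a few minutes, so the public URL can lag behind the local port.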