MAF concepts — what each arena does, and where the edge comes from
This page exists because the names "Mastermind", "Oracle", "MiroFish" don't self-explain. Below: what each thing IS, what it produces, and why it should move the prediction needle relative to plain market data + a single LLM call.
The single hypothesis MAF bets on
A single LLM, given even the best data, gets one shot to synthesize. That's brittle: it can latch on to a salient detail, miss a contradicting one, or hallucinate when the data is sparse.
MAF replaces one big call with a structured tournament:
- Specialists look at the same target through their own lens (price, risk, momentum, macro, on-chain, sentiment, fundamentals). Each commits to a structured AgentSignal — direction, confidence, key factors.
- A judge reads every signal plus the supporting evidence (citations from a knowledge graph, prior decisions, current strategy state) and writes one argument tree that gets reconciled into a verdict.
- A trail records every step so you can audit why the system said BUY on Tuesday — not just that it did.
This shape is the same across arenas. The arenas differ in what they ingest and what signal they specialise for.
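As a rough illustration of the contract each specialist commits to, the signal could be modelled as below. This is a sketch only: the field names, the Direction enum and the range check are assumptions made for illustration, and the real AgentSignal schema may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(str, Enum):
    """Hypothetical direction vocabulary for a specialist's view."""
    BULLISH = "BULLISH"
    NEUTRAL = "NEUTRAL"
    BEARISH = "BEARISH"


@dataclass(frozen=True)
class AgentSignal:
    """Hypothetical shape: one specialist's committed view of a target."""
    agent: str
    direction: Direction
    confidence: float                               # expected in [0.0, 1.0]
    key_factors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject malformed signals early so the judge never sees them.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


signal = AgentSignal("risk_analyst", Direction.BEARISH, 0.7, ["drawdown widening"])
print(signal.agent, signal.direction.value, signal.confidence)
```

Freezing the dataclass mirrors the "commits to" wording: once a specialist has emitted its signal, later phases can read it but not rewrite it.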
Arena: trading_intelligence (a.k.a. MarketMind)
Goal: per-ticker BUY / HOLD / SELL with confidence + reasoning.
Loop:
- Five-to-six specialists fire in parallel, each with isolated state (no telephone effect):
  - price_analyst: recent bars, technicals, quotes
  - sentiment_analyst: news, social, fomo2 digests
  - onchain_analyst: crypto microstructure
  - macro_analyst: FRED rates, DXY, fear/greed
  - risk_analyst: vol, drawdown, insider activity
  - committee_analyst: translates the upstream TradingAgents committee (langgraph subprocess) into one more vote
- synthesis_agent reads all signals + confidences, computes a weighted ensemble score, writes the final verdict + reasoning.
- The post-run envelope (maf:arena:trading_intelligence:output) is rich: target, synthesis, every agent signal, source-call metrics, phase timings.
- A TradingAction lands on maf:actions:out with mode=auto|semi|manual.
Where the edge comes from:
- Disagreement is information. Specialists with high confidence but opposing direction (e.g. risk says BEARISH, price says BULLISH) trigger a lower synthesis confidence — the system stops itself from overcommitting when the picture is mixed.
- Confidence-weighted ensemble outperforms majority-vote at the regime edges. A 0.9-confidence bull beats two 0.4-confidence bears, but a 0.9-confidence bear flips a 0.55-confidence bull majority.
- Source-metric instrumentation surfaces silent data failures (a feed went stale, an indicator returned empty). Most "the LLM was confused" reports are really "the LLM didn't have the data."
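The difference between majority vote and a confidence-weighted ensemble can be sketched as follows. The scoring formula here (sum of confidences per side, with the normalised margin as synthesis confidence) is an assumption chosen for illustration, not the formula synthesis_agent necessarily uses.

```python
def weighted_verdict(signals: list[tuple[str, float]]) -> tuple[str, float]:
    """signals: (direction, confidence) pairs with direction BULLISH or BEARISH.

    Returns (direction, synthesis_confidence). Illustrative scoring only.
    """
    bull = sum(c for d, c in signals if d == "BULLISH")
    bear = sum(c for d, c in signals if d == "BEARISH")
    total = bull + bear
    if total == 0:
        return "NEUTRAL", 0.0
    direction = "BULLISH" if bull > bear else "BEARISH" if bear > bull else "NEUTRAL"
    # The margin shrinks toward 0 as the opposing side gains weight,
    # so confident disagreement lowers the synthesis confidence.
    return direction, abs(bull - bear) / total


def majority_verdict(signals: list[tuple[str, float]]) -> str:
    """Plain head count that ignores confidence."""
    bulls = sum(1 for d, _ in signals if d == "BULLISH")
    bears = sum(1 for d, _ in signals if d == "BEARISH")
    return "BULLISH" if bulls > bears else "BEARISH" if bears > bulls else "NEUTRAL"


signals = [("BULLISH", 0.9), ("BEARISH", 0.4), ("BEARISH", 0.4)]
print(majority_verdict(signals))   # BEARISH: two votes to one
print(weighted_verdict(signals))   # BULLISH, but with a small margin
```

In this example the single 0.9-confidence bull outweighs the two 0.4-confidence bears, yet the resulting confidence stays low because the picture is mixed.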
Arena: mastermind
Goal: answer any question — not just BUY/SELL — with a grounded, auditable decision.
Loop:
- Frame: sharpen the question.
- Gather: pull from the knowledge graph (entities & relations built up by all prior arena runs), from DecisionMemory (past decisions tagged by domain and horizon), and any directly-attached markdown documents.
- Specialists: four runtime-checkable Specialist protocols: FundamentalAnalyst, RiskSkeptic, MomentumWatcher, CrowdProxy. Each returns one ArenaVote: weighted, with rationale ≤500 chars and a flags list.
- Judge: reads all four votes + the graph citations + memory citations and writes a typed Decision: recommendation, confidence, argument tree, cited decision ids, flags. One LLM call, retry-once on JSON parse failure.
- Reflect (out-of-band): when the horizon elapses, an outcome_harvester looks up what actually happened, and a ReflectionAgent writes the lesson "in hindsight" into the same Decision envelope. That lesson becomes searchable context for the next decision.
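The "runtime-checkable Specialist protocol" idea from the Specialists step can be sketched with the standard typing module. The vote() method name and the ArenaVote field names are hypothetical; only the 500-character rationale cap and the flags list come from the description above.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class ArenaVote:
    """Hypothetical fields for one specialist's weighted vote."""
    specialist: str
    weight: float
    rationale: str
    flags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the rationale cap mentioned in the loop description.
        if len(self.rationale) > 500:
            raise ValueError("rationale exceeds 500 characters")


@runtime_checkable
class Specialist(Protocol):
    """Structural interface: anything with a matching vote() method qualifies."""

    def vote(self, question: str) -> ArenaVote: ...


class RiskSkeptic:
    """Does not inherit from Specialist; it only matches its shape."""

    def vote(self, question: str) -> ArenaVote:
        return ArenaVote("RiskSkeptic", 0.6, "Downside is not priced in.", ["tail_risk"])


print(isinstance(RiskSkeptic(), Specialist))   # True: structural check at runtime
print(isinstance(object(), Specialist))        # False: no vote() method
```

Note that a runtime-checkable protocol only verifies that the method exists, not its signature or return type, so the ArenaVote validation still matters.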
Where the edge comes from:
- Memory of past mistakes. The reflection loop creates a corpus of "I-thought-X-but-Y" entries that BM25/Chroma surfaces on the next similar question. This is the cheap version of fine-tuning — the LLM still reasons, but it reasons over its own track record.
- Argument trees are reviewable. You can scroll into a node and see which evidence it depended on. When the model is wrong, you can see where it was wrong rather than starting from "the output was wrong, why?".
- Domain-flagged specialists. Each ArenaVote carries a domain (price, sentiment, onchain, macro, risk). The judge weights by domain relevance to the question, so a macro question doesn't get drowned by a high-confidence price specialist.
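One way to picture domain weighting: scale each vote's confidence by how relevant its domain is to the question, then normalise. The relevance scores and the multiplication rule are assumptions for illustration; the text does not specify how the judge derives or applies relevance.

```python
def effective_weights(
    votes: list[tuple[str, float]],
    relevance: dict[str, float],
) -> dict[str, float]:
    """votes: (domain, confidence) pairs, one per domain in this sketch.

    Returns each domain's normalised share of the judge's attention.
    """
    raw = {domain: conf * relevance.get(domain, 0.0) for domain, conf in votes}
    total = sum(raw.values())
    return {d: (w / total if total else 0.0) for d, w in raw.items()}


votes = [("price", 0.95), ("macro", 0.60)]
relevance = {"price": 0.2, "macro": 1.0}   # hypothetical scores for a macro question
weights = effective_weights(votes, relevance)
print(weights)
```

Here the price specialist is more confident, but the macro vote still ends up with the larger share because the question is about macro.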
Arena: crowd_simulation (Oracle, MiroFish-driven)
Goal: answer "how will the market react to this news / report?" rather than "what should I do?"
Loop:
- Ingest a document (a fomo2 report, an SEC filing, a Polymarket bet resolution, a tweet thread).
- Hand it to MiroFish (vendored Flask backend, sitting on Neo4j + Ollama Cloud). MiroFish builds an ontology from the document, then spawns a synthetic crowd: dozens of personas (each with prior beliefs, holdings, risk appetite) that "react" round by round to the document and to each other's reactions in the OASIS simulation.
- After N rounds, MiroFish returns the converged crowd state — aggregate sentiment, position changes, top narratives, dissent percentage.
- The synthesis_agent reads the crowd state and produces a CrowdPrediction: outcome label, probability, dissent %, top drivers, per-persona votes (a meta block enables drill-down).
- Published to maf:arena:crowd_simulation:output as a versioned envelope.
Where the edge comes from:
- Behavioural priors beat fundamentals at short horizons. News doesn't move price; the crowd's belief about the news moves price. MiroFish simulates the belief-update step before it shows up in tape.
- Dissent % is the killer feature. High probability + high dissent = unstable consensus = good contrarian entry. Probability alone misses this.
- Persona priors are configurable. Run the same news through a "high short-interest retail crowd" and a "long-only institutional crowd" and compare the deltas — that's the asymmetry hedge funds pay for.
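To see why probability alone misses what dissent captures, consider two synthetic crowds with the same average belief. How MiroFish actually computes dissent is not stated here; this sketch assumes each persona reports its own probability for a binary outcome and counts as a dissenter when it falls on the other side of 0.5 from the crowd average.

```python
from statistics import mean


def crowd_summary(persona_probs: list[float]) -> tuple[float, float]:
    """Return (probability, dissent_pct) for a binary outcome (assumed definition)."""
    probability = mean(persona_probs)
    majority_says_yes = probability >= 0.5
    # A dissenter is a persona whose own belief sits on the opposite side of 0.5.
    dissenters = sum(1 for p in persona_probs if (p >= 0.5) != majority_says_yes)
    return probability, 100.0 * dissenters / len(persona_probs)


split_crowd = [0.95, 0.95, 0.95, 0.95, 0.9, 0.9, 0.4, 0.4, 0.45, 0.45]
calm_crowd = [0.73] * 10

print(crowd_summary(split_crowd))   # same average belief, 40% dissent
print(crowd_summary(calm_crowd))    # same average belief, no dissent
```

Both crowds report roughly the same probability, but only the first has a large bloc that disagrees, which is the unstable-consensus case described above.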
Arena: report_to_action (fast-path)
Goal: turn a fresh fomo2 report into an executable trading action in seconds, not minutes.
Why it exists: Mastermind is thorough but slow (graph queries, memory lookups, reflection). Trading intelligence is broad but expensive (5+ parallel specialists, large source fan-out). For "a new analyze_*.md just landed and I want a fast take", we want a leaner pipeline.
Loop:
- Parallel: three specialists (signal_analyst, sentiment_analyst, risk_analyst). Each binds the new report + just enough live streams (trtools2 indicators, news, strategy events, fomo2 enriched items).
- Synthesis: one synthesis pass, then publish a TradingAction with user-selected mode.
Edge: speed × decisiveness. A medium-confidence call published in 15s beats a high-confidence call published in 5 min when the move was in the first minute.
Component: MiroFish
One line: a multi-agent crowd-simulator with persistent knowledge graph, hosted as a separate Flask service we don't fork.
What it does:
- Builds an entity-relation graph from each ingested document (/api/graph/*).
- Runs the OASIS simulation: dozens of persona-agents react in rounds (/api/simulation/*).
- Generates a structured post-simulation report (/api/report/*).
- Stores everything in Neo4j (knowledge graph) and uses Ollama Cloud for the LLM persona inference (no local GPU required).
Why MAF leans on it: the persona-agent loop is exactly the "what-will-the-market-believe" calculation we don't want to fake with a single LLM "imagine the crowd thinks…" prompt. Real multi-agent deliberation produces qualitatively different signals — especially the dissent metric.
Component: smart Ollama model picker
Why: Ollama Cloud hosts models ranging from 20B to 1T params with very different latency / reasoning / JSON-discipline tradeoffs.
How it works: every LLM call carries a task profile (or falls back to
quick/deep tier). The picker (src/maf/llm/model_picker.py) maps profiles
to models:
| Profile | Model |
|---|---|
| quick / classification / signal | gpt-oss:20b-cloud |
| narrative / debate | gpt-oss:20b-cloud |
| synthesis / judge / json_strict | gpt-oss:120b-cloud |
| long_context / coding | qwen3-coder:480b-cloud |
| research | deepseek-v3.1:671b-cloud |
| trillion | kimi-k2:1t-cloud |
Set a specific model in an arena config to bypass the picker (llm.providers.ollama.model: gpt-oss:120b-cloud). Override per-profile via
env var: MAF_OLLAMA_MODEL__SYNTHESIS=....
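The lookup described above can be sketched as a small function. The profile-to-model table and the env var naming pattern come from this page; the precedence order (pinned model, then env override, then table) and the fallback to the quick model for unknown profiles are assumptions, and this is not the actual code in src/maf/llm/model_picker.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODELS = {
    "quick": "gpt-oss:20b-cloud",
    "classification": "gpt-oss:20b-cloud",
    "signal": "gpt-oss:20b-cloud",
    "narrative": "gpt-oss:20b-cloud",
    "debate": "gpt-oss:20b-cloud",
    "synthesis": "gpt-oss:120b-cloud",
    "judge": "gpt-oss:120b-cloud",
    "json_strict": "gpt-oss:120b-cloud",
    "long_context": "qwen3-coder:480b-cloud",
    "coding": "qwen3-coder:480b-cloud",
    "research": "deepseek-v3.1:671b-cloud",
    "trillion": "kimi-k2:1t-cloud",
}


def pick_model(
    profile: str,
    pinned: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a model name for a task profile (assumed precedence order)."""
    env = os.environ if env is None else env
    if pinned:                                   # arena config pins a model
        return pinned
    override = env.get(f"MAF_OLLAMA_MODEL__{profile.upper()}")
    if override:                                 # per-profile env override
        return override
    return DEFAULT_MODELS.get(profile, DEFAULT_MODELS["quick"])


print(pick_model("judge", env={}))
print(pick_model("synthesis", env={"MAF_OLLAMA_MODEL__SYNTHESIS": "kimi-k2:1t-cloud"}))
```

Passing env explicitly keeps the function easy to test without touching the real process environment.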
Component: realtime layer
Three Redis Streams form the operational nerve of MAF:
| Stream | Role |
|---|---|
| maf:events | Lifecycle events (arena.start, phase.complete, agent.signal, decision.emit, action.emit, source.error). Dashboard WebSocket pumps this to /ws/events. |
| maf:control:in | Inbound commands (run_arena, configure_arena, set_data_source, reload_config, health). MAF acks on maf:control:out. |
| maf:actions:out | Outbound TradingActions (verdict + mode). Downstream engines (trtools2) decide auto/semi/manual based on the mode field. |
Plus each arena owns its envelope stream:
maf:arena:<name>:output.
And MAF consumes trtools2 / fomo2 streams as data sources:
trtools2:bars:1m, trtools2:bars:1h, trtools2:news,
trtools2:indicators, trtools2:strategy:events, fomo2:enriched,
fomo2:reports, fomo2:requests:out.
Component: smart data plumbing (Phase 2)
Four-layer data architecture, designed so cold targets cost zero and hot ones are sub-minute reactive:
| Layer | What it does | Module |
|---|---|---|
| 1. Watch list | One Redis sorted-set keyed by opaque target id (symbol, question, document). TTL-decayed. The single source of truth for "what's interesting right now". | maf.watch.list |
| 2. Refreshers (scheduled / event-driven) | Proactively fill expensive caches keyed off the watch list. Kronos: per-watched-symbol forecast every 60 s / 5 min. MiroFish: on a fresh high-impact fomo2:reports event for a watched symbol, runs a crowd-sim (10–30 min) once per report_id, budget-gated (10/day default). | maf.scheduler.kronos_refresher, maf.scheduler.mirofish_refresher |
| 3. Trigger dispatcher | Declarative triggers: blocks per arena YAML. Tails the named streams, evaluates when: predicates via a tight safe-eval mini-language (payload.x, abs(), comparisons, and/or/not), applies per-(arena, target) cooldown + cost-gate demote, XADDs run_arena to maf:control:in. | maf.triggers.dispatcher |
| 4. Arena consumption | Specialists read cached forecasts/sims via standard source adapters: no torch, no Neo4j, no HTTP to sidecars inside the MAF process. ReplanAgent detects stale_kronos_forecast / no_crowd_sim markers and forces re-runs. | maf.sources.adapters.{kronos_forecast,mirofish_sim} |
Wire diagram (all streams are Redis Streams, all caches are Redis keys):
watch:zset ──────┐
│
┌────────────────┼────── kronos-refresher ──HTTP──► kronos-svc (sidecar)
│ │ │
│ │ ▼
│ │ kronos:forecast:{sym}:{tf} + kronos:forecasts:emitted
│ │ │
│ │ ├──► dispatcher (trigger rules)
│ │ │ │
│ │ │ ▼
│ fomo2:reports ─────► mirofish-refresher ──HTTP──► mirofish-svc maf:control:in
│ │ │
│ ▼ ▼
│ mirofish:sim:{report_id} + :sims:emitted ControlInbox → arena run
│ │ │
│ └───────────────► dispatcher ▼
│ arena ─► specialists read
│ kronos_forecast +
│ mirofish_sim cache
│ │
│ ▼
└─────────── action_outbox ◄────────────────────────────────────── TradingAction
│
▼
maf:actions:out (consumed by trtools2-side ActionConsumer)
Cost discipline:
- Cold ticker: zero ongoing cost. Refreshers do nothing.
- Watched ticker: ~60 Kronos calls/hour, free under most plans.
- Hot event (new report): one MiroFish sim per (report_id, symbol) per day.
- Cost-cap gate demotes auto → semi when the per-hour LLM cost exceeds max_cost_per_hour_eur.
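The two gates the dispatcher applies (per-(arena, target) cooldown and cost-cap demotion) can be sketched as a small in-memory class. The class name, method names and in-process state are hypothetical; only the cooldown key, the auto → semi demotion and max_cost_per_hour_eur come from this page.

```python
class DispatchGates:
    """Illustrative gates: per-(arena, target) cooldown plus cost-cap demotion."""

    def __init__(self, max_cost_per_hour_eur: float) -> None:
        self.max_cost_per_hour_eur = max_cost_per_hour_eur
        self._last_fired: dict[tuple[str, str], float] = {}

    def cooled_down(self, arena: str, target_id: str, cooldown_s: float, now: float) -> bool:
        """Return True (and record the firing) if this pair may fire at time now."""
        key = (arena, target_id)
        last = self._last_fired.get(key)
        if last is not None and now - last < cooldown_s:
            return False          # still cooling down; do not reset the timer
        self._last_fired[key] = now
        return True

    def effective_mode(self, requested: str, cost_last_hour_eur: float) -> str:
        """Demote auto to semi when the hourly LLM spend is over the cap."""
        if requested == "auto" and cost_last_hour_eur > self.max_cost_per_hour_eur:
            return "semi"
        return requested


gates = DispatchGates(max_cost_per_hour_eur=2.0)
print(gates.cooled_down("trading_intelligence", "NVDA", 60, now=0.0))    # True
print(gates.cooled_down("trading_intelligence", "NVDA", 60, now=30.0))   # False
print(gates.effective_mode("auto", cost_last_hour_eur=2.5))              # semi
```

Demoting rather than dropping the run keeps the signal visible to a human while removing the automatic execution path.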
Triggers in YAML look like:
triggers:
  - name: "kronos prob_up shift"
    on_stream: kronos:forecasts:emitted
    when: "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
    target: { ticker: "{payload.symbol}" }
    cooldown_s: 60
    action_mode: semi
Add a symbol to the watch list (POST /api/watch {target_id: "NVDA", kind: "symbol"}) and everything else is automatic: refreshers start producing, dispatcher reacts, arenas run.
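A when: predicate like the one in the trigger example can be evaluated without eval() by walking the parsed syntax tree and allowing only the constructs the mini-language names (payload.x, abs(), comparisons, and/or/not). The following is a minimal sketch using the standard ast module; it is not the code in maf.triggers.dispatcher, and the evaluate function name is hypothetical.

```python
import ast
import operator

_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def evaluate(expr: str, payload: dict) -> bool:
    """Evaluate a trigger predicate against a payload using a node whitelist."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BoolOp):
            # Generators keep the usual short-circuit behaviour of and/or.
            if isinstance(node.op, ast.And):
                return all(ev(v) for v in node.values)
            return any(ev(v) for v in node.values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _COMPARE.get(type(node.ops[0]))
            if op is None:
                raise ValueError("unsupported comparison")
            return op(ev(node.left), ev(node.comparators[0]))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "abs" and len(node.args) == 1
                and not node.keywords):
            return abs(ev(node.args[0]))
        if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
                and node.value.id == "payload"):
            return payload.get(node.attr)      # payload.x reads a dict key
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
            return node.value
        raise ValueError(f"disallowed expression: {type(node).__name__}")

    return bool(ev(ast.parse(expr, mode="eval")))


rule = "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
print(evaluate(rule, {"prob_up_delta": -0.2, "direction_flipped": False}))
print(evaluate(rule, {"prob_up_delta": 0.05, "direction_flipped": False}))
```

Anything outside the whitelist, such as arbitrary function calls or attribute access on other names, raises ValueError instead of being executed.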
MAF beyond trading
The platform is not trading-specific. The same EventBus, control plane, ReplanAgent, DecisionMemory, watch list, trigger dispatcher, and arena loader run any deliberation where you can:
- Frame the question (a target dict — opaque keys).
- Decompose it (parallel specialist personas).
- Reconcile (a synthesis pass).
Two configuration changes flip an arena from trading to discussion:
target_key on ArenaConfig
The single field on each arena's YAML that decides which outbox MAF publishes to:
- target_key: "ticker" → maf:actions:out (TradingAction)
- target_key: "question_id" → maf:decisions:out (GenericDecision)
- target_key: "pr_id" → maf:decisions:out
- … any string you want
Order routers (e.g. trtools2) keep listening to maf:actions:out and
never see research-debate verdicts; dashboards consuming
maf:decisions:out get only deliberation outcomes.
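Read literally, the routing rule is a one-line decision. The sketch below assumes that "ticker" is the only key routed to maf:actions:out and that every other key goes to maf:decisions:out; the function name is hypothetical.

```python
def outbox_for(target_key: str) -> str:
    """Pick the outbox stream for an arena based on its target_key (assumed rule)."""
    return "maf:actions:out" if target_key == "ticker" else "maf:decisions:out"


for key in ("ticker", "question_id", "pr_id"):
    print(key, "->", outbox_for(key))
```

Keeping the two outboxes separate is what lets order routers ignore deliberation verdicts entirely.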
GenericDecision envelope
Where TradingAction has BUY/HOLD/SELL + mode + size, GenericDecision is intentionally free-form:
from pydantic import BaseModel  # assumption: BaseModel is pydantic's


class GenericDecision(BaseModel):
    arena: str           # "research_debate"
    target: dict         # opaque, arena-specific
    target_key: str      # "question_id"
    verdict: str         # "approve" | "approve_with_conditions" | …
    confidence: float
    reasoning: str
    contributors: list   # per-specialist signals for audit
Arenas pick their own verdict vocabulary. The research_debate arena
uses approve / approve_with_conditions / needs_revision / reject /
needs_more_data; a code-review arena could use lgtm / nits / blocking.
The research_debate arena (concrete example)
name: research_debate
target_key: "question_id"
phases:
  - name: analysis
    pattern: parallel
    agents:
      - name: engineering   # tech feasibility + maintenance burden
        role: specialist
      - name: legal         # regulatory exposure, contractual surface
        role: specialist
      - name: business      # customer value, strategic fit
        role: specialist
  - name: synthesis
    pattern: sequential
    agents:
      - name: chair
        role: synthesis
  - name: replan_check      # same ReplanAgent the trading arenas use
  - name: emit
Sources are knowledge_base (fomo2 chromadb for prior decisions) and
crowd_sim (MiroFish synthetic-crowd reaction — same adapter the
trading arenas use, just pointed at a non-trading document).
MiroFish as a general discussion engine
MiroFish isn't "the trading-sentiment box". It's a multi-persona LLM
simulator over a Neo4j knowledge graph. The trading arenas use it to
estimate crowd reaction to news; the research_debate arena uses it to
estimate stakeholder reaction to a proposal. Same code path, different
prompt fed into MirofishCrowdSource.
How to dispatch a non-trading run
redis-cli XADD maf:control:in '*' data '{
  "command": "run_arena",
  "args": {
    "arena": "research_debate",
    "target": {
      "question_id": "rfc-2026-042",
      "title": "Migrate session storage to ScyllaDB?",
      "text": "<the proposal markdown>"
    },
    "emit_action": true
  }
}'
The ControlInbox routes it identically to a trading_intelligence run —
the only difference is which outbox stream receives the result.
Component: control plane (inbound)
Anything you can do from the dashboard (POST /api/arenas/.../run) you can
also do from a Redis-Streams client. That makes MAF programmable from
bash/python without reaching for HTTP, and lets multiple workers share the
load via a consumer group.
# trigger a run from the shell
redis-cli XADD maf:control:in '*' data '{
"command": "run_arena",
"correlation_id": "demo-1",
"args": {"arena": "report_to_action",
"target": {"ticker": "NVDA"},
"action_mode": "semi"}
}'
# read the ack
redis-cli XREAD COUNT 1 STREAMS maf:control:out 0
Or use the Python helper:
import asyncio
from maf.control.client import ControlClient

async def main() -> None:
    ack = await ControlClient().run_arena("report_to_action", target={"ticker": "NVDA"})
    print(ack["result"]["synthesis_verdict"], ack["result"]["synthesis_confidence"])

asyncio.run(main())
How to get started
# 1. Check everything is wired
python -m maf doctor
# 2. Start the dashboard (recommended for first run)
python -m maf --dashboard --port 8420
# 3. Or run a single arena from the CLI
python -m maf --arena trading_intelligence --ticker NVDA
# 4. Or run the long-running service (control plane + scheduled timers)
python -m maf
The dashboard at http://localhost:8420/ now has:
- Live (/live): real-time WebSocket feed of every event MAF emits.
- Data (/data): stream health + every source binding with sample.
- Status bar (top of every page): Redis, Ollama, trtools2, fomo2, mirofish connectivity at a glance.