Underlying ideas: phases, ReAct loop, replan controller, decision memory.

MAF concepts — what each arena does, and where the edge comes from

This page exists because the names "Mastermind", "Oracle", "MiroFish" don't self-explain. Below: what each thing IS, what it produces, and why it should move the prediction needle relative to plain market data + a single LLM call.


The single hypothesis MAF bets on

A single LLM, given even the best data, gets one shot to synthesize. That's brittle: it can latch on to a salient detail, miss a contradicting one, or hallucinate when the data is sparse.

MAF replaces one big call with a structured tournament:

  1. Specialists look at the same target through their own lens (price, risk, momentum, macro, on-chain, sentiment, fundamentals). Each commits to a structured AgentSignal — direction, confidence, key factors.
  2. A judge reads every signal plus the supporting evidence (citations from a knowledge graph, prior decisions, current strategy state) and writes one argument tree that gets reconciled into a verdict.
  3. A trail records every step so you can audit why the system said BUY on Tuesday — not just that it did.

This shape is the same across arenas. The arenas differ in what they ingest and what signal they specialise for.


Arena: trading_intelligence (a.k.a. MarketMind)

Goal: per-ticker BUY / HOLD / SELL with confidence + reasoning.

Loop:

  1. Five-to-six specialists fire in parallel, each with isolated state (no telephone effect):
     • price_analyst: recent bars, technicals, quotes
     • sentiment_analyst: news, social, fomo2 digests
     • onchain_analyst: crypto microstructure
     • macro_analyst: FRED rates, DXY, fear/greed
     • risk_analyst: vol, drawdown, insider activity
     • committee_analyst: translates the upstream TradingAgents committee (langgraph subprocess) into one more vote
  2. synthesis_agent reads all signals + confidences, computes a weighted ensemble score, writes the final verdict + reasoning.
  3. The post-run envelope (maf:arena:trading_intelligence:output) is rich: target, synthesis, every agent signal, source-call metrics, phase timings.
  4. A TradingAction lands on maf:actions:out with mode=auto|semi|manual.

Where the edge comes from:

  • Disagreement is information. Specialists with high confidence but opposing direction (e.g. risk says BEARISH, price says BULLISH) trigger a lower synthesis confidence — the system stops itself from overcommitting when the picture is mixed.
  • Confidence-weighted ensemble outperforms majority-vote at the regime edges. A 0.9-confidence bull beats two 0.4-confidence bears, but a 0.9-confidence bear flips a 0.55-confidence bull majority.
  • Source-metric instrumentation surfaces silent data failures (a feed went stale, an indicator returned empty). Most "the LLM was confused" reports are really "the LLM didn't have the data."
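
The difference between majority vote and a confidence-weighted ensemble can be sketched in a few lines. This is a minimal illustration, not the synthesis_agent implementation: the AgentSignal fields are reduced to what the example needs, and the plain confidence-sum weighting and the use of |score| as the damped confidence are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSignal:
    agent: str
    direction: str     # "BULLISH", "BEARISH" or "NEUTRAL"
    confidence: float  # 0.0 .. 1.0


_SIGN = {"BULLISH": 1.0, "BEARISH": -1.0, "NEUTRAL": 0.0}


def majority_vote(signals):
    """Direction with the most votes, ignoring confidence."""
    bulls = sum(1 for s in signals if s.direction == "BULLISH")
    bears = sum(1 for s in signals if s.direction == "BEARISH")
    if bulls == bears:
        return "NEUTRAL"
    return "BULLISH" if bulls > bears else "BEARISH"


def weighted_ensemble(signals):
    """Return (direction, confidence) from a confidence-weighted score in [-1, 1]."""
    total = sum(s.confidence for s in signals)
    if total == 0:
        return "NEUTRAL", 0.0
    score = sum(_SIGN[s.direction] * s.confidence for s in signals) / total
    direction = "BULLISH" if score > 0 else "BEARISH" if score < 0 else "NEUTRAL"
    # |score| shrinks toward 0 when confident specialists pull in opposite
    # directions, so disagreement lowers the reported confidence.
    return direction, abs(score)


signals = [
    AgentSignal("price_analyst", "BULLISH", 0.9),
    AgentSignal("risk_analyst", "BEARISH", 0.4),
    AgentSignal("macro_analyst", "BEARISH", 0.4),
]
direction, confidence = weighted_ensemble(signals)
print(majority_vote(signals))          # BEARISH
print(direction, round(confidence, 3))  # BULLISH 0.059
```

The single 0.9-confidence bull outweighs the two 0.4-confidence bears, but the resulting confidence is close to zero, which is the "mixed picture" case described above.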

Arena: mastermind

Goal: answer any question — not just BUY/SELL — with a grounded, auditable decision.

Loop:

  1. Frame — sharpen the question.
  2. Gather — pull from the knowledge graph (entities & relations built up by all prior arena runs), from DecisionMemory (past decisions tagged by domain and horizon), and any directly-attached markdown documents.
  3. Specialists — four runtime-checkable Specialist protocols: FundamentalAnalyst, RiskSkeptic, MomentumWatcher, CrowdProxy. Each returns one ArenaVote: weighted, with rationale ≤500 chars and a flags list.
  4. Judge — reads all four votes + the graph citations + memory citations and writes a typed Decision: recommendation, confidence, argument tree, cited decision ids, flags. One LLM call, retry-once on JSON parse failure.
  5. Reflect (out-of-band) — when the horizon elapses, an outcome_harvester looks up what actually happened, and a ReflectionAgent writes the lesson "in hindsight" into the same Decision envelope. That lesson becomes searchable context for the next decision.
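
As a rough illustration of step 3, a runtime-checkable specialist protocol might look like the sketch below. Only the names ArenaVote and RiskSkeptic, the 500-character rationale limit, and the flags list come from the text above; the vote() method, its signature, and the remaining fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable

MAX_RATIONALE_CHARS = 500  # limit stated in the loop description


@dataclass
class ArenaVote:
    specialist: str
    domain: str          # e.g. "risk", "macro"
    recommendation: str  # hypothetical field name
    weight: float
    rationale: str
    flags: list[str] = field(default_factory=list)

    def __post_init__(self):
        if len(self.rationale) > MAX_RATIONALE_CHARS:
            raise ValueError("rationale exceeds 500 characters")


@runtime_checkable
class Specialist(Protocol):
    # Hypothetical method name and signature.
    def vote(self, question: str, evidence: list[str]) -> ArenaVote: ...


class RiskSkeptic:
    """Satisfies the protocol structurally; no inheritance needed."""

    def vote(self, question: str, evidence: list[str]) -> ArenaVote:
        return ArenaVote("RiskSkeptic", "risk", "hold", 0.6,
                         "Drawdown risk is elevated.", ["high_vol"])


class NotASpecialist:
    pass


print(isinstance(RiskSkeptic(), Specialist))     # True
print(isinstance(NotASpecialist(), Specialist))  # False
```

Note that isinstance() against a runtime-checkable protocol only checks that the method exists, not that its signature matches.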

Where the edge comes from:

  • Memory of past mistakes. The reflection loop creates a corpus of "I-thought-X-but-Y" entries that BM25/Chroma surfaces on the next similar question. This is the cheap version of fine-tuning — the LLM still reasons, but it reasons over its own track record.
  • Argument trees are reviewable. You can scroll into a node and see which evidence it depended on. When the model is wrong, you can see where it was wrong rather than starting from "the output was wrong, why?".
  • Domain-flagged specialists. Each ArenaVote carries a domain (price, sentiment, onchain, macro, risk). The judge weights by domain relevance to the question — a macro question doesn't get drowned by a high-confidence price specialist.
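
The judge's "one LLM call, retry-once on JSON parse failure" behaviour from the loop above can be sketched as follows. The function name, the call_llm callable, and the retry hint appended to the prompt are all assumptions made for illustration; the real judge and its Decision schema are not shown here.

```python
import json
from typing import Callable


def judge_with_retry(call_llm: Callable[[str], str], prompt: str) -> dict:
    """Call the LLM, parse its reply as JSON, retry exactly once on failure."""
    last_error = None
    for _attempt in range(2):  # first try + one retry
        raw = call_llm(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Hypothetical nudge; the real retry prompt may differ.
            prompt = prompt + "\n\nReturn valid JSON only."
    raise ValueError(f"judge returned unparseable JSON twice: {last_error}")


# Fake LLM that fails once, then returns valid JSON.
replies = iter([
    "Sure! Here is my decision...",
    '{"recommendation": "hold", "confidence": 0.62}',
])
decision = judge_with_retry(lambda _prompt: next(replies), "Should we hold NVDA?")
print(decision["recommendation"], decision["confidence"])  # hold 0.62
```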

Arena: crowd_simulation (Oracle, MiroFish-driven)

Goal: answer "how will the market react to this news / report?" rather than "what should I do?"

Loop:

  1. Ingest a document (a fomo2 report, an SEC filing, a Polymarket bet resolution, a tweet thread).
  2. Hand it to MiroFish (vendored Flask backend, sitting on Neo4j + Ollama Cloud). MiroFish builds an ontology from the document, then spawns a synthetic crowd: dozens of personas (each with prior beliefs, holdings, risk appetite) that "react" round by round to the document and to each other's reactions in the OASIS simulation.
  3. After N rounds, MiroFish returns the converged crowd state — aggregate sentiment, position changes, top narratives, dissent percentage.
  4. The synthesis_agent reads the crowd state and produces a CrowdPrediction: outcome label, probability, dissent %, top drivers, per-persona votes (a meta block enables drill-down).
  5. Published to maf:arena:crowd_simulation:output as a versioned envelope.

Where the edge comes from:

  • Behavioural priors beat fundamentals at short horizons. News doesn't move price; the crowd's belief about the news moves price. MiroFish simulates the belief-update step before it shows up in tape.
  • Dissent % is the killer feature. High probability + high dissent = unstable consensus = good contrarian entry. Probability alone misses this.
  • Persona priors are configurable. Run the same news through a "high short-interest retail crowd" and a "long-only institutional crowd" and compare the deltas — that's the asymmetry hedge funds pay for.
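
A minimal sketch of how a consumer might act on the "high probability + high dissent" reading above. The CrowdPrediction fields are reduced to three, and the thresholds (0.7 probability, 30 % dissent) and label names are hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrowdPrediction:
    outcome: str
    probability: float  # 0.0 .. 1.0
    dissent_pct: float  # 0 .. 100


def classify_consensus(pred: CrowdPrediction,
                       min_probability: float = 0.7,
                       min_dissent_pct: float = 30.0) -> str:
    """Label a prediction; thresholds are illustrative, not MAF defaults."""
    if pred.probability < min_probability:
        return "no_consensus"
    if pred.dissent_pct >= min_dissent_pct:
        return "unstable_consensus"  # candidate contrarian entry
    return "stable_consensus"


print(classify_consensus(CrowdPrediction("rally", 0.82, 41.0)))  # unstable_consensus
print(classify_consensus(CrowdPrediction("rally", 0.82, 8.0)))   # stable_consensus
```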

Arena: report_to_action (fast-path)

Goal: turn a fresh fomo2 report into an executable trading action in seconds, not minutes.

Why it exists: Mastermind is thorough but slow (graph queries, memory lookups, reflection). Trading intelligence is broad but expensive (5+ parallel specialists, large source fan-out). For "a new analyze_*.md just landed and I want a fast take", we want a leaner pipeline.

Loop:

  1. Parallel — three specialists: signal_analyst, sentiment_analyst, risk_analyst. Each binds the new report + just enough live streams (trtools2 indicators, news, strategy events, fomo2 enriched items).
  2. Synthesis — one synthesis pass, then publish a TradingAction with user-selected mode.

Edge: speed × decisiveness. A medium-confidence call published in 15s beats a high-confidence call published in 5 min when the move was in the first minute.
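
The two-step shape of this loop (parallel fan-out, then one synthesis pass) can be sketched with asyncio. The specialist names come from the loop above; the stubbed specialist body, the signal dict layout, and the netting rule in synthesize() are assumptions for illustration.

```python
import asyncio

SPECIALISTS = ("signal_analyst", "sentiment_analyst", "risk_analyst")
_SIGN = {"BULLISH": 1, "BEARISH": -1}


async def run_specialist(name: str, report: str) -> dict:
    """Stand-in for one specialist; a real one would call an LLM here."""
    await asyncio.sleep(0)
    direction = "BEARISH" if name == "risk_analyst" else "BULLISH"
    return {"agent": name, "direction": direction, "confidence": 0.6}


def synthesize(signals: list[dict], mode: str) -> dict:
    """Single synthesis pass: net the confidence-weighted directions."""
    net = sum(s["confidence"] * _SIGN.get(s["direction"], 0) for s in signals)
    verdict = "BUY" if net > 0 else "SELL" if net < 0 else "HOLD"
    return {"verdict": verdict, "mode": mode, "signals": signals}


async def report_to_action(report: str, mode: str = "semi") -> dict:
    # All three specialists run concurrently; gather preserves input order.
    signals = await asyncio.gather(
        *(run_specialist(name, report) for name in SPECIALISTS)
    )
    return synthesize(list(signals), mode)


action = asyncio.run(report_to_action("<report markdown>"))
print(action["verdict"], action["mode"])  # BUY semi
```

With real LLM calls, total latency is roughly that of the slowest specialist plus the synthesis pass, rather than the sum of all four.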


Component: MiroFish

One line: a multi-agent crowd-simulator with persistent knowledge graph, hosted as a separate Flask service we don't fork.

What it does:

  • Builds an entity-relation graph from each ingested document (/api/graph/*).
  • Runs the OASIS simulation: dozens of persona-agents react in rounds (/api/simulation/*).
  • Generates a structured post-simulation report (/api/report/*).
  • Stores everything in Neo4j (knowledge graph) and uses Ollama Cloud for the LLM persona inference (no local GPU required).

Why MAF leans on it: the persona-agent loop is exactly the "what-will-the-market-believe" calculation we don't want to fake with a single LLM "imagine the crowd thinks…" prompt. Real multi-agent deliberation produces qualitatively different signals — especially the dissent metric.


Component: smart Ollama model picker

Why: Ollama Cloud hosts models ranging from 20B to 1T params with very different latency / reasoning / JSON-discipline tradeoffs.

How it works: every LLM call carries a task profile (or falls back to quick/deep tier). The picker (src/maf/llm/model_picker.py) maps profiles to models:

| Profile | Model |
| --- | --- |
| quick / classification / signal | gpt-oss:20b-cloud |
| narrative / debate | gpt-oss:20b-cloud |
| synthesis / judge / json_strict | gpt-oss:120b-cloud |
| long_context / coding | qwen3-coder:480b-cloud |
| research | deepseek-v3.1:671b-cloud |
| trillion | kimi-k2:1t-cloud |

Set a specific model in an arena config to bypass the picker (llm.providers.ollama.model: gpt-oss:120b-cloud). Override per-profile via env var: MAF_OLLAMA_MODEL__SYNTHESIS=....
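
A minimal sketch of the lookup order described above: pinned model first, then a per-profile env override, then the profile table. This is not the code in src/maf/llm/model_picker.py. The function name, the mapping of the deep tier to the synthesis profile, and the assumption that every profile has its own MAF_OLLAMA_MODEL__<PROFILE> variable are illustrative guesses.

```python
import os

PROFILE_MODELS = {
    "quick": "gpt-oss:20b-cloud",
    "classification": "gpt-oss:20b-cloud",
    "signal": "gpt-oss:20b-cloud",
    "narrative": "gpt-oss:20b-cloud",
    "debate": "gpt-oss:20b-cloud",
    "synthesis": "gpt-oss:120b-cloud",
    "judge": "gpt-oss:120b-cloud",
    "json_strict": "gpt-oss:120b-cloud",
    "long_context": "qwen3-coder:480b-cloud",
    "coding": "qwen3-coder:480b-cloud",
    "research": "deepseek-v3.1:671b-cloud",
    "trillion": "kimi-k2:1t-cloud",
}

# Assumption: which profile each tier falls back to.
TIER_FALLBACK = {"quick": "quick", "deep": "synthesis"}


def pick_model(profile=None, tier="quick", pinned=None, env=None):
    """Resolve a model name for one LLM call."""
    env = os.environ if env is None else env
    if pinned:  # a model set in the arena config bypasses the picker
        return pinned
    key = profile if profile in PROFILE_MODELS else TIER_FALLBACK.get(tier, "quick")
    override = env.get(f"MAF_OLLAMA_MODEL__{key.upper()}")
    return override or PROFILE_MODELS[key]


print(pick_model("judge", env={}))  # gpt-oss:120b-cloud
print(pick_model("synthesis",
                 env={"MAF_OLLAMA_MODEL__SYNTHESIS": "kimi-k2:1t-cloud"}))  # kimi-k2:1t-cloud
```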


Component: realtime layer

Three Redis Streams form the operational nerve of MAF:

| Stream | Role |
| --- | --- |
| maf:events | Lifecycle events (arena.start, phase.complete, agent.signal, decision.emit, action.emit, source.error). Dashboard WebSocket pumps this to /ws/events. |
| maf:control:in | Inbound commands (run_arena, configure_arena, set_data_source, reload_config, health). MAF acks on maf:control:out. |
| maf:actions:out | Outbound TradingActions (verdict + mode). Downstream engines (trtools2) decide auto/semi/manual based on the mode field. |

Plus each arena owns its envelope stream: maf:arena:<name>:output.

And MAF consumes trtools2 / fomo2 streams as data sources: trtools2:bars:1m, trtools2:bars:1h, trtools2:news, trtools2:indicators, trtools2:strategy:events, fomo2:enriched, fomo2:reports, fomo2:requests:out.


Component: smart data plumbing (Phase 2)

Four-layer data architecture, designed so cold targets cost zero and hot ones are sub-minute reactive:

| Layer | What it does | Module |
| --- | --- | --- |
| 1. Watch list | One Redis sorted-set keyed by opaque target id (symbol, question, document). TTL-decayed. The single source of truth for "what's interesting right now". | maf.watch.list |
| 2. Refreshers (scheduled / event-driven) | Proactively fill expensive caches keyed off the watch list. Kronos: per-watched-symbol forecast every 60 s / 5 min. MiroFish: on a fresh high-impact fomo2:reports event for a watched symbol, runs a crowd-sim (10–30 min) once per report_id, budget-gated (10/day default). | maf.scheduler.kronos_refresher, maf.scheduler.mirofish_refresher |
| 3. Trigger dispatcher | Declarative "triggers:" blocks in each arena YAML. Tails the named streams, evaluates "when:" predicates via a tight safe-eval mini-language (payload.x, abs(), comparisons, and/or/not), applies a per-(arena, target) cooldown and a cost-gate demote, then XADDs run_arena to maf:control:in. | maf.triggers.dispatcher |
| 4. Arena consumption | Specialists read cached forecasts/sims via standard source adapters: no torch, no Neo4j, no HTTP to sidecars inside the MAF process. ReplanAgent detects stale_kronos_forecast / no_crowd_sim markers and forces re-runs. | maf.sources.adapters.{kronos_forecast,mirofish_sim} |
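
One common way to get a TTL-decayed sorted set is to store each member's expiry time as its score and prune by score before reading. The sketch below mimics that in memory so it runs without Redis; whether maf.watch.list works this way is an assumption, as are the class and method names. The Redis commands in the comments are real; the key name watch:zset is taken from the wire diagram.

```python
import time


class WatchList:
    """In-memory stand-in for a sorted set whose score is the expiry time."""

    def __init__(self, ttl_s: float = 3600.0, clock=time.time):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expiry: dict[str, float] = {}

    def touch(self, target_id: str) -> None:
        # Redis equivalent: ZADD watch:zset <now + ttl> <target_id>
        self._expiry[target_id] = self._clock() + self._ttl_s

    def active(self) -> list[str]:
        # Redis equivalent: ZREMRANGEBYSCORE watch:zset -inf <now>
        # followed by ZRANGE watch:zset 0 -1
        now = self._clock()
        self._expiry = {t: e for t, e in self._expiry.items() if e > now}
        return sorted(self._expiry)


# A fake clock makes the decay visible without sleeping.
fake_now = [0.0]
watch = WatchList(ttl_s=60.0, clock=lambda: fake_now[0])
watch.touch("NVDA")
fake_now[0] = 30.0
watch.touch("AAPL")
fake_now[0] = 61.0
print(watch.active())  # ['AAPL']
```

A target that is never touched again simply ages out, which is what makes cold targets cost nothing.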

Wire diagram (all streams are Redis Streams, all caches are Redis keys):

  watch:zset ──────┐
                   │
  ┌────────────────┼────── kronos-refresher ──HTTP──► kronos-svc (sidecar)
  │                │                │
  │                │                ▼
  │                │      kronos:forecast:{sym}:{tf}  + kronos:forecasts:emitted
  │                │                                          │
  │                │                                          ├──► dispatcher (trigger rules)
  │                │                                          │            │
  │                │                                          │            ▼
  │  fomo2:reports ─────► mirofish-refresher ──HTTP──► mirofish-svc        maf:control:in
  │                                  │                                     │
  │                                  ▼                                     ▼
  │                        mirofish:sim:{report_id} + :sims:emitted   ControlInbox → arena run
  │                                  │                                     │
  │                                  └───────────────► dispatcher          ▼
  │                                                                  arena ─► specialists read
  │                                                                          kronos_forecast +
  │                                                                          mirofish_sim cache
  │                                                                                │
  │                                                                                ▼
  └─────────── action_outbox ◄────────────────────────────────────── TradingAction
                    │
                    ▼
              maf:actions:out (consumed by trtools2-side ActionConsumer)

Cost discipline:

  • Cold ticker: zero ongoing cost. Refreshers do nothing.
  • Watched ticker: ~60 Kronos calls/hour, free under most plans.
  • Hot event (new report): one MiroFish sim per (report_id, symbol) per day.
  • Cost-cap gate: demotes auto → semi when the per-hour LLM cost exceeds max_cost_per_hour_eur.
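
The per-(arena, target) cooldown and the cost-cap demotion can be sketched together as one gate. The class name, method signature, and the choice to pass the clock and the hourly cost in as arguments are assumptions made to keep the example self-contained; only cooldown_s and max_cost_per_hour_eur are names from this page.

```python
class TriggerGate:
    """Decides whether a trigger may fire and in which mode."""

    def __init__(self, cooldown_s: float, max_cost_per_hour_eur: float):
        self.cooldown_s = cooldown_s
        self.max_cost_per_hour_eur = max_cost_per_hour_eur
        self._last_fired: dict[tuple[str, str], float] = {}

    def decide(self, arena: str, target: str, requested_mode: str,
               cost_last_hour_eur: float, now: float):
        """Return the mode to run with, or None if still cooling down."""
        key = (arena, target)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown_s:
            return None
        self._last_fired[key] = now
        # Over budget: keep running, but require a human in the loop.
        if requested_mode == "auto" and cost_last_hour_eur > self.max_cost_per_hour_eur:
            return "semi"
        return requested_mode


gate = TriggerGate(cooldown_s=60, max_cost_per_hour_eur=2.0)
print(gate.decide("trading_intelligence", "NVDA", "auto", 0.5, now=0))   # auto
print(gate.decide("trading_intelligence", "NVDA", "auto", 0.5, now=30))  # None
print(gate.decide("trading_intelligence", "NVDA", "auto", 3.1, now=90))  # semi
```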

Triggers in YAML look like:

triggers:
  - name: "kronos prob_up shift"
    on_stream: kronos:forecasts:emitted
    when: "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
    target: { ticker: "{payload.symbol}" }
    cooldown_s: 60
    action_mode: semi
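
A "when:" predicate like the one above can be evaluated without eval() by walking the Python AST and allowing only a fixed set of node types. The sketch below is one way to do that, assuming the mini-language is a subset of Python expression syntax; it is not the dispatcher's actual evaluator, and eval_when is a hypothetical name.

```python
import ast
import operator

_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def eval_when(expr: str, payload: dict) -> bool:
    """Evaluate a predicate over payload.x, abs(), comparisons, and/or/not."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str, bool)):
            return node.value
        # payload.<field> is the only attribute access allowed
        if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
                and node.value.id == "payload"):
            return payload.get(node.attr)
        if isinstance(node, ast.BoolOp):
            values = (ev(v) for v in node.values)  # lazy, so it short-circuits
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError("unsupported comparison")
                right = ev(comparator)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        # abs() is the only callable allowed
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "abs" and len(node.args) == 1 and not node.keywords):
            return abs(ev(node.args[0]))
        raise ValueError(f"unsupported expression: {type(node).__name__}")

    return bool(ev(ast.parse(expr, mode="eval")))


rule = "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
print(eval_when(rule, {"prob_up_delta": -0.2, "direction_flipped": False}))   # True
print(eval_when(rule, {"prob_up_delta": 0.05, "direction_flipped": False}))   # False
```

Anything outside the allow-list, such as a function call other than abs(), raises ValueError instead of executing.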

Add a symbol to the watch list (POST /api/watch {target_id: "NVDA", kind: "symbol"}) and everything else is automatic: refreshers start producing, dispatcher reacts, arenas run.

MAF beyond trading

The platform is not trading-specific. The same EventBus, control plane, ReplanAgent, DecisionMemory, watch list, trigger dispatcher, and arena loader run any deliberation where you can:

  1. Frame the question (a target dict — opaque keys).
  2. Decompose it (parallel specialist personas).
  3. Reconcile (a synthesis pass).

Two configuration changes flip an arena from trading to discussion:

target_key on ArenaConfig

The single field on each arena's YAML that decides which outbox MAF publishes to:

  • target_key: "ticker" → maf:actions:out (TradingAction)
  • target_key: "question_id" → maf:decisions:out (GenericDecision)
  • target_key: "pr_id" → maf:decisions:out
  • … any string you want

Order routers (e.g. trtools2) keep listening to maf:actions:out and never see research-debate verdicts; dashboards consuming maf:decisions:out get only deliberation outcomes.
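
Read as code, the routing rule is a single branch. The sketch below assumes that "ticker" is the only key routed to maf:actions:out and that every other key goes to maf:decisions:out, which matches the examples listed above but is not confirmed beyond them; outbox_for is a hypothetical name.

```python
def outbox_for(target_key: str) -> tuple[str, str]:
    """Return (stream, envelope type) for an arena's target_key."""
    if target_key == "ticker":
        return "maf:actions:out", "TradingAction"
    # Assumption: every non-ticker key is a generic deliberation.
    return "maf:decisions:out", "GenericDecision"


for key in ("ticker", "question_id", "pr_id"):
    print(key, outbox_for(key))
```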

GenericDecision envelope

Where TradingAction has BUY/HOLD/SELL + mode + size, GenericDecision is intentionally free-form:

class GenericDecision(BaseModel):
    arena:        str       # "research_debate"
    target:       dict      # opaque — arena-specific
    target_key:   str       # "question_id"
    verdict:      str       # "approve" | "approve_with_conditions" | …
    confidence:   float
    reasoning:    str
    contributors: list      # per-specialist signals for audit

Arenas pick their own verdict vocabulary. The research_debate arena uses approve / approve_with_conditions / needs_revision / reject / needs_more_data; a code-review arena could use lgtm / nits / blocking.
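
Because the verdict is a free-form string, an arena may want to validate it against its own vocabulary before publishing. The sketch below mirrors the GenericDecision fields with a stdlib dataclass so it runs without pydantic. The validation rules (vocabulary lookup, confidence in [0, 1], target_key present in target) and the "code_review" arena name are assumptions, not documented behaviour.

```python
from dataclasses import dataclass, field

# Vocabularies quoted from the paragraph above.
VERDICT_VOCAB = {
    "research_debate": {"approve", "approve_with_conditions", "needs_revision",
                        "reject", "needs_more_data"},
    "code_review": {"lgtm", "nits", "blocking"},  # hypothetical arena name
}


@dataclass
class GenericDecisionSketch:
    arena: str
    target: dict
    target_key: str
    verdict: str
    confidence: float
    reasoning: str
    contributors: list = field(default_factory=list)

    def __post_init__(self):
        allowed = VERDICT_VOCAB.get(self.arena)
        if allowed is not None and self.verdict not in allowed:
            raise ValueError(f"{self.verdict!r} is not a {self.arena} verdict")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.target_key not in self.target:
            raise ValueError(f"target has no {self.target_key!r} key")


decision = GenericDecisionSketch(
    arena="research_debate",
    target={"question_id": "rfc-2026-042"},
    target_key="question_id",
    verdict="approve_with_conditions",
    confidence=0.71,
    reasoning="Feasible, pending a legal review of data residency.",
)
print(decision.verdict)  # approve_with_conditions
```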

The research_debate arena (concrete example)

name: research_debate
target_key: "question_id"

phases:
  - name: analysis
    pattern: parallel
    agents:
      - name: engineering    # tech feasibility + maintenance burden
        role: specialist
      - name: legal          # regulatory exposure, contractual surface
        role: specialist
      - name: business       # customer value, strategic fit
        role: specialist
  - name: synthesis
    pattern: sequential
    agents:
      - name: chair
        role: synthesis
  - name: replan_check       # same ReplanAgent the trading arenas use
  - name: emit

Sources are knowledge_base (fomo2 chromadb for prior decisions) and crowd_sim (MiroFish synthetic-crowd reaction — same adapter the trading arenas use, just pointed at a non-trading document).

MiroFish as a general discussion engine

MiroFish isn't "the trading-sentiment box". It's a multi-persona LLM simulator over a Neo4j knowledge graph. The trading arenas use it to estimate crowd reaction to news; the research_debate arena uses it to estimate stakeholder reaction to a proposal. Same code path, different prompt fed into MirofishCrowdSource.

How to dispatch a non-trading run

redis-cli XADD maf:control:in '*' data '{
  "command": "run_arena",
  "args": {
    "arena": "research_debate",
    "target": {
      "question_id": "rfc-2026-042",
      "title": "Migrate session storage to ScyllaDB?",
      "text": "<the proposal markdown>"
    },
    "emit_action": true
  }
}'

The ControlInbox routes it identically to a trading_intelligence run — the only difference is which outbox stream receives the result.
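
The same dispatch can be built from Python. The sketch below only constructs the stream fields, mirroring the single "data" field used in the redis-cli example, so it runs without a Redis server; run_arena_fields is a hypothetical helper, not part of maf.control.client.

```python
import json


def run_arena_fields(arena: str, target: dict, emit_action: bool = True,
                     correlation_id=None) -> dict:
    """Build the field mapping for an XADD to maf:control:in."""
    command = {
        "command": "run_arena",
        "args": {"arena": arena, "target": target, "emit_action": emit_action},
    }
    if correlation_id is not None:
        command["correlation_id"] = correlation_id
    return {"data": json.dumps(command)}


fields = run_arena_fields(
    "research_debate",
    {
        "question_id": "rfc-2026-042",
        "title": "Migrate session storage to ScyllaDB?",
        "text": "<the proposal markdown>",
    },
)
print(fields["data"])

# With redis-py installed and a server running, the entry could be sent with:
#   import redis
#   redis.Redis().xadd("maf:control:in", fields)
```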

Component: control plane (inbound)

Anything you can do from the dashboard (POST /api/arenas/.../run) you can also do from a Redis-Streams client. That makes MAF programmable from bash/python without reaching for HTTP, and lets multiple workers share the load via a consumer group.

# trigger a run from the shell
redis-cli XADD maf:control:in '*' data '{
  "command": "run_arena",
  "correlation_id": "demo-1",
  "args": {"arena": "report_to_action",
            "target": {"ticker": "NVDA"},
            "action_mode": "semi"}
}'

# read the most recent ack (XREAD ... 0 would return the oldest entry in the stream)
redis-cli XREVRANGE maf:control:out + - COUNT 1

Or use the Python helper:

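import asyncio
from maf.control.client import ControlClient

async def main() -> None:
    # run_arena is a coroutine, so it must be awaited inside an event loop
    ack = await ControlClient().run_arena("report_to_action", target={"ticker": "NVDA"})
    print(ack["result"]["synthesis_verdict"], ack["result"]["synthesis_confidence"])

asyncio.run(main())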

How to get started

# 1. Check everything is wired
python -m maf doctor

# 2. Start the dashboard (recommended for first run)
python -m maf --dashboard --port 8420

# 3. Or run a single arena from the CLI
python -m maf --arena trading_intelligence --ticker NVDA

# 4. Or run the long-running service (control plane + scheduled timers)
python -m maf

The dashboard at http://localhost:8420/ now has:

  • Live (/live): real-time WebSocket feed of every event MAF emits.
  • Data (/data): stream health, plus every source binding with a sample.
  • Status bar (top of every page): Redis, Ollama, trtools2, fomo2, and MiroFish connectivity at a glance.