MAF

MAF concepts — what each arena does, and where the edge comes from

This page exists because the names "Mastermind", "Oracle", and "MiroFish" are not self-explanatory. Below: what each thing is, what it produces, and why it should move the prediction needle relative to plain market data plus a single LLM call.

The single hypothesis MAF bets on

A single LLM, given even the best data, gets one shot to synthesize. That's brittle: it can latch on to a salient detail, miss a contradicting one, or hallucinate when the data is sparse.

MAF replaces one big call with a structured tournament:

Specialists look at the same target through their own lens (price, risk, momentum, macro, on-chain, sentiment, fundamentals). Each commits to a structured AgentSignal — direction, confidence, key factors.
A judge reads every signal plus the supporting evidence (citations from a knowledge graph, prior decisions, current strategy state) and writes one argument tree that gets reconciled into a verdict.
A trail records every step so you can audit why the system said BUY on Tuesday — not just that it did.

This shape is the same across arenas. The arenas differ in what they ingest and what signal they specialise for.

Arena: `trading_intelligence` (a.k.a. MarketMind)

Goal: per-ticker BUY / HOLD / SELL with confidence + reasoning.

Loop:

Five to six specialists fire in parallel, each with isolated state (no telephone effect):
- price_analyst: recent bars, technicals, quotes
- sentiment_analyst: news, social, fomo2 digests
- onchain_analyst: crypto microstructure
- macro_analyst: FRED rates, DXY, fear/greed
- risk_analyst: vol, drawdown, insider activity
- committee_analyst: translates the upstream TradingAgents committee (langgraph subprocess) into one more vote
synthesis_agent reads all signals + confidences, computes a weighted ensemble score, writes the final verdict + reasoning.
The post-run envelope (maf:arena:trading_intelligence:output) is rich: target, synthesis, every agent signal, source-call metrics, phase timings.
A TradingAction lands on maf:actions:out with mode=auto|semi|manual.

Where the edge comes from:

Disagreement is information. Specialists with high confidence but opposing direction (e.g. risk says BEARISH, price says BULLISH) trigger a lower synthesis confidence — the system stops itself from overcommitting when the picture is mixed.
Confidence-weighted ensemble outperforms majority-vote at the regime edges. A 0.9-confidence bull beats two 0.4-confidence bears, but a 0.9-confidence bear flips a 0.55-confidence bull majority.
Source-metric instrumentation surfaces silent data failures (a feed went stale, an indicator returned empty). Most "the LLM was confused" reports are really "the LLM didn't have the data."
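
The confidence-weighting argument above can be made concrete with a small sketch. The weighting MAF actually uses is not stated on this page, so the `confidence ** power` weight, the `power=2.0` default, and the `AgentSignal` field names are assumptions. Under plain linear weighting (`power=1.0`) two 0.55-confidence bulls (1.10) would still outweigh one 0.9-confidence bear, so the second example in the text implies a weighting that rewards conviction more steeply than linearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSignal:
    agent: str
    direction: int          # +1 bullish, -1 bearish, 0 neutral
    confidence: float       # 0.0 .. 1.0
    key_factors: tuple = ()


def ensemble_score(signals, power=2.0):
    """Signed score in [-1, 1]; each weight is confidence ** power (assumed)."""
    weights = [s.confidence ** power for s in signals]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s.direction for w, s in zip(weights, signals)) / total


def majority_vote(signals):
    """Plain head count: +1, -1, or 0 on a tie."""
    net = sum(s.direction for s in signals)
    return (net > 0) - (net < 0)


one_bull_two_bears = [
    AgentSignal("price_analyst", +1, 0.9),
    AgentSignal("risk_analyst", -1, 0.4),
    AgentSignal("macro_analyst", -1, 0.4),
]
one_bear_two_bulls = [
    AgentSignal("risk_analyst", -1, 0.9),
    AgentSignal("price_analyst", +1, 0.55),
    AgentSignal("sentiment_analyst", +1, 0.55),
]

for name, signals in [("bull vs bears", one_bull_two_bears),
                      ("bear vs bulls", one_bear_two_bulls)]:
    print(name, majority_vote(signals), round(ensemble_score(signals), 3))
```

The magnitude of the score can double as a synthesis confidence: a unanimous panel scores 1.0 in absolute value, while strong opposing signals pull it toward zero.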

Arena: `mastermind`

Goal: answer any question — not just BUY/SELL — with a grounded, auditable decision.

Loop:

Frame — sharpen the question.
Gather — pull from the knowledge graph (entities & relations built up by all prior arena runs), from DecisionMemory (past decisions tagged by domain and horizon), and any directly-attached markdown documents.
Specialists — four runtime-checkable Specialist protocols: FundamentalAnalyst, RiskSkeptic, MomentumWatcher, CrowdProxy. Each returns one ArenaVote: weighted, with rationale ≤500 chars and a flags list.
Judge — reads all four votes + the graph citations + memory citations and writes a typed Decision: recommendation, confidence, argument tree, cited decision ids, flags. One LLM call, retry-once on JSON parse failure.
Reflect (out-of-band) — when the horizon elapses, an outcome_harvester looks up what actually happened, and a ReflectionAgent writes the lesson "in hindsight" into the same Decision envelope. That lesson becomes searchable context for the next decision.
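
The Specialists and Judge steps above can be sketched with the standard library. This is an illustration under stated assumptions, not MAF's code: the `vote` method signature, the `stance` field, and the `judge_with_retry` helper are hypothetical; only the 500-character rationale limit, the flags list, and the retry-once rule come from the text.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Protocol, runtime_checkable

MAX_RATIONALE_CHARS = 500  # limit stated in the text


@dataclass
class ArenaVote:
    specialist: str
    stance: str             # hypothetical vocabulary, e.g. "for" / "against"
    weight: float
    rationale: str
    flags: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.rationale) > MAX_RATIONALE_CHARS:
            raise ValueError("rationale must be at most 500 characters")


@runtime_checkable
class Specialist(Protocol):
    def vote(self, question: str, evidence: list) -> ArenaVote: ...


class RiskSkeptic:
    """Satisfies Specialist structurally; no inheritance is needed."""

    def vote(self, question, evidence):
        return ArenaVote("RiskSkeptic", "against", 0.6,
                         "Downside is not priced in.", ["tail_risk"])


def judge_with_retry(call_llm: Callable[[str], str], prompt: str) -> dict:
    """Parse the judge's JSON reply, retrying exactly once on a parse failure."""
    try:
        return json.loads(call_llm(prompt))
    except json.JSONDecodeError:
        # Single retry; a second parse failure propagates to the caller.
        return json.loads(call_llm(prompt))


print(isinstance(RiskSkeptic(), Specialist))
```

`runtime_checkable` only verifies that the method exists, not its signature or return type, so it is a cheap guard at arena-load time rather than a full contract check.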

Where the edge comes from:

Memory of past mistakes. The reflection loop creates a corpus of "I-thought-X-but-Y" entries that BM25/Chroma surfaces on the next similar question. This is the cheap version of fine-tuning — the LLM still reasons, but it reasons over its own track record.
Argument trees are reviewable. You can scroll into a node and see which evidence it depended on. When the model is wrong, you can see where it was wrong rather than starting from "the output was wrong, why?".
Domain-flagged specialists. Each ArenaVote carries a domain (price, sentiment, onchain, macro, risk). The judge weights by domain relevance to the question — a macro question doesn't get drowned by a high-confidence price specialist.
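
To illustrate the memory-of-past-mistakes point above, here is a minimal BM25 ranking over stored lessons. It is a sketch only: the `bm25_rank` name, the whitespace tokenizer, the `k1` and `b` defaults, and the sample lessons are assumptions, and the Chroma (embedding) side of retrieval is not covered.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return indices of docs (non-empty list) sorted by BM25 score, best first."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    # Document frequency: in how many docs each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


lessons = [
    "thought NVDA would fade after earnings but the gap held",
    "macro rate cut call was early by one meeting",
    "BTC funding flipped negative before the bounce",
]
best = bm25_rank("NVDA earnings gap", lessons)[0]
print(lessons[best])
```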

Arena: `crowd_simulation` (Oracle, MiroFish-driven)

Goal: answer "how will the market react to this news / report?" rather than "what should I do?"

Loop:

Ingest a document (a fomo2 report, an SEC filing, a Polymarket bet resolution, a tweet thread).
Hand it to MiroFish (vendored Flask backend, sitting on Neo4j + Ollama Cloud). MiroFish builds an ontology from the document, then spawns a synthetic crowd: dozens of personas (each with prior beliefs, holdings, risk appetite) that "react" round by round to the document and to each other's reactions in the OASIS simulation.
After N rounds, MiroFish returns the converged crowd state — aggregate sentiment, position changes, top narratives, dissent percentage.
The synthesis_agent reads the crowd state and produces a CrowdPrediction: outcome label, probability, dissent %, top drivers, per-persona votes (a meta block enables drill-down).
Published to maf:arena:crowd_simulation:output as a versioned envelope.

Where the edge comes from:

Behavioural priors beat fundamentals at short horizons. News doesn't move price; the crowd's belief about the news moves price. MiroFish simulates the belief-update step before it shows up in tape.
Dissent % is the killer feature. High probability + high dissent = unstable consensus = good contrarian entry. Probability alone misses this.
Persona priors are configurable. Run the same news through a "high short-interest retail crowd" and a "long-only institutional crowd" and compare the deltas — that's the asymmetry hedge funds pay for.
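
The "high probability + high dissent" signal above can be sketched as follows. How MiroFish computes these numbers is not described here, so the definitions are assumptions: probability is the conviction-weighted share of the leading outcome, dissent is the head-count share of personas not backing it, and the two thresholds are arbitrary.

```python
from collections import defaultdict


def summarize_crowd(persona_votes, prob_floor=0.7, dissent_floor=0.3):
    """persona_votes: non-empty list of (outcome_label, conviction in 0..1]."""
    weight_by_outcome = defaultdict(float)
    heads_by_outcome = defaultdict(int)
    for outcome, conviction in persona_votes:
        weight_by_outcome[outcome] += conviction
        heads_by_outcome[outcome] += 1
    top = max(weight_by_outcome, key=weight_by_outcome.get)
    probability = weight_by_outcome[top] / sum(weight_by_outcome.values())
    dissent = 1 - heads_by_outcome[top] / len(persona_votes)
    return {
        "outcome": top,
        "probability": probability,
        "dissent": dissent,
        # Strong conviction-weighted lead carried by a minority of heads.
        "unstable_consensus": probability >= prob_floor and dissent >= dissent_floor,
    }


# Two high-conviction personas outweigh three low-conviction ones.
votes = [("up", 0.95), ("up", 0.95), ("down", 0.15), ("down", 0.15), ("down", 0.15)]
summary = summarize_crowd(votes)
print(summary["outcome"], round(summary["probability"], 3), summary["dissent"])
```

Probability alone would report a confident "up" here; the dissent figure is what reveals that most personas disagree.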

Arena: `report_to_action` (fast-path)

Goal: turn a fresh fomo2 report into an executable trading action in seconds, not minutes.

Why it exists: Mastermind is thorough but slow (graph queries, memory lookups, reflection). Trading intelligence is broad but expensive (5+ parallel specialists, large source fan-out). For "a new analyze_*.md just landed and I want a fast take", we want a leaner pipeline.

Loop:

Parallel — three specialists: signal_analyst, sentiment_analyst, risk_analyst. Each binds the new report + just enough live streams (trtools2 indicators, news, strategy events, fomo2 enriched items).
Synthesis — one synthesis pass, then publish a TradingAction with user-selected mode.
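
The two-step loop above maps naturally onto `asyncio.gather`. The sketch below uses canned specialist outputs in place of LLM calls; the function names, the result dictionaries, the summed-confidence synthesis, and the `analyze_NVDA.md` filename are all assumptions for illustration.

```python
import asyncio

# Canned outputs standing in for real specialist LLM calls.
CANNED = {
    "signal_analyst": ("BUY", 0.7),
    "sentiment_analyst": ("BUY", 0.5),
    "risk_analyst": ("SELL", 0.4),
}


async def run_specialist(name, report):
    await asyncio.sleep(0.01)  # stands in for source fetch + LLM latency
    verdict, confidence = CANNED[name]
    return {"agent": name, "verdict": verdict, "confidence": confidence}


def synthesize(signals, mode):
    """Pick the verdict with the largest summed confidence."""
    totals = {}
    for signal in signals:
        totals[signal["verdict"]] = totals.get(signal["verdict"], 0.0) + signal["confidence"]
    verdict = max(totals, key=totals.get)
    confidence = totals[verdict] / sum(totals.values())
    return {"verdict": verdict, "confidence": round(confidence, 3), "mode": mode}


async def report_to_action(report, mode="semi"):
    # All three specialists run concurrently; none sees another's output.
    signals = await asyncio.gather(*(run_specialist(name, report) for name in CANNED))
    return synthesize(signals, mode)


action = asyncio.run(report_to_action("analyze_NVDA.md"))
print(action)
```

Because the specialists run concurrently, wall-clock latency is roughly that of the slowest specialist plus the synthesis pass, which is where the speed advantage of this arena comes from.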

Edge: speed × decisiveness. A medium-confidence call published in 15s beats a high-confidence call published in 5 min when the move was in the first minute.

Component: MiroFish

One line: a multi-agent crowd-simulator with persistent knowledge graph, hosted as a separate Flask service we don't fork.

What it does:

Builds an entity-relation graph from each ingested document (/api/graph/*).
Runs the OASIS simulation: dozens of persona-agents react in rounds (/api/simulation/*).
Generates a structured post-simulation report (/api/report/*).
Stores everything in Neo4j (knowledge graph) and uses Ollama Cloud for the LLM persona inference (no local GPU required).

Why MAF leans on it: the persona-agent loop is exactly the "what-will-the-market-believe" calculation we don't want to fake with a single LLM "imagine the crowd thinks…" prompt. Real multi-agent deliberation produces qualitatively different signals — especially the dissent metric.

Component: smart Ollama model picker

Why: Ollama Cloud hosts models ranging from 20B to 1T params with very different latency / reasoning / JSON-discipline tradeoffs.

How it works: every LLM call carries a task profile (or falls back to quick/deep tier). The picker (src/maf/llm/model_picker.py) maps profiles to models:

| Profile | Model |
| --- | --- |
| `quick` / `classification` / `signal` | gpt-oss:20b-cloud |
| `narrative` / `debate` | gpt-oss:20b-cloud |
| `synthesis` / `judge` / `json_strict` | gpt-oss:120b-cloud |
| `long_context` / `coding` | qwen3-coder:480b-cloud |
| `research` | deepseek-v3.1:671b-cloud |
| `trillion` | kimi-k2:1t-cloud |

Set a specific model in an arena config to bypass the picker (llm.providers.ollama.model: gpt-oss:120b-cloud). Override per-profile via env var: MAF_OLLAMA_MODEL__SYNTHESIS=....
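
A minimal sketch of the lookup described above, not the contents of `model_picker.py`. The profile-to-model pairs come from the table; the `pick_model` name, the precedence order (pinned model, then env override, then table), the env-var pattern generalized from `MAF_OLLAMA_MODEL__SYNTHESIS`, and the fallback to the `quick` model for unknown profiles are assumptions.

```python
import os

PROFILE_MODELS = {
    "quick": "gpt-oss:20b-cloud",
    "classification": "gpt-oss:20b-cloud",
    "signal": "gpt-oss:20b-cloud",
    "narrative": "gpt-oss:20b-cloud",
    "debate": "gpt-oss:20b-cloud",
    "synthesis": "gpt-oss:120b-cloud",
    "judge": "gpt-oss:120b-cloud",
    "json_strict": "gpt-oss:120b-cloud",
    "long_context": "qwen3-coder:480b-cloud",
    "coding": "qwen3-coder:480b-cloud",
    "research": "deepseek-v3.1:671b-cloud",
    "trillion": "kimi-k2:1t-cloud",
}


def pick_model(profile, pinned_model=None, env=None):
    """Resolve a model name for a task profile."""
    env = os.environ if env is None else env
    if pinned_model:  # arena config pins a model and bypasses the picker
        return pinned_model
    override = env.get(f"MAF_OLLAMA_MODEL__{profile.upper()}")
    if override:      # per-profile override via environment variable
        return override
    return PROFILE_MODELS.get(profile, PROFILE_MODELS["quick"])


print(pick_model("judge", env={}))
```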

Component: realtime layer

Three Redis Streams form the operational nerve of MAF:

| Stream | Role |
| --- | --- |
| `maf:events` | Lifecycle events (arena.start, phase.complete, agent.signal, decision.emit, action.emit, source.error). Dashboard WebSocket pumps this to `/ws/events`. |
| `maf:control:in` | Inbound commands (run_arena, configure_arena, set_data_source, reload_config, health). MAF acks on `maf:control:out`. |
| `maf:actions:out` | Outbound `TradingAction`s (verdict + mode). Downstream engines (trtools2) decide auto/semi/manual based on the mode field. |

Plus each arena owns its envelope stream: maf:arena:<name>:output.

And MAF consumes trtools2 / fomo2 streams as data sources: trtools2:bars:1m, trtools2:bars:1h, trtools2:news, trtools2:indicators, trtools2:strategy:events, fomo2:enriched, fomo2:reports, fomo2:requests:out.

Component: smart data plumbing (Phase 2)

Four-layer data architecture, designed so cold targets cost zero and hot ones are sub-minute reactive:

| Layer | What it does | Module |
| --- | --- | --- |
| 1. Watch list | One Redis sorted-set keyed by opaque target id (`symbol`, `question`, `document`). TTL-decayed. The single source of truth for "what's interesting right now". | `maf.watch.list` |
| 2. Refreshers (scheduled / event-driven) | Proactively fill expensive caches keyed off the watch list. Kronos: per-watched-symbol forecast every 60 s / 5 min. MiroFish: on a fresh high-impact `fomo2:reports` event for a watched symbol, runs a crowd-sim (10–30 min) once per report_id, budget-gated (10/day default). | `maf.scheduler.kronos_refresher`, `maf.scheduler.mirofish_refresher` |
| 3. Trigger dispatcher | Declarative `triggers:` blocks per arena YAML. Tails the named streams, evaluates `when:` predicates via a tight safe-eval mini-language (`payload.x`, `abs()`, comparisons, `and/or/not`), applies per-(arena, target) cooldown + cost-gate demote, XADDs `run_arena` to `maf:control:in`. | `maf.triggers.dispatcher` |
| 4. Arena consumption | Specialists read cached forecasts/sims via standard source adapters: no torch, no Neo4j, no HTTP to sidecars inside the MAF process. `ReplanAgent` detects `stale_kronos_forecast` / `no_crowd_sim` markers and forces re-runs. | `maf.sources.adapters.{kronos_forecast,mirofish_sim}` |

Wire diagram (all streams are Redis Streams, all caches are Redis keys):

  watch:zset ──────┐
                   │
  ┌────────────────┼────── kronos-refresher ──HTTP──► kronos-svc (sidecar)
  │                │                │
  │                │                ▼
  │                │      kronos:forecast:{sym}:{tf}  + kronos:forecasts:emitted
  │                │                                          │
  │                │                                          ├──► dispatcher (trigger rules)
  │                │                                          │            │
  │                │                                          │            ▼
  │  fomo2:reports ─────► mirofish-refresher ──HTTP──► mirofish-svc        maf:control:in
  │                                  │                                     │
  │                                  ▼                                     ▼
  │                        mirofish:sim:{report_id} + :sims:emitted   ControlInbox → arena run
  │                                  │                                     │
  │                                  └───────────────► dispatcher          ▼
  │                                                                  arena ─► specialists read
  │                                                                          kronos_forecast +
  │                                                                          mirofish_sim cache
  │                                                                                │
  │                                                                                ▼
  └─────────── action_outbox ◄────────────────────────────────────── TradingAction
                    │
                    ▼
              maf:actions:out (consumed by trtools2-side ActionConsumer)

Cost discipline:
- Cold ticker: zero ongoing cost. Refreshers do nothing.
- Watched ticker: ~60 Kronos calls/hour, free under most plans.
- Hot event (new report): one MiroFish sim per (report_id, symbol) per day.
- Cost-cap gate demotes auto → semi when the per-hour LLM cost exceeds max_cost_per_hour_eur.

Triggers in YAML look like:

triggers:
  - name: "kronos prob_up shift"
    on_stream: kronos:forecasts:emitted
    when: "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
    target: { ticker: "{payload.symbol}" }
    cooldown_s: 60
    action_mode: semi
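
A `when:` predicate like the one above can be evaluated without `eval()` by walking the parsed syntax tree and allowing only the constructs the mini-language names (`payload.x`, `abs()`, comparisons, `and/or/not`). This is a sketch of that technique, not the dispatcher's implementation; the `eval_when` name and the error behaviour are assumptions.

```python
import ast
import operator

_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def eval_when(expr, payload):
    """Evaluate a trigger predicate against a payload dict; reject anything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
                and node.value.id == "payload"):
            return payload.get(node.attr)  # payload.x -> payload["x"]
        if isinstance(node, ast.BoolOp):
            values = (walk(v) for v in node.values)  # lazy, so it short-circuits
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in _COMPARE):
            return _COMPARE[type(node.ops[0])](walk(node.left), walk(node.comparators[0]))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "abs" and len(node.args) == 1 and not node.keywords):
            return abs(walk(node.args[0]))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return bool(walk(ast.parse(expr, mode="eval")))


rule = "abs(payload.prob_up_delta) > 0.15 or payload.direction_flipped"
print(eval_when(rule, {"prob_up_delta": -0.2, "direction_flipped": False}))
```

Everything outside the allow-list, including arbitrary function calls and attribute access on anything but `payload`, raises `ValueError` instead of executing.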

Add a symbol to the watch list (POST /api/watch {target_id: "NVDA", kind: "symbol"}) and everything else is automatic: refreshers start producing, dispatcher reacts, arenas run.
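
The dispatcher's per-(arena, target) cooldown and cost-gate demotion can be sketched as a small gate object. The `TriggerGate` class, its method names, and the choice to pass the hourly cost in as an argument are assumptions; only the cooldown key, the auto → semi demotion, and the `max_cost_per_hour_eur` setting come from the text.

```python
import time


class TriggerGate:
    """Decides whether a trigger may fire and with which action mode."""

    def __init__(self, max_cost_per_hour_eur, clock=time.monotonic):
        self.max_cost_per_hour_eur = max_cost_per_hour_eur
        self.clock = clock          # injectable for tests
        self._last_fired = {}       # (arena, target_id) -> timestamp

    def admit(self, arena, target_id, cooldown_s, action_mode, cost_last_hour_eur):
        """Return the mode to run with, or None while the cooldown is active."""
        key = (arena, target_id)
        now = self.clock()
        last = self._last_fired.get(key)
        if last is not None and now - last < cooldown_s:
            return None
        self._last_fired[key] = now
        if action_mode == "auto" and cost_last_hour_eur > self.max_cost_per_hour_eur:
            return "semi"           # cost cap exceeded: demote auto -> semi
        return action_mode


gate = TriggerGate(max_cost_per_hour_eur=5.0)
print(gate.admit("trading_intelligence", "NVDA", 60, "semi", cost_last_hour_eur=1.2))
```

Keying the cooldown on the (arena, target) pair means a burst of events for NVDA does not block a trigger for another symbol, or for the same symbol in a different arena.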

MAF beyond trading

The platform is not trading-specific. The same EventBus, control plane, ReplanAgent, DecisionMemory, watch list, trigger dispatcher, and arena loader run any deliberation where you can:

Frame the question (a target dict — opaque keys).
Decompose it (parallel specialist personas).
Reconcile (a synthesis pass).

Two configuration changes flip an arena from trading to discussion:

`target_key` on `ArenaConfig`

The single field on each arena's YAML that decides which outbox MAF publishes to:

target_key: "ticker" → maf:actions:out (TradingAction)
target_key: "question_id" → maf:decisions:out (GenericDecision)
target_key: "pr_id" → maf:decisions:out
… any string you want

Order routers (e.g. trtools2) keep listening to maf:actions:out and never see research-debate verdicts; dashboards consuming maf:decisions:out get only deliberation outcomes.
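
Read literally, the routing rule above reduces to one branch. The sketch assumes that `"ticker"` is the only key routed to the trading outbox and that every other key goes to `maf:decisions:out`; the `outbox_for` helper is hypothetical.

```python
ACTIONS_STREAM = "maf:actions:out"      # TradingAction envelopes
DECISIONS_STREAM = "maf:decisions:out"  # GenericDecision envelopes


def outbox_for(target_key: str) -> str:
    """Pick the outbox stream for an arena based on its target_key."""
    return ACTIONS_STREAM if target_key == "ticker" else DECISIONS_STREAM


for key in ("ticker", "question_id", "pr_id"):
    print(key, "->", outbox_for(key))
```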

`GenericDecision` envelope

Where TradingAction has BUY/HOLD/SELL + mode + size, GenericDecision is intentionally free-form:

from pydantic import BaseModel  # assumed: BaseModel is pydantic's

class GenericDecision(BaseModel):
    arena:        str       # "research_debate"
    target:       dict      # opaque — arena-specific
    target_key:   str       # "question_id"
    verdict:      str       # "approve" | "approve_with_conditions" | …
    confidence:   float
    reasoning:    str
    contributors: list      # per-specialist signals for audit

Arenas pick their own verdict vocabulary. The research_debate arena uses approve / approve_with_conditions / needs_revision / reject / needs_more_data; a code-review arena could use lgtm / nits / blocking.
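
Because each arena owns its verdict vocabulary, a consumer may want to check an incoming decision against it. A minimal sketch on plain dictionaries; the `VERDICTS` registry, the `code_review` arena name, the `validate_decision` helper, and the 0..1 confidence range are assumptions.

```python
VERDICTS = {
    "research_debate": {"approve", "approve_with_conditions", "needs_revision",
                        "reject", "needs_more_data"},
    "code_review": {"lgtm", "nits", "blocking"},  # hypothetical arena name
}


def validate_decision(decision: dict) -> dict:
    """Raise ValueError if the verdict or confidence is out of range."""
    allowed = VERDICTS.get(decision["arena"])
    if allowed is not None and decision["verdict"] not in allowed:
        raise ValueError(f"unknown verdict {decision['verdict']!r} for {decision['arena']}")
    if not 0.0 <= decision["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return decision


ok = validate_decision({
    "arena": "research_debate",
    "target": {"question_id": "rfc-2026-042"},
    "target_key": "question_id",
    "verdict": "approve_with_conditions",
    "confidence": 0.72,
    "reasoning": "Feasible, pending a legal review.",
    "contributors": [],
})
print(ok["verdict"])
```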

The `research_debate` arena (concrete example)

name: research_debate
target_key: "question_id"

phases:
  - name: analysis
    pattern: parallel
    agents:
      - name: engineering    # tech feasibility + maintenance burden
        role: specialist
      - name: legal          # regulatory exposure, contractual surface
        role: specialist
      - name: business       # customer value, strategic fit
        role: specialist
  - name: synthesis
    pattern: sequential
    agents:
      - name: chair
        role: synthesis
  - name: replan_check       # same ReplanAgent the trading arenas use
  - name: emit

Sources are knowledge_base (fomo2 chromadb for prior decisions) and crowd_sim (MiroFish synthetic-crowd reaction — same adapter the trading arenas use, just pointed at a non-trading document).

MiroFish as a general discussion engine

MiroFish isn't "the trading-sentiment box". It's a multi-persona LLM simulator over a Neo4j knowledge graph. The trading arenas use it to estimate crowd reaction to news; the research_debate arena uses it to estimate stakeholder reaction to a proposal. Same code path, different prompt fed into MirofishCrowdSource.

How to dispatch a non-trading run

redis-cli XADD maf:control:in '*' data '{
  "command": "run_arena",
  "args": {
    "arena": "research_debate",
    "target": {
      "question_id": "rfc-2026-042",
      "title": "Migrate session storage to ScyllaDB?",
      "text": "<the proposal markdown>"
    },
    "emit_action": true
  }
}'

The ControlInbox routes it identically to a trading_intelligence run — the only difference is which outbox stream receives the result.
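
The same dispatch can be issued from Python. The sketch below builds the command shown above and sends it through any client exposing a redis-py style `xadd(name, fields)` method; the helper names and the `RecordingClient` stand-in are hypothetical, and the single `data` field mirrors the `redis-cli` example.

```python
import json


def build_run_arena(arena, target, correlation_id=None, **args):
    """Build a run_arena command in the shape used by the shell examples."""
    command = {"command": "run_arena", "args": {"arena": arena, "target": target, **args}}
    if correlation_id is not None:
        command["correlation_id"] = correlation_id
    return command


def dispatch(client, command, stream="maf:control:in"):
    """XADD the command as one JSON-encoded `data` field."""
    return client.xadd(stream, {"data": json.dumps(command)})


class RecordingClient:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.sent = []

    def xadd(self, name, fields):
        self.sent.append((name, fields))
        return f"{len(self.sent)}-0"


client = RecordingClient()
command = build_run_arena(
    "research_debate",
    {"question_id": "rfc-2026-042", "title": "Migrate session storage to ScyllaDB?"},
    emit_action=True,
)
print(dispatch(client, command))
```

With redis-py installed, passing `redis.Redis()` in place of `RecordingClient()` would send the command to a live stream.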

Component: control plane (inbound)

Anything you can do from the dashboard (POST /api/arenas/.../run) you can also do from a Redis-Streams client. That makes MAF programmable from bash/python without reaching for HTTP, and lets multiple workers share the load via a consumer group.

# trigger a run from the shell
redis-cli XADD maf:control:in '*' data '{
  "command": "run_arena",
  "correlation_id": "demo-1",
  "args": {"arena": "report_to_action",
            "target": {"ticker": "NVDA"},
            "action_mode": "semi"}
}'

# read the most recent ack (XREAD from id 0 with COUNT 1 would return the oldest entry instead)
redis-cli XREVRANGE maf:control:out + - COUNT 1

Or use the Python helper:

# run_arena is awaited: call it inside a coroutine, or from the `python -m asyncio` REPL
from maf.control.client import ControlClient
ack = await ControlClient().run_arena("report_to_action", target={"ticker": "NVDA"})
print(ack["result"]["synthesis_verdict"], ack["result"]["synthesis_confidence"])

How to get started

# 1. Check everything is wired
python -m maf doctor

# 2. Start the dashboard (recommended for first run)
python -m maf --dashboard --port 8420

# 3. Or run a single arena from the CLI
python -m maf --arena trading_intelligence --ticker NVDA

# 4. Or run the long-running service (control plane + scheduled timers)
python -m maf

The dashboard at http://localhost:8420/ now has:

Live — /live — real-time WebSocket feed of every event MAF emits.
Data — /data — stream health + every source binding with sample.
Status bar — top of every page — Redis, Ollama, trtools2, fomo2, mirofish connectivity at a glance.
