
Building a Quantitative Trading Bot: Architecture Deep Dive

15 March 2026 · 15 min read
Tags: ML, Trading, System Design


This article walks through the architecture of ACMF (Automated Crypto Market Framework) — a production-grade trading bot for BTC/USDT perpetual futures on Binance. It uses a stacked ML ensemble, a 6-layer pipeline executing every 15 minutes, and a full Docker-based infrastructure with monitoring and safety systems.


Motivation

Most retail trading bots rely on simple technical indicators — moving average crossovers, RSI thresholds, or basic momentum signals. They work in some regimes and fail catastrophically in others. ACMF was built to address three specific problems:

  1. Regime blindness — A single strategy can't handle trending, ranging, and high-volatility markets equally. The system needs to detect the current regime and adapt its behavior.
  2. Feature limitations — Price action alone isn't sufficient. Incorporating order book depth, funding rates, macro indicators, and sentiment data provides a more complete market picture.
  3. Operational fragility — A trading system needs more than just a prediction model. It needs robust execution, crash recovery, circuit breakers, and monitoring to run reliably 24/7.

The 6-Layer Pipeline

The core architecture is a sequential pipeline that runs every 15 minutes at candle close:

Layer 1: Data Collection

DataSync fetches 8 data sources in parallel using asyncio.gather:

  • Binance OHLCV — 15-minute, 1-hour, 4-hour, and daily candles (mandatory)
  • Order book depth — Top 20 levels, used for bid-ask imbalance and spread calculations
  • Sentiment — Crypto Fear & Greed Index, perpetual funding rates, long/short ratios
  • Macro — Gold (XAU), VIX, S&P 500, Dollar Index (DXY) via yfinance with 1-hour Redis cache
  • On-chain — Whale transaction alerts, exchange inflow/outflow data
  • LLM sentiment — Claude Haiku analyzes crypto news headlines every 4 hours, cached in Redis

Each source has independent error handling. If a non-critical source (macro, sentiment, on-chain) fails, the pipeline continues with available data. Only the OHLCV data is mandatory.
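The graceful-degradation behavior can be sketched with `asyncio.gather(..., return_exceptions=True)`. The fetcher names and return shapes below are illustrative, not ACMF's actual API:

```python
import asyncio

# Hypothetical fetchers standing in for the real DataSync sources.
async def fetch_ohlcv():
    return {"candles": []}                       # mandatory source

async def fetch_macro():
    raise ConnectionError("yfinance timeout")    # simulated outage

async def collect(sources, mandatory=("ohlcv",)):
    names = list(sources)
    # return_exceptions=True: one failing coroutine doesn't cancel the rest
    results = await asyncio.gather(
        *(sources[n]() for n in names), return_exceptions=True
    )
    data = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            if name in mandatory:
                raise result                     # OHLCV failure aborts the cycle
            data[name] = None                    # non-critical: degrade gracefully
        else:
            data[name] = result
    return data

data = asyncio.run(collect({"ohlcv": fetch_ohlcv, "macro": fetch_macro}))
```

The key point is that each source's exception is captured individually, so a macro or sentiment outage shows up as a `None` value rather than a crashed cycle.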

Layer 2: Feature Engineering

This is where raw data becomes model-ready features. The pipeline produces two distinct output shapes:

Sequence features (Mamba input): A 192×22 matrix — the last 192 fifteen-minute bars across 22 channels:

rsi_7, rsi_14, ema_9, ema_21, atr_7, atr_14, bb_upper_10, bb_lower_10, bb_mid_10, macd_line, macd_hist, macd_signal, vwap, volume_sma_ratio, close_vs_vwap, adx_14, obv, obv_slope, realized_vol_15m, garman_klass_vol, atr_ratio, vol_regime

Tabular features (XGBoost input): ~66 features combining:

  • 15-minute technical indicators (16 features)
  • Volatility metrics (6 features)
  • Stacked features — log returns, lagged values, rolling statistics (20 features)
  • Multi-timeframe indicators — 1H (5 features), 4H (6 features)
  • Order book metrics (3), sentiment (3), macro (3)

Critical normalization: Price-based indicators (EMA, Bollinger Bands, VWAP) are stored as price-relative percentages: (indicator - close) / close × 100. This prevents data leakage and ensures stationarity. OBV uses first-differencing; RSI, ADX, and MACD are fed as-is since they're already bounded.
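As a minimal illustration of these normalization rules (the numbers are made up):

```python
import numpy as np

def price_relative(indicator: np.ndarray, close: np.ndarray) -> np.ndarray:
    """Store price-level indicators (EMA, BB, VWAP) as % distance from close."""
    return (indicator - close) / close * 100.0

close  = np.array([100.0, 102.0, 101.0])
ema_21 = np.array([ 99.0, 101.0, 102.0])
feat = price_relative(ema_21, close)      # first value: -1.0 (EMA 1% below close)

# OBV is unbounded, so it is first-differenced instead of used raw:
obv = np.array([1000.0, 1150.0, 1100.0])
obv_diff = np.diff(obv, prepend=obv[0])   # [0., 150., -50.]
```

Because the feature is a relative distance, it stays stationary regardless of whether BTC trades at 20k or 100k, which is exactly what a model trained on historical data needs.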

Layer 3: Prediction Engine

The prediction layer uses a stacked ensemble — not a simple weighted average.

Step 1: Mamba SSM hidden state extraction. A custom, pure-PyTorch implementation of the Mamba State Space Model processes the 192×22 sequence and outputs a 128-dimensional hidden state vector. This captures temporal dependencies and sequence patterns that tabular features miss.

Step 2: Feature concatenation. The 128-dim Mamba hidden state is concatenated with 40 selected tabular features, producing a 168-dimensional input vector.

Step 3: Stacked XGBoost classification. The XGBoost classifier was trained on these concatenated features and outputs [P(LONG), P(SHORT), P(NEUTRAL)].

Step 4: Spread-based directional routing. The model has a ~55% NEUTRAL bias (which is correct — most 15-minute intervals are indeed neutral). Instead of using argmax (which would almost always predict NEUTRAL), the system computes:

  • ratio = P(dominant) / P(opposite) between LONG and SHORT probabilities
  • If ratio ≥ 1.5 and |P(LONG) - P(SHORT)| ≥ 0.05 → directional signal
  • Confidence is mapped as: 1 - 1.5/(ratio + 1), giving ratio 1.5→0.40, 2.0→0.50, 3.0→0.625
  • Otherwise → NEUTRAL (no trade)

This approach extracts directional signals even when NEUTRAL dominates, while still requiring meaningful separation between LONG and SHORT probabilities.
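The routing rule fits in a few lines. The thresholds and confidence mapping follow the description above; the function name and signature are assumptions:

```python
def route_signal(p_long: float, p_short: float, p_neutral: float,
                 min_ratio: float = 1.5, min_gap: float = 0.05):
    """Spread-based routing: extract a direction even when NEUTRAL dominates."""
    dominant, opposite = max(p_long, p_short), min(p_long, p_short)
    ratio = dominant / max(opposite, 1e-9)       # guard against divide-by-zero
    if ratio >= min_ratio and abs(p_long - p_short) >= min_gap:
        side = "LONG" if p_long > p_short else "SHORT"
        confidence = 1 - 1.5 / (ratio + 1)       # 1.5→0.40, 2.0→0.50, 3.0→0.625
        return side, confidence
    return "NEUTRAL", 0.0

# Even with 55% NEUTRAL mass, a 2:1 LONG/SHORT split is a tradeable signal:
print(route_signal(0.30, 0.15, 0.55))  # → ('LONG', 0.5)
```

Note that `argmax` on the same probabilities would have returned NEUTRAL.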

Regime classification runs in parallel — a rule-based scoring system using ADX, price vs EMA21, ATR ratio, volatility regime, funding rate, and order book imbalance. The detected regime influences entry thresholds and position sizing downstream.

Layer 4: Trade Logic

A 12-condition entry filter gates every trade:

  • Regime-specific confidence thresholds: trending ≥ 0.45, ranging ≥ 0.55, high-volatility ≥ 0.60
  • Cooldown period: 2–6 bars between trades (regime-dependent)
  • Maximum concurrent positions per regime (1–3)
  • Volume confirmation filter
  • Daily drawdown check (≤ -5% blocks new entries)
  • Consecutive loss check (< 3 losses in a row)
  • Funding rate sanity checks

Position sizing is ATR-based: risk = equity × 1.5%, size = risk / (ATR × stop_multiplier). The size is further adjusted by confidence (0.8×–1.2× scalar) and drawdown state (0.5× if current drawdown exceeds 2%).
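A sketch of the sizing rule as described above. The stop multiplier default and the exact shape of the confidence scalar (assumed here to be a linear map onto 0.8–1.2) are assumptions:

```python
def position_size(equity: float, atr: float, confidence: float,
                  drawdown: float, stop_multiplier: float = 2.0,
                  risk_pct: float = 0.015) -> float:
    """ATR-based sizing: a full stop-out loses `risk_pct` of equity."""
    risk = equity * risk_pct                     # 1.5% of equity at risk
    size = risk / (atr * stop_multiplier)        # position units
    # Confidence scalar in [0.8, 1.2] (linear map is an assumption)
    size *= 0.8 + 0.4 * min(max(confidence, 0.0), 1.0)
    if drawdown > 0.02:                          # halve size beyond 2% drawdown
        size *= 0.5
    return size
```

For example, $10,000 equity, ATR of $150, a 2× stop and 0.5 confidence gives 150 / 300 = 0.5 units.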

Exit system follows a priority chain: stop-loss → take-profit (50% close + trailing remainder) → opposite signal → time-based exit (8–16 bars depending on regime).
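The priority chain is a first-match-wins cascade. This sketch elides the 50% partial close and trailing logic, and its field names are illustrative:

```python
def choose_exit(side: str, price: float, stop: float, take_profit: float,
                opposite_signal: bool, bars_held: int, max_bars: int):
    """Exit priority chain: stop → take-profit → opposite signal → time exit."""
    hit = lambda level, above: price >= level if above else price <= level
    if hit(stop, above=(side == "SHORT")):       # stop sits above a SHORT entry
        return "STOP_LOSS"
    if hit(take_profit, above=(side == "LONG")):
        return "TAKE_PROFIT"                     # closes 50%, trails the rest
    if opposite_signal:
        return "OPPOSITE_SIGNAL"
    if bars_held >= max_bars:                    # 8–16 bars depending on regime
        return "TIME_EXIT"
    return None                                  # hold
```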

Layer 5: Execution

In paper mode, the system fetches the real market price from Binance, applies 0.02% simulated slippage, and generates a mock order ID. No exchange orders are placed.

In live mode, market orders are sent via ccxt to Binance Futures API. Separate server-side stop-loss (stop_market) and take-profit (take_profit_market) orders are placed with reduceOnly=True.

A CrashRecovery module reconciles database state against actual exchange positions at startup, handling scenarios where the bot crashed between order placement and database update.

Layer 6: Monitoring

  • Telegram — Trade open/close notifications, daily 00:01 UTC summary, drawdown warnings, margin alerts
  • Prometheus — Metrics exposed on port 9100, scraped every 15 seconds
  • Model decay detection — Alerts if rolling 50-prediction accuracy drops below 45% or ECE exceeds 0.15
  • Feedback loop — Accumulates closed trade outcomes; triggers a retrain signal when 200+ new samples are collected
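The rolling-accuracy half of the decay check could look like this (class and method names are hypothetical, and the ECE check is omitted):

```python
from collections import deque

class DecayMonitor:
    """Alert when rolling accuracy over the last N resolved predictions
    drops below a floor (50 predictions / 45% in the article)."""
    def __init__(self, window: int = 50, min_accuracy: float = 0.45):
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, correct: bool) -> bool:
        """Record one resolved prediction; return True when an alert fires."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False                 # not enough samples yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.min_accuracy

monitor = DecayMonitor()
alerts = [monitor.record(i % 3 == 0) for i in range(60)]  # ~33% accuracy
```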

Why a Custom Mamba Implementation?

The official mamba-ssm library requires specific CUDA kernels and has strict hardware dependencies. Since the bot needs to run in varied environments — a development laptop, a CPU-only VPS for inference, a GPU server for training — a pure PyTorch implementation was chosen.

The custom Mamba processes sequences through selective state space dynamics: at each timestep, the model decides what information to remember and what to forget from its hidden state, making it naturally suited for financial time series where the relevance of past data changes with market conditions.
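The selective recurrence can be illustrated with a heavily simplified, diagonal-state sketch — not ACMF's implementation, just the core idea that Δ, B, and C are computed from the input, so the forget/remember behavior changes per timestep:

```python
import torch

class SelectiveSSM(torch.nn.Module):
    """Toy selective scan (diagonal A): input-dependent Δ, B, C gate what
    the hidden state keeps or forgets at each step."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = torch.nn.Parameter(torch.zeros(d_model, d_state))
        self.proj = torch.nn.Linear(d_model, 1 + 2 * d_state)  # Δ, B, C per step

    def forward(self, x):                       # x: (batch, seq, d_model)
        A = -torch.exp(self.A_log)              # negative real part: stable decay
        h = x.new_zeros(x.shape[0], x.shape[2], self.A_log.shape[1])
        ys = []
        for t in range(x.shape[1]):
            dt, Bt, Ct = torch.split(
                self.proj(x[:, t]), [1, A.shape[1], A.shape[1]], dim=-1)
            dt = torch.nn.functional.softplus(dt).unsqueeze(-1)   # Δ > 0
            # Selective recurrence: forget via exp(Δ·A), write via Δ·B·x_t
            h = torch.exp(dt * A) * h + dt * Bt.unsqueeze(1) * x[:, t].unsqueeze(-1)
            ys.append((h * Ct.unsqueeze(1)).sum(-1))              # y_t = C_t·h_t
        return torch.stack(ys, dim=1)           # (batch, seq, d_model)

out = SelectiveSSM(d_model=22)(torch.randn(1, 192, 22))  # shape preserved
```

A production version would add input/output projections, gating, and a parallel scan; this loop exists only to make the per-step selectivity visible.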


Safety Mechanisms

Running a trading system — even in paper mode — requires multiple safety layers:

  1. Live Mode Gate — Requires a confirmation file with a timestamp less than 24 hours old. Without it, TRADING_MODE=live silently falls back to paper mode.
  2. Kill Switch — Dual mechanism (file + Redis key). When activated: blocks all new entries, emergency-closes all positions, sends Telegram alert.
  3. Circuit Breaker — Auto-triggers the kill switch on: daily drawdown > 5%, consecutive losses ≥ 3, or model confidence below 0.35 for 5 consecutive cycles.
  4. Pre-flight Checks — Validates exchange connectivity, balance, model files, database access before each live trading session.
  5. Observe-Only Mode — First 2 cycles after live start suppress new entries while allowing exits.
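The dual kill switch check reduces to a short predicate. The file path and Redis key below are illustrative, not ACMF's real names:

```python
import os

def kill_switch_active(redis_client, path: str = "KILL_SWITCH") -> bool:
    """Kill switch is active if the marker file OR the Redis key exists."""
    if os.path.exists(path):                     # operator can touch a file...
        return True
    try:
        return redis_client.exists("acmf:kill_switch") > 0  # ...or set a key
    except Exception:
        # A Redis outage alone shouldn't be read as an activation signal;
        # the file-based mechanism still works without Redis.
        return False
```

Having two independent channels means an operator can still stop the bot when either Redis or shell access to the host is unavailable.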

Infrastructure

The entire system runs as 7 Docker Compose services:

| Service | Role |
|---------|------|
| PostgreSQL | Primary database — trades, predictions, system events, model registry, OHLCV |
| Redis | Cache layer for macro data (1H TTL) and LLM sentiment (4H TTL) |
| Bot | Trading pipeline — the core 6-layer loop (2GB RAM, 2 CPU limit) |
| Dashboard API | FastAPI read-only REST API with 9 endpoints |
| Nginx | Reverse proxy, rate limiting, static dashboard serving |
| Prometheus | Metrics collection and alerting |
| DB Backup | Automated daily pg_dump with 7-day retention |

The scheduler uses APScheduler 3.x with an independent asyncio watchdog. APScheduler can silently stop firing under certain conditions — the watchdog checks every 60 seconds and runs the cycle directly if more than 18 minutes have passed since the last execution.
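The watchdog idea can be sketched as an independent coroutine; `state["last_run"]` is assumed to be updated by the scheduled job itself, and all names are illustrative:

```python
import asyncio
import time

async def watchdog(run_cycle, state, check_every: float = 60,
                   stall_after: float = 18 * 60):
    """Fire the cycle directly if the scheduler has gone quiet too long."""
    while True:
        await asyncio.sleep(check_every)
        if time.monotonic() - state["last_run"] > stall_after:
            state["last_run"] = time.monotonic()  # avoid double-firing
            await run_cycle()                     # bypass APScheduler entirely
```

Running this as a plain `asyncio.create_task` alongside the scheduler means its liveness doesn't depend on APScheduler's internals — which is the whole point.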


Lessons Learned

  • Neutral bias is correct. Most 15-minute intervals genuinely have no clear directional signal. Fighting this with argmax leads to overtrading. The spread-based routing approach was a key breakthrough.
  • Feature order is load-bearing. Both the 22 sequence columns and the 168 stacked features must match training order exactly. A reorder silently breaks inference without any errors.
  • Safety layers pay for themselves. The circuit breaker has prevented several situations where a model producing consistently poor predictions would have accumulated significant losses.
  • Graceful degradation beats hard failures. The system losing access to macro data or sentiment is far better than the system stopping entirely. Each data source is independently failable.

This system is currently running in paper trading mode on a VPS, processing real market data every 15 minutes. You can see the live performance data on the ACMF project page.