NautilusTrader — Architecture Insights
The “what I wish someone had told me before I touched this code” document. Distilled from chapters 1–15. Opinionated. Useful for daily work.
Read Project History first if you want the narrative. This file is the lessons.
The single most important fact about this codebase
It is migrating. The Cython that runs your strategies today is
scheduled for replacement by Rust. Most subsystems already are Rust
under the hood (OrderBook, network, MessageBus, DataEngine, RiskEngine,
ExecutionEngine, Portfolio, BacktestEngine, Catalog). The Python you
import is an increasingly thin shim. The maintainer’s stated direction
(see ROADMAP.md) is “stable API at v2.0, after the
Rust port.” Nothing in the codebase is “the final form” until v2.0
ships. Plan accordingly when you write against the API.
Practical implication: when you have a choice between a Cython type and
a _pyo3 variant of the same thing, prefer the PyO3 one. It is the
future. (Example: interactive_brokers_pyo3/ vs
interactive_brokers/.)
Reading the codebase
Versions are build counters, not stability signals
v1.225 in 2026 is the 225th release after the public launch in 2020,
not “almost v2.0.” The Beta label in the README is sincere. Breaking
changes ship in normal releases. Pin to a version. Read the
release notes when you upgrade.
The release notes (RELEASES.md) are the spine
6,000+ lines. Read the section for the version you’re on. Each release has Enhancements / Breaking Changes / Internal Improvements / Fixes / Documentation / Deprecations sections. The Breaking Changes section is the one that bites.
The architecture concepts doc is the steady-state, not the history
docs/concepts/architecture.md
explains the system as it is. It does not explain why it is that way.
For the why, read the chapters in chapters/.
Core abstractions and how they work
MessageBus is the spine
Every event, command, and (post-v1.197) data point flows through the
MessageBus. Components publish to topics and subscribe to topics.
Topic strings follow conventions like data.quotes.{venue}.{symbol}
but the conventions are enforced by code review, not the type
system.
When debugging “why didn’t my strategy receive X?”, start by checking the topic shape. After v1.218 (chapter 13) topic-matching is 100× faster — wildcard subscribes are fine.
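A minimal sketch for checking topic shapes by hand. It uses Python's stdlib fnmatch as a stand-in, not the engine's own matcher, and the venue and symbol names are made up for illustration.

```python
from fnmatch import fnmatchcase


def matching_patterns(topic: str, patterns: list[str]) -> list[str]:
    """Return the subscription patterns that match a published topic."""
    return [p for p in patterns if fnmatchcase(topic, p)]


subscriptions = [
    "data.quotes.BINANCE.*",   # every quote on one venue
    "data.quotes.*.ETHUSDT",   # one symbol on any venue
    "data.trades.BINANCE.*",   # a different data type; must not match quotes
]

hits = matching_patterns("data.quotes.BINANCE.BTCUSDT", subscriptions)
misses = matching_patterns("data.quote.BINANCE.BTCUSDT", subscriptions)  # typo: "quote"
print(hits, misses)
```

A one-character slip in the topic (the second call) matches nothing and raises no error, which is why the topic shape is the first thing to check.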
Cache is the source of truth
After v1.122 (chapter 5) there is one Cache. Components
read from it; they don’t pass state to each other. Position, order,
instrument, account, quotes, trades, bars, mark prices, exchange
rates — all live in the cache.
If two components hold the same fact in two places, you have a bug. Always go through the cache.
Component FSM is real
PRE_INITIALIZED → READY → RUNNING → STOPPING → STOPPED → DISPOSED
with DEGRADED and FAULTED side-states. State transitions are
checked in Rust now (chapter 10). Don’t construct components and
manually flip them — call register(), start(), etc.
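A sketch of what a checked transition means, in plain Python. The table only mirrors the happy-path chain listed above; the real transition table (including DEGRADED, FAULTED, and any resume or reset paths) lives in the engine and is not reproduced here.

```python
from enum import Enum, auto


class State(Enum):
    PRE_INITIALIZED = auto()
    READY = auto()
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()
    DISPOSED = auto()


# Happy-path transitions only (an assumption for this sketch).
ALLOWED = {
    State.PRE_INITIALIZED: {State.READY},
    State.READY: {State.RUNNING},
    State.RUNNING: {State.STOPPING},
    State.STOPPING: {State.STOPPED},
    State.STOPPED: {State.DISPOSED},
    State.DISPOSED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the target state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition {current.name} -> {target.name}")
    return target
```

Manually flipping a component's state amounts to skipping this check, which is the failure mode the lifecycle methods exist to prevent.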
Actor vs Component traits are different
(Introduced explicitly in v1.212, chapter 12.)
- Actor trait → registry-based message dispatch by ID.
- Component trait → lifecycle (start/stop/reset/dispose).
- A Throttler is Actor-only.
- A DataEngine is Component-only.
- A Strategy is both.
If you’re tempted to give your new type both, ask whether it really
needs targeted message dispatch. If not, it’s just a Component.
ts_event vs ts_init
Every Data subclass carries both:
- ts_event: when the venue says it happened.
- ts_init: when Nautilus constructed the object.
Reconciliation, replay, and order routing all care about the distinction. They are not interchangeable.
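A small illustration of why the two fields are not interchangeable. The Tick class and its values are hypothetical; only the two timestamp fields follow the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    """Hypothetical data point; only the two timestamps matter here."""
    price: str
    ts_event: int  # nanoseconds: when the venue says it happened
    ts_init: int   # nanoseconds: when the local object was constructed


ticks = [
    Tick("100.10", ts_event=1_000_000_300, ts_init=1_000_002_000),
    Tick("100.05", ts_event=1_000_000_100, ts_init=1_000_001_000),
    Tick("100.20", ts_event=1_000_000_200, ts_init=1_000_003_000),
]

# Sorting by each field gives a different order for the same data.
by_event = [t.price for t in sorted(ticks, key=lambda t: t.ts_event)]
by_init = [t.price for t in sorted(ticks, key=lambda t: t.ts_init)]

# The gap between the two is the observed delivery latency per tick.
latency_ns = [t.ts_init - t.ts_event for t in ticks]
```

Here venue order and arrival order disagree, so anything that replays or reconciles must pick the field it actually means.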
Atomic global clock
Since v1.183 (chapter 8), the engine uses one global atomic clock for
all ts_init timestamps. Monotonicity is guaranteed by construction.
If you create a Clock instance directly in user code, you bypass
this — don’t.
Nanoseconds, not milliseconds
Every internal timestamp is i64 nanoseconds (or u64 UnixNanos in
Rust). Conversion to pd.Timestamp happens only at user-facing
boundaries.
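A stdlib-only sketch of a boundary conversion. The helper names are hypothetical, and datetime stands in for pd.Timestamp; note that datetime only carries microseconds, so the last three digits of a nanosecond value are lost on the way out.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def unix_nanos_to_datetime(ts: int) -> datetime:
    """Convert integer UNIX nanoseconds to an aware UTC datetime (microsecond precision)."""
    seconds, nanos = divmod(ts, NANOS_PER_SECOND)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1_000)


def datetime_to_unix_nanos(dt: datetime) -> int:
    """Convert an aware datetime to integer UNIX nanoseconds using integer arithmetic only."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * NANOS_PER_SECOND + delta.microseconds * 1_000


ts = 1_700_000_000_123_456_789
round_tripped = datetime_to_unix_nanos(unix_nanos_to_datetime(ts))
print(ts - round_tripped)  # the sub-microsecond remainder that datetime cannot hold
```

Keeping the integer as the internal representation and converting only at the edge avoids this kind of silent truncation in the middle of a pipeline.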
Numerical precision
Two precision modes
- High-precision (i128): default on Linux / macOS Python wheels. 16 decimals. Larger value range.
- Standard-precision (i64): default on Windows wheels (MSVC has no __int128) and Rust crates. 9 decimals.
If you write a numerical-correctness test, parametrise it on both.
Use Price, Quantity, Money types — never raw floats — for any
value that affects PnL. (See chapter 11.)
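A sketch of the fixed-point idea behind the value types, using plain integers and Decimal. This is not the Price / Quantity / Money API; the helper names are hypothetical and the 9-decimal scale is chosen to match the standard-precision mode.

```python
from decimal import Decimal

PRECISION = 9  # assumed scale for illustration (standard-precision mode)


def to_raw(value: str) -> int:
    """Parse a decimal string into a scaled integer; no float is involved."""
    return int(Decimal(value).scaleb(PRECISION))


def from_raw(raw: int) -> Decimal:
    """Turn a scaled integer back into an exact Decimal."""
    return Decimal(raw).scaleb(-PRECISION)


float_sum = 0.1 + 0.2                      # binary floating point
raw_sum = to_raw("0.1") + to_raw("0.2")    # integer arithmetic on scaled values

print(float_sum == 0.3)                     # False
print(from_raw(raw_sum) == Decimal("0.3"))  # True
```

The float comparison fails on the simplest possible sum, which is the kind of error that compounds in PnL.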
Decimal, not float, on internals
Many engine paths (account / cost / exchange-rate calculations) use
decimal.Decimal internally even though the values are stored as
fixed-point integers. The reason is rounding control. Don’t push
floats through these paths.
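A short illustration of what "rounding control" buys, using the stdlib decimal module. The amounts are made up; which rounding mode any given engine path uses is not stated here.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cent = Decimal("0.01")
fee = Decimal("2.665")

# With Decimal the tie is real, and the caller decides how it breaks.
half_even = fee.quantize(cent, rounding=ROUND_HALF_EVEN)  # tie goes to the even digit
half_up = fee.quantize(cent, rounding=ROUND_HALF_UP)      # tie goes away from zero

# The float literal 2.675 is stored slightly below 2.675, so the tie is
# already gone before any rounding rule is applied.
float_rounded = round(2.675, 2)
decimal_rounded = Decimal("2.675").quantize(cent, rounding=ROUND_HALF_UP)

print(half_even, half_up, float_rounded, decimal_rounded)
```

Once a value has passed through a float, no choice of rounding mode can recover the tie.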
Adapters
Adapter shape (post-chapter 8)
Every adapter has at minimum:
- *HttpClient (Rust, hyper-based).
- *WebSocketClient (Rust, tokio-tungstenite).
- *DataClient and *ExecutionClient (PyO3-bound).
- *InstrumentProvider.
- *Config classes (bon::Builder since v1.225).
If you write a new adapter, closely follow an existing Rust adapter
(e.g. bybit/ or okx/). The “split-client” pattern, “AuthTracker”,
and “data event emission” guidelines are documented in
docs/developer_guide/.
Reconnection is infinite with exponential backoff
Don’t try to limit retries. v1.211–v1.218 removed every
max_ws_reconnection_tries config. To stop a client, call
disconnect(). (Chapter 13.)
RetryManagerPool for HTTP retries
Don’t hand-roll exponential backoff in your adapter’s HTTP path. Use
the shared RetryManagerPool, which adds jitter, error categorisation,
and consistent logging across adapters. (Chapter 9.)
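For readers who have not met the term, this is roughly what "exponential backoff with jitter" computes. It is a generic sketch, not the RetryManagerPool API; the function name, parameters, and defaults are all assumptions.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, jitter: float = 0.25,
                   seed: Optional[int] = None) -> list[float]:
    """Return capped exponential delays, each stretched by up to `jitter` of itself."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)  # grow geometrically, then flatten at the cap
        delays.append(delay + rng.uniform(0.0, jitter * delay))  # spread retries apart
    return delays


plain = backoff_delays(8, jitter=0.0)
jittered = backoff_delays(8, seed=1)
print(plain)
```

The jitter keeps many clients that failed at the same moment from retrying at the same moment. In an adapter, this logic belongs to the shared pool, not to your HTTP path.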
F_LAST and F_SNAPSHOT flags on order-book deltas
If your adapter sends batched deltas, set F_LAST only on the last
delta in the batch; on the rest, leave it clear. The data engine
buffers until F_LAST. (Chapter 9.) Polymarket / Bybit / dYdX bugs in
old releases were almost all flag-related.
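A sketch of the buffering rule described above. The DeltaBuffer class is hypothetical, deltas are plain strings, and the bit value used for F_LAST is an assumption for illustration; take the real constant from the library.

```python
from typing import Optional

F_LAST = 1 << 7  # assumed bit value, for illustration only


class DeltaBuffer:
    """Hold deltas until one carrying F_LAST arrives, then release the whole batch."""

    def __init__(self) -> None:
        self._pending: list[str] = []

    def push(self, delta: str, flags: int) -> Optional[list[str]]:
        self._pending.append(delta)
        if flags & F_LAST:
            batch, self._pending = self._pending, []
            return batch
        return None  # still buffering


buf = DeltaBuffer()
batch = ["bid 100 x 5", "bid 99 x 7", "ask 101 x 3"]
# Correct adapter behaviour: the flag is set on the final delta only.
released = [buf.push(d, F_LAST if i == len(batch) - 1 else 0) for i, d in enumerate(batch)]

# Buggy adapter behaviour: the flag is never set, so nothing is ever released.
stuck = DeltaBuffer()
never = [stuck.push(d, 0) for d in batch]
```

Setting the flag on every delta fails the other way: each delta is released alone and subscribers see a partial book.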
Adapters die
Don’t read inclusion as a permanent commitment. FTX, Coinbase Intl, dYdX v3, and CCXT have all been removed. Build your own internal forks if you need something the project drops.
Reconciliation (live trading)
Inferred orders / fills
When the venue reports state the engine doesn’t have (because of a
reconnection gap), the engine creates synthetic orders / fills
with venue IDs and Inferred=True to align state. This was
hardened from v1.197 through v1.220 (chapters 9–14). Don’t filter
inferred events out of your audit log — they explain why your
state matches the venue.
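A deliberately simplified sketch of the arithmetic behind an inferred fill: the gap between what the venue reports as filled and what local state has recorded. The function is hypothetical and ignores everything the real engine has to handle (prices, fees, ordering, partial reports).

```python
from decimal import Decimal
from typing import Optional


def inferred_fill_qty(local_filled: Decimal, venue_filled: Decimal) -> Optional[Decimal]:
    """Return the quantity a synthetic fill would need to carry, or None if nothing is missing."""
    if venue_filled < local_filled:
        # Local state is ahead of the venue; a synthetic fill cannot explain that.
        raise ValueError("venue reports less filled than local state")
    missing = venue_filled - local_filled
    return missing if missing > 0 else None


gap = inferred_fill_qty(Decimal("1.0"), Decimal("1.5"))
aligned = inferred_fill_qty(Decimal("1.5"), Decimal("1.5"))
```

An inferred event in the audit log is the record of exactly this gap, which is why filtering it out makes the final state look unexplained.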
OrderStatusReport is what the venue thinks; Order is what you think
Reconciliation reconciles the two. Both can be wrong (your view is stale, the venue’s report can be inconsistent across endpoints). The engine’s job is to converge them.
Hard-crash on unexpected exception, not graceful
The default graceful_shutdown_on_exception is False. Live engines
crash on unexpected errors so the supervisor can restart them. Don’t
flip this on without thinking — graceful handling of an unhandled
exception is its own bug class. (Chapter 13.)
Persistence
Catalog (Parquet) and Cache (Redis)
- The catalog is on-disk: backtest data, reproducible streams, archives. ParquetDataCatalog is the type to use (v1.225 dropped the V2 suffix).
- The cache is in-memory + optional Redis backing. Live state. After v1.182 (chapter 8), Redis access goes through Rust on a dedicated tokio task, so calls return before the Redis write succeeds. Don’t assume durability without flush.
Catalog v2 schemas (chapter 7) and high-precision schema break (chapter 11)
If you have old catalog data, you may need to migrate. The migration guides are in the docs. Don’t try to read v1 / pre-high-precision data with current readers: you’ll see corruption rather than a clean error.
Streaming Feather Writer is the live → catalog path
Strategies / actors that want to record live data should use
StreamingFeatherWriter (or its v2-renamed-to-V0 form, chapter 15).
Don’t roll your own.
Configuration
bon::Builder is the canonical pattern (chapter 15)
    let config = MyConfig::builder()
        .important_field(value)
        .build();
Hand-rolled ::new constructors are removed. Default::default()
delegates to builder().build(). Fields with sensible defaults are
plain T with #[builder(default)], not Option<T>.
msgspec for Python configs
Python config classes are msgspec.Struct. Don’t use dataclass or
pydantic (the latter was removed in v1.161, chapter 6).
Config classes serialize over the wire
Configs are sent over the message bus (e.g. for live engine distribution). They must be msgpack/msgspec-serializable. If you add a field that isn’t, encoding fails at runtime.
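One way to catch this in a test rather than at runtime is an encode/decode round trip. The sketch below uses stdlib json as a stand-in encoder and a plain dict as a stand-in config; the helper name is hypothetical, and a real check would use the project's own encoder and config classes.

```python
import json


def assert_wire_safe(config: dict) -> dict:
    """Round-trip a config through an encoder and check that nothing changed."""
    decoded = json.loads(json.dumps(config))  # raises TypeError on unencodable fields
    if decoded != config:
        raise ValueError("config does not survive an encode/decode round trip")
    return decoded


ok = assert_wire_safe({"venue": "BINANCE", "max_retries": 3})

try:
    assert_wire_safe({"on_fill": print})  # a callable cannot be encoded
    failed = False
except TypeError:
    failed = True
```

The second check matters as much as the first: a tuple, for example, encodes without error here but comes back as a list.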
Testing
Property-based tests are expected (chapter 13)
Value types, OrderBook, Throttler, UnixNanos, network primitives
all have property tests. If you add a new invariant, add a proptest!
block.
Chaos / DST testing exists
Chapter 13 added turmoil for socket-client chaos tests. Chapter 15
adds deterministic simulation testing behind a simulation feature.
For high-stakes Rust changes (engine-level logic), expect to run DST
or write a chaos-test scenario.
Testing is split
- Rust unit tests: cargo nextest run from the workspace root.
- Python tests: uv run pytest (or pytest -n auto for parallel).
- Integration tests under tests/ use real (sandboxed) venue clients for some adapters. They require credentials in env vars.
Logging
One log sink, configured from LoggingConfig
After v1.183 (chapter 8) the Rust log crate is the core logger.
Python’s LoggerAdapter writes through it. Don’t use Python stdlib
logging for hot-path events.
Log file rotation (chapter 13)
max_file_size + max_backup_count config options on the logger.
tracing-subscriber integration (chapter 14)
use_tracing=True in LoggingConfig enables tracing for external
Rust libraries. Filter via RUST_LOG env var — not via in-config
log-level options.
Cryptography & secrets
aws-lc-rs, no native OpenSSL
Pure-Rust crypto + aws-lc-rs (FIPS-validated). The build does not
need system OpenSSL. (Chapter 13.)
Credentials are zeroized on drop
Every adapter’s Credential is Box<str> (heap-allocated) with
zeroize-on-drop. Debug impls redact secrets. (Chapter 14.) Don’t
store credentials in Ustr (interned, lives forever) or
ordinary String (no automatic zeroization).
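If you handle secrets on the Python side, the same two habits apply: redact on display and overwrite when done. The Secret class below is a hypothetical sketch, not the adapters' Credential type. Python gives weaker guarantees than Rust here, because the str passed to the constructor may remain in memory until it is garbage-collected.

```python
class Secret:
    """Hypothetical holder that redacts on repr and overwrites its buffer on clear()."""

    def __init__(self, value: str) -> None:
        self._buf = bytearray(value.encode("utf-8"))  # mutable, so it can be overwritten

    def expose(self) -> str:
        """Return the secret; call only at the point of use."""
        return self._buf.decode("utf-8")

    def clear(self) -> None:
        """Overwrite the buffer in place with zero bytes."""
        for i in range(len(self._buf)):
            self._buf[i] = 0

    def __repr__(self) -> str:
        return "Secret(<redacted>)"


api_key = Secret("abc123")
shown = repr(api_key)   # safe to log
api_key.clear()
```

The redacted repr is the part that pays off most often: it keeps secrets out of logs and stack traces.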
Performance gotchas
MessageBus topic interning
Topics are Ustr-interned. Don’t construct topics in tight loops with
string formatting — pre-build them or rely on the engine’s caching.
MPSC backpressure
The Rust runtime uses MPSC channels between Python adapters and
engine internals. Slow consumers see backpressure as try_send
failures. They surface in logs but not as Python exceptions —
monitor your logs.
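The failure mode is easy to reproduce with a bounded stdlib queue, used here as an analogy for a bounded channel. The capacity and message counts are arbitrary, and put_nowait plays the role of try_send.

```python
import queue

channel: "queue.Queue[int]" = queue.Queue(maxsize=3)  # bounded, like a bounded channel
dropped = 0

for msg in range(5):          # producer runs while no consumer is draining
    try:
        channel.put_nowait(msg)   # never blocks; fails immediately when full
    except queue.Full:
        dropped += 1              # the point where a log line, not an exception, would appear

print(channel.qsize(), dropped)
```

Nothing in the producer's control flow changes when messages are dropped, which is why the logs are the only place the problem shows up.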
Broadcast channels lose data on slow subscribers
Tick-rate paths use tokio::sync::broadcast. Slow subscribers miss
messages by design. The engine tells you — don’t ignore the
“falling behind” warnings.
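A toy model of the lossy behaviour, assuming a fixed-capacity ring where new messages overwrite the oldest unread ones. The LossyRing class is hypothetical and models a single slow subscriber; it is not how tokio implements the channel.

```python
from collections import deque


class LossyRing:
    """Fixed-capacity buffer: when full, the oldest unread message is overwritten."""

    def __init__(self, capacity: int) -> None:
        self._buf: deque = deque(maxlen=capacity)
        self.lagged = 0  # how many messages the subscriber never saw

    def send(self, msg: int) -> None:
        if len(self._buf) == self._buf.maxlen:
            self.lagged += 1  # appending now evicts the oldest unread message
        self._buf.append(msg)

    def drain(self) -> list[int]:
        out = list(self._buf)
        self._buf.clear()
        return out


ring = LossyRing(capacity=4)
for tick in range(10):   # ten ticks arrive before the subscriber reads once
    ring.send(tick)

seen = ring.drain()
print(seen, ring.lagged)
```

The subscriber gets the newest data and a count of what it missed. That count is what the warning reports.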
Common bugs in adapter PRs
These show up across release notes’ Fixes sections:
- Missing F_LAST flag on terminal delta of a batch: book batching breaks; subscribers see partial book.
- WebSocket subscribe ACKs that confirm all pending topics instead of the acknowledged one.
- Empty-string deserializer panics (Bybit position side, dYdX account state, Binance account state): the venue sends "", the parser expects an enum.
- Random UUIDs as TradeId fallbacks: must be deterministic hashes of the trade fields, otherwise reconciliation diverges across restarts.
- block_on inside an async context: adapter query_account panics; use spawn_task.
- Refcount leaks on subscription failure: when subscribe fails, the subscription state must roll back.
- OrderBook L1 stale event mutation: corrupts bid/ask; property tests catch this.
- due_post_only not set on post-only rejection: strategies can’t distinguish post-only from real rejection.
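The TradeId item in the list above is the easiest to get right up front. A minimal sketch of a deterministic fallback ID follows; the function name, field choice, separator, and truncation length are assumptions, and a real adapter should follow whatever convention the existing adapters use.

```python
import hashlib


def fallback_trade_id(symbol: str, ts_event: int, price: str, size: str, side: str) -> str:
    """Derive a stable ID from the trade's own fields, so a restart yields the same ID."""
    payload = "|".join([symbol, str(ts_event), price, size, side])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


first = fallback_trade_id("BTCUSDT", 1_700_000_000_000_000_000, "42000.5", "0.01", "BUY")
again = fallback_trade_id("BTCUSDT", 1_700_000_000_000_000_000, "42000.5", "0.01", "BUY")
other = fallback_trade_id("BTCUSDT", 1_700_000_000_000_000_000, "42000.5", "0.01", "SELL")
```

A random UUID would make first and again differ, so the same venue trade seen before and after a restart would look like two trades.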
When you’re stuck
- Read the relevant chapter in chapters/.
- Check RELEASES.md for the version that introduced the API you’re using.
- Search the docs/concepts/ directory; there’s usually a guide.
- Search GitHub issues / PRs for the symptom; there’s usually a prior fix.
- Property tests and chaos tests exist; consider whether you can reproduce in one of those harnesses.
- The maintainer culture is “fail fast, post a stack trace, don’t guess.” If your PR adds an unwrap() or silently swallows an error, expect review pushback.