Client failover concepts
Client failover is most useful with QuestDB Enterprise primary-replica replication. OSS users with a single instance gain limited benefit from multi-host configuration.
Client-side failover support is currently available in the Java client. Additional language clients are on the roadmap.
When a QuestDB cluster fails over from one primary to another — whether through a planned promotion, a rolling upgrade, or an unplanned outage — clients with a single hard-coded address must be reconfigured and restarted. A failover-aware client instead carries the full list of peers and walks that list automatically when the current connection breaks.
This page explains the model. The user-facing knobs and worked examples live on the Configuration page.
What failover does
You give the client a comma-separated list of endpoints:
addr=node-a:9000,node-b:9000,node-c:9000
The client picks one, connects, and uses it until that connection breaks. When it breaks, the client walks the rest of the list, classifies what it found at each host, and either reconnects or surfaces a failure to your code. The exact loop that drives this depends on whether you are ingesting (long-lived background reconnect) or querying (per-request retry budget). Both loops share the same primitives described here.
Host health model
For every entry in addr, the client tracks two attributes: a state and a
zone tier.
State
The state records what the client most recently observed when it tried that host.
| State | When the client moves a host here |
|---|---|
| Healthy | The last connect attempt succeeded. |
| Unknown | The host has not been tried in this round, or its classification was reset. |
| TransientReject | The server returned 421 with X-QuestDB-Role: PRIMARY_CATCHUP — it is a primary that is still catching up after promotion. Expected to recover. |
| TransportError | TCP/TLS handshake failed, an HTTP upgrade returned a transient error code, or an established connection broke mid-stream. |
| TopologyReject | The server returned 421 with a role that cannot satisfy the requested target= filter — for example, a REPLICA when you asked for target=primary. The host will not become writable without a topology change. |
States nearer the top of the table are preferred when the client picks the next host to try.
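As an illustration only, that priority order can be pictured as an enum whose ordinal follows the table. The type and names below are a sketch, not the client's actual source:

```java
// Illustrative sketch: the state priority as an ordered enum.
// A smaller ordinal means a more attractive candidate on the next pick.
enum HostState {
    HEALTHY,          // last connect attempt succeeded
    UNKNOWN,          // not tried this round, or classification was reset
    TRANSIENT_REJECT, // 421 + PRIMARY_CATCHUP, expected to recover
    TRANSPORT_ERROR,  // TCP/TLS/upgrade failure or a mid-stream break
    TOPOLOGY_REJECT   // role can never satisfy the target= filter
}
```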
Zone tier
Each host is also classified relative to the client's configured zone=:
| Zone tier | Meaning |
|---|---|
Same | Server's advertised zone matches the client's zone= (case-insensitive), or zone= is unset, or target=primary. |
Unknown | Server has not advertised a zone yet. |
Other | Server advertised a different zone. |
Zone information is advertised by the server on a successful upgrade and
(starting in QWP v2) on 421 rejects. The client remembers it for the lifetime
of the connection.
target=primary collapses every host's zone tier to Same — writers must
follow the primary regardless of geography. Ingress is currently zone-blind in
both storage modes, so the zone= key is silently accepted on ingress
connections and only takes effect on egress.
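A minimal sketch of that classification rule, assuming the zone is read from the server's advertisement; the helper name and parameters are illustrative:

```java
enum ZoneTier { SAME, UNKNOWN, OTHER }

// Illustrative sketch of the zone-tier rule described above.
static ZoneTier classifyZone(String advertisedZone, String configuredZone, boolean targetPrimary) {
    // target=primary collapses every host to Same; so does an unset zone=.
    if (targetPrimary || configuredZone == null || configuredZone.isEmpty()) {
        return ZoneTier.SAME;
    }
    if (advertisedZone == null) {
        return ZoneTier.UNKNOWN; // server has not advertised a zone yet
    }
    return advertisedZone.equalsIgnoreCase(configuredZone) ? ZoneTier.SAME : ZoneTier.OTHER;
}
```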
Selection priority
When the client needs to pick the next host, it sorts by the tuple (state, zone_tier) lexicographically — state first, zone second. So a known-good host
in another zone wins against an untried local host. Within a tied bucket, the
order in your addr= list is preserved verbatim.
The client does not shuffle, randomise, or load-balance across peers. Cluster-level load balancing is the responsibility of QuestDB's server-side coordinators. If you need a different first-pick distribution across many simultaneously-starting clients, rotate the connect string at deployment time.
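A minimal sketch of the selection step, assuming the HostState and ZoneTier enums sketched above; the record and method names are illustrative. A stable sort keeps the addr= order within tied buckets:

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: pick the next host by sorting on (state, zoneTier).
record Host(String addr, HostState state, ZoneTier zone) {}

static Host pickNext(List<Host> hostsInAddrOrder) {
    // Stream.sorted is stable, so the original addr= order survives within ties.
    return hostsInAddrOrder.stream()
            .sorted(Comparator.comparing(Host::state)    // state first: Healthy wins
                              .thenComparing(Host::zone)) // then zone: Same, Unknown, Other
            .findFirst()
            .orElseThrow();
}
```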
Sticky-Healthy across rounds
Once the client lands on a Healthy host, that host stays the priority pick on
the next round of failover — provided its zone tier is still Same. This
avoids unnecessary churn after a short blip: a momentary network glitch
doesn't promote a different node into the active slot just because it
happened to be probed first.
Healthy hosts in another zone are reset to Unknown between outages rather
than kept sticky. Otherwise a once-healthy cross-zone host would lock the
client out of probing local hosts after they recover.
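A sketch of that carry-over rule between outages, again assuming the enums above; the helper name is illustrative:

```java
// Illustrative sketch: between outages, a same-zone Healthy host stays sticky,
// while a cross-zone Healthy host is reset to Unknown so that recovered local
// hosts get probed again.
static HostState carryOver(HostState state, ZoneTier zone) {
    if (state == HostState.HEALTHY && zone != ZoneTier.SAME) {
        return HostState.UNKNOWN;
    }
    return state;
}
```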
Role filter (target=)
The target= key controls which server role the client is willing to bind to:
| target= | STANDALONE | PRIMARY | REPLICA | PRIMARY_CATCHUP |
|---|---|---|---|---|
| any (default) | accept | accept | accept | accept (transient) |
| primary | accept | accept | reject (topology) | accept (transient) |
| replica | reject (topology) | reject (topology) | accept | reject (topology) |
PRIMARY_CATCHUP is a primary that has been promoted but has not yet caught
up to its predecessor's WAL — the client treats it as transient and retries
the same host (with a fresh round, no exponential backoff) until it either
becomes a full PRIMARY or the outage budget expires.
A 421 Misdirected Request response without an X-QuestDB-Role header
is treated as a generic transport error, not a role reject — the client walks
to the next host but does not pin the rejecting host as topology-unreachable.
target=replica is intended for read-side workloads that explicitly want to
spread query load across read-only peers (see the egress flow below).
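The acceptance table can be read as a small decision function. The sketch below mirrors the table; the enum and method names are illustrative, and "accept (transient)" is modelled as its own outcome:

```java
enum Target { ANY, PRIMARY, REPLICA }
enum Role   { STANDALONE, PRIMARY, REPLICA, PRIMARY_CATCHUP }
enum RoleDecision { ACCEPT, ACCEPT_TRANSIENT, REJECT_TOPOLOGY }

// Illustrative sketch of the target= acceptance table.
static RoleDecision decide(Target target, Role role) {
    switch (target) {
        case ANY:
            return role == Role.PRIMARY_CATCHUP ? RoleDecision.ACCEPT_TRANSIENT : RoleDecision.ACCEPT;
        case PRIMARY:
            if (role == Role.REPLICA)         return RoleDecision.REJECT_TOPOLOGY;
            if (role == Role.PRIMARY_CATCHUP) return RoleDecision.ACCEPT_TRANSIENT;
            return RoleDecision.ACCEPT;       // STANDALONE or PRIMARY
        case REPLICA:
            return role == Role.REPLICA ? RoleDecision.ACCEPT : RoleDecision.REJECT_TOPOLOGY;
        default:
            throw new IllegalArgumentException("unknown target: " + target);
    }
}
```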
Two failover contexts
Failover applies to both directions of QWP traffic, but the two contexts have very different goals.
Ingress (writes)
The ingress reconnect loop sits inside the store-and-forward I/O thread. It runs continuously in the background, retrying through outages while the producer keeps appending to the local buffer. The defaults are tuned for throughput-oriented workloads that can tolerate minutes of server unavailability:
- Initial backoff: 100 ms
- Maximum backoff: 5 s
- Per-outage budget: 5 minutes (`reconnect_max_duration_millis`)
- Jitter: equal-jitter `[base, 2·base)` — non-zero lower bound damps reconnect storms when many producers share a cluster
- Inter-host pause within a round: none — the client walks the full address list as fast as `auth_timeout_ms` allows, paying one backoff sleep at round exhaustion
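As a sketch of the equal-jitter rule under these defaults (the constant and method names are illustrative, and doubling of the base toward the cap is an assumption about how it grows):

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch: ingress backoff with equal jitter in [base, 2*base).
static final long INITIAL_BACKOFF_MS = 100;   // initial backoff
static final long MAX_BACKOFF_MS     = 5_000; // maximum backoff

static long nextIngressSleepMs(long currentBaseMs) {
    long base = Math.min(currentBaseMs, MAX_BACKOFF_MS);
    return base + ThreadLocalRandom.current().nextLong(base); // never below base
}
```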
See the store-and-forward concepts page for how the reconnect loop interacts with the disk-backed segment ring.
Egress (queries)
The egress failover loop wraps each Execute() call on the read-side query
client. It is interactive: a slow failover is worse than a clear error, so
the budget is short:
- Initial backoff: 50 ms
- Maximum backoff: 1 s
- Total wall-clock budget: 30 s (`failover_max_duration_ms`)
- Attempt cap: 8 (`failover_max_attempts`)
- Jitter: full-jitter `[0, base)` — a single-user query benefits from the lowest expected recovery time, and one client per workload removes the thundering-herd concern
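A sketch of how those numbers combine into a per-request budget. The shape of the loop and the growth factor are assumptions; Execute() is represented by a generic supplier, and terminal or topology errors would bypass this path as described below:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Illustrative sketch: the egress retry budget with full jitter in [0, base).
static <T> T executeWithFailover(Supplier<T> attempt) throws InterruptedException {
    final long deadline   = System.currentTimeMillis() + 30_000; // failover_max_duration_ms
    final int maxAttempts = 8;                                    // failover_max_attempts
    long backoffMs = 50;                                          // initial backoff

    for (int tries = 1; ; tries++) {
        try {
            return attempt.get();
        } catch (RuntimeException transientError) {
            if (tries >= maxAttempts || System.currentTimeMillis() >= deadline) {
                throw transientError;                             // budget exhausted
            }
            Thread.sleep(ThreadLocalRandom.current().nextLong(backoffMs)); // full jitter
            backoffMs = Math.min(backoffMs * 2, 1_000);           // grow toward the 1 s cap
        }
    }
}
```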
The egress loop also respects the target= role filter and prefers same-zone
hosts when zone= is set.
Error classification
Every error the client encounters falls into one of three buckets, which drives the loop's response:
Terminal — bypass failover
The client surfaces the error to your code immediately. Retrying every host will not help.
| Condition | Why terminal |
|---|---|
| HTTP 401 / 403 on upgrade | Credentials are cluster-wide; retrying floods server logs without recovery. |
| Server-status reject (SF) | Application-layer reject; replay reproduces the same response. |
Topology — handled inside the round
The host is demoted in the priority lattice; the client walks to the next host within the same round. No exponential backoff is consumed.
- 421 + `X-QuestDB-Role: PRIMARY_CATCHUP` → TransientReject
- 421 + any other recognised role → TopologyReject
- `SERVER_INFO.Role` does not match the requested `target=`
If every host in a round role-rejects, ingress pays one fixed backoff sleep
(reset to InitialBackoff, no doubling) and starts a fresh round; egress
fails the current Execute() call.
Transient — enter backoff
Everything else: TCP/TLS errors, auth_timeout_ms expiry, mid-stream send or
receive failures, 404 / 426 / 503 on upgrade, version mismatches
(per-endpoint — a rolling upgrade in flight does not lock out compatible
peers), and generic frame-decode errors. The client records TransportError
and walks to the next host.
When a round exhausts with transient errors, the client sleeps for the backoff interval (clamped to the remaining outage budget) and starts the next round.
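A partial sketch of the three-way split, keyed off the upgrade response; the function shape and parameters are illustrative and omit the server-status reject path:

```java
enum ErrorClass { TERMINAL, TOPOLOGY, TRANSIENT }

// Illustrative sketch: classify an upgrade failure into the three buckets above.
static ErrorClass classify(Integer httpStatus, String questdbRoleHeader) {
    if (httpStatus != null) {
        if (httpStatus == 401 || httpStatus == 403) {
            return ErrorClass.TERMINAL;   // credentials are cluster-wide
        }
        if (httpStatus == 421 && questdbRoleHeader != null) {
            return ErrorClass.TOPOLOGY;   // role reject, resolved inside the round
        }
        // a 421 without a role header, and 404/426/503, fall through as transient
    }
    return ErrorClass.TRANSIENT;          // TCP/TLS errors, timeouts, mid-stream breaks
}
```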
Mid-stream demotion
If a connection breaks mid-stream — for example, the receive pump throws after
a successful upgrade — the client marks the failed host as TransportError
before picking the next host. Without this ordering, the sticky-Healthy
rule would re-pick the same just-failed host as the priority candidate, and
the next attempt would target the broken node again.
This invariant only matters when you are reading client source code or debugging a custom implementation. As a user, you observe it as "failover moves off a broken node on the very next attempt, with no exponential delay when at least one peer is healthy."
Authentication is cluster-wide
A 401 or 403 on the HTTP upgrade is terminal — the client does not retry
other hosts. The assumption is that auth credentials are configured
identically across the cluster, so a credential failure against one node is
a credential failure against all of them. Retrying would spam every peer's
audit log without recovering.
If your deployment has per-host credentials, that is unsupported and outside the failover model — split the workload into one connect string per credential.
Next steps
- Configuration — the connect-string keys and worked examples for each context.
- Store-and-forward concepts — how the ingress failover loop interacts with the disk-backed substrate.