Cluster Operations Guide
This guide covers the cluster lifecycle: bootstrapping a new cluster, inspecting its state, watching member health, decommissioning members, and dropping a cluster. It is intended for operators.
All KR.ADMIN commands run on the management interface (default port 3320). The examples below use
RESP3 output.
Concepts
Shard status controls whether a shard accepts traffic:
- READWRITE: accepts reads and writes.
- READONLY: accepts reads, rejects writes.
- INOPERABLE: rejects all access.
Member status reflects a node’s lifecycle state:
- RUNNING: the member is active.
- UNAVAILABLE: the member is registered but not reachable.
- STOPPED: the member has been shut down or marked for removal.
- UNKNOWN: the member’s state has not been determined.
Roles describe a member’s relationship to a shard. The primary owns the shard and serves writes.
A standby replicates from the primary and can be promoted to primary.
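As a client-side model of the shard statuses defined in this section, the sketch below encodes each status as a pair of read/write permissions. The class and function names are illustrative only and are not part of any Kronotop API.

```python
from enum import Enum


class ShardStatus(Enum):
    """Shard statuses from this guide, stored as (accepts_reads, accepts_writes)."""

    READWRITE = (True, True)
    READONLY = (True, False)
    INOPERABLE = (False, False)

    @property
    def accepts_reads(self) -> bool:
        return self.value[0]

    @property
    def accepts_writes(self) -> bool:
        return self.value[1]


def can_serve(status_name: str, is_write: bool) -> bool:
    """Return True if a shard with the given status accepts this kind of request."""
    status = ShardStatus[status_name]  # raises KeyError for an unknown status
    return status.accepts_writes if is_write else status.accepts_reads
```

A monitoring script could use `can_serve` to turn the `status` field of a DESCRIBE-SHARD reply into an alert condition.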
Bringing Up a New Cluster
A freshly initialized cluster cannot serve traffic. Every shard starts as INOPERABLE with no routes
assigned. The steps below must run in order: a shard must have a primary before it can be opened for
writes.
1. Initialize the cluster. Run this exactly once. It is the only KR.ADMIN subcommand that does not require a pre-initialized cluster.

       127.0.0.1:3320> KR.ADMIN INITIALIZE-CLUSTER
       OK

2. List the members to get their IDs.

       127.0.0.1:3320> KR.ADMIN LIST-MEMBERS
       1# "006cdc459c59e600c76494e8388857fc3cba2fa8" =>
          1# "status" => "RUNNING"
          2# "process_id" => "A1B2C3D4E5F6G7H8I9J0"
          3# "external_host" => "10.0.0.1"
          4# "external_port" => (integer) 5484
          5# "internal_host" => "10.0.0.1"
          6# "internal_port" => (integer) 3320
          7# "latest_heartbeat" => (integer) 31404

3. Assign a primary to each shard. A four-character member ID prefix is accepted in place of the full forty-character ID.

       127.0.0.1:3320> KR.ADMIN ROUTE SET PRIMARY BUCKET 0 006cdc459c59e600c76494e8388857fc3cba2fa8
       OK

4. Add standbys (optional). A primary must already be assigned to the shard.

       127.0.0.1:3320> KR.ADMIN ROUTE SET STANDBY BUCKET 0 a3f18b2e74d9c5601f82e4a7b390d612c8f7e149
       OK

5. Open the shards for traffic. Passing * as the shard ID applies the status to every bucket shard in a single transaction.

       127.0.0.1:3320> KR.ADMIN SET-SHARD-STATUS BUCKET * READWRITE
       OK

6. Verify the topology.

       127.0.0.1:3320> KR.ADMIN DESCRIBE-CLUSTER
       1# "metadata_version" => "1.0.0"
       2# "cluster_name" => "my-cluster"
       3# "bucket" =>
          1# (integer) 0 =>
             1# "primary" => "006cdc459c59e600c76494e8388857fc3cba2fa8"
             2# "standbys" =>
                1) "a3f18b2e74d9c5601f82e4a7b390d612c8f7e149"
             3# "status" => "READWRITE"
             4# "linked_volumes" =>
                1) "bucket-shard-0"

ROUTE SET PRIMARY also accepts * for the shard ID to assign the same member as primary across all
bucket shards atomically. If any shard fails a pre-condition, the whole operation rolls back.
For full parameter and error details, see KR.ADMIN INITIALIZE-CLUSTER, KR.ADMIN ROUTE, and KR.ADMIN SET-SHARD-STATUS.
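The ordering constraint on the routing steps above (primary first, then standbys, then open for traffic) can be captured in a small helper that only builds the command argument lists. This is a minimal sketch: it assumes INITIALIZE-CLUSTER has already run and that the member IDs were taken from LIST-MEMBERS. The function name and the `standbys_by_shard` parameter are hypothetical, and sending the commands is left to whatever RESP client is in use.

```python
def route_and_open_commands(primary_id, standbys_by_shard=None):
    """Return KR.ADMIN argument tuples in the order this guide requires.

    primary_id: member ID (or four-character prefix) to use as primary everywhere.
    standbys_by_shard: optional mapping of bucket shard ID -> list of standby IDs.
    """
    commands = [
        # "*" assigns the same primary to every bucket shard atomically.
        ("KR.ADMIN", "ROUTE", "SET", "PRIMARY", "BUCKET", "*", primary_id),
    ]
    # Standbys are optional and are added only after the primary is assigned.
    for shard_id, standby_ids in sorted((standbys_by_shard or {}).items()):
        for standby_id in standby_ids:
            commands.append(
                ("KR.ADMIN", "ROUTE", "SET", "STANDBY", "BUCKET", str(shard_id), standby_id)
            )
    # Open the shards for traffic last, once every route is in place.
    commands.append(("KR.ADMIN", "SET-SHARD-STATUS", "BUCKET", "*", "READWRITE"))
    return commands
```

Standbys are assigned shard by shard here because this guide documents the * shard ID only for ROUTE SET PRIMARY and SET-SHARD-STATUS.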
Inspecting Cluster State
Five read-only commands answer different questions:
- DESCRIBE-CLUSTER: full topology, every shard’s primary, standbys, status, and linked volumes.
- DESCRIBE-SHARD: the same per-shard detail for a single shard.
- LIST-MEMBERS: every registered member with its host, ports, status, and last heartbeat.
- FIND-MEMBER: metadata for one member, looked up by full ID or four-character prefix.
- DESCRIBE-MEMBER: metadata for the local member, including its own ID.

    127.0.0.1:3320> KR.ADMIN DESCRIBE-SHARD BUCKET 0
    1# "primary" => "006cdc459c59e600c76494e8388857fc3cba2fa8"
    2# "standbys" =>
       1) "a3f18b2e74d9c5601f82e4a7b390d612c8f7e149"
    3# "status" => "READWRITE"
    4# "linked_volumes" =>
       1) "bucket-shard-0"

Health Monitoring
Each member increments its own heartbeat counter every cluster.heartbeat.interval seconds. Other
members read that counter and expect it to keep advancing. When a member’s counter stops moving for
longer than cluster.heartbeat.maximum_silent_period intervals, it is suspected dead.
LIST-SILENT-MEMBERS returns the IDs of the members currently
suspected:

    127.0.0.1:3320> KR.ADMIN LIST-SILENT-MEMBERS
    1) "a3f18b2e74d9c5601f82e4a7b390d612c8f7e149"

This reflects the local failure-detection state of the responding node, not a cluster-wide consensus. A different node may report a different set. If a silent member resumes sending heartbeats, it drops off the list automatically.
LIST-MEMBERS and FIND-MEMBER expose
the raw latest_heartbeat counter, useful for a manual check or a monitoring script. The value is not
a timestamp: it is meaningful only against an earlier reading of the same member. Read it twice, a few
intervals apart. If a member’s counter has not advanced, that member has gone silent.
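The two-reading comparison can be sketched as below. The function assumes the caller has already collected two snapshots of `latest_heartbeat` per member, for example from two LIST-MEMBERS calls a few intervals apart. The function name and the dictionary shape are assumptions of this sketch, not part of the command's reply format.

```python
def stalled_members(earlier, later):
    """Return member IDs whose latest_heartbeat did not advance between readings.

    Both arguments map member ID -> latest_heartbeat (an integer counter).
    """
    stalled = []
    for member_id, previous in earlier.items():
        current = later.get(member_id)
        # Members missing from the second reading are skipped: there is
        # nothing to compare against.
        if current is None:
            continue
        # Assumption of this sketch: a lower value is also treated as
        # "not advanced", since the guide only describes a counter that
        # moves forward.
        if current <= previous:
            stalled.append(member_id)
    return sorted(stalled)
```

Because the counter is not a timestamp, the helper never compares a value against wall-clock time, only against the earlier reading of the same member.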

    127.0.0.1:3320> KR.ADMIN FIND-MEMBER ad14d838a2fa6bf2b87bf7872dbeb63bec03898b
    1# "status" => "RUNNING"
    2# "process_id" => "0000085BAE3Q20000000xxxx"
    3# "external_host" => "172.20.0.4"
    4# "external_port" => (integer) 5484
    5# "internal_host" => "172.20.0.4"
    6# "internal_port" => (integer) 3320
    7# "latest_heartbeat" => (integer) 31396

Decommissioning a Member
A member in RUNNING status cannot be removed. This guard prevents removing an active node by
accident. Stop it first, then remove it:
1. Mark the member as stopped.

       127.0.0.1:3320> KR.ADMIN SET-MEMBER-STATUS 006cdc459c59e600c76494e8388857fc3cba2fa8 STOPPED
       OK

2. Remove it.

       127.0.0.1:3320> KR.ADMIN REMOVE-MEMBER 006cdc459c59e600c76494e8388857fc3cba2fa8
       OK

Both commands accept a four-character ID prefix instead of the full ID. The change propagates to the other members promptly.
Before removing a member, reassign any shards it owns as primary. See KR.ADMIN SET-MEMBER-STATUS and KR.ADMIN REMOVE-MEMBER.
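A pre-removal check for primary ownership might look like the sketch below. It assumes the DESCRIBE-CLUSTER reply has been decoded by the client into nested dictionaries with the same keys shown earlier in this guide ("bucket", then shard ID, then "primary"). That decoding step and the function name are assumptions, not documented behavior.

```python
def shards_led_by(topology, member_id):
    """Return bucket shard IDs whose primary matches member_id.

    member_id may be a full ID or a prefix, mirroring how the admin
    commands accept a four-character prefix.
    """
    if not member_id:
        raise ValueError("member_id must not be empty")
    owned = []
    for shard_id, shard in topology.get("bucket", {}).items():
        # A shard without a route may have no primary at all.
        primary = shard.get("primary") or ""
        if primary.startswith(member_id):
            owned.append(shard_id)
    return sorted(owned)
```

An empty result means the member owns no bucket shard as primary, so no reassignment is needed before SET-MEMBER-STATUS and REMOVE-MEMBER.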
Dropping a Cluster
DROP-CLUSTER removes a cluster’s entire metadata tree from
FoundationDB. It is a two-phase command to prevent accidental deletion.
1. Request a token. Pass only the cluster name. The name must match the running node’s configured cluster name. The returned token is valid for 60 seconds.

       127.0.0.1:3320> KR.ADMIN DROP-CLUSTER my-cluster
       "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

2. Confirm the deletion. Pass the cluster name and the token.

       127.0.0.1:3320> KR.ADMIN DROP-CLUSTER my-cluster a1b2c3d4-e5f6-7890-abcd-ef1234567890
       OK

Notes:
- This removes metadata from FoundationDB only. Volume segment files on local disk are left in place and must be cleaned up by hand.
- Multiple Kronotop clusters can share one FoundationDB instance. The command drops only the named cluster; the others are untouched.
- After a drop, nodes that belonged to the cluster are in an inconsistent state. Stop them.
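The two-phase DROP-CLUSTER exchange described above can be sketched as a function that takes a `send` callable. `send` is hypothetical: it stands for any function that forwards its arguments as one command to the management port and returns the decoded reply. The sketch does not check the 60-second token lifetime and simply confirms immediately.

```python
def drop_cluster(send, cluster_name):
    """Run both DROP-CLUSTER phases through `send` and return the final reply."""
    # Phase 1: pass only the cluster name to obtain a confirmation token.
    token = send("KR.ADMIN", "DROP-CLUSTER", cluster_name)
    # Some clients return bulk strings as bytes; normalize to str.
    if isinstance(token, bytes):
        token = token.decode()
    # Phase 2: repeat the command with the cluster name and the token.
    return send("KR.ADMIN", "DROP-CLUSTER", cluster_name, token)
```

Scripting both phases removes the pause that the two-phase design is meant to provide, so a helper like this fits disposable test clusters better than production ones.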