This is the full developer documentation for Kronotop
# Introduction
> Kronotop is a distributed multi-model database built on [FoundationDB](https://www.foundationdb.org/).
It has two data models behind one RESP interface: **Bucket**, a document model with secondary indexes and vector search, and **ZMap**, an ordered key-value model. Both live inside namespaces and share the same transaction model. Document bodies are stored on the local disk by **Volume**, a segment-based storage engine with primary-standby replication.
## Data Models
**Bucket** stores BSON documents and provides a query language (BQL) with comparison, logical, and array operators. Buckets support single-field, compound, and vector indexes. Vector indexes are powered by [JVector](https://github.com/datastax/jvector), use HNSW with automatic Product Quantization, and support `cosine`, `euclidean`, and `dot_product` distance functions. Results are ranked by similarity first, then filtered with BQL predicates (post-filtering).
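The post-filtering order described above can be sketched in plain Python. This is an illustrative model, not Kronotop's implementation: the `cosine` and `search` helpers and the sample vectors are assumptions made for the example, and a brute-force scan stands in for the approximate HNSW search a real vector index performs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(docs, query_vec, top_k, predicate):
    """Rank by similarity first, then drop candidates that fail the predicate."""
    ranked = sorted(docs, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    candidates = ranked[:top_k]
    return [d for d in candidates if predicate(d)]

# Hypothetical documents with two-dimensional vectors.
docs = [
    {"name": "a", "vec": [1.0, 0.0], "status": "active"},
    {"name": "b", "vec": [0.9, 0.1], "status": "inactive"},
    {"name": "c", "vec": [0.0, 1.0], "status": "active"},
]
hits = search(docs, [1.0, 0.0], top_k=2, predicate=lambda d: d["status"] == "active")
print([d["name"] for d in hits])  # ['a']
```

In this model the filter runs after ranking, so a search can return fewer than `top_k` results: `b` is among the two nearest neighbors but is removed by the predicate.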
**ZMap** is a RESP-compatible proxy over FoundationDB’s ordered key-value API. Keys and values are opaque byte sequences. Keys are stored in lexicographic order. ZMap provides typed numeric operations (int64, float64, decimal128), conflict-free atomic mutations through FoundationDB’s atomic primitives, and range operations over the ordered key space.
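Lexicographic key order has a practical consequence for numeric keys, shown in the short Python sketch below. The key encodings are assumptions chosen for illustration; Kronotop treats keys as opaque bytes and does not prescribe either one.

```python
import struct

# Decimal strings compare byte by byte, so "10" and "100" land before "9".
text_keys = sorted(str(n).encode() for n in (9, 10, 100))

# A fixed-width big-endian encoding keeps byte order equal to numeric order
# for non-negative integers.
packed_keys = sorted(struct.pack(">Q", n) for n in (9, 10, 100))
decoded = [struct.unpack(">Q", key)[0] for key in packed_keys]

print(text_keys)  # [b'10', b'100', b'9']
print(decoded)    # [9, 10, 100]
```

If range operations over numeric keys need to follow numeric order, an order-preserving encoding such as the second one is a reasonable choice.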
## Namespaces
[Namespaces](/docs/namespaces/) are lightweight logical databases built on FoundationDB’s directory layer with hierarchical, dot-separated paths. Each namespace has its own keyspace; buckets, indexes, and ZMap keys in one namespace are invisible to another.
## Transactions
[Transactions](/docs/transactions/) are strictly serializable, inherited from FoundationDB. Each command runs in auto-commit mode by default. `BEGIN` and `COMMIT` group commands into a single atomic unit. A single transaction can atomically span multiple namespaces. Snapshot reads are available for read-heavy workloads where strict serializability is not required.
## Wire Protocol
Kronotop speaks RESP2 and RESP3 and works with existing RESP-compatible clients. `kronotop-cli` or `valkey-cli` can connect directly.
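For readers writing a client by hand, the request framing shared by RESP2 and RESP3 is small enough to sketch. The `encode_command` helper below is a hypothetical name; it shows how a command is sent as an array of bulk strings and is not taken from any Kronotop client library.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument is sent as $<byte length>, CRLF, payload, CRLF.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

print(encode_command("PING"))
# b'*1\r\n$4\r\nPING\r\n'
print(encode_command("BUCKET.CREATE", "orders"))
# b'*2\r\n$13\r\nBUCKET.CREATE\r\n$6\r\norders\r\n'
```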
## Getting Started
The [Quickstart](/docs/quickstart/) starts a minimal cluster with Docker Compose and walks through inserting and querying a document.
# Quickstart
> Start a minimal Kronotop cluster with Docker Compose, insert a document, and query it back.
The fastest way to try Kronotop is with Docker Compose. Prebuilt jars are also published on the [Releases page](https://github.com/kronotop/kronotop/releases); download them and run with `java -jar`, which requires Java 25 or newer. If you want to build Kronotop from source and run a node locally, see the [Building and Running guide](https://github.com/kronotop/kronotop/blob/main/BUILDING.md).
## Start the Cluster
Download the Docker Compose file:
```bash
curl -O https://kronotop.com/kronotop-quickstart.yaml
```
Start the cluster:
```bash
docker compose -f kronotop-quickstart.yaml up
```
This starts a minimal cluster: one FoundationDB instance, one Kronotop primary node, and one standby node. The primary owns bucket shard 0; the standby replicates it for failover.
## Connect
Once all containers are running, connect with `kronotop-cli` on port 3320:
```bash
docker run --rm -it --platform linux/amd64 --network kronotop \
ghcr.io/kronotop/kronotop:latest kronotop-cli -h kronotop-primary -p 3320
```
Alternatively, you can use `kronotop-cli` or `valkey-cli` if you have them installed locally:
```bash
valkey-cli -p 3320
```
Verify the cluster is up:
```kronotop
kronotop-primary:3320> PING
PONG
kronotop-primary:3320> KR.ADMIN DESCRIBE-CLUSTER
1# "metadata_version" => "1.0.0"
2# "cluster_name" => "development"
3# "bucket" =>
1# (integer) 0 =>
1# "primary" => "f627bf46f8627333a064de5c388d0316cc223a54"
2# "standbys" => 1) "e159f73fb21cd6ac80639b2cc1e087e8330cd947"
3# "status" => "READWRITE"
4# "linked_volumes" => 1) "bucket-shard-0"
```
## Session Setup
Port 3320 is the internal/admin port. Client traffic goes through port 5484. Connect to the client port:
```bash
docker run --rm -it --platform linux/amd64 --network kronotop \
ghcr.io/kronotop/kronotop:latest kronotop-cli -h kronotop-primary -p 5484
```
`kronotop-cli` sets the session attributes below automatically. If you are using `valkey-cli`, set them manually for human-readable output:
```kronotop
SESSION.ATTRIBUTE SET input_type json
SESSION.ATTRIBUTE SET reply_type json
SESSION.ATTRIBUTE SET object_id_format hex
```
## Insert and Query a Document
Create a bucket:
```kronotop
kronotop-primary:5484> BUCKET.CREATE orders
OK
```
Insert a document:
```kronotop
kronotop-primary:5484> BUCKET.INSERT orders DOCS '{
"item": "keyboard",
"qty": 2,
"price": 49.99
}'
1) "6a133c8806bf494c9e7e00cb"
```
Query it back with the Bucket Query Language (BQL):
```kronotop
kronotop-primary:5484> BUCKET.QUERY orders '{"qty": {"$gte": 1}}'
1# "cursor_id" => (integer) 1
2# "entries" => 1) {"_id": "6a133c8806bf494c9e7e00cb", "item": "keyboard", "qty": 2, "price": 49.99}
```
## One Transaction, Multiple Models
Kronotop lets different data models and namespaces participate in the same strictly serializable transaction. A single transaction can atomically write a document in one namespace and update a counter in another:
```kronotop
BEGIN
# In the sales namespace: record a new order.
NAMESPACE USE production.sales
BUCKET.INSERT orders DOCS '{"item": "keyboard", "qty": 2, "price": 49.99}'
# In the inventory namespace: decrement stock, conflict-free.
NAMESPACE USE production.inventory
ZINC.I64 keyboard -2
COMMIT
```
Namespaces must exist before use; create them with `NAMESPACE CREATE production.sales` and `NAMESPACE CREATE production.inventory`.
## Next Steps
* [Bucket Tutorial](/docs/bucket/tutorial/): a walkthrough of the Bucket API, from creating a bucket to removing it
* [BQL Reference](/docs/bucket/bql-reference/): the query language in full
* [ZMap](/docs/zmap/): the ordered key-value model
* [Transactions](/docs/transactions/): explicit transactions and snapshot reads
* [Namespaces](/docs/namespaces/): logical isolation and cross-namespace transactions
* [Configuration Reference](/docs/config/): overriding the built-in defaults
# Bucket Tutorial
> A hands-on walkthrough of the Bucket API, from creating your first bucket to removing it.
## Before You Start
You need a running Kronotop cluster with at least one initialized shard. Connect with any RESP-compatible client on port **5484** (the default client port).
All examples in this tutorial use JSON input and output for readability. In production, BSON is recommended because it avoids conversion overhead and supports richer data types.
## Session Setup
Before working with documents, configure the session to use JSON:
```kronotop
SESSION.ATTRIBUTE SET input_type json
SESSION.ATTRIBUTE SET reply_type json
SESSION.ATTRIBUTE SET object_id_format hex
```
| Attribute | Effect |
| ------------------ | ------------------------------------------------------------------------------- |
| `input_type` | Controls whether documents you send are parsed as `json` or `bson`. |
| `reply_type` | Controls whether documents returned by queries are encoded as `json` or `bson`. |
| `object_id_format` | Controls whether ObjectIds are returned as `hex` strings or raw `bytes`. |
See [Session Attributes](/docs/sessions/) for the full list of session settings.
## Creating a Bucket
Create a bucket named `users`:
```kronotop
> BUCKET.CREATE users
OK
```
Verify it exists:
```kronotop
> BUCKET.LIST
1) "users"
```
For idempotent scripts, use `IF-NOT-EXISTS` to avoid errors when the bucket already exists:
```kronotop
> BUCKET.CREATE users IF-NOT-EXISTS
OK
```
Every bucket is created with a **primary index** on the `_id` field. This index is always in `READY` status and cannot be dropped.
See [BUCKET.CREATE](/docs/bucket/commands/bucket-create/) for shard assignment and index creation at bucket creation time.
## Data Modelling and Inserting Documents
### Single Insert
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30, "status": "active"}'
1) "6a23da8a87f4f93001bd8df6"
```
Kronotop auto-generates an `_id` (ObjectId) for each document and returns it.
### Batch Insert
Pass multiple documents after the `DOCS` keyword:
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Bob", "age": 25, "status": "active"}' '{"name": "Carol", "age": 35, "status": "inactive"}'
1) "6a23da9887f4f93001bd8df7"
2) "6a23da9887f4f93001bd8df8"
```
### User-Provided `_id`
Supply an ObjectId using extended JSON notation:
```kronotop
> BUCKET.INSERT users DOCS '{"_id": {"$oid": "507f1f77bcf86cd799439011"}, "name": "Dave", "age": 28}'
1) "507f1f77bcf86cd799439011"
```
If the `_id` already exists, the command returns a `DUPLICATEKEY` error. The `_id` field is immutable.
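When documents are assembled in application code, a small helper can catch malformed ObjectIds before they reach the server. This is a sketch under one assumption: that a hex ObjectId is 24 hexadecimal characters, matching BSON's 12-byte ObjectId. The `with_object_id` name is hypothetical.

```python
import json
import re

# Assumption: a hex ObjectId is exactly 24 hexadecimal characters (12 bytes).
_OID_RE = re.compile(r"[0-9a-fA-F]{24}")

def with_object_id(doc, oid_hex):
    """Return a JSON document string with a user-provided _id in extended JSON."""
    if not _OID_RE.fullmatch(oid_hex):
        raise ValueError(f"not a 24-character hex ObjectId: {oid_hex!r}")
    return json.dumps({"_id": {"$oid": oid_hex}, **doc})

print(with_object_id({"name": "Dave", "age": 28}, "507f1f77bcf86cd799439011"))
# {"_id": {"$oid": "507f1f77bcf86cd799439011"}, "name": "Dave", "age": 28}
```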
### Schemaless Documents
Documents in the same bucket can have different fields:
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Eve", "email": "eve@example.com"}'
1) "6a23dab187f4f93001bd8df9"
```
### Nested Documents
```kronotop
> BUCKET.INSERT users DOCS '{
"name": "Frank",
"age": 40,
"address": {"city": "Istanbul", "zip": "34000"}
}'
1) "6a23dabd87f4f93001bd8dfa"
```
### Arrays
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Grace", "age": 29, "tags": ["admin", "verified"]}'
1) "6a23dac687f4f93001bd8dfb"
```
### Supported Data Types
| Type | JSON Example | Notes |
| --------- | ------------------------------------------------------ | --------------------- |
| String | `"Alice"` | UTF-8 |
| Int32 | `25` | 32-bit integer |
| Int64 | `9223372036854775807` | 64-bit integer |
| Double | `19.99` | 64-bit floating point |
| Boolean | `true`, `false` | |
| Null | `null` | |
| DateTime | `{"$date": "2026-06-01T10:00:00Z"}` | Extended JSON |
| Timestamp | `{"$timestamp": {"t": 1749031200, "i": 1}}` | Extended JSON |
| Binary | `{"$binary": {"base64": "SGVsbG8=", "subType": "00"}}` | Extended JSON |
| ObjectId | `{"$oid": "..."}` | Extended JSON |
| Array | `[1, 2, 3]` | |
| Document | `{"key": "value"}` | Nested |
See [BUCKET.INSERT](/docs/bucket/commands/bucket-insert/) for the full reference.
## Querying Documents
### Match All
An empty filter returns every document in the bucket:
```kronotop
> BUCKET.QUERY users '{}'
1# "cursor_id" => (integer) 1
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
6) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
7) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
Every response contains a `cursor_id` and an `entries` array. Each returned document includes its `_id` field, whether it was generated by the server or supplied at insert time.
### Exact Match
A plain field value is an implicit `$eq`:
```kronotop
> BUCKET.QUERY users '{"name": "Alice"}'
1# "cursor_id" => (integer) 2
2# "entries" => 1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
```
### Query by `_id`
```kronotop
> BUCKET.QUERY users '{"_id": {"$oid": "6a23da8a87f4f93001bd8df6"}}'
1# "cursor_id" => (integer) 4
2# "entries" => 1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
```
See [BUCKET.QUERY](/docs/bucket/commands/bucket-query/) for the full reference.
## Filtering with BQL
BQL (Bucket Query Language) is used by `BUCKET.QUERY`, `BUCKET.DELETE`, and `BUCKET.UPDATE`. All examples below query the `users` bucket.
### Comparison Operators
| Operator | Description |
| -------- | ------------------------ |
| `$eq` | Equal to |
| `$ne` | Not equal to |
| `$gt` | Greater than |
| `$gte` | Greater than or equal to |
| `$lt` | Less than |
| `$lte` | Less than or equal to |
Explicit comparison:
```kronotop
> BUCKET.QUERY users '{"age": {"$gt": 25}}'
1# "cursor_id" => (integer) 5
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
4) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
5) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
Range shorthand: multiple operators on the same field are implicitly ANDed.
```kronotop
> BUCKET.QUERY users '{"age": {"$gte": 18, "$lt": 65}}'
1# "cursor_id" => (integer) 6
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
6) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
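The comparison semantics shown in these results can be modeled in a few lines of Python. This is a reading of the tutorial output, not Kronotop's query engine: the `matches` function is hypothetical, and the rule that a missing field never satisfies a comparison is an assumption inferred from Eve (who has no `age`) being absent from both result sets.

```python
import operator

# Operators modeled here; "$ne" is left out because the tutorial does not
# show how it treats documents that lack the field.
COMPARISONS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}
_MISSING = object()

def matches(doc, bql_filter):
    """True when every operator on every field holds (implicit AND)."""
    for field, condition in bql_filter.items():
        value = doc.get(field, _MISSING)
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a plain value is an implicit $eq
        for op, operand in condition.items():
            # Assumption: a missing field never satisfies a comparison.
            if value is _MISSING or not COMPARISONS[op](value, operand):
                return False
    return True

users = [
    {"name": "Dave", "age": 28},
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 35},
    {"name": "Eve"},
    {"name": "Frank", "age": 40},
    {"name": "Grace", "age": 29},
]
in_range = [u["name"] for u in users if matches(u, {"age": {"$gte": 18, "$lt": 65}})]
over_25 = [u["name"] for u in users if matches(u, {"age": {"$gt": 25}})]
print(in_range)  # ['Dave', 'Alice', 'Bob', 'Carol', 'Frank', 'Grace']
print(over_25)   # ['Dave', 'Alice', 'Carol', 'Frank', 'Grace']
```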
### Logical Operators
**Implicit AND**, multiple fields in the same object:
```kronotop
> BUCKET.QUERY users '{"status": "active", "age": {"$gte": 18}}'
1# "cursor_id" => (integer) 7
2# "entries" =>
1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
2) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
```
**Explicit `$and`:**
```kronotop
> BUCKET.QUERY users '{"$and": [{"status": "active"}, {"age": {"$gte": 18}}]}'
1# "cursor_id" => (integer) 8
2# "entries" =>
1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
2) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
```
**`$or`:**
```kronotop
> BUCKET.QUERY users '{"$or": [{"status": "active"}, {"status": "pending"}]}'
1# "cursor_id" => (integer) 9
2# "entries" =>
1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
2) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
```
**`$nor`:**
```kronotop
> BUCKET.QUERY users '{"$nor": [{"status": "inactive"}, {"status": "deleted"}]}'
1# "cursor_id" => (integer) 10
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
5) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
6) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
**`$not`:**
```kronotop
> BUCKET.QUERY users '{"age": {"$not": {"$gt": 100}}}'
1# "cursor_id" => (integer) 11
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
6) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
7) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
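The logical operators compose in the usual boolean way, which the following Python sketch makes explicit. It is a model of the results shown in this section, with hypothetical function names, and it assumes that a missing field fails a comparison while `$not` inverts that outcome (which is why Eve, who has no `age`, appears in the `$not` result).

```python
import operator

OPS = {"$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
       "$lt": operator.lt, "$lte": operator.le}
_MISSING = object()

def field_matches(value, condition):
    """Evaluate the operators attached to a single field."""
    if not isinstance(condition, dict):
        condition = {"$eq": condition}
    for op, operand in condition.items():
        if op == "$not":
            if field_matches(value, operand):
                return False
        elif value is _MISSING or not OPS[op](value, operand):
            return False
    return True

def matches(doc, bql_filter):
    """Top-level keys are ANDed; $and, $or, and $nor take lists of sub-filters."""
    for key, cond in bql_filter.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        elif key == "$nor":
            ok = not any(matches(doc, sub) for sub in cond)
        else:
            ok = field_matches(doc.get(key, _MISSING), cond)
        if not ok:
            return False
    return True

users = [
    {"name": "Dave", "age": 28},
    {"name": "Alice", "age": 30, "status": "active"},
    {"name": "Bob", "age": 25, "status": "active"},
    {"name": "Carol", "age": 35, "status": "inactive"},
    {"name": "Eve"},
]

def names(bql_filter):
    return [u["name"] for u in users if matches(u, bql_filter)]

print(names({"$or": [{"status": "active"}, {"status": "pending"}]}))
print(names({"$nor": [{"status": "inactive"}, {"status": "deleted"}]}))
print(names({"age": {"$not": {"$gt": 100}}}))
```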
### Array Operators
**`$in`**, match any value in a set:
```kronotop
> BUCKET.QUERY users '{"status": {"$in": ["active", "pending"]}}'
1# "cursor_id" => (integer) 12
2# "entries" =>
1) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
2) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
```
**`$nin`**, match none of the values:
```kronotop
> BUCKET.QUERY users '{"status": {"$nin": ["deleted", "archived"]}}'
1# "cursor_id" => (integer) 13
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
6) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
7) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
**`$all`**, array field contains all specified values:
```kronotop
> BUCKET.QUERY users '{"tags": {"$all": ["admin", "verified"]}}'
1# "cursor_id" => (integer) 14
2# "entries" => 1) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
**`$size`**, array has the specified length:
```kronotop
> BUCKET.QUERY users '{"tags": {"$size": 2}}'
1# "cursor_id" => (integer) 15
2# "entries" => 1) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
**`$elemMatch`**, at least one array element matches a compound condition. First insert a document with a numeric array:
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Henry", "age": 31, "scores": [75, 85, 95]}'
1) "6a23e0ac87f4f93001bd8dfd"
```
Query it back with `$elemMatch`:
```kronotop
> BUCKET.QUERY users '{"scores": {"$elemMatch": {"$gte": 80, "$lt": 90}}}'
1# "cursor_id" => (integer) 32
2# "entries" => 1) {"_id": "6a23e0ac87f4f93001bd8dfd", "name": "Henry", "age": 31, "scores": [75, 85, 95]}
```
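A compact Python model of the array operators may help when reasoning about edge cases. The `array_match` function is hypothetical and follows the outputs shown in this section. Two points are assumptions: `$in` and `$nin` are modeled for scalar fields only, and a missing field is treated as matching `$nin` (as Dave and Eve do above) but not `$in`.

```python
import operator

ELEMENT_OPS = {"$gt": operator.gt, "$gte": operator.ge,
               "$lt": operator.lt, "$lte": operator.le}
_MISSING = object()

def array_match(doc, field, op, operand):
    """Evaluate one array operator against one field of a document."""
    value = doc.get(field, _MISSING)
    if op == "$nin":
        return value is _MISSING or value not in operand
    if value is _MISSING:
        return False
    if op == "$in":
        return value in operand
    if not isinstance(value, list):
        return False
    if op == "$all":
        return all(item in value for item in operand)
    if op == "$size":
        return len(value) == operand
    if op == "$elemMatch":
        # A single element must satisfy every condition at once.
        return any(
            all(ELEMENT_OPS[name](element, bound) for name, bound in operand.items())
            for element in value
        )
    raise ValueError(f"unsupported operator: {op}")

grace = {"name": "Grace", "tags": ["admin", "verified"]}
henry = {"name": "Henry", "scores": [75, 85, 95]}
print(array_match(grace, "tags", "$all", ["admin", "verified"]))           # True
print(array_match(henry, "scores", "$elemMatch", {"$gte": 80, "$lt": 90}))  # True
print(array_match(henry, "scores", "$elemMatch", {"$gte": 86, "$lt": 90}))  # False
```

The last call returns `False` because no single score lies in the range, even though 95 satisfies `$gte` and 75 satisfies `$lt` separately.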
### Field Operators
**`$exists`**, field is present or absent:
```kronotop
> BUCKET.QUERY users '{"email": {"$exists": true}}'
1# "cursor_id" => (integer) 17
2# "entries" => 1) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
```
```kronotop
> BUCKET.QUERY users '{"deletedAt": {"$exists": false}}'
1# "cursor_id" => (integer) 18
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
6) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
7) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
### Nested Field Access
Use dot notation:
```kronotop
> BUCKET.QUERY users '{"address.city": "Istanbul"}'
1# "cursor_id" => (integer) 19
2# "entries" => 1) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
```
### Null Values
```kronotop
> BUCKET.QUERY users '{"middleName": null}'
1# "cursor_id" => (integer) 20
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
3) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
4) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
5) {"_id": "6a23dab187f4f93001bd8df9", "name": "Eve", "email": "eve@example.com"}
6) {"_id": "6a23dabd87f4f93001bd8dfa", "name": "Frank", "age": 40, "address": {"city": "Istanbul", "zip": "34000"}}
7) {"_id": "6a23dac687f4f93001bd8dfb", "name": "Grace", "age": 29, "tags": ["admin", "verified"]}
```
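The `$exists` and null results above suggest a simple rule, sketched here in Python with hypothetical helper names. That a null filter matches documents lacking the field follows from the output shown; that it also matches an explicit `null` value is an assumption.

```python
_MISSING = object()

def exists(doc, field, expected):
    """$exists: compare field presence with the requested boolean."""
    return (field in doc) == expected

def equals_null(doc, field):
    """Null filter: matches an absent field, or (assumed) an explicit null."""
    value = doc.get(field, _MISSING)
    return value is _MISSING or value is None

eve = {"name": "Eve", "email": "eve@example.com"}
print(exists(eve, "email", True))       # True
print(exists(eve, "deletedAt", False))  # True
print(equals_null(eve, "middleName"))   # True
print(equals_null(eve, "name"))         # False
```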
See [BQL Language Reference](/docs/bucket/bql-reference/) for the complete operator reference.
## Sorting and Pagination
### Cursor Model
Every `BUCKET.QUERY`, `BUCKET.DELETE`, and `BUCKET.UPDATE` response includes a `cursor_id`. The cursor tracks position in the result set so you can page through results with `BUCKET.ADVANCE`.
### Basic Pagination
```kronotop
> BUCKET.QUERY users '{}' LIMIT 2
1# "cursor_id" => (integer) 21
2# "entries" =>
1) {"_id": "507f1f77bcf86cd799439011", "name": "Dave", "age": 28}
2) {"_id": "6a23da8a87f4f93001bd8df6", "name": "Alice", "age": 30, "status": "active"}
```
```kronotop
> BUCKET.ADVANCE QUERY 21
1# "cursor_id" => (integer) 21
2# "entries" =>
1) {"_id": "6a23da9887f4f93001bd8df7", "name": "Bob", "age": 25, "status": "active"}
2) {"_id": "6a23da9887f4f93001bd8df8", "name": "Carol", "age": 35, "status": "inactive"}
```
An empty `entries` array signals that all matching documents have been returned.
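The paging loop a client runs can be modeled without a server. In the Python sketch below, the `Cursor` class and `drain` function are hypothetical stand-ins: the first `advance()` call plays the role of the initial `BUCKET.QUERY` response, and later calls play the role of `BUCKET.ADVANCE`.

```python
class Cursor:
    """In-memory stand-in for a server-side cursor with a fixed batch size."""

    def __init__(self, results, limit):
        self._results = list(results)
        self._limit = limit
        self._pos = 0

    def advance(self):
        """Return the next batch; an empty list means the cursor is exhausted."""
        batch = self._results[self._pos:self._pos + self._limit]
        self._pos += len(batch)
        return batch

def drain(cursor):
    """Collect batches until an empty one signals the end of the result set."""
    batches = []
    while True:
        batch = cursor.advance()
        if not batch:
            return batches
        batches.append(batch)

names = ["Dave", "Alice", "Bob", "Carol", "Eve", "Frank", "Grace"]
print(drain(Cursor(names, limit=2)))
# [['Dave', 'Alice'], ['Bob', 'Carol'], ['Eve', 'Frank'], ['Grace']]
```

Note that a final batch shorter than the limit is not the end signal in this model; the loop stops only on an empty batch, as the text above describes.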
### Default Batch Size
When `LIMIT` is omitted, the session’s `limit` attribute controls the batch size (default: 100). You can change it with:
```kronotop
> SESSION.ATTRIBUTE SET limit 50
OK
```
### Closing Cursors
Cursors are stored in the session and consume resources while open. Close a cursor with `BUCKET.CLOSE` when you are done:
```kronotop
> BUCKET.CLOSE QUERY 0
OK
```
### Listing Active Cursors
```kronotop
> BUCKET.CURSORS
1) QUERY -> 0 -> "{}"
2) UPDATE -> (empty map)
3) DELETE -> (empty map)
```
Filter by operation type:
```kronotop
> BUCKET.CURSORS QUERY
1) QUERY -> 0 -> "{}"
```
### Sorting with SORTBY
```kronotop
BUCKET.QUERY users '{"age": {"$gt": 25}}' SORTBY age ASC LIMIT 10
```
`SORTBY` requires an index on the sort field, and the query plan must produce results already ordered by that field. Here the query filters and sorts on the same indexed field, so the index scan provides the ordering. Results are globally sorted across all `BUCKET.ADVANCE` calls. If the plan cannot provide the ordering, the query is rejected at planning time. To sort by a different field than the filter, create a compound index covering both fields, or use `RESULTSORT` for in-memory per-batch sorting.
This example needs the `age` index created in the [Indexes](#indexes) section below. Indexes are built asynchronously; `SORTBY` is rejected until the index reaches `READY` status. Check the status with `BUCKET.INDEX DESCRIBE`.
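The difference between a globally sorted result and per-batch sorting is easy to miss, so here is a small Python model. It is based only on the description above: the `batches` helper is hypothetical, and the treatment of `RESULTSORT` as "cut batches in scan order, then sort each batch" is an assumption about what per-batch sorting means.

```python
def batches(items, limit):
    """Split a list into consecutive batches of at most `limit` items."""
    return [items[i:i + limit] for i in range(0, len(items), limit)]

# Ages in the order a scan without an age index might return them.
ages_in_scan_order = [28, 30, 25, 35, 40, 29]

# SORTBY-style: the index scan already yields sorted order, then batches are cut.
global_sorted = batches(sorted(ages_in_scan_order), 2)

# Per-batch style: batches are cut first, then each one is sorted in memory.
per_batch = [sorted(batch) for batch in batches(ages_in_scan_order, 2)]

print(global_sorted)  # [[25, 28], [29, 30], [35, 40]]
print(per_batch)      # [[28, 30], [25, 35], [29, 40]]
```

In the second result each batch is ordered internally, but values do not increase from one batch to the next.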
See [SORTBY](/docs/bucket/sortby/) for sorting details, [BUCKET.ADVANCE](/docs/bucket/commands/bucket-advance/), [BUCKET.CLOSE](/docs/bucket/commands/bucket-close/), and [BUCKET.CURSORS](/docs/bucket/commands/bucket-cursors/) for the full reference.
## Updating Documents
### `$set`: Set Field Values
```kronotop
> BUCKET.UPDATE users '{"name": "Alice"}' '{"$set": {"status": "inactive"}}'
1# "cursor_id" => (integer) 22
2# "object_ids" => 1) "6a23da8a87f4f93001bd8df6"
```
The response contains the ObjectIds of all updated documents.
### `$unset`: Remove Fields
```kronotop
> BUCKET.UPDATE users '{"age": {"$gt": 30}}' '{"$unset": ["temporary_field", "deprecated_field"]}'
1# "cursor_id" => (integer) 23
2# "object_ids" =>
1) "6a23da9887f4f93001bd8df8"
2) "6a23dabd87f4f93001bd8dfa"
```
### Combining `$set` and `$unset`
```kronotop
> BUCKET.UPDATE users '{}' '{"$set": {"version": 2}, "$unset": ["old_field"]}'
1# "cursor_id" => (integer) 24
2# "object_ids" =>
1) "507f1f77bcf86cd799439011"
2) "6a23da8a87f4f93001bd8df6"
3) "6a23da9887f4f93001bd8df7"
4) "6a23da9887f4f93001bd8df8"
5) "6a23dab187f4f93001bd8df9"
6) "6a23dabd87f4f93001bd8dfa"
7) "6a23dac687f4f93001bd8dfb"
```
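The effect of an update document on a single matched document can be sketched as follows. The `apply_update` function is a hypothetical Python model for top-level fields only. It assumes that unsetting a field that is not present is a no-op, which is consistent with the examples above succeeding on documents that lack those fields.

```python
import copy

def apply_update(doc, update):
    """Return a copy of doc with $set fields written and $unset fields removed."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value
    for field in update.get("$unset", []):
        result.pop(field, None)  # assumed no-op when the field is absent
    return result

before = {"name": "Alice", "age": 30, "old_field": "x"}
after = apply_update(before, {"$set": {"version": 2}, "$unset": ["old_field"]})
print(after)  # {'name': 'Alice', 'age': 30, 'version': 2}
```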
### Upsert
When `upsert` is `true`, a new document is inserted if no documents match the filter:
```kronotop
> BUCKET.UPDATE users '{"name": "Helen"}' '{"$set": {"name": "Helen", "status": "active"}, "upsert": true}'
1# "cursor_id" => (integer) 25
2# "object_ids" => 1) "6a23dd0287f4f93001bd8dfc"
```
### Array Filters
Use the positional `$[identifier]` syntax with `array_filters` to update only the array elements that match a condition. The filter is evaluated against each element. Henry’s `scores` array is `[75, 85, 95]`; set the elements greater than or equal to 80 to 100:
```kronotop
> BUCKET.UPDATE users '{"name": "Henry"}' '{"$set": {"scores.$[elem]": 100}, "array_filters": [{"elem": {"$gte": 80}}]}'
1# "cursor_id" => (integer) 26
2# "object_ids" => 1) "6a23e0ac87f4f93001bd8dfd"
```
Henry’s `scores` array is now `[75, 100, 100]`.
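The per-element evaluation can be reproduced in Python to check the expected result. The `set_filtered_elements` helper is a hypothetical name; it models only the case shown here, where matching elements are replaced by a constant.

```python
def set_filtered_elements(array, predicate, new_value):
    """Replace each element that satisfies the predicate; keep the others."""
    return [new_value if predicate(element) else element for element in array]

scores = [75, 85, 95]
updated = set_filtered_elements(scores, lambda element: element >= 80, 100)
print(updated)  # [75, 100, 100]
```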
### Ordered Batch Updates
Combine `SORTBY` and `LIMIT` to update documents in a specific order. `SORTBY` requires the query to produce results already ordered by the sort field. When the filter field and the sort field are different, create a compound index that covers both, with the filter field first:
```kronotop
> BUCKET.INDEX CREATE users '{
"$compound": [{
"name": "idx_status_created_at",
"fields": [
{"selector": "status", "bson_type": "string"},
{"selector": "created_at", "bson_type": "datetime"}
]
}]
}'
OK
```
Indexes are built asynchronously. Wait until the index reaches `READY` status before running queries that sort with it; check with `BUCKET.INDEX DESCRIBE`.
Insert a few pending documents with `created_at` timestamps:
```kronotop
> BUCKET.INSERT users DOCS '{
"name": "Ivy",
"status": "pending",
"created_at": {"$date": "2026-06-01T10:00:00Z"}
}' '{
"name": "Jack",
"status": "pending",
"created_at": {"$date": "2026-06-02T10:00:00Z"}
}'
1) "6a23e10187f4f93001bd8dfd"
2) "6a23e10187f4f93001bd8dfe"
```
```kronotop
> BUCKET.INSERT users DOCS '{
"name": "Kate",
"status": "pending",
"created_at": {"$date": "2026-06-03T10:00:00Z"}
}' '{
"name": "Liam",
"status": "pending",
"created_at": {"$date": "2026-06-04T10:00:00Z"}
}'
1) "6a23e10987f4f93001bd8dff"
2) "6a23e10987f4f93001bd8e00"
```
With an equality filter on `status`, the compound index provides natural ordering on `created_at`:
```kronotop
> BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' SORTBY created_at ASC LIMIT 2
1# "cursor_id" => (integer) 29
2# "object_ids" =>
1) "6a23e10187f4f93001bd8dfd"
2) "6a23e10187f4f93001bd8dfe"
```
The first batch updates the two oldest pending documents. Advance the cursor to update the next batch:
```kronotop
> BUCKET.ADVANCE UPDATE 29
1# "cursor_id" => (integer) 29
2# "object_ids" =>
1) "6a23e10987f4f93001bd8dff"
2) "6a23e10987f4f93001bd8e00"
```
See [BUCKET.UPDATE](/docs/bucket/commands/bucket-update/) for the full reference.
## Deleting Documents
### Delete by Filter
```kronotop
> BUCKET.DELETE users '{"status": "inactive"}'
1# "cursor_id" => (integer) 27
2# "object_ids" =>
1) "6a23da8a87f4f93001bd8df6"
2) "6a23da9887f4f93001bd8df8"
```
### Batched Deletes
Use `LIMIT` and `BUCKET.ADVANCE` to delete in batches:
```kronotop
> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 100
1# "cursor_id" => (integer) 28
2# "object_ids" =>
1) "6a23da8a87f4f93001bd8df6"
2) "6a23da9887f4f93001bd8df8"
```
Advance the cursor:
```kronotop
> BUCKET.ADVANCE DELETE 28
1# "cursor_id" => (integer) 28
2# "object_ids" => [...] (next batch of up to 100 deleted)
```
### Delete All Documents
```kronotop
> BUCKET.DELETE users '{}'
1# "cursor_id" => (integer) 28
2# "object_ids" =>
1) "6a23da8a87f4f93001bd8df6"
2) "6a23da9887f4f93001bd8df8"
```
Note: `SORTBY` is not supported on `BUCKET.DELETE`.
See [BUCKET.DELETE](/docs/bucket/commands/bucket-delete/) for the full reference.
## Indexes
An index is an accelerator. There is no semantic difference between indexed and non-indexed fields. A query returns the same results regardless of which indexes exist; indexes only affect performance.
### Primary Index
Every bucket has a primary index on `_id`. It is created automatically and cannot be dropped.
### Creating Secondary Indexes
```kronotop
> BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32"}}'
OK
```
Create multiple indexes at once:
```kronotop
> BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32"}, "email": {"bson_type": "string"}}'
OK
```
### Indexes at Bucket Creation Time
You can define indexes when creating a bucket:
```kronotop
> BUCKET.CREATE events INDEXES '{"timestamp": {"bson_type": "datetime"}}'
OK
```
### Multi-Key Indexes
For array fields, set `multi_key` to `true` so each array element gets its own index entry:
```kronotop
> BUCKET.INDEX CREATE users '{"tags": {"bson_type": "string", "multi_key": true}}'
OK
```
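To see what "each array element gets its own index entry" means, consider this Python model. The `multi_key_entries` function and the `(value, _id)` entry layout are assumptions made for illustration; the actual on-disk index format is not described here.

```python
def multi_key_entries(docs, field):
    """Build sorted (value, _id) index entries, one per array element."""
    entries = []
    for doc in docs:
        for element in doc.get(field, []):
            entries.append((element, doc["_id"]))
    return sorted(entries)

# Hypothetical documents with shortened ids.
docs = [
    {"_id": "g", "name": "Grace", "tags": ["admin", "verified"]},
    {"_id": "h", "name": "Hank", "tags": ["admin"]},
    {"_id": "e", "name": "Eve"},
]
print(multi_key_entries(docs, "tags"))
# [('admin', 'g'), ('admin', 'h'), ('verified', 'g')]
```

A lookup for `admin` then finds both documents through adjacent entries, and a document without the field contributes no entries in this model.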
### Compound Indexes
A compound index covers multiple fields in a defined order. Use it when queries consistently filter on the same combination of fields:
```kronotop
> BUCKET.INDEX CREATE products '{
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
OK
```
Query using both fields. The compound index handles the entire filter in a single scan:
```kronotop
BUCKET.QUERY products '{"category": {"$eq": "electronics"}, "price": {"$gt": 100.0}}'
```
The query engine matches filters left to right against the compound index fields. All fields before the last must use equality; only the last matched field may use a range operator. At least two fields must match for the compound index to activate. The exception is `SORTBY`: a single equality match is enough when the sort field is the next field in the index, as in the ordered batch updates example above.
See [Compound Indexes](/docs/bucket/compound-index/) for the prefix rule, trade-offs, and constraints.
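The matching rule above can be sketched in a few lines of Python. This is an illustrative model only, not Kronotop's planner: `matched_prefix` and `compound_index_activates` are hypothetical names, and each filter is reduced to a map from field name to a single operator.

```python
RANGE_OPS = {"$gt", "$gte", "$lt", "$lte"}


def matched_prefix(index_fields, filters):
    """Walk the index fields left to right and collect the usable prefix."""
    matched = []
    for field in index_fields:
        op = filters.get(field)
        if op == "$eq":
            matched.append(field)      # equality: keep matching further fields
        elif op in RANGE_OPS:
            matched.append(field)      # a range operator must be the last match
            break
        else:
            break                      # missing field or other operator ends the prefix
    return matched


def compound_index_activates(index_fields, filters, sort_field=None):
    """Apply the 'at least two fields' rule plus the SORTBY exception."""
    matched = matched_prefix(index_fields, filters)
    if len(matched) >= 2:
        return True
    # SORTBY exception: one equality match, sorting on the next index field.
    if len(matched) == 1 and filters[matched[0]] == "$eq" and len(index_fields) > 1:
        return sort_field == index_fields[1]
    return False
```

With the `idx_cat_price` fields `["category", "price"]`, an equality on `category` plus a range on `price` activates the index in this model, while a range on `price` alone does not.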
### Index Lifecycle
Indexes are built asynchronously in the background. After creation, an index progresses through these states:
| Status | Description |
| ---------- | -------------------------------------- |
| `WAITING` | Queued for building. |
| `BUILDING` | Background task is populating entries. |
| `READY` | Available for query planning. |
| `DROPPED` | Marked for deletion and cleanup. |
| `FAILED` | Build failed. |
Monitor progress with `BUCKET.INDEX TASKS`:
```kronotop
BUCKET.INDEX TASKS users "selector:age.bsonType:INT32"
```
Check index details with `BUCKET.INDEX DESCRIBE`:
```kronotop
> BUCKET.INDEX DESCRIBE users "selector:age.bsonType:INT32"
1# "index_type" => "single_field"
2# "id" => (integer) 3048755950204840837
3# "selector" => "age"
4# "bson_type" => "INT32"
5# "status" => "READY"
6# "collation" =>
1# "locale" => (nil)
2# "strength" => (nil)
3# "case_level" => (nil)
4# "case_first" => (nil)
5# "numeric_ordering" => (nil)
6# "alternate" => (nil)
7# "backwards" => (nil)
8# "normalization" => (nil)
9# "max_variable" => (nil)
7# "statistics" =>
1# "cardinality" => (integer) 6
```
### Listing and Dropping Indexes
```kronotop
> BUCKET.INDEX LIST users
1) "primary-index"
2) "selector:age.bsonType:INT32"
3) "selector:email.bsonType:STRING"
4) "selector:tags.bsonType:STRING"
```
```kronotop
> BUCKET.INDEX DROP users "selector:email.bsonType:STRING"
OK
```
The primary index (`primary-index`) cannot be dropped.
### Analyzing Statistics
Trigger statistics analysis to help the query optimizer choose better plans:
```kronotop
> BUCKET.INDEX ANALYZE users "selector:age.bsonType:INT32"
OK
```
### Strict Typing
Index types must be compatible with query predicate types. Given an index on `age` with type `int32`:
```plaintext
{"age": {"$eq": 25}} -- uses the index (int32 matches int32)
{"age": {"$eq": "25"}} -- does NOT use the index (string does not match int32)
```
No implicit type coercion is performed between unrelated types. Numeric types (`INT32`, `INT64`, `DOUBLE`, `DECIMAL128`) support lossless widening. See [Strict Types](/docs/bucket/strict-types/#numeric-widening) for details.
See [BUCKET.INDEX](/docs/bucket/commands/bucket-index/) for the full reference.
## Inspecting Query Plans
Use `BUCKET.EXPLAIN` to see how a query will be executed without running it:
```kronotop
> BUCKET.EXPLAIN users '{"age": {"$gt": 25}}'
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "IndexScan"
3# "id" => (integer) 2
4# "scanType" => "INDEX_SCAN"
5# "index" => "selector:age.bsonType:INT32"
6# "selector" => "age"
7# "operator" => "GT"
8# "operand" => "Param[ref=ParamRef[index=0]]"
```
### Key Node Types
| Node Type | Meaning |
| -------------------------------- | --------------------------------------------------------------------------- |
| `FullScan` | Scans all documents (no usable index). |
| `IndexScan` | Point lookup on an index. |
| `RangeScan` | Bounded range scan on an index. |
| `TransformWithResidualPredicate` | Post-scan filter for conditions that could not be pushed into an index. |
| `Union` | Combines results from multiple scans (logical OR). |
| `CompoundIndexScan` | Single scan on a compound index using equality prefixes and optional range. |
| `Intersection` | Combines results from multiple scans (logical AND). |
### Mixed Indexed and Non-Indexed Filters
When `age` is indexed but `name` is not:
```kronotop
> BUCKET.EXPLAIN users '{"$and": [{"age": {"$eq": 25}}, {"name": {"$eq": "Alice"}}]}'
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "IndexScan"
3# "id" => (integer) 23
4# "scanType" => "INDEX_SCAN"
5# "index" => "selector:age.bsonType:INT32"
6# "selector" => "age"
7# "operator" => "EQ"
8# "operand" => "Param[ref=ParamRef[index=1]]"
9# "next" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "TransformWithResidualPredicate"
3# "id" => (integer) 26
4# "operation" => "FILTER"
5# "predicate" =>
1# "type" => "AND"
2# "children" =>
1) 1# "type" => "PREDICATE"
2# "selector" => "name"
3# "operator" => "EQ"
4# "operand" => "Param[ref=ParamRef[index=0]]"
```
The plan uses an `IndexScan` on `age` followed by a `TransformWithResidualPredicate` that filters on `name`.
### Plan Caching
Plans are cached by query shape, the structural fingerprint without literal values. Two queries with the same operators and field paths but different values share the same cached plan. The cache is invalidated when indexes are created or dropped.
```kronotop
> BUCKET.QUERY users '{"status": "active"}'
...
> BUCKET.EXPLAIN users '{"status": "active"}'
is_cached -> (boolean) true
plan -> ...
```
### Detecting Full Scans
If `BUCKET.EXPLAIN` shows a `FullScan` node for a query you run frequently, create an index on the filtered field.
See [BUCKET.EXPLAIN](/docs/bucket/commands/bucket-explain/) for the full reference.
## Transactions
By default, every command runs in **auto-commit mode**. Each operation is its own transaction.
For multi-step atomic operations, use explicit transactions:
```kronotop
> BEGIN
OK
> BUCKET.INSERT users DOCS '{"name": "Ivy", "age": 22}'
1) "6835a1c0e4b0f72a3c000007"
> BUCKET.UPDATE users '{"name": "Alice"}' '{"$set": {"status": "inactive"}}'
1# "cursor_id" => (integer) 30
2# "object_ids" => 1) "6835a1c0e4b0f72a3c000001"
> COMMIT
OK
```
To cancel, use `ROLLBACK` instead of `COMMIT`.
Within an explicit transaction, reads reflect prior writes from the same transaction (read-your-writes).
FoundationDB imposes two constraints on transactions:
| Constraint | Limit |
| -------------------- | --------- |
| Transaction size | 10 MB |
| Transaction duration | 5 seconds |
See [Transactions](/docs/transactions/) for snapshot reads, size inspection, and advanced usage.
## Namespaces
Buckets live inside namespaces. The default namespace is `global`. All bucket commands operate within the current session namespace.
Switch to a different namespace:
```kronotop
> NAMESPACE USE production
OK
```
Check the current namespace:
```kronotop
> NAMESPACE CURRENT
"production"
```
Any bucket created or queried after `NAMESPACE USE` belongs to that namespace:
```kronotop
> BUCKET.CREATE orders
OK
> BUCKET.LIST
1) "orders"
```
See [Namespaces](/docs/namespaces/) for hierarchical namespaces, creation, and removal.
## Bucket Lifecycle
Removing a bucket is a two-phase process.
### Phase 1: Soft Delete
`BUCKET.REMOVE` marks the bucket as logically removed. The bucket becomes inaccessible, and background workers (index maintenance, replication) are signaled to stop.
```kronotop
> BUCKET.REMOVE users
OK
```
### Phase 2: Hard Delete
`BUCKET.PURGE` permanently deletes all bucket data, indexes, and metadata. Before proceeding, it enforces a distributed sync barrier that verifies every alive cluster member has observed the removal event.
```kronotop
> BUCKET.PURGE users
OK
```
If the barrier is not yet satisfied, the command returns `BARRIERNOTSATISFIED`. Retry the command; one retry is usually enough.
```kronotop
> BUCKET.PURGE users
(error) BARRIERNOTSATISFIED Barrier not satisfied: not all shards observed version ...
> BUCKET.PURGE users
OK
```
After purging, the bucket name can be reused.
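The retry advice above can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `purge` stands for any callable that sends `BUCKET.PURGE` through your RESP client, and `BarrierNotSatisfied` is a hypothetical exception that your code would raise when the reply starts with `BARRIERNOTSATISFIED`. How the error surfaces depends on the client library.

```python
import time


class BarrierNotSatisfied(Exception):
    """Hypothetical error raised when the reply is BARRIERNOTSATISFIED."""


def purge_with_retry(purge, attempts=3, delay=0.5):
    """Call purge() until it succeeds or the attempts are exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return purge()
        except BarrierNotSatisfied:
            if attempt == attempts:
                raise                  # give up and surface the last error
            time.sleep(delay)          # give other members time to observe the removal
```

The attempt count and delay are arbitrary defaults chosen for the example, not values recommended by Kronotop.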
See [BUCKET.REMOVE](/docs/bucket/commands/bucket-remove/) and [BUCKET.PURGE](/docs/bucket/commands/bucket-purge/) for the full reference.
# BQL Language Reference
> BQL (Bucket Query Language) is the query language of the Bucket data structure, used for document operations.
BQL (Bucket Query Language) is the query language of the Bucket data structure, used for document operations. Queries are written in JSON or BSON format.
## Query Syntax
Queries are JSON objects that specify conditions for matching documents.
### Basic Equality
```json
{ "status": "active" }
{ "age": 25 }
{ "verified": true }
```
### Empty Query (Match All)
```json
{}
```
## Comparison Operators
| Operator | Description | Example |
| -------- | --------------------- | ------------------------------------ |
| `$eq` | Equal to | `{ "age": { "$eq": 25 } }` |
| `$ne` | Not equal to | `{ "status": { "$ne": "deleted" } }` |
| `$gt` | Greater than | `{ "age": { "$gt": 18 } }` |
| `$gte` | Greater than or equal | `{ "price": { "$gte": 10 } }` |
| `$lt` | Less than | `{ "age": { "$lt": 65 } }` |
| `$lte` | Less than or equal | `{ "price": { "$lte": 100 } }` |
### Range Queries
Multiple operators on the same field are implicitly combined with AND:
```json
{ "age": { "$gte": 18, "$lt": 65 } }
{ "price": { "$gt": 10, "$lte": 50, "$ne": 25 } }
```
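To make the implicit AND concrete, here is a minimal Python model of how several operators on one field combine. It is a sketch of the semantics only: `matches` is a hypothetical helper, and it ignores the strict type-matching rules described later on this page.

```python
import operator

# Map each BQL comparison operator to the equivalent Python comparison.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(value, condition):
    """A value matches only if every operator in the condition holds (AND)."""
    return all(COMPARATORS[op](value, operand) for op, operand in condition.items())
```

Under this model, `matches(30, {"$gte": 18, "$lt": 65})` holds, while `matches(25, {"$gt": 10, "$lte": 50, "$ne": 25})` does not, because the `$ne` condition fails.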
## Array Operators
| Operator | Description | Example |
| ------------ | ------------------------------- | --------------------------------------------------- |
| `$in` | Value in array | `{ "status": { "$in": ["active", "pending"] } }` |
| `$nin` | Value not in array | `{ "status": { "$nin": ["deleted", "archived"] } }` |
| `$all` | Array contains all values | `{ "tags": { "$all": ["urgent", "bug"] } }` |
| `$size` | Array has specific length | `{ "items": { "$size": 3 } }` |
| `$elemMatch` | Array element matches condition | `{ "scores": { "$elemMatch": { "$gte": 80 } } }` |
### $elemMatch Examples
Scalar array: match documents where at least one element of `scores` is >= 80 AND < 90:
```json
{ "scores": { "$elemMatch": { "$gte": 80, "$lt": 90 } } }
```
Object array: match documents where at least one item has `product` equal to `"xyz"` AND `score` >= 8:
```json
{ "items": { "$elemMatch": { "product": "xyz", "score": { "$gte": 8 } } } }
```
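The scalar case can be modeled in Python as follows. This sketch assumes that a single array element must satisfy every condition, as the operator description suggests; `elem_match` is a hypothetical helper, not part of Kronotop.

```python
import operator

COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def elem_match(array, condition):
    """True if at least one element satisfies all operators in the condition."""
    return any(
        all(COMPARATORS[op](element, operand) for op, operand in condition.items())
        for element in array
    )
```

In this model `[75, 85]` matches `{"$gte": 80, "$lt": 90}` because `85` satisfies both bounds, while `[75, 95]` does not: each bound is met by a different element, but no single element meets both.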
## Field Operators
| Operator | Description | Example |
| --------- | ------------------- | ---------------------------------- |
| `$exists` | Field exists or not | `{ "email": { "$exists": true } }` |
Field must exist:
```json
{ "phone": { "$exists": true } }
```
Field must not exist:
```json
{ "deletedAt": { "$exists": false } }
```
## Logical Operators
### $and
Explicit AND of conditions:
```json
{ "$and": [
{ "status": "active" },
{ "age": { "$gte": 18 } }
] }
```
Multiple fields in the same object are implicitly AND:
```json
{ "status": "active", "age": { "$gte": 18 } }
```
### $or
Match any condition:
```json
{ "$or": [
{ "status": "active" },
{ "status": "pending" }
] }
```
### $not
Negate a condition:
```json
{ "price": { "$not": { "$gt": 100 } } }
```
### $nor
Match none of the conditions (equivalent to `$not` + `$or`):
```json
{ "$nor": [
{ "status": "deleted" },
{ "status": "archived" }
] }
```
### Combined Logical Operators
```json
{
"$and": [
{ "$or": [
{ "status": "active" },
{ "priority": "high" }
] },
{ "age": { "$gte": 18 } }
]
}
```
## Null Values
Querying for null values:
```json
{ "middleName": { "$eq": null } }
{ "deletedAt": null }
```
## Supported Data Types
| Type | Example | Description |
| ------------ | ---------------------------- | ----------------------------------------------------------------- |
| String | `"Alice"` | UTF-8 string |
| Int32 | `25` | 32-bit integer |
| Int64 | `9223372036854775807` | 64-bit integer |
| Double | `19.99` | 64-bit floating point |
| Decimal128 | `"123.456"` | 128-bit decimal |
| Boolean | `true`, `false` | Boolean value |
| Null | `null` | Null value |
| DateTime | BSON DateTime | Date and time |
| Timestamp | BSON Timestamp | Timestamp |
| Binary | BSON Binary | Binary data |
| ObjectId | `"6835a1c0e4b0f72a3c000001"` | 12-byte unique identifier (auto-detected from 24-char hex string) |
| Versionstamp | BSON Binary (12 bytes) | FoundationDB Versionstamp |
| Array | `[1, 2, 3]` | Array of values |
| Document | `{ "nested": "value" }` | Nested document |
## Nested Field Access
Use dot notation for nested fields:
```json
{ "user.address.city": "Istanbul" }
{ "metadata.version": { "$gte": 2 } }
```
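Dot notation amounts to walking the document one path segment at a time. The Python sketch below illustrates that idea under simplifying assumptions: `resolve_path` is a hypothetical helper, it only descends through nested documents, and it does not model how arrays along the path are handled.

```python
def resolve_path(document, path, default=None):
    """Follow a dot-separated path through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default             # the path does not exist in this document
        current = current[part]
    return current
```

For example, resolving `"user.address.city"` against `{"user": {"address": {"city": "Istanbul"}}}` yields `"Istanbul"`.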
## Examples
### Find active users over 18
```json
{ "status": "active", "age": { "$gt": 18 } }
```
### Find orders with specific statuses
```json
{ "orderStatus": { "$in": ["shipped", "delivered"] } }
```
### Find products in the price range
```json
{ "price": { "$gte": 10, "$lte": 100 } }
```
### Find users with verified email
```json
{ "email": { "$exists": true }, "emailVerified": true }
```
### Find documents with tags containing both “urgent” and “bug”
```json
{ "tags": { "$all": ["urgent", "bug"] } }
```
### Complex query with OR and AND
```json
{
"$or": [
{ "priority": "high", "status": "open" },
{ "dueDate": { "$lt": "2025-01-01" } }
]
}
```
## Input/Output Formats
BQL accepts queries in both JSON and BSON formats. Use the `SESSION.ATTRIBUTE` command to configure:
* `input_type`: `json` or `bson`
* `reply_type`: `json` or `bson`
## Type Matching
The query engine enforces strict type matching during predicate evaluation. This behavior is always active and not configurable.
Non-numeric types (`STRING`, `BOOLEAN`, `DATETIME`, etc.) are strictly separated: a type mismatch always evaluates to `false`. For numeric types (`INT32`, `INT64`, `DOUBLE`, `DECIMAL128`), lossless numeric widening is supported. An `INT32` predicate can match an `INT64` value, but a `STRING` predicate never matches an `INT32` value.
### Example
INT32 predicate matches INT32 and INT64 values:
```json
{ "age": { "$eq": 25 } }
```
STRING predicate never matches INT32 values:
```json
{ "age": { "$eq": "25" } }
```
For the full widening rules, common type resolution, and index-side type enforcement (`bucket.index.strict_types`), see [Strict Types](/docs/bucket/strict-types/).
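As a rough mental model, the rule can be written as a single predicate. This Python sketch is deliberately simplified: it assumes any two numeric types are comparable and skips the lossless-widening check, so the actual rules on the Strict Types page may be narrower. `types_comparable` is a hypothetical name.

```python
NUMERIC_TYPES = {"INT32", "INT64", "DOUBLE", "DECIMAL128"}


def types_comparable(predicate_type, value_type):
    """Identical types always compare; distinct types only within the numeric family."""
    if predicate_type == value_type:
        return True
    # Simplification: the real engine additionally requires the widening to be lossless.
    return predicate_type in NUMERIC_TYPES and value_type in NUMERIC_TYPES
```

In this model an `INT32` predicate is comparable with an `INT64` value, while a `STRING` predicate is never comparable with an `INT32` value, so the predicate evaluates to `false`.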
## Limitations
* Decimal128 indexing is not yet supported
* Full-text search is not supported
* Geospatial queries are not supported
* Aggregation pipeline is planned for future releases
# Kronotop developer preview is out!
Hello,
After almost 3 years of development, the first developer preview of Kronotop is out!
Kronotop is a distributed multi-model database built on FoundationDB. It provides a document model (Bucket) and an ordered key-value model (ZMap). Both live inside namespaces and share one transaction boundary.
Kronotop uses RESP2 and RESP3 as the wire protocol, so it’s compatible with the existing RESP ecosystem.
Please note that this is a developer preview: the architecture and the transaction model are stable, but APIs and internal formats can still change before the first stable release.
## Try it
Run a demo cluster with Docker Compose, then insert a document and read it back:
```bash
curl -O https://kronotop.com/kronotop-quickstart.yaml
docker compose -f kronotop-quickstart.yaml up
```
The [Quickstart](/docs/quickstart/) covers the rest: how to connect and run your first query.
Questions and feedback are welcome on [Discord](https://discord.gg/VPRNvdh2C).
# Architecture
> How Kronotop is structured, from the RESP front end down to the FoundationDB and Volume storage layers.
RESP-compatible clients talk to a Kronotop member over TCP. The member hosts the data models: [Bucket](/docs/bucket/) for documents, [ZMap](/docs/zmap/) for ordered key-value data. Both share the same sessions, [namespaces](/docs/namespaces/), and [transactions](/docs/transactions/). Underneath, FoundationDB stores metadata, indexes, ZMap data, and cluster state, while [Volume](/docs/volume/), Kronotop’s storage engine, keeps document bodies on the member’s local disk.
The rest of this page explains what Kronotop delegates to FoundationDB, how data is split into shards, and how shards are replicated.
## Wire Protocol and Sessions
Kronotop speaks RESP2 and RESP3 and works with existing RESP-compatible clients. There is no separate query endpoint or admin protocol; everything, including cluster administration, is a RESP command. Each member listens on two ports: one for clients, one for cluster administration.
Every client connection is bound to a [session](/docs/sessions/). The session holds its attributes, its current namespace, the active transaction if one is open, and its cursors.
## Kronotop and FoundationDB
Every Kronotop [transaction](/docs/transactions/) is a FoundationDB transaction; strict serializability, conflict detection, and the ordered keyspace come from FoundationDB. Kronotop adds the RESP front end, the document layer with its query engine and indexes, and the Volume storage engine on top.
FoundationDB holds everything that must be transactional and small: bucket metadata and index entries, ZMap data, namespace directories, volume metadata, and cluster state. Document bodies are the exception. FoundationDB is optimized for small key-value pairs and enforces a 100 KB value-size limit, which document bodies routinely exceed. [Volume](/docs/volume/) offloads them to append-only segment files on the member’s local disk and keeps only pointers in FoundationDB.
The write path preserves transactional guarantees across the split: a document body is appended to a segment file and flushed to disk first; only after the flush succeeds is the metadata committed to FoundationDB. Metadata never references content that has not been persisted.
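The ordering described above (persist the body first, commit the pointer second) can be illustrated with a toy model. This is not Volume's implementation: a plain file stands in for a segment, a Python dict stands in for the metadata kept in FoundationDB, and `write_document` and `read_document` are hypothetical names.

```python
import os


def write_document(segment_path, metadata, doc_id, body):
    """Append the body, force it to disk, and only then record the pointer."""
    with open(segment_path, "ab") as segment:
        offset = segment.seek(0, os.SEEK_END)   # current end of the segment
        segment.write(body)
        segment.flush()
        os.fsync(segment.fileno())              # body is durable before any metadata exists
    metadata[doc_id] = (offset, len(body))      # stand-in for the FoundationDB commit
    return metadata[doc_id]


def read_document(segment_path, metadata, doc_id):
    """Resolve the pointer and read the body back from the segment."""
    offset, length = metadata[doc_id]
    with open(segment_path, "rb") as segment:
        segment.seek(offset)
        return segment.read(length)
```

If the process fails between the flush and the metadata write, the segment holds bytes that nothing points to, which is harmless. The reverse order could leave a pointer to content that was never persisted.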
## Sharding
Bucket data is partitioned into shards. Each shard owns exactly one volume, named after the shard (`bucket-shard-0`, `bucket-shard-1`, and so on). A bucket spans one or more shards, so its documents are distributed across one or more volumes. Within a volume, each bucket’s data is isolated by a prefix.
Shard ownership is assigned through cluster routing: one member is the primary and serves writes, standby members replicate from it and can be promoted. Each shard also carries a status that controls traffic: `READWRITE`, `READONLY`, or `INOPERABLE`. Routing and status are managed with the [admin command interface](/docs/admin/cluster/operations-guide/).
ZMap data is not sharded by Kronotop. It lives directly in FoundationDB, which partitions its own keyspace automatically.
## Replication
Volume replication is asynchronous and primary-to-standby. Each shard’s volume is replicated independently: standbys pull from the primary, first copying existing segment data in chunks until they reach the primary’s current write position, then streaming incremental changes from a changelog maintained in FoundationDB. Replication progress is also persisted in FoundationDB, so a standby can restart and resume exactly where it left off.
Replication starts automatically when a standby is assigned through cluster routing. Promoting a standby and reassigning shards are explicit operator actions. See [Volume replication](/docs/volume/#replication) for the mechanics and the [operations guide](/docs/admin/cluster/operations-guide/) for the commands.
## Cluster Coordination
All coordination state goes through FoundationDB; there is no separate consensus layer or gossip protocol. Members do not talk to each other to agree on cluster state; they read and write it in FoundationDB.
Failure detection works the same way. Each member periodically increments a heartbeat counter in FoundationDB, and other members expect that counter to keep advancing. A member whose counter stalls beyond a configured silent period is suspected dead. This is a local judgement made by each observer, not a cluster-wide consensus, and a suspected member drops off the list as soon as its heartbeats resume. See [health monitoring](/docs/admin/cluster/operations-guide/#health-monitoring) for the details.
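The observer side of this scheme can be sketched as follows. This is a minimal in-memory model, not Kronotop's code: `HeartbeatObserver` is a hypothetical class, counters are passed in directly instead of being read from FoundationDB, and time is an explicit argument so the behavior is easy to follow.

```python
class HeartbeatObserver:
    """Tracks when each member's heartbeat counter last advanced."""

    def __init__(self, silent_period):
        self.silent_period = silent_period
        self.last_seen = {}  # member -> (counter, time the counter last advanced)

    def observe(self, member, counter, now):
        previous = self.last_seen.get(member)
        if previous is None or counter > previous[0]:
            self.last_seen[member] = (counter, now)   # counter moved forward

    def suspected(self, now):
        """Members whose counter has stalled longer than the silent period."""
        return sorted(
            member
            for member, (_, advanced_at) in self.last_seen.items()
            if now - advanced_at > self.silent_period
        )
```

Because the suspicion is derived on demand from the last advance time, a member stops being suspected as soon as a newer counter value is observed, which matches the local, non-consensus nature of the check.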
# Bucket
> A bucket is a named collection of BSON documents that lives inside a namespace.
## Overview
A bucket is a named collection of BSON documents that lives inside a [namespace](/docs/namespaces/). Buckets are the primary unit of the document model in Kronotop. You insert documents into a bucket, query them with BQL (Bucket Query Language), and manage indexes on their fields.
Buckets are distributed across one or more shards. Each shard is owned by a primary member and optionally replicated to standby members. The number of shards is configured at the cluster level, and shards are assigned to a bucket at creation time.
## Documents
Documents are stored internally as BSON. Clients can send documents as JSON; Kronotop converts them to BSON on ingestion. The session’s `reply_type` attribute controls whether query results are returned as JSON or BSON.
Every document has an `_id` field of type ObjectId. If the client omits `_id` on insert, Kronotop generates one automatically. If the client provides an `_id`, it must be a valid ObjectId. The `_id` field is immutable. It cannot be changed after insertion.
```kronotop
> BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30}'
1) "6835a1c0e4b0f72a3c000001"
> BUCKET.INSERT users DOCS '{"name": "Bob", "age": 25}' '{"name": "Carol", "age": 42}'
1) "6835a1c0e4b0f72a3c000002"
2) "6835a1c0e4b0f72a3c000003"
```
The returned value is a list of ObjectIds assigned to the inserted documents.
## Query Language (BQL)
BQL is the filter language used by `BUCKET.QUERY`, `BUCKET.DELETE`, and `BUCKET.UPDATE`. Filters are JSON objects containing field names and operator expressions.
### Comparison Operators
| Operator | Description |
| -------- | ------------------------ |
| `$eq` | Equal to |
| `$ne` | Not equal to |
| `$gt` | Greater than |
| `$gte` | Greater than or equal to |
| `$lt` | Less than |
| `$lte` | Less than or equal to |
### Logical Operators
| Operator | Description |
| -------- | --------------------------------- |
| `$and` | All conditions must match |
| `$or` | At least one condition must match |
| `$nor` | None of the conditions may match |
| `$not` | Negates a condition |
### Array Operators
| Operator | Description |
| ------------ | ----------------------------------------- |
| `$in` | Field matches any value in the array |
| `$nin` | Field matches none of the values |
| `$all` | Array field contains all specified values |
| `$elemMatch` | Array element matches a sub-query |
| `$size` | Array has the specified length |
### Other Operators
| Operator | Description |
| --------- | ------------------------- |
| `$exists` | Field exists or is absent |
### Sorting and Pagination
`BUCKET.QUERY` and `BUCKET.UPDATE` accept `SORTBY` to control result ordering and `LIMIT` to cap the number of documents processed per batch. Results are paginated through cursors. Call `BUCKET.ADVANCE` to fetch the next batch, and `BUCKET.CLOSE` to release the cursor when done.
```kronotop
> BUCKET.QUERY users '{"age": {"$gte": 18}}' SORTBY name ASC LIMIT 10
1# "cursor_id" => (integer) 1
2# "entries" =>
1) {"_id": "6835a1c0e4b0f72a3c000001", "name": "Alice", "age": 30}
2) {"_id": "6835a1c0e4b0f72a3c000002", "name": "Bob", "age": 25}
> BUCKET.ADVANCE QUERY 1
1# "cursor_id" => (integer) 1
2# "entries" => 1) {"_id": "6835a1c0e4b0f72a3c000003", "name": "Carol", "age": 42}
> BUCKET.CLOSE QUERY 1
OK
```
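On the client side, the query, advance, and close sequence usually ends up in a loop like the one below. This is a sketch under stated assumptions: `query`, `advance`, and `close` are hypothetical callables that wrap `BUCKET.QUERY`, `BUCKET.ADVANCE`, and `BUCKET.CLOSE` in your RESP client, replies are assumed to be decoded into dicts with `cursor_id` and `entries`, and an empty batch is assumed to signal that the cursor is exhausted.

```python
def drain(query, advance, close):
    """Collect every batch from a cursor and always release it afterwards."""
    reply = query()
    cursor_id = reply["cursor_id"]
    documents = []
    try:
        while reply["entries"]:               # assumption: empty batch means done
            documents.extend(reply["entries"])
            reply = advance(cursor_id)
    finally:
        close(cursor_id)                      # release the cursor even on errors
    return documents
```

Closing in a `finally` block keeps the session from accumulating open cursors when processing a batch raises an exception.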
## Query Processing
Queries pass through a multi-stage pipeline before execution:
1. **Parser** - Parses the BQL filter string into an abstract syntax tree (`BqlParser`).
2. **Logical Planner** - Converts the AST into a logical plan and applies optimization transforms: flattening nested AND/OR nodes, eliminating double negation, detecting contradictions, removing tautologies, folding constants, and pruning redundant conditions.
3. **Physical Planner** - Converts the logical plan into a physical plan by selecting scan strategies (index scan, range scan, or full scan) based on available indexes.
4. **Optimizer** - Applies rule-based optimization passes: eliminating redundant scans, consolidating adjacent range scans, introducing index intersections, and ordering scans by selectivity.
5. **Executor** - Executes the physical plan against indexes and the volume layer, applying any residual predicates that could not be pushed into index scans.
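Two of the logical-planner transforms named above, flattening nested AND/OR nodes and eliminating double negation, are easy to show on a toy tree. The Python sketch below is illustrative only: the tuple-based node layout and the `simplify` function are assumptions made for the example and do not reflect Kronotop's internal plan representation.

```python
def simplify(node):
    """Flatten nested AND/OR nodes and remove double negation.

    Nodes are tuples: ("AND", child, ...), ("OR", child, ...),
    ("NOT", child), or a leaf such as ("PRED", "age", "$gt", 18).
    """
    op, *args = node
    if op == "NOT":
        child = simplify(args[0])
        if child[0] == "NOT":
            return child[1]                    # NOT(NOT(x)) becomes x
        return ("NOT", child)
    if op in ("AND", "OR"):
        flat = []
        for child in map(simplify, args):
            if child[0] == op:
                flat.extend(child[1:])         # AND(AND(a, b), c) becomes AND(a, b, c)
            else:
                flat.append(child)
        return (op, *flat)
    return node                                # leaf predicate, nothing to rewrite
```

A flatter tree gives later stages fewer node shapes to consider, which is the usual motivation for running such rewrites before physical planning.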
### Plan Caching
Query plans are cached by query shape, the structural fingerprint of a query without its parameter values. Two queries with the same operators, field paths, and value types but different literal values share the same cached plan. The cache holds up to 200 plans per bucket and is invalidated when indexes are created or dropped.
Use `BUCKET.EXPLAIN` to inspect the plan for a query and see whether it was served from the cache.
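One way to picture a query shape is to replace every literal in the filter with the name of its type and use the result as a cache key. The Python sketch below illustrates that idea; `query_shape` and `shape_key` are hypothetical names, and the real fingerprinting scheme may differ.

```python
import json


def query_shape(node):
    """Keep field paths and operators, replace literals with their type names."""
    if isinstance(node, dict):
        return {key: query_shape(value) for key, value in sorted(node.items())}
    if isinstance(node, list):
        return [query_shape(item) for item in node]
    return type(node).__name__                 # e.g. "int", "str", "bool"


def shape_key(filter_json):
    """Serialize the shape into a stable string usable as a cache key."""
    return json.dumps(query_shape(json.loads(filter_json)), sort_keys=True)
```

Under this model `{"age": {"$gt": 25}}` and `{"age": {"$gt": 40}}` map to the same key, while changing the operator or the value type produces a different key.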
## Indexes
### Primary Index
Every bucket has a primary index on the `_id` field, created automatically at bucket creation time. The primary index maps ObjectId values to document locations. It is always in `READY` status and cannot be dropped.
### Secondary Indexes
Secondary indexes are user-defined indexes on document fields. Create them with `BUCKET.INDEX CREATE`, providing a JSON schema that maps field selectors to their BSON type and optional flags.
```kronotop
> BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32"}}'
OK
```
A query returns the same results regardless of which indexes exist. Indexes only affect performance.
### Multi-Key Indexes
When a field contains an array, a multi-key index creates a separate entry for each element of the array, making queries on array contents efficient. Create a multi-key index by setting `multi_key` to `true` in the field schema:
```kronotop
> BUCKET.INDEX CREATE users '{"tags": {"bson_type": "string", "multi_key": true}}'
OK
```
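The "one entry per array element" idea can be shown with a small in-memory model. This is a sketch, not the on-disk index layout: `build_multi_key_index` is a hypothetical function, and a dict of sets stands in for the ordered index entries kept in FoundationDB.

```python
from collections import defaultdict


def build_multi_key_index(documents, field):
    """Map each array element to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, document in documents.items():
        for element in document.get(field, []):
            index[element].add(doc_id)         # one entry per array element
    return index
```

With documents `d1` tagged `["urgent", "bug"]` and `d2` tagged `["bug"]`, looking up `"bug"` returns both ids without scanning either document.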
### Compound Indexes
A compound index covers multiple fields in a defined order. Instead of maintaining separate single-field indexes, a compound index lets the query engine satisfy multi-field predicates with a single index scan. Fields are declared inside the `$compound` key of the index schema:
```kronotop
> BUCKET.INDEX CREATE <bucket> '{
"$compound": [{
"name": "<index-name>",
"fields": [
{"selector": "<field>", "bson_type": "<bson-type>"},
{"selector": "<field>", "bson_type": "<bson-type>"}
]
}]
}'
OK
```
See [Compound Indexes](/docs/bucket/compound-index/) for the prefix rule, range scan behavior, and full constraints.
### Vector Indexes
A vector index supports similarity search on fields that contain fixed-dimension numeric vectors. It uses [JVector](https://github.com/datastax/jvector) and supports three distance functions: `cosine`, `euclidean`, and `dot_product`. Create a vector index with the `$vector` schema:
```kronotop
> BUCKET.INDEX CREATE products '{
"$vector": {"field": "embedding", "dimensions": 3, "distance": "cosine"}
}'
OK
```
| Parameter | Required | Description |
| ------------ | -------- | ------------------------------------------------------------- |
| `field` | Yes | Document field containing the vector. Supports dot notation. |
| `dimensions` | Yes | Number of dimensions (must be >= 1). |
| `distance` | Yes | Similarity function: `cosine`, `euclidean`, or `dot_product`. |
| `name` | No | Index name. Auto-generated if omitted. |
Vector indexes require single-shard buckets. Use `BUCKET.VECTOR` to search. See [BUCKET.VECTOR](/docs/bucket/commands/bucket-vector/) for query syntax and filtering options.
### Index Lifecycle
Indexes are built asynchronously. After creation, an index progresses through these states:
| Status | Description |
| ---------- | ------------------------------------- |
| `WAITING` | Queued for building |
| `BUILDING` | Background task is populating entries |
| `READY` | Available for query planning |
| `DROPPED` | Marked for deletion and cleanup |
| `FAILED` | Build failed |
Use `BUCKET.INDEX TASKS` to monitor the progress of index builds.
## Sharding
Buckets span one or more shards. At creation time, shards can be assigned explicitly or selected automatically via round-robin:
```kronotop
> BUCKET.CREATE users
OK
> BUCKET.CREATE orders SHARDS 0 1
OK
```
When no `SHARDS` clause is given, Kronotop picks a shard using round-robin selection. Use `BUCKET.LOCATE` to see which shards a bucket spans and the addresses of their primary and standby members:
```kronotop
> BUCKET.LOCATE users
1) (integer) 0 # shard id
2) "10.0.0.1:5484" # primary address
3) 1) "10.0.0.2:5484" # standby addresses
```
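The round-robin selection mentioned above simply cycles through the available shard ids so that consecutive buckets land on different shards. A minimal Python sketch, assuming a fixed list of shard ids and using the hypothetical name `RoundRobinShardPicker`:

```python
from itertools import cycle


class RoundRobinShardPicker:
    """Hands out shard ids in a repeating order."""

    def __init__(self, shard_ids):
        self._shard_ids = cycle(shard_ids)

    def pick(self):
        return next(self._shard_ids)
```

With three shards, four consecutive picks return shard 0, 1, 2, and then 0 again. How Kronotop persists or coordinates the rotation across members is not covered by this sketch.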
## Two-Phase Removal
Deleting a bucket follows the same two-phase pattern as [namespace removal](/docs/namespaces/#two-phase-removal):
1. **`BUCKET.REMOVE`**: Marks the bucket as logically removed. The bucket becomes inaccessible and background workers (index maintenance, replication) are signaled to stop.
2. **`BUCKET.PURGE`**: Permanently deletes the bucket data. Before proceeding, the command enforces a distributed sync barrier that verifies every live cluster member has observed the removal event. If the barrier is not satisfied, the command returns `BARRIERNOTSATISFIED` and should be retried.
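The two phases and the retry on `BARRIERNOTSATISFIED` can be sketched from the client side as follows. This is a minimal sketch under stated assumptions: `send` is a hypothetical callable that forwards command parts to the server and raises `RuntimeError` with the server's error text, and both commands are assumed to take the bucket name as their only argument.

```python
import time


def purge_bucket(send, bucket, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Run BUCKET.REMOVE, then retry BUCKET.PURGE until the barrier is satisfied.

    `send` is a hypothetical callable (not part of any real client library).
    """
    send("BUCKET.REMOVE", bucket)  # phase 1: logical removal
    for attempt in range(1, attempts + 1):
        try:
            return send("BUCKET.PURGE", bucket)  # phase 2: physical deletion
        except RuntimeError as err:
            barrier_pending = str(err).startswith("BARRIERNOTSATISFIED")
            if not barrier_pending or attempt == attempts:
                raise  # unrelated error, or out of retries
            sleep(delay_seconds)  # give other members time to observe the removal
```

A fixed delay keeps the sketch short; a real client might prefer a backoff with an upper bound.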
## Commands
| Command | Description |
| ------------------------------------------------------- | ------------------------------------------------ |
| [BUCKET.CREATE](/docs/bucket/commands/bucket-create/) | Create a new bucket |
| [BUCKET.INSERT](/docs/bucket/commands/bucket-insert/) | Insert documents into a bucket |
| [BUCKET.QUERY](/docs/bucket/commands/bucket-query/) | Query documents using a BQL filter |
| [BUCKET.DELETE](/docs/bucket/commands/bucket-delete/) | Delete documents matching a filter |
| [BUCKET.UPDATE](/docs/bucket/commands/bucket-update/) | Update documents matching a filter |
| [BUCKET.EXPLAIN](/docs/bucket/commands/bucket-explain/) | Show the execution plan for a query |
| [BUCKET.INDEX](/docs/bucket/commands/bucket-index/) | Create, list, describe, drop, or analyze indexes |
| [BUCKET.ADVANCE](/docs/bucket/commands/bucket-advance/) | Fetch the next batch from a cursor |
| [BUCKET.CLOSE](/docs/bucket/commands/bucket-close/) | Close a cursor and release its resources |
| [BUCKET.CURSORS](/docs/bucket/commands/bucket-cursors/) | List active cursors for the session |
| [BUCKET.VECTOR](/docs/bucket/commands/bucket-vector/) | Perform vector similarity search on a bucket |
| [BUCKET.LOCATE](/docs/bucket/commands/bucket-locate/) | Show shard routing information for a bucket |
| [BUCKET.LIST](/docs/bucket/commands/bucket-list/) | List all buckets in the current namespace |
| [BUCKET.REMOVE](/docs/bucket/commands/bucket-remove/) | Mark a bucket for removal (phase 1) |
| [BUCKET.PURGE](/docs/bucket/commands/bucket-purge/) | Permanently delete a removed bucket (phase 2) |
# Collation
> Collation controls how strings are compared and ordered.
Collation controls how strings are compared and ordered. By default, Kronotop compares strings using binary (byte-order) comparison, which works well for ASCII text but produces unexpected results for accented characters, mixed-case text, and non-Latin scripts. Collation uses [ICU4J](https://unicode-org.github.io/icu/userguide/icu4j/) (International Components for Unicode for Java) to provide locale-aware string comparison, enabling case-insensitive searches, accent-insensitive matching, and natural language ordering. ICU4J handles all collation behavior: sort key generation, strength levels, numeric ordering, and locale-specific rules.
Kronotop supports collation at three levels: bucket, index, and query.
## Collation Specification
A collation is specified as a JSON object. Only `locale` is required; all other fields have sensible defaults:
```json
{"locale": "tr"}
```
The full set of fields:
| Field | Type | Required | Default | Description |
| ------------------ | ------- | -------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `locale` | string | Yes | — | ICU locale identifier (e.g., `"en"`, `"tr"`, `"fr"`, `"de"`, `"es"`). |
| `strength`         | integer | No       | `3`               | Comparison strength level (1-5). See [Strength Levels](#strength-levels).                                                                   |
| `case_level`       | boolean | No       | `false`           | When `true`, adds a case distinction level between secondary and tertiary comparisons.                                                     |
| `case_first` | string | No | `"off"` | Controls whether uppercase or lowercase letters sort first: `"upper"`, `"lower"`, or `"off"`. |
| `numeric_ordering` | boolean | No | `false` | When `true`, digit substrings are compared as numbers. See [Numeric Ordering](#numeric-ordering). |
| `alternate` | string | No | `"non-ignorable"` | Controls handling of spaces and punctuation: `"non-ignorable"` or `"shifted"`. See [Alternate and MaxVariable](#alternate-and-maxvariable). |
| `backwards` | boolean | No | `false` | When `true`, reverses the secondary (accent) comparison pass. Useful for some French sorting rules. |
| `normalization` | boolean | No | `false` | When `true`, performs Unicode normalization before comparison. |
| `max_variable` | string | No | `"punct"` | When `alternate` is `"shifted"`, controls which characters become ignorable: `"punct"` or `"space"`. |
A full specification with all fields:
```json
{
"locale": "en",
"strength": 2,
"case_level": false,
"case_first": "off",
"numeric_ordering": false,
"alternate": "non-ignorable",
"backwards": false,
"normalization": false,
"max_variable": "punct"
}
```
## Strength Levels
The `strength` field controls how fine-grained the comparison is. Lower strengths ignore more differences.
| Strength | Name | Behavior | Example |
| -------- | ---------- | -------------------------------------------------------------- | ------------------------------------------------------- |
| 1 | PRIMARY | Base letter only. Case and accents are ignored. | `"cafe"` = `"CAFE"` = `"café"` |
| 2 | SECONDARY | Base letter + accents. Case is ignored. | `"cafe"` = `"Cafe"`, but `"cafe"` ≠ `"café"` |
| 3 | TERTIARY | Base letter + accents + case. Default. | `"cafe"` ≠ `"Cafe"` ≠ `"café"` |
| 4 | QUATERNARY | Adds punctuation distinctions when `alternate` is `"shifted"`. | `"black-bird"` ≠ `"blackbird"` (with shifted alternate) |
| 5 | IDENTICAL | All differences are significant, including code point. | Distinguishes canonically equivalent sequences. |
Strengths 1 and 2 are the most commonly used. Strength 1 provides the broadest matching (case-insensitive and accent-insensitive), while strength 2 provides case-insensitive but accent-sensitive matching.
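To get a feel for the first three strength levels without an ICU installation, the sketch below approximates them with Python's standard library. It is not ICU and ignores locale rules entirely; `approx_key` is a hypothetical helper that only mimics the case and accent behavior described in the table.

```python
import unicodedata


def approx_key(text, strength):
    """Rough stand-in for a sort key at strengths 1-3 (not real ICU)."""
    if strength >= 3:
        # Tertiary: case and accents are both significant.
        return unicodedata.normalize("NFC", text)
    decomposed = unicodedata.normalize("NFD", text)
    if strength == 1:
        # Primary: drop combining marks (accents) before ignoring case.
        decomposed = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Secondary keeps the accents but ignores case.
    return decomposed.casefold()


print(approx_key("cafe", 1) == approx_key("café", 1))  # accents ignored at strength 1
print(approx_key("cafe", 2) == approx_key("café", 2))  # accents significant at strength 2
```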
## Numeric Ordering
When `numeric_ordering` is `true`, contiguous digit substrings within strings are compared as numbers instead of character by character. This produces natural sorting for strings with embedded numbers.
Without numeric ordering (default), strings with embedded numbers are compared character by character, the same ordering as binary comparison without any collation:
```plaintext
"item1" < "item10" < "item2" < "item20"
```
With `numeric_ordering: true`:
```plaintext
"item1" < "item2" < "item10" < "item20"
```
### Limitations
* Works for positive integers embedded in strings.
* Negative numbers are not supported. The minus sign is treated as a separator, so `"-2"` and `"-10"` do not sort numerically.
* Decimal numbers are not supported. The decimal point is treated as a separator, so `"2.1"` and `"2.10"` do not sort as expected.
* The `+` prefix is not supported as a positive sign. It is treated as a separator, so `"+2"` and `"+10"` do not sort as signed numbers.
* Exponent notation is not supported. In strings like `"1e5"` or `"2E3"`, the letter acts as a separator and digit groups on each side are compared independently as integers.
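The behavior and its limitations can be approximated with a small sort key that splits a string into digit runs and non-digit runs. This is an illustrative sketch only, not how ICU4J builds sort keys; `numeric_key` is a hypothetical helper.

```python
import re

ASCII_DIGITS = "0123456789"


def numeric_key(text):
    """Split into digit and non-digit runs; digit runs compare as integers."""
    parts = re.findall(r"[0-9]+|[^0-9]+", text)
    # Tuples keep element types aligned so digit and text runs stay comparable.
    return [(0, int(p), "") if p[0] in ASCII_DIGITS else (1, 0, p) for p in parts]


print(sorted(["item20", "item2", "item10", "item1"], key=numeric_key))
# The minus sign is just a separator, so the digit groups 2 and 10 are compared
# as positive integers and "-2" lands before "-10".
print(sorted(["-10", "-2"], key=numeric_key))
```

The second call shows why negative numbers do not sort numerically: only the digit groups are interpreted as numbers.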
## Alternate and MaxVariable
The `alternate` and `max_variable` fields work together to control whether spaces and punctuation are significant during comparison.
| `alternate` | `max_variable` | Effect |
| ----------------- | -------------- | ---------------------------------------------------------------------------------------------- |
| `"non-ignorable"` | (ignored) | Spaces and punctuation are significant at all strength levels. This is the default. |
| `"shifted"` | `"punct"` | Spaces and punctuation are ignorable. `"black-bird"` = `"blackbird"` = `"black bird"`. |
| `"shifted"` | `"space"` | Only spaces are ignorable. `"black bird"` = `"blackbird"`, but `"black-bird"` ≠ `"blackbird"`. |
`max_variable` has no effect when `alternate` is `"non-ignorable"`.
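The table can be mimicked with a function that removes the ignorable characters before comparing. This is a simplified sketch that only knows ASCII spaces and punctuation; `strip_variable` is a hypothetical helper, and real ICU collation handles far more characters.

```python
import string


def strip_variable(text, alternate="non-ignorable", max_variable="punct"):
    """Drop the characters that the given settings would treat as ignorable."""
    if alternate != "shifted":
        return text  # max_variable has no effect here
    ignorable = " " if max_variable == "space" else " " + string.punctuation
    return "".join(c for c in text if c not in ignorable)


print(strip_variable("black-bird", "shifted", "punct"))  # hyphen is ignorable
print(strip_variable("black-bird", "shifted", "space"))  # hyphen stays significant
```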
## Collation Levels
### Bucket-level collation
Set a default collation for the entire bucket at creation time. All string indexes in the bucket inherit this collation unless overridden at the index level.
```kronotop
> BUCKET.CREATE users COLLATION '{"locale": "en", "strength": 2}'
OK
```
Every string index created in this bucket uses case-insensitive English collation by default.
### Index-level collation
Set collation on individual indexes in the index schema. This overrides the bucket-level collation for that specific index. Collation is only valid for `string` type fields.
Single-field index:
```kronotop
> BUCKET.INDEX CREATE users '{
"username": {
"bson_type": "string",
"collation": {"locale": "tr", "strength": 2}
}
}'
OK
```
Compound index:
```kronotop
> BUCKET.INDEX CREATE products '{
"$compound": [{
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
],
"collation": {"locale": "en"}
}]
}'
OK
```
Compound indexes require at least one `string` field for collation to apply.
### Query-level collation
Override collation for a single operation using the `COLLATION` parameter. This takes the highest precedence and overrides both index-level and bucket-level collation.
Query:
```kronotop
> BUCKET.QUERY users '{"name": {"$eq": "alice"}}' COLLATION '{
"locale": "en",
"strength": 1
}'
```
Update:
```kronotop
> BUCKET.UPDATE users '{"name": {"$eq": "alice"}}' '{
"$set": {"verified": true}
}' COLLATION '{"locale": "en", "strength": 1}'
```
Delete:
```kronotop
> BUCKET.DELETE users '{"name": {"$eq": "alice"}}' COLLATION '{
"locale": "en",
"strength": 1
}'
```
## Resolution Precedence
Collation resolution follows different rules depending on whether the operation is reading (evaluating predicates) or writing (building index keys).
### Read path (predicate evaluation)
When evaluating filter conditions in `BUCKET.QUERY`, `BUCKET.UPDATE`, and `BUCKET.DELETE`, the most specific collation wins:
1. **Query-level** — collation specified in the command (`COLLATION` parameter)
2. **Index-level (single-field)** — collation defined on a single-field index for this selector
3. **Index-level (compound)** — if all READY compound indexes containing the selector as a `string` field agree on the same collation, that collation is used; if any two indexes disagree, the entire compound step is skipped and resolution falls through to bucket-level. Compound indexes with no collation defined are excluded from the agreement check. They neither contribute a collation nor trigger a conflict. For example, if one compound index has collation `"de"` and another has no collation, the result is `"de"`.
4. **Bucket-level** — default collation set at bucket creation
5. **Binary comparison** — no collation; strings are compared byte by byte
When no collation is specified at any level, strings are compared using binary comparison.
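The read-path order above can be expressed as a single function. This is a sketch of the documented rules, not Kronotop's code: the function name, its parameters, and the dictionary keys `status`, `string_fields`, and `collation` are all hypothetical, and `None` stands for binary comparison.

```python
def resolve_read_collation(selector, query=None, single_field=None,
                           compound_indexes=(), bucket=None):
    """Return the effective collation for predicate evaluation, or None for binary."""
    if query is not None:
        return query  # 1. query-level
    if single_field is not None:
        return single_field  # 2. single-field index on this selector
    # 3. READY compound indexes that hold the selector as a string field.
    #    Indexes without a collation are left out of the agreement check.
    candidates = [
        idx["collation"] for idx in compound_indexes
        if idx["status"] == "READY"
        and selector in idx["string_fields"]
        and idx["collation"] is not None
    ]
    if candidates and all(c == candidates[0] for c in candidates):
        return candidates[0]
    # 4. bucket-level, or 5. binary comparison when bucket is None.
    return bucket
```

When two compound indexes disagree, the compound step contributes nothing and the bucket-level collation (if any) applies.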
### Write path (index key generation)
When writing index entries during `BUCKET.INSERT` and the index-update phase of `BUCKET.UPDATE`, the query-level `COLLATION` parameter has no effect. Index keys are always built with the collation that was fixed at index creation time:
1. **Index-level** — collation defined on the index
2. **Bucket-level** — default collation set at bucket creation
3. **Binary comparison** — no collation; strings are stored as raw bytes
This is intentional: an index is built with a single, stable collation so that its sort keys remain consistent across all writes. Accepting a different collation at write time would corrupt the sort order of existing entries.
**Practical consequence for `BUCKET.UPDATE`**: the `COLLATION` parameter controls which documents the filter matches, but the index keys written for the updated document always use the index’s own collation, regardless of what `COLLATION` was specified in the command.
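The write-path rule is shorter, because the command's `COLLATION` never takes part. In the hypothetical sketch below, `query_collation` is accepted only to make that point explicit; `None` again stands for binary comparison.

```python
def resolve_write_collation(index_collation=None, bucket_collation=None,
                            query_collation=None):
    """Return the collation used to build index keys; query_collation is ignored."""
    if index_collation is not None:
        return index_collation  # 1. index-level
    return bucket_collation  # 2. bucket-level, or 3. binary when None
```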
## Index Compatibility
Collation affects which indexes the query engine can use. When a query’s effective collation does not match an index’s collation, the index is skipped and the query falls back to a full scan.
The rules:
* A `string` index built with a specific collation is only used when the query’s effective collation matches.
* If a query specifies a different collation than the index, the index is skipped.
* A `string` index built without collation (binary) is skipped when the query specifies a collation.
* Non-string indexes (`int32`, `double`, `boolean`, etc.) are unaffected by collation. They are used regardless of the query’s collation setting.
* Compound indexes with at least one `string` field are skipped entirely when the collation does not match.
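These rules reduce to one check per index. The sketch below is an assumption-laden model, not the planner's real logic: `index_usable` is a hypothetical name, collations are plain dictionaries compared for equality, and `None` means binary (no collation).

```python
def index_usable(field_types, index_collation, query_collation):
    """Decide whether an index qualifies under the query's effective collation.

    field_types lists the BSON types of the indexed fields (one entry for a
    single-field index, several for a compound index).
    """
    if "string" not in field_types:
        return True  # non-string indexes are unaffected by collation
    return index_collation == query_collation


en1 = {"locale": "en", "strength": 1}
print(index_usable(["string"], en1, en1))    # matching collation
print(index_usable(["string"], None, en1))   # binary index, collated query
```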
Use `BUCKET.EXPLAIN` to verify that the expected index is being used when collation is involved:
```kronotop
> BUCKET.EXPLAIN users '{"name": {"$eq": "alice"}}' COLLATION '{
"locale": "en",
"strength": 1
}'
```
If the `BUCKET.EXPLAIN` output shows a `FullScan` instead of an `IndexScan`, the collation may not match the index.
## Practical Example
The examples below use RESP3 protocol output. Switch to RESP3 with `HELLO 3` before running the commands.
Create a bucket with Turkish collation at strength 1 (case-insensitive and accent-insensitive), and a string index on `city`:
```kronotop
127.0.0.1:5484> BUCKET.CREATE cities COLLATION '{
"locale": "tr",
"strength": 1
}' INDEXES '{"city": {"bson_type": "string"}}'
OK
```
The `city` index inherits the bucket’s Turkish collation.
Insert documents with different case variations:
```kronotop
BUCKET.INSERT cities DOCS '{"city": "istanbul", "population": 16000000}'
BUCKET.INSERT cities DOCS '{"city": "Istanbul", "population": 16000000}'
BUCKET.INSERT cities DOCS '{"city": "ISTANBUL", "population": 16000000}'
BUCKET.INSERT cities DOCS '{"city": "ankara", "population": 5700000}'
```
Query for `"istanbul"` — with strength 1, all case variations match:
```kronotop
127.0.0.1:5484> BUCKET.QUERY cities '{"city": {"$eq": "istanbul"}}'
1# "cursor_id" => (integer) 1
2# "entries" =>
1) {"_id": "682c5a006597b10d87d13500", "city": "istanbul", "population": 16000000}
2) {"_id": "682c5a006597b10d87d13501", "city": "Istanbul", "population": 16000000}
3) {"_id": "682c5a006597b10d87d13502", "city": "ISTANBUL", "population": 16000000}
```
All three documents match because at primary strength, `"istanbul"`, `"Istanbul"`, and `"ISTANBUL"` are considered equal under Turkish locale rules.
Override with query-level collation at strength 3 (case-sensitive) — only exact matches are returned:
```kronotop
127.0.0.1:5484> BUCKET.QUERY cities '{"city": {"$eq": "istanbul"}}' COLLATION '{
"locale": "tr",
"strength": 3
}'
1# "cursor_id" => (integer) 2
2# "entries" =>
1) {"_id": "682c5a006597b10d87d13500", "city": "istanbul", "population": 16000000}
```
Only the exact match is returned. The query-level collation overrides the bucket’s strength 1 setting for this single query.
Delete with case-insensitive collation — all case variations are deleted:
```kronotop
127.0.0.1:5484> BUCKET.DELETE cities '{"city": {"$eq": "istanbul"}}' COLLATION '{
"locale": "tr",
"strength": 1
}'
1# "cursor_id" => (integer) 3
2# "object_ids" =>
1) "682c5a006597b10d87d13500"
2) "682c5a006597b10d87d13501"
3) "682c5a006597b10d87d13502"
```
# BUCKET.ADVANCE
> Advances a cursor to fetch or process the next batch of documents.
Advances a cursor to fetch or process the next batch of documents.
## Syntax
```kronotop
BUCKET.ADVANCE <operation> <cursor-id>
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------- | -------- | ---------------------------------------------------------------------------------------------------- |
| `operation` | string | Yes | The operation type. Must be `QUERY`, `DELETE`, or `UPDATE`. |
| `cursor-id` | integer | Yes | The cursor ID returned by the initial command (`BUCKET.QUERY`, `BUCKET.DELETE`, or `BUCKET.UPDATE`). |
## Return Value
The return value depends on the operation type.
**QUERY operation:**
Returns the next batch of matching documents. The format is the same as `BUCKET.QUERY`.
The encoding format of returned documents depends on the session’s `reply_type` setting:
| Format | Response Type | Description |
| ------ | ------------- | ------------------------------- |
| `bson` | Binary | BSON-encoded document (default) |
| `json` | String | JSON-encoded document |
RESP3 (map format):
The response is a map with two keys: `cursor_id` (integer) and `entries` (array of documents).
```kronotop
1# "cursor_id" => (integer) <cursor-id>
2# "entries" => [<document>, <document>, ...]
```
RESP2 (array format):
The response is an array with two elements: the cursor ID and a nested array of documents.
```kronotop
1) (integer) <cursor-id>
2) 1) <document>
   2) <document>
   ...
```
An empty `entries` array means no documents were available at that moment. It does not mean the cursor is exhausted. A later call may return new documents.
**DELETE operation:**
Deletes the next batch of matching documents and returns their ObjectIds.
The encoding format of returned ObjectIds depends on the session’s `object_id_format` setting (configurable via `SESSION.ATTRIBUTE SET object_id_format <value>`).
RESP3 (map format):
The response is a map with two keys: `cursor_id` (integer) and `object_ids` (array of ObjectIds).
```kronotop
1# "cursor_id" => (integer) <cursor-id>
2# "object_ids" => [<object-id>, <object-id>, ...]
```
RESP2 (array format):
The response is an array with two elements: the cursor ID and a nested array of ObjectIds.
```kronotop
1) (integer) <cursor-id>
2) 1) <object-id>
   2) <object-id>
   ...
```
An empty `object_ids` array means no documents were available at that moment. It does not mean the cursor is exhausted. A later call may return new documents.
**UPDATE operation:**
Updates the next batch of matching documents and returns their ObjectIds.
The format is the same as the DELETE operation.
## Cursor Lifecycle
Cursors are created by `BUCKET.QUERY`, `BUCKET.DELETE`, or `BUCKET.UPDATE` commands. Each cursor:
* Is bound to the session that created it
* Stores the query context (filter, sort, limit)
* Tracks the current position in the result set
* Respects the original batch size (LIMIT) from the initial command
The operation passed to `BUCKET.ADVANCE` must match the command that created the cursor. For example, a cursor created by `BUCKET.QUERY` can only be advanced with `BUCKET.ADVANCE QUERY`.
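A client-side loop over a query cursor might look like the sketch below. It assumes a hypothetical `send` callable that returns the RESP3 reply as a dictionary with `cursor_id` and `entries` keys. Because an empty batch does not prove the cursor is exhausted, the loop simply stops at the first empty batch or after `max_batches` calls, then closes the cursor.

```python
def collect_batches(send, first_reply, max_batches):
    """Advance a QUERY cursor up to max_batches times and gather the documents."""
    cursor_id = first_reply["cursor_id"]
    documents = list(first_reply["entries"])
    try:
        for _ in range(max_batches):
            reply = send("BUCKET.ADVANCE", "QUERY", cursor_id)
            if not reply["entries"]:
                break  # nothing available right now; the cursor may still be live
            documents.extend(reply["entries"])
    finally:
        # Release the query context held by the session.
        send("BUCKET.CLOSE", "QUERY", cursor_id)
    return documents
```

Closing in a `finally` block releases the session's query context even if an advance call fails.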
## Errors
| Error Code | Description |
| -------------------- | ------------------------------------------------------------------------------------ |
| `ERR`                | No previous query context found for `<operation>` operation with the given cursor id |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
## Examples
**Paginate through query results:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 100
1# "cursor_id" => (integer) 1
2# "entries" => [...] (first 100 documents)
> BUCKET.ADVANCE QUERY 1
1# "cursor_id" => (integer) 1
2# "entries" => [...] (next 100 documents)
> BUCKET.ADVANCE QUERY 1
1# "cursor_id" => (integer) 1
2# "entries" => [] (empty)
```
**Batch delete with pagination:**
```kronotop
> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 50
1# "cursor_id" => (integer) 1
2# "object_ids" => [...] (first 50 deleted ObjectIds)
> BUCKET.ADVANCE DELETE 1
1# "cursor_id" => (integer) 1
2# "object_ids" => [...] (next 50 deleted ObjectIds)
```
**Batch update with pagination:**
```kronotop
> BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' LIMIT 50
1# "cursor_id" => (integer) 1
2# "object_ids" => [...] (first 50 updated ObjectIds)
> BUCKET.ADVANCE UPDATE 1
1# "cursor_id" => (integer) 1
2# "object_ids" => [...] (next 50 updated ObjectIds)
```
# BUCKET.CLOSE
> Closes a cursor and releases its associated query context from the session.
Closes a cursor and releases its associated query context from the session.
## Syntax
```kronotop
BUCKET.CLOSE <operation> <cursor-id>
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------- | -------- | ---------------------------------------------------------------------------------------------------- |
| `operation` | string | Yes | The operation type. Must be `QUERY`, `DELETE`, or `UPDATE`. |
| `cursor-id` | integer | Yes | The cursor ID returned by the initial command (`BUCKET.QUERY`, `BUCKET.DELETE`, or `BUCKET.UPDATE`). |
## Return Value
Returns `OK` on success.
## Errors
| Error Code | Description |
| ---------- | ------------------------------------------------------------------------------------------- |
| `ERR` | `no cursor found` if the cursor does not exist or was already closed. |
| `ERR`      | `Unknown '<operation>' action` if the operation type is not `QUERY`, `DELETE`, or `UPDATE`. |
## Examples
**Close a query cursor:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 100
cursor_id -> (integer) 1
entries -> [...] (first 100 documents)
> BUCKET.CLOSE QUERY 1
OK
```
**Double-close returns an error:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 100
cursor_id -> (integer) 1
entries -> [...]
> BUCKET.CLOSE QUERY 1
OK
> BUCKET.CLOSE QUERY 1
(error) ERR no cursor found
```
**Close after pagination:**
```kronotop
> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 50
cursor_id -> (integer) 1
object_ids -> [...] (first 50 deleted)
> BUCKET.ADVANCE DELETE 1
cursor_id -> (integer) 1
object_ids -> [...] (next 50 deleted)
> BUCKET.CLOSE DELETE 1
OK
```
**Closing one cursor does not affect others:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 10
cursor_id -> (integer) 1
entries -> [...]
> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 10
cursor_id -> (integer) 2
object_ids -> [...]
> BUCKET.CLOSE QUERY 1
OK
> BUCKET.ADVANCE DELETE 2
cursor_id -> (integer) 2
object_ids -> [...] (still works)
```
# BUCKET.CREATE
> Creates a new bucket with optional shard assignment and index definitions.
Creates a new bucket with optional shard assignment and index definitions.
## Syntax
```kronotop
BUCKET.CREATE <bucket> [SHARDS <shard-id> [shard-id ...]] [INDEXES <schema>] [COLLATION <spec>] [IF-NOT-EXISTS]
```
## Parameters
| Parameter | Type | Required | Description |
| --------------- | ---------- | -------- | ------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to create. |
| `SHARDS` | integer(s) | No | One or more shard IDs to assign the bucket to. If omitted, a shard is selected automatically. |
| `INDEXES` | JSON | No | Index schema defining secondary indexes to create alongside the bucket. |
| `COLLATION` | JSON | No | Bucket-level collation spec for locale-aware string ordering. See [Collation](/docs/bucket/collation/). |
| `IF-NOT-EXISTS` | flag | No | When specified, the command returns `OK` instead of an error if the bucket already exists. |
## Return Value
Returns `OK` on success.
## Shard Assignment
When the `SHARDS` parameter is omitted, Kronotop automatically assigns the bucket to a shard using round-robin selection across available shards. This ensures even distribution of buckets across the cluster. When multiple shard IDs are provided, the bucket is assigned to all specified shards.
Regardless of how shards are selected, every shard ID is validated before the bucket is created: the shard must have a known route. If a shard has no route, the command returns an error and the bucket is not created. A bucket can span shards across multiple nodes.
## Index Schema Format
The `INDEXES` parameter accepts a JSON object that can contain single-field indexes, compound indexes, or both:
```plaintext
{
  "<selector>": { "bson_type": "<type>" [, "multi_key": true] [, "name": "<index-name>"] },
  "$compound": [ { "fields": [ { "selector": "<selector>", "bson_type": "<type>" }, ... ] [, "name": "<index-name>"] } ]
}
```
### Field selectors
Field selectors use dot notation to address nested fields inside documents. Arrays are traversed automatically. When a selector crosses an array, each element is evaluated independently.
| Selector | Targets |
| -------------- | ---------------------------------------------------------- |
| `name` | Top-level field `name`. |
| `address.city` | `city` inside the nested object `address`. |
| `tags` | The `tags` field itself (use with `multi_key` for arrays). |
| `orders.total` | `total` inside each element of the `orders` array. |
See [BUCKET.INDEX CREATE](/docs/bucket/commands/bucket-index/#bucketindex-create) for detailed examples of dot notation and array traversal.
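Dot notation with automatic array traversal can be modeled in a few lines. The sketch below is only an illustration of the rules in the table; `select` is a hypothetical helper operating on plain dictionaries and lists rather than BSON.

```python
def select(document, selector):
    """Return every value the dot-notation selector reaches, traversing arrays."""
    current = [document]
    for key in selector.split("."):
        found = []
        for node in current:
            # When the path crosses an array, each element is evaluated independently.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        current = found
    return current


doc = {
    "name": "Ada",
    "address": {"city": "Izmir"},
    "tags": ["a", "b"],
    "orders": [{"total": 10}, {"total": 25}],
}
print(select(doc, "address.city"))
print(select(doc, "orders.total"))
```

Note that `select(doc, "tags")` yields the array itself; expanding it into one entry per element corresponds to what `multi_key` does.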
### Single-field indexes
Each top-level key (other than `$compound`) is a field selector (dot-notation path). The value defines the index properties:
```json
{
"field_name": {
"bson_type": "type",
"multi_key": true|false,
"name": "optional_custom_name"
}
}
```
| Property | Type | Required | Description |
| ----------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `bson_type` | string | Yes | The BSON type of the field values. See [Supported BSON Types](/docs/bucket/commands/bucket-index/#supported-bson-types). |
| `multi_key` | boolean | No | When `true`, creates a multi-key index for array fields. Each array element generates a separate index entry. Default: `false`. |
| `name` | string | No | Custom name for the index. If omitted, a name is auto-generated from the selector and type. |
| `collation` | object | No | Collation spec for locale-aware string ordering. Only valid for `string` type. See [Collation](/docs/bucket/collation/). |
### Compound indexes
The `$compound` key holds an array of compound index definitions. Each definition specifies an ordered list of fields:
```json
{
"$compound": [
{
"name": "optional_custom_name",
"fields": [
{ "selector": "field_a", "bson_type": "string" },
{ "selector": "field_b", "bson_type": "int32" }
]
}
]
}
```
| Property | Type | Required | Description |
| ----------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `name` | string | No | Custom name for the compound index. If omitted, a name is auto-generated. |
| `collation` | object | No | Collation spec for locale-aware string ordering. Requires at least one `string` field. See [Collation](/docs/bucket/collation/). |
Each field in the `fields` array supports `selector` (required), `bson_type` (required), and `multi_key` (optional, default `false`).
**Constraints:** A compound index must have at least two fields, at most one field can have `multi_key` enabled, and each selector must appear exactly once. See [Compound Indexes](/docs/bucket/compound-index/) for detailed rules.
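The three constraints can be checked on the client before sending a schema. This is a hypothetical pre-flight validator, not the server's validation code, and its error messages are made up for the sketch.

```python
def validate_compound(definition):
    """Return a list of constraint violations for one $compound definition."""
    fields = definition.get("fields", [])
    errors = []
    if len(fields) < 2:
        errors.append("a compound index needs at least two fields")
    if sum(1 for f in fields if f.get("multi_key", False)) > 1:
        errors.append("at most one field may enable multi_key")
    selectors = [f["selector"] for f in fields]
    if len(set(selectors)) != len(selectors):
        errors.append("each selector must appear exactly once")
    return errors


definition = {
    "name": "idx_cat_price",
    "fields": [
        {"selector": "category", "bson_type": "string"},
        {"selector": "price", "bson_type": "double"},
    ],
}
print(validate_compound(definition))  # an empty list means no violations
```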
Every bucket automatically includes a primary index. The indexes defined here are secondary indexes created in the same transaction as the bucket itself.
## Errors
| Error Code | Description |
| --------------------- | ------------------------------------------------------------------------------------------ |
| `BUCKETALREADYEXISTS` | A bucket with the same name already exists (suppressed when `IF-NOT-EXISTS` is specified). |
| `ERR` | Invalid index schema (e.g., missing or unknown `bson_type`). |
| `BUCKETBEINGREMOVED` | A bucket with the same name was previously removed and has not yet been purged. |
## Examples
**Create a bucket with the default shard assignment:**
```kronotop
> BUCKET.CREATE users
OK
```
**Create a bucket on specific shards:**
```kronotop
> BUCKET.CREATE users SHARDS 0 1
OK
```
**Create a bucket with secondary indexes:**
```kronotop
> BUCKET.CREATE users INDEXES '{"username": {"bson_type": "string"}, "age": {"bson_type": "int32"}}'
OK
```
**Create a bucket on specific shards with indexes:**
```kronotop
> BUCKET.CREATE users SHARDS 0 1 INDEXES '{
"username": {"bson_type": "string"},
"age": {"bson_type": "int32"}
}'
OK
```
**Create a bucket with a named multi-key index:**
```kronotop
> BUCKET.CREATE products INDEXES '{
"tags": {"bson_type": "string", "multi_key": true, "name": "idx_tags"}
}'
OK
```
**Create a bucket with a collated string index:**
```kronotop
> BUCKET.CREATE users INDEXES '{
"username": {"bson_type": "string", "collation": {"locale": "tr", "strength": 2}}
}'
OK
```
**Create a bucket with bucket-level collation:**
```kronotop
> BUCKET.CREATE users COLLATION '{"locale": "en", "strength": 2}'
OK
```
All string indexes in this bucket will use case-insensitive English collation by default, unless overridden at the index level.
**Create a bucket with bucket-level collation and indexes:**
```kronotop
> BUCKET.CREATE products SHARDS 0 1 COLLATION '{"locale": "tr"}' INDEXES '{"name": {"bson_type": "string"}}'
OK
```
The `name` index inherits the bucket’s Turkish collation.
**Create a bucket with a compound index:**
```kronotop
> BUCKET.CREATE products INDEXES '{
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
OK
```
**Create a bucket with single-field and compound indexes:**
```kronotop
> BUCKET.CREATE products INDEXES '{
"email": {"bson_type": "string"},
"$compound": [{
"fields": [
{"selector": "status", "bson_type": "string"},
{"selector": "region", "bson_type": "string"}
]
}]
}'
OK
```
**Idempotent creation:**
```kronotop
> BUCKET.CREATE users IF-NOT-EXISTS
OK
> BUCKET.CREATE users IF-NOT-EXISTS
OK
```
**Attempting to create a bucket that already exists:**
```kronotop
> BUCKET.CREATE users
OK
> BUCKET.CREATE users
(error) BUCKETALREADYEXISTS Bucket already exists: users
```
# BUCKET.CURSORS
> Lists all active cursors for the current session.
Lists all active cursors for the current session.
## Syntax
```kronotop
BUCKET.CURSORS [operation]
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `operation` | string | No | Filter cursors by operation type. Must be `QUERY`, `DELETE`, or `UPDATE` (case-insensitive). If omitted, returns cursors for all operation types. |
## Return Value
Returns a mapping of cursor IDs to their query filters for each operation type. Query filters are serialized as JSON regardless of the original format used when creating the cursor.
**RESP3 (map format):**
Without operation filter:
```kronotop
> BUCKET.CURSORS
1# "QUERY" =>
1# (integer) 2 => {}
2# "UPDATE" =>
1# (integer) 1 => {"name": "Henry"}
3# "DELETE" => (empty map)
```
With operation filter (e.g., `BUCKET.CURSORS UPDATE`):
```kronotop
> BUCKET.CURSORS UPDATE
1# "UPDATE" =>
1# (integer) 1 => {"name": "Henry"}
```
**RESP2 (array format):**
Without operation filter:
```kronotop
> BUCKET.CURSORS
1) "QUERY"
2) 1) (integer) 2
2) {}
3) "UPDATE"
4) 1) (integer) 1
2) {"name": "Henry"}
5) "DELETE"
6) (empty array)
```
With operation filter:
```kronotop
> BUCKET.CURSORS UPDATE
1) "UPDATE"
2) 1) (integer) 1
2) {"name": "Henry"}
```
When no cursors exist for an operation type, the corresponding map or array is empty.
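A client that speaks RESP2 receives the reply as nested flat arrays and has to pair keys with values itself. The following Python sketch shows one way to do that. The function names are hypothetical, and it assumes the client library has already decoded the reply into Python lists, integers, and strings (some clients deliver bytes instead).

```python
from typing import Any


def pairs_to_dict(flat: list[Any]) -> dict[Any, Any]:
    """Turn a flat [k1, v1, k2, v2, ...] RESP2 reply into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat[0::2], flat[1::2]))


def parse_cursors_reply(reply: list[Any]) -> dict[str, dict[int, Any]]:
    """Map each operation type to {cursor_id: filter}."""
    return {op: pairs_to_dict(cursors) for op, cursors in pairs_to_dict(reply).items()}


# Reply shaped like the RESP2 example above; filters are kept as JSON strings.
reply = ["QUERY", [2, "{}"], "UPDATE", [1, '{"name": "Henry"}'], "DELETE", []]
cursors = parse_cursors_reply(reply)
```

An operation type with no cursors simply maps to an empty dict, which mirrors the empty array in the reply.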
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ---------- | ----------------------------------------------------------------------------------- |
| `ERR`      | Unknown operation type. The error message format is `Unknown '<operation>' action`. |
## Examples
[Section titled “Examples”](#examples)
**List all cursors:**
```kronotop
> BUCKET.QUERY users '{"age": {"$gt": 20}}' LIMIT 10
1# "cursor_id" => (integer) 1
2# "entries" => ... (first 10 documents)
> BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' LIMIT 5
1# "cursor_id" => (integer) 2
2# "entries" => ... (first 5 object_ids)
> BUCKET.CURSORS
1# "QUERY" =>
1# (integer) 1 => {"age": {"$gt": 20}}
2# "UPDATE" =>
1# (integer) 2 => {"status": "pending"}
3# "DELETE" => (empty map)
```
**List only query cursors:**
```kronotop
> BUCKET.CURSORS QUERY
1# "QUERY" =>
1# (integer) 1 => {"age": {"$gt": 20}}
```
**List cursors when none exist:**
```kronotop
> BUCKET.CURSORS
1# "QUERY" => (empty map)
2# "UPDATE" => (empty map)
3# "DELETE" => (empty map)
```
**Verify cursor removal after close:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 10
1# "cursor_id" => (integer) 1
2# "entries" => ... (documents)
> BUCKET.CURSORS QUERY
1# "QUERY" =>
1# (integer) 1 => {}
> BUCKET.CLOSE QUERY 1
OK
> BUCKET.CURSORS QUERY
1# "QUERY" => (empty map)
```
# BUCKET.DELETE
> Deletes documents from a bucket that match a filter expression.
Deletes documents from a bucket that match a filter expression.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.DELETE <bucket> <query> [LIMIT <limit>] [COLLATION <collation>]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------------ | -------- | -------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to delete from. |
| `query` | JSON or BSON | Yes | Filter expression to match documents. Use `{}` to match all documents. |
| `LIMIT` | integer | No | Maximum number of documents to delete per batch. Must be non-negative. |
| `COLLATION` | JSON | No | Query-level collation spec for locale-aware string comparison. Overrides index collation for this query. |
Note: `SORTBY` is not supported for delete operations.
## Return Value
[Section titled “Return Value”](#return-value)
The command returns a cursor ID and an array of ObjectIds for deleted documents. The format depends on the protocol version.
An ObjectId is a 12-byte unique identifier. The encoding format of returned ObjectIds depends on the session’s `object_id_format` setting:
| Format | Response Type | Description |
| ------- | ------------- | ----------------------------------------- |
| `hex` | String | 24-character hex-encoded string (default) |
| `bytes` | Binary | Raw 12-byte array |
To change the format:
```kronotop
SESSION.ATTRIBUTE SET object_id_format hex
SESSION.ATTRIBUTE SET object_id_format bytes
```
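The two formats carry the same 12 bytes, so a client can convert between them locally. A minimal Python sketch, using only the standard library (the helper names are hypothetical):

```python
def object_id_to_hex(raw: bytes) -> str:
    """Encode a raw 12-byte ObjectId as a 24-character hex string."""
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return raw.hex()


def object_id_from_hex(text: str) -> bytes:
    """Decode a 24-character hex string into the raw 12-byte ObjectId."""
    if len(text) != 24:
        raise ValueError("a hex ObjectId is exactly 24 characters")
    return bytes.fromhex(text)


raw = object_id_from_hex("6835a1c0e4b0f72a3c000001")
```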
**RESP3 (map format):**
```kronotop
1# "cursor_id" => (integer) <cursor_id>
2# "object_ids" => 1) "6835a1c0e4b0f72a3c000001"
2) "6835a1c0e4b0f72a3c000002"
...
```
**RESP2 (array format):**
```kronotop
1) (integer) <cursor_id>
2) 1) "6835a1c0e4b0f72a3c000001"
2) "6835a1c0e4b0f72a3c000002"
...
```
When no documents match the filter, the `object_ids` array is empty.
**Auto-commit mode (default):**
The delete operation is committed immediately. Deleted documents cannot be recovered.
**Transaction mode (within BEGIN/COMMIT):**
The delete operation is not committed until `COMMIT` is called. Use `ROLLBACK` to cancel the delete operation.
## Pagination
[Section titled “Pagination”](#pagination)
When using `LIMIT`, use the cursor ID with `BUCKET.ADVANCE` to delete more matching documents:
```kronotop
BUCKET.ADVANCE DELETE <cursor_id>
```
Each call deletes the next batch of documents up to the limit.
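The batch loop can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference client: `send` is a hypothetical callable that sends one command and returns the decoded RESP2 reply (`[cursor_id, [object_id, ...]]`), and the loop assumes that an empty `object_ids` array means nothing is left to delete. `make_fake_server` is a stand-in so the sketch runs without a server.

```python
from typing import Any, Callable

Reply = list[Any]  # RESP2 shape: [cursor_id, [object_id, ...]]


def delete_in_batches(send: Callable[..., Reply], bucket: str, query: str, limit: int) -> list[str]:
    """Delete matching documents batch by batch and return every deleted ObjectId."""
    cursor_id, object_ids = send("BUCKET.DELETE", bucket, query, "LIMIT", limit)
    deleted = list(object_ids)
    # Assumption: an empty batch signals that the cursor is exhausted.
    while object_ids:
        cursor_id, object_ids = send("BUCKET.ADVANCE", "DELETE", cursor_id)
        deleted.extend(object_ids)
    return deleted


def make_fake_server(ids: list[str], limit: int) -> Callable[..., Reply]:
    """Stand-in for a real connection: hands out `limit` ids per call."""
    remaining = list(ids)

    def send(*args: Any) -> Reply:
        batch = remaining[:limit]
        del remaining[:limit]
        return [1, batch]

    return send


fake = make_fake_server(["a", "b", "c", "d", "e"], limit=2)
deleted = delete_in_batches(fake, "users", '{"status": "inactive"}', 2)
```

In auto-commit mode each batch is committed as it is deleted, so a loop that stops partway leaves the earlier batches deleted.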
## Routing
[Section titled “Routing”](#routing)
`BUCKET.DELETE` is a metadata operation and can be executed from any node. The exception is a bucket with a vector index. A vector-indexed bucket is pinned to a single shard, and deleting a document must also remove its vector from the local graph, so the command must be sent to the node that owns that shard. When the shard is hosted on another node, the server rejects the request with a redirect to that node.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `REJECT`                | Only when the bucket has a vector index: the bucket’s shard is hosted on another node. The error includes the target address: `REJECT <host>:<port>`. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
| `NOSUCHBUCKET` | The bucket does not exist. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
| `VECTORINDEXNOTREADY` | A vector index on the bucket is still bootstrapping. Retry after a short delay. |
| `ERR` | `SORTBY` is an unsupported argument. |
## Examples
[Section titled “Examples”](#examples)
**Delete all documents:**
```kronotop
BUCKET.DELETE users '{}'
```
**Delete with filter:**
```kronotop
BUCKET.DELETE users '{"status": "inactive"}'
```
**Delete with limit:**
```kronotop
BUCKET.DELETE users '{"age": {"$gt": 30}}' LIMIT 50
```
**Delete with collation:**
```kronotop
BUCKET.DELETE users '{"name": "alice"}' COLLATION '{"locale": "en", "strength": 2}'
```
Deletes documents where `name` matches `"alice"` using case-insensitive English collation.
**Batch delete with pagination:**
```kronotop
> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 100
1# "cursor_id" => (integer) 1
2# "object_ids" => ... (first 100 deleted)
> BUCKET.ADVANCE DELETE 1
1# "cursor_id" => (integer) 1
2# "object_ids" => ... (next 100 deleted)
```
**Delete within a transaction:**
```kronotop
BEGIN
BUCKET.DELETE users '{"status": "inactive"}'
COMMIT
```
# BUCKET.EXPLAIN
> Returns the query execution plan for a given query without executing it.
Returns the query execution plan for a given query without executing it.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.EXPLAIN <bucket> <query> [SORTBY <field> <ASC|DESC>] [LIMIT <limit>] [COLLATION <collation>]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------------------ | -------- | --------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to explain the query against. |
| `query` | JSON or BSON | Yes | Filter expression to analyze. Use `{}` to match all documents. |
| `SORTBY` | string + direction | No | Sort specification. Requires field name followed by `ASC` or `DESC`. |
| `LIMIT` | integer | No | Maximum number of documents per batch. |
| `COLLATION` | JSON | No | Query-level collation spec. When provided, the plan reflects how collation affects index selection. |
The parameters are identical to `BUCKET.QUERY`. The query is parsed and planned but never executed.
## Return Value
[Section titled “Return Value”](#return-value)
The command returns a map containing the plan cache status and the execution plan.
**RESP3 (map format):**
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "CompoundIndexScan"
3# "id" => (integer) 4
4# "scanType" => "COMPOUND_INDEX_SCAN"
5# "index" => "idx_cat_price"
6# "filters" =>
1) 1# "selector" => "category"
2# "operator" => "EQ"
3# "operand" => "Param[ref=ParamRef[index=0]]"
2) 1# "selector" => "price"
2# "operator" => "GT"
3# "operand" => "Param[ref=ParamRef[index=1]]"
```
**RESP2 (array format):**
```kronotop
1) "is_cached"
2) (false)
3) "plan"
4) 1) "planner_version"
2) (integer) 1
3) "nodeType"
4) "CompoundIndexScan"
5) "id"
6) (integer) 4
7) "scanType"
8) "COMPOUND_INDEX_SCAN"
9) "index"
10) "idx_cat_price"
11) "filters"
12) 1) 1) "selector"
2) "category"
3) "operator"
4) "EQ"
5) "operand"
6) "Param[ref=ParamRef[index=0]]"
2) 1) "selector"
2) "price"
3) "operator"
4) "GT"
5) "operand"
6) "Param[ref=ParamRef[index=1]]"
```
The `query_collation` field reflects the effective collation used during planning. It is omitted when no `COLLATION` parameter is provided.
## Plan Cache
[Section titled “Plan Cache”](#plan-cache)
The `is_cached` field indicates whether the returned plan was retrieved from the plan cache.
* `true`: The plan was previously built and cached by a command that runs the same query shape. Any query-executing command populates the cache: `BUCKET.QUERY`, `BUCKET.DELETE`, `BUCKET.UPDATE`, and filtered `BUCKET.VECTOR`. `BUCKET.EXPLAIN` itself never writes to the cache.
* `false`: The plan was freshly generated for this `BUCKET.EXPLAIN` call.
A query shape is determined by the structure and operators in the query, not the literal operand values. Two queries with the same structure but different values share the same shape and therefore the same cached plan.
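To make the idea of a shape concrete, the Python sketch below replaces every literal operand with a placeholder and keeps field names and operators. It only illustrates the concept; it is not Kronotop's actual normalization, and the function names are hypothetical.

```python
import json
from typing import Any


def query_shape(node: Any) -> Any:
    """Replace every literal operand with a placeholder, keeping fields and operators."""
    if isinstance(node, dict):
        return {key: query_shape(value) for key, value in node.items()}
    if isinstance(node, list):
        return [query_shape(item) for item in node]
    return "?"


def shape_key(query_json: str) -> str:
    """Serialize the shape so it can serve as a cache key."""
    return json.dumps(query_shape(json.loads(query_json)))


same_shape = shape_key('{"age": {"$gt": 20}}') == shape_key('{"age": {"$gt": 65}}')
```

Under this sketch, `{"age": {"$gt": 20}}` and `{"age": {"$gt": 65}}` produce the same key, while changing the operator to `$lt` produces a different one.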
## Plan Node Types
[Section titled “Plan Node Types”](#plan-node-types)
Every plan node contains the following common fields:
| Field | Type | Description |
| ----------------- | ------- | ------------------------------------------------ |
| `planner_version` | integer | Plan format version (currently `1`). |
| `nodeType` | string | The type of this plan node. |
| `id` | integer | Unique identifier for this node within the plan. |
When a node has a downstream processing step, it includes a `next` field containing the next node in the pipeline chain.
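A plan can be inspected by following `next` links, and `children` for the nodes in this section that carry them. The Python sketch below assumes the reply has already been decoded into nested dicts; `walk_plan` is a hypothetical helper, and the sample plan is a trimmed version of the index-scan-plus-filter plan shown in the examples.

```python
from typing import Any, Iterator


def walk_plan(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a node, then its children (if any), then the rest of the pipeline."""
    yield node
    for child in node.get("children", []):
        yield from walk_plan(child)
    if "next" in node:
        yield from walk_plan(node["next"])


plan = {
    "planner_version": 1,
    "nodeType": "IndexScan",
    "id": 23,
    "scanType": "INDEX_SCAN",
    "next": {
        "planner_version": 1,
        "nodeType": "TransformWithResidualPredicate",
        "id": 26,
        "operation": "FILTER",
    },
}
node_types = [node["nodeType"] for node in walk_plan(plan)]
```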
### FullScan
[Section titled “FullScan”](#fullscan)
Scans all entries in an index. Used when no selective index is available for the query.
| Field | Type | Description |
| ----------- | ------ | -------------------------------------------------------------------------------------------- |
| `scanType` | string | `FULL_SCAN` |
| `index` | string | Name of the index being scanned (typically `primary-index`). |
| `predicate` | map | Residual predicate applied during the scan. See [Residual Predicates](#residual-predicates). |
### IndexScan
[Section titled “IndexScan”](#indexscan)
Scans an index using a single comparison predicate. Used for equality lookups and single-bound inequalities. The operator determines the matched bound and the scan direction.
| Field | Type | Description |
| ---------- | ------ | ----------------------------------------------------------- |
| `scanType` | string | `INDEX_SCAN` |
| `index` | string | Name of the index being scanned. |
| `selector` | string | Field path used for the scan. |
| `operator` | string | Comparison operator (e.g., `EQ`, `LT`, `GT`, `LTE`, `GTE`). |
| `operand` | varies | The value being compared against. |
### RangeScan
[Section titled “RangeScan”](#rangescan)
Scans an index over a bounded range.
| Field | Type | Description |
| -------------- | ------- | ---------------------------------------- |
| `scanType` | string | `RANGE_SCAN` |
| `index` | string | Name of the index being scanned. |
| `selector` | string | Field path used for the range. |
| `lowerBound` | varies | Lower bound value, or null if unbounded. |
| `upperBound` | varies | Upper bound value, or null if unbounded. |
| `includeLower` | boolean | Whether the lower bound is inclusive. |
| `includeUpper` | boolean | Whether the upper bound is inclusive. |
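
The four bound fields combine as in the following Python sketch, which tests whether a single value falls inside a range. It is an illustration of the field semantics only, with `None` standing in for a null (unbounded) bound; `in_range` is a hypothetical name.

```python
from typing import Any, Optional


def in_range(
    value: Any,
    lower: Optional[Any],
    upper: Optional[Any],
    include_lower: bool,
    include_upper: bool,
) -> bool:
    """Return True when value lies within the bounds; None means unbounded."""
    if lower is not None:
        if value < lower or (value == lower and not include_lower):
            return False
    if upper is not None:
        if value > upper or (value == upper and not include_upper):
            return False
    return True


# Equivalent of {"$gte": 10, "$lt": 20} applied to the value 10.
hit = in_range(10, 10, 20, include_lower=True, include_upper=False)
```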
### Union
[Section titled “Union”](#union)
Combines results from multiple child scan nodes using set union (logical OR).
| Field | Type | Description |
| ----------- | ------ | ------------------------- |
| `operation` | string | `UNION` |
| `children` | array | List of child plan nodes. |
### OrderedConcat
[Section titled “OrderedConcat”](#orderedconcat)
Runs child scan nodes one after another in a fixed order, fully consuming one child before moving to the next. Used when an `$in` condition is combined with `SORTBY` on the same indexed field: each child is an equality scan ordered by value, so concatenating them yields globally sorted results without scanning the whole index.
| Field | Type | Description |
| ----------- | ------ | ------------------------- |
| `operation` | string | `ORDERED_CONCAT` |
| `children` | array | List of child plan nodes. |
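
The reasoning behind `OrderedConcat` can be reproduced on a toy index. In the Python sketch below, `INDEX` is a hypothetical sorted list of `(value, document id)` pairs standing in for a single-field index on `age`; each `$in` value becomes one equality scan, and the scans run in sort order.

```python
from bisect import bisect_left, bisect_right
from itertools import chain

# Hypothetical single-field index on `age`: (value, document id), sorted by value.
INDEX = [(18, "d4"), (25, "d1"), (25, "d7"), (30, "d2"), (41, "d3")]
KEYS = [value for value, _ in INDEX]


def equality_scan(value: int) -> list[tuple[int, str]]:
    """Seek straight to the entries for one value instead of scanning the index."""
    return INDEX[bisect_left(KEYS, value):bisect_right(KEYS, value)]


def ordered_concat(values: list[int], descending: bool = False) -> list[tuple[int, str]]:
    """Run one equality scan per $in value, in sort order, and concatenate."""
    ordered = sorted(set(values), reverse=descending)
    return list(chain.from_iterable(equality_scan(value) for value in ordered))


# Equivalent of {"age": {"$in": [30, 18, 25]}} with SORTBY age ASC.
rows = ordered_concat([30, 18, 25])
```

Because every child returns entries for exactly one value, no merge step is needed: the concatenation is already sorted, and the entry for `41` is never touched.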
### CompoundIndexScan
[Section titled “CompoundIndexScan”](#compoundindexscan)
Scans a compound index using a combination of equality prefixes and an optional range on the last matched field.
| Field | Type | Description |
| ---------- | ------ | -------------------------------------------------------------------------------------------------------------------------- |
| `scanType` | string | `COMPOUND_INDEX_SCAN` |
| `index` | string | Name of the compound index being scanned. |
| `filters` | array | Ordered list of per-field filters applied during the scan. Each entry is a map with `selector`, `operator`, and `operand`. |
Each entry in `filters` contains:
| Field | Type | Description |
| ---------- | ------ | ----------------------------------------------------- |
| `selector` | string | Field path for this filter. |
| `operator` | string | Comparison operator (`EQ`, `GT`, `GTE`, `LT`, `LTE`). |
| `operand` | varies | The value being compared against. |
### TransformWithResidualPredicate
[Section titled “TransformWithResidualPredicate”](#transformwithresidualpredicate)
Applies a post-scan filter to results that could not be fully resolved by index scans. This is the `nodeType` value; the `operation` field carries `FILTER`.
| Field | Type | Description |
| ----------- | ------ | ------------------------------------------------------------------------------------ |
| `operation` | string | `FILTER` |
| `predicate` | map | The residual predicate to evaluate. See [Residual Predicates](#residual-predicates). |
## Residual Predicates
[Section titled “Residual Predicates”](#residual-predicates)
Residual predicates represent filter conditions that are evaluated after index scanning. They appear in `FullScan` and `TransformWithResidualPredicate` nodes.
| Type | Fields | Description |
| ------------- | --------------------------------- | ------------------------------------------------------- |
| `PREDICATE` | `selector`, `operator`, `operand` | A single field comparison. |
| `AND` | `children` | Logical AND of multiple predicates. |
| `OR` | `children` | Logical OR of multiple predicates. |
| `ALWAYS_TRUE` | (none) | Matches all documents (used for unfiltered full scans). |
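
The predicate tree is recursive, so evaluating it against a document takes only a few lines. The Python sketch below is a simplified illustration: it handles top-level selectors only, and it uses literal operands where a real plan shows parameter references such as `Param[ref=ParamRef[index=0]]`. `matches` and `OPERATORS` are hypothetical names.

```python
import operator
from typing import Any

OPERATORS = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
}


def matches(predicate: dict[str, Any], document: dict[str, Any]) -> bool:
    """Evaluate a residual predicate tree against one document."""
    kind = predicate["type"]
    if kind == "ALWAYS_TRUE":
        return True
    if kind == "AND":
        return all(matches(child, document) for child in predicate["children"])
    if kind == "OR":
        return any(matches(child, document) for child in predicate["children"])
    if kind == "PREDICATE":
        selector = predicate["selector"]
        if selector not in document:
            return False  # simplification: a missing field never matches
        compare = OPERATORS[predicate["operator"]]
        return compare(document[selector], predicate["operand"])
    raise ValueError(f"unknown predicate type: {kind}")


predicate = {
    "type": "AND",
    "children": [
        {"type": "PREDICATE", "selector": "name", "operator": "EQ", "operand": "Alice"},
        {"type": "OR", "children": [
            {"type": "PREDICATE", "selector": "age", "operator": "GTE", "operand": 30},
            {"type": "PREDICATE", "selector": "status", "operator": "EQ", "operand": "active"},
        ]},
    ],
}
kept = matches(predicate, {"name": "Alice", "age": 25, "status": "active"})
```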
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ------------------------------- |
| `NOSUCHBUCKET` | The bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
## Examples
[Section titled “Examples”](#examples)
**Explain a full scan (no filter):**
```kronotop
BUCKET.EXPLAIN users '{}'
```
Response (RESP3):
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "FullScan"
3# "id" => (integer) 1
4# "scanType" => "FULL_SCAN"
5# "index" => "primary-index"
6# "predicate" =>
1# "type" => "ALWAYS_TRUE"
```
**Explain an index scan on the primary key:**
```kronotop
BUCKET.EXPLAIN users '{"_id": {"$eq": {"$oid": "6835a1c0e4b0f72a3c000001"}}}'
```
Response (RESP3):
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "IndexScan"
3# "id" => (integer) 2
4# "scanType" => "INDEX_SCAN"
5# "index" => "primary-index"
6# "selector" => "_id"
7# "operator" => "EQ"
8# "operand" => "Param[ref=ParamRef[index=0]]"
```
**Explain a range scan:**
```kronotop
BUCKET.EXPLAIN users '{_id: {$gte: "aaa", $lte: "zzz"}}'
```
Response (RESP3):
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "RangeScan"
3# "id" => (integer) 7
4# "scanType" => "RANGE_SCAN"
5# "index" => "primary-index"
6# "selector" => "_id"
7# "lowerBound" => "Param[ref=ParamRef[index=0]]"
8# "upperBound" => "Param[ref=ParamRef[index=1]]"
9# "includeLower" => (true)
10# "includeUpper" => (true)
```
**Explain a query with an indexed field and a non-indexed field:**
```kronotop
BUCKET.EXPLAIN users '{$and: [{age: {$eq: 25}}, {name: {$eq: "Alice"}}]}'
```
When `age` is indexed but `name` is not, the plan uses an index scan on `age` with a residual predicate filter for `name`:
Response (RESP3):
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "IndexScan"
3# "id" => (integer) 23
4# "scanType" => "INDEX_SCAN"
5# "index" => "selector:age.bsonType:INT32"
6# "selector" => "age"
7# "operator" => "EQ"
8# "operand" => "Param[ref=ParamRef[index=1]]"
9# "next" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "TransformWithResidualPredicate"
3# "id" => (integer) 26
4# "operation" => "FILTER"
5# "predicate" =>
1# "type" => "AND"
2# "children" =>
1) 1# "type" => "PREDICATE"
2# "selector" => "name"
3# "operator" => "EQ"
4# "operand" => "Param[ref=ParamRef[index=0]]"
```
**Explain a compound index scan:**
Given a compound index `idx_cat_price` on `(category: STRING, price: DOUBLE)`:
```kronotop
BUCKET.INDEX CREATE products '{"$compound": [{"name": "idx_cat_price", "fields": [{"selector": "category", "bson_type": "string"}, {"selector": "price", "bson_type": "double"}]}]}'
```
Get the execution plan:
```kronotop
BUCKET.EXPLAIN products '{"category": {"$eq": "electronics"}, "price": {"$gt": 100.0}}'
```
Response (RESP3):
```kronotop
1# "is_cached" => (false)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "CompoundIndexScan"
3# "id" => (integer) 4
4# "scanType" => "COMPOUND_INDEX_SCAN"
5# "index" => "idx_cat_price"
6# "filters" =>
1) 1# "selector" => "category"
2# "operator" => "EQ"
3# "operand" => "Param[ref=ParamRef[index=0]]"
2) 1# "selector" => "price"
2# "operator" => "GT"
3# "operand" => "Param[ref=ParamRef[index=1]]"
```
**Explain a cached plan:**
After running a query, explaining the same query shape returns the cached plan:
```kronotop
> BUCKET.QUERY users '{status: {$eq: "active"}}'
1# "cursor_id" => (integer) 1
2# "entries" => 1) {"_id": "6a240c7b5da17d872dc0e102", "name": "Bob", "age": 25, "status": "active"}
> BUCKET.EXPLAIN users '{status: {$eq: "active"}}'
1# "is_cached" => (true)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "FullScan"
3# "id" => (integer) 1
4# "scanType" => "FULL_SCAN"
5# "index" => "primary-index"
6# "predicate" =>
1# "type" => "PREDICATE"
2# "selector" => "status"
3# "operator" => "EQ"
4# "operand" => "Param[ref=ParamRef[index=0]]"
```
# BUCKET.INDEX
> Manages indexes on bucket fields.
Manages indexes on bucket fields. Indexes accelerate queries by allowing efficient lookups on specific fields.
## Subcommands
[Section titled “Subcommands”](#subcommands)
| Subcommand | Description |
| ---------- | ----------------------------------------------- |
| `CREATE` | Create a new index on one or more fields. |
| `LIST` | List all indexes on a bucket. |
| `DESCRIBE` | Get detailed information about an index. |
| `DROP` | Drop an existing index. |
| `TASKS` | List background maintenance tasks for an index. |
| `ANALYZE` | Trigger index statistics analysis. |
***
## BUCKET.INDEX CREATE
[Section titled “BUCKET.INDEX CREATE”](#bucketindex-create)
Creates one or more indexes on bucket fields.
### Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.INDEX CREATE <bucket> <schema>
```
### Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | --------------------------------------------------------- |
| `bucket` | string | Yes | Name of the target bucket. The bucket must already exist. |
| `schema` | JSON | Yes | Index schema defining the fields to index. |
### Index Schema Format
[Section titled “Index Schema Format”](#index-schema-format)
The schema is a JSON object that can contain single-field indexes, compound indexes, or both:
```plaintext
{
"<selector>": { "bson_type": "<type>" [, "multi_key": true] [, "name": "<name>"] },
"$compound": [ { "fields": [ { "selector": "<selector>", "bson_type": "<type>" }, ... ] [, "name": "<name>"] } ]
}
```
#### Field selectors
[Section titled “Field selectors”](#field-selectors)
Field selectors use dot notation to address nested fields inside documents. Arrays are traversed automatically. When a selector crosses an array, each element is evaluated independently.
| Selector | Targets |
| -------------- | ---------------------------------------------------------- |
| `name` | Top-level field `name`. |
| `address.city` | `city` inside the nested object `address`. |
| `tags` | The `tags` field itself (use with `multi_key` for arrays). |
| `orders.total` | `total` inside each element of the `orders` array. |
**Example document:**
```json
{
"username": "alice",
"address": { "city": "Istanbul", "zip": "34000" },
"tags": ["admin", "editor"],
"orders": [
{ "total": 120, "status": "shipped" },
{ "total": 45, "status": "pending" }
]
}
```
* Selector `address.city` reaches `"Istanbul"`.
* Selector `tags` with `multi_key: true` indexes each element (`"admin"`, `"editor"`) separately.
* Selector `orders.total` with `multi_key: true` indexes each order’s total (`120`, `45`) separately.
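Selector resolution with automatic array traversal can be sketched in Python as follows. This models the behavior described above on the example document; it is not the server's implementation, and `resolve` is a hypothetical name. An array that sits at the end of the path is returned as a single value, which is the case where `multi_key` decides whether its elements are indexed separately.

```python
from typing import Any

DOC = {
    "username": "alice",
    "address": {"city": "Istanbul", "zip": "34000"},
    "tags": ["admin", "editor"],
    "orders": [
        {"total": 120, "status": "shipped"},
        {"total": 45, "status": "pending"},
    ],
}


def resolve(document: Any, selector: str) -> list[Any]:
    """Return every value reached by a dot-notation selector."""
    current = [document]
    for part in selector.split("."):
        reached = []
        for item in current:
            # An array crossed mid-path is traversed element by element.
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    reached.append(candidate[part])
        current = reached
    return current


totals = resolve(DOC, "orders.total")
```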
#### Single-field indexes
[Section titled “Single-field indexes”](#single-field-indexes)
Each top-level key (other than `$compound`) is a field selector. The value defines the index properties:
```json
{
"field_name": {
"bson_type": "type",
"multi_key": true|false,
"name": "optional_custom_name"
}
}
```
| Property | Type | Required | Description |
| ----------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `bson_type` | string | Yes | The BSON type of the field values. See [Supported BSON Types](#supported-bson-types). |
| `multi_key` | boolean | No | When `true`, creates a multi-key index for array fields. Each array element generates a separate index entry. Default: `false`. |
| `name` | string | No | Custom name for the index. If omitted, a name is auto-generated from the selector and type. |
| `collation` | object | No | Collation spec for locale-aware string ordering. Only valid for `string` type. See [Collation](/docs/bucket/collation/). |
#### Compound indexes
[Section titled “Compound indexes”](#compound-indexes)
The `$compound` key holds an array of compound index definitions. Each definition specifies an ordered list of fields:
```json
{
"$compound": [
{
"name": "optional_custom_name",
"fields": [
{ "selector": "field_a", "bson_type": "string" },
{ "selector": "field_b", "bson_type": "int32", "multi_key": false }
]
}
]
}
```
Each field in the `fields` array supports:
| Property | Type | Required | Description |
| ----------- | ------- | -------- | ------------------------------------------------------------------------------------- |
| `selector` | string | Yes | Dot-notation path to the field. |
| `bson_type` | string | Yes | The BSON type of the field values. See [Supported BSON Types](#supported-bson-types). |
| `multi_key` | boolean | No | When `true`, creates a multi-key index entry per array element. Default: `false`. |
The compound index definition also supports:
| Property | Type | Required | Description |
| ----------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `name` | string | No | Custom name for the compound index. If omitted, a name is auto-generated. |
| `collation` | object | No | Collation spec for locale-aware string ordering. Requires at least one `string` field. See [Collation](/docs/bucket/collation/). |
**Constraints:**
* A compound index must have at least two fields.
* A compound index supports at most 32 fields.
* At most one field can have `multi_key` enabled.
* Each field selector must appear exactly once within a compound index.
See [Compound Indexes](/docs/bucket/compound-index/) for detailed rules including the prefix rule and supported operators.
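The four constraints listed above are easy to check on the client before sending a schema. The Python sketch below is a hypothetical pre-flight validator; the error strings are its own and do not correspond to server error messages.

```python
from typing import Any

MAX_COMPOUND_FIELDS = 32


def validate_compound_index(definition: dict[str, Any]) -> list[str]:
    """Return a list of constraint violations; an empty list means the definition passes."""
    errors = []
    fields = definition.get("fields", [])
    if len(fields) < 2:
        errors.append("a compound index must have at least two fields")
    if len(fields) > MAX_COMPOUND_FIELDS:
        errors.append("a compound index supports at most 32 fields")
    if sum(1 for field in fields if field.get("multi_key", False)) > 1:
        errors.append("at most one field can have multi_key enabled")
    selectors = [field.get("selector") for field in fields]
    if len(set(selectors)) != len(selectors):
        errors.append("each field selector must appear exactly once")
    return errors


definition = {
    "name": "idx_cat_price",
    "fields": [
        {"selector": "category", "bson_type": "string"},
        {"selector": "price", "bson_type": "double"},
    ],
}
problems = validate_compound_index(definition)
```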
### Multi-key Index Behavior
[Section titled “Multi-key Index Behavior”](#multi-key-index-behavior)
When `multi_key` is set to `true`, the index is designed for array fields. Each element in the array creates a separate index entry, allowing queries to match documents where any array element satisfies the condition.
**Limitations:**
* **Undefined ordering**: Result ordering is undefined with multi-key indexes. Since each document can have multiple index entries (one per array element), the order in which documents are returned cannot be guaranteed.
* **Index size**: Multi-key indexes can be significantly larger than regular indexes because each array element creates a separate index entry.
* **Type matching**: Only array elements matching the specified `bson_type` are indexed.
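The type-matching rule can be illustrated with a short Python sketch. The mapping from BSON type names to Python types is a simplification made for this example (Python's `int` is unbounded, unlike `int32`), and `multi_key_entries` is a hypothetical name.

```python
from typing import Any

# Assumed, simplified mapping from bson_type names to Python types.
PYTHON_TYPES = {"string": str, "int32": int, "double": float}


def multi_key_entries(values: list[Any], bson_type: str) -> list[Any]:
    """Return one index entry per array element whose type matches bson_type."""
    expected = PYTHON_TYPES[bson_type]
    # `type(v) is expected` also keeps bool out of the int32 entries.
    return [value for value in values if type(value) is expected]


entries = multi_key_entries(["admin", 7, "editor", None], "string")
# A query on the field matches when any entry satisfies the condition.
is_editor = any(entry == "editor" for entry in entries)
```

Elements of other types, such as `7` and `None` here, produce no index entry, so a query served by this index cannot find the document through them.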
### Return Value
[Section titled “Return Value”](#return-value)
Returns `OK` on success.
### Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| -------------------- | ------------------------------------ |
| `ERR` | The index already exists. |
| `ERR` | The schema is invalid. |
| `ERR` | Unknown BSON type. |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `BUCKETBEINGREMOVED` | The target bucket is being removed. |
### Examples
[Section titled “Examples”](#examples)
**Create a single-field index:**
```kronotop
> BUCKET.INDEX CREATE users '{"username": {"bson_type": "string"}}'
OK
```
**Create multiple indexes at once:**
```kronotop
> BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32"}, "email": {"bson_type": "string"}}'
OK
```
**Create an index with a custom name:**
```kronotop
> BUCKET.INDEX CREATE users '{"username": {"bson_type": "string", "name": "idx_username"}}'
OK
```
**Create a multi-key index for array fields:**
```kronotop
> BUCKET.INDEX CREATE products '{"tags": {"bson_type": "string", "multi_key": true}}'
OK
```
**Index a nested field using dot notation:**
```kronotop
> BUCKET.INDEX CREATE users '{"address.city": {"bson_type": "string"}}'
OK
```
**Index elements inside an array of objects:**
```kronotop
> BUCKET.INDEX CREATE users '{"orders.total": {"bson_type": "int32", "multi_key": true}}'
OK
```
**Create a single-field index with collation:**
```kronotop
> BUCKET.INDEX CREATE users '{
"username": {
"bson_type": "string",
"collation": {"locale": "tr", "strength": 2}
}
}'
OK
```
**Create a compound index:**
```kronotop
> BUCKET.INDEX CREATE products '{
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
OK
```
**Create a compound index with collation:**
```kronotop
> BUCKET.INDEX CREATE products '{
"$compound": [{
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
],
"collation": {"locale": "en"}
}]
}'
OK
```
**Create single-field and compound indexes together:**
```kronotop
> BUCKET.INDEX CREATE products '{
"email": {"bson_type": "string"},
"$compound": [{
"fields": [
{"selector": "status", "bson_type": "string"},
{"selector": "region", "bson_type": "string"}
]
}]
}'
OK
```
***
## BUCKET.INDEX LIST
[Section titled “BUCKET.INDEX LIST”](#bucketindex-list)
Lists all indexes defined on a bucket.
### Syntax
[Section titled “Syntax”](#syntax-1)
```kronotop
BUCKET.INDEX LIST <bucket>
```
### Parameters
[Section titled “Parameters”](#parameters-1)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------- |
| `bucket` | string | Yes | Name of the bucket. |
### Return Value
[Section titled “Return Value”](#return-value-1)
Returns an array of index names.
### Errors
[Section titled “Errors”](#errors-1)
| Error Code | Description |
| -------------------- | ------------------------------------ |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
### Examples
[Section titled “Examples”](#examples-1)
```kronotop
> BUCKET.INDEX LIST users
1) "primary-index"
2) "selector:username.bsonType:STRING"
3) "selector:age.bsonType:INT32"
```
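The auto-generated names in this output follow a visible pattern. The Python helper below reproduces it for single-field indexes; the pattern is inferred from the example output only and is not a documented contract, so treat the helper (a hypothetical name) as a convenience for scripts and prefer `BUCKET.INDEX LIST` as the source of truth.

```python
def default_index_name(selector: str, bson_type: str) -> str:
    """Build the name pattern seen in the examples for a single-field index."""
    return f"selector:{selector}.bsonType:{bson_type.upper()}"


name = default_index_name("username", "string")
```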
***
## BUCKET.INDEX DESCRIBE
[Section titled “BUCKET.INDEX DESCRIBE”](#bucketindex-describe)
Gets detailed information about a specific index.
### Syntax
[Section titled “Syntax”](#syntax-2)
```kronotop
BUCKET.INDEX DESCRIBE <bucket> <index>
```
### Parameters
[Section titled “Parameters”](#parameters-2)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------------------ |
| `bucket` | string | Yes | Name of the bucket. |
| `index` | string | Yes | Name of the index to describe. |
### Return Value
[Section titled “Return Value”](#return-value-2)
Returns a map with the following fields:
| Field | Type | Description |
| ------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `index_type` | string | Kind of the index: `single_field`, `compound`, or `vector`. |
| `id` | integer | Index identifier. |
| `selector` | string | The field selector the index is built on. |
| `bson_type` | string | The BSON type of indexed values. |
| `status` | string | Current index status. See [Index Lifecycle](#index-lifecycle). |
| `collation` | map | Collation configuration (see below). All values are null when no collation is set. Only present for single-field and compound indexes. |
| `statistics` | map | Index statistics including `cardinality`. |
#### Collation sub-fields
[Section titled “Collation sub-fields”](#collation-sub-fields)
| Field | Type | Description |
| ------------------ | ------- | --------------------------------------------------------------------------------- |
| `locale` | string | ICU locale identifier (e.g., `"en"`, `"tr"`). |
| `strength` | integer | Comparison strength level (1-5). |
| `case_level` | boolean | Whether to include case-level comparisons. |
| `case_first` | string | Sort order of case differences (`"upper"`, `"lower"`, `"off"`). |
| `numeric_ordering` | boolean | Whether to compare numeric strings as numbers. |
| `alternate` | string | Handling of variable-weight characters (`"non-ignorable"`, `"shifted"`). |
| `backwards` | boolean | Whether to reverse secondary-level comparisons (for French). |
| `normalization` | boolean | Whether to perform Unicode normalization. |
| `max_variable` | string | Which characters are ignorable when `alternate="shifted"` (`"punct"`, `"space"`). |
### Errors
[Section titled “Errors”](#errors-2)
| Error Code | Description |
| -------------------- | ------------------------------------ |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `NOSUCHINDEX` | The specified index does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
### Examples
[Section titled “Examples”](#examples-2)
```kronotop
> BUCKET.INDEX DESCRIBE users "selector:username.bsonType:STRING"
index_type -> "single_field"
id -> 2
selector -> "username"
bson_type -> "STRING"
status -> "WAITING"
collation -> {locale -> (nil), strength -> (nil), case_level -> (nil), case_first -> (nil), numeric_ordering -> (nil), alternate -> (nil), backwards -> (nil), normalization -> (nil), max_variable -> (nil)}
statistics -> {cardinality -> 0}
```
***
## BUCKET.INDEX DROP
[Section titled “BUCKET.INDEX DROP”](#bucketindex-drop)
Drops an existing index from a bucket.
### Syntax
[Section titled “Syntax”](#syntax-3)
```kronotop
BUCKET.INDEX DROP <bucket> <index>
```
### Parameters
[Section titled “Parameters”](#parameters-3)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | -------------------------- |
| `bucket` | string | Yes | Name of the bucket. |
| `index` | string | Yes | Name of the index to drop. |
### Return Value
[Section titled “Return Value”](#return-value-3)
Returns `OK` on success. The index is marked as `DROPPED` and a background task is created to clean up the index data.
### Errors
[Section titled “Errors”](#errors-3)
| Error Code | Description |
| -------------------- | ------------------------------------------------ |
| `ERR` | Cannot drop the primary index (`primary-index`). |
| `ERR` | The index is already in the `DROPPED` status. |
| `ERR` | The index has active tasks. |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `NOSUCHINDEX` | The specified index does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
### Examples
[Section titled “Examples”](#examples-3)
```kronotop
> BUCKET.INDEX DROP users "selector:username.bsonType:STRING"
OK
```
**Attempting to drop the primary index:**
```kronotop
> BUCKET.INDEX DROP users "primary-index"
(error) ERR Cannot drop the primary index
```
***
## BUCKET.INDEX TASKS
[Section titled “BUCKET.INDEX TASKS”](#bucketindex-tasks)
Lists background maintenance tasks associated with an index.
### Syntax
[Section titled “Syntax”](#syntax-4)
```kronotop
BUCKET.INDEX TASKS <bucket> <index>
```
### Parameters
[Section titled “Parameters”](#parameters-4)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------- |
| `bucket` | string | Yes | Name of the bucket. |
| `index` | string | Yes | Name of the index. |
### Return Value
[Section titled “Return Value”](#return-value-4)
Returns a map where each key is a task ID and the value contains task details:
**For BUILD tasks:**
| Field | Type | Description |
| -------- | ------ | -------------------------------------- |
| `kind` | string | Task type (`BUILD`). |
| `cursor` | string | Current position in the build process. |
| `lower` | string | Lower bound position. |
| `upper` | string | Upper bound position. |
| `status` | string | Task status. |
| `error` | string | Error message if failed. |
**For DROP tasks:**
| Field | Type | Description |
| -------- | ------ | ------------------------ |
| `kind` | string | Task type (`DROP`). |
| `status` | string | Task status. |
| `error` | string | Error message if failed. |
**For BOUNDARY tasks:**
| Field | Type | Description |
| -------- | ------ | ------------------------ |
| `kind` | string | Task type (`BOUNDARY`). |
| `status` | string | Task status. |
| `error` | string | Error message if failed. |
**For ANALYZE tasks:**
| Field | Type | Description |
| -------- | ------ | ------------------------ |
| `kind` | string | Task type (`ANALYZE`). |
| `status` | string | Task status. |
| `error` | string | Error message if failed. |
### Errors
[Section titled “Errors”](#errors-4)
| Error Code | Description |
| -------------------- | ------------------------------------ |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
### Examples
[Section titled “Examples”](#examples-4)
```kronotop
> BUCKET.INDEX TASKS users "selector:username.bsonType:STRING"
"5K8G4R000000000000000000" -> {
kind -> "BUILD"
cursor -> "5K8G4R000000000000000500"
lower -> "0000000000000000000000000"
upper -> "5K8G4R000000000000001000"
status -> "RUNNING"
error -> ""
}
```
***
## BUCKET.INDEX ANALYZE
[Section titled “BUCKET.INDEX ANALYZE”](#bucketindex-analyze)
Triggers statistics analysis for an index. The collected statistics help the query optimizer make better decisions.
### Syntax
[Section titled “Syntax”](#syntax-5)
```kronotop
BUCKET.INDEX ANALYZE <bucket> <index>
```
### Parameters
[Section titled “Parameters”](#parameters-5)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------------------------- |
| `bucket` | string | Yes | Name of the bucket. |
| `index` | string | Yes | Name of the index to analyze. |
### Return Value
[Section titled “Return Value”](#return-value-5)
Returns `OK` on success. A background task is created to compute index statistics.
### Errors
[Section titled “Errors”](#errors-5)
| Error Code | Description |
| -------------------- | ------------------------------------------------------------------------------------------------- |
| `ERR` | An analysis task already exists for this index. |
| `ERR` | The index is not in the `READY` state. Only indexes that have completed building can be analyzed. |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `NOSUCHINDEX` | The specified index does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
### Examples
[Section titled “Examples”](#examples-5)
```kronotop
> BUCKET.INDEX ANALYZE users "selector:username.bsonType:STRING"
OK
```
***
## Index Lifecycle
[Section titled “Index Lifecycle”](#index-lifecycle)
Indexes go through the following states:
| Status | Description |
| ---------- | ---------------------------------------------------------------- |
| `WAITING` | Index is created but background building has not started yet. |
| `BUILDING` | Index is being built by a background task. |
| `READY` | Index is fully built and available for queries. |
| `DROPPED` | Index is marked for deletion; background cleanup is in progress. |
| `FAILED` | Index building failed due to an error. |
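Because commands such as `BUCKET.INDEX ANALYZE` only accept indexes in the `READY` state, a client may want to wait for a build to finish. The following Python sketch polls a status source until the index is `READY`. It is an illustration only: `get_status` is a hypothetical callable (for example, one that runs `BUCKET.INDEX DESCRIBE` and returns the `status` field), and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable

# Statuses from the table above that can never lead to READY.
TERMINAL_FAILURES = {"FAILED", "DROPPED"}


def wait_until_ready(get_status: Callable[[], str],
                     timeout: float = 30.0,
                     interval: float = 0.5) -> None:
    """Block until get_status() reports READY, or raise on failure/timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "READY":
            return
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"index will not become READY (status: {status})")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still {status} after {timeout} seconds")
        time.sleep(interval)


# Stand-in status source that replays a typical build sequence.
statuses = iter(["WAITING", "BUILDING", "READY"])
wait_until_ready(lambda: next(statuses), timeout=5.0, interval=0.0)
print("index is ready")
```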
***
## Supported BSON Types
[Section titled “Supported BSON Types”](#supported-bson-types)
The following BSON types can be indexed:
| Type | Description |
| ------------ | --------------------------------------------------------------------- |
| `string` | UTF-8 string values. |
| `int32` | 32-bit signed integers. |
| `int64` | 64-bit signed integers. |
| `double` | 64-bit IEEE 754 floating point. |
| `boolean` | Boolean values (`true` / `false`). |
| `datetime` | UTC datetime (milliseconds since Unix epoch). |
| `timestamp` | Internal timestamp type. |
| `binary` | Binary data. |
| `objectid` | ObjectId values. |
| `decimal128` | 128-bit decimal floating point. Not yet fully supported for indexing. |
***
# BUCKET.INSERT
> Inserts one or more documents into a bucket.
Inserts one or more documents into a bucket.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.INSERT <bucket> DOCS <document> [document ...]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------------ | -------- | ----------------------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the target bucket. The bucket must already exist (see `BUCKET.CREATE`). |
| `DOCS` | JSON or BSON | Yes | One or more documents to insert. Documents can be in JSON or BSON format depending on the session’s input type setting. |
## Return Value
[Section titled “Return Value”](#return-value)
Returns an array of ObjectIds, one for each inserted document. ObjectIds are returned in both auto-commit mode and within explicit transactions.
An ObjectId is a 12-byte unique identifier. The encoding format of returned ObjectIds depends on the session’s `object_id_format` setting:
| Format | Response Type | Description |
| ------- | ------------- | ----------------------------------------- |
| `hex` | String | 24-character hex-encoded string (default) |
| `bytes` | Binary | Raw 12-byte array |
To change the format:
```kronotop
SESSION.ATTRIBUTE SET object_id_format hex
SESSION.ATTRIBUTE SET object_id_format bytes
```
Example response with `hex` format:
```kronotop
1) "6835a1c0e4b0f72a3c000001"
2) "6835a1c0e4b0f72a3c000002"
```
Example response with `bytes` format:
```kronotop
1) "\x68\x35\xa1\xc0\xe4\xb0\xf7\x2a\x3c\x00\x00\x01"
2) "\x68\x35\xa1\xc0\xe4\xb0\xf7\x2a\x3c\x00\x00\x02"
```
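Both formats carry the same 12 bytes, so a client can convert between them locally. A minimal Python sketch; the helper names are illustrative and not part of any Kronotop client library:

```python
def object_id_to_hex(raw: bytes) -> str:
    """Encode a raw 12-byte ObjectId as a 24-character hex string."""
    if len(raw) != 12:
        raise ValueError(f"ObjectId must be 12 bytes, got {len(raw)}")
    return raw.hex()


def object_id_from_hex(text: str) -> bytes:
    """Decode a 24-character hex string into the raw 12-byte ObjectId."""
    if len(text) != 24:
        raise ValueError(f"expected 24 hex characters, got {len(text)}")
    return bytes.fromhex(text)


raw = object_id_from_hex("6835a1c0e4b0f72a3c000001")
print(raw)
print(object_id_to_hex(raw))
```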
## The `_id` Field
[Section titled “The \_id Field”](#the-_id-field)
Each document stored in a bucket has an `_id` field that serves as its primary key.
* **Auto-generated**: If a document does not contain an `_id` field, Kronotop automatically generates an ObjectId and injects it into the document before storage.
* **User-provided**: If a document already contains an `_id` field, it must be of type ObjectId. Any other type causes an error.
* **Duplicate detection**: When user-provided `_id` values are used, Kronotop checks the primary index for duplicates. If an `_id` already exists in the bucket, a `DUPLICATEKEY` error is returned.
## Document Format
[Section titled “Document Format”](#document-format)
Documents must be valid JSON or BSON objects. The input format is determined by the session’s `input_type` setting, which can be configured via `SESSION.ATTRIBUTE SET input_type <json|bson>`.
| Format | Description |
| -------- | -------------------------------------------------------------------- |
| **JSON** | Standard JSON format. Useful for debugging and human-readable input. |
| **BSON** | Binary JSON format. Recommended for production use. |
**Internal storage:** Kronotop always stores documents in BSON format regardless of the input format. JSON documents are automatically converted to BSON before storage.
**Production recommendation:** Use BSON as the input format in production environments. BSON avoids the conversion overhead on every insert, reducing CPU usage and latency. BSON also supports richer data types (such as `DateTime`, `Int64`, `Decimal128`) that cannot be natively represented in JSON.
## Routing
[Section titled “Routing”](#routing)
The command must be sent to a node that owns at least one shard assigned to the bucket. If the bucket’s shards are all hosted on other nodes, the server rejects the request with a redirect to the appropriate node.
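A client can react to the redirect by extracting the target address from the error and reconnecting there. The Python sketch below assumes the error message has the shape `REJECT host:port`; that exact layout is an assumption made for illustration, and `parse_reject` is a hypothetical helper.

```python
from typing import Optional, Tuple


def parse_reject(message: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from a REJECT error message, or None otherwise.

    Assumes the message looks like "REJECT host:port" (illustrative layout).
    """
    code, _, address = message.partition(" ")
    if code != "REJECT":
        return None
    # Split on the last colon so that the port is always the final component.
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return host, int(port)


print(parse_reject("REJECT 10.0.0.2:5484"))
print(parse_reject("DUPLICATEKEY document already exists"))
```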
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `REJECT`                | The bucket’s shards are hosted on another node. The error includes the target address: `REJECT <host>:<port>`.           |
| `DUPLICATEKEY` | A document with the same `_id` already exists in the bucket. |
| `ERR` | `_id field must be of type ObjectId`: the `_id` field in a document is not an ObjectId. |
| `INDEXTYPE_MISMATCH` | The value type does not match the expected index type (when `strict_types` is enabled). |
| `BUCKETBEINGREMOVED` | The target bucket is being removed. |
| `NAMESPACEBEINGREMOVED` | The target namespace is being removed. |
| `NOSUCHNAMESPACE` | The specified namespace does not exist. |
| `VECTORINDEXNOTREADY` | A vector index on the bucket is still bootstrapping. Retry after a short delay. |
## Examples
[Section titled “Examples”](#examples)
The following examples assume `input_type` is set to `json` and `object_id_format` is set to `hex`.
**Insert a single document:**
```kronotop
BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30}'
```
Response:
```kronotop
1) "6835a1c0e4b0f72a3c000001"
```
**Insert multiple documents:**
```kronotop
BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30}' '{"name": "Bob", "age": 25}'
```
Response:
```kronotop
1) "6835a1c0e4b0f72a3c000001"
2) "6835a1c0e4b0f72a3c000002"
```
**Insert within a transaction:**
```kronotop
BEGIN
BUCKET.INSERT users DOCS '{"name": "Alice"}' '{"name": "Bob"}'
COMMIT
```
Response from `BUCKET.INSERT`:
```kronotop
1) "6835a1c0e4b0f72a3c000001"
2) "6835a1c0e4b0f72a3c000002"
```
**Insert with a user-provided `_id`:**
```kronotop
BUCKET.INSERT users DOCS '{"_id": {"$oid": "507f1f77bcf86cd799439011"}, "name": "Alice"}'
```
Response:
```kronotop
1) "507f1f77bcf86cd799439011"
```
# BUCKET.LIST
> Returns the names of all buckets in the current namespace.
Returns the names of all buckets in the current namespace.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.LIST
```
This command takes no parameters.
## Return Value
[Section titled “Return Value”](#return-value)
Returns an array of bulk strings, one per bucket. Each element is the bucket name. If the namespace contains no buckets, an empty array is returned.
Buckets that have been marked for removal with `BUCKET.REMOVE` but not yet purged with `BUCKET.PURGE` are still included in the result. Only after a successful `BUCKET.PURGE` does the bucket disappear from the list.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ------------------------------- |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
## Examples
[Section titled “Examples”](#examples)
**List buckets on a fresh namespace (no buckets exist):**
```kronotop
> BUCKET.LIST
(empty array)
```
**List a single bucket:**
```kronotop
> BUCKET.CREATE users
OK
> BUCKET.LIST
1) "users"
```
**List multiple buckets:**
```kronotop
> BUCKET.CREATE alpha-bucket
OK
> BUCKET.CREATE beta-bucket
OK
> BUCKET.CREATE gamma-bucket
OK
> BUCKET.LIST
1) "alpha-bucket"
2) "beta-bucket"
3) "gamma-bucket"
```
**Bucket disappears only after purge, not after remove:**
```kronotop
> BUCKET.CREATE users
OK
> BUCKET.REMOVE users
OK
> BUCKET.LIST
1) "users"
> BUCKET.PURGE users
OK
> BUCKET.LIST
(empty array)
```
# BUCKET.LOCATE
> Returns the routing information for a bucket, showing which shards hold its data and the addresses of primary and standby replicas.
Returns the routing information for a bucket, showing which shards hold its data and the addresses of primary and standby replicas.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.LOCATE <bucket>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------------------------- |
| `bucket` | string | Yes | Name of the bucket to locate. |
## Return Value
[Section titled “Return Value”](#return-value)
Returns a flat array with 3 elements per shard:
| Position | Type | Description |
| -------- | ------- | ---------------------------------------------------------------------------------------- |
| 0 | integer | Shard ID. |
| 1 | string | Primary owner address in `host:port` format. |
| 2 | array | Standby replica addresses, each in `host:port` format. Empty array if no standbys exist. |
This pattern repeats for each shard the bucket spans. For a bucket on 2 shards, the array contains 6 elements.
Shards without a known route are silently omitted from the result.
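Since the reply is flat, a client usually regroups it into one record per shard. A minimal Python sketch, assuming the client library has already decoded the reply into a Python list; the function and field names are illustrative:

```python
from typing import Any, Dict, List


def parse_locate_reply(reply: List[Any]) -> List[Dict[str, Any]]:
    """Group a flat BUCKET.LOCATE reply into one dict per shard."""
    if len(reply) % 3 != 0:
        raise ValueError("reply length must be a multiple of 3")
    shards = []
    for i in range(0, len(reply), 3):
        shard_id, primary, standbys = reply[i:i + 3]
        shards.append({
            "shard_id": shard_id,
            "primary": primary,
            "standbys": list(standbys),
        })
    return shards


# Reply for a bucket that spans two shards, neither with standbys.
reply = [0, "10.0.0.1:5484", [], 1, "10.0.0.2:5484", []]
for shard in parse_locate_reply(reply):
    print(shard)
```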
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| -------------- | ------------------------------------ |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
## Examples
[Section titled “Examples”](#examples)
**Locate a single-shard bucket:**
```kronotop
> BUCKET.LOCATE users
1) (integer) 0
2) "127.0.0.1:5484"
3) (empty array)
```
**Locate a multi-shard bucket:**
```kronotop
> BUCKET.LOCATE events
1) (integer) 0
2) "10.0.0.1:5484"
3) (empty array)
4) (integer) 1
5) "10.0.0.2:5484"
6) (empty array)
```
**Non-existent bucket:**
```kronotop
> BUCKET.LOCATE nonexistent
(error) NOSUCHBUCKET No such bucket: 'nonexistent'
```
# BUCKET.PURGE
> Permanently deletes a bucket and all its data (hard delete).
Permanently deletes a bucket and all its data (hard delete). This is the second phase of bucket deletion.
## Overview
[Section titled “Overview”](#overview)
Before permanently deleting a bucket, `BUCKET.PURGE` enforces a **distributed sync barrier** to ensure cluster-wide consistency. This barrier waits for all shards to confirm they have observed the bucket’s “removed” status (set by `BUCKET.REMOVE`).
The barrier mechanism prevents data races in a distributed environment:
* Background workers (index maintenance, replication) may still be processing the bucket
* Other cluster nodes may have pending operations or cached references
* Without coordination, purging could cause errors or inconsistent state
If any shard has not yet observed the removal, the barrier fails with `BARRIERNOTSATISFIED`. When this happens, the command automatically notifies all cluster members of the removal to accelerate propagation, and you should retry the purge. In most cases, a single retry is sufficient.
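The retry can be automated on the client side. The following Python sketch retries only on `BARRIERNOTSATISFIED` and re-raises everything else. It rests on assumptions: `send` is a hypothetical callable that issues one command, `CommandError` stands in for whatever error type your RESP client raises, and the attempt count and delay are arbitrary.

```python
import time
from typing import Any, Callable


class CommandError(Exception):
    """Hypothetical stand-in for the error type raised by a RESP client."""


def purge_with_retry(send: Callable[..., Any], bucket: str,
                     attempts: int = 5, delay: float = 0.2) -> Any:
    """Call BUCKET.PURGE, retrying while the sync barrier is not satisfied."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send("BUCKET.PURGE", bucket)
        except CommandError as exc:
            retryable = str(exc).startswith("BARRIERNOTSATISFIED")
            if not retryable or attempt == attempts:
                raise
            # Give the removal notification time to reach every shard.
            time.sleep(delay)


# Stand-in transport: the first call fails the barrier, the second succeeds.
outcomes = iter([CommandError("BARRIERNOTSATISFIED Barrier not satisfied"), "OK"])


def fake_send(*args: Any) -> Any:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(purge_with_retry(fake_send, "users", delay=0.0))
```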
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.PURGE <bucket>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------ |
| `bucket` | string | Yes | Name of the bucket to permanently delete. The bucket must be marked for removal first using `BUCKET.REMOVE`. |
## Return Value
[Section titled “Return Value”](#return-value)
Returns `OK` on success.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| --------------------- | -------------------------------------------------------------------------- |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `ERR` | The bucket is not marked for removal. You must call `BUCKET.REMOVE` first. |
| `BARRIERNOTSATISFIED` | Not all shards have observed the removal. Retry the command. |
## Examples
[Section titled “Examples”](#examples)
**Permanently delete a removed bucket:**
```kronotop
> BUCKET.REMOVE users
OK
> BUCKET.PURGE users
OK
```
**Attempting to purge without removing first:**
```kronotop
> BUCKET.PURGE users
(error) ERR Bucket 'users' is not removed
```
**Handling barrier not satisfied:**
```kronotop
> BUCKET.REMOVE users
OK
> BUCKET.PURGE users
(error) BARRIERNOTSATISFIED Barrier not satisfied: not all shards observed version ...
> BUCKET.PURGE users
OK
```
# BUCKET.QUERY
> Queries documents from a bucket using a filter expression.
Queries documents from a bucket using a filter expression.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.QUERY <bucket> <query> [SORTBY <field> <ASC|DESC>] [RESULTSORT <field> <ASC|DESC>] [PROJECTION <projection>] [LIMIT <limit>] [COLLATION <collation>]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ------------ | ------------------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to query. |
| `query` | JSON or BSON | Yes | Filter expression to match documents. Use `{}` to match all documents. |
| `SORTBY` | string + direction | No | Sort results by a field. Requires field name followed by `ASC` or `DESC`. |
| `RESULTSORT` | string + direction | No | Sort each result batch in memory by any field (indexed or not). Requires field name followed by `ASC` or `DESC`. Does not guarantee global ordering across `BUCKET.ADVANCE` calls. See [RESULTSORT](/docs/bucket/sortby/#resultsort). |
| `PROJECTION` | JSON or BSON | No | Projection specification that controls which fields appear in returned documents. Use `{"field": 1}` for inclusion or `{"field": 0}` for exclusion. See [Projection](/docs/bucket/projection/). |
| `LIMIT`      | integer            | No       | Maximum number of documents to return per batch. Must be non-negative. When not specified, the session’s default limit is used (default: 100, configurable via `SESSION.ATTRIBUTE SET limit <n>`).                                 |
| `COLLATION` | JSON | No | Query-level collation spec for locale-aware string comparison. Overrides index collation for this query. |
## Return Value
[Section titled “Return Value”](#return-value)
The command returns a cursor ID and matching documents. The format depends on the protocol version.
Each returned document includes an `_id` field (ObjectId) that serves as the document’s primary key.
The encoding format of returned documents depends on the session’s `reply_type` setting:
| Format | Response Type | Description |
| ------ | ------------- | ------------------------------- |
| `bson` | Binary | BSON-encoded document (default) |
| `json` | String | JSON-encoded document |
To change the format:
```kronotop
SESSION.ATTRIBUTE SET reply_type bson
SESSION.ATTRIBUTE SET reply_type json
```
**RESP3 (map format):**
The response is a map with two keys: `cursor_id` (integer) and `entries` (array of documents).
```kronotop
1# "cursor_id" => (integer) <cursor_id>
2# "entries" => [<document>, <document>, ...]
```
**RESP2 (array format):**
The response is an array with two elements: the cursor ID and a nested array of documents.
```kronotop
1) (integer) <cursor_id>
2) 1) <document>
   2) <document>
   ...
```
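Client code that has to work with both protocol versions can fold the two shapes into one. A minimal Python sketch, assuming the client library decodes a RESP3 map into a `dict` with string keys and a RESP2 array into a `list`; `normalize_query_reply` is an illustrative name:

```python
from typing import Any, List, Tuple, Union


def normalize_query_reply(reply: Union[dict, list, tuple]) -> Tuple[int, List[Any]]:
    """Return (cursor_id, documents) for either reply shape."""
    if isinstance(reply, dict):
        # RESP3: map with "cursor_id" and "entries" keys.
        return reply["cursor_id"], list(reply["entries"])
    # RESP2: two-element array of cursor ID and nested document array.
    cursor_id, entries = reply
    return cursor_id, list(entries)


print(normalize_query_reply({"cursor_id": 1, "entries": ['{"name": "Bob"}']}))
print(normalize_query_reply([1, ['{"name": "Bob"}']]))
```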
**Cursor ID:**
The cursor ID is used to fetch more results with `BUCKET.ADVANCE`. Each query creates a new cursor that stores the query context in the session. The cursor tracks the position in the result set for pagination.
## Pagination
[Section titled “Pagination”](#pagination)
Results are returned in batches. Use the cursor ID with `BUCKET.ADVANCE` to get more results:
```kronotop
BUCKET.ADVANCE QUERY <cursor_id>
```
When there are no more results, the command returns an empty result set.
The cursor maintains its state across calls:
* Query context (filter, sort, limit)
* Current position in the result set
* Transaction context (if within an explicit transaction)
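The pagination loop can be sketched in Python as follows. This is a sketch under stated assumptions, not a client implementation: `send` is a hypothetical callable that issues one command and returns the decoded RESP3 map as a `dict` with string keys, and an exhausted cursor is assumed to produce an empty `entries` array.

```python
from typing import Any, Callable, Dict, Iterator


def iter_documents(send: Callable[..., Dict[str, Any]], bucket: str,
                   query: str, limit: int = 100) -> Iterator[Any]:
    """Yield every matching document, fetching one batch per round trip."""
    reply = send("BUCKET.QUERY", bucket, query, "LIMIT", limit)
    while reply["entries"]:
        yield from reply["entries"]
        # The cursor ID identifies the query context stored in the session.
        reply = send("BUCKET.ADVANCE", "QUERY", reply["cursor_id"])


# Stand-in transport that replays canned batches instead of contacting a server.
canned_batches = iter([["doc-1", "doc-2"], ["doc-3"], []])


def fake_send(*args: Any) -> Dict[str, Any]:
    return {"cursor_id": 1, "entries": next(canned_batches)}


print(list(iter_documents(fake_send, "users", "{}", limit=2)))
```

Because the generator fetches the next batch only when the previous one has been consumed, a caller that stops early issues no further `BUCKET.ADVANCE` calls.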
## Snapshot Reads
[Section titled “Snapshot Reads”](#snapshot-reads)
`BUCKET.QUERY` honors the session’s `SNAPSHOTREAD` setting. When `SNAPSHOTREAD ON` is active, index scans use snapshot isolation, so they will not cause transactions to conflict with concurrent writes. See [SNAPSHOTREAD](/docs/transactions/commands/snapshotread/) for details.
## Routing
[Section titled “Routing”](#routing)
`BUCKET.QUERY` can be executed from any node. When the query is sent to a node that does not own the bucket’s shards, it still returns correct results, but with higher latency because the data is read from the owning nodes. For best performance, use `BUCKET.LOCATE` to find the node that owns the bucket’s shards and send the query there.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ------------------------------- |
| `NOSUCHBUCKET` | The bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
## Examples
[Section titled “Examples”](#examples)
The following examples assume `reply_type` is set to `json`.
**Query all documents:**
```kronotop
BUCKET.QUERY users '{}'
```
Response (RESP3):
```kronotop
1# "cursor_id" => (integer) 1
2# "entries" =>
1) {"_id": "6a240c7b5da17d872dc0e102", "name": "Bob", "age": 25, "status": "active"}
2) {"_id": "6a240c7b5da17d872dc0e103", "name": "Carol", "age": 35, "status": "inactive"}
3) {"_id": "6a240c875da17d872dc0e104", "name": "Henry", "age": 31, "scores": [75, 100, 100]}
```
**Query with filter:**
```kronotop
BUCKET.QUERY users '{"name": "Alice"}'
```
**Query with sorting:**
```kronotop
BUCKET.QUERY users '{}' SORTBY age DESC
```
**Query with limit:**
```kronotop
BUCKET.QUERY users '{"status": "active"}' LIMIT 10
```
**Query with sorting and limit:**
```kronotop
BUCKET.QUERY users '{"status": "active"}' SORTBY age ASC LIMIT 5
```
**Query with projection:**
```kronotop
BUCKET.QUERY users '{"status": "active"}' PROJECTION '{"name": 1, "email": 1}'
```
**Query with in-memory result sort (no index required):**
```kronotop
BUCKET.QUERY users '{"status": "active"}' RESULTSORT score ASC LIMIT 10
```
**Query with collation override:**
```kronotop
BUCKET.QUERY users '{"name": "alice"}' COLLATION '{"locale": "en", "strength": 2}'
```
This performs a case-insensitive match using English locale rules, regardless of the index’s collation setting.
**Pagination:**
```kronotop
> BUCKET.QUERY users '{}' LIMIT 100
1# "cursor_id" => (integer) 1
2# "entries" => [...] (first 100 documents)
> BUCKET.ADVANCE QUERY 1
1# "cursor_id" => (integer) 1
2# "entries" => [...] (next batch of documents)
```
# BUCKET.REMOVE
> Marks a bucket for removal (soft delete).
Marks a bucket for removal (soft delete). This is the first phase of bucket deletion.
## Overview
[Section titled “Overview”](#overview)
Bucket deletion uses a two-phase approach to ensure safe removal in a distributed cluster:
1. **Phase 1 - Soft Delete (`BUCKET.REMOVE`)**: Marks the bucket as “removed” in metadata. The bucket immediately becomes inaccessible for normal operations (queries, inserts, updates). This phase signals background tasks (such as index maintenance) to stop processing the bucket.
2. **Phase 2 - Hard Delete (`BUCKET.PURGE`)**: Permanently deletes all bucket data, indexes, and metadata.
This two-phase approach is necessary because in a Kronotop cluster, multiple nodes may have active operations or background tasks referencing the bucket. The soft delete phase gives these operations time to gracefully stop before the data is permanently removed, preventing errors and ensuring consistency across the cluster.
Between the two phases, Kronotop uses a **distributed sync barrier** to coordinate cluster-wide acknowledgment. When `BUCKET.PURGE` is called, it waits for all shards across the cluster to confirm they have observed the bucket’s “removed” status. This barrier mechanism prevents data races where a node might still be writing to or reading from a bucket that is being deleted on another node. If the barrier is not satisfied within the timeout, the purge fails with `BARRIERNOTSATISFIED`, and you should retry the command.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.REMOVE <bucket>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | --------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to mark for removal. |
## Return Value
[Section titled “Return Value”](#return-value)
Returns `OK` on success.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ----------------------------------------- |
| `NOSUCHBUCKET` | The specified bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is already marked for removal. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
## Examples
[Section titled “Examples”](#examples)
**Mark a bucket for removal:**
```kronotop
> BUCKET.REMOVE users
OK
```
**Attempting to remove a non-existent bucket:**
```kronotop
> BUCKET.REMOVE nonexistent
(error) NOSUCHBUCKET No such bucket: 'nonexistent'
```
**Attempting to remove an already-removed bucket:**
```kronotop
> BUCKET.REMOVE users
(error) BUCKETBEINGREMOVED Bucket 'users' is being removed
```
**Two-phase deletion workflow:**
```kronotop
> BUCKET.REMOVE users
OK
> BUCKET.PURGE users
OK
```
# BUCKET.UPDATE
> Updates documents in a bucket that match a filter expression.
Updates documents in a bucket that match a filter expression.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BUCKET.UPDATE <bucket> <query> <update> [SORTBY <field> <ASC|DESC>] [LIMIT <limit>] [COLLATION <collation>]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------------------ | -------- | -------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to update. |
| `query` | JSON or BSON | Yes | Filter expression to match documents. Use `{}` to match all documents. |
| `update` | JSON or BSON | Yes | Update document with update operators. Cannot be empty. |
| `SORTBY` | string + direction | No | Process documents in sorted order. Requires field name followed by `ASC` or `DESC`. |
| `LIMIT` | integer | No | Maximum number of documents to update per batch. Must be non-negative. |
| `COLLATION` | JSON | No | Query-level collation spec for locale-aware string comparison. Overrides index collation for this query. |
## Return Value
[Section titled “Return Value”](#return-value)
The command returns a cursor ID and an array of ObjectIds for updated documents. The format depends on the protocol version.
An ObjectId is a 12-byte unique identifier. The encoding format of returned ObjectIds depends on the session’s `object_id_format` setting:
| Format | Response Type | Description |
| ------- | ------------- | ----------------------------------------- |
| `hex` | String | 24-character hex-encoded string (default) |
| `bytes` | Binary | Raw 12-byte array |
To change the format:
```kronotop
SESSION.ATTRIBUTE SET object_id_format hex
SESSION.ATTRIBUTE SET object_id_format bytes
```
**RESP3 (map format):**
```kronotop
1# "cursor_id" => (integer) <cursor_id>
2# "object_ids" => 1) "6835a1c0e4b0f72a3c000001"
                   2) "6835a1c0e4b0f72a3c000002"
                   ...
```
**RESP2 (array format):**
```kronotop
1) (integer) <cursor_id>
2) 1) "6835a1c0e4b0f72a3c000001"
   2) "6835a1c0e4b0f72a3c000002"
   ...
```
When no documents match the filter, the `object_ids` array is empty.
The `object_ids` array contains only the documents that were actually modified. A matched document whose update would leave its content unchanged is not modified and does not appear in `object_ids`.
**Auto-commit mode (default):**
The update is committed immediately.
**Transaction mode (within BEGIN/COMMIT):**
The update is not committed until `COMMIT` is called. Use `ROLLBACK` to cancel the update operation.
## Update Operators
[Section titled “Update Operators”](#update-operators)
The `update` parameter accepts a JSON or BSON document containing one or more of the following operators:
**`$set`**: Sets field values on matched documents.
```kronotop
'{"$set": {"status": "active", "version": 2}}'
```
The `_id` field is immutable and cannot be modified with `$set`.
**`$unset`**: Removes fields from matched documents. Accepts either an array of field names or a document with field names as keys:
```kronotop
'{"$unset": ["temporary_field", "deprecated_field"]}'
'{"$unset": {"temporary_field": 1, "deprecated_field": 1}}'
```
**`array_filters`**: Applies conditional updates to array elements using positional operators. Each filter is a document with an identifier and a condition.
Supported filter operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$all`, `$size`, `$exists`.
```kronotop
'{"$set": {"grades.$[elem].score": 100}, "array_filters": [{"elem": {"$gte": 8}}]}'
```
Positional operators target array elements. If the target field is missing or is not an array, the document is left unchanged.
**`upsert`**: Boolean. When `true`, inserts a new document if no documents match the filter. Positional operators cannot be used with upsert.
```kronotop
'{"$set": {"status": "active"}, "upsert": true}'
```
After an update, all indexes whose selectors overlap the modified field paths are synchronized with the new document content.
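To make the `$set` and `$unset` semantics concrete, here is a small local model in Python. It is not Kronotop's implementation: it handles only top-level fields (no dotted paths, `array_filters`, or `upsert`), `apply_update` is a hypothetical name, and raising an error when `$set` targets `_id` is an assumption about how the immutability rule surfaces. The second return value mirrors the rule that a document counts as modified only when its content actually changes.

```python
import copy
from typing import Any, Dict, Tuple


def apply_update(doc: Dict[str, Any],
                 update: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Apply top-level $set/$unset to a copy of doc; return (new_doc, modified)."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        if field == "_id":
            # Assumption: modifying the immutable _id field is rejected.
            raise ValueError("_id is immutable")
        result[field] = value
    # $unset accepts a list of names or a document keyed by name;
    # iterating either form yields the field names.
    for field in update.get("$unset", []):
        result.pop(field, None)
    return result, result != doc


doc = {"_id": "6835a1c0e4b0f72a3c000001", "status": "pending", "old_field": 1}
print(apply_update(doc, {"$set": {"status": "active"}, "$unset": ["old_field"]}))
print(apply_update({"status": "active"}, {"$set": {"status": "active"}}))
```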
## Pagination
[Section titled “Pagination”](#pagination)
When using `LIMIT`, use the cursor ID with `BUCKET.ADVANCE` to update more matching documents:
```kronotop
BUCKET.ADVANCE UPDATE <cursor_id>
```
Each call updates the next batch of documents up to the limit.
## Routing
[Section titled “Routing”](#routing)
The command must be sent to a node that owns at least one shard assigned to the bucket. If the bucket’s shards are all hosted on other nodes, the server rejects the request with a redirect to the appropriate node.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `REJECT`                | The bucket’s shards are hosted on another node. The error includes the target address: `REJECT <host>:<port>`.           |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
| `NOSUCHBUCKET` | The bucket does not exist. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
| `INDEXTYPE_MISMATCH` | The updated value type does not match the expected index type. |
| `DUPLICATEKEY` | Duplicate `_id` encountered during upsert. |
| `VECTORINDEXNOTREADY` | A vector index on the bucket is still bootstrapping. Retry after a short delay. |
| `ERR` | Update parameter cannot be empty. |
## Examples
**Update with $set:**
```kronotop
BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}'
```
**Update with $unset:**
```kronotop
BUCKET.UPDATE users '{"age": {"$gt": 30}}' '{"$unset": ["temporary_field", "deprecated_field"]}'
```
**Update with $set and $unset:**
```kronotop
BUCKET.UPDATE users '{}' '{"$set": {"version": 2}, "$unset": ["old_field"]}'
```
**Update with array\_filters:**
```kronotop
BUCKET.UPDATE students '{}' '{"$set": {"grades.$[elem].passed": true}, "array_filters": [{"elem": {"$gte": 60}}]}'
```
**Update with upsert:**
```kronotop
BUCKET.UPDATE users '{"username": "alice"}' '{"$set": {"username": "alice", "status": "active"}, "upsert": true}'
```
**Update with sorting:**
```kronotop
BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' SORTBY created_at ASC
```
**Update with limit:**
```kronotop
BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' LIMIT 50
```
**Update with collation:**
```kronotop
BUCKET.UPDATE users '{"name": "alice"}' '{"$set": {"verified": true}}' COLLATION '{"locale": "en", "strength": 2}'
```
Updates documents matching `"alice"` using case-insensitive English collation.
**Batch update with pagination:**
```kronotop
> BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}' LIMIT 100
1# "cursor_id" => (integer) 1
2# "object_ids" =>... (first 100 updated)
> BUCKET.ADVANCE UPDATE 1
1# "cursor_id" => (integer) 1
2# "object_ids" => ... (next 100 updated)
```
**Update within a transaction:**
```kronotop
BEGIN
BUCKET.UPDATE users '{"status": "pending"}' '{"$set": {"status": "active"}}'
COMMIT
```
# BUCKET.VECTOR
> Performs vector similarity search on a bucket using a vector index backed by JVector, with optional post-filtering to combine similarity ranking with structured query predicates.
Performs vector similarity search on a bucket using a vector index backed by [JVector](https://github.com/datastax/jvector), with optional post-filtering to combine similarity ranking with structured query predicates.
> **Note:** `BUCKET.VECTOR` does not provide ACID transaction guarantees. The search operates on a graph index that is updated asynchronously after a transaction commit, and matching documents are read directly from the storage engine outside of a transaction. Newly inserted or deleted vectors may not immediately appear in search results.
## Syntax
```kronotop
BUCKET.VECTOR <bucket> <selector> <vector> [FILTER <filter>] [PROJECTION <projection>] [TOP <count>] [THRESHOLD <score>] [MAX-SCAN-CANDIDATES <count>] [OVERQUERY <factor>]
```
## Parameters
| Parameter | Type | Required | Description |
| --------------------- | -------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket` | string | Yes | Name of the bucket to search. |
| `selector` | string | Yes | Field selector identifying the vector index to use. Supports dot notation for nested fields (e.g., `data.embedding`). |
| `vector` | JSON or BINARY | Yes | Query vector provided as either a JSON array of numbers (e.g., `[0.1, 0.2, 0.3]`) or a raw binary blob of packed floats. Must have the same number of dimensions as the vector index. |
| `FILTER` | JSON or BSON | No | A BQL filter expression to post-filter vector search results. The format is auto-detected: if the input starts with `{` it is parsed as JSON, otherwise as BSON. Only documents matching both the vector similarity and the filter condition are returned. |
| `PROJECTION` | JSON or BSON | No | Projection specification that controls which fields appear in returned documents. Use `{"field": 1}` for inclusion or `{"field": 0}` for exclusion. See [Projection](/docs/bucket/projection/). |
| `TOP` | integer | No | Maximum number of results to return. Must be non-negative. Default: `10`. |
| `THRESHOLD` | number | No | Minimum similarity score. Results with a score below this value are excluded. Default: `0.0`. |
| `MAX-SCAN-CANDIDATES` | integer | No | Maximum number of vector candidates to examine during filtered search. Must be a positive integer. Limits how far the search explores the vector graph. |
| `OVERQUERY` | number | No | Multiplier that controls how many extra candidates the graph traversal examines beyond the requested TOP. Must be `>= 1.0`. Higher values may improve recall at the cost of latency. |
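The numeric constraints in the table can be checked on the client before a request is sent. The following Python sketch is one way to do that; `build_vector_command` is a hypothetical helper, not part of any Kronotop SDK, and the order in which it emits optional arguments is an assumption.

```python
import json
from typing import List, Optional, Sequence


def build_vector_command(bucket: str, selector: str, vector: Sequence[float],
                         top: Optional[int] = None,
                         threshold: Optional[float] = None,
                         max_scan_candidates: Optional[int] = None,
                         overquery: Optional[float] = None) -> List[str]:
    """Validate optional arguments and return the command as a token list."""
    if top is not None and top < 0:
        raise ValueError("TOP must be non-negative")
    if max_scan_candidates is not None and max_scan_candidates <= 0:
        raise ValueError("MAX-SCAN-CANDIDATES must be a positive integer")
    if overquery is not None and overquery < 1.0:
        raise ValueError("OVERQUERY must be >= 1.0")

    # The query vector is sent as a JSON array, e.g. "[0.4, 0.5, 0.6]".
    args = ["BUCKET.VECTOR", bucket, selector, json.dumps(list(vector))]
    if top is not None:
        args += ["TOP", str(top)]
    if threshold is not None:
        args += ["THRESHOLD", str(threshold)]
    if max_scan_candidates is not None:
        args += ["MAX-SCAN-CANDIDATES", str(max_scan_candidates)]
    if overquery is not None:
        args += ["OVERQUERY", str(overquery)]
    return args
```

Rejecting invalid values locally avoids a round trip that would end in an `ERR` reply.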
## Binary Vector Format
For SDK developers, the binary format is a contiguous array of IEEE 754 single-precision (32-bit) floats in **little-endian** byte order. Each float occupies 4 bytes, so a vector with *N* dimensions is exactly *N × 4* bytes.
The format is auto-detected: if the first byte is `[` (0x5B), the payload is parsed as a JSON array; otherwise it is treated as a binary blob.
**Example:** encoding a 3-dimensional vector `[0.1, 0.2, 0.3]` in Python:
```python
import struct
vector = [0.1, 0.2, 0.3]
binary = struct.pack(f"<{len(vector)}f", *vector) # 12 bytes, little-endian
```
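The reverse direction, including the first-byte detection rule described above, can be sketched as follows. `decode_vector` is a hypothetical client-side helper that mirrors the stated rule; it is not the server's implementation.

```python
import json
import struct
from typing import List


def decode_vector(payload: bytes) -> List[float]:
    """Decode a query vector payload using the first-byte detection rule."""
    if payload[:1] == b"[":  # 0x5B: treat the payload as a JSON array
        return [float(x) for x in json.loads(payload)]
    if len(payload) % 4 != 0:
        raise ValueError("binary vector length must be a multiple of 4 bytes")
    count = len(payload) // 4
    # "<" selects little-endian, "f" is a 32-bit IEEE 754 float.
    return list(struct.unpack(f"<{count}f", payload))
```

Under the rule as stated, a binary payload whose first byte happens to be 0x5B would be routed to the JSON parser. Whether the server handles that case specially is not covered here.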
## Return Value
The command returns an array of results ordered by similarity score (highest first). Each result contains a similarity score and the matching document.
The encoding format of returned documents depends on the session’s `reply_type` setting:
| Format | Response Type | Description |
| ------ | ------------- | ------------------------------- |
| `bson` | Binary | BSON-encoded document (default) |
| `json` | String | JSON-encoded document |
To change the format:
```kronotop
SESSION.ATTRIBUTE SET reply_type bson
SESSION.ATTRIBUTE SET reply_type json
```
**RESP3 (array of maps):**
Each result is a map with `score` (float) and `entry` (document bytes):
```kronotop
1) score -> (double) 0.99
   entry -> <document bytes>
2) score -> (double) 0.87
   entry -> <document bytes>
...
```
**RESP2 (array of pairs):**
Each result is a two-element array containing the score as a string and the document bytes:
```kronotop
1) 1) "0.99"
   2) <document bytes>
2) 1) "0.87"
   2) <document bytes>
...
```
When no results match, an empty array is returned.
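A client that has to work with both protocol versions can normalize the two shapes into one. The sketch below assumes the client library has already decoded the reply into Python lists and dicts, and that RESP3 map keys arrive as the strings `score` and `entry`; some clients return keys as bytes, in which case the lookups need adjusting.

```python
from typing import Any, List, Tuple


def normalize_results(reply: List[Any]) -> List[Tuple[float, Any]]:
    """Convert a RESP2 or RESP3 BUCKET.VECTOR reply into (score, entry) pairs."""
    results = []
    for item in reply:
        if isinstance(item, dict):  # RESP3: map with "score" and "entry"
            score, entry = item["score"], item["entry"]
        else:  # RESP2: two-element array, score encoded as a string
            score, entry = item
        results.append((float(score), entry))
    return results
```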
## Filtered Search
When `FILTER` is provided, the command applies post-filtering to vector search results. The search first retrieves vector candidates ordered by similarity, then evaluates the filter expression against each candidate’s document.
If the initial batch of candidates does not yield enough results that pass the filter, the search automatically fetches additional candidates from the vector graph in progressively larger batches until:
* Enough results are found to satisfy `TOP`, or
* The `MAX-SCAN-CANDIDATES` limit is reached, or
* The vector graph is exhausted.
The `MAX-SCAN-CANDIDATES` parameter provides an upper bound on how many candidates are examined. This is useful for controlling latency when the filter is highly selective and most candidates do not match.
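The loop and its three stop conditions can be modeled in a few lines of Python. This is an illustrative simulation, not Kronotop's implementation: the candidate stream stands in for the vector graph, the first batch size equals `top`, and each later batch doubles in size. Both batch-size choices are assumptions, since the text only says the batches grow progressively.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional, Tuple


def post_filtered_search(candidates: Iterable[Tuple[float, dict]],
                         matches: Callable[[dict], bool],
                         top: int,
                         max_scan_candidates: Optional[int] = None
                         ) -> Tuple[List[Tuple[float, dict]], int]:
    """Return (results, scanned) for candidates ordered by descending score."""
    stream = iter(candidates)
    results: List[Tuple[float, dict]] = []
    scanned = 0
    batch_size = max(top, 1)
    while len(results) < top:  # stop 1: enough results for TOP
        if max_scan_candidates is not None:
            batch_size = min(batch_size, max_scan_candidates - scanned)
        if batch_size <= 0:
            break  # stop 2: MAX-SCAN-CANDIDATES reached
        batch = list(islice(stream, batch_size))
        if not batch:
            break  # stop 3: candidate stream (the graph) is exhausted
        for score, doc in batch:
            scanned += 1
            if len(results) < top and matches(doc):
                results.append((score, doc))
        batch_size *= 2  # assumption: each batch is twice the previous one
    return results, scanned
```

With a selective filter, `scanned` grows much faster than `len(results)`, which is the latency cost that `MAX-SCAN-CANDIDATES` bounds.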
## Routing
The command must be sent to a node that owns at least one shard assigned to the bucket. If the bucket’s shards are all hosted on other nodes, the server rejects the request with a redirect to the appropriate node.
## Errors
| Error Code | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `ERR` | No vector index exists for the given selector. |
| `ERR` | The query vector dimensions do not match the index dimensions. |
| `ERR` | An argument is invalid (negative TOP, OVERQUERY below 1.0, or non-positive MAX-SCAN-CANDIDATES). |
| `REJECT` | The bucket’s shards are hosted on another node. The error includes the target address: `REJECT <host>:<port>`. |
| `VECTORINDEXNOTREADY` | The vector index is still being built or recovered. Retry after the background build completes. |
| `NOSUCHBUCKET` | The bucket does not exist. |
| `BUCKETBEINGREMOVED` | The bucket is being removed. |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is being removed. |
## Examples
The examples below use human-readable input and output. Configure the session first:
```kronotop
SESSION.ATTRIBUTE SET input_type json
SESSION.ATTRIBUTE SET reply_type json
SESSION.ATTRIBUTE SET object_id_format hex
```
**Create the vector index:**
```kronotop
BUCKET.INDEX CREATE products '{
"$vector": {"field": "embedding", "dimensions": 3, "distance": "cosine"}
}'
```
Creates a 3-dimensional cosine index on the `embedding` field. The index is built asynchronously; searches return `VECTORINDEXNOTREADY` until the build completes.
**Insert documents with vectors:**
```kronotop
BUCKET.INSERT products DOCS '{"label": "alpha", "embedding": [0.1, 0.2, 0.3]}' '{"label": "beta", "embedding": [0.4, 0.5, 0.6]}' '{"label": "gamma", "embedding": [0.7, 0.8, 0.9]}'
```
Each `embedding` value must be an array with the same number of dimensions as the index. Vectors are added to the index asynchronously after the insert commits.
**Basic vector search:**
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]'
```
Response (RESP3):
```kronotop
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta", "embedding": [0.4, 0.5, 0.6]}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma", "embedding": [0.7, 0.8, 0.9]}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha", "embedding": [0.1, 0.2, 0.3]}
```
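The scores in this response are consistent with cosine similarity rescaled from [-1, 1] to [0, 1] as (1 + cos θ) / 2. That mapping is inferred from the sample output rather than stated on this page, so treat the following Python check as a sketch under that assumption:

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity mapped to [0, 1] (assumed mapping, see above)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (1.0 + dot / norm) / 2.0


query = [0.4, 0.5, 0.6]
gamma_score = cosine_score(query, [0.7, 0.8, 0.9])  # compare with 0.99909...
alpha_score = cosine_score(query, [0.1, 0.2, 0.3])  # compare with 0.98731...
```

Small differences in the last digits are expected, because the sample output appears to come from single-precision arithmetic while Python uses double precision.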
**Search with a filter:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' FILTER '{"label": "beta"}'
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta", "embedding": [0.4, 0.5, 0.6]}
```
Only documents where `label` equals `"beta"` are returned.
**Limit results with TOP:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' TOP 5
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta", "embedding": [0.4, 0.5, 0.6]}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma", "embedding": [0.7, 0.8, 0.9]}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha", "embedding": [0.1, 0.2, 0.3]}
```
**Set a minimum similarity threshold:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' THRESHOLD 0.95
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta", "embedding": [0.4, 0.5, 0.6]}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma", "embedding": [0.7, 0.8, 0.9]}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha", "embedding": [0.1, 0.2, 0.3]}
```
Only results with a similarity score of 0.95 or higher are returned.
**Combine filter and threshold:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' FILTER '{"label": "alpha"}' THRESHOLD 0.97
1) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha", "embedding": [0.1, 0.2, 0.3]}
```
Returns only documents matching the filter that also meet the minimum similarity score.
**Control filtered search depth:**
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' FILTER '{"label": "target"}' TOP 3 MAX-SCAN-CANDIDATES 100
```
Examines at most 100 vector candidates while looking for 3 results that match the filter.
**Increase recall with OVERQUERY:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' OVERQUERY 2.0
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta", "embedding": [0.4, 0.5, 0.6]}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma", "embedding": [0.7, 0.8, 0.9]}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha", "embedding": [0.1, 0.2, 0.3]}
```
**Search with projection:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' PROJECTION '{"label": 1}'
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta"}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma"}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha"}
```
Returns only `_id` and `label` for each result.
**Exclude embedding from results:**
```kronotop
> BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' PROJECTION '{"embedding": 0}'
1) 1# "score" => (double) 1.0
2# "entry" => {"_id": "6a252f086c85cddddbd8918e", "label": "beta"}
2) 1# "score" => (double) 0.9990954399108887
2# "entry" => {"_id": "6a252f086c85cddddbd8918f", "label": "gamma"}
3) 1# "score" => (double) 0.9873158931732178
2# "entry" => {"_id": "6a252f086c85cddddbd8918d", "label": "alpha"}
```
Returns all fields except the embedding array, reducing response size.
# Compound Indexes
> A compound index covers multiple fields in a defined order.
## Introduction
A compound index covers multiple fields in a defined order. Instead of creating separate indexes on `category` and `price`, a compound index on `(category, price)` lets the query engine satisfy multi-field predicates with a single index scan. Use compound indexes when your queries consistently filter on the same combination of fields.
## Creation
Compound indexes are defined with the `$compound` key in the index schema. Each entry specifies an ordered list of fields:
```json
{
"$compound": [
{
"name": "idx_cat_price",
"fields": [
{
"selector": "category",
"bson_type": "string"
},
{
"selector": "price",
"bson_type": "double"
}
]
}
]
}
```
With the command:
```kronotop
BUCKET.INDEX CREATE products '{
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
```
A three-field compound index:
```kronotop
BUCKET.INDEX CREATE orders '{
"$compound": [{
"fields": [
{"selector": "status", "bson_type": "string"},
{"selector": "region", "bson_type": "string"},
{"selector": "created_at", "bson_type": "datetime"}
]
}]
}'
```
If `name` is omitted, a name is auto-generated from the selectors and types.
A compound index with a `multi_key` field for array element indexing:
```kronotop
BUCKET.INDEX CREATE products '{
"$compound": [{
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "tags", "bson_type": "string", "multi_key": true}
]
}]
}'
```
Each element in the `tags` array is indexed as a separate entry. At most one field in a compound index may have `multi_key` enabled.
Compound indexes support an optional `collation` at the index level for locale-aware string ordering:
```kronotop
BUCKET.INDEX CREATE products '{
"$compound": [{
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
],
"collation": {"locale": "en"}
}]
}'
```
Collation requires at least one `string` field in the compound index. If omitted, the bucket-level collation is inherited for string fields. See [Collation](/docs/bucket/collation/) for full details.
Single-field and compound indexes can be defined in the same schema:
```kronotop
BUCKET.INDEX CREATE products '{
"email": {"bson_type": "string"},
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
```
See [BUCKET.INDEX CREATE](/docs/bucket/commands/bucket-index/#bucketindex-create) for the full command reference.
## The prefix rule
Field order in a compound index matters. The query engine matches filters against index fields strictly left to right. This is the prefix rule. It is the most important concept for compound indexes.
The rules are:
1. **Left-to-right matching.** The engine walks the index fields in order. If a field has no matching filter, the walk stops. Later fields are not considered even if they have filters.
2. **Equality before range.** All fields before the last matched field must use equality (`$eq`). Only the last matched field may use a range operator (`$gt`, `$gte`, `$lt`, `$lte`).
3. **Range stops the walk.** Once a range operator is encountered on a field, no further fields are matched. The range marks the end of the prefix.
4. **Leading field is sufficient.** A single equality or range filter on the leading (first) field is enough to activate the compound index. When a single-field index exists for that field, the query engine generally prefers it, unless SORTBY makes the compound index a better choice (see [Natural sort order](#natural-sort-order)).
### Example: compound index on `(a, b, c)`
| Query filters | Fields matched | Compound index used? | Why |
| ----------------------- | -------------- | -------------------- | --------------------------------------------------------------- |
| `a = 1, b = 2, c = 3` | `a, b, c` | Yes | Full prefix, all equality |
| `a = 1, b = 2` | `a, b` | Yes | Two-field equality prefix |
| `a = 1, b > 5` | `a, b` | Yes | Equality on `a`, range on `b` (last matched) |
| `a = 1, b > 5, b < 10` | `a, b` | Yes | Equality on `a`, range bounds on `b` |
| `a = 1, b = 2, c > 100` | `a, b, c` | Yes | Equality prefix on `a, b`, range on `c` (last matched) |
| `a = 1, b > 5, c = 3` | `a, b` | Yes | Range on `b` stops the walk; `c` becomes a residual filter |
| `a = 1` | `a` only | Yes | Single equality on leading field; compound index as fallback |
| `a = 1, c = 3` | `a` only | Yes | Compound scan on `a`; `c` becomes a residual filter |
| `a > 5` | `a` only | Yes | Leading prefix range scan on `a` |
| `a > 5, b = 2` | `a` only | Yes | Leading prefix range scan on `a`; `b` becomes a residual filter |
| `b = 2, c = 3` | none | No | `a` has no filter; walk stops immediately |
When the compound index is not used, the query engine falls back to single-field indexes (if available) or a full scan.
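The prefix walk can be expressed compactly in code. The following Python sketch models rules 1 to 3 for illustration only; it is not the query planner. `match_prefix` and its input shape (a dict from field name to the list of operators used on that field) are hypothetical, and a field that mixes `$eq` with range operators is treated as unmatched for simplicity.

```python
from typing import Dict, List, Sequence, Tuple

RANGE_OPS = {"$gt", "$gte", "$lt", "$lte"}


def match_prefix(index_fields: Sequence[str],
                 filters: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (matched index fields, residual filter fields)."""
    matched: List[str] = []
    for field in index_fields:  # rule 1: walk strictly left to right
        ops = set(filters.get(field, ()))
        if ops == {"$eq"}:
            matched.append(field)  # rule 2: equality lets the walk continue
            continue
        if ops and ops <= RANGE_OPS:
            matched.append(field)  # rule 3: a range is matched, then stops
        break  # no filter, a range, or an unsupported operator ends the walk
    residual = [f for f in filters if f not in matched]
    return matched, residual
```

For the index `(a, b, c)`, `match_prefix(["a", "b", "c"], {"a": ["$eq"], "b": ["$gt"], "c": ["$eq"]})` yields `(["a", "b"], ["c"])`, matching the corresponding row in the table. An empty matched list means the compound index is not used.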
## Supported operators
The following operators participate in compound index matching:
| Operator | Role in compound index |
| -------- | ----------------------------------------- |
| `$eq` | Equality, can appear on any matched field |
| `$gt` | Range, only on the last matched field |
| `$gte` | Range, only on the last matched field |
| `$lt` | Range, only on the last matched field |
| `$lte` | Range, only on the last matched field |
Multiple range operators can apply to the same last field. For example, `a = 1, b > 5, b < 10` uses the compound index with both range bounds on `b`.
Operators like `$ne`, `$in`, and `$nin` do not participate in compound index matching. If a filter uses one of these operators on a compound index field, that field is not matched and the prefix walk stops.
## Trade-offs vs. single-field indexes
**When compound indexes win:**
* Multi-field equality lookups. A query on `category = "electronics", price = 29.99` is a single scan on a `(category, price)` compound index, instead of two separate index scans followed by an intersection.
* Equality + range patterns. A query on `status = "active", created_at > "2025-01-01"` is a single range scan within the `status = "active"` partition.
**When single-field indexes are better:**
* Queries that filter on a single field where a single-field index exists. A single-field index has smaller keys and is more efficient than scanning a compound index for the same field. When both exist, the query engine generally prefers the single-field index, unless a compound index can also satisfy SORTBY (see [Natural sort order](#natural-sort-order)).
* Queries where the fields don’t match the prefix order. If your queries sometimes filter on `a` alone and sometimes on `b` alone, two single-field indexes serve both patterns. A compound index on `(a, b)` only helps queries that start with `a`.
**Costs:**
* Larger index keys. Each entry stores values for all fields in the compound index.
* More storage. The combined key size grows with the number of fields.
* Longer build times. The background index build task processes all fields per document.
**Guidance:** Create compound indexes for query patterns you actually have. If you always query `status` and `region` together, a compound index on `(status, region)` makes sense. Don’t create compound indexes speculatively.
## Residual predicates
When a compound index matches only a prefix of the query’s filters, the remaining filters become residual predicates. These are evaluated as post-filters after the index scan produces candidate documents.
Example: with a compound index on `(a, b)` and a query `a = 1, b = 2, c = 3`:
* `a = 1, b = 2` is handled by the compound index scan.
* `c = 3` is a residual predicate evaluated against each candidate document.
Results are always correct. Residual predicates don’t change what the query returns. They just aren’t index-accelerated for those fields. If `c` has its own single-field index, the query engine may use it separately.
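Conceptually, a residual predicate is an ordinary filter applied to the documents the index scan returns. A minimal Python model, limited to equality predicates and using the hypothetical helper `apply_residual`, looks like this:

```python
from typing import Dict, Iterable, List


def apply_residual(candidates: Iterable[dict],
                   residual: Dict[str, object]) -> List[dict]:
    """Keep only the candidates whose fields equal every residual value."""
    return [doc for doc in candidates
            if all(doc.get(field) == value for field, value in residual.items())]


# Candidates as an index scan on (a, b) might return them for a = 1, b = 2.
candidates = [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 7}]
filtered = apply_residual(candidates, {"c": 3})
```

The index narrows the candidate set; the residual check costs one document inspection per candidate.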
## Leading prefix scans
A single equality or range filter on the leading (first) field of a compound index is sufficient to activate a compound index scan, even when no other fields are matched. When a single-field index exists for that field, the query engine generally prefers it, unless a compound index can also satisfy SORTBY.
### Example: compound index on `(a, b)`
| Query filters | Compound index used? | Why |
| --------------- | -------------------- | ------------------------------------------------------ |
| `a = 1` | Yes | Equality on leading field scans the prefix |
| `a = 1, c = 3` | Yes | Equality scan on `a`; `c = 3` applied as residual |
| `a >= 20` | Yes | Range on leading field scans the index directly |
| `a > 5, a < 30` | Yes | Bounded range on leading field |
| `a > 5, b = 2` | Yes | Range scan on `a`; `b = 2` applied as residual filter |
| `a > 3, b > 10` | Yes | Range scan on `a`; `b > 10` applied as residual filter |
When the leading field has a filter and subsequent fields also have filters, those subsequent filters become residual predicates, evaluated against each candidate document after the index scan.
## Natural sort order
A compound index can provide natural sort order for SORTBY, eliminating in-memory sorting. The query engine determines this as follows:
1. Fields with equality (`$eq`) filters form a prefix of constant values. These are trivially sorted.
2. The first field after the equality prefix is naturally sorted by the underlying tuple ordering.
3. SORTBY is satisfied if the sort field is either in the equality prefix or is the first field after it.
### Example: compound index on `(a, b, c)`
| Query filters | SORTBY field | Index provides sort? | Why |
| --------------- | ------------ | -------------------- | ------------------------------------------ |
| `a = 1, b = 2` | `c` | Yes | EQ prefix on a, b. First non-EQ field is c |
| `a = 1, b >= 5` | `b` | Yes | EQ prefix on a. First non-EQ field is b |
| `a = 1, b >= 5` | `c` | No | First non-EQ field is b, not c |
| `a = 1` | `b` | Yes | EQ prefix on a. First non-EQ field is b |
| `a = 1` | `c` | No | First non-EQ field is b, not c |
When a compound index can satisfy SORTBY, the query engine may prefer it over a single-field index to avoid in-memory sorting. Both ASC and DESC sort directions are supported.
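The three rules reduce to a short check. The Python sketch below is a model of the decision as described in this section, not the planner's code; `index_provides_sort` is a hypothetical name, and `eq_fields` is the set of fields that carry an `$eq` filter in the query.

```python
from typing import Sequence, Set


def index_provides_sort(index_fields: Sequence[str], eq_fields: Set[str],
                        sort_field: str) -> bool:
    """True if the index order already satisfies SORTBY on sort_field."""
    prefix_len = 0
    for field in index_fields:  # count the leading run of equality fields
        if field not in eq_fields:
            break
        prefix_len += 1
    # Sorted for free: the equality prefix plus the first field after it.
    return sort_field in index_fields[:prefix_len + 1]
```

For the index `(a, b, c)` and the query `a = 1, b >= 5`, only `a` is in `eq_fields`, so sorting by `b` is satisfied and sorting by `c` is not.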
## Constraints
* **Minimum two fields.** A compound index must have at least two fields. A single-field compound index is just a single-field index. Use the regular index syntax instead.
* **Maximum 32 fields.** A compound index supports at most 32 fields.
* **At most one multi-key field.** A compound index allows at most one field with `multi_key` enabled. Multiple multi-key fields in the same compound index are rejected at creation time.
* **No duplicate selectors.** A field selector may appear only once within a compound index. Duplicate selectors are rejected.
* **Strict type matching.** Each field in a compound index has a declared BSON type. The same strict type matching rules apply as with single-field indexes. See [Strict Types](/docs/bucket/strict-types/).
* **Unique names.** Index names must be unique across all indexes (single-field and compound) in the bucket.
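The structural constraints above can be checked on the client before an index definition is sent. The following Python sketch covers the field-count, multi-key, and duplicate-selector rules only. `validate_compound_index` is a hypothetical helper, and its error messages are illustrative; they are not the server's messages.

```python
from typing import Any, Dict


def validate_compound_index(definition: Dict[str, Any]) -> None:
    """Raise ValueError if one $compound entry breaks a structural rule."""
    fields = definition.get("fields", [])
    if len(fields) < 2:
        raise ValueError("a compound index needs at least two fields")
    if len(fields) > 32:
        raise ValueError("a compound index supports at most 32 fields")
    selectors = [field["selector"] for field in fields]
    if len(set(selectors)) != len(selectors):
        raise ValueError("duplicate selectors are not allowed")
    multi_key_count = sum(1 for field in fields if field.get("multi_key"))
    if multi_key_count > 1:
        raise ValueError("at most one field may enable multi_key")
```

Type matching and name uniqueness depend on bucket state, so they still have to be enforced by the server.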
When `BUCKET.UPDATE` modifies a document, a compound index is refreshed as a whole if any of its fields overlaps a modified field path. The path overlap rules are the same as for single-field indexes. See [Index maintenance on updates](/docs/bucket/single-field-index/#index-maintenance-on-updates).
## Practical example
Create a bucket and a compound index on `(category, price)`:
```kronotop
BUCKET.CREATE products
BUCKET.INDEX CREATE products '{
"$compound": [{
"name": "idx_cat_price",
"fields": [
{"selector": "category", "bson_type": "string"},
{"selector": "price", "bson_type": "double"}
]
}]
}'
```
Insert some documents:
```kronotop
BUCKET.INSERT products DOCS '{"category": "electronics", "price": 299.99, "name": "Headphones"}'
BUCKET.INSERT products DOCS '{"category": "electronics", "price": 49.99, "name": "USB Cable"}'
BUCKET.INSERT products DOCS '{"category": "books", "price": 19.99, "name": "Design Patterns"}'
BUCKET.INSERT products DOCS '{"category": "electronics", "price": 999.99, "name": "Laptop"}'
```
**Query: equality on both fields**, uses the full compound index:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{
"category": {"$eq": "electronics"},
"price": {"$eq": 299.99}
}'
1# "cursor_id" => (integer) 12
2# "entries" => 1) {"_id": "69ce887e6597b10d87d13511", "category": "electronics", "price": 299.99, "name": "Headphones"}
```
**Query: equality + range**, uses the compound index with equality on `category` and range on `price`:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{
"category": {"$eq": "electronics"},
"price": {"$gt": 100.0}
}'
1# "cursor_id" => (integer) 13
2# "entries" =>
1) {"_id": "69ce887e6597b10d87d13511", "category": "electronics", "price": 299.99, "name": "Headphones"}
2) {"_id": "69ce88886597b10d87d13514", "category": "electronics", "price": 999.99, "name": "Laptop"}
```
Returns the Headphones and Laptop documents.
**Query: filter on non-leading field only**, compound index not used:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{"price": {"$gt": 20.0}}'
1# "cursor_id" => (integer) 14
2# "entries" =>
1) {"_id": "69ce887e6597b10d87d13511", "category": "electronics", "price": 299.99, "name": "Headphones"}
2) {"_id": "69ce88816597b10d87d13512", "category": "electronics", "price": 49.99, "name": "USB Cable"}
3) {"_id": "69ce88886597b10d87d13514", "category": "electronics", "price": 999.99, "name": "Laptop"}
```
`price` is the second field in the compound index `(category, price)`. The prefix rule requires a predicate on `category` first. Without it, the prefix walk stops immediately and the compound index cannot be used. The query falls back to a single-field index on `price` (if one exists) or a full scan. No such index exists here, so `BUCKET.EXPLAIN` reports a full scan:
```kronotop
127.0.0.1:5484> BUCKET.EXPLAIN products '{"price": {"$gt": 20.0}}'
1# "is_cached" => (true)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "FullScan"
3# "id" => (integer) 1
4# "scanType" => "FULL_SCAN"
5# "index" => "primary-index"
6# "predicate" =>
1# "type" => "PREDICATE"
2# "selector" => "price"
3# "operator" => "GT"
4# "operand" => "Param[ref=ParamRef[index=0]]"
```
# CRUD Operations
> A quick-reference guide to basic document operations (insert, query, update, delete) with a simple product catalog.
A quick-reference guide to basic document operations (insert, query, update, delete) with a simple product catalog. This guide assumes you have a RESP-compatible CLI client (`kronotop-cli`, `valkey-cli` or similar) installed and connected to a running Kronotop instance.
## Session Setup
Kronotop supports both RESP2 and RESP3 wire protocols. Switch to RESP3. Its map-based responses are more readable for this tutorial:
```kronotop
127.0.0.1:5484> HELLO 3
1# "server" => "Kronotop"
2# "version" => "2026.06-2"
3# "proto" => (integer) 3
4# "id" => (integer) 0
5# "mode" => "cluster"
6# "role" => "master"
7# "modules" => (empty array)
```
Configure the session to use JSON for readability:
```kronotop
127.0.0.1:5484> SESSION.ATTRIBUTE SET input_type json
OK
127.0.0.1:5484> SESSION.ATTRIBUTE SET reply_type json
OK
127.0.0.1:5484> SESSION.ATTRIBUTE SET object_id_format hex
OK
```
All examples in this guide run in **auto-commit mode**. Each command is executed as a one-off transaction that commits immediately. See [Transactions](/docs/transactions/) for explicit transaction control.
## Create a Bucket
Create a bucket named `products`:
```kronotop
127.0.0.1:5484> BUCKET.CREATE products
OK
```
## Insert Documents
### Single insert
```kronotop
127.0.0.1:5484> BUCKET.INSERT products DOCS '{
"category": "books",
"price": 19.99,
"name": "The Disconnected"
}'
1) "69dbdc95690a394e625a82c0"
```
Kronotop generates an `_id` (ObjectId) for each document and returns it.
### Batch insert
Pass multiple documents after the `DOCS` keyword. All documents are inserted atomically in a single transaction: either all succeed or none do.
```kronotop
127.0.0.1:5484> BUCKET.INSERT products DOCS '{
"category": "books",
"price": 24.99,
"name": "The Black Book"
}' '{
"category": "electronics",
"price": 499.99,
"name": "Wireless Headphones"
}'
1) "69dbdccc690a394e625a82c1"
2) "69dbdccc690a394e625a82c2"
```
See [BUCKET.INSERT](/docs/bucket/commands/bucket-insert/) for user-provided `_id` values and document format details.
## Query Documents
### All documents
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{}'
1# "cursor_id" => (integer) 2
2# "entries" =>
1) {"_id": "69dbdc95690a394e625a82c0", "category": "books", "price": 19.99, "name": "The Disconnected"}
2) {"_id": "69dbdccc690a394e625a82c1", "category": "books", "price": 24.99, "name": "The Black Book"}
3) {"_id": "69dbdccc690a394e625a82c2", "category": "electronics", "price": 499.99, "name": "Wireless Headphones"}
```
### Filter by field
Find all books:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{"category": "books"}'
1# "cursor_id" => (integer) 3
2# "entries" =>
1) {"_id": "69dbdc95690a394e625a82c0", "category": "books", "price": 19.99, "name": "The Disconnected"}
2) {"_id": "69dbdccc690a394e625a82c1", "category": "books", "price": 24.99, "name": "The Black Book"}
```
### Filter with comparison operators
[Section titled “Filter with comparison operators”](#filter-with-comparison-operators)
Find products cheaper than 25:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{"price": {"$lt": 25.0}}'
1# "cursor_id" => (integer) 5
2# "entries" =>
1) {"_id": "69dbdc95690a394e625a82c0", "category": "books", "price": 19.99, "name": "The Disconnected"}
2) {"_id": "69dbdccc690a394e625a82c1", "category": "books", "price": 24.99, "name": "The Black Book"}
```
### Combine multiple filters
[Section titled “Combine multiple filters”](#combine-multiple-filters)
Find electronics cheaper than 100. Nothing in our data matches, so the result is empty:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{
"$and": [
{"category": "electronics"},
{"price": {"$lt": 100.0}}
]
}'
1# "cursor_id" => (integer) 6
2# "entries" => (empty array)
```
### Sort and limit
[Section titled “Sort and limit”](#sort-and-limit)
`SORTBY` requires an index on the sort field. Create one on `price` first:
```kronotop
127.0.0.1:5484> BUCKET.INDEX CREATE products '{
"price": {"bson_type": "double"}
}'
OK
```
Now find the cheapest product:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{}' SORTBY price ASC LIMIT 1
1# "cursor_id" => (integer) 7
2# "entries" => 1) {"_id": "69dbdc95690a394e625a82c0", "category": "books", "price": 19.99, "name": "The Disconnected"}
```
See [BUCKET.QUERY](/docs/bucket/commands/bucket-query/) for the full query syntax and filter operators. See [SORTBY](/docs/bucket/sortby/) for sorting details and compound index support.
## Update Documents
[Section titled “Update Documents”](#update-documents)
### Set a field
[Section titled “Set a field”](#set-a-field)
Increase the price of “The Disconnected”:
```kronotop
127.0.0.1:5484> BUCKET.UPDATE products '{"name": "The Disconnected"}' '{
"$set": {"price": 22.99}
}'
1# "cursor_id" => (integer) 8
2# "object_ids" => 1) "69dbdc95690a394e625a82c0"
```
### Unset a field
[Section titled “Unset a field”](#unset-a-field)
Remove the `category` field from all electronics:
```kronotop
127.0.0.1:5484> BUCKET.UPDATE products '{"category": "electronics"}' '{
"$unset": ["category"]
}'
1# "cursor_id" => (integer) 9
2# "object_ids" => 1) "69dbdccc690a394e625a82c2"
```
See [BUCKET.UPDATE](/docs/bucket/commands/bucket-update/) for `array_filters`, `upsert`, and other update operators.
## Delete Documents
[Section titled “Delete Documents”](#delete-documents)
Delete all books:
```kronotop
127.0.0.1:5484> BUCKET.DELETE products '{"category": "books"}'
1# "cursor_id" => (integer) 10
2# "object_ids" =>
1) "69dbdc95690a394e625a82c0"
2) "69dbdccc690a394e625a82c1"
```
See [BUCKET.DELETE](/docs/bucket/commands/bucket-delete/) for batch deletion and filter options.
## Pagination with BUCKET.ADVANCE
[Section titled “Pagination with BUCKET.ADVANCE”](#pagination-with-bucketadvance)
When a query matches more documents than the batch size, use `BUCKET.ADVANCE` to fetch subsequent pages.
First, insert a few more products:
```kronotop
127.0.0.1:5484> BUCKET.INSERT products DOCS '{
"category": "books",
"price": 12.99,
"name": "The Disconnected"
}' '{
"category": "electronics",
"price": 79.99,
"name": "USB-C Hub"
}' '{
"category": "electronics",
"price": 149.99,
"name": "Mechanical Keyboard"
}'
1) "69dbddc3690a394e625a82c3"
2) "69dbddc3690a394e625a82c4"
3) "69dbddc3690a394e625a82c5"
```
Query with a limit of 2:
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{}' LIMIT 2
1# "cursor_id" => (integer) 12
2# "entries" =>
1) {"_id": "69dbdccc690a394e625a82c2", "price": 499.99, "name": "Wireless Headphones"}
2) {"_id": "69dbddc3690a394e625a82c3", "category": "books", "price": 12.99, "name": "The Disconnected"}
```
Fetch the next page using the cursor ID:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 12
1# "cursor_id" => (integer) 12
2# "entries" =>
1) {"_id": "69dbddc3690a394e625a82c4", "category": "electronics", "price": 79.99, "name": "USB-C Hub"}
2) {"_id": "69dbddc3690a394e625a82c5", "category": "electronics", "price": 149.99, "name": "Mechanical Keyboard"}
```
Continue until the entries array is empty:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 12
1# "cursor_id" => (integer) 12
2# "entries" => (empty array)
```
`BUCKET.ADVANCE` also works with `DELETE` and `UPDATE` operations. See [BUCKET.ADVANCE](/docs/bucket/commands/bucket-advance/) for details.
## Closing Cursors
[Section titled “Closing Cursors”](#closing-cursors)
Every `BUCKET.QUERY`, `BUCKET.DELETE`, and `BUCKET.UPDATE` command creates a cursor that holds state in the session. Each cursor holds the filter, sort configuration, and current position. These cursors stay open until explicitly closed. When you are done paginating, close the cursor to release its resources:
```kronotop
127.0.0.1:5484> BUCKET.CLOSE QUERY 12
OK
```
A closed cursor cannot be advanced:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 12
(error) ERR No previous query context found for 'query' operation with the given cursor id
```
See [BUCKET.CLOSE](/docs/bucket/commands/bucket-close/) and [BUCKET.CURSORS](/docs/bucket/commands/bucket-cursors/) for details.
# Cursor-Based Streaming
> Every BUCKET.QUERY, BUCKET.DELETE, and BUCKET.UPDATE command returns results in batches through a cursor.
Every `BUCKET.QUERY`, `BUCKET.DELETE`, and `BUCKET.UPDATE` command returns results in batches through a cursor. Rather than computing the entire result set up front, each call produces the next batch and advances the cursor’s position. The total number of matching documents is not known in advance. You consume results batch by batch, calling `BUCKET.ADVANCE` until the batch comes back empty.
Outside an explicit `BEGIN`/`COMMIT` block, each `BUCKET.ADVANCE` call runs in its own transaction, so the work is spread across independent, short-lived transactions. When you are done, `BUCKET.CLOSE` releases the cursor.
## Why Batching
[Section titled “Why Batching”](#why-batching)
Kronotop delivers results in batches rather than all at once for two reasons:
* **Transaction time budget.** FoundationDB limits each transaction to approximately 5 seconds. A query matching thousands of documents cannot fetch, decode, and return them all within a single transaction. Batching splits the work across multiple short-lived transactions.
* **Memory efficiency.** Returning the entire result set at once would require buffering all matching documents in memory. Batching caps memory usage at the batch size.
## Cursor Lifecycle
[Section titled “Cursor Lifecycle”](#cursor-lifecycle)
A cursor goes through three phases: creation, advancing, and closing.
**Creation** happens automatically when you run `BUCKET.QUERY`, `BUCKET.DELETE`, or `BUCKET.UPDATE`. The response includes a `cursor_id` and the first batch of results. The cursor stores the query filter, sort configuration, and current position.
**Advancing** fetches subsequent batches. Pass the operation type and cursor ID to `BUCKET.ADVANCE`. When the `entries` (or `object_ids`) array comes back empty consistently, the result set is exhausted (see [Partial and Empty Batches](#partial-and-empty-batches)). There is no time limit between `BUCKET.ADVANCE` calls: a cursor remains valid as long as the session is open and the cursor has not been closed. You can fetch the first batch now, wait an hour, and call `BUCKET.ADVANCE` to pick up where you left off.
**Closing** releases the cursor. Always call `BUCKET.CLOSE` when you are done paginating.
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{}' LIMIT 2
1# "cursor_id" => (integer) 0
2# "entries" =>
1) {"_id": "69ce80c76597b10d87d134ff", "category": "books", "price": 19.99, "name": "The Disconnected"}
2) {"_id": "69ce80c76597b10d87d13500", "category": "electronics", "price": 499.99, "name": "Wireless Headphones"}
```
Fetch the next batch:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 0
1# "cursor_id" => (integer) 0
2# "entries" =>
1) {"_id": "69ce80c76597b10d87d13501", "category": "electronics", "price": 79.99, "name": "USB-C Hub"}
```
No more results:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 0
1# "cursor_id" => (integer) 0
2# "entries" => (empty array)
```
Close the cursor:
```kronotop
127.0.0.1:5484> BUCKET.CLOSE QUERY 0
OK
```
A closed cursor cannot be advanced. See [BUCKET.ADVANCE](/docs/bucket/commands/bucket-advance/) and [BUCKET.CLOSE](/docs/bucket/commands/bucket-close/) for command details.
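The lifecycle above can be wrapped in a small client-side helper. The following Python sketch is illustrative only: `send` is a hypothetical callable standing in for whatever RESP client you use, assumed to return each reply as a dict with `cursor_id` and `entries` keys, and `max_empty` is an assumed threshold for how many consecutive empty batches count as "consistently empty". Neither is part of Kronotop.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: takes the command words and returns the reply as a
# dict with "cursor_id" and "entries". Adapt it to your RESP client.
Send = Callable[..., Dict[str, Any]]


def iter_query(send: Send, bucket: str, query: str, limit: int,
               max_empty: int = 3) -> Iterator[Any]:
    """Yield every entry of a query, then close the cursor.

    max_empty is an assumption of this sketch: the number of consecutive
    empty batches after which the caller stops advancing.
    """
    reply = send("BUCKET.QUERY", bucket, query, "LIMIT", limit)
    cursor_id = reply["cursor_id"]
    empty_streak = 0
    try:
        while True:
            entries = reply["entries"]
            if entries:
                empty_streak = 0
                yield from entries
            else:
                empty_streak += 1
                if empty_streak >= max_empty:
                    return
            reply = send("BUCKET.ADVANCE", "QUERY", cursor_id)
    finally:
        # Runs on normal completion and when the consumer stops early.
        send("BUCKET.CLOSE", "QUERY", cursor_id)
```

Putting `BUCKET.CLOSE` in a `finally` block means the cursor is released even if the caller abandons the iteration partway through.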
## Batch Size
[Section titled “Batch Size”](#batch-size)
`LIMIT` controls how many documents (or object IDs) are returned per batch. When omitted, the session’s default limit applies.
The default limit is 100. You can change it per session:
```kronotop
127.0.0.1:5484> SESSION.ATTRIBUTE SET limit 50
OK
```
All subsequent queries in this session use 50 as the default batch size unless overridden by an explicit `LIMIT` parameter.
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{}' LIMIT 10
```
This query returns at most 10 documents per batch, regardless of the session default.
See [BUCKET.QUERY](/docs/bucket/commands/bucket-query/) for the full parameter reference.
## Checkpointing
[Section titled “Checkpointing”](#checkpointing)
After each batch, the cursor records the exact position where it stopped. Each `BUCKET.ADVANCE` call resumes from that position. No documents are skipped or duplicated between batches.
## Partial and Empty Batches
[Section titled “Partial and Empty Batches”](#partial-and-empty-batches)
A batch may contain fewer results than `LIMIT` requested, or even zero results. This does **not** mean the result set is exhausted. The cursor’s position is still valid, and the next `BUCKET.ADVANCE` call resumes from where the previous batch stopped.
A highly selective filter against a large dataset is the most common cause: many documents are examined but few match, so the engine caps the work it performs per call and returns what it has found so far.
**When is the result set truly exhausted?** Keep calling `BUCKET.ADVANCE` until you receive empty batches consistently. A selective filter may produce several empty batches before finding the next group of matches. The result set is exhausted when there are no more documents left to scan, not after a single empty batch.
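To see why an empty batch can appear before the scan is finished, here is a toy Python model of a cursor that caps the work done per call. It is not Kronotop's implementation: the in-memory list, the `scan_budget` parameter, and the `exhausted` property are all assumptions made for illustration (a real client cannot inspect the cursor this way).

```python
class ScanCursor:
    """Toy cursor over an in-memory list; a model, not Kronotop's engine."""

    def __init__(self, docs, predicate, limit, scan_budget):
        self.docs = docs
        self.predicate = predicate
        self.limit = limit
        self.scan_budget = scan_budget  # hypothetical cap on documents examined per call
        self.position = 0               # checkpoint: index of the next document to examine

    @property
    def exhausted(self):
        return self.position >= len(self.docs)

    def next_batch(self):
        batch, examined = [], 0
        while (not self.exhausted
               and len(batch) < self.limit
               and examined < self.scan_budget):
            doc = self.docs[self.position]
            self.position += 1
            examined += 1
            if self.predicate(doc):
                batch.append(doc)
        return batch


# A selective filter: only 0 and 9 match out of ten documents.
cursor = ScanCursor(list(range(10)), lambda d: d in (0, 9), limit=2, scan_budget=4)
batches = []
while not cursor.exhausted:
    batches.append(cursor.next_batch())
print(batches)  # [[0], [], [9]]
```

The second call examines four documents, finds no match, and returns an empty batch even though a match is still ahead. The checkpoint (`position`) is what lets the next call resume without skipping or repeating documents.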
## Cursors with UPDATE and DELETE
[Section titled “Cursors with UPDATE and DELETE”](#cursors-with-update-and-delete)
Cursors are not limited to reads. `BUCKET.UPDATE` and `BUCKET.DELETE` use the same streaming model. The only difference is in the response shape: they return `object_ids` (the IDs of affected documents) instead of `entries` (full documents).
```kronotop
127.0.0.1:5484> BUCKET.DELETE users '{"status": "inactive"}' LIMIT 2
1# "cursor_id" => (integer) 3
2# "object_ids" =>
1) "69ce80c76597b10d87d13510"
2) "69ce80c76597b10d87d13511"
```
Advance to delete the next batch:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE DELETE 3
1# "cursor_id" => (integer) 3
2# "object_ids" => (empty array)
```
Each `BUCKET.ADVANCE DELETE` call deletes the next batch of matching documents and returns their IDs. The same applies to `BUCKET.ADVANCE UPDATE`.
See [BUCKET.DELETE](/docs/bucket/commands/bucket-delete/) and [BUCKET.UPDATE](/docs/bucket/commands/bucket-update/) for command details.
## Sorted Cursors
[Section titled “Sorted Cursors”](#sorted-cursors)
When `SORTBY` is used, the cursor iterates through a sort-field index in the requested direction. The checkpoint tracks position in that index, so **global ordering is guaranteed across all batches**. The combined result of all `BUCKET.ADVANCE` calls forms a single, consistently sorted sequence.
```kronotop
BUCKET.QUERY events '{}' SORTBY created_at DESC LIMIT 5
```
Each batch returns the next 5 events in descending `created_at` order. No event appears out of order, even across batch boundaries.
`SORTBY` is supported on `BUCKET.QUERY` and `BUCKET.UPDATE`, but not on `BUCKET.DELETE`.
See [SORTBY](/docs/bucket/sortby/) for sorting details, compound index support, and pagination examples.
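The ordering guarantee follows from checkpointing on the index entry itself. The Python sketch below models this with a sorted list of `(sort_value, doc_id)` pairs; the class name, the tuple layout, and tie-breaking on the document ID are assumptions of the sketch, not a description of Kronotop's index format.

```python
import bisect


class SortedCursor:
    """Toy model of a cursor that walks a sort-field index in one direction."""

    def __init__(self, index, limit, descending=False):
        self.index = sorted(index)   # entries are (sort_value, doc_id) tuples
        self.limit = limit
        self.descending = descending
        self.checkpoint = None       # last index entry returned so far

    def next_batch(self):
        if self.descending:
            # Everything strictly below the checkpoint is still unread.
            hi = (len(self.index) if self.checkpoint is None
                  else bisect.bisect_left(self.index, self.checkpoint))
            lo = max(0, hi - self.limit)
            batch = self.index[lo:hi][::-1]
        else:
            # Everything strictly above the checkpoint is still unread.
            lo = (0 if self.checkpoint is None
                  else bisect.bisect_right(self.index, self.checkpoint))
            batch = self.index[lo:lo + self.limit]
        if batch:
            self.checkpoint = batch[-1]
        return batch


events = [(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")]
cursor = SortedCursor(events, limit=2, descending=True)
combined = []
while True:
    batch = cursor.next_batch()
    if not batch:
        break
    combined.extend(batch)
print(combined == sorted(events, reverse=True))  # True
```

Because each batch starts strictly after the previous checkpoint in iteration order, concatenating the batches yields the same sequence as sorting the whole result set once.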
## Multiple Cursors
[Section titled “Multiple Cursors”](#multiple-cursors)
A session can have multiple cursors active at the same time. Each cursor has its own ID, operation type, filter, and position. Cursors are independent – advancing or closing one does not affect others.
```kronotop
127.0.0.1:5484> BUCKET.QUERY products '{"category": "books"}' LIMIT 5
1# "cursor_id" => (integer) 0
2# "entries" => ...
127.0.0.1:5484> BUCKET.DELETE products '{"category": "discontinued"}' LIMIT 10
1# "cursor_id" => (integer) 1
2# "object_ids" => ...
```
Use `BUCKET.CURSORS` to list all active cursors in the session:
```kronotop
127.0.0.1:5484> BUCKET.CURSORS
1# "QUERY" =>
1# 0 => "{"category": "books"}"
2# "UPDATE" => (empty map)
3# "DELETE" =>
1# 1 => "{"category": "discontinued"}"
```
You can also filter by operation type:
```kronotop
127.0.0.1:5484> BUCKET.CURSORS QUERY
1# "QUERY" =>
1# 0 => "{"category": "books"}"
```
See [BUCKET.CURSORS](/docs/bucket/commands/bucket-cursors/) for the full response format.
## Session Binding
[Section titled “Session Binding”](#session-binding)
Cursors are bound to the session that created them. A cursor cannot be accessed from a different session. When a session disconnects, all its cursors are released automatically.
## Best Practices
[Section titled “Best Practices”](#best-practices)
* **Always close cursors.** Open cursors hold state in the session. Close them with `BUCKET.CLOSE` as soon as you are done paginating.
* **Handle empty batches.** An empty batch does not always mean the result set is exhausted. Keep calling `BUCKET.ADVANCE` until empty batches come back consistently. See [Partial and Empty Batches](#partial-and-empty-batches).
* **Choose an appropriate LIMIT.** Smaller batches use less memory per transaction. Larger batches reduce round trips. The default (100) is a reasonable starting point for most workloads.
* **Use `BUCKET.CURSORS` for debugging.** List active cursors to verify none are leaked.
* **Use `SORTBY` when ordering matters.** Without `SORTBY`, document order across batches depends on the index the engine selected and is not guaranteed to be meaningful.
# Dot Notation
> Dot notation is the path syntax used to address fields inside documents.
## Introduction
[Section titled “Introduction”](#introduction)
Dot notation is the path syntax used to address fields inside documents. Queries, indexes, and filters all use dot notation to identify which field a predicate or index applies to. A dot-separated string like `"address.city"` tells the system to navigate into the `address` object and select the `city` field.
## Syntax
[Section titled “Syntax”](#syntax)
A selector is one or more field names separated by dots:
```plaintext
"field"
"parent.child"
"parent.child.grandchild"
```
Each segment between dots is resolved left to right against the current value. The starting point is always the root of the document.
## Traversal Rules
[Section titled “Traversal Rules”](#traversal-rules)
### Top-level fields
[Section titled “Top-level fields”](#top-level-fields)
A single segment selects a root-level field.
```json
{"name": "Alice", "age": 30}
```
| Selector | Result |
| -------- | --------- |
| `name` | `"Alice"` |
| `age` | `30` |
### Nested documents
[Section titled “Nested documents”](#nested-documents)
When a segment resolves to a document, the next segment selects a field inside that document.
```json
{
"address": {
"city": "Istanbul",
"zip": "34000"
}
}
```
| Selector | Result |
| -------------- | -------------------------------------- |
| `address` | `{"city": "Istanbul", "zip": "34000"}` |
| `address.city` | `"Istanbul"` |
| `address.zip` | `"34000"` |
This works at any depth:
```json
{
"config": {
"database": {
"host": "localhost"
}
}
}
```
| Selector | Result |
| ---------------------- | ------------- |
| `config.database.host` | `"localhost"` |
### Array numeric indexing
[Section titled “Array numeric indexing”](#array-numeric-indexing)
When a segment resolves to an array and the next segment is a number, it selects the element at that zero-based index.
```json
{
"scores": [95, 87, 92]
}
```
| Selector | Result |
| ---------- | ------ |
| `scores.0` | `95` |
| `scores.1` | `87` |
| `scores.2` | `92` |
| `scores.5` | *null* |
### Array field collection
[Section titled “Array field collection”](#array-field-collection)
When a segment resolves to an array and the next segment is a non-numeric field name, the system iterates through every element of the array. For each element that is a document, it looks up the field inside that document. The collected values are returned as an array.
```json
{
"orders": [
{"total": 120, "status": "shipped"},
{"total": 45, "status": "pending"}
]
}
```
| Selector | Result |
| --------------- | ------------------------ |
| `orders.total` | `[120, 45]` |
| `orders.status` | `["shipped", "pending"]` |
Non-document elements in the array are skipped. If no elements match, the result is *null*.
### Multi-level array flattening
[Section titled “Multi-level array flattening”](#multi-level-array-flattening)
When field collection passes through multiple levels of arrays, collected values are flattened into a single array.
```json
{
"departments": [
{"teams": [{"name": "Alpha"}, {"name": "Beta"}]},
{"teams": [{"name": "Gamma"}]}
]
}
```
| Selector | Result |
| ------------------------ | ---------------------------- |
| `departments.teams.name` | `["Alpha", "Beta", "Gamma"]` |
The result is a flat list, not nested arrays.
### Mixed paths
[Section titled “Mixed paths”](#mixed-paths)
These rules compose. A single selector can mix document traversal, numeric array indexing, and array field collection.
```json
{
"users": [
{
"name": "Alice",
"address": {"city": "Istanbul"}
},
{
"name": "Bob",
"address": {"city": "Ankara"}
}
]
}
```
| Selector | Result |
| ---------------------- | ---------------------------------------------------- |
| `users.0` | `{"name": "Alice", "address": {"city": "Istanbul"}}` |
| `users.0.name` | `"Alice"` |
| `users.0.address.city` | `"Istanbul"` |
| `users.name` | `["Alice", "Bob"]` |
| `users.address.city` | `["Istanbul", "Ankara"]` |
## Missing Paths
[Section titled “Missing Paths”](#missing-paths)
When a selector does not match any value, the result is *null*. This covers missing fields, out-of-bounds array indexes, and segments that try to traverse into a primitive. No error is raised.
| Scenario | Example selector | Result |
| --------------------------- | ----------------- | ------ |
| Field does not exist | `email` | *null* |
| Nested field does not exist | `address.country` | *null* |
| Array index out of bounds | `scores.99` | *null* |
| Traversal into a primitive | `name.first` | *null* |
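The traversal rules can be summarized in a short Python sketch that reproduces the tables above. It is a model written for this page, not Kronotop's resolver. One point the rules leave open is what happens when a collected field is itself a stored array; this sketch assumes such a value is kept as one item and only nested collections are flattened.

```python
from typing import Any, List


class _Collected(list):
    """Marks a list produced by array field collection, not a stored array."""


def _walk(value: Any, segments: List[str]) -> Any:
    if not segments:
        return value
    head, rest = segments[0], segments[1:]
    if isinstance(value, dict):
        return _walk(value[head], rest) if head in value else None
    if isinstance(value, list):
        if head.isdecimal():                      # array numeric indexing
            index = int(head)
            return _walk(value[index], rest) if index < len(value) else None
        collected = _Collected()                  # array field collection
        for element in value:
            if not isinstance(element, dict):
                continue                          # non-document elements are skipped
            found = _walk(element, segments)
            if found is None:
                continue
            if isinstance(found, _Collected):
                collected.extend(found)           # flatten nested collections
            else:
                collected.append(found)
        return collected or None
    return None                                   # traversal into a primitive


def resolve(document: dict, selector: str) -> Any:
    """Resolve a dot-notation selector; None stands in for null."""
    result = _walk(document, selector.split("."))
    return list(result) if isinstance(result, _Collected) else result


doc = {
    "users": [
        {"name": "Alice", "address": {"city": "Istanbul"}},
        {"name": "Bob", "address": {"city": "Ankara"}},
    ]
}
print(resolve(doc, "users.0.address.city"))  # Istanbul
print(resolve(doc, "users.address.city"))    # ['Istanbul', 'Ankara']
print(resolve(doc, "users.5.name"))          # None
```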
# $elemMatch
> $elemMatch tests whether at least one element in an array field satisfies all specified conditions.
## Introduction
[Section titled “Introduction”](#introduction)
`$elemMatch` tests whether at least one element in an array field satisfies all specified conditions. It works with two kinds of arrays: scalar arrays that hold primitive values like numbers or strings, and document arrays that hold objects with named fields.
Without `$elemMatch`, separate conditions on an array field can match different elements. `$elemMatch` guarantees that a single element satisfies every condition.
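The difference between the two behaviors can be stated in a few lines of Python. This is a conceptual sketch of the semantics only; the function names are made up for illustration and say nothing about how Kronotop evaluates filters internally.

```python
def separate_conditions(array, conditions):
    """Each condition may be satisfied by a different element."""
    return all(any(cond(x) for x in array) for cond in conditions)


def elem_match(array, conditions):
    """A single element must satisfy every condition."""
    return any(all(cond(x) for cond in conditions) for x in array)


readings = [78, 82, 95]
conditions = [lambda r: r > 90, lambda r: r < 80]

print(separate_conditions(readings, conditions))  # True: 95 > 90 and 78 < 80
print(elem_match(readings, conditions))           # False: no single reading is both
```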
## Scalar arrays
[Section titled “Scalar arrays”](#scalar-arrays)
This section uses a `sensors` bucket. Each document has a `readings` field containing an array of integers.
```kronotop
BUCKET.CREATE sensors INDEXES '{"readings": {"bson_type": "int32", "multi_key": true}}'
```
```kronotop
BUCKET.INSERT sensors DOCS '{"sensor_id": "temp-01", "readings": [22, 45, 52]}'
BUCKET.INSERT sensors DOCS '{"sensor_id": "temp-02", "readings": [78, 82, 95]}'
BUCKET.INSERT sensors DOCS '{"sensor_id": "temp-03", "readings": [60, 65, 70]}'
```
**Single condition.** Find sensors with at least one reading above 80:
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$gt": 80}}}'
```
Matches temp-02 (82 and 95 are above 80).
**Range.** Find sensors with at least one reading between 50 and 70 (inclusive):
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$gte": 50, "$lte": 70}}}'
```
Matches temp-01 (52) and temp-03 (60, 65, 70).
**Exact match.** Find sensors with a reading equal to 45:
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$eq": 45}}}'
```
Matches temp-01.
**Exclusion.** Find sensors with at least one reading that is not 78:
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$ne": 78}}}'
```
Matches all three. Each sensor has at least one reading different from 78.
**Set membership.** Find sensors with a reading of 22, 60, or 95:
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$in": [22, 60, 95]}}}'
```
Matches temp-01 (22), temp-02 (95), and temp-03 (60).
**Alternatives.** Find sensors with a reading above 90 or below 25:
```kronotop
BUCKET.QUERY sensors '{"readings": {"$elemMatch": {"$or": [{"$gt": 90}, {"$lt": 25}]}}}'
```
Matches temp-01 (22 is below 25) and temp-02 (95 is above 90).
## Document arrays
[Section titled “Document arrays”](#document-arrays)
This section uses an `orders` bucket. Each document has an `items` array where each element is an object with product details.
```kronotop
BUCKET.CREATE orders INDEXES '{"items.category": {"bson_type": "string", "multi_key": true}}'
```
```kronotop
BUCKET.INSERT orders DOCS '{
"customer": "Alice",
"items": [
{"product": "Headphones", "category": "electronics", "price": 79.99, "status": "shipped", "tags": ["sale", "new"]},
{"product": "Novel", "category": "books", "price": 14.99, "status": "delivered", "tags": ["bestseller"]}
]
}'
BUCKET.INSERT orders DOCS '{
"customer": "Bob",
"items": [
{"product": "Laptop", "category": "electronics", "price": 999.99, "status": "processing", "tags": ["sale", "new", "featured"]},
{"product": "Mouse", "category": "electronics", "price": 29.99, "status": "cancelled", "tags": ["sale"]}
]
}'
BUCKET.INSERT orders DOCS '{
"customer": "Carol",
"items": [
{"product": "Desk", "category": "furniture", "price": 249.99, "status": "shipped", "tags": ["new"], "discount": 15}
]
}'
```
**Single field condition.** Find orders with at least one item priced above 100:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"price": {"$gt": 100}}}}'
```
Matches Bob (Laptop, 999.99) and Carol (Desk, 249.99).
**Multiple conditions.** Find orders with at least one electronics item above 100. Both conditions must be satisfied by the same element:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"category": "electronics", "price": {"$gt": 100}}}}'
```
Matches only Bob. Alice has an electronics item (Headphones) but it costs 79.99. Carol’s expensive item is furniture, not electronics.
**Exclusion.** Find orders with at least one item whose status is not “cancelled”:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"status": {"$ne": "cancelled"}}}}'
```
Matches all three. Each order has at least one non-cancelled item.
**Set membership.** Find orders with at least one item that is “shipped” or “delivered”:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"status": {"$in": ["shipped", "delivered"]}}}}'
```
Matches Alice (shipped Headphones, delivered Novel) and Carol (shipped Desk).
**Negative set.** Find orders with at least one item not in \[“cancelled”, “processing”]:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"status": {"$nin": ["cancelled", "processing"]}}}}'
```
Matches Alice (shipped, delivered) and Carol (shipped).
**Field existence.** Find orders with at least one item that has a “discount” field:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"discount": {"$exists": true}}}}'
```
Matches only Carol (Desk has discount: 15).
**Alternatives.** Find orders with at least one item that is either electronics priced above 200 or furniture that is not cancelled:
```kronotop
BUCKET.QUERY orders '{
"items": {
"$elemMatch": {
"$or": [
{"$and": [{"category": "electronics"}, {"price": {"$gt": 200}}]},
{"$and": [{"category": "furniture"}, {"status": {"$ne": "cancelled"}}]}
]
}
}
}'
```
Matches Bob (Laptop is electronics above 200) and Carol (Desk is furniture, not cancelled).
**Inner array checks with $all.** Find orders with at least one item tagged both “sale” and “new”:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"tags": {"$all": ["sale", "new"]}}}}'
```
Matches Alice (Headphones has both) and Bob (Laptop has both).
**Inner array checks with $size.** Find orders with at least one item that has exactly 3 tags:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"tags": {"$size": 3}}}}'
```
Matches only Bob (Laptop has \[“sale”, “new”, “featured”]).
**Combining $all and $size.** Find orders with at least one item tagged both “sale” and “new” with exactly 2 tags:
```kronotop
BUCKET.QUERY orders '{"items": {"$elemMatch": {"tags": {"$all": ["sale", "new"], "$size": 2}}}}'
```
Matches only Alice (Headphones has \[“sale”, “new”], exactly 2 tags). Bob’s Laptop has those tags but 3 tags total.
## Nested fields
[Section titled “Nested fields”](#nested-fields)
Use dot notation inside `$elemMatch` to reach fields in embedded documents within each array element.
Match documents where at least one element of `orders` has `details.amount` of 150 or more:
```json
{ "orders": { "$elemMatch": { "details.amount": { "$gte": 150 } } } }
```
Multiple nested paths can appear in the same `$elemMatch`. The same element must satisfy all of them:
```json
{
"entries": {
"$elemMatch": {
"meta.count": { "$gt": 70 },
"meta.total": { "$gt": 1000 }
}
}
}
```
## Nested $elemMatch
[Section titled “Nested $elemMatch”](#nested-elemmatch)
`$elemMatch` can be nested inside another `$elemMatch` to query arrays within arrays.
**Two levels.** Find stores with at least one department that has at least one product priced above 400:
```json
{
"departments": {
"$elemMatch": {
"products": {
"$elemMatch": {
"price": { "$gt": 400 }
}
}
}
}
}
```
**Three levels.** Find organizations with a division that has a unit containing a group with a score above 90:
```json
{
"divisions": {
"$elemMatch": {
"units": {
"$elemMatch": {
"groups": {
"$elemMatch": {
"score": { "$gt": 90 }
}
}
}
}
}
}
}
```
**Nested with multiple conditions.** Find companies with a team that has at least one member who is an engineer at level 4 or above:
```json
{
"teams": {
"$elemMatch": {
"members": {
"$elemMatch": {
"role": "engineer",
"level": { "$gte": 4 }
}
}
}
}
}
```
## Updating matched array elements
[Section titled “Updating matched array elements”](#updating-matched-array-elements)
`BUCKET.UPDATE` uses `$elemMatch` in the query filter to locate array elements. Three pieces work together:
* **`$elemMatch`** in the query selects which documents match and identifies the first array element that satisfies all conditions.
* **`$`** (the positional operator) in the update path stands in for the index of that matched element.
* **`$set`** or **`$unset`** in the update expression describes what to change.
### The positional `$` operator
[Section titled “The positional $ operator”](#the-positional--operator)
The `$` placeholder appears in the update path where you would normally write a numeric index. It resolves to the position of the first element that satisfied the `$elemMatch` filter. If multiple elements match, only the first one is affected.
The general form:
```kronotop
BUCKET.UPDATE <bucket> '<filter>' '{"$set": {"<array>.$.<field>": <value>}}'
```
For scalar arrays (no nested field), the path is simply `<array>.$`:
```kronotop
BUCKET.UPDATE <bucket> '<filter>' '{"$set": {"<array>.$": <value>}}'
```
### $set
[Section titled “$set”](#set)
`$set` changes the value of a field on the matched element, or replaces the element itself in scalar arrays.
**Document array.** Using the `orders` bucket from above, change the status of the first item that is priced at 100 or above and is currently “processing”:
```kronotop
BUCKET.UPDATE orders '{"items": {"$elemMatch": {"price": {"$gte": 100}, "status": "processing"}}}' '{"$set": {"items.$.status": "shipped"}}'
```
Bob’s Laptop (price 999.99, status “processing”) satisfies both conditions. Its status changes to “shipped”. The Mouse does not match, so it stays unchanged.
**Scalar number array.** Using the `sensors` bucket, replace the first reading that is 85 or above with 100:
```kronotop
BUCKET.UPDATE sensors '{"readings": {"$elemMatch": {"$gte": 85}}}' '{"$set": {"readings.$": 100}}'
```
For temp-02 (readings: \[78, 82, 95]), 95 is the first element satisfying the condition. The result is \[78, 82, 100].
**Scalar string array.** Replace the first tag equal to “featured” with “promoted”:
```kronotop
BUCKET.UPDATE products '{"tags": {"$elemMatch": {"$eq": "featured"}}}' '{"$set": {"tags.$": "promoted"}}'
```
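As a mental model, a positional `$set` finds the index of the first matching element and writes at that index. The Python sketch below mimics that on plain dicts and lists. The helper names are hypothetical, and it models only the behavior described above (first match wins, other elements untouched), not Kronotop's update engine.

```python
import copy


def first_match_index(array, matches):
    """Return the index of the first element satisfying the predicate, or None."""
    for index, element in enumerate(array):
        if matches(element):
            return index
    return None


def set_positional(document, array_field, matches, field, value):
    """Model of a positional $set such as "items.$.status" (field="status")
    or "readings.$" (field=None, which replaces the element itself)."""
    updated = copy.deepcopy(document)
    array = updated.get(array_field)
    if not isinstance(array, list):
        return updated
    index = first_match_index(array, matches)
    if index is None:
        return updated
    if field is None:
        array[index] = value
    else:
        array[index][field] = value
    return updated


sensor = {"sensor_id": "temp-02", "readings": [78, 82, 95]}
print(set_positional(sensor, "readings", lambda r: r >= 85, None, 100)["readings"])
# [78, 82, 100]
```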
### $unset
[Section titled “$unset”](#unset)
`$unset` removes a field from the matched element. The `$` in the path identifies which element to modify, and the field name after it specifies what to remove.
Remove the `discount` field from the first cancelled item:
```kronotop
BUCKET.UPDATE orders '{"items": {"$elemMatch": {"status": "cancelled"}}}' '{"$unset": ["items.$.discount"]}'
```
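Bob’s Mouse (status “cancelled”) is the first cancelled item, so the update removes `discount` from that element. The sample data above does not define `discount` on the Mouse, so assume the field was added beforehand. Other items in the same document are not affected.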
### Nested $elemMatch with `$`
[Section titled “Nested $elemMatch with $”](#nested-elemmatch-with-)
When `$elemMatch` is nested, the `$` operator refers to the position in the outermost array.
Given a document with an `orders` array where each order has an `items` array:
```kronotop
BUCKET.UPDATE customers '{"orders": {"$elemMatch": {"items": {"$elemMatch": {"name": "Widget", "qty": {"$gte": 5}}}}}}' '{"$set": {"orders.$.shipped": true}}'
```
The outer `$elemMatch` finds the first order that contains a Widget with qty >= 5. The `$` targets that order and sets `shipped: true` on it. Other orders are not touched.
## Indexes
[Section titled “Indexes”](#indexes)
### Multi-key indexes
[Section titled “Multi-key indexes”](#multi-key-indexes)
A multi-key index on the array field allows the query engine to use the index for candidate retrieval. The `$elemMatch` condition is then applied as a filter to verify that a single element satisfies all conditions.
Scalar array index:
```kronotop
BUCKET.INDEX CREATE sensors '{"readings": {"bson_type": "int32", "multi_key": true}}'
```
Document array index on a field inside each element:
```kronotop
BUCKET.INDEX CREATE teams '{"members.role": {"bson_type": "string", "multi_key": true}}'
```
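The two-step evaluation described in this section (index for candidates, then `$elemMatch` as a verifying filter) can be illustrated with a toy in-memory index. Everything in this Python sketch is an assumption made for illustration: the dict-of-sets index, the helper names, and the documents keyed by customer name. It only shows why the second step is needed.

```python
from collections import defaultdict


def build_multi_key_index(docs, array_field, element_field):
    """Map each element_field value found in any array element to the ids
    of the documents that contain it (one index entry per element)."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for element in doc.get(array_field, []):
            if isinstance(element, dict) and element_field in element:
                index[element[element_field]].add(doc_id)
    return index


def electronics_above(docs, index, category, min_price):
    candidates = index.get(category, set())       # step 1: index lookup
    return sorted(
        doc_id for doc_id in candidates
        if any(item.get("category") == category
               and "price" in item and item["price"] > min_price
               for item in docs[doc_id]["items"])  # step 2: same-element check
    )


orders = {
    "Alice": {"items": [{"category": "electronics", "price": 79.99},
                        {"category": "books", "price": 14.99}]},
    "Bob": {"items": [{"category": "electronics", "price": 999.99},
                      {"category": "electronics", "price": 29.99}]},
    "Carol": {"items": [{"category": "furniture", "price": 249.99}]},
}
index = build_multi_key_index(orders, "items", "category")
print(sorted(index["electronics"]))                          # ['Alice', 'Bob']
print(electronics_above(orders, index, "electronics", 100))  # ['Bob']
```

The index narrows the scan to Alice and Bob, but only the filter step can confirm that one and the same element is both electronics and priced above 100.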
### Combining with other indexed fields
[Section titled “Combining with other indexed fields”](#combining-with-other-indexed-fields)
When `$elemMatch` appears alongside conditions on other indexed fields, the query engine uses the most selective index for the initial scan and applies `$elemMatch` as a filter.
For example, if `category` has a single-field index:
```kronotop
BUCKET.QUERY orders '{"category": "electronics", "items": {"$elemMatch": {"price": {"$gt": 100}}}}'
```
The engine scans the `category` index first, then filters results with `$elemMatch`. Use `BUCKET.EXPLAIN` to verify.
### Index-accelerated operators
[Section titled “Index-accelerated operators”](#index-accelerated-operators)
When a multi-key index exists on the array field, the following operators inside `$elemMatch` can use the index:
| Operator | Index behavior |
| ---------------------------- | ---------------------- |
| `$eq` | Point lookup |
| `$gt`, `$gte`, `$lt`, `$lte` | Range scan |
| `$in` | Multiple point lookups |
The remaining operators (`$all`, `$size`, `$exists`, `$not`, `$ne`, `$nin`) are not index-accelerated. When these are the only conditions, the query falls back to a full scan with the `$elemMatch` filter applied.
## Edge cases
[Section titled “Edge cases”](#edge-cases)
| Condition | Result |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| Field is missing from the document | `$elemMatch` evaluates to false |
| Field is `null` | `$elemMatch` evaluates to false |
| Field is not an array (string, number, etc.) | `$elemMatch` evaluates to false |
| Field is an empty array `[]` | `$elemMatch` evaluates to false |
| Array contains `null` elements | `null` elements are evaluated normally. `$eq: null` matches them. Comparison operators like `$gt` skip them. |
| Single-element array | Normal evaluation. The single element must satisfy all conditions. |
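The edge cases above can be read as a small evaluation routine. The following is a minimal Python sketch for scalar arrays only, not Kronotop's implementation. The names `compare` and `elem_match` are hypothetical, and only `$eq` and the four ordering operators are modeled.

```python
def compare(op, element, operand):
    """Evaluate one comparison operator against a single array element."""
    if op == "$eq":
        # `$eq: null` matches null elements, so equality is checked first.
        return element == operand
    if element is None:
        # Ordering comparisons such as `$gt` skip null elements.
        return False
    if op == "$gt":
        return element > operand
    if op == "$gte":
        return element >= operand
    if op == "$lt":
        return element < operand
    if op == "$lte":
        return element <= operand
    raise ValueError(f"operator not covered by this sketch: {op}")


def elem_match(doc, field, conditions):
    """Return True when one element of doc[field] satisfies every condition."""
    value = doc.get(field)
    if not isinstance(value, list):
        # Missing, null, and non-array fields all evaluate to false.
        return False
    # any() over an empty list is False, which covers the `[]` case.
    return any(
        all(compare(op, element, operand) for op, operand in conditions.items())
        for element in value
    )
```

Note that `[22, 95]` does not match `{"$gt": 50, "$lt": 70}`: each condition is satisfied by some element, but no single element satisfies both.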
## Supported operators
Quick reference for all operators supported inside `$elemMatch`:
| Operator | Applies to | Example inside `$elemMatch` |
| ------------ | ---------------------- | -------------------------------------------------- |
| `$eq` | scalar, document field | `{ "$eq": 80 }` |
| `$ne` | scalar, document field | `{ "$ne": "cancelled" }` |
| `$gt` | scalar, document field | `{ "$gt": 100 }` |
| `$gte` | scalar, document field | `{ "$gte": 50 }` |
| `$lt` | scalar, document field | `{ "$lt": 90 }` |
| `$lte` | scalar, document field | `{ "$lte": 70 }` |
| `$in` | scalar, document field | `{ "$in": [22, 60, 95] }` |
| `$nin` | scalar, document field | `{ "$nin": ["cancelled", "refunded"] }` |
| `$all` | document field (array) | `{ "tags": { "$all": ["sale", "new"] } }` |
| `$size` | document field (array) | `{ "tags": { "$size": 3 } }` |
| `$exists` | document field | `{ "discount": { "$exists": true } }` |
| `$and` | scalar, document | `{ "$and": [{ "price": { "$gt": 10 } }, ...] }` |
| `$or` | scalar, document | `{ "$or": [{ "$gt": 90 }, { "$lt": 25 }] }` |
| `$not` | scalar, document field | `{ "status": { "$not": { "$eq": "cancelled" } } }` |
| `$elemMatch` | document field (array) | Nested `$elemMatch` for arrays within arrays |
# Plan Cache
> The plan cache stores compiled execution plans so that structurally identical queries skip the planning pipeline (parse → logical plan → physical plan → optimize) and reuse a previously compiled plan.
The plan cache stores compiled execution plans so that structurally identical queries skip the planning pipeline (parse → logical plan → physical plan → optimize) and reuse a previously compiled plan. Queries that differ only in literal values share one plan: the first is planned and cached, and the rest skip planning entirely.
## How It Works
### Query Shapes
A **query shape** is the structural fingerprint of a query: the operators, field paths, value types, and their nesting, but **not** the literal values. Two queries have the same shape when they use the same operators on the same fields with the same value types in the same structure.
**Same shape.** These two queries share one cached plan:
```plaintext
{"age": {"$gt": 25}}
{"age": {"$gt": 40}}
```
Both use `$gt` on the field `age` with an integer value.
**Different shapes.** These produce different cache entries:
```plaintext
{"age": {"$gt": 25}}
{"age": {"$eq": 25}}
```
The operator changed from `$gt` to `$eq`, so the shape is different.
**Different shape.** Value type matters:
```plaintext
{"age": {"$gt": 25}}
{"age": {"$gt": 25.0}}
```
The first uses an integer, the second uses a double. Different value types produce different shapes.
**Same shape with compound filters:**
```plaintext
{"$and": [{"age": {"$gt": 25}}, {"status": {"$eq": "active"}}]}
{"$and": [{"age": {"$gt": 40}}, {"status": {"$eq": "inactive"}}]}
```
Same operators, same fields, same value types, same shape.
**Different shape.** Array operator element count and element types matter:
```plaintext
{"status": {"$in": ["a", "b"]}}
{"status": {"$in": ["a", "b", "c"]}}
```
Both use `$in` on the field `status`, but the first has two elements and the second has three. The element count is part of the shape, so these produce different cache entries. The same rule applies to `$nin` and `$all`.
**Order independence.** Field ordering within `$and`/`$or` does not affect the shape:
```plaintext
{"$and": [{"age": {"$gt": 25}}, {"status": {"$eq": "active"}}]}
{"$and": [{"status": {"$eq": "active"}}, {"age": {"$gt": 25}}]}
```
These two queries produce the same shape hash because children are sorted before hashing.
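The shape rules above can be illustrated with a small canonicalization routine. This is a sketch under stated assumptions, not Kronotop's actual algorithm: `type_tag` and `shape` are hypothetical names, the output string format is invented for illustration, and a bare literal such as `{"status": "active"}` is assumed to behave like `$eq`.

```python
import json

LOGICAL_OPS = {"$and", "$or"}
ARRAY_OPS = {"$in", "$nin", "$all"}


def type_tag(value):
    """Map a literal to a type name. bool is checked before int on purpose."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    raise TypeError(f"literal not covered by this sketch: {value!r}")


def shape(query):
    """Describe a filter's structure (operators, fields, value types) without literals."""
    parts = []
    for key, value in query.items():
        if key in LOGICAL_OPS:
            # Children are sorted so that sibling order does not change the shape.
            children = sorted(shape(child) for child in value)
            parts.append(f"{key}[{','.join(children)}]")
            continue
        # Assumption: a bare literal is shorthand for $eq.
        condition = value if isinstance(value, dict) else {"$eq": value}
        for op, operand in condition.items():
            if op in ARRAY_OPS:
                # Element count and element types are part of the shape.
                tags = ",".join(type_tag(item) for item in operand)
                parts.append(f"{key}:{op}:[{tags}]")
            else:
                parts.append(f"{key}:{op}:{type_tag(operand)}")
    return "&".join(parts)
```

Parsing the queries with `json.loads` keeps `25` and `25.0` distinct (int versus float), which is what makes the value-type rule observable in this sketch.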
### Parameterized Execution
Cached plans are templates with parameter slots, conceptually similar to SQL prepared statements. When a query is planned for the first time, its literal values are extracted into a parameter list and the compiled plan is stored in the cache. When a subsequent query with the same shape arrives:
1. The parameter values are extracted from the new query in the same deterministic order.
2. The cached plan is retrieved.
3. Each parameter slot in the plan is bound to the corresponding value from the new query.
This means the full planning pipeline runs only once per shape. Subsequent executions skip straight to parameter binding and plan execution.
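The three steps can be sketched as follows. This is an illustrative Python model for flat filters only. `split_literals`, `compile_plan`, `bind`, and `prepare` are hypothetical names, and `compile_plan` is a stand-in for the real planning pipeline.

```python
def split_literals(query):
    """Separate a flat filter into a literal-free template and its parameter list."""
    template, params = [], []
    for field, condition in query.items():
        for op, literal in condition.items():
            template.append((field, op, type(literal).__name__))
            params.append(literal)
    return tuple(template), params


def compile_plan(template):
    """Stand-in for the planning pipeline: one parameter slot per template entry."""
    return [
        {"field": field, "op": op, "slot": slot}
        for slot, (field, op, _type_name) in enumerate(template)
    ]


def bind(plan, params):
    """Fill each parameter slot with the literal taken from the current query."""
    return [
        {"field": node["field"], "op": node["op"], "value": params[node["slot"]]}
        for node in plan
    ]


cache = {}  # template -> compiled plan


def prepare(query):
    """Return (bound plan, is_cached). Planning runs only on a cache miss."""
    template, params = split_literals(query)
    is_cached = template in cache
    if not is_cached:
        cache[template] = compile_plan(template)
    return bind(cache[template], params), is_cached
```

Calling `prepare` with `{"age": {"$gt": 25}}` and then `{"age": {"$gt": 40}}` compiles once; the second call only binds `40` into the existing slot.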
### Parameter Ordering
Parameters are extracted in **canonical order**. AND/OR children are sorted by their shape hash, with insertion order preserved for siblings that have identical shapes. Physical plan nodes are walked in the same canonical order. Each node’s operand is mapped to a parameter index so that the binding is deterministic regardless of how the optimizer rearranges the plan internally.
For range scans (e.g., `$gt` + `$lt` on the same field), the lower and upper bounds are tracked as separate occurrences within the same node binding.
## Cache Key
Each cached plan is keyed by:
| Component | Description |
| ---------- | ------------------------------------- |
| Namespace | The active namespace |
| Bucket ID | UUID of the bucket |
| Shape hash | FNV-1a 64-bit hash of the query shape |
The shape hash incorporates the `SORTBY` field and the collation setting when present. Two queries that differ only in their `SORTBY` field or collation produce different cache entries because these affect index selection and comparison behavior.
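For reference, FNV-1a 64-bit is a simple byte-wise hash. The sketch below implements the standard algorithm with its published constants. How Kronotop serializes the shape, `SORTBY` field, and collation before hashing is not specified here, so the `cache_key` helper and its NUL-separated encoding are assumptions made for illustration.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Standard FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


def cache_key(namespace, bucket_id, shape, sort_by=None, collation=None):
    """Hypothetical key layout: (namespace, bucket ID, shape hash)."""
    # Assumption: SORTBY and collation are folded into the hashed material.
    material = "\x00".join([shape, sort_by or "", collation or ""])
    return (namespace, bucket_id, fnv1a_64(material.encode("utf-8")))
```

With this layout, the same filter with and without a `SORTBY` field hashes different material and therefore lands in different cache entries.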
## Eviction and TTL
| Setting | Default | Description |
| --------------------------- | ------- | ----------------------------------------- |
| Max entries per bucket | 200 | FIFO eviction, oldest entry removed first |
| `bucket.plan_cache.max_ttl` | 300000 | TTL in milliseconds (5 minutes) |
* Each bucket independently holds up to 200 cached plans. When the limit is exceeded, the oldest entry is evicted.
* TTL is checked lazily on each cache lookup. Expired plans are not returned and will be replaced on the next cache write for that shape.
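The eviction and TTL rules can be modeled in a few lines. This is a minimal sketch, not the actual cache: the `PlanCache` class is hypothetical, time is passed in explicitly to keep the example deterministic, and treating an entry as expired when its age reaches the TTL exactly is an assumption.

```python
from collections import OrderedDict


class PlanCache:
    """Per-bucket cache with FIFO eviction and lazily checked TTL."""

    def __init__(self, max_entries=200, max_ttl_ms=300_000):
        self.max_entries = max_entries
        self.max_ttl_ms = max_ttl_ms
        self._entries = OrderedDict()  # shape hash -> (plan, created_at_ms)

    def get(self, shape_hash, now_ms):
        entry = self._entries.get(shape_hash)
        if entry is None:
            return None
        plan, created_at_ms = entry
        if now_ms - created_at_ms >= self.max_ttl_ms:
            # Lazy TTL: expiry is only noticed when the entry is looked up.
            return None
        return plan

    def put(self, shape_hash, plan, now_ms):
        # Replacing an entry (for example an expired one) makes it the newest.
        self._entries.pop(shape_hash, None)
        self._entries[shape_hash] = (plan, now_ms)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # FIFO: drop the oldest entry
```

Because eviction is FIFO rather than LRU, `get` never reorders entries; a frequently used plan is still evicted once enough newer shapes have been cached.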
## Configuration
The plan cache is controlled by two settings in `reference.conf`:
```hocon
bucket {
  plan_cache {
    enabled: true
    max_ttl: 300000 // milliseconds
  }
}
```
| Key | Type | Default | Description |
| --------------------------- | ------- | -------- | ---------------------------------- |
| `bucket.plan_cache.enabled` | boolean | `true` | Enable or disable the plan cache |
| `bucket.plan_cache.max_ttl` | int | `300000` | Time-to-live for cached plans (ms) |
## Automatic Invalidation
The cache is automatically invalidated in response to metadata changes:
| Event | Scope |
| ------------------------ | -------------------------------------------------- |
| Index created or dropped | All plans for the affected bucket |
| Index statistics updated | All plans for the affected bucket |
| Bucket removed | All plans for the removed bucket |
| Namespace removed | All plans under the removed namespace |
| Namespace moved | All plans under the old namespace path (by prefix) |
## Observability
Use `BUCKET.EXPLAIN` to inspect the execution plan for a query. The response includes an `is_cached` boolean that indicates whether the plan was served from the cache.
First execution. Plan is compiled and cached:
```kronotop
> BUCKET.EXPLAIN users '{"status": "active"}'
is_cached -> (boolean) false
plan -> planner_version -> (integer) 1
nodeType -> "IndexScan"
...
```
Subsequent execution with the same shape. Plan is served from cache:
```kronotop
> BUCKET.EXPLAIN users '{"status": "inactive"}'
is_cached -> (boolean) true
plan -> planner_version -> (integer) 1
nodeType -> "IndexScan"
...
```
The plan structure is identical in both cases; only the bound parameter values differ.
# Projection
> Projection controls which fields appear in documents returned by BUCKET.QUERY and BUCKET.VECTOR.
Projection controls which fields appear in documents returned by `BUCKET.QUERY` and `BUCKET.VECTOR`. The `PROJECTION` parameter accepts a JSON specification that selects fields to include or exclude.
Projection is applied after query execution. It does not affect which documents match a filter or how vector similarity is ranked. It only shapes what each returned document contains.
```kronotop
BUCKET.QUERY users '{"status": "active"}' PROJECTION '{"name": 1, "email": 1}'
```
## Projection Spec
A projection spec is a JSON object where keys are field names and values are `1` (include) or `0` (exclude).
### Inclusion Mode
When the spec contains fields set to `1`, only those fields are returned. The `_id` field is included by default.
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"name": 1, "age": 1}'
```
Input document:
```json
{"_id": {"$oid": "6835a1c0e4b0f72a3c000001"}, "name": "Alice", "age": 30, "email": "alice@example.com"}
```
Returned:
```json
{"_id": {"$oid": "6835a1c0e4b0f72a3c000001"}, "name": "Alice", "age": 30}
```
### Exclusion Mode
When the spec contains fields set to `0`, all fields are returned except those specified.
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"email": 0}'
```
Input document:
```json
{"_id": {"$oid": "6835a1c0e4b0f72a3c000001"}, "name": "Alice", "age": 30, "email": "alice@example.com"}
```
Returned:
```json
{"_id": {"$oid": "6835a1c0e4b0f72a3c000001"}, "name": "Alice", "age": 30}
```
### Mixing Rules
Inclusion and exclusion cannot be mixed in the same spec. The only exception is `_id: 0`, which can be combined with inclusion fields.
| Spec | Valid | Mode |
| ------------------------- | ----- | --------- |
| `{"name": 1, "age": 1}` | Yes | Inclusion |
| `{"email": 0}` | Yes | Exclusion |
| `{"_id": 0, "name": 1}` | Yes | Inclusion |
| `{"_id": 0}` | Yes | Exclusion |
| `{"name": 1, "email": 0}` | No | Error |
| `{}` | Yes | No-op |
### Empty Spec
An empty spec `{}` returns all fields unchanged, equivalent to no projection.
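The mode and mixing rules can be summarized in code. The sketch below is a simplified Python model for top-level fields only. `projection_mode` and `project` are hypothetical names, and the error type is an assumption rather than the error Kronotop returns.

```python
def projection_mode(spec):
    """Classify a projection spec as 'no-op', 'inclusion', or 'exclusion'."""
    if not spec:
        return "no-op"
    for value in spec.values():
        if value not in (0, 1):
            raise ValueError("projection values must be 0 or 1")
    # `_id: 0` is the one exclusion that may accompany inclusion fields.
    flags = {v for field, v in spec.items() if not (field == "_id" and v == 0)}
    if flags == {0, 1}:
        raise ValueError("inclusion and exclusion cannot be mixed")
    return "inclusion" if flags == {1} else "exclusion"


def project(doc, spec):
    """Apply a top-level projection spec to a document."""
    mode = projection_mode(spec)
    if mode == "no-op":
        return dict(doc)
    if mode == "inclusion":
        keep_id = spec.get("_id", 1) == 1  # `_id` is included by default
        return {
            key: value
            for key, value in doc.items()
            if spec.get(key) == 1 or (key == "_id" and keep_id)
        }
    return {key: value for key, value in doc.items() if spec.get(key) != 0}
```

The classification mirrors the table: `{"_id": 0}` alone leaves no other flags, so it falls through to exclusion mode.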
## The `_id` Field
The `_id` field is included by default in inclusion mode. To exclude it, set `"_id": 0` explicitly.
**Inclusion without `_id` exclusion.** `_id` is included:
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"name": 1}'
```
```json
{"_id": {"$oid": "6835a1c0e4b0f72a3c000001"}, "name": "Alice"}
```
**Inclusion with `_id: 0`.** `_id` is excluded:
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"_id": 0, "name": 1}'
```
```json
{"name": "Alice"}
```
**`{"_id": 0}` alone.** Exclusion mode; returns all fields except `_id`:
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"_id": 0}'
```
```json
{"name": "Alice", "age": 30, "email": "alice@example.com"}
```
## Nested Fields
Projection specs support [dot notation](/docs/bucket/dot-notation/) for nested fields. The parent document structure is preserved.
### Inclusion
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"address.city": 1}'
```
Input document:
```json
{"_id": {"$oid": "..."}, "name": "Alice", "address": {"city": "Istanbul", "zip": "34000"}}
```
Returned:
```json
{"_id": {"$oid": "..."}, "address": {"city": "Istanbul"}}
```
The `address` object is preserved but only contains the `city` field. Sibling fields (`name`, `address.zip`) are omitted.
### Exclusion
```kronotop
BUCKET.QUERY users '{}' PROJECTION '{"address.zip": 0}'
```
Returned:
```json
{"_id": {"$oid": "..."}, "name": "Alice", "address": {"city": "Istanbul"}}
```
Only the `zip` field is removed. All other fields are preserved.
### Arrays
When a dot-notation path crosses an array of documents, the projection applies to each element in the array.
```kronotop
BUCKET.QUERY shop '{}' PROJECTION '{"orders.total": 1}'
```
Input document:
```json
{"_id": {"$oid": "..."}, "orders": [{"total": 120, "status": "shipped"}, {"total": 45, "status": "pending"}]}
```
Returned:
```json
{"_id": {"$oid": "..."}, "orders": [{"total": 120}, {"total": 45}]}
```
This works through multiple levels of nesting: `"orders.items.name": 1` extracts `name` from each item in each order.
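Dot-notation inclusion through nested documents and arrays can be sketched as a short recursive walk. This is an illustrative Python model for a single inclusion path. `include_path` and `project_path` are hypothetical names, and dropping array elements that lack the requested field is an assumption.

```python
MISSING = object()  # sentinel: the path does not exist under this value


def include_path(value, parts):
    """Keep only the sub-path `parts` inside value, descending through arrays."""
    if isinstance(value, list):
        # Apply the same path to every element of the array.
        projected = [include_path(item, parts) for item in value]
        return [item for item in projected if item is not MISSING]
    if not isinstance(value, dict) or parts[0] not in value:
        return MISSING
    head, rest = parts[0], parts[1:]
    if not rest:
        return {head: value[head]}
    child = include_path(value[head], rest)
    return MISSING if child is MISSING else {head: child}


def project_path(doc, path):
    """Inclusion projection for one dotted path; `_id` is kept by default."""
    result = {"_id": doc["_id"]} if "_id" in doc else {}
    projected = include_path(doc, path.split("."))
    if projected is not MISSING:
        result.update(projected)
    return result
```

The parent structure is rebuilt on the way back up the recursion, which is why `address` and `orders` survive as containers around the selected fields.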
## Positional Operator (`$`)
The `$` operator returns the first array element that matched the query condition. It is used with inclusion mode.
```kronotop
BUCKET.QUERY students '{"grades": {"$gte": 85}}' PROJECTION '{"grades.$": 1}'
```
Input document:
```json
{"_id": {"$oid": "..."}, "name": "Alice", "grades": [70, 87, 90]}
```
Returned:
```json
{"_id": {"$oid": "..."}, "grades": [87]}
```
The query matched elements `87` and `90` (both `>= 85`), but `$` returns only the first match.
### Nested Paths
The `$` operator works with nested paths:
```kronotop
BUCKET.QUERY students '{"user.grades": {"$gte": 85}}' PROJECTION '{"user.grades.$": 1}'
```
### Rules
* `$` must appear at the **end** of the field path (e.g., `"grades.$": 1`, not `"grades.$.value": 1`).
* Only **one** `$` operator is allowed per projection spec.
* The query should reference the array field so that `$` can identify the matched element. Otherwise, the fallback behavior described below applies.
* `$` can be combined with other inclusion fields in the same spec.
### Fallback Behavior
When the query does not reference the array field used with `$`, the operator defaults to the first element (index 0).
```kronotop
BUCKET.QUERY students '{"name": "Alice"}' PROJECTION '{"grades.$": 1}'
```
The query filters on `name`, not on `grades`. The `$` operator returns `grades[0]`.
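The first-match and fallback behavior can be modeled together. The sketch below is a simplified Python illustration for scalar arrays and top-level fields. `positional` is a hypothetical name, only a handful of comparison operators are covered, and a bare literal in the query is assumed to mean `$eq`.

```python
import operator

OPS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def positional(doc, query, field):
    """Return the projected value of `field.$` as a list of at most one element."""
    array = doc.get(field, [])
    condition = query.get(field)
    if condition is None:
        # Fallback: the query does not reference the array, so use index 0.
        return array[:1]
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # assumption: bare literal means $eq
    for element in array:
        if all(OPS[op](element, operand) for op, operand in condition.items()):
            return [element]  # only the first match is returned
    return []
```

For `grades: [70, 87, 90]`, a query on `grades` with `$gte: 85` selects `87`, while a query that filters only on `name` selects `70`.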
## Usage
| Command | Syntax |
| --------------- | ----------------------------------------------------------------- |
| `BUCKET.QUERY`  | `BUCKET.QUERY <bucket> <query> PROJECTION <spec>`                 |
| `BUCKET.VECTOR` | `BUCKET.VECTOR <bucket> <field> <vector> ... PROJECTION <spec>`   |
### BUCKET.QUERY
```kronotop
BUCKET.QUERY users '{"status": "active"}' PROJECTION '{"name": 1, "email": 1}' LIMIT 10
```
### BUCKET.VECTOR
Projection is useful for excluding large embedding arrays from vector search results:
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' PROJECTION '{"embedding": 0}'
```
Or returning only specific fields:
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' PROJECTION '{"label": 1}'
```
When a `FILTER` is provided, the positional `$` operator uses the filter expression to identify matched elements.
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' FILTER '{"tags": "ml"}' PROJECTION '{"tags.$": 1}'
```
# Selectivity Estimation
> When a query has multiple indexes available, the query engine must decide which one to use.
When a query has multiple indexes available, the query engine must decide which one to use. The engine predicts how many documents each candidate index would return and picks the one with the smallest estimate, the most selective index.
Without statistics, all candidate indexes look equally good. The engine picks one, but it may not be the best. After you run `BUCKET.INDEX ANALYZE`, the engine uses collected statistics to make an informed choice.
## How It Works
Selectivity estimation is a three-stage process: sampling, analysis, and estimation.
### Sampling
During normal index operations (inserts, updates, deletes), the engine automatically samples a small fraction of indexed values. Roughly 1 in every 16,000 values is selected and recorded as a sampling hint. No user action is required.
The sampled hints serve as representative pivot points across the index’s value space and guide the analysis stage.
### Analysis
When you run `BUCKET.INDEX ANALYZE`, a background task reads the collected sampling hints and uses them as pivot points for distributed sampling across the index. Around each pivot, the engine reads a small neighborhood of index entries. It also samples from the edges (the smallest and largest values in the index).
The collected samples are then partitioned into a histogram, an approximate summary of the value distribution. The histogram divides the value space into at most 10 ranges (histogram buckets, not to be confused with Kronotop buckets) of approximately equal size. Each range tracks the minimum value, maximum value, and approximate entry count within that range.
The analysis task also records the index’s cardinality: the total number of indexed entries.
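The histogram construction can be sketched as follows. This is a minimal Python model under stated assumptions, not the analysis task itself: "approximately equal size" is interpreted as an equal number of samples per range, per-range counts are scaled up from the sample to the index cardinality, and `build_histogram` is a hypothetical name.

```python
def build_histogram(samples, cardinality, max_ranges=10):
    """Partition samples into at most max_ranges ranges of near-equal sample count."""
    ordered = sorted(samples)
    if not ordered:
        return []
    n_ranges = min(max_ranges, len(ordered))
    scale = cardinality / len(ordered)  # index entries represented by one sample
    histogram = []
    for i in range(n_ranges):
        start = i * len(ordered) // n_ranges
        end = (i + 1) * len(ordered) // n_ranges
        chunk = ordered[start:end]
        histogram.append(
            {
                "min": chunk[0],
                "max": chunk[-1],
                "count": round(len(chunk) * scale),  # approximate entry count
            }
        )
    return histogram
```

With 100 samples drawn from an index of 50,000 entries, this produces 10 ranges that each stand for roughly 5,000 entries.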
### Estimation
At query time, when multiple indexes can satisfy a filter, the engine uses the histogram to estimate how many documents each index would return:
* **Equality filters** (`$eq`): The estimate assumes values are uniformly distributed within the matching histogram range.
* **One-sided range filters** (`$gt`, `$gte`, `$lt`, `$lte`): The engine locates the filter value in the histogram and estimates the fraction of entries that fall within the range.
* **Bounded range filters** (`$gt` + `$lt` on the same field): The engine estimates the fraction of entries between the two bounds.
The index with the lowest estimated result count is selected as the primary scan. Remaining filters become residual predicates applied after retrieval.
When no statistics are available for an index, the engine assigns it the worst possible estimate, effectively deprioritizing it in favor of indexes that have been analyzed.
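The three filter kinds can be estimated with one routine. The sketch below is an illustrative Python model, not the engine's estimator. It assumes integer-valued keys, treats all bounds as inclusive (so `$gt` and `$gte` are not distinguished), and assumes a uniform spread inside each range. `estimate_range` and `estimate_eq` are hypothetical names.

```python
def estimate_range(histogram, low=None, high=None):
    """Estimate entries with low <= value <= high; None means unbounded."""
    total = 0.0
    for r in histogram:
        lo = r["min"] if low is None else max(low, r["min"])
        hi = r["max"] if high is None else min(high, r["max"])
        if lo > hi:
            continue  # the filter does not overlap this range
        # Fraction of the range's integer values covered by the filter.
        covered = (hi - lo + 1) / (r["max"] - r["min"] + 1)
        total += r["count"] * covered
    return total


def estimate_eq(histogram, value):
    """Equality is a range whose lower and upper bounds coincide."""
    return estimate_range(histogram, value, value)
```

For a range covering `0..9` with 1,000 entries, an equality filter on `5` is estimated at about 100 entries, one tenth of the range.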
## Running Analysis
Trigger analysis with `BUCKET.INDEX ANALYZE`:
```kronotop
127.0.0.1:5484> BUCKET.INDEX ANALYZE users "selector:age.bsonType:INT32"
OK
```
The index must be in `READY` status. Analysis runs as a background task. Check its progress with `BUCKET.INDEX TASKS`:
```kronotop
127.0.0.1:5484> BUCKET.INDEX TASKS users "selector:age.bsonType:INT32"
1) 1# "kind" => "ANALYZE"
   2# "status" => "COMPLETED"
```
After analysis completes, `BUCKET.INDEX DESCRIBE` shows the collected cardinality:
```kronotop
127.0.0.1:5484> BUCKET.INDEX DESCRIBE users "selector:age.bsonType:INT32"
1# "index_type" => "single_field"
2# "id" => (integer) 2
3# "selector" => "age"
4# "bson_type" => "INT32"
5# "status" => "READY"
6# "collation" =>
   1# "locale" => (nil)
   2# "strength" => (nil)
   3# "case_level" => (nil)
   4# "case_first" => (nil)
   5# "numeric_ordering" => (nil)
   6# "alternate" => (nil)
   7# "backwards" => (nil)
   8# "normalization" => (nil)
   9# "max_variable" => (nil)
7# "statistics" =>
   1# "cardinality" => (integer) 50000
```
Only one analysis task can run per index at a time. If an analysis task already exists, the command returns an error.
## What the Optimizer Does with Statistics
The optimizer uses selectivity information in two ways: choosing which index to scan and deciding the order in which conditions are evaluated.
### Index Selection
When a query’s filter matches multiple indexes, the engine estimates the result count for each candidate using the histogram and picks the most selective one. For example, given indexes on both `status` and `region`:
```kronotop
BUCKET.QUERY orders '{"status": {"$eq": "shipped"}, "region": {"$eq": "eu-west"}}'
```
If the `status` index estimates 5,000 matches and the `region` index estimates 800 matches, the engine uses the `region` index as the primary scan and applies `status = "shipped"` as a residual filter.
For compound indexes, the engine packs the equality prefix and any trailing range bound into a composite key and looks it up in the compound index’s histogram. This allows compound indexes to compete fairly against single-field indexes during selection.
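The selection rule reduces to picking the minimum estimate. The sketch below is a minimal Python illustration. `choose_index` is a hypothetical name, and representing "no statistics" as `None` mapped to infinity is an assumption that stands in for the worst possible estimate.

```python
def choose_index(estimates):
    """Pick the primary index from {index name: estimated matches or None}."""
    worst = float("inf")  # stand-in for "no statistics available"

    def cost(name):
        estimate = estimates[name]
        return worst if estimate is None else estimate

    primary = min(estimates, key=cost)
    # Filters on the remaining indexes become residual predicates.
    residual = [name for name in estimates if name != primary]
    return primary, residual
```

With `{"status": 5000, "region": 800}` the primary scan is `region` and `status` becomes the residual filter, matching the example above.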
### Condition Ordering
Independent of histogram statistics, the optimizer reorders conditions within `$and` and `$or` expressions to improve short-circuit evaluation:
* **`$and`**: Conditions are ordered from most selective to least selective. If the first condition eliminates most candidates, later conditions run against a smaller set.
* **`$or`**: Conditions are ordered from least selective to most selective. A broad condition that matches early avoids evaluating narrower conditions unnecessarily.
This ordering uses lightweight heuristics based on operator type and index availability:
| Factor | More selective | Less selective |
| ------------------ | -------------------------------------- | ------------------------------ |
| Operator | `$eq` (point lookup) | `$ne`, `$exists` (broad match) |
| Index availability | Indexed field (scan is bounded) | Unindexed field (full scan) |
| Scan type | Compound index scan (narrow key space) | Full scan (entire dataset) |
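The ordering heuristic can be sketched as a sort over ranked conditions. This is an illustrative Python model only: the rank numbers, the decision to weigh index availability before operator type, and the names `rank` and `order_conditions` are all assumptions, and the scan-type factor is not modeled.

```python
# Lower rank means "assumed more selective". The numbers are illustrative.
OPERATOR_RANK = {
    "$eq": 0,
    "$gt": 1, "$gte": 1, "$lt": 1, "$lte": 1,
    "$ne": 2, "$exists": 2,
}


def rank(condition, indexed_fields):
    """Rank a (field, operator) pair by index availability, then operator type."""
    field, op = condition
    index_rank = 0 if field in indexed_fields else 1
    return (index_rank, OPERATOR_RANK.get(op, 1))


def order_conditions(logical_op, conditions, indexed_fields):
    """$and: most selective first. $or: least selective first."""
    most_selective_first = sorted(conditions, key=lambda c: rank(c, indexed_fields))
    if logical_op == "$and":
        return most_selective_first
    return most_selective_first[::-1]
```

The same ranking drives both cases; `$or` simply reverses it so that a broad condition gets the first chance to match.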
## When to Analyze
Analysis is most valuable when:
* **Multiple indexes cover the same query.** Without statistics, the engine cannot distinguish between them. Analysis lets it pick the most selective one.
* **Data distribution is skewed.** If 90% of documents have `status = "active"` and 10% have `status = "archived"`, the histogram captures this skew, and the engine can avoid choosing `status` as the primary scan for `$eq: "active"` queries.
* **After significant data changes.** A large batch insert or delete can shift the value distribution. Re-running analysis updates the histogram to reflect the current state.
Analysis is unnecessary when:
* **Only one index matches the query.** The engine uses it regardless of statistics.
* **The dataset is small.** With few documents, the cost difference between indexes is negligible.
## Observability
Use `BUCKET.EXPLAIN` to see which index the engine selected and whether the plan was influenced by statistics:
```kronotop
127.0.0.1:5484> BUCKET.EXPLAIN orders '{
  "status": {"$eq": "shipped"},
  "region": {"$eq": "eu-west"}
}'
1# "is_cached" => (false)
2# "plan" =>
   1# "planner_version" => (integer) 1
   2# "nodeType" => "TransformWithResidualPredicate"
   3# "id" => (integer) 5
   4# "child" =>
      1# "nodeType" => "IndexScan"
      2# "index" => "selector:region.bsonType:STRING"
      3# "selector" => "region"
      ...
   5# "predicate" =>
      1# "selector" => "status"
      2# "operator" => "EQ"
      ...
```
The plan shows `region` was chosen as the primary index scan and `status` was pushed to a residual predicate. This indicates the optimizer estimated `region` to be more selective than `status` for this query.
Compare with `BUCKET.INDEX DESCRIBE` to verify that both indexes have been analyzed and have cardinality data.
## Plan Cache Interaction
When analysis completes and statistics are updated, the plan cache for the affected bucket is automatically invalidated. This ensures that subsequent queries are re-planned using the new statistics rather than reusing a cached plan that was compiled without them.
See [Plan Cache](/docs/bucket/plan-cache/) for details on cache invalidation triggers.
## Limitations
* **Approximate, not exact.** The histogram summarizes the value distribution with at most 10 ranges. Estimates are good enough for index selection but are not precise row counts.
* **No automatic refresh.** Analysis does not run automatically. After significant data changes, you should re-run `BUCKET.INDEX ANALYZE` to update the statistics.
* **Compound indexes supported.** Compound indexes are analyzed using composite keys that preserve field ordering. The histogram reflects the combined key distribution, not individual fields.
# Single-Field Indexes
> A single-field index covers one field in a document.
## Introduction
A single-field index covers one field in a document. It accelerates queries that filter on that field by allowing the query engine to scan the index instead of reading every document in the bucket. Single-field indexes are the most common index type and the right default choice when your queries filter on individual fields.
## Creation
Single-field indexes are defined as top-level keys in the index schema. Each key is a field selector, and the value specifies the index properties:
```json
{
  "username": {
    "bson_type": "string"
  }
}
```
With the command:
```kronotop
BUCKET.INDEX CREATE users '{"username": {"bson_type": "string"}}'
```
With a custom name:
```kronotop
BUCKET.INDEX CREATE users '{"username": {"bson_type": "string", "name": "idx_username"}}'
```
If `name` is omitted, a name is auto-generated from the selector and type. For example, `username` with type `string` produces `selector:username.bsonType:STRING`.
Multiple single-field indexes can be created in one command:
```kronotop
BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32"}, "email": {"bson_type": "string"}}'
```
### Nested fields
Use dot notation to index fields inside nested objects:
```kronotop
BUCKET.INDEX CREATE users '{"address.city": {"bson_type": "string"}}'
```
Given a document `{"address": {"city": "Istanbul", "zip": "34000"}}`, the selector `address.city` reaches `"Istanbul"`. See [Dot Notation](/docs/bucket/dot-notation/) for the full path syntax.
### Collation
String-typed indexes can specify a collation to control comparison rules:
```kronotop
BUCKET.INDEX CREATE users '{"username": {"bson_type": "string", "collation": {"locale": "tr"}}}'
```
When a query uses a collation, the query engine only selects an index if its collation is compatible. See [Collation](/docs/bucket/collation/) for details.
See [BUCKET.INDEX CREATE](/docs/bucket/commands/bucket-index/#bucketindex-create) for the full command reference.
## Supported operators
The following operators can use a single-field index:
| Operator | Behavior |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `$eq` | Equality lookup. Point scan on the index. |
| `$ne` | Not-equal scan. Not supported on multi-key indexes (falls back to full scan). |
| `$gt` | Range scan for values strictly greater than the operand. |
| `$gte` | Range scan for values greater than or equal to the operand. |
| `$lt` | Range scan for values strictly less than the operand. |
| `$lte` | Range scan for values less than or equal to the operand. |
| `$in` | Transformed into multiple equality scans combined with OR. |
| `$nin` | Transformed into multiple not-equal scans combined with AND. Not supported on multi-key. |
| `$elemMatch` | Matches documents where at least one array element satisfies the sub-filter. Uses the index when combined with a multi-key index on the array field. |
Range operators can be combined. For example, `{"age": {"$gte": 18, "$lt": 65}}` produces a bounded range scan on a single-field index.
The following operators are **not** index-accelerated and always trigger a full scan for that predicate:
| Operator | Why not indexed |
| --------- | ------------------------------------------------------------ |
| `$all` | Requires checking that all elements are present in an array. |
| `$size` | Requires counting array elements. |
| `$exists` | Requires checking field presence, not field value. |
## Multi-key indexes
When a field contains an array, set `multi_key: true` to index each array element separately:
```kronotop
BUCKET.INDEX CREATE products '{"tags": {"bson_type": "string", "multi_key": true}}'
```
For an array of objects, use dot notation with `multi_key`:
```kronotop
BUCKET.INDEX CREATE users '{"orders.total": {"bson_type": "int32", "multi_key": true}}'
```
Given a document `{"orders": [{"total": 120}, {"total": 45}]}`, the selector `orders.total` with `multi_key: true` indexes both `120` and `45` as separate entries. A query `{"orders.total": {"$gt": 100}}` matches this document because at least one element satisfies the condition.
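How a multi-key selector fans out into index entries can be sketched in Python. This is an illustrative model, not the indexing code. `index_entries` is a hypothetical name, Python types stand in for BSON types, and elements of the wrong type are skipped as described under strict type matching.

```python
def index_entries(doc, selector, expected_type):
    """Collect every value reached by selector, one index entry per array element."""
    values = [doc]
    for part in selector.split("."):
        next_values = []
        for value in values:
            # An array of documents fans out: follow the path in each element.
            candidates = value if isinstance(value, list) else [value]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_values.append(candidate[part])
        values = next_values
    entries = []
    for value in values:
        # A scalar array at the end of the path contributes one entry per element.
        entries.extend(value if isinstance(value, list) else [value])
    # Only values of the declared type are indexed (bool is not treated as int).
    return [entry for entry in entries if type(entry) is expected_type]
```

A document matches `{"orders.total": {"$gt": 100}}` when any of its entries satisfies the condition, which is why `[120, 45]` is enough for a match.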
### Limitations
* **Undefined ordering.** A document can have multiple entries in a multi-key index, so the order in which documents are returned is not guaranteed.
* **Larger index size.** Each array element creates a separate index entry, so multi-key indexes grow proportionally to array sizes.
* **Strict type matching.** Only array elements matching the declared BSON type are indexed. Elements of other types are skipped.
* **`$ne` and `$nin` not supported.** On multi-key indexes, these operators fall back to a full scan. The index finds “any element not equal to the value,” but the correct semantics require “no element equal to the value.” To avoid incorrect results, the query engine skips the index entirely.
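The difference between the two readings of `$ne` is easy to see in code. The following Python sketch contrasts them; both function names are hypothetical, and the second function only models what a naive scan over multi-key entries would conclude.

```python
def ne_matches(elements, value):
    """Correct `$ne` semantics for an array field: no element equals the value."""
    return all(element != value for element in elements)


def ne_via_index(elements, value):
    """Naive index answer: some indexed entry differs from the value."""
    return any(element != value for element in elements)
```

For `tags = ["sale", "new"]` and the value `"sale"`, the correct answer is no match, but the entry `"new"` would make an index-driven scan report a match.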
## Index maintenance on updates
When `BUCKET.UPDATE` modifies a document, every index whose selector overlaps a modified field path is brought in sync with the new document content. Overlap covers three cases:
* **The field itself.** Setting or unsetting `age` refreshes an index on `age`.
* **A parent path.** Replacing `tags` as a whole refreshes an index on `tags.name`, because the update rewrites everything under `tags`.
* **A nested path.** Setting `tags.0` refreshes an index on `tags`, because an element of the indexed array changed.
For example, with an index on `tags.name` and the document `{"tags": [{"name": "java"}, {"name": "kotlin"}]}`, the update `{"$set": {"tags": [{"name": "go"}]}}` removes the `java` and `kotlin` entries and indexes `go`.
Indexes on unrelated or sibling fields are left untouched. Setting `meta.name` does not modify an index on `meta.color`.
After an update, an index always reflects the current document content. The result is the same as if the updated document had been freshly inserted.
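The overlap rule amounts to a prefix check on dot-separated paths. The sketch below is a minimal Python illustration with a hypothetical `overlaps` function. It compares path components literally, so paths that put an array index in the middle (for example `tags.0.name` against a `tags.name` selector) are outside what it models.

```python
def overlaps(selector, modified_path):
    """True when one path equals the other or is a prefix of it, by component."""
    selector_parts = selector.split(".")
    modified_parts = modified_path.split(".")
    shared = min(len(selector_parts), len(modified_parts))
    return selector_parts[:shared] == modified_parts[:shared]
```

Comparing whole components rather than raw strings keeps sibling fields apart: `meta.name` and `meta.color` share the `meta` prefix but diverge at the second component, so they do not overlap.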
## The primary index
Every bucket has a built-in primary index on the `_id` field:
* **Name:** `primary-index`
* **Selector:** `_id`
* **Type:** `objectid`
* **Status:** Always `READY`
The primary index is created automatically when the bucket is created. It cannot be dropped.
The primary index is a regular single-field index and supports the same operators as any other single-field index: `$eq` for point lookups, `$gt`, `$gte`, `$lt`, `$lte` for range scans, and `$in` for multi-value lookups. It can also be used with `SORTBY _id ASC|DESC` to iterate documents in insertion order.
Like other single-field indexes, the primary index can be analyzed with `BUCKET.INDEX ANALYZE` to collect histogram statistics for selectivity estimation.
## Query engine behavior
When a query has a filter on a single field, the query engine checks for a matching single-field index. If one exists and is in `READY` status, the engine uses it.
When both a single-field index and a compound index cover the same field, the engine generally prefers the single-field index. The exception is when the query uses `SORTBY` and the filter operator is `$eq`. In that case, a compound index on the filter field and the sort field provides both filtering and sorted output, so the engine prefers it.
For queries with multiple filters on different fields, the query engine can use multiple single-field indexes independently. Each index produces a set of candidate documents, and the engine intersects the results.
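As a rough illustration of the intersection step, the sketch below treats each index scan as a set of document IDs. The function name and the sample IDs are hypothetical, and the real engine may intersect streams rather than materialized sets.

```python
def intersect_candidates(*candidate_sets: set[str]) -> set[str]:
    """Intersect the document-ID sets produced by independent index scans."""
    if not candidate_sets:
        return set()
    result = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        result &= candidates
    return result


# Hypothetical document IDs returned by two single-field index scans.
age_matches = {"doc1", "doc2", "doc4"}      # e.g. age >= 25
status_matches = {"doc2", "doc3", "doc4"}   # e.g. status == "active"
print(sorted(intersect_candidates(age_matches, status_matches)))  # ['doc2', 'doc4']
```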
## Constraints
* **Supported BSON types.** `string`, `int32`, `int64`, `double`, `boolean`, `datetime`, `timestamp`, `binary`, `objectid`. The `decimal128` type is not yet fully supported for indexing.
* **Unique names.** Index names must be unique across all indexes (single-field, compound, and vector) in the bucket.
* **Strict type matching.** Each index has a declared BSON type. Values that don’t match the type are rejected (with `strict_types = true`) or silently skipped (with `strict_types = false`). See [Strict Types](/docs/bucket/strict-types/).
* **One type per index.** A single-field index indexes values of exactly one BSON type. There is no mixed-type index.
## Practical example
The examples below use RESP3 protocol output. Switch to RESP3 with `HELLO 3` before running the commands.
Create a bucket with a single-field index on `age`:
```kronotop
127.0.0.1:5484> BUCKET.CREATE users INDEXES '{
"age": {"bson_type": "int32", "name": "idx_age"}
}'
OK
```
Insert some documents:
```kronotop
BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30}'
BUCKET.INSERT users DOCS '{"name": "Bob", "age": 25}'
BUCKET.INSERT users DOCS '{"name": "Charlie", "age": 40}'
BUCKET.INSERT users DOCS '{"name": "Diana", "age": 22}'
```
**Query: equality**, uses the index for a point lookup:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{"age": {"$eq": 30}}'
1# "cursor_id" => (integer) 16
2# "entries" => 1) {"_id": "69ce9b496597b10d87d13515", "name": "Alice", "age": 30}
```
**Query: range**, uses the index for a range scan:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{"age": {"$gte": 25, "$lt": 40}}'
1# "cursor_id" => (integer) 17
2# "entries" =>
1) {"_id": "69ce9b4a6597b10d87d13516", "name": "Bob", "age": 25}
2) {"_id": "69ce9b496597b10d87d13515", "name": "Alice", "age": 30}
```
Returns Bob (25) and Alice (30). Charlie (40) is excluded by `$lt: 40`, Diana (22) by `$gte: 25`.
**Explain output**, confirms the index scan:
```kronotop
127.0.0.1:5484> BUCKET.EXPLAIN users '{"age": {"$gte": 25, "$lt": 40}}'
1# "is_cached" => (true)
2# "plan" =>
1# "planner_version" => (integer) 1
2# "nodeType" => "RangeScan"
3# "id" => (integer) 7
4# "scanType" => "RANGE_SCAN"
5# "index" => "selector:age.bsonType:INT32"
6# "selector" => "age"
7# "lowerBound" => "Param[ref=ParamRef[index=0]]"
8# "upperBound" => "Param[ref=ParamRef[index=1]]"
9# "includeLower" => (true)
10# "includeUpper" => (false)
```
# Sorting
> SORTBY controls the ordering of documents returned by BUCKET.QUERY and BUCKET.UPDATE.
`SORTBY` controls the ordering of documents returned by `BUCKET.QUERY` and `BUCKET.UPDATE`. It accepts a field name and a direction (`ASC` or `DESC`). `BUCKET.DELETE` does not support `SORTBY`.
```kronotop
BUCKET.QUERY users '{}' SORTBY age ASC LIMIT 10
```
`SORTBY` requires an index on the sort field. The index provides natural ordering. FoundationDB stores index entries in sorted key order, so iterating through the index returns documents in the requested order without any in-memory sorting.
If no suitable index exists, the query is rejected at planning time with an actionable error message.
## How SORTBY Works
When the `SORTBY` field has a matching index and the optimizer selects it, the index itself provides the sort order:
* `ASC` reads the index forward.
* `DESC` reverses the FoundationDB scan direction.
* The cursor checkpoint tracks the position in the index, so each `BUCKET.ADVANCE` call picks up where the previous batch ended.
* **Global ordering is guaranteed.** Documents across all batches form a single, consistently sorted sequence.
Example: `age` is indexed.
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{}' SORTBY age ASC LIMIT 3
1# "cursor_id" => (integer) 2
2# "entries" =>
1) {"_id": "69ce80c76597b10d87d134ff", "age": 20}
2) {"_id": "69ce80c76597b10d87d13500", "age": 25}
3) {"_id": "69ce80c76597b10d87d13501", "age": 30}
```
Pagination via `BUCKET.ADVANCE`:
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 2
1# "cursor_id" => (integer) 2
2# "entries" =>
1) {"_id": "69ce80c76597b10d87d13502", "age": 35}
2) {"_id": "69ce80c76597b10d87d13503", "age": 40}
3) {"_id": "69ce80c76597b10d87d13504", "age": 45}
```
Every batch continues in strict ascending order.
## Collation
`SORTBY` ordering is determined by the collation used when the index was built. The index stores string values as ICU4J collation keys in FoundationDB, so the physical key order already encodes the locale-aware sort order.
The query-level `COLLATION` parameter does **not** affect `SORTBY` ordering. It applies to filter predicate evaluation, but it cannot change the order in which the index returns documents. That order is fixed at index creation time.
If the query specifies a collation that differs from the index’s collation, the query is rejected at planning time:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{}' SORTBY name ASC COLLATION '{
"locale": "fr"
}'
(error) ERR query cannot be executed: SORTBY 'name' cannot use the existing index because its collation does not match the query collation. Hint: use a collation that matches the index on 'name'
```
To sort strings under a specific locale, create the index with that collation and issue queries without a conflicting `COLLATION` override:
```kronotop
> BUCKET.INDEX CREATE users '{
"name": {
"bson_type": "string",
"collation": {"locale": "tr", "strength": 1}
}
}'
> BUCKET.QUERY users '{}' SORTBY name ASC
```
If you need a different collation for sorting than what the index provides, use `RESULTSORT` instead. It performs an in-memory sort on each batch and respects the full collation resolution order.
## Compound Index Support
`SORTBY` works with compound indexes. Range queries on the sort field itself are fine. The index naturally provides ordering within the scanned range. The only restriction is on **prefix fields**: all compound index fields **before** the sort field must use equality (`EQ`) filters. A range filter on a prefix field breaks the trailing-field ordering.
Example: compound index on `(status, age)`.
```kronotop
-- EQ on 'status', SORTBY on 'age' → works
BUCKET.QUERY users '{"status": "active"}' SORTBY age ASC LIMIT 10
-- Range on 'age', SORTBY on 'age' → works (sort field = range field)
BUCKET.QUERY users '{"age": {"$gt": 5, "$lt": 100}}' SORTBY age ASC LIMIT 10
-- Range on 'status', SORTBY on 'age' → rejected
BUCKET.QUERY users '{"status": {"$gt": "a"}}' SORTBY age ASC LIMIT 10
```
The last query is rejected because the range filter on `status` (a prefix field) breaks the ordering of `age` within the compound index.
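The prefix rule can be sketched as a small check over the compound index's field order. This models only the rule stated above (every field before the sort field must carry an equality filter); `sortby_allowed` and the `"EQ"` / `"RANGE"` labels are hypothetical and do not reflect the planner's actual data structures.

```python
def sortby_allowed(index_fields, filter_ops, sort_field):
    """Check the prefix rule for SORTBY on a compound index.

    index_fields: ordered field names of the compound index
    filter_ops:   mapping of field name -> "EQ" or "RANGE" for filtered fields
    sort_field:   the SORTBY field
    """
    if sort_field not in index_fields:
        return False
    position = index_fields.index(sort_field)
    # Every field before the sort field must carry an equality filter.
    return all(filter_ops.get(field) == "EQ" for field in index_fields[:position])


index = ["status", "age"]
print(sortby_allowed(index, {"status": "EQ"}, "age"))                  # True
print(sortby_allowed(index, {"status": "EQ", "age": "RANGE"}, "age"))  # True
print(sortby_allowed(index, {"status": "RANGE"}, "age"))               # False
```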
## Filter and Sort on Separate Single-Field Indexes
When the filter field and the sort field have separate single-field indexes, the query is rejected. The execution plan uses the filter-field index, which does not provide ordering on the sort field.
```kronotop
BUCKET.QUERY users '{"status": "active"}' SORTBY age ASC LIMIT 10
```
If `status` and `age` each have their own single-field index, this query fails. Create a compound index on `(status, age)` instead. The compound index provides both filtering on the prefix field and natural ordering on the sort field.
## Validation Errors
When `SORTBY` cannot be satisfied, the planner rejects the query with an actionable error message:
### No index on the sort field
The sort field has no index, so the engine cannot provide ordered results.
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{"status": "active"}' SORTBY score ASC
(error) ERR query cannot be executed: SORTBY 'score' requires an index that provides natural ordering. Hint: create an index on 'score'
```
### Compound index range prefix conflict
A compound index contains the sort field, but a range filter on a preceding field breaks the trailing-field ordering.
```kronotop
BUCKET.QUERY users '{"status": {"$gt": "a"}}' SORTBY age ASC
```
With a compound index on `(status, age)`, the range filter on `status` breaks the ordering of `age`, and the planner returns:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{"status": {"$gt": "a"}}' SORTBY age ASC
(error) ERR query cannot be executed: SORTBY 'age' cannot be executed because a range filter on 'status' breaks compound index ordering. Hint: use an equality filter on 'status' or create an index on 'age'
```
There are two ways to fix this: change the prefix filter to an equality filter, or create a dedicated single-field index on the sort field.
## Negation and Set Operators
`$ne`, `$nin`, `$in`, and `$nor` interact with `SORTBY` differently depending on whether the operator targets the same field as the sort field.
### $ne and $nin
`$ne` produces a single index scan that skips the excluded value. `$nin` becomes an AND of not-equal scans, which collapses to a single index scan with residual NE predicates. In both cases, the index scan iterates in order and filters out excluded values. Ordering is preserved.
```kronotop
-- 'age' is indexed
BUCKET.QUERY users '{"age": {"$ne": 25}}' SORTBY age ASC LIMIT 10
BUCKET.QUERY users '{"age": {"$nin": [20, 40]}}' SORTBY age ASC LIMIT 10
```
Both queries scan the `age` index in ascending order and skip excluded values. Global sort ordering is guaranteed.
`$ne` and `$nin` on a **different** field than the sort field are rejected:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{
"role": {"$nin": ["admin", "editor"]}
}' SORTBY age ASC
(error) ERR query cannot be executed: SORTBY 'age' cannot be used because the query's execution plan does not provide natural ordering on this field. Hint: use RESULTSORT 'age' for in-memory sorting
```
### $in
`$in` on the **same** indexed field as the sort field produces globally sorted results. The engine executes individual EQ scans sequentially in value order. Each scan returns entries for a single value, and scans are ordered by value. This avoids scanning the entire index.
```kronotop
-- 'age' is indexed
BUCKET.QUERY users '{"age": {"$in": [30, 10, 50]}}' SORTBY age ASC LIMIT 10
```
The engine executes EQ scans in order: first `age = 10`, then `age = 30`, then `age = 50`. Each scan is exhausted before the next begins. Pagination across value boundaries is handled correctly. A `BUCKET.ADVANCE` call may resume mid-value or cross into the next value.
`$in` on a **different** field than the sort field is rejected:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{
"name": {"$in": ["Alice", "Bob"]}
}' SORTBY age ASC
(error) ERR query cannot be executed: SORTBY 'age' cannot be used because the query's execution plan does not provide natural ordering on this field. Hint: use RESULTSORT 'age' for in-memory sorting
```
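For the same-field case, the sequential EQ scans can be sketched with a sorted list standing in for the index. This is an illustrative model only: `in_scan_sorted` is a hypothetical name, and the list of `(value, doc_id)` tuples is an assumed stand-in for FoundationDB index entries.

```python
from bisect import bisect_left, bisect_right


def in_scan_sorted(index_entries, values):
    """Yield (value, doc_id) pairs for an $in filter in ascending order.

    index_entries: list of (value, doc_id) tuples, sorted like an index
    values:        the $in operand list, in any order
    """
    keys = [value for value, _ in index_entries]
    for value in sorted(set(values)):
        # One EQ scan per value: locate the contiguous run of matching entries.
        lo, hi = bisect_left(keys, value), bisect_right(keys, value)
        yield from index_entries[lo:hi]


index = [(10, "a"), (20, "b"), (30, "c"), (30, "d"), (50, "e")]
print(list(in_scan_sorted(index, [30, 10, 50])))
# [(10, 'a'), (30, 'c'), (30, 'd'), (50, 'e')]
```

Because the operand values are visited in sorted order and each run is already ordered inside the index, the concatenated output is globally sorted without touching entries for other values.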
### $nor
`$nor` desugars to `$not($or(...))`. Negation always produces a full scan with a residual predicate. The full scan iterates by `_id`, not by the sort field. `SORTBY` is rejected:
```kronotop
127.0.0.1:5484> BUCKET.QUERY users '{
"$nor": [{"age": 10}, {"age": 30}]
}' SORTBY age ASC
(error) ERR query cannot be executed: SORTBY 'age' cannot be used because the query's execution plan does not provide natural ordering on this field. Hint: use RESULTSORT 'age' for in-memory sorting
```
Use `RESULTSORT` for in-memory per-batch sorting when `SORTBY` is not available.
## Pagination
The cursor checkpoint stores the position in the index. Each advance resumes from the exact key where the previous batch stopped. The combined result of all batches is identical to sorting the full result set.
```kronotop
-- 'created_at' is indexed
127.0.0.1:5484> BUCKET.QUERY events '{}' SORTBY created_at DESC LIMIT 5
1# "cursor_id" => (integer) 3
2# "entries" =>
1) {"_id": "69ce82626597b10d87d1350f", "created_at": {"$date": "2025-12-05T00:00:00Z"}}
2) {"_id": "69ce82626597b10d87d1350e", "created_at": {"$date": "2025-12-04T00:00:00Z"}}
3) {"_id": "69ce82626597b10d87d1350d", "created_at": {"$date": "2025-12-03T00:00:00Z"}}
4) {"_id": "69ce82626597b10d87d1350c", "created_at": {"$date": "2025-12-02T00:00:00Z"}}
5) {"_id": "69ce82626597b10d87d1350b", "created_at": {"$date": "2025-12-01T00:00:00Z"}}
```
```kronotop
127.0.0.1:5484> BUCKET.ADVANCE QUERY 3
1# "cursor_id" => (integer) 3
2# "entries" =>
1) {"_id": "69ce82626597b10d87d1350a", "created_at": {"$date": "2025-11-30T00:00:00Z"}}
2) {"_id": "69ce82626597b10d87d13509", "created_at": {"$date": "2025-11-29T00:00:00Z"}}
3) {"_id": "69ce82626597b10d87d13508", "created_at": {"$date": "2025-11-28T00:00:00Z"}}
4) {"_id": "69ce82626597b10d87d13507", "created_at": {"$date": "2025-11-27T00:00:00Z"}}
5) {"_id": "69ce82626597b10d87d13506", "created_at": {"$date": "2025-11-26T00:00:00Z"}}
```
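The checkpoint mechanism can be modeled with a toy cursor over a sorted list. This is a sketch under simplifying assumptions: the class name `IndexCursor` is hypothetical, and the checkpoint here is a list position rather than an index key.

```python
class IndexCursor:
    """Toy cursor over a sorted index; the checkpoint is the next position."""

    def __init__(self, index_entries, limit, descending=False):
        ordered = list(index_entries)
        self._entries = ordered[::-1] if descending else ordered
        self._limit = limit
        self._checkpoint = 0  # position where the next batch starts

    def advance(self):
        """Return the next batch and move the checkpoint past it."""
        batch = self._entries[self._checkpoint:self._checkpoint + self._limit]
        self._checkpoint += len(batch)
        return batch


ages = [20, 25, 30, 35, 40, 45]          # already in index order
cursor = IndexCursor(ages, limit=3)
first, second = cursor.advance(), cursor.advance()
print(first, second)                     # [20, 25, 30] [35, 40, 45]
print(first + second == sorted(ages))    # True
```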
## RESULTSORT
`RESULTSORT` is a separate parameter that provides in-memory per-batch sorting on any field, indexed or not. Unlike `SORTBY`, it does not require an index and does not guarantee global ordering across batches.
```kronotop
BUCKET.QUERY users '{}' RESULTSORT score ASC LIMIT 10
```
* Works on any field, no index required.
* Each batch is sorted independently in memory after documents are fetched.
* **Global ordering across `BUCKET.ADVANCE` calls is NOT guaranteed.** Batch N+1 may contain values that would sort before values in batch N.
* `RESULTSORT` is available for `BUCKET.QUERY` only, not for `BUCKET.UPDATE` or `BUCKET.DELETE`.
`SORTBY` and `RESULTSORT` serve different purposes: `SORTBY` provides globally sorted results through index ordering, while `RESULTSORT` provides per-batch ordering when an index is unavailable or unnecessary.
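The difference between per-batch and global ordering is easy to see in a small model. The sketch below sorts each batch independently; `resultsort_batches` and the sample scores are hypothetical, and the input order stands in for whatever order the underlying scan returns.

```python
def resultsort_batches(documents, field, batch_size):
    """Sort each batch independently, as a per-batch in-memory sort would."""
    batches = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        batches.append(sorted(batch, key=lambda doc: doc[field]))
    return batches


docs = [{"score": s} for s in (50, 10, 40, 30, 20, 60)]
batches = resultsort_batches(docs, "score", 3)
print([[d["score"] for d in b] for b in batches])
# [[10, 40, 50], [20, 30, 60]]
```

Each batch is sorted, but the second batch starts at 20, which would sort before 40 and 50 from the first batch.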
### Collation
When string values are compared during `RESULTSORT`, the effective collation is resolved in this order:
1. **Query-level collation**: set via the `COLLATION` parameter on the query.
2. **Index-level collation (single-field)**: the collation defined on a single-field index whose selector matches the `RESULTSORT` field.
3. **Index-level collation (compound)**: if all READY compound indexes containing the `RESULTSORT` field as a `string` field agree on the same collation, that collation is used. If any two disagree, this step is skipped.
4. **Bucket-level collation**: the default collation configured on the bucket.
5. **Binary comparison**: used when none of the above apply.
This matches the collation precedence used for filter predicate evaluation, so the same field behaves consistently whether it appears in a filter or in `RESULTSORT`.
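The resolution order reads naturally as a first-match chain. The sketch below is one hypothetical way to express it; the function name, parameter names, and the use of plain strings for collations are assumptions made for illustration.

```python
def resolve_collation(query=None, single_field_index=None,
                      compound_indexes=(), bucket=None):
    """Return the effective collation following the documented precedence.

    compound_indexes: collations of READY compound indexes that contain the
    RESULTSORT field as a string field.
    """
    if query is not None:
        return query
    if single_field_index is not None:
        return single_field_index
    # Compound-index collation applies only when all candidates agree.
    distinct = set(compound_indexes)
    if len(distinct) == 1:
        return distinct.pop()
    if bucket is not None:
        return bucket
    return "binary"


print(resolve_collation(query="fr", bucket="en"))                    # fr
print(resolve_collation(compound_indexes=("tr", "tr")))              # tr
print(resolve_collation(compound_indexes=("tr", "fr"), bucket="en")) # en
print(resolve_collation())                                           # binary
```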
## Sorting by `_id`
The primary index on `_id` is always `READY`, so `SORTBY _id ASC|DESC` works without creating any additional indexes. Since ObjectId values encode a timestamp in their leading bytes, sorting by `_id` produces approximate insertion-order results.
```kronotop
BUCKET.QUERY users '{}' SORTBY _id ASC LIMIT 10
```
This is useful for scanning documents in the order they were inserted, or for fetching the most recently inserted documents with `DESC`.
## Null and Missing Field Handling
When a document is missing the `SORTBY` field, or the field is explicitly `null`, the sort key is treated as `BsonNull`.
* `BsonNull` has type order 0 (the lowest rank in the type bracket ordering).
* In `ASC` order, nulls sort **first**.
* In `DESC` order, nulls sort **last**.
See [Type Bracketing](/docs/bucket/type-bracketing/) for the full type order and comparison rules.
## Plan Cache Interaction
The SORTBY field is part of the plan cache key. Two queries with the same filter but different SORTBY fields produce different cache entries because different sort fields lead to different index selection during physical planning.
```kronotop
-- These produce separate cache entries:
BUCKET.QUERY users '{}' SORTBY age ASC
BUCKET.QUERY users '{}' SORTBY name ASC
```
See [plan-cache.md](/docs/bucket/plan-cache/) for details on cache keys, TTL, and eviction.
## Best Practices
* **Create indexes on fields you sort by.** `SORTBY` requires an index. Without one, the query is rejected.
* **Use `BUCKET.EXPLAIN`** to verify which index the optimizer selected.
* **Use `RESULTSORT`** when you need within-batch ordering on a field that doesn’t have an index and global ordering is not required.
* **Keep compound index prefix fields as EQ filters** when you need to sort on a trailing field in the compound index.
# Strict Types
> Kronotop treats every BSON type as distinct.
## Introduction
Kronotop treats every BSON type as distinct. `INT32` is not `STRING`. `BOOLEAN` is not `INT32`. There is no implicit type coercion between unrelated types, not during indexing, not during queries, not during background index builds.
For numeric types (`INT32`, `INT64`, `DOUBLE`, `DECIMAL128`), Kronotop supports **lossless numeric widening**. An `INT32` value can be widened to `INT64`, `DOUBLE`, or `DECIMAL128` without loss of precision. This allows queries and indexes to work across compatible numeric types while preserving exactness.
## Why strict types?
### Clean data, clean results
When every value in an index has a known type, queries are predictable. A predicate like `{age: {$gt: 25}}` with an `INT32` value matches `INT32`, `INT64`, `DOUBLE`, and `DECIMAL128` values via lossless widening, but never a `STRING`. No surprises from unrelated types sneaking in.
### Simpler index internals
Without strict typing, range scans on FoundationDB would need “type bracketing”, encoding type prefixes into index keys so a scan for integers doesn’t accidentally cross into string territory. FoundationDB’s tuple layer encodes different types differently, so a range scan for numeric values between 10 and 50 could pick up `STRING` values that happen to fall in the same byte range. Strict typing sidesteps this for non-numeric types. For numeric types, all four types (`INT32`, `INT64`, `DOUBLE`, `DECIMAL128`) share a single bracket, and lossless widening handles cross-type comparisons within that bracket.
### Less complexity, fewer bugs
Numeric widening is limited to lossless paths. Every conversion preserves the original value exactly. Non-numeric types remain strictly separated with no coercion of any kind.
## Configuration
```hocon
bucket {
index {
strict_types = true # default
}
}
```
`strict_types` is a global setting, not per-index. The default is `true`.
## Numeric widening
Kronotop supports lossless numeric widening across the four numeric BSON types. Widening is applied during index selection, query predicate evaluation, index scan bounds calculation, and type bracket comparison.
### Allowed widening paths
| From | To |
| ------ | ------------------------- |
| INT32 | INT64, DOUBLE, DECIMAL128 |
| INT64 | DECIMAL128 |
| DOUBLE | DECIMAL128 |
### Forbidden path
`INT64` to `DOUBLE` is explicitly forbidden. An IEEE 754 double has a 53-bit significand, so 64-bit integers with a magnitude above 2^53 cannot all be represented exactly, and widening would cause silent precision loss.
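The precision loss is easy to reproduce in any language with IEEE 754 doubles. The snippet below uses Python's `float`, which is a 64-bit double, purely as a demonstration; it is not Kronotop code.

```python
big = 2**53 + 1                      # fits in a signed 64-bit integer
as_double = float(big)               # nearest IEEE 754 double

print(int(as_double) == big)         # False: the conversion changed the value
print(int(as_double))                # 9007199254740992, i.e. 2**53

small = 2**31 - 1                    # largest INT32 value
print(int(float(small)) == small)    # True: every INT32 fits exactly
```

This is why `INT32` to `DOUBLE` is an allowed path while `INT64` to `DOUBLE` is not.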
### Common type resolution and cost
When two numeric values of different types need to be compared, Kronotop resolves them to the cheapest lossless common type. It does NOT always promote to `DECIMAL128`.
| Pair | Common type | Cost |
| ----------------------- | ----------- | -------------------------------------------------------------- |
| INT32 + INT32 | INT32 | Identity, no conversion |
| INT32 + INT64 | INT64 | Cheap cast |
| INT32 + DOUBLE | DOUBLE | Cheap cast |
| INT32 + DECIMAL128 | DECIMAL128 | BigDecimal allocation |
| INT64 + INT64 | INT64 | Identity, no conversion |
| INT64 + DOUBLE | DECIMAL128 | BigDecimal allocation, unavoidable since INT64→DOUBLE is lossy |
| INT64 + DECIMAL128      | DECIMAL128  | BigDecimal allocation                                          |
| DOUBLE + DOUBLE | DOUBLE | Identity, no conversion |
| DOUBLE + DECIMAL128 | DECIMAL128 | BigDecimal allocation |
| DECIMAL128 + DECIMAL128 | DECIMAL128 | Identity, no conversion |
`DECIMAL128` (backed by `BigDecimal`) is only used when one side is already `DECIMAL128`, or for `INT64` vs `DOUBLE` where it is the only lossless common representation.
Same-type pairs are compared with primitive operations directly. Widening is only invoked for actual cross-type comparisons.
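The common-type table can be derived from the allowed widening paths. The sketch below shows one way to do that; the names `WIDENS_TO`, `BY_COST`, and `common_type` are hypothetical, and the cost ordering is an assumption inferred from the table rather than taken from the implementation.

```python
# Lossless widening targets for each numeric type (each type also "widens" to itself).
WIDENS_TO = {
    "INT32": {"INT32", "INT64", "DOUBLE", "DECIMAL128"},
    "INT64": {"INT64", "DECIMAL128"},
    "DOUBLE": {"DOUBLE", "DECIMAL128"},
    "DECIMAL128": {"DECIMAL128"},
}
# Candidate common types, cheapest first (assumed ordering).
BY_COST = ["INT32", "INT64", "DOUBLE", "DECIMAL128"]


def common_type(left, right):
    """Return the cheapest type both operands can be widened to losslessly."""
    shared = WIDENS_TO[left] & WIDENS_TO[right]
    return next(t for t in BY_COST if t in shared)


print(common_type("INT32", "INT64"))    # INT64
print(common_type("INT32", "DOUBLE"))   # DOUBLE
print(common_type("INT64", "DOUBLE"))   # DECIMAL128
```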
### Where widening applies
* **Index selection**: The physical planner can select an `INT64` index for an `INT32` query predicate.
* **Predicate evaluation**: A predicate `{age: {$gt: 25}}` (INT32) matches documents where `age` is `INT64(30)`.
* **Index scan bounds**: Query bounds are widened to the index’s declared type for correct FoundationDB tuple encoding.
* **Type bracket comparison**: All numeric types share a single bracket in the sort ordering.
### What is NOT widened
Non-numeric types are never widened. `STRING`, `BOOLEAN`, `DATE_TIME`, `OBJECT_ID`, and all other non-numeric types remain strictly typed. A type mismatch between any of these types always evaluates to `false`.
## Behavior when `strict_types = true` (default)
When strict types are enabled, a type mismatch between a document field and the index’s declared type causes the operation to fail, unless the mismatch is a lossless numeric widening. For example, inserting an `INT32` value into an `INT64` index succeeds because `INT32` can be losslessly widened to `INT64`.
* **INSERT**: If a document field’s type doesn’t match the index’s declared type and cannot be losslessly widened, the operation fails with an `INDEXTYPE_MISMATCH` error.
* **UPDATE**: A non-widenable type mismatch on an indexed field fails the update.
* **Background index build**: A non-widenable type mismatch marks the index build task as `FAILED`.
The error message format is:
```kronotop
Index type mismatch: index 'idx_age' expects 'INT32', but selector 'age' matched a value of type 'STRING'
```
## Behavior when `strict_types = false`
When strict types are disabled, type mismatches are handled silently. Documents are still written, but mismatched fields are skipped during indexing.
* **INSERT**: Type-mismatched fields are silently skipped during indexing. The document is still inserted, but the mismatched field is not added to the index.
* **UPDATE**: Mismatched fields are skipped, and the document is updated normally.
* **Background index build**: Mismatched documents are skipped, and the build continues to completion.
* **Query impact**: Documents with unindexed fields won’t appear in index-assisted queries for that field. They can still be found via a full scan, but that’s slower.
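The two modes differ only in what happens on a non-widenable mismatch. The sketch below models that decision for a single field; `should_index`, `IndexTypeMismatch`, and `NUMERIC_WIDENING` are hypothetical names, and the exception merely stands in for the `INDEXTYPE_MISMATCH` error.

```python
NUMERIC_WIDENING = {
    "INT32": {"INT64", "DOUBLE", "DECIMAL128"},
    "INT64": {"DECIMAL128"},
    "DOUBLE": {"DECIMAL128"},
}


class IndexTypeMismatch(Exception):
    """Stands in for the INDEXTYPE_MISMATCH error (hypothetical class)."""


def should_index(value_type, index_type, strict_types=True):
    """Return True if the value is indexed, False if it is silently skipped."""
    widenable = index_type in NUMERIC_WIDENING.get(value_type, set())
    if value_type == index_type or widenable:
        return True
    if strict_types:
        raise IndexTypeMismatch(
            f"index expects '{index_type}', but value has type '{value_type}'"
        )
    return False


print(should_index("INT32", "INT64"))                        # True (lossless widening)
print(should_index("STRING", "INT32", strict_types=False))   # False (skipped)
```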
## Query engine behavior
The query engine enforces type matching during predicate evaluation, regardless of the `strict_types` setting. For non-numeric types, this is strict: a `STRING` predicate never matches an `INT32` value. For numeric types, lossless widening is applied.
A predicate `{age: {$gt: 25}}` where `25` is `INT32` matches documents where `age` is `INT32`, `INT64`, `DOUBLE`, or `DECIMAL128`. Both values are promoted to a common lossless type before comparison. The same predicate will never match a document where `age` is a `STRING`.
**Non-numeric types** (strict matching, no cross-type comparison):
* `StringVal` matches only `STRING`
* `BooleanVal` matches only `BOOLEAN`
* `ObjectIdVal` matches only `OBJECT_ID`
* `DateTimeVal` matches only `DATE_TIME`
**Numeric types** (lossless widening across the numeric bracket):
* `Int32Val` matches `INT32`, `INT64`, `DOUBLE`, `DECIMAL128`
* `Int64Val` matches `INT32`, `INT64`, `DOUBLE`, `DECIMAL128`
* `DoubleVal` matches `INT32`, `INT64`, `DOUBLE`, `DECIMAL128`
* `Decimal128Val` matches `INT32`, `INT64`, `DOUBLE`, `DECIMAL128`
Every numeric predicate can match every numeric document type. Both values are promoted to the cheapest lossless common type before comparison (e.g., `INT32` + `INT64` → `INT64`, `INT64` + `DOUBLE` → `DECIMAL128`).
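The matching rules for a `$gt` predicate can be sketched as follows. This is an illustrative model, not the engine's evaluator: `matches_gt` is a hypothetical name, and Python's `Fraction` is used as an exact stand-in for the lossless common type.

```python
from fractions import Fraction

NUMERIC = {"INT32", "INT64", "DOUBLE", "DECIMAL128"}


def matches_gt(doc_type, doc_value, pred_type, pred_value):
    """Evaluate doc_value > pred_value under the documented matching rules."""
    if doc_type in NUMERIC and pred_type in NUMERIC:
        # Fraction is exact for ints and floats, so no precision is lost.
        return Fraction(doc_value) > Fraction(pred_value)
    if doc_type != pred_type:
        return False  # a non-numeric type mismatch never matches
    return doc_value > pred_value


print(matches_gt("INT64", 30, "INT32", 25))                      # True
print(matches_gt("STRING", "thirty", "INT32", 25))               # False
print(matches_gt("INT64", 2**53 + 1, "DOUBLE", float(2**53)))    # True
```

The last call shows why `INT64` and `DOUBLE` need an exact common representation: converting the integer to a double first would make the two values compare as equal.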
## Practical example
Create a bucket and an `INT32` index on `age`:
```kronotop
BUCKET.CREATE users
BUCKET.INDEX CREATE users '{"age": {"bson_type": "int32", "name": "idx_age"}}'
```
Insert a document with a matching type:
```kronotop
BUCKET.INSERT users DOCS '{"name": "Alice", "age": 30}'
```
This succeeds. The `age` field is `INT32`, matching the index definition.
Now insert a document with a mismatched type:
```kronotop
BUCKET.INSERT users DOCS '{"name": "Bob", "age": "thirty"}'
```
With `strict_types = true` (default), this fails:
```kronotop
INDEXTYPE_MISMATCH Index type mismatch: index 'idx_age' expects 'INT32', but selector 'age' matched a value of type 'STRING'
```
With `strict_types = false`, the insert succeeds, but Bob’s `age` value is not added to `idx_age`, so an index-assisted query such as `{age: {$gt: 25}}` won’t find Bob’s document.
### Numeric widening in action
Create a bucket and a `DOUBLE` index on `price`:
```kronotop
BUCKET.CREATE products
BUCKET.INDEX CREATE products '{"price": {"bson_type": "double", "name": "idx_price"}}'
```
Insert a document with an `INT32` value:
```kronotop
BUCKET.INSERT products DOCS '{"name": "Widget", "price": 50}'
```
This succeeds. The `price` field is `INT32`, which can be losslessly widened to `DOUBLE` for the index.
Query with an `INT32` predicate:
```kronotop
BUCKET.QUERY products '{"price": {"$gt": 25}}'
```
This also succeeds. The `INT32` predicate value `25` is widened to `DOUBLE` to match the index’s declared type, and the index scan finds the document.
Now insert a document with a `STRING` value:
```kronotop
BUCKET.INSERT products DOCS '{"name": "Gadget", "price": "fifty"}'
```
With `strict_types = true` (default), this fails with `INDEXTYPE_MISMATCH`. `STRING` cannot be widened to `DOUBLE`.
## Recommendation
Keep `strict_types = true` (the default). It catches data quality issues at write time rather than producing confusing query results later. Only disable it if you have a specific need for schemaless flexibility and understand that type-mismatched fields won’t be indexed.
# Type Bracketing
> Type bracketing defines a deterministic total ordering across all BSON types.
Type bracketing defines a deterministic total ordering across all BSON types. When a `RESULTSORT` operation encounters mixed types in the sort field, this ordering ensures the result set has a well-defined, consistent sort order.
## Type Order
Values of different types are ordered by their type bracket, from lowest to highest:
| Rank | Type |
| ---- | -------------------------------- |
| 0 | Null |
| 1 | Int32, Int64, Double, Decimal128 |
| 2 | String |
| 3 | Document |
| 4 | Array |
| 5 | Binary |
| 6 | ObjectId |
| 7 | Boolean |
| 8 | DateTime |
| 9 | Timestamp |
All four numeric types share a single bracket (rank 1). Within this bracket, values of different numeric types are compared numerically via lossless widening (see below).
A field with a `Null` value always sorts before any numeric value, a numeric value always sorts before a `String`, and so on regardless of the actual values.
## Same-Bracket Comparison
When two values share the same type bracket, they are compared by their actual values:
* **Numeric bracket (Int32, Int64, Double, Decimal128)**: All four types share a single bracket. When two values have different numeric types, both are promoted to a common lossless type and compared numerically. For example, `Int32(100)` and `Int64(50)` are compared as `Int64` values, so `100 > 50`. The common type is determined by lossless widening rules (see [Strict Types](/docs/bucket/strict-types/#numeric-widening)).
* **String**: Lexicographic comparison.
* **Document**: Field-by-field in iteration order. Keys are compared first, then values. A shorter document wins if all compared fields are equal.
* **Array**: Element-by-element ordering. A shorter array wins if all compared elements are equal.
* **Binary / ObjectId**: Byte-level comparison.
* **Boolean**: `false < true`.
* **DateTime / Timestamp**: Numeric comparison of the underlying value.
* **Null**: All nulls are equal.
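Bracket-then-value ordering maps directly onto a tuple sort key. The sketch below covers only a subset of the brackets (null, numeric, string, boolean) and uses plain Python values in place of BSON values; `bracket_rank` and `sort_key` are hypothetical names, and Python's code-point string comparison is assumed to approximate the lexicographic rule.

```python
from fractions import Fraction


def bracket_rank(value):
    """Map a Python value to the documented bracket rank (subset of types)."""
    if value is None:
        return 0
    if isinstance(value, bool):          # check bool first: bool subclasses int
        return 7
    if isinstance(value, (int, float)):
        return 1
    if isinstance(value, str):
        return 2
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def sort_key(value):
    """Order by bracket rank first, then by value within the bracket."""
    rank = bracket_rank(value)
    if rank == 0:
        return (rank, 0)                 # all nulls are equal
    if rank == 1:
        return (rank, Fraction(value))   # exact comparison across numeric types
    return (rank, value)


values = ["b", 3.5, None, True, 2, "a", False]
print(sorted(values, key=sort_key))
# [None, 2, 3.5, 'a', 'b', False, True]
```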
## Where Type Bracketing Applies
Type bracketing applies to in-memory `RESULTSORT` operations. When a query includes `RESULTSORT`, the matched documents are sorted using type bracketing before applying `LIMIT`. It is also used at plan time to order `$in` scan predicates for `SORTBY` optimization.
Type bracketing is **not** used in:
* Index scans: indexes store entries of a declared type, and lossless numeric widening handles cross-type matching at scan time.
* Predicate evaluation: the query engine uses lossless numeric widening for numeric types and strict matching for non-numeric types.
## Relationship with strict\_types
`strict_types` controls whether type mismatches during indexing produce errors or are silently skipped (see [Strict Types](/docs/bucket/strict-types/)). Type bracketing is orthogonal to this setting:
* **strict\_types = true** (default): A type mismatch at index time causes an error. However, fields that are not indexed can still contain mixed types across documents. A full scan with `RESULTSORT` on such a field will encounter mixed types, and type bracketing handles the ordering.
* **strict\_types = false**: Mismatched fields are silently skipped during indexing. Documents are still stored with their original field types, so `RESULTSORT` on those fields will encounter mixed types.
In both cases, type bracketing produces a consistent, deterministic sort order.
## Predicate Evaluation and Type Matching
The query engine enforces type matching during predicate evaluation, regardless of `strict_types` configuration. For numeric types, lossless widening is applied: a predicate like `{age: {$gt: 30}}` (where `30` is an Int32) matches documents where `age` is Int32, Int64, Double, or Decimal128, and both values are promoted to a common lossless type before comparison. Documents where `age` is a String or any other non-numeric type are excluded.
Type bracketing applies after filtering when sorting the matched results.
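The widening rule can be illustrated with a small sketch. It is not Kronotop's implementation: Python `int`, `float`, and `Decimal` stand in for Int32/Int64, Double, and Decimal128, the helper names `to_exact` and `matches_gt` are hypothetical, and only finite values are handled. Both sides are promoted to an exact rational before comparison, so no precision is lost.

```python
from decimal import Decimal
from fractions import Fraction


def to_exact(value):
    """Return an exact Fraction for a finite numeric value, or None otherwise."""
    # bool is a subclass of int in Python, but a BSON Boolean is not numeric.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float, Decimal)):
        return Fraction(value)  # exact for ints, binary floats, and Decimals
    return None


def matches_gt(field_value, operand):
    """Evaluate {field: {$gt: operand}} for a numeric operand."""
    left, right = to_exact(field_value), to_exact(operand)
    if left is None or right is None:
        return False  # non-numeric values never match a numeric predicate
    return left > right


print(matches_gt(31, 30))               # integer field
print(matches_gt(30.5, 30))             # double field
print(matches_gt(Decimal("30.01"), 30))  # decimal field
print(matches_gt("45", 30))             # string field is excluded
```

Comparing through an exact type matters for large integers: `2**53 + 1` is greater than `float(2**53)`, but converting the integer to a double first would make the two values equal.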
# Vector Indexes
> A vector index enables approximate nearest neighbor (ANN) search on document fields that contain fixed-dimension numeric vectors.
## Introduction
[Section titled “Introduction”](#introduction)
A vector index enables approximate nearest neighbor (ANN) search on document fields that contain fixed-dimension numeric vectors. It is backed by [JVector](https://github.com/datastax/jvector), a high-performance graph-based vector search library. Use vector indexes when your documents contain embeddings and you need similarity search.
## Creation
[Section titled “Creation”](#creation)
Vector indexes are defined with the `$vector` key in the index schema:
```kronotop
BUCKET.INDEX CREATE products '{
"$vector": {"field": "embedding", "dimensions": 3, "distance": "cosine"}
}'
```
A vector index on a nested field using dot notation:
```kronotop
BUCKET.INDEX CREATE products '{
"$vector": {"field": "data.embedding", "dimensions": 128, "distance": "euclidean"}
}'
```
With an explicit name:
```kronotop
BUCKET.INDEX CREATE products '{
"$vector": {"name": "emb_idx", "field": "embedding", "dimensions": 768, "distance": "dot_product"}
}'
```
If `name` is omitted, a name is auto-generated from the selector, dimensions, and distance function.
| Parameter | Required | Description |
| ------------ | -------- | ------------------------------------------------------------- |
| `field` | Yes | Document field containing the vector. Supports dot notation. |
| `dimensions` | Yes | Number of dimensions (must be >= 1). |
| `distance` | Yes | Similarity function: `cosine`, `euclidean`, or `dot_product`. |
| `name` | No | Index name. Auto-generated if omitted. |
See [BUCKET.INDEX CREATE](/docs/bucket/commands/bucket-index/#bucketindex-create) for the full command reference.
## Distance functions
[Section titled “Distance functions”](#distance-functions)
The distance function determines how similarity is measured between two vectors.
| Function | Score range | Best for |
| ------------- | ----------- | --------------------------------------------------------------------------- |
| `cosine` | 0 to 1 | Normalized embeddings where only direction matters (e.g., text embeddings). |
| `euclidean` | 0 to 1 | Spatial data where absolute distance matters. |
| `dot_product` | unbounded | Magnitude-aware similarity where vector length carries meaning. |
Cosine similarity ignores vector magnitude. Two vectors pointing in the same direction score 1.0 regardless of length. Dot product does not normalize, so vectors with larger magnitudes produce higher scores.
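The following sketch shows one way the three functions can be turned into scores with the ranges listed above. The normalization formulas are an assumption borrowed from the convention common in Lucene-style vector libraries; the exact formulas Kronotop applies are not stated here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_score(a, b):
    cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    return (1.0 + cos) / 2.0  # assumed mapping of [-1, 1] onto [0, 1]


def euclidean_score(a, b):
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared)  # assumed mapping of [0, inf) onto (0, 1]


def dot_product_score(a, b):
    return (1.0 + dot(a, b)) / 2.0  # assumed; grows with vector magnitude


short, long_ = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_score(short, long_))       # same direction, magnitude ignored
print(dot_product_score(short, short))  # smaller magnitude, lower score
print(dot_product_score(short, long_))  # larger magnitude, higher score
```

Under these assumptions, two parallel vectors score approximately 1.0 with `cosine` regardless of length, while `dot_product` rewards the longer vector.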
## How vector search works
[Section titled “How vector search works”](#how-vector-search-works)
**Graph-based search.** The index builds an HNSW (Hierarchical Navigable Small World) graph where each vector is a node connected to its approximate nearest neighbors. Searching traverses this graph starting from an entry point, greedily following edges toward the query vector. This gives sub-linear search time.
**Product Quantization.** After enough vectors are indexed (configurable, default 1000), the system automatically trains a Product Quantization (PQ) model. PQ compresses each vector into a compact code, which significantly reduces memory usage. During graph construction, PQ-scored comparisons replace exact comparisons. At search time, candidates are reranked with exact scores computed against the full vectors, so the final ordering is not distorted by compression error.
**Multi-tier storage.** Vectors start in an in-memory index. When the in-memory index exceeds a size threshold (configurable, default 256 MiB), it is flushed to disk. Search spans both in-memory and on-disk indexes, merging results from all tiers by similarity score.
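The merge across tiers can be pictured as a top-k selection over the combined hits. This is a minimal sketch under the assumption that each tier returns `(score, doc_id)` pairs; the function name `merge_tiers` and the document identifiers are hypothetical.

```python
import heapq


def merge_tiers(tier_results, top):
    """Merge per-tier (score, doc_id) lists and keep the `top` best by score."""
    all_hits = (hit for hits in tier_results for hit in hits)
    return heapq.nlargest(top, all_hits, key=lambda hit: hit[0])


in_memory = [(0.99, "d7"), (0.80, "d2")]
on_disk = [(0.95, "d4"), (0.60, "d9")]
print(merge_tiers([in_memory, on_disk], top=3))
```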
## Post-filtering
[Section titled “Post-filtering”](#post-filtering)
Vector search can be combined with structured query predicates using the `FILTER` parameter on `BUCKET.VECTOR`. Filtering is applied after similarity ranking. The search first retrieves vector candidates ordered by similarity, then evaluates the filter expression against each candidate’s document.
If the initial batch of candidates does not yield enough results that pass the filter, the search automatically fetches additional candidates by resuming the graph traversal in fixed-size batches until:
* Enough results are found to satisfy `TOP`, or
* The `MAX-SCAN-CANDIDATES` limit is reached, or
* The vector graph is exhausted.
`MAX-SCAN-CANDIDATES` provides a safety cap to control latency when the filter is highly selective and most candidates do not match.
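The three stop conditions can be sketched as a loop over a candidate stream that is already ordered by similarity. This is an illustration rather than the server's code: the function name, the `batch_size` value, and the way candidates are counted are assumptions.

```python
from itertools import islice


def filtered_search(candidates, passes_filter, top=10,
                    max_scan_candidates=10000, batch_size=64):
    """Post-filter candidates that arrive ordered by similarity, best first."""
    stream = iter(candidates)
    results, scanned = [], 0
    while len(results) < top and scanned < max_scan_candidates:
        # Resume the traversal with the next batch, never exceeding the cap.
        budget = min(batch_size, max_scan_candidates - scanned)
        batch = list(islice(stream, budget))
        if not batch:
            break  # the vector graph is exhausted
        scanned += len(batch)
        for doc in batch:
            if passes_filter(doc):
                results.append(doc)
                if len(results) == top:
                    break  # enough results to satisfy TOP
    return results, scanned


docs = [{"id": i, "label": "gamma" if i % 10 == 0 else "other"} for i in range(1000)]
hits, scanned = filtered_search(docs, lambda d: d["label"] == "gamma", top=3, batch_size=16)
print([d["id"] for d in hits], scanned)
```

With a filter that matches nothing, the same loop stops once `max_scan_candidates` candidates have been examined, which is the latency bound the parameter exists to provide.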
## Search parameters
[Section titled “Search parameters”](#search-parameters)
| Parameter | Default | Description |
| --------------------- | ------- | ------------------------------------------------------------------------------------------------ |
| `TOP` | 10 | Maximum number of results to return. |
| `THRESHOLD` | 0.0 | Minimum similarity score. Results below this value are excluded. |
| `OVERQUERY` | 1.0 | Multiplier (>= 1.0) for extra candidates examined beyond TOP before reranking with exact scores. |
| `MAX-SCAN-CANDIDATES` | 10000 | Upper bound on candidates examined during filtered search. |
| `FILTER` | none | BQL expression to post-filter results by document fields. |
| `PROJECTION` | none | Projection specification to control which fields are returned. |
| `COLLATION` | none | Collation specification for string comparison in filters. |
**OVERQUERY** must be >= 1.0. When omitted, the server default configured via `bucket.vector.default_overquery` (1.0 by default) is used. Increase it above 1.0 to improve recall when Product Quantization is active: PQ-scored graph traversal may miss some true neighbors, and overquerying compensates by collecting extra candidates and reranking them with exact scores. Values between 1.5 and 3.0 are typical.
**THRESHOLD** is useful for discarding low-quality matches. With cosine distance, a threshold of 0.8 means only vectors with at least 80% directional similarity are returned.
**MAX-SCAN-CANDIDATES** is only relevant when `FILTER` is provided. It limits worst-case latency when most candidates fail the filter.
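The interaction between `TOP`, `OVERQUERY`, and `THRESHOLD` can be sketched as follows. The pool size `ceil(TOP * OVERQUERY)` is an assumption made for illustration, and `rerank`, `exact`, and the document names are hypothetical.

```python
import math


def rerank(approx_ranked, exact_score, top=10, overquery=1.0, threshold=0.0):
    """approx_ranked: doc ids ordered by approximate (PQ) score, best first."""
    if overquery < 1.0:
        raise ValueError("OVERQUERY must be >= 1.0")
    pool = approx_ranked[: math.ceil(top * overquery)]  # assumed pool size
    scored = sorted(((exact_score(d), d) for d in pool), reverse=True)
    return [(s, d) for s, d in scored if s >= threshold][:top]


exact = {"a": 0.90, "b": 0.99, "c": 0.95, "d": 0.97}
approx_order = ["a", "c", "b", "d"]  # approximate scores mis-order the candidates

print(rerank(approx_order, exact.get, top=2, overquery=1.0))
print(rerank(approx_order, exact.get, top=2, overquery=2.0))
```

In this toy data, an overquery of 1.0 only ever examines `a` and `c`, so the two best documents are missed. Doubling the pool lets the exact rerank recover `b` and `d`.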
See [BUCKET.VECTOR](/docs/bucket/commands/bucket-vector/) for the full command reference.
## Constraints
[Section titled “Constraints”](#constraints)
* **No ACID transaction guarantees.** The searchable graph index is updated asynchronously after the transaction commit. Matching documents are read directly from the storage engine outside of a transaction. Newly inserted or deleted vectors may not immediately appear in or disappear from search results.
* **Single-shard buckets only.** Vector indexes cannot be created on buckets that span multiple shards.
* **Dimensions must be >= 1.** The `dimensions` parameter must be a positive integer.
* **Query vector must match index dimensions.** The vector passed to `BUCKET.VECTOR` must have exactly the same number of elements as the index’s declared dimensions.
* **One vector index per field.** Each field selector can have at most one vector index.
* **Unique names.** Index names must be unique across all indexes (single-field, compound, and vector) in the bucket.
An update that modifies the vector field, or replaces a parent field that contains it, refreshes the vector index entry for that document.
## Practical example
[Section titled “Practical example”](#practical-example)
Create a single-shard bucket with a vector index on `embedding`:
```kronotop
BUCKET.CREATE products SHARDS 0 INDEXES '{
"$vector": {"field": "embedding", "dimensions": 3, "distance": "cosine"}
}'
```
Insert some documents:
```kronotop
BUCKET.INSERT products DOCS '{"label": "alpha", "embedding": [0.1, 0.2, 0.3]}'
BUCKET.INSERT products DOCS '{"label": "beta", "embedding": [0.4, 0.5, 0.6]}'
BUCKET.INSERT products DOCS '{"label": "gamma", "embedding": [0.7, 0.8, 0.9]}'
BUCKET.INSERT products DOCS '{"label": "delta", "embedding": [0.9, 0.1, 0.0]}'
```
**Basic similarity search**, find the 2 most similar vectors to `[0.4, 0.5, 0.6]`:
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' TOP 2
```
Returns `beta` (exact match, score 1.0) and `gamma` (the closest neighbor).
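The ranking can be checked by hand with plain cosine similarity over the four sample vectors. The sketch below computes the raw cosine rather than the score Kronotop reports, which is enough to confirm the order.

```python
import math

docs = {
    "alpha": [0.1, 0.2, 0.3],
    "beta": [0.4, 0.5, 0.6],
    "gamma": [0.7, 0.8, 0.9],
    "delta": [0.9, 0.1, 0.0],
}
query = [0.4, 0.5, 0.6]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


ranked = sorted(docs, key=lambda label: cosine(query, docs[label]), reverse=True)
print(ranked[:2])  # ['beta', 'gamma']
```

`alpha` points in a similar direction too (cosine of roughly 0.97), but `gamma` is closer at roughly 0.998, and `delta` is far behind.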
**Filtered search**, only return documents where the label is “gamma”:
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' FILTER '{"label": {"$eq": "gamma"}}'
```
**Threshold search**, only results with similarity >= 0.95:
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' THRESHOLD 0.95
```
**Overquery for better recall:**
```kronotop
BUCKET.VECTOR products embedding '[0.4, 0.5, 0.6]' TOP 5 OVERQUERY 2.0
```
# Configuration Reference
> Kronotop uses HOCON (Human-Optimized Config Object Notation) for its configuration files.
Kronotop uses [HOCON](https://github.com/lightbend/config/blob/main/HOCON.md) (Human-Optimized Config Object Notation) for its configuration files. HOCON is a superset of JSON that supports comments, variable substitution, and a more readable syntax.
The built-in defaults ship in `reference.conf` inside the Kronotop JAR. To override any value, pass a custom configuration file at startup:
```bash
java -Dconfig.file=/etc/kronotop/kronotop.conf -jar kronotop.jar
```
Only the values you want to change need to appear in your override file; everything else falls back to the built-in defaults.
## Overriding Individual Parameters
[Section titled “Overriding Individual Parameters”](#overriding-individual-parameters)
Any configuration parameter can be overridden directly on the command line using Java system properties (`-D`). The property name matches the dotted config path:
```bash
java -Dnetwork.external.port=6000 -Dcluster.name=production -jar kronotop.jar
```
This is useful for per-member overrides in a multi-node cluster where each member needs a different bind address or port without maintaining separate config files:
```bash
# Node 1
java -Dnetwork.external.host=10.0.0.1 -Dnetwork.external.port=5484 \
-Dnetwork.internal.host=10.0.0.1 -Dnetwork.internal.port=3320 \
-Ddata_dir=/data/node1 -jar kronotop.jar
# Node 2
java -Dnetwork.external.host=10.0.0.2 -Dnetwork.external.port=5484 \
-Dnetwork.internal.host=10.0.0.2 -Dnetwork.internal.port=3320 \
-Ddata_dir=/data/node2 -jar kronotop.jar
```
`-D` overrides take precedence over both `reference.conf` defaults and any config file supplied via `-Dconfig.file`. You can combine both approaches: use a shared config file for cluster-wide settings and `-D` flags for member-specific values.
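The precedence order can be modeled as a lookup through three layers. This is a simplified sketch that treats each layer as a flat dictionary keyed by dotted path; the function name `resolve` is hypothetical and real HOCON resolution also handles nesting and substitution.

```python
def resolve(path, system_properties, config_file, reference_conf):
    """Return the effective value for a dotted config path."""
    # Highest precedence first: -D flags, then the config file, then defaults.
    for layer in (system_properties, config_file, reference_conf):
        if path in layer:
            return layer[path]
    raise KeyError(path)


reference_conf = {"network.external.port": 5484, "cluster.name": "development"}
config_file = {"cluster.name": "production"}
system_properties = {"network.external.port": 6000}

for key in ("network.external.port", "cluster.name"):
    print(key, "=", resolve(key, system_properties, config_file, reference_conf))
```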
***
## General
[Section titled “General”](#general)
| Parameter | Type | Default | Description |
| ------------------- | ------ | ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_namespace` | string | `"global"` | Logical namespace used to isolate tenants or environments within a single cluster. All data structures are scoped under this namespace. |
| `data_dir` | string | `"kronotop-data"` | Directory where Kronotop stores local data (volumes, segments). Resolved relative to the working directory. Use an absolute path in production. |
***
## Cluster
[Section titled “Cluster”](#cluster)
Controls cluster membership, failure detection, and inter-node communication.
| Parameter | Type | Default | Description |
| ----------------------------------------- | ------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cluster.name` | string | `"development"` | Human-readable cluster identifier. All nodes that should form a cluster must share the same name. |
| `cluster.heartbeat.interval` | int (seconds) | `5` | How often each node sends a heartbeat to signal liveness. |
| `cluster.heartbeat.maximum_silent_period` | int (seconds) | `20` | If no heartbeat is received from a node within this window, the node is considered unreachable. Should be at least 3-4x the heartbeat interval to tolerate transient network hiccups. |
| `cluster.client_pool.idle_timeout` | int (minutes) | `10` | How long an idle inter-node client connection is kept open before being closed. |
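
The relationship between the heartbeat interval and the silent period can be sketched as a simple timeout check. The function name and the treatment of the exact boundary are assumptions; timestamps are in seconds from a single monotonic clock.

```python
def is_unreachable(last_heartbeat_at, now, maximum_silent_period=20):
    """True when no heartbeat has arrived within the silent period."""
    return now - last_heartbeat_at > maximum_silent_period


HEARTBEAT_INTERVAL = 5  # cluster.heartbeat.interval

# Three missed heartbeats (15 s of silence) are still tolerated by default.
print(is_unreachable(last_heartbeat_at=0, now=3 * HEARTBEAT_INTERVAL))
# After 21 s of silence the node is considered unreachable.
print(is_unreachable(last_heartbeat_at=0, now=21))
```

With the defaults, the silent period is 4x the heartbeat interval, which matches the recommended 3-4x margin.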
***
## Session Defaults
[Section titled “Session Defaults”](#session-defaults)
Default values applied to every new client session. Clients can override these per-session.
| Parameter | Type | Default | Description |
| ------------------------------- | ------ | -------- | ------------------------------------------------------------------------------------------------ |
| `session_attributes.input_type` | string | `"bson"` | Default encoding for incoming documents. Accepted values: `"bson"`, `"json"`. |
| `session_attributes.reply_type` | string | `"bson"` | Default encoding for query results returned to clients. Accepted values: `"bson"`, `"json"`. |
| `session_attributes.limit` | int | `100` | Default maximum number of documents returned per query when the client does not specify a LIMIT. |
***
## Network
[Section titled “Network”](#network)
Kronotop exposes two network interfaces:
* **external**: Client-facing port (default `5484`). All application traffic uses this.
* **internal**: Inter-node and admin port (default `3320`). Used for replication, heartbeats, and cluster administration.
Both interfaces share the same configuration structure.
### External Interface
[Section titled “External Interface”](#external-interface)
| Parameter | Type | Default | Description |
| --------------------------------------- | ------- | ------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `network.external.host` | string | `"127.0.0.1"` | Bind address for the client-facing interface. Set to `0.0.0.0` to accept connections on all interfaces. |
| `network.external.port` | int | `5484` | TCP port for client connections. |
| `network.external.netty.transport` | string | `"nio"` | Netty transport type. `"nio"` works on all platforms; `"epoll"` uses Linux kernel-level I/O for lower latency (Linux only). |
| `network.external.netty.worker_threads` | int | `0` | Number of Netty worker threads. `0` uses the Netty default (2 x CPU cores). |
| `network.external.netty.so_backlog` | int | `4096` | TCP listen backlog size. Higher values allow more pending connections during connection bursts. |
| `network.external.netty.so_reuseport` | boolean | `true` | Enable `SO_REUSEPORT` socket option. Only effective with `epoll` transport on Linux. |
### Internal Interface
[Section titled “Internal Interface”](#internal-interface)
| Parameter | Type | Default | Description |
| --------------------------------------- | ------- | ------------- | ------------------------------------------------------------- |
| `network.internal.host` | string | `"127.0.0.1"` | Bind address for the internal interface. |
| `network.internal.port` | int | `3320` | TCP port for inter-node communication and admin commands. |
| `network.internal.netty.transport` | string | `"nio"` | Netty transport type. Same options as the external interface. |
| `network.internal.netty.worker_threads` | int | `0` | Number of Netty worker threads. |
| `network.internal.netty.so_backlog` | int | `4096` | TCP listen backlog. |
| `network.internal.netty.so_reuseport` | boolean | `true` | Enable `SO_REUSEPORT`. |
### TLS (Optional)
[Section titled “TLS (Optional)”](#tls-optional)
TLS is disabled by default on both interfaces. To enable it, uncomment the `tls` block under the appropriate interface section.
**External TLS:**
```hocon
network.external.tls {
enabled = true
cert_path = "/path/to/server-cert.pem"
key_path = "/path/to/server-key.pem"
}
```
**Internal TLS:**
The internal interface additionally supports a `ca_path` for mutual TLS between cluster nodes:
```hocon
network.internal.tls {
enabled = true
cert_path = "/path/to/server-cert.pem"
key_path = "/path/to/server-key.pem"
ca_path = "/path/to/ca-cert.pem"
}
```
| Parameter | Type | Default | Description |
| --------------- | ------- | ------- | -------------------------------------------------------------------------------- |
| `tls.enabled` | boolean | `false` | Enable TLS on this interface. |
| `tls.cert_path` | string | - | Path to the PEM-encoded server certificate. |
| `tls.key_path` | string | - | Path to the PEM-encoded private key. |
| `tls.ca_path` | string | - | (Internal only) Path to the CA certificate for verifying peer node certificates. |
***
## Authentication (Optional)
[Section titled “Authentication (Optional)”](#authentication-optional)
Authentication is disabled by default. To enable it, add an `auth` block at the top level:
```hocon
auth {
requirepass = "your-password"
users = {
"alice": "alice-password"
"bob": "bob-password"
}
}
```
| Parameter | Type | Default | Description |
| ------------------ | ------ | ------- | ---------------------------------------------------------------------------------------------------------------------- |
| `auth.requirepass` | string | - | When set, clients must authenticate with the `AUTH` command using this password before issuing any other commands. |
| `auth.users`       | object | -       | Named user accounts. Keys are usernames, values are passwords. Clients authenticate with the `AUTH` command, supplying a username and password. |
***
## FoundationDB
[Section titled “FoundationDB”](#foundationdb)
Kronotop uses FoundationDB as its transactional metadata and index store.
| Parameter | Type | Default | Description |
| -------------------------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `foundationdb.clusterfile` | string | - | Path to the FoundationDB cluster file. When omitted, the FDB client uses its default search paths (`/etc/foundationdb/fdb.cluster` on Linux). |
| `foundationdb.fdbc` | string | - | Path to the native FDB C client library (`libfdb_c.so` / `libfdb_c.dylib`). Only needed if the library is not on the default library path. |
| `foundationdb.fdbjava` | string | - | Path to the FDB Java JNI library (`libfdb_java.so` / `libfdb_java.jnilib`). Only needed if the library is not on the default library path. |
| `foundationdb.apiversion` | int | `630` | FoundationDB API version to use. Must match the FDB client library version installed on the system. |
### FoundationDB Network Options (Optional)
[Section titled “FoundationDB Network Options (Optional)”](#foundationdb-network-options-optional)
These settings configure the FoundationDB client’s network layer. All are optional; when omitted, the FDB client defaults apply.
**Trace logging:**
Enables FDB client-side trace logs, useful for debugging connectivity and performance issues.
```hocon
foundationdb.network_options.trace {
enable = "/var/log/kronotop/fdb-trace"
roll_size = 10485760
max_logs_size = 104857600
log_group = "default"
format = "json"
file_identifier = ""
}
```
| Parameter | Type | Default | Description |
| ---------------------------------------------------- | ----------- | --------------------- | ----------------------------------------------------------------------------------------------- |
| `foundationdb.network_options.trace.enable` | string | - | Directory path for FDB trace log files. Setting this value enables trace logging. |
| `foundationdb.network_options.trace.roll_size` | int (bytes) | `10485760` (10 MiB) | Maximum size of a single trace log file before rotation. |
| `foundationdb.network_options.trace.max_logs_size` | int (bytes) | `104857600` (100 MiB) | Maximum total size of all trace log files. Oldest files are deleted when this limit is reached. |
| `foundationdb.network_options.trace.log_group` | string | `"default"` | Label applied to trace log entries to identify this cluster or service. |
| `foundationdb.network_options.trace.format` | string | `"json"` | Trace log format. `"json"` or `"xml"`. |
| `foundationdb.network_options.trace.file_identifier` | string | `""` | Optional string appended to trace file names for identification. |
**FDB Client TLS:**
Configures TLS for the connection between Kronotop and the FoundationDB cluster (separate from Kronotop’s own network TLS).
| Parameter | Type | Default | Description |
| ----------------------------------------------- | ------ | ----------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `foundationdb.network_options.tls.cert_path` | string | - | Path to the TLS certificate for the FDB client connection. |
| `foundationdb.network_options.tls.key_path` | string | - | Path to the TLS private key. |
| `foundationdb.network_options.tls.ca_path` | string | - | Path to the CA bundle for verifying the FDB cluster’s certificate. |
| `foundationdb.network_options.tls.password` | string | `""` | Password for the private key, if encrypted. |
| `foundationdb.network_options.tls.verify_peers` | string | `"Check.Valid=1"` | Peer verification rules. See the [FoundationDB TLS documentation](https://apple.github.io/foundationdb/tls.html) for syntax. |
**FDB Client miscellaneous:**
| Parameter | Type | Default | Description |
| ---------------------------------------------------------------- | ------- | -------- | ------------------------------------------------------------------- |
| `foundationdb.network_options.client.tmp_dir` | string | `"/tmp"` | Temporary directory used by the FDB client for internal operations. |
| `foundationdb.network_options.client.disable_statistics_logging` | boolean | `false` | When `true`, disables the FDB client’s periodic statistics logging. |
| `foundationdb.network_options.client.distributed_tracer` | string | `"none"` | Distributed tracing backend. `"none"` disables tracing. |
***
## Volume
[Section titled “Volume”](#volume)
Global tuning parameters for the volume storage engine. These apply to all volume instances (bucket, stash, etc.).
### Vacuum
[Section titled “Vacuum”](#vacuum)
Controls the background vacuum process that reclaims space from stale volume segments.
| Parameter | Type | Default | Description |
| --------------------------- | ---- | ------- | -------------------------------------------------------------------------------------------------------- |
| `volume.vacuum.max_workers` | int | `1` | Maximum number of concurrent vacuum worker threads. Set to `0` to use the number of available CPU cores. |
### Replication
[Section titled “Replication”](#replication)
| Parameter | Type | Default | Description |
| -------------------------------------- | ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `volume.replication.max_retries` | int | `10` | Maximum number of retry attempts for a failed replication operation before giving up. |
| `volume.replication.retry_interval` | int (seconds) | `10` | Delay between replication retry attempts. |
| `volume.replication.reset_threshold` | int (seconds) | `10` | Time after which a stalled replication stream is reset and restarted from the last known good position. |
| `volume.replication.reconnect_backoff` | int (ms) | `250` | Delay a standby waits before retrying a replication stage that was interrupted because it had not connected to the primary yet, such as during a topology change. Prevents a tight retry loop while the connection is re-established. |
***
## Bucket
[Section titled “Bucket”](#bucket)
Configuration for the document database engine.
| Parameter | Type | Default | Description |
| ------------------------- | ------ | --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket.shards` | int | `1` | Number of shards for bucket data. More shards allow parallel writes across cluster nodes but increase coordination overhead. |
| `bucket.object_id_format` | string | `"bytes"` | Controls how ObjectIds are encoded in DELETE and UPDATE responses. `"bytes"` returns raw 12-byte ObjectId values; `"hex"` returns 24-character hex strings. |
### Plan Cache
[Section titled “Plan Cache”](#plan-cache)
The query plan cache avoids re-planning identical queries.
| Parameter | Type | Default | Description |
| --------------------------- | ------------------ | ---------------- | ------------------------------------------------------------------------------------------------------ |
| `bucket.plan_cache.enabled` | boolean | `true` | Enable or disable the query plan cache. Disabling forces re-planning on every query. |
| `bucket.plan_cache.max_ttl` | int (milliseconds) | `300000` (5 min) | Time-to-live for cached query plans. Plans are evicted after this duration and re-planned on next use. |
### Bucket Volume
[Section titled “Bucket Volume”](#bucket-volume)
Controls the local storage segments used by bucket volumes.
| Parameter | Type | Default | Description |
| ---------------------------------------------- | ------------ | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket.volume.segment_size` | long (bytes) | `4294967296` (4 GiB) | Maximum size of a single volume segment file. When a segment reaches this size, a new segment is created. |
| `bucket.volume.segment_replication_chunk_size` | long (bytes) | `16777216` (16 MiB) | Chunk size used when replicating segment data to standby nodes. Larger chunks reduce round trips; smaller chunks reduce memory pressure. |
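
As a rough sizing aid, the number of chunks needed to replicate a segment follows from the two defaults. This sketch assumes one chunk per transfer with the last chunk possibly partial; the function name `chunk_count` is hypothetical.

```python
SEGMENT_SIZE = 4 * 1024**3   # bucket.volume.segment_size, 4 GiB
CHUNK_SIZE = 16 * 1024**2    # bucket.volume.segment_replication_chunk_size, 16 MiB


def chunk_count(segment_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to transfer segment_bytes (ceiling division)."""
    return -(-segment_bytes // chunk_size)


print(chunk_count(SEGMENT_SIZE))  # 256
```

A full default-sized segment therefore takes 256 chunks; halving the chunk size doubles the number of chunks while halving the memory held per chunk.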
### Index
[Section titled “Index”](#index)
Configuration for secondary indexes.
| Parameter | Type | Default | Description |
| --------------------------- | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket.index.strict_types` | boolean | `true` | Enables strict type enforcement for secondary indexes. When enabled, queries with mismatched predicate types do not use the index, and index maintenance skips values with incompatible types. Prevents mixed-type values under the same indexed field. |
### Index Maintenance
[Section titled “Index Maintenance”](#index-maintenance)
Controls the background subsystem responsible for index build and cleanup tasks.
| Parameter | Type | Default | Description |
| ------------------------------------------------------ | ------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `bucket.index.maintenance.worker_pool_size` | int | `0` | Number of worker threads for index maintenance. `0` auto-selects based on available CPU cores. |
| `bucket.index.maintenance.worker_maintenance_interval` | int (seconds) | `60` | Interval at which the maintenance scheduler runs periodic checks such as cleaning up stale workers and dispatching pending tasks. |
### Vector
[Section titled “Vector”](#vector)
Configuration for the vector index subsystem.
| Parameter | Type | Default | Description |
| ------------------------------------- | ------------ | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `bucket.vector.flush_threshold_bytes` | long (bytes) | `268435456` (256 MiB) | RAM threshold for flushing an on-heap vector index to disk. When an index exceeds this limit after an insert, it is rotated and flushed asynchronously. |
| `bucket.vector.pq_training_threshold` | int | `1000` | Minimum number of vectors before Product Quantization training kicks in. Until this threshold is reached, exact scoring is used for graph construction. |
| `bucket.vector.pq_subspace_divisor` | int | `6` | Controls the number of PQ subspaces: `subspaces = dimensions / pq_subspace_divisor`. |
| `bucket.vector.max_scan_candidates` | int | `10000` | Safety cap for filtered vector search. Limits the total number of unique candidates examined during post-filter graph traversal. Once reached, the search stops expanding even if fewer than TOP results matched the filter. |
| `bucket.vector.default_overquery` | float | `1.0` | Overquery multiplier for vector search. Controls how many extra candidates the graph traversal collects before reranking to the final top-K. Higher values improve recall for PQ-scored indexes at the cost of latency. |
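
To get a feel for `pq_subspace_divisor`, the sketch below applies the formula from the table. Integer division, float32 vector components, and one byte per subspace code are all assumptions made for illustration; how very small `dimensions` values are handled is not covered here.

```python
def pq_subspaces(dimensions, pq_subspace_divisor=6):
    """subspaces = dimensions / pq_subspace_divisor, assuming integer division."""
    return dimensions // pq_subspace_divisor


def compression_estimate(dimensions, pq_subspace_divisor=6):
    """Approximate bytes per vector before and after PQ.

    Assumes float32 components and one byte per subspace code.
    """
    raw_bytes = dimensions * 4
    code_bytes = pq_subspaces(dimensions, pq_subspace_divisor)
    return raw_bytes, code_bytes


print(pq_subspaces(768))          # 128
print(compression_estimate(768))  # (3072, 128)
```

Under these assumptions, a 768-dimensional vector shrinks from 3072 bytes to a 128-byte code. A larger divisor yields fewer subspaces and a smaller code, at the cost of a coarser approximation.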
***
## Stash (Optional)
[Section titled “Stash (Optional)”](#stash-optional)
The stash is an experimental key-value store with String and Hash data types. It persists data by syncing it to the volume storage engine. The stash is disabled by default; set `stash.enabled = true` to opt in.
| Parameter | Type | Default | Description |
| --------------------------------------------- | ------------ | -------------------- | --------------------------------------------------- |
| `stash.enabled` | boolean | `false` | Enable or disable the stash subsystem. |
| `stash.shards` | int | `1` | Number of shards for stash data. |
| `stash.volume.segment_size` | long (bytes) | `4294967296` (4 GiB) | Maximum size of a single stash volume segment file. |
| `stash.volume.segment_replication_chunk_size` | long (bytes) | `16777216` (16 MiB) | Chunk size for stash segment replication. |
### Stash Volume Syncer
[Section titled “Stash Volume Syncer”](#stash-volume-syncer)
The volume syncer runs background workers that keep stash volumes synchronized.
| Parameter | Type | Default | Description |
| ----------------------------- | ------------------ | ----------------------- | -------------------------------------------------------------------------------------- |
| `stash.volume_syncer.prefix` | string | `"stash-volume-syncer"` | Prefix used to name volume syncer threads for identification in logs and thread dumps. |
| `stash.volume_syncer.workers` | int | `8` | Number of concurrent syncer worker threads. |
| `stash.volume_syncer.period` | int (milliseconds) | `1000` | How often each syncer worker checks for pending sync work. |
***
## Background Tasks
[Section titled “Background Tasks”](#background-tasks)
Configuration for system-wide background maintenance tasks.
### Journal Cleanup
[Section titled “Journal Cleanup”](#journal-cleanup)
The journal cleanup task removes old journal entries to reclaim FoundationDB storage.
| Parameter | Type | Default | Description |
| -------------------------------------------------------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `background_tasks.journal_cleanup_task.retention_period` | int | `1` | How long journal entries are retained before cleanup. Interpreted together with `timeunit`. |
| `background_tasks.journal_cleanup_task.timeunit` | string | `"days"` | Time unit for `retention_period`. Accepted values correspond to Java’s `TimeUnit` enum: `"days"`, `"hours"`, `"minutes"`, etc. |
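
The two parameters combine into a single cutoff: entries older than the cutoff are eligible for cleanup. This is a sketch with a hypothetical function name. It relies on Python's `timedelta` keyword names, which cover `"days"`, `"hours"`, `"minutes"`, `"seconds"`, `"milliseconds"`, and `"microseconds"` but not every Java `TimeUnit` value.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now, retention_period=1, timeunit="days"):
    """Return the timestamp before which journal entries may be removed."""
    return now - timedelta(**{timeunit: retention_period})


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(retention_cutoff(now))                 # default: one day back
print(retention_cutoff(now, 12, "hours"))    # twelve hours back
```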
# Connection
> Kronotop speaks RESP2 and RESP3. Connection commands handle protocol negotiation, authentication, and server introspection.
## Overview
[Section titled “Overview”](#overview)
Kronotop speaks RESP2 and RESP3, so existing RESP-compatible clients and tools connect without a special driver. Every instance listens on two ports: the client port (default 5484) serves regular workloads, and the internal port (default 3320) serves cluster administration.
Opening a TCP connection creates exactly one [session](/docs/sessions/) that holds all per-client state: configuration attributes, query cursors, and the active transaction. The session lives as long as the connection.
## Protocol Negotiation
[Section titled “Protocol Negotiation”](#protocol-negotiation)
New connections start in RESP2. `HELLO` switches the protocol version for the rest of the connection and returns connection metadata. The response itself is already encoded with the newly negotiated version:
```kronotop
> HELLO 3
1# "server" => "Kronotop"
2# "version" => "0.13"
3# "proto" => (integer) 3
4# "id" => (integer) 1
5# "mode" => "cluster"
6# "role" => "master"
7# "modules" => (empty array)
```
RESP3 is the better choice for new applications. Structured replies arrive as native maps instead of flat arrays, and all examples in this documentation use RESP3 output.
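On the wire, a client sends every command, including `HELLO`, as a RESP array of bulk strings. The sketch below shows that framing for `HELLO 3`. It is a minimal illustration of the general RESP request format, not code taken from Kronotop or any client library; `encode_command` is a hypothetical helper name.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: '*' followed by the number of elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: '$' + byte length, then the payload, each CRLF-terminated.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_command("HELLO", "3"))
```

Most applications never build these frames by hand, since any RESP-compatible client library does it internally.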
## Authentication
[Section titled “Authentication”](#authentication)
Authentication is disabled by default. When an `auth` block is present in the configuration, the connection must authenticate before doing anything else. Until then, every command except `AUTH` and `HELLO` is rejected:
```kronotop
> BUCKET.LIST
(error) NOAUTH Authentication required.
> AUTH devpass
OK
```
Two modes are supported: the default user authenticates with `auth.requirepass`, and named users authenticate with the accounts defined in `auth.users`. `HELLO` also accepts an inline `AUTH username password` clause, so protocol negotiation and authentication can happen in a single round trip. See [Configuration](/docs/config/) for the `auth` block parameters.
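As a rough sketch of the single-round-trip form, the helper below assembles the argument list for `HELLO protover [AUTH username password] [SETNAME clientname]`. The function name `hello_args` and its validation rules are assumptions made for illustration; the server performs its own checks.

```python
from typing import List, Optional


def hello_args(protover: int,
               username: Optional[str] = None,
               password: Optional[str] = None,
               clientname: Optional[str] = None) -> List[str]:
    """Build the argument list for a HELLO command."""
    if protover not in (2, 3):
        raise ValueError("protover must be 2 (RESP2) or 3 (RESP3)")
    # The inline AUTH clause takes both a username and a password.
    if (username is None) != (password is None):
        raise ValueError("username and password must be given together")
    args = ["HELLO", str(protover)]
    if username is not None and password is not None:
        args += ["AUTH", username, password]
    if clientname is not None:
        args += ["SETNAME", clientname]
    return args


print(hello_args(3, "admin", "mysecretpassword", clientname="my-app"))
```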
## Before Cluster Initialization
[Section titled “Before Cluster Initialization”](#before-cluster-initialization)
No connection command requires the cluster to be initialized. `PING`, `HELLO`, and `AUTH` work on a freshly started instance, which makes them suitable for health checks and bootstrap scripts.
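A health check along these lines could be sketched as follows. The helper names `is_alive` and `check` are hypothetical, the port is the default client port mentioned above, and the sketch assumes authentication is disabled, because with an `auth` block configured `PING` is rejected until the connection authenticates.

```python
import socket


def is_alive(sock: socket.socket) -> bool:
    """Send PING as a RESP array and expect the simple string PONG."""
    sock.sendall(b"*1\r\n$4\r\nPING\r\n")
    reply = b""
    # Read until the CRLF that terminates a simple-string reply.
    while not reply.endswith(b"\r\n"):
        chunk = sock.recv(64)
        if not chunk:
            return False  # peer closed the connection
        reply += chunk
    return reply == b"+PONG\r\n"


def check(host: str = "127.0.0.1", port: int = 5484, timeout: float = 1.0) -> bool:
    """Return True if an instance answers PING on the given address."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return is_alive(sock)
    except OSError:
        # Covers refused connections and timeouts.
        return False
```

A bootstrap script could poll `check()` until it returns `True` before issuing cluster initialization commands.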
## Commands
[Section titled “Commands”](#commands)
| Command | Description |
| --------------------------------------------- | ------------------------------------------------------------ |
| [AUTH](/docs/connection/commands/auth/) | Authenticates the current connection |
| [CLIENT](/docs/connection/commands/client/) | Manages client connection properties |
| [COMMAND](/docs/connection/commands/command/) | Returns information about registered server commands |
| [ECHO](/docs/connection/commands/echo/) | Echoes back the given message |
| [HELLO](/docs/connection/commands/hello/) | Negotiates the protocol version and optionally authenticates |
| [INFO](/docs/connection/commands/info/) | Returns server information and statistics |
| [PING](/docs/connection/commands/ping/) | Returns PONG or echoes back the given message |
| [TIME](/docs/connection/commands/time/) | Returns the current server time |
# AUTH
> Authenticates the current connection.
Authenticates the current connection.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
AUTH [username] password
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ---------- | ------ | -------- | -------------------------------------- |
| `username` | string | No | Username for named-user authentication |
| `password` | string | Yes | Password to authenticate with |
## Return Value
[Section titled “Return Value”](#return-value)
Simple string `OK` on successful authentication.
## Behavior
[Section titled “Behavior”](#behavior)
Supports two authentication modes:
* **Default user mode (1 parameter):** Checks the provided password against the `auth.requirepass` configuration value.
* **Named user mode (2 parameters):** Checks the provided username and password against the accounts defined in the `auth.users` configuration.
On successful authentication, the session is marked as authenticated.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
| Error | Condition |
| ----------- | --------------------------------------------------------------------- |
| `WRONGPASS` | Invalid username or password |
| `ERR` | No password is configured but AUTH was called with a single parameter |
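
Clients usually branch on the error code, which is the first word of a RESP error reply. The sketch below splits a raw error line into code and message; `parse_error` is a hypothetical helper, and most client libraries expose the same information through their own exception types.

```python
from typing import Tuple


def parse_error(line: bytes) -> Tuple[str, str]:
    """Split a RESP error line such as b'-WRONGPASS ...' into (code, message)."""
    if not line.startswith(b"-"):
        raise ValueError("not a RESP error reply")
    text = line[1:].rstrip(b"\r\n").decode("utf-8")
    # The code is everything up to the first space; the rest is the message.
    code, _, message = text.partition(" ")
    return code, message


code, message = parse_error(
    b"-WRONGPASS invalid username-password pair or user is disabled.\r\n"
)
print(code)
```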
## Examples
[Section titled “Examples”](#examples)
**Default user authentication:**
```kronotop
127.0.0.1:5484> AUTH mysecretpassword
OK
```
**Named user authentication:**
```kronotop
127.0.0.1:5484> AUTH admin mysecretpassword
OK
```
**Wrong password:**
```kronotop
127.0.0.1:5484> AUTH wrongpassword
(error) WRONGPASS invalid username-password pair or user is disabled.
```
# CLIENT
> Manages client connection properties.
Manages client connection properties.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
CLIENT subcommand [arguments]
```
## Subcommands
[Section titled “Subcommands”](#subcommands)
### CLIENT SETINFO
[Section titled “CLIENT SETINFO”](#client-setinfo)
Sets client library metadata on the current connection.
```kronotop
CLIENT SETINFO attribute value
```
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | --------------------------------------- |
| `attribute` | string | Yes | Attribute name: `lib-name` or `lib-ver` |
| `value` | string | Yes | Attribute value |
### CLIENT SETNAME
[Section titled “CLIENT SETNAME”](#client-setname)
Sets a human-readable name for the current connection.
```kronotop
CLIENT SETNAME name
```
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | --------------- |
| `name` | string | Yes | Connection name |
## Return Value
[Section titled “Return Value”](#return-value)
All subcommands return simple string `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
`CLIENT SETINFO` stores library metadata (`lib-name` or `lib-ver`) on the session. `CLIENT SETNAME` sets the connection name on the session.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
**`ERR`** is returned when:
* Wrong number of arguments for the subcommand.
* Unrecognized attribute for `SETINFO` (not `lib-name` or `lib-ver`).
* Unknown subcommand.
## Examples
[Section titled “Examples”](#examples)
**Set library name:**
```kronotop
127.0.0.1:5484> CLIENT SETINFO lib-name jedis
OK
```
**Set library version:**
```kronotop
127.0.0.1:5484> CLIENT SETINFO lib-ver 4.3.1
OK
```
**Set connection name:**
```kronotop
127.0.0.1:5484> CLIENT SETNAME my-app-connection
OK
```
# COMMAND
> Returns information about registered server commands.
Returns information about registered server commands.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
COMMAND [subcommand [arguments]]
```
## Subcommands
[Section titled “Subcommands”](#subcommands)
### COMMAND
[Section titled “COMMAND”](#command)
Returns information about all registered commands.
```kronotop
COMMAND
```
### COMMAND INFO
[Section titled “COMMAND INFO”](#command-info)
Returns information about one or more specific commands.
```kronotop
COMMAND INFO [command ...]
```
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------------------------------------------------------- |
| `command` | string | No | One or more command names to query (returns all if omitted) |
### COMMAND COUNT
[Section titled “COMMAND COUNT”](#command-count)
Returns the total number of registered commands.
```kronotop
COMMAND COUNT
```
### COMMAND DOCS
[Section titled “COMMAND DOCS”](#command-docs)
Returns command documentation (not yet implemented).
```kronotop
COMMAND DOCS
```
## Return Value
[Section titled “Return Value”](#return-value)
* **COMMAND / COMMAND INFO:** Array of arrays, one per command, each containing:
| Position | Field | Type | Description |
| -------- | ------------------ | ------- | --------------------------------------- |
| 1 | name | string | Command name (lowercase) |
| 2 | arity | integer | Number of arguments |
| 3 | flags | array | Command flags (e.g. `readonly`, `fast`) |
| 4 | first key | integer | Position of the first key argument |
| 5 | last key | integer | Position of the last key argument |
| 6 | step | integer | Key step interval |
| 7 | acl categories | array | ACL categories (prefixed with `@`) |
| 8 | tips | array | Command tips |
| 9 | key specifications | array | Key specification details |
| 10 | subcommands | array | Subcommand info (empty) |
* **COMMAND COUNT:** Integer: total number of commands.
* **COMMAND DOCS:** Empty array (not yet implemented).
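
Because the per-command entry is positional, client code often maps it onto named fields. The sketch below does this with a dataclass whose field order mirrors the table above; the `CommandInfo` type and `to_command_info` helper are hypothetical, and the sample entry assumes the reply has already been decoded into Python lists, strings, and integers.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class CommandInfo:
    # Field order follows positions 1..10 of the reply entry.
    name: str
    arity: int
    flags: List[str]
    first_key: int
    last_key: int
    step: int
    acl_categories: List[str]
    tips: List[Any]
    key_specifications: List[Any]
    subcommands: List[Any]


def to_command_info(entry: List[Any]) -> CommandInfo:
    """Convert one decoded COMMAND / COMMAND INFO entry into a CommandInfo."""
    if len(entry) != 10:
        raise ValueError(f"expected 10 fields, got {len(entry)}")
    return CommandInfo(*entry)


ping = to_command_info(["ping", -1, ["fast"], 0, 0, 0, ["@connection"], [], [], []])
print(ping.name, ping.arity, ping.acl_categories)
```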
## Behavior
[Section titled “Behavior”](#behavior)
Reads command metadata from the server’s command registry. For `COMMAND` and `COMMAND INFO`, builds a detailed array structure per command. `COMMAND COUNT` returns the total count of all registered commands.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
| Error | Condition |
| ----- | ------------------ |
| `ERR` | Unknown subcommand |
## Examples
[Section titled “Examples”](#examples)
**Get command count:**
```kronotop
127.0.0.1:5484> COMMAND COUNT
(integer) 42
```
**Get info for PING:**
```kronotop
127.0.0.1:5484> COMMAND INFO ping
1) 1) "ping"
2) (integer) -1
3) 1) "fast"
4) (integer) 0
5) (integer) 0
6) (integer) 0
7) 1) "@connection"
8) (empty array)
9) (empty array)
10) (empty array)
```
# ECHO
> Echoes back the given message.
Echoes back the given message.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
ECHO message
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------------ |
| `message` | string | Yes | The message to echo back |
## Return Value
[Section titled “Return Value”](#return-value)
Bulk string containing the provided message.
## Behavior
[Section titled “Behavior”](#behavior)
Returns the given message as a bulk string. The message is echoed back exactly as provided.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
No command-specific errors.
## Examples
[Section titled “Examples”](#examples)
```kronotop
127.0.0.1:5484> ECHO "Hello Kronotop"
"Hello Kronotop"
```
# HELLO
> Negotiates the protocol version and optionally authenticates the connection.
Negotiates the protocol version and optionally authenticates the connection.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
HELLO protover [AUTH username password] [SETNAME clientname]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ------------------------ | -------------- | -------- | ------------------------------------------------------ |
| `protover` | integer | Yes | Protocol version to use (`2` for RESP2, `3` for RESP3) |
| `AUTH username password` | string, string | No | Optional inline authentication |
| `SETNAME clientname` | string | No | Optional client connection name |
## Return Value
[Section titled “Return Value”](#return-value)
Returns connection metadata in a format determined by the negotiated protocol version:
* **RESP2:** Array of alternating key-value pairs.
* **RESP3:** Map.
| Field | Type | Description |
| --------- | ------- | -------------------------------- |
| `server` | string | Server product name (`Kronotop`) |
| `version` | string | Server version |
| `proto` | integer | Negotiated protocol version |
| `id` | integer | Client connection ID |
| `mode` | string | Server mode (`cluster`) |
| `role` | string | Server role (`master`) |
| `modules` | array | Loaded modules (empty array) |
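
A RESP2 client receives these fields as a flat array of alternating keys and values. The sketch below folds such an array into a dictionary so that both protocol versions can be handled the same way; `pairs_to_dict` is a hypothetical helper and the input is assumed to be already decoded.

```python
from typing import Any, Dict, List


def pairs_to_dict(flat: List[Any]) -> Dict[Any, Any]:
    """Fold [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of elements")
    # Even indexes hold keys, odd indexes hold the matching values.
    return dict(zip(flat[0::2], flat[1::2]))


reply = ["server", "Kronotop", "version", "0.13", "proto", 2, "id", 1,
         "mode", "cluster", "role", "master", "modules", []]
print(pairs_to_dict(reply)["proto"])
```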
## Behavior
[Section titled “Behavior”](#behavior)
Sets the RESP protocol version for the current session. If `AUTH` is provided, authenticates the connection using the same logic as the `AUTH` command. If `SETNAME` is provided, sets the client connection name.
The new protocol version takes effect before the response is written, so the response itself is encoded using the newly negotiated version.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
| Error | Condition |
| ----------- | ----------------------------------------------- |
| `NOPROTO` | Unsupported protocol version (not 2 or 3) |
| `WRONGPASS` | Invalid username or password in the AUTH clause |
## Examples
[Section titled “Examples”](#examples)
**Switch to RESP3:**
```kronotop
127.0.0.1:5484> HELLO 3
1# "server" => "Kronotop"
2# "version" => "0.13"
3# "proto" => (integer) 3
4# "id" => (integer) 1
5# "mode" => "cluster"
6# "role" => "master"
7# "modules" => (empty array)
```
**Switch to RESP2 with authentication:**
```kronotop
127.0.0.1:5484> HELLO 2 AUTH admin mysecretpassword
1) "server"
2) "Kronotop"
3) "version"
4) "0.13"
5) "proto"
6) (integer) 2
7) "id"
8) (integer) 1
9) "mode"
10) "cluster"
11) "role"
12) "master"
13) "modules"
14) (empty array)
```
**Set client name:**
```kronotop
127.0.0.1:5484> HELLO 3 SETNAME my-app
1# "server" => "Kronotop"
2# "version" => "0.13"
3# "proto" => (integer) 3
4# "id" => (integer) 1
5# "mode" => "cluster"
6# "role" => "master"
7# "modules" => (empty array)
```
# INFO
> Returns server information and statistics.
Returns server information and statistics.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
INFO [section ...]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | --------------------------------------------------------- |
| `section` | string | No | One or more section names to filter (not yet implemented) |
## Return Value
[Section titled “Return Value”](#return-value)
Bulk string containing server information formatted as `key:value` pairs grouped under `# Section` headers.
Currently returned sections:
| Section | Fields |
| --------- | -------------------------------------- |
| `Server` | `kronotop_version`, `redis_mode`, `os` |
| `Cluster` | `cluster_enabled` |
## Behavior
[Section titled “Behavior”](#behavior)
Builds and returns a bulk string with server metadata. Each section is prefixed with a `# SectionName` header, followed by `key:value` lines.
Section filtering is not yet implemented. All sections are returned regardless of arguments.
This command does not require the cluster to be initialized.
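The `# Section` and `key:value` layout is straightforward to parse on the client side. The sketch below turns the bulk string into a nested dictionary; `parse_info` is a hypothetical helper, and it assumes keys never contain a colon.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INFO output into {section: {key: value}}."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A header line opens a new section.
            current = sections.setdefault(line[1:].strip(), {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key] = value
    return sections


sample = (
    "# Server\r\n"
    "kronotop_version:0.13\r\n"
    "redis_mode:cluster\r\n"
    "os:Linux 5.15.0 amd64\r\n"
    "# Cluster\r\n"
    "cluster_enabled:1\r\n"
)
print(parse_info(sample)["Server"]["kronotop_version"])
```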
## Errors
[Section titled “Errors”](#errors)
No command-specific errors.
## Examples
[Section titled “Examples”](#examples)
```kronotop
127.0.0.1:5484> INFO
# Server
kronotop_version:0.13
redis_mode:cluster
os:Linux 5.15.0 amd64
# Cluster
cluster_enabled:1
```
# PING
> Returns PONG or echoes back the given message.
Returns PONG or echoes back the given message.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
PING [message]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------------------------- |
| `message` | string | No | Optional message to echo back |
## Return Value
[Section titled “Return Value”](#return-value)
* **Without message:** Simple string `PONG`.
* **With message:** Bulk string containing the provided message.
## Behavior
[Section titled “Behavior”](#behavior)
If a non-empty message is provided, returns it as a bulk string. Otherwise, returns the simple string `PONG`.
This command does not require the cluster to be initialized.
## Errors
[Section titled “Errors”](#errors)
No command-specific errors.
## Examples
[Section titled “Examples”](#examples)
**Without message:**
```kronotop
127.0.0.1:5484> PING
PONG
```
**With message:**
```kronotop
127.0.0.1:5484> PING "hello world"
"hello world"
```
# TIME
> Returns the current server time.
Returns the current wall-clock time of the server with microsecond resolution, read from the operating system’s realtime clock.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
TIME
```
Takes no arguments.
## Return Value
[Section titled “Return Value”](#return-value)
Array of two bulk strings:
1. Unix timestamp in seconds
2. Microseconds already elapsed in the current second
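
The two elements combine into a single timestamp. The sketch below converts a decoded reply into a timezone-aware `datetime` in UTC; `time_reply_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def time_reply_to_datetime(seconds: str, micros: str) -> datetime:
    """Combine the two TIME elements into an aware UTC datetime."""
    # Whole seconds since the Unix epoch, then the sub-second part.
    base = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return base.replace(microsecond=int(micros))


moment = time_reply_to_datetime("1781119691", "14102")
print(moment.isoformat())
```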
## Behavior
[Section titled “Behavior”](#behavior)
The value comes from the system clock of the node that handles the connection. Different nodes can report different times.
Wall-clock time is not monotonic. It can move backwards on clock adjustments. Callers that need a monotonic value should use [TICK](/docs/transactions/commands/tick/) instead.
This command does not require the cluster to be initialized. It does not require an active transaction.
## Errors
[Section titled “Errors”](#errors)
No command-specific errors.
## Examples
[Section titled “Examples”](#examples)
```kronotop
> TIME
1) "1781119691"
2) "14102"
```
# Namespaces
> Namespaces are lightweight logical databases built on FoundationDB's directory layer.
## Overview
[Section titled “Overview”](#overview)
Namespaces are lightweight logical databases built on FoundationDB’s [directory layer](https://apple.github.io/foundationdb/developer-guide.html#directories). They provide complete data isolation between tenants, applications, or environments with zero runtime overhead. The directory layer maps hierarchical paths to short binary prefixes at open time, so namespace resolution adds no cost to subsequent operations.
Every data structure created within a namespace (Buckets and ZMaps) is fully isolated from data in other namespaces.
## Default Namespace
[Section titled “Default Namespace”](#default-namespace)
Every session starts in the `global` namespace. This is the default namespace configured at the cluster level and cannot be removed or purged. If a client never issues a `NAMESPACE USE` command, all operations execute within `global`.
## Hierarchical Organization
[Section titled “Hierarchical Organization”](#hierarchical-organization)
Namespaces are identified by dot-separated hierarchical paths, analogous to directories in a filesystem. Creating a namespace like `production.users.api` automatically creates the intermediate directories `production` and `production.users` in FoundationDB’s directory layer.
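To make the implicit intermediate directories concrete, the sketch below lists the ancestor paths of a dot-separated namespace. It only manipulates strings and does not talk to FoundationDB; `intermediate_paths` is a hypothetical helper name.

```python
from typing import List


def intermediate_paths(namespace: str) -> List[str]:
    """Return every proper ancestor of a dot-separated namespace path."""
    segments = namespace.split(".")
    # Prefixes of length 1 .. n-1; the full path itself is excluded.
    return [".".join(segments[:i]) for i in range(1, len(segments))]


print(intermediate_paths("production.users.api"))
# ['production', 'production.users']
```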
This hierarchy is useful for organizing data by environment, team, or service boundary. For example, a set of microservices might use:
* `production.users`
* `production.orders`
* `production.products`
Each namespace is fully isolated: data stored in `production.users` is invisible to queries running in `production.orders`.
## Session Scoping
[Section titled “Session Scoping”](#session-scoping)
A client session is always bound to exactly one namespace at a time. The `NAMESPACE USE` command switches the active namespace for the current session. All subsequent commands (queries, inserts, index operations) operate within the selected namespace until the session ends or another `NAMESPACE USE` is issued.
```kronotop
> NAMESPACE USE production.orders
OK
> NAMESPACE CURRENT
production.orders
```
Different sessions connected to the same cluster can operate in different namespaces concurrently.
## Cross-Namespace Transactions
[Section titled “Cross-Namespace Transactions”](#cross-namespace-transactions)
`NAMESPACE USE` can be called inside an active transaction. The underlying FoundationDB transaction spans every namespace touched during the session, so `COMMIT` atomically applies all changes across namespaces and `ROLLBACK` discards them all.
This works because `BEGIN` binds a single FoundationDB transaction to the session, while `NAMESPACE USE` only updates which namespace subsequent commands target. It does not create a new transaction. Each command reads the current namespace at execution time, so switching namespaces mid-transaction simply routes the next operations to a different namespace within the same transaction.
```kronotop
> NAMESPACE USE production.sales
OK
> BEGIN
OK
> BUCKET.INSERT orders DOCS '{"item": "keyboard", "qty": 2}'
...
> NAMESPACE USE production.inventory
OK
> BUCKET.INSERT stock DOCS '{"item": "keyboard", "delta": -2}'
...
> COMMIT
OK
```
In this example, the insert into `production.sales` and the insert into `production.inventory` are committed as a single atomic operation. If either fails, neither write is applied.
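The mechanism can be pictured with a small in-memory model: the session holds one pending write set, and the active namespace is just a field consulted when each command runs. This is a toy illustration only. `ToySession` and its methods are hypothetical and say nothing about how Kronotop or FoundationDB are implemented.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (namespace, bucket)
Write = Tuple[str, str, dict]  # (namespace, bucket, document)


class ToySession:
    def __init__(self, store: Dict[Key, List[dict]]) -> None:
        self.store = store
        self.namespace = "global"
        self.pending: Optional[List[Write]] = None  # None means auto-commit

    def use(self, namespace: str) -> None:
        # Only changes where later commands are routed; the transaction is untouched.
        self.namespace = namespace

    def begin(self) -> None:
        self.pending = []

    def insert(self, bucket: str, doc: dict) -> None:
        # The namespace is read at execution time.
        write = (self.namespace, bucket, doc)
        if self.pending is None:
            self._apply([write])
        else:
            self.pending.append(write)

    def commit(self) -> None:
        self._apply(self.pending or [])
        self.pending = None

    def rollback(self) -> None:
        self.pending = None

    def _apply(self, writes: List[Write]) -> None:
        for namespace, bucket, doc in writes:
            self.store.setdefault((namespace, bucket), []).append(doc)


store: Dict[Key, List[dict]] = {}
session = ToySession(store)
session.use("production.sales")
session.begin()
session.insert("orders", {"item": "keyboard", "qty": 2})
session.use("production.inventory")
session.insert("stock", {"item": "keyboard", "delta": -2})
session.commit()
print(sorted(store))
```

Nothing reaches `store` until `commit()` runs, and `rollback()` drops the writes for both namespaces together.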
## Two-Phase Removal
[Section titled “Two-Phase Removal”](#two-phase-removal)
Deleting a namespace in a distributed system requires coordination. A naive single-step delete could destroy data while background workers (index maintenance, replication) or other cluster members still hold cached references to the namespace. Kronotop therefore splits deletion into two phases:
1. **`NAMESPACE REMOVE`**: Marks the namespace as logically removed. A `NamespaceRemovedEvent` is published to the cluster journal. Each member that consumes this event invalidates caches, closes sessions bound to the namespace, and shuts down related workers.
2. **`NAMESPACE PURGE`**: Permanently deletes the FoundationDB directory. Before proceeding, the command enforces a **distributed sync barrier** that verifies every alive cluster member has observed the removal event. If the barrier is not satisfied, the command returns `BARRIERNOTSATISFIED` and should be retried. Typically a single retry is sufficient.
This two-phase approach guarantees that no cluster member references a namespace that has been physically deleted.
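The retry on `BARRIERNOTSATISFIED` can be sketched as a small loop around the purge call. Everything here is an assumption about the client side: `send` stands for whatever function the client library uses to run a command, `CommandError` is a hypothetical exception whose message starts with the server's error code, and the attempt count and delay are arbitrary.

```python
import time
from typing import Callable


class CommandError(Exception):
    """Hypothetical client-side error; the message starts with the error code."""


def purge_with_retry(send: Callable[..., str], namespace: str,
                     attempts: int = 5, delay: float = 0.5) -> str:
    """Run NAMESPACE PURGE, retrying while the sync barrier is not satisfied."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send("NAMESPACE", "PURGE", namespace)
        except CommandError as exc:
            retryable = str(exc).startswith("BARRIERNOTSATISFIED")
            if not retryable or attempt == attempts:
                raise
            # Give the remaining members time to observe the removal event.
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Any other error is re-raised immediately, so a purge of a namespace that was never removed fails on the first attempt instead of looping.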
## Renaming
[Section titled “Renaming”](#renaming)
`NAMESPACE MOVE` renames a namespace by relocating its FoundationDB directory from the old path to a new path. After the move, a **tombstone** is written under the old name. The tombstone acts as a barrier: `NAMESPACE CREATE` on the old name is blocked until every alive cluster member has observed the move event. This prevents a member that still caches the old namespace from serving stale data under a newly created namespace with the same name.
Once all members have observed the tombstone, it is automatically cleaned up.
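As a rough mental model, the tombstone can be thought of as a record of which alive members have observed the move, with creation on the old name allowed only once that set is complete. The sketch below is a toy model under that assumption. `ToyTombstone` is hypothetical and does not reflect how Kronotop actually stores or checks tombstones.

```python
from typing import Iterable, Set


class ToyTombstone:
    def __init__(self, alive_members: Iterable[str]) -> None:
        self.alive: Set[str] = set(alive_members)
        self.observed: Set[str] = set()

    def observe(self, member: str) -> None:
        # Called when a member has consumed the move event.
        self.observed.add(member)

    def barrier_satisfied(self) -> bool:
        # Every alive member must have observed the move.
        return self.alive <= self.observed


tombstone = ToyTombstone(["node-a", "node-b"])
tombstone.observe("node-a")
print(tombstone.barrier_satisfied())  # False: node-b has not observed the move
tombstone.observe("node-b")
print(tombstone.barrier_satisfied())  # True: the old name can be reused
```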
## Reserved Names
[Section titled “Reserved Names”](#reserved-names)
The name `__internal__` is reserved at any level of the namespace hierarchy. Commands that accept a namespace path will reject any path containing `__internal__` as a segment.
## Commands
[Section titled “Commands”](#commands)
| Command | Description |
| ----------------------------------------------------------------- | -------------------------------------- |
| [NAMESPACE CREATE](/docs/namespaces/commands/namespace-create/) | Create a new namespace |
| [NAMESPACE REMOVE](/docs/namespaces/commands/namespace-remove/) | Mark a namespace for logical removal |
| [NAMESPACE PURGE](/docs/namespaces/commands/namespace-purge/) | Permanently delete a removed namespace |
| [NAMESPACE MOVE](/docs/namespaces/commands/namespace-move/) | Rename a namespace |
| [NAMESPACE USE](/docs/namespaces/commands/namespace-use/) | Switch the session to a namespace |
| [NAMESPACE CURRENT](/docs/namespaces/commands/namespace-current/) | Show the session’s active namespace |
| [NAMESPACE EXISTS](/docs/namespaces/commands/namespace-exists/) | Check whether a namespace exists |
| [NAMESPACE LIST](/docs/namespaces/commands/namespace-list/) | List child namespaces under a path |
# NAMESPACE CREATE
> Creates a new namespace with the given hierarchical path.
Creates a new namespace with the given hierarchical path.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
NAMESPACE CREATE namespace
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ---------------------------------------------------------------------------- |
| `namespace` | string | Yes | Dot-separated hierarchical path for the namespace (e.g. `production.users`). |
## Return Value
[Section titled “Return Value”](#return-value)
Returns `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
Namespaces are hierarchical paths separated by dots (`.`). Creating a namespace like `production.users.api` automatically creates intermediate directories (`production` and `production.users`) in the FoundationDB directory layer.
The command uses an isolated one-off transaction to prevent consistency issues.
Before creating the namespace, a tombstone barrier check is performed. If the namespace was previously moved via `NAMESPACE MOVE` and not all cluster members have observed the tombstone, the creation is rejected. This prevents stale reads on members that still reference the old namespace path.
The maximum namespace depth is 10. For example, `a.b.c.d.e.f.g.h.i.j` is the deepest allowed path.
The `__internal__` name is reserved at any level of the hierarchy and cannot be used.
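A client can catch the two structural problems before sending the command. The sketch below checks the depth limit and the reserved segment; `validate_namespace` is a hypothetical helper, and rejecting empty segments is an additional assumption that the text above does not state. The server remains the authority on what is accepted.

```python
MAX_DEPTH = 10
RESERVED = "__internal__"


def validate_namespace(path: str) -> None:
    """Raise ValueError if the path breaks the depth or reserved-name rule."""
    segments = path.split(".")
    if any(segment == "" for segment in segments):
        # Assumption: empty segments such as 'a..b' are not meaningful.
        raise ValueError(f"empty segment in namespace path: {path!r}")
    if RESERVED in segments:
        raise ValueError(f"{RESERVED} is reserved at any level: {path!r}")
    if len(segments) > MAX_DEPTH:
        raise ValueError(f"depth {len(segments)} exceeds maximum of {MAX_DEPTH}")


validate_namespace("a.b.c.d.e.f.g.h.i.j")  # exactly 10 levels, accepted
```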
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `NAMESPACEALREADYEXISTS` | A namespace with the same path already exists. |
| `NAMESPACEBEINGREMOVED` | The namespace was previously removed via `NAMESPACE REMOVE` but has not yet been purged via `NAMESPACE PURGE`. |
| `ERR` | The namespace path contains the reserved `__internal__` segment, the namespace depth exceeds the maximum allowed depth of 10, or the tombstone barrier is not satisfied after a prior `NAMESPACE MOVE`. |
## Examples
[Section titled “Examples”](#examples)
**Create a namespace:**
```kronotop
> NAMESPACE CREATE production.users
OK
```
**Duplicate namespace:**
```kronotop
> NAMESPACE CREATE production.users
OK
> NAMESPACE CREATE production.users
(error) NAMESPACEALREADYEXISTS Namespace already exists: production.users
```
**Namespace being removed:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE CREATE staging.orders
(error) NAMESPACEBEINGREMOVED Namespace 'staging.orders' is being removed
```
**Reserved name:**
```kronotop
> NAMESPACE CREATE name.__internal__
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
**Tombstone barrier isn’t satisfied:**
```kronotop
> NAMESPACE CREATE old-namespace
(error) ERR Not all cluster members have observed the tombstone for namespace 'old-namespace'
```
# NAMESPACE CURRENT
> Returns the active namespace for the current session.
Returns the active namespace for the current session.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
NAMESPACE CURRENT
```
## Parameters
[Section titled “Parameters”](#parameters)
None.
## Return Value
[Section titled “Return Value”](#return-value)
Bulk string: the dot-separated namespace path currently active in the session.
## Behavior
[Section titled “Behavior”](#behavior)
Every new session starts with the default namespace configured via `default_namespace` in the cluster configuration. The active namespace can be changed with `NAMESPACE USE`.
`NAMESPACE CURRENT` reads the active namespace from the session attributes and returns it as a bulk string.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ERR` | The current namespace is null, empty, or blank. This should not occur under normal operation since sessions are initialized with the default namespace. |
## Examples
[Section titled “Examples”](#examples)
**Return the default namespace:**
```kronotop
> NAMESPACE CURRENT
global
```
**Return the namespace after switching:**
```kronotop
> NAMESPACE USE production.users
OK
> NAMESPACE CURRENT
production.users
```
# NAMESPACE EXISTS
> Checks whether a namespace exists.
Checks whether a namespace exists.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
NAMESPACE EXISTS namespace
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ---------------------------------------------------------------------------- |
| `namespace` | string | Yes | Dot-separated hierarchical path for the namespace (e.g. `production.users`). |
## Return Value
[Section titled “Return Value”](#return-value)
Integer: `1` if the namespace exists, `0` if it does not.
## Behavior
[Section titled “Behavior”](#behavior)
The command checks the FoundationDB directory layer for the given namespace path. It uses an isolated one-off transaction.
If the directory entry exists but the namespace is marked for removal (`NAMESPACE REMOVE`), the command raises a `NAMESPACEBEINGREMOVED` error rather than returning `1`. A namespace pending removal is not considered to exist.
The `__internal__` reserved name is rejected at parse time.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------------- | -------------------------------------------------------------------------------------------------------------- |
| `NAMESPACEBEINGREMOVED` | The namespace was previously removed via `NAMESPACE REMOVE` but has not yet been purged via `NAMESPACE PURGE`. |
| `ERR` | The namespace path contains the reserved `__internal__` segment. |
## Examples
[Section titled “Examples”](#examples)
**Namespace exists:**
```kronotop
> NAMESPACE CREATE production.users
OK
> NAMESPACE EXISTS production.users
(integer) 1
```
**Namespace does not exist:**
```kronotop
> NAMESPACE EXISTS production.orders
(integer) 0
```
**Reserved name:**
```kronotop
> NAMESPACE EXISTS name.__internal__
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
**Namespace being removed:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE EXISTS staging.orders
(error) NAMESPACEBEINGREMOVED Namespace 'staging.orders' is being removed
```
# NAMESPACE LIST
> Lists the child namespaces under a given path or lists root-level namespaces when no path is provided.
Lists the child namespaces under a given path or lists root-level namespaces when no path is provided.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
NAMESPACE LIST [namespace]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `namespace` | string | No | Dot-separated hierarchical path to list children of (e.g. `production.users`). When omitted, root-level namespaces are listed. |
## Return Value
[Section titled “Return Value”](#return-value)
Array of bulk strings: each element is the name of a child namespace. Returns an empty array when no children exist.
## Behavior
[Section titled “Behavior”](#behavior)
The command opens an isolated one-off transaction against the FoundationDB directory layer.
When called without arguments, it lists all root-level namespaces. When called with a namespace path, it lists the immediate children of that path.
The reserved `__internal__` namespace is automatically filtered from the results and never appears in the output.
If the cluster has not been initialized yet and no path is provided, an empty array is returned.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ----------------- | ------------------------------------------------------------- |
| `NOSUCHNAMESPACE` | The given namespace path does not exist. |
| `ERR` | The namespace path contains the reserved `__internal__` segment. |
## Examples
[Section titled “Examples”](#examples)
**List root-level namespaces:**
```kronotop
> NAMESPACE LIST
1) "global"
```
**List children of a namespace:**
```kronotop
> NAMESPACE CREATE production.users
OK
> NAMESPACE CREATE production.orders
OK
> NAMESPACE LIST production
1) "users"
2) "orders"
```
**List children of a leaf namespace (no children):**
```kronotop
> NAMESPACE CREATE production.users
OK
> NAMESPACE LIST production.users
(empty array)
```
**Non-existent namespace:**
```kronotop
> NAMESPACE LIST nonexistent
(error) NOSUCHNAMESPACE No such namespace: 'nonexistent'
```
**Reserved name:**
```kronotop
> NAMESPACE LIST name.__internal__
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
# NAMESPACE MOVE
> Renames a namespace by moving its FoundationDB directory from the old path to the new path.
## Syntax
```kronotop
NAMESPACE MOVE <old-namespace> <new-namespace>
```
## Parameters
| Parameter | Type | Required | Description |
| --------------- | ------ | -------- | ----------------------------------------------------------------------------------------- |
| `old-namespace` | string | Yes | Dot-separated hierarchical path of the source namespace (e.g. `staging.orders`). |
| `new-namespace` | string | Yes | Dot-separated hierarchical path for the destination namespace (e.g. `production.orders`). |
## Return Value
Returns `OK` on success.
## Behavior
The command moves a namespace from `old-namespace` to `new-namespace` within the FoundationDB directory layer. The operation uses an isolated one-off transaction to prevent consistency issues.
Before performing the move, the command checks whether the old namespace is marked for removal. If it is, the operation is rejected.
The `__internal__` name is reserved at any level of the hierarchy and cannot be used in either the old or new path.
### Tombstone and barrier
After moving the directory, a **tombstone** with a unique token is written under the old namespace name. This marks the old path as “recently moved.”
A cluster-wide event is published to the journal. Every cluster member that consumes this event:
1. Invalidates its bucket metadata cache entries keyed under the old namespace
2. Invalidates Bucket plan cache entries keyed under the old namespace
3. Removes the old namespace from all sessions’ open-namespaces set
4. Acknowledges the tombstone
The tombstone acts as a **barrier**: `NAMESPACE CREATE` on the old name is blocked until every alive cluster member has observed the tombstone. This prevents a member that still caches the old namespace from serving stale data under a newly created namespace with the same name.
Once all alive members have observed the tombstone, it is automatically cleaned up on the next barrier check. If any alive member has not yet observed it, the barrier check re-publishes the event to nudge lagging members.
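The barrier check can be pictured with a small in-memory model. This is an illustrative sketch only, not Kronotop's implementation: the `Tombstone` class, its fields, and the `republish` callback are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Tombstone:
    """Toy model of the tombstone left under a moved namespace's old name."""
    token: str
    observed_by: Set[str] = field(default_factory=set)

    def acknowledge(self, member_id: str) -> None:
        # Called by a member after it has invalidated its caches.
        self.observed_by.add(member_id)


def barrier_satisfied(tombstone: Tombstone, alive_members: Set[str],
                      republish: Callable[[], None]) -> bool:
    """Return True when every alive member has observed the tombstone.

    When some member is lagging, re-publish the event and keep the barrier up.
    """
    lagging = alive_members - tombstone.observed_by
    if lagging:
        republish()
        return False
    return True
```

In this model only members in `alive_members` are counted, which mirrors the rule that the barrier waits for alive members rather than for every member ever registered.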
## Errors
| Error Code | Description |
| ------------------------ | ---------------------------------------------------------------------------------------------- |
| `NOSUCHNAMESPACE` | The source namespace does not exist. |
| `NAMESPACEALREADYEXISTS` | The destination namespace already exists. |
| `NAMESPACEBEINGREMOVED` | The source namespace is marked for removal via `NAMESPACE REMOVE` and has not yet been purged. |
| `ERR` | The namespace path contains the reserved `__internal__` name. |
## Examples
**Move a namespace:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE MOVE staging.orders production.orders
OK
```
**Non-existent source:**
```kronotop
> NAMESPACE MOVE non.existent production.orders
(error) NOSUCHNAMESPACE No such namespace: 'non.existent'
```
**Destination already exists:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE CREATE production.orders
OK
> NAMESPACE MOVE staging.orders production.orders
(error) NAMESPACEALREADYEXISTS Namespace already exists: production.orders
```
**Namespace being removed:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE MOVE staging.orders production.orders
(error) NAMESPACEBEINGREMOVED Namespace 'staging.orders' is being removed
```
**Reserved name:**
```kronotop
> NAMESPACE MOVE name.__internal__ production.data
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
# NAMESPACE PURGE
> Permanently deletes a namespace and its FoundationDB directory (hard delete).
Permanently deletes a namespace and its FoundationDB directory (hard delete). This is the second phase of namespace deletion, following `NAMESPACE REMOVE`.
## Syntax
```kronotop
NAMESPACE PURGE <namespace>
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `namespace` | string | Yes | Dot-separated hierarchical path of the namespace to purge (e.g. `staging.orders`). The namespace must be marked for removal first using `NAMESPACE REMOVE`. |
## Return Value
Returns `OK` on success.
## Behavior
The command enforces a **distributed sync barrier** before physically deleting the namespace. This barrier waits for all alive cluster members to confirm they have observed the namespace’s “removed” status (set by `NAMESPACE REMOVE`).
The barrier polls up to 20 times at 250 ms intervals (about 5 seconds in total). If all members have observed the removal within that window, the namespace’s FoundationDB directory is permanently deleted.
The barrier mechanism prevents data races in a distributed environment:
* Background workers (index maintenance, replication) may still be processing the namespace
* Other cluster nodes may have pending operations or cached references
* Without coordination, purging could cause errors or inconsistent state
If any member has not yet observed the removal, the barrier fails with `BARRIERNOTSATISFIED`. When this happens, the command automatically publishes a namespace-removed event to accelerate propagation, and you should retry the purge. In most cases, a single retry is sufficient.
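The retry advice can be wrapped in a small client-side helper. The sketch below is written under stated assumptions: `execute` is a hypothetical callable that sends one command and raises `RuntimeError` carrying the server's error text when the reply is an error, and the attempt count and delay are arbitrary illustrative values.

```python
import time
from typing import Callable


def purge_namespace(execute: Callable[..., str], namespace: str,
                    max_attempts: int = 3, delay_seconds: float = 0.5) -> int:
    """Run NAMESPACE PURGE, retrying only on BARRIERNOTSATISFIED.

    Returns the number of attempts used. Any other error is re-raised as is.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            execute("NAMESPACE", "PURGE", namespace)
            return attempt
        except RuntimeError as err:
            retryable = str(err).startswith("BARRIERNOTSATISFIED")
            if not retryable or attempt == max_attempts:
                raise
            # The failed attempt already published the namespace-removed
            # event, so give lagging members a moment to observe it.
            time.sleep(delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

Errors such as `NOSUCHNAMESPACE` are deliberately not retried, since repeating the command cannot change their outcome.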
The default namespace (configured via `default_namespace`) cannot be purged.
### Hierarchical deletion
Purging a parent namespace (e.g. `a.b`) permanently deletes all child namespaces (e.g. `a.b.c`, `a.b.c.d`) along with it.
### Volume data cleanup
`NAMESPACE PURGE` deletes the namespace’s FoundationDB directory but does **not** clean up volume data. The bytes on disk and any orphaned prefix references remain until explicitly reclaimed. After purging, run `VOLUME.ADMIN MARK-STALE-PREFIXES START` on the affected volumes to clean up orphaned references. When possible, delete buckets individually with `BUCKET.REMOVE` followed by `BUCKET.PURGE` before dropping the namespace: `BUCKET.PURGE` handles prefix cleanup inline, so no separate stale-prefix scan is needed. See the [Volume Operations Guide](/docs/volume/operations-guide/) for the full procedure.
### Transaction conflict handling
If the underlying FoundationDB transaction fails due to a conflict (error code 1020), the operation is automatically retried.
## Errors
| Error Code | Description |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `ERR`                 | The namespace is not marked for removal, it is the default namespace, or its path contains the reserved `__internal__` name.                         |
| `BARRIERNOTSATISFIED` | Not all cluster members have observed the removal. Retry the command. |
## Examples
**Permanently delete a removed namespace:**
```kronotop
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE PURGE staging.orders
OK
```
**Attempting to purge without removing first:**
```kronotop
> NAMESPACE PURGE staging.orders
(error) ERR Namespace 'staging.orders' must be logically removed before purge
```
**Default namespace:**
```kronotop
> NAMESPACE PURGE global
(error) ERR Cannot purge the default namespace: 'global'
```
**Handling barrier not satisfied:**
```kronotop
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE PURGE staging.orders
(error) BARRIERNOTSATISFIED Barrier not satisfied: not all members observed version ...
> NAMESPACE PURGE staging.orders
OK
```
# NAMESPACE REMOVE
> Marks a namespace for logical removal without deleting its FoundationDB directory.
Marks a namespace for logical removal without deleting its FoundationDB directory. Physical deletion requires a subsequent `NAMESPACE PURGE`.
## Syntax
```kronotop
NAMESPACE REMOVE <namespace>
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ----------------------------------------------------------------------------------- |
| `namespace` | string | Yes | Dot-separated hierarchical path of the namespace to remove (e.g. `staging.orders`). |
## Return Value
Returns `OK` on success.
## Behavior
The command sets a `removed` flag on the namespace metadata. The FoundationDB directory is **not** deleted. This is a logical removal only. To physically delete the directory, run `NAMESPACE PURGE` after the removal has been observed by all cluster members.
The command uses an isolated one-off transaction to prevent consistency issues. If the transaction fails due to a conflict (error code 1020), it is automatically retried.
The default namespace (configured via `default_namespace`) cannot be removed.
The `__internal__` name is reserved at any level of the hierarchy and cannot be used.
### Hierarchical removal
Removing a parent namespace (e.g. `a.b`) affects all child namespaces (e.g. `a.b.c`, `a.b.c.d`). The children inherit the removed state from their parent.
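Inheritance of the removed state amounts to a prefix check on the dot-separated path. The following toy model illustrates the rule; the `removed` set and both function names are hypothetical, and how Kronotop actually stores and resolves the flag is not shown on this page.

```python
from typing import Iterable, Set


def ancestors_and_self(path: str) -> Iterable[str]:
    """Yield "a", "a.b", "a.b.c" for the path "a.b.c"."""
    parts = path.split(".")
    for depth in range(1, len(parts) + 1):
        yield ".".join(parts[:depth])


def is_effectively_removed(path: str, removed: Set[str]) -> bool:
    """True when `path` or any of its ancestors carries the removed flag."""
    return any(prefix in removed for prefix in ancestors_and_self(path))
```

Comparing whole path segments instead of raw string prefixes matters here: with `a.b` removed, `a.b.c` is affected but a sibling named `a.bc` is not.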
### Cluster-wide side effects
A cluster-wide event is published to the journal. Every cluster member that consumes this event:
1. Invalidates its bucket metadata cache entries keyed under the namespace
2. Invalidates plan cache entries keyed under the namespace
3. Removes the namespace from all sessions’ open-namespaces set
4. Shuts down workers registered for the namespace
5. Records the observed namespace version once workers terminate (barrier for `NAMESPACE PURGE`)
## Errors
| Error Code | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------ |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace is already marked for removal. |
| `ERR`                   | The namespace is the default namespace, or its path contains the reserved `__internal__` name.               |
## Examples
**Remove a namespace:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
```
**Non-existent namespace:**
```kronotop
> NAMESPACE REMOVE non.existent
(error) NOSUCHNAMESPACE No such namespace: 'non.existent'
```
**Already marked for removal:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE REMOVE staging.orders
(error) NAMESPACEBEINGREMOVED Namespace 'staging.orders' is being removed
```
**Default namespace:**
```kronotop
> NAMESPACE REMOVE global
(error) ERR Cannot remove the default namespace: 'global'
```
**Reserved name:**
```kronotop
> NAMESPACE REMOVE name.__internal__
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
# NAMESPACE USE
> Switches the current session to a namespace.
## Syntax
```kronotop
NAMESPACE USE <namespace>
```
## Parameters
| Parameter | Type | Required | Description |
| ----------- | ------ | -------- | ---------------------------------------------------------------------------- |
| `namespace` | string | Yes | Dot-separated hierarchical path for the namespace (e.g. `production.users`). |
## Return Value
Simple string: `OK` on success.
## Behavior
The command checks whether the given namespace exists in the FoundationDB directory layer using an isolated one-off transaction. If the namespace exists and is not marked for removal, the session’s active namespace is updated to the given path. All subsequent commands in the session will operate in that namespace until changed again.
Every new session starts with the default namespace configured via `default_namespace` in the cluster configuration.
The `__internal__` reserved name is rejected at parse time.
## Errors
| Error Code | Description |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `NOSUCHNAMESPACE` | The namespace does not exist. |
| `NAMESPACEBEINGREMOVED` | The namespace (or one of its ancestors) was previously removed via `NAMESPACE REMOVE` but has not yet been purged via `NAMESPACE PURGE`. |
| `ERR` | The namespace path contains the reserved `__internal__` leaf. |
## Examples
**Switch to a namespace:**
```kronotop
> NAMESPACE CREATE production.users
OK
> NAMESPACE USE production.users
OK
```
**Non-existent namespace:**
```kronotop
> NAMESPACE USE non.existing.namespace
(error) NOSUCHNAMESPACE No such namespace: 'non.existing.namespace'
```
**Reserved name:**
```kronotop
> NAMESPACE USE name.__internal__
(error) ERR Namespace 'name.__internal__' is reserved for internal use
```
**Namespace being removed:**
```kronotop
> NAMESPACE CREATE staging.orders
OK
> NAMESPACE REMOVE staging.orders
OK
> NAMESPACE USE staging.orders
(error) NAMESPACEBEINGREMOVED Namespace 'staging.orders' is being removed
```
# Sessions
> Every TCP connection to Kronotop creates exactly one session.
## Overview
Every TCP connection to Kronotop creates exactly one session. The session holds all per-client state: configuration attributes, query cursors, and the active FoundationDB transaction. When the connection closes, the session is destroyed and all of its state is discarded.
`SESSION.CLOSE` provides a way to reset the session without dropping the underlying connection. This is useful for connection-pooling scenarios or for returning a session to a known-good state between logical units of work.
## Session State
A session tracks the following categories of state:
* **Configuration attributes**: reply format, input format, result limit, object ID format
* **Active FoundationDB transaction**, at most one at a time, with its post-commit hooks and version counter
* **Query cursors**, three independent pools for read, delete, and update operations
* **Cursor ID counter**, a monotonically increasing integer scoped to the session
All of this state is connection-scoped and invisible to other sessions.
## Session Attributes
Four configurable attributes control how the session processes commands and formats responses:
| Attribute | Type | Default | Valid Values | Description |
| ------------------ | ------- | ------- | ------------ | ------------------------------------------------- |
| `reply_type` | enum | bson | bson, json | Data interchange format for responses |
| `input_type` | enum | bson | bson, json | Data interchange format for inputs |
| `limit` | integer | 100 | > 0 | Maximum entries returned per query response |
| `object_id_format` | enum | bytes | bytes, hex | Encoding format for object ID values in responses |
Use `SESSION.ATTRIBUTE LIST` to view current values and `SESSION.ATTRIBUTE SET` to change them:
```kronotop
> SESSION.ATTRIBUTE LIST
1# reply_type => bson
2# input_type => bson
3# limit => (integer) 100
4# object_id_format => bytes
> SESSION.ATTRIBUTE SET limit 50
OK
```
Attributes are reset to defaults by `SESSION.CLOSE` or when the connection closes.
## Cursors
Queries do not return all matching documents at once. Instead, the first response includes a batch of results together with a `cursor_id`. The client uses `BUCKET.ADVANCE` with that cursor ID to fetch subsequent batches until the result set is exhausted.
```kronotop
> BUCKET.QUERY users '{"age": {"$gt": 18}}'
1# cursor_id => (integer) 1
2# entries => [ ... first batch ... ]
> BUCKET.ADVANCE QUERY 1
1# cursor_id => (integer) 1
2# entries => [ ... next batch ... ]
```
Cursor IDs are integers that start at 1 and increase monotonically within the session. Three independent cursor pools exist, one each for read, delete, and update operations, so a read cursor and a delete cursor may share the same numeric ID without a conflict.
`BUCKET.CLOSE` releases a specific cursor before it is naturally exhausted. `SESSION.CLOSE` releases all cursors at once.
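The query-then-advance loop can be hidden behind a generator. This is a sketch under loudly stated assumptions: `execute` is a hypothetical callable that returns each reply as a dict with `cursor_id` and `entries` keys, and an empty `entries` list is taken to mean the result set is exhausted. The exact exhaustion signal is not specified on this page, so check it against the command reference before relying on it.

```python
from typing import Any, Callable, Dict, Iterator


def iter_query(execute: Callable[..., Dict[str, Any]], bucket: str,
               query: str) -> Iterator[Any]:
    """Yield every entry matching `query`, fetching one batch at a time.

    ASSUMPTION: an empty "entries" list marks the end of the result set.
    """
    reply = execute("BUCKET.QUERY", bucket, query)
    while reply["entries"]:
        yield from reply["entries"]
        # Ask for the next batch using the cursor returned by the server.
        reply = execute("BUCKET.ADVANCE", "QUERY", reply["cursor_id"])
```

A caller that stops iterating early leaves the cursor open on the session; releasing it with `BUCKET.CLOSE` or `SESSION.CLOSE` remains the caller's responsibility.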
## Interaction with Transactions
By default, sessions operate in auto-commit mode. Each command that touches FoundationDB creates, executes, and commits its own transaction. `BEGIN` opens an explicit transaction that spans multiple commands until `COMMIT` or `ROLLBACK`.
An explicit transaction is bound to the session. If the client issues `SESSION.CLOSE` while a transaction is in progress, the transaction is automatically rolled back:
```kronotop
> BEGIN
OK
> SESSION.CLOSE
OK
> ROLLBACK
(error) TRANSACTION there is no transaction in progress.
```
Cursors are independent of transactions. A cursor created inside an explicit transaction survives `COMMIT` or `ROLLBACK` and can continue to be advanced afterward.
Watched keys (`WATCH`) are also session-scoped. `SESSION.CLOSE` unwatches all keys, just as `DISCARD` does.
## Session Reset
`SESSION.CLOSE` performs a full session reset without closing the network connection. The reset proceeds in order:
1. **Clear all cursors**: read, delete, and update cursor pools are emptied
2. **Roll back the active transaction**: the FoundationDB transaction is closed and all uncommitted changes are lost
3. **Reset MULTI state**: queued commands are discarded and the MULTI flag is cleared
4. **Unwatch all keys**: every key in the session’s watch list is released
5. **Reset the cursor ID counter**: the next cursor will start at 1 again
6. **Restore default attributes**: all four session attributes revert to their configured defaults
After `SESSION.CLOSE` returns `OK`, the session is indistinguishable from a freshly opened connection.
```kronotop
> SESSION.ATTRIBUTE SET limit 50
OK
> SESSION.CLOSE
OK
> SESSION.ATTRIBUTE LIST
1# reply_type => bson
2# input_type => bson
3# limit => (integer) 100
4# object_id_format => bytes
```
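For the connection-pooling scenario, the reset can be tied to the moment a connection is handed back. A minimal sketch, assuming a hypothetical `execute(*args)` callable bound to one pooled connection; `borrowed_session` is an illustrative name, not a Kronotop or client-library API.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def borrowed_session(execute: Callable[..., Any]) -> Iterator[Callable[..., Any]]:
    """Lend out a connection and reset its session when the caller is done."""
    try:
        yield execute
    finally:
        # Runs on success and on error: clears cursors, rolls back any open
        # transaction, unwatches keys, and restores default attributes.
        execute("SESSION.CLOSE")
```

Used as `with borrowed_session(conn_execute) as run: ...`, the next borrower starts from default attributes even if the previous one raised in the middle of a transaction.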
## Commands
| Command | Description |
| --------------------------------------------------------------- | ----------------------------------- |
| [SESSION.ATTRIBUTE](/docs/sessions/commands/session-attribute/) | View and modify session attributes |
| [SESSION.CLOSE](/docs/sessions/commands/session-close/) | Reset all session state to defaults |
# SESSION.ATTRIBUTE
> Views and modifies session-specific configuration attributes.
## Syntax
```kronotop
SESSION.ATTRIBUTE LIST
SESSION.ATTRIBUTE SET <attribute> <value>
```
## Subcommands
### LIST
Returns all session attributes with their current values.
In RESP3 the response is a map; in RESP2 it is a flat array of alternating key-value pairs.
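Client code that must work with both protocol versions can normalize the two reply shapes. A minimal sketch, assuming the client library has already decoded keys to strings; `attributes_to_dict` is a hypothetical helper name.

```python
from typing import Any, Dict, Mapping, Sequence, Union


def attributes_to_dict(reply: Union[Mapping[str, Any], Sequence[Any]]) -> Dict[str, Any]:
    """Normalize a SESSION.ATTRIBUTE LIST reply to a dict.

    A RESP3 map is copied as is; a RESP2 flat array has the shape
    [key1, value1, key2, value2, ...].
    """
    if isinstance(reply, Mapping):
        return dict(reply)
    if len(reply) % 2 != 0:
        raise ValueError("flat reply must contain an even number of items")
    # Pair every even-indexed key with the value that follows it.
    return dict(zip(reply[0::2], reply[1::2]))
```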
### SET
Sets a single session attribute to the given value. Returns `OK` on success.
## Attributes
| Attribute | Type | Default | Valid Values | Description |
| ------------------ | ------- | ------- | ------------ | ------------------------------------------------- |
| `reply_type` | enum | bson | bson, json | Data interchange format for responses |
| `input_type` | enum | bson | bson, json | Data interchange format for inputs |
| `limit` | integer | 100 | > 0 | Maximum entries returned per query response |
| `object_id_format` | enum | bytes | bytes, hex | Encoding format for object ID values in responses |
All attribute names and enum values are case-insensitive.
## Errors
| Error                                         | Cause                                         |
| --------------------------------------------- | --------------------------------------------- |
| `ERR Invalid subcommand status: <subcommand>` | The subcommand is neither `LIST` nor `SET`    |
| `ERR Invalid reply type: <value>`             | Invalid value for `reply_type`                |
| `ERR Invalid input type: <value>`             | Invalid value for `input_type`                |
| `ERR 'limit' must be greater than 0`          | `limit` was set to 0 or a negative number     |
| `ERR Invalid versionstamp format: <value>`    | Invalid value for `object_id_format`          |
| `ERR invalid number of parameters`            | `SET` called without both attribute and value |
| `ERR Invalid session attribute: '<name>'`     | The attribute name does not exist             |
## Examples
**List all attributes:**
```kronotop
> SESSION.ATTRIBUTE LIST
1# reply_type => bson
2# input_type => bson
3# limit => (integer) 100
4# object_id_format => bytes
```
**Set the reply type to JSON:**
```kronotop
> SESSION.ATTRIBUTE SET reply_type JSON
OK
```
**Set limit:**
```kronotop
> SESSION.ATTRIBUTE SET limit 50
OK
```
**Invalid attribute name:**
```kronotop
> SESSION.ATTRIBUTE SET unknown_attr value
(error) ERR Invalid session attribute: 'unknown_attr'
```
**Invalid reply type value:**
```kronotop
> SESSION.ATTRIBUTE SET reply_type xml
(error) ERR Invalid reply type: xml
```
# SESSION.CLOSE
> Closes the current session and resets all session state while keeping the connection open.
## Syntax
```kronotop
SESSION.CLOSE
```
This command takes no parameters.
## Return Value
Simple string: `OK` on success.
## Behavior
The command performs a full session reset without closing the underlying network connection:
1. **Cursors**: All active cursors (read, delete, update query contexts) are cleared
2. **FDB Transaction**: Any active FoundationDB transaction is rolled back and closed
3. **MULTI State**: `MULTI` transaction state (queued commands, the MULTI flag) is reset
4. **Watched Keys**: All keys being watched via `WATCH` are unwatched
5. **Cursor ID Counter**: Reset to 1
6. **Session Attributes**: All attributes (`reply_type`, `input_type`, `limit`, `object_id_format`) are reset to their defaults
## Examples
**Basic usage:**
```kronotop
> SESSION.CLOSE
OK
```
**After starting a transaction:**
```kronotop
> BEGIN
OK
> SESSION.CLOSE
OK
> ROLLBACK
(error) TRANSACTION there is no transaction in progress.
```
The transaction is rolled back; no explicit `ROLLBACK` is needed.
**Resetting modified session attributes:**
```kronotop
> SESSION.ATTRIBUTE SET limit 50
OK
> SESSION.CLOSE
OK
> SESSION.ATTRIBUTE LIST
1# reply_type => bson
2# input_type => bson
3# limit => (integer) 100
4# object_id_format => bytes
```
The `limit` attribute is reset to its default value (100).
# Transactions
> Kronotop transactions are thin wrappers around FoundationDB transactions.
## Overview
Kronotop transactions are thin wrappers around FoundationDB transactions. Every session (one TCP connection = one session) can have at most one active transaction at a time. Transaction state (the FoundationDB transaction handle, post-commit hooks, the user version counter, and the snapshot read flag) lives on the session object.
If a session disconnects or the client issues `SESSION.CLOSE` while a transaction is in progress, the transaction is automatically rolled back.
## Auto-Commit
By default, sessions operate in auto-commit mode. Each command that touches FoundationDB creates its own transaction, executes, and commits immediately. The client does not need to issue `BEGIN` or `COMMIT`.
```kronotop
> ZSET mykey 42
OK
```
In this mode every command is atomic in isolation, but consecutive commands are not grouped into a single atomic unit.
## Explicit Transactions
To group multiple commands into a single atomic unit, wrap them in a `BEGIN` / `COMMIT` block. While a transaction is open, auto-commit is disabled: all commands share the same underlying FoundationDB transaction.
```kronotop
> BEGIN
OK
> ZSET key1 100
OK
> ZSET key2 200
OK
> COMMIT
OK
```
`ROLLBACK` discards all uncommitted changes and returns the session to auto-commit mode. After either `COMMIT` or `ROLLBACK`, the session is back in auto-commit and ready for the next transaction.
`COMMIT` accepts an optional `RETURNING` clause to retrieve metadata from the committed transaction. Two parameters are supported:
* `committed-version` returns the committed version as an integer.
* `versionstamp` returns the transaction’s versionstamp as a bulk string.
```kronotop
> BEGIN
OK
> ZSET key1 42
OK
> COMMIT RETURNING committed-version
(integer) 1234567890
```
After a successful commit, any registered post-commit hooks are executed before the session returns to auto-commit.
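On the client side, the `BEGIN` / `COMMIT` / `ROLLBACK` pattern maps naturally onto a context manager. The sketch below assumes a hypothetical `execute(*args)` callable that sends one command on the session's connection and raises on an error reply; `transaction` is an illustrative helper, not part of any Kronotop client.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def transaction(execute: Callable[..., Any]) -> Iterator[None]:
    """Wrap a block of commands in BEGIN / COMMIT, rolling back on error."""
    execute("BEGIN")
    try:
        yield
    except BaseException:
        # Discard buffered changes and return the session to auto-commit.
        execute("ROLLBACK")
        raise
    execute("COMMIT")
```

Commands issued inside `with transaction(execute): ...` share one FoundationDB transaction. An exception raised by `COMMIT` itself propagates to the caller without a follow-up `ROLLBACK`; how to recover in that case is left to the application.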
### Transaction Inspection
Two commands provide visibility into an active transaction:
* `GETREADVERSION`: Returns the read version assigned to the transaction, useful for reasoning about causal ordering.
* `GETAPPROXIMATESIZE`: Returns the approximate byte size of mutations performed so far, useful for staying within FoundationDB’s 10 MB transaction size limit.
```kronotop
> BEGIN
OK
> ZSET key1 42
OK
> GETREADVERSION
(integer) 1391961467874
> GETAPPROXIMATESIZE
(integer) 156
> COMMIT
OK
```
## Snapshot Reads
`SNAPSHOTREAD ON` switches the session to snapshot isolation for reads. Snapshot reads do not create read conflict ranges, so they will not cause transactions to conflict with concurrent writes to the same keys. This is useful for long-running or read-heavy workloads where strict serializability is not required.
```kronotop
> SNAPSHOTREAD ON
OK
> BEGIN
OK
> ZGET mykey
42
> COMMIT
OK
```
The setting applies to ZMap read commands (`ZGET`, `ZGETI64`, `ZGETF64`, `ZGETD128`, `ZGETRANGE`, `ZGETKEY`, `ZGETRANGESIZE`) and `BUCKET.QUERY`. Mutation commands (`BUCKET.INSERT`, `BUCKET.DELETE`, `BUCKET.UPDATE`) always use serializable reads regardless of the setting.
The setting is session-scoped and persists until explicitly changed with `SNAPSHOTREAD OFF` or the session ends. It can be toggled at any time, regardless of whether a transaction is currently active.
## Cross-Namespace Transactions
`BEGIN` binds a single FoundationDB transaction to the session. `NAMESPACE USE` only changes which namespace subsequent commands target. It does not create a new transaction. This means a single transaction can atomically span multiple namespaces.
```kronotop
> NAMESPACE USE production.sales
OK
> BEGIN
OK
> BUCKET.INSERT orders DOCS '{"item": "keyboard", "qty": 2}'
...
> NAMESPACE USE production.inventory
OK
> BUCKET.INSERT stock DOCS '{"item": "keyboard", "delta": -2}'
...
> COMMIT
OK
```
In this example, the insert into `production.sales` and the insert into `production.inventory` are committed as a single atomic operation. If either fails, neither write is applied.
## FoundationDB Constraints
Kronotop transactions inherit the constraints of the underlying FoundationDB transactions. Kronotop stores metadata, indexes, and cluster state in FoundationDB, while document bodies are stored in the Volume storage engine on the local filesystem. The constraints below apply to the metadata stored in FoundationDB. Document body writes to Volume are not subject to these limits.
| Constraint | Limit |
| -------------------- | ----------------------------------------------------------------- |
| Transaction size | 10 MB total (keys + values of all mutations) |
| Transaction duration | 5 seconds from when the read version is obtained (see below) |
| Default isolation | Serializable (snapshot isolation available via `SNAPSHOTREAD ON`) |
Use `GETAPPROXIMATESIZE` to monitor transaction size during large writes. If a workload exceeds these limits, split the work across multiple transactions.
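One way to split a large ZMap load across transactions is to commit whenever the reported size reaches a client-side budget. This is a sketch, not a recommended production loader: `execute` is a hypothetical command callable, and the 8 MiB budget is an assumed safety margin below the 10 MB limit.

```python
from typing import Callable, Iterable, Tuple

# Client-side budget, deliberately below FoundationDB's 10 MB limit (assumed value).
SIZE_BUDGET_BYTES = 8 * 1024 * 1024


def zset_in_batches(execute: Callable[..., object],
                    items: Iterable[Tuple[str, str]],
                    budget: int = SIZE_BUDGET_BYTES) -> int:
    """Write key/value pairs with ZSET, committing whenever the open
    transaction's approximate size reaches `budget`.

    Returns the number of transactions committed.
    """
    commits = 0
    in_txn = False
    for key, value in items:
        if not in_txn:
            execute("BEGIN")
            in_txn = True
        execute("ZSET", key, value)
        # The size is checked after the write, so one write may overshoot
        # the budget; keep the budget well under the hard limit.
        if int(execute("GETAPPROXIMATESIZE")) >= budget:
            execute("COMMIT")
            commits += 1
            in_txn = False
    if in_txn:
        execute("COMMIT")
        commits += 1
    return commits
```

Splitting the work gives up atomicity across batches: a failure partway through leaves earlier batches committed. Checking the size after every write also costs one extra round trip per item, so a real loader would likely check less often.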
### Transaction Time Window
Every FoundationDB transaction operates against a single point-in-time snapshot of the database, identified by a version number called the **read version**. The read version is obtained once and remains fixed for the entire lifetime of the transaction. All reads within the transaction see the database as it was at that version. The 5-second time window starts when the read version is obtained, not when the transaction object is created.
#### When is the read version obtained?
`BEGIN` creates a transaction object but does not obtain a read version. The transaction is an empty shell at this point and no timer is running. The read version is obtained lazily:
* **First read operation**: When the transaction performs its first read (e.g., `ZGET`, `BUCKET.QUERY`), FoundationDB obtains the read version automatically. This is the most common case. All subsequent reads within the same transaction use the same read version; they do not obtain a new one.
* **Explicit `GETREADVERSION`**: Calling `GETREADVERSION` forces the read version to be obtained immediately, even if no read has been performed yet. This starts the 5-second window.
* **Commit without prior reads**: If a transaction performs only writes (blind writes) and never reads, the read version is obtained at commit time. Since the version is obtained and used immediately, the 5-second window effectively does not apply to blind-write transactions.
#### What does the 5-second window mean in practice?
Once the read version is obtained, you have approximately 5 seconds to complete and commit the transaction. If more than 5 seconds pass between obtaining the read version and committing, FoundationDB rejects the commit with a `TRANSACTION_TOO_OLD` error.
Consider this sequence:
```kronotop
> BEGIN
OK -- transaction object created, no read version yet
> ZGET key1 -- read version obtained HERE, 5-second window starts
"value1"
> ZGET key2 -- same read version, no new window
"value2"
-- 20 seconds pass --
> ZSET key3 100 -- mutation is buffered locally
OK
> COMMIT
(error) TRANSACTION_TOO_OLD Transaction is too old to perform reads or be committed
```
The commit fails because 20 seconds elapsed since the first `ZGET` obtained the read version. The second `ZGET` did not reset the window; it reused the same read version.
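The timing rule above can be modeled on the client side. The following Python sketch is an illustration under stated assumptions, not part of Kronotop: the `ReadVersionWindow` class, its method names, and the injectable clock are hypothetical, and the 5-second constant only mirrors the limit described in this section. It records when the first read happened so that an application can choose to roll back and retry instead of sending a `COMMIT` that is likely to fail.

```python
import time

WINDOW_SECONDS = 5.0  # mirrors the FoundationDB limit described above


class ReadVersionWindow:
    """Client-side estimate of the 5-second window (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started_at = None  # after BEGIN: no read version, no timer

    def on_read(self):
        # Only the first read starts the window. Later reads reuse the
        # same read version and must not reset the timer.
        if self._started_at is None:
            self._started_at = self._clock()

    def remaining(self):
        # A transaction that has not read yet has the full window left.
        if self._started_at is None:
            return WINDOW_SECONDS
        elapsed = self._clock() - self._started_at
        return max(0.0, WINDOW_SECONDS - elapsed)

    def likely_expired(self):
        return self.remaining() == 0.0
```

Call `on_read()` after every read command and check `likely_expired()` before `COMMIT`. The estimate is approximate because the server obtains the read version slightly before the client sees the reply.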
#### Blind writes
[Section titled “Blind writes”](#blind-writes)
A transaction that performs only writes and never reads does not obtain a read version until commit:
```kronotop
> BEGIN
OK -- no read version yet
> ZSET key1 100 -- mutation buffered, still no read version
OK
> ZSET key2 200 -- mutation buffered, still no read version
OK
> COMMIT -- read version obtained and committed in one step
OK
```
Because the read version is obtained at the moment of commit and used immediately, blind-write transactions are not subject to the 5-second window in practice.
#### Auto-commit mode
[Section titled “Auto-commit mode”](#auto-commit-mode)
In auto-commit mode, each command gets its own transaction that is created, executed, and committed within a single round-trip. The 5-second window is never a concern in this mode because the transaction lives only for the duration of a single command.
## Commands
[Section titled “Commands”](#commands)
| Command | Description |
| --------------------------------------------------------------------- | ------------------------------------------------ |
| [BEGIN](/docs/transactions/commands/begin/) | Start a new transaction |
| [COMMIT](/docs/transactions/commands/commit/) | Commit the current transaction |
| [ROLLBACK](/docs/transactions/commands/rollback/) | Abort the current transaction |
| [GETREADVERSION](/docs/transactions/commands/getreadversion/) | Get the read version of the current transaction |
| [GETAPPROXIMATESIZE](/docs/transactions/commands/getapproximatesize/) | Get the approximate byte size of the transaction |
| [SNAPSHOTREAD](/docs/transactions/commands/snapshotread/) | Enable or disable snapshot read mode |
| [TICK](/docs/transactions/commands/tick/) | Get a monotonically increasing 64-bit integer |
# BEGIN
> Opens a new FoundationDB transaction on the current session.
Opens a new FoundationDB transaction on the current session.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
BEGIN
```
This command takes no parameters.
## Return Value
[Section titled “Return Value”](#return-value)
Simple string: `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
The command creates a new FoundationDB transaction and binds it to the current session. The session enters explicit transaction mode, which means all subsequent commands that touch FoundationDB will use this transaction until it is committed or rolled back.
When a transaction is opened, the session also initializes post-commit hooks and a user version counter for the lifetime of the transaction.
While in explicit transaction mode, auto-commit behavior is implicitly disabled: commands no longer create their own one-off transactions.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------- | ---------------------------------------------------- |
| `TRANSACTION` | A transaction is already in progress on the session. |
## Examples
[Section titled “Examples”](#examples)
**Start a transaction:**
```kronotop
> BEGIN
OK
```
**Attempt to start a second transaction:**
```kronotop
> BEGIN
OK
> BEGIN
(error) TRANSACTION there is already a transaction in progress.
```
# COMMIT
> Commits the current transaction and applies all changes.
Commits the current transaction and applies all changes.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
COMMIT [RETURNING committed-version | versionstamp]
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Required | Description |
| ----------- | -------- | ----------------------------------------------------------------------------------------------------- |
| `RETURNING` | No | Requests a value from the committed transaction. Must be followed by exactly one of the values below. |
**RETURNING values:**
| Value | Description |
| ------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `committed-version` | The version at which the transaction was committed. Returns `-1` if the transaction contained no mutations. |
| `versionstamp` | The 10-byte versionstamp assigned to the transaction. The transaction must have performed at least one mutation. |
## Return Value
[Section titled “Return Value”](#return-value)
* **No RETURNING**: Simple string `OK`.
* **RETURNING committed-version**: Integer representing the committed version. Returns `-1` if the transaction contained no mutations, meaning there was nothing to commit.
* **RETURNING versionstamp**: Bulk string containing the 10-byte versionstamp.
## Behavior
[Section titled “Behavior”](#behavior)
The command commits the active FoundationDB transaction bound to the current session. After a successful commit, any registered post-commit hooks are executed. The session then resets its transaction state: the BEGIN flag is cleared, auto-commit is re-enabled, post-commit hooks are discarded, and the user version counter is reset.
When `RETURNING versionstamp` is requested, the versionstamp future is obtained before the commit call, as required by the FoundationDB API. The resolved value is returned after the commit completes.
The session returns to its default state and is ready to accept new commands or start a new transaction with `BEGIN`.
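A client may want to split the returned versionstamp into its parts. The sketch below relies on the layout that FoundationDB documents for versionstamps: an 8-byte big-endian commit version followed by a 2-byte big-endian order within the commit batch. It assumes that Kronotop returns those 10 bytes unchanged; the `decode_versionstamp` name is hypothetical.

```python
import struct


def decode_versionstamp(stamp):
    """Split a 10-byte versionstamp into (commit_version, batch_order)."""
    if len(stamp) != 10:
        raise ValueError("a versionstamp is exactly 10 bytes")
    # Big-endian: 8-byte commit version, then 2-byte order within the batch.
    return struct.unpack(">QH", stamp)
```

Because both fields are big-endian, comparing the raw bytes gives the same ordering as comparing the decoded tuples.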
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------- | --------------------------------------------------- |
| `TRANSACTION` | There is no transaction in progress on the session. |
## Examples
[Section titled “Examples”](#examples)
**Commit a transaction:**
```kronotop
> BEGIN
OK
> COMMIT
OK
```
**Commit and return the committed version:**
```kronotop
> BEGIN
OK
> ZSET mykey 42
OK
> COMMIT RETURNING committed-version
(integer) 1234567890
```
**Commit a transaction with no mutations:**
```kronotop
> BEGIN
OK
> COMMIT RETURNING committed-version
(integer) -1
```
A return value of `-1` means the transaction contained no mutations and there was nothing to commit.
**Commit and return the versionstamp:**
```kronotop
> BEGIN
OK
> ZSET mykey 42
OK
> COMMIT RETURNING versionstamp
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
```
**Attempt to commit without an active transaction:**
```kronotop
> COMMIT
(error) TRANSACTION there is no transaction in progress.
```
# GETAPPROXIMATESIZE
> Returns the approximate byte size of the current transaction.
Returns the approximate byte size of the current transaction.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
GETAPPROXIMATESIZE
```
This command takes no parameters.
## Return Value
[Section titled “Return Value”](#return-value)
Integer: the approximate size of the transaction in bytes.
## Behavior
[Section titled “Behavior”](#behavior)
Returns the approximate number of bytes of mutations that have been performed on the current transaction. This value includes the combined size of all keys and values that have been set, cleared, or otherwise mutated since the transaction began.
This is useful for monitoring how much data a transaction has written, particularly when approaching FoundationDB’s 10 MB transaction size limit. Clients can use this to decide when to split work across multiple transactions or for logging and diagnostics.
A transaction must be started with `BEGIN` before calling this command.
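One way to act on this figure is to plan batches before writing. The following Python sketch is a simplification under stated assumptions: `plan_batches` is a hypothetical helper, and it estimates size as key length plus value length. The server-side figure reported by `GETAPPROXIMATESIZE` can be considerably larger than the raw key and value lengths, so the budget should leave generous headroom below the 10 MB limit.

```python
def plan_batches(pairs, budget_bytes):
    """Group (key, value) byte pairs so each batch stays within budget_bytes.

    The estimate is len(key) + len(value). It is a lower bound on what the
    server accounts for, not the value GETAPPROXIMATESIZE would report.
    """
    batches, current, current_size = [], [], 0
    for key, value in pairs:
        size = len(key) + len(value)
        if size > budget_bytes:
            raise ValueError("single pair exceeds the batch budget")
        if current and current_size + size > budget_bytes:
            # Close the current batch; it becomes one transaction.
            batches.append(current)
            current, current_size = [], 0
        current.append((key, value))
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would be written inside its own `BEGIN` and `COMMIT` pair, with `GETAPPROXIMATESIZE` used as a runtime check on the estimate.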
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------- | ----------------------------------------------------------------------------- |
| `TRANSACTION` | `there is no transaction in progress.`: No active transaction on the session. |
## Examples
[Section titled “Examples”](#examples)
**Check approximate size after performing writes:**
```kronotop
> BEGIN
OK
> ZSET key1 42
OK
> GETAPPROXIMATESIZE
(integer) 156
```
**Attempt without an active transaction:**
```kronotop
> GETAPPROXIMATESIZE
(error) TRANSACTION there is no transaction in progress.
```
# GETREADVERSION
> Returns the read version of the current transaction.
Returns the read version of the current transaction.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
GETREADVERSION
```
This command takes no parameters.
## Return Value
[Section titled “Return Value”](#return-value)
Integer: the read version of the active transaction.
## Behavior
[Section titled “Behavior”](#behavior)
Every FoundationDB transaction is assigned a read version that determines the snapshot of the database it observes. The read version is a monotonically increasing value that reflects causal ordering: a higher read version means the transaction sees a later state of the database.
This command returns the read version of the active transaction on the current session. It can be used to reason about causal ordering between transactions, coordinate across sessions, or diagnose transaction behavior.
A transaction must be started with `BEGIN` before calling this command.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------- | ----------------------------------------------------------------------------- |
| `TRANSACTION` | `there is no transaction in progress.`: No active transaction on the session. |
## Examples
[Section titled “Examples”](#examples)
**Get the read version of an active transaction:**
```kronotop
> BEGIN
OK
> GETREADVERSION
(integer) 1391961467874
```
**Attempt without an active transaction:**
```kronotop
> GETREADVERSION
(error) TRANSACTION there is no transaction in progress.
```
# ROLLBACK
> Aborts the current transaction and discards all uncommitted changes.
Aborts the current transaction and discards all uncommitted changes.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
ROLLBACK
```
This command takes no parameters.
## Return Value
[Section titled “Return Value”](#return-value)
Simple string: `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
The command cancels the active FoundationDB transaction bound to the current session and releases all associated resources. After cancellation, the session resets its transaction state: the BEGIN flag is cleared, auto-commit is re-enabled, post-commit hooks are discarded, and the user version counter is reset.
The session returns to its default state and is ready to accept new commands or start a new transaction with `BEGIN`.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ------------- | --------------------------------------------------- |
| `TRANSACTION` | There is no transaction in progress on the session. |
## Examples
[Section titled “Examples”](#examples)
**Roll back a transaction:**
```kronotop
> BEGIN
OK
> ROLLBACK
OK
```
**Attempt to roll back without an active transaction:**
```kronotop
> ROLLBACK
(error) TRANSACTION there is no transaction in progress.
```
# SNAPSHOTREAD
> Enables or disables snapshot read mode for the current session.
Enables or disables snapshot read mode for the current session.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
SNAPSHOTREAD <ON | OFF>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Required | Description |
| --------- | ------------ | ------------------------------------------- |
| `ON` | Yes (one of) | Enables snapshot read mode on the session. |
| `OFF` | Yes (one of) | Disables snapshot read mode on the session. |
Exactly one of `ON` or `OFF` must be provided. The argument is case-insensitive.
## Return Value
[Section titled “Return Value”](#return-value)
Simple string: `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
The command sets or clears snapshot read mode on the current session. When enabled, subsequent FoundationDB reads performed by the session use snapshot isolation instead of the default serializable isolation.
Snapshot reads do not create read conflict ranges, which means they will not cause transactions to conflict with concurrent writes to the same keys. This is useful for long-running reads or read-heavy workloads where strict serializability is not required, and reducing transaction conflicts is more important than guaranteeing a perfectly consistent view.
The setting is session-scoped. It persists until explicitly changed or the session ends. It can be toggled at any time, regardless of whether a transaction is currently active.
### Supported Commands
[Section titled “Supported Commands”](#supported-commands)
The following table shows which commands honor the `SNAPSHOTREAD` setting:
| Command | Honors SNAPSHOTREAD |
| --------------- | ---------------------------------- |
| `ZGET` | Yes |
| `ZGETI64` | Yes |
| `ZGETF64` | Yes |
| `ZGETD128` | Yes |
| `ZGETRANGE` | Yes |
| `ZGETKEY` | Yes |
| `ZGETRANGESIZE` | Yes |
| `BUCKET.QUERY` | Yes |
| `BUCKET.INSERT` | No, always uses serializable reads |
| `BUCKET.DELETE` | No, always uses serializable reads |
| `BUCKET.UPDATE` | No, always uses serializable reads |
Mutation commands (`BUCKET.INSERT`, `BUCKET.DELETE`, `BUCKET.UPDATE`) always use serializable reads regardless of the `SNAPSHOTREAD` setting.
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ---------- | --------------------------------------------------------------------------------------- |
| `ERR`      | `illegal argument for SNAPSHOTREAD: '<argument>'`: The argument is neither `ON` nor `OFF`. |
## Examples
[Section titled “Examples”](#examples)
**Enable snapshot read mode:**
```kronotop
> SNAPSHOTREAD ON
OK
```
**Disable snapshot read mode:**
```kronotop
> SNAPSHOTREAD OFF
OK
```
**Invalid argument:**
```kronotop
> SNAPSHOTREAD MAYBE
(error) ERR illegal argument for SNAPSHOTREAD: 'MAYBE'
```
# TICK
> Returns a monotonically increasing 64-bit integer.
Returns a monotonically increasing 64-bit integer.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
TICK <FRESH | CACHED>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Description |
| --------- | ---------------------------------------------------------------------------------------------------- |
| `FRESH` | Fetches the latest value from the cluster. One round trip per call. |
| `CACHED` | Serves the value from a node-local cache that may be up to one second behind. No cluster round trip. |
The mode parameter is mandatory and case-insensitive.
## Return Value
[Section titled “Return Value”](#return-value)
Integer: the current value.
## Guarantees
[Section titled “Guarantees”](#guarantees)
* The value never decreases within its scope. `FRESH` values are monotonic across the whole cluster. `CACHED` values are monotonic only per node: a client that switches nodes within one second may observe a smaller value.
* The same value can be returned more than once, even with `FRESH`. TICK is not a unique identifier generator.
* The value does not advance at a fixed rate. Consecutive calls can return the same value; the only ordering guarantee is that it never goes backwards.
## Choosing a mode
[Section titled “Choosing a mode”](#choosing-a-mode)
`FRESH` gives the strongest guarantee at the cost of one cluster round trip per call. Use it when a decision depends on the latest value, such as fencing checks or watermarks that must not lag.
`CACHED` targets high-volume, low-precision use. It serves callers that need a monotonic number frequently and can tolerate up to one second of slack, without putting load on the cluster. Typical examples are coarse-grained ordering markers, periodic checkpoints, and staleness checks.
## Behavior
[Section titled “Behavior”](#behavior)
The value is a FoundationDB read version. The read version reflects the latest committed state of the database and advances only when new commits arrive. When it advances, it tracks wall-clock time at a rate of roughly one million versions per second. On an idle cluster nothing commits, so the read version stands still and consecutive calls return the same value, even with `FRESH`. The next commit moves it forward in proportion to the time that has passed, which is why the value appears to jump.
With `CACHED`, all connections to a node share a single cache. If the cached value is younger than one second, it is returned without contacting the cluster. Otherwise, the latest value is fetched and the cache is refreshed. Cache refreshes are monotonic: a refresh never replaces a cached value with a smaller one.
This command does not require an active transaction.
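The `CACHED` behavior described above can be modeled in a few lines. This Python sketch is an illustration, not Kronotop's implementation: the `CachedTick` class, the `fetch_fresh` callback (standing in for a cluster round trip), and the injectable clock are all hypothetical.

```python
import time


class CachedTick:
    """Model of a node-local TICK cache (hypothetical, for illustration)."""

    MAX_AGE_SECONDS = 1.0

    def __init__(self, fetch_fresh, clock=time.monotonic):
        self._fetch_fresh = fetch_fresh  # stands in for a cluster round trip
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def cached(self):
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self.MAX_AGE_SECONDS:
            return self._value  # young enough: no cluster round trip
        latest = self._fetch_fresh()
        # Monotonic refresh: never replace the cache with a smaller value.
        self._value = latest if self._value is None else max(self._value, latest)
        self._fetched_at = now
        return self._value
```

Sharing one instance among all connections on a node is what makes the values monotonic per node, while two nodes with separate caches can briefly disagree.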
## Errors
[Section titled “Errors”](#errors)
| Error Code | Description |
| ---------- | ------------------------------------------------------------------- |
| `ERR`      | `illegal argument for TICK: '<mode>'`: The mode is not recognized. |
## Examples
[Section titled “Examples”](#examples)
**Get the latest value:**
```kronotop
> TICK FRESH
(integer) 1377052400154
```
**Consecutive calls can return the same value, even with FRESH:**
```kronotop
> TICK FRESH
(integer) 1377275302623
> TICK FRESH
(integer) 1377275302623
> TICK FRESH
(integer) 1377275302623
```
**Cached calls within one second return the same value without a cluster round trip:**
```kronotop
> TICK CACHED
(integer) 1377275302623
> TICK CACHED
(integer) 1377275302623
```
**Attempt with an invalid mode:**
```kronotop
> TICK FOO
(error) ERR illegal argument for TICK: 'FOO'
```
# Volume
> Volume is Kronotop's local storage engine.
***
## Overview
[Section titled “Overview”](#overview)
Volume is Kronotop’s local storage engine. It stores document body content in the local filesystem while all metadata (entry locations, versioning, segment accounting, and replication state) lives in FoundationDB.
FoundationDB is optimized for small key-value pairs and enforces a 100 KB value-size limit. Document bodies routinely exceed this limit, so the volume layer offloads bulk content to the local disk and keeps only lightweight pointers in FoundationDB. This gives Kronotop the transactional guarantees of FoundationDB for metadata, together with high-throughput sequential I/O for document content.
Every shard owns exactly one volume. During normal operation, users interact with buckets through `BUCKET.*` commands and never see volumes directly. Operators manage volumes through `VOLUME.ADMIN` commands on the management port.
***
## Dual-Storage Model
[Section titled “Dual-Storage Model”](#dual-storage-model)
| Layer | Stored In | What It Holds |
| ------------ | ------------ | ------------------------------------------------------------------ |
| **Metadata** | FoundationDB | Entry locations, versioning, segment accounting, replication state |
| **Content** | Local disk | Raw document bytes in segment files |
**Write path:** Content is appended to a segment file and flushed to disk first. Only after the flush succeeds is the metadata committed to FoundationDB in a single transaction. This ordering guarantees that metadata never references data that has not been persisted.
**Read path:** The entry’s metadata is looked up, either from a cache for read-only queries or from FoundationDB for transactional reads, to find the segment ID and byte offset. The content is then read directly from the segment file.
**Deletes** remove metadata from FoundationDB but leave the content bytes on disk. The space they occupied becomes garbage, reclaimed later by [vacuum](#vacuum-and-space-reclamation).
**Updates** append the new content to a segment (possibly a different one) and atomically swap the metadata pointer to the new location. The old content becomes garbage, just like a logical delete.
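The ordering of the write path can be sketched as follows. This Python example is a toy model under stated assumptions: a plain dict stands in for FoundationDB, the segment is an ordinary growing file rather than a pre-allocated one, and the `append_entry` and `read_entry` names are hypothetical. It shows only the key property: the metadata pointer is recorded after the content has been flushed to disk.

```python
import os


def append_entry(segment_path, metadata, entry_id, content):
    """Append content, flush it to disk, then record its location.

    `metadata` is a dict standing in for FoundationDB. In Kronotop the
    metadata update would be a FoundationDB transaction instead.
    """
    with open(segment_path, "ab") as segment:
        offset = segment.seek(0, os.SEEK_END)
        segment.write(content)
        segment.flush()
        os.fsync(segment.fileno())  # content is durable before metadata exists
    # Reached only if the write and fsync succeeded.
    metadata[entry_id] = (offset, len(content))
    return offset


def read_entry(segment_path, metadata, entry_id):
    """Look up the location in metadata, then read directly from the file."""
    offset, length = metadata[entry_id]
    with open(segment_path, "rb") as segment:
        segment.seek(offset)
        return segment.read(length)
```

Calling `append_entry` again with an existing `entry_id` models an update: the pointer moves to the new location and the old bytes stay in the file as garbage. Removing the key from `metadata` models a delete.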
***
## Segments
[Section titled “Segments”](#segments)
A segment is a fixed-size, pre-allocated, append-only file on the local disk. Entries are written sequentially from the beginning of the file. An entry is never split across segments.
When the current segment fills up, a new segment is created automatically. Old segments accept no further writes. They serve only reads. The default segment size for buckets is 4 GiB, configurable via `bucket.volume.segment_size`.
Each segment tracks space accounting metrics:
| Metric | Description |
| ---------------------- | ------------------------------------------------------------------ |
| **Cardinality** | Number of live entries in the segment |
| **Used bytes** | Total bytes occupied by live entries |
| **Garbage percentage** | Fraction of the segment consumed by entries whose metadata is gone |
Use `VOLUME.ADMIN DESCRIBE` to inspect per-segment statistics:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE bucket-shard-0
```
***
## Volume Status
[Section titled “Volume Status”](#volume-status)
A volume operates in one of three states:
| Status | Behavior |
| ------------ | ---------------------------------------- |
| `READWRITE` | Default. Reads and writes are permitted. |
| `READONLY` | Reads succeed; writes are rejected. |
| `INOPERABLE` | All operations are rejected. |
Set a volume to `READONLY` for planned maintenance or before decommissioning a node. Use `INOPERABLE` when the underlying storage has failed or the volume must be taken fully offline.
```kronotop
127.0.0.1:3320> VOLUME.ADMIN SET-STATUS bucket-shard-0 READONLY
OK
```
See [VOLUME.ADMIN SET-STATUS](/docs/volume/commands/volume-admin-set-status/) for the full command reference.
***
## Replication
[Section titled “Replication”](#replication)
Volume replication is an asynchronous, primary-to-standby system. Each shard’s volume is replicated independently: standby nodes pull data from the primary to maintain a copy of all segment content.
Replication proceeds in two phases:
1. **Segment transfer**: When a standby joins or falls behind, it copies existing segment data from the primary in chunks until it reaches the primary’s current write position.
2. **Change data capture**: Once caught up, the standby continuously streams incremental mutations from a changelog maintained in FoundationDB, applying them to its local segments in real time.
All replication progress is persisted in FoundationDB, so a standby can restart at any time and resume from exactly where it left off without re-transferring data it already has.
Replication starts automatically when a standby is assigned via cluster routing. Operators can stop and start it manually for maintenance:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN REPLICATION STOP bucket-shard-0
OK
127.0.0.1:3320> VOLUME.ADMIN REPLICATION START bucket-shard-0
OK
```
Use [VOLUME.INSPECT REPLICATION](/docs/volume/commands/volume-inspect-replication/) to check the current stage, cursor position, and status of a standby.
For protocol details, changelog structure, and consistency guarantees, see [Replication Internals](https://github.com/kronotop/kronotop/blob/main/internals/volume/replication.md).
***
## Vacuum and Space Reclamation
[Section titled “Vacuum and Space Reclamation”](#vacuum-and-space-reclamation)
Deletes and updates leave behind unreachable content in segments. Over time this garbage accumulates, consuming disk space that could be reclaimed. Vacuum is the process that reclaims it.
Vacuum scans segments whose garbage percentage exceeds a given threshold, evacuates their remaining live entries into the current writable segment, and destroys the emptied segment files. This consolidates live data and frees disk space.
The operator workflow is:
1. **Start**: Launch vacuum with a garbage threshold (percentage). Only segments above this threshold are processed.
2. **Monitor**: Check progress with `STATUS`.
3. **Clean up**: After completion, `DROP` removes the vacuum metadata.
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
OK
127.0.0.1:3320> VOLUME.ADMIN VACUUM STATUS bucket-shard-0
...
127.0.0.1:3320> VOLUME.ADMIN VACUUM DROP bucket-shard-0
OK
```
Only one vacuum can run per volume at a time.
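The threshold rule can be illustrated with a small selection function. This Python sketch is hypothetical: `vacuum_candidates` is not a Kronotop API, and it assumes the input uses the per-segment `garbage_percentage` field reported by `VOLUME.ADMIN DESCRIBE`. It treats "above the threshold" as a strict comparison and does not model any other eligibility rules that vacuum may apply.

```python
def vacuum_candidates(segments, threshold):
    """Return IDs of segments whose garbage percentage exceeds threshold.

    `segments` maps segment ID -> stats dict containing a
    `garbage_percentage` value between 0 and 100.
    """
    return [
        segment_id
        for segment_id, stats in sorted(segments.items())
        if stats["garbage_percentage"] > threshold
    ]
```

An operator could run this against `DESCRIBE` output to preview which segments a given threshold would touch before starting vacuum.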
**Changelog pruning** is a separate, complementary operation. `VOLUME.ADMIN PRUNE-CHANGELOG` removes old replication changelog entries from FoundationDB to reclaim metadata storage. It does not affect segment files.
For the full vacuum command reference, see [VOLUME.ADMIN VACUUM](/docs/volume/commands/volume-admin-vacuum/). For routine maintenance procedures, see the [Operations Guide](/docs/volume/operations-guide/).
***
## Volumes, Shards, and Buckets
[Section titled “Volumes, Shards, and Buckets”](#volumes-shards-and-buckets)
Each shard owns exactly one volume. A bucket spans one or more shards, so a bucket’s documents are distributed across one or more volumes. Volumes are named after their shard: `bucket-shard-0`, `bucket-shard-1`, and so on.
Within a volume, each bucket’s data is isolated by a prefix. When a bucket is deleted via [BUCKET.REMOVE](/docs/bucket/commands/bucket-remove/) and [BUCKET.PURGE](/docs/bucket/commands/bucket-purge/), its prefix and associated data are cleaned up automatically. After namespace-level purges, orphaned prefix references may require a manual cleanup scan. See the [Operations Guide](/docs/volume/operations-guide/#stale-prefix-cleanup) for details.
For the user-facing perspective on sharding and bucket management, see [Bucket](/docs/bucket/#sharding).
***
## Monitoring
[Section titled “Monitoring”](#monitoring)
Volume health and performance can be inspected through the `VOLUME.STATS` family of commands on the management port:
| Command | Description |
| -------------------------- | ---------------------------------------------------------- |
| `VOLUME.STATS` | Volume-wide overview: status, capacity, garbage percentage |
| `VOLUME.STATS OPCOUNTERS` | Operation counters (appends, deletes, reads, updates) |
| `VOLUME.STATS SEGMENTS` | Per-segment size, usage, and garbage breakdown |
| `VOLUME.STATS REPLICATION` | Replication state for a specific standby |
| `VOLUME.STATS RESET` | Reset operation counters to zero |
See [VOLUME.STATS Commands](/docs/monitoring/volume/) for the full reference.
***
## Admin Commands
[Section titled “Admin Commands”](#admin-commands)
| Command | Description |
| --------------------------------------------------------------------------------------------- | ------------------------------------------ |
| [VOLUME.ADMIN LIST](/docs/volume/commands/volume-admin-list/) | List all volumes on the connected member |
| [VOLUME.ADMIN DESCRIBE](/docs/volume/commands/volume-admin-describe/) | Show metadata and per-segment statistics |
| [VOLUME.ADMIN SET-STATUS](/docs/volume/commands/volume-admin-set-status/) | Change a volume’s operational status |
| [VOLUME.ADMIN LIST-SEGMENTS](/docs/volume/commands/volume-admin-list-segments/) | List segment IDs for a volume |
| [VOLUME.ADMIN VACUUM](/docs/volume/commands/volume-admin-vacuum/) | Start, stop, or inspect garbage collection |
| [VOLUME.ADMIN REPLICATION](/docs/volume/commands/volume-admin-replication/) | Start or stop replication on a standby |
| [VOLUME.ADMIN PRUNE-CHANGELOG](/docs/volume/commands/volume-admin-prune-changelog/) | Remove old changelog entries |
| [VOLUME.ADMIN MARK-STALE-PREFIXES](/docs/volume/commands/volume-admin-mark-stale-prefixes/) | Scan and clear orphaned prefix references |
| [VOLUME.ADMIN CLEANUP-ORPHAN-FILES](/docs/volume/commands/volume-admin-cleanup-orphan-files/) | Remove orphaned segment files from disk |
| [VOLUME.INSPECT REPLICATION](/docs/volume/commands/volume-inspect-replication/) | Inspect replication state for a standby |
| [VOLUME.INSPECT CURSOR](/docs/volume/commands/volume-inspect-cursor/) | Show the write cursor for a volume |
For step-by-step maintenance procedures, see the [Operations Guide](/docs/volume/operations-guide/).
# VOLUME.ADMIN CLEANUP-ORPHAN-FILES
> Identifies and removes orphaned segment files from a volume's data directory that are no longer tracked in metadata.
Identifies and removes orphaned segment files from a volume’s data directory that are no longer tracked in metadata.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN CLEANUP-ORPHAN-FILES <volume-name>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------ |
| `volume-name` | string | Name of the volume to clean up (e.g. `bucket-shard-0`) |
## Return Value
[Section titled “Return Value”](#return-value)
Array of bulk strings, each containing the absolute path of a deleted orphan file. Returns an empty array if no orphan files were found.
## Behavior
[Section titled “Behavior”](#behavior)
Loads volume metadata and builds a set of expected segment file names, then lists the actual files present in the volume’s `segments/` directory on disk. Any file on disk that is not in the expected set is considered an orphan and is deleted. The absolute paths of successfully deleted files are returned.
Orphan segment files can remain on disk after a crash in which the metadata entry was removed but the file itself was not deleted.
This command is available on the management port (default 3320).
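The detection step amounts to a set difference between the files on disk and the names expected from metadata. The Python sketch below is a read-only dry run under stated assumptions: `find_orphan_files` is a hypothetical helper, and `expected_names` stands in for the set that Kronotop derives from volume metadata. Unlike the command, it deletes nothing.

```python
from pathlib import Path


def find_orphan_files(segments_dir, expected_names):
    """Return files in segments_dir whose names are not in expected_names.

    Read-only dry run: nothing is deleted. Paths are returned as absolute
    strings, sorted for stable output.
    """
    directory = Path(segments_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"File not found: {directory}")
    expected = set(expected_names)
    return sorted(
        str(path.resolve())
        for path in directory.iterdir()
        if path.is_file() and path.name not in expected
    )
```

Running a check like this against a copy of the data directory can help confirm what the command would remove before invoking it on a live member.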
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| -------------------------------------------------- | ---------------------------------- |
| Missing volume name parameter | `ERR invalid number of parameters` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
| Volume is closed | `ERR Volume is closed.` |
| Segments directory not found                       | `ERR File not found: <path>`       |
## Examples
[Section titled “Examples”](#examples)
**Orphan files found and deleted:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN CLEANUP-ORPHAN-FILES bucket-shard-0
1) "/var/kronotop/data/bucket-shard-0/segments/00000a.seg"
2) "/var/kronotop/data/bucket-shard-0/segments/00000b.seg"
```
**No orphan files:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN CLEANUP-ORPHAN-FILES bucket-shard-0
(empty array)
```
**Volume not found:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN CLEANUP-ORPHAN-FILES non-existent-volume
(error) ERR Volume: 'non-existent-volume' is not open
```
# VOLUME.ADMIN DESCRIBE
> Returns metadata and segment-level statistics for a named volume.
Returns metadata and segment-level statistics for a named volume.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN DESCRIBE <volume-name>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------- |
| `volume-name` | string | Name of the volume to describe. Volumes are named after their shard (e.g. `bucket-shard-0`) |
## Return Value
[Section titled “Return Value”](#return-value)
RESP3 map with the following top-level fields:
| Field | Type | Description |
| -------------- | ------- | ------------------------------------------------------------ |
| `name` | string | Volume name |
| `status` | string | Operational status: `READWRITE`, `READONLY`, or `INOPERABLE` |
| `data_dir` | string | Filesystem path where segment files are stored |
| `segment_size` | integer | Maximum size of each segment file in bytes |
| `segments` | map | Per-segment statistics keyed by integer segment ID |
Each value in the `segments` map is a nested map:
| Field | Type | Description |
| -------------------- | ------- | -------------------------------------------------------------------------------- |
| `size` | integer | Total segment file size in bytes |
| `free_bytes` | integer | Unallocated space remaining in the segment |
| `used_bytes` | integer | Space occupied by live entries |
| `garbage_percentage` | double | Percentage of reclaimable space: `(size - free_bytes - used_bytes) / size * 100` |
| `cardinality` | integer | Number of live entries in the segment |
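The `garbage_percentage` formula from the table can be checked by hand. The following Python sketch restates it as a function; the function name is hypothetical and the arithmetic is taken directly from the field description above.

```python
def garbage_percentage(size, free_bytes, used_bytes):
    """Percentage of a segment that is allocated but no longer live.

    Implements (size - free_bytes - used_bytes) / size * 100.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return (size - free_bytes - used_bytes) / size * 100
```

For a segment with `size` 1024, `free_bytes` 256, and `used_bytes` 512, the remaining 256 bytes are garbage, which is 25 percent of the segment.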
## Behavior
[Section titled “Behavior”](#behavior)
Reads the volume’s configuration and computes per-segment statistics.
This command is available on the management port (default 3320).
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| -------------------------------------------------- | ---------------------------------- |
| Volume name parameter is missing | `ERR invalid number of parameters` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
## Examples
[Section titled “Examples”](#examples)
**Describe a volume with one segment:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE bucket-shard-0
1# "name" => "bucket-shard-0"
2# "status" => "READWRITE"
3# "data_dir" => "/tmp/kronotop/data/bucket-shard-0"
4# "segment_size" => (integer) 268435456
5# "segments" =>
   1# (integer) 0 =>
      1# "size" => (integer) 268435456
      2# "free_bytes" => (integer) 268435200
      3# "used_bytes" => (integer) 256
      4# "garbage_percentage" => (double) 0.0
      5# "cardinality" => (integer) 1
```
**Volume not found:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE non-existent-volume
(error) ERR Volume: 'non-existent-volume' is not open
```
# VOLUME.ADMIN LIST
> Lists all volumes opened by the connected member.
Lists all volumes opened by the connected member.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN LIST
```
## Parameters
[Section titled “Parameters”](#parameters)
None.
## Return Value
[Section titled “Return Value”](#return-value)
RESP array of bulk strings. Each element is a volume name in `<kind>-shard-<id>` format (e.g. `bucket-shard-0`). Returns an empty array if no volumes are open.
## Behavior
[Section titled “Behavior”](#behavior)
Returns all volume names managed by the connected member. This command does not require cluster initialization. It is available on the management port (default 3320).
## Errors
[Section titled “Errors”](#errors)
No command-specific errors. The command takes no parameters beyond the subcommand itself.
## Examples
[Section titled “Examples”](#examples)
**Member managing several shards:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST
1) "bucket-shard-0"
2) "bucket-shard-1"
3) "bucket-shard-2"
```
**No volumes open:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST
(empty array)
```
# VOLUME.ADMIN LIST-SEGMENTS
> Returns the list of segment IDs for a given volume.
Returns the list of segment IDs for a given volume. Segments are the underlying storage units within a volume. Each segment holds a range of appended data.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN LIST-SEGMENTS <volume-name>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------- |
| `volume-name` | string | Name of the volume, in `<kind>-shard-<id>` format (e.g. `bucket-shard-0`) |
## Return Value
[Section titled “Return Value”](#return-value)
RESP3 array of integers. Each element is a segment ID (`long`). Returns an empty array if the volume has no segments.
## Behavior
[Section titled “Behavior”](#behavior)
Loads volume metadata and returns the list of segment IDs as an integer array.
It is available on the management port (default 3320).
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| -------------------------------------------------- | ---------------------------------- |
| Missing volume name parameter | `ERR invalid number of parameters` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
## Examples
[Section titled “Examples”](#examples)
**List segments for an empty volume:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST-SEGMENTS bucket-shard-1
(empty array)
```
**List segments for a volume with data:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST-SEGMENTS bucket-shard-0
1) (integer) 0
```
**Volume not found:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST-SEGMENTS non-existent-volume
(error) ERR Volume: 'non-existent-volume' is not open
```
# VOLUME.ADMIN MARK-STALE-PREFIXES
> Starts, stops, removes, or locates the stale-prefix scanning task.
Starts, stops, removes, or locates the stale-prefix scanning task. A stale prefix is one whose pointer no longer references valid data. Stale prefixes are cleared and published to a disused-prefixes journal for downstream cleanup.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN MARK-STALE-PREFIXES <operation>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ----------- | ------ | ---------------------------------------------------------------- |
| `operation` | string | One of `START`, `STOP`, `REMOVE`, or `LOCATE` (case-insensitive) |
**Operations:**
| Operation | Description |
| --------- | ------------------------------------------------------------------------------------------- |
| `START` | Starts the background task. Fails if the task is already running. |
| `STOP` | Gracefully stops the running task and removes it from the task registry. |
| `REMOVE` | Stops the task and also removes its persisted metadata. |
| `LOCATE` | Returns the member ID and process ID of the member that owns the task’s persisted metadata. |
## Return Value
[Section titled “Return Value”](#return-value)
* **START, STOP, REMOVE:** Simple string `OK` on success. `OK` means the operation was accepted, not that scanning has finished.
* **LOCATE:** A map with `member_id` (string), `process_id` (Base32Hex-encoded string), `external_address` (string, host:port or null), and `internal_address` (string, host:port or null) identifying the task owner.
## Behavior
[Section titled “Behavior”](#behavior)
Prefix pointers in the global prefix registry can become stale: the pointer target may have been deleted, or the data it references may no longer match. This task identifies and removes those stale entries safely, in batches, to keep the prefix registry clean.
The task scans the global prefix registry in batches. For each prefix, it checks whether the pointer still references valid data. Stale prefixes are cleared and published to a disused-prefixes journal for downstream cleanup.
A persisted progress marker tracks how far the scan has advanced, so a restarted task resumes from where it left off. The task completes automatically when a batch returns zero entries, and it removes its own metadata on natural completion.
Task ownership is validated using both member ID and process ID. If the original member has been removed from the cluster or restarted (different process ID), another member can take over the task and resume from the last progress marker. If the original member is still registered in the cluster with the same process ID, the command fails.
When a member starts up, it checks for persisted task metadata that matches its own member ID. If found, the task is automatically resumed from the last progress marker without requiring a manual `START` command. This means that if a member is restarted while the task is in progress, the task picks up where it left off.
It is available on the management port (default 3320).
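The ownership rule described above can be sketched as a small decision function. This is an illustration only, not Kronotop's implementation: `can_take_over` and the `registered_members` mapping (member ID to current process ID) are hypothetical names introduced for this example.

```python
def can_take_over(owner_member_id, owner_process_id, registered_members):
    """Decide whether another member may take over the task.

    `registered_members` is a hypothetical mapping of member ID to the
    process ID that member is currently running with.
    """
    current_process_id = registered_members.get(owner_member_id)
    if current_process_id is None:
        return True   # the owner was removed from the cluster
    if current_process_id != owner_process_id:
        return True   # same member, new process: the owner restarted
    return False      # the owner is alive, so the command is rejected


members = {"member-a": "proc-2", "member-b": "proc-7"}

print(can_take_over("member-a", "proc-2", members))  # False
print(can_take_over("member-a", "proc-1", members))  # True
print(can_take_over("member-c", "proc-9", members))  # True
```

The rejected case corresponds to the `ERR Run by another cluster member` error listed below.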
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| ---------------------------------- | ------------------------------------------------------------------- |
| Missing or extra parameters | `ERR invalid number of parameters` |
| Invalid operation value | `ERR invalid operation: <operation>` |
| Task already running | `ERR Task volume:mark-stale-prefixes-task already exists` |
| STOP when no task is running | `ERR Task with name volume:mark-stale-prefixes-task does not exist` |
| REMOVE when no task is running | `ERR Task with name volume:mark-stale-prefixes-task does not exist` |
| Task owned by another alive member | `ERR Run by another cluster member` |
| LOCATE when no metadata exists | `ERR no metadata found` |
## Examples
[Section titled “Examples”](#examples)
**Start the task:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES START
OK
```
**Stop a running task:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES STOP
OK
```
**Remove a task and its metadata:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES REMOVE
OK
```
**Start when the task is already running:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES START
(error) ERR Task volume:mark-stale-prefixes-task already exists
```
**Locate the task owner:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES LOCATE
1# "member_id" => "cdef7490344552d50e60f42b04e1febcaeafd4b4"
2# "process_id" => "000016M5BPGLO0000000xxxx"
3# "external_address" => "192.168.1.10:5484"
4# "internal_address" => "192.168.1.10:3320"
```
**Stop when no task is running:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES STOP
(error) ERR Task with name volume:mark-stale-prefixes-task does not exist
```
# VOLUME.ADMIN PRUNE-CHANGELOG
> Removes changelog entries older than a given retention period to reclaim storage.
Removes changelog entries older than a given retention period to reclaim storage.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN PRUNE-CHANGELOG <volume-name> <retention-period>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------------ | ------- | -------------------------------------------------------------------------- |
| `volume-name` | string | Name of the volume, in `<kind>-shard-<id>` format (e.g. `bucket-shard-0`) |
| `retention-period` | integer | Number of hours of changelog history to retain. Must be greater than zero. |
## Return Value
[Section titled “Return Value”](#return-value)
Simple string `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
Calculates a cutoff timestamp as the current time minus `retention-period` hours, then clears all changelog entries older than the cutoff in a single atomic operation.
It is available on the management port (default 3320).
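As a rough illustration of the cutoff arithmetic, the sketch below computes the timestamp before which entries would be pruned. The function name `changelog_cutoff` is hypothetical, and how Kronotop represents changelog timestamps internally is not covered here.

```python
from datetime import datetime, timedelta, timezone


def changelog_cutoff(retention_hours: int, now: datetime) -> datetime:
    """Return the instant before which changelog entries count as expired."""
    if retention_hours <= 0:
        # Mirrors the documented validation rule for retention-period.
        raise ValueError("retention period must be greater than zero")
    return now - timedelta(hours=retention_hours)


now = datetime(2024, 5, 7, 16, 0, tzinfo=timezone.utc)
cutoff = changelog_cutoff(24, now)
print(cutoff.isoformat())  # 2024-05-06T16:00:00+00:00
```

With a retention period of 24, everything written before the same time on the previous day falls behind the cutoff.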
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| --------------------------------------------------------- | ------------------------------------------------ |
| Missing volume name or retention period parameter | `ERR invalid number of parameters` |
| Retention period is zero or negative | `ERR retention period must be greater than zero` |
| Volume name does not match the `<kind>-shard-<id>` format | `ERR invalid volume name: <volume-name>` |
## Examples
[Section titled “Examples”](#examples)
**Prune changelog entries older than 24 hours:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN PRUNE-CHANGELOG bucket-shard-0 24
OK
```
**Retention period is zero:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN PRUNE-CHANGELOG bucket-shard-0 0
(error) ERR retention period must be greater than zero
```
**Invalid volume name:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN PRUNE-CHANGELOG non-existent-volume 24
(error) ERR invalid volume name: non-existent-volume
```
**Missing parameters:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN PRUNE-CHANGELOG
(error) ERR invalid number of parameters
```
# VOLUME.ADMIN REPLICATION
> Starts or stops volume replication on a standby node.
Starts or stops volume replication on a standby node.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN REPLICATION <operation> <volume-name>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `operation` | string | One of `START` or `STOP` (case-insensitive) |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover names |
**Operations:**
| Operation | Description |
| --------- | -------------------------------------------------------------------------------- |
| `START` | Starts replication. Resumes processing if it was previously stopped. |
| `STOP` | Stops replication and prevents automatic restart until explicitly started again. |
## Return Value
[Section titled “Return Value”](#return-value)
Simple string `OK` on success. `OK` means the request was accepted.
## Behavior
[Section titled “Behavior”](#behavior)
Both operations verify that the current node is listed as a standby in the route for the given volume.
**START** resumes replication processing. Replication normally starts automatically when a standby is assigned via routing, but if it was previously stopped (e.g. for maintenance), this command re-initiates it manually.
**STOP** gracefully shuts down replication and prevents automatic restart until explicitly started again.
This command must be run on the standby node. It is available on the management port (default 3320).
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| -------------------------------------------- | -------------------------------------------------- |
| Missing or extra parameters | `ERR invalid number of parameters` |
| Invalid volume name format | `ERR invalid volume name: <volume-name>` |
| No route found for the volume | `ERR No route found for <volume-name>` |
| Current node is not a standby for the volume | `ERR This node is not a standby for <volume-name>` |
| Invalid operation (not START or STOP) | `ERR unknown subcommand: '<operation>'` |
## Examples
[Section titled “Examples”](#examples)
**Start replication on a standby node:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN REPLICATION START bucket-shard-1
OK
```
**Stop replication on a standby node:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN REPLICATION STOP bucket-shard-1
OK
```
**Node is not a standby:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN REPLICATION START bucket-shard-1
(error) ERR This node is not a standby for bucket-shard-1
```
**Missing parameters:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN REPLICATION
(error) ERR invalid number of parameters
```
# VOLUME.ADMIN SET-STATUS
> Changes the operational status of a named volume.
Changes the operational status of a named volume.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.ADMIN SET-STATUS <volume-name> <status>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ----------------------------------------------------------------------------------- |
| `volume-name` | string | Name of the volume to update, in `<kind>-shard-<id>` format (e.g. `bucket-shard-0`) |
| `status` | string | New operational status: `READWRITE`, `READONLY`, or `INOPERABLE` (case-insensitive) |
## Return Value
[Section titled “Return Value”](#return-value)
Simple string `OK` on success.
## Behavior
[Section titled “Behavior”](#behavior)
Persists the new status. The status input is converted to uppercase before validation.
It does not require cluster initialization. It is available on the management port (default 3320).
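The uppercase-then-validate step can be pictured with the following sketch. It is an assumption-laden illustration: `normalize_status` is a hypothetical helper, and whether the real error message echoes the raw or the uppercased input is not stated here.

```python
VALID_STATUSES = {"READWRITE", "READONLY", "INOPERABLE"}


def normalize_status(raw: str) -> str:
    """Uppercase the input first, then check it against the known statuses."""
    status = raw.upper()
    if status not in VALID_STATUSES:
        # Assumption: the message shape follows the documented error text.
        raise ValueError(f"Invalid volume status: {raw}")
    return status


print(normalize_status("readonly"))   # READONLY
print(normalize_status("ReadWrite"))  # READWRITE
```

Because the conversion happens before validation, `readonly`, `ReadOnly`, and `READONLY` are all accepted as the same status.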
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| -------------------------------------------------- | --------------------------------------------- |
| Missing volume name or status parameter | `ERR invalid number of parameters` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
| Status value is not a valid volume status | `ERR Invalid volume status: <status>` |
## Examples
[Section titled “Examples”](#examples)
**Set a volume to read-only:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN SET-STATUS bucket-shard-0 READONLY
OK
```
**Verify via DESCRIBE:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE bucket-shard-0
1# "name" => "bucket-shard-0"
2# "status" => "READONLY"
3# "data_dir" => "/tmp/kronotop/data/bucket-shard-0"
4# "segment_size" => (integer) 268435456
5# "segments" =>
   1# (integer) 0 =>
      1# "size" => (integer) 268435456
      2# "free_bytes" => (integer) 268435200
      3# "used_bytes" => (integer) 256
      4# "garbage_percentage" => (double) 0.0
      5# "cardinality" => (integer) 1
```
**Invalid status value:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN SET-STATUS bucket-shard-0 INVALID
(error) ERR Invalid volume status: INVALID
```
**Volume not found:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN SET-STATUS non-existent-volume READONLY
(error) ERR Volume: 'non-existent-volume' is not open
```
# VOLUME.ADMIN VACUUM
> Manages garbage collection of stale segment data on a volume.
Manages garbage collection of stale segment data on a volume.
## Overview
[Section titled “Overview”](#overview)
Delete and update operations leave behind unreachable data in segments. Vacuum reclaims this space by evacuating live entries from high-garbage segments into the current writable segment, then destroying the emptied segment files.
`START` returns immediately and processing continues in the background. Only one vacuum can be active on a given volume at a time.
A typical lifecycle:
1. `START`: begin vacuum with a garbage threshold
2. `STATUS`: monitor progress
3. `STOP`: (optional) cancel early if needed
4. `DROP`: clear metadata after the run finishes or is stopped
Metadata from a completed or stopped run persists until explicitly dropped. A new vacuum cannot start while stale metadata exists; run `DROP` first.
All subcommands are available on the management port (default 3320).
An invalid subcommand (not START, STOP, DROP, or STATUS) returns `ERR unknown subcommand: '<subcommand>'`.
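The lifecycle above can be scripted from a client. The sketch below is one possible shape, under stated assumptions: `run_vacuum` is a hypothetical helper, `execute` stands for any callable that sends one command as separate arguments and returns the decoded reply (for example, a client's generic command method), and RESP3 maps are assumed to arrive as dictionaries with string keys. The scripted replies at the bottom are made up so the example runs without a server.

```python
import time


def run_vacuum(execute, volume, threshold, poll_seconds=5.0, sleep=time.sleep):
    """Drive START, poll STATUS until inactive, then DROP the metadata."""
    execute("VOLUME.ADMIN", "VACUUM", "START", volume, str(threshold))
    while True:
        status = execute("VOLUME.ADMIN", "VACUUM", "STATUS", volume)
        if not status["active"]:
            break
        sleep(poll_seconds)
    # Metadata persists until dropped and would block the next START.
    execute("VOLUME.ADMIN", "VACUUM", "DROP", volume)
    return status.get("metadata")


# Demo with a scripted stand-in for a real connection.
scripted = iter([
    "OK",
    {"active": True, "segments": []},
    {"active": False, "metadata": {"result": "COMPLETED"}, "segments": []},
    "OK",
])
sent = []


def fake_execute(*args):
    sent.append(args)
    return next(scripted)


final = run_vacuum(fake_execute, "bucket-shard-0", 30, sleep=lambda _: None)
print(final["result"])  # COMPLETED
```

The helper returns the last `metadata` map before dropping it, since the run statistics are no longer retrievable after `DROP`.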
## Subcommands
[Section titled “Subcommands”](#subcommands)
### START
[Section titled “START”](#start)
Initiates a vacuum run on the specified volume.
**Syntax**
```kronotop
VOLUME.ADMIN VACUUM START <volume-name> <garbage-threshold>
```
**Parameters**
| Parameter | Type | Description |
| ------------------- | ------ | ----------------------------------------------------------------------------------------------------------- |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover volume names |
| `garbage-threshold` | float | Minimum garbage percentage required to vacuum a segment. Must be between 0 and 100 (exclusive on both ends) |
**Return Value**
Simple string `OK`.
**Behavior**
Continuously analyzes all segments on the volume. For each segment whose garbage percentage exceeds the threshold, a worker is spawned to evacuate its live entries into the current writable segment.
The maximum number of concurrent workers is controlled by the `volume.vacuum.max_workers` configuration key, which defaults to `1`. When it is set to `0`, the number of available CPU cores is used instead.
The writable (current active) segment is always skipped. Segments already being processed by a worker are also skipped.
When a worker finishes evacuating all entries from a segment, the segment file is destroyed. If the vacuum is stopped before a worker completes, that segment’s status is marked `STOPPED` and the file is not destroyed.
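The selection rules in this section can be summarized in a short sketch. It is not Kronotop's code: `effective_max_workers` and `eligible_segments` are hypothetical names, and the per-segment garbage percentages are passed in as a plain dictionary for illustration.

```python
import os


def effective_max_workers(configured: int) -> int:
    """Resolve volume.vacuum.max_workers, where 0 means 'use the CPU core count'."""
    if configured == 0:
        return os.cpu_count() or 1  # cpu_count() can return None
    return configured


def eligible_segments(garbage_by_segment, writable_id, in_progress, threshold):
    """Return the IDs of segments that a vacuum run would hand to workers."""
    if not 0 < threshold < 100:
        raise ValueError("garbage-threshold must be between 0 and 100 (exclusive)")
    return sorted(
        segment_id
        for segment_id, garbage in garbage_by_segment.items()
        if segment_id != writable_id       # the writable segment is always skipped
        and segment_id not in in_progress  # already owned by a worker
        and garbage > threshold            # garbage must exceed the threshold
    )


garbage = {0: 45.0, 1: 10.0, 2: 80.0, 3: 60.0}
print(eligible_segments(garbage, writable_id=3, in_progress={2}, threshold=30))  # [0]
print(effective_max_workers(4))  # 4
```

In this made-up volume, segment 1 is below the threshold, segment 2 is already being processed, and segment 3 is the writable segment, so only segment 0 qualifies.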
**Errors**
| Condition | Message |
| -------------------------------------------------- | ------------------------------------------------------------------- |
| A vacuum is already running on this volume | `ERR Vacuum is already running on volume <volume-name>` |
| Metadata from a previous run exists | `ERR Stale vacuum metadata exists on volume <volume-name>, run DROP first` |
| Threshold is not a valid number | `ERR garbage-threshold must be a number` |
| Threshold is out of range | `ERR garbage-threshold must be between 0 and 100 (exclusive)` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
**Examples**
**Start vacuum on bucket-shard-0 with a 30% threshold:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
OK
```
**Attempt to start while already running:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
(error) ERR Vacuum is already running on volume bucket-shard-0
```
**Stale metadata from a previous run:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
(error) ERR Stale vacuum metadata exists on volume bucket-shard-0, run DROP first
```
***
### STOP
[Section titled “STOP”](#stop)
Gracefully stops an active vacuum run.
**Syntax**
```kronotop
VOLUME.ADMIN VACUUM STOP <volume-name>
```
**Parameters**
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover names |
**Return Value**
Simple string `OK`.
**Behavior**
Signals the vacuum to stop. Active workers finish their current batch and then exit. The command blocks until all workers have completed. Final statistics are saved to metadata with result `STOPPED`.
After stopping, the metadata remains in place. Use `DROP` to clear it before starting a new vacuum.
**Errors**
| Condition | Message |
| -------------------------------------------------- | --------------------------------------- |
| No vacuum is running on this volume | `ERR No active vacuum on volume <volume-name>` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
**Examples**
**Stop an active vacuum:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STOP bucket-shard-0
OK
```
**No vacuum to stop:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STOP bucket-shard-0
(error) ERR No active vacuum on volume bucket-shard-0
```
***
### DROP
[Section titled “DROP”](#drop)
Removes all vacuum metadata from the volume.
**Syntax**
```kronotop
VOLUME.ADMIN VACUUM DROP <volume-name>
```
**Parameters**
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover names |
**Return Value**
Simple string `OK`.
**Behavior**
Clears all vacuum metadata (volume-level statistics and per-segment records). The vacuum must be stopped before dropping. Drop metadata before starting a new vacuum run on the same volume.
**Errors**
| Condition | Message |
| -------------------------------------------------- | -------------------------------------------------------------- |
| Vacuum is still running | `ERR Vacuum is still running on volume <volume-name>, run STOP first` |
| No metadata exists | `ERR No active vacuum on volume <volume-name>` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
**Examples**
**Drop metadata after a completed or stopped vacuum:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM DROP bucket-shard-0
OK
```
**Attempt to drop while still running:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM DROP bucket-shard-0
(error) ERR Vacuum is still running on volume bucket-shard-0, run STOP first
```
***
### STATUS
[Section titled “STATUS”](#status)
Returns the current state and progress of a vacuum run.
**Syntax**
```kronotop
VOLUME.ADMIN VACUUM STATUS <volume-name>
```
**Parameters**
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover names |
**Return Value**
RESP3 map with the following top-level fields:
| Field | Type | Description |
| ---------- | ------- | ------------------------------------------------------------------------------ |
| `active` | boolean | Whether a vacuum is currently running |
| `metadata` | map | Volume-level run statistics. Present only after the vacuum run has saved stats |
| `segments` | array | Per-segment progress. Each element is a map |
The `metadata` map contains:
| Field | Type | Description |
| -------------------- | ------- | ------------------------------------------------- |
| `started_at` | integer | Epoch milliseconds when the vacuum run began |
| `completed_at` | integer | Epoch milliseconds when the vacuum run ended |
| `result` | string | Run outcome: `NO_WORK`, `COMPLETED`, or `STOPPED` |
| `segments_processed` | integer | Number of segments that were vacuumed |
Each element in the `segments` array is a map:
| Field | Type | Description |
| ------------ | ------- | ----------------------------------------------------------------- |
| `segment_id` | integer | Segment identifier |
| `status` | string | Current phase: `ANALYZE`, `EVACUATING`, `COMPLETED`, or `STOPPED` |
| `started_at` | integer | Epoch milliseconds when analysis of this segment began |
**Behavior**
Checks whether a vacuum is active, then loads persisted metadata and per-segment records. If no metadata exists (vacuum has never run or was already dropped), an error is returned.
During an active vacuum, `metadata` may not yet be present (it is written when the run completes or is stopped), while `segments` reflects real-time per-segment progress.
**Errors**
| Condition | Message |
| -------------------------------------------------- | --------------------------------------- |
| No metadata exists (never run or already dropped) | `ERR No active vacuum on volume <volume-name>` |
| No volume with that name is managed by this member | `ERR Volume: '<volume-name>' is not open` |
**Examples**
**Status of a completed vacuum:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STATUS bucket-shard-0
1# "active" => (false)
2# "metadata" =>
   1# "started_at" => (integer) 1715100000000
   2# "completed_at" => (integer) 1715100060000
   3# "result" => "COMPLETED"
   4# "segments_processed" => (integer) 3
3# "segments" =>
   1) 1# "segment_id" => (integer) 0
      2# "status" => "COMPLETED"
      3# "started_at" => (integer) 1715100001000
   2) 1# "segment_id" => (integer) 1
      2# "status" => "COMPLETED"
      3# "started_at" => (integer) 1715100002000
   3) 1# "segment_id" => (integer) 2
      2# "status" => "COMPLETED"
      3# "started_at" => (integer) 1715100003000
```
**Status of an active vacuum with workers in progress:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STATUS bucket-shard-0
1# "active" => (true)
2# "segments" =>
   1) 1# "segment_id" => (integer) 0
      2# "status" => "EVACUATING"
      3# "started_at" => (integer) 1715100001000
   2) 1# "segment_id" => (integer) 1
      2# "status" => "ANALYZE"
      3# "started_at" => (integer) 1715100005000
```
**No vacuum has run:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STATUS bucket-shard-0
(error) ERR No active vacuum on volume bucket-shard-0
```
## Concepts
[Section titled “Concepts”](#concepts)
**Garbage Percentage**
Each segment tracks its total size, free bytes (unallocated space), and used bytes (space occupied by live entries). Garbage percentage is calculated as:
```plaintext
garbage_percentage = (size - free_bytes - used_bytes) / size * 100
```
Use `VOLUME.ADMIN DESCRIBE` to inspect per-segment garbage percentages before deciding on a threshold.
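To make the formula concrete, here is a minimal sketch that applies it to the values from the `VOLUME.ADMIN DESCRIBE` example and to a made-up segment. The function name is illustrative and the second set of numbers is hypothetical.

```python
def garbage_percentage(size: int, free_bytes: int, used_bytes: int) -> float:
    """Share of a segment that is neither free nor live, as a percentage."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (size - free_bytes - used_bytes) / size * 100


# Values from the DESCRIBE example: one small entry, nothing reclaimable yet.
fresh = garbage_percentage(size=268435456, free_bytes=268435200, used_bytes=256)

# Hypothetical segment: 1024 bytes total, 256 free, 512 live, so 256 are garbage.
worn = garbage_percentage(size=1024, free_bytes=256, used_bytes=512)

print(fresh)  # 0.0
print(worn)   # 25.0
```

A threshold of 20 would select the second segment for vacuuming, while a threshold of 30 would leave it alone.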
**Segment Status Values**
| Status | Description |
| ------------ | --------------------------------------------------------------------------------- |
| `ANALYZE` | Segment is being analyzed for prefix cardinalities before evacuation begins |
| `EVACUATING` | Live entries are being read from this segment and written to the writable segment |
| `COMPLETED` | All entries have been evacuated and the segment file has been destroyed |
| `STOPPED` | Evacuation was interrupted by a `STOP` command before completion |
**Vacuum Result Values**
| Result | Description |
| ----------- | ------------------------------------------------------------------ |
| `NO_WORK` | No segments exceeded the garbage threshold; nothing was vacuumed |
| `COMPLETED` | All eligible segments were successfully vacuumed |
| `STOPPED` | The vacuum was stopped before all eligible segments were processed |
# VOLUME.INSPECT CURSOR
> Returns the current write cursor for a volume's active segment, including its byte position, versionstamp, and changelog sequence number.
Returns the current write cursor for a volume’s active segment, including its byte position, versionstamp, and changelog sequence number.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.INSPECT CURSOR <volume-name>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `volume-name` | string | Name of the volume to inspect, in `<kind>-shard-<id>` format (e.g. `bucket-shard-0`) |
## Return Value
[Section titled “Return Value”](#return-value)
RESP3 map with the following fields:
| Field | Type | Description |
| ------------------- | ------- | ---------------------------------------------------------------------------------------------------------------- |
| `active_segment_id` | integer | ID of the segment currently accepting writes |
| `versionstamp` | string | Base32-hex encoded versionstamp of the last write, or empty string if no writes have occurred |
| `next_position` | integer | Byte offset where the next write will be appended in the active segment |
| `sequence_number` | integer | Changelog sequence number of the last mutation, or `-1` if the volume is empty or the changelog entry was pruned |
## Behavior
[Section titled “Behavior”](#behavior)
Finds the active segment ID and resolves the sequence number of the last changelog entry. Returns `-1` for `sequence_number` and an empty string for `versionstamp` when the volume has no data.
It is available on the management port (default 3320).
## Errors
[Section titled “Errors”](#errors)
| Condition | Message |
| ----------------------------------------------------- | ---------------------------------- |
| Missing volume-name parameter | `ERR invalid number of parameters` |
| Volume name does not match `<kind>-shard-<id>` format | `ERR invalid volume name: <volume-name>` |
## Examples
[Section titled “Examples”](#examples)
**Volume with data:**
```kronotop
127.0.0.1:3320> VOLUME.INSPECT CURSOR bucket-shard-0
1# "active_segment_id" => (integer) 0
2# "versionstamp" => "A1B2C3D4E5F6G7H8I9J0"
3# "next_position" => (integer) 10240
4# "sequence_number" => (integer) 9
```
**Empty volume:**
```kronotop
127.0.0.1:3320> VOLUME.INSPECT CURSOR bucket-shard-0
1# "active_segment_id" => (integer) 0
2# "versionstamp" => ""
3# "next_position" => (integer) 0
4# "sequence_number" => (integer) -1
```
# VOLUME.INSPECT REPLICATION
> Returns the replication status for a specific standby member on a given shard, including the current stage, cursor position, per-stage progress, and any error message.
Returns the replication status for a specific standby member on a given shard, including the current stage, cursor position, per-stage progress, and any error message.
## Syntax
[Section titled “Syntax”](#syntax)
```kronotop
VOLUME.INSPECT REPLICATION <volume-name> <standby-id>
```
## Parameters
[Section titled “Parameters”](#parameters)
| Parameter | Type | Description |
| ------------- | ------ | ------------------------------------------------------------------------------------ |
| `volume-name` | string | Volume identifier (e.g. `bucket-shard-0`). Use `VOLUME.ADMIN LIST` to discover names |
| `standby-id` | string | Full member ID of the standby, or a unique 4-character prefix |
## Return Value
[Section titled “Return Value”](#return-value)
RESP3 map with the following fields:
| Field | Type | Description |
| --------------------------- | ------ | ----------------------------------------------------------------------------------------------------------- |
| `stage` | string | Current replication stage: `SEGMENT_REPLICATION`, `CHANGE_DATA_CAPTURE`, or empty string if not yet started |
| `cursor` | map | Current replication cursor (see **cursor** below) |
| `status` | string | Replication status: `WAITING`, `RUNNING`, `DONE`, `STOPPED`, `FAILED`, or empty string if not yet started |
| `error_message` | string | Error description if replication has failed, or empty string |
| `cdc_stage` | map | CDC stage progress (see **cdc\_stage** below) |
| `segment_replication_stage` | map | Segment replication progress (see **segment\_replication\_stage** below) |
### `cursor`
[Section titled “cursor”](#cursor)
| Field | Type | Description |
| ------------ | ------- | -------------------------------------------- |
| `segment_id` | integer | ID of the segment currently being replicated |
| `position` | integer | Current byte position within that segment |
### `cdc_stage`
[Section titled “cdc\_stage”](#cdc_stage)
| Field | Type | Description |
| ----------------- | ------- | --------------------------------------------- |
| `sequence_number` | integer | Changelog sequence number of the CDC consumer |
| `position` | integer | Byte position within the current CDC segment |
### `segment_replication_stage`
[Section titled “segment\_replication\_stage”](#segment_replication_stage)
| Field | Type | Description |
| ---------------------- | ------- | ------------------------------------------------------ |
| `tail_sequence_number` | integer | Tail changelog sequence number for segment replication |
| `tail_next_position` | integer | Next byte position after the segment tail |
## Behavior
Reads the full replication status for the given standby and volume. Fields default to empty strings and zeroes when replication has not yet started.
The command is available on the management port (default 3320).
## Errors
| Condition | Message |
| ------------------------------------------------- | ------------------------------------------------------ |
| Missing or extra parameters | `ERR invalid number of parameters` |
| Invalid volume name format | `ERR invalid volume name: ` |
| Invalid member ID format | `ERR Invalid memberId: ` |
| No member found with the given 4-character prefix | `ERR no member found with prefix: ` |
| More than one member matches the prefix | `ERR more than one member found with prefix: ` |
## Examples
**Active replication (running):**
```kronotop
127.0.0.1:3320> VOLUME.INSPECT REPLICATION bucket-shard-0 ab12
1# "stage" => "CHANGE_DATA_CAPTURE"
2# "cursor" => 1# "segment_id" => (integer) 2
2# "position" => (integer) 8192
3# "status" => "RUNNING"
4# "error_message" => ""
5# "cdc_stage" => 1# "sequence_number" => (integer) 15
2# "position" => (integer) 4096
6# "segment_replication_stage" => 1# "tail_sequence_number" => (integer) 10
2# "tail_next_position" => (integer) 6144
```
**Replication not yet started:**
```kronotop
127.0.0.1:3320> VOLUME.INSPECT REPLICATION bucket-shard-0 ab12
1# "stage" => ""
2# "cursor" => 1# "segment_id" => (integer) 0
2# "position" => (integer) 0
3# "status" => ""
4# "error_message" => ""
5# "cdc_stage" => 1# "sequence_number" => (integer) 0
2# "position" => (integer) 0
6# "segment_replication_stage" => 1# "tail_sequence_number" => (integer) 0
2# "tail_next_position" => (integer) 0
```
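A client can condense this reply into a one-line health summary. The sketch below is illustrative only: it assumes the RESP3 map has already been decoded into nested Python dictionaries with string keys (how that happens depends on the client library), and `summarize_replication` is a hypothetical helper, not part of Kronotop.

```python
def summarize_replication(status: dict) -> str:
    """Return a one-line summary of a decoded VOLUME.INSPECT REPLICATION reply."""
    stage = status.get("stage", "")
    state = status.get("status", "")
    # Empty strings mean replication has not started yet.
    if not stage and not state:
        return "not started"
    if state == "FAILED":
        return f"failed: {status.get('error_message', '')}"
    cursor = status.get("cursor", {})
    return (
        f"{state} in {stage} "
        f"(segment {cursor.get('segment_id', 0)}, position {cursor.get('position', 0)})"
    )


# The two replies shown in the examples above, as decoded dictionaries.
running = {
    "stage": "CHANGE_DATA_CAPTURE",
    "cursor": {"segment_id": 2, "position": 8192},
    "status": "RUNNING",
    "error_message": "",
    "cdc_stage": {"sequence_number": 15, "position": 4096},
    "segment_replication_stage": {"tail_sequence_number": 10, "tail_next_position": 6144},
}
not_started = {
    "stage": "",
    "cursor": {"segment_id": 0, "position": 0},
    "status": "",
    "error_message": "",
    "cdc_stage": {"sequence_number": 0, "position": 0},
    "segment_replication_stage": {"tail_sequence_number": 0, "tail_next_position": 0},
}

print(summarize_replication(running))
print(summarize_replication(not_started))
```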
# Volume Operations Guide
> This guide covers routine storage maintenance tasks that reclaim disk space after data deletion.
This guide covers routine storage maintenance tasks that reclaim disk space after data deletion. It is intended for cluster administrators.
***
## How Storage Cleanup Works
When documents are deleted or updated, the metadata in FoundationDB is removed or replaced immediately. However, internal prefix references may become orphaned over time.
The **Mark stale prefixes** admin task scans all prefix references in FoundationDB, identifies those that no longer point to valid data, and clears them. Stale references skew garbage percentage calculations until cleared.
This task does not run automatically. It must be triggered explicitly by an administrator.
***
## Stale Prefix Cleanup
A *prefix* is the internal grouping key that associates stored entries with their logical owner (a bucket). When a bucket or namespace is deleted, the data and metadata it owned become orphaned, but the prefix references in FoundationDB are not automatically cleaned up in every case.
The `MARK-STALE-PREFIXES` task scans all prefix references, detects those whose targets are missing or invalid, and clears them. This makes the associated data eligible for garbage collection.
**Starting the task:**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES START
OK
```
The task runs in the background using batch-priority transactions, so it does not compete with user-facing traffic for FoundationDB resources. It processes prefixes in batches of 10,000 and tracks its progress internally, allowing it to resume from where it left off if restarted.
The task auto-completes when all prefixes have been scanned. No manual stop is required under normal circumstances.
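The batch-and-resume behavior described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the task's actual implementation: the prefix list, the `is_stale` predicate, and the integer cursor are all hypothetical stand-ins for state that the real task keeps internally.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BATCH_SIZE = 10_000  # batch size stated in this guide


def scan_prefixes(
    prefixes: Sequence[bytes],
    is_stale: Callable[[bytes], bool],
    cursor: int = 0,
    max_batches: Optional[int] = None,
) -> Tuple[List[bytes], int]:
    """Scan from `cursor` in batches; return (stale prefixes found, new cursor)."""
    stale: List[bytes] = []
    batches = 0
    while cursor < len(prefixes):
        if max_batches is not None and batches >= max_batches:
            break  # simulates the task being stopped mid-scan
        batch = prefixes[cursor:cursor + BATCH_SIZE]
        stale.extend(p for p in batch if is_stale(p))
        cursor += len(batch)  # progress is recorded after each batch
        batches += 1
    return stale, cursor


# Hypothetical data: 25,000 prefixes, every 5,000th one is stale.
prefixes = [i.to_bytes(4, "big") for i in range(25_000)]
every_5000th = lambda p: int.from_bytes(p, "big") % 5_000 == 0

# First run is interrupted after one batch; the second run resumes from the cursor.
first_stale, first_cursor = scan_prefixes(prefixes, every_5000th, max_batches=1)
rest_stale, final_cursor = scan_prefixes(prefixes, every_5000th, cursor=first_cursor)
print(first_cursor, len(first_stale), final_cursor, len(rest_stale))
```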
**Stopping the task (if needed):**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES STOP
OK
```
Only one instance of the task can run at a time.
***
## Namespace Purge Considerations
`NAMESPACE PURGE` permanently deletes a namespace’s FoundationDB directory, but it does **not** clean up volume data. The bytes on the disk and any orphaned prefix references remain until explicitly reclaimed.
After purging a namespace, run the stale prefix scan to clean up orphaned references:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES START
OK
# Wait for the task to complete
```
For single-bucket deletion, `BUCKET.PURGE` handles its own prefix cleanup immediately, so `MARK-STALE-PREFIXES` is not required.
**Prefer per-bucket deletion over namespace purge.** When possible, delete buckets individually with `BUCKET.REMOVE` followed by `BUCKET.PURGE` before dropping the namespace. Because `BUCKET.PURGE` handles prefix cleanup inline, no separate `MARK-STALE-PREFIXES` pass is needed. Reserve `NAMESPACE PURGE` for cases where the namespace contains too many buckets to delete one by one, or when the namespace must be removed urgently regardless of cleanup cost.
***
## Vacuuming Segments
Over time, delete and update operations leave unreachable data in segments. Marking stale prefixes makes this garbage visible, but does not free disk space. Vacuum is the step that actually reclaims it: live entries are evacuated from high-garbage segments into the current writable segment, and the emptied segment files are destroyed.
**When to run:** After `MARK-STALE-PREFIXES` has completed and `VOLUME.ADMIN DESCRIBE` shows elevated `garbage_percentage` values on one or more segments.
**Typical workflow:**
1. **Start vacuum** on a volume, specifying a garbage threshold. Only segments whose garbage percentage exceeds this value will be processed:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
OK
```
2. **Monitor progress.** Vacuum auto-completes when all eligible segments are processed:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STATUS bucket-shard-0
```
3. **Drop metadata** after the run finishes. This is required before starting a new vacuum on the same volume:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM DROP bucket-shard-0
OK
```
**Stopping early (if needed):**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM STOP bucket-shard-0
OK
```
After stopping, run `DROP` to clear the metadata before starting a new vacuum.
Only one vacuum can run on a given volume at a time. Vacuum runs in the background and does not compete with user-facing traffic.
For full parameter details, error conditions, and status output format, see [VOLUME.ADMIN VACUUM](/docs/volume/commands/volume-admin-vacuum/).
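To make the threshold rule concrete, the sketch below picks the segments that a given threshold would select. It assumes the per-segment figures from `VOLUME.ADMIN DESCRIBE` have already been parsed into dictionaries; the `segment_id` key and the sample numbers are illustrative, and only `garbage_percentage` is named in this guide.

```python
from typing import Dict, List


def segments_to_vacuum(segments: List[Dict], threshold: float) -> List[int]:
    """Return IDs of segments whose garbage percentage exceeds the threshold."""
    # "Exceeds" is strict: a segment exactly at the threshold is not selected.
    return [s["segment_id"] for s in segments if s["garbage_percentage"] > threshold]


# Hypothetical per-segment figures for one volume.
segments = [
    {"segment_id": 0, "garbage_percentage": 72.5},
    {"segment_id": 1, "garbage_percentage": 30.0},
    {"segment_id": 2, "garbage_percentage": 4.1},
]

selected = segments_to_vacuum(segments, 30)
print(selected)
```

With a threshold of 30, only segment 0 qualifies in this sample; segment 1 sits exactly at the threshold and is left alone.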
***
## Recommended Maintenance Procedure
Follow this checklist after namespace purges or periodic maintenance:
1. **Run stale prefix cleanup**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN MARK-STALE-PREFIXES START
OK
```
Wait for the task to auto-complete. It processes in the background and does not require monitoring.
2. **List all volumes**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN LIST
1) bucket-shard-0
2) bucket-shard-1
...
```
3. **Inspect each volume**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE bucket-shard-0
```
Check `garbage_percentage` values per segment.
4. **Vacuum high-garbage volumes**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM START bucket-shard-0 30
OK
```
Run this for each volume where `garbage_percentage` exceeds your chosen threshold. Wait for completion, then drop the metadata:
```kronotop
127.0.0.1:3320> VOLUME.ADMIN VACUUM DROP bucket-shard-0
OK
```
5. **Re-inspect volumes**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN DESCRIBE bucket-shard-0
```
Confirm that the high-garbage segments have been evacuated and deleted.
6. **Clean up orphan files (optional)**
```kronotop
127.0.0.1:3320> VOLUME.ADMIN CLEANUP-ORPHAN-FILES bucket-shard-0
```
Run this if you suspect leftover files from crashes or interrupted operations.
7. **Repeat steps 3–6 for each volume** on each cluster member as needed.
# ZMap
> ZMap is a RESP protocol proxy over FoundationDB's ordered key-value API, with convenience commands for typed numeric operations, atomic mutations, and range queries.
## Overview
ZMap is a RESP protocol proxy over FoundationDB’s ordered key-value API, with convenience commands for typed numeric operations, atomic mutations, and range queries. All operations are scoped to the session’s active namespace.
Keys are stored in lexicographic order, enabling efficient range reads, range deletes, and key selector navigation. All operations inherit FoundationDB’s ACID transaction guarantees.
## Data Model
Keys and values are raw bytes. At the storage level there is no schema and no type system. Interpretation is left to the client. Keys are maintained in lexicographic order by FoundationDB, which enables ordered scans and range operations without secondary indexes.
Each ZMap entry is scoped to the session’s current namespace. The same key in different namespaces refers to different entries, providing full data isolation between namespaces.
## Typed Numeric Operations
On top of the raw byte layer, ZMap provides typed set, increment, and read commands for three numeric encodings:
| Encoding | Set | Increment | Read | Size | Conflict behavior |
| ---------- | ----------- | ----------- | ----------- | -------- | ---------------------------- |
| int64 | `ZSET.I64` | `ZINC.I64` | `ZGET.I64` | 8 bytes | Conflict-free (atomic `ADD`) |
| float64 | `ZSET.F64` | `ZINC.F64` | `ZGET.F64` | 8 bytes | Read-modify-write |
| decimal128 | `ZSET.D128` | `ZINC.D128` | `ZGET.D128` | 16 bytes | Read-modify-write |
`ZINC.I64` is conflict-free because it maps directly to FoundationDB’s [atomic `ADD` mutation](https://apple.github.io/foundationdb/developer-guide.html#atomic-operations), so concurrent increments on the same key never cause transaction conflicts. `ZINC.F64` and `ZINC.D128` perform a read-modify-write cycle because FoundationDB has no native atomic mutation for floating-point or decimal addition; concurrent calls on the same key can conflict.
If a key does not exist, all increment commands create it with an implicit starting value of zero.
```kronotop
> ZINC.I64 counter 10
OK
> ZINC.I64 counter 20
OK
> ZGET.I64 counter
(integer) 30
```
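At the byte level, the `counter` example above can be modeled as follows. This is a sketch, not Kronotop's implementation: it assumes the stored value is an 8-byte little-endian two's-complement integer (the layout `ZGET.I64` decodes) and that addition wraps on overflow; `atomic_add_i64` is a hypothetical helper name.

```python
import struct

MASK_64 = 0xFFFFFFFFFFFFFFFF


def atomic_add_i64(stored: bytes, delta: int) -> bytes:
    """Model a little-endian 64-bit ADD; a missing key (b"") counts as zero."""
    current = struct.unpack("<q", stored)[0] if stored else 0
    total = (current + delta) & MASK_64  # wrap modulo 2**64 (assumed overflow behavior)
    return struct.pack("<Q", total)


value = b""                        # key does not exist yet
value = atomic_add_i64(value, 10)  # ZINC.I64 counter 10
value = atomic_add_i64(value, 20)  # ZINC.I64 counter 20
counter = struct.unpack("<q", value)[0]
print(counter)  # 30
```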
## Atomic Mutations
`ZMUTATE` exposes FoundationDB’s conflict-free [atomic mutation primitives](https://apple.github.io/foundationdb/developer-guide.html#atomic-operations) through a single command. Because mutations do not read the current value before writing, concurrent mutations on the same key do not cause transaction conflicts.
Available mutation types:
| Type | Description |
| -------------------------------- | ------------------------------------------------------------------ |
| `ADD` | Little-endian integer addition |
| `BIT_AND` / `BIT_OR` / `BIT_XOR` | Bitwise operations |
| `MIN` / `MAX` | Unsigned little-endian integer minimum / maximum |
| `BYTE_MIN` / `BYTE_MAX` | Lexicographic byte-string minimum / maximum |
| `COMPARE_AND_CLEAR` | Clears the key if its current value equals the operand |
| `APPEND_IF_FITS` | Appends the operand if the result fits within the value size limit |
| `SET_VERSIONSTAMPED_VALUE` | Sets a value with a versionstamp embedded in the value |
The operand is the raw byte representation of the value. For `ADD`, it is a little-endian signed 64-bit integer. The example below atomically adds 5 to `counter`. `\x05\x00\x00\x00\x00\x00\x00\x00` is the 8-byte little-endian encoding of 5:
```kronotop
> ZMUTATE counter "\x05\x00\x00\x00\x00\x00\x00\x00" ADD
OK
```
In client SDKs, the operand is a byte array:
Java
```java
ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(5).array();
```
Python
```python
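struct.pack('<q', 5)  # 8-byte little-endian signed 64-bit integer
```
The ordered key space also supports range reads. For example, `ZGETRANGE` with `LIMIT` and `REVERSE` returns the last keys of a range in descending order:
```kronotop
> ZGETRANGE key-0 key-5 LIMIT 3 REVERSE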
1) 1) "key-5"
2) "f"
2) 1) "key-4"
2) "e"
3) 1) "key-3"
2) "d"
```
## Transaction Support
By default, each ZMap command auto-commits: Kronotop creates a FoundationDB transaction, executes the command, and commits immediately. To group multiple commands into a single atomic unit, wrap them in `BEGIN` / `COMMIT`:
```kronotop
> BEGIN
OK
> ZSET key1 100
OK
> ZSET key2 200
OK
> COMMIT
OK
```
`ROLLBACK` discards all uncommitted changes. Read commands (`ZGET`, `ZGETRANGE`, `ZGETKEY`, `ZGETRANGESIZE`) support snapshot reads via `SNAPSHOTREAD ON`, which avoids read conflict ranges for higher throughput on read-heavy workloads.
See [Transactions](/docs/transactions/) for details on explicit transactions, snapshot reads, and FoundationDB constraints.
## Commands
| Command | Description |
| --------------------------------------------------- | ------------------------------------------ |
| [ZSET](/docs/zmap/commands/zset/) | Set a key-value pair |
| [ZGET](/docs/zmap/commands/zget/) | Get the value for a key |
| [ZDEL](/docs/zmap/commands/zdel/) | Delete a key |
| [ZDELRANGE](/docs/zmap/commands/zdelrange/) | Delete a range of keys |
| [ZGETRANGE](/docs/zmap/commands/zgetrange/) | Get an ordered range of key-value pairs |
| [ZGETKEY](/docs/zmap/commands/zgetkey/) | Resolve a key name by key selector |
| [ZGETRANGESIZE](/docs/zmap/commands/zgetrangesize/) | Get the estimated byte size of a key range |
| [ZMUTATE](/docs/zmap/commands/zmutate/) | Perform an atomic mutation on a key |
| [ZSET.I64](/docs/zmap/commands/zseti64/) | Set a value as a signed 64-bit integer |
| [ZINC.I64](/docs/zmap/commands/zinci64/) | Atomically increment a 64-bit integer |
| [ZGET.I64](/docs/zmap/commands/zgeti64/) | Get a value as a signed 64-bit integer |
| [ZSET.F64](/docs/zmap/commands/zsetf64/) | Set a value as a double-precision float |
| [ZINC.F64](/docs/zmap/commands/zincf64/) | Increment a 64-bit floating-point value |
| [ZGET.F64](/docs/zmap/commands/zgetf64/) | Get a value as a double-precision float |
| [ZSET.D128](/docs/zmap/commands/zsetd128/) | Set a value as a decimal128 number |
| [ZINC.D128](/docs/zmap/commands/zincd128/) | Increment a 128-bit decimal value |
| [ZGET.D128](/docs/zmap/commands/zgetd128/) | Get a value as a decimal128 number |
# ZDEL
> Deletes a key from the ZMap ordered key-value store.
Deletes a key from the ZMap ordered key-value store.
## Syntax
```kronotop
ZDEL <key>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | ------------------ |
| `key` | bytes | Yes | The key to delete. |
## Return Value
Simple string: `OK` on success.
## Behavior
`ZDEL` removes a key and its associated value from the ZMap subspace of the session’s current namespace, backed by FoundationDB.
The operation is idempotent: deleting a non-existent key returns `OK` without raising an error.
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the delete, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the delete is staged in the current transaction and only takes effect when `COMMIT` is called.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ---------- | ---------------------------------------------- |
| `ERR` | Wrong number of arguments or internal failure. |
## Examples
**Delete an existing key:**
```kronotop
> ZSET mykey "Hello"
OK
> ZDEL mykey
OK
> ZGET mykey
(nil)
```
**Delete a non-existent key:**
```kronotop
> ZDEL nosuchkey
OK
```
**Use within an explicit transaction:**
```kronotop
> ZSET mykey "Hello"
OK
> BEGIN
OK
> ZDEL mykey
OK
> COMMIT
OK
> ZGET mykey
(nil)
```
# ZDELRANGE
> Deletes a range of keys from the ZMap ordered key-value store.
Deletes a range of keys from the ZMap ordered key-value store.
## Syntax
```kronotop
ZDELRANGE <begin> <end>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | --------------------------------------------------------------------------------------------- |
| `begin` | bytes | Yes | The start key of the range. Use `*` for unbounded start (from the beginning of the subspace). |
| `end` | bytes | Yes | The end key of the range (exclusive). Use `*` for unbounded end (to the end of the subspace). |
## Return Value
Simple string: `OK` on success.
## Behavior
`ZDELRANGE` removes all keys in the half-open interval \[begin, end) from the ZMap subspace of the session’s current namespace. The begin key is inclusive, and the end key is exclusive: a key equal to `end` is **not** deleted.
This follows FoundationDB’s native range-clear semantics, which always operate on half-open intervals.
The special value `*` can be used as a wildcard to represent an unbounded boundary:
* `*` as `begin`: starts the range from the very first key in the subspace.
* `*` as `end`: extends the range to the very last key in the subspace.
The operation is idempotent: clearing a range that contains no keys returns `OK` without raising an error.
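The half-open interval and `*` wildcard rules can be checked against a small in-memory model. The sketch below stands in a Python dictionary for the ZMap subspace; `del_range` is a hypothetical helper that mimics the semantics described above, not a client API.

```python
from bisect import bisect_left
from typing import Dict


def del_range(store: Dict[bytes, bytes], begin: bytes, end: bytes) -> None:
    """Remove keys in [begin, end); b"*" marks an unbounded side."""
    keys = sorted(store)  # bytes sort lexicographically, like ZMap keys
    lo = 0 if begin == b"*" else bisect_left(keys, begin)
    hi = len(keys) if end == b"*" else bisect_left(keys, end)
    for key in keys[lo:hi]:  # empty slice when the range matches nothing
        del store[key]


# Same data as the ZDELRANGE example: key-0..key-5 mapped to "a".."f".
store = {f"key-{i}".encode(): c.encode() for i, c in enumerate("abcdef")}

del_range(store, b"key-0", b"key-5")   # end is exclusive, so key-5 survives
print(store)

del_range(store, b"key-0", b"key-5")   # idempotent: nothing left to remove
del_range(store, b"*", b"*")           # both sides unbounded: clears everything
print(store)
```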
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the range delete, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the range delete is staged in the current transaction and only takes effect when `COMMIT` is called.
All data is scoped to the session’s active namespace. The same keys in different namespaces refer to different entries.
## Errors
| Error Code | Description |
| ---------- | ---------------------------------------------- |
| `ERR` | Wrong number of arguments or internal failure. |
## Examples
**Delete an explicit key range (end-exclusive):**
```kronotop
> ZSET key-0 "a"
OK
> ZSET key-1 "b"
OK
> ZSET key-2 "c"
OK
> ZSET key-3 "d"
OK
> ZSET key-4 "e"
OK
> ZSET key-5 "f"
OK
> ZDELRANGE key-0 key-5
OK
> ZGET key-0
(nil)
> ZGET key-4
(nil)
> ZGET key-5
"f"
```
**Delete from the start of the subspace with `*`:**
```kronotop
> ZDELRANGE * key-5
OK
```
**Delete to the end of the subspace with `*`:**
```kronotop
> ZDELRANGE key-5 *
OK
```
**Use within an explicit transaction:**
```kronotop
> ZSET mykey "Hello"
OK
> BEGIN
OK
> ZDELRANGE mykey *
OK
> COMMIT
OK
> ZGET mykey
(nil)
```
# ZGET
> Retrieves the value for a key from the ZMap ordered key-value store.
Retrieves the value for a key from the ZMap ordered key-value store.
## Syntax
```kronotop
ZGET <key>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | ------------------- |
| `key` | bytes | Yes | The key to look up. |
## Return Value
Bulk string: the value associated with the key, or `nil` if the key does not exist.
## Behavior
`ZGET` reads the value for a given key from the ZMap subspace of the session’s current namespace, backed by FoundationDB.
If the key does not exist, the command returns `nil`.
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGET` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ---------- | ---------------------------------------------- |
| `ERR` | Wrong number of arguments or internal failure. |
## Examples
**Get an existing key:**
```kronotop
> ZSET mykey "Hello"
OK
> ZGET mykey
"Hello"
```
**Get a non-existent key:**
```kronotop
> ZGET nosuchkey
(nil)
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZSET mykey "Hello"
OK
> COMMIT
OK
> BEGIN
OK
> ZGET mykey
"Hello"
> COMMIT
OK
```
# ZGET.D128
> Retrieves the value for a key from the ZMap ordered key-value store as an IEEE-754 decimal128 number, returning a plain decimal string.
Retrieves the value for a key from the ZMap ordered key-value store as an [IEEE-754 decimal128](https://en.wikipedia.org/wiki/Decimal128_floating-point_format) number, returning a plain decimal string.
## Syntax
```kronotop
ZGET.D128 <key>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | ------------------- |
| `key` | bytes | Yes | The key to look up. |
## Return Value
Bulk string: the stored value decoded as a plain decimal string, or `nil` if the key does not exist.
## Behavior
`ZGET.D128` reads the value for a given key from the ZMap subspace of the session’s current namespace and interprets it as an IEEE-754 decimal128 (BID encoding). It is the typed read companion to `ZSET.D128` and `ZINC.D128`. While `ZGET` returns raw bytes, `ZGET.D128` decodes the stored 16-byte little-endian value and returns it as a plain decimal string (no scientific notation).
Decimal128 provides 34 significant digits of precision, making it suitable for financial calculations and other use cases where exact decimal representation matters.
* If the key does not exist, the command returns `nil`.
* If the stored value is not exactly 16 bytes, the command returns an error. This can happen when a key was written with `ZSET` using a value that is not a valid 16-byte Decimal128 encoding.
* If the stored Decimal128 value is not representable as a BigDecimal (e.g., NaN or Infinity), the command returns an error.
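As a rough illustration of what 34 significant digits of exact decimal arithmetic provide, Python's standard `decimal` module can be configured with the same precision. This only mirrors the arithmetic; it does not decode the stored BID bytes, and the `D128` context name is an illustrative choice.

```python
from decimal import Context, Decimal

# A context with decimal128's precision: 34 significant digits.
D128 = Context(prec=34)

# Three increments of 2.5, as in the "multiple increments" example.
total = Decimal(0)
for _ in range(3):
    total = D128.add(total, Decimal("2.5"))
print(format(total, "f"))  # plain decimal string, no scientific notation

# A 34-digit value survives unchanged.
pi = D128.create_decimal("3.141592653589793238462643383279502")
print(format(pi, "f"))

# Binary floating point cannot represent 0.1 or 0.2 exactly; decimal can.
exact_sum = D128.add(Decimal("0.1"), Decimal("0.2"))
print(0.1 + 0.2 == 0.3, exact_sum == Decimal("0.3"))
```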
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGET.D128` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
**`ERR`** is returned when:
* Wrong number of arguments or internal failure.
* `Invalid stored value: expected 16-byte Decimal128 (IEEE-754 BID)`: The stored value is not exactly 16 bytes and cannot be decoded as a Decimal128.
* `Invalid stored value: Decimal128 is not representable as BigDecimal`: The stored value is a valid Decimal128 but cannot be converted to a BigDecimal (e.g., NaN or Infinity).
## Examples
**Read a value set by ZINC.D128:**
```kronotop
> ZINC.D128 counter 123.456
OK
> ZGET.D128 counter
"123.456"
```
**Read a non-existent key:**
```kronotop
> ZGET.D128 nosuchkey
(nil)
```
**Read after multiple increments:**
```kronotop
> ZINC.D128 counter 2.5
OK
> ZINC.D128 counter 2.5
OK
> ZINC.D128 counter 2.5
OK
> ZGET.D128 counter
"7.5"
```
**Read a high-precision value (34 digits):**
```kronotop
> ZINC.D128 pi 3.141592653589793238462643383279502
OK
> ZGET.D128 pi
"3.141592653589793238462643383279502"
```
**Read an invalid (non-16-byte) value:**
```kronotop
> ZSET badkey "abc"
OK
> ZGET.D128 badkey
(error) ERR Invalid stored value: expected 16-byte Decimal128 (IEEE-754 BID)
```
# ZGET.F64
> Retrieves the value for a key from the ZMap ordered key-value store as an IEEE-754 double-precision floating-point number.
Retrieves the value for a key from the ZMap ordered key-value store as an IEEE-754 double-precision floating-point number.
## Syntax
```kronotop
ZGET.F64 <key>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | ------------------- |
| `key` | bytes | Yes | The key to look up. |
## Return Value
Double: the stored value decoded as an IEEE-754 double, or `nil` if the key does not exist.
## Behavior
`ZGET.F64` reads the value for a given key from the ZMap subspace of the session’s current namespace and interprets it as an IEEE-754 double-precision floating-point number. It is the typed read companion to `ZSET.F64` and `ZINC.F64`. While `ZGET` returns raw bytes, `ZGET.F64` decodes the stored 8-byte little-endian value and returns it as a double type in the RESP3 protocol.
* If the key does not exist, the command returns `nil`.
* If the stored value is not exactly 8 bytes, the command returns an error. This can happen when a key was written with `ZSET` using a value that is not a valid 8-byte double encoding.
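The decoding rules above can be reproduced on the client side with Python's `struct` module, which is useful when reading the raw bytes through `ZGET`. This is a minimal sketch: `decode_f64` is a hypothetical helper, `None` stands in for a missing key, and the error text simply echoes the message documented below.

```python
import struct
from typing import Optional


def decode_f64(stored: Optional[bytes]) -> Optional[float]:
    """Decode an 8-byte little-endian IEEE-754 double; None means no such key."""
    if stored is None:
        return None
    if len(stored) != 8:
        raise ValueError("Invalid stored value: expected 8-byte IEEE-754 double")
    return struct.unpack("<d", stored)[0]


raw = struct.pack("<d", 123.456)  # the byte layout a typed write would store
print(decode_f64(raw))            # 123.456
print(decode_f64(None))           # None

try:
    decode_f64(b"abc")            # 3 bytes, like the "badkey" example
except ValueError as exc:
    print(exc)
```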
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGET.F64` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
**`ERR`** is returned when:
* Wrong number of arguments or internal failure.
* `Invalid stored value: expected 8-byte IEEE-754 double`: The stored value is not exactly 8 bytes and cannot be decoded as a double.
## Examples
**Read a value set by ZINC.F64:**
```kronotop
> ZINC.F64 counter 123.456
OK
> ZGET.F64 counter
(double) 123.456
```
**Read a non-existent key:**
```kronotop
> ZGET.F64 nosuchkey
(nil)
```
**Read after multiple increments:**
```kronotop
> ZINC.F64 counter 2.5
OK
> ZINC.F64 counter 2.5
OK
> ZINC.F64 counter 2.5
OK
> ZGET.F64 counter
(double) 7.5
```
**Read an invalid (non-8-byte) value:**
```kronotop
> ZSET badkey "abc"
OK
> ZGET.F64 badkey
(error) ERR Invalid stored value: expected 8-byte IEEE-754 double
```
# ZGET.I64
> Retrieves the value for a key from the ZMap ordered key-value store as a signed 64-bit integer.
Retrieves the value for a key from the ZMap ordered key-value store as a signed 64-bit integer.
## Syntax
```kronotop
ZGET.I64 <key>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | ------------------- |
| `key` | bytes | Yes | The key to look up. |
## Return Value
Integer: the stored value decoded as a signed 64-bit integer, or `nil` if the key does not exist.
## Behavior
`ZGET.I64` reads the value for a given key from the ZMap subspace of the session’s current namespace and interprets it as a signed 64-bit integer. It is the typed read companion to `ZSET.I64` and `ZINC.I64`. While `ZGET` returns raw bytes, `ZGET.I64` decodes the stored 8-byte little-endian value and returns it as an integer type in the RESP3 protocol.
If the key does not exist, the command returns `nil`.
If the stored value is not exactly 8 bytes, the command returns an error. This can happen when a key was written with `ZSET` using a value that is not a valid 8-byte integer encoding.
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGET.I64` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
**`ERR`** is returned when:
* Wrong number of arguments or internal failure.
* `Invalid stored value: expected 8-byte two's-complement int64`: The stored value is not exactly 8 bytes and cannot be decoded as a 64-bit integer.
## Examples
**Read a value set by ZINC.I64:**
```kronotop
> ZINC.I64 counter 12345
OK
> ZGET.I64 counter
(integer) 12345
```
**Read a non-existent key:**
```kronotop
> ZGET.I64 nosuchkey
(nil)
```
**Read after multiple increments:**
```kronotop
> ZINC.I64 counter 5
OK
> ZINC.I64 counter 5
OK
> ZINC.I64 counter 5
OK
> ZGET.I64 counter
(integer) 15
```
**Read an invalid (non-8-byte) value:**
```kronotop
> ZSET badkey "abc"
OK
> ZGET.I64 badkey
(error) ERR Invalid stored value: expected 8-byte two's-complement int64
```
# ZGETKEY
> Resolves and returns the key name that matches a key selector relative to a reference key in the ZMap ordered key-value store.
Resolves and returns the key name that matches a key selector relative to a reference key in the ZMap ordered key-value store.
## Syntax
```kronotop
ZGETKEY <key> [KEY_SELECTOR selector]
```
## Parameters
| Parameter | Type | Required | Description |
| -------------- | ------ | -------- | ---------------------------------------------------------------- |
| `key` | bytes | Yes | The reference key for the selector lookup. |
| `KEY_SELECTOR` | string | No | The key selector strategy. Defaults to `first_greater_or_equal`. |
## Key Selectors
| Selector | Description |
| ------------------------ | -------------------------------------------------------------------------------------- |
| `first_greater_or_equal` | Returns the first key greater than or equal to the reference key. This is the default. |
| `first_greater_than` | Returns the first key strictly greater than the reference key. |
| `last_less_than` | Returns the last key strictly less than the reference key. |
| `last_less_or_equal` | Returns the last key less than or equal to the reference key. |
## Return Value
Bulk string: the resolved key name, or `nil` if no key matches the selector.
## Behavior
`ZGETKEY` resolves a key using FoundationDB’s KeySelector mechanism. Instead of returning a value, it returns the *key name* that satisfies the given selector relative to the reference key. This enables cursor-like navigation over the ordered keyspace: finding the next key, previous key, or nearest match without knowing exact key names ahead of time.
The default selector is `first_greater_or_equal`, which returns the reference key itself if it exists, or the next key in order if it does not.
If no key in the keyspace satisfies the selector, the command returns `nil`.
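The selector semantics described above can be modelled with a binary search over a sorted key list. The following is a minimal sketch, not Kronotop's implementation: the `resolve` helper is hypothetical, and it assumes keys compare as raw bytes in lexicographic order.

```python
from bisect import bisect_left, bisect_right


def resolve(keys, ref, selector="first_greater_or_equal"):
    """Return the key in the sorted list `keys` that `selector` picks
    relative to `ref`, or None when no key qualifies.

    Hypothetical helper for illustration only; it mirrors the selector
    table above, not Kronotop's internals.
    """
    if selector == "first_greater_or_equal":
        i = bisect_left(keys, ref)       # first index with keys[i] >= ref
    elif selector == "first_greater_than":
        i = bisect_right(keys, ref)      # first index with keys[i] > ref
    elif selector == "last_less_than":
        i = bisect_left(keys, ref) - 1   # last index with keys[i] < ref
    elif selector == "last_less_or_equal":
        i = bisect_right(keys, ref) - 1  # last index with keys[i] <= ref
    else:
        raise ValueError("invalid key selector")
    return keys[i] if 0 <= i < len(keys) else None


keys = [b"key-0", b"key-1"]
next_key = resolve(keys, b"key-0", "first_greater_than")
prev_key = resolve(keys, b"key-1", "last_less_than")
```

Because the reference key does not have to exist, the same lookup works as a cursor: pass the last key you saw with `first_greater_than` to step forward, or with `last_less_than` to step backward.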
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGETKEY` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ---------- | --------------------------------------------------------------------- |
| `ERR` | Wrong number of arguments, invalid key selector, or internal failure. |
## Examples
**Get an exact key (default selector):**
```kronotop
> ZSET mykey "Hello"
OK
> ZGETKEY mykey
"mykey"
```
**Get the next key after a reference key:**
```kronotop
> ZSET key-0 "value-0"
OK
> ZSET key-1 "value-1"
OK
> ZGETKEY key-0 KEY_SELECTOR first_greater_than
"key-1"
```
**Get the previous key before a reference key:**
```kronotop
> ZSET key-0 "value-0"
OK
> ZSET key-1 "value-1"
OK
> ZGETKEY key-1 KEY_SELECTOR last_less_than
"key-0"
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZSET key-0 "value-0"
OK
> ZSET key-1 "value-1"
OK
> ZGETKEY key-0 KEY_SELECTOR first_greater_than
"key-1"
> COMMIT
OK
```
# ZGETRANGE
> Retrieves an ordered range of key-value pairs from the ZMap ordered key-value store.
## Syntax
```kronotop
ZGETRANGE <begin> <end> [LIMIT count] [REVERSE] [BEGIN_KEY_SELECTOR selector] [END_KEY_SELECTOR selector]
```
## Parameters
| Parameter | Type | Required | Description |
| -------------------- | ------- | -------- | --------------------------------------------------------------------------------------------- |
| `begin` | bytes | Yes | The start key of the range. Use `*` for unbounded start (from the beginning of the subspace). |
| `end` | bytes | Yes | The end key of the range. Use `*` for unbounded end (to the end of the subspace). |
| `LIMIT` | integer | No | Maximum number of key-value pairs to return. Default is `100`. |
| `REVERSE` | flag | No | When present, reverses the scan direction so results are returned in descending key order. |
| `BEGIN_KEY_SELECTOR` | string | No | Controls how the begin boundary is resolved. Default is `first_greater_or_equal`. |
| `END_KEY_SELECTOR` | string | No | Controls how the end boundary is resolved. Default is `first_greater_than`. |
## Key Selectors
Key selectors control exactly which keys are included at the range boundaries.
| Selector | Description |
| ------------------------ | ------------------------------------------------------------------------------------------------------- |
| `first_greater_or_equal` | The first key greater than or equal to the specified key. This is the default for `BEGIN_KEY_SELECTOR`. |
| `first_greater_than` | The first key strictly greater than the specified key. This is the default for `END_KEY_SELECTOR`. |
| `last_less_than` | The last key strictly less than the specified key. |
| `last_less_or_equal` | The last key less than or equal to the specified key. |
With the default selectors, both the begin key and the end key are **inclusive**. FoundationDB reads a half-open interval over the *resolved* keys, and `first_greater_than` resolves the end boundary to the first key *after* the specified end key, so the specified end key itself falls inside the interval.
## Return Value
Array: a list of `[key, value]` pairs. Each pair is a two-element array containing the key and its associated value as bulk strings. Returns an empty array if no keys match the range.
## Behavior
`ZGETRANGE` reads an ordered range of key-value pairs from the ZMap subspace of the session’s current namespace, backed by FoundationDB.
With the default key selectors, both endpoints are **inclusive**: the begin key and the end key both appear in the results. FoundationDB natively uses half-open intervals; the default `END_KEY_SELECTOR first_greater_than` resolves the end boundary to the first key *after* the specified end key, which places the specified end key inside that half-open interval.
The special value `*` can be used as a wildcard to represent an unbounded boundary:
* `*` as `begin`: starts the range from the very first key in the subspace.
* `*` as `end`: extends the range to the very last key in the subspace.
The `LIMIT` parameter caps the number of returned pairs. Combining `LIMIT` with `REVERSE` retrieves the last N entries in a range.
Key selectors allow fine-tuning of the range boundaries. For example, using `BEGIN_KEY_SELECTOR first_greater_than` excludes the begin key from the results.
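To make the boundary rules concrete, here is a small in-memory model of the range read. It is a sketch under stated assumptions: `range_read` and `_resolve_index` are hypothetical names, `*` is treated as "no bound" on that side, and the window is the half-open slice between the two resolved positions.

```python
from bisect import bisect_left, bisect_right


def _resolve_index(keys, ref, selector):
    """Index of the key chosen by `selector`; len(keys) means past the end."""
    if selector == "first_greater_or_equal":
        return bisect_left(keys, ref)
    if selector == "first_greater_than":
        return bisect_right(keys, ref)
    if selector == "last_less_than":
        return bisect_left(keys, ref) - 1
    if selector == "last_less_or_equal":
        return bisect_right(keys, ref) - 1
    raise ValueError("invalid key selector")


def range_read(data, begin, end, limit=100, reverse=False,
               begin_selector="first_greater_or_equal",
               end_selector="first_greater_than"):
    """Model of the documented behavior over a dict of bytes -> bytes."""
    keys = sorted(data)
    # A selector that resolves before the first key is clamped to index 0.
    lo = 0 if begin == b"*" else max(_resolve_index(keys, begin, begin_selector), 0)
    hi = len(keys) if end == b"*" else max(_resolve_index(keys, end, end_selector), 0)
    window = keys[lo:hi]        # half-open interval over resolved positions
    if reverse:
        window.reverse()        # scan from the end of the window
    return [(k, data[k]) for k in window[:limit]]


data = {f"key-{i}".encode(): v for i, v in enumerate([b"a", b"b", b"c", b"d", b"e", b"f"])}
both_inclusive = range_read(data, b"key-0", b"key-5")
last_three = range_read(data, b"key-0", b"key-5", limit=3, reverse=True)
exclusive_begin = range_read(data, b"key-0", b"key-5",
                             begin_selector="first_greater_than")
```

With the defaults, `key-5` is inside the window because the end position resolves to one past it. Switching `end_selector` to `first_greater_or_equal` in this model gives the usual end-exclusive range.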
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the range read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the range read is performed within the current transaction.
`ZGETRANGE` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same keys in different namespaces refer to different entries.
## Errors
| Error Code | Description |
| ---------- | ------------------------------------------------------------------------------------------ |
| `ERR` | Wrong number of arguments, invalid LIMIT value, invalid key selector, or internal failure. |
## Examples
**Get an explicit key range (both endpoints inclusive by default):**
```kronotop
> ZSET key-0 "a"
OK
> ZSET key-1 "b"
OK
> ZSET key-2 "c"
OK
> ZSET key-3 "d"
OK
> ZSET key-4 "e"
OK
> ZSET key-5 "f"
OK
> ZGETRANGE key-0 key-5
1) 1) "key-0"
2) "a"
2) 1) "key-1"
2) "b"
3) 1) "key-2"
2) "c"
4) 1) "key-3"
2) "d"
5) 1) "key-4"
2) "e"
6) 1) "key-5"
2) "f"
```
**Limit results:**
```kronotop
> ZGETRANGE key-0 key-5 LIMIT 3
1) 1) "key-0"
2) "a"
2) 1) "key-1"
2) "b"
3) 1) "key-2"
2) "c"
```
**Reverse with limit (last 3 entries):**
```kronotop
> ZGETRANGE key-0 key-5 LIMIT 3 REVERSE
1) 1) "key-5"
2) "f"
2) 1) "key-4"
2) "e"
3) 1) "key-3"
2) "d"
```
**Full subspace scan with `* *`:**
```kronotop
> ZGETRANGE * *
1) 1) "key-0"
2) "a"
2) 1) "key-1"
2) "b"
...
6) 1) "key-5"
2) "f"
```
**Partial wildcard (from a specific key to the end):**
```kronotop
> ZGETRANGE key-2 *
1) 1) "key-2"
2) "c"
2) 1) "key-3"
2) "d"
3) 1) "key-4"
2) "e"
4) 1) "key-5"
2) "f"
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZSET mykey-a "alpha"
OK
> ZSET mykey-b "bravo"
OK
> COMMIT
OK
> BEGIN
OK
> ZGETRANGE mykey-a mykey-b
1) 1) "mykey-a"
2) "alpha"
2) 1) "mykey-b"
2) "bravo"
> COMMIT
OK
```
# ZGETRANGESIZE
> Returns the estimated byte size of a key range in the ZMap ordered key-value store.
## Syntax
```kronotop
ZGETRANGESIZE <begin> <end>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ----- | -------- | --------------------------------------------------------------------------------------------- |
| `begin` | bytes | Yes | The start key of the range. Use `*` for unbounded start (from the beginning of the subspace). |
| `end` | bytes | Yes | The end key of the range. Use `*` for unbounded end (to the end of the subspace). |
## Return Value
Integer: the estimated size in bytes of the key range.
## Behavior
`ZGETRANGESIZE` returns the estimated byte size of a key range from the ZMap subspace of the session’s current namespace, backed by FoundationDB’s `getEstimatedRangeSizeBytes` API.
The returned value is an **estimate**, not an exact byte size. It is useful for capacity planning and for understanding data distribution without materializing the range.
The special value `*` can be used as a wildcard to represent an unbounded boundary:
* `*` as `begin`: starts the range from the very first key in the subspace.
* `*` as `end`: extends the range to the very last key in the subspace.
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the read, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the read is performed within the current transaction.
`ZGETRANGESIZE` also supports **snapshot reads**. When snapshot mode is enabled on the session, the read does not conflict with concurrent writes, allowing higher throughput for read-heavy workloads.
All data is scoped to the session’s active namespace. The same keys in different namespaces refer to different entries.
## Errors
| Error Code | Description |
| ---------- | ---------------------------------------------- |
| `ERR` | Wrong number of arguments or internal failure. |
## Examples
**Basic range size estimation:**
```kronotop
> ZSET key-0 "alpha"
OK
> ZSET key-1 "bravo"
OK
> ZSET key-2 "charlie"
OK
> ZGETRANGESIZE key-0 key-2
(integer) 186
```
**Full subspace size estimation with `* *`:**
```kronotop
> ZGETRANGESIZE * *
(integer) 372
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZSET mykey-a "alpha"
OK
> ZSET mykey-b "bravo"
OK
> COMMIT
OK
> BEGIN
OK
> ZGETRANGESIZE mykey-a mykey-b
(integer) 124
> COMMIT
OK
```
# ZINC.D128
> Increments a 128-bit decimal value in the ZMap ordered key-value store.
## Syntax
```kronotop
ZINC.D128 <key> <value>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `key` | bytes | Yes | The key to increment. |
| `value` | string | Yes | A decimal number to add to the current value. Accepts plain decimals and scientific notation (e.g., `1.5E+10`). Use a negative number to decrement. |
## Return Value
Simple string: `OK` on success.
## Behavior
`ZINC.D128` performs a high-precision decimal increment on a key in the ZMap subspace of the session’s current namespace. It uses IEEE-754 decimal128 (BID encoding), which provides 34 significant digits and a decimal exponent range of −6143 to +6144, far exceeding the \~15 significant digits of `ZINC.F64`.
Like `ZINC.F64`, this command performs a read-modify-write cycle because FoundationDB has no atomic decimal128-add mutation. This means concurrent `ZINC.D128` calls on the same key **can** cause transaction conflicts.
If the key does not exist, it is created with an implicit starting value of zero and then incremented by `value`.
The value is encoded as 16 bytes in little-endian IEEE-754 BID format (low 8 bytes first, high 8 bytes second). Three validation guards protect against invalid states:
1. **Valid input**: the delta must be a parseable decimal string; unparseable strings are rejected.
2. **Valid stored size**: if the key already exists, the stored value must be exactly 16 bytes.
3. **Range check**: the computed result must be representable as a Decimal128; values that exceed the Decimal128 range are rejected.
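The read-modify-write cycle and its three guards can be sketched with Python's `decimal` module configured to decimal128 limits. This is an illustration, not Kronotop's code: `zinc_d128`, `encode_bid`, and `decode_bid` are hypothetical names, the store is a plain dict, and the byte layout assumes the standard BID interchange form (sign bit, 14-bit exponent biased by 6176, 113-bit coefficient) written as one little-endian 128-bit integer. Silently rounding inputs longer than 34 digits is also an assumption of this sketch.

```python
from decimal import Context, Decimal, InvalidOperation, Overflow

# decimal128 limits: 34 digits, adjusted exponent between -6143 and +6144.
D128 = Context(prec=34, Emin=-6143, Emax=6144, clamp=1,
               traps=[InvalidOperation, Overflow])
BIAS = 6176  # exponent bias of the decimal128 interchange format


def encode_bid(d: Decimal) -> bytes:
    """Pack a finite Decimal into 16 little-endian BID bytes."""
    sign, digits, exp = d.as_tuple()
    coeff = int("".join(map(str, digits)))
    bits = (sign << 127) | ((exp + BIAS) << 113) | coeff
    return bits.to_bytes(16, "little")


def decode_bid(raw: bytes) -> Decimal:
    """Inverse of encode_bid for canonical finite values."""
    bits = int.from_bytes(raw, "little")
    sign = bits >> 127
    exp = ((bits >> 113) & 0x3FFF) - BIAS
    coeff = bits & ((1 << 113) - 1)
    return Decimal((sign, tuple(int(c) for c in str(coeff)), exp))


def zinc_d128(store: dict, key: bytes, delta: str) -> None:
    # Guard 1: the delta must parse as a finite decimal.
    try:
        d = D128.create_decimal(delta)
    except (InvalidOperation, Overflow):
        raise ValueError("invalid decimal")
    if not d.is_finite():
        raise ValueError("invalid decimal")
    # Guard 2: an existing value must be exactly 16 bytes.
    raw = store.get(key)
    if raw is None:
        current = Decimal(0)            # implicit zero for a missing key
    elif len(raw) != 16:
        raise ValueError("invalid stored value: expected 16 bytes")
    else:
        current = decode_bid(raw)
    # Guard 3: the sum must stay inside the decimal128 range.
    try:
        result = D128.add(current, d)
    except Overflow:
        raise ValueError("result out of range for Decimal128")
    store[key] = encode_bid(result)


store = {}
for step in ("1.5", "3.0", "4.5"):
    zinc_d128(store, b"counter", step)
total = decode_bid(store[b"counter"])
```

The read of the current value is what distinguishes this from an atomic mutation: two callers that read the same bytes and both write back a sum would race, which is why the real command can conflict under concurrency.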
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the operation, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the operation is staged in the current transaction and only persists when `COMMIT` is called.
`ZINC.D128` is a write operation and does not support snapshot reads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ---------------------------------------------------------------------- | ------------------------------------------------------ |
| `ERR invalid decimal` | The provided value is not a valid decimal number. |
| `ERR Invalid stored value: expected 16-byte Decimal128 (IEEE-754 BID)` | The existing value at the key is not 16 bytes. |
| `ERR Exponent is out of range for Decimal128 encoding ...` | The result exceeds the Decimal128 representable range. |
| `ERR` | Wrong number of arguments or internal failure. |
## Examples
**Increment a new key (implicit zero start):**
```kronotop
> ZINC.D128 counter 10.5
OK
> ZGET.D128 counter
"10.5"
```
**Accumulate multiple increments:**
```kronotop
> ZINC.D128 counter 1.5
OK
> ZINC.D128 counter 3.0
OK
> ZINC.D128 counter 4.5
OK
> ZGET.D128 counter
"9.0"
```
**Decrement with a negative value:**
```kronotop
> ZINC.D128 counter 100.0
OK
> ZINC.D128 counter -30.5
OK
> ZGET.D128 counter
"69.5"
```
**High-precision decimal (33 significant digits, within the 34-digit limit):**
```kronotop
> ZINC.D128 counter 0.123456789012345678901234567890123
OK
> ZGET.D128 counter
"0.123456789012345678901234567890123"
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZINC.D128 tx-counter 42.42
OK
> COMMIT
OK
> ZGET.D128 tx-counter
"42.42"
```
# ZINC.F64
> Increments a 64-bit floating-point value in the ZMap ordered key-value store.
## Syntax
```kronotop
ZINC.F64 <key> <value>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ----------------------------------------------------------------------------------------- |
| `key` | bytes | Yes | The key to increment. |
| `value` | double | Yes | A finite IEEE-754 double to add to the current value. Use a negative number to decrement. |
## Return Value
Simple string: `OK` on success.
## Behavior
`ZINC.F64` performs a floating-point increment on a key in the ZMap subspace of the session’s current namespace. Unlike `ZINC.I64`, which uses FoundationDB’s conflict-free `ADD` mutation, `ZINC.F64` performs a read-modify-write cycle because FoundationDB has no atomic double-add mutation. This means concurrent `ZINC.F64` calls on the same key **can** cause transaction conflicts.
If the key does not exist, it is created with an implicit starting value of zero and then incremented by `value`.
The value is encoded as an 8-byte little-endian IEEE-754 double. Three validation guards protect against invalid states:
1. **Finite input**: the delta must be a finite double; NaN and Infinity are rejected.
2. **Valid stored size**: if the key already exists, the stored value must be exactly 8 bytes.
3. **Finite result**: the computed sum must be finite; overflow to Infinity is rejected.
The command normalizes negative zero: if the result is `-0.0`, it is stored as `0.0`.
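The guards and the negative-zero rule can be sketched in a few lines. This is a minimal model rather than the actual implementation: `zinc_f64` is a hypothetical name and the store is a plain dict standing in for the ZMap subspace.

```python
import math
import struct


def zinc_f64(store: dict, key: bytes, delta: float) -> None:
    # Guard 1: reject NaN and Infinity as input.
    if not math.isfinite(delta):
        raise ValueError("Invalid delta: value must be a finite IEEE-754 double")
    # Guard 2: an existing value must be exactly 8 bytes.
    raw = store.get(key)
    if raw is None:
        current = 0.0                       # implicit zero for a missing key
    elif len(raw) != 8:
        raise ValueError("Invalid stored value: expected 8-byte IEEE-754 double")
    else:
        current = struct.unpack("<d", raw)[0]
    # Guard 3: the sum must still be finite.
    result = current + delta
    if not math.isfinite(result):
        raise ValueError("Resulting value is not a finite IEEE-754 double")
    if result == 0.0:
        result = 0.0                        # -0.0 == 0.0, so this stores +0.0
    store[key] = struct.pack("<d", result)  # 8-byte little-endian double


store = {}
for step in (1.5, 3.0, 4.5):
    zinc_f64(store, b"counter", step)
total = struct.unpack("<d", store[b"counter"])[0]
```

Normalizing `-0.0` keeps a single byte representation for zero, so two counters that both hold zero compare equal at the byte level as well as numerically.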
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the operation, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the operation is staged in the current transaction and only persists when `COMMIT` is called.
`ZINC.F64` is a write operation and does not support snapshot reads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| `ERR Invalid delta: value must be a finite IEEE-754 double` | The provided value is NaN or Infinity. |
| `ERR Invalid stored value: expected 8-byte IEEE-754 double` | The existing value at the key is not 8 bytes. |
| `ERR Resulting value is not a finite IEEE-754 double (overflow or invalid operation)` | The sum overflows to Infinity. |
| `ERR` | Wrong number of arguments, value is not a valid double, or internal failure. |
## Examples
**Increment a new key (implicit zero start):**
```kronotop
> ZINC.F64 counter 10.5
OK
> ZGET.F64 counter
(double) 10.5
```
**Accumulate multiple increments:**
```kronotop
> ZINC.F64 counter 1.5
OK
> ZINC.F64 counter 3.0
OK
> ZINC.F64 counter 4.5
OK
> ZGET.F64 counter
(double) 9.0
```
**Decrement with a negative value:**
```kronotop
> ZINC.F64 counter 100.0
OK
> ZINC.F64 counter -30.5
OK
> ZGET.F64 counter
(double) 69.5
```
**Overflow rejection:**
```kronotop
> ZINC.F64 counter 1.7976931348623157E308
OK
> ZINC.F64 counter 1.7976931348623157E308
(error) ERR Resulting value is not a finite IEEE-754 double (overflow or invalid operation)
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZINC.F64 tx-counter 42.42
OK
> COMMIT
OK
> ZGET.F64 tx-counter
(double) 42.42
```
# ZINC.I64
> Atomically increments a 64-bit integer value in the ZMap ordered key-value store.
## Syntax
```kronotop
ZINC.I64 <key> <value>
```
## Parameters
| Parameter | Type | Required | Description |
| --------- | ------- | -------- | ---------------------------------------------------------------------------------------- |
| `key` | bytes | Yes | The key to increment. |
| `value` | integer | Yes | A signed 64-bit integer to add to the current value. Use a negative number to decrement. |
## Return Value
Simple string: `OK` on success.
## Behavior
`ZINC.I64` performs an atomic, conflict-free increment on a key in the ZMap subspace of the session’s current namespace. It is a typed convenience wrapper around FoundationDB’s `ADD` mutation, equivalent to `ZMUTATE ADD` but accepting a plain integer argument instead of raw bytes.
Because the underlying `ADD` mutation does not read before writing, concurrent `ZINC.I64` calls on the same key do not cause transaction conflicts. This makes the command well-suited for counters, rate limiters, and accumulators.
If the key does not exist, it is created with an implicit starting value of zero and then incremented by `value`.
The value is encoded as an 8-byte little-endian signed integer. Overflow follows two’s complement arithmetic: incrementing `Long.MAX_VALUE` by 1 wraps to `Long.MIN_VALUE`.
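The encoding and wraparound can be checked with plain integer arithmetic. The sketch below only models the result of a little-endian 64-bit add; `add_i64` and `decode_i64` are hypothetical helpers, and the real mutation is applied by FoundationDB at commit time rather than in the client.

```python
MASK_64 = (1 << 64) - 1
I64_MAX = (1 << 63) - 1
I64_MIN = -(1 << 63)


def decode_i64(raw: bytes) -> int:
    """Read 8 little-endian bytes as a signed 64-bit integer."""
    return int.from_bytes(raw, "little", signed=True)


def add_i64(raw, delta: int) -> bytes:
    """Return the 8-byte value after adding `delta`; a missing value counts as zero."""
    current = decode_i64(raw) if raw is not None else 0
    wrapped = (current + delta) & MASK_64   # keep the low 64 bits (two's complement)
    return wrapped.to_bytes(8, "little")


counter = add_i64(None, 100)        # key did not exist: 0 + 100
counter = add_i64(counter, -30)     # negative delta decrements
at_max = I64_MAX.to_bytes(8, "little", signed=True)
overflowed = decode_i64(add_i64(at_max, 1))
```

Because the operation is expressed as "add this delta" rather than "write this total", the order in which concurrent increments are applied does not change the final sum, which is what lets the command avoid read conflicts.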
The command supports two transaction modes:
* **Auto-commit (one-off):** When no explicit transaction is active, Kronotop creates a transaction, performs the mutation, and commits it immediately. This is the default mode.
* **Explicit transaction:** When a `BEGIN` has been issued, the mutation is staged in the current transaction and only persists when `COMMIT` is called.
`ZINC.I64` is a write operation and does not support snapshot reads.
All data is scoped to the session’s active namespace. The same key in different namespaces refers to different entries.
## Errors
| Error Code | Description |
| ---------- | ----------------------------------------------------------------------------- |
| `ERR` | Wrong number of arguments, value is not a valid integer, or internal failure. |
## Examples
**Increment a new key (implicit zero start):**
```kronotop
> ZINC.I64 counter 10
OK
> ZGET.I64 counter
(integer) 10
```
**Accumulate multiple increments:**
```kronotop
> ZINC.I64 counter 10
OK
> ZINC.I64 counter 20
OK
> ZINC.I64 counter 30
OK
> ZGET.I64 counter
(integer) 60
```
**Decrement with a negative value:**
```kronotop
> ZINC.I64 counter 100
OK
> ZINC.I64 counter -30
OK
> ZGET.I64 counter
(integer) 70
```
**Use within an explicit transaction:**
```kronotop
> BEGIN
OK
> ZINC.I64 tx-counter 42
OK
> COMMIT
OK
> ZGET.I64 tx-counter
(integer) 42
```
# ZMUTATE
> Performs an atomic mutation on a key's value in the ZMap ordered key-value store.
## Syntax
```kronotop
ZMUTATE