Questions and answers about YDB
- General questions
- YQL
- General questions
- How do I select table rows by a given list of keys?
- Is a search by index performed for conditions containing the LIKE operator?
- Why does a query return only 1000 rows?
- What should I do if I get the error "Datashard: Reply size limit exceeded"?
- How do I update only those values whose keys are not in the table?
- Join operations
- Transactions
- Errors
- What should I do if I frequently get the error "Transaction locks invalidated"?
- Why does the error "Exceeded maximum allowed number of active transactions" occur?
General questions
What's YDB?
YDB — is a distributed fault-tolerant NewSQL DBMS. YDB provides high availability and scalability while simultaneously ensuring strict consistency and ACID transaction support. Queries are made using an SQL dialect (YQL).
YDB is a fully managed database. DB instances are created through the YDB database management service.
What features does YDB provide?
YDB provides high availability and data security through synchronous replication in three availability zones. YDB also ensures even load distribution across available hardware resources. This means that you don't need to order resources, Yandex Database automatically allocates and releases resources based on the user load.
What consistency model does YDB use?
To read data, YDB uses a model of strict data consistency.
How do I design a primary key?
To design the primary key properly, follow the rules given below.
-
Avoid situations where the main load falls on a single partition of a table. With even load distribution, it's easier to achieve high overall performance.
This rule implies that you shouldn't use a monotonically increasing sequence, such as timestamp, as a table's primary key.
-
The fewer table partitions used during query execution, the faster it's executed. For greater performance, follow the one query — one partition rule.
-
Avoid situations where a small part of the DB is under much heavier load than the rest of the DB.
For more information, see Schema design.
How do I evenly distribute the load across table partitions?
You can use the following techniques to evenly distribute the load across table partitions and increase overall DB performance.
-
To avoid using monotonically increasing primary key values, you can:
- Change the order of its components.
- use a hash of the key column values as the primary key.
-
Reduce the number of partitions used in a single query.
For more information, see Techniques that let you evenly distribute the load across table partitions.
Can I use NULL in a key column?
In YDB, all columns, including key ones, may contain a NULL
value, but we don't recommend using NULL
as values in key columns.
According to the SQL standard (ISO/IEC 9075), you can't compare NULL
with other values. Therefore, the use of concise SQL statements with simple comparison operators may lead to skipping rows containing NULL during filtering, for example.
Is there an optimal size of a database row?
To achieve high performance, we don't recommend writing rows larger than 8 MB and key columns larger than 2 KB to the DB.
For more information about limits, see Database limits.
How are secondary indexes used in YDB?
Secondary indexes in YDB are global and can be non-unique.
For more information, see Secondary indexes.
How is paginated output performed?
To organize paginated output, we recommend selecting data sorted by primary key sequentially, limiting the number of rows with the LIMIT
keyword. We do not recommend using the OFFSET
keyword to solve this problem.
For more information, see Paginated output.
How do I effectively upload large amounts of data to YDB?
To speed up uploading large amounts of data, follow these recommendations:
-
When creating a table, explicitly specify the required number of partitions or their boundaries. This will help you effectively use system bandwidth as soon as you start uploading data by avoiding unnecessary re-partitioning of the table.
-
Don't insert data in separate transactions for each row. It's more efficient to insert multiple rows at once (batch inserts). This reduces the overhead on the transaction mechanism itself.
-
In addition to the previous step, within each transaction (batch), insert rows from the primary key-sorted set of data to minimize the number of partitions that the transaction affects.
-
Avoid writing data sequentially in ascending or descending order of the primary key value to evenly distribute the load across all table partitions.
For more information, see Uploading large data volumes.
How do I delete expired data?
To effectively remove expired data, we recommend using Time to Live (TTL).
Can I get logs of my operations with services?
Yes, you can request log records about your resources from Yandex.Cloud services. For more information, see Data requests.
YQL
General questions
How do I select table rows by a given list of keys?
You can select table rows from the specified list of values of the table's primary key (or key prefix) using the IN operator:
DECLARE $keys AS List<UInt64>;
SELECT * FROM some_table
WHERE Key1 IN $keys;
If a selection is made using a composite key, the query parameter must have the type of a list of tuples:
DECLARE $keys AS List<Tuple<UInt64, String>>;
SELECT * FROM some_table
WHERE (Key1, Key2) IN $keys;
To select rows effectively, make sure that the value types in the parameters match the key column types in the table.
Is a search by index performed for conditions containing the LIKE operator?
You can only use the LIKE operator to search by table index if it sets the row prefix:
SELECT * FROM string_key_table
WHERE Key LIKE "some_prefix%";
Why does a query return only 1000 rows?
1000 rows is the response size limit per YQL query. If a response is shortened, it is flagged as Truncated
. To output more table rows, you can use paginated output or the ReadTable
operation.
What should I do if I get the error "Datashard: Reply size limit exceeded"?
This error means that, when executing the query, an attempt was made to return more than 50 MB of data from one of the involved datashards, which exceeds the allowed limit.
Recommendations:
- A general recommendation is to reduce the amount of data processed in a transaction.
- If a Join operation is running, make sure that its method is Index Lookup Join.
- If a simple selection is performed, make sure that it is done by keys, or add
LIMIT
in the query.
How do I update only those values whose keys are not in the table?
You can use LEFT JOIN to mark the keys that are missing from the table and then update their values:
DECLARE $values AS List<Struct<Key: UInt64, Value: String>>;
UPSERT INTO kv_table
SELECT v.Key AS Key, v.Value AS Value
FROM AS_TABLE($values) AS v
LEFT JOIN kv_table AS t
ON v.Key = t.Key
WHERE t.Key IS NULL;
Join operations
Are there any specific features of Join operations?
Join operations in YDB are performed in one of two ways:
- Common Join
- Index Lookup Join
Common Join
The contents of both tables (the left and right parts of a Join) are sent to the node executing the query, where the operation is performed on the entire data. This is a universal way to perform a Join operation, which is used when more optimal methods are not applicable. For large tables, this method is either slow or doesn't work in general due to exceeding the data transfer limits.
Index Lookup Join
For rows from the left part of a Join operation, a lookup is performed for the corresponding values in the right part. This method is used when the right part is a table and the Join operation key is the prefix of its primary key or of the secondary index key. In this method, limited selections are made from the right table instead of full reads. This lets you use it when working with large tables.
Note
For most OLTP queries, we recommend using Index Lookup Join with a small size of the left part. These operations read little data and can be performed efficiently.
How do I make a Join with data from query parameters?
You can use query parameter data as a constant table. To do this, use the AS_TABLE
modifier with a parameter whose type is a list of structures:
DECLARE $data AS List<Struct<Key1: UInt64, Key2: String>>;
SELECT * FROM AS_TABLE($data) AS d
INNER JOIN some_table AS t
ON t.Key1 = d.Key1 AND t.Key2 = d.Key2;
There is no explicit limit on the number of entries in the constant table, but mind the standard limit on the total size of query parameters (50 MB).
Transactions
How efficient is it to run multiple queries in a transaction?
When multiple queries are run sequentially, the total transaction latency may be greater than when the same operations are executed within a single query. This is primarily due to additional network latency for each query. Therefore, if a transaction doesn't need to be interactive, we recommend formulating all operations in a single YQL query.
Is a separate query atomic?
In general, YQL queries can be executed in multiple consecutive phases. For example, a Join query can be executed in two phases: reading data from the left and right table, respectively. This aspect is important when you run a query in a transaction with a low isolation level (online_read_only
), as in this case, data in-between its execution phases can be updated by other transactions.
Errors
What should I do if I frequently get the error "Transaction locks invalidated"?
Typically, if you get this error, repeat a transaction, as YDB uses optimistic locking. If this error occurs frequently, it means that a large number of rows are being read in a transaction or that many transactions are competing for the same "hot" rows. It makes sense to view the queries running in the transaction and check if they're reading unnecessary rows.
Why does the error "Exceeded maximum allowed number of active transactions" occur?
The logic on the client side should try to keep transactions as short as possible.
No more than 10 active transactions are allowed per session. When starting a transaction, use either the commit flag for autocommit or an explicit commit/rollback.