Terms and definitions
Query
A query is an expression written in YQL, an SQL dialect, that lets you process streaming and analytical data in a unified way.
A query consists of the query text written in YQL, information about a connection to a data source, and the schema of the data in that source.
Connection
A connection is a set of parameters necessary for connecting Yandex Query to a data source. For example, if a file from Yandex Object Storage is used as a data source, a connection contains the name of a bucket and its authorization parameters.
For more information, see Working with connections.
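As an illustration of what a connection bundles together, here is a minimal Python sketch for the Object Storage case. The class and field names (`ObjectStorageConnection`, `bucket`, `service_account_id`) are hypothetical and are not the Yandex Query API; representing the authorization parameters as a service account ID is an assumption made for the example.

```python
from dataclasses import dataclass


# Hypothetical model of a connection; names are illustrative, not the Yandex Query API.
@dataclass(frozen=True)
class ObjectStorageConnection:
    name: str                # name used to refer to the connection
    bucket: str              # Object Storage bucket that holds the source files
    service_account_id: str  # assumed form of the authorization parameters


conn = ObjectStorageConnection(
    name="logs-connection",
    bucket="my-logs-bucket",
    service_account_id="example-service-account",
)
print(conn.bucket)  # my-logs-bucket
```

In this sketch the connection is immutable and holds only access details, so several queries could refer to it by name without repeating the bucket and credentials.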
Data binding
The same YQL query can be run on data from different sources (for example, for both streaming and batch processing). In this case, you can create a data binding for each source: a resource that contains information about the connection, the data format, and the data schema.
For more information, see Working with data bindings.
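The sketch below models a data binding as a plain record that groups a connection name, a data format, and a schema, and shows one query text being pointed at two bindings that share a schema. Everything here is hypothetical: the `DataBinding` class, the connection names, the `json_each_row` format value, and the ``bindings.`name` `` reference syntax in the query string are illustrative assumptions, not a verified API.

```python
from dataclasses import dataclass
from typing import Tuple

# (field name, YQL type name) pairs
Schema = Tuple[Tuple[str, str], ...]


# Hypothetical model of a data binding: connection + data format + data schema.
@dataclass(frozen=True)
class DataBinding:
    name: str
    connection: str   # name of the connection used to read the data
    data_format: str  # format of the source data (illustrative value below)
    schema: Schema


EVENTS_SCHEMA: Schema = (("user_id", "Uint64"), ("action", "String"))

batch = DataBinding("events_batch", "storage-connection", "json_each_row", EVENTS_SCHEMA)
stream = DataBinding("events_stream", "streams-connection", "json_each_row", EVENTS_SCHEMA)

# The query text names only the binding; the bindings.`...` syntax is an assumption.
QUERY_TEMPLATE = "SELECT user_id, action FROM bindings.`{binding}` WHERE action = 'purchase'"


def render_query(binding: DataBinding) -> str:
    """Return the same query text, pointed at the given binding."""
    return QUERY_TEMPLATE.format(binding=binding.name)


print(render_query(batch))
print(render_query(stream))
```

Because connection, format, and schema live in the binding, the query text itself stays identical for the batch and streaming sources.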
Information about executed queries
Yandex Query saves the following information for each executed query:
- Query execution results.
- Query execution status.
- Query execution start date and time.
- Query execution duration.
- Name of the user who ran the query.
- Query execution metrics.
Execution results are saved only for the last execution of a YQL query.
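A minimal sketch of this behavior, assuming a hypothetical `QueryInfo` record (not the Yandex Query API; the `"COMPLETED"` status value and metric name are also made up). For simplicity the sketch overwrites every field on each run; the text above only states that the *results* are limited to the last execution.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


# Hypothetical record of what is kept for a query; not the Yandex Query API.
@dataclass
class QueryInfo:
    text: str
    status: Optional[str] = None
    started_at: Optional[datetime] = None
    duration: Optional[timedelta] = None
    user: Optional[str] = None
    metrics: Dict[str, int] = field(default_factory=dict)
    results: Optional[List[Any]] = None

    def record_run(self, user: str, started_at: datetime, duration: timedelta,
                   status: str, metrics: Dict[str, int], results: List[Any]) -> None:
        """Store details of a run; results of any earlier run are replaced."""
        self.user = user
        self.started_at = started_at
        self.duration = duration
        self.status = status
        self.metrics = dict(metrics)
        self.results = list(results)


info = QueryInfo(text="SELECT 1")
info.record_run("alice", datetime(2024, 1, 1, 10, 0), timedelta(seconds=3),
                "COMPLETED", {"rows_read": 10}, [1])
info.record_run("bob", datetime(2024, 1, 2, 9, 0), timedelta(seconds=5),
                "COMPLETED", {"rows_read": 12}, [2])
print(info.results)  # [2]
```

After the second run, the results of the first run are no longer available.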
Data source
A data source is an object that contains structured data. The following can be used as data sources in Yandex Query:
- Yandex Data Streams streams.
- Files in Yandex Object Storage.
Data schema
A data schema is a list of the input data fields and their types to be used in a query.
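To show what "a list of fields and types" means in practice, here is a small Python sketch that stores a schema as ordered (name, type) pairs and checks a row against it. The mapping from YQL type names to Python types and the `check_row` helper are illustrative assumptions; how Yandex Query itself handles rows that do not match the schema is not described here.

```python
from typing import Any, Dict, List, Tuple

# A schema as an ordered list of (field name, type name) pairs.
SCHEMA: List[Tuple[str, str]] = [
    ("user_id", "Uint64"),
    ("action", "String"),
    ("amount", "Double"),
]

# Assumed mapping from a few YQL type names to Python types, for illustration only.
PY_TYPES = {"Uint64": int, "String": str, "Double": float}


def check_row(row: Dict[str, Any], schema: List[Tuple[str, str]]) -> List[str]:
    """Return the problems found when matching a row against the schema."""
    problems = []
    for name, type_name in schema:
        if name not in row:
            problems.append(f"missing field: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, PY_TYPES[type_name]) or isinstance(value, bool):
            problems.append(f"{name}: expected {type_name}")
        elif type_name == "Uint64" and value < 0:
            problems.append(f"{name}: expected a non-negative integer")
    return problems


print(check_row({"user_id": 7, "action": "buy", "amount": 9.5}, SCHEMA))  # []
print(check_row({"user_id": -1, "action": "buy"}, SCHEMA))
# ['user_id: expected a non-negative integer', 'missing field: amount']
```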
Checkpoints
Streaming analysis systems process infinite data streams that have neither a beginning nor an end. To avoid reprocessing a stream from the beginning every time a query is rerun, Yandex Query remembers offsets in the data it has already processed. If processing is paused and then restarted, Yandex Query rewinds the stream to the saved offset and resumes processing from that point.
Checkpoints contain information about a streaming query, including offsets in data streams.
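The resume-from-offset mechanism can be sketched in a few lines of Python. This is a toy model under stated assumptions: a stream is represented as an append-only list, an offset is an index into it, and the checkpoint is a dictionary from stream name to offset. The function name `process_from_checkpoint` is hypothetical, and the initial offset of 0 is chosen only for the demo.

```python
from typing import Dict, List


def process_from_checkpoint(stream: List[str], checkpoint: Dict[str, int], name: str) -> List[str]:
    """Return messages not yet processed and advance the saved offset."""
    start = checkpoint[name]        # offset of the first unprocessed message
    batch = stream[start:]          # skip everything handled before the pause
    checkpoint[name] = len(stream)  # save how far processing got
    return batch


events = ["m0", "m1", "m2"]
checkpoint = {"events": 0}
print(process_from_checkpoint(events, checkpoint, "events"))  # ['m0', 'm1', 'm2']

# Processing is paused; meanwhile new messages arrive in the stream.
events += ["m3", "m4"]
print(process_from_checkpoint(events, checkpoint, "events"))  # ['m3', 'm4']
```

On restart, only the messages after the saved offset are processed, which is what keeps reruns from starting at the beginning of the stream.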
If you add statements that read from new streaming sources to the query text, the checkpoint won't contain offsets for these new streams. As a result, data from existing streams may be read starting from the offsets saved in the last checkpoint, while data from the new streams may be read only as new messages appear in them.
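The following Python sketch illustrates that situation under the same toy assumptions (streams as append-only lists, offsets as indexes). The helpers `attach_streams` and `read_new` are hypothetical. Streams already recorded in the checkpoint keep their saved offsets, while a newly added stream is assumed to start at its current end, so only messages that arrive afterwards are read from it.

```python
from typing import Dict, List


def attach_streams(streams: Dict[str, List[str]], checkpoint: Dict[str, int]) -> None:
    """Keep saved offsets; start streams missing from the checkpoint at their current end."""
    for name, messages in streams.items():
        checkpoint.setdefault(name, len(messages))


def read_new(streams: Dict[str, List[str]], checkpoint: Dict[str, int]) -> Dict[str, List[str]]:
    """Read unprocessed messages from every stream and advance the offsets."""
    out = {}
    for name, messages in streams.items():
        out[name] = messages[checkpoint[name]:]
        checkpoint[name] = len(messages)
    return out


streams = {"orders": ["o0", "o1", "o2"], "payments": ["p0", "p1"]}
checkpoint = {"orders": 1}  # saved before "payments" was added to the query
attach_streams(streams, checkpoint)
print(read_new(streams, checkpoint))  # {'orders': ['o1', 'o2'], 'payments': []}

streams["payments"].append("p2")      # a new message appears in the new stream
print(read_new(streams, checkpoint))  # {'orders': [], 'payments': ['p2']}
```

The existing messages `p0` and `p1` in the new stream are never read, because no offset for that stream was saved in the checkpoint.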
Whether to resume processing from a checkpoint or to process data anew is specified when you run a query.