Dataset settings determine how the dataset uses the source data.
Connecting multiple tables
If multiple tables are available from a data source, you can merge them using the JOIN operator.
When the tables are joined, a link is created between them. When you create a link, you specify the fields from the source table and merged table.
You can use the following types of JOIN operators:
Data source operation modes
You can run your dataset in one of two modes, depending on the data source.
Datasets can work with data sources in the following modes:
- Direct access. DataLens executes all the data queries on the data source side.
- Materialization. DataLens pre-saves the data to the database and then works with the data.
Materialization mode is not available for Yandex.Metrica API and AppMetrica API sources.
To configure the database to populate from these sources, you can use connections via the Logs API.
All data requests are executed on the data source side.
Materialization is the process of uploading data from the data source to the DataLens database.
You can run materialization as a one-time process or periodically on a schedule.
Data is uploaded to the materialization DB once. Then DataLens only makes queries to the materialized data.
To sync DataLens storage with the source, you can reload the data.
If you use a CSV file as the data source, DataLens automatically materializes the dataset.
Data is uploaded to the DataLens materialization DB according to a schedule. The schedule is set in the dataset settings.
For example, every second week on Thursday at 03:00.
The minimum available interval is once a day.
Connecting via the Logs API for Yandex.Metrica and AppMetrica sources
For Metrica and AppMetrica sources, the DataLens materialization database is unavailable for populating with data. To set up automatic data loading from these sources, you can create a connection to the Metrica Logs API or the AppMetrica Logs API and set the data update frequency. This will load data from Yandex.Metrica or AppMetrica into the ClickHouse database, which will serve as the dataset source.
For AppMetrica, you can also export data to Yandex.Cloud.
Default filters for new charts
In a dataset, you can create a default filter. It will be applied to any new chart created from data in the current dataset.
- You can create a filter for an individual chart in the chart settings.
- Default filters are not applied to data in the dataset preview area.
Default filters for new charts will enable you:
- To reduce the amount of data requested from a source when building a chart.
- To add customized filters to new charts created from data in the same dataset.
You can configure permissions for an entire dataset. For more information, see Managing dataset access.
You can also set data permissions at the row level (Row-level security or RLS). Learn more in Managing access to data rows in a dataset.
Executing SQL queries in datasets
You can define a dataset data source using ad-hoc SQL queries over database connections. When a data source is accessed, a query code executes as a subquery. For more information about using SQL queries in a dataset, see Describing a dataset via a source SQL query.
When using SQL queries in datasets, we recommend:
- Restricting the user specified in the connection to
- Assigning users that should not have ad-hoc query privileges
Executeaccess to the connection and the associated datasets.
You can enable or disable the use of subqueries as a source when creating or editing a connection.