Joining Yandex DataLens data
Yandex DataLens uses a connection to retrieve data from a source (DB, CSV, Yandex Metrica, and so on). You can create datasets, charts, and selectors from the connection data. If the source has multiple tables, you can join them to build the required set of data. You can link data from different datasets at the chart level or through selector links.
Data joining methods
You can use different data joining methods:
Dataset level
To join data at the dataset level, you can add tables to the workspace or write an SQL query.
Warning
You cannot join data from different sources at a single dataset level.
Adding tables
You can join data through the dataset creation interface by dragging tables to the workspace and configuring links between them using the JOIN operator. This use case gives an example of joining data by adding tables.
SQL query
In a dataset, you can add an ad-hoc SQL query to the data source. When accessing a data source, the query code is run as a subquery. You can use the output of the query as final dataset data or combine it with other source tables via the interface.
Chart level
DataLens enables you to combine data at the chart level. To combine data at the chart level, you can use an SQL chart or multi-dataset charts.
QL chart
QL charts are charts created from a connection if there is a database at the other end of the connection. Running a SQL query does not create a separate dataset object; instead, it generates one on the fly and displays it in the preview panel. For more information, see Creating a QL chart.
Multi-dataset charts
Multi-dataset charts are charts that visualize data from different datasets. Queries for each dataset are processed independently of each other. For more information, see the instructions Creating a multi-dataset chart.
Selector link level
You can add a selector to a dashboard to modify query output in its associated widgets:
- On the dashboard, selectors and charts built from a single dataset get linked automatically.
- Selectors and charts built from different datasets can be linked manually using aliases.
Before creating a link, make sure the field used by the selector as a filter is included in the dataset the chart is built from. Otherwise, the link will not work. For more information, see the instructions Creating an alias.
Optimizing data when joining tables
In some charts based on a dataset with joined tables, you can only use fields from a single table. In this case, DataLens optimizes a query to a source. No JOIN
is used, and the query only returns data from one table without filtering data based on others. This allows reducing the amount of requested data and the query execution time. However, data returned by an optimized query may differ from that you expect.
Optimization is used under the following conditions:
- A chart only uses fields of one of the joined tables.
- There are no fields from the other tables in any of the chart sections.
- Fields from the other tables are not used in the chart's calculated fields.
Note
Optimization will not work if a dataset is described using an SQL query to the source.
Let's look at examples of query optimization using different datasets for a source with tables:
id | name | department_id |
---|---|---|
1 | Ivanov | 2 |
2 | Petrov | 4 |
3 | Sidorov | 1 |
4 | Stepanov | 1 |
5 | Sokolov | |
6 | Orlova | 3 |
7 | Shishkina | 3 |
8 | Semyonov | |
9 | Antonova | 3 |
10 | Sergeev | 4 |
id | name |
---|---|
1 | Logistics |
2 | IT |
3 | Accounting |
4 | Security |
employee_id | bonus |
---|---|
1 | 35,000 |
2 | 40,000 |
5 | 28,000 |
7 | 30,000 |
9 | 50,000 |
employee_id | category |
---|---|
2 | Category 1 |
4 | Category 1 |
5 | Category 2 |
6 | Category 3 |
7 | Category 3 |
8 | Category 2 |
10 | Category 1 |
Examples
A dataset is based on the Employees
and Departments
tables joined with an INNER JOIN
As a result of the Join, the dataset only contains rows that are common for the two tables. Let's create a chart containing fields from the two tables.
Now let's create a chart that will only contain fields from the Employees
table. In this case, DataLens does not use JOIN
and only works with this table. The chart displays all values from the Employees
table, not just those overlapping with the Departments
table.
A dataset is based on the Bonuses
and Employees
tables joined with a LEFT JOINBonuses
table is used in full, while only those values which are present in the Bonuses
table are selected from the Employees
table.
Let's create a chart containing fields from the two tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Bonuses
table).
Optimization for a RIGHT JOIN
A dataset is based on three tables:
- The first table (
Departments
) is joined with the second table (Employees
) using anINNER JOIN
. - The second table (
Employees
) is joined with the third table (Bonuses
) using anINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Bonuses
and Departments
tables).
Let's add to the chart only the fields from the first (Departments
) and third (Bonuses
) tables. These tables are not linked explicitly. However, each of them is linked to the Employees
table. Therefore, DataLens does not optimize a query to the source. In this case, the chart will include values from all the three tables based on filtering results.
A dataset is based on three tables:
- The first table (
Employees
) is joined with the second table (Departments
) using anINNER JOIN
. - The first table (
Employees
) is joined with the third table (Bonuses
) using anINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Departments
and Bonuses
tables).
Let's add to the chart only the fields from the first (Employees
) and third (Bonuses
) tables. In this case, the chart will feature values that are common for these tables (without filtering data by the Departments
table).
A dataset is based on four tables:
- The first table (
Qualification
) is joined with the second table (Bonuses
) using anINNER JOIN
. - The first table (
Qualification
) is joined with the third table (Employees
) using anINNER JOIN
. - The third table (
Employees
) is joined with the fourth table (Departments
) using anINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Qualification
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the other tables).
If we only use a pair of joined tables (the first and the second one, the first and the third, or the third and the fourth) in the chart, it will show common values from these tables (without filtering data by other tables). For example, let's add to the chart only the fields from the first (Qualification
) and second (Bonuses
) tables.