Web analytics with funnels and cohorts calculated based on Yandex Metrica data
- Before you start
- 1. Connect ClickHouse and DataSphere
- 2. Retrieve and upload data to ClickHouse
- 3. Connect DataLens and create charts
- 4. Create and configure a dashboard in DataLens
- 5. Funnels
- 6. Cohorts
- 6.1 DataSphere. Cohort analysis
- 6.2 DataLens. Create a dataset and a chart with cohort visualization
- 6.3 DataLens. Configure a chart with cohort visualization
- 6.4 DataLens. Create a chart with retention
- 6.5 DataLens. Add charts to a new dashboard tab
- 6.6 DataLens. Create charts
- 6.7 DataLens. Add charts to the dashboard
Yandex Metrica data is used as the data source.
In this scenario, you'll learn how to build conversion funnels, conduct a cohort analysis, and calculate the retention of the user base.
This scenario is also available as a recording from the Yandex Cloud YouTube channel.
To visualize and explore data, set up your cloud, then follow the steps below:
- Connect ClickHouse and DataSphere
- Retrieve and upload data to ClickHouse
- Connect DataLens and create charts
- Create and configure a dashboard in DataLens
- Funnels
- Cohorts
- DataSphere. Perform a Cohort analysis
- DataLens. Create a dataset and a chart with cohort visualization
- DataLens. Configure a chart with cohort visualization
- DataLens. Create a chart with retention
- DataLens. Add charts to a new dashboard tab
- DataLens. Create charts
- DataLens. Add charts to the dashboard
Before you start
Before working, you need to register in Yandex Cloud and create a billing account:
- Go to the management console. Then log in to Yandex Cloud or sign up if don't already have an account.
- On the billing page, make sure you linked a billing account, and it has the
ACTIVE
orTRIAL_ACTIVE
status. If you don't have a billing account, create one.
If you have an active billing account, you can create or select a folder to run your VM in from the Yandex Cloud page.
Learn more about clouds and folders.
1. Connect ClickHouse and DataSphere
1.1 Connect ClickHouse
- In the management console, select Managed Service for ClickHouse from the list on the left.
- In the window that opens, click Create cluster.
- Specify the settings for a ClickHouse cluster.
-
Under Basic parameters, specify a name for the cluster.
-
Under Host class, select the burstable VM type and b2.medium host type.
Warning
We don't recommend using burstable VM configurations in production environments. This tutorial uses them as an example. For production solutions, use standard or memory-optimized configurations.
-
Under Storage size, keep the value of 10 GB.
-
Under Database, specify the database name, for example,
metrica_data
, the username, and the password. Remember the database name. -
Under Hosts, click . Enable Public access and click Save.
-
Under Additional settings, enable 4 options:
- DataLens access
- Access from the management console
- Access from Yandex Metrica and AppMetrica
- Serverless access
-
After configuring all the settings, click Create cluster.
-
1.2 Connect DataSphere
- Go to the management console.
- Select DataSphere from the list on the left.
- In the window that opens, click Create project.
- Specify a name for the project and click Create.
- Open the project. To do this, in the line with the project name, click → Open.
Now we see the JupyterLab development environment, where we'll continue to work.
1.3 Clone the repository to DataSphere
- In the upper-left corner, click Git Clone.
- In the window that opens, specify the URI of the repository
https://github.com/zhdanchik/yandex_metrika_cloud_case.git
and click CLONE.
2. Retrieve and upload data to ClickHouse
If you don't have a Yandex Metrica tag, it doesn't have enough data, or if you want to make sure and complete all the steps in the instructions and get a result, go to step 2.3 (skip steps 2.1 and 2.2).
If you have a Yandex Metrica tag and can access it, go to step 2.1 and 2.2 (skip step 2.3). We recommend these steps if you're an experienced user because the logic of calculating funnels and cohorts depends on the data itself and you may need to tweak the scripts.
2.1 Yandex Metrica. Create an app and get an access token
-
To work with the API, get your OAuth token.
-
Create an application:
- Go to https://oauth.yandex.com/client/new.
- Specify a name for the app.
- Go to Platforms → Web services. In the Callback URI #1 field, insert
https://oauth.yandex.com/verification_code
. - Go to Accesses → Yandex Metrica (metrika). Enable Access to statistics and ability to view all counter settings (metrika:read).
- Click Create app.
- A description of our application appears in the window that opens. Copy the ID of your app.
-
Click
https://oauth.yandex.com/authorize?response_type=token&client_id=<app ID>
. For theclient_id
parameter, add the copied ID of your app. -
Click Log in as …
-
Copy the received access token.
2.2 DataSphere. Upload data via the Yandex Metrica Logs API
-
In the DataSphere project, in the root of the working directory, create a text file.
-
Name the file
.yatoken.txt
and insert the received access token into the file content. Save your changes and close the file. -
Open the folder yandex_metrika_cloud_case → notebook 1a. get_data_via_logs_api.ipynb.
If you couldn't get data for the demo tag from the Logs API, you can download it via Yandex Disk.
2.3 DataSphere. Download the test tag data via Yandex Disk
Note
Skip this section if you are using your own tag data.
- Open the folder yandex_metrika_cloud_case → notebook 1b. get_data_via_yadisk.ipynb.
- Complete all the steps (cells with the code) in the notebook 1b. get_data_via_yadisk.ipynb.
2.4 ClickHouse. Get the cluster's IP address
-
Go to the created ClickHouse cluster. Wait for the cluster status to be Alive. Then open the cluster by clicking on it.
-
Select Hosts from the list on the left.
-
On the Overview tab, go to the Hostname column. To copy a hostname, point to the right of the hostname and click the copy icon.
2.5 DataSphere. Upload the data to ClickHouse
-
Open the folder yandex_metrika_cloud_case → notebook 2. upload_data_to_ClickHouse.ipynb.
-
Paste the copied hostname into the
CH_HOST_NAME
variable. -
Use the name of the logged-in user in the
CH_USER
variable. -
Use the name of the opened database in the
CH_DB_NAME
variable.
-
-
In the root directory, create a new text file named .chpass.txt.
-
Enter the password of the logged-in user in the .chpass.txt file. Save and close the file.
-
Complete all the steps (the cells with the code) in the notebook.
3. Connect DataLens and create charts
3.1 Connect to DataLens
- On the page of the created ClickHouse cluster, on the left in the menu, select DataLens.
- In the window that opens, click Activate.
- In the window that opens, select the default folder and click Activate DataLens.
3.2 Create a connection to ClickHouse in DataLens
- Click Create connection.
- Select a ClickHouse connection.
- Fill in the connection settings.
-
Add a name.
-
Select a ClickHouse host from the Hostname drop-down list.
-
Select the username.
-
Enter the password and click Check connection.
-
After checking the connection, in the upper-right corner, click Create.
-
3.3 Create a dataset based on the connection
-
In the upper-right corner, click Create dataset.
-
Select the
metrica_data.hits
table as a source. To do this, drag the table from the list on the left to the editing area. -
Open the Fields tab.
-
In the upper-right corner, click Add field.
-
To count the number of hits, create the calculated Hits field equal to 1. Click Create.
-
For the Hits field, select the Sum value in the aggregation.
-
In the upper-right corner, click Save.
-
Name the dataset
ch_metrica_data_hits
and click Create.
3.4 Create a stacked area chart
Now create charts.
-
In the upper-right corner, click Create chart.
-
In the window that opens, drag the following fields to the chart section:
- Drag the EventDate field to the X section.
- Drag the Browser field to the Colors section.
- Drag the Hits field to the Y section.
-
Change the chart type to Stacked area chart.
-
Click Save.
-
In the window that appears, enter
ch_metrica_data_hits_area
as the chart name and click Save.
3.5 Create a pivot table chart
Create another chart based on the pre-existing chart: a pivot table.
- In the top right-hand corner, click
→ Save as.
- For the chart copy, enter
ch_metrica_data_hits_table
as the new name and click Save. - Select Pivot table as the new chart type.
- Add or adjust the following fields in the chart area:
- Drag the Browser field to the Rows section.
- Drag the Hits field to the Sorting section.
- Click Save.
4. Create and configure a dashboard in DataLens
4.1 Create a dashboard
-
Open the DataLens homepage and click Create dashboard.
-
Enter
ch_metrica_data
as the name of the dashboard and click Create. -
Add the first chart to the dashboard. To do this, in the upper-right corner, click Add → Chart.
- From the Chart drop-down list, select ch_metrica_data_hits_area.
- In the Name field, enter Hits by browser as the chart name and click Add.
-
Similarly, add the chart ch_metrica_data_hits_table named Hits by browser for period.
-
Move the charts and resize them on the dashboard.
- Drag the table chart to the right of the diagram chart.
- To change the vertical dimensions of the charts, drag them by the lower-right corner.
-
In the upper-right corner, click Save.
4.2 Configure a dashboard
-
Add filtering to select a specific browser. To do this, in the upper-right corner, click Add → Selector.
-
You can add the selector to a field from any dataset. From the Dataset list, select the created dataset ch_metrica_data_hits.
-
From the Field list, select Browser. In the Name field, enter a name for the selector and enable Show.
-
Enable the Multiple choice option.
-
In the Default value field, select browsers:
- android_browser
- chrome
- chromemobile
- firefox
- opera
- safari
- safari_mobile
- samsung_internet
- yandex_browser
- yandexsearch
-
Click Add.
-
Drag the selector to the top of the dashboard and stretch it horizontally.
-
In the upper-right corner, click Save.
5. Funnels
5.1 DataSphere. Build funnels
-
Open the notebook 3. funnels.ipynb. Specify the host, the user, and the DB name.
-
Run the cells and evaluate the analysis results.
In ClickHouse, the table metrica_data.funnels_by_bro
is created, where funnels by browser are calculated.
5.2 DataLens. Funnels by browser. Create a dataset
Create a new dataset based on the new table and the connection to ClickHouse.
-
Open the DataLens homepage and click Create dataset.
-
Go to the Connections section and click Add.
-
From the list of connections, select metrica_analysis.
-
Drag the new table
metrica_data.funnels_by_bro
to the editing area. -
Open the Fields tab.
- Rename the fields step X → Step X, where X is the step sequence number.
- Specify the value of the Sum aggregation for the Step X fields and click Save.
-
Name the dataset ch_metrica_data_funnels_by_bro and click Create.
5.3 DataLens. Funnels by browser. Create a chart
Create a chart based on the dataset ch_metrica_data_funnels_by_bro.
- Click Create chart.
- Select the Pivot table chart type.
- Drag the following fields to the chart section:
- The Browser field to the Rows section.
- The Step X field to the Measures section.
- The Step 1 field to the Sorting section.
- Click Save.
- Enter ch_metrica_data_funnels_by_bro_table as the chart name and click Save.
5.4 DataLens. Funnels by browser. Add a chart to your dashboard
-
Go to the created dashboard (you can do it from the dashboards page).
-
Add a new chart. In the upper-right corner, click Edit.
-
Add the chart
ch_metrica_data_funnels_by_bro_table
. Enter Funnels by browser as the name and click Add. -
Place the new chart to the right of the existing two. Stretch the chart so that it matches the others vertically and reaches the right border of the page.
-
Click Save.
5.5 DataLens. Funnels by browser. Set up a dashboard
Configure relationships so that the selector affects the new chart from another dataset.
-
Click Edit → Links.
-
In the window that opens, select the Browser: selector from the list.
-
On the page with the other dashboard elements, scroll down to the Funnels by browser chart, and click on the list with the link.
- Select the connection type Outgoing link.
-
From each list, select the fields for the Browser link. Click Add.
-
In the upper-right corner of the dashboard, click Save.
-
In the upper-left corner, click → Rename.
-
Enter Supermarket.ru — funnel and cohort analysis as the name. Click Done.
6. Cohorts
6.1 DataSphere. Cohort analysis
-
Open the notebook 4. cohorts.ipynb. Specify the host, the user, and the DB name.
-
Run the cells and evaluate the analysis results.
In ClickHouse, the table metrica_data.retention_users
is created, which contains all the data necessary to render visualization in DataLens.
6.2 DataLens. Create a dataset and a chart with cohort visualization
Create a dataset based on the new table and the connection to ClickHouse.
-
Open the DataLens homepage and click Create dataset.
-
In the Connections section, click Add.
-
From the list of connections, select
metrica_analysis
. -
Drag the new table
metrica_data.retention_users
to the work area to connect to it. -
Open the Fields tab and create the new calculated field week_num, which is equal to
([date]-[min_date])/7
.
This field indicates the number of weeks from the user's first visit. -
Click Create.
-
For the visits, purchases, and revenue fields, enable the Sum aggregation.
-
Rename the fields to Visits, Purchases, and Revenue, respectively.
-
Save the dataset.
- Name the dataset ch_metrica_data_users_visits.
- Click Create.
-
Create a new chart based on the dataset:
- Change the chart type to a pivot table.
- Drag the week_num field to the Columns section.
- Drag the min_date field to the Rows section.
- Drag the Visits field to the Measures section.
6.3 DataLens. Configure a chart with cohort visualization
Filter out incomplete weeks of June 29, 2020 and September 28, 2020.
- Drag the min_date field to the Filters section.
- In the min_date field, click the calendar icon.
- In the window that opens, select the start and end dates of the date range for filtering:
- Start date: July 06, 2020.
- End date: September 27, 2020.
- Click Apply filter.
- In the window that opens, select the start and end dates of the date range for filtering:
- Format the numbers in the values of the week_num field by removing the decimal places. To do this, in the Columns section, in the week_num field, click the grid icon. In the window that opens, set the following configuration:
- Set the Precision measure to 0.
- Set the Show delimiter to Hide.
- Click Apply.
- To color the table, add the Visits field to the Colors section and click the gear icon. In the window that opens, configure the colors:
- Select Gradient type: 3 point.
- Select Color: Orange-Violet-Blue.
- Enable Set threshold values and specify the values 100, 1000, and 5000.
- Click Apply.
- Click Save.
- Name the chart ch_metrica_data_users_visits_cohorts_abs and click Save.
6.4 DataLens. Create a chart with retention
Create a chart with retention based on the ch_metrica_data_users_visits_cohorts_abs chart. You can open the chart from the dashboard or find it in the chart list.
- Click Save as.
- Enter ch_metrica_data_users_visits_cohorts_rel as the name of the chart and click Save.
- Create a new calculated field to calculate retention relative to the first week:
- In the left part of the screen, click above the list of dataset fields and select Add field.
- Name the field Visits from the first week.
- Enter the following formula:
SUM([Visits])/RMAX(SUM([Visits]) among [week_num])
. - Click Create.
- Drag the Visits from the first week field to the Measures section.
- Drag the Visits from the first week field to the Colors section in place of the Visits field.
- Select the format for Visits from the first week. To do this, click the grid icon under Measures in the Visits from the first week field. In the window that opens, set the following configuration:
- Set Format to Percent.
- Click Apply.
- Edit the threshold values for the measure's colors. Under Colors, click the gear icon. In the resulting window, specify the threshold values of 0.01, 0.025, and 0.1 and click Apply.
- Click Save.
6.5 DataLens. Add charts to a new dashboard tab
Now return to the dashboard.
- Click Edit → Tabs.
- Rename the existing tab Overview + Funnels.
- Add a new tab and name it Cohorts. Click Save.
- Go to the new Cohort tab.
- Add the chart ch_metrica_data_users_visits_cohorts_abs to the dashboard.
- In the Name field, specify Visits by cohort (absolute).
- To add a new tab, click Add on the left.
- In the new tab, add the chart ch_metrica_data_users_visits_cohorts_rel.
- Enter Visits by cohort (relative) as the name.
- Click Add.
Now you have a chart with two switchable tabs.
6.6 DataLens. Create charts
Create a new chart based on the chart ch_metrica_data_users_visits_cohorts_abs. You can open the chart from the dashboard or find it in the chart list.
-
Click Save as.
-
Enter ch_metrica_data_users_revenue_cohorts_abs as the name of the chart and click Save.
-
Drag the Revenue field to the Measures and Colors sections on top of the Visits field.
-
In the Revenue section, click the grid icon. Change the field formatting.
- Select 1 decimal place.
- Select the Millions scale.
- Change the color thresholds for the new field to 500000, 1500000, and 10000000.
-
Save the chart.
Create another chart based on the chart ch_metrica_data_users_visits_cohorts_rel.
-
Click Save as.
-
Enter ch_metrica_data_users_revenue_cohorts_rel as the name of the chart and click Save.
-
Change the Visits from the first week field.
- Rename the field Revenue from the first week.
- Change the formula to
SUM([Revenue])/RMAX(SUM([Revenue]) among [week_num])
. - Change the color thresholds for the new field to 0.01, 0.2, and 0.3.
-
Save the chart.
6.7 DataLens. Add charts to the dashboard
Add charts with cohort visualization to the dashboard.
-
Click Edit.
-
Click Add.
-
Choose Chart.
-
Select the chart ch_metrica_data_users_revenue_cohorts_abs from the chart list.
-
Enter Revenue by cohort (absolute) as the name.
-
Use the + Add button to create a new tab.
- In the new tab, select the chart ch_metrica_data_users_revenue_cohorts_rel from the chart list.
- Enter Revenue by cohort (relative) as the name.
- Click Save.
-
Arrange the charts side by side.