Web analytics with funnels and cohorts calculated based on Yandex Metrica data
- Before you begin
- 1. Connect ClickHouse and DataSphere
- 2. Retrieve and upload data to ClickHouse
- 3. Connect DataLens and create charts
- 4. Create and configure a dashboard in DataLens
- 5. Build conversion funnels
- 6. Perform cohort analysis
- 6.1. DataSphere. Perform cohort analysis
- 6.2. DataLens. Create a dataset and a chart with cohort visualization
- 6.3. DataLens. Configure a chart with cohort visualization
- 6.4. DataLens. Create a chart with retention
- 6.5. DataLens. Add charts to a new dashboard tab
- 6.6. DataLens. Create charts
- 6.7. DataLens. Add charts to the dashboard
- How to delete created resources
In this tutorial, you will learn how to build conversion funnels, run cohort analysis, calculate the Retention rate for the user base in Yandex DataSphere, and visualize the data in Yandex DataLens.
Yandex Metrica data is used as the data source.
- Connect ClickHouse and DataSphere:
- Retrieve and upload data to ClickHouse:
- Connect DataLens and create charts:
- Create and configure a dashboard in DataLens:
- Build conversion funnels:
- Perform cohort analysis:
- DataSphere. Perform cohort analysis.
- DataLens. Create a dataset and a chart with cohort visualization.
- DataLens. Configure a chart with cohort visualization.
- DataLens. Create a chart with retention.
- DataLens. Add charts to a new dashboard tab.
- DataLens. Create charts.
- DataLens. Add charts to the dashboard.
If you no longer need these resources, delete them.
Before you begin
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account in the identity federation (SSO).
- Select the organization for working in Yandex Cloud.
- Go to the default organization community or create a new community.
- Link your billing account to the DataSphere community you are going to work in. Make sure that the billing account is linked and its status is
ACTIVE
orTRIAL_ACTIVE
. If you don't have a billing account, create one in the DataSphere interface.
Tip
To make sure Yandex DataLens and Yandex DataSphere can run within the Yandex Cloud network, create their instances in the same organization.
Required paid resources
The cost of the infrastructure deployment includes:
- A fee for the cluster computing resources and storage (see Managed Service for ClickHouse pricing).
- A fee for the computation time (see DataSphere pricing).
- A fee for the outbound traffic (see Virtual Private Cloud pricing).
1. Connect ClickHouse and DataSphere
1.1. Connect ClickHouse
- In the management console, select a folder to create a ClickHouse cluster in.
- Select Managed Service for ClickHouse.
- In the window that opens, click Create ClickHouse cluster.
- Specify the settings for a ClickHouse cluster:
-
Under Basic parameters, specify a name for the cluster.
-
Under Resources, select the
Intel Cascade Lake
platform, theburstable
type, and theb2.medium
host type.Warning
We don't recommend using
burstable
VM configurations in production environments. This tutorial uses them as an example. For production solutions, usestandard
ormemory-optimized
configurations. -
Under Storage size, keep the value of
10 GB
. -
Under Hosts, click
. Enable Public access and click Save. -
Under DBMS settings, disable user management via SQL, enter the username, password, and database name (e.g.,
metrica_data
). -
Under Service settings, enable the following options:
- Access from DataLens
- Access from the management console
- Access from Yandex Metrica and AppMetrica
- Access from Serverless
-
Click Create cluster.
-
1.2. Connect DataSphere
-
Open the [(https://datasphere.yandex.com/) homepage]DataSphere.
-
In the window that opens, click Create → Create project.
-
Give your project a name. Naming conventions:
- The length can be from 3 to 63 characters.
- It may contain lowercase Latin letters, numbers, and hyphens.
- The first character must be a letter. The last character can't be a hyphen.
-
Specify a community for the project and click Сreate.
-
Click Open project in JupyterLab.
This is the JupyterLab development environment, and you are going to use it to complete the next steps.
1.3. Clone the repository to DataSphere
- In the Git menu, select Clone.
- In the window that opens, specify the URI of the
repositoryhttps://github.com/zhdanchik/yandex_metrika_cloud_case.git
and click CLONE. - Click OK.
2. Retrieve and upload data to ClickHouse
If you do not yet have a Yandex Metrica tag or it has not accumulated enough data, or if you want to be sure that you get a result by completing all steps in the tutorial, go to step 2.3 (skip steps 2.1 and 2.2).
If you have a Yandex Metrica tag and can access it, go to step 2.1 and 2.2 (skip step 2.3). We recommend walking through these steps if you are an experienced user because the logic of calculating funnels and cohorts depends on the data itself, and you may need to tweak the scripts.
2.1. Yandex Metrica. Create an app and get an access token
-
To work with the API, get your OAuth token.
-
Create an application:
- Go to https://oauth.yandex.ru/client/new.
- Give the service a name.
- Go to Which platform is the app required for?. → Web services. In the Callback URL field, insert
https://oauth.yandex.com/verification_code
. - Go to What data do you need? → Yandex Metrica (metrika). Enable Access to statistics and ability to view all counter settings (metrika:read).
- Click Create app.
- A description of our application appears in the window that opens. Save the ClientID of your app.
-
Click
https://oauth.yandex.ru/authorize?response_type=token&client_id=<app_ID>
. Paste the ClientID of your app as<app_ID>
. -
Click Log in as.
-
Save the received access token.
2.2. DataSphere. Upload data via the Yandex Metrica Logs API
-
In the DataSphere project, in the root of the working directory, create a text file. To do this, click Text File in the working area.
-
Name the file
.yatoken.txt
and insert the received access token into the file content. Save your changes and close the file. -
Open the folder yandex_metrika_cloud_case → notebook 1a. get_data_via_logs_api.ipynb.
-
Use your Yandex Metrica tag ID as the value of the
COUNTER_ID
variable. You can find your Yandex Metrica tag ID on the My tags page. -
Enter the start date for the analyzed period in the
START_DATE
variable. -
Enter the end date for the analyzed period in the
END_DATE
variable.Warning
The date range will NOT include the end date. For example, to get data up to December 5, 2022, paste
2022-12-06
into theEND_DATE
variable -
Complete all steps (cells with code) in the notebook 1a. get_data_via_logs_api.ipynb.
If you couldn't get data for the demo tag from the Logs API, you can download it via Yandex Disk.
2.3. DataSphere. Download the test tag data via Yandex Disk
Note
Skip this section if you are using your own tag data.
- Open the folder yandex_metrika_cloud_case → notebook 1b. get_data_via_yadisk.ipynb.
- Complete all steps (cells with the code) in the notebook 1b. get_data_via_yadisk.ipynb.
2.4. ClickHouse. Get the cluster's IP address
- In the management console, go to the created ClickHouse cluster. Wait for the cluster status to be Alive. Then open the cluster by clicking on it.
- Select Hosts from the list on the left.
- On the Overview tab, copy the host name.
2.5. DataSphere. Upload the data to ClickHouse
-
Open the folder yandex_metrika_cloud_case → notebook 2. upload_data_to_ClickHouse.ipynb:
- Paste the copied hostname into the
CH_HOST_NAME
variable. - In the
CH_USER
variable, insert the name of the user you specified when creating your ClickHouse cluster. - In the
CH_DB_NAME
variable, insert the database name you specified when creating your ClickHouse cluster.
- Paste the copied hostname into the
-
In the root directory, create a new text file named .chpass.txt.
-
In the .chpass.txt file, paste the user password you specified when creating your ClickHouse cluster. Save and close the file.
-
Complete all the steps (the cells with the code) in the notebook.
3. Connect DataLens and create charts
3.1. Connect to DataLens
- In the management console, open the page of the ClickHouse cluster you created.
- In the list on the left, select DataLens.
- In the window that opens, select the folder with your DataLens, then click Activate.
3.2. Create a connection to ClickHouse in DataLens
- Click Create connection.
- Select a ClickHouse connection.
- Fill in the connection settings:
- Select a cluster from the Cluster drop-down list or create a new one. If the cluster is missing in the list, click Specify manually, then specify the ClickHouse cluster name.
- Select a ClickHouse host from the Host name drop-down list.
- Select the username.
- Enter password and click Check connection.
- When the connection check succeeds, click Create connection. In the window that opens, enter the connection name and click Create.
3.3. Create a dataset based on the connection
- In the upper-right corner, click Create dataset.
- Select the
metrica_data.hits
table as a source. To do this, drag the table from the list on the left to the editing area. - Open the Fields tab.
- In the upper-right corner, click Add field.
- To calculate the number of hits, create a calculated field: in Field, enter
Hits
. In the workspace, enter1
and click Create. - For the Hits field, select the Amount value in the Aggregation column.
- Rename the Browser field to Browser.
- In the upper-right corner, click Save.
- Name the dataset
ch_metrica_data_hits
and click Create.
3.4. Create a stacked area chart
- In the upper-right corner, click Create chart.
- In the window that opens, drag the following fields to the chart section:
- EventDate, to the X section.
- Browser, to the Colors section.
- Hits, to the Y section.
- Change the chart type from Column chart to Stacked area chart.
- Click Save.
- In the window that opens, enter
ch_metrica_data_hits_area
as the chart name and click Save.
3.5. Create a pivot table chart
- In the top right-hand corner, click → Save as.
- For the chart copy, enter
ch_metrica_data_hits_table
as the new name and click Save. - Select Pivot table as the new chart type.
- Add or drag the following fields to the chart area:
- Browser, to the Rows section.
- Hits, to the Sorting section.
- Click Save.
4. Create and configure a dashboard in DataLens
4.1. Create a dashboard
- In the panel on the left, select
- Enter
ch_metrica_data
as the name of the dashboard and click Create. - Add the first chart to the dashboard. To do this, in the top right corner, click Add → Chart:
- From the Chart drop-down list, select ch_metrica_data_hits_area.
- In the Name field, enter Hits by browser as the chart name and click Add.
- Similarly, add the chart ch_metrica_data_hits_table named Hits by browser for period.
- Move the charts and resize them on the dashboard:
- Drag the table chart to the right of the diagram chart.
- To change the vertical dimensions of the charts, drag them by the lower-right corner.
- In the upper-right corner, click Save.
4.2. Set up a dashboard
- Add filtering to select a specific browser. To do this, in the top right corner, click Edit → Add → Selector.
- You can add the selector to a field from any dataset. From the Dataset list, select the created dataset ch_metrica_data_hits.
- In the Field list, select Browser.
- Enable the Multiple choice option.
- In the Default value field, select browsers:
- android_browser
- chrome
- chromemobile
- firefox
- opera
- safari
- safari_mobile
- samsung_internet
- yandex_browser
- yandexsearch
- In the Name field, enter a name for the selector and enable Show.
- Click Add.
- Drag the selector to the top of the dashboard and stretch it horizontally.
- In the upper-right corner, click Save.
5. Build conversion funnels
5.1. DataSphere. Build funnels
- Open the [(https://datasphere.yandex.com/) homepage]DataSphere.
- Open the notebook 3. funnels.ipynb. Specify the host, the user, and the DB name.
- Run the cells and evaluate the analysis results.
In ClickHouse, the tablemetrica_data.funnels_by_bro
is created, where funnels by browser are calculated.
5.2. DataLens. Funnels by browser. Create a dataset
Create a new dataset based on the new table and the connection to ClickHouse:
- Open the DataLens homepage (or click DataLens in the left panel) and click Create dataset.
- Go to the Connections section and click Add.
- From the list of connections, select the connection name that you created in Step 3.2.
- Drag the new table
metrica_data.funnels_by_bro
to the editing area. - Open the Fields tab:
- Rename the fields step X to Step X, where X is the step number.
- Specify the Sum value in the Aggregation column for the Step X fields and click Save.
- Name the dataset ch_metrica_data_funnels_by_bro and click Create.
5.3. DataLens. Funnels by browser. Create a chart
Create a chart based on the ch_metrica_data_funnels_by_bro dataset:
- Click Create chart.
- Select the Pivot table chart type.
- Drag the fields to the chart sections:
- Browser, to the Rows section.
- Step X, to the Measures section, where X is the step sequence number.
- Step 1, to the Sorting section.
- Click Save.
- Enter ch_metrica_data_funnels_by_bro_table as the chart name and click Save.
5.4. DataLens. Funnels by browser. Add a chart to your dashboard
- Go to the created dashboard (you can do it from the dashboards page).
- Add a new chart. In the upper-right corner, click Edit.
- Add the chart
ch_metrica_data_funnels_by_bro_table
. Enter Funnels by browser as the name and click Add. - Place the new chart to the right of the existing two. Stretch the chart so that it matches the others vertically and reaches the right border of the page.
- Click Save.
5.5. DataLens. Funnels by browser. Set up a dashboard
Configure relationships so that the selector affects the new chart from another dataset:
- Click Edit → Links.
- In the window that opens, select the Browser selector from the list.
- On the page with the other dashboard elements, scroll down to the Funnels by browser chart, and click on the list with the link.
- Select the connection type Outgoing link.
- From each list, select the fields for the Browser link. Click Add.
- Click Save.
- In the top left corner, click
- Enter Supermarket.ru — funnel and cohort analysis as the name. Click Done.
6. Perform cohort analysis
6.1. DataSphere. Perform cohort analysis
- Open the notebook 4. cohorts.ipynb. Specify the host, the user, and the DB name.
- Run the cells and evaluate the analysis results.
In ClickHouse, the table metrica_data.retention_users
is created, which contains all the data necessary to render visualization in DataLens.
6.2. DataLens. Create a dataset and a chart with cohort visualization
Create a new dataset based on the new table and the connection to ClickHouse:
- Open the DataLens homepage and click Create dataset.
- In the Connections section, click Add.
- From the list, select the connection you created.
- Drag the new table
metrica_data.retention_users
to the work area to connect to it. - Open the Fields tab and create the new calculated field week_num, which is equal to
([date]-[min_date])/7
.
This field indicates the number of weeks from the user's first visit. - Click Create.
- For the visits, purchases, and revenue fields, select the Sum value in the Aggregation column.
- Rename the fields to Visits, Purchases, and Revenue, respectively.
- Save the dataset:
- Name the dataset ch_metrica_data_users_visits.
- Click Create.
- Create a new chart based on the dataset:
- Change the chart type to Pivot table.
- Drag the week_num field to the Columns section.
- Drag the min_date field to the Rows section.
- Drag the Visits field to the Measures section.
6.3. DataLens. Configure a chart with cohort visualization
Filter out incomplete weeks of June 29, 2020 and September 28, 2020:
- Drag the min_date field to the Filters section.
- In the min_date field, click the calendar icon:
- In the window that opens, select the start and end dates of the date range for filtering:
- Start date:
June 06, 2020
. - End date:
September 27, 2020
.
- Start date:
- Click Apply filter.
- In the window that opens, select the start and end dates of the date range for filtering:
- Format the numbers in the values of the week_num field by removing the decimal places. To do this, in the Columns section, in the week_num field, click the
- Set the Decimal place measure to
0
. - Set the Show delimiter measure to Hide.
- Click Apply.
icon. In the window that opens, set the following configuration:
- Set the Decimal place measure to
- To color the table, add the Visits field to the Colors section and click the
icon. In the window that opens, configure the colors:- Select Gradient type:
3 point
. - Select Color:
Orange-Violet-Blue
. - Enable Set threshold values and specify the values:
100
,1000
, and5000
. - Click Apply.
- Select Gradient type:
- Click Save.
- Name the chart ch_metrica_data_users_visits_cohorts_abs and click Save.
6.4. DataLens. Create a chart with retention
Create a chart with retention based on the ch_metrica_data_users_visits_cohorts_abs chart. You can open the chart from the dashboard or find it in the chart list.
- In the top right-hand corner, click → Save as.
- Enter ch_metrica_data_users_visits_cohorts_rel as the name of the chart and click Save.
- Create a new calculated field to calculate the retention rate relative to the first week:
- In the left part of the screen, click
- Name the field Visits from the first week.
- Paste the following formula:
SUM([Visits])/RMAX(SUM([Visits]) among [week_num])
. - Click Create.
- In the left part of the screen, click
- Drag the Visits from the first week field to the Measures section.
- Drag the Visits from the first week field to the Colors section in place of the Visits field.
- Select the format for Visits from the first week. To do this, click the
- Set Format to Percent.
- Click Apply.
icon under Measures in the Visits from the first week field. In the window that opens, set the following configuration:
- Edit the threshold values for the measure's colors. Under Colors, click the
icon. In the window that opens, enable Set threshold values, then specify the threshold values of0.01
,0.025
, and0.1
and click Apply. - Click Save.
6.5. DataLens. Add charts to a new dashboard tab
- In the panel on the left, click
- Click Edit → Tabs.
- Rename the existing tab Overview + Funnels.
- Add a new tab and name it Cohorts. Click Save.
- Go to the new Cohort tab:
- Add the chart ch_metrica_data_users_visits_cohorts_abs to the dashboard.
- In the Name field, specify Visits by cohort (absolute).
- To add a new tab, click Add on the left:
- In the new tab, add the chart ch_metrica_data_users_visits_cohorts_rel.
- Enter Visits by cohort (relative) as the name.
- Click Add.
- Click Save.
Now you have a chart with two switchable tabs.
6.6. DataLens. Create charts
Create a new chart based on the chart ch_metrica_data_users_visits_cohorts_abs. You can open the chart from the dashboard or find it in the chart list.
- In the top right corner, click Save.
- Enter ch_metrica_data_users_revenue_cohorts_abs as the name of the chart and click Save.
- Drag the Revenue field to the Measures and Colors sections on top of the Visits field.
- In the Revenue section, click the
- Select 1 decimal place.
- Select the Millions scale.
- Change the color thresholds for the new field to
500000
,1500000
, and10000000
.
icon. Change the field formatting:
- Save the chart.
Create another chart based on the ch_metrica_data_users_visits_cohorts_rel chart:
- In the top right corner, click Save.
- Enter ch_metrica_data_users_revenue_cohorts_rel as the name of the chart and click Save.
- Change the Visits from the first week field:
- Rename the field Revenue from the first week.
- Change the formula to
SUM([Revenue])/RMAX(SUM([Revenue]) among [week_num])
. - Change the color thresholds for the new field to
0.01
,0.2
, and0.3
.
- Save the chart.
6.7. DataLens. Add charts to the dashboard
Add charts with cohort visualization to the dashboard:
- Click Edit.
- Click Add.
- Choose Chart.
- Select the chart ch_metrica_data_users_revenue_cohorts_abs from the chart list.
- Enter Revenue by cohort (absolute) as the name.
- Use the + Add button to create a new tab:
- In the new tab, select the chart ch_metrica_data_users_revenue_cohorts_rel from the chart list.
- Enter Revenue by cohort (relative) as the name.
- In the top right corner, click Save.
- Arrange the charts side by side.
How to delete created resources
To stop paying for the resources created, delete the cluster.