Visualizing geodata from a CSV file
- Before you start
- Step 1. Create a connection to a CSV file
- Step 2. Create a dataset and configure its fields
- Step 3. Create your first chart: a heat map
- Step 4. Create your second chart: a bar chart
- Step 5. Create line charts
- Step 6. Create a dashboard
- Step 7. Add charts to the dashboard
- Step 8. Add selectors and create a dashboard
- Step 9. Connect an additional source with region geolayers
- Step 10. Add new fields to the dataset
- Step 11. Create a chart using the new fields
- Step 12. Add a new chart to the dashboard
- Step 13. Publish the dashboard
This scenario analyzes public data on road accidents in Russia. In addition to identifying the facts of accident statistics in Russia, during the analysis you'll learn how to:
- Work with the DataLens key entities: Connections, Datasets, Charts, and Dashboards.
- Combine multiple sources at the level of a single dataset.
- Use the
Date and time
data type and edit chart-level groupings. - Work with geodata: geopoints and geopolygons.
- Create public dashboards that will be available to everyone from any device without authentication.
To visualize and analyze the data, make sure you have a ready-to-use cloud and follow these steps:
- Create a connection to a CSV file.
- Create a dataset and configure its fields.
- Create your first chart: a heat map.
- Create your second chart: a bar chart.
- Create line charts.
- Create a dashboard.
- Add charts to the dashboard.
- Add selectors and create a dashboard.
- Connect an additional source with region geolayers.
- Add new fields to the dataset.
- Create a chart using the new fields.
- Add the new chart to the dashboard.
- Publish the dashboard.
Before you start
To use DataLens, you have to sign up for Yandex.Cloud and create a billing account (without a billing account, DataLens is only available in demo mode):
-
Go to the management console. Then log in to Yandex.Cloud or sign up if you don't have an account yet.
-
On the billing page make sure that a billing account is linked and that its status is
ACTIVE
orTRIAL_ACTIVE
. If you don't have a billing account, create one.
When you activate your billing account, you automatically switch to the Free DataLens plan. By default, you can create a DataLens instance for no more than 1 folder. If you haven't done this yet:
- Select a folder on the cloud page.
- Select the DataLens service.
- Click Activate DataLens.
An instance of the service is created in the selected folder and you can go directly to the DataLens interface.
Learn more about clouds and folders.
Step 1. Create a connection to a CSV file
-
Download public accident statistics in CSV format. As of writing these instructions in February 2020, the most current dataset was for 2018. Unpack the downloaded archive. You can also download a ready-to-use file with data for 2018.
-
Open the DataLens homepage and click Create connection.
-
Select the CSV connection type.
-
Click Select CSV file and select the downloaded file. It may take up to several minutes to load it, depending on the speed of your internet connection.
When loaded, you'll see a preview of the file data. To view it, use the scroll bars on the right and at the bottom.
-
In the upper-right corner, click Create.
The connection to the CSV file is created. The data is presented in the same form as in the file. To use it, you need to create a dataset.
Step 2. Create a dataset and configure its fields
The dataset will consist of a single source: the CSV file.
-
Click Create dataset.
-
Go to the Fields tab.
Here you can add data fields and calculated fields, change field data types and aggregation rules, as well as rename fields.
-
Change the field names to Russian ones in the following order:
- Region code
- Region name
- Road code
- Road name
- Road type
- OKTMO code
- Address
- Accident type
-
The accident date and time are shown in the following fields:
- crash_date: The date in numeric format like
20190218
. - crash_time: The time written like
19:34
.
They won't be used in the dataset in this form. Click the visibility icon to hide them.
- crash_date: The date in numeric format like
-
To display information about the accident date and time, add a new field.
-
Click Add field.
-
Specify the following data for it:
- Name: Date and time
- Formula:
datetime_parse(str([crash_date])+'-'+str([crash_time]))
Tip
To review the documentation for these functions yourself, click Reference.
-
Click Create.
The preview section now correctly displays the accident date and time as a value of the
Date and time
type. The new Date and time field is at the top of the table. Click to edit the field formula. -
-
Continue renaming the fields in the following order:
- Cause of accident
- Number of victims
- Death toll
- Number of vehicles
- Number of participants
Set the Sum aggregation type for the following fields:
- Number of victims
- Death toll
- Number of vehicles
- Number of participants
-
Add a field to display a geopoint.
-
Click Add field at the top right.
-
Enter data for the new field:
- Name: Geopoint
- Formula:
GEOPOINT([latitude],[longitude])
where [latitude] and [longitude] are dataset fields.
-
Click Create.
Note
The example is based on ready-to-use data in coordinate format. In other cases, to convert data to coordinate format, you can use the geocoding function.
-
Make sure that the table and preview contain the new
Geopoint
type field.
-
-
Add a field to display the number of accidents.
-
Click Add field.
-
Enter data for the new field:
- Name: Number of accidents
- Formula:
SUM(1)
-
Click Create.
You don't need to set the aggregation type for this field: when you add it to the chart, DataLens applies the aggregation function automatically.
-
-
Add a field for data on deaths in road accidents.
-
Click Add field.
-
Enter data for the new field:
- Name: Accident death rate
- Formula:
[Death toll]/[Number of accidents]
-
Click Create
Note
You can create calculated fields directly in charts. However, it's more convenient to do this at the dataset level. This makes them available in all charts.
-
-
Save the dataset by clicking Save at the top right.
-
Specify the dataset name: Accidents. Click Create.
The dataset is created.
Step 3. Create your first chart: a heat map
Proceed to creating the first chart.
-
Click Create chart.
-
In the wizard window that opens, click on the field that defaults to Column chart and select the Map chart type.
-
Under Points, select the Points (heat map) layer type.
-
Drag the Geopoint field to the Geopoints section.
That's your first chart on the map.
-
To save it, click the arrow icon next to the Save button and select Save as.
-
In the dialog box, name the chart Heat map and click Save.
Step 4. Create your second chart: a bar chart
-
Analyze the number of accidents by region.
-
In the same window, select Bar chart as the chart type.
-
Drag:
- The Region name field to the Y section.
- The Number of accidents field to the X section.
- The Number of accidents field to the Sorting section.
The chart shows the number of accidents by region. Most accidents occurred in the Moscow region.
Note
The legend may be missing for some of the chart lines. If this is the case, hover over the line and you'll see a tooltip with the region name.
-
Click Save as to save the chart.
Name the chart Number of accidents by region and save it.
-
-
Analyze the death rate in accidents by region.
-
Replace the Number of accidents measure with Accident death rate by dragging and dropping.
This time, Ingushetia is far ahead in the number of deaths, followed by Tyva and Kalmykia.
-
Save the chart by clicking Save as
and name it Death rate by region.
-
Step 5. Create line charts
Let's see how the number of accidents and related deaths are distributed by week, day of the week, and time of day.
-
Analyze the number of accidents and death rate by week.
-
In the same window, select Line chart as the chart type.
-
The chart management sections contain the previous values. Delete them by clicking .
-
Drag the Date and time field to the X section and click the green calendar icon.
-
In the Grouping field, choose Date part ⟶ Week and click Apply.
-
Drag:
- The Number of accidents field to the Y section.
- The Accident death rate field to the Y2 section.
Now you can see a chart with two graphs: the number of accidents and death rate. If you hover over a point on the chart, a tooltip appears with specific values.
-
Save the chart by clicking Save as
and name it Number of accidents and death rate by week.
-
-
Now analyze the number of accidents and death rate by day of the week.
-
Change grouping by date and time to Date part ⟶ Day of week.
This chart shows more obvious trends.
-
Save the chart by clicking Save as
and name it Number of accidents and death rate by day of week.
-
-
Analyze the number of accidents and death rate by hour of day.
-
Change grouping by date and time to Date part ⟶ Hour.
The chart shows that less accidents occur at night, but they result in more casualties.
-
Save the chart by clicking Save as
and name it Number of accidents and death rate by hour of day.
-
Step 6. Create a dashboard
-
Open the menu by clicking the icon to the left of the logo and select Dashboards.
-
Click Create
and select Dashboard.
-
Name the dashboard Accidents in Russia and click Create.
Note
If this is the first dashboard you create in the folder, it opens as soon as it's created. If the folder already contains dashboards, the list of them opens. In this case, select the Accidents in Russia dashboard from the list.
Step 7. Add charts to the dashboard
-
Add the first chart.
-
Click Add and select Chart from the drop-down list.
-
In the Chart field, select the previously created Heat Map from the list of charts.
-
Click Add.
The chart is displayed on the dashboard.
-
-
Repeat the previous three steps for the Number of accidents by region, Number of accidents and death rate by week, Number of accidents and death rate by day of week, and Number of accidents and death rate by hour of day charts by adding them to the dashboard.
-
Create a switch for the Number of accidents by region and Death rate by region charts on the dashboard.
-
Click the Number of accidents by region chart settings icon to open them.
-
Click Add to the left of the screen.
-
In the Chart field, click Select and choose the Death rate by region chart.
-
Click Save.
-
Step 8. Add selectors and create a dashboard
Selectors let you filter data by value.
-
Add a selector to show statistics by region name.
-
Click Add and choose Selector.
-
Select Based on dataset, click Select, and choose the previously created Accidents dataset.
-
In the Field property, choose Region name and select Show next to the selector name. Click Add.
The selector is displayed on the dashboard as a rectangle.
-
-
Repeat the previous three steps for the fields:
- Cause of accident
- Accident type
- Road type
- Road name
-
You can drag and resize dashboard elements. Place the elements according to the screenshot or however is convenient for you and click Save to save the dashboard.
If you choose any values in the selectors, the charts will display data for these values.
You can switch the Number of accidents by region chart to Death rate by region.
Step 9. Connect an additional source with region geolayers
Heat maps aren't always the most informative. Upload the polygon reference and add a chart with a color fill for Russian regions.
-
Open the menu by clicking the button to the left of the logo and select Connections.
-
At the top right, click Create
and select Connection.
-
Select the CSV file type.
Download the Regions.csv file. Then, in the create connection window, upload it by clicking Select CSV file.
-
Take a look at a preview of the file data. For the Table header parameter, set the Available value. At the top right, click Create.
Step 10. Add new fields to the dataset
You need to add new fields to the previously created Accidents dataset.
-
In the left menu, select Datasets.
-
Choose the Accidents dataset.
-
Go to the Sources tab.
-
Click Add.
-
Choose Connections and then Regions.
Since the connection data has not yet been merged, an error message may appear. Follow these steps.
-
Click the merge data icon.
-
Click Add link.
-
Select the fields to link: reg_name and Accident region. Then click Save.
Make sure the data preview is displayed correctly.
-
Go to the Fields tab.
-
New fields are displayed at the bottom of the list. For the Polygon field, specify the Geopolygon data type.
-
Click Save to save the dataset.
Step 11. Create a chart using the new fields
-
Click Create chart.
-
Select the Map chart type.
-
Add polygons to the map. Drag the Polygon field from the Dimensions section to the Polygons section.
-
Change colors of the polygons based on the number of accidents. Drag the Number of accidents field from the Measures section to the Colors section.
-
Drag the following fields to the Tooltips section:
- Region name
- Number of accidents
- Death toll
- Number of victims
- Number of vehicles
- Number of participants
- Accident death rate
The chart shows a map with a color fill for regions. If you hover over a region, information for that region appears.
-
Click Save at the top right to save the chart.
-
Name the chart Map of regions and click Save once again.
Step 12. Add a new chart to the dashboard
-
Select Dashboards in the top-left menu.
-
Select the previously created dashboard with accident data.
-
Click Edit at the top right.
-
Click the Heat map settings icon to open them.
-
Click Add to the left of the screen.
-
In the Chart field, click Select and choose the last chart named Map of regions.
-
Click Save.
You can now switch the type of map data visualization: heat map or region fill.
Step 13. Publish the dashboard
-
To set up public access to the dashboard, click .
-
Enable access via link. You can copy and share the link that appears. Everyone can use the link to access the dashboard from any device and without authentication. Click Apply.
Note
You can enable or disable public access separately for each chart and dataset associated with the dashboard.