Getting started with DataSphere
DataSphere is a service that simplifies the use of the JupyterLab development environment on Yandex.Cloud computing resources. This lets you perform complex calculations, such as training neural networks or analyzing big data, using the familiar Jupyter Notebook interface.
In this section, you'll learn how to:
- Create projects where you'll work in the JupyterLab environment.
- Install required packages.
- Manage computing resources by changing their configurations.
Before you start
- Go to the management console. Then log in to Yandex.Cloud or sign up if you don't have an account yet.
- On the Billing page, make sure that a billing account is linked and that its status is TRIAL_ACTIVE. If you don't have a billing account, create one.
- On the Access management page, make sure you have the editor role or higher. The role must be assigned for the folder where you'll work or for the cloud that the folder belongs to.
Create a project
To create a project:
- In the management console, open the DataSphere section in the folder where you want to create your project.
- Go to the Projects tab.
- Click Create project.
- Set the project name.
- If necessary, add a description.
- Click Create.
To start working with JupyterLab, open the created project:
- Click the row of the project you need.
- Click the icon next to the project.
It takes 1 to 3 minutes to launch a project.
Popular packages for data analysis and machine learning are pre-installed and ready for use.
You can install missing packages using the pip package manager.
To install a package:
- Write the following command in a notebook cell:
%pip install <Package name>
For example, install the seaborn package to visualize statistics:
%pip install seaborn
- Run the cell by clicking the run button.
The package installation result is displayed under the cell.
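To confirm that a package is available after installation, you can check whether Python can locate it. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and is not part of DataSphere.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(module_name) is not None


# Check the package from the example above.
print(is_installed("seaborn"))
```

The check only looks the module up and does not import it, so it is cheap to run before a cell that depends on the package.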
Increase a cell's computing resources
By default, projects run with the minimal configuration: c1.4 (32 GB RAM and 4 vCPUs). You can change the configuration. When you do, the state of the interpreter is saved, so no variables or computation results are lost.
Some variables can't be serialized and therefore can't be saved. An example is a variable that holds a file open for writing: f = open("file.txt", "w").
When such a variable is assigned, the following warning is shown: The following variables cannot be serialized:
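To see why an open file can't be saved, the sketch below tries to serialize two values with Python's standard pickle module. This is an illustration only: it assumes pickle as a stand-in, and the serializer that DataSphere actually uses may behave differently. The helper name can_pickle is hypothetical.

```python
import os
import pickle
import tempfile


def can_pickle(value) -> bool:
    """Return True if the value can be serialized with pickle."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


# Plain data serializes without problems.
numbers = [1, 2, 3]
data_ok = can_pickle(numbers)

# A file opened for writing holds an operating system handle
# and cannot be serialized.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
f = open(path, "w")
try:
    file_ok = can_pickle(f)
finally:
    f.close()

print(data_ok, file_ok)  # True False
```

A practical consequence is to keep file handles short-lived, for example by opening files in a with block inside a single cell, so that only plain data remains in the interpreter state.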
Change the configuration using a prefix
To switch a cell to the g1.1 configuration using a prefix:
- Select the cell to change the configuration for.
- In the first row of the cell, add the configuration prefix #!g1.1 (8 vCPUs, 1 GPU).
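A cell with the prefix from the steps above might look like the following sketch. The body of the cell is an assumption chosen for illustration: it only reports how many logical CPUs the interpreter can see. Outside DataSphere, the first line is treated as an ordinary Python comment and has no effect.

```python
#!g1.1
# The first line is the configuration prefix; the rest is regular Python.
import os

vcpus = os.cpu_count()  # number of logical CPUs visible to the interpreter
print(f"Logical CPUs visible to this cell: {vcpus}")
```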
If you want to restore the default configuration, remove the prefix or change it to
Change the configuration in the interface
Increase a cell's computing resources in the interface:
- Click the button with the name of the configuration in the menu on the notebook tab.
- Select the desired configuration.
- Wait until the instance is ready status appears on the notebook panel.