Getting started with DataSphere
DataSphere is a service that simplifies the use of the JupyterLab development environment on Yandex.Cloud computing resources. This lets you perform complex calculations, such as training neural networks or analyzing big data, using the familiar Jupyter Notebook interface.
In this section, you'll learn how to:
- Create projects where you'll work in the JupyterLab environment.
- Install required packages.
- Manage computing resources by changing their configurations.
Before you start
- Go to the management console. Then log in to Yandex.Cloud or sign up if you don't have an account.
- On the Billing page, make sure that a billing account is linked and that its status is TRIAL_ACTIVE. If you don't have a billing account, create one.
- On the Access management page, make sure you have the editor role or higher. The role must be assigned for the folder where you'll work or for the cloud that the folder belongs to.
Create a project
To create a project:
- In the management console, open the DataSphere section in the folder where you want to create your project.
- Go to the Projects tab.
- Click Create project.
- Set the project name.
- If necessary, add a description.
- Click Create.
To start working with JupyterLab, open the created project:
- Click the row of the project you need.
- Click the icon next to the project.
It takes 1 to 3 minutes to launch a project.
Popular packages for data analysis and machine learning are pre-installed and ready for use: view the list.
You can install missing packages using the pip package manager.
To install a package:
- Enter the following command in a notebook cell:
%pip install <Package name>
For example, install the requests package to make HTTP requests:
%pip install requests
- Run the cell using the run button on the notebook toolbar.
The package installation result is displayed under the cell.
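To confirm from a later cell that an installation worked, you can query the package metadata. The following is a minimal sketch, not part of DataSphere itself: installed_version is a hypothetical helper name, and it relies only on the standard importlib.metadata module.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing.

    installed_version is a hypothetical helper introduced for illustration.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if requests is installed, otherwise None.
print(installed_version("requests"))
```

If the call returns None, run the %pip install command for that package again and check the output under the cell for errors.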
Increase a cell's computing resources
By default, projects run with the minimal configuration: S (32 GB RAM and 4 vCPUs). You can change the configuration. Changing it preserves the state of the interpreter, so no variables or computation results are lost.
Some variables can't be serialized and therefore can't be saved. For example, a variable that holds a file open for writing:
f = open("file.txt", "w")
When such a variable is assigned, a warning is shown:
The following variables cannot be serialized:
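To get a feel for which variables are likely to trigger this warning, you can try serializing them yourself before changing the configuration. The sketch below uses Python's standard pickle module as a stand-in. This is an assumption: the text does not say which serialization mechanism DataSphere uses, so the results may differ. The helper name find_unserializable is hypothetical.

```python
import os
import pickle
import tempfile


def find_unserializable(variables):
    """Return the names of variables that pickle cannot serialize.

    Assumption: pickle only approximates what DataSphere can save.
    """
    failed = []
    for name, value in variables.items():
        try:
            pickle.dumps(value)
        except Exception:
            failed.append(name)
    return failed


# A file open for writing, as in the example above (created in a temp folder).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
f = open(path, "w")
try:
    candidates = {"numbers": [1, 2, 3], "label": "ok", "f": f}
    problem_names = find_unserializable(candidates)
    print(problem_names)  # → ['f']
finally:
    f.close()
```

Closing such files and keeping only the data you need in plain variables is a simple way to avoid losing state.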
Change the configuration using a prefix
Increase a cell's computing resources using a prefix:
- Select the cell to change the configuration for.
- On the first line of the cell, add the configuration prefix
#!L(8 cores, gpu: 1xV100).
To reduce the configuration, remove the prefix or change it to
Change the configuration in the interface
Increase a cell's computing resources in the interface:
- Click the button with the name of the configuration in the menu on the notebook tab.
- Select the configuration:
- S (4 cores) (default).
- L (8 cores, gpu: 1xV100).
- Wait until the S/L instance is ready status appears on the notebook panel.