Getting started with DataSphere
DataSphere is a service that simplifies the use of the JupyterLab development environment on Yandex.Cloud computing resources. This lets you perform complex calculations, such as training neural networks or analyzing big data, using the familiar Jupyter Notebook interface.
In this section, you'll learn how to:
- Create projects where you'll work in the JupyterLab environment.
- Install required packages.
- Manage computing resources by changing their configurations.
Before you start
- Go to the management console. Then log in to Yandex.Cloud or sign up if you don't have an account yet.
- On the Billing page, make sure that a billing account is linked and that its status is TRIAL_ACTIVE. If you don't have a billing account, create one.
Assign roles for using DataSphere
In the management console, on the Access management page, make sure you have the relevant roles:
- To work with existing projects, you need the datasphere.user role or higher.
- To create, edit, and delete projects, you need the datasphere.admin role or higher.
Learn more about access management.
Create a project
To create a project:
- In the management console, open the DataSphere section in the folder where you want to create your project.
- Go to the Projects tab.
- Click Create project.
- Enter the Name of the project.
- (optional) Enter the Description of the project.
- (optional) Configure Advanced settings.
- Click Create.
To start working with JupyterLab, open the created project:
- Click the row of the project you need.
- Click the icon next to the project.
It takes 1 to 3 minutes to launch a project.
Popular packages for data analysis and machine learning are pre-installed and ready for use: view the list.
You can install missing packages using the pip package manager.
To install a package:
- Enter the following command in a notebook cell:
%pip install <Package name>
For example, install the seaborn package to visualize statistics:
%pip install seaborn
- Run the cell. To do this, click the run icon.
The package installation result is displayed under the cell.
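To confirm that an installation succeeded, or to check whether a package is already available before installing it, you can query the environment from a notebook cell. The sketch below uses only the Python standard library. The helper name installed_version is hypothetical and is not part of DataSphere.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Check the package from the example above; the result depends on your environment.
version = installed_version("seaborn")
if version is None:
    print("seaborn is not installed; run %pip install seaborn in a cell")
else:
    print(f"seaborn {version} is installed")
```

The same check works for any distribution name that you would pass to %pip install.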
Increase computing resources for your cell
By default, projects run with the minimal configuration: c1.4 (32 GB RAM and 4 vCPUs). You can change the configuration. The state of the interpreter is saved when you do, so no variables or computation results are lost.
Some variables can't be serialized and therefore can't be saved. For example, a variable that holds a file open for writing:
f = open("file.txt", "w")
When such a variable is assigned, a warning is shown:
The following variables cannot be serialized:
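To get a rough idea in advance of which variables might trigger this warning, you can test them yourself. The sketch below uses Python's standard pickle module as a stand-in. This is an assumption: the serialization mechanism that DataSphere actually uses is not described here and may accept or reject different objects. The helper name find_unpicklable is hypothetical.

```python
import os
import pickle
import tempfile


def find_unpicklable(namespace):
    """Return the names in `namespace` whose values pickle cannot serialize."""
    problem_names = []
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception:
            # pickle raises TypeError or PicklingError for unsupported objects.
            problem_names.append(name)
    return problem_names


# Example namespace: one plain value and one file handle open for writing.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
f = open(path, "w")
example = {"threshold": 0.5, "f": f}
print(find_unpicklable(example))  # prints ['f']
f.close()
```

Opening files inside a with block, so that no open handle stays in a variable between cells, is one way to keep such objects out of the saved state.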
Change the configuration using a prefix
Increase a cell's computing resources to the configuration g1.1 using a prefix:
- Select the cell to change the configuration for.
- In the first row of the cell, add the configuration prefix #!g1.1 (8 vCPUs, 1 GPU).
To restore the default configuration, remove the prefix or change it to the prefix of the default configuration.
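A cell that uses the prefix might look like the sketch below. Because the prefix starts with #, Python treats it as a comment, so the rest of the cell is ordinary code. The resource check is only an illustration, and the helper name describe_resources is hypothetical.

```python
#!g1.1
# The line above is the configuration prefix; it must be the first row of the cell.
import os


def describe_resources():
    """Report the CPU count that the Python process sees (illustrative only)."""
    return {"cpu_count": os.cpu_count()}


print(describe_resources())
```

Running the same cell with and without the prefix is a simple way to see whether the configuration change took effect. The value printed depends on the environment.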
Change the configuration in the interface
Increase a cell's computing resources in the interface:
- Click the button with the name of the configuration in the menu on the notebook tab.
- Select the desired configuration.
- Wait until the instance is ready status appears on the notebook panel.
Contact support in the service
To contact technical support in the service:
- Click the icon in the lower-right corner of the notebook window or select Report a bug in the Help menu.
- In the window that opens, describe your problem in the Bug and Give us more detail fields.
- Click Report a bug.
You'll receive your request number by email.