Getting started with DataSphere

Updated on April 4, 2024

Getting started
Create a project
Run the project
Configure the environment
Upload data to the project
Start training
What's next

Yandex DataSphere is an end-to-end ML development environment that combines familiar IDEs, serverless computing technology, and a broad range of Yandex Cloud computing resource configurations. DataSphere is part of the data platform and offers features for working with other Yandex Cloud services. As an IDE, DataSphere provides Jupyter® Notebook.

In this section, you will learn how to:

Create projects.
Run projects.
Configure the environment.
Upload data to projects.
Start training.
Share your results.

Getting started

Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
Go to Yandex Cloud Billing and make sure that a billing account is linked and that its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.
Open the DataSphere home page.
Accept the user agreement.
Select the organization in which you will work with DataSphere, or create a new one.

Create a project

Open the DataSphere home page.
In the left-hand panel, select Communities.
Select the community in which to create the project.
On the community page, click Create project.
In the window that opens, enter a name and description (optional) for the project.
Click Create.

Run the project

To run a project, click Open project in JupyterLab.

Configure the environment

Popular packages for data analysis and machine learning are pre-installed and ready to use. See the list of pre-installed packages.

You can install missing packages using the pip package manager.

To install a package:

Enter the following command in a notebook cell:
```
%pip install <package_name>
```
For example, install the seaborn package to visualize statistics:
```
%pip install seaborn
```
You can use any of the options that the pip install command supports. See usage examples for this command.
Run the cell. To do this, click the run icon in the notebook toolbar or press Shift + Enter.

The package installation result is displayed under the cell.
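Before installing, it can be useful to check whether a package is already available in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of DataSphere.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing.

    `installed_version` is an illustrative helper, not a DataSphere API.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Example: report whether seaborn needs to be installed.
version = installed_version("seaborn")
if version is None:
    print("seaborn is not installed")
else:
    print(f"seaborn {version} is already installed")
```

If the helper reports that the package is missing, run `%pip install <package_name>` in a separate cell as described above.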

You can also configure the environment to run your code using Docker images.

Upload data to the project

You can upload small amounts of data (up to 100 MB) to your DataSphere project through the JupyterLab interface. To upload larger amounts of data, use network storage or databases. For larger data volumes, datasets are also a convenient option.

To upload data to your project through the JupyterLab interface:

Under the File Browser section, select the directory to upload the data to.
Click the upload icon at the top left.
Select the files to upload.
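To decide in advance whether a local file is small enough for the JupyterLab upload path, you could check its size first. The sketch below is an illustration only: the function name `fits_upload_limit` is hypothetical, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes, which the source does not specify.

```python
import os

# Assumption: 100 MB is interpreted as 100 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=UPLOAD_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes
```

For example, `fits_upload_limit("train.csv")` (a hypothetical file name) returns `False` for a file above the limit, in which case network storage, a database, or a dataset is the better choice.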

Learn more about project storage.

DataSphere lets you upload data from different sources.

Start training

To start computations:

Under the File Browser section, select the notebook with the Python or bash code.
Select and run one or more cells with the code by choosing Run → Run Selected Cells, or pressing Shift + Enter.
Wait for the operation to complete.

The execution result is displayed under the cell.
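As an illustration of what a training cell might contain, here is a toy example that fits a straight line to a few points using ordinary least squares. The data and the `fit_line` function are made up for this sketch and use only the Python standard library; a real project would typically rely on one of the pre-installed ML packages.

```python
# Toy training data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit y = w * x + b by ordinary least squares and return (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


w, b = fit_line(xs, ys)
print(f"w={w:.2f}, b={b:.2f}")
```

Running the cell with Shift + Enter prints the fitted parameters under the cell.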
