Getting started with DataSphere
Yandex DataSphere is an end-to-end ML development environment where you can use familiar IDEs and serverless computing technology, and seamlessly combine a broad range of Yandex Cloud computing resource configurations. DataSphere is part of the data platform and offers features for interacting with other Yandex Cloud services. As an IDE, DataSphere provides Jupyter® Notebook.
In this section, you'll learn how to:
- Create projects.
- Run projects.
- Configure the environment.
- Upload data to projects.
- Start training.
- Share your results.
Before you begin
- Go to the management console and log in to Yandex Cloud or register if you don't have an account yet.
- On the billing page, make sure you have linked a billing account and that it has the TRIAL_ACTIVE status. If you don't have a billing account, create one.
- Open the DataSphere homepage.
- Accept the user agreement.
- Select the organization in which you will work with DataSphere, or create a new one.
Create a project
To create a project in DataSphere:
Open the DataSphere homepage.
Go to the Communities tab and select the community to create a project in.
On the community page, click Create project.
In the Create project window:
- Enter the name of the project.
- (Optional) Enter a description of the project.
- Select a community for the project.
Run the project
To run a project, click Open project in JupyterLab.
Configure the environment
Popular packages for data analysis and machine learning are pre-installed and ready for use (see the list).
You can install missing packages using the pip package manager.
To install a package:
Write the following command in the notebook cell:
%pip install <Package name>
For example, install the seaborn package to visualize statistics:
%pip install seaborn
Run the cell. To do this, choose Run → Run Selected Cells or press Shift + Enter.
The package installation result is displayed under the cell.
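To confirm that a package is available before or after running %pip install, you can query the installed version from a notebook cell. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of DataSphere.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not a DataSphere API.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# seaborn is the example package used in this guide.
found = installed_version("seaborn")
if found is None:
    print("seaborn is not installed; run: %pip install seaborn")
else:
    print(f"seaborn {found} is installed")
```

If the cell reports that the package is missing, install it with %pip install and run the check again.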
You can also configure the environment to run your code using Docker images.
Upload data to the project
You can upload small amounts of data (up to 100 MB) to your DataSphere project through the JupyterLab interface. To upload larger amounts of data, use your network storage or databases. For larger data volumes, datasets are also convenient.
To upload data to your project over the JupyterLab interface:
- Under the File Browser section, select the directory to upload the data to.
- Click the upload button at the top left.
- Select the files to upload.
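Before uploading through JupyterLab, it can help to check locally that a file fits within the 100 MB limit mentioned above. The sketch below is one way to do that; the function name fits_upload_limit is hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes, which this guide does not specify.

```python
import os

UPLOAD_LIMIT_MB = 100  # limit stated in this guide for JupyterLab uploads


def fits_upload_limit(path, limit_mb=UPLOAD_LIMIT_MB):
    """Return True if the file at `path` is no larger than `limit_mb` megabytes.

    Hypothetical helper for illustration.
    Assumption: 1 MB is treated as 1024 * 1024 bytes.
    """
    size_bytes = os.path.getsize(path)
    return size_bytes <= limit_mb * 1024 * 1024
```

For example, calling fits_upload_limit with the path to a local file returns False when the file exceeds the limit, in which case network storage, a database, or a dataset is the better route.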
DataSphere lets you upload data from different sources:
- Connecting to S3 storage.
- Connecting to Google Drive.
- Connecting to a ClickHouse database.
- Connecting to a PostgreSQL database.
- Connecting to Yandex Disk.
Start training
To start computations:
Under the File Browser section, select the notebook with the Python or bash code.
Select and run one or more cells with the code by choosing Run → Run Selected Cells, or pressing Shift + Enter.
Wait for the operation to complete.
The execution result is displayed under the cell.
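If you do not have a notebook with code yet, a small self-contained cell is enough to try the steps above. The following sketch fits a straight line to toy data by ordinary least squares using only the standard library; the function name fit_line and the data are illustrative and are not taken from DataSphere.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data for illustration only.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"slope={slope}, intercept={intercept}")
```

Paste the code into a cell, select it, and press Shift + Enter; the printed values appear under the cell.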