Launching distributed training
The distributed training feature is at the Preview stage. To access the feature, contact support.
Distributed training supports PyTorch and PyTorch Lightning. By default, PyTorch version 1.6.0 is installed in DataSphere. Update it to version 1.9.1 so that TaaS runs correctly:
%pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
At the Preview stage, distributed training is only available on g2.8 VM instances.
TaaS on multiple GPUs
Prepare the training code and define the DataLoader for PyTorch.
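For example, a minimal sketch of a DataLoader prepared for multi-GPU training (the dataset, batch size, and use of DistributedSampler are illustrative assumptions here, not TaaS requirements):

import os
from torch.utils.data import DataLoader, DistributedSampler

# `train_dataset` is a placeholder for your own Dataset.
# RANK and WORLD_SIZE are the standard torch.distributed environment variables.
sampler = DistributedSampler(
    train_dataset,
    num_replicas=int(os.environ.get("WORLD_SIZE", "1")),
    rank=int(os.environ.get("RANK", "0")),
)
data_loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)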
If you are using PyTorch, initialize distributed training based on the environment variables:
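For example, a minimal sketch assuming the standard torch.distributed environment variables (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set for each process:

import os
import torch
import torch.distributed as dist

# Initialize the process group from the environment variables.
dist.init_process_group(backend="nccl", init_method="env://")

# Bind this process to its GPU; the LOCAL_RANK variable name is an assumption.
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))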
If you are using PyTorch Lightning, skip this step: you don't need any additional initialization.
In a separate cell, call the #pragma taas command and specify the number of GPUs to distribute training across.
#!g2.8
#pragma taas --gpus 8
<start training>
When you run training using multiple processes, only the process with the
RANK=0 environment variable will be able to write to the DataSphere project repository. Keep this in mind when saving the model during your training process.
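For example, a sketch of guarding the checkpoint save (the `model` variable and file name are placeholders):

import os
import torch

# Only the process with RANK=0 can write to the project repository,
# so save the model from that process only.
if int(os.environ.get("RANK", "0")) == 0:
    torch.save(model.state_dict(), "model.pt")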
TaaS with distributed data delivery
Besides distributed training across multiple GPUs, TaaS provides an option to optimize data loading and prepare the data for training. This may be useful if you store large amounts of data in cloud storage whose access speed is substantially lower than the speed at which you process the data.
Define the PyTorch DataLoader in a separate cell and register it.
import taas
from torch.utils.data import DataLoader

data_loader = DataLoader(dataset)  # `dataset` is your training Dataset
taas.register(data_loader)
The registered DataLoader will be launched on multiple c1.4 VM instances to prepare the data before training is started on costly GPU resources. After preparation, the data is delivered to the GPU-enabled VM, and loading can now continue in parallel with calculations and training.
To cancel registration, call:
In a separate cell, call the #pragma taas command, specifying the resources for data delivery:

#!g2.8
#pragma taas --gpus 8 --cpus 1 --units 20480000
<start training>

Where:
- gpus: The number of GPUs. The parameter can take the following values:
  - 8: One VM
  - 16: Two VMs
  - 32: Four VMs
- cpus: The number of c1.4 VMs on which each registered data loader will be launched. The parameter value ranges from 1 to 8.
- units: The number of elements to be extracted from the data loader.