© 2022 Yandex.Cloud LLC

Integration with version and data control systems

Written by
Yandex Cloud

DataSphere is integrated with the Git version control system and the DVC data version control system.

Integration with the DVC data version control system

To work with the DVC system, use the following commands:

  • %dvc_init: Initialize a DVC project in the current directory.

    Description of command parameters.
    • -f, --force: Deletes an existing internal DVC directory. This clears the entire local cache.
    • --subdir: Initializes a DVC project in the working directory even if it isn't the root of the Git repository. This parameter is ignored when running in the root directory of a DVC project.
    • --no-scm: Initializes a DVC project separately from Git. This means that DVC doesn't try to find or use Git in the target directory. Some DVC functions are not available in this mode.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if no problems arise, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
  • %dvc_add: Add files or directories to track in DVC.

    Description of command parameters.
    • -R, --recursive: Defines files to add by searching for data files in each target directory and its subdirectories. If there are no directories among the targets, this parameter is ignored.
    • --no-commit: Doesn't save files to the cache. A DVC file is still created and an entry is added to .dvc/state.
    • --file <filename>: Specifies the name of the generated DVC file. The default name is <target>.dvc, where <target> is the name of the file to add.
    • --external: Allows adding files and folders that are outside of the DVC repository.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if no problems arise, otherwise 1.
  • %dvc_remove: Stop tracking files or directories in DVC.

    Description of command parameters.
    • --outs: Also deletes the outputs generated by all stages of the targets. Always applied by default for DVC files.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if no problems arise, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
  • %dvc_status: Show changes in the project pipelines and file mismatches either between the cache and workspace, or between the cache and remote storage.

    Description of command parameters.
    • -c, --cloud: Compares the cache against remote storage.
    • -a, --all-branches: Compares cache content against all Git branches instead of just the workspace. Applies only if the --cloud or -r parameter is specified.
    • -T, --all-tags: The same as -a, but applies to Git tags and the workspace. Can be combined with the -a parameter, for example, using the -aT flag.
    • -R, --recursive: Defines files for status checks in each target directory and its subdirectories. A search is made in dvc.yaml and DVC files.
    • --show-json: Prints the output in JSON format instead of a table.
    • --all-commits: The same as -a or -T, but applies to all Git commits and the workspace. Used for comparing cache content for the entire existing history of the project.
    • -d, --with-deps: Defines files to check by tracking dependencies to the targets. If no targets are specified, this parameter is ignored.
    • -r <name>, --remote <name>: Specifies which remote storage to compare against.
    • -j <number>, --jobs <number>: Specifies the number of jobs that DVC can use to retrieve information from remote servers. Only applies if the --cloud parameter is used or a remote storage is specified with -r.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if data is up-to-date, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
  • %dvc_checkout: Update files and directories in the workspace based on current DVC files.

    Description of command parameters.
    • --summary: Displays a summary of changes made by the current command in the workspace.
    • -R, --recursive: Defines files to update by searching each target directory and its subdirectories for DVC files to check. If there are no directories among the targets, this parameter is ignored.
    • -d, --with-deps: Defines files to update by tracking dependencies to the target DVC files. If no targets are specified, the parameter is ignored.
    • -f, --force: Discards unsaved changes in the workspace.
    • --relink: Ensures that the file linking strategy (reflink, hardlink, symlink, or copy) for all data in the workspace is consistent with the project's cache.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if no problems arise, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
  • %dvc_pull: Download tracked files or directories from remote storage to the cache and workspace.

    Description of command parameters.
    • -a, --all-branches: Defines files to download by checking dvc.yaml and .dvc files in all Git branches instead of just those present in the current workspace.
    • -T, --all-tags: The same as -a, but applies to Git tags and the workspace. Can be combined with the -a parameter, for example, using the -aT flag.
    • --all-commits: The same as -a or -T, but applies to all Git commits and the workspace. Used for downloading all the data for the entire existing history of the project.
    • -d, --with-deps: Defines files to download by tracking dependencies to the targets. If no targets are specified, the parameter is ignored.
    • -R, --recursive: Defines files to download by searching each target directory and its subdirectories for dvc.yaml and .dvc files to check. If there are no directories among the targets, this parameter is ignored.
    • -f, --force: Doesn't prompt when removing workspace files that no longer match the current stages or DVC files.
    • -r <name>, --remote <name>: Sets the name of the remote storage to pull from.
    • --run-cache: Downloads all available history of stage runs from the remote repository into the local run cache.
    • -j <number>, --jobs <number>: Specifies the number of parallel jobs to download files from remote storage.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if data is up-to-date, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
  • %dvc_push: Upload tracked files or directories to remote storage.

    Description of command parameters.
    • -a, --all-branches: Defines files to upload by checking dvc.yaml and .dvc files in all Git branches instead of just those present in the current workspace.
    • -T, --all-tags: The same as -a, but applies to Git tags and the workspace. Can be combined with the -a parameter, for example, using the -aT flag.
    • --all-commits: The same as -a or -T, but applies to all Git commits and the workspace. Used for uploading all the data for the entire existing history of the project.
    • -d, --with-deps: Defines files to upload by tracking dependencies to the targets. If no targets are specified, the parameter is ignored.
    • -R, --recursive: Defines files to upload by searching each target directory and its subdirectories for dvc.yaml and .dvc files to check. If there are no directories among the targets, this parameter is ignored.
    • -r <name>, --remote <name>: Sets the name of the remote storage to push to.
    • --run-cache: Uploads all available history of stage runs to remote storage.
    • -j <number>, --jobs <number>: Specifies the number of parallel jobs to upload files to remote storage.
    • -h, --help: Shows Help.
    • -q, --quiet: Suppresses standard output. The command returns exit code 0 if data is up-to-date, otherwise 1.
    • -v, --verbose: Displays detailed tracing information.
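
The commands above can be combined into a typical sequence: initialize a project, track a file, compare the cache with remote storage, and upload the data. The sketch below is only an illustration under stated assumptions. The helper dvc_magic, the file name data/train.csv, and the remote name my-remote are hypothetical, and the sketch assumes that the commands accept flags followed by targets in the same way as the dvc CLI, which this article doesn't state explicitly. The code doesn't run DVC. It only assembles command lines as text.

```python
# Hypothetical helper: builds the text of a DVC command for a notebook cell.
# It does not run DVC; it only assembles the command line.

# Commands documented in this article.
DVC_COMMANDS = ("init", "add", "remove", "status", "checkout", "pull", "push")


def dvc_magic(command, *targets, flags=()):
    """Return a line such as '%dvc_add --no-commit data/train.csv'."""
    if command not in DVC_COMMANDS:
        raise ValueError(f"not a documented DVC command: {command!r}")
    # Assumption: flags come first, then targets, as in the dvc CLI.
    return " ".join([f"%dvc_{command}", *flags, *targets])


# Initialize, track a file, compare with remote storage, upload.
workflow = [
    dvc_magic("init"),
    dvc_magic("add", "data/train.csv"),                         # hypothetical file
    dvc_magic("status", flags=["--cloud", "-r", "my-remote"]),  # hypothetical remote
    dvc_magic("push", flags=["-r", "my-remote", "-j", "4"]),
]

for line in workflow:
    print(line)
```

Each printed line can then be pasted into its own notebook cell and run in order. Restricting the helper to the documented command names makes a mistyped command fail early instead of producing a line that the notebook would reject.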

Integration with the Git version control system

To use Git, click Git in the project window. You can clone an existing repository or initialize a new one, and then use all the common Git operations. For more information, see Working with Git.
