
Getting started with Vision

  • Before getting started
  • Analyze the image

This section describes how to get started with the Vision API for image analysis.

Before getting started

To use the examples, install cURL and get the authorization data for your account:

User's account on Yandex:

  1. On the billing page, make sure that your billing account status is ACTIVE or TRIAL_ACTIVE. If you don't have a billing account, create one.
  2. Get an IAM token required for authentication.
  3. Get the ID of any folder for which your account has the editor role or higher.

Service accounts:

  1. Select the authentication method:

    • Get an IAM token used in the examples.

    • Create an API key. Pass the API key in the Authorization header in the following format:

      Authorization: Api-Key <API key>

  2. Assign the editor role or a higher role to the service account for the folder where it was created.

    Don't specify the folder ID in your requests: the service uses the folder where the service account was created.

Federated account:

  1. Authenticate with the CLI as a federated user.

  2. Use the CLI to get an IAM token required for authentication:

    $ yc iam create-token

  3. Get the ID of any folder for which your account has the editor role or higher.
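
The two header formats used on this page (Bearer with an IAM token, Api-Key with an API key) can be captured in a small helper. This is only a sketch: the function name auth_header and its parameters are hypothetical and are not part of any Yandex SDK.

```python
def auth_header(iam_token=None, api_key=None):
    """Build the Authorization header for exactly one credential type.

    Hypothetical helper: the header formats come from this page,
    the function itself is an illustration.
    """
    # Exactly one of the two credentials must be supplied.
    if (iam_token is None) == (api_key is None):
        raise ValueError("pass exactly one of iam_token or api_key")
    if iam_token is not None:
        # IAM token: "Authorization: Bearer <IAM token>"
        return {"Authorization": f"Bearer {iam_token}"}
    # API key (service accounts): "Authorization: Api-Key <API key>"
    return {"Authorization": f"Api-Key {api_key}"}
```

Rejecting calls with both or neither credential makes a misconfigured client fail early instead of sending a request that the service would refuse.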

Analyze the image

You can use any of the analysis features from the list. For example, use the Text recognition feature:

  1. Prepare an image file that meets the requirements:

    • Supported file formats: JPEG, PNG, PDF.

      You specify the MIME type of the file in the mime_type property. The default is image.

    • Maximum file size: 1 MB.

    • Maximum image size: 20 MP (height × width).

    Note

    Need a sample image? Download an image of the penguin crossing road sign.
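
Before encoding, it may help to check a file against the format and size requirements above. The sketch below is illustrative only: check_image_file is a hypothetical helper, it judges the format by file extension alone, it reads "1 MB" as 1,048,576 bytes (an assumption), and it does not check the 20 MP limit, which would require decoding the image.

```python
import os

# Assumption: "1 MB" is interpreted as 1 MiB; the page does not say which.
MAX_BYTES = 1 * 1024 * 1024
# Extensions commonly used for the supported formats (JPEG, PNG, PDF).
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}


def check_image_file(path):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    return problems
```

For example, check_image_file("input.jpg") returns an empty list when the file has an accepted extension and is no larger than the limit.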

  2. Encode the file as Base64:

    UNIX:

    $ base64 -i input.jpg > output.txt

    Windows:

    C:> Base64.exe -e input.jpg > output.txt

    PowerShell:

    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt

    Python:

    # Import the library for Base64 encoding.
    import base64

    # Define a function that encodes a file and returns the result.
    def encode_file(file):
      file_content = file.read()
      return base64.b64encode(file_content)

    Node.js:

    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents in Base64 format.
    var encoded = Buffer.from(file).toString('base64');

    Java:

    // Import the library for Base64 encoding.
    import org.apache.commons.codec.binary.Base64;

    // Get the file contents in Base64 format.
    byte[] fileData = Base64.encodeBase64(yourFile.getBytes());

    Go:

    import (
        "bufio"
        "encoding/base64"
        "io/ioutil"
        "os"
    )

    // Open the file.
    f, _ := os.Open("/path/to/file")

    // Read the file contents.
    reader := bufio.NewReader(f)
    content, _ := ioutil.ReadAll(reader)

    // Get the file contents in Base64 format.
    base64.StdEncoding.EncodeToString(content)
    
  3. Create a file with the request body (for example, body.json). In the content property, specify the Base64-encoded image.

    To have the service detect the text language automatically, specify the "language_codes": ["*"] property in the configuration.

    body.json:

    {
        "folderId": "b1gvmob95yysaplct532",
        "analyze_specs": [{
            "content": "iVBORw0KGgo...",
            "features": [{
                "type": "TEXT_DETECTION",
                "text_detection_config": {
                    "language_codes": ["*"]
                }
            }]
        }]
    }
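
Instead of pasting the Base64 string by hand, the request body can be generated. The following is a minimal sketch that mirrors the body.json structure above; build_request_body and write_body_file are hypothetical names, and folder_id is optional because requests made as a service account should not specify a folder ID.

```python
import base64
import json


def build_request_body(image_bytes, folder_id=None):
    """Return a dict with the same structure as the body.json example."""
    body = {
        "analyze_specs": [{
            # Base64 text, decoded to str so that it is JSON-serializable.
            "content": base64.b64encode(image_bytes).decode("ascii"),
            "features": [{
                "type": "TEXT_DETECTION",
                # ["*"] asks the service to detect the language automatically.
                "text_detection_config": {"language_codes": ["*"]},
            }],
        }]
    }
    if folder_id is not None:
        body["folderId"] = folder_id
    return body


def write_body_file(image_path, body_path, folder_id=None):
    """Read an image file and write the request body as JSON."""
    with open(image_path, "rb") as image_file:
        body = build_request_body(image_file.read(), folder_id)
    with open(body_path, "w", encoding="utf-8") as body_file:
        json.dump(body, body_file)
```

Calling write_body_file("input.jpg", "body.json", folder_id="<folder ID>") would produce a file that can be passed to cURL with -d '@body.json'.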
    
  4. Send a request using the batchAnalyze method and save the response in a file, such as output.json:

    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d '@body.json' \
        https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json
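
The same request could also be sent from Python with only the standard library. This is a sketch rather than an official client: make_request and send are hypothetical names, and only the URL and headers are taken from the cURL command above.

```python
import json
import urllib.request

# Endpoint taken from the cURL example on this page.
VISION_URL = "https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze"


def make_request(body, iam_token):
    """Build (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        VISION_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {iam_token}",
        },
    )


def send(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the headers and payload without network access. urllib.request.urlopen raises urllib.error.HTTPError for error status codes, so a caller would normally wrap send in a try block.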
    

    The response contains the recognized blocks of text, lines, and words, each with its position in the image:

    {
      "results": [{
        "results": [{
          "textDetection": {
            "pages": [{
              "blocks": [{
                "boundingBox": { ... },
                "lines": [{
                    "boundingBox": { ... },
                    "words": [{
                      "boundingBox": {
                        "vertices": [{
                            "x": "504",
                            "y": "760"
                          },
                          {
                            "x": "504",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "760"
                          }
                        ]
                      },
                      "languages": [{
                        "languageCode": "en",
                        "confidence": 0.9520227313
                      }],
                      "text": "PENGUINS",
                      "confidence": 0.9520227313
                    }],
                    "confidence": 0.9520227313
                  },
                  ...
                ]
              }],
              "width": "1920",
              "height": "1280"
            }]
          }
        }]
      }]
    }
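
Note that the vertex coordinates in the response are JSON strings, not numbers. The sketch below converts a boundingBox into a rectangle; bounding_rect is a hypothetical helper, and treating a missing coordinate as 0 is an assumption about how the service serializes zero values.

```python
def bounding_rect(bounding_box):
    """Return (left, top, width, height) for a boundingBox object."""
    vertices = bounding_box["vertices"]
    # Coordinates arrive as strings such as "504", so convert them to int.
    # Assumption: a coordinate absent from the JSON means 0.
    xs = [int(vertex.get("x", 0)) for vertex in vertices]
    ys = [int(vertex.get("y", 0)) for vertex in vertices]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top
```

For the word PENGUINS in the sample response, this gives (504, 760, 322, 76): the word starts at x = 504, y = 760 and is 322 pixels wide and 76 pixels high.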
    
  5. To get all the recognized words from the image, find all the lines that contain the text property, for example with the grep utility:

    $ grep -o "\"text\":\s\".*\"" output.json
    "text": "PENGUINS"
    "text": "CROSSING"
    "text": "SLOW"
    
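
The grep approach depends on how the JSON happens to be formatted. As an alternative, the words can be collected by walking the response structure shown above. This is a minimal sketch; iter_words is a hypothetical helper that follows the field names in the sample response.

```python
import json


def iter_words(response):
    """Yield the text of every recognized word, in response order."""
    # Outer "results": one entry per analyzed image.
    for image_result in response.get("results", []):
        # Inner "results": one entry per requested feature.
        for feature_result in image_result.get("results", []):
            text_detection = feature_result.get("textDetection", {})
            for page in text_detection.get("pages", []):
                for block in page.get("blocks", []):
                    for line in block.get("lines", []):
                        for word in line.get("words", []):
                            if "text" in word:
                                yield word["text"]


def words_from_file(path):
    """Load a saved response (for example, output.json) and list its words."""
    with open(path, encoding="utf-8") as response_file:
        return list(iter_words(json.load(response_file)))
```

Entries without a textDetection field are skipped, so the generator yields nothing for them instead of raising an exception.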

What's next

  • Find out about all the service features
  • Learn about API authentication methods
  • Detect faces in a photo
  • Assess the quality of an image
  • Moderate images
© 2021 Yandex.Cloud LLC