Getting started with Vision

This section describes how to get started with the Vision API for image analysis.

Before getting started

To try the examples in this section:

  1. On the billing page, make sure that the payment account has the ACTIVE or TRIAL_ACTIVE status. If you don't have a payment account, create one.
  2. Make sure you have installed the cURL utility that is used in the examples.
  3. Get the ID of any folder for which your account has the editor role or higher.
  4. Get an IAM token for your Yandex account.

To perform these operations on behalf of a service account:

  1. Assign the editor role or a higher role to the service account for the folder where it was created.
  2. Do not specify the folder ID in the request: the service uses the folder where the service account was created.
  3. Choose an authentication method: an IAM token or an API key.
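
Whichever method you choose, the credential is sent in the Authorization header. The following is a minimal Python sketch, not an official client. It assumes the Bearer scheme used by the cURL example later in this section for IAM tokens, and the Api-Key scheme for API keys (an assumption that is not shown in this section). The function name auth_header and the placeholder values are hypothetical.

```python
def auth_header(iam_token=None, api_key=None):
    """Return the Authorization header for exactly one credential type."""
    if (iam_token is None) == (api_key is None):
        raise ValueError("pass exactly one of iam_token or api_key")
    if iam_token is not None:
        # Same scheme as the cURL example: "Authorization: Bearer <IAM token>".
        return {"Authorization": f"Bearer {iam_token}"}
    # Assumed scheme for API keys: "Authorization: Api-Key <API key>".
    return {"Authorization": f"Api-Key {api_key}"}


if __name__ == "__main__":
    # Placeholder credentials; substitute real values before sending a request.
    print(auth_header(iam_token="<IAM token>"))
    print(auth_header(api_key="<API key>"))
```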

Analyze the image

You can use any of the analysis features from the list. For example, use the Text recognition feature:

  1. Prepare an image file that meets the requirements:

    • Supported file formats: JPEG, PNG, PDF.

      The MIME type of the file is specified in the mime_type property. The default is image.

    • Maximum file size: 1 MB.

    • Image size must not exceed 20 MP (height × width).

    Note

    Need a sample image? Download an image of the penguin crossing road sign.

  2. Encode the file as Base64:

    Unix shell:

    $ base64 -i input.jpg > output.txt

    Windows (CMD):

    C:> Base64.exe -e input.jpg > output.txt

    PowerShell:

    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt
    
    Python:

    # Import the library for Base64 encoding.
    import base64

    # Create a function that encodes a file and returns the result.
    # The file must be opened in binary mode, for example open(path, 'rb').
    def encode_file(file):
        file_content = file.read()
        return base64.b64encode(file_content)
    
    Node.js:

    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents in Base64 format.
    var encoded = Buffer.from(file).toString('base64');
    
    Java:

    // Import the library for Base64 encoding.
    import org.apache.commons.codec.binary.Base64;

    // Get the file contents in Base64 format.
    byte[] fileData = Base64.encodeBase64(yourFile.getBytes());
    
    Go:

    package main

    import (
        "encoding/base64"
        "fmt"
        "log"
        "os"
    )

    func main() {
        // Read the file contents into memory (os.ReadFile requires Go 1.16 or later).
        content, err := os.ReadFile("/path/to/file")
        if err != nil {
            log.Fatal(err)
        }

        // Get the file contents in Base64 format and print them.
        fmt.Println(base64.StdEncoding.EncodeToString(content))
    }
    
  3. Create a file with the request body (for example, body.json). In the content property, specify the Base64-encoded image.

    For the service to automatically detect the text language, specify the "language_codes": ["*"] property in the configuration.

    body.json:

    {
        "folderId": "b1gvmob95yysaplct532",
        "analyze_specs": [{
            "content": "iVBORw0KGgo...",
            "features": [{
                "type": "TEXT_DETECTION",
                "text_detection_config": {
                    "language_codes": ["*"]
                }
            }]
        }]
    }
    
  4. Send a request using the batchAnalyze method and save the response in a file, such as output.json:

    $ export IAM_TOKEN=CggaATEVAgA...
    $ curl -X POST \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${IAM_TOKEN}" \
        -d @body.json \
        https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json
    

    The response contains the recognized blocks of text, lines, and words, each with its position in the image:

    {
      "results": [{
        "results": [{
          "textDetection": {
            "pages": [{
              "blocks": [{
                "boundingBox": { ... },
                "lines": [{
                    "boundingBox": { ... },
                    "words": [{
                      "boundingBox": {
                        "vertices": [{
                            "x": "504",
                            "y": "760"
                          },
                          {
                            "x": "504",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "836"
                          },
                          {
                            "x": "826",
                            "y": "760"
                          }
                        ]
                      },
                      "languages": [{
                        "languageCode": "en",
                        "confidence": 0.9520227313
                      }],
                      "text": "PENGUINS",
                      "confidence": 0.9520227313
                    }],
                    "confidence": 0.9520227313
                  },
                  ...
                ]
              }],
              "width": "1920",
              "height": "1280"
            }]
          }
        }]
      }]
    }
    
  5. To get all the recognized words from the image, find every line that contains the text property, for example with the grep utility:

    $ grep -o "\"text\":\s\".*\"" output.json
    "text": "PENGUINS"
    "text": "CROSSING"
    "text": "SLOW"
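
The grep pattern above depends on how the JSON is split into lines. As a hedged alternative, the following Python sketch parses the JSON and collects every string stored under a text key, at any depth. The function name iter_text is hypothetical, and the sample value is a trimmed copy of the response shown in step 4, not real service output.

```python
import json


def iter_text(node):
    """Yield every string stored under a "text" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                yield value
            else:
                yield from iter_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text(item)


# Trimmed copy of the response structure from step 4 (illustrative only).
sample = json.loads("""
{"results": [{"results": [{"textDetection": {"pages": [{"blocks": [{
    "lines": [{"words": [{"text": "PENGUINS", "confidence": 0.9520227313}]}]
}]}]}}]}]}
""")

print(list(iter_text(sample)))  # ['PENGUINS']
```

To run it against a saved response, load the file with json.load and pass the result to iter_text in place of sample.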
    
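Steps 1 to 3 can also be scripted. The following is a minimal Python sketch under stated assumptions: it treats the 1 MB limit as 1,048,576 bytes, judges the format by file extension only, and does not check the 20 MP limit, because that would require decoding the image. The names build_body and write_body are hypothetical. The request fields mirror the body.json example in step 3.

```python
import base64
import json
from pathlib import Path

MAX_FILE_SIZE = 1024 * 1024  # assumption: "1 MB" means 1,048,576 bytes
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}  # JPEG, PNG, PDF


def build_body(image_path, folder_id=None):
    """Return the request body for one image as a dict."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file format: {path.suffix!r}")
    data = path.read_bytes()
    if len(data) > MAX_FILE_SIZE:
        raise ValueError(f"file is {len(data)} bytes; the limit is {MAX_FILE_SIZE}")

    body = {
        "analyze_specs": [{
            # b64encode returns bytes; decode to str so json can serialize it.
            "content": base64.b64encode(data).decode("ascii"),
            "features": [{
                "type": "TEXT_DETECTION",
                "text_detection_config": {"language_codes": ["*"]},
            }],
        }],
    }
    if folder_id is not None:
        # Leave folderId out when the request is made as a service account.
        body["folderId"] = folder_id
    return body


def write_body(image_path, out_path="body.json", folder_id=None):
    """Write the request body to a JSON file that cURL can send with -d @file."""
    text = json.dumps(build_body(image_path, folder_id), indent=4)
    Path(out_path).write_text(text, encoding="utf-8")
```

For example, write_body("input.jpg", folder_id="<folder ID>") would produce a body.json that can be sent with the cURL command in step 4.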

What's next