Background

During the COVID-19 pandemic, doctors were faced with a huge number of CT lung scans in need of analysis and interpretation. In response, the Moscow healthcare system decided to include CT lung scans in an experiment in 2020 involving the use of innovative computer vision technology for the analysis of medical imaging.

A service developed by CVisionLab on the Yandex Cloud platform fully processes 90% of scans — from a medical facility’s incoming request to the results uploaded in response — in 2.5 minutes. During the third wave of the pandemic, the service was able to successfully handle a load of 1,000 CT scans per day, which is roughly 200 GB of data processed.

Developing a project in uncertain conditions

The use of CT scans in the diagnosis of coronavirus infections makes it possible to identify complications at an early stage and start a patient’s treatment or hospitalization as soon as possible. Understandably, doctors have needed to analyze a huge number of them throughout the COVID-19 pandemic. As a doctor’s workload increases, however, with more and more scans to be processed manually, so does the risk that the doctor’s attention will suffer and errors will occur.

Enter CVisionLab, a developer of creative R+D solutions in the fields of computer vision and AI. CVL decided to participate in the development of a new service to use AI to analyze CT scans and offer a preliminary conclusion, significantly easing the burden on already busy radiologists.

Development of the imaging analysis system began just as the epidemiological situation started to worsen, bringing with it constant changes to radiologists' work processes. These changes led to ever stricter requirements for the services being developed. For example, the maximum time permitted for processing one scan was reduced from 15 to 6.5 minutes, while the number of CT scans to be processed daily increased severalfold.

Another factor to account for was the unstable workload — during the day, requests for interpretations of new scans could arrive every 2-3 seconds, and then drop off to one every 10 minutes in the evening and at night, sometimes even less. According to CVisionLab’s calculations, to cope with peak loads independently and process CT scans in 6.5 minutes, roughly 10 servers equipped with modern GPUs and load balancing between them would have been required. During non-peak periods, however, most of these same servers would stand idle for a significant part of the time.
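As a rough check on these figures, Little's law (requests in flight = arrival rate × time in system) gives an upper bound on how many scans could be in processing at once. The sketch below assumes that the peak arrival rate is sustained and that every scan uses the full 6.5-minute budget; the function name is illustrative and does not come from CVisionLab's code.

```python
def scans_in_flight(interarrival_s: float, time_in_system_s: float) -> float:
    """Little's law: L = lambda * W, where lambda = 1 / interarrival time."""
    if interarrival_s <= 0:
        raise ValueError("interarrival_s must be positive")
    return time_in_system_s / interarrival_s


BUDGET_S = 6.5 * 60  # 6.5-minute processing limit from the text, in seconds

peak_fast = scans_in_flight(2.0, BUDGET_S)   # one request every 2 seconds
peak_slow = scans_in_flight(3.0, BUDGET_S)   # one request every 3 seconds
night = scans_in_flight(600.0, BUDGET_S)     # one request every 10 minutes

print(f"peak: up to {peak_slow:.0f}-{peak_fast:.0f} scans in flight")
print(f"night: up to {night:.2f} scans in flight")
```

Under these assumptions the bound differs by a factor of several hundred between day and night, which is why capacity sized for the peak would sit mostly idle off-peak.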

The company decided to take a different path, developing a flexible, scalable solution using a microservice architecture based on Yandex Cloud’s managed services. This approach allowed the amount of available computing resources to change dynamically in response to the service’s load at any given moment. Hosting the service on Yandex’s cloud platform made it possible to ensure both system and data security in accordance with the requirements of Russian personal data protection laws and industry standards ISO 27001, 27017, and 27018.

CVisionLab took part in the Yandex Cloud Boost program, which offers companies who have developed their own IT solutions (at least at the MVP stage) grants to test Yandex Cloud, assistance in migrating to the cloud, and free technical support for a year.

From MVP to microservices

Prior to the microservice solution, CVisionLab had been using a more basic system in development since the spring of 2020. It took three months to create and train an ML model, and the next four months to develop a minimum viable product (MVP), integrate it with the Moscow healthcare service’s radiological information service, and conduct third-party function and calibration tests.

Meanwhile, in early 2020, as the pandemic in Russia was just starting, there was not yet enough marked-up data to train ML algorithms, and all the specialists who could prepare such a dataset were busy fighting for patients' lives on the front line. As a result, CVisionLab had to find a way to squeeze the most out of a very limited amount of data. Today, the imaging analysis system is based on three convolutional neural networks:

  1. One for binary classification, which analyzes all sections of the scan for signs of pathologies, aggregates information, and offers a conclusion about the presence (or absence) of signs of coronavirus infection in the patient’s lungs.
  2. Another for the segmentation of the lung area in the scan.
  3. A third for the segmentation of areas with pathological changes in the lungs.
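The way the three networks' outputs might be combined can be sketched as follows. This is a minimal illustration under stated assumptions, not CVisionLab's implementation: per-slice probabilities are aggregated with a simple maximum, the 0.5 decision threshold is arbitrary, and plain nested lists of 0/1 values stand in for the two segmentation outputs. Computing the share of affected lung area is one plausible use of the two masks, not something the source describes.

```python
from typing import List

Mask = List[List[int]]  # one 2D binary mask per slice (hypothetical format)


def scan_probability(slice_probs: List[float]) -> float:
    """Aggregate per-slice probabilities into a scan-level score (assumed: max)."""
    if not slice_probs:
        raise ValueError("scan has no slices")
    return max(slice_probs)


def involvement_ratio(lung_masks: List[Mask], lesion_masks: List[Mask]) -> float:
    """Share of lung pixels that are also marked as pathological."""
    lung_px = 0
    lesion_px = 0
    for lung, lesion in zip(lung_masks, lesion_masks):
        for lung_row, lesion_row in zip(lung, lesion):
            for in_lung, in_lesion in zip(lung_row, lesion_row):
                lung_px += in_lung
                lesion_px += in_lung * in_lesion  # count lesions inside the lung only
    return lesion_px / lung_px if lung_px else 0.0


# Toy input: three slices for classification, one 2x2 slice for segmentation.
probs = [0.05, 0.80, 0.30]
lung = [[[1, 1], [1, 1]]]
lesion = [[[1, 0], [0, 0]]]

has_signs = scan_probability(probs) >= 0.5  # assumed threshold
ratio = involvement_ratio(lung, lesion)
print(has_signs, ratio)
```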

CVisionLab also made use of the MosMedData dataset, which consists of 1110 CT scans, in the development of the algorithms. In the fall of 2020, the service was connected to the production environment and began processing the CT scans of real patients.

Developing a microservice solution

Having assessed the load on the new system and predicted its future growth, CVisionLab started the next stage of development: the creation of a scalable microservice solution based on Yandex Cloud’s managed services.

Microservice architecture involves building a single application from a number of loosely connected smaller components (microservices) that support independent deployment. Such an approach makes it much easier not only to update the code and add new functionality, but also to automate the scaling of the system and increase fault tolerance. Unlike a monolithic solution, microservices can be scaled automatically as loads increase, and the failure of one service does not bring down the system as a whole.

The microservice approach is based on containerization, and Kubernetes handles orchestration, that is, the management of containers in a single cluster of microservices. It provides mechanisms that automate the deployment, scaling, and management of containerized applications. We talked in detail about working with the Kubernetes® ecosystem at the Kuber Conf conference (/events/369).

Yandex Cloud’s ecosystem of tightly integrated managed services supports the creation of a microservice architecture:

  • Yandex Managed Service for Kubernetes® allows you to create clusters and groups of Kubernetes nodes with the ability to replicate to three geographically distributed availability zones, with the cloud provider maintaining and updating all infrastructure components.
  • Yandex Container Registry is a service for storing and managing Docker containers and images in the cloud.
  • Yandex Network Load Balancer is a service that distributes loads across cloud resources and provides fault tolerance for applications.
  • Yandex Message Queue is a scalable queue service for exchanging messages between services in a microservice architecture.

CVisionLab deployed clusters and node groups in Yandex Managed Service for Kubernetes®, which allowed them to provide scaling within groups and fault tolerance for the entire system. The Yandex Compute Cloud virtual machines within the cluster were selected with the characteristics each service instance needs. Some virtual machines with GPUs process the scans, while others load, save, and process incoming messages, etc. Using Instance Groups through Kubernetes, machine instances were divided into groups, with scaling rules set both for the services deployed on these machines and for the machines themselves.

Yandex Container Registry ensured the rapid deployment of new images and guaranteed stability regardless of the external environment. And Yandex Network Load Balancer served as a load balancer between Dicom Downloader services for downloading DICOM files with scans. This made it possible to have one common entry point via white lists of IP addresses in the external system’s firewall.

To store dumps and exchange data between services, CVisionLab chose Yandex Object Storage, a universal, scalable S3-compatible storage service. Logs and contexts (for transferring information between services) were placed in Yandex Managed Service for MongoDB, a managed MongoDB database service. The database stores, for example, information about the source of the file received by the system, its name, and the unique identifier of the series. A DICOM file obtained from a CT scan may contain several series of CT slices, but the system processes only one: the most suitable series based on its position, the filter applied, the number of slices, and so on.
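Series selection of this kind can be sketched as a simple ranking. The criteria named in the text (position, filter, slice count) are used here, but their order of importance, the "lung" filter name, and the identifiers are all assumptions made for illustration; the actual selection rules are not described in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Series:
    series_uid: str    # unique identifier of the series (made-up values below)
    position: int      # order of the series within the study
    kernel: str        # reconstruction filter name (hypothetical values)
    slice_count: int   # number of CT slices in the series


def score(series: Series) -> Tuple[bool, int, int]:
    """Rank a series; higher tuples win. Criteria order is an assumption."""
    prefers_lung_kernel = series.kernel.lower() == "lung"
    return (prefers_lung_kernel, series.slice_count, -series.position)


def pick_series(candidates: List[Series]) -> Series:
    """Return the single series that would be sent on for processing."""
    if not candidates:
        raise ValueError("DICOM study contains no series")
    return max(candidates, key=score)


study = [
    Series("1.2.3.1", position=1, kernel="soft", slice_count=120),
    Series("1.2.3.2", position=2, kernel="lung", slice_count=300),
    Series("1.2.3.3", position=3, kernel="lung", slice_count=80),
]
chosen = pick_series(study)
print(chosen.series_uid)
```

Comparing tuples keeps the rule easy to read and to reorder if the priorities change.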

To ensure security and separate levels of access to Yandex Cloud resources, the Yandex Identity and Access Management service was used. Yandex Monitoring evaluates the metrics of the system’s key stages (queues in particular), which are then used to scale the number of instances of each type of service: the number grows as load increases and shrinks as it falls.
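Queue-driven scaling of this kind can be illustrated with a small rule that derives a replica count from the queue length. This is a sketch under assumptions: the target of five queued scans per instance and the limits of 1 and 10 instances are invented for the example, and the function calls neither the Yandex Monitoring nor the Kubernetes API.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replicas needed so each handles at most target_per_replica queued scans."""
    if queue_length < 0:
        raise ValueError("queue_length must not be negative")
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured bounds so the group never scales to zero
    # or beyond the resources reserved for it.
    return max(min_replicas, min(max_replicas, wanted))


for queued in (0, 23, 500):
    print(queued, "->", desired_replicas(queued, target_per_replica=5))
```

The lower bound keeps one instance warm for night-time requests, and the upper bound caps cost during spikes.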

Refactoring the code and switching to the microservice architecture took 3 months, and a new version of the system was launched in 2021.

One scan complete every 2.5 minutes

CVisionLab has developed a system that fully processes 90% of scans in 2.5 minutes: from a medical facility’s incoming request to the results uploaded in response. Once the procedure is complete, the scan is automatically sent to the Moscow healthcare system’s radiological information service, then to the CVisionLab service for processing, and finally to the doctor. The original image and additional series processed using AI can be viewed together, since the series are automatically synchronized by slice.

Scans in which AI predicted the presence of pathology are prioritized in the radiologist’s list of scans and are highlighted in red or orange (depending on the probability of the service’s prediction). When viewing a specific scan, the contours of a possible infection are marked red.
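The prioritization and highlighting described above could look roughly like this. The source does not give the probability cut-offs or the exact ordering rule, so the thresholds of 0.8 and 0.5, the descending sort, and the scan identifiers are assumptions for illustration only.

```python
from typing import List, NamedTuple, Optional


class ScanResult(NamedTuple):
    scan_id: str         # hypothetical identifier
    probability: float   # predicted probability of pathology, 0..1


RED_THRESHOLD = 0.8      # assumed cut-off, not from the source
ORANGE_THRESHOLD = 0.5   # assumed cut-off, not from the source


def highlight(probability: float) -> Optional[str]:
    """Return the highlight color, or None when no pathology is predicted."""
    if probability >= RED_THRESHOLD:
        return "red"
    if probability >= ORANGE_THRESHOLD:
        return "orange"
    return None


def prioritized(results: List[ScanResult]) -> List[ScanResult]:
    """Order the worklist so the most probable pathologies come first."""
    return sorted(results, key=lambda r: r.probability, reverse=True)


worklist = prioritized([
    ScanResult("scan-a", 0.12),
    ScanResult("scan-b", 0.91),
    ScanResult("scan-c", 0.63),
])
for item in worklist:
    print(item.scan_id, highlight(item.probability))
```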

During the pandemic’s third wave, Yandex Cloud’s microservice architecture and cloud services made it possible for CVL to successfully handle a load of 1,000 CT scans — roughly 200 GB of processed data — on a daily basis. CVisionLab continues to maintain and update the service to keep it in line with rapidly changing functional requirements and the Moscow radiological information service’s API.
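For a sense of scale, the daily figures can be converted into per-scan averages. The calculation below uses only the two numbers from the text and assumes decimal units (1 GB = 1,000 MB) and an even spread over 24 hours, which real traffic does not follow.

```python
SCANS_PER_DAY = 1_000        # from the text
DATA_PER_DAY_GB = 200        # from the text
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: decimal units, 1 GB = 1,000 MB.
avg_scan_mb = DATA_PER_DAY_GB * 1000 / SCANS_PER_DAY
avg_interval_s = SECONDS_PER_DAY / SCANS_PER_DAY
avg_throughput_mb_s = DATA_PER_DAY_GB * 1000 / SECONDS_PER_DAY

print(f"average scan size: {avg_scan_mb:.0f} MB")
print(f"average interval between scans: {avg_interval_s:.1f} s")
print(f"average data rate: {avg_throughput_mb_s:.2f} MB/s")
```

These are averages only; daytime peaks are far denser than a uniform spread would suggest, so they understate the load the system has to absorb.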

Independent testing showed the following quality metrics:

  • AUC (area under the ROC curve): 0.97
  • Sensitivity: 0.94
  • Specificity: 0.94
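For readers less familiar with these metrics, the sketch below shows how each is computed. The counts and scores are invented purely to demonstrate the formulas; they are not the data from the independent test, which the source does not publish.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive scans that the model flags as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly negative scans that the model reports as negative."""
    return tn / (tn + fp)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Invented counts: 100 positive and 100 negative scans.
print(sensitivity(tp=94, fn=6))
print(specificity(tn=94, fp=6))
# Invented scores for three positive and three negative scans.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

Sensitivity and specificity depend on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds.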

In a ranking of experiments involving computer vision for medical imaging analysis and its further application in the Moscow healthcare system, CVisionLab’s solution ranked first in the COVID CT scans category as of August 2021.

Opinion

Denis Igorevich Brailovsky,
Technical project manager

During development, we had deadlines to meet for the implementation of the solution while ensuring its performance and the accuracy of the ML algorithm. The keys to our success were an efficiently scalable serverless solution architecture built on Yandex Cloud’s services and our powerful AI algorithms. We actively use GPU instances in the Compute Cloud service for our AI algorithms. Just like other resources, GPU instances scale depending on traffic levels, ensuring that we use resources efficiently, saving them when traffic decreases and quickly scaling when it increases. As a result, we have implemented a service that has already helped real doctors analyze tens of thousands of CT scans of real patients suspected of having COVID-19.