Computer Vision | Use Cases in AIW

by AIW Blog Team - July 14, 2022

Computer vision is the branch of computer science dedicated to building digital systems that can analyze, process, and comprehend data in visual formats such as images, videos, and other visual inputs. It largely involves training computers to analyze images at the pixel level so they understand what each data element means and signifies.

The most common current uses of machine learning in computer vision include object detection, object classification, and the extraction of meaningful data or text from documents, images, photos, audio, and videos.

3 Main Functions of Computer Vision Systems

Object Identification

A computer vision system can be used to parse visual content and identify a specific object in an image or video. For instance, a computer vision system can be trained to identify one specific cat among multiple cats in any image or video. 

Object Tracking

Computer vision systems built for object tracking are trained to process videos to find one or more specific objects that match the search criteria and simultaneously track their movement.

Object Classification

A computer vision system can be designed to parse visual content, classify the objects visible and identifiable in an image or video, and assign them to specific categories. For instance, a computer vision system can be trained to analyze a set of images and classify each one by whether it contains a cat.

AIW Use Cases in Computer Vision

Use Case: Video Cue Classification

Industry: Media & Networking (Computer Vision)


The primary purpose of Video Cue Classification is to annotate videos (movies/web series) into categories such as shot boundaries, credit boundaries, black frames, and SMPTE bars.

Problem Statement: 

At present, users have no way to skip a specific portion of a video that they do not intend to watch. Annotating or classifying a video into these types gives users frame-wise or time-wise information about the sequence of content in any given video.

Process We Followed:

Stage 1

Choosing the Type of Annotations in VTC:

  • Shot Boundary Annotation
  • Credit Boundary Annotation
  • Black Frame Annotation
  • SMPTE Bars Annotation
  • Slate Annotation

Stage 2

  • Credit Annotation

Annotating the frames where the credits of a movie or TV show are displayed.

  • Shot Boundary Annotation

Shot boundary annotation detects the transitions between shots.

  • Black Frame Annotation

Single or multiple black frames in a movie or TV show are labeled under black frame annotation.

  • SMPTE Bar Annotation

SMPTE bars are a trademarked television test pattern made up of colored bars.

  • Slate Annotation

Slates are frames that are not actual scenes/shots in the video and are also not credits or black frames. Slate frames contain text with information about the video.
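The frame-level cues above can also be detected programmatically. As a rough illustrative sketch (not AIW's actual pipeline), a black frame can be flagged when mean luminance falls below a threshold, and a hard shot cut when the average absolute difference between consecutive frames spikes. Frames here are plain lists of grayscale pixel values, and the thresholds are arbitrary assumptions:

```python
# Illustrative sketch: flag black frames and hard shot cuts in a video
# given as grayscale frames, each a flat list of 0-255 pixel values.
# Thresholds are arbitrary assumptions, not AIW's settings.

BLACK_THRESHOLD = 10   # mean luminance below this => black frame
CUT_THRESHOLD = 60     # mean abs. frame difference above this => cut

def mean_luminance(frame):
    return sum(frame) / len(frame)

def is_black_frame(frame, threshold=BLACK_THRESHOLD):
    return mean_luminance(frame) < threshold

def find_shot_cuts(frames, threshold=CUT_THRESHOLD):
    """Return indices i where a hard cut occurs between frame i-1 and i."""
    cuts = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i]))
        if diff / len(frames[i]) > threshold:
            cuts.append(i)
    return cuts

# Tiny synthetic "video": two bright shots separated by one black frame.
shot_a = [[120] * 16] * 3
black = [[0] * 16]
shot_b = [[200] * 16] * 3
video = shot_a + black + shot_b

black_frames = [i for i, f in enumerate(video) if is_black_frame(f)]
cuts = find_shot_cuts(video)
```

In practice, human annotators resolve the ambiguous cases (gradual fades, dark but non-black scenes) that simple thresholds like these miss, which is the point of the annotation stages above.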

AIW Provided a Solution

Annotated videos give users the option to skip ahead and watch only the content of a video they intend to see.

Use Case: Medical Imaging

Industry: Medical (Computer Vision)


The objective is to source different types of X-ray images of the human body.

Problem Statement: 

As part of the machine learning process in the medical domain, different types of X-ray images are sourced and then used as inputs to train machines to recognize different types of X-ray images of the human body.

Process We Followed:

As part of the sourcing activity, different X-ray images are collected according to specifications covering different parts of the human body.

AIW Provided a Solution

The sourced images are used as inputs to deep learning models, training the machine to detect different bone ailments and fractures.

Use Case: Text Annotation

Industry: Banking & Financial Service (Computer Vision)


Text annotation is a document analysis service that detects and extracts printed and handwritten text, structured data such as fields of interest and their values, and tables from images and scanned documents. The underlying machine learning model has been trained on millions of documents, so virtually any type of uploaded document is automatically recognized and processed for text extraction.

When information is extracted from a document, the service returns a confidence score for each element it identifies, so that one can make informed decisions about how to use the results. For instance, when extracting information from a tax document, custom rules can be set to flag the extracted values. All extracted data is also returned with bounding box coordinates, a rectangular frame that fully encompasses each piece of identified data, so that one can quickly see where a word or number appears on the document.
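To make the confidence score and bounding box ideas concrete, here is a minimal sketch of post-processing extraction results. The field names, box format, and threshold are illustrative assumptions, not the schema of any particular service:

```python
# Illustrative sketch of filtering extracted elements by confidence.
# Each element carries its text, a confidence score in [0, 1], and a
# bounding box (left, top, width, height) in pixels; all field names
# here are assumptions, not a real service's response schema.

def filter_by_confidence(elements, min_confidence):
    """Keep only elements the model is sufficiently confident about."""
    return [e for e in elements if e["confidence"] >= min_confidence]

elements = [
    {"text": "Total", "confidence": 0.99, "box": (40, 300, 60, 18)},
    {"text": "$1,250.00", "confidence": 0.97, "box": (120, 300, 90, 18)},
    # Low-confidence extraction, e.g. smudged handwriting:
    {"text": "Sm?dge", "confidence": 0.41, "box": (40, 340, 70, 18)},
]

trusted = filter_by_confidence(elements, min_confidence=0.9)
flagged = [e for e in elements if e not in trusted]
```

Low-confidence elements like the flagged one above are exactly the cases that get routed to custom rules or human review, while the bounding box lets a reviewer jump straight to the right spot on the page.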

Problem Statement: 

Millions of text documents and important PII documents, such as driving licenses, SSNs, payslips, medical documents, receipts and invoices, tax forms, and historical documents, need to be stored securely. Previously they were kept on hard drives and in physical lockers, so storage capacity and security were limited, and there was a high risk of unrecoverable data loss, with no smart record-keeping or search features. Text annotation is a one-stop solution to these problems.

Process We Followed:

Stage 1

Pre Processing Steps:

Before annotation begins, there are some pre-processing requirements, such as collecting and sourcing text-based documents both online and offline. The primary task is then to analyze the documents and upload them to a designated bucket for human data annotation.

Optical character recognition (OCR):

  • Rectify the documents and check their alignment.
  • Annotate the entire content (words, lines, letters, or characters) as per the requirement.
  • Label and classify the text in the documents into various annotation types, such as handwritten, printed, signature, tax, vendor name, total, and subtotal.
  • Finally, transcribe every word in the document with 100% accuracy.

Stage 2

Tables Annotation:

In any document, the table structure is first identified and recognized. The tabular structure is then annotated with a single bounding box, and all vertical and horizontal cells are annotated along with all logical table identification, following the guidelines.

Stage 3

Key Values Annotation:

The objective is to analyze the document and identify key-value pairs to be annotated according to the guidelines (e.g., if a document reads "Name: Mark", the annotation reflects Key = "Name" and Value = "Mark").
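As a toy illustration of the key-value idea (a simple string split, not the annotation tooling itself, where humans draw linked key and value regions on the page):

```python
def split_key_value(line, separator=":"):
    """Split a line like 'Name: Mark' into a (key, value) pair.

    A toy illustration of key-value annotation; in the real workflow,
    annotators mark key and value regions rather than splitting strings.
    """
    key, _, value = line.partition(separator)
    return key.strip(), value.strip()

pair = split_key_value("Name: Mark")
```

A model trained on such annotated pairs learns to associate a field label with its value even when layout, fonts, or separators vary across documents.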

AIW Provided a Solution

The annotated text documents are used as inputs to deep learning models, training the machine to detect anomalies in data structure and to keep the information safe and secure.

These intelligent AI systems can improve the speed and accuracy of large-scale document identification and storage, and open the door to broader machine learning applications.
