
50 Essential Computer Vision Terms (Part II)

Learn essential terms of computer vision, Part 2.

Instance Segmentation | Unitlab Annotate

In Part I of this series, we explored the first 25 essential computer vision terms—covering foundational ideas like AI and machine learning, as well as core metrics like IoU and F1 score.

50 Essential Computer Vision Terms (Part I) | Unitlab Annotate

This second part continues with 25 more terms commonly encountered in computer vision—especially in areas like data annotation, image labeling, object detection, and real-world deployment. These entries include annotation types, file formats, data challenges, and emerging tools like SAM and synthetic data.

Whether you're preparing datasets, developing models, or managing labeling workflows, understanding these terms will help you work more effectively across teams and systems.

Image Classification | Toloka AI

26. Classification

Image classification assigns a single label to an entire image. It answers the question: What is this image about? Examples include determining whether an image contains a cat or a dog, or identifying a product category in retail.
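
For a concrete picture, here is a minimal classification sketch using a pretrained torchvision ResNet. It assumes a recent torchvision install, and the image file name is purely illustrative:

import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing: resize, crop, normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = Image.open("cat.jpg")           # hypothetical input image
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)
class_id = logits.argmax(dim=1).item()  # one label for the whole image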

Semantic Segmentation | Unitlab Annotate

27. Segmentation

Segmentation, also known as semantic segmentation or pixel-perfect labeling, involves assigning a label to every pixel in an image. This divides the image into regions that correspond to different objects or categories.

Unlike object detection, which focuses on locating objects, segmentation works at the pixel level and is essential for detailed tasks such as medical imaging or autonomous vehicle scene analysis.
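
One way to see the difference: a semantic segmentation label is simply a 2D array the same size as the image, with a class ID per pixel. A toy NumPy illustration (the class IDs are made up):

import numpy as np

# 0 = background, 1 = road, 2 = car (hypothetical class IDs)
mask = np.zeros((4, 6), dtype=np.uint8)
mask[2:, :] = 1       # bottom rows labeled "road"
mask[2:4, 1:3] = 2    # a small "car" region on the road

# Counting pixels per class gives the size of each region
print({int(cls): int((mask == cls).sum()) for cls in np.unique(mask)})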

Image Instance Segmentation | Unitlab Annotate

28. Instance Segmentation

Instance segmentation builds on semantic segmentation by distinguishing between individual instances of the same object class. While semantic segmentation treats all objects of the same class identically, instance segmentation keeps them separate.

For example, two people in the same image would each be segmented individually, even though they both belong to the "person" class.
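
In data terms, each instance typically gets its own binary mask, even when the instances share a class. A toy sketch with illustrative shapes:

import numpy as np

h, w = 4, 6
person_1 = np.zeros((h, w), dtype=bool)
person_1[1:4, 0:2] = True   # first person
person_2 = np.zeros((h, w), dtype=bool)
person_2[1:4, 4:6] = True   # second person

instances = [
    {"class": "person", "mask": person_1},
    {"class": "person", "mask": person_2},
]

# Semantic segmentation would merge both into one "person" mask:
semantic_person = person_1 | person_2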


Object Detection Demo | Unitlab Annotate

29. Object Detection

Object detection identifies and classifies objects within an image, typically by drawing a bounding box and assigning a label to each. While similar to instance segmentation, object detection is generally less precise due to its reliance on rectangular boxes.

That said, bounding boxes are simple, intuitive, and widely adopted for real-world tasks such as vehicle tracking, product detection, and pedestrian monitoring.
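
A detector's output is usually a list of (box, label, score) triples. A schematic example with illustrative names and pixel values, using (x_min, y_min, x_max, y_max) boxes:

detections = [
    {"class": "car",        "box": (34, 60, 210, 180),  "score": 0.91},
    {"class": "pedestrian", "box": (250, 80, 300, 220), "score": 0.78},
]

def box_area(box):
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)

for det in detections:
    print(det["class"], det["score"], box_area(det["box"]))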

30. OCR

OCR (Optical Character Recognition) is the process of detecting and extracting text from images. Common applications include digitizing printed documents, reading license plates, and extracting text from scanned legal or banking records.

OCR plays an essential role in computer vision by enabling systems to interpret and act on textual content embedded in images.
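
As a minimal sketch, the pytesseract wrapper around the Tesseract engine extracts text in a couple of lines (both packages must be installed; the file name is hypothetical):

from PIL import Image
import pytesseract

image = Image.open("scanned_invoice.png")
text = pytesseract.image_to_string(image)  # returns the recognized text
print(text)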

Bounding Boxes | Unitlab Annotate

31. Bounding Boxes

Bounding boxes are rectangular boxes drawn around objects to indicate their position. This is the most common form of object annotation used in detection tasks.

More advanced use cases include rotated bounding boxes, which add an angle parameter for improved accuracy around tilted or irregularly oriented objects.
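
The two encodings differ by a single angle parameter. A sketch with illustrative values:

import math

# Axis-aligned: top-left corner plus width and height
axis_aligned = {"x": 10, "y": 10, "width": 150, "height": 300}

# Rotated: center, size, and an angle so the box can follow a tilted object
rotated = {"cx": 85, "cy": 160, "width": 150, "height": 300, "angle": 30}

def corners(box):
    """Four corners of a rotated box as (x, y) tuples."""
    a = math.radians(box["angle"])
    dx, dy = box["width"] / 2, box["height"] / 2
    return [
        (box["cx"] + x * math.cos(a) - y * math.sin(a),
         box["cy"] + x * math.sin(a) + y * math.cos(a))
        for x, y in [(-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy)]
    ]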

Polygons | Unitlab Annotate

32. Polygons

Polygon annotation allows for precise outlining of object boundaries by connecting multiple points. This method is especially useful for objects with irregular shapes that cannot be accurately captured by rectangles.

Because of this precision, polygons are frequently used in instance segmentation tasks.
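
Under the hood, a polygon is just a closed list of (x, y) vertices; the shoelace formula, for instance, recovers the enclosed area. Coordinates below are illustrative:

polygon = [(12, 10), (48, 8), (60, 40), (30, 55), (8, 35)]

def polygon_area(points):
    # Shoelace formula: sum the cross products of consecutive vertices
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1]
            - points[(i + 1) % n][0] * points[i][1]
            for i in range(n))
    return abs(s) / 2

print(polygon_area(polygon))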

Polyline Annotation

33. Polylines

Polylines are open shapes formed by connecting a series of points with straight lines. Unlike polygons, they do not close into a loop. Polylines are used to label road lanes, borders, and paths—particularly in transportation, mapping, and geospatial applications.
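
A polyline is stored the same way as a polygon, minus the closing edge. A small sketch that measures a lane marking (illustrative coordinates; math.dist requires Python 3.8+):

import math

lane = [(0, 100), (50, 95), (120, 90), (200, 92)]  # open sequence of points

def polyline_length(points):
    # Sum the straight segments between consecutive points; the last
    # point does not connect back to the first.
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

print(polyline_length(lane))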

34. Keypoints

Keypoints are specific coordinates within an object that mark important locations, such as facial landmarks, hand joints, or tool tips. Whereas bounding boxes and polygons are drawn around the object of interest, keypoints are placed inside it to mark precise positions.

This type of annotation is especially useful in gesture recognition, facial analysis, and robotics.
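
In practice, keypoints are stored as named (x, y) coordinates, often with a visibility flag as in the COCO convention (0 = not labeled, 1 = labeled but occluded, 2 = visible). Values below are illustrative:

face_keypoints = {
    "left_eye":  (120, 85, 2),
    "right_eye": (160, 85, 2),
    "nose_tip":  (140, 110, 2),
    "mouth":     (140, 135, 1),  # labeled but partially occluded
}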

Human Pose Estimation | Unitlab Annotate

35. Human Pose Estimation

Pose estimation identifies the positions of human joints and connects them to estimate body posture. It relies on internal keypoints—like shoulders, elbows, knees—and maps them into a skeletal model.

Applications include motion tracking in sports, rehabilitation programs, and interactive systems.
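
A pose annotation is then a set of joint keypoints plus a skeleton: the list of joint pairs to connect. A toy sketch with illustrative coordinates:

joints = {
    "left_shoulder": (100, 60), "right_shoulder": (140, 60),
    "left_elbow":    (85, 100), "right_elbow":    (155, 100),
    "left_wrist":    (80, 140), "right_wrist":    (160, 140),
}
skeleton = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"),   ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
]
# Drawing a line for every pair in `skeleton` renders the stick figure.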

36. LiDAR

LiDAR (Light Detection and Ranging) is a remote sensing technology that uses lasers to measure distances and build 3D maps of environments. It emits rapid pulses of laser light and calculates the distance to each surface from the time the pulse takes to return.

LiDAR is widely used in self-driving vehicles, robotics, and aerial mapping.
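
The ranging math itself is simple: distance is half the round-trip time of a pulse multiplied by the speed of light.

C = 299_792_458  # speed of light in m/s

def lidar_distance(round_trip_seconds):
    # The pulse travels to the surface and back, so halve the path
    return C * round_trip_seconds / 2

# A pulse returning after 200 nanoseconds hit a surface ~30 m away:
print(lidar_distance(200e-9))  # ~29.98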

LiDAR Example | Mark Rober

Image Point Annotation | Unitlab Annotate

37. Point

Point annotation involves marking a single pixel or coordinate to represent the presence or center of an object. This is useful in scenarios like crowd counting or key feature localization, where full outlines are unnecessary.

It’s a minimalist yet powerful form of annotation when accuracy at a single location is enough.

38. Data Annotator

A data annotator is a specialist who prepares labeled data for machine learning, particularly in supervised learning tasks. They follow detailed instructions to ensure consistency and quality in the labeled dataset.

In computer vision, annotators use tools to draw boxes, assign labels, trace outlines, and validate AI-generated labels.

39. Manual Annotation

Manual annotation refers to data labeling performed entirely by human annotators. While labor-intensive, it offers high-quality results and is often used to create ground-truth datasets.

Due to the volume of data needed for modern ML, manual annotation is often reserved for critical or complex samples.

40. SAM (Segment Anything Model)

The Segment Anything Model (SAM), developed by Meta AI, is a foundation model for segmentation that can label objects in any image with minimal input.

SAM supports diverse domains and reduces the time required for manual labeling. It represents a major step forward in interactive and automatic image segmentation. Here's an example of annotating a medical image with SAM:


SAM Model | Unitlab Annotate
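
For reference, prompting SAM programmatically looks roughly like this with Meta AI's segment_anything package; the checkpoint path and click coordinates are placeholders, and a real image would replace the zeros array:

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in RGB image
predictor.set_image(image)

# One positive click (label 1) prompts SAM to segment the object there
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
)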

41. Automatic Annotation

Automatic annotation leverages pre-trained models to label data automatically. It greatly accelerates dataset creation, especially for large-scale projects.

While fast, these labels often lack the domain-specific accuracy of human annotators and typically require review and refinement.

42. Hybrid Annotation

Hybrid annotation combines automatic tools with manual oversight. The system produces initial labels, which are then corrected or enhanced by human annotators.

This workflow balances efficiency and quality and is the most common setup in production data labeling pipelines.

43. Brush Annotation

Brush annotation is a freeform labeling technique where annotators use a digital brush to “paint” object areas. It is often used in segmentation tasks where precision is required but drawing pixel-level polygons would be inefficient.

Brush tools are ideal for organic or irregular shapes such as tissues, clouds, or fluid boundaries:


Polygon Brush Annotation | Unitlab Annotate

44. Dataset Format

Dataset format defines how images, labels, and metadata are stored and structured. It determines how tools read, write, and manage data.

Choosing the right format ensures compatibility with model training tools, annotation platforms, and version control systems. Common formats include JSON, XML, and CSV.

45. JSON

JSON (JavaScript Object Notation) is a lightweight, human-readable format used to store structured data. In CV, it’s widely used to represent annotations, such as bounding boxes, class labels, and confidence scores.

Bounding Box | Medium

For example, the image and bounding box above could be stored as the following JSON:

{
	"class": "Person",
	"coordinates": {
		"x": 10,
		"y": 10,
		"width": 150,
		"height": 300
	},
	"confidence_score": 0.93
}

46. XML

XML (eXtensible Markup Language) is a structured, tag-based data format often found in legacy systems. While more verbose than JSON, it is still widely used in datasets such as PASCAL VOC.

Understanding both XML and JSON is important for compatibility across tools and workflows in computer vision.
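
For a feel of the difference, here is the same bounding box from term 45 in a PASCAL VOC-style XML layout, read with Python's standard library (the corner values follow from x, y, width, and height):

import xml.etree.ElementTree as ET

voc_xml = """
<annotation>
  <object>
    <name>Person</name>
    <bndbox>
      <xmin>10</xmin><ymin>10</ymin>
      <xmax>160</xmax><ymax>310</ymax>
    </bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(voc_xml)
for obj in root.iter("object"):
    box = obj.find("bndbox")
    print(obj.findtext("name"),
          [int(box.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")])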

47. Imbalanced Dataset

An imbalanced dataset contains a disproportionately high number of samples from certain classes and very few from others—e.g., 95% cats and 5% dogs.

This can lead to biased models that underperform on underrepresented classes. Solutions include resampling, class weighting, and data augmentation.
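
Class weighting, for instance, can be as simple as inverse-frequency weights so the rare class contributes more to the loss. A sketch of the 95/5 cats-and-dogs example:

counts = {"cat": 950, "dog": 50}
total = sum(counts.values())

# Inverse-frequency weighting (the same formula scikit-learn uses for
# class_weight="balanced"): rare classes get larger weights
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}
print(weights)  # {'cat': 0.526..., 'dog': 10.0}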

48. Occlusion

Occlusion occurs when part of an object is blocked from view—by another object, the edge of the image, or an obstruction. For example, a person standing behind a table may only be partially visible.

Models must learn to recognize partially occluded objects. A common labeling practice is to annotate the occluded object as if it were fully visible.

49. Data Augmentation

Data augmentation refers to techniques used to artificially expand a dataset by applying transformations like rotation, flipping, cropping, or lighting changes.

This improves model generalization, reduces overfitting, and helps address class imbalance when data is limited.
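
A typical pipeline chains several random transformations so every training epoch sees slightly different images. A sketch using torchvision (parameter values are illustrative):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])
# augmented = augment(pil_image)  # apply to a PIL image at load time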

50. Synthetic Data

Synthetic data is artificially generated through simulations, rendering engines, or generative models like GANs. It mimics real-world data but can be produced at scale without manual collection or labeling.

Synthetic data is valuable for training in privacy-sensitive or data-scarce domains and enables rapid prototyping before gathering real samples.

💡
Subscribe to our blog for more posts on computer vision.

Conclusion

Computer vision today goes far beyond object detection. It involves thoughtful dataset design, high-quality annotations, evaluation metrics, file formats, and practical trade-offs between speed and accuracy.

With these 50 essential terms, you now have a vocabulary that spans the full CV pipeline—from annotation tools to model performance. Whether you're a developer, annotator, researcher, or team lead, these terms will help you communicate and build with greater confidence.

Explore More

  1. Who is a Data Annotator?
  2. Four Essential Aspects of Data Annotation
  3. A Comprehensive Guide to Image Annotation Types and Their Applications
