Get a head start on program admission

Preview this course in the non-credit experience today! 
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Course Type: Elective

Specialization: Introduction to Computer Vision

Instructors: Dr. Tom Yeh

Prior knowledge needed: TBD

View on Coursera

Course Description

Introduction to Computer Vision guides learners through the essential algorithms and methods that help computers 'see' and interpret visual data. You will first learn the core concepts and techniques that have traditionally been used to analyze images. Then, you will learn modern deep learning methods, such as neural networks and specific models designed for image recognition, and how they can be used to perform more complex tasks like object detection and image segmentation. Additionally, you will learn about the creation and impact of AI-generated images and videos, and explore the ethical considerations of such technology.

Learning Outcomes

  • Analyze a complex computing problem and apply principles of computing and other relevant disciplines to identify solutions.
  • Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.
  • Communicate effectively in a variety of professional contexts.
  • Recognize professional responsibilities and make informed judgments in computing practice based on legal and ethical principles.
  • Function effectively as a member or leader of a team engaged in activities appropriate to the program’s discipline.
  • Apply computer science theory and software development fundamentals to produce computing-based solutions. 

Course Grading Policy

Assignment | Percentage of Grade
Module 1: Graded Quiz | 20%
Module 2: Graded Quiz | 20%
Module 3: Graded Quiz | 20%
Module 4: Graded Quiz | 20%
CSCA 5222 Introduction to Computer Vision Final Exam | 20%

Course Content

Duration: 7 hours 8 minutes

This module introduces foundational concepts related to common image types and functions. It offers a comprehensive overview of different formats and their unique characteristics. This section establishes the context for understanding how images are represented and processed in various applications. Next, the module delves into image functions, explaining the basic operations that can be performed on images to enhance or manipulate them, such as cropping, resizing, or adjusting brightness. It also covers more advanced operations like filtering and thresholding, illustrating how these functions play a crucial role in image processing.

Then the module explores the underlying mathematics of image transformations. It starts with linear transforms, highlighting their application in image scaling, rotation, and translation. The module then introduces homogeneous coordinates, providing a simplified approach to represent complex transformations with additional dimensions. This leads into a deeper exploration of homogeneous transformations, demonstrating how they are used to perform multiple transformations in a single step.
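As a rough illustration of how homogeneous coordinates let several transformations be combined into a single step, here is a minimal sketch in plain Python. The helper names (`matmul`, `translation`, `rotation`, `scaling`, `apply`) are illustrative assumptions and are not taken from the course materials.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def translation(tx, ty):
    """3x3 homogeneous matrix that shifts a 2D point by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta):
    """3x3 homogeneous matrix that rotates counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    """3x3 homogeneous matrix that scales x by sx and y by sy."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def apply(m, x, y):
    """Apply a 3x3 homogeneous transform to the 2D point (x, y)."""
    col = matmul(m, [[x], [y], [1.0]])
    w = col[2][0]
    return (col[0][0] / w, col[1][0] / w)

# Scale by 2, then rotate 90 degrees, then translate by (3, 1).
# The rightmost matrix is applied first, so the product is one combined step.
combined = matmul(translation(3, 1), matmul(rotation(math.pi / 2), scaling(2, 2)))
point = apply(combined, 1.0, 0.0)  # approximately (3.0, 3.0)
```

Translation is not a linear map in ordinary 2D coordinates, which is why the extra coordinate is needed: with it, translation becomes a matrix product like rotation and scaling.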

Duration: 3 hours 35 minutes

This module provides a deep dive into image analysis and similarity assessment techniques. It starts by exploring the basic concept of comparing pixels, highlighting how individual pixel values can be used to gauge similarity. This is followed by a detailed discussion on comparing multiple images by their features, emphasizing the advantages of feature-based analysis over pixel-by-pixel comparison. The module introduces the concept of image moments, revealing how these statistical properties help identify shapes and patterns within images.

The module then addresses similarity and distance, offering a quick overview of how these concepts are calculated and applied in image processing. You'll also learn about converting pixels into distributions, an essential technique for more complex analysis. This leads to a comprehensive explanation of cross-entropy, providing insights into its role in measuring the dissimilarity between distributions. You'll explore cross-correlation in 1D, followed by a deeper examination of cross-correlation as matrix multiplication. The module wraps up by exploring cross-correlation in more detail, with a focus on the mathematics behind it.
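The ideas of turning pixels into distributions, measuring cross-entropy, and viewing 1D cross-correlation as a matrix product can be sketched as follows. This is a minimal example under stated assumptions: the bin count, the `eps` smoothing term, the 'valid' (no padding) window convention, and all function names are choices made for illustration, not the course's own definitions.

```python
import math

def to_distribution(pixels, bins=4, max_value=256):
    """Turn pixel intensities into a normalized histogram (sums to 1)."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps (an assumption) avoids log(0)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def cross_correlate_1d(signal, kernel):
    """Slide the kernel over the signal and sum products at each offset."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]

def cross_correlate_as_matmul(signal, kernel):
    """Same result: stack each window as a matrix row, then multiply by the kernel vector."""
    k = len(kernel)
    windows = [signal[i:i + k] for i in range(len(signal) - k + 1)]
    return [sum(w * c for w, c in zip(row, kernel)) for row in windows]

dark = to_distribution([10, 20, 30, 200])      # mostly low intensities
bright = to_distribution([200, 210, 220, 30])  # mostly high intensities

same = cross_entropy(dark, dark)         # lower: distributions match
different = cross_entropy(dark, bright)  # higher: distributions differ

edges = cross_correlate_1d([1, 2, 3, 4], [1, 0, -1])
edges_mm = cross_correlate_as_matmul([1, 2, 3, 4], [1, 0, -1])
```

Cross-entropy is smallest when the second distribution matches the first, so the mismatched pair scores higher than the matched one.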

Duration: 4 hours 33 minutes

This module delves into multiview geometry, a pivotal concept in computer vision and 3D modeling. It starts with a brief overview of the motivation behind multiview systems, highlighting the advantages of capturing scenes from multiple viewpoints. The module then discusses multiple coordinate systems, exploring how different reference frames can describe points and transformations in 3D space. You'll also learn about multiple viewing planes, which play a crucial role in multiview setups by providing unique perspectives for scene reconstruction.

The focus shifts to multiview projection, examining how distinct images from multiple cameras can be used to create a cohesive 3D scene. You'll gain insights into the principles of translation and rotation in 3D, crucial for understanding camera movement and orientation. The module also covers camera translation and camera rotation, offering practical examples to illustrate how camera motion affects the geometry and visual representation of a scene.
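To make the effect of camera translation and rotation concrete, the sketch below expresses a fixed world point in the frame of a camera that has been moved or turned. It assumes one particular convention (the rotation matrix holds the camera's orientation in world coordinates, and rotation is about the y axis); the course may use a different convention, and the function names are hypothetical.

```python
import math

def rotation_y(theta):
    """3x3 rotation about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    """Transpose a matrix; for a rotation this is also its inverse."""
    return [list(row) for row in zip(*m)]

def world_to_camera(point, cam_rotation, cam_position):
    """Express a world point in the camera's frame.

    Assumed convention: cam_rotation maps camera axes to world axes,
    and cam_position is the camera center in world coordinates.
    """
    shifted = [p - c for p, c in zip(point, cam_position)]
    return matvec(transpose(cam_rotation), shifted)

point = [0.0, 0.0, 5.0]

# Camera translation: moving the camera 1 unit along +x shifts the
# point 1 unit along -x in the camera frame.
moved = world_to_camera(point, rotation_y(0.0), [1.0, 0.0, 0.0])

# Camera rotation: turning the camera 90 degrees about the y axis
# moves the point from the camera's z axis onto its negative x axis.
turned = world_to_camera(point, rotation_y(math.pi / 2), [0.0, 0.0, 0.0])
```

In both cases the scene itself does not change; only its coordinates relative to the camera do, which is what alters the resulting image.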

Duration: 4 hours 3 minutes

This module delves into key concepts of camera models and their role in computer vision and photogrammetry. Learn about the Extrinsic Matrix, exploring how it defines the position and orientation of a camera in 3D space. Understand the Pinhole Camera Model, a simplified optical system that forms the basis for many computer vision applications, alongside the Intrinsic Matrix, which captures the internal parameters of the camera. Epipolar geometry is examined, with a focus on its significance in 3D reconstruction and stereo vision. The module covers the motivation behind epipolar geometry, breaks down its basic components, and explains the Essential Matrix, which encapsulates the geometric relationship between two calibrated camera views, as well as the Fundamental Matrix, which expresses the same two-view relationship directly in pixel coordinates.
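The pieces named above fit together in a short computation. The sketch below projects a 3D point through a pinhole camera using an intrinsic matrix `K` and extrinsic parameters `R` and `t`, then checks the epipolar constraint with an essential matrix built as E = [t]x R. The focal length, principal point, and camera offset are made-up values, and the convention X2 = R X1 + t is an assumption that may differ from the one used in the course.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def project(point_world, K, R, t):
    """Pinhole projection: pixel ~ K (R X + t), then divide by depth."""
    cam = [c + ti for c, ti in zip(matvec3(R, point_world), t)]
    u, v, w = matvec3(K, cam)
    return (u / w, v / w)

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def essential_matrix(R, t):
    """E = [t]x R for two views related by X2 = R X1 + t (assumed convention)."""
    return matmul3(skew(t), R)

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

X = [0.5, -0.25, 2.0]                             # point in the first camera's frame
pixel = project(X, K, identity, [0.0, 0.0, 0.0])  # (520.0, 140.0)

# Second camera: same orientation, so points shift by t in its frame.
t = [-1.0, 0.0, 0.0]
E = essential_matrix(identity, t)

# Normalized image coordinates (depth divided out) in each view.
x1 = [X[0] / X[2], X[1] / X[2], 1.0]
X2 = [c + ti for c, ti in zip(matvec3(identity, X), t)]
x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]

# Epipolar constraint: x2^T E x1 should be zero for a true correspondence.
residual = sum(a * b for a, b in zip(x2, matvec3(E, x1)))
```

The essential matrix works on normalized coordinates; the fundamental matrix plays the same role for raw pixel coordinates and is related to it by F = K2^(-T) E K1^(-1), where K1 and K2 are the two cameras' intrinsic matrices.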

Duration: 2 hours 12 minutes

This module contains materials for the final exam. If you've upgraded to the for-credit version of this course, please make sure you review the additional for-credit materials in the Introductory module and anywhere else they may be found.

Notes

  • Cross-listed Courses: Courses that are offered under two or more programs. Considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.