Course Description
This class will cover state-of-the-art methods in dynamic vision,
with an emphasis on segmentation, reconstruction and recognition of
static and dynamic scenes. Topics include: reconstruction of static
scenes (tracking and correspondences, multiple view geometry, self
calibration), reconstruction of dynamic scenes (2-D and 3-D motion
segmentation, nonrigid motion analysis), recognition of visual
dynamics (dynamic textures, face and hand gestures, human gaits,
crowd motion analysis), as well as geometric and statistical methods
for clustering and unsupervised learning, such as K-means,
Expectation Maximization, and Generalized Principal Component
Analysis.
Syllabus
- Image Formation, Feature Extraction, Matching and Correspondences
- Image Formation (MASKS 1,2,3): photometry (radiance,
irradiance, BRDF), geometry (3-D surfaces, pinhole camera,
orthographic, affine, perspective and paracatadioptric projection
models), dynamics (rigid-body motions SO(3), SE(3), so(3), se(3),
twists and exponential maps).
- Feature Extraction, Matching and Correspondences (MASKS 4):
deformation models (2-D translational and 2-D affine), brightness
constancy constraint, tracking and optical flow, aperture problem,
feature extraction, matching and correspondences (sum of squared
differences, normalized cross-correlation), mosaicing.
- Reconstruction of Static Scenes
- Reconstruction from Two Calibrated Views (MASKS 5): epipolar
geometry, epipolar lines, epipoles, essential matrix, essential
manifold, pose recovery using the eight-point algorithm, planar
homographies, structure recovery via triangulation (linear and
optimal).
- Reconstruction from Two Uncalibrated Views
(MASKS 6):
uncalibrated epipolar geometry, fundamental matrix, transfer
properties of the fundamental matrix, camera calibration with a rig,
camera self-calibration, Kruppa equations.
- Reconstruction from Three Views (HZ 15,16):
trilinear constraint, trifocal tensor, transfer properties of
the trifocal tensor, seven-point algorithm, estimation of the
fundamental and camera matrices from the trifocal tensor.
- Reconstruction from Multiple Views
(Factorization,
MASKS 4,9, and HZ 18):
multiframe factorization for affine cameras,
multiple view matrix, multilinear constraints, optimal
reconstruction from multiview normalized epipolar constraint,
minimizing reprojection error, multiframe factorization for perspective cameras.
- Reconstruction of Symmetric Structures
(MASKS 10):
- Reconstruction of Dynamic Scenes
- Nonrigid Shape and Motion Estimation (HZ 18):
shape basis, shape constraints, motion constraints, nonrigid
factorization algorithms, nonrigid multiple view geometry.
- Segmentation of Linear Motion Models from Two Views (VMS 3):
2-D translational from image intensities, 2-D translational,
2-D similarity, 2-D affine and 3-D translational from point
correspondences or optical flow.
- Segmentation of Bilinear Motion Models from Two Views (VMS 7):
multibody brightness constancy constraint, multibody affine
matrix, multibody epipolar constraint, multibody fundamental matrix,
generalized 8-point algorithm, multibody homography.
- Motion Segmentation from Three Perspective Views (VMS 8):
multibody trilinear constraint, multibody trifocal tensor,
estimation and factorization of the multibody trifocal tensor.
- Motion Segmentation from Multiple Views (VMS 3):
motion subspaces, multibody factorization algorithms for affine
cameras, multibody factorization from optical flow in perspective
and central panoramic cameras.
- Recognition of Visual Dynamical Processes
- Modeling Dynamic Textures, Hand Gestures, Human Gaits and Crowd Motions: linear
dynamical models, system identification (N4SID).
- Recognition of Dynamic
Textures, Hand Gestures, Human Gaits and Crowd Motions: metrics on the
space of dynamical models, classification on the space of dynamical models.
Textbooks
- Ma, Soatto, Kosecka and Sastry (MASKS): An Invitation to 3D Vision, Springer Verlag, 2003
- Hartley and Zisserman (HZ): Multiple View Geometry in Computer Vision, Cambridge University Press, second edition, 2004
- Vidal, Ma and Sastry (VMS): Generalized Principal Component Analysis, Springer Verlag, 2009
Grading
- Homeworks (30%): There will be one homework every other
week (approximately). Homework problems will include both
analytical exercises and programming assignments in MATLAB.
- Midterm (40%): Wednesday, April 1st (1:30-2:45 PM)
- Project (30%): There will be a final project to be done in teams of two students.
Each group will either apply techniques from the course to solve a real problem
or solve an open research problem in dynamic vision. Each group
will write a report in LaTeX following the author instructions on
the CVPR 2009 website and give a 20-minute presentation (including 5 minutes for questions) on the scheduled
exam day, May 7th, 2-5 PM.
- Description: 1 page including project title and problem description (April 8)
- Progress report: 3 pages including title, abstract, introduction, problem description and proposed solution (April 22)
- Presentations: 15 min + 5 min questions (May 7)
- Final report: 6 pages including title, abstract, introduction, problem description, proposed solution, experimental evaluation, conclusions and references (May 7)
Administrative
- Late policy:
Homeworks and projects are due on the specified dates.
No late homeworks or projects will be accepted.
- Honor policy:
The strength of the university depends on academic and personal
integrity. In this course, you must be honest and truthful. Ethical
violations include cheating on exams, plagiarism, reuse of
assignments, improper use of the Internet and electronic devices,
unauthorized collaboration, alteration of graded assignments, forgery
and falsification, lying, facilitating academic dishonesty, and unfair competition.
Homeworks and exams will be strictly individual. You will not be allowed
to discuss problems with other students or to reuse solutions to prior assignments
from JHU or other institutions. In some cases you will be allowed to use code
available on the Internet, but you must properly acknowledge the source.
Projects can be done in teams of two students.
Useful Computer Vision Resources