JHU Computer Vision Machine Learning

Advanced Topics in Computer Vision - Spring 2009

Instructor: Rene Vidal

Class Hours: MW 1:30-2:45 p.m. 316 Hodson

Office Hours: M 3-4 p.m. 302B Clark Hall

Course Description

This class will cover state-of-the-art methods in dynamic vision, with an emphasis on segmentation, reconstruction and recognition of static and dynamic scenes. Topics include: reconstruction of static scenes (tracking and correspondences, multiple view geometry, self calibration), reconstruction of dynamic scenes (2-D and 3-D motion segmentation, nonrigid motion analysis), recognition of visual dynamics (dynamic textures, face and hand gestures, human gaits, crowd motion analysis), as well as geometric and statistical methods for clustering and unsupervised learning, such as K-means, Expectation Maximization, and Generalized Principal Component Analysis.

Syllabus

Image Formation, Feature Extraction, Matching and Correspondences

Image Formation (MASKS 1,2,3): photometry (radiance, irradiance, BRDF), geometry (3-D surfaces, pinhole camera, orthographic, affine, perspective and paracatadioptric projection models), dynamics (rigid-body motions SO(3), SE(3), so(3), se(3), twists and exponential maps).
Feature Extraction, Matching and Correspondences (MASKS 4): deformation models (2-D translational and 2-D affine), brightness constancy constraint, tracking and optical flow, aperture problem, feature extraction, matching and correspondences (sum of square differences, normalized cross correlation), mosaicing.
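As an illustration of the two matching scores named above, the following is a minimal Python sketch, not course material. The function names `ssd` and `ncc`, the `eps` threshold, and the convention of returning 0 for a flat patch are assumptions made for this example.

```python
import numpy as np


def ssd(patch_a, patch_b):
    """Sum of squared differences: 0 for identical patches, larger is worse."""
    a = np.asarray(patch_a, dtype=float)
    b = np.asarray(patch_b, dtype=float)
    return float(np.sum((a - b) ** 2))


def ncc(patch_a, patch_b, eps=1e-12):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = np.asarray(patch_a, dtype=float)
    b = np.asarray(patch_b, dtype=float)
    # Subtracting the mean removes a constant brightness offset.
    a = a - a.mean()
    b = b - b.mean()
    # Dividing by the norms removes a positive gain (contrast) change.
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom < eps:
        return 0.0  # flat patch: score undefined, 0 is an assumed convention
    return float(np.sum(a * b) / denom)
```

SSD compares raw intensities, so it penalizes any brightness change between the two patches. NCC is unaffected by a positive gain and an offset, which is why it is often preferred when illumination differs between views.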

Reconstruction of Static Scenes

Reconstruction from Two Calibrated Views (MASKS 5): epipolar geometry, epipolar lines, epipoles, essential matrix, essential manifold, pose recovery using the eight-point algorithm, planar homographies, structure recovery via triangulation (linear and optimal).
Reconstruction from Two Uncalibrated Views (MASKS 6): uncalibrated epipolar geometry, fundamental matrix, transfer properties of the fundamental matrix, camera calibration with a rig, camera self-calibration, Kruppa equations.
Reconstruction from Three Views (HZ 15,16): trilinear constraint, trifocal tensor, transfer properties of the trifocal tensor, seven-point algorithm, estimation of the fundamental and camera matrices from the trifocal tensor.
Reconstruction from Multiple Views (Factorization, MASKS 4,9, and HZ 18): multiframe factorization for affine cameras, multiple view matrix, multilinear constraints, optimal reconstruction from multiview normalized epipolar constraint, minimizing reprojection error, multiframe factorization for perspective cameras.
Reconstruction of Symmetric Structures (MASKS 10)
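For orientation, the constraint at the center of the two-view topics above can be written as follows. This is a standard statement and not taken from the course notes; the symbols x_1, x_2 (homogeneous image points in calibrated coordinates), R and T (relative rotation and translation) and the hat notation for the skew-symmetric matrix of T are assumed here.

```latex
% Epipolar constraint for two calibrated views.
% x_1, x_2: corresponding image points in homogeneous calibrated coordinates.
% (R, T): relative pose between the views; \widehat{T} is the
% skew-symmetric matrix such that \widehat{T} v = T \times v.
\[
  x_2^{\top} E \, x_1 = 0,
  \qquad
  E = \widehat{T} R .
\]
```

Each correspondence gives one equation that is linear in the nine entries of E, so eight correspondences in general position determine E up to scale. This is the basis of the eight-point algorithm listed above.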

Reconstruction of Dynamic Scenes

Nonrigid Shape and Motion Estimation (HZ 18): shape basis, shape constraints, motion constraints, nonrigid factorization algorithms, nonrigid multiple view geometry.
Segmentation of Linear Motion Models from Two Views (VMS 3): 2-D translational from image intensities, 2-D translational, 2-D similarity, 2-D affine and 3-D translational from point correspondences or optical flow.
Segmentation of Bilinear Motion Models from Two Views (VMS 7): multibody brightness constancy constraint, multibody affine matrix, multibody epipolar constraint, multibody fundamental matrix, generalized 8-point algorithm, multibody homography.
Motion Segmentation from Three Perspective Views (VMS 8): multibody trilinear constraint, multibody trifocal tensor, estimation and factorization of the multibody trifocal tensor.
Motion Segmentation from Multiple Views (VMS 3): motion subspaces, multibody factorization algorithms for affine cameras, multibody factorization from optical flow in perspective and central panoramic cameras.

Recognition of Visual Dynamical Processes

Modeling Dynamic Textures, Hand Gestures, Human Gaits and Crowd Motions: linear dynamical models, system identification (N4SID).
Recognition of Dynamic Textures, Hand Gestures, Human Gaits and Crowd Motions: metrics on the space of dynamical models, classification on the space of dynamical models.
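As a point of reference, a linear dynamical model of the kind listed above is commonly written in the state-space form below. The notation is an assumption for this sketch and may differ from the one used in class.

```latex
% Linear dynamical system (state-space form).
% y_t: observed output at time t (e.g., the vectorized image frame),
% x_t: hidden state, A: state-transition matrix, C: output matrix,
% v_t, w_t: zero-mean process and measurement noise.
\[
  x_{t+1} = A x_t + v_t,
  \qquad
  y_t = C x_t + w_t .
\]
```

System identification methods such as N4SID estimate the parameters (A, C) and the noise statistics from an observed sequence y_1, ..., y_T. Recognition then compares the identified models, not the raw sequences.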

Textbooks

Ma, Soatto, Kosecka and Sastry (MASKS): An Invitation to 3D Vision, Springer Verlag, 2003
Hartley and Zisserman (HZ): Multiple View Geometry in Computer Vision, Cambridge University Press, second edition, 2004
Vidal, Ma and Sastry (VMS): Generalized Principal Component Analysis, Springer Verlag, 2009

Grading

Homeworks (30%): There will be one homework approximately every other week. Homework problems will include both analytical exercises and programming assignments in MATLAB.
Midterm (40%): Wednesday April 1st (1:30-2:45 p.m.)
Project (30%): There will be a final project to be done in teams of two students. Each group will either apply techniques from the course to solve a real problem or solve an open research problem in dynamic vision. Each group will write a report in LaTeX following the author instructions on the CVPR 2009 website and give a 20-minute presentation (including 5 minutes for questions) on the scheduled exam day, May 7th, 2-5 p.m.

Description: 1 page including project title and problem description (April 8)
Progress report: 3 pages including title, abstract, introduction, problem description and proposed solution (April 22)
Presentations: 15 min + 5 min questions (May 7)
Final report: 6 pages including title, abstract, introduction, problem description, proposed solution, experimental evaluation, conclusions and references (May 7)

Administrative

Late policy: Homeworks and projects are due on the specified dates. No late homeworks or projects will be accepted.
Honor policy: The strength of the university depends on academic and personal integrity. In this course, you must be honest and truthful. Ethical violations include cheating on exams, plagiarism, reuse of assignments, improper use of the Internet and electronic devices, unauthorized collaboration, alteration of graded assignments, forgery and falsification, lying, facilitating academic dishonesty, and unfair competition. Homeworks and exams will be strictly individual. You will not be allowed to discuss problems with fellow students or to reuse solutions to prior assignments from JHU or other institutions. In some cases you will be allowed to use code available on the Internet, but you must properly acknowledge the source. Projects can be done in teams of two students.

Useful Computer Vision Resources