11-755 MLSP Homework 1

11-755 MLSP Homework 2: Face Detection and Boosting

Part 1: Linear Algebra

The following matrix transforms 4-dimensional vectors into 3-dimensional ones:

 
A =
 
1 2 3 4
3 4 5 7
5 7 9 11
 

 

  1. A 4x1 vector v of length 4 is transformed by A as u = Av. What is the longest that u can be? What is the shortest length of u?
  2. The "Restricted Isometry Property" (RIP) constant of a matrix characterizes the change in length of vectors transformed by sub-matrices of the matrix. For our matrix A, let As be a matrix formed of any s columns of A. If A is MxN, As will be Mxs. We can form As in NCs ways from the N columns of A (we assume that the order of vectors in As is immaterial). Let w be an sx1 vector of length 1. Let lmax be the longest vector that one can obtain by transforming w by any As. Let lmin be the shortest vector obtained by transforming w by any As. The RIP-s constant δs of the matrix A is defined as:
    δs = (lmax - lmin) / (lmax + lmin)
    What is δ2 (i.e. δs for s = 2) for the matrix A given above? Hint: You must consider all 4C2 possible values for A2.

Part 2: Face Detection

This problem too has two parts.

All data for this problem are available from the links in the Downloads section below.

Problem 1: A simple face detector

You are given a corpus of facial images. You must learn a "typical" (i.e. Eigen) face from one of them.

You are also given four group photographs with many faces. You must use the Eigen face you have learnt to detect all faces in the photos.

The faces in the group photo may have different sizes. You must account for these variations.

Programming

Use matlab, if you can. Other similar tools, such as “Octave” or “Python” (which comes with some very nice scientific and visualization libraries) are also reasonable alternatives. The machochistic among you may want to do it all in “C”.

Procedural Details

You are given a collection of images of faces (see the downloads section) from image database 1. These images are all dimension 64x64. From these you must compute the “best” Eigen faces.

Some hints on how to read files into matlab can be found here

You must compute the first Eigen face from this data. To do so, you will have to read all images into a matrix. Here are instructions for building a matrix of images in matlab. You must then compute the first Eigen vector for this matrix. Information on computing Eigen faces from an image matrix can be found here.

To detect faces in the image, you must scan the group photo and identify all regions in it that “match” the patterns in Eigen face most. To “Scan” the image to find matches against an N x M Eigen face, you must match every N x M region of the photo against the Eigen face.

The “match” between any N x M region of an image and an Eigen face is given by the normalized dot product between the Eigen face and the region of the image being evaluated. The normalized dot product between an N x M Eigen face and a corresponding N x M segment of the image is given by E.P / norm(P), where E is the vector (unrolled) representation of the Eigen face, and P is the unrolled vector form of the N x M patch.

A simple matlab loop that scans an image for an Eigen vector is given here

The locations of faces are likely to be where the match score peaks.

Some tricks may be useful to get better results.

Scaling and Rotation

The Eigen face is fixed in size and can only be used to detect faces of approximately the same size as the Eigen face itself. On the other hand faces in the group photos are of different sizes -- they get smaller as the subject gets farther away from the camera.

The solution to this is to make many copies of the eigen face and match them all.

In order to make your detection system robust, resize the Eigen faces from 64 pixels to 32x32, 48x48, 96x96, and 128x128 pixels in size. You can use the scaling techniques we discussed in the linear algebra lecture. Matlab also provides some easy tools for scaling images. You can find information on scaling images in matlab here. Once you've scaled your eigen face, you will have a total of five “typical” faces, one at each level of scaling. You must scan the group pictures with all of the five eigen faces. Each of them will give you a “match” score for each position on the image. If you simply locate the peaks in each of them, you may find all the faces. Sometimes multiple peaks will occur at the same position, or within a few pixels of one another. In these cases, you can merge all of these, they probably all represent the same face.

Additional heuristics may also be required (appropriate setting of thresholds, comparison of peak values from different scaling factors, addiitonal scaling etc.). These are for you to investigate.

Problem 2: A boosting-based face detector

You are a training corpus of facial images. You must learn the first K Eigen faces from the corpus. Set K = 10 initially. Mean and variance normalize the images before computing Eigenfaces.

You are given a second training set of facial images. Express each image as a linear combination of the Eigen faces. i.e., express each face F as
F ≈ wF,1E1 + wF,2E2 + wF,3E3 + ... + wF,KEK
where Ei is the ith Eigen face and wF,i is the weight of the ith Eigen face, when composing face F. wF,i can, of course, be computed as the dot product of w and Ei

It will generally not be possible to represent a face exactly using a limited number of typical faces; as a result there will be an error between the face F and the approximation in terms of the K Eigenfaces. You can also compute the normalized total error in representation as: errF = ||F - Σi wF,iEi||2 / N
where, ||.||2 represents the sum of the squares of the error of each pixel, and N represents the number of pixels in the image.

Represent each face by the set of weights for the Eigen faces and the error, i.e. F → {wF,1, wF,2, wF,3, ..., wF,K, errF}

You are also given a collection of non-face images in the dataset. Represent each of these images too as linear combinations of the Eigen faces, i.e. express each non-face image NF as
NF ≈ wNF,1E1 + wNF,2E2 + wNF,3E3 + ... + wNF,KEK
As before, the weights wNF,i can be computed as dot products. As in the case of faces, the approximation of the non-face images in terms of Eigenfaces will not be exact and will result in error. You can compute the normalized total error as you did for the face images to obtain errNF. In general, this error will be greater for non-face images than faces, since we're trying to compose them from Eigenfaces. Represent each of the non-face images by the set of weights i.e. NF → {wNF,1, wNF,2, ... , wNF,K, errNF}.

The set of weights and the normalized error for the Eigen faces are the features representing all the face and non-face images.

From the set of face and non-face images represented by the Eigenface weights, learn and an Adaboost classifier to classify faces vs. non-faces.

You are given a fourth set which is a collection of face and non-face images. Use the adaboost classifier to classify these images.

The classifier you have learned will be for the same size of images that were used in the training data (64 x 64). Scale the classifier by scaling the Eigenfaces to other sizes (32 x 32, 48 x 48, 96 x 96, 128x 128).

Problem 3 (Bonus)

Scan the group photographs from the class to detect faces using your adaboost classifier.

You can adjust the tradeoff between missing faces and false alarms by comparing the margin (H(x)) of the Adaboost classifier to a threshold other than 0.

Downloads