
11-755 MLSP Homework 2: Face Detection and Boosting

Part 1: Linear Algebra

The following matrix transforms 4-dimensional vectors into 3-dimensional ones:

$$
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 5 & 7 \\ 5 & 7 & 9 & 11 \end{bmatrix}
$$

A 4x1 vector v of length 4 is transformed by A as u = Av. What is the longest that u can be? What is the shortest length of u?
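Both extremes can be reasoned about through the singular value decomposition of A. For any v of fixed length r, the length of Av is at most r times the largest singular value of A, with equality along the first right singular vector. Since A is 3x4, its rank is at most 3, so some nonzero v lies in its null space and maps to the zero vector. The Python/NumPy sketch below checks both facts numerically. It assumes "length" means the Euclidean norm, so r = 4.

```python
import numpy as np

# Matrix A from the problem statement (3 x 4).
A = np.array([[1, 2, 3, 4],
              [3, 4, 5, 7],
              [5, 7, 9, 11]], dtype=float)
r = 4.0  # assumed Euclidean length of v

# Full SVD: the 4 rows of Vt form an orthonormal basis of R^4.
U, s, Vt = np.linalg.svd(A)
v_long = r * Vt[0]    # direction that A stretches the most
v_short = r * Vt[-1]  # orthogonal to the row space of A, i.e. in its null space

longest = np.linalg.norm(A @ v_long)    # equals r * s[0]
shortest = np.linalg.norm(A @ v_short)  # zero up to round-off
print(f"singular values: {s}")
print(f"longest |u| = {longest:.4f}, shortest |u| = {shortest:.1e}")
```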
The "Restricted Isometry Property" (RIP) constant of a matrix characterizes the change in length of vectors transformed by sub-matrices of the matrix. For our matrix A, let $A_s$ be a matrix formed from any s columns of A. If A is MxN, $A_s$ will be Mxs. We can form $A_s$ in $\binom{N}{s}$ ways from the N columns of A (we assume that the order of the columns in $A_s$ is immaterial). Let w be any sx1 vector of length 1. Let $l_{\max}$ be the length of the longest vector that can be obtained by transforming w by any $A_s$, and let $l_{\min}$ be the length of the shortest such vector. The RIP-s constant $\delta_s$ of the matrix A is defined as:
$$\delta_s = \frac{l_{\max} - l_{\min}}{l_{\max} + l_{\min}}$$
What is $\delta_2$ (i.e., $\delta_s$ for s = 2) for the matrix A given above? Hint: you must consider all $\binom{4}{2} = 6$ possible choices of $A_2$.
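For a unit-length w, the length of A_s w ranges between the smallest and largest singular values of A_s. So l_max and l_min can be found by taking the singular values of every sub-matrix. Below is a minimal Python/NumPy sketch of that search, using the definition of δ_s given above.

```python
import itertools
import numpy as np

A = np.array([[1, 2, 3, 4],
              [3, 4, 5, 7],
              [5, 7, 9, 11]], dtype=float)
s = 2

l_max, l_min = 0.0, np.inf
for cols in itertools.combinations(range(A.shape[1]), s):
    A_s = A[:, list(cols)]                      # M x s sub-matrix
    sv = np.linalg.svd(A_s, compute_uv=False)   # singular values, largest first
    l_max = max(l_max, sv[0])    # longest ||A_s w|| over unit-length w
    l_min = min(l_min, sv[-1])   # shortest ||A_s w|| over unit-length w

delta_s = (l_max - l_min) / (l_max + l_min)
print(f"l_max = {l_max:.4f}, l_min = {l_min:.4f}, delta_{s} = {delta_s:.4f}")
```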

Part 2: Face Detection

This part, too, has two problems (plus a bonus problem).

All data for this problem are available from the links in the Downloads section below.

Problem 1: A simple face detector

You are given a corpus of facial images. From them, you must learn a "typical" (i.e., Eigen) face.

You are also given four group photographs with many faces. You must use the Eigen face you have learnt to detect all faces in the photos.

The faces in the group photo may have different sizes. You must account for these variations.

Programming

Use matlab if you can. Other similar tools, such as “Octave” or “Python” (which comes with some very nice scientific and visualization libraries), are also reasonable alternatives. The masochistic among you may want to do it all in “C”.

Procedural Details

You are given a collection of images of faces (see the Downloads section) from image database 1. These images are all 64x64 pixels. From these, you must compute the “best” Eigen faces.

Some hints on how to read files into matlab can be found here.

You must compute the first Eigen face from this data. To do so, you will have to read all images into a matrix. Here are instructions for building a matrix of images in matlab. You must then compute the first Eigen vector for this matrix. Information on computing Eigen faces from an image matrix can be found here.
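Below is a minimal Python/NumPy sketch of this step. It assumes the images are already loaded as 64x64 arrays (random stand-ins below). It also assumes the Eigen face is the principal eigenvector of X Xᵀ, where X holds one unrolled image per column and no mean face is subtracted. The sketch computes that eigenvector as the first left singular vector of X, which avoids forming the 4096x4096 matrix X Xᵀ. The function name `first_eigenface` is hypothetical.

```python
import numpy as np

def first_eigenface(images):
    """First Eigen face of a sequence of equally sized 2-D greyscale arrays."""
    h, w = images[0].shape
    # Unroll each image into a column: X is (h*w) x (number of images).
    X = np.column_stack([img.astype(float).ravel() for img in images])
    # First left singular vector of X = principal eigenvector of X @ X.T.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    e = U[:, 0]
    if e.sum() < 0:   # eigenvectors have arbitrary sign; make it mostly positive
        e = -e
    return e.reshape(h, w)

# Random stand-ins for the 64x64 training faces.
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(64, 64)) for _ in range(20)]
E = first_eigenface(faces)
print(E.shape, np.linalg.norm(E))   # shape (64, 64), unit norm
```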

To detect faces in the image, you must scan the group photo and identify all regions in it that best “match” the pattern in the Eigen face. To “scan” the image for matches against an N x M Eigen face, you must match every N x M region of the photo against the Eigen face.

The “match” between any N x M region of an image and an Eigen face is given by the normalized dot product between the Eigen face and the region of the image being evaluated. For an N x M Eigen face and a corresponding N x M patch of the image, this is $E \cdot P / \|P\|$, where E is the unrolled (vector) representation of the Eigen face and P is the unrolled vector form of the N x M patch.
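In Python/NumPy the score can be written as shown below. `match_score` is a hypothetical helper, and returning 0 for an all-zero patch is an added assumption. Dividing by the norm of P makes the score insensitive to the overall brightness of the patch. Because E has unit length, the score is the cosine of the angle between E and P.

```python
import numpy as np

def match_score(E, P):
    """Normalized dot product E.P / ||P|| between an Eigen face E and an equally sized patch P."""
    e = np.asarray(E, dtype=float).ravel()   # unroll the Eigen face
    p = np.asarray(P, dtype=float).ravel()   # unroll the patch
    norm_p = np.linalg.norm(p)
    if norm_p == 0:                          # all-zero patch: define the score as 0
        return 0.0
    return float(e @ p / norm_p)

E = np.array([[0.6], [0.8]])                 # toy "Eigen face" with unit norm
print(match_score(E, [[3], [4]]))            # patch parallel to E: score is (about) 1
print(match_score(E, [[30], [40]]))          # 10x brighter patch: same score
print(match_score(E, [[4], [-3]]))           # patch orthogonal to E: score is (about) 0
```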

A simple matlab loop that scans an image for an Eigen face is given here.

The locations of faces are likely to be where the match score peaks.
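The following Python/NumPy sketch of the scan stands in for the matlab loop referred to above. The helper names `scan` and `local_peaks`, the peak radius, and the threshold are assumptions. Here a peak is a score above the threshold that is also the largest within a small window around it. The toy check hides a scaled copy of a random template in a faint noise image.

```python
import numpy as np

def scan(image, E):
    """Score map: entry (i, j) is E.P/||P|| for the patch whose top-left corner is (i, j)."""
    image = np.asarray(image, dtype=float)
    N, M = E.shape
    e = E.ravel()
    H, W = image.shape
    scores = np.zeros((H - N + 1, W - M + 1))
    for i in range(H - N + 1):
        for j in range(W - M + 1):
            p = image[i:i + N, j:j + M].ravel()
            n = np.linalg.norm(p)
            scores[i, j] = e @ p / n if n > 0 else 0.0
    return scores

def local_peaks(scores, threshold, radius=2):
    """(i, j, score) for entries above threshold that are maximal within +/- radius pixels."""
    peaks = []
    H, W = scores.shape
    for i in range(H):
        for j in range(W):
            window = scores[max(0, i - radius):i + radius + 1,
                            max(0, j - radius):j + radius + 1]
            if scores[i, j] >= threshold and scores[i, j] == window.max():
                peaks.append((i, j, float(scores[i, j])))
    return peaks

# Toy check: hide a scaled copy of a random 8x8 "Eigen face" in a faint noise image.
rng = np.random.default_rng(0)
E = rng.standard_normal((8, 8))
E /= np.linalg.norm(E)
image = 0.01 * rng.random((40, 50))
image[10:18, 20:28] += 3 * E
scores = scan(image, E)
print(local_peaks(scores, threshold=0.9))   # expected to include a peak at (10, 20)
```

In a real photo the double loop is slow for large templates. Computing all dot products at once with a correlation routine is a faster equivalent.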

Some tricks may be useful to get better results.

Your test image (the group photograph) is in color; your Eigen faces are greyscale. You will have to convert the color photograph to greyscale by taking the mean of the red, green and blue values. The matlab method for doing this is given here.
You will obtain better Eigen faces if all of the faces in the training data are histogram equalized. The faces in the training data all have somewhat different lighting and contrast. These variations can affect your estimate of the Eigen face. Histogram equalization can be performed in matlab as explained here.
You will also be able to detect faces better if you histogram-equalize each patch of the group photo before you evaluate its match to the Eigen face.
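The tricks above can be sketched in Python/NumPy as follows. `to_grey` averages the red, green and blue channels as described. `hist_equalize` is a hypothetical helper that implements the standard cumulative-histogram mapping, assuming grey values in 0 to 255 that are rounded to integer levels. The random photo is a stand-in for a group photograph.

```python
import numpy as np

def to_grey(rgb):
    """Greyscale image as the mean of the red, green and blue channels (H x W x 3 -> H x W)."""
    return np.asarray(rgb, dtype=float).mean(axis=2)

def hist_equalize(img, levels=256):
    """Histogram-equalize an image whose values lie in [0, levels - 1].

    Values are rounded to integer grey levels first; the darkest occupied level maps
    to 0 and the brightest to levels - 1.
    """
    g = np.clip(np.round(img), 0, levels - 1).astype(int)
    hist = np.bincount(g.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]             # cumulative count at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                        # constant image: nothing to spread out
        return g.astype(float)
    lut = np.clip((cdf - cdf_min) / denom, 0.0, 1.0) * (levels - 1)
    return lut[g]                         # look up the new value of every pixel

rng = np.random.default_rng(1)
photo = rng.integers(0, 256, size=(48, 64, 3))    # stand-in for a colour group photo
grey = to_grey(photo)
patch = hist_equalize(grey[8:40, 16:48])          # equalize one 32x32 patch before matching
print(patch.min(), patch.max())                   # 0.0 255.0
```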

Scaling and Rotation

The Eigen face is fixed in size and can only be used to detect faces of approximately the same size as the Eigen face itself. The faces in the group photos, on the other hand, are of different sizes: they get smaller as the subject gets farther from the camera.

The solution to this is to make several copies of the Eigen face at different sizes and match them all.

To make your detection system robust, resize the 64x64 Eigen face to 32x32, 48x48, 96x96, and 128x128 pixels. You can use the scaling techniques we discussed in the linear algebra lecture. Matlab also provides some easy tools for scaling images; you can find information on scaling images in matlab here. Once you have scaled your Eigen face, you will have a total of five “typical” faces, one at each scale. You must scan the group pictures with all five Eigen faces. Each of them will give you a “match” score for each position in the image. If you simply locate the peaks in each score map, you may find all the faces. Sometimes multiple peaks will occur at the same position, or within a few pixels of one another. In these cases you can merge them, since they probably all represent the same face.
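One way to produce the five templates in Python/NumPy is sketched below. It swaps in plain bilinear interpolation (built from `np.interp`) rather than any particular method from the lecture, and `resize_bilinear` and `scaled_eigenfaces` are hypothetical helpers. Each resized face is rescaled to unit length. This is an added assumption that keeps match scores from different scales on the same footing: by the Cauchy-Schwarz inequality, E·P/‖P‖ can never exceed ‖E‖.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array by separable bilinear interpolation (corner pixels stay aligned)."""
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    rows = np.linspace(0, in_h - 1, out_h)   # output rows in input-row coordinates
    cols = np.linspace(0, in_w - 1, out_w)   # output columns in input-column coordinates
    # Interpolate along every row, then along every column of the intermediate result.
    tmp = np.array([np.interp(cols, np.arange(in_w), row) for row in img])       # in_h x out_w
    return np.array([np.interp(rows, np.arange(in_h), col) for col in tmp.T]).T  # out_h x out_w

def scaled_eigenfaces(E, sizes=(32, 48, 64, 96, 128)):
    """Map each size n to a unit-norm copy of the Eigen face resized to n x n."""
    faces = {}
    for n in sizes:
        S = resize_bilinear(E, n, n)
        faces[n] = S / np.linalg.norm(S)   # unit norm keeps E.P/||P|| in [-1, 1] at every scale
    return faces

rng = np.random.default_rng(2)
E = rng.random((64, 64))           # stand-in for the learned 64x64 Eigen face
E /= np.linalg.norm(E)
for n, S in scaled_eigenfaces(E).items():
    print(n, S.shape)
```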

Additional heuristics may also be required (appropriate setting of thresholds, comparison of peak values from different scaling factors, additional scaling, etc.). These are for you to investigate.
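As one example of such a heuristic, the hypothetical `merge_detections` below greedily keeps the highest-scoring detection. It discards any other detection whose centre is closer than a fraction of the larger template size. Each peak (i, j) from an N x N template is assumed to have been converted to a centre (i + N/2, j + N/2), so that detections from different scales can be compared. The 0.5 overlap fraction is an arbitrary starting point to tune.

```python
def merge_detections(detections, overlap=0.5):
    """Greedy merge of detections from all scales.

    detections: list of (row_centre, col_centre, size, score) tuples.
    A detection is dropped if its centre lies closer than overlap * max(size, kept_size)
    to the centre of an already kept, higher-scoring detection.
    """
    kept = []
    for r, c, size, score in sorted(detections, key=lambda d: d[3], reverse=True):
        if all((r - kr) ** 2 + (c - kc) ** 2 >= (overlap * max(size, ks)) ** 2
               for kr, kc, ks, _ in kept):
            kept.append((r, c, size, score))
    return kept

# Two overlapping hits on the same face (64x64 and 48x48 templates) and one separate face.
dets = [(40, 40, 64, 0.91), (42, 41, 48, 0.88), (120, 200, 32, 0.80)]
print(merge_detections(dets))   # [(40, 40, 64, 0.91), (120, 200, 32, 0.8)]
```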

Problem 2: A boosting-based face detector

You are given a training corpus of facial images. You must learn the first K Eigen faces from the corpus. Set K = 10 initially. Mean- and variance-normalize the images before computing the Eigen faces.
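Below is a minimal Python/NumPy sketch of this step. It assumes "mean and variance normalize" means scaling each image to zero mean and unit variance. As in Problem 1, it takes the Eigen faces to be the leading eigenvectors of X Xᵀ, with no dataset-wide mean face subtracted. The helper names are hypothetical and the training faces are random stand-ins.

```python
import numpy as np

def normalize(img):
    """Unroll an image and normalize it to zero mean and unit variance."""
    x = np.asarray(img, dtype=float).ravel()
    x = x - x.mean()
    sd = x.std()
    return x / sd if sd > 0 else x     # leave constant images at all zeros

def top_k_eigenfaces(images, K=10):
    """(num_pixels x K) matrix whose orthonormal columns are the first K Eigen faces."""
    X = np.column_stack([normalize(img) for img in images])   # one column per image
    U, _, _ = np.linalg.svd(X, full_matrices=False)          # columns of U: eigenvectors of X X^T
    return U[:, :K]

rng = np.random.default_rng(3)
faces = [rng.integers(0, 256, size=(64, 64)) for _ in range(30)]   # stand-in training faces
Eig = top_k_eigenfaces(faces, K=10)
print(Eig.shape)    # (4096, 10)
```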

You are given a second training set of facial images. Express each image as a linear combination of the Eigen faces, i.e., express each face F as
$$F \approx w_{F,1}E_1 + w_{F,2}E_2 + w_{F,3}E_3 + \cdots + w_{F,K}E_K$$
where $E_i$ is the $i$-th Eigen face and $w_{F,i}$ is the weight of the $i$-th Eigen face when composing face F. Because the Eigen faces are orthonormal, $w_{F,i}$ can be computed as the dot product of F and $E_i$.

It will generally not be possible to represent a face exactly using a limited number of typical faces; as a result, there will be an error between the face F and its approximation in terms of the K Eigen faces. You can compute the normalized total error in representation as:
$$\mathrm{err}_F = \frac{1}{N}\left\| F - \sum_{i=1}^{K} w_{F,i} E_i \right\|^2$$
where $\|\cdot\|^2$ denotes the sum of the squared errors over all pixels, and N is the number of pixels in the image.

Represent each face by its Eigen face weights and its error, i.e., $F \rightarrow \{w_{F,1}, w_{F,2}, w_{F,3}, \ldots, w_{F,K}, \mathrm{err}_F\}$.
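The weights and error can be computed together as shown below. The hypothetical `eigen_features` helper assumes the Eigen faces are stored as orthonormal columns of a matrix. It also assumes the image has been normalized the same way as the training faces. The same function applies unchanged to the non-face images described next.

```python
import numpy as np

def eigen_features(img, Eig):
    """Map an image to [w_1, ..., w_K, err] given orthonormal Eigen faces.

    img : 2-D array, assumed already mean- and variance-normalized like the training faces.
    Eig : (num_pixels x K) matrix with one orthonormal Eigen face per column.
    """
    F = np.asarray(img, dtype=float).ravel()
    w = Eig.T @ F                       # w_i = F . E_i
    residual = F - Eig @ w              # F - sum_i w_i E_i
    err = residual @ residual / F.size  # squared error summed over pixels, divided by N
    return np.append(w, err)

# Toy check with 3 orthonormal "Eigen faces" of size 4x4.
rng = np.random.default_rng(4)
Eig, _ = np.linalg.qr(rng.standard_normal((16, 3)))
img = (Eig @ np.array([1.0, 2.0, 3.0])).reshape(4, 4)   # exactly representable image
print(np.round(eigen_features(img, Eig), 6))            # approximately [1, 2, 3, 0]
```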

You are also given a collection of non-face images in the dataset. Represent each of these images, too, as a linear combination of the Eigen faces, i.e., express each non-face image NF as
$$NF \approx w_{NF,1}E_1 + w_{NF,2}E_2 + w_{NF,3}E_3 + \cdots + w_{NF,K}E_K$$
As before, the weights $w_{NF,i}$ can be computed as dot products. As with the faces, the approximation of the non-face images in terms of Eigen faces will not be exact and will result in an error. Compute the normalized total error as you did for the face images to obtain $\mathrm{err}_{NF}$. In general, this error will be greater for non-face images than for faces, since the Eigen faces were learned from faces. Represent each non-face image by its weights and error, i.e., $NF \rightarrow \{w_{NF,1}, w_{NF,2}, \ldots, w_{NF,K}, \mathrm{err}_{NF}\}$.

The Eigen face weights and the normalized error are the features representing all the face and non-face images.

From the face and non-face images represented by these features, learn an AdaBoost classifier to classify faces vs. non-faces.
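The assignment does not prescribe a weak learner; a common choice is a decision stump, which thresholds a single feature. The sketch below is discrete AdaBoost with stumps in Python/NumPy. It assumes labels +1 for faces and -1 for non-faces. The helper names are hypothetical, and the training data are synthetic stand-ins for the feature vectors {w_1, ..., w_K, err}.

```python
import numpy as np

def stump_predict(X, f, t, s):
    """Decision stump: s if X[:, f] > t, else -s."""
    return s * np.where(X[:, f] > t, 1, -1)

def train_stump(X, y, d):
    """Return (weighted error, feature, threshold, polarity) of the best stump under weights d."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        v = np.unique(X[:, f])
        # Candidate thresholds: one below every value, plus midpoints between neighbours.
        for t in np.concatenate(([v[0] - 1.0], (v[:-1] + v[1:]) / 2)):
            for s in (1, -1):
                err = d[stump_predict(X, f, t, s) != y].sum()
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def adaboost_train(X, y, T=50):
    """Discrete AdaBoost with decision stumps; y holds +1 (face) / -1 (non-face)."""
    d = np.full(len(y), 1.0 / len(y))          # start with uniform example weights
    model = []
    for _ in range(T):
        err, f, t, s = train_stump(X, y, d)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        d = d * np.exp(-alpha * y * stump_predict(X, f, t, s))  # up-weight mistakes
        d /= d.sum()
        model.append((alpha, f, t, s))
    return model

def margin(model, X):
    """H(x) = sum_t alpha_t h_t(x); predict 'face' where H(x) > 0."""
    return sum(alpha * stump_predict(X, f, t, s) for alpha, f, t, s in model)

rng = np.random.default_rng(5)
X = rng.standard_normal((150, 11))                  # stand-in for [w_1..w_10, err] features
y = np.where(X[:, 0] + 0.5 * X[:, 10] > 0, 1, -1)   # synthetic face / non-face labels
model = adaboost_train(X, y, T=15)
accuracy = np.mean(np.sign(margin(model, X)) == y)
print(f"training accuracy: {accuracy:.3f}")
```

The exhaustive threshold search is quadratic in the number of examples per feature. For large training sets, sorting each feature once and sweeping the cumulative weights is a faster equivalent.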

You are given a fourth set, which is a collection of face and non-face images. Use the AdaBoost classifier to classify these images.

The classifier you have learned works on images of the same size as those in the training data (64 x 64). Scale the classifier to other sizes (32 x 32, 48 x 48, 96 x 96, 128 x 128) by scaling the Eigen faces.

Problem 3 (Bonus)

Scan the group photographs from the class to detect faces using your AdaBoost classifier.

You can adjust the tradeoff between missed faces and false alarms by comparing the margin H(x) of the AdaBoost classifier to a threshold other than 0.
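To choose such a threshold, one option is to measure the miss and false-alarm rates on labelled data at several thresholds. Below is a minimal sketch with a hypothetical helper and made-up margins.

```python
import numpy as np

def miss_and_false_alarm(H, y, threshold):
    """Miss and false-alarm rates when 'face' is predicted for H(x) > threshold.

    H : margins H(x) of the AdaBoost classifier; y : +1 for faces, -1 for non-faces.
    """
    H, y = np.asarray(H, dtype=float), np.asarray(y)
    predicted_face = H > threshold
    miss = np.mean(~predicted_face[y == 1])          # fraction of faces rejected
    false_alarm = np.mean(predicted_face[y == -1])   # fraction of non-faces accepted
    return float(miss), float(false_alarm)

# Toy margins: the first three examples are faces, the last three are not.
H = [2.0, 0.5, -0.3, 1.2, -1.5, 0.2]
y = [1, 1, 1, -1, -1, -1]
for thr in (-0.5, 0.0, 1.0):
    miss, fa = miss_and_false_alarm(H, y, thr)
    print(f"threshold {thr:+.1f}: miss rate {miss:.2f}, false-alarm rate {fa:.2f}")
```

With these toy margins, raising the threshold from -0.5 to 1.0 moves the rates from (0.00 misses, 0.67 false alarms) to (0.67, 0.33). The classifier makes fewer false alarms at the cost of more missed faces.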

Downloads

Training data for problem 1: Download the training database of faces from here. Each image in this corpus should be 64x64 and in grayscale. The corpus was obtained from the LFWcrop Database.
Test data for problem 1 and the bonus problem 3: Here is a set of pictures that you may recognize. You must detect the faces in these pictures.
Train and test data for problem 2: Here is a collection of face and non-face data for problem 2. Use the data in the "train" subdirectory to train your classifier and classify the data in the "test" subdirectory. Also use the trained classifiers for problem 3.
Submission Details

The HW is due on Oct 20, 2011. What to submit:
1. a brief write-up of what you did
2. the segments that your detector found to be faces. You may either save those segments as individual files in a folder, or mark them on the group photograph. Make sure we can tell which part(s) of the image were detected as faces.
3. You do not need to submit code. However, we might ask you for the code at a later point, so try to document the code well.
How to submit:
Put the above in a zipfile and name it "MLSP-HW2-FirstnameLastname.zip".
Email the zip file to me, cc-ing Anoop and Manuel, with "MLSP HW 2 Problem 1 Submission" as the subject line.
You may include any other information if necessary in a "Readme.txt" in the zip file that you submit. Do not put important information in the body of the email.