Shriphani's Goals Blog

Working on targeted short term goals involving graduate study

Obtaining a training dataset

I have approximately 0 experience building something like this. So the question we obviously have is , how does one build a dataset for something like this. I decided that I would record hand data from the kinect for each alphabet and I would try to explore various orientations that are possible.

 

Ok that didn't make much sense - a video should explore better. This is how I am recording "Y":

(download)

The UI for the training tool has 1 button. After you line up the distances etc and think the Kinect's got a good view of your hand, then you can hit Start and it records 100 samples and quits. The auto-quitting is important because I need to walk over to the machine to hit the close button which will assuredly give me bullshit data. There is also a field where you can tell it what you want to label it as (so put your alphabet there).

I will put up some data from my hand later tonight or early tomorrow morning.

This is what the tool looks like.

(download)

Fighting Noise - Working With Tainted Data

Anyone who has worked with SVMs knows that they are not infallible in the face of label noise. However the reality of the situation is that we will have to suffer with noise - there is just no way around it.

How do we combat noise?

First, let us examine the effects of noise. 

Worst Case: Label noise causes some major updates to the weight vector. The result of this is obviously that it might be hard to recover from the noise.

Solution: Tone down the aggressiveness. The algorithm aggressively tries to maintain a margin of 1. In the light of noise, we need to try and tone down the aggressiveness - i.e. it is ok if you can't meet the margin requirement.

This is a very familiar trick if you have experience with soft-margin classifiers.

So, let us work with this slack variable. We assign a parameter C to weigh it.

For a point in the dataset, the slack variable gives it some leeway to meet the margin requirement. For more info about the slack-variable, pick up any book on SVMs and they should have a clear explanation. I will just throw the code in the repo that contains the updates made. 

I promise I will work with some real-world data tomorrow :)

As always, the code can be browsed in the github repo : https://github.com/shriphani/ASLFingerSpell/tree/master/PassiveAggressive/ML

Online Learning

Introduction:

The concepts here are straightforward:

  • You gots a classifier.
  • The classifier classifies.
  • The classifier waits from feedback from you.
  • The classifier updates its parameters.

For this project I am working on, there isn't any decent sized dataset available. The result is that the classifier will just have to learn from the mistakes it makes (a fairly standard HCI concept).

Perceptron is a very simple online learning algorithm. Let us deal with the 2D perceptron situation. You ultimately want a line that will separate the 2 classes. In every feedback step, this line's orientation is changed to accomodate new feedback. After a certain number of training samples, the line's position is fixed and it no longer moves.

 

Of course, the perceptron is a little weak for our algorithm. After a lot of deliberation on IRC for a simple, easy to code up algorithm, I was recommended Coby Krammer and Shalev's Passive-Aggressive algorithms (which I decided to use)[1]. Let us discuss this paper.

 

Binary Classifier:

This guy's task is fairly simple. Pick 1 of 2 classes, wait for feedback and fix the decision boundary if needed.

The decision boundary is a hyperplane (basically think of this as some vector dotted with a generic vector in this space). This hyperplane needs to be tweaked everytime the decision boundary fails. 

At this stage we need a metric to judge the current state of our hyperplane. Consider the following property:

  • Consider all the points in the neighborhood of the hyperplane.
  • Take the points in this cluster and pick the guy that is closest to the hyperplane (there is a simple formula to compute the distance)
  • The distance between the closest point to the hyperplane and the hyperplane itself is called the margin.
  • The margin is used to "judge" the hyperplane. 

Let us explore what "judge" means. Now, if you are tightly constrained on either end by a point, you don't really have much of a wiggle room - your dataset might not be easily classifiable. Your hyperplane could end up misclassifying very easily because it doesn't have a lot of wiggle room to make sweeping adjustments on every mis-classification.

Now, what does an "update" or an "adjustment" look like. In the perceptron it was a straightforward fix where we computed a vector sum and rebuilt the hyperplane.

In the passive-aggressive setting, the main goals that an update aims to achieve is:

  • After the update, we want to be as close as possible to our previous hyperplane
  • At least unit margin must be achieved. (this leads to some solid problems with noisy data)

Now that we have a good hook of the process, let us get a bit more formal with the terms.

Classification Technique: The classifier must throw any point into 1 of 2 classes (duh!). Let us see how we perform a "classification" or a "prediction". Let us consider W to represent the normal to a homogenous hyperplane (our decision boundary).

Obviously, the equation of the hyperplane is given by <w, x>  (this is my notation for a dot product).

We can use || <w, x> || as a good indicator of the confidence we have in our prediction. Simple vector algebra shows us that this value is 0 when x is on the plane. It is a large value if x is very far away from the plane. If we encounter a point on the decision boundary plane, then it obviously is a shitty prediction that we have with us.

Next, we need to label the obtained point as one of 2 classes. Let us go with +1 and -1. So, let us look at <w, x> again. This will end up returning different sign values for points that lie on different sides of the hyperplane.

So, our update will consist of tweaks to w. The w maintained during a round is called w_t (for round #t). 

Now say after an update, we end up with a margin of < 1. We define this as an instantaneous loss. This loss function is 0 when we get a margin that meets our requirements. If we don't, it encodes how short we fell of meeting the margin value of 1.

The main equations are in the paper so I won't write them out here but the goal of the update procedure is :

1. In the entire vector space, pick that weight vector that is closest to the existing weight vector

2. This weight vector should produce a hinge loss of 0. 

The above two conditions help us preserve previous info and still obtain a solid margin. It beats the perceptron since the decision boundary is not just any hyperplane but one that won't be prone to misclassifications.

 

In a later post I will describe how to tweak the algorithm to work with noise and we shall play with some Kinect data.

The C# source for the PA algorithm can be downloaded @ https://github.com/shriphani/ASLFingerSpell/tree/master/PassiveAggressive/ML

 

References:

Filed under  //   ML   c#   code   computer-science   machine learning   online learning  

Open-NI based fingertip tracker

So, the next step for my project would be to acquire some fingertip information. Essentially I dicked around with the Microsoft SDK and found that an open-source project already implemented a basic fingertip tracker for the Kinect - The CandescentNUI project on codeplex. With some help from the owner of the project, Stefan, I was able to get the OpenNI based fingertip tracker working on my machine (no screenshots since his page is more than sufficient for getting this info).

 

Anyway, I will begin performing feature-selection and other such stuff from the data I get. I also hope to get the blog post frequency higher on this site.

 

Exploring The Kinect's Video

The Microsoft Kinect gives us Video, 3D Depth Data and SkeletalData streams to work with. Of these three streams, the one of most interest for me is the 3D Depth Data. The skeletal data is essentially the result of a classifier on the depth data that performs high-speed post estimation (and this also happens to be the reason that the Kinect is a user-friendly consumer electronics device - you don't need to engage in shitty calibration every single time the device must be used). 

So, enough about the Skeletal Data. We are going to look @ the depth + video data. I know that this is an exact mirror of the Sample Skeletal Viewer that comes with the sample. But hey, this is just to learn.

Today, I got the Video Stream to work and I will push the depth stream as well when I can get some other stuff out of the way (aka a stupid mid-term).

Videostreamonly

Playing with the Microsoft Kinect SDK

I was trying out some simple stuff with the SDK. Since this stuff is already extremely well documented elsewhere on the internet, I am only going to upload code to GitHub. You can look @ the repo : https://github.com/shriphani/KinectExplore

Kinect Hacking - CS590ML

This summer, Microsoft gave every intern a free xbox + kinect and since CS590ML requires us to throw machine learning at some project, I decided to make use of the Kinect and do something useful with it. Also I want to test out this Posterous thing. I will target fingerspelling for now since I am not sure how to build a full language model for ASL.

 

Till next time, CIAO.