Fighting Noise - Working With Tainted Data
Anyone who has worked with SVMs knows that they are not infallible in the face of label noise. However, the reality is that we will have to live with noise - there is just no way around it.
How do we combat noise?
First, let us examine the effects of noise.
Worst Case: Label noise causes some major updates to the weight vector. A single mislabeled point can drag the weights far from a good solution, and it might be hard to recover from that.
Solution: Tone down the aggressiveness. The algorithm aggressively tries to maintain a margin of 1 on every point it sees. In the presence of noise, we need to relax this - i.e. it is OK if a point doesn't meet the margin requirement.
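To make the worst case concrete, here is a minimal sketch of the hard-margin passive-aggressive update for binary labels in {-1, +1}. The function name and the toy numbers are made up for illustration, and this is not necessarily how the code in the repo is written.

```python
import numpy as np

def aggressive_update(w, x, y):
    """Hard-margin passive-aggressive step (illustrative; assumes x is non-zero)."""
    loss = max(0.0, 1.0 - y * float(np.dot(w, x)))  # hinge loss against a margin of 1
    if loss == 0.0:
        return w  # margin already met: stay passive
    tau = loss / float(np.dot(x, x))  # smallest step that restores the margin
    return w + tau * y * x

# Toy setup: the weights already classify x as +1 with a comfortable margin.
w = np.array([2.0, 0.0])
x = np.array([1.0, 0.0])

# The same point arrives with a flipped (noisy) label of -1.
w_noisy = aggressive_update(w, x, -1)
print(w_noisy)  # the weight on the first feature changes sign
```

One bad label is enough to flip the prediction on x, because the step size is whatever it takes to satisfy the margin on the noisy point.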
This is a very familiar trick if you have experience with soft-margin classifiers.
So, let us introduce a slack variable. We assign a parameter C to weigh it: a large C penalizes slack heavily and keeps the updates aggressive, while a small C tolerates more margin violations.
The slack variable gives each point in the dataset some leeway in meeting the margin requirement. For more info about slack variables, pick up any book on SVMs; it should have a clear explanation. I will just put the updated code in the repo.
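As a rough sketch of what the slack buys us, one common way to fold C into the passive-aggressive update is to cap the step size at C (the variant usually called PA-I). I am assuming that variant here for illustration; the repo may use a different one, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def soft_update(w, x, y, C):
    """Passive-aggressive step with the step size capped at C (assumed PA-I variant)."""
    loss = max(0.0, 1.0 - y * float(np.dot(w, x)))  # hinge loss against a margin of 1
    if loss == 0.0:
        return w  # margin already met: no update
    # Assumes x is non-zero. Without the cap, tau would fully restore the margin.
    tau = min(C, loss / float(np.dot(x, x)))
    return w + tau * y * x

w = np.array([2.0, 0.0])
x = np.array([1.0, 0.0])

# A mislabeled copy of x (label -1) now moves the weights by at most C * x.
w_soft = soft_update(w, x, -1, C=0.5)
print(w_soft)
```

With C = 0.5 the noisy point only nudges the weights, and x is still classified as +1 afterwards. The price is that genuine mistakes are also corrected more slowly, so C has to be tuned.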
I promise I will work with some real-world data tomorrow :)
As always, the code can be browsed in the GitHub repo: https://github.com/shriphani/ASLFingerSpell/tree/master/PassiveAggressive/ML
