A support vector machine (SVM) is a supervised machine learning algorithm used to classify data. Given two sets of labelled data, an SVM produces a line (or, more generally, a hyperplane) known as the decision boundary that separates the two. New data can then be classified according to which side of the hyperplane it falls on.
An SVM is known as a binary classifier: every data point is assigned to one of exactly two classes.
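The classification rule can be sketched as checking which side of the hyperplane a point lies on. This is a minimal illustration, assuming a trained SVM has already supplied the weight vector `w` and bias `b` (both hypothetical values here, not learned from data):

```python
# Sketch: classifying a point with a given decision boundary w.x + b = 0.
# w and b are assumed to come from an already-trained SVM.
def classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Example boundary: x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0
print(classify([4, 2], w, b))  # score = 3, above the line -> 1
print(classify([0, 1], w, b))  # score = -2, below the line -> -1
```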
Decision Boundary/Hyperplane
The SVM algorithm attempts to find the best hyperplane - the one with the largest margin between the two classes, i.e. the maximum-margin or optimal hyperplane. The SVM chooses the hyperplane so that the distance from it to the nearest data point on each side is maximised.
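The "distance to the nearest point" above is the standard perpendicular distance from a point to the hyperplane, |w·x + b| / ||w||. A small sketch with made-up values (the boundary here is illustrative, not a fitted one):

```python
import math

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

# Line x1 + x2 - 3 = 0; the point (3, 2) has score 2 and ||w|| = sqrt(2),
# so its distance is 2 / sqrt(2) = sqrt(2) ~ 1.414
print(distance_to_hyperplane([3, 2], [1.0, 1.0], -3.0))
```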
For two-dimensional attributes plotted on a Cartesian plane, the hyperplane is a separating line that divides the two classes with the maximum margin. Similarly, in three dimensions, a two-dimensional plane divides the 3D space into two parts and acts as the hyperplane. Therefore, for data of N features or dimensions, a hyperplane of N − 1 dimensions separates it into two classes.
A simple (linear) SVM separates the data in its original space, such as a 1D or 2D dataset, while a more complex SVM uses kernel transformations to ‘lift’ the data into higher dimensions where a linear boundary exists.
The points on each side that lie closest to the decision boundary are called support vectors.
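Support vectors can be pictured as the points at the minimum distance from the boundary. A toy sketch (boundary and points are made up for illustration; a real SVM identifies these during training, not afterwards):

```python
def support_vectors(points, w, b, tol=1e-9):
    """Return the points lying at the minimum distance to w.x + b = 0."""
    norm = sum(wi * wi for wi in w) ** 0.5
    dists = [abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
             for p in points]
    dmin = min(dists)
    return [p for p, d in zip(points, dists) if abs(d - dmin) < tol]

pts = [[0, 0], [1, 0], [3, 3], [4, 4]]
# Boundary x1 + x2 - 3 = 0: [1, 0] is the closest point
print(support_vectors(pts, [1.0, 1.0], -3.0))  # [[1, 0]]
```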
Kernel Transformations
Sometimes the labelled data cannot be separated by a linear boundary. In these cases, a kernel transformation function is used to map the data into a higher-dimensional space where it becomes linearly separable.
Connect to #maths/linear-algebra?
Generally the symbol φ is used to denote the SVM transformation, e.g. φ(x) = (x, x²) lifts 1D data into 2D.
Common kernel transformations are polynomial kernels ( K(x, y) = (x · y + c)^d ) and mod kernels ( φ(x) = |x| ).
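The key property of a kernel is that it equals a dot product in the lifted space, so the SVM never has to compute the transformation explicitly (the "kernel trick"). A sketch for the degree-2 polynomial kernel on scalars, where the explicit feature map φ(x) = (x², √2·x, 1) reproduces K(x, y) = (x·y + 1)²:

```python
import math

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x*y + c)^d for scalar inputs."""
    return (x * y + c) ** d

def phi(x):
    """Explicit degree-2 feature map whose dot product matches the kernel."""
    return (x * x, math.sqrt(2) * x, 1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = 3.0, 2.0
print(poly_kernel(x, y))    # (6 + 1)^2 = 49
print(dot(phi(x), phi(y)))  # 36 + 12 + 1 = 49, same value
```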
Applications
- Image recognition, such as detecting cancerous cells based on millions of images,
- Predicting future driving routes with a well-fitted regression model.
Advantages and Disadvantages
Pros of using SVMs
- Effective on data with multiple features, like financial or medical data.
- Often faster to train than neural networks.
- Effective in cases where the number of features is greater than the number of data points.
- Uses a subset of training points in the decision function, called support vectors, which makes it memory efficient.
Cons of using SVMs
- If the number of features is much larger than the number of data points, over-fitting may occur.
- SVMs don’t directly provide probability estimates for their classifications.
- Require full labelling of input data for supervised training.
- An SVM is only directly applicable to binary classification; multi-class problems must be reduced to multiple binary ones (e.g. one-vs-rest).