A support vector machine (SVM) is a supervised machine learning algorithm that classifies data. When an SVM is given two sets of labelled data, it produces a line (or hyperplane), known as the decision boundary, that separates the two. New data can then be classified according to which side of the hyperplane it falls on.

An SVM is known as a binary classifier: all data falls into either one class or the other.
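The classification rule described above can be sketched in a few lines. The weights and points here are hypothetical values chosen for illustration; a real SVM would learn `w` and `b` from the labelled training data.

```python
def classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane
    w . x + b = 0 the point x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Example boundary: the line x0 + x1 - 3 = 0 on a 2D plane.
w, b = [1.0, 1.0], -3.0
print(classify([4, 2], w, b))   # above the line -> 1
print(classify([0, 1], w, b))   # below the line -> -1
```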

Decision Boundary/Hyperplane

The SVM machine learning algorithm attempts to find the best hyperplane - the one with the largest margin between the two classes, known as the maximum-margin or optimal hyperplane. The SVM positions the hyperplane so that the distance from it to the nearest data point on each side is maximised.
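One common way to find such a boundary is sub-gradient descent on the hinge loss. The following is a minimal sketch of that idea, not the exact optimisation a library SVM uses; the toy data and hyper-parameters are illustrative assumptions.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """points: list of (x0, x1) pairs; labels: +1 / -1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Regularisation shrinks w (widening the margin); points inside
            # the margin or misclassified push the boundary away from them.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

pts = [(2, 2), (3, 3), (-2, -2), (-3, -1)]
ys  = [1, 1, -1, -1]
w, b = train_linear_svm(pts, ys)
preds = [1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1 for x in pts]
print(preds)  # matches ys for this linearly separable toy set
```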

For two-dimensional attributes plotted on a Cartesian plane, the hyperplane is a separating line that divides the two classes with the maximum margin. Similarly, in three dimensions, a two-dimensional plane divides the 3D space into two parts and acts as the hyperplane. In general, for data with N features or dimensions, a hyperplane of N − 1 dimensions separates it into two classes.

A simple SVM acts on a 1D or 2D dataset, while a complex SVM uses kernel transformations to ‘lift’ the data into higher dimensions.

The points on each side that lie closest to the decision boundary are called support vectors.
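Given a boundary, the support vectors can be identified by measuring each point's perpendicular distance to the hyperplane. The boundary and points below are hypothetical values for illustration.

```python
import math

def distance(x, w, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

w, b = [1.0, 1.0], -3.0
side_a = [(3, 1), (4, 3)]   # points in class +1
side_b = [(1, 1), (0, 0)]   # points in class -1

# The closest point on each side is a support vector.
sv_a = min(side_a, key=lambda x: distance(x, w, b))
sv_b = min(side_b, key=lambda x: distance(x, w, b))
print(sv_a, sv_b)  # (3, 1) and (1, 1)
```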

Kernel Transformations

Sometimes the labelled data cannot be separated by a linear boundary. In these cases, a kernel transformation function is used to map the data into a higher-dimensional space where a linear separation is possible.

Generally the symbol φ is used to denote the transformation, e.g. a point x is mapped to φ(x).

Common kernel transformations are polynomial kernels (e.g. φ(x) = x²) and mod kernels (φ(x) = |x|).
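The effect of such a transformation can be seen on a small example: 1D data that no single threshold can separate becomes linearly separable after the polynomial lift φ(x) = x². The data and threshold are illustrative.

```python
def phi(x):
    """Polynomial kernel transformation: lift 1D data via x -> x^2."""
    return x * x

xs     = [-3, -2, 2, 3, -1, 0, 1]
labels = [ 1,  1, 1, 1, -1, -1, -1]   # outer points vs inner points

# No single threshold on xs separates the classes, but in the lifted
# space the threshold phi(x) = 2 does.
preds = [1 if phi(x) > 2 else -1 for x in xs]
print(preds == labels)  # True
```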

Applications

  • Image recognition, such as detecting cancerous cells based on millions of images.
  • Predicting future driving routes with a well-fitted regression model.

Advantages and Disadvantages

Pros of using SVMs

  • Effective on data with multiple features, like financial or medical data.
  • Generally faster to train than comparable neural networks.
  • Effective in cases where the number of features is greater than the number of data points.
  • Uses a subset of training points in the decision function, called support vectors, which makes it memory efficient.

Cons of using SVMs

  • If the number of features is much greater than the number of data points, over-fitting may occur.
  • SVMs don’t directly provide probability estimates for their classifications.
  • Require full labelling of input data for supervised training.
  • The SVM is only directly applicable to binary classification.