Although machine learning is not OpenCV's primary specialization, its built-in classifiers (k-Nearest Neighbors and Support Vector Machines) come in handy for real-life tasks that also need some of its image preprocessing functions. To illustrate how to use them, we will use the hot dog data set.

This data set contains pictures of, you guessed it, hot dogs and non-hot dogs. For example:

A tasty hot dog.

Before doing any machine learning, we need to transform our tasty hot dog into something that the SVM classifier can use. We will create a matrix in which each row holds the vector representation of one image. In machine learning slang, this is called feature extraction; in the computer vision community, the close equivalent of a feature (in the machine learning sense) is a descriptor. We will use the histogram of oriented gradients (HOG) as our descriptor/feature. Very loosely speaking, this is a frequency count of the gradient directions (i.e. the directions of the two-dimensional derivatives) of our image. The full theory behind this algorithm is beyond the scope of this post; suffice it to say that we will calculate something like this:

Histogram of Oriented Gradients of our hot dog.
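To make the "frequency count of gradients" idea concrete, here is a minimal NumPy sketch for a single cell. It is a simplification and not OpenCV's implementation: the real HOG descriptor also interpolates between bins and normalizes over blocks of cells. The function name `cell_orientation_histogram` is made up for this illustration.

```python
import numpy as np

def cell_orientation_histogram(cell, nbins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(np.float64)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=nbins, range=(0.0, 180.0), weights=magnitude)
    return hist

# A horizontal intensity ramp: every gradient points along x (angle 0).
ramp = np.tile(np.arange(5, dtype=np.float64), (5, 1))
hist = cell_orientation_histogram(ramp)
print(hist)  # all of the weight lands in the first bin
```

For the ramp, every pixel has a gradient of magnitude 1 at angle 0, so the first bin collects all 25 votes and the other bins stay empty.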

We can calculate this using OpenCV functions:

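import cv2

def calc_hog(img):
    # Resize so that every image yields a feature vector of the same length.
    img = cv2.resize(img, (50,50))
    win_size = (20, 20) # default: (64,128)
    block_size = (10, 10) # default: (16,16)
    block_stride = (5, 5) # default: (8,8)
    cell_size = (5, 5) # default: (8,8)
    nbins = 9
    hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
    hog_features = hog.compute(img)
    # compute() returns a column vector or a flat array depending on the OpenCV version;
    # reshape to a single row in either case.
    return hog_features.reshape(1, -1)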
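As a sanity check on these parameters, the number of values HOG produces for one window follows from simple arithmetic: blocks per window, times cells per block, times bins. The helper below is a hypothetical illustration in plain Python, not part of OpenCV.

```python
def hog_window_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of HOG values produced for one detection window."""
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins

length = hog_window_length((20, 20), (10, 10), (5, 5), (5, 5), 9)
print(length)  # 324
```

The descriptor object should report the same per-window number through `hog.getDescriptorSize()`. Because the resized 50x50 image is larger than the 20x20 window, `hog.compute` evaluates several window positions, so the length of the returned vector should be a multiple of this value.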

Note that OpenCV does not have (as of version 4.0.0) an easy way to create the visualization above for the HOG descriptor. We can use the scikit-image library for this:

import matplotlib.pyplot as plt
from skimage.feature import hog
# img is assumed to be a single-channel (grayscale) image here
fd, hog_image = hog(img, visualize=True)
plt.imshow(hog_image, cmap="gray");

We now create the training set.

## Getting the train set

import glob
import numpy as np

train_hd_files = glob.glob("../data/hotdog/train/hot_dog/*")
train_nhd_files = glob.glob("../data/hotdog/train/not_hot_dog/*")

train_size = len(train_hd_files) + len(train_nhd_files)
# hog_features is local to calc_hog, so take the feature length from one sample image
n_features = calc_hog(cv2.imread(train_hd_files[0])).size

X_train = np.zeros((train_size,n_features))
y_train = np.zeros((train_size,1))

for i,f in enumerate(train_hd_files):
    img = cv2.imread(f)
    feat = calc_hog(img)
    X_train[i,:] = feat
    y_train[i] = 1 #hot dog is 1

for i,f in enumerate(train_nhd_files):
    img = cv2.imread(f)
    feat = calc_hog(img)
    X_train[i+len(train_hd_files),:] = feat
    # not hot dog is 0, which y_train already holds from np.zeros
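The two loops follow the same pattern, so they could be folded into one helper that takes the feature function as an argument. The sketch below uses the hypothetical names `build_dataset` and `fake_features`; the stand-in featurizer only exists so that the example runs without any image files.

```python
import numpy as np

def build_dataset(pos_items, neg_items, featurize):
    """Stack feature rows for positive items first, then negative items."""
    items = list(pos_items) + list(neg_items)
    X = np.vstack([np.asarray(featurize(item)).reshape(1, -1) for item in items])
    y = np.zeros((len(items), 1))
    y[:len(pos_items)] = 1  # positives (hot dogs) are labeled 1, the rest stay 0
    return X, y

# Stand-in featurizer so the sketch runs without image files.
def fake_features(item):
    return np.full(4, float(item))

X_demo, y_demo = build_dataset([1, 2, 3], [7, 8], fake_features)
print(X_demo.shape)    # (5, 4)
print(y_demo.ravel())  # [1. 1. 1. 0. 0.]
```

With the real data, the featurizer would be something like `lambda f: calc_hog(cv2.imread(f))`, applied to the two file lists.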

For use in OpenCV, we need to convert the arrays to the correct data types! The cv2.ml classifiers expect the samples as 32-bit floats and the class labels as 32-bit integers, because the Python bindings hand the NumPy arrays directly to the underlying C++ functions, which are strict about the element type.

X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.int32)
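The conversion is needed because `np.zeros` creates 64-bit floats by default. A quick check on small stand-in arrays (the names `samples` and `labels` are only for illustration) shows the change:

```python
import numpy as np

samples = np.zeros((3, 4))  # np.zeros defaults to float64
labels = np.zeros((3, 1))

samples32 = samples.astype(np.float32)  # samples as 32-bit floats
labels32 = labels.astype(np.int32)      # class labels as 32-bit integers

print(samples.dtype, "->", samples32.dtype)  # float64 -> float32
print(labels.dtype, "->", labels32.dtype)    # float64 -> int32
```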

Now we are all set! We can create an instance of our model and train it.

svm_model = cv2.ml.SVM_create()
svm_model.setGamma(2)
svm_model.setC(1)
svm_model.setKernel(cv2.ml.SVM_RBF)
svm_model.setType(cv2.ml.SVM_C_SVC)
svm_model.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
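Once the model is trained, a quick way to check it is to compare its predictions with the labels. `svm_model.predict(X)` returns a tuple whose second element is a float32 array of predicted labels with one row per sample. The `accuracy` helper below is a hypothetical sketch that works on arrays of that shape; it uses synthetic values so that it runs without a trained model.

```python
import numpy as np

def accuracy(predicted, labels):
    """Fraction of rows where the predicted class equals the true label."""
    predicted = np.asarray(predicted).ravel().astype(np.int32)
    labels = np.asarray(labels).ravel().astype(np.int32)
    return float(np.mean(predicted == labels))

# Synthetic stand-ins shaped like the prediction and label arrays.
fake_pred = np.array([[1.0], [0.0], [1.0], [1.0]], dtype=np.float32)
fake_labels = np.array([[1], [0], [0], [1]], dtype=np.int32)
print(accuracy(fake_pred, fake_labels))  # 0.75
```

With the real model this would be `_, pred = svm_model.predict(X_train)` followed by `accuracy(pred, y_train)`. Keep in mind that accuracy measured on the training data overestimates how well the model does on new images.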

To generate an individual prediction, we need to ensure that the input array has shape (1, num_dim). This can be done by passing a one-row slice:

svm_model.predict(X_train[0:1])

Because NumPy slices exclude the upper bound, this selects only the first instance of the training set while keeping the array two-dimensional. Alternatively, we can use reshape:

svm_model.predict(X_train[0].reshape(1,-1))
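The difference between the indexing styles is easy to see on a small stand-in array (the name `X_demo` is only for illustration): plain indexing drops a dimension, while slicing and reshaping both keep the row two-dimensional.

```python
import numpy as np

X_demo = np.arange(12, dtype=np.float32).reshape(3, 4)

print(X_demo[0].shape)                 # (4,)
print(X_demo[0:1].shape)               # (1, 4)
print(X_demo[0].reshape(1, -1).shape)  # (1, 4)
```

Both two-dimensional forms hold the same values, so either one can be passed to `predict`.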