Quantized Classifier: Machine Learning in Go

A general-purpose, high-performance machine learning classifier. Source Code

Examples / tests cover:

  • ASL (American Sign Language) gesture recognition
  • Classify breast cancer from 9 features: 96% accuracy at 100% recall on the first pass with no optimization, using 623 training rows.
  • Predict death or survival of Titanic passengers.
  • Predict diabetes.
  • Please send me data sets you would like added to the tests.

We offer collaboration / consulting services; see: contact

The library includes a TensorFlow deep learning implementation of classifiers using the same data, so run-time performance can be compared alongside classification recall and accuracy.

Quantized Classifier

The design of the Quantized classifier was inspired by design elements of KNN, Bayesian, and SVM engines. A key design goal was a faster mechanism for identifying similarity for a given feature while providing very fast classification on moderate hardware resources.

In KNN we find records similar on a given feature by finding those with the closest values. This works, but it consumes a lot of space and run time. In the quantized approach we look at the range of the data and group records with similar values. For example, if a given feature has values from 0.0 to 1.0, a 10-bucket system could consider all records with a value from 0.0 to 0.1 as similar, those from 0.1 to 0.2 as similar, and so on. Rather than keeping all the training records, we only need to keep statistics on how many of the records in a given bucket belong to each class, which we can then use to compute a base probability by feature, by bucket, by class. Applying this across the active features gives us a set of probabilities that can be combined, using ensemble techniques, into the probability that a given row belongs to each of the classes.
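The bucketing and per-bucket class statistics described above can be sketched in a few lines of Go. This is an illustrative toy, not the library's actual code: the `bucket` helper, the feature range, and the two-class training data are all assumptions made for the example.

```go
package main

import "fmt"

// bucket maps a feature value in [min, max] to one of nBuckets
// quantization buckets (hypothetical helper; names are illustrative).
func bucket(val, min, max float64, nBuckets int) int {
	if max == min {
		return 0
	}
	b := int((val - min) / (max - min) * float64(nBuckets))
	if b >= nBuckets {
		b = nBuckets - 1 // clamp the maximum value into the top bucket
	}
	return b
}

func main() {
	// Toy training set: one feature in [0,1], two classes (0 and 1).
	vals := []float64{0.05, 0.08, 0.12, 0.55, 0.61, 0.58}
	classes := []int{0, 0, 0, 1, 1, 1}

	const nBuckets = 10
	// counts[bucket][class] -- the only state retained after training;
	// the raw training rows can be discarded.
	counts := make([][2]int, nBuckets)
	for i, v := range vals {
		counts[bucket(v, 0, 1, nBuckets)][classes[i]]++
	}

	// Base probability of class 1 for a new value, looked up by bucket.
	b := bucket(0.6, 0, 1, nBuckets)
	tot := counts[b][0] + counts[b][1]
	fmt.Printf("bucket=%d P(class1)=%.2f\n",
		b, float64(counts[b][1])/float64(tot))
}
```

Note that training is a single counting pass over the rows, which is why retraining with a different bucket count is cheap.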

Quantizing the data allows a small memory footprint and fast training without the need to keep all the training records in memory. Retaining only the statistics allows very large training sets with moderate memory use. The trade-off is losing some of KNN's ability to adjust the number of nearest neighbors considered at run time. The offset is that training is so fast that the quanta size can be adjusted quickly, and memory use is so much smaller that we can afford to keep multiple models with different quanta sizes loaded and updated simultaneously.
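Classifying from the retained statistics amounts to looking up each feature's bucket probabilities and combining them across features. A minimal sketch, assuming a simple averaging ensemble (the actual library may weight or combine features differently):

```go
package main

import "fmt"

// classify combines per-feature base probabilities into one score per
// class by averaging them, then returns the highest-scoring class.
// This is an illustrative ensemble, not the library's exact method.
func classify(featProbs [][]float64) int {
	nClasses := len(featProbs[0])
	best, bestScore := 0, -1.0
	for c := 0; c < nClasses; c++ {
		sum := 0.0
		for _, probs := range featProbs {
			sum += probs[c]
		}
		score := sum / float64(len(featProbs))
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	// Per-feature probabilities for classes [0, 1], as looked up from
	// the bucket statistics gathered during training (made-up values).
	featProbs := [][]float64{
		{0.8, 0.2}, // feature 1's bucket favors class 0
		{0.6, 0.4}, // feature 2 agrees, less strongly
		{0.3, 0.7}, // feature 3 disagrees
	}
	fmt.Println("predicted class:", classify(featProbs))
}
```

Because only the small probability tables are consulted, classification stays fast regardless of how many rows were used in training.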

Source code is available on Bitbucket. Most of the source is written in Go, with some utilities and examples in Python 3.5.2. Some utilities, which also act as examples of invocation, are written as Windows .bat files, but they could easily be adapted for use on Linux.

Easy Invocation using CSV Files

This utility uses the classifyFiles executable, built from the Go source code, to read one CSV file of training data and then test its ability to classify the data in a second CSV file. The two files were produced from the source data using the splitCSVFile executable, also included in the system.

[pcsh lang="bash" user="joexdobs" repos="ml-classifier-gesture-recognition" path_id="classifyTestBCancer.bat" revision="default" tab_size="4" hl_lines="" provider="bitbucket"/]

Main Quantized Classifier Go Source File

[pcsh lang="cpp" user="joexdobs" repos="ml-classifier-gesture-recognition" path_id="/src/qprob/classify.go" revision="default" tab_size="2" hl_lines="" provider="bitbucket"/]
