VHamMLL
A machine learning (ML) library for classification using a nearest neighbor algorithm based on Hamming distances.
You can incorporate the VHamMLL functions into your own code, or use the included command line interface (cli.v).
Link to html documentation for the library functions and structs
You can use VHamMLL with your own datasets, or with the publicly available datasets included in the datasets directory.
This table reports balanced accuracy results.
What, another AI package? Is that necessary?
For interactive descriptions of the two key algorithms used by VHamMLL, download the Numbers app spreadsheets:
Description of Ranking Algorithm
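As a rough illustration of the general idea behind nearest-neighbor classification with Hamming distances, here is a minimal sketch in POSIX shell with awk. It is not VHamMLL's implementation: the function names hamming and classify, the bit-string encoding, and the three training cases are all made up for illustration.

```shell
# hamming A B: count the positions at which two equal-length strings differ.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

# Made-up training cases: a class label followed by a bit string.
train="a 1011
a 1001
b 0100"

# classify QUERY: print the label of the training case nearest to QUERY.
# Ties go to the case listed first.
classify() {
    best_label=""
    best_d=""
    while read -r label bits; do
        d=$(hamming "$bits" "$1")
        if [ -z "$best_d" ] || [ "$d" -lt "$best_d" ]; then
            best_d=$d
            best_label=$label
        fi
    done <<EOF
$train
EOF
    echo "$best_label"
}

classify 0110   # prints "b": 0100 differs from 0110 in only one position
```

The query is assigned the class of whichever training case differs from it in the fewest positions.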
Usage:
To use the VHamMLL library in an existing Vlang project:
v install holder66.vhammll
You may also need to install its dependencies, if not automatically installed:
v install vsl
v install Mewzax.chalk
In your v code, add:
import holder66.vhammll
To use the library with the Command Line Interface (CLI):
First, install V if it is not already installed. On macOS, Linux, etc., you need git and a C compiler.
In a terminal:
git clone https://github.com/vlang/v
cd v
make
sudo ./v symlink # add v to your PATH
v install holder66.vhammll
See above regarding dependencies that may also need to be installed.
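Before running the build steps above, it can help to confirm that the tools they rely on are present. The following is a minimal sketch for a POSIX shell. The helper name have is hypothetical, and checking for cc is an assumption: your C compiler may be installed under another name, such as gcc or clang.

```shell
# have NAME: succeed if NAME can be found on the PATH (illustrative helper).
have() {
    command -v "$1" >/dev/null 2>&1
}

# git and make are used by the steps above; cc is an assumed compiler name.
for tool in git make cc; do
    if have "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```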
In a folder or directory that you want to use for your project, you will need to create a file with module main and a function main():
module main
import holder66.vhammll
fn main() {
vhammll.cli()!
}
Assuming you've named the directory or folder vhamml and the file main.v, run commands from within that directory, for example:
v run .
v run . --help
v run . analyze <path_to_dataset_file>
v run . explore --help
v run . explore -h
Note that the publicly available datasets included with the VHamMLL distribution can be found at
~/.vmodules/holder66/vhammll/datasets
That's it!
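To run the analyze command shown above over every dataset in a directory, a small loop can help. This is a sketch, assuming it is run from the project directory containing main.v and that only files ending in .tab are wanted. The function name analyze_all is hypothetical.

```shell
# analyze_all DIR: run "v run . analyze" on each .tab file in DIR.
analyze_all() {
    dir=$1
    for f in "$dir"/*.tab; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        echo "== $f"
        v run . analyze "$f"
    done
}
```

For example, `analyze_all ~/.vmodules/holder66/vhammll/datasets` would analyze each .tab dataset shipped with the distribution in turn.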
Tutorial:
v run . examples go
Updating:
v up # installs the latest release of V
v update # get the latest version of the libraries, including holder66.vhammll
v . # recompile
Getting help:
The V lang community meets on Discord.
For bug reports, feature requests, etc., please raise an issue on GitHub.
Speed things up:
Use the -c (--concurrent) argument in the CLI to make use of available CPU cores for some vhammll functions. This may speed things up (timings are from a 2019 MacBook Pro):
v main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab # 10.157 sec
./main explore -c ~/.vmodules/holder66/vhammll/datasets/iris.tab # 4.910 sec
A huge speedup usually happens if you compile using the -prod (for production) option. The compilation itself takes longer, but the resulting code is highly optimized.
v -prod main.v
./main explore ~/.vmodules/holder66/vhammll/datasets/iris.tab # 3.899 sec
./main explore -c ~/.vmodules/holder66/vhammll/datasets/iris.tab # 4.849 sec!!
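Note that in this case, -prod gives no meaningful speedup when -c is also used (4.849 sec versus 4.910 sec), and with -prod the concurrent run is slower than the non-concurrent one (3.899 sec).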
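The relative speedups implied by the timings quoted above can be worked out with a little arithmetic. This is a small sketch using awk. The function name speedup is hypothetical, and the figures are the timings from this section, each compared against the 10.157 sec baseline.

```shell
# speedup BASE NEW: print BASE / NEW to two decimal places.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

speedup 10.157 4.910   # -c alone
speedup 10.157 3.899   # -prod alone
speedup 10.157 4.849   # -prod together with -c
```

On these figures, -prod alone gives the largest gain, and adding -c on top of it does not help.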
Examples showing use of the Command Line Interface
Please see examples_of_command_line_usage.md
Example: typical use case, a clinical risk calculator
Health care professionals frequently use calculators to inform clinical decision-making. Data on symptoms, physical examination findings, laboratory and imaging results, and outcomes (such as diagnosis, risk of developing a condition, or response to specific treatments) are collected for a sample of patients. These data then form the basis of a formula that predicts the outcome of interest for a new patient, based on how that patient's symptoms and findings compare to those in the dataset.
Please see clinical_calculator_example.md
Example: finding useful information embedded in noise
Please see a worked example here: noisy_data.md
MNIST dataset
The mnist_train.tab file is too large to keep in the repository. If you wish to experiment with it, you can download it by right-clicking on this link, or from the command line:
wget https://henry.olders.ca/datasets/mnist_train.tab
The process of development in its early stages is described in this essay.
Copyright (c) 2017, 2024: Henry Olders.