diff --git a/README.md b/README.md
index 4b0e6a8..20cb9ca 100644
--- a/README.md
+++ b/README.md
@@ -37,15 +37,19 @@ composer require php-ai/php-ml
 ## Features
 
 * Classification
+    * [SVC](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/svc/)
     * [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
     * [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
 * Regression
     * [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
+    * [SVR](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/svr/)
 * Clustering
     * [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
     * [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
 * Cross Validation
     * [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
+* Feature Extraction
+    * [Token Count Vectorizer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/token-count-vectorizer)
 * Datasets
     * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
 * Ready to use:
diff --git a/docs/index.md b/docs/index.md
index d3f65b7..20cb9ca 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,4 +1,4 @@
-# PHP Machine Learning library
+# PHP-ML - Machine Learning library for PHP
 
 [![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
 [![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
@@ -37,15 +37,19 @@ composer require php-ai/php-ml
 ## Features
 
 * Classification
+    * [SVC](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/svc/)
     * [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
     * [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
 * Regression
     * [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
+    * [SVR](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/svr/)
 * Clustering
     * [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
     * [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
 * Cross Validation
     * [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
+* Feature Extraction
+    * [Token Count Vectorizer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/token-count-vectorizer)
 * Datasets
     * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
 * Ready to use:
diff --git a/docs/machine-learning/classification/svc.md b/docs/machine-learning/classification/svc.md
new file mode 100644
index 0000000..d502dac
--- /dev/null
+++ b/docs/machine-learning/classification/svc.md
@@ -0,0 +1,47 @@
+# Support Vector Classification
+
+Classifier implementing a Support Vector Machine, based on libsvm.
+
+### Constructor Parameters
+
+* $kernel (int) - kernel type to be used in the algorithm (default Kernel::LINEAR)
+* $cost (float) - parameter C of C-SVC (default 1.0)
+* $degree (int) - degree of the Kernel::POLYNOMIAL function (default 3)
+* $gamma (float) - kernel coefficient for `Kernel::RBF`, `Kernel::POLYNOMIAL` and `Kernel::SIGMOID`. If gamma is `null`, 1 / (number of features) is used instead.
+* $coef0 (float) - independent term in the kernel function; only significant for `Kernel::POLYNOMIAL` and `Kernel::SIGMOID` (default 0.0)
+* $tolerance (float) - tolerance of the termination criterion (default 0.001)
+* $cacheSize (int) - cache memory size in MB (default 100)
+* $shrinking (bool) - whether to use the shrinking heuristics (default true)
+* $probabilityEstimates (bool) - whether to enable probability estimates (default false)
+
+```
+$classifier = new SVC(Kernel::LINEAR, $cost = 1000);
+$classifier = new SVC(Kernel::RBF, $cost = 1000, $degree = 3, $gamma = 6);
+```
+
+### Train
+
+To train a classifier, provide training samples and labels (as `array`). Example:
+
+```
+use Phpml\Classification\SVC;
+use Phpml\SupportVectorMachine\Kernel;
+
+$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
+$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
+
+$classifier = new SVC(Kernel::LINEAR, $cost = 1000);
+$classifier->train($samples, $labels);
+```
+
+### Predict
+
+To predict a sample's label, use the `predict` method. You can provide a single sample or an array of samples:
+
+```
+$classifier->predict([3, 2]);
+// return 'b'
+
+$classifier->predict([[3, 2], [1, 5]]);
+// return ['b', 'a']
+```
diff --git a/docs/machine-learning/feature-extraction/token-count-vectorizer.md b/docs/machine-learning/feature-extraction/token-count-vectorizer.md
new file mode 100644
index 0000000..83c6aaa
--- /dev/null
+++ b/docs/machine-learning/feature-extraction/token-count-vectorizer.md
@@ -0,0 +1,50 @@
+# Token Count Vectorizer
+
+Transforms a collection of text samples into vectors of token counts.
+
+### Constructor Parameters
+
+* $tokenizer (Tokenizer) - tokenizer object (see below)
+* $minDF (float) - ignore tokens whose sample frequency is strictly lower than the given threshold. This value is also called cut-off in the literature (default 0)
+
+```
+use Phpml\FeatureExtraction\TokenCountVectorizer;
+use Phpml\Tokenization\WhitespaceTokenizer;
+
+$vectorizer = new TokenCountVectorizer(new WhitespaceTokenizer());
+```
+
+### Transformation
+
+To transform a collection of text samples, use the `transform` method. Example:
+
+```
+$samples = [
+    'Lorem ipsum dolor sit amet dolor',
+    'Mauris placerat ipsum dolor',
+    'Mauris diam eros fringilla diam',
+];
+
+$vectorizer = new TokenCountVectorizer(new WhitespaceTokenizer());
+$vectorizer->transform($samples);
+// return $vector = [
+//    [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 1],
+//    [5 => 1, 6 => 1, 1 => 1, 2 => 1],
+//    [5 => 1, 7 => 2, 8 => 1, 9 => 1],
+//];
+
+```
+
+### Vocabulary
+
+You can extract the vocabulary using the `getVocabulary()` method. Example:
+
+```
+$vectorizer->getVocabulary();
+// return $vocabulary = ['Lorem', 'ipsum', 'dolor', 'sit', 'amet', 'Mauris', 'placerat', 'diam', 'eros', 'fringilla'];
+```
+
+### Tokenizers
+
+* WhitespaceTokenizer - splits text into tokens on whitespace.
+* WordTokenizer - selects tokens of 2 or more alphanumeric characters (punctuation is ignored and always treated as a token separator).
diff --git a/docs/machine-learning/regression/svr.md b/docs/machine-learning/regression/svr.md
new file mode 100644
index 0000000..ed2d10f
--- /dev/null
+++ b/docs/machine-learning/regression/svr.md
@@ -0,0 +1,44 @@
+# Support Vector Regression
+
+Class implementing Epsilon-Support Vector Regression, based on libsvm.
+
+### Constructor Parameters
+
+* $kernel (int) - kernel type to be used in the algorithm (default Kernel::LINEAR)
+* $degree (int) - degree of the Kernel::POLYNOMIAL function (default 3)
+* $epsilon (float) - epsilon in the loss function of epsilon-SVR (default 0.1)
+* $cost (float) - parameter C of epsilon-SVR (default 1.0)
+* $gamma (float) - kernel coefficient for `Kernel::RBF`, `Kernel::POLYNOMIAL` and `Kernel::SIGMOID`. If gamma is `null`, 1 / (number of features) is used instead.
+* $coef0 (float) - independent term in the kernel function; only significant for `Kernel::POLYNOMIAL` and `Kernel::SIGMOID` (default 0.0)
+* $tolerance (float) - tolerance of the termination criterion (default 0.001)
+* $cacheSize (int) - cache memory size in MB (default 100)
+* $shrinking (bool) - whether to use the shrinking heuristics (default true)
+
+```
+$regression = new SVR(Kernel::LINEAR);
+$regression = new SVR(Kernel::LINEAR, $degree = 3, $epsilon = 10.0);
+```
+
+### Train
+
+To train a model, provide training samples and target values (as `array`). Example:
+
+```
+use Phpml\Regression\SVR;
+use Phpml\SupportVectorMachine\Kernel;
+
+$samples = [[60], [61], [62], [63], [65]];
+$targets = [3.1, 3.6, 3.8, 4, 4.1];
+
+$regression = new SVR(Kernel::LINEAR);
+$regression->train($samples, $targets);
+```
+
+### Predict
+
+To predict a sample's target value, use the `predict` method. You can provide a single sample or an array of samples:
+
+```
+$regression->predict([64]);
+// return 4.03
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index a596d91..f833fc3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -3,15 +3,19 @@ pages:
 - Home: index.md
 - Machine Learning:
   - Classification:
+    - SVC: machine-learning/classification/svc.md
     - KNearestNeighbors: machine-learning/classification/k-nearest-neighbors.md
     - NaiveBayes: machine-learning/classification/naive-bayes.md
   - Regression:
     - LeastSquares: machine-learning/regression/least-squares.md
+    - SVR: machine-learning/regression/svr.md
   - Clustering:
     - KMeans: machine-learning/clustering/k-means.md
     - DBSCAN: machine-learning/clustering/dbscan.md
   - Cross Validation:
     - RandomSplit: machine-learning/cross-validation/random-split.md
+  - Feature Extraction:
+    - Token Count Vectorizer: machine-learning/feature-extraction/token-count-vectorizer.md
   - Datasets:
     - Array Dataset: machine-learning/datasets/array-dataset.md
     - CSV Dataset: machine-learning/datasets/csv-dataset.md
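The `$kernel`, `$degree`, `$gamma` and `$coef0` parameters of SVC and SVR feed the kernel functions defined by libsvm. These are linear `u·v`, polynomial `(gamma·u·v + coef0)^degree`, RBF `exp(-gamma·|u-v|^2)` and sigmoid `tanh(gamma·u·v + coef0)`. When gamma is not given, it defaults to 1 / (number of features).

The plain-PHP sketch below evaluates those formulas directly, so you can see what each parameter does. It is a hypothetical illustration, not php-ml code. `kernelValue` and `dot` are made-up helpers, and string kernel names stand in for the `Kernel::*` constants so the sketch runs without the library.

```php
<?php

// Hypothetical helper (not part of php-ml): dot product of two equal-length vectors.
function dot(array $u, array $v): float
{
    $sum = 0.0;
    foreach ($u as $i => $ui) {
        $sum += $ui * $v[$i];
    }
    return $sum;
}

// Hypothetical helper: evaluates the libsvm kernel formulas for two samples.
// String kernel names are an assumption standing in for the Kernel::* constants.
function kernelValue(string $type, array $u, array $v, int $degree = 3, ?float $gamma = null, float $coef0 = 0.0): float
{
    // Mirror the documented default: gamma = 1 / number of features.
    $gamma = $gamma ?? 1.0 / count($u);

    switch ($type) {
        case 'linear':
            // u·v (degree, gamma and coef0 are unused)
            return dot($u, $v);
        case 'polynomial':
            // (gamma·u·v + coef0)^degree
            return ($gamma * dot($u, $v) + $coef0) ** $degree;
        case 'rbf':
            // exp(-gamma·|u - v|^2)
            $squaredDistance = 0.0;
            foreach ($u as $i => $ui) {
                $squaredDistance += ($ui - $v[$i]) ** 2;
            }
            return exp(-$gamma * $squaredDistance);
        case 'sigmoid':
            // tanh(gamma·u·v + coef0)
            return tanh($gamma * dot($u, $v) + $coef0);
        default:
            throw new InvalidArgumentException("Unknown kernel: $type");
    }
}
```

For example, `kernelValue('rbf', [1, 3], [3, 1], 3, 6.0)` evaluates the RBF kernel with gamma 6, as in the `new SVC(Kernel::RBF, ...)` example. A larger gamma makes the kernel value fall off faster as the distance between samples grows.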
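The counting behind the Token Count Vectorizer's `transform` and `getVocabulary` can be made concrete with a minimal plain-PHP sketch of the same idea:

* split each sample on whitespace;
* give every new token the next index, in order of first appearance;
* count occurrences per sample as a sparse array.

`countTokens` is a hypothetical helper, not php-ml's implementation, and it ignores `$minDF`.

```php
<?php

/**
 * Hypothetical sketch of token counting (not php-ml's code).
 * Returns [sparse count vectors, vocabulary], where vocabulary[i] is the token with index i.
 */
function countTokens(array $samples): array
{
    $vocabulary = []; // index => token, in order of first appearance
    $indexOf = [];    // token => index
    $vectors = [];

    foreach ($samples as $sample) {
        $counts = [];
        // Whitespace tokenization, similar in spirit to WhitespaceTokenizer.
        $tokens = preg_split('/\s+/', trim($sample), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            if (!isset($indexOf[$token])) {
                $indexOf[$token] = count($vocabulary);
                $vocabulary[] = $token;
            }
            $index = $indexOf[$token];
            $counts[$index] = ($counts[$index] ?? 0) + 1;
        }
        $vectors[] = $counts;
    }

    return [$vectors, $vocabulary];
}

[$vectors, $vocabulary] = countTokens([
    'Lorem ipsum dolor sit amet dolor',
    'Mauris placerat ipsum dolor',
    'Mauris diam eros fringilla diam',
]);
```

Run on the three Lorem/Mauris samples from the Transformation example, this sketch yields the same sparse vectors and vocabulary that the Token Count Vectorizer documentation lists. php-ml's real implementation may still differ in details the documentation does not show.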