update and refactor documentation

Arkadiusz Kondas 2016-05-02 13:49:19 +02:00
parent 55e73b48e9
commit 5950af6072
18 changed files with 434 additions and 62 deletions


@ -37,9 +37,23 @@ composer require php-ai/php-ml
## Features
* Classification
* [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
* [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
* Regression
* [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
* Clustering
* [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
* [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
* Cross Validation
* [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
* Datasets
* [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
* Ready to use:
* [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
* Math
* [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
* [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)
## Contribute


@ -1,11 +1,30 @@
# PHP Machine Learning (PHP-ML)
# PHP Machine Learning library
[![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
[![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
[![Total Downloads](https://poser.pugx.org/php-ai/php-ml/downloads.svg)](https://packagist.org/packages/php-ai/php-ml)
[![License](https://poser.pugx.org/php-ai/php-ml/license.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/quality-score.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/?branch=develop)
Fresh approach to machine learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...
A fresh approach to Machine Learning in PHP. Note that at the moment PHP is not the best choice for machine learning, but maybe this will change...
Simple example of classification:
```php
use Phpml\Classifier\KNearestNeighbors;
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
$classifier = new KNearestNeighbors();
$classifier->train($samples, $labels);
$classifier->predict([3, 2]);
// return 'b'
```
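To make the prediction above easier to follow, here is a minimal brute-force sketch of the k-nearest neighbors vote in plain PHP. It is not the library's implementation: `knnPredict()` is a hypothetical helper, and Euclidean distance with `$k = 3` is assumed because those are the documented defaults.

```php
<?php
// Illustrative sketch only: knnPredict() is a hypothetical helper, not part of PHP-ML.
function knnPredict(array $samples, array $labels, array $query, int $k = 3): string
{
    // Euclidean distance from the query to every training sample.
    $distances = [];
    foreach ($samples as $index => $sample) {
        $sum = 0.0;
        foreach ($sample as $i => $value) {
            $sum += ($value - $query[$i]) ** 2;
        }
        $distances[$index] = sqrt($sum);
    }

    // Sort by distance while keeping the sample indexes as keys.
    asort($distances);
    $nearest = array_slice(array_keys($distances), 0, $k);

    // Count the labels of the k nearest samples and return the most frequent one.
    $votes = [];
    foreach ($nearest as $index) {
        $label = $labels[$index];
        $votes[$label] = ($votes[$label] ?? 0) + 1;
    }
    arsort($votes);

    return (string) array_key_first($votes);
}

$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];

echo knnPredict($samples, $labels, [3, 2]); // prints "b"
```

The three samples closest to `[3, 2]` are `[3, 1]`, `[4, 2]` and `[4, 1]`, all labeled `'b'`, which is why the vote returns `'b'`.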
## Documentation
To find out how to use PHP-ML, follow the [Documentation](http://php-ml.readthedocs.org/).
## Installation
@ -15,14 +34,33 @@ Currently this library is in the process of developing, but You can install it w
composer require php-ai/php-ml
```
## To-Do
## Features
* implements more algorithms
* integration with Lavacharts for data visualization
* Classification
* [k-Nearest Neighbors](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/k-nearest-neighbors/)
* [Naive Bayes](http://php-ml.readthedocs.io/en/latest/machine-learning/classification/naive-bayes/)
* Regression
* [Least Squares](http://php-ml.readthedocs.io/en/latest/machine-learning/regression/least-squares/)
* Clustering
* [k-Means](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/k-means)
* [DBSCAN](http://php-ml.readthedocs.io/en/latest/machine-learning/clustering/dbscan)
* Cross Validation
* [Random Split](http://php-ml.readthedocs.io/en/latest/machine-learning/cross-validation/random-split)
* Datasets
* [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset)
* Ready to use:
* [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
* Math
* [Distance](http://php-ml.readthedocs.io/en/latest/math/distance/)
* [Matrix](http://php-ml.readthedocs.io/en/latest/math/matrix/)
## Testing
## Contribute
After installation, you can launch the test suite in project root directory (you will need to install dev requirements with composer)
- Issue Tracker: github.com/php-ai/php-ml/issues
- Source Code: github.com/php-ai/php-ml
After installation, you can launch the test suite from the project root directory (you will need to install the dev requirements with Composer):
```
bin/phpunit
@ -34,4 +72,4 @@ PHP-ML is released under the MIT Licence. See the bundled LICENSE file for detai
## Author
Arkadiusz Kondas (@ArkadiuszKondas)
Arkadiusz Kondas (@ArkadiuszKondas)


@ -5,7 +5,7 @@ Classifier implementing the k-nearest neighbors algorithm.
### Constructor Parameters
* $k - number of nearest neighbors to scan (default: 3)
* $distanceMetric - Distance class, default Euclidean (see Distance Metric documentation)
* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))
```
$classifier = new KNearestNeighbors($k=4);
@ -14,7 +14,7 @@ $classifier = new KNearestNeighbors($k=3, new Minkowski($lambda=4));
### Train
To train a classifier simply provide train samples and labels (as `array`):
To train a classifier, simply provide training samples and labels (as `array`). Example:
```
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
@ -26,7 +26,7 @@ $classifier->train($samples, $labels);
### Predict
To predict sample class use `predict` method. You can provide one sample or array of samples:
To predict a sample's label, use the `predict` method. You can provide one sample or an array of samples:
```
$classifier->predict([3, 2]);


@ -4,7 +4,7 @@ Classifier based on applying Bayes' theorem with strong (naive) independence ass
### Train
To train a classifier simply provide train samples and labels (as `array`):
To train a classifier, simply provide training samples and labels (as `array`). Example:
```
$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
@ -16,7 +16,7 @@ $classifier->train($samples, $labels);
### Predict
To predict sample class use `predict` method. You can provide one sample or array of samples:
To predict a sample's label, use the `predict` method. You can provide one sample or an array of samples:
```
$classifier->predict([3, 1, 1]);


@ -0,0 +1,27 @@
# DBSCAN clustering
It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature.
*(source: wikipedia)*
### Constructor Parameters
* $epsilon - epsilon, maximum distance between two samples for them to be considered as in the same neighborhood
* $minSamples - number of samples in a neighborhood for a point to be considered as a core point (this includes the point itself)
* $distanceMetric - Distance object, default Euclidean (see [distance documentation](math/distance/))
```
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3, new Minkowski($lambda=4));
```
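How `$epsilon` and `$minSamples` interact can be sketched in plain PHP. This is only an illustration of the core point test, not the library's code: `regionQuery()` and `isCorePoint()` are hypothetical helpers, Euclidean distance is assumed, and treating a distance exactly equal to `$epsilon` as inside the neighborhood is also an assumption.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical and not part of PHP-ML.
function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

// Return every sample within $epsilon of $point (the point itself included).
// Assumption: the boundary is inclusive (distance <= $epsilon).
function regionQuery(array $samples, array $point, float $epsilon): array
{
    return array_values(array_filter(
        $samples,
        function (array $other) use ($point, $epsilon): bool {
            return euclideanDistance($point, $other) <= $epsilon;
        }
    ));
}

// A point is a core point when its neighborhood holds at least $minSamples samples.
function isCorePoint(array $samples, array $point, float $epsilon, int $minSamples): bool
{
    return count(regionQuery($samples, $point, $epsilon)) >= $minSamples;
}

$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];

var_dump(isCorePoint($samples, [1, 1], 2, 3));   // bool(true): [1, 1], [1, 2] and [2, 1] are in range
var_dump(isCorePoint($samples, [1, 1], 0.5, 3)); // bool(false): only the point itself is in range
```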
### Clustering
To divide the samples into clusters, simply use the `cluster` method. It returns an `array` of clusters with their samples inside.
```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
$dbscan->cluster($samples);
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
```


@ -0,0 +1,37 @@
# K-means clustering
The K-Means algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
This algorithm requires the number of clusters to be specified.
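The inertia criterion mentioned above can be computed by hand. The following sketch, with hypothetical helper names `centroid()` and `inertia()`, sums the squared distance of every sample to the mean of its cluster; it only illustrates the quantity being minimized and is not the library's code.

```php
<?php
// Illustrative sketch only: centroid() and inertia() are hypothetical helpers.

// Mean of all samples in one cluster, computed per dimension.
function centroid(array $cluster): array
{
    $center = array_fill(0, count($cluster[0]), 0.0);
    foreach ($cluster as $sample) {
        foreach ($sample as $i => $value) {
            $center[$i] += $value / count($cluster);
        }
    }

    return $center;
}

// Within-cluster sum of squares over all clusters.
function inertia(array $clusters): float
{
    $total = 0.0;
    foreach ($clusters as $cluster) {
        $center = centroid($cluster);
        foreach ($cluster as $sample) {
            foreach ($sample as $i => $value) {
                $total += ($value - $center[$i]) ** 2;
            }
        }
    }

    return $total;
}

$clusters = [
    [[1, 1], [1, 2], [2, 1]],
    [[8, 7], [7, 8], [8, 9]],
];

printf("%.2f\n", inertia($clusters)); // prints "4.00"
```

A grouping that mixes points from the two corners gives a much larger value, which is what the algorithm tries to avoid.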
### Constructor Parameters
* $clustersNumber - number of clusters to find
* $initialization - initialization method, default kmeans++ (see below)
```
$kmeans = new KMeans(2);
$kmeans = new KMeans(4, KMeans::INIT_RANDOM);
```
### Clustering
To divide the samples into clusters, simply use the `cluster` method. It returns an `array` of clusters with their samples inside.
```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$kmeans = new KMeans(2);
$kmeans->cluster($samples);
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
```
### Initialization methods
#### kmeans++ (default)
The k-means++ method selects initial cluster centers for k-means clustering in a smart way to speed up convergence.
It uses the DASV seeding method, which consists of finding good initial centroids for the clusters.
#### random
The random initialization method chooses completely random centroids. It reads the space boundaries to avoid placing a cluster centroid too far from the sample data.
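As a rough sketch of the random strategy, the following plain PHP picks a centroid uniformly inside the bounding box of the samples. The helper name `randomCentroid()` is hypothetical, and the library may compute the boundaries differently.

```php
<?php
// Illustrative sketch only: randomCentroid() is a hypothetical helper, not part of PHP-ML.
function randomCentroid(array $samples): array
{
    $centroid = [];
    $dimensions = count($samples[0]);

    for ($i = 0; $i < $dimensions; ++$i) {
        // Boundaries of the sample space along dimension $i.
        $column = array_column($samples, $i);
        $min = min($column);
        $max = max($column);

        // Uniform random coordinate between $min and $max (inclusive).
        $centroid[] = $min + ($max - $min) * (mt_rand() / mt_getrandmax());
    }

    return $centroid;
}

$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$centroid = randomCentroid($samples);
// $centroid[0] lies between 1 and 8, $centroid[1] between 1 and 9.
```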


@ -12,4 +12,4 @@ Helper class that loads data from CSV file. It extends the `ArrayDataset`.
$dataset = new CsvDataset('dataset.csv', 2, true);
```
See Array Dataset for more information.
See [ArrayDataset](machine-learning/datasets/array-dataset/) for more information.


@ -17,7 +17,7 @@ To load Iris dataset simple use:
$dataset = new Iris();
```
### Several samples
### Several samples example
```
sepal length,sepal width,petal length,petal width,class


@ -4,7 +4,7 @@ Class for calculate classifier accuracy.
### Score
To calculate classifier accuracy score use `score` static method. Parametrs:
To calculate classifier accuracy score use `score` static method. Parameters:
* $actualLabels - (array) true sample labels
* $predictedLabels - (array) predicted labels (e.g. from test group)


@ -1,3 +0,0 @@
# Chebyshev
Class for calculation Chebyshev distance.


@ -1,16 +0,0 @@
# Euclidean
Class for calculation Euclidean distance.
![euclidean](https://upload.wikimedia.org/math/8/4/9/849f040fd10bb86f7c85eb0bbe3566a4.png "Euclidean Distance")
To calculate distance:
```
$a = [4, 6];
$b = [2, 5];
$euclidean = new Euclidean();
$euclidean->distance($a, $b);
// return 2.2360679774998
```


@ -1,16 +0,0 @@
# Manhattan
Class for calculation Manhattan distance.
![manhattan](https://upload.wikimedia.org/math/4/c/5/4c568bd1d76a6b15e19cb2ac3ad75350.png "Manhattan Distance")
To calculate distance:
```
$a = [4, 6];
$b = [2, 5];
$manhattan = new Manhattan();
$manhattan->distance($a, $b);
// return 3
```


@ -1 +0,0 @@
# Minkowski


@ -0,0 +1,51 @@
# LeastSquares Linear Regression
Linear model that uses the least squares method to approximate a solution.
### Train
To train a model, simply provide training samples and target values (as `array`). Example:
```
$samples = [[60], [61], [62], [63], [65]];
$targets = [3.1, 3.6, 3.8, 4, 4.1];
$regression = new LeastSquares();
$regression->train($samples, $targets);
```
### Predict
To predict the target value for a sample, use the `predict` method with the sample to check (as `array`). Example:
```
$regression->predict([64]);
// return 4.06
```
### Multiple Linear Regression
The term multiple attached to linear regression means that two or more sample parameters are used to predict the target.
For example, you can use mileage and production year to predict the price of a car.
```
$samples = [[73676, 1996], [77006, 1998], [10565, 2000], [146088, 1995], [15000, 2001], [65940, 2000], [9300, 2000], [93739, 1996], [153260, 1994], [17764, 2002], [57000, 1998], [15000, 2000]];
$targets = [2000, 2750, 15500, 960, 4400, 8800, 7100, 2550, 1025, 5900, 4600, 4400];
$regression = new LeastSquares();
$regression->train($samples, $targets);
$regression->predict([60000, 1996]);
// return 4094.82
```
### Intercept and Coefficients
After you train your model you can get the intercept and coefficients array.
```
$regression->getIntercept();
// return -7.9635135135131
$regression->getCoefficients();
// return [0.18783783783783]
```
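For the single-feature case, the intercept and coefficient shown above can be reproduced with the closed-form least squares formulas. This is a sketch in plain PHP with a hypothetical helper `fitSimpleLinearRegression()`; how the library solves the general multi-feature case is not shown here.

```php
<?php
// Illustrative sketch only: fitSimpleLinearRegression() is a hypothetical helper.
// slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
// intercept = meanY - slope * meanX
function fitSimpleLinearRegression(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    for ($i = 0; $i < $n; ++$i) {
        $sxy += ($x[$i] - $meanX) * ($y[$i] - $meanY);
        $sxx += ($x[$i] - $meanX) ** 2;
    }

    $slope = $sxy / $sxx;
    $intercept = $meanY - $slope * $meanX;

    return [$intercept, $slope];
}

[$intercept, $slope] = fitSimpleLinearRegression(
    [60, 61, 62, 63, 65],
    [3.1, 3.6, 3.8, 4, 4.1]
);

// Intercept, coefficient, and the prediction for a sample value of 64.
printf("%.4f %.4f %.2f\n", $intercept, $slope, $intercept + $slope * 64);
// prints "-7.9635 0.1878 4.06"
```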

docs/math/distance.md Normal file

@ -0,0 +1,109 @@
# Distance
Selected algorithms require the use of a function for calculating the distance.
### Euclidean
Class for calculating Euclidean distance.
![euclidean](https://upload.wikimedia.org/math/8/4/9/849f040fd10bb86f7c85eb0bbe3566a4.png "Euclidean Distance")
To calculate Euclidean distance:
```
$a = [4, 6];
$b = [2, 5];
$euclidean = new Euclidean();
$euclidean->distance($a, $b);
// return 2.2360679774998
```
### Manhattan
Class for calculating Manhattan distance.
![manhattan](https://upload.wikimedia.org/math/4/c/5/4c568bd1d76a6b15e19cb2ac3ad75350.png "Manhattan Distance")
To calculate Manhattan distance:
```
$a = [4, 6];
$b = [2, 5];
$manhattan = new Manhattan();
$manhattan->distance($a, $b);
// return 3
```
### Chebyshev
Class for calculating Chebyshev distance.
![chebyshev](https://upload.wikimedia.org/math/7/1/2/71200f7dbb43b3bcfbcbdb9e02ab0a0c.png "Chebyshev Distance")
To calculate Chebyshev distance:
```
$a = [4, 6];
$b = [2, 5];
$chebyshev = new Chebyshev();
$chebyshev->distance($a, $b);
// return 2
```
### Minkowski
Class for calculating Minkowski distance.
![minkowski](https://upload.wikimedia.org/math/a/a/0/aa0c62083c12390cb15ac3217de88e66.png "Minkowski Distance")
To calculate Minkowski distance:
```
$a = [4, 6];
$b = [2, 5];
$minkowski = new Minkowski();
$minkowski->distance($a, $b);
// return 2.080
```
You can provide the `lambda` parameter:
```
$a = [6, 10, 3];
$b = [2, 5, 5];
$minkowski = new Minkowski($lambda = 5);
$minkowski->distance($a, $b);
// return 5.300
```
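The four metrics above are closely related: Minkowski distance with `lambda = 1` is the Manhattan distance, with `lambda = 2` it is the Euclidean distance, and Chebyshev distance is its limit as `lambda` grows. The plain PHP sketch below uses hypothetical functions `minkowskiDistance()` and `chebyshevDistance()`; the default `lambda` of 3 is an assumption inferred from the `2.080` result above.

```php
<?php
// Illustrative sketch only: these functions are hypothetical, not the PHP-ML classes.

// (sum of |a_i - b_i|^lambda)^(1 / lambda)
function minkowskiDistance(array $a, array $b, float $lambda = 3.0): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += abs($value - $b[$i]) ** $lambda;
    }

    return $sum ** (1 / $lambda);
}

// Largest absolute difference along any single dimension.
function chebyshevDistance(array $a, array $b): float
{
    $differences = [];
    foreach ($a as $i => $value) {
        $differences[] = abs($value - $b[$i]);
    }

    return (float) max($differences);
}

$a = [4, 6];
$b = [2, 5];

printf("%.3f\n", minkowskiDistance($a, $b, 1)); // prints "3.000" (Manhattan)
printf("%.3f\n", minkowskiDistance($a, $b, 2)); // prints "2.236" (Euclidean)
printf("%.3f\n", minkowskiDistance($a, $b));    // prints "2.080"
printf("%.3f\n", chebyshevDistance($a, $b));    // prints "2.000"
```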
### Custom distance
To apply your own distance function, implement the `Distance` interface. Example:
```
class CustomDistance implements Distance
{
/**
* @param array $a
* @param array $b
*
* @return float
*/
public function distance(array $a, array $b): float
{
$distance = [];
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
$distance[] = $a[$i] * $b[$i];
}
return min($distance);
}
}
```

docs/math/matrix.md Normal file

@ -0,0 +1,129 @@
# Matrix
Class that wraps PHP arrays as a mathematical matrix.
### Creation
To create a Matrix, use plain arrays:
```
$matrix = new Matrix([
[3, 3, 3],
[4, 2, 1],
[5, 6, 7],
]);
```
You can also create a one-dimensional Matrix from a flat array:
```
$flatArray = [1, 2, 3, 4];
$matrix = Matrix::fromFlatArray($flatArray);
```
### Matrix data
Methods for reading data from Matrix:
```
$matrix->toArray(); // cast matrix to PHP array
$matrix->getRows(); // rows count
$matrix->getColumns(); // columns count
$matrix->getColumnValues($column=4); // get values from given column
```
### Determinant
Read more about [matrix determinant](https://en.wikipedia.org/wiki/Determinant).
```
$matrix = new Matrix([
[3, 3, 3],
[4, 2, 1],
[5, 6, 7],
]);
$matrix->getDeterminant();
// return -3
```
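To see where `-3` comes from, the determinant can be computed by cofactor (Laplace) expansion along the first row. The sketch below works on plain PHP arrays with hypothetical helpers `minorMatrix()` and `determinant()`; whether the library uses the same method internally is not stated in these docs.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical, not the Matrix class.

// Copy of $matrix without the given row and column (zero-based indexes).
function minorMatrix(array $matrix, int $row, int $column): array
{
    $minor = [];
    foreach ($matrix as $r => $values) {
        if ($r === $row) {
            continue;
        }
        $newRow = [];
        foreach ($values as $c => $value) {
            if ($c !== $column) {
                $newRow[] = $value;
            }
        }
        $minor[] = $newRow;
    }

    return $minor;
}

// Cofactor expansion along the first row; fine for small square matrices.
function determinant(array $matrix): float
{
    if (count($matrix) === 1) {
        return (float) $matrix[0][0];
    }

    $result = 0.0;
    foreach ($matrix[0] as $column => $value) {
        $sign = $column % 2 === 0 ? 1 : -1;
        $result += $sign * $value * determinant(minorMatrix($matrix, 0, $column));
    }

    return $result;
}

echo determinant([
    [3, 3, 3],
    [4, 2, 1],
    [5, 6, 7],
]); // prints "-3"
```

Written out, the expansion is 3 * 8 - 3 * 23 + 3 * 14 = -3.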
### Transpose
Read more about [matrix transpose](https://en.wikipedia.org/wiki/Transpose).
```
$matrix->transpose();
// return new Matrix
```
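For reference, transposing swaps rows and columns. A minimal sketch on plain PHP arrays, using a hypothetical `transpose()` function rather than the Matrix class:

```php
<?php
// Illustrative sketch only: transpose() is a hypothetical function.
function transpose(array $matrix): array
{
    $result = [];
    foreach ($matrix as $r => $row) {
        foreach ($row as $c => $value) {
            // The element at (row, column) moves to (column, row).
            $result[$c][$r] = $value;
        }
    }

    return $result;
}

$transposed = transpose([
    [3, 3, 3],
    [4, 2, 1],
    [5, 6, 7],
]);
// $transposed is [[3, 4, 5], [3, 2, 6], [3, 1, 7]]
```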
### Multiply
Multiply Matrix by another Matrix.
```
$matrix1 = new Matrix([
[1, 2, 3],
[4, 5, 6],
]);
$matrix2 = new Matrix([
[7, 8],
[9, 10],
[11, 12],
]);
$matrix1->multiply($matrix2);
// result $product = [
// [58, 64],
// [139, 154],
//];
```
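The product above follows the usual row-by-column rule: entry (i, j) is the dot product of row i of the first matrix and column j of the second. A sketch on plain PHP arrays, with a hypothetical `multiply()` function:

```php
<?php
// Illustrative sketch only: multiply() is a hypothetical function, not the Matrix method.
function multiply(array $a, array $b): array
{
    if (count($a[0]) !== count($b)) {
        throw new InvalidArgumentException('Column count of A must equal row count of B');
    }

    $columns = count($b[0]);
    $product = [];
    foreach ($a as $i => $row) {
        for ($j = 0; $j < $columns; ++$j) {
            // Dot product of row $i of A with column $j of B.
            $sum = 0;
            foreach ($row as $k => $value) {
                $sum += $value * $b[$k][$j];
            }
            $product[$i][$j] = $sum;
        }
    }

    return $product;
}

$product = multiply(
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8], [9, 10], [11, 12]]
);
// $product is [[58, 64], [139, 154]]; for example 1 * 7 + 2 * 9 + 3 * 11 = 58
```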
### Divide by scalar
You can divide Matrix by scalar value.
```
$matrix->divideByScalar(2);
```
### Inverse
Read more about [invertible matrix](https://en.wikipedia.org/wiki/Invertible_matrix).
```
$matrix = new Matrix([
[3, 4, 2],
[4, 5, 5],
[1, 1, 1],
]);
$matrix->inverse();
// result $inverseMatrix = [
// [0, -1, 5],
// [1 / 2, 1 / 2, -7 / 2],
// [-1 / 2, 1 / 2, -1 / 2],
//];
```
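One common way to compute an inverse is Gauss-Jordan elimination on the matrix augmented with the identity. The sketch below is an assumption about a workable approach, not a description of how the library does it; `inverse()` is a hypothetical function operating on plain PHP arrays.

```php
<?php
// Illustrative sketch only: inverse() is a hypothetical function using Gauss-Jordan elimination.
function inverse(array $matrix): array
{
    $n = count($matrix);

    // Build the augmented matrix [A | I] with float entries.
    $aug = [];
    for ($i = 0; $i < $n; ++$i) {
        $aug[$i] = array_map('floatval', $matrix[$i]);
        for ($j = 0; $j < $n; ++$j) {
            $aug[$i][] = $i === $j ? 1.0 : 0.0;
        }
    }

    for ($col = 0; $col < $n; ++$col) {
        // Partial pivoting: pick the row with the largest absolute value in this column.
        $pivot = $col;
        for ($row = $col + 1; $row < $n; ++$row) {
            if (abs($aug[$row][$col]) > abs($aug[$pivot][$col])) {
                $pivot = $row;
            }
        }
        if (abs($aug[$pivot][$col]) < 1e-12) {
            throw new RuntimeException('Matrix is singular');
        }
        [$aug[$col], $aug[$pivot]] = [$aug[$pivot], $aug[$col]];

        // Scale the pivot row so the pivot becomes 1.
        $divisor = $aug[$col][$col];
        foreach ($aug[$col] as $j => $value) {
            $aug[$col][$j] = $value / $divisor;
        }

        // Eliminate this column from every other row.
        for ($row = 0; $row < $n; ++$row) {
            if ($row === $col) {
                continue;
            }
            $factor = $aug[$row][$col];
            foreach ($aug[$row] as $j => $value) {
                $aug[$row][$j] = $value - $factor * $aug[$col][$j];
            }
        }
    }

    // The right half of the augmented matrix now holds the inverse.
    return array_map(function (array $row) use ($n): array {
        return array_slice($row, $n);
    }, $aug);
}

$inverse = inverse([
    [3, 4, 2],
    [4, 5, 5],
    [1, 1, 1],
]);
// Up to floating-point rounding, $inverse matches the documented result above.
```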
### Cross out
Cross out given row and column from Matrix.
```
$matrix = new Matrix([
[3, 4, 2],
[4, 5, 5],
[1, 1, 1],
]);
$matrix->crossOut(1, 1);
// result $crossOuted = [
// [3, 2],
// [1, 1],
//];
```


@ -3,20 +3,23 @@ pages:
- Home: index.md
- Machine Learning:
- Classification:
- KNearestNeighbors: machine-learning/classification/knearestneighbors.md
- NaiveBayes: machine-learning/classification/naivebayes.md
- KNearestNeighbors: machine-learning/classification/k-nearest-neighbors.md
- NaiveBayes: machine-learning/classification/naive-bayes.md
- Regression:
- LeastSquares: machine-learning/regression/least-squares.md
- Clustering:
- KMeans: machine-learning/clustering/k-means.md
- DBSCAN: machine-learning/clustering/dbscan.md
- Cross Validation:
- RandomSplit: machine-learning/cross-validation/randomsplit.md
- RandomSplit: machine-learning/cross-validation/random-split.md
- Datasets:
- Array Dataset: machine-learning/datasets/array-dataset.md
- CSV Dataset: machine-learning/datasets/csv-dataset.md
- Demo:
- Ready to use datasets:
- Iris: machine-learning/datasets/demo/iris.md
- Metric:
- Accuracy: machine-learning/metric/accuracy.md
- Distance:
- Euclidean: machine-learning/metric/distance/euclidean.md
- Chebyshev: machine-learning/metric/distance/chebyshev.md
- Manhattan: machine-learning/metric/distance/manhattan.md
- Minkowski: machine-learning/metric/distance/minkowski.md
theme: readthedocs
- Math:
- Distance: math/distance.md
- Matrix: math/matrix.md
theme: readthedocs