# K-means clustering

The K-Means algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum of squares.
This algorithm requires the number of clusters to be specified.
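
As a rough illustration of the criterion, the inertia of a clustering is the sum, over all clusters, of the squared Euclidean distances between each sample and the mean of its cluster. The sketch below computes it in plain PHP (7.4 or later assumed) for an array of clusters, each holding its samples. `centroid()` and `inertia()` are hypothetical helper names used for illustration, not part of the library.

```php
<?php

/**
 * Mean of a non-empty list of equal-length numeric vectors.
 * Hypothetical helper for illustration only.
 */
function centroid(array $samples): array
{
    $count = count($samples);
    $sums = array_fill(0, count(reset($samples)), 0.0);
    foreach ($samples as $sample) {
        foreach (array_values($sample) as $i => $value) {
            $sums[$i] += $value;
        }
    }

    return array_map(fn (float $sum): float => $sum / $count, $sums);
}

/**
 * Within-cluster sum of squares for an array of clusters,
 * where each cluster is an array of samples.
 */
function inertia(array $clusters): float
{
    $total = 0.0;
    foreach ($clusters as $cluster) {
        $center = centroid($cluster);
        foreach ($cluster as $sample) {
            foreach (array_values($sample) as $i => $value) {
                $total += ($value - $center[$i]) ** 2;
            }
        }
    }

    return $total;
}

// Two tight groups of three samples each.
$clusters = [
    [[1, 1], [1, 2], [2, 1]],
    [[8, 7], [7, 8], [8, 9]],
];

echo inertia($clusters), PHP_EOL;
```

A lower value means the samples sit closer to their cluster means. Moving a sample into the wrong group in `$clusters` should make the result grow.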

### Constructor Parameters

* $clustersNumber - number of clusters to find
* $initialization - initialization method, default kmeans++ (see below)

```
use Phpml\Clustering\KMeans;

$kmeans = new KMeans(2); // 2 clusters, default kmeans++ initialization
$kmeans = new KMeans(4, KMeans::INIT_RANDOM); // 4 clusters, random initialization
```

### Clustering

To divide the samples into clusters, simply use the `cluster` method. It returns an `array` of clusters with the samples inside.

```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
// Or, if you need to keep identifiers along with your samples, use array keys as labels:
$samples = ['Label1' => [1, 1], 'Label2' => [8, 7], 'Label3' => [1, 2]];

$kmeans = new KMeans(2);
$clusters = $kmeans->cluster($samples);
// returns [0 => [[1, 1], ...], 1 => [[8, 7], ...]]
// or, with labels, [0 => ['Label1' => [1, 1], 'Label3' => [1, 2], ...], 1 => ['Label2' => [8, 7], ...]]
```
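When samples carry labels, it can be handy to look up which cluster a given label ended up in. The following is a minimal sketch, assuming the labeled return shape described in the comment above. The `$clusters` array is hard-coded to stand in for the value returned by `cluster()`, and `labelToCluster()` is a hypothetical helper, not a library function.

```php
<?php

/**
 * Builds a [label => cluster index] map from clusters keyed by sample label.
 * Hypothetical helper for illustration only.
 */
function labelToCluster(array $clusters): array
{
    $map = [];
    foreach ($clusters as $index => $cluster) {
        foreach (array_keys($cluster) as $label) {
            $map[$label] = $index;
        }
    }

    return $map;
}

// Stand-in for the result of clustering labeled samples.
$clusters = [
    0 => ['Label1' => [1, 1], 'Label3' => [1, 2]],
    1 => ['Label2' => [8, 7]],
];

$assignments = labelToCluster($clusters);
// $assignments is ['Label1' => 0, 'Label3' => 0, 'Label2' => 1]
```

Since initialization involves randomness, the index assigned to a given group may differ between runs, so it is safer to compare labels by shared index than to rely on a specific index value.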

### Initialization methods

#### kmeans++ (default)

The K-means++ method selects initial cluster centers for k-means clustering in a smart way to speed up convergence.
It uses the DASV seeding method, which consists of finding good initial centroids for the clusters.
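
To make the idea concrete, here is a sketch of the textbook k-means++ seeding procedure: the first center is a sample picked uniformly at random, and each further center is a sample drawn with probability proportional to its squared distance from the nearest center chosen so far. This is an illustration under that assumption, not the library's internal code. `squaredDistance()` and `seedCenters()` are hypothetical names.

```php
<?php

/** Squared Euclidean distance between two equal-length numeric vectors. */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return $sum;
}

/**
 * Picks $k initial centers from $samples with k-means++ style seeding.
 * Hypothetical helper; assumes 1 <= $k <= number of distinct samples.
 */
function seedCenters(array $samples, int $k): array
{
    $samples = array_values($samples);
    // First center: one sample chosen uniformly at random.
    $centers = [$samples[mt_rand(0, count($samples) - 1)]];

    while (count($centers) < $k) {
        // Squared distance from each sample to its nearest chosen center.
        $weights = [];
        foreach ($samples as $sample) {
            $nearest = INF;
            foreach ($centers as $center) {
                $nearest = min($nearest, squaredDistance($sample, $center));
            }
            $weights[] = $nearest;
        }

        // Draw the next center with probability proportional to its weight.
        $threshold = (mt_rand() / mt_getrandmax()) * array_sum($weights);
        $chosen = null;
        foreach ($weights as $i => $weight) {
            if ($weight <= 0.0) {
                continue; // coincides with an existing center
            }
            $chosen = $i;
            $threshold -= $weight;
            if ($threshold <= 0.0) {
                break;
            }
        }
        $centers[] = $samples[$chosen];
    }

    return $centers;
}

$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$centers = seedCenters($samples, 2);
```

Because far-away samples are favored, the two centers tend to land in different groups of `$samples`, which is what lets the main k-means loop converge in fewer iterations.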

#### random

The random initialization method chooses completely random centroids. It first gets the boundaries of the sample space to avoid placing cluster centroids too far from the sample data.
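
One way to picture this is to compute the minimum and maximum of the samples along each dimension and then draw every centroid coordinate uniformly inside that range. The sketch below follows that reading of the description. It is an assumption about the approach, not the library's own code, and `boundaries()` and `randomCentroids()` are hypothetical names.

```php
<?php

/**
 * Per-dimension [min, max] pairs over all samples.
 * Hypothetical helper for illustration only.
 */
function boundaries(array $samples): array
{
    $bounds = [];
    foreach ($samples as $sample) {
        foreach (array_values($sample) as $i => $value) {
            $bounds[$i] = isset($bounds[$i])
                ? [min($bounds[$i][0], $value), max($bounds[$i][1], $value)]
                : [$value, $value];
        }
    }

    return $bounds;
}

/** Draws $k centroids uniformly at random inside the bounding box of $samples. */
function randomCentroids(array $samples, int $k): array
{
    $bounds = boundaries($samples);
    $centroids = [];
    for ($n = 0; $n < $k; $n++) {
        $centroid = [];
        foreach ($bounds as [$min, $max]) {
            $centroid[] = $min + (mt_rand() / mt_getrandmax()) * ($max - $min);
        }
        $centroids[] = $centroid;
    }

    return $centroids;
}

$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$centroids = randomCentroids($samples, 4);
// Each centroid lies within [1, 8] on the first axis and [1, 9] on the second.
```

Unlike k-means++ seeding, the centroids produced this way need not coincide with any actual sample.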