2018-02-10 17:07:09 +00:00
|
|
|
|
# Variance Threshold
|
|
|
|
|
|
2019-11-02 10:41:34 +00:00
|
|
|
|
`VarianceThreshold` is a simple baseline approach to feature selection.
|
|
|
|
|
It removes all features whose variance doesn’t meet some threshold.
|
2018-02-10 17:07:09 +00:00
|
|
|
|
By default, it removes all zero-variance features, i.e. features that have the same value in all samples.
|
|
|
|
|
|
|
|
|
|
## Constructor Parameters
|
|
|
|
|
|
|
|
|
|
* $threshold (float) - features with a variance lower than this threshold will be removed (default 0.0)
|
|
|
|
|
|
|
|
|
|
```php
|
|
|
|
|
use Phpml\FeatureSelection\VarianceThreshold;
|
|
|
|
|
|
|
|
|
|
$transformer = new VarianceThreshold(0.15);
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## Example of use
|
|
|
|
|
|
2019-11-02 10:41:34 +00:00
|
|
|
|
As an example, suppose that we have a dataset with boolean features and
|
2018-02-10 17:07:09 +00:00
|
|
|
|
we want to remove all features that are either one or zero (on or off)
|
2019-11-02 10:41:34 +00:00
|
|
|
|
in more than 80% of the samples.
|
|
|
|
|
Boolean features are Bernoulli random variables, and the variance of such
|
2018-02-10 17:07:09 +00:00
|
|
|
|
variables is given by
|
|
|
|
|
```
|
|
|
|
|
Var[X] = p(1 - p)
|
|
|
|
|
```
|
|
|
|
|
so we can select using the threshold .8 * (1 - .8):
|
|
|
|
|
|
|
|
|
|
```php
|
|
|
|
|
use Phpml\FeatureSelection\VarianceThreshold;
|
|
|
|
|
|
|
|
|
|
$samples = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]];
|
|
|
|
|
$transformer = new VarianceThreshold(0.8 * (1 - 0.8));
|
|
|
|
|
|
|
|
|
|
$transformer->fit($samples);
|
|
|
|
|
$transformer->transform($samples);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
$samples = [[0, 1], [1, 0], [0, 0], [1, 1], [1, 0], [1, 1]];
|
|
|
|
|
*/
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## Pipeline
|
|
|
|
|
|
|
|
|
|
`VarianceThreshold` implements `Transformer` interface so it can be used as part of pipeline:
|
|
|
|
|
|
|
|
|
|
```php
|
|
|
|
|
use Phpml\FeatureSelection\VarianceThreshold;
|
|
|
|
|
use Phpml\Classification\SVC;
|
|
|
|
|
use Phpml\FeatureExtraction\TfIdfTransformer;
|
|
|
|
|
use Phpml\Pipeline;
|
|
|
|
|
|
|
|
|
|
$transformers = [
|
|
|
|
|
new TfIdfTransformer(),
|
|
|
|
|
new VarianceThreshold(0.1)
|
|
|
|
|
];
|
|
|
|
|
$estimator = new SVC();
|
|
|
|
|
|
|
|
|
|
$pipeline = new Pipeline($transformers, $estimator);
|
|
|
|
|
```
|