php-ml/docs/machine-learning/preprocessing/imputation-missing-values.md

68 lines
1.7 KiB
Markdown
Raw Normal View History

2016-05-10 21:44:28 +00:00
# Imputation missing values
2016-05-14 19:30:13 +00:00
For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders.
To solve this problem you can use the `Imputer` class.
## Constructor Parameters
* $missingValue (mixed) - this value will be replaced (default null)
* $strategy (Strategy) - imputation strategy (read to use: MeanStrategy, MedianStrategy, MostFrequentStrategy)
* $axis (int) - axis for strategy, Imputer::AXIS_COLUMN or Imputer::AXIS_ROW
* $samples (array) - array of samples to train
2016-05-14 19:30:13 +00:00
```
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer = new Imputer(null, new MedianStrategy(), Imputer::AXIS_ROW);
```
## Strategy
* MeanStrategy - replace missing values using the mean along the axis
* MedianStrategy - replace missing values using the median along the axis
* MostFrequentStrategy - replace missing using the most frequent value along the axis
## Example of use
```
2016-05-31 16:02:30 +00:00
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;
2016-05-14 19:30:13 +00:00
$data = [
[1, null, 3, 4],
[4, 3, 2, 1],
[null, 6, 7, 8],
[8, 7, null, 5],
];
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer->fit($data);
$imputer->transform($data);
2016-05-14 19:30:13 +00:00
/*
$data = [
[1, 5.33, 3, 4],
[4, 3, 2, 1],
[4.33, 6, 7, 8],
[8, 7, 4, 5],
];
*/
```
You can also use `$samples` constructer parameter instead of `fit` method:
```
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;
$data = [
[1, null, 3, 4],
[4, 3, 2, 1],
[null, 6, 7, 8],
[8, 7, null, 5],
];
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN, $data);
$imputer->transform($data);
```