Imputation of missing values
For various reasons, many real-world datasets contain missing values, often encoded as blanks, NaNs, or other placeholders.
To fill in these values, you can use the Imputer class.
Constructor Parameters
- $missingValue (mixed) - this value will be replaced (default null)
- $strategy (Strategy) - imputation strategy (ready to use: MeanStrategy, MedianStrategy, MostFrequentStrategy)
- $axis (int) - axis for strategy, Imputer::AXIS_COLUMN or Imputer::AXIS_ROW
- $samples (array) - array of samples to train on (an alternative to calling fit)
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer = new Imputer(null, new MedianStrategy(), Imputer::AXIS_ROW);
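To illustrate what the $axis parameter selects, here is a minimal plain-PHP sketch. It assumes that the column axis gathers the values of one feature across all samples and that the row axis gathers the other values of the same sample. The function valuesAlongAxis and the two constants are hypothetical stand-ins, not part of php-ml, and the library's own implementation may differ.

```php
<?php
// Hypothetical stand-ins for Imputer::AXIS_COLUMN and Imputer::AXIS_ROW.
const AXIS_COLUMN = 0;
const AXIS_ROW = 1;

// Hypothetical helper (not php-ml code): collects the non-missing values
// that a strategy would summarise for the cell at [$row][$column].
function valuesAlongAxis(array $samples, int $row, int $column, int $axis, $missingValue = null): array
{
    $values = $axis === AXIS_ROW
        ? $samples[$row]                    // same sample, all features
        : array_column($samples, $column);  // same feature, all samples

    // Drop the placeholders so they do not take part in the statistic.
    $known = array_filter($values, function ($value) use ($missingValue) {
        return $value !== $missingValue;
    });

    return array_values($known);
}

$samples = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
];

print_r(valuesAlongAxis($samples, 0, 1, AXIS_COLUMN)); // known values in column 1: 3
print_r(valuesAlongAxis($samples, 0, 1, AXIS_ROW));    // known values in row 0: 1, 3, 4
```

With the column axis, the missing cell is filled from what other samples report for that feature. With the row axis, it is filled from the remaining features of the same sample, which only makes sense when the features share a scale.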
Strategy
- MeanStrategy - replace missing values using the mean along the axis
- MedianStrategy - replace missing values using the median along the axis
- MostFrequentStrategy - replace missing values using the most frequent value along the axis
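The three statistics can be sketched in plain PHP as follows. The functions meanOf, medianOf, and mostFrequentOf are hypothetical helpers written for illustration; they are not php-ml's strategy classes. Each one expects a non-empty list of values from which the missing entries have already been removed.

```php
<?php
// Hypothetical helpers, not php-ml code. Each expects a non-empty array
// of numeric values with the missing entries already filtered out.

function meanOf(array $values): float
{
    return array_sum($values) / count($values);
}

function medianOf(array $values): float
{
    sort($values);
    $count = count($values);
    $middle = intdiv($count, 2);

    // Odd count: the middle element. Even count: average of the two middle elements.
    return $count % 2 === 1
        ? (float) $values[$middle]
        : ($values[$middle - 1] + $values[$middle]) / 2;
}

function mostFrequentOf(array $values)
{
    // array_count_values() only supports int and string values.
    $counts = array_count_values($values);

    $best = null;
    $bestCount = 0;
    foreach ($counts as $value => $count) {
        // Strict ">" keeps the first value seen when counts are tied.
        if ($count > $bestCount) {
            $best = $value;
            $bestCount = $count;
        }
    }

    return $best;
}

$known = [1, 4, 8, 8];

echo meanOf($known), PHP_EOL;         // 5.25
echo medianOf($known), PHP_EOL;       // 6
echo mostFrequentOf($known), PHP_EOL; // 8
```

The mean is sensitive to outliers, the median is more robust to them, and the most frequent value is the natural choice for categorical features encoded as integers or strings.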
Example of use
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;
$data = [
[1, null, 3, 4],
[4, 3, 2, 1],
[null, 6, 7, 8],
[8, 7, null, 5],
];
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer->fit($data);
$imputer->transform($data);
/*
$data = [
[1, 5.33, 3, 4],
[4, 3, 2, 1],
[4.33, 6, 7, 8],
[8, 7, 4, 5],
];
*/
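The imputed values above can be checked by hand with a short plain-PHP sketch. The function imputeColumnMeans is a hypothetical stand-in for fit followed by transform with a mean strategy on the column axis. It is not php-ml's implementation, and unlike the example above it returns a new array instead of changing $data in place.

```php
<?php
// Hypothetical stand-in for fit() + transform() with a mean strategy on the
// column axis; php-ml's real implementation may differ in detail.
function imputeColumnMeans(array $train, array $samples, $missingValue = null): array
{
    foreach ($samples as &$sample) {
        foreach ($sample as $column => $value) {
            if ($value !== $missingValue) {
                continue; // only missing cells are replaced
            }

            // Known values of this column in the training data.
            $known = array_filter(
                array_column($train, $column),
                function ($v) use ($missingValue) {
                    return $v !== $missingValue;
                }
            );

            $sample[$column] = array_sum($known) / count($known);
        }
    }
    unset($sample); // break the reference left over from the loop

    return $samples;
}

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$imputed = imputeColumnMeans($data, $data);

echo round($imputed[0][1], 2), PHP_EOL; // 5.33 = (3 + 6 + 7) / 3
echo round($imputed[2][0], 2), PHP_EOL; // 4.33 = (1 + 4 + 8) / 3
echo $imputed[3][2], PHP_EOL;           // 4    = (3 + 2 + 7) / 3
```

Each replacement is the mean of the three known values in the same column, which matches the rounded figures shown in the comment above.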
You can also pass the training data through the $samples constructor parameter instead of calling the fit method:
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;
$data = [
[1, null, 3, 4],
[4, 3, 2, 1],
[null, 6, 7, 8],
[8, 7, null, 5],
];
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN, $data);
$imputer->transform($data);