php-ml/docs/machine-learning/preprocessing/imputation-missing-values.md
Attila Bakos 7d5c6b15a4 Updates to the documentation (linguistic corrections) (#414)
* Fix typo in Features list

* Update distance.md documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation
2019-11-02 11:41:34 +01:00

1.7 KiB

Imputation missing values

For various reasons, many real world datasets contain missing values, often encoded as blanks, NaNs or other placeholders. To solve this problem you can use the Imputer class.

Constructor Parameters

  • $missingValue (mixed) - this value will be replaced (default null)
  • $strategy (Strategy) - imputation strategy (read to use: MeanStrategy, MedianStrategy, MostFrequentStrategy)
  • $axis (int) - axis for strategy, Imputer::AXIS_COLUMN or Imputer::AXIS_ROW
  • $samples (array) - array of samples to train
$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer = new Imputer(null, new MedianStrategy(), Imputer::AXIS_ROW);

Strategy

  • MeanStrategy - replace missing values using the mean along the axis
  • MedianStrategy - replace missing values using the median along the axis
  • MostFrequentStrategy - replace missing using the most frequent value along the axis

Example of use

use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN);
$imputer->fit($data);
$imputer->transform($data);

/*
$data = [
    [1, 5.33, 3, 4],
    [4, 3, 2, 1],
    [4.33, 6, 7, 8],
    [8, 7, 4, 5],
];
*/

You can also use the $samples constructor parameter instead of the fit method:

use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Imputer\Strategy\MeanStrategy;

$data = [
    [1, null, 3, 4],
    [4, 3, 2, 1],
    [null, 6, 7, 8],
    [8, 7, null, 5],
];

$imputer = new Imputer(null, new MeanStrategy(), Imputer::AXIS_COLUMN, $data);
$imputer->transform($data);