php-ml/docs/machine-learning/workflow/pipeline.md

66 lines
1.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Pipeline
In machine learning, it is common to run a sequence of algorithms to process and learn from dataset. For example:
* Split each documents text into tokens.
* Convert each documents words into a numerical feature vector ([Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)).
* Learn a prediction model using the feature vectors and labels.
PHP-ML represents such a workflow as a Pipeline, which consists of a sequence of transformers and an estimator.
### Constructor Parameters
* $transformers (array|Transformer[]) - sequence of objects that implements the Transformer interface
* $estimator (Estimator) - estimator that can train and predict
```
use Phpml\Classification\SVC;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Pipeline;
$transformers = [
new TfIdfTransformer(),
];
$estimator = new SVC();
$pipeline = new Pipeline($transformers, $estimator);
```
### Example
First, our pipeline replaces the missing value, then normalizes samples and finally trains the SVC estimator.
Thus prepared pipeline repeats each transformation step for predicted sample.
```
use Phpml\Classification\SVC;
use Phpml\Pipeline;
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Normalizer;
use Phpml\Preprocessing\Imputer\Strategy\MostFrequentStrategy;
$transformers = [
new Imputer(null, new MostFrequentStrategy()),
new Normalizer(),
];
$estimator = new SVC();
$samples = [
[1, -1, 2],
[2, 0, null],
[null, 1, -1],
];
$targets = [
4,
1,
4,
];
$pipeline = new Pipeline($transformers, $estimator);
$pipeline->train($samples, $targets);
$predicted = $pipeline->predict([[0, 0, 0]]);
// $predicted == 4
```