# Pipeline

In machine learning, it is common to run a sequence of algorithms to process and learn from a dataset. For example:

* Split each document’s text into tokens.
* Convert each document’s words into a numerical feature vector ([Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)).
* Learn a prediction model using the feature vectors and labels.

PHP-ML represents such a workflow as a Pipeline, which consists of a sequence of transformers and an estimator.
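
To make the idea concrete, here is a minimal, self-contained sketch of the pattern in plain PHP. It is not PHP-ML's implementation: `SketchTransformer`, `SketchEstimator`, `SketchPipeline`, `MaxAbsScaler`, and `NearestNeighbour` are hypothetical names invented for this illustration. The sketch only assumes the general shape described above: transformers are fitted and applied in order during training, and the same fitted transformers are re-applied before prediction.

```php
<?php
declare(strict_types=1);

// Hypothetical interfaces for illustration only; they are not PHP-ML's own.
interface SketchTransformer
{
    public function fit(array $samples): void;
    public function transform(array &$samples): void;
}

interface SketchEstimator
{
    public function train(array $samples, array $targets): void;
    public function predict(array $samples): array;
}

// Divides every feature by the largest absolute value seen during fit().
final class MaxAbsScaler implements SketchTransformer
{
    private $max = 1.0;

    public function fit(array $samples): void
    {
        $max = 0.0;
        foreach ($samples as $sample) {
            foreach ($sample as $value) {
                $max = max($max, abs((float) $value));
            }
        }
        $this->max = $max > 0.0 ? $max : 1.0;
    }

    public function transform(array &$samples): void
    {
        foreach ($samples as &$sample) {
            foreach ($sample as &$value) {
                $value = $value / $this->max;
            }
            unset($value);
        }
        unset($sample);
    }
}

// Toy estimator: returns the target of the closest training sample.
final class NearestNeighbour implements SketchEstimator
{
    private $samples = [];
    private $targets = [];

    public function train(array $samples, array $targets): void
    {
        $this->samples = $samples;
        $this->targets = $targets;
    }

    public function predict(array $samples): array
    {
        $predictions = [];
        foreach ($samples as $sample) {
            $best = null;
            $bestDistance = INF;
            foreach ($this->samples as $i => $known) {
                $distance = 0.0;
                foreach ($known as $j => $value) {
                    $distance += ($value - $sample[$j]) ** 2;
                }
                if ($distance < $bestDistance) {
                    $bestDistance = $distance;
                    $best = $this->targets[$i];
                }
            }
            $predictions[] = $best;
        }

        return $predictions;
    }
}

final class SketchPipeline
{
    private $transformers;
    private $estimator;

    public function __construct(array $transformers, SketchEstimator $estimator)
    {
        $this->transformers = $transformers;
        $this->estimator = $estimator;
    }

    public function train(array $samples, array $targets): void
    {
        // Fit each transformer on the output of the previous one, then train.
        foreach ($this->transformers as $transformer) {
            $transformer->fit($samples);
            $transformer->transform($samples);
        }
        $this->estimator->train($samples, $targets);
    }

    public function predict(array $samples): array
    {
        // Re-apply the already fitted transformers; do not fit again.
        foreach ($this->transformers as $transformer) {
            $transformer->transform($samples);
        }

        return $this->estimator->predict($samples);
    }
}

$pipeline = new SketchPipeline([new MaxAbsScaler()], new NearestNeighbour());
$pipeline->train([[10, 0], [0, 10]], ['a', 'b']);
$result = $pipeline->predict([[9, 1]]); // ['a']
```

The point of the design is that callers never transform data by hand: whatever preprocessing was learned from the training samples is applied identically to new samples at prediction time.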

### Constructor Parameters

* $transformers (array|Transformer[]) - sequence of objects that implement the Transformer interface
* $estimator (Estimator) - estimator that can train and predict

```
use Phpml\Classification\SVC;
use Phpml\FeatureExtraction\TfIdfTransformer;
use Phpml\Pipeline;

$transformers = [
    new TfIdfTransformer(),
];
$estimator = new SVC();

$pipeline = new Pipeline($transformers, $estimator);
```

### Example

First, the pipeline replaces the missing values, then normalizes the samples, and finally trains the SVC estimator.
Once trained, the pipeline applies the same transformation steps to every sample passed to `predict`.

```
use Phpml\Classification\SVC;
use Phpml\Pipeline;
use Phpml\Preprocessing\Imputer;
use Phpml\Preprocessing\Normalizer;
use Phpml\Preprocessing\Imputer\Strategy\MostFrequentStrategy;

$transformers = [
    new Imputer(null, new MostFrequentStrategy()),
    new Normalizer(),
];
$estimator = new SVC();

$samples = [
    [1, -1, 2],
    [2, 0, null],
    [null, 1, -1],
];

$targets = [
    4,
    1,
    4,
];

$pipeline = new Pipeline($transformers, $estimator);
$pipeline->train($samples, $targets);

$predicted = $pipeline->predict([[0, 0, 0]]);

// $predicted holds one prediction per sample, so here $predicted[0] == 4
```