create docs for tf-idf transformer

2025-02-13 09:28:44 +00:00 · 2016-07-12 00:21:34 +02:00 · 7c0767c15a
commit 7c0767c15a
parent ba8927459c
6 changed files with 47 additions and 2 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,7 +9,7 @@ This changelog references the relevant changes done in PHP-ML library.

 * 0.1.1 (2016-07-12)
    * feature [Cross Validation] Stratified Random Split - equal distribution for targets in split
-    * feature [General] Documentation - add missing pages and fix links
+    * feature [General] Documentation - add missing pages (Pipeline, ConfusionMatrix and TfIdfTransformer) and fix links 

 * 0.1.0 (2016-07-08)
    * first develop release
--- a/README.md
+++ b/README.md
@@ -59,6 +59,7 @@ composer require php-ai/php-ml
    * [Imputation missing values](http://php-ml.readthedocs.io/en/latest/machine-learning/preprocessing/imputation-missing-values/)
 * Feature Extraction
    * [Token Count Vectorizer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/token-count-vectorizer/)
+    * [Tf-idf Transformer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/tf-idf-transformer/)
 * Datasets
    * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset/)
    * Ready to use:
--- a/docs/index.md
+++ b/docs/index.md
@@ -59,6 +59,7 @@ composer require php-ai/php-ml
    * [Imputation missing values](machine-learning/preprocessing/imputation-missing-values/)
 * Feature Extraction
    * [Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)
+    * [Tf-idf Transformer](machine-learning/feature-extraction/tf-idf-transformer/)
 * Datasets
    * [CSV](machine-learning/datasets/csv-dataset/)
    * Ready to use:
--- a/docs/machine-learning/feature-extraction/tf-idf-transformer.md
+++ b/docs/machine-learning/feature-extraction/tf-idf-transformer.md
@@ -0,0 +1,42 @@
+# Tf-idf Transformer
+
+Tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
+
+### Constructor Parameters
+
+* $samples (array) - samples used to fit the tf-idf model
+
+```
+use Phpml\FeatureExtraction\TfIdfTransformer;
+
+$samples = [
+    [1, 2, 4],
+    [0, 2, 1]
+];
+
+$transformer = new TfIdfTransformer($samples);
+```
+
+### Transformation
+
+To transform a collection of samples (token counts per document), use the `transform` method. Example:
+
+```
+use Phpml\FeatureExtraction\TfIdfTransformer;
+
+$samples = [
+    [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
+    [0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3],
+];
+        
+$transformer = new TfIdfTransformer($samples);
+$transformer->transform($samples);
+
+/*
+$samples = [
+   [0 => 0, 1 => 0, 2 => 0.602, 3 => 0.301, 4 => 0, 5 => 0],
+   [0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 0.602, 5 => 0.903],
+];
+*/
+        
+```
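
The expected values in the example above are consistent with weighting each raw count by `log10(N / df)`, where `N` is the number of samples and `df` is the number of samples in which the feature is non-zero. The following is a minimal sketch that recomputes them in plain PHP under that assumption. `tfIdfSketch` is a hypothetical helper written for illustration; it is not part of PHP-ML and may differ from the library's actual implementation.

```php
<?php
// Hypothetical helper (not part of PHP-ML): assumes tf is the raw count
// and idf = log10(N / df), which reproduces the documented example values.
function tfIdfSketch(array $samples): array
{
    $n = count($samples);

    // Document frequency: number of samples in which each feature is non-zero.
    $df = [];
    foreach ($samples as $sample) {
        foreach ($sample as $index => $count) {
            $df[$index] = ($df[$index] ?? 0) + ($count > 0 ? 1 : 0);
        }
    }

    $result = [];
    foreach ($samples as $i => $sample) {
        foreach ($sample as $index => $count) {
            // A feature that never occurs gets weight 0 to avoid division by zero.
            $idf = $df[$index] > 0 ? log10($n / $df[$index]) : 0.0;
            // Rounded to three decimals only to match the precision shown in the docs.
            $result[$i][$index] = round($count * $idf, 3);
        }
    }

    return $result;
}

$samples = [
    [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
    [0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3],
];

$weights = tfIdfSketch($samples);
print_r($weights);
```

Features 0 and 1 occur in both samples, so their weight is `log10(2 / 2) = 0`. Features 2 to 5 each occur in only one sample, so their counts are scaled by `log10(2) ≈ 0.301`.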
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -25,6 +25,7 @@ pages:
      - Imputation missing values: machine-learning/preprocessing/imputation-missing-values.md
    - Feature Extraction:
      - Token Count Vectorizer: machine-learning/feature-extraction/token-count-vectorizer.md
+      - Tf-idf Transformer: machine-learning/feature-extraction/tf-idf-transformer.md
    - Datasets:
      - Array Dataset: machine-learning/datasets/array-dataset.md
      - CSV Dataset: machine-learning/datasets/csv-dataset.md
--- a/tests/Phpml/FeatureExtraction/TfIdfTransformerTest.php
+++ b/tests/Phpml/FeatureExtraction/TfIdfTransformerTest.php
@@ -10,7 +10,7 @@ class TfIdfTransformerTest extends \PHPUnit_Framework_TestCase
 {
    public function testTfIdfTransformation()
    {
-        //https://en.wikipedia.org/wiki/Tf%E2%80%93idf
+        // https://en.wikipedia.org/wiki/Tf-idf

        $samples = [
            [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],