228 Commits

Author SHA1 Message Date
David Monllaó
12b8b118dd Fix division by 0 error during normalization (#83)
* Fix division by 0 error during normalization

The standard deviation is 0 when a feature has the same value in all samples.

* Expand std normalization test
2017-04-24 11:47:30 +02:00
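
The division-by-zero case described in the commit above can be illustrated with a minimal sketch. It assumes z-score standardization of a single feature column and PHP 7.4 or later; `standardizeColumn` is a hypothetical helper, not the library's API, and falling back to a divisor of 1.0 is one common guard rather than necessarily what the commit does.

```php
<?php
declare(strict_types=1);

/**
 * Standardize one feature column to zero mean and unit variance.
 * Hypothetical helper for illustration; not the library's API.
 *
 * @param float[] $column
 * @return float[]
 */
function standardizeColumn(array $column): array
{
    $n = count($column);
    if ($n === 0) {
        return [];
    }
    $mean = array_sum($column) / $n;

    $squared = 0.0;
    foreach ($column as $value) {
        $squared += ($value - $mean) ** 2;
    }
    $std = sqrt($squared / $n); // population standard deviation

    // A constant column has std == 0; dividing by it would fail,
    // so fall back to 1.0 and let every value map to 0.0.
    if ($std == 0.0) {
        $std = 1.0;
    }

    return array_map(
        static fn (float $value): float => ($value - $mean) / $std,
        $column
    );
}

$constant = standardizeColumn([5.0, 5.0, 5.0]); // same value in every sample
$varying = standardizeColumn([1.0, 2.0, 3.0]);
```

With the guard in place, the constant column maps to zeros instead of raising an error, while columns with non-zero spread are scaled as usual.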
Mustafa Karabulut
a87859dd97 Linear algebra operations, Dimensionality reduction and some other minor changes (#81)
* Linear algebra operations

* Covariance

* PCA and KernelPCA

* Tests for PCA, Eigenvalues and Covariance

* KernelPCA update

* KernelPCA and its test

* KernelPCA and its test

* MatrixTest, KernelPCA and PCA tests

* Readme update

* Readme update
2017-04-23 09:03:30 +02:00
Arkadiusz Kondas
6296e44db0 cs fixer 2017-04-19 22:28:07 +02:00
David Monllaó
e1854d44a2 Partial training base (#78)
* Cost values for multiclass OneVsRest uses

* Partial training interface

* Reduce linear classifiers memory usage

* Testing partial training and isolated training

* Partial trainer naming switched to incremental estimator

Other changes according to review's feedback.

* Clean optimization data once optimize is finished

* Abstract resetBinary
2017-04-19 22:26:31 +02:00
Humberto Castelo Branco
b27f08f420 Add delimiter option for CsvDataset (#66)
Useful when the CSV file uses a delimiter other than the comma, for example a semicolon or a tab character.
2017-03-29 12:58:12 +02:00
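
As a rough illustration of why a delimiter option matters, the sketch below parses the same data with a semicolon and with a tab. It uses PHP's built-in `str_getcsv` rather than `CsvDataset`, whose constructor signature is not shown in this log; `parseDelimited` is a hypothetical helper, and splitting on line breaks is a simplification that ignores quoted newlines.

```php
<?php
declare(strict_types=1);

/**
 * Parse CSV text into rows using a configurable delimiter.
 * Hypothetical helper for illustration; not the CsvDataset API.
 *
 * @return string[][]
 */
function parseDelimited(string $text, string $delimiter = ','): array
{
    $rows = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        // Enclosure and escape are passed explicitly so that only
        // the delimiter varies between calls.
        $rows[] = str_getcsv($line, $delimiter, '"', '\\');
    }

    return $rows;
}

$semicolon = parseDelimited("age;height\n30;1.80", ';');
$tabbed = parseDelimited("age\theight\n30\t1.80", "\t");
```

Both calls yield the same two rows; with the default comma delimiter, each line would instead come back as a single unsplit field.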
Mustafa Karabulut
49234429f0 LogisticRegression classifier & Optimization methods (#63)
* LogisticRegression classifier & Optimization methods

* Minor fixes to Logistic Regression & Optimizers PR

* Addition for getCostValues() method
2017-03-27 23:46:53 +02:00
Kyle Warren
c44f3b2730 Additional training for SVR (#59)
* additional training SVR

* additional training SVR, missed old labels reference

* SVM labels parameter now targets

* SVM member labels now targets

* SVM init targets empty array
2017-03-17 11:44:45 +01:00
Arkadiusz Kondas
39747efdc1 Update dependencies and coding style fixes 2017-03-05 16:45:48 +01:00
Arkadiusz Kondas
c6fbb83573 Add typehints to DecisionTree 2017-03-05 16:25:01 +01:00
Mustafa Karabulut
01bb82a2a7 One-v-Rest Classification technique applied to linear classifiers (#54)
* One-v-Rest Classification technique applied to linear classifiers

* Fix for Apriori

* Fixes for One-v-Rest

* One-v-Rest test cases
2017-03-05 09:43:19 +01:00
Arkadiusz Kondas
63c63dfba2 Add no_unused_imports rule to cs-fixer 2017-03-01 10:16:15 +01:00
Mustafa Karabulut
c028a73985 AdaBoost improvements (#53)
* AdaBoost improvements

* AdaBoost improvements & test case resolved

* Some coding style fixes
2017-02-28 21:45:18 +01:00
Arkadiusz Kondas
e8c6005aec Update changelog and cs fixes 2017-02-23 20:59:30 +01:00
Mustafa Karabulut
4daa0a222a AdaBoost algorithm along with some improvements (#51) 2017-02-21 10:38:18 +01:00
Mustafa Karabulut
cf222bcce4 Linear classifiers: Perceptron, Adaline, DecisionStump (#50)
* Linear classifiers

* Code formatting to PSR-2

* Added basic test cases for linear classifiers
2017-02-16 23:23:55 +01:00
Povilas Susinskas
f0a7984f39 Check if matrix is singular doing inverse (#49)
* Check if matrix is singular doing inverse

* add return bool type
2017-02-15 10:09:16 +01:00
Mustafa Karabulut
a33d5fe9c8 RandomForest::getFeatureImportances() method (#47)
* RandomForest::getFeatureImportances() method

* CsvDataset update for column names
2017-02-13 21:23:18 +01:00
Mustafa Karabulut
0a58a71d77 Euclidean optimization (#42)
* Euclidean optimization

* Euclidean with foreach
2017-02-09 10:30:38 +01:00
Mustafa Karabulut
1d73503958 Ensemble Classifiers : Bagging and RandomForest (#36)
* Fuzzy C-Means implementation

* Update FuzzyCMeans

* Rename FuzzyCMeans to FuzzyCMeans.php

* Update NaiveBayes.php

* Small fix applied to improve training performance

array_unique is replaced with array_count_values+array_keys which is way
faster

* Revert "Small fix applied to improve training performance"

This reverts commit c20253f16ac3e8c37d33ecaee28a87cc767e3b7f.

* Revert "Revert "Small fix applied to improve training performance""

This reverts commit ea10e136c4c11b71609ccdcaf9999067e4be473e.

* Revert "Small fix applied to improve training performance"

This reverts commit c20253f16ac3e8c37d33ecaee28a87cc767e3b7f.

* First DecisionTree implementation

* Revert "First DecisionTree implementation"

This reverts commit 4057a08679c26010c39040a48a3e6dad994a1a99.

* DecisionTree

* FCM Test

* FCM Test

* DecisionTree Test

* Ensemble classifiers: Bagging and RandomForests

* test

* Fixes for conflicted files

* Bagging and RandomForest ensemble algorithms

* Changed unit test

* Changed unit test

* Changed unit test

* Bagging and RandomForest ensemble algorithms

* Bagging and RandomForest ensemble algorithms

* Bagging and RandomForest ensemble algorithms

RandomForest algorithm is improved with changes to original DecisionTree

* Bagging and RandomForest ensemble algorithms

* Slight fix about use of global Exception class

* Fixed the error about wrong use of global Exception class

* RandomForest code formatting
2017-02-07 12:37:56 +01:00
Arkadiusz Kondas
b7c9983524 Do not require file to exist for model manager 2017-02-03 17:48:15 +01:00
Arkadiusz Kondas
858d13b0fa Update phpunit to 6.0 2017-02-03 12:58:25 +01:00
David Monllaó
8f122fde90 Persistence class to save and restore models (#37)
* Models manager with save/restore capabilities

* Refactoring dataset exceptions

* Persistency layer docs

* New tests for serializable estimators

* ModelManager static methods to instance methods
2017-02-02 09:03:09 +01:00
David Monllaó
c1b1a5d6ac Support for multiple training datasets (#38)
* Multiple training data sets allowed

* Tests with multiple training data sets

* Updating docs according to #38

Documenting all models whose predictions will be based on all
training data provided.

Some models already supported multiple training data sets.
2017-02-01 19:06:38 +01:00
Arkadiusz Kondas
c3686358b3 Add rules for new cs-fixer 2017-01-31 20:33:08 +01:00
Mustafa Karabulut
87396ebe58 DecisionTree and Fuzzy C Means classifiers (#35)
* Fuzzy C-Means implementation

* Update FuzzyCMeans

* Rename FuzzyCMeans to FuzzyCMeans.php

* Update NaiveBayes.php

* Small fix applied to improve training performance

array_unique is replaced with array_count_values+array_keys which is way
faster

* Revert "Small fix applied to improve training performance"

This reverts commit c20253f16ac3e8c37d33ecaee28a87cc767e3b7f.

* Revert "Revert "Small fix applied to improve training performance""

This reverts commit ea10e136c4c11b71609ccdcaf9999067e4be473e.

* Revert "Small fix applied to improve training performance"

This reverts commit c20253f16ac3e8c37d33ecaee28a87cc767e3b7f.

* DecisionTree

* FCM Test

* FCM Test

* DecisionTree Test
2017-01-31 20:27:15 +01:00
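
The "Small fix applied to improve training performance" item in the commit above swaps `array_unique` for `array_count_values` plus `array_keys`. The sketch below only shows that the two approaches produce the same distinct labels for string input; it makes no claim about timing, and the sample labels are made up for illustration.

```php
<?php
declare(strict_types=1);

// Made-up class labels, as a classifier might see during training.
$labels = ['a', 'b', 'a', 'c', 'b', 'a'];

// Approach 1: drop duplicates, then reindex from 0.
$viaUnique = array_values(array_unique($labels));

// Approach 2: count occurrences per value and keep only the keys.
$viaCounts = array_keys(array_count_values($labels));
```

One caveat with the second approach: `array_count_values` accepts only string and integer values, and numeric strings such as `'1'` come back from `array_keys` as integers.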
Mustafa Karabulut
95fc139170 Update Cluster.php (#32) 2017-01-23 09:24:50 +01:00
Arkadiusz Kondas
d19ddb8507 Apply cs fixes for NaiveBayes 2017-01-17 16:26:43 +01:00
Mustafa Karabulut
e603d60841 Update NaiveBayes.php (#30)
* Update NaiveBayes.php

* Update NaiveBayes.php

* Update NaiveBayes.php

Update to fix the "predictSample" function so that it can handle samples given as multi-dimensional arrays.

* Update NaiveBayes.php

* Update NaiveBayes.php
2017-01-17 16:21:58 +01:00
Arkadiusz Kondas
4dc82710c8 Replace rand with the newer random_int 2016-12-12 19:09:45 +01:00
Arkadiusz Kondas
2363bbaa75 Add type hint and exceptions annotation 2016-12-12 19:02:09 +01:00
Arkadiusz Kondas
d32197100e Fix docblock 2016-12-12 18:50:27 +01:00
Arkadiusz Kondas
fd85033339 Use __DIR__ instead of dirname 2016-12-12 18:45:14 +01:00
Arkadiusz Kondas
a4f65bd13f Short syntax for applied operations 2016-12-12 18:34:20 +01:00
Arkadiusz Kondas
df28656d0d Fixes after new php-cs-fixer v2.0 2016-12-12 18:11:57 +01:00
Arkadiusz Kondas
8aad8afc37 Add null coalesce operator in token count vectorizer 2016-12-08 00:45:42 +01:00
Arkadiusz Kondas
38a26d185f Secure index access and type safe comparison in statistic median 2016-12-06 09:03:02 +01:00
Arkadiusz Kondas
6d11116994 Fix default parameter values 2016-12-06 08:55:52 +01:00
Arkadiusz Kondas
9764890ccb Change floatval to float casting (up to 6 times faster) 2016-12-06 08:52:33 +01:00
Arkadiusz Kondas
d00b7e5668 Secure uniqid usage 2016-12-06 08:50:18 +01:00
Arkadiusz Kondas
a61704501d Fix type compatibility for Minkowski distance 2016-12-06 08:48:45 +01:00
Arkadiusz Kondas
c4f0d1e3b0 Make csv reader binary safe 2016-12-06 08:46:55 +01:00
Arkadiusz Kondas
cbdc049526 Update php-cs-fixer 2016-11-20 22:53:17 +01:00
Arkadiusz Kondas
bca2196b57 Prevent Division by zero error in classification report 2016-11-20 22:49:26 +01:00
Arkadiusz Kondas
349ea16f01 Rename demo datasets and add Dataset suffix 2016-09-30 14:02:08 +02:00
Arkadiusz Kondas
84af842f04 Fix division by zero in ClassificationReport #21 2016-09-27 20:07:21 +02:00
Arkadiusz Kondas
1ce6bb544b Run php-cs-fixer 2016-09-21 21:51:19 +02:00
Patrick Florek
fa87eca375 Add new class Set for simple Set-theoretical operations
### Features

* Works only with primitive types int, float, string
* Implements set-theoretic operations union, intersection, complement
* Modifies set by adding, removing elements
* Implements \IteratorAggregate for use in loops

### Implementation details

Based on array functions:
* array_diff,
* array_merge,
* array_intersect,
* array_unique,
* array_values,
* sort.

### Drawbacks

* **Does not work with objects.**
* Power set and Cartesian product returning array of Set
2016-09-10 13:24:43 +02:00
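
The array functions listed in the commit above are enough to express the basic set operations. The following is a minimal sketch under that assumption; the function names are hypothetical and do not reproduce the `Set` class itself.

```php
<?php
declare(strict_types=1);

/** Remove duplicates, reindex, and sort so equal sets compare equal. */
function normalizeSet(array $items): array
{
    $items = array_values(array_unique($items));
    sort($items);

    return $items;
}

/** Elements that appear in either input. */
function setUnion(array $a, array $b): array
{
    return normalizeSet(array_merge($a, $b));
}

/** Elements that appear in both inputs. */
function setIntersection(array $a, array $b): array
{
    return normalizeSet(array_intersect($a, $b));
}

/** Elements of $a that do not appear in $b. */
function setDifference(array $a, array $b): array
{
    return normalizeSet(array_diff($a, $b));
}

$a = [3, 1, 2, 2];
$b = [2, 3, 4];

$union = setUnion($a, $b);               // [1, 2, 3, 4]
$intersection = setIntersection($a, $b); // [2, 3]
$difference = setDifference($a, $b);     // [1]
```

By default `array_unique`, `array_diff`, and `array_intersect` compare elements by their string representation, which fits the drawback noted in the commit message: primitive values work, while objects generally do not.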
Patrick Florek
90038befa9 Apply comments / coding styles
* Remove user-specific gitignore
* Add return type hints
* Avoid global namespace in docs
* Rename rules -> getRules
* Split up rule generation

Todo:
* Move set theory out to math
* Extract rule generation
2016-09-02 00:26:01 +02:00
Patrick Florek
c8bd8db601 # Association rule learning - Apriori algorithm
* Generating frequent k-length item sets
* Generating rules based on frequent item sets
* Algorithm has exponential complexity, be aware of it
* Apriori algorithm is split into apriori and candidates method
* Second step rule generation is implemented by rules method
* Internal methods are invoked for fine grain unit tests
* Wikipedia's train samples and an alternative are provided for test cases
* Small documentation for public interface is also shipped
2016-08-23 15:44:53 +02:00
Arkadiusz Kondas
638119fc98 code style fixes 2016-08-14 18:27:08 +02:00
Arkadiusz Kondas
f0bd5ae424 Create MLP Regressor draft 2016-08-12 16:29:50 +02:00
Arkadiusz Kondas
2412f15923 Add activationFunction parameter for Perceptron and Layer 2016-08-11 13:21:22 +02:00
Arkadiusz Kondas
c506a84164 refactor Backpropagation methods and simplify things 2016-08-10 23:03:02 +02:00
Arkadiusz Kondas
66d029e94f implement and test Backpropagation training 2016-08-10 22:43:47 +02:00
Arkadiusz Kondas
72afeb7040 implement and test multilayer perceptron methods 2016-08-09 13:27:43 +02:00
Arkadiusz Kondas
ddb3cc367b test abstraction from LayeredNetwork 2016-08-07 23:41:02 +02:00
Arkadiusz Kondas
12ee62bbca create Network and Training contracts 2016-08-05 16:12:39 +02:00
Arkadiusz Kondas
95b29d40b1 add Layer, Input and Bias for neural network 2016-08-05 10:20:31 +02:00
Arkadiusz Kondas
7062ee29e1 add Neuron and Synapse classes 2016-08-02 20:30:20 +02:00
Arkadiusz Kondas
f186aa9c0b extract functions from loops and remove unused code 2016-08-02 13:23:58 +02:00
Arkadiusz Kondas
637fd613b8 implement activation function for neural network 2016-08-02 13:07:47 +02:00
Pablo Joán Iglesias
bbbf5cfc9d For each body should be wrapped in an if statement (#14)
unit test to go with commit
2016-07-26 08:14:57 +02:00
Arkadiusz Kondas
403824d23b test exception on kmeans 2016-07-24 14:01:17 +02:00
Arkadiusz Kondas
448eaafd78 remove unused exception 2016-07-24 13:52:52 +02:00
Arkadiusz Kondas
074dcf7470 php-cs-fixer 2016-07-19 21:59:23 +02:00
Arkadiusz Kondas
9665457159 implement ClassificationReport class 2016-07-19 21:58:59 +02:00
Arkadiusz Kondas
7abee3061a docs for files dataset and php-cs-fixer 2016-07-16 23:56:52 +02:00
Arkadiusz Kondas
e0b560f31d create FilesDataset class 2016-07-16 23:29:40 +02:00
Arkadiusz Kondas
9f140d5b6f fix problem with token count vectorizer array order 2016-07-14 13:25:11 +02:00
Arkadiusz Kondas
f04cc04da5 create StratifiedRandomSplit for cross validation 2016-07-10 14:13:35 +02:00
Arkadiusz Kondas
adc2d1c81b change hhvm to 3.12 2016-07-07 23:38:11 +02:00
Arkadiusz Kondas
f3288c5946 fix scalar typehint for hhvm 2016-07-07 23:33:06 +02:00
Arkadiusz Kondas
4aa9702943 fix errors on hhvm with float casting 2016-07-07 22:47:36 +02:00
Arkadiusz Kondas
6c7416a9c4 implement ConfusionMatrix metric 2016-07-07 00:29:58 +02:00
Arkadiusz Kondas
cce68997a1 implement StopWords in TokenCountVectorizer 2016-07-06 23:22:29 +02:00
Arkadiusz Kondas
a2aa27adba fix problem in SVM with path on windows 2016-07-04 22:22:22 +02:00
Arkadiusz Kondas
9507d58a80 add support for osx 2016-07-01 22:25:57 +02:00
Arkadiusz Kondas
be7693ff2e remove osx from travis - doesn't work with php 7.0 2016-06-30 23:27:17 +02:00
Arkadiusz Kondas
601ff884e8 php-cs-fixer 2016-06-17 00:34:15 +02:00
Arkadiusz Kondas
424519cd83 implement fit for TokenCountVectorizer 2016-06-17 00:33:48 +02:00
Arkadiusz Kondas
3e9e70810d implement fit on Imputer 2016-06-17 00:16:49 +02:00
Arkadiusz Kondas
557f344018 add fit method for Transformer interface 2016-06-17 00:08:10 +02:00
Arkadiusz Kondas
4554011899 rename labels to targets for Dataset 2016-06-16 23:56:15 +02:00
Arkadiusz Kondas
7f4a0b243f transform samples for prediction in pipeline 2016-06-16 16:10:46 +02:00
Arkadiusz Kondas
26f2cbabc4 fix Pipeline transformation 2016-06-16 10:26:29 +02:00
Arkadiusz Kondas
d21a401365 implement Transformer interface on preprocessing classes 2016-06-16 10:03:57 +02:00
Arkadiusz Kondas
7c5e79d2c6 change transformer behavior to reference 2016-06-16 10:01:40 +02:00
Arkadiusz Kondas
374182a6d4 simple pipeline test 2016-06-16 09:58:12 +02:00
Arkadiusz Kondas
cab79e7e36 change interfaces and add Estimator and Pipeline 2016-06-16 09:00:10 +02:00
Arkadiusz Kondas
cc50d2c9b1 implement TfIdf transformation 2016-06-15 16:04:09 +02:00
Arkadiusz Kondas
8a65026642 rename interface Vectorizer to Transformer 2016-06-15 14:09:49 +02:00
Arkadiusz Kondas
da6d94cc46 create stop words class 2016-06-14 11:54:04 +02:00
Arkadiusz Kondas
1ac4b44ee4 create stop words class 2016-06-14 11:53:58 +02:00
Arkadiusz Kondas
2f51716388 change token count vectorizer to return full token counts 2016-06-14 09:58:11 +02:00
Arkadiusz Kondas
fb04b57853 implement data Normalizer with L1 and L2 norm 2016-05-08 20:35:01 +02:00
Arkadiusz Kondas
65cdfe64b2 implement Median and MostFrequent strategy for imputer 2016-05-08 19:33:39 +02:00
Arkadiusz Kondas
a761d0e8f2 mode (dominant) from numbers 2016-05-08 19:23:54 +02:00
Arkadiusz Kondas
ed1e07e803 median function in statistic 2016-05-08 19:12:39 +02:00
Arkadiusz Kondas
b0ab236ab9 create imputer tool for completing missing values 2016-05-08 14:47:17 +02:00
Arkadiusz Kondas
46197eba7b add word tokenizer 2016-05-07 23:17:52 +02:00