Deep feature synthesis
Deep Feature Synthesis is an algorithm developed by James Max Kanter and Kalyan Veeramachaneni and described in their paper "Deep Feature Synthesis: Towards Automating Data Science Endeavors".[1]
Definition
Quoting the above paper: "Deep Feature Synthesis is an algorithm that automatically generates features for relational datasets. In essence, the algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature."
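The idea in the quotation can be illustrated with a minimal sketch. It is not the authors' implementation. The tables, field names, and the `aggregate` helper below are hypothetical, chosen only to show a feature being built by following two relationships (items to orders, then orders to customers) and applying one function at each step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: customers -> orders -> order items (one-to-many each).
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "a"},
    {"order_id": 3, "customer_id": "b"},
]
items = [
    {"order_id": 1, "price": 10.0},
    {"order_id": 1, "price": 5.0},
    {"order_id": 2, "price": 20.0},
    {"order_id": 3, "price": 7.0},
]


def aggregate(rows, key, field, func):
    """Group `rows` by `key` and apply `func` to the values of `field`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: func(v) for k, v in groups.items()}


# Depth 1: follow items -> orders and apply SUM to the base field `price`.
order_total = aggregate(items, "order_id", "price", sum)

# Attach the synthesized feature to each order so it can be aggregated again.
orders_with_total = [dict(o, total=order_total[o["order_id"]]) for o in orders]

# Depth 2: follow orders -> customers and apply MEAN to the depth-1 feature.
mean_order_total = aggregate(orders_with_total, "customer_id", "total", mean)
```

Here `mean_order_total` holds one value per customer, the mean of the summed item prices of that customer's orders. Stacking one aggregation on the output of another is what gives the feature its depth.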
Practical Results
Kanter and Veeramachaneni implemented the Deep Feature Synthesis algorithm in their Data Science Machine and entered its automatically generated predictions in three competitions:
The Data Science Machine competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615. In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.[2]
Characteristics
Requires little to no human intervention.
Produces results in hours rather than weeks.
Relies on a SQL schema and normalized table relationships.
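To make the dependence on table relationships concrete, the following sketch enumerates the chains of tables that lead to a target table by following foreign keys. It is an illustration under stated assumptions, not part of the published algorithm. The `foreign_keys` mapping, the table names, and the `paths_to` helper are all hypothetical.

```python
# Hypothetical schema: each child table maps to the parent tables it references.
foreign_keys = {
    "items": ["orders"],
    "orders": ["customers"],
    "customers": [],
}


def paths_to(target, foreign_keys, max_depth=2):
    """List chains of tables that reach `target` by following foreign keys."""
    paths = []
    frontier = [[target]]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for child, parents in foreign_keys.items():
                # Extend the chain with any table that references its first table.
                if path[0] in parents and child not in path:
                    next_frontier.append([child] + path)
        paths.extend(next_frontier)
        frontier = next_frontier
    return paths


customer_paths = paths_to("customers", foreign_keys)
```

Each returned path is a candidate route along which aggregation functions could be applied. Without declared relationships between normalized tables there would be no such paths to follow.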
Applications
Quickly creating feature sets with predictive value.
Critique
The process of synthesizing features from relational data is known as propositionalization and has been studied since at least 1991.[3] The algorithm used in Deep Feature Synthesis was first described by Knobbe in 2001 [4] and is known as RollUp. RollUp was later enhanced in PRORED.[5] A commercial version of RollUp is sold under the name Safarii.
References
- [1] Kanter, James Max; Veeramachaneni, Kalyan. "Deep Feature Synthesis: Towards Automating Data Science Endeavors" (PDF).
- [2] Hardesty, Larry. "System that replaces human intuition with algorithms outperforms human teams".
- [3] Kodratoff, Y., ed. (1991). Machine Learning: EWSL-91. Proceedings of the European Working Session on Learning, Porto, Portugal, March 6–8, 1991. Berlin: Springer-Verlag. ISBN 0-387-53816-X.
- [4] Knobbe, Arno (2001). "Propositionalisation and Aggregates". Principles of Data Mining and Knowledge Discovery: 277–288. doi:10.1007/3-540-44794-6_23.
- [5] Gjorgjioski, Valentin. "Stochastic propositionalization of relational data using aggregates" (PDF).
External links
- FeatureLab, the authors' spin-off for applications of the algorithm