ML4all: scalable ML system for everyone
ML4all is a system that frees users from the burden of machine learning algorithm selection and low-level implementation details. It uses a new abstraction that is capable of expressing most ML tasks and provides a cost-based optimizer on top of this abstraction for choosing the best gradient descent algorithm in a given setting. Our results show that ML4all is more than two orders of magnitude faster than state-of-the-art systems and can process large datasets that could not be processed before.
More details can be found in our dedicated SIGMOD publication and in Wayang's core system paper.
Abstraction
ML4all abstracts most ML algorithms with seven operators:

(1) Transform receives a data point to transform (e.g., normalize it) and outputs a new data point.
(2) Stage initializes all the required global parameters (e.g., centroids for the k-means algorithm).
(3) Compute performs user-defined computations on the input data point and returns a new data point. For example, it can compute the nearest centroid for each input data point.
(4) Update updates the global parameters based on a user-defined formula. For example, it can update the new centroids based on the output computed by the Compute operator.
(5) Sample takes as input the size of the desired sample and the data points to sample from and returns a reduced set of sampled data points.
(6) Converge specifies a function that outputs a convergence dataset required for determining whether the iterations should continue or stop.
(7) Loop specifies the stopping condition on the convergence dataset.
Similar to MapReduce, where users need to implement a map and a reduce function, users of ML4all who want to develop their own algorithm should implement the above interfaces.
The interfaces can be found in org.apache.wayang.ml4all.abstraction.api.
Examples for k-means clustering and stochastic gradient descent can be found in org.apache.wayang.ml4all.algorithms.
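To make the division of labor between the seven operators concrete, here is a minimal single-machine sketch of k-means written against them. The interfaces, method names, and the `run` driver below are hypothetical stand-ins invented for illustration; they are not the signatures in org.apache.wayang.ml4all.abstraction.api, and the sketch does not involve Wayang's optimizer or execution platforms. The `Sample` operator here simply takes a prefix of the data so that the result is deterministic; a real sampler would draw points at random.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Ml4allSketch {

    // Hypothetical stand-ins for the seven operators (NOT the real Wayang API).
    interface Transform { double[] transform(String rawLine); }
    interface Stage { double[][] stage(); }
    interface Compute { int compute(double[] point, double[][] centroids); }
    interface Update { double[][] update(List<double[]> points, int[] assignment, double[][] old); }
    interface Sample { List<double[]> sample(List<double[]> points, int size); }
    interface Converge { double converge(double[][] old, double[][] updated); }
    interface Loop { boolean shouldStop(double convergenceValue, int iteration); }

    // Hypothetical driver: wires the operators into one iterative plan.
    static double[][] run(List<String> rawLines, Transform transform, Stage stage, Compute compute,
                          Update update, Sample sample, Converge converge, Loop loop, int sampleSize) {
        List<double[]> points = new ArrayList<>();
        for (String line : rawLines) points.add(transform.transform(line));
        double[][] params = stage.stage();                       // initial global parameters
        for (int iteration = 1; ; iteration++) {
            List<double[]> batch = sample.sample(points, sampleSize);
            int[] assignment = new int[batch.size()];
            for (int i = 0; i < batch.size(); i++) {
                assignment[i] = compute.compute(batch.get(i), params);
            }
            double[][] updated = update.update(batch, assignment, params);
            double delta = converge.converge(params, updated);   // convergence value
            params = updated;
            if (loop.shouldStop(delta, iteration)) return params;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) sum += (a[d] - b[d]) * (a[d] - b[d]);
        return sum;
    }

    // k-means expressed with the seven operators.
    public static double[][] kMeans(List<String> rawLines, double[][] initialCentroids, int maxIterations) {
        Transform parse = line -> Arrays.stream(line.split(",")).mapToDouble(Double::parseDouble).toArray();
        Stage init = () -> initialCentroids;
        Compute nearest = (p, cs) -> {                           // index of the nearest centroid
            int best = 0;
            for (int c = 1; c < cs.length; c++) {
                if (squaredDistance(p, cs[c]) < squaredDistance(p, cs[best])) best = c;
            }
            return best;
        };
        Update mean = (pts, assign, old) -> {                    // new centroid = mean of its points
            double[][] next = new double[old.length][old[0].length];
            int[] counts = new int[old.length];
            for (int i = 0; i < pts.size(); i++) {
                counts[assign[i]]++;
                for (int d = 0; d < next[0].length; d++) next[assign[i]][d] += pts.get(i)[d];
            }
            for (int c = 0; c < old.length; c++) {
                for (int d = 0; d < next[c].length; d++) {
                    // An empty cluster keeps its previous centroid.
                    next[c][d] = counts[c] == 0 ? old[c][d] : next[c][d] / counts[c];
                }
            }
            return next;
        };
        Sample prefix = (pts, size) -> pts.subList(0, Math.min(size, pts.size()));
        Converge shift = (old, upd) -> {                         // largest squared centroid movement
            double max = 0;
            for (int c = 0; c < old.length; c++) max = Math.max(max, squaredDistance(old[c], upd[c]));
            return max;
        };
        Loop stop = (delta, iteration) -> delta < 1e-9 || iteration >= maxIterations;
        return run(rawLines, parse, init, nearest, mean, prefix, shift, stop, rawLines.size());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1.0,1.0", "1.0,2.0", "9.0,9.0", "9.0,10.0");
        double[][] centroids = kMeans(lines, new double[][]{{0.0, 0.0}, {5.0, 5.0}}, 20);
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Only the lambdas inside `kMeans` are algorithm-specific. Under the same assumptions, a different algorithm such as stochastic gradient descent would swap in a gradient computation for Compute, a weight update for Update, and a random sampler for Sample, while the driver stays unchanged.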
Example runs
 k-means:
./bin/wayang-submit org.apache.wayang.ml4all.examples.RunKMeans java,spark <url_path_to_file>/USCensus1990sample.input 3 68 0 1
 SGD:
./bin/wayang-submit org.apache.wayang.ml4all.examples.RunSGD spark <url_path_to_file>/adult.zeros.input 100827 123 10 0.001