Apache Wayang has a three-layer data processing abstraction that sits between user applications and data processing platforms, such as Hadoop and Spark. The figure below depicts the Apache Wayang architecture: (i) an application layer that models all application-specific logic; (ii) a core layer that provides the intermediate representation between applications and processing platforms; and (iii) a platform layer that embraces the underlying processing platforms. Overall, the input of an application layer comprises the logical operators provided by users (or generated by a declarative query parser) and the output is a physical plan (WayangPlan). The WayangPlan is then passed to the core layer where cross-platform optimizations take place to produce an execution plan (ExecutionPlan).
Notice that, in contrast to DBMSs, Apache Wayang decouples physical and execution levels. This separation allows applications to express physical plans in terms of algorithmic needs only, without being tied to a particular processing platform. The salient features of Apache Wayang are cross-platform task execution, high-performance, flexibility, and ease-of-use.
The most salient feature of Apache Wayang is its cross platform optimizer. Besides deciding the best processing platform to run any incoming task, Apache Wayang can run a single task on multiple processing platforms. Overall, it applies an extensible set of graph transformations to the Apache Wayang plan to find alternative better execution plans. After it compares all execution plans by using a platform-specific cost model. Cost functions can either be given or learned, and are parameterized with respect to the underlying hardware (e.g., number of computing nodes for distributed operators).
Apache Wayang provides a number of optimized operators and novel query optimization processes that allows to efficiently deal with big (as well as small) datasets. Furthermore, as the data processing abstraction is based on UDFs, Apache Wayang lets applications expose semantic properties about their functions, optimization hints (e.g., numbers of iterations), constraints (e.g., physical collocation of operators), and alternative plans. The optimizer then uses those artifacts where available in a best-effort approach.
Apache Wayang provides a set of operators, which applications use to specify their tasks. The key aspect of Apache Wayang is the providing of a flexible operator mapping structure; allowing developers to add, modify, or delete mappings among Wayang and execution operators. As a result, developers can also add or remove Wayang and execution operators.
Apache Wayang exposes a simple Java API to developers whereby they can implement their tasks. Developers focus on the logics of their tasks rather than on low-level details specific to data processing platforms. The figure of the SGD plans above shows the Wayang plan for a scalable gradient descent implementation: we clearly see that this tedious implementation task is now much easier!
Users do not have to know the intricacies of the underlying platforms: they focus on the logic of their application only. This not only speeds up the development of applications, but also it is no longer a "must have" to be an expert in big data infrastructures. Apache Wayang takes care of how and on which data processing platforms fits best for the given tasks to deploy applications.
Apache Wayang has been open source from its very beginnings and will keep being open source until its very endings. Feel free to download, try, and contribute to the project. Help us to make Apache Wayang even better!