Using Wayang's docker image

This guide gives a brief example for developers who want to use Docker to develop with Wayang.

Step 1: Creating a docker-compose file

This guide assumes familiarity with Docker and Docker Compose. We provide a pre-built Docker image that contains the tooling needed to run or develop Apache Wayang, namely:

  • Java 11
  • Apache Spark
  • Hadoop
  • Maven

A docker-compose.yml containing the following services will suffice to run TPC-H benchmarks:

name: apache-wayang

services:
  app:
    container_name: apache-wayang-app
    image: apache/incubator-wayang:latest
    ports:
      - 8888:8888
    volumes:
      - ./:/var/www/html
      - ./.m2/repository/:/root/.m2/repository
    tty: true
    restart: always

  tpch:
    container_name: apache-wayang-tpch
    image: ghcr.io/scalytics/tpch-docker:main
    tty: true
    volumes:
      - ./data/:/data
    restart: always

Place this file in the root directory of Wayang's source tree. The app container then mounts the source tree at /var/www/html and the local Maven repository (./.m2/repository, populated once the project has been installed) at /root/.m2/repository.
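
The containers have to be running before you can connect to them in Step 2. A minimal sketch of that step, assuming Docker Compose v2 (the `docker compose` subcommand) is installed and the file is saved as docker-compose.yml in the current directory. The function name start_wayang_stack is a hypothetical helper, not part of Wayang:

```shell
# Hypothetical helper (not part of Wayang): start both services in the
# background and list them. Assumes Docker Compose v2 is installed.
start_wayang_stack() {
  # Fail early with a readable message if the compose file is missing.
  if [ ! -f docker-compose.yml ]; then
    echo "docker-compose.yml not found in $(pwd)" >&2
    return 1
  fi
  docker compose up -d   # create and start the app and tpch services, detached
  docker compose ps      # show the containers and their current state
}
```

Call start_wayang_stack from the root of the Wayang source tree. Running `docker compose up -d` and `docker compose ps` directly does the same thing; the wrapper only adds the file check.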

Step 2: Connecting to the app container

To open an interactive bash session inside the app container, run:

docker exec -it apache-wayang-app bash
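
Once inside the container, it can help to confirm that the tools listed in Step 1 are available before starting a build. A minimal sketch: check_tools is a hypothetical helper, and the executable names in the usage note below are assumptions about what the image puts on PATH:

```shell
# Hypothetical helper: report which of the given executables are on PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools java mvn spark-submit hadoop` prints one line per tool. A "missing" line does not necessarily mean the tool is absent from the image, only that it is not on the PATH of the current shell.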

Step 3: Compiling Wayang and running benchmarks

From the root directory of Wayang (/var/www/html in our container), run the following command to build and install all Wayang modules without running the tests:

mvn clean install -DskipTests

Then package the project to build the executable distribution:

mvn clean package -pl :wayang-assembly -Pdistribution

Unpack the distribution and execute your code, replacing <main_class> and <parameters> with your own values:

cd wayang-assembly/target/
tar -xvf apache-wayang-assembly-0.7.1-incubating-dist.tar.gz
cd wayang-0.7.1
./bin/wayang-submit org.apache.wayang.<main_class> <parameters>
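
The version number appears in both the archive name and the directory name above. A small sketch that derives both from one variable, assuming other releases follow the same naming pattern as 0.7.1 (this is an assumption, so check the contents of wayang-assembly/target/ first). WAYANG_VERSION, DIST_TARBALL and DIST_DIR are illustrative names, not variables that Wayang defines:

```shell
# Illustrative variables, not defined by Wayang itself.
WAYANG_VERSION="0.7.1"
DIST_TARBALL="apache-wayang-assembly-${WAYANG_VERSION}-incubating-dist.tar.gz"
DIST_DIR="wayang-${WAYANG_VERSION}"

# Print the resulting commands for review instead of running them.
echo "tar -xvf ${DIST_TARBALL}"
echo "cd ${DIST_DIR}"
```

With the value shown, the two printed commands match the ones in this step.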

Optional: Add clusters for additional platforms to your docker setup

name: apache-wayang

services:
  app:
    container_name: apache-wayang-app
    image: apache/incubator-wayang:latest
    ports:
      - 8888:8888
    volumes:
      - ./:/var/www/html
      - ./.m2/repository/:/root/.m2/repository
    tty: true
    restart: always

  tpch:
    container_name: apache-wayang-tpch
    image: ghcr.io/scalytics/tpch-docker:main
    tty: true
    volumes:
      - ./data/:/data
    restart: always

  spark-master:
    image: cluster-apache-spark:3.0.2

  spark-worker-a:
    image: cluster-apache-spark:3.0.2
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
      - SPARK_WORKER_CORES=1
      - SPARK_WORKER_MEMORY=1G
      - SPARK_DRIVER_MEMORY=1G
      - SPARK_EXECUTOR_MEMORY=1G
      - SPARK_WORKLOAD=worker
      - SPARK_LOCAL_IP=spark-worker-a