Blog | Apache Wayang™

Bringing Spatial Data Processing to Apache Wayang

May 19, 2026 · 6 min read

Maximilian Speer

HPI Student

Anton Persitzky

HPI Student

Felix Treykorn

HPI Student

Apache Wayang already enables a large variety of data processing workflows, ranging from basic data retrieval and filtering to complex ML tasks. However, there are additional areas that could benefit from its platform flexibility. As part of a university project, we implemented spatial operators in Apache Wayang. This enables the execution of workflows using geospatial data, including spatial join and filter operations. Since execution times can differ significantly depending on the chosen platform, spatial workflows can benefit greatly from Wayang’s cross-platform capabilities.

Implementation

Spatial workflows are not the core use case of Apache Wayang. We therefore added spatial support as a plugin that can be enabled separately.

The two new operators we added are SpatialFilter and SpatialJoin, since many spatial workflows primarily rely on these two operations.

To support multiple data sources and geometry formats, we introduced an internal geometry representation, SpatialGeometry, that enables translation between formats. This allows geometries stored as WKT, WKB, and GeoJSON to be read and, if necessary, converted to the type expected by the consuming operator.
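The idea of one internal representation with readers for several input formats can be sketched in a few lines. The following is a hypothetical, points-only toy: the class name ToyPoint and its methods are made up for illustration and are not the SpatialGeometry API. A real implementation would rely on full parsers (JTS, for example, provides WKTReader and WKBReader) and would also cover WKB and non-point geometries.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, points-only stand-in for an internal geometry representation. */
public final class ToyPoint {

    private static final String NUM = "(-?\\d+(?:\\.\\d+)?)";
    // WKT, e.g. "POINT (13.4 52.5)"
    private static final Pattern WKT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*" + NUM + "\\s+" + NUM + "\\s*\\)\\s*$", Pattern.CASE_INSENSITIVE);
    // GeoJSON, e.g. {"type":"Point","coordinates":[13.4,52.5]}; assumes "type" comes before "coordinates"
    private static final Pattern GEOJSON = Pattern.compile(
            "\"type\"\\s*:\\s*\"Point\".*\"coordinates\"\\s*:\\s*\\[\\s*" + NUM + "\\s*,\\s*" + NUM + "\\s*\\]",
            Pattern.DOTALL);

    public final double x;
    public final double y;

    private ToyPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    /** Detects the input format and normalizes it into the single internal representation. */
    public static ToyPoint parse(String input) {
        Matcher m = WKT.matcher(input);
        if (!m.find()) {
            m = GEOJSON.matcher(input);
            if (!m.find()) {
                throw new IllegalArgumentException("Unsupported geometry: " + input);
            }
        }
        return new ToyPoint(Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)));
    }

    /** Converts back to WKT, e.g. for a consumer that expects text geometries. */
    public String toWkt() {
        return String.format(Locale.ROOT, "POINT (%s %s)", x, y);
    }

    public static void main(String[] args) {
        ToyPoint fromWkt = parse("POINT (13.4 52.5)");
        ToyPoint fromGeoJson = parse("{\"type\": \"Point\", \"coordinates\": [13.4, 52.5]}");
        System.out.println(fromWkt.toWkt() + " / " + fromGeoJson.toWkt()); // POINT (13.4 52.5) / POINT (13.4 52.5)
    }
}
```

Both inputs end up as the same internal object, so a consuming operator only ever sees one type and can ask for the output format it needs.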

For the execution of spatial jobs, we currently support Java, PostgreSQL, and Spark as backends. The Spark implementation uses Apache Sedona, a well-established library for distributed processing of spatial data. The Java implementation is based on JTS, and the PostgreSQL platform uses PostGIS for spatial operations.

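// Imports omitted. Assumes `configuration` (a Wayang Configuration holding the
// PostgreSQL connection settings), `tableName`, and `queryGeometry` are defined elsewhere.
WayangContext wayangContext = new WayangContext(configuration)
                .withPlugin(Java.basicPlugin())   // Java platform
                .withPlugin(Postgres.plugin())    // PostgreSQL operators and channels
                .withPlugin(Spatial.plugin());    // spatial operator mappings

JavaPlanBuilder builder = new JavaPlanBuilder(wayangContext);

// Read the geometry column as WKT, keep only rows whose geometry intersects
// queryGeometry, and force execution of the filter on PostgreSQL (PostGIS).
final Collection<Long> outputcount = builder
       .readTable(new PostgresTableSource(tableName, "ST_AsText(geom)"))
       .spatialFilter(
               (input -> WayangGeometry.fromStringInput(input.getString(0))), // extracts the geometry from each record
               SpatialPredicate.INTERSECTS,
               queryGeometry
       ).withSqlGeometryColumnName("geom")   // geometry column used when the filter runs as SQL
       .withTargetPlatform(Postgres.platform())
       .count()
       .collect();

// count() produces a single value
System.out.println("Spatial Filter Postgres Result Size: " + outputcount.iterator().next());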

Why Spatial Is a Plugin, Not a Platform

In the current Wayang architecture, a platform is responsible for runtime ownership: it defines a Platform implementation, provides an executor factory, defines channel conversions, and integrates cost models. This can be seen clearly in modules such as Java (JavaPlatform + JavaExecutor) and PostgreSQL (PostgresPlatform via JdbcPlatformTemplate + JdbcExecutor).

The spatial extension is intentionally different. It does not introduce a new Platform subclass and does not provide its own executor. Instead, wayang-spatial is implemented as a plugin (Spatial.java) that contributes execution operators and the mappings to them, and specifies the platforms it requires.

The logical spatial operators (SpatialFilterOperator, SpatialJoinOperator, GeoJsonFileSource) still live in wayang-basic and are included in the Java-Scala API. The spatial plugin extends the system by adding platform-specific execution operators and transformation rules that map Wayang operators to their platform implementations.

At registration time, WayangContext.withPlugin(...) calls plugin.configure(configuration). This whitelists the required platforms and spatial mappings. During optimization, mappings such as SpatialFilterMapping and SpatialJoinMapping rewrite logical operators into execution operators such as:

JavaSpatialFilterOperator / JavaSpatialJoinOperator
SparkSpatialFilterOperator / SparkSpatialJoinOperator
PostgresSpatialFilterOperator / PostgresSpatialJoinOperator

Execution is then handled by the existing platform runtimes:

Java operators are executed by JavaExecutor
Spark operators are executed by SparkExecutor
PostgreSQL operators are executed through the JDBC runtime (JdbcExecutor)

In practice, this is why applications have to register both a platform plugin (for general operators and channels) and a spatial plugin (for spatial mappings), for example Java.basicPlugin() + Spatial.javaPlugin() or Spark.basicPlugin() + Spatial.sparkPlugin().
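The division of labor between platform plugins and the spatial plugin can be illustrated with a toy model. This is a hypothetical sketch, not Wayang's Plugin or Mapping API: all class and method names are made up. It also simplifies one aspect: here only the platform plugin makes a platform available, which models the point that spatial mappings alone are not enough to run a job.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of plugin registration and logical-to-execution operator mapping (not Wayang's API). */
public class ToyPluginRegistry {

    /** A plugin may contribute platforms and mappings: logical operator -> (platform -> execution operator). */
    public record ToyPlugin(Set<String> platforms, Map<String, Map<String, String>> mappings) {}

    private final Set<String> platforms = new HashSet<>();
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    /** Mirrors the idea of withPlugin(...): merge whatever the plugin contributes. */
    public ToyPluginRegistry withPlugin(ToyPlugin plugin) {
        platforms.addAll(plugin.platforms());
        plugin.mappings().forEach((logical, byPlatform) ->
                mappings.computeIfAbsent(logical, k -> new HashMap<>()).putAll(byPlatform));
        return this;
    }

    /** Rewrites a logical operator into the execution operator for the given platform. */
    public String rewrite(String logicalOperator, String platform) {
        if (!platforms.contains(platform)) {
            throw new IllegalStateException("Platform not registered: " + platform);
        }
        String executionOperator = mappings.getOrDefault(logicalOperator, Map.of()).get(platform);
        if (executionOperator == null) {
            throw new IllegalStateException("No mapping for " + logicalOperator + " on " + platform);
        }
        return executionOperator;
    }

    public static void main(String[] args) {
        // Platform plugin: contributes the platform itself (executor, channels, ...), no spatial mappings.
        ToyPlugin javaPlatform = new ToyPlugin(Set.of("java"), Map.of());
        // Spatial plugin: contributes spatial mappings for several platforms.
        ToyPlugin spatial = new ToyPlugin(Set.of(), Map.of("SpatialFilterOperator",
                Map.of("java", "JavaSpatialFilterOperator", "spark", "SparkSpatialFilterOperator")));

        ToyPluginRegistry context = new ToyPluginRegistry().withPlugin(javaPlatform).withPlugin(spatial);
        System.out.println(context.rewrite("SpatialFilterOperator", "java")); // JavaSpatialFilterOperator
        // context.rewrite("SpatialFilterOperator", "spark") throws: no Spark platform plugin registered.
    }
}
```

Without the spatial plugin, the Java platform has no mapping for the spatial operator; without the Spark platform plugin, the Spark mapping exists but cannot be used.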

This design provides three concrete advantages:

No duplicate runtime stack, since executors, channels, and cost models remain centralized in the platforms
Clean multi-platform support for the same spatial semantics
Lower maintenance overhead, because spatial development efforts can focus on operator logic and mappings rather than recreating infrastructure

Spatial support in Wayang is therefore best understood as a cross-platform capability extension: it contributes spatial semantics and translation, while execution remains fully owned by the underlying platforms.

Benchmarks

Below we show scenarios in which it pays off to freely choose the execution platform depending on the use case. The following benchmarks are not intended as exhaustive tests of our new operators; rather, they highlight how the platform independence of Apache Wayang can speed up spatial jobs. We executed the benchmarks on an HPC cluster with reproducible Spark cluster configurations.

Job 1: Spatial Join of parks (~44k) and lakes (~140k) in Germany with spatial predicate CONTAINS, no index on Postgres tables, Spark cluster with 4 nodes
Job 2: Spatial Join of parks (~44k) and lakes (~140k) in Germany with spatial predicate CONTAINS, spatial index on Postgres tables, Spark cluster with 4 nodes
Job 3: Spatial Join of two synthetic datasets containing boxes (100k and 1M) with spatial predicate INTERSECTS, spatial index on Postgres tables, Spark cluster with 8 nodes

The above figure shows that there are use cases in which each of the currently supported platforms performs best. For small datasets, Java performs best if there is no spatial index available for Postgres; otherwise, Postgres outperforms Java. Due to the Spark overhead, Sedona running on a Spark cluster only gains an advantage in jobs with large datasets, as seen in Job 3.

This is also evident in the following figure. While Java and Postgres perform better than Sedona on a Spark cluster for joins on small datasets, this trend reverses for joins of larger datasets. Especially for the join of 100k and 10M boxes (generated using star.cs.ucr.edu), the Spark cluster outperforms the single-node execution of Java and Postgres. The poor performance of Sedona/Spark on a cluster with only one node indicates that this advantage actually comes from distributing the workload and not just from using Sedona instead of JTS or PostGIS operators.

The performance gained by using larger cluster configurations can also be seen in the following visualization of runtime results of box joins with datasets of various sizes.

Besides dataset size, selectivity, and consequently the join result size, can also impact execution time significantly. The following chart shows the execution time of joining synthetic box datasets containing 1M boxes each. For each run, the boxes in one of the datasets were generated with a smaller maximum edge length, which makes the intersection join more selective (fewer intersecting pairs). Execution times for Java decreased significantly as the box edge lengths shrank, while execution times for the Spark platform stayed roughly the same.
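The link between maximum edge length and join result size can be reproduced with a small, self-contained experiment. This is a hypothetical sketch with made-up parameters (2,000 boxes per side in a 100 x 100 space), not the benchmark setup: it draws the same box centers for every run and only shrinks the second dataset's boxes, so the number of intersecting pairs can only go down.

```java
import java.util.Random;

/** Toy experiment: shrinking one dataset's boxes shrinks the INTERSECTS join result. */
public class ToySelectivity {

    /** Boxes are {minX, minY, maxX, maxY}. */
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    /** Random center in a 100 x 100 space, edge lengths up to maxEdge. */
    static double[] randomBox(Random rnd, double maxEdge) {
        double cx = rnd.nextDouble() * 100, cy = rnd.nextDouble() * 100;
        double hx = rnd.nextDouble() * maxEdge / 2, hy = rnd.nextDouble() * maxEdge / 2;
        return new double[] {cx - hx, cy - hy, cx + hx, cy + hy};
    }

    /**
     * Join size of n fixed boxes (edges up to 1.0) with n boxes whose edges are at most maxEdge.
     * The random draws do not depend on maxEdge, so for a fixed seed a smaller maxEdge
     * yields the same centers with proportionally smaller boxes.
     */
    public static long joinSize(int n, double maxEdge, long seed) {
        Random rnd = new Random(seed);
        double[][] left = new double[n][];
        double[][] right = new double[n][];
        for (int i = 0; i < n; i++) {
            left[i] = randomBox(rnd, 1.0);
            right[i] = randomBox(rnd, maxEdge);
        }
        long count = 0;
        for (double[] a : left) {
            for (double[] b : right) {
                if (intersects(a, b)) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (double maxEdge : new double[] {8.0, 4.0, 2.0, 1.0, 0.5}) {
            System.out.printf("max edge %.1f -> %d result pairs%n", maxEdge, joinSize(2_000, maxEdge, 42));
        }
    }
}
```

Note that this nested loop performs the same n * n comparisons for every edge length, so the toy only reproduces the change in result size, not the runtime behavior reported above.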

The benchmark results show multiple scenarios in which platform choice has a significant impact on runtime. Spatial data processing can therefore benefit greatly from using the new spatial operators in Apache Wayang.
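The difference between Job 1 and Job 2 hinges on whether a spatial index is available. Why an index matters can be illustrated with a toy INTERSECTS join over axis-aligned boxes: a nested-loop join compares every pair, while a uniform grid, used here as a simple stand-in for a real spatial index such as the GiST indexes PostGIS builds, only compares boxes that share at least one grid cell. Class and method names are hypothetical, and this is not how the Java, PostgreSQL, or Sedona operators actually implement the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Toy INTERSECTS join over axis-aligned boxes: nested loop vs. uniform grid. */
public class ToyBoxJoin {

    /** Axis-aligned box [minX, maxX] x [minY, maxY]. */
    public record Box(double minX, double minY, double maxX, double maxY) {
        public boolean intersects(Box o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    private record Cell(long x, long y) {}

    private static long cell(double v, double size) {
        return (long) Math.floor(v / size);
    }

    /** Without an index: compares all |left| * |right| pairs. */
    public static long nestedLoopCount(List<Box> left, List<Box> right) {
        long count = 0;
        for (Box a : left) {
            for (Box b : right) {
                if (a.intersects(b)) count++;
            }
        }
        return count;
    }

    /** With a grid "index": only boxes sharing a cell are compared. */
    public static long gridCount(List<Box> left, List<Box> right, double cellSize) {
        Map<Cell, List<Integer>> grid = new HashMap<>();
        for (int i = 0; i < right.size(); i++) {
            Box b = right.get(i);
            for (long cx = cell(b.minX(), cellSize); cx <= cell(b.maxX(), cellSize); cx++) {
                for (long cy = cell(b.minY(), cellSize); cy <= cell(b.maxY(), cellSize); cy++) {
                    grid.computeIfAbsent(new Cell(cx, cy), k -> new ArrayList<>()).add(i);
                }
            }
        }
        long count = 0;
        for (Box a : left) {
            Set<Integer> seen = new HashSet<>(); // a pair can share several cells; count it once
            for (long cx = cell(a.minX(), cellSize); cx <= cell(a.maxX(), cellSize); cx++) {
                for (long cy = cell(a.minY(), cellSize); cy <= cell(a.maxY(), cellSize); cy++) {
                    for (int i : grid.getOrDefault(new Cell(cx, cy), List.of())) {
                        if (seen.add(i) && a.intersects(right.get(i))) count++;
                    }
                }
            }
        }
        return count;
    }

    /** Random boxes in a 100 x 100 space with edges up to maxEdge. */
    public static List<Box> randomBoxes(int n, double maxEdge, long seed) {
        Random rnd = new Random(seed);
        List<Box> boxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            double x = rnd.nextDouble() * 100, y = rnd.nextDouble() * 100;
            boxes.add(new Box(x, y, x + rnd.nextDouble() * maxEdge, y + rnd.nextDouble() * maxEdge));
        }
        return boxes;
    }

    public static void main(String[] args) {
        List<Box> left = randomBoxes(2_000, 2.0, 1);
        List<Box> right = randomBoxes(20_000, 2.0, 2);
        System.out.println(nestedLoopCount(left, right) + " pairs (nested loop), "
                + gridCount(left, right, 2.0) + " pairs (grid)");
    }
}
```

Both methods return the same count; the grid version simply skips most non-matching pairs, which is the kind of saving a spatial index gives a database.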

Future Work

Our spatial extension enables basic spatial workloads using filter and join operations on the Java, Spark, and Postgres platforms. However, additional operators like nearest neighbor or within-distance operations could enable even more complex scenarios.

Adding platforms specifically designed for spatial data processing could improve performance even further. An interesting candidate is the recently introduced Apache SedonaDB database engine.

Currently, the execution platform has to be chosen manually. Part of potential future work should therefore be the implementation of heuristics for platform selection.
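As a starting point for such heuristics, the benchmark observations can be encoded as simple rules. The sketch below is hypothetical: the class, the rule order, and especially the size threshold are made up for illustration and would need calibration against real measurements. It also ignores selectivity, which the benchmarks show matters as well.

```java
/** Hypothetical rule-based platform chooser derived from the benchmark observations. */
public class ToyPlatformHeuristic {

    public enum Platform { JAVA, POSTGRES, SPARK }

    /**
     * @param totalInputRows  combined number of input geometries
     * @param postgresIndexed whether the PostgreSQL tables have a spatial index
     * @param sparkNodes      number of available Spark nodes (0 if none)
     * @param largeThreshold  hypothetical input size above which distribution pays off
     */
    public static Platform choose(long totalInputRows, boolean postgresIndexed,
                                  int sparkNodes, long largeThreshold) {
        // Large inputs on a multi-node cluster: Spark's overhead is amortized (Job 3).
        // A single-node Spark setup performed poorly, so it is never chosen here.
        if (totalInputRows >= largeThreshold && sparkNodes > 1) {
            return Platform.SPARK;
        }
        // Smaller inputs: PostgreSQL wins with a spatial index (Job 2), Java without one (Job 1).
        return postgresIndexed ? Platform.POSTGRES : Platform.JAVA;
    }

    public static void main(String[] args) {
        long threshold = 500_000; // hypothetical cut-off
        System.out.println(choose(184_000, false, 4, threshold));  // JAVA, similar to Job 1
        System.out.println(choose(184_000, true, 4, threshold));   // POSTGRES, similar to Job 2
        System.out.println(choose(1_100_000, true, 8, threshold)); // SPARK, similar to Job 3
    }
}
```

A learned or cost-model-based approach would replace the fixed threshold, but even simple rules like these would remove the need to pick the platform by hand in the common cases.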

Apache Wayang Graduation

December 11, 2025 · 2 min read

Zoi Kaoudi

(P)PMC Apache Wayang

The Apache Wayang community is proud to announce a major milestone in our journey: Apache Wayang has officially graduated from the Apache Incubator and is now a Top-Level Project (TLP) at the Apache Software Foundation (ASF)!

This graduation rewards many years of community-driven effort, research, innovation, and collaboration. It reflects our shared commitment to open-source development and the growing importance of cross-platform data processing in the broader data ecosystem. Graduating as an Apache Top-Level Project is an important recognition of the project’s maturity and ensures that Apache Wayang is backed by the full support and oversight of the ASF, guaranteeing openness, meritocracy, and a sustainable development model for years to come.

What Is Apache Wayang?

Apache Wayang (formerly known as Rheem) is a cross-platform data processing system designed to give developers and organizations a unified way to execute pipelines over multiple data processing engines—such as Apache Spark, Apache Flink, PostgreSQL, and more.

Instead of forcing users to choose a single execution backend, Wayang:

Abstracts data processing engines through a unified API
Can automatically choose the best execution platform based on workload characteristics
Optimizes performance by distributing tasks across engines if necessary
Provides a flexible plugin architecture for integrating new platforms and optimizers

In other words, Wayang enables interoperability across the diverse data landscape and empowers teams to focus on application logic and not infrastructure decisions.

Thank You to Our Community

Apache Wayang wouldn’t exist without its community of developers, users, mentors, and advocates who contributed code, documentation, feedback, and support. A special thanks goes to our Incubator mentors, who helped guide the project through the Apache processes and best practices. Your contributions made this milestone possible, and we’re just getting started.

Get Involved!

Whether you want to try Wayang for your next project, contribute code, write documentation, or share your ideas, your participation is welcome. Here’s how to get started:

Explore the documentation
Join our community mailing lists
Contribute to discussions and development
Try Wayang on your data workloads

Welcome to Apache Wayang — where data platforms work together!

Apache Wayang Release Odyssey

September 1, 2024 · 19 min read

Mirko Kämpf

(P)PMC Apache Wayang

Intro

The ASF provides a robust infrastructure for open communities of software developers. We can share ideas, combine forces, and contribute code, docs, review energy, and artwork, and from time to time we nail it all down. A release captures an intermediate result of this continuous community work.

How the Apache Wayang team conducts such a release is an essential step toward graduation. First of all, there are some references to take into account, such as:

Assuming you are a (P)PMC member with the right permissions for a release, you can follow the procedure described in this guide:

https://plc4x.apache.org/developers/release/release.html

I tried to follow exactly this procedure several times, and I failed. Here I share the current status of my release attempts.

I am planning a longer tour and do not want to block the project for a long time. Hence I am publishing this draft, and I hope we can unblock the release as soon as possible.

Status:

I am not able to complete the mvn release:perform step. Everything before it worked, sometimes only after some digging.

We assume that, due to my membership in two ASF incubator projects, I am not able to upload the artefacts to the Nexus repository (H1).
It is also possible that my settings.xml file does not contain the correct username and password (H2).

However, a manual login to the Nexus server https://repository.apache.org/service/local/staging/deploy/maven2 succeeded, and beyond that I have no idea how to verify this detail on my own.

Idea / Proposal

(1) It would be great if someone who has done a release in any other ASF project or in Apache Wayang could follow the steps I share, so that we can find where the problem is hiding.

(2) As a follow-up task, I suggest adding a Release Guide to the Apache Wayang project, including onboarding steps for release managers and project-specific checklists derived from the references listed above.

But for now it is all about sharing the status (as I did several times on multiple channels, including JIRA, Slack, and the mailing lists) and finding a solution for Apache Wayang release 1.0.

Latest Error:

mvn release:perform -X -DskipTests

[INFO] Caused by: org.eclipse.aether.deployment.DeploymentException: Failed to deploy artifacts: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): status code: 401, reason phrase: Unauthorized (401)
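A 401 Unauthorized response from Nexus typically means that Maven did not send valid credentials for the repository id named in the error, here apache.releases.https; Maven looks these up in a server entry with the same id in settings.xml. To check hypothesis H2 without outside help, a small stdlib-only Java program can at least confirm that ~/.m2/settings.xml contains such an entry with a username and a password. This is a hypothetical helper, not part of any Maven tooling: it cannot tell whether the password is correct (an encrypted password simply counts as present), and it ignores other settings files, for example one passed with -s.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: does a Maven settings.xml declare credentials for a given server id? */
public class SettingsServerCheck {

    /** True if a <server> with this <id> has non-empty <username> and <password>. */
    public static boolean hasServerCredentials(InputSource settingsXml, String serverId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(settingsXml);
            NodeList servers = doc.getElementsByTagName("server");
            for (int i = 0; i < servers.getLength(); i++) {
                Element server = (Element) servers.item(i);
                if (serverId.equals(childText(server, "id"))
                        && !childText(server, "username").isEmpty()
                        && !childText(server, "password").isEmpty()) {
                    return true;
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot read settings.xml", e);
        }
    }

    /** Text of the first descendant element with this tag name, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Default id taken from the 401 error above; can be overridden on the command line.
        String serverId = args.length > 0 ? args[0] : "apache.releases.https";
        File settings = new File(System.getProperty("user.home"), ".m2/settings.xml");
        if (!settings.isFile()) {
            System.out.println("No settings file at " + settings);
            return;
        }
        boolean ok = hasServerCredentials(new InputSource(settings.toURI().toString()), serverId);
        System.out.println(ok ? "Credentials found for " + serverId
                              : "No <server> with username/password for " + serverId + " in " + settings);
    }
}
```

If the entry is missing or its id is spelled differently from apache.releases.https, that alone would explain the 401 and point toward H2 rather than H1.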

Activity Log

mvn release:clean
mvn versions:set -DnewVersion=1.0.0-RC2
mvn versions:commit

mvn release:prepare -Darguments='-DskipTests=True'

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.0.1:prepare (default-cli) on project wayang: You don't have a SNAPSHOT project in the reactor projects list. -> [Help 1]

mvn versions:commit
mvn versions:set -DnewVersion=1.0.0-RC2-SNAPSHOT
mvn versions:commit

mvn release:prepare -Darguments='-DskipTests=True'

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.0.1:prepare (default-cli) on project wayang: Cannot prepare the release because you have local modifications :

git status
git add .
git commit -m "prepare for release 1.0.0-RC2-SNAPSHOT"
git push

mvn release:prepare -Darguments='-DskipTests=True -Dresume=False' -DdryRun=true
mvn release:prepare -Darguments='-DskipTests=True -Dresume=False' -XXX

Caused by: org.eclipse.aether.transfer.NoRepositoryConnectorException: Blocked mirror for repositories: [repository.jboss.org (http://repository.jboss.org/nexus/content/groups/public/, default, releases)]

Dependency on JDK-11 during release

FIXED with local JDK11 Setup

brew install openjdk@11

curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

which java
/Users/kamir/.sdkman/candidates/java/current/bin/java
export JAVA_HOME=/Users/kamir/.sdkman/candidates/java/current
mvn clean -XXX
export JAVA_HOME=
sdk install java 11.0.24-amzn
sdk home java 11.0.24-amzn

/usr/libexec/java_home -v 11

jenv add /Library/Java/JavaVirtualMachines/jdk-11.0.15.1.jdk/Contents/Home
jenv global 11.0
jenv shell 11.0
jenv local 11.0
java -version

Manual update of release-version

During the release procedure, do I have to set the version here in this configuration section manually?

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-release-plugin</artifactId>
    <version>3.0.1</version>

    <configuration>
        <autoVersionSubmodules>true</autoVersionSubmodules>
        <autoResolveSnapshots>all</autoResolveSnapshots>
        <releaseProfiles>apache-release</releaseProfiles>
        <!--<pushChanges>false</pushChanges>-->
        <!--<dryRun>true</dryRun>-->
        <releaseVersion>0.7.1</releaseVersion>
        <updateWorkingCopyVersions>true</updateWorkingCopyVersions>
        <updateDependencies>true</updateDependencies>
        <tag>wayang-0.7.1</tag>
        <scmReleaseCommitComment>@{prefix} prepare release 0.7.1</scmReleaseCommitComment>
        <tagNameFormat>apache-@{project.artifactId}-@{project.version}-incubating</tagNameFormat>
        <tagNameFormat>v${project.version}</tagNameFormat>
    </configuration>
</plugin>

It seems that these properties must be updated manually.

Warning regarding "illegal reflective access operation"

[ERROR] WARNING: An illegal reflective access operation has occurred
[ERROR] WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/Users/mkaempf/.m2/repository/org/codehaus/groovy/groovy-all/2.4.9/groovy-all-2.4.9.jar) to method java.lang.Object.finalize()
[ERROR] WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
[ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[ERROR] WARNING: All illegal access operations will be denied in a future release

This is still an OPEN ISSUE!

RAT Check fails

[INFO] [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.13:check (license-check) on project wayang: Too many files with unapproved license: 1 See RAT report in: /Users/mkaempf/GITHUB.private/incubator-wayang/target/rat.txt -> [Help 1]

*****************************************************

Printing headers for text files without a valid license header...
 
=====================================================
== File: .java-version
=====================================================
11.0

FIXED by adding .java-version to .gitignore

Tag could not be created in SCM.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.0.1:prepare (default-cli) on project wayang: Unable to tag SCM
[ERROR] Provider message:
[ERROR] The git-tag command failed.
[ERROR] Command output:
[ERROR] fatal: tag 'wayang-0.7.1' already exists

FIXED by manual changes in pom.xml.

<scm>
    <connection>scm:git:https://gitbox.apache.org/repos/asf/incubator-wayang.git</connection>
    <developerConnection>scm:git:https://gitbox.apache.org/repos/asf/incubator-wayang.git</developerConnection>
    <url>https://github.com/apache/incubator-wayang</url>
    <tag>1.0.0-RC2-SNAPSHOT</tag>
</scm>

<configuration>
    <autoVersionSubmodules>true</autoVersionSubmodules>
    <autoResolveSnapshots>all</autoResolveSnapshots>
    <releaseProfiles>apache-release</releaseProfiles>
    <!--<pushChanges>false</pushChanges>-->
    <!--<dryRun>true</dryRun>-->
    <releaseVersion>1.0.0-RC2-SNAPSHOT</releaseVersion>
    <updateWorkingCopyVersions>true</updateWorkingCopyVersions>
    <updateDependencies>true</updateDependencies>
    <tag>1.0.0-RC2-SNAPSHOT</tag>
    <scmReleaseCommitComment>@{prefix} prepare release 1.0.0-RC2-SNAPSHOT</scmReleaseCommitComment>
    <tagNameFormat>apache-@{project.artifactId}-@{project.version}-incubating</tagNameFormat>
    <tagNameFormat>v${project.version}</tagNameFormat>
</configuration>

mvn release:prepare -Darguments='-DskipTests=True -Dresume=True'

mvn clean package

mvn release:perform -X -DskipTests

Note: you need the password for your keystore to sign the build artefacts.

So far so good. But now the sun went down.

[INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) on project wayang: ArtifactDeployerException: Failed to deploy artifacts: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): NullPointerException -> [Help 1]
[INFO] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) on project wayang: ArtifactDeployerException
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:333)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] Caused by: org.apache.maven.plugin.MojoExecutionException: ArtifactDeployerException
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.deployProject (DeployMojo.java:201)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.execute (DeployMojo.java:159)
[INFO]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] Caused by: org.apache.maven.shared.transfer.artifact.deploy.ArtifactDeployerException: Failed to deploy artifacts: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): NullPointerException
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.Maven31ArtifactDeployer.deploy (Maven31ArtifactDeployer.java:126)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.DefaultArtifactDeployer.deploy (DefaultArtifactDeployer.java:79)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:190)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:134)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.deployProject (DeployMojo.java:193)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.execute (DeployMojo.java:159)
[INFO]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] Caused by: org.eclipse.aether.deployment.DeploymentException: Failed to deploy artifacts: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): NullPointerException
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:278)
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:202)
[INFO]     at org.eclipse.aether.internal.impl.DefaultRepositorySystem.deploy (DefaultRepositorySystem.java:393)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.Maven31ArtifactDeployer.deploy (Maven31ArtifactDeployer.java:122)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.DefaultArtifactDeployer.deploy (DefaultArtifactDeployer.java:79)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:190)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:134)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.deployProject (DeployMojo.java:193)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.execute (DeployMojo.java:159)
[INFO]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] Caused by: org.eclipse.aether.transfer.ArtifactTransferException: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): NullPointerException
[INFO]     at org.eclipse.aether.connector.basic.ArtifactTransportListener.transferFailed (ArtifactTransportListener.java:44)
[INFO]     at org.eclipse.aether.connector.basic.BasicRepositoryConnector$TaskRunner.run (BasicRepositoryConnector.java:417)
[INFO]     at org.eclipse.aether.connector.basic.BasicRepositoryConnector.put (BasicRepositoryConnector.java:297)
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:271)
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:202)
[INFO]     at org.eclipse.aether.internal.impl.DefaultRepositorySystem.deploy (DefaultRepositorySystem.java:393)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.Maven31ArtifactDeployer.deploy (Maven31ArtifactDeployer.java:122)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.DefaultArtifactDeployer.deploy (DefaultArtifactDeployer.java:79)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:190)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:134)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.deployProject (DeployMojo.java:193)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.execute (DeployMojo.java:159)
[INFO]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] Caused by: java.lang.NullPointerException
[INFO]     at java.util.concurrent.ConcurrentHashMap.putVal (ConcurrentHashMap.java:1011)
[INFO]     at java.util.concurrent.ConcurrentHashMap.put (ConcurrentHashMap.java:1006)
[INFO]     at org.apache.http.impl.client.BasicCredentialsProvider.setCredentials (BasicCredentialsProvider.java:62)
[INFO]     at org.eclipse.aether.transport.http.DeferredCredentialsProvider.getCredentials (DeferredCredentialsProvider.java:67)
[INFO]     at org.apache.http.client.protocol.RequestAuthCache.doPreemptiveAuth (RequestAuthCache.java:135)
[INFO]     at org.apache.http.client.protocol.RequestAuthCache.process (RequestAuthCache.java:110)
[INFO]     at org.apache.http.protocol.ImmutableHttpProcessor.process (ImmutableHttpProcessor.java:133)
[INFO]     at org.apache.http.impl.execchain.ProtocolExec.execute (ProtocolExec.java:184)
[INFO]     at org.apache.http.impl.execchain.RetryExec.execute (RetryExec.java:89)
[INFO]     at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute (ServiceUnavailableRetryExec.java:85)
[INFO]     at org.apache.http.impl.execchain.RedirectExec.execute (RedirectExec.java:110)
[INFO]     at org.apache.http.impl.client.InternalHttpClient.doExecute (InternalHttpClient.java:185)
[INFO]     at org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:72)
[INFO]     at org.eclipse.aether.transport.http.HttpTransporter.execute (HttpTransporter.java:485)
[INFO]     at org.eclipse.aether.transport.http.HttpTransporter.implPut (HttpTransporter.java:469)
[INFO]     at org.eclipse.aether.spi.connector.transport.AbstractTransporter.put (AbstractTransporter.java:107)
[INFO]     at org.eclipse.aether.connector.basic.BasicRepositoryConnector$PutTaskRunner.runTask (BasicRepositoryConnector.java:564)
[INFO]     at org.eclipse.aether.connector.basic.BasicRepositoryConnector$TaskRunner.run (BasicRepositoryConnector.java:414)
[INFO]     at org.eclipse.aether.connector.basic.BasicRepositoryConnector.put (BasicRepositoryConnector.java:297)
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:271)
[INFO]     at org.eclipse.aether.internal.impl.DefaultDeployer.deploy (DefaultDeployer.java:202)
[INFO]     at org.eclipse.aether.internal.impl.DefaultRepositorySystem.deploy (DefaultRepositorySystem.java:393)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.Maven31ArtifactDeployer.deploy (Maven31ArtifactDeployer.java:122)
[INFO]     at org.apache.maven.shared.transfer.artifact.deploy.internal.DefaultArtifactDeployer.deploy (DefaultArtifactDeployer.java:79)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:190)
[INFO]     at org.apache.maven.shared.transfer.project.deploy.internal.DefaultProjectDeployer.deploy (DefaultProjectDeployer.java:134)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.deployProject (DeployMojo.java:193)
[INFO]     at org.apache.maven.plugins.deploy.DeployMojo.execute (DeployMojo.java:159)
[INFO]     at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
[INFO]     at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
[INFO]     at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
[INFO]     at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
[INFO]     at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
[INFO]     at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
[INFO]     at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
[INFO]     at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
[INFO]     at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
[INFO]     at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[INFO]     at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
[INFO]     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[INFO]     at java.lang.reflect.Method.invoke (Method.java:566)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
[INFO]     at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[INFO] [ERROR]
[INFO] [ERROR]
[INFO] [ERROR] For more information about the errors and possible solutions, please read the following articles:
[INFO] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[INFO] [DEBUG] Shutting down adapter factory; available factories [file-lock, rwlock-local, semaphore-local, noop]; available name mappers [discriminating, file-gav, file-hgav, file-static, gav, static]
[INFO] [DEBUG] Shutting down 'file-lock' factory
[INFO] [DEBUG] Shutting down 'rwlock-local' factory
[INFO] [DEBUG] Shutting down 'semaphore-local' factory
[INFO] [DEBUG] Shutting down 'noop' factory
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Wayang (incubating) 1.0.0-RC3-SNAPSHOT:
[INFO]
[INFO] Apache Wayang (incubating) ......................... FAILURE [ 23.626 s]
[INFO] Wayang Commons ..................................... SKIPPED
[INFO] wayang-utils-profile-db ............................ SKIPPED
[INFO] Wayang Core ........................................ SKIPPED
[INFO] Wayang Basic ....................................... SKIPPED
[INFO] Wayang Platform .................................... SKIPPED
[INFO] Wayang Platform Java ............................... SKIPPED
[INFO] Wayang Platform Spark .............................. SKIPPED
[INFO] Wayang Platform JDBC Template ...................... SKIPPED
[INFO] Wayang Platform Postgres ........................... SKIPPED
[INFO] Wayang Platform SQLite3 ............................ SKIPPED
[INFO] Wayang Platform Giraph ............................. SKIPPED
[INFO] Wayang Platform Apache Flink ....................... SKIPPED
[INFO] Wayang Platform Generic Jdbc ....................... SKIPPED
[INFO] Wayang API ......................................... SKIPPED
[INFO] Wayang API Scala-Java .............................. SKIPPED
[INFO] Wayang Integration Test ............................ SKIPPED
[INFO] Wayang API Python .................................. SKIPPED
[INFO] wayang-api-sql ..................................... SKIPPED
[INFO] Wayang Profiler .................................... SKIPPED
[INFO] Wayang Extensions .................................. SKIPPED
[INFO] wayang-iejoin ...................................... SKIPPED
[INFO] Wayang - Common resources .......................... SKIPPED
[INFO] wayang-benchmark ................................... SKIPPED
[INFO] Wayang ML4all ...................................... SKIPPED
[INFO] Wayang Project Assembly ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  24.093 s
[INFO] Finished at: 2024-06-25T10:53:32+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.0.1:perform (default-cli) on project wayang: Maven execution failed, exit code: 1 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-release-plugin:3.0.1:perform (default-cli) on project wayang: Maven execution failed, exit code: 1
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:333)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
Caused by: org.apache.maven.plugin.MojoExecutionException: Maven execution failed, exit code: 1
at org.apache.maven.plugins.release.PerformReleaseMojo.execute (PerformReleaseMojo.java:198)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
Caused by: org.apache.maven.shared.release.ReleaseExecutionException: Maven execution failed, exit code: 1
at org.apache.maven.shared.release.phase.AbstractRunGoalsPhase.execute (AbstractRunGoalsPhase.java:115)
at org.apache.maven.shared.release.phase.RunPerformGoalsPhase.runLogic (RunPerformGoalsPhase.java:127)
at org.apache.maven.shared.release.phase.RunPerformGoalsPhase.execute (RunPerformGoalsPhase.java:59)
at org.apache.maven.shared.release.DefaultReleaseManager.perform (DefaultReleaseManager.java:325)
at org.apache.maven.shared.release.DefaultReleaseManager.perform (DefaultReleaseManager.java:268)
at org.apache.maven.plugins.release.PerformReleaseMojo.execute (PerformReleaseMojo.java:196)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
Caused by: org.apache.maven.shared.release.exec.MavenExecutorException: Maven execution failed, exit code: 1
at org.apache.maven.shared.release.exec.InvokerMavenExecutor.executeGoals (InvokerMavenExecutor.java:129)
at org.apache.maven.shared.release.exec.AbstractMavenExecutor.executeGoals (AbstractMavenExecutor.java:70)
at org.apache.maven.shared.release.phase.AbstractRunGoalsPhase.execute (AbstractRunGoalsPhase.java:105)
at org.apache.maven.shared.release.phase.RunPerformGoalsPhase.runLogic (RunPerformGoalsPhase.java:127)
at org.apache.maven.shared.release.phase.RunPerformGoalsPhase.execute (RunPerformGoalsPhase.java:59)
at org.apache.maven.shared.release.DefaultReleaseManager.perform (DefaultReleaseManager.java:325)
at org.apache.maven.shared.release.DefaultReleaseManager.perform (DefaultReleaseManager.java:268)
at org.apache.maven.plugins.release.PerformReleaseMojo.execute (PerformReleaseMojo.java:196)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[DEBUG] Shutting down adapter factory; available factories [file-lock, rwlock-local, semaphore-local, noop]; available name mappers [discriminating, file-gav, file-hgav, file-static, gav, static]
[DEBUG] Shutting down 'file-lock' factory
[DEBUG] Shutting down 'rwlock-local' factory
[DEBUG] Shutting down 'semaphore-local' factory
[DEBUG] Shutting down 'noop' factory

The NullPointerException pointed to a problem in settings.xml, which I was able to fix. Now, however, the error has changed, and I am still not able to stage the build results.

mvn release:perform -X -DskipTests

[INFO] Caused by: org.eclipse.aether.deployment.DeploymentException: Failed to deploy artifacts: Could not transfer artifact org.apache.wayang:wayang:pom:1.0.0-RC2 from/to apache.releases.https (https://repository.apache.org/service/local/staging/deploy/maven2): status code: 401, reason phrase: Unauthorized (401)

Checklist for next iteration:

The error you're encountering indicates a problem with deploying the artifacts using the maven-deploy-plugin. Its root cause is a NullPointerException raised while credentials were being set (in BasicCredentialsProvider.setCredentials). This exception surfaces as an org.eclipse.aether.transfer.ArtifactTransferException.

Here's a step-by-step guide to troubleshoot and resolve this issue:

1. Check Maven Settings

Ensure that your Maven settings (settings.xml) are correctly configured for deployment. Verify that the repository settings and credentials are correctly specified.

[DONE]
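The NullPointerException was traced back to settings.xml, so a `<server>` entry with incomplete credentials is one plausible culprit. The following is a minimal sketch of a quick check for item 1. It parses a settings.xml and reports server entries that lack a username or password. `SettingsCheck` and `findIncompleteServers` are hypothetical names. The check assumes password-based authentication, so it ignores encrypted passwords and would wrongly flag entries that use `<privateKey>` instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports <server> entries in settings.xml without a username or password.
// It does not handle encrypted passwords or key-based authentication (<privateKey>).
class SettingsCheck {

    // Returns one message per missing field, e.g. "some.id: missing <password>".
    static List<String> findIncompleteServers(String settingsXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(settingsXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("settings.xml could not be parsed", e);
        }
        List<String> problems = new ArrayList<>();
        NodeList servers = doc.getElementsByTagName("server");
        for (int i = 0; i < servers.getLength(); i++) {
            Element server = (Element) servers.item(i);
            String id = childText(server, "id");
            for (String field : new String[] {"username", "password"}) {
                if (childText(server, field) == null) {
                    problems.add(id + ": missing <" + field + ">");
                }
            }
        }
        return problems;
    }

    // Trimmed text of the first direct child element named tag, or null if absent or blank.
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element && ((Element) child).getTagName().equals(tag)) {
                String text = child.getTextContent().trim();
                return text.isEmpty() ? null : text;
            }
        }
        return null;
    }
}
```

If you pass it the contents of `~/.m2/settings.xml`, it should return an empty list when every server entry carries both fields.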

2. Verify Repository URL

Make sure the repository URL in your pom.xml or settings.xml is correct and reachable. The URL should point to the correct staging repository for deployment.

[OPEN] - I did not change it, so I assume the URL in the pom.xml is correct.

3. Maven Version Compatibility

Verify that you are using a compatible version of Maven. Sometimes, upgrading or downgrading Maven can resolve such issues.

Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /usr/local/Cellar/maven/3.9.6/libexec
Java version: 11.0.15.1, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk-11.0.15.1.jdk/Contents/Home
Default locale: en_DE, platform encoding: US-ASCII
OS name: "mac os x", version: "13.2.1", arch: "x86_64", family: "mac"

[DONE]

4. Check for Network Issues

Ensure there are no network issues that might be causing problems in connecting to the repository. Sometimes, network configurations, firewalls, or proxy settings can interfere with the deployment process.

[DONE]

5. Update Maven Plugins

Ensure that you are using the latest versions of the Maven plugins. Sometimes, bugs in older versions can cause unexpected issues.

[OPEN]

6. Configure the `maven-deploy-plugin` in `pom.xml`

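Make sure the maven-deploy-plugin is correctly configured in your pom.xml. Note that the deploy goal does not take `repositoryId` or `url` parameters; those belong to the deploy-file goal. Instead, it deploys to the repository declared in `<distributionManagement>`, whose `<id>` must match a `<server>` id in settings.xml. Here's an example configuration:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-deploy-plugin</artifactId>
    <version>3.0.0-M1</version>
</plugin>

<!-- The deploy goal reads its target from distributionManagement; the id must match a <server> id in settings.xml. -->
<!-- Projects inheriting from the Apache parent POM already get this declaration from the parent. -->
<distributionManagement>
    <repository>
        <id>apache.releases.https</id>
        <url>https://repository.apache.org/service/local/staging/deploy/maven2</url>
    </repository>
</distributionManagement>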

7. Increase Verbose Logging

Enable verbose logging to get more details about the error. You can do this by adding the -X flag when running the Maven command:

mvn clean deploy -X

[DONE]

8. Retry with a Clean Local Repository

Sometimes, a corrupt local repository can cause issues. Try cleaning your local Maven repository and re-running the deployment:

mvn clean install -U
mvn deploy

[OPEN] - This does not seem related to an authentication issue.

9. Check for Missing Credentials

Ensure that the credentials for the repository are correctly set up in your settings.xml:

<servers>
    <server>
        <id>apache.releases.https</id>
        <username>your-username</username>
        <password>your-password</password>
    </server>
</servers>

[DONE]
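Maven sends the credentials of the `<server>` whose `<id>` equals the id of the deployment repository. If no id matches, the upload goes out unauthenticated, and the repository can answer with exactly the 401 Unauthorized shown earlier. The deployment repository is declared in the POM's `<distributionManagement>` section, which Apache projects typically inherit from the Apache parent POM. The effective POM is therefore the file to inspect; you can generate it with, for example, `mvn help:effective-pom -Doutput=effective-pom.xml`. The following is a minimal sketch, assuming a hypothetical helper `RepositoryIdCheck`, that lists deployment repository ids with no matching server entry.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: finds deployment repositories in an (effective) POM
// that have no <server> entry with the same id in settings.xml.
class RepositoryIdCheck {

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Direct child elements of parent named tag.
    private static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Trimmed text of the direct <id> child, or "" if there is none.
    private static String idOf(Element e) {
        List<Element> ids = children(e, "id");
        return ids.isEmpty() ? "" : ids.get(0).getTextContent().trim();
    }

    // Ids of <repository>/<snapshotRepository> under <distributionManagement> with no matching server.
    static Set<String> unmatchedRepositoryIds(String pomXml, String settingsXml) {
        Set<String> serverIds = new HashSet<>();
        NodeList servers = parse(settingsXml).getElementsByTagName("server");
        for (int i = 0; i < servers.getLength(); i++) {
            serverIds.add(idOf((Element) servers.item(i)));
        }
        Set<String> unmatched = new TreeSet<>();
        NodeList sections = parse(pomXml).getElementsByTagName("distributionManagement");
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            for (String tag : new String[] {"repository", "snapshotRepository"}) {
                for (Element repo : children(section, tag)) {
                    String id = idOf(repo);
                    if (!serverIds.contains(id)) {
                        unmatched.add(id);
                    }
                }
            }
        }
        return unmatched;
    }
}
```

Only `<repository>` elements directly under `<distributionManagement>` are considered, so dependency repositories declared in `<repositories>` do not produce false alarms.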

10. Review the Full Stack Trace

The full stack trace indicates a NullPointerException:

Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.putVal (ConcurrentHashMap.java:1011)
    at java.util.concurrent.ConcurrentHashMap.put (ConcurrentHashMap.java:1006)
    at org.apache.http.impl.client.BasicCredentialsProvider.setCredentials (BasicCredentialsProvider.java:62)
    at org.eclipse.aether.transport.http.DeferredCredentialsProvider.getCredentials (DeferredCredentialsProvider.java:67)
    at org.apache.http.client.protocol.RequestAuthCache.doPreemptiveAuth (RequestAuthCache.java:135)
    at org.apache.http.client.protocol.RequestAuthCache.process (RequestAuthCache.java:110)
    at org.apache.http.protocol.ImmutableHttpProcessor.process (ImmutableHttpProcessor.java:133)
    at org.apache.http.impl.execchain.ProtocolExec.execute (ProtocolExec.java:184)
    ...

This suggests that there might be an issue with how credentials are being handled. Double-check that the credentials are being correctly passed and processed.

[DONE]
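The two innermost frames of that trace explain the NullPointerException itself. `ConcurrentHashMap` rejects null keys and values, so the map put inside `BasicCredentialsProvider.setCredentials` must have received a null argument. That is consistent with a server entry whose credentials could not be resolved. The following sketch reproduces only this JDK behavior and contrasts it with `HashMap`, which accepts nulls. `NullCredentialsDemo` is a hypothetical name, and the map merely stands in for the credentials store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration of the JDK behavior behind the innermost stack frames:
// ConcurrentHashMap rejects null keys and values, HashMap accepts them.
class NullCredentialsDemo {

    // True if map.put(key, value) throws a NullPointerException for the given map.
    static boolean putThrowsNpe(Map<String, String> map, String key, String value) {
        try {
            map.put(key, value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // The map stands in for a credentials store keyed by host; a null value models missing credentials.
    static boolean concurrentStoreRejects(String key, String value) {
        return putThrowsNpe(new ConcurrentHashMap<>(), key, value);
    }

    static boolean plainStoreRejects(String key, String value) {
        return putThrowsNpe(new HashMap<>(), key, value);
    }
}
```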

Integrating ML platforms in Wayang

May 7, 2024 · 4 min read

Zoi Kaoudi

(P)PMC Apache Wayang

We are happy to announce that we have extended Wayang so that it can utilize any ML platform and any ML operators. Thanks to the extensible nature of Wayang, the only core changes we had to make were introducing the concept of a Model and implementing a new driver for the newly added platform.

Step 1: Introducing a Model

With respect to the model, we followed Wayang’s abstraction philosophy: We created a Model interface to be used as input or output by Wayang operators and then extended it for the platform-specific operators. Different model interfaces can be found here:

https://github.com/apache/incubator-wayang/tree/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/model

A platform-specific model needs to be instantiated to be used as the output of a training operator and as input for an inference operator. You can see an example of the SparkMLModel here:

https://github.com/apache/incubator-wayang/tree/main/wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/model/SparkMLModel.java
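To make this layering concrete, here is a minimal sketch under assumed, simplified types. It has three parts: a platform-agnostic marker interface, a platform-agnostic interface for one model family, and one platform-specific implementation. `SketchModel`, `SketchLinearModel`, and `LocalLinearModel` are hypothetical and do not mirror the classes in the linked Wayang packages.

```java
// Hypothetical, simplified types; they illustrate the layering, not Wayang's actual interfaces.

// Platform-agnostic marker: anything an operator can emit or consume as a model.
interface SketchModel {}

// Platform-agnostic view of one model family.
interface SketchLinearModel extends SketchModel {
    double predict(double[] features);
}

// Platform-specific implementation. Here the "platform" is plain Java memory; a Spark- or
// TensorFlow-backed variant would wrap that platform's native model object instead.
final class LocalLinearModel implements SketchLinearModel {
    private final double[] weights;
    private final double intercept;

    LocalLinearModel(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    @Override
    public double predict(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

In this sketch, consumers depend only on `SketchLinearModel`, so they do not need to know which platform produced the model.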

Step 2: Introducing Training Operators

We added the desired Wayang (platform-agnostic) training operators. These are binary-to-unary operators: they take the X and y values as input and output a Model. You can find an example, the LinearRegressionOperator, here:

https://github.com/apache/incubator-wayang/blob/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/operators/LinearRegressionOperator.java
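The binary-to-unary shape of a training operator (two inputs, X and y; one output, a model) can be sketched in plain Java. This is a hypothetical stand-in, not Wayang's LinearRegressionOperator. `BinaryToUnary`, `FittedLine`, and `LinearRegressionTrainer` are illustrative names. The fitting step is closed-form ordinary least squares for a single feature, whereas in Wayang the actual training is done by the platform-specific execution operator.

```java
// Hypothetical stand-in for a binary-to-unary operator: two inputs, one output.
interface BinaryToUnary<I0, I1, O> {
    O apply(I0 first, I1 second);
}

// The "model" produced by training: y = slope * x + intercept.
final class FittedLine {
    final double slope;
    final double intercept;

    FittedLine(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }
}

// Training operator sketch: consumes features x and labels y, produces a model.
// Uses closed-form ordinary least squares for one feature, for illustration only.
final class LinearRegressionTrainer implements BinaryToUnary<double[], double[], FittedLine> {
    @Override
    public FittedLine apply(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double covariance = 0, varianceX = 0;
        for (int i = 0; i < x.length; i++) {
            covariance += (x[i] - meanX) * (y[i] - meanY);
            varianceX += (x[i] - meanX) * (x[i] - meanX);
        }
        if (varianceX == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = covariance / varianceX;
        return new FittedLine(slope, meanY - slope * meanX);
    }
}
```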

Platform-specific execution operators, such as SparkLinearRegressionOperator, can be added like any other execution operator: by extending the corresponding Wayang operator and providing the mappings from the Wayang operator to the execution operator. See, for example, the SparkLinearRegressionOperator:

https://github.com/apache/incubator-wayang/tree/main/wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/ml/SparkLinearRegressionOperator.java
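The idea of mapping a platform-agnostic operator to an execution operator can be illustrated with a small registry. Given a platform-agnostic operator, it builds the execution counterpart, which extends the platform-agnostic class. Everything here is hypothetical (`LinearRegressionOp`, `SparkLikeLinearRegressionOp`, `OperatorMappings`). Wayang's actual mapping mechanism is richer, and its optimizer chooses among candidate execution operators, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical platform-agnostic operator carrying one configuration flag.
class LinearRegressionOp {
    final boolean fitIntercept;

    LinearRegressionOp(boolean fitIntercept) {
        this.fitIntercept = fitIntercept;
    }
}

// Hypothetical execution operator: extends the platform-agnostic operator and copies its settings.
class SparkLikeLinearRegressionOp extends LinearRegressionOp {
    SparkLikeLinearRegressionOp(LinearRegressionOp that) {
        super(that.fitIntercept);
    }
}

// Registry: for each platform-agnostic operator class, how to build its execution counterpart.
final class OperatorMappings {
    private final Map<Class<?>, Function<Object, Object>> mappings = new HashMap<>();

    <A, E extends A> void register(Class<A> logical, Function<A, E> toExecution) {
        mappings.put(logical, op -> toExecution.apply(logical.cast(op)));
    }

    // Returns the execution operator, or empty if no mapping exists for this exact class.
    Optional<Object> translate(Object logicalOperator) {
        Function<Object, Object> f = mappings.get(logicalOperator.getClass());
        return f == null ? Optional.empty() : Optional.of(f.apply(logicalOperator));
    }
}
```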

Step 3: Introducing Prediction Operators

Additionally, we created a PredictOperator, a BinaryToUnary Wayang (platform-agnostic) operator that takes the data quanta and a model as input and outputs the data quanta together with the model's predictions.

https://github.com/apache/incubator-wayang/tree/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/operators/PredictOperator.java

Then, a concrete platform-specific operator extends the abstract one. See the SparkPredictOperator for an example:

https://github.com/apache/incubator-wayang/tree/main/wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/ml/SparkPredictOperator.java

Deep Learning Models

Deep learning models are defined more flexibly than traditional machine learning models: users can combine different blocks (e.g., fully connected blocks, convolutional blocks) to build the model they want. The whole model can be represented as a graph in which the vertices represent blocks and the edges represent connections between blocks. For this case, we built a DLModel class that implements the Model interface and contains a user-defined, platform-agnostic graph of the model:

https://github.com/apache/incubator-wayang/tree/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/model/DLModel.java
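A graph of blocks needs an execution order in which every block runs after all of its inputs. As a minimal sketch, assuming a plain DAG of string-named blocks rather than Wayang's actual DLModel graph classes (`ModelGraph` is a hypothetical name), a topological sort with Kahn's algorithm provides that order and rejects cyclic graphs:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical platform-agnostic model graph: vertices are blocks, edges are connections. */
final class ModelGraph {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    ModelGraph addBlock(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    ModelGraph connect(String from, String to) {
        addBlock(from);
        addBlock(to);
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    /** Kahn's algorithm: returns an order in which each block comes after all of its inputs. */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((block, degree) -> {
            if (degree == 0) ready.add(block);
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String block = ready.poll();
            order.add(block);
            for (String next : successors.get(block)) {
                // A block becomes ready once all of its inputs have been scheduled.
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("model graph contains a cycle");
        }
        return order;
    }
}
```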

For training, we implemented the platform-agnostic DLTrainingOperator Wayang operator:

https://github.com/apache/incubator-wayang/tree/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/operators/DLTrainingOperator.java

New ML platform -- Tensorflow Integration

We have added Tensorflow as a new platform by creating a new module (wayang-tensorflow) inside the wayang-platforms parent module and implementing a Tensorflow driver. The TensorflowExecutor driver is responsible for creating and destroying Tensorflow resources, such as a model graph and a model parameter context. When a training task is scheduled on Tensorflow, it is mapped to TensorflowDLModelTrainingOperator. In this process, the DLModel is converted to a TensorflowModel, which means that the user-defined model graph is converted to a Tensorflow model graph. Likewise, for inference, the PredictOperator is mapped to TensorflowPredictOperator. All the code for the Tensorflow platform can be found here:

https://github.com/apache/incubator-wayang/tree/main/wayang-platforms/wayang-tensorflow/src/main/java/org/apache/wayang/tensorflow

Acknowledgement

The source code for the support of ML operators and the Tensorflow integration has been contributed by Mingxi Liu.

Follow Wayang

Apache Wayang is in the incubation phase and has a potential roadmap of implementations coming soon (including the federated learning aspect as well as an SQL interface and a novel data debugging functionality). If you want to hear from or join the community, visit https://wayang.apache.org/community/, join the mailing lists, contribute new ideas, write documentation, or fix bugs.

Wayang and the Federated AI

April 17, 2024 · 3 min read

Gláucia Esppenchutz

(P)PMC Apache Wayang

AI systems and applications are widely used nowadays, from checking grammar and spelling to detecting early signs of cancer. Building an AI system requires a lot of data and training to achieve the desired results, and federated learning is an approach that makes AI training more viable. Federated learning (or collaborative learning) is a technique that trains AI models on data distributed across multiple servers or devices. It does so without centralizing the data in a single place or storage system, which also reduces the risk of data breaches and protects sensitive personal data. One of the significant challenges in working with AI is the variety of tools available on the market and in the open-source community. Each tool provides results in a different form, so integrating them can be quite challenging. Let's talk about Apache Wayang and how it can help to solve this problem.

Apache Wayang in the Federated AI world

Apache Wayang (Wayang, for short), an Apache Software Foundation top-level project, integrates big data platforms and tools and spares users from worrying about low-level details. Interestingly, even though it was not designed for this purpose, Wayang could also serve as a scalable platform for federated learning, and the Wayang community is starting to work on integrating federated learning capabilities. In a federated learning approach, Wayang would allow different local models to be built and their results to be exchanged across data centers, where they are combined into a single enhanced model.

A real-world example

Let's consider a real-world scenario. Hospitals and health organizations have increased their investments in machine and deep learning initiatives to gain insights and predict diagnoses. However, legal frameworks make it impossible to share patients' information or diagnoses, so federated learning is a natural solution. To solve this problem, we could use Wayang to help train the models. See diagram 1 below:

As a first step, the data scientists would send an ML task to Wayang, which acts as an abstraction layer that connects to different data processing platforms, sparing them the time of building integration code for each one. Next, the data platforms process the task and send their results back to Wayang. Finally, Wayang aggregates the results into one "global result" and sends it back to the requester.

The process repeats until the desired results are achieved. Although this closely resembles a typical federated learning pipeline, Wayang removes a considerable layer of complexity for developers by integrating with diverse types of data platforms. It also speeds up development and reduces the need for a deep understanding of data infrastructure or integrations. Developers can focus on the logic and on how to execute tasks instead of on the details of the data processors.
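The post leaves open how Wayang combines the partial results into one "global result". A common choice in federated learning is federated averaging (FedAvg), in which each participant's model parameters are weighted by the number of local training examples. The following Java sketch shows only that aggregation step; `LocalUpdate` and `FederatedAveraging` are hypothetical names, and nothing here is Wayang's actual implementation.

```java
import java.util.List;

/** Hypothetical result of one participant: locally trained parameters and local sample count. */
final class LocalUpdate {
    final double[] weights;
    final long numExamples;

    LocalUpdate(double[] weights, long numExamples) {
        this.weights = weights;
        this.numExamples = numExamples;
    }
}

final class FederatedAveraging {
    /** FedAvg-style aggregation: example-weighted average of the local parameter vectors. */
    static double[] aggregate(List<LocalUpdate> updates) {
        if (updates.isEmpty()) {
            throw new IllegalArgumentException("no local updates");
        }
        int dim = updates.get(0).weights.length;
        double[] global = new double[dim];
        long total = 0;
        for (LocalUpdate u : updates) {
            if (u.weights.length != dim) {
                throw new IllegalArgumentException("parameter dimension mismatch");
            }
            total += u.numExamples;
            for (int i = 0; i < dim; i++) {
                global[i] += u.weights[i] * u.numExamples;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no training examples");
        }
        for (int i = 0; i < dim; i++) {
            global[i] /= total;
        }
        return global;
    }
}
```

Only parameters and counts leave each site in this scheme; the raw patient records never do, which is the property the hospital scenario depends on.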

Follow Wayang

Apache Wayang is a top-level ASF project with a potential roadmap of implementations coming soon (including the federated learning aspect as well as an SQL interface and a novel data debugging functionality). If you want to hear from or join the community, visit https://wayang.apache.org/community/, join the mailing lists, contribute new ideas, write documentation, or fix bugs.

Thank you!

I (Gláucia) want to thank Professor Jorge Quiané for his guidance in writing this blog post. Thank you for encouraging me to join the project and for the knowledge you shared. I will always remember you.

Pywayang - Apache Wayang's Python API

April 9, 2024 · 4 min read

Juri Petersen

Apache Committer

In the vast landscape of data processing, efficiency and flexibility are important. However, navigating a multitude of tools and languages is often a major inconvenience. Apache Wayang's upcoming Python API will allow you to orchestrate data processing tasks seamlessly without ever leaving the comfort of Python, even though the underlying framework is written in Java.

Expanding Apache Wayang's APIs

Apache Wayang's architecture decouples planning from the resulting execution, allowing users to specify platform-agnostic plans through the provided APIs.

Python's popularity and convenience for data processing workloads make it an obvious candidate for a new API. Previous APIs, such as the Scala API wayang-api-scala-java, benefited from the interoperability of Java and Scala, which allows objects from one language to be reused in the other to provide new interfaces. Accessing JVM objects from Python is possible through several libraries, but with that approach, every future API in another programming language would need similar libraries and implementations of its own. In contrast, an API within Apache Wayang that receives input plans from any source and executes them makes it possible to create and submit plans in any programming language. The following figure shows the architecture of pywayang:

The Python API allows users to specify WayangPlans with UDFs in Python. pywayang then serializes the UDFs and constructs the WayangPlan in JSON format, preparing it to be sent to Apache Wayang's JSON API. When it receives a valid JSON plan, the JSON API uses the optimizer to construct an execution plan. However, since the UDFs are defined in Python and therefore need to be executed in Python as well, an operator's function needs to be wrapped into a WrappedPythonFunction:

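// `udf` holds the serialized Python function; Input and Output are the element types.
val mapOperator = new MapPartitionsOperator[Input, Output](
  new MapPartitionsDescriptor[Input, Output](
    // The Python UDF cannot run on the JVM, so it is wrapped and executed by the pywayang worker.
    new WrappedPythonFunction[Input, Output](
      ByteString.copyFromUtf8(udf)
    ),
    classOf[Input],
    classOf[Output]
  )
)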

This wrapped functional descriptor makes it possible to execute UDFs in Python through a socket connection with the pywayang worker. Input data is sourced from the platform chosen by the optimizer, and Apache Wayang handles routing the output data to the next operator.

A new API in any programming language would have to provide two things:

A way to create plans that conform to the JSON format specified by the Wayang JSON API.
A worker that handles encoding and decoding of user-defined functions (UDFs), since they need to be executed on iterables in their respective language.

After that, the API can be added as a module in Wayang, so that operators are wrapped and UDFs can be executed in the desired programming language.
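The post does not specify how the worker encodes records on the wire. As a hypothetical illustration of the second requirement only, the following Java sketch frames a batch of string records with length prefixes so that a worker written in another language could decode them from a socket stream. `RecordFraming` and its frame layout are assumptions, not pywayang's actual protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical length-prefixed framing for shipping a batch of records to a UDF worker. */
final class RecordFraming {
    /** Layout: [int recordCount], then per record [int byteLength][UTF-8 bytes]. */
    static byte[] encode(List<String> records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (String record : records) {
                byte[] utf8 = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length); // byte length, not character count
                out.write(utf8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            int count = in.readInt();
            List<String> records = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                records.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return records;
        } catch (IOException e) {
            // Also covers truncated frames (EOFException).
            throw new UncheckedIOException(e);
        }
    }
}
```

Prefixing each record with its UTF-8 byte length, rather than its character count, is what lets a decoder in another language read non-ASCII records correctly.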

Apache Kafka meets Wayang - Part 3

March 10, 2024 · 6 min read

Mirko Kämpf

(P)PMC Apache Wayang

The third part of this article series is an activity log. Motivated by what we learned last time, I started implementing a Kafka Source component and a Kafka Sink component for the Apache Spark platform in Apache Wayang. In our previous article, we shared the results of our work on the first Apache Kafka integration using the Java Platform.

Let's see how it goes this time with Apache Spark.

The goal of this implementation

We want to process data from Apache Kafka topics hosted on Confluent Cloud. In our example scenario, data is available in multiple clusters, in different regions, and owned by different organizations. Each organization uses the "stream sharing" feature provided by Confluent Cloud.

This way, the operator of our central processing job has been granted the appropriate permissions. The platform provided us with the necessary configuration properties, including access coordinates and credentials issued in the name of the topic owner.
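Confluent Cloud clients typically authenticate with the `SASL_SSL` security protocol and the `PLAIN` SASL mechanism, using an API key and secret as username and password. The following Java sketch turns such shared access coordinates into Kafka client properties. The property keys are standard Kafka client settings; `SharedTopicConfig` and the placeholder values in the test are hypothetical, and the real values come from the configuration the platform hands out.

```java
import java.util.Properties;

/** Kafka client properties for a cluster secured with SASL_SSL and the PLAIN mechanism. */
final class SharedTopicConfig {
    static Properties forSharedTopic(String bootstrapServers, String apiKey, String apiSecret) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "PLAIN");
        // The API key and secret issued for the shared stream act as username and password.
        // Note: quotes inside the secret are not escaped in this sketch.
        props.setProperty("sasl.jaas.config", String.format(
                "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"%s\" password=\"%s\";",
                apiKey, apiSecret));
        return props;
    }
}
```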

The following illustration has already been introduced in part one of the blog series, but for clarity we repeat it here.

images/image-1.png

Today, we focus on Job 4 in the image. We implement a program which uses data federation based on multiple sources. Each source allows us to read the data from that particular topic so that we can process it in a different governance context. In this example it is a public processing context, in which data from multiple private processing contexts are used together.

This use case is already prepared for high processing loads. We can use the scalability of Apache Spark, or simply use a Java program for initial tests of the solution. In Apache Wayang, switching between the two takes one line of code.

Again, we start with a WayangContext, as shown by examples in the Wayang code repository.

WayangContext wayangContext = new WayangContext().with(Spark.basicPlugin());

We simply switched the backend system to Apache Spark by using the WayangContext with Spark.basicPlugin(). The JavaPlanBuilder and all other logic of our example job remain untouched.

To make this work, we now implement the mappings and the operators for the Apache Spark platform module.

Implementation of Input- and Output Operators

We reuse the Kafka Source and Kafka Sink components that were created for JavaKafkaSource and JavaKafkaSink. Hence, we work with Wayang's Java API.

Level 1 – Wayang execution plan with abstract operators

Since the JavaPlanBuilder already exposes the function for selecting a Kafka topic as a source, and the DataQuantaBuilder class exposes the writeKafkaTopic function, we can move on quickly.

Remember, in this API layer we use the Scala programming language, but we use the Java classes implemented in the layer below.

Level 2 – Wiring between Platform Abstraction and Implementation

As with the Java platform, in the second layer we build a bridge between the WayangContext and the PlanBuilders, which work together with DataQuanta and the DataQuantaBuilder.

We must provide the mapping between the abstract components and the specific implementations in this layer.

Therefore, the mappings package in the project wayang-platforms/wayang-spark contains a class Mappings, in which our KafkaTopicSinkMapping and KafkaTopicSourceMapping are registered.

Again, these classes allow the Apache Wayang framework to use the Java implementation of the KafkaTopicSource component (and KafkaTopicSink respectively).

While the Wayang execution plan uses the higher-level abstractions, here on the “platform level” we have to link the specific implementation for the target platform. In this case, this leads to an Apache Spark job running on a Spark cluster, which is set up by the Apache Wayang framework using the logical components of the execution plan and the Apache Spark configuration provided at runtime.

A mapping links an operator implementation to the abstraction used in an execution plan. We define two new mappings for our purpose, namely KafkaTopicSourceMapping and KafkaTopicSinkMapping; both could be reused from the previous round.

For the Spark platform, we simply replace the occurrences of JavaPlatform with SparkPlatform.

Furthermore, we create an implementation of the SparkKafkaTopicSource and SparkKafkaTopicSink.
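Wayang's real mappings are richer than a lookup table, but the core idea, one logical operator that can be translated into different platform-specific operators, can be pictured with a small registry. Everything in this Java sketch (`TargetPlatform`, `LogicalKafkaSource`, `ExecutionOperator`, `MappingRegistry`) is hypothetical and is not Wayang's Mapping API.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical set of execution platforms. */
enum TargetPlatform { JAVA, SPARK }

/** Hypothetical logical (platform-agnostic) source operator in a plan. */
final class LogicalKafkaSource {
    final String topic;

    LogicalKafkaSource(String topic) {
        this.topic = topic;
    }
}

/** Hypothetical execution operator bound to exactly one platform. */
interface ExecutionOperator {
    String describe();
}

/** Registry that maps one logical operator to one execution operator per platform. */
final class MappingRegistry {
    private final Map<TargetPlatform, Function<LogicalKafkaSource, ExecutionOperator>> mappings =
            new EnumMap<>(TargetPlatform.class);

    MappingRegistry register(TargetPlatform platform, Function<LogicalKafkaSource, ExecutionOperator> mapping) {
        mappings.put(platform, mapping);
        return this;
    }

    ExecutionOperator translate(LogicalKafkaSource logical, TargetPlatform platform) {
        Function<LogicalKafkaSource, ExecutionOperator> mapping = mappings.get(platform);
        if (mapping == null) {
            throw new IllegalStateException("no mapping registered for " + platform);
        }
        return mapping.apply(logical);
    }
}
```

In this picture, supporting Spark means registering one more mapping while the logical plan stays unchanged, which is the spirit of replacing JavaPlatform with SparkPlatform.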

Layer 3 – Input/Output Connector Layer

Let's quickly recap: how does Apache Spark interact with Apache Kafka?

There is already an integration that gives us a Dataset using the Spark SQL framework. For Spark Structured Streaming, there is also a Kafka integration using the SparkSession's readStream() function. Kafka client properties are provided as key-value pairs k and v through the option(k, v) function. For writing into a topic, we can use the writeStream() function. At first glance, however, this does not seem to be the best fit.

Another approach is possible: we can use plain RDDs to process data previously consumed from Apache Kafka. This is a lower-level approach than using Datasets with Spark Structured Streaming, and it typically involves the Kafka RDD API provided by Spark.

This approach is less common with newer versions of Spark, as Structured Streaming provides a higher-level abstraction that simplifies stream processing. However, we might need that approach for the integration with Apache Wayang.

For now, we focus on the lower-level approach: we consume data from Kafka using a Kafka client and then parallelize the records into an RDD.

This allows us to reuse the KafkaTopicSource and KafkaTopicSink classes we built last time. Those were made specifically for a simple, non-parallel Java program that uses one consumer and one producer.

The selected approach does not yet take full advantage of Spark's parallelism at load time. For higher loads, and especially for stream processing, we would have to investigate another approach using Spark Streaming's StreamingContext, but this is out of scope for now.

Since we can't reuse JavaKafkaTopicSource and JavaKafkaTopicSink, we instead implement SparkKafkaTopicSource and SparkKafkaTopicSink based on the existing SparkTextFileSource and SparkTextFileSink, which both carry all the required RDD-specific logic.
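The load path described here has two steps: a single Kafka client polls the records on the driver, and the resulting list is then handed to something like `JavaSparkContext.parallelize(list, numSlices)`, which splits it into partitions. The following Java sketch illustrates only the second step with plain collections. `DriverSideSlicing` is a hypothetical name, and the contiguous index arithmetic is modeled on, not copied from, how Spark slices an in-memory collection. It also makes the limitation visible: every record passes through one process before any parallelism starts.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a driver-side collection into contiguous slices, one per partition. */
final class DriverSideSlicing {
    static <T> List<List<T>> slice(List<T> records, int numSlices) {
        if (numSlices < 1) {
            throw new IllegalArgumentException("numSlices must be at least 1");
        }
        long n = records.size();
        List<List<T>> slices = new ArrayList<>(numSlices);
        for (int i = 0; i < numSlices; i++) {
            // Slice i covers the index range [i * n / numSlices, (i + 1) * n / numSlices).
            int start = (int) (i * n / numSlices);
            int end = (int) ((i + 1) * n / numSlices);
            slices.add(new ArrayList<>(records.subList(start, end)));
        }
        return slices;
    }
}
```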

Summary

As expected, integrating Apache Spark with Apache Wayang required no magic, thanks to the fluent API design and well-structured architecture of Apache Wayang. We could easily follow the pattern we worked out in the previous exercise.

Much more interesting work lies ahead: more testing, more serialization schemes, Kafka Schema Registry support, and full parallelization.

The code has been submitted to the Apache Wayang repository.

Outlook

The next part of the article series will cover the real-world example described in image 1. We will show how analysts and developers can use the Apache Kafka integration for Apache Wayang to solve cross-organizational collaboration issues. To do so, we will bring all the pieces of the puzzle together and show the full implementation of the multi-organizational data collaboration use case.

Apache Wayang vs. Presto/Trino

March 8, 2024 · 3 min read

Zoi Kaoudi

(P)PMC Apache Wayang

We have been asked several times about the difference between Apache Wayang and Presto/Trino. In this blog post, we will clarify the main differences and how they impact various applications and use cases.

Apache Kafka meets Wayang - Part 2

March 6, 2024 · 6 min read

Mirko Kämpf

(P)PMC Apache Wayang

In the second part of the article series we describe the implementation of the Kafka Source and Kafka Sink component for Apache Wayang. We look into the “Read- and Write-Path” for our data items, called DataQuanta.

Apache Wayang’s Read & Write Path for Kafka topics

To describe the read and write paths for data in the context of the created Apache Wayang code snippet, the primary classes and interfaces we need to understand are as follows:

WayangContext: This class is essential for initializing the Wayang processing environment. It allows you to configure the execution environment and register plugins that define which platforms Wayang can use for data processing tasks, such as Java.basicPlugin() for local Java execution.

JavaPlanBuilder: This class is used to build and define the data processing pipeline (or plan) in Wayang. It provides a fluent API to specify the operations to be performed on the data, from reading the input to processing it and writing the output.

Read Path

The read path describes how data is ingested from a source into the Wayang processing pipeline:

Reading from Kafka Topic: The method readKafkaTopic(topicName) is used to ingest data from a specified Kafka topic. This is the starting point of the data processing pipeline, where topicName represents the name of the Kafka topic from which data is read.

Data Tokenization and Preparation: Once the data is read from Kafka, it undergoes several transformations, such as splitting, filtering, and mapping. These are followed by reducing, grouping, co-grouping, and counting.
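Listing 1 is not reproduced in this section. Assuming it is a word-count style job, the steps above correspond roughly to the following local sketch, which uses plain Java streams on a single JVM rather than Wayang's operators. `WordCountSketch` is a hypothetical name, and tokenizing on `\W+` is an ASCII-oriented simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Local analog of a word-count style plan: split, filter, map, then group and count. */
final class WordCountSketch {
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\W+"))) // split lines into tokens
                .filter(token -> !token.isEmpty())                   // drop empty tokens
                .map(String::toLowerCase)                             // normalize case
                .collect(Collectors.groupingBy(                       // group equal tokens and count them
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```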

Write Path

Writing to Kafka Topic: The final step in the pipeline writes the processed data back to a Kafka topic using .writeKafkaTopic(...). This method takes parameters that specify the target Kafka topic, a serialization function that formats the data as strings, and additional configuration for load profile estimation, which Wayang's optimizer uses when planning the write operation.

This read-write path covers the full flow of data: ingestion from Kafka, various processing steps, and finally writing back to Kafka. It showcases a complete data processing cycle within Apache Wayang's abstracted environment and is implemented in our example program shown in listing 1.

Implementation of Input- and Output Operators

The next section shows how a new pair of operators can be implemented to extend Apache Wayang’s capabilities on the input and output side. We created the Kafka Source and Kafka Sink components so that our cross-organizational data collaboration scenario can be implemented on top of a data streaming infrastructure.

Level 1 – Wayang execution plan with abstract operators

The implementation of our Kafka Source and Kafka Sink components for Apache Wayang requires new methods and classes on three layers. The first is the API package. Here, we use the JavaPlanBuilder to expose the function for selecting a Kafka topic as the source to be used by the client. The class JavaPlanBuilder in package org.apache.wayang.api in the project wayang-api/wayang-api-scala-java exposes our new functionality to our external client. An instance of the JavaPlanBuilder is used to define the data processing pipeline. We use its readKafkaTopic() method, which specifies the source Kafka topic to read from, and for the write path we use the writeKafkaTopic() method. Both methods only trigger activities in the background.

For the output side, we use the DataQuantaBuilder class, which offers an implementation of the writeKafkaTopic function. This function is designed to send processed data, referred to as DataQuanta, to a specified Kafka topic. Essentially, it marks the final step in a data processing sequence constructed using the Apache Wayang framework.

In the DataQuanta class, we implemented the methods writeKafkaTopic and writeKafkaTopicJava, which use the KafkaTopicSink class. In this API layer we use the Scala programming language, but we use the Java classes implemented in the layer below.

Level 2 – Wiring between Platform Abstraction and Implementation

The second layer builds the bridge between the WayangContext and the PlanBuilders, which work together with DataQuanta and the DataQuantaBuilder.

The mapping between the abstract components and the specific implementations is also defined in this layer.

Therefore, the mappings package has a class Mappings in which all relevant input and output operators are listed. We use it to register the KafkaSourceMapping and the KafkaSinkMapping for the particular platform, Java in our case. These classes allow the Apache Wayang framework to use the Java implementation of the KafkaTopicSource component (and of the KafkaTopicSink, respectively). While the Wayang execution plan uses the higher-level abstractions, here on the “platform level” we have to link the specific implementation for the target platform. In our case, this leads to a Java program running on a JVM, which is set up by the Apache Wayang framework using the logical components of the execution plan.

These mappings link the real implementations of our operators to the ones used in an execution plan. The JavaKafkaTopicSource and the JavaKafkaTopicSink extend KafkaTopicSource and KafkaTopicSink, so that the lower-level implementation of those classes becomes available within Wayang’s Java Platform context.

In this layer, the KafkaConsumer and KafkaProducer classes are used, but both are configured and instantiated in the layer underneath. All this is done in the project wayang-platforms/wayang-java.

Layer 3 – Input/Output Connector Layer

The KafkaTopicSource and KafkaTopicSink classes form the third layer of our implementation. Both are implemented in the Java programming language. In this layer, the actual Kafka client logic is defined: details about consumers and producers, client configuration, and schema handling are handled here.
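To give a feel for the client configuration this layer deals with, here is a minimal Java sketch of consumer and producer settings. The property keys and the `StringDeserializer`/`StringSerializer` class names are standard Kafka client settings; which values KafkaTopicSource and KafkaTopicSink actually use is not stated here, so the String serialization, `auto.offset.reset=earliest`, and `acks=all` are assumptions, and `KafkaClientSettings` is a hypothetical name. The resulting `Properties` would be passed to `new KafkaConsumer<>(props)` or `new KafkaProducer<>(props)`.

```java
import java.util.Properties;

/** Hypothetical helper collecting the client settings a source/sink connector layer manages. */
final class KafkaClientSettings {
    static Properties consumer(String bootstrapServers, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        // Records are read as plain strings in this sketch.
        p.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Start from the beginning of the topic when the group has no committed offset yet.
        p.setProperty("auto.offset.reset", "earliest");
        return p;
    }

    static Properties producer(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas to acknowledge each write.
        p.setProperty("acks", "all");
        return p;
    }
}
```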

Summary

Both classes in the third layer implement the Kafka client logic that the Wayang execution plan needs when external data flows have to be established. The layer above handles the mapping of the components at startup time. All this wiring is needed to keep Wayang open and flexible, so that multiple external systems can be used in a variety of combinations and together with multiple target platforms.

Outlook

The next part of the article series will cover the creation of a Kafka Source and Sink component for the Apache Spark platform, which allows our use case to scale. Finally, in part four, we bring all the pieces of the puzzle together and show the full implementation of the multi-organizational data collaboration use case.

Apache Kafka meets Wayang - Part 1

March 5, 2024 · 4 min read

Mirko Kämpf

(P)PMC Apache Wayang

Intro

This article is the first of a four-part series about federated data analysis using Apache Wayang. It starts with an introduction to a typical data collaboration scenario that will emerge in our digital future.

In parts two and three, we will share a summary of our Apache Kafka client implementation for Apache Wayang. We started with the Java Platform (part 2), and the Apache Spark implementation (work in progress) follows in part three.

The use case behind this work is an imaginary data collaboration scenario. We already see this example, and the demand for a solution, in many places, which is motivation enough for us to propose one. Such a solution would also allow more local data processing: businesses can stop moving data around the world and instead care about data locality while exposing and sharing specific information with others through data federation. This can dramatically reduce both the complexity and the cost of data management.

For this purpose, we will soon illustrate a cross-organizational data sharing scenario from the finance sector. This analysis pattern is also relevant for data analysis along supply chains, another typical example in which data from many stakeholders is needed together but, for good reasons, is never managed in one place.

Data federation can help us to unlock the hidden value of all those isolated data lakes.

Our goal is to implement a cross-organizational, decentralized data processing scenario in which protected local data is processed together with data from public sources in a collaborative manner. Instead of copying all data into a central data lake or a central data platform, we decided to use federated analytics, and Apache Wayang is the tool we work with. In our case, the public data is hosted on publicly available websites or data pods. A client can use the HTTP(S) protocol to read the data, which is provided in a well-defined format; for simplicity, we decided to use the CSV format. Each participant's data offers a different perspective.

Our processing procedure should calculate a particular metric on the local data of each participant. An example of such a metric is the average monthly spending of all users on a particular product category. This metric can vary from partner to partner; hence, we want to calculate a peer-group comparison so that each partner can compare its own metric with a global average calculated from the contributions of all partners. Such a process requires both global and local averaging, and due to governance constraints, we can’t bring all raw data together in one place.

Instead, we want to use Apache Wayang for this purpose. We simplify the procedure and split it into two phases. Phase one is the process that allows each participant to calculate the local metrics; it requires only local data. The second phase requires data from all collaborating partners: the monthly sum and counter values per partner and category must be available in one place to all parties. Hence, the algorithm of the first phase stores the local results locally and writes the contributions to the global results into an externally accessible Kafka topic. We assume this is done by each of the partners.

Now we have a scenario in which an Apache Wayang process must be able to read data from multiple Apache Kafka topics in multiple Apache Kafka clusters but finally writes into a single Kafka topic, which can then be accessed by all participating clients.
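Sharing sums and counts rather than finished averages is what makes the global metric correct: the global average is the pooled sum divided by the pooled count, which generally differs from the mean of the local averages. A minimal Java sketch of both phases' arithmetic, with the hypothetical names `Contribution` and `PeerGroupComparison`:

```java
import java.util.List;

/** Hypothetical monthly contribution of one partner for one product category. */
final class Contribution {
    final String partner;
    final double sum; // total spending of the partner's users
    final long count; // number of users behind that sum

    Contribution(String partner, double sum, long count) {
        this.partner = partner;
        this.sum = sum;
        this.count = count;
    }

    /** Phase one: local metric, computed by each partner from its own data only. */
    double localAverage() {
        return sum / count;
    }
}

final class PeerGroupComparison {
    /** Phase two: pooled sum divided by pooled count, not an average of the local averages. */
    static double globalAverage(List<Contribution> contributions) {
        double totalSum = 0.0;
        long totalCount = 0;
        for (Contribution c : contributions) {
            totalSum += c.sum;
            totalCount += c.count;
        }
        if (totalCount == 0) {
            throw new IllegalArgumentException("no users in any contribution");
        }
        return totalSum / totalCount;
    }
}
```

With partner A contributing a sum of 300.0 over 3 users and partner B 50.0 over 1 user, the pooled global average is 350.0 / 4 = 87.5, whereas averaging the local averages (100.0 and 50.0) would give 75.0.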

images/image-1.png

The illustration shows the data flows in such a scenario. Jobs with a red border are executed by the participants in isolation within their own data processing environments, but they share some of the data using publicly accessible Kafka topics, marked by A. Job 4 is the Apache Wayang job in our focus: here we intend to read data from three different source systems and write results into a fourth system (marked as B), which can be accessed by all participants again.

With this in mind, we want to implement an Apache Wayang application that implements the illustrated Job 4. Since, as of today, there is no KafkaSource or KafkaSink available in Apache Wayang, implementing both will be our first step. We assume that, in the beginning, there won’t be much data.

Apache Spark is not required to cope with the load, but we expect that in the future a single Java application will not be able to handle our workload. Hence, we want to use the Apache Wayang abstraction over multiple processing platforms, starting with Java. Later, we want to switch to Apache Spark.
