Documentation of Supported Operators in JSON-REST API

This document does not cover every operator and plan parameter supported by the API. It lists only those that we verified through our own testing and that we allow in the agentic systems.

For a Wayang plan in JSON to be executable, its outer object must contain at least two properties: “context” and “operators”.

Overall Wayang Plan structure

{
  "context": { ... },
  "operators": [ ... ]
}
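As a minimal sketch, a plan with this outer shape can be assembled and serialized in Python. The helper name `make_plan` is hypothetical and not part of Wayang; it only checks the two top-level properties named above, and the empty operator list is there purely to show the shape, not to produce an executable plan.

```python
import json

def make_plan(context, operators):
    """Return a plan dict with the two required top-level properties."""
    if not isinstance(context, dict):
        raise TypeError("context must be a JSON object (dict)")
    if not isinstance(operators, list):
        raise TypeError("operators must be a JSON array (list)")
    return {"context": context, "operators": operators}

# Empty operator list: illustrates the outer structure only.
plan = make_plan({"platforms": ["java"], "configuration": {}}, [])
plan_json = json.dumps(plan, indent=2)
print(plan_json)
```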

Context Property

The “context” property is an object with at least two fields: “platforms” and “configuration”. “platforms” is a list of strings naming the platforms available to the plan, such as “java” or “spark”; at least one platform must be provided. “configuration” holds optional settings passed to Wayang. The field itself is required, but an empty object is enough for the plan to work; in that case, we believe Wayang falls back to its default configuration settings. An example of “context”:

{
  "context": {
    "platforms": ["java", "spark"],
    "configuration": {}
  }
}
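The rules for “context” can be sketched as a small pre-flight check. This is an illustration only: `check_context` is a hypothetical helper, and it assumes the key is spelled “configuration” as in the example above.

```python
def check_context(context):
    """Return a list of problems found in a plan's "context" object."""
    problems = []
    platforms = context.get("platforms")
    if not isinstance(platforms, list) or not platforms:
        problems.append('"platforms" must be a non-empty list')
    elif not all(isinstance(p, str) for p in platforms):
        problems.append('"platforms" must contain only strings')
    # The field is required, but an empty object is accepted.
    if not isinstance(context.get("configuration"), dict):
        problems.append('"configuration" must be an object (it may be empty)')
    return problems

print(check_context({"platforms": ["java", "spark"], "configuration": {}}))  # prints []
print(check_context({"platforms": []}))
```

The second call reports both the empty platform list and the missing configuration object.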

Operator Property

The “operators” property is a list of objects. Each object specifies one operator, and the list contains the operators used in the plan. Every operator has a set of required fields. Some fields are common to all operators, while others are specific to a particular operator.

Table 1 describes the fields common to operators. Some operators require additional properties inside the “data” object; these are listed in the operator schema in Table 2.

Operator Properties in Apache Wayang Plans (Table 1)

| Property | Value | Required in all operators | Description |
|---|---|---|---|
| id | Int | Yes | The unique ID for an operator in a plan. |
| cat | String | Yes | The operator's category or group, for example input or unary. |
| input | List[Int] | Yes | Unique IDs of operators that send data to this operator. An operator is a source if the list is empty, for example input operators. |
| output | List[Int] | Yes | Unique IDs of operators that receive data from this operator. An operator is a sink if the list is empty, for example output operators. |
| operatorName | String | Yes | Name of the operator. |
| data | Object | No | An object containing additional operator-specific properties. The count and distinct operators do not require the data property. |
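Table 1 can be read as a simple field-and-type check. The sketch below is one way to express it in Python; `REQUIRED_FIELDS`, `check_operator`, and `role` are hypothetical names introduced for illustration, not part of the API.

```python
# Fields from Table 1 that every operator must carry, with their JSON types.
REQUIRED_FIELDS = {
    "id": int,
    "cat": str,
    "input": list,
    "output": list,
    "operatorName": str,
}

def check_operator(op):
    """Return a list of Table 1 violations for one operator object."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in op:
            problems.append(f"missing required field {name!r}")
        elif not isinstance(op[name], expected):
            problems.append(f"field {name!r} should be of type {expected.__name__}")
    # "data" is optional, but must be an object when present.
    if "data" in op and not isinstance(op["data"], dict):
        problems.append("optional field 'data' should be an object")
    return problems

def role(op):
    """Classify an operator by its input/output lists, as described in Table 1."""
    if not op["input"]:
        return "source"
    if not op["output"]:
        return "sink"
    return "intermediate"

count_op = {"id": 2, "cat": "unary", "input": [1], "output": [], "operatorName": "count"}
print(check_operator(count_op), role(count_op))
```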

Operator Schema

Table 2 is a schema of all operators we tested. Every operator requires the properties listed in Table 1; the “Required data properties” column lists the operator-specific properties that must appear inside “data”. The schema does not cover the full JSON-REST API: additional operators and parameters are supported as well. These are the operators that we allow in the agentic systems.

Supported Apache Wayang Operators and their Required Data Properties (Table 2)

| Category | Operator | Description | Required data properties | Example |
|---|---|---|---|---|
| Input | jdbcRemoteInput | Reads data from a database using a JDBC connection. | uri, username, password, table, columnNames | jdbc_input |
| Input | textFileInput | Reads data from a text file line by line. | filename | textfile_input |
| Unary | map | Applies a function to each element and returns the transformed element. | udf | map |
| Unary | flatMap | Applies a function that may return zero, one, or multiple elements. | udf | flatmap |
| Unary | filter | Keeps only elements that satisfy a condition. | udf | filter |
| Unary | distinct | Removes duplicate elements. | -- | distinct |
| Unary | sort | Sorts elements according to a key. | keyUdf | sort |
| Unary | sample | Randomly selects a subset of elements. | sampleSize | sample |
| Unary | reduce | Aggregates all elements into a single result. | udf | reduce |
| Unary | reduceBy | Aggregates elements with the same key. | keyUdf, udf | reduceby |
| Unary | groupBy | Groups elements by key. | keyUdf | groupby |
| Unary | count | Counts the number of elements. | -- | count |
| Binary | join | Combines two datasets using an inner join. | thisKeyUdf, thatKeyUdf | join |
| Output | textFileOutput | Writes output data to a text file. | filename | textfile_output |
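The “Required data properties” column of Table 2 translates directly into a lookup table. The following sketch, with the hypothetical names `REQUIRED_DATA` and `missing_data_properties`, reports which required properties an operator's “data” object lacks; it checks presence only, not the values.

```python
# Operator name -> required keys inside "data", transcribed from Table 2.
REQUIRED_DATA = {
    "jdbcRemoteInput": ["uri", "username", "password", "table", "columnNames"],
    "textFileInput": ["filename"],
    "map": ["udf"],
    "flatMap": ["udf"],
    "filter": ["udf"],
    "distinct": [],
    "sort": ["keyUdf"],
    "sample": ["sampleSize"],
    "reduce": ["udf"],
    "reduceBy": ["keyUdf", "udf"],
    "groupBy": ["keyUdf"],
    "count": [],
    "join": ["thisKeyUdf", "thatKeyUdf"],
    "textFileOutput": ["filename"],
}

def missing_data_properties(op):
    """Return the required "data" keys that the operator does not provide."""
    name = op["operatorName"]
    if name not in REQUIRED_DATA:
        raise KeyError(f"operator {name!r} is not listed in Table 2")
    data = op.get("data", {})  # count and distinct may omit "data" entirely
    return [key for key in REQUIRED_DATA[name] if key not in data]

incomplete = {"operatorName": "reduceBy", "data": {"keyUdf": "(p: (String, Int)) => p._1"}}
print(missing_data_properties(incomplete))  # prints ['udf']
```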

Operator Examples

jdbcRemoteInput

{
  "id": 1,
  "cat": "input",
  "input": [],
  "output": [2],
  "operatorName": "jdbcRemoteInput",
  "data": {
    "uri": "jdbc:postgresql://localhost:5432/master_thesis_db",
    "username": "master_thesis",
    "password": "master",
    "table": "person",
    "columnNames": ["id", "name", "age", "address"]
  }
}

textFileInput

{
  "id": 1,
  "cat": "input",
  "input": [],
  "output": [2],
  "operatorName": "textFileInput",
  "data": {
    "filename": "file:///Users/alexander/Downloads/textfile_input.txt"
  }
}

map

{
  "id": 3,
  "cat": "unary",
  "input": [2],
  "output": [],
  "operatorName": "map",
  "data": {
    "udf": "(w: String) => w.length"
  }
}

flatMap

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [3],
  "operatorName": "flatMap",
  "data": {
    "udf": "(s: String) => s.split(\" \").toList"
  }
}

filter

{
  "id": 3,
  "cat": "unary",
  "input": [2],
  "output": [4],
  "operatorName": "filter",
  "data": {
    "udf": "(w: String) => w.length > 4"
  }
}

distinct

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [],
  "operatorName": "distinct"
}

sort

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [],
  "operatorName": "sort",
  "data": {
    "keyUdf": "(w: String) => w"
  }
}

sample

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [3],
  "operatorName": "sample",
  "data": {
    "sampleSize": 3
  }
}

reduce

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [3],
  "operatorName": "reduce",
  "data": {
    "udf": "(a: Int, b: Int) => a + b"
  }
}

reduceBy

{
  "id": 4,
  "cat": "unary",
  "input": [3],
  "output": [5],
  "operatorName": "reduceBy",
  "data": {
    "keyUdf": "(pair: (String, Int)) => pair._1",
    "udf": "(a: (String, Int), b: (String, Int)) => (a._1, a._2 + b._2)"
  }
}
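To make the roles of “keyUdf” and “udf” concrete, here is a plain-Python analogue of the reduceBy example. It is not Wayang code: `reduce_by` is a hypothetical helper, the two lambdas mirror the Scala UDFs above, and the output order shown is a property of this sketch only, with no claim about the order Wayang produces.

```python
def reduce_by(elements, key_fn, reduce_fn):
    """Combine elements that share a key, pairwise, into one element per key."""
    combined = {}
    for element in elements:
        key = key_fn(element)
        if key in combined:
            combined[key] = reduce_fn(combined[key], element)
        else:
            combined[key] = element
    return list(combined.values())

pairs = [("a", 1), ("b", 2), ("a", 3)]
result = reduce_by(
    pairs,
    lambda pair: pair[0],                  # analogue of keyUdf: pair._1
    lambda a, b: (a[0], a[1] + b[1]),      # analogue of udf: (a._1, a._2 + b._2)
)
print(result)  # prints [('a', 4), ('b', 2)]
```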

groupBy

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [3],
  "operatorName": "groupBy",
  "data": {
    "keyUdf": "(w: String) => w.substring(0,1)"
  }
}

count

{
  "id": 2,
  "cat": "unary",
  "input": [1],
  "output": [],
  "operatorName": "count"
}

join

{
  "id": 3,
  "cat": "binary",
  "input": [1, 2],
  "output": [4],
  "operatorName": "join",
  "data": {
    "thisKeyUdf": "(t: (String, Int)) => t._1",
    "thatKeyUdf": "(t: (String, String)) => t._1"
  }
}
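The inner-join behaviour described in Table 2 can be illustrated with a plain-Python analogue. This is a sketch, not Wayang code: `inner_join` is a hypothetical helper, the sample data is invented, and it assumes that each result is a pair of the matching left and right elements, which we have not confirmed against the API's actual output format.

```python
from collections import defaultdict

def inner_join(left, right, this_key, that_key):
    """Pair every left element with every right element that has the same key."""
    index = defaultdict(list)
    for r in right:
        index[that_key(r)].append(r)
    # Elements without a partner on the other side are dropped (inner join).
    return [(l, r) for l in left for r in index.get(this_key(l), [])]

ages = [("ann", 31), ("bob", 45)]            # shaped like (String, Int)
cities = [("ann", "Oslo"), ("cid", "Rome")]  # shaped like (String, String)
joined = inner_join(ages, cities, lambda t: t[0], lambda t: t[0])
print(joined)  # prints [(('ann', 31), ('ann', 'Oslo'))]
```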

textFileOutput

{
  "id": 4,
  "cat": "output",
  "input": [3],
  "output": [],
  "operatorName": "textFileOutput",
  "data": {
    "filename": "file:///Users/alexander/Downloads/testoutput1.txt"
  }
}
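The textFileInput, flatMap, filter, and textFileOutput examples above use IDs that chain together (1 to 2 to 3 to 4), so they can be combined into one plan. The sketch below assembles them in Python and checks that the “input” and “output” lists agree with each other. `wiring_problems` is a hypothetical helper, the requirement that both lists mirror each other is our assumption, and the resulting plan has not been verified against a running Wayang instance here.

```python
import json

# Operator objects copied from the examples above.
operators = [
    {"id": 1, "cat": "input", "input": [], "output": [2],
     "operatorName": "textFileInput",
     "data": {"filename": "file:///Users/alexander/Downloads/textfile_input.txt"}},
    {"id": 2, "cat": "unary", "input": [1], "output": [3],
     "operatorName": "flatMap",
     "data": {"udf": '(s: String) => s.split(" ").toList'}},
    {"id": 3, "cat": "unary", "input": [2], "output": [4],
     "operatorName": "filter",
     "data": {"udf": "(w: String) => w.length > 4"}},
    {"id": 4, "cat": "output", "input": [3], "output": [],
     "operatorName": "textFileOutput",
     "data": {"filename": "file:///Users/alexander/Downloads/testoutput1.txt"}},
]

def wiring_problems(ops):
    """Return inconsistencies between the operators' input and output lists."""
    by_id = {op["id"]: op for op in ops}
    problems = []
    if len(by_id) != len(ops):
        problems.append("operator ids are not unique")
    for op in ops:
        for target in op["output"]:
            if target not in by_id:
                problems.append(f"operator {op['id']} outputs to unknown id {target}")
            elif op["id"] not in by_id[target]["input"]:
                problems.append(f"operator {target} does not list {op['id']} as input")
        for source in op["input"]:
            if source not in by_id:
                problems.append(f"operator {op['id']} reads from unknown id {source}")
            elif op["id"] not in by_id[source]["output"]:
                problems.append(f"operator {source} does not list {op['id']} as output")
    return problems

plan = {
    "context": {"platforms": ["java"], "configuration": {}},
    "operators": operators,
}
print(wiring_problems(operators))  # prints []
print(json.dumps(plan, indent=2))
```

`json.dumps` takes care of escaping the quotes inside the flatMap UDF, which produces the same `\" \"` form shown in the flatMap example.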