Package org.apache.wayang.profiler.spark
Class SparkOperatorProfiler
java.lang.Object
org.apache.wayang.profiler.spark.SparkOperatorProfiler
- Direct Known Subclasses:
BinaryOperatorProfiler
,SinkProfiler
,SparkSourceProfiler
,SparkUnaryOperatorProfiler
Allows to instrument an
SparkExecutionOperator
.-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
The result of a single profiling run. -
Field Summary
FieldsModifier and TypeFieldDescriptionint
protected final long
protected final FunctionCompiler
protected final org.apache.logging.log4j.Logger
int
int
int
protected SparkExecutionOperator
protected Supplier<SparkExecutionOperator>
protected SparkExecutor
-
Constructor Summary
ConstructorsConstructorDescriptionSparkOperatorProfiler
(Supplier<SparkExecutionOperator> operatorGenerator, Configuration configuration, Supplier<?>... dataQuantumGenerators) -
Method Summary
Modifier and TypeMethodDescriptionvoid
cleanUp()
Override this method to implement any clean-up logic.protected static RddChannel.Instance
createChannelInstance
(org.apache.spark.api.java.JavaRDD<?> rdd, SparkExecutor sparkExecutor) Creates aChannelInstance
that carries the givenrdd
.protected static RddChannel.Instance
createChannelInstance
(SparkExecutor sparkExecutor) Creates an emptyChannelInstance
.protected void
evaluate
(SparkExecutionOperator operator, ChannelInstance[] inputs, ChannelInstance[] outputs) protected abstract SparkOperatorProfiler.Result
Executes the profiling task.protected <T> org.apache.spark.api.java.JavaRDD<T>
partition
(org.apache.spark.api.java.JavaRDD<T> rdd) If a desired number of partitions for the inputJavaRDD
s is requested, enforce this.void
prepare
(long... inputCardinalities) Call this method beforerun()
to prepare the profiling runprotected abstract void
prepareInput
(int inputIndex, long inputCardinality) protected <T> org.apache.spark.api.java.JavaRDD<T>
prepareInputRdd
(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
.protected <T> org.apache.spark.api.java.JavaRDD<T>
prepareInputRddInDriver
(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
.protected <T> org.apache.spark.api.java.JavaRDD<T>
prepareInputRddInWorker
(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
.protected long
provideCpuCycles
(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.protected long
provideDiskBytes
(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.protected long
provideNetworkBytes
(long startTime, long endTime) Estimates the network bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.run()
Executes and profiles the profiling task.
-
Field Details
-
logger
protected final org.apache.logging.log4j.Logger logger -
operatorGenerator
-
dataQuantumGenerators
-
cpuMhz
public int cpuMhz -
numMachines
public int numMachines -
numCoresPerMachine
public int numCoresPerMachine -
numPartitions
public int numPartitions -
executionPaddingTime
protected final long executionPaddingTime -
operator
-
sparkExecutor
-
functionCompiler
-
inputCardinalities
-
-
Constructor Details
-
SparkOperatorProfiler
public SparkOperatorProfiler(Supplier<SparkExecutionOperator> operatorGenerator, Configuration configuration, Supplier<?>... dataQuantumGenerators)
-
-
Method Details
-
prepare
public void prepare(long... inputCardinalities) Call this method beforerun()
to prepare the profiling run- Parameters:
inputCardinalities
- number of input elements for each input of the profiled operator
-
prepareInput
protected abstract void prepareInput(int inputIndex, long inputCardinality) -
prepareInputRdd
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRdd(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
. Uses an implementation based on thewayang.profiler.datagen.location
property. -
prepareInputRddInDriver
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInDriver(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
. -
prepareInputRddInWorker
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInWorker(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD
. -
partition
protected <T> org.apache.spark.api.java.JavaRDD<T> partition(org.apache.spark.api.java.JavaRDD<T> rdd) If a desired number of partitions for the inputJavaRDD
s is requested, enforce this. -
run
Executes and profiles the profiling task. Requires that this instance is prepared. -
provideCpuCycles
protected long provideCpuCycles(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
provideNetworkBytes
protected long provideNetworkBytes(long startTime, long endTime) Estimates the network bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
provideDiskBytes
protected long provideDiskBytes(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
executeOperator
Executes the profiling task. Requires that this instance is prepared. -
evaluate
protected void evaluate(SparkExecutionOperator operator, ChannelInstance[] inputs, ChannelInstance[] outputs) -
createChannelInstance
protected static RddChannel.Instance createChannelInstance(org.apache.spark.api.java.JavaRDD<?> rdd, SparkExecutor sparkExecutor) Creates aChannelInstance
that carries the givenrdd
. -
createChannelInstance
Creates an emptyChannelInstance
. -
cleanUp
public void cleanUp()Override this method to implement any clean-up logic.
-