Package org.apache.wayang.profiler.spark
Class SparkOperatorProfiler
java.lang.Object
org.apache.wayang.profiler.spark.SparkOperatorProfiler
- Direct Known Subclasses:
BinaryOperatorProfiler,SinkProfiler,SparkSourceProfiler,SparkUnaryOperatorProfiler
Allows to instrument an
SparkExecutionOperator.-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classThe result of a single profiling run. -
Field Summary
FieldsModifier and TypeFieldDescriptionintprotected final longprotected final FunctionCompilerprotected final org.apache.logging.log4j.Loggerintintintprotected SparkExecutionOperatorprotected Supplier<SparkExecutionOperator>protected SparkExecutor -
Constructor Summary
ConstructorsConstructorDescriptionSparkOperatorProfiler(Supplier<SparkExecutionOperator> operatorGenerator, Configuration configuration, Supplier<?>... dataQuantumGenerators) -
Method Summary
Modifier and TypeMethodDescriptionvoidcleanUp()Override this method to implement any clean-up logic.protected static RddChannel.InstancecreateChannelInstance(org.apache.spark.api.java.JavaRDD<?> rdd, SparkExecutor sparkExecutor) Creates aChannelInstancethat carries the givenrdd.protected static RddChannel.InstancecreateChannelInstance(SparkExecutor sparkExecutor) Creates an emptyChannelInstance.protected voidevaluate(SparkExecutionOperator operator, ChannelInstance[] inputs, ChannelInstance[] outputs) protected abstract SparkOperatorProfiler.ResultExecutes the profiling task.protected <T> org.apache.spark.api.java.JavaRDD<T>partition(org.apache.spark.api.java.JavaRDD<T> rdd) If a desired number of partitions for the inputJavaRDDs is requested, enforce this.voidprepare(long... inputCardinalities) Call this method beforerun()to prepare the profiling runprotected abstract voidprepareInput(int inputIndex, long inputCardinality) protected <T> org.apache.spark.api.java.JavaRDD<T>prepareInputRdd(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD.protected <T> org.apache.spark.api.java.JavaRDD<T>prepareInputRddInDriver(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD.protected <T> org.apache.spark.api.java.JavaRDD<T>prepareInputRddInWorker(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD.protected longprovideCpuCycles(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.protected longprovideDiskBytes(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.protected longprovideNetworkBytes(long startTime, long endTime) Estimates the network bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.run()Executes and profiles the profiling task.
-
Field Details
-
logger
protected final org.apache.logging.log4j.Logger logger -
operatorGenerator
-
dataQuantumGenerators
-
cpuMhz
public int cpuMhz -
numMachines
public int numMachines -
numCoresPerMachine
public int numCoresPerMachine -
numPartitions
public int numPartitions -
executionPaddingTime
protected final long executionPaddingTime -
operator
-
sparkExecutor
-
functionCompiler
-
inputCardinalities
-
-
Constructor Details
-
SparkOperatorProfiler
public SparkOperatorProfiler(Supplier<SparkExecutionOperator> operatorGenerator, Configuration configuration, Supplier<?>... dataQuantumGenerators)
-
-
Method Details
-
prepare
public void prepare(long... inputCardinalities) Call this method beforerun()to prepare the profiling run- Parameters:
inputCardinalities- number of input elements for each input of the profiled operator
-
prepareInput
protected abstract void prepareInput(int inputIndex, long inputCardinality) -
prepareInputRdd
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRdd(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD. Uses an implementation based on thewayang.profiler.datagen.locationproperty. -
prepareInputRddInDriver
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInDriver(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD. -
prepareInputRddInWorker
protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInWorker(long cardinality, int inputIndex) Helper method to generate data quanta and provide them as a cachedJavaRDD. -
partition
protected <T> org.apache.spark.api.java.JavaRDD<T> partition(org.apache.spark.api.java.JavaRDD<T> rdd) If a desired number of partitions for the inputJavaRDDs is requested, enforce this. -
run
Executes and profiles the profiling task. Requires that this instance is prepared. -
provideCpuCycles
protected long provideCpuCycles(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
provideNetworkBytes
protected long provideNetworkBytes(long startTime, long endTime) Estimates the network bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
provideDiskBytes
protected long provideDiskBytes(long startTime, long endTime) Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files. -
executeOperator
Executes the profiling task. Requires that this instance is prepared. -
evaluate
protected void evaluate(SparkExecutionOperator operator, ChannelInstance[] inputs, ChannelInstance[] outputs) -
createChannelInstance
protected static RddChannel.Instance createChannelInstance(org.apache.spark.api.java.JavaRDD<?> rdd, SparkExecutor sparkExecutor) Creates aChannelInstancethat carries the givenrdd. -
createChannelInstance
Creates an emptyChannelInstance. -
cleanUp
public void cleanUp()Override this method to implement any clean-up logic.
-