Class SparkOperatorProfiler

java.lang.Object
org.apache.wayang.profiler.spark.SparkOperatorProfiler
Direct Known Subclasses:
BinaryOperatorProfiler, SinkProfiler, SparkSourceProfiler, SparkUnaryOperatorProfiler

public abstract class SparkOperatorProfiler extends Object
Allows to instrument an SparkExecutionOperator.
  • Field Details

    • logger

      protected final org.apache.logging.log4j.Logger logger
    • operatorGenerator

      protected Supplier<SparkExecutionOperator> operatorGenerator
    • dataQuantumGenerators

      protected final List<Supplier<?>> dataQuantumGenerators
    • cpuMhz

      public int cpuMhz
    • numMachines

      public int numMachines
    • numCoresPerMachine

      public int numCoresPerMachine
    • numPartitions

      public int numPartitions
    • executionPaddingTime

      protected final long executionPaddingTime
    • operator

      protected SparkExecutionOperator operator
    • sparkExecutor

      protected SparkExecutor sparkExecutor
    • functionCompiler

      protected final FunctionCompiler functionCompiler
    • inputCardinalities

      protected List<Long> inputCardinalities
  • Constructor Details

  • Method Details

    • prepare

      public void prepare(long... inputCardinalities)
      Call this method before run() to prepare the profiling run
      Parameters:
      inputCardinalities - number of input elements for each input of the profiled operator
    • prepareInput

      protected abstract void prepareInput(int inputIndex, long inputCardinality)
    • prepareInputRdd

      protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRdd(long cardinality, int inputIndex)
      Helper method to generate data quanta and provide them as a cached JavaRDD. Uses an implementation based on the wayang.profiler.datagen.location property.
    • prepareInputRddInDriver

      protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInDriver(long cardinality, int inputIndex)
      Helper method to generate data quanta and provide them as a cached JavaRDD.
    • prepareInputRddInWorker

      protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInWorker(long cardinality, int inputIndex)
      Helper method to generate data quanta and provide them as a cached JavaRDD.
    • partition

      protected <T> org.apache.spark.api.java.JavaRDD<T> partition(org.apache.spark.api.java.JavaRDD<T> rdd)
      If a desired number of partitions for the input JavaRDDs is requested, enforce this.
    • run

      Executes and profiles the profiling task. Requires that this instance is prepared.
    • provideCpuCycles

      protected long provideCpuCycles(long startTime, long endTime)
      Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
    • provideNetworkBytes

      protected long provideNetworkBytes(long startTime, long endTime)
      Estimates the network bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
    • provideDiskBytes

      protected long provideDiskBytes(long startTime, long endTime)
      Estimates the disk bytes occurred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
    • executeOperator

      protected abstract SparkOperatorProfiler.Result executeOperator()
      Executes the profiling task. Requires that this instance is prepared.
    • evaluate

      protected void evaluate(SparkExecutionOperator operator, ChannelInstance[] inputs, ChannelInstance[] outputs)
    • createChannelInstance

      protected static RddChannel.Instance createChannelInstance(org.apache.spark.api.java.JavaRDD<?> rdd, SparkExecutor sparkExecutor)
      Creates a ChannelInstance that carries the given rdd.
    • createChannelInstance

      protected static RddChannel.Instance createChannelInstance(SparkExecutor sparkExecutor)
      Creates an empty ChannelInstance.
    • cleanUp

      public void cleanUp()
      Override this method to implement any clean-up logic.