Class SparkOperatorProfiler

    • Field Detail

      • logger

        protected final org.apache.logging.log4j.Logger logger
      • dataQuantumGenerators

        protected final java.util.List<java.util.function.Supplier<?>> dataQuantumGenerators
      • cpuMhz

        public int cpuMhz
      • numMachines

        public int numMachines
      • numCoresPerMachine

        public int numCoresPerMachine
      • numPartitions

        public int numPartitions
      • executionPaddingTime

        protected final long executionPaddingTime
      • inputCardinalities

        protected java.util.List<java.lang.Long> inputCardinalities
    • Constructor Detail

      • SparkOperatorProfiler

        public SparkOperatorProfiler​(java.util.function.Supplier<SparkExecutionOperator> operatorGenerator,
                                     Configuration configuration,
                                     java.util.function.Supplier<?>... dataQuantumGenerators)
    • Method Detail

      • prepare

        public void prepare​(long... inputCardinalities)
        Call this method before run() to prepare the profiling run.
        Parameters:
        inputCardinalities - the number of input elements for each input of the profiled operator
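Since the Javadoc only states the contract, here is a minimal, hypothetical sketch of what prepare(long...) is assumed to do: record the requested cardinalities in the inputCardinalities field and invoke the prepareInput hook once per input. The class name PrepareSketch and the exact bookkeeping are assumptions for illustration; only the field and method signatures come from this page.

```java
import java.util.ArrayList;
import java.util.List;

public class PrepareSketch {
    // Mirrors the protected inputCardinalities field documented above.
    protected List<Long> inputCardinalities = new ArrayList<>();

    // Assumed behavior: store each cardinality, then delegate to the
    // abstract prepareInput(int, long) hook for per-input setup.
    public void prepare(long... inputCardinalities) {
        this.inputCardinalities = new ArrayList<>();
        for (int i = 0; i < inputCardinalities.length; i++) {
            this.inputCardinalities.add(inputCardinalities[i]);
            prepareInput(i, inputCardinalities[i]); // subclass hook
        }
    }

    protected void prepareInput(int inputIndex, long inputCardinality) {
        // Subclasses would materialize inputCardinality data quanta here.
    }

    public static void main(String[] args) {
        PrepareSketch s = new PrepareSketch();
        s.prepare(1000L, 50L);
        System.out.println(s.inputCardinalities); // [1000, 50]
    }
}
```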
      • prepareInput

        protected abstract void prepareInput​(int inputIndex,
                                             long inputCardinality)
      • prepareInputRdd

        protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRdd​(long cardinality,
                                                                           int inputIndex)
        Helper method to generate data quanta and provide them as a cached JavaRDD. Delegates to a driver-side or worker-side implementation depending on the wayang.profiler.datagen.location property.
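The property-based dispatch can be sketched as below. The property key wayang.profiler.datagen.location is taken from this page; the accepted value "worker" (with driver-side generation as the default) is an assumption based on the two sibling methods, prepareInputRddInDriver and prepareInputRddInWorker.

```java
public class DatagenDispatch {
    // Assumed dispatch rule: "worker" selects worker-side generation,
    // anything else (including an unset property) falls back to the driver.
    static String chooseLocation(String locationProperty) {
        if ("worker".equalsIgnoreCase(locationProperty)) {
            return "prepareInputRddInWorker";
        }
        return "prepareInputRddInDriver";
    }

    public static void main(String[] args) {
        System.out.println(chooseLocation("worker")); // prepareInputRddInWorker
        System.out.println(chooseLocation(null));     // prepareInputRddInDriver
    }
}
```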
      • prepareInputRddInDriver

        protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInDriver​(long cardinality,
                                                                                   int inputIndex)
        Helper method that generates the data quanta in the Spark driver and parallelizes them into a cached JavaRDD.
      • prepareInputRddInWorker

        protected <T> org.apache.spark.api.java.JavaRDD<T> prepareInputRddInWorker​(long cardinality,
                                                                                   int inputIndex)
        Helper method that generates the data quanta inside the Spark workers and provides them as a cached JavaRDD.
      • partition

        protected <T> org.apache.spark.api.java.JavaRDD<T> partition​(org.apache.spark.api.java.JavaRDD<T> rdd)
        If a specific number of partitions has been requested for the input JavaRDDs, enforces it on the given JavaRDD.
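A plausible sketch of this rule, using the public numPartitions field documented above: only repartition when a positive count was configured. The sentinel value (numPartitions <= 0 meaning "no explicit request") is an assumption; in the real method the enforcement would be a Spark call such as rdd.coalesce(numPartitions, true) rather than the pure function shown here.

```java
public class PartitionSketch {
    // Mirrors the public numPartitions field; <= 0 is the assumed
    // sentinel for "no partition count requested".
    int numPartitions = -1;

    // Returns the partition count the RDD should end up with.
    int targetPartitions(int currentPartitions) {
        return this.numPartitions > 0 ? this.numPartitions : currentPartitions;
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch();
        System.out.println(p.targetPartitions(8)); // 8: nothing requested
        p.numPartitions = 4;
        System.out.println(p.targetPartitions(8)); // 4: enforced
    }
}
```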
      • provideCpuCycles

        protected long provideCpuCycles​(long startTime,
                                        long endTime)
        Estimates the CPU cycles spent in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
      • provideNetworkBytes

        protected long provideNetworkBytes​(long startTime,
                                           long endTime)
        Estimates the network bytes transferred in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
      • provideDiskBytes

        protected long provideDiskBytes​(long startTime,
                                        long endTime)
        Estimates the disk bytes read and written in the cluster during the given time span by waiting for Ganglia to provide the respective information in its RRD files.
      • executeOperator

        protected abstract SparkOperatorProfiler.Result executeOperator()
        Executes the profiling task. Requires that this instance has been prepared via prepare(long...).
      • cleanUp

        public void cleanUp()
        Override this method to implement any clean-up logic.