Class SampleOperator<Type>
java.lang.Object
org.apache.wayang.core.plan.wayangplan.OperatorBase
org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator<Type,Type>
org.apache.wayang.basic.operators.SampleOperator<Type>
- All Implemented Interfaces:
Serializable, ActualOperator, ElementaryOperator, Operator
- Direct Known Subclasses:
FlinkSampleOperator, JavaRandomSampleOperator, JavaReservoirSampleOperator, SparkBernoulliSampleOperator, SparkRandomPartitionSampleOperator, SparkShufflePartitionSampleOperator
A random sample operator randomly selects elements from its input slot and pushes the selected elements to its output slot.
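As an illustration of what drawing a fixed-size random sample from an input of unknown length can look like, here is a minimal reservoir-sampling sketch in plain Java. It does not use the Wayang API and is not taken from JavaReservoirSampleOperator or any other subclass listed above; the class name ReservoirSketch and its sample method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Apache Wayang.
final class ReservoirSketch {

    // Draws up to sampleSize elements uniformly at random from the input
    // (reservoir sampling, "Algorithm R"). The input length does not need
    // to be known in advance.
    static <T> List<T> sample(Iterator<T> input, int sampleSize, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        while (input.hasNext()) {
            T element = input.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize elements.
                reservoir.add(element);
            } else if (sampleSize > 0) {
                // Keep the new element with probability sampleSize / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, element);
                }
            }
        }
        return reservoir;
    }
}
```

Calling ReservoirSketch.sample(list.iterator(), 10, 42L) twice on the same list returns the same ten elements, because the seed fixes the random choices. If the input holds fewer elements than sampleSize, the whole input is returned.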
Nested Class Summary
Nested Classes
Nested classes/interfaces inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
OperatorBase.GsonSerializer
-
Field Summary
Fields
Modifier and Type, Field, Description:
- protected Long datasetSize: Size of the dataset to be sampled or 0L if a dataset size is not known.
- protected final org.apache.logging.log4j.Logger logger
- sampleSizeFunction: This function determines the sample size by the number of iterations.
- seedFunction: This function optionally determines the seed by the number of iterations.
- static final long UNKNOWN_DATASET_SIZE: Special dataset size that represents "unknown".
Fields inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
inputSlots, outputSlots, STANDARD_OPERATOR_ARGS
Fields inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
FIRST_EPOCH
-
Constructor Summary
Constructors
Constructor, Description:
- SampleOperator(Integer sampleSize, DataSetType<Type> type): Creates a new instance with any sampling method.
- SampleOperator(Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed): Creates a new instance given the sample size and the seed.
- SampleOperator(SampleOperator<Type> that): Copies an instance (exclusive of broadcasts).
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type): Creates a new instance with any sampling method.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod): Creates a new instance given the sample size and the method.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed): Creates a new instance given a user-defined sample size.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction): Creates a new instance given user-defined sample size and seed methods.
-
Method Summary
Modifier and Type, Method, Description:
- Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration)
- long getDatasetSize()
- getSampleMethod
- protected int getSampleSize(OptimizationContext.OperatorContext operatorContext): Retrieve the sample size for this instance w.r.t. the current iteration.
- protected long getSeed(OptimizationContext.OperatorContext operatorContext): Retrieve the seed for this instance w.r.t. the current iteration.
- getType()
- protected boolean isDataSetSizeKnown(): Find out whether this instance knows about the size of the incoming dataset.
- static long randomSeed(): Generate a random seed.
- void setDatasetSize(long datasetSize)
- void setSampleMethod(SampleOperator.Methods sampleMethod)
- void setSeedFunction
Methods inherited from class org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator
getInput, getInputType, getOutput, getOutputType
Methods inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
accept, addBroadcastInput, addTargetPlatform, at, collectMappedInputSlots, collectMappedOutputSlots, copy, createCopy, getAllInputs, getAllOutputs, getCardinalityEstimator, getContainer, getEpoch, getName, getOriginal, getSimpleClassName, getTargetPlatforms, isAuxiliary, isSupportingBroadcastInputs, propagateInputCardinality, propagateOutputCardinality, setAuxiliary, setCardinalityEstimator, setContainer, setEpoch, setName, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ActualOperator
accept
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ElementaryOperator
getCardinalityEstimator, isAuxiliary, setAuxiliary, setCardinalityEstimator
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
addBroadcastInput, addTargetPlatform, broadcastTo, broadcastTo, collectMappedInputSlots, collectMappedOutputSlots, connectTo, connectTo, getAllInputs, getAllOutputs, getCardinalityPusher, getContainer, getEffectiveOccupant, getEffectiveOccupant, getEpoch, getEstimationContextProperties, getForwards, getInnermostLoop, getInput, getInput, getLoopStack, getName, getNumBroadcastInputs, getNumInputs, getNumOutputs, getNumRegularInputs, getOuterInputSlot, getOutermostInputSlot, getOutermostOutputSlots, getOutput, getOutput, getParent, getTargetPlatforms, isAlternative, isConversion, isElementary, isExecutionOperator, isFeedbackInput, isFeedforwardOutput, isLoopHead, isLoopSubplan, isOwnerOf, isReading, isSink, isSource, isSubplan, isSupportingBroadcastInputs, isUnconnected, propagateInputCardinality, propagateOutputCardinality, propagateOutputCardinality, replaceWith, setContainer, setEpoch, setInput, setName, setOutput
-
Field Details
-
logger
protected final org.apache.logging.log4j.Logger logger -
UNKNOWN_DATASET_SIZE
public static final long UNKNOWN_DATASET_SIZE
Special dataset size that represents "unknown".
-
sampleSizeFunction
This function determines the sample size by the number of iterations. -
seedFunction
This function optionally determines the seed by the number of iterations. -
datasetSize
Size of the dataset to be sampled or 0L if a dataset size is not known.
-
-
Constructor Details
-
SampleOperator
Creates a new instance with any sampling method.
- Parameters:
sampleSize - size of the sample
type - DataSetType of the sampled dataset
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type)
Creates a new instance with any sampling method.
- Parameters:
sampleSizeFunction - user-specified sample size as a function of the current iteration number
type - DataSetType of the sampled dataset
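A sample size that depends on the iteration number can be pictured as a function from int to int. The sketch below uses java.util.function.IntUnaryOperator as a stand-in; it assumes, without confirmation from this page, that FunctionDescriptor.SerializableIntUnaryOperator is applied the same way via applyAsInt and that iteration numbers start at 0. The class SampleSizeSketch and the doubling rule are hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; does not use the Wayang API.
final class SampleSizeSketch {

    // A constant sample size, independent of the iteration number.
    static IntUnaryOperator constant(int sampleSize) {
        return iteration -> sampleSize;
    }

    // A sample size that doubles with every iteration, capped at maxSize.
    static IntUnaryOperator doubling(int initialSize, int maxSize) {
        return iteration -> {
            // Limit the shift so the intermediate value cannot overflow a long.
            int shift = Math.min(Math.max(iteration, 0), 31);
            long size = (long) initialSize << shift;
            return (int) Math.min(size, maxSize);
        };
    }

    public static void main(String[] args) {
        IntUnaryOperator sizeFunction = doubling(10, 100);
        for (int iteration = 0; iteration < 5; iteration++) {
            System.out.println("iteration " + iteration + " -> sample size "
                    + sizeFunction.applyAsInt(iteration));
        }
    }
}
```

With doubling(10, 100), iterations 0 through 4 yield sample sizes 10, 20, 40, 80 and 100. A fixed sample size corresponds to the constant case, which is roughly what the Integer sampleSize constructors express.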
-
SampleOperator
public SampleOperator(Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given the sample size and the seed.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod)
Creates a new instance given the sample size and the method.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given a user-defined sample size.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
Creates a new instance given user-defined sample size and seed methods.
-
SampleOperator
Copies an instance (exclusive of broadcasts).
- Parameters:
that - the instance that should be copied
-
-
Method Details
-
randomSeed
public static long randomSeed()
Generate a random seed.
-
getType
-
getDatasetSize
public long getDatasetSize() -
setDatasetSize
public void setDatasetSize(long datasetSize) -
isDataSetSizeKnown
protected boolean isDataSetSizeKnown()
Find out whether this instance knows about the size of the incoming dataset.
- Returns:
whether it knows the dataset size
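The relationship between datasetSize, UNKNOWN_DATASET_SIZE and isDataSetSizeKnown() can be pictured with the following stand-alone sketch. It assumes that the "unknown" marker is 0L, as the datasetSize field description suggests; how the real class performs the check is not shown on this page. The class DatasetSizeSketch and the method effectiveSampleSize are hypothetical and not part of Wayang.

```java
// Hypothetical stand-alone illustration; not the Wayang implementation.
final class DatasetSizeSketch {

    // Assumed value, based on the datasetSize description
    // ("0L if a dataset size is not known").
    static final long UNKNOWN_DATASET_SIZE = 0L;

    private long datasetSize = UNKNOWN_DATASET_SIZE;

    long getDatasetSize() {
        return this.datasetSize;
    }

    void setDatasetSize(long datasetSize) {
        this.datasetSize = datasetSize;
    }

    // The size counts as known as soon as it differs from the marker value.
    boolean isDataSetSizeKnown() {
        return this.datasetSize != UNKNOWN_DATASET_SIZE;
    }

    // Example use: never request more elements than the dataset holds,
    // but only when the dataset size is actually known.
    int effectiveSampleSize(int requestedSampleSize) {
        if (!this.isDataSetSizeKnown()) {
            return requestedSampleSize;
        }
        return (int) Math.min(requestedSampleSize, this.datasetSize);
    }
}
```

One consequence of a 0L marker, under this assumption, is that an empty dataset cannot be distinguished from a dataset of unknown size.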
-
getSampleMethod
-
setSampleMethod
-
setSeedFunction
-
getSampleSize
Retrieve the sample size for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the sample size
-
getSeed
Retrieve the seed for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the seed
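To make the role of an iteration-dependent seed concrete, the sketch below derives one seed per iteration with a java.util.function.LongUnaryOperator and feeds it to java.util.Random. It is an assumption that FunctionDescriptor.SerializableLongUnaryOperator behaves like LongUnaryOperator; the class SeedSketch, the base seed, and the derivation rule are hypothetical and not taken from Wayang.

```java
import java.util.Random;
import java.util.function.LongUnaryOperator;

// Hypothetical illustration; does not use the Wayang API.
final class SeedSketch {

    // Derives a distinct but reproducible seed for each iteration.
    static LongUnaryOperator perIteration(long baseSeed) {
        return iteration -> baseSeed + iteration;
    }

    // Picks one index in [0, bound) using the seed for the given iteration.
    static int pickIndex(LongUnaryOperator seedFunction, long iteration, int bound) {
        Random random = new Random(seedFunction.applyAsLong(iteration));
        return random.nextInt(bound);
    }

    public static void main(String[] args) {
        LongUnaryOperator seedFunction = perIteration(42L);
        for (long iteration = 0; iteration < 3; iteration++) {
            // Re-running an iteration with the same seed repeats the same choice.
            int first = pickIndex(seedFunction, iteration, 1000);
            int second = pickIndex(seedFunction, iteration, 1000);
            System.out.println("iteration " + iteration + ": " + first + " == " + second);
        }
    }
}
```

Deriving the seed from the iteration number keeps each iteration reproducible while avoiding the same random choices in every iteration, which a single fixed seed would produce.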
-
createCardinalityEstimator
public Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration)
Description copied from interface: ElementaryOperator
- Parameters:
outputIndex - index of the OutputSlot for which the CardinalityEstimator is requested
configuration - if the CardinalityEstimator depends on further ones, use this to obtain the latter
- Returns:
an Optional that might provide the requested instance
-