Class SampleOperator<Type>
java.lang.Object
org.apache.wayang.core.plan.wayangplan.OperatorBase
org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator<Type,Type>
org.apache.wayang.basic.operators.SampleOperator<Type>
- All Implemented Interfaces:
Serializable, ActualOperator, ElementaryOperator, Operator
- Direct Known Subclasses:
FlinkSampleOperator, JavaRandomSampleOperator, JavaReservoirSampleOperator, SparkBernoulliSampleOperator, SparkRandomPartitionSampleOperator, SparkShufflePartitionSampleOperator
A random sample operator randomly selects elements from its input slot and pushes the selected elements to its output slot.
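The class description does not say how the random selection is carried out. As a plain-Java illustration only, the sketch below draws a uniform sample of at most sampleSize elements in a single pass using reservoir sampling (Algorithm R), the technique that the JavaReservoirSampleOperator subclass name suggests. The class ReservoirSampleSketch and its sample method are hypothetical and are not part of Wayang's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical plain-Java helper; not Wayang's JavaReservoirSampleOperator.
final class ReservoirSampleSketch {

    /** Returns up to sampleSize elements drawn uniformly at random, without replacement, in one pass. */
    static <T> List<T> sample(Iterable<T> input, int sampleSize, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        for (T element : input) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize elements.
                reservoir.add(element);
            } else {
                // Keep the new element with probability sampleSize / seen,
                // replacing a uniformly chosen slot of the reservoir.
                long j = (long) (random.nextDouble() * seen);
                if (j < sampleSize) {
                    reservoir.set((int) j, element);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        System.out.println(sample(data, 5, 42L));
    }
}
```

Because the seed is fixed, repeated calls with the same input and seed return the same sample; if the input has fewer than sampleSize elements, the whole input is returned.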
Nested Class Summary
Nested Classes
Nested classes/interfaces inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
OperatorBase.GsonSerializer
-
Field Summary
Fields
- protected Long datasetSize: Size of the dataset to be sampled, or 0L if the dataset size is not known.
- protected final org.apache.logging.log4j.Logger logger
- sampleSizeFunction: This function determines the sample size from the current iteration number.
- seedFunction: This function optionally determines the seed from the current iteration number.
- static final long UNKNOWN_DATASET_SIZE: Special dataset size that represents "unknown".
Fields inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
inputSlots, outputSlots, STANDARD_OPERATOR_ARGS
Fields inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
FIRST_EPOCH
-
Constructor Summary
Constructors
- SampleOperator(Integer sampleSize, DataSetType<Type> type): Creates a new instance with a fixed sample size and any sampling method.
- SampleOperator(Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed): Creates a new instance given the sample size, the sampling method, and the seed.
- SampleOperator(SampleOperator<Type> that): Copies an instance (exclusive of broadcasts).
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type): Creates a new instance with a user-defined sample size function and any sampling method.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod): Creates a new instance given a user-defined sample size function and the sampling method.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed): Creates a new instance given a user-defined sample size function, the sampling method, and a fixed seed.
- SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction): Creates a new instance given user-defined sample size and seed functions and the sampling method.
-
Method Summary
Methods
- Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration): Provide a CardinalityEstimator for the OutputSlot at outputIndex.
- long getDatasetSize()
- getSampleMethod()
- protected int getSampleSize(OptimizationContext.OperatorContext operatorContext): Retrieve the sample size for this instance w.r.t. the current iteration.
- protected long getSeed(OptimizationContext.OperatorContext operatorContext): Retrieve the seed for this instance w.r.t. the current iteration.
- getType()
- protected boolean isDataSetSizeKnown(): Find out whether this instance knows about the size of the incoming dataset.
- static long randomSeed(): Generate a random seed.
- void setDatasetSize(long datasetSize)
- void setSampleMethod(SampleOperator.Methods sampleMethod)
- void setSeedFunction
Methods inherited from class org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator
getInput, getInputType, getOutput, getOutputType
Methods inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
accept, addBroadcastInput, addTargetPlatform, at, collectMappedInputSlots, collectMappedOutputSlots, copy, createCopy, getAllInputs, getAllOutputs, getCardinalityEstimator, getContainer, getEpoch, getName, getOriginal, getSimpleClassName, getTargetPlatforms, isAuxiliary, isSupportingBroadcastInputs, propagateInputCardinality, propagateOutputCardinality, setAuxiliary, setCardinalityEstimator, setContainer, setEpoch, setName, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ActualOperator
accept
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ElementaryOperator
getCardinalityEstimator, isAuxiliary, setAuxiliary, setCardinalityEstimator
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
addBroadcastInput, addTargetPlatform, broadcastTo, broadcastTo, collectMappedInputSlots, collectMappedOutputSlots, connectTo, connectTo, getAllInputs, getAllOutputs, getCardinalityPusher, getContainer, getEffectiveOccupant, getEffectiveOccupant, getEpoch, getEstimationContextProperties, getForwards, getInnermostLoop, getInput, getInput, getLoopStack, getName, getNumBroadcastInputs, getNumInputs, getNumOutputs, getNumRegularInputs, getOuterInputSlot, getOutermostInputSlot, getOutermostOutputSlots, getOutput, getOutput, getParent, getTargetPlatforms, isAlternative, isConversion, isElementary, isExecutionOperator, isFeedbackInput, isFeedforwardOutput, isLoopHead, isLoopSubplan, isOwnerOf, isReading, isSink, isSource, isSubplan, isSupportingBroadcastInputs, isUnconnected, propagateInputCardinality, propagateOutputCardinality, propagateOutputCardinality, replaceWith, setContainer, setEpoch, setInput, setName, setOutput
-
Field Details
-
logger
protected final org.apache.logging.log4j.Logger logger
UNKNOWN_DATASET_SIZE
public static final long UNKNOWN_DATASET_SIZE
Special dataset size that represents "unknown".
-
sampleSizeFunction
This function determines the sample size from the current iteration number.
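Since sampleSizeFunction maps the current iteration number to a sample size, an iteration-dependent policy can be written as a small lambda. In this sketch java.util.function.IntUnaryOperator stands in for FunctionDescriptor.SerializableIntUnaryOperator (assumed here to have the same int-to-int shape), and the doubling-with-cap policy in the hypothetical doublingSampleSize helper is purely illustrative.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper; IntUnaryOperator stands in for FunctionDescriptor.SerializableIntUnaryOperator.
final class SampleSizeFunctionSketch {

    /** Illustrative policy: start at baseSize and double per iteration, capped at maxSize. */
    static IntUnaryOperator doublingSampleSize(int baseSize, int maxSize) {
        return iteration -> {
            // Clamp the shift so negative or very large iteration numbers stay well-defined.
            int shift = Math.max(0, Math.min(iteration, 30));
            return (int) Math.min((long) maxSize, (long) baseSize << shift);
        };
    }

    public static void main(String[] args) {
        IntUnaryOperator sampleSize = doublingSampleSize(100, 1_000);
        for (int iteration = 0; iteration < 5; iteration++) {
            // Prints 100, 200, 400, 800, 1000 for iterations 0 to 4.
            System.out.println("iteration " + iteration + " -> " + sampleSize.applyAsInt(iteration));
        }
    }
}
```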
seedFunction
This function optionally determines the seed from the current iteration number.
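A seed function can likewise map each iteration number to a seed, so that every iteration draws a different but reproducible sample. The sketch uses java.util.function.LongUnaryOperator in place of FunctionDescriptor.SerializableLongUnaryOperator (assumed compatible). The helpers perIterationSeed and resolveSeed are hypothetical, and the fallback to a fresh random seed when no seed function is set is an assumption suggested by the word "optionally" and by randomSeed(), not documented behavior.

```java
import java.util.Random;
import java.util.function.LongUnaryOperator;

// Hypothetical helpers; LongUnaryOperator stands in for FunctionDescriptor.SerializableLongUnaryOperator.
final class SeedFunctionSketch {

    /** Illustrative policy: a distinct but reproducible seed per iteration, derived from a base seed. */
    static LongUnaryOperator perIterationSeed(long baseSeed) {
        // Multiplying by a large odd constant spreads consecutive iteration numbers apart.
        return iteration -> baseSeed ^ (iteration * 0x9E3779B97F4A7C15L);
    }

    /** Assumed behavior: use the seed function if present, otherwise fall back to a fresh random seed. */
    static long resolveSeed(LongUnaryOperator seedFunction, long iteration) {
        return seedFunction != null ? seedFunction.applyAsLong(iteration) : new Random().nextLong();
    }

    public static void main(String[] args) {
        LongUnaryOperator seeds = perIterationSeed(42L);
        for (long iteration = 0; iteration < 3; iteration++) {
            long seed = resolveSeed(seeds, iteration);
            // The same seed always yields the same pseudo-random draws, so each iteration's sample is reproducible.
            System.out.println(iteration + ": seed=" + seed + ", first draw=" + new Random(seed).nextInt(100));
        }
    }
}
```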
datasetSize
Size of the dataset to be sampled, or 0L if the dataset size is not known.
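A known dataset size matters for sampling methods that need a sampling fraction. For example, Bernoulli sampling keeps each element independently with probability sampleSize / datasetSize, which is undefined when the size is unknown. The sketch below is a generic plain-Java illustration; the class BernoulliSampleSketch is hypothetical, and whether Wayang's SparkBernoulliSampleOperator derives its fraction exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generic illustration of Bernoulli sampling; not taken from Wayang's source.
final class BernoulliSampleSketch {

    /** Keeps each element independently with probability sampleSize / datasetSize (capped at 1). */
    static <T> List<T> sample(Iterable<T> input, int sampleSize, long datasetSize, long seed) {
        if (datasetSize <= 0) {
            // With the size unknown (0L), no sampling fraction can be derived.
            throw new IllegalArgumentException("dataset size must be known (> 0)");
        }
        double fraction = Math.min(1.0, (double) sampleSize / datasetSize);
        Random random = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (random.nextDouble() < fraction) {
                out.add(element);
            }
        }
        // The output size only equals sampleSize in expectation.
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(i);
        }
        System.out.println(sample(data, 100, data.size(), 1L).size());
    }
}
```

Unlike reservoir sampling, this approach does not return exactly sampleSize elements; the output size fluctuates around it.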
-
-
Constructor Details
-
SampleOperator
public SampleOperator(Integer sampleSize, DataSetType<Type> type)
Creates a new instance with a fixed sample size and any sampling method.
- Parameters:
sampleSize - size of the sample
type - DataSetType of the sampled dataset
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type)
Creates a new instance with a user-defined sample size function and any sampling method.
- Parameters:
sampleSizeFunction - user-specified size of the sample, depending on the current iteration number
type - DataSetType of the sampled dataset
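The constructors accept either a fixed Integer sampleSize or a sampleSizeFunction. One natural way to reconcile the two, shown here as an assumption rather than a description of Wayang's internals, is to treat a fixed size as a constant function that ignores the iteration number. FixedSampleSizeSketch and constantSampleSize are hypothetical names, and IntUnaryOperator again stands in for FunctionDescriptor.SerializableIntUnaryOperator.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper; IntUnaryOperator stands in for FunctionDescriptor.SerializableIntUnaryOperator.
final class FixedSampleSizeSketch {

    /** Wraps a fixed sample size as a function that ignores the iteration number. */
    static IntUnaryOperator constantSampleSize(int sampleSize) {
        return iteration -> sampleSize;
    }

    public static void main(String[] args) {
        IntUnaryOperator size = constantSampleSize(50);
        System.out.println(size.applyAsInt(0) + " " + size.applyAsInt(7));  // prints "50 50"
    }
}
```

With this view, every constructor variant reduces to the same internal representation, which keeps per-iteration lookups uniform.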
-
SampleOperator
public SampleOperator(Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given the sample size, the sampling method, and the seed.
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod)
Creates a new instance given a user-defined sample size function and the sampling method.
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given a user-defined sample size function, the sampling method, and a fixed seed.
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
Creates a new instance given user-defined sample size and seed functions and the sampling method.
SampleOperator
public SampleOperator(SampleOperator<Type> that)
Copies an instance (exclusive of broadcasts).
- Parameters:
that - the instance that should be copied
-
-
Method Details
-
randomSeed
public static long randomSeed()
Generate a random seed.
getType
-
getDatasetSize
public long getDatasetSize()
setDatasetSize
public void setDatasetSize(long datasetSize)
isDataSetSizeKnown
protected boolean isDataSetSizeKnown()
Find out whether this instance knows about the size of the incoming dataset.
- Returns:
whether it knows the dataset size
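The field documentation describes datasetSize as 0L when the size is unknown, and UNKNOWN_DATASET_SIZE as the sentinel for that case. The sketch below mirrors that convention in a hypothetical stand-alone class; the sentinel value 0L and the comparison inside isDataSetSizeKnown() are assumptions inferred from the field descriptions, not copied from Wayang's source.

```java
// Hypothetical stand-alone mirror of the datasetSize / UNKNOWN_DATASET_SIZE convention.
final class DatasetSizeSketch {

    /** Assumed sentinel value, taken from the "0L if the dataset size is not known" field description. */
    static final long UNKNOWN_DATASET_SIZE = 0L;

    private long datasetSize = UNKNOWN_DATASET_SIZE;

    void setDatasetSize(long datasetSize) {
        this.datasetSize = datasetSize;
    }

    long getDatasetSize() {
        return this.datasetSize;
    }

    /** The size counts as known once it differs from the sentinel (assumed check). */
    boolean isDataSetSizeKnown() {
        return this.datasetSize != UNKNOWN_DATASET_SIZE;
    }

    public static void main(String[] args) {
        DatasetSizeSketch op = new DatasetSizeSketch();
        System.out.println(op.isDataSetSizeKnown());  // prints "false"
        op.setDatasetSize(1_000_000L);
        System.out.println(op.isDataSetSizeKnown());  // prints "true"
    }
}
```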
-
getSampleMethod
-
setSampleMethod
-
setSeedFunction
-
getSampleSize
protected int getSampleSize(OptimizationContext.OperatorContext operatorContext)
Retrieve the sample size for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the sample size
-
getSeed
protected long getSeed(OptimizationContext.OperatorContext operatorContext)
Retrieve the seed for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the seed
-
createCardinalityEstimator
public Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration)
Description copied from interface: ElementaryOperator
Provide a CardinalityEstimator for the OutputSlot at outputIndex.
- Parameters:
outputIndex - index of the OutputSlot for which the CardinalityEstimator is requested
configuration - if the CardinalityEstimator depends on further ones, use this to obtain the latter
- Returns:
an Optional that might provide the requested instance
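For a sample operator, a natural cardinality estimate caps the input cardinality at the sample size: the output has at most sampleSize elements and never more than the input. The sketch models an interval estimate with a hypothetical Estimate record instead of Wayang's CardinalityEstimator and CardinalityEstimate types; whether Wayang's estimator applies exactly this rule is an assumption.

```java
// Hypothetical illustration; Wayang's own cardinality estimation types are not used here.
final class SampleCardinalitySketch {

    /** Stand-in for an interval cardinality estimate [lower, upper]. */
    record Estimate(long lower, long upper) {}

    /** A sample never exceeds sampleSize or the input, so both bounds are capped at sampleSize. */
    static Estimate estimateOutput(Estimate input, int sampleSize) {
        return new Estimate(Math.min(sampleSize, input.lower()), Math.min(sampleSize, input.upper()));
    }

    public static void main(String[] args) {
        // prints "Estimate[lower=50, upper=100]"
        System.out.println(estimateOutput(new Estimate(50, 5_000), 100));
    }
}
```

This rule is exact for methods that return min(sampleSize, n) elements, such as reservoir sampling; for Bernoulli-style sampling it describes only the expected output size.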
-