Class SampleOperator<Type>
- java.lang.Object
-
- org.apache.wayang.core.plan.wayangplan.OperatorBase
-
- org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator<Type,Type>
-
- org.apache.wayang.basic.operators.SampleOperator<Type>
-
- All Implemented Interfaces:
java.io.Serializable, ActualOperator, ElementaryOperator, Operator
- Direct Known Subclasses:
FlinkSampleOperator, JavaRandomSampleOperator, JavaReservoirSampleOperator, SparkBernoulliSampleOperator, SparkRandomPartitionSampleOperator, SparkShufflePartitionSampleOperator
public class SampleOperator<Type> extends UnaryToUnaryOperator<Type,Type>
A random sample operator randomly selects elements from its input slot and pushes them to the output slot.
- See Also:
- Serialized Form
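To make the idea of a fixed-size random sample concrete, the following is a minimal sketch of reservoir sampling in plain Java. It is not the Wayang implementation: the class `ReservoirSampleSketch` and its `sample` method are hypothetical names, and the subclasses listed above may use different strategies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Wayang: draws a fixed-size uniform sample.
final class ReservoirSampleSketch {

    // Returns at most sampleSize elements chosen uniformly at random from input.
    static <T> List<T> sample(Iterable<T> input, int sampleSize, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must not be negative");
        }
        Random random = new Random(seed);
        List<T> reservoir = new ArrayList<>();
        long seen = 0;
        for (T element : input) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize elements.
                reservoir.add(element);
            } else {
                // Keep the new element with probability sampleSize / seen.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, element);
                }
            }
        }
        return reservoir;
    }
}
```

With a fixed seed the sketch returns the same sample for the same input, which mirrors the role of the seed parameters in the constructors below.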
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class
SampleOperator.Methods
-
Nested classes/interfaces inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
OperatorBase.GsonSerializer
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected java.lang.Long
datasetSize
Size of the dataset to be sampled or 0L if a dataset size is not known.
protected org.apache.logging.log4j.Logger
logger
protected FunctionDescriptor.SerializableIntUnaryOperator
sampleSizeFunction
This function determines the sample size by the number of iterations.
protected FunctionDescriptor.SerializableLongUnaryOperator
seedFunction
This function optionally determines the seed by the number of iterations.
static long
UNKNOWN_DATASET_SIZE
Special dataset size that represents "unknown".
-
Fields inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
inputSlots, outputSlots, STANDARD_OPERATOR_ARGS
-
Fields inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
FIRST_EPOCH
-
-
Constructor Summary
Constructors
Constructor, Description
SampleOperator(java.lang.Integer sampleSize, DataSetType<Type> type)
Creates a new instance with any sampling method.
SampleOperator(java.lang.Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given the sample size and the seed.
SampleOperator(SampleOperator<Type> that)
Copies an instance (exclusive of broadcasts).
SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type)
Creates a new instance with any sampling method.
SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod)
Creates a new instance given the sample size and the method.
SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given a user-defined sample size.
SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
Creates a new instance given user-defined sample size and seed methods.
-
Method Summary
Modifier and Type, Method, Description
java.util.Optional<CardinalityEstimator>
createCardinalityEstimator(int outputIndex, Configuration configuration)
long
getDatasetSize()
SampleOperator.Methods
getSampleMethod()
protected int
getSampleSize(OptimizationContext.OperatorContext operatorContext)
Retrieve the sample size for this instance w.r.t. the current iteration.
protected long
getSeed(OptimizationContext.OperatorContext operatorContext)
Retrieve the seed for this instance w.r.t. the current iteration.
DataSetType<Type>
getType()
protected boolean
isDataSetSizeKnown()
Find out whether this instance knows about the size of the incoming dataset.
static long
randomSeed()
Generate a random seed.
void
setDatasetSize(long datasetSize)
void
setSampleMethod(SampleOperator.Methods sampleMethod)
void
setSeedFunction(FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
-
Methods inherited from class org.apache.wayang.core.plan.wayangplan.UnaryToUnaryOperator
getInput, getInputType, getOutput, getOutputType
-
Methods inherited from class org.apache.wayang.core.plan.wayangplan.OperatorBase
accept, addBroadcastInput, addTargetPlatform, at, collectMappedInputSlots, collectMappedOutputSlots, copy, createCopy, getAllInputs, getAllOutputs, getCardinalityEstimator, getContainer, getEpoch, getName, getOriginal, getSimpleClassName, getTargetPlatforms, isAuxiliary, isSupportingBroadcastInputs, propagateInputCardinality, propagateOutputCardinality, setAuxiliary, setCardinalityEstimator, setContainer, setEpoch, setName, toString
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ActualOperator
accept
-
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.ElementaryOperator
getCardinalityEstimator, isAuxiliary, setAuxiliary, setCardinalityEstimator
-
Methods inherited from interface org.apache.wayang.core.plan.wayangplan.Operator
addBroadcastInput, addTargetPlatform, broadcastTo, broadcastTo, collectMappedInputSlots, collectMappedOutputSlots, connectTo, connectTo, getAllInputs, getAllOutputs, getCardinalityPusher, getContainer, getEffectiveOccupant, getEffectiveOccupant, getEpoch, getEstimationContextProperties, getForwards, getInnermostLoop, getInput, getInput, getLoopStack, getName, getNumBroadcastInputs, getNumInputs, getNumOutputs, getNumRegularInputs, getOuterInputSlot, getOutermostInputSlot, getOutermostOutputSlots, getOutput, getOutput, getParent, getTargetPlatforms, isAlternative, isElementary, isExecutionOperator, isFeedbackInput, isFeedforwardOutput, isLoopHead, isLoopSubplan, isOwnerOf, isReading, isSink, isSource, isSubplan, isSupportingBroadcastInputs, isUnconnected, propagateInputCardinality, propagateOutputCardinality, propagateOutputCardinality, setContainer, setEpoch, setInput, setName, setOutput
-
-
-
-
Field Detail
-
logger
protected final org.apache.logging.log4j.Logger logger
-
UNKNOWN_DATASET_SIZE
public static final long UNKNOWN_DATASET_SIZE
Special dataset size that represents "unknown".
- See Also:
- Constant Field Values
-
sampleSizeFunction
protected FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction
This function determines the sample size by the number of iterations.
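As an illustration of a sample size that depends on the iteration number, here is a minimal sketch using `java.util.function.IntUnaryOperator` as a stand-in for `FunctionDescriptor.SerializableIntUnaryOperator`. The assumptions are that the Wayang type behaves like a serializable int-to-int function and that iterations are counted from 0. The class and method names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical examples of sample size functions; not part of Wayang.
final class SampleSizeFunctionSketch {

    // Fixed sample size: ignores the iteration number.
    static IntUnaryOperator constant(int sampleSize) {
        return iteration -> sampleSize;
    }

    // Doubles the sample size with every iteration, capped at maxSize.
    static IntUnaryOperator doubling(int initialSize, int maxSize) {
        return iteration -> {
            // Clamp the shift distance so the intermediate long cannot overflow.
            int shift = Math.max(0, Math.min(iteration, 31));
            long grown = (long) initialSize << shift;
            return (int) Math.min(grown, (long) maxSize);
        };
    }
}
```

For example, `doubling(10, 100)` yields 10, 20, 40, 80 for iterations 0 to 3 and stays at 100 from iteration 4 onward.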
-
seedFunction
protected FunctionDescriptor.SerializableLongUnaryOperator seedFunction
This function optionally determines the seed by the number of iterations.
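A per-iteration seed can make sampling reproducible while still varying between iterations. The sketch below uses `java.util.function.LongUnaryOperator` as a stand-in for `FunctionDescriptor.SerializableLongUnaryOperator`. That correspondence is an assumption, and `SeedFunctionSketch`, `perIteration`, and `firstDraw` are hypothetical names.

```java
import java.util.Random;
import java.util.function.LongUnaryOperator;

// Hypothetical examples of seed functions; not part of Wayang.
final class SeedFunctionSketch {

    // Derives a distinct but reproducible seed for each iteration from a base seed.
    static LongUnaryOperator perIteration(long baseSeed) {
        return iteration -> baseSeed + iteration;
    }

    // Shows the effect: the same iteration always produces the same first draw.
    static int firstDraw(LongUnaryOperator seedFunction, long iteration, int bound) {
        long seed = seedFunction.applyAsLong(iteration);
        return new Random(seed).nextInt(bound);
    }
}
```

Calling `firstDraw` twice with the same seed function and iteration returns the same value, because `java.util.Random` is deterministic for a given seed.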
-
datasetSize
protected java.lang.Long datasetSize
Size of the dataset to be sampled or 0L if a dataset size is not known.
-
-
Constructor Detail
-
SampleOperator
public SampleOperator(java.lang.Integer sampleSize, DataSetType<Type> type)
Creates a new instance with any sampling method.
- Parameters:
sampleSize - size of the sample
type - DataSetType of the sampled dataset
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type)
Creates a new instance with any sampling method.
- Parameters:
sampleSizeFunction - user-specified size of the sample as a function of the current iteration number
type - DataSetType of the sampled dataset
-
SampleOperator
public SampleOperator(java.lang.Integer sampleSize, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given the sample size and the seed.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod)
Creates a new instance given the sample size and the method.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, long seed)
Creates a new instance given a user-defined sample size.
-
SampleOperator
public SampleOperator(FunctionDescriptor.SerializableIntUnaryOperator sampleSizeFunction, DataSetType<Type> type, SampleOperator.Methods sampleMethod, FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
Creates a new instance given user-defined sample size and seed methods.
-
SampleOperator
public SampleOperator(SampleOperator<Type> that)
Copies an instance (exclusive of broadcasts).
- Parameters:
that - the instance that should be copied
-
-
Method Detail
-
randomSeed
public static long randomSeed()
Generate a random seed.
-
getType
public DataSetType<Type> getType()
-
getDatasetSize
public long getDatasetSize()
-
setDatasetSize
public void setDatasetSize(long datasetSize)
-
isDataSetSizeKnown
protected boolean isDataSetSizeKnown()
Find out whether this instance knows about the size of the incoming dataset.
- Returns:
whether it knows the dataset size
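The relationship between `datasetSize`, `UNKNOWN_DATASET_SIZE`, and `isDataSetSizeKnown()` can be sketched as a sentinel check. This is a simplified stand-in, not the Wayang source: it assumes the sentinel is `0L`, as the `datasetSize` field description suggests, and the class name `DatasetSizeSketch` is hypothetical.

```java
// Hypothetical stand-in mirroring the dataset size accessors; not the Wayang class.
final class DatasetSizeSketch {

    // Assumed sentinel value, taken from the datasetSize field description.
    static final long UNKNOWN_DATASET_SIZE = 0L;

    private long datasetSize = UNKNOWN_DATASET_SIZE;

    long getDatasetSize() {
        return datasetSize;
    }

    void setDatasetSize(long datasetSize) {
        this.datasetSize = datasetSize;
    }

    // The size counts as known once it differs from the sentinel.
    boolean isDataSetSizeKnown() {
        return datasetSize != UNKNOWN_DATASET_SIZE;
    }
}
```

A freshly created instance reports an unknown size until `setDatasetSize` is called with a non-sentinel value.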
-
getSampleMethod
public SampleOperator.Methods getSampleMethod()
-
setSampleMethod
public void setSampleMethod(SampleOperator.Methods sampleMethod)
-
setSeedFunction
public void setSeedFunction(FunctionDescriptor.SerializableLongUnaryOperator seedFunction)
-
getSampleSize
protected int getSampleSize(OptimizationContext.OperatorContext operatorContext)
Retrieve the sample size for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the sample size
-
getSeed
protected long getSeed(OptimizationContext.OperatorContext operatorContext)
Retrieve the seed for this instance w.r.t. the current iteration.
- Parameters:
operatorContext - provides the current iteration number
- Returns:
the seed
-
createCardinalityEstimator
public java.util.Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration)
Description copied from interface: ElementaryOperator
- Parameters:
outputIndex - index of the OutputSlot for which the CardinalityEstimator is requested
configuration - if the CardinalityEstimator depends on further ones, use this to obtain the latter
- Returns:
an Optional that might provide the requested instance
-
-