Class ParquetSource

All Implemented Interfaces:
Serializable, ActualOperator, ElementaryOperator, Operator
Direct Known Subclasses:
JavaParquetSource, SparkParquetSource

public class ParquetSource extends UnarySource<Record>
This source reads a Parquet file and outputs its rows as Record units.
  • Constructor Details

    • ParquetSource

      public ParquetSource(String inputUrl, String[] projection, String... fieldNames)
    • ParquetSource

      public ParquetSource(String inputUrl, String[] projection, DataSetType<Record> type)
    • ParquetSource

      public ParquetSource(ParquetSource that)
      Copies an instance (exclusive of broadcasts).
      Parameters:
      that - the instance to be copied
  • Method Details

    • create

      public static ParquetSource create(String inputUrl, String[] projection)
      Creates a new instance.
      Parameters:
      inputUrl - name of the file to be read
      projection - names of the columns to select; can be omitted, but specifying it allows for an early projection
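
The idea of an early projection can be illustrated with a minimal, self-contained sketch in plain Java. It does not use Wayang or Parquet classes: `ProjectionSketch`, `project`, and the sample column names are hypothetical stand-ins, and treating a null projection as "keep every column" is an assumption of this sketch, not documented behavior of ParquetSource.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of Wayang or Parquet.
class ProjectionSketch {

    // Keeps only the columns named in `projection`, in the order given.
    // Assumption of this sketch: a null projection means "keep every column".
    static List<Object> project(Map<String, Object> row, String[] projection) {
        if (projection == null) {
            return new ArrayList<>(row.values());
        }
        List<Object> out = new ArrayList<>(projection.length);
        for (String column : projection) {
            if (!row.containsKey(column)) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            out.add(row.get(column));
        }
        return out;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, mimicking a fixed schema.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "alice");
        row.put("age", 30);

        System.out.println(project(row, new String[] {"name", "id"}));
        System.out.println(project(row, null));
    }
}
```

Dropping unneeded columns at the source, as sketched here, means that downstream operators never see the discarded values.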
    • getInputUrl

      public String getInputUrl()
    • getProjection

      public String[] getProjection()
    • getMetadata

      public org.apache.parquet.hadoop.metadata.ParquetMetadata getMetadata()
    • getSchema

      public org.apache.parquet.schema.MessageType getSchema()
    • createCardinalityEstimator

      public Optional<CardinalityEstimator> createCardinalityEstimator(int outputIndex, Configuration configuration)
      Description copied from interface: ElementaryOperator
      Provide a CardinalityEstimator for the OutputSlot at outputIndex.
      Parameters:
      outputIndex - index of the OutputSlot for which the CardinalityEstimator is requested
      configuration - if the CardinalityEstimator depends on further estimators, use this to obtain them
      Returns:
      an Optional that might provide the requested instance
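
Because the return value is an Optional that may be empty, callers need a fallback path. The following is a minimal sketch of that pattern in plain Java. `EstimatorSketch`, `SimpleEstimator`, `createEstimator`, and `estimateOrDefault` are hypothetical stand-ins and not Wayang's CardinalityEstimator API; the rule "only output index 0 with a known row count yields an estimator" is an assumption made for illustration.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are not Wayang's types.
class EstimatorSketch {

    // Simplified estimator that reports a single row-count guess.
    interface SimpleEstimator {
        long estimateRows();
    }

    // Assumption of this sketch: an estimator exists only for output index 0
    // and only when a row count is known up front.
    static Optional<SimpleEstimator> createEstimator(int outputIndex, Long knownRowCount) {
        if (outputIndex != 0 || knownRowCount == null) {
            return Optional.empty();
        }
        final long rows = knownRowCount;
        SimpleEstimator estimator = () -> rows;
        return Optional.of(estimator);
    }

    // Handles the empty case explicitly instead of calling get() blindly.
    static long estimateOrDefault(int outputIndex, Long knownRowCount, long fallback) {
        return createEstimator(outputIndex, knownRowCount)
                .map(SimpleEstimator::estimateRows)
                .orElse(fallback);
    }

    public static void main(String[] args) {
        System.out.println(estimateOrDefault(0, 1_000L, -1L)); // estimator present
        System.out.println(estimateOrDefault(0, null, -1L));   // falls back
    }
}
```

Using `map` and `orElse` keeps the "no estimator available" case visible at the call site.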