Do you need to perform any joins across the files (e.g. …)?
If not, you could structure your pipeline as
ReadFromFileA --> Reshuffle(optional) --> CopyOfPipelineA
ReadFromFileB --> Reshuffle(optional) --> CopyOfPipelineB
ReadFromFileC --> Reshuffle(optional) --> CopyOfPipelineC
and then run it as a batch pipeline.
You can set --streaming=true on the pipeline and it will run in
streaming mode, but streaming on Google Cloud Dataflow prioritizes low
latency and correctness, so the pipeline will cost more to run than in
batch mode. It may make more sense to store the data uncompressed, since
that may be less expensive than paying the additional compute cost of
streaming.
From: Allie Chen <[EMAIL PROTECTED]>
Date: Tue, May 14, 2019 at 7:38 AM
To: <[EMAIL PROTECTED]>
Is it possible to use windowing or somehow pretend it is streaming so