Thanks, Lukasz. Unfortunately, decompressing the files is not an option for
me. I am trying to speed up the Reshuffle step, since it waits for all the
data. Here are two approaches I have tried:

1.  Add timestamps to the PCollection's elements after reading (since it is
a bounded source), then apply windowing before the Reshuffle. It still waits
for all the data.
2.  Run the pipeline with the --streaming flag. This fails with the error
"Workflow failed. Causes: Expected custom source to have non-zero number of
splits." Also, I found this note:

*DataflowRunner does not currently support the following Cloud Dataflow
specific features with Python streaming execution.*


   *Streaming autoscaling*

I doubt this approach will solve my issue.
Thanks so much!


Do you need to perform any joins across the files (e.g.