I would advise against using TestDFSIO; try TeraGen and TeraValidate
instead. IIRC, TestDFSIO doesn't schedule its tasks for data locality, so
it isn't a very good benchmark once your cluster has more nodes than your
replication factor. Most reads then have to come from remote nodes, and
you might become network-bound as you try to read more files.
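A rough back-of-the-envelope sketch of why cluster size matters here. It
assumes, as a simplification and not HDFS's actual placement or scheduling
policy, that each reader task lands on a uniformly random node and that each
block has `replication` replicas on distinct nodes. The function name is
hypothetical, just for illustration:

```python
def expected_local_fraction(replication: int, nodes: int) -> float:
    """Expected fraction of node-local reads when tasks ignore locality.

    Simplifying assumptions (illustrative only): each task runs on a
    uniformly random node, and each block's replicas sit on
    min(replication, nodes) distinct nodes.
    """
    if nodes <= 0 or replication <= 0:
        raise ValueError("replication and nodes must be positive")
    # Chance that the randomly chosen node holds one of the replicas.
    return min(replication, nodes) / nodes


if __name__ == "__main__":
    # With the common default replication factor of 3, compare cluster sizes.
    for n in (3, 10, 50):
        local = expected_local_fraction(3, n)
        print(f"{n:>3} nodes: ~{local:.0%} local, ~{1 - local:.0%} over the network")
```

Under these assumptions, a cluster no larger than the replication factor reads
everything locally, while a larger one sends a growing share of reads over the
network, which is where the network bottleneck comes from.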

Best,
Andrew

On Tue, Nov 4, 2014 at 6:19 AM, Eitan Rosenfeld <[EMAIL PROTECTED]> wrote: