I would advise against using TestDFSIO; try TeraGen and TeraValidate
instead. IIRC TestDFSIO doesn't actually schedule its tasks for data
locality, so it's not a good benchmark once your cluster has more nodes
than your replication factor: many tasks end up reading blocks stored on
other nodes, and the read test can become network bound as you read
more files.
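A rough back-of-the-envelope sketch of why cluster size vs. replication
factor matters. It assumes tasks land on nodes uniformly at random (a
simplification; I haven't checked exactly how TestDFSIO's tasks get
placed) and that each block's replicas sit on distinct nodes. The function
name is just made up for illustration:

```python
from fractions import Fraction


def expected_local_fraction(num_nodes: int, replication: int) -> Fraction:
    """Expected fraction of tasks that read their block from local disk.

    ASSUMPTION: no locality-aware scheduling, i.e. each task runs on a
    uniformly random node, and the block's replicas live on `replication`
    distinct nodes.
    """
    if num_nodes < 1 or replication < 1:
        raise ValueError("num_nodes and replication must be positive")
    # A node holds at most one replica of a block, so cap at cluster size.
    holders = min(replication, num_nodes)
    return Fraction(holders, num_nodes)


if __name__ == "__main__":
    for nodes in (3, 6, 12, 48):
        frac = expected_local_fraction(nodes, replication=3)
        print(f"{nodes:>3} nodes, replication 3: "
              f"~{float(frac):.0%} local reads, "
              f"~{float(1 - frac):.0%} over the network")
```

With replication 3, every read is local on a 3-node cluster, but on a
12-node cluster only about 1/4 of reads would be local under these
assumptions; the rest go over the network, which is where the bottleneck
shows up.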


On Tue, Nov 4, 2014 at 6:19 AM, Eitan Rosenfeld <[EMAIL PROTECTED]> wrote: