Subject: syntheticcontroldata clustering example failure due to combiner


Good to hear. The current implementation is actually the first one I
did, so it was easy to revert to that model. It does require the mapper
to retain all of the canopies, however, and this could cause an OOME if
the T values are poorly chosen. Doing the centroid calculation in the
combiner removed this difficulty, but the Hadoop semantics change makes
it a non-starter. If there were some globally unique way to create new
cluster identifiers as they are needed, the centroid calculation could
be moved to the reducer. There would still be a need to combine the
clusters created by each of the mappers...
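
To make the identifier idea concrete, here is a rough sketch of one way
it might work, not what the code does today. It assumes each mapper task
can be handed a string that is distinct from every other mapper's (for
example, something derived from its task id). ClusterIdGenerator, taskId
and the "C-" prefix are hypothetical names made up for illustration and
are not part of Mahout or Hadoop.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not an existing Mahout or Hadoop class.
// Assumption: every mapper task is constructed with a taskId string
// that no other mapper task in the same job uses.
public class ClusterIdGenerator {
    private final String taskId;
    private final AtomicLong next = new AtomicLong();

    public ClusterIdGenerator(String taskId) {
        this.taskId = taskId;
    }

    // Returns an id made of the task id plus a counter local to this
    // task, so no coordination between mappers is needed.
    public String nextId() {
        return "C-" + taskId + "-" + next.getAndIncrement();
    }
}
```

Ids built this way would only be unique, not meaningful, so the clusters
created by different mappers would still have to be combined afterwards.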

Jeff
Adil Aijaz wrote: