Simple answer
20 regions/server & <2000 regions/cluster is a good rule of thumb if you
can't profile your workload yet.  You really want to ensure that

1) You need to limits the regions/cluster so the master can have a
reasonable startup time & can handle all the region state transitions via
ZK.  Most bigger companies are running 2,000 in production and achieve
reasonable startup times (< 2 minutes for region assignment on cold
start).  If you want to test the scalability of that algorithm beyond what
other companies need, admin beware.
2) The more regions/server you have, the faster that recovery can happen
after RS death because you can currently parallelize recovery on a
region-granularity.  Too many regions/server and #1 starts to be a problem.

Complicated answer
More information is optimize this formula.  Additional considerations:

1) Are you IO-bound or CPU-bound
2) What is your grid topology like
3) What is your network hardware like
4) How many disks (not just size)
5) What is the data locality between RegionServer & DataNode

In the Facebook case, we have 5 racks with 20 nodes each.  Servers in the
rack are connected by 1G Eth to a switch with a 10G uplink.  We are
network bound.  Our saturation point is mostly commonly on the top-of-rack
switch.  With 20 regions/server, we can roughly parallelize our
distributed log splitting within a single rack on RS death (although 2
regions do split off-rack).  This minimizes top-of-rack traffic and
optimized our recovery time.  Even if you are CPU-bound, log splitting
(hence recovery time) is an IO-bound operation.  A lot of our work on
region assignment is about maximizing data locality, even on RS death, so
we avoid top-of-rack saturation.
On 11/1/11 10:54 AM, "Sujee Maniyam" <[EMAIL PROTECTED]> wrote: