On 2018-04-19 20:43, Ariel Weisberg wrote:

Right, that's how it's done. The component you typically don't get to
control is the client-side local port, but you can bind to a local port
if you want.
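For illustration, binding the client socket to an explicit local port before connecting might look like the sketch below, so the connection's full 4-tuple (and hence whatever the NIC hashes) is under the client's control. The helper name and addresses are made up:

```python
import socket

def connect_from_local_port(server_addr: str, server_port: int,
                            local_port: int) -> socket.socket:
    """Connect to a server while choosing our own local (source) port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick reuse of the local port across reconnects.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", local_port))          # fix the client-side local port
    s.connect((server_addr, server_port))
    return s
```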

> Since it's hashing how do you manipulate which queue packets for a TCP connection go to and how is it made worse by having an accept socket per shard?

It's not made worse, it's just not made better.

There are three ways at least to get multiqueue to work with
thread-per-core without software movement of packets, none of them pretty:

1. The client tells the server which shard to connect to. The server
uses "Flow Director" [1] or an equivalent to bypass the hash and bind
the connection to a particular queue. This is problematic since you need
to bypass the TCP stack, and since there is a limited number of entries
in the Flow Director table.
2. The client asks the server which shard it happened to connect to.
This requires the client to open many connections in order to reach all
shards, and then close any excess connections (did I mention it wasn't
pretty?).
3. The server communicates the hash function to the client, or perhaps
suggests local ports for the client to use in order to reach a shard.
This can be problematic if the server doesn't know the hash function
(can happen in some virtualized environments, or with new NICs, or with
limited knowledge of the hardware topology). See similar approach in [2].
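To make option 3 concrete, here is a sketch of the Toeplitz hash that most RSS-capable NICs use: a client that knows the key could search its ephemeral port range for a local port whose flow hashes to the desired queue. The key, the queue count, and the assumption that queue = hash % num_queues are all placeholders (real NICs map the hash through an indirection table, which the client would also need to know):

```python
def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Standard Toeplitz RSS hash: for every set bit of the input,
    XOR in the 32-bit window of the key starting at that bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def pick_local_port(key: bytes, src_ip: bytes, dst_ip: bytes,
                    dst_port: int, num_queues: int, want_queue: int):
    """Search the ephemeral range for a client port whose TCP/IPv4
    flow tuple hashes to the wanted queue (hypothetical helper)."""
    for src_port in range(49152, 65536):
        data = (src_ip + dst_ip
                + src_port.to_bytes(2, "big")
                + dst_port.to_bytes(2, "big"))
        if toeplitz_hash(key, data) % num_queues == want_queue:
            return src_port
    return None
```

A client using this would need the server (or operator) to publish the RSS key and queue mapping out of band, which is exactly the knowledge gap the paragraph above describes.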


> You also mention 160 ports as bad, but it doesn't sound like a big number resource wise. Is it an operational headache?

Port 9042 + N can easily conflict with another statically allocated port
on the server. I guess you can listen on ephemeral ports, but then if
you firewall them, you need to adjust the firewall rules.

In any case it doesn't solve the problem of directing a connection's
packets to a specific queue.

> RE tokens distributed amongst shards. The way that would work right now is that each port number appears to be a discrete instance of the server. So you could have shards be actual shards that are simply colocated on the same box, run in the same process, and share resources. I know this pushes more of the complexity into the server vs the driver, as the server expects all shards to share some client-visible state like system tables and certain identifiers.

This has its own problems, I'll address them in the other sub-thread (or
using our term, other continuation).

> Ariel
> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>> Port-per-shard is likely the easiest option but it's too ugly to
>> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
>> IIRC), it will be just horrible to have 160 open ports.
>> It also doesn't fit well with the NIC's ability to automatically
>> distribute packets among cores using multiple queues, so the kernel
>> would have to shuffle those packets around. Much better to have those
>> packets delivered directly to the core that will service them.
>> (also, some protocol changes are needed so the driver knows how tokens