> Also, if I have a bunch of new documents to fold in, it looks like I'd need
> to run a matrix multiplication job between the new document vectors and V,
> both matrices represented row-wise. So DistributedRowMatrix should help me,
> shouldn't it? Do I need to transpose the first matrix first?
>

If you have a dense matrix V of eigenvectors (i.e., it has K rows, with K a
small number on the order of hundreds, and each row is a dense vector of
cardinality M, which may be large) stored as a DistributedRowMatrix, and you
have your original document matrix C, which has N rows, each of cardinality
M, then you actually need to take the transpose of *both* matrices and then
call DistributedRowMatrix.times() on them. The reason is that
DistributedRowMatrix.times(other) computes this^T * other, not this * other,
so the two operands must have the same number of rows:

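DistributedRowMatrix vTranspose = V.transpose();   // M x K
DistributedRowMatrix cTranspose = C.transpose();   // M x N
// times() computes this^T * other: (C^T)^T * V^T = C * V^T, an N x K matrix
DistributedRowMatrix cTimesVTranspose = cTranspose.times(vTranspose);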

This code yields the mathematical result C * V^T, an N x K matrix whose
i-th row holds the dot products of document i with each of the K
eigenvectors, which is probably what you want.
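
To make the shape arithmetic concrete, here is a small in-memory sketch. It is not Mahout code: plain double[][] arrays stand in for DistributedRowMatrix rows, and the hypothetical helper transposeTimes(a, b) mimics the this^T * other semantics of times() by summing the outer products of rows that share an index (the idea a row-wise MapReduce job can exploit). The class name FoldInSketch and the toy data are illustrative assumptions.

```java
import java.util.Arrays;

/**
 * Illustrative, in-memory sketch (not Mahout code) of why
 * C_transpose.times(V_transpose) yields C * V^T.
 * Matrices are row-major double[][] arrays.
 */
public class FoldInSketch {

    /** Returns the transpose of a row-major matrix. */
    static double[][] transpose(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    /**
     * Hypothetical helper computing a^T * b, mimicking the semantics of
     * DistributedRowMatrix.times(other). Both inputs must have the same
     * number of rows; row r contributes the outer product of a[r] and b[r].
     */
    static double[][] transposeTimes(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "row counts differ: " + a.length + " vs " + b.length);
        }
        int n = a[0].length, k = b[0].length;
        double[][] out = new double[n][k];
        for (int r = 0; r < a.length; r++) {      // shared row index
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < k; j++) {
                    out[i][j] += a[r][i] * b[r][j]; // accumulate outer product
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // C: N = 2 documents, each of cardinality M = 3 (toy data).
        double[][] c = {{1, 0, 2}, {0, 3, 1}};
        // V: K = 2 eigenvectors, each of cardinality M = 3 (toy data).
        double[][] v = {{1, 1, 0}, {0, 1, 1}};

        // C^T and V^T both have M = 3 rows, so transposeTimes applies,
        // and (C^T)^T * V^T = C * V^T is N x K.
        double[][] projected = transposeTimes(transpose(c), transpose(v));
        for (double[] row : projected) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running main prints [1.0, 2.0] and then [3.0, 4.0], i.e. each row is one document's dot products with the two eigenvectors. Passing the untransposed C and V would fail the row-count check (2 vs 2 happens to match here only by coincidence of the toy sizes; in general N and K differ) and, even when the counts match, would compute C^T * V, an M x M matrix, rather than the projection.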

(It turns out that this set of operations could also be done as a custom
operation using the row paths of both V and C as inputs, but you'd still
need two MapReduce shuffles to get the answer, so doing it that way doesn't
really save anything.)

