Thanks so much for responding, Maximiliano. I didn't want this discussion to disappear in the wilderness of dev emails :). Here's what I would like to see, or contribute to, for model serving within Spark.
First off, I want to be clear on what we mean by model serving, so I'll add my interpretation of the definition here: model serving is the ability to discover what models exist through the use of a model repository, and to serve up the contents of a particular model for invocation/consumption. Before we dive into the details you specify below, is this the definition that people have in mind?
Finally, as I mentioned earlier, when I think about models I'm initially targeting deep/machine learning models, but eventually also models requiring lots of compute or I/O, which are frequently present in Operations Research and other worlds.
Given the above, I feel like we also need a more robustified (nice word, huh) version of Livy: something that discovers and serves up any model for downstream computation, in addition to hooking it up to Zeppelin or some other downstream viz engine.
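To make the discover-then-serve idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration only: the ModelRepository class and its register/discover/serve methods are hypothetical names, not an existing Spark or Livy API, and the "models" are trivial functions standing in for real ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# A "model" here is just a callable from features to a prediction (assumption).
PredictFn = Callable[[Sequence[float]], float]


@dataclass
class ModelRepository:
    """Hypothetical in-memory model repository, for illustration only."""
    _models: Dict[str, PredictFn] = field(default_factory=dict)

    def register(self, name: str, predict_fn: PredictFn) -> None:
        # Make a model discoverable under a name.
        self._models[name] = predict_fn

    def discover(self) -> List[str]:
        # "Discover what models exist": list the registered names.
        return sorted(self._models)

    def serve(self, name: str, features: Sequence[float]) -> float:
        # "Serve up a particular model for invocation": look it up and call it.
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        return self._models[name](features)


repo = ModelRepository()
repo.register("mean_baseline", lambda xs: sum(xs) / len(xs))
repo.register("max_baseline", lambda xs: max(xs))
```

A downstream consumer (a notebook, a viz engine) would only need discover() and serve(); where the models actually live and how they are loaded is left open here.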
Would love to hear thoughts.
From: Maximiliano Felice <[EMAIL PROTECTED]>
Sent: Thursday, July 12, 2018 11:52 AM
To: Saikat Kanjilal; Holden Karau
Subject: Re: Spark model serving
As I know many of you don't read / are not part of the user list, I'll summarize what happened at the summit:
We discussed what we need in order to start serving our predictions with Spark. We mostly talked about alternatives to this work and what we could expect in these areas.
I'm going to share mine here, hoping it will trigger further discussion. We currently:
* Use Spark as an ETL tool, followed by
* a Python (numpy/pandas based) pipeline to preprocess information and
* use TensorFlow for training our neural networks
What we'd love to do, and why we don't:
* Start using Spark for our full preprocessing pipeline. Because type safety. And distributed computation. And Catalyst. But mainly because not-Python.
Our main issue:
* We want to use the same code for online serving. We're not willing to duplicate the preprocessing operations. Spark is not serving-friendly.
* If we want it to preprocess online, we need to copy/paste our custom transformations to MLeap.
* It's an issue to communicate with a TensorFlow API to give it the preprocessed data to serve.
* Use Spark to do hyperparameter tuning.
* GPU Integration with Spark, letting us achieve finer tuning.
* Better TensorFlow integration
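To illustrate the "same code for online serving" point above, here is a minimal sketch in plain Python, deliberately without Spark or MLeap. The feature names (clicks, age) and the functions preprocess, batch_features and online_features are hypothetical; the only point is that the transformation is defined once and reused by both the batch path and the per-request path.

```python
import math
from typing import Dict, Iterable, List


def preprocess(record: Dict[str, float]) -> List[float]:
    """Single definition of the feature logic (hypothetical features)."""
    return [
        math.log1p(record["clicks"]),  # compress a heavy-tailed count
        record["age"] / 100.0,         # crude rescaling
    ]


def batch_features(records: Iterable[Dict[str, float]]) -> List[List[float]]:
    # Offline/training path: apply the shared function to every record.
    return [preprocess(r) for r in records]


def online_features(request: Dict[str, float]) -> List[float]:
    # Online/serving path: apply the very same function to one request.
    return preprocess(request)
```

Because both paths call preprocess, there is nothing to copy/paste and no way for training and serving features to drift apart; the hard part in practice is getting this property when the batch side runs on Spark.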
Now that I'm on the @dev list, do you think that any of these issues could be addressed? We talked at the summit about PFA (Portable Format for Analytics) and how we would expect it to cover some issues. Another discussion I remember was about encoding operations (functions/lambdas) in PFA itself. And I don't remember having smoked anything at that point, although we could as well have.
Oh, and @Holden Karau<mailto:[EMAIL PROTECTED]> insisted that she would be much happier with us if we started helping with code reviews. I'm willing to make some time for that.
Sorry again for the delay in replying to this email (and now sorry for the length). Looking forward to following up on this topic.
On Tue, Jul 3, 2018 at 15:37, Saikat Kanjilal (<[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>) wrote:
Ping, would love to hear back on this.
From: Saikat Kanjilal <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>>
Sent: Tuesday, June 26, 2018 7:27 AM
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Subject: Spark model serving
HoldenK and interested folks,
I'm just following up on the Spark model serving discussions, as this is highly relevant to what I’m embarking on at work. Is there a concrete list of next steps, or can someone summarize what was discussed at the summit? I would love to have a Seattle version of this discussion with some folks.
Look forward to hearing back and driving this.