Subject: Answers to recent questions on Hive on Spark


Hi,

 

As I mentioned, that parameter does not seem to work, I am afraid!

 

hive> set hive.spark.client.server.address=50.140.197.217;

Query returned non-zero code: 1, cause: hive configuration hive.spark.client.server.address does not exists.

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Xuefu Zhang [mailto:[EMAIL PROTECTED]]
Sent: 28 November 2015 20:53
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Answers to recent questions on Hive on Spark

 

You should be able to set that property like any other Hive property: just do "set hive.spark.client.server.address=xxx;" before you start a query. Make sure that you can reach this server address from your NodeManager nodes, because they are where the remote driver runs. The driver needs to connect back to HS2. Sometimes a firewall may block the access, causing the error you have seen.
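As a quick sanity check (a rough sketch only; the port 35000 below is purely an illustrative placeholder, your HS2 logs will show the actual port the remote driver must connect back to), you can test reachability from a NodeManager node to the HS2 host:

# run from a NodeManager node; 50.140.197.217 is the HS2 host from your error
nc -vz 50.140.197.217 35000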

Thanks,

Xuefu

 

On Sat, Nov 28, 2015 at 9:33 AM, Mich Talebzadeh <[EMAIL PROTECTED]> wrote:

Hi Xuefu,

 

Thanks for the response. I made the changes as requested (copying the assembly jar file from the build to $HIVE_HOME/lib). I will give a full response when I get the debug output.

 

In summary, when I ran the SQL query from Hive and expected Spark to act as the execution engine, it came back with a client connection error.

 

Crucially, I noticed that it was trying to connect over eth1 (the internet connection) as opposed to eth0 (the local network). This host has two Ethernet cards, one for the local area network and the other for the internet (direct, no proxy).

 

The error suggested that I could change the address using the configuration parameter hive.spark.client.server.address.

 

Now I don’t seem to be able to set it in hive-site.xml or as a set parameter at the hive prompt itself!
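For reference, the entry I tried in hive-site.xml looks like this (the usual property/name/value layout, with the address I am trying to set):

<property>
  <name>hive.spark.client.server.address</name>
  <value>50.140.197.217</value>
</property>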

 

Any hint or workaround would be appreciated.

 

Regards,

 

Mich

 

From: Xuefu Zhang [mailto:[EMAIL PROTECTED]]
Sent: 28 November 2015 04:35
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Answers to recent questions on Hive on Spark

 

Okay. I think I know what problem you have now. To run Hive on Spark, spark-assembly.jar is needed and it's also recommended that you have a spark installation (identified by spark.home) on the same host where HS2 is running. You only need spark-assembly.jar in HS2's /lib directory. Other than those, Hive on Spark doesn't have any other dependency at service level. On the job level, Hive on Spark jobs of course run on a spark cluster, which could be standalone, yarn-cluster, etc. However, how you get the binaries for your spark cluster and how you start them is completely independent of Hive.

Thus, you only need to build the spark-assembly.jar without Hive and put it in Hive's /lib directory. The one in the existing Spark build may contain Hive classes, and that's why you need to build your own. Your Spark installation can still have a jar that's different from what you build for Hive on Spark. Your spark.home can still point to your existing Spark installation. In fact, Hive on Spark only needs spark-submit from your Spark installation. Therefore, you should be okay even if your Spark installation contains Hive classes.
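For example (just a sketch, assuming a Spark 1.5.2 source tree and the usual Maven profiles for Hadoop 2.6; the exact jar name and path may differ in your layout), the build and copy steps would look something like:

cd spark-1.5.2
# build the assembly WITHOUT the -Phive profile, so no Hive classes are bundled
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
# copy the resulting assembly into Hive's lib directory on the HS2 host
cp assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.6.0.jar $HIVE_HOME/lib/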

By following this, I'm sure you will get Hive on Spark to work. Depending on the Hive version that your Spark installation contains, you may have problems with Spark applications such as SparkSQL, but that shouldn't be a concern if you decide to use Hive itself.

Let me know if you are still confused.

Thanks,

Xuefu

 

On Fri, Nov 27, 2015 at 4:34 PM, Mich Talebzadeh <[EMAIL PROTECTED]> wrote:

Hi,

 

Thanks for heads up and comments.

 

Sounds like when it comes to using Spark as the execution engine for Hive, we are in no man’s land so to speak. I have opened questions in both the Hive and Spark user forums. Not much luck, for reasons that you alluded to.

 

OK, just to clarify, the prebuilt version of Spark (as opposed to getting the source code and building it yourself) works fine for me.

 

Components are

 

hadoop version

Hadoop 2.6.0

 

hive --version

Hive 1.2.1

 

Spark

version 1.5.2

 

It does what it says on the tin. For example, I can start the master node OK with start-master.sh.

 

 

Spark Command: /usr/java/latest/bin/java -cp /usr/lib/spark_1.5.2_bin/sbin/../conf/:/usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 127.0.0.1 --port 7077 --webui-port 8080

======================
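For completeness, on the Hive side I then point the session at this master (a sketch of what I set; the master URL and spark.home values simply reflect the installation shown above):

hive> set hive.execution.engine=spark;
hive> set spark.master=spark://127.0.0.1:7077;
hive> set spark.home=/usr/lib/spark_1.5.2_bin;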