Subject: Control Sqoop job from Spark job


Hi,

Just to clarify, are you saying that the JDBC connection from Spark to the
RDBMS is slow?

The example below reads from an Oracle table over 4 parallel connections,
assuming there is a primary key (ID) on the Oracle table.

//
// Get minID and maxID first; they set the bounds for the partitioned read.
// Assumes HiveContext is an already-created context instance and that
// _ORACLEserver, _username and _password are defined earlier.
//
val minID = HiveContext.read.format("jdbc").options(Map(
       "url" -> _ORACLEserver,
       "dbtable" -> "(SELECT cast(MIN(ID) AS INT) AS minID FROM scratchpad.dummy)",
       "user" -> _username,
       "password" -> _password)).load().collect.apply(0).getDecimal(0).toString
val maxID = HiveContext.read.format("jdbc").options(Map(
       "url" -> _ORACLEserver,
       "dbtable" -> "(SELECT cast(MAX(ID) AS INT) AS maxID FROM scratchpad.dummy)",
       "user" -> _username,
       "password" -> _password)).load().collect.apply(0).getDecimal(0).toString
//
// Read the table over 4 parallel connections, partitioned on ID
//
val s = HiveContext.read.format("jdbc").options(Map(
       "url" -> _ORACLEserver,
       "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
       "partitionColumn" -> "ID",
       "lowerBound" -> minID,
       "upperBound" -> maxID,
       "numPartitions" -> "4",
       "user" -> _username,
       "password" -> _password)).load

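To give a rough idea of what partitionColumn, lowerBound, upperBound and
numPartitions do in the read above: Spark splits the ID range into strides
and issues one query per partition, each with its own range predicate. The
sketch below only approximates that splitting and is not Spark's actual code.
The object name PartitionSketch and the bounds 1 and 100000 are made up for
illustration. Note that the bounds only decide the stride; they do not filter
rows, so the first and last partitions are open-ended.

```scala
// Illustrative sketch only: approximates how a partitioned JDBC read is
// turned into one range predicate per partition. Hypothetical helper,
// not part of Spark.
object PartitionSketch {
  def predicates(column: String, lower: Long, upper: Long, n: Int): Seq[String] = {
    require(n >= 2, "sketch assumes at least two partitions")
    require(lower < upper, "lower bound must be below upper bound")
    // Width of each partition's ID range
    val stride = upper / n - lower / n
    (0 until n).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi OR $column IS NULL"   // first: open below
      else if (i == n - 1) s"$column >= $lo"            // last: open above
      else s"$column >= $lo AND $column < $hi"          // middle: closed range
    }
  }

  def main(args: Array[String]): Unit =
    // Assumed bounds; in the example above they come from minID and maxID
    predicates("ID", 1L, 100000L, 4).foreach(println)
}
```

With 4 partitions this gives four predicates, so Oracle sees four concurrent
queries, each covering roughly a quarter of the ID range. If the IDs are
heavily skewed, some partitions will carry far more rows than others.
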
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Mon, 2 Sep 2019 at 12:12, Chetan Khatri <[EMAIL PROTECTED]>
wrote: