Subject: Spark SaveMode


I dug out some of my old code that uses Spark for ETL.

Regarding the question

"Any reason why Spark's SaveMode doesn't have mode that ignore any Primary
Key/Unique constraint violations?"

There is no way Spark can determine whether a PK constraint is violated
until it receives the error from Oracle through the JDBC connection. In
general, SaveMode is set as follows:

import org.apache.spark.sql.SaveMode

val saveMode = SaveMode.Append

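Only SaveMode Overwrite or Append seem to be useful here. The other modes
such as Ignore and ErrorIfExists act on the table as a whole (they check
whether the target already exists), not on individual rows, so they do not
help with PK violations.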
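
A minimal sketch of how the modes behave. It uses a local Parquet directory
as a stand-in for the Oracle table, purely so that it runs without a
database; the path, column names and sample rows are made up for
illustration. The point is that Ignore skips the whole write when the
target exists, and Append adds every row, duplicates included.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Local session for the sketch only; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[1]").appName("SaveModeSketch").getOrCreate()
import spark.implicits._

// Hypothetical target location standing in for the Oracle table
val target = Files.createTempDirectory("savemode_sketch").resolve("t").toString

// Initial load: two rows
Seq((1, "a"), (2, "b")).toDF("ID", "SMALL_VC")
  .write.mode(SaveMode.Overwrite).parquet(target)

// Ignore: the target already exists, so nothing at all is written
Seq((2, "b"), (3, "c")).toDF("ID", "SMALL_VC")
  .write.mode(SaveMode.Ignore).parquet(target)
val afterIgnore = spark.read.parquet(target).count()

// Append: every row is added, including the duplicate ID 2
Seq((2, "b"), (3, "c")).toDF("ID", "SMALL_VC")
  .write.mode(SaveMode.Append).parquet(target)
val afterAppend = spark.read.parquet(target).count()
```

Neither mode looks at individual keys, which is why the filtering has to be
done before the write.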

However, one can exclude records that already exist in Oracle by reading
the Oracle table into a DF, as in my previous mail, and filtering out
records whose PK is already in Oracle. This can be done in SQL itself by
creating a tempView on top of each of your Oracle DF and Cassandra DF.
Again, ID is the PK on the Oracle table.

// Find the IDs that do not exist in Oracle yet (i.e. new records).
// FYI, dfdummy2 is your Cassandra DF and s is your Oracle DF

dfdummy2.createOrReplaceTempView("dfdummy2")
s.createOrReplaceTempView("s")
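// Left outer join between the two DFs in SQL, keeping only the rows with
// no match in Oracle. Columns are qualified to avoid ambiguity if both
// DFs carry the same column names.
val sqltext = """select dfdummy2.ID, dfdummy2.CLUSTERED, dfdummy2.SCATTERED,
dfdummy2.RANDOMISED, dfdummy2.RANDOM_STRING, dfdummy2.SMALL_VC,
dfdummy2.PADDING FROM dfdummy2 LEFT OUTER JOIN s ON
dfdummy2.ID = s.ID WHERE s.ID IS NULL ORDER BY dfdummy2.ID"""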
sql(sqltext).count()
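
The same filtering can also be sketched with the DataFrame API using a
"left_anti" join, which returns only the left-hand rows that have no match
on the right. This is an alternative to the SQL above, not what I ran at
the time. The two small DFs below are made-up stand-ins for dfdummy2
(Cassandra) and s (Oracle), and newRows and newIds are names introduced
here for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Local session for the sketch only; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[1]").appName("AntiJoinSketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: ID 3 exists in Cassandra but not yet in Oracle
val dfdummy2 = Seq((1, "x"), (2, "y"), (3, "z")).toDF("ID", "SMALL_VC")
val s = Seq((1, "x"), (2, "y")).toDF("ID", "SMALL_VC")

// Keep only the Cassandra rows whose ID is absent from the Oracle DF
val newRows = dfdummy2.join(s.select("ID"), Seq("ID"), "left_anti")

// Collect the surviving IDs for a quick check
val newIds = newRows.select("ID").as[Int].collect().sorted.toSeq
```

Only the ID column is needed from the Oracle side, so selecting it alone
keeps the amount of data read for the join small.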

// Write the new records to the Oracle table
import java.util.Properties

val connectionProperties = new Properties
connectionProperties.put("user", _username)
connectionProperties.put("password", _password)
// "driver" is the property Spark reads for the JDBC driver class; the URL
// itself is passed to jdbc() below
connectionProperties.put("driver", driverName)

//broadcast jdbc connection parameters to cluster nodes
val brConnect = sc.broadcast(connectionProperties)

val saveMode = SaveMode.Append
sql(sqltext).write.mode(saveMode).jdbc(_ORACLEserver,
_dbschema+"."+_dbtable, connectionProperties)

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Sat, 20 Jul 2019 at 08:13, Mich Talebzadeh <[EMAIL PROTECTED]>
wrote: