Subject: writing into oracle database is very slow


I don't believe there is anything inherently slow about writing to Oracle.

This piece of code works fine for me, reading a CSV file from HDFS and
writing it to an Oracle table:

import java.sql.DriverManager
import java.sql.Connection
import java.sql.DatabaseMetaData
import java.sql.ResultSet
import java.sql.SQLException
import java.util.ArrayList
import org.apache.spark.sql.functions._
import java.sql.{Date, Timestamp}
val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val driverName = "oracle.jdbc.OracleDriver"
var url= "jdbc:oracle:thin:@rhes564:1521:mydb12"
var _username = "scratchpad"
var _password = "oracle"
var _dbschema = "SCRATCHPAD"
var _dbtable = "ANALYTICS"
var e:SQLException = null
var connection:Connection = null
var metadata:DatabaseMetaData = null
println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(),
'dd/MM/yyyy HH:mm:ss.ss') ").collect.foreach(println)
//
// Get a DF first, based on the Databricks CSV reader; the header option takes the column names from the heading row (which includes a column called "Type")
//
val df = spark.read.option("header", true).csv("hdfs://rhes75:9000/data/stg/domains/analytics")
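// Note: all columns arrive as strings with this reader; adding
// .option("inferSchema", true) would give typed columns, at the cost of an
// extra pass over the file.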
//
// Map the columns to names
//
val a = df.select(col("Vendor").as("VENDOR"), col("product_name").as("PRODUCT_NAME"), col("version").as("VERSION"), col("release").as("RELEASE"),
     col("edition").as("EDITION"), col("service_pack").as("SERVICE_PACK"), col("MPLID").as("MPLID"), col("Product Impact").as("PRODUCTIMPACT"), col("Hardware / Software / OS").as("HARDWARESOFTWAREOS"),
     col("Software Design").as("SOFTWAREDESIGN"), col("Data Processing").as("DATAPROCESSING"), col("Big Data").as("BIGDATA"), col("Distributed Ledgers").as("DISTRIBUTEDLEDGERS"),
     col("Analytics").as("ANALYTICS"), col("Distributed Systems").as("DISTRIBUTEDSYSTEMS"), col("API").as("API"), col("Compute").as("COMPUTE"), col("Desktop").as("DESKTOP"), col("UI").as("UI"),
     col("Process Automation").as("PROCESSAUTOMATION"), col("Security").as("SECURITY"), col("DevOps").as("DEVOPS"), col("Comments").as("COMMENTS"))
// Define prop
val prop = new java.util.Properties
prop.setProperty("user", _username)
prop.setProperty("password",_password)
//
// populate Oracle table
//
// Check Oracle is accessible
try {
      connection = DriverManager.getConnection(url, _username, _password)
} catch {
  case e: SQLException =>
    e.printStackTrace
    sys.exit(1)  // no usable connection to close if getConnection failed
}
metadata = connection.getMetaData()
// Check table exists
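// (Oracle stores unquoted identifiers in upper case, which is why _dbschema
// and _dbtable are upper case; getTables matches them literally.)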
var rs:ResultSet = metadata.getTables(null,_dbschema,_dbtable, null)
if (rs.next()) {
   println("Table " + _dbschema+"."+_dbtable + " exists")
} else {
   println("Table " + _dbschema+"."+_dbtable + " does not exist, quitting!")
   connection.close()
   sys.exit(1)
}
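// Note: "overwrite" below drops and re-creates the table by default. To keep
// the existing Oracle DDL and just empty the table instead, add
// .option("truncate", true) to the writer (available from Spark 2.1).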
a.write.mode("overwrite").jdbc(url, _dbschema+"."+_dbtable, prop)
//
// Test all went OK by looking at some old transactions
//
val s = HiveContext.read.format("jdbc").options(
       Map("url" -> url,
       "dbtable" -> _dbtable,
       "user" -> _username,
       "password" -> _password)).load
s.orderBy('VENDOR,'PRODUCT_NAME).collect.foreach(println)
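// Note: collect() above pulls the whole table back to the driver, so this
// check only makes sense for a small test table like this one.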
//
println ("\nFinished at"); spark.sql("SELECT
FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')
").collect.foreach(println)
sys.exit()
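
If the JDBC write itself turns out to be the slow step, the batch size and
the number of concurrent connections are the usual knobs. A minimal sketch,
assuming Spark's standard JDBC writer option batchsize together with
repartition() to control write parallelism; the figures are illustrative
only:

// Push rows in larger JDBC batches and from a few parallel tasks, so Oracle
// receives fewer, bigger round trips (defaults: batchsize = 1000, one task
// per DataFrame partition)
prop.setProperty("batchsize", "10000")
a.repartition(4)   // 4 concurrent connections into Oracle
 .write
 .mode("overwrite")
 .jdbc(url, _dbschema+"."+_dbtable, prop)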

I have uploaded the Scala code, the example CSV file (the source of the
data) and the DDL to create the Oracle table. You only need minor changes
for your schema name and for the Oracle host and port the listener is
running on.

You ought to check where the bottleneck is by looking at Oracle performance
as well. You can use wait_detection.sql to see what is taking the resources
and causing waits, or ask your DBA to do it for you.
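
For example (a minimal sketch, not wait_detection.sql itself, which I have
not reproduced here), something like this against V$SESSION, reusing the
JDBC connection from the script above, will show the current non-idle waits
while the load is running. It assumes the account can SELECT from V$SESSION
(e.g. via SELECT_CATALOG_ROLE):

// One line per session: who is waiting, on what, and for how long
val stmt = connection.createStatement()
val rsw = stmt.executeQuery(
  """SELECT sid, username, event, wait_class, seconds_in_wait
    |FROM v$session
    |WHERE wait_class <> 'Idle' AND username IS NOT NULL
    |ORDER BY seconds_in_wait DESC""".stripMargin)
while (rsw.next()) {
  println(s"${rsw.getInt("SID")} ${rsw.getString("USERNAME")} ${rsw.getString("EVENT")} ${rsw.getInt("SECONDS_IN_WAIT")}")
}
stmt.close()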

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Fri, 19 Apr 2019 at 19:43, Lian Jiang <[EMAIL PROTECTED]> wrote:
