Subject: Spark Security


Hello,

When I load data using sparklyr::spark_read_csv() in R, it creates a
"derby.log" file that contains something along these lines:

Sun May 31 14:17:02 EDT 2020:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.12.1.1 - (1704137): instance xxxxxxx
on database directory memory:C:\Users\wseoane\2020-05-31 sparklyr on three rows\databaseName=metastore_db with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$xxxxxxxxx
Loaded from file:/C:/Users/wseoane/AppData/Local/spark/spark-2.4.3-bin-hadoop2.7/jars/derby-10.12.1.1.jar
java.vendor=Oracle Corporation
java.runtime.version=1.8.0_241-b07
user.dir=C:\Users\wseoane\2020-05-31 sparklyr on three rows
os.name=Windows 10
os.arch=xxxxx
os.version=10.0
derby.system.home=null
Database Class Loader started - derby.database.classpath=''

While the Spark connection is open in sparklyr, I can also view details about
it in my browser (the Spark web UI). Here are the results from a test .tsv
file:
Jobs:
[image: Jobs 2020-05-31 142103.png]
SQL:
[image: SQL 2020-05-31 142217.png]
Stages:
[image: Stages 2020-05-31 142217.png]
Storage:
[image: Storage 2020-05-31 142217.png]

So, since sparklyr::spark_read_csv() reads the data locally rather than in
the cloud, security is determined by my company's IT department, correct
(e.g., the firewalls the IT department has in place on the network, the
antivirus software installed on my computer, etc.)? If the data were in the
cloud, the cloud would need its own layer of security ("up to whoever runs
the cluster"), but that is not relevant here since I am using
sparklyr::spark_read_csv(), correct?

Thanks,

Wilbert Seoane

On Fri, May 29, 2020 at 3:17 PM Sean Owen <[EMAIL PROTECTED]> wrote: