Subject: Sharing ideas on using Databricks Delta Lake

I upgraded my Spark to 2.4.3, which allows using the storage layer Delta Lake.
Actually, I wish Databricks had chosen a different name for it :)

Anyhow, although most of the storage examples use the local file system
(/tmp/<TABLE>), I managed to put the data on HDFS itself. I assume this should
work on any Hadoop Compatible File System (HCFS), such as GCP buckets?

According to the Delta Lake project's own description:

Delta Lake is an open source storage layer that brings reliability to data lakes.
Delta Lake provides ACID transactions, scalable metadata handling, and
unifies streaming and batch data processing. Delta Lake runs on top of your
existing data lake and is fully compatible with Apache Spark APIs.

So, in a nutshell, with ACID compliance we have an Oracle-type DW on HDFS
with snapshots. So I am thinking aloud: besides its compatibility with Spark
(which is great), where could I use this product strategically?

Also, how much functional programming will this support? I gather that once you
have created a DataFrame on top of the storage, windowing analytics etc. can be used.

I am sure someone can explain this.


Dr Mich Talebzadeh

