I upgraded my Spark to 2.4.3, which allows using the Delta Lake storage layer.
Actually, I wish Databricks had chosen a different name for it :)
Anyhow, although most storage examples use the normal file system
(/tmp/<TABLE>), I managed to put the data on HDFS itself. I assume this should
work on any Hadoop Compatible File System (HCFS), like GCP buckets etc.?
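For what it's worth, here is a minimal sketch of what worked for me conceptually; the namenode URI and table path are placeholders, and it assumes Spark 2.4.x launched with the delta-core package on the classpath (e.g. --packages io.delta:delta-core_2.11:0.6.1):

```python
# Sketch, not a definitive recipe: writing a DataFrame as a Delta table
# to an HDFS path. Assumes Spark 2.4.x with delta-core available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-on-hdfs").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Write directly to an HDFS URI; in principle any HCFS URI
# (e.g. gs:// with the GCS connector) should work the same way.
path = "hdfs://namenode:8020/tmp/people_delta"  # placeholder path
df.write.format("delta").mode("overwrite").save(path)

# Read it back to confirm
spark.read.format("delta").load(path).show()
```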
According to the link above:
Delta Lake <https://delta.io/>
is an open source storage layer
that brings reliability to data lakes.
Delta Lake provides ACID transactions, scalable metadata handling, and
unifies streaming and batch data processing. Delta Lake runs on top of your
existing data lake and is fully compatible with Apache Spark APIs.
So in a nutshell, with ACID compliance we have got an Oracle-type DW on HDFS,
with snapshots. So I am thinking aloud: besides its compatibility with Spark
(which is great), where can I use this product to give me a strategic
advantage?
Also, how much functional programming will this support? I gather once you
have created a DataFrame on top of the storage, windowing analytics etc. can
be used.
I am sure someone can explain this.
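On the snapshot side, the feature Delta calls "time travel" lets you read an earlier version of a table. A hedged sketch, assuming a Delta table already exists at the placeholder path below:

```python
# Sketch: reading an older snapshot of a Delta table via time travel.
# Assumes the table at `path` has at least one committed version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "hdfs://namenode:8020/tmp/people_delta"  # placeholder path

# Load the table as of its first version (snapshot 0)
old = spark.read.format("delta").option("versionAsOf", 0).load(path)
old.show()
```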
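To make the windowing point concrete: once the data is loaded as a DataFrame, the usual Spark SQL window functions apply regardless of the Delta storage underneath. A sketch with an assumed trades table (the path and column names are made up for illustration):

```python
# Sketch: window analytics over a DataFrame read from Delta.
# The table path and columns (ticker, trade_ts, price) are assumptions.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("delta").load("hdfs://namenode:8020/tmp/trades")

# Per-ticker window ordered by trade time
w = Window.partitionBy("ticker").orderBy("trade_ts")

# Compare each price with the previous one within the window
changes = (df
           .withColumn("prev_price", F.lag("price").over(w))
           .withColumn("price_change", F.col("price") - F.col("prev_price")))
changes.show()
```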
Dr Mich Talebzadeh
LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.