Subject: Masking username in Spark with regexp_replace and reverse functions


I am looking at the Description column of a bank statement (CSV download)
that has the following schema:

scala> account_table.printSchema
 |-- TransactionDate: date (nullable = true)
 |-- TransactionType: string (nullable = true)
 |-- Description: string (nullable = true)
 |-- Value: double (nullable = true)
 |-- Balance: double (nullable = true)
 |-- AccountName: string (nullable = true)
 |-- AccountNumber: string (nullable = true)

For BACS payments, the Description column contains the name of the
individual who paid into the third-party account. I need to mask that name,
but I cannot simply overwrite every Description with a literal, as below:

f1.withColumn("Description", lit("*** Masked ***")).select('Description.as("Who paid"))

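So I tried the following combination of substring_index, substring,
regexp_replace and reverse:

f1.select(
  substring(substring_index('Description, ",", 1), 2, 50).as("name in clear"),
  reverse(
    regexp_replace(
      regexp_replace(
        regexp_replace(
          substring(regexp_replace('Description, "^['A-Z]", "XX"), 2, 6),
          "[A-F]", "X"),
        " ", "X"),
      "[,]", "R")
  ).as("Masked")
).show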
|     name in clear|Masked|
|           SOLTA A|XTLOSX|
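To see why "SOLTA A" comes out as "XTLOSX", the following pure-Scala sketch replays the regexp_replace/substring/reverse chain from the combination above on a single string, outside Spark. It assumes that Spark's substring is 1-based and that regexp_replace follows Java's replaceAll semantics. The input value (starting with a single quote) is hypothetical, because the exact BACS Description format is not shown here.

```scala
// Pure-Scala sketch mirroring the Spark masking expression on one string.
// Assumptions: Spark's substring(str, pos, len) is 1-based, and
// regexp_replace behaves like java.util.regex replaceAll.
object MaskSketch {

  // Mirrors substring_index(str, delim, 1): the text before the first delim,
  // or the whole string if delim does not occur.
  def substringIndexFirst(s: String, delim: String): String = {
    val i = s.indexOf(delim)
    if (i < 0) s else s.substring(0, i)
  }

  // Mirrors substring(str, pos, len) for pos >= 1; slice clamps at the end.
  def sparkSubstring(s: String, pos: Int, len: Int): String =
    s.slice(pos - 1, pos - 1 + len)

  // "name in clear": up to 50 characters from position 2 of the text
  // before the first comma.
  def nameInClear(description: String): String =
    sparkSubstring(substringIndexFirst(description, ","), 2, 50)

  // "Masked": the same chain of replacements, then reverse.
  def mask(description: String): String = {
    val step1 = description.replaceAll("^['A-Z]", "XX") // leading quote or capital -> "XX"
    val step2 = sparkSubstring(step1, 2, 6)             // keep 6 characters from position 2
    val step3 = step2.replaceAll("[A-F]", "X")          // letters A-F -> X
    val step4 = step3.replaceAll(" ", "X")              // spaces -> X
    val step5 = step4.replaceAll("[,]", "R")            // commas -> R
    step5.reverse
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical Description value, for illustration only.
    val d = "'SOLTA A, REF 123"
    println(nameInClear(d)) // SOLTA A
    println(mask(d))        // XTLOSX
  }
}
```

Because only six characters survive the substring(..., 2, 6) step, distinct names sharing a prefix (for example "SOLTA A" and a hypothetical "SOLTA B") produce the same mask, and the result is largely the name's own letters reversed.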

This seems to work: it not only masks the name but also makes the masking
consistent across rows (in other words, the same username always gets the
same masked value).
Are there any better alternatives?
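One commonly used alternative, named here as a swap-in rather than the approach above, is a salted hash: the same name always yields the same digest, different names practically never collide, and, provided the salt is kept secret, the digest does not reveal the name. The sketch below is plain Scala; the salt is a hypothetical placeholder. In a DataFrame, the corresponding built-ins would be sha2(concat(lit(salt), column), 256), which also returns a lowercase hex string.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Sketch of salted SHA-256 pseudonymisation of a name.
// The salt used in main is a hypothetical placeholder, not a real secret.
object HashMaskSketch {

  // Lowercase hex SHA-256 digest of the UTF-8 bytes of s.
  def sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Deterministic: the same (salt, name) pair always gives the same output.
  def pseudonymise(name: String, salt: String): String =
    sha256Hex(salt + name)

  def main(args: Array[String]): Unit = {
    val salt = "hypothetical-secret-salt"
    println(pseudonymise("SOLTA A", salt))
    println(pseudonymise("SOLTA A", salt) == pseudonymise("SOLTA A", salt)) // true
  }
}
```

Unlike the six-character mask, the digest keeps "SOLTA A" and "SOLTA B" distinct, at the cost of a 64-character opaque value.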


Dr Mich Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.