Subject: Spark dataset to explode json string


Hi Richard,

You can use the following to read JSON data into DF. The example is reading
JSON from Kafka topic

          val sc = spark.sparkContext
         import spark.implicits._
         // Use map to create the new RDD using the value portion of the
pair.
         val jsonRDD = pricesRDD.map(x => x._2)
         // Create DataFrame from jsonRDD
         val jsonDF = sqlContext.read.json(jsonRDD)

This is an example of reading a MongoDB document into Spark

dfrddMongoDB.printSchema
/*
root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- operation: struct (nullable = true)
 |    |-- op_type: integer (nullable = true)
 |    |-- op_time: string (nullable = true)
 |-- priceInfo: struct (nullable = true)
 |    |-- key: string (nullable = true)
 |    |-- ticker: string (nullable = true)
 |    |-- timeissued: string (nullable = true)
 |    |-- price: double (nullable = true)
 |    |-- currency: string (nullable = true)
// one example of mongo document from mongo collection
{
    "_id" : ObjectId("5cae4fa25d8b5279db785b43"),
    "priceInfo" : {
        "key" : "2ca8de24-eaf3-40d4-b0ef-c8b56534ceb5",
        "ticker" : "ORCL",
        "timeissued" : "2019-04-10T21:20:57",
        "price" : 41.13,
        "currency" : "GBP"
    },
    "operation" : {
        "op_type" : NumberInt(1),
        "op_time" : "1554927506012"
    }
}
*/
// Flatten the structs
val df = dfrddMongoDB.
               select(
                        'priceInfo.getItem("key").as("key")
                      , 'priceInfo.getItem("ticker").as("ticker")
                      , 'priceInfo.getItem("timeissued").as("timeissued")
                      , 'priceInfo.getItem("price").as("price")
                      , 'priceInfo.getItem("currency").as("currency")
                      , 'operation.getItem("op_type").as("op_type")
                      , 'operation.getItem("op_time").as("op_time")
                     )

HTH

Dr Mich Talebzadeh

LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*

http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On Fri, 19 Jul 2019 at 21:48, Richard <[EMAIL PROTECTED]> wrote: