Hi all,
I recently faced an interesting challenge: converting a Spark StructType column to MapType, and vice versa, in a Spark Dataset using Scala.
After a good bit of searching, I could not find a ready-made solution for this transformation,
so I started digging around, and this post describes my findings.
For this article, we will assume that you have a working local Spark installation (see Installing Apache Spark)
and that the input data is available on your local disk.
For our input dataset, we will use the following employee JSON data, saved to /tmp/employees.json.
[
  {
    "id": "E100",
    "employeeName": "Ravi",
    "department": {
      "departmentId": "D100",
      "departmentName": "Software Engineering"
    },
    "salary": {
      "baseSalary": 1000,
      "currency": "USD"
    }
  },
  {
    "id": "E101",
    "employeeName": "Alice",
    "department": {
      "departmentId": "D200",
      "departmentName": "Finance"
    },
    "salary": {
      "baseSalary": 1000,
      "currency": "USD"
    }
  },
  {
    "id": "E102",
    "employeeName": "Bob",
    "department": {
      "departmentId": "D300",
      "departmentName": "Sales"
    },
    "salary": {
      "baseSalary": 1000,
      "currency": "USD"
    }
  }
]
You can read the raw JSON data into a Spark Dataset using the code below.
val employees = (spark.read
  .option("multiLine", "true") // Required for formatted (multi-line) JSON
  .json("/tmp/employees.json"))
You can view the inferred schema and print the Dataset rows using the code below.
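A minimal way to do both, using the standard Dataset API:

employees.printSchema()  // Schema that Spark inferred from the JSON
employees.show(false)    // Print all rows without truncating column values

With the sample data above, the inferred schema should come out roughly as follows (Spark lists JSON fields alphabetically and infers baseSalary as long):

root
 |-- department: struct (nullable = true)
 |    |-- departmentId: string (nullable = true)
 |    |-- departmentName: string (nullable = true)
 |-- employeeName: string (nullable = true)
 |-- id: string (nullable = true)
 |-- salary: struct (nullable = true)
 |    |-- baseSalary: long (nullable = true)
 |    |-- currency: string (nullable = true)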
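To preview the kind of transformation this post is about, here is a minimal hand-rolled sketch that converts the department struct into a map<string,string> column and back, using Spark's built-in map, lit, struct and getItem functions. It hard-codes the field names, so treat it as an illustration of the problem rather than a general solution:

import org.apache.spark.sql.functions.{col, lit, map, struct}

// StructType -> MapType: build a map column from the struct's fields,
// pairing each literal key with the corresponding struct field.
val withDeptMap = employees.withColumn(
  "department",
  map(
    lit("departmentId"),   col("department.departmentId"),
    lit("departmentName"), col("department.departmentName")
  )
)
withDeptMap.printSchema() // department is now map<string,string>

// MapType -> StructType: rebuild the struct by looking up each key in the map.
val backToStruct = withDeptMap.withColumn(
  "department",
  struct(
    col("department").getItem("departmentId").as("departmentId"),
    col("department").getItem("departmentName").as("departmentName")
  )
)
backToStruct.printSchema() // department is a struct again

A generic solution would derive these key/value pairs from the Dataset's schema instead of hard-coding each field name.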