Spark Scala: Grouping Values In A Key-Value Pair
First, we'll create our data set. In the test data, each ID has one or more distinct uni values in the Val column. Ultimately, we'd like to output these as key-value pairs, where the key is the name of the column and the value is the column's value. This example uses Databricks, but with the appropriate Spark libraries installed, you can run it in your own environment.
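import org.apache.spark.sql._
import org.apache.spark.sql.functions._
// Databricks notebooks already import spark.implicits._, which provides toDF and the $"..." syntax.
// In other environments, add: import spark.implicits._

// Test data: each ID/Letter pair has one or more Val entries.
val table1 = Seq(
  (1, "A", "uni1"),
  (1, "A", "uni2"),
  (2, "B", "uni1"),
  (2, "B", "uni2"),
  (2, "B", "uni3"),
  (3, "C", "uni1"),
  (4, "D", "uni1"),
  (5, "E", "uni1")
).toDF("ID", "Letter", "Val")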
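Outside Databricks, the same data set needs a SparkSession and its implicits before toDF will compile. The following is a minimal sketch for spark-shell or a Scala notebook cell, assuming a local master; the app name "GroupingValues" is a hypothetical placeholder, and on a real cluster the master is usually supplied by the launcher rather than hard-coded.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting on a single machine.
val spark = SparkSession.builder()
  .appName("GroupingValues") // hypothetical name, pick your own
  .master("local[*]")
  .getOrCreate()

// Provides toDF on Seq and the $"column" syntax.
import spark.implicits._

val table1 = Seq(
  (1, "A", "uni1"),
  (1, "A", "uni2"),
  (2, "B", "uni1"),
  (2, "B", "uni2"),
  (2, "B", "uni3"),
  (3, "C", "uni1"),
  (4, "D", "uni1"),
  (5, "E", "uni1")
).toDF("ID", "Letter", "Val")

// show() is the plain-Spark counterpart to the Databricks display() helper.
table1.show()
```

display() is specific to Databricks notebooks, so the remaining snippets would swap it for show() in this kind of setup.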
Next, we'll group by ID and Letter and collect the Val entries into an array in a new column called UniSets. Note that collect_set keeps only distinct values and does not guarantee their order. From here, we want each value stored as a key-value pair keyed by the name of the column.
display(
  table1
    .groupBy($"ID", $"Letter")
    .agg(
      collect_set($"Val").as("UniSets")
    )
)
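Because collect_set returns its elements in no particular order, the array may look different from run to run. One way to get stable output is to wrap the aggregate in sort_array. This is a sketch under the same assumptions as a local spark-shell session (the session setup and app name are not part of the original example), using show in place of display.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_set, sort_array}

// Assumption: local session; in Databricks `spark` already exists.
val spark = SparkSession.builder()
  .appName("UniSets") // hypothetical name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val table1 = Seq(
  (1, "A", "uni1"), (1, "A", "uni2"),
  (2, "B", "uni1"), (2, "B", "uni2"), (2, "B", "uni3"),
  (3, "C", "uni1"), (4, "D", "uni1"), (5, "E", "uni1")
).toDF("ID", "Letter", "Val")

val grouped = table1
  .groupBy($"ID", $"Letter")
  // sort_array gives the distinct values a deterministic ascending order.
  .agg(sort_array(collect_set($"Val")).as("UniSets"))
  .orderBy($"ID")

// false disables truncation so the full arrays are printed.
grouped.show(false)
```

With the test data above, ID 1 ends up with [uni1, uni2], ID 2 with [uni1, uni2, uni3], and IDs 3 through 5 with [uni1].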
Finally, we'll wrap Val in a struct before collecting it. Because a struct keeps its field name, each element of the array becomes a key-value pair of the column name and its value:
display(
  table1
    .groupBy($"ID", $"Letter")
    .agg(
      collect_set(struct($"Val")).as("UniSets")
    )
)
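To see the key-value shape outside a Databricks table view, one option is to print the schema and serialize the array with to_json. This is an illustrative sketch, not part of the original example: the session setup, the app name, and the UniSetsJson column are assumptions, and sort_array is added only to make the output order predictable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_set, sort_array, struct, to_json}

// Assumption: local session; in Databricks `spark` already exists.
val spark = SparkSession.builder()
  .appName("UniSetsKeyValue") // hypothetical name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val table1 = Seq(
  (1, "A", "uni1"), (1, "A", "uni2"),
  (2, "B", "uni1"), (2, "B", "uni2"), (2, "B", "uni3"),
  (3, "C", "uni1"), (4, "D", "uni1"), (5, "E", "uni1")
).toDF("ID", "Letter", "Val")

val keyed = table1
  .groupBy($"ID", $"Letter")
  // Each array element is a struct with a single field named Val.
  .agg(sort_array(collect_set(struct($"Val"))).as("UniSets"))
  // Hypothetical extra column: the same array rendered as a JSON string.
  .withColumn("UniSetsJson", to_json($"UniSets"))
  .orderBy($"ID")

keyed.printSchema() // UniSets is an array of struct<Val: string>
keyed.show(false)
```

For ID 1, the UniSetsJson column should read [{"Val":"uni1"},{"Val":"uni2"}], which makes the column name visible as the key of each pair.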