How to split a list to multiple columns in PySpark?
I have:
    key   value
    a     [1,2,3]
    b     [2,3,4]
I want:
    key   value1   value2   value3
    a     1        2        3
    b     2        3        4
It seems that in Scala you can write:

    df.select($"value._1", $"value._2", $"value._3")

but this does not seem possible in Python. So is there a way to do this?
It depends on the type of your "list":
If it is of type ArrayType():
    df = hc.createDataFrame(sc.parallelize([['a', [1,2,3]], ['b', [2,3,4]]]), ["key", "value"])
    df.printSchema()
    df.show()

    root
     |-- key: string (nullable = true)
     |-- value: array (nullable = true)
     |    |-- element: long (containsNull = true)

    +---+-------+
    |key|  value|
    +---+-------+
    |  a|[1,2,3]|
    |  b|[2,3,4]|
    +---+-------+
You can access the values like you would in Python, using []:

    df.select("key", df.value[0], df.value[1], df.value[2]).show()

    +---+--------+--------+--------+
    |key|value[0]|value[1]|value[2]|
    +---+--------+--------+--------+
    |  a|       1|       2|       3|
    |  b|       2|       3|       4|
    +---+--------+--------+--------+
If it is of type StructType() (for example, if you built your DataFrame by reading a JSON file):
    import pyspark.sql.functions as psf

    df2 = df.select("key", psf.struct(
            df.value[0].alias("value1"),
            df.value[1].alias("value2"),
            df.value[2].alias("value3")
        ).alias("value"))
    df2.printSchema()
    df2.show()

    root
     |-- key: string (nullable = true)
     |-- value: struct (nullable = false)
     |    |-- value1: long (nullable = true)
     |    |-- value2: long (nullable = true)
     |    |-- value3: long (nullable = true)

    +---+-------+
    |key|  value|
    +---+-------+
    |  a|[1,2,3]|
    |  b|[2,3,4]|
    +---+-------+
You can directly 'split' the column using *:

    df2.select('key', 'value.*').show()

    +---+------+------+------+
    |key|value1|value2|value3|
    +---+------+------+------+
    |  a|     1|     2|     3|
    |  b|     2|     3|     4|
    +---+------+------+------+
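The same 'value.*' expansion works on a struct column that comes straight from nested JSON. A small sketch, assuming a SparkSession named spark and using inline JSON strings with field names value1..value3 purely for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Nested JSON produces a struct column automatically
    data = ['{"key": "a", "value": {"value1": 1, "value2": 2, "value3": 3}}',
            '{"key": "b", "value": {"value1": 2, "value2": 3, "value3": 4}}']
    df_json = spark.read.json(spark.sparkContext.parallelize(data))

    # Expand the struct into one top-level column per field
    df_json.select("key", "value.*").show()

    +---+------+------+------+
    |key|value1|value2|value3|
    +---+------+------+------+
    |  a|     1|     2|     3|
    |  b|     2|     3|     4|
    +---+------+------+------+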