How to replace nulls with empty string ("") in Apache Spark using Scala
I am working with a huge dataset (332 fields) of around 10M records in Apache Spark with Scala; except for 1 field, the remaining 331 can be null. I want to replace the nulls with blank strings (""). What is the best way to achieve this when there is such a huge number of fields? I want to handle the nulls while importing the dataset, so that it is safe while performing transformations on or exporting the DataFrame. I have created a case class with 332 fields; what is the best way to handle these nulls? I could use Option(field).getOrElse(""), but I guess that's not the best way given the huge number of fields. Thank you!!
We can use a UDF for this. To make a single column null-safe, you can do the following:
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._  // assumes a SparkSession named spark is in scope, as in spark-shell

val df = Seq((1, "hello"), (2, "world"), (3, null)).toDF("id", "name")

// Map null to "" and pass every other value through unchanged
val safeString: String => String = s => if (s == null) "" else s
val udfSafeString = udf(safeString)

val dfSafe = df.select($"id", udfSafeString($"name").alias("name"))
dfSafe.show
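For the three rows above, the output should look something like this (sketched by hand; the null in the third row becomes an empty string):

+---+-----+
| id| name|
+---+-----+
|  1|hello|
|  2|world|
|  3|     |
+---+-----+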
If you have lots of columns, and one of the columns is a key column that should be left alone, you can do this:
// Apply the UDF to every column except the key column "id"
val safeCols = df.columns.map(colName =>
  if (colName == "id") col(colName)
  else udfSafeString(col(colName)).alias(colName)
)
val dfSafe = df.select(safeCols: _*)
dfSafe.show
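As an aside, if you do not need custom replacement logic, Spark's built-in DataFrameNaFunctions can do the same thing without a UDF. A minimal sketch, assuming the columns to fill are string-typed (na.fill with a string value only touches string columns):

// Replace nulls in all string columns with ""
val dfFilled = df.na.fill("")

// Or restrict the fill to specific columns by name
val dfFilledSome = df.na.fill("", Seq("name"))

dfFilled.show

This avoids the per-column UDF calls entirely, which tends to be both simpler and faster for a wide schema like 332 fields.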