python - Combine multiple columns in Pandas excluding NaNs
My sample df has four columns with NaN values. The goal is to concatenate all the rows while excluding the NaN values.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'keywords_0': ["a", np.nan, "c"],
                       'keywords_1': ["d", "e", np.nan],
                       'keywords_2': [np.nan, np.nan, "b"],
                       'keywords_3': ["f", np.nan, "g"]})

      keywords_0 keywords_1 keywords_2 keywords_3
    0          a          d        NaN          f
    1        NaN          e        NaN        NaN
    2          c        NaN          b          g
I want to accomplish the following:
      keywords_0 keywords_1 keywords_2 keywords_3 keywords_all
    0          a          d        NaN          f        a,d,f
    1        NaN          e        NaN        NaN            e
    2          c        NaN          b          g        c,b,g
Pseudo code:

    cols = [df.keywords_0, df.keywords_1, df.keywords_2, df.keywords_3]
    df["keywords_all"] = df["keywords_all"].apply(lambda cols: ",".join(cols), axis=1)
I know I can use ",".join() to get the exact result, but I am unsure how to pass the column names into the function.
You can apply ",".join() on each row by passing axis=1 to the apply method. You first need to drop the NaNs, though. Otherwise you will get a TypeError, because NaN is a float and str.join only accepts strings.
    df.apply(lambda x: ','.join(x.dropna()), axis=1)
    Out:
    0    a,d,f
    1        e
    2    c,b,g
    dtype: object
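To see why the dropna() step matters, here is a minimal sketch on a single row. The names row, error_name and joined are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# One row that mixes strings with a missing value (NaN is a float).
row = pd.Series(["a", np.nan, "f"])

try:
    # str.join accepts only strings, so the NaN entry makes this fail.
    ",".join(row)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Dropping the missing values first leaves only strings to join.
joined = ",".join(row.dropna())

print(error_name)
print(joined)
```

With axis=1, apply passes each row to the lambda as a Series like this one, which is why the dropna() call belongs inside the lambda.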
You can assign the result back to the original DataFrame with:

    df["keywords_all"] = df.apply(lambda x: ','.join(x.dropna()), axis=1)
Or, if you want to specify the columns as you did in the question:

    cols = ['keywords_0', 'keywords_1', 'keywords_2', 'keywords_3']
    df["keywords_all"] = df[cols].apply(lambda x: ','.join(x.dropna()), axis=1)
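If you would rather not type the column names by hand, one possible variation is to build the list from a shared prefix. This is only a sketch: it assumes every column to combine starts with "keywords_", which holds for the sample data but may not hold for yours.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "keywords_0": ["a", np.nan, "c"],
    "keywords_1": ["d", "e", np.nan],
    "keywords_2": [np.nan, np.nan, "b"],
    "keywords_3": ["f", np.nan, "g"],
})

# Assumption: the columns to combine share the "keywords_" prefix.
# Build the list before adding keywords_all, which has the same prefix.
cols = [c for c in df.columns if c.startswith("keywords_")]

# Join the non-missing values of each row with commas.
df["keywords_all"] = df[cols].apply(lambda row: ",".join(row.dropna()), axis=1)

print(df["keywords_all"].tolist())  # ['a,d,f', 'e', 'c,b,g']
```

Because the new column also starts with "keywords_", rebuilding cols after the assignment would pull keywords_all into the join on a second run.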