python - Convert pandas df to multi-dimensional numpy array
I have sparse data in a pandas DataFrame with 25 million+ records. It has to be converted to a multi-dimensional numpy array. I have written it the straightforward way using a for loop, and was wondering if there is a more efficient way.
import numpy as np
import pandas as pd

facts_pd = pd.DataFrame.from_records(
    columns=['name', 'offset', 'code'],
    data=[('john', -928, 'dx_434'), ('steve', -757, 'dx_5859'), ('jack', -800, 'dx_250'),
          ('john', -919, 'dx_401'), ('john', -956, 'dx_5859')])

# Lookup tables: map each distinct name/offset/code to an integer id
name_lu = pd.DataFrame(sorted(facts_pd['name'].unique()), columns=['name'])
name_lu["nameid"] = name_lu.index
offset_lu = pd.DataFrame(sorted(facts_pd['offset'].unique(), reverse=True), columns=['offset'])
offset_lu["offsetid"] = offset_lu.index
code_lu = pd.DataFrame(sorted(facts_pd['code'].unique()), columns=['code'])
code_lu["codeid"] = code_lu.index

# Replace the original columns with their ids
facts_pd = pd.merge(pd.merge(pd.merge(facts_pd, name_lu, how="left", on="name"),
                             offset_lu, how="left", on="offset"),
                    code_lu, how="left", on="code")
facts_pd.drop(["name", "offset", "code"], inplace=True, axis=1)

# Fill the dense 3-D array one record at a time (the slow part)
facts_np = np.zeros((len(name_lu), len(offset_lu), len(code_lu)))
for row in facts_pd.iterrows():
    i, j, k = row[1]
    facts_np[i, j, k] = 1
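The command you are looking for is DataFrame.as_matrix().
It will return a numpy array, and not a matrix, despite what the command says. Here are the man pages for it. Note that as_matrix() was deprecated in pandas 0.23 and removed in 1.0; on current versions use DataFrame.to_numpy() or .values instead.
Here is a Stack Overflow topic on the use of it as well.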
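Converting the frame to a 2-D array only gets the id columns out of pandas; the 3-D array still has to be filled. One possible way to avoid the Python loop is sketched below, reusing the sample data from the question. It is a sketch, not a benchmarked solution: it swaps the merge-based lookup tables for pd.factorize and fills the array with numpy integer-array indexing. The names i, j, k, names, offsets and codes are chosen here for illustration, and it assumes the dense array fits in memory.

```python
import numpy as np
import pandas as pd

facts_pd = pd.DataFrame.from_records(
    columns=["name", "offset", "code"],
    data=[
        ("john", -928, "dx_434"),
        ("steve", -757, "dx_5859"),
        ("jack", -800, "dx_250"),
        ("john", -919, "dx_401"),
        ("john", -956, "dx_5859"),
    ],
)

# factorize(sort=True) returns, for every row, the position of its value
# in the sorted list of unique values, plus that sorted list itself.
i, names = pd.factorize(facts_pd["name"], sort=True)
j, offsets = pd.factorize(facts_pd["offset"], sort=True)
k, codes = pd.factorize(facts_pd["code"], sort=True)

# The question orders offsets in descending order, so flip the ids.
j = len(offsets) - 1 - j
offsets = offsets[::-1]

# Set every (i, j, k) cell in a single vectorized assignment.
facts_np = np.zeros((len(names), len(offsets), len(codes)))
facts_np[i, j, k] = 1
```

With 25 million records the dense array itself may be the bottleneck, so passing a smaller dtype such as np.uint8 to np.zeros is worth considering if only 0/1 flags are stored.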