r - Using dcast.data.table to transform from long to wide only for subset of column values -


i'm working on market basket analysis project , have data contains variable represents id, , contains 1 item in basket. there ~50k unique items across users, have created simple table below illustration.

library(data.table) dt = data.table(id = c("1","1","1","1","2","2","2"),                 product = c("soda","beer","chips","apple","soda","water","juice")) dt # original data looks 

i using dcast function transform each product it's own column binary values, indicating part of order.

dcast.data.table(dt, id ~ product, function(x) 1, fill = 0)  

as mentioned, cannot use method on entire dataset due memory limitations (since create 50k columns each product). so, trying find solution dcast creates "product" columns based on items contained within id ==1 (meaning, columns "juice" , water" excluded). also, working large dataset of 34mm observations, looking efficient solution data.table api can leveraged , trying avoid looping through products. hope question clear. thank you.

this works:

dcast(dt, id + product ~ factor(product, levels=product[id==1]), fun.agg=length)  using 'product' value column. use 'value.var' override    id product soda beer chips apple na 1:  1   apple    0    0     0     1  0 2:  1    beer    0    1     0     0  0 3:  1   chips    0    0     1     0  0 4:  1    soda    1    0     0     0  0 5:  2   juice    0    0     0     0  1 6:  2    soda    1    0     0     0  0 7:  2   water    0    0     0     0  1 

Comments

Popular posts from this blog

ubuntu - PHP script to find files of certain extensions in a directory, returns populated array when run in browser, but empty array when run from terminal -

php - How can i create a user dashboard -

javascript - How to detect toggling of the fullscreen-toolbar in jQuery Mobile? -