r - Using dcast.data.table to transform from long to wide only for subset of column values -
i'm working on market basket analysis project , have data contains variable represents id, , contains 1 item in basket. there ~50k unique items across users, have created simple table below illustration.
library(data.table) dt = data.table(id = c("1","1","1","1","2","2","2"), product = c("soda","beer","chips","apple","soda","water","juice")) dt # original data looks
i using dcast function transform each product it's own column binary values, indicating part of order.
dcast.data.table(dt, id ~ product, function(x) 1, fill = 0)
as mentioned, cannot use method on entire dataset due memory limitations (since create 50k columns each product). so, trying find solution dcast creates "product" columns based on items contained within id ==1 (meaning, columns "juice" , water" excluded). also, working large dataset of 34mm observations, looking efficient solution data.table api can leveraged , trying avoid looping through products. hope question clear. thank you.
this works:
dcast(dt, id + product ~ factor(product, levels=product[id==1]), fun.agg=length) using 'product' value column. use 'value.var' override id product soda beer chips apple na 1: 1 apple 0 0 0 1 0 2: 1 beer 0 1 0 0 0 3: 1 chips 0 0 1 0 0 4: 1 soda 1 0 0 0 0 5: 2 juice 0 0 0 0 1 6: 2 soda 1 0 0 0 0 7: 2 water 0 0 0 0 1
Comments
Post a Comment