R - Find the 'rare' user based on occurrence in two columns
This is what my data frame looks like. It is data from a song portal (like iTunes or Raaga):

    datf <- read.csv(text = "albumid,date_transaction,listened_time_secs,userid,songid
    6263,3/28/2017,59,3747,6263
    3691,4/24/2017,53,2417,3691
    2222,3/24/2017,34,2417,9856
    1924,3/16/2017,19,8514,1924
    6691,1/1/2017,50,2186,6691
    5195,1/1/2017,64,2186,5195
    2179,1/1/2017,37,2186,2179
    6652,1/11/2017,33,1145,6652")

My aim is to pick out the 'rare' users. A 'rare' user is one who visits the portal no more than once in each calendar month.
For example: 2186 is not rare. 2417 is rare because it occurred only once in each of 2 different months, and so are 3747, 1145 and 8514.
I've been trying this:

    duplicateusers <- duplicated(songsdata[, 2:4])
    duplicateusers <- songsdata[duplicateusers, ]
    distinctsongs <- songsdata %>%
      distinct(sessionid, date_transaction, .keep_all = TRUE)
    rareusers <- anti_join(distinctsongs, duplicateusers, by = 'sessionid')

but it doesn't seem to work.
Using library(dplyr), you can do this:
    library(dplyr)

    # songdata is the data frame from the question (called datf there)
    # Make a new month_id variable to use in group_by():
    # drop everything from the first "/" onward, leaving only the month
    songdata$month_id <- gsub("\\/.*", "", songdata$date_transaction)

    # Keep the rows where a user appears exactly once in a given month
    rareusers <- group_by(songdata, userid, month_id) %>%
      filter(n() == 1)

    rareusers
    # A tibble: 5 x 6
    # Groups:   userid, month_id [5]
      albumid date_transaction listened_time_secs userid songid month_id
        <int> <chr>                         <int>  <int>  <int> <chr>
    1    6263 3/28/2017                        59   3747   6263 3
    2    3691 4/24/2017                        53   2417   3691 4
    3    2222 3/24/2017                        34   2417   9856 3
    4    1924 3/16/2017                        19   8514   1924 3
    5    6652 1/11/2017                        33   1145   6652 1
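One thing to watch with the per-row filter: a user who visits once in one month and several times in another would still have the single-visit row kept. If 'rare' is meant strictly (no more than one visit in every month the user appears), a possible variant is to count visits per user and month first, then keep only users whose largest monthly count is 1. This is only a sketch under two assumptions of mine: each row counts as one visit (which matches 2186 being called not rare in the question), and the month key should include the year so that the same month in different years is kept apart. The names `visits_per_month` and `rare_ids` are made up for illustration.

```r
library(dplyr)

# Sample data from the question
datf <- read.csv(text = "albumid,date_transaction,listened_time_secs,userid,songid
6263,3/28/2017,59,3747,6263
3691,4/24/2017,53,2417,3691
2222,3/24/2017,34,2417,9856
1924,3/16/2017,19,8514,1924
6691,1/1/2017,50,2186,6691
5195,1/1/2017,64,2186,5195
2179,1/1/2017,37,2186,2179
6652,1/11/2017,33,1145,6652", stringsAsFactors = FALSE)

# Number of rows (assumed: one row = one visit) per user and calendar month
visits_per_month <- datf %>%
  mutate(month_id = format(as.Date(date_transaction, format = "%m/%d/%Y"),
                           "%Y-%m")) %>%   # year-month key, e.g. "2017-03"
  count(userid, month_id)                  # adds a column n with the row count

# A user is treated as rare only if no month has more than one visit
rare_ids <- visits_per_month %>%
  group_by(userid) %>%
  summarise(max_visits = max(n)) %>%
  filter(max_visits == 1) %>%
  pull(userid)

sort(rare_ids)
# [1] 1145 2417 3747 8514
```

To get the full rows for those users rather than just the ids, something like `filter(datf, userid %in% rare_ids)` should do.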