r - Find the 'RARE' user based on occurrence in two columns


This is what the data frame looks like. It holds data from a song portal (like iTunes or Raaga):

    datf <- read.csv(text = "albumid,date_transaction,listened_time_secs,userid,songid
    6263,3/28/2017,59,3747,6263
    3691,4/24/2017,53,2417,3691
    2222,3/24/2017,34,2417,9856
    1924,3/16/2017,19,8514,1924
    6691,1/1/2017,50,2186,6691
    5195,1/1/2017,64,2186,5195
    2179,1/1/2017,37,2186,2179
    6652,1/11/2017,33,1145,6652")

My aim is to pick out the rare users. A 'rare' user is one who visits the portal no more than once in each calendar month.

For example, 2186 is not rare, because it appears three times in January. 2417 is rare because it occurred only once in each of two different months. 3747, 1145 and 8514 are rare as well.

I've been trying:

    duplicateusers <- duplicated(songsdata[,2:4])
    duplicateusers <- songsdata[duplicateusers,]

    distinctsongs <- songsdata %>%
      distinct(sessionid, date_transaction, .keep_all = TRUE)

    rareusers <- anti_join(distinctsongs, duplicateusers, by='sessionid')

But this doesn't seem to work.

Using library(dplyr), you can do this:

    # songdata is the data frame read in above (called datf in the question)
    # make a new month_id variable for group_by()
    songdata$month_id <- gsub("\\/.*", "", songdata$date_transaction)

    # keep the rows where a user has exactly one record in a given month
    rareusers <- group_by(songdata, userid, month_id) %>%
      filter(n() == 1)

    rareusers
    # A tibble: 5 x 6
    # Groups:   userid, month_id [5]
      albumid date_transaction listened_time_secs userid songid month_id
        <int>            <chr>              <int>  <int>  <int>    <chr>
    1    6263        3/28/2017                 59   3747   6263        3
    2    3691        4/24/2017                 53   2417   3691        4
    3    2222        3/24/2017                 34   2417   9856        3
    4    1924        3/16/2017                 19   8514   1924        3
    5    6652        1/11/2017                 33   1145   6652        1
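Note that filtering on n() == 1 within each (userid, month_id) group keeps single-visit months, so a user with one busy month and one quiet month would still show up for the quiet month. If the goal is a list of users who never exceed one record in any calendar month, a stricter variant might look like the sketch below. It assumes each row counts as one visit (as in the 2186 example) and that dates are in month/day/year format. The names year_month, visits, max_visits and rare_ids are made up for illustration. The key also includes the year, so that March 2017 and March 2018 are not merged.

```r
library(dplyr)

# Sample data from the question, one record per line
songdata <- read.csv(text = "albumid,date_transaction,listened_time_secs,userid,songid
6263,3/28/2017,59,3747,6263
3691,4/24/2017,53,2417,3691
2222,3/24/2017,34,2417,9856
1924,3/16/2017,19,8514,1924
6691,1/1/2017,50,2186,6691
5195,1/1/2017,64,2186,5195
2179,1/1/2017,37,2186,2179
6652,1/11/2017,33,1145,6652", stringsAsFactors = FALSE)

# Build a year-month key such as "2017-03" (assumes m/d/Y dates)
songdata$year_month <- format(
  as.Date(songdata$date_transaction, format = "%m/%d/%Y"),
  "%Y-%m"
)

rare_ids <- songdata %>%
  count(userid, year_month, name = "visits") %>%   # records per user per month
  group_by(userid) %>%
  summarise(max_visits = max(visits)) %>%          # busiest month for each user
  filter(max_visits <= 1) %>%                      # never more than one record
  pull(userid)

sort(rare_ids)
```

On the sample data this drops 2186 (three records in January 2017) and keeps the other four users.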
