R - Comparing between groups in a grouped data frame


I am trying to perform a comparison between items in subsequent groups in a data frame. I guess this is pretty easy when you know what you are doing...

My data set can be represented as follows:

    set.seed(1)
    data <- data.frame(
      date = c(rep('2015-02-01', 15), rep('2015-02-02', 16), rep('2015-02-03', 15)),
      id = as.character(c(1005 + sample.int(10, 15, replace = TRUE),
                          1005 + sample.int(10, 16, replace = TRUE),
                          1005 + sample.int(10, 15, replace = TRUE)))
    )

which yields a data frame that looks like:

    date        id
    1/02/2015   1008
    1/02/2015   1009
    1/02/2015   1011
    1/02/2015   1015
    1/02/2015   1008
    1/02/2015   1014
    1/02/2015   1015
    1/02/2015   1012
    1/02/2015   1012
    1/02/2015   1006
    1/02/2015   1008
    1/02/2015   1007
    1/02/2015   1012
    1/02/2015   1009
    1/02/2015   1013
    2/02/2015   1010
    2/02/2015   1013
    2/02/2015   1015
    2/02/2015   1009
    2/02/2015   1013
    2/02/2015   1015
    2/02/2015   1008
    2/02/2015   1012
    2/02/2015   1007
    2/02/2015   1008
    2/02/2015   1009
    2/02/2015   1006
    2/02/2015   1009
    2/02/2015   1014
    2/02/2015   1009
    2/02/2015   1010
    3/02/2015   1011
    3/02/2015   1010
    3/02/2015   1007
    3/02/2015   1014
    3/02/2015   1012
    3/02/2015   1013
    3/02/2015   1007
    3/02/2015   1013
    3/02/2015   1010

Then I want to group the data by date (group_by) and filter out duplicates (distinct) before comparing between groups. What I want to do is determine, day by day, which new ids are added and which ids leave. So day 1 and day 2 would be compared to determine the ids in day 2 that were not in day 1 and the ids in day 1 that are not present in day 2, then the same comparisons between day 2 and day 3, etc.
The comparison can be done using anti_join (dplyr), but I don't know how to reference individual groups in the dataset.

My attempt (or one of my attempts) looks like:

    library(dplyr)

    data %>%
      group_by(date) %>%
      distinct(id) %>%
      do(lost = anti_join(., lag(.), by = "id"))

But of course this does not work; I just get:

    Error in anti_join_impl(x, y, by$x, by$y) :
      Can't join on 'id' x 'id' because of incompatible types (factor / logical)

Is what I am attempting even possible, or should I be looking at writing a clunky function to do it?

Just add the argument stringsAsFactors = FALSE to the data.frame() call. This will make your code run, although I am not sure whether the output is the one you are looking for. To view the whole result, pipe it to data.frame() and see whether it is what you want. Hope this helps.
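To see what that argument changes, here is a quick check of the column type. The two small data frames (`as_factor`, `as_string`) are made up for illustration. In R versions before 4.0.0 the default was stringsAsFactors = TRUE, which is why the id column in the question ends up as a factor.

```r
# Hypothetical one-column data frames, built only to show the column type
as_factor <- data.frame(id = as.character(1006:1008), stringsAsFactors = TRUE)
as_string <- data.frame(id = as.character(1006:1008), stringsAsFactors = FALSE)

class(as_factor$id)  # "factor"
class(as_string$id)  # "character"
```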

    library(dplyr)

    set.seed(1)
    data <- data.frame(
      date = c(rep('2015-02-01', 15), rep('2015-02-02', 16), rep('2015-02-03', 15)),
      id = as.character(c(1005 + sample.int(10, 15, replace = TRUE),
                          1005 + sample.int(10, 16, replace = TRUE),
                          1005 + sample.int(10, 15, replace = TRUE))),
      stringsAsFactors = FALSE  # keep id as character instead of factor
    )

    data %>%
      group_by(date) %>%
      distinct(id) %>%
      do(lost = anti_join(., lag(.), by = "id")) %>%
      data.frame()
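If the result above is not the day-to-day comparison you were after, here is a minimal sketch of one other way to do it. It swaps in base R's split() and setdiff() in place of group_by() and anti_join(), and it runs on a small toy data frame (`visits`) made up for illustration rather than on the sampled data from the question. The names `ids_by_day` and `changes` are also just illustrative.

```r
# Toy data, made up for illustration (not the sampled data from the question)
visits <- data.frame(
  date = c("2015-02-01", "2015-02-01", "2015-02-01",
           "2015-02-02", "2015-02-02",
           "2015-02-03", "2015-02-03"),
  id = c("1006", "1007", "1007",
         "1007", "1008",
         "1008", "1009"),
  stringsAsFactors = FALSE
)

# Unique ids per date; split() orders the groups by the sorted date strings,
# which matches chronological order for dates written as yyyy-mm-dd
ids_by_day <- lapply(split(visits$id, visits$date), unique)
days <- names(ids_by_day)

# Compare each day with the day before it
changes <- lapply(seq_along(days)[-1], function(i) {
  previous <- ids_by_day[[i - 1]]
  current <- ids_by_day[[i]]
  list(
    from = days[i - 1],
    to = days[i],
    added = setdiff(current, previous),  # ids only in the later day
    lost = setdiff(previous, current)    # ids only in the earlier day
  )
})

str(changes)
```

With this toy data, the first comparison reports 1008 as added and 1006 as lost, and the second reports 1009 as added and 1007 as lost. The same loop should work with anti_join() on per-day data frames if you prefer to stay in dplyr.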
