R: Monthly bins based on source date ranges -


binning examples r i've found seem assume source data has single date (or date/time). i've discrete start , stop dates user accounts ranging on years 2002-2017. want output counts of number of accounts active, using monthly bins, during overall range 2002-17.

the data in dd/mm/yyyy strings,though change format if needed; rows sorted ascending start date. example

start       stop 04/09/2006  23/01/2014 ... 06/07/2008  11/03/2017 ... 30/09/2010  22/04/2016 

the resulting counts be, example:

mar 2006    0 jan 2007    1 mar 2011    3 jun 2015    2 sep 2016    1 ...etc. 

the aim of generating counts plot total active accounts on time. i'm open getting daily counts , aggregating month if easier. i'm stuck @ start though: bin source date range , not single date.

convert columns "yearmon" class , use mapply generate year/months ym covered. count how many of each year/month occur , merge data frame having year/months between jan 2002 , dec 2017 giving m_na , replace nas 0 giving m.

library(zoo)  df2 <- transform(df, start = as.yearmon(start), stop = as.yearmon(stop))  ym <- unlist(mapply(seq, df2$start, df2$stop, moreargs = list(by = 1/12))) ag <- aggregate(ym^0, list(ym = as.yearmon(ym)), sum)  m_na <- merge(ag, data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = true) m <- transform(m_na, x = replace(x, is.na(x), 0))   plot(x ~ ym, m, type = "h", xlab = "", ylab = "count", xaxt = "n") axis(1, 2002:2017) 

(continued after image)

screenshot

magrittr

this expressed magrittr pipeline this:

library(magrittr) library(zoo)  m <- df %>%    transform(start = as.yearmon(start), stop = as.yearmon(stop)) %$%    unlist(mapply(seq, start, stop, moreargs = list(by = 1/12))) %>%    { aggregate(.^0, list(ym = as.yearmon(.)), sum) } %>%    merge(data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = true) %>%    transform(x = replace(x, is.na(x), 0)) 

note: assume following input date class columns:

lines <- " start       stop 04/09/2006  23/01/2014 06/07/2008  11/03/2017 30/09/2010  22/04/2016" df <- read.table(text = lines, header = true) fmt <- "%d/%m/%y" df <- transform(df, start = as.date(start, fmt), stop = as.date(stop, fmt)) 

Comments