R: Monthly bins based on source date ranges -
binning examples r i've found seem assume source data has single date (or date/time). i've discrete start , stop dates user accounts ranging on years 2002-2017. want output counts of number of accounts active, using monthly bins, during overall range 2002-17.
the data in dd/mm/yyyy strings,though change format if needed; rows sorted ascending start date. example
start stop 04/09/2006 23/01/2014 ... 06/07/2008 11/03/2017 ... 30/09/2010 22/04/2016 the resulting counts be, example:
mar 2006 0 jan 2007 1 mar 2011 3 jun 2015 2 sep 2016 1 ...etc. the aim of generating counts plot total active accounts on time. i'm open getting daily counts , aggregating month if easier. i'm stuck @ start though: bin source date range , not single date.
convert columns "yearmon" class , use mapply generate year/months ym covered. count how many of each year/month occur , merge data frame having year/months between jan 2002 , dec 2017 giving m_na , replace nas 0 giving m.
library(zoo) df2 <- transform(df, start = as.yearmon(start), stop = as.yearmon(stop)) ym <- unlist(mapply(seq, df2$start, df2$stop, moreargs = list(by = 1/12))) ag <- aggregate(ym^0, list(ym = as.yearmon(ym)), sum) m_na <- merge(ag, data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = true) m <- transform(m_na, x = replace(x, is.na(x), 0)) plot(x ~ ym, m, type = "h", xlab = "", ylab = "count", xaxt = "n") axis(1, 2002:2017) (continued after image)
magrittr
this expressed magrittr pipeline this:
library(magrittr) library(zoo) m <- df %>% transform(start = as.yearmon(start), stop = as.yearmon(stop)) %$% unlist(mapply(seq, start, stop, moreargs = list(by = 1/12))) %>% { aggregate(.^0, list(ym = as.yearmon(.)), sum) } %>% merge(data.frame(ym = as.yearmon(seq(2002, 2017+11/12, 1/12))), all.y = true) %>% transform(x = replace(x, is.na(x), 0)) note: assume following input date class columns:
lines <- " start stop 04/09/2006 23/01/2014 06/07/2008 11/03/2017 30/09/2010 22/04/2016" df <- read.table(text = lines, header = true) fmt <- "%d/%m/%y" df <- transform(df, start = as.date(start, fmt), stop = as.date(stop, fmt)) 
Comments
Post a Comment