R - Head and tail by group
I'm re-posting the question after the confusion caused on my part; apologies for that. I believe the example is correct now.
Sample data:

```r
df <- data.frame(group = rep(c("a","b","c"), c(8,10,8)),
                 size = c(rep(1000,5), rep(0,3), rep(2000,7), rep(0,3), rep(5000,5), rep(0,3)),
                 out = c(rep(0,5), rnorm(3,5,1), rep(0,7), rnorm(3,5,1), rep(0,5), rnorm(3,5,1)),
                 g1 = rbinom(26,1,.5),
                 g2 = rbinom(26,1,.5),
                 g3 = rbinom(26,1,.5))
```

```
   group size      out g1 g2 g3
1      a 1000 0.000000  0  0  1
2      a 1000 0.000000  0  1  0
3      a 1000 0.000000  0  1  0
4      a 1000 0.000000  0  1  0
5      a 1000 0.000000  0  0  1
6      a    0 3.997360  1  1  0
7      a    0 4.992823  1  0  1
8      a    0 5.644386  1  1  1
9      b 2000 0.000000  1  1  0
10     b 2000 0.000000  0  1  1
11     b 2000 0.000000  0  0  0
12     b 2000 0.000000  1  0  1
13     b 2000 0.000000  1  1  0
14     b 2000 0.000000  1  0  1
15     b 2000 0.000000  1  1  1
16     b    0 5.247895  1  0  0
17     b    0 5.248148  0  0  1
18     b    0 5.026844  1  1  1
19     c 5000 0.000000  0  0  0
20     c 5000 0.000000  0  1  0
21     c 5000 0.000000  0  1  1
22     c 5000 0.000000  0  0  0
23     c 5000 0.000000  1  0  1
24     c    0 6.532156  1  1  0
25     c    0 5.457338  0  0  0
26     c    0 4.675683  1  1  1
```
As an intermediate step, I obtain this:

```
   group size      out g1 g2 g3
1      a 1000 0.000000  1  1  1
6      a    0 7.276473  0  0  1
9      b 2000 0.000000  0  0  0
16     b    0 5.630425  1  0  0
19     c 5000 0.000000  0  0  0
24     c    0 5.449923  1  0  1
```
and the final output is:

```
   group size      out g1 g2 g3
6      a    0 7.276473  1  1  1
16     b    0 5.630425  0  0  0
24     c    0 5.449923  0  0  0
```
Basically, the values of g1-g3 in the first row of each group replace the values of g1-g3 in the second row of that group. I'm looking for a base R solution.
The solution should:

1) select the first row per group where out == 0 and size > 0 (row 1), then select the first row per group where out != 0 and size == 0 (row 2);
2) take the dummies g1-g3 from the first row and use them to replace g1-g3 in the second row of each group;
3) keep only the last (second) row per group.
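The three steps can also be sketched without splitting the data frame by group. This is a minimal, vectorized base R sketch, assuming the columns are named `group`, `size`, `out` and `g1`-`g3` as in the sample data; the function name `head_tail_by_group` and the small data frame are hypothetical, chosen only so the result is deterministic.

```r
# Hypothetical helper: vectorized version of steps 1-3 (no split by group).
head_tail_by_group <- function(df, dummies = c("g1", "g2", "g3")) {
  is_first  <- df$size > 0 & df$out == 0   # step 1: "row 1" candidates
  is_second <- df$size == 0 & df$out != 0  # step 1: "row 2" candidates

  # Index of the first qualifying row in each group
  i1 <- which(is_first)[!duplicated(df$group[is_first])]
  i2 <- which(is_second)[!duplicated(df$group[is_second])]

  # Pair each group's "row 2" with the same group's "row 1"
  i1 <- i1[match(df$group[i2], df$group[i1])]

  res <- df[i2, ]                      # step 3: keep only "row 2"
  res[, dummies] <- df[i1, dummies]    # step 2: copy g1-g3 from "row 1"
  res
}

# Small hypothetical data frame (no random numbers, so output is fixed)
df <- data.frame(
  group = rep(c("a", "b"), c(4, 4)),
  size  = c(1000, 1000, 0, 0, 2000, 2000, 0, 0),
  out   = c(0, 0, 4.5, 5.1, 0, 0, 5.2, 4.9),
  g1    = c(1, 0, 0, 1, 0, 1, 1, 1),
  g2    = c(1, 0, 0, 0, 0, 1, 1, 0),
  g3    = c(1, 1, 1, 0, 0, 0, 0, 1)
)

res <- head_tail_by_group(df)
print(res)
```

`duplicated()` keeps the first qualifying row of each group and `match()` pairs each group's "row 2" with its "row 1". A group with no "row 2" is dropped and a group with no "row 1" gets `NA` dummies, so this assumes every group has both kinds of row.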
Here is a possible (partial) solution:

```r
sol <- with(df, by(df, group, function(x)
  rbind(head(x[(x$size > 0 & x$out == 0), ], 1),
        head(x[x$size == 0 & x$out != 0, ], 1))))
data.frame(do.call(rbind, sol), check.names = FALSE)
```
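The partial `by()` approach can be carried through steps 2 and 3 inside the same call. This is a sketch, using a small hypothetical data frame instead of the random sample data; the `dummies` vector naming the columns to copy is an assumption.

```r
# Small hypothetical data frame (deterministic, for illustration only)
df <- data.frame(
  group = rep(c("a", "b"), c(4, 4)),
  size  = c(1000, 1000, 0, 0, 2000, 2000, 0, 0),
  out   = c(0, 0, 4.5, 5.1, 0, 0, 5.2, 4.9),
  g1    = c(1, 0, 0, 1, 0, 1, 1, 1),
  g2    = c(1, 0, 0, 0, 0, 1, 1, 0),
  g3    = c(1, 1, 1, 0, 0, 0, 0, 1)
)

dummies <- c("g1", "g2", "g3")  # assumed names of the dummy columns

sol <- by(df, df$group, function(x) {
  row1 <- head(x[x$size > 0 & x$out == 0, ], 1)   # step 1: "row 1"
  row2 <- head(x[x$size == 0 & x$out != 0, ], 1)  # step 1: "row 2"
  row2[, dummies] <- row1[, dummies]              # step 2: copy g1-g3
  row2                                            # step 3: keep "row 2" only
})

res <- do.call(rbind, sol)
print(res)
```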
In order to make a reproducible example, when you use RNGs or `sample`, you should call `set.seed()`.
```r
set.seed(5175)
df <- data.frame(group = rep(c("a","b","c"), c(8,10,8)),
                 size = c(rep(1000,5), rep(0,3), rep(2000,7), rep(0,3), rep(5000,5), rep(0,3)),
                 out = c(rep(0,5), rnorm(3,5,1), rep(0,7), rnorm(3,5,1), rep(0,5), rnorm(3,5,1)),
                 g1 = rbinom(26,1,.5),
                 g2 = rbinom(26,1,.5),
                 g3 = rbinom(26,1,.5))

fun <- function(x){
    # first row of the group with size > 0 and out == 0 ("row 1")
    i <- min(which(x$size > 0 & x$out == 0))
    tmp1 <- x[i, ]
    # first row of the group with size == 0 and out != 0 ("row 2")
    i <- min(which(x$size == 0 & x$out != 0))
    tmp2 <- x[i, ]
    # copy the dummies g1-g3 (columns 4:6) from row 1 into row 2
    tmp2[, 4:6] <- tmp1[, 4:6]
    tmp2
}

res <- do.call(rbind, lapply(split(df, df$group), fun))
res
```