R - Filter, Subset, or Select Repeated IDs for Different Time Entries in a Dataframe
I have time-series data where some IDs have values at certain time indices but not at others. I need a way to filter for the observations that occur in both time indices.
Here's a reproducible example that illustrates the problem. In the final graph I want only the observations of `type == a`, i.e. those that occur in both time indices.
```r
library(ggplot2)
library(dplyr)  # for the %>% pipe

set.seed(1005)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10),
  type   = c('a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'b')
)

mydat %>%
  ggplot(aes(x = year, y = result)) +
  geom_point(aes(color = type)) +
  geom_line(aes(group = id))
```
Note: the `type` column does not exist in the original dataset. I added it to the toy dataset only to show, in blue, the points I want to get rid of. Solutions should either be independent of the `type` column or show how to generate the `type` column without hard-coding it.
You can find the IDs that repeat across the 2 time entries (let's call them "repeated" IDs) and mark them as `type == a`.
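Below is a minimal base-R sketch of that idea. It assumes the ID and time columns are called `id` and `year`, as in the toy data. It counts the distinct years per ID with `ave()` and labels an ID `"a"` when it appears in every distinct year, so the `type` column no longer has to be typed in by hand. `label_repeated` is a hypothetical helper name and is not part of any package.

```r
# Toy data as in the answer's "Data" section (set.seed(123))
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "a" if an id occurs in every distinct year, "b" otherwise
label_repeated <- function(df, id_col = "id", time_col = "year") {
  n_years <- length(unique(df[[time_col]]))
  # ave() returns, for every row, the number of distinct years of that row's id
  years_per_id <- ave(df[[time_col]], df[[id_col]],
                      FUN = function(y) length(unique(y)))
  factor(ifelse(years_per_id == n_years, "a", "b"), levels = c("a", "b"))
}

mydat$type <- label_repeated(mydat)
mydat
```

Because `n_years` is computed from the data, the same helper also works when there are more than two time indices.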
Using `reshape`:
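You can reshape the data to wide format. An `NA` in the wide table means that the ID doesn't have data in both time entries, so those IDs get `type == b` and their `NA` rows are removed after going back to long format. See below: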
```r
mydat_a <- reshape(mydat, idvar = "id", timevar = "year", direction = "wide")
mydat_a
# the IDs with NA get type == b
#     id result.2000 result.2001
# 1   a1    14.39524   37.150650
# 2   a2    17.69823   24.609162
# 3   a3    35.58708          NA
# 4   a4    20.70508          NA
# 5   a5    21.29288    7.349388
# 9  a12          NA   13.131471
# 10 a13          NA   15.543380

# add the types
mydat_a$type <- "a"
mydat_a[which(is.na(mydat_a), arr.ind = TRUE)[, 1], ]$type <- "b"

# back to long format
mydat_a <- reshape(mydat_a, direction = "long",
                   varying = list(names(mydat_a)[2:3]),
                   v.names = "result", idvar = "id",
                   timevar = "year", times = 2000:2001)

# remove the NA rows
mydat_a <- na.omit(mydat_a)
```
You can find the final plotting solution below (use `mydat_a` instead of `mydat` in the `ggplot` syntax).
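If you only need the rows themselves, the wide table can serve just to collect the IDs. `complete.cases()` is `TRUE` for rows with a value in every `result.*` column, i.e. IDs observed in both years. The sketch below makes the same assumptions about the columns (`id`, `year`, `result`). The names `wide`, `keep_ids` and `mydat_both` are illustrative only.

```r
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10),
  stringsAsFactors = FALSE
)

# One row per id, one result column per year; missing years become NA
wide <- reshape(mydat, idvar = "id", timevar = "year", direction = "wide")

# Ids without any NA in the wide table were observed in every year
keep_ids <- wide$id[complete.cases(wide)]

# Restrict the original long data to those ids (no second reshape needed)
mydat_both <- mydat[mydat$id %in% keep_ids, ]
mydat_both
```

This skips the round trip back to long format, at the cost of not keeping the `"b"` rows in the result.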
Or, without reshaping:
```r
# make all of them "b", then change the repeated ones to "a"
mydat$type <- "b"
mydat[mydat$id %in% mydat[mydat$year == 2000, ]$id &
      mydat$id %in% mydat[mydat$year == 2001, ]$id, ]$type <- "a"
mydat$type <- as.factor(mydat$type)
mydat
#     id year   result type
# 1   a1 2000 17.67485    a
# 2   a2 2000 15.16812    a
# 3   a3 2000 27.18261    b
# 4   a4 2000 14.18510    b
# 5   a5 2000 32.91164    a
# 6   a1 2001 13.30867    a
# 7   a2 2001 20.15258    a
# 8   a5 2001 31.21311    a
# 9  a12 2001 32.62673    b
# 10 a13 2001  6.85111    b
```
This reproduces the types that were entered manually in the question.
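The `%in%` condition above names the two years explicitly. If there can be more than two time indices, one option is to keep an ID only when its years cover every distinct value of `year`. The sketch below assumes that every time index of interest occurs at least once somewhere in the data. `ids_in_all_years` is a hypothetical helper name.

```r
# Hypothetical helper: ids observed in every distinct value of time_col
ids_in_all_years <- function(df, id_col = "id", time_col = "year") {
  all_years <- unique(df[[time_col]])
  # One logical per id: do this id's years include all years in the data?
  covered <- tapply(df[[time_col]], df[[id_col]],
                    function(y) all(all_years %in% y))
  names(covered)[covered]
}

set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10),
  stringsAsFactors = FALSE
)

keep <- ids_in_all_years(mydat)
mydat$type <- factor(ifelse(mydat$id %in% keep, "a", "b"), levels = c("a", "b"))
mydat
```

For example, with `year` values 1, 2 and 3, an ID seen only in years 1 and 3 is not returned and would be labelled `"b"`.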
Then you can use @d.b's solution:
```r
ggplot(data = split(mydat, mydat$type)$a, aes(x = year, y = result)) +
  geom_point(aes(color = type)) +
  geom_line(aes(group = id))
```
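`split(mydat, mydat$type)` returns a named list with one data frame per level of `type`, and `$a` picks the element holding the repeated IDs. The small base-R sketch below shows that this element contains the same rows as a plain logical subset. It rebuilds the answer's data and derives `type` with `intersect()`; that derivation is an illustrative choice, not part of the original answer.

```r
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10),
  stringsAsFactors = FALSE
)

# IDs present in both 2000 and 2001 become "a", all others "b"
both <- intersect(mydat$id[mydat$year == 2000], mydat$id[mydat$year == 2001])
mydat$type <- factor(ifelse(mydat$id %in% both, "a", "b"))

# One data frame per level of type
parts <- split(mydat, mydat$type)
names(parts)  # "a" "b"

# The "a" element should match an ordinary logical row subset
all.equal(parts$a, mydat[mydat$type == "a", ])
```

So `data = mydat[mydat$type == "a", ]` (or `subset(mydat, type == "a")`) could be passed to `ggplot()` in place of `split(...)$a`.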
Data:

```r
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10)
)
```