R - Filter, Subset, or Select Repeated IDs for Different Time Entries in a Dataframe



I have time-series data where some IDs have values at both time indices and others do not. I need a way to filter to the observations that occur in both time indices.

Here's a reproducible example that illustrates the problem. In the final graph I want only the observations of type == a, i.e. those that occur in both time indices.

    library(ggplot2)
    library(dplyr)  # for %>%

    set.seed(1005)
    mydat <- data.frame(
      id = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
      year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
      result = rnorm(10, mean = 20, sd = 10),
      type = c('a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'b'))

    mydat %>%
      ggplot(aes(x = year, y = result)) +
      geom_point(aes(color = type)) +
      geom_line(aes(group = id))


Note: I should mention that the column type does not exist in the original dataset. I created a type column in the toy dataset to show, in blue, the points I want to get rid of.

solutions should independent of type column, or alternatively, show how generate type column without hard-coding it.

You can find the (okay, let's call them) repeated IDs, i.e. the IDs that appear in both time entries, and mark them as type == a.
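A minimal base-R sketch of that idea, assuming "repeated" means "observed in every year present in the data": ave() counts, for each row, the distinct years in which that row's id occurs, and the count is compared with the total number of distinct years. Nothing is hard-coded, so it also works with more than two years and needs no pre-existing type column. The names years_per_id, n_years and mydat_repeated are only illustrative.

```r
# Toy data without a type column (same as the "Data" section of this answer)
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10)
)

# For every row, the number of distinct years in which that row's id occurs
years_per_id <- ave(mydat$year, mydat$id, FUN = function(y) length(unique(y)))

# An id is "repeated" if it occurs in every distinct year present in the data
n_years <- length(unique(mydat$year))
mydat$type <- ifelse(years_per_id == n_years, "a", "b")

# Keep only the repeated ids, e.g. for plotting
mydat_repeated <- mydat[mydat$type == "a", ]
```

With this toy data, a1, a2 and a5 come out as type "a" and the other ids as "b".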

Using reshape:

You can reshape the data into wide format and then flag the rows that contain NA; an NA means that ID doesn't have data for both time entries. See below:

    mydat_a <- reshape(mydat, idvar = "id", timevar = "year", direction = "wide")
    mydat_a
    # the rows with NA get type == "b"
    #     id result.2000 result.2001
    # 1   a1    14.39524   37.150650
    # 2   a2    17.69823   24.609162
    # 3   a3    35.58708          NA
    # 4   a4    20.70508          NA
    # 5   a5    21.29288    7.349388
    # 9  a12          NA   13.131471
    # 10 a13          NA   15.543380

    # add the types again
    mydat_a$type <- "a"
    mydat_a[which(is.na(mydat_a), arr.ind = TRUE)[, 1], ]$type <- "b"

    # go back to long format
    mydat_a <- reshape(mydat_a, direction = "long",
                       varying = list(names(mydat_a)[2:3]), v.names = "result",
                       idvar = "id", timevar = "year", times = 2000:2001)

    # remove the NA rows
    mydat_a <- na.omit(mydat_a)

You can then do the final plot with the solution below (use mydat_a instead of mydat in the ggplot syntax).
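If you only need the filtered rows rather than a type column, a shorter variant of the same reshape idea is sketched here with the toy data from the Data section: keep the ids whose wide row has no NA (complete.cases()) and subset the original long data, which avoids reshaping back. The names wide, keep_ids and mydat_both are hypothetical.

```r
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10)
)

# Wide format: one row per id, one result.<year> column per year
wide <- reshape(mydat, idvar = "id", timevar = "year", direction = "wide")

# A wide row without any NA means the id was observed in every year
keep_ids <- wide$id[complete.cases(wide)]

# Subset the original long data directly; no need to reshape back
mydat_both <- mydat[mydat$id %in% keep_ids, ]
```

Since mydat_both has no type column, drop aes(color = type) (or add a type column first) when plotting it.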

Or, without reshaping:

    mydat$type <- "b"  # make all of them "b", later change the repeated ones to "a"
    mydat[mydat$id %in% mydat[mydat$year == 2000, ]$id &
          mydat$id %in% mydat[mydat$year == 2001, ]$id, ]$type <- "a"
    mydat$type <- as.factor(mydat$type)

    mydat
    #     id year   result type
    # 1   a1 2000 17.67485    a
    # 2   a2 2000 15.16812    a
    # 3   a3 2000 27.18261    b
    # 4   a4 2000 14.18510    b
    # 5   a5 2000 32.91164    a
    # 6   a1 2001 13.30867    a
    # 7   a2 2001 20.15258    a
    # 8   a5 2001 31.21311    a
    # 9  a12 2001 32.62673    b
    # 10 a13 2001  6.85111    b

It gives the same types that were entered manually in the question.
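The comparison above hard-codes the years 2000 and 2001. A sketch that works for any number of years, assuming "repeated" means "present in every year of the data", splits the ids by year and intersects all of those vectors with Reduce(intersect, ...). The names ids_by_year and common_ids are illustrative only.

```r
set.seed(123)
mydat <- data.frame(
  id     = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
  year   = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
  result = rnorm(10, mean = 20, sd = 10)
)

# One vector of ids per year (the list names are the years)
ids_by_year <- split(mydat$id, mydat$year)

# ids present in every year, however many years the data contain
common_ids <- Reduce(intersect, ids_by_year)

# Same "a"/"b" factor as the hard-coded version
mydat$type <- as.factor(ifelse(mydat$id %in% common_ids, "a", "b"))
```

With this type column, @d.b's split() plot works unchanged.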

Then you can use @d.b's solution:

    ggplot(data = split(mydat, mydat$type)$a, aes(x = year, y = result)) +
      geom_point(aes(color = type)) +
      geom_line(aes(group = id))


Data:

    set.seed(123)
    mydat <- data.frame(id = c('a1', 'a2', 'a3', 'a4', 'a5', 'a1', 'a2', 'a5', 'a12', 'a13'),
                        year = c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001),
                        result = rnorm(10, mean = 20, sd = 10))
