r - Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels
I fitted a logistic regression:
ew <- glm(everwrk~age_p + r_maritl, data = nh11, family = "binomial")
I also want to predict everwrk for each level of r_maritl. r_maritl has the following levels:
levels(nh11$r_maritl)
"0 under 14 years"
"1 married - spouse in household"
"2 married - spouse not in household"
"3 married - spouse in household unknown"
"4 widowed"
"5 divorced"
"6 separated"
"7 never married"
"8 living partner"
"9 unknown marital status"
So I did:
predew <- with(nh11, expand.grid(
  r_maritl = c("0 under 14 years", "1 married - spouse in household",
               "2 married - spouse not in household",
               "3 married - spouse in household unknown", "4 widowed",
               "5 divorced", "6 separated", "7 never married",
               "8 living partner", "9 unknown marital status"),
  age_p = mean(age_p, na.rm = TRUE)))
cbind(predew, predict(ew, type = "response", se.fit = TRUE, interval = "confidence", newdata = predew))
The problem is that I get the following response:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 0 under 14 years, 3 married - spouse in household unknown
Sample data:
str(nh11$age_p)
num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...
str(nh11$everwrk)
Factor w/ 2 levels "2 no","1 yes": NA NA 2 NA NA NA NA NA 2 2 ...
str(nh11$r_maritl)
Factor w/ 10 levels "0 under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...
tl;dr It looks like you have some levels in your factor that are not represented in your data, and these get dropped from the factors used in the model. In hindsight this isn't terribly surprising, since you won't be able to predict responses for these levels. That said, it's mildly surprising that R doesn't do something nice like generate NA values automatically. You can solve the problem by using levels(droplevels(nh11$r_maritl)) in constructing the prediction frame, or equivalently ew$xlevels$r_maritl.
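To check which levels were dropped, one option is to compare the levels the factor declares with the levels stored in the fitted model. This is only a sketch on made-up data: the data frame `d`, its columns `y`, `x`, `g`, and the object `fit` are hypothetical and stand in for `nh11` and `ew`.

```r
# Hypothetical toy data: the factor g declares level "c", but no row uses it
set.seed(1)
d <- data.frame(
  y = rbinom(60, size = 1, prob = 0.5),
  x = runif(60),
  g = factor(sample(c("a", "b"), 60, replace = TRUE),
             levels = c("a", "b", "c"))
)
fit <- glm(y ~ g + x, data = d, family = binomial)

declared <- levels(d$g)     # every level the factor knows about
used     <- fit$xlevels$g   # levels that survived model fitting
dropped  <- setdiff(declared, used)
dropped                     # levels that predict() will reject as "new"

# Both ways of getting the usable levels should agree
identical(levels(droplevels(d$g)), used)
```

Any level listed in `dropped` is one that `predict()` will complain about if it appears in `newdata`.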
A reproducible example:
maritl_levels <- c("0 under 14 years", "1 married - spouse in household",
                   "2 married - spouse not in household",
                   "3 married - spouse in household unknown", "4 widowed",
                   "5 divorced", "6 separated", "7 never married",
                   "8 living partner", "9 unknown marital status")
set.seed(101)
# factor() is needed explicitly in R >= 4.0, where data.frame() no longer converts strings to factors
nh11 <- data.frame(everwrk = rbinom(1000, size = 1, prob = 0.5),
                   age_p = runif(1000, 20, 50),
                   r_maritl = factor(sample(maritl_levels, size = 1000, replace = TRUE),
                                     levels = maritl_levels))
Let's make a missing level:
nh11 <- subset(nh11,as.numeric(nh11$r_maritl) != 3)
Fit the model:
ew <- glm(everwrk ~ r_maritl + age_p, data = nh11, family = binomial)
predew <- with(nh11, expand.grid(r_maritl = levels(r_maritl), age_p = mean(age_p, na.rm = TRUE)))
predict(ew, newdata = predew)
Success! This reproduces the error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 2 married - spouse not in household
Building the prediction frame from the levels stored in the fitted model works:
predew <- with(nh11, expand.grid(r_maritl = ew$xlevels$r_maritl, age_p = mean(age_p, na.rm = TRUE)))
predict(ew, newdata = predew)
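If you would rather keep every declared level in the output and get NA for the ones the model never saw, you can do that by hand. The following is a sketch under stated assumptions, not built-in behaviour: the data frame `d`, its columns, and the names `fit`, `pred`, `known` and `nd` are all hypothetical, and the toy data replaces `nh11`.

```r
# Hypothetical toy data: level "c" is declared but never observed
set.seed(1)
d <- data.frame(
  y = rbinom(60, size = 1, prob = 0.5),
  x = runif(60),
  g = factor(sample(c("a", "b"), 60, replace = TRUE),
             levels = c("a", "b", "c"))
)
fit <- glm(y ~ g + x, data = d, family = binomial)

# Prediction frame over ALL declared levels, kept as character for now
pred <- expand.grid(g = levels(d$g), x = mean(d$x), stringsAsFactors = FALSE)

# Rows whose level the fitted model actually knows
known <- pred$g %in% fit$xlevels$g

# Predict only for those rows, using exactly the model's levels
nd <- pred[known, , drop = FALSE]
nd$g <- factor(nd$g, levels = fit$xlevels$g)

pred$prob <- NA_real_   # default: no prediction available
pred$prob[known] <- predict(fit, newdata = nd, type = "response")
pred
```

The row for the unobserved level stays NA instead of stopping `predict()` with the "new levels" error.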