r - Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels


I fit a logistic regression:

    ew <- glm(everwrk ~ age_p + r_maritl, data = nh11, family = "binomial")

Next, I want to predict everwrk for each level of r_maritl.

r_maritl has the following levels:

    levels(nh11$r_maritl)
    "0 under 14 years"
    "1 married - spouse in household"
    "2 married - spouse not in household"
    "3 married - spouse in household unknown"
    "4 widowed"
    "5 divorced"
    "6 separated"
    "7 never married"
    "8 living partner"
    "9 unknown marital status"

So I did:

    predew <- with(nh11,
      expand.grid(r_maritl = c("0 under 14 years",
                               "1 married - spouse in household",
                               "2 married - spouse not in household",
                               "3 married - spouse in household unknown",
                               "4 widowed", "5 divorced", "6 separated",
                               "7 never married", "8 living partner",
                               "9 unknown marital status"),
                  age_p = mean(age_p, na.rm = TRUE)))
    cbind(predew, predict(ew, type = "response",
                          se.fit = TRUE, interval = "confidence",
                          newdata = predew))

The problem is that I get the following response:

    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
      factor r_maritl has new levels 0 under 14 years, 3 married - spouse in household unknown

Sample data:

    str(nh11$age_p)
     num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...
    str(nh11$everwrk)
     Factor w/ 2 levels "2 no","1 yes": NA NA 2 NA NA NA NA NA 2 2 ...
    str(nh11$r_maritl)
     Factor w/ 10 levels "0 under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...

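tl;dr It looks like you have some levels in your factor that are not represented in the data used to fit the model, so they get dropped from the factors used in the model. In hindsight this isn't terribly surprising, since you won't be able to predict responses for these levels. That said, it's mildly surprising that R doesn't do something nice, like generating NA values automatically. You can solve the problem by using levels(droplevels(nh11$r_maritl)) when constructing the prediction frame, or by using ew$xlevels$r_maritl. The two agree when every row is used in the fit; if rows are dropped because of missing values (everwrk has many NAs here), ew$xlevels$r_maritl is the safer choice, since it records the levels the model actually saw.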
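To see how the level sets can differ, here is a minimal sketch with made-up toy data (the factor f and response y are hypothetical, not taken from nh11). Level "d" never occurs, and level "c" occurs only in a row whose response is NA, so glm drops that row before fitting:

```r
# Toy data (illustrative only): "d" is declared but never observed,
# and "c" is observed only where the response is missing.
f <- factor(c("a", "b", "a", "b", "c"), levels = c("a", "b", "c", "d"))
y <- c(0, 1, 1, 0, NA)

fit <- glm(y ~ f, family = binomial)

declared <- levels(f)               # all declared levels
observed <- levels(droplevels(f))   # levels present somewhere in the data
used     <- fit$xlevels$f           # levels present in the rows used for fitting

print(declared)
print(observed)
print(used)

# Levels that predict() would reject as "new":
print(setdiff(declared, used))
```

Building the prediction frame from `used` avoids the error regardless of why a level went missing.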

A reproducible example:

    maritl_levels <- c("0 under 14 years", "1 married - spouse in household",
                       "2 married - spouse not in household",
                       "3 married - spouse in household unknown",
                       "4 widowed", "5 divorced", "6 separated",
                       "7 never married", "8 living partner",
                       "9 unknown marital status")
    set.seed(101)
    # factor() is explicit because data.frame() no longer converts
    # strings to factors by default (R >= 4.0)
    nh11 <- data.frame(everwrk = rbinom(1000, size = 1, prob = 0.5),
                       age_p = runif(1000, 20, 50),
                       r_maritl = factor(sample(maritl_levels, size = 1000, replace = TRUE),
                                         levels = maritl_levels))

Let's make a missing level by removing every row with the third level:

    nh11 <- subset(nh11, as.numeric(nh11$r_maritl) != 3)

Fit the model and try to predict for every declared level:

    ew <- glm(everwrk ~ r_maritl + age_p, data = nh11, family = binomial)
    predew <- with(nh11,
      expand.grid(r_maritl = levels(r_maritl), age_p = mean(age_p, na.rm = TRUE)))
    predict(ew, newdata = predew)

Success! The error is reproduced:

    Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
      factor r_maritl has new levels 2 married - spouse not in household

Building the prediction frame from the levels stored in the fitted model works:

    predew <- with(nh11,
      expand.grid(r_maritl = ew$xlevels$r_maritl, age_p = mean(age_p, na.rm = TRUE)))
    predict(ew, newdata = predew)
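The question also asked for standard errors and passed interval = "confidence". That argument belongs to predict.lm; predict.glm does not have it and silently ignores it. One common workaround, sketched here on the simulated data (not the real nh11), is to predict on the link scale with se.fit = TRUE and back-transform an approximate 95% Wald interval. The column names fit, lwr and upr are my own choice:

```r
# Simulated stand-in for nh11, built the same way as the reproducible example.
maritl_levels <- c("0 under 14 years", "1 married - spouse in household",
                   "2 married - spouse not in household",
                   "3 married - spouse in household unknown",
                   "4 widowed", "5 divorced", "6 separated",
                   "7 never married", "8 living partner",
                   "9 unknown marital status")
set.seed(101)
nh11 <- data.frame(
  everwrk  = rbinom(1000, size = 1, prob = 0.5),
  age_p    = runif(1000, 20, 50),
  r_maritl = factor(sample(maritl_levels, size = 1000, replace = TRUE),
                    levels = maritl_levels)
)
nh11 <- subset(nh11, as.numeric(r_maritl) != 3)  # leave one level unobserved

ew <- glm(everwrk ~ r_maritl + age_p, data = nh11, family = binomial)

# Prediction frame built from the levels the model actually saw.
predew <- expand.grid(r_maritl = ew$xlevels$r_maritl,
                      age_p    = mean(nh11$age_p, na.rm = TRUE))

# Fitted values and standard errors on the link (log-odds) scale.
p <- predict(ew, newdata = predew, type = "link", se.fit = TRUE)

# Approximate 95% Wald interval, mapped back to probabilities
# with the inverse logit (plogis).
z <- qnorm(0.975)
result <- cbind(predew,
                fit = plogis(p$fit),
                lwr = plogis(p$fit - z * p$se.fit),
                upr = plogis(p$fit + z * p$se.fit))
print(result)
```

Computing the interval on the link scale and then transforming keeps the bounds inside (0, 1), which adding and subtracting standard errors on the response scale does not guarantee.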
