python - smf.ols summary and metrics for three-class classification


I use scikit-learn for modelling purposes and have no experience in R. However, I had a specific request for forward-selection logistic regression, so I am trying to use statsmodels.formula.api.ols for three-class classification. I found and modified a function that I think is working, but I can't be sure because I can't interpret the output.

Any advice would be appreciated from anyone familiar with statsmodels, particularly with using statsmodels together with pandas.

I have 2 main issues:

  1. I can't print the summary table, which seems to be the basis of the results formatting class. The error is:

    ValueError: shapes (18,3) and (18,3) not aligned: 3 (dim 1) != 18 (dim 0)

This may be related to using OLS as a classifier, but it still doesn't work when I restrict the data to 2 classes. Other methods and attributes, such as pvalues and rsquared, return similar errors. I can't dig into the structure of summary() and I can't find examples in the documentation. Any examples would be appreciated.

  2. Interestingly, the params attribute contains meaningful output. However, I can't interpret it, because it's organized in columns 0, 1, 2. Obviously, there are 3 resultant equations and there should be 3 parameters for each feature to correspond to them, but I can't tell which column refers to which test (e.g. normal vs. positive, normal vs. negative, negative vs. positive).

                      0         1         2
    Intercept  0.268715  0.036415  0.694869
    feature1  -0.019223 -0.015703  0.034926
    feature3   0.023013  0.061053 -0.084067

For completeness, model.model.formula also contains meaningful output.
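On the first issue, the wording of the ValueError matches what NumPy raises when two arrays are multiplied and their inner dimensions differ. A plausible reading, and it is only an assumption since the statsmodels internals are not shown here, is that the string response was expanded into an 18 x 3 array and that some statistic then multiplies an array of that shape by itself. The sketch below uses a stand-in array named `residual_like` (a hypothetical name) to trigger the same kind of error outside statsmodels:

```python
import numpy as np

# Stand-in for any 18-row, 3-column array (hypothetical; not taken from statsmodels).
residual_like = np.ones((18, 3))

try:
    # (18, 3) times (18, 3): the inner dimensions 3 and 18 do not match.
    np.dot(residual_like, residual_like)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# The product is defined once the first operand is transposed: (3, 18) times (18, 3).
gram = np.dot(residual_like.T, residual_like)
print(gram.shape)
```

If that reading is right, the error says more about the shape of the response than about the summary table itself.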

Here's some sample data that I put in an Excel doc; please use it for testing:

    classname   feature1    feature2    feature3
    normal      3           3           6
    positive    6           1           7
    negative    2           2           4
    normal      3           2           5
    positive    5           4           3
    negative    6           4           7
    normal      8           1           6
    positive    5           6           6
    negative    3           3           8
    normal      2           7           5
    positive    4           2           3
    negative    3           9           3
    normal      2           5           9
    positive    3           1           5
    negative    5           2           6
    normal      2           4           7
    positive    1           2           6
    negative    1           2           8
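One way to check what columns 0, 1, 2 of the params table refer to is to redo the fit by hand on this sample data. The sketch below assumes that the formula interface turns the string response into one 0/1 indicator column per class, with the classes in sorted order; that is an assumption about the encoding, not something stated in the post. It uses only the standard library, and `solve` and `ols_coefficients` are helper names made up for the illustration:

```python
# Sample rows from the question: (classname, feature1, feature3).
ROWS = [
    ("normal", 3, 6), ("positive", 6, 7), ("negative", 2, 4),
    ("normal", 3, 5), ("positive", 5, 3), ("negative", 6, 7),
    ("normal", 8, 6), ("positive", 5, 6), ("negative", 3, 8),
    ("normal", 2, 5), ("positive", 4, 3), ("negative", 3, 3),
    ("normal", 2, 9), ("positive", 3, 5), ("negative", 5, 6),
    ("normal", 2, 7), ("positive", 1, 6), ("negative", 1, 8),
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_coefficients(design, target):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


# Design matrix columns: Intercept, feature1, feature3.
design = [(1, f1, f3) for _, f1, f3 in ROWS]
# Assumed column order: class labels sorted alphabetically.
levels = sorted({name for name, _, _ in ROWS})

params = {}
for level in levels:
    # One separate regression per class, on a 0/1 indicator for that class.
    indicator = [1 if name == level else 0 for name, _, _ in ROWS]
    params[level] = ols_coefficients(design, indicator)

for column, level in enumerate(levels):
    print(column, level, [round(value, 6) for value in params[level]])
```

Under that assumption the three fits should agree with columns 0, 1 and 2 of the params table to the printed precision, which would make the columns "negative", "normal" and "positive" indicators rather than pairwise tests. Because the three indicators add up to 1 in every row, the intercepts add up to 1 and each feature's coefficients add up to 0 across the columns.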

And here's the code, simplified, save for the imports:

    # Imports are omitted; the code assumes pandas as pd
    # and statsmodels.formula.api as smf.

    def forward_selected(df, response):
        remaining = set(df.columns)
        remaining.remove(response)
        print(df.head())
        selected = []
        current_score, best_new_score = 0.0, 0.0

        while remaining and current_score == best_new_score:
            scores_with_candidates = []
            for candidate in remaining:
                formula = "{} ~ {} + 1".format(response, ' + '.join(selected + [candidate]))
                # Note: this stores the fitted results object, not a scalar score,
                # so the sort and comparisons below do not rank candidates by fit.
                score = smf.ols(formula, df).fit()
                scores_with_candidates.append((score, candidate))
            scores_with_candidates.sort()
            best_new_score, best_candidate = scores_with_candidates.pop()
            if current_score < best_new_score:
                remaining.remove(best_candidate)
                selected.append(best_candidate)
                print(best_candidate)
                current_score = best_new_score
        # Refit the final model on the selected features.
        formula = "{} ~ {} + 1".format(response, ' + '.join(selected))
        model = smf.ols(formula, df).fit()

        return model

    def main():  # infile, feature_names
        df_raw = pd.read_excel('sampledata.xlsx')  # infile
        model = forward_selected(df_raw, 'classname')

        print(model.params)
        print(model.model.formula)
        print(model.summary())
        return

    if __name__ == '__main__':
        main()
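The selection loop only ranks candidates meaningfully if each candidate gets a single comparable number. A minimal sketch of the same greedy idea, separated from statsmodels, is shown below. `forward_select`, `score_fn` and `toy_score` are hypothetical names, and the gains inside `toy_score` are invented purely to make the example run; in practice `score_fn` would wrap whatever scalar metric the fitted model exposes.

```python
def forward_select(candidates, score_fn):
    """Greedy forward selection.

    score_fn is a hypothetical callable that maps a list of feature names
    to a float, where a larger value means a better model.
    """
    remaining = set(candidates)
    selected = []
    current_score = float("-inf")
    while remaining:
        # Score every one-feature extension of the current selection.
        scored = [(score_fn(selected + [name]), name) for name in sorted(remaining)]
        best_score, best_name = max(scored)
        if best_score <= current_score:
            break  # no remaining candidate improves the score
        remaining.remove(best_name)
        selected.append(best_name)
        current_score = best_score
    return selected, current_score


def toy_score(features):
    """Invented score: a fixed gain per feature minus a size penalty."""
    gains = {"feature1": 0.30, "feature2": 0.01, "feature3": 0.20}
    return sum(gains[name] for name in features) - 0.05 * len(features)


chosen, final_score = forward_select(["feature1", "feature2", "feature3"], toy_score)
print(chosen, round(final_score, 2))
```

With this toy score the loop stops once adding another feature no longer raises the score, which is the stopping behaviour the while loop in forward_selected is aiming for.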

Any advice on how to dig into the model and retrieve metrics (AIC, rsquared, pvalues) would be appreciated.

