python - Issue with concatenating and replace in Pandas dataframe -

September 15, 2013

i stymied something, , if fix 1 way, breaks else.

i have set of data lists file status country. want is, each country in country column, print missing files each status in visitstatus column. rows country=france, every visit "complete", list number of missing files.

there 2 dataframes concatenating 1 combined set work , deliver final output. concat'ing df_s , df_ins df_combined.

when grab set of unique values country , visitstatus columns loop over, try write out results per country excel file workbook, quirks in data kick out 'duplicate sheetname' error. in 1 of source dataframes, there status of "do not review" in visitstatus column, in other source dataframe, it's named "do not review", lowercase second 2 words. when they're concatenated, kicks out unique values of "do not review" , "do not review". when xslx writer tries make workbooks second one, checks against existing workbooks disregarding case, finds first one, decides same since ignoring case, , kicks out error saying 'do not review' worksheet exists.

if run replace() , change "do not review" values in visitstatus column "do not review" match , don't give 2 results when call unique(), breaks , gives me keyerror on visitstatus.

so far have read thread after thread , haven't been able solve this. tried running replace() on source dataframe, , throws error saying "status" float , can't handled string.

i'm @ loss. in advance!

#     combo # merge screening , in study datasets df_combined = pd.concat([df_s,df_ins], axis=0, ignore_index=true) df_combined = df_combined.query('visitstatus != "hand off  information"') print(df_combined.columns.values)  print("---------------------------------------------------------------------------------") # display , save out country , missing file status statuses = df_combined['visitstatus'].unique() countries = df_combined['country'].unique() status in statuses:     print("x" + status + "x")     print('\n') print (statuses) country in countries:     status in statuses:         print('\n')         print("---> missing files " + country + " visits status of: " + str(status))         df_cmb = df_combined[(df_combined.country==country) & (df_combined.visitstatus==status)]         print('\n')         numrows=df_cmb.shape[0]         if numrows > 0:             print("----> number of visits in " + str(status) + " subset: " + str(numrows))             print("drf forms missing: " + str(df_cmb['drf-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['drf-form-uploaded'].sum()) + " collected")             print("cssrs forms missing: " + str(df_cmb['cssrs-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cssrs-form-uploaded'].sum()) + " collected")             print("cdr forms missing: " + str(df_cmb['cdr-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cdr-form-uploaded'].sum()) + " collected")             print("cdr audio missing: " + str(df_cmb['cdr-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cdr-audio-uploaded'].sum()) + " collected")             print("mmse forms missing: " + str(df_cmb['mmse-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['mmse-form-uploaded'].sum()) + " collected")             print("mmse audio missing: " + str(df_cmb['mmse-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['mmse-audio-uploaded'].sum()) + " collected")             print("rbans forms missing: " + str(df_cmb['rbans-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['rbans-form-uploaded'].sum()) + " collected")             print("rbans audio missing: " + str(df_cmb['rbans-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['rbans-audio-uploaded'].sum()) + " collected")             print("--------------------------------------")             print('\n')         else:             print("no " + status + " files/visits " + country)         if country =="united states":             country="usa"         # borked in next line - somehow there 2 "do not review" status types in combined file, triggers "already in use" sheetname         df_cmb.to_excel(combo_writer, header=true, index=false, sheet_name=str(country)[:3] + "-by-" + str(status))

oh lord. i'm answering own question.

so tinkered more , nothing else made sense, started wondering if putting in arguments replace() correctly, , had them backwards. assumed "do not review" needed changed "do not review", other way around...i assumed incorrectly source file data needed modified. once flipped them, works.

Search This Blog

Force Net

python - Issue with concatenating and replace in Pandas dataframe -

Comments

Post a Comment

Popular posts from this blog

ubuntu - PHP script to find files of certain extensions in a directory, returns populated array when run in browser, but empty array when run from terminal -

php - How can i create a user dashboard -

javascript - How to detect toggling of the fullscreen-toolbar in jQuery Mobile? -