python - Issue with concatenating and replace in Pandas dataframe -
i stymied something, , if fix 1 way, breaks else.
i have set of data lists file status country. want is, each country in country column, print missing files each status in visitstatus column. rows country=france, every visit "complete", list number of missing files.
there 2 dataframes concatenating 1 combined set work , deliver final output. concat'ing df_s , df_ins df_combined.
when grab set of unique values country , visitstatus columns loop over, try write out results per country excel file workbook, quirks in data kick out 'duplicate sheetname' error. in 1 of source dataframes, there status of "do not review" in visitstatus column, in other source dataframe, it's named "do not review", lowercase second 2 words. when they're concatenated, kicks out unique values of "do not review" , "do not review". when xslx writer tries make workbooks second one, checks against existing workbooks disregarding case, finds first one, decides same since ignoring case, , kicks out error saying 'do not review' worksheet exists.
if run replace() , change "do not review" values in visitstatus column "do not review" match , don't give 2 results when call unique(), breaks , gives me keyerror on visitstatus.
so far have read thread after thread , haven't been able solve this. tried running replace() on source dataframe, , throws error saying "status" float , can't handled string.
i'm @ loss. in advance!
# combo # merge screening , in study datasets df_combined = pd.concat([df_s,df_ins], axis=0, ignore_index=true) df_combined = df_combined.query('visitstatus != "hand off information"') print(df_combined.columns.values) print("---------------------------------------------------------------------------------") # display , save out country , missing file status statuses = df_combined['visitstatus'].unique() countries = df_combined['country'].unique() status in statuses: print("x" + status + "x") print('\n') print (statuses) country in countries: status in statuses: print('\n') print("---> missing files " + country + " visits status of: " + str(status)) df_cmb = df_combined[(df_combined.country==country) & (df_combined.visitstatus==status)] print('\n') numrows=df_cmb.shape[0] if numrows > 0: print("----> number of visits in " + str(status) + " subset: " + str(numrows)) print("drf forms missing: " + str(df_cmb['drf-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['drf-form-uploaded'].sum()) + " collected") print("cssrs forms missing: " + str(df_cmb['cssrs-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cssrs-form-uploaded'].sum()) + " collected") print("cdr forms missing: " + str(df_cmb['cdr-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cdr-form-uploaded'].sum()) + " collected") print("cdr audio missing: " + str(df_cmb['cdr-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['cdr-audio-uploaded'].sum()) + " collected") print("mmse forms missing: " + str(df_cmb['mmse-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['mmse-form-uploaded'].sum()) + " collected") print("mmse audio missing: " + str(df_cmb['mmse-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['mmse-audio-uploaded'].sum()) + " collected") print("rbans forms missing: " + str(df_cmb['rbans-form-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['rbans-form-uploaded'].sum()) + " collected") print("rbans audio missing: " + str(df_cmb['rbans-audio-uploaded'].sum()) + " vs. " + str(numrows - df_cmb['rbans-audio-uploaded'].sum()) + " collected") print("--------------------------------------") print('\n') else: print("no " + status + " files/visits " + country) if country =="united states": country="usa" # borked in next line - somehow there 2 "do not review" status types in combined file, triggers "already in use" sheetname df_cmb.to_excel(combo_writer, header=true, index=false, sheet_name=str(country)[:3] + "-by-" + str(status))
oh lord. i'm answering own question.
so tinkered more , nothing else made sense, started wondering if putting in arguments replace() correctly, , had them backwards. assumed "do not review" needed changed "do not review", other way around...i assumed incorrectly source file data needed modified. once flipped them, works.
Comments
Post a Comment