pandas - How to use a specific list of bins for multiple histograms from DataFrame, when using plotly+cufflinks? -

January 15, 2015

it relatively easy manually give list of bins when plotting histogram matplotlib, shown example here.

a simple example of following:

import numpy np import matplotlib.pyplot plt  fig, ax = plt.subplots() ax.hist(np.random.randn(10000), bins=np.arange(-4, 4, 0.1)) ax.hist(0.2 * np.random.randn(10000), bins=np.arange(-4, 4, 0.1)) plt.show()

this can equivalently done pandas.dataframe with:

pd.dataframe({     'firsthistogram': np.random.randn(10000),     'secondhistogram': 0.2 * np.random.randn(10000) }).plot(kind='hist', bins=np.arange(-4, 4, 0.1))

going further, plotly allows directly interface pandas through cufflinks module, allows things following:

pd.dataframe({     'firsthistogram': np.random.randn(10000),     'secondhistogram': 0.2 * np.random.randn(10000) }).iplot(kind='hist', bins=100)

but here catch: iplot method provided cufflinks not seem accept list bins. when number provided in above example, number used bin independently both datasets, results in unequal binning, potentially misleading results (see equal heights in above plot).

while effect can mitigated using histnorm='density' option, 1 may want see counts per bin , not density.

is there way around this?

i have added update this. should able specify bins=(start,end,size)

pd.dataframe({ 'firsthistogram': np.random.randn(10000), 'secondhistogram': 0.2 * np.random.randn(10000)}).iplot(kind='hist',bins=(-4,4,.08))

should return: custom bins

Search This Blog

Force Net

pandas - How to use a specific list of bins for multiple histograms from DataFrame, when using plotly+cufflinks? -

Comments

Post a Comment

Popular posts from this blog

python - Operations inside variables -

Generic Map Parameter java -

arrays - What causes a java.lang.ArrayIndexOutOfBoundsException and how do I prevent it? -