pandas - How to use a specific list of bins for multiple histograms from DataFrame, when using plotly+cufflinks? -
it relatively easy manually give list of bins when plotting histogram matplotlib
, shown example here.
a simple example of following:
import numpy np import matplotlib.pyplot plt fig, ax = plt.subplots() ax.hist(np.random.randn(10000), bins=np.arange(-4, 4, 0.1)) ax.hist(0.2 * np.random.randn(10000), bins=np.arange(-4, 4, 0.1)) plt.show()
this can equivalently done pandas.dataframe
with:
pd.dataframe({ 'firsthistogram': np.random.randn(10000), 'secondhistogram': 0.2 * np.random.randn(10000) }).plot(kind='hist', bins=np.arange(-4, 4, 0.1))
going further, plotly
allows directly interface pandas
through cufflinks
module, allows things following:
pd.dataframe({ 'firsthistogram': np.random.randn(10000), 'secondhistogram': 0.2 * np.random.randn(10000) }).iplot(kind='hist', bins=100)
but here catch: iplot
method provided cufflinks
not seem accept list bins
. when number provided in above example, number used bin independently both datasets, results in unequal binning, potentially misleading results (see equal heights in above plot).
while effect can mitigated using histnorm='density'
option, 1 may want see counts per bin , not density.
is there way around this?
i have added update this. should able specify bins=(start,end,size)
pd.dataframe({ 'firsthistogram': np.random.randn(10000), 'secondhistogram': 0.2 * np.random.randn(10000)}).iplot(kind='hist',bins=(-4,4,.08))
should return: custom bins
Comments
Post a Comment