python - Creating a heatmap by sampling and bucketing from a 3D array
I have experimental data that exists like so:
x = array([1, 1.12, 1.109, 2.1, 3, 4.104, 3.1, ...])
y = array([-9, -0.1, -9.2, -8.7, -5, -4, -8.75, ...])
z = array([10, 4, 1, 4, 5, 0, 1, ...])
If it's convenient, we can assume that the data exists as a 3D array or as a pandas DataFrame:

df = pd.DataFrame({'x': x, 'y': y, 'z': z})
The interpretation being that at every position x[i], y[i], the value of some variable is z[i]. These are NOT evenly sampled: there are some parts that are "densely sampled" (e.g. between 1 and 1.2 in x) and others that are sparse (e.g. between 2 and 3 in x). Because of this, I can't just chuck them into pcolormesh or contourf.
What I would like to do instead is to resample x and y evenly at some fixed interval and then aggregate the values of z. For my needs, z can be summed or averaged and still give meaningful values, so this is not a problem. My naïve attempt was this:
x = np.arange(min(x), max(x), 0.1)
y = np.arange(min(y), max(y), 0.1)
x_g, y_g = np.meshgrid(x, y)

nx, ny = x_g.shape
z_g = np.full(x_g.shape, np.nan)

for ix in range(nx - 1):
    for jx in range(ny - 1):
        x_min = x_g[ix, jx]
        x_max = x_g[ix + 1, jx + 1]
        y_min = y_g[ix, jx]
        y_max = y_g[ix + 1, jx + 1]
        vals = df[(df.x >= x_min) & (df.x < x_max) &
                  (df.y >= y_min) & (df.y < y_max)].z.values
        if vals.any():
            z_g[ix, jx] = sum(vals)
This works and gives me the output I desire when plotted with plt.contourf(x_g, y_g, z_g), but it is slow! I have ~20k samples, which I subsample into ~800 samples in x and ~500 in y, meaning the loop is 400k iterations long.

Is there a way to vectorize/optimize this? Even better if there is already a function that does this!

(Also tagging MATLAB because the syntax between numpy and MATLAB is similar and I have access to both pieces of software.)
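For reference, I suspect something like np.histogram2d with weights can compute the kind of per-bucket sum I'm describing, though I haven't verified that its bin-edge handling (the last bin is closed rather than half-open) matches my loop exactly:

import numpy as np

# Rough sketch only: np.histogram2d with weights gives per-bin sums of z.
# Bin edges use the same 0.1 spacing as my loop above.
x_edges = np.arange(min(x), max(x), 0.1)
y_edges = np.arange(min(y), max(y), 0.1)

z_sum, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

z_sum[counts == 0] = np.nan   # mark empty buckets, as in my loop
# note: z_sum is indexed [x_bin, y_bin], so it needs a transpose before contourf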
Here's a vectorized Python solution employing NumPy broadcasting and matrix multiplication with np.dot for the sum-reduction part -
X = np.arange(min(x), max(x), 0.1)   # x bin edges
Y = np.arange(min(y), max(y), 0.1)   # y bin edges

x_mask = ((x >= X[:-1,None]) & (x < X[1:,None]))
y_mask = ((y >= Y[:-1,None]) & (y < Y[1:,None]))
z_g_out = np.dot(y_mask*z[None].astype(np.float32), x_mask.T)

# If needed, fill the invalid places with NaNs
z_g_out[y_mask.dot(x_mask.T.astype(np.float32)) == 0] = np.nan
Please note that we are avoiding the use of meshgrid there. Thus, we are saving memory, as the meshes created by meshgrid would be huge, and in the process hopefully gaining a performance improvement.
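To make the broadcasting step concrete, here is a tiny standalone sketch with made-up sample positions and bin edges (not the real data). Each row of the resulting mask is one bin and each column is one sample:

import numpy as np

# Made-up demo values: 4 sample positions and 4 bin edges
x_demo = np.array([1.0, 1.12, 2.1, 3.0])
X_demo = np.array([1.0, 2.0, 3.0, 4.0])

# (num_bins, num_samples) boolean matrix:
# row i is True where X_demo[i] <= x_demo < X_demo[i+1]
x_mask = (x_demo >= X_demo[:-1, None]) & (x_demo < X_demo[1:, None])
print(x_mask.astype(int))
# [[1 1 0 0]
#  [0 0 1 0]
#  [0 0 0 1]]

The np.dot of y_mask*z with x_mask.T then sums z over all samples that fall in both the row's y-bin and the column's x-bin, which is exactly the bucketed sum.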
Benchmarking
# Original app
def org_app(x,y,z):
    X = np.arange(min(x), max(x), 0.1)
    Y = np.arange(min(y), max(y), 0.1)
    x_g, y_g = np.meshgrid(X, Y)
    nx, ny = x_g.shape
    z_g = np.full(np.asarray(x_g.shape)-1, np.nan)
    for ix in range(nx - 1):
        for jx in range(ny - 1):
            x_min = x_g[ix, jx]
            x_max = x_g[ix + 1, jx + 1]
            y_min = y_g[ix, jx]
            y_max = y_g[ix + 1, jx + 1]
            vals = z[(x >= x_min) & (x < x_max) &
                     (y >= y_min) & (y < y_max)]
            if vals.any():
                z_g[ix, jx] = sum(vals)
    return z_g

# Proposed app
def app1(x,y,z):
    X = np.arange(min(x), max(x), 0.1)
    Y = np.arange(min(y), max(y), 0.1)
    x_mask = ((x >= X[:-1,None]) & (x < X[1:,None]))
    y_mask = ((y >= Y[:-1,None]) & (y < Y[1:,None]))
    z_g_out = np.dot(y_mask*z[None].astype(np.float32), x_mask.T)
    # If needed, fill the invalid places with NaNs
    z_g_out[y_mask.dot(x_mask.T.astype(np.float32)) == 0] = np.nan
    return z_g_out
As seen, for fair benchmarking I am using array values with the original approach, as fetching values from a dataframe would slow things down.
Timings and verification -
In [143]: x = np.array([1, 1.12, 1.109, 2.1, 3, 4.104, 3.1])
     ...: y = np.array([-9, -0.1, -9.2, -8.7, -5, -4, -8.75])
     ...: z = np.array([10, 4, 1, 4, 5, 0, 1])

# Verify outputs
In [150]: np.nansum(np.abs(org_app(x,y,z) - app1(x,y,z)))
Out[150]: 0.0

In [145]: %timeit org_app(x,y,z)
10 loops, best of 3: 19.9 ms per loop

In [146]: %timeit app1(x,y,z)
10000 loops, best of 3: 39.1 µs per loop

In [147]: 19900/39.1  # speedup figure
Out[147]: 508.95140664961633
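To get back to the original plotting goal, the bucketed output should drop straight into pcolormesh, since the bin edges are known. A minimal sketch, assuming the X and Y edge arrays from app1 are kept around:

import numpy as np
import matplotlib.pyplot as plt

X = np.arange(min(x), max(x), 0.1)   # x bin edges, as in app1
Y = np.arange(min(y), max(y), 0.1)   # y bin edges, as in app1
z_g_out = app1(x, y, z)              # shape (len(Y)-1, len(X)-1)

# pcolormesh takes the edges along each axis plus the per-cell values
plt.pcolormesh(X, Y, z_g_out)
plt.colorbar(label='sum of z per bucket')
plt.show()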