opencv - Python: Identifying undulating patterns in a 1D distribution


My question in brief: given a 1D distribution in Python, how can one identify the regions of the distribution that have a sine-like, undulating pattern?

I'm working to identify images within page scans of historic documents. These images are always full-width within the scans (that is, they're never juxtaposed with text). This led me to believe the simplest solution would be to remove the regions of the page scan that contain text lines.

Using the following snippet, one can read an image into memory and measure the average pixel brightness of each row across the image, from top to bottom, transforming the input image into the plot below:

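```python
import sys

import matplotlib.pyplot as plt
import numpy as np

# scipy.ndimage.imread has been removed from recent SciPy releases,
# so matplotlib's imread is used instead.
img = plt.imread(sys.argv[1])
if img.ndim == 3:
    img = img[..., :3].mean(axis=2)  # collapse colour channels to grayscale

# Mean brightness of each row, top to bottom
row_sums = img.mean(axis=1)

# Running average; mode='same' keeps the output the same length as row_sums
window_size = 150
running_average_y = np.convolve(row_sums, np.ones(window_size) / window_size, mode='same')

# Plot the y-dimension pixel distribution
plt.plot(running_average_y)
plt.show()
```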

Input image:

[image not reproduced]

Output plot:

[plot not reproduced]

Given this distribution, I want to identify the regions of the curve that have the regular undulating pattern one sees in the first and last thirds of the plot (roughly speaking). Do others have ideas on how this task should be approached?

At first I tried fitting a linear model to the whole 1D distribution, but that fails for all sorts of reasons. I'm thinking it might make sense to try fitting sine waves to segments of the curve, but that seems like overkill. Does anyone have ideas on how best to approach this? Any suggestions or insights would be appreciated!
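One possible way to make "sine-like" concrete without fitting sine waves (an assumption, not something from the post): slide a window along the profile, take its FFT, and measure what fraction of the window's power falls in a band of periods plausible for text-line spacing. The function name `periodicity_score`, the window and step sizes, the 10 to 80 row period band and the 0.5 threshold are all hypothetical and would need tuning on real scans; the synthetic profile only stands in for a page.

```python
import numpy as np


def periodicity_score(profile, window=256, step=32, min_period=10, max_period=80):
    """Score how strongly each window of a 1D profile oscillates with a
    period between min_period and max_period samples (rows).

    Returns (centers, scores): the centre index of each window and the
    fraction of that window's non-DC spectral power inside the period band.
    A score near 1 means the window looks like a regular undulation.
    """
    profile = np.asarray(profile, dtype=float)
    freqs = np.fft.rfftfreq(window)  # cycles per sample
    band = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    taper = np.hanning(window)  # reduces spectral leakage at the window edges
    centers, scores = [], []
    for start in range(0, len(profile) - window + 1, step):
        seg = profile[start:start + window]
        seg = (seg - seg.mean()) * taper  # remove the local brightness level
        power = np.abs(np.fft.rfft(seg)) ** 2
        total = power[1:].sum()  # skip the DC bin
        scores.append(power[band].sum() / total if total > 0 else 0.0)
        centers.append(start + window // 2)
    return np.array(centers), np.array(scores)


# Hypothetical synthetic profile: periodic "text" in the first and last thirds,
# noise only in the middle third (standing in for an image region).
rng = np.random.default_rng(0)
n = 3000
x = np.arange(n)
profile = rng.normal(0.0, 0.1, n)
profile[:1000] += np.sin(2 * np.pi * x[:1000] / 30)
profile[2000:] += np.sin(2 * np.pi * x[2000:] / 30)

centers, scores = periodicity_score(profile)
is_text_like = scores > 0.5  # threshold is a guess; tune on real scans
```

On real data this is probably best applied to the raw `row_sums` or a lightly smoothed version, since a 150-row running average strongly attenuates oscillations with periods well below 150 rows. An image that happens to contain regular horizontal stripes could also score high.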

This doesn't answer the question, but maybe it solves the problem. Smoothing the row sums hides the fact that the lines of text in the images are separated by white space, as you would expect for movable-type print.
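A small synthetic check of that point, with made-up numbers (30-row text lines separated by 10 blank rows, brightness 0.5 for text and 1.0 for paper): the raw profile reaches the blank level between every pair of lines, but after the question's 150-row running average it stays between about 0.60 and 0.63, leaving only a small ripple.

```python
import numpy as np

# Hypothetical row profile: text lines 30 rows tall (mean brightness 0.5)
# separated by 10 blank rows (brightness 1.0), i.e. a 40-row line pitch.
pitch, line_height = 40, 30
rows = np.arange(2000)
row_means = np.where(rows % pitch < line_height, 0.5, 1.0)

# Same running average as in the question (window of 150 rows)
window_size = 150
smoothed = np.convolve(row_means, np.ones(window_size) / window_size, mode='same')

# Ignore the edges, where mode='same' effectively pads with zeros
interior = smoothed[window_size:-window_size]

print(row_means.min(), row_means.max())  # 0.5 1.0: raw rows reach the blank level
print(interior.min(), interior.max())    # about 0.60 and 0.63: never close to 1.0
```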

You can use the white space as a separator to partition the image into blocks. In most cases, a block corresponds to a single line of text; large blocks correspond to images.


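```python
import sys

import matplotlib.pyplot as plt
import numpy as np

min_block_size = 100  # pixels

img = plt.imread(sys.argv[1])
if img.ndim == 3:
    img = img[..., :3].mean(axis=2)  # collapse colour channels to grayscale

# Find blank rows: rows brighter than the 75th percentile count as white space
row_sums = np.mean(img, axis=1)
threshold = np.percentile(row_sums, 75)
is_blank = row_sums > threshold

# Find blocks between blank rows. Padding with a blank row at each end
# guarantees every block has both a start and a stop edge.
# (np.int was removed in NumPy 1.24; plain int works.)
block_edges = np.diff(np.r_[True, is_blank, True].astype(int))
starts, = np.where(block_edges == -1)  # blank -> content (first content row)
stops, = np.where(block_edges == 1)    # content -> blank (exclusive end)
blocks = np.c_[starts, stops]

# Plot the steps
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6.85, 6))
axes[0].plot(row_sums)
axes[0].axhline(threshold, c='r', ls='--')
axes[1].plot(is_blank)
for start, stop in blocks:
    if stop - start > min_block_size:
        axes[2].axvspan(start, stop, facecolor='red')
plt.show()
```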
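Since the goal is to pull out the images themselves, the block-finding step can be wrapped in functions and the tall blocks used to crop the scan. This is a sketch; `find_blocks` and `large_blocks` are hypothetical helper names, and the 100-row default simply mirrors `min_block_size` above.

```python
import numpy as np


def find_blocks(is_blank):
    """Return (start, stop) row ranges of consecutive non-blank rows.

    stop is exclusive, so img[start:stop] is the block. Padding with a blank
    row at each end makes every block have both a start and a stop edge.
    """
    padded = np.r_[True, np.asarray(is_blank, dtype=bool), True].astype(int)
    edges = np.diff(padded)
    starts, = np.where(edges == -1)  # blank -> content
    stops, = np.where(edges == 1)    # content -> blank
    return list(zip(starts.tolist(), stops.tolist()))


def large_blocks(is_blank, min_block_size=100):
    """Keep only blocks taller than min_block_size rows (likely images)."""
    return [(a, b) for a, b in find_blocks(is_blank) if b - a > min_block_size]
```

For example, `crops = [img[a:b] for a, b in large_blocks(is_blank)]` would give one array per candidate image region.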
