Python Tesseract: how to extract the exact numbers from the image below using pytesseract `image_to_string`?
```python
import cv2
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image
# from noiselevel import estimate_noise  # local helper, only used in a commented-out line
import pytesseract

img = cv2.imread('cmam.png', 0)  # 0 = load as grayscale
# gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# a = estimate_noise(img)

# Global thresholding
ret1, th1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
cv2.imshow("global", th1)
cv2.imwrite("final.tif", th1)
x = Image.open("final.tif")
print(pytesseract.image_to_string(x, config='--psm 6'))

# Otsu's thresholding (the 43 is ignored: Otsu picks the threshold itself)
ret2, th2 = cv2.threshold(img, 43, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow("otsu", th2)
cv2.imwrite("final1.tif", th2)
x = Image.open("final1.tif")
print(pytesseract.image_to_string(x, config='--psm 6'))

# Otsu's thresholding after Gaussian filtering
blur = cv2.GaussianBlur(img, (5, 5), 0)
ret3, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imshow("filterotsu", th3)
cv2.imwrite("final2.tif", th3)
x = Image.open("final2.tif")
print(pytesseract.image_to_string(x, config='--psm 6'))

cv2.waitKey(0)  # keep the imshow windows open until a key is pressed
cv2.destroyAllWindows()
```
Output:

```
my. "al ©1910 18500 é§551 6253
```
There are multiple steps you need to follow to increase accuracy:

- Use a preprocessing technique to crop the desired part of the image. This eliminates errors and improves accuracy.
- Configure Tesseract to detect only numbers.
- Train the engine. Since the font is fixed, you can use images of it to train the engine and increase accuracy.
Hope this solves your problem.