How to extract text from a specific area in a PDF using Python?


I'm trying to extract text from a PDF using Python, and I have done it using PyPDF2 like this:

    import PyPDF2

    pdfFileObj = open('path', 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageObj = pdfReader.getPage(0)
    pageObj.extractText()

This extracts the text of the whole page, but I want to extract only the text in a rectangular region of 3'x4' at the top-left part of the page.

What I want to know: how do I extract text from a PDF document within a specific rectangular region, in Python?

Can this be done with PyPDF2 or another Python library?

This is a rather complex topic, but it is possible. First you need to be familiar with the PDF format description.

Start here, for example.

You can identify the location and contents of the text boxes and extract the string data.
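Once each piece of text has a position, keeping only the text inside a region is a coordinate comparison. The sketch below is one way to do that under a few assumptions: the region from the question is read as 3 by 4 inches (width by height), positions are in default PDF user space (1/72 inch units, origin at the bottom-left of the page), and the `Fragment` record, `top_left_region` and `text_in_region` are hypothetical names for illustration, not part of PyPDF2. The sample fragments are made up; in practice they would come from whatever library reports text positions.

```python
from typing import NamedTuple

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


class Fragment(NamedTuple):
    """Hypothetical record: a piece of text and its position in points."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the bottom edge of the page


def top_left_region(page_height, width_in, height_in):
    """Return (x0, y0, x1, y1) in points for a box anchored at the top-left corner."""
    x0 = 0.0
    x1 = width_in * POINTS_PER_INCH
    y1 = page_height                                # top edge of the page
    y0 = page_height - height_in * POINTS_PER_INCH  # bottom edge of the box
    return (x0, y0, x1, y1)


def text_in_region(fragments, region):
    """Keep the text of fragments whose origin lies inside the region."""
    x0, y0, x1, y1 = region
    return [f.text for f in fragments if x0 <= f.x <= x1 and y0 <= f.y <= y1]


# Assumed page size: US Letter, 612 x 792 points.
region = top_left_region(page_height=792.0, width_in=3, height_in=4)

# Made-up sample data standing in for real extracted fragments.
fragments = [
    Fragment("Invoice", 36.0, 756.0),  # inside the top-left box
    Fragment("Total", 400.0, 756.0),   # too far to the right
    Fragment("Footer", 36.0, 40.0),    # too far down the page
]

print(text_in_region(fragments, region))
```

Note that the y axis runs upward in PDF user space, so a box at the top of the page has large y values; that is why the box is computed from the page height rather than from zero.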

This topic holds examples for pyPdf, the previous version of PyPDF2, but the syntax is similar. There are also examples of how to iterate through indirect objects.

A good place to start is the source of the pageObj.extractText() function you used.
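As an alternative to reading the extraction source, recent releases of pypdf (the successor to PyPDF2) let you pass a `visitor_text` callback to `extract_text()`, which is called for each piece of text together with the current transformation matrix and the text matrix. This is not mentioned in the answer above and is not available in older PyPDF2 versions, so treat the following as a sketch under that assumption. `make_region_visitor` and `extract_region` are hypothetical helper names, and the position is estimated by composing the two matrices, which may need adjusting for a particular document.

```python
def make_region_visitor(region, collected):
    """Build a callback that appends text whose origin falls inside region.

    region is (x0, y0, x1, y1) in PDF points, origin at the bottom-left.
    """
    x0, y0, x1, y1 = region

    def visitor(text, cm, tm, font_dict, font_size):
        # Matrices are [a, b, c, d, e, f]. Composing the text matrix (tm)
        # with the current transformation matrix (cm) gives the position
        # of the text origin in page coordinates.
        x = tm[4] * cm[0] + tm[5] * cm[2] + cm[4]
        y = tm[4] * cm[1] + tm[5] * cm[3] + cm[5]
        if text and x0 <= x <= x1 and y0 <= y <= y1:
            collected.append(text)

    return visitor


def extract_region(path, page_index, region):
    """Assumes a pypdf release that supports the visitor_text argument."""
    from pypdf import PdfReader  # third-party; imported only when called

    reader = PdfReader(path)
    parts = []
    page = reader.pages[page_index]
    page.extract_text(visitor_text=make_region_visitor(region, parts))
    return "".join(parts)


# Demo with synthetic callback arguments, so no PDF file is needed.
# Region: 3 x 4 inch box at the top-left of a 612 x 792 point page.
demo_region = (0.0, 504.0, 216.0, 792.0)
identity = [1, 0, 0, 1, 0, 0]
parts = []
visit = make_region_visitor(demo_region, parts)
visit("Hello", identity, [1, 0, 0, 1, 36, 756], None, 12)   # inside
visit("Far", identity, [1, 0, 0, 1, 400, 756], None, 12)    # outside
visit("Moved", [1, 0, 0, 1, 36, 700], identity, None, 12)   # placed by cm
print(parts)
```

With a real file, a call such as `extract_region('path', 0, demo_region)` would return the concatenated text whose origin lies inside the box. Text that starts outside the box but runs into it is not captured by this origin-only test.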

If you are not restricted to Python, see: How to extract text from a PDF?

