How to extract text from a specific area in a PDF using Python?

I'm trying to extract text from a PDF using Python, and I have done it using PyPDF2 like this:

    import PyPDF2

    # Open the PDF in binary mode and read the first page
    pdfFileObj = open('path', 'rb')
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageObj = pdfReader.getPage(0)
    pageObj.extractText()

This extracts the text of the whole page, but I want to extract only the text in a rectangular region of 3'x4' at the top-left part of the page.

Can this be done with PyPDF2 or any other Python library?

This is a rather complex topic, but it is possible. First you need to become familiar with the PDF format description.

Start here, for example.

You can identify the location and contents of the text boxes and extract the string data.
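As a rough sketch of that idea (not a definitive implementation), the snippet below filters text fragments by position. It assumes the region is 3 by 4 inches, that the page uses the default PDF user space of 72 units per inch with the origin at the bottom-left corner, and that the maintained `pypdf` package (the successor to PyPDF2) is installed, since its `extract_text` accepts a `visitor_text` callback. The helper names `top_left_region`, `make_region_visitor` and `extract_region_text` are made up for this example. The visitor reads the position from the text matrix only, so it ignores page rotation, the current transformation matrix and a media box that does not start at (0, 0).

```python
from typing import Callable, List, Tuple

POINTS_PER_INCH = 72.0  # default PDF user space: 72 units per inch

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in points


def top_left_region(page_width: float, page_height: float,
                    width_in: float, height_in: float) -> Region:
    """Return a rectangle anchored at the top-left corner of the page.

    PDF coordinates start at the bottom-left, so the top edge is at
    y = page_height and the region extends downwards from there.
    """
    x1 = min(width_in * POINTS_PER_INCH, page_width)
    y0 = max(page_height - height_in * POINTS_PER_INCH, 0.0)
    return (0.0, y0, x1, page_height)


def make_region_visitor(region: Region, parts: List[str]) -> Callable:
    """Build a visitor that keeps only text positioned inside `region`."""
    x0, y0, x1, y1 = region

    def visitor(text, cm, tm, font_dict, font_size):
        # tm[4], tm[5] are the x/y translation of the text matrix.
        x, y = tm[4], tm[5]
        if x0 <= x <= x1 and y0 <= y <= y1:
            parts.append(text)

    return visitor


def extract_region_text(pdf_path: str, page_index: int = 0,
                        width_in: float = 3.0, height_in: float = 4.0) -> str:
    """Hypothetical wrapper: extract text from the top-left region of a page."""
    from pypdf import PdfReader  # third-party; imported here on purpose

    page = PdfReader(pdf_path).pages[page_index]
    region = top_left_region(float(page.mediabox.width),
                             float(page.mediabox.height),
                             width_in, height_in)
    parts: List[str] = []
    page.extract_text(visitor_text=make_region_visitor(region, parts))
    return "".join(parts)
```

Calling `extract_region_text('path')` would then return only the fragments whose starting position falls inside the rectangle. A fragment that starts inside the region but runs past its right edge is still kept whole, so the result is approximate.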

This topic holds examples for pyPdf, the previous version of PyPDF2, but the syntax is similar. There are also examples of how to iterate through indirect objects.

A good place to start is the source of the function pageObj.extractText() that you used.

If you are not restricted to Python, see: How to extract text from a PDF?
