How to extract text from a specific area in a PDF using Python?
I'm trying to extract text from a PDF using Python, and I have done it with PyPDF2 like this:
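```python
import PyPDF2

# Legacy PyPDF2 (< 3.0) API; PyPDF2 3.x and pypdf name these
# PdfReader, reader.pages[0] and page.extract_text()
with open('path', 'rb') as pdfFileObj:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    pageObj = pdfReader.getPage(0)
    print(pageObj.extractText())
```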
This extracts the text of the whole page, but I want to extract only the text inside a rectangular region of 3"x4" (inches) at the top-left of the page.
In short: how do I extract text from a PDF document within a specific rectangular region, in Python?
Can this be done with PyPDF2 or another Python library?
This is a rather complex topic, but it is possible. First you need to be familiar with the PDF format specification.
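One detail of the format matters immediately: PDF user space measures positions in points (1/72 inch), and y grows upward, so "top-left" means small x and large y. A US Letter page's MediaBox is typically [0 0 612 792]. Below is a minimal sketch of turning the 3 x 4 inch region into user-space coordinates. `top_left_region` is a hypothetical helper, and it assumes 3 inches is the width and 4 the height, takes the page box as (left, bottom, right, top), and ignores page rotation and the CropBox.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 unit = 1/72 inch


def top_left_region(page_box, width_in, height_in):
    """Return (x0, y0, x1, y1) in points for a width_in x height_in inch
    rectangle anchored at the top-left corner of page_box.

    page_box is (left, bottom, right, top), e.g. a page's MediaBox.
    Assumption: the page is not rotated. Because y grows upward,
    the top edge of the page is the largest y value.
    """
    left, bottom, right, top = page_box
    x0 = left
    x1 = left + width_in * POINTS_PER_INCH
    y1 = top
    y0 = top - height_in * POINTS_PER_INCH
    return (x0, y0, x1, y1)


# US Letter MediaBox: 612 x 792 points (8.5 x 11 inches)
print(top_left_region((0, 0, 612, 792), 3, 4))  # (0, 504, 216, 792)
```

Any text whose position falls inside that rectangle is a candidate for the region you want.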
Start here, for example.
You can identify the location and contents of the text boxes and extract the string data.
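A minimal sketch of that idea, assuming the pypdf package (the maintained successor of PyPDF2) is installed: `PageObject.extract_text` accepts a `visitor_text` callback that receives each text chunk together with the current transformation matrix (`cm`) and the text matrix (`tm`). The helpers `text_origin`, `in_rect` and `extract_text_in_rect` are hypothetical names. This keeps a chunk only when its starting point lies inside the rectangle, so a chunk that starts inside but runs past the edge is kept whole, and the reported positions can be approximate for unusual PDFs.

```python
def text_origin(cm, tm):
    """Map the text-space origin (0, 0) to user space.

    cm and tm are PDF matrices [a, b, c, d, e, f]. The origin is first
    moved by the text matrix, giving (tm[4], tm[5]), then by the CTM.
    """
    a, b, c, d, e, f = cm
    tx, ty = tm[4], tm[5]
    return (a * tx + c * ty + e, b * tx + d * ty + f)


def in_rect(point, rect):
    """True if point (x, y) lies inside rect (x0, y0, x1, y1)."""
    x, y = point
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1


def extract_text_in_rect(path, page_index, rect):
    """Concatenate the text chunks whose starting point falls inside rect."""
    # Assumes pypdf is installed; imported here so the helpers above
    # can be used without it.
    from pypdf import PdfReader

    page = PdfReader(path).pages[page_index]
    parts = []

    def visitor(text, cm, tm, font_dict, font_size):
        # Called by pypdf once per text chunk during extraction
        if in_rect(text_origin(cm, tm), rect):
            parts.append(text)

    page.extract_text(visitor_text=visitor)
    return "".join(parts)
```

For a US Letter page, something like `extract_text_in_rect("example.pdf", 0, (0, 504, 216, 792))` (hypothetical file name) would target the top-left 3 x 4 inch area. Line breaks in the result may not match the visual layout exactly.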
This topic has examples for pyPdf, the predecessor of PyPDF2; the syntax is similar. It also has examples of how to iterate through indirect objects.
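As a rough illustration of indirect objects with the current pypdf API (an assumption: pypdf is installed; `describe_page_dictionary` is a hypothetical helper), this walks one page dictionary and resolves each value. An indirect reference such as `5 0 R` points to an object stored elsewhere in the file, and `get_object()` follows it; direct objects return themselves.

```python
def describe_page_dictionary(path, page_index=0):
    """Map each key of a page dictionary to (stored type, resolved type)."""
    # Assumes the pypdf package is installed.
    from pypdf import PdfReader

    page = PdfReader(path).pages[page_index]
    summary = {}
    for key, value in page.items():
        # get_object() resolves an IndirectObject to its target;
        # any direct object simply returns itself.
        resolved = value.get_object()
        summary[key] = (type(value).__name__, type(resolved).__name__)
    return summary
```

Entries such as `/Contents` and `/Resources` are the ones you would follow to reach the text drawing operations and fonts.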
A good place to start is the source code of the pageObj.extractText() function used above.
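The old pyPdf `extractText()` works by parsing the page's content stream into `(operands, operator)` pairs and reacting to text operators. A minimal sketch of that loop with pypdf's `ContentStream` (assumes pypdf is installed; `list_text_operations` and `TEXT_OPERATORS` are hypothetical names) shows the positioning operators (`Tm`, `Td`, `TD`, `T*`) next to the strings (`Tj`, `TJ`), which is the information a region filter needs. The `cm` operator also moves text and is not tracked here, and string operands may be in a font-specific encoding rather than readable text.

```python
# Text-related content-stream operators from the PDF specification:
# BT/ET begin/end a text object, Tm/Td/TD/T* set the text position,
# Tj/TJ/'/" show strings.
TEXT_OPERATORS = {b"BT", b"ET", b"Tm", b"Td", b"TD", b"T*", b"Tj", b"TJ", b"'", b'"'}


def list_text_operations(path, page_index=0):
    """Yield (operator, operands) for each text operator on one page."""
    # Assumes the pypdf package is installed.
    from pypdf import PdfReader
    from pypdf.generic import ContentStream

    reader = PdfReader(path)
    contents = reader.pages[page_index].get_contents()
    if contents is None:  # page without a content stream
        return
    # ContentStream parses the raw stream into (operands, operator) pairs,
    # the same structure pyPdf's extractText() iterates over.
    for operands, operator in ContentStream(contents, reader).operations:
        if operator in TEXT_OPERATORS:
            yield operator.decode("latin-1"), operands
```

Printing the pairs from `list_text_operations("example.pdf")` (hypothetical file) gives a quick view of where each string is placed before you write any filtering logic.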
If you are not restricted to Python, see: How to extract text from a PDF?
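On the "another Python library" part of the question: pdfplumber (built on pdfminer.six) can crop a page to a bounding box before extracting text. A minimal sketch, assuming pdfplumber is installed and the region is 3 inches wide by 4 inches tall; `top_left_bbox` and `extract_top_left_text` are hypothetical helpers. Note that pdfplumber boxes are `(x0, top, x1, bottom)` with `top` and `bottom` measured downward from the top of the page, unlike raw PDF coordinates.

```python
POINTS_PER_INCH = 72


def top_left_bbox(page_bbox, width_in, height_in):
    """Return a pdfplumber-style (x0, top, x1, bottom) box in points for a
    width_in x height_in inch area at the top-left of page_bbox."""
    x0, top = page_bbox[0], page_bbox[1]
    return (x0, top, x0 + width_in * POINTS_PER_INCH, top + height_in * POINTS_PER_INCH)


def extract_top_left_text(path, width_in=3, height_in=4, page_index=0):
    # Assumes pdfplumber is installed (pip install pdfplumber).
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_index]
        # crop() keeps only objects inside (or overlapping) the box
        region = page.crop(top_left_bbox(page.bbox, width_in, height_in))
        return region.extract_text()
```

If you want only characters lying entirely inside the box, pdfplumber's `within_bbox` can be used in place of `crop`.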