Parsing Index page in a PDF text book with Python

Published: 25/09/2018
Source: STACKOVERFLOW.COM

I need to extract text from PDF pages into a CSV file exactly as it appears, preserving the indentation. Index page from the PDF textbook: I need to split the text into a class and subclass hierarchy along with the page numbers. For example, in the image, Application server is the class and Apache Tomcat is the subclass, on page 275. This is the expected output of the CSV: I have used the Tika parser to parse the
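
The class and subclass split described above could be sketched roughly as follows. This is a minimal sketch, not the asker's code. It assumes the extracted text keeps leading whitespace for subentries, that an unindented line is a class, and that the page number, when present, is the last token on the line. The names `parse_index` and `rows_to_csv` and the column layout `class,subclass,page` are hypothetical, since the expected CSV in the question is only shown as an image.

```python
import csv
import io
import re

# Assumed line shape: optional indentation, a title, then an optional
# trailing page number (possibly preceded by spaces, commas, or dot leaders).
ENTRY = re.compile(r"^(?P<indent>\s*)(?P<title>.*?)[\s,.]*(?P<page>\d+)?\s*$")


def parse_index(text):
    """Return (class, subclass, page) tuples from indented index text."""
    rows = []
    current_class = ""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = ENTRY.match(line)
        title = match.group("title")
        page = match.group("page") or ""
        if match.group("indent"):
            # Indented line: a subclass of the most recent class.
            rows.append((current_class, title, page))
        else:
            # Unindented line: start a new class.
            current_class = title
            rows.append((title, "", page))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with an assumed header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["class", "subclass", "page"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample built from the example in the question; the indentation width
    # is an assumption about what the PDF extraction step produces.
    sample = "Application server\n    Apache Tomcat 275\n"
    print(rows_to_csv(parse_index(sample)))
```

The sketch only works if the extraction step really preserves indentation. If the parser flattens leading whitespace, the hierarchy would have to be recovered some other way, for example from text coordinates.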
