Thread: PDF Input
04-25-2010, 01:02 AM   #5
asjogren
Addict
Posts: 266
Karma: 1378
Join Date: Dec 2009
Location: Seattle / San Carlos, Sonora, Mexico
Device: Kindle & WiFi Nook & PocketBook IQ
Partial Success! Thank you SpeakingToHe!

Some observations:
1) Nothing seems to restrict header removal to the PAGE HEADERS themselves in PDF input. I get false positives: matching text within the body of the book is removed as well.
2) The case of the text to match is its case after conversion to XHTML, not its case in the source PDF.
3) Even though the source PDF had the page headings centered on the line, they were no longer centered WHEN the pattern matching was applied.
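
To illustrate observation 1, here is a minimal Python sketch of why an unanchored pattern also hits body text, and how anchoring it to a whole line narrows the match. This is a standalone example, not the converter's own code: the sample text and the title "my book title" are made up, and it assumes each header sits alone on its own line in the converted text, which may not hold for every PDF.

```python
import re

# Hypothetical converted text: one header line, then a body sentence
# that happens to contain the same words.
converted = "my book title\nShe said my book title was too long.\n"

# Unanchored: matches the words anywhere, so the body text is hit too.
unanchored = re.compile(r"my book title", re.IGNORECASE)

# Anchored: the words must make up the whole line (optional blanks around them).
anchored = re.compile(r"^[ \t]*my book title[ \t]*$", re.IGNORECASE | re.MULTILINE)

loose = unanchored.sub("", converted)
strict = anchored.sub("", converted)

print(repr(loose))
print(repr(strict))
```

The re.IGNORECASE flag is used here so the same pattern works whichever case the text ends up in after conversion.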

What I had was centered page headings that alternated between odd and even pages. The odd pages had page numbers with blanks between the digits, for example "2 1" and "3 5 1".

The even-numbered pages had the book title as the page heading, in upper case, with spaces between the letters and multiple spaces between the words, like "T H E T I T L E OF T H E B O O K". However, I had to match the lower-case form of the book title, with the extra spaces.
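
For anyone with similar headings, the two kinds of header described above could be matched with patterns along these lines. This is only a sketch in plain Python: the names PAGE_NUMBER, spaced_title_pattern and strip_headers are made up for the example, the title is the placeholder from this post, and it assumes each heading ends up alone on its own line after conversion.

```python
import re

# Odd pages: digits separated by single blanks, alone on a line ("2 1", "3 5 1").
PAGE_NUMBER = re.compile(r"^[ \t]*\d(?: \d)*[ \t]*$", re.MULTILINE)


def spaced_title_pattern(title):
    """Build a pattern for the lower-case title with spaced-out letters."""
    words = title.lower().split()
    # Letters within a word may be separated by one blank ("t h e" or "of").
    spaced_words = [" ?".join(re.escape(ch) for ch in word) for word in words]
    # Words are separated by one or more blanks.
    body = " +".join(spaced_words)
    return re.compile(r"^[ \t]*" + body + r"[ \t]*$", re.MULTILINE)


def strip_headers(text, title):
    """Blank out page-number lines and spaced-title lines, leaving body text."""
    text = PAGE_NUMBER.sub("", text)
    return spaced_title_pattern(title).sub("", text)


sample = (
    "body line\n"
    "2 1\n"
    "the title of the book appears in a sentence here\n"
    "t h e  t i t l e  of  t h e  b o o k\n"
    "3 5 1\n"
    "last line"
)
cleaned = strip_headers(sample, "The Title of the Book")
print(cleaned)
```

Because both patterns must match a whole line, the sentence that mentions the title inside the body is left alone. Note that a line consisting of a title and nothing else would still be removed.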