MobileRead Forums - View Single Post

asjogren · 04-25-2010, 02:02 AM

Partial Success! Thank you SpeakingToHe!

Some observations:
1) Nothing seems to restrict page-header removal to ONLY PAGE HEADERS in PDF input. There are false positives within the body of the book, where matching text is removed too.
2) The pattern has to match the case of the text as it is after conversion to XHTML, not as it appears in the PDF.
3) Even though the source PDF had the page headings centered on the line, the centering was no longer there WHEN the pattern matching was applied.

What I had was centered page headings that alternated between odd and even pages. The odd pages had page numbers with blanks between the digits, for example "2 1" and "3 5 1".

The even-numbered pages had the book title as the page heading, in upper case with spaces between the letters and multiple spaces between the words, like "T H E T I T L E OF T H E B O O K". However, I had to match the lower-case form of the book title, still with the extra spaces.
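
For anyone trying the same thing, here is a rough sketch of the two patterns in plain Python `re`. This is not how calibre applies its header-removal expression internally; the function names, the sample text and the title "the title of the book" are made up for illustration. It assumes each header ends up alone on its own line after conversion, and anchoring the patterns to the whole line is one way to cut down the false positives in the body text.

```python
import re

# Page numbers with blanks between the digits ("2 1", "3 5 1"), alone on a line.
PAGE_NUMBER = re.compile(r"^[ \t]*\d(?: \d)*[ \t]*$", re.MULTILINE)


def spaced_title_pattern(title):
    """Build a whole-line pattern for a lower-case title whose letters
    may be separated by blanks and whose words are separated by one
    or more blanks. The title is hypothetical, supplied by the caller."""
    words = [
        r"[ \t]*".join(re.escape(ch) for ch in word)
        for word in title.lower().split()
    ]
    return re.compile(r"^[ \t]*" + r"[ \t]+".join(words) + r"[ \t]*$", re.MULTILINE)


def strip_headers(text, title):
    """Blank out lines that consist only of a page number or the title."""
    text = PAGE_NUMBER.sub("", text)
    return spaced_title_pattern(title).sub("", text)


# Made-up sample: two header lines and two body lines that contain
# the same words and digits as the headers.
sample = (
    "t h e   t i t l e   of   t h e   b o o k\n"
    "He read the title of the book aloud.\n"
    "3 5 1\n"
    "In 2 1 days they left.\n"
)
cleaned = strip_headers(sample, "the title of the book")
```

With the `^...$` anchors, the two body lines are left alone even though they contain the title words and the digits "2 1". Without the anchors, both would be hit, which is the false-positive problem from observation 1.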
