MobileRead Forums - View Single Post - PDF to ePub conversion issue

deadSkip · 06-25-2010, 02:55 PM

I'm hoping someone can give me some pointers on where I'm going wrong here. I'm trying to convert a PDF into ePub, but no matter what I do, the header text is left in. According to both the wizard and RegexBuddy, both headers are matched, but when I run the conversion they're still there.

Here's an example from the debug output.

Input\Index.html Text:

Code:

will soon manifest themselves.”&nbsp;<br>
“I sense nothing.”&nbsp;<br>
<hr>
<A name=12></a>2&nbsp;<br>
Richard A. Knaak&nbsp;<br>
“Your skills are not honed as mine are, my lord, but that&nbsp;<br>

Regex:

Code:

(?i)(?<=)<hr>\s*<A name=/d+></a>(/d+&nbsp;<br>\sRichard A\. Knaak|Moon of the Spider&nbsp;<br>\s/d+)&nbsp;<br>

Doing this still leaves the header text in. I did the same thing with the parsed file and it's still left in.
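One way to check the input-side pattern outside the converter is to run it against the sample in Python. This is only a sketch, and it rests on a few assumptions: the sample is the six lines quoted above with plain `\n` line endings, the empty `(?<=)` lookbehind is dropped because it asserts nothing, and the variable names are made up for illustration. It compares the pattern as written, where `/d+` means a literal slash followed by one or more `d` characters, with a variant that uses the digit class `\d+`. If the forward slashes are only a typo in the post and not in the actual conversion settings, this does not explain the problem.

```python
import re

# Sample copied from the Input\Index.html excerpt (assumed "\n" line endings).
SAMPLE = (
    "will soon manifest themselves.”&nbsp;<br>\n"
    "“I sense nothing.”&nbsp;<br>\n"
    "<hr>\n"
    "<A name=12></a>2&nbsp;<br>\n"
    "Richard A. Knaak&nbsp;<br>\n"
    "“Your skills are not honed as mine are, my lord, but that&nbsp;<br>\n"
)

# Pattern as posted, minus the empty lookbehind. "/d+" is a literal "/"
# followed by one or more "d" characters, not a digit class.
AS_POSTED = (
    r"(?i)<hr>\s*<A name=/d+></a>"
    r"(/d+&nbsp;<br>\sRichard A\. Knaak|Moon of the Spider&nbsp;<br>\s/d+)"
    r"&nbsp;<br>"
)

# Same pattern with "\d+" (one or more digits) in place of "/d+".
WITH_DIGIT_CLASS = (
    r"(?i)<hr>\s*<A name=\d+></a>"
    r"(\d+&nbsp;<br>\sRichard A\. Knaak|Moon of the Spider&nbsp;<br>\s\d+)"
    r"&nbsp;<br>"
)

as_posted_result = re.sub(AS_POSTED, "", SAMPLE)
digit_class_result = re.sub(WITH_DIGIT_CLASS, "", SAMPLE)

print("as posted changed the text: ", as_posted_result != SAMPLE)
print("digit class changed the text:", digit_class_result != SAMPLE)
```

On this sample the as-posted pattern leaves the text unchanged, while the `\d+` variant removes the `<hr>`, the anchor, the page number, and the author line.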

Parsed\Index.html:

Code:

themselves.” </p><p>
“I sense nothing.” </p><p>
2 </p><p>
Richard A. Knaak </p><p>
“Your skills are not honed as mine are, my lord, but that shall be remedied soon enough, yes?” </p><p>

Regex:

Code:

(?i)(?<=)(Moon of the Spider\s*</p><p>\s\d+\s*</p><p>|\s\d+\s</p><p>\sRichard A\. Knaak\s*</p><p>)
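The parsed-file pattern can be checked the same way. The sketch below is hedged: it assumes the parsed sample is exactly the five lines quoted above, and the CRLF variant is purely hypothetical, since the post does not say which line endings the real file uses. It illustrates that a bare `\s` matches exactly one whitespace character, so the pattern can match with `\n` line endings yet miss when two whitespace characters (such as `\r\n`) sit between `</p><p>` and the next line. The `LOOSENED` pattern is one possible relaxation, not a confirmed fix, and the empty `(?<=)` is again omitted because it asserts nothing.

```python
import re

# Sample copied from the Parsed\Index.html excerpt (assumed "\n" line endings).
PARSED_LF = (
    "themselves.” </p><p>\n"
    "“I sense nothing.” </p><p>\n"
    "2 </p><p>\n"
    "Richard A. Knaak </p><p>\n"
    "“Your skills are not honed as mine are, my lord, "
    "but that shall be remedied soon enough, yes?” </p><p>\n"
)

# Hypothetical variant: the same text with Windows-style line endings.
PARSED_CRLF = PARSED_LF.replace("\n", "\r\n")

# Pattern as posted, minus the empty lookbehind.
AS_POSTED = (
    r"(?i)(Moon of the Spider\s*</p><p>\s\d+\s*</p><p>"
    r"|\s\d+\s</p><p>\sRichard A\. Knaak\s*</p><p>)"
)

# Illustrative relaxation: every single "\s" becomes "\s*".
LOOSENED = (
    r"(?i)(Moon of the Spider\s*</p><p>\s*\d+\s*</p><p>"
    r"|\s*\d+\s*</p><p>\s*Richard A\. Knaak\s*</p><p>)"
)

posted_matches_lf = re.search(AS_POSTED, PARSED_LF) is not None
posted_matches_crlf = re.search(AS_POSTED, PARSED_CRLF) is not None
loosened_lf = re.sub(LOOSENED, "", PARSED_LF)
loosened_crlf = re.sub(LOOSENED, "", PARSED_CRLF)

print("as posted, LF:  ", posted_matches_lf)
print("as posted, CRLF:", posted_matches_crlf)
print("loosened removes header (CRLF):", "Richard A. Knaak" not in loosened_crlf)
```

If the as-posted pattern matches the `\n` sample but not the `\r\n` one, whitespace differences between the test tool and the converter's actual input are worth ruling out.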

And yes, I've remembered to check the Remove Header boxes.
