01-26-2011, 07:41 AM   #1
michaelbr
Connoisseur
Posts: 77
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
PDF to HTML page break questions

I'm trying to convert a PDF to ePub. This is what I did:
1- Saved the PDF as "HTML 4.0 with CSS 1.0".
After saving it, I opened the file in Firefox and noticed that some paragraphs are broken in mid-sentence, with a blank gap between the two halves, as below:
Quote:
...that it was not alive, but her

fear remained...
Then I opened the HTML file in Notepad++ and noticed that "fear remained..." is wrapped in its own <p> tag. On further analysis, I saw that "fear remained..." starts a new page in the PDF, so I checked other page starts. It seems that Acrobat sometimes inserts a <p> at a page break and sometimes does not. Does anyone know how to fix this without having to edit each one by hand?

This is what I have in my HTML file:
Quote:
<P><SPAN>.....</SPAN
></P>
<P style="text-align:center; margin-left:0px">
<SPAN style="font-size:12pt; font-weight:normal; color:#000000"
>fear remained. For it was the first artificial object that she had ever seen. </SPAN
></P>
Some paragraphs are not broken at the page break, and for those I have:
Quote:
<P><SPAN>.....</SPAN
>
<SPAN style="font-size:12pt; font-weight:normal; color:#000000"
>fear remained. For it was the first artificial object that she had ever seen. </SPAN
></P>
PS: My environment is Windows XP with Acrobat Pro 8.

Thanks for any comments or suggestions.
Michael
01-26-2011, 01:43 PM   #2
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Have you tried using pdfreflow instead?

The only other thing I could suggest would be a regex find and replace. The exact syntax would vary by editor, and you'd have to figure out the rule. Would it be: if a paragraph ends with a lowercase letter and the next one begins with one, merge the paragraphs?
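As a rough sketch of that rule in Python rather than an editor's find and replace: the pattern below assumes the Acrobat output looks like the snippets in post #1 (uppercase tags, one <SPAN> per paragraph, a line break inside the closing tag). The name merge_broken_paragraphs and the sample text are made up for illustration. The rule is only a heuristic: it will miss a page break where the next word is capitalized (a proper noun, say) or where the previous page ends in a comma or hyphen, and it can wrongly join short unpunctuated paragraphs.

```python
import re

# A paragraph whose text ends in a lowercase letter, followed by a new <P ...>
# whose first <SPAN ...> text starts with a lowercase letter.
# (?i:...) makes only the tag names case-insensitive; the letter tests
# themselves stay case-sensitive.
BREAK = re.compile(
    r"([a-z])\s*(?i:</span\s*>\s*</p\s*>\s*<p\b[^>]*>)\s*"
    r"(?=(?i:<span\b[^>]*>)\s*[a-z])"
)


def merge_broken_paragraphs(html: str) -> str:
    """Join paragraphs that look as if they were split in mid-sentence."""
    # Keep the last letter, add one space, close the first span, and let the
    # second span continue inside the same paragraph.
    return BREAK.sub(r"\1 </SPAN>\n", html)


if __name__ == "__main__":
    sample = (
        '<P><SPAN>that it was not alive, but her</SPAN\n></P>\n'
        '<P style="text-align:center; margin-left:0px">\n'
        '<SPAN style="font-size:12pt"\n>fear remained. For it was the first.</SPAN\n></P>\n'
        '<P><SPAN>A new paragraph starts here.</SPAN\n></P>\n'
    )
    print(merge_broken_paragraphs(sample))
```

The merged result has the same shape as the "unbroken" snippet in post #1: two spans inside a single <P>. It is worth running this on a copy of the file and checking the result in a browser before using it for the ePub.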
01-26-2011, 07:34 PM   #3
michaelbr
Quote:
Originally Posted by frabjous View Post
Have you tried using pdfreflow instead?

The only other thing I could suggest would be a reg ex find and replace. The exact syntax would vary by editor, and you'd have to figure out the rule -- would it be, if a paragraph ends with a lowercase letter, and the next one begins with one, merge the paragraphs?
Thanks frabjous. No, I haven't tried pdfreflow; it seems interesting, so I'll give it a try and post back. I thought about using regex, but I've heard it gets very messy when fixing problems in HTML, so I'll try pdfreflow first.
01-27-2011, 08:49 PM   #4
michaelbr
Quote:
Originally Posted by frabjous View Post
Have you tried using pdfreflow instead?

The only other thing I could suggest would be a reg ex find and replace. The exact syntax would vary by editor, and you'd have to figure out the rule -- would it be, if a paragraph ends with a lowercase letter, and the next one begins with one, merge the paragraphs?
Thanks frabjous, pdfreflow worked like a charm, though I think there are some bugs:
1) There's a </body> at the top of the generated HTML file. I think this should be an opening <body> rather than a closing one (further down there's another </body>).
2) Sometimes a closing </p> is missing. For instance, a chapter sometimes has a closing </p> tag and sometimes doesn't.
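Until that is fixed, the first problem can be patched after the fact. This is a minimal sketch, assuming the generated file has no opening <body> at all and at least two closing </body> tags, as described in 1); fix_stray_body_close is a made-up helper name and not part of pdfreflow. It does not touch the missing </p> tags from 2).

```python
import re

OPEN_BODY = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
CLOSE_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def fix_stray_body_close(html: str) -> str:
    """Turn the first of several </body> tags into <body> when no <body> exists."""
    closes = list(CLOSE_BODY.finditer(html))
    # Leave the file alone if it already opens <body>, or if there is only
    # one closing tag (then it is presumably the real one).
    if OPEN_BODY.search(html) or len(closes) < 2:
        return html
    first = closes[0]
    return html[:first.start()] + "<body>" + html[first.end():]


if __name__ == "__main__":
    broken = "<html><head><title>t</title></head>\n</body>\n<p>text</p>\n</body></html>"
    print(fix_stray_body_close(broken))
```

Running it on an already correct file returns the file unchanged, so it should be safe to apply to every generated file.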