Old 12-10-2010, 02:14 PM   #1
xxx666yyy777
Junior Member
Posts: 6
Karma: 10
Join Date: Dec 2010
Device: iPad
Bad PDF to ePub Conversion

Hi,

I have been trying to convert several PDFs to the EPUB format (for iPad, all other settings default). As a result, the PDF page breaks are lost, and page numbers have been inserted into the resulting EPUB where the old PDF page breaks used to be. However, the page numbers are NOT aligned with the EPUB's page breaks. So I now have different page breaks than the original PDF, and page numbers in the middle of pages and paragraphs.

Am I missing something?

Thx.
Old 12-10-2010, 02:44 PM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
PDF conversion automatically removes all the original page breaks. To remove the page numbers, footers, etc., you need to use the remove header/footer regex option under Structure detection. You need to write a regular expression to do this, because every PDF is a bit different with respect to page numbers, headers and footers.
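
As a rough illustration of what such an expression does, here is a minimal Python sketch. The sample text, the running title "My Book Title" and both patterns are made up for the example; a real PDF needs patterns matched to its own headers and footers, and this is not how calibre itself is implemented.

```python
import re

# Hypothetical text as it might come out of a PDF: a bare page number and a
# running header have landed in the middle of a sentence.
sample = (
    "The storm broke just as we reached the\n"
    "42\n"
    "My Book Title\n"
    "harbour, and nobody spoke for a while.\n"
)

# Assumed footer shape: a line holding nothing but a 1-4 digit page number.
FOOTER = re.compile(r"^[ \t]*\d{1,4}[ \t]*\n", re.MULTILINE)
# Assumed header shape: the (made-up) running title alone on a line.
HEADER = re.compile(r"^My Book Title[ \t]*\n", re.MULTILINE)

def strip_headers_footers(text):
    """Delete lines that match the footer or header pattern."""
    text = FOOTER.sub("", text)
    return HEADER.sub("", text)

cleaned = strip_headers_footers(sample)
print(cleaned)
```

The point is only that the pattern has to describe what the header or footer looks like in that particular file, which is why no single expression works for every PDF.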
Old 12-10-2010, 03:22 PM   #3
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by xxx666yyy777 View Post
Hi,
....
Am I missing something?
Yes, but it's not your fault. PDF is based on PostScript, a great printer language and a lousy ebook format.

Quote:
pdf page breaks are lost
Page breaks in a PDF fall wherever there's not enough room on the paper to display the next line of text. They land in the middle of sentences, etc., so you don't normally want them in an ebook. Your breaks weren't "lost"; they were removed. Calibre can often put in the breaks you want by looking for certain words, like "Chapter."
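
To make the idea of breaking on a word like "Chapter" concrete, here is a small Python sketch. It is only an illustration of the general approach, not calibre's own chapter detection; the heading pattern, the function name `split_at_chapters` and the sample text are all assumptions made for the example.

```python
import re

# Assumed heading shape: a line that starts with "Chapter" and a number.
CHAPTER = re.compile(r"^Chapter\s+\d+\b.*$", re.MULTILINE)

def split_at_chapters(text):
    """Cut the text into pieces, each one starting at a chapter heading."""
    starts = [m.start() for m in CHAPTER.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any front matter as its own piece
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

sample = "Title page\nChapter 1\nIt began.\nChapter 2\nIt ended.\n"
pieces = split_at_chapters(sample)
for piece in pieces:
    print(repr(piece))
```

Each piece could then be started on a new page in the ebook, which gives breaks where a reader expects them rather than where the paper ran out.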

Quote:
page numbers, where the old pdf page breaks used to be, have been inserted into the resulting ePub,
These weren't "inserted." They were there from the start. The PDF format makes no distinction between text in the header, footer or page number and regular text in the body of the book. In fact, PDF has no real concept of a sentence. It just puts characters at specified locations on the paper. Converters like Calibre have to try to reconstruct sentences by looking for periods and capital letters.
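
A toy version of that kind of reconstruction might look like the Python sketch below. It assumes the text has already been extracted as a list of lines, and it uses a deliberately crude rule (end punctuation followed by a capital letter means a new paragraph), so it will misfire on real books. The function name `unwrap` and the sample lines are made up for the example.

```python
def unwrap(lines):
    """Join hard-wrapped lines, using punctuation and capitals as clues."""
    paragraphs = []
    current = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Previous line ended a sentence and this one starts with a capital:
        # guess that a new paragraph begins here.
        if current and current[-1] in '.!?"”' and line[0].isupper():
            paragraphs.append(current)
            current = line
        else:
            # Otherwise assume the sentence simply wrapped at the page edge.
            current = f"{current} {line}".strip()
    if current:
        paragraphs.append(current)
    return paragraphs

lines = [
    "It was late when the train",
    "finally arrived.",
    "Nobody was waiting on the",
    "platform.",
]
paragraphs = unwrap(lines)
print(paragraphs)
```

Every rule like this is a guess about what the author meant, which is why converted PDFs so often need manual cleanup.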

Quote:
however the page numbers are NOT aligned with the ePub document page breaks.
They wouldn't be. The page numbers in the text of the PDF corresponded to the paper size. You have to do some work to remove them.
Quote:
So, I now have different page breaks than the original pdf, and page numbers in the middle of pages/paragraphs.
PDF is a lousy ebook format. Your best option is not to start with a PDF. If you have no choice, then your second best option is to read the PDF and not convert. If you have to convert, then you're stuck trying to fix up the PDF in the conversion by setting the remove header/footer options and adjusting the unwrap factor.
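
To give a feel for what a length-based unwrap setting is doing, here is a hedged Python sketch. It is not calibre's actual algorithm; it only assumes a rule of the general kind "lines at least as long as some cutoff were probably wrapped by the page edge, so join them to the next line." The function name `unwrap_by_length`, the way the cutoff is derived from `factor`, and the sample lines are all assumptions for illustration.

```python
def unwrap_by_length(lines, factor=0.4):
    """Remove the break after any line long enough to look hard-wrapped.

    The cutoff is the length found `factor` of the way through the sorted
    line lengths, so a larger factor unwraps fewer lines.
    """
    lengths = sorted(len(line) for line in lines)
    if not lengths:
        return ""
    cutoff = lengths[min(int(len(lengths) * factor), len(lengths) - 1)]
    out = []
    for line in lines:
        out.append(line)
        # Long lines are joined to the next one; short lines end a paragraph.
        out.append(" " if len(line) >= cutoff else "\n")
    return "".join(out).rstrip()

lines = [
    "The quick brown fox jumps over",
    "the lazy dog and keeps running",
    "home.",
    "A new paragraph starts on this",
    "line and ends here.",
]
result = unwrap_by_length(lines, factor=0.5)
print(result)
```

Adjusting the factor trades one kind of mistake for another: too low and real paragraph breaks get merged, too high and sentences stay chopped at the old line ends.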

Good luck.
Old 12-10-2010, 03:25 PM   #4
JSWolf
Resident Curmudgeon
Posts: 79,575
Karma: 145863177
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by xxx666yyy777 View Post
Hi,

I have been trying to convert several PDFs to the EPUB format (for iPad, all other settings default). As a result, the PDF page breaks are lost, and page numbers have been inserted into the resulting EPUB where the old PDF page breaks used to be. However, the page numbers are NOT aligned with the EPUB's page breaks. So I now have different page breaks than the original PDF, and page numbers in the middle of pages and paragraphs.

Am I missing something?

Thx.
The problem is that PDF is not a good format to convert from. It cannot be converted to anything else without errors. So really, treat PDF like it doesn't exist and life will be a lot easier to live.
Old 12-10-2010, 03:30 PM   #5
xxx666yyy777
Thanks for all the great responses. I understand now...!

Thx.
Old 12-10-2010, 04:02 PM   #6
leday
Groupie
Posts: 171
Karma: 400
Join Date: Jun 2009
Device: Sony PRS-700, Nook Color
Quote:
Originally Posted by ldolse View Post
PDF conversion automatically removes all the original page breaks. To remove the page numbers, footers, etc., you need to use the remove header/footer regex option under Structure detection. You need to write a regular expression to do this, because every PDF is a bit different with respect to page numbers, headers and footers.
How do you write the expression? It seems complicated to me when I look at it.
Old 12-10-2010, 05:07 PM   #7
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by leday View Post
How do you write the expression? It seems complicated to me when I look at it.
They seem a little scary at first, but are not really that complicated to write; reading them is a different matter. There's a tutorial available in the manual.
Old 12-10-2010, 06:15 PM   #8
leday
Thanks. This really helps. Now, the only problem I have is that after the conversion, it is closing up words every so often, as in the example below. Is there any way to keep it from doing this? I am having to go through TONS of words to separate them.

“Right. That’s it. I officially call an end to today. I’mgoing home and going to bed until it’s over.” I rolled overonto my side and propped myself up on one hand. A pairof shoes appeared next to me, attached to a woman’s legs.I followed the legs up to the rest of the person.

He turned away, his voice as smooth and polished as it had been the first time I’d met him. “You will tell me with whom you are working, or I will break your body, corruptyour soul, and banish you to an eternity of torment.”
Every inch of my body broke out into a terrified coldsweat as I frantically looked around the room, desperatefor some way to escape, or something I could do todistract Ariton long enough to get away
Old 12-10-2010, 11:21 PM   #9
ldolse
Words running into each other is probably unique to that PDF, though it's not a totally uncommon problem. As Starson17 was saying, it's all PostScript; the PDF itself isn't aware of 'words' per se, just a bunch of letters. When those strings are converted to text, it's likely there is too little spacing, or some other odd bit of formatting, which prevents the conversion engine from recognizing the space and retaining it.
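
A simplified model of the problem, in Python: suppose each character arrives with only a position and a width, and the converter has to guess that a space was intended whenever the gap to the next character is wide enough. The `Glyph` type, the 5-point character width, the 1.5-point threshold and the `layout` helper are all hypothetical values chosen for the demonstration; real extraction engines are more involved.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page, in points
    width: float  # advance width, in points

def glyphs_to_text(glyphs, gap_threshold=1.5):
    """Insert a space wherever the gap between glyphs exceeds the threshold."""
    out = []
    prev_end = None
    for g in glyphs:
        if prev_end is not None and g.x - prev_end > gap_threshold:
            out.append(" ")
        out.append(g.char)
        prev_end = g.x + g.width
    return "".join(out)

def layout(words, gap):
    """Place words left to right with a fixed gap (demo helper only)."""
    glyphs, x = [], 0.0
    for word in words:
        for ch in word:
            glyphs.append(Glyph(ch, x, 5.0))
            x += 5.0
        x += gap
    return glyphs

normal = glyphs_to_text(layout(["cold", "sweat"], gap=3.0))
tight = glyphs_to_text(layout(["cold", "sweat"], gap=1.0))
print(normal)  # gap wider than the threshold, so the space survives
print(tight)   # gap narrower than the threshold, so the words run together
```

With tightly set type, the gap between words can fall under whatever threshold the converter uses, and the space silently disappears.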
Old 12-11-2010, 02:00 AM   #10
leday
OK thanks. On looking further at several of my pdf file conversions it does seem that the problem is indeed unique to the particular pdf files that I have, and is there BEFORE conversion to epub.

Sigh....pdf is rather a pain isn't it....
Old 12-11-2010, 07:42 AM   #11
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
PDF is a pain to be sure. These problems are why any EPUB conversion, for instance, may need to be re-edited in Sigil or another program that can do search and replace and spell checks to make up for words run together. I am working on a book from the US Army Center for Military History that has bad problems with words run together. It also suffers from captions put in both graphically and textually, and the fancy captions are overlaid on the bottom of pictures with a black haze. Fine to print, heck to convert.
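
For anyone who would rather script part of that cleanup, here is a minimal Python sketch of the spell-check idea: flag a token that is not a known word but can be split into two known words. The function name `split_run_together` and the tiny vocabulary are assumptions for the example; a real pass would need a full word list, and the suggestions should still be reviewed by hand.

```python
def split_run_together(token, vocabulary):
    """Return every way to split token into two known words.

    Returns an empty list if the token is already a known word or if no
    two-word split is found.
    """
    if token.lower() in vocabulary:
        return []
    splits = []
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left.lower() in vocabulary and right.lower() in vocabulary:
            splits.append(f"{left} {right}")
    return splits

# Toy vocabulary; a real run would load a full dictionary instead.
vocabulary = {"cold", "sweat", "over", "onto", "to", "distract"}

for token in ["coldsweat", "overonto", "todistract", "sweat"]:
    print(token, "->", split_run_together(token, vocabulary))
```

A token with exactly one suggested split is a reasonable candidate for automatic replacement; anything with several candidates, or none, is better left for a manual look.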