06-14-2012, 09:38 PM   #1
daniel3ub
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2012
Device: Kindle 4NT
PDF pagebreaks turn to blank lines in mobi

Hi there. I am new to the forum, but I have some experience converting things in Calibre.

I know that PDF is a bad format and yadda yadda, but sometimes the only source one has is a PDF, so...

I've noticed that when converting from PDF to mobi, the PDF's page breaks become blank lines in the mobi, even if the option to "delete blank lines between paragraphs" is turned on. So even when the text itself comes out fine and flows well, there are blank lines that coincide with the PDF's page breaks.

Is this a feature or a bug? Or am I missing something?

Note: The PDF document I am talking about is a pure text document, without fancy formatting or images.

Thanks a lot!
06-14-2012, 09:51 PM   #2
JSWolf
Resident Curmudgeon
Posts: 73,668
Karma: 127838212
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Here is how to do it so it comes out correctly...

1. Use Calibre to convert the PDF to ePub (do not use the delete blank lines option ever).
2. Use Sigil to edit the resulting ePub.
3. A/B compare the PDF to the ePub (every letter, every space, every punctuation mark, everything).
4. Edit the ePub to fix the errors.
5. Edit the ePub to fix the formatting.
6. Validate the ePub using FlightCrew.
7. Convert to AZW3 using Calibre.

To do this right takes a lot of work.
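
For anyone who would rather script the two Calibre steps (1 and 7), here is a minimal sketch. It assumes calibre's `ebook-convert` command-line tool is installed and on the PATH; the helper names and the file names `book.pdf`, `book.epub` and `book.azw3` are made up for illustration. The manual steps 2 to 6 still happen in between.

```python
import shutil
import subprocess


def build_convert_cmd(src, dst, extra_args=()):
    """Build the argument list for calibre's ebook-convert CLI.

    The CLI takes the input file, the output file, then any options.
    """
    return ["ebook-convert", src, dst, *extra_args]


def convert(src, dst, extra_args=()):
    """Run one conversion, failing loudly if ebook-convert is missing."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    subprocess.run(build_convert_cmd(src, dst, extra_args), check=True)


# Hypothetical usage (file names are placeholders):
#   convert("book.pdf", "book.epub")    # step 1
#   ... edit, compare and validate the ePub by hand (steps 2 to 6) ...
#   convert("book.epub", "book.azw3")   # step 7
```

The output format is picked from the extension of the second file name, so the same helper covers both conversions.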
06-14-2012, 10:46 PM   #3
daniel3ub
Thanks.

What I do is convert the PDF to RTF, edit it, and then convert the edited RTF to mobi. Since there are no images or the like, it works well.

My question is: is the "insert blank lines in mobi when a page break in PDF is found" behavior a bug or a feature?

Should I file a bug report? It seems like a bug to me.

Thanks!
06-14-2012, 11:24 PM   #4
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
It's not a feature or a bug - it's just a limitation/expected behavior. Did you read the sticky?

https://www.mobileread.com/forums/sho...d.php?t=118605

Most PDFs don't exhibit the specific problem you're describing - odds are there is some sort of header/footer in your PDF that's tripping up the normal PDF conversion. Use search/replace to delete it.
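
As a rough illustration of that kind of search/replace, here is a small Python sketch. It assumes the running header is the book title followed by a page number on a line of its own; that layout and the function name are assumptions for the example, not something taken from the PDF in question.

```python
import re


def strip_running_header(text, title):
    """Remove lines that consist only of the title followed by a page number.

    Assumed header layout: "<title> <page number>" alone on a line.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(title) + r"[ \t]+\d+[ \t]*\n",
        re.MULTILINE,
    )
    return pattern.sub("", text)
```

A real header or footer will rarely match this exactly, so the pattern has to be adapted to whatever actually repeats on each page.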
06-15-2012, 12:56 PM   #5
daniel3ub
Thanks for your answer. I've read the sticky many times in search of a solution. Every PDF I've tried to convert so far exhibits this problem, or I wouldn't be asking for help.

I think you can reproduce the problem. Here are the steps I followed right now, just to be sure:

1. Downloaded a .pdf book from Project Gutenberg (I got this http://www.gutenberg.org/ebooks/1342 ). There is no header or footer in it. You can get the .txt version and make a PDF from it, too. The results are the same.
2. Converted it with Calibre to MOBI, with all the options from Heuristics checked, and "Remove spacing between paragraphs" from "Look and Feel" checked too.

You can see in the resulting MOBI (using the internal viewer or even the Kindle itself) that there are blank lines corresponding to every page break in the PDF file. Playing around a bit, I've just found that these blank lines are soft scene breaks inserted by Calibre (if I use the option to "replace soft scene breaks" it becomes obvious).

However, if a paragraph is broken across two pages in the PDF, no soft scene break is inserted; instead, a new paragraph begins at the point of the page break.

I can certainly use regex to fix these paragraph breaks, but I think Calibre could handle this "PDF page break -> MOBI soft scene break" problem.

The problem becomes annoying in a text that already contains some real soft scene breaks, as you can imagine, because the resulting MOBI will have a lot of fake soft scene breaks.
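
For reference, the regex fix for paragraphs split at a page break could look something like the sketch below. It is only a heuristic, written under the assumption that the converted text ends up as `<p>` elements: a paragraph that does not end in sentence punctuation and is followed by one starting with a lowercase letter is treated as a single paragraph that was cut in two. The function name is made up.

```python
import re

# Group 1 is the last character of a paragraph that does not end in
# sentence punctuation, a quote mark or a closing tag; the next paragraph
# must begin with a lowercase letter.
SPLIT_PARA_RE = re.compile(
    r'([^.!?:">\s])\s*</p>\s*<p(?:\s[^>]*)?>\s*(?=[a-z])'
)


def rejoin_split_paragraphs(html):
    """Merge paragraph pairs that look like one paragraph cut by a page break."""
    return SPLIT_PARA_RE.sub(r"\1 ", html)
```

It will get some cases wrong, for example a word hyphenated across the page break ends up as "uni- versally", so the result still needs a read-through.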

Cheers!
06-15-2012, 01:58 PM   #6
JSWolf
Quote:
Originally Posted by daniel3ub
Thanks.

What I do is convert the PDF to RTF, edit it, and then convert the edited RTF to mobi. Since there are no images or the like, it works well.

My question is: is the "insert blank lines in mobi when a page break in PDF is found" behavior a bug or a feature?

Should I file a bug report? It seems like a bug to me.

Thanks!
To be honest, you are better off converting the PDF to ePub. Then you can format it how you want it to look and convert it to AZW3. RTF in Word leaves a lot of garbage that you don't need and don't want. With ePub, you can make it look nice, and when that's done, it will convert better. Plus, with Sigil, you can easily split at the page breaks or combine as needed. It's not that hard to pick up, and it will give you a better file to read when you are done.

As for whether the page break issue is a bug or not, it's probably not. No two PDFs will convert the same, so it is what it is. You'll have to fix it afterward. So if you convert the PDF to ePub and then use Sigil to correct the output and fix the formatting, you can then convert it to read on your Kindle better than RTF.
06-15-2012, 11:38 PM   #7
ldolse
I understand what you're saying now - I misunderstood and thought you meant that broken paragraphs weren't being connected across page breaks. Blank lines getting inserted and appearing to be scene breaks is a slightly different problem.

I've submitted a patch that fixes it; it will probably be in the next release.
06-16-2012, 10:49 AM   #8
daniel3ub
Quote:
Originally Posted by ldolse
I understand what you're saying now - I misunderstood and thought you meant that broken paragraphs weren't being connected across page breaks. Blank lines getting inserted and appearing to be scene breaks is a slightly different problem.

I've submitted a patch that fixes it; it will probably be in the next release.
Amazing!
06-16-2012, 09:56 PM   #9
JSWolf
Regardless of this patch, my directions are still valid and still correct. Doing this right still requires a lot of work, and it does not involve RTF and/or Word in any way at all.
08-23-2012, 04:58 PM   #10
da5id403
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2012
Device: da5id403
A Lot Of Work – Understatement

Quote:
Originally Posted by JSWolf
Here is how to do it so it comes out correctly...

1. Use Calibre to convert the PDF to ePub (do not use the delete blank lines option ever).
2. Use Sigil to edit the resulting ePub.
3. A/B compare the PDF to the ePub (every letter, every space, every punctuation mark, everything).
4. Edit the ePub to fix the errors.
5. Edit the ePub to fix the formatting.
6. Validate the ePub using FlightCrew.
7. Convert to AZW3 using Calibre.

To do this right takes a lot of work.
I came to this page by googling my issue, which is the same as the OP's. With all due respect, I would never want to do all that just to remove the annoying line spaces. I am not even sure that there are Windows alternatives for each program you have mentioned. I think if the author of Calibre were aware of this as a high-priority bug, he would fix it. I have been using Calibre for a couple of years now (certainly not 30,000 posts long) and there are almost daily updates.

Bottom line – software should be able to do this. That's what PCs are for. There is a $.10 alternative.
08-23-2012, 05:02 PM   #11
da5id403
Yes, Do That

Quote:
Originally Posted by daniel3ub
Thanks.

What I do is convert the PDF to RTF, edit it, and then convert the edited RTF to mobi. Since there are no images or the like, it works well.

My question is: is the "insert blank lines in mobi when a page break in PDF is found" behavior a bug or a feature?

Should I file a bug report? It seems like a bug to me.

Thanks!
I think it's a bug. If it is not a bug, it is an unwanted feature that has plagued PDF to MOBI conversions since beta versions of this outstanding program. Submit the bug report. If you find out anything, please PM me.
08-23-2012, 05:14 PM   #12
da5id403
Nothing This Simple Is Inevitable

Quote:
Originally Posted by ldolse
It's not a feature or a bug - it's just a limitation/expected behavior. Did you read the sticky?

https://www.mobileread.com/forums/sho...d.php?t=118605

Most PDFs don't exhibit the specific problem you're describing - odds are there is some sort of header/footer in your PDF that's tripping up the normal PDF conversion. Use search/replace to delete it.
Yes. Basically, the devs are working on it. To wit:

"How can I help make pdf conversion better?
"Improving pdf conversion is on the to-do list of the Calibre developers, but any help would be greatly appreciated. There is a new pdf engine that is currently in progress, and fixes many of the issues described above, like multi-column pdfs, ligatures, line wrapping, etc. Development is presently stalled, and there is no ETA for this being released. ... "

PS: Most do.

Last edited by da5id403; 08-23-2012 at 05:25 PM. Reason: To edit.
08-23-2012, 06:59 PM   #13
ldolse
This particular issue was resolved back in June - refer to post number 7. Please make sure you're using the latest version of Calibre.
08-24-2012, 06:11 AM   #14
BetterRed
null operator (he/him)
Posts: 20,460
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@daniel3ub - I've found that the mobicreator tool usually does a better job of converting PDFs than Calibre does, as is suggested in the "PDF Conversion - Read This First" sticky.
08-28-2012, 10:31 PM   #15
JSWolf
Quote:
Originally Posted by da5id403
I came to this page by googling my issue, which is the same as the OP's. With all due respect, I would never want to do all that just to remove the annoying line spaces. I am not even sure that there are Windows alternatives for each program you have mentioned. I think if the author of Calibre were aware of this as a high-priority bug, he would fix it. I have been using Calibre for a couple of years now (certainly not 30,000 posts long) and there are almost daily updates.

Bottom line – software should be able to do this. That's what PCs are for. There is a $.10 alternative.
All the programs I mentioned are Windows programs. Regardless of the line spaces, there are other errors in your PDF conversion, and only a proper A/B compare will catch all of them. If you want a proper conversion, you have to do the work.

If the line spaces are created in CSS, then you can easily remove them. If they are in the XML code, it may not be as easy, but a search/replace (maybe with regex) may work.
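
A minimal sketch of that kind of regex search/replace, assuming the blank lines show up in the XHTML as paragraphs containing nothing but whitespace, `&nbsp;` or a `<br/>` (that is an assumption about the markup, so check the actual file first; the function name is made up):

```python
import re

# A <p> element whose entire content is whitespace, non-breaking-space
# entities or <br> tags, plus any whitespace that follows it.
EMPTY_PARA_RE = re.compile(
    r"<p(?:\s[^>]*)?>(?:\s|&nbsp;|&#160;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def drop_empty_paragraphs(xhtml):
    """Delete paragraphs that contain no visible text."""
    return EMPTY_PARA_RE.sub("", xhtml)
```

Note that this removes every empty paragraph, including ones that were meant as real scene breaks, so those would have to be marked up some other way before running it.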

No software will do the conversion perfectly. It isn't possible. So my directions are all you have if you want to get it right.
All times are GMT -4.

