#1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
|
Yet another PDF line break question
I've got a collection of books in PDF format in which every paragraph starts with an indented first line. Calibre seems to treat any short line after a paragraph's first line as a paragraph break. I played around with the "Line Un-Wrapping Factor", but all I could make it do was turn every single line in the book into a new paragraph!
Is there a way to tell Calibre that only lines starting with an indent begin new paragraphs? I looked at the output HTML from the conversion, but it's already too late at that point; the detection would need to happen earlier.
#2 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
#3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
|
#4 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
I'll send you an email later today through MobileRead with an overview of how PDF conversion works and the points at which you would need to modify it.
Last edited by user_none; 01-17-2011 at 04:08 PM. Reason: typos |
#5 |
Grand Sorcerer
Posts: 6,247
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
If you can create a way to reliably reconstruct paragraphs from all PDFs you will probably be hailed as the New Messiah in these forums. I think you should have a go.
#6 |
Evangelist
Posts: 475
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
|
#7 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
|
Well, I've come up with a solution that works for my particular situation, but I'm pretty sure it won't work for any random PDF. I wrote a Perl script that takes pdftohtml's XML output and rewrites it into HTML. The XML is fairly easy to clean up, and I use a few simple rules to detect paragraph breaks. I then load the HTML output into Calibre and let Calibre do its normal conversion to get the final book onto my Kindle. It's not a simple drag-and-drop procedure, though, and, as I say, I don't think it will work generically.
If people have specific PDFs they want me to take a look at, I could do that and see if there's a way to make the conversion procedure somewhat simpler.
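For anyone who wants to experiment with the same idea, here is a rough sketch of indent-based paragraph detection. It is in Python, not the Perl script described above (which was not posted), and the rule it uses is only a guess at what "a few simple rules" might look like. It assumes the input is the output of `pdftohtml -xml`, where each line of text is a `<text>` element carrying a `left` position attribute. The function names and the tolerance value are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from html import escape

# Assumed threshold (in pdftohtml's position units): how far right of the
# body margin a line must start before it counts as indented.
INDENT_TOLERANCE = 5


def paragraphs_from_pdf2xml(xml_text, tolerance=INDENT_TOLERANCE):
    """Group pdftohtml XML text lines into paragraphs by first-line indent."""
    root = ET.fromstring(xml_text)

    # Collect (left position, text) for every non-empty line, in document order.
    lines = []
    for text_el in root.iter("text"):
        content = "".join(text_el.itertext()).strip()
        if content:
            lines.append((float(text_el.get("left")), content))
    if not lines:
        return []

    # Take the most common left position as the body margin.
    margin = Counter(left for left, _ in lines).most_common(1)[0][0]

    paragraphs = []
    for left, content in lines:
        if not paragraphs or left > margin + tolerance:
            # Indented line (or very first line): start a new paragraph.
            paragraphs.append(content)
        else:
            # Flush-left line: continue the current paragraph, however short.
            paragraphs[-1] += " " + content
    return paragraphs


def to_html(paragraphs):
    """Wrap each paragraph in <p> tags, escaping HTML special characters."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
```

Because only the indent decides where a paragraph starts, a short line in the middle of a paragraph no longer causes a break. The sketch ignores headers, footers, centered text, multi-column layouts, and words hyphenated across lines, so like the script above it should not be expected to work on arbitrary PDFs.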
#8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
If you don't mind, I'd like to see that email, too. Feel free to post it here, unless there's some reason not to.