Old 01-16-2011, 04:12 PM   #1
heddhunter
Junior Member
Posts: 4
Karma: 10
Join Date: Jan 2011
Device: Kindle 3
Yet another PDF line break question

I've got a collection of books in PDF format in which each paragraph starts with an initial indent. Calibre seems to think that any short line after a paragraph's first line means a new paragraph. I played around with the "Line Un-Wrapping Factor", but all I could make it do was turn every single line in the book into a new paragraph!

Is there a way to tell Calibre that only lines that start with an initial indent are new paragraphs?

I looked at the output HTML from the conversion but it's already too late at that point... I would need it to happen earlier.
Old 01-16-2011, 05:32 PM   #2
user_none
Sigil & calibre developer
Posts: 2,427
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by heddhunter View Post
Is there a way to tell Calibre that only lines that start with an initial indent are new paragraphs?
Nope.
Old 01-17-2011, 03:17 PM   #3
heddhunter
Quote:
Originally Posted by user_none View Post
Nope.
I'm a pretty good programmer (if I say so myself). Is this something I could work on, or is it some sort of fundamental limitation in a library or something?
Old 01-17-2011, 04:08 PM   #4
user_none
Quote:
Originally Posted by heddhunter View Post
I'm a pretty good programmer (if I say so myself). Is this something I could work on, or is it some sort of fundamental limitation in a library or something?
I'll send you an email later today through MobileRead with an overview of how PDF conversion works and the points you would need to modify.

Last edited by user_none; 01-17-2011 at 04:08 PM. Reason: typos
Old 01-17-2011, 04:09 PM   #5
jackie_w
Wizard
Posts: 2,571
Karma: 3784089
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350/650/T1, PB360, Kobo Glo/AuraHD/Aura6"
If you can create a way to reliably reconstruct paragraphs from all PDFs you will probably be hailed as the New Messiah in these forums. I think you should have a go.
Old 01-17-2011, 06:27 PM   #6
JMikeD
Evangelist
Posts: 448
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
Quote:
Originally Posted by jackie_w View Post
If you can create a way to reliably reconstruct paragraphs from all PDFs you will probably be hailed as the New Messiah in these forums. I think you should have a go.
Oh, yeah.
Old 01-18-2011, 02:18 PM   #7
heddhunter
Well, I've come up with a solution that works for my particular situation, but I'm pretty sure it is not going to work for any random PDF. I wrote a Perl script that takes pdftohtml's XML output and rewrites it into HTML. The XML is fairly easy to clean up. There are a few simple rules I use to detect paragraph breaks. I load the HTML output into Calibre and then let Calibre do its normal conversion stuff to get the final book onto my Kindle. It's not a simple drag-and-drop procedure, though. And, as I say, I don't think it will work generically.

I guess if people have specific PDFs they want me to take a look at, I could do that and see if there's a way to make the conversion procedure somewhat simpler.
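
For readers wondering what such a script might look like, here is a minimal sketch of the indent rule in Python. It is not the Perl script described above, which was not posted. It assumes the XML comes from `pdftohtml -xml`, that each line is a `<text>` element with an integer `left` attribute inside a `<page>` element, and that the most common `left` value on a page is the body margin. The function names and the 10-unit threshold are hypothetical and would need tuning per book.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from html import escape

# Assumed threshold: how far right of the body margin a line must start
# to count as an indented first line. Tune per book.
INDENT_THRESHOLD = 10


def paragraphs_from_pdf2xml(xml_text, indent_threshold=INDENT_THRESHOLD):
    """Join <text> lines into paragraphs, breaking only at indented lines."""
    root = ET.fromstring(xml_text)
    paragraphs, current = [], []
    for page in root.iter("page"):
        lines = []
        for node in page.iter("text"):
            # itertext() flattens inline tags such as <b> or <i>, so
            # inline formatting is dropped in this sketch.
            content = "".join(node.itertext()).strip()
            if content:
                lines.append((int(float(node.get("left", "0"))), content))
        if not lines:
            continue
        # Assumption: the most common left offset on the page is the margin.
        margin = Counter(left for left, _ in lines).most_common(1)[0][0]
        for left, content in lines:
            if left - margin >= indent_threshold and current:
                paragraphs.append(" ".join(current))
                current = []
            current.append(content)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def to_html(paragraphs):
    """Wrap each paragraph in <p> tags inside a bare HTML document."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<html><body>\n{body}\n</body></html>"
```

Short lines never trigger a break here, only indentation does, so a paragraph that runs across a page boundary stays in one piece. Running headers, page numbers, centered headings and hyphenated line ends would all need extra rules, which is probably why this approach does not generalize to arbitrary PDFs.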
Old 01-18-2011, 02:41 PM   #8
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by user_none View Post
I'll send you an email later today through MobileRead with an overview of how PDF conversion works and the points you would need to modify.
If you don't mind, I'd like to see that email, too. Feel free to post it here, unless there's some reason not to.
All times are GMT -4.