#1 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
|
Getting extra Paragraphs
This one is going to be harder than the last one I found my own answer to.
When converting from PDF to RTF, every time a sentence starts at the beginning of a line within a paragraph, Calibre starts a new paragraph. Is there any way to keep this from happening?
#2 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
|
I have also noticed that if a line ends in a capital letter, a new paragraph is started, even though there is no punctuation.
#3 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
|
I also found a case where a line starting with a double quote " started a new paragraph, again with no other punctuation.
#4 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
|
And, hopefully finally, I've found a case where a typed-out ellipsis " . . . " at the end of a line starts a new paragraph.
#5 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
There is no concept of a 'paragraph' in PDFs.
PDF unwrapping works off punctuation; the HTML that pdftohtml generates doesn't provide any other clues as to what is and is not a paragraph. Unfortunately, you just need to fix it up yourself after conversion. There is a new PDF engine that will probably be released someday; it contains information such as indentation and spacing between lines, which could be used to determine paragraph boundaries. No telling when it will be ready, though.
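To illustrate why punctuation-only unwrapping splits paragraphs like this, here is a minimal Python sketch. It is not Calibre's actual code: `unwrap` and `SENTENCE_END` are hypothetical names made up for this example, and the real heuristic is more involved. The assumed rule is simply "a line that ends in sentence punctuation closes the paragraph".

```python
import re

# Assumed rule (hypothetical, for illustration only): a line ending in
# '.', '!' or '?', optionally followed by closing quotes or brackets,
# is treated as the end of a paragraph.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*$')


def unwrap(lines):
    """Join hard-wrapped lines into paragraphs using punctuation alone."""
    paragraphs, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no information in this sketch
        current.append(line)
        if SENTENCE_END.search(line):
            # The line ends a sentence, so the heuristic ends the paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# One paragraph in the source PDF, hard-wrapped across four lines.
lines = [
    "The rain had stopped by noon.",
    "She walked to the window and",
    "looked out at the street . . .",
    "wondering where he had gone.",
]

for paragraph in unwrap(lines):
    print(paragraph)
```

With this input the sketch prints three paragraphs instead of one: the sentence that happens to end at a line break closes a paragraph, and so does the typed-out ellipsis, since it ends in a period. Without layout clues such as indentation or line spacing, the heuristic has no way to tell these apart from real paragraph ends.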
#6 |
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2010
Location: Somewhere in Iowa
Device: Nook Color
|
Thanks.
I do know that this is a difficult area in PDF translation. But even so, Calibre is light-years ahead of the translators inside Acrobat Pro. Acrobat Pro makes a total mess of a document's paragraph structure, even when it's clean. I was hoping that I wouldn't have to scrub through the Calibre conversions, but ... oh, well. At least there's a lot less to look for.
Thread | Thread Starter | Forum | Replies | Last Post |
Ragged right / space between paragraphs | Oldpilot | Sigil | 5 | 11-11-2010 07:59 PM |
Paragraphs between Pages and Calibre | La Nuestra | Calibre | 21 | 10-18-2010 08:03 AM |
Paragraphs from indentations? | Raketemensch | Calibre | 6 | 09-16-2010 10:43 AM |
Remove spacing between paragraphs doesn't. | Djehuty | Calibre | 6 | 04-28-2009 04:53 AM |
Paragraphs and indent | mrmikel | Calibre | 33 | 01-10-2009 05:37 PM |