#1 |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Handling Broken Paragraphs
For whatever reason, some paragraphs in some PG files are broken into pieces. Some are one-line paragraphs.
Putting them back together requires tedious code editing. Is there a method, which I have not found, of marking contiguous paragraphs and pulling them together into a single paragraph? This is a lot to ask for, but it would sure come in handy.
#2 | |
Well trained by Cats
Posts: 30,884
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
You highlight the paragraph and select un-wrap text (hint: assign a keystroke; I use the + on the number pad as a simple stroke). It is still tedious. I expected better from PG. Once the files are converted from TXT, this is no longer an option.
|
#3 |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
You can do this with the replace function (in the Edit menu).
Suppose you have the following text:
Code:
This is a part of
a broken paragraph.
These are really.
Two paragraphs.
In the code, with a marker character (here §) added at the end of the broken piece, this looks like:
Code:
<p>This is a part of§</p>
<p>a broken paragraph.</p>
<p>These are really.</p>
<p>Two paragraphs.</p>
Then search for the following and replace it with a space:
Code:
§</p> <p>
This supposes that all the broken paragraphs are the same, i.e. the same number of newlines and spaces in between. If this is not the case, you have to use a regular expression. Click the 'More' button, select Regular Expression, and in the 'Find what' field use:
Code:
§</p>\s*<p>
Sometimes there may be additional whitespace at the end of the first part or the beginning of the second part. To get rid of it, add \s* at the proper places, e.g.:
Code:
\s*§\s*</p>\s*<p>\s*
This variant also matches an optional <br/> between the two paragraphs:
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
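For anyone who wants to try the pattern outside Sigil first, here is a minimal Python sketch of the same marker-based join. It assumes the § marker has already been typed at the end of each broken piece. The function name `join_marked` and the sample string are made up for illustration, and Python's `re` module stands in for Sigil's own search engine, which may differ in details.

```python
import re

# Marker-based join: "§" marks the end of a broken piece.
# The pattern is the last one from the post above; the optional group
# also swallows a <br/> sitting between the two paragraphs.
MARKED_BREAK = re.compile(r"\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*")


def join_marked(html: str) -> str:
    """Replace every marked paragraph break with a single space."""
    return MARKED_BREAK.sub(" ", html)


# Illustrative input only: the example text from the post above.
sample = (
    "<p>This is a part of§</p>\n"
    "<p>a broken paragraph.</p>\n"
    "<p>These are really.</p>\n"
    "<p>Two paragraphs.</p>"
)
joined = join_marked(sample)
print(joined)
```

Only the break carrying the marker is joined; the two paragraphs that really are separate stay untouched.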
#4 |
Not who you think I am...
Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
|
I assume you use Windows.
There is a great old tool for unmangling text, called InterParse. It's available here on the forums. InterParse Thread. Bit of a funky interface, but once you get it you can do a LOT of different regex and search-replace without having to know any code. cap |
#5 |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
Many thanks!
#6 |
Guru
Posts: 972
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-T2, Kindle Paperwhite 11th gen
|
This is the feature I miss most in Sigil, and the main reason why I stick to BookDesigner.
In BookDesigner you just select "Broken sentences" in the "Element browser", and all the broken paragraphs show up. You click on each one and go straight to where the problem is. |
#7 |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
If the broken paragraphs can be characterized as not ending with . ! or ?, you don't even have to add a special marker character. Instead, replace the § in the regular expressions with ([^.!? ]) and put \1 followed by a space character in the 'Replace with' field. Note that there is a space character after the ? inside the brackets.
Now all the broken paragraphs will be found automatically.
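As a rough illustration of the marker-less variant, here is a Python sketch. It is my own test harness, not anything Sigil runs: `join_unmarked` is a hypothetical name, and Python's `re` is assumed to behave closely enough to Sigil's regex engine for this pattern.

```python
import re

# "§" from the earlier pattern swapped for a capture group: any character
# that is not . ! ? or a space, sitting right before the closing tag.
UNMARKED_BREAK = re.compile(r"\s*([^.!? ])\s*</p>\s*(<br\s*/>)?\s*<p>\s*")


def join_unmarked(html: str) -> str:
    """Join paragraphs whose first part does not end a sentence."""
    # \1 writes the captured last character back, followed by one space.
    return UNMARKED_BREAK.sub(r"\1 ", html)


# Illustrative input only: the earlier example, without any marker.
sample = (
    "<p>This is a part of</p>\n"
    "<p>a broken paragraph.</p>\n"
    "<p>These are really.</p>\n"
    "<p>Two paragraphs.</p>"
)
joined = join_unmarked(sample)
print(joined)
```

Paragraphs that end in a closing quote or a closing inline tag will also match this pattern, so it is worth checking the hits before replacing everything.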
#8 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Paragraphs can also end in a :, and one should be especially careful with poetry lines.
|
#9 |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
This has been a very interesting discussion. Any general method of addressing broken paragraphs can be fraught with problems.
I've used BD for hundreds of PG books and never found the Broken Paragraph selection to be particularly useful. One advantage that BD does have is the HTML Fragment: a block of text can be highlighted, and only the code for that fragment is presented for operations and repair. This eliminates the danger of affecting code throughout the document. If Valloric is following this thread, perhaps he would comment on the difficulty of implementing such a capability before anyone (me?) submits an issue. It may already be in the pipeline.
Right now I'm still partial to the special character, due to the Law of Unintended Consequences of RegExes. There is no cut-and-try methodology as with RegEx Buddy and other RegEx tools, and the programmer's best friend (UNDO) is currently problematic.
Thanks for all of the thought-provoking discussion.
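One way to get a bit of cut-and-try is to list the suspected breaks before replacing anything. The Python sketch below does only that. It is an assumption-laden illustration, not a Sigil feature: `preview_breaks` and the sample text are invented, and the character class adds the colon mentioned earlier in the thread.

```python
import re

# Same idea as the marker-less pattern in this thread, with ":" added to
# the characters that are allowed to end a paragraph.
CANDIDATE = re.compile(r"([^.!?: ])\s*</p>\s*<p>")


def preview_breaks(html: str, context: int = 20) -> list:
    """Return a short excerpt around each suspected broken paragraph."""
    excerpts = []
    for m in CANDIDATE.finditer(html):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(html))
        # Flatten newlines so each excerpt prints on one line.
        excerpts.append(html[start:end].replace("\n", " "))
    return excerpts


# Illustrative input only.
sample = (
    "<p>He said:</p>\n"
    "<p>This line is broken</p>\n"
    "<p>in the middle.</p>"
)
found = preview_breaks(sample)
for excerpt in found:
    print(excerpt)
```

Nothing is modified here; the excerpts are just a list to read through, which makes poetry lines and other deliberate short paragraphs easier to spot before a Replace All.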
#10 | |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
Quote:
This idea that it "eliminates the danger of affecting code throughout the document"... I don't see it. You could certainly introduce problems in a fragment that could adversely affect the rest of the document. But if you're talking about incorrect regexes sometimes affecting more than they should, that's different. A "Current selection" option for the "Look in" field in the search & replace dialog should alleviate that, and this feature is planned. Using that would make the S&R dialog operate only within the selected text.
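To make the idea of a selection-limited replace concrete, here is a small Python sketch. It only illustrates the concept and says nothing about how Sigil implements or will implement the option; `replace_in_selection` and the offsets are hypothetical.

```python
import re


def replace_in_selection(text: str, start: int, end: int,
                         pattern: str, repl: str) -> str:
    """Apply a regex replace only to text[start:end]; leave the rest alone."""
    selected = re.sub(pattern, repl, text[start:end])
    return text[:start] + selected + text[end:]


# Illustrative input only.
doc = "<p>one</p><p>two</p><p>three</p><p>four</p>"
# Hypothetical selection: only the first two paragraphs.
sel_end = doc.index("<p>three")
result = replace_in_selection(doc, 0, sel_end, r"</p><p>", " ")
print(result)
```

The same pattern would have joined all four paragraphs if applied to the whole string, which is the kind of over-reach a selection scope guards against.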
|
#11 | |
eBook FANatic
Posts: 18,301
Karma: 16078357
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Quote:
|
|
#12 | |
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
I run into the scenario below often enough after converting to epub from X format in calibre, and I find Sigil invaluable in correcting these errors. It usually takes me a few find-and-replace passes to get a book with broken paragraphs close to normal. My most recent case went like this.
All paragraphs were wrapped as <p class="MsoPlainText">paragraph words</p>, and the spaces between the paragraphs were created by <p class="MsoPlainText"></p>.
1. I replaced all <p class="MsoPlainText"></p> with <br/><br/>.
2. Then I replaced (removed) all <p class="MsoPlainText"> with nothing and let Sigil clean up the closing elements.
3. Then I replaced all <br/> with </p><p> (again Sigil cleaned up the cases where there were <p></p> without words in between).
4. Then I replaced all <p> with <p class="description">, which is the class that held my indent and margins.
After these 4 steps all paragraphs were re-formed, and the book, although not up to production quality specs, was readable with the paragraph indents and the spacing between paragraphs that I prefer.
Any lists that existed prior to this process required manual editing to make them whole again. It is probably best to identify lists up front and change them so they are not caught up in this process.
Last edited by DoctorOhh; 06-15-2010 at 12:11 AM.
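For comparison, here is a Python sketch that aims at the same end result as the four steps as I read them: lines between empty MsoPlainText paragraphs are merged into one paragraph with the "description" class. It does not reproduce the steps literally, because it has no equivalent of Sigil's automatic cleanup. The function name `rebuild_paragraphs` and the sample markup are assumptions, and lists would still need to be handled separately.

```python
import re

# An empty MsoPlainText paragraph is treated as the separator between
# real paragraphs.
EMPTY_PARA = re.compile(r'<p class="MsoPlainText">\s*</p>')
# Any remaining opening or closing <p> tag.
PARA_TAGS = re.compile(r"</?p(?:\s[^>]*)?>")


def rebuild_paragraphs(body: str, css_class: str = "description") -> str:
    """Merge the lines between separators into single paragraphs."""
    paragraphs = []
    for chunk in EMPTY_PARA.split(body):
        # Drop the <p ...> and </p> tags, then collapse runs of whitespace.
        text = " ".join(PARA_TAGS.sub(" ", chunk).split())
        if text:
            paragraphs.append(f'<p class="{css_class}">{text}</p>')
    return "\n".join(paragraphs)


# Illustrative input only.
body = (
    '<p class="MsoPlainText">First line of</p>\n'
    '<p class="MsoPlainText">paragraph one.</p>\n'
    '<p class="MsoPlainText"></p>\n'
    '<p class="MsoPlainText">Paragraph two.</p>'
)
rebuilt = rebuild_paragraphs(body)
print(rebuilt)
```

Any inline markup inside the lines is left as it is; only the paragraph tags are rewritten.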
|
#13 |
Member
Posts: 24
Karma: 10
Join Date: Jun 2010
Location: Sofia, Bulgaria
Device: Kindle 3
|
OK, so I search for [^.]</p><p class="calibre2"> and everything works, but when I press Replace (with the replacement field empty) the last character of the word is deleted.
For example,
Code:
the same</p><p class="calibre2">
becomes
Code:
the sam
Help, please.
#14 |
Not who you think I am...
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 374
Karma: 30283
Join Date: Jan 2010
Location: Honolulu
Device: PocketBook 360 -- Ivory
|
You're selecting the last character before the close paragraph tag with [^.] so when you replace you have to put it back, otherwise it gets overwritten with your replace string (nothing, in this case).
Code:
Search: ([^.])</p><p class="calibre2">
Replace: $1
cap
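The difference between the two searches can be checked quickly in Python. This is only a sketch for illustration: Python's `re` module writes the backreference as `\1` rather than `$1`, and the variable names are made up.

```python
import re

# The example string from the question above.
broken = 'the same</p><p class="calibre2">'

# Without a group, the character matched by [^.] is part of the match,
# so an empty replacement deletes it along with the tags.
lossy = re.sub(r'[^.]</p><p class="calibre2">', '', broken)

# With a group, the character can be written back. Python's re module
# spells the backreference \1 where the post above uses $1.
kept = re.sub(r'([^.])</p><p class="calibre2">', r'\1', broken)

print(lossy)  # the sam
print(kept)   # the same
```

If the two pieces need a space between them, add one after the backreference in the replacement, as suggested earlier in the thread.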
#15 |
Created Sigil, FlightCrew
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
|
Thread | Thread Starter | Forum | Replies | Last Post |
How to join broken paragraphs? | purcelljf | Workshop | 8 | 08-19-2010 03:21 PM |
PDF Handling on New Kindle | Sheikspeare | Amazon Kindle | 21 | 08-09-2010 04:34 AM |
Metadata Handling in 0.7.+ | tonyc46 | Calibre | 2 | 06-23-2010 05:35 AM |
Broken Ipod works Fine! except that its broken | Andybaby | Lounge | 1 | 06-04-2009 02:03 AM |
Handling several wordlists. | Gianfranco | Bookeen | 9 | 08-20-2008 09:29 AM |