#1
eBook FANatic
Posts: 11,873
Karma: 13081948
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Handling Broken Paragraphs
This requires a tedious process of code editing to put them back together. Is there a method, which I have not found, of marking contiguous paragraphs and pulling them together into a single paragraph? This is a lot to ask for, but it would sure come in handy.
__________________
Charlie 'Bene legere saecla vincere'. 'To read well is to master the ages' [Prof. Issac Flagg] |

#2
Staff to 4 Cats
Posts: 10,725
Karma: 2485850
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2,Black Astak PEz, K4NT(now Wifes)
Quote:
You highlight the paragraph and select un-wrap text (hint: assign a keystroke; I use the + on the number pad as a simple stroke). It is still tedious. I expected better from PG. Once they are converted from TXT, this is no longer an option.
__________________
Using: Ubuntu(32 bit):Oneric,Precise and XPpro SP3, W7HP(64)- - Libre Office w/Writer2EPUB

#3
Reader
Posts: 500
Karma: 8718
Join Date: Aug 2009
Location: Cochabamba, BO
Device: Onyx Boox 60, iPod Touch
You can do this with the replace function (in the Edit menu).
Suppose you have the following text:
Code:
This is a part of a broken paragraph. These are really. Two paragraphs.
Code:
<p>This is a part of§</p> <p>a broken paragraph.</p> <p>These are really.</p> <p>Two paragraphs.</p>
Code:
§</p> <p>
This supposes that all the broken paragraphs are the same, i.e. the same number of newlines and spaces in between. If this is not the case you have to use a regular expression. Click the 'More' button and select Regular Expression, and in the 'Find what' field use:
Code:
§</p>\s*<p>
Sometimes there may be additional whitespace at the end of the first part or the beginning of the second part. To get rid of it, add additional \s* at the proper places, e.g.
Code:
\s*§\s*</p>\s*<p>\s*
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
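For anyone who wants to try the pattern outside Sigil first, here is a minimal Python sketch of the marker approach. It assumes the § marker has already been typed at the end of each broken part and that a single space is the wanted replacement; the name `join_marked` is made up for this illustration.

```python
import re

# Marker, closing tag, optional <br/>, opening tag, with any whitespace around them.
MARKED_BREAK = re.compile(r"\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*")

def join_marked(html: str) -> str:
    """Replace every marked paragraph break with a single space."""
    return MARKED_BREAK.sub(" ", html)

sample = "<p>This is a part of§</p>\n<p>a broken paragraph.</p>\n<p>These are really.</p>"
print(join_marked(sample))
```

Paragraph breaks without the marker are left alone, so only the places that were marked by hand get joined.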

#4
Not who you think I am...
Posts: 332
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-T1
I assume you use Windows.
There is a great old tool for unmangling text, called InterParse. It's available here on the forums (see the InterParse thread). It has a bit of a funky interface, but once you get it you can do a LOT of different regex and search-replace operations without having to know any code.
cap

#5
eBook FANatic
Posts: 11,873
Karma: 13081948
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
Many thanks!

#6
Guru
Posts: 627
Karma: 1901287
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
This is the feature I miss most in Sigil, and the main reason why I stick to BookDesigner.
In BookDesigner you just select "Broken sentences" in the "Element browser", and all the broken paragraphs show up. You click on each one and go straight to where the problem is.
__________________
My Quick and Dirty ePub Tutorial (version 3 covering Sigil 0.7.1 now available, in pdf and ePub format) Bridging the gap between BookDesigner and Sigil: Try HTML02HTML |

#7
Reader
Posts: 500
Karma: 8718
Join Date: Aug 2009
Location: Cochabamba, BO
Device: Onyx Boox 60, iPod Touch
If the broken paragraphs can be characterized as not ending with a . ! or ? then you don't even have to add a special marker character. Instead, replace the § in the regular expressions with: ([^.!? ]) and put in the 'Replace with' field: \1 followed by a space character. Note: there is a space character after the ? inside the brackets.
Now all the broken paragraphs will be found automatically.
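A rough Python sketch of this marker-free variant follows. It departs from the post in two places, both my own assumptions: the class uses `\s` instead of a literal space so that a trailing newline is also excluded, and there is no leading `\s*` before the group, so a space in front of a one-letter final word is not swallowed. The name `join_unfinished` is hypothetical.

```python
import re

# ([^.!?\s]) captures the last character when it is not ., !, ? or whitespace.
UNFINISHED_BREAK = re.compile(r"([^.!?\s])\s*</p>\s*<p>\s*")

def join_unfinished(html: str) -> str:
    """Join paragraphs whose text does not end in sentence punctuation."""
    # \1 writes the captured character back, followed by one space.
    return UNFINISHED_BREAK.sub(r"\1 ", html)

sample = (
    "<p>This is a part of</p>\n"
    "<p>a broken paragraph.</p>\n"
    "<p>These are really.</p>\n"
    "<p>Two paragraphs.</p>"
)
print(join_unfinished(sample))
```

Only the first break is joined here, because the other paragraphs end with a full stop.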

#8
frumious Bandersnatch
Posts: 5,147
Karma: 2505637
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon
Paragraphs can also end in a :, and one should be especially careful with poetry lines.
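To make both cautions concrete, here is a small Python sketch. It adds the colon to the set of "finished" endings and then shows what still goes wrong with verse. The sample strings and the function name are invented for illustration, and the pattern is a variation on the one in the previous post, not something either poster wrote.

```python
import re

# Same idea as the punctuation test, with ':' added to the "finished" endings.
BREAK = re.compile(r"([^.!?:\s])\s*</p>\s*<p>\s*")

def join_unfinished_colon(html: str) -> str:
    """Join paragraphs that do not end in ., !, ? or :."""
    return BREAK.sub(r"\1 ", html)

intro = "<p>He listed three things:</p>\n<p>First, the cover.</p>"
verse = "<p>The sun went down</p>\n<p>Behind the town</p>"

print(join_unfinished_colon(intro))  # unchanged, the colon protects the break
print(join_unfinished_colon(verse))  # the two verse lines are merged, which is not wanted
```

The verse case is why it seems safer to run this kind of replacement one match at a time, or to skip chapters that contain poetry.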

#9
eBook FANatic
Posts: 11,873
Karma: 13081948
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
This has been a very interesting discussion. Any general method of addressing broken paragraphs can be fraught with problems.
I've used BD for hundreds of PG books and never found the Broken Paragraph selection to be particularly useful. One advantage that BD does have is the HTML Fragment. A block of text can be highlighted and only the code for that fragment is presented for operations and repair. This eliminates the danger of affecting code throughout the document. If Valloric is following this thread, perhaps he would comment on the difficulty of implementing such a capability before anyone (me?) submits an issue. It may already be in the pipeline. Right now I'm still partial to the special character due to the Law of Unintended Consequences of regexes. There is no cut-and-try methodology as with RegexBuddy and other regex tools. Right now the programmer's best friend (UNDO) is problematic. Thanks for all of the thought-provoking discussion.
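One way to get a cut-and-try step outside Sigil is to list what a pattern would match before replacing anything. The Python sketch below does only that; the function name `preview` and the `context` parameter are hypothetical, and the pattern is the marker regex from earlier in the thread.

```python
import re

BREAK = re.compile(r"\s*§\s*</p>\s*<p>\s*")

def preview(html: str, context: int = 20) -> list[str]:
    """Return each match in brackets with some surrounding text; nothing is replaced."""
    found = []
    for m in BREAK.finditer(html):
        start = max(0, m.start() - context)
        end = min(len(html), m.end() + context)
        found.append(html[start:m.start()] + "[" + m.group(0) + "]" + html[m.end():end])
    return found

for hit in preview("<p>This is a part of§</p>\n<p>a broken paragraph.</p>"):
    print(repr(hit))
```

If the list looks right, the same pattern can then be used for the real replacement; if not, nothing in the book has been touched yet.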

#10
Created Sigil, FlightCrew
Posts: 1,975
Karma: 348069
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
This idea that it "eliminates the danger of affecting code throughout the document"... I don't see it. You could certainly introduce problems in a fragment that could adversely affect the rest of the document. But if you're talking about incorrect regexes sometimes affecting more than they should, that's different. A "Current selection" option for the "Look in" field in the search & replace dialog should alleviate that, and this feature is planned. Using that would make the S&R dialog operate only within the selected text.
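The idea of confining a replacement to a selection can be illustrated in a few lines of Python. This is only a sketch of the concept, not how Sigil implements or will implement it; the function name and the use of character offsets to stand in for a selection are assumptions.

```python
import re

def replace_in_selection(text: str, start: int, end: int, pattern: str, repl: str) -> str:
    """Apply a regex replacement only to text[start:end]; the rest is copied unchanged."""
    selected = re.sub(pattern, repl, text[start:end])
    return text[:start] + selected + text[end:]

doc = "<p>a</p><p>b</p>\n<p>c</p><p>d</p>"
# Only the first 16 characters (the first two paragraphs) count as selected.
print(replace_in_selection(doc, 0, 16, r"</p><p>", " "))
```

The second pair of paragraphs matches the same pattern but stays as it was, because it lies outside the selected range.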

#11
eBook FANatic
Posts: 11,873
Karma: 13081948
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Quote:

#12
US Navy, Retired
Posts: 7,317
Karma: 11288999
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7, Sony PRS-950, Sony PRS-505, PRS-300
Quote:
I run into the scenario below often enough after converting to epub from X format in calibre. I find Sigil invaluable in correcting these errors. It usually takes me a few find and replaces to get the book with broken paragraphs close to normal. My most recent case went like this.
All paragraphs were surrounded by <p class="MsoPlainText">paragraph words</p> and spaces between the paragraphs were created by <p class="MsoPlainText"></p>
1. I just replaced all <p class="MsoPlainText"></p> with <br/><br/>
2. Then I replaced (removed) all <p class="MsoPlainText"> with nothing and let Sigil clean up the closing elements.
3. Then I replaced all <br/> with </p><p> (again Sigil cleaned up the cases where there were <p></p> without words in between.)
4. Then I replaced all <p> with <p class="description"> which is the class that held my indent and margins.
After 4 steps all paragraphs were reformed, and the book, although not up to production quality specs, was readable with paragraph indents and the spacing between paragraphs that I prefer. Any lists that existed prior to this process required manual editing to make them whole again. It is probably best to identify lists upfront and change them so they are not caught up in this process.
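The four steps above lean on Sigil tidying up stray closing tags, which plain Python does not do. The sketch below therefore reaches the same end result by a different route: it treats each empty MsoPlainText paragraph as a real paragraph boundary and merges everything between two boundaries. The function name is hypothetical, and the sketch ignores any text that sits outside MsoPlainText paragraphs (headings, lists), so it is an illustration rather than a drop-in tool.

```python
import re

EMPTY = '<p class="MsoPlainText"></p>'
PARA = re.compile(r'<p class="MsoPlainText">(.*?)</p>', re.DOTALL)

def rebuild_paragraphs(html: str, new_class: str = "description") -> str:
    """Merge the fragments between empty paragraphs into single paragraphs."""
    rebuilt = []
    for chunk in html.split(EMPTY):
        # Collect the text of every fragment in this chunk, dropping blank ones.
        pieces = [piece.strip() for piece in PARA.findall(chunk)]
        text = " ".join(piece for piece in pieces if piece)
        if text:
            rebuilt.append(f'<p class="{new_class}">{text}</p>')
    return "\n".join(rebuilt)

sample = (
    '<p class="MsoPlainText">First part of</p>'
    '<p class="MsoPlainText">a paragraph.</p>'
    '<p class="MsoPlainText"></p>'
    '<p class="MsoPlainText">Second paragraph.</p>'
)
print(rebuild_paragraphs(sample))
```

As in the original steps, lists would be flattened into ordinary paragraphs, so they are best handled separately before running anything like this.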
__________________
-- Good Reading, Walt --
Last edited by DoctorOhh; 06-15-2010 at 12:11 AM.

#13
Member
Posts: 15
Karma: 10
Join Date: Jun 2010
Location: Sofia, Bulgaria
Device: Kindle 3
OK, so I go with " [^.]</p><p class="calibre2"> ". Everything goes fine for me, but when I press Replace (with the replacement field empty) the last character of the word is deleted.
For example:
Code:
the same</p><p class="calibre2">
Code:
the sam
Help, please.

#14
Not who you think I am...
Posts: 332
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-T1
You're selecting the last character before the close paragraph tag with [^.] so when you replace you have to put it back, otherwise it gets overwritten with your replace string (nothing, in this case).
Code:
Search: ([^.])</p><p class="calibre2">
Replace: $1
cap

#15
Created Sigil, FlightCrew
Posts: 1,975
Karma: 348069
Join Date: Feb 2008
Device: Sony Reader PRS 505
Sigil uses \1, \2, \3... etc.
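The lost character and its fix can be reproduced in a few lines of Python, whose `re` module also writes backreferences in the replacement as \1. This is just a sketch of the behaviour discussed above, using the example string from the question; the variable names are made up.

```python
import re

broken = 'the same</p><p class="calibre2">'

# Without a group, the character matched by [^.] is deleted along with the tags.
lossy = re.sub(r'[^.]</p><p class="calibre2">', "", broken)

# With a group, \1 writes the captured character back.
kept = re.sub(r'([^.])</p><p class="calibre2">', r"\1", broken)

print(lossy)  # the sam
print(kept)   # the same
```

To join the two paragraphs with a space between the words, the replacement would be \1 followed by a space, as suggested earlier in the thread.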