Old 06-11-2010, 09:56 AM   #1
crutledge
eBook FANatic
Posts: 14,875
Karma: 13391746
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
Handling Broken Paragraphs

For whatever reason, in some PG (Project Gutenberg) files some paragraphs are broken into pieces. Some are one-line paragraphs.

This requires a tedious process of code editing to put them back together.

Is there a method, which I have not found, of marking contiguous paragraphs and pulling them together into a single paragraph?

This is a lot to ask for, but it would sure come in handy.
Old 06-11-2010, 10:37 AM   #2
theducks
Grand Sorcerer
Posts: 14,240
Karma: 5495470
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
If they are TXT files, I use Notepad++ "unwrap text" to help.
You highlight the paragraph and select "unwrap text" (hint: assign a keystroke; I use the + on the number pad as a simple stroke).

It is still tedious.
I expected better from PG.
Once they are converted from TXT, this is no longer an option.
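
For readers without Notepad++, the same idea can be sketched in a few lines of Python. This is only an illustration of unwrapping hard-wrapped plain text, not how Notepad++ implements its command; the function name `unwrap` and the sample text are made up for the example, and it assumes blank lines separate the real paragraphs.

```python
import re

def unwrap(text: str) -> str:
    """Join hard-wrapped lines; blank lines are kept as paragraph breaks."""
    # Split on one or more blank lines (assumed to be the true paragraph breaks).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace inside a paragraph to a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

wrapped = "This is a part of\na wrapped paragraph.\n\nThis is a second\nparagraph.\n"
print(unwrap(wrapped))
```

Like the manual approach, this only helps while the book is still a TXT file.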
Old 06-11-2010, 11:10 AM   #3
pietvo
Reader
Posts: 514
Karma: 24612
Join Date: Aug 2009
Location: Cochabamba, BO
Device: Onyx Boox 60, iPod Touch
You can do this with the replace function (in the Edit menu).
Suppose you have the following text:
Code:
This is a part of

a broken paragraph.

These are really.

Two paragraphs.
You can mark the paragraphs to be merged by putting some text at the merge points that doesn't occur anywhere in the document. This could be something like XXXXX or as I have chosen: a single strange character like §. Now go to the code view. It looks like:
Code:
  <p>This is a part of§</p>

  <p>a broken paragraph.</p>

  <p>These are really.</p>

  <p>Two paragraphs.</p>
Now copy the part from the § up to and including the <p> on the next line. Put the cursor before the first text to be merged. Then choose the Replace function from the edit menu, and paste the copied text in the 'Find what' field. It will look like
Code:
§</p>    <p>
but in reality one or more of the spaces will be a newline character. In the 'Replace with' field enter a single space character. Then hit Replace all.

This supposes that all the broken paragraphs look the same, i.e. the same number of newlines and spaces in between. If this is not the case you have to use a regular expression. Click the 'More' button and select Regular Expression, and in the 'Find what' field use:
Code:
§</p>\s*<p>
In a regular expression \s* means zero or more whitespace characters.
Sometimes there may be additional whitespace at the end of the first part or the beginning of the second part. To get rid of it, add an additional \s* at the proper places. E.g.
Code:
\s*§\s*</p>\s*<p>\s*
There may also be extra line breaks between the parts, which manifest themselves in the code view as <br /> or something similar. To get rid of these you add (<br\s*/>)? at the proper place. This means zero or one line break, where the / in the line break may be preceded by zero or more whitespace characters. So the total 'Find what' field then could become a bit more complicated:
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
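
To try the final pattern outside Sigil before running Replace All on a whole book, a minimal Python sketch follows. It uses the pattern exactly as given above and a single space as the replacement; the function name `merge_marked_paragraphs` and the sample string are assumptions for illustration, and Python's `re` module may differ in details from the regex engine Sigil uses.

```python
import re

# Pattern from the post: marker, optional whitespace, paragraph boundary,
# an optional <br/>, and more optional whitespace.
MERGE_POINT = re.compile(r"\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*")

def merge_marked_paragraphs(html: str) -> str:
    """Join the paragraphs at every § marker, leaving a single space."""
    return MERGE_POINT.sub(" ", html)

sample = (
    "  <p>This is a part of§</p>\n\n"
    "  <p>a broken paragraph.</p>\n\n"
    "  <p>These are really.</p>\n\n"
    "  <p>Two paragraphs.</p>\n"
)
merged = merge_marked_paragraphs(sample)
print(merged)
```

Only the paragraph carrying the § marker is merged; the two unmarked paragraphs are left alone. Note that the pattern only matches a bare <p>, so paragraphs with attributes would need the <p> part adjusted.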
Old 06-11-2010, 03:52 PM   #4
capidamonte
Not who you think I am...
Posts: 343
Karma: 5337
Join Date: Jan 2010
Location: Honolulu
Device: Sony PRS-350
I assume you use Windows.

There is a great old tool for unmangling text, called InterParse. It's available here on the forums.

InterParse Thread.

Bit of a funky interface, but once you get it you can do a LOT of different regex and search-replace without having to know any code.

cap
Old 06-11-2010, 04:28 PM   #5
crutledge
eBook FANatic
Code:
\s*§\s*</p>\s*(<br\s*/>)?\s*<p>\s*
Pietvo, you're a genius. I hadn't thought of using a special marker character. Now I know the solution when I run into it again.

Many thanks!
Old 06-11-2010, 08:06 PM   #6
Pablo
Guru
Posts: 722
Karma: 2541163
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
This is the feature I miss most in Sigil, and the main reason why I stick to BookDesigner.
In BookDesigner you just select "Broken sentences" in the "Element browser", and all the broken paragraphs show up. You click on each one and go straight to where the problem is.
Old 06-11-2010, 10:54 PM   #7
pietvo
Reader
If the broken paragraphs can be characterized as not ending with a ., ! or ?, then you don't even have to add a special marker character: replace the § in the regular expressions with ([^.!? ]) and put \1 followed by a space character in the 'Replace with' field. Note: there is a space character after the ? inside the brackets. It is safest to also drop the \s* in front of it; otherwise the space before a one-letter last word (such as "a") is swallowed by the match.

Now all the broken paragraphs will be found automatically.
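
A small Python sketch of this marker-free variant, for experimenting before touching a real book. The function name `merge_unterminated` and the sample strings are assumptions for illustration; the pattern is the one from this thread with § swapped for the capturing group and without a leading \s*, so that whitespace before a one-letter last word is not consumed.

```python
import re

# A captured character that is not ., !, ? or a space, followed by the
# paragraph boundary from the earlier pattern.
BROKEN = re.compile(r"([^.!? ])\s*</p>\s*(<br\s*/>)?\s*<p>\s*")

def merge_unterminated(html: str) -> str:
    """Merge a paragraph into the next one when it lacks end punctuation."""
    # \1 restores the captured character; the space joins the two parts.
    return BROKEN.sub(r"\1 ", html)

sample = (
    "<p>This is a part of</p>\n\n"
    "<p>a broken paragraph.</p>\n\n"
    "<p>These are really.</p>\n\n"
    "<p>Two paragraphs.</p>"
)
print(merge_unterminated(sample))

# A paragraph that legitimately ends in something other than . ! ?
# is merged as well, so review the result.
print(merge_unterminated("<p>He said:</p>\n<p>Nothing.</p>"))
```

The second call shows the limitation of the heuristic: any paragraph ending in a colon, a closing tag, or a line of verse is treated as broken.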
Old 06-12-2010, 05:25 AM   #8
Jellby
frumious Bandersnatch
Posts: 5,984
Karma: 4346919
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Paragraphs can also end in a :, and one should be especially careful with poetry lines.
Old 06-12-2010, 07:29 AM   #9
crutledge
eBook FANatic
This has been a very interesting discussion. Any general method of addressing broken paragraphs can be fraught with problems.

I've used BD for hundreds of PG books and never found the Broken Paragraph selection to be particularly useful. One advantage that BD does have is the HTML Fragment. A block of text can be highlighted and only the code for that fragment is presented for operations and repair. This eliminates the danger of affecting code throughout the document.

If Valloric is following this thread, perhaps he would comment on the difficulty of implementing such a capability before anyone (me?) submits an issue. It may already be in the pipeline.

Right now I'm still partial to the special character because of the Law of Unintended Consequences of regexes. There is no cut-and-try methodology as with RegexBuddy and other regex tools. Right now the programmer's best friend (UNDO) is problematic.

Thanks for all of the thought-provoking discussion.
Old 06-12-2010, 08:57 AM   #10
Valloric
Created Sigil, FlightCrew
Posts: 1,978
Karma: 350515
Join Date: Feb 2008
Device: Sony Reader PRS 505
Quote:
Originally Posted by crutledge View Post
One advantage that BD does have is the HTML Fragment. A block of text can be highlighted and only the code for that fragment is presented for operations and repair. This eliminates the danger of affecting code throughout the document.

If Valloric is following this thread, perhaps he would comment on the difficulty of implementing such a capability before anyone (me?) submits an issue. It may already be in the pipeline.
Frankly I don't see this "edit HTML fragment" feature as particularly useful in Sigil. It's useful in BD where there is no Code View and you can't sync the position of the cursor in the rendered view to the HTML code view window.

This idea that it "eliminates the danger of affecting code throughout the document"... I don't see it. You could certainly introduce problems in a fragment that could adversely affect the rest of the document.

But if you're talking about incorrect regexes sometimes affecting more than they should, that's different. A "Current selection" option for the "Look in" field in the search & replace dialog should alleviate that, and this feature is planned. Using that would make the S&R dialog operate only within the selected text.
Old 06-12-2010, 09:16 AM   #11
crutledge
eBook FANatic
Quote:
Originally Posted by Valloric View Post
But if you're talking about incorrect regexes sometimes affecting more than they should, that's different. A "Current selection" option for the "Look in" field in the search & replace dialog should alleviate that, and this feature is planned. Using that would make the S&R dialog operate only within the selected text.
Same effect. Outstanding solution.
Old 06-15-2010, 12:08 AM   #12
DoctorOhh
US Navy, Retired
Posts: 8,773
Karma: 12516053
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by crutledge View Post
For whatever reason, in some PG files some paragraphs are broken into pieces. Some are one line paragraphs.
I know most of you are far cleverer than I am, so this might help those folks who are my peers in limited ability.

I find the below scenario often enough after converting to epub from X format in calibre. I find Sigil invaluable in correcting these errors. It usually takes me a few find and replaces to get the book with broken paragraphs close to normal.

My most recent case went like this. All paragraphs were surrounded by <p class="MsoPlainText">paragraph words</p> and spaces between the paragraphs were created by <p class="MsoPlainText"></p>

1. I just replaced all <p class="MsoPlainText"></p> with <br/><br/>

2. Then I replaced (removed) all <p class="MsoPlainText"> with nothing and let Sigil clean up the closing elements.

3. Then I replaced all <br/> with </p><p> (again Sigil cleaned up the cases where there were <p></p> without words in between.)

4. Then I replaced all <p> with <p class="description"> which is the class that held my indent and margins.

After these 4 steps all paragraphs were re-formed, and the book, although not up to production-quality specs, was readable, with the paragraph indents and the spacing between paragraphs that I prefer.

Any lists that existed prior to this process required manual editing to make them whole again. It is probably best to identify lists upfront and change them so they are not caught up in this process.
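
For anyone who would rather script this, here is a rough Python sketch that aims at a comparable end result in one pass. It is not a translation of the four steps and does not reproduce Sigil's automatic cleanup: it assumes every paragraph has exactly the form shown above, that empty `MsoPlainText` paragraphs mark the real paragraph breaks, and that the lines between them should be joined with a space. The function name `rebuild` is made up for the example; the class name `description` is the one from step 4.

```python
import re

EMPTY = '<p class="MsoPlainText"></p>'
PARA = re.compile(r'<p class="MsoPlainText">(.*?)</p>', re.S)

def rebuild(html: str, new_class: str = "description") -> str:
    """Join the lines between empty paragraphs into one paragraph each."""
    out = []
    # Each chunk between two empty paragraphs is treated as one real paragraph.
    for chunk in html.split(EMPTY):
        pieces = [m.strip() for m in PARA.findall(chunk) if m.strip()]
        if pieces:
            out.append(f'<p class="{new_class}">{" ".join(pieces)}</p>')
    return "\n".join(out)

sample = (
    '<p class="MsoPlainText">First line of</p>\n'
    '<p class="MsoPlainText">one paragraph.</p>\n'
    '<p class="MsoPlainText"></p>\n'
    '<p class="MsoPlainText">Second paragraph.</p>'
)
print(rebuild(sample))
```

As with the manual steps, anything that is not a plain paragraph (lists, headings, images) is dropped or mangled by this sketch, so set such content aside first.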

Last edited by DoctorOhh; 06-15-2010 at 12:11 AM.
Old 06-21-2010, 06:46 PM   #13
deckoff
Member
Posts: 22
Karma: 10
Join Date: Jun 2010
Location: Sofia, Bulgaria
Device: Kindle 3
OK, so I search for [^.]</p><p class="calibre2"> and everything works fine for me, but when I press Replace (with the replacement left empty), the last character of the word is deleted.
For example:
Code:
the same</p><p class="calibre2">
=
Code:
the sam
What should I put in the Replace field to keep my last character?
Help, please.
Old 06-21-2010, 07:17 PM   #14
capidamonte
Not who you think I am...
You're selecting the last character before the close paragraph tag with [^.] so when you replace you have to put it back, otherwise it gets overwritten with your replace string (nothing, in this case).

Code:
Search: ([^.])</p><p class="calibre2">
Replace: $1
$1 can also be \1 on some regex systems. The enclosing parentheses mark the match for reuse in replacements. Every successive set of parentheses (up to 9, I think) can be returned to your replacement string via $1, $2, etc.
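
The lost character and the capture-group fix can be reproduced in a short Python sketch. The sample string is made up; Python's `re` module writes the back-reference as \1 rather than $1.

```python
import re

text = 'the same</p><p class="calibre2"> next words'

# Without a group, the character matched by [^.] is replaced along with the tags.
lost = re.sub(r'[^.]</p><p class="calibre2">', "", text)

# With a capturing group, \1 puts that character back into the result.
kept = re.sub(r'([^.])</p><p class="calibre2">', r"\1", text)

print(lost)  # the sam next words
print(kept)  # the same next words
```

If the second paragraph does not start with a space, put a space after \1 in the replacement so the two words do not run together.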

cap
Old 06-21-2010, 07:41 PM   #15
Valloric
Created Sigil, FlightCrew
Quote:
Originally Posted by capidamonte View Post
$1 can also be \1 on some regex systems. The enclosing parentheses mark the match for reuse in replacements. Every successive set of parentheses (up to 9, I think) can be returned to your replacement string via $1, $2, etc.
Sigil uses \1, \2, \3... etc.