Old 10-29-2012, 05:57 AM   #1
SanatyrZeo
Junior Member
 
Posts: 4
Karma: 10
Join Date: Oct 2012
Device: pc, kindle
Regex find and replace

Hi, I'm trying to clean up a PDF converted to EPUB in Calibre that is filled with line breaks. Using Sigil's regex search/replace, is there any expression I could use in Replace that will keep the character matched from the original string? I can fix most of the line breaks instantly with Replace All, but I lose the character found by the search as well. See below: I'm looking for a line break followed by a lowercase letter (and whatever formatting code is in between).

Search:
</p>

<p class="calibre1">[(a-w)]

Replace:
" "

Result:
The cat drank from the
milk bowl

becomes

The cat drank from the ilk bowl
Old 10-29-2012, 06:11 AM   #2
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Search:
</p>

<p class="calibre1">([a-w])

Replace:
\1

There's a space before the \1

Do you mean to exclude words that start with a lowercase x, y, or z on purpose? Why not [a-z]?
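
For anyone who wants to try the idea outside Sigil, here is a minimal sketch using Python's `re` module. That is an assumption made for illustration: Sigil has its own regex engine, but a capturing group plus `\1` works the same way for a pattern this simple. The sample markup and variable names are made up, and `\s*` stands in for the literal line break in the Find text above.

```python
import re

# Hypothetical sample of the broken markup described in this thread.
html = (
    '<p class="calibre1">The cat drank from the</p>\n\n'
    '<p class="calibre1">milk bowl</p>'
)

# The parentheses capture the first lowercase letter of the next paragraph.
# \s* is an assumption standing in for the literal line break(s).
pattern = re.compile(r'</p>\s*<p class="calibre1">([a-z])')

# Replacement: a space followed by \1, which puts the captured letter back.
joined = pattern.sub(r' \1', html)

print(joined)
# → <p class="calibre1">The cat drank from the milk bowl</p>
```

The "m" of "milk" survives because it is written back by `\1` instead of being thrown away with the rest of the match.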
Old 10-29-2012, 06:25 AM   #3
SanatyrZeo
Junior Member
 
Posts: 4
Karma: 10
Join Date: Oct 2012
Device: pc, kindle
Oh, I meant a-z; it was a typo.
I tried what you said and it replaced the character with a literal \1, giving '\1ilk'.
Old 10-29-2012, 06:40 AM   #4
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
If you get \1 in your text after a replace, it means you haven't correctly specified the () parentheses in your Find text. For instance, in the very first search you posted, you had the brackets the wrong way around: it should have been ([a-z]) instead of the [(a-z)] you typed.
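
The difference between the two bracket orders can be checked with a small sketch in Python's `re` module (an assumption for illustration, not Sigil's engine). One behaviour differs: with no group to refer to, Python raises an error for `\1`, whereas Sigil, as reported above, inserted a literal \1.

```python
import re

# Square brackets outermost: one character class that matches "(", ")" or a-z.
# Nothing is captured, so there is no group 1.
wrong = re.compile(r'[(a-z)]')

# Parentheses outermost: a capturing group wrapped around the class a-z.
right = re.compile(r'([a-z])')

print(wrong.groups)  # number of capturing groups in the pattern
print(right.groups)

# With no group 1 available, Python refuses the \1 reference.
try:
    wrong.sub(r' \1', 'milk')
    raised = False
except re.error:
    raised = True

print(raised)
```

Here `wrong.groups` is 0 and `right.groups` is 1, which is the quickest way to confirm whether a Find expression really contains the group that `\1` expects.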

If you are using Sigil 0.6 then I instead recommend you right-click on the "Find" box, and under "Example Searches" choose "Join Paragraphs".

It isn't quite the same as the case you are trying to catch, but most of the time it will achieve the same thing (or improve on it). The difference is that the expression in this example search looks for sentences with unfinished endings, whereas yours looks for sentences with unfinished beginnings. There are still some edge cases it will not catch, such as conversation text that has a finished sentence but unclosed quotes, but it is better than most. And unlike your approach, it will catch a situation like this:

<p>The reason</p> <p>Bob did this was...

Of course, since the original PDF may have OCR errors (like stray commas), or there may be genuine reasons for the text to start a new paragraph (like poetry), you should never do a blanket Replace All with such an expression, but it is better than starting from scratch.
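
As a rough illustration of reviewing matches instead of replacing everything blindly, here is a sketch in Python. The pattern is hypothetical and is not the expression behind Sigil's "Join Paragraphs" example search. It simply treats a paragraph that ends in a letter or a comma as an "unfinished ending" and lists each hit with some context.

```python
import re

# Hypothetical pattern (NOT Sigil's built-in expression): a paragraph whose
# last character is a letter or comma, followed by the next opening <p ...> tag.
UNFINISHED_END = re.compile(r'([A-Za-z,])</p>\s*<p[^>]*>')

def join_candidates(html):
    """Return (offset, context) pairs so each join can be reviewed by hand."""
    found = []
    for m in UNFINISHED_END.finditer(html):
        start = max(0, m.start() - 20)
        end = min(len(html), m.end() + 20)
        found.append((m.start(), html[start:end]))
    return found

# Made-up sample: only the first paragraph break is a likely false break.
sample = (
    '<p>The reason</p> <p>Bob did this was unclear.</p>\n'
    '<p>He left.</p>\n'
    '<p>Nobody asked why.</p>'
)

candidates = join_candidates(sample)
for offset, context in candidates:
    print(offset, repr(context))
```

Only the break after "The reason" is reported. Poetry or OCR damage would still produce false hits, which is why the list is meant for manual review.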

Last edited by kiwidude; 10-29-2012 at 06:57 AM. Reason: Missing slash
Old 10-29-2012, 06:53 AM   #5
SanatyrZeo
Junior Member
 
Posts: 4
Karma: 10
Join Date: Oct 2012
Device: pc, kindle
Wow, thanks. I did have the parentheses and brackets mixed up.

Yeah, I have searched both ways, with lowercase letters on either side of the line break, and I also run searches for conversations (,"), commas, and hyphens. Then I run a search for uppercase letters, which I'll probably have to review by hand because it could be a dropped period rather than a line break. Normally I try to do it all by hand, but this is a 900-page document. I'll probably ruin poetry if I replace all, but I can fix it manually, I suppose.
But thanks, this will make it much quicker now.

As for the ..., if I confine myself to a-z and ," I won't mess that up. I usually run a manual search for ... followed by capital letters and check whether it is the end of a sentence or a mid-sentence pause.

Last edited by SanatyrZeo; 10-29-2012 at 06:57 AM. Reason: additions
Old 10-29-2012, 07:03 AM   #6
SanatyrZeo
Junior Member
 
Posts: 4
Karma: 10
Join Date: Oct 2012
Device: pc, kindle
OMG, 3000 changes at once; I thought Sigil was going to hang for a second!

Also, I think searching for lowercase letters AFTER a line break will help avoid joining lines that are meant to end with no punctuation, like quote attributions or chapter names.
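
A quick way to check that reasoning is this sketch in Python's `re` module (an assumption for illustration, since Sigil uses its own engine). The sample text is made up. `subn` returns the number of replacements along with the new text, so it is easy to see which breaks were joined.

```python
import re

# Hypothetical sample: a chapter title with no end punctuation, a sentence
# broken across two paragraphs, and a normal paragraph.
html = (
    '<p class="calibre1">Chapter One</p>\n'
    '<p class="calibre1">The cat drank from the</p>\n'
    '<p class="calibre1">milk bowl.</p>\n'
    '<p class="calibre1">Then it slept.</p>'
)

# Only join when the NEXT paragraph starts with a lowercase letter.
after_break = re.compile(r'</p>\s*<p class="calibre1">([a-z])')

joined, count = after_break.subn(r' \1', html)

print(count)
print(joined)
```

Only the break before "milk bowl." is joined. "Chapter One" stays on its own line because the paragraph after it starts with a capital letter.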

Last edited by SanatyrZeo; 10-29-2012 at 07:06 AM. Reason: ps