MobileRead Forums > E-Book Software > Sigil
12-15-2013, 02:54 PM   #1
MizSuz
Connoisseur
 
Posts: 63
Karma: 10
Join Date: Jul 2011
Device: Sony Touch, Nook Simple Touch, Kobo Aura, Android w/CoolReader
Replacing Periods with Commas

Greetings!

I've got an EPUB that started life as a PDF, I think. It fairly often has a period where a comma should be. Aside from being able to tell from context, there is the added bonus that the first letter of the next "sentence" is lowercase.

I've found that I can search for these instances using:

Code:
[.] [a-z]
It works wonderfully, selecting the period, the space, and the first letter of the next sentence if that letter is lowercase.
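For readers who want to try this outside Sigil, here is a minimal Python sketch of the search, assuming the standard `re` module behaves like Sigil's PCRE-based engine for a pattern this simple. The sample sentence is invented. It shows the pattern catching the bad period and skipping a real sentence break that starts with a capital.

```python
import re

# Made-up sample: "quickly. then" should read "quickly, then";
# "stopped. The" is a real sentence break and must be left alone.
text = "She left quickly. then she stopped. The door was open."

# The search from the post: a period (literal inside [ ]), a space,
# and one lowercase letter.
pattern = r"[.] [a-z]"

matches = re.findall(pattern, text)
print(matches)  # ['. t']
```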

However, if I try to use

Code:
[,] [a-z]
as the replacement to turn the period into a comma, I end up with that literal text in place of the period, space, and letter.
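A minimal Python sketch of why the replacement goes wrong, assuming Python's `re.sub` treats this simple case the way Sigil's Replace field does (the sample text is invented): the replacement is inserted as plain text apart from back-references such as `\1`, so `[,] [a-z]` is not a pattern there at all.

```python
import re

text = "She left quickly. then she stopped."

# The replacement string is inserted as-is (apart from back-references
# like \1), so "[,] [a-z]" lands in the text literally and replaces the
# period, the space, AND the matched letter.
broken = re.sub(r"[.] [a-z]", "[,] [a-z]", text)
print(broken)  # She left quickly[,] [a-z]hen she stopped.
```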

What am I doing wrong, specifically? Obviously my regex sucks; I'm just not sure how to fix it.
12-15-2013, 03:38 PM   #2
Tex2002ans
Wizard
 
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
You were off to a FANTASTIC start (especially if you are new to regex).

Also, take a look through the Regex topic in the Sigil forum; there is A TON of helpful regex in there:

https://www.mobileread.com/forums/sho...d.php?t=167971

A period in regex stands for "any character", so to catch an ACTUAL period, you have to escape it with a backslash '\'.
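A quick way to see the difference, sketched with Python's `re` module (assumed here as a stand-in for Sigil's engine; the sample text is made up). An unescaped `.` happily matches `!` or an ordinary letter before a space, while `\.` matches only a real period. Inside square brackets a period is already literal, which is why the original `[.] [a-z]` search worked too.

```python
import re

text = "Stop! then go. now"

# Unescaped "." means "any character", so it also matches "!" and "n".
any_char = re.findall(r". [a-z]", text)
# "\." (or "[.]") matches only a real period.
escaped = re.findall(r"\. [a-z]", text)
bracketed = re.findall(r"[.] [a-z]", text)

print(any_char)   # ['! t', 'n g', '. n']
print(escaped)    # ['. n']
print(bracketed)  # ['. n']
```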

Search:

Code:
\. ([a-z])
Replace:

Code:
, \1
What you want to do is "capture" the lowercase letter after the period, and you do this by putting parentheses around what you want to capture: ([a-z]).

So, this regex in English says:

"Search for a period, then a space, then capture one lowercase letter (a through z) and stick it in \1."

"Replace with a comma, a space, and then whatever lowercase letter was captured in \1."

More complex regexes might capture a lot more things, and then you would be able to use \2, \3, \4, ...
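The same search and replace as a minimal Python sketch (Python's `re.sub` also uses `\1`, `\2`, ... for captured groups in the replacement; Sigil's engine is assumed to behave the same here). The sample sentence and the two-group variant are hypothetical examples, not patterns from the Regex topic.

```python
import re

text = "She left quickly. then she stopped. The door was open."

# The fix from the post: literal period, space, capture one lowercase
# letter as group 1; replace with comma, space, and group 1.
fixed = re.sub(r"\. ([a-z])", r", \1", text)

# Hypothetical two-group variant: group 1 is the lowercase word before the
# period, group 2 the lowercase letter after the space.
fixed_two = re.sub(r"([a-z]+)\. ([a-z])", r"\1, \2", text)

print(fixed)      # She left quickly, then she stopped. The door was open.
print(fixed_two)  # same result for this sample
```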

Quote:
Originally Posted by MizSuz
I've got an epub that started life as a pdf, I think. It has, with some frequency, occurrences in which a period appears where a comma should be.
Indeed... usually this is just due to a crappy scan, fully automated OCR (like Archive.org's), or a really crappy converter.

Is this a public domain work? After you are done cleaning it up, you should post it on MobileRead!

Last edited by Tex2002ans; 12-15-2013 at 03:42 PM.
12-15-2013, 03:56 PM   #3
MizSuz
Connoisseur
 
Posts: 63
Karma: 10
Join Date: Jul 2011
Device: Sony Touch, Nook Simple Touch, Kobo Aura, Android w/CoolReader
Holy smokes! Not only have you helped me with the correct, properly working regex, but you've given me one of the clearest, easiest-to-understand explanations of regex I've ever read! I actually learned several things from your explanation above and beyond the bit of code I was searching for.

THANK YOU!

I'm afraid the book is not public domain, or I would share. It's not in terrible condition, but there are places where it is obvious it was a PDF or a scan in a former life.

Wow. I really appreciate your help.
12-15-2013, 06:38 PM   #4
theducks
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
And don't be fooled. This is 100% applicable to Sigil:
https://www.mobileread.com/forums/sho...d.php?t=118570

(This is the one that beat REGEX into my thick skull)
12-15-2013, 09:46 PM   #5
Tex2002ans
Wizard
 
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by MizSuz
Holy smokes! Not only have you helped me with the correct and properly working regex but you've given me one of the clearest and easiest to understand explanations for regex functions I've ever read! I actually learned several things in your explanation above and beyond the bit of code I was searching for.
You are welcome. Any time you need regex help, just ask. There seem to be a bunch of regex gurus in this Sigil section.

If you need more in-depth explanations of any of the regexes in the Regex topic as well, feel free to ask (I can see if I can figure them out, color-code them, and give "in English" explanations).

I seem to be one of the few who color-code their regex explanations... I find that much more understandable (although it takes a little longer to lay out, plan, and post). Especially when dealing with capture groups, it is nice to see the colors match between input and output.

Always remember, though, to save a copy of the EPUB before you run any regex, and never use REPLACE ALL unless you have been using that regex for a long time and know EXACTLY what it will do. Regex is extremely powerful, and it is very easy to mess up (even the best of us sometimes make typos), so I always do a few rounds of replace/undo, replace/undo, just to make sure it is doing what I want.
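The replace/undo habit can be approximated with a dry run that lists every proposed change before anything is touched. This is a minimal Python sketch: `preview_replacements` is a hypothetical helper, not a Sigil feature, and the sample text is invented.

```python
import re

def preview_replacements(text, search, replace, context=15):
    """Hypothetical helper: return (before, after) snippets for every match
    without modifying the text, so each change can be checked by eye."""
    previews = []
    for m in re.finditer(search, text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        before = text[start:end]
        # m.expand() fills \1 etc. in the replacement template for this match.
        after = text[start:m.start()] + m.expand(replace) + text[m.end():end]
        previews.append((before, after))
    return previews

text = "Bring pears, plums, etc. and go home. then rest."
previews = preview_replacements(text, r"\. ([a-z])", r", \1")
for before, after in previews:
    print(f"{before!r}\n -> {after!r}")
```

The first preview turns "etc. and" into "etc, and", which is exactly the kind of false positive a blind REPLACE ALL would slip into the book.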

Also helpful is the Sigil Saved Search feature:

http://web.sigil.googlecode.com/git/..._searches.html

It lets you save a list of searches/regexes and easily load and run them.
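The idea behind a saved-search list, roughly sketched in Python: a named list of find/replace pairs that can be run one at a time. This is an assumed, simplified analogue for illustration only; `SAVED_SEARCHES` and `run_saved_search` are hypothetical names, this is not how Sigil stores its saved searches, and the double-space entry is an invented example.

```python
import re

# Hypothetical saved-search list of (name, search, replace) entries.
# This only mimics the idea; it is not Sigil's storage format.
SAVED_SEARCHES = [
    ("collapse-double-spaces", r"  +", " "),
    ("period-to-comma", r"\. ([a-z])", r", \1"),
]

def run_saved_search(text, name):
    """Apply the saved search called `name`; return (new_text, count)."""
    for saved_name, search, replace in SAVED_SEARCHES:
        if saved_name == name:
            return re.subn(search, replace, text)
    raise KeyError(f"no saved search named {name!r}")

text = "He ran.  then  he walked."
text, n_spaces = run_saved_search(text, "collapse-double-spaces")
text, n_commas = run_saved_search(text, "period-to-comma")
print(text)  # He ran, then he walked.
```

Order matters here: run first, the period-to-comma search would miss "ran.  then" because it expects exactly one space.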

Quote:
Originally Posted by MizSuz
I'm afraid the book is not public domain or I would share. It's not in terrible condition but there are places where it is obvious it was a pdf or a scan in a former life.
Understandable... I hate it when you purchase a book and get a crappy-quality conversion. The thing that stinks about having to edit a book you want to read, though (especially fiction), is that you risk spoiling it for yourself beforehand!

Quote:
Originally Posted by theducks
And don't be fooled. This is 100% applicable to Sigil
https://www.mobileread.com/forums/sho...d.php?t=118570

(This is the one that beat REGEX into my thick skull)
This is where I learned most of my regex (plus seeing a few examples in the sticky Regex topic):

http://www.regular-expressions.info/tutorial.html

Nowadays, I mostly just come up with regexes off the top of my head, based on the patterns of errors I recognize while looking through a book.

Last edited by Tex2002ans; 12-15-2013 at 09:51 PM.
12-16-2013, 12:28 AM   #6
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
As far as global replace goes, you may wish to consider moving over to calibre's new ebook-edit feature. Since Sigil is no longer being maintained, it is rather difficult to get new features, whereas Kovid is very actively maintaining calibre.

The important thing to consider is this: from the beginning, calibre's ebook-edit has had a very cool feature called global undo/redo. Every time a global action is performed, a checkpoint is created that lets you roll back those changes. You can also create checkpoints manually.
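To illustrate the checkpoint idea, here is a minimal Python sketch of a text buffer that saves its previous state before every global replace, so the change can be rolled back. The `CheckpointedText` class is hypothetical and is not how calibre implements its checkpoints; the sample text is invented.

```python
import re

class CheckpointedText:
    """Hypothetical sketch of checkpoint-style global undo (not calibre's code)."""

    def __init__(self, text):
        self.text = text
        self._checkpoints = []  # stack of (label, saved_text)

    def create_checkpoint(self, label):
        """Manually save the current state under a label."""
        self._checkpoints.append((label, self.text))

    def replace_all(self, search, replace):
        """Global action: checkpoint first, then replace every match."""
        self.create_checkpoint(f"before replace-all {search!r}")
        self.text, count = re.subn(search, replace, self.text)
        return count

    def revert(self):
        """Roll back to the most recent checkpoint and return its label."""
        label, saved = self._checkpoints.pop()
        self.text = saved
        return label

original = "Bring pears, etc. and go home. then rest."
book = CheckpointedText(original)
count = book.replace_all(r"\. ([a-z])", r", \1")
after_replace = book.text  # "Bring pears, etc, and go home, then rest."
label = book.revert()      # text is back to the original
```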

It is not finished yet, but the editor already has all the basic functionality, and soon we will have saved searches, clips, a formatting toolbar, and all kinds of Sigil goodies, as soon as Kovid gets a chance to code them! He started work less than two months ago, so what we have already is huge, considering.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.