MobileRead Forums > E-Book Software > Sigil
09-27-2012, 08:35 PM   #151
DiapDealer
Grand Sorcerer
Posts: 9,408
Karma: 43171350
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
The HTML code looks like:

Code:
<p class="calibre"><span>bad policy to answer a</span></p>

<p class="calibre"><span>direct question. He kept shaking his head like a china figure.
Ugh. Those empty spans surrounding literally everything are always a pain in the ass. You'll almost surely need to get rid of them first. The problem is ... there can be nested spans (italics/bolds/etc) within them. And that makes it quite painful to regex them away (without funkifying your "real" formatting spans).
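To show why nested spans make this painful, here is a minimal Python sketch. The sample markup is made up for illustration and is not from the book above. A lazy `<span>(.*?)</span>` match pairs the outer (empty) opening tag with the *inner* closing tag, so the italic span ends up stretched over text it never covered.

```python
import re

# Hypothetical calibre-style markup: an empty wrapper span around an italic span.
html = '<p><span><span class="italic">word</span> more</span></p>'

# Naive attempt: unwrap every bare <span>...</span> pair with a lazy match.
# The lazy group stops at the FIRST </span>, which belongs to the inner span,
# so the inner span's closing tag is consumed and the outer one is left behind.
unwrapped = re.sub(r'<span>(.*?)</span>', r'\1', html)

print(unwrapped)
# <p><span class="italic">word more</span></p>   <- italics now cover " more"
```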

If I have the original text to proof against, I sometimes find it easier (and less frustrating) just to blast ALL the spans away. Every single one. And then redo any italic and/or other special formatting using the physical copy as a guide. It's drastic, yes, but sometimes it's less drastic than fixing the havoc that a regex run on nested spans can wreak.

In one fell swoop, all span tags (opening and closing) ... gone (when you replace the match with nothing, of course):
Code:
</?span[^>]*?>
It all depends on the complexity of the book's formatting, of course. I may not always opt for the "nuclear" span removal approach, but I've done it quite a few times.

Use with an appropriate level of trepidation, of course...
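For anyone scripting this outside Sigil, a minimal Python sketch of the same "nuclear" replacement follows; the function name is just for illustration. Note that the italic span in the sample is flattened too, which is exactly why this approach only makes sense when you can redo the formatting from a physical copy.

```python
import re

# Matches every opening tag (<span>, <span class="...">) and every </span>.
# The lazy *? makes no practical difference here: [^>] can't run past the '>'.
SPAN_TAG = re.compile(r'</?span[^>]*?>')

def strip_all_spans(html: str) -> str:
    """Remove all span tags but keep the text inside them."""
    return SPAN_TAG.sub('', html)

sample = ('<p class="calibre"><span>bad policy to answer a</span></p>\n'
          '<p class="calibre"><span>direct <span class="italic">question</span>.</span></p>')
print(strip_all_spans(sample))
# <p class="calibre">bad policy to answer a</p>
# <p class="calibre">direct question.</p>
```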

Last edited by DiapDealer; 09-27-2012 at 09:44 PM.
09-28-2012, 01:15 AM   #152
JMikeD
Evangelist
Posts: 452
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
Quote:
Originally Posted by DiapDealer

Use with an appropriate level of trepidation, of course...
It's probably just as easy to export the entire thing to RTF, clean everything up in OpenOffice and use the ePub Export extension in OO. That gives pretty clean results.
09-28-2012, 05:01 AM   #153
Jellby
frumious Bandersnatch
Posts: 6,253
Karma: 4801165
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I'd first identify the spans that actually do something (search for "<span " with the trailing space, which only finds spans that have attributes) and replace them with something more meaningful (<i>, <strong>...) or with some other temporary mark, then delete the remaining bogus spans.
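A minimal Python sketch of this two-pass idea. It assumes, hypothetically, that the meaningful spans are the ones with class "italic" or "bold" (check your own stylesheet for the real class names) and that they contain no further tags; the mapping and function name are made up for illustration.

```python
import re

# Hypothetical mapping from span classes to semantic tags; adjust to your book's CSS.
CLASS_TO_TAG = {"italic": "i", "bold": "strong"}

def convert_then_strip(html: str) -> str:
    # Pass 1: turn each meaningful span (known class, no nested tags inside)
    # into its semantic equivalent.
    for cls, tag in CLASS_TO_TAG.items():
        html = re.sub(rf'<span class="{cls}">([^<]*)</span>',
                      rf'<{tag}>\1</{tag}>', html)
    # Pass 2: any spans left over are the bogus wrappers, so delete them all.
    return re.sub(r'</?span[^>]*>', '', html)

print(convert_then_strip('<p><span>a <span class="italic">word</span> here</span></p>'))
# <p>a <i>word</i> here</p>
```

Doing the meaningful spans first is what makes the second pass safe: by then every remaining `</span>` belongs to a wrapper you want gone.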
09-29-2012, 04:01 PM   #154
WS64
Posts: 585
Karma: 506380
Join Date: Aug 2010
Location: Germany
Device: Kobo Aura / Bookeen Frontlight / Kobo Mini / Kindle 3 / Nook Color
I would remove ALL bare <span> tags (the ones without attributes) and let Tidy remove the corresponding closing spans.

Then search for
</p>

<p class="calibre">([a-z])
and replace it with
_\1
(where _ stands for a space)

Also search for
([a-zA-Z,])</p>

<p class="calibre">
and replace it with
\1_
(where _ stands for a space)
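The two joins above, sketched in Python on the sample from the start of this discussion (assuming the spans are already gone). One assumption of mine: `\s*` stands in for the blank line between `</p>` and `<p class="calibre">`, so the patterns also tolerate other whitespace there. The pattern and function names are illustrative only.

```python
import re

# A paragraph break followed by a lowercase letter: the sentence continues.
JOIN_BEFORE_LOWER = re.compile(r'</p>\s*<p class="calibre">([a-z])')
# A paragraph ending in a letter or comma, i.e. without closing punctuation.
JOIN_AFTER_UNFINISHED = re.compile(r'([a-zA-Z,])</p>\s*<p class="calibre">')

def join_broken_paragraphs(html: str) -> str:
    html = JOIN_BEFORE_LOWER.sub(r' \1', html)     # "_\1" in the post
    return JOIN_AFTER_UNFINISHED.sub(r'\1 ', html)  # "\1_" in the post

html = ('<p class="calibre">bad policy to answer a</p>\n\n'
        '<p class="calibre">direct question.</p>')
print(join_broken_paragraphs(html))
# <p class="calibre">bad policy to answer a direct question.</p>
```

As with any paragraph-joining regex, legitimate paragraphs that happen to end without punctuation (headings, verse) will be merged too, so it is worth stepping through the matches rather than replacing all.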
10-23-2012, 08:14 AM   #155
mrmikel
Color me gone
Posts: 2,086
Karma: 1444487
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Just a very simple expression for finding instances of a period followed by a space and then a lowercase letter, caused by poor OCR.

\. ([a-z])

Not a candidate for auto search and replace because it matches abbreviations, too.
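Since this one is for review rather than replace-all, here is a minimal Python sketch that lists each hit with a little context; the sample sentence is made up. The second hit is the abbreviation "e.g.", which is exactly the false positive mentioned above.

```python
import re

# A period, one space, then a lowercase letter.
SUSPECT = re.compile(r'\. ([a-z])')

text = "It was late. he left. See e.g. the notes."
for m in SUSPECT.finditer(text):
    start = max(m.start() - 10, 0)
    # Print a window around each match so it can be judged by eye.
    print(repr(text[start:m.end() + 10]))
```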
10-24-2012, 02:10 AM   #156
Toxaris
Wizard
Posts: 3,086
Karma: 5658305
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Quote:
Originally Posted by WS64
I would remove ALL <span> (without anything behind) and let Tidy remove the corresponding closing spans.
I would strongly recommend against that. If there are nested spans, Tidy doesn't always remove the correct closing span. That can make a real mess out of your book.

It is never a good idea to trust Tidy to make the right choice....
10-25-2012, 01:58 AM   #157
roger64
Wizard
Posts: 1,456
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
Now that Sigil has a nice Search Editor, I can add some more regexes.
I would like to set up this one:

It's about superscript text:

The strings me, mes, lle, lles, er, e and o, when placed within a sup tag and followed by a normal space, should instead be followed by a &nbsp;.

For example, <sup>me</sup> followed by a normal space should be replaced by
<sup>me</sup>&nbsp;

(me and lle are the superscript endings of Mme and Mlle, short for Madame and Mademoiselle...)

I hope I have been clear enough...
10-25-2012, 06:29 AM   #158
Perkin
Guru
Posts: 645
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD
Search (put a space at end after </sup>)
Code:
<sup>(me|mes|lle|lles|er|e|o)</sup>
Replace
Code:
<sup>\1</sup>&nbsp;
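The same search and replace as a minimal Python sketch (function name is illustrative). One small change from the post: the trailing space is written as `[ ]`, which matches exactly one space but stays visible in the pattern, so it can't be lost when copying. Alternation order is not a problem here: for "mes", the engine tries "me", fails on the following `</sup>`, and backtracks to "mes".

```python
import re

# Perkin's search, with the trailing space made visible as [ ].
SUP_SPACE = re.compile(r'<sup>(me|mes|lle|lles|er|e|o)</sup>[ ]')

def nbsp_after_sup(html: str) -> str:
    return SUP_SPACE.sub(r'<sup>\1</sup>&nbsp;', html)

print(nbsp_after_sup('M<sup>me</sup> Dupont et M<sup>lles</sup> Martin'))
# M<sup>me</sup>&nbsp;Dupont et M<sup>lles</sup>&nbsp;Martin
```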
10-25-2012, 06:57 AM   #159
roger64
Wizard
Posts: 1,456
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
@Perkin

Thanks a lot for your help. I did not know how to deal with the "false" words like me, mes...
Already in use.

Last edited by roger64; 10-25-2012 at 10:30 AM.
12-23-2012, 09:47 AM   #160
roger64
Wizard
Posts: 1,456
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
To replace hyphen with non-breaking hyphen

Hi

I'm trying to set up a regex for French.

We have some acronyms joined with hyphens (8208, i.e. U+2010 HYPHEN), such as
J.-C., P.-D.G. (and the list can grow). They always get unhappily broken at the hyphen at the end of a line, and it would be much better if they were not. That's why I would like to replace their hyphens with non-breaking hyphens (8209, U+2011).
I do not know how to set up this regex. Ideally, I would like to be able to easily add new words one at a time.

I suspect there is a better way than this.
(I wrote only 8208 instead of the full &#...)

Search: (J.|P.)8208(C.|D.G)
Replace: \18209\2
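One way to get the "just add one new word" behaviour, sketched in Python: keep the acronyms in a list and build the search from it. The list and function names are hypothetical; the two entries are the ones from this post, written with the `&#8208;` entity as it appears in the source. `re.escape` takes care of the periods, which the attempt above leaves unescaped (an unescaped `.` matches any character).

```python
import re

# Hypothetical list of acronyms to protect; add new entries one at a time.
ACRONYMS = ["J.&#8208;C.", "P.&#8208;D.G."]

# Longest first, so an acronym that is a prefix of another can't win early.
PATTERN = re.compile("|".join(re.escape(a)
                              for a in sorted(ACRONYMS, key=len, reverse=True)))

def protect_acronyms(html: str) -> str:
    # Inside each matched acronym, swap hyphen 8208 for non-breaking hyphen 8209.
    return PATTERN.sub(lambda m: m.group(0).replace("&#8208;", "&#8209;"), html)

print(protect_acronyms("en 70 av. J.&#8208;C., le P.&#8208;D.G."))
# en 70 av. J.&#8209;C., le P.&#8209;D.G.
```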
12-23-2012, 10:43 AM   #161
Jellby
frumious Bandersnatch
Posts: 6,253
Karma: 4801165
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Are there instances of a hyphen after a period that you do not want to replace? If there aren't, you can just replace all ".-" with ".¬" (where I use ¬ for the non-breaking hyphen), with appropriate escaping of the period if needed.
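A minimal Python sketch of this blanket replacement, working on the characters themselves rather than on `&#8208;` entity text. Accepting both an ASCII hyphen-minus and U+2010 after the period is my own assumption, since the post above mentions 8208 hyphens; the function name is illustrative.

```python
import re

NB_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN (&#8209;)

def nb_hyphen_after_period(text: str) -> str:
    # "\." is the escaped period (a bare "." would match any character);
    # [-\u2010] accepts an ASCII hyphen-minus or U+2010 HYPHEN.
    return re.sub(r"\.[-\u2010]", "." + NB_HYPHEN, text)

print(nb_hyphen_after_period("70 av. J.-C."))
```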
12-23-2012, 10:46 AM   #162
Doitsu
Wizard
Posts: 2,027
Karma: 4836606
Join Date: Dec 2010
Device: Kindle PW2
I'm sure that the Regex gurus will come up with a much more efficient Regex, but I'd simply search for a capital letter with a period followed by &#8208; and another capital letter followed by a period:

Find: ([[:upper:]]\.)&#8208;([[:upper:]]\.)
Replace: \1&#8209;\2

This should work in Sigil and any other editor with PCRE support.
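Python's `re` module is not PCRE and has no POSIX classes like `[[:upper:]]`, so a minimal Python version of this search has to substitute `[A-Z]`. That substitution is ASCII-only: accented capitals such as É would not match. The function name is illustrative.

```python
import re

# Doitsu's pattern with [A-Z] standing in for [[:upper:]].
PATTERN = re.compile(r'([A-Z]\.)&#8208;([A-Z]\.)')

def to_nb_hyphen(html: str) -> str:
    return PATTERN.sub(r'\1&#8209;\2', html)

print(to_nb_hyphen("J.&#8208;C. et P.&#8208;D.G."))
# J.&#8209;C. et P.&#8209;D.G.
```

For P.-D.G. the match only covers "P.&#8208;D.", which is enough: the trailing "G." is left untouched.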
12-23-2012, 11:46 PM   #163
roger64
Wizard
Posts: 1,456
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
Hi

Like for many things, I gather experience book after book. After preparing a history book, I realized that using a hyphen for J.-C. (70 occurrences of it in one book) was NOT a nice idea.

I have no idea how many words of this kind I may find, and I am really not sure that all occurrences of .- deserve the same treatment. That's why I first thought of adding them one by one.

But in fact, I realize there does not seem to be a very big risk in trying your solutions. So I will try them. Thanks for them.

And enjoy a Merry Christmas.
12-24-2012, 04:36 AM   #164
Jellby
frumious Bandersnatch
Posts: 6,253
Karma: 4801165
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Well, try searching for ".-" first and see which occurrences you find. With any luck you'll see they all want to be non-breaking, or you may see a pattern (like Doitsu's suggestion) and find some typos.
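The "look before you replace" step as a minimal Python sketch: it collects every ".-" with a few characters of context on each side so the hits can be checked by eye. The function name and the `width` parameter are illustrative choices.

```python
import re

def review_period_hyphens(text: str, width: int = 15) -> list:
    """Return each ".-" occurrence with surrounding context for manual review."""
    hits = []
    for m in re.finditer(r"\.-", text):
        start = max(m.start() - width, 0)
        hits.append(text[start:m.end() + width])
    return hits

for snippet in review_period_hyphens("en 70 av. J.-C. ; voir p.-12"):
    print(snippet)
```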
01-09-2013, 10:08 AM   #165
mzmm
Groupie
Posts: 162
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
found myself parsing messy html today, removing empty <p> tags, or <p> tags containing &nbsp;, or <p><i></i></p>, <p><b> </b></p> etc. so that i could space the paragraphs consistently in css, and, inspired by this thread, thought i'd share the snippet in case anyone has a use for it.

i realize it could probably be more concise, and i wouldn't just blindly replace all, but it seems to do the job. it removes <p> tags that have no content, or only 1 or more spaces, a single &nbsp;, or a <br>, <br/> or <br />, optionally wrapped in inline tags such as <b>, <i> or <span>.

Code:
<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>
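A quick way to check what this pattern does before trusting it: the minimal Python sketch below runs it unchanged over a made-up test string. Each empty variant from the post is removed and the paragraphs with text survive.

```python
import re

# mzmm's pattern, unchanged: a <p> whose content is optional opening inline tags,
# then nothing / whitespace / one &nbsp; / a <br>, then optional closing tags.
EMPTY_P = re.compile(
    r'<p[^>]*>((<\w+[^>/]*>)+)?(<br((\s)?/)?>|&nbsp;|\s*)((</\w+[^>]*>)+)?</p>')

html = ('<p>keep</p><p></p><p class="c">&nbsp;</p><p><i></i></p>'
        '<p><b> </b></p><p><br /></p><p>also keep</p>')
print(EMPTY_P.sub('', html))
# <p>keep</p><p>also keep</p>
```

One limit worth knowing: the middle alternative allows a single `&nbsp;`, so a paragraph containing `&nbsp;&nbsp;` is left alone.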

Last edited by mzmm; 01-09-2013 at 10:15 AM.
MobileRead.com is a privately owned, operated and funded community.