Old 11-21-2014, 05:46 AM   #1
DrChiper
Bookish
Posts: 1,017
Karma: 2003162
Join Date: Jun 2011
Device: PC, t1, t2, t3, Clara BW, Clara HD, Libra 2, Libra Color, Nxtpaper 11
regex question

The new regex-function works great; the section numbering was just what I needed. However, I stumbled onto some headings that were not handled properly:
Code:
<h1 class="center"><a href="chapter001.xhtml#start">Test</a></h1>
<h1 class="center" id="" ><span class="something">Test</span></h1>
<h1 class="center" id="" ><b><span class="something">Test</span></b></h1>
The default find construct is not enough:
Code:
(<h1[^<>]*>)([^<>]+</h1>)
I tried to enhance it, but I either got no match at all or, being very greedy, it selected almost everything.

Can somebody teach me how to select "just enough"?
Old 11-21-2014, 06:19 AM   #2
kovidgoyal
creator of calibre
Posts: 45,339
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
(<h1[^<>]*>)(.+?</h1>)
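
For anyone wondering why the lazy `.+?` version selects "just enough", here is a minimal sketch in plain Python. It assumes the standard `re` module behaves like the editor's regex engine for these simple patterns. The names `default`, `lazy`, `greedy` and `one_line` are made up for the illustration; the greedy pattern is only a guess at the kind of attempt that "selected almost everything".

```python
import re

# The three headings from the first post.
headings = [
    '<h1 class="center"><a href="chapter001.xhtml#start">Test</a></h1>',
    '<h1 class="center" id="" ><span class="something">Test</span></h1>',
    '<h1 class="center" id="" ><b><span class="something">Test</span></b></h1>',
]

# Default construct: [^<>]+ refuses to cross any nested tag.
default = re.compile(r'(<h1[^<>]*>)([^<>]+</h1>)')
# Lazy version: .+? grows one character at a time until the first </h1>.
lazy = re.compile(r'(<h1[^<>]*>)(.+?</h1>)')
# Greedy version (hypothetical attempt): .+ runs to the LAST </h1> on the line.
greedy = re.compile(r'(<h1[^<>]*>)(.+</h1>)')

for h in headings:
    print(default.search(h) is None, lazy.search(h).group(2))

# Two headings on a single line show the difference between greedy and lazy.
one_line = headings[0] + '<p>filler</p>' + headings[1]
print(greedy.search(one_line).group(0) == one_line)  # True: swallowed everything
print(len(lazy.findall(one_line)))                   # 2: one match per heading
```

The default pattern fails on all three samples because the first character after the opening tag is already a `<`.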
Old 11-21-2014, 06:53 AM   #3
DrChiper
Bookish
I was sooo close ...

Anyway, thanks Kovid

Humble suggestion: Update the manual with this version, as I'm afraid the examples I provided are likely to be encountered by many more users.
Old 11-21-2014, 07:14 AM   #4
kovidgoyal
creator of calibre
Well, this expression will not match if the heading tag is split over multiple lines. For that you need:

(?s)(<h2[^<>]*>)(.+?</h2>)
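
A small sketch of what the `(?s)` prefix changes, again assuming plain Python `re` as a stand-in for the editor's engine. The `sample` heading is invented for the illustration. Without `(?s)` the dot stops at a newline; with it, the dot matches newlines too, so the lazy group can reach a closing tag on a later line.

```python
import re

# Hypothetical heading whose content is split over two lines.
sample = '<h2 class="CN">- 1 -<br />\n<span class="CT">A New Year</span></h2>'

without_s = re.compile(r'(<h2[^<>]*>)(.+?</h2>)')
with_s = re.compile(r'(?s)(<h2[^<>]*>)(.+?</h2>)')

print(without_s.search(sample))          # None: '.' cannot cross the newline
print(with_s.search(sample).group(1))    # <h2 class="CN">
print(with_s.search(sample).group(2) == sample[len('<h2 class="CN">'):])  # True

# Passing the flag explicitly is equivalent to the inline (?s) prefix.
same = re.compile(r'(<h2[^<>]*>)(.+?</h2>)', re.DOTALL)
print(same.search(sample).group(0) == sample)  # True
```

Because the group is still lazy, it stops at the first closing tag even though the dot can now span lines.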
Old 11-21-2014, 07:45 AM   #5
DrChiper
Bookish
Indeed! Although not very common, constructs like this do happen:

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a><br/>

  Some sub header</h1>
Nice improvement, Kovid!
Old 11-21-2014, 08:32 AM   #6
theducks
Well trained by Cats
Posts: 31,046
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by DrChiper View Post
Indeed! Although not very common, constructs like this do happen:

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a><br/>

  Some sub header</h1>
Nice improvement, Kovid!
I think it is common.
I use a similar one a lot when I want the chapter title to appear in the TOC:

Code:
<h2 class="CN">- 1 - <br /><span class="CT">A New Year</span></h2>
Old 11-21-2014, 09:54 AM   #7
Paulie_D
Connoisseur
Posts: 67
Karma: 10
Join Date: Apr 2011
Device: Kindle 3, Samsung Tab 4
Quote:
Originally Posted by DrChiper View Post
Indeed! Although not very common, constructs like this do happen:

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a><br/>

  Some sub header</h1>
As a 'coder' of sorts (primarily web), I would tell anyone that using the break tag for spacing is poor practice.

I'm not saying it doesn't have its place, but using it for spacing is not recommended. That's what margins/padding are for.

This

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a><br/>

  Some sub header</h1>
Should be replaced with this

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a></h1>

  <h2 class="center">Some sub header</h2>
It's certainly more semantic.

Whether this has any specific relevance to e-publishing is perhaps questionable, but 'poor' code is something to be noted even if we choose to use it for our own reasons.

Understand I'm not running anyone down for it...just expressing one man's opinion.
Old 11-21-2014, 10:23 AM   #8
theducks
Well trained by Cats
Quote:
Originally Posted by Paulie_D View Post
As a 'coder' of sorts (primarily web), I would tell anyone that using the break tag for spacing is poor practice.

I'm not saying it doesn't have its place, but using it for spacing is not recommended. That's what margins/padding are for.

This

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a><br/>

  Some sub header</h1>
Should be replaced with this

Code:
  <h1 class="center"><a href="Section003.xhtml#start">Test</a></h1>

  <h2 class="center">Some sub header</h2>
It's certainly more semantic.

Whether this has any specific relevance to e-publishing is perhaps questionable, but 'poor' code is something to be noted even if we choose to use it for our own reasons.

Understand I'm not running anyone down for it...just expressing one man's opinion.
It may be 'proper' coding, but it fails the Auto-TOC job.

Test
Some Sub header
(both are essentially the same location)

is not what is desired in a TOC, especially on a smaller screen
Old 11-21-2014, 10:53 AM   #9
Paulie_D
Connoisseur
Quote:
Originally Posted by theducks View Post
It may be 'proper' coding, but it fails the Auto-TOC job.

Test
Some Sub header
(both are essentially the same location)

is not what is desired in a TOC, especially on a smaller screen
Oh...I agree...if you know you're breaking the "rules" for a specific reason then it's not a biggie but I do think it's worth noting.

That said, couldn't a function be written to combine two distinct elements into a single TOC item?

Dunno, just asking..I'm still a novice at regex.

Regardless, the TOC 'link' would still be the same point (the H1) so you'd still arrive at the same place wouldn't you?
Old 11-21-2014, 11:05 AM   #10
DiapDealer
Grand Sorcerer
Posts: 28,561
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
In this case, the br tag is not being used to achieve any kind of spacing or padding. It's being used as it was intended--to simply make sure the text that follows starts on a new line.
Old 11-21-2014, 11:57 AM   #11
DrChiper
Bookish
@Paulie_D: It is what I see in epubs, not what I have created myself. Sometimes you have to deal with what you have, and that is almost never what you really want. But I mostly can live with it given it just appears acceptable enough on my trusted e-reader

(And if not, then there is always calibre ... )
Old 11-21-2014, 01:57 PM   #12
Paulie_D
Connoisseur
Please...I wasn't suggesting anyone here was doing anything wrong, just trying to make a general point.

Guess I did it badly.
Old 11-21-2014, 02:14 PM   #13
signum
Zealot
Posts: 119
Karma: 64428
Join Date: Aug 2011
Device: none
Regarding multi-line titles and auto TOC:

My preference is to have smaller-sized subtitles, to clearly point them out as subordinate to the main title. So, a typical set of headings would look like:

Code:
<h2>Main heading</h2>
<h3>Subtitle</h3>
where h1, h2, h3 and h4 are styled with at least a "text-align: center". They can all share the same CSS block. The magic comes when setting this up for an auto TOC generator. The key is to use the "title=" attribute. Using the above example, the headings then look like:

Code:
<h2 title="Main heading: Subtitle">Main heading</h2>
<h3 class="sigil_not_in_toc">Subtitle</h3>
The auto TOC generator will use what's in the "title=" in preference to what is between the ">stuff<" angle brackets. BTW, it's been my experience that extra HTML such as "<br />" in the "title=" gets stripped out by the auto TOC generator. The h2 and h3 can be any heading level you like, as long as you get the "title=" right. The stuff between the angle brackets gets displayed as desired, and the "title=" only shows up in the TOC.
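
To make the rule concrete, here is a rough Python sketch of the behaviour described above. It is not the actual code of any TOC generator. `toc_entries` and the three regexes are hypothetical names, and the regex-based parsing is a simplification that assumes well-formed, non-nested headings with double-quoted attributes.

```python
import re
from html import unescape

# Hypothetical helpers, for illustration only.
HEADING = re.compile(r'(?s)<(h[1-6])([^<>]*)>(.*?)</\1>')
TITLE_ATTR = re.compile(r'\btitle="([^"]*)"')
TAG = re.compile(r'<[^<>]+>')

def toc_entries(html):
    """Collect one TOC label per heading, preferring the title attribute."""
    entries = []
    for match in HEADING.finditer(html):
        attrs, inner = match.group(2), match.group(3)
        if 'sigil_not_in_toc' in attrs:
            continue  # heading is displayed but kept out of the TOC
        title = TITLE_ATTR.search(attrs)
        if title:
            text = title.group(1)        # title="..." wins over the content
        else:
            text = TAG.sub(' ', inner)   # otherwise strip inline tags
        entries.append(' '.join(unescape(text).split()))
    return entries

sample = (
    '<h2 title="Main heading: Subtitle">Main heading</h2>\n'
    '<h3 class="sigil_not_in_toc">Subtitle</h3>\n'
    '<h2 class="CN">- 1 - <br /><span class="CT">A New Year</span></h2>'
)
print(toc_entries(sample))
# ['Main heading: Subtitle', '- 1 - A New Year']
```

The last heading in `sample` shows the fallback case: with no "title=" present, the inline tags are dropped and the remaining text becomes the entry.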
Old 11-21-2014, 09:17 PM   #14
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Paulie_D View Post
Please...I wasn't suggesting anyone here was doing anything wrong, just trying to make a general point.

Guess I did it badly.
Not really. I didn't mean to sound harsh, I just thought your assumption that the br in the given example(s) was about creating space was inaccurate.

Yes, using br tags to create spacing between paragraphs or different sections is considered bad form. But surely, using the br tag to force a line break is why the br tag exists in the first place, no?
Old 11-22-2014, 04:27 AM   #15
Paulie_D
Connoisseur
Quote:
Originally Posted by DiapDealer View Post
But surely, using the br tag to force a line break is why the br tag exists in the first place, no?
Yes...provided the whole element is intended to be read as a single item.

So in your example, it's correct because the number and title are intended to be a single item but for visual purposes it needs to look like two lines.

In the other example a major heading and sub-heading would not be intended to be a single item and so "should" be two items.