Old 12-22-2014, 02:53 PM   #1
dmonasse
A regex function to number a mathematical ebook

The search and replace tool with regex function mode is really fantastic. My small company builds mathematical ebooks from LaTeX sources. One problem when converting such books is that LaTeX auto-numbers chapters, sections, subsections and theorem-like assertions (theorems, propositions, lemmas, definitions, corollaries and so on), and I would like to reproduce that numbering in my ebooks.

A solution is the following:

1) When converting from LaTeX, I put chapters, sections, subsections and assertions in a <div> tag with an HTML5 data-type attribute. For example, the LaTeX section
Code:
\section{History of the Fermat-Wiles theorem}
is converted into
Code:
<div class="section" data-type="section">History of the Fermat-Wiles theorem</div>
and
Code:
\begin{theorem}Abracadabra\end{theorem}
is converted into
Code:
<div class="theorem" data-type="theorem">Abracadabra</div>
Note: I can't use the class attribute to denote the type of the div because Calibre's conversion from HTML to ePub modifies these attributes, and class="theorem" may be changed into class="pcalibre25". That's the reason for the data-type attribute.
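
To make step 1 concrete, here is a minimal Python sketch of that mapping. It is not the converter used for the books (the real LaTeX to HTML step is much harder); the names latex_to_html and to_div are hypothetical, and it only handles plain-text titles and un-nested theorem-like environments.

```python
import re
from html import escape

# Assumption: only \chapter, \section and \subsection with a brace-free title.
HEADINGS = re.compile(r'\\(chapter|section|subsection)\{([^{}]*)\}')
# Assumption: theorem-like environments are not nested inside each other.
ASSERTIONS = re.compile(
    r'\\begin\{(theorem|proposition|lemma|definition|corollary)\}(.*?)\\end\{\1\}',
    re.DOTALL,
)


def to_div(kind, body):
    """Wrap body in a div that carries the type in both class and data-type."""
    return '<div class="%s" data-type="%s">%s</div>' % (kind, kind, escape(body.strip()))


def latex_to_html(src):
    """Toy converter: rewrites headings and theorem-like blocks, nothing else."""
    src = ASSERTIONS.sub(lambda m: to_div(m.group(1), m.group(2)), src)
    return HEADINGS.sub(lambda m: to_div(m.group(1), m.group(2)), src)


print(latex_to_html(r'\section{History of the Fermat-Wiles theorem}'))
# <div class="section" data-type="section">History of the Fermat-Wiles theorem</div>
print(latex_to_html(r'\begin{theorem}Abracadabra\end{theorem}'))
# <div class="theorem" data-type="theorem">Abracadabra</div>
```

Writing the type into data-type as well as class is the point of the sketch: the numbering step can then rely on data-type alone.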

2) After conversion from LaTeX to HTML (not so easy!) and from HTML to ePub (easy with Calibre), I number the whole book in the Calibre editor using the search and replace tool in regex function mode.
The search pattern I use is:
Code:
<div[^>]*?data-type="(chapter|section|subsection|theorem|proposition|lemma|definition|corollary)"[^>]*>
and the regex function may be:
Code:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if number == 1:  # first match of the run: initialize the counters
        data['chapter'] = 0
        data['section'] = 0
        data['subsection'] = 0
        data['assertion'] = 0
    the_type = match.group(1)
    if the_type == 'chapter':  # a new chapter resets all the lower-level counters
        data['section'] = 0
        data['subsection'] = 0
        data['assertion'] = 0
        data['chapter'] += 1
        return match.group() + "<span class='chapter_num'>Chapter " + str(data['chapter']) + ".</span> "
    elif the_type == 'section':  # a new section resets the subsection counter
        data['subsection'] = 0
        data['section'] += 1
        return match.group() + "<span class='section_num'>Section " + str(data['section']) + ".</span> "
    elif the_type == 'subsection':  # numbered as section.subsection
        data['subsection'] += 1
        return match.group() + "<span class='subsection_num'>Subsection " + str(data['section']) + "." + str(data['subsection']) + ".</span> "
    else:  # any other type is an assertion, numbered as chapter.assertion
        data['assertion'] += 1
        return match.group() + "<span class='assertion_num'>Assertion " + str(data['chapter']) + "." + str(data['assertion']) + ".</span> "

# process the files in spine (reading) order so the counters follow the book
replace.file_order = 'spine'
Adapt the code to your needs or wishes; this is only an example. It would be nicer to replace "Assertion" with "Theorem", "Proposition", "Lemma", "Corollary" or "Definition" (very easy to do starting from the "the_type" variable). I obtain the following numbering:
Code:
Chapter 1
     Section 1
         Subsection 1.1
             Assertion 1.1
             Assertion 1.2
         Subsection 1.2
             Assertion 1.3
     Section 2
         Subsection 2.1
             Assertion 1.4
             Assertion 1.5
         Subsection 2.2
             Assertion 1.6
Chapter 2
     Section 1
         Subsection 1.1
             Assertion 2.1
             Assertion 2.2
         Subsection 1.2
             Assertion 2.3
     Section 2
         Subsection 2.1
             Assertion 2.4
             Assertion 2.5
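
The per-type labels mentioned above could look roughly like the following sketch. It derives the label from the_type with capitalize(), keeps a single shared counter for all assertions, and uses essentially the same search pattern, confined to one tag. The simulate helper is hypothetical: it only imitates, with plain Python re, the way the editor numbers matches from 1 and passes the same data dict to every call, so the function can be tried outside Calibre.

```python
import re

# Essentially the search pattern from step 2, kept inside a single tag.
PATTERN = re.compile(
    r'<div[^>]*?data-type="(chapter|section|subsection|theorem|'
    r'proposition|lemma|definition|corollary)"[^>]*>'
)


def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    if number == 1:  # first match: initialize the counters
        data.update(chapter=0, section=0, subsection=0, assertion=0)
    the_type = match.group(1)
    if the_type == 'chapter':
        data.update(section=0, subsection=0, assertion=0)
        data['chapter'] += 1
        css, label, num = 'chapter_num', 'Chapter', '%d' % data['chapter']
    elif the_type == 'section':
        data['subsection'] = 0
        data['section'] += 1
        css, label, num = 'section_num', 'Section', '%d' % data['section']
    elif the_type == 'subsection':
        data['subsection'] += 1
        css, label = 'subsection_num', 'Subsection'
        num = '%d.%d' % (data['section'], data['subsection'])
    else:  # theorem, proposition, lemma, definition, corollary share one counter
        data['assertion'] += 1
        css, label = 'assertion_num', the_type.capitalize()
        num = '%d.%d' % (data['chapter'], data['assertion'])
    return "%s<span class='%s'>%s %s.</span> " % (match.group(), css, label, num)


replace.file_order = 'spine'


def simulate(html):
    """Hypothetical stand-in for the editor's Replace all, for testing only."""
    data = {}      # the same dict is handed to every call
    count = [0]    # match number, starting at 1

    def call(m):
        count[0] += 1
        return replace(m, count[0], 'test.html', None, None, data, None)

    return PATTERN.sub(call, html)


sample = (
    '<div class="chapter" data-type="chapter">Numbers</div>\n'
    '<div class="theorem" data-type="theorem">Abracadabra</div>\n'
    '<div class="lemma" data-type="lemma">Hocus pocus</div>\n'
)
print(simulate(sample))
```

On this sample the three divs come out labelled "Chapter 1.", "Theorem 1.1." and "Lemma 1.2.". Only the replace function and the file_order line would be pasted into the editor; the rest is scaffolding.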
Hope this may help. Any improvement is welcome (including corrections to my English).

Last edited by dmonasse; 12-22-2014 at 03:11 PM.
Old 12-22-2014, 10:10 PM   #2
kovidgoyal
creator of calibre
Cool! I was indeed inspired by LaTeX's auto-numbering when designing the function mode feature (I used LaTeX extensively when I was a physicist).
Old 12-23-2014, 02:16 AM   #3
arspr
Great usage example. Maybe you could port and post it in the saved searches sticky thread. It could be, no, it will be, a great addition.

Nevertheless:
Quote:
Originally Posted by dmonasse View Post
Note: I can't use the class attribute to denote the type of the div because Calibre's conversion from HTML to ePub modifies these attributes, and class="theorem" may be changed into class="pcalibre25". That's the reason for the data-type attribute.
Are you sure? I haven't tested with an HTML to epub conversion, but in an epub to epub conversion, "calibreXX" classes only appear when the original element has no class in the source file. I've just run a quick test, and <p class="salto1">, <blockquote class="asangre"> and my own <span class="nw"> were all preserved.
Old 12-23-2014, 02:54 AM   #4
dmonasse
I don't understand the rules Calibre uses to flatten the CSS. I prefer to be careful, and I know that the conversion process doesn't change the (standard HTML5) "data-xxxx" attributes.
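
For anyone who wants to check this on their own book, a small Python sketch along these lines could compare the data-type attributes before and after conversion. The helper name count_data_types is hypothetical, and the "after" string is an invented example of a renamed class, not real Calibre output.

```python
import re
from collections import Counter

# Capture the value of data-type on every div that has one.
DATA_TYPE = re.compile(r'<div[^>]*?data-type="([^"]+)"')


def count_data_types(html):
    """Return how many divs of each data-type appear in an HTML string."""
    return Counter(DATA_TYPE.findall(html))


before = '<div class="theorem" data-type="theorem">Abracadabra</div>'
# Invented example: the class has been renamed, data-type is untouched.
after = '<div class="pcalibre25" data-type="theorem">Abracadabra</div>'

# The counts agree as long as the data-type attributes are preserved.
print(count_data_types(before) == count_data_types(after))
```

If the counts ever differ between the source HTML and the converted files, the numbering pattern would miss some divs, so this is a cheap sanity check before running the search and replace.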

I made a copy of this post, as suggested, in the saved searches sticky thread.

Thanks for your encouragement and many thanks to Kovid for Calibre.