Old 10-19-2021, 04:02 PM   #46
DyckBook
Morlock
Posts: 33
Karma: 2734796
Join Date: Oct 2021
Device: Kindle Paperwhite
Quote:
Originally Posted by kovidgoyal View Post
That's because built-in functions are not simple standalone functions you can learn from; they use other calibre code. However, if you want to see them, look in function_replace.py in the calibre source code.
Thanks Kovid, I'll work on that.
Old 04-02-2022, 02:10 PM   #47
greenskye
Member
Posts: 13
Karma: 10
Join Date: Feb 2010
Device: none
I'm looking for a method to convert numbers that use the European format, with a comma as the decimal separator (e.g. 1.000,95), to the US format (e.g. 1,000.95).

Is this achievable with regex or via a search function?
Old 04-02-2022, 05:28 PM   #48
lomkiri
Zealot
Posts: 131
Karma: 1000102
Join Date: Jul 2021
Device: N/A
Assuming all numbers are in European format (none in US format):
Code:
find:
\d[,.\d]{2,}(?![^<>{}]*[>}])
function:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs): 
    return match.group(0).replace('.', '§').replace(',', '.').replace('§',',')
Warning: if you have a mixture of numbers in both formats (US and European), both kinds will be switched. In that case, you'll have to refine the search expression to catch only the European ones.

Note: every number of 3 or more characters will be caught (e.g. 1,2 or 1.2). To be more selective, change {2,} to the minimum length you want minus 1, e.g. {4,} to catch numbers starting from 5 characters (1.000 or 12,45).

Note: integers such as 100 or 234000 will be caught too, but they won't be changed.

Warning: numbers followed by 3 dots will be wrongly transformed: "They were 20..." becomes "They were 20,,,".
It's wise to change those dots to an ellipsis (…) before applying the conversion:
(\d)\.{3} ==> \1\u2026
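
The find expression and function above can be tried outside calibre with Python's standard re module. This is only a sketch: the names NUMBER, swap_separators and convert are made up for the illustration, and it assumes the '§' placeholder never occurs inside a number.

```python
import re

# Find expression from the post: a digit followed by at least two more
# digits, dots or commas, and not located inside a tag or a CSS block.
NUMBER = re.compile(r'\d[,.\d]{2,}(?![^<>{}]*[>}])')


def swap_separators(match):
    # Swap '.' and ',' using '§' as a temporary placeholder
    # (assumption: '§' never appears inside a number).
    return match.group(0).replace('.', '§').replace(',', '.').replace('§', ',')


def convert(text):
    # First protect "20..." by turning three dots after a digit into an ellipsis.
    text = re.sub(r'(\d)\.{3}', '\\1\u2026', text)
    # Then swap the separators in every remaining number.
    return NUMBER.sub(swap_separators, text)


print(convert('Total: 1.000,95'))
print(convert('They were 20...'))
```

In calibre itself only the find expression and the replace function are needed; the convert wrapper merely stands in for the editor's "Replace all".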

Last edited by lomkiri; 04-03-2022 at 01:57 PM.
Old 04-04-2022, 02:20 PM   #49
greenskye
Member
Posts: 13
Karma: 10
Join Date: Feb 2010
Device: none
Thanks so much, it worked great!

I ended up using
Code:
find 1: (\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])

find 2: \$(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])
The updated regex fixed the problem with trailing "." matches. I did a replace all with "find 1", and then ran it again with "find 2" to revert any US currency amounts accidentally caught. I couldn't figure out how to exclude them in the first place (it kept matching part of the currency number).
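
The two-pass approach described above can be sketched with Python's standard re module. The names FIND_1, FIND_2, swap and two_pass are hypothetical, and the swap function is assumed to be the same one given earlier in the thread.

```python
import re

# "find 1": groups of 1-3 digits each followed by '.' or ',', then final digits.
# Because the match must end on a digit, a trailing '.' is no longer included.
FIND_1 = re.compile(r'(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')
# "find 2": the same expression, but only for numbers preceded by '$'.
FIND_2 = re.compile(r'\$(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')


def swap(match):
    # Swap '.' and ',' using '§' as a temporary placeholder.
    return match.group(0).replace('.', '§').replace(',', '.').replace('§', ',')


def two_pass(text):
    # Pass 1 swaps every number, including the US currency amounts.
    swapped = FIND_1.sub(swap, text)
    # Pass 2 swaps the '$' amounts a second time, restoring the US format.
    return FIND_2.sub(swap, swapped)


print(two_pass('Paid 1.000,95. Then $1,000.95 more.'))
```

The second pass works because swapping the separators twice gives back the original number.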
Old 04-04-2022, 05:55 PM   #50
lomkiri
Zealot
Posts: 131
Karma: 1000102
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by greenskye View Post
The updated regex fixed the problem with trailing "." matches
Mmmh, yes, of course, I forgot this case :-/. Good you thought about it :-)

Quote:
I did a replace all with "find 1", and then ran it again with "find 2" to revert any US currencies accidentally caught. Couldn't figure out how to exclude them in first place (kept matching part of the currency number)
This one seems to be OK for excluding groups beginning with $ (and for not matching inside tags):
Code:
(\$(?:\d{1,3}[.,])+)(*SKIP)(*F)|(<[^<>]*)(*SKIP)(*F)|(?:\d{1,3}[.,])+\d{1,}
Another way would be to catch the currency sign in the regex; then it's easy to make the selection inside the function:
Code:
\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])
function:
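def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    m = match.group(0)
    # Leave amounts that start with '$' (US currency) untouched
    if m[0] == '$':
        return m
    # Otherwise swap '.' and ',' using '§' as a temporary placeholder
    return m.replace('.', '§').replace(',', '.').replace('§', ',')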
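
A minimal sketch of this second approach, runnable outside calibre with Python's standard re module. The shortened signature and the name PATTERN are assumptions made for the illustration; calibre passes the extra arguments shown above. The (*SKIP)(*F) variant is not reproduced here because the standard re module does not support those verbs.

```python
import re

# Optional '$', then groups of 1-3 digits followed by '.' or ',', then digits.
PATTERN = re.compile(r'\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')


def replace(match, *args, **kwargs):
    m = match.group(0)
    # Amounts starting with '$' are assumed to be in US format already.
    if m[0] == '$':
        return m
    # Swap '.' and ',' using '§' as a temporary placeholder.
    return m.replace('.', '§').replace(',', '.').replace('§', ',')


print(PATTERN.sub(replace, 'EUR 2.500,00 vs $2,500.00'))
```

Matching the '$' together with the number is what prevents the regex from picking up only part of a currency amount.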

Last edited by lomkiri; 04-04-2022 at 07:10 PM. Reason: adding a regex excluding currency
Old 09-12-2022, 01:34 AM   #51
mobilis
drowned in old books
Posts: 39
Karma: 62
Join Date: May 2012
Location: United States
Device: Kindle Paperwhite
I am making an ebook from saved and pdfunite'd pdf pages, and there are scads of things like this:

<p class="calibre1">13/72</p>

<p class="calibre1">14/93</p>

I want to remove them.

How can I do that?
Old 09-12-2022, 09:33 AM   #52
theducks
Well trained by Cats
Posts: 29,763
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Regex is your buddy (there are a few regex tutorials here at MR; that is how I learned. BTW, calibre uses a PCRE-style flavor of regex).

Code:
<p class="calibre1">\d+\/\d+</p>
\d+ matches 1 or more consecutive digits
\/ is just an escaped / (the escape might not be needed, but does not hurt)
'Escaped' characters lose their special meaning and are treated literally, as they LOOK
I left the rest as an exact match, so that only these paragraphs trigger it.

e.g. <p class="calibre1">The cup was 3/4 full.</p> would not match.
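
To check what the expression catches before running it on the book, it can be tried in plain Python. This is a sketch with made-up sample text; the name PAGE_MARKER and the optional trailing newline in the pattern are additions for the illustration. In calibre's search panel, leaving the replace field empty should delete the matches in the same way.

```python
import re

# The expression from the post, plus an optional newline so that no
# empty line is left behind (the newline part is an addition).
PAGE_MARKER = re.compile(r'<p class="calibre1">\d+/\d+</p>\n?')

html = (
    '<p class="calibre1">13/72</p>\n'
    '<p class="calibre1">The cup was 3/4 full.</p>\n'
    '<p class="calibre1">14/93</p>\n'
)

# Replacing every match with an empty string removes the page markers.
cleaned = PAGE_MARKER.sub('', html)
print(cleaned)
```

The paragraph containing "3/4" survives because the pattern requires the digits and the slash to be the only content of the paragraph.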
Old 09-14-2022, 02:56 AM   #53
mobilis
drowned in old books
Posts: 39
Karma: 62
Join Date: May 2012
Location: United States
Device: Kindle Paperwhite
THANK YOU!!
Old 02-19-2023, 06:57 PM   #54
alekseiminko
Member
Posts: 10
Karma: 10
Join Date: Jan 2020
Device: laptop
Where in calibre can I set up autocorrection for reading aloud, so that the voice reads the full word (e.g. "article") instead of its abbreviated form?
Old 02-20-2023, 02:58 PM   #55
alekseiminko
Member
Posts: 10
Karma: 10
Join Date: Jan 2020
Device: laptop
I mean classic substitutions (simple replacement of one string with another) or regular expressions (RegExp), used to set the stress and to expand abbreviations when reading aloud, for example vs → versus.
Old 02-20-2023, 03:11 PM   #56
JSWolf
Resident Curmudgeon
Posts: 73,858
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by alekseiminko View Post
Where in calibre can I set up autocorrection for reading aloud, so that the voice reads the full word (e.g. "article") instead of its abbreviated form?
You already asked this in another thread.
Old 02-20-2023, 04:37 PM   #57
alekseiminko
Member
Posts: 10
Karma: 10
Join Date: Jan 2020
Device: laptop
Please post the link.
Old 02-20-2023, 07:18 PM   #58
DNSB
Bibliophagist
Posts: 35,229
Karma: 145277352
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by alekseiminko View Post
I mean classic substitutions (simple replacement of one string with another) or regular expressions (RegExp), used to set the stress and to expand abbreviations when reading aloud, for example vs → versus.
You would have to edit the book to make those changes such as versus for vs. As for changing emphasis, good luck with that. There are good reasons that most authors prefer using people to create audiobooks since even the best of the current automated readers are not all that great.
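
Editing the book to expand abbreviations can itself be done with a regex replacement. A minimal sketch in plain Python follows; the ABBREVIATIONS table and the names PATTERN and expand are hypothetical, and the pattern makes no attempt to skip text inside tags.

```python
import re

# Hypothetical table of abbreviations and their expansions.
ABBREVIATIONS = {'vs': 'versus'}

# Match any listed abbreviation as a whole word, with an optional trailing dot.
PATTERN = re.compile(
    r'\b(' + '|'.join(re.escape(a) for a in ABBREVIATIONS) + r')\b\.?'
)


def expand(text):
    # Replace each abbreviation with its expansion from the table.
    return PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


print(expand('cats vs. dogs'))
```

The word boundaries keep the pattern from firing inside longer words; an abbreviation that ends a sentence would lose its final dot, so such cases need a manual check.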
Old 02-20-2023, 08:03 PM   #59
alekseiminko
Member
Posts: 10
Karma: 10
Join Date: Jan 2020
Device: laptop
The Librera app on Android has such a function for voice reading: text-to-speech substitutions change the way the engine pronounces certain words, skip certain characters when reading, or set the correct stress marks.
Old 02-20-2023, 09:56 PM   #60
kovidgoyal
creator of calibre
Posts: 43,826
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, there is no such function.