Old 10-19-2021, 04:02 PM   #46
DyckBook
Morlock
Posts: 28
Karma: 2587194
Join Date: Oct 2021
Device: Kindle Paperwhite
Quote:
Originally Posted by kovidgoyal View Post
That's because the builtin functions are not simple standalone functions you can learn from; they use other calibre code. However, if you want to see them, look in function_replace.py in the calibre source code.
Thanks Kovid, I'll work on that.
Old 04-02-2022, 02:10 PM   #47
greenskye
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: none
I'm looking for a method to convert numbers that use the European format, with a comma as the decimal separator (e.g. 1.000,95), to the US format (e.g. 1,000.95).

Is this achievable with regex or via a search function?
Old 04-02-2022, 05:28 PM   #48
lomkiri
Connoisseur
Posts: 73
Karma: 107742
Join Date: Jul 2021
Device: N/A
Assuming all numbers are in European format (none in US format):
Code:
find:
\d[,.\d]{2,}(?![^<>{}]*[>}])
function:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs): 
    return match.group(0).replace('.', '§').replace(',', '.').replace('§',',')
Warning: if you have a mixture of numbers in both formats (US and European), they will be switched. In that case, you'll have to refine the search expression to catch only the European ones.

Note: every number of 3 or more characters will be caught (e.g. 1,2 or 1.2). If you want to be more selective, change {2,} to the minimum length you want minus 1, e.g. {4,} to catch numbers of 5 characters or more (1.000 or 12,45).

Note: integers such as 100 or 234000 will be caught too, but they won't be changed.

Warning: numbers followed by 3 dots will be wrongly transformed: "They were 20..." will give "They were 20,,,".
It's wise to change them to an ellipsis (…) before applying the conversion:
(\d)\.{3} ==> \1\u2026
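
For anyone who wants to try this outside the editor, here is a minimal standalone sketch using Python's standard re module. The names european_to_us, NUMBER, DOTS and SWAP are only illustrative, and the translation table is a substitute for the '§' placeholder trick, not what the function above does:

```python
import re

# Pattern from the post: a digit followed by at least two more digits or
# separators, not located inside an HTML tag or a CSS block.
NUMBER = re.compile(r'\d[,.\d]{2,}(?![^<>{}]*[>}])')

# Pre-pass from the post: a digit followed by exactly three dots.
DOTS = re.compile(r'(\d)\.{3}')

# Assumption: swap '.' and ',' in one step with a translation table,
# so no placeholder character such as '§' is needed.
SWAP = str.maketrans('.,', ',.')


def european_to_us(text):
    """Turn 'digit...' into 'digit…', then swap the separators in numbers."""
    text = DOTS.sub('\\1\u2026', text)
    return NUMBER.sub(lambda m: m.group(0).translate(SWAP), text)


print(european_to_us('Price: 1.000,95'))   # -> Price: 1,000.95
print(european_to_us('They were 20...'))   # -> They were 20…
```

The translation table avoids any clash if the text itself happens to contain a '§'.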

Last edited by lomkiri; 04-03-2022 at 01:57 PM.
Old 04-04-2022, 02:20 PM   #49
greenskye
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2010
Device: none
Thanks so much, it worked great!

I ended up using
Code:
find 1: (\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])

find 2: \$(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])
The updated regex fixed the problem with trailing "." matches. I did a replace all with "find 1", then ran the function again with "find 2" to revert any US currency amounts accidentally caught. I couldn't figure out how to exclude them in the first place (it kept matching part of the currency number).
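
A rough sketch of this two-pass approach outside calibre, using the standard re module. The names FIND_1, FIND_2, swap_separators and convert_two_pass are made up for illustration, and it assumes every amount starting with $ was in US format to begin with:

```python
import re

# "find 1": groups of 1 to 3 digits each followed by a separator, then
# trailing digits, so a sentence-ending "." is no longer part of the match.
FIND_1 = re.compile(r'(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')

# "find 2": the same pattern, restricted to amounts that start with "$".
FIND_2 = re.compile(r'\$(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')


def swap_separators(match):
    """Swap '.' and ',' using the '§' placeholder from the earlier post."""
    return match.group(0).replace('.', '§').replace(',', '.').replace('§', ',')


def convert_two_pass(text):
    # Pass 1 swaps every number, including the dollar amounts.
    text = FIND_1.sub(swap_separators, text)
    # Pass 2 swaps the dollar amounts back to their original form.
    return FIND_2.sub(swap_separators, text)


print(convert_two_pass('Total 1.000,95 EUR or $1,200.50.'))
# -> Total 1,000.95 EUR or $1,200.50.
```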
Old 04-04-2022, 05:55 PM   #50
lomkiri
Connoisseur
Posts: 73
Karma: 107742
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by greenskye View Post
The updated regex fixed the problem with trailing "." matches
Mmmh, yes, of course, I forgot this case :-/. Good you thought about it :-)

Quote:
I did a replace all with "find 1", and then ran it again with "find 2" to revert any US currencies accidentally caught. Couldn't figure out how to exclude them in first place (kept matching part of the currency number)
This one seems to work for excluding groups beginning with $ (and for not selecting inside tags):
Code:
(\$(?:\d{1,3}[.,])+)(*SKIP)(*F)|(<[^<>]*)(*SKIP)(*F)|(?:\d{1,3}[.,])+\d{1,}
Another way is to catch the currency sign in the regex; then it's easy to make the selection inside the function:
Code:
\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])
function:
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    m = match.group(0)
    if m[0] == '$':
        return m
    else:
        return m.replace('.', '§').replace(',', '.').replace('§',',')
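
To check the behaviour of this second variant outside the editor, a minimal sketch with the standard re module could look like this. The reduced one-argument signature and the names PATTERN and convert are assumptions for the test, not calibre's API; the (*SKIP)(*F) variant is left out because the standard re module does not support those verbs:

```python
import re

# Pattern from the post: an optional leading "$" is captured with the number.
PATTERN = re.compile(r'\$?(\d{1,3}[.,])+\d{1,}(?![^<>{}]*[>}])')


def replace(match):
    """Leave amounts starting with '$' alone, swap separators elsewhere."""
    m = match.group(0)
    if m.startswith('$'):
        return m
    return m.replace('.', '§').replace(',', '.').replace('§', ',')


def convert(text):
    return PATTERN.sub(replace, text)


print(convert('Total 1.000,95 EUR or $1,200.50.'))
# -> Total 1,000.95 EUR or $1,200.50.
```

Because the "$" is consumed as part of the match, the search cannot restart in the middle of a currency amount, which is what made the exclusion hard with the earlier pattern.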

Last edited by lomkiri; 04-04-2022 at 07:10 PM. Reason: adding a regex excluding currency