Old 08-06-2019, 12:52 PM   #601
lumpynose
Guru
Posts: 955
Karma: 4444444
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by DiapDealer View Post
The short answer is no.
The long answer is also no--it just takes longer to read.

Needing regex to stay between certain tags, or to only include stuff between tags, means you've gone beyond what plain regex can do for you. You've moved into the realm of parsing and algorithms. The Function Mode of the Calibre Editor's Search and Replace feature comes to mind.
What about having Sigil exclude the stuff outside the body tags? Another check box, for example, in the search options. So that search and replace is given only the stuff within the body tags.
Old 08-06-2019, 01:05 PM   #602
DiapDealer
Grand Sorcerer
Posts: 20,726
Karma: 112092388
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
We're probably not going to clutter up the F&R interface any more than it is (let alone add to the already confusing mess behind the gui!).

Plus, users can already use the "Mark Selected Text" feature (Search->Mark Selected Text) to search only the highlighted portions of individual files.

I'm not really in favor of extending the Search and Replace features beyond what's already available. If people have highly specialized search & replace needs, they can create a plugin (or suggest that one be created).
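
For anyone going the scripting route, restricting a replace to the body is mostly a matter of splitting the file first. A minimal Python sketch follows. It is not a Sigil plugin and says nothing about how Sigil's F&R works internally; the function name `sub_in_body` and the ALL-CAPS pattern are made up for illustration.

```python
import re

# Captures the opening <body ...> tag, its contents, and the closing tag.
BODY = re.compile(r'(<body\b[^>]*>)(.*?)(</body>)', re.DOTALL | re.IGNORECASE)

def sub_in_body(pattern, repl, xhtml):
    """Apply re.sub(pattern, repl, ...) only to the text between the body tags."""
    def fix(match):
        open_tag, inner, close_tag = match.groups()
        return open_tag + re.sub(pattern, repl, inner) + close_tag
    return BODY.sub(fix, xhtml, count=1)

doc = ('<!DOCTYPE html><html><head><title>T</title></head>'
       '<body><p>HELLO world</p></body></html>')

# Hypothetical rule: turn words of two or more capitals into Capitalized form.
result = sub_in_body(r'\b[A-Z]{2,}\b', lambda m: m.group(0).capitalize(), doc)
print(result)
```

The DOCTYPE in the head is left alone because the replace never sees it. Tags and attributes inside the body are still visible to the pattern, so this only narrows the problem rather than solving it.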

Last edited by DiapDealer; 08-06-2019 at 01:10 PM.
Old 08-06-2019, 05:31 PM   #603
Tex2002ans
Wizard
Posts: 1,247
Karma: 6110931
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by roger64 View Post
The above regex works fine.

Maybe it's a little greedy because it also transforms words written in capitals which are included in the head like "DOCTYPE".
Don't use Replace All. You'll have to decide on a case-by-case basis, because there are still words in ALL CAPS that are valid, like DNA, FBI, FDA, etc.

Quote:
Originally Posted by lumpynose View Post
What about having Sigil exclude the stuff outside the body tags? Another check box, for example, in the search options. So that search and replace is given only the stuff within the body tags.
epubcheck and other tools will squawk at you because of this code, and usually it's a sign of some serious underlying issue (bad conversion, bad S&R, horribly coded site, etc.).

You might want to do something like:

Search: </p>\s*([^<]+?)\s+
Replace: </p><p class="notag">\1</p>

This'll help point out those problem areas, then you can do a big pass cleaning up all the "notag" classes and adjusting those issues.
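
To try the idea outside Sigil, here is a rough Python sketch. Note that it uses a variant of the search above, not the exact expression: the lazy group followed by `\s+` can stop at the first space, so this version starts the group on a non-space character and runs it up to the next tag. That reading of the intent is an assumption, and the name `tag_loose_text` is made up.

```python
import re

# Text that follows a closing </p> and is not itself inside any tag.
# The group starts on a non-space character and ends before the next "<".
LOOSE_TEXT = re.compile(r'</p>\s*([^<\s][^<]*?)\s*(?=<)')

def tag_loose_text(html):
    """Wrap untagged text after </p> in <p class="notag"> so it is easy to find."""
    return LOOSE_TEXT.sub(r'</p><p class="notag">\1</p>', html)

sample = '<p>One.</p>\nstray text here\n<p>Two.</p>'
print(tag_loose_text(sample))
```

Paragraphs separated only by blank lines are left untouched, since the group cannot start on whitespace.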
Old 08-07-2019, 01:47 AM   #604
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by DiapDealer View Post
The short answer is no.
The long answer is also no--it just takes longer to read.

Needing regex to stay between certain tags, or to only include stuff between tags, means you've gone beyond what plain regex can do for you. You've moved into the realm of parsing and algorithms. The Function Mode of the Calibre Editor's Search and Replace feature comes to mind.
Thanks for both answers.

I was afraid I might be missing a simple fix (it has happened before). As it happens, there is none, so I'll keep it as it is. It's a specialized regex, intended mostly for bibliography purposes.

Last edited by roger64; 08-07-2019 at 01:49 AM.
Old 08-07-2019, 05:16 AM   #605
Vroni
Beast
Posts: 91
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Quote:
Originally Posted by Tex2002ans View Post
I would be interested in what you're trying to do that requires more than 10 groups?
Oops, I just read this. I had a font in the book with auto kerning enabled. This works fine on Windows, Mac, and Linux machines, so for example "ll" is automatically displayed as the ligature for "ll", or "ff", "Ti", and so on.

On e-readers this doesn't work. The ligature is not shown, and the letters/ligatures are just missing. So I looked for an alternative font and found one, but unfortunately there was no letter-spaced typeface available for the alternative. As I was really sick of this problem, I just surrounded each letter with a span having a right padding of 1 or 2 pixels. As there were "only" 200 words present in the book, this was a fine workaround for me.

So finally the job was done by 15 regexes, each of them handling one-letter words, two-letter words, and so on (the longest word had 15 characters).

It looked completely weird in Code View, but the result was acceptable and works on all readers I tested.
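
With a replacement function instead of numbered groups, one pass can handle words of any length. A small Python sketch of that idea follows; the inline `padding-right` style and the names `space_letters` and `space_words` are assumptions for illustration, not the markup actually used in the book.

```python
import re
from html import escape

def space_letters(word, padding_px=1):
    """Wrap every character of `word` in its own padded span."""
    return ''.join(
        f'<span style="padding-right:{padding_px}px">{escape(ch)}</span>'
        for ch in word
    )

def space_words(text, words, padding_px=1):
    """Replace each listed word in plain text with its letter-spaced version."""
    if not words:
        return text
    # Longest first, so a longer word wins over a shorter one sharing its start.
    alternation = '|'.join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = re.compile(rf'\b(?:{alternation})\b')
    return pattern.sub(lambda m: space_letters(m.group(0), padding_px), text)

print(space_words('Hello all', ['all']))
```

This is meant for plain text runs only. Run over full markup, it would also rewrite tag or attribute names that happen to match a listed word.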
Old 08-13-2019, 07:03 AM   #606
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
OCR

I am trying Tesseract. Overall results so far are excellent, though a few mistakes do appear.

Sometimes, faulty words contain a digit. Like in French, mo1 for moi. Also, usually these words do not have a -.

Confusions of this kind may appear (this is just an example):

5 → S   1 → i   0 → O
2 → Z   4 → A   8 → B

I'd like to use a regex which would detect complete words containing one or more digits (and maybe some special characters that I could add in the regex like €) so that I could check them quickly.

Last edited by roger64; 08-13-2019 at 07:08 AM.
Old 08-13-2019, 01:17 PM   #607
Vroni
Beast
Posts: 91
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Code:
\b\p{L}.*[\d+]\p{L}.*\b
Old 08-13-2019, 01:29 PM   #608
lumpynose
Guru
Posts: 955
Karma: 4444444
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by Vroni View Post
Oops, I just read this. I had a font in the book with auto kerning enabled. This works fine on Windows, Mac, and Linux machines, so for example "ll" is automatically displayed as the ligature for "ll", or "ff", "Ti", and so on.

On e-readers this doesn't work. The ligature is not shown, and the letters/ligatures are just missing. So I looked for an alternative font and found one, but unfortunately there was no letter-spaced typeface available for the alternative.
This really annoys me. Way back when I was using TeX, one of the great things about it was a font that had all of the variants. It even had serif and sans serif variants for the family.

Last edited by lumpynose; 08-13-2019 at 04:57 PM.
Old 08-13-2019, 03:27 PM   #609
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Vroni View Post
Code:
\b\p{L}.*[\d+]\p{L}.*\b
Thanks for your help. I tried it but it seems that it "finds" a whole paragraph instead of a single word (mo1).

Code:
<p>il était avec mo1.</p>
Did I miss something?
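
A few general properties of that expression may explain the overshoot: `.*` is greedy and not limited to word characters, so a match can run across spaces and tags; `[\d+]` is a character class meaning one digit or a literal `+`; and the `\p{L}` after it demands a letter following the digit, which a word ending in a digit (like mo1) does not have. Below is a hedged sketch of a word-bounded alternative, tested in Python rather than in Sigil. Python's built-in `re` has no `\p{L}`, so `[^\W\d_]` stands in for it; in Sigil's PCRE flavour `\p{L}` should work in its place. Special characters such as € are left out of this sketch.

```python
import re

# A run of word characters containing at least one digit and at least one letter.
# (?=\w*\d)        lookahead: a digit occurs somewhere in the word
# (?=\w*[^\W\d_])  lookahead: a letter occurs somewhere in the word
DIGIT_WORD = re.compile(r'\b(?=\w*\d)(?=\w*[^\W\d_])\w+\b')

def suspicious_words(text):
    """Return the words that mix letters and digits, in order of appearance."""
    return DIGIT_WORD.findall(text)

print(suspicious_words('<p>il était avec mo1.</p>'))  # ['mo1']
```

Pure numbers such as 12 are skipped because of the second lookahead, while legitimate mixed words like "20th" will still be reported and have to be judged by eye.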

Last edited by roger64; 08-13-2019 at 03:29 PM.
Old 08-13-2019, 03:53 PM   #610
Tex2002ans
Wizard
Posts: 1,247
Karma: 6110931
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by roger64 View Post
Sometimes, faulty words contain a digit. Like in French, mo1 for moi. Also, usually these words do not have a -.
Instead of Regex, you can also use Sigil's or Calibre's spellcheck:

(See my thread "Suggestion: Spellcheck Enhancement (Numbers)".)

Calibre's spellcheck shows "numbered words" by default.

To enable in Sigil's spellcheck, go into Edit > Preferences > Spellcheck Dictionaries and in the upper right is a checkbox Check Numbers.

Once you enable that, if you search for:

2

you can easily get a sortable list of all words with numbers.

In that thread, I detailed all the cases where it's very helpful ("20th century", "A4 Paper", OCR errors, [...]).
Old 08-13-2019, 04:28 PM   #611
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
@Tex2002ans

Thanks for the tip. It answers the question.
Old 08-19-2019, 09:32 AM   #612
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
reverse order

rename files in reverse order

I have a book that needs to be recognized. It consists of 258 pages, numbered from 001.tif to 258.tif.

I use gimagereader-qt (a front-end for Tesseract) to recognize them. Unfortunately, the files imported and displayed in gimagereader are listed in reverse order, from 258 to 001 (see screenshot). Yes, it's a bug and I have no solution for it. The processing thus begins with 258. Moving the files manually in the display is too tedious.

Except if I batch rename the files in reverse order. Thus 001 would become 258, 002 > 257 and so on.

gprename allows the use of a regular expression...
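
Reversing the numbering needs a bit of arithmetic, which a plain search-and-replace expression cannot do by itself, so a short script may be easier. A minimal Python sketch, assuming zero-padded names like 001.tif in a single folder; the function name `reverse_rename` and the temporary suffix are made up. Try it on a copy of the folder first.

```python
from pathlib import Path

def reverse_rename(folder, pattern="*.tif"):
    """Swap file names so the first file gets the last name, and so on.

    Returns a dict mapping each old name to its new name.
    """
    files = sorted(Path(folder).glob(pattern))   # 001.tif, 002.tif, ...
    targets = [f.name for f in reversed(files)]  # 258.tif, 257.tif, ...

    # Pass 1: move everything to temporary names so no rename
    # can overwrite a file that has not been moved yet.
    temps = []
    for f in files:
        tmp = f.with_name(f.name + ".tmp-reverse")
        f.rename(tmp)
        temps.append(tmp)

    # Pass 2: give each temporary file its final, reversed name.
    for tmp, target in zip(temps, targets):
        tmp.rename(tmp.with_name(target))

    return {f.name: t for f, t in zip(files, targets)}

# Example (the folder name is hypothetical):
# reverse_rename("scans")
```

The two passes are the point of the design: renaming 001.tif straight to 258.tif would collide with the existing 258.tif.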
Attached image: reverse-order.png
Old 08-19-2019, 09:39 AM   #613
Doitsu
Wizard
Posts: 4,620
Karma: 14578553
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by roger64 View Post
I use gimagereader-qt (a front-end for Tesseract) to recognize them.
Have you tried clicking the up arrow right of the Name column label?
Old 08-19-2019, 09:43 AM   #614
KevinH
Wizard
Posts: 3,567
Karma: 2200024
Join Date: Nov 2009
Device: many
Since these are OCR scans, there will be no links to worry about. Simply batch rename them any way you want.
Old 08-19-2019, 12:52 PM   #615
roger64
Wizard
Posts: 2,366
Karma: 2440979
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Thanks for your replies

All the well-behaved programs present the files in ascending order (from 001 to 258). Only gimagereader stubbornly presents them in reverse order...

gprename, for one, presents them normally in ascending order and accepts regexes (see screenshot).

@KevinH

If I import the files into the Calibre editor (I did not find a place where they could be accepted in Sigil), even organized in reverse order, they get sorted automatically in ascending order, 001 at the top, 258 at the bottom.

@Doitsu

I did not find any magic button to reverse the order of the files. I contacted the developer, who cannot reproduce the bug and tells me "If it does not work with the file dialog, you can still use the command line, i.e.

$ gimagereader-gtk $(ls -1 *.tif | sort -n)" which I could not get to work...
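
The same idea can be sketched in Python, which avoids shell quoting issues: build the argument list in numeric order and hand it to the program. The program name is taken from the developer's command above (an install using the Qt build may need a different executable name), and `build_command` and `numeric_key` are made-up names. Whether gimagereader keeps the order of its arguments is not something this sketch can confirm.

```python
from pathlib import Path

def numeric_key(path):
    """Sort purely numeric names by value; anything else goes last, by name."""
    stem = path.stem
    return (0, int(stem)) if stem.isdecimal() else (1, stem)

def build_command(folder, program="gimagereader-gtk", pattern="*.tif"):
    """Return the argument list: the program followed by the files in numeric order."""
    files = sorted(Path(folder).glob(pattern), key=numeric_key)
    return [program] + [str(f) for f in files]

# To actually launch it (the folder name is hypothetical):
# import subprocess
# subprocess.run(build_command("scans"))
```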
Attached image: regex.png