08-06-2019, 12:52 PM | #601 | |
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
|
Quote:
|
|
08-06-2019, 01:05 PM | #602 |
Grand Sorcerer
Posts: 27,563
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
We're probably not going to clutter up the F&R interface any more than it already is (let alone add to the already confusing mess behind the GUI!).
Plus, users can already use the "Mark Selected Text" feature (Search->Mark Selected Text) to search only the highlighted portions of individual files. I'm not really in favor of extending the Search and Replace features beyond what's already available. If people have highly specialized search & replace needs, they can create a plugin (or suggest that one be created). Last edited by DiapDealer; 08-06-2019 at 01:10 PM. |
08-06-2019, 05:31 PM | #603 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
You might want to do something like:
Search: </p>\s*([^<]+?)\s+
Replace: </p><p class="notag">\1</p>
This'll help point out those problem areas; then you can do a big pass cleaning up all the "notag" classes and adjusting those issues. |
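To illustrate the idea outside Sigil, here is a minimal Python sketch. It assumes the goal is to flag text that sits directly after a closing </p> without a tag of its own. The pattern is a stricter variant of the search above, not the poster's exact regex: the capture must start with a non-space character and run up to the next tag, so plain whitespace between paragraphs is not flagged. The function name and sample text are made up for the example.

```python
import re

# Variant of the search above (an assumption, not the poster's exact regex):
# capture text that starts with a non-space character and ends at the next tag.
LOOSE_TEXT = re.compile(r'</p>\s*([^<\s][^<]*?)\s*(?=<)')

def flag_untagged(markup: str) -> str:
    """Wrap text found directly after </p> in a <p class="notag"> marker."""
    return LOOSE_TEXT.sub(r'</p><p class="notag">\1</p>', markup)

sample = '<p>One</p>\n  Loose text here\n<p>Two</p>\n  <p>Three</p>'
print(flag_untagged(sample))
```

Searching for class="notag" afterwards lists every flagged spot for manual review.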
||
08-07-2019, 01:47 AM | #604 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
I was afraid I might be missing a simple fix (it has happened before). As it happens, there is none, so I'll keep it as it is. It's a specialized regex, intended mostly for bibliography purposes. Last edited by roger64; 08-07-2019 at 01:49 AM. |
|
08-07-2019, 05:16 AM | #605 | |
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
|
Quote:
On e-readers this doesn't work: the ligature is not shown, and the letters/ligatures are simply missing. So I looked for an alternative font and found one, but unfortunately no letter-spaced typeface was available for it. As I was really sick of this problem, I just surrounded each letter with a span that has a right padding of 1 or 2 pixels. As there were "only" 200 words present in the book, this was a fine workaround for me. In the end the job was done by 15 regexes, each of them handling words of one length: one-letter words, two-letter words and so on (the longest word had 15 characters). It looked completely weird in code view, but the result was acceptable and works on all readers I tested. |
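The per-letter wrapping described above can also be sketched with a small Python helper instead of one regex per word length. This is only an illustration: the class name "ls" is hypothetical and would need a matching CSS rule such as span.ls { padding-right: 1px; } in the book's stylesheet.

```python
import html

def space_letters(word: str, css_class: str = "ls") -> str:
    """Wrap each character of `word` in its own span so CSS padding can space it."""
    return "".join(
        f'<span class="{css_class}">{html.escape(ch)}</span>' for ch in word
    )

print(space_letters("Wort"))
```

Because the helper loops over the characters, the same code handles a one-letter word and a 15-letter word.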
|
08-13-2019, 07:03 AM | #606 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
OCR
I am trying Tesseract. Overall, the results so far are excellent, but a few mistakes appear. Sometimes a faulty word contains a digit, like mo1 for moi in French. Also, these words usually do not contain a hyphen. Confusions of this kind may appear (this is just an example): 5 → S, 1 → i, 0 → O, 2 → Z, 4 → A, 8 → B. I'd like to use a regex that detects complete words containing one or more digits (and maybe some special characters that I could add to the regex, like €) so that I could check them quickly. Last edited by roger64; 08-13-2019 at 07:08 AM. |
08-13-2019, 01:17 PM | #607 |
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
|
Code:
\b\p{L}.*[\d+]\p{L}.*\b |
08-13-2019, 01:29 PM | #608 | |
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
|
Quote:
Last edited by lumpynose; 08-13-2019 at 04:57 PM. |
|
08-13-2019, 03:27 PM | #609 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your help. I tried it but it seems that it "finds" a whole paragraph instead of a single word (mo1).
Code:
<p>il était avec mo1.</p>
Last edited by roger64; 08-13-2019 at 03:29 PM. |
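One likely reason the pattern from post #607 over-matches is that .* is greedy and matches any character, spaces included, so a match is not confined to a single word. Below is a minimal Python sketch of a word-bounded alternative. It assumes a "suspect word" is made only of letters and digits and contains at least one of each. Python's re module has no \p{L}, so [^\W\d_] stands in for "any letter"; an equivalent for Sigil's search would presumably use \p{L} instead and has not been tested there.

```python
import re

# A word made only of letters and digits that contains at least one of each.
# [^\W\d_] is a stand-in for "any Unicode letter" (Python's re lacks \p{L}).
MIXED_WORD = re.compile(r'\b(?=[^\W\d_]*\d)(?=\d*[^\W\d_])[^\W_]+\b')

def suspect_words(text: str) -> list[str]:
    """Return words mixing letters and digits, in order of appearance."""
    return MIXED_WORD.findall(text)

print(suspect_words("<p>il était avec mo1 en 1984.</p>"))  # ['mo1']
```

Pure numbers such as 1984 are skipped, while legitimate mixed words such as 20th are still reported and have to be dismissed by hand.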
08-13-2019, 03:53 PM | #610 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
(See my thread Suggestion: Spellcheck Enhancement (Numbers).) Calibre's spellcheck shows "numbered words" by default. To enable this in Sigil's spellcheck, go to Edit > Preferences > Spellcheck Dictionaries and tick the Check Numbers checkbox in the upper right. Once you enable that, if you search for: 2 you can easily get a sortable list of all words with numbers. In that thread, I detailed all the cases where it's very helpful ("20th century", "A4 Paper", OCR errors, [...]). |
|
08-13-2019, 04:28 PM | #611 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@Tex2002ans
Thanks for the tip. It answers the question. |
08-19-2019, 09:32 AM | #612 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
reverse order
rename files in reverse order
I have a book that needs to be recognized (OCR). It consists of 258 pages, numbered from 001.tif to 258.tif. I use gimagereader-qt (a front-end for Tesseract) to recognize them. Unfortunately, the files imported and displayed in gimagereader are listed in reverse order, from 258 to 001 (see screenshot). Yes, it's a bug, and I have no solution for it. The processing thus begins with 258. Moving the files manually in the display is too tedious. The alternative is to batch rename the files in reverse order, so that 001 becomes 258, 002 becomes 257 and so on. gprename allows the use of regular expressions... |
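A regular expression can match the page number but cannot by itself compute the mirrored number (259 minus n), so a short script may be simpler than a rename pattern. The following Python sketch is one way to do it, not a gprename feature: it pairs the sorted names first with last, and renames in two passes so that no file overwrites another. The function names and the ".swap" suffix are made up for the example; try it on a copy of the folder first.

```python
from pathlib import Path

def reversed_names(names: list[str]) -> dict[str, str]:
    """Map each name to its mirror in sorted order: first <-> last, and so on."""
    ordered = sorted(names)
    return dict(zip(ordered, reversed(ordered)))

def rename_reversed(folder: Path, pattern: str = "*.tif") -> None:
    """Rename files in two passes so no file overwrites another mid-swap."""
    plan = reversed_names([p.name for p in folder.glob(pattern)])
    # Pass 1: move every file to a temporary name (hypothetical ".swap" suffix).
    for old in plan:
        (folder / old).rename(folder / (old + ".swap"))
    # Pass 2: move each temporary file to its final, mirrored name.
    for old, new in plan.items():
        (folder / (old + ".swap")).rename(folder / new)

print(reversed_names(["001.tif", "002.tif", "003.tif"]))
```

Calling rename_reversed(Path("scans")) would apply the plan to a folder named scans, which is a placeholder path.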
08-19-2019, 09:39 AM | #613 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
|
08-19-2019, 09:43 AM | #614 |
Sigil Developer
Posts: 7,670
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Since these are OCR scans, there will be no links to worry about. Simply batch rename them any way you want.
|
08-19-2019, 12:52 PM | #615 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your replies
All the well-behaved programs present the files in ascending order (from 001 to 258). Only gimagereader stubbornly presents them in reverse order... gprename thus presents them normally, in ascending order, and accepts regexes (see screenshot).
@KevinH If I import the files into the Calibre editor (I did not find a place where they could be accepted in Sigil), even organized in reverse order, they get sorted automatically in ascending order, 001 at the top, 258 at the bottom.
@Doitsu I did not find any magic button to reverse the order of the files. I contacted the developer, who cannot reproduce the bug and tells me "If it does not work with the file dialog, you can still use the command line, i.e. $ gimagereader-gtk $(ls -1 *.tif | sort -n)", which I failed to get working... |
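For anyone who struggles with the shell one-liner, the same ordered argument list can be built in Python. This is a minimal sketch that only prints the command rather than running it. It assumes, as in the developer's quote, that the program accepts image files as command-line arguments and is installed under the name gimagereader-gtk; the function name is made up for the example.

```python
import shlex
from pathlib import Path

def build_command(folder: Path, program: str = "gimagereader-gtk") -> list[str]:
    """Return the program name followed by the .tif file names in ascending order.

    The names are relative, so the command is meant to be run from `folder`.
    """
    files = sorted(p.name for p in folder.glob("*.tif"))
    return [program, *files]

# Print a shell-ready command line for the current directory.
print(shlex.join(build_command(Path("."))))
```

With zero-padded names such as 001.tif to 258.tif, plain alphabetical sorting gives the same order as numeric sorting.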
|