Old 10-22-2015, 09:30 AM   #136
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by jackie_w View Post
LOWERS = list('abcdefghijklmnopqrstuvwxyz')
UPPERS = [c.upper() for c in LOWERS]  # uppercase equivalent of LOWERS
DIGITS = list('0123456789')

I think that should work OK for European, Greek, Cyrillic alphabet languages but probably not for CJK and other Eastern alphabets.
I don't see how that would work for Greek or Cyrillic, given that there's no Greek or Cyrillic in LOWERS. Unless you mean extending your definition to include other alphabets.

Quote:
You mentioned 'unicode properties'. I'm open to suggestions for a better simple algorithm to include a wider variety of languages.
If you use Python you could start here. That basically tells you the same as your LOWERS, UPPERS and DIGITS. I haven't really used that stuff, but it looks pretty straightforward. Some additional thought might be needed to scramble non-ASCII characters to other non-ASCII characters in their same "group"; I think it's easier to just scramble anything into ASCII.

EDIT: Scrambling to non-ASCII characters will probably cause problems with fonts: a font may have a character for "é" but not for "þ" (even though they are in the same group). And any scrambling will cause problems with subset fonts.

Last edited by Jellby; 10-22-2015 at 09:35 AM.
Old 10-22-2015, 09:44 AM   #137
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,615
Karma: 306652114
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by jackie_w View Post
You mentioned 'unicode properties'. I'm open to suggestions for a better simple algorithm to include a wider variety of languages.


If you're using Unicode characters and strings (as you should) in Python, take a look at unicodedata.category(), which returns the abbreviated general category value for a Unicode character. The possible values are enumerated here.
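A minimal sketch of the unicodedata.category() approach, assuming Python 3 strings. The helper name char_group and its four labels are illustrative only and do not come from the thread; the codes Ll, Lu and Nd are the standard abbreviations for lowercase letter, uppercase letter and decimal digit.

```python
import unicodedata

def char_group(ch):
    """Map one character to 'lower', 'upper', 'digit' or 'other' (hypothetical helper)."""
    cat = unicodedata.category(ch)
    if cat == "Ll":      # Letter, lowercase
        return "lower"
    if cat == "Lu":      # Letter, uppercase
        return "upper"
    if cat == "Nd":      # Number, decimal digit
        return "digit"
    return "other"       # punctuation, CJK ideographs (Lo), spaces, ...

# Latin, accented Latin, Greek, Cyrillic, CJK and an em dash
for ch in "aZ7\u00e9\u03a9\u0436\u4e2d\u2014":
    print(repr(ch), unicodedata.category(ch), char_group(ch))
```

This gives the same three groups as the LOWERS, UPPERS and DIGITS lists, but without having to enumerate every alphabet by hand.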
Old 10-22-2015, 10:17 AM   #138
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by Jellby View Post
I don't see how that would work for Greek or Cyrillic, given that there's no Greek or Cyrillic in LOWERS. Unless you mean extending your definition to include other alphabets.
What I meant was that each Greek char in a book where lower != upper would be scrambled. It won't be scrambled to a Greek char, but it will be scrambled to undecipherable ASCII. I don't know much about Greek. Are you saying that there are many Greek chars which don't have different upper/lower variations? If so, that would indeed be a problem as far as uploading to MR is concerned.
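To make the lower != upper test concrete, here is a rough sketch. The function name is_cased is hypothetical and not taken from the utility. It suggests that ordinary Greek and Cyrillic letters do have distinct case forms, whereas CJK ideographs do not and would pass through such a test untouched.

```python
def is_cased(ch):
    """True when the character has distinct lower and upper forms."""
    return ch.lower() != ch.upper()

samples = {
    "Latin": "e",
    "Greek": "\u03bb",      # Greek small letter lambda
    "Cyrillic": "\u0434",   # Cyrillic small letter de
    "CJK": "\u4e2d",        # a CJK ideograph, which has no case
    "Digit": "5",
}
for name, ch in samples.items():
    print(name, ch, is_cased(ch))
```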

Quote:
Originally Posted by Jellby View Post
If you use Python you could start here. That basically tells you the same as your LOWERS, UPPERS and DIGITS. I haven't really used that stuff, but it looks pretty straightforward. Some additional thought might be needed to scramble non-ASCII characters to other non-ASCII characters in their same "group"; I think it's easier to just scramble anything into ASCII.
Which is where we are now. I'm inclined to leave it like this unless more issues become apparent.

Quote:
Originally Posted by Jellby View Post
EDIT: Scrambling to non-ASCII characters will probably cause problems with fonts: a font may have a character for "é" but not for "þ" (even though they are in the same group). And any scrambling will cause problems with subset fonts.
I did observe this situation during testing. I'm not sure it's necessarily a showstopper for debugging common problems, though.

When I release the v0.2 version, perhaps some multilingual people, who are following this thread, will beta test examples of non-English books and report back on perceived issues?
Old 10-22-2015, 11:18 AM   #139
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by pdurrant View Post
If you're using Unicode characters and strings (as you should) in Python, take a look at unicodedata.category(), which returns the abbreviated general category value for a Unicode character. The possible values are enumerated here.
Well, if I get to grips with that lot I'll surely know a lot more about Unicode than I do right now.

My primary goals, at the moment, are to create something which is:
  • robust and easy to use for a novice user - probably a calibre plugin with minimal user-changeable options. The fixed settings to be those sanctioned by MR. Needs to work equally well on Win, MacOSX and Linux.
  • additionally available as a single Python script (like v0.1) but with more user-configurable options of what to scramble and what to leave as-is. Some of these to be more strict than MR requirements have turned out to be.
    There was some thought that a more general utility may be useful for purposes other than MR.
  • (to keep me sane) the guts of the code to be the same for both

I think refining the scrambling algorithm would be the next logical step once the above is nearer to satisfactory - which may take a little while yet.

If anyone thinks differently, feel free to comment.

In the long run, this Scrambling utility will only be of any practical use if the MR expert problem-solvers, who already handle most of the troubleshooting, encourage the user-with-problem-book to use it for the convenience of both.

It's easy to imagine that a newly launched utility which promises to "Scramble your books - the easy way" may not be at the top of every user's wishlist. "Don't touch with a 10-foot bargepole" might be the more likely reaction.
Old 10-22-2015, 04:33 PM   #140
JSWolf
Resident Curmudgeon
Posts: 74,565
Karma: 129670952
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
What's going to happen to high-bit characters?
Old 10-22-2015, 07:26 PM   #141
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by JSWolf View Post
What's going to happen to high-bit characters?
For the time being, exactly what is happening in v0.1. Chars like ÅÉçčè, which are valid Unicode, will be converted to random ASCII chars. Em dashes and curly quotes will be unchanged.
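The behaviour described for v0.1 can be approximated with the sketch below. It is not the utility's actual code: the function name scramble, the case-preserving choice of replacement and the optional rng parameter are all assumptions made for illustration. Cased letters, whether accented or not, become random ASCII letters; ASCII digits become random digits; everything else, including em dashes and curly quotes, is left alone.

```python
import random
import string

def scramble(text, rng=None):
    """Replace cased letters and ASCII digits with random ASCII; keep the rest."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch.lower() != ch.upper():
            # Cased letter (ASCII or not): pick a random ASCII letter of the same case.
            pool = string.ascii_uppercase if ch == ch.upper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            # Dashes, curly quotes, spaces, CJK and so on pass through unchanged.
            out.append(ch)
    return "".join(out)

# Accented letters, an em dash, curly quotes and digits
print(scramble("\u00c5\u00c9\u00e7\u010d\u00e8 \u2014 \u201cquote\u201d 42"))
```

Because the output is random, two runs on the same book will generally differ; passing a seeded random.Random makes a run repeatable for testing.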

In the near-ish future see my above post, namely this bit:
Quote:
Originally Posted by jackie_w View Post
I think refining the scrambling algorithm would be the next logical step once the above is nearer to satisfactory - which may take a little while yet.
At that point your views on the subject will be most welcome as long as you back them up with specific examples of how the existing algorithm is failing.

One thing to bear in mind is that the calibre tools convert everything to Unicode from the outset. For example, any named entities in the source book (e.g. &nbsp;) will not be present in the scrambled output. Whether this will confuse, compound or be totally irrelevant to the issue of high-bit chars, I'm really not sure.
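As an illustration of what "named entities replaced by their Unicode equivalents" means in practice, here is a small sketch using only the Python standard library. It is not how calibre does the conversion internally (that is not stated in this thread); html.unescape is simply a convenient stand-in to show the effect.

```python
import html

# A fragment containing named entities, as it might appear in a source book
source = "Caf&eacute;&nbsp;&mdash; &ldquo;quoted&rdquo;"

# After decoding, each entity is a single Unicode character
decoded = html.unescape(source)

print(repr(decoded))
print("entities left:", "&" in decoded)
```

After decoding, the non-breaking space and em dash are ordinary characters in the text, which would be consistent with "named entities present" warnings disappearing from a later error check.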

In the meantime maybe you could do some tests with v0.1 to gather some helpful examples of precisely where things go wrong with the current setup. It's always easier to work with facts than vague what-ifs.
Old 10-22-2015, 08:49 PM   #142
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Do they really? Thinking about ebook-edit for example...

I believe that is part of Beautify actually.
Plus, it *can* replace them as you type -- which is an option that can be turned off in the settings.
Old 10-22-2015, 09:49 PM   #143
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Quote:
Originally Posted by eschwartz View Post
Do they really? Thinking about ebook-edit for example...

I believe that is part of Beautify actually.
Plus, it *can* replace them as you type -- which is an option that can be turned off in the settings.
I can't argue with what you said about the editor.

My comment arose from what I observed when testing with one of my AZW3 books plus its KindleUnpack epub counterpart. Neither has had any editing other than whatever KUnpk does.

In both cases the books had fewer level 2 errors in the scrambled version than in the source. When I investigated, the disappearing errors all related to 'named entities present'. All had been replaced with their Unicode char equivalent.

All I can say is that I haven't actively knowingly coded for them to be auto-replaced - but equally, I can't rule out having used default settings rather than an unknown (to me) optional setting which would have forced them to NOT be auto-replaced.

If you foresee this auto-replacing as a problem I'd need to ask Kovid's advice.
Old 11-01-2015, 04:10 PM   #144
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
beta version 0.0.2 added

I've updated post #2. Comments welcome.


v0.0.2 - beta release Nov 1, 2015
  • calibre plugin beta version added. In the plugin the scrambling rules are fixed at those agreed by the MR Moderating Team in this thread.
  • Standalone version is still a drag-and-drop Windows .bat file but, once open, it is GUI throughout.
  • Preview option added (both versions). You can see, side-by-side, Original vs. Scrambled text content.
    N.B. This is a simple utility not a full-featured Viewer such as calibre's.
  • Standalone version now allows scrambling Rules config.
    Some options are stricter than MR require. They are included because the work was almost done when MR relaxed their initial stance a little.
  • Standalone version also includes a couple of extra, 'FYI reports':
    - Metadata, Before vs. After
    - Calibre error checks, Before vs. After
    These were added to make life easier when testing, so I've tidied them up a bit in case they're useful to anyone else.
  • How alphanumeric chars are scrambled is unchanged.
Old 11-03-2015, 06:21 PM   #145
RbnJrg
Wizard
Posts: 1,567
Karma: 7043711
Join Date: Mar 2013
Location: Rosario - Santa Fe - Argentina
Device: Kindle 4 NT
Quote:
Originally Posted by jackie_w View Post

Remaining work:
  • ...
  • especially for SVG images - I don't own enough samples to thoroughly test this
  • ...
Hi Jackie;

To replace SVG images in the book to be scrambled, you'll find the following alternatives wherever the original book has an SVG image:

1. SVG images inside SVG wrappers:
In this case, you have to find the following attribute:

Code:
xlink:href="../Images/Name_of_the_original_image.svg"
Once you've found that, you only need to replace "Name_of_the_original_image.svg" with "azw3.png" or "epub.png" or "kobo.png" or "xkobo.png".

2. SVG images with an <img> tag:
This case is like any other image; you have to find:

Code:
src="../Images/Name_of_the_original_image.svg"
and make the corresponding replacement.

3. There can be other cases where you find SVG images. I will send you an epub with lots of SVG images (with images and text) so you can find the pattern (there is a pattern) to do the replacement.
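Cases 1 and 2 above could be handled along these lines. This is only a sketch under stated assumptions: the function name replace_svg_refs and the regular expression are illustrative, the placeholder defaults to "epub.png" (one of the names listed above), and a real tool would probably parse the XHTML rather than rely on a regex. It does not attempt the "other cases" mentioned in point 3.

```python
import re

# Matches xlink:href="...name.svg" and src="...name.svg", capturing the
# attribute prefix, the optional directory part and the closing quote.
SVG_REF = re.compile(
    r'((?:xlink:href|src)\s*=\s*")([^"]*/)?[^"/]+\.svg(")',
    re.IGNORECASE,
)

def replace_svg_refs(markup, placeholder="epub.png"):
    """Point every SVG reference at a placeholder image, keeping its folder."""
    def swap(match):
        prefix, folder, quote = match.group(1), match.group(2) or "", match.group(3)
        return prefix + folder + placeholder + quote
    return SVG_REF.sub(swap, markup)

wrapper = '<image xlink:href="../Images/Name_of_the_original_image.svg" width="100"/>'
img_tag = '<img src="../Images/Name_of_the_original_image.svg" alt=""/>'
print(replace_svg_refs(wrapper))
print(replace_svg_refs(img_tag, placeholder="kobo.png"))
```

References to non-SVG images are left untouched, since the pattern only matches file names ending in .svg.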

Regards
Old 11-18-2015, 08:16 PM   #146
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
Due to the lack of beta-test volunteers I have no plans to pursue this project any further.
Old 11-18-2015, 08:19 PM   #147
PeterT
Grand Sorcerer
Posts: 12,255
Karma: 74007256
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Old 11-18-2015, 10:56 PM   #148
BetterRed
null operator (he/him)
Posts: 20,668
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by jackie_w View Post
Due to the lack of beta-test volunteers I have no plans to pursue this project any further.
@jackie_w - I wasn't even aware you wanted any testers. I'm happy to lend a hand if it would help, but I doubt I would ever need to use it myself.

BR
Old 11-18-2015, 11:21 PM   #149
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by BetterRed View Post
@jackie_w - I wasn't even aware you wanted any testers. I'm happy to lend a hand if it would help, but I doubt I would ever need to use it myself.

BR
and
Old 11-19-2015, 10:08 AM   #150
jackie_w
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
I hid the request in plain sight in post #2. Perhaps I should have been more explicit, but I don't like to demand other people's free time. Or maybe it fell victim to tl;dnr syndrome.
Quote:
Remaining work:
  • Willing beta-testers required
  • ...
Whatever the reason, let's try again.

I don't really think anyone who has posted in this thread is a potential user of the tool, at least not directly, but I would suggest that everyone here can help by guiding potential users towards it when the situation demands. Which, as I only just noticed this morning, PeterT did here. Thanks, Peter, that's exactly what I think is needed.

My intention was to launch the calibre plugin in the main Plugin subforum, but I don't expect it to be a #1 Bestseller. Before I do, there are 4 main areas for beta-testing:
  1. Do any of your books crash the plugin? None of mine did but it's over-hopeful to think it's impossible.
  2. Can your calibre original be scrambled by accident? I don't see how, but you never know...
  3. Re: hyperlinks/footnotes. Has the code I added to leave numeric footnote labels as-is inadvertently revealed more than it should? I don't own many non-fiction books, but my Terry Pratchetts passed unit testing OK.
  4. Needs testing on Linux, Mac. Also WinXP, Win10 if possible. I only have Windows 7 available.

Last edited by jackie_w; 11-19-2015 at 10:22 AM. Reason: item 4 added