MobileRead Forums > E-Book Readers > Amazon Kindle

Old 11-23-2014, 09:58 AM   #16
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
Is there any news on how to get soft hyphens working? I've tried pasting soft hyphens into the aliases, using the raw output as a guide, but it's not recognizing any aliases that have soft hyphens.
Old 11-23-2014, 03:31 PM   #17
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by trekky0623 View Post
Is there any news on how to get soft hyphens working? I've tried pasting soft hyphens into the aliases, using the raw output as a guide, but it's not recognizing any aliases that have soft hyphens.
No luck on it so far. Not sure why it wouldn't have worked with adding the aliases, maybe an encoding issue. I used Notepad++ for doing mine and it seemed to work alright.
The main issue I'm running into is that words don't always have a single soft hyphen in them, sometimes they have 2 or 3. Unless I can come up with a fancy regular expression that can match the word and still work properly with the HTML-parsing library I'm using, I'm not sure what else I can do.
The only other thing I can think of is to brute-force it by searching every possible combination of position and amount of soft hyphens in every term, but that seems a bit excessive.
If anyone has a simpler solution I'm all ears.
Old 11-23-2014, 03:54 PM   #18
EbokJunkie
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
Quote:
Originally Posted by Ephemerality View Post
No luck on it so far. Not sure why it wouldn't have worked with adding the aliases, maybe an encoding issue. I used Notepad++ for doing mine and it seemed to work alright.
The main issue I'm running into is that words don't always have a single soft hyphen in them, sometimes they have 2 or 3. Unless I can come up with a fancy regular expression that can match the word and still work properly with the HTML-parsing library I'm using, I'm not sure what else I can do.
What about creating a temporary copy of each file with the soft hyphens stripped?

Last edited by EbokJunkie; 11-23-2014 at 07:34 PM.
Old 11-23-2014, 06:13 PM   #19
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
Quote:
Originally Posted by Ephemerality View Post
No luck on it so far. Not sure why it wouldn't have worked with adding the aliases, maybe an encoding issue. I used Notepad++ for doing mine and it seemed to work alright.
The main issue I'm running into is that words don't always have a single soft hyphen in them, sometimes they have 2 or 3. Unless I can come up with a fancy regular expression that can match the word and still work properly with the HTML-parsing library I'm using, I'm not sure what else I can do.
The only other thing I can think of is to brute-force it by searching every possible combination of position and amount of soft hyphens in every term, but that seems a bit excessive.
If anyone has a simpler solution I'm all ears.
I'll try saving it in something other than Notepad, then.


EDIT: Awww yis, that worked perfectly. I opened it with Sublime Text and saved it as UTF-8 instead of the Western encoding that Notepad uses. Thank you SO MUCH!

Last edited by trekky0623; 11-23-2014 at 06:19 PM.
Old 11-24-2014, 10:18 AM   #20
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
Quote:
Originally Posted by EbokJunkie View Post
What about creating a temporary copy of each file with the soft hyphens stripped?
Stripping out soft hyphens would mess up the locations of the terms it finds.
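
To illustrate the point about locations, here is a minimal Python sketch (not code from the tool itself, which is written in C#). It shows how a match position found in a stripped copy differs from the position in the original text, and how a stripped copy could still be usable if an index map back to the original were kept. The function name `strip_with_index_map` and the sample sentence are made up for the example.

```python
SOFT_HYPHEN = "\u00ad"

def strip_with_index_map(text):
    """Return (stripped_text, index_map).

    index_map[i] is the position in the original text of the i-th
    character of stripped_text. (Hypothetical helper for illustration.)
    """
    kept_chars = []
    index_map = []
    for pos, ch in enumerate(text):
        if ch != SOFT_HYPHEN:
            kept_chars.append(ch)
            index_map.append(pos)
    return "".join(kept_chars), index_map

# Made-up sample text with soft hyphens inside two words.
original = "The Gil\u00adlikinese girl Nes\u00adsa\u00adrose slept."
stripped, index_map = strip_with_index_map(original)

alias = "Nessarose"
naive_start = stripped.find(alias)         # position in the stripped copy
true_start = index_map[naive_start]        # same spot in the original text
true_end = index_map[naive_start + len(alias) - 1] + 1

print(naive_start, true_start)             # the two positions differ
print(repr(original[true_start:true_end])) # original span, soft hyphens included
```

Every soft hyphen before the match pushes the stripped position one further from the real one, so the positions only line up if they are mapped back like this.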

But does C# support regex search? Why not search for aliases like this:

alias: Nessarose

search: N\x{00AD}*e\x{00AD}*s\x{00AD}*s\x{00AD}*a\x{00AD}*r\x{00AD}*o\x{00AD}*s\x{00AD}*e

This matches zero or more soft hyphens between each pair of letters, so it finds every instance of the term no matter how many soft hyphens it contains or where they fall.
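
As a rough sketch of how such a pattern could be generated from an alias instead of typed by hand, here is a Python version. The helper name `alias_to_pattern` and the sample sentence are assumptions for illustration. Note that `\x{00AD}` is Perl/PCRE syntax; Python's `re` module does not accept it, and as far as I know .NET would want `\u00AD` instead, so the sketch puts the character into the pattern directly.

```python
import re

SOFT_HYPHEN = "\u00ad"

def alias_to_pattern(alias):
    """Build a regex allowing any number of soft hyphens between letters.

    Hypothetical helper: each character of the alias is escaped, then
    the pieces are joined with "zero or more soft hyphens".
    """
    gap = SOFT_HYPHEN + "*"
    return re.compile(gap.join(re.escape(ch) for ch in alias))

pattern = alias_to_pattern("Nessarose")

# Made-up sample: one hyphenated occurrence and one plain occurrence.
text = "Then Nes\u00adsa\u00adrose spoke, and Nessarose smiled."
spans = [m.span() for m in pattern.finditer(text)]
print(spans)
```

Because the search runs on the original text, the spans already include the soft hyphens, so no positions need to be corrected afterwards.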


If you wanted to be absolutely sure, you could do some more fancy regex:

search:
Code:
N(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*e(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*s(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*s(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*a(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*r(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*o(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*s(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*e
The key part is the insertion of:

Code:
(\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;)*
between every pair of letters. It matches either the literal Unicode soft hyphen character or one of the strings &shy;, &#173;, &#xad;, &#0173;, or &#x00AD;.

If C# supports inline mode changes like Perl, you could even make that string case-insensitive while preserving the case sensitivity of the alias:

Code:
((?i)\x{00AD}|&shy;|&#173;|&#xad;|&#0173;|&#x00AD;(?-i))*

Last edited by trekky0623; 11-24-2014 at 10:53 AM.
Old 11-24-2014, 05:54 PM   #21
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by trekky0623 View Post
Stripping out soft hyphens would mess up the locations of the terms it finds.

But does C# support regex search? Why not search for aliases like this:

alias: Nessarose

search: N\x{00AD}*e\x{00AD}*s\x{00AD}*s\x{00AD}*a\x{00AD}*r\x{00AD}*o\x{00AD}*s\x{00AD}*e
Brilliant. I use regex in a few places already in a very similar fashion to this, not sure why I didn't think of trying it that way.
Thanks for your suggestion!

Last edited by Ephemerality; 11-25-2014 at 12:18 AM.
Old 11-24-2014, 06:12 PM   #22
Offie
Enthusiast
Posts: 26
Karma: 2716
Join Date: Oct 2014
Device: Kindle 4, Kindle Voyage
I can't seem to get the generated X-Ray to work on the Kindle Voyage (latest update). I tried it on the Paperwhite and it worked, but when I put the same files on the Voyage, the X-Ray file disappears as soon as I open the books. Anyone else having trouble with that?
Old 11-24-2014, 06:24 PM   #23
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by Offie View Post
I can't seem to get the generated X-Ray to work on the Kindle Voyage (latest update). I tried it on the Paperwhite and it worked, but when I put the same files on the Voyage, the X-Ray file disappears as soon as I open the books. Anyone else having trouble with that?
I don't have a Voyage to mess around with, but from what I hear they are expanding the x-ray format so it's possible the current format is not supported anymore. That's just a guess, though.
If you have any books from Amazon on it that have X-Ray working, you can send me one of the X-Ray .asc files via PM and I can have a look at it to see if it is any different.
Old 11-24-2014, 06:44 PM   #24
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
Quote:
Originally Posted by Ephemerality View Post
Brilliant. I use regex in a few places already in a very similar fashion to this, not sure why I didn't think of trying it that way.
I've uploaded a demo version for you to try at https://www.revensoftware.com/files/...1.35-win32.rar.
Let me know if it works for you (and still continues working for other books) and I will upload it as the main version if it does.
Thanks for your suggestion!
It seems to find the terms all right, but all of the positions are way off, and it can't be fixed with the offset because the error seems to increase the farther into the book I go. It also seems to be having a problem with names that have HTML tags in them, like so:

Code:
Something went wrong while searching for start of highlight.
Was looking for (or one of the aliases of): Galinda Upland (aka Glinda)
Searching in:
“Please, it is <i class="calibre10" aid="F8915">Ga</i>linda. The proper old Gil*likinese pro*nun*ci*ation, if you don’t mind.”
But it seems to be getting better in terms of the soft hyphens.

One idea I just had is that, in Notepad++, which is the only thing I can use to create proper chapter markers, it counts certain characters like the soft hyphen as 2 characters. I have no idea why, but that did cause me some trouble in making chapter files at first.

Last edited by trekky0623; 11-24-2014 at 06:47 PM.
Old 11-24-2014, 06:51 PM   #25
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by trekky0623 View Post
It seems to find the terms all right, but all of the positions are way off, and it can't be fixed with the offset because the error seems to increase the farther into the book I go.
Whoops, by adding the soft hyphen regex I broke the HTML tag stripping. I'll fix that shortly. Not sure why the locations are off, I'll have to investigate...

Edit: The Voyage does have a new X-Ray format, in an SQLite database. Currently looking through it, not sure how far I'll get without the actual device to see what it looks like when it's displayed.

Last edited by Ephemerality; 11-24-2014 at 07:45 PM.
Old 11-25-2014, 09:54 PM   #26
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by Ephemerality View Post
Whoops, by adding the soft hyphen regex I broke the HTML tag stripping. I'll fix that shortly. Not sure why the locations are off, I'll have to investigate...

Edit: The Voyage does have a new X-Ray format, in an SQLite database. Currently looking through it, not sure how far I'll get without the actual device to see what it looks like when it's displayed.
So I'm not entirely sure what's going on with the locations being off. The copy of White Fang that I've been testing with only has the issue when I read it in as UTF-8. When I open it in Notepad++ and go to a certain spot, then highlight from that spot back to the beginning, it shows (for example) 4771 characters selected. When I then try to go to offset 4771 (where the Kindle is going to try to read), it is way off from where I think it should be. If I convert the file to ANSI encoding, it works fine.
So there's some issue due to the encoding that I don't entirely understand...
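
One possible explanation, offered only as an assumption and not something confirmed in this thread: if one side counts characters while the other counts UTF-8 bytes, the two numbers drift apart by one for every soft hyphen (and by two for every curly quote) that comes before the position. That would also fit the error growing later in the book. A small Python sketch of the effect, with a made-up helper `utf8_byte_offset` and a sample string adapted from the book excerpt quoted in this thread:

```python
def utf8_byte_offset(text, char_offset):
    """Number of UTF-8 bytes that precede the character at char_offset.

    Hypothetical helper: encodes the prefix and measures its length.
    """
    return len(text[:char_offset].encode("utf-8"))

# Sample with two soft hyphens (U+00AD) and one curly apostrophe (U+2019)
# before the word being located.
sample = "wake up. Nap\u00adtime\u2019s over. We have a house\u00adguest at sup\u00adper"

char_pos = sample.index("sup")                 # offset counted in characters
byte_pos = utf8_byte_offset(sample, char_pos)  # offset counted in UTF-8 bytes
drift = byte_pos - char_pos

print(char_pos, byte_pos, drift)
```

In a single-byte encoding such as ANSI/Western, every character is one byte, so the character offset and the byte offset agree, which would be consistent with the file working after conversion.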

Finished a test version of a converter to convert from the old X-Ray format to the new one that the Voyage is using. If anyone else has a Voyage and wants to try it, send me a PM. It's a command-line tool, so users should be comfortable with that.

Last edited by Ephemerality; 11-25-2014 at 09:59 PM.
Old 11-26-2014, 11:48 AM   #27
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
I just tried out v. 1.36. I only got one error this time:

Code:
Something went wrong while searching for start of highlight.
Was looking for (or one of the aliases of): Sarima
Searching in: <span class="chbeg" aid="LTSU6">S</span>ar*ima,” said her young*est sis*ter, “wake up. Nap*time’s over. We have a house*guest at sup*per, and I need to know if we have to kill a hen. There are so few left, and what we give the trav*eler we miss all winter in eggs <nobr class="calibre13">. . .</nobr> What do you think?”
It looks like the </span> tag is still causing a problem, but this is the only error I got, so every other HTML tag seems to have been fine.

Locations are still way, way off, especially later in the book. I'm still not sure how to fix that.
Old 11-26-2014, 11:56 AM   #28
trekky0623
Member
Posts: 20
Karma: 10
Join Date: Apr 2013
Device: Kindle Paperwhite
If this helps, these are the locations for the old version where I manually added soft hyphens and the locations for the new version:

Old version:

Code:
"locs":[[658333,561,0,21],[658937,253,0,8],[659431,282,39,8],[660584,256,179,21],[660883,324,25,8],[661535,424,0,8],[662332,162,98,8],[662821,631,104,21],[663869,78,0,8],[664541,254,0,8],[668729,81,30,8],[668853,243,100,8],[669139,416,46,8],[669845,526,51,8],[670414,469,0,8],[672053,215,0,8],[672495,466,108,8],[673594,198,24,8],[675364,335,37,8],[676688,290,0,21],[677115,680,193,8],[678790,619,25,8],[686094,588,63,8],[691502,680,511,8],[696483,128,55,8],[704655,362,79,8],[705258,478,98,8],[705778,161,25,8],[706030,221,102,8],[707288,181,0,8]]
New version:

Code:
"locs":[[622780,521,0,7],[623344,236,0,7],[623810,271,36,7],[624895,240,170,7],[625178,304,25,7],[625790,399,0,7],[626550,153,91,7],[627017,597,100,7],[627898,197,124,7],[628660,245,0,7],[632638,73,24,7],[632754,239,98,7],[633036,397,40,7],[633714,485,47,7],[634242,434,0,7],[635787,203,0,7],[636212,449,104,7],[637260,178,24,7],[638936,321,36,7],[640211,279,0,7],[640622,645,177,7],[642190,590,20,7],[649163,565,60,7],[654339,630,477,7],[659090,117,48,7],[666844,347,78,7],[667431,453,94,7],[667926,143,19,7],[668157,209,95,7],[669343,176,0,7]]
The character's name is Oatsie, with a soft hyphen like Oat-sie, and given that in the new version the length appears to be 7 every time, and in the old version it's 8, I'm convinced it's due to soft hyphens being counted as 2 bytes rather than 1.
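
The 7 versus 8 is easy to reproduce. A quick Python check (just an illustration, assuming the soft hyphen in the book is the single character U+00AD) compares the length of the name in characters, in UTF-8 bytes, and in Windows-1252 ("Western") bytes:

```python
name = "Oat\u00adsie"  # "Oatsie" with one soft hyphen after "Oat"

char_length = len(name)                      # counted in characters
utf8_length = len(name.encode("utf-8"))      # soft hyphen takes 2 bytes in UTF-8
cp1252_length = len(name.encode("cp1252"))   # soft hyphen takes 1 byte in Windows-1252

print(char_length, utf8_length, cp1252_length)  # 7 8 7
```

So a length of 8 matches a count in UTF-8 bytes, and a length of 7 matches a count in characters (or in single-byte ANSI).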
Old 11-26-2014, 12:10 PM   #29
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Quote:
Originally Posted by trekky0623 View Post
I just tried out v. 1.36. I only got one error this time:

Code:
Something went wrong while searching for start of highlight.
Was looking for (or one of the aliases of): Sarima
Searching in: <span class="chbeg" aid="LTSU6">S</span>ar*ima,” said her young*est sis*ter, “wake up. Nap*time’s over. We have a house*guest at sup*per, and I need to know if we have to kill a hen. There are so few left, and what we give the trav*eler we miss all winter in eggs <nobr class="calibre13">. . .</nobr> What do you think?”
It looks like the </span> tag is still causing a problem, but this is the only error I got, so every other HTML tag seems to have been fine.
Oops, my bad again. I made an error in the logic: it assumes a string contains either HTML tags or soft hyphens, but not both. I'll have to make a larger regular expression that matches all of them.
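
A minimal Python sketch of what such a combined expression could look like (this is not the tool's actual code; the names `FILLER` and `alias_to_pattern` are made up, and it assumes the asterisks in the error output stand for soft hyphens). The idea is to allow any mix of soft hyphens and HTML tags between consecutive letters of the alias:

```python
import re

# Filler allowed between two letters: a soft hyphen or a whole HTML tag,
# repeated any number of times. (Assumed form, for illustration only.)
FILLER = "(?:\u00ad|<[^>]*>)*"

def alias_to_pattern(alias):
    """Join the escaped letters of the alias with the filler pattern."""
    return re.compile(FILLER.join(re.escape(ch) for ch in alias))

# Start of the passage from the error message, with soft hyphens restored.
snippet = ('<span class="chbeg" aid="LTSU6">S</span>ar\u00adima,\u201d '
           'said her young\u00adest sis\u00adter')

match = alias_to_pattern("Sarima").search(snippet)
matched_text = match.group(0)
print(repr(matched_text))
```

The match starts at the "S" inside the span and runs through the closing tag and the soft hyphen to the end of the name, so the tag and the soft hyphen are handled in the same pass.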

Quote:
Originally Posted by trekky0623 View Post
If this helps, these are the locations for the old version where I manually added soft hyphens and the locations for the new version:
The character's name is Oatsie, with a soft hyphen like Oat-sie, and given that in the new version the length appears to be 7 every time, and in the old version it's 8, I'm convinced it's due to soft hyphens being counted as 2 bytes rather than 1.
I think you're right. I had issues with that before, but I was hoping it would work itself out with the encoding change. I'll see if I can figure something out and PM you a new version to try.
Old 11-30-2014, 06:06 PM   #30
Ephemerality
Addict
Posts: 328
Karma: 800105
Join Date: Feb 2013
Device: PW1
Version 1.41 has been uploaded. If anyone has a PW2 with firmware 5.6, it would be nice to see whether the XRAY.asc files are in the old format or the new format.
The Kindle Voyage is definitely using the new format, so it would be nice to see if it works at all for anyone with those devices.
All times are GMT -4.

