MobileRead Forums > E-Book Formats > ePub

Old 05-01-2015, 02:45 PM   #1
Bilingual
Member
Posts: 12
Karma: 10
Join Date: May 2013
Device: none
Is it possible to create hidden, searchable highlightable text?

My request might be impossible, but since I generally suck at HTML and CSS, I'd better ask before giving up.

I'm preparing an EPUB with a lot of key elements with variant spellings.

Let's say I write a text containing the word "colour", and that word should be fully searchable within the (long) text. Now, among my target audience there might be some Americans prone to search for "colour" using their own spelling, "color". I would very much like to accommodate that, so I have been experimenting with various formatting options. The best result obtained so far seems to be with this code:

Code:
colour<span style="font-size:0">color</span>
Searching for "color" thus brings me to the correct place in my text, BUT if at all possible, I would additionally really like "colour" to be highlighted in the same manner as when searching by the main spelling variant.

I wonder if this could be obtained by a mixture of overlapping and transparent text elements, but my skills are sorely inadequate for the task, and intense googling around hasn't brought up any hints.

What say you? Totally impossible?
Old 05-01-2015, 02:51 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
The appropriate way to deal with this is to publish separate UK and US versions of the book.

There is no way to stick both words there, and your workaround is probably the best you can do -- and will cause curated search results to look mildly bizarre.
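
To see why the hidden-span workaround lands on the right spot without highlighting the visible word, it may help to look at the text a search engine plausibly sees once the markup is stripped. The following is only a sketch in Python (not something that goes into the EPUB), and it assumes a reader that searches the concatenated character data and ignores CSS. `TextExtractor` and `searchable_text` are names made up for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the character data of a fragment, ignoring tags and styling."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def searchable_text(fragment):
    """Return the plain text that remains once all markup is dropped."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.chunks)


fragment = 'The colour<span style="font-size:0">color</span> red'
text = searchable_text(fragment)

# Offset of the American spelling in the stripped text.
hit = text.find("color")
# Offset of the visible British spelling, for comparison.
visible = text.find("colour")
```

Under that assumption the hit for "color" sits entirely inside the zero-size span, directly after the visible "colour", so there is nothing visible to highlight, and a search-result excerpt would show the two spellings run together.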
Old 05-01-2015, 03:46 PM   #3
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Searching might be possible; highlighting, if possible at all, depends on the reading software being used.

I would:
1. Dislike a reader finding and highlighting "color" when I search for "colour".
2. Welcome some search "fuzziness" options, like accent-insensitive, curly/straight quotes, or, why not, variant spellings... but this should be done by the software performing the search, not in the book!
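
As a rough idea of what such search-side fuzziness could look like, here is a minimal Python sketch. It does not describe any existing reader: the variant table `SPELLING_VARIANTS` and the helpers `fold`, `expand_query` and `matches` are hypothetical, and a real implementation would need a far larger dictionary.

```python
import unicodedata

# Hypothetical variant table; each pair is treated as interchangeable.
SPELLING_VARIANTS = {"color": "colour", "gray": "grey"}

# Map curly quotes to their straight equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def fold(s):
    """Lowercase, straighten quotes, and strip combining accents."""
    s = s.translate(QUOTE_MAP).lower()
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_query(query):
    """Return the folded query plus any known spelling variant of it."""
    q = fold(query)
    forms = {q}
    for a, b in SPELLING_VARIANTS.items():
        if q == a:
            forms.add(b)
        elif q == b:
            forms.add(a)
    return forms


def matches(query, text):
    """True if any form of the query occurs in the folded text."""
    folded = fold(text)
    return any(form in folded for form in expand_query(query))
```

With this, `matches("color", "The colour red")` and `matches("colour", "The color red")` both succeed, while the book itself contains only one spelling.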
Old 05-01-2015, 04:12 PM   #4
eschwartz
Fuzzy searching that handles variant spelling would be a pretty cool feature in the ereader engine. I thought of that, but AFAIK nobody does this (yet).

I do know the Kindle (fw5.3.7) has fuzzy searching that handles inflections. Maybe there is hope. Or maybe later firmware that I haven't seen already does that.
Old 05-02-2015, 06:33 AM   #5
Notjohn
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
I would simply spell the word as color, having never met a Brit who wasn't aware of American spelling and usage, whereas even in the era of Harry Potter few Americans will tolerate British usage. (The Rowling books are translated for the US market.) I even sense that American usage in punctuation is infiltrating the British book world (double quotes, punctuation inside the close-quote).
Old 05-02-2015, 10:43 AM   #6
Bilingual
Quote:
Originally Posted by Notjohn
I would simply spell the word as color, having never met a Brit who wasn't aware of American spelling and usage, whereas even in the era of Harry Potter few Americans will tolerate British usage. (The Rowling books are translated for the US market.) I even sense that American usage in punctuation is infiltrating the British book world (double quotes, punctuation inside the close-quote).
Well, it's a bit more complicated than that. The color/colour example was just meant to be that: an example. What I really need is to have entirely different words point to the same place within a huge text.

Another project I'm trying to get ahead with is a Latin text. Here, I'd like to have some stress markers in the text, but have them basically invisible for the purpose of text searching.

Example: the Latin word sapie'ntior is marked here with an "apostrophe" after the vowel that should be stressed, i.e. sapiENtior. Nevertheless, I'd like the word to be found when searching for just sapientior.

I have been pondering over how to achieve this, and my best guess so far has been a span of a sort styled as "stressed" which may trigger a font substitution, where the vowels would have said stress-apostrophes as a part of the vowel itself:

Code:
sapi<span class="stressed">e</span>ntior
should then be searchable as sapientior, but be shown in the text as sapie'ntior.

Anyone know how to put alternative fonts within an EPUB? Would probably have to replace the reader's default font as well, so the alternative font would not look too far apart.
Old 05-02-2015, 10:56 AM   #7
Jellby
Embedding fonts is a standard technique; you can find plenty of information around. But note that the <span> may break both searching and hyphenation, depending on the engine.

You could abuse fonts in a similar way, but map, for instance, "é" to your "e + apostrophe" (assuming you don't have other "é"s in your Latin words), and then use <span class="whatever">sapiéntior</span> (or simply use accents to mark the stress, as is already done in a number of languages). Then you'd need the reader to support diacritic-insensitive searches; maybe some do already.
Old 05-02-2015, 11:35 PM   #8
eschwartz
I have to wonder how worth it this is anyway, surely that is (again) the job of the ereader renderer.

The good news is that the Kindle, at least, supports exactly this fuzzy (diacritic-agnostic) searching.
Hopefully other ereaders do as well.

Just include the word with the appropriate diacritics and stop worrying.
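
For anyone curious how a diacritic-agnostic search can still highlight the accented word as written, here is a minimal Python sketch. It is an illustration only, not how the Kindle or any other reader is known to work. The helper names `fold_with_map` and `find_span` are made up; the idea is to search a folded copy of the text while remembering where each folded character came from.

```python
import unicodedata


def fold_with_map(text):
    """Return (folded, index_map): accents stripped, lowercased, and for every
    folded character the index of the original character it came from."""
    folded, index_map = [], []
    for i, ch in enumerate(text):
        for part in unicodedata.normalize("NFD", ch).lower():
            if not unicodedata.combining(part):
                folded.append(part)
                index_map.append(i)
    return "".join(folded), index_map


def find_span(query, text):
    """Find the first accent-insensitive match of `query` in `text`.

    Returns (start, end) offsets into the ORIGINAL text, so the accented
    word itself can be highlighted, or None if there is no match."""
    folded_query, _ = fold_with_map(query)
    if not folded_query:
        return None
    folded_text, index_map = fold_with_map(text)
    pos = folded_text.find(folded_query)
    if pos < 0:
        return None
    start = index_map[pos]
    end = index_map[pos + len(folded_query) - 1] + 1
    # Keep any combining marks that trail the last matched letter.
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return start, end


line = "pro hom\u00ednibus"  # "pro homínibus" with a precomposed í
span = find_span("hominibus", line)
```

Here `line[span[0]:span[1]]` is the accented word from the book, even though the query was typed without the accent.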
Old 05-04-2015, 08:30 AM   #9
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Bilingual
Another project I'm trying to get ahead with is a Latin text. Here, I'd like to have some stress markers in the text, but have them basically invisible for the purpose of text searching.
IMHO, the easiest solution would be to mark the stressed syllable as bold. You could also automatically display it in upper case letters using text-transform: uppercase. However, very few readers support the text-transform property.

For example:

Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
  
<style type="text/css">
.stressed {font-weight: bold; text-transform: uppercase;}
</style>
</head>

<body>
  <h3>Latin stress examples</h3>

  <p>for<span class="stressed">tu</span>na</p>

  <p>philo<span class="stressed">so</span>phia</p>

  <p>pe<span class="stressed">cu</span>nia</p>
</body>
</html>


If you're planning to write a couple of Latin textbooks it might make sense to create a custom font in which all regular Latin lowercase characters have an acute accent.
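Since Unicode already provides precomposed characters with acutes (the vowels á, é, í, ó, ú and ý are in the Latin-1 Supplement block, with further letters in Latin Extended-A and Latin Extended-B), and most text fonts include glyphs for them, all you'd have to do is overwrite the regular Latin glyphs with these.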

This can be easily done with a freeware font editor, e.g. Type light.
Old 05-05-2015, 01:57 AM   #10
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
Quote:
Originally Posted by eschwartz
I have to wonder how worth it this is anyway, surely that is (again) the job of the ereader renderer.

The good news is that the Kindle, at least, supports exactly this fuzzy (diacritic-agnostic) searching.
Hopefully other ereaders do as well.
I'm fairly certain that anything based on the WebKit rendering engine does, and that covers just about every recent reader except for the ones that are based on ADE/RMSDK. So if ADE works, then you're golden.
Old 05-05-2015, 02:25 AM   #11
dgatwood
Quote:
Originally Posted by Doitsu
If you're planning to write a couple of Latin textbooks it might make sense to create a custom font in which all regular Latin lowercase characters have an acute accent.
Won't work. You have to have letters without accents in the same word, and if you try to wrap a single syllable or letter with a font change, you're likely to run headfirst into a nasty ADE bug. ADE (at least as of about a year ago) egregiously and flagrantly violates the HTML spec by allowing lines to wrap at tag boundaries even when there's no whitespace.

You might be able to work around that bug by adding a zero-width nonbreaking space at the tag boundaries, but then you probably won't be able to find the word in any reader, because there will be whitespace within the word....
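
A small Python sketch of why such an invisible character can defeat searching. It assumes the character in question is U+2060 WORD JOINER (or its older equivalent U+FEFF) and that the reader does a plain substring match. Both are assumptions, and the two helper functions are hypothetical.

```python
import unicodedata

WORD_JOINER = "\u2060"  # zero-width, non-breaking; U+FEFF is the older equivalent

# The word as it would be stored with a joiner at each tag boundary of
# sapi<span>e</span>ntior (markup left out to keep the sketch short).
patched = "sapi" + WORD_JOINER + "e" + WORD_JOINER + "ntior"


def naive_contains(query, text):
    """Plain substring search, as a simple reader might do it."""
    return query in text


def tolerant_contains(query, text):
    """Substring search that first drops invisible format characters
    (Unicode general category Cf)."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return query in visible


naive_hit = naive_contains("sapientior", patched)
tolerant_hit = tolerant_contains("sapientior", patched)
```

The naive search misses the word because the stored text is two characters longer than what is displayed; only a search that deliberately skips format characters finds it, and whether a given reader does that is something that would have to be tested.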

If you want to go a little bit nuts, you might be able to get away with something like this:

Code:
<span style="display: inline-block; width: 0; color: rgba(0,0,0,0.0); visibility: hidden;">hominibus</span>homínibus
but that's quite a bit beyond the EPUB 2 required subset, so even that isn't guaranteed to work.

Basically, I'm pretty sure there's no way to hack this that won't cause even bigger problems, and even if there is a way, it really isn't a good idea. Just use the proper UTF-8 characters for the accented versions of the characters. It should "just work".

Last edited by dgatwood; 05-05-2015 at 02:27 AM.
Old 05-05-2015, 05:14 AM   #12
Doitsu
Quote:
Originally Posted by dgatwood
Won't work. You have to have letters without accents in the same word, and if you try to wrap a single syllable or letter with a font change, you're likely to run headfirst into a nasty ADE bug.
The bug that you've linked to doesn't seem to affect the solution that I've suggested.

I've slapped together a proof-of-concept ePub with a Charis SIL based custom font that contains only the letters á, é, í, ó, ú and ý; it works fine with iBooks, ADE 1.7.2 - 4.0.3, ADE-based iOS apps and CoolReader.
Attached Thumbnails: bluefire.PNG (31.5 KB)
Attached Files: latin_font_test.epub (50.4 KB)

Last edited by Doitsu; 05-05-2015 at 05:17 AM.
All times are GMT -4.