Hebrew aleph and overlining problems


frabjous
06-05-2009, 03:34 AM
I'm currently working on a project to create a quality edition of Bertrand Russell's Introduction to Mathematical Philosophy, a classic in the philosophy of mathematics, published in 1919 and now in the public domain.

Starting from a scan, I created an HTML version, and I have been trying to convert it into a number of different formats. I really want to create one that will work on a Kindle.

See my personal project page (http://people.umass.edu/klement/russell-imp.html) and the HTML version (http://people.umass.edu/klement/imp/imp-c.html).

But I seem to have hit some stumbling blocks in trying to convert it to .mobi format. There are two problems. Making matters worse, I don't personally own a Kindle or any other device that can handle .mobi files (though I can use the viewer in calibre), so I rely on feedback from others.

The first involves the Hebrew letter aleph, which post-Cantorian set theory uses for various infinite cardinal numbers. To create these in the HTML version, I use the HTML 4.0 character entity &alefsym; or the numeric reference &#8501;, which works fine there:

http://people.umass.edu/phil592w-klement/misc/html-aleph.gif
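For reference, both forms resolve to the same code point, U+2135 ALEF SYMBOL (the dedicated mathematical symbol, distinct from the Hebrew letter U+05D0). A quick check with Python's standard `html` module, offered only as an illustration of what the converter has to carry through:

```python
import html

# The named entity and the decimal character reference for the aleph symbol.
named = html.unescape("&alefsym;")
numeric = html.unescape("&#8501;")

print(named == numeric == "\u2135")  # both are U+2135 ALEF SYMBOL
print(hex(ord(numeric)))
print(numeric.encode("utf-8"))       # three bytes once stored as UTF-8
```

Since 8501 is just 0x2135 in decimal, which spelling the HTML uses makes no difference to the output file; what matters is whether the reading device's font has a glyph for that code point.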

Converting the file to .mobi (using calibre) does seem to preserve this in the file. (If I then convert the .mobi to an .epub and stick it in my Sony, the alephs show up.) But it doesn't actually seem to work on a Kindle, at least not the first generation Kindle an acquaintance uses. He took this screenshot:

http://people.umass.edu/phil592w-klement/misc/kindle-aleph.gif

I guess this stems from the Kindle not providing full Unicode support.
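One way to see in advance which characters are at risk is to list everything in the text that falls outside a legacy 8-bit repertoire. This sketch assumes, purely for illustration, that an older device font covers roughly Windows-1252 (cp1252); that is a guess, not a documented Kindle specification, and `risky_chars` is a hypothetical helper:

```python
import unicodedata

def risky_chars(text, codec="cp1252"):
    """List characters that `codec` cannot encode.

    Assumption (not a documented Kindle spec): a reader whose font only covers
    this legacy repertoire may render such characters as blanks or boxes.
    """
    found = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            found.add(ch)
    # Report code point and official Unicode name, sorted by code point.
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]

sample = "cardinal \u2135\u2080, \u201csmart quotes\u201d \u2014 and \u03b1"
for code_point, name in risky_chars(sample):
    print(code_point, name)
```

Smart quotes and dashes pass because cp1252 includes them, while the aleph, the subscript zero and the Greek letter are flagged as candidates for a fallback.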

Another problem, in a way more serious, has to do with some overlining used in the text. Here's an example passage from the HTML version:

http://people.umass.edu/phil592w-klement/misc/html-overbar.gif

This is done using the code <span style="text-decoration: overline;">blah blah</span> in the HTML.

Obviously, this is crucial to the meaning of the passage. But it's lost on the Kindle:

http://people.umass.edu/phil592w-klement/misc/kindle-overbar.gif

So my question is: what are my options here? I doubt I can expect firmware updates from Amazon to help, and I want the file to be usable by people with first-generation Kindles without hacks or the like. (I'm not doing this for my own sake; I don't even have one.)

Would I get better results with another converter or tool?

Or do I have to do something more radical, like replacing the codes with small inline images? I can imagine this working so-so with the aleph, but with the overbar, getting a good-quality look seems unlikely, especially if the font size can be changed.

Any suggestions?

pdurrant
06-05-2009, 04:47 AM
> This is done using the code <span style="text-decoration: overline;">blah blah</span> in the HTML.
>
> Obviously, this is crucial to the meaning of the passage. But it's lost on the Kindle:


Kindle format is Mobipocket format. See http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=TagRef_OEB.htm for a list of HTML that Mobipocket readers (including Kindle) support.

I think you'll find that the Mobipocket format just can't do overlines on text. You might have to target the ePUB format instead, which I think should be able to handle it.

HarryT
06-05-2009, 05:24 AM
If you are using a Windows PC, you should install the Windows version of the MobiPocket Reader, which you can download from http://www.mobipocket.com.

Jellby
06-05-2009, 07:10 AM
The missing aleph is probably the font's fault. If you used a font with that glyph, it would be displayed. Could you post a test file so I can try it on the Cybook, where I can easily use almost any font?

As for the overline, as pdurrant says, I don't think it's possible with Mobipocket; you'd have to find a different notation there, maybe something like neg(p|q). It should be possible in ePUB, though.

HarryT
06-05-2009, 07:14 AM
The other option would be to use graphic images, but that could be a lot of work. In addition, not all versions of the Mobi reader support "in line" graphics.

I don't often say this, but for a book like this, a PDF version nicely formatted for the page size of the device you're reading on is very probably the best option.

Jellby
06-05-2009, 07:31 AM
> I don't often say this, but for a book like this, a PDF version nicely formatted for the page size of the device you're reading on is very probably the best option.

Indeed, I could help in creating the LaTeX source from the HTML, if needed.

frabjous
06-05-2009, 11:04 AM
> You might have to target ePUB format instead, which should be able to do it, I think.

Yeah, it works in the ePub version, which I've also made.

> If you are using a Windows PC, you should install the Windows version of the MobiPocket Reader, which you can download from http://www.mobipocket.com.

I use Linux, but I do dual-boot on my work computer. Is there any reason to expect different results with their software, however?

> The missing aleph is probably the font's fault. If you used a font with that glyph, it would be displayed. Could you post a test file so I can try it on the Cybook, where I can easily use almost any font?

You can find everything you'd need at my project page (http://people.umass.edu/klement/russell-imp.html).

> As for the overline, as pdurrant says, I don't think it's possible with mobipocket, you'd have to find a different notation there, maybe something like neg(p|q).

Russell uses ~(p|q) elsewhere, but I really wanted to preserve the exact notation if possible. I'm targeting this mainly at academics, many of them specialists in the history of logic. I am even in contact with the staff at the Bertrand Russell Archives, who have helped me check some things against Russell's original handwritten manuscript. Perhaps the substitution would be acceptable if I added a note explaining it, however.

> I don't often say this, but for a book like this, a PDF version nicely formatted for the page size of the device you're reading on is very probably the best option.

> Indeed, I could help in creating the LaTeX source from the HTML, if needed.

I've already done that (yes, using LaTeX). You'll find it, along with four other PDF versions, at the same site above. It looks great, but I wanted something that could be read on the Kindle.

(And yes, after getting some good working versions, properly proofread, I'll upload here too.)

Thanks for the advice!

HarryT
06-05-2009, 11:09 AM
> I use Linux, but I do dual-boot on my work computer. Is there any reason to expect different results with their software, however?


My comment about the Windows Mobi Reader was in response to your saying that you didn't own a "real" Mobi device. I was just saying that the Windows Mobi Reader is a "real" version of the Mobi Reader, and will perhaps give a more accurate view than calibre of what the end result will look like on other Mobi devices.

frabjous
06-05-2009, 12:15 PM
Ok, thanks. I'll check that out.

Jellby
06-05-2009, 12:51 PM
> Russell uses ~(p|q) elsewhere, but I really wanted to preserve the exact notation if possible.

Fair enough. But sometimes you are limited by the format or medium, as seems to be the case here. I'd settle for some alternate notation in this particular case (that is, only for the Mobipocket version) and add a note about it.

By the way, I'd definitely use a different font for the PDF version, something like what the kpfonts or fourier LaTeX packages provide. I also believe the microtype package would make a noticeable improvement.

frabjous
06-06-2009, 01:02 AM
Yeah, I did notice that Computer/Latin Modern looks pretty light on my Sony reader, though I like the look of them otherwise. I completely forgot about the microtype package too -- thanks for the reminder!

EDIT: Switched the ebook-sized PDF over to kpfonts with microtype. Looks great. Thanks for the advice.

(It also made it possible to easily use old style numerals everywhere, which Russell himself does in the pbook, but I have mixed feelings about that, so I might scrap it.)

I might switch fonts on the larger PDFs too... but right now I need sleep.

frabjous
08-24-2009, 04:22 PM
Two months later, I'm coming close to finally finishing this project. It's been a lot of work, especially since I'm distributing the book in 10 different formats (6 differently sized PDFs, including one sized for e-Ink screens and one for iPhones/iPod touches, plus two different HTML versions, an ePub version and a mobi version).

Since I never "solved" the issues I started this thread about, for the .mobi version I've stooped to using a tilde ~ for negation rather than overlining, and small inline images for the alephs.
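The substitution described above can be automated as a preprocessing pass over a copy of the HTML before the .mobi conversion. This is a minimal sketch, not the script actually used for this edition: the function name `mobi_fallbacks` and the image file name `aleph.gif` are hypothetical, and a regex like this only copes with non-nested overline spans written in exactly this style.

```python
import re

# Overlined formula, written as in the HTML source (assumes no nested <span>s).
OVERLINE = re.compile(
    r'<span style="text-decoration:\s*overline;?">(.*?)</span>', re.DOTALL
)
# Any spelling of the aleph: decimal reference, named entity, or literal U+2135.
ALEPH = re.compile("&#8501;|&alefsym;|\u2135")

def mobi_fallbacks(html_text, aleph_img="aleph.gif"):
    """Hypothetical .mobi-only pass: overline -> ~( ), aleph -> inline image."""
    # Negation by overbar becomes a tilde applied to the parenthesized formula.
    text = OVERLINE.sub(lambda m: "~(" + m.group(1) + ")", html_text)
    # Each aleph becomes a small inline image, with a text fallback in alt.
    img = f'<img src="{aleph_img}" alt="aleph"/>'
    return ALEPH.sub(img, text)

src = '<p><span style="text-decoration: overline;">p|q</span> and &#8501;<sub>0</sub></p>'
print(mobi_fallbacks(src))
```

Keeping the original HTML untouched and applying this only on the way to .mobi means the ePub and PDF builds keep the real overlines and text alephs.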

(I've got an Appendix that lists changes from the print editions, some of which are actual honest-to-god fixes nearly a century after initial publication--since I had help from someone who actually read the original manuscript!)

Anyway, I'm here to ask for a favor. Is there anyone out there with a first or second generation Kindle who wouldn't mind testing the .mobi version for me? I want to make sure it looks OK on the Kindle before uploading it here (and regarding it as a "final" version).

(If you have another device, you'd be better off using one of the other formats; but as mentioned above, I wanted one suitable for the Kindle in particular.)

If so, I'm particularly interested in what the alephs look like. You can find them mainly in chapters 8 and 9 (e.g., "original" page 84 or 92; the original page numbers are marked inline; compare the HTML version linked below if need be).

I've seen what it looks like in MobiPocket Reader and in Calibre's viewer, but I was really hoping to hear from someone actually using a Kindle.

Mobi version: http://people.umass.edu/klement/imp/imp.mobi
HTML source version: http://people.umass.edu/klement/imp/imp.html
Project page (with links to all 10 versions): http://people.umass.edu/klement/russell-imp.html

Thanks in advance!

Jellby
08-25-2009, 07:32 AM
In the ePUB version you have this:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type"/>

This declares two encodings, UTF-8 and ISO-8859-1, which I don't think is right.
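Conflicts like this are easy to spot mechanically by collecting every encoding declared in the XML prolog and in any <meta> tags. A rough regex-based sketch (the helper name `declared_encodings` is hypothetical, and a real HTML/XML parser would be more robust):

```python
import re

# Encoding named in the XML prolog, e.g. <?xml version='1.0' encoding='utf-8'?>
XML_DECL = re.compile(r"""<\?xml[^>]*encoding=['"]([\w.-]+)['"]""", re.I)
# charset=... inside any <meta> tag, in either attribute order.
META_CHARSET = re.compile(r"""<meta[^>]*charset=["']?([\w.-]+)""", re.I)

def declared_encodings(markup):
    """Collect every encoding the document declares, lower-cased."""
    found = XML_DECL.findall(markup) + META_CHARSET.findall(markup)
    return {enc.lower() for enc in found}

head = """<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type"/>"""

encs = declared_encodings(head)
if len(encs) > 1:
    print("conflicting declarations:", sorted(encs))
```

A file that passes this check should report exactly one encoding.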

frabjous
08-25-2009, 10:43 AM
Calibre must be doing that. My HTML source is ISO-8859-1, and says so. I'm using calibre to convert it to ePub. Calibre converts it to UTF-8, and must add a tag for it. It should remove the other one. Not sure why it doesn't. Anyway, I haven't had any trouble using it.

Jellby
08-25-2009, 12:00 PM
> Calibre must be doing that. My HTML source is ISO-8859-1, and says so. I'm using calibre to convert it to ePub. Calibre converts it to UTF-8, and must add a tag for it. It should remove the other one. Not sure why it doesn't. Anyway, I haven't had any trouble using it.

I noticed when I opened the epub in the web browser, the non-ASCII characters were broken. I don't know what will happen in ePUB readers, though.

frabjous
08-25-2009, 01:24 PM
Which web browser, out of curiosity?

The Greek letters, etc., look fine on my Sony, and in ADE, and on Stanza on a friend's iPhone, but I have had trouble with them not appearing right in calibre's viewer. I assumed it was a fault with calibre's viewer. I asked Kovid about it here (http://www.mobileread.com/forums/showthread.php?t=30639&page=2) -- I wonder if this is the cause. I'll do some investigating. Thanks.

EDIT: No, this doesn't seem to be the cause of the trouble I was having with Greek letters. I tried removing the ISO-... tag, and still got the problem with Greek letters in calibre.

If I just open one of the HTML parts in a browser, the non-ASCII characters are broken too, at least until I switch my browser's encoding to UTF-8. But apart from the problem mentioned above (which seems to apply only to Greek, not to em dashes, smart quotes, etc.), I haven't had any trouble with actual ePub viewers. Still, it's probably safer to get rid of the confusing extra tag.
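What the browser shows in that case is classic mojibake: the file's UTF-8 bytes get interpreted under the stale ISO-8859-1 declaration, so each multi-byte character turns into two or three Latin-1 characters. A small illustration in Python (the sample string is made up):

```python
# Made-up sample: curly quotes, an em dash, and a Greek alpha.
text = "\u201cone\u201d \u2014 \u03b1"
raw = text.encode("utf-8")  # the bytes actually stored in the converted XHTML

# Decoding with the stale charset: each non-ASCII character becomes 2-3 junk characters.
wrong = raw.decode("iso-8859-1")
# Decoding with the real charset restores the text.
right = raw.decode("utf-8")

print(ascii(wrong))
print(right)

# Because ISO-8859-1 maps every byte value, the damage can be undone losslessly.
repaired = wrong.encode("iso-8859-1").decode("utf-8")
```

This is why manually switching the browser to UTF-8 fixes the display: the bytes were never damaged, only the label was wrong.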

Jellby
08-25-2009, 01:35 PM
> Which web browser, out of curiosity?

Opera for Linux 9.23

frabjous
08-25-2009, 01:53 PM
All right, I've uploaded a new ePub that shouldn't have this problem.

It was sort of my fault. I had two encoding tags in my HTML source:

One reading:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />

And another, redundantly:

<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type"/>

Calibre was removing one of them but leaving the other in, while adding its own UTF-8 declaration.
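If a stale declaration can't be fixed at the source, a post-processing pass over the converter's output could drop any <meta> still claiming the old charset while leaving the XML prolog alone. A minimal sketch, assuming the files really are UTF-8 by that point; `strip_stale_charset` is a hypothetical helper, and a regex is only a stopgap compared with a proper parser:

```python
import re

def strip_stale_charset(markup, stale="iso-8859-1"):
    """Drop every <meta> that still declares `stale`, in either attribute order.

    Assumes the markup has already been re-encoded (e.g. to UTF-8) and that
    the XML prolog carries the correct encoding.
    """
    pattern = re.compile(
        r"<meta\b[^>]*charset=" + re.escape(stale) + r"[^>]*>\s*", re.IGNORECASE
    )
    return pattern.sub("", markup)

page = """<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type"/>
<title>Introduction to Mathematical Philosophy</title>
</head>"""

cleaned = strip_stale_charset(page)
print(cleaned)
```

Both redundant tags from the HTML source are removed, and only the UTF-8 declaration in the prolog remains.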

This redundancy was a side effect of my using a WYSIWYG HTML editor early in the process. I've since decided that WYSIWYG editors should never be trusted!

Unfortunately, it hasn't fixed my display problems with calibre's viewer. I think it'll fix things in Opera, though.

Thanks for your help.

Jellby
08-26-2009, 06:47 AM
> This redundancy was a side effect of my using a WYSIWYG HTML editor early in the process. I've since decided that WYSIWYG editors should never be trusted!

Better late than never ;)

> Unfortunately, it hasn't fixed my display problems with calibre's viewer. I think it'll fix things in Opera, though.

It looks fine now, using my ePUB-in-browser (http://www.mobileread.com/forums/showthread.php?t=51267) reader :D

amaryeh
12-16-2010, 03:42 AM
I don't know if this is at all relevant over a year later, but I downloaded your book to a Kindle 3 and it shows the alef; it looks bitmapped (pixelated rather than smooth) and stays the same size when I change the font size.

frabjous
12-16-2010, 11:26 AM
Yeah, I ended up using a small bitmap of an aleph in the .mobi version.

As you've noticed, this is far from ideal. I'd still love to hear if anyone knows of an alternative.

The ePub and PDF versions use real text alephs of course.