Old 02-26-2008, 09:07 AM   #331
Ortep
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
I just tested it and it makes no difference at all. Junk in, junk out.

I created a new HTML file with KompoZer. It generates the header with the UTF-8 charset already in place. I inserted the text from the original file (cut and paste via Notepad++) and it stays the same. I did the same with Word and with 'WebPage' from Trellian.
Old 02-26-2008, 10:39 AM   #332
Gudy
Wizard
Posts: 1,154
Karma: 3252017
Join Date: Jan 2008
Location: Germany
Device: Pocketbook Touch Lux (623)
:-( I'm quickly running out of ideas.

Could you post the mobi file? I'll be at home in less than two hours, and I should be able to have a closer look at what happens then.
Old 02-26-2008, 10:48 AM   #333
Ortep
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Quote:
Originally Posted by Gudy View Post
:-( I'm quickly running out of ideas.

Could you post the mobi file? I'll be at home in less than two hours, and I should be able to have a closer look at what happens then.
I'm not sure if I can post it. It is one I bought from Steve Jordan. It contains the following text:

Quote:
Lambs Hide, Tigers Seek e-Book edition is copyright © Steve Jordan. All rights reserved. This e-Book edition is intended for private use only. Lambs Hide, Tigers Seek e-Book edition does not apply Digital Rights Management (DRM). It is the desire of the author to promote the use of e-Books, and the reading of his own e-Books, with a minimum of DRM issues for the reader to deal with. He is therefore assuming that the majority of readers are relatively honest and benevolent, and would rather read a good book than take advantage of someone. Please do not reproduce this book for the purposes of mass distribution without the express permission of Steven Jordan. After all, he’s just a guy trying to make a few bucks. What, you don’t think people can afford a couple lousy bucks for a full-length novel? What are you, an anarchist or something?
I have no problems reading the mobi file on my Cybook. I was only using it as a test case for mobi2html.
Old 02-26-2008, 11:28 AM   #334
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by Gudy View Post
Directly behind the <head> is usually best, just look at the source html for this site for an example (Ctrl-U in Mozilla, I don't know what Firefox uses).
Firefox uses Ctrl-U as well. No surprise - Firefox and Mozilla are based on the same underlying code.
______
Dennis
Old 02-26-2008, 12:12 PM   #335
kovidgoyal
creator of calibre
Posts: 45,376
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
MOBI files specify their encoding in the header. Not sure if mobi2html uses that information. Try mobi2oeb
Old 02-26-2008, 01:15 PM   #336
Ortep
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
I guess the 'problem' lies in the mobi file. I tried mobi2oeb and got the following output:

Quote:
Traceback (most recent call last):
File "reader.py", line 330, in <module>
File "reader.py", line 319, in main
File "reader.py", line 168, in extract_content
File "reader.py", line 217, in extract_text
File "libprs500\ebooks\mobi\palmdoc.pyo", line 45, in decompress_doc
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9d in position 0: unexpected code byte
So it is also complaining about UTF-8.
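For anyone following along, the error can be reproduced in isolation. This is only an illustration of what the message means, not a diagnosis of this particular file: 0x9d is a UTF-8 continuation byte, so it cannot be the first byte of a character, although it is perfectly valid as the last byte of a longer sequence. The byte values below are chosen for the demonstration.

```python
# Illustration only: a lone 0x9d cannot start a UTF-8 character.
lone = b"\x9d"
try:
    lone.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as err:
    decoded_ok = False
    print("decode failed:", err)

# The same byte is valid as the tail of a three-byte sequence:
# 0xe2 0x80 0x9d is U+201D, the right double quotation mark.
quote = b"\xe2\x80\x9d".decode("utf-8")
print(repr(quote))
```

If the text is decoded in chunks, a multi-byte character cut at a chunk boundary would produce this kind of error, but whether that is what happens here is a guess.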
Old 02-26-2008, 02:19 PM   #337
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by kovidgoyal View Post
MOBI files specify their encoding in the header. Not sure if mobi2html uses that information. Try mobi2oeb
Currently I do not do anything with the data with respect to character encoding. I just save it. I should probably add the meta header (it is now on my to-do list).

I am reluctant to have support for 1252 so I will probably just assume that the input html file is UTF-8.
Old 02-26-2008, 02:35 PM   #338
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by Gudy View Post
That pretty much is the correct punctuation, albeit in UTF-8 encoding. Is the content encoding set correctly in the html file? There should be a line like the following somewhere near the beginning of the file:

Code:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If there isn't, it should be considered a bug in mobi2html. If there is, either get a better browser or a tool to convert UTF-8 into something more palatable, like e.g. numeric html entities.
What is the correct header if the coding is 1252? Is 1252 actually used in Mobipocket books, and does it work on the Cybook?
Old 02-26-2008, 02:39 PM   #339
kovidgoyal
creator of calibre
Posts: 45,376
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The codepage is encoded in bytes 24-28 of the header. It is 1252 for windows-1252 and 65001 for UTF-8

See https://libprs500.kovidgoyal.net/bro.../reader.py#L97
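A minimal sketch of reading that field in Python, assuming the value is stored as a 4-byte big-endian unsigned integer and taking the offset of 24 straight from the post above. The function names and the fake header are made up for illustration; the linked reader.py is the authority on where "the header" actually starts.

```python
import struct
from typing import Optional

# Codepage values quoted in the post; anything else is treated as unknown.
CODEPAGES = {1252: "windows-1252", 65001: "UTF-8"}


def read_codepage(header: bytes, offset: int = 24) -> int:
    """Read a 4-byte big-endian unsigned int at `offset`.

    The default offset follows the post and is an assumption, as is
    the exact start of `header` within the file.
    """
    (codepage,) = struct.unpack_from(">L", header, offset)
    return codepage


def charset_name(codepage: int) -> Optional[str]:
    """Map a codepage number to a charset name, or None if unknown."""
    return CODEPAGES.get(codepage)


# Fake header for demonstration: 24 zero bytes, then the codepage field.
fake_header = bytes(24) + struct.pack(">L", 65001)
print(charset_name(read_codepage(fake_header)))
```

Returning None for unknown values leaves it to the caller to decide whether to fall back to a default or refuse to guess.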
Old 02-26-2008, 02:42 PM   #340
Gudy
Wizard
Posts: 1,154
Karma: 3252017
Join Date: Jan 2008
Location: Germany
Device: Pocketbook Touch Lux (623)
Quote:
Originally Posted by tompe View Post
What is the correct header if the coding is 1252?
Code:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
If Google is to be believed. Looks about right to me.
Old 02-26-2008, 02:51 PM   #341
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by Gudy View Post
Code:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
If Google is to be believed. Looks about right to me.
I have now fixed mobi2html so in the next release it will include the correct meta header depending on the codepage. But I wonder if the codepage can be trusted. I checked some books in the download section here and they had 1252 as codepage. Is this correct? Does every reader have to have two encodings of each font? Or is it translated internally?
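For illustration, here is roughly what inserting such a meta header could look like. This is a Python sketch, not the mobi2html code (which is Perl); the function name is hypothetical, and it simply places the tag directly after the opening <head> as suggested earlier in the thread.

```python
import re


def add_charset_meta(html: str, charset: str) -> str:
    """Insert a Content-Type meta tag right after the first <head> tag.

    Hypothetical helper for illustration; `charset` is e.g. "UTF-8"
    or "windows-1252".
    """
    meta = (
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=%s" />' % charset
    )
    # Match <head> or <head ...>, but not <header>.
    pattern = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
    match = pattern.search(html)
    if match is None:
        # No <head> found: prepend a minimal one.
        return "<head>\n" + meta + "\n</head>\n" + html
    end = match.end()
    return html[:end] + "\n" + meta + html[end:]


sample = "<html><head><title>t</title></head><body>x</body></html>"
result = add_charset_meta(sample, "windows-1252")
print(result)
```

A real tool would also want to avoid adding a second tag when the file already declares a charset; that check is left out here.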
Old 02-26-2008, 03:00 PM   #342
Ortep
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Perhaps you can make it an option. If no parameter is given, use the internal codepage. If somebody is not happy with that, give them the choice to force a codepage
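Something along these lines, sketched in Python with argparse. The `--codepage` option name is hypothetical and is not an existing mobi2html flag; the point is only the fallback order: use the forced value if one is given, otherwise the codepage found in the file.

```python
import argparse


def choose_codepage(header_codepage, argv):
    """Return the forced codepage if given, else the one from the file.

    `--codepage` is a made-up option name for this sketch.
    """
    parser = argparse.ArgumentParser(description="hypothetical converter options")
    parser.add_argument(
        "--codepage",
        type=int,
        default=None,
        help="force a codepage instead of trusting the file header",
    )
    args = parser.parse_args(argv)
    if args.codepage is not None:
        return args.codepage
    return header_codepage


# No option given: the codepage from the file wins.
default_choice = choose_codepage(1252, [])
# Option given: the user's value overrides the file.
forced_choice = choose_codepage(1252, ["--codepage", "65001"])
print(default_choice, forced_choice)
```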
Old 02-26-2008, 03:38 PM   #343
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by Ortep View Post
Perhaps you can make it an option. If no parameter is given, use the internal codepage. If somebody is not happy with that, give them the choice to force a codepage
Yes, something like that.

There is a bug in MobiPerl regarding your problem. The links will not work since the UTF-8 characters are not handled correctly, and they are translated to the wrong HTML entities. If you use --rawhtml to get what is in the MobiPocket file and add the meta tag for UTF-8 to this then it will probably work better in a browser. Non-breaking space did work but I got some characters that did not work.

I have to read up on how to handle UTF-8 in Perl so I cannot do a fast fix...
Old 02-26-2008, 04:14 PM   #344
Ortep
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Quote:
Originally Posted by tompe View Post
If you use --rawhtml to get what is in the MobiPocket file and add the meta tag for UTF-8 to this then it will probably work better in a browser.
I tried it and it works great in a browser. When I tried to re-create the mobifile from the output I got the following error message from MobiCreator:

Quote:
Error(prcgen): Source file is not valid UTF8.
in file: D:\Users\Ortep\Documents\My Publications\test\test.html
No output was generated.
Tomorrow I will look at the file and see if I can solve it.

OK, my mistake: I made a typo when inserting the string. It works.

Last edited by Ortep; 02-26-2008 at 04:24 PM. Reason: Typo
Old 02-26-2008, 05:51 PM   #345
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by kovidgoyal View Post
MOBI files specify their encoding in the header. Not sure if mobi2html uses that information. Try mobi2oeb
How have you handled UTF-8 characters? Do you know if the filepos is an offset into the byte stream, or into the character stream, where a 2- or 3-byte sequence is counted as one character? I have an example file, but either method gives a strange position for where filepos is pointing...

Does anybody know where I can find correctly coded MobiPocket files which use UTF-8, have a table of contents, and use UTF-8 character sequences like "0xe2 0x80 0x99" (’) or "0xc2 0xa0" (nbsp)?

I wonder if mobigen will give me such a file. I will test...
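To make the two interpretations concrete, here is a small Python illustration with a made-up string containing exactly those two sequences. It does not say which one filepos uses; it only shows how far apart the byte offset and the character offset end up.

```python
# Made-up sample: a right single quote (U+2019) and a non-breaking
# space (U+00A0) before a marker that a link might point at.
text = "It\u2019s\u00a0here<a>"
data = text.encode("utf-8")

# U+2019 encodes to 3 bytes and U+00A0 to 2 bytes.
assert "\u2019".encode("utf-8") == b"\xe2\x80\x99"
assert "\u00a0".encode("utf-8") == b"\xc2\xa0"

char_pos = text.index("<a>")   # offset counted in characters: 9
byte_pos = data.index(b"<a>")  # offset counted in bytes: 12

print("character offset:", char_pos)
print("byte offset:", byte_pos)

# Using the character offset on the byte stream lands too early.
print(data[char_pos:char_pos + 3])
```

Every multi-byte character before the target widens the gap, so a position that looks slightly too early or too late is a hint that the wrong kind of offset is being applied.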
All times are GMT -4.