#331
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
I just tested it and it makes no difference at all. Junk in, junk out.
I created a new HTML file with KompoZer, which generates the header with the UTF-8 declaration already in place. I pasted in the text from the original file (copied with Notepad++), and it stays the same. I did the same with Word and with Trellian's 'WebPage'.
#332
Wizard
Posts: 1,154
Karma: 3252017
Join Date: Jan 2008
Location: Germany
Device: Pocketbook Touch Lux (623)
:-( I'm quickly running out of ideas.
Could you post the mobi file? I'll be at home in less than two hours, and I should be able to have a closer look at what happens then.
#333
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Quote:
Quote:
#334
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
______ Dennis
#335
creator of calibre
Posts: 45,376
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
MOBI files specify their encoding in the header. I'm not sure whether mobi2html uses that information. Try mobi2oeb.
#336
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
I guess the 'problem' lies in the MOBI file. I tried mobi2oeb and got the following output:
Quote:
#337
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
I am reluctant to add support for 1252, so I will probably just assume that the input HTML file is UTF-8.
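If the input is simply assumed to be UTF-8, a cheap safeguard is to validate the bytes before converting, so that a windows-1252 file fails loudly instead of producing garbled output. A minimal Python sketch (the function name is hypothetical, not part of MobiPerl):

```python
def first_invalid_utf8_offset(data: bytes):
    """Return None if data is valid UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
    except UnicodeDecodeError as err:
        return err.start
    return None
```

For example, `b"caf\xe9"` (é written as windows-1252) is rejected at offset 3, while `"café".encode("utf-8")` passes.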
#338
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
#339
creator of calibre
Posts: 45,376
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The codepage is encoded in bytes 24-28 of the header. It is 1252 for windows-1252 and 65001 for UTF-8.
See https://libprs500.kovidgoyal.net/bro.../reader.py#L97
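A minimal Python sketch of reading that field. It assumes the usual PDB/MOBI layout: a big-endian record count at offset 76, an 8-byte record list starting at offset 78, and a record 0 that holds a 16-byte PalmDOC header followed by the MOBI header, whose 4-byte text-encoding field this sketch reads at offset 28 of record 0. Treat that offset as an assumption and check it against the linked reader.py. The function names are hypothetical.

```python
import struct

# Text-encoding values mentioned above, mapped to Python codec names.
CODEPAGE_CODECS = {1252: "cp1252", 65001: "utf-8"}

def read_codepage(data: bytes) -> int:
    """Return the text-encoding value from the MOBI header in record 0."""
    (num_records,) = struct.unpack_from(">H", data, 76)  # PDB record count
    if num_records == 0:
        raise ValueError("PDB file has no records")
    # First 4 bytes of the first record-list entry: file offset of record 0.
    (rec0,) = struct.unpack_from(">I", data, 78)
    # The MOBI header is assumed to follow a 16-byte PalmDOC header.
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("record 0 does not contain a MOBI header")
    (codepage,) = struct.unpack_from(">I", data, rec0 + 28)
    return codepage

def codec_for(codepage: int) -> str:
    """Map a codepage to a codec name; raise for values not listed above."""
    try:
        return CODEPAGE_CODECS[codepage]
    except KeyError:
        raise ValueError(f"unsupported codepage {codepage}") from None
```

Usage: `with open("book.mobi", "rb") as f: print(codec_for(read_codepage(f.read())))`.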
#340
Wizard
Posts: 1,154
Karma: 3252017
Join Date: Jan 2008
Location: Germany
Device: Pocketbook Touch Lux (623)
#341
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
I have now fixed mobi2html so that in the next release it will include the correct meta header depending on the codepage. But I wonder whether the codepage can be trusted. I checked some books in the download section here and they had 1252 as the codepage. Is this correct? Does every reader have to have two encodings of each font, or is the text translated internally?
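A minimal Python sketch of picking the meta header from the codepage (the helper names and the exact tag text are assumptions, not what mobi2html emits). The last line bears on the font question: the codepage only describes how the text bytes are stored, and after decoding, windows-1252 byte 0x92 and UTF-8 bytes 0xE2 0x80 0x99 are the same Unicode character, U+2019.

```python
CHARSETS = {1252: "windows-1252", 65001: "UTF-8"}

def charset_meta(codepage: int) -> str:
    """Build an HTML 4 style charset declaration for a MOBI codepage."""
    charset = CHARSETS.get(codepage)
    if charset is None:
        raise ValueError(f"unsupported codepage {codepage}")
    return f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'

def add_charset_meta(html: str, codepage: int) -> str:
    """Insert the declaration right after a bare <head> tag, else prepend it."""
    meta = charset_meta(codepage)
    idx = html.lower().find("<head>")
    if idx == -1:
        return meta + html
    end = idx + len("<head>")
    return html[:end] + meta + html[end:]

# Both byte sequences decode to the same character (U+2019).
print(b"\x92".decode("cp1252") == b"\xe2\x80\x99".decode("utf-8"))  # True
```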
#342
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Perhaps you could make it an option: if no parameter is given, use the codepage stored in the file; if somebody is not happy with that, give them the choice to force a codepage.
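That suggestion could look roughly like this in Python (the `--encoding` flag and the function names are hypothetical; MobiPerl is written in Perl and its real options may differ). A forced encoding always wins; otherwise the header's codepage is used, and an unknown codepage is reported instead of guessed.

```python
import argparse
import codecs
from typing import Optional

# Codepages from the MOBI header and the Python codecs used for them.
CODEPAGE_CODECS = {1252: "cp1252", 65001: "utf-8"}

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Convert a MOBI book to HTML")
    parser.add_argument("book", help="path to the .mobi file")
    # Hypothetical flag: when omitted, the codepage stored in the file is used.
    parser.add_argument("--encoding", default=None,
                        help="force this encoding instead of the file's codepage")
    return parser.parse_args(argv)

def choose_codec(header_codepage: int, forced: Optional[str] = None) -> str:
    """A forced encoding wins; otherwise trust the header's codepage."""
    if forced is not None:
        # Normalises the name; raises LookupError for unknown encodings.
        return codecs.lookup(forced).name
    if header_codepage not in CODEPAGE_CODECS:
        raise ValueError(f"unknown codepage {header_codepage}; use --encoding")
    return CODEPAGE_CODECS[header_codepage]
```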
#343
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
There is a bug in MobiPerl related to your problem. The links will not work because the UTF-8 characters are not handled correctly, and they are translated to the wrong HTML entities. If you use --rawhtml to get what is in the MobiPocket file and add the meta tag for UTF-8 to it, it will probably work better in a browser. The non-breaking space did work, but I got some characters that did not. I have to read up on how to handle UTF-8 in Perl, so I cannot do a quick fix...
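The entity problem can be illustrated in Python (MobiPerl itself is Perl, and the function names here are hypothetical). Escaping byte by byte turns the three UTF-8 bytes of ’ into three bogus entities, whereas decoding first yields one numeric character reference per character:

```python
def entities_after_decoding(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode first, then write each non-ASCII character as &#NNNN;.

    Markup characters (&, <, >) are left alone, since raw is assumed to
    be HTML already.
    """
    text = raw.decode(encoding)
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def entities_per_byte(raw: bytes) -> str:
    """The buggy variant: every byte >= 0x80 becomes its own entity."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in raw)

sample = "it\u2019s\u00a0fine".encode("utf-8")
print(entities_after_decoding(sample))  # it&#8217;s&#160;fine
print(entities_per_byte(sample))        # it&#226;&#128;&#153;s&#194;&#160;fine
```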
#344
Fanatic
Posts: 527
Karma: 470
Join Date: Sep 2007
Location: The Netherlands
Device: Kindle Oasis
Quote:
Quote:
OK, my mistake: I made a typo when inserting the string. It works.
Last edited by Ortep; 02-26-2008 at 04:24 PM. Reason: Typo
#345
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Does anybody know where I can find correctly encoded MobiPocket files that use UTF-8, have a table of contents, and use UTF-8 character sequences like "0xe2 0x80 0x99" (’) or "0xc2 0xa0" (nbsp)? I wonder whether mobigen will give me such a file. I will test...
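To check whether a candidate file contains that kind of text, a small Python sketch (function names hypothetical) can name the characters those sequences encode and test book text for valid multi-byte UTF-8. It assumes the text has already been decompressed, since MOBI text records are usually PalmDOC-compressed. Decoding the same bytes as windows-1252 shows the mojibake (â€™) you get when the codepage is wrong.

```python
import unicodedata

# The two sequences asked about above.
SEQUENCES = [b"\xe2\x80\x99", b"\xc2\xa0"]

def describe(seq: bytes) -> str:
    """Name the character a UTF-8 byte sequence encodes."""
    ch = seq.decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def uses_multibyte_utf8(text: bytes) -> bool:
    """True if text is valid UTF-8 and contains at least one non-ASCII character."""
    try:
        decoded = text.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(ord(c) > 0x7F for c in decoded)

for seq in SEQUENCES:
    # Hex bytes, the character they encode, and the cp1252 misreading.
    print(seq.hex(" "), describe(seq), repr(seq.decode("cp1252")))
```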
Tags
mobi2mobi, mobils |
Thread | Thread Starter | Forum | Replies | Last Post
Mobi2Mobi Mobi2Mobi v0.13 - GUI for Mobiperl tools | Jad | Kindle Formats | 476 | 03-15-2015 05:51 PM
Tools for Editing Kindle .mobi Files? | GJN | Kindle Formats | 33 | 12-26-2013 02:05 PM
Handy Perl Script to convert HTML0 files to smartquotes | maggotb0y | Sony Reader | 0 | 04-12-2007 11:49 AM
PRS-500 Perl tools to generate Reader content | TadW | Sony Reader Dev Corner | 0 | 01-08-2007 05:55 AM
gmail copy (gmcp) - Perl script to copy files to/from Gmail | Colin Dunstan | Lounge | 0 | 09-04-2004 01:24 PM