#16 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
I have read that, but didn't find any help. The input file, HTML, is in ANSI. I work on it with a rather old and simple text-editor which doesn't handle anything else. That is why I'm using the &#x....; forms and things like € and ƒ.
I'm going to post two HTML files which contain all the characters I am currently using, although I may need more in the future. I have been unable to find a combination of properties which will yield e-books (epub, mobi, or azw3) that match the text displayed in my (Firefox) browser. I have found that when I import them as HTML (not zipped) they mostly display correctly in the calibre viewer. I have used black rectangles for undefined characters and control characters. I'm not sure where to go from here. Perhaps just load the HTML onto the Kindle?
#17 |
Well trained by Cats
Posts: 31,122
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
@BR and @eschwartz
I think you are both correct. It depends on when the config folder (contents) was created. Since I copy my config to new machines, I keep my old settings, so I don't know what a newbie might have.
#18 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
I'm running 1.48 on a 32-bit XP Pro machine. AZW3 was not checked (nor was ZIP). I changed the input encoding in the HTML2ZIP plugin to ansi, with the result that performing an Add Book on an HTML file now does NOT zip it! On examining one of the non-zipped HTML files I find that no changes have been made to the input. On converting to AZW3 with utf-8 as the encoding I find that special characters are munged as before. By editing the AZW3 I can replace the changed characters and have the displayed text become correct. But after-conversion editing is not really a reasonable solution. Next?
#19 |
null operator (he/him)
Posts: 21,800
Karma: 30237628
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
My memory (admittedly, not as long as yours) is that AZW3 has always been checked. But that said, I doubt it has any bearing on the OP's problem.
Quote:
In the sticky "How to ask a question about conversion problems", there's a link to the calibre bug tracking system. If you report your problem there, you can attach your text file and mark the post private.
BR
#20 |
creator of calibre
Posts: 45,434
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
ansi is almost certainly the incorrect encoding. You need to figure out what the encoding used in the html file is. Probably latin1 or cp1252.
Last edited by kovidgoyal; 06-15-2015 at 10:40 PM. |
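To illustrate why the distinction between latin1 and cp1252 matters, here is a minimal Python sketch (not anything calibre itself runs). The two encodings agree on bytes 0xA0 to 0xFF and disagree on 0x80 to 0x9F, which is where cp1252 keeps the euro sign, the florin, and the curly quotes. The sample bytes are made up for the demonstration.

```python
# The same five bytes, decoded under the two candidate encodings.
raw = b"\x80 \x83 \xe9"

as_cp1252 = raw.decode("cp1252")  # U+20AC euro, U+0192 florin, U+00E9 e-acute
as_latin1 = raw.decode("latin1")  # U+0080 and U+0083 (C1 controls), U+00E9

# ascii() keeps the output readable on consoles that cannot print these characters.
print(ascii(as_cp1252))
print(ascii(as_latin1))
```

Only the bytes in the 0x80 to 0x9F range come out differently, so a file that never uses that range decodes identically under either choice.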
#21 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
BR -- makes sense that 1.48 defaults to checked, AZW3 was well established by then. I think I first set it somewhere in the 0.8.x-0.9.x series.
@C Alberga -- as Kovid said. Just because you can only use ANSI characters doesn't mean that is what the file encoding is.
#22 |
null operator (he/him)
Posts: 21,800
Karma: 30237628
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Quote:
BR |
#23 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
I've tried the top three values, with no luck. I'm going to create a set of files and try every combination there and see what happens. It will take a bit of time (busy day today) but I'll report back.
I'll examine the HTML files with a hex editor and see if I can pin-point the encoding. |
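The hex-editor inspection can also be approximated with a short script. The following is a rough Python sketch, not a calibre tool; `probe_encoding` is a hypothetical helper name, and it only reports evidence (which byte ranges occur, and whether the bytes form valid UTF-8) rather than deciding the encoding.

```python
def probe_encoding(data: bytes) -> dict:
    """Summarise the byte ranges that help tell ASCII, latin1, cp1252 and UTF-8 apart."""
    high = [b for b in data if b >= 0x80]   # anything outside plain ASCII
    c1 = [b for b in high if b <= 0x9F]     # range where latin1 and cp1252 differ
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "pure_ascii": not high,
        "valid_utf8": valid_utf8,
        "bytes_0x80_0x9f": len(c1),
        "bytes_0xa0_0xff": len(high) - len(c1),
    }
```

It would be called on the raw bytes of a file, for example `probe_encoding(open("book.html", "rb").read())`, where `book.html` is a placeholder name. A file that is pure ASCII (all special characters written as `&#x....;` references) decodes the same way under latin1, cp1252, and utf-8.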
#24 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
The HTML file in question is Latin1 plus Latin1 Supplement http://www.w3schools.com/charsets/re...supplement.asp, which is cp1252 less the characters in the 128 to 159 range. The displayed text of this HTML file, however, contains characters in that range, as well as two-byte utf-8 characters.
I had assumed that the "input character encoding" referred to the characters in the HTML file, not the characters displayed by the viewer or browser when interpreting that file. I thus tried both cp1252 and Latin1, neither of which produced correct output (in epub, mobi, or azw3 format). However, setting the input to utf-8 does work. It would seem that a definition of this (to me) idiosyncratic use of "input character" should be provided.
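The difference between the characters stored in the file and the characters displayed can be shown with a small Python sketch. It assumes a file whose special characters are all written as numeric character references, so its bytes are plain ASCII; the sample string is invented for the demonstration.

```python
import html

# Every byte here is ASCII; the special characters exist only as references.
source = b"Price: &#x20AC;5 and &#8220;quoted&#8221;"

# Decoding ASCII-only bytes gives the same text under all of these encodings.
decodings = {enc: source.decode(enc) for enc in ("ascii", "latin1", "cp1252", "utf-8")}
same_text = len(set(decodings.values())) == 1

# Resolving the references is a separate, later step that yields the displayed characters.
rendered = html.unescape(decodings["ascii"])

print(same_text, ascii(rendered))
```

For such a file the encoding setting only governs how bytes become text; the euro sign and curly quotes appear later, when the references are resolved.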
#25 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
Curiously even this doesn't always work. I went back to one of my small test cases (see attachments), and none of the characters from 160 up are correctly displayed in the azw3 file, but are in the HTML. Converted with utf-8 selected, and transliterate to ASCII not checked.
#26 |
creator of calibre
Posts: 45,434
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
For HTML you want to leave input character encoding blank and put the correct encoding in the settings for the HTML2ZIP plugin. Or leave both blank and declare the HTML encoding correctly in the HTML file itself using a <meta charset> tag.
If you wish to specify an encoding in both places, you specify the HTML file encoding in the HTML2ZIP plugin settings and utf-8 as the input character encoding. The HTML2ZIP plugin reads in the HTML file using the encoding specified in its settings. It always results in an HTML file inside the zip file in utf-8. So, one last time:
1) HTML2ZIP setting: cp1252
2) Input character encoding: blank
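The two-stage behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not calibre's code: `html2zip_stage` and `conversion_stage` are hypothetical stand-ins, and treating a blank input encoding as utf-8 is an assumption made for the illustration.

```python
def html2zip_stage(raw: bytes, source_encoding: str) -> bytes:
    """Stand-in for stage one: decode with the configured encoding, always emit UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def conversion_stage(utf8_bytes: bytes, input_encoding: str = "utf-8") -> str:
    """Stand-in for stage two: decode what stage one produced (blank modelled as utf-8)."""
    return utf8_bytes.decode(input_encoding)


original = "caf\u00e9 \u20ac5"      # e-acute and a euro sign
raw = original.encode("cp1252")    # the bytes as they sit in the source file

zipped = html2zip_stage(raw, "cp1252")
good = conversion_stage(zipped)            # stage two reads UTF-8 as UTF-8
bad = conversion_stage(zipped, "cp1252")   # stage two told cp1252: characters are munged

print(ascii(good))
print(ascii(bad))
```

The second case shows why setting the input character encoding to cp1252 or Latin1 gives wrong output once the first stage has already produced UTF-8.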
#27 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
|
Thank you, that does it. I know I have had a hard time getting this right, and I guess I don't mind the "one last time:", but out of curiosity, could you point me to the places in the documentation that would have made this clear to me? Perhaps when I have some other problem I can learn how to navigate the "help", documentation, and forums without so many false starts.
#28 |
creator of calibre
Posts: 45,434
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's in the FAQ: http://manual.calibre-ebook.com/faq....r-smart-quotes