Old 06-15-2015, 06:03 PM   #16
C Alberga
Enthusiast
Posts: 45
Karma: 10
Join Date: Jul 2012
Device: none
I have read that, but didn't find any help. The input file, HTML, is in ANSI. I work on it with a rather old and simple text-editor which doesn't handle anything else. That is why I'm using the &#x....; forms and things like € and ƒ.

I'm going to post two HTML files which contain all the characters which I am currently using, although I may need more in the future.

I have been unable to find a combination of properties which will yield e-books (epub, mobi, or azw3) that correspond to the text that is displayed in my (FireFox) browser. I have found that when I import them as HTML (not ZIPed) they mostly display correctly in the Calibre viewer. I have used black rectangles for undefined characters and control characters.

I'm not sure where to go from here. Perhaps just load the HTML onto the Kindle?
Old 06-15-2015, 06:42 PM   #17
theducks
Well trained by Cats
Posts: 31,122
Karma: 60406498
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
@ BR and @eschwartz

I think you are both correct.
Depends on when the config folder (contents) was created

Since I copy my config to new machines, I keep my old settings, so I don't know what a newbie might have.
Old 06-15-2015, 07:28 PM   #18
C Alberga
Enthusiast
I'm running 1.48 on a 32-bit XP Pro machine. AZW3 was not checked (nor was ZIP). I changed the input encoding in the HTML2ZIP plugin to ansi, with the result that performing an Add Book on an HTML file now does NOT zip it! On examining one of the non-ZIPed HTML files I find that no changes have been made to the input. On converting to AZW3 with utf-8 as the encoding I find that special characters are munged as before. By editing the AZW3 I can replace the changed characters and have the displayed text become correct. But after-conversion editing is not really a reasonable solution. Next?
Old 06-15-2015, 10:35 PM   #19
BetterRed
null operator (he/him)
Posts: 21,800
Karma: 30237628
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by theducks View Post
@ BR and @eschwartz

I think you are both correct.
Depends on when the config folder (contents) was created

Since I copy my config to new machines, I keep my old settings, so I don't know what a newbie might have.
@theducks - I was referring to the Default settings - this is what I have in a fresh, unadulterated, 32bit 1.48 install (and if I reset) and what I see in 2.30 if I do a reset - as you can see the defaults have not changed over that period at least.

[Attached images: Capture.JPG (124.2 KB), Capture2.JPG (144.0 KB)]

My memory (admittedly, not as long as yours) is that AZW3 has always been checked. But that said, I doubt it has any bearing on the OP's problem.

Quote:
Originally Posted by C Alberga View Post
But after-conversion editing is not really a reasonable solution. Next?
@C Alberga - Have to respectfully disagree, given the nature of the conversion process, many of us expect to have to do some editing after conversion, if we don't then that's a bonus. FWIW the only 'perfect' conversions I get are from DOCX input of simple text - e.g. novels, biographies etc.

In this sticky How to ask a question about conversion problems, there's a link to the calibre bugs tracking system - if you report your problem there, you can attach your text file and mark the post private.

BR
Old 06-15-2015, 10:36 PM   #20
kovidgoyal
creator of calibre
Posts: 45,434
Karma: 27757438
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
ansi is almost certainly the incorrect encoding. You need to figure out what the encoding used in the html file is. Probably latin1 or cp1252.
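
To illustrate why the exact encoding matters, here is a minimal Python sketch (not from the thread). It assumes the editor saved the euro and florin signs as the single bytes 0x80 and 0x83, which is how cp1252 stores them. The same bytes mean something different under latin1 and are not valid utf-8 at all.

```python
# Illustrative only: the two bytes a cp1252 editor would write
# for the euro sign and the florin sign.
raw = b"\x80\x83"

as_cp1252 = raw.decode("cp1252")    # printable characters
as_latin1 = raw.decode("latin-1")   # C1 control characters, not printable

print(repr(as_cp1252))
print(repr(as_latin1))

# The same bytes cannot be decoded as UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```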

Last edited by kovidgoyal; 06-15-2015 at 10:40 PM.
Old 06-16-2015, 12:39 AM   #21
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
BR -- makes sense that 1.48 defaults to checked, AZW3 was well established by then. I think I first set it somewhere in the 0.8.x-0.9.x series.

@C Alberga -- as Kovid said. Just because you can only use ANSI characters doesn't mean that is what the file encoding is.
Old 06-16-2015, 02:48 AM   #22
BetterRed
null operator (he/him)
Quote:
Originally Posted by eschwartz View Post
@BR -- makes sense that 1.48 defaults to checked, AZW3 was well established by then. I think I first set it somewhere in the 0.8.x-0.9.x series.
@eschwartz - mea culpa, I resurrected my old Mint notebook, it has 0.8.51 installed, AZW3 NOT checked.

Quote:
Originally Posted by eschwartz View Post
@C Alberga -- as Kovid said. Just because you can only use ANSI characters doesn't mean that is what the file encoding is.
@C Alberga - have you tweaked this setting?

[Attached image: Screenshot - 2015-06-16 , 16_26_47.jpg (108.0 KB)]

BR
Old 06-16-2015, 10:20 AM   #23
C Alberga
Enthusiast
I've tried the top three values, with no luck. I'm going to create a set of files and try every combination there and see what happens. It will take a bit of time (busy day today) but I'll report back.

I'll examine the HTML files with a hex editor and see if I can pin-point the encoding.
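
As an alternative to eyeballing a hex dump, the same check can be sketched in a few lines of Python. This is only an illustration: the function names `survey_bytes` and `is_valid_utf8` and the sample text are made up for the example. Bytes in the 0x80-0x9F range suggest cp1252 rather than latin1, and a file that fails to decode as utf-8 is certainly not utf-8.

```python
from collections import Counter


def survey_bytes(data: bytes) -> Counter:
    """Count bytes by range to hint at the encoding of an HTML file."""
    counts = Counter()
    for b in data:
        if b < 0x80:
            counts["ascii"] += 1
        elif b < 0xA0:
            # Printable in cp1252, control characters in latin1.
            counts["0x80-0x9F"] += 1
        else:
            counts["0xA0-0xFF"] += 1
    return counts


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample standing in for the real file contents.
sample = "<p>caf\u00e9 \u20ac5</p>".encode("cp1252")
print(survey_bytes(sample))
print(is_valid_utf8(sample))
```

For a real file, the bytes could be read with `pathlib.Path(...).read_bytes()` and passed to the same two functions.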
Old 06-17-2015, 12:38 PM   #24
C Alberga
Enthusiast
The HTML file in question is Latin1 plus Latin1 Supplement http://www.w3schools.com/charsets/re...supplement.asp, which is cp1252 less the characters in the 128 to 159 range. The displayed text of this HTML file, however, contains characters in that range, as well as two-byte utf-8 characters.

I had assumed that the "input character encoding" referred to the characters in the HTML file, not the characters displayed by the viewer or browser when interpreting that file. I thus tried both cp1252 and Latin1, neither of which produced correct output (in epub, mobi, or azw3 format). However, setting the input to utf-8 does work. It would seem that a definition of this (to me) idiosyncratic use of "input character" should be provided.
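
The distinction between the bytes in the file and the characters a browser displays can be sketched in Python. The snippet below is a hypothetical example, not the actual file: it assumes a source containing only ASCII bytes, with special characters written as character references. Such bytes decode to the same text under any of the encodings discussed here, and the references only become characters when the HTML is interpreted.

```python
import html

# Hypothetical source: ASCII bytes only, special characters as references.
source_bytes = b"<p>Price: &#x20AC;5, &fnof;(x), &euro;</p>"

# Decoding the raw bytes gives identical text under all four encodings.
decoded = {
    enc: source_bytes.decode(enc)
    for enc in ("ascii", "latin-1", "cp1252", "utf-8")
}
same_text = len(set(decoded.values())) == 1

# Resolving the references is a separate step, independent of the encoding.
rendered = html.unescape(decoded["utf-8"])

print(same_text)
print(rendered)
```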
Old 06-17-2015, 12:59 PM   #25
C Alberga
Enthusiast
Curiously, even this doesn't always work. I went back to one of my small test cases (see attachments), and none of the characters from 160 up are correctly displayed in the azw3 file, though they are in the HTML. I converted with utf-8 selected and "transliterate to ASCII" not checked.
Attached Files
File Type: azw3 All 256 characters _ a few extr - Cyril N. Alberga.azw3 (11.2 KB, 68 views)
Old 06-17-2015, 07:19 PM   #26
kovidgoyal
creator of calibre
For HTML you want to leave input character encoding blank and put the correct encoding in the settings for the HTML2ZIP plugin. Or leave both blank and declare the HTML encoding correctly in the HTML file itself using a <meta charset> tag.

If you wish to specify an encoding in both places, you specify the HTML file encoding in the HTML2ZIP plugin settings and utf-8 as the input character encoding.

The HTML2ZIP plugin reads in the HTML file using the encoding specified in its settings. It always results in an HTML file inside the zip file in utf-8.

So, one last time:

1) HTML2ZIP setting: cp1252
2) Input character encoding: blank
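
The two-stage behaviour described above can be simulated in plain Python. This is a sketch of the described behaviour under stated assumptions, not calibre's actual code, and the sample text is made up. It shows why cp1252 belongs in the HTML2ZIP setting only: once the file inside the zip is utf-8, decoding it as cp1252 a second time garbles it.

```python
# Simulation of the behaviour described above; this is not calibre's code.
original_text = "na\u00efve \u201cquotes\u201d \u20ac"
file_bytes = original_text.encode("cp1252")   # what the old editor saved

# Stage 1: read with the encoding from the HTML2ZIP settings, write UTF-8.
zipped_bytes = file_bytes.decode("cp1252").encode("utf-8")

# Stage 2a: the converter treats the zipped HTML as UTF-8, so text survives.
good = zipped_bytes.decode("utf-8")

# Stage 2b: input character encoding wrongly set to cp1252 gives mojibake.
# errors="replace" is needed because some UTF-8 bytes are undefined in cp1252.
bad = zipped_bytes.decode("cp1252", errors="replace")

print(good == original_text)
print(bad == original_text)
```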
Old 06-18-2015, 02:57 PM   #27
C Alberga
Enthusiast
Thank you, that does it. I know I have had a hard time getting this right, and I guess I don't mind the "one last time:", but out of curiosity, could you point me to the places in the documentation that would have made this clear to me? Perhaps when I have some other problem I can learn how to navigate the "help", documentation, and forums without so many false starts.
Old 06-18-2015, 09:54 PM   #28
kovidgoyal
creator of calibre
It's in the FAQ: http://manual.calibre-ebook.com/faq....r-smart-quotes