|05-23-2013, 08:24 AM||#1|
Join Date: May 2011
Location: Khon Kaen, Thailand
Device: HTC Flier
Junk characters when importing HTML to Calibre
Thu 23 May 2013, 6:05 pm
I scan a hard-copy book into MS Word, save it as "filtered HTML," and then import it into Calibre. When I convert the zip file to ePub, quotation marks and apostrophes (and possibly other characters) display in "wingdings" or some such strange font; specifically, each one shows up as a black diamond with a question mark.
Any suggestions for a fix?
Rex in Thailand
[Promotional links deleted - MODERATOR]
Last edited by Dr. Drib; Today at 09:03 AM.
|05-23-2013, 12:18 PM||#3|
Join Date: Sep 2009
Device: Sony PRS-350/650/T1, PB360, KoboGlo, KoboAuraHD
When saving from MS Word as "Web Page, Filtered", it should help to make sure the output HTML always uses UTF-8 encoding. In my rather ancient (version 10) copy of MS Word, that setting is found under Tools - Options - General - Web Options - Encoding - Save this document as - Unicode (UTF-8). I'm afraid I don't know where Word has moved this option to in more recent versions.
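A black diamond with a question mark is usually the Unicode replacement character (U+FFFD), which appears when bytes cannot be decoded as UTF-8. If re-saving from Word is not convenient, the same fix can be sketched in Python as below. This is only a sketch under assumptions: it assumes the filtered HTML was written in Windows-1252 (common for Word on Western systems, but check your own file), and the function name and file names are made up for illustration.

```python
import re
from pathlib import Path


def reencode_html_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite an HTML file as UTF-8 (hypothetical helper, not part of Calibre).

    source_encoding is an ASSUMPTION: Word's filtered HTML is often
    Windows-1252, but verify the charset declared in your own file.
    """
    raw = Path(src).read_bytes()

    # Decode with the encoding the file was really written in, so curly
    # quotes and apostrophes (bytes 0x91-0x94) become proper characters.
    text = raw.decode(source_encoding)

    # Point any charset declaration (e.g. in the <meta> tag) at utf-8,
    # so the declared encoding matches the bytes written below.
    text = re.sub(
        r'(charset\s*=\s*["\']?)[\w-]+',
        r"\g<1>utf-8",
        text,
        flags=re.IGNORECASE,
    )

    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Usage would be something like reencode_html_to_utf8("book.htm", "book_utf8.htm"), after which the new file is the one to import. Keep the original file in case the assumed source encoding turns out to be wrong.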
|05-23-2013, 07:45 PM||#4|
Join Date: Mar 2012
Location: NSW Australia
@rexall - if you still have the problem and you have Word 2007/10/13 then try saving the file as a DOCX and using the DOCX_Input plug-in.
I've been using it for a couple of weeks. I can produce much cleaner output using DOCX_Input to convert DOCX to EPUB than I was ever able to get by converting RTF or Filtered HTML to EPUB.
Consequently editing the EPUB in Sigil is now a viable proposition, primarily because I don't get any MS specific HTML in the EPUB.
Plus the DOCX files are up to 90% smaller than the RTFs, and the conversions are about 30-40% faster. For what more could one ask, I ask.
The documents I convert are very vanilla. If a PDF has a complex layout, images, tables, infographics etc., then I put up with the PDF, which is less of a hardship since Moz put a PDF reader in Firefox.
You might want to adjust the plugin source code by hand to set the default option values to suit your needs as I did - see post #38 in the plug-in thread.
Last edited by BetterRed; 05-23-2013 at 07:57 PM. Reason: Where to find Web Options in Word 2007