11-25-2010, 04:22 PM | #1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Location: Where no man has gone before
Device: BQ Cervantes Touch Light
|
Spanish characters disappear after converting to EPUB
Hello.
First of all, thanks to Kovid for this great piece of software. Good job! Second, yes, I have read the FAQ and searched the forums for posts that could help with my problem, without any luck. With your permission, let me start.
I have an iPod Touch 1G with the Stanza app, and I use Calibre 0.7.29 to convert mostly RTF, PDF and DOC files to EPUB for comfortable reading while travelling. When I convert the book files, no matter which "from" and "to" formats I choose, Spanish characters like "ñ" are incorrectly converted to "n", and accented vowels (á, é, í, ó, ú) disappear along with the following character. For example, "río" becomes "r", and two separate words can even be joined together, as in "allí junto" becoming "alljunto".
I have done the following to try to solve this (note that I use Calibre in Spanish, so my translation of the option names may not be exact):
A) Starting from RTF files, I set the "Input codification" (input character encoding) field in "Conversion common options" (not in Look & Feel, as the FAQ states) to each of the following values, one at a time, with EPUB as the output format:
1) windows-1252 (or windows1252)
2) CP-1252 (or cp1252)
4) ISO-8859-1 (or ISO88591)
With all of the above encodings I can read the HTML files converted from RTF (with Microsoft Word) in Google Chrome.
B) Using UTF8 does not improve things; it makes them worse, as accented vowels and "ñ" become strange characters like "Ã±".
C) I repeated steps A and B for PDF output. In this case, a strange question-mark icon appears where accented vowels should be, but "ñ" is rendered correctly.
D) I repeated steps A and B, but starting from HTML files (which Calibre puts into a ZIP) with EPUB output. As a preliminary check, I unzipped the HTML file and opened it in Google Chrome, and it looks fine, with the right characters.
Maybe I have forgotten something. I am becoming very frustrated with this, as it is very uncomfortable to read words where most of the characters have disappeared (for example, "cíñete" becomes "cete").
Please forgive me if the solution is in a post just around the corner; I have been looking into this issue for a whole week before giving up and posting. Thanks in advance for your help. Last edited by zephyrot; 11-25-2010 at 04:29 PM. Reason: translation improvement |
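For readers following along, the symptoms in this post (accented letters turning into pairs of odd characters, or vanishing) are typical of bytes being decoded with a different encoding than the one they were written in. The sketch below only illustrates that general effect in Python; it is not what Calibre does internally, and the helper name `decode_as` is made up for the example.

```python
# Illustration only: what happens to "ñ" and accented vowels when bytes are
# decoded with the wrong character encoding. `decode_as` is a hypothetical
# helper, not part of Calibre.

def decode_as(text: str, stored_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(stored_as).decode(read_as, errors="replace")

word = "cíñete"

# Same encoding on both sides: the word survives unchanged.
same = decode_as(word, "utf-8", "utf-8")

# UTF-8 bytes misread as windows-1252: each accented letter becomes two characters.
mojibake = decode_as(word, "utf-8", "cp1252")

# windows-1252 bytes misread as UTF-8: the accented bytes are invalid UTF-8
# and are swapped for the replacement character U+FFFD.
broken = decode_as(word, "cp1252", "utf-8")

print(same)
print(mojibake)
print(broken)
```

A converter that silently drops invalid bytes instead of replacing them would produce missing letters rather than odd ones, which may be closer to what is described above, but that is a guess.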
11-25-2010, 06:04 PM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Non-ASCII doesn't work well (at all?) with RTF. It sounds like you've tried a lot of different things, but I'm not sure if you've followed these steps.
PDF problems would be completely unrelated; it's probably something wrong with the PDF. Open a bug at bugs.calibre-ebook.com with one of the problem files. |
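Some background on why RTF is awkward here: RTF files are plain ASCII, and non-ASCII characters are normally written as hex escapes such as `\'f1`, interpreted in the code page declared in the file header (for example `\ansicpg1252`). The sketch below decodes such escapes in Python, assuming a single-byte code page. The function name is hypothetical and this is not a full RTF parser.

```python
import re

def decode_rtf_hex_escapes(rtf_text: str, codepage: str = "cp1252") -> str:
    """Replace RTF \\'hh escapes with the characters they stand for.

    Assumes a single-byte code page such as cp1252; control words, groups
    and Unicode (\\uN) escapes are left untouched.
    """
    def replace(match: re.Match) -> str:
        return bytes.fromhex(match.group(1)).decode(codepage, errors="replace")

    return re.sub(r"\\'([0-9a-fA-F]{2})", replace, rtf_text)

sample = r"r\'edo y ni\'f1o"
print(decode_rtf_hex_escapes(sample))
```

If the code page passed in does not match the one the file was written with, the escapes decode to the wrong letters, which is one way accents can go missing during conversion.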
11-26-2010, 11:25 AM | #3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Location: Where no man has gone before
Device: BQ Cervantes Touch Light
|
Thanks for your suggestion, Idolse.
By the way, I forgot to mention that the RTF file used the "Verdana" font, and I ran a second batch of tests like those in my first post with "Times New Roman", which is supposed to contain accented vowels and "ñ".
I converted the RTF file to filtered HTML and changed the HTML file's charset to "UTF-8" from the previous "windows-1252". After that, I ran the conversion twice, once leaving Calibre's Input Codification field blank and once setting it manually to "UTF-8", but in both cases all I get is strange characters like "viÃ©ndolo" instead of "viéndolo" and "sueÃ±os" instead of "sueños".
If nobody gives me more suggestions to test (and I will be more than happy to test a bit more before throwing the computer out of the window), I will open a bug report. I do not want to open one yet, just in case someone knows a silly quick solution posted somewhere else that makes me feel like an idiot for not searching harder. Last edited by zephyrot; 11-26-2010 at 11:27 AM. Reason: corrected translation |
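One thing worth ruling out in a test like this is a mismatch between the charset declared in the HTML and the bytes actually stored in the file. A minimal Python sketch that changes both together follows. It assumes the source really is windows-1252 and that the charset appears in a normal meta declaration; `transcode_html` is a name invented for this example.

```python
import re

def transcode_html(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return UTF-8 bytes for an HTML document stored in source_encoding.

    The first charset declaration found is rewritten to utf-8 so that the
    declaration and the bytes agree.
    """
    text = raw.decode(source_encoding)
    text = re.sub(
        r"(charset\s*=\s*[\"']?)[\w-]+",
        r"\g<1>utf-8",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")

original = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1252"><p>sueños</p>'
).encode("cp1252")

converted = transcode_html(original)
print(converted.decode("utf-8"))
```

Editing only the declaration, or only the bytes, leaves the two out of step, and a reader that trusts the declaration will then show the wrong characters.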
11-26-2010, 11:46 AM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
1) What do you think the character encoding is for your source?
2) Check that you are correct by finding an accented character in your source with a hex editor, and comparing to the character encoding table.
3) Tell Calibre what the input character encoding is for your source.
4) Turn on debugging and find the same character you checked above. Is it correctly encoded at each step? Check the encoding table and use a hex editor. |
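The hex check in steps 2) and 4) can also be done with a few lines of Python instead of a hex editor and a printed table. This is only a sketch: the function names and the list of candidate encodings are assumptions for the example, and a match is a hint rather than proof.

```python
CANDIDATES = ("cp1252", "iso-8859-1", "utf-8")

def hex_bytes(char: str, encodings=CANDIDATES) -> dict:
    """Show the bytes a character is stored as under each candidate encoding."""
    return {enc: char.encode(enc).hex(" ") for enc in encodings}

def encodings_matching(raw: bytes, char: str, encodings=CANDIDATES) -> list:
    """List the candidate encodings whose bytes for `char` occur in `raw`."""
    return [enc for enc in encodings if char.encode(enc) in raw]

print(hex_bytes("ñ"))
# {'cp1252': 'f1', 'iso-8859-1': 'f1', 'utf-8': 'c3 b1'}

# Stand-in for the raw bytes of a source file, e.g. open(path, "rb").read()
raw = "sueños".encode("utf-8")
print(encodings_matching(raw, "ñ"))
# ['utf-8']
```

Note that windows-1252 and ISO-8859-1 store "ñ" and the accented vowels as the same single byte, so this check cannot tell those two apart.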
|
11-27-2010, 02:14 AM | #5 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
|
|
11-27-2010, 10:56 AM | #6 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
@zephyrot,
As a heavy user of making MSWord work with Calibre, I have to say that 'encoding' is one of the few things which has never given me a problem. So I wonder if we might go back to square one. I have attached a tiny epub (created from MSWord to Webpage-filtered) with some Spanish chars in it. The first image below is what it looks like in the Calibre viewer. The second image is a screencap from my Sony PRS505. As you can see the Spanish chars display correctly. I'm afraid I don't have any experience with iPod or Stanza but perhaps you could do the following simple test as a first step:
If 1. is OK and 2. isn't then the problem is at the iPod/Stanza end not the Calibre conversion. If both are OK then we need to take a closer look at your source file and the conversion settings. |
11-28-2010, 03:49 AM | #7 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Location: Where no man has gone before
Device: BQ Cervantes Touch Light
|
Hello:
To Starson: Thanks for your suggestion. I will check the encoding with a hex editor and let you know. I enabled debugging with "calibre-debug -g -d C:/debug", but there is no output in c:/debug. Maybe I'm doing something wrong here.
To dwanthny: I had already changed the encoding in the HTML to ZIP plugin settings, without any luck. It is still the same even after updating to 0.7.30 or 0.7.31 and repeating the tests.
To jackie_w: I have tested your epub and it reads perfectly. I attach here a (trimmed) file that I'm trying to convert. Please convert it and send it back to me. If you can convert it and I cannot, with the same Calibre version, there will be no doubt that the bug is in my settings.
By the way, my RTF files are exported with Office from the original .doc files. Thanks a lot to all of you. |
11-28-2010, 03:37 PM | #8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Location: Where no man has gone before
Device: BQ Cervantes Touch Light
|
I previously sent a message with answers to you and a sample RTF file, but it looks like the moderators have to approve it.
In the meantime, I have found a workaround for my problem. I use Stanza Desktop to convert RTF or PDF to EPUB (you lose the embedded photos), and then use Calibre to add the cover and synopsis and regenerate the EPUB. With this, the output is perfect and the ebooks can be downloaded to Stanza on the iPod without problems. Even better, my wife has given me an iPad, and since Calibre can upload the books to iTunes, the iBooks bookshelf looks awesome (although the margins are too big).
Thanks anyway. I will check your suggestions and let you know, so that I can stop needing Stanza Desktop. This is just a quick and dirty trick in case somebody has the same problem as me.
Just to note: the current Stanza Desktop has the nasty behavior of inserting the title and author metadata at random paragraph beginnings throughout the book (even in the middle of the text) when exporting to EPUB. Fortunately, using the "decompress" feature in Calibre and "batch replace in all opened documents" in Notepad++ solves that quickly. Last edited by zephyrot; 11-29-2010 at 07:38 AM. Reason: additional comment |
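For anyone without Notepad++, the batch replace step can be approximated with a short Python script run over the folder of the decompressed EPUB. This is a rough sketch: the function name, the file suffixes and the assumption that the files are UTF-8 are all guesses for illustration, not anything Calibre or Stanza documents.

```python
from pathlib import Path

def remove_stray_text(book_dir: str, stray: str,
                      suffixes=(".html", ".xhtml", ".htm")) -> int:
    """Delete every occurrence of `stray` from the text files under book_dir.

    Returns the number of files that were changed. Files are assumed to be
    UTF-8, which is an assumption of this sketch.
    """
    changed = 0
    for path in Path(book_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8")
            cleaned = text.replace(stray, "")
            if cleaned != text:
                path.write_text(cleaned, encoding="utf-8")
                changed += 1
    return changed
```

Called as `remove_stray_text("extracted_book", "Some Title Some Author")`, with a placeholder folder and string, it edits the files in place, so it is best run on a copy. A plain string replace also removes legitimate occurrences of the same text, such as a title page.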