Old 11-25-2010, 04:22 PM   #1
zephyrot
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2010
Location: Where no man has gone before
Device: BQ Cervantes Touch Light
Spanish characters disappear after converting to EPUB

Hello.

First of all, thanks to Kovid for this great piece of software. Good job!
Second, yes, I have read the FAQ and searched the forums for posts that could help with my problem, without any luck.

With your permission, let me start.

I have an iPod Touch 1G with the Stanza app, and I use Calibre 0.7.29 to convert
mostly RTF, PDF and DOC files to EPUB for comfortable reading while traveling.

When I convert the book files, no matter which "from" and "to" formats I choose, Spanish characters like "ñ" are incorrectly converted to "n", and stressed vowels (á, é, í, ó, ú) disappear along with the following character. For example, "río" becomes "r", and two separate words can even be joined together, as in "allí junto" becoming "alljunto".

I have done the following to try to solve this (note that I use Calibre in Spanish, so my translation of the option names may not be exact):
A) Starting from RTF files, I set the "Input character encoding" field under "Conversion common options" (not under Look & Feel, as the FAQ states) to each of the following values, one at a time, with EPUB as the output format:
1) windows-1252 (or windows1252)
2) CP-1252 (or cp1252)
3) ISO-8859-1 (or ISO88591)

With all of the above encodings, the HTML files converted from RTF (with Microsoft Word) display correctly in Google Chrome.
B) Using UTF-8 does not improve things; it makes them worse, as stressed vowels and "ñ" become strange characters like "ñ".

C) I have repeated steps A and B for PDF output. In this case, a strange question-mark icon appears where the stressed vowels should be, but "ñ" is rendered correctly.

D) I have repeated steps A and B starting from HTML files (which Calibre puts into a ZIP) with EPUB output. As a preliminary check, I unzipped the HTML file and opened it in Google Chrome, and it looks fine, with the right characters.
Maybe I have forgotten something. I am becoming very frustrated with this, as it's very uncomfortable to read words where most of the characters have disappeared (for example, "cíñete" becomes "cete").

Please forgive me if the solution is in a post just around the corner; I spent a whole week looking into this issue before giving up and posting.

Thanks for your help in advance.

Last edited by zephyrot; 11-25-2010 at 04:29 PM. Reason: translation improvement
Old 11-25-2010, 06:04 PM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Non-ASCII doesn't work well (at all?) with RTF. It sounds like you've tried a lot of different things, but I'm not sure whether you've followed these steps:
  1. Open in Word
  2. Save as 'Filtered HTML', use UTF-8 as your encoding.
  3. Add the HTML to Calibre and convert from that

PDF problems would be completely unrelated; it's probably something wrong with the PDF. Open a bug at bugs.calibre-ebook.com with one of the problem files.
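
For anyone who wants to script the "save as UTF-8" part of the steps above instead of doing it in Word, here is a minimal Python sketch. It assumes the HTML was saved as windows-1252 (cp1252); the function name and file paths are made up for illustration and are not part of Calibre or Word. The point it shows is that the bytes and the declared charset have to change together.

```python
import re
from pathlib import Path


def reencode_html(src, dst, source_encoding="cp1252"):
    """Decode src with source_encoding and write dst as UTF-8.

    source_encoding is an assumption: check the real file before trusting it.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Update any declared charset (e.g. charset=windows-1252) so the label
    # matches the bytes that are about to be written.
    text = re.sub(r'(charset\s*=\s*["\']?)[\w-]+', r'\g<1>utf-8',
                  text, flags=re.IGNORECASE)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Editing only the charset label in a text editor, without re-encoding the bytes, leaves the file mislabeled, so it is worth doing both in one pass.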
Old 11-26-2010, 11:25 AM   #3
zephyrot

Thanks for your suggestion, ldolse.

By the way, I forgot to mention that the RTF file used the "Verdana" font, and I ran a second batch of tests, like those in my first post, with "Times New Roman", which is supposed to contain the stressed vowels and "ñ".

I have converted the RTF file to filtered HTML and changed the HTML file's charset to "UTF-8" from the previous "windows-1252".

After that, I ran the conversion twice more, once leaving the Calibre input encoding field blank and once manually setting it to "UTF-8". In both cases, all I get is strange characters like "viéndolo" instead of "viéndolo" and "sueños" instead of "sueños".

If nobody gives me more suggestions to test (and I will be more than happy to test a bit more before throwing the computer through the window), I will open a bug report.

I do not want to open a new bug report yet just in case someone knows a silly quick solution posted in another place that makes me feel like an idiot for not searching harder.

Last edited by zephyrot; 11-26-2010 at 11:27 AM. Reason: corrected translation
Old 11-26-2010, 11:46 AM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by zephyrot View Post
If nobody gives me more suggestions to test (and I will be more than happy to test a bit more before throwing the computer through the window), I will open a bug report.
This looks to me like a simple character encoding problem. I would just trace it through step by step to see where the problem is.
1) What do you think the character encoding is for your source?
2) Check that you are correct by finding an accented character in your source with a hex editor, and comparing to the character encoding table.
3) Tell Calibre what the input character encoding is for your source.
4) Turn on debugging and find the same character you checked above. Is it correctly encoded at each step? Check the encoding table and use a hex editor.
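
As a rough aid for step 2, here is a Python sketch that prints the bytes to look for in a hex editor. The function names are made up for illustration. One thing to keep in mind for RTF sources: accented characters are normally written as \'hh hex escapes rather than raw bytes, so the second helper pulls those out and decodes them, assuming code page 1252.

```python
import re


def hex_forms(char, encodings=("cp1252", "iso-8859-1", "utf-8")):
    """Return the hex bytes of char in each candidate encoding."""
    return {enc: char.encode(enc).hex(" ") for enc in encodings}


# Matches RTF hex escapes: a backslash, an apostrophe, two hex digits.
RTF_HEX_ESCAPE = re.compile(rb"\\'([0-9a-fA-F]{2})")


def decode_rtf_escapes(raw, codepage="cp1252"):
    """List the characters written as hex escapes in raw RTF bytes.

    codepage is an assumption; the RTF header usually declares the real one.
    """
    return [
        bytes.fromhex(match.group(1).decode("ascii")).decode(codepage)
        for match in RTF_HEX_ESCAPE.finditer(raw)
    ]
```

For example, hex_forms("ñ") gives f1 for both single-byte encodings and c3 b1 for UTF-8, which is what the hex editor should show for that character in an HTML or text source.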
Old 11-27-2010, 02:14 AM   #5
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by Starson17 View Post
This looks to me like a simple character encoding problem. I would just trace it through step by step to see where the problem is.
1) What do you think the character encoding is for your source?
2) Check that you are correct by finding an accented character in your source with a hex editor, and comparing to the character encoding table.
3) Tell Calibre what the input character encoding is for your source.
4) Turn on debugging and find the same character you checked above. Is it correctly encoded at each step? Check the encoding table and use a hex editor.
@zephyrot, also note that there is a separate place to set the encoding for HTML files before adding them to Calibre. Go to Preferences - Plugins - File type plugins - HTML to ZIP - customize and change the encoding there before adding the HTML.
Old 11-27-2010, 10:56 AM   #6
jackie_w
Grand Sorcerer
Posts: 6,196
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
@zephyrot,

As a heavy user of MS Word with Calibre, I have to say that 'encoding' is one of the few things that has never given me a problem. So I wonder if we might go back to square one.

I have attached a tiny EPUB (created from MS Word saved as "Web Page, Filtered") with some Spanish characters in it.

The first image below is what it looks like in the Calibre viewer. The second image is a screencap from my Sony PRS505. As you can see the Spanish chars display correctly.

I'm afraid I don't have any experience with iPod or Stanza but perhaps you could do the following simple test as a first step:
  1. View the attached epub in your Calibre viewer and confirm that it looks OK on your PC.
  2. Transfer it to your reader in your normal way. Is it still OK?

If 1. is OK and 2. isn't then the problem is at the iPod/Stanza end not the Calibre conversion. If both are OK then we need to take a closer look at your source file and the conversion settings.
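
To check the Calibre side of this test without relying on any viewer, a short Python sketch can scan the HTML inside an EPUB for the usual signs of a bad conversion. This is only a heuristic under stated assumptions: the function name and marker list are made up for illustration, the content is assumed to be UTF-8, and "Ã" or "Â" are treated as suspicious because they are not used in Spanish (they are legitimate in some other languages).

```python
import zipfile

# Characters that typically appear when UTF-8 text was decoded as a
# single-byte encoding, plus the Unicode replacement character.
SUSPECT_MARKERS = ("Ã", "Â", "\ufffd")


def scan_epub(path):
    """Return {member name: [markers found]} for HTML files in an EPUB."""
    findings = {}
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            text = epub.read(name).decode("utf-8", errors="replace")
            hits = [marker for marker in SUSPECT_MARKERS if marker in text]
            if hits:
                findings[name] = hits
    return findings
```

An empty result for an EPUB that still looks wrong on the device would point toward the reader app rather than the conversion.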
Attached Thumbnails
calview.jpg (24.9 KB)
sony.jpg (15.8 KB)
Attached Files
File Type: epub Spanish.epub (88.5 KB)
Old 11-28-2010, 03:49 AM   #7
zephyrot
Hello:

To Starson17: Thanks for your suggestion. I will check the encoding with a hex editor and let you know. I enabled debugging with "calibre-debug -g -d C:/debug", but there is no output in C:/debug. Maybe I'm doing something wrong here.

To dwanthny: I had already changed the encoding in the HTML to ZIP plugin settings, without any luck. The result is still the same even after updating to 0.7.30 or 0.7.31 and repeating the tests.

To jackie_w: I have tested your EPUB and it reads perfectly. I attach here a (trimmed) file that I'm trying to convert. Please convert it and send it back to me. If you can convert it and I cannot, with the same Calibre version, there will be no doubt that the problem is in my settings.
By the way, my RTF files are exported with Office from the original .doc files.

Thanks a lot to all of you.
Attached Files
File Type: rtf El libro de las sombras contadas.rtf (28.1 KB, 426 views)
Old 11-28-2010, 03:37 PM   #8
zephyrot
I previously sent a message with answers to you and a sample RTF file, but it looks like the moderators have to approve it.

In the meantime, I have found a workaround for my problem.

I use Stanza Desktop to convert RTF or PDF to EPUB (you lose the embedded photos), and then use Calibre to add the cover and synopsis and regenerate the EPUB.

With this, the output is perfect and the ebooks can be downloaded to Stanza on the iPod without problems.

What's more, my wife has given me an iPad, and since Calibre can upload the books to iTunes, the iBooks bookshelf looks awesome (although the margins are much too big).

Thanks anyway. I will check your suggestions and let you know how it goes, so that I no longer need Stanza Desktop.

This is just a quick and dirty trick in case somebody has the same problem as me.

Just to note: the current Stanza Desktop has the nasty behavior of inserting the title and author metadata randomly at the beginning of paragraphs throughout the book (even in the middle of the text) when exporting to EPUB. Fortunately, using the "decompress" feature in Calibre and "batch replace in all opened documents" in Notepad++ solves that quickly.
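
The same batch replacement can be scripted. Below is a minimal Python sketch that removes a stray string from every HTML file in an already decompressed EPUB folder. The function name, the folder path and the stray text are all hypothetical, and it assumes the files are UTF-8.

```python
from pathlib import Path


def remove_stray_text(root, stray, suffixes=(".html", ".xhtml", ".htm")):
    """Delete every occurrence of stray from the HTML files under root.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8")
            if stray in text:
                path.write_text(text.replace(stray, ""), encoding="utf-8")
                changed.append(path.name)
    return changed
```

It is safer to run this on a copy of the extracted folder, since a stray string that also appears legitimately (for example on the title page) would be removed there too.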

Last edited by zephyrot; 11-29-2010 at 07:38 AM. Reason: additional comment
Tags
bug spanish characters

All times are GMT -4.

