10-14-2009, 01:15 AM   #1
p3aul
Captain Courageous
Posts: 239
Karma: 102
Join Date: Apr 2009
Device: calibre, PRS 505
weird ascii character

Upon converting a txt file to an epub, first from the command line and then from the GUI, I see a weird character: a black diamond with a white question mark in the middle. It seems to be replacing mostly the apostrophes in the sentences. This is supposed to be an ASCII txt file, so I don't understand why it can't display the apostrophe. Before conversion, the file looks fine.
Thanks,
Paul
10-14-2009, 05:54 AM   #2
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Probably not ascii

Sometimes texts that are advertised as ASCII are not really ASCII. If you look at the text, the apostrophes, quotes, etc. may have an & in front of them followed by a series of numbers, such as &#8216;.

As I understand it, calibre tries to guess what encoding the text uses, and when it guesses wrong you get these odd symbols.
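To see why a wrong guess produces the diamond with the question mark, here is a small Python sketch. It only illustrates the general mechanism and is not calibre's actual detection code; the sample word is made up.

```python
# The curly apostrophe (U+2019) is stored as the single byte 0x92 in cp1252.
raw = "don\u2019t".encode("cp1252")

# Wrong guess: 0x92 on its own is not valid UTF-8, so the decoder
# substitutes U+FFFD, the replacement character (the "diamond").
wrong_guess = raw.decode("utf-8", errors="replace")

# Right guess: decoding as cp1252 recovers the apostrophe.
right_guess = raw.decode("cp1252")

print(repr(wrong_guess))
print(repr(right_guess))
```

The bytes in the file are the same in both cases; only the interpretation changes, which is why the file can look fine in one program and broken in another.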

You can go through with a search and replace to eliminate them, which is a lot of work, or you can specify the code page or encoding and see if that makes them go away.

A lot of times cp1252 will work.

Some day this will all be settled, but by then most of us will not be making books for ourselves, just as many of us no longer write BASIC programs to accomplish what we want to do with a computer.
10-14-2009, 11:41 AM   #3
p3aul
Yeah, I'm going to play with this some. The original file was in RTF and I just copied and pasted it into Notepad. After that I ran it through Textify, and it looked good. When you copy and paste like that, does it retain the encoding? If so, how could I re-encode it?
Thanks
10-14-2009, 01:05 PM   #4
mrmikel
weird ascii

In calibre, under look and feel, input character encoding, you can try different encodings to see if they match what you have. cp1252, utf-8 and iso-8859-1 are some possibilities. There is another thread on encodings that I happened to see the other day; you might search for it. I did not find an official list of which encodings calibre accepts or exactly what format the names should take.
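If you would rather test the candidates outside calibre first, something like this Python sketch could help. The function name try_encodings and the sample bytes are made up for illustration; the encoding names are the ones Python's codecs accept, which may not be spelled the same way calibre expects.

```python
def try_encodings(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Decode the same bytes with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text,
    or to None when the bytes are not valid in that encoding.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: "it's café" with a curly apostrophe, saved as cp1252.
sample = "it\u2019s caf\u00e9".encode("cp1252")
for name, text in try_encodings(sample).items():
    print(name, repr(text))
```

Note that iso-8859-1 never fails, because every byte maps to some character, so a clean decode there does not prove the guess is right. You still have to look at the apostrophes and accented letters in the result.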

Or you can open the document in a Notepad-like program and see what it is using for these odd characters. You can search for the & symbol and replace every character encoded this way with its keyboard equivalent. Then you will be closer to ASCII. If you find an odd character while you are in your Notepad program, copy it, paste it into the search area of your browser and hit enter. Google at least will gladly identify it for you. Then you will know what you need to search and replace, being sure to wrap around so you get all of them.
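The search and replace can also be scripted. This is only a sketch: the table of replacements and the name to_keyboard_characters are my own choices, not anything calibre provides, and you may want different substitutions.

```python
import html

# Common typographic characters mapped to plain keyboard equivalents.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})


def to_keyboard_characters(text):
    # html.unescape turns references such as &#8217; into the real character,
    # then translate swaps the typographic characters for keyboard ones.
    return html.unescape(text).translate(ASCII_EQUIVALENTS)


sample = "&#8220;Don&#8217;t panic,&#8221; she said."
print(to_keyboard_characters(sample))
```

Accented letters are left alone here, so the result is closer to ASCII but not guaranteed to be pure ASCII.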
10-14-2009, 01:14 PM   #5
p3aul
Thanks, I saved your suggestions to a tip file I keep on calibre.

I did solve it by loading the file into Word, saving it as HTML and then saving that as a text file. I then converted the resulting text file to epub.

Long way around but it worked!
10-14-2009, 01:28 PM   #6
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
You should have been able to convert the HTML file direct to EPUB.
10-14-2009, 04:08 PM   #7
p3aul
Quote:
You should have been able to convert the HTML file direct to EPUB.
I tried that too but I still got the strange characters. It seems the characters it didn't like were letters with a circumflex (^) or an umlaut (..) above them. For some strange reason it didn't like the apostrophe either. According to Firefox the encoding was Western (Windows-1252). I wonder if changing the encoding in the Firefox View menu would change the encoding when I save the page, either as HTML or txt. Is there a program that lets you specify a txt file and change its encoding?
Thanks,
Paul

UPDATE:

The solution was right under my nose! I just loaded the Textified version into Word, saved it as txt (Unicode UTF-8) and then converted that. Result: a perfectly formatted epub file!
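For anyone without Word who wants to change a text file's encoding, a few lines of Python can do the same job. This is a minimal sketch that assumes the source really is Windows-1252, as Firefox reported; the function name reencode and the file names in the comment are made up.

```python
from pathlib import Path


def reencode(src, dst, source_encoding="cp1252"):
    """Read src as source_encoding and write the same text to dst as UTF-8.

    Working on bytes keeps the original line endings untouched.
    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Example call with hypothetical file names:
# reencode("book.txt", "book-utf8.txt")
```

If the source encoding is guessed wrong, the output will be valid UTF-8 but contain the wrong characters, so check the apostrophes and accented letters afterwards.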

Thanks for all your contributions.

Last edited by p3aul; 10-14-2009 at 05:10 PM.
10-14-2009, 05:10 PM   #8
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Specify the encoding of the input file when you are converting.

--input-encoding cp1252
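If you want to script the conversion, the option can be passed to calibre's ebook-convert command from Python. A rough sketch: the helper names and the file names book.txt and book.epub are placeholders, and cp1252 is only the right value if that is what the file actually uses.

```python
import shutil
import subprocess


def build_convert_command(src, dst, input_encoding="cp1252"):
    """Build the argument list for ebook-convert with an explicit input encoding."""
    return ["ebook-convert", src, dst, "--input-encoding", input_encoding]


def convert(src, dst, input_encoding="cp1252"):
    """Run the conversion; requires calibre's command-line tools on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(build_convert_command(src, dst, input_encoding), check=True)


# Print the command that would be run for the placeholder file names.
print(" ".join(build_convert_command("book.txt", "book.epub")))
```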