MobileRead Forums > E-Book Software > Calibre > Conversion

01-26-2011, 09:10 AM   #1
cybmole
Wizard
Posts: 2,997
Karma: 1285294
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
How can I determine the source (TXT) encoding?

Other than guesswork, how do I figure out how to stop this

“Wales? For six weeks! You’re going to miss me.”

becoming this

Wales? For six weeks! You’re going to miss me.

when I convert to epub or mobi.

I have tried the normal settings, then cp1252: no joy.

With normal settings I get lots of ? in boxes on the Kindle, replacing " ! etc.

Is there a logical approach?
01-26-2011, 09:18 AM   #2
user_none
Sigil & calibre developer
Posts: 2,473
Karma: 1053245
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by cybmole
Is there a logical approach?
  1. Let TXT input try to auto-detect.
  2. Try cp1252.
  3. Try utf-8.
  4. Open it in a text editor and save as utf-8. Sometimes you will need to copy and paste into a new document and then save.
  5. Try opening it in Firefox and see if it can auto-detect.
  6. Open it in Firefox and keep changing the encoding (it's a setting under View) until it looks right.
  7. Try using the asciiize option to transform “ to ".
  8. Give up.

Almost no TXT files carry any marking to say which encoding they are in. Figuring it out is often a guess.
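Steps 1 to 3 can be scripted. A minimal Python sketch, assuming you have the raw bytes of the file; the `first_clean_decoding` name and the candidate order are illustrative, not anything calibre provides. It only reports which encoding decodes *without errors*: single-byte code pages accept almost any byte sequence, so a clean decode narrows the field but is not proof, and you still have to eyeball the result as in step 6.

```python
def first_clean_decoding(raw, candidates=("utf-8", "cp1252", "cp1250")):
    """Return (encoding, text) for the first candidate that decodes `raw`
    without errors, or None if none of them do."""
    for enc in candidates:
        try:
            # Strict decoding: raises UnicodeDecodeError on any invalid byte.
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    return None


# The sample line from post #1 as raw bytes, assuming it was saved in a
# Windows code page (0x93/0x94 = curly double quotes, 0x92 = curly apostrophe).
sample = b"\x93Wales? For six weeks! You\x92re going to miss me.\x94"
print(first_clean_decoding(sample))
```

UTF-8 rejects the first byte, so cp1252 is the first candidate to succeed. For these particular bytes cp1250 would give exactly the same text, which is why trying encodings in order can only narrow things down.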
01-26-2011, 09:36 AM   #3
cybmole
Wizard
OK, well, eventually cp1250 was the answer.
UTF-8 made a mess, with lots of black diamonds.

Playing with settings after opening the TXT in Notepad++ gave me values of 92, 93, 94 on the problem characters when converting to UTF-8 with that program, but I could not see how to use that info.
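The 92, 93, 94 values are the hex byte values of the problem characters. A short Python sketch using just those three bytes shows why UTF-8 produced black diamonds while cp1250 worked: none of them can start a UTF-8 character, so a lenient decoder substitutes U+FFFD (the replacement character, usually drawn as a black diamond with a question mark), whereas in cp1250 they are ordinary punctuation.

```python
problem_bytes = bytes([0x92, 0x93, 0x94])  # the values Notepad++ showed

# UTF-8: bytes 0x80-0xBF are continuation bytes and cannot start a character,
# so errors="replace" turns each one into U+FFFD (the "black diamond").
as_utf8 = problem_bytes.decode("utf-8", errors="replace")

# cp1250 (and cp1252) map the same bytes to right single quote,
# left double quote and right double quote.
as_cp1250 = problem_bytes.decode("cp1250")

for b, u, c in zip(problem_bytes, as_utf8, as_cp1250):
    print(f"0x{b:02X}: utf-8 -> U+{ord(u):04X}   cp1250 -> {c} (U+{ord(c):04X})")
```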

Opening the TXT in Word looked clean, but after saving it as filtered HTML and then reimporting and reconverting, the heuristic engine did not clean up line feeds like it did for the TXT.

So I went back to TXT to EPUB and tried every option in the encoding drop-down list until I got a result.

If that drop-down had not been pre-populated with possible solutions, I'd have been lost!

PS: the TXT looked OK when opened in Firefox, but I did not see how to get Firefox to tell me the encoding. Under View I saw autodetect = off and character set = Western ISO 8859.

(When I say it looked OK in Firefox, I mean it looked like the posted example, with an opening slanted quote. After converting with cp1250, it has proper curly quotes in the EPUB and looks nicer.)

I guess I should go google cp1250, 1251, 1252 and learn lots of stuff I never really wanted to have to know!...

And Google says:
"CP1250 is Eastern European (not ISO-8859-2) CP1251 is Cyrillic (not ISO-8859-5) CP1252 is Western European (not ISO-8859-1)... "
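The "not ISO-8859-x" part of that quote matters for exactly these characters. A small Python check on the 0x93 byte from the sample line: the ISO-8859 standards reserve 0x80-0x9F for invisible control codes, while the Windows code pages put printable punctuation there, including the curly quotes.

```python
byte = b"\x93"  # opening curly quote in the Windows code pages

for enc in ("iso-8859-1", "cp1252", "iso-8859-2", "cp1250"):
    ch = byte.decode(enc)
    # 0x80-0x9F decoded to U+0080-U+009F means a C1 control code, not a glyph.
    kind = "control code" if 0x80 <= ord(ch) <= 0x9F else "printable"
    print(f"{enc:>10}: U+{ord(ch):04X} ({kind})")
```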

So the non-geeky explanation is that my text was in an Eastern European encoding?

Last edited by cybmole; 01-26-2011 at 09:38 AM.
01-26-2011, 09:46 AM   #4
user_none
Sigil & calibre developer
Quote:
Originally Posted by cybmole
Opening the txt in word looked clean...
That's when copying and pasting into Notepad++ and then saving that as UTF-8 would have worked.
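Pasting into a new document and saving as UTF-8 amounts to a transcode: decode with the encoding the file really uses, then re-encode as UTF-8. A minimal Python sketch, assuming you already know or have guessed the source encoding (cp1250 in this thread); `transcode_to_utf8` and `convert_file` are hypothetical helper names. A byte the guessed encoding cannot decode raises an error, but a wrong guess that happens to decode will not, so still check the result.

```python
from pathlib import Path


def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1250") -> bytes:
    """Decode `raw` strictly as `source_encoding` and re-encode it as UTF-8.

    Working on bytes rather than text-mode files leaves line endings untouched.
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "cp1250") -> None:
    """Write a UTF-8 copy of `src` to `dst` (raises UnicodeDecodeError on bad bytes)."""
    Path(dst).write_bytes(transcode_to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, `convert_file("book.txt", "book-utf8.txt")` (file names hypothetical) would produce a UTF-8 copy that TXT input could then read with the encoding set to utf-8.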


Quote:
Originally Posted by cybmole
but after saving it as filtered Html & then reimporting & reconverting, the heuristic engine did not clean up line feeds like it did for txt.
?

Quote:
Originally Posted by cybmole
So the non-geeky explanation is that my text was in an Eastern European encoding?
Yep. Older versions of Windows (this might still be true for Windows 7) set the default encoding based on your language selection. So if someone in Greece had saved that file, the encoding would have been automatically set to the one associated with their language choice, even if the document isn't in Greek.
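To make that concrete: each of these Windows code pages is a table mapping single bytes to characters, and the language setting picks which table is used. A Python sketch using a letter byte (0xE6) and the opening-quote byte (0x93), chosen here as examples: the same letter byte means a different character in each table, while the curly-quote byte happens to mean the same thing in all four.

```python
# Windows code pages for Western European, Central/Eastern European,
# Cyrillic and Greek text respectively.
CODE_PAGES = ("cp1252", "cp1250", "cp1251", "cp1253")

for value in (0xE6, 0x93):
    row = {cp: bytes([value]).decode(cp) for cp in CODE_PAGES}
    print(f"0x{value:02X}:", "  ".join(f"{cp}={ch}" for cp, ch in row.items()))
```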
01-26-2011, 10:04 AM   #5
cybmole
Wizard
Makes sense. Now I'll know to try cp1250 in future if I see those slanted opening quotes.

I'll postpone learning about UTF-8 until I'm forced to :-)

My 2nd point was this:

Converting from TXT to EPUB with heuristics on cleaned up all of the flow issues,
but converting ZIP to EPUB did not (the ZIP being the result of opening in Word, saving as filtered HTML, etc.), probably because of what Word did to the file in the interim.

With the source in Notepad++, if I choose Encode in UTF-8, I see stuff that does not paste well here. But you reckon calibre would have understood it?
E.g. the test line would begin x93Wales (with the x93 bit in inverse video) ... youx92re ... miss me.x94
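The inverse-video x93, x92 and x94 markers are bytes that are not valid UTF-8, which Notepad++ shows raw instead of as characters. A Python sketch (the `invalid_utf8_offsets` helper is hypothetical) that lists where such bytes sit, by decoding repeatedly and reading the position Python reports in `UnicodeDecodeError.start`. It says nothing about calibre; it only shows which bytes a strict UTF-8 reader would reject.

```python
def invalid_utf8_offsets(raw: bytes) -> list:
    """Return (offset, byte value) pairs for every byte strict UTF-8 rejects."""
    bad = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice that was decoded.
            start, end = pos + err.start, pos + err.end
            bad.extend((i, raw[i]) for i in range(start, end))
            pos = end  # skip past the rejected bytes and keep scanning
    return bad


for offset, value in invalid_utf8_offsets(b"\x93Wales You\x92re me.\x94"):
    print(f"offset {offset}: x{value:02x}")
```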

Last edited by cybmole; 01-26-2011 at 10:06 AM.
01-26-2011, 11:03 AM   #6
DoctorOhh
US Navy, Retired
Posts: 8,909
Karma: 12755553
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Something to keep in mind: at some point you'll be missing the same characters when converting HTML to EPUB. In this case the character encoding has to be entered in the HTML to ZIP file plugin before importing the HTML file (see attached).
Attached thumbnail: html_to_zip_plugin.jpg
01-26-2011, 11:09 AM   #7
cybmole
Wizard
Quote:
Originally Posted by dwanthny
Something to keep in mind, at some point you'll be missing the same characters doing a html to epub. In this case the character encoding has to be entered in the html to zip file plugin prior to importing the html file (see attached).
Hmm. Let me check I've got this, please...

You are saying that if the source is an HTM or HTML file that happens to have non-standard encoding,
then adding it to calibre followed by a ZIP to EPUB conversion, with the encoding box on the Look & Feel screen set, will fail,
because the encoding conversion needed to be done as the file was being imported into calibre?
01-26-2011, 11:11 AM   #8
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by cybmole
You are saying that if the source is an HTM or HTML file that happens to have non-standard encoding,
then adding it to calibre followed by a ZIP to EPUB conversion, with the encoding box on the Look & Feel screen set, will fail,
because the encoding conversion needed to be done as the file was being imported into calibre?
Correct, as explained here in the FAQ.
Quote:
This is because the HTML2ZIP plugin automatically converts the HTML files to a standard encoding (utf-8).
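Why the Look & Feel setting comes too late can be seen from what any normalize-on-import step has to do. The following Python sketch is hypothetical and is not calibre's HTML2ZIP code: it decodes the HTML once, using a caller-supplied encoding if one is given (standing in for the value typed into the plugin), and writes UTF-8. A later input-encoding setting only changes how those already-converted UTF-8 bytes are read, so it cannot repair characters that were lost at this step.

```python
from typing import Optional


def normalize_to_utf8(raw: bytes, input_encoding: Optional[str] = None) -> bytes:
    """Decode HTML bytes once and return them re-encoded as UTF-8.

    Hypothetical stand-in for an import-time normalizer. Without an
    `input_encoding`, UTF-8 is assumed and undecodable bytes become U+FFFD.
    """
    if input_encoding:
        text = raw.decode(input_encoding)
    else:
        text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


page = b"<p>\x93Wales?\x94</p>"
print(normalize_to_utf8(page).decode("utf-8"))            # quotes replaced by U+FFFD
print(normalize_to_utf8(page, "cp1250").decode("utf-8"))  # curly quotes preserved
```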

Last edited by DoctorOhh; 01-26-2011 at 11:16 AM.
01-26-2011, 11:39 AM   #9
user_none
Sigil & calibre developer
Specifying the encoding for HTML to ZIP is only required if the HTML file itself does not specify the encoding, i.e. if something like this is missing:

Code:
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
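Checking whether a file already carries such a declaration can be approximated in a few lines. A rough Python sketch using a regular expression over the first couple of kilobytes; `declared_charset` is a hypothetical name, and this is only a heuristic (browsers follow the HTML standard's prescan algorithm, which also deals with byte order marks, comments and more). It accepts both the tag shown above and the shorter HTML5 `<meta charset="...">` form.

```python
import re
from typing import Optional

# Matches <meta charset="..."> as well as
# <meta http-equiv="content-type" content="text/html; charset=...">.
_CHARSET_RE = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def declared_charset(raw: bytes, scan_bytes: int = 2048) -> Optional[str]:
    """Return the charset named in a <meta> tag near the top of the file, if any."""
    match = _CHARSET_RE.search(raw[:scan_bytes])
    return match.group(1).decode("ascii").lower() if match else None


print(declared_charset(b'<meta http-equiv="content-type" content="text/html; charset=utf-8"/>'))
```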
01-26-2011, 12:13 PM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cybmole
Hmm. Let me check I've got this, please...

You are saying that if the source is an HTM or HTML file that happens to have non-standard encoding,
then adding it to calibre followed by a ZIP to EPUB conversion, with the encoding box on the Look & Feel screen set, will fail.
No. My understanding is that there is a problem only when 1) the encoding isn't specified inside the file (usually it is), 2) that encoding is not UTF-8, and 3) the file actually includes characters that are encoded differently than in UTF-8. In that case those characters will display incorrectly unless the correct character encoding is specified during import.
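Those three conditions translate fairly directly into a check. A deliberately rough Python sketch, with a hypothetical function name: condition 1 is a plain substring search for `charset=` rather than real HTML parsing, and conditions 2 and 3 are folded into "the file contains bytes that are not valid UTF-8", since pure ASCII always decodes as UTF-8 and non-UTF-8 text with accented or curly characters almost never does. A file that declares the *wrong* charset is not caught.

```python
def needs_encoding_at_import(raw: bytes) -> bool:
    """True only when all three conditions from the post above hold (heuristic)."""
    # 1) No encoding declared inside the file (crude test on the first 2 KB).
    declared = b"charset=" in raw[:2048].lower()
    # 2) + 3) The file contains non-ASCII bytes that are not valid UTF-8.
    try:
        raw.decode("utf-8")
        has_non_utf8_bytes = False
    except UnicodeDecodeError:
        has_non_utf8_bytes = True
    return not declared and has_non_utf8_bytes
```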
01-26-2011, 12:18 PM   #11
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by Starson17
No. My understanding is that there is a problem only when: 1) the encoding isn't specified inside the file (usually it is), and 2) that encoding is not UTF8 and 3) the file actually includes characters that are encoded differently from UTF8. In that case, those characters will display incorrectly, unless the correct character encoding is specified during import.
Based on the original premise and subsequent qualifications, the answer is yes. The premise was that one might be missing characters, as he experienced in the original post. If he was, then the HTML to ZIP plugin might be the culprit.

It was a preemptive heads-up so he would be aware of this possibility.
01-26-2011, 12:56 PM   #12
Starson17
Wizard
Quote:
Originally Posted by dwanthny
Based on the original premise and subsequent qualifications the answer is Yes. The premise was that one might be missing characters as he experienced in the original post. If he was then the html 2 zip plugin might be the culprit.

It was a preemptive heads up so he was aware of this possibility.
I'm confused. I took the question to be "If the source has non-standard encoding... will (normal) conversion fail?", to which I answered "No," except when that non-standard encoding isn't specified internally (which it usually is, and even if it isn't, conversion may not "fail" if the file doesn't use any non-standard encoded characters). Perhaps you took the question to be something different? I don't really care what question was asked or whether the answer is yes or no, so long as I understand the process.

You are more of an expert on this issue than I am, so was something I posted wrong?
01-26-2011, 08:25 PM   #13
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by dwanthny
Something to keep in mind, at some point you'll be missing the same characters doing a html to epub. In this case the character encoding has to be entered in the html to zip file plugin prior to importing the html file (see attached).
Quote:
Originally Posted by Starson17
You are more of an expert on this issue than I am, so was something I posted wrong?
I am not an expert in this area, and what you said sounds correct, but he was responding to the specific idea I broached above, where I let him know that there was a chance he could lose the same apostrophes and quotes (as in his original post) without the input character encoding in the Look & Feel section being the solution.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.