#1 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
How can I determine source (txt) encoding?
Other than by guesswork, how do I figure out how to stop this
“Wales? For six weeks! You’re going to miss me.” becoming this Wales? For six weeks! You’re going to miss me. when I convert to epub or mobi? I have tried the normal settings, then cp1252: no joy. With the normal settings I get lots of ? in boxes on the Kindle, replacing " ! etc. Is there a logical approach? |
#2 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Almost all encodings for TXT files do not have any marking as to what they are. Figuring it out is often a guess. |
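To make the guessing a little more systematic, here is a minimal Python sketch (not anything calibre itself does, as far as this thread says). It checks for a byte-order mark, which is one of the few markings a TXT file can carry, and then tries a short list of candidate encodings. The candidate list and the helper names `sniff_bom` and `try_encodings` are assumptions made for illustration.

```python
import codecs

# Candidate list is an assumption; extend it with whatever encodings you suspect.
CANDIDATES = ["utf-8", "cp1252", "cp1250", "cp1251"]

# A few common byte-order marks and a Python codec that understands each.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def try_encodings(data, candidates=CANDIDATES):
    """Return {encoding: decoded text} for every candidate that decodes cleanly."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            pass  # this candidate cannot be the right one
    return results


# The test line from post #1, with the raw bytes 0x93, 0x92 and 0x94 for the quotes.
sample = b"\x93Wales? For six weeks! You\x92re going to miss me.\x94"
print("BOM:", sniff_bom(sample))
for enc, text in try_encodings(sample).items():
    print(enc, "->", text)
```

A clean decode only rules encodings out; several single-byte code pages can all decode the same bytes without error, so a human still has to look at the result and pick the one that reads correctly.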
#3 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
OK - well, eventually cp1250 was the answer.
utf8 made a mess, with lots of black diamonds. Playing with settings after opening the txt in Notepad++ gave me values of 92, 93 and 94 on the problem characters when converting to utf8 with that program, but I could not see how to use that info.
Opening the txt in Word looked clean, but after saving it as filtered HTML, then reimporting and reconverting, the heuristic engine did not clean up line feeds like it did for txt. So I reverted to txt to epub, trying all the options in the encoding drop-down list until I got a result. If that drop-down had not been pre-populated with possible solutions, I'd have been lost!
PS: the txt looked OK when opened in Firefox, but I did not see how to get Firefox to tell me the encoding. Under View I saw autodetect = off and character set = Western ISO 8859. (When I say it looked OK in Firefox, it looked like the posted example with an opening slanted quote. After converting with cp1250, it has proper curly quotes in the epub and looks nicer.)
I guess I should go google cp1250, 1251, 1252 and learn lots of stuff I never really wanted to have to know! ... and Google says "CP1250 is Eastern European (not ISO-8859-2) CP1251 is Cyrillic (not ISO-8859-5) CP1252 is Western European (not ISO-8859-1)... " So the non-geeky explanation is that my text was in an Eastern European encoding?
Last edited by cybmole; 01-26-2011 at 08:38 AM. |
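For anyone wondering how to use the 92, 93, 94 values: assuming they are hexadecimal byte values (which matches the x92/x93/x94 mentioned later in the thread), a short Python sketch can show what each candidate encoding makes of them. This uses only the standard Python codecs and is meant as an illustration, not as a description of what calibre or Notepad++ does internally.

```python
# The three byte values reported for the problem characters (assumed hexadecimal).
problem_bytes = bytes([0x92, 0x93, 0x94])


def code_points(text):
    """Format each character of text as a U+XXXX code point."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Single-byte encodings always produce *something*; the question is what.
for enc in ("cp1250", "cp1252", "latin-1"):
    print(enc, code_points(problem_bytes.decode(enc)))

# These bytes are not valid UTF-8 on their own; errors="replace" substitutes
# U+FFFD, the replacement character usually drawn as a black diamond.
print("utf-8", code_points(problem_bytes.decode("utf-8", errors="replace")))
```

With Python's codecs, these three bytes come out as the curly quotes U+2019, U+201C and U+201D under both cp1250 and cp1252, and as invisible control characters under latin-1 (ISO-8859-1). So these particular bytes do not by themselves distinguish cp1250 from cp1252; they mainly show that the file is in some Windows code page rather than UTF-8 or ISO-8859-1.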
#4 | |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
That's when copying and pasting into Notepad++, then saving that as utf-8, would have worked.
Quote:
Yep. Older versions of Windows (might still be true for 7) set the default encoding based on your language selection. So if someone in Greece had saved that file then the encoding would have been automatically set to the one associated with their language choice. Even if the document isn't in Greek. |
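The same re-save can be done outside an editor. Below is a minimal Python sketch, assuming you already know (or have guessed) the source encoding; the function name `reencode_to_utf8` and the file names are hypothetical, and cp1250 is only the default because it is what worked in this thread.

```python
import os
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1250"):
    """Read src as source_encoding and write the same text to dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Small demonstration on a throwaway file containing the quote bytes 0x93/0x94.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "book.txt")
    dst = os.path.join(tmp, "book-utf8.txt")
    Path(src).write_bytes(b"\x93Wales? For six weeks!\x94")
    reencode_to_utf8(src, dst)
    print(Path(dst).read_bytes())  # the quotes are now multi-byte UTF-8 sequences
```

Once the file is UTF-8, the conversion no longer depends on picking the right entry in an encoding drop-down. Note that `read_text`/`write_text` may normalize line endings, which should not matter for a txt to epub conversion.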
|
#5 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Makes sense. I'll know now to try cp1250 in future if I see those slanted opening quotes.
I'll postpone learning about utf8 until I'm forced to :-)
My 2nd point was this: converting from txt to epub with heuristics on cleaned up all of the flow issues, but converting zip to epub did not (the zip being the result of open in Word, save as filtered HTML, etc.), probably because of what Word did to the file in the interim.
With the source in Notepad++, if I choose Encode in UTF-8, I see stuff that does not paste well here. But you reckon calibre would have understood it? E.g. the test line would begin x93Wales (with the x93 bit in inverse video) ... youx92re ... miss me.x94
Last edited by cybmole; 01-26-2011 at 09:06 AM. |
#6 |
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Something to keep in mind: at some point you'll be missing the same characters when doing an html to epub conversion. In that case the character encoding has to be entered in the html to zip file plugin prior to importing the html file (see attached).
|
#7 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
You are saying that if the source is an htm or html file that happens to have a non-standard encoding, then adding it to calibre followed by a ZIP to EPUB conversion, with the Look & Feel screen encoding box set, will fail, because the encoding conversion needed to be done as the file was being imported to calibre? |
|
#8 | ||
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
Quote:
Last edited by DoctorOhh; 01-26-2011 at 10:16 AM. |
||
#9 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Specifying the encoding for html to zip is only required if the HTML file itself does not specify the encoding, e.g. when something like this is missing:
Code:
<meta http-equiv="content-type" content="text/html; charset=utf-8"/> |
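A quick way to check whether an HTML file carries such a declaration is to search its first few kilobytes for a `charset=` inside a `<meta>` tag. The sketch below is a rough illustration in Python; the regular expression, the 4096-byte window and the function name `declared_charset` are assumptions, and this is not how calibre's own detection is implemented.

```python
import re

# Matches charset=... inside a <meta> tag. This covers the http-equiv form shown
# above as well as the shorter <meta charset="..."> form.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(html_bytes):
    """Return the charset named in a <meta> tag (lowercased), or None if absent."""
    match = META_CHARSET.search(html_bytes[:4096])  # declarations sit near the top
    return match.group(1).decode("ascii").lower() if match else None


with_meta = b'<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"/></head></html>'
without_meta = b"<html><head><title>No declaration</title></head></html>"

print(declared_charset(with_meta))
print(declared_charset(without_meta))
```

If the function returns `None`, the file is in the situation described above: nothing in the HTML says what the encoding is, so it has to be supplied by hand.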
#10 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
#11 | |
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
It was a preemptive heads up so he was aware of this possibility. |
|
#12 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
You are more of an expert on this issue than I am, so was something I posted wrong? |
|
#13 | |
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
|
|