02-27-2023, 04:39 AM | #31 |
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
|
|
02-27-2023, 04:53 AM | #32 |
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
|
How can I extract the book as a single HTML file? If I rename it to .zip and unpack it, I get a lot of separate files.
|
02-27-2023, 06:32 AM | #33 | |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Also... we're pretty far from this being a Sigil issue at this point. Time to wrap it up. Last edited by DiapDealer; 02-27-2023 at 06:46 AM. |
|
02-27-2023, 09:16 AM | #34 | |
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
|
|
02-27-2023, 10:45 AM | #35 |
Resident Curmudgeon
Posts: 74,027
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Why are we still getting convoluted ways to convert to UTF-8 when I've already given a very simple solution?
|
02-27-2023, 10:49 AM | #36 | |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
Quote:
1. Explode/unzip your ebook into a temporary directory.
2. For each file in the temporary directory which has the wrong encoding, convert it to UTF-8.

NOTE: if you are not careful about which encodings you use, iconv can introduce more encoding errors. Example follows. Note that the curly quotes I use are UTF-8 encoded characters; after running the echo command, “example” is UTF-8 encoded.

Code:
# This example demonstrates an encoding error! Be careful not to do this by mistake.
$ echo '“Hello, world!”' >example
$ iconv -f cp1252 -t utf-8 example
â€œHello, world!â€iconv: illegal input sequence at position 18

Code:
for file in $(find . -type f) ; do
    if [[ $(chardetect "$file" --minimal) == "Windows-1252" ]] ; then
        iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8"
        mv "$file.utf8" "$file"
    fi
done

Character encodings can be a pain in the butt. I hope this helps. |
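One way to guard against the double-conversion error demonstrated above is to check whether a file already decodes as UTF-8 before touching it. The following is a minimal sketch and not part of the original post: the function names `is_valid_utf8` and `convert_if_needed` are made up for illustration, and it swaps the `chardetect` test for a plain validity check with `iconv`. It assumes that any file which is not valid UTF-8 is CP1252, which may not hold for every book.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if the file's bytes are already valid UTF-8.
# iconv exits non-zero when it meets an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: convert a file in place from CP1252 to UTF-8,
# but only when it is not valid UTF-8 to begin with.
convert_if_needed() {
    local file=$1
    if is_valid_utf8 "$file"; then
        # Already UTF-8 (plain ASCII also lands here): leave it alone.
        echo "skip: $file"
    elif iconv -f CP1252 -t UTF-8 "$file" >"$file.utf8"; then
        # Conversion succeeded, so replace the original.
        mv "$file.utf8" "$file"
        echo "converted: $file"
    else
        # Conversion failed: keep the original and drop the partial output.
        rm -f "$file.utf8"
        echo "failed: $file" >&2
        return 1
    fi
}
```

Running `convert_if_needed` a second time on the same file is then harmless, because the file passes the UTF-8 check and is skipped. A CP1252 file whose bytes happen to form valid UTF-8 would be skipped by mistake, but that is unlikely in ordinary prose.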
|
02-27-2023, 10:53 AM | #37 |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
I haven't tested the Modify Epub plugin for this purpose myself, and I don't have any convenient ebooks to test it with, but I do see the option you mention. The hover text says that it “removes any existing charset tags on html pages and encodes in UTF-8”. OP's charset tags already say that the content is UTF-8, so the question I would want to test before relying on this feature is: how does Modify Epub detect the original encoding? If it takes the original charset tag as gospel, then it will get this wrong. If it does get it right somehow, then your solution will be much more convenient than the loop I gave above.
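To see whether a file's charset declaration can be trusted, the declared encoding can be compared with what the bytes actually are. This is only a rough sketch under stated assumptions: `declared_charset` and `declares_utf8_but_is_not` are hypothetical names, the pattern only looks for the first `charset=` or `encoding=` declaration in the file, and it says nothing about how Modify Epub itself works.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first declared charset/encoding, lowercased.
# Matches e.g. <?xml ... encoding="UTF-8"?> or <meta charset="utf-8"/>.
declared_charset() {
    LC_ALL=C grep -a -o -i -E '(charset|encoding)="?[A-Za-z0-9_-]+' "$1" |
        head -n 1 |
        sed 's/.*[="]//' |
        tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: succeeds when the file claims UTF-8
# but its bytes do not decode as UTF-8.
declares_utf8_but_is_not() {
    [ "$(declared_charset "$1")" = "utf-8" ] &&
        ! iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

A file for which `declares_utf8_but_is_not` succeeds is one where taking the charset tag as gospel would give the wrong result.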
|
02-27-2023, 11:37 AM | #38 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I think calibre's python includes the chardet module (or its own subclassed version thereof). I would suspect that the Modify Epub plugin uses that to determine (through multiple confidence-based checks) the original encoding.
|
02-27-2023, 12:18 PM | #39 | |
Bibliophagist
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
I do have some curiosity about how InDesign was used to generate this dog's breakfast but no quick and easy solutions come to mind. |
|
02-27-2023, 02:19 PM | #40 | |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
Quote:
Code:
read -r -d '' bash_script <<'EOF'
if [[ $(chardetect "{}" --minimal) == "Windows-1252" ]] ; then
    iconv -f cp1252 -t utf-8 "{}" -o "{}.utf8"
    mv "{}.utf8" "{}"
fi
EOF
find . -type f -execdir bash -c "$bash_script" ';'

Last edited by isarl; 02-27-2023 at 02:25 PM. Reason: fix typo in find options |
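A variation on the same idea hands each file name to the inner script as a positional parameter instead of pasting `{}` into the script text. Substituting `{}` inside a longer string relies on GNU find behaviour, and a file name containing quotes or `$(...)` would then be read as shell code. The sketch below is an illustration only: `for_each_file` is a hypothetical name, and it uses `-exec` rather than `-execdir`, so the script sees paths relative to the starting directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a small bash script once per regular file under a
# directory. The path arrives in the script as "$1", never as script text.
for_each_file() {
    local dir=$1
    local script=$2
    # "_" fills $0 of the inner shell; {} stands alone, so find passes the
    # path through as a single argument, spaces and all.
    find "$dir" -type f -exec bash -c "$script" _ {} \;
}
```

A dry run such as `for_each_file . 'printf "would check: %s\n" "$1"'` lists what would be examined; the chardetect and iconv steps from the script above can then be put in its place with `"$1"` substituted for `"{}"`.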
|
02-27-2023, 11:54 PM | #41 | |
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
|
|
02-27-2023, 11:56 PM | #42 |
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
|
|
02-27-2023, 11:58 PM | #43 |
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
|
|
02-28-2023, 12:28 AM | #44 |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
GnuWin32 seems to provide iconv for Windows. As for the logic to test whether a file is CP1252 before trying to convert it, I leave that to you. I have already done 95% of the work for you.
|
02-28-2023, 04:53 AM | #45 | |
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
Is there an easy way that even a child can handle? |
|