#32 |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2023
Device: none
|
How do I extract a book as a single HTML file? If I rename it to .zip and unpack it, I get a lot of separate files.
|
#33 | |
Grand Sorcerer
Posts: 28,602
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Also... we're pretty far from this being a Sigil issue at this point. Time to wrap it up.

Last edited by DiapDealer; 02-27-2023 at 06:46 AM.
|
#34 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
|
|
#35 |
Resident Curmudgeon
Posts: 79,796
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Why are we still getting convoluted ways to convert to UTF-8 when I've already given a very simple solution?
|
#36 | |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
Quote:
1. Explode/unzip your ebook into a temporary directory.
2. For each file in the temporary directory which has the wrong encoding, convert it to UTF-8.

NOTE: if you are not careful about which encodings you use, iconv can introduce more encoding errors. An example follows. Note that the curly quotes I use are UTF-8 encoded characters; after running the echo command, the file “example” is UTF-8 encoded.

Code:
# This example demonstrates an encoding error! Be careful not to do this by mistake.
$ echo '“Hello, world!”' >example
$ iconv -f cp1252 -t utf-8 example
“Hello, world!â€iconv: illegal input sequence at position 18

Code:
# Convert only the files that chardetect identifies as Windows-1252.
# Note: $(find ...) splits on whitespace, so this assumes file names without spaces.
for file in $(find . -type f) ; do
  if [[ $(chardetect "$file" --minimal) == "Windows-1252" ]] ; then
    iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8"
    mv "$file.utf8" "$file"
  fi
done

Character encodings can be a pain in the butt. I hope this helps.
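One way to guard against the double-encoding mistake shown in the first example is to check whether a file is already valid UTF-8 before converting it. The following is only a sketch: the function name `to_utf8_if_needed` is made up for illustration, and it assumes that any file which is not valid UTF-8 is CP1252, which may not hold for every book.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): convert FILE from CP1252 to UTF-8
# only when FILE is not already valid UTF-8.
to_utf8_if_needed() {
  local file=$1
  # A UTF-8 to UTF-8 pass fails on invalid input, so it doubles as a validity check.
  if iconv -f utf-8 -t utf-8 "$file" >/dev/null 2>&1; then
    return 0  # already valid UTF-8 (or plain ASCII): leave it untouched
  fi
  # Assumption: anything that is not valid UTF-8 is CP1252.
  if iconv -f cp1252 -t utf-8 "$file" >"$file.utf8"; then
    mv "$file.utf8" "$file"
  else
    rm -f "$file.utf8"  # conversion failed: keep the original file as it was
    return 1
  fi
}
```

Calling it twice on the same file is harmless, because the second call sees valid UTF-8 and returns without converting again.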
|
#37 |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
I haven't tested the Modify Epub plugin for this purpose myself, and I don't have any convenient ebooks to test it with, but I do see the option you mention. The hover text says that it “removes any existing charset tags on html pages and encodes in UTF-8”. OP's charset tags already say that the content is UTF-8, so the question I would want to answer before relying on this feature is: how does Modify Epub detect the original encoding? If it takes the existing charset tag as gospel, it will get this wrong. If it does get it right somehow, then your solution will be much more convenient than the loop I gave above.
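To see which files have a charset tag that cannot be trusted, one could compare the declared encoding with the actual bytes. This is a rough sketch, not how Modify Epub works; the function name `claims_utf8_but_is_not` and the simple pattern used to find the declaration are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed when FILE declares UTF-8 but its bytes are not valid UTF-8.
claims_utf8_but_is_not() {
  local file=$1
  # LC_ALL=C makes grep work on raw bytes, so invalid sequences cannot confuse it.
  # The pattern covers charset=utf-8 in a meta tag and encoding="utf-8" in an XML declaration.
  LC_ALL=C grep -qiE "(charset|encoding)=[\"']?utf-8" "$file" || return 1
  # iconv exits non-zero when the input is not valid UTF-8.
  ! iconv -f utf-8 -t utf-8 "$file" >/dev/null 2>&1
}
```

A file for which this succeeds is one where taking the charset tag as gospel would give the wrong result.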
|
#38 |
Grand Sorcerer
Posts: 28,602
Karma: 204624552
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I think calibre's python includes the chardet module (or its own subclassed version thereof). I would suspect that the Modify Epub plugin uses that to determine (through multiple confidence-based checks) the original encoding.
|
#39 | |
Bibliophagist
Posts: 46,355
Karma: 169098492
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
I do have some curiosity about how InDesign was used to generate this dog's breakfast but no quick and easy solutions come to mind. |
|
#40 | |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
Quote:
Code:
read -r -d '' bash_script <<'EOF'
if [[ $(chardetect "{}" --minimal) == "Windows-1252" ]] ; then
  iconv -f cp1252 -t utf-8 "{}" -o "{}.utf8"
  mv "{}.utf8" "{}"
fi
EOF
find . -type f -execdir bash -c "$bash_script" ';'

Last edited by isarl; 02-27-2023 at 02:25 PM. Reason: fix typo in find options
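The command above relies on find pasting each file name into the script text wherever `{}` appears, which can misbehave when a name contains quotes or other shell metacharacters. A possible variant passes the name as a positional argument instead. This is a sketch only: `convert_cp1252_tree` is a hypothetical name, it uses `-exec` rather than `-execdir`, it writes through a redirect instead of `-o`, and it still assumes `chardetect` is installed as in the original.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert every file under ROOT that chardetect reports
# as Windows-1252, handing each path to the inner script as "$1".
convert_cp1252_tree() {
  local root=$1
  local script='
    f=$1
    # chardetect --minimal prints only the detected encoding name
    if [[ $(chardetect "$f" --minimal) == "Windows-1252" ]]; then
      iconv -f cp1252 -t utf-8 "$f" >"$f.utf8" && mv "$f.utf8" "$f"
    fi
  '
  # "_" fills $0 of the inner shell; find passes {} as a separate argument,
  # so the file name is never parsed as shell code.
  find "$root" -type f -exec bash -c "$script" _ {} \;
}
```

Usage would be `convert_cp1252_tree .` from inside the unzipped book directory.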
|
#41 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
|
|
#44 |
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
|
GnuWin32 seems to provide iconv for Windows. As for the logic to test whether a file is CP1252 before trying to convert it, I leave that to you. I have already done 95% of the work for you.
|
#45 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Feb 2023
Device: none
|
Quote:
Is there an easy way that even a child can handle? |
|