Old 02-27-2023, 04:39 AM   #31
KIE18
Enthusiast
KIE18 began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by DiapDealer View Post
Please do not post copyrighted ebooks to MobileRead. There are scrambling plugins that can be used if the structure of entire copyrighted epubs needs to be shared.
This book is distributed free of charge by the publisher. That is where I downloaded it from.
Old 02-27-2023, 04:53 AM   #32
KIE18
Enthusiast
KIE18 began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by Sarmat89 View Post
Just extract the book as a single HTML file, then convert it to CP1252, then open it as CP1251 and save as UTF-8. There are many editors that can do that, including Notepad++ and VSCode.
How do I extract a book as one HTML file? If I rename it to .zip, there are a lot of separate files inside.
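For anyone following along, the CP1252/CP1251 round trip described in the quote above can be sketched on the command line. This assumes iconv is available and that the text really was CP1251 that got read as CP1252 and then saved as UTF-8. fix_mojibake is a made-up helper name, not part of any tool.

```shell
# Hypothetical helper: undo text that was CP1251, was misread as CP1252,
# and was then saved as UTF-8. Reads stdin, writes stdout.
fix_mojibake() {
    # Step 1: UTF-8 -> CP1252 turns each wrong character back into its original byte.
    # Step 2: those bytes are read as CP1251 and written out as proper UTF-8.
    iconv -f utf-8 -t cp1252 | iconv -f cp1251 -t utf-8
}

# Small demonstration on a single garbled word.
printf 'Ïðèâåò\n' | fix_mojibake
# prints: Привет
```

If iconv reports an error instead, the damage probably happened along a different path and this sketch does not apply as-is. It works on one file at a time, so it can be run on each HTML file in the book without merging them first.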
Old 02-27-2023, 06:32 AM   #33
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by KIE18 View Post
This book is distributed free of charge by the publisher. That is where I downloaded it from.
"Free of charge" does not equal "no copyright." Unless you are the rights holder (like the publisher you got it from), you are not free to redistribute copyrighted material.

Also... we're pretty far from this being a Sigil issue at this point. Time to wrap it up.

Last edited by DiapDealer; 02-27-2023 at 06:46 AM.
Old 02-27-2023, 09:16 AM   #34
KIE18
Enthusiast
KIE18 began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by DiapDealer View Post
"Free of charge" does not equal "no copyright." Unless you are the rights holder (like the publisher you got it from), you are not free to redistribute copyrighted material.

Also... we're pretty far from this being a Sigil issue at this point. Time to wrap it up.
I know this is not a Sigil issue. I was hoping that someone here would help me solve the problem with the book.
Old 02-27-2023, 10:45 AM   #35
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 74,027
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Why are we still getting convoluted ways to convert to UTF-8 when I've already given a very simple solution?
Old 02-27-2023, 10:49 AM   #36
isarl
Addict
isarl ought to be getting tired of karma fortunes by now.
 
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by KIE18 View Post
I know this is not a Sigil issue. I was hoping that someone here would help me solve the problem with the book.
If you are on Linux then the iconv utility is likely already installed for you and can perform character encoding conversions. An outline of the process would probably be something like:

1. Explode/unzip your ebook into a temporary directory.

2. For each file in the temporary directory which has the wrong encoding, convert it to UTF-8. NOTE: if you are not careful about which encodings you use, iconv can introduce more encoding errors. Example follows. Note that the curly quotes I use are UTF-8 encoded characters; after running the echo command, “example” is UTF-8 encoded.

Code:
# This example demonstrates an encoding error! Be careful not to do this by mistake.
$ echo '“Hello, world!”' >example
$ iconv -f cp1252 -t utf-8 example
â€œHello, world!â€iconv: illegal input sequence at position 18
Here is some bash-like shell code that might work, if you have chardetect available (on my system, this executable is provided by the python-chardet package):

Code:
# Convert every file that chardetect identifies as Windows-1252 to UTF-8, in place.
for file in $(find . -type f) ; do
    if [[ $(chardetect "$file" --minimal) == "Windows-1252" ]] ; then
        iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8"
        mv "$file.utf8" "$file"
    fi
done
3. Zip your book back up.

Character encodings can be a pain in the butt. I hope this helps.
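Steps 1 and 3 above can be sketched like this, assuming the Info-ZIP zip and unzip tools are installed. unpack_epub and repack_epub are hypothetical helper names. The one EPUB-specific detail is that the mimetype file has to be the first entry in the archive and stored uncompressed, so it is added on its own before everything else.

```shell
# Hypothetical helper: unpack an EPUB into a working directory.
unpack_epub() {
    unzip -q "$1" -d "$2"
}

# Hypothetical helper: zip a working directory back into an EPUB.
repack_epub() {
    src_dir=$1
    out_file=$2
    # Make the output path absolute, because the zip commands run inside src_dir.
    case $out_file in
        /*) ;;
        *) out_file=$PWD/$out_file ;;
    esac
    (
        cd "$src_dir" || exit 1
        rm -f "$out_file"
        # mimetype goes in first, uncompressed (-0), with no extra attributes (-X).
        zip -q -X -0 "$out_file" mimetype &&
        # Everything else is added recursively, compressed, without directory entries (-D).
        zip -q -X -r -9 -D "$out_file" . -x mimetype
    )
}

# Example usage (file and directory names are placeholders):
#   unpack_epub book.epub book_tmp
#   ... fix the encoding of the files inside book_tmp ...
#   repack_epub book_tmp book_fixed.epub
```

Writing to a new file name keeps the original book untouched in case the conversion goes wrong.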
Old 02-27-2023, 10:53 AM   #37
isarl
Addict
isarl ought to be getting tired of karma fortunes by now.
 
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by JSWolf View Post
Why are we still getting convoluted ways to convert to UTF-8 when I've already given a very simple solution?
I haven't tested the Modify Epub plugin for this purpose myself, and I don't have any convenient ebooks to test it with, but I do see the option you mention. The hover text says that it “removes any existing charset tags on html pages and encodes in UTF-8”. OP's charset tags already say that the content is UTF-8, so before relying on this feature I would want to know how Modify Epub detects the original encoding. If it takes the existing charset tag as gospel, it will get this wrong. If it does get it right somehow, then your solution will be much more convenient than the loop I gave above.
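One cheap check that is independent of any plugin or detector is whether a file is already byte-valid UTF-8. The sketch below assumes iconv is installed; is_valid_utf8 is a made-up helper name. If a file passes this check but still displays garbage, the damage is in the characters themselves rather than in the declared encoding, and a byte-level detector will most likely just report UTF-8.

```shell
# Hypothetical helper: succeed (exit status 0) if the file is valid UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv fails on invalid byte sequences.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1
}

# Example usage (the file name is a placeholder):
#   if is_valid_utf8 chapter1.xhtml; then echo "valid UTF-8"; else echo "not UTF-8"; fi
```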
Old 02-27-2023, 11:37 AM   #38
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I think calibre's python includes the chardet module (or its own subclassed version thereof). I would suspect that the Modify Epub plugin uses that to determine (through multiple confidence-based checks) the original encoding.
Old 02-27-2023, 12:18 PM   #39
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
Posts: 35,498
Karma: 145557716
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by JSWolf View Post
Why are we still getting convoluted ways to convert to UTF-8 when I've already given a very simple solution?
Jon, did you actually try this? The solution you gave works if the epub is encoded in utf-8 but is not declared to be in utf-8. It will not convert code page 1251 to utf-8 which is what I think the OP wants to do. Especially not since the original epub has encoding="utf-8" declared.

I do have some curiosity about how InDesign was used to generate this dog's breakfast but no quick and easy solutions come to mind.
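To see what an unpacked file claims about itself, the XML declaration can be read directly. This is only a sketch: declared_encoding is a hypothetical helper, it looks at the first five lines only, and it recognises just the double-quoted encoding="..." form.

```shell
# Hypothetical helper: print the encoding named in a file's XML declaration, if any.
# Prints nothing when no encoding="..." attribute is found in the first 5 lines.
declared_encoding() {
    head -n 5 "$1" | sed -n 's/.*encoding="\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage (the file name is a placeholder):
#   declared_encoding chapter1.xhtml
```

A file that declares utf-8 here and still shows the wrong letters matches the situation described above, where re-declaring or re-encoding as UTF-8 changes nothing.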
Old 02-27-2023, 02:19 PM   #40
isarl
Addict
isarl ought to be getting tired of karma fortunes by now.
 
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by isarl View Post
Code:
for file in $(find . -type f) ; do
    if [[ $(chardetect "$file" --minimal) == "Windows-1252" ]] ; then
        iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8"
        mv "$file.utf8" "$file"
    fi
done
The more I look at this, the more I worry that it might break if your filenames have a space character in them or something. It might be better to use find's -execdir option instead. Maybe something like:

Code:
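read -r -d '' bash_script <<'EOF'
# "$1" is the file name handed over by find; quoting it keeps spaces intact.
if [[ $(chardetect "$1" --minimal) == "Windows-1252" ]] ; then
    iconv -f cp1252 -t utf-8 "$1" -o "$1.utf8"
    mv "$1.utf8" "$1"
fi
EOF
# Pass each file name as an argument instead of pasting it into the script text.
find . -type f -execdir bash -c "$bash_script" _ '{}' ';'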

Last edited by isarl; 02-27-2023 at 02:25 PM. Reason: fix typo in find options
Old 02-27-2023, 11:54 PM   #41
KIE18
Enthusiast
KIE18 began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by isarl View Post
If you are on Linux then the iconv utility is likely already installed for you and can perform character encoding conversions. An outline of the process would probably be something like:

1. Explode/unzip your ebook into a temporary directory.

2. For each file in the temporary directory which has the wrong encoding, convert it to UTF-8. NOTE: if you are not careful about which encodings you use, iconv can introduce more encoding errors. Example follows. Note that the curly quotes I use are UTF-8 encoded characters; after running the echo command, “example” is UTF-8 encoded.

Code:
# This example demonstrates an encoding error! Be careful not to do this by mistake.
$ echo '“Hello, world!”' >example
$ iconv -f cp1252 -t utf-8 example
â€œHello, world!â€iconv: illegal input sequence at position 18
Here is some bash-like shell code that might work, if you have chardetect available (on my system, this executable is provided by the python-chardet package):

Code:
for file in $(find . -type f) ; do
    if [[ $(chardetect "$file" --minimal) == "Windows-1252" ]] ; then
        iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8"
        mv "$file.utf8" "$file"
    fi
done
3. Zip your book back up.

Character encodings can be a pain in the butt. I hope this helps.
I am using Windows 10.
Old 02-27-2023, 11:56 PM   #42
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by KIE18 View Post
How do I extract a book as one HTML file?
You can merge them first.
Old 02-27-2023, 11:58 PM   #43
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Quote:
Originally Posted by DNSB View Post
I do have some curiosity about how InDesign was used to generate this dog's breakfast
It must be an old Type-1 converted font with wrong glyph names.
Old 02-28-2023, 12:28 AM   #44
isarl
Addict
isarl ought to be getting tired of karma fortunes by now.
 
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by KIE18 View Post
I am using Windows 10.
GnuWin32 seems to provide iconv for Windows. As for the logic to test whether a file is CP1252 before trying to convert it, I leave that to you. I have already done 95% of the work for you.
Old 02-28-2023, 04:53 AM   #45
KIE18
Enthusiast
KIE18 began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by isarl View Post
GnuWin32 seems to provide iconv for Windows. As for the logic to test whether a file is CP1252 before trying to convert it, I leave that to you. I have already done 95% of the work for you.
I am not a programmer and do not work in IT. Working with code is way over my head, and I don't understand your recommendations.
Is there an easy way that even a child could handle?