Old 02-25-2023, 11:16 AM   #16
isarl
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by KIE18 View Post
Finally figured out how to upload a screenshot.
According to this screenshot, the characters display fine when rendered as HTML, and are only strange in the source files. Is that accurate? If so then I am very curious about the contents of the stylesheets and in particular the rules governing the class "CharOverride-1". Or perhaps the relevant rule is related to the class in the surrounding div, the classname written in Russian which I will not attempt to type out for myself.
Old 02-25-2023, 12:15 PM   #17
KevinH
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
Good point. Perhaps the CodeView font set in Sigil Preferences simply does not support those characters?
Old 02-25-2023, 01:04 PM   #18
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
That's very possible. Monospace fonts with full Unicode coverage can be hard to come by.
Old 02-25-2023, 01:05 PM   #19
isarl
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Quote:
Originally Posted by KevinH View Post
Good point. Perhaps the CodeView font set in Sigil Preferences simply does not support those characters?
It seems to me that Sigil handles them fine, and that the document is doing wacky things with stylesheets to transcode characters. Sigil renders it fine as HTML and shows weird characters in source view. So to me it seems that the source file contains UTF-8 encoded characters originally encoded as CP1251 (or some other encoding), which is why we largely see character-for-character matches (compare punctuation, word lengths, etc., for example). When the style rules are applied these incorrectly-encoded characters get re-encoded to the correct encoding and therefore display correctly.

…that's what I think is going on, but without source files to check for myself, I'm just guessing from that single Sigil screenshot.
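One way to test this hypothesis once the files are available is to look at the raw bytes of a chapter. The sketch below is an editorial illustration in Python, not something posted in the thread. The function name `classify_encoding` and its three labels are made up for this example, and the heuristic (comparing the count of characters in U+00C0 to U+00FF against the count of Cyrillic characters) is an assumption that only makes sense for mostly Russian text.

```python
def classify_encoding(data: bytes) -> str:
    """Guess how Russian text in `data` was stored (heuristic sketch)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all: probably raw single-byte text such as CP1251,
        # whatever the XML declaration claims.
        return "raw-legacy-bytes"

    # Count real Cyrillic letters versus accented Latin letters.
    cyrillic = sum("\u0400" <= ch <= "\u04ff" for ch in text)
    latin1_high = sum("\u00c0" <= ch <= "\u00ff" for ch in text)
    if latin1_high > cyrillic:
        # Valid UTF-8, but the characters themselves are the garbled ones.
        return "utf8-mojibake"
    return "utf8-ok"
```

A result of `utf8-mojibake` would support the idea that the file stores the garbled characters as UTF-8, while `raw-legacy-bytes` would point to a file that is simply mislabelled.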
Old 02-25-2023, 02:30 PM   #20
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH View Post
Good point. Perhaps the CodeView font set in Sigil Preferences simply does not support those characters?
IMHO, the book most likely contains an embedded CP1251-compatible font. I've created a sample epub that exactly reproduces the OP's problems. (The text is from a Russian Public Domain novel and the embedded CP1251 font is also in the Public Domain.)

The OP simply needs to convert the HTML files from CP1251 to UTF-8.
Attached Files
File Type: epub rus_cp1251.epub (39.2 KB, 65 views)
File Type: epub rus_utf8.epub (10.7 KB, 57 views)
Old 02-25-2023, 11:29 PM   #21
KIE18
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by Doitsu View Post
In that case the automatic codepage detection of the converter failed because the original files contained the following declaration:
Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
As User_Z has already suggested, you could use Notepad++ to fix the encoding:
  • Open the file with it (Notepad++ should detect the encoding as Cyrillic > Macintosh or Cyrillic > Win-1251. If it doesn't, select Win-1251.)
  • Press CTRL+A to select all text, then press CTRL+C to copy the text to the clipboard.
  • Close the original file.
  • Select File > New and press CTRL+V to paste the clipboard contents into the new file.
  • Save the new file under the same name as the original file.
This'll definitely work; however, since the epub contains multiple files, you might want to search the Russian Internet for batch converters with support for Cyrillic encodings that allow you to manually select the input and output encodings.
It didn't work. Maybe the file is somehow protected?
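For readers who prefer a script to a batch converter, the quoted advice can be sketched in Python as follows. This is an editorial illustration under stated assumptions: the function name `convert_folder` is hypothetical, the epub is assumed to be already unzipped into a folder, and the HTML files are assumed to contain raw CP1251 bytes (despite the `encoding="utf-8"` declaration). Files that already decode as UTF-8 are skipped.

```python
from pathlib import Path

def convert_folder(folder: str, source_encoding: str = "cp1251") -> list[str]:
    """Re-save every .html/.xhtml/.htm file under `folder` as UTF-8.

    Assumes the files hold raw bytes in `source_encoding`; files that are
    already valid UTF-8 are skipped, so a second run does no damage.
    """
    converted = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in {".html", ".xhtml", ".htm"}:
            continue
        data = path.read_bytes()
        try:
            data.decode("utf-8")
            continue  # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        # Interpret the bytes as CP1251 and write the same text back as UTF-8.
        text = data.decode(source_encoding)
        path.write_bytes(text.encode("utf-8"))
        converted.append(path.name)
    return converted
```

Because the files already declare `encoding="utf-8"`, the XML declaration does not need to be touched after the conversion.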
Attached Thumbnails: 2023-02-26 после.png (148.2 KB), 2023-02-26_07-19-59 после.png (142.4 KB)
Old 02-26-2023, 02:19 AM   #22
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
My recommendation was based on the text in the screenshot in post 8, and it'll definitely work for such files if you follow my instructions. Your screenshot shows this for Глава 1 (= Chapter 1):

Spoiler:


Code:
<p [...]>Ãëàâà 1</p>
The same string (with a Roman numeral one) can also be found in my sample CP1251 epub.

After conversion from CP1251 to UTF-8 it'll become:

Code:
<p [...]>Глава 1</p>
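The relationship between the two strings can be checked with a few lines of Python. This is an editorial sketch rather than part of the original post; it only assumes that the garbled form comes from CP1251 bytes being read as CP1252 (Western European).

```python
original = "Глава 1"

# Store the text as CP1251: one byte per Cyrillic letter.
raw = original.encode("cp1251")

# Reading those bytes as Western European CP1252 produces the garbled form.
garbled = raw.decode("cp1252")

# Reading the very same bytes as CP1251 restores the Russian text.
restored = raw.decode("cp1251")

print(garbled)
print(restored)
```

The bytes never change; only the table used to interpret them does, which is why word lengths and punctuation match character for character.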
Since you can't follow simple instructions, this'll be my last reply in this thread.
Old 02-26-2023, 04:23 AM   #23
KevinH
Sigil Developer
Posts: 7,654
Karma: 5433388
Join Date: Nov 2009
Device: many
I truly think Doitsu is correct. His example shows all of the characteristics of your screenshots and having an embedded Win-1251 font explains why Preview displays correctly while CodeView does not. His solution really should work for you.

Your epub is NOT protected. Just very poorly made without the proper utf-8 text encoding and utf-8 based fonts.
Old 02-26-2023, 06:36 AM   #24
JSWolf
Resident Curmudgeon
Posts: 74,037
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Doitsu View Post
Your book is encoded as WIN-1251; however, epubs need to be encoded as utf-8 or utf-16. You'll need to unzip the epub file and convert all html files to utf-8. The following article might help: Как перекодировать 1251 в UTF-8? (How do I re-encode 1251 to UTF-8?)
There is an easier solution. Use Calibre with the Modify ePub plugin. There is an option for utf-8. Once that's done, the ePub can be edited in Sigil.
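For the manual route described in the quote (unzip, convert, zip again), the packing steps can be sketched in Python. This is an editorial illustration: `unpack_epub` and `repack_epub` are hypothetical names, and the only format rule relied on is that an EPUB container stores its `mimetype` file first and uncompressed. The conversion of the HTML files would happen between the two calls.

```python
import zipfile
from pathlib import Path

def unpack_epub(epub_path: str, out_dir: str) -> None:
    """Extract every member of the EPUB (a ZIP archive) into `out_dir`."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)

def repack_epub(src_dir: str, epub_path: str) -> None:
    """Zip `src_dir` back into an EPUB.

    The `mimetype` file is written first and uncompressed; every other
    file is deflated.
    """
    root = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        archive.write(root / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(root).as_posix()
            if rel == "mimetype":
                continue  # already written as the first entry
            archive.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Simply renaming a `.zip` made by a general-purpose archiver may not satisfy the `mimetype` rule, which is one reason a repacked book can fail to open in stricter readers.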
Old 02-26-2023, 06:49 AM   #25
KIE18
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by Doitsu View Post
My recommendation was based on the text in the screenshot in post 8, and it'll definitely work for such files if you follow my instructions. Your screenshot shows this for Глава 1 (= Chapter 1):

Spoiler:


Code:
<p [...]>Ãëàâà 1</p>
The same string (with a Roman numeral one) can also be found in my sample CP1251 epub.

After conversion from CP1251 to UTF-8 it'll become:

Code:
<p [...]>Глава 1</p>
Since you can't follow simple instructions, this'll be my last reply in this thread.
I didn't understand which file to use: the original epub, the one unpacked and converted in the UTCast Express program, or the one that I converted in the program and then renamed from zip to epub. These are simple instructions for you, but for me it's not that easy. I am new to this. I will try to understand your instructions and do it.
Old 02-26-2023, 06:53 AM   #26
KIE18
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Here is the original file of the book.

Last edited by DiapDealer; 02-26-2023 at 10:46 AM.
Old 02-26-2023, 07:59 AM   #27
KIE18
Enthusiast
Posts: 26
Karma: 10
Join Date: Feb 2023
Device: none
Quote:
Originally Posted by Doitsu View Post
The same string (with a Roman numeral one) can also be found in my sample CP1251 epub.
I tried it on your example file, rus_cp1251.epub. I opened it in Notepad++ and chose the Win-1251 encoding. I copied everything (CTRL+A, CTRL+C), closed the document, created a new one, and pasted what I had copied. This is what happened (see screenshot). By default it is saved as txt, so I saved it with the same epub extension instead, but the file didn't open at all.
Did I do everything right? If not, please explain more clearly. Maybe there is a video on YouTube that clearly solves this problem.
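A likely reason this attempt failed is that an `.epub` is a ZIP container, not a text file, so the Notepad++ steps apply to the HTML files inside it rather than to the epub itself. The Python sketch below is an editorial illustration (the function name `list_text_members` is hypothetical) that lists which files inside an epub are candidates for conversion.

```python
import zipfile

def list_text_members(epub_path: str) -> list[str]:
    """Return the names of the HTML/XHTML files stored inside an EPUB."""
    if not zipfile.is_zipfile(epub_path):
        raise ValueError(f"{epub_path} is not a ZIP container")
    with zipfile.ZipFile(epub_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith((".html", ".xhtml", ".htm"))]
```

Each name returned is a file that would be opened and re-saved individually after unzipping the book.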
Attached Thumbnails: 2023-02-26_15-55-29.png (229.3 KB)
Old 02-26-2023, 10:48 AM   #28
DiapDealer
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by KIE18 View Post
Here is the original file of the book.
Moderator Notice
Please do not post copyrighted ebooks to MobileRead. There are scrambling plugins that can be used if the structure of entire copyrighted epubs needs to be shared.

Last edited by DiapDealer; 02-26-2023 at 10:50 AM.
Old 02-26-2023, 11:32 AM   #29
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
Just extract the book as a single HTML file, then convert it to CP1252, then open it as CP1251 and save as UTF-8. There are many editors that can do that, including Notepad++ and VSCode.
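The same two-step round trip can be written in Python. This is an editorial sketch, assuming the file is valid UTF-8 whose characters are the garbled ones (CP1251 bytes that were once misread as CP1252); the function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo CP1251 text that was misread as CP1252 and then saved as UTF-8."""
    # Step 1: "convert it to CP1252" gives back the original single bytes.
    raw = text.encode("cp1252")
    # Step 2: "open it as CP1251" interprets those bytes as Cyrillic.
    return raw.decode("cp1251")

# Simulate a UTF-8 file that contains the garbled characters.
utf8_file_bytes = "<p>Ãëàâà 1</p>".encode("utf-8")
fixed = repair_mojibake(utf8_file_bytes.decode("utf-8"))
fixed_bytes = fixed.encode("utf-8")  # step 3: "save as UTF-8"
print(fixed)
```

If the text contains characters that have no CP1252 byte (for example, Cyrillic letters that are already correct), step 1 raises `UnicodeEncodeError`, which is a useful sign that the file does not match this assumption.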
Old 02-26-2023, 01:24 PM   #30
User_Z
Connoisseur
Posts: 95
Karma: 10
Join Date: Sep 2019
Location: Ukraine
Device: Computer, iPad
Online converters do convert text fragments, but they report different source encodings.

If you then paste such a fragment into Sigil, the text is readable in both the Code View window and the Preview window, but in Preview it is displayed without styles.

I think this is because the conversion changes the character codes.
For example:
before conversion, the character "ñ" has the code 241, which corresponds to the Cyrillic letter "с" in CP1251;
after conversion, the character "с" has the code 1089 (U+0441), which is that letter's code point in Unicode.

It is only the direct conversion from CP1251 to UTF-8 that gives a deplorable result.
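The two character codes mentioned above can be verified in Python. This is an editorial sketch; the variable names are illustrative only.

```python
garbled_char = "ñ"                # what Code View shows
byte_value = ord(garbled_char)    # 241 (0xF1)

# Byte 241 read through the CP1251 table is the Cyrillic letter "с".
cyrillic_char = bytes([byte_value]).decode("cp1251")
code_point = ord(cyrillic_char)   # 1089 (U+0441)

# UTF-8 stores that code point as two bytes.
utf8_bytes = cyrillic_char.encode("utf-8")

print(byte_value, code_point, utf8_bytes.hex())
```

So the number 241 is a byte value in a single-byte code page, while 1089 is a Unicode code point that UTF-8 writes as the two bytes D1 81.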
Attached Thumbnails: decoder.png (35.4 KB), uni-decoder.png (359.4 KB), sig.png (83.5 KB), far.png (16.9 KB), deco2.png (35.5 KB)