02-03-2024, 07:16 AM | #1 |
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
|
Remove rhombus question marks from newly created epub
I imported a Project Gutenberg book in html format into calibre and converted it to an ebook (epub), but the result has rhombus question marks. Otherwise it came out pretty good. How do I remove these rhombus marks from the newly created epub? If I have to redo it that's okay, but I just want an epub free of these marks.
https://i.imgur.com/EafI07i.jpg |
02-03-2024, 11:06 AM | #2 |
Bibliolater
Posts: 4,896
Karma: 2600000
Join Date: Dec 2021
Location: England
Device: none
|
I think they may be unknown glyphs in the font you are using to read the epub. You could try embedding a font within your epub that contains the glyph. Chareink6 might do the job amongst others, and it’s available on MR.
https://www.mobileread.com/forums/sh...ight=Chareink6
Or maybe the .opf file in your epub is not encoded as UTF-8? "Modify epub" in Calibre will let you set this painlessly. The first line in mine is: <?xml version="1.0" encoding="utf-8"?>
Last edited by Martinoptic; 02-03-2024 at 11:39 AM. Reason: More information |
02-03-2024, 11:36 AM | #3 |
Well trained by Cats
Posts: 29,877
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I concur. Those are missing-glyph symbols.
Either your current font does not have those or the book used the wrong character set during conversion (from text) somewhere along the line. |
02-03-2024, 12:32 PM | #4 |
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
|
Update: I did discover a fix. Instead of just saving the html directly from Firefox's Save Page As, I used an FF extension I've been using for years called SingleFile (which saves the complete page). Once I imported that html/zip file into Calibre and then converted it to epub, it created a perfect epub ebook of the entire novel with no strange rhombus marks. Now if only I could figure out how to create OCR html files from those PDF ebooks I got off internet archive to convert into epubs...
|
02-03-2024, 12:47 PM | #5 |
Evangelist
Posts: 483
Karma: 2267928
Join Date: Nov 2015
Device: none
|
Probably the file is encoded in Windows-1252, but the declaration says it is UTF-8.
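A minimal sketch of that mismatch in Python, assuming the saved page really is Windows-1252 (`cp1252`) but gets read as UTF-8. The sample text and the helper name `reencode_to_utf8` are made up for illustration. Bytes such as the cp1252 em-dash (0x97) are not valid UTF-8 on their own, so a reader substitutes U+FFFD, which is the rhombus question mark.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes using their real encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: an em-dash and curly quotes stored as Windows-1252 bytes.
original = "Wait\u2014\u201cwhat?\u201d"
raw = original.encode("cp1252")

# Reading those bytes as UTF-8 turns each invalid byte into U+FFFD.
misread = raw.decode("utf-8", errors="replace")

# Decoding with the real encoding first recovers the text.
fixed = reencode_to_utf8(raw).decode("utf-8")

print(misread)
print(fixed)
```

If the source encoding is something other than `cp1252`, the same approach should apply with a different `source_encoding`, but guessing wrong will produce different garbage rather than an error in many cases.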
|
02-03-2024, 01:05 PM | #6 | |
Wizard
Posts: 1,127
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Quote:
I've been able to create epubs relatively quickly using that workflow. Maybe read the whole thread for context. |
|
02-03-2024, 03:27 PM | #7 | |
Resident Curmudgeon
Posts: 74,329
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
02-03-2024, 04:22 PM | #8 | |
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
As for the image, here you go, Jon: Last edited by DNSB; 02-03-2024 at 04:28 PM. |
|
02-03-2024, 04:33 PM | #9 |
the rook, bossing Never.
Posts: 11,299
Karma: 85874895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
I've been downloading from Gutenberg since before I used Calibre, maybe about 25 years. No issues. It must be the wrong character set or something. But I never use the HTML files. Started with plain text.
|
02-03-2024, 04:50 PM | #10 | |
Resident Curmudgeon
Posts: 74,329
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
What is the character that's not displaying? Is it supposed to be an em-dash or en-dash? If it is such a character, it's quite easy to load the epub into the calibre editor and perform a search/replace to fix this. Last edited by JSWolf; 02-03-2024 at 04:53 PM. |
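Outside the calibre editor, the same idea can be sketched in Python. This assumes the text already contains U+FFFD (the rhombus question mark) where the original character was lost; the function name `find_replacement_chars` and the sample sentence are hypothetical. Because U+FFFD can stand in for any lost character, it seems safer to look at the context of each hit before replacing.

```python
REPLACEMENT_CHAR = "\ufffd"  # the rhombus question mark


def find_replacement_chars(text: str, context: int = 15) -> list[tuple[int, str]]:
    """Return (offset, surrounding snippet) for every U+FFFD in text."""
    hits = []
    start = text.find(REPLACEMENT_CHAR)
    while start != -1:
        snippet = text[max(0, start - context): start + context + 1]
        hits.append((start, snippet))
        start = text.find(REPLACEMENT_CHAR, start + 1)
    return hits


# Hypothetical sample where an em-dash was lost during conversion.
sample = "He paused\ufffdthen spoke."
hits = find_replacement_chars(sample)

# Only replace once the context confirms which character was intended.
repaired = sample.replace(REPLACEMENT_CHAR, "\u2014")
```

A blanket replace like the last line is only reasonable when every hit turns out to be the same character; a book that mixes dashes and curly quotes would need the hits handled one group at a time.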
|
02-03-2024, 06:34 PM | #11 |
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
|
I appreciate additional posters here continuing to try to help, but did you guys miss my update above where I found a fix? Maybe it's not posting right and can't be seen for some reason. The problem was the way I was saving the ebook off the Gutenberg site. Once I used the SingleFile extension in my FF browser to save the page and imported that into calibre, it worked fine. I now have a perfectly formatted epub ebook with a working Table of Contents.
As for the OCR PDF to HTML thing I mentioned above, I think I had unrealistic expectations of what that would entail. I thought you could just run OCR software on a PDF ebook, with no need of a scanner, to get HTML that Calibre could convert into an epub, but I see now that it's much, much more involved and intricate than that. Guess I'll just have to stick with those massive 400MB PDF ebooks I got off IA and perhaps find a way to reduce the size. Last edited by Joseph The Grave; 02-03-2024 at 06:39 PM. |
02-03-2024, 07:16 PM | #12 |
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
The books from IA tend to be scanned images of a page with a very poorly OCRred text page to allow for searching. Unless you are heavily into masochism, I would avoid trying to do anything with them.
And again, since there is an epub version of the book on Gutenberg, why not simply download that instead of saving the web page version and converting to epub? This link should point to Before the Dawn (epub) which I find much easier than messing with conversions. |
02-03-2024, 07:24 PM | #13 | |
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
|
Quote:
I have problems finding stuff on PG with their search engine. If I type in "John Taine" his name won't even come up though I know he has books on there. I also have trouble finding the epub versions there but maybe I didn't look hard enough. It's otherwise been a valuable learning experience doing it the hard way with creating my own epubs so I don't regret it too much. Thanks for the help, everyone. Peace out. |
|
02-03-2024, 07:39 PM | #14 | |
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
|
|
02-05-2024, 03:16 PM | #15 |
Connoisseur
Posts: 92
Karma: 1133068
Join Date: Sep 2007
Device: ipaq
|
Canada went to life+70 back in 2022.
https://parl.ca/DocumentViewer/en/44...9/royal-assent --jmurphy |
|