Old 02-03-2024, 07:16 AM   #1
Joseph The Grave
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
Remove rhombus question marks from newly created epub

I created an ebook (epub) from a Project Gutenberg book that was in html format after importing it into calibre but the result has rhombus question marks. Otherwise it came out pretty good. How do I remove these rhombus marks from the newly created epub? If I have to redo it that's okay but I just want an epub free of these marks.

https://i.imgur.com/EafI07i.jpg
Old 02-03-2024, 11:06 AM   #2
Martinoptic
Bibliolater
Posts: 4,896
Karma: 2600000
Join Date: Dec 2021
Location: England
Device: none
I think they may be unknown glyphs in the font you are using to read the epub. You could try embedding a font within your epub that contains the glyph. Chareink6 might do the job amongst others, and it’s available on MR.

https://www.mobileread.com/forums/sh...ight=Chareink6

Or maybe the .opf file in your epub is not encoded as UTF-8?

"Modify epub" in Calibre will let you do this painlessly.

The first line in mine is: <?xml version="1.0" encoding="utf-8"?>
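For anyone who would rather check the declaration without opening an editor, here is a minimal sketch in Python. It assumes the epub is an ordinary, DRM-free zip archive; the function name `declared_encodings` is made up for illustration and is not part of calibre. It only reports what each file claims about itself, not whether the bytes actually match that claim.

```python
import re
import zipfile

# Matches the encoding named in an XML declaration such as
# <?xml version="1.0" encoding="utf-8"?>
XML_DECL = re.compile(rb"<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']")


def declared_encodings(epub_path):
    """Return {member name: declared encoding or None} for OPF and (X)HTML files.

    Hypothetical helper: it reads only the first 200 bytes of each member and
    reports the encoding named in the XML declaration, if there is one.
    """
    result = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".opf", ".xhtml", ".html", ".htm")):
                head = zf.read(name)[:200]
                match = XML_DECL.search(head)
                result[name] = (
                    match.group(1).decode("ascii").lower() if match else None
                )
    return result
```

Calling `declared_encodings("book.epub")` (the path is a placeholder) gives a dictionary you can scan for anything other than `utf-8`. A file with no XML declaration shows up as `None`.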

Last edited by Martinoptic; 02-03-2024 at 11:39 AM. Reason: More information
Old 02-03-2024, 11:36 AM   #3
theducks
Well trained by Cats
Posts: 29,877
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I concur. Those are missing Glyph symbols.

Either your current font does not have those
or
the book used the wrong character set during conversion (from text) somewhere along the line.
Old 02-03-2024, 12:32 PM   #4
Joseph The Grave
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
Update: I did discover a fix. Instead of just saving the html directly from Firefox's Save Page As, I used an FF extension I've been using for years called SingleFile (which saves the complete page). Once I imported that html/zip file into Calibre and then converted it to epub, it created a perfect epub ebook of the entire novel with no strange rhombus marks. Now if only I could figure out how to create OCR html files from those PDF ebooks I got off internet archive to convert into epubs...
Old 02-03-2024, 12:47 PM   #5
Sarmat89
Evangelist
Posts: 483
Karma: 2267928
Join Date: Nov 2015
Device: none
Probably the file is encoded in Windows-1252, but the declaration says it is UTF-8.
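A small Python sketch of how that kind of mismatch produces the diamond question mark. The sample string and the helper name `decode_as` are invented for illustration, and the em dash is only an assumed example of a character that Windows-1252 stores as a single byte.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, usually drawn as a question mark in a diamond


def decode_as(raw, encoding):
    """Decode bytes, substituting U+FFFD for any byte sequence that is invalid."""
    return raw.decode(encoding, errors="replace")


# An em dash (U+2014) stored as the single Windows-1252 byte 0x97.
raw = "before\u2014after".encode("cp1252")

wrong = decode_as(raw, "utf-8")   # trusts a false UTF-8 declaration
right = decode_as(raw, "cp1252")  # uses the encoding the bytes were written in

print(REPLACEMENT in wrong)           # True: the em dash became U+FFFD
print(right == "before\u2014after")   # True: the em dash survived
```

Once the text has been decoded the wrong way and saved, the original byte is gone, so the usual remedy is to go back to the source file and convert it again with the correct input encoding.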
Old 02-03-2024, 01:05 PM   #6
Karellen
Wizard
Posts: 1,127
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by Joseph The Grave
Now if only I could figure out how to create OCR html files from those PDF ebooks I got off internet archive to convert into epubs...
Maybe try this... https://www.mobileread.com/forums/sh...3&postcount=23

I've been able to create epubs relatively quickly using that workflow.
Maybe read the whole thread for context.
Old 02-03-2024, 03:27 PM   #7
JSWolf
Resident Curmudgeon
Posts: 74,329
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Joseph The Grave
I created an ebook (epub) from a Project Gutenberg book that was in html format after importing it into calibre but the result has rhombus question marks. Otherwise it came out pretty good. How do I remove these rhombus marks from the newly created epub? If I have to redo it that's okay but I just want an epub free of these marks.

https://i.imgur.com/EafI07i.jpg
Please attach your image so it can be seen. imgur is over capacity.
Old 02-03-2024, 04:22 PM   #8
DNSB
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by JSWolf
Please attach your image so it can be seen. imgur is over capacity.
@Joseph The Grave: What does the epub version of the book look like when you download it?

As for the image, here you go, Jon:
[Attached image: EafI07i.jpeg, 695.4 KB]

Last edited by DNSB; 02-03-2024 at 04:28 PM.
Old 02-03-2024, 04:33 PM   #9
Quoth
the rook, bossing Never.
Posts: 11,299
Karma: 85874895
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
I've been downloading from Gutenberg since before I used Calibre, maybe about 25 years. No issues. It must be the wrong character set or something. But I never use the HTML files; I started with plain text.
Old 02-03-2024, 04:50 PM   #10
JSWolf
Resident Curmudgeon
Posts: 74,329
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DNSB
@Joseph The Grave: What does the epub version of the book look like when you download it?

As for the image, here you go, Jon:
Thank you for the image.

What is the character that's not displaying? Is it supposed to be an em-dash or en-dash? If it is such a character, it's quite easy to load the book into the calibre editor and perform a search/replace to fix this.
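Before doing a search/replace it can help to see where the bad characters sit and what surrounds them. A rough Python sketch, assuming a DRM-free epub whose text files are meant to be UTF-8; `find_replacement_chars` is a made-up name, not a calibre function. If it finds nothing, the file itself is probably fine and the reader's font is the more likely culprit.

```python
import zipfile

REPLACEMENT = "\ufffd"  # U+FFFD, the diamond question mark


def find_replacement_chars(epub_path, context=15):
    """Map each HTML/XHTML member to the text snippets around every U+FFFD.

    Hypothetical helper. Bytes that are not valid UTF-8 are also turned into
    U+FFFD while decoding, so files holding raw Windows-1252 bytes are
    reported as well.
    """
    hits = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            snippets = []
            pos = text.find(REPLACEMENT)
            while pos != -1:
                # Keep a little text on both sides to show what was lost.
                snippets.append(text[max(0, pos - context):pos + context + 1])
                pos = text.find(REPLACEMENT, pos + 1)
            if snippets:
                hits[name] = snippets
    return hits
```

The snippets usually make it obvious whether the missing character was a dash, a curly quote, or an accented letter, which tells you what to put in the replace box.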

Last edited by JSWolf; 02-03-2024 at 04:53 PM.
Old 02-03-2024, 06:34 PM   #11
Joseph The Grave
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
I appreciate additional posters here continuing to try to help but did you guys miss my update above where I found a fix? Maybe it's not posting right and can't be seen for some reason. It was the way I was saving the ebook off the Gutenberg site that was the problem. Once I used my SingleFile save web page extension in my FF browser to save and import it into calibre, it worked fine. I now have a perfectly formatted epub ebook with working Table of Contents.

As for the OCR PDF to HTML thing I mentioned above, I think I had unrealistic expectations in what that would entail. I thought you could just use OCR software with no need of a scanner to convert a PDF ebook into HTML to create in Calibre an epub ebook, but I see now that it's much much more involved and intricate than that. Guess I'll just have to stick with those massive 400MB PDF ebooks I got off IA and perhaps find a way to reduce the size.

Last edited by Joseph The Grave; 02-03-2024 at 06:39 PM.
Old 02-03-2024, 07:16 PM   #12
DNSB
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
The books from IA tend to be scanned images of a page with a very poorly OCRred text page to allow for searching. Unless you are heavily into masochism, I would avoid trying to do anything with them.

And again, since there is an epub version of the book on Gutenberg, why not simply download that instead of saving the web page version and converting to epub? This link should point to Before the Dawn (epub) which I find much easier than messing with conversions.
Old 02-03-2024, 07:24 PM   #13
Joseph The Grave
Member
Posts: 21
Karma: 10
Join Date: Jul 2022
Device: PC
Quote:
Originally Posted by DNSB
The books from IA tend to be scanned images of a page with a very poorly OCRred text page to allow for searching. Unless you are heavily into masochism, I would avoid trying to do anything with them.

And again, since there is an epub version of the book on Gutenberg, why not simply download that instead of saving the web page version and converting to epub? This link should point to Before the Dawn (epub) which I find much easier than messing with conversions.
Yeah, I'm giving up on the OCR PDF to HTML thing. Too much tedium and pain. I would still like to find a way to substantially reduce the size of those PDFs without too much loss of quality, but when I attempt it the new file is barely smaller than the original, so it hardly seems worth it.

I have problems finding stuff on PG with their search engine. If I type in "John Taine" his name won't even come up though I know he has books on there. I also have trouble finding the epub versions there but maybe I didn't look hard enough. It's otherwise been a valuable learning experience doing it the hard way with creating my own epubs so I don't regret it too much. Thanks for the help, everyone. Peace out.
Old 02-03-2024, 07:39 PM   #14
DNSB
Bibliophagist
Posts: 35,917
Karma: 145678910
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Joseph The Grave
I have problems finding stuff on PG with their search engine. If I type in "John Taine" his name won't even come up though I know he has books on there. I also have trouble finding the epub versions there but maybe I didn't look hard enough. It's otherwise been a valuable learning experience doing it the hard way with creating my own epubs so I don't regret it too much. Thanks for the help, everyone. Peace out.
And I thought Gutenberg Canada's search engine was crud. OTOH, Canada is a life+50 country and since John Taine passed away in 1960, his works would still be under copyright in life+70 countries. Likely why his works show up on Gutenberg Canada and not on Gutenberg USA or Gutenberg Australia.
Old 02-05-2024, 03:16 PM   #15
jmurphy
Connoisseur
Posts: 92
Karma: 1133068
Join Date: Sep 2007
Device: ipaq
Quote:
Originally Posted by DNSB
OTOH, Canada is a life+50 country
Canada went to life+70 back in 2022.

https://parl.ca/DocumentViewer/en/44...9/royal-assent

--jmurphy