#1
Zealot
Posts: 122
Karma: 10
Join Date: Oct 2017
Device: iPhone
How do you get rid of all images in an ePub file downloaded from Archive.org?
When I download the ePub version of a book from Archive.org, I see not pure text but text mixed with images of the book pages. Is there a way to get just a pure text version? Or is there a way to delete all the images in an ePub file in Sigil?
#2
A Hairy Wizard
Posts: 3,329
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Normally Sigil and/or Calibre questions would be asked in their respective forums.
However, to delete images, simply highlight the image(s) on the left side of the screen (the Book Browser in Sigil) and hit the Delete key. You will probably also want to delete the code that references the images from your HTML file(s). That can be done with a regex search and replace:
Search: <img.*?/>
Replace: (leave blank)
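For anyone who would rather script the tag removal than use the search box, here is a minimal Python sketch of the same idea. The names `IMG_TAG` and `strip_img_tags` are made up for illustration, and the pattern is a slightly broader variant of the one above (an assumption on my part, not what Sigil runs): it also catches tags that are not self-closed or that span several lines. It only edits the markup text; it does not delete the image files or touch the ePub manifest, so you would still remove the images themselves in Sigil.

```python
import re

# Hypothetical helper, not part of Sigil or Calibre.
# Matches any <img ...> tag, self-closed or not. The negated class [^>]
# also matches newlines, so tags split across lines are caught too.
# Caveat: an attribute value containing a literal ">" would cut the match short.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_img_tags(xhtml: str) -> str:
    """Return the markup with every <img> tag removed."""
    return IMG_TAG.sub("", xhtml)


sample = (
    '<p>Page one<img src="../Images/p1.jpg" alt=""/></p>\n'
    '<p><img\n  src="p2.jpg" />Page two</p>'
)
cleaned = strip_img_tags(sample)
print(cleaned)
```

Run it on a copy of the file first: a regex cannot tell a scanned page image from a real illustration.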
#3
Still reading
Posts: 13,814
Karma: 103895653
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
You can also do what is suggested in the Calibre editor, not just in Sigil.
#4
Grand Sorcerer
Posts: 28,505
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I don't think I'd bother, myself. If you delete all those images of text, you'll probably be missing some content. My recommendation would be to delete the epub in question and find an alternative version.
#5
Resident Curmudgeon
Posts: 79,408
Karma: 145491800
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I agree that it's best to buy the eBook if a retail version exists. If not, go with the pBook version, or forget it and read something else.
#6
Still reading
Posts: 13,814
Karma: 103895653
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper
Or do your own OCR if it's really, really important public-domain (PD) content that isn't available as a cheap ebook.
#7
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
This allows you to see all the images in the EPUB plus little preview thumbnails (so you can tell whether an image is useless or actually important). You can then right-click each image and choose "Delete From Book".
#8
A Hairy Wizard
Posts: 3,329
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
You can also multi-select using Ctrl+click or Shift+click, then press the Delete key to delete all of them at once.
#9
Bibliophagist
Posts: 45,321
Karma: 168808723
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
I've only gotten books from archive.org a couple of times. In both cases, what was displayed was the scanned image with the text layer hidden. I suspected that this was an artifact of making the scanned PDF searchable, since the text files were fine lessons in how not to do OCR.
#10
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
"Archive.org ePub"
All of Archive.org's text formats are auto-generated OCR from the PDFs: no cleanup, no nothing. In Post #11 of that thread, I even uploaded an EPUB straight out of FineReader 12, and you can see how much cleaner (and more readable) it is compared to the auto-generated junk. This is why I always recommend getting the PDF from Archive.org, then converting it to text on your own if needed.