#1
Connoisseur
Posts: 99
Karma: 10
Join Date: Oct 2017
Device: iPhone
How do you get rid of all images in an ePub file downloaded from Archive.org?
When I download the ePub version of a book from Archive.org, I don't get pure text but text mixed with images of the book's pages. Is there a way to get a pure-text version? Or is there a way to delete all the images in an ePub file in Sigil?

#2
A Hairy Wizard
Posts: 2,361
Karma: 13688495
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2 & Air/Surface Pro/Kindle PW
Normally, Sigil and/or Calibre questions would be asked in their respective forums.
However, to delete images, simply highlight the image(s) on the left side of the screen (the Book Browser in Sigil) and hit the Delete key. You will probably also want to delete the code that references the images from your HTML file(s). That can be done with a regex find-and-replace:
search: <img.*?/>
replace: (leave blank)
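For anyone who would rather script the same cleanup outside Sigil, here is a minimal sketch in Python. It is not Sigil's own mechanism: the function names are made up for illustration, the pattern is a slightly broader variant of the regex above (it also catches tags that span several lines or lack the closing slash), and it assumes the book's HTML files are UTF-8. It only strips the `<img>` references; the image files and the manifest entries stay in the book, so you would still delete the images in Sigil afterwards.

```python
import re
import zipfile

# Matches <img ...> and <img ... />, including tags broken across lines.
# Assumption: no attribute value contains a literal ">" character.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Assumption: the book's text lives in files with these extensions.
HTML_EXTENSIONS = (".xhtml", ".html", ".htm")


def strip_img_tags(markup: str) -> str:
    """Return the markup with every <img> tag removed."""
    return IMG_TAG.sub("", markup)


def strip_images_from_epub(src, dst) -> None:
    """Copy an EPUB from src to dst, removing <img> tags from its HTML files.

    src and dst may be file paths or binary file-like objects.
    Entries are copied in their original order with their original
    compression setting, so a leading uncompressed "mimetype" entry
    stays that way.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith(HTML_EXTENSIONS):
                # Assumption: UTF-8 content; other encodings will raise here.
                data = strip_img_tags(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info, data)
```

Typical use would be `strip_images_from_epub("book.epub", "book-noimg.epub")`, with both file names as placeholders. Work on a copy, since an image of a page may be the only place some of the content exists.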

#3
the rook, bossing Never.
Posts: 4,019
Karma: 36004966
Join Date: Jun 2017
Location: Ireland
Device: Both Kinds: epub based makes and Kindle
Quote:
You can also do what is suggested in the Calibre editor as well as in Sigil.

#4
Grand Sorcerer
Posts: 24,711
Karma: 169429004
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I don't think I'd bother, myself. If you delete all those images of text, you'll probably be missing some content. My recommendation would be to delete the epub in question and find an alternative version.

#5
Resident Curmudgeon
Posts: 64,324
Karma: 104254653
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Aura H2O, PRS-650, PRS-T1, nook STR, iPad 4, iPhone SE 2020, PW3
I agree that it's best to buy the eBook if a retail version exists; if not, go with the pBook version, or forget it and read something else.

#6
the rook, bossing Never.
Posts: 4,019
Karma: 36004966
Join Date: Jun 2017
Location: Ireland
Device: Both Kinds: epub based makes and Kindle
Or do your own OCR if it's really, really important public-domain content that isn't available as a cheap ebook.

#7
Wizard
Posts: 1,983
Karma: 9092545
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
This allows you to see all the images in the EPUB plus little preview thumbnails (so you can tell whether an image is useless or actually important). You could then right-click each image and choose "Delete From Book".
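If you want a quick inventory before opening the editor, the sketch below lists the image entries in an EPUB with their sizes, largest first. It is only an illustration: the function name and the extension list are my own assumptions, and it shows no thumbnails, so it does not replace looking at the images. Very large files are often full-page scans, but that is a rule of thumb, not a guarantee.

```python
import zipfile

# Assumption: images are identified by file extension alone.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp")


def list_epub_images(epub):
    """Return (name, uncompressed size in bytes) for each image, largest first.

    epub may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        images = [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(IMAGE_EXTENSIONS)
        ]
    return sorted(images, key=lambda item: item[1], reverse=True)
```

For example, `for name, size in list_epub_images("book.epub"): print(size, name)` prints one line per image, where "book.epub" is a placeholder path.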

#8
A Hairy Wizard
Posts: 2,361
Karma: 13688495
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2 & Air/Surface Pro/Kindle PW
Quote:
You can also multi-select using Ctrl+click or Shift+click, then press the Delete key to remove all of them at once.

#9
Bibliophagist
Posts: 16,954
Karma: 82522897
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Kobo Forma, Kobo Clara HD, Lenovo M8 FHD, iPad Pro, Tolino
I've only gotten books from archive.org a couple of times. In both cases, what was displayed was the scanned image with the text layer hidden. I suspected that this was an artifact of making the scanned PDF searchable, since the text files were fine lessons in how not to do OCR.

#10
Wizard
Posts: 1,983
Karma: 9092545
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Quote:
"Archive.org ePub"
All of Archive.org's text formats are auto-generated OCR from the PDFs, with no cleanup at all. In Post #11, I even uploaded an EPUB straight out of FineReader 12... and you can see how much cleaner (and more readable) it is compared to the auto-generated junk. This is why I always recommend getting the PDF from Archive.org, then converting it to text on your own if needed.