How do you get rid of all images in an ePub file downloaded from Archive.org?
When I download the ePub version of a book from Archive.org, I get text mixed with images of the book's pages rather than pure text. Is there a way to get a pure-text version? Or is there a way to delete all the images in an ePub file in Sigil?
Normally, Sigil and/or Calibre questions would be asked in their respective forums.
However, to delete images, simply highlight the image(s) on the left side of the screen (the Book Browser in Sigil) and hit the Delete key. You will probably also want to delete the code that references each image from your HTML file(s). That can be done with a regex:
Search: `<img.*?/>`
Replace: (leave blank)
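To see what that search/replace does, here is a minimal sketch of the same pattern using Python's standard `re` module. The function name `strip_img_tags`, the sample markup, and the alternative pattern `<img\b[^>]*>` are hypothetical illustrations, not something from the thread. In Python, `.` does not match a newline by default, so the original pattern skips an `<img>` tag that is split across lines. It also expects a self-closing `/>`, so on a plain `<img ...>` it can run on to the next `/>` on the same line.

```python
import re

# Pattern from the post: lazily match from "<img" up to the first "/>".
# In Python, "." does not match "\n" by default, so a tag split across lines is skipped.
FORUM_PATTERN = re.compile(r"<img.*?/>")

# Hypothetical alternative (not from the thread): any <img ...> tag, self-closing or not.
# "[^>]" also matches newlines; assumes no ">" appears inside attribute values.
SAFER_PATTERN = re.compile(r"<img\b[^>]*>")


def strip_img_tags(xhtml: str, pattern: re.Pattern = FORUM_PATTERN) -> str:
    """Delete every match of `pattern`, like Replace All with an empty Replace field."""
    return pattern.sub("", xhtml)


sample = '<p>Page 12</p>\n<div><img alt="scan" src="../Images/page12.jpg"/></div>\n<p>Text</p>'
print(strip_img_tags(sample))
# <p>Page 12</p>
# <div></div>
# <p>Text</p>

split_tag = '<img\n  src="../Images/page13.jpg"/>'
print(strip_img_tags(split_tag) == split_tag)         # True: the forum pattern misses it
print(strip_img_tags(split_tag, SAFER_PATTERN) == "")  # True
```

Whichever pattern you use, it's worth spot-checking a few pages before hitting Replace All, since a regex can't tell a scanned page image from one that carries real content.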
Quote:
You can also do what is suggested above in the Calibre Editor as well as in Sigil.
I don't think I'd bother, myself. If you delete all those images of text, you'll probably be missing some content. My recommendation would be to delete the epub in question and find an alternative version.
Quote:
Or do your own OCR if it's really, really important public-domain (PD) content that isn't available as a cheap ebook.
Quote:
This allows you to see all the images in the EPUB, plus little preview thumbnails (so you can tell whether an image is useless or actually important). You could then right-click each image and choose "Delete From Book".
Quote:
You can also multi-select using Ctrl+click or Shift+click, then press the Del key to delete all of them at once.
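If you'd rather do the bulk deletion outside Sigil or Calibre, the same idea can be scripted. Below is a minimal sketch using only Python's standard `zipfile` and `re` modules; `strip_images_from_epub` and the extension list are hypothetical, not a feature of either program. It assumes UTF-8 text files and self-closing manifest `<item/>` entries with a double-quoted `media-type`. It does not clean up a cover `<meta>` entry or SVG `<image>` wrappers, so open the result in Sigil and validate it before reading.

```python
import re
import zipfile

# File extensions treated as images (an assumption; adjust for your book).
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp")

# Any <img ...> tag, self-closing or not (assumes no ">" inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>")

# A self-closing manifest <item .../> whose media-type is an image type, plus trailing whitespace.
IMAGE_ITEM = re.compile(r'<item\b[^>]*media-type="image/[^"]*"[^>]*/>\s*')


def strip_images_from_epub(src, dst):
    """Copy the EPUB at `src` to `dst` (paths or file objects), dropping image files and references."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # The EPUB container format requires "mimetype" to be the first entry, stored uncompressed.
        zout.writestr("mimetype", zin.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for info in zin.infolist():
            name = info.filename
            lower = name.lower()
            if info.is_dir() or name == "mimetype" or lower.endswith(IMAGE_EXTS):
                continue  # directory entry, already written, or an image file we are deleting
            data = zin.read(name)
            if lower.endswith((".xhtml", ".html", ".htm")):
                # Remove <img> tags from content documents.
                data = IMG_TAG.sub("", data.decode("utf-8")).encode("utf-8")
            elif lower.endswith(".opf"):
                # Remove image entries from the package manifest.
                data = IMAGE_ITEM.sub("", data.decode("utf-8")).encode("utf-8")
            zout.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

For example, `strip_images_from_epub("book.epub", "book-noimages.epub")` writes a new file and leaves the original untouched. As noted earlier in the thread, you'll probably lose content this way if the images are scans of text.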
I've only gotten books from archive.org a couple of times. In both cases, what was displayed was the scanned page image, with the text layer hidden. I suspected this was an artifact of making the scanned PDF searchable, since the text files were fine lessons in how not to do OCR.
Quote:
Quote:
"Archive.org ePub"
All of Archive.org's text formats are auto-generated OCR from the PDFs: no cleanup, no nothing. In Post #11, I even uploaded an EPUB straight out of FineReader 12, and you can see how much cleaner (and more readable) it is compared to the auto-generated junk. This is why I always recommend getting the PDF from Archive.org and then converting it to text on your own if needed.
MobileRead.com is a privately owned, operated and funded community.