Old 02-27-2023, 09:00 PM   #6
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by hdaackda
Issue is, I can only view 1 page at a time.

Is there any way I can save the entire book as PDF or ePub, etc., easily? (I tried inspect/view source but can't find any PDF, etc.; each page is shown individually after clicking next.)

If you wanted, you could even use something like AutoHotkey:

You'd be able to create a basic scraper which:
  • Downloads the image.
    • Or screenshots the page.
  • Saves + numbers the images sequentially.
  • Clicks through to the next page.

(Of course, this would still require you to sit there and press a few buttons.)
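
For anyone who would rather script it in Python than AutoHotkey, here is a rough sketch of that same loop. Everything in it is an assumption for illustration: the names `page_filename` and `capture_book` are made up, and the actual "grab the page" and "click next" steps are passed in as functions, since those depend entirely on the site and on whatever screenshot/click tool you use (for example, something like pyautogui's `screenshot()` and `click()`).

```python
import os
from typing import Callable, List


def page_filename(number: int, width: int = 4, ext: str = "png") -> str:
    """Build a zero-padded name so the files sort in page order."""
    return f"page_{number:0{width}d}.{ext}"


def capture_book(
    capture: Callable[[], bytes],    # hypothetical: returns the current page as image bytes
    click_next: Callable[[], None],  # hypothetical: advances the viewer by one page
    page_count: int,
    out_dir: str,
) -> List[str]:
    """Save each page under a sequential name, clicking "next" in between."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for number in range(1, page_count + 1):
        path = os.path.join(out_dir, page_filename(number))
        with open(path, "wb") as f:
            f.write(capture())
        saved.append(path)
        # No need to click past the last page.
        if number < page_count:
            click_next()
    return saved
```

The zero padding matters more than it looks: without it, `page_10` sorts before `page_2` when you bundle the images later.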

If you wanted to go full automation, then team up with a Computer Science student/programmer, and you could come up with something much smarter:
  • Detect page change
  • Check a fixed location to make sure the image has fully loaded.
    • Could be as simple as sampling one pixel, e.g. 1/4th down + 1/4th middle of the screen:
    • If color = background color X, the image has not loaded yet.
    • If color = white, the image has fully loaded.
  • Zoom in + Scrape highest resolution image possible.
    • Or pull/save image data right from the network.
  • Automatically number/split pages.
  • Automate the PDF bundling + OCR at the end.
  • [...]
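
The "is the image loaded yet?" check from the list above could look something like this in Python. This is only a sketch under a few assumptions: `probe_point` and `wait_until_loaded` are made-up names, the probe position (a quarter of the way down, horizontally centered) is just one reading of the idea, and `get_pixel` is a stand-in for whatever your screenshot tool provides to read the color at a screen coordinate.

```python
import time
from typing import Callable, Tuple

Color = Tuple[int, int, int]
WHITE: Color = (255, 255, 255)


def probe_point(screen_width: int, screen_height: int) -> Tuple[int, int]:
    """Pick one pixel to watch: a quarter of the way down, horizontally centered."""
    return (screen_width // 2, screen_height // 4)


def wait_until_loaded(
    get_pixel: Callable[[int, int], Color],  # hypothetical: reads the color at (x, y)
    point: Tuple[int, int],
    loaded_color: Color = WHITE,
    timeout: float = 10.0,
    interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll one pixel until it shows the "page is loaded" color.

    Returns True once the color matches, False if the timeout runs out first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_pixel(*point) == loaded_color:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A single pixel is a crude test (a page with a dark illustration at that spot would never "load"), so on a real site you would probably sample a few points or compare against the placeholder color instead.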

Each publisher's website is probably going to require a customized solution... but if it shows up on your screen, you can capture it.

I used something similar many years ago to grab a few images from HathiTrust, because they had old public domain scans that weren't available elsewhere but were hidden behind their annoying website.

Last edited by Tex2002ans; 02-27-2023 at 09:14 PM.