Hello,
There's a public wiki on the web, i.e. a bunch of web pages hyperlinked from index.html, that I'd like to turn into a PDF/EPUB file, with bookmarks as the cherry on top.
Does anyone know of a desktop solution, for either Windows or Linux?
Some combination of wget and pandoc/Calibre?
Thank you.
---
Edit: One way is to get the list of URLs from the source page, reorder them if needed, then run a second script that loops through that list, downloads each page, and appends it to a single HTML page before turning it into a PDF/EPUB file (the second script is sketched after the snippet below).
Code:
import requests
from bs4 import BeautifulSoup

# Fetch the wiki start page and list every hyperlink it contains
url = 'https://wiki.acme.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Print the href of each <a> tag (anchors without an href are skipped)
for link in soup.find_all('a'):
    href = link.get('href')
    if href:
        print(href)
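A minimal sketch of that second script, assuming the hrefs printed above have been collected into a list; the page_urls entries and the input.html file name below are just placeholders:
Code:
import requests
from bs4 import BeautifulSoup

# Placeholder list; in practice, fill it with the hrefs printed by the first script
page_urls = [
    'https://wiki.acme.com/page1.html',
    'https://wiki.acme.com/page2.html',
]

parts = []
for page_url in page_urls:
    response = requests.get(page_url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    body = soup.body or soup              # fall back to the whole document if there is no <body>
    parts.append(body.decode_contents())  # keep only the inner HTML of each page

# Concatenate every page into one HTML document, separated by horizontal rules
combined = ('<html><head><meta charset="utf-8"></head><body>'
            + '\n<hr/>\n'.join(parts)
            + '</body></html>')

with open('input.html', 'w', encoding='utf-8') as f:
    f.write(combined)

Keeping each page's original headings intact means the converter can later derive bookmarks/TOC entries from them (e.g. with Calibre's --level1-toc option).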
---
Edit: Since Calibre seems unable to download a web page by itself, wget is required… with some interim editing to 1) remove the useless stuff and 2) include the pictures (a sketch of that step follows the commands below).
Code:
wget -c -O input.html https://wiki.acme.com/somepage.html
"C:\Program Files\Calibre2\ebook-convert.exe" "input.html" "output.pdf" --enable-heuristics --authors "My author" --title "My title"