Python script to convert PDF to EPUB
Hello,
As an alternative to Calibre, which relies on Poppler to convert PDF to HTML, here's a Python script that I use to convert PDF articles into EPUB for my e-reader. It relies on the pymupdf module (to convert the PDF to XHTML) and the pypandoc module (to convert the XHTML to EPUB). It expects the filename 1) to be in the form "authors#title.pdf", which is used to fill in the EPUB's metadata, and 2) to be in the clipboard before the script is run.

Cheers,

Code:
import os
#to read filename from clipboard
import pyperclip
#pip install pymupdf
import pymupdf
#If pandoc is not already installed, pip install pypandoc-binary instead (it bundles pandoc)
#pip install pypandoc
import pypandoc
#======== grab input filename from clipboard
item = pyperclip.paste()
#expects author#title.pdf
if not item or ".pdf" not in item or "#" not in item:
    print("Expects authors#title.pdf in clipboard")
    exit()
else:
    print(f"Handling {item}")
#======== grab author(s) and title
INPUTFILE = item
x = [x.strip() for x in item.split('#')]
AUTHOR = x[0]
#ignore file extension
TITLE, _ = os.path.splitext(x[1])
TEMPFILE = f"{AUTHOR}#{TITLE}.xhtml"
EXTENSION = ".epub"
OUTPUTFILE = f"{AUTHOR}#{TITLE}{EXTENSION}"
#======== Open PDF file
pdf_document = pymupdf.open(INPUTFILE)
#======== Iterate through pages
html_content = ""
for page_num in range(len(pdf_document)):
    page = pdf_document.load_page(page_num)
    #https://pymupdf.readthedocs.io/en/latest/page.html#Page.get_text
    html_content += page.get_text("xhtml", flags=pymupdf.TEXTFLAGS_XHTML)
#======== turn XHTML into EPUB, including metadata
#if CLI pandoc already on disk
os.environ.setdefault('PYPANDOC_PANDOC', r'c:\pandoc.exe')
extra_args=['--epub-title-page=false','--metadata',f'author={AUTHOR}','--metadata',f'title={TITLE}']
output = pypandoc.convert_text(html_content, format='html',to='epub',outputfile=OUTPUTFILE, extra_args=extra_args)
#TEMPFILE is never written by this script; only remove it if a stale copy exists
if os.path.exists(TEMPFILE):
    os.remove(TEMPFILE)
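
The clipboard check in the script is fairly loose, so here is a possible stricter take on the "authors#title.pdf" convention. It is only a sketch: the helper names `parse_pdf_name` and `build_pandoc_args` are made up for illustration, and the handling of quoted full paths (as produced by Windows' "Copy as path") is an assumption about how the filename might end up in the clipboard. It uses the standard library only, so it can be tried without pymupdf or pypandoc installed.

```python
import os


def parse_pdf_name(clip_text):
    """Split 'authors#title.pdf' into (authors, title).

    Hypothetical helper: returns None when the text does not
    match the expected pattern.
    """
    if not clip_text:
        return None
    # Assumption: "Copy as path" may wrap the path in double quotes
    cleaned = clip_text.strip().strip('"')
    # Accept either slash style so a full Windows path parses on any platform
    name = cleaned.replace("\\", "/").rsplit("/", 1)[-1]
    stem, ext = os.path.splitext(name)
    if ext.lower() != ".pdf" or "#" not in stem:
        return None
    # Split on the first '#' only, so a '#' inside the title is preserved
    authors, title = (part.strip() for part in stem.split("#", 1))
    if not authors or not title:
        return None
    return authors, title


def build_pandoc_args(authors, title):
    """Return the same extra pandoc arguments as the script above."""
    return [
        "--epub-title-page=false",
        "--metadata", f"author={authors}",
        "--metadata", f"title={title}",
    ]


if __name__ == "__main__":
    print(parse_pdf_name(r'"C:\papers\Doe, Smith#A Study of Things.PDF"'))
    print(parse_pdf_name("notes.pdf"))
```

With something like this, the script could call `parse_pdf_name(pyperclip.paste())` once and bail out if it gets `None`, instead of testing for ".pdf" and "#" separately and then splitting again.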
Last edited by Shohreh; 12-09-2024 at 09:01 AM.