12-07-2024, 09:19 PM   #1
Shohreh
Python script to convert PDF to EPUB

Hello,

As an alternative to Calibre, which relies on Poppler to convert PDF to HTML, here's a Python script that I use to convert PDF articles to EPUB for reading on my e-reader.

It relies on the modules pymupdf (PDF to XHTML), pypandoc (XHTML to EPUB) and pyperclip (to read the clipboard). Before running the script, the filename must 1) be in the form "authors#title.pdf", which is used to fill in the EPUB's metadata, and 2) be copied to the clipboard.
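For illustration, here is a minimal sketch of how the "authors#title.pdf" convention can be parsed with the standard library only. The function name `parse_clipboard_name` is made up for this example, and stripping surrounding double quotes is an assumption about what the clipboard may contain; the script in this post does its own, simpler parsing.

```python
import os

def parse_clipboard_name(text):
    """Split 'authors#title.pdf' into (pdf_path, author, title).

    Raises ValueError when the text does not follow the convention.
    """
    # Clipboard text may carry stray whitespace or surrounding quotes (assumption)
    path = text.strip().strip('"')
    name = os.path.basename(path)
    stem, ext = os.path.splitext(name)
    if ext.lower() != ".pdf" or "#" not in stem:
        raise ValueError("Expects authors#title.pdf")
    # Split on the first '#' only, so a '#' inside the title survives
    author, title = (part.strip() for part in stem.split("#", 1))
    if not author or not title:
        raise ValueError("Both authors and title must be non-empty")
    return path, author, title

if __name__ == "__main__":
    print(parse_clipboard_name('"Doe, Smith#A Study of Things.pdf"'))
```

Raising ValueError instead of exiting keeps the check reusable; the caller decides whether to print a message and stop.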

Cheers,

Code:
import os
#to read filename from clipboard
import pyperclip
#pip install pymupdf
import pymupdf
#pypandoc needs a pandoc executable; if none is installed, use pypandoc-binary instead (it bundles pandoc)
#pip install pypandoc
import pypandoc

#======== grab input filename from clipboard
item = pyperclip.paste().strip()
#expects author#title.pdf
if not item.lower().endswith(".pdf") or "#" not in item:
	#stop with an error message and a non-zero exit code
	raise SystemExit("Expects authors#title.pdf in clipboard")
print(f"Handling {item}")

#======== grab author(s) and title
INPUTFILE = item
#split on the first '#' only, so a '#' inside the title is kept
parts = [part.strip() for part in item.split('#', 1)]
AUTHOR = parts[0]
#ignore file extension
TITLE, _ = os.path.splitext(parts[1])
TEMPFILE = f"{AUTHOR}#{TITLE}.xhtml"
EXTENSION = ".epub"
OUTPUTFILE = f"{AUTHOR}#{TITLE}{EXTENSION}"

#======== Open PDF file
pdf_document = pymupdf.open(INPUTFILE)
#======== Iterate through pages
html_content = ""
for page in pdf_document:
	#https://pymupdf.readthedocs.io/en/latest/page.html#Page.get_text
	html_content += page.get_text("xhtml", flags=pymupdf.TEXTFLAGS_XHTML)
pdf_document.close()

#======== turn XHTML into EPUB, including metadata
#if CLI pandoc already on disk
os.environ.setdefault('PYPANDOC_PANDOC', r'c:\pandoc.exe')
extra_args=['--epub-title-page=false','--metadata',f'author={AUTHOR}','--metadata',f'title={TITLE}']
output = pypandoc.convert_text(html_content, format='html',to='epub',outputfile=OUTPUTFILE, extra_args=extra_args)
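#the XHTML is passed to pandoc in memory, so no temp file is written; only remove a leftover one if it exists
if os.path.exists(TEMPFILE):
	os.remove(TEMPFILE)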
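
To check that the author and title actually ended up in the EPUB, something along these lines could be used. It is only a sketch using the standard library: `read_epub_metadata` is a name made up for this example, and it assumes the usual EPUB layout, where META-INF/container.xml points to the package (.opf) file that holds dc:title and dc:creator.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def read_epub_metadata(epub_path):
    """Return (title, [creators]) read from the EPUB's package document."""
    with zipfile.ZipFile(epub_path) as epub:
        # META-INF/container.xml names the package (.opf) file
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))
    # Title and creators are Dublin Core elements inside <metadata>
    title = opf.findtext(".//dc:title", default="", namespaces=DC_NS)
    creators = [el.text for el in opf.findall(".//dc:creator", DC_NS)]
    return title, creators

# Hypothetical usage, after the conversion has produced the file:
# title, creators = read_epub_metadata("authors#title.epub")
```

The values returned should match the AUTHOR and TITLE taken from the filename.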

Last edited by Shohreh; 12-09-2024 at 08:01 AM.