MobileRead Forums - View Single Post - Help with converting epub with greek characters to pdf

JollyRachele · 06-22-2025, 07:34 PM

I've been using calibre's awesome CLI in a script that generates PDF files from a few dictionaries I need (unfortunately, I have to use the PDF format in order to import them into Zotero, a popular cross-platform annotation tool).

Code:

p="$HOME/Downloads/tmp/"
# -p: no error if the directory already exists; stop if we cannot enter it
mkdir -p "$p"; cd "$p" || exit 1
#rm -rf *
list="
https://archive.org/details/anelementarylat01lewigoog/page/58/mode/2up?ref=ol&view=theater
https://archive.org/details/latindictionaryf00andr?ref=ol&view=theater
https://archive.org/details/intermediategree00lidd?ref=ol&view=theater
https://archive.org/details/homericdictionar00auteiala?ref=ol&view=theater
"


for i in $list; do
  # Pull the item identifier out of the /details/ URL, then fetch that item's epub
  echo "$i" | grep -o -P "https://[^)]*" | grep -o -P "/details/[^?\/]*" | sed -e 's/^\/details\///' | xargs -I {} wget "https://archive.org/download/{}/{}.epub" # | grep -o -P "\(https://.*?\)"
done

for i in ./*.epub; do #w x for i in "*.epub"; do
  # Quote the file names so paths with spaces survive word splitting
  ebook-convert "$i" "$i.pdf" #--txt-output-formatting plain #--txt-output-encoding utf-32 # --embed-all-fonts
  #pandoc "$i" -f epub -t pdf -s -o "$i.pdf"
done

You may try this out yourself if you wish. I think it only downloads 3 of the 4 epub files, but that's just because one of them is 'temporarily unavailable' on archive.org.
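
For anyone reading the download loop, here is a rough sketch of the same identifier extraction done with shell parameter expansion instead of the grep/sed pipeline. The function names archive_id and epub_url are made up for illustration and are not part of my script. It assumes every URL contains a /details/ segment, and that the epub really is published at download/<id>/<id>.epub, which is only what my script guesses.

```shell
# Hypothetical helper (not in the original script): print the archive.org
# item identifier from a /details/ URL. Assumes the URL contains "/details/".
archive_id() {
  local url="$1"
  local rest="${url#*/details/}"   # drop everything up to and including "/details/"
  rest="${rest%%\?*}"              # drop the query string, if there is one
  printf '%s\n' "${rest%%/*}"      # keep only the first path segment
}

# Hypothetical helper: build the epub download URL that the script assumes exists.
epub_url() {
  local id
  id="$(archive_id "$1")"
  printf 'https://archive.org/download/%s/%s.epub\n' "$id" "$id"
}
```

With these defined, the body of the download loop could be reduced to something like wget "$(epub_url "$i")".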

Unfortunately, the resulting PDF doesn't show proper Greek characters. Instead it seems to reproduce the UTF-8 encoding as it appears in the XHTML of the source epub!

It's strange, as calibre's GUI previews everything fine...
Any ideas?

Thank you so much in advance!
