I've been using calibre's awesome CLI to write the script below, which generates PDF files from a few dictionaries I need (unfortunately I have to use the PDF format to import them into Zotero, a popular cross-platform annotation tool).
Code:
p="$HOME/Downloads/tmp/"
mkdir -p "$p" && cd "$p" || exit 1
#rm -rf *
list="
https://archive.org/details/anelementarylat01lewigoog/page/58/mode/2up?ref=ol&view=theater
https://archive.org/details/latindictionaryf00andr?ref=ol&view=theater
https://archive.org/details/intermediategree00lidd?ref=ol&view=theater
https://archive.org/details/homericdictionar00auteiala?ref=ol&view=theater
"
for i in $list; do
    # Pull the item identifier out of the /details/ URL, then fetch the matching epub.
    echo "$i" | grep -o -P "/details/[^?/]*" | sed -e 's/^\/details\///' | xargs -I {} wget "https://archive.org/download/{}/{}.epub"
done
for i in ./*.epub; do # a quoted glob (for i in "*.epub") is not expanded, so it must stay unquoted
    ebook-convert "$i" "$i.pdf" #--txt-output-formatting plain #--txt-output-encoding utf-32 # --embed-all-fonts
    #pandoc "$i" -f epub -t pdf -s -o "$i.pdf"
done
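As an aside, the identifier extraction in the first loop can also be sketched with plain shell parameter expansion instead of the grep/sed pipeline. This is only an illustration: archive_id is a hypothetical helper name, not something from calibre or archive.org, and it assumes every URL contains "/details/" followed by the identifier.

```shell
# Hypothetical helper: print the archive.org identifier from a /details/ URL.
archive_id() {
    # Drop everything up to and including "/details/".
    set -- "${1#*/details/}"
    # Drop the first "/" or "?" and everything after it.
    printf '%s\n' "${1%%[/?]*}"
}

archive_id "https://archive.org/details/latindictionaryf00andr?ref=ol&view=theater"
```

The download line could then become something like wget "https://archive.org/download/$id/$id.epub" with id=$(archive_id "$i").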
You may try this out yourself if you wish. I think it only downloads 3 of the 4 epub files, but that's just because one of them is 'temporarily unavailable' on archive.org.
Unfortunately, the resulting PDFs don't show proper Greek characters. Instead, they seem to reproduce the raw UTF-8 encoding found in the XHTML of the source epub!
It's strange, as calibre's GUI previews everything fine...
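One check that might help narrow this down is whether the XHTML inside the epub declares its encoding at all. The sketch below is only a rough diagnostic under stated assumptions: declared_encoding is a hypothetical helper name, it only recognises an XML declaration that uses double quotes, and the commented usage line assumes unzip is installed and that the content files end in .html, which may not match how these epubs are laid out.

```shell
# Hypothetical helper: print the encoding named in the first XML declaration
# read from stdin, or nothing if no declaration is found.
declared_encoding() {
    sed -n 's/.*<?xml[^>]*encoding="\([^"]*\)".*/\1/p' | head -n 1
}

# Example with a literal declaration:
printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>' | declared_encoding

# Possible usage against a downloaded file (the file name pattern is an assumption):
# unzip -p latindictionaryf00andr.epub '*.html' | declared_encoding
```

If nothing is printed for the real files, a missing encoding declaration would be one candidate explanation to look into, though certainly not the only one.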
Any ideas?
Thank you so much in advance!