|
|
#1 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2025
Device: Calibre on Linux PC
|
Python exceptions when PDF to TXT conversion
When I tried to convert a PDF document to text, no output was created and I got this error:
Code:
$ ebook-convert V2510.pdf V2510_ebook.txt --enable-heuristics
Conversion options changed from defaults:
enable_heuristics: True
1% Converting input to HTML...
InputFormatPlugin: PDF Input running
on /home/OTHER/data/dos/diskd/UctoFH_doklady/faDosle.fh/t-mobile/Vyuctovani_55052935_2510.pdf
pdftohtml log:
Page-1
Page-2
Page-3
Traceback (most recent call last):
File "/usr/bin/ebook-convert", line 21, in <module>
sys.exit(main())
~~~~^^
File "/usr/lib64/calibre/calibre/ebooks/conversion/cli.py", line 429, in main
plumber.run()
~~~~~~~~~~~^^
File "/usr/lib64/calibre/calibre/ebooks/conversion/plumber.py", line 1089, in run
self.oeb = self.input_plugin(stream, self.opts,
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
self.input_fmt, self.log,
^^^^^^^^^^^^^^^^^^^^^^^^^
accelerators, tdir)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/calibre/calibre/customize/conversion.py", line 242, in __call__
ret = self.convert(stream, options, file_ext,
log, accelerators)
File "/usr/lib64/calibre/calibre/ebooks/conversion/plugins/pdf_input.py", line 66, in convert
PDFDocument(xml, self.opts, self.log)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/calibre/calibre/ebooks/pdf/reflow.py", line 1476, in __init__
self.find_header_footer()
~~~~~~~~~~~~~~~~~~~~~~~^^
File "/usr/lib64/calibre/calibre/ebooks/pdf/reflow.py", line 1877, in find_header_footer
if self.pages[head_page].texts \
~~~~~~~~~~^^^^^^^^^^^
IndexError: list index out of range
The result is the same without the --enable-heuristics option. Calibre was installed with these dependencies:
calibre-8.0.1-5.fc42.x86_64
optipng-7.9.1-1.fc42.x86_64
podofo-0.10.5-1.fc42.x86_64
python3-lxml-html-clean-0.4.2-1.fc42.noarch
python3-pyqt6-webengine-6.9.0-0.1.fc42.x86_64
python3-xxhash-3.6.0-1.fc42.x86_64
qt6-qtimageformats-6.9.3-1.fc42.x86_64
qt6-qtwebview-6.9.3-1.fc42.x86_64
libwebp-tools-1.5.0-2.fc42.x86_64
mathjax3-3.2.2-7.fc42.noarch
python3-apsw-3.47.2.0-2.fc42.x86_64
python3-css-parser-1.0.10-3.fc42.noarch
python3-html2text-2024.2.26-5.fc42.noarch
python3-html5-parser-0.4.12-5.fc42.x86_64
python3-mechanize-0.4.10-4.fc42.noarch
python3-pychm-0.8.6-16.fc42.x86_64
python3-regex-2024.11.6-1.fc42.x86_64
chmlib-0.40-45.fc42.x86_64
Does anyone know where the problem could be? (I'm quite new to Calibre/ebook-convert.)
Thanks, Franta Hanzlik |
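A note on reading the traceback: the failing expression is `self.pages[head_page]`, so `head_page` was outside the valid index range of `self.pages` (the pdftohtml log lists only three pages). Why that happens is not visible in the output. The sketch below is not calibre's code; `page_at` and the sample `pages` list are hypothetical names used only to illustrate the failure mode and what a bounds-checked lookup looks like.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def page_at(pages: Sequence[T], index: int) -> Optional[T]:
    """Return pages[index], or None when index is outside the document.

    Hypothetical helper for illustration; calibre's reflow.py is not shown here.
    """
    if 0 <= index < len(pages):
        return pages[index]
    return None


# A three-page document, as in the pdftohtml log above.
pages = ["Page-1", "Page-2", "Page-3"]

# Plain indexing past the last page raises the same exception type as the traceback.
try:
    pages[3]
except IndexError as exc:
    print(f"IndexError: {exc}")

# The guarded lookup returns None instead of raising.
print(page_at(pages, 3))
```

A guard like this only hides the symptom; whether a short document should skip header/footer detection altogether is a decision for the calibre code itself.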
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,690
Karma: 28549304
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You are using a very old and unsupported version of calibre. Uninstall it, and install the official binary from https://calibre-ebook.com/download_linux. If the error still occurs with that, then follow the instructions in: https://www.mobileread.com/forums/sh...d.php?t=186697
|
|
|
|
|
|
|
|
#3 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2025
Device: Calibre on Linux PC
|
Hello Kovid, thanks for your help!
In the meantime I think I solved my problem by using pdftotext. But I also installed the latest version of calibre, as you recommended. Unfortunately, the error still seems to occur (and no output is produced):
Code:
$ ebook-convert 2041.pdf 2041.txt
1% Converting input to HTML...
InputFormatPlugin: PDF Input running
on /home/OTHER/tmp/2041.pdf
pdftohtml log:
Page-1
Page-2
Traceback (most recent call last):
File "runpy.py", line 198, in _run_module_as_main
File "runpy.py", line 88, in _run_code
File "site.py", line 47, in <module>
File "site.py", line 43, in main
File "calibre/ebooks/conversion/cli.py", line 427, in main
File "calibre/ebooks/conversion/plumber.py", line 1088, in run
File "calibre/customize/conversion.py", line 242, in __call__
File "calibre/ebooks/conversion/plugins/pdf_input.py", line 66, in convert
File "calibre/ebooks/pdf/reflow.py", line 1474, in __init__
File "calibre/ebooks/pdf/reflow.py", line 1885, in find_header_footer
IndexError: list index out of range
If it would help, I can send you a PDF file in which the error appears via e-mail (it is about 108kB in size).
$ ebook-convert --version
ebook-convert (calibre 8.16.2)
Created by: Kovid Goyal <kovid@kovidgoyal.net> |
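For anyone landing here with the same traceback, the pdftotext workaround mentioned above can be scripted. This is a minimal sketch, assuming the `pdftotext` tool from poppler-utils is on the PATH; the helper names `build_pdftotext_cmd` and `pdf_to_text` are made up for illustration, and `-layout` is pdftotext's option for keeping the original physical layout.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_pdftotext_cmd(pdf_path: PathLike, txt_path: PathLike,
                        keep_layout: bool = False) -> List[str]:
    """Build the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path: PathLike, txt_path: PathLike,
                keep_layout: bool = False) -> Path:
    """Convert a PDF to plain text with pdftotext and return the output path."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH (it ships with poppler-utils)")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, keep_layout), check=True)
    return Path(txt_path)
```

Usage would be something like `pdf_to_text("2041.pdf", "2041.txt", keep_layout=True)`. This bypasses calibre entirely, so it is a workaround rather than a fix for the conversion error.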
|
|
|
|
|
#4 |
|
creator of calibre
Posts: 45,690
Karma: 28549304
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, I will need the PDF to be able to help you further.
|
|
|
|
|
|
#5 |
|
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2025
Device: Calibre on Linux PC
|
The problematic file 2041.pdf should now be uploaded.
Again, thank you for your effort! Fr. Hanzlik |
|
|
|
|
|
|
|
#6 |
|
creator of calibre
Posts: 45,690
Karma: 28549304
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|
|
|
|