#796 |
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: iPad
|
They are OCRed PDFs. It took over 2 hours but it finally worked - the progress bar didn't appear to move but I think it's all set. Thanks!
|
#797 |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
If they are OCR'ed PDF files it will be interesting to see if the converted file is any better than using the PDF directly.
|
#798 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I have never seen a big pdf with decent OCR text. If it has OCR text and images of pages, the OCR has never been proofed and is lousy. If the OCR has been proofed, the page images have always been either removed, or cut down to show only the true graphics.
|
#799 |
Memento Mori
Posts: 36
Karma: 10
Join Date: Apr 2007
Device: eClicto, iPad WiFi, Kindle 3 WiFi
|
Ok, I'll admit I haven't read through this topic, but... Can we have calibre output a correct <date> element in the content.opf in ePUB output? The specs expect it to be a 4-digit year, then an optional 2-digit month, and then an optional 2-digit day (AKA YYYY[-MM[-DD]]).
So, instead of something like:
Code:
<dc:date>2010-09-15T22:00:00+00:00</dc:date>
it would output:
Code:
<dc:date>2010-09-15</dc:date>
The other thing is that epubcheck doesn't like the <br> tag just before the </body> that calibre adds. Maybe this could be fixed as well? Putting it inside a <p> tag solves the problem. |
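The date shortening asked for above can be sketched in a few lines of Python. This is only an illustration, assuming the stored value is an ISO 8601 timestamp like the one in the example; `to_opf_date` is a hypothetical helper name, not a calibre function, and it simply drops the time and offset without converting time zones.

```python
from datetime import datetime


def to_opf_date(timestamp: str) -> str:
    """Reduce an ISO 8601 timestamp to its YYYY-MM-DD date part.

    Hypothetical helper for illustration only; not part of calibre.
    The time of day and UTC offset are discarded, not converted.
    """
    return datetime.fromisoformat(timestamp).date().isoformat()


print(to_opf_date("2010-09-15T22:00:00+00:00"))
```

Whether the specification actually requires the shorter form is disputed in the replies that follow, so treat this as a workaround for strict validators rather than a correction.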
#800 | |
The Grand Mouse 高貴的老鼠
Posts: 74,009
Karma: 315160596
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
I think that there's some disagreement about whether epubcheck is correct in this interpretation of the specification.
|
|
#801 |
creator of calibre
Posts: 45,380
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
epubcheck is wrong, and calibre doesn't add <br> tags. If you have a <br> tag in your output, it was there in your input.
|
#802 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I'm no expert, but I've seen Kovid quote the exact specs, and AFAICT, the above is wrong. Calibre is setting the date in accordance with the EPUB specs. Can you quote the specs that you think support your comment?
|
#803 | ||
Memento Mori
Posts: 36
Karma: 10
Join Date: Apr 2007
Device: eClicto, iPad WiFi, Kindle 3 WiFi
|
As for the date: http://www.idpf.org/doc_library/epub...m#Section2.2.7
|
||
#804 |
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: iPad
|
So I can read the epubs on my iPad but the OCR is gone as well as my bookmarks. Any simple way to make sure these are preserved from the PDF?
|
#805 |
Junior Member
Posts: 2
Karma: 10
Join Date: Sep 2010
Device: none
|
Thanks for the excellent software, Goyal.
I am using Calibre to convert .MOBI books to .EPUB in order to extract the internal HTML, as there is no direct option for HTML output. The problem is that I am trying to avoid splitting, as I would like to get a single HTML file. I checked "do not split on page breaks" under Preferences>Output Options, and even set "Split files larger than" to 20480KB to avoid splitting, with no success. What would you recommend?
Thanks, Eduardo |
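Even when the EPUB does end up split, the pieces can be read back in reading order, because an EPUB is a ZIP archive whose OPF file lists the content documents in its spine. The following is a rough sketch under stated assumptions, not a calibre feature: `spine_documents` is a hypothetical name, the content files are assumed to be UTF-8, and manifest hrefs are assumed to be plain relative paths without URL escaping.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the OPF package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_path):
    """Return (archive name, text) pairs for the spine documents, in reading order.

    Hypothetical helper for illustration; assumes UTF-8 content files and
    unescaped relative hrefs in the manifest.
    """
    with zipfile.ZipFile(epub_path) as zf:
        # META-INF/container.xml points at the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        # Map manifest ids to hrefs, then follow the spine order.
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        base = posixpath.dirname(opf_path)
        docs = []
        for ref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            name = posixpath.join(base, manifest[ref.get("idref")])
            docs.append((name, zf.read(name).decode("utf-8")))
        return docs
```

Joining the returned texts into one valid HTML file would still require merging the `<body>` contents, which this sketch leaves out.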
#806 |
creator of calibre
Posts: 45,380
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#807 |
Junior Member
Posts: 2
Karma: 10
Join Date: Sep 2010
Device: none
|
Goyal, you are my digital hero.
Thanks! Eduardo |
#808 |
Resident Curmudgeon
Posts: 79,796
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But the problem is that even if ePubCheck is wrong in this case, if ePubCheck spits out an error, some publishers won't accept the ePub because they think it is incorrect.
|
#809 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2009
Device: Sony
|
I'm having a problem converting my pdf files to epub. Each time I try it fails. I'm not a programmer so I don't understand the code that is returned. I have version 7.2
I apologize if I have posted this in the wrong thread.
ERROR: Conversion Error: <b>Failed</b>: Convert book 1 of 1 (Dalton's Awakening)
Convert book 1 of 1 (Dalton's Awakening)
Resolved conversion options
calibre version: 0.7.20
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'change_justification': u'original', 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']", 'chapter_mark': u'pagebreak', 'comments': None, 'cover': 'c:\\users\\lori\\appdata\\local\\temp\\calibre_0.7.20_tmp_n3o853\\calibre_0.7.20_dkzjq_.jpeg', 'debug_pipeline': None, 'disable_font_rescaling': False, 'dont_split_on_page_breaks': False, 'extra_css': None, 'extract_to': None, 'flow_size': 260, 'font_size_mapping': None, 'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' , 'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s* <a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)' , 'html_unwrap_factor': 0.40000000000000002, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x046D85F0>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0.0, 'linearize_tables': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'max_toc_links': 50, 'new_pdf_engine': False, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_images': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x046D8990>, 'page_breaks_before': u"//*[name()='h1' or name()='h2']", 'prefer_metadata_cover': False, 'preprocess_html': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': 'c:\\users\\lori\\appdata\\local\\temp\\calibre_0.7.20_tmp_n3o853\\calibre_0.7.20_lby4pu.opf', 'remove_first_image': False, 'remove_footer': False, 'remove_header': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'unwrap_factor': 0.0, 'use_auto_toc': False, 'verbose': 2}
InputFormatPlugin: PDF Input running on C:\Users\Lori\Calibre\Carol Lynne\Dalton's Awakening (2724)\Dalton's Awakening - Carol Lynne.pdf
Converting file to html...
pdftohtml log:
Retrieving document metadata...
Error (42): Unknown filter 'Crypt'
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Failed to parse content in index.html
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\reader.py", line 159, in _manifest_prune_invalid
  File "site-packages\calibre\ebooks\oeb\base.py", line 1060, in fget
  File "site-packages\calibre\ebooks\oeb\base.py", line 789, in _parse_xhtml
  File "site-packages\calibre\ebooks\conversion\preprocess.py", line 431, in __call__
UnboundLocalError: local variable 'length' referenced before assignment
Spine item 'id1' not found
Python function terminated unexpectedly
Spine is empty (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 85, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 841, in run
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 968, in create_oebbook
  File "site-packages\calibre\ebooks\oeb\reader.py", line 72, in __call__
  File "site-packages\calibre\ebooks\oeb\reader.py", line 594, in _all_from_opf
  File "site-packages\calibre\ebooks\oeb\reader.py", line 289, in _spine_from_opf
calibre.ebooks.oeb.base.OEBError: Spine is empty |
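For readers who do not program: the `UnboundLocalError` in the log means that a function tried to read a variable on a code path where it had never been given a value. The toy example below reproduces that class of error. It is not calibre's `preprocess.py` code, and `last_line_length` is a made-up function for illustration only.

```python
def last_line_length(lines):
    """Return the length of the last line (toy example, deliberately buggy)."""
    for line in lines:
        length = len(line)  # assigned only if the loop body runs at least once
    return length  # fails when `lines` is empty: `length` was never assigned


try:
    last_line_length([])
except UnboundLocalError as err:
    # The exact wording of the message varies between Python versions.
    print(type(err).__name__)
```

An error of this kind is a bug in the program rather than something wrong in the user's settings, so the usual remedy is a different release of the software.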
#810 |
Guru
Posts: 657
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
|
There's a problem with 7.20; either go back to 7.19 or wait for the updated 7.21.
|
Thread | Thread Starter | Forum | Replies | Last Post |
[Old Thread] Epub Output: Line Height | greenapple | Conversion | 20 | 01-27-2013 09:27 AM |
EPUB output justification | toki08 | Calibre | 10 | 01-08-2011 04:14 PM |
Calibre epub output details and Nook | squidward | Calibre | 6 | 11-24-2010 03:21 PM |
epub output metadata | troymc | Calibre | 5 | 05-22-2010 12:23 AM |
Problem with epub output in Cybook Gen3 | fjf | Calibre | 3 | 02-03-2010 02:23 AM |