EPUB output - Page 54

arijon · 09-16-2010, 01:53 PM

They are OCRed PDFs. It took over 2 hours but it finally worked - the progress bar didn't appear to move but I think it's all set. Thanks!

itimpi · 09-16-2010, 02:33 PM

If they are OCR'ed PDF files it will be interesting to see if the converted file is any better than using the PDF directly.

Starson17 · 09-16-2010, 02:41 PM

Quote:

Originally Posted by itimpi

If they are OCR'ed PDF files it will be interesting to see if the converted file is any better than using the PDF directly.

I have never seen a big pdf with decent OCR text. If it has OCR text and images of pages, the OCR has never been proofed and is lousy. If the OCR has been proofed, the page images have always been either removed, or cut down to show only the true graphics.

moriakaice · 09-17-2010, 12:48 PM

Ok, I'll admit I haven't read through this topic, but... Can we have calibre to output a correct <date> element in the content.opf in ePUB output? The specs expect it to be 4-digit year, then optional 2-digit month and then optional 2-digit day (AKA YYYY[-MM[-DD]]).

So, instead of something like:

Code:

<dc:date>2010-09-15T22:00:00+00:00</dc:date>

Can we get:

Code:

<dc:date>2010-09-15</dc:date>

It should be even simpler and make calibre ePUBs conform to the specs (at least some part of it).

The other thing is that epubcheck doesn't like the <br> tag just before the </body> that calibre adds. Maybe this could be fixed as well? Putting it in the <p> tag solves the problem.
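For anyone who wants to apply that workaround by hand, here is a minimal sketch of it in Python. It assumes the chapter file is well-formed XHTML in the usual XHTML namespace; `wrap_bare_br` is a hypothetical helper name, not part of calibre or epubcheck, and it only touches line-break elements that sit directly under the body.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def wrap_bare_br(xhtml: str) -> str:
    """Wrap every <br> that is a direct child of <body> in its own <p>."""
    # Serialize with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        return xhtml  # nothing to fix (assumption: no body means no change)
    # Iterate over a copy, because the body is modified inside the loop.
    for index, child in enumerate(list(body)):
        if child.tag == f"{{{XHTML_NS}}}br":
            paragraph = ET.Element(f"{{{XHTML_NS}}}p")
            # Keep any trailing whitespace after the original <br>.
            paragraph.tail = child.tail
            child.tail = None
            body.remove(child)
            paragraph.append(child)
            body.insert(index, paragraph)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Last paragraph.</p><br/></body></html>"
)
fixed = wrap_bare_br(sample)
```

Whether epubcheck is right to complain is a separate question; this only shows what "putting it in the paragraph tag" amounts to.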

pdurrant · 09-17-2010, 12:52 PM

I think that there's some disagreement about whether epubcheck is correct in this interpretation of the specification.

Quote:

Originally Posted by moriakaice

Ok, I'll admit I haven't read through this topic, but... Can we have calibre to output a correct <date> element in the content.opf in ePUB output? The specs expect it to be 4-digit year, then optional 2-digit month and then optional 2-digit day (AKA YYYY[-MM[-DD]]).

kovidgoyal · 09-17-2010, 01:00 PM

epubcheck is wrong and calibre doesn't add <br> tags, if you have a br tag in your output, it was there in your input

Starson17 · 09-17-2010, 01:02 PM

Quote:

Originally Posted by moriakaice

Can we have calibre to output a correct <date> element in the content.opf in ePUB output? The specs expect it to be 4-digit year, then optional 2-digit month and then optional 2-digit day (AKA YYYY[-MM[-DD]]).

I'm no expert, but I've seen Kovid quote the exact specs, and AFAICT, the above is wrong. Calibre is setting the date in accordance with the EPUB specs. Can you quote the specs that you think support your comment?

moriakaice · 09-17-2010, 01:15 PM

Quote:

Originally Posted by kovidgoyal

epubcheck is wrong and calibre doesn't add <br> tags, if you have a br tag in your output, it was there in your input

So, calibre adds <br> tags when converting from RTF (as that was the source of my ePUB file)? That's strange!

As for the date:
http://www.idpf.org/doc_library/epub...m#Section2.2.7

Quote:

2.2.7: <date> </date>

Date of publication, in the format defined by "Date and Time Formats" at http://www.w3.org/TR/NOTE-datetime and by ISO 8601 on which it is based. In particular, dates without times are represented in the form YYYY[-MM[-DD]]: a required 4-digit year, an optional 2-digit month, and if the month is given, an optional 2-digit day of month.

The date element has one optional OPF event attribute. The set of values for event are not defined by this specification; possible values may include: creation, publication, and modification.

EDIT: Oh, I see my mistake about it supporting the full datetime. Sorry.
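To make the point of the quoted section concrete: the W3C "Date and Time Formats" note allows both the date-only forms and the full date-plus-time forms, so both of the values discussed above have a valid shape. A rough sketch of a shape check follows; the name `is_w3c_datetime` is made up for illustration, and the pattern only checks the layout of the string, not whether the month, day, or hour values are in range.

```python
import re

# Layouts from the W3C NOTE-datetime profile of ISO 8601:
#   YYYY, YYYY-MM, YYYY-MM-DD,
#   YYYY-MM-DDThh:mmTZD, YYYY-MM-DDThh:mm:ssTZD, YYYY-MM-DDThh:mm:ss.sTZD
# where TZD is "Z" or an offset such as +00:00.
W3C_DATETIME = re.compile(
    r"\d{4}"                      # required 4-digit year
    r"(?:-\d{2}"                  # optional 2-digit month
    r"(?:-\d{2}"                  # optional 2-digit day
    r"(?:T\d{2}:\d{2}"            # optional time: hours and minutes
    r"(?::\d{2}(?:\.\d+)?)?"      # optional seconds and fraction
    r"(?:Z|[+-]\d{2}:\d{2})"      # time zone designator, required with a time
    r")?)?)?"
)


def is_w3c_datetime(value: str) -> bool:
    """Return True if value has one of the NOTE-datetime layouts."""
    return W3C_DATETIME.fullmatch(value) is not None


date_only_ok = is_w3c_datetime("2010-09-15")
full_datetime_ok = is_w3c_datetime("2010-09-15T22:00:00+00:00")
```

Both variables end up true, which matches the correction in the edit above: the long form is a permitted date-with-time value, and the short form is the date-without-time case.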

arijon · 09-17-2010, 02:21 PM

So I can read the epubs on my iPad but the OCR is gone as well as my bookmarks. Any simple way to make sure these are preserved from the PDF?

meketrefi · 09-19-2010, 03:33 PM

Thanks for the excellent software, Goyal.

I am using Calibre to convert .MOBI books to .EPUB in order to extract the internal HTML, as there is no direct option for HTML as output.

The problem is, I am trying to avoid splitting, as I would like to get a single HTML file. I checked "do not split on page breaks" on Preferences>Output Options, and even configured "Split files larger than" to 20480KB to avoid splitting, with no success.

What would you recommend?

Thanks,

Eduardo

kovidgoyal · 09-19-2010, 03:42 PM

Read http://calibre-ebook.com/user_manual...l#introduction

meketrefi · 09-19-2010, 04:00 PM

Goyal, you are my digital hero.

Thanks!

Eduardo

JSWolf · 09-19-2010, 05:31 PM

But the problem is that even if ePubCheck is wrong in this case, if ePubCheck spits out an error, some publishers won't accept the ePub as they think it is incorrect.

Metamorphosis · 09-30-2010, 02:13 PM

I'm having a problem converting my pdf files to epub. Each time I try it fails. I'm not a programmer so I don't understand the code that is returned. I have version 7.2

I apologize if I have posted this in the wrong thread.

ERROR: Conversion Error: Failed: Convert book 1 of 1 (Dalton's Awakening)

Convert book 1 of 1 (Dalton's Awakening)
Resolved conversion options
calibre version: 0.7.20
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\\s+', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': 'c:\\users\\lori\\appdata\\local\\temp\\calibre_0.7.20_tmp_n3o853\\calibre_0.7.20_dkzjq_.jpeg',
'debug_pipeline': None,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'extra_css': None,
'extract_to': None,
'flow_size': 260,
'font_size_mapping': None,
'footer_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'header_regex': u'(?i)(?<=<hr>)((\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?\\d+<br>\\s*.*?\\s*)|(\\s*<a name=\\d+></a>((<img.+?>)*<br>\\s*)?.*?<br>\\s*\\d+))(?=<br>)',
'html_unwrap_factor': 0.40000000000000002,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x046D85F0>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'max_toc_links': 50,
'new_pdf_engine': False,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_images': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x046D8990>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preprocess_html': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'c:\\users\\lori\\appdata\\local\\temp\\calibre_0.7.20_tmp_n3o853\\calibre_0.7.20_lby4pu.opf',
'remove_first_image': False,
'remove_footer': False,
'remove_header': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unwrap_factor': 0.0,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on C:\Users\Lori\Calibre\Carol Lynne\Dalton's Awakening (2724)\Dalton's Awakening - Carol Lynne.pdf
Converting file to html...
pdftohtml log:

Retrieving document metadata...
Error (42): Unknown filter 'Crypt'
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Failed to parse content in index.html
Traceback (most recent call last):
File "site-packages\calibre\ebooks\oeb\reader.py", line 159, in _manifest_prune_invalid
File "site-packages\calibre\ebooks\oeb\base.py", line 1060, in fget
File "site-packages\calibre\ebooks\oeb\base.py", line 789, in _parse_xhtml
File "site-packages\calibre\ebooks\conversion\preprocess.py", line 431, in __call__
UnboundLocalError: local variable 'length' referenced before assignment

Spine item 'id1' not found
Python function terminated unexpectedly
Spine is empty (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 99, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 24, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 841, in run
File "site-packages\calibre\ebooks\conversion\plumber.py", line 968, in create_oebbook
File "site-packages\calibre\ebooks\oeb\reader.py", line 72, in __call__
File "site-packages\calibre\ebooks\oeb\reader.py", line 594, in _all_from_opf
File "site-packages\calibre\ebooks\oeb\reader.py", line 289, in _spine_from_opf
calibre.ebooks.oeb.base.OEBError: Spine is empty
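For readers who are not programmers, the key line in the log above is the UnboundLocalError: some code tried to use a variable named `length` on a path where nothing had been stored in it yet. The sketch below reproduces that class of error in a few lines. It is not calibre's code from preprocess.py; `measure` and `measure_safely` are hypothetical names used only to illustrate the failure and the usual fix.

```python
def measure(lines):
    """Return the length of the last line; fails when lines is empty."""
    for line in lines:
        length = len(line)  # only assigned if the loop body runs
    # With an empty list the loop never runs, so `length` was never
    # assigned and Python raises UnboundLocalError on the next line.
    return length


def measure_safely(lines):
    """Same idea, but with a default value assigned up front."""
    length = 0
    for line in lines:
        length = len(line)
    return length


try:
    measure([])
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

safe_result = measure_safely([])
```

An error like this is a bug in the program rather than something wrong with the user's settings, so the practical remedy is a different version of the software.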

Perkin · 09-30-2010, 02:17 PM

There's a problem with 7.20, either go back to 7.19 or wait for updated 7.21
