#1
Enthusiast
Posts: 30
Karma: 10
Join Date: Jan 2020
Device: none
Converting FB2->EPUB, TOC is only half the size
I have a huge book in FB2: 49,521 KB, 2262 chapters.
I tried to read it as-is in the viewer, and the TOC only goes up to chapter 1043. So I converted the book to EPUB with the built-in converter. Looking at the contents of the EPUB, I see all chapters correctly converted (several chapters per `index_split_N.xhtml`, with the last file being `index_split_1205.xhtml`), and inside that last file I see the final chapter right up to "Fin". So far so good... But the problem is with `toc.ncx`: it stops in the middle. Its last entry is: Code:
<navPoint id="uYHYxzqrSIyP8DRv35YFT8D" playOrder="1050">
<navLabel>
<text>Chapter 1043. 12 people</text>
</navLabel>
<content src="index_split_1049.xhtml"/>
</navPoint>
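A quick way to confirm how many entries actually made it into `toc.ncx`, and which one comes last, is to read the NCX straight out of the EPUB (an EPUB is a ZIP archive). This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the NCX uses the standard `http://www.daisy.org/z3986/2005/ncx/` namespace and that the first `.ncx` file in the archive is the one to check.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by EPUB 2 toc.ncx files (NCX 2005-1).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def summarize_ncx(ncx_bytes):
    """Return (number of navPoints, label of the last navPoint in document order)."""
    root = ET.fromstring(ncx_bytes)
    points = root.findall(".//ncx:navPoint", NCX_NS)
    if not points:
        return 0, None
    label = points[-1].findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
    return len(points), label.strip()


def summarize_epub(path):
    """Open an EPUB (a ZIP archive) and summarize the first .ncx file inside it."""
    with zipfile.ZipFile(path) as zf:
        ncx_names = [n for n in zf.namelist() if n.lower().endswith(".ncx")]
        if not ncx_names:
            raise ValueError(f"no .ncx file found in {path}")
        return summarize_ncx(zf.read(ncx_names[0]))


if __name__ == "__main__" and len(sys.argv) > 1:
    count, last = summarize_epub(sys.argv[1])
    print(f"{count} navPoints; last entry: {last!r}")
```

Run it with the EPUB path as the only argument. If the NCX matches the conversion log, it should report 1050 navPoints.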
#2
creator of calibre
Posts: 46,253
Karma: 29630732
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
#3
Enthusiast
I'm not sure I can send you the original file; it is almost 50 MB.
But here is what I see in the conversion log: Code:
Convert book 1 of 1 (Martial World)
Conversion options changed from defaults:
read_metadata_from_opf: 'C:\\Users\\George\\AppData\\Local\\Temp\\calibre-gd7a55ez\\ayrh675q.opf'
verbose: 2
cover: 'C:\\Users\\George\\AppData\\Local\\Temp\\calibre-gd7a55ez\\4_qwrr4c.jpeg'
output_profile: 'kindle_pw3'
Resolved conversion options
calibre version: 9.8.0
{'add_alt_text_to_img': False,
'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': 'original',
'chapter': "//*[((name()='h1' or name()='h2') and re:test(., "
"'\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', "
"'i')) or @class = 'chapter']",
'chapter_mark': 'pagebreak',
'comments': None,
'cover': 'C:\\Users\\George\\AppData\\Local\\Temp\\calibre-gd7a55ez\\4_qwrr4c.jpeg',
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'embed_all_fonts': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'epub_inline_toc': False,
'epub_max_image_size': 'none',
'epub_toc_at_end': False,
'epub_version': '2',
'expand_css': False,
'extra_css': None,
'extract_to': None,
'filter_css': '',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000020BAD5B9A90>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': '//h:h1',
'level2_toc': '//h:h2',
'level3_toc': '//h:h3',
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_fb2_toc': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.KindlePaperWhite3Output object at 0x0000020BAD5C4590>,
'page_breaks_before': "//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': 'C:\\Users\\George\\AppData\\Local\\Temp\\calibre-gd7a55ez\\ayrh675q.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': '[]',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'transform_css_rules': '[]',
'transform_html_rules': '[]',
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: FB2 Input running
on C:\Users\George\AppData\Local\Temp\calibre-gd7a55ez\vsq8fb_v.fb2
Parsing XML...
Converting XML to HTML...
Parsing all content...
Parsing index.xhtml ...
Forcing index.xhtml into XHTML namespace
Parsing inline-styles.css ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 1050 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2 items of level: div_1
Found 94103 items of level: div_2
Found 2 items of level: div_3
Found 65783 items of level: p_1
Ignoring level div_3
div_1 left margin stats: Counter({'': 1})
div_1 right margin stats: Counter({'': 1})
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
p_1 left margin stats: Counter({'0': 65783})
p_1 right margin stats: Counter({'0': 65783})
Cleaning up manifest...
Trimming unused files from manifest...
Trimming 'poster.jpg.jpg' from manifest
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
Splitting on page-break at id=calibre_toc_1
Adjusted split point to ancestor
Splitting on page-break at id=calibre_toc_2
....
Splitting on page-break at id=calibre_toc_1049
Splitting on page-break at id=calibre_toc_1050
Looking for large trees in index.xhtml...
Found large tree #1049
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*[54]/*[55851]
Split tree still too large: 16114.1943359375 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*[54]/*[30555]
Split tree still too large: 8571.6767578125 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*[54]/*[16166]
Split tree still too large: 4346.8828125 KB
.....
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*/*[722]
Committed sub-tree #154 (211.3525390625 KB)
Committed sub-tree #155 (207.0234375 KB)
Split tree still too large: 339.84765625 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*/*[555]
Committed sub-tree #156 (171.1171875 KB)
Committed sub-tree #157 (169.1904296875 KB)
Split into 1206 parts
Removing anchor from TOC href: index_split_000.xhtml#calibre_toc_1
Removing anchor from TOC href: index_split_001.xhtml#calibre_toc_2
.....
Removing anchor from TOC href: index_split_1048.xhtml#calibre_toc_1049
Removing anchor from TOC href: index_split_1049.xhtml#calibre_toc_1050
EPUB output written to C:\Users\George\AppData\Local\Temp\calibre-gd7a55ez\e40a3m_e.epub
The resulting EPUB is smaller (just 14 MB) but still too large to share. It contains the complete text of the original; the only problem is the table of contents.
#4
JCL Punch-Card Collector
Posts: 105
Karma: 606560
Join Date: Jun 2014
Location: Antarctica
Device: Aggressively Device Independent
Whiteowl, does the FB2 file include page breaks or page-break markers that try to preserve the pagination of an original printed edition? Even one error or inconsistency somewhere can break the reading of the table of contents, especially if it's in the table of contents file itself.
And is this large FB2 file a single publication, or a merger of several publications?
#5
Enthusiast
Quote:
Quote:
If you look at the log, there is a message from the preparation phase: Code:
Auto generated TOC with 1050 entries.
#6
JCL Punch-Card Collector
Then the next step is to not insert a table of contents, in case it is getting caught up in a word-size limitation (the entry count is suspiciously close to 1024...). That's probably best done through Preferences | Conversion | Output options | EPUB output: uncheck both "Insert inline table of contents" and "Put inserted table of contents at end of book".
If that works, that's a strong indication of a "too many entries" problem, and you can then try reducing the number of automatically detected chapters, which is a bit trickier. The first step is to go back and re-check "Insert inline table of contents", which is easy enough. The next step is to insert only <h1> headings into the table of contents, which requires hand-editing a parameter in Preferences | Conversion | Common options | Structure detection, in the very first box ("Detect chapters at"). Remove exactly this from that small entry field, which should be visible without scrolling: or name()='h2' (make absolutely certain you don't remove any parentheses except the () pair right after name). For this purpose, I'd leave the "Insert page breaks before" box a little farther down that screen alone, since your concern is the table of contents and not the internal file structure. Then try again...
I've run into similar problems with technical books and cookbooks. Autoconverted technical books with lots of data tables sometimes end up making each table a new "chapter" (because table headings that precede the table, rather than sitting inside it, often carry the same "weight" as a section heading), and cookbooks often end up making a new "chapter" for each recipe. In turn, that creates unmanageable (and misleading) automatic tables of contents.
Last edited by Jaws; 05-15-2026 at 09:15 PM.
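To make the "Detect chapters at" edit concrete, here is a minimal Python sketch that applies it to the default expression exactly as it appears in the `chapter` entry of the conversion log above, and refuses to return a result whose parentheses no longer balance. The helper names (`drop_h2`, `parens_balanced`) are made up for illustration; in calibre you would simply paste the edited string back into the box.

```python
# "Detect chapters at" default, copied from the 'chapter' entry in the conversion log.
DEFAULT_CHAPTER_XPATH = (
    "//*[((name()='h1' or name()='h2') and re:test(., "
    "'\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', "
    "'i')) or @class = 'chapter']"
)

# The exact text to delete (note the leading space and "or").
H2_CLAUSE = " or name()='h2'"


def parens_balanced(expr):
    """True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def drop_h2(xpath):
    """Remove the h2 clause once; fail loudly if the result has broken parentheses."""
    if xpath.count(H2_CLAUSE) != 1:
        raise ValueError(f"expected exactly one {H2_CLAUSE!r} in the expression")
    edited = xpath.replace(H2_CLAUSE, "")
    if not parens_balanced(edited):
        raise ValueError("edit left unbalanced parentheses")
    return edited


if __name__ == "__main__":
    print(drop_h2(DEFAULT_CHAPTER_XPATH))
```

The printed expression begins `//*[((name()='h1') and re:test(., ` with the rest unchanged. The balance check is only a sanity check on the string; it does not validate the XPath itself.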
#7
Enthusiast
Quote:
Quote:
Code:
Splitting on page-break at id=calibre_toc_1050
Looking for large trees in index.xhtml...
Found large tree #1049
Splitting...
Split point: {http://www.w3.org/1999/xhtml}h4 /*/*[2]/*[54]/*[55851]
Split tree still too large: 16114.1943359375 KB
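The two log clues quoted in this thread (the "Auto generated TOC with 1050 entries" line and the `h4` split points) can be pulled out of a long conversion log with a short script. This is a sketch that assumes the message formats shown in the log above; `summarize_log` is a hypothetical helper, not part of calibre.

```python
import re
import sys
from collections import Counter

# Message formats as they appear in the conversion log quoted above.
TOC_RE = re.compile(r"Auto generated TOC with (\d+) entries")
SPLIT_RE = re.compile(r"Split point: \{[^}]*\}(\w+)")


def summarize_log(text):
    """Return (auto-TOC entry count or None, Counter of split-point tag names)."""
    toc = TOC_RE.search(text)
    tags = Counter(SPLIT_RE.findall(text))
    return (int(toc.group(1)) if toc else None), tags


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        toc_entries, split_tags = summarize_log(fh.read())
    print("auto-generated TOC entries:", toc_entries)
    print("split points by tag:", dict(split_tags))
```

A tally dominated by `h4` would be consistent with the later chapters carrying a heading level that the `//h:h1` and `//h:h2` TOC expressions in the log do not pick up.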
#8
JCL Punch-Card Collector
The last thing to try -- and I always hesitate to suggest this sort of thing because I don't know how familiar you are with older page-layout programs, the probable source of your book -- is to do the conversion normally, open the editor, and look at the actual HTML in those 1000+ "chapters", right after the head section. I suspect that too many of them start with h1 (or maybe h2) when they're really subsections that should be h3 or h4 -- this used to happen a lot in files generated by Quark; what you've described makes me think they're all h1, because Quark's internal coding makes it really hard for other programs to discern which headings are which.
If you see that every one of those files has an h1 (or h2) at the top -- or, worse, has h1 and/or h2 in the middle of the file (a search should find that out for you) -- you're in for a fun time manually editing heading levels, then doing an epub-to-epub conversion to force a truly fresh table of contents generation. The real problem is buried in the source material somewhere, and is most probably that the fb2 version just has one kind of heading in it. Doing a systematic edit of the headings in the epub may just expose other errors -- fb2 can be very fragile on conversion.
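Rather than opening 1200 split files by hand, the search suggested above can be scripted. A minimal sketch using the Python standard library, assuming the converted EPUB's content files are well-formed XHTML in the `http://www.w3.org/1999/xhtml` namespace (as the split points in the log suggest): it counts `h1` to `h6` per file and lists files containing more than one `h1`/`h2`. The function names are hypothetical.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

XHTML = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {f"h{i}" for i in range(1, 7)}


def heading_counts(xhtml_bytes):
    """Count h1..h6 elements (in the XHTML namespace) in one content document."""
    counts = Counter()
    for el in ET.fromstring(xhtml_bytes).iter():
        if isinstance(el.tag, str) and el.tag.startswith(XHTML):
            local = el.tag[len(XHTML):]
            if local in HEADINGS:
                counts[local] += 1
    return counts


def scan_epub(path):
    """Return overall heading totals and per-file counts for every XHTML file in an EPUB."""
    totals, per_file = Counter(), {}
    with zipfile.ZipFile(path) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                counts = heading_counts(zf.read(name))
                per_file[name] = counts
                totals.update(counts)
    return totals, per_file


if __name__ == "__main__" and len(sys.argv) > 1:
    totals, per_file = scan_epub(sys.argv[1])
    print("totals:", dict(sorted(totals.items())))
    for name, counts in per_file.items():
        # More than one h1/h2 in a single split file is worth a closer look.
        if counts["h1"] + counts["h2"] > 1:
            print(name, dict(sorted(counts.items())))
```

The totals line alone shows whether the chapters after the last TOC entry use a different heading level than the earlier ones.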
#9
Well trained by Cats
Posts: 31,780
Karma: 64144480
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
The tool allows various methods to update, replace, or repair a TOC.
#10
Enthusiast
Quote:
In the FB2, each chapter starts with a `section` element, and the chapter heading is in its `title`: Code:
last words of previous chapter</p></section>
<section><title><p>Глава 1050. Вы, двое, проваливайте!</p></title><p>text from next chapter ....
last words of previous chapter</p></section>
<section><title><p>Глава 1051. Начинается великая война</p></title><p>text from next chapter
("Глава" means "Chapter"; the two titles read "Chapter 1050. You two, get lost!" and "Chapter 1051. The great war begins".)
I still suspect the problem lies in memory management when reading such a huge file. I'm not sure how the converter reads the FB2, but if it reads everything into memory and parses it inside that buffer, that would explain a lot.
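Whether or not memory is the culprit, one thing worth checking in the FB2 itself is whether every chapter `<section>` sits at the same nesting depth. The split points in the log are `h4` elements, which hints (an assumption, not something confirmed in this thread) that later chapters may be nested deeper and converted to a lower heading level. The sketch below streams the file with `xml.etree.ElementTree.iterparse` instead of loading it as one tree, and reports where the depth changes. It assumes the standard FictionBook 2.0 namespace; the function name is hypothetical.

```python
import io
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Standard FictionBook 2.0 namespace (assumed; check the root element of your file).
FB2 = "{http://www.gribuser.ru/xml/fictionbook/2.0}"


def title_depths(source):
    """Yield (section nesting depth, title text) for every <title>, streaming the file.

    `source` may be a path, a binary file object, or raw bytes. Finished <section>
    elements are cleared so the whole book is never kept as one full tree.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    depth = 0
    for event, el in ET.iterparse(source, events=("start", "end")):
        if el.tag == FB2 + "section":
            if event == "start":
                depth += 1
            else:
                depth -= 1
                el.clear()
        elif el.tag == FB2 + "title" and event == "end":
            yield depth, " ".join(" ".join(el.itertext()).split())


if __name__ == "__main__" and len(sys.argv) > 1:
    histogram, previous = Counter(), None
    for depth, text in title_depths(sys.argv[1]):
        histogram[depth] += 1
        if depth != previous:
            # Print only where the depth changes, e.g. where chapters start sitting one <section> deeper.
            print(f"depth {depth} from: {text}")
            previous = depth
    print("titles per depth:", dict(sorted(histogram.items())))
```

If the chapter titles switch depth somewhere around chapter 1043 to 1050, that would point at the FB2 structure rather than at memory.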
#11
JCL Punch-Card Collector
I was unclear: I meant to look at the epub conversion, not at the fb2 original. Decoding the fb2 implementation of XML is for someone with a lot more patience and spare time than I have...
As I understand your comments, the TOC seems to be "broken" only in the EPUB, so that's where we need to look for signs of breakage. As a potential, kludgy solution: one way that print-published multivolume references "solved" this was to have only a "summary" table of contents in the main volume, with the detailed table of contents for volume I in volume I. Thus, you might want to try:
#12
Enthusiast
Quote:
I have an even better one: use a different converter. There are several standalone applications on GitHub and SourceForge, and they work much faster and more correctly than calibre's. The only drawback is that they can't be run from calibre's UI.
#13
null operator (he/him)
Posts: 22,681
Karma: 33011292
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
BR