02-20-2017, 06:07 PM | #1 | |
Member
Posts: 11
Karma: 10
Join Date: Nov 2015
Device: none
|
Converting UTF-8 TXT to Epub
Example file: spaghetti_sparkle_2_-_galaonline.txt (don't judge me).
Quote:
Conversion log for .docx-to-epub: Code:
Convert book 1 of 1 (spaghetti sparkle 2) DeDRM v6.1.0: In __init__ DeDRM v6.1.0: In load_resources DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0 DeDRM v6.1.0: In initialize Conversion options changed from defaults: search_replace: '[]' output_profile: 'kindle_pw' sr2_search: None transform_css_rules: '[]' sr2_replace: None verbose: 2 filter_css: u'' sr3_search: None read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf' sr1_search: None sr3_replace: None sr1_replace: None Resolved conversion options calibre version: 2.79.0 {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'change_justification': u'original', 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']", 'chapter_mark': u'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'docx_inline_subsup': False, 'docx_no_cover': False, 'docx_no_pagebreaks_between_notes': False, 'dont_split_on_page_breaks': False, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': u'', 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005483CF8>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0.0, 'linearize_tables': False, 
'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x00000000054983C8>, 'page_breaks_before': u'/', 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf', 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': u'', 'search_replace': '[]', 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': None, 'sr1_search': None, 'sr2_replace': None, 'sr2_search': None, 'sr3_replace': None, 'sr3_search': None, 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'transform_css_rules': '[]', 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2} InputFormatPlugin: DOCX Input running on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\fgzz3b.docx Converting Word markup to HTML Converting styles to CSS Cleaning up redundant markup generated by Word Parsing all content... Parsing index.html ... Initial parse failed, using more forgiving parsers Parsing index.html as HTML Parsing docx.css ... Generating default TOC from spine... Merging user specified metadata... Detecting structure... Auto generated TOC with 0 entries. Flattening CSS and remapping font sizes... Source base font size is 10.50000pt Removing fake margins... 
Found 183 items of level: p_1 p_1 left margin stats: Counter({u'0': 183}) p_1 right margin stats: Counter({u'0': 183}) Cleaning up manifest... Trimming unused files from manifest... Creating EPUB Output... Splitting markup on page breaks and flow limits, if any... Looking for large trees in index.html... No large trees found Generating default cover This EPUB file has no Table of Contents. Creating a default TOC EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\datz4i.epub
Code:
Convert book 1 of 1 (spaghetti_sparkle_2_-_galaonline) DeDRM v6.1.0: In __init__ DeDRM v6.1.0: In load_resources DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0 DeDRM v6.1.0: In initialize Conversion options changed from defaults: sr3_replace: None sr1_replace: None search_replace: '[]' output_profile: 'kindle_pw' markdown_extensions: u'toc, tables, footnotes' sr2_search: None transform_css_rules: '[]' sr2_replace: None verbose: 2 filter_css: u'' sr3_search: None read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf' sr1_search: None Resolved conversion options calibre version: 2.79.0 {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'change_justification': u'original', 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']", 'chapter_mark': u'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_split_on_page_breaks': False, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'expand_css': False, 'extra_css': None, 'extract_to': None, 'filter_css': u'', 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'formatting_type': u'auto', 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005352D68>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0.0, 'linearize_tables': False, 
'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markdown_extensions': u'toc, tables, footnotes', 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x0000000005364438>, 'page_breaks_before': u"//*[name()='h1' or name()='h2']", 'paragraph_type': u'auto', 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'preserve_spaces': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf', 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': u'', 'search_replace': '[]', 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': None, 'sr1_search': None, 'sr2_replace': None, 'sr2_search': None, 'sr3_replace': None, 'sr3_search': None, 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'transform_css_rules': '[]', 'txt_in_remove_indents': False, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2} InputFormatPlugin: TXT Input running on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\prmnfp.txt Reading text from file... Detected input encoding as ISO-8859-2 with a confidence of 84.8260567914% Auto detected paragraph type as unformatted Auto detected formatting as heuristic Running text through basic conversion... Language not specified Creator not specified Building file list... Found files... 
HTMLFile:0:a:C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\index.html Normalizing filename cases Rewriting HTML links Parsing index.html ... ********* Heuristic processing HTML ********* There are 12 blank lines. 0.107142857143 percent blank minimum chapters required are: 1 found 0 pre-existing headings Total wordcount is: 1240, Average words per section is: 1240, Marked up 0 chapters Hard line breaks check returned True Median line length is 39, calculated with html format Fixing hyphenated content Looking for more split points based on punctuation, currently have 0 marked 1 section markers based on punctuation. - Fucking embarrassing</p> Formatting scene breaks Forcing index.html into XHTML namespace Merging user specified metadata... Detecting structure... Auto generated TOC with 0 entries. Flattening CSS and remapping font sizes... Source base font size is 12.00000pt Removing fake margins... Found 112 items of level: p_1 p_1 left margin stats: Counter({u'0': 112}) p_1 right margin stats: Counter({u'0': 112}) Cleaning up manifest... Trimming unused files from manifest... Creating EPUB Output... Splitting markup on page breaks and flow limits, if any... Splitting on page-break at id=calibre_pb_0 Looking for large trees in index.html... No large trees found Split into 2 parts Generating default cover This EPUB file has no Table of Contents. Creating a default TOC EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\bbtamz.epub
Last edited by ij26; 02-20-2017 at 09:56 PM. Reason: Correcting quote. |
|
02-20-2017, 06:43 PM | #2 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Moderator Notice
Mobileread strives to be a family-friendly site. Please refrain from profanity (even if it's quoted text) |
02-20-2017, 06:57 PM | #3 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
If you explicitly mark the input character encoding as utf8 (Look & feel->Text->Input character encoding) in the conversion settings, the character is properly preserved when converting TXT to EPUB. It was for me anyway.
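For anyone wondering why the explicit setting matters: the TXT log above shows the encoding being guessed as ISO-8859-2 with about 85% confidence. Here is a rough sketch in plain Python (not calibre's own code) of what a wrong guess does to UTF-8 bytes. The sample string is made up for illustration and is not from the book.

```python
# Illustrative only: the sample text is hypothetical.
sample = "caf\u00e9 \u2013 na\u00efve"  # accented letters plus an en dash

# Bytes as they would be stored in a UTF-8 TXT file.
raw = sample.encode("utf-8")

# Decoding with a wrongly guessed encoding: every byte becomes one character,
# so each multi-byte UTF-8 sequence turns into two or three junk characters.
wrong = raw.decode("iso-8859-2")

# Decoding with the encoding stated explicitly round-trips cleanly.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Stating the input encoding up front skips the guessing step, which is presumably why the character survives.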
|
02-20-2017, 07:23 PM | #4 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
What is this nonsense being converted?
|
02-20-2017, 07:25 PM | #5 |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
02-20-2017, 09:59 PM | #6 | |
Member
Posts: 11
Karma: 10
Join Date: Nov 2015
Device: none
|
Quote:
Is it safe to leave that as the general setting in Preferences, or are there ways that it can go wrong? Also: Is there a fix for the line break issue? And is there a way to omit fonts when converting .docx files, or would I have to manually open each one and click the "Clear all formatting" button? |
|
02-20-2017, 11:05 PM | #7 |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You can set specific overrides when you start a conversion.
Preferences sets the defaults. BTW, once you convert a book, THOSE settings are remembered for that book, even if you change the defaults later. |
02-20-2017, 11:23 PM | #8 |
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at the Look & Feel section of the conversion dialog, under the Styling tab. And look at the options in the TXT Input section of the conversion dialog.
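If you prefer the command line, the same settings can be passed to calibre's `ebook-convert` tool. Below is a minimal sketch that only builds the argument lists. It assumes `ebook-convert` is on your PATH, that `--filter-css` is the command-line counterpart of the style filter on the Styling tab, and that `--paragraph-type` corresponds to the paragraph style option in the TXT Input section. The output and .docx file names are hypothetical.

```python
def build_convert_command(src, dst, encoding=None, paragraph_type=None, filter_css=None):
    """Build an ebook-convert argument list; nothing is executed here."""
    cmd = ["ebook-convert", src, dst]
    if encoding:
        # Skip encoding auto-detection for text input.
        cmd += ["--input-encoding", encoding]
    if paragraph_type:
        # TXT input only, e.g. "single", "block", "unformatted".
        cmd += ["--paragraph-type", paragraph_type]
    if filter_css:
        # Comma-separated CSS properties to strip, e.g. "font-family".
        cmd += ["--filter-css", filter_css]
    return cmd


# TXT to EPUB with the encoding and paragraph style forced.
txt_cmd = build_convert_command(
    "spaghetti_sparkle_2_-_galaonline.txt",
    "spaghetti_sparkle_2.epub",  # hypothetical output name
    encoding="utf-8",
    paragraph_type="single",
)

# DOCX to EPUB with font families dropped ("book.docx" is a placeholder).
docx_cmd = build_convert_command("book.docx", "book.epub", filter_css="font-family")

print(" ".join(txt_cmd))
print(" ".join(docx_cmd))
```

To actually run a conversion, pass one of the lists to `subprocess.run(cmd, check=True)`.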
|
02-21-2017, 10:49 AM | #9 | |
Member
Posts: 11
Karma: 10
Join Date: Nov 2015
Device: none
|
Quote:
Edit: I see "unformatted" also strips line breaks. "single" seems to work best, although all the settings result in space appearing between lines that doesn't appear when using the Word route.
Last edited by ij26; 02-21-2017 at 11:10 AM. |
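A guess at where the extra space comes from, as a simplified sketch rather than calibre's actual TXT input code: if "single" turns every line into its own paragraph, each line ends up in a separate `<p>`, and the gap between lines is then ordinary paragraph spacing. The two functions below are hypothetical illustrations of "every line is a paragraph" versus "blank lines separate paragraphs".

```python
import re
from html import escape


def single_line_paragraphs(text):
    """Every non-empty line becomes its own paragraph."""
    return ["<p>%s</p>" % escape(line.strip())
            for line in text.splitlines() if line.strip()]


def block_paragraphs(text):
    """Blank lines separate paragraphs; lines in between are joined."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return ["<p>%s</p>" % escape(" ".join(block.split()))
            for block in blocks if block.strip()]


sample = "First line\nSecond line\n\nThird line"
print(single_line_paragraphs(sample))
print(block_paragraphs(sample))
```

If that is what is happening, the `remove_paragraph_spacing` option that shows as False in the log above (Look & Feel, "Remove spacing between paragraphs") may be worth trying.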
|
|