02-20-2017, 06:07 PM   #1
ij26
Converting UTF-8 TXT to Epub

Example file: spaghetti_sparkle_2_-_galaonline.txt (don't judge me).
Quote:
ksgiven:零
Opening the .txt file in Word, accepting the default encoding of UTF-8, resaving as .docx, and converting that to .epub in calibre preserves the "零" (although Word sets the text in Courier New by default). But using calibre to convert the .txt file directly to .epub changes the "零" to "雜" (displayed as "éś"). It also strips single line breaks.
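
The "éś" is consistent with the UTF-8 bytes of "零" being decoded as ISO-8859-2, which is the encoding the .txt conversion log further down reports detecting. A minimal Python sketch of that mismatch follows; it is only an illustration of the byte-level effect, not calibre's own code, and the variable names are made up for the example.

```python
# UTF-8 bytes of the character from the example file: e9 9b b6
raw = "零".encode("utf-8")

# Decoding those bytes with the wrong codec (ISO-8859-2, as the TXT log
# reports) gives "é", an invisible C1 control character (0x9b), and "ś"
wrong = raw.decode("iso-8859-2")

# Dropping the invisible control character leaves what is seen on screen
visible = wrong.replace("\x9b", "")

# Decoding with the right codec round-trips the original character
right = raw.decode("utf-8")
```

Here `visible` ends up as "éś" and `right` as "零", which matches the two behaviours described above.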

Conversion log for .docx-to-.epub:
Code:
Convert book 1 of 1 (spaghetti sparkle 2)
DeDRM v6.1.0: In __init__
DeDRM v6.1.0: In load_resources
DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0
DeDRM v6.1.0: In initialize
Conversion options changed from defaults:
  search_replace: '[]'
  output_profile: 'kindle_pw'
  sr2_search: None
  transform_css_rules: '[]'
  sr2_replace: None
  verbose: 2
  filter_css: u''
  sr3_search: None
  read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf'
  sr1_search: None
  sr3_replace: None
  sr1_replace: None
Resolved conversion options
calibre version: 2.79.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'docx_inline_subsup': False,
 'docx_no_cover': False,
 'docx_no_pagebreaks_between_notes': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005483CF8>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x00000000054983C8>,
 'page_breaks_before': u'/',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: DOCX Input running
on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\fgzz3b.docx
Converting Word markup to HTML
Converting styles to CSS
Cleaning up redundant markup generated by Word
Parsing all content...
Parsing index.html ...
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Parsing docx.css ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 10.50000pt
Removing fake margins...
Found 183 items of level: p_1
p_1  left margin stats: Counter({u'0': 183})
p_1  right margin stats: Counter({u'0': 183})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
	Looking for large trees in index.html...
	No large trees found
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\datz4i.epub
Conversion log for .txt-to-.epub:
Code:
Convert book 1 of 1 (spaghetti_sparkle_2_-_galaonline)
DeDRM v6.1.0: In __init__
DeDRM v6.1.0: In load_resources
DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0
DeDRM v6.1.0: In initialize
Conversion options changed from defaults:
  sr3_replace: None
  sr1_replace: None
  search_replace: '[]'
  output_profile: 'kindle_pw'
  markdown_extensions: u'toc, tables, footnotes'
  sr2_search: None
  transform_css_rules: '[]'
  sr2_replace: None
  verbose: 2
  filter_css: u''
  sr3_search: None
  read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf'
  sr1_search: None
Resolved conversion options
calibre version: 2.79.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'formatting_type': u'auto',
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005352D68>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markdown_extensions': u'toc, tables, footnotes',
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x0000000005364438>,
 'page_breaks_before': u"//*[name()='h1' or name()='h2']",
 'paragraph_type': u'auto',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'preserve_spaces': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'txt_in_remove_indents': False,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: TXT Input running
on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\prmnfp.txt
Reading text from file...
Detected input encoding as ISO-8859-2 with a confidence of 84.8260567914%
Auto detected paragraph type as unformatted
Auto detected formatting as heuristic
Running text through basic conversion...
Language not specified
Creator not specified
Building file list...
	Found files...
		 HTMLFile:0:a:C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\index.html
Normalizing filename cases
Rewriting HTML links
Parsing index.html ...
*********  Heuristic processing HTML  *********
There are 12 blank lines. 0.107142857143 percent blank
minimum chapters required are: 1
found 0 pre-existing headings
Total wordcount is: 1240, Average words per section is: 1240, Marked up 0 chapters
Hard line breaks check returned True
Median line length is 39, calculated with html format
Fixing hyphenated content
Looking for more split points based on punctuation, currently have 0
marked 1 section markers based on punctuation. - Fucking embarrassing</p>
Formatting scene breaks
Forcing index.html into XHTML namespace
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 112 items of level: p_1
p_1  left margin stats: Counter({u'0': 112})
p_1  right margin stats: Counter({u'0': 112})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
		Splitting on page-break at id=calibre_pb_0
	Looking for large trees in index.html...
	No large trees found
	Split into 2 parts
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\bbtamz.epub
Is there any fast way to bulk-convert .txt to .epub that preserves Unicode symbols and line breaks and doesn't force a font?
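
One possible direction, sketched under assumptions: calibre ships a command-line tool, ebook-convert, which accepts an --input-encoding option, so forcing UTF-8 there might bypass the auto-detection that guessed ISO-8859-2 in the log above. The helper names and the folder argument below are hypothetical, ebook-convert is assumed to be on the PATH, and the sketch does not address the line-break or font parts of the question.

```python
import subprocess
from pathlib import Path


def build_command(txt_path):
    """Return an ebook-convert argument list for one .txt file."""
    txt_path = Path(txt_path)
    epub_path = txt_path.with_suffix(".epub")
    # --input-encoding overrides calibre's encoding auto-detection
    return ["ebook-convert", str(txt_path), str(epub_path),
            "--input-encoding", "utf-8"]


def convert_folder(folder, runner=subprocess.run):
    """Convert every .txt file directly inside `folder` (hypothetical helper)."""
    commands = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        cmd = build_command(txt_path)
        runner(cmd, check=True)  # raises if ebook-convert reports an error
        commands.append(cmd)
    return commands
```

Calling `convert_folder("some_folder")` would write one .epub next to each .txt file; the `runner` parameter exists only so the command list can be inspected without actually launching calibre.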

Last edited by ij26; 02-20-2017 at 09:56 PM. Reason: Correcting quote.