Old 02-20-2017, 06:07 PM   #1
ij26
Member
Posts: 11
Karma: 10
Join Date: Nov 2015
Device: none
Converting UTF-8 TXT to Epub

Example file: spaghetti_sparkle_2_-_galaonline.txt (don't judge me).
Quote:
ksgiven:零
Opening the .txt file in Word, accepting the default encoding of UTF-8, resaving as .docx, and converting to .epub in calibre preserves the "零" (although Word sets the font to Courier New by default). But using calibre to convert the .txt file directly to .epub changes the "零" to "雜" (displayed as "éś"). It also strips single line breaks.

Conversion log for .docx-to-.epub:
Code:
Convert book 1 of 1 (spaghetti sparkle 2)
DeDRM v6.1.0: In __init__
DeDRM v6.1.0: In load_resources
DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0
DeDRM v6.1.0: In initialize
Conversion options changed from defaults:
  search_replace: '[]'
  output_profile: 'kindle_pw'
  sr2_search: None
  transform_css_rules: '[]'
  sr2_replace: None
  verbose: 2
  filter_css: u''
  sr3_search: None
  read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf'
  sr1_search: None
  sr3_replace: None
  sr1_replace: None
Resolved conversion options
calibre version: 2.79.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'docx_inline_subsup': False,
 'docx_no_cover': False,
 'docx_no_pagebreaks_between_notes': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005483CF8>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x00000000054983C8>,
 'page_breaks_before': u'/',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\bc57t_.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: DOCX Input running
on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\fgzz3b.docx
Converting Word markup to HTML
Converting styles to CSS
Cleaning up redundant markup generated by Word
Parsing all content...
Parsing index.html ...
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Parsing docx.css ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 10.50000pt
Removing fake margins...
Found 183 items of level: p_1
p_1  left margin stats: Counter({u'0': 183})
p_1  right margin stats: Counter({u'0': 183})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
	Looking for large trees in index.html...
	No large trees found
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\datz4i.epub
Conversion log for .txt-to-.epub:
Code:
Convert book 1 of 1 (spaghetti_sparkle_2_-_galaonline)
DeDRM v6.1.0: In __init__
DeDRM v6.1.0: In load_resources
DeDRM v6.1.0: verdir C:\Users\N\AppData\Roaming\calibre\plugins\DeDRM\6.1.0
DeDRM v6.1.0: In initialize
Conversion options changed from defaults:
  sr3_replace: None
  sr1_replace: None
  search_replace: '[]'
  output_profile: 'kindle_pw'
  markdown_extensions: u'toc, tables, footnotes'
  sr2_search: None
  transform_css_rules: '[]'
  sr2_replace: None
  verbose: 2
  filter_css: u''
  sr3_search: None
  read_metadata_from_opf: u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf'
  sr1_search: None
Resolved conversion options
calibre version: 2.79.0
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'expand_css': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'formatting_type': u'auto',
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000005352D68>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markdown_extensions': u'toc, tables, footnotes',
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KindlePaperWhiteOutput object at 0x0000000005364438>,
 'page_breaks_before': u"//*[name()='h1' or name()='h2']",
 'paragraph_type': u'auto',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'preserve_spaces': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\Users\\N\\AppData\\Local\\Temp\\calibre_o1jd4a\\xy4rwu.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'transform_css_rules': '[]',
 'txt_in_remove_indents': False,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: TXT Input running
on C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\prmnfp.txt
Reading text from file...
Detected input encoding as ISO-8859-2 with a confidence of 84.8260567914%
Auto detected paragraph type as unformatted
Auto detected formatting as heuristic
Running text through basic conversion...
Language not specified
Creator not specified
Building file list...
	Found files...
		 HTMLFile:0:a:C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\index.html
Normalizing filename cases
Rewriting HTML links
Parsing index.html ...
*********  Heuristic processing HTML  *********
There are 12 blank lines. 0.107142857143 percent blank
minimum chapters required are: 1
found 0 pre-existing headings
Total wordcount is: 1240, Average words per section is: 1240, Marked up 0 chapters
Hard line breaks check returned True
Median line length is 39, calculated with html format
Fixing hyphenated content
Looking for more split points based on punctuation, currently have 0
marked 1 section markers based on punctuation. - Fucking embarrassing</p>
Formatting scene breaks
Forcing index.html into XHTML namespace
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 112 items of level: p_1
p_1  left margin stats: Counter({u'0': 112})
p_1  right margin stats: Counter({u'0': 112})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
		Splitting on page-break at id=calibre_pb_0
	Looking for large trees in index.html...
	No large trees found
	Split into 2 parts
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to C:\Users\N\AppData\Local\Temp\calibre_o1jd4a\bbtamz.epub
Is there any fast way to bulk-convert .txt to .epub that preserves Unicode symbols and line breaks and doesn't force a font?

Last edited by ij26; 02-20-2017 at 09:56 PM. Reason: Correcting quote.
Old 02-20-2017, 06:43 PM   #2
DiapDealer
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Moderator Notice
MobileRead strives to be a family-friendly site. Please refrain from profanity (even if it's quoted text).
Old 02-20-2017, 06:57 PM   #3
DiapDealer
If you explicitly mark the input character encoding as utf8 (Look & feel->Text->Input character encoding) in the conversion settings, the character is properly preserved when converting TXT to EPUB. It was for me anyway.
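
The wrong guess recorded in the TXT log ("Detected input encoding as ISO-8859-2") can be reproduced outside calibre. A minimal Python sketch, assuming only that the file holds the UTF-8 bytes for "零"; the variable names are illustrative and this is not calibre's detection code:

```python
# Illustration only: decoding UTF-8 bytes with the wrong codec.
original = "\u96f6"                 # the character from the first post
raw = original.encode("utf-8")      # three bytes: e9 9b b6

# The codec named in the log maps each byte to a separate character:
# e9 -> "é", 9b -> an invisible control character, b6 -> "ś".
wrong = raw.decode("iso-8859-2")

# The real encoding restores the single original character.
right = raw.decode("utf-8")

print([hex(ord(c)) for c in wrong])
print(right == original)
```

The invisible middle character would explain why only "éś" shows up in the converted book.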
Old 02-20-2017, 07:23 PM   #4
JSWolf
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
What is this nonsense being converted?
Old 02-20-2017, 07:25 PM   #5
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by JSWolf
What is this nonsense being converted?
Doesn't matter.
Diap's answer was: tell calibre the TXT input is UTF-8; don't make it guess. Obviously, it guessed wrong (the log shows it detected ISO-8859-2).
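
For the bulk case asked about in the first post, the same "don't make it guess" idea can be scripted around calibre's `ebook-convert` command-line tool. A rough sketch, assuming `ebook-convert` is on the PATH; `--input-encoding` is the command-line counterpart of the `input_encoding` option shown in the logs, while the helper names and the folder name are hypothetical:

```python
import subprocess
from pathlib import Path


def build_command(txt_path: Path) -> list:
    """Return the ebook-convert call for one TXT file, forcing UTF-8 input."""
    epub_path = txt_path.with_suffix(".epub")
    return [
        "ebook-convert",
        str(txt_path),
        str(epub_path),
        "--input-encoding", "utf-8",  # skip encoding auto-detection
    ]


def convert_folder(folder: Path) -> None:
    """Convert every .txt file found directly inside `folder`."""
    for txt_path in sorted(folder.glob("*.txt")):
        # check=True stops the batch at the first failed conversion
        subprocess.run(build_command(txt_path), check=True)


# Example call (hypothetical folder name):
# convert_folder(Path("txt_books"))
```

Each EPUB is written next to its source file; the library itself is not touched, so the results would still need to be added to calibre afterwards.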
Old 02-20-2017, 09:59 PM   #6
ij26
Quote:
Originally Posted by DiapDealer
If you explicitly mark the input character encoding as utf8 (Look & feel->Text->Input character encoding) in the conversion settings, the character is properly preserved when converting TXT to EPUB. It was for me anyway.
Thanks; that seems to work.

Is it safe to leave that as the general setting in Preferences, or are there ways that it can go wrong?

Also: Is there a fix for the line break issue? And is there a way to omit fonts when converting .docx files, or would I have to manually open each one and click the "Clear all formatting" button?
Old 02-20-2017, 11:05 PM   #7
theducks
You can set specific overrides when you start a conversion.
Preferences sets the defaults.
BTW, when you convert a book, THOSE settings are remembered for that book, even if you change the defaults later.
Old 02-20-2017, 11:23 PM   #8
kovidgoyal
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Look at the Look & Feel section of the conversion dialog, under the Styling tab. And look at the options in the TXT Input section of the conversion dialog.
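
For the .docx font question, the Styling tab setting referred to here is presumably "Filter style information", which corresponds to the `filter_css` option in the logs and to `--filter-css` on the `ebook-convert` command line. A small sketch of building such a call; the function name and file name are hypothetical, and `font-family` is only one possible property to filter:

```python
from pathlib import Path


def docx_command(docx_path: Path, filtered=("font-family",)) -> list:
    """Build an ebook-convert call that removes the listed CSS properties."""
    return [
        "ebook-convert",
        str(docx_path),
        str(docx_path.with_suffix(".epub")),
        "--filter-css", ",".join(filtered),  # comma-separated property list
    ]


print(" ".join(docx_command(Path("book.docx"))))
# prints: ebook-convert book.docx book.epub --filter-css font-family
```

With the font family filtered out, the Courier New that Word applied should no longer be specified in the EPUB, so the reader's own default font would be used.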
Old 02-21-2017, 10:49 AM   #9
ij26
Quote:
Originally Posted by kovidgoyal
Look at the Look & Feel section of the conversion dialog, under the Styling tab. And look at the options in the TXT Input section of the conversion dialog.
Thanks. I see that there's a "Structure" section where I can select "Paragraph style" and "Formatting style". What's the difference between the paragraph styles "single", "unformatted" and "off"? The descriptions presented don't make it clear.

Edit: I see "unformatted" also strips line breaks. "single" seems to work best, although all the settings result in space appearing between lines that doesn't appear when using the Word route.

Last edited by ij26; 02-21-2017 at 11:10 AM.
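
On the paragraph styles: as far as the calibre manual describes them, "single" assumes every line is a paragraph, "unformatted" assumes most lines end in hard line breaks with few blank lines or indents (so calibre tries to work out the structure itself), and "off" leaves the paragraph structure alone. The toy Python sketch below is not calibre's code; it only contrasts the "single" idea with splitting on blank lines (calibre's "block" style) to show why line breaks survive in one case and not in the other:

```python
def split_single(text: str) -> list:
    """Toy 'single': every non-empty line becomes its own paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_block(text: str) -> list:
    """Toy 'block': blank lines separate paragraphs; other breaks are joined."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "first line\nsecond line\n\nthird line"
print(split_single(sample))  # ['first line', 'second line', 'third line']
print(split_block(sample))   # ['first line second line', 'third line']
```

Because "single" turns each line into a separate paragraph, the extra space between lines may simply be paragraph spacing; the log shows `remove_paragraph_spacing` set to False, so that option may be worth trying.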