Trying to use Textile processing

getajob · 03-05-2011, 11:43 PM

I have spent hours trying to convert TXT to EPUB by marking it up with TEXTILE tags.

The resulting EPUB file shows no signs of any TEXTILE processing whatsoever. No headings, no linking, no italics or bold. Nothing.

I am now admitting defeat.

I am posting here in the hope that someone can tell me what is wrong.

I am running Calibre 0.7.48 on Windows XP SP3.

Basically I have set the TXT input processing to
Paragraph style: off
Formatting style: textile

Here is my input file (I stole this from Perkin):

Spoiler:

Here is my log file from Calibre

Spoiler:

I find the "initial parse failed" error message worrying but I cannot see a cause for this.

I have processed txt to epub in the past using MARKDOWN with acceptable results, but since I upgraded to 0.7.48, markdown is not working either.

Any ideas? John

Perkin · 03-06-2011, 06:55 AM

Just updated to 0.7.48 (from 0.7.46), and conversion works fine here.
Have you tried restarting the machine?

Edit:
Win 7

user_none · 03-06-2011, 08:16 AM

The conversion works correctly for me too. I would try this: reboot your computer, uninstall calibre, reboot, reinstall calibre, reboot, then try converting.

Quote:

Originally Posted by getajob

I find the "initial parse failed" error message worrying but I cannot see a cause for this.

Replace <hr> with <hr />. The error can be ignored as calibre will do the replacement later on during the conversion. But you can make that change to prevent it entirely.

getajob · 03-06-2011, 09:29 PM

Quote:

Originally Posted by user_none

The conversion works correctly for me too. I would try this: reboot your computer, uninstall calibre, reboot, reinstall calibre, reboot, then try converting.

Replace <hr> with <hr />. The error can be ignored as calibre will do the replacement later on during the conversion. But you can make that change to prevent it entirely.

Thanks for your suggestions. I am glad to hear that 0.7.48 is working for you.

I had already re-installed Calibre and re-booted but just to be sure I did this again (Uninstall Calibre, re-boot, install Calibre, re-boot).

I have changed the <hr> to <hr /> but I still get no joy whatsoever.

Here is my new source file:

Spoiler:

Here is my Calibre job log:

Spoiler:

Here is a link to the epub file that gets output http://dl.dropbox.com/u/18750031/Tex...20Unknown.epub

Although the job log says

Code:

Running text through textile conversion...

there is no evidence that this actually happened.

There are two .pyo files in C:\Program Files\Calibre2\Lib\site-packages\calibre\ebooks\textile so I seem to have the textile python executables installed.

Do you have any other ideas of what else I can check?

As you can see, I ran this with debug on. John

DoctorOhh · 03-06-2011, 11:30 PM

Quote:

Originally Posted by getajob

Thanks for your suggestions. I am glad to hear that 0.7.48 is working for you.

Do you have any other ideas of what else I can check?

I have one idea. Quit calibre, rename the configuration directory, restart calibre. This will force calibre to create a brand new configuration folder. If something in your configuration folder is corrupt this might fix it.

The epub you attached looked very close to what I got when I left both text input settings on auto.

My experiment:

I have never used textile. I took your source and added it to calibre. I converted to ePub using paragraph style - Off, Formatting style - Textile.

I attached the resultant epub, which looks great except I didn't have the image. I also attached the txt file I used for the source. I left all of my default settings alone, hopefully they haven't skewed it too much.

In case it might help, here is my job details info.

Spoiler:

Code:

Convert book 1 of 1 (textile) Resolved conversion options
 calibre version: 0.7.48
 {'asciiize': False,
  'author_sort': None,
  'authors': None,
  'base_font_size': 16.0,
  'book_producer': None,
  'change_justification': u'original',
  'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., 'introduction|prologue|epilogue|chapter|book|section|conclusion|part\\s+', 'i')) or @class = 'chapter']",
  'chapter_mark': u'none',
  'comments': None,
  'cover': None,
  'debug_pipeline': None,
  'dehyphenate': False,
  'delete_blank_paragraphs': False,
  'disable_font_rescaling': False,
  'dont_split_on_page_breaks': False,
  'enable_heuristics': False,
  'epub_flatten': False,
  'extra_css': u'body { margin: 0 0; padding: 0em 0em; }\n\np {margin-top:0.5em; margin-bottom:0.5em; text-indent:1.1em}\n\nh1+p, h2+p, h3+p, p.whitespace+p, p.softbreak+p {margin-top:0.1em; margin-bottom:0.3em; text-indent:0%}',
  'extract_to': None,
  'fix_indents': False,
  'flow_size': 260,
  'font_size_mapping': u'16,16,16,16,17.5,17.5,18,18',
  'format_scene_breaks': True,
  'formatting_type': u'textile',
  'html_unwrap_factor': 0.4,
  'input_encoding': None,
  'input_profile': <calibre.customize.profiles.SonyReaderInput object at 0x0444AC90>,
  'insert_blank_line': True,
  'insert_metadata': False,
  'isbn': None,
  'italicize_common_cases': False,
  'keep_ligatures': True,
  'language': None,
  'level1_toc': None,
  'level2_toc': None,
  'level3_toc': None,
  'line_height': 0.0,
  'linearize_tables': False,
  'margin_bottom': 5.0,
  'margin_left': 5.0,
  'margin_right': 5.0,
  'margin_top': 5.0,
  'markdown_disable_toc': False,
  'markup_chapter_headings': False,
  'max_toc_links': 50,
  'minimum_line_height': 120.0,
  'no_chapters_in_toc': False,
  'no_default_epub_cover': False,
  'no_inline_navbars': False,
  'no_svg_cover': False,
  'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x0444AF90>,
  'page_breaks_before': u'//h:h1',
  'paragraph_type': u'off',
  'prefer_metadata_cover': False,
  'preserve_cover_aspect_ratio': False,
  'preserve_spaces': False,
  'pretty_print': True,
  'pubdate': None,
  'publisher': None,
  'rating': None,
  'read_metadata_from_opf': 'C:\\Calibre_temp\\calibre_0.7.48_tmp_oghngz\\calibre_0.7.48_ckbwpv.opf',
  'remove_first_image': False,
  'remove_paragraph_spacing': True,
  'remove_paragraph_spacing_indent_size': 1.1,
  'renumber_headings': False,
  'replace_scene_breaks': u'',
  'series': None,
  'series_index': None,
  'smarten_punctuation': True,
  'sr1_replace': None,
  'sr1_search': None,
  'sr2_replace': None,
  'sr2_search': None,
  'sr3_replace': None,
  'sr3_search': None,
  'tags': None,
  'timestamp': None,
  'title': None,
  'title_sort': None,
  'toc_filter': None,
  'toc_threshold': 6,
  'txt_in_remove_indents': False,
  'unwrap_lines': False,
  'use_auto_toc': False,
  'verbose': 2}
 InputFormatPlugin: TXT Input running
 on C:\My Dropbox\CalibreLibrary\textile\textile (8108)\textile - textile.txt
 Reading text from file...
 Detected input encoding as ISO-8859-2 with a confidence of 83.5262045228%
 Running text through textile conversion...
 Language not specified
 Building file list...
     Found files...
          HTMLFile:0:a:C:\My Dropbox\CalibreLibrary\textile\textile (8108)\index.html
 Normalizing filename cases
 Rewriting HTML links
 Parsing index.html ...
 Forcing index.html into XHTML namespace
 Merging user specified metadata...
 Detecting structure...
 Auto generated TOC with 2 entries.
 Flattening CSS and remapping font sizes...
 Source base font size is 12.00000pt
 Cleaning up manifest...
 Trimming unused files from manifest...
 Parsing stylesheet.css ...
 Creating EPUB Output...
         Splitting on page-break
     Looking for large trees in index.html...
     No large trees found
 Generating default cover
 EPUB output written to C:\Calibre_temp\calibre_0.7.48_tmp_oghngz\calibre_0.7.48_brpjix.epub

Good Luck.

Perkin · 03-07-2011, 05:44 AM

dwanthy, that comes out nearly all correct. The only problem is the pre section, which is missing spaces (it is also wrong in the posts above). It should be:

Code:

pre. 
There   was   a   man   from   hither,
  Who,  when  he  began  to  shiver,
    He     gave     a     cough,
      His   leg  dropped  off,
   And  floated  down  the  river.

There's a space after pre. (which is why the pre tag isn't converted), and several spaces in the text (to give a centered limerick).

Another problem is some of the accented characters, which is just the encoding being different.

@getajob
Have you tried setting the 'Input character encoding' under Look & Feel in the conversion dialog? Try converting with utf-8, and if that doesn't work, try again with cp1252.

I had a similar problem with a version just after Textile was introduced, and it was the character encoding that was causing it.

Edit:
If I remember, I was converting with cp1252 but the file was utf-8.

getajob · 03-07-2011, 07:15 AM

Quote:

Originally Posted by dwanthny

I have one idea. Quit calibre, rename the configuration directory, restart calibre. This will force calibre to create a brand new configuration folder. If something in your configuration folder is corrupt this might fix it.

Dwanthy,

Good idea. I renamed my configuration folder and also deleted and reinstalled Calibre. I changed (only) the TXT input processing to
Paragraph style: off
Formatting style: textile

Otherwise it is a vanilla Calibre installation now.

Disappointingly I get exactly the same non-textile-processed result.

I tried a fresh install on a Windows XP desktop that has never had Calibre on it before. I get the same non-textile-processed result yet again.

I then tried installing on a new Windows 7 Pro laptop. Still no go - and I was confident that this would work for sure.

Quote:

Originally Posted by Perkin

@getajob
Have you tried setting the 'Input character encoding' under Look & Feel in the conversion dialog? Try converting with utf-8, and if that doesn't work, try again with cp1252.

I had a similar problem with a version just after Textile was introduced, and it was the character encoding that was causing it.

Edit:
If I remember, I was converting with cp1252 but the file was utf-8.

Perkin,

Good idea. I changed Input character encoding to 'utf-8' in Look & Feel, but all this seemed to do was put two black-diamond-question-marks before the 'h1. Header 1', which was still unprocessed... (see attached result below)

Is there any easy way to determine what your Input Character Coding actually is?

The only markup that is working is the '<hr />' - maybe this is a clue...

Thanks for your help so far. Any more things I can try?
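
On the question of finding out what encoding a file really has: one quick check is to look at the first few bytes for a Unicode byte order mark (BOM). The following is only a rough sketch in Python, not anything calibre itself ships; the names sniff_bom and sniff_file are made up for illustration. A result of None only means there is no BOM, so the file could still be ANSI or BOM-less UTF-8.

```python
import codecs

# Check the longest BOMs first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read only the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

For example, sniff_file("textile.txt") (a hypothetical file name) would report "utf-16-le" for a file saved from WordPad or Notepad as 'Unicode'.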

DoctorOhh · 03-07-2011, 07:23 AM

Quote:

Originally Posted by getajob

Good idea. I renamed my configuration folder and also deleted and reinstalled Calibre. I changed (only) the TXT input processing to
Paragraph style: off
Formatting style: textile

How about attaching the exact text file you are adding to calibre.

user_none · 03-07-2011, 07:31 AM

Also, can you turn on debug output? Then zip up and attach the debug output folder. I'm not sure how to do that with the GUI... On the command line you would use the --debug switch.

getajob · 03-07-2011, 08:19 AM

OK. It's after midnight here so I'll give you my input file, the last job log & the debug directory. I am calling it a night... Thanks for your help.

Perkin · 03-07-2011, 09:15 AM

Your file didn't convert here.
My text editor (EditPad Pro) is saying that the encoding is Unicode-UTF-16 Little Endian, perhaps that's something to do with it?
Try copying the whole text and pasting it into Notepad, resaving it, then adding that file to calibre and trying again.
I did that and it then converted properly.

What text editor are you using?

Edit:
If you convert your file with 'UTF-16' as the input character encoding, it also converts properly here.
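
A small Python sketch of why a UTF-16 file read as a single-byte encoding would leave Textile with nothing to match. This is an illustration only and assumes calibre ended up decoding the file as windows-1252, as the first job log reports. Every character is followed by a NUL, so 'h1. ' never appears at the start of a line. It also shows that a curly apostrophe becomes character 25, which may be what triggered the 'invalid Char value 25' parse error in that log.

```python
import codecs

sample = "h1. Header 1"

# What WordPad's 'Unicode' save option would write: a BOM plus UTF-16 LE.
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

# Decoding with the wrong, single-byte encoding keeps every byte as a
# separate character, including the BOM and the NUL high bytes.
wrong = raw.decode("cp1252")

# Decoding as UTF-16 uses the BOM for byte order and then discards it.
right = raw.decode("utf-16")

print(repr(wrong))
print(repr(right))

# U+2019 (curly apostrophe) is the bytes 0x19 0x20 in UTF-16 LE; read as
# cp1252 that is control character 25 followed by a space.
misread = "\u2019".encode("utf-16-le").decode("cp1252")
print([ord(ch) for ch in misread])
```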

DoctorOhh · 03-07-2011, 09:15 AM

Quote:

Originally Posted by getajob

OK. It's after midnight here so I'll give you my input file, the last job log & the debug directory. I am calling it a night... Thanks for your help.

The good news is when I use your supplied txt file I get the exact same results you do. Notepad++ says it is encoded UCS-2 Little Endian. After I save it as UTF-8 encoded I get the expected conversion.
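
For anyone who would rather script the re-save than do it by hand in an editor, here is a minimal Python sketch. The function name to_utf8 and the cp1252 fallback are assumptions made for illustration; it only handles UTF-16 and UTF-8 BOMs and does not try to recognise UTF-32.

```python
import codecs


def to_utf8(src_path, dst_path, fallback="cp1252"):
    """Rewrite a text file as BOM-less UTF-8 and return the decoded text.

    Files with a UTF-16 or UTF-8 BOM are decoded accordingly; anything
    else is assumed (hypothetically) to be in the fallback code page.
    """
    with open(src_path, "rb") as handle:
        raw = handle.read()

    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # BOM selects byte order, then is dropped
    elif raw.startswith(codecs.BOM_UTF8):
        text = raw.decode("utf-8-sig")   # strips the UTF-8 BOM
    else:
        text = raw.decode(fallback)

    # newline="" writes line endings exactly as they were read.
    with open(dst_path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)
    return text
```

The resulting file can then be added to calibre with Input character encoding set to UTF-8.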

getajob · 03-07-2011, 05:43 PM

Thanks people you have solved my problem. (Thank God, it was driving me nuts...)

Quote:

Originally Posted by Perkin

What text editor are you using?

I am using WordPad but at some stage of the journey I decided to 'Save As' in what WordPad calls Unicode.

Quote:

Originally Posted by Perkin

If you convert your file with 'UTF-16' as the input character encoding, it also converts properly here.

You are absolutely correct. I tried setting the Input Character Encoding to 'Unicode' but that did not work. 'UTF-16' does not appear in the Input character encoding dropdown list and it never occurred to me to set it to 'UTF-16'.

WordPad has four 'Save As' options: ANSI, Unicode, Unicode big endian and UTF-8. Notepad is the same.

Using ANSI was giving me the annoying black-diamond-with-question-marks for the odd character so I changed to Unicode encoding.

SOLUTION:
Save your text in UTF-8 format using the 'Save As' dialog of WordPad or Notepad.
In Look & Feel, set Input character encoding to UTF-8
In TXT Input, set Paragraph style to off and set Formatting style to textile.

If you have to use Unicode, then set Input character encoding to UTF-16

Using ANSI is not recommended since it will give you black-diamond-with-question-marks for the odd character (don't ask me why...)
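
The same settings can probably be passed on the command line as well. The sketch below only builds the argument list in Python; it assumes the ebook-convert switches mirror the option names that appear in the job log (input_encoding, formatting_type, paragraph_type), so check the output of ebook-convert --help before relying on it. The function name and file names are placeholders.

```python
import shlex


def ebook_convert_args(txt_path, epub_path, encoding="utf-8"):
    """Build an ebook-convert command line for a Textile-marked TXT file.

    The switch names are assumed to match the conversion options shown in
    the job log; they are not verified against any particular calibre version.
    """
    return [
        "ebook-convert", txt_path, epub_path,
        "--input-encoding", encoding,      # Look & Feel: Input character encoding
        "--formatting-type", "textile",    # TXT Input: Formatting style
        "--paragraph-type", "off",         # TXT Input: Paragraph style
    ]


args = ebook_convert_args("textile.txt", "textile.epub")
print(shlex.join(args))
# To actually run it: subprocess.run(args, check=True)
```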

ldolse · 03-07-2011, 08:38 PM

You're mostly correct except for one part of your solution:

Quote:

Originally Posted by getajob

If you have to use Unicode, then set Input character encoding to UTF-16

UTF-8 is Unicode, so there is no need to use UTF-16 ever. UTF-8 is basically the web and ebook standard for Unicode and is always the best file encoding to use. Just make sure your original file is saved as UTF-8.

Regarding your statement on ANSI, 'ANSI' shouldn't even really be called an encoding - ANSI really means 'encode this based on what country I live in, but make sure only people from the same country as me can read it'. Why Microsoft persists in defaulting all their products to ANSI I'll never understand, but it's the root cause of most people's encoding problems.

It probably wouldn't be terribly difficult to add support for reading the Unicode BOM at the beginning of the file so that Calibre can figure out UTF-8/16/32/LE/BE on its own....
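
To make the point about 'ANSI' concrete, here is a short Python sketch. It shows one and the same byte turning into three different letters depending on which Windows code page the reader assumes, and a likely (though not confirmed for this thread) source of the black-diamond-question-marks: a cp1252 curly apostrophe decoded as UTF-8.

```python
# One byte, three Windows code pages, three different characters.
byte = b"\xf1"
decoded = {cp: byte.decode(cp) for cp in ("cp1252", "cp1251", "cp1253")}
for codepage, char in decoded.items():
    print(codepage, char)

# A curly apostrophe saved as 'ANSI' (cp1252 here) is the single byte 0x92.
ansi_apostrophe = "\u2019".encode("cp1252")

# 0x92 on its own is not valid UTF-8, so a UTF-8 reader substitutes
# U+FFFD, the replacement character usually drawn as a black diamond
# with a question mark.
shown = ansi_apostrophe.decode("utf-8", errors="replace")
print(repr(shown))
```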

user_none · 03-07-2011, 09:00 PM

Quote:

Originally Posted by ldolse

It probably wouldn't be terribly difficult to add support for reading the Unicode BOM at the beginning of the file so that Calibre can figure out UTF-8/16/32/LE/BE on its own....

It's already supposed to...

03-05-2011, 11:43 PM	#1
getajob Junior Member Posts: 7 Karma: 10 Join Date: Oct 2010 Location: Australia Device: Kindle 3, iPhone 3G, iPad 2 (on order)	Trying to use Textile processing I have spent hours trying to convert TXT to EPUB by marking it up with TEXTILE tags. The resulting EPUB file shows no signs of any TEXTILE processing whatsoever. No headings, no linking, no italics or bold. Nothing. I am now admitting defeat. I am posting here in the hope that someone can tell me what is wrong. I am running Calbre 0.7.48 on Windows XP SP3. Basically I have set the TXT input processing to Paragraph style: off Formatting style: textile Here is my input file (I stole this from Perkins): Spoiler: h1. Header 1 p(#fn1r). Here’s a link[1] which should jump to the end footnote. h2. Header 2 The first Robin Hobb trilogy, the _Farseer Trilogy,_ took place in the ??Six Duchies??. It is the tale of +FitzChivalry+ Farseer. p=. !E:/_BOOKS/Images/00004.jpg! The first Robin Hobb trilogy, the Farseer Trilogy, took place in the Six Duchies. h3. Header 3 Now some ^superscript^ followed by ~subscript~ and back to normal. @This should be in Code format.@ @To see what mono font looks like.@ pre. There was a man from hither, Who, when he began to shiver, He gave a cough, His leg dropped off, And floated down the river. * Bullet 1 * Bullet 2 * Bullet 3 # Numbered 1 # Numbered 2 # Numbered 3 And now here follows a horizontal rule <hr> fn1. A footnote is here, which should jump back to first paragraph link. When selecting here. "RETURN":#fn1r p<. Left ??justified?? p=. Center justified p>. Right _justified_ p<>. _This should be fully justified and in bold and italics. This should be fully justified and in bold and italics. 
This should be fully justified and in bold and italics._ Here is my log file from Calibre Spoiler: Convert book 1 of 1 (Textile sample conversion) Resolved conversion options calibre version: 0.7.48 {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0.0, 'book_producer': None, 'change_justification': u'original', 'chapter': u"//[((name()='h1' or name()='h2') and re:test(., 'chapter\|book\|section\|part\\s+', 'i')) or @class = 'chapter']", 'chapter_mark': u'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': u'D:/Calibre Library/debug', 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_split_on_page_breaks': False, 'enable_heuristics': False, 'epub_flatten': False, 'extra_css': None, 'extract_to': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'formatting_type': u'textile', 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x046DB8F0>, 'insert_blank_line': False, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0.0, 'linearize_tables': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markdown_disable_toc': False, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.OutputProfile object at 0x046DBAD0>, 'page_breaks_before': u"//[name()='h1' or name()='h2']", 'paragraph_type': u'off', 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'preserve_spaces': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': 
'c:\\docume~1\\johnbr~1\\locals~1\\temp\\calibre_0 .7.48_tmp_oqc5pb\\calibre_0.7.48_pe3ac_.opf', 'remove_first_image': False, 'remove_paragraph_spacing': True, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': u'', 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': None, 'sr1_search': None, 'sr2_replace': None, 'sr2_search': None, 'sr3_replace': None, 'sr3_search': None, 'tags': None, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'txt_in_remove_indents': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2} InputFormatPlugin: TXT Input running on D:\Calibre Library\Unknown\Textile sample conversion (155)\Textile sample conversion - Unknown.txt Reading text from file... Detected input encoding as windows-1252 with a confidence of 50.0% Running text through textile conversion... Language not specified Creator not specified Building file list... Found files... HTMLFile:0:a:\Calibre Library\Unknown\Textile sample conversion (155)\index.html Normalizing filename cases Rewriting HTML links Parsing index.html ... 
Initial parse failed: Traceback (most recent call last): File "site-packages\calibre\ebooks\oeb\base.py", line 881, in first_pass File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48634) File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:72245) File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:71041) File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67581) File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257) File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178) File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521) XMLSyntaxError: PCDATA invalid Char value 25, line 4, column 53 Parsing file 'index.html' as HTML Forcing index.html into XHTML namespace Input debug saved to: D:\Calibre Library\debug\input Parsed HTML written to: D:\Calibre Library\debug\parsed Merging user specified metadata... Detecting structure... Auto generated TOC with 0 entries. Structured HTML written to: D:\Calibre Library\debug\structure Flattening CSS and remapping font sizes... Source base font size is 12.00000pt Cleaning up manifest... Trimming unused files from manifest... Parsing stylesheet.css ... Processed HTML written to: D:\Calibre Library\debug\processed Creating EPUB Output... Looking for large trees in index.html... No large trees found Generating default cover This EPUB file has no Table of Contents. Creating a default TOC EPUB output written to c:\docume~1\johnbr~1\locals~1\temp\calibre_0.7.48_ tmp_oqc5pb\calibre_0.7.48_wlo7ms.epub I find the "initial parse failed" error message worrying but I cannot see a cause for this. I have processed txt to epub in the past using MARKDOWN with acceptable results, but since I upgraded to 0.7.48, markdown is not working either. Any ideas? John
