LRF output - Page 47

kovidgoyal · 05-05-2008, 10:30 PM

Put all the files in a zip file.

yagiz · 05-05-2008, 10:52 PM

Thanks for the prompt response.

When I downloaded the web page with all its dependencies (so that it refers to local files), it all worked. So that solves my problem.

Just as feedback: when I put the same files in a ZIP file, I get this error:

Processing rob-blog.zip
Traceback (most recent call last):
  File "convert_from.py", line 1922, in <module>
  File "convert_from.py", line 1916, in main
  File "convert_from.py", line 1808, in process_file
  File "convert_from.py", line 266, in __init__
  File "convert_from.py", line 371, in add_file
  File "calibre\ebooks\chardet\__init__.pyo", line 59, in xml_to_unicode
TypeError: decode() argument 1 must be string, not None

Have you ever considered supporting the .mht (Web Archive) format?

Thanks again,
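Editor's note: the TypeError in the traceback above is what Python raises when decode() is handed None where an encoding name is expected. What xml_to_unicode does internally is not shown in this thread, so the following is only a generic sketch of the failure and one possible guard. The names naive_decode and to_unicode and the UTF-8 fallback are assumptions made for illustration, not calibre code.

```python
def naive_decode(raw, detected_encoding):
    # Fails with TypeError when detection found nothing and returned None.
    return raw.decode(detected_encoding)


def to_unicode(raw, detected_encoding, fallback="utf-8"):
    """Decode bytes, tolerating a missing or unusable detected encoding.

    Hypothetical helper: falls back to `fallback` when detection
    returned None, named an unknown codec, or named the wrong one.
    """
    encoding = detected_encoding or fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: keep going,
        # replacing bad bytes instead of aborting the conversion.
        return raw.decode(fallback, errors="replace")
```

With this guard, to_unicode(data, None) decodes as UTF-8 instead of raising, at the cost of possibly guessing the wrong encoding for the file.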

kovidgoyal · 05-05-2008, 11:15 PM

You have to run the zip file through any2lrf.

Also, calibre contains very sophisticated code to download and convert web content; see feeds2lrf and web2lrf.
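Editor's note: for anyone who wants to script the "put all files in a zip file" step, here is a minimal sketch using Python's standard zipfile module. The function name zip_folder and the folder layout are assumptions for illustration. The sketch only builds the archive and says nothing about how any2lrf reads it.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive):
    """Pack every file under `folder` into `archive`, keeping relative paths.

    Relative paths are preserved so that links from the HTML file to its
    images and stylesheets still resolve inside the archive.
    """
    folder = Path(folder)
    archive = Path(archive)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives
            # inside the folder being packed.
            if path.is_file() and path.resolve() != archive.resolve():
                zf.write(path, path.relative_to(folder).as_posix())
    return archive
```

For example, zip_folder("saved_page", "rob-blog.zip") would pack a hypothetical saved_page folder containing the HTML file and its dependencies.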

yagiz · 05-06-2008, 12:05 AM

Wow! I'm impressed with the efficiency of the news and RSS feeds conversion. It works perfectly with the blog site that I was looking for.

Thanks,

barron · 05-26-2008, 06:59 PM

Hello,

When using html2lrf with the --chapter-regex option, can I make the search match the tags themselves instead of what's between them? In other words, if I want anything inside <H1></H1> tags to be a chapter, can I set this up using the --chapter-regex option?

Thanks.

kovidgoyal · 05-26-2008, 09:32 PM

No, chapter-regex only searches the contents of tags. You can force page breaks before tags based on their tag names and attributes, using the various force-page-break options.
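Editor's note: the distinction in the reply above (a regex run against the text inside tags versus a rule keyed on tag names) can be illustrated with a small standalone sketch. This is not html2lrf's implementation. The class ChapterScanner, both patterns, and the sample markup are assumptions made for the example, built on Python's standard html.parser module.

```python
import re
from html.parser import HTMLParser


class ChapterScanner(HTMLParser):
    """Collect matches by text content and by tag name, separately."""

    def __init__(self, text_regex, tag_regex):
        super().__init__()
        self.text_regex = re.compile(text_regex, re.IGNORECASE)
        self.tag_regex = re.compile(tag_regex, re.IGNORECASE)
        self.text_hits = []  # text nodes whose content matches text_regex
        self.tag_hits = []   # tag names that match tag_regex

    def handle_starttag(self, tag, attrs):
        # Tag-name rule: sees "h1", never the text inside it.
        if self.tag_regex.fullmatch(tag):
            self.tag_hits.append(tag)

    def handle_data(self, data):
        # Content rule: sees only the text between tags.
        if self.text_regex.search(data):
            self.text_hits.append(data.strip())


sample = "<h1>Prologue</h1><p>Chapter 1 begins here.</p><h1>The Storm</h1>"
scanner = ChapterScanner(r"chapter\s+\d+", r"h[12]")
scanner.feed(sample)
scanner.close()
```

Here the content pattern matches only the paragraph that contains "Chapter 1", while the tag-name pattern matches both <h1> headings regardless of their text. That is why a heading-based split needs a tag-based rule rather than a content regex.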

marwick · 06-06-2008, 05:25 AM

Maybe I'm asking a stupid question, but I couldn't find the answer using the search engine.

Here it goes: I have a .doc document. I created a TOC by autodetecting its chapters. I saved it and from there created an HTML file. Everything worked and looked fine. Then I converted the HTML to LRF with calibre and the TOC lost all of its title descriptions. This is what I mean:

I. Chapter one title.................5
II: Chapter two title................32
III. etc.

which in the LRF changed to:

LINK.....................5
LINK....................32
LINK..etc

Why am I losing the chapter names? How can I carry them over to the LRF file?

kovidgoyal · 06-06-2008, 12:59 PM

Post the section of the HTML file that contains the TOC.

marwick · 06-08-2008, 12:44 PM

Could it be this part?

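<w:Sdt SdtDocPart="t" DocPartType="Table of Contents" DocPartUnique="t" ID="21517536">
<p class=MsoTocHeading><span lang=ES>Contenido<w:sdtPr></w:sdtPr></span></p>
<p class=MsoToc1 style='tab-stops:right dotted 481.55pt'>
<!--[if supportFields]><span lang=ES><span style='mso-element:field-begin'></span><span style='mso-spacerun:yes'> </span>TOC \o "1-3" \h \z \u <span style='mso-element:field-separator'></span></span><![endif]-->
<span lang=ES><span class=MsoHyperlink><span style='mso-no-proof:yes'>
<a href="#_Toc200384664"><span style='mso-fareast-font-family:"Times New Roman"'>Introducción</span>
<span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'><span style='mso-tab-count:1 dotted'>. </span></span>
<!--[if supportFields]><span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'><span style='mso-element:field-begin'></span></span><span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'> PAGEREF _Toc200384664 \h </span><span style='color:windowtext; display:none;mso-hide:screen;text-decoration:none;text-underline:none'><span style='mso-element:field-separator'></span></span><![endif]-->
<span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'>5</span>
<span style='color:windowtext;display:none; mso-hide:screen;text-decoration:none;text-underline:none'><!--[if gte mso 9]><xml> <w:data>08D0C9EA79F9BACE118C8200AA004BA90B02000000 080000000E0000005F0054006F006300320030003000330038 0034003600360034000000</w:data> </xml><![endif]--></span>
<!--[if supportFields]><span style='color:windowtext; display:none;mso-hide:screen;text-decoration:none;text-underline:none'><span style='mso-element:field-end'></span></span><![endif]-->
</a></span></span></span><span style='mso-ansi-language:CA;mso-fareast-language:CA;mso-no-proof:yes'><o></o></span></p>
<p class=MsoToc1 style='tab-stops:right dotted 481.55pt'><span class=MsoHyperlink><span lang=ES style='mso-no-proof:yes'>
<a href="#_Toc200384665"><span style='mso-fareast-font-family:"Times New Roman"'>CAPITULO I</span>
<span style='color:windowtext;display:none;mso-hide:screen;text-decoration: none;text-underline:none'><span style='mso-tab-count:1 dotted'> </span></span>
<!--[if supportFields]><span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'><span style='mso-element:field-begin'></span></span><span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'> PAGEREF _Toc200384665 \h </span><span style='color:windowtext; display:none;mso-hide:screen;text-decoration:none;text-underline:none'><span style='mso-element:field-separator'></span></span><![endif]-->
<span style='color:windowtext;display:none;mso-hide:screen;text-decoration:none; text-underline:none'>19</span>
<span style='color:windowtext;display:none; mso-hide:screen;text-decoration:none;text-underline:none'><!--[if gte mso 9]><xml> <w:data>08D0C9EA79F9BACE118C8200AA004BA90B02000000 080000000E0000005F0054006F006300320030003000330038 0034003600360035000000</w:data> </xml><![endif]--></span>
<!--[if supportFields]><span style='color:windowtext; display:none;mso-hide:screen;text-decoration:none;text-underline:none'><span style='mso-element:field-end'></span></span><![endif]-->
</a></span></span><span style='mso-ansi-language:CA;mso-fareast-language:CA;mso-no-proof:yes'><o></o></span></p>

etc.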
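Editor's note: one way to check what text each TOC link carries in a Word export is to walk the anchors and separate visible text from text inside hidden spans. The sketch below assumes the export wraps parts of each TOC anchor in spans styled display:none. The class TocLinkText and the shortened sample string are hypothetical, and the thread does not confirm that hidden spans are the cause of the "LINK" entries.

```python
from html.parser import HTMLParser


class TocLinkText(HTMLParser):
    """Collect (href, visible text) for anchors that point at #_Toc targets."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None        # href of the TOC anchor being read, if any
        self._parts = []         # visible text pieces of that anchor
        self._span_hidden = []   # one flag per open span inside the anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and (attrs.get("href") or "").startswith("#_Toc"):
            self._href = attrs["href"]
            self._parts = []
            self._span_hidden = []
        elif tag == "span" and self._href is not None:
            style = (attrs.get("style") or "").replace(" ", "").lower()
            self._span_hidden.append("display:none" in style)

    def handle_endtag(self, tag):
        if tag == "span" and self._href is not None and self._span_hidden:
            self._span_hidden.pop()
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._parts).split())
            self.entries.append((self._href, text))
            self._href = None

    def handle_data(self, data):
        # Keep text only when no enclosing span is hidden.
        if self._href is not None and not any(self._span_hidden):
            self._parts.append(data)


# Shortened, illustrative sample in the style of a Word TOC entry.
sample = r"""<a href="#_Toc200384664"><span style='mso-fareast-font-family:"Times New Roman"'>Introducción</span><span style='color:windowtext;display:none;mso-hide:screen'><span style='mso-tab-count:1 dotted'>. </span></span><!--[if supportFields]><span style='display:none'> PAGEREF _Toc200384664 \h </span><![endif]--><span style='display:none'>5</span></a>"""

parser = TocLinkText()
parser.feed(sample)
parser.close()
```

If the visible text comes back as expected, the titles are present in the HTML and the problem lies further along in the conversion. If it comes back empty, the export itself is the place to look.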

kovidgoyal · 06-08-2008, 04:09 PM

Try saving as a "clean HTML" file from Word. That will remove all the Microsoft junk and give you an easier-to-understand HTML file, and hopefully better conversion results as well.

zelda_pinwheel · 06-08-2008, 04:21 PM

how do you save as a "clean html" file from word ? i didn't know that was possible, but it would be great if it is... i didn't find the option in the "format" dropdown list in the "save as" dialogue.

kovidgoyal · 06-08-2008, 04:25 PM

It depends on the version of Word you have and, unfortunately, I no longer have a copy of Word on hand to check, but I'm sure someone will be able to tell you. At a guess, look at the "export" options.

zelda_pinwheel · 06-08-2008, 04:31 PM

aaah... i suspected as much. i think my version is probably too old (2000). i'll just have to stick to writing my html code by hand. it's worked so far.

JSWolf · 06-08-2008, 06:58 PM

Quote:

Originally Posted by kovidgoyal

Try saving as a "clean HTML" file from Word. That will remove all the Microsoft junk and give you an easier-to-understand HTML file, and hopefully better conversion results as well.

Would it not be better to save as RTF and use rtf2lrf?

kovidgoyal · 06-08-2008, 07:08 PM

rtf2lrf is not really as feature-rich as html2lrf (it basically converts the RTF to HTML and then runs html2lrf on it).
