Old 01-25-2011, 07:55 AM   #1
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Markdown Syntax in Comments

I am seeing cases where the Comments field retrieved by Calibre is in what looks like Markdown syntax (for a simple example look at "Off the Mangrove Coast" by Louis L'Amour). When Calibre adds this comment it appears to enclose it in <div> and <p> tags to support rich text editing in the GUI comments field. I am trying to produce a web page incorporating this comment field, and the presence of the HTML tags stops Markdown processing from operating on the field to give a better formatted comment.

It seems to me that there are several options available:
  • I hard-code knowledge of Calibre's enclosing tags and remove them so that I can apply Markdown processing. I am not enamoured of this approach as any change at the Calibre level is liable to break my code. It also does not benefit the average Calibre user.
  • Calibre does not put those tags around the Comments field if the text is plain text. This would mean that later Markdown processing is possible relatively easily. At the moment I am not sure what the Calibre Content Server does with Markdown syntax in comments (if anything).
  • Calibre processes the markdown at the point of receipt and converts it to (X)HTML before inserting it into the Comments field.
There may be options I have missed - anyone want to comment on that? Also are there related issues that I have missed in my initial analysis?

Of the options I have listed, the most effective to me seems the last one of converting to HTML at the point the Markdown hits Calibre. If that is what others think I will look at raising an appropriate ticket for an enhancement request. I can then look at whether I can work out the Calibre patch to go with the request, but first I thought I would solicit advice on the best way to proceed.
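
For illustration, the first option above (removing the enclosing tags before applying Markdown) might be sketched as follows. This is only a sketch: the wrapper shape `<div><p>...</p></div>` is an assumption based on the description in this post, not taken from Calibre's source, and the function name `unwrap_comment` is hypothetical. Any text sitting inside the `<div>` but outside a `<p>` would be dropped by this version.

```python
import re

# Assumed wrapper shape: <div><p>...</p></div>. The exact markup Calibre
# writes is an assumption here, so a change on the Calibre side may break this.
_WRAPPER = re.compile(
    r"^\s*<div\b[^>]*>\s*(.*?)\s*</div>\s*$",
    re.DOTALL | re.IGNORECASE,
)
_PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)


def unwrap_comment(comment: str) -> str:
    """Return the comment without an enclosing <div> and its <p> tags."""
    match = _WRAPPER.match(comment)
    if match is None:
        # No enclosing <div> found, so leave the content alone.
        return comment.strip()
    inner = match.group(1)
    paragraphs = _PARAGRAPH.findall(inner)
    if not paragraphs:
        return inner.strip()
    # Blank lines between paragraphs are what Markdown expects.
    return "\n\n".join(p.strip() for p in paragraphs)
```

The result can then be handed to whatever Markdown processor the web page uses. As noted in the post, hard-coding the wrapper like this is fragile.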
Old 01-25-2011, 08:18 AM   #2
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
You might want to go to the metadata download plugin, customize the Amazon plugin and check the box Convert comments downloaded from Amazon to plain text.
Old 01-25-2011, 08:31 AM   #3
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
The metadata downloading code has been through a lot of flux lately. Markdown was used to cleanse HTML comments, and then the comments are supposed to be converted back to HTML. I think there must be a bug where they are not getting converted back to HTML. But other bugs have been fixed: for a while any text that was also an href was getting deleted, and that at least seems to be resolved.
Old 01-25-2011, 08:35 AM   #4
itimpi
Wizard
Quote:
Originally Posted by dwanthny View Post
You might want to go to the metadata download plugin, customize the Amazon plugin and check the box Convert comments downloaded from Amazon to plain text.
I already have that set!
Old 01-25-2011, 08:36 AM   #5
itimpi
Wizard
Quote:
Originally Posted by ldolse View Post
The metadata downloading code has been through a lot of flux lately. Markdown was used to cleanse html comments, and then the comments are supposed to be converted back to html. I think there must be a bug where it's not getting converted back to html. But other bugs have been fixed - for a while any text that was also an href was getting deleted, that at least seems to be resolved.
Does that suggest that it should already work as I wanted - and that if it does not it is probably a bug?
Old 01-25-2011, 08:41 AM   #6
ldolse
Wizard
Quote:
Originally Posted by itimpi View Post
Does that suggest that it should already work as I wanted - and that if it does not it is probably a bug?
I'm not the one who's been working in that area of the code, so I'm not entirely sure of the maintainer's intentions, but my suspicion is it's a bug.

That said - if you have the preference to convert Amazon comments to plain text maybe that's why it's markdown syntax now - if you disable that checkbox and re-download the metadata does it convert the markdown back to html?
Old 01-25-2011, 08:49 AM   #7
itimpi
Wizard
Quote:
Originally Posted by ldolse View Post
That said - if you have the preference to convert Amazon comments to plain text maybe that's why it's markdown syntax now - if you disable that checkbox and re-download the metadata does it convert the markdown back to html?
If I change the Amazon plugin to not convert to plain text then the immediate problem disappears, as I end up with HTML rather than Markdown formatted comments. However, I still think there is a bug: if a comment is meant to be plain text then it should be stored as plain text, not as plain text inside an HTML wrapper that prevents Markdown processing.
Old 01-25-2011, 09:01 AM   #8
ldolse
Wizard
Actually maybe the problem is on your web page/script side then - markdown is supposed to allow html tags anywhere in the middle of the text. I took advantage of this pretty heavily with a markdown book I was editing. Is it actually the markdown interpreter bailing out when it sees a tag, or is it whatever function you have passing the comments over to markdown that's the problem?
Old 01-25-2011, 09:09 AM   #9
itimpi
Wizard
Quote:
Originally Posted by ldolse View Post
Actually maybe the problem is on your web page/script side then - markdown is supposed to allow html tags anywhere in the middle of the text. I took advantage of this pretty heavily with a markdown book I was editing. Is it actually the markdown interpreter bailing out when it sees a tag, or is it whatever function you have passing the comments over to markdown that's the problem?
Actually not quite true! The Markdown spec specifically says that it does not process text inside an HTML block tag - and both the <DIV> and (I think) <P> fall into this category. I went looking when my attempt to process the Comments stored in Calibre was failing and found this statement. If Calibre had not stored the comment in a hybrid form of Markdown syntax inside an HTML block tag I would have been OK. That was why one of the options I mentioned in my original post was the brute force one of programmatically removing the Calibre enclosing tags before applying the Markdown processing.

Interestingly enough - I notice the markdown spec says that text inside <SPAN> tags is subject to markdown processing.
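
A small sketch of how a script might check for this situation before deciding whether the wrapper has to be removed. It uses only Python's standard `html.parser`. The `BLOCK_TAGS` set is a partial list chosen for this illustration, and the names `_FirstTag` and `starts_with_block_tag` are hypothetical.

```python
from html.parser import HTMLParser

# A partial list of block-level tags, chosen for this illustration only.
BLOCK_TAGS = {"div", "p", "table", "pre", "blockquote", "ul", "ol"}


class _FirstTag(HTMLParser):
    """Record the first start tag, unless real text appears before it."""

    def __init__(self):
        super().__init__()
        self.first_tag = None
        self.text_first = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <DIV> arrives as "div".
        if self.first_tag is None and not self.text_first:
            self.first_tag = tag

    def handle_data(self, data):
        if self.first_tag is None and data.strip():
            self.text_first = True


def starts_with_block_tag(comment: str) -> bool:
    """True if the comment opens with a block-level HTML tag."""
    parser = _FirstTag()
    parser.feed(comment)
    parser.close()
    return parser.first_tag in BLOCK_TAGS
```

A comment for which this returns True would need its wrapper removed before a Markdown processor will touch the text inside, whereas one that opens with a <SPAN> or with plain text could be passed through as it is.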

Last edited by itimpi; 01-25-2011 at 09:14 AM.
Old 01-25-2011, 09:16 AM   #10
ldolse
Wizard
Quote:
Originally Posted by itimpi View Post
Actually not quite true! The Markdown spec specifically says that it does not process text inside an HTML block tag - and both the <DIV> and (I think) <P> fall into this category. I went looking when my attempt to process the Comments stored in Calibre was failing and found this statement. If Calibre had not stored the comment in a hybrid form of Markdown syntax inside an HTML block tag I would have been OK. That was why one of the options I mentioned in my original post was the brute force one of programmatically removing the Calibre enclosing tags before applying the Markdown processing.
I had a suspicion that that might be the case after I posted that... Hmm.. brute force may be the only option short of tying a new feature into Calibre. I suspect the surrounding tags are a requirement of the rich text editor as you noted.
Old 01-25-2011, 09:19 AM   #11
ldolse
Wizard
Quote:
Originally Posted by itimpi View Post
Actually not quite true! The Markdown spec specifically says that it does not process text inside an HTML block tag - and both the <DIV> and (I think) <P> fall into this category. I went looking when my attempt to process the Comments stored in Calibre was failing and found this statement. If Calibre had not stored the comment in a hybrid form of Markdown syntax inside an HTML block tag I would have been OK. That was why one of the options I mentioned in my original post was the brute force one of programmatically removing the Calibre enclosing tags before applying the Markdown processing.

Interestingly enough - I notice the markdown spec says that text inside <SPAN> tags is subject to markdown processing.
Actually - now that I think about it, I've got lots of markdown formatted content inside of Div tags in the book I've been editing and it gets converted fine. I don't have any <p> tags though, they may be the real killer - I wonder if the rich text editor could be changed to use <br />?

Span tags don't surprise me - very much an inline element.
Old 01-25-2011, 11:28 AM   #12
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Comments are migrated to HTML irrespective of their download source. They are stored as HTML in the database. You're going to have to work around that in your script. The HTML used is typically very simple, so it should be trivial to have a function that strips it. calibre uses the html2text Python library for that. You can use it in your own script with

calibre-debug -c
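
As a rough illustration of the kind of stripping function described above, here is a minimal sketch that uses only Python's standard `html.parser` instead of html2text, so it runs outside calibre. The names `_TextExtractor` and `strip_html` are hypothetical, and the assumption that paragraph breaks come only from <div>, <p> and <br> tags is a guess at the "very simple" HTML mentioned, not something confirmed from calibre's code.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, inserting breaks at <div>, <p> and <br>."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "p"):
            self.parts.append("\n\n")  # paragraph break
        elif tag == "br":
            self.parts.append("\n")  # line break

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(comment: str) -> str:
    """Return the text of a simple HTML comment with all tags removed."""
    parser = _TextExtractor()
    parser.feed(comment)
    parser.close()
    text = "".join(parser.parts)
    # Nested tags leave runs of blank lines; collapse them to one.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

For example, `strip_html("<div><p>First &amp; second</p><p>Third</p></div>")` gives the two paragraphs separated by a blank line, which is a form a Markdown processor can work on. Unlike html2text, this sketch discards inline formatting such as bold or links rather than converting it to Markdown syntax.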
All times are GMT -4.