12-03-2015, 07:20 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2015
Device: Kindle for iOS
|
DOCX to XHTML
Hello,
I guess this is the perfect trifecta for me. I am completely new to Python, completely new to the Calibre API, and this is my first post, so please be gentle :-) After using Calibre, I found the conversion from DOCX to XHTML to be one of the best solutions out there. I would like to dig into this code a bit to see how exactly it works and perhaps even make some suggested improvements (specifically, I am looking at how lists are managed). Being so new to the code base, I am not sure where to start. All the references I could find to the DOCX conversion seem to be configuration files for the plugin, not the actual conversion code. Could someone point me to where the actual conversion from DOCX to XHTML lives? Also, it would be helpful to know if an external library is used to support this. Thanks! |
12-03-2015, 10:23 PM | #2 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Download the calibre source code and look in the docx folder. Start with
docx/to_html.py |
12-03-2015, 11:34 PM | #3 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
See the development guide, which is definitely the best place to start:
http://manual.calibre-ebook.com/develop.html
The code that reads and writes DOCX is in src/calibre/ebooks/docx/ |
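For anyone following along, here is a minimal sketch of how one might list the Python modules in that folder once the source has been downloaded. It assumes only the src/calibre/ebooks/docx/ location mentioned above; the function name `list_docx_modules` and the `checkout_root` argument are made up for illustration and are not part of calibre.

```python
from pathlib import Path

# Assumption: the DOCX code sits at this location relative to the root of the
# downloaded calibre source tree, as stated in the reply above.
DOCX_SUBDIR = Path("src") / "calibre" / "ebooks" / "docx"


def list_docx_modules(checkout_root):
    """Return the sorted relative paths of all .py files under the DOCX folder.

    `checkout_root` is a hypothetical argument: the directory where the
    calibre source code was unpacked or cloned.
    """
    docx_dir = Path(checkout_root) / DOCX_SUBDIR
    if not docx_dir.is_dir():
        raise FileNotFoundError(
            f"{docx_dir} not found; is this a calibre source tree?"
        )
    # rglob walks subfolders too, so nested packages are included.
    return sorted(
        path.relative_to(docx_dir).as_posix() for path in docx_dir.rglob("*.py")
    )
```

Calling `list_docx_modules("/path/to/calibre")` on a real source tree should include `to_html.py`, the file suggested as a starting point earlier in the thread.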
12-04-2015, 10:20 AM | #4 |
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2015
Device: Kindle for iOS
|
Thank you. Unfortunately, I could not find the DOCX conversion process in the API documentation, which is why I asked, but I am sure it is just because I am new.
|
12-04-2015, 10:52 AM | #5 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
The generated API documentation only covers the parts that people normally use in plugins or for news recipes.
You are going a little deeper than that. But the development guide also explains how the source code is structured. Not to worry -- questions aren't unexpected! |