Old 06-07-2012, 11:03 PM   #1
SauliusP.
Plugin developer
Posts: 108
Karma: 24394
Join Date: Feb 2012
Location: Lithuania
Device: Kindle
DOCX Input and DOCX Metadata Reader

The updated and maintained plugin thread is here:
https://www.mobileread.com/forums/sho....php?p=2107703
---------------------------------------

Hello,

I have made the DOCX Metadata Reader and DOCX Input plugins for my own purposes and, well, maybe someone else will find them useful too. As an article writer I have lots of DOCX files and tried to find a good free alternative for DOCX to EPUB or MOBI conversion. However, the good EPUB tools are not free, and Amazon's conversion service did not satisfy me: it makes the formatting crappy and not "book-like". So here they are, my own conversion tools. Please feel free to use them for your own purposes. Development will continue and I will keep adding new features. I was quite surprised that there is no other such plugin for Calibre, as the DOCX format is comparatively simple.

DOCX Metadata Reader simply reads metadata from a DOCX file when it is added to the Calibre library or when the appropriate button is pushed in the book's details editor. The very first picture (if there is one) is used as the cover.
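
For anyone curious how such metadata can be read: a DOCX file is a ZIP archive, and the title and author normally live in docProps/core.xml. The sketch below is not the plugin's actual code, only a minimal illustration using the Python standard library. The names read_docx_metadata and make_sample_docx are hypothetical helpers invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used inside docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_docx_metadata(stream):
    """Return title and author from a DOCX file or file-like object.

    Hypothetical helper for illustration; not taken from the plugin.
    """
    with zipfile.ZipFile(stream) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", default="", namespaces=NS),
        "author": root.findtext("dc:creator", default="", namespaces=NS),
    }


def make_sample_docx():
    """Build a tiny in-memory archive holding only core.xml (demo data)."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:title>My Article</dc:title>"
        "<dc:creator>Jane Doe</dc:creator>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf
```

Calling read_docx_metadata(make_sample_docx()) returns a dictionary with the title and author of the demo document. A real DOCX opened with open(path, "rb") can be passed in the same way.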

PLANS:
Add an options dialogue to turn cover extraction on/off.



DOCX Input plugin converts a DOCX file to OEB (if I'm not mistaken, a bunch of HTML files with an OPF file and CSS stylesheets). Calibre then converts it to any format it supports. My main target is MOBI, but no hacks are included for better support of it.
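
To give a rough idea of the DOCX to HTML step: the body text of a DOCX lives in word/document.xml, where w:p elements are paragraphs and w:t elements hold the text runs. The following is only a simplified sketch of that idea, not the plugin's code. It ignores styles, images, tables and footnotes, and the function name document_xml_to_html is made up for this example.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_xml_to_html(document_xml):
    """Turn the paragraphs of word/document.xml into plain <p> elements.

    Hypothetical helper: text only, all formatting is ignored.
    """
    root = ET.fromstring(document_xml)
    parts = []
    for p in root.iter("{%s}p" % W):
        # A paragraph's text may be split across several runs; join them.
        text = "".join(t.text or "" for t in p.iter("{%s}t" % W))
        parts.append("<p>%s</p>" % html.escape(text))
    return "\n".join(parts)


SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>A &amp; B</w:t></w:r></w:p>"
    "</w:body></w:document>" % W
)
```

Running document_xml_to_html(SAMPLE) gives two <p> lines, with the two runs of the first paragraph joined and the ampersand escaped again for HTML.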

SUPPORTED FEATURES
1. Word styles are converted to CSS and filtered (only styles that are in use are converted).
2. Paragraph properties: left and right indents, first-line indent, last rendered page break (this might be a manual page break, a style-based page break, a section break etc.).
3. Images. Limitation: pictures with text wrapping are floated to the left only, as position calculation is a Word feature I don't like.
4. Tables (including nested tables inside a cell).
5. Everything up to the first rendered page break is considered to be "a cover", since most of the documents I convert include some type of cover followed by a manual page break.
6. Font embedding of DejaVu Serif (included in the plugin itself).
7. Footnotes are saved into individual HTML files and superscript links to them are added.
8. Paragraphs that have TOC-level styles applied (Heading 1, 2 etc., or custom ones) are converted to HTML tags of the appropriate level: h1, h2 etc.
9. Font sizes are converted to pt (the same value you see in Word itself).
10. Indents are converted to em (it just looks better).
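
As an illustration of points 9 and 10: DOCX stores font sizes (w:sz) in half-points and indents (w:ind) in twentieths of a point, so both need a small conversion before they go into CSS. The sketch below is not the plugin's code. It assumes the em value is calculated relative to the paragraph's own font size, which may differ from what the plugin really does, and all function names are hypothetical.

```python
def half_points_to_pt(sz):
    """Convert a w:sz value (half-points) to points, e.g. "24" to 12.0."""
    return int(sz) / 2


def twips_to_em(twips, font_size_pt):
    """Convert a w:ind value (1/20 pt) to em.

    Assumption: em is taken relative to the paragraph's font size.
    """
    return round(int(twips) / 20 / font_size_pt, 2)


def paragraph_css(sz, left_twips, first_line_twips):
    """Build a CSS declaration string for one paragraph (illustrative)."""
    pt = half_points_to_pt(sz)
    return "font-size: {:g}pt; margin-left: {:g}em; text-indent: {:g}em".format(
        pt, twips_to_em(left_twips, pt), twips_to_em(first_line_twips, pt)
    )
```

For a 12 pt paragraph (w:sz of 24) with a left indent of 720 and a first-line indent of 360, paragraph_css("24", "720", "360") gives "font-size: 12pt; margin-left: 3em; text-indent: 1.5em".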

NOT SUPPORTED
1. Manual line breaks.
2. Lists (bulleted and numbered). However, for that purpose I use a Word macro (also attached) that converts all lists to plain text (bullets and numbers are preserved).
3. Table styling. For now only collapsed 1px black borders are hard-coded.
4. Font-face styling. Only the DejaVu Serif font face is supported, for conversion to formats with embedded fonts (like EPUB).
5. Footnote back-links.
6. Endnotes are not supported and support is not planned. If required, I convert all endnotes to footnotes beforehand.
7. Other fancy things like vector graphics, OLE objects, effects etc. Not planned either.

PLANNED
1. Options dialogue: cover conversion modes (until first page break, or use Calibre's), font embedding on/off, switching font-size units: em, pt, px, %.
2. Font face support (but in the far future).
3. List support (including lists that are interrupted and then continued).
4. Line breaks.
5. Conversion of hard space (code 160), en dash and em dash to HTML entities.
6. Table styling (if not too difficult).
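
Planned item 5 could look roughly like the sketch below. This is only one possible approach, not existing plugin code. It uses numeric character references instead of named entities like &nbsp;, on the assumption that they are safer in strict XHTML, and the names ENTITIES and to_html_entities are made up for this example.

```python
# Map the three characters to numeric character references.
# Assumption: numeric references are preferred over &nbsp; etc.
ENTITIES = {
    "\u00a0": "&#160;",   # hard (non-breaking) space, code 160
    "\u2013": "&#8211;",  # en dash
    "\u2014": "&#8212;",  # em dash
}

_TABLE = str.maketrans(ENTITIES)


def to_html_entities(text):
    """Replace hard spaces and dashes with character references."""
    return text.translate(_TABLE)
```

All other characters pass through unchanged, so the function can be applied to text that has already been escaped for HTML.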


USING
To get the best results, Calibre should be tuned a bit.
1. To generate a TOC, go to Common Options, Table of Contents, and add expressions for the HTML headings (use the wizard or enter //h:h1 for Level 1 TOC, //h:h2 for Level 2 and //h:h3 for Level 3).
2. For EPUB conversion, go to the EPUB output options and tick "No default cover" and "No SVG cover".
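
To see what the //h:h1 style expressions from step 1 select, here is a small sketch that runs the equivalent queries outside Calibre. The h prefix stands for the XHTML namespace. The example uses Python's standard xml.etree.ElementTree, which needs the relative form .//h:h1 rather than //h:h1. The sample document and the headings function are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The "h" prefix in //h:h1 refers to the XHTML namespace.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Part One</h1><h2>Chapter 1</h2><p>Text</p><h2>Chapter 2</h2>
</body></html>"""


def headings(xhtml, level):
    """Return the text of all headings of the given level (1 to 6)."""
    root = ET.fromstring(xhtml)
    found = root.findall(".//h:h%d" % level, XHTML_NS)
    return ["".join(el.itertext()) for el in found]
```

With the sample above, headings(SAMPLE, 1) returns the single Level 1 entry and headings(SAMPLE, 2) returns the two chapter titles, which is the structure the TOC would be built from.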

All critiques, crash reports and suggestions are most welcome, but I will not be quick with responses or with developing new features. At the moment I'm quite satisfied with the plugins.

Last edited by SauliusP.; 09-28-2012 at 05:32 AM.