#211 |
Grand Sorcerer
Posts: 5,150
Karma: 18509109
Join Date: Dec 2010
Device: Kindle PW2
|
Calibre is FOSS, cross-platform and has a much better DOCX-to-Epub filter than the default LibreOffice Epub export filter.
|
#212 |
Addict
Posts: 357
Karma: 1360945
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
|
#213 |
Imperfect Perfectionist
Posts: 206
Karma: 571018
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
|
Reading the previous posts, I can't help asking (and this may ultimately belong in another forum) whether anyone has tried this one:
https://daisy.org/activities/software/wordtoepub/
It's Windows-only, uses Pandoc as its engine, and produces accessible EPUB3s from any properly formatted .docx. It works as an MS Word plugin or standalone.
I have tried it, and though I don't like its formatting of images, it might work for others. And it's probably better at producing epubs, not least for newbies, than all those more or less esoteric procedures that have been invented over the years for lack of anything better.
Regards, Kim
Last edited by elibrarian; Yesterday at 10:42 AM. |
#214 | |
Zealot
Posts: 105
Karma: 5276
Join Date: Feb 2013
Device: Asus Zen Pad
|
But it's nice to know. |
|
#215 | |
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
|
That is just fine.
Thank you!
Kevin
|
|
#216 | |
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
|
Pandoc is an open-source (GPL), fully cross-platform document converter that counts both epub2 and epub3 among its supported formats, along with docx and odt, so there is no need for a Windows-only approach to generating epub3s via pandoc.
So I wonder why Daisy is producing its pandoc-based converter for Windows only? Seems strange from an accessibility point of view.
For those interested in trying pandoc, check out pandoc.org.
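As a rough sketch of the cross-platform route mentioned in this post, pandoc can be driven from a short Python script. The function names and the file names `book.docx` and `book.epub` are hypothetical; `-f docx`, `-t epub3`, and `-o` are standard pandoc options. This is only an illustration, not how WordToEPUB itself calls pandoc.

```python
import shutil
import subprocess


def build_pandoc_command(source, target, pandoc="pandoc"):
    """Build the argument list for a docx -> epub3 conversion."""
    return [pandoc, "-f", "docx", "-t", "epub3", "-o", target, source]


def convert_docx_to_epub3(source, target):
    """Run pandoc if it is on the PATH; return False if it is not installed."""
    pandoc = shutil.which("pandoc")
    if pandoc is None:
        return False
    # check=True raises CalledProcessError if pandoc reports a failure.
    subprocess.run(build_pandoc_command(source, target, pandoc), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the command line.
    print(" ".join(build_pandoc_command("book.docx", "book.epub")))
```

The same script runs unchanged on Windows, macOS, and Linux wherever pandoc is installed.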
|
|
#217 |
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
|
I have taken Ashjuk's reworked tutorial_convert_to_html chapter (thank you, Banjo and Ashjuk), expanded the bit about Calibre, and added a bit about Pandoc (since between the two of them I doubt there is a major format that is not covered!).
I have updated the first post with the updated version: src_updated_20210305_01.epub
Thanks! |
#218 |
Addict
Posts: 357
Karma: 1360945
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
|
|
#219 |
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
|
For those interested ...
I ran a few tests using my father-in-law's memoirs from his time escaping Poland as a boy immediately after the war to come to Canada. They were originally in Word docx format (some 34 MB in size due to lots of photos, maps, and tables). Of course, not one style was used when the Word document was first written. As others have said, effectively using styles in Word is not something many people seem to do.
I tried DOCXImport, pandoc from docx to epub3, and LibreOffice using both "Save a Copy" and "Export as EPUB (to EPUB3)"; then I tried pandoc from the odt directly to epub3. Some of these did not work well, but a combination of them did.
For example, DOCXImport and pandoc both barfed on the images that had originally been jpegs but somehow became .emf (or something like that?) image files inside the docx. Pandoc just passed them along as if they were a core media type in epub3 (they are not, and no browser supports them, at least on macOS). DOCXImport just copied in image placeholders, which is mentioned in its docs and so is expected behaviour.
Pandoc messed up again when trying to convert from odt to epub3, generating only a single empty html file and no images at all. An epic fail, with no error messages generated at all.
But when I used LibreOffice to read the .docx directly and then used "Save a Copy" to export it to html, LibreOffice nicely converted all of the unreadable images to .gif files (which I can easily change to png), so none of the images were lost! That said, the resulting html file was a bit messy with style tags in places, but ...
The LibreOffice "Export as EPUB (to EPUB3)" route dropped the table of contents completely for some reason and was messier than the "Save a Copy" approach.
So the best overall "input" was obtained by running DOCXImport and then overwriting the images with those generated by LibreOffice's "Save a Copy".
By combining the two different approaches I got a very clean, very nice set of files ready to be cleaned up, have styles added, etc.
So if there were any way to add an (.emf?) to .gif (or better yet .png) converter to the DOCXImport plugin, that would make something very clean and nice. I will try to see if I can find one.
In the future, when faced with this task again, I will probably try multiple approaches and then grab the best bits and pieces from each of them to get the parts I want. Especially for images.
Hope this helps.
Edit: It seems LibreOffice has a headless mode that can be used to convert from .emf to png painlessly. So if you have LibreOffice installed on Linux (and in your PATH) or macOS, the following will do the conversions directly from the command line:
Linux:
Code:
libreoffice --headless --convert-to png image.emf
macOS:
Code:
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to png image.emf
I am thinking about seeing if I can modify the DOCXImport plugin to check for LibreOffice being installed and convert the image files on the fly when I get a few free moments. If it is not easy to modify mammoth, then perhaps via post-processing the html and image files.
Python's PIL is said to work with the older .wmf format and to convert them to svg (since they may contain both text and images) as well, but I have not tried it and others have reported some issues.
Last edited by KevinH; Yesterday at 01:24 PM. Reason: add part about convert from .emf to png |
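The "check for LibreOffice and convert on the fly" idea in this post could look roughly like the Python sketch below. It is not the DOCXImport plugin's code; the function names and the `images` folder are hypothetical. It reuses the headless command from the post and adds LibreOffice's `--outdir` option so the PNG lands next to the source file.

```python
import shutil
import subprocess
from pathlib import Path

# Default macOS install location, as used in the command above.
MAC_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def find_soffice():
    """Return a LibreOffice executable, or None if it cannot be found."""
    for name in ("libreoffice", "soffice"):
        found = shutil.which(name)
        if found:
            return found
    if Path(MAC_SOFFICE).is_file():
        return MAC_SOFFICE
    return None


def build_convert_command(soffice, image, outdir):
    """Build the headless .emf -> .png command for one image."""
    return [soffice, "--headless", "--convert-to", "png",
            "--outdir", str(outdir), str(image)]


def convert_emf_images(folder):
    """Convert every .emf in folder; return the expected .png paths."""
    soffice = find_soffice()
    if soffice is None:
        return []  # LibreOffice not installed: leave the images untouched
    converted = []
    for image in sorted(Path(folder).glob("*.emf")):
        subprocess.run(build_convert_command(soffice, image, folder), check=True)
        converted.append(image.with_suffix(".png"))
    return converted


if __name__ == "__main__":
    print(convert_emf_images("images"))  # "images" is a hypothetical folder
```

Returning an empty list when LibreOffice is missing keeps the conversion optional, which matches the "check for LibreOffice being installed" idea rather than making it a hard dependency.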
#220 | |
Wizard
Posts: 1,711
Karma: 7810595
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
"WordToEPUB Extended Tutorial – Accessible EPUB in Seconds"
"Do more with WordToEPUB"
I remember watching a few when they first announced the tool, but I haven't personally used it.
Probably Windows-only because of Microsoft Word (the Mac version of Word is a mess). It is also about making conversion much more user-friendly (pandoc isn't the greatest).
Last edited by Tex2002ans; Yesterday at 02:03 PM. |
|
#221 |
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
|
FWIW, the current Mac version of Word (Office 365) supports plugins and VBA and JavaScript macros just like the Windows version. I can see very little difference between them anymore. Not like in the old days, when they dropped VBA macro support, then added it back but poorly, had a very different interface, etc.
|
#222 |
Zealot
Posts: 105
Karma: 5276
Join Date: Feb 2013
Device: Asus Zen Pad
|
Another cross-platform image conversion tool is ImageMagick. It is strictly command-line driven, but very powerful. ImageMagick is a suite of tools that manipulate images. The most used commands are "convert" and "mogrify". Converting lots of files at once can be done with a single command, e.g.
Code:
mogrify -format tif *.png |
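When the same batch conversion is launched from a script rather than typed into a shell, the `*.png` wildcard is not expanded automatically. A small Python sketch (hypothetical function names, assuming ImageMagick's `mogrify` is on the PATH) expands the pattern itself before calling the tool:

```python
import glob
import subprocess


def build_mogrify_command(files, fmt="tif"):
    """Build the mogrify call that converts the given files to fmt."""
    return ["mogrify", "-format", fmt, *files]


def convert_all(pattern="*.png", fmt="tif"):
    """Expand the wildcard in Python, then convert every match in one call."""
    files = sorted(glob.glob(pattern))
    if not files:
        return []  # nothing matched, so do not run mogrify at all
    subprocess.run(build_mogrify_command(files, fmt), check=True)
    return files


if __name__ == "__main__":
    print(convert_all("*.png", "tif"))
```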
#223 | |||
Wizard
Posts: 1,711
Karma: 7810595
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
In ~2017, he obsoleted those 2 PNG optimizers and made pingo + pinga (a GUI version), which handles JPG/PNG/WEBP/APNG/[...]:
https://www.css-ig.net/pingo
I haven't used this program much though. pingo's focus is on extremely high speed + fantastic compression, whereas TruePNG had extreme compression, even if it took a MUCH longer time.
Side Note: I actually just tested pingo, and it's absolutely fantastic. Blows away TruePNG. He's really done a fantastic job making it even better.
(Of course, when optimizing, always make backup copies just in case, because it overwrites the original files.)
It's just like zipping up a Word document: You wouldn't want all your letters to get completely scrambled when you zip it up! (Lossy) You want every letter in its exact same spot! (Lossless)
When you take an image with millions of colors and smush it into 256 (Indexed)... it's like you're taking all of your Unicode text and trying to smash it into ASCII:
JPG/lossy conversion is like you're throwing away all the accents + trying to pick English letters that LOOK similar to the Greek.
Sure, it may look "close", but you'll never get back to the original.
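The Unicode-to-ASCII analogy can be tried directly. The sketch below is only an illustration of lossy conversion on text, not anything an image tool does: the helper name `to_ascii` is made up, and it simply drops what it cannot represent rather than picking look-alike letters.

```python
import unicodedata


def to_ascii(text):
    """Lossy: split accented letters apart, then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


original = "café naïve"
flattened = to_ascii(original)

print(flattened)              # cafe naive
print(flattened == original)  # False: the accents cannot be recovered
```

Once the accents are gone, nothing in `flattened` says where they used to be, which is the sense in which a lossy conversion can look "close" but never gets back to the original.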
Imagine it like a ZIP file. When you compress, you can usually move a slider from:
The higher the compression, the smaller the filesize. BUT it takes longer to gather all the data and figure out the best way to shrink it. Imagine these PNG optimization tools like a hidden:
But even here, you have:
- - - - - - -
Basics of Lossless Compression
Simple Compression
For example, let's say you have a string of 100 1's:
Code:
1 1 1 1 1 1 1 1 1 1 1 1 [...] 1
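One simple way to shrink a run like this is run-length encoding: store the value once together with how many times it repeats. The Python sketch below is only an illustration of that idea (hypothetical function names); real PNG files use filtering plus DEFLATE rather than plain run-length encoding.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


ones = [1] * 100
encoded = rle_encode(ones)

print(encoded)                      # [(1, 100)]
print(rle_decode(encoded) == ones)  # True: nothing was lost
```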
Slight Compression
Let's say you saw this pattern:
Code:
1 2 3 4 5 6 7 8 9 10 11 12 [...] 100.
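A counting pattern like this has no repeated values, but the step between neighbours is always the same. Assuming that is the point of the example, a sketch of delta encoding (hypothetical function names) stores each value as its difference from the previous one. PNG's "Sub" filter works on a similar principle, storing each byte relative to the corresponding byte of the pixel to its left.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each step to the next value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    return list(accumulate(deltas))


counting = list(range(1, 101))   # 1 2 3 ... 100
deltas = delta_encode(counting)  # one hundred 1's
runs = [(value, len(list(run))) for value, run in groupby(deltas)]

print(runs)                              # [(1, 100)]
print(delta_decode(deltas) == counting)  # True: still lossless
```

After the delta step the data is one long run of 1's again, so it compresses as well as the simpler example.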
More Complicated Compression
PNG has a ton of other methods to compress images like this that aren't normally used... Like instead of only searching for patterns by row/column, it can also look diagonally + weird shapes like a 'z' + entire blocks at a time. It can also do funky things with the math to get the same exact answer in the end.
Example: Simple Image
Let's say you had a B&W image with an all white background with a single black pixel in the middle. The PNG would look something like this:
Spoiler:
255 = pure white
0 = pure black
(Note: In reality: RGB = 3 colors, so white would be "255 255 255" + black would be "0 0 0".)
Indexing the Image
A tool like pingo might look at the above and say: "Hmmm, there are only 2 colors here. Let me Index this."
Spoiler:
and let me say:
1 = the color white
0 = the color black
(Note: And when you multiply everything by 255, you get an exact match of the original!)
Optimizing the Optimization
But then it can scratch its head, and go even one step further. White is used 48/49 times, and black is barely used.
Spoiler:
0 = the color white
1 = the color black
Now it can just say something like:
Huge lines of zero compress INCREDIBLY WELL. This PNG would barely take any space now.
So GIMP/Photoshop only try the easy stuff, even at "Maximum"... pingo and other optimization tools try more of the crazy tools in the toolbelt.
Last edited by Tex2002ans; Yesterday at 09:52 PM. |
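The indexing and "swap the indexes" steps from this post can be sketched in a few lines of Python. This is a toy model, not what pingo actually does internally: the 7x7 size is an assumption taken from the "48/49" figure, the pixels are single grayscale values, and the function names are hypothetical.

```python
from collections import Counter

WHITE, BLACK = 255, 0

# Assumed 7x7 grayscale image: all white with one black pixel in the middle.
pixels = [WHITE] * 49
pixels[24] = BLACK


def index_image(pixels):
    """Map each distinct value to a small index, most frequent value first."""
    palette = [value for value, _ in Counter(pixels).most_common()]
    lookup = {value: index for index, value in enumerate(palette)}
    return palette, [lookup[p] for p in pixels]


def restore_image(palette, indexed):
    """Undo the indexing exactly, so the step is lossless."""
    return [palette[i] for i in indexed]


palette, indexed = index_image(pixels)

print(palette)                                    # [255, 0]: white is index 0
print(indexed.count(0), indexed.count(1))         # 48 1
print(restore_image(palette, indexed) == pixels)  # True
```

Because the most common colour gets index 0, the indexed data is almost entirely zeros, which is the kind of input that a later compression step handles very well.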
|||