Yesterday, 07:00 AM   #211
Doitsu
Grand Sorcerer
Posts: 5,150
Karma: 18509109
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Ashjuk
What I was actually looking for was a stand-alone program that would take a pre-saved docx (odt?) file and convert it to html, similar to the ones I have linked to for Windows and Mac.
Calibre is FOSS, cross-platform and has a much better DOCX-to-Epub filter than the default LibreOffice Epub export filter.
Yesterday, 08:27 AM   #212
Ashjuk
Addict
Posts: 357
Karma: 1360945
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
Quote:
Originally Posted by Doitsu
Calibre is FOSS, cross-platform and has a much better DOCX-to-Epub filter than the default LibreOffice Epub export filter.
Thanks Doitsu,

I see there is already a reference to Calibre at the end of the chapter that mentions its conversion capabilities.
Yesterday, 10:07 AM   #213
elibrarian
Imperfect Perfectionist
Posts: 206
Karma: 571018
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
Reading the previous posts, I can't help asking (and this may ultimately belong in another forum), if anyone has tried this one:

https://daisy.org/activities/software/wordtoepub/

It's Windows-only, uses Pandoc as its engine, and produces accessible EPUB 3 files from any properly formatted .docx. It works as an MS Word plugin or standalone.

I have tried it, and though I don't like its formatting of images, it might work for others. And it's probably better at producing EPUBs, not least for newbies, than all those more or less esoteric procedures that have been invented over the years for lack of anything better.

Regards,

Kim

Last edited by elibrarian; Yesterday at 10:42 AM.
Yesterday, 10:18 AM   #214
Banjo
Zealot
Posts: 105
Karma: 5276
Join Date: Feb 2013
Device: Asus Zen Pad
Quote:
Originally Posted by Doitsu
FYI: the current LibreOffice versions come with an Epub export filter that allows users to export both epub2 and epub3 books.
Unfortunately, the epubs that it generates will fail EPUBCheck and not all LibreOffice document properties are supported.
That is interesting. I never use LibreOffice "Export As" anymore so I didn't notice that "Export As EPUB" is in there. When I write an ePub I do it directly in Sigil to avoid the junk that the other tools put into the output, so I don't use these much.

But it's nice to know.
Yesterday, 10:52 AM   #215
KevinH
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
That is just fine.

Thank you!

Kevin


Quote:
Originally Posted by Ashjuk
I have edited the tutorial_convert_to_html file to include the LibreOffice information provided by Banjo (thanks) and the small change I wanted to make.

Hope it is OK.

https://www.mobileread.com/forums/at...1&d=1614936507
Yesterday, 11:17 AM   #216
KevinH
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
Pandoc is an open source, GPL, totally cross-platform document converter that supports epub2 and epub3 as well as docx and odt, so there is no need for a Windows-only approach to generating epub3s via pandoc.

So I wonder why Daisy is producing its pandoc-based converter for Windows only?

Seems strange from an Accessibility point of view.

For those interested in trying pandoc, check out pandoc.org.
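For anyone who wants to script this, here is a minimal Python sketch that builds (and optionally runs) a Pandoc command converting a .docx or .odt file to EPUB 3. It assumes `pandoc` is installed and on your PATH; `-f`, `-t epub3` and `-o` are standard Pandoc options, while the helper name `pandoc_to_epub3` and the file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_to_epub3(src, dest=None, run=False):
    """Build (and optionally run) a Pandoc command converting .docx/.odt to EPUB 3.

    Hypothetical helper; assumes the `pandoc` executable is installed and on PATH.
    """
    src = Path(src)
    if src.suffix.lower() not in (".docx", ".odt"):
        raise ValueError(f"expected a .docx or .odt file, got {src.name}")
    dest = Path(dest) if dest else src.with_suffix(".epub")
    # -f: input format (taken from the extension), -t epub3: EPUB 3 output, -o: output file
    cmd = ["pandoc", "-f", src.suffix.lower().lstrip("."), "-t", "epub3",
           "-o", str(dest), str(src)]
    if run:
        if shutil.which("pandoc") is None:
            raise FileNotFoundError("pandoc not found on PATH")
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(pandoc_to_epub3("input.docx"))
```

Building the argument list separately from running it makes it easy to print the exact command first and try it by hand on a copy of the document.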


Quote:
Originally Posted by elibrarian
It's Windows-only, uses Pandoc as its motor, and produces accessible epub3's from any properly formatted .docx - works as an MS Word plugin, or standalone.
Yesterday, 12:11 PM   #217
KevinH
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
I have taken Ashjuk's reworked tutorial_convert_to_html chapter (thank you, Banjo and Ashjuk), expanded the bit about Calibre and added a bit about Pandoc (since between the two of them I doubt there is a major format that is not covered!).

I have updated the first post with the updated version:

src_updated_20210305_01.epub

Thanks!
Yesterday, 12:30 PM   #218
Ashjuk
Addict
Posts: 357
Karma: 1360945
Join Date: May 2011
Location: Surrey, UK
Device: Kobo Aura One, Sony PRS 600/650
Quote:
Originally Posted by KevinH
I have updated the first post with the updated version:

src_updated_20210305_01.epub

Thanks!
Looks good, Kevin - I think between us we have just about covered all of the methods of conversion to html.
Yesterday, 12:56 PM   #219
KevinH
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
For those interested ...

I ran a few tests using my father-in-law's memoirs about escaping Poland as a boy immediately after the war and coming to Canada. They were originally in Word docx format (some 34 MB in size due to lots of photos, maps and tables). Of course, not one style was used when the Word document was first written. As others have said, using styles effectively in Word is not something many people seem to do.

I tried DOCXImport, pandoc from docx to epub3, LibreOffice using "Save as Copy" and "Export as EPUB (to EPUB3)", then I tried pandoc from the odt directly to epub3.

Some of these did not work well but a combination of them did.

For example, DOCXImport and Pandoc both barfed on the images that had originally been JPEGs but somehow became .emf (or something like that?) image files inside the docx.

Pandoc just passed them along as if they were a core image format in epub3 (they are not, and no browser supports them, at least on macOS). DOCXImport just copied in image placeholders, which is mentioned in its docs, so that was expected behaviour.

Pandoc messed up again when trying to convert from odt to epub3, generating only a single empty html file and no images at all. An epic fail with no error messages generated at all.

But when I used LibreOffice to read the .docx directly and then used "Save Copy" to export it to HTML, LibreOffice nicely converted all of the unreadable images to .gif files (which I can easily change to PNG), so none of the images were lost! That said, the resulting HTML file was a bit messy, with style tags in places, but ...

The LibreOffice "Export to EPUB3" dropped the table of contents completely for some reason and was messier than the "Save Copy" approach.

So the best overall "input" was obtained by starting from the DOCXImport output and then overwriting its images with the ones LibreOffice generated via "Save Copy".

By combining the two approaches I got a very clean, very nice set of files, ready to be cleaned up, have styles added, etc.

So if there were any way to add an .emf (?) to .gif converter (or better yet, .png) to the DOCXImport plugin, that would make for something very clean and nice.

I will try to see if I can find one.

In the future, when faced with this task again, I will probably try multiple approaches and then grab the best bits and pieces from each of them, especially for images.

Hope this helps.

Edit:

It seems LibreOffice allows a headless mode that can be used to convert from .emf to png painlessly.

So if you have LibreOffice installed on Linux (and in your PATH) or macOS, the following will do the conversions directly from the command line:

Linux:
Code:
libreoffice --headless --convert-to png image.emf
macOS:
Code:
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to png image.emf
Both will place the image.png file right beside the input image.emf file. I am sure that LibreOffice for Windows can do something similar.
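To convert a whole folder of extracted images rather than one file at a time, here is a minimal Python sketch wrapping the same headless conversion. It assumes LibreOffice's `--headless`, `--convert-to` and `--outdir` command-line options and the macOS path quoted above; the executable names checked on PATH and the helper names (`find_soffice`, `emf_to_png_command`, `convert_emfs`) are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# macOS location quoted in the post; Linux installs usually put `libreoffice`/`soffice` on PATH.
MAC_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def find_soffice():
    """Return a LibreOffice executable path, or None if none can be found (assumed names)."""
    for name in ("libreoffice", "soffice"):
        exe = shutil.which(name)
        if exe:
            return exe
    return MAC_SOFFICE if Path(MAC_SOFFICE).exists() else None


def emf_to_png_command(files, outdir, soffice="soffice"):
    """Build one headless command converting every .emf in `files` to .png inside `outdir`."""
    emfs = [str(f) for f in files if Path(f).suffix.lower() == ".emf"]
    if not emfs:
        return None
    return [soffice, "--headless", "--convert-to", "png", "--outdir", str(outdir), *emfs]


def convert_emfs(image_dir):
    """Convert all .emf files in image_dir, writing the PNGs beside the originals."""
    image_dir = Path(image_dir)
    soffice = find_soffice()
    if soffice is None:
        raise FileNotFoundError("LibreOffice not found on PATH or in /Applications")
    cmd = emf_to_png_command(sorted(image_dir.iterdir()), image_dir, soffice)
    if cmd:
        subprocess.run(cmd, check=True)
```

Passing all the files to a single LibreOffice invocation avoids starting the program once per image; keeping the command builder separate makes it easy to check what will run.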

I am thinking about seeing if I can modify the DOCXImport plugin to check for LibreOffice being installed and convert the image files on the fly when I get a few free moments.

If it is not easy to modify mammoth, then perhaps via post-processing the HTML and image files.
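A rough sketch of the post-processing route: after the .emf files have been converted, rewrite `src="...emf"` references in the XHTML to point at the .png files. This is not part of DOCXImport or mammoth; the regex approach, the helper names and the folder layout passed to `rewrite_folder` are assumptions.

```python
import re
from pathlib import Path

# Matches src="...something.emf" (or single quotes), case-insensitively.
EMF_SRC = re.compile(r"""(src\s*=\s*["'])([^"']+)\.emf(["'])""", re.IGNORECASE)


def rewrite_emf_refs(html):
    """Point image references ending in .emf at a .png with the same base name."""
    return EMF_SRC.sub(r"\1\2.png\3", html)


def rewrite_folder(text_dir):
    """Apply the rewrite to every .xhtml/.html file in text_dir (hypothetical layout)."""
    for page in Path(text_dir).iterdir():
        if page.suffix.lower() in (".xhtml", ".html"):
            original = page.read_text(encoding="utf-8")
            updated = rewrite_emf_refs(original)
            if updated != original:
                page.write_text(updated, encoding="utf-8")
```

This only touches the HTML: the PNG files themselves still have to be added to the book, and the old .emf files and their manifest entries removed.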

Python's PIL is said to work with the older .wmf format and to convert them to svg (since they may contain both text and images) as well, but I have not tried it and others have reported some issues.

Last edited by KevinH; Yesterday at 01:24 PM. Reason: add part about convert from .emf to png
Yesterday, 01:43 PM   #220
Tex2002ans
Wizard
Posts: 1,711
Karma: 7810595
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KevinH
So I wonder why Daisy is only producing its pandoc based converter for Windows only?

Seems strange from an Accessibility point of view.
They recently released a few new videos about it on YouTube:

"WordToEPUB Extended Tutorial – Accessible EPUB in Seconds"
"Do more with WordToEPUB"

I remember watching a few when they first announced the tool, but I haven't personally used it.

Probably Windows-only because of Microsoft Word (the Mac version of Word is a mess).

It also makes conversion much more user-friendly (pandoc isn't the greatest).

Last edited by Tex2002ans; Yesterday at 02:03 PM.
Yesterday, 02:20 PM   #221
KevinH
Sigil Developer
Posts: 5,123
Karma: 3265870
Join Date: Nov 2009
Device: many
FWIW, the current Mac version of Word (Office 365) supports plugins as well as VBA and JavaScript macros, just like the Windows version. I can see very little difference between them anymore. Not like in the old days, when they dropped VBA macro support, then added it back but poorly, had a very different interface, etc.
Yesterday, 02:38 PM   #222
Banjo
Zealot
Posts: 105
Karma: 5276
Join Date: Feb 2013
Device: Asus Zen Pad
Another cross-platform image conversion tool is ImageMagick. It is strictly command-line driven, but very powerful. ImageMagick is a suite of tools that manipulate images; the most used commands are "convert" and "mogrify". Converting lots of files at once can be done with a single command, e.g.

Code:
mogrify -format tif *.png
converts all the .png files into .tif files. It handles an enormous number of file formats and transformations.
Yesterday, 09:29 PM   #223
Tex2002ans
Wizard
Posts: 1,711
Karma: 7810595
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Ashjuk
I don't know what application you are using but I downloaded what is supposedly the number 1 free lossless compression tool - PNGGauntlet - and running one of my uncompressed images through it resulted in an almost identical file size to the original.
TruePNG is what I used to get the ~22% further compression. It was an older tool created by the same guy who created ScriptPNG.

In ~2017, he obsoleted those 2 PNG optimizers and made:

pingo + pinga (a GUI version), which handles JPG/PNG/WEBP/APNG/[...]:

https://www.css-ig.net/pingo

I haven't used this program much though.

pingo's focus is on extremely high speed + fantastic compression.

TruePNG, on the other hand, had extreme compression, even if it took MUCH longer.

Side Note: I actually just tested pingo, and it's absolutely fantastic. Blows away TruePNG.

He's really done a fantastic job making it even better.

(Of course, always make backup copies before optimizing, just in case: it overwrites the original files.)

Quote:
Originally Posted by Ashjuk
I'm not saying you are wrong as you obviously have much more knowledge and experience than I, but following the above suggestions has not worked for me.
And again, the fantastic thing about lossless images is you'll get out EXACTLY what you put in.

It's just like zipping up a Word document:

You wouldn't want all your letters to get completely scrambled when you zip it up! (Lossy)

You want every letter in its exact same spot! (Lossless)

When you take an image with millions of colors and smush it into 256 (Indexed)... it's like you're taking all of your Unicode text and trying to smash it into ASCII:
  • résumé the crème brûlée (Unicode)
  • resume the creme brulee (ASCII)
  • αβγδεζηθικλμνξοπρστυφχψω (Unicode)
  • abgdezethiklmnksoprstufchpso (ASCII)

JPG/lossy conversion is like you're throwing away all the accents + trying to pick English letters that LOOK similar to the Greek.

Sure, it may look "close", but you'll never get back to the original.
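The same distinction in a few lines of Python, as an illustrative sketch: zlib (the DEFLATE compressor PNG uses internally) gives back exactly the input, while a naive "fold to ASCII" step cannot be undone. The `to_ascii` helper is hypothetical and only strips accents.

```python
import unicodedata
import zlib


def to_ascii(text):
    """Lossy: split accented letters into base letter + accent, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


original = "résumé the crème brûlée"

# Lossless: compress, decompress, and get back exactly the same text.
packed = zlib.compress(original.encode("utf-8"))
restored = zlib.decompress(packed).decode("utf-8")

# Lossy: the accents are gone, and nothing records where they used to be.
folded = to_ascii(original)

print(restored == original)  # True
print(folded)                # resume the creme brulee
```

Note that this naive folding does not even transliterate Greek the way the hand-made example above does: `to_ascii("αβγδ")` simply returns an empty string.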

Quote:
Originally Posted by Ashjuk
I have just tried another compression tool - PngOptimizer - on the largest of the image files (tutorial-main-window.png) and the optimised file was approximately 10% smaller.

Whilst there has been some saving in my mind it is not that significant, and less than half of the ~22% you had mentioned.
Most of these tools go for speed + easy wins over super compression.

Imagine it like a ZIP file.

When you compress, you can usually move a slider from:
  • "No compression" -> "Maximum compression"

The higher the compression, the smaller the filesize. BUT it takes longer to gather all the data and figure out the best way to shrink it.

Think of these PNG optimization tools as a hidden:
  • "Even maximum-er-er compression"

But even here, you have:
  • different quality tools (most stink)
  • diminishing returns
  • super duper new techniques
    • Like in 2013, Google came up with a newer compression algorithm (Zopfli). It's able to compress PNGs ~10% further, but at the cost of ~80 times more CPU time.
    • For example, if PNGGauntlet's latest release was 2012, there's no way it even has this technique in there.
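The "slider" can be seen directly with Python's zlib module, whose levels 0 to 9 drive the same DEFLATE compression that PNG stores its pixel data in. A minimal sketch; the sample data is made up and deliberately repetitive, not a real image. Zopfli produces DEFLATE streams that ordinary zlib can still decompress, but it is not in the standard library, so it is not shown.

```python
import zlib

# Made-up, fairly repetitive sample data (an assumption, not a real image).
data = b"".join(b"row %d: " % (i % 7) + b"white " * (i % 13) for i in range(5000))

# Level 0 only stores the data; higher levels search harder for repeats.
sizes = {level: len(zlib.compress(data, level)) for level in (0, 1, 6, 9)}

print(f"original: {len(data)} bytes")
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 comes out slightly larger than the input because it adds headers without compressing anything; the gap between level 6 and level 9 is usually far smaller than the gap between 0 and 1, which is the "diminishing returns" point above.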

- - - - - - -

Basics of Lossless Compression

Simple Compression

For example, let's say you have a string of 100 1's:

Code:
1 1 1 1 1 1 1 1 1 1 1 1 [...] 1
Instead of storing all 100 numbers, you could instead say:
  • "put 100 ones in a row here"
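The "put 100 ones in a row here" idea is run-length encoding. A minimal Python sketch follows; PNG itself does not use plain RLE (it relies on DEFLATE, which records repeats like this as back-references), so treat it purely as an illustration of the principle.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


ones = [1] * 100
encoded = run_length_encode(ones)
print(encoded)                             # [(1, 100)]
print(run_length_decode(encoded) == ones)  # True
```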

Slight Compression

Let's say you saw this pattern:

Code:
1 2 3 4 5 6 7 8 9 10 11 12 [...] 100.
the PNG can say:
  • "Start at 1, and keep adding 1 for the next 99 positions."
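"Start at 1, and keep adding 1" is delta encoding: store each value as its difference from the one to its left. That is the idea behind PNG's "Sub" filter; this sketch is simplified (one value per pixel, no modulo-256 wraparound). After differencing, the counting sequence becomes a single run, which the run-length idea from the previous example collapses completely.

```python
from itertools import accumulate, groupby


def delta_encode(values):
    """Keep the first value, then store each value as its difference from the left neighbour."""
    return values[:1] + [right - left for left, right in zip(values, values[1:])]


def delta_decode(deltas):
    """A running total undoes the differencing exactly."""
    return list(accumulate(deltas))


counting = list(range(1, 101))   # 1 2 3 ... 100
deltas = delta_encode(counting)  # 1 1 1 ... 1
runs = [(value, len(list(group))) for value, group in groupby(deltas)]

print(runs)                              # [(1, 100)]
print(delta_decode(deltas) == counting)  # True
```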

More Complicated Compression

PNG has a ton of other methods to compress images like this that aren't normally used...

Like instead of only searching for patterns by row/column, it can also look diagonally + weird shapes like a 'z' + entire blocks at a time.

It can also do funky things with the math to get the same exact answer in the end.
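One concrete, well-documented case of PNG "looking diagonally" is the Paeth filter, one of the five per-row filters (None, Sub, Up, Average, Paeth) that PNG applies before DEFLATE. It predicts each byte from its left, upper and upper-left neighbours and stores only the difference from that prediction. A simplified sketch for single-byte grayscale pixels; the helper names are illustrative.

```python
def paeth_predictor(a, b, c):
    """PNG's Paeth predictor: a = left, b = above, c = upper-left (diagonal) neighbour."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def paeth_filter_row(row, prev_row):
    """Filter one row of 1-byte grayscale pixels with Paeth (simplified sketch)."""
    out = []
    for i, x in enumerate(row):
        a = row[i - 1] if i > 0 else 0       # left; 0 before the first pixel
        b = prev_row[i]                      # directly above
        c = prev_row[i - 1] if i > 0 else 0  # upper-left
        out.append((x - paeth_predictor(a, b, c)) % 256)
    return out


# A smooth gradient turns into a row of identical small numbers, which compress well.
print(paeth_filter_row([20, 30, 40, 50], [10, 20, 30, 40]))  # [10, 10, 10, 10]
```

Optimizers typically experiment with which filter to use on each row; a filter that turns a row into repeated small values usually helps DEFLATE.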

Example: Simple Image

Let's say you had a B&W image with an all-white background and a single black pixel in the middle.

The PNG would look something like this:

Code:
255 255 255 255 255 255 255
255 255 255 255 255 255 255
255 255 255 255 255 255 255
255 255 255 0   255 255 255
255 255 255 255 255 255 255
255 255 255 255 255 255 255
255 255 255 255 255 255 255


255 = pure white
0 = pure black

(Note: In reality, RGB has 3 channels, so white would be "255 255 255" and black would be "0 0 0".)
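The same 7x7 example as raw bytes, in a short Python sketch: one byte per pixel for grayscale, three for RGB, and the zlib-compressed size of each. A real PNG also adds a filter byte per row plus chunk headers, so these are not exact PNG file sizes.

```python
import zlib

WHITE, BLACK = 255, 0
SIZE = 7

# The 7x7 grayscale example: all white except the centre pixel.
pixels = [[WHITE] * SIZE for _ in range(SIZE)]
pixels[SIZE // 2][SIZE // 2] = BLACK

gray = bytes(v for row in pixels for v in row)                   # 1 byte per pixel
rgb = bytes(v for row in pixels for v in row for _ in range(3))  # 3 bytes per pixel (R, G, B)

print(len(gray), len(rgb))  # 49 147
print(len(zlib.compress(gray, 9)), len(zlib.compress(rgb, 9)))
```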

Indexing the Image

A tool like pingo might look at the above and say:

"Hmmm, there are only 2 colors here. Let me Index this."

Code:
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 0 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1


and let me say:

1 = the color white
0 = the color black

(Note: And when you multiply everything by 255, you get an exact match of the original!)
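Indexing as a minimal Python sketch: store each distinct colour once in a palette and replace every pixel with its position in that palette. Sorting the palette by colour value is a choice made here so that black gets 0 and white gets 1, matching the grid above. In a real indexed PNG the palette lives in the PLTE chunk (at most 256 entries), and indices can be stored at 1, 2, 4 or 8 bits per pixel, so a two-colour image needs only 1 bit per pixel.

```python
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE] * 7 for _ in range(7)]
image[3][3] = BLACK


def index_image(pixels):
    """Lossless indexing: store each distinct colour once, and each pixel as an index."""
    palette = sorted({colour for row in pixels for colour in row})
    lookup = {colour: i for i, colour in enumerate(palette)}
    return palette, [[lookup[colour] for colour in row] for row in pixels]


def unindex_image(palette, indexed):
    """Look each index back up in the palette to recover the exact original."""
    return [[palette[i] for i in row] for row in indexed]


palette, indexed = index_image(image)
print(palette)                                   # [(0, 0, 0), (255, 255, 255)]
print(indexed[3])                                # [1, 1, 1, 0, 1, 1, 1]
print(unindex_image(palette, indexed) == image)  # True
```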

Optimizing the Optimization

But then it can scratch its head, and go even one step further.

White is used 48 of 49 times, and black only once.

Code:
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0


0 = the color white
1 = the color black

Now it can just say something like:
  • "this entire image is color 0... except one teeny dot."

Huge lines of zero compress INCREDIBLY WELL.

This PNG would barely take any space now.
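The reordering step can be sketched by building the palette in order of how often each colour occurs, so the most common colour gets index 0. This shows only the reordering; how much it saves in a real PNG depends on the encoder, the filters and the bit depth, so treat it as illustrative.

```python
from collections import Counter

WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[WHITE] * 7 for _ in range(7)]
image[3][3] = BLACK


def index_by_frequency(pixels):
    """Build the palette with the most common colour first, so it gets index 0."""
    counts = Counter(colour for row in pixels for colour in row)
    palette = [colour for colour, _ in counts.most_common()]
    lookup = {colour: i for i, colour in enumerate(palette)}
    return palette, [[lookup[colour] for colour in row] for row in pixels]


palette, indexed = index_by_frequency(image)
print(palette)     # [(255, 255, 255), (0, 0, 0)]
print(indexed[3])  # [0, 0, 0, 1, 0, 0, 0]
```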

So GIMP/Photoshop only try the easy stuff, even at "Maximum"... pingo and other optimization tools try more of the crazy tools in the toolbelt.

Last edited by Tex2002ans; Yesterday at 09:52 PM.
All times are GMT -4. The time now is 01:53 PM.


MobileRead.com is a privately owned, operated and funded community.