Old 07-30-2017, 05:06 PM   #1
lostinlodos
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
Breaking down a massive file

Hi. I'm not sure where to post this, so feel free to move me!
I created a document in Microsoft Word that I then converted to ePub in Calibre. It's 56XX pages long and image heavy.
I now have two problems. First, I accidentally deleted (shredded) the folder I was using for various page things, including my source docx.
Second, my file is too big for most readers, at 160MB.
iBooks crashes, and Totalreader stalls for a long time (almost half an hour) on each load.
It displays fine in most of my PC apps, so it's definitely a size or length problem.
I've tried converting the ePub back to docx, but Microsoft Word can't load it (invalid file). In fact, Atlantis is the only word processor that can load Calibre's docx. Resaving from Atlantis doesn't help; Word still can't load the result.

So, the question is: is there an easy, direct, automated way to split my ePub into, say, 500-page chunks? (I'm not worried about where it splits at the moment; I can fix cut-off sentences and such manually.) Something like drag and drop, type 500, click and walk away?
Or is there another text-based format I could use to do this? PDF works and gives me a "valid" output, but the formatting is messed up, and I'm simply trying not to have to rearrange and reformat more than 5000 pages.
Please, any help would be wonderful!
Old 07-30-2017, 05:17 PM   #2
fbrzvnrnd
Fanatic
Posts: 556
Karma: 400004
Join Date: Feb 2009
Device: ONYX M96
Quote:
Originally Posted by lostinlodos View Post
So, the question is: is there an easy, direct, automated way to split my ePub into, say, 500-page chunks? (I'm not worried about where it splits at the moment; I can fix cut-off sentences and such manually.) Something like drag and drop, type 500, click and walk away?

You can use Sigil and its split marker, the string Sigil uses to break chapters. For example, you can search for the title tags (h1 or h2; you have to look at the code to see which) and add the marker string in front of each one. After you have added the string, you can ask Sigil to split the file at those places.
Old 07-30-2017, 05:37 PM   #3
BetterRed
null operator (he/him)
Posts: 20,935
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
There's a plugin for Calibre that will split an epub into multiple epubs; you could give it a try: EpubSplit

BR
Old 07-30-2017, 06:11 PM   #4
Turtle91
A Hairy Wizard
Posts: 3,184
Karma: 18843349
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
160 MB is immense when it comes to an ebook. Even the very large, picture-intensive children's books that I've seen are around 20-25MB. I would guess that the size of your image files is the main culprit. You can/should reduce the images to about 1600px in height and/or use a better image format to cut the size. That is really all the image size/quality you need with today's reading apps/devices. If you absolutely must have the full-size images, then you could host them on a network server and include links in the ebook.

As far as splitting the ePub into smaller files....yes...definitely! The smaller the file size the faster the device will load and display each new file. Most people will split the html file at a new chapter - 20 Chapters = 20 files plus files for cover, title, acknowledgments, copyright, etc.

You can use a simple search and replace to find new chapter headings...or if you have the html tags set to use the <h1>-<h6> for different headings you can split there.

If you are using Sigil - and I'm pretty sure Calibre has the same or similar functionality - you can use the following search and replace:

search: <h1   (or <h3>Chapter, etc.)
replace: <hr class="sigil_split_marker" /><h1   (or <hr class="sigil_split_marker" /><h3>Chapter, etc.)

Then you can use the function "Split at Markers (F6)" in Sigil to automatically split at all those marked locations.

Alternatively...you can use "Split at Cursor (Ctrl-return)" to split up the document manually.
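
For anyone who would rather script the marker insertion outside Sigil, here is a minimal sketch of the same search and replace in Python. The function name add_split_markers and the sample markup are made up for illustration; only the marker string itself comes from the post above. It assumes the chapter headings really are plain <h1> tags, which needs checking against the actual code first.

```python
import re

# The marker string Sigil looks for when splitting (taken from the post above).
SPLIT_MARKER = '<hr class="sigil_split_marker" />'

def add_split_markers(xhtml: str, heading: str = "h1") -> str:
    """Insert the split marker in front of every opening <heading> tag.

    Hypothetical helper: matches the tag with or without attributes,
    e.g. <h1> or <h1 class="title">.
    """
    pattern = re.compile(r"(<%s\b[^>]*>)" % re.escape(heading), re.IGNORECASE)
    return pattern.sub(lambda m: SPLIT_MARKER + m.group(1), xhtml)

# Made-up sample markup with two chapter headings.
sample = '<h1>Chapter 1</h1><p>Text</p><h1 class="t">Chapter 2</h1><p>More</p>'
marked = add_split_markers(sample)
print(marked)
```

Note that this also puts a marker in front of the very first heading, so it may be worth deleting that first marker by hand before splitting.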


Hope that helps!
Old 07-30-2017, 06:41 PM   #5
lostinlodos
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
Edited, this isn't going to work: see photo.

Thanks both of you. It's not exactly what I had in mind, but it looks like it would work, since there are bolded chapter titles which should have converted to headings. I hope.
Going to try EpubSplit shortly. Given that I can't get Atlantis to do what I want here, I'm running out of software options before I attempt reworking via PDF.

Yes, it's massive, though not the largest I've seen, but it's not intended for general distribution. Electronically it's only going to a few people. It's not set up to be consumable at the moment.
Well, at the moment it's not set up to be anything, since nothing portable loads it. Lol.
I know the length is what's tripping up Totalreader, since it can handle files many times this size, so it needs to be split.
Is there nothing out there like HTML split and join, but for ePubs?
Attached Thumbnails
Name: Clipboard02.png (452.8 KB)

Last edited by lostinlodos; 07-30-2017 at 07:13 PM. Reason: Tried it out
Old 07-30-2017, 11:44 PM   #6
BetterRed
null operator (he/him)
Posts: 20,935
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
I've made occasional use of pdfSAM (Split And Merge) - the Lite version is free.

You cannot split HTML at random points and expect to do anything sensible with it; it needs to be split on a boundary of some sort. Try using calibre's recently added calibre-debug --explode-book and --implode-book options (see the calibre-debug documentation).

That should allow you to use standard html and other text editing tools.
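
As a rough sketch of how the explode/implode round trip could be driven from a script: the code below only builds the two command lines and runs them if calibre-debug is found on the PATH. The file and folder names are placeholders, and the argument order (book then folder for explode, folder then book for implode) is as I understand it from the calibre-debug help text, so check `calibre-debug --help` before relying on it.

```python
import shutil
import subprocess

def explode_cmd(epub_path: str, folder: str) -> list:
    """Command that unpacks the book into a folder of editable files."""
    return ["calibre-debug", "--explode-book", epub_path, folder]

def implode_cmd(folder: str, epub_path: str) -> list:
    """Command that rebuilds a book from a previously exploded folder."""
    return ["calibre-debug", "--implode-book", folder, epub_path]

def run_if_available(cmd: list) -> bool:
    """Run the command only when the executable exists; report whether it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True

# Placeholder names, not taken from the thread.
print(" ".join(explode_cmd("big_book.epub", "big_book_src")))
print(" ".join(implode_cmd("big_book_src", "big_book_fixed.epub")))
```

Between the two steps, the exploded folder can be edited with ordinary HTML and text tools.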

BR
Old 07-31-2017, 02:24 AM   #7
fbrzvnrnd
Fanatic
Posts: 556
Karma: 400004
Join Date: Feb 2009
Device: ONYX M96
Quote:
Originally Posted by lostinlodos View Post
Edited, this isn't going to work: see photo.
Try the Sigil way I suggested to you.
Old 07-31-2017, 06:23 AM   #8
JSWolf
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Use calibre to edit the eBook. Given that it came from Word, chances are there is a lot of garbage in the ePub that you can remove. Also check the size of the images. If they are beyond 1500 lines, reduce them in size to 1500 lines. Calibre can do this in the editor. If there are any embedded fonts, remove them. Make this as streamlined as possible. Also, check the code using epubcheck. There is a Calibre editor plugin that lets you run epubcheck on the eBook. Also, check the CSS and remove any excess lines from the classes. And if you see things like <p class="someclass">, replace it with <p> and define <p> in the CSS as what <p> is going to be most of the time.
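
To see quickly whether images are what makes the file so big, one option is to list the largest image entries in the ePub, since an ePub is a ZIP archive. This is only a sketch using Python's standard zipfile module; the helper name largest_images, the extension list, and the demo file names are assumptions for illustration.

```python
import io
import zipfile

# Assumed set of image extensions; extend as needed.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".svg")

def largest_images(epub, top=10):
    """Return (name, uncompressed_bytes) for the biggest image entries.

    `epub` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        images = [(info.filename, info.file_size)
                  for info in zf.infolist()
                  if info.filename.lower().endswith(IMAGE_EXTS)]
    return sorted(images, key=lambda item: item[1], reverse=True)[:top]

# Demo on a tiny in-memory archive with made-up entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/images/map1.png", b"x" * 5000)
    zf.writestr("OEBPS/images/map2.jpg", b"x" * 9000)
    zf.writestr("OEBPS/text/ch1.xhtml", "<p>hi</p>")
buf.seek(0)

report = largest_images(buf)
for name, size in report:
    print(f"{size:>10} bytes  {name}")
```

On a real book, pass the path to the .epub file instead of the in-memory buffer.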

Last edited by JSWolf; 07-31-2017 at 06:26 AM.
Old 07-31-2017, 12:21 PM   #9
lostinlodos
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
Thanks everyone. I've got more options now. I think I'll give Sigil a try. For whatever reason I never got the hang of it, so maybe now is a good time to learn.
I've used HTML split and join since the 90s and never had a problem with it, but I've never used it on anything fancier than standard text and graphics pages. So I'm not really sure what would happen if I extracted the ePub and ran it through split and join. I should probably try that too.
I'm trying not to mess with the images at all because they're maps.
Sigil here I come.
Old 07-31-2017, 02:30 PM   #10
salamanderjuice
Guru
Posts: 727
Karma: 10216666
Join Date: Jul 2017
Device: Boox Nova 2
Quote:
Originally Posted by Turtle91 View Post
160 MB is immense when it comes to an ebook. Even the very large, picture-intensive children's books that I've seen are around 20-25MB. I would guess that the size of your image files is the main culprit. You can/should reduce the images to about 1600px in height and/or use a better image format to cut the size. That is really all the image size/quality you need with today's reading apps/devices. If you absolutely must have the full-size images, then you could host them on a network server and include links in the ebook.

I don't think the total file size is the issue. I have graphic novels over 300 MB that my Kobo and its various apps happily handle. Overly complex HTML/CSS is a more likely culprit, especially at 5600+ pages and converted from a Word document.

Unless the images are absolutely massive, that is. But with a 5600+ page document there could just be a few hundred 0.5 MB images, which would be pretty typical e-reader-sized images.
Old 07-31-2017, 05:21 PM   #11
lostinlodos
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
Quote:
Originally Posted by salamanderjuice View Post
I don't think the total file size is the issue. I have graphic novels over 300 MB that my Kobo and its various apps happily handle. Overly complex HTML/CSS is a more likely culprit, especially at 5600+ pages and converted from a Word document.

Unless the images are absolutely massive, that is. But with a 5600+ page document there could just be a few hundred 0.5 MB images, which would be pretty typical e-reader-sized images.
I figured the size isn't the issue for Totalreader, since I've opened much larger books in it, such as some server manuals in the near-gigabyte range. It's that Totalreader does a layout scan on open, and at 5600+ pages, that's what takes it forever.
Hence my attempting to split this in the first place. I'm going to try Sigil as mentioned above shortly, and see if I can find a way using that.
Old 07-31-2017, 06:24 PM   #12
Turtle91
A Hairy Wizard
Posts: 3,184
Karma: 18843349
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Sigil has a merge and split function.


Once you get it to open in Sigil:

- Use the search and replace function mentioned above to insert the <hr class="sigil_split_marker" /> markers at your desired locations
- Highlight all the files in the Book Browser, right-click and select Merge... you will have one very large file
- Select F6 (or Edit/Split at Markers) and Sigil will re-split the document at your marked locations
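
To illustrate what the re-split step amounts to, here is a small Python sketch that cuts one merged chunk of markup at the marker lines. It is not Sigil's implementation (Sigil also rebuilds each file's head section and updates the book's manifest); split_at_markers and the sample markup are made up for the example.

```python
import re

# Matches the split marker, tolerating minor whitespace differences.
MARKER_RE = re.compile(r'<hr\s+class="sigil_split_marker"\s*/>')

def split_at_markers(body: str) -> list:
    """Cut merged body markup at each marker and drop empty pieces."""
    parts = MARKER_RE.split(body)
    return [part for part in parts if part.strip()]

# Made-up merged markup containing one marker between two chapters.
merged = ('<h1>One</h1><p>a</p>'
          '<hr class="sigil_split_marker" />'
          '<h1>Two</h1><p>b</p>')

chunks = split_at_markers(merged)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk)
```

Each resulting chunk corresponds to what would become one separate file in the book.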


Although I would certainly listen to salamanderjuice's advice and check that your css/html is not overly complex....I've seen some very heinous code-bloat created by software programs that you would think are better than that.

Eliminating complex html/css can drastically reduce the number of "pages" - but I would still look at your images. Even maps can be losslessly reduced significantly. I get very good results with cover images that are 1600 x 600-ish pixels at around 200KB or less.

Cheers,