07-30-2017, 05:06 PM | #1 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
|
Breaking down a massive file
Hi. I'm not sure where to post this so feel free to move me!
I created a document in Microsoft Word that I then converted to ePub in Calibre. It's 56XX pages long and image heavy. I now have two problems. First, I accidentally deleted (shredded) the folder I was using for various page things, including my source docx. Second, my file is too big for most readers, at 160MB. iBooks crashes and Totalreader stalls for a long time (almost half an hour) on each load. It displays fine on most of my PC apps, so it's definitely a size or length problem. I've tried converting the ePub back to docx, and Microsoft Word can't load it: invalid file. In fact, Atlantis is the only word processor that can load Calibre's docx. Resaving from Atlantis doesn't help; it still can't load in Word. So the question is: is there an easy, direct, automated way to split my ePub into, say, 500-page chunks? (I'm not worried about where it splits at the moment; I can fix sentence cutting and such manually.) Something like drag and drop, type 500, click, and walk away? Or is there another text-based format I could use to do this? PDF works and gives me a "valid" output, but the formatting is messed up, and I'm simply trying not to have to rearrange and reformat more than 5000 pages. Please, any help would be wonderful! |
07-30-2017, 05:17 PM | #2 | |
Fanatic
Posts: 556
Karma: 400004
Join Date: Feb 2009
Device: ONYX M96
|
Quote:
You can use Sigil and the string it uses to mark chapter breaks. For example, search for the title tags (h1 or h2; you have to look at the code) and add that string in front of them. Once the string is added, you can ask Sigil to split the file at those places. |
|
07-30-2017, 06:11 PM | #4 |
A Hairy Wizard
Posts: 3,184
Karma: 18843349
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
160 MB is immense when it comes to an ebook. Even the very large, picture-intense children's books that I've seen are around 20-25MB. I would guess that the size of your image files is the main culprit. You can/should reduce the image size to about 1600px in height and/or use a better image format to compact the size. That is really all the image size/quality you need with today's reading apps/devices. If you absolutely must have the full size images then you could host them on a network server and include links in the ebook.
As far as splitting the ePub into smaller files... yes, definitely! The smaller the file, the faster the device will load and display each new file. Most people split the html file at each new chapter: 20 chapters = 20 files, plus files for cover, title, acknowledgments, copyright, etc. You can use a simple search and replace to find new chapter headings, or if you have the html tags set to use <h1>-<h6> for different headings, you can split there. If you are using Sigil (and I'm pretty sure Calibre has the same or similar functionality) you can use the following search and replace (without the outer quotes):
search: "<h1" or "<h3>Chapter" etc.
replace: "<hr class="sigil_split_marker" /><h1" or "<hr class="sigil_split_marker" /><h3>Chapter" etc.
Then you can use the function "Split at Markers (F6)" in Sigil to automatically split at all those marked locations. Alternatively, you can use "Split at Cursor (Ctrl-return)" to manually split up the document. Hope that helps! |
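For anyone who would rather script the search and replace described above than run it inside an editor, here is a minimal Python sketch. It only assumes the marker string quoted in the post; the function name and the default heading are made up for illustration, and it is not how Sigil itself does the job.

```python
import re

# Marker string as quoted in the post above.
SPLIT_MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(html: str, heading: str = "h1") -> str:
    """Put a split marker in front of every opening tag of the given heading level."""
    # \b keeps "h1" from matching longer tag names that merely start with "h1".
    pattern = re.compile(r"<%s\b" % re.escape(heading), re.IGNORECASE)
    # A function replacement avoids any escape handling in the marker text.
    return pattern.sub(lambda m: SPLIT_MARKER + m.group(0), html)
```

Note that a marker is also placed before the very first heading, so a short front-matter piece may be split off ahead of it. Try it on a copy of the book first.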
07-30-2017, 06:41 PM | #5 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
|
Edited, this isn't going to work: see photo.
Thanks, both of you. It's not exactly what I had in mind, but it looks like it would work, since there are bolded chapter titles which should have converted to headings. I hope. Going to try ePub split shortly. Given I can't get Atlantis to do what I want here, I'm running out of program options before I attempt reworking via PDF. Yes, it's massive, though not the largest I've seen, but it's not intended for general distribution. Electronically it's only going to a few people. It's not set up to be consumable at the moment. Well, at the moment it's not set up to be anything, since nothing portable loads it. Lol. I know the length is what's tossing Totalreader, since it can handle files many times this size, so it needs to be split. Is there nothing out there like HTML join and split for ePubs? Last edited by lostinlodos; 07-30-2017 at 07:13 PM. Reason: Tried it out |
07-30-2017, 11:44 PM | #6 |
null operator (he/him)
Posts: 20,935
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
I've made occasional use of pdfSAM (Split And Merge) - the Lite version is free.
You cannot split HTML randomly and expect to do anything sensible with it - it needs to split on a boundary of some sort. Try using calibre's recently added calibre-debug --explode-book and --implode-book options (see the calibre-debug documentation). That should allow you to use standard html and other text editing tools. BR |
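To show roughly what unpacking and repacking involves, here is a hedged Python sketch that uses only the standard library. It is a stand-in, not calibre's implementation of the options named above. It assumes the book is an ordinary DRM-free EPUB, which is a ZIP container whose `mimetype` entry has to come first and be stored uncompressed. The function names are invented for this example.

```python
import os
import zipfile


def explode_epub(epub_path: str, out_dir: str) -> None:
    """Unpack an EPUB (a ZIP container) into a folder for editing."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)


def implode_epub(src_dir: str, epub_path: str) -> None:
    """Repack a folder as an EPUB: mimetype first and uncompressed, the rest deflated."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Archive names always use forward slashes, whatever the OS.
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if rel != "mimetype":
                    zf.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Write the repacked file somewhere outside the unpacked folder, otherwise the walk will try to pack the new archive into itself. Running epubcheck on the result afterwards is a sensible precaution.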
07-31-2017, 02:24 AM | #7 |
Fanatic
Posts: 556
Karma: 400004
Join Date: Feb 2009
Device: ONYX M96
|
|
07-31-2017, 06:23 AM | #8 |
Resident Curmudgeon
Posts: 75,890
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Use calibre to edit the eBook. Given that it came from Word, chances are there is a lot of garbage in the ePub that you can remove. Also check the size of the images. If they are beyond 1500 lines, reduce them in size to 1500 lines. Calibre can do this in the editor. If there are any embedded fonts, remove them. Make this as streamlined as possible. Also, check the code using epubcheck. There is a Calibre editor plugin to allow you to use epubcheck to check the eBook. Also, check the CSS and remove any excess lines from the classes. And if you see things like <p class="someclass">, replace it with <p> and define <p> in the CSS as what <p> is going to be most of the time.
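Before replacing `<p class="someclass">` with a bare `<p>`, it helps to know which class is actually the common one. The sketch below is a rough regex-based count, not a real HTML parser. The function name is invented, and any class names you feed it (for instance `calibre1`) are hypothetical examples rather than what this particular book contains.

```python
import re
from collections import Counter


def paragraph_class_counts(html: str) -> Counter:
    """Count how often each class value appears on <p> tags."""
    # <p\b matches <p ...> but not <pre ...>; only double-quoted class values are handled.
    classes = re.findall(r'<p\b[^>]*\bclass="([^"]*)"', html, flags=re.IGNORECASE)
    return Counter(classes)
```

Calling `paragraph_class_counts(text).most_common(5)` on a chapter file lists the heaviest-used paragraph classes. The top one is the natural candidate to fold into the plain `p` rule in the stylesheet.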
Last edited by JSWolf; 07-31-2017 at 06:26 AM. |
07-31-2017, 12:21 PM | #9 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
|
Thanks everyone. I've got more options now. I think I'll give Sigil a try. For whatever reason I never got the hang of it, so maybe now is a good time to learn.
I've used HTML split and join since the 90s and never had a problem with it but never used it on anything fancier than standard text and graphics pages. So not really sure what would happen if I extracted the ePub and ran it through SaJ. I should probably try that too. I'm trying not to mess with the images at all because they're maps. Sigil here I come. |
07-31-2017, 02:30 PM | #10 | |
Guru
Posts: 727
Karma: 10216666
Join Date: Jul 2017
Device: Boox Nova 2
|
Quote:
Unless the images are absolutely massive. But with a 5600+ page document, there could just be a few hundred 0.5MB images, which would be pretty typical e-reader-sized images. |
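One way to settle whether it is the images or the text that makes up the 160MB is to total the archive contents by file type. This is a small Python sketch under the assumption that the ePub opens as a normal ZIP file. The function name is made up for illustration.

```python
import os
import zipfile
from collections import defaultdict


def size_by_extension(epub_path: str) -> dict:
    """Return {extension: total bytes as stored in the archive} for an EPUB."""
    totals = defaultdict(int)
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Files without an extension (such as "mimetype") go under "(none)".
            ext = os.path.splitext(info.filename)[1].lower() or "(none)"
            totals[ext] += info.compress_size
    return dict(totals)
```

If `.jpg` and `.png` account for nearly all of the total, shrinking the images matters most. If `.html` or `.xhtml` is large too, the markup cleanup suggested earlier in the thread is worth the effort.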
|
07-31-2017, 05:21 PM | #11 | |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Device: multi
|
Quote:
Hence my attempting to split this in the first place. I'm going to try SG as mentioned above shortly, and see if I can find a way using that. |
|
07-31-2017, 06:24 PM | #12 |
A Hairy Wizard
Posts: 3,184
Karma: 18843349
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Sigil has a merge and split function.
Once you get it to open in Sigil:
- Use the search and replace function mentioned above to insert the <hr class="sigil_split_marker" /> markers at your desired locations
- Highlight all the files in the Book Browser, right click and select merge... you will have one very large file
- Select F6 (or Edit/Split at Markers) and Sigil will re-split the document at your marked locations
Although I would certainly listen to salamandarjuice's advice and check that your css/html is not overly complex... I've seen some very heinous code-bloat created by software programs that you would think are better than that. Eliminating complex html/css can drastically reduce the number of "pages" - but I would still look at your images. Even maps can be losslessly reduced significantly. I get very good results with cover images that are 1600 x 600-ish pixels at around 200KB or less. Cheers, |
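The last step above, cutting one big merged file back up at the markers, can be pictured with a short Python sketch. It only illustrates the idea on a string of body content and is not Sigil's own code. A real split also has to give each piece its own XHTML head and update the manifest, which Sigil takes care of. The function name is invented.

```python
import re
from typing import List

# Matches the marker quoted above, with or without a space before "/>".
MARKER_RE = re.compile(r'<hr\s+class="sigil_split_marker"\s*/>')


def split_at_markers(body: str) -> List[str]:
    """Split merged body content at each marker, dropping the markers and any empty pieces."""
    pieces = MARKER_RE.split(body)
    return [piece for piece in pieces if piece.strip()]
```

The number of pieces returned is a quick sanity check: it should be roughly the number of chapters plus any front matter.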
|