10-04-2011, 01:32 PM | #1 |
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Importing and converting html files
I checked the forum to see if there was a similar question to mine, but couldn't find one. I understand why calibre converts a file saved as "web page, complete" to a zip file, since it contains photos and css and whatnot, but why does it do the same thing to a pure html file without any css or images attached? Just a simple html file containing text? The files in question are fanfiction files saved through a fanfiction downloader.
I just tried importing a few files just to check whether I was mistaking simple html files for the ones with images attached, but I'm not. I also have a problem with the author of the story being confused with the author of the downloader program when imported into calibre. Looking at the markup of the file, the downloader inserts <meta name='author' content='name_of_programmer' /> into the file, which is read as the author name by calibre. I could correct this for a few files, but I have thousands of files I plan to import and convert. Is there a way to automate this? Thank you in advance. |
10-04-2011, 02:16 PM | #2 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
10-04-2011, 03:52 PM | #3 | ||
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Quote:
Quote:
|
||
10-04-2011, 04:46 PM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Disable the HTML to Zip filetype plugin.
Quote:
|
|
10-04-2011, 05:07 PM | #5 | ||
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Quote:
Quote:
I know I'll have to change the metadata to the correct author. I presume there's a way to do it for several files at once for those fics that share an author; it will be more work for those of individual authors. I definitely have some reading up to do. ETA OK, so I just read a few threads on regexes; am I right to assume they're not used to replace text within a file? There was a thread where Kovid recommended using Sigil for this. Or am I misunderstanding regexes? Last edited by citac; 10-04-2011 at 05:51 PM. |
||
10-05-2011, 08:49 AM | #6 | |||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
Quote:
Last edited by Starson17; 10-05-2011 at 08:55 AM. |
|||||
10-05-2011, 02:13 PM | #7 |
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
It really gets on my nerves that it imports even plain html as zip files, even though it makes sense to me that it should do so with web pages. I haven't really been processing anything yet: I've imported several fics, each one a different format depending on whether it was saved from an ancient archive active in the early years of fandom (.txt), whether it was done for a Big Bang or fest exchange on LJ or DW (accompanying artwork = web page), or saved with a downloader from current fic archives (plain html that contains bold, italics and no images). I don't keep several different formats of the same file as I gather many users do. I want my plain html to stay plain html, I would like to convert my txt files to epub, and probably web pages too. I don't understand the need for it.
So, I've been trying out the program again, trying to understand how it works because I wanted to see if I can bulk edit some things in my fics, like this problem with incorrect authors, and I've been reading the tutorials on regexes and search&replace, and I still don't understand how they work. I tried following them, but what I'm doing doesn't seem to have any results. If I choose Edit metadata individually, there's no search&replace; if I choose bulk editing, there's a search&replace tab, and I have several options which again don't work for me. I enter the <meta name='author' content='Grzegorz Hordynski' /> string, and replace it with the proper author, it shows up in the test text, but hitting Apply does absolutely nothing. How do I achieve what I'm after? I have authors who have written over 40 works and I cannot edit them all. What am I doing wrong? Last edited by citac; 10-05-2011 at 02:23 PM. Reason: typo |
10-05-2011, 03:08 PM | #8 | ||||||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Also, be aware that the zip processing of even a single file html ebook does more than just compress the single file. Look inside the zip file produced by the plugin to see the other files created and stored in the zip file. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Last edited by Starson17; 10-05-2011 at 03:10 PM. |
||||||||
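For anyone who wants to look inside such a zip without unpacking it, here is a minimal Python sketch using only the standard library. The function names and the demo archive are made up for illustration; the entry names in the demo are placeholders and are not taken from an actual calibre import.

```python
import io
import zipfile


def list_zip_members(zip_source):
    """Return the names of all entries stored in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.namelist()


def build_demo_zip():
    """Build a small in-memory zip so the example runs without any files.

    The entry names and contents are placeholders (assumptions), chosen
    only to show that a zip holds more than the single html file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "placeholder")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("content.opf", "<package/>")
        archive.writestr("story.html", "<html></html>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_zip_members(build_demo_zip()):
        print(name)
```

To inspect a real file, pass its path instead of the demo buffer, for example `list_zip_members("story.zip")`, where `story.zip` stands for whatever the imported zip is called.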
10-05-2011, 03:52 PM | #9 | ||
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
I keep forgetting calibre makes a copy of everything it imports and puts it into its own folder; I kept looking at the original location and I could not see what you were talking about because of course it wasn't there.
I just saw what you were talking about: there's a META-INF folder, a mimetype file and content.opf, and there's also a metadata.opf in the ebook folder together with the zip. Everyone kept talking about how you should never look into calibre created folders. Since these folders are what drove me bonkers whenever I used calibre before, I was deliberately staying away from them. Quote:
Quote:
Once the fics get imported into calibre, it reads the <meta name='author' content='Grzegorz Hordynski' /> information and puts that as the author's name. How do I go about automatically changing <meta name='author' content='Grzegorz Hordynski' /> to <meta name='author' content='Author Name' /> ? Am I making myself clear? You say there's a way to do that on the Conversion screen, and that it's possible even though it's not meant for it, but I don't understand how. What expression do I need to enter to make this change because obviously pasting the above line does not work? My ultimate goal is to eventually convert all my saved fics into epub, after which I plan to delete the imported books from the calibre library. I do not intend to use it for ebook management, just for converting. (I just bet it's going to turn out to be something really simple and I'll be kicking myself for missing something painfully obvious!) |
||
10-06-2011, 09:42 AM | #10 | ||||||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Quote:
Quote:
<meta name='author' content='Grzegorz Hordynski' /> then a simple Search for "Grzegorz Hordynski" and Replace with "Author Name" should do it to create: <meta name='author' content='Author Name' /> Quote:
|
||||||
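The search and replace described above happens on calibre's Conversion screen. If the goal is to change the tag inside the saved files themselves, a short script outside calibre is one option. What follows is a minimal sketch, not a calibre feature: `set_author` and `fix_folder` are hypothetical names, the files are assumed to be UTF-8, and the new author name is assumed not to contain the quote character used in the tag. Try it on copies first.

```python
import re
from pathlib import Path

# Matches <meta name='author' content='...'> with either quote style.
# Group 1 is everything up to the opening quote of the content value,
# group 4 is the current author, group 5 is the closing quote.
AUTHOR_META = re.compile(
    r"""(<meta\s+name=(['"])author\2\s+content=(['"]))(.*?)(\3)""",
    re.IGNORECASE,
)


def set_author(html, new_author):
    """Return html with the first author meta tag pointing at new_author."""
    return AUTHOR_META.sub(
        lambda m: m.group(1) + new_author + m.group(5), html, count=1
    )


def fix_folder(folder, new_author):
    """Rewrite every .htm/.html file in folder; return the names changed.

    Assumes all files in the folder share one author (an assumption that
    fits the 'Individual authors > Author' layout described in the thread).
    """
    changed = []
    for path in sorted(Path(folder).glob("*.htm*")):
        original = path.read_text(encoding="utf-8")
        updated = set_author(original, new_author)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Calling `fix_folder("path/to/Author", "Author Name")` would process one author folder at a time; the path here is a placeholder.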
10-06-2011, 11:08 AM | #11 |
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
OK, I am doing this as we speak.
I have imported six files from an author saved in Fandom > Fic > Individual authors > Author folder. The author tag inside the file is as above. The title tag is <title> Title by Author </title>. The files themselves are named
01 Title.htm
02 Title Coda Subtitle one.htm
03 Title Coda Subtitle two.htm
04 Title Coda Subtitle three.htm
05 Title Coda Subtitle four.htm
06 Title Coda Subtitle five.htm
These files have no author name in the title of the file because they are all saved in their author folder. I have another group of fics which go into Fandom > Fic > Various authors, and contain single stories out of several an author has written (so one story from author a, one from author b etc etc). I haven't imported any of those stories, but they are named Author name - Title.html.
After import, calibre reads the author and title tags and populates the metadata with what it finds there. It also doesn't read the order in which the fics should be read, and alphabetizes the fics, which gives me
Title.htm
Title Coda Subtitle three.htm
Title Coda Subtitle five.htm
Title Coda Subtitle four.htm
Title Coda Subtitle two.htm
Title Coda Subtitle one.htm
I will obviously have to do something about the files themselves (though it will be a massive undertaking - I have thousands of fics, the earliest dating from 2004) before any further importing into calibre, because what I'm trying to do now simply isn't working.
OK, so the files are imported now. I click Convert books, go to search and replace, enter Grzegorz Hordynski into the First expression window, the Search Regular Expression line, and enter the correct author name into the Replacement text line. I hit OK. calibre converts the files into epub. Author is still Grzegorz Hordynski. A simple Search for "Grzegorz Hordynski" and Replace with "Author Name" is not working. What am I doing wrong???
This is what I have in the html coding of the file (details changed but you should get the picture):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name='author' content='Grzegorz Hordynski' />
<title> Title by author </title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080">
<div> Title by author </div>
<div> Summary of the story. Disclaimer. Author notes/thankyous. </div>
<div> Story word count: 33355 </div>
<div> <a href="http://www.site.com/">Original link to the story</a> </div>
<br />
<br />
<div>
<hr />
<div> Content of the story. </div>
</body>
</html>
So, first: what is the file naming scheme I will have to apply to the files in order for calibre to correctly interpret individual authors who may have up to 60 fics saved in their respective folders (Fandom > Fic > Individual authors > Author > saved fics.html), and authors for whom I have saved only a single story out of several they have written (Fandom > Fic > Various authors > saved fics.html)? Should I apply Author name - Title.html to all the fics I have saved? Will it correctly recognize files named Author name - Title 01.html, Author name - Title 02.html? What about those with a more complicated naming scheme, like those saved in Fandom > Fic > Individual authors > Author folder > Series
Book 1 - Title - Part 1 - Subtitle.txt
Book 1 - Title - Part 2 - Subtitle.txt
etc
Book 2 - Title.txt
Book 3 - Title - Part 1 - Subtitle.txt
Book 3 - Title - Part 2 - Subtitle.txt
etc
Book 4 - Title - Part 1 - Subtitle.txt
Book 4 - Title - Part 2 - Subtitle.txt
etc
Will I need to add the author name to all these files, even those in author folders? I also need to change the incorrect tags inside the fics themselves. Is there a program that will do this for me? Yes, this will not affect calibre, but like I said I don't intend to manage anything with it, just convert and transfer the result to my reader.
Frankly, at this point I don't even care about any additional features, I just want the conversion to work, and I want to find a way to change the content of the converted file, the way others are changing headers and footers left in their files after converting them from lit or whatever. I want the metadata to be saved inside the file, so that a reader I uploaded the fics to can read the metadata and present it correctly on the screen. I'm going to try connecting calibre to my device and transferring the converted files to see how CoolReader will read the files. |
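The two layouts described in this post (numbered files inside an author folder, and "Author name - Title.html" files in a shared folder) can be turned into author, title and reading order with a small script. This is only a sketch of the idea, not how calibre itself reads file names; `describe` is a hypothetical helper, and the shared folder name "Various authors" is taken from the post. The more complicated "Book 1 - Title - Part 1" scheme is not handled here.

```python
import re
from pathlib import PurePosixPath

# A leading run of digits followed by whitespace, e.g. "02 Title ...".
LEADING_NUMBER = re.compile(r"^(\d+)\s+(.*)$")


def describe(path_text, shared_folder="Various authors"):
    """Guess author, title and reading order from a file path.

    Files in shared_folder are expected to be 'Author name - Title';
    everywhere else the parent folder name is taken as the author and
    an optional leading number as the position in the series.
    """
    path = PurePosixPath(path_text)
    stem = path.stem
    index = None
    if path.parent.name == shared_folder and " - " in stem:
        author, title = stem.split(" - ", 1)
    else:
        author = path.parent.name
        match = LEADING_NUMBER.match(stem)
        if match:
            index = int(match.group(1))
            stem = match.group(2)
        title = stem
    return {"author": author.strip(), "title": title.strip(), "index": index}


if __name__ == "__main__":
    print(describe("Fandom/Fic/Individual authors/Author/02 Title Coda Subtitle one.htm"))
    print(describe("Fandom/Fic/Various authors/Author name - Title.html"))
```

Sorting the results by `index` keeps the codas in reading order instead of alphabetical order.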
10-06-2011, 11:17 AM | #12 |
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
OH MY GOD, it just deleted everything on my SD card!!!
I had three fandoms saved there in my fiction folder, and notes from my postgraduate lectures, word documents and powerpoint presentations - you've got to be kidding me!! I clicked Save to disk, Save only epub format to disk, it recognized Pocketbook's internal card, but when I clicked on Removable disk (G:) it asked if I wanted to format the card. Of course I don't, I have important documents there. So I click Cancel, and I get G:\ is not accessible. Data error (cyclic redundancy check). What the hell is that??? :headdesk: All those files gone. I will have to re-download them from our university's server. I really did not need this. And it's not just those files; I had all my apps backed up to that card. I had information for my contacts stored there. I had appointments saved in my diary app. How the hell am I going to rebuild those? |
10-06-2011, 12:40 PM | #13 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
Cyclic redundancy has nothing to do with calibre AFAIK. It is an I/O error.
It is usually a disk error caused by a bad spot on the disk: electrical shock, scratches on the media, or removing the media before file writing is complete. Try removing the card, rebooting the computer and trying the card again. BTW it is your OS wanting to reformat the disk because it could not read the system files to write a file. Nothing was deleted; the card was not readable, so calibre could not write anything or delete anything. Helen |
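As a rough illustration of what a cyclic redundancy check does: a checksum is stored alongside the data, and on reading it is recomputed and compared. If the two differ, the data is reported as damaged rather than silently handed back. The sketch below uses Python's standard `zlib.crc32` to show the idea; it is not what the card or the operating system actually runs, and the sample bytes are made up.

```python
import zlib


def checksum(data):
    """Compute a CRC-32 checksum over a block of bytes."""
    return zlib.crc32(data)


def is_intact(data, expected):
    """Return True if the data still matches the checksum stored for it."""
    return zlib.crc32(data) == expected


# Made-up sample data standing in for a block read from a card.
stored = b"notes from a postgraduate lecture"
expected = checksum(stored)

# The same block with its last byte changed, as a bad spot might do.
damaged = b"notes from a postgraduate lecturf"

if __name__ == "__main__":
    print("original block ok:", is_intact(stored, expected))
    print("damaged block ok:", is_intact(damaged, expected))
```

A failed check means the read went wrong, not that anything was deleted, which matches what happened here once the card was read in a different reader.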
10-06-2011, 01:03 PM | #14 |
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
I had to leave it all for a while because I thought there was something obvious I was missing, hence not being able to make the necessary changes.
I just removed the card from my device, inserted it into the card reader on my computer, and you're right, I didn't lose the files. I am sorry for losing my temper though. But what am I supposed to do when I want to send files to my device? Or should I just forget about it and use the SD card reader on my computer and transfer the files manually? I thought calibre was supposed to be able to do that. Why am I having so much trouble with it? |
10-06-2011, 02:19 PM | #15 |
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|
There is probably something wrong with the card reader or the card.
Maybe just dirty contacts. The best thing to do is back up your data from the card just in case. Perhaps when you are 100% sure you have a good backup you could reformat the card. Or get another card and start fresh. There are lots of ways to send files, but try sending them again (after backup). Less confusing IMO. For instance you could send them to the card when it is in the computer card reader. To beat the horse to death, the errors you were getting are basic i/o errors. Calibre uses the computer operating system i/o functions to transfer data. You were probably doing everything correctly but had bad luck (data read error). I am sure you will get it all done very soon. Helen |