downloading an old site from archive.org


Postby slickrcbd » Wed Dec 06, 2017 10:01 pm

I thought I had a copy of Sinom Bre/D.F. Roeder's fics archived for over a decade, but I was trying to recall if a quote was from Cryoslides or Accidental Goddess and was unable to locate the fics in my archive.
His site is long gone, but I was able to find it on archive.org since I have it in my bookmark file (that really needs to be cleaned out. It is 1.7 * OVER NINE THOUSAND bookmarks).

Downthemall used to work on archive.org, but now I'm just getting a lot of HTML junk with no content even though they are saving with .txt extensions.

Noscript also appears to be blocking the content unless I authorize each "object".

Any idea how to quickly, easily, and efficiently grab all the stories from these three links? I would just use 3 passes of downthemall in Firefox 56 (with NoScript and uBlock installed), but that doesn't appear to be working anymore. I acknowledge that on the "series" page it might take more passes, but you get the idea. This USED to work just fine, but everybody is so CSS/JavaScript & embedded object happy these days they just have to make things more complicated than they have to be.

https://web.archive.org/web/20060810192 ... fault.html
https://web.archive.org/web/20080903141 ... eries.html
https://web.archive.org/web/20080912065 ... eshot.html

Here is the base link, although you might want to try the August 2006 version instead of the 2008 capture. Since the site hasn't been updated since 2003, it doesn't matter.
https://web.archive.org/web/20080914024 ... index.html
slickrcbd
Chibi Sailor Senshi
Posts: 275
 

Re: downloading an old site from archive.org

Postby PCHeintz72 » Thu Dec 07, 2017 12:02 am

Well... the main thing is you are likely running into the issue that Archive.ORG / Wayback changed how it works a few years ago... it actually is a bit more intrusive than it used to be.
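
One possible workaround, offered as an untested assumption rather than something anyone in this thread has tried: the Wayback Machine accepts an `id_` modifier right after the timestamp in a capture URL, which asks it to return the file as originally archived, without the toolbar or rewritten links. The sketch below only rewrites a URL into that form. The helper name `raw_capture_url`, the example.com address, and the timestamp are hypothetical; the truncated links in this thread are not reconstructed.

```python
import re

# Capture URL shape: https://web.archive.org/web/<timestamp>[modifier]/<original URL>
_CAPTURE = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{1,14})(?:[a-z]{2}_)?/(.+)$"
)


def raw_capture_url(capture_url: str) -> str:
    """Return the same capture URL with the id_ modifier applied."""
    match = _CAPTURE.match(capture_url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {capture_url!r}")
    prefix, timestamp, original = match.groups()
    # Any existing two-letter modifier (for example if_) is replaced by id_.
    return f"{prefix}{timestamp}id_/{original}"


# Hypothetical capture of a hypothetical story file.
print(raw_capture_url(
    "https://web.archive.org/web/20080914000000/http://example.com/fics/story.txt"
))
```

The rewritten URL could then be handed to whatever download tool is in use.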

Also, why use that version of the site? If it has not been updated since 2003, as you state, then you are missing the updates he did later. There is a newer one, also in Archive.ORG / Wayback: http://www.rakhal.com/onnaranma/index.html

Try: https://web.archive.org/web/20100519122 ... index.html

There is also his FanFiction.NET account, though that does not have everything: https://www.fanfiction.net/u/364323/

And I sympathize on number of links... as of my last check... I have 11,265 Favorites / Bookmark links.
PCHeintz72
Prism Power Senshi
Posts: 2736
 

Re: downloading an old site from archive.org

Postby slickrcbd » Thu Dec 07, 2017 2:51 am

The 2006 version did not seem to have errors, but it still did not work with DTA.

Any suggestions on how to suck all the .txt files at once?
slickrcbd
Chibi Sailor Senshi
Posts: 275
 

Re: downloading an old site from archive.org

Postby PCHeintz72 » Thu Dec 07, 2017 4:01 am

Sorry, I've never done it that way for personal sites. Never bothered.

I determined long ago that figuring out how to grab everything on a per-site basis is far more work than simply grabbing the select ones I personally care for by right-clicking and selecting Save, then ignoring the rest. Or, in the case of some Internet Archive sites, I find that opening the file and then using File > Save works for me.

For non-Internet Archive / Wayback sites, I sometimes use PageNest Free, which grabs an entire website. The problem is that I do not use it much, because I find personal sites have tons of junk and hidden stuff that is worthless.
PCHeintz72
Prism Power Senshi
Posts: 2736
 

Re: downloading an old site from archive.org

Postby Té Rowan » Thu Dec 07, 2017 7:44 am

My usual method is rather long-winded: View the site on the Archive, then mine it from the browser cache.

Using wget might work, but I have not tried it yet.
I go walking. My mind goes wandering.
Té Rowan
Prism Power Senshi
Posts: 3014
 

Re: downloading an old site from archive.org

Postby slickrcbd » Thu Dec 07, 2017 8:38 pm

PCHeintz72 wrote:Sorry, I've never done it that way for personal sites. Never bothered.

I determined long ago that figuring out how to grab everything on a per-site basis is far more work than simply grabbing the select ones I personally care for by right-clicking and selecting Save, then ignoring the rest. Or, in the case of some Internet Archive sites, I find that opening the file and then using File > Save works for me.

For non-Internet Archive / Wayback sites, I sometimes use PageNest Free, which grabs an entire website. The problem is that I do not use it much, because I find personal sites have tons of junk and hidden stuff that is worthless.

I already knew how to do it, but the method no longer works.
I'd just use the "downthemall" extension, which has been available since Mozilla 1.0 or 1.1.
However, it does not work with Firefox 57 (I'm using 56).
All I'd have to do is view the page with all the stories, invoke "downthemall" and presto, it would download everything on the page for me automatically, or just the links I highlighted.

Before that, I was using a PowerMac 6500/300 running Mac OS 7.6, 8.1, or 8.6 (it came with 7.6, was upgraded to 8, then to 8.6 over the next couple of years), with dial-up until 2002. I had a program called "Monica" that queued downloads and worked similarly to DTA or FlashGot's download manager. It was especially useful when I was using dial-up.
Back then, the procedure was a bit more complicated. I'd have to right-click and copy each link individually, and Monica would monitor the clipboard and grab it.
If the chapters were numbered, such as on Tannanim's (now defunct) Ranma crossover site, I could use wild cards or specify a range and use one line to grab all chapters from that site.
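
The range trick for numbered chapters can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how Monica worked internally (that is not known here); the `{n}` placeholder, the `expand_range` name, and the example.com URL are made up.

```python
from typing import List


def expand_range(template: str, first: int, last: int, width: int = 0) -> List[str]:
    """Fill the {n} placeholder with every number from first to last inclusive.

    width left-pads the number with zeros, for sites that name files
    chapter01, chapter02, and so on.
    """
    return [template.format(n=str(i).zfill(width)) for i in range(first, last + 1)]


# Hypothetical site with chapters named chapter01.txt through chapter03.txt.
for url in expand_range("http://example.com/fic/chapter{n}.txt", 1, 3, width=2):
    print(url)
```

The resulting list can be pasted into any download manager that accepts one URL per line.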

However, with this embedding that archive.org is doing, I don't see such an easy solution. I know just a couple years ago I had no problem downloading an entire site.
P.S. I haven't seen the term "site sucker" in a while, is that still the correct term for what I'm talking about doing?
slickrcbd
Chibi Sailor Senshi
Posts: 275
 

Re: downloading an old site from archive.org

Postby PCHeintz72 » Thu Dec 07, 2017 9:49 pm

slickrcbd wrote:P.S. I haven't seen the term "site sucker" in a while, is that still the correct term for what I'm talking about doing?


I've never actually heard that term, but it fits. What it sounds like Monica was, and what PageNest is, goes by several different terms that all mean the same thing:

Offline Browser
Offline Downloader
Offline Reader
Website Copier

Some do all of the above, some do only one or two. Generally, all of them will download a site so you can visit it whenever you want while offline. Some also let you browse or read the pages directly from the program, without a separate browser and without opening the files manually, as if you were still online.

There are a number of other ones out there, though since I've not tried them with Archive.ORG or Wayback, I do not know if they would work with that site specifically. Some of the free ones limit you in various ways... either by site or by levels. PageNest Free by Solent Software is/was like that: it is a free version of a commercial program, and they wanted you to buy the full version. It is defunct now, but I still use it.

HTTrack is one I've heard mentioned multiple times over the years as being good, and it has been around a while, though I've never used it myself. https://www.httrack.com/
PCHeintz72
Prism Power Senshi
Posts: 2736
 

Re: downloading an old site from archive.org

Postby slickrcbd » Fri Dec 08, 2017 7:47 pm

Monica wasn't technically a site sucker, it was just a downloading app that could queue up a bunch of downloads overnight.
Back in the '90s it was common to split archives larger than a few MB into smaller files, as resuming downloads was not universally supported (it still isn't), and dial-up was the most common means of accessing the internet.
Usually they would split it into chunks that fit on floppies for easy backing up.

Monica would let me download that 30 MB upgrade from Mac OS 8.5 to Mac OS 8.6 overnight even though it was in a bunch of small parts (also available as a single 30 MB file). Yes, 30 MB seems small today for "overnight", but not on a 33.6K modem.
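
As a rough check on those numbers, the minimum transfer time is just file size divided by line rate. The sketch below assumes 1 MB = 1,048,576 bytes and a modem that sustains its full nominal 33,600 bits per second, which real dial-up connections rarely managed. The `efficiency` parameter and the function name are illustrative assumptions.

```python
def download_hours(size_megabytes: float, bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Estimate transfer time in hours.

    efficiency is the fraction of the nominal line rate actually achieved
    (1.0 means the ideal, best-case figure).
    """
    total_bits = size_megabytes * 1024 * 1024 * 8
    seconds = total_bits / (bits_per_second * efficiency)
    return seconds / 3600


# Best case for a 30 MB download on a 33.6K modem.
print(f"{download_hours(30, 33_600):.2f} hours")
# Same download if only half the nominal rate is achieved (an assumed figure).
print(f"{download_hours(30, 33_600, efficiency=0.5):.2f} hours")
```

Even the best case is a little over two hours of tying up the phone line, so queuing the parts to run unattended overnight was a sensible plan.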

Incidentally, Bill Clinton was still President when I got Monica, and associating it with a "site sucker" brought something lewd to mind because of a different Monica that was in the news back then.
slickrcbd
Chibi Sailor Senshi
Posts: 275
 

Re: downloading an old site from archive.org

Postby Makoto » Sat Dec 09, 2017 5:08 pm

I know D.F. hasn't updated his fanfiction.net page since at least December, 2007... but have you tried contacting him there? He might be able to get you full copies of the stories, as well.
Still alive, but failing miserably at dodging Real Life.
My webpage has returned! http://www.fdnest.com/~makoto/
My Fanfiction.net profile: https://www.fanfiction.net/u/1473284/
Makoto
Asteroid Senshi
Posts: 862
 

Re: downloading an old site from archive.org

Postby Ellen Kuhfeld » Sat Dec 09, 2017 7:23 pm

Just tried DownThemAll! on my own web site. I am not impressed. It (kinda) got my greeting page. I could have done that with the "save" command. Is there anything that'll download an entire site from the greeting page? I'll admit I just tried it, without studying it first; but when I saw what happened, I went off to their site and found the instructions. It was the same thing you usually get from tech-heads: they pretty well assumed I knew how to do it, but wanted to clear up a few details.
Visit Big Washuu's Lab of Arcane Knowledge at http://washuu.net
Ellen Kuhfeld
Sailor Starlight
Posts: 2228
 

Re: downloading an old site from archive.org

Postby Té Rowan » Sat Dec 09, 2017 8:53 pm

I have used wget to grab (parts of) sites in the past. Still occasionally do.

Wget, btw, is a command-line utility. Whether there exists a point-and-click front-end for it, I do not know.
Té Rowan
Prism Power Senshi
Posts: 3014
 

Re: downloading an old site from archive.org

Postby PCHeintz72 » Sun Dec 10, 2017 12:44 am

Ellen Kuhfeld wrote:Just tried DownThemAll! on my own web site. I am not impressed. It (kinda) got my greeting page. I could have done that with the "save" command. Is there anything that'll download an entire site from the greeting page? I'll admit I just tried it, without studying it first; but when I saw what happened, I went off to their site and found the instructions. It was the same thing you usually get from tech-heads: they pretty well assumed I knew how to do it, but wanted to clear up a few details.

Are you talking Washuu.NET?

I just used PageNest on it, and it works fine... it took under 2 minutes at default settings just by entering the site's main page link... 24.3 MB in some 317 files... I can navigate in its offline browser window to, I think, just about everything... or use an Explorer window and poke around the whole site's file structure.

But as I said PageNest is no longer officially available.
PCHeintz72
Prism Power Senshi
Posts: 2736
 

Re: downloading an old site from archive.org

Postby slickrcbd » Sun Dec 10, 2017 3:06 am

First of all, Downthemall is not really a site sucker, it just grabs all the links on the current page (or filtered by what you specify) and downloads them all to disk.
A site sucker will crawl the website to download the entire site.
They fell out of favor when dynamically generated pages came into favor along with a lot of scripting stuff.
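
The difference can be made concrete with a small sketch. The function below does roughly what is described for DownThemAll: it looks at a single page and collects only the links that match a filter. It is an illustration built on Python's standard library, not DownThemAll's actual code, and the names and example.com URLs are hypothetical. A site sucker would repeat this step on every same-site page it discovers.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links ending in a given suffix."""

    def __init__(self, base_url: str, suffix: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix.lower()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Case-insensitive match so story.TXT is picked up as well.
            if name == "href" and value and value.lower().endswith(self.suffix):
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def links_on_page(html: str, base_url: str, suffix: str = ".txt") -> List[str]:
    """Return matching links from one page only; no crawling is done."""
    collector = LinkCollector(base_url, suffix)
    collector.feed(html)
    return collector.links


page = '<a href="one.txt">One</a> <a href="about.html">About</a>'
print(links_on_page(page, "http://example.com/fics/index.html"))
```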

I haven't experimented with a site sucker since I was using a PowerMac 6500/300 running Mac OS 8.6 as my primary computer and it was not considered hopelessly obsolete. I can't recall what the program was called and that computer died in 2009 at age 12.

Anyways, we're drifting. I don't really care about anything but archive.org in this thread and getting the stupid text files from it.
slickrcbd
Chibi Sailor Senshi
Posts: 275
 

Re: downloading an old site from archive.org

Postby Ellen Kuhfeld » Sun Dec 10, 2017 10:40 am

PCHeintz72 wrote:Are you talking Washuu.NET?

I just used PageNest on it, and it works fine... it took under 2 minutes at default settings just by entering the site's main page link... 24.3 MB in some 317 files... I can navigate in its offline browser window to, I think, just about everything... or use an Explorer window and poke around the whole site's file structure.

But as I said PageNest is no longer officially available.

Yes, but I know somebody who has it. Is there some way I could get a copy from you?

(Washuu.net does nothing fancy. Active content and style sheets are not my thing. It's likely many ancient fan-sites are the same.)
Ellen Kuhfeld
Sailor Starlight
Posts: 2228
 

Re: downloading an old site from archive.org

Postby Ellen Kuhfeld » Sun Dec 10, 2017 4:33 pm

I found PageNest for download on Softpedia.
Ellen Kuhfeld
Sailor Starlight
Posts: 2228
 

