

i keep getting the urge to scrape this board

Name: Anonymous 2013-09-04 6:25

For too long, whenever I've read a good thread, my first instinct has been: oh, I'd better scrape it before it's deleted. I have access to it now, but I may not in the future. What would I do then? Forget? I am uneasy. How does a date rape victim learn to love again? To trust again?

Name: Admin 2013-09-04 6:28

If you'd like, I can just expose the res/ folder of this software... or, rather, I can just archive the board weekly and give you an .xz of it. It's literally just a folder with a bunch of [threadnumber].html files in it. Perl quality.

Tell me if you want this, as to me it is preferable to having 400 people scrape the site with wget every other day.
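A minimal sketch of the weekly archive job described above, in Python, assuming the board's pages sit as flat [threadnumber].html files in a res/ directory. The function name archive_board and the output path are hypothetical; this is not the board software's actual code.

```python
import tarfile
from pathlib import Path


def archive_board(res_dir, out_path):
    """Pack every [threadnumber].html in res_dir into a single .tar.xz.

    Assumes (hypothetically) a flat folder of thread pages; anything that
    is not *.html is skipped. Returns the archived file names.
    """
    res = Path(res_dir)
    # Sort by name so repeated runs produce archives in a stable order.
    pages = sorted(res.glob("*.html"), key=lambda p: p.name)
    with tarfile.open(str(out_path), "w:xz") as tar:
        for page in pages:
            # arcname stores just "123.html", not the server's full path.
            tar.add(str(page), arcname=page.name)
    return [p.name for p in pages]
```

Run from cron once a week, this produces one file per week, so anyone who wants a copy downloads a single small archive instead of crawling every thread.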

Name: Anonymous 2013-09-04 6:28

let us make furious sex

Name: Anonymous 2013-09-04 6:34

>>2
That would be more efficient, wouldn't it?

>>3
ok.

Name: Anonymous 2013-09-04 7:14

>>5
No, but look at the software itself. Every thread is parsed with regex in Perl before being shown. It's slow, and not very efficient. It also locks files when writing to them, etc. Furthermore, while computational power on EC2 is cheap, bandwidth isn't.

So I'd rather just offer periodic board archives.

I'm moving this off of EC2 soon, either way.

Name: Anonymous 2013-09-04 7:35

>>7
This is an EC2 cluster I use for my own stuff. It isn't costing me any extra money, as long as people don't start pulling 1TB per day. For short bursts this will scale up to a few dozen gigabits per second. I just had it available, and it was less of a nightmare to configure than my OpenBSD VPS, but that's probably overkill. On OpenBSD I'd have to figure out how to statically compile Perl and put it in the jail, etc.

I'll get an unmetered dedicated server soon, probably this weekend or next, since I don't have time to set it up right now.

I can easily implement delta updating if you want. That's what your technique is called: transferring only the additions/changes. I don't think it's going to make much difference, since an .xz of this entire board right now is 78KB. I'm more concerned about nginx and the Perl script getting raped with hundreds of requests a second.

I spend almost all my time programming, either way.
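Delta updating could be sketched along the same lines, assuming the server keeps a manifest (thread file name mapped to a SHA-256 digest) from the previous run. The names snapshot, changed_since, and write_delta are hypothetical, and this version only ships added or changed threads; it does not record deletions.

```python
import hashlib
import tarfile
from pathlib import Path


def snapshot(res_dir):
    """Map each thread file name to a SHA-256 digest of its contents."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(res_dir).glob("*.html")
    }


def changed_since(old, new):
    """Names that are new, or whose digest differs from the old snapshot."""
    return sorted(name for name, digest in new.items() if old.get(name) != digest)


def write_delta(res_dir, old, out_path):
    """Archive only added/changed pages into a .tar.xz.

    Returns (new_snapshot, changed_names). The caller stores new_snapshot
    and passes it back as `old` on the next run, so each delta covers one
    period. Deleted threads are not tracked in this sketch.
    """
    new = snapshot(res_dir)
    changed = changed_since(old, new)
    with tarfile.open(str(out_path), "w:xz") as tar:
        for name in changed:
            tar.add(str(Path(res_dir) / name), arcname=name)
    return new, changed
```

Passing an empty dict as the old snapshot yields a full archive, so the first run and every later delta go through the same code path.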

Name: Anonymous 2016-05-09 4:22

Check em

Name: Cudder !cXCudderUE 2016-05-09 8:38

Could the Fossil repo be put to some use for this purpose too?

Name: Anonymous 2016-05-09 8:40

>>10
Fuck off

Name: 4get 2016-05-09 11:30

What is wget?

Name: Anonymous 2016-05-09 12:05

>>10
Like uploading the site source code?

Name: archduke 2021-09-21 19:56

eels has an API for this site; he just likes to hide it.
