
A project

Name: Anonymous 2014-04-19 21:23

Get the URLs of all the 4chan.org textboards (dis.), then scrape and archive them. Use archive.org.

Name: Anonymous 2014-05-22 2:01

I'm going to find ways to compress the /prog/ db. I'll start by compacting the tags. Substituting the spoiler tags should give good results. After that, a representation for repeated posts will help compress the spam. If I can get it below 500 MB, then heliohost can host it; it's the only cool free webhost. The deadline is eventually.
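One way to read "a representation for repeated posts" is content-addressed storage: each distinct post body is kept once, keyed by a hash, and every post just points at that hash. A minimal sketch, assuming the posts are available as plain strings; the class `PostStore` and its methods are hypothetical and not taken from the actual board software.

```python
import hashlib


class PostStore:
    """Hypothetical store that keeps one copy of each distinct post body."""

    def __init__(self):
        self.bodies = {}  # SHA-1 hex digest -> body text, stored once
        self.posts = []   # one digest per post, in posting order

    def add(self, body):
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        # setdefault only stores the body the first time this digest appears
        self.bodies.setdefault(digest, body)
        self.posts.append(digest)
        return len(self.posts) - 1  # index of the new post

    def get(self, index):
        return self.bodies[self.posts[index]]


store = PostStore()
for body in ["spam", "spam", "real post", "spam"]:
    store.add(body)
print(len(store.posts), len(store.bodies))  # prints: 4 2
```

Each reference costs a 40-character digest here, so this only pays off for spam bodies longer than that; a small integer id per distinct body would be a cheaper pointer.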

Name: Anonymous 2014-05-22 22:06

>>41
The archives aren't that big, and archive.org is fine. Why do you want to compress them?

Name: Anonymous 2014-05-23 11:14

>>42
I want to host a readable, writable old world4ch, but I'm too cheap to pay for hosting that provides more than 500 MB of storage.

Name: Anonymous 2014-05-23 14:40

>>43
maybe admin-kike will give you the hosting

Name: Anonymous 2014-05-23 14:50

Stop crying already and move on to the ima/g/eboards.

Name: Anonymous 2014-05-23 14:51

>>45
you're mom

Name: Anonymous 2014-06-24 0:53

replacing all the spoiler tags on old /prog/ with <span class="spoiler">...</span> saves 817 MB. That's more than half the size of the uncompressed db.

Name: Anonymous 2014-06-24 1:33

>>48,50
Who dost thou quoth?

Name: Anonymous 2014-06-25 1:09

>>51
He's quoting me!

Name: Anonymous 2014-06-27 18:00

compressing the markup reduced the 1.5 GB prog.db to around 390 MB. I could host the old prog on heliohost now, but I want to fit all of world4ch. Any recommendations for using data compression in a database are welcome. Right now I'm thinking of serializing each thread into a flat file and then gzipping them.

Name: Anonymous 2014-07-02 8:18

serializing all threads to flat text and DEFLATEing each thread separately gave good results. All of world4ch fits in ~200 MB like this and can still be randomly accessed efficiently enough. With an uncompressed caching layer for frequently accessed threads, the overhead shouldn't be too bad.
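A minimal sketch of the per-thread scheme, assuming an SQLite table with one compressed BLOB per thread and an LRU cache standing in for the uncompressed caching layer. The table name, the NUL separator between posts, and the cache size are assumptions for illustration, not the schema used by board.py. Note that `zlib.compress` wraps the DEFLATE stream in a small zlib header and checksum.

```python
import functools
import sqlite3
import zlib

# Hypothetical schema: one row per thread, holding the whole thread
# serialized to flat text and DEFLATE-compressed into a single BLOB.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, data BLOB)")


def serialize(posts):
    # Assumed flat format: posts joined by a NUL byte, which post text lacks.
    return "\0".join(posts)


def store_thread(thread_id, posts):
    blob = zlib.compress(serialize(posts).encode("utf-8"), 9)
    db.execute("INSERT OR REPLACE INTO threads VALUES (?, ?)", (thread_id, blob))
    load_thread.cache_clear()  # drop any stale cached copies


@functools.lru_cache(maxsize=256)
def load_thread(thread_id):
    # Random access: fetch and inflate only the thread that was asked for.
    row = db.execute("SELECT data FROM threads WHERE id = ?", (thread_id,)).fetchone()
    if row is None:
        return None
    # Return a tuple so callers cannot mutate the cached value.
    return tuple(zlib.decompress(row[0]).decode("utf-8").split("\0"))


store_thread(1, ["first post", "second post"])
print(load_thread(1))  # ('first post', 'second post')
```

Compressing per thread rather than per database trades some compression ratio (no shared dictionary across threads) for being able to decompress one thread without touching the rest.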

Name: Anonymous 2014-07-02 10:04

>>47
here is an idea: make your own format for prog that uses a weird form of BBCode where every tag is 1 letter, and when you close a tag you write [/]
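Since [/] doesn't say which tag it closes, a parser for this format has to keep a stack of open tags and pop the innermost one. A minimal sketch, assuming a hypothetical set of one-letter tags (b, i, u, o, s) and post text that is already HTML-escaped; a stray [/] is kept as literal text and unclosed tags are closed at the end of the post.

```python
import re

# Hypothetical one-letter tags and the HTML each one opens and closes.
TAGS = {
    "b": ("<b>", "</b>"),
    "i": ("<i>", "</i>"),
    "u": ("<u>", "</u>"),
    "o": ('<span class="o">', "</span>"),
    "s": ('<span class="spoiler">', "</span>"),
}
# Matches [/] or [x] where x is one of the known tag letters.
TOKEN = re.compile(r"\[(/|[" + "".join(TAGS) + r"])\]")


def to_html(text):
    out, stack, pos = [], [], 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])
        name = m.group(1)
        if name == "/":
            if stack:
                out.append(TAGS[stack.pop()][1])  # close innermost open tag
            else:
                out.append(m.group(0))  # stray [/]: keep it as literal text
        else:
            stack.append(name)
            out.append(TAGS[name][0])
        pos = m.end()
    out.append(text[pos:])
    while stack:  # close anything still open at the end of the post
        out.append(TAGS[stack.pop()][1])
    return "".join(out)


print(to_html("[b]bold [i]both[/] bold[/] plain"))
# <b>bold <i>both</i> bold</b> plain
```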

Name: Anonymous 2014-07-02 13:29

>>55
If every tag is one letter, why bother keeping the square bracket syntax? The only reason the tag names need delimiters is so the parser (and the user) can instantly tell where they end. So you might as well switch to \b \i \o \u or something.

Name: Anonymous 2014-07-02 14:26

>>56
That would work even better

Name: Anonymous 2014-07-02 20:06

>>55-57
I tried substituting tags with shorter representations in >>53. <sub> became something like <s, and </sub> could have become <S. I didn't want an encoding that depended on balanced tags because of the malformed HTML, and the scheme handled malformed tags gracefully. Decoding was easy: the parser sought the next < and used lookahead to determine the substitution. There were savings, but they didn't compare to >>54. I left the spoiler tags in their original form; the spoiler spam is so low in entropy that it isn't a problem.
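A minimal sketch of that kind of substitution, under stated assumptions. Every tag is replaced on its own, so unbalanced or stray tags round-trip unchanged, and the decoder seeks each < and looks one character ahead. Instead of <s and <S, this sketch swaps in hypothetical codes made of < plus a digit, because no real HTML tag name starts with a digit, so a code can't be confused with an untouched tag such as <span. It also assumes literal < in user text is stored escaped as &lt;.

```python
# Hypothetical substitution table; the real one in >>53 is not shown.
CODES = {
    "<sub>": "<0",
    "</sub>": "<1",
    "<sup>": "<2",
    "</sup>": "<3",
    "<b>": "<4",
    "</b>": "<5",
}
DECODE = {code[1]: tag for tag, code in CODES.items()}


def encode(html):
    # Each tag is replaced independently, so malformed markup is fine.
    for tag, code in CODES.items():
        html = html.replace(tag, code)
    return html


def decode(data):
    out, pos = [], 0
    while True:
        i = data.find("<", pos)  # seek to the next '<'
        if i == -1 or i + 1 >= len(data):
            out.append(data[pos:])
            return "".join(out)
        out.append(data[pos:i])
        nxt = data[i + 1]  # one character of lookahead
        if nxt in DECODE:
            out.append(DECODE[nxt])
            pos = i + 2
        else:
            out.append("<")  # not a code: copy the '<' through unchanged
            pos = i + 1


sample = 'x<sub>2</sub> and <b>bold</b> <span class="spoiler">s</span><br>'
print(encode(sample))
# x<02<1 and <4bold<5 <span class="spoiler">s</span><br>
```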

Name: Anonymous 2014-07-04 15:44

tag my anus

Name: Anonymous 2014-07-04 17:10

>>59
<tag>(_._)</tag>

Name: Anonymous 2014-07-05 23:06

>>60
Thank you!

Name: Anonymous 2014-07-09 2:28

There is now something on heliohost.

http://w5ch.heliohost.org

A clever one may be able to view the script and download the database. Not yet implemented are:

* Posting
* BBCode parser
* Post truncation
* Post selection expressions
* Caching

please don't DDoS it ;_; It is vulnerable to expensive queries because of the data compression.

Name: >>62 2014-07-09 2:30

Oh, and the JavaScript is removed until I can go through it and make sure it isn't doing anything that harms you.

Name: Anonymous 2014-07-09 2:51

>>62
Shitchan
Yayy someone uses the real name!

Name: Anonymous 2014-07-09 10:05

>>62
I'm surprised someone finally picked up this project. Thank you so much!

Name: Anonymous 2014-07-10 19:00

When submitting a new thread in Shiichan, the thread id is part of the POST request. So the thread id is the timestamp of when you loaded the page to submit the thread, not of when the thread was submitted. And if you generate the POST body yourself, you can put in whatever thread id you want. This explains the threads -2147483648, 1, 3, 4, 1337, 7357, and 2147483648.
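The difference between trusting the form field and assigning the id on the server can be sketched as follows. Both functions are hypothetical illustrations, not Shiichan code: the first mirrors the behaviour described above, and the second ignores any client-supplied id, uses the submission time, and bumps by one second on a collision.

```python
import time

INT32_MAX = 2**31 - 1


def thread_id_from_form(form):
    # Behaviour described above: the id comes from the client's POST body,
    # filled in when the form page was loaded, so any value can be sent.
    return int(form["id"])


def assign_thread_id(existing_ids, now=None):
    # Hypothetical server-side alternative: ignore the form field entirely.
    tid = int(time.time() if now is None else now)
    while tid in existing_ids:
        tid += 1  # another thread got this second; take the next one
    if tid > INT32_MAX:
        raise ValueError("thread id no longer fits in a signed 32-bit int")
    return tid


forged = {"id": "1337", "body": "hello"}
print(thread_id_from_form(forged))  # 1337
print(assign_thread_id({1400000000}, now=1400000000))  # 1400000001
```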

Name: Anonymous 2014-07-13 6:20

Posting works now. The post creation page is still a debug page, so after posting, just hit back and refresh. BBCode doesn't work yet. The heliohost server that w5ch is on has been down all day, so I've created a Tor hidden service for the site.

https://buvmp4vgrqm2parx.onion/
Fingerprint: 7C:B4:8E:D8:A5:B0:C8:6B:8E:AC:02:1C:1D:6F:1E:BC:84:94:76:B3

I may experiment with syncing content between multiple unreliable web hosts.

Name: Anonymous 2014-07-13 6:31

The board software can be downloaded here:
https://buvmp4vgrqm2parx.onion/board.py

and the compressed database is here:
https://buvmp4vgrqm2parx.onion/db/w5ch.db

The database has threads in compressed form. See the source in board.py to access it. It's a 270 MB file.

Name: Anonymous 2014-07-13 6:47

>>68
board software
*.py

No, thank you.

Name: Anonymous 2014-07-13 6:50

>>69
I'm not proud of it. But it was the language I was using when I was trying to get the database below 500 MB for heliohost so I stayed with it.

Name: Anonymous 2014-07-13 13:46

>>69
What else do you suggest? Heliohost doesn't support Lisp.

Name: Anonymous 2014-07-13 13:55

>>71
Perl; if it supports CGI and compiled applications, use C.
And you can write a Lisp interpreter in any language anyway.

Name: Anonymous 2014-07-13 14:00

>>71
Yesod.

Name: Anonymous 2014-07-13 17:16

Someone post on the hidden service already. Fuck.

Name: Anonymous 2014-07-14 0:20

>>74
Just did, and I have a bug report: Optimizable quotes show up as raw text for new posts.

Name: Anonymous 2014-07-14 1:32

>>75
Optimisable

Name: Anonymous 2014-07-14 3:14

>>75-76
Yeah, everything is still raw text. There are no quotes, hyperlinks, or BBCode yet.

Name: Anonymous 2014-07-14 4:37

> This [o ]
> is how [o ]
> you multiline quote, right?[/o ][/o ]

Name: Anonymous 2014-07-14 4:42

The >'s become <span class="quote">, the newlines after the quote become </span>, the [o ] becomes <span class="o">, and [/o ] becomes </span>. The result is

<span class="quote">This <span class="o"></span>
<span class="quote">is how <span class="o"></span>
<span class="quote">you multiline quote, right?</span></span></span>


I think you could also multiline quote with spoiler tags.
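The rules in the post above can be checked with a minimal sketch that renders a whole post line by line. It assumes the board strips one space after the leading > and HTML-escapes the line before applying tags (both assumptions), and it closes the quote span at the end of every quoted line, including the last one, which has no trailing newline. Only [o] is handled here.

```python
import html


def render(text):
    lines = []
    for line in text.split("\n"):
        quoted = line.startswith(">")
        if quoted:
            line = line[1:]
            if line.startswith(" "):
                line = line[1:]  # drop the single space after '>'
        line = html.escape(line, quote=False)
        line = line.replace("[o]", '<span class="o">').replace("[/o]", "</span>")
        if quoted:
            # '>' opens a quote span; the end of the line closes one span,
            # which is the innermost one still open, not necessarily the quote.
            line = '<span class="quote">' + line + "</span>"
        lines.append(line)
    return "\n".join(lines)


print(render("> This [o]\n> is how [o]\n> you multiline quote, right?[/o][/o]"))
```

Run on the multiline quote above, this prints exactly the three lines of HTML shown in the post: each line's closing </span> actually closes the pending [o] span, so the quote spans stay open until the final [/o][/o].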

Name: Anonymous 2014-07-14 14:02

>>79

Yep, [spoiler] worked. There was another span tag as well, [aa] I think.
