/prog/ - Why browsers are bloated

Name: Anonymous 2014-07-27 0:20

https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/Scrollbar.cpp
https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/win/ScrollbarThemeWin.cpp
Let's reinvent the fucking scrollbar, which every goddamn platform with a UI already has, and make it behave subtly differently from the native one!

Right-click a native scrollbar in some other app:
- Scroll Here
- Top
- Bottom
- Page Up
- Page Down
- Scroll Up
- Scroll Down

Right-click a scrollbar in Chrome:
- Back
- Forward
- Reload
- Save As...
...

Right-click a scrollbar in Firefox and Opera:
Absolutely fucking nothing happens!

What the fuck!? How did these terminally retarded idiots get involved in creating one of the most important pieces of software to the average user?

Name: Anonymous 2014-12-29 16:11

>>518
Keep going. Recommend an alternative. You know, an alternative that has never had a vulnerability in the history of its development.

Name: Anonymous 2014-12-29 16:29

>>518
WebKit (Chrome mostly, Opera and a lot of third party browsers) is about 50% C/C++ and ~30% JS

Gecko (Firefox) is ~65% C/C++ and ~25% JS

Now take a guess which parts of these engines are written in C/C++

Name: Anonymous 2014-12-29 18:00

>>522
There is no such thing named C/C++. I think you mean C++.

Name: Anonymous 2014-12-29 22:27

>>520
Is this a serious question? Are you retarded?

Name: Anonymous 2014-12-29 22:36

>>524
Yes, it is a serious question.

Name: Anonymous 2014-12-30 11:18

>>512
Just because there's no language that has never had a vulnerability in programs written in it doesn't mean it isn't easier to write insecure code in some languages (C, PHP, Ruby, ...) than it is in others (Scheme, Java, Go).

In general, managed memory is more secure than unmanaged memory (otherwise why not just ring0 everything? So Fast!!). Of course it doesn't matter in the hands of a True Master but you and I both know we're not True Masters. See: every buffer overflow since the beginning of time.

In general, strongly typed languages are more secure than weakly typed languages. That's a string you say? Sure, but it might also be an array. See: http://blog.sucuri.net/2014/10/drupal-sql-injection-attempts-in-the-wild.html

In general, more magic means more ways things can go wrong. The more a language tries to do behind your back, the easier it is for things to slip by unnoticed. See: http://www.sitepoint.com/anatomy-of-an-exploit-an-in-depth-look-at-the-rails-yaml-vulnerability/

So really what we're looking for is a boring, strongly-typed language with managed memory. Vulnerabilities will still happen, but with less regularity.

Name: Anonymous 2014-12-30 11:32

>>526

> managed

No thanks. It's 2015. There's no excuse for using a type system that can't express resource relationships, particularly memory safety.

Name: Anonymous 2014-12-30 12:45

>>526
Use C with static checking tools and managed memory.
Also try this https://staff.aist.go.jp/y.oiwa/FailSafeC/index-en.html

Name: Anonymous 2014-12-30 13:06

What is wrong with NetSurf? Just contribute to it.

Name: Anonymous 2014-12-30 13:54

>>529
Cudder has posted in this very thread what's wrong with netsurf.

Name: Anonymous 2014-12-30 14:07

>>530
The only time Cudder talked about NetSurf was in >>110, and only indirectly (Servo used part of NetSurf as a basis at some point).

I can't see why not to use NetSurf.

Name: Anonymous 2014-12-30 14:11

>>530
Cudder is a troll who never delivered anything (and won't deliver his "browser" either).

Name: Anonymous 2014-12-30 18:31

>>526
SQL injection is a terrible example to bring up. It can be fixed with a library that escapes all strings. Don't allow the programmer to provide their own strings. Problem solved at the expense of efficiency and convenience.

> otherwise why not just ring0 everything?

Some programs are malicious. If all programs shared memory, an unprivileged program could intentionally interfere with the memory of programs running under more privilege. Not every memory error is a bug.

> Of course it doesn't matter in the hands of a True Master but you and I both know we're not True Masters.

Maybe not. But managed memory is not the only solution, and is not an adequate solution. There are still vulnerabilities in the support libraries. See java. See flash. Which is safer, running a 5K line C program or a 4K line java program that uses 100K lines of C/C++ in the standard libraries? Keep in mind these libraries are standard and studied by every hacker that targets java.

> more magic means more ways things can go wrong. The more a language tried to do behind your back, the easier it is for things to slip by unnoticed.

There is the other side of the spectrum, where a language requires you to write so much the actual logic of what is happening is obscured.

Name: Anonymous 2014-12-30 19:36

>>533

> Some programs are malicious

Use only trustworthy, open source programs then.

Name: Anonymous 2014-12-31 13:31

>>534
In a multiuser operating system, one user can attempt to attack other users or escalate their own privilege.

Name: Anonymous 2015-01-01 1:17

>>532
yeah, I know, and he doesn't give a shit about anything other than tiny optimisations. some value correctness, code simplicity, portability, etc. cudder only cares about how fast and how little space his shitty programs use.

what a fucking weeaboo

Name: Cudder !MhMRSATORI 2015-01-01 6:41

>>518
You and the rest of the user-oppressing DRM-advocating gang. Why do jailbreaks exist? Insecurity is freedom. Keep this in mind: "Those who give up freedom for security deserve neither."

>>535
We're talking about personal computers here, not SOA cloud crap. There should only ever be one user for a PC, and (s)he completely owns the hardware and software. All of it.

On the topic of "sufficiently smart compilers"...
(1) The equivalent of "goto fail;"?

   mov [ebx+24], eax
   mov [ebx+24], eax

(2) Wasting another instruction, a register (ebx is not used again in this function), and a memory access.

   call 4706996
   add  esp, 12
   mov  ebx, eax
   mov  [ebp-28], ebx
   ... {neither ebx, eax, nor [ebp-28] are used after this} ...

There is absolutely no way a human programmer would write such idiotic code, unless he was drunk or doing stupid copy-pasting. I've been staring at compiler-generated shit for almost 8 hours a day for the last decade or so and I can tell you that COMPILERS ARE NOT SMART.

>>536
Simple code also tends to be small, fast, and correct. Portability is stupid "lowest-common-denominator" idiocy. (Just look at Java... it runs everywhere, but not great anywhere.)

Name: Anonymous 2015-01-01 15:26

>>537
I don't like the use of that quote in this context, because you gain no security from DRM and locked down machines. Only vulnerabilities.

Name: Cudder !MhMRSATORI 2015-01-01 15:35

>>538
It's perceived security. The way they market it just makes people think "this is so safe and secure!" It's called a "jailbreak" for a reason - a high-security prison is one of the most "secure" places to live in. No sane person would voluntarily go live in one, yet that's what people are doing to themselves with computers.

Name: Anonymous 2015-01-02 5:21

>>537
So if you want your kawaii 4chan-esque :^) browser to support ARM, do you compile x86 asm to ARM asm?

Name: Anonymous 2015-01-02 6:25

>>540

> ARM

Come on, be serious.

Name: Anonymous 2015-01-02 9:09

nothing says simplicity more than windows and x86

Name: Cudder !MhMRSATORI 2015-01-02 10:20

>>540
If/when there is a need, I'll just write an ARM version.

Name: Anonymous 2015-01-03 1:53

>>543
Why don't you contribute to NetSurf instead of trying to create a new web browser (a Cudder™ project that will eventually result in failure)? I'm starting to believe you really are a troll.

Name: Anonymous 2015-01-03 2:42

>>544
Because it is not his job to please you, and he can work on any project that he likes, whenever he likes, however he likes.

Why aren't you contributing to NetSurf?

Name: Anonymous 2015-01-03 2:51

>>544
Man, I don't care how much of a pathetic masochist cudder is. I can make computers do whatever I want. lmao

Name: Anonymous 2015-01-03 3:18

Man, I don't care how much of a pathetic masochist 546 is. I can make software do whatever I want. rofl-la-mo

Name: Anonymous 2015-01-03 3:57

>>544
Nice dubs xD

Name: Cudder !MhMRSATORI 2015-01-03 13:02

>>544
For mostly the same reason I'm not going to submit code to Firefox, WebKit, etc. - Because I'd have to rewrite (or remove...) most of it anyway.

>>531
>>516
Hubbub is NetSurf's parser, and I've already mentioned it in >>110. In detail, here's what's wrong with it (refer to https://github.com/servo/libhubbub/blob/master/src/tokeniser/tokeniser.c as of the time of this post):

- Using switch statement and explicit case numbers to implement the tokeniser FSM instead of simple gotos and implicit position-based FSM. This means it has to go through that switch every time through the main loop, even if the next state is the same, instead of just jumping to the right place.

- Making every damn state a function. The amount of work that has to be done in each state is tiny, but somehow they've managed to bloat them big enough to look like each one of those functions is nontrivial. There is a ton of duplicate code as a result - for example, compare hubbub_tokeniser_handle_attribute_value_dq(), hubbub_tokeniser_handle_attribute_value_sq(), and hubbub_tokeniser_handle_attribute_value_uq(). (The names are ridiculously long too.) Each one is over 40 lines and performs basically the same thing... can you spot the differences between them?

Here's the corresponding part of my parser's tokeniser; it handles all 3 types of attribute values.

attrv_unquoted_loop:
 cmp al, '>'
 jz finish_attr_done
 call [ebp+htmlp.func_iswhs]
 jnz attrv_loop
finish_attr_done:
 ; omitted code here finishes an attribute and emits a tag or starts a new one (~10 lines)
 ...
find_attrv:
 xor bl, bl    ; presume unquoted
 ; omitted code here scans for start of an attribute value (<10 lines)
 ...
 cmp al, '"'
 jz attrv_quoted
 cmp al, "'"
 jnz attrv_unquoted
attrv_quoted:
 mov bl, al    ; save quote char
 inc edx       ; skip quote char
attrv_loop:
 call [ebp+htmlp.func_getchar]  ; EOF - ignore
attrv_unquoted:
 or bl, bl
 jz attrv_unquoted_loop
 cmp al, bl
 jnz attrv_loop
 jmp finish_attr_done

I've shown quite a bit more than the actual unquoted/quoted attribute value states, just so you can follow the structure around it, and yet this code in x86 Asm is still shorter than one of Hubbub's functions, in C, that does a tiny fraction of the real work this does! The actual quoted/unquoted attribute value states are represented by these ten instructions:

attrv_unquoted_loop:
 cmp al, '>'
 jz finish_attr_done
 call [ebp+htmlp.func_iswhs]
 jnz attrv_loop
 ...
attrv_loop:
 call [ebp+htmlp.func_getchar]  ; EOF - ignore
attrv_unquoted:
 or bl, bl
 jz attrv_unquoted_loop
 cmp al, bl
 jnz attrv_loop
 jmp finish_attr_done

I'm not claiming to be an expert on HTML parsing by any means, but this code is so short because what it does really is that simple - unquoted attributes are terminated by > or whitespace, and quoted ones are terminated by their quote character. There's nothing at all deep or complex about this, yet Hubbub's version makes it look like even "lightweight" code (which they claim to be) needs to do all that shit so it gets used as "evidence" by all the people claiming "browsers need to be complex!"
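
For anyone who doesn't read x86 Asm, here is a rough C sketch of the same idea: one loop covers all three attribute value states, with the saved quote character (0 meaning unquoted, like bl above) selecting the terminator. This is only an illustration, not the actual parser and not Hubbub's code; scan_attr_value() and is_html_space() are made-up names, and it assumes the whole value is already in one buffer, so there is no getchar or partial-input handling.

```c
#include <assert.h>
#include <stddef.h>

/* Whitespace as the HTML tokeniser sees it: tab, LF, FF, CR, space. */
static int is_html_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/*
 * Hypothetical helper: scan one attribute value starting at s[0].
 * Stores the offset of the first value character in *start and
 * returns the value's length. Running out of input simply ends
 * the value.
 */
static size_t scan_attr_value(const char *s, size_t len, size_t *start)
{
    char quote = 0;   /* 0 = presume unquoted */
    size_t i = 0;

    assert(s != NULL && start != NULL);

    if (i < len && (s[i] == '"' || s[i] == '\'')) {
        quote = s[i]; /* save quote char */
        i++;          /* skip quote char */
    }
    *start = i;

    while (i < len) {
        char c = s[i];
        /* Quoted: stop at the matching quote. Unquoted: stop at '>' or whitespace. */
        if (quote ? (c == quote) : (c == '>' || is_html_space(c)))
            break;
        i++;
    }
    return i - *start;
}
```

Calling scan_attr_value("'x>y' z", 7, &start) sets start to 1 and returns 3, since a '>' inside a quoted value does not end it.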

- Character references don't need to be handled by the tokeniser, since the presence or absence of any '&'s doesn't affect tokenisation. There's another chunk of complexity that could be factored out. I mentioned this in >>110 already. As expected, Hubbub takes the stupid route.

- DOCTYPE tokens, same thing, already mentioned above.

- EOF - this is another bloater. Hubbub checks for EOF in the code of every state(!), but if you read the spec you'll see that the number of operations upon EOF is limited to two: not doing anything or "emit the current token", so my EOF handling is done in getchar. The cases that don't do anything don't need any code, the ones that do are recorded in an array (there's only 16 entries) and the right place is jumped to when the EOF function is called.

- Partial input handling: this is all handled in getchar - which returns "up two levels" to whatever called the parser when it runs out of input (try doing that in C!) The input is one big buffer which gets resized and moved as needed. I'm not sure what Hubbub does here, but it's probably more complex than this.

- Output interface: Hubbub stuffs everything into a "token" structure and then calls one function, token_handler(), to push the results out. It has functions named emit_current_tag(), emit_current_comment(), emit_current_chars(), etc., but all those have to go through that one token_handler() function. On the other side, the tree construction has to then switch on that type, just to call the right function, when they could've made the tokeniser call those different functions in the first place. How stupid is that? It's [i]bloody-panties-on-head retarded!![/i] I take the sane route with my tokeniser interface:

extrn @emit_doctype@4:near
extrn @emit_text@4:near
extrn @emit_comment@4:near
extrn @emit_tag@4:near

One function for each token type, as it should be. No need for a "token type" either, because the function being called implicitly determines what token was emitted, and it can directly read from the right fields of the parser/tokeniser structure to get its info. The pointer to the parser structure is in a register already. Simple and efficient. Tree construction is currently in C so I have to fight with stupid HLL calling conventions bullshit (I use ecx for a loop counter - the way it should be used - and ebp for the parser structure, but __fastcall wants to put the first parameter in ecx!!) but I expect that'll change once I rewrite that part in Asm too.
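
The same interface shape can be sketched in plain C. This is a toy under loud assumptions, not the real tokeniser: struct parser, its fields, and tokenise() are invented for illustration, the "tokenisation" only distinguishes tags, comments and text by their delimiters, and the emit functions just count calls so the effect is observable. The point is only that each token type gets its own entry point and nothing on the receiving side has to switch on a type field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state; emit functions read what they need from it. */
struct parser {
    const char *text;           /* start of the current token's bytes */
    size_t len;                 /* length of the current token */
    int tags, texts, comments;  /* call counters, for demonstration only */
};

/* One function per token type; the call itself says what was emitted. */
static void emit_tag(struct parser *p)     { p->tags++; }
static void emit_text(struct parser *p)    { p->texts++; }
static void emit_comment(struct parser *p) { p->comments++; }

/* Toy tokeniser: splits on "<!--...-->", "<...>" and everything else. */
static void tokenise(struct parser *p, const char *s)
{
    assert(p != NULL && s != NULL);

    while (*s != '\0') {
        const char *end;

        if (strncmp(s, "<!--", 4) == 0) {
            end = strstr(s + 4, "-->");
            end = end ? end + 3 : s + strlen(s);  /* unterminated: take the rest */
            p->text = s; p->len = (size_t)(end - s);
            emit_comment(p);
        } else if (*s == '<') {
            end = strchr(s, '>');
            end = end ? end + 1 : s + strlen(s);
            p->text = s; p->len = (size_t)(end - s);
            emit_tag(p);
        } else {
            end = strchr(s, '<');
            if (end == NULL)
                end = s + strlen(s);
            p->text = s; p->len = (size_t)(end - s);
            emit_text(p);
        }
        s = end;  /* every branch consumes at least one byte */
    }
}
```

Running tokenise() on "<p>hi<!-- c --><b>x</b>" with a zeroed struct parser makes three emit_tag() calls, two emit_text() calls and one emit_comment() call, with no token-type field anywhere.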

tl;dr: I'm not going to touch NetSurf, because it wouldn't be NetSurf anymore if I did.

Name: Anonymous 2015-01-03 14:36

>>549
See if you can fit a tokenizer into the symbol space used by Hubbub's.

Name: Cudder !MhMRSATORI 2015-01-03 14:48

P.S. anyone else who wants to try "fixing" the parser and/or telling NetSurf devs about this, go ahead, but I doubt they'll listen...

Apparently this new HTML5 parser was the result of GSoC - where n00bs are paid to churn out as much shit code as they can in a limited timeframe. No wonder it's full of amateur mistakes.

Name: Cudder !MhMRSATORI 2015-01-03 15:50

>>550
Explain.

Name: Anonymous 2015-01-03 19:16

>>552
Well all those long function names must be using up a lot of space in the symbol table when compiled to object code. You could probably fit a small tokenizer in there.

Name: Anonymous 2015-01-04 0:20

>>549
duhh hdey shud use utf8-le yall

Name: Anonymous 2015-01-04 12:23

Cudder, do you have any idea when your code cleanup will be ready? A rough estimate?
I'm really excited to go through it and maybe even try a Linux port in my free time.
I feel disgusted every single day using these other bloated browsers.

Name: Cudder !MhMRSATORI 2015-01-04 15:11

>>553
Function names are not really a runtime concern, since the tokeniser is not something you would be dynamically linking anyway. They just make the source code really annoying to read. But seeing as my tokeniser is ~1KB, I'm pretty sure it's smaller than all the function names combined; for comparison, my post in >>549 is over 5KB already.

>>555
No idea (and work is starting back up so probably not all that much time for the coming month or two...) Remember that this is just a parser and DOM viewer, not anywhere near a full browser yet. >>123 is around what it's like at the moment. It's still in need of a CSS parser/renderer.

Name: Anonymous 2015-01-18 13:01

>>110
Servo hasn't used libhubbub for a number of months; they use https://github.com/servo/html5ever/ now instead.

Name: Cudder !MhMRSATORI 2015-01-18 14:44

>>557
Although I don't know Rust, that looks much better at first glance... but the trend of splitting everything into a bunch of relatively tiny files is still a bit irritating. There's still some duplication, especially in the attribute quote handling; leaving in "parse error" cases is useless for something intended for a browser; character references are being done inline; and it's a pretty dumb literal translation of the standard. But this is definitely better than the mess of Hubbub.

However, I haven't looked at the binary resulting from this so can't compare directly...

Name: Cudder !MhMRSATORI 2015-01-19 12:18

Time to look at CSS3 syntax more closely...

http://www.w3.org/TR/css-syntax-3/

It's around as verbose as the HTML5 syntax spec if not more, but somehow also less informative and quite hard to follow. The selector syntax is in a different spec. There are 32 token types, if I counted correctly... and the spec isn't in the form of a state machine.

Does anyone find it funny that the CSS parser may end up being larger than the HTML one...?

Name: Anonymous 2015-01-19 12:32

How about CSS 2?
