designing a suckless bignum library

Name: Anonymous 2015-11-16 22:11

Let's design a suckless bignum library. (I'm not part of suckless though, just curious about replacing GMP).

I researched a bit into algorithms and the rundown is this:
* long multiplication: O(n^2)
* karatsuba: O(n^1.585), i.e. O(n^log2(3))
* Toom-Cook, Fourier-transform based methods - even faster, but only used for numbers 10k+ digits long. Much more complex.

So we should probably use karatsuba for all multiplications. Squaring can be done a bit faster than multiplying two different numbers sometimes.
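
To make the win concrete, here's the identity on a single 32x32 -> 64 multiply using 16-bit halves (illustration only, obviously; a real library applies the same split recursively to arrays of limbs):

#include <stdint.h>
#include <stdio.h>

/* Illustration of the Karatsuba identity on one 32x32 -> 64 bit product,
 * splitting each operand into 16-bit halves: three multiplications instead
 * of four, at the cost of a few extra additions. A bignum library does the
 * same thing recursively on arrays of machine words. */
static uint64_t karatsuba32(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xffff, a1 = a >> 16;
    uint64_t b0 = b & 0xffff, b1 = b >> 16;

    uint64_t z0 = a0 * b0;                         /* low halves           */
    uint64_t z2 = a1 * b1;                         /* high halves          */
    uint64_t z1 = (a0 + a1) * (b0 + b1) - z0 - z2; /* middle, one mul only */

    return (z2 << 32) + (z1 << 16) + z0;
}

int main(void)
{
    printf("%llu\n", (unsigned long long)karatsuba32(123456789u, 987654321u));
    return 0;
}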

Now I suggest programming it in assembly, since that gives you access to the carry bit (C doesn't get you that). Of course we will use libc and the normal C calling conventions so that it's a regular C library.

What to do about memory management? e.g. if you want to add two numbers do we need to allocate a new 'number' as long as the largest to write the result into or do it destructively "x <- x + y"? Maybe the library should support both - then a calculator program would figure out the best primitives to use for a given computation.

It might be nice to also support things like (big modulus) modular arithmetic and polynomials. stuff like exponentiation and modular inverses have interesting algorithms.

What other integer operations would we want? I don't really want to do anything with arb. prec. real numbers - arithmetic with rationals could be done though.

Name: Anonymous 2015-11-16 22:13

Oh: prime testing, generating and factorization.

Name: Anonymous 2015-11-16 22:26

This sounds like a really interesting project! I'm interested in what kind of scope you have in mind. What kinds of operations should be supported? Aside from integers and modular arithmetic, do you also want to have real numbers, vectors, matrices, rationals, etc? What systems do you want to support (x86, MIPS, etc)? What do you think would be a nice goal for a prototype/proof of concept?

I have some ideas for what a very simple prototype could include:
* a representation for arbitrary length integers
* addition/subtraction/multiplication on these integers
* multiplication implemented with karatsuba
* C calling convention wrapper

Name: Anonymous 2015-11-17 2:16

Make everything fixed point and represent everything in BCD.
END OF DISCUSSION

Name: Anonymous 2015-11-17 2:21

>>4
BCD? are we writing this for the fucking NES?

Name: Anonymous 2015-11-17 3:01

make it in haskell

Name: Anonymous 2015-11-17 3:02

>>5
The 6502 variant in the NES had the BCD support removed, dumbass.

Name: suigin 2015-11-17 5:21

>>1
Would be interesting to know whether these new intrinsics generate (near) optimal code when used to implement bignums; if it works, it would save you having to get your hands dirty with lots of assembly:

https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins

Clang also supports those intrinsics, plus it has its own set of intrinsics specifically for arbitrary precision arithmetic:

http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins

And of course, there's Intel's ADX extensions, but AMD doesn't implement them yet. Have to wait for Zen.

https://software.intel.com/en-us/node/523866
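
For example, a word-by-word add written against those builtins might look roughly like this (untested sketch, the function name and signature are made up). Whether it actually compiles down to a clean add/adc chain is exactly the open question:

#include <stddef.h>
#include <stdint.h>

/* sketch only: r = a + b over n little-endian 64-bit limbs, returns the carry out */
static uint64_t add_limbs(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t;
        uint64_t c1 = __builtin_add_overflow(a[i], b[i], &t);
        uint64_t c2 = __builtin_add_overflow(t, carry, &r[i]);
        carry = c1 | c2; /* at most one of the two adds can carry out */
    }
    return carry;
}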

Name: Anonymous 2015-11-17 15:35

>>7
that was le second part of le joke

Name: Anonymous 2015-11-17 15:35

>>1
who is tom cook
is he like tom nook
do i owe him money

Name: Anonymous 2015-11-17 19:33

>>5,7
He obviously meant Game Boy. The GBZ80 has some BCD support.

Name: Anonymous 2015-11-17 19:43

Just do integers. Destructive updates could be specified by passing an output parameter with the same address as an input parameter.
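
Something like this, say (hypothetical function, just to show the convention, not from any existing library):

#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian limbs; limbs are written in the same order
 * they are read, so passing r == a (or r == b) gives the destructive
 * "x <- x + y" form with no extra code path. Sketch only. */
static uint64_t big_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        carry = s < carry;
        r[i] = s + b[i];
        carry += r[i] < b[i];
    }
    return carry;
}

/* big_add(sum, x, y, n);   non-destructive
 * big_add(x, x, y, n);     destructive, x <- x + y */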

Name: Anonymous 2015-11-18 2:35

>>10
Apple CEO. He's a poof.

Name: Anonymous 2015-11-18 3:04

>>12
This might limit some algorithms though.

Name: Anonymous 2015-11-18 6:25

>>8
Having your dynamically defined compiler generate the proper assembly is far better than getting your hands dirty with that horrible C shit.

Name: Anonymous 2015-11-18 12:53

>>2
i got a bit of code for that
it can do the first 5,000,000 ints in 13 sec running JIT javascript on my smartphone

Name: Anonymous 2015-11-19 2:25

>>2
See PARI/GP

Name: Anonymous 2015-11-19 3:38

>>15
I don't know of an existing C compiler that will generate optimal assembly for bignums without using intrinsics. Implementing semantic rules for that would be extremely complex, and may cause performance problems for non-bignum code.

You're free to submit patches to GCC or Clang/LLVM. But from a practical standpoint, one must work with existing compiler technology. And so you either hand roll the assembly yourself or use intrinsics.

Name: suigin 2015-11-19 4:01

>>1
What to do about memory management? e.g. if you want to add two numbers do we need to allocate a new 'number' as long as the largest to write the result into or do it destructively "x <- x + y"? Maybe the library should support both - then a calculator program would figure out the best primitives to use for a given computation.

Been thinking about this. I think the simplest thing would be to consider memory management as outside of the scope of the library. The set of functions implemented by the library would be low-level and require that the buffers passed in be suitably large, otherwise overflow and truncation occur.

Memory management would be left up to the programmer using the library. This would be sufficient for implementing bc(1) or dc(1) for sbase, which uses fixed size buffers and precision that can be adjusted by the user.

If people want a set of bignum functions that manages memory, this could be done by layering a higher-level library on top of the lower-level one.
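
Roughly this kind of split, in other words (all names made up, just sketching the layering):

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Low level: kernels take raw limb pointers, never allocate, and trust the
 * caller to have sized the buffers from documented upper bounds. A thin
 * optional layer on top owns the memory for callers who want convenience;
 * a bc(1)/dc(1) with a fixed precision cap can skip the layer entirely and
 * use plain arrays. Sketch only, not an existing API. */
typedef struct {
    uint64_t *limbs;
    size_t cap;     /* limbs allocated */
    size_t len;     /* limbs in use    */
} Big;

static int big_reserve(Big *x, size_t limbs)
{
    if (x->cap >= limbs)
        return 0;
    uint64_t *p = realloc(x->limbs, limbs * sizeof *p);
    if (!p)
        return -1;
    x->limbs = p;
    x->cap = limbs;
    return 0;
}

static void big_release(Big *x)
{
    free(x->limbs);
    x->limbs = NULL;
    x->cap = x->len = 0;
}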

Name: Anonymous 2015-11-19 4:29

Karatsuba requires temporary storage for the sums.

That would make for a very unfriendly API if there is no library-side memory management.

Name: Anonymous 2015-11-19 5:01

>>19
That sorta requires some way to retrieve the required size of a calculation and its result in order to allocate the right amount of memory, though.

Name: suigin 2015-11-19 5:31

>>21
There's usually an upper-bound for the size of the result that can be determined ahead of time based on the size of the inputs.

If, on the other hand, you try to implement it so that you always allocate just as much memory as you need, your bignum operations become far more complex, as they need to detect when they run out of space, allocate more memory, then restart where they left off. It also becomes more CPU cache intensive when done this way.

From both a complexity and performance standpoint, for most operations, I think it would be better if you allocate everything up front, and then do the operation in one shot. Memory capacity is cheap.

>>20
There are also upper-bounds for the amount of temporary storage you need as well, and the library can be designed so that you have to pass in pointers to temporary buffers. A higher-level library with an easier-to-use API can still be layered on top, which handles all of the memory management.
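
The bounds themselves are trivial one-liners, which is what makes this workable (sketch, names made up, sizes in limbs):

#include <stddef.h>

/* result can never be larger than this, so allocate it up front */
static size_t bound_add(size_t an, size_t bn) { return (an > bn ? an : bn) + 1; }
static size_t bound_mul(size_t an, size_t bn) { return an + bn; }

/* temporary storage for karatsuba can be bounded the same way; the exact
 * figure depends on how the recursion folds its temporaries, this is just
 * a deliberately generous placeholder */
static size_t bound_mul_scratch(size_t an, size_t bn) { return 2 * (an + bn); }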

Name: Anonymous 2015-11-19 5:50

>>18
That's why I said C is horrible shit for this. Get out of that fucking abomination of a language.

Its notions of expressions and evaluation are so far up its own ass it's impossible to fundamentally change anything in the language. It's fast for one class of activity, but is shit slow if you try to break out of it.

Name: Anonymous 2015-11-19 6:32

>>23
That's why I said C is horrible shit for this. Get out of that fucking abomination of a language.

So we should implement a ``suckless'' bignum library in LISP? But anon, LISP already has bignums. What's the point?

Or you meant Rust or Go? Well, both Rust and Go compilers have the same problem as C in this regard.

Besides, C compiler intrinsics are no different than LISP forms that are builtin to the LISP compiler or interpreter, instead of implemented with macros on top of more elementary forms. It's the same idea. In order to get an optimal mapping from a form to machine instructions, the LISP implementer often has to do the job himself, or use some form of brute-force search or genetic algorithm combined with runtime profiling to arrive at the optimal solution.

Name: Anonymous 2015-11-19 8:20

>>24
So we should implement a ``suckless'' bignum library in LISP? But anon, LISP already has bignums. What's the point?

If you feel like creating a better bignum implementation, what better place to play with it than a language which supports bignums? Why dick around with calling library functions instead of directly using immediate built-in syntax?

Besides, C compiler intrinsics are no different than LISP forms that are builtin to the LISP compiler or interpreter, instead of implemented with macros on top of more elementary forms.

Lol, do you even Lisp, bro?

You don't do this with regular macros, you do it with compiler macros and defining new transforms in the intermediate compiler definitions. You can add CPU-specific AVX instructions or whatever as a general toolkit for the compiler to use without having to code entire routines in asm. Lisp lets you hone Lisp to your problem, instead of fighting against the C compiler's fixed assumptions.

In order to get an optimal mapping from a form to machine instructions, the LISP implementer often has to do the job himself, or use some form of brute-force search or genetic algorithm combined with runtime profiling to arrive at the optimal solution.

Bullshit. Type inference carries a long way, and the language passes a higher level of abstraction to the compiler for optimization than C, with a billion times less undefined behavior keeping it cleaner. And if you're talking about high-level human optimization of algorithms and style, that's the same in every single language.

Name: Anonymous 2015-11-19 8:36

I remember reading about a language that let you specify optimizations. You could just specify a rule add(negate(X), Y) -> sub(Y, X) and the optimizer would do that transformation at the start of compilation.

Name: Anonymous 2015-11-19 9:01

>>26
You're describing pretty much every mature symbolic language.

Name: Anonymous 2015-11-19 13:19

>>27
Is javascript mature yet?

Name: Anonymous 2015-11-19 16:17

    -=つ=つ
  =⊃∧_∧≡⊃ミヽ☆_∧ ∩
. -つ <# `Д´>=つ `☆ ))∀`)//
   =つ   つ≡ ☆*☆ググル /

Name: Anonymous 2015-11-19 16:31

I AM ERROR...

Name: Anonymous 2015-11-19 17:04

>>20
this is an important problem

>>23
you are not competent to be involved in this thread.

Name: Anonymous 2015-11-19 17:31

>>25
You can add CPU-specific AVX instructions or whatever as a general toolkit for the compiler to use without having to code entire routines in asm. Lisp lets you hone Lisp to your problem, instead of fighting against the C compiler's fixed assumptions.

C has this too. It's called ``inline assembly''. Your inability to see the parallels between these C constructs and LISP makes me want to shake my head. Sure, LISP perhaps does it better, but since ``suckless'' more or less sticks to POSIX standards and the UNIX way, the project, by definition, is best implemented in C.

Bullshit. Type inference carries a long way, and the language passes a higher level of abstraction to the compiler for optimization than C, with a billion times less undefined behavior keeping it cleaner. And if you're talking about high-level human optimization of algorithms and style, that's the same in every single language.

There isn't always a one to one mapping from a form to a sequence of instructions, especially on modern RISC and CISC CPUs with vector instruction sets, out-of-order execution, multi-level instruction and data memory caches, etc.

Finding the optimal sequence of instructions can be a non-deterministic polynomial time operation, and it becomes very expensive as the size of a sequence grows. Compilers take a lot of short-cuts in their code generators.

Name: Anonymous 2015-11-19 19:21

Name: Anonymous 2015-11-20 4:17

>>32
C has this too. It's called ``inline assembly''.

You need some reading comprehension. Fuck inline assembly, that's what I said you can escape by leaving C. Writing assembly is WRONG. Telling the compiler how to use new intrinsics to make your existing code use CPU specifics is right. You do NOT end up including blocks of asm code, but write your inner algorithms in plain Lisp.

There isn't always a one to one mapping from a form to a sequence of instructions blah blah blah

None of that is language specific.

However, re "Compilers take a lot of short-cuts in their code generators", the C compiler is very constrained because of all the undefined behaviors in the spec that prevent very obvious optimizations. A better-specified language can take advantage of much more.

Name: Anonymous 2015-11-20 7:21

You're fucking welcome.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* type-generic swap; an XOR trick doesn't work on pointer lvalues */
#define swap(__a, __b) { __typeof__(__a) __t = (__a); (__a) = (__b); (__b) = __t; }

/* drop the leading character of a decimal string, returning a fresh copy */
char* shift_left(char* src){
    int len = strlen(src);
    char* moved = calloc(len, sizeof(char));
    memcpy(moved, src + 1, len - 1);
    return moved;
}

/* add two non-negative ASCII decimal strings, returning a newly allocated string */
char* add(char* addend, char* addend2){
    int len = strlen(addend), a2len = strlen(addend2);

    if (len < a2len){
        swap(addend, addend2);
        swap(len, a2len);
    }
    /* right-align the shorter addend, padded with leading '0' digits */
    char* a2resized = calloc(len + 1, sizeof(char));
    memset(a2resized, '0', len);
    memcpy(a2resized + len - a2len, addend2, a2len);

    int reslen = len + 1; /* room for a possible carry-out digit */
    char* result = calloc(reslen + 1, sizeof(char));
    memset(result, '0', reslen);
    int carry = 0;
    for (int i = 0; i < len; i++){ /* schoolbook addition, least significant digit first */
        int respos = reslen - i - 1;
        int a1val = addend[len - i - 1] - '0', a2val = a2resized[len - i - 1] - '0';
        int val = a1val + a2val + carry;
        carry = 0;
        if (val >= 10){
            result[respos] = (val - 10) + '0';
            carry = 1;
        }
        else{
            result[respos] = val + '0';
        }
    }
    free(a2resized);
    if (carry == 1){
        result[0] = '1';
    }else{
        /* no carry out: drop the leading '0' placeholder */
        char* shifted_result = shift_left(result);
        free(result);
        result = shifted_result;
    }
    return result;
}

int main(int argc, char *argv[]){
    char* a = "99999999999999999999";
    char* b = "99999999999999999999";
    char* c;
    char* d = "9999999999999990";
    c = add(a, b);
    printf("%s + %s = %s\n", a, b, c);
    char* res = add(d, c);
    printf("%s + %s = %s\n", d, c, res);
    free(c);
    free(res);
    return 0;
}

Name: Anonymous 2015-11-20 13:16

>>35
Why no discrete logarithm?

Name: Anonymous 2015-11-20 13:53

>>35
pleb tier swap macro in C
not using the glorious xchg opcode in asm

disdain for plebs

Name: Anonymous 2015-11-20 14:04

>>37
Fix it. Post the better version.
I see >>35-san's work, but I don't see yours.

Name: Anonymous 2015-11-20 14:21

Let's

no

Name: Anonymous 2015-11-20 15:09

>>39
stay out of the thread then, consumer

Name: Anonymous 2015-11-20 16:39

op, if you're serious about this, do:

* modular arith only (ie, crypto only). for fast infinite precision, you really need hairy ball of code like GMP, no way to be suckless
* shift-and-add multiplier, then you get modulo for free
* Why not schoolbook & Barrett reduction? Sure it works better on superscalar CPUs with multiple ALUs, but it *will* suck horribly on a CPU without a hardware multiplier (which is everything really low power these days)
* basics, ie modular add,sub,mul,pow are ~500loc. prime tester (miller-rabin) 100 loc. other bells and whistles (ie actual PKCS+RSA, DHE, maybe even ECDHE) 1kloc. 2kloc all.

What to do about memory management?

The golden rule of suckless C - just forget dynamic memory. Dynamically allocated memory was a mistake. K&R were right. You free memory through return or exit(). Just cap your bignum type to some hardcoded limb count. Sacrifice a bit of stack depth, and no need for silly realloc() every time you add two numbers. Not to mention then you don't even need libc and can run baremetal easily.

As an example of a bignum in this style, here is a 60 loc RSA signature verifier for a (very) ROM space constrained embedded MCU bootloader:

#include <stdint.h>
#include <string.h>

/* RSA_BITS is supplied by the build, e.g. -DRSA_BITS=2048 */
#define RSA_BYTES (RSA_BITS/8)
#define DIGIT uint32_t
#define DIGIT_BYTES (sizeof(DIGIT))
#define DIGIT_BITS (DIGIT_BYTES*8)
#define DIGIT_MAX ((DIGIT)(-1))
#define NDIGITS (RSA_BITS/DIGIT_BITS)

/* r = x + y over NDIGITS words, returns the carry out */
static int b_add(DIGIT * restrict r, const DIGIT *x, const DIGIT *y)
{
    DIGIT w, carry = 0;
    for (int i = 0; i < NDIGITS; i++) {
        if ((w = x[i] + carry) < carry)
            w = y[i];
        else
            carry = ((w += y[i]) < y[i]);
        r[i] = w;
    }
    return carry;
}

/* res = xsrc * y mod modulus via shift-and-add; 'mod' holds the negated
 * modulus, so the conditional reduction is just another add */
static int b_mulmod(DIGIT * restrict res, const DIGIT *xsrc, const DIGIT *y, const DIGIT *mod)
{
    DIGIT rbuf1[NDIGITS], rbuf2[NDIGITS], xbuf1[NDIGITS], xbuf2[NDIGITS];

    DIGIT *r1 = rbuf1;
    DIGIT *r2 = rbuf2;
    DIGIT *x1 = xbuf1;
    DIGIT *x2 = xbuf2;

    DIGIT *swp;

    memset(rbuf1, 0, sizeof rbuf1);
    memcpy(xbuf1, xsrc, sizeof xbuf1);

    for (int i = 0; i < NDIGITS; i++) {
        for (DIGIT bit = 1; bit; bit += bit) {
            if (y[i] & bit) {
                if (b_add(r2, r1, x1))
                    return -1;
                if (!b_add(r1, r2, mod)) {
                    swp = r1;
                    r1 = r2;
                    r2 = swp;
                }
            }
            if (b_add(x2, x1, x1))
                return -1;

            if (!b_add(x1, x2, mod)) {
                swp = x1;
                x1 = x2;
                x2 = swp;
            }
        }
    }
    memcpy(res, r1, sizeof rbuf1);
    return 0;
}

static int rsa_public(DIGIT *output, const DIGIT *input, const DIGIT *modulus)
{
    DIGIT buf2[NDIGITS];
    /* buf2 = buf^2 % modulus */
    if (b_mulmod(buf2, input, input, modulus))
        return -1;
    /* buf3 = buf^3 % modulus */
    return b_mulmod(output, buf2, input, modulus);
}


(modulus is stored negated, thus no need for b_sub)
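
In case it isn't obvious how the negated modulus is produced in the first place, it's just a two's-complement negation over the NDIGITS words - something like this (hypothetical helper, not part of the verifier above):

static void b_negate(DIGIT *r, const DIGIT *x)
{
    DIGIT w, carry = 1;
    for (int i = 0; i < NDIGITS; i++) {
        w = (DIGIT)~x[i] + carry;   /* 2^RSA_BITS - x, word by word */
        carry = (w < carry);
        r[i] = w;
    }
}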

Name: Anonymous 2015-11-20 17:12

>>41
this is really nice and I appreciate it.

One question: what about nice code for modular arithmetic like this, but then also another bit of code for infinite precision - does it really have to be hairy? If we just pick one algorithm that works reasonably well it should be decent enough and not suck?

Name: Anonymous 2015-11-20 18:05

>>41
for fast infinite precision, you really need hairy ball of code like GMP, no way to be suckless

Prove it.

Name: Anonymous 2015-11-20 18:18

>>1-43
suck less hairy balls

Name: Anonymous 2015-11-20 18:43

Suckless is the Amish of programming.

Name: Anonymous 2015-11-20 18:48

I'm a white hat. I have always been a white hat. Reverse engineering is for
inferior monkey engineers. Humiliation. A glorious engineer makes new
standards.

Name: Anonymous 2015-11-20 19:36

>>35
Make a multiplier next.

Name: Anonymous 2015-11-21 4:24

>>35
>>41
Why not just use libtommath? It's already recommended by suckless. The only problem is that, being pure C, it isn't as optimized as it could be if it were hand-coded assembly.

It's right near the top as an alternative to GMP: http://suckless.org/sucks/

Name: Anonymous 2015-11-21 4:40

>>37
I love these quotes! But what post are they from?

Name: sage 2015-11-21 16:50

>>49
hi! i think you are a bit confused, you're using memes from 4chan's /jp/ board, but this isn't actually /jp/ at all! it's actually on a completely different site, called progrider (not related to 4chan.org). the board you actually meant to post on is at https://boards.4chan.org/jp/ . i'm glad we could clear up this confusion so that you can get back to posting on the right board. well, that's about all, enjoy being on your correct board, and please do try to be a bit more careful in the future. you're welcome!

Name: Anonymous 2015-11-21 17:17

>>48
Indeed libtom is very close to what >>42 is suggesting - a good compromise in not being too slow for somewhat general use, but still far from a good choice for heavy (ie non-crypto) number crunching. That's why it might be wiser to strive for some absolute reductionism (e.g. specialize to embedded and crypto, and then you can do a lot under 2kloc without being too slow) and not directly compete with tommath.

The problem here is that general fast infinite precision is simply a collection of dozens of algorithms (libtom is far beyond the 2kloc suckless limit :), and the more you specialize for various cases (including heuristics to auto-choose the method for a given task, for easier use), the "faster" the lib is overall. The disadvantage is that eventually you end up with a kitchen sink like GMP, NTT or ARPREC.

>>43
Engineering is empirical - hearsay and superstition. You can find rigorous proofs of the complexity of individual algorithms, but not for bignum in general, as that depends on which number-theoretic tricks you actually decide to implement.

>>48
Assembly
Ah, just drop it. Hand-written asm (of which there is not an immense amount in GMP, just inner loop kernels) can squeeze out, say, 10-20% more. But that makes sense only after you've exhausted the algorithmic optimizations, which can often be much faster. For example NTT or ARPREC can still be a magnitude faster than GMP in some cases (say, non-modular multiplication), despite being pure C++.

I think that tommath indeed is pretty close to GMP in terms of implemented tricks, but lacks ntt, and asm kernels IIRC.

Name: Anonymous 2015-11-21 19:00

>>51
Really strong points. Gotta rethink the scope of this somewhat to be able to pull it off.

regarding assembly: But I don't want to have to do some stupid thing that relies on the compiler pattern-matching against it to figure out I meant the carry bit. I just want to say 'check the carry bit'.

Name: Anonymous 2015-11-21 20:31

>>50
Hi! think you are a bit confused, you're using sage as the 4chan imageboard does, but this isn't actually 4chan at all! it's actually on a completely different site, called progrider (not related to 4chan.org). the board you actually meant to post on is at https://boards.4chan.org/g/ . i'm glad we could clear up this confusion so that you can get back to posting on the right board. well, that's about all, enjoy being on your correct board, and please do try to be a bit more careful in the future. you're welcome!

Name: Anonymous 2015-11-21 22:54

>>1,35,41,48,51,52
https://github.com/suiginsoft/hebimath

This was how I spent my Saturday morning. Not that I'm taking ownership of this or anything, baka, but you're free to use it as a starting point.

It supports only x86_64 currently. Run make to make the library and test programs. Run make test to execute the test programs in the test dir. However, there's only one right now, which is a microbenchmark profiling the addition function.

>>52
Ah, just drop it. Hand written asm (of which there is not immense amount in GMP, just inner loop kernels) can squeeze say, 10-20% more.

$ make test
test/profile_add
>>35-san add: 7.24666365 seconds
Hebimath add: 0.22406670 seconds

Name: Anonymous 2015-11-21 23:00

>>54
We love you!

Name: Anonymous 2015-11-21 23:11

>>54
are you the avatarfag on /tech/?

Name: Anonymous 2015-11-21 23:14

>>54
I like the code, nice one

Name: Anonymous 2015-11-21 23:50

>>56
The what on what?

Name: Anonymous 2015-11-22 0:06

>>55,57
Thanks!

>>56
No. I sometimes post on /tech/, but always anonymously and not with smugface anime meme images.

Name: suigin 2015-11-22 0:37

I made a mistake in test/profile_add.c. I wasn't computing the right number of base 10 digits to generate for >>35-san's BCD adder. Using the correct formula, it turns out the optimized assembly version is nearly 100 times faster. I've submitted the changes to github.

$ make test
test/profile_add
>>35-san add: 22.01722459 seconds
Hebimath add: 0.22316967 seconds

Name: Anonymous 2015-11-22 9:14

What voodoo do you need to compile these down to vector operations using SSE/AVX/etc? Will the Intel compiler pull it off? Because shuffling all these numbers through normal CPU registers is loldumb.

Name: Anonymous 2015-11-22 13:56

suigin, why don't you announce your suckless projects (sstoy, hebimath, etc) to other suckless devs on irc or mailing list?
That way they can put it on their git and give more visibility (and development help) on the projects.
Sorry for the off-topic, just wanted to give out this idea. They have git.suckless.org - git.2f30.org - git.r-36.net

Name: Anonymous 2015-11-22 13:58

git sucks, use hg

Name: Anonymous 2015-11-22 14:42

>>41
The golden rule of suckless C - just forget dynamic memory.

Nice to see Forth finally getting some representation in /prog/. Shame about trying to do it in C, though.

Name: Anonymous 2015-11-22 14:43

>>63
suckless used hg, but they migrated to git on the end on 2012.
there was a long discussion on the mailing list that started here http://lists.suckless.org/dev/1211/13601.html
read the rest to see pros/cons.
anyway, this is the suckless bignum library thread, not the versioning system war thread.

Name: Anonymous 2015-11-22 15:04

ok fuck off about version control.
make a separate thread or something.
this is about BIG NUM

Name: Anonymous 2015-11-22 15:05

BIG NUM

Name: suigin 2015-11-22 20:26

>>61
SSE/AVX/NEON don't support addition or multiplication with carries. You have to compute the carry manually. You're better off using the scalar multiplier. Intel has their ADX extensions, but those have nothing to do with SIMD.

You can still use vector instructions for other things like bitwise and, or, xor, etc.

GCC and Clang are pretty good at automatically vectorizing code these days, so long as you configure the target architecture correctly and align your memory to the SIMD register boundary size.

>>62
Because I'm not a suckless developer. One must first prove himself before asking for a git account and sanctioning of one's own projects.

Name: suigin 2015-11-22 21:03

So, I've been thinking some more. Writing these low-level bignum kernels gets you unsigned arithmetic, but not signed arithmetic.

It's possible to augment the adder to do sign-extension during the propagation phase so you can stick with two's complement notation, but you're still in trouble when it comes to multiplication. You have to keep track of whether an operand is negative or not, and negate the subdivided components during reduction, which requires new code paths, or negate the entire operand up-front and recompute the sign at the end. I can see why most bignum libraries store the sign externally, and use an unsigned internal representation.

So then we need a data structure to store the sign in, and once you do that, it also starts to make sense to include the size of the operand in the struct, and as a result, it naturally makes sense to handle memory management in the library. I'm not sure if there's a cleaner way to do it.

We can still do a better job than libtommath.

A compact bignum type on a 64-bit platform might look like this:

typedef struct {
    uint64_t *data;
    size_t capacity;
    size_t sign:1;  /* using bit-fields for brevity, but probably     */
    size_t size:63; /* want to do the mask and shifting manually as   */
} Bignum;           /* the standard only supports bitfields for 'int' */


However, for bignums smaller than 128-bits, it's possible to store the data directly in the bignum type to avoid memory allocations.

typedef struct {
    union {
        struct {
            uint64_t *data;
            size_t capacity;
        } big;
        uint64_t small[2];
    } num;
    size_t sign:1;
    size_t isbig:1; /* 0 = number stored in num.small, 1 = allocated externally */
    size_t size:62;
} Bignum;


But there's something else to consider. What if we were to make the bignum kernels only deal with 256-bit aligned chunks (or 128-bit on 32-bit platforms)? Our compute kernels would shrink in half, would be easier to understand, and would be more friendly on CPU instruction caches, as we don't have to handle cases where we have left-over words that aren't a multiple of 256 bits. CPUs would literally scream running our code.

But the smallest possible bignum would be 256-bits in size, and so our small-number memory allocation trick will no longer work unless we artificially pad the struct. This may or may not be a bad idea.

It really boils down to the usage patterns. Is the user computing lots of large numbers bigger than 256-bits in size? Or are the majority say 128-bits or less?

What's the best way to weigh these trade-offs? What do you prefer?

Name: Anonymous 2015-11-22 21:46

>>63
hg is bloated crap

Name: >>35 2015-11-22 23:22

>>60
ZOMG, optimized! This little endian (maybe big endian, I always get those mixed up), decimal adder is a little over 15% faster than my last version. This time, when you run the benchmark, do it on ARMv9, it really brings out the advantages of my approach.
#define swap(__a,__b) { __typeof__(__a) __t = __a; __a = __b; __b = __t; }
/* This is little endian, 99959 is encoded as "95999" */
void little_endian_add(char* addend, char* addend2, char* result){
    char* a = addend, *b = addend2;
    int a1len = strlen(addend), a2len = strlen(addend2);
    if(a1len < a2len) {
        swap(a, b);
        swap(a1len, a2len);
    }
    int reslen = a1len + 1, carry = 0, i, immval;

    for (i = 0; i < reslen - 1; i++){
        immval = a[i] + ((i < a2len)? b[i] : 48) + carry;
        carry = 0;
        if (immval >= 106){
            result[i] = immval - 58;
            carry = 1;
        }
        else{
            result[i] = immval - 48;
        }
    }
    result[reslen - 1] = (carry == 1) ? '1' : '\0';
    result[reslen] = '\0';

    return;
}

Name: suigin 2015-11-23 0:14

>>71
Thanks. I've added it to the profiling test.

I need to port the assembly routines to ARMv7 32-bit. Don't currently have a 64-bit ARM SoC, just a Cubieboard v3 and a Raspberry Pi 2.

Name: Anonymous 2015-11-23 2:38

Just pipe everything through bc — it's the Unix way.

Name: Anonymous 2015-11-23 10:52

>>52
I just want to say 'check the carry bit'

Like it or not, the compiler generally knows what it's doing. It also gives direct access to the carry - look at the disassembly of:

unsigned getsumcarry(unsigned *carry, unsigned a, unsigned b)
{
    unsigned v;
    *carry = (v = (a + b)) < a;
    return v;
}


It is compiled to:
18: 89 f0 mov %esi,%eax
1a: 01 d0 add %edx,%eax
1c: 0f 92 07 setb (%rdi)


But you're correct that carry and other flags are the kryptonite of modern cpus and 386-era superstitious assembly is often not helping. You better shove it into register asap and pray the cpu is clever enough to not stall the pipeline when it sees that. If you jump based on it, you fuck things up pretty much always. Why? You killed out of order execution with a data dependent branch.

If you want to go slightly faster, use radix 2^27..2^30 and delayed carries (ie "normalize" the number every now and then). In some cases (P4 Netburst) it can be like 2-5x faster than add/adc chains if the cpu is too stupid to short-circuit these pairs in its opcode folding pass of JIT.

Delayed carries are often used on moderately superscalar P4-like cpus which don't JIT, but have deep pipelines (these days mostly ARM), because anything depending on an ALU flag will introduce a hard pipeline stall there; the CPU is simply too dumb to avoid it.
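
To make the delayed-carry thing concrete, the shape is roughly this (sketch only, nothing from an existing library; radix 2^30 digits in 32-bit words, sums collected in 64-bit accumulators):

#include <stddef.h>
#include <stdint.h>

#define RADIX_BITS 30
#define RADIX_MASK ((1u << RADIX_BITS) - 1)

/* accumulate several additions with no carry chain at all */
static void acc_add(uint64_t *acc, const uint32_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += x[i];             /* independent adds, no flag dependency */
}

/* resolve all the carries in one pass when you're done accumulating */
static void normalize(uint32_t *r, uint64_t *acc, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t v = acc[i] + carry;
        r[i] = (uint32_t)(v & RADIX_MASK);
        carry = v >> RADIX_BITS;
        acc[i] = 0;
    }
    /* a non-zero final carry becomes a new top digit */
}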

tl;dr: 386 asm programmers are surefire way to trigger full autism like this rant

Name: Anonymous 2015-11-23 11:47

>>54
S-senpai, can you push your secret radix 2^(10/3) version? I'm having problems substituting >>35-san's code with yours, as your digits are, um, too big.

Name: Anonymous 2015-11-23 12:35

>>71
char* a = addend, *b = addend2;

Check out this bell-end whom don't even know C.

Name: Andreas Wagner 2015-11-23 14:49

Name: Anonymous 2015-11-23 14:51

>>75
There is no secret version, the size of the digits is fine, it makes maximal use of CPU resources.

What needs to be done instead is implement functions that convert character strings into the optimized internal representation, and vice versa.

Name: Anonymous 2015-11-24 12:44

>>76
who Are you le quoting xD

Name: suigin 2015-12-06 4:03

I've updated the hebimath library with a gmp/libtommath style integer API that handles memory management, and I've implemented runtime CPUID dispatching for the assembly functions. I wanted to make the memory management as flexible as possible, so you could easily override it with different allocators and do so in a thread-safe manner. If you don't need memory management, you can still use the low-level assembly kernels directly.

Currently working on adding multiplication and division by a small constant so I can implement hebi_int <-> base 10 string conversion, which will be needed for future unit tests.

Please take a look!

https://github.com/suiginsoft/hebimath/

Name: Anonymous 2015-12-06 4:22

>>80
Send that to the mailing list.

Name: Anonymous 2015-12-06 4:24

I'm surprised that with all the instruction set bloat accumulated over the years, Intel didn't add some bignum instructions somewhere along the way. Something like movsb that manages edi and esi on its own and just needs to be invoked in a loop.

Name: Anonymous 2015-12-06 4:55

>>80
You also post on /g/ and/or /tech/, am I wrong?

Name: Cudder !cXCudderUE 2015-12-06 13:31

CPUID dispatching for the assembly functions
Stupid idea. The CPU is not going to suddenly change at runtime. Just do it compile-time instead.

>>82
I've always wanted a REP ADCS and REP SBBS, perhaps REPNZ to make it stop early would be good too. No need for different-size versions since you just specify the size in bytes and it should figure out how much to work on at a time (like REP MOVS can copy entire cachelines in one cycle).

Name: Anonymous 2015-12-06 21:27

>>84
Stupid idea. The CPU is not going to suddenly change at runtime. Just do it compile-time instead.

Oh, good, so you're finally advocating JIT. I agree that that's the best way to bake the runtime code to be as fast as it should, wherever it's run.

Name: Anonymous 2015-12-07 2:00

that's the best way to bake the runtime code to be as fast as it should
You are stupid

Name: suigin 2015-12-07 6:19

>>82
They added the ADX instructions, but they're more useful when optimizing multiplication and division algorithms.

>>83
No. I haven't posted on 4chan since world4ch was shut down, and the last time I visited /tech/ was over a month ago.

>>84
Yes. I thought about just selecting the best version at build time, or allowing the user to configure it somehow, and if everyone who would be interested in using it in their software expected their own users to be compiling from source, this wouldn't be an issue. But that's not the world we live in today. If one is interested in displacing the competition when it comes to multi-precision arithmetic, that means implementing some form of runtime dispatching to handle all use-cases.

However, at this point, I don't think it would be much more effort to add in build time selection of architecture-specific kernels as an option. Just need to use some assembler directives and include a `config.inc' file that can be customized by the user which specifies what their processor supports. Runtime dispatching would still be the default.

From a performance perspective, once a dynamically dispatched function has been run at least once, there's pretty much no overhead as long as the memory containing the function address remains in the L1 cache, at least on x86-64. The additional mov instruction to get the function address in a register usually gets pipelined away.
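
The shape of the dispatch is nothing exotic, roughly this (simplified illustration, not the exact code; the two kernels are assumed to be defined elsewhere, e.g. in the .S files; x86 with GCC/Clang builtins):

#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*add_fn)(uint64_t *, const uint64_t *, const uint64_t *, size_t);

uint64_t add_generic(uint64_t *, const uint64_t *, const uint64_t *, size_t);
uint64_t add_bmi2(uint64_t *, const uint64_t *, const uint64_t *, size_t);

static uint64_t add_dispatch(uint64_t *, const uint64_t *, const uint64_t *, size_t);
static add_fn add_impl = add_dispatch;

/* first call probes the CPU and patches the pointer; later calls are one
 * load from (hopefully) L1 plus an indirect call */
static uint64_t add_dispatch(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    __builtin_cpu_init();
    add_impl = __builtin_cpu_supports("bmi2") ? add_bmi2 : add_generic;
    return add_impl(r, a, b, n);
}

uint64_t bignum_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    return add_impl(r, a, b, n);
}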

Name: Anonymous 2015-12-07 8:31

>>84
The CPU is not going to suddenly change at runtime. Just do it compile-time instead.

Not a great idea in the wild and wooly world of x86. Over 20 years of binary compatibility means you had better check what generation you are running on if you want to be sure you're executing anything near optimal. People are not going to put up with separate binary distributions for every supported architecture, especially when it just doesn't matter for 80% of their code. Just ship them all, pick at runtime and be happy.

Name: Anonymous 2015-12-07 9:59

>>88
Excellent Hitler dubs!

Name: Anonymous 2015-12-07 14:25

The era of x86 is OVER.

Name: Anonymous 2015-12-07 15:34

>>88
Stop shipping x86 binaries and only ship them for x64 then.

Name: suigin 2015-12-07 17:08

>>91
There isn't a single optimal solution for x86-64 either. SSE2 is the only guaranteed instruction set extension. Since then, SSE3, SSSE3, SSE4.1, SSE4.2, AESNI, FP16C, CLMUL, AVX, FMA, AVX2, BMI1, BMI2, ERMSB, ADX, SHA, and AVX512 have been introduced, among others. While not all are relevant to arbitrary-precision arithmetic, many are.

Sure, you could say cut out all CPUs older than 5 years and go with AVX as a baseline, but many people still use 7+ year old hardware. One of my i7 820 desktop systems is over 6 years old now, and only supports SSE 4.2.

Name: Anonymous 2015-12-07 18:01

>>88,91,92
That's why you use source-based distributions and boycott SEPPLES

Name: Anonymous 2015-12-07 19:23

>>91
Not good enough. >>92 has it right; even if you limit yourself to AMD64 compatible implementations (which is a bad idea in any case - 32 bit mode is ideal for a great many applications), you still have to differentiate K8/Netburst/Core/Bulldozer, determine whether you have support for SSE3/SSE4/AVX...

The performance profile of arithmetic heavy code changes with every architectural iteration (2-3 years, on the outside). This is why Intel caught flak in the past for not bothering to implement anything other than a generic slow path for AMD hardware in code emitted by their C compilers. Just passively failing to optimize more aggressively for specific AMD implementations resulted in a measurable performance difference.

>>93
Compiling from source isn't always worthwhile (if you consider the full breadth of users, it almost never is). If the critical path is too large to be inlined, static linking alone may be enough.

For maximum portability, the ideal would be to keep performance-critical code in a separate shared object for each target, and load the right one at runtime. This does require that application developers stay aware of whether their own code is on the critical path or not. For larger shared codebases (e.g., mplayer) it's more common to just give up and statically link if it shows a performance improvement.

Name: Anonymous 2015-12-07 20:43

-round and -notround resource qualifiers

API 23 makes it easier to build apps for both round and square Android Wear watches. We listened to your feedback and added new resource qualifiers for -round and -notround, so you can use the resource system to load the appropriate images, layouts, and strings based on the type of watch you are working with. You can also combine this with existing resource qualifiers -hdpi, -tvdpi, -280dpi, and -360dpi for the various Android Wear watches that are currently available. All of the existing classes in the wearable UI library, such as WatchViewStub, BoxInsetLayout, and WearableFrameLayout will continue to work as well, so you do not need to change your code. The -round and -notround resource qualifiers will not work on API 22 devices, so you cannot assume they will be available until all devices are on API 23.

Name: Newprogrammer 2015-12-07 20:46

This whole subject is one big advertisement for JIT. Don't generate and distribute a billion different precompiled combinations, tell the system how to generate an "optimal" version for whatever it's running on.

You're not talking about hand-optimized asm anyway, given the dozens of variations just listed here.

Name: Anonymous 2015-12-08 3:44

>>96
JIT for static code? B-but the 90s are over. AOT compilers (of portable code) still blow JITs out of the water. Even Android 5 is AOT, because dalvik sucked.

Name: suigin 2015-12-08 3:47

>>96
Right.

So consider an optimizing JIT capable of targeting numerous modern CPU architecture families and variants, with the ability to auto-vectorize code and generate machine instructions approaching equivalent hand-optimized code, and without requiring a garbage collector or a huge managed-code runtime.

There's only one existing project that I know of that meets that requirement: LLVM. But LLVM is quite a huge dependency, and not everyone likes polluting their system with something that's written in C++. Embedding LLVM as a JIT compiler is going to add 30MB or so to your executable size.

One thing I was thinking of, however, was writing reference implementations of the bignum kernels in LLVM IR or SPIR-V, then using the LLVM compiler to generate GNU assembly files for the different kernel variants. LLVM IR/SPIR-V have bignum and SIMD instructions, so this would get us perhaps 90% of the way there. A little sed/awk magic to glue it all together, and maybe a final pass of hand-optimizing the target assembly code where LLVM is lacking to take things toward perfection.

The final GNU assembler files would then be all that is required to build the library. The LLVM IR/SPIR-V is just there to aid in adding support for a new architecture, and LLVM isn't required as a dependency. You would still ship all of the kernel variants together in the library and do runtime dispatching based on what your CPU supports.

Name: Anonymous 2015-12-08 3:49

dubs

Name: suigin 2015-12-08 3:53

>>98
The other added benefit of writing it in SPIR-V is that it would serve as a starting point for GPU/FPGA-accelerated bignum computing with OpenCL and/or Vulkan, although you would likely want to transition over to using highly parallel algorithms.

Name: Anonymous 2015-12-08 5:31

>>98
LLVM-IR
Does not buy you anything compared to C. The IR is generally useful only if you generate it from some higher-level language, not write it by hand. And yes, C has portable SIMD intrinsics (mapping very closely to LLVM, because guess why).

Additionally, SIMD does not buy you that much in bignum most of the time, except in some very specific cases (FFT).

>>100
SPIR-V
With SPIR-V, it's the same exact thing. Just write your damn kernels in OpenCL.

As for FPGA - fuck no, you really want to write VHDL for that shit.

Name: Anonymous 2015-12-08 8:53

>>82
How's that going to work with multiplication, smart guy?

Name: Cudder !cXCudderUE 2015-12-08 10:12

From a performance perspective, once a dynamically dispatched function has been run at least once, there's pretty much no overhead as long as the memory containing the function address remains in the L1 cache, at least on x86-64. The additional mov instruction to get the function address in a register usually gets pipelined away.

If you really want to include all the variants -- of which no one will ever use more than one (FUCK YOU FOR MAKING ME DOWNLOAD THIS USELESS BLOAT) -- at least copy the code directly into place so you don't even need to do that unnecessary work on every call.

It's as ridiculous as apps which somehow include 10MB of strings in over a dozen different languages. No one is ever going to use it in more than 1 language (probably English), you're just wasting everyone's disk space and bandwidth (including that of your server).

Name: Anonymous 2015-12-08 11:25

>>101
Yes, you can do that. The idea of using LLVM IR or SPIR-V is that the multi-precision arithmetic instructions, vector types, and loop unrolling/vectorization attributes are part of the spec. Doing the equivalent in C would mean using compiler-specific extensions. I'd only bother with supporting GCC and Clang, which isn't a problem for most people, but might be for some.

>>102
I have multiplication and division working where one operand is a single word-sized value. Can do roundtrip conversions between strings and bignums now.

https://github.com/suiginsoft/hebimath/blob/master/src/packet/x86_64/mulp_u.S
https://github.com/suiginsoft/hebimath/blob/master/test/unit/get_set.c

I haven't benchmarked anything; there's a lot of room for optimization in how the conversions are done. For example, using shifting for power-of-two bases, and fixed-point multiplication for the division.

I will try to get to Karatsuba by this weekend (I'm working during the days).

>>103
If you're going to be compiling from source, you'll be able to spend the 5 minutes configuring it to strip out the unneeded kernel variants.

Name: suigin 2015-12-08 11:55

>>102
Tired, didn't realize you were talking about something else entirely.

Time to retire to a higher plane.

Name: Anonymous 2015-12-08 18:06

llvm is never the solution to anything "suckless"

Name: Anonymous 2015-12-08 18:10

>>86
x86 GET

Name: Anonymous 2015-12-08 19:43

>>104
Checked the code, and it has a very strong data dependency chain (disregarding the loop, which can be fixed fairly easily). It would be fine on a 486, which does not execute 30+ ALU operations in one tick (if you break data deps) like modern CPUs do. Is there any reason not to write it in C?

Try it. You'll see the overhead of C is minuscule, because your bottleneck isn't the slight C overhead, but the huge latency introduced by carries.

Delayed carry, together with FFT, would of course blow your implementation out of the water by several orders of magnitude.

Note that heavy-duty kernels (ie not naive schoolbook) in GMP et al are written in asm to improve register scheduling, because the kernel is _large_ (FFT, toom-3, montgomery...) with considerable register pressure. While compilers are often better than humans wrt instruction scheduling (especially ICC), they often do a poor job of figuring out the spill schedule inside a hot loop. So the way it's usually done is: let the compiler produce the code, and then a human hacks it in trial-and-error fashion to squeeze a few more ticks out of it.

Name: Anonymous 2015-12-08 20:05

>>103
I miss the real Cudder.

If you really want to include all the variants -- of which no one will ever use more than 1 of (FUCK YOU FOR MAKING ME DOWNLOAD THIS USELESS BLOAT) --- at least copy the code directly into place so you don't even need to do that unnecessary work on every call.

If it's already in L1, it's not extra work. The only extra work involved would be if someone actually followed your pointless suggestion to write self modifying code where there's no benefit to doing so.

It's as ridiculous as apps which somehow include 10MB of strings in over a dozen different languages. No one is ever going to use it in more than 1 language (probably English), you're just wasting everyone's disk space and bandwidth (including that of your server).

No, you're wasting everyone's time by forcing them to make sure they pick the correct locale at install time and never change it afterward. i18n string tables are fucking tiny compared to the available space on most systems. If you really care so much, the build process for the app will likely permit you to strip out the ones you don't need. Unless the app developer actually makes this difficult there is no point in complaining.

Name: Anonymous 2015-12-09 9:31

>>109
me 2
Cudder is my waifu

Name: Anonymous 2015-12-09 10:03

trips

Name: Anonymous 2015-12-09 10:18

>>111
Sweet!

Name: Anonymous 2015-12-10 3:17

>>1
FUCK OFF YOUR ANIME SUCKS, YOUR HARDWARE SUCKS, GO FUCK CUDDER YOU SHRIMP

Name: Anonymous 2015-12-10 6:15

What would LAC do in a situation like this?

Name: Anonymous 2015-12-10 7:48

>>114
Probably call us a bunch of stack boys.

Name: Cudder !cXCudderUE 2015-12-10 11:13

If it's already in L1, it's not extra work
You forgot the fact that L1 is shared by all the code running on the CPU, and if your code is taking up some portion of it, it got there by forcing some other code out. Not realising this is why microbenchmarking is stupid. Faster in this small part makes everything else slower.

No, you're wasting everyone's time by forcing them to make sure they pick the correct locale at install time and never change it afterward.
Waiting a lot longer (it adds up really quickly if everyone does this) for each download, along with extracting and installing shit I don't need and probably never will, or an extra second or two of nearly no thought?

l18n string tables are fucking tiny compared to the available space on most systems
"The amount of chemicals we dump into the ocean is fucking tiny compared to the amount of water in there". That didn't turn out so well, did it? It's not just YOU who will be doing it, if you're saying everyone should be.

http://i.stack.imgur.com/yDd8C.png

Look at the size of all the Windows language packs. That's not even all of them and it's 1GB+ already. Do you want to wait for another hour or two, paying for the power and bandwidth, downloading data which is absolutely useless to you?

Name: suigin 2015-12-11 11:49

>>116
Are you happy? It's not fully working yet, but it will be soon. Baka.

https://github.com/suiginsoft/hebimath/blob/master/config.mk

As a consequence, I now generate hebimath.h from a simple template processed by an awk one-liner.

Name: suigin 2015-12-13 13:53

>>108
Thanks. I now have a generic version in C. It's still over twice as slow as the assembly optimized version.

It's hard to get it to generate good code without venturing outside of the standard. The largest guaranteed integer type is of course unsigned long long (or uintmax_t, which is a typedef for unsigned long long on most platforms). So on i386 or x86-64 for example, there's no standard way to capture the high 64-bits of a 64x64->128 multiply. You are stuck with doing 32x32->64 bit multiplications in C, on these platforms.
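
For the record, the portable fallback builds the 64x64 -> 128 product out of four 32x32 -> 64 partial products, roughly like this (sketch; GCC/Clang's unsigned __int128 avoids it, but that's outside the standard):

#include <stdint.h>

/* *hi:*lo = a * b, using only standard 64-bit arithmetic */
static void mul64wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = a & 0xffffffffu, a1 = a >> 32;
    uint64_t b0 = b & 0xffffffffu, b1 = b >> 32;

    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    uint64_t mid  = p01 + (p00 >> 32);          /* cannot overflow 64 bits */
    uint64_t mid2 = p10 + (mid & 0xffffffffu);  /* cannot overflow 64 bits */

    *lo = (mid2 << 32) | (p00 & 0xffffffffu);
    *hi = p11 + (mid >> 32) + (mid2 >> 32);
}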

Name: Anonymous 2015-12-13 17:31

>>118
long long long cat;

Name: suigin 2015-12-14 13:05

>>108,118
Was going to do a backend specifically for Clang using its language extensions, but it turns out it generates suboptimal code. Instead of keeping track of the carry flag in eflags, it uses sbb to move the carry flag into a gpr. It performs no better than the version written in standard C.

I'm surprised they still can't generate optimal code for arbitrary-precision arithmetic, even with intrinsics, after all these years.

Name: suigin 2015-12-16 10:42

Well, well, well...

void
hebi_mulp_karatsuba(
    hebi_packet *restrict r,
    hebi_packet *restrict w,
    const hebi_packet *a,
    const hebi_packet *b,
    size_t an,
    size_t bn)
{
    size_t k, l, m, n;

    if (an <= 4 || bn <= 1) {
        hebi_mulp(r, a, b, an, bn);
        return;
    }

    l = an + bn + 1;
    m = (an + 1) / 2;
    n = bn < m ? bn : m;

    hebi_setp_u(w+m, hebi_addp(w, a, a+m, m, an-m));
    hebi_setp_u(w+m+n+1, hebi_addp(w+m+1, b, b+n, n, bn-n));
    hebi_mulp_karatsuba(r+m, w+m+n+2, w, w+m+1, m+1, n+1);

    k = an-m + bn-n;
    hebi_zerop(w, k+1);
    hebi_mulp_karatsuba(w, w+k+1, a+m, b+n, an-m, bn-n);
    hebi_addp(r+m+n, r+m+n, w, l-m-n, k);
    hebi_subp(r+m, r+m, w, l-m, k);

    k = m + n;
    hebi_zerop(w, k+1);
    hebi_mulp_karatsuba(w, w+k+1, a, b, m, n);
    hebi_addp(r, r, w, l, k);
    hebi_subp(r+m, r+m, w, l-m, k);
}


Assumes an >= bn >= ceil(an/2). All temporary/work memory is allocated up-front using a discrete upper-bound function to compute the space required and is passed in through the 'w' pointer. Should be a lot faster than libtommath, which performs multiple allocations/deallocations at each recursive level.
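
For anyone wondering what such an upper-bound function might look like: one deliberately generous way to bound the scratch (not what the library actually does, just to give the idea) is to walk the same m/n split the recursion makes and charge each level for its folded sums plus a zeroed sub-product buffer:

#include <stddef.h>

static size_t karatsuba_scratch_bound(size_t an, size_t bn)
{
    size_t total = 0;
    while (an > 4 && bn > 1) {          /* same cutoff as the routine above */
        size_t m = (an + 1) / 2;
        size_t n = bn < m ? bn : m;
        total += 2 * (m + n + 2);       /* folded sums + zeroed product buffer */
        an = m + 1;                     /* sizes of the dominant recursive call */
        bn = n + 1;
    }
    return total;
}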

Name: Anonymous 2015-12-16 11:05

dubs

Name: Anonymous 2015-12-17 2:12

Optimized row-order long-multiplication for x86-64.

https://github.com/suiginsoft/hebimath/blob/master/src/kern/x86_64/mulp.S

Some micro-benchmarks on my AMD A10-7850K, multiplying 2000 iterations of 16384 byte x 8192 byte -> 24576 byte length integers:
Standard C long-multiplication kernel: 22.61794163 seconds
x86-64 assembly long-multiplication kernel: 6.17667357 seconds
Karatsuba w/ Standard C long-multiplication kernel: 8.23420589 seconds
Karatsuba w/ x86-64 assembly long-multiplication kernel: 2.34662074 seconds

Name: suigin 2015-12-17 2:25

Also, my software is no longer s̶u̶c̶k̶l̶e̶s̶s̶communist software. I hereby christen my software as your choice of traditionalist software, heroic software, spartan software, Spenglarian software, or reactionary software. Take your pick.

Don't get me wrong, the suckless people write nice software. And I'll continue to use it. It's a shame their software sentiments do not extend to real-life.

On one hand, they favor small, decentralized software programs. On the other, they advocate for collectivization, centralization, and increasing power of the state, in order to ``apparently'' eliminate it.

Does not compute.

Name: Anonymous 2015-12-17 2:38

the suckless people write nice software
lolno http://bbs.progrider.org/prog/read/1428010887

Name: suigin 2015-12-17 5:09

>>125
I said nice software, not perfect. I too was aghast when they started ``refactoring'' one-line fragments into their own functions for the sake of ``readability.'' I also don't like how their ls(1) defaults to column output with -l. Whoever the fuck did that should have just used a shell alias to scratch their own itch.

So what to do, revive anon coreutils? Fork sbase? Easy. At that point, the problem isn't so much the software, but creating and nurturing a viable community. Therein lies the real undertaking.

Name: Anonymous 2015-12-17 6:19

>>122
Fuck your dubs.

Check these bits!

Name: Anonymous 2015-12-17 6:38

>>127
Oh yeah, well check my bits.

Name: suigin 2015-12-17 11:18

>>116
Alright, compile-time function multi-versioning is now fully working for x86-64. It strips out every variant you don't want, according to your `config.inc' file, and calls the kernel functions directly by address.

See config.mk on how to configure it for static mode, and run $ make config.inc to generate your `config.inc' file for editing.

Dynamic run-time multi-versioning still works as before.

Name: Anonymous 2015-12-17 11:26

A big num library that outperforms GMP would be a historic /prog/ event.

Name: Anonymous 2015-12-17 23:05

>>128
Too many 0s, not enough 1s.

Name: Anonymous 2015-12-17 23:24

>>124
I think software should speak for itself, no matter what the developer's political ideas are, or their favorite food or where they live, etc.
If the software is good, it's good. The end.
Also, the suckless and 2f30 communities are composed of several people with very different ideals. The thing they have in common is: simple software. The rest? Doesn't matter.
Forking won't improve the software just because you have different political views compared to some of the members. I think you're taking the wrong turn, but that's just my opinion.

>>126
If you send a simple patch + rationale for the patch, it'll almost surely be accepted. And sbase will be improved.
But saying the current implementation has such and such flaws won't help. Discussion (with patches provided), on the other hand, will. Join #2f30 suigin. Fragmentation isn't the answer, I believe.

Name: Anonymous 2015-12-18 1:09

>>132
Fragmentation isn't the answer, I believe
You are right, when is suckmore going to join ACU?

Name: Anonymous 2016-05-26 19:19

>>133
What`s ACU?

Name: Anonymous 2016-05-26 20:39

>>134
address computation unit

Name: Anonymous 2016-05-27 7:47

>>134
anaphylactic cunt undulation, usually caused by the act of suckshit.

Name: Anonymous 2016-05-28 13:06

Abilene Christian University

Name: Anonymous 2016-05-29 19:33

bc(1) candidate implementation for sbase WIP
https://github.com/suiginsoft/bc

Hey Suigin.
Will you update your bc(1) with big-integer support by using either hebimath or libzahl?
Are you working on any other suckless project or still busy with the soul-sucking corporate job?
I hope you have fun soon.

Name: Anonymous 2016-05-30 2:10

>>138
bc(1) isn't a priority for me. Still working on hebimath in my personal time, libzahl has definitely been a strong influence in terms of design. I've got some major changes and additions in the pipe.

I've also discovered this piece of useful software, which I'm attempting to use: https://cs.stanford.edu/people/sharmar/pubs/asplos291-schkufza.pdf

In my opinion, neither libzahl nor hebimath are suitable for a suckless bc(1). What you want to use there is a very minimalistic suite of functions.

Name: Anonymous 2016-05-30 5:14

>>139
That's fine. suckless isn't suitable for anything anyway.

Name: Anonymous 2016-05-30 10:15

the critical flaw in suckless is that they're trying to prevent bloat by optimizing for SLOC. this does prevent bloat but in a suboptimal way that sacrifices functionality and usability for the user, and maintainability (because muh SLOC limit) and readability (because a dirty deep magic hack can have fewer SLOC than a proper solution) for programmers.

for example, does anyone actually browse the web with surf? no, because it lacks features for actually browsing the web (that goes double for powerusers who like seeing source, inspecting or just having over 9000 tabs at the same time). rendering HTML is not enough.

the proper way of making a bloat-free project (especially hobby projects where you don't have corporate assholes with their deadline, changing requirements and le enterprise agile UML XML smart IOT buzzwords) would be a clearly defined scope (THE most important thing), a simple (preferably modular for easier maintenance) design and an adherence to good coding standards.

basically, aiming for low SLOC count makes sense only if you're doing code golf. if you want to make actually useful software, it should just be a rule of thumb

Name: Anonymous 2016-05-30 12:21

>>141
How about forgetting about the whiny kiddie notion of "bloat" altogether, and just make sure the program runs fast enough in real-world usage, can be maintained, and has a well designed feature set? Do whatever it takes to make that happen, fuck the haters.

Name: Anonymous 2016-05-30 12:48

>>142
fast enough in real-world usage

that's nebulous as fuck though: things that work fast enough on my gaming PC will be slow on my 7-year-old laptop, which wasn't even that good when I bought it. and that's a real concern, because both browsers and webapps have gotten so bloated that said laptop is slow at basic web browsing unless I disable all the JS bullshit.

also

runs fast enough in real-world usage, can be maintained, and has a well designed feature set

isn't the whole thing synonymous with not being bloated, avoiding feature creep etc. while maintaining functionality? aka the thing I've been talking about all along.

basically, bloat isn't about SLOC. it's about feature creep, overengineering and plain old adding useless bullshit to impress the general audience.

Name: Anonymous 2016-05-30 14:29

designing a suckless dubs checking library

Name: Anonymous 2016-05-30 14:42

>>143
isn't the whole thing synonymous with not being bloated, avoiding feature creep etc. while maintaining functionality? aka the thing I've been talking about all along.

No, the term "not being bloated" means a billion different things, and it usually amounts to faggot concerns like nitpicking the size of the source code base or the memory footprint.

Name: Anonymous 2016-05-30 14:44

>>145
or even worse, the download/install size.

Name: Anonymous 2016-05-30 15:07

Overture - To walk a less harmful path
===============================================================================

It is a well known phenomenon that all great fields of science and higher
thought at a particular point in history reflect the unique nuances and aspects
of culture of their host civilizations. When it comes to the perilous state of
the field of ``Computer Science'' and the software development community at
large, it is no different. One simply has to look at the culture and direction
of the present civilizations that currently hold grip over computing
to understand the source of our predicament.

When a cycle of civilization is reaching its end, it is difficult to achieve
anything by resisting it and by directly opposing the forces in motion. The
``current'' is too strong; one would be overwhelmed. The essential thing is
to not let oneself be impressed by the omnipotence and apparent triumph of the
forces of the epoch. These forces, devoid of connection to any higher
principle, are in fact on a short chain. One should not become fixated on the
present and on things at hand, but keep in view the conditions that may come
about in the future. Thus the principle to follow could be that of letting
the forces and processes of this epoch take their own course, while keeping
oneself firm and ready to intervene when the great behemoths of the software
and computing world--and their supporting base of ignorant users--finally
succumb to irrelevance.

The old Christian saying ``resist not evil'' may have a similar meaning, if
taken in a very particular way. One abandons direct action and retreats to a
more internal position. The perspective offered by the doctrine of cyclical
laws is implicit here. When one cycle closes, another begins, and the point
at which a given process reaches its extreme is also the point at which it
turns in the opposite direction.

But there is still the problem of continuity between the two cycles. To use
an image from the poet Hofmannsthal, the positive solution would be that of a
meeting between those who have been able to stay awake through the long night,
and those who may appear the next morning. But one cannot be sure of this
happening. It is impossible to foresee with certainty how, and on what plane,
there can be any continuity between the cycle that is nearing its end and the
next one.

My assertion that today there is no political system, no software development
community, and no computing methodology whatsoever worth devoting oneself to,
and that everything existing must be denied, likely disconcerts many. However,
this denial and non-commitment do not derive from a lack of principles, but
from the possession of principles, which are precise, solid and not subject to
compromise. In the life of today it can be appropriate, for many, to withdraw
in order to settle in a more interior line of trenches, so that which we cannot
do anything about cannot do anything against us.

However, it should not be encouraged that people let themselves go, but
precisely the contrary: a strict discipline of life and of computing brought
to the highest point should be the foremost aspiration. On this inner,
spiritual plane of the individual, what is required is the opposite of non-
involvement. To draw your attention to the possibility that, before thinking
of outer actions, often dictated by momentary enthusiasms, one should think of
the formation of oneself, the action on oneself, against everything that is
formless, temporary, mediocre and bourgeois.

We must thus pay attention to the particular plane of problems that are
oriented inward from the anterior, and moreover, realise that not every
individual we come across can expect to identify himself with modernity.
Perhaps he too is alienated from the present and desires only lucidity and
realism in how he conducts his life and carries out computing. Therein a
common ground can be found. And if there are men willing to fight in spite
of all, even on lost positions, one should be most grateful to them.

Name: Anonymous 2016-05-30 15:09

Manifest
=======

Many (open source) hackers are proud if they achieve large amounts of code, because they believe the more lines of code they’ve written, the more progress they have made. The more progress they have made, the more skilled they are. This is simply a delusion.

Most hackers actually don’t care much about code quality. Thus, if they get something working which seems to solve a problem, they stick with it. If this kind of software development is applied to the same source code throughout its entire life-cycle, we’re left with large amounts of code, a totally screwed code structure, and a flawed system design. This is because of a lack of conceptual clarity and integrity in the development process.

Code complexity is the mother of bloated, hard to use, and totally inconsistent software. With complex code, problems are solved in suboptimal ways, valuable resources are endlessly tied up, performance slows to a halt, and vulnerabilities become commonplace. The only solution is to scrap the entire project and rewrite it from scratch.

The bad news: quality rewrites rarely happen, because hackers are proud of large amounts of code. They think they understand the complexity in the code, thus there’s no need to rewrite it. They think of themselves as masterminds, understanding what others can never hope to grasp. To these types, complex software is the ideal.

Ingenious ideas are simple. Ingenious software is simple. Simplicity is the heart of the Unix philosophy. The more lines of code you have removed, the more progress you have made. The fewer lines of code your software has, the more skilled you are and the less your software sucks.

Name: Anonymous 2016-05-30 20:43

>>148
What a pile of horseshit, projecting his pet peeve onto everyone else.

Name: suigin 2016-05-31 6:07

>>147
Someone finally noticed! It was actually meant as a half-serious joke to create board drama, but no one picked it up until now... Now it's too late. I may end up removing it and the bonus/ content from Hebimath when I get close to tagging a release.

>>148
That is the Suckless Philosophy.

http://suckless.org/philosophy

Name: Anonymous 2016-05-31 6:26

>>145
I already said those concerns are mostly bullshit, so I'm not sure what we're arguing about.

Name: Anonymous 2016-05-31 8:14

>>151
I'm arguing that you should stop using the word "bloat", because it's meaningless and associated with faggots.

Name: Anonymous 2016-05-31 8:55

>>152
what would be the better term? 'feature creep, reinventing the wheel, Greenspun's law, pointless fucking bullshit that keeps running in the background, labyrinthine object oriented design and the very existence of XML' is a bit too long.

Name: Anonymous 2016-05-31 10:13

Name: Anonymous 2016-05-31 10:35

>>154
I was thinking of 'Enterprise™-level software design' but this works too

Name: Anonymous 2016-05-31 11:32

>>150
I noticed it a long time ago. I almost went as far as submitting a patch to edit the suckless philosophy page with your joke text, but decided not to bother them. The bonus/ content is cool too.

Name: Anonymous 2016-05-31 16:55

The idea that a vast majority of 'hackers' measure their e-peen by LoC is one of the more stupid delusions I've seen in a long time. Fuck suckless.

Name: Anonymous 2016-06-01 5:18

>>157
You're confusing what Suckless is after with low LoC; low LoC just ends up being a side effect of it.

To understand what Suckless is really after, here's the essence of it:

https://www.youtube.com/watch?v=YyIQKBzIuBY

Name: Anonymous 2016-06-01 5:33

Name: Anonymous 2016-06-01 7:48

>>158
What crack are you smoking? Their manifesto accusing everybody of LoC e-peenism is right in >>148, you dickless nigger.

Name: Anonymous 2016-06-01 15:53

>>158
That was a nice talk but it seems to go against tinkering with C like suckless does.
