Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon. Entire thread

designing a suckless bignum library

Name: Anonymous 2015-11-16 22:11

Let's design a suckless bignum library. (I'm not part of suckless though, just curious about replacing GMP).

I researched a bit into algorithms and the rundown is this:
* long multiplication: O(n^2)
* karatsuba O(n^1.5)
* Toom-Cook, fourier transform based methods - even faster but only used for numbers 10k digits+ long. Much more complex.

So we should probably use karatsuba for all multiplications. Squaring can be done a bit faster than multiplying two different numbers sometimes.

Now I suggest programming it in assembly, that gives you access to the carry bit (C doesn't get you that). Of course we will use libc and the normal C calling conventions so that it's a regular C library.

What to do about memory management? e.g. if you want to add two numbers do we need to allocate a new 'number' as long as the largest to write the result into or do it destructively "x <- x + y"? Maybe the library should support both - then a calculator program would figure out the best primitives to use for a given computation.

It might be nice to also support things like (big modulus) modular arithmetic and polynomials. stuff like exponentiation and modular inverses have interesting algorithms.

What other integer operations would we want? I don't really want to do anything with arb. prec. real numbers - arithmetic with rationals could be done though.

Name: Anonymous 2015-12-08 19:43

>>104
Checked the code, and it has a very strong data dependency chain (disregarding the loop, which can be fixed fairly easily). It would be fast on 486 which does not execute 30+ ALU operations in one tick (if you break data deps). Is there any reason why not write it in C?

Try it. You'll see the overhead of C is miniscule, because your bottleneck isn't the slight C overhead, but the huge latency introduced by carries.

Delayed carry, together with FFT would of course blow your implementation out of water by several orders of magnitude.

Note that heavy-duty kernels (ie not naive schoolbook) in GMP et al are written in asm to improve register scheduling because the kernel is _large_ (FFT, toom-3, montgomery...) with considerable register pressure. While compilers are often better than humans wrt instruction scheduling (especially ICC), they often do poor job of figuring out the spill schedule inside a hot loop. So the way it's usually done is let compiler produce the code, and then human hacks it in trial and error fashion to squeeze less ticks out of it.

Newer Posts
Don't change these.
Name: Email:
Entire Thread Thread List