
designing a suckless bignum library

Name: Anonymous 2015-11-16 22:11

Let's design a suckless bignum library. (I'm not part of suckless though, just curious about replacing GMP).

I researched a bit into algorithms and the rundown is this:
* long multiplication: O(n^2)
* karatsuba: O(n^1.58)
* Toom-Cook, Fourier-transform based methods - even faster but only used for numbers 10k+ digits long. Much more complex.

So we should probably use karatsuba for all multiplications. Squaring can sometimes be done a bit faster than multiplying two different numbers, since the cross partial products come in equal pairs.
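
To illustrate the trick on a single machine word (the name and the 32-bit toy size are mine; a real implementation applies the same recurrence to arrays of limbs):

#include <assert.h>
#include <stdint.h>

/* Karatsuba on one 32x32 -> 64 bit product: split each factor into 16-bit
 * halves and use three multiplications instead of four. */
static uint64_t karatsuba32(uint32_t x, uint32_t y)
{
	uint64_t x1 = x >> 16, x0 = x & 0xffff;
	uint64_t y1 = y >> 16, y0 = y & 0xffff;
	uint64_t z0 = x0 * y0;                         /* low halves */
	uint64_t z2 = x1 * y1;                         /* high halves */
	uint64_t z1 = (x0 + x1) * (y0 + y1) - z0 - z2; /* both cross terms, one mul */
	return (z2 << 32) + (z1 << 16) + z0;
}

int main(void)
{
	assert(karatsuba32(0xdeadbeef, 0xcafebabe) ==
	       (uint64_t)0xdeadbeef * 0xcafebabe);
	return 0;
}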

Now I suggest programming it in assembly, since that gives you access to the carry bit (C doesn't). Of course we will use libc and the normal C calling conventions so that it's a regular C library.

What to do about memory management? e.g. if you want to add two numbers do we need to allocate a new 'number' as long as the largest to write the result into or do it destructively "x <- x + y"? Maybe the library should support both - then a calculator program would figure out the best primitives to use for a given computation.

It might be nice to also support things like (big modulus) modular arithmetic and polynomials. Stuff like modular exponentiation and modular inverses have interesting algorithms.
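
For instance, the modular inverse drops out of the extended Euclidean algorithm. Sketch on machine words with a made-up name; a bignum version needs division on limbs:

#include <stdint.h>

/* Returns x with (a*x) % m == 1, or 0 if a is not invertible mod m.
 * Assumes 0 < a and 1 < m; tracks only the coefficient of a. */
static int64_t inv_mod(int64_t a, int64_t m)
{
	int64_t r0 = m, r1 = a % m;
	int64_t t0 = 0, t1 = 1;
	while (r1 != 0) {
		int64_t q = r0 / r1;
		int64_t tmp;
		tmp = r0 - q * r1; r0 = r1; r1 = tmp;
		tmp = t0 - q * t1; t0 = t1; t1 = tmp;
	}
	if (r0 != 1)
		return 0;                /* gcd(a, m) != 1 */
	return t0 < 0 ? t0 + m : t0;
}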

What other integer operations would we want? I don't really want to do anything with arb. prec. real numbers - arithmetic with rationals could be done though.

Name: Anonymous 2015-11-16 22:13

Oh: prime testing, generation and factorization.
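
For the testing part, the usual choice is Miller-Rabin. Rough machine-word sketch (a bignum version would sit on the library's own mulmod/powmod; the __int128 here is a GCC/Clang extension, only to keep the example short):

#include <stddef.h>
#include <stdint.h>

static uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t m)
{
	return (unsigned __int128)a * b % m;
}

static uint64_t powmod64(uint64_t b, uint64_t e, uint64_t m)
{
	uint64_t r = 1 % m;
	for (b %= m; e; e >>= 1) {
		if (e & 1)
			r = mulmod64(r, b, m);
		b = mulmod64(b, b, m);
	}
	return r;
}

/* Miller-Rabin; this fixed witness set is enough to be deterministic for
 * all 64-bit n. */
static int is_prime_u64(uint64_t n)
{
	static const uint64_t w[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37 };
	if (n < 2)
		return 0;
	for (size_t i = 0; i < sizeof w / sizeof *w; i++)
		if (n % w[i] == 0)
			return n == w[i];
	uint64_t d = n - 1;
	int s = 0;
	while ((d & 1) == 0) {
		d >>= 1;
		s++;
	}
	for (size_t i = 0; i < sizeof w / sizeof *w; i++) {
		uint64_t x = powmod64(w[i], d, n);
		if (x == 1 || x == n - 1)
			continue;
		int composite = 1;
		for (int r = 1; r < s && composite; r++) {
			x = mulmod64(x, x, n);
			if (x == n - 1)
				composite = 0;
		}
		if (composite)
			return 0;
	}
	return 1;
}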

Name: Anonymous 2015-11-16 22:26

This sounds like a really interesting project! I'm interested in what kind of scope you have in mind. What kinds of operations should be supported? Aside from integers and modular arithmetic, do you also want to have real numbers, vectors, matrices, rationals, etc? What systems do you want to support (x86, MIPS, etc)? What do you think would be a nice goal for a prototype/proof of concept?

I have some ideas for what a very simple prototype could include:
* a representation for arbitrary length integers (see the sketch below)
* addition/subtraction/multiplication on these integers
* multiplication implemented with karatsuba
* C calling convention wrapper
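
One possible shape for that representation, purely as a sketch (sign-magnitude, little-endian 32-bit limbs; not taken from any existing library):

#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint32_t *limb; /* least significant limb first */
	size_t    used; /* number of limbs currently in use */
	int       neg;  /* nonzero if negative; zero is represented by used == 0 */
} bignum;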

Name: Anonymous 2015-11-17 2:16

Make everything fixed point and represent everything in BCD.
END OF DISCUSSION

Name: Anonymous 2015-11-17 2:21

>>4
BCD? are we writing this for the fucking NES?

Name: Anonymous 2015-11-17 3:01

make it in haskell

Name: Anonymous 2015-11-17 3:02

>>5
The 6502 variant in the NES had the BCD support removed, dumbass.

Name: suigin 2015-11-17 5:21

>>1
Would be interesting to know if these new intrinsics generate (near) optimal code when used to implement bignums. If they do, it would save you having to get your hands dirty with lots of assembly:

https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Integer-Overflow-Builtins.html#Integer-Overflow-Builtins

Clang also supports those intrinsics, plus it has its own set of intrinsics specifically for arbitrary precision arithmetic:

http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins
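
Quick sketch of what a limb-vector add looks like with the overflow builtins from the first link (the function name and 64-bit limbs are my own choices):

#include <stddef.h>
#include <stdint.h>

/* r = x + y over n limbs, little-endian limb order; returns the final carry. */
static uint64_t limb_add(uint64_t *r, const uint64_t *x, const uint64_t *y, size_t n)
{
	uint64_t carry = 0;
	for (size_t i = 0; i < n; i++) {
		uint64_t t;
		int c1 = __builtin_add_overflow(x[i], carry, &t);
		int c2 = __builtin_add_overflow(t, y[i], &r[i]);
		carry = c1 | c2; /* the two adds can never both carry */
	}
	return carry;
}

Whether that loop actually comes out as an adc chain is exactly the question, of course.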

And of course, there's Intel's ADX extensions, but AMD doesn't implement them yet. Have to wait for Zen.

https://software.intel.com/en-us/node/523866

Name: Anonymous 2015-11-17 15:35

>>7
that was le second part of le joke

Name: Anonymous 2015-11-17 15:35

>>1
who is tom cook
is he like tom nook
do i owe him money

Name: Anonymous 2015-11-17 19:33

>>5,7
He obviously meant Game Boy. The GBZ80 has some BCD support.

Name: Anonymous 2015-11-17 19:43

Just do integers. Destructive updates could be handled by letting the output parameter have the same address as an input parameter.
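
Something like this, sketched with a toy fixed width and made-up names; the same routine does the destructive update when the caller passes the output buffer as an input:

#include <stdint.h>
#include <stdio.h>

#define N 4

/* r = a + b; safe when r == a or r == b because each limb is read before
 * the matching output limb is written. */
static void bn4_add(uint32_t *r, const uint32_t *a, const uint32_t *b)
{
	uint64_t carry = 0;
	for (int i = 0; i < N; i++) {
		uint64_t s = (uint64_t)a[i] + b[i] + carry;
		r[i] = (uint32_t)s;
		carry = s >> 32;
	}
}

int main(void)
{
	uint32_t x[N] = { 0xffffffff, 0, 0, 0 };
	uint32_t y[N] = { 1, 0, 0, 0 };
	bn4_add(x, x, y); /* x <- x + y, in place */
	printf("%08x %08x\n", (unsigned)x[1], (unsigned)x[0]);
	return 0;
}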

Name: Anonymous 2015-11-18 2:35

>>10
Apple CEO. He's a poof.

Name: Anonymous 2015-11-18 3:04

>>12
This might limit some algorithms though.

Name: Anonymous 2015-11-18 6:25

>>8
Having your dynamically defined compiler generate the proper assembly is far better than getting your hands dirty with that horrible C shit.

Name: Anonymous 2015-11-18 12:53

>>2
i got a bit of code for that
it can do the first 5,000,000 ints in 13 sec running JIT javascript on my smartphone

Name: Anonymous 2015-11-19 2:25

>>2
See PARI/GP

Name: Anonymous 2015-11-19 3:38

>>15
I don't know of an existing C compiler that will generate optimal assembly for bignums without using intrinsics. Implementing semantic rules for that would be extremely complex, and may cause performance problems for non-bignum code.

You're free to submit patches to GCC or Clang/LLVM. But from a practical standpoint, one must work with existing compiler technology. And so you either hand roll the assembly yourself or use intrinsics.

Name: suigin 2015-11-19 4:01

>>1
> What to do about memory management? e.g. if you want to add two numbers do we need to allocate a new 'number' as long as the largest to write the result into or do it destructively "x <- x + y"? Maybe the library should support both - then a calculator program would figure out the best primitives to use for a given computation.

Been thinking about this. I think the simplest thing would be to consider memory management as outside of the scope of the library. The set of functions implemented by the library would be low-level and require that the buffers passed in be suitably large, otherwise overflow and truncation occurs.

Memory management would be left up to the programmer using the library. This would be sufficient for implementing bc(1) or dc(1) for sbase, which uses fixed size buffers and precision that can be adjusted by the user.

If people want a set of bignum functions that manages memory, this could be done by layering a higher-level library on top of the lower-level one.
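
A minimal sketch of that kind of interface (made-up names, 32-bit limbs): the caller owns every buffer, states its capacity, and anything past it is truncated:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t limb;

/* r = a + b, little-endian limbs, result truncated to rn limbs.
 * Returns the carry out of limb rn-1; nonzero means the result was cut off. */
static limb bn_add(limb *r, size_t rn, const limb *a, size_t an, const limb *b, size_t bn)
{
	limb carry = 0;
	for (size_t i = 0; i < rn; i++) {
		uint64_t s = carry;
		if (i < an)
			s += a[i];
		if (i < bn)
			s += b[i];
		r[i] = (limb)s;
		carry = (limb)(s >> 32);
	}
	return carry;
}

int main(void)
{
	limb a[2] = { 0xffffffff, 0x1 }; /* 0x1ffffffff */
	limb b[1] = { 0x2 };
	limb r[3];                       /* caller-sized from the max(an,bn)+1 bound */
	limb c = bn_add(r, 3, a, 2, b, 1);
	printf("carry=%u r=%08x %08x %08x\n", (unsigned)c,
	       (unsigned)r[2], (unsigned)r[1], (unsigned)r[0]);
	return 0;
}

The same convention extends to mul and friends, and a higher-level wrapper that allocates can still sit on top.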

Name: Anonymous 2015-11-19 4:29

Karatsuba requires temporary storage for the sums.

That would make for a very unfriendly API if there is no library-side memory management.

Name: Anonymous 2015-11-19 5:01

>>19
That sorta requires some way to retrieve the required size of a calculation and its result in order to allocate the right amount of memory, though.

Name: suigin 2015-11-19 5:31

>>21
There's usually an upper bound for the size of the result that can be determined ahead of time from the sizes of the inputs: a sum of an m-limb and an n-limb number fits in max(m,n)+1 limbs, and their product fits in m+n limbs.

If, on the other hand, you try to implement it so that you always allocate just as much memory as you need, your bignum operations become far more complex: they need to detect when they run out of space, allocate more memory, then restart where they left off. It's also harder on the CPU cache when done this way.

From both a complexity and performance standpoint, for most operations, I think it would be better if you allocate everything up front, and then do the operation in one shot. Memory capacity is cheap.

>>20
There are also upper-bounds for the amount of temporary storage you need as well, and the library can be designed so that you have to pass in pointers to temporary buffers. A higher-level library with an easier-to-use API can still be layered on top, which handles all of the memory management.
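
For example, sizing the product buffer from that m+n bound, with plain schoolbook multiplication (made-up names, 32-bit limbs; karatsuba would add the scratch-buffer parameter on top of this):

#include <stdint.h>
#include <stdio.h>

/* r = a * b; r must have room for an + bn limbs, little-endian limb order. */
static void bn_mul(uint32_t *r, const uint32_t *a, int an, const uint32_t *b, int bn)
{
	for (int i = 0; i < an + bn; i++)
		r[i] = 0;
	for (int i = 0; i < an; i++) {
		uint64_t carry = 0;
		for (int j = 0; j < bn; j++) {
			uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
			r[i + j] = (uint32_t)t;
			carry = t >> 32;
		}
		r[i + bn] = (uint32_t)carry;
	}
}

int main(void)
{
	uint32_t a[2] = { 0xffffffff, 0xffffffff }; /* 2^64 - 1 */
	uint32_t b[1] = { 0xffffffff };             /* 2^32 - 1 */
	uint32_t r[3];                              /* an + bn limbs, worst case */
	bn_mul(r, a, 2, b, 1);
	printf("%08x %08x %08x\n", (unsigned)r[2], (unsigned)r[1], (unsigned)r[0]);
	return 0;
}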

Name: Anonymous 2015-11-19 5:50

>>18
That's why I said C is horrible shit for this. Get out of that fucking abomination of a language.

Its notions of expressions and evaluation are so far up its own ass it's impossible to fundamentally change anything in the language. It's fast for one class of activity, but is shit slow if you try to break out of it.

Name: Anonymous 2015-11-19 6:32

>>23
> That's why I said C is horrible shit for this. Get out of that fucking abomination of a language.

So we should implement a ``suckless'' bignum library in LISP? But anon, LISP already has bignums. What's the point?

Or you meant Rust or Go? Well, both Rust and Go compilers have the same problem as C in this regard.

Besides, C compiler intrinsics are no different than LISP forms that are builtin to the LISP compiler or interpreter, instead of implemented with macros on top of more elementary forms. It's the same idea. In order to get an optimal mapping from a form to machine instructions, the LISP implementer often has to do the job himself, or use some form of brute-force search or genetic algorithm combined with runtime profiling to arrive at the optimal solution.

Name: Anonymous 2015-11-19 8:20

>>24
> So we should implement a ``suckless'' bignum library in LISP? But anon, LISP already has bignums. What's the point?

If you feel like creating a better bignum implementation, what better place to play with it than a language which supports bignums? Why dick around with calling library functions instead of directly using immediate built-in syntax?

> Besides, C compiler intrinsics are no different than LISP forms that are builtin to the LISP compiler or interpreter, instead of implemented with macros on top of more elementary forms.

Lol, do you even Lisp, bro?

You don't do this with regular macros, you do it with compiler macros and by defining new transforms at the compiler's intermediate level. You can add CPU-specific AVX instructions or whatever as a general toolkit for the compiler to use without having to code entire routines in asm. Lisp lets you hone Lisp to your problem, instead of fighting against the C compiler's fixed assumptions.

> In order to get an optimal mapping from a form to machine instructions, the LISP implementer often has to do the job himself, or use some form of brute-force search or genetic algorithm combined with runtime profiling to arrive at the optimal solution.

Bullshit. Type inference carries a long way, and the language passes a higher level of abstraction to the compiler for optimization than C, with a billion times less undefined behavior keeping it cleaner. And if you're talking about high-level human optimization of algorithms and style, that's the same in every single language.

Name: Anonymous 2015-11-19 8:36

I remember reading about a language that let you specify optimizations. You could just specify a rule add(negate(X), Y) -> sub(Y, X) and the optimizer would do that transformation at the start of compilation.

Name: Anonymous 2015-11-19 9:01

>>26
You're describing pretty much every mature symbolic language.

Name: Anonymous 2015-11-19 13:19

>>27
Is javascript mature yet?

Name: Anonymous 2015-11-19 16:17

    -=つ=つ
  =⊃∧_∧≡⊃ミヽ☆_∧ ∩
. -つ <# `Д´>=つ `☆ ))∀`)//
   =つ   つ≡ ☆*☆ググル /

Name: Anonymous 2015-11-19 16:31

MY NAME IS ERROR...

Name: Anonymous 2015-11-19 17:04

>>20
this is an important problem

>>23
you are not competent to be involved in this thread.

Name: Anonymous 2015-11-19 17:31

>>25
> You can add CPU-specific AVX instructions or whatever as a general toolkit for the compiler to use without having to code entire routines in asm. Lisp lets you hone Lisp to your problem, instead of fighting against the C compiler's fixed assumptions.

C has this too. It's called ``inline assembly''. Your inability to see the parallels between these C constructs and LISP makes me want to shake my head. Sure, LISP perhaps does it better, but since ``suckless'' more or less sticks to POSIX standards and the UNIX way, the project, by definition, is best implemented in C.

> Bullshit. Type inference carries a long way, and the language passes a higher level of abstraction to the compiler for optimization than C, with a billion times less undefined behavior keeping it cleaner. And if you're talking about high-level human optimization of algorithms and style, that's the same in every single language.

There isn't always a one to one mapping from a form to a sequence of instructions, especially on modern RISC and CISC CPUs with vector instruction sets, out-of-order execution, multi-level instruction and data memory caches, etc.

Finding the optimal sequence of instructions is NP-hard in general, and it becomes very expensive as the size of a sequence grows. Compilers take a lot of short-cuts in their code generators.

Name: Anonymous 2015-11-19 19:21

Name: Anonymous 2015-11-20 4:17

>>32
> C has this too. It's called ``inline assembly''.

You need some reading comprehension. Fuck inline assembly, that's what I said you can escape by leaving C. Writing assembly is WRONG. Telling the compiler how to use new intrinsics to make your existing code use CPU specifics is right. You do NOT end up including blocks of asm code, but write your inner algorithms in plain Lisp.

> There isn't always a one to one mapping from a form to a sequence of instructions blah blah blah

None of that is language specific.

However, re "Compilers take a lot of short-cuts in their code generators", the C compiler is very constrained because of all the undefined behaviors in the spec that prevent very obvious optimizations. A better-specified language can take advantage of much more.

Name: Anonymous 2015-11-20 7:21

You're fucking welcome.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* XOR swap, for integer lvalues only */
#define swap(__a,__b) do { __a ^= __b; __b ^= __a; __a ^= __b; } while (0)

/* drop the leading digit of a decimal string */
char* shift_left(const char* src){
	int len = strlen(src);
	char* moved = calloc(len, sizeof(char));
	memcpy(moved, src + 1, len - 1);
	return moved;
}

char* add(const char* addend, const char* addend2){
	int len = strlen(addend), a2len = strlen(addend2);

	if (len < a2len){
		const char* tmp = addend;
		addend = addend2;
		addend2 = tmp;
		swap(len, a2len);
	}
	/* right-align the shorter addend with leading zeros */
	char* a2resized = calloc(len + 1, sizeof(char));
	memset(a2resized, '0', len);
	memcpy(a2resized + len - a2len, addend2, a2len);

	int reslen = len + 1; /* room for a possible carry digit */
	char* result = calloc(reslen + 1, sizeof(char));
	memset(result, '0', reslen);

	int carry = 0;
	for (int i = 0; i < len; i++){
		int respos = reslen - i - 1;
		int val = (addend[len - i - 1] - '0') + (a2resized[len - i - 1] - '0') + carry;
		carry = val >= 10;
		result[respos] = val % 10 + '0';
	}
	free(a2resized);

	if (carry){
		result[0] = '1';
	} else {
		/* no carry: drop the unused leading zero */
		char* shifted = shift_left(result);
		free(result);
		result = shifted;
	}
	return result;
}

int main(void){
	const char* a = "99999999999999999999";
	const char* b = "99999999999999999999";
	const char* d = "9999999999999990";
	char* c = add(a, b);
	printf("%s + %s = %s\n", a, b, c);
	char* res = add(d, c);
	printf("%s + %s = %s\n", d, c, res);
	free(c);
	free(res);
	return 0;
}

Name: Anonymous 2015-11-20 13:16

>>35
Why no discrete logarithm?

Name: Anonymous 2015-11-20 13:53

>>35
pleb tier swap macro in C
not using the glorious xchg opcode in asm

disdain for plebs

Name: Anonymous 2015-11-20 14:04

>>37
Fix it. Post the better version.
I see >>35-san's work, but I don't see yours.

Name: Anonymous 2015-11-20 14:21

Let's

no

Name: Anonymous 2015-11-20 15:09

>>39
stay out of the thread then, consumer

Name: Anonymous 2015-11-20 16:39

op, if you're serious about this, do:

* modular arith only (ie, crypto only). for fast infinite precision, you really need a hairy ball of code like GMP, no way to be suckless
* shift-and-add multiplier, then you get modulo for free
* Why not schoolbook & Barrett reduction? Sure it works better on superscalar CPUs with ALUs, but it *will* suck horribly on a CPU without a multiplier (which is everything really low power these days)
* basics, ie modular add,sub,mul,pow are ~500loc. prime tester (miller-rabin) 100 loc. other bells and whistles (ie actual PKCS+RSA, DHE, maybe even ECDHE) 1kloc. 2kloc all.

> What to do about memory management?

The golden rule of suckless C - just forget dynamic memory. Dynamically allocated memory was a mistake. K&R were right. You free memory through return or exit(). Just cap your bignum type at some hardcoded limb count. Sacrifice a bit of stack depth, and no need for silly realloc() every time you add two numbers. Not to mention then you don't even need libc and can run baremetal easily.

As an example of a bignum in this style, here is a 60 loc RSA signature verifier for a (very) ROM-space-constrained embedded MCU bootloader:

#include <stdint.h>
#include <string.h>

#ifndef RSA_BITS
#define RSA_BITS 2048 /* key size; set by the bootloader's build in the original */
#endif

#define RSA_BYTES (RSA_BITS/8)
#define DIGIT uint32_t
#define DIGIT_BYTES (sizeof(DIGIT))
#define DIGIT_BITS (DIGIT_BYTES*8)
#define DIGIT_MAX ((DIGIT)(-1))
#define NDIGITS (RSA_BITS/DIGIT_BITS)

/* r = x + y; returns the carry out of the top digit */
static int b_add(DIGIT * restrict r, const DIGIT *x, const DIGIT *y)
{
	DIGIT w, carry = 0;
	for (int i = 0; i < NDIGITS; i++) {
		if ((w = x[i] + carry) < carry)
			w = y[i];
		else
			carry = ((w += y[i]) < y[i]);
		r[i] = w;
	}
	return carry;
}

/* res = (xsrc * y) % mod by shift-and-add; mod is passed negated, so
 * "adding mod" subtracts the modulus. Returns -1 on overflow. */
static int b_mulmod(DIGIT * restrict res, const DIGIT *xsrc, const DIGIT *y, const DIGIT *mod)
{
	DIGIT rbuf1[NDIGITS], rbuf2[NDIGITS], xbuf1[NDIGITS], xbuf2[NDIGITS];

	DIGIT *r1 = rbuf1;
	DIGIT *r2 = rbuf2;
	DIGIT *x1 = xbuf1;
	DIGIT *x2 = xbuf2;

	DIGIT *swp;

	memset(rbuf1, 0, sizeof rbuf1);
	memcpy(xbuf1, xsrc, sizeof xbuf1);

	for (int i = 0; i < NDIGITS; i++) {
		for (DIGIT bit = 1; bit; bit += bit) {
			if (y[i] & bit) {
				/* r += x, then reduce if the subtraction doesn't borrow */
				if (b_add(r2, r1, x1))
					return -1;
				if (!b_add(r1, r2, mod)) {
					swp = r1;
					r1 = r2;
					r2 = swp;
				}
			}
			/* x += x, same conditional reduction */
			if (b_add(x2, x1, x1))
				return -1;

			if (!b_add(x1, x2, mod)) {
				swp = x1;
				x1 = x2;
				x2 = swp;
			}
		}
	}
	memcpy(res, r1, sizeof rbuf1);
	return 0;
}

/* the RSA public operation for e = 3 */
static int rsa_public(DIGIT *output, const DIGIT *input, const DIGIT *modulus)
{
	DIGIT buf2[NDIGITS];
	/* buf2 = input^2 % modulus */
	if (b_mulmod(buf2, input, input, modulus))
		return -1;
	/* output = input^3 % modulus */
	return b_mulmod(output, buf2, input, modulus);
}


(modulus is stored negated, thus no need for b_sub)

Name: Anonymous 2015-11-20 17:12

>>41
this is really nice and I appreciate it.

One question: what about nice code for modular arithmetic like this, but then also another bit of code for infinite precision - does it really have to be hairy? If we just pick one algorithm that works reasonably well it should be decent enough and not suck?

Name: Anonymous 2015-11-20 18:05

>>41
> for fast infinite precision, you really need a hairy ball of code like GMP, no way to be suckless

Prove it.
