I would be very happy if someone found a way to make this faster and/or use less RAM. A few months ago I did some research and found a GitHub repo, https://github.com/TobyLobster/multiply_test, with a huge number of libraries and benchmarks. I started with plain shift-and-add to get basic operation working, then tried those algorithms with smaller precomputed tables. In pursuit of performance I have let the tables grow pretty big. I have since been refactoring my benchmarks after discovering issues with the way I had been measuring, so it is possible that the gains that appeared as the tables grew were fictional.

Right now I am filling 128 KB in the REU for an 8x8 table. My libraries below are not yet smart about this and each one duplicates the table, so I am working on having that detected and managed across projects using my public lib-contracts repo.

So that you understand the goal here and why I have been cavalier about RAM usage: I have an extreme need for speed. As it stands, it is not really possible for the TLS 1.3 handshake to complete before a normal server would time out and close the connection. Even at 48 MHz on the UI64E this is a problem. I am also trying to maintain side-channel protection with constant-time (CT) code. Doing CT has caused significant performance degradation, and I have wondered whether this is so slow that I could intentionally introduce a PRNG-generated delay for CT protection at lower cost than the huge number of cycles spent doing useless math just to avoid side-channel leaks.

These are the crypto libraries I have built that use precomputed tables and that the bigger projects are actively working with. They are a work in progress, but x25519 and nist-curves P-256 have worked end to end, with the C64 completing a TLS 1.3 handshake and an HTTPS GET against a patient listener.
I have not yet extended that to completing the certificate validation that uses the nist-curves P-384 implementation:

https://github.com/JC-000/c64-ChaCha20-Poly1305
https://github.com/JC-000/c64-x25519
https://github.com/JC-000/c64-nist-curves

Justin

> On May 20, 2026, at 13:46, groepaz <groepaz_at_gmx.net> wrote:
>
> On Wednesday, 20 May 2026, 19:59:18 Central European Summer Time,
> Segher Boessenkool wrote:
>> Hi!
>>
>> On Wed, May 20, 2026 at 09:11:27AM -0500, Justin wrote:
>>> things like large precomputed mult tables
>>
>> ab = ((a+b)^2 - (a-b)^2)/4
>>
>> You can do full 8x8->16 muls in just 512 bytes table. Pretty damn fast,
>> too!
>
> And that's the common approach in every demo, of course :)
>
> --
>
> https://cc65.github.io https://rr.pokefinder.org
> https://vice-emu.sourceforge.net https://magicdisk.untergrund.net
>
> The world is a madhouse, so it's only right that it is patrolled by armed
> idiots.
> <Brendan Behan>
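
For readers who have not seen the quarter-square trick quoted above, here is a minimal sketch of it in C. It is only an illustration of the identity ab = ((a+b)^2 - (a-b)^2)/4, not code from any of the linked libraries and not a 6502 memory layout. The names `quarter_sq`, `init_quarter_sq`, `mul8x8` and `count_mismatches` are made up for this example, and the table here holds 16-bit entries of floor(n^2/4), so it is larger than a byte-split table on a real C64 would need to be per lookup.

```c
#include <assert.h>
#include <stdint.h>

/* quarter_sq[n] = floor(n*n / 4) for n = 0..510 (largest a+b with 8-bit inputs).
 * The largest entry is 510*510/4 = 65025, which fits in 16 bits. */
static uint16_t quarter_sq[511];

/* Fill the table once before calling mul8x8(). */
static void init_quarter_sq(void) {
    for (uint32_t n = 0; n < 511; n++) {
        quarter_sq[n] = (uint16_t)((n * n) / 4);
    }
}

/* 8x8 -> 16 bit multiply using two table lookups and one subtraction.
 * a+b and |a-b| always have the same parity, so the rounding error from
 * floor() is identical in both lookups and cancels out. */
static uint16_t mul8x8(uint8_t a, uint8_t b) {
    unsigned sum  = (unsigned)a + (unsigned)b;
    unsigned diff = (a > b) ? (unsigned)(a - b) : (unsigned)(b - a);
    assert(sum <= 510);
    return (uint16_t)(quarter_sq[sum] - quarter_sq[diff]);
}

/* Exhaustively compare against the native multiply; returns the number
 * of input pairs where the table result differs (expected to be zero). */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            if (mul8x8((uint8_t)a, (uint8_t)b) != (uint16_t)(a * b)) {
                bad++;
            }
        }
    }
    return bad;
}
```

Call `init_quarter_sq()` once, then `mul8x8(a, b)` for each product. Note that both lookups are indexed by values derived from the operands, which matters if either operand is secret on a platform where lookup time depends on the address.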