# Re: Image database

From: Anders Carlsson <anders.carlsson_at_sfks.se>
Date: Fri, 12 May 2017 10:34:49 +0200
```
Ruud wrote:

>> Given that MD5 is a 32 bit hash
> No, 128 bit: https://en.wikipedia.org/wiki/MD5

Aha. Yes, I got confused by the 32 hex digits. That means it can have 2^128,
or about 3.4*10^38, combinations?

>> all possible combinations of every bit of Commodore software ever
>> released would by far exceed 4 billion combinations

> As I understand the above you say that over 4.000.000.000 pieces of
> software are present.

Well, over 4 billion combinations of how you can store a subset of the
programs A, B, C, D ... onto a 166 kB floppy disk. Assuming that each
program released is 1 kB large (and in practice most are much bigger!), a
figure of 10 gigabytes equals 10 million programs of 1 kB each. If you store
one program on each disk, you will thus have 10 million disks to enumerate,
which the 128-bit MD5 will easily handle.

If you pack them closely, you will thus be able to fit up to 166 such
programs on a single floppy disk, though I can't recall how many files on a
disk the directory has capacity for. If you make sure to not have any
duplicates, it will reduce your collection to 10000000/166 = 60241 disks, or
half of that if you are using flippies.

My thinking here is that if duplicates may appear across different
disks and all those programs may be combined in any order, you get a very
large number. I found an online calculator that says that there are 10^864
ways of combining those 10 million programs into disk images of 166 files
each, and that is still only the case where each disk is entirely stuffed.

A 128-bit hash has 2^128, or about 3.4*10^38, possible values, so in this
theoretical case MD5 would not suffice. :)

Of course in practice the programs in those 10 gigabytes probably average 20
kB or so, reducing the set to 500,000 programs * 20 kB. Each disk image will
store 8 such programs with a bit of slack. That equals 9*10^40 different
floppy disk images with all combinations, still a larger number than the
roughly 3.4*10^38 values of a 128-bit hash.

If we set the average Commodore program/file to 33 kB, which I think is a
bit too large as there are a lot of small ROM images and other files that
won't get anywhere near that size, we get 303030 programs, of which we can
fit 5 on each disk. The total number of combinations is down to 2*10^25.

If we pretend to be clueless and assume there have never been any files
smaller than 83 kB (an easy thought in the 2017 world of computing), we get
120482 files, 2 on each disk, which gives 7*10^9 combinations, so in this
case the 128-bit hash would be able to hold all possible combinations.

Well, all this is hypothetical, both when it comes to file sizes and the
idea that each existing file would appear on a disk image in combination
with every other existing file, but this was my line of thinking.

Best regards

Anders Carlsson

Message was sent through the cbm-hackers mailing list
```
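The counts in the message can be checked with a few lines of Python. This is only a rough sketch: it assumes, as the message does, a 10 GB collection, a 166 kB disk that is always completely full, and that the order of files on a disk does not matter. The names `SCENARIOS`, `disk_combinations` and `order_of_magnitude` are made up for illustration, and `math.comb` needs Python 3.8 or later.

```python
import math

# Number of distinct values a 128-bit hash such as MD5 can take.
HASH_SPACE = 2 ** 128

# Assumed average program size -> (programs in the collection, programs per disk),
# using the figures from the message (10 GB in total, 166 kB per disk).
SCENARIOS = {
    "1 kB": (10_000_000, 166),
    "20 kB": (500_000, 8),
    "33 kB": (303_030, 5),
    "83 kB": (120_482, 2),
}


def disk_combinations(programs, per_disk):
    """Ways to choose `per_disk` distinct programs out of `programs`, ignoring order."""
    return math.comb(programs, per_disk)


def order_of_magnitude(n):
    """Exponent e such that 10**e <= n < 10**(e + 1), for a positive integer n."""
    return len(str(n)) - 1


print(f"128-bit hash space: about 10^{order_of_magnitude(HASH_SPACE)}")
for size, (programs, per_disk) in SCENARIOS.items():
    count = disk_combinations(programs, per_disk)
    verdict = "exceeds" if count > HASH_SPACE else "fits within"
    print(f"{size:>5} average: about 10^{order_of_magnitude(count)} "
          f"disk images, {verdict} the hash space")
```

Under these assumptions the 1 kB and 20 kB cases come out larger than 2^128, while the 33 kB and 83 kB cases stay below it.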