Re: Image database

From: Anders Carlsson <anders.carlsson_at_sfks.se>
Date: Fri, 12 May 2017 10:34:49 +0200
Message-ID: <6BAD48C308204799894914092F9F9881@ryds>
Ruud wrote:

>> Given that MD5 is a 32 bit hash
> No, 128 bit: https://en.wikipedia.org/wiki/MD5

Aha. Yes, I got confused by the 32 hex digits. That means it can have 
2^128, or roughly 3.4*10^38, combinations.
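
As a quick sanity check on the size of the hash space, here is a minimal 
Python sketch (the variable names are only illustrative). It assumes nothing 
beyond each hex digit carrying 4 bits:

```python
# Each hexadecimal digit encodes 4 bits, so a 32-digit MD5 digest is 128 bits.
hex_digits = 32
bits = hex_digits * 4

# Number of distinct values a hash of that width can take.
hash_space = 2 ** bits

print(bits)                 # 128
print(f"{hash_space:.2e}")  # 3.40e+38
```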

>> all possible combinations of every bit of Commodore software ever 
>> released would by far exceed 4 billion combinations

> As I understand the above you say that over 4.000.000.000 pieces of 
> software are present.

Well, over 4 billion combinations of how you can store a subset of the 
programs A, B, C, D ... onto a 166 kB floppy disk. Assuming that each 
program released is 1 kB large (and in practice most are much bigger!), a 
figure of 10 gigabytes equals 10 million programs of 1 kB each. If you store 
one program on each disk, you will thus have 10 million disks to enumerate, 
which the 128-bit MD5 will easily handle.

If you pack them closely, you will be able to fit up to 166 such 
programs on a single floppy disk, though I can't recall how many files the 
directory on a disk has capacity for. If you make sure not to have any 
duplicates, it will reduce your collection to 10000000/166 = 60241 disks, or 
half of that if you are using flippies.
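
The disk counts above can be reproduced with a few lines of Python. This is 
only a sketch under the same assumptions as the text (decimal kilobytes, 1 kB 
per program, 166 kB per disk side, directory limits ignored); the variable 
names are made up for illustration:

```python
import math

total_kb = 10_000_000   # 10 gigabytes expressed in kB (decimal units)
program_kb = 1          # assumed size of each program
disk_kb = 166           # assumed capacity of one disk side

programs = total_kb // program_kb        # number of 1 kB programs
per_disk = disk_kb // program_kb         # programs that fit on one side
disks = math.ceil(programs / per_disk)   # single-sided disks, no duplicates
flippies = math.ceil(disks / 2)          # disks when both sides are used

print(programs, per_disk, disks, flippies)
```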

My line of thinking here is that if duplicates may appear across different 
disks and all those programs may be combined in any order, you get a very 
large number. I found an online calculator that says that there are 10^864 
ways of combining those 10 million programs into disk images of 166 files 
each, and that is still only the case where each disk is completely full.
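
The 10^864 figure can be checked without an online calculator. The sketch 
below assumes the calculator counted unordered selections of 166 distinct 
programs out of 10 million (a binomial coefficient), which is my reading of 
the text rather than something stated in it:

```python
import math

programs = 10_000_000
per_disk = 166

# Unordered selections of 166 distinct programs out of 10 million.
images = math.comb(programs, per_disk)

# Order of magnitude: number of decimal digits minus one.
exponent = len(str(images)) - 1
print(exponent)  # 864
```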

2^128 is roughly 3.4*10^38, so in this theoretical case MD5 would not 
suffice. :)

Of course in practice the programs in those 10 gigabytes of software 
probably average 20 kB or so, reducing the set to 500,000 programs * 20 kB. 
Each disk image will store 8 such programs with a bit of slack. That gives 
9*10^40 different floppy disk images with all combinations, still a larger 
number than the roughly 3.4*10^38 values of a 128-bit hash.

If we set the average Commodore program/file to 33 kB, which I think is a 
bit too large as there are a lot of small ROM images and other files that 
won't get anywhere near that size, we get 303030 programs, 5 of which fit 
on each disk. The total number of combinations is down to 2*10^25.

If we pretend to be clueless and assume there have never been any files 
smaller than 83 kB (an easy thought in the 2017 world of computing), we get 
120482 files, 2 of which fit on each disk, giving 7*10^9 combinations, so in 
this case the 128-bit hash would be able to hold all possible combinations.
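
The three average-size scenarios can be tabulated in one go. This is a rough 
sketch, assuming 10 gigabytes means 10,000,000 kB, that a 128-bit hash has 
2^128 distinct values, and that "combinations" means unordered selections of 
distinct programs; the helper disk_images is hypothetical and exists only 
for this illustration:

```python
import math

TOTAL_KB = 10_000_000   # 10 gigabytes, in kB
DISK_KB = 166           # assumed capacity of one disk side
HASH_SPACE = 2 ** 128   # distinct values of a 128-bit hash

def disk_images(avg_kb):
    """Return (programs, per_disk, combinations) for an average file size."""
    programs = round(TOTAL_KB / avg_kb)   # programs in the collection
    per_disk = DISK_KB // avg_kb          # whole programs per disk side
    return programs, per_disk, math.comb(programs, per_disk)

for avg_kb in (20, 33, 83):
    programs, per_disk, combos = disk_images(avg_kb)
    fits = combos <= HASH_SPACE
    print(f"{avg_kb} kB: {programs} programs, {per_disk} per disk, "
          f"{combos:.1e} images, fits in 128 bits: {fits}")
```

Under these assumptions only the 20 kB case exceeds 2^128; the 33 kB and 
83 kB cases both fall below it.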

Well, all this is hypothetical, both when it comes to file sizes and the 
assumption that each existing file would appear on a disk image in 
combination with every other existing file, but this was my line of thinking.

Best regards

Anders Carlsson


       Message was sent through the cbm-hackers mailing list
Received on 2017-05-12 09:00:02
