Re: Image database

From: silverdr_at_wfmh.org.pl
Date: Fri, 12 May 2017 10:50:50 +0200
Message-Id: <FC121CE7-A799-4971-9783-9F665EF5739E@wfmh.org.pl>
> On 2017-05-10, at 14:15, Baltissen, GJPAA (Ruud) <ruud.baltissen@apg.nl> wrote:
> 
> Hallo Peter,
>  
>  
> > Use something like SHA256
>  
> First problem: never heard of it. OK, we have Wikipedia and Google of course....
> Next problem: I don't have the source code of SHA256 in Pascal. If it is available in C, I could port it but....
> 
> The reasons I use MD5 are:
> - I have the code in Pascal.

Ruud, one does not port things like these today; one uses libraries. I use various hashes extensively in numerous languages and have yet to need to port any of the well-known hashing functions. I was going to write that I "bet there are appropriate libraries for Pascal too", but others have already pointed you to some.
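
To illustrate what "using a library" amounts to, here is a minimal sketch in Python rather than Pascal (the language is my choice for illustration, and the helper name stream_digest_hex is hypothetical). It relies only on hashlib from the Python standard library and reads the input in chunks, so the same call works for MD5 or SHA-256 just by changing the algorithm name.

```python
import hashlib
import io

def stream_digest_hex(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, read chunk by chunk."""
    h = hashlib.new(algorithm)
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# For a real file one would use: with open(path, "rb") as f: stream_digest_hex(f)
# Here an in-memory stream stands in for a disk image.
demo = io.BytesIO(b"abc")
print(stream_digest_hex(demo))
```

Switching between hashes is then a one-word change, e.g. stream_digest_hex(f, "md5").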

> But what I heard from others is that the SHA-256 algorithm is much more time consuming

It is.

> and I need to reserve more bytes in each record.

Negligible difference.
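
To put a number on it: an MD5 digest is 16 bytes and a SHA-256 digest is 32 bytes, so the cost is 16 extra bytes per record. A small sketch of the arithmetic, again in Python; the record count is a made-up figure for illustration, not the size of your database.

```python
import hashlib

md5_bytes = hashlib.md5().digest_size        # 16 bytes
sha256_bytes = hashlib.sha256().digest_size  # 32 bytes
extra_per_record = sha256_bytes - md5_bytes

# Hypothetical database size, purely for illustration.
records = 100_000
extra_total = extra_per_record * records

print(extra_per_record, extra_total)  # prints: 16 1600000
```

Even at a hundred thousand records that is about 1.6 MB in total.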

> And as said, I'm only using MD5 to have a _quick_ means to compare two disks and files. I am not interested at all in the various security risks MD5 has because I don't use it as a security tool. So for the moment I stick to MD5.


The security vs. non-security risks differ only in the application of the function. If you use it as a security tool, you get weak security. If you use it as a comparison tool, you get weak comparison.

>  - I only use it to have a quick way to find out if two files are the same. If I find two with the same hash, I always can check them byte by byte to make sure they are the same indeed. Most files are smaller than 64 KB so that won't take much time.

But what you just wrote basically defeats the purpose of using the hashing function. You don't want to compare byte by byte when the hashes match, just as you don't want to when they don't match, do you? Why use the hash at all if you still want to do the cmp()? That basically means you don't trust your hash anyway, which brings us back to the original suggestion.
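
What trusting the hash looks like in practice can be sketched as follows (Python again; the function name group_by_digest and the sample file names are hypothetical). Entries are grouped by their SHA-256 digest, and a shared digest is taken as sufficient evidence of identical content, with no byte-by-byte pass afterwards.

```python
import hashlib
from collections import defaultdict

def group_by_digest(blobs):
    """Map each SHA-256 hex digest to the names that share it.

    `blobs` is a dict of name -> bytes. Only digests shared by two or
    more names are returned, i.e. the presumed duplicates.
    """
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}

# Hypothetical in-memory "disk images" for illustration.
images = {"a.d64": b"same bytes", "b.d64": b"same bytes", "c.d64": b"other bytes"}
for digest, names in group_by_digest(images).items():
    print(digest[:16], names)
```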

I once had a similar dilemma and worked around it by using two different fast (and weak by today's standards) hashing functions instead of one. I reasoned that a collision is several orders of magnitude less likely to happen with two different functions than with one. I even put exception notifiers around those comparisons, knowing that they would practically "never" be triggered. For non-security-related applications, I find this a viable alternative to resource-intensive functions like bcrypt.
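
A rough sketch of that two-hash approach, in Python. The choice of MD5 plus SHA-1 is only an example pairing, and all names (dual_digest, same_content, HashDisagreement) are hypothetical. The "exception notifier" is interpreted here as raising when the two functions disagree with each other, which would mean one of them has just collided.

```python
import hashlib

class HashDisagreement(Exception):
    """Raised when one hash matches and the other does not."""

def dual_digest(data):
    """Return the (MD5, SHA-1) digest pair to store for a blob of bytes."""
    return (hashlib.md5(data).digest(), hashlib.sha1(data).digest())

def same_content(pair_a, pair_b):
    """Compare two stored digest pairs; treat content as equal only if both match."""
    md5_match = pair_a[0] == pair_b[0]
    sha1_match = pair_a[1] == pair_b[1]
    if md5_match != sha1_match:
        # Practically "never" expected: identical content always matches on
        # both, so a split result means a collision in one of the functions.
        raise HashDisagreement("MD5 and SHA-1 disagree about these two items")
    return md5_match

print(same_content(dual_digest(b"disk one"), dual_digest(b"disk one")))  # True
print(same_content(dual_digest(b"disk one"), dual_digest(b"disk two")))  # False
```

The stored pair costs 16 + 20 = 36 bytes per record, slightly more than a single 32-byte SHA-256 digest.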

-- 
SD!


       Message was sent through the cbm-hackers mailing list
Received on 2017-05-12 09:02:52
