silverdr_at_inet.com.pl
Date: 2005-11-22 19:14:04
On 2005-11-22, at 18:01, MagerValp wrote:
> If you look at the IDE64 performance measurements:
>
> http://www.volny.cz/dundera/compar.html
>
> you'll see that IDE64 can do sustained writes at about 40-45 kB/s.
> That leaves about 10 seconds for GCR decoding and drive delays (track
> and sector seek, etc). At about 44 cycles per byte, I'm guessing
> there's still a little room for improvement, though the effort needed
> might not be worth it as it'd be hard to gain more than maybe 5 secs.
When I started writing this, my target was about 20 seconds. But when
I got the first alpha running, I was quite disappointed with the
results: I had expected to start optimising from more or less 25
seconds, not the 29 that I actually achieved.
Now I believe I'll be able to meet the target, but even if I remain at
21-22 seconds it will still be acceptable for me... as long as no one
writes a faster one, that is ;-) but of course I wouldn't mind making
it even faster right now! There are still a lot of minor optimisation
possibilities that I am aware of, but those are not that important.
> And still: you can't compare streaming raw gcr data without error
> checks from one drive to another with fully decoding sectors.
Again - I'd be very glad to learn about those fast GCR routines
Antitrack is referring to.
> And to get this thread back on topic: does anyone have fast, table
> based GCR decoding routines? Patrycjusz, what does your solution look
> like, if you don't mind sharing?
Sure. Long answer: I started with 1571 ROM assembly sources found on
zimmers but since they are barely commented and also lacking some
parts, namely some of the referenced tables, I decided to write my
own. I spent a considerable part of one of the recent weekends
writing my GCR tables from scratch and once I wrote the routine and
counted cycles I found out that it's still some six to seven percent
slower than the one in the 1571's ROM. Therefore I took the 1571 routine
and used it as a drop-in replacement. This had the immediate effect of
bringing the time down from 29 to 25 seconds. That was already a good start
but last weekend I optimised it a bit by inlining the main part of
the code and removing unnecessary use of temporary storage. Short
answer: I use the (faster) 1571 routine inlined and without temporary
BIN storage - the decoded BIN quartets go directly to output buffers
rather than ZP locations.
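
For anyone following the thread who has not seen table-based GCR decoding before, here is a minimal sketch of the idea in Python. It is not the 1571 ROM routine and not my inlined version of it; it only illustrates the standard Commodore 4-to-5 GCR mapping, where 5 GCR bytes (eight 5-bit codes) decode into 4 binary bytes through a 32-entry lookup table. The function names gcr_encode_group and gcr_decode_group are made up for this example.

```python
# Commodore 4-to-5 GCR code: index is the 4-bit nybble, value is the 5-bit code.
GCR_ENCODE = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
              0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]

# Inverse table with 32 entries; None marks 5-bit patterns that are not valid GCR.
GCR_DECODE = [None] * 32
for nybble, code in enumerate(GCR_ENCODE):
    GCR_DECODE[code] = nybble


def gcr_encode_group(data):
    """Encode 4 binary bytes into 5 GCR bytes (hypothetical helper)."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    bits = 0
    for byte in data:
        bits = (bits << 5) | GCR_ENCODE[byte >> 4]    # high nybble first
        bits = (bits << 5) | GCR_ENCODE[byte & 0x0F]  # then low nybble
    return bits.to_bytes(5, "big")


def gcr_decode_group(gcr):
    """Decode 5 GCR bytes into 4 binary bytes (hypothetical helper)."""
    if len(gcr) != 5:
        raise ValueError("expected exactly 5 bytes")
    bits = int.from_bytes(gcr, "big")  # 40 bits = eight 5-bit codes
    out = bytearray()
    for shift in (30, 20, 10, 0):
        hi = GCR_DECODE[(bits >> (shift + 5)) & 0x1F]
        lo = GCR_DECODE[(bits >> shift) & 0x1F]
        if hi is None or lo is None:
            raise ValueError("invalid GCR code in group")
        out.append((hi << 4) | lo)
    return bytes(out)
```

A fast 6502 routine avoids the generic shifting shown here by using several tables, each already aligned to where a given code sits inside the 5-byte group, and (as described above) by storing the decoded nybbles straight into the output buffer.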
--
Every programmer knows the answer: $2b or (not $2b) is $ff.
Message was sent through the cbm-hackers mailing list