Date: 2005-11-22 19:14:04
On 2005-11-22, at 18:01, MagerValp wrote:

> If you look at the IDE64 performance measurements:
>
> http://www.volny.cz/dundera/compar.html
>
> you'll see that IDE64 can do sustained writes at about 40-45 kB/s.
> That leaves about 10 seconds for GCR decoding and drive delays (track
> and sector seek, etc). At about 44 cycles per byte, I'm guessing
> there's still a little room for improvement, though the effort needed
> might not be worth it as it'd be hard to gain more than maybe 5 secs.

When I started writing this, my target was about 20 seconds. But when I got the first alpha running, I was quite disappointed with the results: I had expected optimisation to start from roughly 25 seconds, not the 29 I actually achieved. Now I believe I'll be able to meet the target, but even if I stay at 21-22 seconds that will still be acceptable to me... as long as no one writes a faster one, that is ;-) Of course I wouldn't mind making it even faster right now! There are still a lot of minor optimisation possibilities that I am aware of, but those are not that important.

> And still: you can't compare streaming raw gcr data without error
> checks from one drive to another with fully decoding sectors.

Again - I'd be very glad to learn about those fast GCR routines Antitrack is referring to.

> And to get this thread back on topic: does anyone have fast, table
> based GCR decoding routines? Patrycjusz, what does your solution look
> like, if you don't mind sharing?

Sure.

Long answer: I started with the 1571 ROM assembly sources found on zimmers, but since they are barely commented and also lack some parts, namely some of the referenced tables, I decided to write my own. I spent a considerable part of one recent weekend writing my GCR tables from scratch, and once I had written the routine and counted cycles I found that it was still some six to seven percent slower than the one in the 1571's ROM. Therefore I took the 1571 routine and used it as a drop-in replacement.
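To illustrate what "table based GCR decoding" means in general, here is a minimal sketch in Python. It is neither the 1571 ROM routine nor the code described in this message; it only shows the lookup principle using the standard Commodore 4-to-5 GCR code, where 4 plain bytes become 5 GCR bytes. The names GCR_ENCODE, GCR_DECODE, gcr_encode_group and gcr_decode_group are made up for this example. A fast 6502 routine would typically avoid the shifting done here, for instance by using several pre-shifted tables, but that detail is an assumption and is not shown.

```python
# Commodore 4-to-5 GCR code: index = 4-bit nybble, value = 5-bit GCR code.
GCR_ENCODE = [
    0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
    0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15,
]

# Inverse table: index = 5-bit GCR code, value = nybble (None = invalid code).
GCR_DECODE = [None] * 32
for nybble, code in enumerate(GCR_ENCODE):
    GCR_DECODE[code] = nybble


def gcr_encode_group(data):
    """Encode 4 plain bytes into 5 GCR bytes (8 nybbles -> 8 x 5 bits)."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    bits = 0
    for byte in data:
        bits = (bits << 5) | GCR_ENCODE[byte >> 4]    # high nybble first
        bits = (bits << 5) | GCR_ENCODE[byte & 0x0F]  # then low nybble
    return bits.to_bytes(5, "big")


def gcr_decode_group(gcr):
    """Decode 5 GCR bytes back into 4 plain bytes via table lookup."""
    if len(gcr) != 5:
        raise ValueError("expected exactly 5 bytes")
    bits = int.from_bytes(gcr, "big")
    out = bytearray()
    for i in range(4):
        # Each output byte comes from two consecutive 5-bit codes.
        hi = GCR_DECODE[(bits >> (35 - 10 * i)) & 0x1F]
        lo = GCR_DECODE[(bits >> (30 - 10 * i)) & 0x1F]
        if hi is None or lo is None:
            raise ValueError("invalid GCR code")
        out.append((hi << 4) | lo)
    return bytes(out)
```

Decoding a 5-byte group that was produced by gcr_encode_group returns the original 4 bytes; a group containing a 5-bit pattern that is not in the table raises ValueError, which is the only error check this sketch performs.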
This immediately brought the time down from 29 to 25 seconds. That was already a good start, but last weekend I optimised it a bit further by inlining the main part of the code and removing unnecessary use of temporary storage.

Short answer: I use the (faster) 1571 routine, inlined and without temporary BIN storage - the decoded BIN quartets go directly to the output buffers rather than to ZP locations.

-- 
Every programmer knows the answer: $2b or (not $2b) is $ff.

Message was sent through the cbm-hackers mailing list