From: Christopher Phillips (shrydar_at_jaruth.com)
Date: 2004-11-03 10:57:55
I'm typing this response while away from my somewhat intermittent web
connection, so I may be a few messages out of date (still waiting for my
new house to be connected), but fwiw...
On 1 Nov 2004, at 04:46, Hatch wrote:
> I'm thinking that EOR filling will be done by the cct: as the cct
> displays the image it EOR fills (takes up no CPU cycles; this was the
> idea of someone on the CSDb forum). There will be a control bit that
> turns EOR fill on or off, and 40 addressable bytes for setting the
> initial byte to EOR with at the start of each column; the cct then
> stores the result ready for the next row. With this in mind, would
> horizontal formatting be faster for the 3D calculations?
It really depends on whether you use option V or option C and, if
option V, on how the memory is accessed.
I EOR-fill horizontally, not vertically (this way I can pattern fill
fairly trivially).
But doing the eor-fill in hardware is a nice idea, as is the option of
clearing the screen in hardware. Clearing isn't necessary for the screen
itself when doing the eor-fill in software, but the eor-buffer still
needs clearing, and it can be faster to zero-fill the whole line than to
undo the writes done by the h-events. (In Effluvium I keep the
eor-buffer in zero page, but that's still three cycles per byte...)
Be aware that Evans and Sutherland have a patent on clearing the screen
as it is displayed - not that they would care. (bloody patents on the
blindingly obvious....)
thinking out loud here:
This is how my fill loop would look if I only had access to an
incrementing vram byte:
    ldx #0          ; assumed: X holds zero for the whole routine
    lda eorbuf
    sta IO          ; register on the video card that writes to an
                    ; auto-incremented address in vram
    stx eorbuf      ; clear eor buffer for next row of plotting
    eor eorbuf+1
    sta IO
    stx eorbuf+1
    ...
(this whole routine is called once per pixel row, in between updating
the horizontal intercepts of the currently active edges and plotting
them into the eorbuf)
That is 10 cycles per byte (3 for the eor, 4 for the sta, 3 for the
stx), or 80,000 for a full 8000-byte screen. That's only one cycle per
byte faster than no hardware assist at all.
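
To make the loop above concrete, here is a rough Python model of it. The
message doesn't say how the h-events are encoded in the eor-buffer, so
plot_toggle below is only one assumed encoding that works with a
byte-wise running EOR; the names and the 320x200 screen size are
illustrative too.

```python
# Sketch of the software fill loop; all names here are illustrative.
ROW_BYTES = 40   # 320 pixels / 8
ROWS = 200       # assumption: a full 320x200 bitmap


def plot_toggle(eorbuf, x):
    """Assumed h-event encoding: toggle the fill state from pixel x rightwards."""
    byte, bit = divmod(x, 8)
    mask = 0xFF >> bit                    # pixels x..7 of this byte (bit 7 = leftmost)
    eorbuf[byte] ^= mask
    if byte + 1 < len(eorbuf):
        eorbuf[byte + 1] ^= mask ^ 0xFF   # makes the running EOR all ones afterwards


def fill_row(eorbuf):
    """Mirror of the eor / sta IO / stx sequence: emit the running EOR, clear the buffer."""
    out, acc = [], 0
    for i in range(len(eorbuf)):
        acc ^= eorbuf[i]   # eor eorbuf+i (the first byte uses lda, same effect)
        out.append(acc)    # sta IO
        eorbuf[i] = 0      # stx eorbuf+i, with X = 0
    return out


# Cycle budget per byte: eor zp (3) + sta abs (4) + stx zp (3).
CYCLES_PER_BYTE = 3 + 4 + 3
CYCLES_PER_SCREEN = CYCLES_PER_BYTE * ROW_BYTES * ROWS

buf = [0] * ROW_BYTES
plot_toggle(buf, 3)     # left edge of a span
plot_toggle(buf, 20)    # right edge (first pixel that is not filled)
row = fill_row(buf)
print([hex(b) for b in row[:4]], CYCLES_PER_SCREEN)
```

With this encoding the span covers pixels 3 to 19, and the buffer comes
back zeroed, ready for the next row.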
Ideally you want to avoid this overhead altogether.
How about this idea:
memory map an eor-buffer into the IO space - this way the line plotting
can be directly into the video card, then you could write to a control
register to say 'fill your current pixel row from the eor-buffer, then
clear the eor-buffer and increment the pixel row pointer'. This should
only take one video-card cycle per byte, or around 1000 c64 cycles for
the entire screen - a savings of over four frames!
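
A small Python model of that idea, purely as a sketch: the class, its
method names and the register behaviour are all made up for
illustration, and the frame arithmetic assumes a PAL machine (312 lines
of 63 cycles). The 80,000 and 1000 cycle figures are the estimates from
this message.

```python
class EorFillCard:
    """Hypothetical video card with a memory-mapped eor-buffer."""

    def __init__(self, row_bytes=40, rows=200):
        self.eorbuf = [0] * row_bytes
        self.vram = [[0] * row_bytes for _ in range(rows)]
        self.row = 0                      # pixel row pointer

    def poke(self, offset, value):
        """CPU write into the eor-buffer (line plotting goes straight here)."""
        self.eorbuf[offset] = value & 0xFF

    def peek(self, offset):
        """CPU read, so the plotter can read-modify-write a byte."""
        return self.eorbuf[offset]

    def strobe_fill(self):
        """Control register write: fill the current row, clear the buffer, advance."""
        acc = 0
        for i, value in enumerate(self.eorbuf):
            acc ^= value
            self.vram[self.row][i] = acc
            self.eorbuf[i] = 0
        self.row = (self.row + 1) % len(self.vram)


PAL_CYCLES_PER_FRAME = 312 * 63          # assumption: PAL C64
SOFTWARE_CYCLES = 80_000                 # estimate for the software loop
HARDWARE_CYCLES = 1_000                  # estimate for the card doing the fill
frames_saved = (SOFTWARE_CYCLES - HARDWARE_CYCLES) / PAL_CYCLES_PER_FRAME

card = EorFillCard()
for offset, value in enumerate([0x1F, 0xE0, 0x0F, 0xF0]):
    card.poke(offset, value)
card.strobe_fill()
print(card.vram[0][:4], card.row, round(frames_saved, 2))
```

The CPU's only per-row cost here is the single control register write;
everything inside strobe_fill would happen on the card.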
It would be nice to have the option of eor-filling either horizontally
or vertically, as coders will be arguing about which is better until
the end of time :)
>
> This would be ideal (cct doing some of the 3d work itself). If I use
> C64 memory I would like to at least use the idle video accesses (when
> the raster is outside of the display window) for clearing or byte
> filling memory. Although this isn't 3D work, it would clear or fill a
> portion of memory without using CPU cycles, which would have to speed
> things up.
*nod*
>
> Are you thinking that the cct could actually do some calculations and
> return values to the coder?
I was thinking more of assisting with the filling, but certainly a
circuit that does a 3d rotation and perspective divide would be very
useful. The PlayStation would probably be a useful model for this - you
can set up a 3x3 fixed point rotation matrix with an integer translation
and screen offset, then feed it x,y,z coordinates in model space and
around 20 cycles later you can read back coordinates in screen space.
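
Roughly what such a circuit would compute, as a Python sketch. The 12
fractional bits, the projection distance h and the function name are
assumptions for illustration, not a description of the real hardware.

```python
FRAC_BITS = 12            # assumption: 12 fractional bits in the matrix
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def transform(rot, trans, offset, h, point):
    """Rotate, translate and project one model-space point.

    rot    -- 3x3 matrix of fixed-point integers
    trans  -- integer translation (tx, ty, tz)
    offset -- integer screen offset (ox, oy)
    h      -- hypothetical projection distance in pixels
    Returns (sx, sy, cz), or None if the point is not in front of the camera.
    """
    cx, cy, cz = [
        ((row[0] * point[0] + row[1] * point[1] + row[2] * point[2]) >> FRAC_BITS) + t
        for row, t in zip(rot, trans)
    ]
    if cz <= 0:
        return None       # needs clipping before it can be projected
    return offset[0] + cx * h // cz, offset[1] + cy * h // cz, cz


IDENTITY = [[ONE, 0, 0], [0, ONE, 0], [0, 0, ONE]]
print(transform(IDENTITY, (0, 0, 100), (160, 100), 100, (50, -20, 100)))
```

Points that end up with cz <= 0 are exactly the ones that need edge
clipping first.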
At the moment, my own 3d renderer spends more time clipping edges to
the view frustum than it does doing rotations and perspective divisions
- each edge that crosses a clip plane spends over 1000 cycles computing
xc=xb+(xa-xb)*(0-zb)/(za-zb)
yc=yb+(ya-yb)*(0-zb)/(za-zb)
where (xa,ya,za) and (xb,yb,zb) are the 16-bit endpoints of the edge in
camera-space after skewing the clip-plane onto z=0
(although I do it with a binary search for the point where the edge
crosses z=0 rather than doing the multiplications and divisions
explicitly)
So again, something else that would be nice to have in hardware.
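
For reference, the binary search can be sketched in Python like this. It
is only an illustration of the idea (halve the edge, keep the half that
still straddles z=0), not the actual 6502 routine; the function name and
the step limit are assumptions.

```python
def clip_to_z0(a, b, max_steps=16):
    """Find where the edge a-b crosses z = 0, using only adds and shifts.

    a and b are (x, y, z) integer tuples on opposite sides of the plane.
    Returns the (x, y) of the crossing, accurate to within a few units.
    """
    if a[2] < 0:
        a, b = b, a                      # keep a on the z >= 0 side
    if a[2] < 0 or b[2] >= 0:
        raise ValueError("edge does not cross z = 0")
    for _ in range(max_steps):           # 16 halvings cover 16-bit coordinates
        if a[2] == 0:
            break
        mid = tuple((p + q) >> 1 for p, q in zip(a, b))
        if mid[2] >= 0:
            a = mid                      # crossing lies between mid and b
        else:
            b = mid                      # crossing lies between a and mid
    return a[0], a[1]


print(clip_to_z0((0, 0, 300), (400, 0, -100)))
```

Each step costs three additions and three shifts, with no multiply or
divide. The flooring in the shifts means the result can be a couple of
units off the exact intersection.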
Christopher Jam/shrydar
Message was sent through the cbm-hackers mailing list