Re: strange 2001N fault

From: Francesco Messineo <francesco.messineo_at_gmail.com>
Date: Tue, 20 Feb 2018 23:21:58 +0100
Message-ID: <CAESs-_wEnBpKiLxAU5zj+PV_=0VWw466CZAzNeZxWiViz0dAHw@mail.gmail.com>

On Tue, Feb 20, 2018 at 10:49 PM, Mia Magnusson <mia@plea.se> wrote:
> On Tue, 20 Feb 2018 19:28:23 +0100, Gerrit Heitsch
> <gerrit@laosinh.s.bawue.de> wrote:
>> On 02/20/2018 11:53 AM, Francesco Messineo wrote:
>>> Hi all,
>>> in case someone doesn't read vcfed forums, my 3032 developed a
>>> strange fault:
>>>
>>> http://www.vcfed.org/forum/showthread.php?62202-3032-(2001N)-strange-fault
>>>
>>> The thread link is just to recap what I've found and what I did,
>>> so I don't have to type all again here ;-)
>>> Ideas welcome, I'm just scratching my head. This is probably
>>> where a logic analyzer is needed badly.
>>> It seems to me there're unintended writes on some ram locations
>>> (not so random, address wise) and it makes me think about VSP
>>> crashes on C64.
>>> Any idea where to look for a fault?
>
> It seems like there is a correlation between many but not all 1's in
> the addresses and some data bits changing from 1 to 0.

I didn't look much into the correlation. Most of the time, all (or
almost all) locations at 128-byte intervals get corrupted with the
same value, so at the moment it looks more like a refresh or address
hold time problem.
I have also failed to identify a place in the schematic where the
address and data buses would "meet" in any way, unless it's a big power
bounce, but I've searched for power noise without success so far.
Both +5V rails have < 1mV ripple, +12V has 2.5mV ripple (ok, that's
big, but probably because of very low capacitance on that rail,
compared to the tens of 100nF scattered on the +5V lines), -5V has
about 1mV ripple.
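
One way to check the 128-byte pattern on a RAM dump is to tally the
corrupted locations by their low seven address bits. This is only a
sketch: it assumes the RAM was filled with AA beforehand, that A0-A6
form the DRAM row address, and the dump below is synthetic, not data
from this board.

```python
from collections import Counter

FILL = 0xAA  # fill pattern, as used in the thread

def corrupted_by_row(dump, fill=FILL, row_bits=7):
    """Count bytes that differ from the fill value, grouped by the low
    address bits (assumed here to be the DRAM row address A0-A6)."""
    mask = (1 << row_bits) - 1
    counts = Counter()
    for addr, value in enumerate(dump):
        if value != fill:
            counts[addr & mask] += 1
    return counts

# Synthetic 1 KB dump: one bad byte every 128 bytes, starting at 0x15.
dump = bytearray([FILL] * 1024)
for addr in range(0x15, 1024, 128):
    dump[addr] = 0x20

hist = corrupted_by_row(dump)
for low_bits, count in hist.most_common():
    print(f"A0-A6 = 0x{low_bits:02X}: {count} corrupted bytes")
```

If the corruption really follows a 128-byte stride, nearly all the
counts land on a handful of low-address values instead of spreading
over all 128.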

>
> I'd look into decoupling of both the DRAMs and the IC's that drive the
> signals to the DRAM's. Also the actual DC voltages could be
> interesting. Those old DRAMs use three different voltages, easy to miss
> a glitch in some voltage. Remember that the oscilloscope usually has
> signal ground connected to protective earth ground so there might be
> some limitations in what you can measure with a single probe.

Measured DC voltages: +5.184V and +5.167V (the 2001N has two separate
5V regulators feeding different parts of the board), +11.882V, -5.034V.

>
> It cannot hurt to improve decoupling on the power rails even if the
> power rails looks good on your oscilloscope.
>
> I haven't much experience with DRAMs but from what I've read you'd
> really need a 500MHz oscilloscope or similar to be able to really see
> all kinds of glitches, so with a more common slower oscilloscope you
> are a bit blind.

Well, it's a 250 MHz analog scope, and many spikes are visible when
the data multiplexer toggles, for example. I think the problem with
this board should be easy to see once (and if) I discover where to
look and what to look for.

>
> Not sure which state the address lines have before the mux:es switch to
> a read (and if the inputs to the mux:es are stable before the mux
> switches), but many lines switching states causes a spike on the power
> rails.

Sure, the logic levels bounce a few mV when other lines toggle; that's
quite normal on these big TTL boards, but I haven't seen any really
big bounce exceeding 10mV. There's plenty of headroom before a rail
bounce pushes a logic level across the threshold.
I think such a bounce would be really "stable" anyway if that were the
case. Unless a power rail is failing entirely momentarily, of course.
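
To put a number on the headroom, here is a rough calculation using the
usual worst-case levels for standard TTL (0.4V/0.8V low, 2.4V/2.0V
high). These are generic 7400-series datasheet figures, not values
measured on this board, and the DRAM inputs may have different
thresholds.

```python
# Worst-case standard TTL levels in millivolts (generic datasheet
# figures, assumed here; not measured on the 2001N board).
V_OL_MAX_MV = 400   # highest voltage an output may give for a low
V_IL_MAX_MV = 800   # highest voltage an input still reads as low
V_OH_MIN_MV = 2400  # lowest voltage an output may give for a high
V_IH_MIN_MV = 2000  # lowest voltage an input still reads as high

def noise_margins_mv():
    """Return (low, high) worst-case noise margins in millivolts."""
    low = V_IL_MAX_MV - V_OL_MAX_MV
    high = V_OH_MIN_MV - V_IH_MIN_MV
    return low, high

bounce_mv = 10  # largest bounce seen on the scope, per the text above
low, high = noise_margins_mv()
headroom = min(low, high) / bounce_mv
print(f"margins: {low} mV low, {high} mV high, {headroom:.0f}x the bounce")
```

Under these assumptions a 10mV bounce uses only a small fraction of
the worst-case margin between TTL parts.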

>
> Also if there is something suspicious with dram refresh, the ram would
> (afaik/iirc) anyway partially get refreshed by normal reads. Basic and
> Kernal uses a bunch of addresses <1024 (zero page, stack and other
> stuff) which could mean that most but not all of the 128 different
> combinations of A0-A6 gets read at some point either in the loops
> running while idle or running a basic program and in the interrupt
> handlers.

Yes, but thinking about it, a refresh failure might not produce
almost the same values every 128 bytes (I don't really know, I'm
just guessing that it would "zero" the bits more randomly).
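
The point about normal accesses refreshing part of the RAM can be
checked on paper: given a list of addresses the CPU touches, work out
which of the 128 rows never get selected. A small sketch, assuming
A0-A6 select the row; the address list is made up for illustration
and is not the real BASIC/Kernal access pattern.

```python
def unrefreshed_rows(accessed_addresses, row_bits=7):
    """Return the row numbers never selected by the given accesses,
    assuming the low row_bits address bits form the row address."""
    mask = (1 << row_bits) - 1
    touched = {addr & mask for addr in accessed_addresses}
    return sorted(set(range(1 << row_bits)) - touched)

# Hypothetical access set: part of zero page plus the top of the stack.
accessed = list(range(0x00, 0x60)) + list(range(0x1F0, 0x200))

missing = unrefreshed_rows(accessed)
print(f"{len(missing)} rows never touched:",
      ", ".join(f"0x{row:02X}" for row in missing))
```

Any row left in that list would depend entirely on the refresh
counter to keep its data.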

>
> There is afaik some test programs that you can burn onto an eprom
> and run. (I assume that you already know this, though).

The usual pettester.bin ran for hours without finding RAM errors, but
that's not really a great test: it writes one bank (0) completely,
then reads it back. That is another clue that refresh probably works
fine (and its address counter does seem to count correctly, as far as
I could see).

>
>> But if the address bits change too close to /RAS going low, it can
>> happen that you get 2 or more rows selected at the same time. Meaning
>> each read amp finds itself connected to 2 or more memory cells. If
>> those cells don't have the same data (*) in them, you will get data
>> corruption when the data is written back.
>
> In this case where AA should be everywhere, it's probably more a case
> of only one cell getting written back after two were drained.

There are many writes to zero page and the stack page even while BASIC
idles at the prompt (the periodic keyboard scan interrupt, for
example), so if some address is not held long enough after /RAS, it
could "copy" any value from pages zero/one to other locations.
Only bank1 should remain full of AA, but it gets corrupted too, albeit
less frequently.
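
A toy model of that "copy" mechanism, just to show how one bad row
select could corrupt locations spaced 128 bytes apart. It assumes
A0-A6 are the row address and that, when a second row is opened by
mistake, the intended row's data wins and is written back into both
rows. The real outcome inside the chip may well differ, and the
addresses and values are made up.

```python
ROW_BITS = 7
ROWS = 1 << ROW_BITS   # 128 rows, selected by A0-A6 (assumed)
FILL = 0xAA

def row_of(addr):
    """Row number of an address under the assumed A0-A6 row mapping."""
    return addr & (ROWS - 1)

def glitched_access(ram, addr, extra_row):
    """Model one access to addr during which extra_row is opened too.
    Assumed fault model: the intended row's contents are written back
    into extra_row, column by column."""
    src_row = row_of(addr)
    for col_base in range(0, len(ram), ROWS):
        ram[col_base + extra_row] = ram[col_base + src_row]

ram = bytearray([FILL] * 2048)
sources = (0x0015, 0x0095, 0x0115)  # hypothetical live variables, all in row 0x15
for addr in sources:
    ram[addr] = 0x3C

glitched_access(ram, 0x0015, extra_row=0x40)

changed = [a for a, v in enumerate(ram) if v != FILL and a not in sources]
print([hex(a) for a in changed])
```

In this model a single glitched access leaves the same value at
several addresses 128 bytes apart, while columns that only held AA in
the source row stay AA.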

>
> P.S. Isn't chop mode of an analogue oscilloscope far too slow for this?
> I've actually used two resistors to be able to look at two digital
> signals with only one input active on the scope. Not perfect but you
> get rid of the chop/alt problem.

I don't know, it seems to work fast enough. I've identified a few
problems with it over the years.
I should look at the manual anyway.

F

       Message was sent through the cbm-hackers mailing list