Difference between pages "PPU OAM" and "Visual circuit tutorial"

From Nesdev wiki
The OAM (Object Attribute Memory) is internal memory inside the PPU that contains a display list of up to 64 sprites, where each sprite's information occupies 4 bytes.

=== Byte 0 ===
Y position of top of sprite


Sprite data is delayed by one scanline; you must subtract 1 from the sprite's Y coordinate before writing it here.
Hide a sprite by writing any value in $EF-$FF here.
Sprites are never displayed on the first line of the picture, and it is impossible to place a sprite partially off the top of the screen.


=== Byte 1 ===
Tile index number


For 8x8 sprites, this is the tile number of this sprite within the pattern table selected in bit 3 of [[PPUCTRL]] ($2000).

For 8x16 sprites, the PPU ignores the pattern table selection and selects a pattern table from bit 0 of this number.
 76543210
 ||||||||
 |||||||+- Bank ($0000 or $1000) of tiles
 +++++++-- Tile number of top of sprite (0 to 254; bottom half gets the next tile)
Thus, the pattern table memory map for 8x16 sprites looks like this:
*$00: $0000-$001F
*$01: $1000-$101F
*$02: $0020-$003F
*$03: $1020-$103F
*$04: $0040-$005F<br>[...]
*$FE: $0FE0-$0FFF
*$FF: $1FE0-$1FFF
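
The mapping above can be sketched in a few lines of Python. This is only an illustration of the arithmetic implied by the list; the function name `sprite_8x16_pattern_range` is hypothetical, and the 16-bytes-per-tile figure follows from each list entry covering 32 bytes for two tiles.

```python
def sprite_8x16_pattern_range(tile_byte: int) -> tuple[int, int]:
    """Return the (first, last) pattern table address used by an 8x16 sprite.

    tile_byte is OAM byte 1. Hypothetical helper for illustration only.
    """
    if not 0 <= tile_byte <= 0xFF:
        raise ValueError("tile byte must fit in 8 bits")
    bank = 0x1000 if tile_byte & 0x01 else 0x0000  # bit 0 selects the bank
    top_tile = tile_byte & 0xFE                    # bits 7-1: tile number of the top half
    first = bank + top_tile * 16                   # each 8x8 tile occupies 16 bytes
    return first, first + 31                       # top tile plus the following tile


if __name__ == "__main__":
    for tile in (0x00, 0x01, 0x02, 0x03, 0xFE, 0xFF):
        first, last = sprite_8x16_pattern_range(tile)
        print(f"${tile:02X}: ${first:04X}-${last:04X}")
```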


=== Byte 2 ===
Attributes
 76543210
 ||||||||
 ||||||++- Palette (4 to 7) of sprite
 |||+++--- Unimplemented
 ||+------ Priority (0: in front of background; 1: behind background)
 |+------- Flip sprite horizontally
 +-------- Flip sprite vertically


Flipping does not change the position of the sprite's bounding box, just the position of pixels within the sprite.
If, for example, a sprite covers (120, 130) through (127, 137), it'll still cover the same area when flipped.
In 8x16 mode, vertical flip flips each of the subtiles and also exchanges their position; the odd-numbered tile of a vertically flipped sprite is drawn on top.
This behavior differs from the behavior of the [http://wiki.superfamicom.org/snes/show/Registers#obsel__object_size_and_character_address_8 unofficial 16x32 and 32x64 pixel sprite sizes on the Super NES], which [http://wiki.superfamicom.org/snes/show/Sprites will only vertically flip each square sub-region].


The three unimplemented bits of each sprite's byte 2 do not exist in the PPU and always read back as 0 on PPU revisions that allow reading PPU OAM through [[OAMDATA]] ($2004). This can be emulated by ANDing byte 2 with $E3 either when writing to or when reading from OAM. It has not been determined whether the PPU actually drives these bits low or whether this is the effect of data bus capacitance from reading the last byte of the instruction (LDA $2004, which assembles to AD 04 20).
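
As a rough illustration of the bit layout and the $E3 mask described above, an emulator might decode byte 2 along these lines. The names `SpriteAttributes` and `decode_attributes` are hypothetical, and masking on read rather than on write is just one of the two options mentioned.

```python
from typing import NamedTuple


class SpriteAttributes(NamedTuple):
    palette: int             # 4 to 7
    behind_background: bool  # priority bit
    flip_horizontal: bool
    flip_vertical: bool


def decode_attributes(byte2: int) -> SpriteAttributes:
    """Decode OAM byte 2, treating the three unimplemented bits as 0."""
    byte2 &= 0xE3  # bits 4-2 do not exist in the PPU
    return SpriteAttributes(
        palette=4 + (byte2 & 0x03),
        behind_background=bool(byte2 & 0x20),
        flip_horizontal=bool(byte2 & 0x40),
        flip_vertical=bool(byte2 & 0x80),
    )
```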

=== Byte 3 ===
X position of left side of sprite.

X values of $F9-$FF place part of the sprite past the right edge of the screen, where it is invisible. It is not possible to have a sprite partially visible on the left edge. Instead, left-clipping through [[PPUMASK|PPUMASK ($2001)]] can be used to simulate this effect.

=== DMA ===
Most programs write to a copy of OAM somewhere in CPU addressable RAM (often $0200-$02FF) and then copy it to OAM each frame using the [[OAMDMA]] ($4014) register. Writing N to this register causes the DMA circuitry inside the 2A03/07 to fully initialize the OAM by writing [[OAMDATA]] 256 times using successive bytes starting at address $100*N. The CPU is suspended while the transfer is taking place.


The address range to copy from could lie outside RAM, though this is only useful for static screens with no animation.

Not counting the [[OAMDMA]] write tick, the above procedure takes 513 CPU cycles (+1 on odd CPU cycles): first one (or two) idle cycles, and then 256 pairs of alternating read/write cycles. (For comparison, an unrolled LDA/STA loop would usually take four times as long.)
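
The source range and cycle count described above amount to simple arithmetic, sketched here in Python. Both function names are hypothetical, and the sketch only restates the figures given in this section.

```python
def oam_dma_source_range(page: int) -> tuple[int, int]:
    """Return the (first, last) CPU address copied when N = page is written to $4014."""
    if not 0 <= page <= 0xFF:
        raise ValueError("page must fit in 8 bits")
    start = 0x100 * page
    return start, start + 0xFF


def oam_dma_cycles(starts_on_odd_cycle: bool) -> int:
    """CPU cycles taken by the transfer, not counting the $4014 write itself."""
    idle = 2 if starts_on_odd_cycle else 1  # one idle cycle, or two on odd CPU cycles
    return idle + 256 * 2                   # 256 read/write pairs
```

For the common shadow OAM at $0200-$02FF, the program writes 2 to $4014, and `oam_dma_source_range(2)` gives that range.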

=== Sprite zero hits ===

Sprites are conventionally numbered 0 to 63.
Sprite 0 is the sprite controlled by OAM addresses $00-$03, sprite 1 is controlled by $04-$07, ..., and sprite 63 is controlled by $FC-$FF.


While the PPU is drawing the picture, when an opaque pixel of sprite 0 overlaps an opaque pixel of the background, this is a '''sprite zero hit'''.
The PPU detects this condition and sets bit 6 of [[PPUSTATUS]] ($2002) to 1 starting at this pixel, letting the CPU know how far along the PPU is in drawing the picture.

Sprite 0 hit does not happen:
* If background or sprite rendering is disabled in [[PPUMASK]] ($2001)
* At x=0 to x=7 if the left-side clipping window is enabled (if bit 2 or bit 1 of PPUMASK is 0).
* At x=255, for an obscure reason related to the pixel pipeline.
* At any pixel where the background or sprite pixel is transparent (2-bit color index from the CHR pattern is %00).
* If sprite 0 hit has already occurred this frame. Bit 6 of PPUSTATUS ($2002) is cleared to 0 at dot 1 of the pre-render line. This means only the first sprite 0 hit in a frame can be detected.


Sprite 0 hit happens regardless of the following:
* Sprite priority. Sprite 0 can still hit the background from behind.
* The pixel colors. Only the CHR pattern bits are relevant, not the actual rendered colors, and ''any'' CHR color index except %00 is considered opaque.
* The palette. The contents of the palette are irrelevant to sprite 0 hits. For example: a black ($0F) sprite pixel can hit a black ($0F) background as long as neither is the transparent color index %00.
* The PAL PPU blanking on the left and right edges at x=0, x=1, and x=254 (see [[Overscan#PAL|Overscan]]).
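
The two lists above can be condensed into a single per-pixel predicate. The following Python is a minimal sketch for an emulator-style model, not a description of the actual circuit; the function name and parameters are hypothetical, and the PPUMASK bit positions for background and sprite enable (bits 3 and 4) are taken from the PPUMASK register layout rather than from this section.

```python
def sprite_zero_hit(x: int, bg_pattern: int, sprite0_pattern: int,
                    ppumask: int, already_hit: bool) -> bool:
    """Return True if a sprite zero hit would be flagged at horizontal position x.

    bg_pattern and sprite0_pattern are the 2-bit CHR color indices at this pixel.
    Sprite priority and palette contents are deliberately not parameters.
    """
    show_bg_left = bool(ppumask & 0x02)       # bit 1: background in leftmost 8 pixels
    show_sprites_left = bool(ppumask & 0x04)  # bit 2: sprites in leftmost 8 pixels
    show_bg = bool(ppumask & 0x08)            # bit 3 (assumed from PPUMASK layout)
    show_sprites = bool(ppumask & 0x10)       # bit 4 (assumed from PPUMASK layout)

    if already_hit:
        return False  # only the first hit in a frame is detected
    if not (show_bg and show_sprites):
        return False
    if x < 8 and not (show_bg_left and show_sprites_left):
        return False
    if x == 255:
        return False
    return bg_pattern != 0 and sprite0_pattern != 0
```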


=== Sprite overlapping ===
[[PPU sprite priority|Priority between sprites]] is determined by their address inside OAM.
Within a scanline, the sprite whose data occurs first in OAM is displayed in front of any sprites that come after it.
For example, when sprites at OAM $0C and $28 overlap, the sprite at $0C will appear in front.


=== Internal operation ===

In addition to the primary OAM memory, the PPU contains 32 bytes (enough for 8 sprites) of secondary OAM memory that is not directly accessible by the program. During each visible scanline this secondary OAM is first cleared, and then a linear search of the entire primary OAM is carried out to find sprites that are within y range for the '''next''' scanline (the ''sprite evaluation'' phase). The OAM data for each sprite found to be within range is copied into the secondary OAM, which is then used to initialize eight internal sprite output units.
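
A heavily simplified sketch of the evaluation phase follows, in Python. It is an assumption-laden model rather than the hardware algorithm: the function name is hypothetical, it stops after eight sprites instead of continuing the search, it ignores all cycle timing, and the range test is derived from the one-scanline delay described under Byte 0.

```python
def evaluate_sprites(oam: bytes, scanline: int, sprite_height: int = 8) -> list[int]:
    """Return the indices of the first (up to 8) sprites in y range for scanline + 1.

    oam is the 256-byte primary OAM; sprite_height is 8 or 16.
    The position in the returned list stands in for the sprite output unit number.
    """
    if len(oam) != 256:
        raise ValueError("primary OAM is 256 bytes")
    selected = []
    for index in range(64):
        y = oam[index * 4]  # byte 0 of each 4-byte entry
        # A sprite with Y byte y covers scanlines y+1 through y+sprite_height.
        if 0 <= scanline - y < sprite_height:
            selected.append(index)
            if len(selected) == 8:  # secondary OAM holds 8 sprites
                break
    return selected
```

Because the search runs from sprite 0 upward, sprite 0 always lands in the first slot when it is in range, which is the property the priority and sprite zero hit logic depends on.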

See [[PPU rendering]] for information on precise timing.

The reason sprites at lower addresses in OAM overlap sprites at higher addresses is that sprites at lower addresses also get assigned a lower address in the secondary OAM, and hence get assigned a lower-numbered sprite output unit during the loading phase. Output from lower-numbered sprite output units is wired inside the PPU to take priority over output from higher-numbered sprite output units.

Sprite zero hit detection relies on the fact that sprite zero, when it is within y range for the next scanline, always gets assigned the first sprite output unit. The hit condition is basically ''sprite zero is in range'' '''AND''' ''the first sprite output unit is outputting a non-zero pixel'' '''AND''' ''the background drawing unit is outputting a non-zero pixel''. (Internally the PPU actually uses '''two''' flags: one to keep track of whether sprite zero occurs on the ''next'' scanline, and another one&mdash;initialized from the first&mdash;to keep track of whether sprite zero occurs on the ''current'' scanline. This is to avoid sprite evaluation, which takes place concurrently with potential sprite zero hits, trampling on the second flag.)

=== Dynamic RAM decay ===

Because OAM is implemented with dynamic RAM instead of static RAM, the data stored in OAM memory will quickly begin to decay into random bits if it is not being refreshed. The OAM memory is refreshed once per scanline while rendering is enabled (if either the sprite or background bit is enabled via the [[PPUMASK|register at $2001]]), but on an NTSC PPU this refresh is prevented whenever rendering is disabled.

When rendering is turned off, or during vertical blanking between frames, the OAM memory will hold stable values for a short period before it begins to decay. It will last at least as long as an NTSC vertical blank interval (~1.3ms), but not much longer than this.<ref>[http://forums.nesdev.org/viewtopic.php?p=109548#p109548 Forum post:] Re: Just how cranky is the PPU OAM?</ref> Because of this, it is not normally useful to write to OAM outside of vertical blank, where rendering is expected to start refreshing its data soon after the write. Writes to [[OAMDMA|$4014]] or [[OAMDATA|$2004]] should usually be done in an NMI routine, or otherwise within vertical blanking.

If using an advanced technique like forced blanking to manually extend the vertical blank time, it may be necessary to do the OAM DMA last, before enabling rendering mid-frame, to avoid decay.

Because OAM decay is more or less random, with timing that is sensitive to temperature and other environmental factors, it is not something a game could normally rely on. Most emulators do not simulate the decay, and suffer no compatibility problems as a result. Software developers targeting the NES hardware should be careful not to rely on this.


Because PAL machines have a longer vertical blanking interval, the 2C07 (PAL PPU) begins refreshing OAM 21 scanlines after NMI<ref>[http://forums.nesdev.org/viewtopic.php?f=9&t=11041 Forum post:] OAM reading on PAL NES</ref>. This prevents the values in DRAM from decaying during the extra 50 scanlines before the picture starts. The 2C07 additionally refreshes OAM during the visible portion of the screen even if rendering is disabled. Because of this, OAM DMA must be done near the beginning of vertical blank on the 2C07, as everywhere else it will conflict with this refresh. In exchange, OAM decay does not occur at all on the PAL NES.

== See also ==
* [[PPU sprite evaluation]]
* [[PPU sprite priority]]
* [[Sprite overflow games]]
* [[PPU OAM/zh|this page in Chinese]]


== References ==
<references />

Revision as of 19:33, 24 May 2013

This is a crash course on making sense of the circuit displays in Visual 6502/2C02/2A03, written for people without much low-level electronics experience (like the author). It aims to present the information needed to read the diagrams at a basic level in simple language, omitting details that are unimportant when starting out.

You might want to read [http://visual6502.org/wiki/index.php?title=JssimUserHelp the Visual 6502 user's guide] and the [[Visual 2C02]] page first.

What the different colored areas are

Let's start by defining what the different colors mean:

Vis areas.png
* Green areas are diffusion (explained below) connected to ground.
* Red areas are diffusion connected to VCC (power).
* Yellow areas are diffusion that is neither connected directly to ground nor
  directly to VCC.
* Gray areas are metal.
* Purple areas are polysilicon.

At the level presented here, diffusion, metal, and polysilicon can be thought of as roughly equivalent when viewed in isolation; they all conduct current. The important difference is in how they interact with each other, which is explained below.

Basic building blocks

Transistors

When a piece of polysilicon is sandwiched between two areas of diffusion, it acts as a gate, only letting current through when the polysilicon is powered (or, equivalently, high, 1, or open). The diffusion area from which current will flow when the gate is open is called the source. The diffusion area into which current will flow is called the drain. The gate together with the source and drain is what makes a transistor.

Vis transistor.png

Power sources

Around an area of powered diffusion we will often see something like the following (note the distinctive "hook" in the polysilicon):

Vis power.png

Here the polysilicon acts roughly like a resistor, preventing a short from VCC to ground when the power source would otherwise have a direct connection to ground along some path of open gates.

Nodes

Electrically common areas are called nodes in Visual 6502/2C02/2A03. Clicking on a node will highlight it, making it easier to see how things are connected (clicking on powered or grounded diffusion won't work; these only modify properties of other nodes and are not themselves nodes). When a node is highlighted, a numeric ID unique to the node will be displayed in the upper right, along with a name for the node if it has one. Node names are defined in nodenames.js.

The Find: edit field can be used to locate nodes, either by numeric ID or by name.

Logic elements

Inverters

An inverter is constructed as in the image below:

Vis inverter.png

When the input gate is low, current flows into the output wire. When the input gate is high, current flows into ground, driving the output wire low. The output wire is hence the inverse of the input wire.

When one node is the inverse of another, it is said that it inverts into the other node.

NOR gates

Below is an example of a NOR gate taken from Visual 2A03, related to controlling when the first square channel is silenced:

Vis nor.png

If any of the gates in red circles is open (high), the current from the highlighted node will go to ground instead of to the gate in the blue circle on the top. Hence the value that reaches the gate in the blue circle is the NOR of the values on the gates in the red circles.

The gate in the blue circle is part of a pass transistor, so called because it passes current between two nodes rather than driving or grounding a node. The gate in this case is apu_clk1, and we say that the value is "buffered on apu_clk1".
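
At the purely logical level, the inverter, NOR gate, and pass transistor described above can be modelled in a few lines of Python. This is a boolean sketch only; it ignores all analog behaviour, and the function names are hypothetical.

```python
def nmos_inverter(gate_high: bool) -> bool:
    """Output of an inverter: an open gate grounds the output, otherwise the pull-up wins."""
    return not gate_high


def nmos_nor(*gates_high: bool) -> bool:
    """Output of a NOR gate: any open gate grounds the shared node."""
    return not any(gates_high)


def pass_transistor(gate_high: bool, source_value: bool, held_value: bool) -> bool:
    """Value on the far side of a pass transistor.

    While the gate is open the source value passes through; while it is closed
    the far side keeps whatever value it held before.
    """
    return source_value if gate_high else held_value


if __name__ == "__main__":
    # A NOR result "buffered on" a clock: it only propagates while the clock is high.
    nor_out = nmos_nor(False, False, False)
    print(pass_transistor(True, nor_out, held_value=False))   # clock high
    print(pass_transistor(False, nor_out, held_value=False))  # clock low
```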

Storage elements

Cross-coupled inverters

Two cross-coupled inverters will make a latch (an element that stores a single bit). This arrangement is often used for latches that are set or cleared by specific logic rather than by having a value copied into them (from e.g. a data bus line).

Below is the VBlank flag from Visual 2C02. To the left the vbl_flag node is highlighted, and to the right its inverse is highlighted. (We would label the inverse /vbl_flag, where "/" denotes "inverse" or "active low".) As can be seen from the two gates in white circles, each node inverts into the other, forming two cross-coupled inverters.

Vis crossreg.png

(The different highlight colors are due to vbl_flag being set when the screenshot was taken.)

The two gates in blue circles set and clear the latch, respectively. To clear the latch, vbl_flag is driven low. To set the latch, /vbl_flag is driven low.
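
The set and clear behaviour can be illustrated with a small Python model of two inverters feeding each other, where either node can additionally be pulled low. The class and its method names are hypothetical, and the fixed iteration count is just a simple way to let the loop settle in this toy model.

```python
class CrossCoupledLatch:
    """Boolean model of a latch made from two cross-coupled inverters."""

    def __init__(self, value: bool = False) -> None:
        self.q = value       # e.g. vbl_flag
        self.nq = not value  # e.g. /vbl_flag

    def settle(self, pull_q_low: bool = False, pull_nq_low: bool = False) -> bool:
        """Let the loop settle while optionally driving one node low."""
        if pull_q_low and pull_nq_low:
            raise ValueError("driving both nodes low at once is not modelled")
        for _ in range(4):
            new_q = False if pull_q_low else not self.nq
            new_nq = False if pull_nq_low else not new_q
            if (new_q, new_nq) == (self.q, self.nq):
                break
            self.q, self.nq = new_q, new_nq
        return self.q

    def set(self) -> bool:
        return self.settle(pull_nq_low=True)  # driving /q low sets the latch

    def clear(self) -> bool:
        return self.settle(pull_q_low=True)   # driving q low clears the latch
```

Calling `settle()` with no arguments after `set()` or `clear()` shows that the stored value persists once the pull-down is released.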

Clocked latches

When a latch can be set directly from the value of some line, e.g. a data bus line, an arrangement involving a clock is often used. The motivation is to avoid having to form both data_line and /data_line and route them to the respective terminals of the latch, which would use more logic. (The clock is already routed all around the chip, so mixing it in usually isn't as much of a problem.)

As an example, here's the noi_lfsrmode node (the "Loop noise" flag from $400E):

Vis clockedreg.png

When apu_clk1 is high, noi_lfsrmode will flow into the second highlighted node, which then inverts into /noi_lfsrmode, forming a cross-coupled inverter latch. While apu_clk1 is low, the loop will be broken momentarily, and during this phase a new value can be copied into the latch by opening the w400e gate (which goes high on writes to $400E). The value let through by the pass transistor is the _db7 node, corresponding to bit 7 of the data bus. (There's a via between the diffusion and the _db7 line - easier to see if the node is highlighted.) If the loop were not broken during the write operation, the old value in the latch would interfere with setting a new value.
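
A minimal Python sketch of this arrangement is shown below. It is a behavioural model under stated assumptions, not a transcription of the circuit: the class name is hypothetical, and a write attempted while the clock is high is simply ignored here, which is a simplification of the interference described above.

```python
class ClockedLatch:
    """Behavioural model of a latch whose feedback loop is gated by a clock."""

    def __init__(self, value: bool = False) -> None:
        self.value = value

    def step(self, clk_high: bool, write_strobe: bool, data_bit: bool) -> bool:
        """Advance one clock phase and return the stored value.

        clk_high     -- state of the clock that closes the feedback loop (e.g. apu_clk1)
        write_strobe -- state of the write gate (e.g. w400e)
        data_bit     -- value on the data bus line (e.g. _db7)
        """
        if not clk_high and write_strobe:
            self.value = data_bit  # loop is broken, so the new value can be copied in
        return self.value          # otherwise the loop holds the old value


if __name__ == "__main__":
    loop_noise = ClockedLatch()
    written = 0x80  # hypothetical value written to $400E
    print(loop_noise.step(clk_high=False, write_strobe=True, data_bit=bool(written & 0x80)))
```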

Wire capacitance as storage

If a wire is "closed off" so that it is no longer connected to either power or ground, it will retain its value for a while through capacitance. This is used to store some short-lived data "on the wire" without requiring a latch (this is called [http://en.wikipedia.org/wiki/Dynamic_logic_%28digital_electronics%29 dynamic logic], since it has time-dependent behavior beyond just the input clock). As an example, here's the read buffer for the 2C02's VBlank flag, which lets its value be read even though reading $2002 immediately clears the VBlank flag:

Vis vblbuf.png

When the circled gate (/read_2002_output_vblank_flag) goes low, the gate closes, holding the value. When the circled gate is high, the value of vbl_flag (or rather /vbl_flag in this case) is connected to the wire.

Layers

(This information is not essential to reading the diagrams.)

The layers that make up the chip are as follows, in order from bottom to top: substrate, diffusion, oxide (with holes for buried contacts and vias), polysilicon, more oxide (with holes for vias), metal, and overglass.

The way diffusion is powered or grounded is through vias to areas of metal that are either grounded or powered (called the "GND plane" and the "VCC plane", respectively).

Terms

Below are various terms you might run into:

Buried contact
A connection between diffusion and polysilicon.
NMOS
The technology used for the transistors in the 2A03 and 2C02. In NMOS, transistors are n-doped (have an excess of electrons - the "n" is presumably from the electrons' negative charge). This type of transistor is good at sinking current to ground (this is what causes a 0 bit to usually "win" in bus conflicts), and worse at pulling up. PMOS is the opposite. The transistors used in NMOS and PMOS are sometimes called nMOSFET and pMOSFET, respectively.
Open drain
A type of output that works by sinking current from an external pull-up resistor instead of generating current on its own. An example is the PPU's INT pin. The pull-up resistor is denoted "RM1" in this wiring diagram.
Pull-up resistor
A resistor connected to power. "Pull-up" comes from pulling the wire to a high state.
Pull-up transistor
A transistor whose gate when open causes current to flow from a power source.
Via
A connection between polysilicon/diffusion and metal.

Local copies of the simulator

Being able to add node names to nodenames.js can be very helpful when figuring out a circuit. To do this, a local version of the simulator can be downloaded with e.g. $ wget --convert-links on a *nix system. Please watch the recursion level and avoid downloading data needlessly, as at least Visual 2C02 and Visual 2A03 are hosted on a limited uplink.