Introduction
This book is a collection of notes about the Nintendo GameCube, geared towards emulator development. Currently, it is not sufficient on its own; please see the resources page for more.
While information in this book is supposed to be correct, it might just not be! Documentation on the GameCube is scarce, and it’s not uncommon for it to be wrong.
These notes are mostly just me trying to share what I’ve learned and found while developing my own GameCube emulator. If something is wrong (or missing), sorry! Consider opening an issue or creating a pull request in that case. Help cubenotes become better!
General Conventions
- Least significant bit is 0 (unlike PowerPC manuals where the most significant bit is 0)
- Ranges of values follow Rust’s syntax, i.e. `a..b` is a range from a (inclusive) to b (exclusive) and `a..=b` is a range from a (inclusive) to b (inclusive). If either `a` or `b` is missing, the range starts/ends at the respective limit (i.e. the start or end of the valid range)
- Hexadecimal values are always prefixed with `0x` and separated with an underscore every 4 digits, e.g. `0xDEAD_BEEF`
- Likewise, binary values are always prefixed with `0b` and separated with an underscore every 4 digits, e.g. `0b1111_0000_1010_0011`
- Addresses are physical unless stated otherwise
- Every MMIO register is both readable and writable unless stated otherwise
- Every bit flag is 0 for `false` and 1 for `true` unless stated otherwise
- Descriptions prefixed with `[W]` are only valid for writes, while those prefixed with `[R]` are only valid for reads
These are not all of the conventions used in cubenotes; other conventions will be listed when appropriate.
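To make the bit conventions concrete, here is a small Rust sketch: `bits` extracts a field written as an `a..b` range in LSB-0 numbering, and `ppc_bit_to_lsb0` converts a bit number from the PowerPC manuals (MSB-0) into the numbering used here. Both helpers are hypothetical and assume 32-bit registers.

```rust
use std::ops::Range;

/// Extracts the field `range` (LSB-0, end exclusive) from a 32-bit value.
/// For example, `bits(0xDEAD_BEEF, 0..16)` yields the low half, 0xBEEF.
fn bits(value: u32, range: Range<u32>) -> u32 {
    assert!(range.start < range.end && range.end <= 32, "invalid bit range");
    let width = range.end - range.start;
    // Avoid shifting by 32, which would overflow.
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    (value >> range.start) & mask
}

/// Converts a bit number from the PowerPC manuals (where bit 0 is the most
/// significant bit of a 32-bit register) to the LSB-0 numbering used here.
fn ppc_bit_to_lsb0(ppc_bit: u32) -> u32 {
    assert!(ppc_bit < 32, "bit number out of range");
    31 - ppc_bit
}
```

For example, MSR.EE is bit 16 in the PowerPC manuals, so `ppc_bit_to_lsb0(16)` gives bit 15 in cubenotes terms.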
Overview
The Nintendo GameCube is a console released in 2001/2002 and is the successor to the Nintendo 64. It belongs to the sixth generation of consoles, together with others like Sony’s PlayStation 2 and Microsoft’s Xbox.
While the GameCube is frequently overshadowed by its big brother, the Nintendo Wii, it is a very charming console, powerful for its time.
Some of its best-selling games are:
- Super Smash Bros. Melee
- Mario Kart: Double Dash
- Super Mario Sunshine
- The Legend of Zelda: The Wind Waker
- Luigi’s Mansion
- Metroid Prime
- Animal Crossing
And these are just a few of the approximately 650 games released for the system!
Hardware
Memory: 43 MiB total
- RAM: 24 MiB (2 x 12 MiB) 1T-SRAM¹ running at 324 MHz (CPU / 1.5, i.e. Flipper * 2)
- VRAM: 3 MiB of 1T-SRAM¹ embedded within Flipper: 2 MiB for framebuffers and 1 MiB for textures
- ARAM: 16 MiB DRAM connected to Flipper, used as Auxiliary RAM, usually for audio
DVD Reader
- Reads miniDVD sized Nintendo optical discs
- Discs are 1.46 GiB, read at constant angular velocity (in practice, this means data in the outer part of the disc is read faster than data in the inner part)
Motherboard
Here’s an overview of the GameCube’s motherboard, with some of the important components labeled:
And here’s a diagram of the architecture:
A note on buses:
- Northbridge: Connects the CPU to the Flipper. It is 64-bit and runs on the Flipper clock.
- Southbridge: Connects both 12 MiB 1T-SRAM chips to the Flipper. It is 64-bit and runs at double the Flipper clock.
- Eastbridge: Connects ARAM to the Flipper. It is 8-bit and runs at half the Flipper clock.
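Each bus’s theoretical peak bandwidth follows from its width and clock: bytes per transfer times transfers per second. The sketch below does that arithmetic, assuming one transfer per clock cycle and using the 486 MHz CPU clock and the 162 MHz (CPU / 3) Flipper clock; sustained real-world throughput will be lower. The names are illustrative.

```rust
/// Gekko clock in Hz.
const CPU_HZ: u64 = 486_000_000;
/// Flipper clock in Hz (CPU / 3).
const FLIPPER_HZ: u64 = CPU_HZ / 3;

/// Theoretical peak bandwidth in bytes per second, assuming one transfer of
/// `width_bits` per clock cycle.
const fn peak_bandwidth(width_bits: u64, clock_hz: u64) -> u64 {
    width_bits / 8 * clock_hz
}

// Northbridge: 64-bit, Flipper clock.
const NORTHBRIDGE_BPS: u64 = peak_bandwidth(64, FLIPPER_HZ);
// Southbridge: 64-bit, double the Flipper clock.
const SOUTHBRIDGE_BPS: u64 = peak_bandwidth(64, FLIPPER_HZ * 2);
// Eastbridge: 8-bit, half the Flipper clock.
const EASTBRIDGE_BPS: u64 = peak_bandwidth(8, FLIPPER_HZ / 2);
```

This works out to about 1.3 GB/s for the Northbridge, 2.6 GB/s for the Southbridge and 81 MB/s for the Eastbridge.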
¹ A kind of pseudo-static RAM (PSRAM). Internally it’s just DRAM, but it’s made to behave like SRAM from an outside point of view.
Gekko (CPU)
The GameCube CPU is the IBM PowerPC Gekko:
- Runs at 486 MHz
- 32-bit
- 64-bit Floating Point Unit
- 32 KiB L1 Instruction Cache
- 32 KiB L1 Data Cache¹
- 256 KiB L2 Unified Cache
- Single core
- Pipelined (instructions are divided into multiple stages of execution, allowing multiple instructions to be at different stages at the same time)
- Superscalar (contains multiple execution units which can work on independent instructions at the same time)
- Contains a SIMD extension geared towards 3D graphics called “Paired Singles”
¹ The data cache can be split into two 16 KiB sections, and one of them can be mapped into memory to be used as a “scratchpad” (super fast RAM).
Memory Mapping in the Gekko
The Gekko has two main ways of mapping logical addresses to physical ones. The first is a mechanism called BATs (Block Address Translation), which most games use exclusively. The second one is a classic segmentation mechanism with page tables.
Most games do not use page tables and, instead, just use the default BAT configuration provided by the Dolphin OS.
PowerPC manuals refer to an untranslated address as an effective address, which cubenotes instead calls a logical address.
BATs and page tables/segmentation are not mutually exclusive and can be used together. Translation using BATs has precedence over translation using page tables, and if both fail, an exception occurs (DSI for data accesses, ISI for instruction fetches).
Block Address Translation (BAT)
The BAT mechanism allows the Gekko to define up to 8 “blocks” which map a range of logical addresses to a range of physical addresses. These 8 blocks are split into two: one half dedicated to instruction address translation and the other half dedicated to data address translation.
Each block is controlled by a pair of registers, e.g. IBAT2U and IBAT2L control the [I]nstruction BAT block 2. Blocks have three main properties:
Block length
The length of the block, ranging from a minimum of 128 KiB to a maximum of 256 MiB in powers of two (i.e. 128 KiB, 256 KiB, …, 128 MiB, 256 MiB).
Logical address start
This is the start of the block in the logical address space. This value must be a multiple of the block length.
Physical address start
This is the start of the block in the physical address space. This value must be a multiple of the block length.
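The three properties above are enough to model a block’s address translation. This is a simplified sketch rather than the hardware’s register encoding: `BatBlock` and its methods are hypothetical, validity and protection bits are ignored, and when several blocks match, the first one wins (an assumption).

```rust
const MIN_LEN: u32 = 128 * 1024; // 128 KiB
const MAX_LEN: u32 = 256 * 1024 * 1024; // 256 MiB

/// A BAT block described by its length and logical/physical start addresses.
#[derive(Debug, Clone, Copy)]
struct BatBlock {
    logical_start: u32,
    physical_start: u32,
    length: u32,
}

impl BatBlock {
    /// Returns `None` if the block violates the constraints described above.
    fn new(logical_start: u32, physical_start: u32, length: u32) -> Option<Self> {
        let valid_length = length.is_power_of_two() && (MIN_LEN..=MAX_LEN).contains(&length);
        if !valid_length || logical_start % length != 0 || physical_start % length != 0 {
            return None;
        }
        Some(Self { logical_start, physical_start, length })
    }

    /// Translates `addr` if it falls inside this block.
    fn translate(&self, addr: u32) -> Option<u32> {
        // wrapping_sub avoids overflow near the top of the address space:
        // addresses below the block wrap to huge offsets and fail the check.
        let offset = addr.wrapping_sub(self.logical_start);
        (offset < self.length).then(|| self.physical_start + offset)
    }
}

/// Tries each block in order and returns the first successful translation.
fn bat_translate(blocks: &[BatBlock], addr: u32) -> Option<u32> {
    blocks.iter().find_map(|block| block.translate(addr))
}
```

As an example, Dolphin OS typically maps the logical ranges starting at 0x8000_0000 (cached) and 0xC000_0000 (uncached) onto physical address 0x0000_0000 with 256 MiB data blocks, so both 0x8000_1234 and 0xC000_1234 translate to 0x0000_1234.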
Segmentation and Page Tables
The segmentation mechanism works by splitting the logical address space into 16 contiguous segments of 256 MiB. These segments are then subdivided into 4 KiB pages. Each segment is associated with a segment descriptor register which controls how a logical address in it is mapped to a 52-bit virtual address.
The most significant 40 bits of a virtual address are called the virtual page number. This value is used to search for a corresponding entry in the page table, which maps it to a 20-bit physical page number. Replacing the virtual page number with the physical page number yields the physical address, which is the result of the translation.
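The field arithmetic can be sketched as follows. It assumes the segment register supplies a 24-bit virtual segment ID (VSID), which is what remains of the 52-bit virtual address once the 16-bit page index (256 MiB / 4 KiB pages) and the 12-bit byte offset are accounted for. The page table search itself is omitted, and all names are illustrative.

```rust
/// Fields of a 32-bit logical address under segmentation.
struct LogicalParts {
    segment: u32,    // bits 28..32: selects one of the 16 segment registers
    page_index: u32, // bits 12..28: page within the 256 MiB segment
    offset: u32,     // bits 0..12: byte within the 4 KiB page
}

fn split_logical(addr: u32) -> LogicalParts {
    LogicalParts {
        segment: addr >> 28,
        page_index: (addr >> 12) & 0xFFFF,
        offset: addr & 0xFFF,
    }
}

/// 40-bit virtual page number: assumed 24-bit VSID followed by the page index.
fn virtual_page_number(vsid: u32, page_index: u32) -> u64 {
    (u64::from(vsid & 0xFF_FFFF) << 16) | u64::from(page_index & 0xFFFF)
}

/// Replaces the virtual page number with a 20-bit physical page number.
fn physical_address(ppn: u32, offset: u32) -> u32 {
    ((ppn & 0xF_FFFF) << 12) | (offset & 0xFFF)
}
```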
Page Table
The page table is a hash table of entry groups, each containing 8 entries.
The translation pipeline
The BAT mechanism always takes precedence over the segmentation mechanism.
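Put together, translation is a simple fallback chain. In this sketch, the BAT and page table lookups are hypothetical closures standing in for the mechanisms described above; only the ordering is the point.

```rust
/// Result of translating a logical address.
#[derive(Debug, PartialEq)]
enum Translation {
    Physical(u32),
    /// Neither mechanism could translate the address: the CPU raises an
    /// exception (DSI for data accesses, ISI for instruction fetches).
    Exception,
}

/// BATs are tried first; segmentation/page tables only if no BAT matches.
fn translate(
    addr: u32,
    bat_lookup: impl Fn(u32) -> Option<u32>,
    page_table_lookup: impl Fn(u32) -> Option<u32>,
) -> Translation {
    match bat_lookup(addr).or_else(|| page_table_lookup(addr)) {
        Some(physical) => Translation::Physical(physical),
        None => Translation::Exception,
    }
}
```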
Exceptions
Exceptions in the PowerPC architecture are defined as a mechanism which allows the CPU to change state to deal with unusual conditions that might arise during execution or come from external sources. There are multiple exception kinds.
Classification of exceptions is a somewhat confusing topic in the PowerPC manuals, so here’s the convention used in cubenotes:
| Classification | Cause |
|---|---|
| Internal Exception | Instruction Execution |
| External Exception (Interrupt) | Anything else |
Exception Kinds
| Vector | Exception | Classification |
|---|---|---|
| 0x0100 | Reset | Interrupt¹ |
| 0x0200 | Machine Check | Interrupt¹ |
| 0x0300 | DSI | Internal Exception |
| 0x0400 | ISI | Internal Exception |
| 0x0500 | External Interrupt | Interrupt |
| 0x0600 | Alignment | Internal Exception |
| 0x0700 | Program | Internal Exception |
| 0x0800 | Floating Point Unavailable | Internal Exception |
| 0x0900 | Decrementer | Interrupt |
| 0x0C00 | System Call | Internal Exception |
| 0x0D00 | Trace | Internal Exception |
| 0x0F00 | Performance Monitor | Internal Exception |
| 0x1300 | Breakpoint | Internal Exception |
For more information regarding exception kinds, consult the PowerPC manuals.
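Handler addresses are the vector offsets from the table combined with a prefix chosen by MSR.IP: 0x0000_0000 when it is clear and 0xFFF0_0000 when it is set, as described in the PowerPC manuals. A minimal Rust sketch, with hypothetical type and function names:

```rust
/// Exception kinds from the table above.
#[derive(Debug, Clone, Copy)]
enum Exception {
    Reset,
    MachineCheck,
    Dsi,
    Isi,
    ExternalInterrupt,
    Alignment,
    Program,
    FloatingPointUnavailable,
    Decrementer,
    SystemCall,
    Trace,
    PerformanceMonitor,
    Breakpoint,
}

impl Exception {
    /// Vector offset as listed in the table.
    fn vector_offset(self) -> u32 {
        match self {
            Self::Reset => 0x0100,
            Self::MachineCheck => 0x0200,
            Self::Dsi => 0x0300,
            Self::Isi => 0x0400,
            Self::ExternalInterrupt => 0x0500,
            Self::Alignment => 0x0600,
            Self::Program => 0x0700,
            Self::FloatingPointUnavailable => 0x0800,
            Self::Decrementer => 0x0900,
            Self::SystemCall => 0x0C00,
            Self::Trace => 0x0D00,
            Self::PerformanceMonitor => 0x0F00,
            Self::Breakpoint => 0x1300,
        }
    }
}

/// Address the CPU jumps to; MSR.IP selects the exception prefix.
fn vector_address(kind: Exception, msr_ip: bool) -> u32 {
    let prefix = if msr_ip { 0xFFF0_0000 } else { 0x0000_0000 };
    prefix | kind.vector_offset()
}
```

Since MSR.IP is set after a hardware reset, the reset vector resolves to 0xFFF0_0100, which falls in the IPL region of the memory map.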
Masking Exceptions
Some exception kinds may be masked (i.e. disabled) depending on CPU configuration:
- Floating point exceptions may be disabled depending on MSR.FE0 and MSR.FE1
- Maskable interrupts must be enabled through MSR.EE
Processing an Exception
When an exception occurs, the CPU saves its current state and starts executing at an address specific to the exception kind, called the exception vector.
Saving State
Whenever the CPU identifies an exception, it handles it by first saving the current state of the CPU to the SRR0 and SRR1 registers.
SRR0: Address to resume after exception handling
The SRR0 register is updated to contain the address where execution should resume once the exception handler is finished.
Usually, this is the address of the instruction that caused the exception or, for exceptions caused by external sources, of the instruction that would have executed when the exception happened. Some exception kinds, however, store the address of the instruction right after that one instead (e.g. System Call).
SRR1: Machine State
The SRR1 register is updated to contain parts of the machine state register (MSR) and, sometimes, extra information that’s exception specific.
For most exceptions, bits 0..16 and 22..27 contain the corresponding bits of MSR, while bits 16..22 and 27..31 contain exception-specific information.
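Building SRR1 then amounts to masking. The sketch below copies bits 0..16 and 22..27 from MSR and ORs in the exception-specific bits, which the caller is assumed to pass already shifted into place; the names are hypothetical.

```rust
/// SRR1 bits copied from MSR on most exceptions: 0..16 and 22..27 (LSB-0).
const SRR1_FROM_MSR: u32 = 0x0000_FFFF | (0b1_1111 << 22);

/// Builds SRR1 from the current MSR plus exception-specific bits. The
/// exception bits are masked so they cannot overwrite the copied MSR bits.
fn build_srr1(msr: u32, exception_bits: u32) -> u32 {
    (msr & SRR1_FROM_MSR) | (exception_bits & !SRR1_FROM_MSR)
}
```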
Update the Machine State Register
MSR is updated to represent the new context of execution. This is equivalent to clearing all of its bits except for MSR.ILE, MSR.ME and MSR.IP, which keep their values, and MSR.LE, which is assigned the value of MSR.ILE.
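The MSR update can be written as a mask. The bit positions below are converted to LSB-0 from the MSB-0 numbering of the PowerPC manuals (ILE = 15, ME = 19, IP = 25, LE = 31); treat them as assumptions to double-check against the manuals. The function name is hypothetical.

```rust
// MSR bit positions in LSB-0 numbering.
const MSR_ILE: u32 = 1 << 16;
const MSR_ME: u32 = 1 << 12;
const MSR_IP: u32 = 1 << 6;
const MSR_LE: u32 = 1 << 0;

/// MSR value on exception entry: everything is cleared except ILE, ME and
/// IP, and LE takes the value of ILE.
fn msr_on_exception(msr: u32) -> u32 {
    let kept = msr & (MSR_ILE | MSR_ME | MSR_IP);
    let le = if msr & MSR_ILE != 0 { MSR_LE } else { 0 };
    kept | le
}
```

Note that this clears MSR.EE, so maskable interrupts stay disabled while the handler runs, until it re-enables them or returns.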
Resume execution
Execution then continues at the exception vector. Eventually, the exception handler executes an rfi instruction, which returns to the address in SRR0 and restores MSR to its original state by copying bits of SRR1 into it.
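A minimal sketch of rfi, under the assumption that it restores exactly the MSR bits that were saved into SRR1 (0..16 and 22..27) and leaves the rest of MSR untouched. The `Cpu` type is hypothetical. Instruction addresses are word aligned, so the low two bits of SRR0 are dropped.

```rust
/// MSR bits saved into SRR1 on exception entry (LSB-0 0..16 and 22..27).
const SRR1_MSR_BITS: u32 = 0x07C0_FFFF;

/// Minimal CPU state touched by rfi (hypothetical).
struct Cpu {
    pc: u32,
    msr: u32,
    srr0: u32,
    srr1: u32,
}

impl Cpu {
    /// Returns from an exception handler.
    fn rfi(&mut self) {
        // Restore the saved MSR bits, keep the others as they are.
        self.msr = (self.msr & !SRR1_MSR_BITS) | (self.srr1 & SRR1_MSR_BITS);
        // Jump back to the saved address, forcing word alignment.
        self.pc = self.srr0 & !0b11;
    }
}
```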
Write Gather Pipe
The write gather pipe is a CPU mechanism for transferring data to external memory in 32-byte bursts.
It has a 128-byte ring buffer and is controlled by the HID2 and WPAR registers.
Registers
WPAR (SPR 921)
| Bits | Name | Description |
|---|---|---|
| 0 | BNE | Whether the ring buffer has data (Buffer Not Empty) (Read Only) |
| 1..5 | Reserved | |
| 5..32 | Gather Address | High order bits of the address to gather writes from |
Operation
It operates by redirecting any writes to the gather address to the internal ring buffer. Once the buffer has 32 bytes of data or more, the write gather pipe actually transfers the data to memory, in chunks of 32 bytes. The destination is the same address: the write gather pipe acts as a proxy between the CPU and the external memory.
The write gather pipe packs writes: it does not insert any sort of padding in order to “align” values in the internal buffer.
In the GameCube, it is always pointed at the address of the PI FIFO (0x0C00_8000) and is used to build display lists.
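The buffering behaviour described above can be modelled in a few lines. This sketch is hypothetical and simplified: it uses a `VecDeque` instead of a fixed 128-byte ring, returns the 32-byte bursts instead of performing them, and ignores HID2 entirely.

```rust
use std::collections::VecDeque;

/// Bytes per burst sent to memory.
const BURST: usize = 32;
/// Size of the real ring buffer (only used as an allocation hint here).
const CAPACITY: usize = 128;

/// Hypothetical, simplified model of the write gather pipe.
struct WriteGatherPipe {
    /// Destination of the bursts: WPAR bits 5..32 (the low 5 bits are not part of it).
    gather_address: u32,
    buffer: VecDeque<u8>,
}

impl WriteGatherPipe {
    fn new(wpar: u32) -> Self {
        Self {
            gather_address: wpar & !0x1F,
            buffer: VecDeque::with_capacity(CAPACITY),
        }
    }

    /// Value of WPAR bit 0 (BNE): whether the buffer holds data.
    fn buffer_not_empty(&self) -> bool {
        !self.buffer.is_empty()
    }

    /// Appends the bytes of a big-endian write with no padding, then drains
    /// every complete 32-byte burst that would be sent to `gather_address`.
    fn write(&mut self, bytes: &[u8]) -> Vec<[u8; BURST]> {
        self.buffer.extend(bytes.iter().copied());
        let mut bursts = Vec::new();
        while self.buffer.len() >= BURST {
            let mut burst = [0u8; BURST];
            for (slot, byte) in burst.iter_mut().zip(self.buffer.drain(..BURST)) {
                *slot = byte;
            }
            bursts.push(burst);
        }
        bursts
    }
}
```

Writing eight 32-bit values, for instance, produces exactly one 32-byte burst and leaves the buffer empty (BNE clear).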
Flipper
The ATI Flipper is a complex IC with multiple systems embedded into it, such as the DSP, I/O controllers and more. It hosts the graphics processor, also known as the GX, which is the GameCube’s GPU.
- Runs at 162 MHz (CPU / 3)
Digital Signal Processor
The DSP in the GameCube is a custom Macronix IC embedded within Flipper. It has its own instruction set, its own internal memory and its own quirks.
- 16-bit processor
- Runs at 81 MHz (CPU / 6)
- Memory: Addressed by words instead of bytes. Harvard architecture.
- Instruction RAM: 8 KiB (4096 words)
- Instruction ROM: 8 KiB (4096 words)
- Data RAM: 8 KiB (4096 words)
- Data ROM: 4 KiB (2048 words)
Memory Map
The GameCube’s physical memory map is as follows:
| Address Range | Size | Region |
|---|---|---|
| 0x0000_0000..0x0180_0000 | 24 MiB | RAM |
| 0x0C00_0000..0x0C00_1000 | 4 KiB | CP Registers |
| 0x0C00_1000..0x0C00_2000 | 4 KiB | PE Registers |
| 0x0C00_2000..0x0C00_3000 | 4 KiB | VI Registers |
| 0x0C00_3000..0x0C00_4000 | 4 KiB | PI Registers |
| 0x0C00_4000..0x0C00_5000 | 4 KiB | MI Registers |
| 0x0C00_5000..0x0C00_6000 | 4 KiB | DSPI Registers |
| 0x0C00_6000..0x0C00_6400 | 1 KiB | DI Registers |
| 0x0C00_6400..0x0C00_6800 | 1 KiB | SI Registers |
| 0x0C00_6800..0x0C00_6C00 | 1 KiB | EXI Registers |
| 0x0C00_6C00..0x0C00_6C10 | 16 B | AI Registers |
| 0x0C00_8000..0x0C00_8020 | 32 B | GX FIFO |
| 0xE000_0000..0xE000_4000 | 16 KiB | Scratchpad |
| 0xFFF0_0000.. | 1 MiB | IPL (First MiB) |
Real addressing mode (where address translation is disabled and logical addresses are used as physical addresses) is, however, very uncommon. Most games use memory address translation to map logical addresses to physical ones.
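One way to dispatch physical accesses is a range lookup over the table above. This sketch is hypothetical: it places the 32-byte GX FIFO at 0x0C00_8000 (the PI FIFO address used elsewhere in this book) and treats the IPL region as extending to the top of the address space.

```rust
/// Physical memory regions from the table above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Region { Ram, Cp, Pe, Vi, Pi, Mi, Dspi, Di, Si, Exi, Ai, GxFifo, Scratchpad, Ipl }

/// (start, end exclusive, region), following the `a..b` convention.
const MAP: &[(u32, u32, Region)] = &[
    (0x0000_0000, 0x0180_0000, Region::Ram),
    (0x0C00_0000, 0x0C00_1000, Region::Cp),
    (0x0C00_1000, 0x0C00_2000, Region::Pe),
    (0x0C00_2000, 0x0C00_3000, Region::Vi),
    (0x0C00_3000, 0x0C00_4000, Region::Pi),
    (0x0C00_4000, 0x0C00_5000, Region::Mi),
    (0x0C00_5000, 0x0C00_6000, Region::Dspi),
    (0x0C00_6000, 0x0C00_6400, Region::Di),
    (0x0C00_6400, 0x0C00_6800, Region::Si),
    (0x0C00_6800, 0x0C00_6C00, Region::Exi),
    (0x0C00_6C00, 0x0C00_6C10, Region::Ai),
    (0x0C00_8000, 0x0C00_8020, Region::GxFifo),
    (0xE000_0000, 0xE000_4000, Region::Scratchpad),
];

/// Returns the region containing `addr` and the offset into it.
fn lookup(addr: u32) -> Option<(Region, u32)> {
    // The IPL range is open-ended (0xFFF0_0000..), so handle it separately.
    if addr >= 0xFFF0_0000 {
        return Some((Region::Ipl, addr - 0xFFF0_0000));
    }
    MAP.iter()
        .find(|(start, end, _)| (*start..*end).contains(&addr))
        .map(|&(start, _, region)| (region, addr - start))
}
```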
Reserved RAM regions
In the PowerPC architecture, some predefined regions of RAM are reserved for specific usages:
| Address Range | Size | Purpose |
|---|---|---|
| 0x0000_0000..0x0000_0100 | 256 B | Reserved for the OS |
| 0x0000_0100..0x0000_1000 | 3.75 KiB | Exception Vectors |
| 0x0000_1000..0x0000_3000 | 8 KiB | Implementation Specific |
Dolphin OS uses the implementation specific region as an extension to the exception vectors region.
Middle Endian
Some memory-mapped registers are middle endian. Yep, you read that right. They are 4 bytes long, but divided into two consecutive 2-byte parts, low and high (in that order), and these parts themselves are big endian.
In practice, this means the byte significance order is [1, 0, 3, 2] instead of the expected big endian [3, 2, 1, 0].
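Converting between the in-memory byte order and the register value is a matter of swapping the two big-endian halves. A small sketch with hypothetical function names:

```rust
/// Reads a middle-endian 32-bit value: the low half comes first, and each
/// half is big endian, so byte significance is [1, 0, 3, 2].
fn read_middle_endian(bytes: [u8; 4]) -> u32 {
    let low = u16::from_be_bytes([bytes[0], bytes[1]]);
    let high = u16::from_be_bytes([bytes[2], bytes[3]]);
    (u32::from(high) << 16) | u32::from(low)
}

/// Inverse of `read_middle_endian`.
fn write_middle_endian(value: u32) -> [u8; 4] {
    let [high0, high1] = ((value >> 16) as u16).to_be_bytes();
    let [low0, low1] = (value as u16).to_be_bytes();
    [low0, low1, high0, high1]
}
```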
Graphics Processor (GX)
The graphics processor (GX or GP) is the GPU within Flipper and is responsible for rendering content into the embedded framebuffer. It operates by reading command lists from memory.
Glossary
Flipper has a lot of acronyms for its parts. The following is a list of acronyms for parts that belong to the GX.
- TEV: Texture Environment
- RASN: Rasterizer N (0 or 1)
- PE: Pixel Engine
- SU: Setup Unit
- CP: Command Processor
- XF: Transform Unit
Command Processor (CP)
The command processor (CP) is responsible for fetching and processing GX commands generated by the CPU from an in-memory ring buffer (FIFO).
Registers
All FIFO registers are middle endian.
CP Status (0x0C00_0000, 2 bytes)
Contains status regarding the command processor.
| Bits | Name | Description |
|---|---|---|
| 0 | CP FIFO Overflow | Watermark logic |
| 1 | CP FIFO Underflow | Watermark logic |
| 2 | CP read idle | |
| 3 | CP command idle | |
| 4 | Breakpoint Interrupt | |
| 5..16 | Unused | |
CP Control (0x0C00_0002, 2 bytes)
Controls the command processor.
| Bits | Name | Description |
|---|---|---|
| 0 | CP FIFO Read Enable | Whether the CP will read commands from the FIFO |
| 1 | CP FIFO Breakpoint Enable | |
| 2 | CP FIFO Overflow IRQ Enable | |
| 3 | CP FIFO Underflow IRQ Enable | |
| 4 | CP FIFO Linked Mode | Controls the FIFO mode. Is set on reset. |
| 5 | CP FIFO Breakpoint IRQ Enable | |
| 6..16 | Unused | |
CP Clear (0x0C00_0004, 2 bytes, write only)
| Bits | Name | Description |
|---|---|---|
| 0 | Clear CP FIFO Overflow | Write 1 to clear status bit 0 |
| 1 | Clear CP FIFO Underflow | Write 1 to clear status bit 1 |
| 2..16 | Unused | |
CP FIFO Start (0x0C00_0020, 4 bytes)
Start address of the ring buffer.
CP FIFO End (0x0C00_0024, 4 bytes)
End address of the ring buffer (exclusive).
CP FIFO High Watermark (0x0C00_0028, 4 bytes)
Ring buffer’s high watermark, i.e. the FIFO count above which a CP FIFO overflow interrupt is triggered.
CP FIFO Low Watermark (0x0C00_002C, 4 bytes)
Ring buffer’s low watermark, i.e. the FIFO count below which a CP FIFO underflow interrupt is triggered.
CP FIFO Count (0x0C00_0030, 4 bytes)
Distance between the FIFO read and write pointers.
CP FIFO Write Pointer (0x0C00_0034, 4 bytes)
Current address of the FIFO write pointer (i.e. where new data is going to be written).
CP FIFO Read Pointer (0x0C00_0038, 4 bytes)
Current address of the FIFO read pointer (i.e. where the next data is going to be read from).
Command Processor FIFO
The command processor consumes commands from RAM using a ring buffer mechanism controlled by a bunch of registers. The FIFO has two modes of operation: linked and multi-buffer.
Linked Mode
In this mode, the CP FIFO is linked to the PI FIFO. Whenever a value is written to the PI FIFO, the value of the CP FIFO write pointer is increased by 4.
This way, whenever a command is written to the PI FIFO, the CP can process it immediately. Of course, this only really makes sense if the CP FIFO write pointer is the same as the PI FIFO write pointer.
This mode also contains some special logic called “watermark”:
- If the CP count becomes smaller than the low watermark, then a CP FIFO underflow interrupt is generated.
- If the CP count is greater than the high watermark, then a CP FIFO overflow interrupt is generated.
Whenever one of these interrupts is active, the CP stops processing new commands.
The watermark logic essentially allows the CP to signal to the system whether it’s close to filling up or close to running out of commands.
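The linked mode and watermark rules can be sketched as a small state machine. `CpFifo` is hypothetical; wrapping the write pointer from FIFO End back to FIFO Start is assumed from the ring buffer description, status bits are modelled as latched until cleared through CP Clear, and whether an interrupt is actually raised also depends on the IRQ enable bits in CP Control (bits 2 and 3), which are not modelled here.

```rust
// CP Status bits (see the CP Status register).
const STATUS_OVERFLOW: u16 = 1 << 0;
const STATUS_UNDERFLOW: u16 = 1 << 1;

/// Hypothetical, simplified CP FIFO state for linked mode.
struct CpFifo {
    start: u32,
    end: u32,
    write_ptr: u32,
    count: u32,
    high_watermark: u32,
    low_watermark: u32,
    status: u16,
}

impl CpFifo {
    /// Called for every 4-byte value written to the PI FIFO in linked mode.
    fn on_linked_write(&mut self) {
        self.write_ptr += 4;
        if self.write_ptr >= self.end {
            // Assumed ring-buffer wrap from FIFO End back to FIFO Start.
            self.write_ptr = self.start;
        }
        self.count += 4;
        self.update_watermarks();
    }

    /// Called when the CP consumes `bytes` of commands.
    fn on_cp_read(&mut self, bytes: u32) {
        self.count = self.count.saturating_sub(bytes);
        self.update_watermarks();
    }

    /// Overflow above the high watermark, underflow below the low one.
    fn update_watermarks(&mut self) {
        if self.count > self.high_watermark {
            self.status |= STATUS_OVERFLOW;
        }
        if self.count < self.low_watermark {
            self.status |= STATUS_UNDERFLOW;
        }
    }

    /// CP Clear: writing 1 to bit 0 or 1 clears the matching status bit.
    fn clear(&mut self, value: u16) {
        self.status &= !(value & (STATUS_OVERFLOW | STATUS_UNDERFLOW));
    }

    /// The CP stops processing new commands while either interrupt is active.
    fn can_process(&self) -> bool {
        (self.status & (STATUS_OVERFLOW | STATUS_UNDERFLOW)) == 0
    }
}
```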
Multi-buffer Mode
TODO
Commands
GX commands are identified by a single byte opcode as described in the following table:
| Opcode | Name |
|---|---|
| 0b0000_0000 | NOP |
| 0b0000_1XXX | Set CP register |
| 0b0001_0XXX | Set XF registers |
| 0b0010_0XXX | Set XF registers indexed (A) |
| 0b0010_1XXX | Set XF registers indexed (B) |
| 0b0011_0XXX | Set XF registers indexed (C) |
| 0b0011_1XXX | Set XF registers indexed (D) |
| 0b0100_0XXX | Call |
| 0b0100_1XXX | Invalidate vertex cache |
| 0b0110_0001 | Set BP register |
| 0b1000_0VVV | Draw quads |
| 0b1001_0VVV | Draw triangles |
| 0b1001_1VVV | Draw triangle strip |
| 0b1010_0VVV | Draw triangle fan |
| 0b1010_1VVV | Draw lines |
| 0b1011_0VVV | Draw line strip |
| 0b1011_1VVV | Draw points |
Where VVV is a vertex attribute table index.
These commands are going to be explained in more detail in the following sections. Commands which require extra data will have a table at the end describing it, with rows in the order data is received.
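Because the low three bits are either a vertex attribute table index (VVV) or shown as X, an opcode can be decoded by matching on its top five bits, with NOP and Set BP register matched exactly. This Rust sketch treats the X bits as don't-care, which is an assumption, and uses hypothetical type names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Primitive { Quads, Triangles, TriangleStrip, TriangleFan, Lines, LineStrip, Points }

#[derive(Debug, PartialEq)]
enum Command {
    Nop,
    SetCpRegister,
    SetXfRegisters,
    SetXfIndexed(char), // 'A'..='D'
    Call,
    InvalidateVertexCache,
    SetBpRegister,
    Draw { primitive: Primitive, vat: u8 },
}

fn decode(opcode: u8) -> Option<Command> {
    use Command::*;
    // For drawing commands, the low 3 bits (VVV) select the VAT.
    let vat = opcode & 0b111;
    let draw = |primitive| Some(Draw { primitive, vat });
    match opcode {
        0b0000_0000 => Some(Nop),
        0b0110_0001 => Some(SetBpRegister),
        _ => match opcode >> 3 {
            0b0_0001 => Some(SetCpRegister),
            0b0_0010 => Some(SetXfRegisters),
            0b0_0100 => Some(SetXfIndexed('A')),
            0b0_0101 => Some(SetXfIndexed('B')),
            0b0_0110 => Some(SetXfIndexed('C')),
            0b0_0111 => Some(SetXfIndexed('D')),
            0b0_1000 => Some(Call),
            0b0_1001 => Some(InvalidateVertexCache),
            0b1_0000 => draw(Primitive::Quads),
            0b1_0010 => draw(Primitive::Triangles),
            0b1_0011 => draw(Primitive::TriangleStrip),
            0b1_0100 => draw(Primitive::TriangleFan),
            0b1_0101 => draw(Primitive::Lines),
            0b1_0110 => draw(Primitive::LineStrip),
            0b1_0111 => draw(Primitive::Points),
            _ => None,
        },
    }
}
```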
NOP
Does nothing.
Set CP register
Sets a CP register to a given value.
| Size | Name |
|---|---|
| 1 byte | Register |
| 4 bytes | Value |
Set XF registers
Sets XF registers to given values.
| Size | Name |
|---|---|
| 2 bytes | Length - 1 |
| 2 bytes | First address |
| (4 * length) bytes | Values |
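A sketch of parsing the data that follows the Set XF registers opcode. It assumes multi-byte fields are big endian (like the rest of the GameCube’s data) and that the caller has already consumed the opcode byte; the names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct SetXfRegisters {
    first_address: u16,
    values: Vec<u32>,
}

/// Returns `None` if `data` is too short for the advertised length.
fn parse_set_xf(data: &[u8]) -> Option<SetXfRegisters> {
    let header = data.get(..4)?;
    // The first field stores length - 1.
    let length = usize::from(u16::from_be_bytes([header[0], header[1]])) + 1;
    let first_address = u16::from_be_bytes([header[2], header[3]]);
    let payload = data.get(4..4 + 4 * length)?;
    let values = payload
        .chunks_exact(4)
        .map(|chunk| u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect();
    Some(SetXfRegisters { first_address, values })
}
```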
Set XF indexed
Sets XF registers to given values.
| Size | Name |
|---|---|
| 2 bytes | Index |
| 2 bytes | Length and Address |
The last 2 bytes have the following format:
| Size | Name |
|---|---|
| 4 bits | Length - 1 |
| 12 bits | First Address |
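Splitting the indexed variant’s fields takes a pair of shifts and masks. A hypothetical sketch, again assuming big-endian fields and that the opcode byte has already been consumed:

```rust
#[derive(Debug, PartialEq)]
struct SetXfIndexed {
    index: u16,
    length: u16,
    first_address: u16,
}

fn parse_set_xf_indexed(data: &[u8]) -> Option<SetXfIndexed> {
    let bytes = data.get(..4)?;
    let index = u16::from_be_bytes([bytes[0], bytes[1]]);
    let field = u16::from_be_bytes([bytes[2], bytes[3]]);
    Some(SetXfIndexed {
        index,
        length: (field >> 12) + 1,      // top 4 bits store length - 1
        first_address: field & 0x0FFF, // low 12 bits
    })
}
```

For example, a Length and Address field of 0xB024 encodes 12 words (enough for one 4x3 matrix) starting at XF address 0x024.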
Call
Calls a command list.
| Size | Name |
|---|---|
| 4 bytes | Address |
| 4 bytes | Count (in words?) |
Invalidate vertex cache
Invalidates the vertex cache.
Set BP register
Sets a BP register to a given value.
| Size | Name |
|---|---|
| 1 byte | Register |
| 3 bytes | Value |
All drawing commands
Draws a series of primitives.
| Size | Name |
|---|---|
| 2 bytes | Vertex count |
| Variable | Vertex attributes |
Vertex Attribute Tables
Pixel Engine (PE)
The pixel engine (PE) is responsible for the final stage of the rendering process, performing effects like anti-aliasing and blending operations on the final image.
Transform Unit (XF)
The transform unit (XF) is responsible for performing transformations on positions, normals and even texture coordinates.
Internal memory
Addressable unit is a word (4 bytes), address space is 16-bit.
| Address Range | Size | Region |
|---|---|---|
| 0x0000..0x0100 | 1 KiB | Position matrix memory |
| 0x0400..0x0460 | 384 B | Normal matrix memory |
| 0x0500..0x0600 | 1 KiB | Dual texture transform matrix memory |
| 0x0600..0x0680 | 512 B | Light memory |
| 0x1000..0x1058 | 352 B | Internal registers |
Normal memory and light memory only keep the 20 most significant bits of written values.
Position matrix memory
This region is organized as 64 groups of 4 words. Each group represents a column in a matrix, and each column can be used as the beginning of a 4x3 matrix.
Example matrix starting at 0x0000 (values are offsets):
00 04 08
01 05 09
02 06 10
03 07 11
Matrix beginning at 0x0000 is usually the position matrix.
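Following the layout in the example, entry (row, column) of a matrix starting at group g lives at word 4 * g + 4 * column + row. This sketch reads such a matrix, assuming the memory is modelled as 256 `f32` words; rejecting matrices that would run past the end of the region is a choice of the sketch, not documented hardware behaviour.

```rust
/// Position matrix memory: 64 groups of 4 words each.
const POSITION_MEMORY_WORDS: usize = 64 * 4;

/// Reads the 4x3 matrix whose first column is group `first_group`.
fn load_matrix(memory: &[f32; POSITION_MEMORY_WORDS], first_group: usize) -> Option<[[f32; 3]; 4]> {
    let base = first_group * 4;
    // A 4x3 matrix spans 3 groups (12 words).
    if base + 12 > POSITION_MEMORY_WORDS {
        return None;
    }
    let mut matrix = [[0.0; 3]; 4];
    for (row, entries) in matrix.iter_mut().enumerate() {
        for (col, entry) in entries.iter_mut().enumerate() {
            *entry = memory[base + col * 4 + row];
        }
    }
    Some(matrix)
}
```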
Normal matrix memory
This region is organized as 32 groups of 3 words. Each group represents a column in a matrix, and each column can be used as the beginning of a 3x3 matrix.
Dual texture transform matrix memory
This region is organized exactly the same as the position matrix memory.
Light memory
Contains all lighting information.
Texture Environment (TEV)
The texture environment (TEV) is responsible for blending vertex colors and textures through a fixed function pipeline of at most 16 stages. These stages are the closest thing to shaders in the GX.
Stages
A TEV stage performs a configurable operation on some color inputs and generates an output color that can be fed into the next stage. Stages are executed sequentially. The output of the last stage is the final color.
The TEV has a set of 4 registers, called R0, R1, R2 and R3 (also called PREV), which are shared between all stages and can be used as either input or output of each one of them. Since the color and alpha components are computed separately, they can be referred to as R0C, R0A, R1C, R1A, etc.
TEV stages operate on normalized floating-point RGBA color values (i.e. each channel has a value in [0, 1]). The alpha channel is special and can be configured separately.
Each TEV stage has 4 input values A, B, C and D. It has only one output, which will be called OUT. The operation of a TEV stage is always of the following form:
value = [sign * (A * (1.0 - C) + B * C) + D + bias] * scale
if (clamp)
OUT = clamp(value)
else
OUT = value
In the operation, sign, bias, scale and clamp are configurable:
| Variable | Possible values |
|---|---|
| Sign | 1 and -1 |
| Bias | 0, 0.5 and -0.5 |
| Scale | 0.5, 1, 2 and 4 |
| Clamp | 0 and 1 (boolean) |
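A direct Rust translation of the stage operation for a single channel. It follows the floating-point model described here and does not attempt to reproduce the precision of the actual hardware; `TevOp` and `tev_channel` are hypothetical names.

```rust
/// Configurable parameters of a TEV stage operation.
#[derive(Clone, Copy)]
struct TevOp {
    negate: bool, // sign = -1 when set, +1 otherwise
    bias: f32,    // 0.0, 0.5 or -0.5
    scale: f32,   // 0.5, 1.0, 2.0 or 4.0
    clamp: bool,
}

/// value = [sign * (A * (1 - C) + B * C) + D + bias] * scale,
/// clamped to [0, 1] when `clamp` is set.
fn tev_channel(op: TevOp, a: f32, b: f32, c: f32, d: f32) -> f32 {
    let sign = if op.negate { -1.0 } else { 1.0 };
    let value = (sign * (a * (1.0 - c) + b * c) + d + op.bias) * op.scale;
    if op.clamp { value.clamp(0.0, 1.0) } else { value }
}
```

With A = 0, B = texture color, C = rasterizer color and D = 0, the operation reduces to B * C, i.e. the texture color modulated by the rasterizer color.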
Inputs
Each input value (A, B, C and D) can be configured to be sourced from one of the following values for the color operation:
| Code | Value |
|---|---|
| 0x0 | R3C |
| 0x1 | R3A |
| 0x2 | R0C |
| 0x3 | R0A |
| 0x4 | R1C |
| 0x5 | R1A |
| 0x6 | R2C |
| 0x7 | R2A |
| 0x8 | Texture Color |
| 0x9 | Texture Alpha |
| 0xA | Rasterizer Color |
| 0xB | Rasterizer Alpha |
| 0xC | One |
| 0xD | One half |
| 0xE | Constant |
| 0xF | Zero |
And one of the following values for the alpha operation:
| Code | Value |
|---|---|
| 0x0 | R3A |
| 0x1 | R0A |
| 0x2 | R1A |
| 0x3 | R2A |
| 0x4 | Texture Alpha |
| 0x5 | Rasterizer Alpha |
| 0x6 | Constant |
| 0x7 | Zero |
The texture and rasterizer colors available to a given stage are also configurable.
Outputs
The output value (OUT) can only be put into one of the 4 TEV registers (R0, R1, R2 and R3).
Register formats
TEV Stage Color
This is the format of the GX registers for TEV stages color configuration:
| Bits | Name | Description |
|---|---|---|
| 0..4 | Input D | Source of input D |
| 4..8 | Input C | Source of input C |
| 8..12 | Input B | Source of input B |
| 12..16 | Input A | Source of input A |
| 16..18 | Bias | Bias value (see below) |
| 18 | Negate | Sign is -1 if set, +1 otherwise |
| 19 | Clamp | Enables clamping |
| 20..22 | Scale | Scale value (see below) |
| 22..24 | Out | Destination register of output value (see below) |
Bias value selection:
| Code | Value |
|---|---|
| 0b00 | 0 |
| 0b01 | +0.5 |
| 0b10 | -0.5 |
| 0b11 | Reserved |
Scale value selection:
| Code | Value |
|---|---|
| 0b00 | 1 |
| 0b01 | 2 |
| 0b10 | 4 |
| 0b11 | 0.5 |
Destination register selection:
| Code | Value |
|---|---|
| 0b00 | R3 |
| 0b01 | R0 |
| 0b10 | R1 |
| 0b11 | R2 |
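Decoding the register means extracting each field and mapping the bias, scale and destination codes through the tables above. A sketch with hypothetical names; the inputs are left as their raw 4-bit codes.

```rust
/// Decoded TEV stage color configuration.
#[derive(Debug, PartialEq)]
struct TevColorStage {
    input_a: u8,
    input_b: u8,
    input_c: u8,
    input_d: u8,
    bias: Option<f32>, // None for the reserved code
    negate: bool,
    clamp: bool,
    scale: f32,
    out: &'static str,
}

fn decode_tev_color(value: u32) -> TevColorStage {
    // Extracts `width` bits starting at bit `start` (LSB-0).
    let field = |start: u32, width: u32| (value >> start) & ((1u32 << width) - 1);
    TevColorStage {
        input_d: field(0, 4) as u8,
        input_c: field(4, 4) as u8,
        input_b: field(8, 4) as u8,
        input_a: field(12, 4) as u8,
        bias: match field(16, 2) {
            0 => Some(0.0),
            1 => Some(0.5),
            2 => Some(-0.5),
            _ => None,
        },
        negate: field(18, 1) != 0,
        clamp: field(19, 1) != 0,
        scale: [1.0, 2.0, 4.0, 0.5][field(20, 2) as usize],
        out: ["R3", "R0", "R1", "R2"][field(22, 2) as usize],
    }
}
```

For example, 0x0008_F8AF decodes to A = Zero (0xF), B = Texture Color (0x8), C = Rasterizer Color (0xA) and D = Zero, with clamping enabled and R3 as the destination.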
TEV Stage Alpha
This is the format of the GX registers for TEV stages alpha configuration:
| Bits | Name | Description |
|---|---|---|
| 0..2 | Rasterizer swap | Index of swap table to use for rasterizer inputs |
| 2..4 | Texture swap | Index of swap table to use for texture inputs |
| 4..7 | Input D | Source of input D |
| 7..10 | Input C | Source of input C |
| 10..13 | Input B | Source of input B |
| 13..16 | Input A | Source of input A |
| 16..18 | Bias | Bias value (see color format) |
| 18 | Negate | Sign is -1 if set, +1 otherwise |
| 19 | Clamp | Enables clamping |
| 20..22 | Scale | Scale value (see color format) |
| 22..24 | Out | Destination register of output value (see color format) |
Texture Coordinate Generator (TC)
The texture coordinate generator (TC) is responsible for generating texture coordinates to be used by the TEV during texture sampling. The process of generating texture coordinates is called texgen.
There are 8 texture coordinate channels, each configured separately.
Video Interface (VI)
The video interface is responsible for outputting a video signal to a display from a framebuffer in main RAM, named the eXternal FrameBuffer (XFB).
For a high level overview on the topic of analog video, take a look at the analog video page.
Registers
VI Vertical Timing (0x0C00_2000, 2 bytes)
This register configures the vertical timing properties of the video signal.
| Bits | Name | Description |
|---|---|---|
| 0..4 | Equalization Pulse Length | Double the length of the EQU pulse in half-lines |
| 4..14 | Active Video Length | Length of the active video section in half-lines |
VI Display Config (0x0C00_2002, 2 bytes)
| Bits | Name | Description |
|---|---|---|
| 0 | Enable | Enable video timing generation and data requests |
| 1 | Reset | Clears all requests and puts the interface into an idle state |
| 2 | Progressive | Whether progressive video mode is enabled (interlaced otherwise) |
| 3 | Stereoscopic | Whether the 3D stereoscopic effect is enabled¹ |
| 4..6 | Display Latch 0 Mode | |
| 6..8 | Display Latch 1 Mode | |
| 8..10 | Video Format | Current video format (0 = NTSC, 1 = Pal50, 2 = Pal60, 3 = Debug) |
VI Horizontal Timing 0 (0x0C00_2004, 4 bytes)
This register configures part of the horizontal timing properties of the video signal.
| Bits | Name | Description |
|---|---|---|
| 0..7 | Sync Length | Length of the HSync pulse |
| 7..17 | Sync Start to Blank End Length | Length of interval between HSync start and HBlank end |
| 17..27 | Half-line to Blank Start Length | Length of interval between the half-line and HBlank start |
VI Horizontal Timing 1 (0x0C00_2008, 4 bytes)
This register configures part of the horizontal timing properties of the video signal.
| Bits | Name | Description |
|---|---|---|
| 0..9 | Half-line length | Length of a half-line |
| 16..23 | Sync Start to Color Burst End | Length of interval between HSync start and Color Burst end |
| 24..31 | Sync Start to Color Burst Start | Length of interval between HSync start and Color Burst start |
VI Odd Field Vertical Timing (0x0C00_200C, 4 bytes)
This register configures the vertical timing properties of the video signal of the odd field.
| Bits | Name | Description |
|---|---|---|
| 0..10 | Pre-blanking length | Length of the pre-blanking period, in half-lines |
| 16..26 | Post-blanking length | Length of the post-blanking period, in half-lines |
VI Even Field Vertical Timing (0x0C00_2010, 4 bytes)
This register configures the vertical timing properties of the video signal of the even field.
Same bits as VI Odd Field Vertical Timing.
VI Top Field Base Register (0x0C00_201C, 4 bytes)
This register configures the address of the XFB for the odd field.
| Bits | Name | Description |
|---|---|---|
| 0..24 | Base | Bits 0..24 of the XFB address in physical memory |
| 24..28 | Horizontal Offset | |
| 28 | Shift | Whether to shift the base address right by 5 |
VI Bottom Field Base Register (0x0C00_2024, 4 bytes)
This register configures the address of the XFB for the even field.
| Bits | Name | Description |
|---|---|---|
| 9..24 | Base | Bits 9..24 of the XFB address in physical memory |
| 28 | Shift | Whether to shift the base address right by 5 |
VI Current Line (Vertical Counter) (0x0C00_202C, 2 bytes)
This register contains a counter for the current line in the frame. It starts at 1 and increases every HSync, up to the number of lines per frame. It is reset at the start of a new frame.
VI Current Sample (Horizontal Counter) (0x0C00_202E, 2 bytes)
This register contains a counter for the current sample in the line. It starts at 1 and increases every sample, up to the number of samples per line. It is reset at the start of a new line.
VI Display Interrupt 0 (0x0C00_2030, 4 bytes)
This register configures the VI interrupt 0.
| Bits | Name | Description |
|---|---|---|
| 0..9 | Horizontal Target | Target value for the current sample |
| 16..25 | Vertical Target | Target value for the current line |
| 28 | Mask | Whether this interrupt is enabled |
| 31 | Status | Whether this interrupt is active |
VI Display Interrupt 1 (0x0C00_2034, 4 bytes)
This register configures the VI interrupt 1.
Same bits as VI Display Interrupt 0.
VI Display Interrupt 2 (0x0C00_2038, 4 bytes)
This register configures the VI interrupt 2.
Same bits as VI Display Interrupt 0.
VI Display Interrupt 3 (0x0C00_203C, 4 bytes)
This register configures the VI interrupt 3.
Same bits as VI Display Interrupt 0.
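One way to model the display interrupts is to compare the targets against the current line and sample counters whenever they change. In this hypothetical sketch, the status bit is latched on a match regardless of the mask, and the interrupt is only signalled when the mask bit is also set; both choices are assumptions.

```rust
// Field layout of the VI Display Interrupt registers.
const TARGET_MASK: u32 = 0x1FF; // 9-bit targets at bits 0..9 and 16..25
const VERTICAL_SHIFT: u32 = 16;
const MASK_BIT: u32 = 1 << 28; // interrupt enabled
const STATUS_BIT: u32 = 1 << 31; // interrupt active

/// Latches the status bit when the counters hit the configured target and
/// returns whether the interrupt should be signalled (active and enabled).
fn check_display_interrupt(register: &mut u32, current_line: u32, current_sample: u32) -> bool {
    let horizontal_target = *register & TARGET_MASK;
    let vertical_target = (*register >> VERTICAL_SHIFT) & TARGET_MASK;
    if current_line == vertical_target && current_sample == horizontal_target {
        *register |= STATUS_BIT;
    }
    (*register & STATUS_BIT != 0) && (*register & MASK_BIT != 0)
}
```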
¹ Nothing uses this.
Analog Video
This page is a very high-level overview of analog video, and I’m not very knowledgeable on the topic. Take everything here with a grain of salt.
For more information, consult the analog video resources.
Analog video works by scanning an image through a screen from left to right, top to bottom. In order to produce the correct image, the scanning beam must change its “color” according to its position, which is done by timing parts of the scanning process.
The scanning beam is controlled by an analog video signal, which can be thought of as being composed of three component signals:
- Color signal, which contains the actual color information (called samples)
- Horizontal synchronization (HSync) signal, which tells the screen to take the scanning beam back to the start of the next scan line
- Vertical synchronization (VSync) signal, which tells the screen to take the scanning beam back to the top of the screen so it can start scanning out a new image
There are two ways to scan an image out: progressive and interlaced. First, we’ll take a look at progressive video, since it’s simpler to understand and works as a base for interlaced.
Progressive Video
In progressive video, the whole image is scanned out at once. The video signal starts scanning out a line of samples (i.e. a row of image data), which goes roughly like this:
│────── H. Back Porch ──────│
│───── H. Front Porch ──────│
│───────────────── Horizontal Blank ────────────────────│
Color:
1 ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷ ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷
0 ──────────────────────────────────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴───────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴
Color Burst Active Video
HSync:
1 ┌───────────────────┐
0 ────────┘ └───────────────────────────────────────────────────────
VSync:
1
0 ────────────────────────────────────────────────────────────────────────────────────
This diagram represents what the video signal is emitting and what section these parts belong to. Note that this diagram is cyclic (i.e. it repeats once it reaches the end).
The signal starts out by emitting an HSync pulse, telling the screen to move the beam to the start of the next line. Then, there’s a burst of color information (the color burst), used to “calibrate” the color level of the screen. Finally, the actual color information is sent (active video).
The process can be split into three sections:
- Front Porch: Period between the end of active video and the start of the HSync pulse.
- Back Porch: Period between end of HSync pulse and start of active video. This section exists to give time for the beam to move back to the start of the line.
- Horizontal Blank: This is any period outside of the active video signal.
This process repeats for each scan line until the last one, where things go a little differently: there’s no next line to scan out - the beam needs to go back to the top, where the scanning process can restart. This is when VSync comes into play:
│──────────────────────── Vertical Blank ──────────────────────────│
Color:
1 ╷╷╷╷╷╷ ╷╷╷ ╷╷╷╷╷╷
0 ┴┴┴┴┴┴─────────────────────────────────────────────────────────────┴┴┴──┴┴┴┴┴┴──────
AV B AV
HSync:
1 ┌───────┐ ┌───────┐ ┌───────┐ ┌───
0 ────────┘ └───────────────┘ └───────────────┘ └───────────────┘
VSync:
1 ┌───────────────────────────────────────────────────────┐
0 ────────┘ └───────────────────
Similarly to the first diagram, this diagram represents what the video signal is emitting when it reaches the end of a scan.
The signal emits a VSync pulse, telling the screen to move the beam back to the top. The VSync pulse lasts for a few scan lines, during which nothing is actually scanned out.
The Vertical Blank is the period between the end of the last visible line of a scan and the start of the first visible line of the next scan.
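Horizontal and vertical timing can be combined into a simple beam position counter. This is a minimal sketch under simplifying assumptions. Blanking is treated as whole lines, the first visible line is line 0, and interlacing and half-lines are ignored. The dimensions passed to `Beam::new` are hypothetical.

```rust
/// Minimal beam position counter. Lines `visible_lines..total_lines` form the
/// vertical blank (which includes the VSync pulse).
#[derive(Debug)]
pub struct Beam {
    pub x: u32,
    pub y: u32,
    pub line_length: u32,
    pub visible_lines: u32,
    pub total_lines: u32,
}

impl Beam {
    pub fn new(line_length: u32, visible_lines: u32, total_lines: u32) -> Self {
        assert!(visible_lines <= total_lines && line_length > 0);
        Beam { x: 0, y: 0, line_length, visible_lines, total_lines }
    }

    /// Advances the beam by one unit. Returns `true` when a new scan starts
    /// (i.e. the beam went back to the top).
    pub fn tick(&mut self) -> bool {
        self.x += 1;
        if self.x < self.line_length {
            return false;
        }
        // End of the line: move to the start of the next one.
        self.x = 0;
        self.y += 1;
        if self.y < self.total_lines {
            return false;
        }
        // End of the scan: go back to the top.
        self.y = 0;
        true
    }

    pub fn in_vblank(&self) -> bool {
        self.y >= self.visible_lines
    }
}
```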
If we zoom out and take a look at a whole scan, it would look something like this:
Color:
1 ╷╷╷╷╷╷╷╷ ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷
0 ┴┴┴┴┴┴┴┴────────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴
HSync:
1 ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷ ╷
0 ┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──
VSync:
1 ┌───────┐
0 ────────┘ └───────────────────────────────────────────────────────────────────
Interlaced Video
Interlaced video is like progressive video, except it splits the scan (i.e. the image) into two parts with alternating scan lines, called fields.
The odd field scans out odd lines, while the even field scans out even lines. Together, these fields make up a whole scan.
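The mapping between a field's lines and the lines of the whole scan can be sketched as follows. This assumes lines are numbered from 1, so the odd field covers lines 1, 3, 5, … and the even field covers lines 2, 4, 6, …. The 0-based index `n` within a field is a convention chosen for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Odd,
    Even,
}

/// Maps the `n`-th line (0-based) scanned out within a field to its line number
/// in the whole scan (lines numbered from 1).
pub fn scan_line(field: Field, n: u32) -> u32 {
    match field {
        Field::Odd => 2 * n + 1,
        Field::Even => 2 * n + 2,
    }
}

/// Inverse mapping: which field a scan line (numbered from 1) belongs to, and its
/// 0-based index within that field.
pub fn field_of(line: u32) -> (Field, u32) {
    assert!(line >= 1, "lines are numbered from 1");
    if line % 2 == 1 {
        (Field::Odd, (line - 1) / 2)
    } else {
        (Field::Even, (line - 2) / 2)
    }
}
```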
Processor Interface (PI)
The processor interface is responsible for connecting the CPU to the Flipper.
Flipper Interrupts
The PI receives interrupt requests from Flipper and passes them along to the CPU.
FIFO
Within the PI exists a FIFO mechanism that catches burst writes¹ to 0x0C00_8000 and writes them to a ring buffer in memory. It's controlled by three registers: FIFO Start, FIFO End and FIFO Current.
Registers
PI Interrupt Cause (0x0C00_3000, 4 bytes)
Indicates which external interrupt causes are active. This register is read-only: to acknowledge an interrupt, you must write to the status bit in the register of the device that raised it.
| Bits | Name | Description |
|---|---|---|
| 0 | GP Error | Graphics Processor runtime error |
| 1 | Reset | Reset switch was pressed |
| 2 | DVDI | DVD Interface |
| 3 | SI | Serial Interface |
| 4 | EXI | External Interface |
| 5 | AI | Audio Interface |
| 6 | DSPI | DSP Interface |
| 7 | MEM | Memory Interface |
| 8 | VI | Video Interface |
| 9 | PE Token | Token assertion in command list |
| 10 | PE Finish | Frame ready |
| 11 | CP | Command Processor FIFO |
| 12 | Debug | External Debugger |
| 13 | HSP | High Speed Port |
| 14..16 | Reserved | |
| 16 | Reset State | |
| 17..32 | Reserved | |
PI Interrupt Mask (0x0C00_3004, 4 bytes)
Masks PI interrupts: only causes whose bit is set here are allowed to be raised.
| Bits | Name | Description |
|---|---|---|
| 0 | GP Error | Graphics Processor runtime error |
| 1 | Reset | Reset switch was pressed |
| 2 | DVDI | DVD Interface |
| 3 | SI | Serial Interface |
| 4 | EXI | External Interface |
| 5 | AI | Audio Interface |
| 6 | DSPI | DSP Interface |
| 7 | MEM | Memory Interface |
| 8 | VI | Video Interface |
| 9 | PE Token | Token assertion in command list |
| 10 | PE Finish | Frame ready |
| 11 | CP | Command Processor FIFO |
| 12 | Debug | External Debugger |
| 13 | HSP | High Speed Port |
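The cause and mask registers above combine as follows. This is a minimal sketch. It assumes the PI asserts the CPU's external interrupt whenever some cause bit is set together with its mask bit; whether the CPU actually takes the interrupt also depends on its own state, which is not modeled. Reset State (bit 16) is left out, since the mask table only covers bits 0..14. The `ProcessorInterface` type is hypothetical.

```rust
/// Bit positions of the PI interrupt causes, taken from the tables above.
pub mod cause {
    pub const GP_ERROR: u32 = 1 << 0;
    pub const RESET: u32 = 1 << 1;
    pub const DVDI: u32 = 1 << 2;
    pub const SI: u32 = 1 << 3;
    pub const EXI: u32 = 1 << 4;
    pub const AI: u32 = 1 << 5;
    pub const DSPI: u32 = 1 << 6;
    pub const MEM: u32 = 1 << 7;
    pub const VI: u32 = 1 << 8;
    pub const PE_TOKEN: u32 = 1 << 9;
    pub const PE_FINISH: u32 = 1 << 10;
    pub const CP: u32 = 1 << 11;
    pub const DEBUG: u32 = 1 << 12;
    pub const HSP: u32 = 1 << 13;
    /// Bits 0..14, the ones that can be masked.
    pub const ALL: u32 = (1 << 14) - 1;
}

#[derive(Debug, Default)]
pub struct ProcessorInterface {
    pub cause: u32, // PI Interrupt Cause (0x0C00_3000)
    pub mask: u32,  // PI Interrupt Mask (0x0C00_3004)
}

impl ProcessorInterface {
    /// Called by a device to raise (or, once acknowledged, lower) its cause bit.
    pub fn set_cause(&mut self, bit: u32, active: bool) {
        if active {
            self.cause |= bit;
        } else {
            self.cause &= !bit;
        }
    }

    /// Whether the CPU's external interrupt should be asserted.
    pub fn interrupt_pending(&self) -> bool {
        (self.cause & self.mask & cause::ALL) != 0
    }
}
```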
PI FIFO Start (0x0C00_300C, 4 bytes)
| Bits | Name | Description |
|---|---|---|
| 0..5 | Zeroed | |
| 5..27 | Start | The start address of the ring buffer |
| 27..32 | Unknown | |
PI FIFO End (0x0C00_3010, 4 bytes)
| Bits | Name | Description |
|---|---|---|
| 0..5 | Zeroed | |
| 5..27 | End | The end address of the ring buffer (exclusive) |
| 27..32 | Unknown | |
PI FIFO Current (0x0C00_3014, 4 bytes)
| Bits | Name | Description |
|---|---|---|
| 0..5 | Zeroed | |
| 5..27 | Current | The current address for writing the next 32 bytes of data |
| 27 | Wrapped | Whether the current address reached the end and wrapped around to the start |
| 28..32 | Unknown | |
Wrapped is cleared only on CPU writes to the register.
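Putting the three FIFO registers together gives a small ring buffer model. This is a minimal sketch under several assumptions. Main RAM is a byte slice indexed by physical address. Each burst is exactly 32 bytes. Wrap-around happens as soon as Current reaches End. A CPU write to FIFO Current ignores the written Wrapped bit and clears it. The unknown bits 27..32 are simply dropped. The `PiFifo` type is hypothetical.

```rust
/// Bits 5..27 of the FIFO registers hold a 32-byte aligned address.
const ADDR_MASK: u32 = 0x07FF_FFE0;
/// Bit 27 of FIFO Current.
const WRAPPED: u32 = 1 << 27;

#[derive(Debug, Default)]
pub struct PiFifo {
    pub start: u32,
    pub end: u32,
    pub current: u32,
    pub wrapped: bool,
}

impl PiFifo {
    pub fn write_start(&mut self, value: u32) {
        self.start = value & ADDR_MASK;
    }

    pub fn write_end(&mut self, value: u32) {
        self.end = value & ADDR_MASK;
    }

    /// CPU write to FIFO Current: sets the address and clears Wrapped.
    pub fn write_current(&mut self, value: u32) {
        self.current = value & ADDR_MASK;
        self.wrapped = false;
    }

    pub fn read_current(&self) -> u32 {
        let flag = if self.wrapped { WRAPPED } else { 0 };
        self.current | flag
    }

    /// Handles one 32-byte burst written to 0x0C00_8000: stores it in the ring
    /// buffer at `current`, then advances and wraps around at `end` (exclusive).
    pub fn push_burst(&mut self, ram: &mut [u8], data: &[u8; 32]) {
        let addr = self.current as usize;
        ram[addr..addr + 32].copy_from_slice(data);
        self.current += 32;
        if self.current >= self.end {
            self.current = self.start;
            self.wrapped = true;
        }
    }
}
```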
¹ These burst writes are executed by the Write Gather Pipe.
DSP Interface (DSPI)
The DSP interface controls the DSP in the Flipper and allows the CPU to communicate with it through mailboxes and DMAs.
Registers
Mailbox registers
The mailbox registers are used to send and receive small 31-bit messages to/from the DSP. Software often accesses each mailbox as if it were split into two 2-byte halves (high and low).
DSP Mailbox (0x0C00_5000, 4 bytes)
Mailbox used to send data from the CPU to the DSP. Software always writes the high half first, which probably means a transfer only starts once the low half is written as well (otherwise the DSP could read partial data).
| Bits | Name | Description |
|---|---|---|
| 0..31 | Data | [W] Send data to the DSP mailbox |
| 31 | Status | [R] Whether the transfer is ongoing |
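The high/low split described above can be modeled as follows. This is a minimal sketch. It assumes the big-endian halves sit at 0x0C00_5000 (high) and 0x0C00_5002 (low), that writing the low half is what starts the transfer, and that bit 31 stays set until the DSP picks the message up. `dsp_take` is a hypothetical hook for the DSP side.

```rust
/// Sketch of the CPU → DSP mailbox, accessed as two 2-byte halves.
#[derive(Debug, Default)]
pub struct DspMailbox {
    high: u16,
    low: u16,
    /// Bit 31 ("Status"): set while the transfer is ongoing.
    pending: bool,
}

impl DspMailbox {
    pub fn write_high(&mut self, value: u16) {
        // Bit 31 of the register is the read-only status bit, so only 15 bits of data fit here.
        self.high = value & 0x7FFF;
    }

    /// Writing the low half is assumed to be what starts the transfer.
    pub fn write_low(&mut self, value: u16) {
        self.low = value;
        self.pending = true;
    }

    pub fn read(&self) -> u32 {
        let status = if self.pending { 1u32 << 31 } else { 0 };
        status | (u32::from(self.high) << 16) | u32::from(self.low)
    }

    /// Called when the DSP side reads the message; returns the 31-bit payload.
    pub fn dsp_take(&mut self) -> Option<u32> {
        if !self.pending {
            return None;
        }
        self.pending = false;
        Some((u32::from(self.high) << 16) | u32::from(self.low))
    }
}
```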
CPU Mailbox (0x0C00_5004, 4 bytes)
Mailbox used to receive data from the DSP.
| Bits | Name | Description |
|---|---|---|
| 0..31 | Data | [R] Receive data from the DSP |
| 31 | Status | [R] Whether there’s new data from the DSP |
DSP Control (0x0C00_500A, 2 bytes)
Controls the DSP and also contains some ARAM and AI related bits.
| Bits | Name | Description |
|---|---|---|
| 0 | Reset | [W] Reset DSP; [R] Whether DSP is resetting |
| 1 | Interrupt | [W] Assert DSP PI interrupt |
| 2 | Halt | [W] Halt (1) or unhalt (0) DSP; [R] Whether DSP is halted |
| 3 | AI interrupt status | [W] Writing 1 clears it; [R] Whether an AI interrupt is pending |
| 4 | AI interrupt mask | AI interrupt assertion allowed |
| 5 | ARAM interrupt status | [W] Writing 1 clears it; [R] Whether an ARAM interrupt is pending |
| 6 | ARAM interrupt mask | ARAM interrupt assertion allowed |
| 7 | DSP interrupt status | [W] Writing 1 clears it; [R] Whether a DSP interrupt is pending |
| 8 | DSP interrupt mask | DSP interrupt assertion allowed |
| 9 | DSP DMA status | [R] Whether ARAM DMA is ongoing |
| 10 | Unknown | |
| 11 | Unknown | Also seems to be used as a reset bit |
The ARAM interrupt is raised whenever the ARAM DMA completes.
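The status and mask bits of DSP Control can be modeled as follows. This is a minimal sketch. Status bits are treated as write-1-to-clear, and halt and mask bits as plain storage. It is an assumption that the DSPI cause in the PI is active whenever a status bit is set together with the mask bit right above it. Reset, bit 1, the DMA status and the unknown bits are not modeled.

```rust
const HALT: u16 = 1 << 2;
const AI_INT: u16 = 1 << 3;
const AI_MASK: u16 = 1 << 4;
const ARAM_INT: u16 = 1 << 5;
const ARAM_MASK: u16 = 1 << 6;
const DSP_INT: u16 = 1 << 7;
const DSP_MASK: u16 = 1 << 8;

const STATUS_BITS: u16 = AI_INT | ARAM_INT | DSP_INT;
const STORED_BITS: u16 = HALT | AI_MASK | ARAM_MASK | DSP_MASK;

/// DSP Control (0x0C00_500A); only status, mask and halt bits are kept here.
#[derive(Debug, Default)]
pub struct DspControl {
    pub value: u16,
}

impl DspControl {
    /// CPU write: status bits written as 1 are cleared, halt and mask bits are stored.
    pub fn write(&mut self, data: u16) {
        let status = self.value & STATUS_BITS & !data;
        self.value = status | (data & STORED_BITS);
    }

    /// Called by the emulated hardware when an event happens (e.g. ARAM DMA completion).
    pub fn raise(&mut self, status_bit: u16) {
        self.value |= status_bit & STATUS_BITS;
    }

    /// Whether the DSPI cause in the PI should be active.
    pub fn pi_interrupt(&self) -> bool {
        let status = self.value & STATUS_BITS;
        // Each mask bit sits one position above its status bit.
        let mask = (self.value >> 1) & STATUS_BITS;
        (status & mask) != 0
    }
}
```

One consequence of write-1-to-clear: software doing a read-modify-write of this register will clear any pending status bit it reads back as 1.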
ARAM Size (0x0C00_5012, 2 bytes)
ARAM Mode (0x0C00_5016, 2 bytes)
ARAM Refresh (0x0C00_501A, 2 bytes)
ARAM DMA RAM Address (0x0C00_5020, 4 bytes)
Contains the physical address of the DMA transfer in main RAM.
ARAM DMA ARAM Address (0x0C00_5024, 4 bytes)
Contains the address of the DMA transfer in ARAM.
ARAM DMA Control (0x0C00_5028, 2 bytes)
Controls the ARAM DMA.
| Bits | Name | Description |
|---|---|---|
| 0..15 | Length | Length of the transfer in words |
| 15 | Direction | 0: main RAM to ARAM, 1: ARAM to main RAM |
Writing to this register triggers a DMA transfer if the length field is non-zero.
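The transfer triggered by a write to ARAM DMA Control can be sketched as follows. Several assumptions apply:

- The Length unit is 4-byte words. The table only says "words", so `WORD_SIZE` should be checked against hardware tests.
- Both memories are byte slices indexed by address.
- The transfer completes instantly.
- Completion sets the ARAM interrupt status bit (DSP Control bit 5), as noted above.

The function name is hypothetical.

```rust
/// Assumed size in bytes of one unit of the Length field.
const WORD_SIZE: usize = 4;
/// DSP Control bit 5: ARAM interrupt status.
const ARAM_INTERRUPT_STATUS: u16 = 1 << 5;

/// Handles a CPU write to ARAM DMA Control (0x0C00_5028).
pub fn aram_dma_control_write(
    control: u16,
    ram: &mut [u8],
    aram: &mut [u8],
    ram_address: u32,  // ARAM DMA RAM Address (0x0C00_5020)
    aram_address: u32, // ARAM DMA ARAM Address (0x0C00_5024)
    dsp_control: &mut u16,
) {
    let length = usize::from(control & 0x7FFF) * WORD_SIZE;
    if length == 0 {
        return; // a zero length does not trigger a transfer
    }
    let from_aram = (control & 0x8000) != 0;
    let r = ram_address as usize..ram_address as usize + length;
    let a = aram_address as usize..aram_address as usize + length;
    if from_aram {
        ram[r].copy_from_slice(&aram[a]);
    } else {
        aram[a].copy_from_slice(&ram[r]);
    }
    // Real hardware takes time; here the DMA completes immediately.
    *dsp_control |= ARAM_INTERRUPT_STATUS;
}
```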
ABI
Software appears to use the PowerPC Embedded ABI (or simply EABI), a modified version of the PowerPC ABI supplement to System V (see the Resources page) designed for embedded systems.
The EABI is described in detail in “Developing PowerPC Embedded Application Binary Interface (EABI) Compliant Programs” (also on the Resources page). This page gives an overview of its most important parts.
Registers
There are three kinds of registers:
- Volatile: These registers are general purpose and do not need to be preserved by routines.
- Preserved: These registers are general purpose and must be preserved by routines.
- Dedicated: These registers have a special purpose in the ABI, and must also be preserved by routines.
General purpose registers
| Register | Kind | Purpose |
|---|---|---|
| R00 | Volatile | Language specific |
| R01 | Dedicated | Stack pointer |
| R02 | Dedicated | Read only small data base address |
| R03 | Volatile | First parameter, first return value |
| R04 | Volatile | Second parameter, second return value |
| R05..R11 | Volatile | Parameters after R3 & R4 |
| R11..R13 | Volatile | Temporaries |
| R13 | Dedicated | Read/write small data base address |
| R14.. | Preserved | General |
Floating point registers
| Register | Kind | Purpose |
|---|---|---|
| F00 | Volatile | Language specific |
| F01 | Volatile | First parameter, first (and only) float return value |
| F02..F09 | Volatile | Parameters after F01 |
| F09..F14 | Volatile | Temporaries |
| F14.. | Preserved | General |
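Calling guest code from the emulator (e.g. for high-level emulation or tests) means placing arguments according to the tables above. Integer arguments go in R03..R11 and float arguments in F01..F09, each filled in order. This is a minimal sketch with hypothetical `Arg` and `Regs` types. 64-bit integers, aggregates and arguments that would spill onto the stack are not handled.

```rust
/// A guest function argument, for the purposes of this sketch.
#[derive(Debug, Clone, Copy)]
pub enum Arg {
    Int(u32),
    Float(f64),
}

/// Register state touched by a call; `gpr[n]` is Rn and `fpr[n]` is Fn.
#[derive(Debug, Default)]
pub struct Regs {
    pub gpr: [u32; 32],
    pub fpr: [f64; 32],
}

/// Places arguments in R03..R11 and F01..F09 in order. Integer and float
/// arguments use independent counters.
pub fn place_args(regs: &mut Regs, args: &[Arg]) -> Result<(), &'static str> {
    let mut next_gpr = 3;
    let mut next_fpr = 1;
    for arg in args {
        match *arg {
            Arg::Int(v) => {
                if next_gpr > 10 {
                    return Err("too many integer arguments for registers");
                }
                regs.gpr[next_gpr] = v;
                next_gpr += 1;
            }
            Arg::Float(v) => {
                if next_fpr > 8 {
                    return Err("too many float arguments for registers");
                }
                regs.fpr[next_fpr] = v;
                next_fpr += 1;
            }
        }
    }
    Ok(())
}
```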
Condition register fields
All fields in the condition register are volatile except for CR2..CR5 (i.e. CR2, CR3 and CR4), which are preserved.
Other registers
All other registers are volatile.
Stack Frames
The stack pointer points to the lowest word of the current stack frame (i.e. the stack grows down). Stack frames are always aligned on double words (8 bytes). Their format is as follows:
| Address | Size | Description |
|---|---|---|
| sp | 4 | Previous sp |
| sp + 4 | 4 | LR save word (routines called by this one store their return address here) |
| sp + 8 | Variable | Routine specific |
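Preserved registers are callee-saved: a routine that modifies them must restore them before returning, usually by saving them in its own stack frame. As a consequence, a leaf routine that needs neither stack space nor any preserved registers can skip creating a stack frame.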
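The frame layout makes two small utilities easy to write. The first computes a frame size rounded up to the 8-byte alignment. The second walks the chain of previous stack pointers (the back chain), which is handy for debugger backtraces. This is a sketch. `read_u32` is a hypothetical accessor returning the word stored at a guest address. The stopping checks (null pointer, addresses that don't go up the stack, a frame limit) are heuristics, not something the ABI mandates.

```rust
/// Every frame starts with two words: previous sp and the LR save word.
const FRAME_HEADER: u32 = 8;

/// Size of a stack frame holding `locals` bytes of routine-specific data,
/// rounded up so frames stay aligned on double words (8 bytes).
pub fn frame_size(locals: u32) -> u32 {
    (FRAME_HEADER + locals + 7) & !7
}

/// Walks the back chain starting at `sp` (R01), collecting each frame's address.
pub fn back_chain(sp: u32, max_frames: usize, read_u32: impl Fn(u32) -> u32) -> Vec<u32> {
    let mut frames = Vec::new();
    let mut current = sp;
    while current != 0 && frames.len() < max_frames {
        frames.push(current);
        let previous = read_u32(current); // word at sp: previous sp
        // The stack grows down, so older frames must live at higher addresses.
        if previous <= current {
            break;
        }
        current = previous;
    }
    frames
}
```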
Resources
This page is a collection of links that might be referenced throughout the book or that are otherwise useful as additional material, in no particular order.
Documentation
YAGCD - Yet Another GameCube Documentation
Documentation on all aspects of the GameCube.
https://www.gc-forever.com/yagcd
IBM PowerPC 750CL User Manual
Manual for the commercial variant of the GameCube’s CPU.
The GameCube's CPU, the PowerPC Gekko, has its own manual. It is, however, marked as “confidential”. That doesn't make it any harder to find with a simple Google search, so its confidentiality is up for debate.
https://fail0verflow.com/media/files/ppc_750cl.pdf
PowerPC Programming Environments Manual
Manual that describes common features of the PowerPC architecture in detail. CPU manuals often refer to it instead of explaining things themselves.
http://refspecs.linux-foundation.org/PPC_hrm.2005mar31.pdf
GameCube Architecture - A Practical Analysis
An overview of the GameCube’s architecture. A great starting point for getting to know the system.
https://www.copetti.org/writings/consoles/gamecube/
Dolwin Docs
Collection of docs shared by the developers of the Dolwin emulator.
https://github.com/ogamespec/dolwin-docs/tree/master
libogc API documentation
Documentation of devkitPro’s libogc
https://libogc.devkitpro.org/api_doc.html
Dolphin Emulator
The most comprehensive GameCube/Wii emulator available.
- Blog: https://dolphin-emu.org/blog/
- Source Code: https://github.com/dolphin-emu/dolphin
DenSinH’s GameCube Resources
https://github.com/DenSinH/GameCubeResources
Nintendo GameCube device tree - Linux Kernel Docs
https://www.kernel.org/doc/Documentation/devicetree/bindings/powerpc/nintendo/gamecube.txt
PowerPC ABI supplement to System V ABI
http://math-atlas.sourceforge.net/devel/assembly/elfspec_ppc.pdf
Developing PowerPC Embedded Application Binary Interface (EABI) Compliant Programs
http://class.ece.iastate.edu/arun/Cpre381_Sp06/lab/labw12a/eabi_app.pdf
GameCube Flipper semidocumentation
Small page about working with GX.
http://www.amnoid.de/gc/tev.html
Making a Wii game in 2024
Mostly talks about working with GX.
https://blog.allpurposem.at/making-a-wii-game-in-2024
Analog Video
Resources on analog video signals, useful for understanding the video interface.
https://www.sciencedirect.com/topics/engineering/composite-video-signal
https://www.analog.com/en/resources/technical-articles/basics-of-analog-video.html
https://www.analog.com/en/resources/technical-articles/understanding-analog-video-signals.html
Tests, Examples and Demos
Dolphin’s Hardware Test Suite
https://github.com/dolphin-emu/hwtests
devkitPro’s GameCube Examples
https://github.com/devkitPro/gamecube-examples
panda.dol
Extremely simple program, probably the best first thing an emulator should try to run!
https://github.com/sliice-emu/test-roms/blob/main/panda/source/panda.asm