Introduction

This book is a collection of notes about the Nintendo GameCube, geared towards emulator development. Currently, it is not sufficient on its own - please see the resources page for more.

Note

While information in this book is supposed to be correct, it might just not be! Documentation on the GameCube is scarce and it’s not uncommon for it to be wrong.

These notes are mostly just me trying to share what I’ve learned and found while developing my own GameCube emulator. If something is wrong (or missing), sorry! Consider opening an issue or creating a pull request in that case. Help cubenotes become better!

General Conventions

  • Least significant bit is 0 (unlike PowerPC manuals where the most significant bit is 0)
  • Ranges of values follow Rust’s syntax, i.e. a..b is a range from a (inclusive) to b (exclusive) and a..=b is a range from a (inclusive) to b (inclusive). If either a or b is missing, the range starts/ends at its respective limit (i.e. start or end of the valid range)
  • Hexadecimal values are always prefixed with 0x and separated with an underscore every 4 digits, e.g. 0xDEAD_BEEF
  • Likewise, binary values are always prefixed with 0b and separated with an underscore every 4 digits, e.g. 0b1111_0000_1010_0011
  • Addresses are physical unless stated otherwise
  • Every MMIO register is both readable and writable unless stated otherwise
  • Every bit flag is 0 for false and 1 for true unless stated otherwise
  • Descriptions prefixed with [W] are only valid for writes while those prefixed with [R] are only valid for reads

These are not all of the conventions used in cubenotes. Other conventions will be listed when appropriate.

Overview

The Nintendo GameCube is a console released in 2001/2002 and is the successor to the Nintendo 64. It belongs to the sixth generation of consoles, together with others like Sony’s PlayStation 2 and Microsoft’s Xbox.

While the GameCube is frequently overshadowed by its big brother, the Nintendo Wii, it is a very charming console, powerful for its time.

Some of its best selling games are:

  • Super Smash Bros. Melee
  • Mario Kart: Double Dash
  • Super Mario Sunshine
  • The Legend of Zelda: The Wind Waker
  • Luigi’s Mansion
  • Metroid Prime
  • Animal Crossing

And these are just a few of the approximately 650 games released for the system!

Hardware

Memory: 43 MiB total

  • RAM: 24 MiB (2 x 12 MiB) 1T-SRAM[1] running at 324 MHz (CPU / 1.5) (Flipper * 2)
  • VRAM: 3 MiB 1T-SRAM[1] memory embedded within Flipper, 2 MiB for framebuffers and 1 MiB for textures
  • ARAM: 16 MiB DRAM connected to Flipper, used as Auxiliary RAM, usually for audio

DVD Reader

  • Reads miniDVD sized Nintendo optical discs
  • Discs are 1.46 GiB, read at constant angular velocity (in practice, this means data in the outer part of the disc is read faster than data in the inner part)

Motherboard

Here’s an overview of the GameCube’s motherboard, with some important components labeled:

And here’s a diagram of the architecture:

A note on buses:

  • Northbridge: Connects the CPU to the Flipper. It is 64-bit and runs on the Flipper clock.
  • Southbridge: Connects both 12 MiB 1T-SRAM chips to the Flipper. It is 64-bit and runs at double the Flipper clock.
  • Eastbridge: Connects ARAM to the Flipper. It is 8-bit and runs at half the Flipper clock.

  1. A kind of pseudo-static RAM (PSRAM). Internally it’s just DRAM, but it’s made to behave like SRAM from an outside point of view.

Gekko (CPU)

The GameCube CPU is the IBM PowerPC Gekko:

  • Runs at 486 MHz
  • 32-bit
  • 64-bit Floating Point Unit
  • 32 KiB L1 Instruction Cache
  • 32 KiB L1 Data Cache[1]
  • 256 KiB L2 Unified Cache
  • Single core
  • Pipelined (instructions are divided into multiple stages of execution, allowing multiple instructions to be at different stages at the same time)
  • Superscalar (contains multiple execution units which can work on independent instructions at the same time)
  • Contains a SIMD extension geared towards 3D graphics called “Paired Singles”

  1. The data cache can be split into two 16 KiB sections and one of them can be mapped into memory to be used as a “scratchpad” (super fast RAM).

Memory Mapping in the Gekko

The Gekko has two main ways of mapping logical addresses to physical ones. The first is a mechanism called BATs (Block Address Translation), which most games use exclusively. The second one is a classic segmentation mechanism with page tables.

Most games do not use page tables and, instead, just use the default BAT configuration provided by the Dolphin OS.

Note

PowerPC manuals refer to an untranslated address as an effective address, which cubenotes instead calls a logical address.

Warning

BATs and page tables/segmentation are not mutually exclusive and can be used together. Translation using BATs takes precedence over translation using page tables, and if both fail then an exception occurs.

Block Address Translation (BAT)

The BAT mechanism allows the Gekko to define up to 8 “blocks” which map a range of logical addresses to a range of physical addresses. These 8 blocks are split into two: one half dedicated to instruction address translation and the other half dedicated to data address translation.

Each block is controlled by a pair of registers, e.g. IBAT2U and IBAT2L control the [I]nstruction BAT block 2. Blocks have three main properties:

Block length

The length of the block, ranging from the minimum of 128 KiB to the maximum of 256 MiB, stepping in powers of two (i.e. 128 KiB, 256 KiB, …, 128 MiB, 256 MiB).

Logical address start

This is the start of the block in the logical address space. This value must be a multiple of the block length.

Physical address start

This is the start of the block in the physical address space. This value must be a multiple of the block length.

Warning

Active BAT blocks should not overlap. If they do, the behaviour is unspecified.
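The three block properties can be sketched as a small Rust type. This is only an illustration of the translation rule: the BatBlock struct and its field names are hypothetical and do not reflect the actual layout of the BAT registers.

```rust
/// A hypothetical, simplified view of one BAT block (not the real register layout).
#[derive(Clone, Copy)]
struct BatBlock {
    /// Start of the block in the logical address space.
    logical_start: u32,
    /// Start of the block in the physical address space.
    physical_start: u32,
    /// Block length in bytes: a power of two in 128 KiB..=256 MiB.
    length: u32,
}

impl BatBlock {
    /// Checks the constraints on length and alignment described above.
    fn is_valid(&self) -> bool {
        const MIN: u32 = 128 * 1024;
        const MAX: u32 = 256 * 1024 * 1024;
        self.length.is_power_of_two()
            && (MIN..=MAX).contains(&self.length)
            && self.logical_start % self.length == 0
            && self.physical_start % self.length == 0
    }

    /// Translates a logical address, or returns `None` if it lies outside the block.
    fn translate(&self, logical: u32) -> Option<u32> {
        let offset = logical.wrapping_sub(self.logical_start);
        (offset < self.length).then(|| self.physical_start.wrapping_add(offset))
    }
}
```

A translation miss (None) in every active block is what would hand the address over to the segmentation mechanism.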

Segmentation and Page Tables

The segmentation mechanism works by splitting the logical address space into 16 contiguous segments of 256 MiB. These segments are then subdivided into 4 KiB pages. Each segment is associated with a segment descriptor register which controls how to map a logical address in it to a virtual address, which is 52 bits wide.

The most significant 40 bits of a virtual address are what’s called a virtual page number. This value is used to search for a corresponding entry in the page table, which will then map it to a physical page number, which is 20 bits wide. Replacing the virtual page number with the physical page number yields the physical address, which is the result of the translation.
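The bit widths above can be turned into a rough sketch of how an address is taken apart and put back together. The function names are made up for illustration, and the assumption that the segment register contributes the upper 24 bits of the virtual page number (40 minus the 16 page index bits) comes from the PowerPC architecture rather than from this text. The page table lookup itself is not shown.

```rust
/// Parts of a 32-bit logical address, as seen by the segmentation mechanism.
struct LogicalParts {
    /// Index of the 256 MiB segment (16 segments, so 4 bits).
    segment: u32,
    /// Index of the 4 KiB page inside the segment (16 bits).
    page: u32,
    /// Byte offset inside the page (12 bits).
    offset: u32,
}

fn split_logical(addr: u32) -> LogicalParts {
    LogicalParts {
        segment: addr >> 28,
        page: (addr >> 12) & 0xFFFF,
        offset: addr & 0xFFF,
    }
}

/// Builds the 40-bit virtual page number.
/// Assumption: `vsid` is the 24-bit value provided by the segment's register.
fn virtual_page_number(vsid: u32, page: u32) -> u64 {
    (u64::from(vsid & 0x00FF_FFFF) << 16) | u64::from(page & 0xFFFF)
}

/// Combines a 20-bit physical page number with the untouched page offset.
fn physical_address(physical_page: u32, offset: u32) -> u32 {
    ((physical_page & 0xF_FFFF) << 12) | (offset & 0xFFF)
}
```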

Page Table

The page table is a hash table of entry groups, each containing 8 entries.

The translation pipeline

The BAT mechanism always takes precedence over the segmentation mechanism.

Exceptions

Exceptions in the PowerPC architecture are defined as a mechanism which allows the CPU to change state to deal with unusual conditions which might arise during execution or come from external sources. There are multiple exception kinds.

Classification of exceptions is a somewhat confusing topic in the PowerPC manuals, so here’s the convention used in cubenotes:

Classification | Cause
Internal Exception | Instruction Execution
External Exception (Interrupt) | Anything else

Exception Kinds

Vector | Exception | Classification
0x0100 | Reset | Interrupt[1]
0x0200 | Machine Check | Interrupt[1]
0x0300 | DSI | Internal Exception
0x0400 | ISI | Internal Exception
0x0500 | External Interrupt | Interrupt
0x0600 | Alignment | Internal Exception
0x0700 | Program | Internal Exception
0x0800 | Floating Point Unavailable | Internal Exception
0x0900 | Decrementer | Interrupt
0x0C00 | System Call | Internal Exception
0x0D00 | Trace | Internal Exception
0x0F00 | Performance Monitor | Internal Exception
0x1300 | Breakpoint | Internal Exception

For more information regarding exception kinds, consult the PowerPC manuals.

Masking Exceptions

Some exception kinds may be masked (i.e. disabled) depending on CPU configuration:

  • Floating point exceptions may be disabled depending on MSR.FE0 and MSR.FE1
  • Maskable interrupts must be enabled through MSR.EE

Processing an Exception

When an exception occurs, the CPU saves its current state and starts executing at an address specific to the kind of the exception, called the exception vector.

Saving State

Whenever the CPU identifies an exception, it handles it by first saving the current state of the CPU to the SRR0 and SRR1 registers.

SRR0: Address to resume after exception handling

The SRR0 register is updated to contain the address where execution should resume once the exception handler is finished.

Usually, this is the address of the instruction which either caused the exception or, for exceptions caused by external sources, that would execute when the exception happened. Some exception kinds, however, have it as the address of the instruction right after that.

SRR1: Machine Status

The SRR1 register is updated to contain parts of the machine status register and, sometimes, extra information that’s exception specific.

For most exceptions, bits 0..16 and 22..27 contain the corresponding bits in MSR, while bits 27..30 and 19..22 contain exception specific information.

Update the Machine State Register

MSR is updated to represent the new context of execution. This is equivalent to zeroing out all of its bits, except for MSR.ILE, MSR.ME, MSR.IP and MSR.LE (which gets assigned MSR.ILE).

Resume execution

Execution resumes as normal. Eventually, the exception handler will execute an rfi instruction, which will then return to the address in SRR0 and restore MSR to its original state by copying bits of SRR1 into it.
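The whole sequence can be sketched for an emulator roughly as follows. This is a simplification under a few assumptions: the Cpu struct and its method names are hypothetical, the MSR bit positions and the 0xFFF0_0000 vector base selected by MSR.IP are taken from the PowerPC manuals rather than from this text, and rfi is assumed to restore exactly the bits that were saved into SRR1.

```rust
// MSR bit positions (least significant bit is 0).
const MSR_LE: u32 = 1 << 0;
const MSR_IP: u32 = 1 << 6;
const MSR_ME: u32 = 1 << 12;
const MSR_ILE: u32 = 1 << 16;

/// Bits 0..16 and 22..27: copied from MSR into SRR1.
const SRR1_MSR_MASK: u32 = 0x0000_FFFF | 0x07C0_0000;
/// Bits 19..22 and 27..30: exception specific information.
const SRR1_INFO_MASK: u32 = 0x0038_0000 | 0x3800_0000;

struct Cpu {
    pc: u32,
    msr: u32,
    srr0: u32,
    srr1: u32,
}

impl Cpu {
    /// `vector` is the exception vector (e.g. 0x0500), `resume` the address to
    /// return to and `info` any exception specific bits for SRR1.
    fn raise_exception(&mut self, vector: u32, resume: u32, info: u32) {
        // Save state.
        self.srr0 = resume;
        self.srr1 = (self.msr & SRR1_MSR_MASK) | (info & SRR1_INFO_MASK);

        // Zero MSR except for ILE, ME and IP, then assign ILE to LE.
        let kept = self.msr & (MSR_ILE | MSR_ME | MSR_IP);
        let le = if kept & MSR_ILE != 0 { MSR_LE } else { 0 };
        self.msr = kept | le;

        // Assumption: MSR.IP selects the high vector base.
        let base = if self.msr & MSR_IP != 0 { 0xFFF0_0000 } else { 0 };
        self.pc = base | vector;
    }

    /// Returns from the handler, restoring the saved MSR bits.
    fn rfi(&mut self) {
        self.msr = (self.msr & !SRR1_MSR_MASK) | (self.srr1 & SRR1_MSR_MASK);
        self.pc = self.srr0;
    }
}
```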


  1. These interrupts are non maskable, i.e. cannot be disabled by MSR.EE

Write Gather Pipe

The write gather pipe is a CPU mechanism for transferring bursts of 32-byte data to external memory. It has a 128 byte ring buffer and is controlled by the HID2 and WPAR registers.

Registers

WPAR (SPR 921)

Bits | Name | Description
0 | BNE | Whether the ring buffer has data (Buffer Not Empty) (Read Only)
1..5 | Reserved
5..32 | Gather Address | High order bits of the address to gather writes from

Operation

It operates by redirecting any writes to the gather address to the internal ring buffer. Once the buffer has 32 bytes of data or more, the write gather pipe will actually transfer the data to memory (in chunks of 32 bytes). The destination is the same address - the write gather pipe acts as a proxy between the CPU and the external memory.

Note

The write gather pipe packs writes - it does not insert any sort of padding in order to “align” values in the internal buffer

In the GameCube, it is always pointed at the address of the PI FIFO (0x0C00_8000) and is used to build display lists.
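The packing behaviour can be modelled with a small buffer that releases data in 32-byte chunks. This is a minimal sketch: the WriteGatherPipe type and its methods are hypothetical, and a growable Vec stands in for the real 128 byte ring buffer.

```rust
/// Minimal model of the write gather pipe's packing behaviour.
struct WriteGatherPipe {
    /// Pending bytes; the real hardware uses a 128 byte ring buffer.
    buffer: Vec<u8>,
}

impl WriteGatherPipe {
    /// Size of one transfer to external memory.
    const BURST: usize = 32;

    fn new() -> Self {
        Self { buffer: Vec::with_capacity(128) }
    }

    /// Queues the bytes of one CPU write (no padding is inserted) and
    /// returns every complete 32-byte burst that became available.
    fn write(&mut self, bytes: &[u8]) -> Vec<[u8; 32]> {
        self.buffer.extend_from_slice(bytes);
        let mut bursts = Vec::new();
        while self.buffer.len() >= Self::BURST {
            let mut burst = [0u8; 32];
            burst.copy_from_slice(&self.buffer[..Self::BURST]);
            self.buffer.drain(..Self::BURST);
            bursts.push(burst);
        }
        bursts
    }

    /// Mirrors the BNE bit of WPAR.
    fn buffer_not_empty(&self) -> bool {
        !self.buffer.is_empty()
    }
}
```

In an emulator, each returned burst would then be forwarded to whatever handles writes at the gather address.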

Flipper

The ATI Flipper is a complex IC with multiple systems embedded into it, such as the DSP, I/O controllers and more. It hosts the graphics processor, also known as the GX, which is the GameCube’s GPU.

  • Runs at 162 MHz (CPU / 3)

Digital Signal Processor

The DSP in the GameCube is a custom Macronix IC embedded within Flipper. It has its own instruction set, its own internal memory and its own quirks.

  • 16 bit processor
  • Runs at 81 MHz (CPU / 6)
  • Memory: Addressed by words instead of bytes. Harvard architecture.
    • Instruction RAM: 8 KiB (4096 words)
    • Instruction ROM: 8 KiB (4096 words)
    • Data RAM: 8 KiB (4096 words)
    • Data ROM: 4 KiB (2048 words)

Memory Map

The GameCube’s physical memory map is as follows:

Address Range | Size | Region
0x0000_0000..0x0180_0000 | 24 MiB | RAM
0x0C00_0000..0x0C00_1000 | 4 KiB | CP Registers
0x0C00_1000..0x0C00_2000 | 4 KiB | PE Registers
0x0C00_2000..0x0C00_3000 | 4 KiB | VI Registers
0x0C00_3000..0x0C00_4000 | 4 KiB | PI Registers
0x0C00_4000..0x0C00_5000 | 4 KiB | MI Registers
0x0C00_5000..0x0C00_6000 | 4 KiB | DSPI Registers
0x0C00_6000..0x0C00_6400 | 1 KiB | DI Registers
0x0C00_6400..0x0C00_6800 | 1 KiB | SI Registers
0x0C00_6800..0x0C00_6C00 | 1 KiB | EXI Registers
0x0C00_6C00..0x0C00_6C20 | 32 B | AI Registers
0x0C00_8000..0x0C00_8020 | 32 B | GX FIFO
0xE000_0000..0xE000_4000 | 16 KiB | Scratchpad
0xFFF0_0000.. | 1 MiB | IPL (First MiB)

Real addressing mode is, however, very uncommon. Most games use memory address translation in order to map a logical address to a physical one.

Reserved RAM regions

In the PowerPC architecture, some predefined regions of RAM are reserved for specific usages:

Address Range | Size | Purpose
0x0000_0000..0x0000_0100 | 256 B | Reserved for the OS
0x0000_0100..0x0000_1000 | 3.75 KiB | Exception Vectors
0x0000_1000..0x0000_3000 | 8 KiB | Implementation Specific

Dolphin OS uses the implementation specific region as an extension to the exception vectors region.

Middle Endian

Some memory mapped registers are middle endian. Yep, you read that right. They are 4 byte long, but divided into two 2 byte consecutive parts low and high (in that order), and these parts themselves are big endian.

In practice, this means the byte significance order is [1, 0, 3, 2] instead of the expected big endian [3, 2, 1, 0].
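As an illustration, here is one way to convert between the in-memory bytes of such a register and its value. The function names are made up for this sketch.

```rust
/// Decodes a 4 byte middle endian register from its bytes in memory order.
fn read_middle_endian(bytes: [u8; 4]) -> u32 {
    // The low half comes first, then the high half. Each half is big endian.
    let low = u32::from(u16::from_be_bytes([bytes[0], bytes[1]]));
    let high = u32::from(u16::from_be_bytes([bytes[2], bytes[3]]));
    (high << 16) | low
}

/// Encodes a value into the in-memory bytes of a middle endian register.
fn write_middle_endian(value: u32) -> [u8; 4] {
    let low = (value as u16).to_be_bytes();
    let high = ((value >> 16) as u16).to_be_bytes();
    [low[0], low[1], high[0], high[1]]
}
```

With byte N being the byte of significance N, the value 0x3322_1100 is stored as [0x11, 0x00, 0x33, 0x22], which matches the [1, 0, 3, 2] order.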

Graphics Processor (GX)

The graphics processor (GX or GP) is the GPU within Flipper and is responsible for rendering content into the embedded framebuffer. It operates by reading command lists from memory.

Glossary

Flipper has a lot of acronyms for its parts. The following is a list of acronyms for parts that belong to the GX.

  • TEV: Texture Environment
  • RASN: Rasterizer N (0 or 1)
  • PE: Pixel Engine
  • SU: Setup Unit
  • CP: Command Processor
  • XF: Transform Unit

Command Processor (CP)

The command processor (CP) is responsible for fetching and processing GX commands generated by the CPU from an in-memory ring buffer (FIFO).

Registers

Warning

All FIFO registers are middle endian.

CP Status (0x0C00_0000, 2 bytes)

Contains status regarding the command processor.

Bits | Name | Description
0 | CP FIFO Overflow | Watermark logic
1 | CP FIFO Underflow | Watermark logic
2 | CP read idle
3 | CP command idle
4 | Breakpoint Interrupt
5..16 | Unused

CP Control (0x0C00_0002, 2 bytes)

Controls the command processor.

Bits | Name | Description
0 | CP FIFO Read Enable | Whether the CP will read commands from the FIFO
1 | CP FIFO Breakpoint Enable
2 | CP FIFO Overflow IRQ Enable
3 | CP FIFO Underflow IRQ Enable
4 | CP FIFO Linked Mode | Controls the FIFO mode. Is set on reset.
5 | CP FIFO Breakpoint IRQ Enable
6..16 | Unused

CP Clear (0x0C00_0004, 2 bytes, write only)

Bits | Name | Description
0 | Clear CP FIFO Overflow | Write 1 to clear status bit 0
1 | Clear CP FIFO Underflow | Write 1 to clear status bit 1
2..16 | Unused

CP FIFO Start (0x0C00_0020, 4 bytes)

Start address of the ring buffer.

CP FIFO End (0x0C00_0024, 4 bytes)

End address of the ring buffer (exclusive).

CP FIFO High Watermark (0x0C00_0028, 4 bytes)

Ring buffer’s high watermark, i.e. the remaining capacity of the CP FIFO that triggers a high watermark interrupt.

CP FIFO Low Watermark (0x0C00_002C, 4 bytes)

Ring buffer’s low watermark, i.e. the remaining capacity of the CP FIFO that triggers a low watermark interrupt.

CP FIFO Count (0x0C00_0030, 4 bytes)

Distance between the FIFO read and write pointers.

CP FIFO Write Pointer (0x0C00_0034, 4 bytes)

Current address of the FIFO write pointer (i.e. where new data is going to be written).

CP FIFO Read Pointer (0x0C00_0038, 4 bytes)

Current address of the FIFO read pointer (i.e. where new data is going to be read from).

Command Processor FIFO

The command processor consumes commands from RAM using a ring buffer mechanism controlled by a bunch of registers. The FIFO has two modes of operations: linked and multi-buffer.

Linked Mode

In this mode, the CP FIFO is linked to the PI FIFO. Whenever a value is written to the PI FIFO, the value of the CP FIFO write pointer is increased by 4.

This way, whenever a command is written to the PI FIFO the CP can process it immediately. Of course, this only really makes sense if the CP FIFO write pointer is the same as the PI FIFO write pointer.

This mode also contains some special logic called “watermark”:

  • If the CP count becomes smaller than the low watermark, then a CP FIFO underflow interrupt is generated.
  • If the CP count is greater than the high watermark, then a CP FIFO overflow interrupt is generated.

Whenever one of these interrupts is active the CP stops processing new commands.

Watermark essentially allows the CP to signal to the system whether it’s close to filling up or close to running out of commands.
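The two watermark rules can be written down as a small check. This is just a sketch of the comparison: the names are hypothetical, and how the result feeds into the status bits and IRQ enables is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WatermarkEvent {
    /// The FIFO is close to running out of commands.
    Underflow,
    /// The FIFO is close to filling up.
    Overflow,
}

/// Compares the CP FIFO count against both watermarks.
fn check_watermark(count: u32, low: u32, high: u32) -> Option<WatermarkEvent> {
    if count < low {
        Some(WatermarkEvent::Underflow)
    } else if count > high {
        Some(WatermarkEvent::Overflow)
    } else {
        None
    }
}
```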

Multi-buffer Mode

TODO

Commands

GX commands are identified by a single byte opcode as described in the following table:

Opcode | Name
0b0000_0000 | NOP
0b0000_1XXX | Set CP register
0b0001_0XXX | Set XF registers
0b0010_0XXX | Set XF registers indexed (A)
0b0010_1XXX | Set XF registers indexed (B)
0b0011_0XXX | Set XF registers indexed (C)
0b0011_1XXX | Set XF registers indexed (D)
0b0100_0XXX | Call
0b0100_1XXX | Invalidate vertex cache
0b0110_0001 | Set BP register
0b1000_0VVV | Draw quads
0b1001_0VVV | Draw triangles
0b1001_1VVV | Draw triangle strip
0b1010_0VVV | Draw triangle fan
0b1010_1VVV | Draw lines
0b1011_0VVV | Draw line strip
0b1011_1VVV | Draw points

Where VVV is a vertex attribute table index.

These commands are going to be explained in more detail in the following sections. Commands which require extra data will have a table at the end describing it, with rows in the order data is received.
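The opcode table maps fairly directly onto a decoder. The following is a sketch with hypothetical type names. It assumes the XXX bits are "don't care" and treats every opcode not listed in the table as invalid.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Primitive {
    Quads,
    Triangles,
    TriangleStrip,
    TriangleFan,
    Lines,
    LineStrip,
    Points,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Command {
    Nop,
    SetCp,
    SetXf,
    /// Indexed variant, identified by its letter (A to D).
    SetXfIndexed(char),
    Call,
    InvalidateVertexCache,
    SetBp,
    /// `vat` is the vertex attribute table index (the VVV bits).
    Draw { primitive: Primitive, vat: u8 },
}

fn decode_opcode(opcode: u8) -> Option<Command> {
    // These two opcodes are listed with every bit fixed.
    match opcode {
        0b0000_0000 => return Some(Command::Nop),
        0b0110_0001 => return Some(Command::SetBp),
        _ => {}
    }

    let vat = opcode & 0b111;
    let draw = |primitive| Some(Command::Draw { primitive, vat });

    // The remaining opcodes are identified by their upper 5 bits.
    match opcode >> 3 {
        0b00001 => Some(Command::SetCp),
        0b00010 => Some(Command::SetXf),
        0b00100 => Some(Command::SetXfIndexed('A')),
        0b00101 => Some(Command::SetXfIndexed('B')),
        0b00110 => Some(Command::SetXfIndexed('C')),
        0b00111 => Some(Command::SetXfIndexed('D')),
        0b01000 => Some(Command::Call),
        0b01001 => Some(Command::InvalidateVertexCache),
        0b10000 => draw(Primitive::Quads),
        0b10010 => draw(Primitive::Triangles),
        0b10011 => draw(Primitive::TriangleStrip),
        0b10100 => draw(Primitive::TriangleFan),
        0b10101 => draw(Primitive::Lines),
        0b10110 => draw(Primitive::LineStrip),
        0b10111 => draw(Primitive::Points),
        _ => None,
    }
}
```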

NOP

Does nothing.

Set CP register

Sets a CP register to a given value.

Size | Name
1 byte | Register
4 bytes | Value

Set XF registers

Sets XF registers to given values.

Size | Name
2 bytes | Length - 1
2 bytes | First address
(4 * length) bytes | Values

Set XF indexed

Sets XF registers to given values.

Size | Name
2 bytes | Index
2 bytes | Length and Address

The last 2 bytes have the following format:

Size | Name
4 bits | Length - 1
12 bits | First Address
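Splitting those last 2 bytes could look like the sketch below. It assumes the field listed first (Length - 1) occupies the most significant 4 bits of the big endian halfword, which the table does not state explicitly. The function name is hypothetical.

```rust
/// Splits the "Length and Address" halfword into (length, first address).
/// Assumption: "Length - 1" sits in the upper 4 bits.
fn split_length_and_address(halfword: u16) -> (u16, u16) {
    let length = (halfword >> 12) + 1;
    let first_address = halfword & 0x0FFF;
    (length, first_address)
}
```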

Call

Calls a command list.

Size | Name
4 bytes | Address
4 bytes | Count (in words?)

Invalidate vertex cache

Invalidates the vertex cache.

Set BP register

Sets a BP register to a given value.

Size | Name
1 byte | Register
3 bytes | Value

All drawing commands

Draws a series of primitives.

Size | Name
2 bytes | Vertex count
Variable | Vertex attributes

Vertex Attribute Tables

Pixel Engine (PE)

The pixel engine (PE) is responsible for the final stage of the rendering process, performing effects like anti-aliasing and blending operations on the final image.

Transform Unit (XF)

The transform unit (XF) is responsible for performing transformations on positions, normals and even texture coordinates.

Internal memory

Addressable unit is a word (4 bytes), address space is 16-bit.

Address Range | Size | Region
0x0000..0x0100 | 1 KiB | Position matrix memory
0x0400..0x0460 | 96 B | Normal matrix memory
0x0500..0x0600 | 256 B | Dual texture transform matrix memory
0x0600..0x0680 | 128 B | Light memory
0x1000..0x1057 | 128 B | Internal registers

Normal memory and light memory only keep the 20 most significant bits of written values.

Position matrix memory

This region is organized as 64 groups of 4 words. Each group represents a column in a matrix, and each column can be used as the beginning of a 4x3 matrix.

Example matrix starting at 0x0000 (values are offsets):

00 04 08
01 05 09
02 06 10
03 07 11

Matrix beginning at 0x0000 is usually the position matrix.
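Following the example layout, the word address of a matrix element can be computed from the address of the first column. This is a small sketch, and the function name is hypothetical.

```rust
/// Number of words in one group (column) of position matrix memory.
const WORDS_PER_COLUMN: u16 = 4;

/// Word address of the element at (`row`, `column`) of the matrix that
/// begins at word address `base`, laid out as in the example above.
fn matrix_word_address(base: u16, row: u16, column: u16) -> u16 {
    assert!(row < 4 && column < 3, "the example layout has 4 rows and 3 columns");
    base + column * WORDS_PER_COLUMN + row
}
```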

Normal matrix memory

This region is organized as 32 groups of 3 words. Each group represents a column in a matrix, and each column can be used as the beginning of a 3x3 matrix.

Dual texture transform matrix memory

This region is organized exactly the same as the position matrix memory.

Light memory

Contains all lighting information.

Texture Environment (TEV)

The texture environment (TEV) is responsible for blending vertex colors and textures through a fixed function pipeline of at most 16 stages. These stages are the closest thing to shaders in the GX.

Stages

A TEV stage performs a configurable operation on some color inputs and generates an output color that can be fed into the next stage. Stages are executed sequentially. The output of the last stage is the final color.

The TEV has a set of 4 registers, called R0, R1, R2 and R3 (also called PREV), which are shared between all stages and can be used as either input or output of each one of them. Since the color and alpha components are computed separately, they can be referred to as R0C, R0A, R1C, R1A, etc.

TEV stages operate on normalized floating-point RGBA color values (i.e. each channel has a value in [0, 1]). The alpha channel is special and can be configured separately.

Each TEV stage has 4 input values A, B, C and D. It has only one output, which will be called OUT. The operation of a TEV stage is always of the following form:

value = [sign * (A * (1.0 - C) + B * C) + D + bias] * scale
if (clamp)
    OUT = clamp(value)
else
    OUT = value

In the operation, sign, bias, scale and clamp are configurable:

Variable | Possible values
Sign | 1 and -1
Bias | 0, 0.5 and -0.5
Scale | 0.5, 1, 2 and 4
Clamp | 0 and 1 (boolean)
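For a single channel, the stage operation can be sketched in Rust as shown below. The StageConfig type is hypothetical, and clamping to [0, 1] is an assumption based on the normalized range of the color values.

```rust
/// Configurable parameters of a TEV stage operation.
#[derive(Clone, Copy)]
struct StageConfig {
    /// 1.0 or -1.0
    sign: f32,
    /// 0.0, 0.5 or -0.5
    bias: f32,
    /// 0.5, 1.0, 2.0 or 4.0
    scale: f32,
    clamp: bool,
}

/// Computes OUT for one channel from the inputs A, B, C and D.
fn tev_stage(a: f32, b: f32, c: f32, d: f32, cfg: StageConfig) -> f32 {
    let value = (cfg.sign * (a * (1.0 - c) + b * c) + d + cfg.bias) * cfg.scale;
    if cfg.clamp {
        // Assumption: clamping restricts the result to the [0, 1] range.
        value.clamp(0.0, 1.0)
    } else {
        value
    }
}
```

A color operation would apply this to each of the R, G and B channels, while the alpha operation applies it once with its own configuration.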

Inputs

Each input value (A, B, C and D) can be configured to be sourced from one of the following values for the color operation:

Code | Value
0x0 | R3C
0x1 | R3A
0x2 | R0C
0x3 | R0A
0x4 | R1C
0x5 | R1A
0x6 | R2C
0x7 | R2A
0x8 | Texture Color
0x9 | Texture Alpha
0xA | Rasterizer Color
0xB | Rasterizer Alpha
0xC | One
0xD | One half
0xE | Constant
0xF | Zero

And one of the following values for the alpha operation:

Code | Value
0x0 | R3A
0x1 | R0A
0x2 | R1A
0x3 | R2A
0x4 | Texture Alpha
0x5 | Rasterizer Alpha
0x6 | Constant
0x7 | Zero

The texture and rasterizer colors available to a given stage are also configurable.

Outputs

The output value (OUT) can only be put into one of the 4 TEV registers (R0, R1, R2 and R3).

Register formats

TEV Stage Color

This is the format of the GX registers for TEV stages color configuration:

Bits | Name | Description
0..4 | Input D | Source of input D
4..8 | Input C | Source of input C
8..12 | Input B | Source of input B
12..16 | Input A | Source of input A
16..18 | Bias | Bias value (see below)
18 | Negate | Sign is -1 if set, +1 otherwise
19 | Clamp | Enables clamping
20..22 | Scale | Scale value (see below)
22..24 | Out | Destination register of output value (see below)

Bias value selection:

Code | Value
0b00 | 0
0b01 | +0.5
0b10 | -0.5
0b11 | Reserved

Scale value selection:

Code | Value
0b00 | 1
0b01 | 2
0b10 | 4
0b11 | 0.5

Destination register selection:

Code | Value
0b00 | R3
0b01 | R0
0b10 | R1
0b11 | R2
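Decoding a TEV stage color register could be sketched as follows. The struct and function names are made up for this example, and the reserved bias code is surfaced as None rather than guessing at its behaviour.

```rust
/// Extracts bits `start..end` of `value` (end exclusive, as in the tables).
fn bits(value: u32, start: u32, end: u32) -> u32 {
    (value >> start) & ((1u32 << (end - start)) - 1)
}

#[derive(Debug, PartialEq)]
struct TevColorStage {
    /// Source codes of inputs A, B, C and D, in that order.
    inputs: [u8; 4],
    /// `None` for the reserved code 0b11.
    bias: Option<f32>,
    sign: f32,
    clamp: bool,
    scale: f32,
    /// Destination register index N, as in RN.
    out: u8,
}

fn decode_tev_color(value: u32) -> TevColorStage {
    TevColorStage {
        inputs: [
            bits(value, 12, 16) as u8,
            bits(value, 8, 12) as u8,
            bits(value, 4, 8) as u8,
            bits(value, 0, 4) as u8,
        ],
        bias: match bits(value, 16, 18) {
            0b00 => Some(0.0),
            0b01 => Some(0.5),
            0b10 => Some(-0.5),
            _ => None,
        },
        sign: if bits(value, 18, 19) == 1 { -1.0 } else { 1.0 },
        clamp: bits(value, 19, 20) == 1,
        scale: match bits(value, 20, 22) {
            0b00 => 1.0,
            0b01 => 2.0,
            0b10 => 4.0,
            _ => 0.5,
        },
        // Code 0b00 selects R3, the other codes select R0, R1 and R2.
        out: match bits(value, 22, 24) {
            0b00 => 3,
            code => code as u8 - 1,
        },
    }
}
```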

TEV Stage Alpha

This is the format of the GX registers for TEV stages alpha configuration:

Bits | Name | Description
0..2 | Rasterizer swap | Index of swap table to use for rasterizer inputs
2..4 | Texture swap | Index of swap table to use for texture inputs
4..7 | Input D | Source of input D
7..10 | Input C | Source of input C
10..13 | Input B | Source of input B
13..16 | Input A | Source of input A
16..18 | Bias | Bias value (see color format)
18 | Negate | Sign is -1 if set, +1 otherwise
19 | Clamp | Enables clamping
20..22 | Scale | Scale value (see color format)
22..24 | Out | Destination register of output value (see color format)

Texture Coordinate Generator (TC)

The texture coordinate generator (TC) is responsible for generating texture coordinates to be used by the TEV during texture sampling. The process of generating texture coordinates is called texgen.

There are 8 texture coordinate channels, each configured separately.

Video Interface (VI)

The video interface is responsible for outputting a video signal to a display from a framebuffer in main RAM, named the eXternal FrameBuffer (XFB).

For a high level overview on the topic of analog video, take a look at the analog video page.

Registers

Note

Lengths are in samples unless stated otherwise.

Work in Progress

Not all registers are described here yet

VI Vertical Timing (0x0C00_2000, 2 bytes)

This register configures the vertical timing properties of the video signal.

Bits | Name | Description
0..4 | Equalization Pulse Length | Double the length of the EQU pulse in half-lines
4..14 | Active Video Length | Length of the active video section in half-lines

VI Display Config (0x0C00_2002, 2 bytes)

Bits | Name | Description
0 | Enable | Enable video timing generation and data requests
1 | Reset | Clears all requests and puts the interface into an idle state
2 | Progressive | Whether progressive video mode is enabled (interlaced otherwise)
3 | Stereoscopic | Whether the 3D stereoscopic effect is enabled[1]
4..6 | Display Latch 0 Mode
6..8 | Display Latch 1 Mode
8..10 | Video Format | Current video format (0 = NTSC, 1 = Pal50, 2 = Pal60, 3 = Debug)

VI Horizontal Timing 0 (0x0C00_2004, 4 bytes)

This register configures part of the horizontal timing properties of the video signal.

Bits | Name | Description
0..7 | Sync Length | Length of the HSync pulse
7..17 | Sync Start to Blank End Length | Length of interval between HSync start and HBlank end
17..27 | Half-line to Blank Start Length | Length of interval between the half-line and HBlank start

VI Horizontal Timing 1 (0x0C00_2008, 4 bytes)

This register configures part of the horizontal timing properties of the video signal.

Bits | Name | Description
0..9 | Half-line length | Length of a half-line
16..23 | Sync Start to Color Burst End | Length of interval between HSync start and Color Burst end
24..31 | Sync Start to Color Burst Start | Length of interval between HSync start and Color Burst start

VI Odd Field Vertical Timing (0x0C00_200C, 4 bytes)

This register configures the vertical timing properties of the video signal of the odd field.

Bits | Name | Description
0..10 | Pre-blanking length | Length of the pre-blanking period, in half-lines
16..26 | Post-blanking length | Length of the post-blanking period, in half-lines

VI Even Field Vertical Timing (0x0C00_2010, 4 bytes)

This register configures the vertical timing properties of the video signal of the even field.

Same bits as VI Odd Field Vertical Timing.

VI Top Field Base Register (0x0C00_201C, 4 bytes)

This register configures the address of the XFB for the odd field.

Bits | Name | Description
0..24 | Base | Bits 0..24 of the XFB address in physical memory
24..28 | Horizontal Offset
28 | Shift | Whether to shift the base address right by 5

VI Bottom Field Base Register (0x0C00_2024, 4 bytes)

This register configures the address of the XFB for the even field.

Bits | Name | Description
9..24 | Base | Bits 9..24 of the XFB address in physical memory
28 | Shift | Whether to shift the base address right by 5

VI Current Line (Vertical Counter) (0x0C00_202C, 2 bytes)

This register contains a counter for the current line in the frame. It starts at 1 and increases every HSync, up to the number of lines per frame. It is reset at the start of a new frame.

Note

In interlaced mode, a frame is composed of the two fields.

VI Current Sample (Horizontal Counter) (0x0C00_202E, 2 bytes)

This register contains a counter for the current sample in the line. It starts at 1 and increases every sample, up to the number of samples per line. It is reset at the start of a new line.

VI Display Interrupt 0 (0x0C00_2030, 4 bytes)

This register configures the VI interrupt 0.

Bits | Name | Description
0..9 | Horizontal Target | Target value for the current sample
16..25 | Vertical Target | Target value for the current line
28 | Mask | Whether this interrupt is enabled
31 | Status | Whether this interrupt is active

Warning

The value to write in order to acknowledge the interrupt is 0 instead of the usual 1.
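A display interrupt register can be decoded into its fields as in this sketch. The type and method names are hypothetical, and the matches check is only an assumption about when the interrupt would become active (an enabled interrupt whose targets equal the current line and sample counters).

```rust
#[derive(Debug, PartialEq)]
struct DisplayInterrupt {
    horizontal_target: u16,
    vertical_target: u16,
    mask: bool,
    status: bool,
}

impl DisplayInterrupt {
    /// Decodes the fields listed in the table above.
    fn from_bits(value: u32) -> Self {
        Self {
            horizontal_target: (value & 0x1FF) as u16,       // bits 0..9
            vertical_target: ((value >> 16) & 0x1FF) as u16, // bits 16..25
            mask: (value & (1 << 28)) != 0,
            status: (value & (1 << 31)) != 0,
        }
    }

    /// Assumption: the interrupt fires when it is enabled and both counters
    /// reach their targets.
    fn matches(&self, line: u16, sample: u16) -> bool {
        self.mask && line == self.vertical_target && sample == self.horizontal_target
    }
}
```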

VI Display Interrupt 1 (0x0C00_2034, 4 bytes)

This register configures the VI interrupt 1.

Same bits as VI Display Interrupt 0.

VI Display Interrupt 2 (0x0C00_2038, 4 bytes)

This register configures the VI interrupt 2.

Same bits as VI Display Interrupt 0.

VI Display Interrupt 3 (0x0C00_203C, 4 bytes)

This register configures the VI interrupt 3.

Same bits as VI Display Interrupt 0.


  1. Nothing uses this.

Analog Video

Warning

This page is a very high level overview of analog video, and I’m not very knowledgeable on the topic. Take everything here with a grain of salt.

For more information, consult the analog video resources.

Analog video works by scanning an image through a screen from left to right, top to bottom. In order to produce the correct image, the scanning beam must change its “color” according to its position, which is done by timing parts of the scanning process.

The scanning beam is controlled by an analog video signal, which can be thought of as being composed of three component signals:

  • Color signal, which contains the actual color information (called samples)
  • Horizontal synchronization (HSync) signal, which tells the screen to take the scanning beam back to the start of the next scan line
  • Vertical synchronization (VSync) signal, which tells the screen to take the scanning beam back to the top of the screen so it can start scanning out a new image

There are two ways to scan an image out: progressive and interlaced. First, we’ll take a look at progressive video, since it’s simpler to understand and works as a base for interlaced.

Progressive Video

In progressive video, the whole image is scanned out at once. The video signal starts scanning out a line of samples (i.e. a row of image data), which goes roughly like this:

                              │────── H. Back Porch ──────│

  │───── H. Front Porch ──────│

  │───────────────── Horizontal Blank ────────────────────│

Color:
1                                   ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷       ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷
0 ──────────────────────────────────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴───────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴
                                      Color Burst                 Active Video
HSync:
1         ┌───────────────────┐
0 ────────┘                   └───────────────────────────────────────────────────────

VSync:
1
0 ────────────────────────────────────────────────────────────────────────────────────

This diagram represents what the video signal is emitting and what section these parts belong to. Note that this diagram is cyclic (i.e. it repeats once it reaches the end).

The signal starts out by emitting an HSync pulse, telling the screen to move the beam to the start of the next line. Then, there’s a burst of color information (the color burst), used to “calibrate” the color level of the screen. Finally, the actual color information is sent (active video).

The process can be split into three sections:

  • Front Porch: Period between end of active video and the end of HSync pulse.
  • Back Porch: Period between end of HSync pulse and start of active video. This section exists to give time for the beam to move back to the start of the line.
  • Horizontal Blank: This is any period outside of the active video signal.

This process repeats for each scan line until the last one, where things go a little differently: there’s no next line to scan out - the beam needs to go back to the top, where the scanning process can restart. This is when VSync comes into play:

       │──────────────────────── Vertical Blank ──────────────────────────│

Color:
1 ╷╷╷╷╷╷                                                             ╷╷╷  ╷╷╷╷╷╷
0 ┴┴┴┴┴┴─────────────────────────────────────────────────────────────┴┴┴──┴┴┴┴┴┴──────
    AV                                                                B     AV

HSync:
1         ┌───────┐               ┌───────┐               ┌───────┐               ┌───
0 ────────┘       └───────────────┘       └───────────────┘       └───────────────┘

VSync:
1         ┌───────────────────────────────────────────────────────┐
0 ────────┘                                                       └───────────────────

Similarly to the first diagram, this diagram represents what the video signal is emitting when it reaches the end of a scan.

The signal emits a VSync pulse, telling the screen to move the beam back to the top of the screen. The VSync lasts for a few scan lines, where nothing is actually scanned out.

The Vertical Blank is the period between the end of the last visible line in a scan and the start of the first visible line in the next scan.

If we zoom out and take a look at a whole scan, it would look something like this:

Color:
1 ╷╷╷╷╷╷╷╷        ╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷╷
0 ┴┴┴┴┴┴┴┴────────┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴┴

HSync:
1 ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷  ╷
0 ┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──

VSync:
1         ┌───────┐
0 ────────┘       └───────────────────────────────────────────────────────────────────

Interlaced Video

Interlaced video is like progressive video, except it splits the scan (i.e. the image) into two parts with alternating scan lines, called fields.

The odd field scans out odd lines, while the even field scans out even lines. Together, these fields make up a whole scan.
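As a rough illustration, the split into fields can be sketched as below. Numbering scan lines from 1 and the names `Field` and `field_lines` are assumptions made for this example only.

```rust
/// Which field of an interlaced scan a line belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Field {
    Odd,
    Even,
}

/// Returns the scan lines that `field` scans out, for a scan made of
/// `total_lines` lines. Lines are numbered from 1 (an assumption of this sketch).
fn field_lines(field: Field, total_lines: u32) -> Vec<u32> {
    let first = match field {
        Field::Odd => 1,
        Field::Even => 2,
    };
    // Every second line, starting at the first line of the field.
    (first..=total_lines).step_by(2).collect()
}
```

Taken together, the two vectors returned for `Field::Odd` and `Field::Even` cover every line of the scan exactly once.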

Processor Interface (PI)

The processor interface is responsible for connecting the CPU to the Flipper.

Flipper Interrupts

The PI receives interrupt requests from Flipper and passes them along to the CPU.

FIFO

Within the PI exists a FIFO mechanism that catches burst writes[1] to 0x0C00_8000 and writes them to a ring buffer in memory. It’s controlled by three registers: FIFO Start, FIFO End and FIFO Current.

Registers

PI Interrupt Cause (0x0C00_3000, 4 bytes)

Indicates which external interrupt causes are active. This register is read-only: to acknowledge an interrupt, you must write to the controlling status bit in the register specific to that interrupt.

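| Bits | Name | Description |
|------|------|-------------|
| 0 | GP Error | Graphics Processor runtime error |
| 1 | Reset | Reset switch was pressed |
| 2 | DVDI | DVD Interface |
| 3 | SI | Serial Interface |
| 4 | EXI | External Interface |
| 5 | AI | Audio Interface |
| 6 | DSPI | DSP Interface |
| 7 | MEM | Memory Interface |
| 8 | VI | Video Interface |
| 9 | PE Token | Token assertion in command list |
| 10 | PE Finish | Frame ready |
| 11 | CP | Command Processor FIFO |
| 12 | Debug | External Debugger |
| 13 | HSP | High Speed Port |
| 14..16 | | Reserved |
| 16 | Reset State | |
| 17..32 | | Reserved |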

PI Interrupt Mask (0x0C00_3004, 4 bytes)

Masks PI interrupts, describing which ones are allowed to be raised.

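| Bits | Name | Description |
|------|------|-------------|
| 0 | GP Error | Graphics Processor runtime error |
| 1 | Reset | Reset switch was pressed |
| 2 | DVDI | DVD Interface |
| 3 | SI | Serial Interface |
| 4 | EXI | External Interface |
| 5 | AI | Audio Interface |
| 6 | DSPI | DSP Interface |
| 7 | MEM | Memory Interface |
| 8 | VI | Video Interface |
| 9 | PE Token | Token assertion in command list |
| 10 | PE Finish | Frame ready |
| 11 | CP | Command Processor FIFO |
| 12 | Debug | External Debugger |
| 13 | HSP | High Speed Port |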
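An emulator might combine the cause and mask registers roughly as sketched below. The struct `PiInterrupts`, its method names and the idea that the CPU's external interrupt line follows `cause & mask` are assumptions for illustration, not something confirmed by these notes.

```rust
/// Bit positions of two PI interrupt causes, taken from the tables above.
const INT_VI: u32 = 1 << 8;
const INT_PE_FINISH: u32 = 1 << 10;

/// Bits 0..14, i.e. the causes that have a matching mask bit.
const MASKABLE: u32 = 0x3FFF;

/// Hypothetical model of the two PI interrupt registers.
#[derive(Debug, Default, Clone, Copy)]
struct PiInterrupts {
    /// PI Interrupt Cause (0x0C00_3000)
    cause: u32,
    /// PI Interrupt Mask (0x0C00_3004)
    mask: u32,
}

impl PiInterrupts {
    /// Causes that are both active and allowed by the mask.
    fn pending(&self) -> u32 {
        self.cause & self.mask & MASKABLE
    }

    /// Whether an interrupt should be passed along to the CPU
    /// (assumed to be the case whenever any unmasked cause is active).
    fn should_interrupt(&self) -> bool {
        self.pending() != 0
    }
}
```

With `cause = INT_VI | INT_PE_FINISH` and `mask = INT_VI`, only the VI cause is reported as pending.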

PI FIFO Start (0x0C00_300C, 4 bytes)

| Bits | Name | Description |
|------|------|-------------|
| 0..5 | | Zeroed |
| 5..27 | Start | The start address of the ring buffer |
| 27..32 | | Unknown |

PI FIFO End (0x0C00_3010, 4 bytes)

| Bits | Name | Description |
|------|------|-------------|
| 0..5 | | Zeroed |
| 5..27 | End | The end address of the ring buffer (exclusive) |
| 27..32 | | Unknown |

PI FIFO Current (0x0C00_3014, 4 bytes)

| Bits | Name | Description |
|------|------|-------------|
| 0..5 | | Zeroed |
| 5..27 | Current | The current address for writing the next 32 bytes of data |
| 27 | Wrapped | Whether the current address reached the end and wrapped around to the start |
| 28..32 | | Unknown |

Wrapped is cleared only on CPU writes to the register.
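The behaviour of the three FIFO registers could be modeled along these lines. This is a minimal sketch: the struct `PiFifo`, its method names and the exact moment of wrapping (as soon as the pointer reaches the exclusive end) are assumptions, and real hardware may differ.

```rust
/// Mask for bits 5..27 of the FIFO registers (a 32-byte aligned address).
const ADDR_MASK: u32 = 0x07FF_FFE0;
/// Bit 27 of PI FIFO Current.
const WRAPPED: u32 = 1 << 27;

/// Hypothetical model of the PI FIFO registers.
#[derive(Debug, Default)]
struct PiFifo {
    start: u32,
    end: u32,
    current: u32,
    wrapped: bool,
}

impl PiFifo {
    /// CPU write to PI FIFO Current: sets the address and clears Wrapped.
    fn write_current(&mut self, value: u32) {
        self.current = value & ADDR_MASK;
        self.wrapped = false;
    }

    /// CPU read of PI FIFO Current: address plus the Wrapped flag.
    fn read_current(&self) -> u32 {
        self.current | if self.wrapped { WRAPPED } else { 0 }
    }

    /// Called after a 32-byte burst has been stored at `current`.
    /// Assumption: the pointer wraps as soon as it reaches `end`.
    fn advance(&mut self) {
        self.current += 32;
        if self.current >= self.end {
            self.current = self.start;
            self.wrapped = true;
        }
    }
}
```

Note that `advance` never clears `wrapped`, which matches the statement that only CPU writes clear the flag.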


  1. These burst writes are executed by the Write Gather Pipe.

DSP Interface (DSPI)

The DSP interface controls the DSP in the Flipper and allows the CPU to communicate with it through mailboxes and DMAs.

Registers

Mailbox registers

The mailbox registers are used to send and receive small 31-bit messages to/from the DSP. Software often interacts with mailboxes as if they were split into two 2-byte parts (high and low).

DSP Mailbox (0x0C00_5000, 4 bytes)

Mailbox used to send data from the CPU to the DSP. The high part is always written first, which probably means a transfer only starts once both parts have been written (otherwise the DSP could read partial data).

| Bits | Name | Description |
|------|------|-------------|
| 0..31 | Data | [W] Send data to the DSP mailbox |
| 31 | Status | [R] Whether the transfer is ongoing |

Note

The status bit is set automatically on write.

CPU Mailbox (0x0C00_5004, 4 bytes)

Mailbox used to receive data from the DSP.

| Bits | Name | Description |
|------|------|-------------|
| 0..31 | Data | [R] Receive data from the DSP |
| 31 | Status | [R] Whether there’s new data from the DSP |

Note

The status bit is cleared automatically on read.
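Both mailboxes share the same shape: a 31-bit payload plus a status bit that is set by the sending side and cleared by the receiving side. A minimal sketch of that idea follows. The `Mailbox` type and its methods are hypothetical, and only whole 32-bit accesses are modeled (the high/low split is ignored).

```rust
/// Bit 31 of a mailbox register.
const STATUS: u32 = 1 << 31;
/// Bits 0..31 of a mailbox register.
const DATA_MASK: u32 = 0x7FFF_FFFF;

/// Hypothetical model of one 32-bit mailbox register.
#[derive(Debug, Default)]
struct Mailbox(u32);

impl Mailbox {
    /// Sender side: store a 31-bit message and set the status bit.
    fn send(&mut self, data: u32) {
        self.0 = (data & DATA_MASK) | STATUS;
    }

    /// Look at the raw register value without side effects.
    fn peek(&self) -> u32 {
        self.0
    }

    /// Receiver side: return the message, if any, and clear the status bit.
    fn receive(&mut self) -> Option<u32> {
        if (self.0 & STATUS) == 0 {
            return None;
        }
        self.0 &= DATA_MASK;
        Some(self.0)
    }
}
```

For the DSP Mailbox the CPU would call `send` and the DSP `receive`, and for the CPU Mailbox the roles are swapped.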

DSP Control (0x0C00_500A, 2 bytes)

Controls the DSP and also contains some ARAM and AI related bits.

| Bits | Name | Description |
|------|------|-------------|
| 0 | Reset | [W] Reset DSP; [R] Whether DSP is resetting |
| 1 | Interrupt | [W] Assert DSP PI interrupt |
| 2 | Halt | [W] Halt (1) or unhalt (0) DSP; [R] Whether DSP is halted |
| 3 | AI interrupt status | [W] Clear if set; [R] Whether an AI interrupt is pending |
| 4 | AI interrupt mask | AI interrupt assertion allowed |
| 5 | ARAM interrupt status | [W] Clear if set; [R] Whether an ARAM interrupt is pending |
| 6 | ARAM interrupt mask | ARAM interrupt assertion allowed |
| 7 | DSP interrupt status | [W] Clear if set; [R] Whether a DSP interrupt is pending |
| 8 | DSP interrupt mask | DSP interrupt assertion allowed |
| 9 | DSP DMA status | [R] Whether ARAM DMA is ongoing |
| 10 | | Unknown |
| 11 | | Also seems to be used as a reset bit |

The ARAM interrupt is raised whenever the ARAM DMA completes.
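The interrupt bits of DSP Control could be handled as in the sketch below. It reads "[W] Clear if set" as "writing 1 clears a pending status bit", treats the mask bits as plain read/write, and assumes the DSPI cause in the PI is raised while any status bit and its mask bit are both set. These readings, and the function names, are assumptions.

```rust
const AI_STATUS: u16 = 1 << 3;
const AI_MASK: u16 = 1 << 4;
const ARAM_STATUS: u16 = 1 << 5;
const ARAM_MASK: u16 = 1 << 6;
const DSP_STATUS: u16 = 1 << 7;
const DSP_MASK: u16 = 1 << 8;

const STATUS_BITS: u16 = AI_STATUS | ARAM_STATUS | DSP_STATUS;
const MASK_BITS: u16 = AI_MASK | ARAM_MASK | DSP_MASK;

/// Applies a CPU write to the interrupt bits of DSP Control and returns
/// the new register value. Other bits of `current` are left untouched.
fn write_interrupt_bits(current: u16, written: u16) -> u16 {
    // A status bit stays set only if it was pending and the CPU wrote 0 to it.
    let status = current & STATUS_BITS & !written;
    // Mask bits simply take the written value.
    let mask = written & MASK_BITS;
    (current & !(STATUS_BITS | MASK_BITS)) | status | mask
}

/// Whether any pending interrupt is allowed by its mask bit.
fn dspi_interrupt(control: u16) -> bool {
    // Each mask bit sits one position above its status bit.
    (((control & STATUS_BITS) << 1) & control & MASK_BITS) != 0
}
```

For example, writing `ARAM_STATUS | ARAM_MASK` acknowledges a pending ARAM interrupt while keeping it enabled.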

ARAM Size (0x0C00_5012, 2 bytes)

ARAM Mode (0x0C00_5016, 2 bytes)

ARAM Refresh (0x0C00_501A, 2 bytes)

ARAM DMA RAM Address (0x0C00_5020, 4 bytes)

Contains the physical address of the DMA transfer in main RAM.

ARAM DMA ARAM Address (0x0C00_5024, 4 bytes)

Contains the address of the DMA transfer in ARAM.

ARAM DMA Control (0x0C00_5028, 2 bytes)

Controls the ARAM DMA.

| Bits | Name | Description |
|------|------|-------------|
| 0..15 | Length | Length of the transfer in words |
| 15 | Direction | Whether to read from ARAM instead of writing |

Writing to this register triggers a DMA transfer if the length field is non-zero.
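Decoding a write to this register, following the layout given above, might look like the sketch below. The type and function names are hypothetical, and the length is kept in words because the notes do not state the word size.

```rust
/// Direction of an ARAM DMA transfer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    /// Main RAM is read, ARAM is written (direction bit clear).
    RamToAram,
    /// ARAM is read, main RAM is written (direction bit set).
    AramToRam,
}

/// Decoded contents of ARAM DMA Control.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AramDma {
    length_words: u16,
    direction: Direction,
}

/// Decodes a value written to ARAM DMA Control. Returns `None` when the
/// length field is zero, in which case no transfer is triggered.
fn decode_aram_dma_control(value: u16) -> Option<AramDma> {
    let length_words = value & 0x7FFF; // bits 0..15
    if length_words == 0 {
        return None;
    }
    let direction = if (value & (1 << 15)) != 0 {
        Direction::AramToRam
    } else {
        Direction::RamToAram
    };
    Some(AramDma { length_words, direction })
}
```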

ABI

Software appears to use the PowerPC embedded ABI (or simply EABI), which is a modified version of the PowerPC ABI supplement to System V (see "PowerPC ABI supplement to System V ABI" in Resources) designed for embedded systems.

The EABI is described in detail in "Developing PowerPC Embedded Application Binary Interface (EABI) Compliant Programs" (see Resources), but this page gives an overview of its most important parts.

Registers

Registers fall into three kinds:

  • Volatile: These registers are general purpose and do not need to be preserved by routines.
  • Preserved: These registers are general purpose and must be preserved by routines.
  • Dedicated: These registers have a special purpose in the ABI, and must also be preserved by routines.

General purpose registers

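| Register | Kind | Purpose |
|----------|------|---------|
| R00 | Volatile | Language specific |
| R01 | Dedicated | Stack pointer |
| R02 | Dedicated | Read only small data base address |
| R03 | Volatile | First parameter, first return value |
| R04 | Volatile | Second parameter, second return value |
| R05..R11 | Volatile | Parameters after R03 & R04 |
| R11..R13 | Volatile | Temporaries |
| R13 | Dedicated | Read-write small data base address |
| R14.. | Preserved | General |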

Floating point registers

| Register | Kind | Purpose |
|----------|------|---------|
| F00 | Volatile | Language specific |
| F01 | Volatile | First parameter, first (and only) float return value |
| F02..F09 | Volatile | Parameters after F01 |
| F09..F14 | Volatile | Temporaries |
| F14.. | Preserved | General |

Condition register fields

All fields in the condition register are volatile except for CR2..CR5 (i.e. CR2, CR3 and CR4), which are preserved.

Other registers

All other registers are volatile.
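For tooling such as a debugger or an ABI checker in an emulator, the classification above can be written as a few lookup functions. This is only a sketch with hypothetical names. R13 is not covered by the original general purpose table, so treating it as dedicated (the read-write small data base in the EABI) is an assumption here.

```rust
/// Register kinds used by the ABI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Volatile,
    Preserved,
    Dedicated,
}

/// Kind of general purpose register `index`, or `None` if out of range.
fn gpr_kind(index: u8) -> Option<Kind> {
    match index {
        0 | 3..=12 => Some(Kind::Volatile),
        1 | 2 | 13 => Some(Kind::Dedicated), // R13: assumption, see above
        14..=31 => Some(Kind::Preserved),
        _ => None,
    }
}

/// Kind of floating point register `index`, or `None` if out of range.
fn fpr_kind(index: u8) -> Option<Kind> {
    match index {
        0..=13 => Some(Kind::Volatile),
        14..=31 => Some(Kind::Preserved),
        _ => None,
    }
}

/// Kind of condition register field `index`, or `None` if out of range.
fn cr_field_kind(index: u8) -> Option<Kind> {
    match index {
        2..=4 => Some(Kind::Preserved),
        0 | 1 | 5..=7 => Some(Kind::Volatile),
        _ => None,
    }
}
```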

Stack Frames

The stack pointer points to the lowest word of the current stack frame (i.e. the stack grows down). Stack frames are always aligned on double words (8 bytes). Their format is as follows:

| Address | Size (bytes) | Description |
|---------|--------------|-------------|
| sp | 4 | Previous sp |
| sp + 4 | 4 | Return address |
| sp + 8 | Variable | Routine specific |

Preserved registers are callee-saved. As a consequence, if a routine is a leaf and does not need the stack or any preserved registers, it can skip creating a stack frame.
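Because every frame starts with the previous sp, frames form a linked list that a debugger can walk. The sketch below does that over a flat byte slice standing in for memory. The names, the use of slice offsets as addresses and the choice to stop at a previous sp of 0 are assumptions. It reports the raw word at sp + 4 without deciding which routine's return address it is.

```rust
/// The two header words of one stack frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Frame {
    sp: u32,
    return_address: u32,
}

/// Reads a big-endian word from `mem`, or `None` if out of bounds.
fn read_u32(mem: &[u8], addr: u32) -> Option<u32> {
    let start = addr as usize;
    let bytes = mem.get(start..start.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Follows the chain of previous sp values starting at `sp`, collecting
/// at most `max_depth` frames. Stops at a null or unreadable frame.
fn walk_frames(mem: &[u8], sp: u32, max_depth: usize) -> Vec<Frame> {
    let mut frames = Vec::new();
    let mut current = sp;
    while current != 0 && frames.len() < max_depth {
        let previous = match read_u32(mem, current) {
            Some(word) => word,
            None => break,
        };
        let return_address = match read_u32(mem, current.wrapping_add(4)) {
            Some(word) => word,
            None => break,
        };
        frames.push(Frame { sp: current, return_address });
        current = previous;
    }
    frames
}
```

The `max_depth` limit guards against corrupted or cyclic chains, which are easy to hit while an emulator is still buggy.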

Resources

This page is a collection of links that might be referenced throughout the book or that are otherwise useful as additional material. They are in no particular order.

Documentation

YAGCD - Yet Another GameCube Documentation

Documentation on all aspects of the GameCube.

https://www.gc-forever.com/yagcd

IBM PowerPC 750CL User Manual

Manual for the commercial variant of the GameCube’s CPU.

Note

The GameCube’s CPU, the PowerPC Gekko, has its own manual. It is, however, marked as “confidential”. That doesn’t make it any harder to find with a simple Google search, so its confidentiality is up for debate.

https://fail0verflow.com/media/files/ppc_750cl.pdf

PowerPC Programming Environments Manual

Manual that describes common features of the PowerPC architecture in detail. CPU manuals often refer to it instead of explaining stuff themselves.

http://refspecs.linux-foundation.org/PPC_hrm.2005mar31.pdf

GameCube Architecture - A practical analysis

An overview of the GameCube’s architecture. A great starting point for getting to know the system.

https://www.copetti.org/writings/consoles/gamecube/

Dolwin Docs

Collection of docs shared by the developers of the Dolwin emulator.

https://github.com/ogamespec/dolwin-docs/tree/master

libogc API documentation

Documentation of devkitPro’s libogc.

https://libogc.devkitpro.org/api_doc.html

Dolphin Emulator

The most comprehensive GameCube/Wii emulator available.

DenSinH’s GameCube Resources

https://github.com/DenSinH/GameCubeResources

Nintendo GameCube device tree - Linux Kernel Docs

https://www.kernel.org/doc/Documentation/devicetree/bindings/powerpc/nintendo/gamecube.txt

PowerPC ABI supplement to System V ABI

http://math-atlas.sourceforge.net/devel/assembly/elfspec_ppc.pdf

Developing PowerPC Embedded Application Binary Interface (EABI) Compliant Programs

http://class.ece.iastate.edu/arun/Cpre381_Sp06/lab/labw12a/eabi_app.pdf

GameCube Flipper semidocumentation

Small page about working with GX.

http://www.amnoid.de/gc/tev.html

Making a Wii game in 2024

Mostly talks about working with GX.

https://blog.allpurposem.at/making-a-wii-game-in-2024

Analog Video

Resources on analog video signals, useful for understanding the video interface.

https://www.sciencedirect.com/topics/engineering/composite-video-signal
https://www.analog.com/en/resources/technical-articles/basics-of-analog-video.html
https://www.analog.com/en/resources/technical-articles/understanding-analog-video-signals.html

Tests, Examples and Demos

Dolphin’s Hardware Test Suite

https://github.com/dolphin-emu/hwtests

devkitPro’s GameCube Examples

https://github.com/devkitPro/gamecube-examples

panda.dol

Extremely simple program, probably the best first thing an emulator should try to run!

https://github.com/sliice-emu/test-roms/blob/main/panda/source/panda.asm