The Stacker-16 Homebrew CPU Project

by commodorejohn


Quite a few years ago, I discovered Bill Buzbee's Magic-1 homebrew computer while browsing around the vintage-computer web. I was instantly hooked on the idea of constructing my own discrete-logic computer with my own bespoke architecture, because I'm weird like that (but evidently not alone, if the Homebrew CPU webring is any indication.) Life and a million other hobby projects (and my lack of knowledge in the field of low-level electronics) got in the way of seriously pursuing this at the time - but that didn't stop me from doing a lot of idle "sketching" of ideas for such a project in the interim.

I toyed with a number of different core concepts over the years, but one that I kept coming back to was stack machines - mostly for the novelty factor (outside of virtual-machine interpreters and a few specialty embedded microprocessors, largely the domain of a few vintage mainframe/minicomputer architectures,) but also because it greatly simplifies instruction decoding (since all the arguments are located in fixed positions relative to the stack pointer.)

Even that basic idea went through a lot of re-thinking before I finally got down to something approaching satisfactory, though; the initial attempts were all much too high-level to be doable in hardware, by an amateur. It wasn't until I happened to start dabbling with assembly language on the PDP-8 and reading up on the DG Nova that I came to understand how many common operations in fancier architectures can be synthesized from simpler ones. This was the last key "piece" I needed in deciding on a direction; a simpler design is easier to implement, but can still be reasonably feature-complete.

(Case in point: there is no subtract instruction on the PDP-8. This is because it already has instructions for ones-complement and increment - which, combined, give the twos-complement a.k.a. negation - and addition, and adding the negative of a number is the same thing as subtracting it. Convoluted? A tad. But does it simplify the hardware? You bet it does.)

After a couple more re-thinks and revisions, I've finally arrived at what I consider a reasonably complete and satisfactory basic architecture. There's a long way to go yet before this thing lives and breathes in hardware, but I want to put this up for public perusal anyway as a sort of journal and project documentation. I'm already much of the way through developing a simulator which will allow me to try it out as a platform for coding; once I've gotten that done, I'm going to need to do a lot of brushing up on the topic of low-level electronics, as well as a bit more on the nitty-gritty of ground-up computer architecture. But for now, I'm going to document the architecture as it appears to the programmer.

Architecture Overview

Stacker-16 (as I'm calling it unless I think of a better name) is, as you might guess, a 16-bit stack-oriented design. It is a Von Neumann architecture (instructions and data occupy the same address space) and memory is byte-addressed, little-endian; all instructions are word-sized. I/O devices are memory-mapped.

The computer maintains two stacks: a data stack (for general operations) and a call stack (for return addresses and temporary storage.) Stacks "grow" downwards in memory (new items are pushed to the address below the current item.) All primary operations affect the top three values on the data stack. These are referred to, in order, as top-of-stack (TOS,) next-on-stack (NOS,) and next-next-on-stack (NNS.) Top-of-call-stack (TCS) is also relevant to the stack/control group. These are not true registers, being kept in memory, but might be transparently cached for speed, depending on the implementation.

The true registers in the computer are the instruction pointer (IP,) the data-stack pointer (DSP, or just SP,) the call-stack pointer (CSP,) the status register/interrupt mask (MSK,) and the carry bit (CB.) CB is treated as the least-significant bit of MSK for state save/restore purposes, and as the most-significant 17th bit of TOS for arithmetic/shift purposes. The stack pointers point at the top item directly.

The instruction set has a few peculiarities that I think will work for programming in a stack-oriented way. First off, it's designed with an eye towards subroutine-threaded code, as seen in some Forth dialects. All even opcodes are calls to the subroutine whose address is the opcode value - meaning that most higher-level programs will consist largely of strings of bare subroutine addresses, which makes for compact code and easy disassembly/tracing.

Beyond the subroutine-call group, there are four primary opcode groups: the ALU-operations group, the stack/control-operations group, the load/store group, and the branch group. These are identified by the most-significant bit and the least-significant three bits - i.e. the first and last octal digits - and the instruction parameters are contained in the remaining opcode bits. Primary number representation for opcodes and addresses is therefore octal, though it's not a religious requirement.

Both the ALU group and the stack/control group are what the PDP-8 literature called "microcoded" - this being their peculiar way of saying that any given opcode encoded a sequence of simple "micro-instructions" to do (or not do) based on the values of specific bitfields in the opcode. Here, this means that the parameter fields for these instructions consist of individual fields of 1-4 bits which specify the component operations of a larger instruction. These are (almost without exception) executed in-order, from least-significant to most-significant bit, and the combination of available simple operations yields a variety of useful "complex" operations.

(Not very.)

Both the load/store group and the branch group include address offsets in their parameter fields. Branch offsets are always signed word displacements, and are twelve bits (four octal digits) long. Load/store offsets may be byte- or word-sized depending on the size of the operation, and may be signed or unsigned depending on the addressing mode. They are nine bits (three octal digits) long.

The load/store group has four addressing modes: zero-based, DSP-relative, IP-relative, and TOS-relative. The first two use unsigned offsets; the last two use signed. Byte values are loaded to and stored from the LSB of a stack word, and are zero-extended. There is no immediate or absolute addressing, so constants and fixed addresses will need to be stored in "zero page" or within offset range of the code. In an architecture geared towards subroutine threading and stack-oriented programming, I don't foresee this being too much of a limitation.

The full instruction set is as follows:

AAAAAAAAAAAAAAA0Subroutine-Call Group (-----0, -----2, -----4, -----6)
The only parameter is the subroutine address, which is the value of the opcode itself. The return address is saved to the call stack, IP is loaded from the opcode directly, and execution continues from the target address.
~JSRSave return address and jump to subroutine.
0JR<>AASCXP+~001ALU-Operations Group (0----1)
The primary operations group, containing all arithmetic/logic and shift/rotate operations, as well as operations for indirect jump/subroutine calls. TOS and another operand are "consumed" and the result left on top of the data stack.
1CMPOnes-complement (bitwise NOT) TOS. Does not affect CB.
1INCIncrement TOS. Does not affect CB.
1PCPPreserve carry if positive; skips the CLC and CMC operations if the MSB of TOS is clear.
1CLCClear CB.
1CMCComplement CB.
0NOSUse NOS for ALU operand B.
1-1Use $FFFF for ALU operand B.
00ADDAdd operand B to TOS.
01ANDAND operand B with TOS.
10OROR operand B with TOS.
11XORXOR operand B with TOS.
00Don't rotate.
01RORRotate TOS/CB right.
10ROLRotate TOS/CB left.
11BSWExchange the MSB and LSB of TOS. Does not affect CB.
1SRASave the address of the next instruction to the call stack.
1JMPJump to @TOS.
0X+-<>MPISCBA011Stack/Control-Operations Group (0----3)
The secondary operations group, containing operations for manipulating the top three stack items, preserving/altering CPU registers, transferring items between the two stacks, and executing software interrupts.
1SWNExchange TOS and NOS.
1SNNExchange TOS and NNS.
1XSPExchange TOS and CSP.
10RTSRestore IP from the call stack.
11SRASave the address of the next instruction to the call stack.
10RSPRestore DSP from the call stack.
11SSPSave DSP to the call stack.
10RMCRestore MSK/CB from the call stack.
11SMCSave MSK/CB to the call stack.
00No stack transfer.
01CTDMove TCS to the data stack.
10DTCMove TOS to the call stack.
11XCDExchange TOS and TCS.
1DUPDuplicate TOS.
111BRKSoftware interrupt
Register-restore operations occur in reverse order, so that the CPU state is saved and restored correctly. For security purposes, BRK sets these bits.
0IIWDDDDDDDDD1S1Load/Store Group (0----5, 0----7)
The memory-access group, containing instructions for loading values to and saving values from the stack. Memory addresses are formed from an index value and a displacement. Unaligned word accesses are illegal.
00LDBPush the byte at the address to the data stack.
01STB"Consume" TOS and store the LSB to the address.
10LDWPush the word at the address to the data stack.
11STW"Consume" TOS and store to the address.
~Displacement, added to the index value to form the final address. Signed for IP-relative and TOS-relative, unsigned for zero-indexed and stack-indexed. Scaled to match the operation size.
00@ZZero-indexed addressing.
01@SPStack-indexed addressing.
10@IPIP-relative addressing.
11@TOSTOS-relative addressing.
TOS is "consumed" to form the address in this mode; store instructions "consume" the value in NOS instead.
1DDDDDDDDDDDDCC1Branch Group (1----1, 1----3, 1----5, 1----7)
The relative-branch instructions, conditional and unconditional. Conditional branches "consume" TOS and branch based on its value.
00BEQBranch if TOS is zero.
01BPLBranch if TOS is positive (non-zero, non-negative.)
10BMIBranch if TOS is negative (MSB set.)
11BRABranch unconditionally.
~Branch displacement. Signed word offset; added to the address of the next instruction.

Synthetic Instructions

In addition to the core instruction set, the "microcoded" groups can give a variety of additional operations that are really the combination of basic operations. Care has to be taken with these; combining two synthetic instructions may yield an opcode that is perfectly legal, but doesn't behave as expected, and operations may not be combined across groups. Actually understanding why the synthetic operations do what they do is important. The "official" list of named synthetic instructions is as follows:

0JR<>AASCXP+~001ALU-Operations Group (0----1)
11NEGOnes-complement and increment TOS, negating it.
11STCClear and complement CB, setting it.
0011SUBNegate TOS and add it to operand B, subtracting it.
11111DECComplement TOS, increment it, and complement it again, decrementing it. Like INC, does not affect C.
011NAOAND TOS with 0xFFFF, leaving it effectively untouched.
101SETOR TOS with 0xFFFF, setting it to all-ones.
011SHRClear CB and rotate TOS right, shifting it.
101SHLClear CB and rotate TOS left, shifting it.
01111ASRSet the carry bit to match the MSB of TOS and rotate it right, preserving the sign.
11JSRSave the return address and jump to the subroutine at @TOS.
0X+-<>MPISCBA011Stack/Control-Operations Group (0----3)
11ROTSwap TOS and NOS, then swap TOS and NNS, cycling NNS to the top of the stack.
1110RTIRestore CPU state and return from interrupt.
1010RTTRestore MSK and return from trap.

Modes, Traps, and Interrupts

At any given time, the CPU is running in one of three modes: user, supervisor, or interrupt. Four levels of interrupt priority are supported. The current mode is controlled/indicated by the upper bits of MSK, which is formatted like so:

4321Interrupt-mask bits.
II/O-enable bit (IB.)
SSupervisor bit (SB.)
CCarry bit (CB.)

The supervisor bit is set in all but user mode. It does little without extended addressing/memory protection, which isn't defined yet. However, it does prevent the privileged bits of MSK (currently all of them, except for CB) from being altered in user mode; an RMC instruction will not affect them unless SB is currently set. Additionally, the BRK micro-operation forces an SMC, so that it's not possible to "lie" to a trap-service routine by pushing a return address and a false value for MSK with the supervisor bit set, and thereby gain illicit supervisor privileges when the service routine returns.

The I/O-enable bit also relates to memory mapping; the CPU sees the I/O area in the upper half of zero page (@001000-@001776) when set, and RAM there when clear. All stack and instruction-stream operations (direct stack access, instruction fetch, stack-based and IP-relative addressing) treat IB as clear, regardless of its value; this prevents the I/O area from being unintentionally clobbered by traps or interrupts, which would cause Bad Things to happen. Programs should still keep the stack pointers out of this area; but interrupt-service routines should avoid using fancy stack addressing (i.e. saving DSP and doing operations on it, then using TOS-relative addressing) just to be on the safe side.

There are three ways for mode switches to happen: a software-triggered trap, a hardware-triggered interrupt, or an externally-triggered reset. Interrupts are higher-priority than traps, which have no priority as such; reset is the highest-priority and cannot be masked. The sequence for each is slightly different:

1*Save IP to the call stack.
2*Save DSP to the call stack.
3**Save MSK to the call stack.
4**Mask out current and lower-priority interrupts.
5***Set SB.
6**Set IB.
7*Load DSP from the interrupt-vector table.
8***Load IP from the interrupt-vector table.
BRK does not force this step; you can leave a different return address on the call stack if desired. ‡ Only occurs when going from user/supervisor mode to interrupt mode.

The interrupt-vector table is located at the top of memory, and holds the addresses of the handlers for interrupts and traps, as well as the address of a temporary data stack for use by interrupt-service routines. It is laid out as follows:

@177776IRQ4 vector.
@177774IRQ3 vector.
@177772IRQ2 vector.
@177770IRQ1 vector.
@177766Reset vector.
@177764Illegal-operation trap vector.
@177762Software-interrupt trap vector.
@177760ISR stack pointer.

To-Do List

There's a lot that remains to be done before this thing actually takes a tangible form. As mentioned, I have a reasonable understanding of digital logic in general terms, but I need to read up on some low-level computer-architecture topics like state sequencing, and take the time to properly wrap my head around low-level electronics in general.

I also need to sit down and hash out the constraints and hazards of the sequence-of-operations - what needs to happen when and what can or can't be deferred in order for correct operation to be assured. This will be important if I want to try for any kind of pipelining or caching the quasi-register stack items in hardware registers. I may or may not actually do that (no discrete-logic design is going to be fast by modern standards in any case,) but it'd be good to know.

Some details of the architecture remain to be defined, particularly how extended memory addressing and memory protection will work. These are less immediately critical design decisions, but will need to be hashed out at some point.

I also need to try writing some test programs to see how comfortable the instruction set actually is to code for. This is something I'll focus on after getting the simulator to a reasonably complete state.

The simulator itself should be functionally complete as far as the backend goes, but hasn't been tested yet as the user interface is still in development. This should hopefully be complete soon, and testing can begin.

At some point, I'll have to hash out the specs and memory mapping for peripheral components (UART, mass-storage, etc.) The simulator uses a generic set of emulated peripherals at present which don't really correspond to real-world parts; this works for basic testing purposes, but in the hardware world I plan to take advantage of standard parts for these rather than try to do everything myself. This can come after all the more critical hardware-design work is done, though; the architecture itself is agnostic. Do have to make sure to add a dual-channel DAC so that I can hook up my X/Y display and play Spacewar! on it, though ;)

Hosting for this site is provided by

The SDF Public Access UNIX System