Category Archives: FPGA

Stack Computers, Forth, RISC, CISC, etc.

I finished reading Stack Computers. This class of processors is a bit different from the mainstream today — neither is it like the classic RISC load/store architectures (MIPS, SPARC, PPC, etc…), nor is it like the surviving (x86) or dead (VAX) CISC architectures. In a way, it’s slightly reminiscent of the high-level language architectures of the 1970s which never even saw commercial application. They were processors designed intended to implement much of the semantics of a high-level language in hardware — quite the opposite of the RISC approach. (The difference being that RISC architectures provide a simple, orthogonal set of operations and a large register file to make it possible for modern compilers to compile high level languages with high performance results, while these high-level language architectures aimed to make the task of compilation for a particular language trivial.)

The analogy here is that most Stack Computer architectures (at the very least) have a close match to the execution model of the Forth languages, and often have opcodes which directly correspond to some common Forth primitives. However, I think the reason stack architectures are not entirely without commercial success is that most of those forth primitives are very low-level operations — it doesn’t require a lot of silicon to implement one of these stack CPUs, and it can be a very reasonable choice for an embedded system.

One downside of some of these stack CPUs is that while they all provide at least two hardware stacks (data stack and return stack), some of them do not provide a high-performance means of loading and storing from a base-pointer-plus-offset, making them poor targets for the C language since it generally accesses function arguments and local variables that way. (A few do extend the processor with enough operations to make them a decent target for C and similar languages.)

Besides reading that, I’ve been slowly trying to beat Forth into my head — the basics are incredibly simple, but doing serious projects in it requires a change of perspective. It seems that there are a number of techniques people have developed to provide abstraction and work with particular problems using domain-specific “little languages”, but it still seems like one unavoidable part of Forth is that it’s essentially untyped.

I’m still aiming to develop a small computer in an FPGA with Forth as the OS and programming language. Not for practical reasons, exactly, but as a way to learn two fairly big things at once.

First FPGA project ideas

I decided recently that I want to really learn computer hardware design. Not just the theoretical “I know how the machine works” understanding, but actually having done it. So, the obvious thing was to get an FPGA and design stuff using it. fpga4fun was some of the inspiration for this. I ended up ordering one of their Cyclone II boards after deciding that I was going to play with the Altera stuff (Xilinx has comparable parts. For more sources of FPGA dev boards, see Sparkfun(only one), Digilent, and XESS.) The Altera software (Quartus II) seemed a little bit less obtuse. Otherwise I can’t say I found myself strongly in favor of one or the other. As for Verilog versus VHDL, I’m learning Verilog because it’s more readable, and more commercially popular (at least in the USA). Once I understand what I’m doing, I’m sure I can learn VHDL later.

Over the weekend, I’d been playing with the free version of Altera’s Quartus II software, writing a bit of Verilog and seeing what it compiles into. The Cyclone II EP2C5 is a reasonably capable part — 4608 “Logic Elements” (each is a 4-input lookup table — 4 bits in, 1 bit out, any truth table), carry chain logic, a register on the output, various stuff to connect it locally, across the chip, and to the clock distribution network), 119808 bits of ram in “block ram” scattered about the chip, 13 18×18 multipliers, and 2 PLLs for clock generation.

So, I got to thinking that a suitably hard project is a modern reimplementation of the 1980s home computer / game console. This would consist of a number of subprojects:

  • a CPU — probably early MIPS, R2000/R3000 or something. a subset of MIPS32. Not too hard to understand, I still have the Patterson and Hennessy textbook that gave a pretty nice introduction to it. Also, existing GCC support.
  • SDRAM controller — having looked at a datasheet, supporting a single 4Mx16 SDRAM chip shouldn’t be that hard.
  • Display controller. Low-color-depth VGA wouldn’t be too hard, but then I’d constantly have to hook a VGA monitor up, etc. A more interesting choice seems to be to support one these Sharp LCDs
  • SPI interface to an SD card. Have the first stage bootloader in the FPGA block ram load code from here into the SDRAM.
  • USB low-speed host controller, really only supporting keyboard/mouse/joystick. Another challenging piece, but not that bad
  • Serial port for debugging software.
  • Sound controller. Maybe an all-digital implementation of the SID in a C-64, or maybe the much simpler sound chip found in the TI 99/4a, IBM PC Jr., Sega Master System, etc.

So far, I’ve written Verilog for parts of a 3 voice + 1 noise sound chip (the simple one above), and I’ve implemented a register file and an ALU. Most importantly, I figured out how to use the right idioms to make the register file out of block ram, instead of consuming a quarter of all the logic elements on the chip.

But, wow, that’s a big first project. While I can design and test it piece by piece, I think it might be more fun to start out with something simple: A 16 bit stack-based CPU, interfaced to the on-board block ram (only about 13K bytes), with a serial console. I’ll have to cobble together some sort of assembler, but that’s not that hard. Especially when the instruction set is simple. I’ve been reading Stack Computers for inspiration.

Another possibility, instead of a stack CPU is some kind of stripped-down mips-like 16 bit architecture. You get into some compromises with 16 bit instructions if you want a reasonable number of instructions and the usual 3-operand instruction format, though.

(For more links, see my