7. Design, Test and Integration

Designing a Custom Component

In this section, we will design a simple hardware accelerator that treats its 64-bit values as vectors of eight 8-bit values. It takes two 64-bit vectors, adds them, and returns the resulting 64-bit sum vector. This accelerator will sit on a Rocket Tile and communicate through the RoCC interface.
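To make the lane-wise behaviour concrete, here is a minimal software sketch of the operation in C. It is not taken from the lab sources: the name vec_add8 is hypothetical, and the sketch assumes lane 0 is the least-significant byte and that each 8-bit lane wraps modulo 256 without carrying into its neighbour.

```c
#include <stdint.h>

/* Hypothetical software sketch of the lane-wise add (not the lab's RTL).
 * Each 64-bit operand is treated as eight independent 8-bit lanes.
 * Assumption: lane 0 is the least-significant byte. */
uint64_t vec_add8(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));  /* extract lane from a */
        uint8_t y = (uint8_t)(b >> (8 * lane));  /* extract lane from b */
        uint8_t s = (uint8_t)(x + y);            /* wraps modulo 256; no carry into the next lane */
        sum |= (uint64_t)s << (8 * lane);        /* place the lane result back */
    }
    return sum;
}
```

Under these assumptions, vec_add8(0x00000000000000FF, 0x0000000000000001) yields 0, whereas an ordinary 64-bit add would carry into the next byte and give 0x100.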

The purpose of this section of the lab is to teach how to integrate a custom hardware component into Chipyard. All lines of code are provided to you. If you're interested in learning more about how to write Chisel, look at sections 1 through 5 of the Chisel Bootcamp.

RoCC Design

RoCC stands for Rocket Custom Coprocessor.
A block using the RoCC interface sits on a Rocket Tile.
Such a block uses custom non-standard instructions reserved in the RISC-V ISA encoding space.
It can use a variety of interfaces (ready-valid, HellaCacheIO, TileLink, etc.) to communicate with the following:
- A core, such as BOOM or Rocket
- L1 D$
- Page Table Walker (available by default on a Rocket Tile)
- SystemBus, which can be used to communicate with the outer memory system, for instance
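As a rough illustration of what a custom instruction looks like at the bit level, the sketch below packs the fields of a RoCC instruction word in C. The helper rocc_encode is hypothetical and not part of Chipyard. The field layout follows the commonly documented RoCC format (funct7, rs2, rs1, xd, xs1, xs2, rd, opcode). Which opcode this lab's config actually selects is not assumed here.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Chipyard): packs a 32-bit RoCC
 * instruction word. Field positions, most- to least-significant:
 *   funct7[31:25] rs2[24:20] rs1[19:15] xd[14] xs1[13] xs2[12] rd[11:7] opcode[6:0]
 * xd/xs1/xs2 flag whether rd/rs1/rs2 refer to core integer registers. */
uint32_t rocc_encode(uint32_t opcode, uint32_t funct7,
                     uint32_t rd, uint32_t rs1, uint32_t rs2,
                     uint32_t xd, uint32_t xs1, uint32_t xs2)
{
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2    & 0x1Fu) << 20) |
           ((rs1    & 0x1Fu) << 15) |
           ((xd     & 0x1u)  << 14) |
           ((xs1    & 0x1u)  << 13) |
           ((xs2    & 0x1u)  << 12) |
           ((rd     & 0x1Fu) << 7)  |
            (opcode & 0x7Fu);
}
```

For example, with an arbitrarily chosen opcode value of 0x2B, funct7 = 0, rd = x10, rs1 = x11, rs2 = x12, and all three x-flags set, rocc_encode returns 0x00C5F52B.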

For more on RoCC, we encourage you to refer to:

- Sections 6.5 and 6.6 of the Chipyard docs, and related examples
- Bespoke Silicon Group's RoCC Doc V2

Here's an overview of the custom-acc-rocc directory inside $chipyard/generators/.

 custom-acc-rocc/
  baremetal_test/       <--------- (4) bare-metal functional tests
    functionalTest.c
  project/              <---------- project properties/settings
  src/                  <---------- source code
    main/               <---------- Chisel RTL
      scala/
        Configs.scala   <---------- (3) Config to include this accelerator
        CustomAccRoCC.scala <------ RoCC Scaffolding RTL
        VectorAdd.scala     <------ (1) Accelerator RTL 
    test/               <---------- Chisel tests
      scala/
        TestVectorAdd.scala <------ (2) Basic unit test
  target/               <---------- output from build system

Let us begin by inspecting src/main/scala/CustomAccRoCC.scala. LazyRoCC and LazyRoCCModuleImp are abstract classes that allow us to separate the implementation of a RoCC accelerator from the definition and implementation of the RoCC interface. CustomAcceleratorModule provides the implementation of our specific accelerator module by extending LazyRoCCModuleImp (remember object-oriented programming?). For ease of understanding, we define all functionality in a module called VectorAdd, and wire up the RoCC I/O signals to the VectorAdd I/O signals.

LazyRoCC and LazyRoCCModuleImp are defined in $chipyard/generators/rocket-chip/src/main/scala/tile/LazyRoCC.scala. Notice the use of inheritance and types for the different classes that define the IO for RoCC accelerators.

Answer the following question:

7.1. What fields of RoCCIO does our accelerator use? Hint: What class does RoCCIO extend?

Accelerator RTL

Let us now implement the accelerator in src/main/scala/VectorAdd.scala, as described above. Your task here is to fill in all blocks/lines marked /* YOUR CODE HERE */ using the lines of code provided in the dropdown below. Additional questions to consider:

- What kinds of inputs/outputs does the VectorAdd module use? You should inspect the io field of the module for this.
- Does this module use ready-valid interfaces for I/O? How many ready-valid interfaces, and in which directions?

In no particular order, here are the required lines of code:

sum_vec_wire(i) := in1_vec_wire(i) + in2_vec_wire(i)

cmd_bits_reg.rs2.asTypeOf(in2_vec_wire)

WireInit(VecInit(Seq.fill(8) {0.U(8.W)}))

cmd_bits_reg.inst.rd

Testing

The next step is testing the VectorAdd module to ensure it behaves as expected. There are two main ways to test your design:

- Unit testing with ChiselTest
- Baremetal functional testing: baremetal here refers to the fact that your tests run directly on the hardware, i.e., with no OS underneath.

The former is more useful for fine-grained module-specific testing while the latter is more useful to test the accelerator as a whole, and its interactions with the rest of the SoC. Both kinds of tests will be run in RTL simulation.

We will unit test with ChiselTest right now, and come back to baremetal testing when integrating our accelerator with the rest of the SoC.

ChiselTest

ChiselTest is the batteries-included testing and formal verification library for Chisel-based RTL designs. It emphasizes tests that are lightweight (minimal boilerplate), easy to read and write (understandable), and composable (for better test code reuse). The ChiselTest repository includes a README guide.

Let us now write a unit test using ChiselTest in src/test/scala/TestVectorAdd.scala.

VectorAddTest is our test class here, and "Basic Testcase" is the name of our only test case. A test case is defined inside a test() block, and takes the DUT as a parameter. There can be multiple test cases per test class, and we recommend one test class per Module being tested, and one test case per individual test.

Here, we will be using VCS as our simulator backend, and generate waveforms in an fsdb file.

Most simulation testing infrastructure is based on setting signals, advancing the clock, checking signals, and asserting their values. ChiselTest does the same with poke, step, peek, and expect, respectively.

Complete the unit test named "Basic Testcase" in TestVectorAdd.scala by filling in all lines marked /* YOUR CODE HERE */.

In no particular order, here are the required lines of code:

"h_0F_0D_0B_09_07_05_03_01".U

c.clock.step(1)

true.B

Before we run any tests, we must keep in mind that our RTL is written in Chisel, whereas most simulator backends and VLSI tools expect Verilog/SystemVerilog. Thus, we compile our code from Chisel down to an intermediate representation (FIRRTL), and finally to the relevant Verilog/SystemVerilog.

To compile the design and run our tests, we use the Scala Build Tool (sbt). $chipyard/build.sbt (in the root Chipyard directory) contains project settings, dependencies, and sub-project settings. Feel free to search for custom_acc_rocc to find the sub-project entry.

In a new terminal window inside the root Chipyard directory, run:

bsub -I -XF -q ee194 xterm

This will open up a terminal GUI on the compute machine that you have been given. Next, inside the terminal GUI, run:

sbt

Give it a minute or so to launch the sbt console and load all settings.

In the sbt console, set the current project by running:

project custom_acc_rocc

To compile the design, run compile in the sbt console, as follows:

compile

This might take a while as it compiles all dependencies of the project.

To run all tests, run test in the sbt console, as follows:

test

You can use testOnly <test names> to run specific tests. Test outputs will be visible in the console. You can find waveforms and test files in $chipyard/test_run_dir/<test_name>.

Exit the sbt console with:

exit

Use verdi -ssf <fsdb file> to inspect the waveform at $chipyard/test_run_dir/VectorAdd_RoCC_Accelerator_should_add_two_vectors/VectorAdd.fsdb.

Please ensure your accelerator passes the basic test case before proceeding.

Integrating our Accelerator

Now that our accelerator works, it is time to incorporate it into an SoC. We do this by:

- Defining a config fragment for our accelerator
- Defining a new config that uses this config fragment

Inside $chipyard/generators/custom-acc-rocc, inspect src/main/scala/Configs.scala. WithCustomAccRoCC is our config fragment here.

Answer the following questions:

7.2. What does p do here? (Think about how it could be used, consider the object-oriented, generator-based style of writing, and feel free to look through other generators in Chipyard for examples.)

7.3. Give the 7-bit opcode used for instructions to our accelerator. Searching for the definition of OpcodeSet will be useful.

We want to add our accelerator to a simple SoC that uses Rocket. To do this, we must make our config fragment accessible inside the chipyard generator. Open $chipyard/build.sbt. At line 153, add custom_acc_rocc to the list of dependencies of the chipyard project.

Next, navigate to $chipyard/generators/chipyard/src/main/scala/config/RocketConfigs.scala. Define CustomAccRoCCConfig such that it adds our accelerator to RocketConfig. The previous step made custom_acc_rocc available as a package here.

Hint: CustomAccRoCCConfig should look like the following:

class CustomAccRoCCConfig extends Config(
  /* YOUR CODE HERE */
)

Baremetal Functional Testing

Inside $chipyard/generators/custom-acc-rocc, let us inspect baremetal_test/functionalTest.c. rocc.h contains definitions for different kinds of RoCC instructions and the custom opcodes. We use the same test case as before, but we test integration of the whole system as values are loaded into registers on the Rocket core, sent to the RoCC accelerator, and results from the accelerator are loaded into a register.

Since our accelerator reads two source registers and writes to one destination register, we use ROCC_INSTRUCTION_DSS.

Inline assembly instructions in C are written with asm volatile statements. Before the first instruction, and after each RoCC instruction, a fence instruction is issued. This ensures that all previous memory accesses complete before subsequent instructions execute, and is required to avoid mishaps as the Rocket core and coprocessor pass data back and forth through the shared data cache. (The processor uses the “busy” bit from your accelerator to know when to clear the fence.) A fence is not strictly required after each custom instruction, but one must stand between any use of shared data by the two subsystems.

While one can compute results for each test case a priori and test for equality against the accelerator's results, such a strategy is neither reliable nor scalable as tests become complex, such as when using random inputs or writing multiple tests. Thus, there is significant value in writing a functional model that performs the same task as the accelerator, but in software. Of course, care must be taken to write a correct functional model that adheres to the spec.
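The idea of checking against a functional model with random inputs can be sketched as follows. This is only an illustration and not the contents of functionalTest.c: all names are hypothetical, and dut_vec_add8 is a second, independent software implementation standing in for the value that would come back from the accelerator on the real system.

```c
#include <stdint.h>

/* Functional model: straightforward lane-by-lane add, written for clarity.
 * Assumption: eight 8-bit lanes, each wrapping modulo 256. */
uint64_t model_vec_add8(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t s = (uint8_t)((a >> (8 * lane)) + (b >> (8 * lane)));
        sum |= (uint64_t)s << (8 * lane);
    }
    return sum;
}

/* Stand-in for the accelerator result (hypothetical). On hardware this value
 * would be produced by the RoCC instruction; here it is an independent
 * bit-trick implementation so the sketch runs anywhere. */
uint64_t dut_vec_add8(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;    /* top bit of every lane */
    /* Add the low 7 bits of each lane, then fix up each top bit with XOR
     * so that no carry crosses a lane boundary. */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

/* Small xorshift64 generator for reproducible pseudo-random inputs.
 * The state must be non-zero. */
uint64_t next_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Runs n_tests random input pairs and returns how many disagree. */
int count_mismatches(uint64_t seed, int n_tests)
{
    int mismatches = 0;
    for (int i = 0; i < n_tests; i++) {
        uint64_t a = next_rand(&seed);
        uint64_t b = next_rand(&seed);
        if (dut_vec_add8(a, b) != model_vec_add8(a, b))
            mismatches++;
    }
    return mismatches;
}
```

A test would then call count_mismatches with a fixed non-zero seed and report a pass only if it returns 0. Keeping the model deliberately simple makes it easier to trust than the implementation it checks.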

Inspect $chipyard/tests/rocc.h.

Answer the following question:

7.4. What does the last argument of ROCC_INSTRUCTION_DSS stand for? In what situation would you need to use that argument?

Next, we compile our test by running the following in the baremetal_test directory:

riscv64-unknown-elf-gcc -fno-common -fno-builtin-printf -specs=htif_nano.specs -c functionalTest.c
riscv64-unknown-elf-gcc -static -specs=htif_nano.specs functionalTest.o -o functionalTest.riscv

Here, we're using a version of gcc with the target architecture set to RISC-V (without an OS underneath). This comes as part of the RISC-V toolchain. Since we want a self-contained binary, we link it statically.

Now, let's disassemble the executable functionalTest.riscv by running:

riscv64-unknown-elf-objdump -d functionalTest.riscv | less

Inspect the output. Answer the following question:

7.5. What is the address of the ROCC_INSTRUCTION_DSS instruction? Looking through <main> and looking for opcode0 should be helpful.

It's time to run our functional test. Let us use VCS this time around. Navigate to $chipyard/sims/vcs and run:

bsub -I -XF -q ee194 make -j16 CONFIG=CustomAccRoCCConfig BINARY=../../generators/custom-acc-rocc/baremetal_test/functionalTest.riscv run-binary-debug

It might take a few minutes to build and compile the test harness, and run the simulation.

Inside $chipyard/sims/vcs, for each config:

- generated-src contains the test harness
- output contains output files (log/output/waveform)

Inspect the log and output for our config. Do the results of the accelerator and model match? (** PASSED ** in the .out file and output values matching in the .log file should indicate this.)

Inspect the waveform (.fsdb) for our config. Using either x2go, nomachine, or X-forwarding, run verdi -ssf <fsdb file>. Synopsys has transitioned to a new waveform viewer called Verdi that is much more capable than DVE. Verdi uses an open file format called fsdb (Fast Signal Database), and hence VCS has been set up to output simulation waveforms in fsdb.

In the bottom pane of your Verdi window, navigate to Signal > Get Signals.... Follow the module hierarchy to the correct module.

TestDriver
  .testHarness
    .chiptop0
      .system
        .tile_prci_domain
          .tile_reset_domain
            .rocket_tile
              .customAccRoCC

Next Step

8. Lab Submission
