12 changes: 4 additions & 8 deletions .github/workflows/main.yml
@@ -9,14 +9,7 @@ jobs:
matrix:
compiler: [gcc, clang]
architecture: [arm, riscv]
link_mode: [static]
include:
- compiler: gcc
architecture: arm
link_mode: dynamic
- compiler: clang
architecture: arm
link_mode: dynamic
link_mode: [static, dynamic]
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -27,6 +20,9 @@ jobs:
sudo apt-get install -q -y qemu-user
sudo apt-get install -q -y build-essential
sudo apt-get install -q -y gcc-arm-linux-gnueabihf
sudo wget -q https://github.com/riscv-collab/riscv-gnu-toolchain/releases/download/2026.04.05/riscv32-glibc-ubuntu-24.04-gcc.tar.xz
sudo tar Jxf riscv32-glibc-ubuntu-24.04-gcc.tar.xz -C /opt
echo "/opt/riscv/bin" >> "$GITHUB_PATH"
- name: Determine static or dynamic linking mode
id: determine-mode
run: |
25 changes: 8 additions & 17 deletions Makefile
@@ -49,10 +49,6 @@ STAGE0_FLAGS ?= --dump-ir
STAGE1_FLAGS ?=
DYNLINK ?= 0
ifeq ($(DYNLINK),1)
ifeq ($(ARCH),riscv)
# TODO: implement dynamic linking for RISC-V.
$(error "Dynamic linking mode is not implemented for RISC-V")
endif
STAGE0_FLAGS += --dynlink
STAGE1_FLAGS += --dynlink
endif
@@ -108,8 +104,10 @@ check-sanitizer: $(OUT)/$(STAGE0)-sanitizer tests/driver.sh
$(Q)rm $(OUT)/shecc

check-snapshots: $(OUT)/$(STAGE0) $(SNAPSHOTS) tests/check-snapshots.sh
# static linking
$(Q)$(foreach SNAPSHOT_ARCH, $(ARCHS), $(MAKE) distclean config check-snapshot ARCH=$(SNAPSHOT_ARCH) DYNLINK=0 --silent;)
$(Q)$(MAKE) distclean config check-snapshot ARCH=arm DYNLINK=1 --silent
# dynamic linking
$(Q)$(foreach SNAPSHOT_ARCH, $(ARCHS), $(MAKE) distclean config check-snapshot ARCH=$(SNAPSHOT_ARCH) DYNLINK=1 --silent;)
$(VECHO) "Switching backend back to %s (DYNLINK=0)\n" arm
$(Q)$(MAKE) distclean config ARCH=arm DYNLINK=0 --silent

@@ -118,24 +116,17 @@ check-snapshot: $(OUT)/$(STAGE0) tests/check-snapshots.sh
tests/check-snapshots.sh $(ARCH) $(DYNLINK)
$(VECHO) " OK\n"

# TODO: Add an ABI conformance test suite for the RISC-V architecture
check-abi-stage0: $(OUT)/$(STAGE0)
$(Q)if [ "$(ARCH)" = "arm" ]; then \
tests/$(ARCH)-abi.sh 0 $(DYNLINK); \
else \
echo "Skip ABI compliance validation"; \
fi
tests/$(ARCH)-abi.sh 0 $(DYNLINK);

check-abi-stage2: $(OUT)/$(STAGE2)
$(Q)if [ "$(ARCH)" = "arm" ]; then \
tests/$(ARCH)-abi.sh 2 $(DYNLINK); \
else \
echo "Skip ABI compliance validation"; \
fi
tests/$(ARCH)-abi.sh 2 $(DYNLINK);

update-snapshots: tests/update-snapshots.sh
# static linking
$(Q)$(foreach SNAPSHOT_ARCH, $(ARCHS), $(MAKE) distclean config update-snapshot ARCH=$(SNAPSHOT_ARCH) DYNLINK=0 --silent;)
$(Q)$(MAKE) distclean config update-snapshot ARCH=arm DYNLINK=1 --silent
# dynamic linking
$(Q)$(foreach SNAPSHOT_ARCH, $(ARCHS), $(MAKE) distclean config update-snapshot ARCH=$(SNAPSHOT_ARCH) DYNLINK=1 --silent;)
$(VECHO) "Switching backend back to %s (DYNLINK=0)\n" arm
$(Q)$(MAKE) distclean config ARCH=arm DYNLINK=0 --silent

20 changes: 16 additions & 4 deletions README.md
@@ -18,6 +18,7 @@ Despite its simplistic nature, it is capable of performing basic optimization st
while the second pass translates these operations into Arm/RISC-V machine code.
* Develop a register allocation system that is compatible with RISC-style architectures.
* Implement an architecture-independent, [static single assignment](https://en.wikipedia.org/wiki/Static_single-assignment_form) (SSA)-based middle-end for enhanced optimizations.
* Support dynamic linking to allow generated executables to run with glibc.

## Compatibility

@@ -62,6 +63,7 @@ Code generator in `shecc` does not rely on external utilities. You only need
ordinary C compilers such as `gcc` and `clang`. However, `shecc` would bootstrap
itself, and Arm/RISC-V ISA emulation is required. Install QEMU for Arm/RISC-V user
emulation on GNU/Linux:

```shell
$ sudo apt-get install qemu-user
```
@@ -79,14 +81,23 @@ To execute the snapshot test, install the packages below:
$ sudo apt-get install graphviz jq
```

Additionally, because `shecc` supports the dynamic linking mode for the Arm architecture,
it needs to install the ARM GNU toolchain to obtain the ELF interpreter and other dependencies:
### Additional packages

Because `shecc` supports the dynamic linking mode for both the Arm and RISC-V architectures,
you need to install cross-compilation GNU toolchains to obtain the ELF interpreter and other dependencies.

For the Arm architecture, you can install the ARM GNU toolchain using `apt-get`:

```shell
$ sudo apt-get install gcc-arm-linux-gnueabihf
```
Another approach is to manually download and install the toolchain from [ARM Developer website](https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads).
Select "x86_64 Linux hosted cross toolchains" - "AArch32 GNU/Linux target with hard float (arm-none-linux-gnueabihf)"
to download the toolchain.

Select "x86_64 Linux hosted cross toolchains" - "AArch32 GNU/Linux target with hard float (arm-none-linux-gnueabihf)" to download the toolchain.
Since `apt-get` does not provide the necessary RISC-V GNU toolchain, you must download it manually if you want to
run a dynamically linked `shecc` targeting the RISC-V architecture. For instance, you can download and extract the
`riscv32-glibc-ubuntu-22.04-gcc.tar.xz` package from the [riscv-gnu-toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain) repository.

## Build and Verify

@@ -113,6 +124,7 @@ $ make
Run `make DYNLINK=1` to use the dynamic linking mode and generate the dynamically linked compiler:
```shell
# If using the dynamic linking mode, you should add 'DYNLINK=1' for each 'make' command.
# Append 'ARCH=arm' or 'ARCH=riscv' to specify the target architecture (default: arm).
$ make DYNLINK=1
CC+LD out/inliner
GEN out/libc.inc
@@ -152,7 +164,7 @@ $ qemu-arm fib

Example 2: dynamic linking mode

Notice that `/usr/arm-linux-gnueabihf` is the ELF interpreter prefix. Since the path may be different if you manually install the ARM GNU toolchain instead of using `apt-get`, you should set the prefix to the actual path.
Notice that `/usr/arm-linux-gnueabihf` is the ELF interpreter prefix. Since the path may be different if you manually install the ARM/RISC-V GNU toolchain instead of using `apt-get`, you should set the prefix to the actual path.
```shell
$ out/shecc --dynlink -o fib tests/fib.c
$ chmod +x fib
143 changes: 123 additions & 20 deletions docs/dynamic-linking.md
@@ -2,10 +2,11 @@

## Build dynamically linked shecc and programs

Build the dynamically linked version of shecc, but notice that shecc currently doesn't support dynamic linking for the RISC-V architecture:
Build the dynamically linked version of shecc:

```shell
$ make ARCH=arm DYNLINK=1
$ make ARCH=riscv DYNLINK=1
```

Next, you can use shecc to build dynamically linked programs by adding the `--dynlink` flag:
@@ -15,12 +16,16 @@ Next, you can use shecc to build dynamically linked programs by adding the `--dy
$ out/shecc --dynlink -o <output> <input.c>
# Use the stage 1 or stage 2 compiler
$ qemu-arm -L <LD_PREFIX> out/shecc-stage2.elf --dynlink -o <output> <input.c>
$ qemu-riscv32 -L <LD_PREFIX> out/shecc-stage2.elf --dynlink -o <output> <input.c>

# Execute the compiled program
$ qemu-arm -L <LD_PREFIX> <output>
$ qemu-riscv32 -L <LD_PREFIX> <output>
```

When executing a dynamically linked program, you should set the ELF interpreter prefix so that `ld.so` can be invoked. Generally, it should be `/usr/arm-linux-gnueabihf` if you have installed the ARM GNU toolchain by `apt`. Otherwise, you should find and specify the correct path if you manually installed the toolchain.
When executing a dynamically linked program, you should set the ELF interpreter prefix so that `ld.so` can be invoked.

Generally, the prefix is `/usr/arm-linux-gnueabihf` for the Arm architecture if you installed the ARM GNU toolchain via `apt`. If you installed the toolchain manually, find and specify the actual path instead. For RISC-V, you must manually download a 32-bit RISC-V GNU toolchain, since `apt` may not provide a package for it.

## Stack frame layout

@@ -69,30 +74,69 @@ Low Address

### RISC-V

(Currently not supported)
```
High Address
+------------------+
| ... |
+------------------+ <- sp + total_size
| preserved return |
| address |
+------------------+
| (padding) |
+------------------+
| local variables |
+------------------+ <- sp + (MAX_PARAMS - MAX_ARGS_IN_REG) * 4
| (unused space) |
+------------------+ <- sp (MUST be aligned to 16 bytes)
Low Address
```

`total_size`: includes the size of the following elements:

* `unused space`: a fixed size of `(MAX_PARAMS - MAX_ARGS_IN_REG) * 4` bytes
* `local variables`
* `preserved return address`: a fixed size of 4 bytes
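
The size computation can be sketched in C as follows. This is only an illustration, not the code generator's actual logic: `align16` and `riscv_frame_size` are hypothetical helpers, the values of `MAX_PARAMS` and `MAX_ARGS_IN_REG` are passed in as parameters because they are not stated here, and the sketch assumes that the padding simply rounds the sum of the three elements up to the next multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round n up to the next multiple of 16. */
uint32_t align16(uint32_t n)
{
    return (n + 15u) & ~15u;
}

/* Hypothetical helper: frame size in bytes for one function.
 * max_params and max_args_in_reg stand in for MAX_PARAMS and
 * MAX_ARGS_IN_REG; their real values are an assumption of the caller. */
uint32_t riscv_frame_size(uint32_t max_params,
                          uint32_t max_args_in_reg,
                          uint32_t locals_size)
{
    assert(max_params >= max_args_in_reg);

    /* Fixed-size unused area at the bottom of the frame. */
    uint32_t unused = (max_params - max_args_in_reg) * 4u;

    /* 4 extra bytes hold the preserved return address (ra). */
    uint32_t raw = unused + locals_size + 4u;

    /* The padding is whatever is needed to keep sp 16-byte aligned. */
    return align16(raw);
}
```

For example, with no unused space and 20 bytes of local variables, the raw size is 24 bytes and the sketch rounds it up to 32.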

## Calling Convention

Regardless of which mode is used, callers perform the operations required to comply with the ABI of the target architecture when calling a function.

### Arm32

Regardless of which mode is used, the caller performs the following operations to comply with the Arm Architecture Procedure Call Standard (AAPCS) when calling a function.
Caller's behavior:

* The first four arguments are put into registers `r0` - `r3`
* The first four arguments are put into registers `r0` - `r3`.
* Any additional arguments are passed on the stack. Arguments are pushed onto the stack starting from the last argument, so the fifth argument resides at a lower address and the last argument at a higher address.
* Align the stack pointer to 8 bytes, as external functions may access 8-byte objects that require such alignment.

Then, the callee will perform these operations:
Callee's behavior:

- Preserve the contents of registers `r4` - `r11` on the stack upon function entry.
- The callee also pushes the content of `lr` onto the stack to preserve the return address; however, this operation is not required by the AAPCS.

- Restore these registers from the stack upon returning.
- Allocate necessary space on the stack and align the stack pointer to 8 bytes, as external functions may access 8-byte objects that require such alignment.
- Restore registers `r4` - `r11` from the stack upon returning, and load the saved `lr` into `pc` to return.
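
The caller-side argument placement for Arm32 can be sketched in C as follows. The helper names are hypothetical, and the sketch assumes every argument occupies one 4-byte word and that `sp` points at the fifth argument at the moment of the call.

```c
#include <assert.h>

/* Hypothetical helper: register number (0..3 for r0..r3) that carries
 * the zero-based argument idx, or -1 if the argument goes on the stack. */
int arm_arg_register(int idx)
{
    assert(idx >= 0);
    return idx < 4 ? idx : -1;
}

/* Hypothetical helper: byte offset from sp of a stack-passed argument.
 * The fifth argument (idx 4) sits at the lowest address, later ones above it. */
int arm_arg_stack_offset(int idx)
{
    assert(idx >= 4);
    return (idx - 4) * 4;
}
```

With these assumptions, the fifth argument is found at `[sp]` and the sixth at `[sp, #4]`.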

### RISC-V

In the RISC-V architecture, registers `a0` - `a7` are used as argument registers; that is, the first eight arguments are passed into these registers.
Caller's behavior:

- Preserve caller-saved registers:
- `a0` - `a7`.
- `ra` is always saved on entry to the caller.
- Exception: the code generator uses `t0` - `t6` only for temporary values, so these registers do not need to be saved.

- The first eight arguments are passed into registers `a0` - `a7`.
- Since the current implementation of shecc supports up to 8 arguments, no argument needs to be passed onto the stack.

Callee's behavior:

- Allocate necessary space on the stack and align the stack pointer to 128-bit (16-byte).
- Preserve callee-saved registers:
- Although `sp` is not explicitly saved onto the stack after allocating space for local variables, the code generator guarantees that `sp` is correctly restored for the caller prior to returning. Therefore, `sp` needs no additional handling.
- Exception:
- `s1` - `s11` are not used by the code generator, so they do not need to be saved.
- `s0` is unused during static linking and is only accessed at the program's entry point under dynamic linking, so there is no need to save this register.

Since the current implementation of shecc supports up to 8 arguments, no argument needs to be passed onto the stack.
- Restore the return address and the stack pointer before returning.

## Runtime execution flow of a dynamically linked program

@@ -136,14 +180,20 @@ kernel | | +--->| +--->| |
2. Kernel validates the executable and creates a process image if the validation passes.
3. Dynamic linker (`ld.so`) is invoked by the kernel's program loader.
* For the Arm architecture, the dynamic linker is `/lib/ld-linux-armhf.so.3`.
* For the RISC-V architecture, the dynamic linker is `/lib/ld-linux-riscv32-ilp32d.so.1`.
4. Linker loads shared libraries such as `libc.so`.
5. Linker resolves symbols and fills global offset table (GOT).
6. Control transfers to the program, which starts at the entry point.
7. Program executes `__libc_start_main` at the beginning.
8. `__libc_start_main` calls the *main wrapper*, which pushes registers r4-r11 and lr onto the stack, sets up a global stack for all global variables (excluding read-only variables), and initializes them.
8. `__libc_start_main` calls the *main wrapper*, which performs the following operations:
* Architecture-specific behavior:
* Arm: push registers `r4`-`r11` and `lr` onto the stack.
* RISC-V: store register `ra` onto the stack (preserve the address back to `__libc_start_main`).
* Preserve `argc` and `argv` for the main function.
* Set up a global stack for all global variables (excluding read-only variables) and initialize them.
9. Execute the *main wrapper*, and then invoke the main function.
10. After the `main` function returns, the *main wrapper* restores the necessary registers and passes control back to `__libc_start_main`, which implicitly calls `exit(3)` to terminate the program.
* Or, the `main` function can also call `exit(3)` or `_exit(2)` to directly terminate itself.
* Alternatively, the `main` function can also call `exit(3)` or `_exit(2)` to directly terminate itself.

## Dynamic sections

@@ -159,10 +209,14 @@ When using dynamic linking, the following sections are generated for compiled pr

### Initialization of all GOT entries

* `GOT[0]` is set to the starting address of the `.dynamic` section.
* `GOT[1]` and `GOT[2]` are initialized to zero and reserved for the `link_map` and the resolver (`__dl_runtime_resolve`).
* The dynamic linker modifies them to point to the actual addresses at runtime.
* `GOT[3]` - `GOT[N]` are initially set to the address of `PLT[0]` at compile time, causing the first call to an external function to invoke the resolver at runtime.
* Arm:
* `GOT[0]` is set to the starting address of the `.dynamic` section.
* `GOT[1]` and `GOT[2]` are initialized to zero and reserved for the `link_map` and the resolver (`__dl_runtime_resolve`); the dynamic linker updates them to point to the actual addresses at runtime.

* RISC-V:
* `GOT[0]` and `GOT[1]` are initialized to zero and reserved for the resolver (`__dl_runtime_resolve`) and the `link_map`; the dynamic linker updates them to point to the actual addresses at runtime.

* The remaining entries are initially set to the address of `PLT[0]` at compile time, causing the first call to an external function to invoke the resolver at runtime.
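
The compile-time GOT contents described above can be sketched in C as follows. This is a minimal illustration rather than shecc's actual ELF writer: `init_got`, `enum arch`, and the address parameters are hypothetical names, and each GOT entry is assumed to be one 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum arch { ARCH_ARM, ARCH_RISCV };

/* Hypothetical helper: fill got[0..n_entries) with its compile-time values
 * and return the number of reserved leading entries. */
size_t init_got(enum arch arch, uint32_t *got, size_t n_entries,
                uint32_t dynamic_addr, uint32_t plt0_addr)
{
    /* Arm reserves GOT[0..2]; RISC-V reserves GOT[0..1]. */
    size_t reserved = (arch == ARCH_ARM) ? 3 : 2;
    assert(n_entries >= reserved);

    /* Reserved slots start as zero; the dynamic linker fills them later. */
    for (size_t i = 0; i < reserved; i++)
        got[i] = 0;

    /* On Arm, GOT[0] holds the address of the .dynamic section. */
    if (arch == ARCH_ARM)
        got[0] = dynamic_addr;

    /* Every function slot initially points at PLT[0], so the first call
     * to an external function reaches the resolver. */
    for (size_t i = reserved; i < n_entries; i++)
        got[i] = plt0_addr;

    return reserved;
}
```

The only architecture-specific parts in this sketch are the number of reserved slots and the use of `GOT[0]` for `.dynamic` on Arm.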

### Explanation for PLT stubs (Arm32)

@@ -187,6 +241,8 @@ ldr pc, [lr]
3. Move the value of `sl` to `lr`.
4. Load the value located at `[lr]` into the program counter (`pc`)

------

The remaining PLT entries correspond to all external functions, and each entry includes the following instructions to fulfill the second requirement:

```
@@ -198,6 +254,49 @@ ldr pc, [ip]
1. Set register `ip` to the address of `GOT[x]`.
2. Assign register `pc` to the value of `GOT[x]`. That is, set `pc` to the address of the callee.

### Explanation for PLT stubs (RISC-V)

According to the RISC-V ABI document, the first PLT entry can be produced as follows:

```
1: auipc t2, %pcrel_hi(.got)
sub t1, t1, t3
lw t3, %pcrel_lo(1b)(t2)
addi t1, t1, -(PLT0_SIZE + 12) # PLT0_SIZE is 32 bytes.
addi t0, t2, %pcrel_lo(1b)
srli t1, t1, log2(16 / PTRSIZE) # PTRSIZE is 4 bytes.
lw t0, PTRSIZE(t0)
jr t3
```

- `t0` is set to `GOT[1]`, which is the `link_map` pointer.

- `t1` is a `.got` offset:

| External Function | Corresponding GOT element | `.got` offset |
| ----------------- | ------------------------- | ------------- |
| 1st function | `GOT[2]` | `0` |
| 2nd function | `GOT[3]` | `4` |
| ... | ... | ... |
| N-th function | `GOT[N + 1]` | `(N - 1) * 4` |

- `t2` is `%hi(%pcrel(.got))`, but it is not used by `__dl_runtime_resolve()`.

- `t3` is `GOT[0]` (a pointer to `__dl_runtime_resolve()`), and `PLT[0]` finally uses `t3` to jump to the resolver.

------

Each of the remaining entries can be generated with the following instructions:

```
1: auipc t3, %pcrel_hi(function@.got)
lw t3, %pcrel_lo(1b)(t3)
jalr t1, t3
nop
```

This instruction sequence sets `t1` to the address of the `nop` and loads `t3` with the value stored in the function's GOT entry, then jumps via `t3` to call the external function.
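
To see how `PLT[0]` turns the value left in `t1` into a `.got` offset, the arithmetic can be replayed in C. This is a sketch under stated assumptions: `plt0_got_offset` is a hypothetical helper, each PLT entry is assumed to be 16 bytes (four uncompressed 4-byte instructions), and the call is assumed to be the first one, so `t3` holds the address of `PLT[0]`.

```c
#include <assert.h>
#include <stdint.h>

#define PLT0_SIZE 32u      /* size of PLT[0] in bytes */
#define PLT_ENTRY_SIZE 16u /* assumed: four 4-byte instructions per entry */

/* Hypothetical helper: replay the t1 arithmetic of PLT[0].
 * plt0 is the address of PLT[0]; n is the zero-based index of the
 * external function whose PLT entry was entered. */
uint32_t plt0_got_offset(uint32_t plt0, uint32_t n)
{
    uint32_t plt_n = plt0 + PLT0_SIZE + n * PLT_ENTRY_SIZE;

    uint32_t t1 = plt_n + 12u;   /* jalr t1, t3: t1 = address of the nop */
    uint32_t t3 = plt0;          /* unresolved GOT entry points at PLT[0] */

    t1 = t1 - t3;                /* sub  t1, t1, t3 */
    t1 = t1 - (PLT0_SIZE + 12u); /* addi t1, t1, -(PLT0_SIZE + 12) */
    t1 = t1 >> 2;                /* srli t1, t1, log2(16 / PTRSIZE), PTRSIZE = 4 */
    return t1;
}
```

Under these assumptions the first external function yields offset 0 and the second yields offset 4, which agrees with the table of `.got` offsets.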

## PLT execution path and performance overhead

Since calling an external function needs a PLT stub for indirect invocation, the execution path of the first call is as follows:
@@ -243,7 +342,11 @@ This implies that:

* man page: `ld(1)`
* man page: `ld.so(8)`
* glibc - [`__dl_runtime_resolve`](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/arm/dl-trampoline.S#L30) implementation (for Arm32)
* glibc implementation
* [`__dl_runtime_resolve`](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/arm/dl-trampoline.S#L30) (Arm32)
* [`__dl_runtime_resolve`](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/riscv/dl-trampoline.S#L34) (RISC-V)
* Application Binary Interface for the Arm Architecture - [`abi-aa`](https://github.com/ARM-software/abi-aa)
* `aaelf32`
* `aapcs32`
* `aaelf32.pdf`
* `aapcs32.pdf`
* RISC-V ABIs Specification - [`riscv-elf-psabi-doc`](https://github.com/riscv-non-isa/riscv-elf-psabi-doc)
* `riscv-abi.pdf`