Skip to content

Conversation

@ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Jan 9, 2026

This pr adds a new types of Errors for all possible operations possible on struct HyperlightVm. The main purpose for doing so if to more accurately being able to determine whether a sandbox should be poisoned, since any error that doesn't let the guest properly unwind should poison a sandbox. However, another benefit with this PR is that errors surfaced to the user will be much richer in information about what went wrong. Due to these errors having tree-like structure, they almost serve as a kind of backtrace. To accomplish this, new richer errors types have been added to all possible code paths taken by all HyperlightVm, and existing HyperlightErrors are entirely avoided in these codepaths (with a couple of small exceptions).

To remain backwards compatible (with tests and not to break users relying on certain existing errors), some of these internal errors are surfaced as existing errors. For example, stack overflow errors will be turned into the existing top-level error HyperlightError::StackOverflow, and when guest execution is canceleld by host, the old error will be returned similarly.

This PR does not change any logic at all, it only modifies returned error types.

My eventual (far away) goal is to add richer errors to all hyperlight modules, and very carefully distinguish between internal errors (that are not supposed to happen) and other errors that indeed are supposed to be able to happen. All new errors in this PR are "internal errors" in that they should not be relied upon to adhere to semver, etc, and may change heavily between versions. That said, they are still pub since they are returned to users. The errors in this PR can be visualized like this:

Spoiler warning
HyperlightVmError
├── DispatchGuestCall(DispatchGuestCallError)
│   ├── ConvertRspPointer(String)
│   ├── SetupRegs(RegisterError)              ──► [A]
│   └── Run(RunVmError)                       ──► [B]
│
├── Initialize(InitializeError)
│   ├── ConvertPointer(String)
│   ├── SetupRegs(RegisterError)              ──► [A]
│   └── Run(RunVmError)                       ──► [B]
│
├── Create(CreateHyperlightVmError)
│   ├── NoHypervisorFound
│   ├── Vm(VmError)                           ──► [C]
│   ├── ConvertRspPointer(Box)
│   ├── SendDbgMsg(SendDbgMsgError)           ──► [D] [gdb]
│   └── AddHwBreakpoint(DebugError)           ──► [G] [gdb]
│
├── MapRegion(MapRegionError)
│   ├── NotPageAligned { page_size, region }
│   └── MapMemory(MapMemoryError)
│
└── UnmapRegion(UnmapRegionError)
    ├── RegionNotFound(MemoryRegion)
    └── UnmapMemory(UnmapMemoryError)

[A] RegisterError
├── GetRegs(HypervisorImplError)
├── SetRegs(HypervisorImplError)
├── GetFpu(HypervisorImplError)
├── SetFpu(HypervisorImplError)
├── GetSregs(HypervisorImplError)
├── SetSregs(HypervisorImplError)
├── GetDebugRegs(HypervisorImplError)
├── SetDebugRegs(HypervisorImplError)
├── GetXsave(HypervisorImplError)
└── XsaveSizeMismatch { expected, actual }

[B] RunVmError
├── StackOverflow
├── MemoryAccessViolation { addr, access_type, region_flags }
├── MmioReadUnmapped(u64)
├── MmioWriteUnmapped(u64)
├── ExecutionCancelledByHost
├── UnexpectedVmExit(String)
├── RunVcpu(RunVcpuError)
├── HandleIo(HandleIoError) ──► [E]
├── GetRegs(RegisterError) ──► [A] [trace_guest]
├── CheckStackGuard(Box)
├── DebugHandler(HandleDebugError) ──► [F] [gdb]
├── VcpuStopReason(VcpuStopReasonError) [gdb]
└── CrashdumpGeneration(Box) [crashdump]

[C] VmError
├── CreateVm(CreateVmError)
├── RunVcpu(RunVcpuError)
├── Register(RegisterError) ──► [A]
├── MapMemory(MapMemoryError)
├── UnmapMemory(UnmapMemoryError)
└── Debug(DebugError) ──► [G] [gdb]

[D] SendDbgMsgError [gdb]
├── DebugNotEnabled
└── SendFailed(GdbTargetError)

[E] HandleIoError
├── NoData
├── GetRegs(RegisterError) ──► [A] [mem_profile]
└── Outb(HandleOutbError)
├── StackOverflow
├── GuestAborted { code, message }
├── InvalidPort(String)
├── ReadLogData(String)
├── TraceFormat(String)
├── ReadHostFunctionCall(String)
├── LockFailed(&str, u32, String)
├── WriteHostFunctionResponse(String)
├── InvalidDebugPrintChar(u32)
└── MemProfile(String) [mem_profile]

[F] HandleDebugError [gdb]
├── DebugNotEnabled
├── SendMessage(SendDbgMsgError) ──► [D]
├── ReceiveMessage(RecvDbgMsgError)
│ ├── DebugNotEnabled
│ └── RecvFailed(GdbTargetError)
└── ProcessRequest(ProcessDebugRequestError)
├── DebugNotEnabled
├── TryLockError(&str, u32)
├── Vm(VmError) ──► [C]
├── Debug(DebugError) ──► [G]
├── SwBreakpointNotFound(u64)
├── ReadMemory(DebugMemoryAccessError)
└── WriteMemory(DebugMemoryAccessError)

[G] DebugError [gdb]
├── HwBreakpointNotFound(u64)
├── TooManyHwBreakpoints(usize)
├── Register(RegisterError) ──► [A]
├── TranslateGva(u64)
└── Intercept { enable, inner: HypervisorImplError }

Partially addresses #998 (fixes the bug, but doesn't update the errors everywhere)

@ludfjig ludfjig added kind/bugfix For PRs that fix bugs kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. and removed kind/bugfix For PRs that fix bugs labels Jan 10, 2026
@ludfjig ludfjig changed the title [draft] Richer errors for entire HyperlightVm [draft] Richer errors for HyperlightVm Jan 10, 2026
@ludfjig ludfjig changed the title [draft] Richer errors for HyperlightVm Richer errors for HyperlightVm Jan 12, 2026
@ludfjig ludfjig marked this pull request as ready for review January 12, 2026 23:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a comprehensive richer error type system for HyperlightVm operations, enabling more accurate determination of when a sandbox should be poisoned. The changes maintain backwards compatibility by converting internal errors to existing error types where appropriate, while providing much more detailed error information through a tree-like error structure.

Changes:

  • Introduces new error enums with displaydoc for all HyperlightVm operations (dispatch, initialize, create, map/unmap regions, run, I/O handling, debug operations)
  • Updates all virtual machine implementations (KVM, MSHV, WHP) to use new error types
  • Adds poison detection logic based on error types to properly track sandbox state
  • Removes old hypervisor-specific error variants from HyperlightError in favor of the new HyperlightVmError hierarchy
  • Adds displaydoc dependency for better error message generation

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/hyperlight_host/src/hypervisor/hyperlight_vm.rs Defines new error types (DispatchGuestCallError, InitializeError, RunVmError, HandleIoError, MapRegionError, UnmapRegionError, CreateHyperlightVmError, HandleDebugError, SendDbgMsgError, RecvDbgMsgError, HyperlightVmError) and updates function signatures
src/hyperlight_host/src/hypervisor/virtual_machine/mod.rs Adds VmError, CreateVmError, RunVcpuError, RegisterError, MapMemoryError, UnmapMemoryError, HypervisorImplError enums
src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs Updates Windows Hypervisor Platform implementation to use new error types
src/hyperlight_host/src/hypervisor/virtual_machine/mshv.rs Updates MSHV implementation to use new error types
src/hyperlight_host/src/hypervisor/virtual_machine/kvm.rs Updates KVM implementation to use new error types
src/hyperlight_host/src/hypervisor/gdb/mod.rs Adds DebugError, DebugMemoryAccessError and updates debug trait methods
src/hyperlight_host/src/hypervisor/gdb/arch.rs Adds VcpuStopReasonError enum
src/hyperlight_host/src/sandbox/initialized_multi_use.rs Updates dispatch call handling with error promotion and poison detection
src/hyperlight_host/src/sandbox/uninitialized_evolve.rs Updates to use new HyperlightVmError wrapping
src/hyperlight_host/src/sandbox/outb.rs Introduces HandleOutbError enum for outb operation errors
src/hyperlight_host/src/sandbox/trace/mem_profile.rs Updates to return HandleOutbError instead of Result
src/hyperlight_host/src/error.rs Adds HyperlightVmError variant, removes old hypervisor-specific errors, updates is_poison_error logic
src/hyperlight_host/Cargo.toml Adds displaydoc 0.2.5 dependency
Cargo.lock Updates dependencies (displaydoc, flatbuffers, proc-macro2, quote, syn)
src/tests/rust_guests/witguest/Cargo.lock Updates test guest dependencies
src/tests/rust_guests/simpleguest/Cargo.lock Updates test guest dependencies
src/hyperlight_host/src/hypervisor/mod.rs Updates test to use unwrap instead of ? operator

@ludfjig ludfjig force-pushed the richer_error branch 4 times, most recently from 5ea16c0 to 9d19dda Compare January 15, 2026 22:13
@jsturtevant
Copy link
Contributor

whats the difference between
[F] HandleDebugError [gdb] and [G] DebugError [gdb] errors?

@jsturtevant
Copy link
Contributor

All new errors in this PR are "internal errors" in that they should not be relied upon to adhere to semver, etc, and may change heavily between versions.

If the user is now being exposed to these errors won't they take a dependency on them to do things. For instance a user matched on the variant HyperlightError::HyperlightVmError(_)

@jsturtevant
Copy link
Contributor

Partially addresses #998 (fixes the bug, but doesn't update the errors everywhere)

did we add any tests for this?

@ludfjig ludfjig force-pushed the richer_error branch 2 times, most recently from f24d902 to 9c5e1df Compare January 20, 2026 23:53
@ludfjig
Copy link
Contributor Author

ludfjig commented Jan 20, 2026

Partially addresses #998 (fixes the bug, but doesn't update the errors everywhere)

did we add any tests for this?

Added one test!

All new errors in this PR are "internal errors" in that they should not be relied upon to adhere to semver, etc, and may change heavily between versions.

If the user is now being exposed to these errors won't they take a dependency on them to do things. For instance a user matched on the variant HyperlightError::HyperlightVmError(_)

Yes they could technically. I added a comment and doc(hidden) to prevent this as much as possible.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.

Comment on lines +726 to +728
return Err(windows_result::Error::new(
HRESULT::from_win32(0x8007007F), // ERROR_PROC_NOT_FOUND
"Failed to find WHvMapGpaRange2 in winhvplatform.dll",
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error code HRESULT::from_win32(0x8007007F) used for ERROR_PROC_NOT_FOUND is incorrect. The value 0x8007007F represents a Win32 error code that's already been converted to HRESULT format, but from_win32() will convert it again. Use HRESULT(0x8007007F) directly or HRESULT::from_win32(127) where 127 is the actual Win32 error code for ERROR_PROC_NOT_FOUND.

Copilot uses AI. Check for mistakes.
use crate::mem::shared_mem::HostSharedMemory;
use crate::{HyperlightError, new_error};

#[derive(Debug, Error)]
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GdbTargetError enum visibility has been changed from pub(crate) to pub. This expands the public API surface area. Since this error is now part of public error chains (through SendDbgMsgError and RecvDbgMsgError which are public), this change is necessary but should be noted as an API expansion. Ensure this is intentional and documented if needed.

Suggested change
#[derive(Debug, Error)]
#[derive(Debug, Error)]
/// Errors produced by the GDB server / target integration for Hyperlight.
///
/// This type is part of the public error surface of `hyperlight_host`. It may
/// appear as the source of other public error types (for example,
/// `SendDbgMsgError` and `RecvDbgMsgError`) when interacting with a GDB
/// debugger over the remote protocol.

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +240
#[error("Region is not page-aligned (page size: {0:#x})")]
NotPageAligned(usize),
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion of MapRegionError::NotPageAligned should include the region information in the error message. The old code logged "region is not page-aligned {:x}, {region:?}" with both page size and region details. The new error only captures page_size, losing the region information which could be valuable for debugging. Consider adding a region field to NotPageAligned or including it in the error message.

Copilot uses AI. Check for mistakes.
match try_load_whv_map_gpa_range2() {
Ok(func) => func,
Err(e) => return Err(new_error!("Can't find API: {}", e)),
Err(e) => return Err(MapMemoryError::LoadApi(e)),
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling pattern changed from returning Err(new_error!(...)) to Err(MapMemoryError::LoadApi(e)). The old pattern would have wrapped the error in an anyhow error with a custom message. Verify that MapMemoryError::LoadApi(windows_result::Error) provides sufficient context about what API failed to load, or consider adding more context to the error message.

Copilot uses AI. Check for mistakes.
Comment on lines 343 to 354
.map_err(|e| {
new_error!(
"Failed to convert WHP registers to CommonRegisters: {:?}",
e
RegisterError::GetRegs(
crate::hypervisor::virtual_machine::HypervisorError::WindowsError(
windows_result::Error::new(
HRESULT::from_win32(160), // ERROR_BAD_ARGUMENTS
format!(
"Failed to convert WHP registers to CommonRegisters: {:?}",
e
),
),
),
)
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a hardcoded HRESULT value HRESULT::from_win32(160) (ERROR_BAD_ARGUMENTS) for conversion errors is fragile. The error code should be derived from the actual error condition or use a more appropriate Windows error code. Consider using E_INVALIDARG or handling the conversion error more generically without inventing error codes.

Copilot uses AI. Check for mistakes.
Comment on lines +560 to +574
/// Test that executing an OUT instruction with an invalid port causes an error and poisons the sandbox.
#[test]
fn guest_outb_with_invalid_port_poisons_sandbox() {
let mut sbox = new_uninit_rust().unwrap().evolve().unwrap();

// Port 0x1234 is not a valid hyperlight port
let res = sbox.call::<()>("OutbWithPort", (0x1234_u32, 0_u32));
assert!(res.is_err(), "Expected error from invalid OUT port");

// The sandbox should be poisoned because the guest didn't complete normally
assert!(
sbox.poisoned(),
"Sandbox should be poisoned after invalid OUT"
);
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test guest_outb_with_invalid_port_poisons_sandbox verifies that using an invalid OUT port poisons the sandbox. However, there's no test coverage for the new error types themselves, particularly for verifying that the error conversion logic in DispatchGuestCallError::promote() works correctly. Consider adding tests that validate the error conversion path and ensure that appropriate errors are mapped to the correct HyperlightError variants.

Copilot uses AI. Check for mistakes.
Comment on lines +239 to +240
#[error("Region is not page-aligned (page size: {0:#x})")]
NotPageAligned(usize),
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MapRegionError::NotPageAligned error message format changed from a detailed log with hex-formatted page size and debug-formatted region info to a simpler message with only the page size. Consider formatting the page size as hex (using {:#x}) to match the original format, as hex is more conventional for memory-related sizes and alignments.

Copilot uses AI. Check for mistakes.
Comment on lines +1114 to 1129
.inspect_err(|e| {
log::error!("Failed to add hw breakpoint: {:?}", e);

e
})
.is_ok(),
)),
DebugMsg::AddSwBreakpoint(addr) => Ok(DebugResponse::AddSwBreakpoint(
self.add_sw_breakpoint(addr, mem_access)
.map_err(|e| {
.inspect_err(|e| {
log::error!("Failed to add sw breakpoint: {:?}", e);

e
})
.is_ok(),
)),
DebugMsg::Continue => {
self.vm.set_single_step(false).map_err(|e| {
self.vm.set_single_step(false).inspect_err(|e| {
log::error!("Failed to continue execution: {:?}", e);

e
})?;
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation uses .inspect_err() in multiple places (lines 1114, 1121, 1127, etc.) to log errors, but doesn't chain a ? operator after these calls. This is correct usage. However, the previous code used .map_err() which both logged and returned the error. The new pattern with .inspect_err() followed by ? is cleaner and separates concerns, but verify that the error handling behavior is equivalent - specifically that the original error is still propagated unchanged.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants