-
Notifications
You must be signed in to change notification settings - Fork 157
Richer errors for HyperlightVm #1162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a comprehensive richer error type system for HyperlightVm operations, enabling more accurate determination of when a sandbox should be poisoned. The changes maintain backwards compatibility by converting internal errors to existing error types where appropriate, while providing much more detailed error information through a tree-like error structure.
Changes:
- Introduces new error enums with displaydoc for all HyperlightVm operations (dispatch, initialize, create, map/unmap regions, run, I/O handling, debug operations)
- Updates all virtual machine implementations (KVM, MSHV, WHP) to use new error types
- Adds poison detection logic based on error types to properly track sandbox state
- Removes old hypervisor-specific error variants from HyperlightError in favor of the new HyperlightVmError hierarchy
- Adds displaydoc dependency for better error message generation
Reviewed changes
Copilot reviewed 14 out of 17 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/hyperlight_host/src/hypervisor/hyperlight_vm.rs | Defines new error types (DispatchGuestCallError, InitializeError, RunVmError, HandleIoError, MapRegionError, UnmapRegionError, CreateHyperlightVmError, HandleDebugError, SendDbgMsgError, RecvDbgMsgError, HyperlightVmError) and updates function signatures |
| src/hyperlight_host/src/hypervisor/virtual_machine/mod.rs | Adds VmError, CreateVmError, RunVcpuError, RegisterError, MapMemoryError, UnmapMemoryError, HypervisorImplError enums |
| src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs | Updates Windows Hypervisor Platform implementation to use new error types |
| src/hyperlight_host/src/hypervisor/virtual_machine/mshv.rs | Updates MSHV implementation to use new error types |
| src/hyperlight_host/src/hypervisor/virtual_machine/kvm.rs | Updates KVM implementation to use new error types |
| src/hyperlight_host/src/hypervisor/gdb/mod.rs | Adds DebugError, DebugMemoryAccessError and updates debug trait methods |
| src/hyperlight_host/src/hypervisor/gdb/arch.rs | Adds VcpuStopReasonError enum |
| src/hyperlight_host/src/sandbox/initialized_multi_use.rs | Updates dispatch call handling with error promotion and poison detection |
| src/hyperlight_host/src/sandbox/uninitialized_evolve.rs | Updates to use new HyperlightVmError wrapping |
| src/hyperlight_host/src/sandbox/outb.rs | Introduces HandleOutbError enum for outb operation errors |
| src/hyperlight_host/src/sandbox/trace/mem_profile.rs | Updates to return HandleOutbError instead of Result |
| src/hyperlight_host/src/error.rs | Adds HyperlightVmError variant, removes old hypervisor-specific errors, updates is_poison_error logic |
| src/hyperlight_host/Cargo.toml | Adds displaydoc 0.2.5 dependency |
| Cargo.lock | Updates dependencies (displaydoc, flatbuffers, proc-macro2, quote, syn) |
| src/tests/rust_guests/witguest/Cargo.lock | Updates test guest dependencies |
| src/tests/rust_guests/simpleguest/Cargo.lock | Updates test guest dependencies |
| src/hyperlight_host/src/hypervisor/mod.rs | Updates test to use unwrap instead of ? operator |
5ea16c0 to
9d19dda
Compare
|
whats the difference between |
If the user is now being exposed to these errors won't they take a dependency on them to do things. For instance a user matched on the variant |
did we add any tests for this? |
f24d902 to
9c5e1df
Compare
Added one test!
Yes they could technically. I added a comment and doc(hidden) to prevent this as much as possible. |
Signed-off-by: Ludvig Liljenberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.
| return Err(windows_result::Error::new( | ||
| HRESULT::from_win32(0x8007007F), // ERROR_PROC_NOT_FOUND | ||
| "Failed to find WHvMapGpaRange2 in winhvplatform.dll", |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error code HRESULT::from_win32(0x8007007F) used for ERROR_PROC_NOT_FOUND is incorrect. The value 0x8007007F represents a Win32 error code that's already been converted to HRESULT format, but from_win32() will convert it again. Use HRESULT(0x8007007F) directly or HRESULT::from_win32(127) where 127 is the actual Win32 error code for ERROR_PROC_NOT_FOUND.
| use crate::mem::shared_mem::HostSharedMemory; | ||
| use crate::{HyperlightError, new_error}; | ||
|
|
||
| #[derive(Debug, Error)] |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GdbTargetError enum visibility has been changed from pub(crate) to pub. This expands the public API surface area. Since this error is now part of public error chains (through SendDbgMsgError and RecvDbgMsgError which are public), this change is necessary but should be noted as an API expansion. Ensure this is intentional and documented if needed.
| #[derive(Debug, Error)] | |
| #[derive(Debug, Error)] | |
| /// Errors produced by the GDB server / target integration for Hyperlight. | |
| /// | |
| /// This type is part of the public error surface of `hyperlight_host`. It may | |
| /// appear as the source of other public error types (for example, | |
| /// `SendDbgMsgError` and `RecvDbgMsgError`) when interacting with a GDB | |
| /// debugger over the remote protocol. |
| #[error("Region is not page-aligned (page size: {0:#x})")] | ||
| NotPageAligned(usize), |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion of MapRegionError::NotPageAligned should include the region information in the error message. The old code logged "region is not page-aligned {:x}, {region:?}" with both page size and region details. The new error only captures page_size, losing the region information which could be valuable for debugging. Consider adding a region field to NotPageAligned or including it in the error message.
| match try_load_whv_map_gpa_range2() { | ||
| Ok(func) => func, | ||
| Err(e) => return Err(new_error!("Can't find API: {}", e)), | ||
| Err(e) => return Err(MapMemoryError::LoadApi(e)), |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling pattern changed from returning Err(new_error!(...)) to Err(MapMemoryError::LoadApi(e)). The old pattern would have wrapped the error in an anyhow error with a custom message. Verify that MapMemoryError::LoadApi(windows_result::Error) provides sufficient context about what API failed to load, or consider adding more context to the error message.
| .map_err(|e| { | ||
| new_error!( | ||
| "Failed to convert WHP registers to CommonRegisters: {:?}", | ||
| e | ||
| RegisterError::GetRegs( | ||
| crate::hypervisor::virtual_machine::HypervisorError::WindowsError( | ||
| windows_result::Error::new( | ||
| HRESULT::from_win32(160), // ERROR_BAD_ARGUMENTS | ||
| format!( | ||
| "Failed to convert WHP registers to CommonRegisters: {:?}", | ||
| e | ||
| ), | ||
| ), | ||
| ), | ||
| ) |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a hardcoded HRESULT value HRESULT::from_win32(160) (ERROR_BAD_ARGUMENTS) for conversion errors is fragile. The error code should be derived from the actual error condition or use a more appropriate Windows error code. Consider using E_INVALIDARG or handling the conversion error more generically without inventing error codes.
| /// Test that executing an OUT instruction with an invalid port causes an error and poisons the sandbox. | ||
| #[test] | ||
| fn guest_outb_with_invalid_port_poisons_sandbox() { | ||
| let mut sbox = new_uninit_rust().unwrap().evolve().unwrap(); | ||
|
|
||
| // Port 0x1234 is not a valid hyperlight port | ||
| let res = sbox.call::<()>("OutbWithPort", (0x1234_u32, 0_u32)); | ||
| assert!(res.is_err(), "Expected error from invalid OUT port"); | ||
|
|
||
| // The sandbox should be poisoned because the guest didn't complete normally | ||
| assert!( | ||
| sbox.poisoned(), | ||
| "Sandbox should be poisoned after invalid OUT" | ||
| ); | ||
| } |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test guest_outb_with_invalid_port_poisons_sandbox verifies that using an invalid OUT port poisons the sandbox. However, there's no test coverage for the new error types themselves, particularly for verifying that the error conversion logic in DispatchGuestCallError::promote() works correctly. Consider adding tests that validate the error conversion path and ensure that appropriate errors are mapped to the correct HyperlightError variants.
| #[error("Region is not page-aligned (page size: {0:#x})")] | ||
| NotPageAligned(usize), |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MapRegionError::NotPageAligned error message format changed from a detailed log with hex-formatted page size and debug-formatted region info to a simpler message with only the page size. Consider formatting the page size as hex (using {:#x}) to match the original format, as hex is more conventional for memory-related sizes and alignments.
| .inspect_err(|e| { | ||
| log::error!("Failed to add hw breakpoint: {:?}", e); | ||
|
|
||
| e | ||
| }) | ||
| .is_ok(), | ||
| )), | ||
| DebugMsg::AddSwBreakpoint(addr) => Ok(DebugResponse::AddSwBreakpoint( | ||
| self.add_sw_breakpoint(addr, mem_access) | ||
| .map_err(|e| { | ||
| .inspect_err(|e| { | ||
| log::error!("Failed to add sw breakpoint: {:?}", e); | ||
|
|
||
| e | ||
| }) | ||
| .is_ok(), | ||
| )), | ||
| DebugMsg::Continue => { | ||
| self.vm.set_single_step(false).map_err(|e| { | ||
| self.vm.set_single_step(false).inspect_err(|e| { | ||
| log::error!("Failed to continue execution: {:?}", e); | ||
|
|
||
| e | ||
| })?; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation uses .inspect_err() in multiple places (lines 1114, 1121, 1127, etc.) to log errors, but doesn't chain a ? operator after these calls. This is correct usage. However, the previous code used .map_err() which both logged and returned the error. The new pattern with .inspect_err() followed by ? is cleaner and separates concerns, but verify that the error handling behavior is equivalent - specifically that the original error is still propagated unchanged.
This pr adds a new types of Errors for all possible operations possible on struct
HyperlightVm. The main purpose for doing so if to more accurately being able to determine whether a sandbox should be poisoned, since any error that doesn't let the guest properly unwind should poison a sandbox. However, another benefit with this PR is that errors surfaced to the user will be much richer in information about what went wrong. Due to these errors having tree-like structure, they almost serve as a kind of backtrace. To accomplish this, new richer errors types have been added to all possible code paths taken by allHyperlightVm, and existing HyperlightErrors are entirely avoided in these codepaths (with a couple of small exceptions).To remain backwards compatible (with tests and not to break users relying on certain existing errors), some of these internal errors are surfaced as existing errors. For example, stack overflow errors will be turned into the existing top-level error
HyperlightError::StackOverflow, and when guest execution is canceleld by host, the old error will be returned similarly.This PR does not change any logic at all, it only modifies returned error types.
My eventual (far away) goal is to add richer errors to all hyperlight modules, and very carefully distinguish between internal errors (that are not supposed to happen) and other errors that indeed are supposed to be able to happen. All new errors in this PR are "internal errors" in that they should not be relied upon to adhere to semver, etc, and may change heavily between versions. That said, they are still
pubsince they are returned to users. The errors in this PR can be visualized like this:Spoiler warning
HyperlightVmError ├── DispatchGuestCall(DispatchGuestCallError) │ ├── ConvertRspPointer(String) │ ├── SetupRegs(RegisterError) ──► [A] │ └── Run(RunVmError) ──► [B] │ ├── Initialize(InitializeError) │ ├── ConvertPointer(String) │ ├── SetupRegs(RegisterError) ──► [A] │ └── Run(RunVmError) ──► [B] │ ├── Create(CreateHyperlightVmError) │ ├── NoHypervisorFound │ ├── Vm(VmError) ──► [C] │ ├── ConvertRspPointer(Box) │ ├── SendDbgMsg(SendDbgMsgError) ──► [D] [gdb] │ └── AddHwBreakpoint(DebugError) ──► [G] [gdb] │ ├── MapRegion(MapRegionError) │ ├── NotPageAligned { page_size, region } │ └── MapMemory(MapMemoryError) │ └── UnmapRegion(UnmapRegionError) ├── RegionNotFound(MemoryRegion) └── UnmapMemory(UnmapMemoryError)[A] RegisterError
├── GetRegs(HypervisorImplError)
├── SetRegs(HypervisorImplError)
├── GetFpu(HypervisorImplError)
├── SetFpu(HypervisorImplError)
├── GetSregs(HypervisorImplError)
├── SetSregs(HypervisorImplError)
├── GetDebugRegs(HypervisorImplError)
├── SetDebugRegs(HypervisorImplError)
├── GetXsave(HypervisorImplError)
└── XsaveSizeMismatch { expected, actual }
[B] RunVmError
├── StackOverflow
├── MemoryAccessViolation { addr, access_type, region_flags }
├── MmioReadUnmapped(u64)
├── MmioWriteUnmapped(u64)
├── ExecutionCancelledByHost
├── UnexpectedVmExit(String)
├── RunVcpu(RunVcpuError)
├── HandleIo(HandleIoError) ──► [E]
├── GetRegs(RegisterError) ──► [A] [trace_guest]
├── CheckStackGuard(Box)
├── DebugHandler(HandleDebugError) ──► [F] [gdb]
├── VcpuStopReason(VcpuStopReasonError) [gdb]
└── CrashdumpGeneration(Box) [crashdump]
[C] VmError
├── CreateVm(CreateVmError)
├── RunVcpu(RunVcpuError)
├── Register(RegisterError) ──► [A]
├── MapMemory(MapMemoryError)
├── UnmapMemory(UnmapMemoryError)
└── Debug(DebugError) ──► [G] [gdb]
[D] SendDbgMsgError [gdb]
├── DebugNotEnabled
└── SendFailed(GdbTargetError)
[E] HandleIoError
├── NoData
├── GetRegs(RegisterError) ──► [A] [mem_profile]
└── Outb(HandleOutbError)
├── StackOverflow
├── GuestAborted { code, message }
├── InvalidPort(String)
├── ReadLogData(String)
├── TraceFormat(String)
├── ReadHostFunctionCall(String)
├── LockFailed(&str, u32, String)
├── WriteHostFunctionResponse(String)
├── InvalidDebugPrintChar(u32)
└── MemProfile(String) [mem_profile]
[F] HandleDebugError [gdb]
├── DebugNotEnabled
├── SendMessage(SendDbgMsgError) ──► [D]
├── ReceiveMessage(RecvDbgMsgError)
│ ├── DebugNotEnabled
│ └── RecvFailed(GdbTargetError)
└── ProcessRequest(ProcessDebugRequestError)
├── DebugNotEnabled
├── TryLockError(&str, u32)
├── Vm(VmError) ──► [C]
├── Debug(DebugError) ──► [G]
├── SwBreakpointNotFound(u64)
├── ReadMemory(DebugMemoryAccessError)
└── WriteMemory(DebugMemoryAccessError)
[G] DebugError [gdb]
├── HwBreakpointNotFound(u64)
├── TooManyHwBreakpoints(usize)
├── Register(RegisterError) ──► [A]
├── TranslateGva(u64)
└── Intercept { enable, inner: HypervisorImplError }
Partially addresses #998 (fixes the bug, but doesn't update the errors everywhere)