visionOS XR module #109975
Conversation
dsnopek left a comment
Thanks!
I skimmed the changes to XRInterface and provided some notes on how these could maybe be unified with the way other platforms handle the same features. I'd personally prefer not to have virtual methods that exist only for a single platform, which is something we've managed to avoid up until now.
servers/xr/xr_interface.h
virtual Rect2i get_viewport_for_view(uint32_t p_view) {
	return Rect2i();
} /* get each view viewport rect, used for visionOS VR */
This seems like it may be similar to get_render_region() but needs to be different per view? If so, it might be better to add a p_view argument to get_render_region() (and provide compatibility methods in OpenXRInterfaceExtension), so that we have a single virtual method for this on all platforms, rather than get_viewport_for_view(), which is just for visionOS.
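For illustration, a rough sketch of what that unified per-view signature might look like (hypothetical and simplified, not code from this PR; the stub types below just make the sketch self-contained):

```cpp
#include <cstdint>

// Minimal stand-in so the sketch compiles on its own; the engine's real
// Rect2i and XRInterface are richer than this.
struct Rect2i {
	int x = 0, y = 0, width = 0, height = 0;
};

class XRInterfaceSketch {
public:
	// Existing-style method: a single render region shared by all views.
	virtual Rect2i get_render_region() { return Rect2i(); }

	// Possible per-view variant; the default falls back to the shared region,
	// so interfaces that don't need per-view regions stay unchanged.
	virtual Rect2i get_render_region_for_view(uint32_t p_view) {
		(void)p_view;
		return get_render_region();
	}

	virtual ~XRInterfaceSketch() = default;
};
```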
I'm going to work on this next. A question: conceptually, what's the difference between get_render_region() and get_render_target_size()?
get_render_target_size should return the size of the render target used. The assumption is that we use the same resolution for both eyes and we use a layered texture, one layer per eye.
By default, Godot will create this render target and render to each layer using the multiview extension (or equivalent). The XRInterface can, however, supply its own render target texture if it requires additional setup (and, since this is often part of a swapchain, update the destination currently in use on each frame).
We assume layers are always used because we rely on multiview; stereo rendering where a single-layer texture is used with regions for a side-by-side layout already seemed to be going the way of the dodo at the time. Does Apple Vision Pro expect this capability?
get_render_region optionally returns a region within that texture that we're rendering to, so we can do dynamic-resolution rendering without having to do expensive recreations of swapchains and such. This is generally only used if dynamic resolution is supported and enabled (through something like XR_META_recommended_layer_resolution).
If the Vision Pro just expects us to render to a fixed-resolution texture, I would only use get_render_target_size.
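As a loose illustration of that relationship (not engine code, names are made up): the render target size is the full allocation, and the render region is an optional sub-rectangle of it, for example when dynamic resolution is active.

```cpp
#include <algorithm>

// Illustrative only: the render region is a sub-rectangle of the render
// target, so dynamic resolution can shrink what is rendered without
// reallocating the (often swapchain-owned) texture.
void region_for_resolution_scale(int target_w, int target_h, float scale,
		int &region_w, int &region_h) {
	const float s = std::clamp(scale, 0.1f, 1.0f); // never exceed the allocation
	region_w = std::max(1, static_cast<int>(target_w * s));
	region_h = std::max(1, static_cast<int>(target_h * s));
}
```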
Thanks for the explanation, that makes sense.
Actually, I think the proper solution for visionOS would be moving what's currently in get_viewport_for_view() to get_render_region_for_view() as @dsnopek suggested, because the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).
> the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).
Hm, so the render region is bigger than the render target size? Or the other way around?
The render target size is supposed to be the pixel dimensions of the texture we're rendering to, and, generally, the render region is smaller. This is to allow rendering to just a subset of the texture - it's basically what you'd pass to glViewport() (in OpenGL) or vkCmdSetViewport() (in Vulkan) - I don't know the Metal equivalent. I'm not a rendering expert, but I don't know that it can make sense for the viewport (as used in the aforementioned OpenGL and Vulkan functions) to be bigger than the underlying texture?
Anyway, it would help to get some more clarification on what the Rect2i will be used for in this case.
I think part of the reason is that the region is not only used to set the viewports, but is also used in other parts of the pipeline, as viewport_size is passed at the end of draw_list_begin() to draw_graph.add_draw_list_begin() and _draw_list_start().
That's correct. The render region that is passed to the draw list is distinct from setting viewports, as it specifies the area that may be affected by any initial attachment operations, such as CLEAR, which will only affect the specified region. After that, viewports can be set to limit the area the draw calls can affect.
In Metal, that controls the renderTargetWidth and renderTargetHeight of the MTLRenderPassDescriptor. The width is extended from x=0 to the total width, since it is not a region.
And it is similar for D3D12 and Vulkan. For example, in Vulkan, it affects the render area of the subpass:
godot/drivers/vulkan/rendering_device_driver_vulkan.cpp
Lines 4689 to 4692 in b962b38
render_pass_begin.renderArea.offset.x = p_rect.position.x;
render_pass_begin.renderArea.offset.y = p_rect.position.y;
render_pass_begin.renderArea.extent.width = p_rect.size.x;
render_pass_begin.renderArea.extent.height = p_rect.size.y;
So, it's starting to sound like the get_viewport_for_view() is for something distinct from get_render_region().
Does anyone know if (and how) this concept exists on other platforms? I've skimmed the links provided above, and this seems related to VRS, but I don't think we need this data for VRS on other platforms.
In my opinion, if this is something totally unique to VisionOS, then this shouldn't be on XRInterface, and instead should be communicated directly to Metal via some back channel
@dsnopek Yeah, if this is a concept on visionOS only, it makes sense not to add it to xr_interface. Do you have any suggestions for the cleanest way to backchannel this? Move this method to visionos_xr_interface and access it directly from there?
I'm not sure exactly - this probably needs some input from the rendering team.
But I will say that, if possible, you'd probably want code from the visionOS platform reaching into the renderer, rather than renderer code reaching into the visionOS platform. That's generally the preferred data flow, but I'm not sure if that'd be possible, or what the rendering team would say.
Is the p_view argument really needed? If I understand correctly, you have different viewports for each eye only if you configure the CompositorLayer layout to .shared.
In app_visionos.swift, you have configured the CompositorLayer layout to .layered. After some testing, I found .layered is the only option that supports foveated rendering and runs without crashing in the current implementation.
I logged the viewport returned by this method. It always returns the same value for both eyes.
void set_head_pose_from_arkit(bool p_use_drawable);

public:
	static StringName name() { return "visionOS"; }
Why have this name() function? It doesn't seem to be any different from get_name(), other than not being virtual. If the goal is to avoid calling the virtual function, the StringName could just be put in a static variable?
The objective was to provide the find_interface helper function, but I agree this makes more sense as a static variable. I'll update it.
static StringName name() { return "visionOS"; }
static Ref<VisionOSXRInterface> find_interface() {
	return XRServer::get_singleton()->find_interface(name());
}
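For reference, one possible shape of the static-variable approach discussed above (purely illustrative; VisionOSXRInterface, StringName, Ref and XRServer are the engine's existing types, and the final naming and placement are up to the author):

```cpp
// Illustrative sketch only, not this PR's code.
class VisionOSXRInterface : public XRInterface {
public:
	// A function-local static avoids constructing a StringName before core
	// types are initialized, while still building it only once.
	static const StringName &interface_name() {
		static const StringName name("visionOS");
		return name;
	}

	static Ref<VisionOSXRInterface> find_interface() {
		return XRServer::get_singleton()->find_interface(interface_name());
	}
};
```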
BastiaanOlij left a comment
Hey Ricardo,
I finally had some proper time to look into this, though I lost a bunch of time getting everything set up and running, and I ran afoul of an issue with my Apple developer account (sigh), so I haven't been able to do a full end-to-end test yet.
I did finally go through all the code. I started by putting remarks on individual pieces of code but gave up on that, as the real issues are all joined together.
Also, this is a big brain dump after a long day of trying to figure things out, so there are probably plenty of things that need further feedback and some back and forth.
Overall I think we still have some ways to go. We need to find different ways to solve a few things.
As a ground rule, platform-dependent code should be kept in the platform. There is a lot of Vision Pro-specific code ending up in the renderer, and that won't ever be acceptable to merge. We kinda knew that already, so I'm probably stating the obvious.
We can get away with some constructive solutions on the Metal driver side, and around the XRInterface implementation as they are closely linked to the platform.
While we should first get things on the correct path and see if this is still the case, I also found that there are changes here that probably deserve to be in their own PR. Especially if we do need to make changes to certain base classes, they should be separated out so the relevant teams can review them more closely.
So I think there are three main areas we can focus on first: two relatively small ones and one big one.
HDR output
I have to admit that I'm not up to date with the work being done in this area, but I do know that @allenwp is working on various things related to output to HDR-capable displays, so he may be able to weigh in here if he has time.
The bottom line for me is that users know they are targeting the Vision Pro and can turn HDR on for the main viewport, and (possibly) preventing conversion to the wrong color space is a function of the display server.
This way we don't need an extra API on the XR interface nor pollute the rendering code with checks.
Sky depth changes
This one puzzled me. Does it really cause an issue if the depth is at zero? Again, similar thing: it should not have a Vision Pro override in this place.
We need to either expose it in some generic way, or maybe just set the depth for Multiview as it probably can't hurt for other devices.
I would also change the vertex shader to gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0); or something to that effect, instead of manipulating gl_FragDepth, or certain platforms will disable early-Z checks.
This is also a good example of something that should be in its own PR and clearly documented why we're making the change.
VRS/rasterization map
So, finally, the big one: this we need to drastically change. We can't have such big changes to the rendering server just for the Vision Pro.
The problem here is that this is all new to us, and we're still wrapping our heads around it, and don't really have the time to dedicate to it.
If I understand correctly the main difference is that when we're dealing with OpenGL/Vulkan, we render at a larger resolution which then gets lens distorted by the VR compositor and output to the device. To speed this up, we use VRS so we don't render every pixel in that large image, but duplicate the output of fragments the further from the center we get. Most mobile hardware does something smart by having tiles on the periphery cover larger areas and just upscale when writing to the image.
On the Vision Pro, our stored image isn't at the large resolution; I'm guessing it's either at the output resolution, or something slightly enlarged to pack in the tiled results? I don't fully get whether it's just applying lens distortion early, or whether it's just smartly packing the tiles so it needs less space. Doesn't really matter, I guess.
Then we have a virtual resolution, which would be the actual texture as we'd render it before lens distortion without any VRS applied. Each tile is the same resolution, but tiles rendered towards the outside cover a larger area than tiles closer to the center, and they are written into our output image.
If this assumption is close enough to the truth, I think we can rewrite this logic to make the rendering server completely ignorant of the virtual resolution. We don't need any new functions on the XR Interface, we toss away most of the changes on the renderer classes (maybe all), we just pretend that all that exists is the output resolution, because that is the resolution at which our images are being allocated.
The trick will be to make sure the driver knows when and what virtual resolution to apply, and with which rasterization map.
This is where Godot's RID system comes in handy, because the RID can point to a struct that has both the rasterization map and the virtual resolution. Now we can use this RID as the VRS RID, and have the VRS RID being set be the trigger for the driver to know additional stuff needs to be done. We probably do need to wrap this in a texture RID to make the rendering server happy, but that's no biggie.
Note that there are two modes (because Vulkan has two ways of doing density map VRS), one where you can set it per subpass, and one where it's set for the whole pass. We don't need a new field, we can just reuse the existing one.
Then in the driver, when the VRS RID is set, you can look up the rasterization map and virtual resolution and apply them. The whole rendering engine needs to be none the wiser that we're doing this extra logic, as long as the Metal driver understands it.
I think you don't even need the new VRS format; just use XR_VRS_TEXTURE_FORMAT_FRAGMENT_DENSITY_MAP and return the above-mentioned RID from get_vrs_texture.
Anyway, it's way too late and I'm probably overlooking things, but in broad strokes, I think this is a direction we should consider and talk more about, and the solution should have a lot less impact on existing code than it has now.
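A very rough sketch of that direction, just to make the shape of the idea concrete (hypothetical names and layout, not code from this PR):

```cpp
// Hypothetical sketch of the proposal above: the renderer only ever sees an
// opaque RID used as the VRS texture; the Metal driver resolves it to the
// rasterization rate map plus the virtual resolution and applies both.
struct FoveationState {
	void *rasterization_rate_map = nullptr; // e.g. an id<MTLRasterizationRateMap>
	int virtual_width = 0;                  // pre-distortion ("logical") size
	int virtual_height = 0;
};

// Driver-side pseudo-flow when beginning a render pass:
//   if (the VRS RID maps to a FoveationState entry) {
//       attach rasterization_rate_map to the pass descriptor;
//       treat virtual_width/virtual_height as the logical viewport size;
//   } else {
//       fall back to the regular VRS texture path;
//   }
```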
scene/3d/xr/xr_nodes.cpp
}

if (get_near() < 0.1) {
	warnings.push_back(RTR("XRCamera3D doesn't support a Near value lower than 0.1 on the visionOS platform. The scene won't be displayed."));
This seems arbitrary; people will hold things closer to their face than 10 cm and set this to a lower value on many platforms. Not sure if everyone should be bothered by a warning for this.
Does this break stuff on visionOS?
In VisionOSXRInterface::get_projection_for_view, the supported minimum near plane is retrieved with cp_layer_renderer_capabilities_supported_minimum_near_plane_distance. But here the magic number 0.1 appears.
I haven't found any documentation about the actual value returned by the layer capabilities.
The value 0.1 itself is fine, given the thickness of the Vision Pro headset is roughly 5 cm. But will the actual value change if a thinner model is released in the future?
Yeah, @huisedenanhai is right, we cannot have a zNear value lower than cp_layer_renderer_capabilities_supported_minimum_near_plane_distance on visionOS, the API refuses to render anything if you set it so. For now, I have removed the editor warning, and added a descriptive error message explaining the issue if it happens at runtime here
scene/3d/xr/xr_nodes.cpp
| warnings.push_back(RTR("XROrigin3D requires an XRCamera3D child node.")); | ||
| } | ||
|
|
||
| if (get_scale() != Vector3(1, 1, 1)) { |
This should possibly be in its own PR; this has trapped people before. I'd also add a reference to world_scale, which was introduced to allow scaling of this sort and properly scales the projection data.
PipelineCacheRD *pipeline = &shader_data->pipelines[sky_scene_state.view_count > 1 ? SKY_VERSION_BACKGROUND_MULTIVIEW : SKY_VERSION_BACKGROUND];
SkyVersion version;
if (sky_scene_state.view_count > 1) {
#if defined(VISIONOS_ENABLED)
We need a different way to solve this; having platform-dependent code in the renderer is a big no-no.
Adding a tiny depth is the solution to the blocky artifact I mentioned previously.
But I have to say, requiring the application to mark the full screen with a tiny depth is a very bad design in CompositorService. Why do we have to care about final frame depth even in full immersion mode? In full immersion mode, the compositor already knows every pixel except upper-limb passthrough will be overwritten by application output. Why can't the compositor work correctly when depth is zero?
Yeah, it's the way the system is designed, as an optimization to avoid doing reprojection on parts of the screen when nothing is being rendered. I have removed this from the PR, and we can figure out solutions for the skybox and/or semi-transparent content separately.
drivers/metal/metal_objects.mm
	}
}

#ifdef VISIONOS_ENABLED
Things like this, I would remove the #ifdef if we can, and instead just use:
if (subpass.rasterization_rate_map != nil) {
desc.rasterizationRateMap = subpass.rasterization_rate_map;
} else {
desc.renderTargetWidth = MAX((NSUInteger)MIN(render.render_area.position.x + render.render_area.size.width, fb.size.width), 1u);
desc.renderTargetHeight = MAX((NSUInteger)MIN(render.render_area.position.y + render.render_area.size.height, fb.size.height), 1u);
}
And just make sure subpass.rasterization_rate_map can't ever be set unless it's supported by the platform.
This way we make the code much cleaner and we just implement the features as part of the metal implementation, and just leave it up to the platforms supporting it to use these features.
Agreed. MTLRasterizationRateMap is supported on iOS 13+. It can be used for some interesting optimizations on mobile devices, for example reducing the scene rendering rate under a HUD. No need to mark it visionOS-only.
Thanks both. I ended up adopting @huisedenanhai's proposed solution of storing the rasterizationRateMap in MDFrameBuffer.
Thanks for the ping, @BastiaanOlij!
HDR Output
@rsanchezsaez HDR output in Godot will be using Apple's Extended Dynamic Range (EDR) paradigm, adapted to work well across all platforms. Main development is happening in #94496 and will likely be merged soon, as it's a priority for 4.6. @stuartcarnie put together a rough draft prototype of how this might work on macOS, and I have done a rebase or two that has resulted in the draft PR getting even more "hacky": #106814
Once we've merged the Windows HDR output, we will shift to making sure macOS is implemented correctly. We will be having a meeting on Sunday to firm up the public-facing part of the HDR output API. We're coordinating in RocketChat in the rendering chat channel and you're welcome to join if you'd like, but the focus will be specifically on the user-facing HDR API and ensuring it will be reasonable for all platforms.
Does visionOS use the same EDR and colour space paradigms as macOS and iOS, with the same APIs?
Hi @rsanchezsaez. I tried the code and it works great! I haven't looked into the implementation, but I do have some suggestions from a user's perspective.
Need some documentation/warning on transparent objects and background
When in mixed immersion mode, CompositorService requires a zero alpha channel if depth is zero, otherwise blocky artifacts occur. This poses some issues when setting up the background and transparent objects, as they don't write depth but provide a non-zero alpha value. This is what it looks like when opening an existing scene directly in mixed immersion mode before adjustment. Notice the blocky grey artifacts around the character, and that the transparent halo object is not displayed. That's because the background has a non-zero alpha value, and the halo object does not write depth. The following image shows how the character should look in a normal window.
Recently I got some experience with CompositorService by implementing a custom AR renderer with Metal and Objective-C, so I'm not surprised by the artifact. The solution for me is quite simple: adjust the background alpha to zero, force the halo object to be opaque, done. But this needs more clarification for normal users. Here is how my scene looks after the fix. I suspect correctly handling transparent object rendering for visionOS might be worth another PR; I only suggest some notes in the documentation after this PR gets merged.
Warning for near plane
The warning for the near plane does not correctly update in the editor after the parameter is adjusted. This is important: in the current implementation, NOTHING is shown in the view if the near plane is less than 0.1 m, which is quite awkward because the default value is 0.05 m.
Also, is it possible to clamp the near plane to 0.1 m automatically? That might be more intuitive than showing empty space and spamming the log with warnings.
In case it helps someone else, these were the settings I changed, based on the original demo used at the top of this PR, to fix the same problem. It's not exactly @huisedenanhai's solution and likely has unimportant additions, but I'm just confirming that changes like this would be needed to remove the blocky halo effect from Mixed immersion experiences. I'm not as familiar with the inner workings of CompositorService, but if something like this does make it into the documentation, direct instructions akin to the following could be helpful: world environment settings:
I've been playing around with visionOS development and identified a couple of potential errors, but they seem obscure and situational enough that I'm planning on waiting until after this PR is merged to make issues for them.
All in all, great work @rsanchezsaez! It's been amazing experimenting with visionOS experiences from this PR.
servers/rendering/rendering_server.h
VIEWPORT_VRS_DISABLED,
VIEWPORT_VRS_TEXTURE,
VIEWPORT_VRS_XR,
VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP,
The mode has been unified to VIEWPORT_VRS_XR, but the enum value itself has not been deleted.
doc/classes/RenderingServer.xml
	Variable rate shading texture is supplied by the primary [XRInterface]. Note that this may override the update mode.
</constant>
<constant name="VIEWPORT_VRS_MAX" value="3" enum="ViewportVRSMode">
<constant name="VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP" value="3" enum="ViewportVRSMode">
This enum value seems to be no longer used.
BIND_ENUM_CONSTANT(VIEWPORT_VRS_DISABLED);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_TEXTURE);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP);
The mode has been unified to VIEWPORT_VRS_XR, but the enum value itself has not been deleted.
platform/visionos/app_visionos.swift
let options: LayerRenderer.Capabilities.SupportedLayoutsOptions = foveationEnabled ? [.foveationEnabled] : []
let supportedLayouts = capabilities.supportedLayouts(options: options)
let layout: LayerRenderer.Layout = supportedLayouts.contains(.layered) ? .layered : .dedicated
Will there be a device that does not support layered mode?
I don't see the need to support layouts other than .layered.
Also, the current implementation crashes when I force the layout to be .dedicated.
Good point. .layered is the recommended mode, and new visionOS 26 features need this mode. Hence, I have changed this file to hardcode .layered.
virtual void initialize() = 0;
virtual void begin_frame(double frame_step) = 0;

virtual Error prepare_screen_for_drawing(DisplayServer::WindowID p_screen) { return OK; }
This method is a simple wrapper for RD::get_singleton()->screen_prepare_for_drawing(p_screen). It seems redundant to me; just call the inner method when needed. No need to add a virtual method to the base class.
You're right. Furthermore, I've moved this call to VisionOSXRInterface::post_draw_viewport() to avoid platform-specific code in the main rendering path.
	frag_color.rgb += interleaved_gradient_noise(gl_FragCoord.xy) * params.luminance_multiplier;
#endif
#ifdef WRITE_DEPTH
	gl_FragDepth = SKY_DEPTH;
No, don't adjust depth in the fragment shader. That disables the early-Z test, which is very inefficient when the user has a heavy sky shader.
If you want to apply a global depth offset, you can adjust the z coordinate in the vertex shader.
In the vertex shader, change the line
gl_Position = vec4(uv_interp, 0.0, 1.0);
to
gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0);
I have reverted this for now for simplicity, and we can discuss other solutions separately.
I have reviewed all the code. The implementation is good IMO. Many thanks to @rsanchezsaez. This PR uses some uncommon Apple device features, which has sparked some discussion.
MTLRasterizationRateMap
Rendering XR on Vision Pro is pretty similar to other devices. The Vision Pro gives you a texture array for color and a texture array for depth. That's all you need if you're not using foveated rendering.
The main difference is how Vision Pro handles foveated rendering using VRS (Variable Rate Shading). It uses MTLRasterizationRateMap, which is Apple's specific way of implementing VRS.
VRS comes in various forms on different platforms. Some devices provide a texture map to control the shading rate, but that only adjusts the shading rate; you still need to allocate a full-resolution buffer for pixel storage.
Apple takes this a step further. Why allocate extra pixels if they won't be rendered? Apple lets you have a distorted framebuffer, with the distortion controlled by MTLRasterizationRateMap. You can allocate a smaller, regular physical texture, where each row and column of this physical texture can have varying logical sizes on the screen. This is why a viewport can be bigger than the physical framebuffer size.
For example, let's say you want to render to a 5x5 screen, but you only expect high density in the center. You can allocate only a 3x3 texture, then stretch the border pixels to cover the whole screen.
This form of VRS reduces both computation cost and memory consumption.
This solution has its own limitation. Your physical texture is still a regular grid. The MTLRasterizationRateMap only controls the logical sizes of the columns and rows. Therefore, the shading rate configuration is not as flexible as a shading rate map.
This is NOT a new feature; it has been available since iOS 13, which I think is below Godot's current minimum supported iOS version. It is available on nearly all Apple devices, and it can actually have more uses than foveated rendering.
The current usage of MTLRasterizationRateMap seems hacky. It has no analog on other platforms, and RenderingDevice does not have an abstraction for it. @BastiaanOlij suggests combining the physical texture and MTLRasterizationRateMap into a single texture and hiding all these details behind the rendering device. I don't think this is a good idea: you WILL need explicit access to the MTLRasterizationRateMap to decode the distortion in shaders, especially when you want to do some screen-space effect.
If RenderingDevice does not provide a special construct for MTLRasterizationRateMap, I prefer the current implementation, which wraps MTLRasterizationRateMap as a special kind of texture. MTLRasterizationRateMap itself is not a texture; it has its own type and APIs to be accessed in Metal shaders. If you want first-class support for it in shaders, you would need to modify the whole shader translation stack to emit correct code, which is way too much work. Instead, you can pre-decode the MTLRasterizationRateMap into a LUT of size 2 x max(logical_width, logical_height) using a tiny Metal compute shader. The LUT stores the physical texture coordinate mapped from each logical coordinate, one row for each axis. Accessing it in a shader is then the same as reading a texture. I think this is a good enough abstraction given limited time and resources.
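To make the LUT idea concrete, here is an illustrative CPU-side sketch (the real thing would be a small Metal compute shader sampling the rate map; the per-axis rate inputs here are assumptions for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative only: build a 2 x max(w, h) lookup table that maps logical
// (pre-distortion) coordinates to physical texel coordinates, given per-column
// and per-row shading rates (1.0 = full rate, smaller = coarser).
std::vector<float> build_rate_map_lut(const std::vector<float> &col_rates,
		const std::vector<float> &row_rates) {
	const size_t n = std::max(col_rates.size(), row_rates.size());
	std::vector<float> lut(2 * n, 0.0f); // row 0: x mapping, row 1: y mapping

	float phys = 0.0f;
	for (size_t x = 0; x < col_rates.size(); ++x) {
		lut[x] = phys;        // physical x for logical column x
		phys += col_rates[x]; // coarse columns advance less than one texel
	}
	phys = 0.0f;
	for (size_t y = 0; y < row_rates.size(); ++y) {
		lut[n + y] = phys;    // physical y for logical row y
		phys += row_rates[y];
	}
	return lut;
}
```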
Depth Output
CompositorService requires non-zero depth for non-zero alpha pixels. The simplest way to address this is to add an extra full-screen pass that globally sets the depth to a non-zero value. However, this approach requires further discussion.
If you perform a Metal frame capture in Xcode, you'll find some extra compute dispatches at the end of the frame. I haven't reverse-engineered what it's actually doing, but judging by the intermediate texture outputs, it looks like some kind of pixel binning.
My guess: CompositorService groups zero depth pixels together to accelerate passthrough. Zero depth tile => full passthrough with no blending. This can explain the blocky artifacts when depth and alpha are not correctly configured.
If my guess is true, we can't simply mark the entire depth buffer as non-zero. We might need a more sophisticated approach that only marks the non-empty pixels to achieve the best performance.
That said, maybe we should fix the artifact first with an additional full-screen draw, and improve performance later?
Conclusion
This PR provides a solid foundation for visionOS XR rendering support. While it does introduce some new design challenges, these problems are inherent to the device. Therefore, I suggest we merge this PR first and address other problems in subsequent work. These problems are much easier to solve than I thought them to be. We do not need a separate follow-up PR to address them; I proposed some solutions to these problems, see my comments below.
do not rely on the user providing most of these settings (though enhancing this with auto detection features
based on the device we're running on would be cool). I'm mostly adding this as an example or base plate for
more advanced interfaces.
*/
Duplicated comment in modules/mobile_vr/mobile_vr_interface.h
@dsnopek You are right; I've focused too much on the handling of rendering. I think the biggest problem is the RenderingDevice: visionOS support cannot be implemented elegantly until the Metal rendering device is aware of MTLRasterizationRateMap. I concluded the workaround in this PR is fine given the current state of RenderingDevice. For the interface part, I feel the implementation can be adjusted to avoid modifying XRInterface entirely. As I pointed out in the related comment, the p_view argument for get_viewport is actually not required on visionOS. We should be able to use
To be honest, I don't think visionOS XR support can be added without VRS support. When foveated rendering is disabled, the rendering result looks too jaggy for a $3499 device, as CompositorService will only give you a texture with a relatively low resolution :(
I'll leave it up to @rsanchezsaez to give feedback on what is and isn't possible on Metal; I am not an expert on Metal. But we can't introduce an invasive API change just for one platform. It would create a massive burden for the maintainers of the rendering engine, many of whom don't even have access to a Vision Pro to test whether changes break things. We already have things in the engine that are causing us issues here, but which we can justify because they target architecture decisions prevalent within the industry. We must find a way to hide implementation details such as these either on the platform side, or find a really good way to abstract this. Metal in this case is the odd duck. If that introduces limitations in, say, post-processing, I can live with that. Longer term, there has been a plan to abstract out
I pushed a working implementation that removes the invasive API changes introduced in this PR. I have made a PR at rsanchezsaez#2. @dsnopek @BastiaanOlij, you can take a look and see if that resolves your concerns about the base class API changes.
I just found out the sky modification in this PR is NOT needed. All these modifications do is apply a small depth to the screen. The same thing can be achieved with a custom post-processing gdshader: you can draw a full-screen quad that adjusts depth following the tutorial. I put my working shader below in case anyone is interested.
```
shader_type spatial;
render_mode unshaded, fog_disabled, depth_draw_always, blend_mix;

#define DEPTH_BIAS 0.00000001

void vertex() {
	POSITION = vec4(VERTEX.xy, DEPTH_BIAS, 1.0);
}

void fragment() {
	ALBEDO = vec3(0, 0, 0);
	ALPHA = 0.0;
}
```
This is a more general solution. It also works for a background with a clear color, and it does not require any engine source changes. Now I think all problems introduced in this PR have a proper solution.
I pushed a PR (rsanchezsaez#2) that encapsulates MTLRasterizationRateMap as a special kind of VRS texture. All ad hoc methods and fields like
This is absolutely unnecessary, as users can already do the same thing with custom post-processing. The sky modification in this PR should be reverted.
There is another question, about HDR. I haven't looked at the HDR implementation in Godot, but in my experience, HDR is not a feature that should always be enabled. This PR forcefully enables HDR on visionOS. @allenwp, will there be a feature toggle for HDR? If so, visionOS should also respond to the user's HDR setting.
@BastiaanOlij @huisedenanhai @dsnopek @CCranney Thanks a lot for the in-depth reviews, for the comments detailing your experiments, and for all the details! I agree with the desire to have as little platform-specific code as possible in the common rendering path, and I also agree about trying not to have
@huisedenanhai Thanks for providing a PR with some of the suggested improvements!
I'll take a deeper look at all the provided details and code this coming week, and I'll strive to provide an updated PR soon. I'll also look into splitting smaller independent improvements, such as new warnings, into separate PRs.
@huisedenanhai Thanks for all your detailed experiments, all your comments, and for your PR. I looked at your patch, and it looks great! It's a very good approach to avoid adding visionOS-specific logic to the main render path, and to
However, I ran into a small problem. When running a visionOS immersive project with your patch in debug mode, the project hits breakpoints in this
I tried setting these spoofed flags to the fake texture wrapping the
but that didn't help. Do you know a good way of fixing this issue?
Oops, I forgot to check the debug build. I used
That's correct: HDR should be optional. It can be difficult for a game developer to author correctly balanced bright scenes with unbounded HDR output enabled, so I believe there is still significant value in SDR scenes, even in VR. In the future, we may add an option to
Additionally, I'm not sure, but will it be possible to use visionOS with HDR 2D disabled? If so, that's another reason that SDR is needed; HDR requires HDR 2D to be enabled, so if we want to support HDR 2D disabled, we need to also support SDR mode. BUT, I could imagine that after the new
(Also, three weeks ago we firmed up the public API for the HDR output PR, which can be found in #94496; viewing the docs changes of the commit is a nice way to see the public API.)
Apologies for the extended delay in updating this PR. I expect we'll be able to iterate on this PR more quickly from now on.
@huisedenanhai Thanks so much for your suggestions and your proposed solutions. I have pushed a new version of this PR adopting most of your proposed
@BastiaanOlij @dsnopek I agree with @huisedenanhai that
@BastiaanOlij With this push, I think we have addressed your big three areas of concern:
@CCranney Thank you for your experiments and the additional details from testing this code.
@allenwp Thanks for your thoughts on HDR. Metal content rendered on visionOS through Compositor Services must be HDR, so in a way HDR cannot be turned off. For now, I think it's OK to not do anything special in this PR, and just recommend enabling
I think I have addressed everybody's comments. If anybody feels their comment was not sufficiently addressed, please feel free to respond again, and I'll do my best to address it and/or iterate further.



PR co-authored by @huisedenanhai
Dear Godot community,
This PR is a follow-up to #105628 and #109974 (I encourage you to read the descriptions of those PRs to get the full context of this contribution).
This change adds a visionOS XR module to Godot, which allows building immersive experiences on visionOS.
This PR builds on top of #109974, so that one should be merged first.
As always, we have tried to align with Godot's quality, style, and engine direction, to minimize any disruption coming from these changes. We'll be more than happy to work with the community on iterating this patch.
cc @BastiaanOlij @stuartcarnie @bruvzg @clayjohn
Technical Discussion
I have developed and tested these changes using Xcode 26.
The visionOS XR module only supports the Mobile renderer for now; the Forward+ renderer is not supported.
To use the visionOS XR module you must set the new `application/app_role` export setting to `Immersive`. You can choose whether you want passthrough or not with the new `application/immersion_style` export setting.
Then, initialize the visionOS XR module in a script:
```
var interface = XRServer.find_interface("visionOS")
if interface and interface.initialize():
	var viewport : Viewport = get_viewport()
	viewport.use_xr = true
	viewport.vrs_mode = Viewport.VRS_XR
	viewport.use_hdr_2d = true
```
Implementation details:
- The visionOS platform now has two different execution paths, implemented by the `GodotWindowScene` and `CompositorServicesImmersiveSpace` scenes in `app_visionos.swift`. The `application/app_role` export setting controls which scene is used.
- The visionOS XR interface tries to be as close to the OpenXR interface as possible, to keep main renderer code changes to a minimum. It adopts the Compositor Services and ARKit APIs, which is how you render Metal content on visionOS.
- `XRInterface` has these new methods to support the platform: `get_viewports_are_hdr()` (which avoids the sRGB conversion step in the tonemapper) and `get_viewport_for_view()` (which provides individual viewports for each eye, which are larger than the render area when foveation is on).
- We obtain and set the head pose twice: once in `process()` so scripts can use it if needed, and again in `pre_render()`, so the pose is more accurate for rendering.
- The projection matrices returned by visionOS have an inverse depth correction applied (visionOS uses the [0, 1] z space, but Godot expects the [-1, 1] z space until the rendering step).
- The `rasterizationRateMap` (the structure that supports foveation on visionOS) is provided through the `get_vrs_texture()` function, using the new `XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP` texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
- The Metal driver now needs to be able to set several viewports (one for each eye), so the draw list instructions `TYPE_SET_VIEWPORT` and `TYPE_SET_SCISSOR` have been replaced by `TYPE_SET_VIEWPORTS` and `TYPE_SET_SCISSORS`. If you pass a single viewport to the functions, they work as they used to.
- The skybox shader has a new `SKY_VERSION_BACKGROUND_MULTIVIEW_WRITE_DEPTH` variant, because visionOS in immersive mode needs to write appropriate depth values.
- Apple Vision Pro's minimum supported near plane is `0.1`. There's a new editor warning emitted by `XRCamera3D` if you set it to a lower value.
- `XRCamera3D` also emits a warning if you change the FOV, as this has no effect on XR platforms; they use the platform's FOV.
- `XROrigin3D` emits a new editor warning if you change the scale, as it causes the rendered depth texture values to be incongruent with what XR platforms expect to perform reprojection (this applies to OpenXR platforms as well).
- The Metal driver has a new dummy `SurfaceCompositorServices`, which replaces `SurfaceLayer` when running in immersive mode. The reason for this is that the Compositor Services API needs to do a `cp_drawable_encode_present()` step with the `MTLCommandBuffer` used by the renderer, and this seemed the most natural way of overriding the `present()` call normally done by the Metal driver. If anybody has a better suggestion, we'd be happy to explore that, and we could completely remove the `SurfaceCompositorServices` class.
Testing
We have been testing this PR in windowed mode with the Platformer demo project. We have verified the project continues to work on iOS and visionOS, both with the Mobile and Forward+ renderers using the Metal rendering driver.
I'm also including a minimal example of a project rendering an immersive scene on visionOS. Currently the example has no interactivity, but you can move your head and look around the scene.
It looks like this:
ScreenRecording_08-25-2025.16-14-23_1_out.mp4
Missing Functionality
Next steps
We're working on hand input and PSVR2 controller support next. Once that's done, we believe porting any existing VR games to visionOS would be relatively easy.