Conversation

Contributor
@rsanchezsaez rsanchezsaez commented Aug 26, 2025

PR co-authored by @huisedenanhai

Dear Godot community,

This PR is a follow-up to #105628 and #109974 (I encourage you to read the descriptions of those PRs to get the full context of this contribution).

This change adds a visionOS XR module to Godot, which allows building immersive experiences on visionOS.

This PR builds on top of #109974, so that one should be merged first.

As always, we have tried to align with Godot's quality, style, and engine direction, to minimize any disruption coming from these changes. We'll be more than happy to work with the community on iterating this patch.

cc @BastiaanOlij @stuartcarnie @bruvzg @clayjohn

Technical Discussion

I have developed and tested these changes using Xcode 26.

The visionOS XR module only supports the Mobile renderer for now; the Forward+ renderer is not supported.

To use the visionOS XR module you must set the new application/app_role export setting to Immersive. You can choose whether or not you want passthrough with the new application/immersion_style export setting.

Then, initialize the visionOS XR module in a script:

    var interface = XRServer.find_interface("visionOS")
    if interface and interface.initialize():
        var viewport : Viewport = get_viewport()
        viewport.use_xr = true
        viewport.vrs_mode = Viewport.VRS_XR

Implementation details:

  • The visionOS platform now has two different execution paths implemented by the GodotWindowScene and CompositorServicesImmersiveSpace scenes in app_visionos.swift. The application/app_role export setting controls which scene is used.
  • The visionOS VR interface tries to be as close to the OpenXR interface as possible, to keep main renderer code changes to a minimum. It adopts Compositor Services and ARKit APIs, which is how you render Metal content on visionOS.
  • XRInterface has these new methods to support the platform: get_viewports_are_hdr(), which avoids the sRGB conversion step in the tonemapper, and get_viewport_for_view(), which provides individual viewports for each eye that are larger than the render area when foveation is enabled.
  • We obtain and set the head pose twice: once in process() so scripts can use it if needed, and again in pre_render(), so the pose is more accurate for rendering.
  • The projection matrices returned by visionOS have an inverse depth correction applied (visionOS uses the [0, 1] z space, but Godot expects the [-1, 1] z space until the rendering step); see the sketch after this list.
  • The rasterizationRateMap (the structure that supports foveation on visionOS) is provided through the get_vrs_texture() function, using the new XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
  • The Metal driver now needs to be able to set several viewports (one for each eye), so the DrawListInstructions TYPE_SET_VIEWPORT and TYPE_SET_SCISSOR have been replaced by TYPE_SET_VIEWPORTS and TYPE_SET_SCISSORS. If you pass a single viewport to the functions, they behave as before.
  • The skybox shader has a new SKY_VERSION_BACKGROUND_MULTIVIEW_WRITE_DEPTH variant, because visionOS in immersive mode needs to write appropriate depth values.
  • Apple Vision Pro's minimum supported near plane is 0.1. There's a new editor warning emitted by XRCamera3D if you set it to a lower value.
  • XROrigin3D emits a new editor warning if you change the scale, as it makes the rendered depth texture values inconsistent with what XR platforms expect when performing reprojection (this applies to OpenXR platforms as well).
  • The Metal driver has a new dummy SurfaceCompositorServices, which replaces SurfaceLayer when running in immersive mode. The reason for this is that the Compositor Services API needs to do a cp_drawable_encode_present() step with the MTLCommandBuffer used by the renderer, and this seemed the most natural way of overriding the present() call normally done by the Metal driver. If anybody has a better suggestion, we'd be happy to explore that, and we could completely remove the SurfaceCompositorServices class.
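As a side note on the depth-correction bullet above: the remap from the [0, 1] z range to [-1, 1] amounts to rewriting the third row of the projection matrix. The function below is only an illustrative sketch of that math on plain row-major 4x4 arrays, not the PR's actual code:

static void remap_depth_01_to_m11(const float p_in[4][4], float r_out[4][4]) {
	for (int c = 0; c < 4; c++) {
		r_out[0][c] = p_in[0][c];
		r_out[1][c] = p_in[1][c];
		// New z row: 2 * z_row - w_row, so after the perspective divide the
		// depth goes from [0, 1] to 2 * z - 1, i.e. [-1, 1].
		r_out[2][c] = 2.0f * p_in[2][c] - p_in[3][c];
		r_out[3][c] = p_in[3][c];
	}
}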

Testing

We have been testing this PR in windowed mode with the Platformer demo project. We have verified the project continues to work on iOS and visionOS, both with the Mobile and Forward+ renderers using the Metal rendering driver.

I'm also including a minimal example of a project rendering an immersive scene on visionOS. Currently the example has no interactivity, but you can move your head and look around the scene.

It looks like this:

ScreenRecording_08-25-2025.16-14-23_1_out.mp4

Missing Functionality

  • Hand input and PSVR2 controller support have not been implemented yet. Only game controller input is supported.
  • High-quality video captures, as detailed here, are not currently supported. We are preparing a follow-up PR to support this.

Next steps

We're working on hand input and PSVR2 controller support next. Once that's done, we believe porting any existing VR games to visionOS would be relatively easy.

Contributor

@dsnopek dsnopek left a comment

Thanks!

I skimmed the changes to XRInterface and provided some notes on how these could maybe be unified with the way other platforms handle the same features. I'd personally prefer not to have virtual methods that are only for a single platform, which is something we've managed to avoid up until now.

Comment on lines 144 to 147
virtual Rect2i get_viewport_for_view(uint32_t p_view) {
return Rect2i();
} /* get each view viewport rect, used for visionOS VR */
Contributor

This seems like it may be like get_render_region(), but needs to be different per view? If so, it might be better to add a p_view argument to get_render_region() (and provide compatibility methods in OpenXRInterfaceExtension), so that we have a single virtual method for this for all platforms, rather than get_viewport_for_view(), which is just for visionOS.

Contributor Author

I'm going to work on this next. A question: conceptually, what's the difference between get_render_region() and get_render_target_size()?

Contributor

@BastiaanOlij BastiaanOlij Aug 27, 2025

get_render_target_size should return the size of the render target used. The assumption is that we use the same resolution for both eyes and we use a layered texture, one layer per eye.

By default Godot will create this render target and render to each layer using the multiview extension (or equivalent); the XRInterface can however supply its own render target texture if it requires additional setup (and since this is often part of a swapchain, it updates each frame to the destination currently in use).

We assume layers are always used due to relying on multiview; stereo rendering where a single-layer texture is used alongside regions for a side-by-side layout already seemed to be going the way of the dodo at the time. Does Apple Vision Pro expect this capability?

get_render_region optionally returns a region within that texture that we're using so we can create dynamic resolution rendering without having to do expensive recreations of swapchains and such. This is generally only used if dynamic resolution is supported and enabled (through something like XR_META_recommended_layer_resolution).

If VisionPro just expects us to render to a fixed resolution texture, I would only use get_render_target_size.
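To make the distinction concrete, here is a small illustration with example numbers (these are made up, not Vision Pro's real values):

// Illustration only: the render target is the texture Godot allocates (one
// layer per eye via multiview); the render region is the sub-rectangle rendered
// this frame when dynamic resolution shrinks the effective area.
struct ExampleSizes {
	int target_width = 2064, target_height = 2208; // full per-eye allocation
	int region_x = 0, region_y = 0; // region offset within the target
	int region_width = 1548, region_height = 1656; // ~75% dynamic resolution this frame
};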

Contributor Author

@rsanchezsaez rsanchezsaez Aug 27, 2025

Thanks for the explanation, that makes sense.

Actually, I think the proper solution for visionOS would be moving what's currently in get_viewport_for_view() to get_render_region_for_view() as @dsnopek suggested, because the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).

Contributor

@dsnopek dsnopek Aug 27, 2025

the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).

Hm, so the render region is bigger than the render target size? Or the other way around?

The render target size is supposed to be the pixel dimensions of the texture we're rendering to, and, generally, the render region is smaller. This is to allow rendering to just a subset of the texture - it's basically what you'd pass to glViewport() (in OpenGL) or vkCmdSetViewport() (in Vulkan) - I don't know the Metal equivalent. I'm not a rendering expert, but I don't know that it can make sense for the viewport (as used in the aforementioned OpenGL and Vulkan functions) to be bigger than the underlying texture?

Anyway, it would be helpful to get some more clarification on what the Rect2i will be used for in this case.
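For reference, this is roughly what the Vulkan call mentioned above looks like; the viewport is the rectangle draw calls are mapped into, and it normally lies within the bound attachments (the values here are arbitrary examples):

#include <vulkan/vulkan.h>

// Example only: set a viewport covering a sub-region of a larger attachment.
void set_subregion_viewport(VkCommandBuffer p_cmd) {
	VkViewport viewport = {};
	viewport.x = 0.0f;
	viewport.y = 0.0f;
	viewport.width = 1548.0f; // smaller than the attachment's full width
	viewport.height = 1656.0f;
	viewport.minDepth = 0.0f;
	viewport.maxDepth = 1.0f;
	vkCmdSetViewport(p_cmd, 0, 1, &viewport);
}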

Contributor

@stuartcarnie stuartcarnie Aug 29, 2025

I think part of the reason is that the region is not only used to set the viewports, but it's also used in other parts of the pipeline as viewport_size is passed at the end of draw_list_begin() to draw_graph.add_draw_list_begin() and _draw_list_start()

That's correct. The render region that is passed to the draw list is distinct from setting viewports, as it specifies the area that may be affected by any initial attachment operations, such as CLEAR, which will only affect the specified region. After that, viewports can be set to limit the area the draw calls can affect.

In Metal, that controls the renderTargetWidth and renderTargetHeight of the MTLRenderPassDescriptor. The width is extended from x=0 to the total width, since it is not a region.

And it is similar for D3D12 and Vulkan. For example, in Vulkan, it affects the render area of the subpass:

render_pass_begin.renderArea.offset.x = p_rect.position.x;
render_pass_begin.renderArea.offset.y = p_rect.position.y;
render_pass_begin.renderArea.extent.width = p_rect.size.x;
render_pass_begin.renderArea.extent.height = p_rect.size.y;

Contributor

So, it's starting to sound like the get_viewport_for_view() is for something distinct from get_render_region().

Does anyone know if (and how) this concept exists on other platforms? I've skimmed the links provided above, and this seems related to VRS, but I don't think we need this data for VRS on other platforms.

In my opinion, if this is something totally unique to VisionOS, then this shouldn't be on XRInterface, and instead should be communicated directly to Metal via some back channel

Contributor Author

@dsnopek Yeah, if this is a concept on visionOS only, it makes sense not to add it to xr_interface. Do you have any suggestions for the cleanest way of backchanneling this? Move this method to visionos_xr_interface and access it directly from there?

Contributor

I'm not sure exactly - this probably needs some input from the rendering team.

But I will say that, if possible, you'd probably want code from the visionos platform reaching into the renderer, rather than renderer code reaching into the visionos platform. That's generally the preferred data flow, but I'm not sure if that'd be possible, or what the rendering team would say

Contributor

Is the p_view argument really needed? If I understand correctly, you only have different viewports for each eye if you configure the CompositorLayer layout to .shared.

In app_visionos.swift, you have configured the CompositorLayer layout to .layered. After some testing, I found .layered is the only option that supports foveated rendering and runs without crashing in the current implementation.

I logged the viewport returned by this method. It always returns the same value for both eyes.

@rsanchezsaez rsanchezsaez requested a review from a team as a code owner August 27, 2025 02:09
@rsanchezsaez rsanchezsaez changed the title visionOS VR module visionOS XR module Sep 17, 2025
@rsanchezsaez rsanchezsaez requested review from a team as code owners October 2, 2025 22:33
@rsanchezsaez rsanchezsaez force-pushed the apple/visionos-xr branch 2 times, most recently from a0451d2 to 94f2dc0 on October 4, 2025 01:59
void set_head_pose_from_arkit(bool p_use_drawable);

public:
static StringName name() { return "visionOS"; }
Contributor

Why have this name() function? It doesn't seem to be any different from get_name(), other than not being virtual. If the goal is to avoid calling the virtual function, the StringName could just be put in a static variable?

Contributor Author

The objective was to provide the find_interface helper function, but I agree this makes more sense as a static variable. I'll update it.

	static StringName name() { return "visionOS"; }
	static Ref<VisionOSXRInterface> find_interface() {
		return XRServer::get_singleton()->find_interface(name());
	}


Contributor

@BastiaanOlij BastiaanOlij left a comment

Hey Ricardo,

I finally had some proper time to look into this, though I lost a bunch of time getting everything set up and running, and ran afoul of an issue with my Apple developer account (sigh), so I haven't been able to do a full end-to-end test yet.

I did finally go through all the code. I started just putting some remarks on the code, but gave up on that as the real stuff is all joined together.
Also, this is a big brain dump after a long day of trying to figure things out, so there are probably plenty of things that need further feedback and throwing back and forth.

Overall I think we still have some ways to go. We need to find different ways to solve a few things.
As a ground rule, platform-dependent code should be kept to the platform. There is a lot of Vision Pro-specific code ending up in the renderer, and that won't ever be acceptable to merge. We kinda knew that already, so I'm probably stating the obvious.
We can get away with some constructive solutions on the Metal driver side, and around the XRInterface implementation as they are closely linked to the platform.

While we should first get things on the correct path and see if this is still the case, I also found that there are changes here that probably deserve to be in their own PR. Especially if we do need to make changes to certain base classes, they should be separated out so the relevant teams can review them more closely.

So I think there are 3 main areas that we can focus on first: 2 relatively small ones, and one big one.

HDR output

I have to admit that I'm not up to date with the work being done in this area, but I do know that @allenwp is working on various things related to output to HDR-capable displays, so he may be able to weigh in here if he has time.

Bottom line for me is that users know they are targeting Vision Pro and can turn HDR on for the main viewport, and (possibly) preventing conversion to the wrong color space is a function of the display server.

This way we don't need an extra API on the XR interface nor pollute the rendering code with checks.

Sky depth changes

This one puzzled me. Does it really cause an issue if the depth is at zero? Again, similar thing: it should not have a Vision Pro overrule in this place.
We need to either expose it in some generic way, or maybe just set the depth for multiview, as it probably can't hurt for other devices.
I would also change the vertex shader to gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0); or something to that effect, instead of manipulating gl_FragDepth, or certain platforms will disable early-z checks.

This is also a good example of something that should be in its own PR and clearly documented why we're making the change.

VRS/rasterization map

So finally the biggy, this one we need to drastically change. We can't have such big changes to the rendering server just for VisionPro.

The problem here is that this is all new to us, and we're still wrapping our heads around it, and don't really have the time to dedicate to it.

If I understand correctly the main difference is that when we're dealing with OpenGL/Vulkan, we render at a larger resolution which then gets lens distorted by the VR compositor and output to the device. To speed this up, we use VRS so we don't render every pixel in that large image, but duplicate the output of fragments the further from the center we get. Most mobile hardware does something smart by having tiles on the periphery cover larger areas and just upscale when writing to the image.

On Vision Pro, our stored image isn't at the large resolution; I'm guessing it's either at the output resolution, or something slightly enlarged to pack in the tiled results? I don't fully get whether it's just applying lens distortion early, or whether it's just smartly packing the tiles so it needs less space. Doesn't really matter, I guess.
Then we have a virtual resolution, which would be the actual texture as we'd render it before lens distortion, without any VRS applied. Each tile is the same resolution, but tiles rendered towards the outside cover a larger area than tiles closer to the center, and they are written into our output image.

If this assumption is close enough to the truth, I think we can rewrite this logic to make the rendering server completely ignorant of the virtual resolution. We don't need any new functions on the XR Interface, we toss away most of the changes on the renderer classes (maybe all), we just pretend that all that exists is the output resolution, because that is the resolution at which our images are being allocated.

The trick will be to make sure the driver knows when and what virtual resolution to apply, and with which rasterization map.

This is where Godot's RID system comes in handy, because the RID can point to a struct that has both the rasterization map and the virtual resolution. Now we can use this RID as the VRS RID, and have the VRS RID being set be the trigger for the driver to know additional stuff needs to be done. We probably do need to wrap this in a texture RID to make the rendering server happy, but that's no biggy.

Note that there are two modes (because Vulkan has two ways of doing density map VRS), one where you can set it per subpass, and one where it's set for the whole pass. We don't need a new field, we can just reuse the existing one.
Then in the driver, when the VRS RID is set, you can look up the rasterization map and virtual resolution and apply them. The whole rendering engine needs to be none the wiser that we're doing this extra logic, as long as the Metal driver understands it.

I think you don't even need the new VRS format, just use XR_VRS_TEXTURE_FORMAT_FRAGMENT_DENSITY_MAP and return the above mentioned RID with get_vrs_texture.

Anyway, it's way too late, I'm probably overlooking things, but in broad strokes, I think this is a direction we should consider and talk more about, and the solution should have a lot less impact on existing code than it has now.
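A minimal sketch of the struct-behind-an-RID idea described above, with made-up names (none of this exists in the engine; it only illustrates the shape of the data the driver would look up):

// Hypothetical driver-side bookkeeping, wrapped behind a texture RID that the
// XR interface would return from get_vrs_texture(). When the Metal driver sees
// this RID set as the VRS attachment, it looks the struct up and applies both fields.
struct FoveationInfo {
	void *rasterization_rate_map = nullptr; // MTLRasterizationRateMap, kept opaque here
	int virtual_width = 0; // logical (pre-distortion) resolution the map expands to
	int virtual_height = 0;
};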

}

if (get_near() < 0.1) {
warnings.push_back(RTR("XRCamera3D doesn't support a Near value lower than 0.1 on the visionOS platform. The scene won't be displayed."));
Contributor

This seems arbitrary; people will hold things closer to their face than 10 cm and set this to a lower value on many platforms. Not sure if everyone should be bothered by a warning for this.

Does this break stuff on visionOS?

Contributor

In VisionOSXRInterface::get_projection_for_view, the supported minimum near plane is retrieved with cp_layer_renderer_capabilities_supported_minimum_near_plane_distance. But here the magic number 0.1 occurs.

I haven't found any documentation about the actual value returned by the layer capabilities.

The value 0.1 itself is fine, given the thickness of the VP headset is roughly 5 cm. But will the actual value change if a thinner model is released in the future?

Contributor Author

Yeah, @huisedenanhai is right, we cannot have a zNear value lower than cp_layer_renderer_capabilities_supported_minimum_near_plane_distance on visionOS; the API refuses to render anything if you set it lower. For now, I have removed the editor warning, and added a descriptive error message explaining the issue if it happens at runtime here
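Roughly what such a runtime check could look like (hypothetical wording and variable names, not the PR's actual code):

// Sketch only: refuse to build a projection with a near plane below the minimum
// reported by Compositor Services, and tell the user how to fix it.
ERR_FAIL_COND_V_MSG(p_z_near < min_near_plane, Projection(),
		vformat("visionOS cannot render with a near plane below %.2f. Increase the Near value of your XRCamera3D.", min_near_plane));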

warnings.push_back(RTR("XROrigin3D requires an XRCamera3D child node."));
}

if (get_scale() != Vector3(1, 1, 1)) {
Contributor

This should possibly be in its own PR, as this has trapped people before. I'd also add a reference to world_scale, which was introduced to allow scaling of this sort and properly scales the projection data.

PipelineCacheRD *pipeline = &shader_data->pipelines[sky_scene_state.view_count > 1 ? SKY_VERSION_BACKGROUND_MULTIVIEW : SKY_VERSION_BACKGROUND];
SkyVersion version;
if (sky_scene_state.view_count > 1) {
#if defined(VISIONOS_ENABLED)
Contributor

We need a different way to solve this; having platform-dependent code in the renderer is a big no-no.

Contributor

Adding a tiny depth is the solution to the blocky artifact I mentioned previously.
But I have to say, requiring the application to mark the full screen with a tiny depth is a very bad design in CompositorService. Why do we have to care about the final frame depth even in full immersion mode? In full immersion mode, the compositor already knows every pixel except upper-limb passthrough will be overwritten by the application output. Why can't the compositor work correctly when depth is zero?

Contributor Author

Yeah, it's the way the system is designed, as an optimization to avoid doing reprojection on parts of the screen when nothing is being rendered. I have removed this from the PR, and we can figure out solutions for the skybox and/or semi-transparent content separately.

}
}

#ifdef VISIONOS_ENABLED
Contributor

Things like this, I would remove the #ifdef if we can, and instead just use:

	if (subpass.rasterization_rate_map != nil) {
		desc.rasterizationRateMap = subpass.rasterization_rate_map;
	} else {
		desc.renderTargetWidth = MAX((NSUInteger)MIN(render.render_area.position.x + render.render_area.size.width, fb.size.width), 1u);
		desc.renderTargetHeight = MAX((NSUInteger)MIN(render.render_area.position.y + render.render_area.size.height, fb.size.height), 1u);
	}

And just make sure subpass.rasterization_rate_map can't ever be set unless it's supported by the platform.
This way we make the code much cleaner: we just implement the features as part of the Metal implementation, and leave it up to the platforms supporting them to use these features.

Contributor

Agree. MTLRasterizationRateMap is supported on iOS 13+. It can be used for some interesting optimizations on mobile devices, for example reducing the scene rendering rate under a HUD. No need to mark it visionOS-only.

Contributor Author

Thanks both. I ended up adopting @huisedenanhai's proposed solution of storing the rasterizationRateMap in MDFrameBuffer.

@allenwp
Contributor

allenwp commented Oct 14, 2025

Thanks for the ping, @BastiaanOlij!

HDR Output

@rsanchezsaez HDR output in Godot will be using Apple's Extended Dynamic Range (EDR) paradigm, adapted to work well across all platforms. Main development is happening in #94496 and will likely be merged soon, as it's a priority for 4.6. @stuartcarnie put together a rough draft prototype of how this might work on Mac OS, and I have done a rebase or two that has resulted in the draft PR getting even more "hacky": #106814. Once we've merged the Windows HDR output, we will shift to making sure Mac OS is implemented correctly.

We will be having a meeting on Sunday to firm up the public-facing part of the HDR output API. We're coordinating in RocketChat in the rendering chat channel and you're welcome to join if you'd like, but the focus will be specifically on the user-facing HDR API and ensuring it will be reasonable for all platforms.

Does VisionOS use the same EDR and colour space paradigms as Mac OS and iOS with the same APIs?

@huisedenanhai
Contributor

Hi @rsanchezsaez. I tried the code and it works great! I haven't looked into the implementation, but I do have some suggestions from a user's perspective.

Need some documentation/warning on transparent objects and background

When in mixed immersion mode, CompositorService requires a zero alpha channel if depth is zero; otherwise blocky artifacts occur.

This poses some issues when setting up backgrounds and transparent objects, as they don't write depth, but provide a non-zero alpha value.

This is what it looks like when opening an existing scene directly in mixed immersion mode, before adjustment.

IMG_0078

Notice the blocky grey artifacts around the character, and that the transparent halo object is not displayed. That's because the background has a non-zero alpha value, and the halo object does not write depth. The following image shows how the character should look in a normal window.

Screenshot 2025-10-18 at 16 25 14

Recently I got some experience with CompositorService by implementing a custom AR renderer with Metal and Objective-C, so I'm not surprised by the artifact. The solution for me is quite simple: adjust the background alpha to zero, force the halo object to be opaque, done. But this needs more clarification for normal users.

Here is how my scene looks after the fix.

IMG_0079

I suspect correctly handling transparent object rendering for visionOS might be worth another PR. I only suggest adding some notes to the documentation after this PR gets merged.

Warning for near plane

The near plane warning does not correctly update in the editor after the parameter is adjusted. This is important: in the current implementation, NOTHING is shown in the view if the near plane is less than 0.1 m. That's quite awkward, because the default value is 0.05 m.

Screenshot 2025-10-18 at 16 40 59

Also, is it possible to clamp the near plane to 0.1 m automatically? That might be more intuitive than showing an empty space and spamming the log with warnings.
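If clamping is preferred over failing, something along these lines inside the interface could work (again a hypothetical sketch, not code from this PR):

// Sketch only: clamp the requested near plane to the platform minimum instead of
// rendering nothing, and warn once so the user knows why the value changed.
double effective_near = MAX(p_z_near, min_near_plane);
if (effective_near != p_z_near) {
	WARN_PRINT_ONCE("Near plane clamped to the minimum supported by visionOS.");
}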

@CCranney

The solution for me is quite simple: adjust the background alpha to zero, force the halo object to be opaque, done. But this needs more clarification for normal users.

In case it helps someone else, these were the settings I changed, based on the original demo used at the top of this PR, to fix the same problem. It's not exactly @huisedenanhai's solution and likely has unimportant additions, but I'm just confirming that changes like this would be needed to remove the blocky halo effect from Mixed immersion experiences. I'm not as familiar with the inner workings of CompositorService, but if something like this does make it into the documentation, direct instructions akin to the following could be helpful:

world environment settings:
Background -> Mode: Clear Color
Ambient Light -> Source: Color (make it grey?)
Glow: Off (?)
Fog: On (?)
Fog -> Density: 0.0015
Fog -> Sky Affect: 0.0

I've been playing around with visionOS development and identified a couple of potential errors, but they seem obscure and situational enough that I'm planning on waiting until after this PR is merged to file issues for them. All in all, great work @rsanchezsaez! It's been amazing experimenting with visionOS experiences from this PR.

VIEWPORT_VRS_DISABLED,
VIEWPORT_VRS_TEXTURE,
VIEWPORT_VRS_XR,
VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP,
Contributor

The enum has been unified to VIEWPORT_VRS_XR. But the enum itself has not been deleted.

Variable rate shading texture is supplied by the primary [XRInterface]. Note that this may override the update mode.
</constant>
<constant name="VIEWPORT_VRS_MAX" value="3" enum="ViewportVRSMode">
<constant name="VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP" value="3" enum="ViewportVRSMode">
Contributor

This enum value seems to be no longer used.

BIND_ENUM_CONSTANT(VIEWPORT_VRS_DISABLED);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_TEXTURE);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP);
Contributor

The enum has been unified to VIEWPORT_VRS_XR. But the enum itself has not been deleted.


let options: LayerRenderer.Capabilities.SupportedLayoutsOptions = foveationEnabled ? [.foveationEnabled] : []
let supportedLayouts = capabilities.supportedLayouts(options: options)
let layout: LayerRenderer.Layout = supportedLayouts.contains(.layered) ? .layered : .dedicated
Contributor

Will there be a device that does not support layered mode?

I don't see the need to support any layout other than .layered.

Also, the current implementation crashes when I force the layout to be .dedicated.

Contributor Author

Good point. .layered is the recommended mode, and new visionOS 26 features need this mode. Hence, I have changed this file to hardcode .layered.


virtual void initialize() = 0;
virtual void begin_frame(double frame_step) = 0;

virtual Error prepare_screen_for_drawing(DisplayServer::WindowID p_screen) { return OK; }
Contributor

This method is a simple wrapper for RD::get_singleton()->screen_prepare_for_drawing(p_screen). It seems redundant to me; just call the inner method when needed. No need to add a virtual method to the base class.

Contributor Author

You're right. Furthermore, I've moved this call to VisionOSXRInterface::post_draw_viewport(), to avoid platform-specific code in the main rendering path.


frag_color.rgb += interleaved_gradient_noise(gl_FragCoord.xy) * params.luminance_multiplier;
#endif
#ifdef WRITE_DEPTH
gl_FragDepth = SKY_DEPTH;
Contributor

No. Don't adjust depth in the fragment shader. That disables the early-Z test, which is very inefficient when the user has a heavy sky shader.

If you want to apply a global depth offset, you can adjust the z coordinate in the vertex shader.
In the vertex shader, change the line

	gl_Position = vec4(uv_interp, 0.0, 1.0);

to

	gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0);

Contributor Author

I have reverted this for now for simplicity, and we can discuss other solutions separately.

Contributor

@huisedenanhai huisedenanhai left a comment

I have reviewed all the code. The implementation is good IMO. Great thanks to @rsanchezsaez. This PR uses some uncommon Apple device features, which have sparked some discussion.

MTLRasterizationRateMap

Rendering XR on Vision Pro is pretty similar to other devices. The Vision Pro gives you a texture array for color and a texture array for depth. That's all you need if you're not using foveated rendering.

The main difference is how Vision Pro handles foveated rendering using VRS (Variable Rate Shading). It uses MTLRasterizationRateMap, which is Apple's specific way of implementing VRS.

VRS comes in various forms on different platforms. Some devices provide a texture map to control the shading rate, but that only adjusts the shading rate; you still need to allocate a full-resolution buffer for pixel storage.

Apple takes this a step further. Why allocate extra pixels if they won't be rendered? Apple lets you have a distorted framebuffer, with the distortion controlled by MTLRasterizationRateMap. You can allocate a smaller, regular physical texture, where each row and column of this physical texture can have varying logical sizes on the screen. This is why a viewport can be bigger than the physical framebuffer size.

For example, let's say you want to render to a 5x5 screen, but you only expect high density in the center. You can allocate only a 3x3 texture, then stretch the border pixels to cover the whole screen.

Image

This form of VRS reduces both computation cost and memory consumption.

This solution has its own limitations. Your physical texture is still a regular grid, and the MTLRasterizationRateMap only controls the logical sizes of the columns and rows. Therefore, the shading rate configuration is not as flexible as a shading rate map.

This is NOT a new feature; it has been available since iOS 13, which I think is below Godot's current minimum supported iOS version. It is available on nearly all Apple devices, and it actually has more uses than foveated rendering.

The current usage of MTLRasterizationRateMap seems hacky. It has no analog on other platforms, and RenderingDevice does not have an abstraction for it. @BastiaanOlij suggests combining the physical texture and the MTLRasterizationRateMap as a single texture and hiding all these details behind the rendering device. I don't think this is a good idea. You WILL need explicit access to the MTLRasterizationRateMap to decode the distortion in shaders, especially when you want to do some screen-space effects.

If RenderingDevice does not provide a special construct for MTLRasterizationRateMap, I prefer the current implementation, which wraps MTLRasterizationRateMap as a special kind of texture. MTLRasterizationRateMap itself is not a texture; it has its own type and APIs to be accessed in Metal shaders. If you want first-class support for it in shaders, you will need to modify the whole shader translation stack to emit correct code. That is way too much work. Instead, you can pre-decode the MTLRasterizationRateMap into a LUT, with a size of 2 x max(logical_width, logical_height), using a tiny Metal compute shader. The LUT stores the mapped physical texture coordinate for each logical coordinate, one row for each of the x and y axes. Accessing it in a shader will be the same as reading a texture. I think this will be a good enough abstraction given limited time and resources.
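A rough sketch of that LUT idea, with made-up names, just to show the shape of the data (the actual decode would happen in a small Metal compute shader, as described above):

#include <cstdint>
#include <vector>

// Hypothetical LUT layout: one entry per logical column/row, storing the
// physical texel coordinate it maps to in the smaller, distorted framebuffer.
struct RateMapLUT {
	std::vector<float> logical_to_physical_x; // size == logical width
	std::vector<float> logical_to_physical_y; // size == logical height
};

// A screen-space effect can then remap a logical pixel to the physical texel
// that was actually shaded, instead of assuming a regular grid.
static inline void remap(const RateMapLUT &p_lut, uint32_t p_lx, uint32_t p_ly, float &r_px, float &r_py) {
	r_px = p_lut.logical_to_physical_x[p_lx];
	r_py = p_lut.logical_to_physical_y[p_ly];
}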

Depth Output

CompositorService requires non-zero depth for non-zero alpha pixels. The simplest way to address this is to add an extra full-screen pass that globally sets the depth to a non-zero value. However, this approach requires further discussion.

If you perform a Metal frame capture in Xcode, you'll find some extra compute dispatches at the end of the frame. I haven't reverse-engineered what it's actually doing, but judging by the intermediate texture outputs, it looks like some kind of pixel binning.

My guess: CompositorService groups zero depth pixels together to accelerate passthrough. Zero depth tile => full passthrough with no blending. This can explain the blocky artifacts when depth and alpha are not correctly configured.

If my guess is true, we can't simply mark the entire depth buffer as non-zero. We might need a more sophisticated approach that only marks the non-empty pixels to achieve the best performance.

That said, maybe we should fix the artifact first with an additional full-screen draw, and improve performance later?

Conclusion

This PR provides a solid foundation for visionOS XR rendering support. While it does introduce some new design challenges, these problems are inherent to the device. Therefore, I suggest we merge this PR first and address other problems in subsequent work. These problems are much easier to solve than I thought. We do not need a separate follow-up PR to address them. I have proposed some solutions to these problems; see my comments below.

do not rely on the user providing most of these settings (though enhancing this with auto detection features
based on the device we're running on would be cool). I'm mostly adding this as an example or base plate for
more advanced interfaces.
*/
Contributor

Duplicated comment in modules/mobile_vr/mobile_vr_interface.h

@huisedenanhai
Contributor

@dsnopek You are right. I've focused too much on the handling of rendering.

I think the biggest problem is the RenderingDevice. visionOS support cannot be implemented elegantly until the Metal rendering device is aware of MTLRasterizationRateMap. I concluded the workaround in this PR is fine given the current state of RenderingDevice.

For the interface part, I feel the implementation can be adjusted to avoid modifying XRInterface entirely. As I pointed out in the related comment, the p_view argument for get_viewport is actually not required on visionOS. We should be able to use get_region instead of the newly introduced hacky method get_viewport_for_view, as long as the RenderingDevice is happy with a render region potentially larger than the render target size.

get_viewports_are_hdr can also be added after the HDR implementation on other platforms is stabilized.

To be honest, I don't think visionOS XR support can be added without VRS support. When foveated rendering is disabled, the rendering result looks too jaggy for a $3499 device, as the CompositorService will only give you a texture with a relatively low resolution :(

@BastiaanOlij
Contributor

I'll leave it up to @rsanchezsaez to give feedback on what is and isn't possible on Metal; I am not an expert on Metal.

But we can't introduce an invasive API change just for one platform. It will create a massive burden for maintainers of the rendering engine, many of whom don't even have access to a Vision Pro to test whether changes break things.

We already have things in the engine that are causing us issues here, but which we can justify because they target architecture decisions prevalent within the industry.

We must find a way to hide implementation details such as these on the platform side, or find a really good way to abstract this. Metal in this case is the odd duck. If that introduces limitations in, say, post-processing, I can live with that.

Longer term, there has been a plan to abstract out RendererSceneRenderRD or maybe even RendererSceneRender, so we can create a 3D render implementation specific to a platform or a use case. But that isn't in the cards just yet.

@huisedenanhai
Contributor

I pushed a working implementation that removes the invasive API changes introduced in this PR. I have made a PR at rsanchezsaez#2. @dsnopek @BastiaanOlij you can take a look and see if that resolves your concerns about the base class API changes.

@huisedenanhai
Contributor

huisedenanhai commented Oct 25, 2025

I just found out the sky modification in this PR is NOT needed. All these modifications do is apply a small depth to the screen. The same thing can be achieved with a custom post-processing gdshader: you can draw a full-screen quad that adjusts depth, following the tutorial. I put my working shader below in case anyone is interested.

shader_type spatial;
render_mode unshaded, fog_disabled, depth_draw_always, blend_mix;

#define DEPTH_BIAS 0.00000001

void vertex() {
	POSITION = vec4(VERTEX.xy, DEPTH_BIAS, 1.0);
}

void fragment() {
	ALBEDO = vec3(0,0,0);
	ALPHA = 0.0;
}

This is a more general solution. It also works for backgrounds with a clear color, and it does not require any engine source changes.

Now I think all problems introduced in this PR have a proper solution.

  1. Fields/APIs introduced specifically for MTLRasterizationRateMap in forward mobile renderer

I pushed a PR rsanchezsaez#2 that encapsulates MTLRasterizationRateMap as a special kind of VRS texture. All ad hoc methods and fields like get_viewport_for_view, xr_viewports, rasterization_rate_map can be removed.

  2. Sky shader variant introduced for depth write

This is absolutely unnecessary, as users can already do the same thing with custom post-processing. The sky modifications in this PR should be reverted.

@huisedenanhai
Contributor

There is another question, which is about HDR. I haven't looked at the HDR implementation in Godot, but in my experience, HDR is not a feature that should always be enabled. This PR forcefully enables HDR on visionOS.

@allenwp Will there be a feature toggle for HDR? If so, visionOS should also respond to the user's HDR setting.

@rsanchezsaez
Contributor Author

@BastiaanOlij @huisedenanhai @dsnopek @CCranney Thanks a lot for the in-depth reviews, for the comments detailing your experiments, and for all the details! I agree with the desire to have as little platform-specific code as possible in the common rendering path, and I also agree about trying not to have XRInterface API changes that only work for a single platform.

@huisedenanhai Thanks for providing a PR with some of the suggested improvements!

I'll take a deeper look at all the provided details and code this coming week, and I'll strive to provide an updated PR soon. I'll also look into splitting smaller independent improvements such as new warnings into separate PRs.

@rsanchezsaez
Contributor Author

rsanchezsaez commented Oct 31, 2025

@huisedenanhai Thanks for all your detailed experiments, all your comments, and for your PR. I looked at your patch, and it looks great! It's a very good approach to avoid adding visionOS specific logic to the main render path, and to XRInterface.

However, I ran into a small problem. When running a visionOS immersive project with your patch in debug mode, the project hits this DEV_ASSERT, and the p_usage parameter is RESOURCE_USAGE_NONE. When running in release, everything seems to work correctly, because I assume DEV_ASSERT is disabled.

bool RenderingDeviceGraph::_is_write_usage(ResourceUsage p_usage) {
	switch (p_usage) {
		// ...
		default:
			DEV_ASSERT(false && "Invalid resource tracker usage.");
			return false;
	}
}

I tried setting these spoofed flags to the fake texture wrapping the rasterizationRateMap, by mimicking the flags that are set by the OpenXR module:

texture.usage_flags = RD::TEXTURE_USAGE_COLOR_ATTACHMENT_BIT | RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_STORAGE_BIT | RD::TEXTURE_USAGE_VRS_ATTACHMENT_BIT;

but that didn't help.

Do you know a good way of fixing this issue?

Screenshot 2025-10-30 at 9 50

@huisedenanhai
Contributor

Oops, I forgot to check the debug build. I used RESOURCE_USAGE_NONE for the rasterization rate map, and the render graph does not like it. The assert can be fixed by adding RESOURCE_USAGE_ATTACHMENT_RASTERIZATION_RATE_MAP_READ. I have updated the PR and confirmed the debug build works. @rsanchezsaez

@huisedenanhai
Contributor

MSAA does not work in the current implementation. The fix depends on another PR, #78598.

I put a fix in my draft PR #112439 in case anyone needs it. The proper fix needs to wait for #78598.

@allenwp
Contributor

allenwp commented Nov 18, 2025

There is another question, which is about HDR. I haven't looked at the HDR implementation in Godot, but in my experience, HDR is not a feature that should always be enabled. This PR forcefully enables HDR on visionOS.

@allenwp Will there be a feature toggle for HDR? If so, visionOS should also respond to the user's HDR setting.

That's correct: HDR should be optional. It can be difficult for a game developer to author correctly balanced bright scenes with unbounded HDR output enabled, so I believe there is still significant value to SDR scenes, even in VR. In the future, we may add an option to Environment that will allow the user to limit the max output to their choice of a value, which could be as low as 1.0 (equivalent to SDR with the HDR 2D setting enabled).

Additionally, I'm not sure, but will it be possible to use visionOS with HDR 2D disabled? If so, that's another reason that SDR is needed; HDR requires HDR 2D to be enabled, so if we want to support HDR 2D disabled, we need to also support SDR mode.

BUT, I could imagine that after the new Environment parameter for limiting the output range is merged, it's possible that SDR would no longer be absolutely necessary, as the user could use this to get something that is effectively SDR within HDR. I'm not sure about this idea, due to the paradigm being inconsistent with how HDR/SDR works on other platforms.

(Also, three weeks ago we firmed up the public API for the HDR output PR, which can be found in #94496; viewing the docs changes of the commit is a nice way to see the public API.)

@rsanchezsaez
Contributor Author

rsanchezsaez commented Dec 8, 2025

Apologies for the extended delay in updating this PR. I expect we'll be able to iterate on this PR in a quicker manner from now on.

@huisedenanhai Thanks so much for your suggestions and your proposed solutions. I have pushed a new version of this PR adopting most of your proposed rasterizationRateMap handling. I have also listed you as a co-author in the description at the top of the PR.

@BastiaanOlij @dsnopek I agree with @huisedenanhai that rasterizationRateMap is a useful feature for all Apple platforms, and hence it's to Godot's benefit to have some explicit official support for it as part of the renderer. I think the solution of supplying it wrapped in an additional fake-texture attachment (with added logic to inform the renderer to ignore this texture for texture operations), and storing it in the MDFramebuffer object, is reasonable. Let me know if you disagree, and I'm happy to continue iterating on this (cc @clayjohn as well).

@BastiaanOlij With this push, I think we have addressed your big three areas of concern:

  • HDR output changes have been reverted, and now we recommend enabling it in the XR initialization script.
  • Sky shader changes have been reverted, and we can come up with a separate solution for skybox/transparent content. Perhaps we can also note this possible solution (#109975 (comment)) in the documentation.
  • rasterizationRateMap handling is now cleaner, and all XRInterface changes have been reverted.

@CCranney Thank you for your experiments and additional details in testing this code.

@allenwp Thanks for your thoughts on HDR. Metal content rendered on visionOS through Compositor Services must be HDR, so in a way HDR cannot be turned off. For now, I think it's OK to not do anything special in this PR, and just recommend enabling viewport.use_hdr_2d = true in the init script. If, after the HDR APIs for Apple platforms stabilize, there is a different preferred solution for visionOS, we can address it separately.

I think I have addressed everybody's comments. If anybody feels their comment was not sufficiently addressed, please feel free to respond again, and I'll do my best to address it and/or iterate further.

…ing() call from renderer_viewport

Instead, call it on VisionOSXRInterface::post_draw_viewport()
