[wasm-split] Remove unnecessary global exports #8832

Merged
aheejin merged 13 commits into
WebAssembly:main from
aheejin:wasm_split_global_transitive_fix
Jun 16, 2026

Conversation

@aheejin

@aheejin aheejin commented Jun 11, 2026

Member

Globals and tables can have initializers that can contain other globals. Currently we just scan them as uses. For example, if global $g is used both in the primary and the secondary and its initializer is (global.get $h), $h is also marked as "used" in both modules.

We move a module item to a secondary module only when that item is used exclusively by that module. So if a global is used in both the primary and the secondary, it stays in the primary and is exported to the secondary.

But in the current code, because $g is marked as used in both modules, its initializer is walked in both modules, so $h is also marked as used in both. Because $g doesn't move to the secondary and is only imported there, the secondary doesn't need $h. But because $h is marked as "used", the secondary module imports it unnecessarily. The multi-split case is similar.

The same applies to table initializers. The difference between the two is that a global initializer can refer to another global, which has its own initializer, so we need a worklist to compute the transitive closure.

This PR fixes that by figuring out who the "owner" is for each global and table, and marking its initializer's dependencies as "used" in a secondary module only when that module is the sole user. Otherwise they are marked as "used" in the primary.

This does not meaningfully change computation time and reduces the primary module size by around 0.3% for new acx_gallery and essentials and by around 1% for old acx_gallery.
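
The ownership rule described above can be illustrated with a small standalone sketch. This is not the code in this PR: `UsedNames` here is a simplified stand-in that tracks only globals, `InitDeps` and `scanDirectInits` are hypothetical names, and the sketch handles only one level of initializer dependencies (no worklist, so no transitive closure).

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the per-module name sets; the real
// structure also tracks tables, memories, and tags.
struct UsedNames {
  std::set<std::string> globals;
};

// Hypothetical model of initializers: global name -> globals its init reads.
using InitDeps = std::map<std::string, std::vector<std::string>>;

// Returns the set of the module that owns `name`: the single secondary module
// that uses it, the primary if the primary or several secondaries use it, or
// nullptr if no module uses it.
UsedNames* getOwner(const std::string& name,
                    UsedNames& primaryUsed,
                    std::vector<UsedNames>& secondaryUsed) {
  if (primaryUsed.globals.count(name)) {
    return &primaryUsed;
  }
  UsedNames* owner = nullptr;
  for (auto& used : secondaryUsed) {
    if (used.globals.count(name)) {
      if (owner) {
        return &primaryUsed; // shared by two secondaries: stays in primary
      }
      owner = &used;
    }
  }
  return owner;
}

// Marks each used global's direct initializer dependencies as used in the
// owning module only. One level deep; chains would need a worklist.
void scanDirectInits(const InitDeps& inits,
                     UsedNames& primaryUsed,
                     std::vector<UsedNames>& secondaryUsed) {
  // Decide all owners from the direct uses first, so that the insertions
  // below cannot influence which module owns a global.
  std::vector<std::pair<UsedNames*, std::string>> pending;
  for (const auto& entry : inits) {
    UsedNames* owner = getOwner(entry.first, primaryUsed, secondaryUsed);
    if (!owner) {
      continue;
    }
    for (const auto& dep : entry.second) {
      pending.emplace_back(owner, dep);
    }
  }
  for (auto& item : pending) {
    item.first->globals.insert(item.second);
  }
}
```

With `g` used in both the primary and one secondary and `InitDeps{{"g", {"h"}}}`, `h` ends up only in the primary's set, so the secondary would no longer import it.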

@aheejin aheejin requested a review from tlively June 11, 2026 17:44
@aheejin aheejin requested a review from a team as a code owner June 11, 2026 17:44
Comment thread src/ir/module-splitting.cpp
Comment thread src/ir/module-splitting.cpp Outdated
Comment on lines +844 to +846
// Scan table initializers into their owning modules. If a table is used by a
// single secondary module, its initializer dependencies belong to that
// secondary module. Otherwise, they belong to the primary module.

Member

Does this handle the case where the same global is used both in a moved table initializer and in some other location that prevents it from being moved? It looks like the code might handle this, but the comment suggests it does not.

It would be good to add tests for this kind of case if we don't have them already.

@aheejin aheejin Jun 12, 2026

Member Author

It looks like I can't create that test, though:

// Check that no module-defined globals are referenced.
for (auto* get : FindAll<GlobalGet>(table->init).list) {
  auto* global = module.getGlobalOrNull(get->name);
  info.shouldBeTrue(
    global && global->imported(),
    table->init,
    "table initializer may not refer to module-defined globals");
}

But yeah, because UsedNames::globals is managed separately, if a global is used in two different places it will be pinned to the primary module. I'll rephrase the comment: b490f2a

Member

Oh right, I guess only imported globals can be referenced from table initializers.

Comment thread src/ir/module-splitting.cpp Outdated
Comment thread src/ir/module-splitting.cpp Outdated
Comment on lines +872 to +878
if (UsedNames* owner = getOwner(name, &UsedNames::globals)) {
  for (auto* get : FindAll<GlobalGet>(global->init).list) {
    if (owner->globals.insert(get->name).second) {
      worklist.push(get->name);
    }
  }
}

Member

It looks like a name may end up in the primaryUsed map as well as a secondaryUsed map. This can happen if the visitation order is such that we think a secondary module owns the name until later we discover that there is another use so the primary module should own the name. Is this a problem? Should we ensure that each name only ends up in at most one map?

Member Author

You're right. I changed the scanning algorithm and added a test case: 407072c

In the new test case here, $g1 was unnecessarily exported to the secondary module in the previous code, but not after this commit.

@tlively tlively left a comment

Member

As a follow-up it might be worth investigating replacing the secondaryUsed vector of used name sets with a map from Name to vector of using modules (or maybe just a single using module, where the primary module is considered the user if there are ever two different using modules). Then there would be no need for the loop over secondaryUsed in getOwner.

@aheejin

aheejin commented Jun 16, 2026

Member Author

> As a follow-up it might be worth investigating replacing the secondaryUsed vector of used name sets with a map from Name to vector of using modules (or maybe just a single using module, where the primary module is considered the user if there are ever two different using modules). Then there would be no need for the loop over secondaryUsed in getOwner.

Not sure if I understand. primaryUsed is a UsedNames, and secondaryUsed is a vector of UsedNames. UsedNames is created per module, and contains sets of used global/table/memory/tag names. And we compute the results of getOwner based on these UsedNames. How can we replace UsedNames with it? Can we directly create the results of getOwner, bypassing UsedNames collection above?

@aheejin aheejin merged commit 389d044 into WebAssembly:main Jun 16, 2026
16 checks passed
@aheejin aheejin deleted the wasm_split_global_transitive_fix branch June 16, 2026 00:42
@tlively

tlively commented Jun 16, 2026

Member

Yeah, I'm thinking that we could collect the UsedNames for each module in parallel (excluding global initializers), but then collect them directly into this map from names to owning modules. Then we could walk the table and global initializers just like the current code does, but use this map to efficiently look up the existing owner for each global instead of having to separately look for uses in each secondary module.

@aheejin

aheejin commented Jun 16, 2026

Member Author

> Yeah, I'm thinking that we could collect the UsedNames for each module in parallel (excluding global initializers),

We are already using ParallelFunctionAnalysis within a module:

// Given a module, collect names used in the module
auto scanModule = [&](Module& module) {
  UsedNames used;
  ModuleUtils::ParallelFunctionAnalysis<UsedNames> nameCollector(
    module, [&](Function* func, UsedNames& used) {
      if (!func->imported()) {
        NameCollector(used).walk(func->body);
      }
    });

You want to parallelize module scanning inter-module on top of that? (How do you usually do that in Binaryen? ThreadPool?)

> but then collect them directly into this map from names to owning modules.

I'm not sure if I understand what "collect directly into this map" means. We need to collect UsedNames first. Do you mean pre-compute getOwner for all names and save them into a map, so that we don't recompute later? (But note that the result of getOwner changes, for example, after we compute the transitive global closures, the results of getOwner on those involved globals will change. But I guess we can update the precomputed map too.)

@tlively

tlively commented Jun 16, 2026

Member

I'm not suggesting that anything change about the parallelism.

Here's pseudocode for what I have in mind:

owningModules = {}
for m in modules:
  for name in usedNames[m]:
    if owningModules.contains(name):
      owningModules[name] = primary
    else:
      owningModules[name] = m

for g in globals:
  gOwner = owningModules[g]
  for dep in g.deps:
    if owningModules.contains(dep) and owningModules[dep] != gOwner:
      owningModules[dep] = primary
    else:
      owningModules[dep] = gOwner
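
For illustration, the pseudocode above could be rendered in C++ roughly as follows. This is only a sketch under stated assumptions, not Binaryen code: modules are plain indices with index 0 standing for the primary, `OwnerMap`, `InitDeps`, `addUser`, and `computeOwners` are hypothetical names, and a worklist is added to the second loop so that chains of initializers are re-visited when an owner changes, which the pseudocode leaves implicit.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Assumption for this sketch: module index 0 is the primary module.
constexpr int Primary = 0;

// Name -> index of the owning module.
using OwnerMap = std::map<std::string, int>;
// Hypothetical model of initializers: global name -> globals its init reads.
using InitDeps = std::map<std::string, std::vector<std::string>>;

// Records that module `m` needs `name`. The first user becomes the owner; a
// second, different user moves ownership to the primary. Returns true if the
// recorded owner changed.
bool addUser(OwnerMap& owners, const std::string& name, int m) {
  auto inserted = owners.emplace(name, m);
  if (inserted.second) {
    return true;
  }
  int& owner = inserted.first->second;
  if (owner == m || owner == Primary) {
    return false;
  }
  owner = Primary;
  return true;
}

OwnerMap computeOwners(const std::vector<std::set<std::string>>& usedNames,
                       const InitDeps& inits) {
  OwnerMap owners;
  std::queue<std::string> worklist;
  // First loop of the pseudocode: direct uses collected per module.
  for (int m = 0; m < static_cast<int>(usedNames.size()); ++m) {
    for (const auto& name : usedNames[m]) {
      if (addUser(owners, name, m)) {
        worklist.push(name);
      }
    }
  }
  // Second loop: a global's initializer dependencies are needed by its owner.
  // A name is re-queued whenever its owner changes, so the result does not
  // depend on visitation order.
  while (!worklist.empty()) {
    std::string name = worklist.front();
    worklist.pop();
    auto it = inits.find(name);
    if (it == inits.end()) {
      continue;
    }
    int owner = owners.at(name);
    for (const auto& dep : it->second) {
      if (addUser(owners, dep, owner)) {
        worklist.push(dep);
      }
    }
  }
  return owners;
}
```

Ownership only ever moves from "no owner" to a single module and then to the primary, so each name is re-queued at most twice and the loop terminates. A lookup in `owners` then replaces the scan over every secondary module's name set.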


2 participants