
feat: F3 e2e lifecycle #1469

Open
karlem wants to merge 58 commits into main from f3-lifecycle

Conversation


@karlem karlem commented Oct 28, 2025

Closes #1441 and #1442


Note: High Risk
High risk because it changes consensus-critical top-down finality initialization/execution paths, updates on-chain actor state schema for the F3 light client, and alters genesis creation to pull/derive F3 parameters from a parent chain.

Overview
Adds an end-to-end F3 proof-based top-down finality path: the new app::service::topdown bootstraps either the legacy vote-based topdown or F3 (with a persistent proof cache, a proof-service background task, and retry-on-cache-miss execution config), and node startup is refactored to perform this initialization before and after App::new().

Reworks the f3-light-client actor to store the power table as a HAMT rooted by power_table_root, track processed_instance_id with monotonicity checks, and materialize the table on GetState; updates associated types/tests and adds shared fendermint_vm_evm_event_utils for decoding EVM event proofs.
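The monotonicity check on processed_instance_id described above might look like the following minimal sketch. All names here are illustrative assumptions; the actual f3-light-client actor state and error types differ.

```rust
// Hypothetical sketch of the processed_instance_id monotonicity guard.
// `LightClientState` and `UpdateError` are illustrative, not the actor's real types.
#[derive(Debug, PartialEq)]
enum UpdateError {
    NonMonotonicInstance { current: u64, incoming: u64 },
}

struct LightClientState {
    processed_instance_id: u64,
}

impl LightClientState {
    /// Accept a new F3 instance only if it strictly advances the cursor,
    /// rejecting replays of already-processed instances.
    fn advance_instance(&mut self, incoming: u64) -> Result<(), UpdateError> {
        if incoming <= self.processed_instance_id {
            return Err(UpdateError::NonMonotonicInstance {
                current: self.processed_instance_id,
                incoming,
            });
        }
        self.processed_instance_id = incoming;
        Ok(())
    }
}
```

A strict `<=` rejection makes replayed or out-of-order certificate submissions idempotent failures rather than silent state rewinds.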

Extends genesis-from-parent to fetch and validate an F3 certificate, derive base_epoch and its ETH block hash, and parse/sort power using big-int bytes; adds configurable F3 proof-service and execution-retry settings in fendermint_app_settings (with an updated example config).

Written by Cursor Bugbot for commit 682a9e8. This will update automatically on new commits.

@karlem karlem changed the title feat: init lifecycle feat: F3 e2e lifecycle Oct 29, 2025
@karlem karlem force-pushed the f3-lifecycle branch 2 times, most recently from 91db005 to cbce51c Compare November 4, 2025 17:20
Base automatically changed from f3-proofs-cache to main December 18, 2025 16:15
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


Comment on lines +303 to +304
//
// Note: the first epoch proven in a certificate does not have a previous cursor.


We should be able to initialize the cursor from the base tipset's storage commitment (used only as parent here), which is the same as the last tipset in the previous cert (used as child there). WDYT?


As a workaround, maybe we could fetch parent tipsets (which are not very useful anyway) right in generate_proof_for_epoch, call it for the base tipset, get the nonces from the resulting proof bundle, and discard the proof. Perhaps we can do this only once and then maintain the cursor in the ProofGeneratorService struct.

Contributor Author

I think the best way here is to store it on the L2 ledger...

@sergefdrv sergefdrv left a comment

I think it should work, so we can test it against calibration and finally get merged 🚀 Good job! 💪

Comment on lines 200 to 211
verify_sequence_against_storage_next(
    "top-down message nonces",
    next_topdown,
-   cursor.as_ref().map(|c| c.next_topdown_message_nonce),
+   Some(cursor.next_parent_topdown_nonce),
    &nums.topdown_nonces,
)?;
verify_sequence_against_storage_next(
    "power-change configuration numbers",
    next_cfg,
-   cursor.as_ref().map(|c| c.next_power_change_config_number),
+   Some(cursor.next_parent_power_change_config_number),
    &nums.config_numbers,
)?;


Maybe verify_sequence_against_storage_next could also require the nonce rather than taking it wrapped in Option.
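The suggested change could look like the hedged sketch below, where the caller resolves the Option before the call so the checker always receives a concrete expected value. Names and semantics are illustrative assumptions, not the actual fendermint signature.

```rust
// Illustrative sketch only: `verify_sequence_against_storage_next` here is a
// stand-in, not the real fendermint function. The point is the signature
// change: `expected_next: u64` instead of `Option<u64>`.
fn verify_sequence_against_storage_next(
    what: &str,
    next_from_proof: u64,
    expected_next: u64, // previously Option<u64>; caller resolves it up front
    observed: &[u64],
) -> Result<(), String> {
    if next_from_proof != expected_next {
        return Err(format!(
            "{what}: proof claims next {next_from_proof}, cursor expects {expected_next}"
        ));
    }
    // Every observed number must form a contiguous run starting at expected_next.
    for (i, n) in observed.iter().enumerate() {
        let want = expected_next + i as u64;
        if *n != want {
            return Err(format!("{what}: gap at position {i}: got {n}, want {want}"));
        }
    }
    Ok(())
}
```

Pushing the Option-unwrap to the call site means a missing cursor fails loudly there, instead of being threaded through as `None` and handled implicitly inside the checker.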

Comment on lines +564 to +609
let start = Instant::now();
let (state, (apply_ret, emitters)) = state.call(*msg.clone()).await?;
let latency = start.elapsed().as_secs_f64();
let exit_code = apply_ret.msg_receipt.exit_code.value();
emit(MsgExec {
    purpose: MsgExecPurpose::Call,
    height: state.block_height(),
    message: *msg,
-   duration: latency,
+   duration: 0.0,


we might want to keep this

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Power table canonical ordering gets lost
    • Fixed get_state to sort power table by canonical F3 ordering (power descending, then participant id ascending) instead of sorting by id only, ensuring consistency with F3 certificates and power-table CIDs.

Create PR

Or push these changes by commenting:

@cursor push 7d00f19e5a
Preview (7d00f19e5a)
diff --git a/fendermint/actors/f3-light-client/src/lib.rs b/fendermint/actors/f3-light-client/src/lib.rs
--- a/fendermint/actors/f3-light-client/src/lib.rs
+++ b/fendermint/actors/f3-light-client/src/lib.rs
@@ -6,6 +6,7 @@
 use fil_actors_runtime::builtin::singletons::SYSTEM_ACTOR_ADDR;
 use fil_actors_runtime::runtime::{ActorCode, Runtime};
 use fil_actors_runtime::{actor_dispatch, actor_error, ActorError};
+use fvm_shared::bigint::{BigInt, Sign};
 use fvm_shared::METHOD_CONSTRUCTOR;
 use num_derive::FromPrimitive;
 
@@ -103,7 +104,13 @@
                 });
                 Ok(())
             })?;
-            out.sort_by_key(|e| e.id);
+            // Sort by canonical F3 ordering: power descending, then participant id ascending.
+            // This matches the ordering used in F3 certificates and power-table CIDs.
+            out.sort_by(|a, b| {
+                let pa = BigInt::from_bytes_be(Sign::Plus, &a.power_be);
+                let pb = BigInt::from_bytes_be(Sign::Plus, &b.power_be);
+                pb.cmp(&pa).then_with(|| a.id.cmp(&b.id))
+            });
             out
         };
 
@@ -137,8 +144,18 @@
     use fil_actors_runtime::SYSTEM_ACTOR_ADDR;
     use fvm_ipld_encoding::ipld_block::IpldBlock;
     use fvm_shared::address::Address;
+    use fvm_shared::bigint::{BigInt, Sign};
     use fvm_shared::error::ExitCode;
 
+    /// Helper function to sort power entries to canonical F3 ordering
+    fn sort_power_entries_canonical(entries: &mut Vec<PowerEntry>) {
+        entries.sort_by(|a, b| {
+            let pa = BigInt::from_bytes_be(Sign::Plus, &a.power_be);
+            let pb = BigInt::from_bytes_be(Sign::Plus, &b.power_be);
+            pb.cmp(&pa).then_with(|| a.id.cmp(&b.id))
+        });
+    }
+
     /// Helper function to create test power entries
     fn create_test_power_entries() -> Vec<PowerEntry> {
         fn u64_to_power_be(x: u64) -> Vec<u8> {
@@ -321,7 +338,9 @@
 
         let response = result.deserialize::<GetStateResponse>().unwrap();
         assert_eq!(response.processed_instance_id, 42);
-        assert_eq!(response.power_table, power_entries);
+        let mut expected = power_entries;
+        sort_power_entries_canonical(&mut expected);
+        assert_eq!(response.power_table, expected);
     }
 
     #[test]
@@ -384,7 +403,9 @@
             initial.power_table_root, updated.power_table_root,
             "power table root CID should change when table changes"
         );
-        assert_eq!(updated.power_table, new_power_table);
+        let mut expected = new_power_table;
+        sort_power_entries_canonical(&mut expected);
+        assert_eq!(updated.power_table, expected);
     }
 
     #[test]

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Ok(())
})?;
out.sort_by_key(|e| e.id);
out

Power table canonical ordering gets lost

High Severity

get_state now sorts power_table by validator id, but F3 power tables are canonicalized by power descending then id ascending. Passing this reordered table into power_entries_from_actor changes table semantics and can break certificate/proof progression tied to canonical ordering.

Additional Locations (1)

Comment on lines +198 to +215
let latency = generation_start.elapsed().as_secs_f64();

emit(ProofBundleGenerated {
    highest_epoch: parent_epoch,
    storage_proofs: bundle.storage_proofs.len(),
    event_proofs: bundle.event_proofs.len(),
    witness_blocks: bundle.blocks.len(),
    bundle_size_bytes,
    status: OperationStatus::Success,
-   latency,
+   latency: 0.0,


maybe keep it?

Comment on lines +188 to +201
let persist_start = Instant::now();
if let Err(e) =
    self.with_persistence(|p| p.save_certificate_with_epoch_proofs(&cert, &epoch_proofs))
{
    emit(ProofCacheAtomicWrite {
        instance: instance_id,
        epoch_count: epochs.len(),
        status: OperationStatus::Failure,
-       latency: persist_start.elapsed().as_secs_f64(),
+       latency: 0.0,
    });
    return Err(e);
}
emit(ProofCacheAtomicWrite {
    instance: instance_id,
    epoch_count: epochs.len(),
    status: OperationStatus::Success,
-   latency: persist_start.elapsed().as_secs_f64(),
+   latency: 0.0,


sure?

Comment on lines +343 to +344
/// If cache locks are contended by writers (e.g. proof insertion), this returns `None`
/// immediately so consensus proposal building can continue without stalling.


Oh, this may cause starvation on the proposer side...
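The documented behavior is a try-lock read that gives up immediately under contention. A minimal stdlib-only sketch (not the fendermint cache API; names are hypothetical) of that behavior, plus one possible mitigation for the starvation concern, a bounded retry budget:

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

// Non-blocking read: returns None immediately if a writer holds the lock,
// mirroring the documented "don't stall proposal building" behavior.
fn read_now<T: Clone>(cache: &RwLock<T>) -> Option<T> {
    cache.try_read().ok().map(|g| g.clone())
}

// Bounded-wait variant: retry for a small time budget before giving up.
// Trades a short potential stall for fewer missed reads under a steady
// stream of writers (the starvation scenario raised above).
fn read_with_budget<T: Clone>(cache: &RwLock<T>, budget: Duration) -> Option<T> {
    let deadline = Instant::now() + budget;
    loop {
        if let Ok(g) = cache.try_read() {
            return Some(g.clone());
        }
        if Instant::now() >= deadline {
            return None; // still contended: let the proposer move on
        }
        std::thread::yield_now();
    }
}
```

Whether a budget is acceptable depends on the consensus proposal deadline; the sketch only illustrates the trade-off, not a recommendation.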

Comment on lines +172 to +180
let fetch_start = Instant::now();

match self.light_client.get_certificate(instance).await {
    Ok(cert) => {
        let latency = fetch_start.elapsed().as_secs_f64();
        emit(F3CertificateFetched {
            instance,
            ec_chain_len: cert.ec_chain.suffix().len(),
            status: OperationStatus::Success,
-           latency,
+           latency: 0.0,


sure?

Comment on lines +191 to +194
let latency = fetch_start.elapsed().as_secs_f64();
emit(F3CertificateFetched {
    instance,
    ec_chain_len: 0,
    status: OperationStatus::Failure,
-   latency,
+   latency: 0.0,


sure?

@sergefdrv sergefdrv left a comment

Looks good

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

tracing::info!(height, "PrepareProposal message preparation finished");

let txs = Vec::from_iter(response.messages.into_iter().map(bytes::Bytes::from));
tracing::debug!(height, tx_count = txs.len(), "PrepareProposal response ready");

Debug-level logging promoted to info in prepare_proposal

Low Severity

Several tracing::info! calls were added to prepare_proposal ("PrepareProposal start", "read_only_view ready", "message preparation finished", "response ready") that fire on every single block proposal. Since prepare_proposal runs on every block, these produce excessive noise at info level in production. They look like development instrumentation that was left at info! instead of being downgraded to debug! or trace!.
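The suggested fix is a level downgrade for hot-path messages. A minimal stdlib-only sketch of the gating idea follows; the real code uses the tracing crate's `info!`/`debug!` macros and subscriber filters, so the `Level` and `Logger` types here are purely illustrative.

```rust
// Illustrative level gating: messages at a level more verbose than the
// configured maximum are dropped. With max = Info, a per-block message
// logged at Debug is suppressed, which is the downgrade the report asks for.
#[derive(PartialEq, PartialOrd, Clone, Copy)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

struct Logger {
    max: Level,
}

impl Logger {
    /// Returns the rendered message when the level passes the filter, None otherwise.
    fn log(&self, at: Level, msg: &str) -> Option<String> {
        if at <= self.max {
            Some(msg.to_string())
        } else {
            None
        }
    }
}
```

In tracing terms, changing the macro from `info!` to `debug!` moves a message below the default `Info` filter, so production logs stay quiet while `RUST_LOG=debug` still surfaces the per-block instrumentation.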




Development

Successfully merging this pull request may close these issues.

F3 topdown: Proof Verification & Completeness Enforcement

2 participants