USDC Withdrawal Issues and Proposed Fix

TL;DR:

  • Users should avoid submitting USDC withdrawals until this issue is resolved.
  • An address format incompatibility issue between Penumbra and Noble currently prevents USDC withdrawals.
  • Fixing this issue is possible but will require the community to coordinate a software upgrade.
  • Withdrawn USDC funds are currently inaccessible, pending a software upgrade.

Address Format Incompatibility

Bech32 is an address encoding scheme used for blockchain addresses, which provides a human-readable label (cosmos, penumbra, etc.) and a checksum. There are two variants of the encoding: Bech32 and Bech32m (“modified”). Bech32m improves on the checksum algorithm used in Bech32, but the two variants are not compatible. The Cosmos ecosystem generally uses Bech32. Penumbra uses Bech32m for penumbra1... addresses.

This incompatibility was discovered only after Penumbra stabilized its address format. As a workaround, Penumbra specifies a “compat” encoding using Bech32, with a penumbracompat1… prefix. Penumbra addresses can be encoded using Bech32m with a penumbra1… prefix or Bech32 with a penumbracompat1… prefix, and decoding accepts either. Encoding uses the Bech32m method by default.
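
To make the incompatibility concrete, here is a minimal Python sketch of the checksum difference, using the reference Bech32/Bech32m algorithm from BIP-173/BIP-350; the payload below is a placeholder, not a real Penumbra address:

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bech32_polymod(values):
    # Checksum polynomial shared by Bech32 and Bech32m (BIP-173 / BIP-350).
    GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1FFFFFF) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def encode(hrp, data, const):
    # const = 1 for Bech32, 0x2BC830A3 for Bech32m; this constant is the only difference.
    polymod = bech32_polymod(hrp_expand(hrp) + data + [0] * 6) ^ const
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)

payload = [0] * 32  # placeholder 5-bit payload, not a real Penumbra address
print(encode("penumbra", payload, 1))           # Bech32 checksum
print(encode("penumbra", payload, 0x2BC830A3))  # Bech32m checksum

The two strings differ only in their six-character checksum, which is why a decoder that strictly accepts one variant rejects the other.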

This difference does not normally matter, as ICS20 transfer packets specify addresses as strings, to avoid any chain-specific assumptions about address formats. However, Noble has an extra middleware that decodes addresses in transfer packets as Bech32 and rejects transfer packets if the address is not Bech32 encoded. This middleware is only applied to USDC.

For this reason, inbound transfer packets from Noble to Penumbra are created client-side using the Bech32 penumbracompat1… format for the receiver address on the Penumbra chain. This works fine. Outbound transfer packets from Penumbra to Noble are created by the Penumbra chain itself, based on the contents of the Penumbra-specific Ics20Withdrawal action. This action specifies the receiver address on the other chain, as well as a (randomized) sender address on the Penumbra chain to return funds to in case of a timeout. However, the Penumbra chain uses the default (Bech32m) encoding for the sender address, and there is no way for a client to request the Bech32 “compat” encoding. This causes the Noble chain to reject returning transfer packets.

Unfortunately, this issue was not detected by earlier testing work:

  • Penumbra is integrated into the interchaintest framework for automatic conformance testing against the ICS20 standard and Cosmos SDK chains. However, this did not detect the issue, as Noble’s middleware is not part of the ICS20 standard or the default Cosmos SDK behavior.
  • In addition to automatic testing, manual testing covered transfers between Penumbra testnets and the Noble and Osmosis testnets, including withdrawals of assets from Penumbra to both. However, Noble’s middleware only applies to USDC, and the manual testing of withdrawals was performed with native tokens only.

Acknowledgement Handling

IBC packets are relayed from a host chain to a counterparty chain. The counterparty chain may or may not acknowledge those packets. ICS20 is designed to refund tokens to the sender if the counterparty chain does not successfully acknowledge those relayed packets.

Penumbra correctly handles timeouts, refunding tokens to the sender. However, it appears that the Penumbra chain does not correctly handle a counterparty chain acknowledging the transfer with an error. Root cause analysis of why this is the case, and why it was not detected by earlier assurance work, is underway.

Impact

This address format incompatibility means that the Noble chain will reject incoming USDC transfers from Penumbra. However, instead of timing out the packet, which would cause Penumbra to automatically return funds to the sender, Noble acknowledges the packet with an error, which is not handled by Penumbra’s ICS20 handler. This prevents the Penumbra chain from correctly minting a refund for the failed withdrawal.

These failed withdrawals are visible on chain. Funds remain in the escrow account on the Penumbra chain, and are not missing, though they are currently inaccessible.

Users should avoid initiating any outbound USDC transfers.

Users who do not initiate outbound USDC transfers are unaffected.

Funds from existing outbound USDC transfers are currently inaccessible, pending remediation as described below.

Inbound and outbound transfers of every other asset continue to operate normally.

Users can monitor IBC transfers using the Range dashboard at https://ibc.range.org

Remediation

There are two parts to remediation: enabling outbound transfers going forward, and restoring funds from existing failed transfers.

To enable outbound transfers going forward, additive changes can be made to the source code.

As mentioned above, the outbound transfer packets are created by the chain, not by the client, so a client-side mitigation is not possible. To address this issue, the community could propose and coordinate a minor chain upgrade that would:

  • Add a bool use_compat_address = 8; field to the Ics20Withdrawal action that instructs the Penumbra chain to use the “compat” address encoding in the transfer packet it creates;
  • Add logic to the Ics20Withdrawal action’s ActionHandler implementation that uses the compat encoding if use_compat_address is true;

This allows clients to ask for the compat encoding for their transfer. Frontends and clients can be updated to special-case transfers to Noble on channel-2 and set use_compat_address to true.

This change would be backward-compatible, and not change the processing of any existing transactions (which would all have use_compat_address = false as it would be an unset Protobuf field). It would not migrate state.

In addition, the ICS20 transfer handler’s acknowledge_packet_execute implementation needs to be patched to correctly handle an error acknowledgement.
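
For context, here is a minimal conceptual sketch in Python (not the actual Rust acknowledge_packet_execute handler; the types and names are illustrative) of the behavior ICS20 calls for: a success acknowledgement finalizes the transfer, while an error acknowledgement is treated like a timeout and refunds the sender.

from dataclasses import dataclass, field

@dataclass
class Packet:
    sender: str
    denom: str
    amount: int

@dataclass
class Chain:
    balances: dict = field(default_factory=dict)

    def refund(self, packet: Packet) -> None:
        # Return the escrowed or burned value to the original sender.
        self.balances[packet.sender] = self.balances.get(packet.sender, 0) + packet.amount

def on_acknowledge_packet(chain: Chain, packet: Packet, ack_success: bool) -> None:
    # Per ICS20: nothing to do on success; on an error ack, refund as if the
    # packet had timed out. The bug described above is that this error branch
    # did not perform the refund.
    if not ack_success:
        chain.refund(packet)

chain = Chain()
on_acknowledge_packet(chain, Packet("penumbra1...", "uusdc", 1_000_000), ack_success=False)
print(chain.balances)  # {'penumbra1...': 1000000}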

A patch that fixes this logic going forward is in development and testing against the Noble testnets.

To restore access to funds from existing failed transfers, various options are possible. One option would be to identify incorrectly acknowledged packet commitments and perform a state migration that restores those packet commitments, allowing users to re-exercise the transfer processing logic for existing transfers.
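
As a rough illustration of what such a migration would compute, here is a Python sketch assuming the standard ICS-24 commitment paths and the ibc-go packet commitment format (SHA-256 over the big-endian timeout fields followed by the hash of the packet data). The actual migration would be written in Rust against Penumbra’s state model, and the names below are illustrative only.

import hashlib

def packet_commitment(timeout_timestamp_ns: int,
                      timeout_revision_number: int,
                      timeout_revision_height: int,
                      packet_data: bytes) -> bytes:
    # ibc-go style packet commitment: sha256 over the big-endian timeout
    # fields followed by sha256(packet data).
    buf = (timeout_timestamp_ns.to_bytes(8, "big")
           + timeout_revision_number.to_bytes(8, "big")
           + timeout_revision_height.to_bytes(8, "big")
           + hashlib.sha256(packet_data).digest())
    return hashlib.sha256(buf).digest()

def commitment_path(port: str, channel: str, sequence: int) -> str:
    # Standard ICS-24 path for packet commitments.
    return f"commitments/ports/{port}/channels/{channel}/sequences/{sequence}"

def restore_commitments(state: dict, mishandled_packets: list) -> None:
    # `mishandled_packets` would be the set of packets whose error acks were
    # incorrectly executed; re-inserting their commitments lets relayers
    # replay the acks against the fixed handler.
    for p in mishandled_packets:
        key = commitment_path(p["port"], p["channel"], p["sequence"])
        state[key] = packet_commitment(p["timeout_timestamp_ns"],
                                       p["timeout_revision_number"],
                                       p["timeout_revision_height"],
                                       p["data"])

With the commitments back in state, relayers could re-deliver the acknowledgements, and the fixed handler would process them as refunds.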

These steps can be performed independently. In the very short term, a patch could be applied to ensure future withdrawals function correctly. In the medium term, a state migration could restore access to funds sent via existing transfers. Both resolution paths can be pursued in parallel, subject to community decision making.

7 Likes

For anyone curious: Namada also ran into this problem, and the patch on the Noble side is literally 3 lines.

3 Likes

Good to know @hdevalence!

Thank you for detailing this. Radiant will stand by to hear what the patch entails.

2 Likes

Standing by for any pushes needed on the validator end.

1 Like

A draft PR that addresses the issues is here: fix(ibc)!: handle ibc withdrawals correctly by conorsch · Pull Request #4787 · penumbra-zone/penumbra · GitHub

This PR has two changes:

  1. It adds the compat address selection to the Ics20Withdrawal action, and propagates changes into pcli to allow specifying it on the command line.
  2. It fixes the acknowledgement behavior to correctly handle acknowledgements of success versus errors. This code is written to reuse as much of the previously audited timeout logic as possible.

This code was tested with a local devnet, relaying to the Noble testnet, which verified that:

  • When using the compat address flag, the outbound transfer succeeds.
  • When not using the compat address flag, the outbound transfer fails, and the error causes a refund on the Penumbra chain.

It is believed that this would fix the issue for all transfers going forward, and community review of the patch would be invaluable. It does not address the existing incorrectly processed transfers.

This change would need to be applied in an upgrade, as it changes the consensus rules (to correct their handling of transfer packets), and applying those rules to previous blocks would give different results.

Since there needs to be an upgrade, it may be best to fix the transfer logic and the failed transfers in one upgrade. On the one hand, this delays applying a fix. On the other hand, it would mean that only one upgrade is required, which could reduce overall risk and operational overhead.

As described above, one line of investigation is into whether the existing failed transfers can be restored by simply re-inserting the relevant packet commitments, which would allow their ACKs to be re-relayed (using the new logic). This has not yet been implemented or tested. The steps would be to create a migration that can restore packet commitments, then apply it to a devnet that has experienced failed transfers and verify that the transfers are restored post-upgrade.

Another approach would be to have the migration create new notes for the failed transfers. This seems less preferable for several reasons: (1) Penumbra migrations occur “between blocks”, so it’s unclear how a client would scan those notes; (2) it involves direct editing of chain state, which is both risky and conceptually dangerous; and (3) it risks incorrectly modifying some other invariants.

In any case, however, the community needs to align on one of two options:

  1. Applying one upgrade that enables USDC withdrawals and restores the existing transfers in one go;
  2. Applying a minimal upgrade now that enables USDC withdrawals and another upgrade later that restores the existing transfers in two separate steps.
7 Likes

I would like to express my preference for option 2, which involves deploying a minimal upgrade to immediately enable USDC withdrawals. Given that the network is still in the process of bootstrapping liquidity, it is crucial to minimize disruptions for users who need to move their funds into and out of Penumbra.

While it may be more efficient to address both current transfer errors and future transfers in a single upgrade, deploying multiple upgrades at this early stage could also provide the community with valuable insights into the upgrade proposal and deployment process.

3 Likes

I would also like to show preference for option 2. I have some outbound USDC stuck, and I still feel that restoring functionality for Noble transfers now, then fixing past transfers afterward, is the way to go. This way people who are not in the loop do not add to the number of stuck transfers.

1 Like

I have no very strong preference, but I am slightly leaning towards option 2, so the community can resume these early stages of trying out functionality on the chain as soon as possible.

2 Likes

Hey everyone, my first post on the Penumbra forum, so GM… or LM… or in my case NM (Noble morning)!

Full disclosure, I was a part of the initial triage effort for these issues that Henry has documented above. For background, I am CTO at Noble.

We’re personally in favor of Option 2. Although there isn’t a significant amount of funds that are currently stuck in channel-2 between Penumbra and Noble, we think it’s important that a remediation for these issues gets deployed as soon as possible to prevent future funds from being stuck. I understand that the minifront has been updated to disable this flow, but apparently a majority of RPC providers have yet to update.

Regarding the follow up upgrade, we’re in favor of simply re-adding the packet commitments to state (and potentially removing the existing packet acknowledgements). This would allow relayers to replay the acknowledgements automatically, removing a lot of manual logic from the upgrade itself. Noble would be happy to help in the replay of those transfers.

4 Likes

Hello everyone, I would like to show a very strong preference for option 1. I’m no dev, but in my experience it’s always better to rip off the band-aid and treat the wound directly than to apply some lotion to the sore. The future is definitely bright for Penumbra, and these setbacks are genuinely important for its story.

1 Like

Hello all, finch here from Starling Cybernetics.

I’m in favor of the second path forward: applying the fix now to re-enable USDC flows between Noble and Penumbra over channel-2, and then coordinating a second upgrade later to unstick the stuck funds, with less of a ticking clock hanging over the more intricate second remediation.

While there is a nonzero coordination cost to two separate upgrades, I stand at the ready to assist and participate in this coordination. I believe that it would de-risk the implementation of these mitigations to have the two logically distinct changes happen as separate steps, tested and deployed separately. Additionally, I concur with @johnletey that it’s important to stop more funds from getting stuck by patching the protocol as soon as is feasible with due caution, since current Minifront deployments baked into RPC nodes can’t be quickly updated.

A related aside

Separately, this issue with Minifront deployments points to a small ecosystem friction with an equally small possible fix: especially in the early days of the network, Minifront moves with a faster development cadence than pd, which means that any given RPC node’s version of Minifront is likely out of date at any given time. Operators of frontends can instead already choose to deploy using the penumbra-zone/minifront-deployer repository on GitHub, which gives them an always-current edge deployment of the frontend[1], but there’s no current way for an RPC to disable the baked-in, oft-outdated frontend and redirect to a better one.

I propose adding a command line flag to pd which instructs the built-in server to issue a redirect to an arbitrary operator-specified URL when someone visits the baked-in frontend site, allowing an RPC operator to offload the serving of the frontend to a separate, continuously updated, deployment. If there’s enough ecosystem uptake of this deployment option, it solves the issue of frontend deployment lag, making possible future quicker remediations of issues which can be addressed by frontend updates.


  1. Starling Cybernetics already does this with our frontend deployment https://stake.with.starlingcyber.net ↩︎

5 Likes

Thanks all for the comments.

From the feedback so far, it seems like the most important goal from the community is to re-enable flows between penumbra-1 and noble-1, and that unfreezing historical transfers is a secondary goal that should not delay resolution of the IBC transfer handling (although this is not unanimous).

Changing the behavior in either case (“option 1” or “option 2”) involves changing consensus rules. The Penumbra node software does not have height-dependent consensus rules, so any change that would affect re-execution of earlier blocks requires using the upgrade process.

This means that in any case, mitigation of the issue would require an on-chain governance proposal for a chain upgrade, a coordinated halt at the upgrade height, and validators applying a migration before restarting the chain.

Note that in this process, the only difference between “option 1” and “option 2” is the contents of the upgrade migration:

  1. In “option 1”, the upgrade migration restores existing transfers;
  2. In “option 2”, the upgrade migration is a no-op, and existing transfers are restored in a subsequent upgrade.

Based on the feedback about direction described above, a few things happened to the PR linked above.

  1. The patch set was improved based on community feedback and review, as well as close (re)comparison with the IBC specifications.
  2. The PR was extended with a no-op migration, to create a candidate software upgrade that would just apply the new consensus rules (aiming at “option 2”).

In parallel, additional inspection of the penumbra-1 and noble-1 chain states was performed. This revealed interesting data.

First, Penumbra deviates defensively from the IBC specification and the ibc-go implementation in a way that proves useful here. The ICS20 specification only calls for balance tracking of native tokens in an escrow account, while non-native tokens are blindly minted and burned. Instead, the ICS20 handler in Penumbra performs per-channel value-balance tracking for both native and non-native tokens (a minimal sketch follows this list):

  • For native tokens (originating on Penumbra), a withdrawal increments the value balance, tracking how much has been escrowed, and a deposit back to Penumbra decrements it.
  • For non-native tokens (originating on the counterparty chain, such as USDC), a deposit increments the value balance, tracking how much has been received, and a withdrawal decrements it.
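
A minimal sketch of this bookkeeping for the non-native (USDC) case only; illustrative Python, not Penumbra’s Rust implementation:

def update_usdc_value_balance(balance: int, amount: int, *, is_withdrawal: bool) -> int:
    # Deposits from Noble add to the channel's USDC value balance; withdrawals
    # back to Noble subtract from it when the withdrawal is initiated, and a
    # refund (on timeout or error ack) would add it back. Because the stuck
    # withdrawals were never refunded, this balance lags Noble's escrow account
    # by exactly the stuck amount, as computed later in this post.
    return balance - amount if is_withdrawal else balance + amount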

This means that penumbra-1 maintains an on-chain view of the USDC it has received from the noble-1 chain, and this is queryable via raw state key access:

$ pcli q key "ibc/ics20-value-balance/channel-2/passet1w6e7fvgxsy6ccy3m8q0eqcuyw6mh3yzqu3uq9h58nu8m8mku359spvulf6"
088e87a083fb06

All data in the Penumbra state is Protobuf encoded. A Protobuf decoder reveals that this is the encoding of the uint

239182807950
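
For anyone who wants to check this by hand, here is a small Python sketch that decodes the raw value, assuming it is a protobuf message whose first field is a varint (illustrative, not the actual decoder used):

def decode_varint(data: bytes) -> int:
    # Protobuf varints are little-endian base-128: 7 bits per byte, with the
    # high bit set on all but the last byte.
    value, shift = 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return value

raw = bytes.fromhex("088e87a083fb06")
# 0x08 is the protobuf tag byte (field 1, wire type 0 = varint);
# the remaining bytes are the varint payload.
print(decode_varint(raw[1:]))  # 239182807950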

This can be compared with the value of the escrow account on the noble-1 side, whose USDC balance (at time of analysis; note that values have 6 decimal places) was

272792209511

This provides one source of information on the exact amount of frozen funds, 33609.401561 USDC:

$ python3
>>> 272792209511 - 239182807950
33609401561

If the Penumbra chain had not done its own accounting, this check would not be possible.

Second, code was written to inspect the penumbra-1 chain state programmatically, and determine the exact set of packets whose acknowledgements were incorrectly executed. The sum total of USDC transferred by these packets is 33609.401561. This provides a second source of information on the amount of frozen funds, independent of the first, and they match exactly.

The previous, incorrect implementation of the ack handler deleted the packet commitment and performed no other changes to the chain state. Thus, re-inserting the packet commitments is the only change needed to allow re-execution of the ack handler, as noted by @johnletey:

Regarding the follow up upgrade, we’re in favor of simply re-adding the packet commitments to state (and potentially removing the existing packet acknowledgements).

Moreover, as this code was already written (for the purposes of analysis, and to ensure that there was a solid understanding of exactly what the problem was), it was already available for use in an upgrade migration. Additionally, because the inspection of which packets had their acknowledgements handled incorrectly is performed as part of the migration itself, it is believed that this would correctly handle any mis-handled packets right up to the moment of the upgrade, leaving no gap prior to correct handling.

Therefore, it appears that there is no additional delay required for option 1 versus option 2, and that it would be possible to fix the issue in a single upgrade after all.

This code, and a candidate migration, is now available in the original PR, and is ready for review and testing by the community in advance of a formal governance proposal for a chain upgrade.

4 Likes

In light of this, I have changed my perspective: I now think it would be best for us to attempt a single unified upgrade and migration to solve the issue in one fell swoop.

I therefore propose:

A test run for the fixes

Before the community votes on a proposal to implement this plan, it would be valuable to practice this upgrade migration. This would achieve increased confidence in both (a) the validity of the fixes in as close to a real testing scenario as possible, and (b) the operational readiness of our mainnet validators (myself included) for decentralized coordination of Penumbra’s first consensus- and state-breaking upgrade on mainnet.

Running a realistic test, including a replication of the original bug and a remediation to it, would achieve this confidence so that validators and delegators can vote on an official proposal with conviction.

A proposed test scenario game plan

  1. Perform a testnet genesis with the current version of pd, using a very short voting period as an initial chain parameter so we can exercise a vote quickly, and an equal genesis delegation to each participating testnet validator.
  2. Connect this network to the Noble testnet, which requires one participant volunteering to stand up and operate a relayer for the duration of the test.
  3. Send testnet USDC inbound from Noble testnet to ours.
  4. Send testnet USDC back out to the Noble testnet, triggering the bug — the test USDC funds will be frozen by this, replicating the mainnet issue.
  5. Hold an on-chain governance vote to propose an upgrade, vote to approve it, and wait for the test chain to halt at the upgrade height (this should take only a short time, because all participants will be online and the voting period will be artificially short).
  6. Every validator installs the new version of pd with the patches, and runs the upgrade migration to patch the state.
  7. The chain restarts once validators representing more than 2/3 of voting power have applied the upgrade.
  8. We check that relaying continues to work post-upgrade.
  9. We check that packets are cleared correctly and test funds are accessible.
  10. The participants in this exercise post an experience report and its findings to this forum to inform the mainnet upgrade.
  11. Together, the Penumbra community celebrates having participated in a complex realistic scenario test of a fix for this important issue — and we get ready to do it again for real in the near future when the upgrade proposal lands on mainnet.

Logistical coordination

I volunteer as lead coordinator for this test.

I believe it is important for this test to occur as soon as feasible, in order to permit swift resolution to this issue. I am prepared to begin the exercise as early as this Monday, August 5, at 15:00 UTC.

If you are a current mainnet validator, infrastructure operator, developer, or other interested Penumbra community member and you would like to participate or observe, please email finch@starlingcyber.net no later than the above start time. Please include in your email your name, your affiliation, your available start–end times in UTC for Monday, and what role you are willing to play in the exercise (e.g. one or more of “testnet validator”, “relayer operator”, “passive observer”, etc.). Those who volunteer will receive a link to a Signal group which will be used for further coordination.

Note that while passive observers are welcome with no commitment to stay for the full exercise, active participants should expect that the exercise will take at least several hours of time with high uncertainty of ultimate duration, and should commit to staying online until completion to ensure success of the exercise.

By working together as a community, we can ensure a swift, safe, and smooth resolution to this issue. Who’s with me?

Onwards,

—finch

4 Likes

I begin with this: chains are sovereign.

Which means that one cannot assume the intentions of another chain, nor can one assume it uses a similar language.

To quote a very wise sorceress, “the entire point of the address being a string in the fungible token packet data is to not make assumptions at the ibc level about address encodings.”

This issue should most likely be solved via the clients, i.e. the relayers; chains should not assume the addresses use some specific encoding. The IBC spec makes room for any address encoding so that sovereign entities may use whatever encoding they desire. Of course, malice and evil will find their way into any land, but the protectors have ways to address this! Both Hermes and rly have the ability to filter out spam transactions based on the ICS20 receiver field.

I propose that we engage with the Noble team and in unity, we can ensure interoperation while still keeping out malicious intent.

1 Like

The clarity of explanation of the root cause of this incident and potential routes to remediation as well as testing being done to ensure successful resolution is absolutely fantastic to read!

3 Likes

Hi everyone, another update: a candidate software version, including a migration, can be found here:

This new version was tested against the Noble testnet as follows:

  • A testnet using the current software version (0.79.3) was created, and connected to the grand-1 Noble testnet;
  • USDC was sent from grand-1 to the testnet;
  • USDC was sent back, causing the outbound transfer to get stuck;
  • The testnet was halted, and upgraded by switching to pd 0.80.0 via pd migrate. This exercises a migration that searches the chain state for incorrectly handled acks and reinserts the packet commitments. Because this migration inspects and repairs the chain state automatically, it is expected to work just as completely on mainnet as it did on the testnet. It should correct all errored transfers, even right up to the point where the upgrade occurs.
  • The testnet (but not relayers) was restarted, and it was verified that outbound transfers were still stuck, as there were no other changes to chain state;
  • Relayers were restarted, which automatically cleared packets and restored stuck funds;
  • Inbound and outbound USDC transfers were exercised, using a new pcli flag to control the address encoding. Bech32m transfers were rejected and returned, while Bech32 transfers were accepted.

It is believed that applying this upgrade would immediately clear stuck funds. pcli could be used to withdraw USDC immediately. Changes to enable USDC withdrawals via Prax and web frontends could be made following an upgrade but would require an update to Prax via the Chrome Web Store, which usually takes 1-2 days for approval.

2 Likes

Hi everyone,

Radiant Commons has proposed an upgrade on-chain for height 501975 in light of Penumbra Labs having released a new binary of proposed fixes to the IBC issues detailed above. The proposal is #3 on-chain and links to this forum post.

Please vote your position on the matter, and if you are a validator, be prepared to upgrade at the aforementioned height if it passes.

2 Likes

In light of this update, I am canceling my previously-planned test run tomorrow, Monday, August 5. Starling Cybernetics intends to vote YES on this proposal to resolve this issue by singular chain upgrade.

3 Likes

Let’s do a “why is it that the work that was supposed to allow safe IBC expansion said things were safe, when they were not”

The SDK 45, 46, and 47 versions of interchaintest are unmaintained.

I know this because I’ve used it and given support to other users of ict (interchaintest).


Regarding the decision, I see a lot of support around the idea of option 2, and would vote that way, because we really just need to fix things.


One of the long-standing issues in Cosmos is version fragmentation, and it is very legitimately a lot of work to stay ahead of it. In the case of Noble, they have modified IBC, potentially in ways that are beneficial (I got that error message when I tested Noble for the issue present in P2P storms / Banana King, and that error message actually to a large degree prevented one approach to the issue, although [stuff I reported to Noble]). Nonetheless, it doesn’t use the standard.

Checking versions and go.mod files saves a lot of pain, and the tooling that is supposed to ensure that pain doesn’t happen doesn’t actually prevent it.


About Noble: nah, this isn’t Noble’s fault. Chains are sovereign, and address validation matters.

Was actually quite happy to see this in their code.


But we should be clear that there’s no universal solution to IBC testing, or even a solution.


the upgrade

Looks good!

My understanding of it is that basically it allows Penumbra to provide Noble-compatible addresses to Noble, and keeps the default behavior in other cases.

Quick update for everyone: I just participated in the upgrade process for Proposal #3 to fix the USDC withdrawal issues, and the chain is producing blocks again after the upgrade. As I understand it, relayers are working right now to process IBC packets and get transfers back on track, although we will have to wait for further confirmation to be 100% certain that the upgrade fully achieved its intended objective. Right now, from my perspective, it’s looking good.