UIP: Spend backreferences

One idea for improving Penumbra sync speeds in the future is to implement some version of “DAGSync”, which is the idea that clients can walk the DAG of their transaction graph to scan faster.

In DAGSync, once a client detects a single transaction relevant to them, they can observe all of its outputs and check if each output note is unspent. If the output note is spent, they can learn which transaction spent it, and repeat the process, quickly traversing their own transaction history to reach a subset of notes that are still live.
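To make the walk concrete, here is a minimal sketch of the forward traversal described above. It is a toy model, not Penumbra code: the `TXS` table, the `NULLIFIER_OF` and `SPENT_BY` lookups, and the assumption that every output belongs to the scanning user are all hypothetical simplifications. A real client would need its viewing key to map a note to its nullifier.

```python
from collections import deque

# Toy chain (hypothetical data): each transaction lists the nullifiers it
# spends and the notes it creates. All outputs are assumed to be the user's.
TXS = {
    "tx1": {"spends": [], "outputs": ["n1", "n2"]},
    "tx2": {"spends": ["nf_n1"], "outputs": ["n3"]},
    "tx3": {"spends": ["nf_n3"], "outputs": ["n4", "n5"]},
}
# Stand-in for nullifier derivation, which really requires the viewing key.
NULLIFIER_OF = {n: f"nf_{n}" for tx in TXS.values() for n in tx["outputs"]}
# Public index: which transaction revealed a given nullifier.
SPENT_BY = {nf: txid for txid, tx in TXS.items() for nf in tx["spends"]}


def forward_sync(seed_txid):
    """Walk forward from a seed transaction; return (visited txs, live notes)."""
    visited, live = set(), set()
    queue = deque([seed_txid])
    while queue:
        txid = queue.popleft()
        if txid in visited:
            continue
        visited.add(txid)
        for note in TXS[txid]["outputs"]:
            spender = SPENT_BY.get(NULLIFIER_OF[note])
            if spender is None:
                live.add(note)          # unspent: part of the live set
            else:
                queue.append(spender)   # spent: follow the edge forward
    return visited, live
```

In this toy data, seeding from `tx1` reaches every transaction, while seeding from `tx2` never discovers note `n2`.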

However, Penumbra (and Zcash) only allow traversing the user’s transaction graph in one direction. The reason is that Spend actions only reveal the nullifier of the note they spend, and there is (by design) no way to go “backwards” from the nullifier to the note commitment or the note itself.

This limits the effectiveness of DAGSync, because the choice of the “seed note/tx” strongly affects how much of the user’s history can be discovered. One mitigation is “knitting”: prioritizing newly received notes for spending in order to “knit” them into the graph, so that discovering them separately becomes unimportant. But this has other downsides. It makes it easier for attackers to tag activity (e.g., by sending many new small notes, which would be prioritized for spending, causing a detectably high-arity transaction), and it doesn’t really solve the user need of wanting to find their complete transaction history.

Instead, if the Spend action were augmented with a new field, backref_commitment, like so:

message SpendBody {
    // A commitment to the value of the input note.
    penumbra.core.asset.v1.BalanceCommitment balance_commitment = 1;
    // The nullifier of the input note.
    penumbra.core.component.sct.v1.Nullifier nullifier = 6;
    // The randomized validating key for the spend authorization signature.
    penumbra.crypto.decaf377_rdsa.v1.SpendVerificationKey rk = 4;
    // NEW: The commitment of the input note, encrypted to the sender's OVK.
    bytes backref_commitment = 7;
}

Then a client could traverse their transaction graph backwards as well as forwards. This would allow searching for a “seed note/tx” starting from newest rather than oldest, and allow finding more branches of the tree without needing to perform knitting.
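As a rough illustration of the backward walk, the sketch below treats each spend's `backref` as the already-decrypted note commitment and looks up the transaction that created it. The `TXS` table and the `CREATED_BY` index are hypothetical stand-ins for whatever lookup an RPC would provide. Backreferences that do not match a known commitment are simply skipped.

```python
# Toy chain (hypothetical data). "backref" stands for the note commitment
# recovered by decrypting the spend's backreference field.
TXS = {
    "tx1": {"spends": [], "outputs": ["cm1", "cm2"]},
    "tx2": {"spends": [{"nullifier": "nf1", "backref": "cm1"}], "outputs": ["cm3"]},
    "tx3": {"spends": [{"nullifier": "nf3", "backref": "cm3"}], "outputs": ["cm4"]},
}
# Index from note commitment to the transaction that created it.
CREATED_BY = {cm: txid for txid, tx in TXS.items() for cm in tx["outputs"]}


def backward_sync(seed_txid):
    """Walk backward from a seed transaction via spend backreferences."""
    visited = []
    stack = [seed_txid]
    while stack:
        txid = stack.pop()
        if txid in visited:
            continue
        visited.append(txid)
        for spend in TXS[txid]["spends"]:
            creator = CREATED_BY.get(spend["backref"])
            if creator is not None:     # unknown or invalid backrefs are ignored
                stack.append(creator)
    return visited
```

Seeding from the newest transaction, `backward_sync("tx3")` visits `tx3`, then `tx2`, then `tx1`.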

This field would not be checked by any consensus logic, so a client wishing to have forward secrecy can fill in invalid data. (I don’t think the default clients should support this behavior, but it’s important to be clear about this).

If this change were to be made, it would be good to make it as soon as possible, because it only allows accelerated sync for transactions after it lands. Because Penumbra still has very few users, there are not yet that many transactions, so now would be the second-best time to add this field (the best time having been prior to mainnet).

Questions:

  • Should the backreference be the note commitment, or the transaction hash of the transaction that created the note?
    • My initial preference would be the note commitment, since it’s more self-contained – there’s no extra data needed to be plumbed in. There might be a slight increase in RPC calls needed by the client (first looking up a txid by commitment and then fetching the tx) but this seems pretty unimportant or fixable at the RPC level.
  • Should the field be optional, to allow a phased adoption period, or required, so that all transactions are indistinguishable?
    • My initial preference would be a phased adoption period, making it optional initially and then required later, to minimize breakage of client software.
  • Should the field be part of the effect hash?
    • If not, what are the security effects?
    • If so, what is needed to ensure adding the field to the effect hash doesn’t break anything (in case of a phased adoption / optional field)?

With regards to effect hashing:

I think we should include it in the effect hash. Including it means that a transaction is not malleable in this field. If it is malleable, someone could—in theory—replace the back reference with something else. This capability could be used for several potential things, including:

  • Replacing a valid backreference with an invalid one to make vulnerable clients “forget” about certain funds
  • Replacing a valid backreference with a different valid one to try to influence syncing and make clients more distinguishable.

There are also other possibilities.


In order not to break the EffectHash mechanism and allow for a phased transition, we could do something like: define a new SpendV2 proto message with a new type_url, etc… This message would be identical to the previous Spend except for the additional backref_commitment field in the body. For some period of time, either could be allowed in transactions.


Having a SpendV2 message feels like it’s signaling a change in semantics rather than adding a new optional field. I think we just need to spec the EffectHash changes correctly so that:

  • Existing messages (no backreference) are hashed identically as before
  • New messages (with backreference) are hashed differently than any message without a backreference
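One way to picture these two properties is the toy hash below. It is only a sketch under stated assumptions, not the actual EffectHash construction: the `ToySpendBody` personalization, the length-prefixed field encoding, and the `backref` domain tag are all made up for illustration. The point is that the new field contributes to the hash only when it is non-empty.

```python
import hashlib


def _base_state(balance_commitment: bytes, nullifier: bytes, rk: bytes):
    """Hash state over the three pre-existing fields (toy encoding)."""
    h = hashlib.blake2b(digest_size=64, person=b"ToySpendBody")
    for field in (balance_commitment, nullifier, rk):
        h.update(len(field).to_bytes(4, "little"))  # length prefix avoids ambiguity
        h.update(field)
    return h


def legacy_hash(balance_commitment: bytes, nullifier: bytes, rk: bytes) -> bytes:
    """Hash of a spend body as it looked before the new field existed."""
    return _base_state(balance_commitment, nullifier, rk).digest()


def new_hash(balance_commitment: bytes, nullifier: bytes, rk: bytes,
             backref: bytes = b"") -> bytes:
    """Hash that binds the backreference only when one is present."""
    h = _base_state(balance_commitment, nullifier, rk)
    if backref:
        h.update(b"backref")                         # hypothetical domain tag
        h.update(len(backref).to_bytes(4, "little"))
        h.update(backref)
    return h.digest()
```

With an empty `backref`, `new_hash` equals `legacy_hash`. With a non-empty one, the digest changes, so the field is no longer malleable.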

We’ll also want to think through:

  • How is the note commitment encrypted? What is the actual data in the new field? (And should it be called something like encrypted_backref to signal that it’s the encrypted wrapper rather than the cleartext data?) Should this reuse some parts of the existing “payload” construction used for Output actions – which do OVK-wrapped keys – or use the OVK directly?
  • What data needs to go in the TransactionPerspective? The TxP can’t have access to long-term key material like the OVK, so should the TxP:
    • have nothing at all?
    • just have the contents of the backreference?
    • reveal some keys that allow the holder of the TxP to decrypt the backreference and verify it independently?
  • What should go in the TransactionView? This is linked to the previous question, as the perspective generates the view. Some options:
    • Nothing at all - don’t expose the backreference in transaction views, treating it only as an internal sync optimization
    • Backreference contents when the Spend is Visible

My initial feeling is that the simplest thing is to exclude backreferences from TxP/TxVs entirely, and treat them only as a sync optimization detail. They can always be added to the TxP/TxV later, but this minimizes the required changes to just speccing how to use the OVK to encrypt the note commitment, how to name the resulting Protobuf field, and how to correctly include it in the EffectHash. Does that seem right?

Some related discussion on Bluesky: @hdevalence.bsky.social on Bluesky

For encrypting the note commitment, this is similar to swap encryption, where we derive a symmetric key from the Ovk and the swap commitment, and then we use a fixed nonce to encrypt such that we do not have duplicate (key, nonce) pairs [0]. Three options are:

  1. We encrypt using a symmetric key derived from Ovk, using a random nonce, e.g. something like:
key = BLAKE2b-512("Penumbra_Backref", ovk)

This adds the cost of a nonce to the size of the spend backreference ciphertext. In exchange, the key depends only on the Ovk, so we can scan all spends with it without deriving a new key each time, which is a nice benefit.

  2. We encrypt using the Ovk-derived key from option 1, with a nonce derived from another public field such as the nullifier. This field must be guaranteed not to repeat.

  3. We encrypt using a key derived from the Ovk and other public fields in the Spend action, such as the Nullifier, which means we can use a fixed nonce (again provided that the public fields do not repeat).

[0] Payload Keys - The Penumbra Protocol
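The key and nonce derivations for the three options could look roughly like the sketch below. It covers derivation only, since the AEAD step is left out. Several details are assumptions rather than anything specified in this thread: truncating the BLAKE2b-512 output to 32 bytes, the 12-byte nonce length, the `ToyBackrefNonce` personalization, and hashing the nullifier to obtain a nonce.

```python
import hashlib
import os

PERSONAL = b"Penumbra_Backref"  # 16 bytes, the maximum BLAKE2b personalization
NONCE_LEN = 12                  # assumed nonce size (typical for common AEADs)


def option1_key(ovk: bytes) -> bytes:
    """key = BLAKE2b-512("Penumbra_Backref", ovk), truncated to 32 bytes (assumption)."""
    return hashlib.blake2b(ovk, digest_size=64, person=PERSONAL).digest()[:32]


def option1_nonce() -> bytes:
    """Option 1: fresh random nonce, which must be stored with the ciphertext."""
    return os.urandom(NONCE_LEN)


def option2_nonce(nullifier: bytes) -> bytes:
    """Option 2: same key as option 1, nonce derived from the public nullifier."""
    return hashlib.blake2b(nullifier, digest_size=NONCE_LEN,
                           person=b"ToyBackrefNonce").digest()


def option3_key(ovk: bytes, nullifier: bytes) -> bytes:
    """Option 3: per-spend key from Ovk and nullifier, used with a fixed nonce."""
    h = hashlib.blake2b(digest_size=64, person=PERSONAL)
    h.update(ovk)        # both inputs are fixed-length here, so plain
    h.update(nullifier)  # concatenation is unambiguous
    return h.digest()[:32]


OPTION3_FIXED_NONCE = bytes(NONCE_LEN)  # all zeros; safe only if keys never repeat
```

Under these assumptions, the option 1 key can be derived once per wallet and cached, whereas options 2 and 3 need the nullifier on hand for every spend being scanned.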

One thing to consider is that if we provide a specialized RPC for streaming just the backreference ciphertexts, then deriving the nonce from public data becomes more annoying, since you then need to also have this data, so it’s more efficient, I think, to just include the nonce with the ciphertext in that case.

Without such an RPC, deriving the nonce does save something, since the client already has that data.


Started a draft UIP here: UIP: Spend Backreferences by redshiftzero · Pull Request #2 · penumbra-zone/UIPs · GitHub

There’s also been discussion about a hybrid co-design using an FMD system to detect the first note, from which point DAGSync would take over and traverse a local subgraph. There are interesting design considerations here, like: (1) where do you start scanning the chain, (2) how should the TCT be modified to support “partial” state updates, (3) how to integrate a linear scan for background detection to fill in the rest of the partial state, (4) could WebGPU hardware acceleration be used, etc. These could be included in the UIP.

I think these would be great topics for the Research forum category, but it would be good to keep them separate from the initial scope of the UIP.


The UIP has been written up and reviewed here: UIP-4 - Penumbra Improvement Proposals (UIPs)

A next step would be to have a completed implementation for review by the community.

UIP-4 Spend Backreferences by redshiftzero · Pull Request #4922 · penumbra-zone/penumbra · GitHub is the implementation of this proposed UIP.