Calldata Manipulation: ABI Encoding Edge Cases and Decoder Bugs — Darkwave Log

Calldata is the raw byte string that encodes a contract call: the function selector and every argument packed into a single contiguous buffer. The EVM treats it as an immutable, opaque byte array. The ABI decoder — whether Solidity’s compiler-generated code or handwritten assembly — is responsible for slicing it into meaningful values. The gap between “what the byte array contains” and “what the decoder produces” is where calldata manipulation vulnerabilities live.

This article walks through every layer of that gap: how encoding works at the byte level, where decoders tolerate ambiguity, how offsets can be weaponized, and what happens when calldata travels between contracts without being re-encoded.

ABI Encoding at the Byte Level

Every external Solidity call begins with a 4-byte function selector — the first four bytes of the Keccak-256 hash of the canonical function signature — followed by the ABI-encoded arguments.

[selector: 4 bytes][argument encoding: N × 32 bytes]

Each argument occupies one or more 32-byte slots. The encoding rules are defined by the ABI specification and split types into two families.

Static Types

A static type has a size that is fixed and known at compile time. uint256, address, bool, bytes32, and fixed-size arrays of static types all encode directly into their slot(s), left-padded or right-padded according to type:

Integers and addresses: right-aligned (left-padded with zeroes) in 32 bytes.
bytesN: left-aligned (right-padded with zeroes) in 32 bytes.
bool: encoded as uint8 — 0x01 for true, 0x00 for false.

// uint256(42)
0x000000000000000000000000000000000000000000000000000000000000002a

// address(0xDEAD...BEEF)
0x000000000000000000000000deadbeefdeadbeefdeadbeefdeadbeefdeadbeef

// bytes32("hello")
0x68656c6c6f000000000000000000000000000000000000000000000000000000
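
The three rules above can be reproduced off-chain in a few lines. The following is a minimal Python sketch using only the standard library; the helper names (encode_uint, encode_address, encode_bytes_n) are illustrative and do not come from any real ABI library.

```python
def encode_uint(value: int) -> bytes:
    """Integers: big-endian, left-padded with zeroes to 32 bytes."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_address(addr_hex: str) -> bytes:
    """Addresses: 20 bytes, right-aligned in the slot like a uint160."""
    digits = addr_hex[2:] if addr_hex.startswith("0x") else addr_hex
    raw = bytes.fromhex(digits)
    if len(raw) != 20:
        raise ValueError("address must be exactly 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_bytes_n(data: bytes) -> bytes:
    """bytesN: left-aligned, right-padded with zeroes to 32 bytes."""
    if len(data) > 32:
        raise ValueError("bytesN holds at most 32 bytes")
    return data.ljust(32, b"\x00")


print(encode_uint(42).hex())
print(encode_address("0x" + "deadbeef" * 5).hex())
print(encode_bytes_n(b"hello").hex())
```

The three printed slots should match the hex examples above, minus the 0x prefix.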

Dynamic Types

Dynamic types — bytes, string, and any dynamic array — cannot be placed inline because their size is not known at compile time. Instead, the head slot for a dynamic type contains a byte offset pointing to where the actual data begins, measured from the start of the encoded arguments (not from the start of the full calldata, i.e., not including the selector).

// abi.encode(uint256(1), bytes("AB"), uint256(2))
// Offset of bytes("AB") = 0x60 (3 slots × 32 bytes)

0000000000000000000000000000000000000000000000000000000000000001  // uint256(1)
0000000000000000000000000000000000000000000000000000000000000060  // offset → bytes
0000000000000000000000000000000000000000000000000000000000000002  // uint256(2)
0000000000000000000000000000000000000000000000000000000000000002  // length of bytes = 2
4142000000000000000000000000000000000000000000000000000000000000  // "AB" right-padded
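
The head-tail layout can be sketched as a small encoder. This Python sketch handles only a flat argument list of uint256 and bytes values, which is enough to reproduce the five slots above; encode_args and pad32 are hypothetical helper names, not a library API.

```python
def pad32(data: bytes) -> bytes:
    """Right-pad with zeroes to the next multiple of 32 bytes."""
    return data + b"\x00" * (-len(data) % 32)


def encode_args(args: list) -> bytes:
    """Head-tail encode a flat list of int (uint256) and bytes values."""
    head_size = 32 * len(args)      # every argument owns one head slot
    head, tail = b"", b""
    for arg in args:
        if isinstance(arg, int):    # static: the value itself goes in the head
            head += arg.to_bytes(32, "big")
        else:                       # dynamic: the head holds an offset into the tail
            offset = head_size + len(tail)
            head += offset.to_bytes(32, "big")
            tail += len(arg).to_bytes(32, "big") + pad32(arg)
    return head + tail


encoded = encode_args([1, b"AB", 2])
for i in range(0, len(encoded), 32):
    print(encoded[i:i + 32].hex())
```

Note that the offset is computed from the start of the head, never from the selector, which is why the single dynamic argument lands at 0x60.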

Understanding this structure is prerequisite to understanding every attack that follows.

Dirty Bits: Padding That Decoders Ignore

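The ABI specification requires that padding bytes be zero, but not every decoder enforces this. Solidity’s legacy decoder (ABI coder v1, the default before 0.8.0) and most handwritten assembly parsers do not verify padding: bytes outside the meaningful bits of a value are silently discarded or, in unmasked assembly, read as part of the value. ABI coder v2, the default since Solidity 0.8.0, reverts when it finds non-zero padding on a narrow type. These spurious non-zero bytes are called dirty bits.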

Where Dirty Bits Appear

In the upper bytes of a uint8, uint16, etc. (only the rightmost N bits matter).
In the upper bytes of an address (only the rightmost 20 bytes matter).
In the trailing bytes of a bytesN (only the leftmost N bytes matter).
In the padding bytes of a bool (only 0 and 1 are canonical, but a raw assembly read treats any non-zero slot as truthy).

// This calldata encodes bool = true, but with dirty upper bytes
// Canonical encoding: 0x0000...0001
// Dirty encoding:     0xdeadbeef...0001

function check(bool flag) external pure returns (bool) {
    return flag; // ABI coder v1: returns true, dirty bytes ignored
                 // ABI coder v2: the call reverts during decoding
}

Under ABI coder v1 the generated code cleans the value before use: for bool it normalizes with iszero(iszero(v)). Under ABI coder v2 the decoder validates the slot and reverts unless it is exactly 0 or 1. Handwritten assembly parsers do neither unless the developer adds the check: they often read the raw slot without masking, creating a discrepancy between the Solidity and assembly interpretations of the same calldata.
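
To make the discrepancy concrete, the following Python sketch models three ways a decoder can treat the same uint8 slot: mask it, validate it, or read it raw. The function names are hypothetical and the model is a simplification, not a reproduction of compiler output.

```python
def decode_uint8_masking(slot: bytes) -> int:
    """Masking decoder: keep the low byte, discard everything above it."""
    return int.from_bytes(slot, "big") & 0xFF


def decode_uint8_validating(slot: bytes) -> int:
    """Validating decoder: reject any slot whose padding is not zero."""
    value = int.from_bytes(slot, "big")
    if value > 0xFF:
        raise ValueError("dirty padding in uint8 slot")
    return value


def decode_raw(slot: bytes) -> int:
    """Unmasked read, as a bare calldataload would return it."""
    return int.from_bytes(slot, "big")


canonical = bytes(31) + b"\x01"
dirty = b"\xde\xad\xbe\xef" + bytes(27) + b"\x01"

print(decode_uint8_masking(canonical), decode_uint8_masking(dirty))
print(decode_raw(canonical) == decode_raw(dirty))
```

The masking decoder sees the same value in both slots, the raw read sees two different values, and the validating decoder refuses the dirty slot outright. Any system that mixes two of these behaviours on the same calldata has a gap.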

Dirty Bits and `ecrecover` Signature Verification

A more dangerous scenario arises in contracts that build a signature digest from decoded arguments:

function execute(address target, uint256 value, bytes calldata data, bytes calldata sig)
    external
{
    bytes32 digest = keccak256(abi.encodePacked(target, value, data));
    address signer = recoverSigner(digest, sig);
    require(signer == owner, "bad sig");
    (bool ok,) = target.call{value: value}(data);
    require(ok);
}

Because the digest is computed from the decoded arguments rather than the raw calldata, any non-canonical bytes the decoder tolerates are invisible to the signature check. An attacker can replace canonical calldata with a variant that decodes to the same values but carries different raw bytes (dirty padding where the decoder tolerates it, or extra trailing bytes), potentially bypassing replay protection schemes that rely on keccak256(msg.data).

Head-Tail Encoding and Offset Manipulation

The most structurally rich class of calldata attacks exploits the pointer-based nature of dynamic type encoding.

The Offset is Just a Number

Nothing in the EVM enforces that offsets are “sensible.” The decoder reads an offset from the head section, then jumps to argStart + offset to read the dynamic data. An attacker can craft offsets that:

Point into another argument’s data.
Point into the selector bytes (negative relative to argStart).
Point past the end of calldata, triggering an out-of-bounds read.
Create overlapping dynamic data regions so two parameters share the same bytes.

Overlapping Dynamic Arguments

// Function: process(bytes calldata a, bytes calldata b)
// Selector: 0xaabbccdd

// Head:
0000000000000000000000000000000000000000000000000000000000000040  // offset(a) = 64
0000000000000000000000000000000000000000000000000000000000000040  // offset(b) = 64  ← SAME

// Tail:
0000000000000000000000000000000000000000000000000000000000000004  // length = 4
cafebabe00000000000000000000000000000000000000000000000000000000  // data

Both a and b point to the same tail region. Solidity’s decoder simply follows the offsets and returns two bytes slices that alias the same calldata region. A contract that processes a and b independently (hashing them, copying them, comparing them) now operates on the same bytes from two logically “distinct” inputs. If the contract grants different permissions based on which parameter is which, this aliasing breaks that invariant.
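
The aliasing can be reproduced with a toy decoder. The Python sketch below follows head offsets the way the text describes; read_bytes_arg is a hypothetical helper that also returns the start position of the data so the overlap is visible.

```python
def word(value: int) -> bytes:
    """One 32-byte big-endian slot."""
    return value.to_bytes(32, "big")


def read_bytes_arg(args: bytes, head_index: int) -> tuple:
    """Follow the offset in head slot `head_index`; return (start, data)."""
    head_pos = 32 * head_index
    offset = int.from_bytes(args[head_pos:head_pos + 32], "big")
    length = int.from_bytes(args[offset:offset + 32], "big")
    start = offset + 32
    if start + length > len(args):
        raise ValueError("tail runs past the end of the arguments")
    return start, args[start:start + length]


# Arguments of process(bytes a, bytes b) with both head slots set to 0x40
args = (
    word(0x40)
    + word(0x40)
    + word(4)
    + bytes.fromhex("cafebabe").ljust(32, b"\x00")
)

a_start, a = read_bytes_arg(args, 0)
b_start, b = read_bytes_arg(args, 1)
print(a.hex(), b.hex(), a_start == b_start)
```

A decoder that needs the two inputs to be distinct regions has to compare the start positions itself; equal contents alone do not reveal whether the caller sent one tail or two.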

Offset Pointing Into the Selector

The ABI offset is relative to the start of the encoded arguments, which begins at byte 4 of calldata (after the selector). A crafted offset of 0xFFFFFFFF...FFFC (-4 in two’s complement, though the spec uses unsigned arithmetic) makes argStart + offset wrap around to calldata position 0, so the decoder reads before the start of the argument region, beginning at the selector. In practice this causes a revert in Solidity’s generated code, but custom assembly decoders that do not check bounds are susceptible.

Tail Data After End of Calldata

// Vulnerable assembly decoder (simplified)
function parse(bytes calldata) external {
    assembly {
        let offset := calldataload(4)     // read head
        let len    := calldataload(add(4, offset)) // read length
        let ptr    := add(add(4, offset), 32)
        // copy `len` bytes starting at `ptr`
        calldatacopy(0, ptr, len)
        // NO BOUNDS CHECK — len can extend beyond calldatasize()
    }
}

If len extends beyond calldatasize(), calldatacopy pads the out-of-bounds portion with zeroes. The decoder receives a byte slice that is longer than the actual calldata and silently zero-filled. A contract that parses a length-prefixed structure (e.g., a packed command format) from that slice will misparse the trailing zeroes as valid data.
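
The zero-fill behaviour is easy to model. In this Python sketch, calldataload and calldatacopy are simplified stand-ins for the two opcodes, and the selector is a placeholder value.

```python
def calldataload(calldata: bytes, pos: int) -> int:
    """Model of CALLDATALOAD: 32 bytes from pos, zero-filled past the end."""
    return int.from_bytes(calldata[pos:pos + 32].ljust(32, b"\x00"), "big")


def calldatacopy(calldata: bytes, pos: int, size: int) -> bytes:
    """Model of CALLDATACOPY: size bytes from pos, zero-filled past the end."""
    return calldata[pos:pos + size].ljust(size, b"\x00")


selector = bytes.fromhex("aabbccdd")        # placeholder selector
offset_word = (0x20).to_bytes(32, "big")
claimed_len = (64).to_bytes(32, "big")      # header claims 64 bytes of data
actual_data = b"\x01\x02\x03"               # only 3 bytes were actually sent
calldata = selector + offset_word + claimed_len + actual_data

# Same steps as the vulnerable parser: no comparison against len(calldata)
off = calldataload(calldata, 4)
length = calldataload(calldata, 4 + off)
ptr = 4 + off + 32
copied = calldatacopy(calldata, ptr, length)

print(len(calldata), ptr + length, copied.hex())
```

The parser ends up holding 64 bytes of which 61 never existed in the transaction. Nothing in the copy itself signals that the read ran past the end.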

Signature Malleability via Calldata Padding

A common pattern in meta-transaction systems, multisigs, and permit-style contracts is to compute a digest from function arguments and verify an off-chain signature. The vulnerability arises when the digest is computed from decoded arguments rather than from the raw calldata.

The Gap Between `abi.encode` and `msg.data`

abi.encode(a, b, c) always produces canonical encoding. msg.data is whatever the caller sent — it may be non-canonical. If a contract signs over keccak256(abi.encode(a, b, c)), the signature covers the canonical form. But the raw calldata msg.data may differ from the canonical form while still decoding to the same (a, b, c).

// Replay protection using nonces stored by calldata hash — VULNERABLE
mapping(bytes32 => bool) public used;

function relay(address to, uint256 amount, bytes calldata sig) external {
    bytes32 id = keccak256(msg.data); // hashes RAW calldata
    require(!used[id], "replayed");
    used[id] = true;

    bytes32 digest = keccak256(abi.encodePacked(to, amount));
    require(recover(digest, sig) == owner, "bad sig");
    token.transfer(to, amount);
}

An attacker observes a valid transaction, appends extra bytes to the end of the calldata (or, where the decoder tolerates it, sets dirty bits in the 12 padding bytes of to; amount is a full-width uint256 and has no padding to dirty), and submits a new transaction. The raw msg.data differs, so keccak256(msg.data) produces a new hash and used[id] is false. The ABI decoder ignores the extra bytes and executes the same transfer again.

Fix: hash decoded arguments, not raw calldata, or use a nonce per sender rather than a hash of calldata.

// Correct: hash canonical encoding of decoded arguments
// (nonce here is an additional parameter that the signature must also cover)
bytes32 id = keccak256(abi.encode(to, amount, nonce));

Function Selector Collision via ABI Encoding Edge Cases

The 4-byte selector space is small. When two function signatures hash to the same first 4 bytes, the result is a selector collision. While accidental collisions are rare, they can be deliberately engineered in upgradeable proxy patterns or when a malicious contract is substituted.

Selector Collision Example

// bytes4(keccak256("transfer(address,uint256)")) → 0xa9059cbb
// bytes4(keccak256("many_msg_babbage(bytes1)"))  → 0xa9059cbb  (contrived, found by brute force)

interface IERC20 {
    function transfer(address to, uint256 amount) external returns (bool);
}

contract MaliciousToken {
    // Implements selector 0xa9059cbb with a different ABI
    fallback() external {
        // Decodes calldata as (bytes1) instead of (address, uint256)
        // Executes attacker-controlled logic
    }
}

ABI Encoding Edge Cases That Generate Collisions

Beyond hash collisions, the encoding of tuples, nested arrays, and bytes arguments can produce unexpected selector candidates:

(uint256[]) and (uint256[1]) have different signatures and different selectors.
(bytes) and (bytes32) encode differently but a naive parser treating both as 32-byte reads will behave identically for short inputs.
Whitespace in a signature string changes the hash; canonical signatures contain no spaces. A contract that computes a selector from a user-supplied string can be tricked into routing to an unintended function.

// Dangerous: computing selector from external input
function dispatch(string calldata sig, bytes calldata args) external {
    bytes4 sel = bytes4(keccak256(bytes(sig)));
    (bool ok,) = implementation.call(abi.encodePacked(sel, args));
    require(ok);
}

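An attacker does not even need a collision here: supplying the signature string of a privileged function in implementation (or any string whose hash shares its first 4 bytes) together with args that decode to the required parameters is enough to reach that function.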

Custom Assembly Calldata Parsing Bugs

Solidity’s generated decoder is conservative and safe for well-formed calldata. Assembly parsers written for gas efficiency frequently omit the checks that make the generated decoder safe.

Missing Length Validation

function fastParse(bytes calldata payload) external {
    assembly {
        // payload.offset and payload.length are available in calldata context
        // but developer uses raw calldataload without checking length
        let cmd    := shr(248, calldataload(payload.offset))        // 1-byte command
        let target := calldataload(add(payload.offset, 1))          // next 32 bytes
        let amount := calldataload(add(payload.offset, 33))         // next 32 bytes
        // Total expected: 65 bytes. If payload.length < 65, reads bleed
        // into adjacent calldata or return zeroes — no revert
    }
}

If payload.length is less than 65, the calldataload instructions silently return zero-padded values. An attacker submitting a short payload causes target and/or amount to be zero, which may be a valid (and harmful) execution path — e.g., transferring zero tokens to address(0) while still advancing state.

Incorrect Offset Arithmetic After the Selector

A very common bug: forgetting that calldataload(0) reads the selector in its high bytes.

assembly {
    // WRONG: reads the selector plus the first 28 bytes of the first argument
    let contaminated := calldataload(0)

    // CORRECT: skip the 4-byte selector
    let firstArg := calldataload(4)
}

When the first argument is an address (20 bytes, right-aligned in its slot), calldataload(0) returns a 32-byte value whose high 4 bytes are the selector. A comparison like eq(contaminated, someExpectedAddress) will always fail because the high bytes are contaminated.
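
The contamination is visible if the two reads are modelled side by side. This Python sketch uses a simplified calldataload stand-in and an arbitrary example address.

```python
def calldataload(calldata: bytes, pos: int) -> int:
    """Model of CALLDATALOAD: 32 bytes from pos, zero-filled past the end."""
    return int.from_bytes(calldata[pos:pos + 32].ljust(32, b"\x00"), "big")


selector = bytes.fromhex("a9059cbb")        # transfer(address,uint256)
expected_address = int("deadbeef" * 5, 16)  # arbitrary example address
calldata = (
    selector
    + expected_address.to_bytes(32, "big")
    + (100).to_bytes(32, "big")
)

at_zero = calldataload(calldata, 0)   # selector + first 28 bytes of the slot
at_four = calldataload(calldata, 4)   # the address slot itself

print(hex(at_zero >> 224))            # top 4 bytes of the read at position 0
print(at_zero == expected_address, at_four == expected_address)
```

The read at position 0 carries the selector in its top four bytes and is shifted by four bytes relative to the argument, so it can never equal a 20-byte address.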

Unchecked Dynamic Array Lengths

function batchTransfer(address[] calldata recipients, uint256[] calldata amounts)
    external
{
    assembly {
        let recOffset := calldataload(4)   // offset to recipients array
        let recLen    := calldataload(add(4, recOffset))  // element count
        // Loop `recLen` times — recLen is attacker-controlled
        // No check that recLen == amounts.length
        // No check that recLen * 32 doesn't overflow
    }
}

The attacker sets recLen to a huge value. The loop iterates far beyond the actual calldata, reading zeroes and potentially calling address(0) with amount = 0 thousands of times, exhausting gas or causing unexpected state changes if those zero-address calls have side effects.

Passing Calldata Between Contracts

When a contract forwards calldata to another contract — as proxies, routers, relayers, and multicall contracts do — the raw bytes travel without re-encoding. This creates a class of vulnerabilities rooted in the semantic gap between the forwarding contract’s interpretation and the destination contract’s interpretation.

Proxy Forwarding Without Validation

// Minimal delegatecall proxy fallback (classic pattern)
fallback() external payable {
    address impl = implementation;
    assembly {
        calldatacopy(0, 0, calldatasize())
        let result := delegatecall(gas(), impl, 0, calldatasize(), 0, 0)
        returndatacopy(0, 0, returndatasize())
        switch result
        case 0 { revert(0, returndatasize()) }
        default { return(0, returndatasize()) }
    }
}

The proxy forwards the full raw calldata, including any dirty bits, non-canonical offsets, or appended junk. If the proxy performs an access control check based only on the selector (calldataload(0) >> 224) before forwarding, that check is safe, because both sides read the same four bytes. But if the proxy checks arguments (for example, gating on the value of a parameter) using a different decoder than the implementation, the two decoders may disagree on what the arguments are.

The Multicall Re-Entrancy via Calldata Aliasing

Multicall contracts decode an array of (target, calldata) pairs and execute them in sequence. A crafted multicall payload can alias two calls’ calldata to the same bytes, or overlap a call’s data with the length field of another call, causing the second call to execute with garbage calldata.

// Simplified multicall
function multicall(bytes[] calldata calls) external {
    for (uint256 i = 0; i < calls.length; i++) {
        (bool ok,) = address(this).call(calls[i]);
        require(ok);
    }
}

If calls is ABI-crafted with overlapping offsets, calls[0] and calls[1] may refer to the same underlying bytes. A function that has different behavior on first vs. second call (e.g., nonce-based) is called twice with identical calldata — effectively replaying the first call using the second slot.

`msg.data` vs. Re-Encoded Arguments in Cross-Contract Calls

// Contract A: relays a signed operation to Contract B
function relay(address to, uint256 v, bytes calldata sig) external {
    // A verifies sig over abi.encode(to, v) — canonical
    require(verify(to, v, sig), "bad sig");
    // A calls B through a high-level call, which the compiler encodes canonically
    b.execute(to, v);
}

// Contract B: stores a hash to prevent replays
function execute(address to, uint256 v) external {
    bytes32 id = keccak256(msg.data); // B hashes its own msg.data
    require(!seen[id]);
    seen[id] = true;
    // ...
}

When A calls b.execute(to, v), the compiler generates the ABI encoding for the call, which is always canonical. Contract B’s msg.data is therefore canonical in this path, so keccak256(msg.data) is a stable identifier. This is safe. The danger arises if B is called directly by an attacker with non-canonical calldata: the id differs, replay protection fails, and B executes a replayed operation.

`abi.decode` on Untrusted Calldata Slices

Passing a byte slice to abi.decode is dangerous when that slice was assembled from external calldata without length validation:

function processCommand(bytes calldata cmd) external {
    (uint8 cmdType, address target, uint256 amount) =
        abi.decode(cmd, (uint8, address, uint256));
    // abi.decode will revert if cmd.length < 96
    // BUT if cmd was built by an assembly routine that copies from calldata,
    // a short input may already have been zero-padded to 96 bytes,
    // silently passing validation with zero-filled fields
}

End-to-End Attack: Crafting Malicious Calldata

To make the above concrete, here is a step-by-step crafting of malicious calldata targeting a simple transfer(address,uint256) replay-protection scheme.

Target contract:

function transfer(address to, uint256 amount) external {
    bytes32 id = keccak256(msg.data);
    require(!processed[id], "replay");
    processed[id] = true;
    balances[to] += amount;
}

Canonical calldata for transfer(0xAlice, 100):

a9059cbb                                                         // selector
000000000000000000000000<alice_addr_20bytes>                     // address
0000000000000000000000000000000000000000000000000000000000000064 // 100

Malicious calldata (dirty upper byte on amount):

a9059cbb
000000000000000000000000<alice_addr_20bytes>
0100000000000000000000000000000000000000000000000000000000000064

The high byte 01 in the amount slot is not padding. uint256 uses all 32 bytes, so a non-zero high byte changes the value from 100 to an enormous number. The calldata hashes differently, but it is also a different call, not a replay of the original.

The right target for dirty-bit replay is a narrower type:

Malicious calldata (dirty upper bytes on uint8 flag):

function execute(address to, uint8 flag) external {
    bytes32 id = keccak256(msg.data);
    ...
}

// Canonical: flag = 1
00000000000000000000000000000000000000000000000000000000000000 01
// Dirty:
00000000000000000000000000000000000000000000000000000000000001 01

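A masking decoder (ABI coder v1, or an assembly parser that applies and(v, 0xff)) reduces the slot to its lowest byte: flag == 1 in both cases. But keccak256(msg.data) differs, so the replay check passes for the dirty version even after the canonical version was already processed. Against a target compiled with ABI coder v2, which reverts on the dirty slot, the attacker would instead append trailing bytes to the calldata, which the decoder ignores.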
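
The whole attack fits in a short simulation. The Python sketch below models execute(address to, uint8 flag) with a masking decoder and compares two replay guards. Several things here are assumptions for illustration: SHA3-256 from the standard library stands in for Keccak-256 (the digests are not EVM-compatible), the selector is a placeholder, and all function names are hypothetical.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 replaces Keccak-256, which the standard library
    # lacks. Only the "different bytes, different id" property matters here.
    return hashlib.sha3_256(data).digest()


def decode_masking(calldata: bytes) -> tuple:
    """Masking decoder for execute(address to, uint8 flag)."""
    to = int.from_bytes(calldata[4:36], "big") & ((1 << 160) - 1)
    flag = int.from_bytes(calldata[36:68], "big") & 0xFF
    return to, flag


def accept_by_raw_hash(calldata: bytes, processed: set) -> bool:
    """Replay guard keyed on the raw calldata (the vulnerable pattern)."""
    ident = stand_in_hash(calldata)
    if ident in processed:
        return False
    processed.add(ident)
    return True


def accept_by_decoded_hash(calldata: bytes, processed: set) -> bool:
    """Replay guard keyed on a canonical re-encoding of the decoded values."""
    to, flag = decode_masking(calldata)
    ident = stand_in_hash(to.to_bytes(32, "big") + flag.to_bytes(32, "big"))
    if ident in processed:
        return False
    processed.add(ident)
    return True


SELECTOR = bytes.fromhex("00000000")            # placeholder selector
TO_SLOT = bytes(12) + bytes.fromhex("aa" * 20)  # arbitrary example address
canonical = SELECTOR + TO_SLOT + bytes(31) + b"\x01"
dirty = SELECTOR + TO_SLOT + bytes(30) + b"\x01\x01"

raw_seen, decoded_seen = set(), set()
print(accept_by_raw_hash(canonical, raw_seen),
      accept_by_raw_hash(dirty, raw_seen))
print(accept_by_decoded_hash(canonical, decoded_seen),
      accept_by_decoded_hash(dirty, decoded_seen))
```

The guard keyed on raw bytes accepts both submissions, while the guard keyed on decoded values rejects the second one.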

Mitigations

Normalize Calldata Before Hashing

Never use keccak256(msg.data) as a unique identifier for replay protection. Instead, hash the re-encoded canonical form:

// Vulnerable
bytes32 id = keccak256(msg.data);

// Safe
bytes32 id = keccak256(abi.encode(to, amount, nonce));

Use Nonces, Not Calldata Hashes

Nonces are unambiguous. A monotonically increasing per-sender nonce cannot be replayed regardless of calldata encoding:

mapping(address => uint256) public nonces;

function execute(address to, uint256 amount, uint256 nonce, bytes calldata sig) external {
    require(nonce == nonces[msg.sender]++, "bad nonce");
    bytes32 digest = keccak256(abi.encode(to, amount, nonce, address(this)));
    require(recover(digest, sig) == msg.sender, "bad sig");
    // ...
}

Validate Offsets in Assembly Parsers

Any assembly-level offset read must be bounds-checked before use:

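assembly {
    // Need at least the selector plus one head word
    if lt(calldatasize(), 36) { revert(0, 0) }
    let offset := calldataload(4)
    // The length word at 4 + offset must lie fully inside calldata
    if gt(offset, sub(calldatasize(), 36)) { revert(0, 0) }
    let dataStart := add(4, offset)
    let len := calldataload(dataStart)
    // Compare len with the bytes remaining after the length word;
    // subtracting avoids the wraparound that a sum of attacker values allows
    if gt(len, sub(calldatasize(), add(dataStart, 32))) { revert(0, 0) }
}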
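
One pitfall with checks of this kind is that EVM addition wraps modulo 2^256, so a bound written as a sum of attacker-controlled values can be bypassed with a huge length. The Python sketch below models a sum-based check and a subtraction-based check under 256-bit arithmetic; both function names are hypothetical.

```python
MASK = 2**256 - 1


def evm_add(a: int, b: int) -> int:
    """256-bit wrapping addition, as the EVM ADD opcode performs it."""
    return (a + b) & MASK


def sum_based_check(calldatasize: int, data_start: int, length: int) -> bool:
    """Accept if dataStart + 32 + len <= calldatasize, computed with wrapping."""
    end = evm_add(evm_add(data_start, 32), length)
    return not end > calldatasize


def remaining_based_check(calldatasize: int, data_start: int, length: int) -> bool:
    """Accept if len fits in the bytes that remain after the length word."""
    if data_start > calldatasize or calldatasize - data_start < 32:
        return False
    return length <= calldatasize - data_start - 32


huge = 2**256 - 60      # wraps the sum 36 + 32 + len around to 8
print(sum_based_check(100, 36, huge), remaining_based_check(100, 36, huge))
print(sum_based_check(100, 36, 32), remaining_based_check(100, 36, 32))
```

Both checks agree on honest inputs; only the sum-based one accepts the wrapped length.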

Mask Values Read in Assembly

After reading a slot in assembly, mask it to the expected type width:

assembly {
    let raw  := calldataload(4)
    let addr := and(raw, 0xffffffffffffffffffffffffffffffffffffffff) // 20 bytes
    let flag := and(calldataload(36), 0xff)                         // 1 byte
}

Validate Array Lengths Before Iteration

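assembly {
    // Fragment: assumes `offset` is the array's ABI offset, already bounds-checked
    let arrLen := calldataload(add(4, offset))
    // Sanity cap; adjust to the domain-specific maximum
    if gt(arrLen, 0xffff) { revert(0, 0) }
    // The first element sits right after the length word
    let dataStart := add(add(4, offset), 32)
    // Confirm all elements fit within calldatasize()
    if gt(add(dataStart, mul(arrLen, 32)), calldatasize()) { revert(0, 0) }
}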

Re-Encode When Forwarding Between Contracts

Do not forward raw msg.data when the destination contract relies on calldata structure for security decisions. Re-encode using abi.encodeWithSelector:

// Unsafe forward
(bool ok,) = target.call(msg.data);

// Safe re-encode
(bool ok,) = target.call(
    abi.encodeWithSelector(ITarget.execute.selector, decodedArg1, decodedArg2)
);

Calldata Security Checklist

Use this checklist during security reviews of any contract that processes external calldata, acts as a proxy or relayer, uses assembly for calldata parsing, or implements signature verification.

ABI Encoding and Decoding

No keccak256(msg.data) for replay protection. Use keccak256(abi.encode(...)) of decoded values or use nonces.
Signature schemes hash canonical encodings. Verify that EIP-712 or custom digest schemes cover all arguments in canonical form.
Dynamic offsets are validated. Any assembly code that reads ABI offsets checks that the offset does not exceed calldatasize() - 4.
Dynamic lengths are validated. Any assembly code that reads a length field checks that 4 + offset + 32 + length <= calldatasize(), computed without overflow.
No aliasing of dynamic arguments. When two dynamic parameters could theoretically alias (same offset in crafted calldata), the contract logic is safe regardless of aliasing.

Dirty Bits

Assembly reads are masked. Every calldataload result used as a typed value is masked to the appropriate width before comparison or storage.
bool values are normalized. Assembly reads of boolean slots use iszero(iszero(v)) or and(v, 1), not raw comparison.
address values are masked. and(calldataload(x), 0xffffffffffffffffffffffffffffffffffffffff) before use.

Selectors and Dispatch

No selector computation from user input. The selector is never derived from a string or bytes argument supplied by the caller.
Proxy contracts check that selectors map to authorized functions. Access control is applied before forwarding, not only inside the implementation.
Selector collision analysis has been run. Tools such as solc --hashes or a collision checker have been used to confirm no two functions in the system share a selector.

Cross-Contract Calldata Forwarding

Proxies re-encode sensitive arguments. Where feasible, proxies decode and re-encode rather than blindly forwarding msg.data.
Multicall contracts validate sub-call lengths. Each bytes entry in a multicall payload is bounds-checked before being forwarded.
Delegatecall targets are immutable or access-controlled. An attacker cannot substitute the implementation to exploit selector collisions or encoding differences.
No trust in msg.data length for authentication. Contracts that authenticate based on calldata structure do not assume a specific calldatasize().

Meta-Transactions and Relayers

Nonces are per-sender and monotonic. Replay protection does not rely on uniqueness of calldata bytes.
Signed digest covers address(this) and chainid. Prevents cross-contract and cross-chain replay.
Signature schemes are not malleable. ECDSA signatures are constrained to low-S form (for example, by using OpenZeppelin’s ECDSA.recover, which enforces this).
sig length is validated before passing to ecrecover. A 65-byte check prevents the zero-padding dirty-sig attack.

Assembly Parsers

Every calldataload has a corresponding bounds check. No silent zero-padding from out-of-bounds reads is relied upon.
Loop bounds are capped. Array length reads from calldata are capped to a safe maximum before iteration.
Offset arithmetic is overflow-checked. Lengths are compared against the bytes remaining (calldatasize() minus the start position) rather than against a sum such as add(offset, length), which can wrap around 2^256.
Assembly parsers are fuzz-tested. Differential fuzzing against the equivalent Solidity decoder is used to find divergences.

Closing Remarks

Calldata manipulation is not a single vulnerability but a family of issues rooted in the same structural fact: the EVM accepts any byte string as calldata, and decoders decide what it means. The ABI specification defines a canonical form, but it does not — and cannot — enforce that senders comply.

Every layer of the stack introduces a potential gap: dirty bits ignored by the decoder, offsets that alias distinct arguments, assembly parsers that trust lengths without bounds checks, forwarding paths that carry non-canonical bytes to a downstream decoder operating under different assumptions. The attacks are quiet. They do not panic or throw obvious errors. They produce values that look valid.

The defenses are equally unglamorous: mask your assembly reads, hash canonical re-encodings, use nonces, and treat every byte of externally supplied calldata as adversarially crafted — because it may be.