Understanding Hash Collisions: abi.encodePacked in Solidity

October 1, 2024

Intro

Solidity developers commonly use encoding functions like abi.encode and abi.encodePacked to build inputs for the keccak256 hash function. However, using abi.encodePacked with dynamic data types can introduce vulnerabilities. In this article, we'll explore hash collisions in the context of smart contracts, examine a real-world example of how this vulnerability can be exploited, and discuss best practices and mitigation techniques for more secure code.

What Is a Hash Collision?

A hash collision occurs when a hash function produces the same output for two different inputs, for example hash(A) == hash(B) with A != B. This is undesirable because we rely on a hash to identify its input. Some older hash functions are vulnerable to practical collision attacks. Solidity uses keccak256, which is collision-resistant: collisions exist in principle, but finding two different inputs with the same hash is computationally infeasible. The problem arises when developers assume that different arguments always produce different input bytes. Because of how abi.encodePacked works, different arguments can encode to the same bytes, so keccak256 receives identical input and returns the same hash.
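To make the distinction concrete, here is a minimal sketch of two different argument pairs that pack to the same bytes. The contract name PackedCollisionDemo and its functions are hypothetical and exist only for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; the name and functions are illustrative only.
contract PackedCollisionDemo {
    // Hashes two strings using the packed encoding (no length prefix, no padding).
    function packedHash(string memory a, string memory b) public pure returns (bytes32) {
        return keccak256(abi.encodePacked(a, b));
    }

    // ("a", "bc") and ("ab", "c") both pack to the bytes 0x616263 ("abc"),
    // so keccak256 receives identical input and returns the same hash.
    function collides() public pure returns (bool) {
        return packedHash("a", "bc") == packedHash("ab", "c");
    }
}
```

Here keccak256 behaves exactly as designed. The collision comes entirely from the encoding step that runs before the hash.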

abi.encode and abi.encodePacked

  • abi.encode: This function implements the standard ABI (Application Binary Interface) encoding. Static values are padded to 32 bytes, and dynamic values such as string, bytes, and arrays are stored with an offset and a length prefix, so the boundaries between arguments are unambiguous.
  • abi.encodePacked: This alternative is more compact. Static types use only as many bytes as they need, and dynamic types are written in place without a length prefix. Because of this, the boundaries between arguments become ambiguous when more than one dynamic type is packed together, which can lead to hash collisions.

Differences Between abi.encode and abi.encodePacked: An Example

While both functions encode data, they are used for different purposes:

  1. Padding: abi.encode pads each value to a multiple of 32 bytes and records the length of dynamic values, which removes ambiguity but increases the output size.
  2. Output size: abi.encodePacked produces a smaller output by packing data tightly, but this introduces the possibility of hash collisions when more than one dynamic type is involved.
  3. Use Case: Developers should use abi.encode when data integrity is crucial, particularly with variable-length arguments. abi.encodePacked is better suited to cases where a compact encoding is required and at most one argument is dynamic.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
import "forge-std/Test.sol";

contract EncodingExample {
    // Function to demonstrate the usage of abi.encode
    function encodeData(uint _id, string memory _text, address _addr) public pure returns (bytes memory) {
        return abi.encode(_id, _text, _addr); // we use abi.encode
    }

    // Function to demonstrate the usage of abi.encodePacked
    function encodePackedData(uint _id, string memory _text, address _addr) public pure returns (bytes memory) {
        return abi.encodePacked(_id, _text, _addr); // we use abi.encodePacked
    }
}

contract EncodingExampleTest is Test {
    EncodingExample public encoder;

    function setUp() public {
        encoder = new EncodingExample();
    }

    function testEncodeData() public {
        uint id = 1;
        string memory text = "Hello, world!";
        address addr = 0x1234567890123456789012345678901234567890;
        bytes memory encoded = encoder.encodeData(id, text, addr);
        emit log_bytes(encoded); // raw ABI-encoded bytes, as shown in the output below
        emit log_bytes32(keccak256(encoded)); // hash of the encoded bytes
    }

    function testEncodePackedData() public {
        uint id = 1;
        string memory text = "Hello, world!";
        address addr = 0x1234567890123456789012345678901234567890;
        bytes memory encodedPacked = encoder.encodePackedData(id, text, addr);
        emit log_bytes(encodedPacked); // raw packed bytes, as shown in the output below
        emit log_bytes32(keccak256(encodedPacked)); // hash of the packed bytes
    }
}

abi.encode output

0x000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000600000000000000000000000001234567890123456789012345678901234567890000000000000000000000000000000000000000000000000000000000000000d48656c6c6f2c20776f726c642100000000000000000000000000000000000000

abi.encodePacked output

0x000000000000000000000000000000000000000000000000000000000000000148656c6c6f2c20776f726c64211234567890123456789012345678901234567890

The difference in output length between abi.encode and abi.encodePacked is easy to see. Note that the packed output contains the string bytes with no length prefix and no padding, and the address with no padding. This is where the issue arises: when abi.encodePacked is used with more than one dynamic type, such as strings, bytes, or arrays, nothing in the output marks where one value ends and the next begins. A malicious user could exploit this by crafting different inputs that produce the same encoded bytes, and therefore the same hash, gaining an unfair advantage if the data isn’t otherwise verified.
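The same ambiguity can be sketched with dynamic arrays. In the example below, which uses a hypothetical contract name and placeholder addresses, three addresses are split across two arrays in two different ways. The packed encoding writes the elements back to back without recording either array's length, while abi.encode stores each length.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; names and addresses are illustrative only.
contract PackedArrayDemo {
    // Splits the same three addresses across two arrays in two different ways
    // and reports whether each encoding produces the same hash.
    function compare() public pure returns (bool packedEqual, bool standardEqual) {
        // First split: [0x1, 0x2] and [0x3]
        address[] memory a1 = new address[](2);
        a1[0] = address(0x1);
        a1[1] = address(0x2);
        address[] memory b1 = new address[](1);
        b1[0] = address(0x3);

        // Second split: [0x1] and [0x2, 0x3]
        address[] memory a2 = new address[](1);
        a2[0] = address(0x1);
        address[] memory b2 = new address[](2);
        b2[0] = address(0x2);
        b2[1] = address(0x3);

        // Packed: elements are concatenated and array lengths are not encoded.
        packedEqual = keccak256(abi.encodePacked(a1, b1)) == keccak256(abi.encodePacked(a2, b2));
        // Standard: each array carries its own length, so the encodings differ.
        standardEqual = keccak256(abi.encode(a1, b1)) == keccak256(abi.encode(a2, b2));
    }
}
```

Calling compare() should return (true, false): the packed hashes match, while the abi.encode hashes differ.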

Let’s look at a real-world example below.

Example of a Hash Collision bug in a real-world smart contract

Source: Solodit.xyz — Rengo Labs Uniswap Core-Router Casper

fn permit(&mut self, public_key: String, signature: String,
    owner: Key, spender: Key, value: U256, deadline: u64,) {
    //..
    //..
    let data: String = format!(
        "{}{}{}{}{}{}",
        permit_type_hash, owner, spender, value, nonce, deadline);

    let hash: [u8; 32] = keccak256(data.as_bytes());
    //..
    //..
}

The permit function above builds the data String without any delimiters between parameters, making it vulnerable to a hash collision attack. An attacker can reuse the same signature with different values in order to steal tokens.

The code above is written in Rust, but we’ll replicate it in Solidity. We will write a similar version of the function that you can run in Remix or your local testing environment.

Note that instead of using the uint256 type for our numeric values, we will use the string type to mimic the behavior of the original Rust function. If we used uint256, no collision would occur, because abi.encodePacked always encodes a uint256 as exactly 32 bytes, so the boundaries between values stay fixed.

Since a string in Solidity is a dynamically sized array of bytes, this setup reproduces the behavior of the Rust code.
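The difference between the two types can be sketched as follows. The contract PackedUintDemo and its function names are hypothetical. It packs the same value, nonce, and deadline once as strings and once as uint256 values.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; names are illustrative only.
contract PackedUintDemo {
    // Strings are packed as raw bytes with no length information.
    function packStrings(string memory value, string memory nonce, string memory deadline)
        public pure returns (bytes memory)
    {
        return abi.encodePacked(value, nonce, deadline);
    }

    // Each uint256 always occupies exactly 32 bytes, even in packed mode.
    function packUints(uint256 value, uint256 nonce, uint256 deadline)
        public pure returns (bytes memory)
    {
        return abi.encodePacked(value, nonce, deadline);
    }

    // Compares the two parameter sets under both representations.
    function compare() public pure returns (bool stringsCollide, bool uintsCollide) {
        // Both string sets concatenate to "1000001500".
        stringsCollide = keccak256(packStrings("10000", "0", "1500"))
            == keccak256(packStrings("100000", "1", "500"));
        // The uint256 versions keep fixed 32-byte boundaries and therefore differ.
        uintsCollide = keccak256(packUints(10000, 0, 1500))
            == keccak256(packUints(100000, 1, 500));
    }
}
```

Calling compare() should return (true, false). Only the string version collides, which is why the demonstration that follows uses strings.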

Dummy permit function

Create a new file called HashCollisionExample.sol in your local src folder and paste the code below.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract HashCollisionExample {
    function permit(
        string memory permit_type_hash,
        address _owner,
        address _spender,
        string memory value, //normally uint, but we use string for demonstration purposes
        string memory nonce, //normally uint, but we use string for demonstration purposes
        string memory deadline //normally uint, but we use string for demonstration purposes
    ) public pure returns (bytes memory) {
        bytes memory data = abi.encodePacked(permit_type_hash, _owner, _spender, value, nonce, deadline);
        return data;
    }
}

Test file

Create a new test folder or use an existing one, and create a new test file called HashCollisionTest.t.sol.

Here you must import the previous contract and the Test.sol file from Foundry’s forge-std library.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

import "forge-std/Test.sol";
import "../src/HashCollisionExample.sol";

contract HashCollisionExampleTest is Test {
    HashCollisionExample public hashCollision;

    function setUp() public {
        hashCollision = new HashCollisionExample();
    }

    function testPermit() public {
        string memory permit_type_hash = "someString";
        address _owner = 0x1234567890123456789012345678901234567890;
        address _spender = 0x1234567890123456789012345678901234567891;
        string memory value = "10000";
        string memory nonce = "0";
        string memory deadline = "1500";
        bytes memory encodedPacked = hashCollision.permit(permit_type_hash, _owner, _spender, value, nonce, deadline);
        bytes32 hashPacked = keccak256(encodedPacked);
        emit log_bytes32(hashPacked);
    }

    function testMaliciousPermit() public {
        string memory permit_type_hash = "someString";
        address _owner = 0x1234567890123456789012345678901234567890;
        address _spender = 0x1234567890123456789012345678901234567891;
        string memory value = "100000"; // Adding one extra zero
        string memory nonce = "1";      // Incremented nonce
        string memory deadline = "500"; // Reduced deadline
        bytes memory encodedPacked = hashCollision.permit(permit_type_hash, _owner, _spender, value, nonce, deadline);
        bytes32 hashPacked = keccak256(encodedPacked);
        emit log_bytes32(hashPacked);
    }
}

Test output

HashCollisionExampleTest::testPermit()
    ├─ emit log_bytes32(val: 0x94183f5d7bc0dad4504585f2159c4dccbeb3f1e39c6375170bb59185be351de8)

HashCollisionExampleTest::testMaliciousPermit()
    ├─ emit log_bytes32(val: 0x94183f5d7bc0dad4504585f2159c4dccbeb3f1e39c6375170bb59185be351de8)

Above you can see that the hash output is the same, although the input to the permit function was different. This is how the hash collision vulnerability manifests.

In the example above, a malicious user could forge input parameters that allow them to withdraw a larger amount of tokens and bypass the security checks. Luckily, this bug was uncovered during an audit and mitigated before the code was deployed to production.

Remember that this vulnerability can arise only when abi.encodePacked is given two or more dynamic-type arguments. Dynamic arguments placed next to each other, as in the example above, are the easiest case to exploit.

Mitigation Strategies

  1. Prefer abi.encode: Use abi.encode instead of abi.encodePacked when encoding dynamic types to avoid this vulnerability.
  2. Use Unique Separators: When using abi.encodePacked, add separators between arguments to distinguish them, making sure the separator cannot appear inside the data itself. Alternatively, hash each dynamic input before packing them together.
  3. Thorough Testing: Perform rigorous testing to identify potential hash collisions, particularly when handling dynamic type arguments.
  4. Security Audits: Regularly audit smart contracts to detect vulnerabilities, especially in data handling and encoding logic.
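The first two mitigations can be sketched as follows. The contract SafeEncodingSketch and its function names are hypothetical, and the parameter list is reduced to the three string fields from the earlier demonstration. The separator approach is left out because it is only safe when the separator can never occur in the data.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch of two mitigations; not taken from the audited code.
contract SafeEncodingSketch {
    // Mitigation 1: abi.encode stores an offset and a length for every string.
    function digestEncode(string memory value, string memory nonce, string memory deadline)
        public pure returns (bytes32)
    {
        return keccak256(abi.encode(value, nonce, deadline));
    }

    // Mitigation 2: hash each dynamic value first, then pack the fixed-size bytes32 results.
    function digestHashedFields(string memory value, string memory nonce, string memory deadline)
        public pure returns (bytes32)
    {
        return keccak256(
            abi.encodePacked(
                keccak256(bytes(value)),
                keccak256(bytes(nonce)),
                keccak256(bytes(deadline))
            )
        );
    }

    // Re-runs the colliding inputs from the demonstration against both digests.
    function stillCollides() public pure returns (bool viaEncode, bool viaHashedFields) {
        viaEncode = digestEncode("10000", "0", "1500") == digestEncode("100000", "1", "500");
        viaHashedFields = digestHashedFields("10000", "0", "1500")
            == digestHashedFields("100000", "1", "500");
    }
}
```

Calling stillCollides() should return (false, false), since both digests now distinguish the two parameter sets. Hashing each field costs extra keccak256 calls, so abi.encode is usually the simpler choice unless a packed layout is required.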

Conclusion

In this article, we covered what the hash collision vulnerability is and how it can manifest, using a real example. You should now be aware of how encoding functions like abi.encode and abi.encodePacked can impact data integrity. By implementing best practices and reviewing the code, developers can safeguard their contracts against this vulnerability.

If you are looking for a professional security review, get in touch with the Nethermind Security team of experts.

Disclaimer: This article has been prepared for the general information and understanding of the readers. No representation or warranty, express or implied, is given by Nethermind as to the accuracy or completeness of the information or opinions contained in the above article. No third party should rely on this article in any way, including without limitation as financial, investment, tax, regulatory, legal, or other advice, or interpret this article as any form of recommendation.