Ethereum Contract Application Binary Interface (ABI) in Elixir
Background
Ethereum is one of the most successful blockchain platforms. Two things above all made this possible: the Ethereum Virtual Machine (EVM) and the smart contracts that are executed in it.
Smart contracts are small programs written in high-level languages created specifically for Ethereum, such as Solidity or LLL. A contract’s code is first compiled to EVM bytecode, and only then can it be executed in the EVM.
The Application Binary Interface (ABI) is the standard way to interact with contracts in the Ethereum ecosystem, both from outside the blockchain and for contract-to-contract interaction. An account that wants to use a smart contract’s function uses the ABI to hash the function signature and encode the arguments, which together form the call data required to call the function.
In this post, I’ll describe how a function and its parameters are encoded to ABI format and how ABI encoding/decoding can be implemented in Elixir.
Specification
Let’s start with excerpts of the formal definitions from the official ABI specification.
Function selector encoding
The first four bytes of the call data for a function call specify the function to be called. They are the first (left, high-order in big-endian) four bytes of the Keccak-256 (SHA-3) hash of the signature of the function.
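The selector rule can be sketched in a few lines. This is a minimal illustration, not the ex_abi code: the module name `SelectorSketch` and both function names are hypothetical. Keccak-256 is not part of the Elixir standard library, and the `:sha3_256` algorithm in Erlang's `:crypto` is the finalized NIST SHA-3, which produces different digests, so the hash function is passed in as an argument.

```elixir
defmodule SelectorSketch do
  # Builds the canonical signature string, e.g. "baz(uint32,bool)":
  # the function name followed by the comma-separated type names, no spaces.
  def signature(name, type_names) do
    name <> "(" <> Enum.join(type_names, ",") <> ")"
  end

  # Returns the first four bytes of the hash of the signature.
  # `keccak256` is an injected 1-arity function (an assumption of this sketch).
  def selector(signature, keccak256) when is_function(keccak256, 1) do
    <<method_id::binary-size(4), _rest::binary>> = keccak256.(signature)
    method_id
  end
end
```

With a real Keccak-256 function, such as the `ExthCrypto.Hash.Keccak.kec/1` used later in this post, the signature `baz(uint32,bool)` should yield the selector `0xcdcd77c0` given as an example in the ABI specification.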
Argument encoding
Starting from the fifth byte, the encoded arguments follow.
Types can be static or dynamic. Dynamic types are:
- bytes
- string
- T[] for any T
- T[k] for any dynamic T and any k >= 0
- (T1,...,Tk) if Ti is dynamic for some 1 <= i <= k
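The rules above translate almost directly into function clauses. The sketch below assumes a hypothetical type representation (`{:array, t}` for T[], `{:array, t, k}` for T[k], `{:tuple, types}` for tuples); the module and function names are illustrative and are not taken from ex_abi.

```elixir
defmodule DynamicSketch do
  # Hypothetical type representation used only in this sketch:
  #   :bytes, :string, :bool, :address, {:uint, m}  - elementary types
  #   {:array, t}                                   - T[]
  #   {:array, t, k}                                - T[k]
  #   {:tuple, types}                               - (T1,...,Tk)
  def dynamic?(:bytes), do: true
  def dynamic?(:string), do: true
  # T[] is dynamic for any T.
  def dynamic?({:array, _type}), do: true
  # T[k] is dynamic only if T is dynamic.
  def dynamic?({:array, type, _k}), do: dynamic?(type)
  # A tuple is dynamic if at least one of its members is dynamic.
  def dynamic?({:tuple, types}), do: Enum.any?(types, &dynamic?/1)
  # Everything else is treated as static.
  def dynamic?(_static), do: false
end
```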
Examples of static types:
- uint<M>: unsigned integer type of M bits, 0 < M <= 256, M % 8 == 0, e.g. uint32, uint8, uint256.
- bool: equivalent to uint8 restricted to the values 0 and 1. For computing the function selector, bool is used.
- address: equivalent to uint160, except for the assumed interpretation and language typing. For computing the function selector, address is used.
- function: an address (20 bytes) followed by a function selector (4 bytes). Encoded identical to bytes24.
Definition: For any ABI value X, we recursively define enc(X), depending on the type of X being (T1,...,Tk) for k >= 0 and any types T1, …, Tk:
enc(X) = head(X(1)) ... head(X(k)) tail(X(1)) ... tail(X(k))
where X = (X(1), ..., X(k)) and head and tail are defined for Ti being a static type as
head(X(i)) = enc(X(i))
tail(X(i)) = "" (the empty string)
and as
head(X(i)) = enc(len(head(X(1)) ... head(X(k)) tail(X(1)) ... tail(X(i-1))))
tail(X(i)) = enc(X(i))
otherwise, i.e. if Ti is a dynamic type.
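To make the head/tail definition concrete, here is a minimal sketch that encodes a tuple of `{type, value}` pairs. It is an illustration under stated assumptions, not the ex_abi implementation: the module name is hypothetical, only `:uint256`, `:bool` and `:bytes` are supported, and integers are not checked for overflow.

```elixir
defmodule HeadTailSketch do
  # Encodes a list of {type, value} pairs as a tuple: all heads, then all tails.
  def encode_tuple(pairs) do
    # Every head in this sketch is one 32-byte word, so the tail starts at 32 * k.
    tail_start = 32 * length(pairs)

    {head, tail} =
      Enum.reduce(pairs, {<<>>, <<>>}, fn {type, value}, {head, tail} ->
        encoded = enc(type, value)

        if dynamic?(type) do
          # Dynamic: the head holds the offset at which this value's data begins.
          offset = tail_start + byte_size(tail)
          {head <> <<offset::256>>, tail <> encoded}
        else
          # Static: the value is encoded in place in the head, the tail is empty.
          {head <> encoded, tail}
        end
      end)

    head <> tail
  end

  defp dynamic?(:bytes), do: true
  defp dynamic?(_static), do: false

  # No overflow check: values of 2^256 or more would be silently truncated.
  defp enc(:uint256, n) when is_integer(n) and n >= 0, do: <<n::256>>
  defp enc(:bool, true), do: <<1::256>>
  defp enc(:bool, false), do: <<0::256>>

  defp enc(:bytes, bin) when is_binary(bin) do
    padding = rem(32 - rem(byte_size(bin), 32), 32)
    <<byte_size(bin)::256>> <> bin <> :binary.copy(<<0>>, padding)
  end
end
```

For `[{:uint256, 1}, {:bytes, "dave"}, {:bool, true}]` the head consists of three words (the value 1, the offset 96, the value 1) and the tail holds the length 4 followed by "dave" padded to 32 bytes.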
Let’s give examples of a couple of type encodings:
- uint<M>: enc(X) is the big-endian encoding of X, padded on the higher-order (left) side with zero-bytes such that the length is 32 bytes.
- bool: as in the uint8 case, where 1 is used for true and 0 for false.
- address: as in the uint160 case.
- bytes, of length k (which is assumed to be of type uint256): enc(X) = enc(k) pad_right(X), i.e. the number of bytes is encoded as a uint256 followed by the actual value of X as a byte sequence, followed by the minimum number of zero-bytes such that len(enc(X)) is a multiple of 32.
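These four rules can be sketched with plain binaries. The module below is a hypothetical illustration rather than the ex_abi code; in particular, representing an address as a 20-byte binary and a uint type as `{:uint, m}` are assumptions made for this example.

```elixir
defmodule ElementarySketch do
  # uint<M>: big-endian encoding of n, left-padded with zero bytes to 32 bytes.
  def enc({:uint, m}, n)
      when is_integer(n) and n >= 0 and m > 0 and m <= 256 and rem(m, 8) == 0 do
    if n >= Integer.pow(2, m) do
      raise ArgumentError, "#{n} does not fit in #{m} bits"
    end

    n |> :binary.encode_unsigned() |> pad_left()
  end

  # bool: as uint8, with 1 for true and 0 for false.
  def enc(:bool, true), do: enc({:uint, 8}, 1)
  def enc(:bool, false), do: enc({:uint, 8}, 0)

  # address: as uint160; given here as a 20-byte binary (an assumption).
  def enc(:address, <<_::binary-size(20)>> = address), do: pad_left(address)

  # bytes: the length as uint256, then the data right-padded to a multiple of 32.
  def enc(:bytes, bin) when is_binary(bin) do
    enc({:uint, 256}, byte_size(bin)) <> pad_right(bin)
  end

  # Pads a binary of at most 32 bytes on the left to exactly 32 bytes.
  defp pad_left(bin), do: :binary.copy(<<0>>, 32 - byte_size(bin)) <> bin

  # Pads on the right with the minimum number of zero bytes.
  defp pad_right(bin) do
    padding = rem(32 - rem(byte_size(bin), 32), 32)
    bin <> :binary.copy(<<0>>, padding)
  end
end
```

Note that every static value occupies a full 32-byte word regardless of M, which is why the head of a tuple of such values has a predictable size.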
Function Selector and Argument Encoding
A call to the function f with parameters a_1, …, a_n is encoded as
function_selector(f) enc((a_1, ..., a_n))
and the return values v_1, …, v_k of f are encoded as
enc((v_1, ..., v_k))
i.e. the values are combined into a tuple and encoded.
Elixir implementation
Let’s see how the ABI encoding can be implemented in Elixir. Decoding is done in reverse order.
Encoding
As described in the specification, the call data is the function selector encoding followed by the encoding of the parameters.
def encode(data, %ABI.FunctionSelector{types: types} = function_selector) do
  # Calculate the parameter encoding based on the function selector's types
  {result, []} = encode_type({:tuple, types}, [List.to_tuple(data)])
  # Concatenate the function selector encoding and the parameter encoding
  encode_method_id(function_selector) <> result
end
The function selector encoding is the first four bytes of the Keccak-256 hash of the function signature.
defp encode_method_id(function_selector) do
  # Encode the selector, e.g. "baz(uint32,bool)", and take its Keccak hash
  kec =
    function_selector
    |> ABI.FunctionSelector.encode()
    |> ExthCrypto.Hash.Keccak.kec()
  # Take the first four bytes
  <<init::binary-size(4), _rest::binary>> = kec
  # That's our method id
  init
end
Here comes the hard part. We have to traverse all parameters and encode each one based on its type:
- If the type is dynamic, we append the offset at which the element’s data starts to the head and append the element’s encoding to the tail.
- If the type is static, we append its encoding to the head.
defp encode_type({:tuple, types}, [data | rest]) do
  # All head items are 32 bytes in length and there will be exactly
  # `count(types)` of them, so the tail starts at `32 * count(types)`.
  tail_start = Enum.count(types) * 32
  initial = {<<>>, <<>>, Tuple.to_list(data), tail_start}
  {head, tail, [], _} =
    Enum.reduce(types, initial, fn type, {head, tail, data, tail_position} ->
      {el, rest} = encode_type(type, data)
      if ABI.FunctionSelector.is_dynamic?(type) do
        # Dynamic type: encode the element's offset in the head and the element itself in the tail
        {head <> encode_uint(tail_position, 256), tail <> el, rest, tail_position + byte_size(el)}
      else
        # Static type: simply encode the element in the head
        {head <> el, tail, rest, tail_position}
      end
    end)
  {head <> tail, rest}
end
We can use neat Elixir pattern matching to determine if a type is dynamic or not.
defmodule ABI.FunctionSelector do
  ...
  def is_dynamic?(:bytes), do: true
  def is_dynamic?(:string), do: true
  ...
  def is_dynamic?(_), do: false
  ...
end
Here are a couple of examples of data type encodings.
...
defp encode_type(:bool, [data | rest]) do
  value =
    case data do
      true -> encode_uint(1, 8)
      false -> encode_uint(0, 8)
      # inspect/1 keeps the error message safe for values that cannot be interpolated
      _ -> raise "Invalid data for bool: #{inspect(data)}"
    end
  {value, rest}
end
defp encode_uint(data, size_in_bits) when rem(size_in_bits, 8) == 0 do
  size_in_bytes = div(size_in_bits, 8)
  bin = maybe_encode_unsigned(data)
  if byte_size(bin) > size_in_bytes,
    do:
      raise(
        "Data overflow encoding uint, data `#{data}` cannot fit in #{size_in_bytes * 8} bits"
      )
  pad(bin, size_in_bytes, :left)
end
...
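The post notes that decoding is done in reverse order. As a rough sketch of that idea (not the ex_abi decoder; the module name and the set of supported types `:uint256`, `:bool` and `:bytes` are assumptions for illustration), each head word is read at its fixed position, and for a dynamic type the word is followed as an offset into the tail.

```elixir
defmodule DecodeSketch do
  # Decodes `data`, the encoding of a tuple, into a list of values.
  def decode_tuple(types, data) when is_binary(data) do
    types
    |> Enum.with_index()
    |> Enum.map(fn {type, i} ->
      # The i-th head word sits at byte offset 32 * i.
      decode(type, word_at(data, 32 * i), data)
    end)
  end

  # Static types: the head word is the value itself.
  defp decode(:uint256, word, _data), do: word
  defp decode(:bool, 1, _data), do: true
  defp decode(:bool, 0, _data), do: false

  # Dynamic type: the head word is the offset of <<length::256, bytes, padding>>.
  defp decode(:bytes, offset, data) do
    len = word_at(data, offset)
    binary_part(data, offset + 32, len)
  end

  # Reads one 32-byte big-endian word starting at `offset`.
  defp word_at(data, offset) do
    <<word::256>> = binary_part(data, offset, 32)
    word
  end
end
```

Malformed input (for example an offset that points past the end of the data) simply raises here; a production decoder would need to validate offsets and lengths.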
ex_abi
The complete Elixir implementation is available in the ex_abi GitHub repo. I hope this short overview of the ABI gave you a basic understanding of the encoding.
Acknowledgement
- Geoff Hayes - original ex_abi contributor
- ex_abi contributors
Update 2020
I completely rewrote the encoder and made significant changes to the decoder, so this post is not up to date with the latest ex_abi version.