Select * From Web3

It's hard to believe that 2021 is almost coming to an end, and my last update of the site was back in February.

As you probably can tell from the title, I started seriously going down the rabbit hole of crypto/web3 since the middle of this year (during Sydney's 2nd lockdown). According to ethereum.org, the term Web3 refers to decentralised apps that run on the blockchain that allow anyone to participate without monetising their personal data. It's also been one of the 2021 buzzwords as a rebrand of crypto.

I've been enjoying it so far, and the highlight of this year is being part of the OurNetwork Learn x Dune Analytics Web3’s Learn-to-Earn Data Cohort Program. It is Web3's first learn-to-earn data program where a cohort of 30 students spent 30 days to learn from some of the best data professionals in the Web3 space. Students also got to earn ETH bounties from protocol each week, which was designed to motivate course completion and commitment. Throughout the 30-day program, I not only had first-hand experience doing on-chain analytics, but also made connections with so many talented people both in and outside crypto.

This post summarises my learnings from the program, with a specific group of audience in mind, i.e. data analysts and data scientists. If you're completely new to SQL, I highly recommend taking a look at one of our teachers Andrew‘s guide on basic SQL while learning Ethereum at the same time. You can also find all the lectures recordings and materials here.

This guide will cover the following topics:

  1. Web3 Data Flow and Landscape
  2. Decoding Transaction Data
  3. Analysis Logic Flow
  4. DEX (Decentralised Exchange) Data
  5. Useful SQL Guide for Dune Analytics

Web3 Data Flow and Landscape#

As a data person, the most challenging part of Web3 data is understanding the data itself. One thing that really clicked is to treat Ethereum as a giant data warehouse and protocols on top of Ethereum blockchain as various schemas containing multiple tables and views in the warehouse. This applies to other blockchains too.

I'm going to borrow one of the charts from the program, which shows three ways that you can query raw blockchain data. Most of the Web3 data folks are working and building on data mappers like Dune Analytics and Flipside Crypto with decoded blockchain data and built-in visualisation tooling.

There are also products like The Graph, Bitquery and Covalent that ingest and index blockchain data accessible via API endpoints.

A senseless gif
The Web3 (Raw) Data Space. source: https://docs.google.com/presentation/d/1I6vDOS52uMCmWg3KIbuKYe8lK-8_ol9bEksIUEHNNLI/edit#slide=id.gfac7071356_0_1

Below is another very helpful flow chart that breaks down a typical token transfer flow and how data get emitted from transactions, traces and logs on the Ethereum blockchain. Transactions are grouped into a block that is chained together with its previous block.

A senseless gif
Data Lifecycle of a Token Transfer. source: https://docs.google.com/presentation/d/1I6vDOS52uMCmWg3KIbuKYe8lK-8_ol9bEksIUEHNNLI/edit#slide=id.gfac7071356_0_6

Decoding Transaction Data#

We can first try understanding transaction data from the Ethereum blockchain by using Etherscan:

  • On the blockchain, each transaction has a unique hash transaction hash with all sorts of transaction details. For example, we can look up this transaction hash 0x9d999b50f5e205f47d289dba3e0ae020e5dfe0d4d3cbbebd46668a1dbcb65f48 here
  • From is the sending party of the transaction, and To is the receiving party of the transaction, which can also be a contract address. In this case, it's a transfer from Coinbase 1 to Axie Infinity contract
  • Value is the ETH value that was transferred. Sometimes the ETH value can be 0 but in this case, there's 102.56 ASX token transferred, valued at $15k on the day of transfer
  • Transaction Fee = Gas Used by Transaction * Gas Price and you can find more details related to Ethereum gas here
  • Input data contains additional information of the transaction. In this case, it's a transfer function with details including address to and value of the transaction
    A senseless gif
  • The data itself is bytecode and has datatype bytea on Dune
  • The first 8 characters 0xa9059cbb are the MethodId of the function signature, which is not always unique (but should)
  • Every subsequent 64 characters is a different input variable. Here we have two, corresponding to the to address and value

Based on the information above, we can use the following query to pull data on Dune -

select 
  "block_time",
  "from",
  "to",
  "value" as eth_value,
  "gas_used" * "gas_price"/1e18 as gas_price_eth, 
  substring(encode("data", 'hex'), 1, 8) as method_id,
  bytea2numeric(decode(substring(encode("data", 'hex'), 73, 64), 'hex')) / 1e18 as token_amount
from ethereum."transactions"
where "hash" = '\x9d999b50f5e205f47d289dba3e0ae020e5dfe0d4d3cbbebd46668a1dbcb65f48'

You can find a more comprehensive guide by Andrew on working with all the data from a Ethereum transaction here.

Analysis Logic Flow#

Now that you've queried your first transaction data on the blockchain, you might wonder how to quickly get yourself familiar with a protocol and its data. Below is the flow I go through to work around -

  1. Assume you have a protocol to look into, e.g Ethereum
  2. Figure out the entry point function by either checking protocol app page/docs, or inspect a specific transaction using transaction hash
  3. Double check if you're using the right contract address, entry point function, you can pick a handful of random transaction hash and look up on Etherscan
  4. Once you have a question or a goal, you can proceed to query the data. Note that not all function call data would have all things you need, e.g. missing transaction signer, which can be found in ethereum."transactions" or the ETH value, which can be found in ethereum."traces"

Furthermore, you can check if the protocol you're looking at has been decoded either by searching on data tables on the left side of Dune interface or use this query by Boxer.

If you're dealing with non-abstractions (see abstractions here), you typically have to work with base tables like transactions, traces, logs and data from other protocols to complete your query.

Don't forget there's always an easier way to get started by inspecting other people's queries and dashboards. You can find some example queries on Dune, as well as a collection of utility queries.

DEX (Decentralised Exchange) Data#

As per Coinbase, a decentralised exchange (DEX) is a peer-to-peer marketplace where transactions occur directly between crypto traders. DEXs foster financial transactions without intermediaries like banks or brokers. At the time of writing, DeFi has $103b Total Value Locked (TVL) according to DeFi Pulse.

Below is a high level view by Danning on what a typical DEX trade table would look like:

A senseless gif
How should a trade table look like. source: https://docs.google.com/presentation/d/1EzgEGGuscQWBBZDinmBHt_JzROSnPKPz7cKTwXD3bDM/edit#slide=id.gfd1b94fbf6_0_10

Note that category is a flag used to differentiate if it's a DEX or an Aggregator. This is to avoid double counting events when an Aggregator creates log for swap events through Uniswap, which will also log a transaction. Furthermore, Aggregators could also be represented as trader_a or trader_b, which could potentially underestimate the actual number of retail traders. In this case, it's better to check tx_from in the table.

Below is the sudo code if you want to create table for multiple DEXs/Aggregators:

A senseless gif
Sudo code for a trade table. source: https://docs.google.com/presentation/d/1EzgEGGuscQWBBZDinmBHt_JzROSnPKPz7cKTwXD3bDM/edit#slide=id.gfd71581e3d_0_41

Useful SQL Guide for Dune Analytics#

Finally, here is a quick summary of SQL tips when you work on Dune, which adopts PostgresSQL dialect.

Quotation mark for table and column names

  • Reason being that PostgresSQL doesn't recognise camel case column and table names without quotation marks
  • In Postgres, double quotes are reserved for tables and columns, whereas single quotes are reserved for values
  • As a result, Dune would throw an error if you query a table with upper case in the table name
-- this works without the quotes
select 
  block_time
from ethereum.logs
limit 100

-- below also works
select 
  "block_time"
from ethereum."logs"
limit 100

-- this will not work 
select 
  *
from erc20."ERC20_evt_Transfer"
limit 100

Inline Ethereum addresses

  • In Dune, Ethereum addresses are stored as PostgresSQL bytearrays which are encoded with \x prefix rather than the customary 0x when you look up on Etherscan

Removing Decimals

  • If you are working with token amounts, remember to check the decimals in either erc20."tokens"
-- to remove decimals
value / 10^erc20.tokens.decimals

-- to check token and decimal
select 
  *
from erc20."tokens"
where "contract_address" = '{{contract_address}}'
  • Alternatively, you can directly look up on Etherscan by typing the token name or contract address. For example, below is the profile summary for Uniswap with its contract address and having 18 decimals. You can also use prices."usd" or dex."trades" to get the USD value
A senseless gif

Identifying contract addresses in transactions

  • There are times we may want to differentiate a contract address from a normal one, and you can do this by check if CREATE opcode was called in traces, e.g.
-- identify contract address with `CREATE` opcode
select 
  distinct "address"
from polygon."traces" 
where
  "type" = 'create'
  and "success"

Leveraging labels on Dune

  • A label is defined as an identifying contract addresses in transactions. Essentially you can either use labels to look up addresses and see what they are (see the label table below) or create dashboards like this one here
A senseless gif
source: https://docs.dune.xyz/data-tables/data-tables/labels#what-labels-looks-like
  • In Dune, you can use this UDF to get labels for addresses you look at, e.g.
select 
  "trader_a", 
  labels.get(trader_a, 'activity') as label, 
  sum(token_a_amount)
from dex."trades"
where "token_a_symbol" = 'DAI'
  and "block_time" > now() - interval '24 hours'
  and not labels.get("trader_a") is null
group by 1
order by 2 desc
limit 100
  • You can also directly look up address labels here, as well as more documentation here

Generating time series

  • If there's a dataset with discontinuous time series, you can use the following:
with days as (
  select generate_series('2021-10-01'::timestamp, date_trunc('day', now(), '1 day') as day -- generate all days from 2021-10-01 till now
)
select 
  d.day,
  coalesce(t.amount, 0) as amount
from day as d 
left join table as t -- table with missing dates 
  on d.day = t.day

User generated view

  • The schema dune_user_generated is an easy way to construct your own view, function or table inside Dune's database. More details can be found here
create or replace view dune_user_generated.table_name as 

select 
  *
from another_table

Afterword#

Congratulations on making this far 🎉! I hope this post can help you to get started somewhere. The learning curve is so steep when it comes to crypto/Web3, and we are all learning constantly. It is a space that's going through rapid development and high variance, where new ideas, products, communities and experiments spring up every day. It touches or will impact so many aspects in life, including but not limited to payment, finance, art, music, gaming, community, governance and identity, which makes it exciting for anyone to participate. I'm still learning, and feel free to reach out if you need help on getting started.