Funding proposal - DAO data pipeline MVP

Summary
As more DAOs launch, there’s value in extracting events, transactions, and interactions from the Ethereum blockchain to analyze governance, community, and protocol evolution. Tools like Dune Analytics and The Graph provide an interface to query indexed blockchain data and share code and results.

However, DAO transaction data is not yet standardized, and it is not easy to construct meaningful datasets from it. Without familiarity with a specific smart contract, it can be challenging to write a query that returns accurate, trusted results. Knowing which source data to target for metric calculation also requires an understanding of the DAO project: governance framework, launch process, tokens/funding source, etc.

With this proposal I will develop a data pipeline of DXdao launch metrics from the manifesto, document the process, and contribute the code for Diamond DAO. This can then be used as a template for other DAO projects we wish to include in our datasets.

Proposal

With this funding proposal I will deliver our MVP capability of:

  • DAO dataset curation
  • Calculating DAO meta metrics
  • Developing useful tutorial content

I will use Dune Analytics, The Graph, Etherscan, and any other tools to accelerate development.

Progress, questions, and discussion will be tracked in this Discourse thread: DAO Data Engineering

I may request help/input from other Diamond DAO members with web3/data mining experience as I scale the learning curve.

Deliverables include:

  1. Reproducible query code that returns DXdao launch metrics (scope in thread)
  2. “How to” article/blog post and documentation
  3. Any 3rd party dashboards, tool configurations, etc developed to complete the work

Estimated completion time: October 2021

Applicant: 0x97b9958faceC9ACB7ADb2Bb72a70172CB5a0Ea7C
Shares: 50
Payment requested: 1000

Thanks for sharing this @lemp.eth, I have a few questions.

  1. I’m not completely sure what you mean by launch metrics. Can you be more specific about what those metrics would be? The excerpt you included in the other post doesn’t seem to include any metrics other than the total number of REP holders, which can be queried via the Alchemy subgraph.

  2. I had identified a few categories of metrics that I wanted to focus on, most of which can be generated with data already available from The Graph, Snapshot, etc., since each ecosystem (DAOhaus, Alchemy, Snapshot) has its own APIs and subgraphs.

The on-chain metrics I think we should start with are…

  • Voting power concentration (minimum wallets to affect a vote)
  • Proposal issuance & voting concentration
  • Membership growth (nice one)
  • Total value locked (TVL) & composition of Treasury assets
  • Token velocity (i.e. how long people tend to hold their token, if applicable)

(copy and pasted from here)

I think these metrics are differentiated from existing offerings (like Deep DAO). Are all of these metrics within the scope of what you envisioned?

  3. Should we limit the scope of the MVP to one DAO? On the one hand, a DXdao member is interested in doing an analysis of their REP holders and their voting activity, as a follow-up to this analysis: https://daotalk.org/t/decentralizing-dxdaos-voting-power/2362 I think helping her with that could be a great opportunity for us to gain visibility within that ecosystem; perhaps we could do a deep dive on DXdao. However, I believe we need to be positioned to report on the bulk of major DAOs in the next three months in some fashion; given that popular frameworks (Snapshot, DAOhaus, Aragon, Alchemy) that account for the majority of DAOs have already set up subgraphs to query across their entire ecosystems, I feel like we should expand our scope to at least covering those ecosystems (Snapshot, Aragon, DAOhaus, Alchemy).

  4. I understand that this infrastructure will be complex, but I think 50 shares is a bit high if that infrastructure will only enable reporting on one DAO. I would support 40 shares if we (a) had good documentation, (b) were able to report on the metrics I identified earlier in the post for the major ecosystems (Moloch/Aragon/Snapshot/Alchemy), and (c) were able to assist my colleague at DXdao with a deep dive on their REP holders.

Overall I think this is the right direction. I particularly appreciate the priority you’re giving to good documentation. I just want to make sure we are on the same page before committing several months to building infrastructure for this.

Following feedback from @amphiboly.eth, I did more research into the landscape of DAO data availability and ecosystems. I’ll address specific points and lay out a new scope for the MVP proposal here.

Much of the focus for this scope was informed by A Comparative Analysis of the Platforms for Decentralized Autonomous Organizations in the Ethereum Blockchain by Samer Hassan et al., which reports on growth, activity, voting, and funds using available data from the Aragon, DAOhaus, and DAOstack Mainnet and xDAI platforms.

Target DAO data for this proposal

  • DAO Name
  • Voting System
  • Network (Mainnet or xDAI)
  • Data as-of Month
  • Data as-of Year
  • Token(s) used
  • Number of users
  • Number of active users
  • % of users who vote
  • Number of votes cast per user
  • % Positive votes
  • % Proposals approved
  • Funds in treasury
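
As a rough illustration of how one row of this target dataset might be represented, here is a minimal Python sketch. The class name, field names, types, and the choice to store raw counts and derive the percentages are all assumptions for discussion, not an agreed schema.

```python
from dataclasses import dataclass
from typing import List


def _pct(part: float, whole: float) -> float:
    """Percentage of part over whole; 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


@dataclass
class DaoRecord:
    """One row per DAO per as-of month. All names here are illustrative."""
    dao_name: str
    voting_system: str
    network: str              # "mainnet" or "xdai"
    as_of_month: int
    as_of_year: int
    tokens: List[str]
    users: int
    active_users: int
    voters: int               # users who cast at least one vote
    votes_cast: int
    positive_votes: int
    proposals: int
    proposals_approved: int
    treasury_funds: float     # unit of account still to be decided

    @property
    def pct_users_who_vote(self) -> float:
        return _pct(self.voters, self.users)

    @property
    def votes_per_user(self) -> float:
        return self.votes_cast / self.users if self.users else 0.0

    @property
    def pct_positive_votes(self) -> float:
        return _pct(self.positive_votes, self.votes_cast)

    @property
    def pct_proposals_approved(self) -> float:
        return _pct(self.proposals_approved, self.proposals)
```

Storing counts rather than precomputed percentages keeps each row auditable against the source subgraph and lets the ratios be recomputed if a definition changes.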

DAO data availability

The paper provides high level statistics on the four areas (growth, activity, voting, funds) as of November 2020, and from these three platforms obtains data on ~2,300 DAOs and ~72,000 members. By focusing on DAOs on these three platforms our Diamond DAO data product will have a sufficient volume of DAO activity to launch.

Aragon

Aragon documentation is here: https://connect.aragon.org/

DAOhaus

DAOhaus lists 360 DAOs on the “explore” page (including Diamond DAO, woot woot). Compared to Aragon, the DAOhaus subgraph documentation is much cleaner and more accessible.

The Graph DAOhaus subgraph: link

A DAOhaus Stats subgraph also exists, with some interesting metrics already available (e.g. member count, proposal count, ragequit count, etc.).

DAOstack

DAOstack uses the Alchemy platform for DAO management.

The Graph DAOstack subgraph: link

Github repo: link

Technology

I will use GraphQL to query The Graph, and Python for additional data manipulation. Data will initially be extracted to .csv format for easy ingestion into a front-end framework (this can change as the Diamond DAO platform evolves). Code will be version controlled in a GitHub repository.
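
To make the GraphQL-to-CSV flow concrete, here is a minimal sketch using only the Python standard library. The endpoint URL, the `daos` entity, and its fields (`id`, `name`, `memberCount`) are placeholders I made up for illustration; the real names have to come from each subgraph's published schema. The `first`/`skip` arguments are the usual pagination arguments on subgraph queries.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint and entity/field names, for illustration only.
SUBGRAPH_URL = "https://example.invalid/subgraphs/name/some-dao-subgraph"

QUERY = """
query Daos($first: Int!, $skip: Int!) {
  daos(first: $first, skip: $skip) {
    id
    name
    memberCount
  }
}
"""


def fetch_page(url, first=100, skip=0):
    """POST one page of the query and return the list of result rows."""
    payload = json.dumps(
        {"query": QUERY, "variables": {"first": first, "skip": skip}}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]["daos"]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, keeping only the named columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A run would look something like `rows_to_csv(fetch_page(SUBGRAPH_URL), ["id", "name", "memberCount"])`, repeated with an increasing `skip` until a page comes back empty.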

High level data schema

Sample data records

Notes on scope
Some minor details may be modified (with input/agreement from Diamond DAO members) once I dive deeply into the various protocols and learn throughout the process of delivering this proposal.

It will be a lot of work to understand and build queries against at least five separate subgraphs (Aragon data is distributed across three of them), and it will need code in both GraphQL and Python. The process will be well documented throughout, and the overall output will provide a foundation that we can all build upon (not throwaway code or ad-hoc queries).

@amphiboly.eth, considering this expanded scope relative to the original proposal, I think 50 shares is reasonable for this effort.

Okay. I have some thoughts on the metrics starting at % of users who vote, but I agree with this in principle. It’s great work already.

Do you think it would be helpful if Raid Guild helped with this?

Suggestions:

Metrics capturing decentralization of voting power.

  • Dictator: a boolean field that captures whether or not there is a single wallet whose support is necessary to pass a proposal
  • Minimum winning coalition size: The minimum % of wallets necessary to pass a proposal.
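
A minimal sketch of how these two metrics could be computed from a snapshot of voting power per wallet. It assumes a proposal passes when supporting wallets hold strictly more than a `threshold` share of total voting power and that every wallet votes; real DAOs differ (quorum, turnout, relative majority), so the rule and the function names are assumptions.

```python
def min_winning_coalition(voting_power, threshold=0.5):
    """Smallest set of wallets whose combined power exceeds threshold * total.

    voting_power maps wallet -> power. Taking the largest holders first is
    optimal for this rule, since it reaches the threshold in the fewest wallets.
    """
    total = sum(voting_power.values())
    if total <= 0:
        return []
    ranked = sorted(voting_power.items(), key=lambda item: item[1], reverse=True)
    coalition, running = [], 0
    for wallet, power in ranked:
        coalition.append(wallet)
        running += power
        if running > threshold * total:
            break
    return coalition


def min_winning_coalition_pct(voting_power, threshold=0.5):
    """Minimum winning coalition as a percentage of all wallets."""
    if not voting_power:
        return 0.0
    size = len(min_winning_coalition(voting_power, threshold))
    return 100.0 * size / len(voting_power)


def has_dictator(voting_power, threshold=0.5):
    """True if no proposal can pass without the largest wallet's support."""
    total = sum(voting_power.values())
    if total <= 0:
        return False
    largest = max(voting_power.values())
    # Everyone else combined still fails to exceed the threshold.
    return (total - largest) <= threshold * total
```

Under this rule, a wallet holding exactly half the voting power counts as a dictator: it cannot pass proposals alone, but nothing passes without it.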

Metrics capturing degree of participation in other DAOs

  • % of members in multiple DAOs
  • % of voting power held by members who participate in multiple DAOs
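
A minimal sketch of the two cross-DAO metrics, assuming membership data has already been pulled into a mapping of DAO name to `{wallet: voting_power}`. That input shape and the function names are assumptions. Addresses are lowercased first because the same Ethereum address can appear with different checksum casing.

```python
from collections import Counter


def normalize(memberships):
    """Lowercase wallet addresses so the same wallet matches across DAOs."""
    cleaned = {}
    for dao, members in memberships.items():
        merged = {}
        for wallet, power in members.items():
            key = wallet.lower()
            merged[key] = merged.get(key, 0) + power
        cleaned[dao] = merged
    return cleaned


def multi_dao_wallets(memberships):
    """Wallets that appear as members of more than one DAO."""
    counts = Counter(
        wallet for members in normalize(memberships).values() for wallet in members
    )
    return {wallet for wallet, n in counts.items() if n > 1}


def pct_members_in_multiple_daos(memberships):
    """Share of all distinct wallets that belong to two or more DAOs."""
    cleaned = normalize(memberships)
    all_wallets = set()
    for members in cleaned.values():
        all_wallets.update(members)
    if not all_wallets:
        return 0.0
    return 100.0 * len(multi_dao_wallets(cleaned)) / len(all_wallets)


def pct_voting_power_held_by_multi_dao_members(memberships, dao):
    """Share of one DAO's voting power held by members of other DAOs too."""
    cleaned = normalize(memberships)
    members = cleaned[dao]
    total = sum(members.values())
    if total <= 0:
        return 0.0
    multi = multi_dao_wallets(cleaned)
    held = sum(power for wallet, power in members.items() if wallet in multi)
    return 100.0 * held / total
```

The first metric is computed across the whole dataset, while the second is per DAO, so it can sit alongside the other DAO-level fields.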

Following @steffbrowne’s post on social attributes, I really like the % of members in multiple DAOs metric (and there’s a lot of great content in that post beyond this one variable too).

These other metrics will be valuable, but more complicated to calculate and require the foundation to be built first. I propose adding them to a backlog vs. including them in this initial MVP. I need to keep a tight scope for this first iteration, and then have an efficient process for adding onto it.

Adding them to the backlog sounds reasonable. Are we building this in a way where someone (say @ro4438) could figure out how to calculate those and then add their queries?

Yes, once I set up our Diamond DAO GitHub, the backlog, milestones, and next development steps will all be transparent and easy for us to track.

@steffbrowne and I connected today to discuss this list, and she had really good feedback to slightly modify the target set of initial variables based on her knowledge and input. I’m really happy with the outcome.

Notes from our discussion

  • Voting System (remove from scope for now)

    • Can be hard to evaluate, usually a mishmash of voting platforms. Will actually require more data on technology used (Snapshot, different tech, etc). Probably better fit as a bigger component later.
  • Active users (keep in scope)

    • Will need a specific definition, e.g. voting activity within last 30, 60, 90 days, etc
  • Number of votes cast per user (keep in scope)

    • Median number of votes per user to report at the DAO level
  • Number of positive/negative votes per user (remove from scope for now)

    • Actually reflects the operations of the DAO, not a lot of value from these raw numbers alone - lots of consensus building happens in other places, needs much more focus to make these metrics useful
  • Funds in treasury (keep in scope, but modify)

    • Two most interesting metrics in financials were money in and money out
    • Need to show money in since last snapshot, current treasury, money out since last snapshot
    • Money out is velocity, money in is hope, current treasury is resting potential
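
The in-scope items from these notes can be sketched in Python as follows. The input shapes are assumptions: votes as `(wallet, timestamp)` pairs and treasury transfers as `(timestamp, signed_amount)` pairs with positive amounts for money in. The 30-day default window is only a placeholder until we settle on a definition of "active".

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import median


def active_users(votes, as_of, window_days=30):
    """Wallets with at least one vote in the window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return {wallet for wallet, cast_at in votes if cutoff < cast_at <= as_of}


def median_votes_per_user(votes):
    """Median number of votes cast per voting wallet, reported per DAO."""
    counts = Counter(wallet for wallet, _ in votes)
    return median(counts.values()) if counts else 0


def treasury_summary(transfers, last_snapshot, as_of, balance_at_last_snapshot):
    """Money in, money out, and current treasury since the last snapshot."""
    window = [amount for moved_at, amount in transfers
              if last_snapshot < moved_at <= as_of]
    money_in = sum(amount for amount in window if amount > 0)
    money_out = -sum(amount for amount in window if amount < 0)
    return {
        "money_in": money_in,
        "money_out": money_out,
        "current_treasury": balance_at_last_snapshot + money_in - money_out,
    }
```

Passing `window_days=60` or `window_days=90` to `active_users` gives the alternative definitions mentioned above without changing the query layer.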

sounds good. look forward to discussing more on Friday.

With DAOs launching left and right, there’s a goldmine of information trapped on the Ethereum blockchain. But here’s the catch: it’s a tangled mess! Tools like Dune and The Graph exist, but extracting meaningful insights requires wrestling with unstandardized transactions and cryptic smart contracts. Just figuring out where to look for the right data to answer basic questions about a DAO’s launch, governance, or community is a head-scratcher.
This proposal focuses on DXdao, but the bigger question is: can we create a standardized approach to DAO data enrichment and analysis? Is a “one size fits all” template even possible across the diverse DAO landscape?