Building a Mirror Data Pipeline

Hi everyone.

I have started working on the task of getting data from Mirror publications, based on the bounty posted here. Mirror stores all content on Arweave. Right now I am working on getting a list of all Arweave transactions for the Mirror project. After those are collected, the data for each one can be pulled from the Arweave network.
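For anyone following along, here is a rough sketch of how the transaction list could be collected through the Arweave GraphQL endpoint. It is not necessarily what the repo does. The tag names `App-Name` and `Contributor` and the value `MirrorXYZ` are assumptions about how Mirror tags its uploads, and `parse_page`, `fetch_page`, and `collect_all` are names made up for this example.

```python
import json
import urllib.request

ARWEAVE_GRAPHQL = "https://arweave.net/graphql"  # public gateway endpoint

# Assumption: Mirror uploads carry the tag App-Name = MirrorXYZ.
QUERY = """
query($cursor: String) {
  transactions(
    tags: [{ name: "App-Name", values: ["MirrorXYZ"] }]
    first: 100
    after: $cursor
  ) {
    pageInfo { hasNextPage }
    edges {
      cursor
      node { id tags { name value } }
    }
  }
}
"""


def parse_page(payload):
    """Return (rows, next_cursor) from one GraphQL response payload."""
    txs = payload["data"]["transactions"]
    edges = txs["edges"]
    rows = []
    for edge in edges:
        tags = {t["name"]: t["value"] for t in edge["node"]["tags"]}
        rows.append({
            "tx_id": edge["node"]["id"],
            # Assumption: the wallet address lives in a "Contributor" tag.
            "contributor": tags.get("Contributor"),
        })
    has_next = bool(edges) and txs["pageInfo"]["hasNextPage"]
    next_cursor = edges[-1]["cursor"] if has_next else None
    return rows, next_cursor


def fetch_page(cursor=None):
    """POST the query for one page of results (needs network access)."""
    body = json.dumps({"query": QUERY, "variables": {"cursor": cursor}}).encode()
    request = urllib.request.Request(
        ARWEAVE_GRAPHQL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_all():
    """Follow cursors until the gateway reports no further pages."""
    rows, cursor = [], None
    while True:
        page, cursor = parse_page(fetch_page(cursor))
        rows.extend(page)
        if cursor is None:
            return rows
```

Once the IDs are collected, the content of each transaction can typically be fetched from the gateway at `https://arweave.net/<tx_id>`.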

Once that data is collected, I plan to separate out the Mirror publications that seem to be communities and take a look at those.

I’ve made this repo for this first part, collecting Arweave txs. I will update this thread with future progress.



Quick progress report

So far, I’ve written the code to generate the collection of Arweave transaction IDs, each with the wallet address of its contributor. Some preliminary findings from that data:

  • When the initial data was taken, there were 75102 Mirror articles in total, from 18412 contributors
  • 8745 contributors have a single article
  • The contributor with the most Mirror publications is 0x942cBEa64876Ff0b2e23c0712B37Dc0091804e9c, with 484
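Counts like the ones above could be computed along these lines. This is only a sketch: it assumes the collected data is available as `(tx_id, contributor)` pairs, and `summarize` is a name made up for this example, not something from the repo.

```python
from collections import Counter


def summarize(rows):
    """Summarize an iterable of (tx_id, contributor) pairs."""
    per_contributor = Counter(contributor for _, contributor in rows)
    top = per_contributor.most_common(1)
    return {
        "articles": sum(per_contributor.values()),
        "contributors": len(per_contributor),
        "single_article_contributors": sum(
            1 for count in per_contributor.values() if count == 1
        ),
        # (address, article_count) of the most prolific contributor, if any
        "top_contributor": top[0] if top else None,
    }


# Toy data, not real Mirror records.
example = [("tx1", "0xaaa"), ("tx2", "0xaaa"), ("tx3", "0xbbb")]
stats = summarize(example)
```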

I’ve exported three views into CSVs, which are available in the repo.


Nice. I’ve started playing with the data. Can you link the GitHub repo when you have a chance? I created a notebook to look at overlap with the different categories of wallets we have associated with DAOs (i.e. members, votes, etc.).
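An overlap check of this kind can be sketched as a set intersection per category. The function names and the category layout here are hypothetical, not taken from the notebook. Addresses are lowercased before comparison because the same Ethereum address can appear in checksummed (mixed-case) or all-lowercase form.

```python
def normalize(address):
    """Lowercase and trim an address so both spellings compare equal."""
    return address.strip().lower()


def overlap(mirror_addresses, dao_wallets_by_category):
    """Map each DAO wallet category to the Mirror contributors found in it."""
    mirror = {normalize(a) for a in mirror_addresses}
    return {
        category: mirror & {normalize(a) for a in wallets}
        for category, wallets in dao_wallets_by_category.items()
    }


# Toy data, not real wallets.
result = overlap(
    ["0xAbC", "0xdef"],
    {"members": ["0xabc", "0x123"], "voters": ["0x456"]},
)
```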

Sure, it’s here.

The corrected total number of articles is 68463; the previous count had some duplicates.
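The post does not say what made a record a duplicate, so the following is only a generic sketch: it keeps the first row seen for each value of a chosen key. The `dedupe` helper and the `tx_id` field are hypothetical, and the right key depends on how the duplicates actually arose.

```python
def dedupe(rows, key):
    """Keep the first row for each distinct value of row[key], in order."""
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Toy data, not real Mirror records.
rows = [
    {"tx_id": "tx1", "contributor": "0xaaa"},
    {"tx_id": "tx1", "contributor": "0xaaa"},
    {"tx_id": "tx2", "contributor": "0xbbb"},
]
unique_rows = dedupe(rows, key="tx_id")
```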

I’ve now completed the Mirror data pipeline (repo here) and have posted some analysis as a Twitter thread.

This data is still ripe for further exploration. Since I focused on creating the pipeline, future bounties could include connecting the pipeline to the Chainverse database and setting up regular capture, or connecting the data to Chainverse and finding connections across wallet addresses.

I’ve updated the bounty to reflect the direction I took this project, and will request the bounty from the treasury shortly.

Hey! Andrew from the Mirror team here. Pipeline looks dope, I’m working on an enhanced data API to get stuff like this more easily.

Also, for on-chain stuff, I already put together a bunch of aggregated views on Dune (Dune Analytics). I will attend the meeting next week to chat about collaborations and stuff :)