A brief history of the Patron Watch project.
January 27, 2015
Today I started researching Patreon.
I was talking with my colleague Christina Xu about patterns in crowdfunding and support for creative work over time, and we decided to dig in and find some answers.
Our plan is to monitor the behavior of project funding over time, interview people about their experiences running a project, and mash our findings together into an enlightening story.
The first step was to get our hands on some data. Patreon doesn’t publish a comprehensive index of projects, but I discovered a URL from a previous version of the website that was still working:
page; by incrementing it, you can scroll through a comprehensive list of projects.
2 appears to indicate reverse chronological order.
So there’s the index of projects we need. Add a web scraper for individual project pages, slap together a data model, and we’re in business!
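A minimal sketch of that index walk, assuming a hypothetical `fetch_page` callable that takes a page number and returns the project URLs listed on that page. The actual endpoint and its parameters are not reproduced here, and the function name, delay, and page cap are illustrative choices, not the crawler we actually ran.

```python
import time
from typing import Callable, Iterator, List


def crawl_index(
    fetch_page: Callable[[int], List[str]],  # hypothetical: page number -> project URLs
    delay: float = 1.0,                      # assumed pause between requests, in seconds
    max_pages: int = 1000,                   # assumed safety cap on pages visited
) -> Iterator[str]:
    """Yield each project URL once by incrementing the page number."""
    seen = set()
    for page in range(1, max_pages + 1):
        urls = fetch_page(page)
        if not urls:
            # An empty page is treated as the end of the index.
            break
        for url in urls:
            # New projects shift older ones onto later pages, so the same
            # project can show up twice during a crawl; skip repeats.
            if url not in seen:
                seen.add(url)
                yield url
        time.sleep(delay)
```

Passing the fetcher in as a function keeps the paging logic separate from the HTTP and parsing details, which are the parts most likely to break when a site changes.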
February 3, 2015
One tidbit we learned was that most of his fans don’t collect their rewards or expect anything in return—they simply want to support him to keep doing what he’s already doing.
His insight was much more interesting (and fun!) than any of the numbers we gathered thus far.
March 16, 2015
Patreon acquired their competitor Subbable and announced a “matching” program to placate Subbable users, offering up to $100,000 in matching funds for the first 45 days of the transition.
March 31, 2015
There was a sudden, sustained spike in all levels of funding. Is it related to the matching program or the acquisition, or is it pure venture capital black magic?
I didn’t touch the crawler.
The median pledge (only counting projects with at least one patron) jumped from $0.42 two days ago to $3.60 today.
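For reference, here is one way such a median could be computed. This is a sketch under assumptions: it treats a project's "pledge" as its total pledged amount divided by its patron count, and the field names `pledged` and `patrons` and the sample records are made up for illustration. They are not our actual data model or data.

```python
from statistics import median


def median_pledge(projects):
    """Median per-patron pledge, counting only projects with at least one patron."""
    per_patron = [
        p["pledged"] / p["patrons"]
        for p in projects
        if p["patrons"] > 0  # skip projects nobody has backed yet
    ]
    # With no backed projects there is nothing to take a median of.
    return median(per_patron) if per_patron else None


# Made-up sample records, for illustration only.
sample = [
    {"pledged": 0.0, "patrons": 0},   # excluded: no patrons
    {"pledged": 10.0, "patrons": 4},  # 2.50 per patron
    {"pledged": 3.0, "patrons": 1},   # 3.00 per patron
    {"pledged": 40.0, "patrons": 8},  # 5.00 per patron
]
```

Leaving out unbacked projects matters: if most projects have no patrons, including them would pull the median to zero.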
May 5, 2015
There was a small but sharp decline in median pledges: down to $3.50 from a steady $4.00. I suppose this was the end of the matching program.
I should note that our numbers don’t always agree with the figures quoted by Patreon. Their framing can be a little weird if you want to reason about a single project, but major discrepancies are most likely due to an incomplete crawl and not any statistical funny business.
July 21, 2015
I shut off Yahoo! Pipes, which had been our main page fetcher, and reluctantly ordered Google Compute Engine (previously the understudy) to pick up the slack.
Pipes had some issues with excessive politeness: it cached aggressively, timed out early, and obeyed robot directives strictly. But the main issue was Yahoo’s announcement that Pipes was shutting down at the end of September.
Pipes was the only way I knew of to make a major search engine scrape the web for you.
August 13, 2015
I checked up on the crawler logs and noticed that Patreon had removed their /discoverNext URL. Good for ‘em, tightening up security!
Small parts of the site layout also changed, and this broke the corresponding scraping code.
With the comprehensive feed of new projects gone, and no candidate for a substitute, it seems that the data collection portion of this research project is winding down.
September 30, 2015
Suddenly I can easily obtain the original database, not just incomplete snapshots. All that web scraping is obsolete in the shadow of this shiny new hi-fi artifact.
But I can’t bring myself to load the database dump. Using hacked data feels morally questionable, while crawling the public web does not.
October 5, 2015
I wonder who is combing through the leaked database. Researchers, business competitors, curious creative types, snoopers, scammers, stalkers, and malicious attackers can all take a peek.
It’s not just the leak that’s making me rethink things, though. The index we were using to find newly created projects is gone. We’ve lost momentum as our attention shifts to other projects. And this feels like a rotten time to interview, study, and audit Patreon’s users, who now have harassment, fraud, and theft to worry about.
We are left with quite a few unanswered questions and unfinished inquiries. These include:
How consistent is crowdfunding support over time? How does funding variability correlate with a project’s number of supporters, subject matter, and deliverables?
What can project creators reasonably expect from subscription-based donation support? Do they need to sell merchandise to survive? How can they expect different sizes of audiences to behave?
What are the best ways to classify a project? Patreon does not have clear-cut categories. We classified them roughly by medium (like writing, video, software, and comics), but this is a perfect opportunity to apply some machine learning.
Our aggregate numbers didn’t match those released by Patreon. Did the crawler miss a lot of projects? Or was there private activity we couldn’t see?
What happened on March 31? Was there a problem with our scraper before then?
But rather than use the cheat code (or is it the teacher’s edition?), we are letting this project go.