People say you should build stuff in public and just be completely open about the goals and progress, both as an accountability step and, I think, to show your work to people who might help out and suggest different paths. So here goes.
I’m creating a web app for litigators called DocketView, and the primary feature right now is taking Notices of Electronic Filing from the federal courts’ PACER system and turning them into a database of PDFs organized by date and document number, as well as by court and name of the matter.
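To make that organization concrete, here is a minimal sketch in Python of how one filing might map to a storage path. Everything in it is an assumption for illustration: the `Filing` fields, the `slugify` and `storage_key` helpers, and the court/matter/date/number layout are hypothetical, not DocketView’s actual schema.

```python
from dataclasses import dataclass
from datetime import date
import re


@dataclass(frozen=True)
class Filing:
    """One docket entry taken from a Notice of Electronic Filing (hypothetical fields)."""
    court: str        # short court code, e.g. "nysd"
    matter: str       # short case name, e.g. "Smith v. Jones"
    filed_on: date    # filing date
    doc_number: int   # document number on the docket


def slugify(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def storage_key(filing: Filing) -> str:
    """Build a key that groups PDFs by court, then matter, then date, then document number."""
    return (
        f"{slugify(filing.court)}/{slugify(filing.matter)}/"
        f"{filing.filed_on.isoformat()}/{filing.doc_number:04d}.pdf"
    )


example = Filing(court="nysd", matter="Smith v. Jones", filed_on=date(2024, 3, 5), doc_number=42)
print(storage_key(example))  # nysd/smith-v-jones/2024-03-05/0042.pdf
```

Zero-padding the document number keeps entries in docket order when keys are listed alphabetically, which is the only design choice of note here.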
After an exhaustive review, it seems like the best approach is to do this as a “serverless” app with a bunch of discrete modules so it can “scale to zero,” in the industry parlance: I’m not charged for computer time when the computer’s not computing. There’s very little traffic on weekends and evenings, and no sense paying for a box to run 24/7 when that’s really not needed.
I investigated the pricing and performance of a number of different serverless providers, small scrappy start-ups like Amazon, Microsoft, and Google, as well as a new generation of companies like Vercel and Netlify and Supabase. It can work on all of them and the costs are more or less in line, except I think I may be facing a huge bill if this gets traction and I have a few hundred lawyers and their office staff pulling down PDFs every hour. The costs that worry me, in other words, are storage and egress of files.
When I was originally sketching this out a few years ago, the old-school idea of some jQuery on the front end with a MySQL database in the back chugging away was appealing. But that’s like $150 per month in fixed costs on a hosted platform like DigitalOcean or Linode (or AWS or Azure), and as the database grows it can require a larger disk. On top of that, the egress charges are frightening: 500 GB per month is included (less if you power down the compute instance, which reduces your included bandwidth accordingly), and it’s $0.01 per GB after that.
Serverless providers, by contrast, don’t seem to charge for the web app itself (not the largest part of the potential costs, to be sure), but they do charge for data leaving a storage bucket. AWS’s S3, the Simple Storage Service, is the original version of this, and it has different pricing tiers based on how readily available you want the data. For ice-cold, don’t-think-I’ll-need-it archives, it’s very cheap. But this ain’t that, and current egress pricing on the keep-it-handy tier is free for the first 100 GB per month and $0.09 per GB after that up to 10 TB, falling as the volume grows larger. Supabase does something similar, but the “free” is smaller and the egress is cheaper. There are all sorts of dials to turn to calculate pricing.
Cloudflare, a massive infrastructure provider, got into the serverless field several years ago and has shockingly good pricing: free. As in beer. As in, no egress charges at all. They’ve added compute modules called Workers and a database service they call D1, and their bucket storage to compete with S3 is called R2: one notch better on everything, I guess. I think they’ve got cabinets in a thousand datacenters all over the world. It’s a truly massive company but sort of sneaky-stealthy, because they seem not to care all that much about winning the hearts and minds of developers.
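For a rough feel of how those egress numbers compare, here is a back-of-the-envelope sketch in Python. The rates are only the ones quoted in this post (500 GB included then $0.01 per GB on a hosted box, 100 GB free then $0.09 per GB on S3’s first tier, nothing on Cloudflare), the S3 function ignores the cheaper tiers above 10 TB, and the traffic figures in the usage lines are made-up assumptions, not measurements.

```python
def vps_egress_cost(gb: float, included_gb: float = 500.0, rate: float = 0.01) -> float:
    """Hosted box: pay the per-GB rate only on transfer beyond the included allowance."""
    return max(0.0, gb - included_gb) * rate


def s3_egress_cost(gb: float, free_gb: float = 100.0, rate: float = 0.09) -> float:
    """S3, first pricing tier only: free allowance, then a flat per-GB rate (roughly valid up to 10 TB)."""
    return max(0.0, gb - free_gb) * rate


def r2_egress_cost(gb: float) -> float:
    """Cloudflare R2: no egress charge, per the pricing described above."""
    return 0.0


def monthly_egress_gb(users: int, pdfs_per_user_per_day: int, avg_pdf_mb: float, days: int = 22) -> float:
    """Estimate monthly download volume in GB (1 GB = 1000 MB) from hypothetical usage figures."""
    return users * pdfs_per_user_per_day * avg_pdf_mb * days / 1000.0


if __name__ == "__main__":
    # Hypothetical load: 300 users, 20 PDFs a day each, 2 MB per PDF, 22 working days.
    gb = monthly_egress_gb(users=300, pdfs_per_user_per_day=20, avg_pdf_mb=2.0)
    print(f"estimated egress: {gb:.0f} GB/month")
    print(f"hosted box: ${vps_egress_cost(gb):.2f}")
    print(f"S3:         ${s3_egress_cost(gb):.2f}")
    print(f"R2:         ${r2_egress_cost(gb):.2f}")
```

Storage and per-request charges are left out on purpose; this only covers the data-leaving-the-bucket part of the bill.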
So I picked them. And I’m now in the process of figuring out the architecture of what-goes-where and how to make it all fit together. Stay tuned!