DeGoogle: Photos
One of my bigger DeGoogle efforts so far has been getting off of Google Photos. The migration took about a week, but I’m happy with the result and thought I’d share the process.
Motivation
I don’t want Google (or any one third party) knowing too much about me. With my devices synced to Google, I have to think twice about taking pics of sensitive docs, locations, and people, thanks to their penchant for running AI/ML and OCR against all of the content I upload. They’ve collaborated with law enforcement in secret to perform unwarranted searches of user data, have complied with geofence warrants in the past, and they continue to self-deal and sell user behavioral data; I don’t care whether it’s in aggregate or not. Put simply, I do not trust them to care about me as a user, as they have no incentive to. Sure, my own infra will still be hackable, but at least an attacker will have to work for it.
Tool Selection
After some lazy research I narrowed things down to three setups:
- PhotoPrism
- Immich
- NextCloud Photos
I chose Immich because it has a native phone app and all the features I needed (a tested migration path from Google Photos, search, self-hosting). It also turns out to have a fair bit of momentum behind it; if you choose it, I highly recommend paying for a lifetime license, even though it’s optional.
Installation
I opted to install on Kubernetes, on-prem. This was pretty straightforward, even though I rolled my own resources (I don’t like Helm for many reasons I won’t get into here, but there is an official Helm chart).
The components:
- deployments (immich-ml, rproxy)
- statefulsets (immich-server, redis)
- services (ClusterIP for immich-ml, immich-server, redis)
- associated configmaps and secrets needed by the services
I’m connecting to a shared postgres in a different namespace for the database state.
The rproxy establishes a connection to a static VPS, allowing for a custom domain and traffic routing from the internet (though there are lots of ways to solve this problem, including a Tailscale/Headscale setup, IPv6 routing, IPv4 PAT, et cetera).
Migration
The process was pretty simple:
- use Google Takeout to package and pull Photos data
- use immich-go to upload the zip contents
- inspect the immich-go logs to detect any errors or misses
- perform any other validation steps
- set up a backup
- delete everything from Google Photos
For a collection of tens of thousands of photos and videos, this worked mostly fine. I did have to modify my rproxy to support large files (for video uploads) but there were no other big problems.
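As a crude validation step, you can count the media files inside the Takeout archives and compare the number against the asset count Immich reports after the upload. A minimal sketch (the extension list is illustrative; adjust it to whatever your library actually contains):

```python
import zipfile
from pathlib import PurePosixPath

# Extensions to treat as media; Takeout zips also contain .json sidecars
# and HTML, which we want to exclude from the count.
MEDIA_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".gif", ".webp", ".mp4", ".mov"}

def count_media(zip_paths):
    """Count media files across a set of Google Takeout zip archives."""
    total = 0
    for path in zip_paths:
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if PurePosixPath(name).suffix.lower() in MEDIA_EXTS:
                    total += 1
    return total
```

If the count disagrees with what Immich shows, the immich-go logs are the place to look for skipped or failed files.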
Backup
For backup I run a k8s cronjob that does full (weekly) and incremental (hourly) backups of the data files to a local backup volume.
Depending on your compute substrate, you might decide to do this another way (e.g. a VPS with block-level snapshots and/or an ordinary cronjob).
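The weekly-full/hourly-incremental split boils down to "copy everything" vs. "copy what changed since the last run". A stdlib-only sketch of that logic, using a marker file's mtime as the incremental cutoff (function and marker names are mine, not from any tool):

```python
import shutil
from pathlib import Path

def backup(src: Path, dst: Path, full: bool = False):
    """Copy files from src to dst, preserving relative paths.

    A marker file in dst records when the last run finished; an
    incremental run only copies files modified since then.
    """
    marker = dst / ".last_backup"
    cutoff = 0.0 if full or not marker.exists() else marker.stat().st_mtime
    copied = []
    for f in src.rglob("*"):
        if f.is_file() and f.stat().st_mtime > cutoff:
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(f)
    marker.touch()
    return copied
```

In the cronjob this is just invoked with `full=True` weekly and `full=False` hourly.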
The backup volume itself does offsite backups, preserving the path name and only uploading to the backup bucket if the file hash is not already there (SHA256). The contents are AES encrypted prior to being uploaded. I use Backblaze as a backing store for this but you could use anything (scp, ftp, rsync, S3, Wasabi, etc); a python script for this leveraging boto3 and the cryptography library is only a couple hundred lines.
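The core of that script is just hash, check, upload. A stdlib-only sketch of the dedupe logic, with the transfer abstracted behind an `upload` callback; in the real script that callback does the encryption and the boto3 upload, and all names here are illustrative:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA256 without loading it into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_offsite(local_root: Path, remote_hashes: set, upload) -> int:
    """Upload files whose SHA256 is not already in the remote store.

    remote_hashes is the set of content hashes already offsite;
    upload(path, digest) performs the actual transfer (encrypt, then
    put the object, in my setup).
    """
    uploaded = 0
    for f in sorted(local_root.rglob("*")):
        if f.is_file():
            digest = sha256_of(f)
            if digest not in remote_hashes:
                upload(f, digest)  # encrypt + upload happens here
                remote_hashes.add(digest)
                uploaded += 1
    return uploaded
```

Hashing before uploading means identical files (e.g. a full backup re-copying unchanged photos) cost nothing offsite.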
The database backups are done in a similar way, being periodically dumped to a local volume then also copied to the offsite store.
Anyone doing a similar setup should not skimp on backups: this is one of the big value-adds of using Google (or any other cloud provider), and it’s tricky to get right, but it’s essential. Don’t sacrifice it, even if it’s hard to set up.
Monitoring
Nominal restores are run daily, along with simple probes that fetch a set of shared images from the site. These are exposed as Prometheus metrics, graphed in Grafana, and alerted on. As before, there are many ways to do these verifications and notifications. You don’t have to set up alerts, but having some basic monitoring may save you tears later; it would suck to lose a disk and, with it, all the copies of your photos. The cool thing is that once you get good at this for photos, you can reuse the same tooling for anything else you decide to self-host.
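One lightweight way to get probe results into Prometheus is to write them in the text exposition format and let something like node_exporter’s textfile collector pick the file up. A sketch, with illustrative metric names (this is one option, not how my setup necessarily does it):

```python
import time
import urllib.request

def probe(url: str, timeout: float = 5.0):
    """Fetch a URL; return (success, latency_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 300
    except OSError:
        ok = False
    return ok, time.monotonic() - start

def render_metrics(results):
    """Render probe results in the Prometheus text exposition format.

    results maps a target name to a (success, latency) tuple; write
    the output to a file scraped by a textfile collector.
    """
    lines = ["# TYPE photo_probe_success gauge",
             "# TYPE photo_probe_latency_seconds gauge"]
    for name, (ok, latency) in sorted(results.items()):
        lines.append(f'photo_probe_success{{target="{name}"}} {int(ok)}')
        lines.append(f'photo_probe_latency_seconds{{target="{name}"}} {latency:.3f}')
    return "\n".join(lines) + "\n"
```

From there, an alert on `photo_probe_success == 0` covers the "is it even up" case.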
Deleting From Google
I did this by hand; I suppose you could write a Google API script to do it programmatically if you have hundreds of thousands of photos, but for most people manual deletion is probably quicker. Surely a bulk-delete tool exists in the wild; I just found it faster to select and delete in the UI.
Summary
Immich’s features are nice, my photo and post-processing workflow hasn’t been disrupted, and my data is now fully under my control. The hosting costs are also lower: a couple dollars a month for backups, plus about $1/month amortized from my shared VPS rproxy, vs. $10/month for Google One (YMMV). Even if it cost a little more, still in the same ballpark, the privacy gains would be worth it to me.