Storage for Photographers

A proposal for an underserved market.

There comes a time in every photographer's life when they must ask themselves what to do about all those photo RAWs filling up the tiny-compared-to-spinning-platters SSD on their primary machine. Easy, just get an external hard drive. Redundancy? Just get a Drobo or Synology NAS. More redundancy? Just back up the Drobo to CrashPlan or the Synology to Amazon S3 or Glacier. Done, right?

Note: I have since written a sequel about how I use a combination of NAS and cloud storage to back up my photos: Storage for Photographers (Part 2)

Sure, those are all valid strategies for gadget lovers and professional photographers, but I don't want to own, manage and think about more devices. Out of sight, out of mind. It all just takes up space, needs an extra plug and makes noise [1]. I've done a decent job paring down my belongings and am always seeking ways to simplify.

One of my favorite photographers, Art Chang, echoes this sentiment:

What a lot of photographers want is to leave their external hard drives, and constant redundant backups behind. Cloud storage is the way to go, especially since they already have the infrastructure setup to be redundant enough to not worry about your files.

It would then seem like all I want is to back up to S3 or Google Cloud Storage, perhaps with RRS (Reduced Redundancy Storage) or DRA (Durable Reduced Availability), respectively, to save money. When you're talking 500GB of RAWs, a not unreal amount for any casual DSLR photographer who unloads 5-15GB per shoot, that'll cost $47 per month on S3 ($38 on S3 w/ RRS). We're talking around $500 per year for 500GB, not including bandwidth. Imagine the cost for more frequent hobbyist photographers with 9TB to manage.

With Glacier, Amazon's super cheap long-term storage, that 500GB number whittles down to $5 per month, excluding retrieval [2].
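To make that arithmetic concrete, here is a rough sketch in Python. The per-gigabyte rates are not quoted anywhere in this article; they are back-calculated from the monthly figures above ($47, $38 and $5 for 500GB) and should be read as assumptions, not as current AWS pricing. The function and tier names are made up for illustration.

```python
# Assumed per-GB monthly rates, back-calculated from the figures in this
# article ($47, $38 and $5 per month for 500GB). Not current AWS pricing.
RATE_PER_GB_MONTH = {
    "s3": 47 / 500,
    "s3_rrs": 38 / 500,
    "glacier": 5 / 500,
}


def monthly_cost(size_gb, tier):
    """Storage-only cost per month in dollars (no bandwidth or retrieval)."""
    return size_gb * RATE_PER_GB_MONTH[tier]


def yearly_cost(size_gb, tier):
    """Twelve months of storage-only cost in dollars."""
    return 12 * monthly_cost(size_gb, tier)


if __name__ == "__main__":
    for tier in RATE_PER_GB_MONTH:
        print(f"{tier}: ${monthly_cost(500, tier):.2f}/month, "
              f"${yearly_cost(500, tier):.2f}/year for 500GB")
```

Under these assumed rates, a year of 500GB comes to roughly $564 on S3, $456 with RRS and $60 on Glacier, before bandwidth and retrieval.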

How important is quick retrieval to me? Surprisingly little. Let me explain.

What I Shoot

I almost always shoot in small (5.5MP) or medium (11MP) on my 22.3MP Canon 5D Mark III [3]. Update: I now shoot with Sony mirrorless cameras. I just don't need anything as large as 5760×3840 when all I do is downsize considerably to post online. This obviously helps with file sizes and it makes post-processing applications much snappier.

My shooting largely falls into three categories: nightlife & events, architecture & landscapes and family.

Dirty South performing at Ruby Skye, San Francisco

Event shots are usually just fleeting. They're relevant for a while and then I don't really need to keep the RAWs permanently. I cull the shots, process the good ones [4], post on Facebook and tag my friends.

Bay Bridge Lights
Bay Lights Project taking over the Bay Bridge. Taken from One Rincon Hill with a 24mm prime.

I tend to keep most RAWs for landscape and architecture shots in case I want to go back and adjust how I processed them and try out new techniques. These are the most enjoyable shots to capture and edit and I can spend hours getting creative with Lightroom 5 as well as Google Nik Collection software like Color/Silver Efex Pro. Update: I talk about my Lightroom process in my new post: Building a Lightroom PC.

Just like it's hard to know when you're "done" designing a UI, you can keep finding interesting ways to post-process photos. That's why I love photography. Half of the fun is the journey of going out to shoot. The rest is the anticipation of what those shots could turn out to be after you load them up in Lightroom.

Michael on July 4th

However, it's really the family shots that made me think more about my storage needs and write this article. I want to keep photos of my family, especially my niece and nephew as they grow up, our vacations and holidays for many years to come. Even some DSLR-taken HD baby videos.

I don't necessarily need access to these photos too long after I take them, nor do I publicly share these kinds of photos. I just need to know they are somewhere safe for decades to come.

I currently have several hundred gigs stored on Amazon Glacier via Arq, but that's not what this article is about.

Cloud-stored, locally-managed

Here's the relatively simple idea. I'll begin with a user story as designers are wont to do:

As a weekend amateur DSLR photographer, I just finished up post-processing a set of 367 family photos in Lightroom that I took over the July 4th weekend. I've completed my immediate task with the photos — sharing the best shots with family via email. I'm now done with the RAWs and don't need them in any foreseeable near future. I want to back them up for good and not have them fill up my 256GB SSD that only has 3GB left.

I open the folder where Lightroom imports are stored and drag these shots to the RAWbox (fake name for this ideal photo app) desktop application. I previously provided my AWS credentials and set it to upload all shots to Amazon Glacier by default. Also per my settings, RAWbox exported small 1000px-wide JPEGs of all uploaded photos, just for me to see locally which photos are backed up.

There are a few photos I specifically tag as priority as I feel like I may want them sooner. These photos can be set to be stored on S3 instead and/or remain local.
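The tagging rule described here can be sketched as a small planning step. RAWbox does not exist, so everything below (the `Photo` class, the `plan_upload` function, the tier names) is hypothetical and only illustrates the decision: Glacier by default, S3 for priority shots, and an optional flag to keep the RAW on disk.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    """A RAW file queued for backup (hypothetical structure)."""
    filename: str
    priority: bool = False    # tagged as "may want this sooner"
    keep_local: bool = False  # keep the RAW on disk after uploading


def plan_upload(photos, default_tier="glacier", priority_tier="s3"):
    """Return (filename, tier, delete_local_after_upload) for each photo."""
    plan = []
    for photo in photos:
        tier = priority_tier if photo.priority else default_tier
        plan.append((photo.filename, tier, not photo.keep_local))
    return plan


if __name__ == "__main__":
    shots = [
        Photo("IMG_0001.CR2"),
        Photo("IMG_0002.CR2", priority=True, keep_local=True),
    ]
    for filename, tier, delete_local in plan_upload(shots):
        print(filename, tier, "delete local" if delete_local else "keep local")
```

Keeping the plan separate from the actual upload would let the app show what is about to happen before any RAW is deleted.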

RAWbox encrypted and uploaded the 6.62GB of RAWs. It took around 18 minutes to upload on my 50/50Mbps Internet connection. Upon a successful upload, RAWbox deleted the RAWs locally and let me reclaim my precious SSD gigs. I check the RAWbox catalog (backed up on S3) to see that the small JPEG exports of those 367 photos only consume 201MB. I can also specify what resolution to store locally, or choose to store just one local album cover shot or none at all [5].
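The numbers in this story hold up to a quick sanity check. The sketch below assumes decimal units (1GB = 8,000 megabits) and a fully saturated uplink with no encryption or protocol overhead, so a real transfer would likely take somewhat longer. The function names are just for illustration.

```python
def upload_minutes(size_gb, uplink_mbps):
    """Ideal upload time in minutes, assuming 1GB = 8,000 megabits."""
    megabits = size_gb * 8 * 1000
    return megabits / uplink_mbps / 60


def average_mb(total_mb, count):
    """Average size per file in megabytes."""
    return total_mb / count


if __name__ == "__main__":
    print(f"Upload: {upload_minutes(6.62, 50):.1f} minutes")
    print(f"Average JPEG export: {average_mb(201, 367):.2f}MB")
```

That works out to a little under 18 minutes for the upload and roughly half a megabyte per exported JPEG.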

These local jpegs could be set to any size and are merely to see what photos I have backed up remotely.

The interface is not unlike iPhoto where I can see photos taken by year, album and location if I like. I can request that it initiate Glacier retrieval for individual photos or albums. If it were more robust than just an AWS front-end, it could do Glacier retrievals on its own, regardless of whether my laptop is awake, and email me when they have completed and transferred to S3 for immediate download.

In a utopian world, RAWbox is an open-source app so there's no fear of the company shutting down (aside from Amazon, but I believe in Bezos) and waking up to a "download your archives" email.

There are of course issues with the RAWbox idea. It's still just file storage, so your Lightroom edits won't get saved in the RAWs unless you manually export .xmp sidecar files too or a Lightroom export plugin is built.

Idea tl;dr

  • a) Keeping anything near a terabyte or more of RAW photos on your tiny SSD is not realistic, and I'd prefer not to deal with external disks/NAS. Backup solutions like Dropbox and Backblaze are sync-focused and require everything to be on disk, so they are a no-go.

  • b) After I upload them, I don't need immediate access to the files. I could easily wait a week to retrieve a set of photos I needed at some point in the future.

The desktop app uploads all RAWs to Glacier or other such cheap, long-term commodity cloud storage. It exports tiny JPEGs of the RAWs to make it easy to identify which photos you have backed up, and it lets you retrieve individual RAWs or entire sets from the cloud instead of having to rely on dates and album names. You get the benefit of significantly reduced local storage while still being able to manage TBs of RAWs safely stored in the cloud at a fraction of the cost of S3.

And last but not least, you get to live life simply without those extra disks on your desk.

What do you think?

Note: I wrote a sequel to this article! Storage for Photographers (Part 2)

1 Initially I began searching for an external SSD enclosure that could be powered by a single USB cable instead of a y-cable but nothing (good) came up. And besides, it would most likely need to be a spinning disk as SSDs larger than 512GB are still pricey.

2 I won't go into the details but Glacier charges retrieval fees in addition to storage. Still much cheaper than S3 with any kind of reasonable and infrequent activity. Here's the tl;dr from Amazon: "Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time. You can retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. If you choose to retrieve more than this amount of data in a month, you are charged a retrieval fee starting at $0.01 per gigabyte. In addition, there is a pro-rated charge of $0.03 per gigabyte for items deleted prior to 90 days."
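As a rough illustration of the terms Amazon describes in that quote, the sketch below works out the free retrieval allowance and an early-deletion fee. The 5% and $0.03 figures come from the quote; the 30-day month and the linear pro-rating of the early-deletion charge over 90 days are my assumptions, and the function names are hypothetical.

```python
FREE_FRACTION = 0.05        # "up to 5% of your average monthly storage"
EARLY_DELETE_PER_GB = 0.03  # "$0.03 per gigabyte for items deleted prior to 90 days"


def free_retrieval_gb(stored_gb, days_in_month=30):
    """Return (monthly, daily) free retrieval allowance in GB."""
    monthly = stored_gb * FREE_FRACTION
    return monthly, monthly / days_in_month


def early_delete_fee(size_gb, days_stored, minimum_days=90):
    """Early-deletion fee in dollars, assuming linear pro-rating."""
    remaining_days = max(minimum_days - days_stored, 0)
    return size_gb * EARLY_DELETE_PER_GB * remaining_days / minimum_days


if __name__ == "__main__":
    monthly, daily = free_retrieval_gb(500)
    print(f"Free retrieval on 500GB: {monthly:.0f}GB/month, {daily:.2f}GB/day")
    print(f"Deleting 10GB after 30 days: ${early_delete_fee(10, 30):.2f}")
```

With 500GB stored, that is about 25GB of free retrieval per month, which comfortably covers pulling back the occasional set of photos.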

3 Initially I did this because I thought Bayer interpolation wasn't used on the smaller file formats and it resulted in ever so slightly sharper images, but that seems to have been debunked.

4 The majority of shots at music venues tend to be groups of people posing. Those are easy to work with (some simple presets to adjust white balance, maybe a crop and little else) but it's a largely boring task to go through a few hundred photos from a night and pick the ones where people aren't blinking, et cetera. I tend to mix in some more artistic shots of performers on stage and I take my time post-processing those and tinkering with settings.

5 Just for your reference in the app so you know what photos look like the next time you wish to request a Glacier retrieval. Otherwise you'd be relying on your folder naming alone. This is similar to how Lightroom keeps thumbnails of photos lying around, even if you move the originals or disconnect an external drive.