How I Use Amazon S3

June 8, 2008 · 40 comments

This post suggestion has been sitting in my Skribit account for a while, so I thought I would finally address it. S3 is Amazon’s developer-aimed online storage service. In recent years, consumer-friendly applications and tools have added support for Amazon S3, making it a viable backup solution for anyone, especially those with a fast Internet connection. The price is right too: cheap.

I won’t go through the details of getting started with S3, since I briefly covered that in my S3 server backup post.

What Stays and What Goes: Relying on the Cloud

As a general rule of thumb, if I have worked on a file in the last 10 days or so, I keep it on my machine and on S3. If I don’t use it that often, it goes to my S3 account and I delete it from my computer. I’ve gotten into the habit of manually putting files into S3 with Transmit.
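A rough way to spot files that fall into the second group is to ask find for anything untouched in the last 10 days. This is only a sketch: it assumes modification time is a fair stand-in for "worked on", and the ~/Documents default is a hypothetical folder, not necessarily where the files live.

```shell
#!/bin/sh
# Hypothetical folder to scan; override by setting DOCS before running.
DOCS="${DOCS:-$HOME/Documents}"

# Print regular files under $1 that were last modified more than
# 10 full days ago. Modification time is only a proxy for "worked on".
stale_files() {
  find "$1" -type f -mtime +10 -print
}

# Only list candidates; uploading to S3 and deleting stays a manual step.
if [ -d "$DOCS" ]; then
  stale_files "$DOCS"
fi
```

The script only prints a list, so nothing is moved or deleted until the files have been uploaded and checked by hand.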

Panic Transmit S3 Folders

I have most of my files in the same S3 bucket and within that bucket I have a few folders to organize my documents.

JungleDisk rsync s3 PDFs folder

I’ll be the first to say this is not the most efficient method. Launching a separate app to transfer files breaks up what should be a seamless action. I imagine that most people would prefer using something like JungleDisk and its automatic backup feature. Once JungleDisk is fired up, your S3 account is accessible through a regular Explorer/Finder/Nautilus window.

JungleDisk Backup

Other Uses

After a comment by Mark Jaquith, a lead developer for WordPress, I’ve additionally started backing up my home directory (/Users/Paul) with rsync and JungleDisk. While it is possible to back up my home directory entirely with JungleDisk, I’m more accustomed to having control with rsync. I usually run a command like this after JungleDisk is active:

rsync -avvz --size-only --delete --exclude .DS_Store --exclude .Trash --exclude .svn --exclude Library/Caches --exclude Library/Mail/IMAP-myemailaccount@gatech.edu --exclude Library/Application\ Support/Inquisitor/IconsA/  --exclude Library/Logs --exclude Library/Application\ Support/SyncServices/ /Users/Paul /Volumes/JungleDisk/

That command has many excludes to make the entire rsync quicker. Transferring a few large files is a lot faster than transferring many small files due to opening and closing connections, latency and the like. There’s no need for me to back up my Mail.app inbox since it is already on the mail server with IMAP. If you end up running rsync often, it’s worth setting up an excludes file.
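An excludes file might look something like the sketch below. The file location is a hypothetical choice, and the patterns are a subset of the ones from the command above. --exclude-from and --dry-run are standard rsync options; the dry run is there so the list can be checked before anything is copied or deleted.

```shell
#!/bin/sh
# Hypothetical location for the excludes file; any path works.
EXCLUDES="${TMPDIR:-/tmp}/rsync-excludes.txt"

# One pattern per line. Spaces need no backslash escaping inside the file.
cat > "$EXCLUDES" <<'EOF'
.DS_Store
.Trash
.svn
Library/Caches
Library/Logs
Library/Application Support/SyncServices/
EOF

# --dry-run reports what would be transferred without copying or deleting.
# Drop it once the output looks right. The guard skips the run when rsync
# is missing or the JungleDisk volume is not mounted.
if command -v rsync >/dev/null 2>&1 && [ -d /Volumes/JungleDisk ]; then
  rsync -avz --size-only --delete --dry-run \
    --exclude-from="$EXCLUDES" /Users/Paul /Volumes/JungleDisk/
fi
```

Keeping the patterns in a file makes the command short enough to retype, and the list can grow without touching the command itself.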

JungleDisk rsync s3 home directory

Backing up your entire home directory, including music, will take a long time during the first run regardless of the connection you have. Transferring many small files such as Adium chat logs takes time. For that reason, most people will be better off using S3 for important documents and leaving larger backups to an external hard drive and imaging software like Carbon Copy Cloner or SuperDuper!.

Rsync backup music jungledisk
Example rsync backing up music to S3 via JungleDisk. Bandwagon is easier to work with though.

All of that being said, if my laptop (my only computer, except for a media server with no window manager) were stolen right now, I wouldn’t lose enough data to worry about. Almost everything I could possibly need is on Amazon S3, Gmail and a hosted SVN account.

Do you use Amazon S3? How so?

{ 9 trackbacks }

Weekend Reading: June 08 2008
June 8, 2008 at 12:44 pm
A » Blog Archive » How I Use Amazon S3
June 8, 2008 at 2:57 pm
Box.net - online storage with an iPhone twist | The-iBlog
June 8, 2008 at 4:32 pm
Amazon S3 - The Beginner’s Guide | Web Tricks
September 22, 2008 at 8:05 am
Блог Волотко Дмитрия - Это нормально © :: Entries :: Amazon S3 - руководство для начинаю
September 22, 2008 at 9:00 am
Amazon S3 - The Beginner’s Guide | Web Burning Blog
October 19, 2008 at 7:48 am
Holiday Giveaway: 1.5TB of Seagate Drives - PaulStamatiou.com
December 7, 2008 at 2:16 pm
Elle date de combien de temps ta dernière sauvegarde? | PoXd - The Pollux World
February 26, 2009 at 12:55 pm

{ 31 comments… read them below or add one }

1 Guillermo Esteves June 8, 2008 at 2:13 am

I’m curious: how much data do you have in there, and how much (on average) do you end up paying?

2 Paul Stamatiou June 8, 2008 at 2:29 am

I paid $11 last month and I would say I have maybe 20 gigs on there. But I also use S3 to serve public files, such as some things on this site, so I don’t think it’s safe to go by my numbers.

3 Michael Buckbee June 8, 2008 at 2:48 am

I’m still backing up a good deal of my work files to S3, but have migrated my personal files (music, movies, apps) over to Mozy. For personal use they have an unlimited plan for around $5 a month.

4 Tim Trueman June 8, 2008 at 3:08 am

I’ve been waiting to hear exactly how well this works, thanks for the article. I’m curious how reliably Time Machine is supported…

5 Jon Stacey June 8, 2008 at 4:11 am

Out of curiosity what is your backup plan should your MBA die (heaven forbid) from a hardware point of view? Obviously you would have access to all of your files, but the tools and environment you are used to would be lacking in a 1 computer setup.

Back on topic: I’ve played around with S3 but don’t make regular use of the service. It makes for a nice emergency file storage option if I happen to be on a fast network and don’t have room on my laptop or server, but it just isn’t feasible in my everyday situation: Time Warner 6 Mbps down, 256 kbit/s up if I’m lucky. Just transferring working files (~8GB) would be a painfully long first upload. Additionally, to back up all of my digital assets (~250GB) the cost is rather high ($37.50/mo). In comparison, the two 500GB external Western Digital drives that I bought have paid for themselves (1 for local backups/storage, the 2nd mirrors the 1st and is taken off-site).

That’s my situation anyways. I am considering using it to back up my server, however, for a faster restore in a catastrophic event.
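For a sense of scale, the figures in this comment can be run through a quick calculation. This is a back-of-the-envelope sketch: it assumes decimal units (1 GB = 8,000,000 kilobits), a line that sustains its full rated speed, and no protocol overhead. The function names are made up for the illustration.

```shell
#!/bin/sh
# Hours needed to upload $1 gigabytes at $2 kilobits per second.
# Assumes 1 GB = 8,000,000 kilobits and a fully saturated uplink.
upload_hours() {
  awk -v gb="$1" -v kbit="$2" \
    'BEGIN { printf "%.1f\n", gb * 8000000 / kbit / 3600 }'
}

# Implied price per gigabyte for a monthly bill of $1 dollars on $2 gigabytes.
cost_per_gb() {
  awk -v dollars="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", dollars / gb }'
}

upload_hours 8 256      # prints 69.4, close to three days for 8 GB
cost_per_gb 37.50 250   # prints 0.15, dollars per GB per month
```

Real uploads would likely take longer, since the quoted 256 kbit/s is a best case.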

6 ThinkingSerious June 8, 2008 at 4:21 am

Nice post! How do you use S3 with SVN?

7 rmaspero June 8, 2008 at 4:37 am

The initial price of a backup seems quite a bit. The idea of backing up to S3 sounds appealing but the speed of my internet would be my concern. I have one question for you, Paul: does Jungle disk just back up the changed files?

Tim Trueman, I don’t believe that Time Machine works with S3 at the moment, but that would be nice if it did.

Thanks for the post; it was a really good insight into how you use S3.

8 Brendan Falkowski June 8, 2008 at 8:13 am

Been waiting for a writeup on S3, now thinking it might be overkill to keep synced backups externally and cloudy. Manually syncing (or rather dumping) works great at first, but volumes of music+pictures change incrementally not uniformly. I feel like sync-management software is always bloating in process of updating. Liking the looks of Drobo the Storage Robot though.

SidebarAds? A few somebodies must think you’re special :)

9 Tim Fletcher June 8, 2008 at 10:38 am

You say..

‘If I don’t use it that often, it goes to my S3 account and I delete it from my computer’

Of course this means you only have a single copy of a file i.e. no backup. Do you consider Amazon’s S3 a secure and reliable enough service to be a sole point of storage? Do you not also quickly throw stuff onto an external drive as a failsafe?

10 Oli from the-iBlog.com June 8, 2008 at 10:46 am

I like the idea of online data storage, but until upload speeds increase, I think local solutions such as Time Machine are still the way to go for backups. If you use smaller files on the go though, something like this seems to be a good solution. Does anyone use Dropbox?

11 Zac Garrett June 8, 2008 at 11:33 am

I’ve been using S3 with jungledisk for a little over a month now. I have no idea how I got along without it. I backed up all of the pictures I’ve taken as well as my personal files and it cost me about $4 for the month. That’s about 25gb of data.

I’ve got JungleDisk set up to back up only specific directories: ~/pictures, ~/documents, and a few others. I found out that I have a ton of junk on my computer that would be quite easy to replace. Downloaded software would be a pain to get back, but I don’t really think it is worth paying money to back it up.

The one problem I’ve come across with doing this sort of backup is that you can’t change directory structure without reuploading everything. I don’t like the structure of my pictures directory, but with 20+ gigs of data it would be costly to switch the format. For new pictures I’ve changed the structure, but I have yet to go through the hassle of moving the old files.

12 Dan Cameron June 8, 2008 at 11:48 am

I would suggest Mozy if you plan on backing up more than 30GBs. The unlimited plan at $5/month can’t be beat and their tool is basic and the equivalent to rsync.

What I do:
RAID setup for local backup of important files
Backup my servers to my local RAID
Backup as much as I can through Time Machine
and Backup everything to Mozy 50+GB

Hope that helps someone.

13 Tim Linden June 8, 2008 at 12:20 pm

I started using it to serve static content for my sites. Started with my blog using “Amazon S3 for Wordpress”, which moves files I upload with WordPress. Then I used S3Fox for the other files. But I realized S3Fox didn’t set the Expires headers, so I took apart the WordPress plugin to figure out how it did it and made my own hack of a script to upload the rest of my content (will probably release it when it’s user friendly). My next move is to host all the user-generated images on S3 so I can reduce the load on my dedicated server. After realizing that I’m using up Apache slots to serve it all, moving to S3 will help in all areas.

Then I also figured out how to use “BackupManager” on my server to copy the MySQL database over daily. It can do home directories, incremental, and syn too but I’m just using it for DB right now (have NAS for all that). Even wrote a tutorial on how to set it up!

I want to thank you for those other posts Paul, because for whatever reason that’s what motivated me to try out S3. I think previously I just thought of it as some difficult thing to use but you pointed me towards the tools (s3fox) that got me excited about it. And PS – Your OpenID field showed me the URL to the last person’s OpenID (Dan Cameron)

14 Jon Stacey June 8, 2008 at 12:31 pm

Out of curiosity what is your backup plan should your MBA die (heaven forbid) from a hardware point of view? Obviously you would have access to all of your files, but the tools and environment you are used to would be lacking in a 1 computer setup.

Back on topic: I’ve played around with S3 but don’t make regular use of the service. It makes for a nice emergency file storage option if I happen to be on a fast network and don’t have room on my laptop or server, but it just isn’t feasible in my everyday situation: Time Warner 6 Mbps down, 256 kbit/s up if I’m lucky. Just transferring working files (~8GB) would be a painfully long first upload. Additionally, to back up all of my digital assets (~250GB) the cost is rather high ($37.50/mo). In comparison, the two 500GB external Western Digital drives that I bought have paid for themselves (1 for local backups/storage, the 2nd mirrors the 1st and is taken off-site).

That’s my situation anyways. I am considering using it to back up my web server for a faster restore in a catastrophic event.

15 Matthew Williams June 8, 2008 at 12:46 pm

Have you taken a look at Mozy (http://mozy.com/)?

$5 unlimited storage. Native Mac client. I’ve been really torn between going with Mozy or an S3 solution similar to what you’re running. I need to try out their MozyHome Free which gives you 2gb of backup.

16 James Cassell June 8, 2008 at 12:53 pm

You say, “I wouldn’t lose enough data to worry about.” But, would that small amount of data be valuable to the person who stole your computer? Do you have any kind of hard drive encryption going on to prevent someone abusing your data?

As far as my own data goes, I built a RAID-6 server that holds a backup of all my files and media (actually, it’s the primary storage for my media.) My laptop hard drive is encrypted to prevent my data from falling into the wrong hands.

17 GuillaumeB June 8, 2008 at 1:24 pm

@Oli: I do use Dropbox and find it amazing. As a tester I got 5 gigs free but in the future they plan to have storage options. They have based their service on S3 so that should be cheap.

I don’t need much storage, and I don’t understand why people back up their entire hard drive. I format my MBP way too often for that. Instead, I have simply coupled Dropbox with Backup.app to run regular incremental backups of my Documents folder. My media usually stays on a separate external hard drive, and I might actually invest in a Time Capsule to access it wirelessly from the Finder.

The advantage of Dropbox, IMO is the web interface access and the seamless experience. It’s actually really easy and very fast.

18 Julien June 8, 2008 at 1:27 pm

I use S3 as well… and with CyberDuck.

What I would love to have however is a TimeMachine that supports S3…

19 ken June 8, 2008 at 2:09 pm

Can it save metadata, like Spotlight indexing information?

Is there an easy way to do (client-side) encryption?

20 Paul Stamatiou June 8, 2008 at 3:47 pm

@Tim – “I’m curious how reliably Time Machine is supported…”

I tried getting Time Machine to backup to JungleDisk using the unsupported network volumes plist hack but that didn’t work. =/

@ThinkingSerious – “Nice post! How do you use S3 with SVN?”

Oh I was talking about something separate and unrelated – Assembla.com SVN hosting for Skribit-related code/files.

@Zac – “The one problem I’ve come across with doing this sort of backup is that you can’t change directory structure without reuploading everything”

I hate that too!

@Dan – “I would suggest Mozy if you plan on backing up more than 30GBs. The unlimited plan at $5/month can’t be beat and their tool is basic and the equivalent to rsync.”

I would probably consider Mozy, but I use S3 for many things already such as server and Skribit backups so it’s more convenient for me.. and I trust Amazon more. Does Mozy have an SLA (ie, 99.99% uptime guarantee)?

21 Jennifer June 8, 2008 at 3:58 pm

On the subject of file backup, sharing and storage …

Online backup is becoming common these days. It is estimated that 70-75% of all PCs will be connected to online backup services within the next decade.

Thousands of online backup companies exist, from one guy operating in his apartment to Fortune 500 companies.

Choosing the best online backup company will be very confusing and difficult. One website I find very helpful in making a decision to pick an online backup company is:

http://www.BackupReview.info

This site lists more than 400 online backup companies in its directory and ranks the top 25 on a monthly basis.

22 Oli from the-iBlog June 8, 2008 at 4:36 pm

Has anyone used box.net? I’ve posted about it on my blog: http://www.the-iblog.com/2008/06/08/boxnet-online-storage-with-an-iphone-twist/

Really quite good if you have an iPhone.

23 Daniel Andrade June 8, 2008 at 10:13 pm

Right now I’m backing up my photos to my Media Temple’s GS. I don’t think it will be a problem, as I’ve 100GB Storage and 1TB Bandwidth. Just would like to know if I can rely on them. I think so. What do you think?

24 Issac June 9, 2008 at 8:31 am

Pretty straightforward guide to using s3 as a personal backup solution.

I use s3 for my backups (because I use ec2 for hosting, so it only made sense).

The major problem that I’ve run into is that in s3, the concept of a folder is pretty foreign, and every backup tool has its own way of handling that.

This is a problem when I want to (or have to) use two different tools for backing up, and restoring my data. I have a script that backs up my web files and my databases, but I can’t then login with transmit and see it, because they use a different folder scheme.

The ‘one giant folder’ idea doesn’t really work, in my mind, as it gives a lot of overhead, in one form or another.

You could either rename all of your files into their folder structure name with some other character, and then parse them back, or you could check the uniqueness of the names, and then keep their folder information in a database for retrieval and reconstitution…but then you’re dependent on another database.

I’m still pretty keen on s3 as a whole, but I wish they would get folders figured out.

25 Mark Jaquith June 9, 2008 at 2:59 pm

does Jungle disk just back up the changed files.

Yes. My JungleDisk home directory backup is about 60GB, but it manages to do the “changed files only” backup before I wake up (it lets you schedule it). I pay less than $15 a month (which is nothing for the peace of mind that it provides.)

26 Aron Clark June 12, 2008 at 4:22 pm

Good post.

But to be honest, I think I would just love something a bit more.. Mac-esque. I understand that this is probably the best option for backing up critical data, but there’s just something inside me that’s saying ‘I want a hard drive’. I can’t really put my finger on it, to be honest.

And also, I have not had the pleasure of playing with Time Machine yet, but I intend to do so in the very near future, which I won’t be able to do if I used Amazon S3 and any of the (very practical – I must admit) solutions here.

It is a shame, really, that things just can’t ‘get along’ and we can mix and match our favourite programs with our favourite means of doing things.

Still, each to their own!

27 David Kanter June 13, 2008 at 7:10 pm

very nice article.
I wrote a similar review of services for online storage I’m reviewing here:
http://davidkanter.com/post/38322200/cloud-storage-options-got-you-confused

28 michelle79 June 25, 2008 at 9:43 am

I discovered Memopal (www.memopal.com), a “cutting edge solution for online backup”.

They merged online backup, online storage and file sharing services into one product.

If you try this service you will notice that (contrary to most competitors):
- You can access your files in (true) real time with a web browser
- They really offer 250 GB (some competitors offer a fake unlimited web space, they say “fair use”)
- You can share a file or many files with the 1-click-share functionality
- Some of your files will be uploaded very very fast (turboupload)
- The service and website are in 10 different languages

I’ve also found two useful guide to online backup on Wikipedia:
http://en.wikipedia.org/wiki/Online_backup

29 duivesteyn July 6, 2008 at 4:49 am

I’ve modified this to better suit CPanel based sites with sql support at http://duivesteyn.net/2008/amazon-s3-backup-for-webserver-public_html-sql-bash/

hope it helps someone

i’m paying a few dollars a month

30 David Hamer November 3, 2008 at 8:11 am

Thanks Paul – really interesting stuff. I’ve been checking out S3 to store video files for my websites. The only issue I’m struggling with is S3Fox – sometimes it works fine and then other times it just won’t start to upload – so it’s inconsistent. I’m not really into the technical side like you guys so I’m just browsing around to try and find out what the problem might be?

31 Vinodh Ramasubramanian May 20, 2009 at 3:31 pm

Nice writeup on S3.

I think one major advantage of S3 is that it could be used as an archive and the storage/backup is not tied to one PC, unlike Mozy or Carbonite, which would remove the file from the online store if it is removed locally.

I am currently evaluating different solutions and JungleDisk with S3 seems like the way to go, especially because JungleDisk gives you a perpetual license on their software for just $20.
