How I Use Amazon S3

Jun 08, 2008

This post suggestion has been sitting in my Skribit account for a while so I thought I would finally address it. S3 is Amazon’s developer-aimed online storage solution. In recent years, consumer-friendly applications and tools have added support for Amazon S3, making it a viable backup solution for anyone, especially those with a fast Internet connection. The price is right too: cheap.

I won’t go through the details of getting started with S3 as I briefly mentioned that in my S3 server backup post.

What Stays and What Goes: Relying on the Cloud

As a general rule of thumb, if I have worked on a file in the last 10 days or so I keep it on my machine and on S3. If I don’t use it that often, it goes to my S3 account and I delete it from my computer. I’ve gotten into the habit of manually putting files into S3 with Transmit.
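If you would rather not eyeball modification dates, here is a rough sketch of how that 10-day rule could be checked from the shell. The helper name stale_files and the ~/Documents path are made up for illustration, and this only lists candidates; it does not upload or delete anything.

```shell
# List files under a directory that have not been modified
# for more than N days (default: 10).
# stale_files is a hypothetical helper name, not part of any tool.
stale_files() {
  dir="$1"
  days="${2:-10}"
  # -mtime +N matches files whose last modification is more than N days old
  find "$dir" -type f -mtime +"$days" -print
}
```

Running `stale_files ~/Documents` would print the files that are candidates for moving to S3 and removing locally.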

Panic Transmit S3 Folders

I have most of my files in the same S3 bucket and within that bucket I have a few folders to organize my documents.

JungleDisk rsync s3 PDFs folder

I’ll be the first to say this is not the most efficient method. Launching a separate app to transfer files interrupts what should be a seamless action. I imagine that most people would prefer using something like JungleDisk and its automatic backup feature. Once JungleDisk is fired up, your S3 account is accessible through a regular Explorer/Finder/Nautilus window.

JungleDisk Backup

Other Uses

After a comment by WordPress lead developer Mark Jaquith, I’ve also started backing up my home directory (/Users/Paul) with rsync and JungleDisk. While it is possible to back up my home directory entirely with JungleDisk, I’m more accustomed to the control that rsync gives me. I usually run a command like this once JungleDisk is active:

rsync -avvz --size-only --delete --exclude .DS_Store --exclude .Trash --exclude .svn --exclude Library/Caches --exclude Library/Mail/IMAP-myemailaccount@gatech.edu --exclude Library/Application\ Support/Inquisitor/IconsA/  --exclude Library/Logs --exclude Library/Application\ Support/SyncServices/ /Users/Paul /Volumes/JungleDisk/

That command has many excludes to make the entire rsync run quicker. Transferring a few large files is a lot faster than transferring many small files because of the overhead of opening and closing connections, latency and the like. There’s no need for me to back up my Mail.app inbox since it is already on the mail server with IMAP. If you end up running rsync often, it’s worth setting up an excludes file.
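A minimal sketch of what that excludes file could look like, reusing the same patterns as the command above. The file location (~/.rsync-excludes) and the backup_home function name are assumptions for illustration; rsync's --exclude-from option reads one pattern per line, so the spaces in paths no longer need escaping.

```shell
# Hypothetical location for the excludes file; any path works.
EXCLUDES_FILE="${EXCLUDES_FILE:-$HOME/.rsync-excludes}"

# One pattern per line, matching the --exclude flags used above.
cat > "$EXCLUDES_FILE" <<'EOF'
.DS_Store
.Trash
.svn
Library/Caches
Library/Mail/IMAP-myemailaccount@gatech.edu
Library/Application Support/Inquisitor/IconsA/
Library/Logs
Library/Application Support/SyncServices/
EOF

# Hypothetical wrapper: same options as before, with the long list of
# --exclude flags replaced by --exclude-from. Extra arguments are passed
# through to rsync.
backup_home() {
  rsync -avvz --size-only --delete \
    --exclude-from="$EXCLUDES_FILE" "$@" \
    /Users/Paul /Volumes/JungleDisk/
}
```

Since --delete removes files on the destination, it is worth previewing with `backup_home --dry-run` before running `backup_home` for real.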

JungleDisk rsync s3 home directory

Backing up your entire home directory, including music, will take a long time during the first run regardless of the connection you have. Transferring many small files such as Adium chat logs takes time. For that reason, most people will be better off using S3 for important documents and leaving larger backups to an external hard drive and imaging software like Carbon Copy Cloner or SuperDuper.
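To get a feel for how long a first run might take, here is a back-of-the-envelope sketch. The function name estimate_upload_hours is made up, the inputs are whatever you plug in rather than measurements of mine, and it ignores protocol overhead and the per-file cost of many small files, so treat the result as a lower bound.

```shell
# Rough upload time in whole hours (rounded down).
# Uses decimal units: 1 GB = 8,000,000 kilobits.
# estimate_upload_hours is a hypothetical helper name.
estimate_upload_hours() {
  size_gb="$1"       # amount of data to upload, in GB
  uplink_kbit="$2"   # sustained upload speed, in kbit/s
  echo $(( size_gb * 8 * 1000 * 1000 / uplink_kbit / 3600 ))
}
```

For example, `estimate_upload_hours 20 256` prints 173, or roughly a week of continuous uploading for 20 GB over a 256 kbit/s uplink.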

Rsync backup music jungledisk
Example rsync backing up music to S3 via JungleDisk. Bandwagon is easier to work with though.

Do you use Amazon S3? How so? All of that being said, if my laptop (my only computer, except for a window manager-less media server) were stolen right now I wouldn’t lose enough data to worry about. Most everything I could possibly need is on Amazon S3, Gmail and a hosted SVN account.

35 Comments

  1. I’m curious: how much data do you have in there, and how much (on average) do you end up paying?

  2. I paid $11 last month and I would say I have maybe 20 gigs on there. But I also use S3 to serve public files, such as some things on this site, so I don’t think it’s safe to go by my numbers.

  3. I’m still backing up a good deal of my work files to S3, but have migrated my personal files (music, movies, apps) over to Mozy. For personal use they have an unlimited plan for around $5 a month.

  4. I’ve been waiting to hear exactly how well this works, thanks for the article. I’m curious how reliably Time Machine is supported…

  5. Out of curiosity what is your backup plan should your MBA die (heaven forbid) from a hardware point of view? Obviously you would have access to all of your files, but the tools and environment you are used to would be lacking in a 1 computer setup.

    Back on topic: I’ve played around with S3 but don’t make regular use of the service. It makes for a nice emergency file storage option if I happen to be on a fast network and don’t have room on my laptop or server, but it just isn’t feasible in my everyday situation: Time Warner 6 Mbps down, 256 kbit up if I’m lucky. Just transferring working files (~8GB) would be a painfully long first upload. Additionally, to back up all of my digital assets (~250GB) the cost is rather high ($37.50/mo). In comparison, the two 500GB external Western Digital drives that I bought have paid for themselves (one for local backups/storage; the second mirrors the first and is taken off-site).

    That’s my situation anyway. I am considering using it to back up my server, however, for a faster restore in a catastrophic event.

  6. Nice post! How do you use S3 with SVN?

  7. The initial price of a backup seems like quite a bit. The idea of backing up to S3 sounds appealing, but the speed of my internet connection would be my concern. I have one question for you, Paul: does JungleDisk just back up the changed files?

    Tim Trueman: I don’t believe that Time Machine works with S3 at the moment, but it would be nice if it did.

    Thanks for the post; it was a really good insight into how you use S3.

  8. Been waiting for a writeup on S3, now thinking it might be overkill to keep synced backups externally and cloudy. Manually syncing (or rather dumping) works great at first, but volumes of music+pictures change incrementally not uniformly. I feel like sync-management software is always bloating in process of updating. Liking the looks of Drobo the Storage Robot though.

    SidebarAds? A few somebodies must think you’re special :)

  9. You say..

    ‘If I don’t use it that often, it goes to my S3 account and I delete it from my computer’

    Of course this means you only have a single copy of a file i.e. no backup. Do you consider Amazon’s S3 a secure and reliable enough service to be a sole point of storage? Do you not also quickly throw stuff onto an external drive as a failsafe?

  10. I like the idea of online data storage, but until upload speeds increase, I think local solutions such as Time Machine are still the way to go for backups. If you use smaller files on the go though, something like this seems to be a good solution. Does anyone use Dropbox?

  11. I’ve been using S3 with JungleDisk for a little over a month now. I have no idea how I got along without it. I backed up all of the pictures I’ve taken as well as my personal files and it cost me about $4 for the month. That’s about 25 GB of data.

    I’ve got JungleDisk set up to only back up specific directories: ~/pictures, ~/documents, and a few others. I found out that I have a ton of junk on my computer that would be quite easy to replace. Downloaded software would be a pain to get back, but I don’t really think it is worth paying money to back it up.

    The one problem I’ve come across with doing this sort of backup is that you can’t change directory structure without reuploading everything. I don’t like the structure of my pictures directory, but with 20+ gigs of data it would be costly to switch the format. For new pictures I’ve changed the structure, but I have yet to go through the hassle of moving the old files.

  12. I would suggest Mozy if you plan on backing up more than 30GBs. The unlimited plan at $5/month can’t be beat and their tool is basic and the equivalent to rsync.

    What I do:
    RAID setup for local backup of important files
    Backup my servers to my local RAID
    Backup as much as I can through Time Machine
    and Backup everything to Mozy 50+GB

    Hope that helps someone.

  13. I started using it to serve static content for my sites. I started with my blog using “Amazon S3 for WordPress”, which moves files I upload with WordPress. Then I used S3Fox for the other files. But I realized S3Fox didn’t set the Expires headers, so I took apart the WordPress plugin to figure out how it did it and made my own hack of a script to upload the rest of my content. (I will probably release it when it’s user friendly.) My next move is to host all the user-generated images on S3 so I can reduce the load on my dedicated server. After realizing that I’m using up Apache slots to serve it all, moving to S3 will help in all areas.

    Then I also figured out how to use “BackupManager” on my server to copy the MySQL database over daily. It can do home directories, incremental, and syn too but I’m just using it for DB right now (have NAS for all that). Even wrote a tutorial on how to set it up!

    I want to thank you for those other posts Paul, because for whatever reason that’s what motivated me to try out S3. I think previously I just thought of it as some difficult thing to use but you pointed me towards the tools (s3fox) that got me excited about it. And PS - Your OpenID field showed me the URL to the last person’s OpenID (Dan Cameron)

  14. Out of curiosity what is your backup plan should your MBA die (heaven forbid) from a hardware point of view? Obviously you would have access to all of your files, but the tools and environment you are used to would be lacking in a 1 computer setup.

    Back on topic: I’ve played around with S3 but don’t make regular use of the service. It makes for a nice emergency file storage option if I happen to be on a fast network and don’t have room on my laptop or server, but it just isn’t feasible in my everyday situation: Time Warner 6 Mbps down, 256 kbit up if I’m lucky. Just transferring working files (~8GB) would be a painfully long first upload. Additionally, to back up all of my digital assets (~250GB) the cost is rather high ($37.50/mo). In comparison, the two 500GB external Western Digital drives that I bought have paid for themselves (one for local backups/storage; the second mirrors the first and is taken off-site).

    That’s my situation anyway. I am considering using it to back up my web server for a faster restore in a catastrophic event.

  15. Have you taken a look at Mozy (http://mozy.com/)?

    $5 unlimited storage. Native Mac client. I’ve been really torn between going with Mozy or an S3 solution similar to what you’re running. I need to try out their MozyHome Free, which gives you 2 GB of backup.

  16. You say, “I wouldn’t lose enough data to worry about.” But, would that small amount of data be valuable to the person who stole your computer? Do you have any kind of hard drive encryption going on to prevent someone abusing your data?

    As far as my own data goes, I built a RAID-6 server that holds a backup of all my files and media (actually, it’s the primary storage for my media.) My laptop hard drive is encrypted to prevent my data from falling into the wrong hands.

  17. @Oli: I do use Dropbox and find it amazing. As a tester I got 5 gigs free but in the future they plan to have storage options. They have based their service on S3 so that should be cheap.

    I don’t need much storage, and I don’t understand why people back up their entire hard drive. I format my MBP way too often for that. Instead, I have simply coupled Dropbox with Backup.app to run regular incremental backups of my Documents folder. My media usually stays on a separate external hard drive, and I might actually invest in a Time Capsule to access it wirelessly from the Finder.

    The advantage of Dropbox, IMO is the web interface access and the seamless experience. It’s actually really easy and very fast.

  18. I use S3 as well… and with CyberDuck.

    What I would love to have however is a TimeMachine that supports S3…

  19. Can it save metadata, like Spotlight indexing information?

    Is there an easy way to do (client-side) encryption?

  20. @Tim - “I’m curious how reliably Time Machine is supported…”

    I tried getting Time Machine to back up to JungleDisk using the unsupported network volumes plist hack but that didn’t work. =/

    @ThinkingSerious - “Nice post! How do you use S3 with SVN?”

    Oh I was talking about something separate and unrelated - Assembla.com SVN hosting for Skribit-related code/files.

    @Zac - “The one problem I’ve come across with doing this sort of backup is that you can’t change directory structure without reuploading everything”

    I hate that too!

    @Dan - “I would suggest Mozy if you plan on backing up more than 30GBs. The unlimited plan at $5/month can’t be beat and their tool is basic and the equivalent to rsync.”

    I would probably consider Mozy, but I already use S3 for many things, such as server and Skribit backups, so it’s more convenient for me, and I trust Amazon more. Does Mozy have an SLA (i.e., a 99.99% uptime guarantee)?

  21. On the subject of file backup, sharing and storage …

    Online backup is becoming common these days. It is estimated that 70-75% of all PCs will be connected to online backup services within the next decade.

    Thousands of online backup companies exist, from one guy operating in his apartment to Fortune 500 companies.

    Choosing the best online backup company will be very confusing and difficult. One website I find very helpful in making a decision to pick an online backup company is:

    http://www.BackupReview.info

    This site lists more than 400 online backup companies in its directory and ranks the top 25 on a monthly basis.

  22. Has anyone used box.net? I’ve posted about it on my blog: http://www.the-iblog.com/2008/06/08/boxnet-online-storage-with-an-iphone-twist/

    Really quite good if you have an iPhone.

  23. Right now I’m backing up my photos to my Media Temple’s GS. I don’t think it will be a problem, as I’ve 100GB Storage and 1TB Bandwidth. Just would like to know if I can rely on them. I think so. What do you think?

  24. Pretty straightforward guide to using s3 as a personal backup solution.

    I use s3 for my backups (because I use ec2 for hosting, so it only made sense).

    The major problem that I’ve run into is that in s3, the concept of a folder is pretty foreign, and every backup tool has its own way of handling that.

    This is a problem when I want to (or have to) use two different tools for backing up and restoring my data. I have a script that backs up my web files and my databases, but I can’t then log in with Transmit and see them, because the two use different folder schemes.

    The ‘one giant folder’ idea doesn’t really work, in my mind, as it gives a lot of overhead, in one form or another.

    You could either rename all of your files into their folder structure name with some other character, and then parse them back, or you could check the uniqueness of the names, and then keep their folder information in a database for retrieval and reconstitution…but then you’re dependent on another database.

    I’m still pretty keen on s3 as a whole, but I wish they would get folders figured out.

  25. Does JungleDisk just back up the changed files?

    Yes. My JungleDisk home directory backup is about 60GB, but it manages to do the “changed files only” backup before I wake up (it lets you schedule it). I pay less than $15 a month (which is nothing for the peace of mind that it provides.)

  26. Good post.

    But to be honest, I think I would just love something a bit more... Mac-esque. I understand that this is probably the best option for backing up critical data, but there’s just something inside me that’s saying ‘I want a hard drive’. I can’t really put my finger on it, to be honest.

    And also, I have not had the pleasure of playing with Time Machine yet, but I intend to do so in the very near future, which I won’t be able to do if I used Amazon S3 and any of the (very practical - I must admit) solutions here.

    It is a shame, really, that things just can’t ‘get along’ and we can mix and match our favourite programs with our favourite means of doing things.

    Still, each to their own!

  27. very nice article.
    I wrote a similar review of services for online storage I’m reviewing here:
    http://davidkanter.com/post/38322200/cloud-storage-options-got-you-confused

  28. I discovered Memopal (www.memopal.com), a “cutting edge solution for online backup”.

    They merged online backup, online storage and file sharing services into one product.

    If you try this service you will notice that (contrary to most competitors):
    - You can access your files in (true) real time with a web browser
    - They really offer 250 GB (some competitors offer a fake unlimited web space, they say “fair use”)
    - You can share a file or many files with the 1-click-share functionality
    - Some of your files will be uploaded very very fast (turboupload)
    - The service and website are in 10 different languages

    I’ve also found two useful guide to online backup on Wikipedia:
    http://en.wikipedia.org/wiki/Online_backup

  29. I’ve modified this to better suit CPanel based sites with sql support at http://duivesteyn.net/2008/amazon-s3-backup-for-webserver-public_html-sql-bash/

    hope it helps someone

    I’m paying a few dollars a month

  1. [...] Stamatiou has a write-up of how he uses Amazon S3 as his cloud backup. If you have less than 2GB to backup, the free MozyHome or IDrive might be [...]

  2. [...] How I Use Amazon S3 S3 is Amazon’s developer-aimed online storage solution. In recent years, consumer-friendly applications and tools have added support for Amazon S3, making it a viable backup solution for anyone, especially those with a fast Internet … [...]

  3. [...] or only upload small-ish files. Amazon offer the S3 solution (explained more by Paul Stamatiou here) and dropcopy is also used by [...]

  4. [...] How I Use Amazon S3- Read Paul’s method to use Amazon S3 with rsync and JungleDisk. [...]

  5. [...] How I Use Amazon S3- Read Paul’s method to use Amazon S3 with rsync and JungleDisk. [...]

  6. [...] How I Use Amazon S3- how Paul used Amazon S3 with rsync and JungleDisk. [...]

Copyright © 2005 - 2008 PaulStamatiou.com