My last startup update discussed Skribit's recent redesign and entry into the GRA/TAG Business Launch competition. Since then a few things have been keeping us busy. For one, we have had more and more server issues. Skribit doesn't use much in the way of CPU resources, but it can take all the RAM it can get, and we've had a few annoying downtime episodes in the recent past.
Scaling Step 1: Find Your Bottlenecks (Assess Your WTFs)
Up until now (well, still now, but not for long after this post is published) Skribit has been using a Media Temple (dv) server with 1.3GB of physical RAM, 40GB of hard drive space and dual dual-core Xeon processors in an HP Proliant G4 server running Ubuntu. With each mongrel and mysql instance using around 60MB of RAM, Skribit quickly became very memory hungry.
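To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Ruby. The 1.3GB and 60MB figures come from the paragraph above; the `max_processes` helper and the idea of reserving some RAM for everything else on the box are assumptions made purely for illustration, not measurements from our server.

```ruby
# Rough memory budget. Only the 1.3GB and ~60MB figures come from the post;
# the reserved amount passed to max_processes is a made-up illustration.
TOTAL_RAM_MB   = (1.3 * 1024).round # 1.3GB of physical RAM, about 1331 MB
PER_PROCESS_MB = 60                 # approximate size of one mongrel or mysql instance

# How many ~60MB processes fit once some RAM is set aside for everything else
# (OS, caches, other daemons). Integer division rounds down.
def max_processes(reserved_mb)
  (TOTAL_RAM_MB - reserved_mb) / PER_PROCESS_MB
end

puts max_processes(0)   # upper bound with nothing else running
puts max_processes(300) # with a hypothetical 300 MB held back
```

Even the optimistic upper bound is only a couple dozen processes, which is why a few leaky mongrels are enough to cause trouble.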
Of course there are always things you can do before upgrading hardware that will help you scale, and that's best practice: things like memcached, or caching frequently accessed items and the results of slow queries. We actually ran into an issue of fragment caching too much and it was eating up a good bit of RAM, but I digress. The first thing you need to do is find out why your app is getting slower and slower and your server can't keep up anymore.
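The caching trade-off above (cache the slow stuff, but don't let the cache eat all your RAM) can be sketched as a size-bounded cache. This is a minimal plain-Ruby sketch under stated assumptions, not Skribit's code and not how memcached or Rails fragment caching work internally: the `BoundedCache` class, its `max_entries` parameter and the cache key in the usage example are all hypothetical. It also relies on Ruby hashes preserving insertion order (Ruby 1.9 and later).

```ruby
# Hypothetical size-bounded, least-recently-used cache (illustration only).
class BoundedCache
  def initialize(max_entries)
    @max_entries = max_entries
    @store = {} # insertion order doubles as least-recently-used order
  end

  # Return the cached value for key, or run the block, store and return it.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert to mark as most recently used
      @store[key] = value
      return value
    end
    value = yield
    @store[key] = value
    @store.shift if @store.size > @max_entries # evict the oldest entry
    value
  end

  def size
    @store.size
  end

  def key?(key)
    @store.key?(key)
  end
end

# Usage: the expensive block only runs on a cache miss.
cache = BoundedCache.new(100)
cache.fetch(:popular_suggestions) { "result of a slow query" }
cache.fetch(:popular_suggestions) { raise "not reached on a cache hit" }
```

Capping the number of entries is the simplest way to keep a cache from growing without limit; a real setup would more likely cap by bytes or rely on memcached's own memory limit.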
For that, we use a myriad of Unix system administration and Rails monitoring tools:
I love htop.. put it on all my linux boxes.
Monit tracks processes and restarts them if they get unruly.. which happens often. For example, it will restart a mongrel that meets this criterion: "If Memory amount limit (incl. children) greater than 66560 1 times within 1 cycle(s) then restart else if passed 1 times within 1 cycle(s) then alert"
We used to use exception logger but switched to Hoptoadapp after exception logger started using some resources (we started seeing some items from the slow query log coming from exception logger), and we like Hoptoad.
Hoptoad App telling us what exceptions need to be fixed.
Some example reports from the free version of New Relic RPM
Fiveruns TuneUp telling me what takes up the most time on a page load and breaks it down by MVC. It also lets you click on any action and opens up the responsible file to the exact line in your default text editor. Calvin has a guide for using TuneUp in a separate environment.
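For the curious, the Monit rule quoted above boils down to very simple logic. Here it is restated as a small Ruby sketch. Monit is configured in its own control file, not in Ruby, so this is only an illustration of the decision it makes each cycle. The `monit_action` method name is hypothetical, and I'm assuming the 66560 figure is in kilobytes (which works out to 65 MB) and that the "passed" clause means "alert once the check recovers".

```ruby
# Hypothetical restatement of the Monit rule quoted above (illustration only).
MEMORY_LIMIT_KB = 66_560 # assumed to be kilobytes, i.e. 65 MB

# Decide what to do after one monitoring cycle.
# total_memory_kb: memory used by the process, including its children.
# was_failing:     whether the previous cycle was over the limit.
def monit_action(total_memory_kb, was_failing)
  if total_memory_kb > MEMORY_LIMIT_KB
    :restart # over the limit for one cycle: restart the mongrel
  elsif was_failing
    :alert   # back under the limit after a failure: send an alert
  else
    :none
  end
end

puts monit_action(70_000, false)
puts monit_action(60_000, true)
```

With a limit that close to a mongrel's normal 60MB footprint, restarts happen often, which matches what we see.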
Do you use any similar server/app monitoring tools?
Scaling Step 2: What does it all mean?
So you have more information than you know what to do with. Now what? If your data is anything like ours, there isn't just one magical fix. It's an ongoing process and there are tons of individual cases where certain pieces of code could do things in a more efficient manner. Or you might realize that your entire code base needs some restructuring and a better database design. Sorry if that wasn't the answer you were looking for; scaling isn't easy. Fortunately all of those tools make it pretty easy to identify where things are slowing down.
And if you are on top of your bug fixes and small performance tweaks and your app/server combo is still failing you, there are more intricate setups involving things like Varnish HTTP accelerator, database denormalization, database sharding, master-slave database setups, load balancing, global queuing and other such terms that will make your lead dev/sys admin/DBA want to hit the bar. Skribit is still pretty young so we have yet to tackle any overly-complex solutions for the sake of performance, but it's definitely in our near future.
While we're on the subject of scaling, I'd like to point out an old but tried and true post by Rails creator DHH. The context is that it's cheaper to pay for more server resources and a developer that's more productive from using enjoyable and easy-to-use tools and languages like Ruby:
The point is that the cost per request is plummeting, but the cost of programming is not. Thus, we have to find ways to trade efficiency in the runtime for efficiency in the "thought time" in order to make the development of applications cheaper. I believe we've long since entered an age where simplicity of development and maintenance is where the real value lies.
Getting back on track, some things don't change and sometimes you just need better hardware as your web app grows faster than you can speed it up and squash slow queries. We have been working closely with Media Temple (shameless plug... the same and only host I've been using since 2005) and we are now configuring a new two-server CentOS setup. More on that later as Calvin gets it up and running. With our new setup we will be trying something completely different and giving Apache with Phusion Passenger a shot, alongside Ruby Enterprise Edition.
In a nutshell, one box will run the bulk of the app server while the other will do the database grunt work in addition to running a little bit of the app as well.
What should you take away from this? Not much. I'm just spewing stuff that's flying around in my head. We are at the tip of the iceberg for Skribit scaling and I'm sure I'll have more interesting things to say about future scaling headaches. But if, like me, you enjoy reading about technical issues like scaling, here are two presentations on the subject: one from Scribd themselves and one from another Rails coder.
Have you ever had to deal with scaling issues for your app or site?
On the business side of things..
The business plan due date (April 7th) for the GRA/TAG Business Launch competition is nearing. When not battling Microsoft Word 2008's annoyances (I started writing in Google Docs but needed some more advanced Word features unfortunately), I'm describing who we are, what we do, how people will find us, why people will use us, how we'll make a return on investment and how we are different from our competitors. I started off with the competition's judging criteria and began filling out the sections:
I. Market
   a. Clear pain in the market
   b. Market Overview
      i. Size of the market
      ii. Initial Target market
   c. Competition
      i. Current solutions
      ii. Future solutions
      iii. Barriers to Entry
      iv. Competitive advantage
   d. Sales and Channel strategies
II. Technology
   a. Differentiation
   b. Source of Intellectual Property
   c. Protection of Intellectual Property
   d. Development plan
   e. Scalability
III. Management/Organizational structure
   a. Management
   b. Board of Directors
   c. Advisory Board
IV. Money
   a. Financials (3 years)
   b. Revenue Model
   c. Investment Needs
      i. Current investors
      ii. Fundraising
         - How much do you need
         - How long will it carry you
         - What you plan to do with funds
      iii. Exit strategies
And now I'm going back and adding detail; I'm somewhere around 14 pages now and I have yet to stick in relevant graphs and screenshots that illustrate how Skribit works. The next step is getting business plan draft feedback from Lance Weatherby (Skribit uber advisor) and Scott Burkett (assigned competition mentor and local entrepreneur of StarPound) and overhauling the business plan. I have a tendency to dumb things down a ton, which can be good at times, but I have a feeling the type of people that will be reading this business plan will expect it to be formal and have fancy words like synergy, core competencies, mission-critical, incentivize, supply chain, rollout, SWOT and paradigm shifts.
For those in the same boat, Mahalo has a ridiculously comprehensive How to Write a Business Plan page.
Overall
Skribit is in a pretty interesting position right now and I'm eager to see where we're headed. If I had one complaint though, it's that I can't clone myself (yet) and get things done faster. This business stuff is interesting and I love writing down everything we plan on doing in detail and elaborating on our bigger picture, but I want to get back to coding.. I was just getting the hang of this Rails thing.
Thoughts? How is your startup going?