Web applications can grow in fits and starts. Customer numbers can increase rapidly, and application usage patterns can vary seasonally. This unpredictability necessitates an application that is scalable. What is the best way of achieving scalability?
This investigation reveals 20 of the biggest bottlenecks that reduce and slow down scalability. By ferreting these out in your environment and applications, and stamping out the worst offenders, you will be on your way to
Performance is key to Web scalability. If you add more customers, you want your application to continue servicing them all equally quickly. Too much latency will cause users to give up. You can keep your application's performance from degrading by knowing all the ways it can get clogged up and avoiding those bottlenecks.
Normally when data is changed in a database, it is written both to memory and to disk. When a commit happens, a relational database makes a commitment to freeze the data somewhere on real storage media. Remember, memory doesn't survive a crash or reboot. Even if the data is cached in memory, the database still has to write it to disk. MySQL binary logs or Oracle redo logs fit the bill.
With a MySQL cluster or distributed file system such as DRBD (Distributed Replicated Block Device) or Amazon
Synchronous replication has these issues as well; hence, MySQL's solution is
Caching is very important at all layers, so where is the best place to cache: at the browser, the page, the object, or the database layer? Let's work through each of these.
Browser caching might seem out of reach, until you realize that the browser takes directives from the Web server and the pages it renders. Therefore, if the objects contained therein have longer expire times, the browser will cache them and will not need to fetch them again. This is faster not only for the user, but also for the servers hosting the Web site, as all returning visitors will weigh less.
More information about browser caching is available at http://www.iheavy.com/2011/11/01/5-tips-cache-websites-boost-speed/. Be sure to set expire headers and cache control.
Page caching requires using a technology such as Varnish
Object caching is done by something like memcache. Think of it as a
Everything, everything, everything in a database is constrained by
If you're using physical servers, watch out for RAID 5, a type of RAID (redundant array of independent disks) that uses one disk for both parity and protection. It comes with a huge write penalty, however, which you must always carry. What's more, if you lose a drive, these arrays are unusably slow during rebuild.
The solution is to start with RAID 10, which gives you striping over mirrored sets. This results in no parity calculation and no penalty during a rebuild.
Cloud environments may work with technology such as Amazon EBS (Elastic Block Store), a virtualized disk similar to a storage area network. Since it's network based, you must contend and compete with other tenants (aka customers) reading and writing to that storage. Further, those individual disk arrays can handle only so much reading and writing, so your neighbors will affect the response time of your Web site and application.
Recently Amazon rolled out a badly branded offering called Provisioned IOPS (I/O operations per second). That might sound like a great name to techies, but to everyone else it doesn't mean anything noteworthy. It's nonetheless important. It means you can lock in and guarantee the disk performance your database is thirsty for. If you're running a database on Amazon, then definitely take a look at this.
When customers are waiting to check out in a grocery store with 10 cash registers open, that's working in parallel. If every cashier is taking a lunch break and only one register is open, that's serialization. Suddenly a huge line forms and snakes around the store, frustrating not only the customers checking out, but also those still shopping. It happens at bridge tollbooths when not enough lanes are open, or in sports arenas when everyone is leaving at the same time.
Web applications should definitely avoid serialization. Do you see a backup waiting for API calls to return, or are all your Web nodes working off one search server? Anywhere your application forms a line, that's serialization and should be avoided at all costs.
Developers normally build in features and functionality for business units and customers. Feature flags are operational necessities that allow those features to be turned on or off in either
Why are they so important? If you have ever had to put out a fire at 4 a.m., then you understand the need for contingency plans. You must be able to disable ratings, comments, and other auxiliary features of an application, just so the whole thing doesn't fall over. What's more, as new features are rolled out, sometimes the kinks don't show up until a horde of Internet users hit the site. Feature flags allow you to disable a few features, without taking the whole site offline.
You should always have at least one read replica or MySQL slave online. This allows for faster recovery in the event that the master fails, even if you're not using the slave for
Having multiple copies of a database suggests horizontal scale. Once you have two, you'll see how three or four could benefit your infrastructure.
A MySQL database server is great at storage tables or data, and relationships between them. Unfortunately, it's not great at serving as a queue for an application. Despite this, a lot of developers fall into the habit of using a table for this purpose. For example, does your app have some sort of jobs table, or perhaps a status column, with values such as
Such solutions run into scalability
Page searching is another area where applications get caught. Although MySQL has had
One solution is to go with a dedicated search server such as Solr (http://lucene.apache.org/solr/). These servers have good libraries for whatever language you're using and
Alternatively, Sphinx SE, a storage engine for MySQL, integrates the Sphinx server right into the database. If you're looking on the horizon, Fulltext is coming to InnoDB, MySQL's default storage engine, in version 5.6 of MySQL.
The ORM (object relational model), the bane of every Web company that has ever used it, is like cooking with MSG. Once you start using it, it's hard to wean yourself off.
The plus side is that ORMs help with rapid prototyping and allow developers who aren't SQL masters to read and write to the database as objects or memory structures. They are faster, cleaner, and offer quicker delivery of
Then your DBA (database administrator) will come to the team with a very
The ability to track down bad SQL and root it out is essential. Bad SQL will happen, and your DBA team will need to index properly. If queries are coming from ORMs, they don't lend themselves to all of this. Then you're faced with a huge technical debt and the challenge of ripping and replacing.
Instrumentation provides speedometers and fuel needles for Web applications. You wouldn't drive a car without them, would you? They expose information about an application's internal workings. They record timings and provide feedback about where an application spends most of its time.
One very popular Web services solution is New Relic (http://newrelic.com/), which provides visual dashboards that appeal to
Some open source instrumentation projects are also available.
Speed isn't the only thing that can gum up scalability. The following 10 problems affect the ability to maintain and build scalability for a Web application. Best practices can avoid these issues.
Though it's rare these days, some Internet companies do still try to build software without version control. Those who use it, however, know the everyday advantage and organizational control it provides for a team.
If you're not using it, you are going to spiral into technical debt as your application becomes more complex. It won't be possible to add more developers and work on different parts of your architecture and scaffolding.
Once you starting using version control, be sure to get all the components in there, including configuration files and other essentials. Missing pieces that have to be located and tracked down at deployment time become an additional risk.
If your data is on a single master database, that's a single point of failure. If your server is sitting on a single disk, that's a single point of failure. This is just technical vernacular for an Achilles heel.
These single points of failure must be rooted out at all costs. The trouble is recognizing them. Even relying on a single cloud provider can be a single point of failure. Amazon's data center or zone failures are a case in point. If it had had multiple providers or used Amazon differently, AirBNB would not have experienced downtime when part of Amazon Web Services went down in October 2012
If you've ever tried to post a comment on Yelp, Facebook, or Tumblr late at night, you've probably gotten a message to the effect of, "This feature isn't available. Try again later." "Later" might be five minutes or 60 minutes. Or maybe you're trying to book airline tickets and you have to retry a few times. To nontechnical users, the site still appears to be working normally, but it just has this strange blip.
What's happening here is that the application is allowing you to browse the site, but not make any changes. It means the master database or some storage component is offline.
Communication may seem a strange place to take a discussion on scalability, but the technical layers of Web applications cannot be separated from the social and cultural ones that the team navigates.
Strong lines of communication are necessary, and team members must know whom to go to when they're in trouble. Good communication demands confident and knowledgeable leadership, with the openness to listen and improve.
Documentation happens at a lot of layers in a Web application. Developers need to document procedures, functions, and pages to provide hints and insight to future generations looking at that code. Operations teams need to add comments to config files to provide change history and insight when things break. Business processes and relationships can and should be documented in company wikis to help people find their own solutions to problems.
Documentation helps at all levels and is a habit everyone should embrace.
Fire drills always get pushed to the backburner. Teams may say, "We have our backups; we're covered." True, until they try to restore those backups and find that they're incomplete, missing some config file or crucial piece of data. If that happens when you're fighting a real fire, then something you don't want will be hitting that office fan.
Fire drills allow a team to run through the motions, before they really need to. Your company should task part of its ops team with restoring its entire application a few times a year. With AWS and cloud servers, this is easier than it once was. It's a good idea to spin up servers just to prove that all your components are being backed up. In the process you will learn how long it takes, where the difficult steps lie, and what to look out for.
Monitoring falls into the same category of best practices as version control: it should be so basic you can't imagine working without it; yet there are Web shops that go without, or with insufficient
Collecting this data over time for system and
You roll into town on a fast horse, walk into the saloon with guns blazing, and you think you're going to make friends? Nope, you're only going to scare everyone into complying but with no real loyalty. That's because you'll probably break things as often as you fix them. Confidence is great, but it's best to work with teams. The intelligence of the team is greater than any of the individuals.
Teams need to communicate what they're changing, do so in a managed way, plan for any outage, and so forth. Caution and risk aversion win the day. Always have a plan B. You should be able to undo the change you just made, and be aware which commands are destructive and which ones cannot be undone.
As an app evolves over the years, the team may spend more and more time maintaining and supporting old code, squashing bugs, or ironing out kinks. Therefore, they have less time to devote to new features. This balance of time devoted to debt servicing versus real new features must be managed closely. If you find your technical debt increasing, it may be time to bite the bullet and rewrite. Rewriting will take time away from the immediate business benefit of new functionality and customer features, but it is best for the long term.
Technical debt isn't always easy to recognize or focus on. As you're building features or squashing bugs, you're more attuned to details at the
Logging is closely related to metrics and monitoring. You may enable a lot more of it when you're troubleshooting and debugging, but on an ongoing basis you'll need it for key essential services. Server syslogs, Apache and MySQL logs, caching logs, etc., should all be working. You can always dial down logging if you're getting too much of it, or trim and rotate log files, discarding the old ones.
These 20 obstacles can affect scalability and degrade the performance of a Web application. By avoiding these obstacles and following the practices outlined here, Web application developers will be able to minimize latency and guarantee that their applications can scale as needed.
LOVE IT, HATE IT? LET US KNOW
Sean Hull is the founder and senior consultant at Heavyweight Internet Group in New York. He has more than two decades of experience as a technology consultant, adviser, author, speaker, and entrepreneur, working with clients such as AppNexus, The Hollywood Reporter, Billboard, NBC Universal, Zagats, Rent the Runway, and ideeli. These
© 2013 ACM
Originally published in Queue vol. 11, no. 7—
see this item in the ACM Digital Library
Taylor Savage - Componentizing the Web
We may be on the cusp of a new revolution in web development.
Arie van Deursen - Beyond Page Objects: Testing Web Applications with State Objects
Use states to drive your tests
Rich Harris - Dismantling the Barriers to Entry
We have to choose to build a web that is accessible to everyone.
Conditional dependency resolution