
Improving App Engine Performance

This excerpt is from the book 'Essential App Engine', authored by Adriaan de Jonge, published by Pearson/Addison-Wesley Professional, Oct. 2011, ISBN 032174263, copyright 2012 Pearson Education, Inc. Please refer to the book page for more details.

Throughout this book, a lot of attention is given to performance optimization. By improving performance, you get the added benefit of lowering the usage costs of your application when you surpass the App Engine's free quota. This chapter explains performance characteristics specific to the Google App Engine environment. It starts by discussing the process of starting and stopping instances in the cloud. The cost of starting an instance is demonstrated by comparing the performance of a servlet that uses a third-party library to the performance of a plain vanilla servlet. This chapter also offers pointers for minimizing and, where possible, avoiding cold startups. Finally, it provides a high-level overview of performance-related topics you can find in other chapters of this book.

Performing in the Cloud

One of the unique selling points of cloud computing over traditional hosting is high scalability and flexibility when responding to changes in demand for your application. The pricing model of cloud computing is especially convenient if you experience sudden high spikes in the number of visitors on a regular basis.

In the cloud, you pay for what you use. On the App Engine, this means that if your traffic is usually below Google's free daily quota and you have only incidental traffic spikes, you pay only for the computing power used during the days with high spikes. The advantage of cloud computing over having a physical machine park capable of handling high-traffic spikes is that you are not paying for machines that remain idle except during a traffic spike.

This flexibility also introduces a new challenge that might not be apparent at first sight. Responding to changes in demand means starting and stopping instances multiple times per hour. The time necessary to respond to a change in demand is directly related to the time necessary to start your web application. This means that your web application does not necessarily become flexible and scalable simply because it is deployed on the App Engine. You need to optimize your application to get the most out of the specific circumstances of running on the Google App Engine.

Comparing the App Engine to Traditional Web Applications

Whereas the lifetime of a typical App Engine instance is measured in minutes and hours, the lifetime of a traditional web application instance is measured in weeks or months. Traditional web application here means a web application running on a physical machine that you maintain yourself rather than an application running in the cloud.

One of the most common approaches to optimizing the performance of a traditional web application is to take a performance hit on startup of the instance. For example, if you load a lot of classes and data into memory during startup, you can save loading time while processing the actual user requests because starting and stopping an application instance is unrelated to handling a request.
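As a sketch of this traditional trade-off, the following standalone class (the class name and the data are illustrative, not from the book) pays its loading cost once in a static initializer so that every later request reads from memory:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the traditional startup-time trade: pay once at class load,
// save the cost on every subsequent request.
public class ReferenceData {
    // Initialized once when the class is first used; requests read from memory.
    private static final Map<String, String> COUNTRIES = load();

    private static Map<String, String> load() {
        Map<String, String> m = new HashMap<>();
        m.put("NL", "Netherlands"); // in reality: read from a database or file
        m.put("US", "United States");
        return m;
    }

    public static String countryName(String code) {
        return COUNTRIES.get(code);
    }
}
```

The static initializer runs exactly once per instance, which is cheap when an instance lives for months but is paid again on every cold startup in the cloud.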

Taking a performance hit during the startup of a new instance is not such a good idea, though, if a website visitor is waiting while your application is starting. You may lose a visitor every time a new instance is started.

In addition, the scalability requirements of the App Engine ask for different storage strategies. Most traditional web applications are based on relational databases. Strategies for optimal usage of a relational database can sometimes be catastrophic when applied to NoSQL stores like the Google App Engine datastore.

As a result, web application frameworks originally designed for traditional software stacks can lead to bad results when used on the App Engine without careful consideration.

Optimizing Payments for Resources

On the App Engine, you pay for the resources you use. This means that optimizing your application to use fewer resources also leads to cost reductions.

On the App Engine, some resources are more expensive than others. The optimal usage versus cost ratio depends on the characteristics of your application. How much data do you store? How much traffic is generated by your visitors? How is the traffic distributed over the total data set? How much data processing is involved? How is the number of visitors distributed over time?

When you consider these questions and look at the current pricing tables on Google's site, you quickly find that you may have an optimization challenge. See Google's App Engine pricing documentation for more information.

Although there is no silver bullet for an optimal cost reduction, this book aims to give you the most control over the performance and costs of your web application.

Measuring the Cost of Class Loading

Every library or framework you introduce brings lots of additional classes to load at startup. For this reason, this book introduces only three third-party JARs to help with the code examples: Commons FileUpload, StringTemplate, and ANTLR. Commons FileUpload is used to process form submits with files as content. StringTemplate is used as a template language to generate output for the visitors, and it can also be used to generate text for an e-mail. ANTLR stands for Another Tool for Language Recognition and is a dependency of StringTemplate.

To show you the cost of class loading, this chapter investigates the startup time of the App Engine instance with StringTemplate and without StringTemplate. In addition, there is a startup time comparison between a web.xml file of roughly 400 lines and a web.xml of 21 lines.

Timing a Servlet That Contains a Library

Listing 2.1 shows a very simple servlet that processes a template using the StringTemplate framework and shows "Hello, World" in the browser window.

Listing 2.1 Writing Hello World with StringTemplate

01 package com.appspot.template;
03 import java.io.IOException;
05 import javax.servlet.ServletException;
06 import javax.servlet.http.HttpServlet;
07 import javax.servlet.http.HttpServletRequest;
08 import javax.servlet.http.HttpServletResponse;
10 import org.antlr.stringtemplate.StringTemplate;
11 import org.antlr.stringtemplate.StringTemplateGroup;
13 public class StringTemplateServlet extends HttpServlet {
15     protected void doGet(HttpServletRequest request,
16             HttpServletResponse response)
17             throws ServletException, IOException {
18         long startTime = System.currentTimeMillis();
20         StringTemplateGroup group = new StringTemplateGroup("xhtml",
21                 "WEB-INF/templates/xhtml");
22         StringTemplate hello = group.getInstanceOf("hello-world");
23         hello.setAttribute("name", "World");
24         response.getWriter().write(hello.toString());
26         long diff = System.currentTimeMillis() - startTime;
27         response.getWriter().write("time: " + diff);
29     }
30 }

Lines 18 and 26 process the timer, while the code loading the StringTemplate and ANTLR JARs is on lines 20 through 24.

Writing the resulting time at the bottom of the HTML (line 27) is not really elegant, but it works sufficiently for the simple timer required in this example.

Line 22 refers to an external file with an HTML template. This template is shown in Listing 2.2.

Listing 2.2 Setting Up the HTML Template for StringTemplate

01 <html>
02 <head>
03 <title>Test</title>
04 </head>
05 <body>
06 Hello, $name$ from a file!
07 </body>
08 </html>

Line 6 processes the attribute provided in line 23 of Listing 2.1. The rest of the HTML template should not require any explanation. The resulting screen just after a new instance is launched is displayed in Figure 2.1.

Reloading the same servlet when the instance is already started is a lot faster. Processing the StringTemplate takes 10 to 15 milliseconds on subsequent requests.


Figure 2.1 Displaying the resulting time in the browser screen with StringTemplate.

Timing a Servlet That Does Not Contain a Library

Writing Hello World to a browser screen is simple enough to do without a library like StringTemplate. If you modify the code to write Hello World directly to the browser, you get a servlet as shown in Listing 2.3.

Listing 2.3 Writing Hello World without StringTemplate

01 package com.appspot.template;
03 import java.io.IOException;
05 import javax.servlet.ServletException;
06 import javax.servlet.http.HttpServlet;
07 import javax.servlet.http.HttpServletRequest;
08 import javax.servlet.http.HttpServletResponse;
10 public class StringTemplateServlet extends HttpServlet {
12     protected void doGet(HttpServletRequest request,
13             HttpServletResponse response)
14             throws ServletException, IOException {
16         long startTime = System.currentTimeMillis();
18         response.getWriter().write("Hello World without ST! ");
20         long diff = System.currentTimeMillis() - startTime;
21         response.getWriter().write("time: " + diff);
23     }
24 }

The only difference is in line 18. To avoid wasting too much code, the HTML is left out. Seven short lines of HTML do not have a significant influence on the loading time: they account for less than a millisecond.

Figure 2.2 shows the browser window loading the servlet from Listing 2.3 while starting a new instance. The decrease in loading time is substantial!

If loading the StringTemplate library increases the loading time of a new App Engine instance by 300 milliseconds, you might ask, then why not switch to FreeMarker, Velocity, or JavaServer Pages (JSP)? Or perhaps you know another template engine not mentioned here. You are encouraged to investigate and find out for yourself which library has the most efficient loading times on cold startup.

For any other library or framework you'd like to introduce, you should first investigate its effect on the total load time. Adding an additional JAR is always a big step.


Figure 2.2 Displaying the resulting time in the browser screen without StringTemplate.
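You can get a feel for first-time class loading costs outside the App Engine, too. The following standalone sketch (not from the book; the measured class is an arbitrary example) times `Class.forName` for a class that has not been loaded yet, and again once the class loader has it cached:

```java
// Minimal sketch: the first load of a class costs noticeably more than
// later lookups, which hit the class loader's cache.
public class ClassLoadTimer {
    public static long timeLoadMicros(String className) throws Exception {
        long start = System.nanoTime();
        Class.forName(className); // triggers actual loading on first use only
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) throws Exception {
        long first = timeLoadMicros("java.util.concurrent.ConcurrentSkipListMap");
        long cached = timeLoadMicros("java.util.concurrent.ConcurrentSkipListMap");
        System.out.println("first: " + first + " us, cached: " + cached + " us");
    }
}
```

Multiply this effect by the hundreds or thousands of classes inside a framework JAR and the cold-startup penalty described above becomes easy to understand.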

Reducing the Size of web.xml

Explicit changes like adding JARs are relatively simple to manage. Trickier are the changes that happen gradually over time. For example, this book is full of servlets. As servlets were added, the web.xml file grew. By the end of the writing, the web.xml file contained more than 400 lines of configuration setting up all the examples demonstrated in the book.

The number of servlets declared in web.xml has a significant influence on the class loading time. To test the difference, web.xml was reduced to a minimal size, as shown in Listing 2.4. Only a single servlet is declared: the servlet from Listings 2.1 and 2.3.

Listing 2.4 Reducing web.xml to an Absolute Minimum

01 <?xml version="1.0" encoding="utf-8"?>
02 <web-app xmlns:xsi=""
03 xmlns=""
04 xmlns:web=""
05 xsi:schemaLocation="
07 version="2.5">
09 <!-- Template -->
10 <servlet>
11 <servlet-name>StringTemplateServlet</servlet-name>
12 <servlet-class>
13 com.appspot.template.StringTemplateServlet
14 </servlet-class>
15 </servlet>
16 <servlet-mapping>
17 <servlet-name>StringTemplateServlet</servlet-name>
18 <url-pattern>/st</url-pattern>
19 </servlet-mapping>
21 </web-app>

Take a look at the log files before and after the web.xml size reduction. Figure 2.3 shows the difference in CPU usage for both scenarios.

As you can see, the difference in load time on cold startup is significant. This is an indication that you should be careful with the number of servlets you declare in a web application. On the other hand, one very large servlet is unlikely to perform much better than several smaller ones, so you must consider the trade-off. How do you divide your code over a number of servlets with the least class loading overhead? Again, there is no silver bullet. The important thing is that you think about this trade-off in your specific situation.

Figure 2.3 Displaying the logged CPU times before and after a web.xml reduction.

Avoiding Cold Startups

In the early days of the Google App Engine, any request could lead to a new instance being launched. For applications with low traffic, there was a high risk of long response times on the first request by a visitor, especially if the application was not optimized for fast cold startups.

Only high-traffic applications with a relatively constant load could serve a large percentage of users without confronting them with longer response times. But even those would lose a few visitors with instance starts and stops.

Later, Google added new features for paying customers that help avoid longer response times. It should be noted that these strategies may fail when the application experiences very sudden spikes in traffic.

Reserving Instances with Always On

Paying customers can reserve instances that are never turned off. This solves the problem for low-traffic applications, where almost every visit leads to an instance being launched.

The Always On instances are supplemented with dynamic instances when the demand exceeds the capabilities of the available Always On instances. This means that just switching to Always On does not completely fix the problem with long responses on cold startups.

Always On can be configured in the admin console, as described in Google's documentation.

Preloading Classes Using Warm-Up Requests

When at least one instance is running, either Always On or dynamic instances, the App Engine can sometimes predict when a new instance will be required.

As long as you haven't explicitly turned off warm-up requests in the appengine-web.xml configuration file, the App Engine can send a request to /_ah/warmup sometime before a new instance is required. You can configure your own servlet to listen on that address and make sure that classes and other data are preloaded before a visitor starts accessing that instance.

Warm-up requests do not work when no instances are running. They do not add much value for low-traffic applications unless Always On is used.

Even with instances running, warm-up requests do not always work. The App Engine is not always capable of predicting traffic in advance.

More information on warm-up requests can be found in Google's documentation.
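A warm-up handler typically just touches the expensive resources so they are ready before real traffic arrives. The following plain-Java sketch (class and method names are illustrative; a real handler would be a servlet mapped to /_ah/warmup) shows the idempotent preload pattern such a handler could call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a preload routine a warm-up request handler could invoke.
// Names and cached data are invented for this example.
public class Preloader {
    private static final Map<String, String> TEMPLATE_CACHE =
            new ConcurrentHashMap<>();
    private static volatile boolean warmedUp = false;

    // Idempotent: safe to call from the warm-up handler and again later.
    public static synchronized void warmUp() {
        if (warmedUp) return;
        // Expensive work done once per instance: parse templates, fill caches.
        TEMPLATE_CACHE.put("hello-world", "Hello, $name$ from a file!");
        warmedUp = true;
    }

    public static boolean isWarm() {
        return warmedUp;
    }

    public static String template(String name) {
        warmUp(); // lazy fallback in case the warm-up request never arrived
        return TEMPLATE_CACHE.get(name);
    }
}
```

Keeping a lazy fallback matters because, as noted above, the App Engine cannot always predict traffic and the warm-up request is not guaranteed.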

Handling Concurrent Requests with Thread-Safe Mode

By default, an instance handles only a single request at a time. If an instance takes a long time to respond and other requests arrive at the same time, the App Engine launches additional instances to handle the rest of the traffic.

In some cases, loading new instances can be avoided by allowing concurrent requests. This requires you to develop thread-safe servlets. More information on thread-safe mode can be found in Google's documentation.
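When requests run concurrently, any state shared between them must be thread-safe; local variables inside doGet are per-request and always safe. A minimal sketch of safely shared state (the counter is an invented example, not App Engine specific):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a field shared by concurrent requests must use a thread-safe type.
// A plain long with ++ would lose updates under concurrent access.
public class HitCounter {
    private final AtomicLong hits = new AtomicLong();

    public long recordHit() {
        return hits.incrementAndGet(); // atomic, safe across threads
    }

    public long total() {
        return hits.get();
    }
}
```

With a non-atomic field, two concurrent requests could read the same value, both increment it, and one update would be lost; AtomicLong avoids this without explicit locking.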

Handling Memory-Intensive Requests with Backends

In addition to Always On instances, you can purchase, for a higher fee, specialized instances that are optimized for handling requests of a backend nature—that is, requests that require longer than 30 seconds to finish. Another characteristic of backend applications is higher memory consumption.

More information on backend instances can be found on Google's website.

Improving Performance in General

The subtitle of this book is Building High-Performance Java Apps with Google App Engine because this book focuses on performance optimization more than do other books. This section provides a general overview of possibilities for performance optimization.

Optimizing Your Data Model for Performance

If you model your data for the App Engine datastore the same way you model your data for a relational database, you can be certain that you will run into performance problems at some point. The way the App Engine datastore divides data over multiple machines in the cloud is fundamentally different from the way a relational database stores data on disk. In many cases, you need to do the exact opposite of what you are used to doing. For example, you need to denormalize your data instead of normalizing it.

Because you can store arrays of data, there is less need for relationships between tables, although you should be cautious if you feel the need to index the array, because the size of your total index may explode.

You should consider the need for transactions before you set up your data model. Transactions require entity groups, and larger entity groups may harm scalability.

Chapter 4, "Data Modeling for the Google App Engine Datastore," presents a detailed discussion of datastore characteristics. Using the APIs is demonstrated in Chapter 10, "Storing Data in the Datastore and Blobstore."
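As a toy illustration of the denormalization advice (the class and its fields are invented for this sketch), a datastore-style entity can carry a multi-valued property directly instead of joining against a separate table:

```java
import java.util.Arrays;
import java.util.List;

// Relational habit: a separate Tag table joined to BlogPost by a foreign key.
// Datastore habit (sketched here): store the list on the entity itself,
// so reading a post needs no join and no second query.
public class BlogPost {
    final String title;
    final List<String> tags; // would map to a multi-valued datastore property

    BlogPost(String title, List<String> tags) {
        this.title = title;
        this.tags = tags;
    }

    boolean hasTag(String tag) {
        return tags.contains(tag);
    }
}
```

The caution from the text applies: if you index such a list property, every element contributes index entries, so large lists can make the total index explode.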

Avoiding Redundant Processing Using Cache

Many time-consuming tasks are done repeatedly for subsequent requests: think of tasks that require gathering data or processing intensive calculations. The same processing might be repeated for a single visitor or for multiple visitors.

Proper caching can help avoid repetitive processes. This book explains both fine-grained caching using memcache and page-level caching on the Internet. See Chapter 14, "Optimizing Performance Using the Memory Cache," for in-depth information.
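The fine-grained caching idea can be sketched in plain Java. Memcache has its own API; this in-process map only illustrates the compute-once pattern, and the class name is invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: compute an expensive result once, reuse it for later requests.
public class ResultCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> expensive;

    ResultCache(Function<String, String> expensive) {
        this.expensive = expensive;
    }

    String get(String key) {
        // computeIfAbsent runs the expensive function only on a cache miss.
        return cache.computeIfAbsent(key, expensive);
    }
}
```

Unlike this sketch, memcache is shared between instances and entries can be evicted at any time, so cached values must always be recomputable.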

Postponing Long-Running Tasks Using the Task Queue

In many cases, high responsiveness is more important than high performance. Responding quickly to a visitor's request can sometimes be done by postponing the actual work. As long as the visitor can trust that the work will be done eventually, he or she will be pleased with the quick response.

The Task Queue API can be used in multiple ways. You can preschedule tasks at regular intervals, or you can post tasks to the queue on demand. Both methods can help improve performance and responsiveness.

Details on Task Queue are discussed in Chapter 12, "Running Background Work with the Task Queue API and Cron."
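The Task Queue API has its own interface, but the underlying idea (respond now, do the slow work later) can be sketched with a plain executor as an illustrative stand-in:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: enqueue slow work and return immediately, the way a request
// handler would post a task to a queue instead of doing the work inline.
public class DeferredWork {
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    // Returns at once; the slow task runs later on the worker thread.
    public Future<?> enqueue(Runnable slowTask) {
        return queue.submit(slowTask);
    }

    public void shutdown() throws InterruptedException {
        queue.shutdown();
        queue.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

On the real Task Queue the work survives instance restarts and is retried on failure, which an in-process executor cannot offer; the sketch only shows the responsiveness pattern.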

Improving Page Load Performance in the Browser

A high-performing server is practically useless if the page loading in the browser ruins the total response time. For example, if your HTML is full of useless elements, classes, and IDs, your Cascading Style Sheet (CSS) file beats the size of an average phone book, and you reach a megabyte of JavaScript files, all server-side efforts are lost. You could make it even worse by adding one or more Flash files in your page. But then you are clearly working in the wrong direction.

With HTML5 and CSS3, you hardly need Flash anymore except, perhaps, for an incidental video player being used until HTML5 videos are sufficiently mature. The newly added elements in HTML5 may help you downsize your CSS files. The less specific your CSS file, the easier it is to maintain.

The way you load your JavaScript has a large impact on the page load time. Loading JavaScript unobtrusively at the bottom of the page allows the rest of the page to render before the JavaScript is interpreted. This improves the responsiveness to the visitor.

Part III, "User Interface Design Essentials," covers HTML5, CSS3, JavaScript, and AJAX, providing details on browser optimization from a technical perspective.

Working with Asynchronous APIs

Page loading generally does not entail heavy data processing. Mostly it consists of waiting for services such as the datastore to respond. If you know in advance that you need to make multiple backend requests and the backend requests are independent of each other, you can work with asynchronous APIs.

One of the most important asynchronous APIs is described in Chapter 10, "Storing Data in the Datastore and Blobstore."
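The benefit of issuing independent backend requests in parallel can be sketched with standard Java futures. The "backend calls" below are simulated with sleeps; the real datastore API has its own asynchronous variants:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: two independent "backend calls" issued in parallel, so the
// total wait is roughly one call's latency instead of the sum of both.
public class AsyncCalls {
    static CompletableFuture<String> fakeBackendCall(String name) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // stands in for datastore/memcache latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name + "-result";
        });
    }

    public static String fetchBoth() {
        CompletableFuture<String> a = fakeBackendCall("datastore");
        CompletableFuture<String> b = fakeBackendCall("memcache");
        return a.join() + " " + b.join(); // both calls overlap in time
    }
}
```

The pattern only helps when the requests are truly independent; if the second call needs the first call's result, they must stay sequential.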

Optimizing Your Application before Deployment

Some performance optimizations are a result of planning and designing. The more effective performance improvements usually result from careful experimentation and measurement.

You can profile calls to Google’s backend services using AppStats. Most of the overhead in an average App Engine application is in the backend calls. If you do a lot of heavy lifting in your own code, you are encouraged to profile this code and optimize where possible.

AppStats is explained in Chapter 19, "Assuring Quality Using Measuring Tools."


Summary

Cloud solutions, and specifically the Google App Engine, are designed from scratch for scalability and flexible usage. In the case of the Google App Engine, however, this design means that some classic performance optimization strategies are counterproductive. This chapter focused on cold startup time and why you should avoid cold startups when possible. It also discussed the overhead of frameworks and libraries, which should likewise be avoided when possible. The end of this chapter presented a few performance questions with cross-references to the chapters where you can find the answers.