Chapter 11 Critiquing the Product Using Technology-Facing Tests
This chapter focuses on the bottom right corner of the agile testing quadrants. We've looked at driving development with both business-facing and technology-facing tests. After the code is written, we are no longer driving development but are looking at ways to critique the product. In the previous chapter, we examined ways to critique it from a business point of view. Now we look at ways to critique it from a technology-facing point of view. These tests are an important means of evaluating whether our product delivers the right business value.
Introduction to Quadrant 4
Individual stories are pieces of the puzzle, but there’s more to an application than that. The technology-facing tests that critique the product are more concerned with the nonfunctional requirements than the functional ones. We worry about deficiencies in the product from a technical point of view. Rather than using the business domain language, we describe requirements using a programming domain vocabulary. This is the province of Quadrant 4 (see Figure 11-1).
Figure 11-1 Quadrant 4 tests
Nonfunctional requirements include configuration issues, security, performance, memory management, the “ilities” (e.g., reliability, interoperability, and scalability), recovery, and even data conversion. Not all projects are concerned about all of these issues, but it is a good idea to have a checklist to make sure the team thinks about them and asks the customer how important each one is.
Our customer should think about all of the quality attributes and factors that are important and make informed trade-offs. However, many customers focus on the business side of the application and don't understand how critical many nonfunctional requirements are to defining the level of quality needed for the product. They might assume that the development team will just take care of issues such as performance, reliability, and security.
We believe that the development team has a responsibility to explain the consequences of not addressing these nonfunctional or cross-functional requirements. We’re really all part of one product team that wants to deliver good value, and these technology-oriented factors might expose make-or-break issues.
Many of these nonfunctional and cross-functional issues are deemed low-risk for a particular application and so are not added to the test plan. However, when you are planning your project, you should think about the risks in each of these areas, address them in your test plan, and include the tools and resources needed for testing them in your project plan.
Lisa’s Story
In the past, I’ve been asked by specialists in areas such as performance and security testing why they didn’t hear much about “ility” testing at agile conferences or in publications about agile development. Like Janet, I’ve always seen these areas of testing as critical, so this wasn’t my perception. But as I thought about it, I had to agree that this wasn’t a much-discussed topic at the time (although that’s changed recently).
Why would agile discussions not include such important considerations as load testing? My theory is that it’s because agile development is driven by customers, from user stories. Customers simply assume that software will be designed to properly accommodate the potential load, at a reasonable rate of performance. It doesn’t always occur to them to verbalize those concerns. If not asked to address them, programmers may or may not think to prioritize them. I believe that one area where testers have contributed greatly to agile teams is in bringing up questions such as, “How many concurrent users should the application support?” and “What’s the average response time required?”
—Lisa
Because the types of testing in this quadrant are so diverse, we’ll give examples of tools that might be helpful as we go along instead of a separate toolkit section. Tools, whether homegrown or acquired, are essential to succeed with Quadrant 4 testing efforts. Still, the people doing the work count, so let’s consider who on an agile team can perform these tests.
Who Does It?
All of the agile literature talks about teams being generalists; anyone should be able to pick up a task and do it. We know that isn’t always practical, but the idea is to be able to share the knowledge so that people don’t become silos of information.
However, there are many tasks that need specialized knowledge. A good example is security testing. We’re not talking about security within an application, such as who has access rights to administer it. Because that type of security is really part of the functional requirements and will be covered by regular stories, verifying that it works falls within the first three quadrants. We’re talking about probing for external security flaws and knowing the types of vulnerabilities in systems that hackers exploit. That is a specialized skill set.
Performance testing can be done by testers and programmers collaborating and building simple tools for their specific needs. Some organizations purchase load-testing tools that require team members who specialize in that tool to build the scripts and analyze and interpret the results. It can be difficult for a software development organization, especially a small one, to have enough resources to duplicate an accurate production-level load for a test, so external providers of performance testing may be needed.
Larger organizations may have groups such as database experts that your team can use to help with data conversion, security groups that will help you identify risks to your application, or a production support team that can help you test recovery or failover. Build a close relationship with these specialists. You’ll need to work together as a virtual team to gather the information you need about your product.
Chapter 15, “Tester Activities in Release or Theme Planning,” explains how to plan to work with external teams.
The more diverse the skill sets are in your team, the less likely you are to need outside consultants to help you. Identify the resources you need for each project. Many teams find that a good technical tester or toolsmith can take on many of these tasks. If someone already on the team can learn whatever specialized knowledge is required, great; otherwise, bring in the expertise you need.
Skills within the Team
Jason Holzer, Product Owner for Property Testing (performance, security, stability, and reliability) at Ultimate Software, tells us that a good programmer can write a multithreaded engine to call a function concurrently and test performance. Jason feels that agile teams do have the skills to do their own performance testing; they just may not realize it.
Performance testing does require a controlled, dedicated environment. Some specialized tools are needed, such as a profiler to measure code performance. But, in Jason’s view, performance, stability, scalability, and reliability (PSR) tests can, and should, be done at the unit level. There’s a mind-set that holds that these tests are too complex and require specialists when in fact the teams do possess the necessary skills.
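Here's a minimal sketch, in Java, of the kind of home-grown multithreaded engine Jason describes. The function being exercised, the thread count, and the number of calls are placeholders; the point is that nothing beyond everyday programming skills is needed to get basic performance numbers at the unit level.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a unit-level PSR harness: call one function concurrently
// and report failures, the slowest single call, and overall throughput.
public class ConcurrentCallHarness {

    // Hypothetical function under test; replace with a real call into your code.
    static void functionUnderTest() throws Exception {
        Thread.sleep(5); // stand-in for real work
    }

    public static void main(String[] args) throws Exception {
        final int threads = 20;
        final int callsPerThread = 500;
        final AtomicLong slowestCallMillis = new AtomicLong(0);
        final AtomicLong failures = new AtomicLong(0);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long runStart = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int i = 0; i < callsPerThread; i++) {
                        long callStart = System.currentTimeMillis();
                        try {
                            functionUnderTest();
                        } catch (Exception e) {
                            failures.incrementAndGet();
                        }
                        long elapsed = System.currentTimeMillis() - callStart;
                        long previous;
                        do { // record the slowest call seen so far
                            previous = slowestCallMillis.get();
                        } while (elapsed > previous
                                && !slowestCallMillis.compareAndSet(previous, elapsed));
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);

        long totalCalls = (long) threads * callsPerThread;
        long totalMillis = Math.max(1, System.currentTimeMillis() - runStart);
        System.out.println(totalCalls + " calls, " + failures.get() + " failures, slowest call "
                + slowestCallMillis.get() + " ms, throughput "
                + (totalCalls * 1000 / totalMillis) + " calls/sec");
    }
}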
Jason finds that awareness of the “PSR” aspects of code needs to be part of the team’s culture.
If stakeholders place a high priority on performance, stability, scalability, and the like, Jason recommends that the team talk about ways to verify these aspects of the application. When teams understand the priority of qualities such as performance and reliability, they figure out how to improve their code to ensure them. They don’t need to depend on an outside, specialized team. Jason explains his viewpoint.
The potential resistance I see today to this plan is that someone believes that programmers don't know how to PSR test and that there will need to be a great deal of training. In my opinion, a more accurate statement is that programmers are not aware that PSR testing is a high priority and a key to quality. I don't think it has anything to do with knowing how to PSR test. PSR testing is a combination of math, science, analysis, programming, and problem solving. I am willing to bet that if you conducted a competition at any software development organization in which you asked every team to implement a tree search algorithm, and the team with the fastest algorithm won, every team would do PSR testing and provide PSR metrics without being taught anything new.
PSR testing is really just telling me “How fast?” (performance), “How long?” (stability), “How often?” (reliability), and “How much?” (scalability). So, as long as the awareness is there and the organization is seriously asking those questions with everything they develop, then PSR testing is successfully integrated into a team.
Take a second look at the skills that your team already possesses, and brainstorm about the types of “ility” testing that can be done with the resources you already have. If you need outside teams, plan for that in your release and iteration planning.
Regardless of whether your team brings in additional resources for these types of tests, it is still responsible for making sure the minimum testing is done. The information these tests provide may result in new stories and tasks in areas such as changing the architecture for better scalability or implementing a system-wide security solution. Be sure to complete the feedback loop from tests that critique the product to tests that drive changes that will improve the nonfunctional aspects of the product.
Just because this is the fourth out of four agile testing quadrants doesn’t mean these tests come last. Your team needs to think about when to do performance, security, and “ility” tests so that you ensure your product delivers the right business value.
When Do You Do It?
As with functional testing, the sooner technology-facing tests that critique the product are completed, the cheaper it is to fix any issues that are found. However, many of these cross-functional tests are expensive and hard to do in small chunks.
Technical stories can be written to address specific requirements, such as: "As user Abby, I need to retrieve report X in less than 20 seconds so that I can make a decision quickly." This story is about performance and requires specialized tests; they can be written along with the story that codes the report, or in a later iteration.
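A specialized test for a story like this can start out very simply. The following sketch assumes a hypothetical report service and checks a single retrieval against the story's 20-second goal; load can be layered on later.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Sketch of a performance check tied directly to the story's acceptance criterion.
// ReportService and retrieveReportX are hypothetical names.
public class ReportXPerformanceTest {

    interface ReportService {
        String retrieveReportX(String userId);
    }

    // Stand-in implementation; a real test would call the actual service.
    private final ReportService reportService = new ReportService() {
        public String retrieveReportX(String userId) {
            return "report contents";
        }
    };

    @Test
    public void reportXReturnsWithinTwentySeconds() {
        long start = System.currentTimeMillis();
        reportService.retrieveReportX("abby");
        long elapsedMillis = System.currentTimeMillis() - start;
        assertTrue("Took " + elapsedMillis + " ms; the story allows 20,000 ms",
                elapsedMillis <= 20000);
    }
}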
Consider a separate row on your story board for tasks needed by the product as a whole. Lisa’s team uses this area to put cards such as “Evaluate load test tools” or “Establish a performance test baseline.” Janet has successfully used different colored cards to show that the story is meant for one of the expert roles borrowed from other areas of the organization.
Some performance tests might need to wait until much of the application is built if you are trying to baseline full end-to-end workflows. If performance and reliability are a top priority, you need to find a way to test those early in the project. Prioritize stories so that a steel thread or thin slice is complete early. You should be able to create a performance test that can be run and continue to run as you add more and more functionality to the workflow. This may enable you to catch performance issues early and redesign the system architecture for improvements. For many applications, correct functionality is irrelevant without the necessary performance.
The time to think about your nonfunctional tests is during release or theme planning. Plan to start early, tackling small increments as needed. For each iteration, see what tasks your team needs in order to determine whether the code design is reliable, scalable, usable, and secure. In the next section, we’ll look at some different types of Quadrant 4 tests.
Performance Testing from the Start
Ken De Souza, a software developer/tester at NCR [2008], responded to a question on the agile-testing mailing list about when to do stress and performance testing in an agile project with an explanation of how he approaches performance testing.
I’d suggest designing your performance tests from the start. We build data from the first iteration, and we run a simple performance test to make sure it all holds together. This is more to see that the functionality of the performance scripts holds together.
I used JMeter because I can hook FTP, SOAP, HTTP, RegEx, and so on, all from a few threads, with just one instance running. I can test out my calls right from the start (or at least have the infrastructure in place to do it).
My eventual goal is that when the product is close to releasing, I don’t have to nurse the performance test; I just have to crank up the threads and let go. All my metrics and tasks have already been tested out for months, so I’m fairly certain that anyone can run my performance test.
Performance testing can be approached using agile principles to build the tools and test components incrementally. As with software features, focus on getting the performance information you need, one small chunk at a time.
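One way to make such a test runnable by anyone, as Ken describes, is to script the non-GUI JMeter run so a build or CI job can launch it. The sketch below is illustrative only: the test plan name, results file, and working directory are assumptions, although -n (non-GUI), -t (test plan), and -l (results log) are standard JMeter command-line options.

import java.io.File;
import java.io.IOException;

// Launch a JMeter test plan in non-GUI mode so a CI job (or any team member)
// can run the performance scripts without nursing them.
public class RunJMeterPlan {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder jmeter = new ProcessBuilder(
                "jmeter",                      // assumes jmeter is on the PATH
                "-n",                          // non-GUI mode
                "-t", "performance-plan.jmx",  // hypothetical test plan
                "-l", "results.jtl");          // results log to analyze later
        jmeter.directory(new File("perf-tests")); // hypothetical working directory
        jmeter.inheritIO();
        Process process = jmeter.start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new RuntimeException("JMeter run failed with exit code " + exitCode);
        }
    }
}

The same command can also be run by hand, which keeps the performance scripts accessible to the whole team rather than to a single specialist.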
“ility” Testing
If we could just focus on the desired behavior and functionality of the application, life would be so simple. Unfortunately, we have to be concerned with qualities such as security, maintainability, interoperability, compatibility, reliability, and installability. Let’s take a look at some of these “ilities.”
Security
OK, it doesn't end in -ility, but we include it in the "ility" bucket because we use technology-facing tests to appraise the security aspects of the product. Security is a top priority for every organization these days. Every organization needs to ensure the confidentiality and integrity of its software, and wants to verify concepts such as nonrepudiation: a guarantee that a message has been sent by the party that claims to have sent it and received by the party that claims to have received it. The application also needs to perform correct authentication, confirming each user's identity, and authorization, allowing each user access only to the services they're authorized to use. Testing so many different aspects of security isn't easy.
In the rush to deliver functionality, both business experts and development teams in newly started organizations may not be thinking of security first. They just want to get some software working so they can do business. Authorization is often the only aspect of security testing that they consider as part of business functionality.
Lisa’s Story
My current team is a case in point. The business was interested in automating functionality to manage 401(k) plans. They did take pains to secure the software and data, but it wasn’t a testing priority. When I “got religion” after hearing some good presentations about security testing at conferences, I bought a book on security testing and started hacking around on the site. I found some serious issues, which we fixed, but we realized we needed a comprehensive approach to ensuring security. We wrote stories to implement this. We also started including a “security” task card with every story so that we’d be mindful of security needs while developing and testing.
—Lisa
Budgeting for this type of work has to be a business priority. There's a range of alternatives available, depending on your company's priorities and resources. Understand your needs and the risks before you invest a lot of time and energy.
Janet’s Story
One team that I worked with has a separate corporate security team. Whenever functionality is added to the application that might expose a security flaw, the corporate team runs the application through a security test application and produces a report for the team. It performs static testing using a canned black-box probe on the code and has exposed a few weak areas that the developers were able to address. It does not give an overall picture of the security level for the application, but that was not deemed a major concern.
—Janet
Testers who are skilled in security testing can perform security risk-based testing, which is driven by analyzing the architectural risk, attack patterns, or abuse and misuse cases. When specialized skills are required, bring in what you need, but the team is still responsible for making sure the testing gets done.
There are a variety of automated tools to help with security verification. Static analysis tools, which can examine the code without executing the application, can detect potential security flaws in the code that might not otherwise show up for years. Dynamic analysis tools, which run in real time, can test for vulnerabilities such as SQL injection and cross-site scripting. Manual exploratory testing by a knowledgeable security tester is indispensable to detect issues that automated tests can miss.
Security Testing Perspectives
Security testing is a vast topic on its own. Grig Gheorghiu shares some highlights about resources that can help agile teams with security testing.
Just like functional testing, security testing can be done from two perspectives: from the inside out (white-box testing) and from the outside in (black-box testing). Inside-out security testing assumes that the source code for the application under test is available to the testers. The code can be analyzed statically with a variety of tools that try to discover common coding errors that can make the application vulnerable to attacks such as buffer overflows or format string attacks.
See http://en.wikipedia.org/wiki/Buffer_overflow and http://en.wikipedia.org/wiki/Format_string_vulnerabilities for more information.
See http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis for a list of tools that can be used for static code analysis.
The fact that the testers have access to the source code of the application also means that they can map what some books call “the attack surface” of the application, which is the list of all of the inputs and resources used by the program under test. Armed with a knowledge of the attack surface, testers can then apply a variety of techniques that attempt to break the security of the application. A very effective class of such techniques is called fuzzing and is based on fault injection. Using this technique, the testers try to make the application fail by feeding it various types of inputs (hence the term fault injection). These inputs can be carefully crafted strings used in SQL injection attacks, random byte changes in given input files, or random strings fed as command line arguments.
More resources on this subject can be found at: www.fuzzing.org/category/fuzzing-book/ and www.fuzzing.org/fuzzing-software
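To make the fault-injection idea concrete, here is a minimal sketch that randomly corrupts bytes in a known good input file and feeds each mutated copy to the code under test. The parse method and the input file name are placeholders, and a real fuzzing tool is far more sophisticated than this.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

// Very small fault-injection (fuzzing) sketch: mutate random bytes in a known
// good input file and see how the code under test reacts to each mutation.
public class SimpleFileFuzzer {

    // Hypothetical entry point of the code under test.
    static void parse(byte[] input) {
        // call the real parser here
    }

    public static void main(String[] args) throws Exception {
        byte[] original = Files.readAllBytes(Paths.get("valid-input.dat")); // hypothetical file
        Random random = new Random();

        for (int attempt = 0; attempt < 1000; attempt++) {
            byte[] mutated = original.clone();
            for (int i = 0; i < 5; i++) { // flip a handful of random bytes
                mutated[random.nextInt(mutated.length)] = (byte) random.nextInt(256);
            }
            try {
                parse(mutated);
            } catch (Throwable thrown) {
                // A real harness would separate clean, expected rejections of bad input
                // from crashes; here we simply log whatever the mutated input provoked.
                System.out.println("Attempt " + attempt + " provoked: " + thrown);
            }
        }
    }
}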
The outside-in approach is the one mostly used by attackers who try to penetrate into the servers or the network hosting your application. As a security tester, you need to have the same mind-set that attackers do, which means that you have to use your creativity in discovering and exploiting vulnerabilities in your own application. You also need to stay up-to-date with the latest security news and updates related to the platform/operating system your application runs on, which is not an easy task.
So what are agile testers to do when faced with the apparently insurmountable task of testing the security of their application? Here are some practical, pragmatic steps that anybody can follow:
1. Adopt a continuous integration (CI) process that periodically runs a suite of automated tests against your application.
2. Learn how to use one or more open source static code analysis tools. Add a step to your CI process that consists of running these tools against your application code. Mark the step as failed if the tools find any critical vulnerabilities.
3. Install an automated security vulnerability scanner such as Nessus (http://www.nessus.org/nessus/). Nessus can be run in a command-line, non-GUI mode, which makes it suitable for inclusion in a CI tool. Add a step to your CI process that consists of running Nessus against your application. Capture the Nessus output in a file and parse that file for any high-importance security holes found by the scanner. Mark the step as failed when any such holes are found.
4. Learn how to use one or more open source fuzzing tools. Add a step to your CI process that consists of running these tools against your application code. Mark the step as failed if the tools find any critical vulnerabilities.
As with any automated testing effort, running these tools is no guarantee that your code and your application will be free of security defects. However, running these tools will go a long way toward improving the quality of your application in terms of security. As always, the 80/20 rule applies. These tools will probably find the 80% most common security bugs out there while requiring 20% of your security budget.
To find the remaining 20% of the security defects, you’re well advised to spend the other 80% of your security budget on high-quality security experts. They will be able to test your application security thoroughly by the use of techniques such as SQL injection, code injection, remote code inclusion, and cross-site scripting. While there are some tools that try to automate some of these techniques, they are no match for a trained professional who takes the time to understand the inner workings of your application in order to craft the perfect attack against it.
Security testing can be intimidating, so budget time to adopt a hacker mind-set and decide on the right approach to the task at hand. Use the resources Grig suggests to educate yourself. Take advantage of these tools and techniques in order to achieve security tests with a reasonable return on investment.
Just this brief look at security testing shows why specialized training and tools are so important to do a good job of it. For most organizations, this testing is absolutely required. One security intrusion might be enough to take a company out of business. Even if the probability of an intrusion is low, the stakes are too high to put off these tests.
Code that costs a lot to maintain might not kill an organization’s profitability as quickly as a security breach, but it could lead to a long, slow death. In the next section we consider ways to verify maintainability.
Maintainability
Maintainability is not easy to test. In traditional projects, it's often assessed through full code reviews or inspections. Agile teams often use pair programming, which has built-in continual code review. There are other ways to make sure the code and tests stay maintainable.
We encourage development teams to develop standards and guidelines that they follow for application code, the test frameworks, and the tests themselves. Teams that develop their own standards, rather than having them set by some other independent team, are more likely to follow them, because the standards make sense to them.
The kinds of standards we mean include naming conventions for method names or test names. All guidelines should be simple to follow and make maintainability easier. Examples are: “Success is always zero and failure must be a negative value,” “Each class or module should have only one single responsibility,” or “All functions must be single entry, single exit.”
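For example, a team's test-naming guideline might look something like the hypothetical sketch below; the value is that a failing test reads like a broken requirement rather than a mystery.

import org.junit.Test;

// Hypothetical illustration of a team naming guideline: test names state the
// behavior and the condition, so anyone scanning the test list can tell what
// is covered and what broke.
public class AccountWithdrawalTest {

    @Test
    public void withdrawalReducesBalanceWhenFundsAreSufficient() {
        // arrange, act, assert for this behavior
    }

    @Test
    public void withdrawalIsRejectedWhenAmountExceedsBalance() {
        // arrange, act, assert for this behavior
    }
}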
Standards for developing the GUI also make the application more testable and maintainable, because testers know what to expect and don't need to wonder whether a behavior is right or wrong. Such standards also add to testability if you are automating tests from the GUI. Simple standards such as, "Use names for all GUI objects rather than defaulting to the computer-assigned identifier" or "You cannot have two fields with the same name on a page" help the team keep both the code and the automated tests that cover it maintainable.
Maintainable code supports shared code ownership. It is much easier for a programmer to move from one area to another if all code is written in the same style and easily understood by everyone on the team. Complexity adds risk and also makes code harder to understand. The XP value of simplicity should be applied to code. Simple coding standards can also include guidelines such as, “Avoid duplication—Don’t copy-paste methods.” These same concepts apply to test frameworks and the tests themselves.
Maintainability is an important factor for automated tests as well. Test tools have lagged behind programming tools in features that make them easy to maintain, such as IDE plug-ins to make writing and maintaining test scripts simpler and more efficient. That’s changing fast, so look for tools that provide easy refactoring and search-and-replace, and for other utilities that make it easy to modify the scripts.
Database maintainability is also important. The database design needs to be flexible and usable. Every iteration might bring tasks to add or remove tables, columns, constraints, or triggers, or to do some kind of data conversion. These tasks become a bottleneck if the database design is poor or the database is cluttered with invalid data.
Lisa’s Story
A serious regression bug went undetected and caused production problems. We had a test that should have caught the bug. However, a constraint was missing from the schema used by the regression suite. Our test schemas had grown haphazardly over the years. Some had columns that no longer existed in the production schema. Some were missing various constraints, triggers, and indices. Our DBA had to manually make changes to each schema as needed for each story instead of running the same script in each schema to update it. We budgeted time over several sprints to recreate all of the test schemas so that they were identical and also matched production.
—Lisa
Plan time to evaluate the database’s impact on team velocity, and refactor it just as you do production and test code. Maintainability of all aspects of the application, test, and execution environments is more a matter of assessment and refactoring than direct testing. If your velocity is going down, is it because parts of the code are hard to work on, or is it that the database is difficult to modify?
Interoperability
Interoperability refers to the capability of diverse systems and organizations to work together and share information. Interoperability testing looks at end-to-end functionality between two or more communicating systems. These tests are done in the context of the user—human or a software application—and look at functional behavior.
In agile development, interoperability testing can be done early in the development cycle. Because we have a working, deployable system at the end of each iteration, we can deploy it and set up testing with other systems.
Quadrant 1 includes code integration tests, which are tests between components, but there is a whole other level of integration testing in enterprise systems. You might find yourself integrating systems through open or proprietary interfaces. The API you develop for your system might enable your users to easily set up their own test framework. Easier testing for your customer makes for faster acceptance.
In Chapter 20, “Successful Delivery,” we discuss more about the importance of this level of testing.
In one project Janet worked on, test systems were set up at the customer’s site so that they could start to integrate them with their own systems early. Interfaces to existing systems were changed as needed and tested with each new deployment.
If the system your team works on has to work together with external systems, you may not be able to represent them all in your test environments except with stubs and drivers that simulate the behavior of the other systems or equipment. This is one situation where testing after development is complete might be unavoidable. You might have to schedule test time in a test environment shared by several teams.
Consider all of the systems with which yours needs to communicate, and make sure you plan ahead to have an appropriate environment for testing them together. You’ll also need to plan resources for testing that your application is compatible with the various operating systems, browsers, clients, servers, and hardware with which it might be used. We’ll discuss compatibility testing next.
Compatibility
The type of project you’re working on dictates how much compatibility testing is required. If you have a web application and your customers are worldwide, you will need to think about all types of browsers and operating systems. If you are delivering a custom enterprise application, you can probably reduce the amount of compatibility testing, because you might be able to dictate which versions are supported.
As each new screen is developed as part of a user interface story, it is a good idea to check its operability in all supported browsers. A simple task can be added to the story to test on all browsers.
One organization that Janet worked at had to test compatibility with reading software for the visually impaired. Although the company had no formal test lab, it had test machines available near the team area for easy access. The testers made periodic checks to make sure that new functionality was still compatible with the third-party tools. It was easy to fix problems that were discovered early during development.
Having test machines available with different operating systems or browsers or third-party applications that need to work with the system under test makes it easier for the testers to ensure compatibility with each new story or at the end of an iteration. When you start a new theme or project, think about the resources you might need to verify compatibility. If you’re starting on a brand new product, you might have to build up a test lab for it. Make sure your team gets information on your end users’ hardware, operating systems, browsers, and versions of each. If the percentage of use of a new browser version has grown large enough, it might be time to start including that version in your compatibility testing.
When you select or create functional test tools, make sure there’s an easy way to run the same script with different versions of browsers, operating systems, and hardware. For example, Lisa’s team could use the same suite of GUI regression tests on each of the servers running on Windows, Solaris, and Linux. Functional test scripts can also be used for reliability testing. Let’s look at that next.
Reliability
Software reliability is the ability of a system to perform and maintain its functions, with consistency and repeatability, in routine as well as unexpected circumstances. Reliability analysis answers the question, "How long will it run before it breaks?" Some statistics used to measure reliability are:
Mean time to failure: The average or mean time between initial operation and the first occurrence of a failure or malfunction. In other words, how long can the system run before it fails the first time?
Mean time between failures: A statistical measure of reliability, this is calculated to indicate the anticipated average time between failures. The longer the better.
In traditional projects, we used to schedule weeks of reliability testing that tried to run simulations that matched a regular day’s work. Now, we should be able to deliver at the end of every iteration, so how can we schedule reliability tests?
We have automated unit and acceptance tests running on a regular basis. To do a reliability test, we can simply run those same tests over and over. Ideally, you would take statistics gathered about daily usage, create a script that mirrors that usage, and run it on a stable build for however long your team thinks is adequate to prove stability. You can feed random data into the tests to simulate production use and make sure the application doesn't crash because of invalid inputs. Of course, you might also want to mirror peak usage to make sure the application handles busy times as well.
You can create stories in each iteration to develop these scripts and add new functionality to them as it is added to the application. Your acceptance tests could be very specific, such as: "Functionality X must perform 10,000 operations in a 24-hour period for a minimum of 3 days."
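A home-grown reliability run can simply reuse an existing automated check in a long loop, as in the following sketch. The operation, duration, and random inputs are placeholders; the recorded gaps between failures feed directly into the mean-time-between-failures statistic described earlier.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of a long-running reliability check: repeat an existing automated
// check with randomized data and record how long the system runs between failures.
public class ReliabilityRun {

    // Hypothetical stand-in for an existing automated check.
    static void performOperation(int randomInput) throws Exception {
        // call the application here
    }

    public static void main(String[] args) throws Exception {
        long durationMillis = 24L * 60 * 60 * 1000;  // e.g., run for 24 hours
        long endTime = System.currentTimeMillis() + durationMillis;
        long lastFailureTime = System.currentTimeMillis();
        List<Long> timesBetweenFailures = new ArrayList<Long>();
        Random random = new Random();
        long operations = 0;

        while (System.currentTimeMillis() < endTime) {
            try {
                performOperation(random.nextInt(1000000)); // randomized input
            } catch (Exception failure) {
                long now = System.currentTimeMillis();
                timesBetweenFailures.add(now - lastFailureTime);
                lastFailureTime = now;
            }
            operations++;
        }

        long mtbfMillis = 0;
        for (long gap : timesBetweenFailures) {
            mtbfMillis += gap;
        }
        if (!timesBetweenFailures.isEmpty()) {
            mtbfMillis /= timesBetweenFailures.size();
        }
        System.out.println(operations + " operations, "
                + timesBetweenFailures.size() + " failures, MTBF (ms): " + mtbfMillis);
    }
}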
Beware: Running a thousand tests without any serious problems doesn’t mean you have reliable software. You have to run the right tests. To make a reliability test effective, think about your application and how it is used all day, every day, over a period of time. Specify tests that are aimed at demonstrating that your application will be able to meet your customers’ needs, even during peak times.
Ask the customer team for their reliability criteria in the form of measurable goals. For example, they might consider the system reliable if ten or fewer errors occur for every 10,000 transactions, or the web application is available 99.999% of the time. Recovery from power outages and other disasters might be part of the reliability objectives, and will be stated in the form of Service Level Agreements. Know what they are. Some industries have their own software reliability standards and guidelines.
Driving development with the right programmer and customer tests should enhance the application’s reliability, because this usually leads to better design and fewer defects. Write additional stories and tasks as needed to deliver a system that meets the organization’s reliability standards.
Your product might be reliable after it’s up and running, but it also needs to be installable by all users, in all supported environments. This is another area where following agile principles gives us an advantage.
Installability
One of the cornerstones of a successful agile team is continuous integration. This means that a build is ready for testing anytime during the day. Many teams choose to deploy one or more of the successful builds into test environments on a daily basis.
Automating the deployment creates repeatability and makes deployment a non-event. This is exciting to us because we have experienced weeks of trying to integrate and install a new system. We know that if we build once and deploy the same build to multiple environments, we have developed consistency.
Janet’s Story
On one project I worked on, the deployment was automatic and was tested in multiple environments during the development cycle. However, there were issues when deploying to the customer site. We added a step to the end game so that the support group would take the release and do a complete install test as if it were the customer's site. We were able to walk through the deployment notes and eliminate many of the issues the customer would have otherwise seen.
—Janet
As with any other functionality, risks associated with installation need to be evaluated and the amount of testing determined accordingly. Our advice is to do it early and often, and automate the process if possible.
Chapter 20, “Successful Delivery,” has more on installation testing.
“ility” Summary
There are other “ilities” to test, depending on your product’s domain. Safety-critical software, such as that used in medical devices and aircraft control systems, requires extensive safety testing, and the regression tests probably would contain tests related to safety. System redundancy and failover tests would be especially important for such a product. Your team might need to look at industry data around software-related safety issues and use extra code reviews. Configurability, auditability, portability, robustness, and extensibility are just a few of the qualities your team might need to evaluate with technology-facing tests.
Whatever “ility” you need to test, use an incremental approach. Start by eliciting the customer team’s requirements and examples of their objectives for that particular area of quality. Write business-facing tests to make sure the code is designed to meet those goals. In the first iteration, the team might do some research and come up with a test strategy to evaluate the existing quality level of the product. The next step might be to create a suitable test environment, to research tools, or to start with some manual tests.
As you learn how the application measures up to the customers’ requirements, close the loop with new Quadrant 1 and 2 tests that drive the application closer to the goals for that particular property. An incremental approach is also recommended for performance, load, and other tests that are addressed in the next section.
Performance, Load, Stress, and Scalability Testing
Performance, load, stress, and scalability testing all fall into Quadrant 4 because of their technology focus. Often specialized skills are required, although many teams have figured out ways to do their own testing in these areas. Let’s talk about scalability first, because it is often forgotten.
Scalability
Scalability testing verifies that the application remains reliable as more users are added. What that really means is, "Can your system handle the capacity of a growing customer base?" It sounds simple, but it really isn't, and it is a problem that an agile team usually can't solve by itself.
It is important to think about the whole system and not just the application itself. For example, the network is often the bottleneck, because it can’t handle the increased throughput. What about the database? Will it scale? Will the hardware you are using handle the new loads being considered? Is it simple just to add new hardware, or is it the bottleneck?
Janet’s Story
In one organization I worked with recently, the customer base had grown very quickly, and the solution the company had invested in had reached its capacity because of hardware constraints. It was not a simple matter of adding a new server, because the solution was not designed that way. The system had to be monitored so that services could be restarted during peak usage.
To accommodate its future growth, the organization actually had to change solutions, but this was not recognized until problems started to happen.
Ideally, the organization would have replaced the old system before it was an issue. This is an example of why it is important to understand your system and its capability, as well as future growth projections.
—Janet
You will need to go outside the team to get the answers you require to address scalability issues, so plan ahead.
Performance and Load Testing
Performance testing is usually done to help identify bottlenecks in a system or to establish a baseline for future testing. It is also done to ensure compliance with performance goals and requirements, and to help stakeholders make informed decisions related to the overall quality of the application being tested.
Load testing evaluates system behavior as more and more users access the system at the same time. Stress testing evaluates the robustness of the application under higher-than-expected loads. Will the application scale as the business grows? Characteristics such as response time can be more critical than functionality for some applications.
Grig Gheorghiu [2005] emphasizes the need for clearly defined expectations to get value from performance testing. He says, “If you don’t know where you want to go in terms of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?).” For example, you probably want to know the number of concurrent users and the acceptable response time for a web application.
Performance and Load-Testing Tools
After you’ve defined your performance goals, you can use a variety of tools to put a load on the system and check for bottlenecks. This can be done at the unit level, with tools such as JUnitPerf, httperf, or a home-grown harness. Apache JMeter, The Grinder, Pounder, ftptt, and OpenWebLoad are more examples of the many open source performance and load test tools available at the time of this writing. Some of these, such as JMeter, can be used on a variety of server types, from SOAP to LDAP to POP3 mail. Plenty of commercial tool options are available too, including NeoLoad, WebLoad, eValid LoadTest, LoadRunner, and SOATest.
See the bibliography for links to sites where you can research tools.
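As one illustration, JUnitPerf wraps existing JUnit tests in load and timing decorators. The sketch below revisits the "report X in less than 20 seconds" story from earlier in this chapter with ten concurrent simulated users; the functional test class is a stand-in, and the exact JUnitPerf class names and constructors may differ by version, so treat this as a shape rather than a recipe.

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;
import com.clarkware.junitperf.LoadTest;
import com.clarkware.junitperf.TimedTest;

// Sketch in JUnitPerf's decorator style: reuse an existing functional test,
// run it with 10 concurrent simulated users, and fail if the run takes too long.
public class ReportLoadTest {

    // Stand-in for an existing JUnit 3-style functional test of the report.
    public static class ReportXTest extends TestCase {
        public ReportXTest(String name) {
            super(name);
        }
        public void testRetrieveReportX() {
            // call the real report retrieval here
        }
    }

    public static Test suite() {
        Test reportTest = new ReportXTest("testRetrieveReportX");
        Test tenConcurrentUsers = new LoadTest(reportTest, 10);               // 10 simulated users
        Test withinTwentySeconds = new TimedTest(tenConcurrentUsers, 20000);  // 20-second limit
        TestSuite suite = new TestSuite();
        suite.addTest(withinTwentySeconds);
        return suite;
    }
}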
Use these tools to look for performance bottlenecks. Lisa’s team uses JProfiler to look for application bottlenecks and memory leaks, and JConsole to analyze database usage. Similar tools exist for .NET and other environments, including .NET Memory Profiler and ANTS Profiler Pro. As Grig points out, there are database-specific profilers to pinpoint performance issues at the database level; ask your database experts to work with you. Your system administrators can help you use shell commands such as top, or tools such as PerfMon to monitor CPU, memory, swap, disk I/O, and other hardware resources. Similar tools are available at the network level, for example, NetScout.
You can also use the tools the team is most familiar with. On one project, Janet worked very closely with one of the programmers to create the tests. She helped him define the tests needed based on the customer's performance and load expectations, and he automated them using JUnit. Together they analyzed the results and reported back to the customer.
Establishing a baseline is a good first step for evaluating performance. The next section explores this aspect of performance testing.
Baseline
Performance tuning can turn into a big project, so it is essential to establish a baseline against which you can compare the performance of new versions of the software. Even if performance isn't your biggest concern at the moment, don't ignore it. It's a good idea to get a performance baseline so that you know later which direction your response time is headed. Lisa's company hosts a website that so far has had only a small load on it. The team established a load test baseline for the site so that as traffic grew, they'd know how performance was being affected.
Performance Baseline Test Results
Lisa's coworker Mike Busse took on the task of obtaining performance baselines for their web application that manages retirement plans. He evaluated load test tools, implemented one (JMeter), and set about getting a baseline. He reported the results both in a high-level summary and in a spreadsheet with detailed results.
The tests simulated slowly increasing the load up to 100 concurrent users. Three test scripts, each for a common user activity, were used, and they were run separately and all together. Data gathered included:
• Maximum time of a transaction
• Maximum number of busy connections
• A plot of the max time of a transaction against the number of users (see Figure 11-2 for an example of a chart)
Figure 11-2 Max and average transaction times at different user loads.
• Number of users who were on the system when the max time of a transaction equaled eight seconds
An important aspect of reporting results was providing definitions of terms such as transaction and connection in order to make the results meaningful to everyone. For example, maximum time of a transaction is defined as the longest transaction of all transactions completed during the test.
Mike’s report also included assumptions made for the performance test:
• Eight seconds is a transaction threshold that we would not like to cross.
• The test web server is equivalent to either of the two web servers in production.
• The load the system can handle, as determined by these tests, can be doubled in production because the load is distributed between two web servers.
• The distribution of tasks in the test that combines all three tests is accurate to a reasonable degree.
Mike also identified shortcomings with the performance baseline. More than one transaction can contribute to loading a page, meaning that the max page load time could be longer than the max time of a transaction. The test machine doesn’t duplicate the production environment, which has two machines and load-balancing software to distribute the transactions.
The report ended with a conclusion about the number of concurrent users that the production system could support. This serves as a guideline to be aware of as the production load increases. The current load is less than half of this number, but there are unknowns, such as whether the production users are all active or have neglected to log out.
Make sure your performance tests adequately mimic production conditions. Make results meaningful by defining each test and metric, explaining how the results correlate to the production environment and what can be done with the results, and providing results in graphical form.
If performance criteria have been defined for specific functionality, we suggest that performance testing be done as part of the iteration to ensure that issues are found before it is too late to fix them.
Benchmarking can be done at any time during a release. If new functionality is added that might affect the performance, such as complicated queries, rerun the tests to make sure there are no adverse effects. This way, you have time to optimize the query or code early in the cycle when the development team is still familiar with the feature.
Any performance, load, or stress test won’t be meaningful unless it’s run in an environment that mimics the production environment. Let’s talk more about environments.
Test Environments
Final runs of the performance tests will help customers make decisions about accepting their product. For accurate results, tests need to be run on equipment that is similar to that of production. Often teams will use smaller machines and extrapolate the results to decide if the performance is sufficient for the business needs. This should be clearly noted when reporting test results.
Stressing the application to see what load it can take before it crashes can also be done anytime during the release, but usually it is not considered high-priority by customers unless you have a mission-critical system with lots of load.
One resource that is affected by increasing load is memory. In the next section, we discuss memory management.
Memory Management
Memory requirements are usually described in terms of the amount (normally the minimum or maximum) of RAM, ROM, hard drive space, and so on, to be used. You should be aware of memory usage and watch for leaks, because they can cause catastrophic failures when the application is in production during peak usage. Some programming languages are more susceptible to memory issues than others, so understanding the strengths and weaknesses of the code will help you know what to watch for. Testing for memory issues can be done as part of performance, load, and stress testing.
Garbage collection is one mechanism for reclaiming memory the program no longer needs and making it available again. However, it can mask severe memory issues. If you see the available memory steadily decreasing with usage and then all of a sudden jumping back up to the maximum available, you can suspect that garbage collection has kicked in. Watch for anomalies in that pattern, or for the system starting to slow down under heavy usage. You may need to monitor for a while and work with the programmers to find the issue. The fix might be something simple, such as scheduling garbage collection more often or setting the trigger level higher.
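One simple way to watch for this pattern is to sample heap usage at intervals while your load or soak tests run, using the JVM's standard management beans. Run something like the sketch below inside the application's JVM (or attach remotely over JMX) so that it reports the application's heap rather than the monitor's; the sampling interval and duration are arbitrary.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sample heap usage at intervals while a load or soak test is running, so that
// a steadily rising "used" figure (with only brief dips after garbage
// collection) becomes visible in the log.
public class HeapUsageMonitor {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        long intervalMillis = 10000;  // sample every 10 seconds (arbitrary)
        int samples = 360;            // for about an hour (arbitrary)

        for (int i = 0; i < samples; i++) {
            MemoryUsage heap = memoryBean.getHeapMemoryUsage();
            System.out.println(System.currentTimeMillis()
                    + " used=" + heap.getUsed()
                    + " committed=" + heap.getCommitted()
                    + " max=" + heap.getMax());
            Thread.sleep(intervalMillis);
        }
    }
}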
When you are working with the programmers on a story, ask them whether they expect problems with memory. You can test specifically for memory issues if you know there might be a risk in that area. Watching for memory leaks is not always easy, but there are tools to help, and this is an area where programmers should have tools readily available. Collaborate with them, and use the performance and load tests described in the previous section, to verify that the application is free of memory problems.
You don't have to be an expert in technology-facing tests that critique the product to help your team plan for and execute them. Your team can evaluate what tests it needs from this quadrant. Talk about these tests as you plan your release; you can create a test plan specifically for performance and load if you've not done that kind of testing before. You will need time to obtain the expertise required, either by identifying and learning the skills within the team or by bringing in outside help. As with all development efforts, break technology-facing tests into small tasks that can be addressed and built upon each iteration.
Summary
In this chapter, we’ve explored the fourth agile testing quadrant, the technology-facing tests that critique the product.
The development team should evaluate whether it has, or can acquire, the expertise to do these tests, or whether it needs to plan to bring in external resources.
An incremental approach to these tests, completing tasks in each iteration, ensures there is time to address any issues that arise and helps avoid production problems.
The team should consider various types of “ility” testing, including security, maintainability, interoperability, compatibility, reliability, and installability testing, and should execute these tests at appropriate times.
Performance, scalability, stress, and load testing should be done from the beginning of the project.
Research the memory management issues that might impact your product, and plan tests to verify the application is free of memory issues.