Devops musings: Automation and Cloud for System Integration test

This is a quick overview of very recent efforts within the IBM Rational development organization to employ cloud technology coupled with aggressive provisioning, product install and configuration automation to improve and streamline our product System Integration test processes with an eye towards Continuous Delivery.

Motivation and challenges

It's a common story, for sure, but our product delivery teams here at IBM Rational are very keen to constantly improve product quality. They are also committed to shorter and shorter product cycles ... essentially wanting to create, evolve and deliver our products faster and with increasing product quality. Piece O’ cake, right? Nope.

Our system verification and integration testing is central to overall product quality and requires install, configuration and test-scenario execution of our products on a huge variety of OS platforms, databases and physical topologies with large numbers of internal and third-party integrations.

When we started down this path to shorter development cycles we were primarily installing and configuring test systems by hand on physical hardware. This was doable given our traditional long product cycle. However, we saw ourselves quickly approaching a very hard wall when asked to support shorter cycles. This meant that, instead of going through this complex system provision, install, configure and scenario test cycle by hand x times a year, we would have to do it 2x times a year, then 4x, etc. We needed to change our way of doing things drastically ... and soon.

Our gamble centered on two key themes: Golden Topologies and Cloud-based systems coupled with install and configuration automation. The Golden Topology approach enables us to focus on a finite set of test systems out of the, essentially, infinite combinations possible within our set of supported platforms, databases, integrations, etc. The rest of this post will describe our approach to cloud-based delivery of ready-to-test systems and we will provide another article focused on our Golden Topology strategy in the near future.

Summary of implementation

When we began this effort we had been driving two proof-of-concept (POC) private cloud systems, each of which allowed self-provisioning of test systems. The first of these could deploy one or more independent VMs, at which point Build Forge was employed to orchestrate the execution of scripts that deployed the products under test and then stitched the VMs into a coherent system.

The second POC private cloud was based on IBM Workload Deployer (IWD). This ended up as our preferred cloud implementation for two reasons. First, IWD supports the concept of virtual system patterns (VSPs). A VSP can be considered a template for a logical set of VMs with relationships between them. With a VSP defined you can deploy the whole system of needed VMs in one operation. Second, orchestration and automation code can be directly attached to the VSP (in the form of script packages), and when this code executes as part of deployment it has topology metadata directly available. These capabilities allow us to provide one-stop, push-button deployment of very complex system topologies.

I’ll provide a quick example of a moderately complex topology we use to system test our Rational CLM product suite. If you are not familiar, CLM (Collaborative Lifecycle Management) is a solution we deliver as a single composite product that consists of Rational Team Concert (RTC), Rational Quality Manager (RQM) and Rational Requirements Composer (RRC). One of the Golden Topologies we employ to test CLM is a fully distributed, enterprise-class topology on Linux with DB2 as the database, WebSphere as the application server and IHS as a reverse proxy. This is also affectionately referred to as “E1”.

This topology employs a separate VM for each of the three CLM applications (CCM, which is the application behind RTC, plus RQM and RRC) and another for the shared Jazz Team Server (JTS). An additional VM is dedicated to the reverse proxy and yet another is reserved for DB2. To set up and configure this topology by hand and then install all products and configure them to work together with a common JTS and DB server is quite an undertaking, even for a seasoned software engineer.

With VSPs we can take that seasoned engineer’s ability to configure such a system and burn it into automation for anyone to use. With this VSP, any team member can push a few buttons to self-provision their own instance of E1!
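To make the "template for a logical set of VMs with relationships between them" idea concrete, here is a minimal sketch of the E1 topology described above as plain Python data. This is not the IWD pattern format or API; the `Node` class, the short node names and, in particular, the dependency edges (which VM must be ready before another can be configured) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Node:
    """One VM in the pattern, plus the nodes it has to wait for."""
    name: str
    role: str
    depends_on: tuple[str, ...] = ()


# Hypothetical model of E1: six VMs, as in the post.
# The depends_on edges are assumed, not taken from the real pattern.
E1 = [
    Node("db", "DB2 database server"),
    Node("jts", "Jazz Team Server on WebSphere", ("db",)),
    Node("ccm", "CCM application on WebSphere", ("db", "jts")),
    Node("rqm", "RQM application on WebSphere", ("db", "jts")),
    Node("rrc", "RRC application on WebSphere", ("db", "jts")),
    Node("proxy", "IHS reverse proxy", ("jts", "ccm", "rqm", "rrc")),
]


def configuration_order(nodes):
    """Return node names in an order that respects every dependency."""
    graph = {node.name: set(node.depends_on) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(configuration_order(E1), start=1):
        print(step, name)
```

Under these assumed dependencies the database comes first, then the JTS, then the three applications in any order, and the reverse proxy last. That ordering is the kind of knowledge a seasoned engineer carries around in their head and that a pattern lets you capture once.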

Results and return on investment

When we started releasing this capability to our test teams the uptake was huge and immediate. This introduced a series of brand new problems around system capacity management that we can talk about in another post.

We immediately felt as if this capability was improving our ability to produce stable test systems more rapidly and with far fewer set-up and configuration errors. Being engineers, we decided to define some metrics and start measuring.

What we found was pretty amazing. For example, one of our more complex test system topologies for CLM is a horizontal WebSphere cluster with Oracle as the backend and WebSphere Proxy Server out front. This is, of course, “E3”! We surveyed some internal teams to get a sense of the time required to manually set up such a system from scratch. This is what we found:

User type	Time to provision, install and configure E3
Expert user	11 hours
Novice user	30 hours*
Non-experienced user	96 hours*
Average user	42 hours

*Note: these novice and non-experienced users will inevitably be stealing cycles from the experts.

So assume an average time per manual deployment of 42 hours. Now consider that any user, whether expert, novice or with zero experience, can deploy the same stable system in about 3 hours total using a VSP. Also consider that it takes the user five minutes to launch the VSP deployment process. The next 2 hours and 55 minutes can be spent doing something constructive while waiting for the system to become available.

When we're talking about short development cycles where these deployments are needed in rapid succession, the savings add up fast. At a saving of roughly 40 hours per system deployment (42 hours by hand versus about 3 hours with a VSP), you’re into savings measured in person-years very quickly. For example, here's a quick business value assessment of patterns of similar complexity to our clustered pattern:

Let's assume four teams (RTC, RQM, RRC and CLM) working concurrently and deploying a new system for each of three test topologies four times a month for one year.

Four deployments per month for four teams for three topologies for one year:

4 x 4 x 3 x 12 = 576 deployments

Forty hours savings per deployment:

576 x 40 = 23,040 hours

Assuming a full person year (FTE) is 2080 hours:

23,040 / 2080 ~= 11 FTEs
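If you want to plug in your own numbers, the arithmetic above can be sketched in a few lines of Python. The function and parameter names are made up for this illustration; the inputs in the example call are the ones used in this post.

```python
HOURS_PER_FTE = 2080  # one full person-year, as assumed above


def deployments_per_year(per_month, teams, topologies, months=12):
    """Total deployments across all teams and topologies in a year."""
    return per_month * teams * topologies * months


def fte_saved(deployments, hours_saved_each, hours_per_fte=HOURS_PER_FTE):
    """Convert the hours saved across all deployments into person-years."""
    return deployments * hours_saved_each / hours_per_fte


if __name__ == "__main__":
    total = deployments_per_year(per_month=4, teams=4, topologies=3)
    hours = total * 40  # forty hours saved per deployment
    print(total, hours, round(fte_saved(total, 40), 1))
```

Halving the per-deployment saving or the deployment frequency halves the result, so it is easy to see how sensitive the estimate is to each input.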

This represents a fairly conservative estimate of savings we have realized for just three patterns. The return on investment is even more impressive when you consider that an expert virtual system pattern (VSP) developer can create a pattern of this complexity within a few weeks.

Beyond these very tangible savings is the fact that consistent, rapid, stable deployments enable new ways of working. When teams start considering these very complex test systems as disposable, doors open to very innovative approaches to system test. You can see why VSPs are a strategic element in our approach to continuous delivery. More on that in a future post.

Wednesday, July 3, 2013
