The world is moving in a direction driven by the twin imperatives of risk management and the need for lower costs (e.g., outsourcing). Current IT practices are overly complex and require too much human intervention. On-demand computing (ODC) provides an infrastructural solution to the needs of modern organizations. Risk can be managed and running costs can be lowered by migrating to a more automated IT business infrastructure.
Very few (if any) businesses operate today in a truly on-demand fashion. For this reason, ODC is a migration (not a forklift upgrade) target for today's infrastructures. The end result of this migration will allow for a more automated, business-policy-driven model. It represents a transformation rather than a revolution!
In the simplest terms, we can say that current infrastructure is hindered by two weaknesses: complexity and excessive human input. In fact, IBM claims that 40 percent of computer outages are caused by operator error. Other organizations, such as EMA, attribute as much as 60 percent of outages to complex network misconfiguration.
The major features of ODC are the so-called self-* properties: it is self-configuring, self-healing, self-optimizing, and self-protecting.
What's particularly interesting about ODC is its integrative nature. It takes the best of numerous areas--such as system/network management, standard technologies (XML, J2EE, web services, EJB, etc.)--and combines them into an infrastructure that is strong, self-configuring, self-protecting, and self-healing, all based on open standards and specifications. IBM refers to this as an on-demand operating environment.
Let's look at the elements of ODC, starting with autonomic computing.
IBM has identified five levels of autonomic computing, ranging from essentially manual operation to a fully self-managing environment: basic (level 1), managed (level 2), predictive (level 3), adaptive (level 4), and autonomic (level 5).
Most organizations today are at level 1. It's important to remember that ODC is a business transformation rather than a wholesale replacement of existing IT practices and infrastructure. This migratory approach also applies to autonomic computing--we can move in a phased fashion from where we are today towards the autonomic era.
Intelligent control loops are at the heart of autonomic computing. In most cases, today these loops are a complex maze of human-centric business processes and customized code (e.g., booking airline tickets). The automation of these control loops provides the basis for the required level of autonomic automation. Let's take a look at them.
Self-management is a key component of autonomic computing. Conceptually similar to network management, this requires the use of control loops that do the following: collect data on the state of the managed resources, analyze that data, plan any required changes, and execute the plan.
Let's take an example from telecom network problem management. The following series of steps illustrates the business process flow that results from a network error, such as the failure of one or more links.
The first stage in the process might result in an update to a GUI in a network monitoring center. Alternatively, a support technician might be paged--somehow, someone somewhere gets contacted.
The next step is an effort to resolve the problem. The problem might be intermittent in nature, or it might have been caused by an errant digger. Assuming the latter, a trouble ticket is created. At this stage, an effort is made to determine the root cause of the problem. Bear in mind that an optical fiber cut may result in a huge number of alarms from the network, so it can take time to get to the bottom of the deluge of management data. We're assuming that a fiber cut has occurred, so it becomes necessary to initiate and track a specific workflow. This will typically require personnel to go to the fault site and carry out repairs. Once this is done, the trouble ticket can be closed.
The business process described above can be mapped into a control loop in broad terms as follows: monitor (the error report arrives), analyze (the root cause is determined), plan (a trouble ticket is created and a workflow initiated), and execute (repairs are carried out and the ticket is closed).
The important point is that most business processes can be resolved into control loops similar to the above. As mentioned previously, this allows for automation.
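As a sketch of that mapping, the fiber-cut workflow can be expressed as a collect/analyze/plan/execute loop. All class and method names here are illustrative, not drawn from any real OSS product:

```java
// Illustrative sketch: mapping the fault-management workflow onto a control loop.
import java.util.List;

public class FaultLoop {
    // Collect: gather raw alarms from the network (polling or notification).
    static List<String> collectAlarms() {
        return List.of("LOS link-7", "LOS link-8", "LOS link-9");
    }

    // Analyze: correlate the alarm flood down to a probable root cause.
    static String analyze(List<String> alarms) {
        return alarms.size() > 1 ? "fiber-cut" : "intermittent-fault";
    }

    // Plan: decide the workflow, e.g. open a trouble ticket and dispatch a crew.
    static String plan(String rootCause) {
        return rootCause.equals("fiber-cut")
                ? "open-ticket; dispatch-repair-crew"
                : "monitor-and-wait";
    }

    // Execute: carry out the plan, then close the ticket.
    static String execute(String plan) {
        return "executed: " + plan + "; ticket-closed";
    }

    public static void main(String[] args) {
        System.out.println(execute(plan(analyze(collectAlarms()))));
    }
}
```

Automating each stage independently is what lets the human-centric process be replaced piecemeal rather than all at once.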
We're all used to control loops in our everyday lives: heating/cooling systems, any timer-based electronic device, fuel injection systems, etc. There is always a danger associated with closed loop systems: instability. This is a well-known problem from automatic control theory, where a controlled element (such as a heating system) starts to oscillate uncontrollably. This problem presents an important challenge to the concept of autonomic computing.
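A thermostat makes the instability risk concrete: if it switches the instant the temperature crosses the setpoint, it will chatter on and off around that point. One common remedy, sketched here with invented numbers, is hysteresis: a dead band around the setpoint inside which the controller holds its previous state:

```java
// Minimal hysteresis controller: shows how a dead band damps the
// oscillation that a naive on/off control loop would exhibit.
public class Thermostat {
    private final double setpoint;
    private final double band;      // half-width of the dead band
    private boolean heating = false;

    public Thermostat(double setpoint, double band) {
        this.setpoint = setpoint;
        this.band = band;
    }

    // Returns true if the heater should run for the given reading.
    public boolean update(double temperature) {
        if (temperature < setpoint - band) {
            heating = true;            // clearly too cold: switch on
        } else if (temperature > setpoint + band) {
            heating = false;           // clearly too warm: switch off
        }
        // Inside the band: keep the previous state, avoiding rapid toggling.
        return heating;
    }
}
```

Autonomic control loops need analogous damping so that, for example, a server farm doesn't endlessly add and remove capacity in response to small load fluctuations.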
Two major autonomic components are involved in the control loops: the autonomic manager and the managed element.
Figure 1 illustrates these components. Two entities are common to both the autonomic manager and the managed elements: sensors and effectors. Sensors are used to collect data concerning the state of a given element in either of two ways: polling (or explicit "gets") or on a notification basis. Effectors provide a means of modifying the state or configuration of an element. Taken together, sensors and effectors provide a manageability interface.
Figure 1. Autonomic components: Manager and managed element
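The sensor/effector split can be sketched as a pair of Java interfaces. The names are hypothetical, not IBM's actual touchpoint API:

```java
// Hypothetical manageability interface: a sensor exposes state (by polling
// or by pushing notifications); an effector changes state or configuration.
import java.util.Map;
import java.util.function.Consumer;

interface Sensor {
    Map<String, String> poll();                        // explicit "get"
    void subscribe(Consumer<Map<String, String>> cb);  // notification basis
}

interface Effector {
    void apply(String property, String value);         // change state/config
}

// A managed element exposes both, forming its manageability interface.
class ManagedElement implements Sensor, Effector {
    private final Map<String, String> state = new java.util.HashMap<>();
    private Consumer<Map<String, String>> listener = s -> {};

    @Override public Map<String, String> poll() { return Map.copyOf(state); }
    @Override public void subscribe(Consumer<Map<String, String>> cb) { listener = cb; }
    @Override public void apply(String property, String value) {
        state.put(property, value);
        listener.accept(poll());   // notify subscribers of the change
    }
}
```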
The autonomic manager implements the control loop and consists of four parts, each of which shares a common knowledge base: a monitor, an analyzer, a planner, and an executor.
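A minimal sketch of that monitor-analyze-plan-execute cycle follows; the threshold, action names, and knowledge-base keys are purely illustrative:

```java
// Skeleton of an autonomic manager's control loop: monitor, analyze, plan,
// and execute, all sharing a common knowledge base.
import java.util.HashMap;
import java.util.Map;

public class AutonomicManager {
    private final Map<String, Object> knowledge = new HashMap<>(); // shared K

    public String runOnce(double observedLoad) {
        // Monitor: collect data via the element's sensors.
        knowledge.put("load", observedLoad);

        // Analyze: interpret the data against known thresholds.
        boolean overloaded = observedLoad > 0.8;
        knowledge.put("overloaded", overloaded);

        // Plan: choose a corrective action, if any.
        String plan = overloaded ? "add-server-instance" : "no-op";

        // Execute: apply the plan through the element's effectors.
        knowledge.put("lastAction", plan);
        return plan;
    }
}
```

In a real manager each phase would be a separate pluggable component; collapsing them into one method simply keeps the shape of the loop visible.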
IBM maintains an architectural blueprint for autonomic computing in its Thomas J. Watson Research Center in Hawthorne, New York.
One aspect of ODC for which I feel a strong affinity is the proposed manageability of the software elements. This has potentially enormous implications for the way in which ODC software will be produced. A simple example is logging: let's say we want to produce logging that is machine-readable. Why would you want to do this? Well, one reason is to allow downstream software to read the logging data. There's no major reason why this log-reading software couldn't interpret the data and suggest what the problem might be.
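As a sketch, machine-readable logging can be as simple as emitting key=value records that a downstream tool can parse and react to. The record format and the diagnosis rule here are invented for illustration:

```java
// Illustrative machine-readable logging: the writer emits key=value records,
// and a downstream reader parses them and suggests a likely cause.
import java.util.HashMap;
import java.util.Map;

public class MachineLog {
    // Writer side: one event as a single parseable line.
    static String logLine(String component, String code, String detail) {
        return "component=" + component + " code=" + code + " detail=" + detail;
    }

    // Reader side: parse the line back into fields.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.split(" ")) {
            String[] kv = token.split("=", 2);
            fields.put(kv[0], kv[1]);
        }
        return fields;
    }

    // Downstream software interprets the fields and suggests a diagnosis.
    static String diagnose(Map<String, String> fields) {
        if ("E_CONN_REFUSED".equals(fields.get("code"))) {
            return "suggest: check that the " + fields.get("component")
                 + " service is running and its port is open";
        }
        return "suggest: no known diagnosis";
    }
}
```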
This is the beginning of closing the administration loop and potentially letting software start to dynamically process its own problems.
The usual example of policy-driven computing concerns the quality of service on networks; i.e., if the CEO needs network access and an engineer needs access, then give the CEO's traffic priority. This is a little dated. The need for policy now spans the entire business. Let's say a vendor is selling widgets online in a very competitive market for price X. Imagine a new vendor appears in a puff of smoke, also selling the same widgets online for price X minus Y.
If our first vendor is tuned in and operating an on-demand environment, then they should be able to automatically adjust their price to X minus Z in order to continue to compete. The basis of the price adjustment is a policy, which might be expressed as a condition-action clause:
If ((OurPrice - CompetitorPrice) > Tolerance) then (adjust OurPrice)
The vendors must keep a close eye on each other and on many other business issues, such as revenue assurance, cost management, profit, etc.
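That condition-action clause can be sketched in Java as follows; the tolerance and margin values, and all names, are invented:

```java
// Illustrative condition-action pricing policy: if a competitor undercuts us
// by more than a tolerance, adjust our price to stay competitive.
public class PricingPolicy {
    private final double tolerance;   // how much undercutting we ignore
    private final double margin;      // how far below the competitor we go

    public PricingPolicy(double tolerance, double margin) {
        this.tolerance = tolerance;
        this.margin = margin;
    }

    // Condition: (OurPrice - CompetitorPrice) > Tolerance
    public boolean shouldAdjust(double ourPrice, double competitorPrice) {
        return (ourPrice - competitorPrice) > tolerance;
    }

    // Action: move our price just below the competitor's.
    public double adjust(double ourPrice, double competitorPrice) {
        return shouldAdjust(ourPrice, competitorPrice)
                ? competitorPrice - margin
                : ourPrice;
    }
}
```

The point is that the policy, not hand-written procedural code, carries the business rule: change the tolerance and the behavior changes without redeployment.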
Business continuity infrastructure can also be viewed as a type of policy:
If (we lose site 1) then (move offsite personnel to site 2)
As the number of policy-controlled scenarios increases, we'll likely see more policies of this kind. It's hard to imagine such policies in a traditional (i.e., non-on-demand) IT infrastructure.
Given the pressures on organizations to improve ROI, it's no surprise that executives are keen to "sweat their IT assets" for maximum value. The automation possible through autonomic computing should pay useful dividends by reducing staff levels and moving IT staff onto more complex, business-centric tasks. However, the bulk of today's IT infrastructure operates in a standalone fashion. This is reminiscent of the "islands of automation" that used to exist in the manufacturing sector. There is a greater need for IT integration so that multiple servers cooperate to provide business value.
The adoption of web services is giving rise to a much more dynamic use of resources to solve static problems. By static problems, I mean those like the classic example of automatic airline ticket booking. The main variable in this instance is the number of clients; e.g., if airline X offers an irresistible bargain, then its website will have to be ready to receive unusually high levels of traffic. This is an essentially static problem.
IBM currently sees the ODC strategy as the nexus between two areas: autonomic computing and grid computing.
We've discussed the former; let's briefly look at grid computing.
A different class of computing problem is one that requires a vast amount of processing, such as modeling weather patterns, predicting volcanic eruptions, predicting tsunamis, modeling stock price variations, molecular modeling, etc. Grid computing is increasingly being used for this class of problem. Grids are also being employed by corporations and used as the basis for outsourcing IT facilities; e.g., Oracle operates a large grid out of its Austin, Texas operation. Client organizations can outsource software applications to Oracle and then use the grid to gain access as required.
Even though the cost of such grid use seems high (millions of dollars annually), this is often much less than the cost of a client self-hosting the infrastructure.
IBM and Cisco have collaborated on a range of projects. One interesting result of this meeting of minds is a new router from Cisco called the CRS-1. This is a carrier-class device that incorporates autonomic capabilities.
One possible weakness in the ODC project is that it is driven by a vendor. This means that it might possibly be negatively affected by that vendor's desire to "shift product." My own view is that the future of computing is more driven by consideration of networking and access to networking rather than processing power. However, these comments must be viewed in the context of history, where vendors have driven hugely successful technologies. Examples of this include: Sun Microsystems and Java, Cisco and IP routing, Microsoft and Windows, etc. So maybe the fact that just one company is pushing ODC isn't such a bad thing. On the other hand, ODC is a new way of producing and using software. This might well require buy-in from many other vendors, and this is not guaranteed to occur.
We've briefly reviewed a range of potentially powerful new technologies. The integrative aspect of ODC is compelling--it takes existing technologies, standards, and specifications and merges these into a wholly new way of running IT. In conjunction with grid computing, it can be seen that ODC is conceptually similar to the global outsourcing phenomenon. The telecom world offers historical precedent for the power of outsourcing such arcane areas as MPLS and VPN management. Organizations can save cash and focus on core activities by offloading these complex technologies from the LAN and into the hands of highly trained and increasingly cash-strapped service providers.
Whether or not ODC takes off as per IBM's vision, other vendors are attempting similar initiatives; Oracle has added grid capability to its Oracle 10g product. Again, history teaches that out of the technology wars there is generally a clear winner to lead the next revolution--Microsoft won the desktop war. It's likely that we will see an ODC winner emerge. What's interesting about this is that no single company has led two consecutive waves of revolution--Microsoft is now no longer seen as the innovation leader (its stock even pays dividends now as it accepts its position as an established brand holder!) and its forays into telecom and gaming could even be seen as solutions looking for a problem.
My own take on ODC is that it is very necessary. One benefit I'd like to see it deliver is simpler systems. This is no longer a matter of choice given the risk-related volatility of this first decade of the twenty-first century. The world may well have a sufficient number of programmers engaged in producing the thousands of packages that feed the global market for technology. Perhaps a fresh approach is needed for the form and function of much of this software? It's possible that ODC can deliver this.
Stephen B. Morris is an independent writer/consultant based in Ireland.
Copyright © 2009 O'Reilly Media, Inc.