Tuesday, 13 July 2010
Google buys ITA Software (Part 2: What does ITA Software do?)
Yesterday in Part 1 I told some of the history of ITA Software, the air travel pricing and reservation software company bought this month by Google for US$700 million.
Today I'll discuss ITA Software's strengths, weaknesses, and strategic approach to the airfare puzzle (Part 2, below). Tomorrow I'll finish up by describing how ITA Software's acquisition by Google might affect travellers (Part 3).
You may wonder why I go into such detail about what ITA Software does, or why Google is buying them for US$700 million. But as I said yesterday, "Google's purchase of ITA Software is likely to be a bad thing for travellers", and the technical background below is necessary to understand why:
Airline ticket prices are determined by the "fares" in a published "tariff". A fare is not a price tag on a seat, but a price associated with a set of rules. The rules of a fare always specify what route(s) on what airline(s), reserved in what "booking classes", qualify for the fare, and usually include a variety of other conditions. Any reservation or ticket that satisfies that set of rules is eligible for that price.
Airlines don't allocate seats by price. (More precisely, they don't allocate "reservation confirmations" by price: airlines overbook, so there isn't a 1-to-1 correspondence between how many reservations an airline is willing to confirm on a flight and how many seats there are on the plane.) Instead, airlines allocate the "availability" of confirmations by "booking class", each designated by a letter. So as of a particular moment, for a particular flight on a particular date several weeks in the future between New York JFK and Chicago O'Hare, an airline may be willing to confirm up to 9 seats in "Y" class, up to 3 seats in "Q" class, and none at all in "Z" class. There isn't a 1-to-1 correspondence between booking class and price, either: a Q seat JFK-ORD may be ticketed as part of a through one-way fare JFK-ORD-PDG, as part of a JFK-ORD-JFK round-trip, as part of the return leg of a DSM-ORD-JFK-ORD-DSM trip, or as part of millions (or orders of magnitude more) of other possible journeys, at different prices specified by those different fares.
There is no database of availability, either. Airlines determine the availability of confirmations in particular booking classes on particular flights in real time, in response to queries transmitted through computerized reservations systems (CRS's) from reservation offices and call centers and travel agents.
So the price for a specific ticket is a function of both the fares (prices and associated sets of routing and other rules) in currently published tariffs and the real-time willingness of the airline(s) to confirm reservations in specific booking classes on specific flights.
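To make that concrete, here is a minimal sketch in Python of price as a function of both the tariff and real-time availability. The availability numbers are the ones from the JFK-ORD example above; the airline code "XX", the prices, and the names `Fare`, `TARIFF`, `AVAILABILITY`, and `lowest_price` are all invented for illustration, and real fares carry many more rules than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fare:
    """A published fare: a price plus the rules a ticket must satisfy."""
    price: float
    airline: str
    route: tuple          # e.g. ("JFK", "ORD")
    booking_class: str    # e.g. "Q"


# Hypothetical tariff: three fares for the same route on the same airline.
TARIFF = [
    Fare(450.0, "XX", ("JFK", "ORD"), "Y"),
    Fare(180.0, "XX", ("JFK", "ORD"), "Q"),
    Fare(99.0, "XX", ("JFK", "ORD"), "Z"),
]

# The airline's answer, as of this moment, for one flight on one date.
AVAILABILITY = {"Y": 9, "Q": 3, "Z": 0}


def lowest_price(tariff, availability, airline, route, seats):
    """Return the cheapest fare whose rules are met and whose booking
    class can currently be confirmed for `seats` passengers, or None."""
    candidates = [
        fare for fare in tariff
        if fare.airline == airline
        and fare.route == route
        and availability.get(fare.booking_class, 0) >= seats
    ]
    return min(candidates, key=lambda fare: fare.price, default=None)
```

The cheapest fare in the tariff (the "Z" fare) is never returned here, because the airline won't confirm any seats in Z class. The same tariff gives a different answer the moment the availability changes.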
To a skilled human travel agent, this looks like a heuristic sequential query problem, not a database search problem. If such an agent is being paid enough to make their best effort, they look first at the tariff of published fares (typically accessed through a CRS using a complex query language with, at least from the command line, many categories of modifiers and qualifiers). They pick the lowest of the fares in the tariff that will be applicable and for which they think (based on knowledge, experience, and practiced intuition) that there will be availability for flights (airlines, dates, times, route, etc.) acceptable to the traveller, and then search for availability on those flights in the booking class(es) required by that fare. If they can't confirm reservations that qualify for that fare on an acceptable schedule, they go back to the fares (adjusting their expectations based on what they have found), and search for availability for the next higher potentially acceptable fare for which they hope to find qualifying seats available.
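The agent's procedure can be sketched roughly as a loop, under heavy simplification. Everything here is hypothetical: `agent_search` is my own name, `query_availability` stands in for a live CRS availability query, and the agent's judgment about which fares are worth trying is reduced to the list of fares passed in.

```python
def agent_search(fares, query_availability, seats=1):
    """Try fares from cheapest up, querying availability only as needed.

    fares: list of (price, booking_class) pairs the agent judges applicable.
    query_availability: callable(booking_class) -> seats confirmable now;
        a stand-in for a live CRS query.
    Returns (price, booking_class, queries_made), or None if no fare on
    the list can be confirmed.
    """
    queries = 0
    for price, booking_class in sorted(fares):
        queries += 1  # each step is one live availability query
        if query_availability(booking_class) >= seats:
            return price, booking_class, queries
    return None
```

The point of the sketch is that the number of live queries depends on how good the agent's first guesses are. Nothing is looked up in a stored table of prices.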
(In practice, travel agents less and less often actually go through this process, for a variety of reasons including the elimination of commissions paid by airlines to travel agents, the reluctance of travellers to pay travel agents fees commensurate with the required skills, the degradation of the tools and training made available to travel agents by the CRS's, and the replacement of command-line travel agent CRS interfaces with easy-to-learn but functionally crippled GUI's. But that's another story.)
Central to the break-up of ITA Software's founding partnership, as I discussed yesterday in Part 1, was the decision to abandon any effort to replicate this methodology, and instead to seek a "brute force" solution to airline ticket pricing.
For what it's worth, this didn't have to be a binary choice. Just as some chess-playing programs combine heuristic and brute-force components, or work in partnership with human chess players, the other major recent independent developer of airline ticket pricing software and systems, Airtreks.com -- where I used to work and with whom I am still affiliated -- uses an intermediate "travel consultant cyborg" approach in which some functions are performed by human experts and some by robots, in a complex symbiosis. Airtreks.com has invested almost as much effort in developing proprietary software tools to enhance and extend the abilities of its human experts as in its purely robotic first-order price-estimation software.
But supposing that you want to take an entirely brute force rather than heuristic approach to airline ticket pricing, how do you go about it in the absence of a database of price tags for seats? ITA Software's "solution" was to use a series of availability queries to create a database of pseudo-price tags for pseudo-seats. Once that was done, the problem remained difficult mainly because of its scale and the number of permutations to be considered (again, as with "look-ahead" brute-force chess analysis), but amenable to ITA Software's signature "cleverness" in algorithms and software implementation.
So the essence of ITA Software's system (all the elements of which are visible in their patents and patent applications) is:
- A 'bot that queries airlines for availability, flight by flight, mainly through CRS's although in some cases through direct connections to airlines' in-house reservation systems, to compile a cache of availability information. This process has been described to me by ITA Software CEO Jeremy Wertheimer, and in ITA Software's patents and pending patent applications. (I see Wertheimer each year at the PhoCusWright conference, and I've pressed him on how often a new query is made to update the cache for each flight. Wertheimer won't say, but it appears to be measured in hours for most flights, probably less for some of the flights of greatest interest in the next few days or weeks, and perhaps as infrequently as daily or less for some flights in other parts of the world of little interest to ITA Software's core customer base in the USA.)
- A database and index of the cached responses to these availability queries.
- A search module that responds to user queries with guesses about current availability made on the basis of that index and cache, without the need to query any external data sources unless and until the user tries to confirm reservations on specific flights on the basis of an option offered from the cache.
Through the clever kludge of the CRS crawler and availability cache, ITA Software transforms a real-time problem of third-party queries into a simpler search of a locally resident and already indexed database.
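Here is a toy sketch of those three elements in Python. It is not ITA Software's implementation, which I haven't seen; the class and method names, the single fixed refresh interval, and the `live_query` callable (standing in for a paid CRS or airline query) are all my own assumptions.

```python
import time


class AvailabilityCache:
    """Toy availability cache: a crawler fills it, searches read from it,
    and only a confirmation goes back to the external source."""

    def __init__(self, live_query, max_age_seconds, clock=time.time):
        self._live_query = live_query    # stand-in for a paid CRS query
        self._max_age = max_age_seconds  # hypothetical refresh interval
        self._clock = clock
        self._cache = {}                 # flight -> (timestamp, {class: seats})
        self.live_queries = 0            # how many external queries were made

    def _fetch(self, flight):
        """Make one external query and store a copy of the response."""
        self.live_queries += 1
        response = dict(self._live_query(flight))
        self._cache[flight] = (self._clock(), response)
        return response

    def crawl(self, flights):
        """Crawler: refresh entries that are missing or older than max_age."""
        now = self._clock()
        for flight in flights:
            entry = self._cache.get(flight)
            if entry is None or now - entry[0] > self._max_age:
                self._fetch(flight)

    def search(self, flight, booking_class):
        """Search: answer from the cache only. The result is a guess about
        current availability, or None if the flight was never crawled."""
        entry = self._cache.get(flight)
        return None if entry is None else entry[1].get(booking_class, 0)

    def confirm(self, flight, booking_class, seats=1):
        """Confirmation: only now is the external source queried."""
        return self._fetch(flight).get(booking_class, 0) >= seats
```

If the last Q seat is sold between one crawl and the next, `search` will go on reporting Q seats until the entry is refreshed, and the traveller finds out otherwise only when `confirm` makes a live query.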
[This description is, of necessity, somewhat simplified, but I fear that greater detail would render it incomprehensible to anyone outside the industry.]
What's perhaps most obvious about this methodology is how closely analogous it is to Google's approach to the problem of "searching" constantly-changing Web sites not stored on Google's servers. Rather than try to query potentially responsive Web pages in real time in response to user search requests, Google conducts a periodic "crawl" of HTTP queries of third-party pages, constructs a "cached" database of responses, indexes that cache database, and searches the index -- not the cache and certainly not the Web itself -- in response to each user query. Only when you click through the search results to the Web site do you see the current page content or find out if it is still the same. It's clear why the approach adopted by ITA Software would seem particularly logical and appropriate to Google's engineers. ITA Software's key problem is also like Google's: How do you index dynamically-generated or personalized Web pages, or real-time dynamic responses to availability queries?
Eliminating real-time availability queries except for customers who have already agreed to a price estimate for specific flights, and are ready to make reservations, saves money for ITA Software and its customers, who are airlines and travel agents -- travellers aren't its customers. Travel agents -- including ITA Software's online travel agency customers -- are charged a fraction of a cent for each query or command they execute from the command line. Human travel agents can't execute commands fast enough for these charges to justify fundamental changes in their procedures, but the charges can be prohibitive for an online travel agency with a high "look to book" ratio, making rapid-fire robotic queries on behalf of comparison shoppers only a small percentage of whom complete purchases.
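A back-of-the-envelope calculation shows why. The function below is only arithmetic, and every number you might plug into it is a guess: I don't know the actual per-query fees, the number of queries behind each search, or any particular agency's look-to-book ratio.

```python
def query_cost_per_booking(fee_per_query, queries_per_search, looks_per_book):
    """Query fees paid, on average, for each ticket actually sold.

    fee_per_query: charge per availability query, in dollars.
    queries_per_search: live queries needed to answer one search.
    looks_per_book: searches made for every completed purchase.
    """
    return fee_per_query * queries_per_search * looks_per_book
```

With made-up figures of half a cent per query and 20 queries per search, a human agent who makes 3 searches per ticket sold pays about 30 cents in query fees per booking. A Web site with 500 looks per book would pay about US$50 per booking, which is more than the profit on many tickets.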
ITA Software doesn't (yet) host any of the airlines' reservation databases or operate their availability-decision systems. [Update: Several readers have pointed out that this may no longer be entirely true, depending on the manner in which ITA Software's "Dynamic Availability Calculating System" (DACS) has been deployed and is being used as a replacement for, rather than merely an emulator or mirror of, airlines' "legacy" availability management systems.] Reservation database hosting and availability management are either outsourced to CRS's (by most airlines) or handled in-house. Like other CRS users, ITA Software pays per-query fees to compile or update its cached pseudo-availability database. That has several consequences:
- There are enormous economies of scale and barriers to entry for a would-be competitor using the same methodology, since the same number and cost of queries is required to build the pseudo-availability cache regardless of how many people are using the system. It's unclear if the whole concept would have been commercially viable without a launch customer for ITA Software with the sales volume of Orbitz.com.
- ITA Software has a substantial financial incentive to query availability as infrequently as it thinks it can get away with, exploring the limits of consumers' willingness to put up with seemingly "bait and switch" results when what appears to be an offer to sell a ticket turns out to be only an estimate based on an outdated availability cache or incorrect availability projections from responses to past queries. (As an aside, one of the things the USA Department of Transportation has yet to address in its failure to enforce truth-in-advertising law in the sale of airline tickets is the misleading labeling of price and availability estimates as though they were firm offers to sell at a specific price.)
- ITA Software has an even greater financial incentive to eliminate these CRS and airline query fees entirely by developing an airline reservations hosting and availability decision-making ("revenue management") capability of its own, and wooing airlines away from existing CRS's or airlines' in-house systems. Currently, ITA Software uses a bombardment of individual queries to try to assemble an inevitably-imperfect copy, constantly being rendered out-of-date, of each airline's willingness to confirm reservations on each flight in each possible booking class. If ITA Software were hosting or operating that system itself for a particular airline, that entire process would be unnecessary.
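The first two of these consequences come down to simple arithmetic, sketched below. The function names and any figures used with them are hypothetical; the only thing taken from the description above is that the cost of building the cache doesn't depend on how many people search it.

```python
def daily_crawl_cost(flights, refreshes_per_day, fee_per_query):
    """Daily cost of keeping the cache fresh, in dollars.
    Note that the number of users does not appear anywhere."""
    return flights * refreshes_per_day * fee_per_query


def crawl_cost_per_search(flights, refreshes_per_day, fee_per_query,
                          searches_per_day):
    """The same fixed cost spread over however many searches are served."""
    total = daily_crawl_cost(flights, refreshes_per_day, fee_per_query)
    return total / searches_per_day
```

With invented figures of 100,000 flights refreshed 6 times a day at half a cent per query, the crawl costs about US$3,000 a day whether it serves 10,000 searches (about 30 cents each) or a million (about a third of a cent each). Cutting the refresh rate in half cuts the bill in half, at the price of a staler cache.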
This last point is perhaps the most significant: ITA Software's technical approach to the airline ticket pricing problem has created a particular compulsion -- independent of any interest in the CRS, airline hosting, or revenue management problems (not that they aren't all interesting and hard) or belief that they could build a better CRS or hosting platform -- for vertical integration with airline hosting and revenue/availability management.
While ITA Software might see its availability caching and prediction systems as "clever", a critic might see them as a kludge to adapt database search techniques to an information ecosystem in which ITA Software has only indirect query-based access to (constantly changing) third-party databases. But both ITA Software and its critics would likely agree that the ultimate solution for ITA Software lies in vertical integration with hosting/CRS functionality, to give ITA Software direct (non query-based) access to real-time availability for hosted airlines. The analogy for a search provider like Google would be Web sites hosted by Google, which Google doesn't need to query and "crawl" because it already has them on its servers.
When ITA Software got US$100 million in 2006 in its last round of venture capital investment before the sale to Google, its main use of the money was to try to develop its own airline hosting system to eliminate the need for pseudo-availability caching. ITA Software claims it now has an airline reservation hosting system ready to launch. But Air Canada, which was to be the launch customer, backed out, and ITA Software hasn't yet found any other airline willing to risk its operations and revenue stream to beta test a new provider for the most critical component of its IT infrastructure.
Now Google has stepped in, US$700 million in cash in hand, to buy ITA Software. What will happen next? And what will this deal mean for travellers? Stay tuned for Part 3 and my conclusions tomorrow.
Posted by Edward on Tuesday, 13 July 2010, 15:15 (3:15 PM)