Below is an interview with Werner Vogels, CTO@AWS.
http://www.se-radio.net/2006/12/episode-40-interview-werner-vogels/ (my local copy).
First some quotes:
“Ultimately developers don't care about SOAP and REST, as long as the data is delivered. You don't need to buy into some big revolutionary approach” - Werner Vogels
"If a problem has no solution, it may not be a problem, but a fact, not to be solved, but to be coped with over time"
— Shimon Peres ("Peres’s Law")
Some notes from the interview
Systems become heterogeneous over time
As systems grow, hardware becomes more heterogeneous, not necessarily because of a change of vendors, but because faster machines with more RAM are introduced. Software must not be designed for homogeneous hardware.
Probability of failure
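One way to picture software that does not assume identical machines is a scheduler that hands out work in proportion to each machine's capacity. The following is only a minimal sketch: the function name `split_by_capacity`, the machine names, and the capacity scores are hypothetical and do not come from the interview.

```python
def split_by_capacity(total_tasks: int, capacities: dict[str, int]) -> dict[str, int]:
    """Assign tasks in proportion to each machine's relative capacity."""
    total_capacity = sum(capacities.values())
    # Integer share per machine, rounded down.
    shares = {m: total_tasks * c // total_capacity for m, c in capacities.items()}
    # Rounding down can leave a few tasks unassigned; give one each
    # to the largest machines first.
    leftover = total_tasks - sum(shares.values())
    for machine in sorted(capacities, key=capacities.get, reverse=True)[:leftover]:
        shares[machine] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical fleet: newer machines carry a higher capacity score.
    print(split_by_capacity(100, {"old-box": 1, "new-box": 3}))
```

An evenly split workload would leave the newer machine underused; weighting by capacity keeps both machines busy for about the same time.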
Larger systems fail more often, so resilience is important. Automation is important in minimizing downtime. The aim is self-healing systems, and things like fast reboots for when things inevitably go down. At Amazon scale stuff fails, so deal with it. Focus on things that help you recover fast instead of only trying to avoid failure. This is also the message from the Google High Replication Datastore team (see video).
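The arithmetic behind both points can be sketched in a few lines. This is an illustration under simplifying assumptions, not something stated in the interview: machines are assumed to fail independently, and the failure probability and the MTBF/MTTR figures are made-up numbers.

```python
def p_any_failure(p_single: float, n_machines: int) -> float:
    """Probability that at least one of n independent machines fails."""
    return 1.0 - (1.0 - p_single) ** n_machines


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime as a fraction of total time."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Assumed: each machine has a 0.1% chance of failing on a given day.
    for n in (1, 100, 10_000):
        print(f"{n:>6} machines: P(at least one failure) = {p_any_failure(0.001, n):.4f}")

    # Same time between failures, but ten times faster recovery.
    print(f"1 hour to recover:    {availability(1000, 1.0):.5f}")
    print(f"6 minutes to recover: {availability(1000, 0.1):.5f}")
```

With enough machines, a failure somewhere becomes close to certain, so the only number left to improve is recovery time.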
Non-growth in staff
Larger systems must not require linear growth in personnel. Common tasks must be increasingly automated, so that the same number of people who administered 100 machines can also administer 1000 machines. Realistically, two people should be able to administer thousands of machines.
"Automation doesn’t remove human influence. It shifts the burden from operator to designer. Designers are human too, and make mistakes." - J. Reason, Human Error, Cambridge University Press, 1990
Unit of work
For a system to scale, the unit of work must become cheaper as the system scales up. For example, maintenance of individual machines in a data center, component replacement, etc. must become more efficient.
Energy consumption
Energy is an economic issue. As much as 50%-60% of power may be lost in electrical components, so investment in energy-efficient hardware is ultimately important. Amazon does not use commodity hardware like Google does, which means that hardware can be designed to be more power efficient.
"No need for big revolutions in distributed systems" - Werner Vogels
SOAP and REST
Amazon wants to build an open community. The philosophy is to create an infrastructure of open services that give value to customers, so that an ecosystem forms around it.
In a given request for a page, one hundred services may be invoked. At Amazon, data can only be accessed via services, even internally.
Werner Vogels's view on web services is that it's not hard to send some XML over the wire, so there is no need to treat it as something complicated. Amazon offers both SOAP (30% of traffic) and REST (70% of traffic) services externally. SOAP is generally used by Java/.NET developers, while REST is generally used by PHP/Perl developers. Ultimately developers don't care, as long as the data is delivered. You don't need to buy into some big revolutionary approach. Internally, services are described in WSDL (the interview was conducted in 2006).
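To make the "same data, two wrappers" point concrete, here is a minimal sketch that builds one call in both styles using only the Python standard library. The endpoint `service.example.com`, the operation `GetItem`, and the parameter `ItemId` are hypothetical placeholders, not Amazon's actual API; nothing is sent over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
BASE_URL = "https://service.example.com/catalog"  # hypothetical endpoint


def rest_request(operation: str, params: dict) -> str:
    """REST style: the operation and its parameters travel in the URL."""
    return f"{BASE_URL}?{urlencode({'Operation': operation, **params})}"


def soap_request(operation: str, params: dict) -> str:
    """SOAP style: the same call wrapped in an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical operation and parameter names.
    print(rest_request("GetItem", {"ItemId": "12345"}))
    print(soap_request("GetItem", {"ItemId": "12345"}))
```

Either way the service receives an operation name and an item identifier; the difference is packaging, which fits the observation that developers mostly care that the data arrives.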
Enterprise state management
In enterprise development everything is centered around state management. The operations themselves are most often stateless, except for the Amazon checkout process, which is inherently stateful.
Tools
The right set of tools provides the foundation of a robust SOA. Tools exist that can manage the build process, deploy code and monitor resources. Tools in the deployment process ensure that dependencies are resolved. Werner hints that these tools are very big!