Load balancing is a technique used to distribute workloads
uniformly across servers or other compute resources to optimize network
efficiency, reliability and capacity.
Load balancing is
performed by an appliance -- either physical or virtual -- that identifies in
real time which server in a pool can best meet a given client request, while
ensuring heavy network traffic doesn't unduly overwhelm a single server.
In addition to maximizing
network capacity and performance, load balancing provides failover. If one
server fails, a load balancer immediately redirects its workloads to a backup
server, thus mitigating the impact on end users.
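As a rough illustration of that failover path, the Python sketch below routes each request to the first server that passes a health probe. It is a minimal sketch, not any vendor's implementation: the server names and the is_healthy() check are hypothetical placeholders for a real probe, such as a TCP connect or an HTTP request to a health endpoint.

```python
# Minimal failover sketch: route each request to the first healthy
# server in the pool, falling back to a backup when the primary
# fails its probe. Names and the probe are hypothetical placeholders.

def is_healthy(server: str) -> bool:
    # Stand-in for a real health probe (e.g., a TCP connect or an
    # HTTP GET against a /health endpoint on the server).
    return server != "primary.example.com"  # pretend the primary is down

def route(request: str, pool: list[str]) -> str:
    for server in pool:
        if is_healthy(server):
            return f"{request} -> {server}"
    raise RuntimeError("no healthy servers in pool")

pool = ["primary.example.com", "backup.example.com"]
print(route("GET /index.html", pool))  # -> backup.example.com
```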
Load balancing is usually
categorized as supporting either Layer 4 or Layer 7. Layer 4 load balancers
distribute traffic based on transport data, such as IP addresses and
Transmission Control Protocol (TCP) port numbers. Layer 7 load-balancing
devices make routing decisions based on application-level characteristics that
include HTTP header information and the actual contents of the message, such as
URLs and cookies.
Layer 7 load balancers are more common, but Layer 4 load balancers remain popular, particularly in edge deployments.
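The distinction comes down to which data each layer can inspect, as the Python sketch below shows. It is illustrative only: the pools, the path-based rule and the hashing scheme are assumptions for the example, not any product's behavior.

```python
# Contrast: a Layer 4 decision sees only transport data (IP address,
# port), while a Layer 7 decision can inspect the HTTP request itself.
# Pools, paths and the hash scheme here are illustrative assumptions.
import hashlib

POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
API_POOL = ["10.0.1.1", "10.0.1.2"]

def pick_layer4(client_ip: str, client_port: int) -> str:
    # Layer 4: hash transport data (source IP and port) onto the pool.
    key = f"{client_ip}:{client_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return POOL[digest % len(POOL)]

def pick_layer7(http_path: str, cookies: dict[str, str]) -> str:
    # Layer 7: route on application content, e.g., URL path or cookie.
    if http_path.startswith("/api/"):
        return API_POOL[0] if cookies.get("beta") else API_POOL[1]
    return POOL[0]

print(pick_layer4("203.0.113.7", 54321))
print(pick_layer7("/api/v1/users", {"beta": "1"}))
```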
How load balancing works
Load balancers handle incoming requests from users for
information and other services. They sit between the servers that handle those
requests and the internet. Once a request is received, the load balancer first
determines which server in a pool is available and online and then routes the
request to that server. During times of heavy load, a load balancer can
dynamically add servers in response to spikes in traffic. Conversely, it can
drop servers when demand is low.
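That elastic behavior can be sketched in a few lines. The sketch below assumes per-server request-rate thresholds; the server names, thresholds and the provisioning steps are hypothetical, and a real deployment would call a cloud or orchestration API instead.

```python
# Sketch of elastic pool sizing: add a server when average load per
# server crosses a high-water mark, drop one when it falls below a
# low-water mark. Thresholds and provisioning here are hypothetical.

HIGH_WATER = 100   # requests/sec per server before scaling out
LOW_WATER = 20     # requests/sec per server before scaling in
MIN_SERVERS = 2    # always keep a failover pair

def resize_pool(pool: list[str], total_rps: float) -> list[str]:
    per_server = total_rps / len(pool)
    if per_server > HIGH_WATER:
        pool = pool + [f"server-{len(pool) + 1}"]   # provision (stubbed)
    elif per_server < LOW_WATER and len(pool) > MIN_SERVERS:
        pool = pool[:-1]                            # decommission (stubbed)
    return pool

pool = ["server-1", "server-2"]
pool = resize_pool(pool, total_rps=450.0)  # traffic spike
print(pool)  # ['server-1', 'server-2', 'server-3']
```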
A load balancer can be a physical appliance, a software instance
or a combination of both. Traditionally, vendors have loaded proprietary
software onto dedicated hardware and sold the combination to users as stand-alone
appliances -- usually in pairs, so one can provide failover if the other goes
down. Growing networks then require purchasing additional or bigger appliances.
In contrast, software load balancing runs on virtual machines (VMs) or white box servers, most likely as a function of an application delivery controller (ADC). ADCs typically offer additional features, such as caching, compression and traffic shaping. Popular in cloud environments, virtual load balancing offers a high degree of flexibility -- for example, enabling users to automatically scale capacity up or down to match traffic spikes or decreased network activity.
Load-balancing methods
Load-balancing algorithms determine
which servers receive specific incoming client requests. Standard methods are
as follows:
The hash-based approach calculates a given client's preferred server based on designated keys, such as HTTP headers or IP address information. This method supports session persistence, or stickiness, which benefits applications that rely on user-specific stored state information, such as checkout carts on e-commerce sites.
The least-connections method favors servers with the fewest ongoing transactions -- that is, the least busy servers.
The least-time algorithm considers both server response times and active connections -- sending new requests to the fastest servers with the fewest open requests.
The round robin method -- historically, the load-balancing default -- simply cycles through a list of available servers in sequential order; several of these methods are sketched in code after the list.
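The sketch below illustrates the round robin, least-connections and hash-based methods in Python. The pool names, connection counts and client key are made-up inputs; a production balancer derives this state from live traffic and health checks.

```python
# Sketches of three standard selection methods. Pool state and client
# keys are made-up inputs, not data from a real balancer.
import hashlib
from itertools import cycle

POOL = ["app-1", "app-2", "app-3"]

# Round robin: cycle through the pool in sequential order.
_rr = cycle(POOL)
def round_robin() -> str:
    return next(_rr)

# Least connections: favor the server with the fewest open transactions.
def least_connections(open_conns: dict[str, int]) -> str:
    return min(open_conns, key=open_conns.get)

# Hash-based: map a client key (e.g., source IP) to a stable server,
# which yields session persistence ("stickiness").
def hash_based(client_key: str) -> str:
    digest = int(hashlib.sha256(client_key.encode()).hexdigest(), 16)
    return POOL[digest % len(POOL)]

print(round_robin(), round_robin(), round_robin())  # app-1 app-2 app-3
print(least_connections({"app-1": 12, "app-2": 3, "app-3": 9}))  # app-2
print(hash_based("198.51.100.4"))  # same server every time for this key
```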
Algorithms can vary significantly in sophistication and
complexity. Weighted load-balancing algorithms, for example, also take into
account server hierarchies -- with preferred, high-capacity servers receiving
more traffic than those assigned lower weights.
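Weighting can be layered onto round robin, as in the minimal sketch below. The server names and weights are arbitrary examples, not recommended values.

```python
# Weighted round robin sketch: a server with weight 3 receives three
# slots per cycle for every one slot of a weight-1 server. Names and
# weights here are arbitrary examples.
from itertools import cycle

WEIGHTS = {"big-box": 3, "small-box-1": 1, "small-box-2": 1}

# Expand the pool so each server appears once per unit of weight.
schedule = cycle([s for server, w in WEIGHTS.items() for s in [server] * w])

for _ in range(5):
    print(next(schedule))
# big-box, big-box, big-box, small-box-1, small-box-2
```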