Amortized Analysis Example, CPSC 331, Winter 2012

Amortized Analysis of Operations on Dynamic Arrays

Overview

During the lectures on basic data structures it was noted that various operations on Dynamic Arrays can be implemented to have constant “amortized time.” In other words, even though the cost of a single operation might be greater than constant, these expensive operations are infrequent, so that, for any positive integer n, the sum of the costs of the first n of these operations is in O(n), and, therefore, the average cost of these operations is in O(1) in the worst case. In particular, Java’s online documentation of the ArrayList class promises that this is true for implementations of the add operation, which adds a new element onto the end of the array, without saying how this is done.

The goal of this document is to describe one implementation of the add operation that meets this performance goal and to provide a proof of this. While amortized analysis is not generally discussed in this course in any detail, this result is important enough to be included (at least, as recommended reading). Furthermore, all the other claims about the amortized cost of operations on data structures that will be made, later on in this course, can be proved by modifying the algorithm and the analysis that are given here.

Students who find this to be interesting and who wish to read more about “amortized analysis” should consult Chapter 17 of Cormen, Leiserson, Rivest and Stein’s “Introduction to Algorithms.” The accounting method for amortized analysis, which is used in the following analysis, is described there.

Several of the results that are presented below can be proved using one proof technique that is discussed in MATH 271 or 273, namely, mathematical induction. You should be able to use mathematical induction to prove these results.

Operations, Their Implementations, and Their Real Costs

Java’s online documentation of the ArrayList class indicates that this structure is implemented using a static array. Indeed, the third paragraph of the current version of this documentation is as follows.

“Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time.”

For most of this document, we will consider the behaviour of an instance of ArrayList that is constructed and used in the following way.

The ArrayList is created using the default constructor (with no arguments), so that its initial capacity is ten.
A sequence of n additional operations is performed, for some positive integer n. Each of these operations is one of the following.
- An isEmpty test, which returns true if and only if the ArrayList does not include any elements.
- An application of size, to report the number of elements that are currently in this ArrayList.
- An application of set, which replaces the element at a specified position with a specified element.
- An application of get, which returns the element at a specified position.
- An application of add to increase the size of this ArrayList by appending a specified element onto the end of it.
Suppose that exactly m of these operations are applications of add.

We will assume (as is perfectly reasonable) that, at any point in time, our ArrayList is being implemented using some static array whose length is the current capacity of our ArrayList. For each nonnegative integer i that is less than the current size of the ArrayList, the element at position i of the ArrayList is stored at position i of the static array.

We may also assume that, for each integer i that is greater than or equal to the current size and less than the current capacity, the element at position i of the static array is the default value for the base type of the ArrayList.

The capacity is now easily computed, as the length of the underlying static array. The current size should be maintained as a separate instance variable whose initial value is 0 and whose value is incremented every time an add is performed.

It should be easy to see that if the ArrayList is represented in this way then the operations isEmpty, size, set and get can each be implemented so that they use at most a constant number of steps in the worst case. We will assume, for the rest of this analysis, that each isEmpty, size, set or get operation uses at most c_E steps in the worst case (where, here, the subscript E stands for “Easy”).

Under these circumstances it is easy to bound the total cost of all isEmpty, size, set or get operations in the sequence of operations we are analyzing. Since there are n operations in total and exactly m of these are not isEmpty, size, set or get, there are exactly n−m such operations, and it is clear that these use at most (n−m)c_E steps, in total, in the worst case.

Now consider an add operation. It is clearly always the case that size≤capacity; two cases must now be considered.

size < capacity
size = capacity

If size < capacity then there is room, in the static array currently being used to represent this instance of ArrayList, to include the element that is to be added: We simply write this into position size of the array and increment the value of size. This operation can be carried out using a constant number of steps; suppose therefore that at most c_N steps are used to perform an add operation when size<capacity (here, the subscript N stands for “Normal”).

On the other hand, if size = capacity, so that the currently used static array is already full, then it is necessary to create a new, larger array, copy the initial size entries of the old array into the corresponding positions of the new one, write the given new element into position size of this new array, and ensure that the remaining positions of the new array are filled with the default value for the base type.

It is not clear how the length of the new array (that is, the new capacity of this ArrayList) should be related to the old one, except for the fact that the new capacity must be larger than the old one.

Exercise: It is reasonable to assume that no more than one value can be copied in a single step, so that at least k steps must be used by the above operation, where k is the old value of capacity. Using the above assumption, show that it is a very bad idea just to add one to the capacity during this operation, if you want the amortized cost of the add to be small. In particular, show that if n = m and n ≥ 20 (that is, every operation is an add operation and at least twenty operations are performed), then the total number of steps needed for this sequence of operations is at least n²/6, so that the amortized cost is in Ω(n).
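Before attempting the proof, the claim in this exercise can be checked numerically. The following Java sketch counts only the values copied (one step per copied value, as assumed in the exercise) when every operation is an add and the capacity grows by one each time the array is full. The class and method names are hypothetical, and the initial capacity of ten is the default mentioned above. This is a sanity check, not a substitute for the proof.

```java
// Hypothetical helper: counts copy steps only, assuming one step per copied value.
class GrowByOneCost {
    // Number of values copied while performing n add operations, starting
    // from the given initial capacity and growing the capacity by one when full.
    static long copySteps(int n, int initialCapacity) {
        long steps = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {  // the array is full, so it must be replaced
                steps += capacity;   // every old entry is copied
                capacity += 1;       // the "bad" policy: grow by one
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 20; n <= 20480; n *= 4) {
            long bound = (long) n * n / 6;
            System.out.println("n = " + n + ": copy steps = " + copySteps(n, 10)
                    + ", n^2/6 = " + bound);
        }
    }
}
```

In each row printed, the number of copy steps should be at least the value of n²/6 shown beside it.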

For the rest of this analysis we will consider the case that the capacity is doubled every time an add operation causes the capacity to be increased. Since the new capacity is linear in the old one it is reasonable to assume that the number of steps used by this operation is at most c_Xk, where k is the older (smaller) capacity and c_X is some positive constant (you can think of the subscript X as standing for “eXtra”).
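The representation and the doubling policy described above can be sketched in Java as follows. This is a minimal illustration under the assumptions of this document, not the source of java.util.ArrayList (whose growth policy is not specified); the class name SimpleDynamicArray and its members are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a dynamic array with capacity doubling.
class SimpleDynamicArray<E> {
    private Object[] data = new Object[10]; // static array; its length is the capacity
    private int size = 0;                   // number of elements currently stored

    int size() { return size; }
    boolean isEmpty() { return size == 0; }
    int capacity() { return data.length; }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }

    @SuppressWarnings("unchecked")
    E get(int i) {
        checkIndex(i);
        return (E) data[i];
    }

    void set(int i, E element) {
        checkIndex(i);
        data[i] = element;
    }

    void add(E element) {
        if (size == data.length) {
            // Expensive case: copy the old entries into an array of twice the
            // capacity. Arrays.copyOf pads the remaining positions with null.
            data = Arrays.copyOf(data, 2 * data.length);
        }
        // Normal case (and the end of the expensive case): constant work.
        data[size] = element;
        size++;
    }

    public static void main(String[] args) {
        SimpleDynamicArray<Integer> a = new SimpleDynamicArray<>();
        for (int i = 0; i < 11; i++) {
            a.add(i);
        }
        System.out.println("size = " + a.size() + ", capacity = " + a.capacity());
    }
}
```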

The Accounting Method for Amortized Analysis

In the accounting method for amortized analysis we associate a virtual cost, or “payment,” to each operation. We also keep the balance of a bank account, or “pool.”

Ideally the virtual cost is only slightly higher than the real cost of most operations, and much lower than the cost of the unusually expensive operations.

The initial balance in the bank account is 0.

Every time we carry out an operation we will also modify the balance of the bank account:

If the real cost of the operation r (that is, the number of steps used by it) is less than the virtual cost v of the operation, then we deposit the difference, v − r, into the bank account, increasing its balance by this amount.
If the real cost of the operation r is greater than the virtual cost v, then we withdraw the difference, r − v, from the bank account, decreasing its balance by this amount.
Finally, if the real cost of the operation r is equal to the virtual cost v, then the bank account is not accessed and its balance is not changed.

In other words, the bank account’s balance is increased by exactly v − r (which is positive, negative, and zero, respectively, in the above three cases) when each operation is performed.
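The bookkeeping just described is small enough to write down directly. The following Java sketch keeps running totals of the real and virtual costs together with the balance; the class name AmortizedLedger and its fields are hypothetical, and the costs passed in main are arbitrary illustrative numbers.

```java
// Hypothetical bookkeeping for the accounting method.
class AmortizedLedger {
    long totalReal = 0;            // sum of the real costs of the operations so far
    long totalVirtual = 0;         // sum of the virtual costs of the operations so far
    long balance = 0;              // bank account balance; initially 0
    boolean everOverdrawn = false; // becomes true if the balance ever drops below 0

    // Records one operation with real cost r and virtual cost v.
    void record(long r, long v) {
        totalReal += r;
        totalVirtual += v;
        balance += v - r; // a deposit if v > r, a withdrawal if v < r, no change if equal
        if (balance < 0) {
            everOverdrawn = true;
        }
    }

    public static void main(String[] args) {
        AmortizedLedger ledger = new AmortizedLedger();
        ledger.record(1, 3); // cheap operation: deposit 2
        ledger.record(5, 3); // expensive operation: withdraw 2
        ledger.record(3, 3); // real cost equals virtual cost: no change
        System.out.println("real = " + ledger.totalReal
                + ", virtual = " + ledger.totalVirtual
                + ", balance = " + ledger.balance
                + ", overdrawn = " + ledger.everOverdrawn);
    }
}
```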

Exercise: Let n be a nonnegative integer, and suppose that

C_n is the sum of the real costs of the first n operations that are performed,
V_n is the sum of the virtual costs of the first n operations that are performed, and
B_n is the balance of the bank account immediately after the first n operations that are performed.

Prove that

V_n = C_n + B_n

for every integer n ≥ 0.

Notice that this implies the following: If our bank account is never “overdrawn” (that is, if it is always true that B_n ≥ 0) then it follows from the above that

C_n ≤ V_n ≤ n × v_max

where v_max is the maximum of the virtual costs of any one of the first n operations.

Since the amortized cost of the first n operations is C_n/ n (by definition), this implies that the amortized cost of the first n operations is at most v_max, the maximum of the virtual costs of any of these operations.
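Written as a single chain, using only the identity V_n = C_n + B_n and the assumption that B_n ≥ 0, this argument reads as follows.

```latex
\[
\frac{C_n}{n} \;=\; \frac{V_n - B_n}{n}
\;\le\; \frac{V_n}{n}
\;\le\; \frac{n \cdot v_{\max}}{n}
\;=\; v_{\max}
\qquad \text{for every integer } n \ge 1 .
\]
```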

The “accounting method” is based on this. It consists of the following three steps, which we will carry out below.

Define the virtual cost for each operation.
Prove that the bank account, described above, is never overdrawn — that is, prove that B_n ≥ 0, for every integer n ≥ 0, when the bank account is used as described above.
Conclude that the maximum of the virtual costs of any single operation is an upper bound for the amortized cost of the sequence of operations being studied.

Application of This Method

Virtual Costs of Operations

Virtual costs of operations can now be defined as follows.

The virtual cost of any isEmpty, size, set, or get operation will be set to be c_E, that is, the same as the upper bound for the real cost of each of these operations.
The virtual cost of any add operation before the first time that the capacity is increased is

c_N + c_X

but the virtual cost increases, slightly, after that: The virtual cost of any add operation after the first time that the capacity is increased (as well as that of the operation that causes this first increase in capacity) is

c_N + 2 c_X

instead.

Showing That the Bank Account is Never Overdrawn

The following result will be very helpful.

Claim 1: Let n and m be nonnegative integers such that

0 ≤ m ≤ n

and consider any sequence of n operations that include exactly m add operations.

The size of the ArrayList is never more than m during this sequence of operations, and the size is equal to m at the end of this sequence.

Exercise: Use mathematical induction to prove this. (You might actually need to use two proofs by induction, with one inside the other — probably using mathematical induction on n for the main proof, but including a proof using induction on m at some point, too.)

Suppose, first, that m is less than or equal to 10 or, more generally, the initial capacity of the ArrayList. Notice that the bank account’s balance is unchanged by any operation except for an add operation, because the real and virtual costs of these other operations are the same. On the other hand, it should be clear that the bank account’s balance is increased by (at least) c_X every time an add operation is performed. The following result is now easily established.

Claim 2: If m is a nonnegative integer such that m ≤ 10 (or, more generally, if m is less than or equal to the initial capacity of the ArrayList), and a sequence of operations including exactly m add operations is performed, then the bank account balance is greater than or equal to zero after each operation, and the bank account balance is at least c_Xm at the end of the sequence.

Exercise: Prove this (using mathematical induction).

It follows from the above that the bank account balance is greater than or equal to 10c_X (or, more generally, the product of c_X and the initial capacity) immediately before the first add operation that causes the capacity to be increased. Thus the bank account balance is greater than or equal to the real cost of this add operation just before the operation and, since the virtual cost of this operation is c_N + 2c_X, the bank account balance is (only) guaranteed to be greater than or equal to 2c_X immediately after this operation.

If the initial capacity is the default, 10, then the size is 11 and the new capacity is 20. More generally, if the initial capacity is a positive integer k, then the size of the ArrayList just after the first increase in capacity is k+1, and the new capacity is 2k. In each case, the new size and the new capacity satisfy the following equation:

2 × size − capacity = 2 > 0.

Thus the bank account balance is greater than or equal to

(2 × size − capacity) × c_X

at this point.

Indeed, it is possible to prove (by induction on m) that the following is true as well.

Claim 3: If m is a positive integer such that m is greater than the initial capacity of the ArrayList, then the following properties hold when a sequence of operations, including exactly m add operations, is performed.

The balance of the bank account is greater than or equal to 0 after every operation that is performed.
The final size and the final capacity satisfy the following inequality:

2 × size − capacity > 0;
The balance of the bank account at the end of this sequence of operations is greater than or equal to

(2 × size − capacity) × c_X.
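Claim 3 can be checked experimentally before it is proved. The Java sketch below simulates a sequence consisting only of add operations (the other operations leave the balance unchanged) and tests the three properties of the claim after every operation. The values chosen for c_N and c_X are arbitrary illustrative assumptions, each real cost is taken to be equal to its upper bound, and the class and method names are hypothetical.

```java
// Hypothetical simulation of the accounting argument for capacity doubling.
class DoublingAccounting {
    static final long C_N = 3; // assumed cost bound for a "normal" add
    static final long C_X = 2; // assumed cost per copied entry for an expensive add

    // Simulates m add operations and reports whether the balance stays
    // nonnegative and, once the capacity has grown, whether
    // 2*size - capacity > 0 and balance >= (2*size - capacity) * C_X.
    static boolean invariantHolds(int m, int initialCapacity) {
        long balance = 0;
        int size = 0;
        int capacity = initialCapacity;
        boolean grown = false;
        for (int i = 0; i < m; i++) {
            long realCost;
            if (size == capacity) {
                realCost = C_X * capacity; // real cost is at most c_X * (old capacity)
                capacity *= 2;
                grown = true;
            } else {
                realCost = C_N;
            }
            // The operation causing the first increase already has the higher virtual cost.
            long virtualCost = grown ? C_N + 2 * C_X : C_N + C_X;
            size++;
            balance += virtualCost - realCost;
            if (balance < 0) {
                return false;
            }
            if (grown) {
                long slack = 2L * size - capacity;
                if (slack <= 0 || balance < slack * C_X) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(invariantHolds(100000, 10));
    }
}
```

A simulation of this kind can only fail to find a counterexample; the proof by induction on m is still needed.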

Conclusion

Since the bank account is never overdrawn it follows that the amortized cost of any sequence of operations is at most the maximum of the virtual costs of each operation. In this case this is

max(c_E, c_N + 2c_X) ∈ O(1).

On the other hand, each operation requires at least one step, so that the sum of the costs of the first n operations must be at least n, implying that the amortized cost is at least 1 ∈ Ω(1).

It follows from the above that the amortized cost of any sequence of operations on an ArrayList (whose initial size is 0) is in Θ(1).

Cost of Additional Operations

As indicated in Java’s online documentation of the ArrayList class, this class provides numerous additional methods. Not all of these are constant-time operations on average (let alone, in the worst case) so it is not reasonable to expect that the amortized cost of a sequence of operations that includes these could be in O(1). That noted, the real and amortized costs of a few of these operations are described below.

An Operation to remove an Element

The ArrayList includes an operation to remove an element. One version of this operation removes an element at a given position, shifting any elements of the ArrayList that are to the right of this position over by one to the left. Another version of this operation can be used to remove the first occurrence of a given element in the ArrayList (shifting other elements over again, as needed).

Before analyzing these versions we will consider a simpler version of a remove operation that is not described in the online documentation of an ArrayList, namely a remove operation that simply removes the rightmost element of the ArrayList, replacing it by the default value of the basic type of the ArrayList and decrementing the value of size by one (this operation should throw an appropriate exception, instead, if it is called when the ArrayList is empty).

It should be clear that the number of steps needed to carry out this simpler version of the operation is in Θ(1); indeed, we will use the constant c_N (that bounded the cost of an “inexpensive” add operation) as an upper bound for the real cost of this version of the remove operation (increasing the value of this constant if necessary).
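A minimal Java sketch of this simpler remove operation follows. The class name RemovableArray, the method name removeLast, the choice of NoSuchElementException as the “appropriate exception,” and the decision to return the removed element are all assumptions made for illustration; none of them comes from the ArrayList documentation.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Hypothetical sketch of a dynamic array with a "simple remove" operation.
class RemovableArray<E> {
    private Object[] data = new Object[10];
    private int size = 0;

    int size() { return size; }
    int capacity() { return data.length; }

    void add(E element) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * data.length); // capacity doubling
        }
        data[size++] = element;
    }

    // Removes the rightmost element using a constant number of steps.
    // The capacity is never reduced by this operation.
    @SuppressWarnings("unchecked")
    E removeLast() {
        if (size == 0) {
            throw new NoSuchElementException("the array is empty");
        }
        size--;
        E removed = (E) data[size];
        data[size] = null; // restore the default value in the vacated position
        return removed;
    }

    public static void main(String[] args) {
        RemovableArray<String> r = new RemovableArray<>();
        r.add("a");
        r.add("b");
        System.out.println(r.removeLast() + ", size = " + r.size());
    }
}
```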

Suppose that we use c_N as the virtual cost of this version of the remove operation as well. Then, in the worst case, each of these operations will leave the bank account’s balance unchanged: it is possible that such an operation might increase the bank account’s balance, but it will never decrease it.

A problem, though, is that this “simple remove” operation changes the size of the ArrayList, so that Claims 1, 2, and 3 do not hold if sequences that include remove operations are considered. These claims must be replaced by Claims 1a, 2a, and 3a, respectively, where these are as follows.

Claim 1a: Let n and m be nonnegative integers such that 0 ≤ m ≤ n, and consider any sequence of n isEmpty, size, get, add, and simple remove operations that includes exactly m add operations. The size of the ArrayList is never more than m during this sequence of operations.

Note that the above claim does not assert that the size is equal to m at the end of the sequence.

Claim 2a: Consider an execution of a sequence of n operations (of the type considered in Claim 1a) that includes exactly m add operations. Suppose that the capacity of the ArrayList is not increased by any of the operations in this sequence. Then the bank account’s balance is nonnegative after every operation in the sequence and it is greater than or equal to c_Xm at the end of the sequence.

Claim 3a: Consider an execution of a sequence of n operations (of the type considered in Claim 1a) that includes exactly m add operations. Suppose that the capacity of the ArrayList is increased by some operation in this sequence. Then the bank account’s balance is greater than or equal to

max ((2 × size − capacity) × c_X, 0)

after the first operation that increases the capacity of the ArrayList and after every operation that follows it.

Exercise: Prove the above claims using mathematical induction.

Since the bank account’s balance is always nonnegative and the virtual cost of every operation is still bounded as above, it follows that the amortized cost of sequences of operations that include simple remove operations is in Θ(1) as well.

Operations To Search for Elements

A Java ArrayList also supports an indexOf operation that returns the first index in the ArrayList of a given element, returning −1 if the element is not found.

The worst-case cost of such an operation is the same as its worst-case cost if it is performed on a static array: The number of steps used in the worst case, to carry out this operation, is at most

c₁ × size + c₀

for positive constants c₀ and c₁. It follows by the above claims that this cannot be more than c₁n+c₀. (Indeed, it cannot be this high, since the size of the ArrayList cannot be more than n−1, if the sequence of operations being considered includes an indexOf operation at all.)
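The linear bound comes from a single scan of the occupied part of the static array. A sketch of such a scan in Java is shown below; the class name LinearSearch and the explicit data and size parameters are hypothetical, chosen so that the example stands on its own.

```java
import java.util.Objects;

// Hypothetical sketch of a linear search over the first `size` entries of an array.
class LinearSearch {
    // Returns the first index i < size with data[i] equal to target, or -1 if none.
    // The loop body runs at most `size` times, giving a bound of the form c1 * size + c0.
    static int indexOf(Object[] data, int size, Object target) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(data[i], target)) { // null-safe comparison
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Object[] data = {"x", "y", "x", null};
        System.out.println(indexOf(data, 3, "y"));
        System.out.println(indexOf(data, 3, "z"));
    }
}
```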

In order to perform an amortized analysis for a sequence of operations that may include all the operations that have been discussed before this, as well as indexOf operations, it is important to know how many indexOf operations have been included.

Suppose, therefore, that we are considering a sequence of n operations (of the types we have considered, so far) that includes exactly m add operations and exactly k indexOf operations, where k, m, and n are nonnegative integers such that k+m ≤ n.

Let us define the virtual cost of an indexOf operation to be the same as (our upper bound for) its real cost, that is, c₁×size + c₀. Then, proving modified versions of claims 1a, 2a, and 3a (that also allow indexOf to be used as an operation) is straightforward — virtually no change to the proofs would be needed.

In this case we can see that the total of the virtual costs of all the operations except for the indexOf operations is at most c_S×(n−k), where

c_S = max(c_E, c_N + 2c_X) ∈ O(1).

On the other hand, the total of the virtual costs of all of the indexOf operations is at most c₁nk + c₀k.

The total virtual cost — which is an upper bound for the total real cost — of all operations is, therefore,

c₁nk + c₀k + c_S(n−k) ≤ c₁nk + c_M n,

where c_M = max(c₀, c_S).

Since the amortized cost is the ratio of the total cost and n it follows that the amortized cost is at most

c₁k + c_M ∈ O(k+1).
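The bound on the total virtual cost, and the division by n, can be spelled out as follows. The only facts used are that c₀ ≤ c_M and c_S ≤ c_M (by the definition of c_M) and that 0 ≤ k ≤ n.

```latex
\begin{align*}
c_1 n k + c_0 k + c_S (n - k)
  &\le c_1 n k + c_M k + c_M (n - k) \\
  &=   c_1 n k + c_M n , \\[4pt]
\frac{c_1 n k + c_M n}{n}
  &=   c_1 k + c_M .
\end{align*}
```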

Exercise: Show that, for all sufficiently large integers n and all nonnegative integers k such that k < n/2, there is a sequence of n operations, including k indexOf operations, whose total cost is at least d₁nk + d₀n for positive constants d₀ and d₁ (that do not depend on n or k).

Use this to argue that the amortized cost of such a sequence is in Θ(k+1), in the worst case.

More General add and remove Operations

Java’s ArrayList also includes an add operation that takes an integer index as input, along with an element, and inserts this element into position index (moving elements over by one to the right as needed) or throws an exception if the given index is negative, or greater than the current size of the ArrayList.

Note that it is easy to check whether an exception should be thrown (since the size of the ArrayList is easy to obtain), and the entire operation can be performed using only constant time if an exception does need to be thrown. The virtual cost of this operation should be the same as a constant upper bound for the (worst case) real cost of the operation, in the case that an exception is thrown.

Now consider the cost of the operation when the index is in range and the element should be inserted. One way to analyze the costs of sequences, that may now include this operation, is to break this add operation into two operations (with one carried out after the other), instead:

The given element is included using a simple add operation of the type that we have already considered;
The new element is moved forward by repeatedly exchanging it with the element in front of it, until its position is as given by the input index.

The first of these operations is one of the ones that we have already considered. The second is a linear time operation that can be handled in exactly the same way as an indexOf operation, above: Set its virtual cost to be the same as an upper bound for its worst case cost.
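The decomposition can be illustrated with a short Java sketch that builds an indexed insertion out of a simple add followed by repeated exchanges. The class and method names are hypothetical. This shows the two-step decomposition used in the analysis and is not a description of how java.util.ArrayList implements add(index, element).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: indexed insertion as a simple add plus a linear-time shift.
class InsertByDecomposition {
    static <E> void addAt(List<E> list, int index, E element) {
        if (index < 0 || index > list.size()) {
            // Constant-time check; no other work is done in this case.
            throw new IndexOutOfBoundsException("index " + index);
        }
        list.add(element); // first operation: a simple add onto the end
        // Second operation: exchange the new element with the element in
        // front of it until it reaches position `index`.
        for (int i = list.size() - 1; i > index; i--) {
            Collections.swap(list, i, i - 1);
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("d");
        addAt(list, 2, "c");
        System.out.println(list); // prints [a, b, c, d]
    }
}
```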

Java’s ArrayList also includes two versions of remove that allow you to specify either the index of the element to be removed, or the element itself. In the latter case, the first occurrence of the element in the ArrayList is to be removed.

Exercise: Express each of the above kinds of remove operations as a pair of operations, namely, an operation that can be performed using time that is linear in the size of the ArrayList (and that does not change the size), and a simple remove operation, that is, the kind of remove operation that we have analyzed already.

Now, once we have done this, we can consider a sequence of n operations (for a positive integer n) that includes exactly m add operations (possibly including some of the more general operations now being considered), and that includes exactly k operations that are either

searches, that is, indexOf operations
add operations for which integer indices are specified, or
remove operations, for which either the index or the element to be removed is specified.

We can no longer assume that m+k ≤ n, since an add operation for which an index is given is in both the set of size m and the set of size k that are listed above. However, it is still clear that m ≤ n and that k ≤ n.

After replacing our k complex operations with pairs of simpler ones, we have a sequence of n+k operations of the type we have considered before this, whose total cost is the same as the total cost of the sequence we started with.

Since k ≤ n, so that n+k ≤ 2n, we can use the above information to conclude that the amortized cost of our original sequence of n operations is in Θ(k+1), once again.

Adjusting the capacity

Java’s ArrayList includes a trimToSize operation, which reduces the capacity of the ArrayList to be the same as its current size.

It is not clear what the real cost of this should be: should we somehow include a charge to free up the space of the static array (of length capacity) that is now being discarded?

Since we are already counting the time needed to write default values whenever we allocate space, we will assume here that this is not necessary. Consequently we will assume that the cost of this is dominated by the cost needed to create a new, smaller static array than the one that is currently being used and to copy data into it. We will therefore assume that the cost of this operation is at most

c_{T, 1} × size + c_{T, 0}

for positive constants c_{T, 0} and c_{T, 1}.

This operation should, presumably, only be used when there is reason to think that the ArrayList is not going to get any larger: if an add operation is performed without a remove operation being performed first, then it is guaranteed that this will be an expensive add operation that causes the underlying static array to be expanded again.

Consequently the virtual cost of this trimToSize should be set to be something that is larger than its worst-case cost, namely,

(c_{T, 1} + c_X) × size + c_{T, 0}
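One way to read the extra c_X × size term: immediately after trimToSize the capacity equals the size, so 2 × size − capacity = size, and a deposit of c_X × size is what is needed for the balance to again be at least (2 × size − capacity) × c_X before the next (necessarily expensive) add. The Java sketch below shows the operation itself under the representation used in this document. The class name TrimmableArray is hypothetical, and the use of Math.max to handle a capacity of zero (after trimming an empty array) is an assumption that the analysis above does not discuss.

```java
import java.util.Arrays;

// Hypothetical sketch of a dynamic array that supports trimToSize.
class TrimmableArray {
    private Object[] data = new Object[10];
    private int size = 0;

    int size() { return size; }
    int capacity() { return data.length; }

    void add(Object element) {
        if (size == data.length) {
            // Capacity doubling; Math.max guards the capacity-0 case (an assumption).
            data = Arrays.copyOf(data, Math.max(1, 2 * data.length));
        }
        data[size++] = element;
    }

    // Replaces the static array by one whose length is exactly `size`,
    // copying the `size` stored entries: linear in the current size.
    void trimToSize() {
        if (size < data.length) {
            data = Arrays.copyOf(data, size);
        }
    }

    public static void main(String[] args) {
        TrimmableArray t = new TrimmableArray();
        t.add("a");
        t.add("b");
        t.add("c");
        t.trimToSize();
        System.out.println("capacity after trim = " + t.capacity());
        t.add("d"); // the array is full, so this add is an expensive one
        System.out.println("capacity after add = " + t.capacity());
    }
}
```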

Exercise: Using the information (including virtual costs) given above, show that the amortized cost of any sequence of n operations, including exactly m add operations and exactly k operations that are

searches,
add operations for which integer indices are specified,
remove operations, for which either the index or element to be removed is specified, or
applications of trimToSize

is in Θ(k+1).

Java’s ArrayList also includes operations that can be used to increase the capacity, including a second constructor that receives a positive integer (the desired initial capacity) and an ensureCapacity operation that can be used to increase the capacity of the current ArrayList.

Question: What is a realistic (and common) definition of the “size” of an input, when this input is a positive integer? Using this information, what can be concluded about the cost of these operations (that can be used to increase capacity)?

http://www.cpsc.ucalgary.ca/~jacobs/Courses/cpsc331/W12/handouts/lecture09-example.html