Topics - Searching and Sorting, CPSC 331, Winter 2007

Searching and Sorting

Synopsis

Searching and sorting are generally considered to be fundamental problems in computer science: A wide variety of applications require that these problems be solved, so that a huge amount of computing time is devoted to this. Furthermore, a study of searching and sorting can be used to introduce several interesting techniques for the design, implementation, and analysis of algorithms.

We will devote several weeks of this course to this topic.

Algorithms for Searching

We will begin this part of the course by studying algorithms for searching in unsorted and sorted arrays. Notes based on the lecture presentation of this topic are now available. A tutorial exercise on algorithms for searching is now available as well.

Algorithms for Sorting

We will continue by discussing algorithms to sort an array of size n.

Three classical algorithms — “Insertion Sort,” “Selection Sort,” and “Bubble Sort” — are well known and often studied in a data structures course. These all have the same worst-case performance — they each use Θ(n²) operations in the worst case. Notes based on the discussion of classical sorting algorithms in lectures are now available.

We will continue by studying several newer — and asymptotically faster — sorting algorithms.

“Merge Sort” is a recursive algorithm (using “Divide and Conquer”) that can be used to sort an array using Θ(n log n) operations. The lecture notes on Merge Sort are now available online.

A tutorial exercise on sorting, including the classical sorting algorithms and Merge Sort, is now available.

“Heap Sort” is another algorithm that can sort using Θ(n log n) operations. This uses a data structure — a “binary heap” — that is worthy of study in its own right. We will discuss heaps, the Heap Sort algorithm, and a generalization of a Queue called a “Priority Queue” that can be implemented using a heap.

Lecture notes on binary heaps, Heap Sort and priority queues are now available online. A tutorial exercise on heaps and Heap sort is now available as well.

Finally, another Divide and Conquer algorithm — “Quicksort” — will be studied. The worst-case performance of this algorithm is not particularly good — it uses Θ(n²) operations in the worst case, just like Insertion Sort, Selection Sort, or Bubble Sort. However, the expected performance of this algorithm (under a reasonably plausible assumption) is much better — the expected number of operations used is in Θ(n log n). There is also a “randomized” version of this algorithm whose running time depends on the choice of random values that it makes when it is run. The expected number of operations used by this algorithm “in the worst case” (that is, when run on any array of length n) is in Θ(n log n), provided that the entries of the input array are distinct

Lecture notes and a tutorial exercise on Quicksort are now available online.

Quicksort seems to be the fastest of these algorithms in practice, even though the deterministic version of the algorithm studied in this course uses Θ(n²) operations.

References

The “classical” sorting algorithms (Insertion Sort, Selection Sort, and Bubble Sort) do not have a section of the chapters on sorting devoted to them. However, Insertion Sort is presented and analyzed in Chapter 2 (“Getting Started”), and Bubble Sort is considered in an exercise at the end of this chapter.

The “Merge Sort” algorithm is treated in much the same way: A discussion of this algorithm can also be found in Chapter 2.

Heap Sort and Quicksort are the subjects of Chapters 6 and 7 of the text, respectively.

Expectations for Students

After studying this topic in this course (including attending lectures, completing assigned reading and working through the exercises), students should understand and be able to do the following.

Students should know the definition of each of the “Searching” and “Sorting” problems.
Students should be familiar with each of the search algorithms (including Linear Search in unsorted and in sorted arrays as well as Binary Search) and each of the sorting algorithms (Insertion Sort, Selection Sort, Bubble Sort, Merge Sort, Heap Sort and Quicksort) that have been presented. In particular,
- students should be able to apply these algorithms by hand on small inputs;
- students should be able to describe (in plain and simple English) how these algorithms work and give pseudocode for them;
- students should be able to write programs in Java — and, ideally, several other programming languages as well — that implement these algorithms.
- students should be able to use the searching and sorting programs included in the Java Collections framework to solve problems;
- students should understand the worst case performance of these algorithms, as well as the expected performance of Quicksort — and the assumptions about the inputs used to provide an average case analysis of this sorting algorithm.
Students should understand the definitions of the “Priority Queue” abstract data type and the “heap” data structure that can be used to implement it.

As part of this, students should understand operations on heaps and algorithms that implement these operations in the same way that they understand sorting algorithms (as described, immediately above).

Questions for Review

Searching

Describe each of the algorithms for searching that were presented in this course:
- two versions of Linear Search
- Binary Search
You should be able to describe how these algorithms work and — given sufficient time — give pseudocode for them as well.
All of the above algorithms were presented as algorithms that could be used to search in an array.

Which of the algorithms require the input to be sorted? What might happen if the algorithm is run on an unsorted array, instead?

Which, if any, of the above algorithms could be used to search in an unsorted linked list? Which could be used to search in a sorted linked list?
What is the worst case running time of each of the above algorithms?

Sorting

Three “classical” sorting algorithms (that each require Θ(n²) operations to sort an array of size n in the worst case) have been described:
- Insertion Sort
- Selection Sort
- Bubble Sort
How do each of these algorithms work — that is, what do they each do in order to sort an array? What are the relative strengths and weaknesses of these algorithms?
Describe an algorithm that can be used to merge two sorted arrays. This should produce a single sorted array, including all the entries of the original arrays, using a number of operations that is linear in the sum of the sizes of the input arrays.
Describe the use of the above algorithm to sort (that is, describe the “Merge Sort” algorithm). What is the worst-case running time of this algorithm?
What is a binary heap? How can it be represented using an array?

What kind of operations on these heaps — including operations to insert or delete elements — do these data structures support? How are these operations performed, and what is their worst-case cost?
What is a priority queue? Describe an application that could be solved using a priority queue.
Describe the “Heap Sort” algorithm. What is the number of operations used by this algorithm to sort an array of size n in the worst case?
Describe both the deterministic and the randomized versions of the “Quicksort” algorithm that were discussed in class.

How many operations are used by the deterministic Quicksort algorithm to sort an array of size n in the worst case?

What is the expected number of operations used by the deterministic Quicksort algorithm, if the elements of an input array are distinct and all n! relative orderings of values within the array are equally likely?

What is the “expected worst-case running time” of the randomized Quicksort algorithm, if the entries of its input array are distinct?

How do these algorithms behave when the input arrays include numerous copies of the same value? How can this behaviour be improved?

This page last modified:
http://www.cpsc.ucalgary.ca/~jacobs/cpsc331/W07/topics/searching_and_sorting.html

Topics, CPSC 331, Winter 2007