Basic Data Structures
Overview
This post will cover several data structures which are important for coding interview. Definitions, implementations, and common usages will be discussed. The following data structures are included:
Arrays and Strings
- Hash Tables
- ArrayList
- StringBuilder
- Linked Lists
Stacks and Queues
- Stack
- Queue
Trees and Graphs
- Tree
- Binary Heaps
- Tries
- Graph
1. Arrays and Strings
Hash Tables
A hash table is a data structure that maps keys to values and allows highly efficient lookup.
- Implementation 1 (e.g.
HashMap
): an array of linked lists. The key is hashed to compute the index in the array, and use a linked list of key-value pairs at this index (collision). Generally, a good implementation (low collision rate) hasO(1)
time complexity for insertion and lookup. - Implementation 2 (e.g.
TreeMap
): balanced binary search tree (BST). The time complexity isO(log N)
. The benefit is using less space compared to the implementation 1 (need a large array allocation), and enabling the iteration according to the key values.
ArrayList
Arrays are automatically resizable and grow when iterms are appended.
A typical implementation (e.g. ArrayList
) is that when the array is full, the array doubles in size. Although each doubling takes O(N)
time, the amortized insertion time is O(1)
.
StringBuilder
StringBuilder
in Java represents a mutable sequence of characters. It is more efficient to concatenate strings compared to String
. StringBuilder
simply creates a resizable array of all the strings, copying them back to a string only when necessary.
StringBuilder
class differs from the StringBuffer
class on the basis of synchronization. The StringBuilder
class provides no guarantee of synchronization whereas the StringBuffer
class does. Where possible, it is recommended that StringBuilder
be used in preference to StringBuffer
as it will be faster under most implementations.
2. Linked Lists
A linked list (LinkedList
) is a data structure that represents a sequence of nodes. It does not provide constant time access to a particular "index". However, the benefit is that you can add and remove items from the beginnin of the list in constant time, compared to a arraylist.
Common technique
- "Runner" technique (two pointers): use two pointers, with the "fast" pointer ahead of the "slow" pointer by a fixed amount, or it might be hopping multiple nodes for each one node that the "slow" pointer iterates through.
- Recursive technique
3. Stacks and Queues
Stack
A stack (Stack
) uses LIFO (last-in first-out) ordering. Unlike an array, a stack does not offer constant-time access to the i
th item. However, it does allow constant time adds and removes.
A stack can be implemented using a linked list structure while keeping track of the "top" stack node. Simply update this "top" node when doing pop()
and push()
operations.
A stack offers an intuitive way to implement recursive algorithms. You need to push temporary data onto a stack when you recurse, but then remove them as you backtrack. A stack can be used to implement a recursive algorithm iteratively.
Queue
A queue (Queue
) implements FIFO (first-in first-out) ordering.
A queue can be implemented with a linked list while keeping track of the "first" and "last" node. Both "first" and "last" nodes are needed to be considered for updates when performing offer
and poll
operations.
A queue can be helpful in breadth-first search (BFS) or in implementing a cache. In BFS, a queue holds a list of nodes to be processed (visited). Each time we process a node, we add its adjacent nodes to the back of the queue. This allows us to process nodes in the order in which they are viewed.
4. Trees and Graphs
Trees
Simply put, a tree is a connected graph without cycles. Note that for a connected graph without cycles, in theory, we can choose any node as the root.
Typically, a tree has one root node, which has zero or more child nodes. Each child node has zero or more child nodes. A node without childs is called a "leaf".
Binary Tree: each node has up to two children.
There are two types of binary tree of interests:
- Binary Search Tree: the value of the a node must be larger than values of its left subtree, and smaller than values of its right subtree.
- Balanced Binary Tree: the height of the two subtrees (the left subtree and the right subtree) of every node differs by at most one. Two common types are red-black trees and AVL trees.
Binary Tree Traversal
- In-Order Traversal: for each node, we always visit its left subtree, the current node, and finally its right subtree. When performed on a binary search tree, it the nodes in ascending order.
- Pre-Order Traversal: for each node, we always visit the current node, then its left subtree and finally its right subtree. It visits the root first.
- Post-Order Traversal: for each node, we always visit the left subtree, then right subtree and finally the current node. It visits the root last.
Refer to here for more operations on the binary tree.
Binary Heaps (Min-Heaps and Max-Heaps)
A min-heap (PriorityQueue
) is a binary tree where each node is smaller than its children. The root is the minimum element in the tree.
Operations:
- insert: we insert the new element at the bottom of the tree, and bubble up the new element by swapping it with its parent if its value is smaller than its parent. Time complexity:
O(log N)
. - extract minimum element: we remove the minimum element at the root and swap it with the last element in the heap (the bottommost, rightmost element), then bubble down this element by swapping it with one of its children. Note that we need to pick the smallest one among its children. Time complexity:
O(log N)
.
Tries (Prefix Trees)
A trie is a variant of an n-ary tree where characters are stored at each node. Each path down the tree may represent a word. The "null" node (or terminating node) is used to indicate a complete word.
A trie is used for quick prefix lookups, while a hash table is good at checking whether a string is a valid word (it cannot efficiently tell whether a string is a prefix of any valid words).
Graphs
A graph is simply a collection of nodes with edges between them. A graph can be directed or undirected.
Connected graph: there is a path between every pair of vertices.
Acyclic graph: a graph without cycles.
Representation (Implementation)
- Adjacency List: for each vertex (node), we maintain a list of its adjacent vertices. For an undirected graph, an edge would be stored twice. A customized class or an array of lists can be used to represent a graph.
- Adjacency Matrices: a
N * N
matrix (where N is the number of nodes), where atrue
(or1
) atmatrix[i][j]
indicates an edge from nodei
to nodej
. For an undirected graph, an adjacency matrix will be symmetric. Adjacency matrix is less efficient and takes more space compared to the adjacency list.
Graph Search
Depth-First Search (DFS): we start at the root and explore each branch completely before moving on to the next branch. DFS can be implemented with a recursive algorithm (recall the tree traversal). For a directed graph, DFS can be very helpful to find the ancestor or descendant of a node, which is useful for the topological sort.
Breadth-First Search (BFS): we start at the root, and explore each of its neighbor before going on to any of their children. BFS tends to stay close to the current node as long as possible. BFS is preferred if we want to find the shortest path between two nodes. BFS can be implemented with an interative algorithm with a queue holding nodes to be visited.
Note that when traversing a graph, we must check if a node has been visited.
The post is published under CC 4.0 License.