Finding the Closest Element in a Sorted Array

November 02, 2024

I recently worked through a leetcode-style question I found quite interesting. Here's the problem:

Given a sorted array of integers arr, determine which element of arr is closest to all other elements in arr. If there are several possible answers, output the smallest one.

The first thing I thought of reading the question was norms. I know that norms are used to determine the distance between vectors. The task does not compare vectors but integers, but the idea is the same - whatever number minimizes the $l_1$ norm is what I'm supposed to return.

Intuitively, I thought of the mean of the list - that's the value with the closest distance to all values in the list. However, the mean of the list is not necessarily an element of the list - if arr = [1, 2, 4], then the mean of arr is $(1+2+4)/3 = 2.33$ , a number that is not an element of arr.

Peeking Into The Maths

Searching the internet, I found a great discussion Mathematics StackExchange. User Royi delivers what I believe is a beautiful explanation: You can notate the problem as

\arg \min_{x} \sum_{i = 1}^{N} \left| {a}_{i} - x \right|,

where arr has $N$ elements, $a_i$ is the $i$ th element in arr, and $x$ is the number we search for. To find the extrema, I first form the derivative and then determine when the derivative equals 0. The derivative of the $l_1$ norm is the sign-function $\frac{d |x|}{dx} = sign(x)$ , a function that returns $1$ if $x>0$ , $-1$ if $x<0$ , and $0$ if $x=0$ . The extrema is reached when $\sum_{i = 1}^{N} \operatorname{sign} \left( {a}_{i} - x \right) \stackrel{!}{=} 0$ .

In other words, I search an $x$ in arr that divides all values in arr into two subsets: one consisting of values that are larger than $x$ , and the other consisting of values smaller than $x$ . If I found an $x$ for which both subsets have the same number of elements, I found the $x$ I'm looking for.

Writing A Solution

Let's see this in action with some examples. When arr = [1, 2, 3, 4, 5], then I can pick $x=3$ and form a subset of smaller values $\{1,2\}$ and a subset of larger values $\{4,5\}$ .

However, if arr contains an even number of elements, e.g. arr = [1, 2, 3, 4], then the solution is ambiguous: Neither $x=2$ nor $x=3$ separate the set into subsets of equal size! As a result, the sums of distances to other values in arr are equal, $x=2 \rightarrow 1+1+2 = 4$ and $x=3 \rightarrow 2+1+1 = 4$ .

With both values being possible answers to the question, the problem statement comes to the rescue:

[...]If there are several possible answers, output the smallest one.

Since arr is sorted, the solution in Python code is elegantly simple - just find the middle of the list:

def solution(arr):
    k = len(arr)
    if k % 2 == 1:  # odd
        mid = k // 2
    else:  # even
        mid = k // 2 - 1  # return smaller element
    return arr[mid]

Conclusion

The problem beautifully demonstrates how mathematical thinking can lead to surprisingly simple solutions in programming. What initially appeared to be a complex optimization problem of sums and distances was reduced to finding the middle of a sorted array.

By Philipp Jung, data engineer and machine learning researcher.