Reduce Array Size to The Half Solution In C++/Java/Python/Javascript
Problem Description
You are given an integer array arr. You can choose a set of integers and remove all the occurrences of these integers in the array. Return the minimum size of the set so that at least half of the integers of the array are removed.
Example
Input: arr = [3,3,3,3,5,5,5,2,2,7]
Output: 2
Explanation: Choosing {3,7} will make the new array [5,5,5,2,2] which has size 5 (i.e equal to half of the size of the old array).
Possible sets of size 2 are {3,5},{3,2},{5,2}.
Choosing set {2,7} is not possible as it will make the new array [3,3,3,3,5,5,5] which has a size greater than half of the size of the old array.
Input: arr = [7,7,7,7,7,7]
Output: 1
Explanation: The only possible set you can choose is {7}. This will make the new array empty.
Constraints
- 2 <= arr.length <= 10^5
- arr.length is even.
- 1 <= arr[i] <= 10^5
The goal is to remove the most frequent elements first to minimize distinct numbers efficiently. Like clearing a room, start with the biggest items for the quickest impact. Let’s explore!
Brute Force Approach
Intuition
Our goal is to reduce the size of an array to at most half its original length by removing the fewest number of elements. The key insight is that the most frequently occurring elements contribute the most to the array’s size, so removing them first will achieve the goal in the fewest steps.
To accomplish this, we first create a frequency map to count occurrences of each element. Sorting these counts in descending order allows us to prioritize removing the most frequent elements. When the size of the array is too large, removing elements one by one isn't feasible. Instead of randomly selecting elements, we systematically eliminate the ones that provide the biggest reduction in size.
Starting with the highest frequency elements, we remove them while keeping track of the total elements removed and the remaining array size. This process continues until the array size is reduced to at most half of its original length.
This approach ensures efficiency, especially for large arrays. By prioritizing the largest groups first, we reach the goal much faster and avoid unnecessary operations, making it a more optimal solution than a naive approach.
Reduce Array Size to The Half Algorithm
Step 1: Create Frequency Map
We start by creating a frequency map to count how many times each number appears in our array. This is crucial because we need to identify which elements appear most frequently. For each element in the array, we increment its count in our frequency map, giving us a clear picture of element distributions.
Step 2: Convert Frequencies to List
After counting frequencies, we extract just the frequency values into a list. We don't need to keep track of which number has which frequency anymore - we only care about the frequency values themselves. This simplifies our next step and makes sorting more efficient.
Step 3: Sort Frequencies
We sort the frequency list in descending order. This is a critical step because we want to process the most frequently occurring elements first. By sorting in descending order, we ensure we can remove the maximum number of elements with the minimum set size, which is exactly what the problem asks for.
Step 4: Track Removal Process
We maintain two important variables:
removed_count : Keeps track of how many total elements we've removed.
set_size : Counts how many unique numbers we've included in our set.
We also calculate our target, which is half the original array size (arr.size() / 2).
Step 5: Greedy Selection
We iterate through our sorted frequencies.For each frequency:
- Add the frequency to removed_count .
- Increment set_size by 1.
- Check if removed_count has reached or exceeded our target. This implements our greedy strategy of always taking the highest frequencies first.
Step 6: Generate Result
Once we've removed enough elements (reached our target), we return set_size. This represents the minimum number of unique elements needed to remove at least half of the array's elements.
Dry Run
Input: arr = {3, 3, 3, 3, 5, 5, 5, 2, 2, 7}
Step 1: Build Frequency Map
Scan array:
3 -> seen 4 times
5 -> seen 3 times
2 -> seen 2 times
7 -> seen 1 time
Frequency map: {3: 4, 5: 3, 2: 2, 7: 1}
Step 2: Store Frequencies in a Vector
Frequencies: [4, 3, 2, 1]
Step 3: Sort Frequencies in Descending Order
Sorted Frequencies: [4, 3, 2, 1]
Step 4: Reduce the Array Size
Initialize: removedCount = 0, setSize = 0, target = 5 (half of array size)
- First removal:
Remove all 4 occurrences of 3
removedCount = 4, setSize = 1
Array now has {5, 5, 5, 2, 2, 7} - Second removal:
Remove all 3 occurrences of 5
removedCount = 7, setSize = 2
Array now has {2, 2, 7}
Target achieved!
Output: Minimum set size is 2
Why This Works:
The frequency map accurately counts occurrences of each element.
Sorting the frequencies ensures we prioritize the most frequent elements.
Reducing the most frequent elements first quickly reduces the array size.
Code for All Languages
C++
// Approach: Reduce Array Size to Half
// Language: C++
class Solution {
public:
// This program calculates the minimum set size required to remove
// at least half the elements of the given array. It uses frequency
// counting, sorting, and greedy removal based on frequency.
int minSetSize(vector<int>& arr) {
// Step 1: Count the frequency of each element using an unordered_map
unordered_map<int, int> frequency;
for (int num : arr) {
frequency[num]++;
}
// Step 2: Store the frequencies in a vector
vector<int> freq_list;
for (const auto& entry : frequency) {
freq_list.push_back(entry.second);
}
// Step 3: Sort the frequencies in descending order
sort(freq_list.begin(), freq_list.end(), greater<int>());
// Step 4: Initialize variables to track progress
int removed_count = 0;
int set_size = 0;
int target = arr.size() / 2;
// Step 5: Remove elements with the highest frequency first
for (int freq : freq_list) {
removed_count += freq;
set_size++;
if (removed_count >= target) break; // Stop once half the array is removed
}
// Step 6: Return the minimum number of unique elements required
return set_size;
}
};
Java
// Approach: Reduce Array Size to Half
// Language: Java
import java.util.*;
class Solution {
// This program calculates the minimum set size required to remove
// at least half the elements of the given array. It uses frequency
// counting, sorting, and greedy removal based on frequency.
public int minSetSize(int[] arr) {
// Step 1: Create a frequency map to count occurrences of each element
Map<Integer, Integer> frequency = new HashMap<>();
for (int num : arr) {
frequency.put(num, frequency.getOrDefault(num, 0) + 1);
}
// Step 2: Extract frequency values and sort in descending order
List<Integer> freqList = new ArrayList<>(frequency.values());
freqList.sort(Collections.reverseOrder());
// Step 3: Remove elements greedily until at least half the array is removed
int removedCount = 0, setSize = 0, target = arr.length / 2;
for (int freq : freqList) {
removedCount += freq;
setSize++;
if (removedCount >= target) break;
}
// Step 4: Return the minimum set size
return setSize;
}
}
Python
# Approach: Reduce Array Size to Half
# Language: Python
from collections import Counter
class Solution(object):
# Function to find the minimum set size using sorting
def minSetSize(self, arr):
"""
:type arr: List[int]
:rtype: int
"""
# Count the frequency of each element
frequency = Counter(arr)
# Store the frequencies in a list
freq_list = list(frequency.values())
# Sort the frequencies in descending order
freq_list.sort(reverse=True)
# Variables to track the number of elements removed and the size of the set
removed_count, set_size = 0, 0
target = len(arr) // 2 # Target size: Half the original array size
# Remove elements with the highest frequency first
for freq in freq_list:
removed_count += freq
set_size += 1
if removed_count >= target:
break
# Return the minimum set size
return set_size
Javascript
/**
* Approach: Reduce Array Size to Half
* Language: JavaScript
*
* @param {number[]} arr - The input array of integers.
* @return {number} - The minimum set size required to remove at least half of the array.
*/
function minSetSize(arr) {
// Count the frequency of each element using a Map
const frequency = new Map();
for (const num of arr) {
frequency.set(num, (frequency.get(num) || 0) + 1);
}
// Store the frequencies in an array
const freqList = Array.from(frequency.values());
// Sort the frequencies in descending order
freqList.sort((a, b) => b - a);
// Variables to track the number of elements removed and the size of the set
let removedCount = 0, setSize = 0;
const target = Math.floor(arr.length / 2); // Target size: Half the original array size
// Remove elements with the highest frequency first
for (const freq of freqList) {
removedCount += freq;
setSize++;
if (removedCount >= target) break;
}
// Return the minimum set size
return setSize;
}
Time Complexity: O(n + k log k)
- Frequency Map Creation
— Traverse array once to build frequency map
— Each insertion in unordered_map takes O(1) average time
— Total time for frequency counting: O(n), where n is array size - Frequency Vector Creation
— Extract frequencies from map to vector
— Takes O(k) time where k is number of unique elements
— Each insertion into vector is O(1) - Sorting Frequencies
— Sort vector of k frequencies in descending order
— Uses C++ std::sort: O(k log k)
— k represents number of unique values in original array - Final Selection Process
— Single pass through sorted frequencies: O(k)
— Each iteration performs constant time operations - Overall Time Complexity
Total is O(n + k log k), combining frequency map creation O(n), vector creation O(k), and sorting O(k log k)
Space Complexity: O(n + k)
- Auxiliary Space Complexity: O(k)
— Frequency map stores k key-value pairs: O(k)
— Vector for sorted frequencies stores k elements: O(k)
— Counter variables (removed_count, set_size): O(1)
— Total auxiliary space: O(k) - Total Space Complexity
— Input array requires O(n) space
— Additional data structures need O(k) space
— k is always ≤ n (unique elements cannot exceed array size) - Final space complexity: O(n + k)
- Note: In worst case, all elements are unique (k = n). In best case, all elements are identical (k = 1).
So now, we’re thinking smarter: Instead of doing extra work, we’ll use a priority queue to keep pulling the most frequent elements until the job is done. This way, we save time and still get the right answer.
Optimal Approach
Intuition
Have you heard about the Heap data structure? It’s a specialized tree-based structure that’s super useful for a variety of applications.
So, Heap is a complete binary tree where every level, except possibly the last, is fully filled, and all nodes are as far left as possible. We have two main types of heaps:
Min Heap: The root node is always the smallest element in the heap.
Max Heap: The root node is always the largest element in the heap.
For our problem, a Max Heap is particularly useful because it allows us to efficiently access the element with the highest frequency without sorting the entire list.
So, let’s dive into how we can leverage a Max Heap to achieve our goal:
First, we need to understand the frequency of each element in the array. It’s like taking an inventory in a store to see how many of each item you have.
With this frequency data, we can shift our focus to the elements that show up the most. Just like if you’re trying to clear space in a crowded room, you’d start by moving the biggest piles of stuff first.
Here’s the clever part: Instead of sorting all the frequencies (which can be time-consuming), we use a Max Heap. A Max Heap acts like a super-organized assistant that always hands you the most needed item (in this case, the element with the highest frequency) right when you ask for it. This allows us to efficiently remove the most frequent elements without sorting the entire list.
By keeping a Max Heap, we save time because we don’t need to sort all the frequencies; we just get the top one as needed. The approach is intuitive because it aligns with our natural tendency to tackle the largest problem areas first.
In essence, by combining frequency counts with a Max Heap, we make our solution both faster and cleaner. It’s like having a streamlined process where everything falls into place smoothly.
Approach
Step 1: Initialize Frequency Map
We begin by creating an unordered_map called 'frequency' that will track how many times each element appears in the input array. We iterate through the array once, and for each element we encounter, we increment its count in our frequency map. This step is crucial for identifying which elements appear most frequently in our array.
Step 2: Create Max-Heap
Next, we create a priority queue (max-heap) to store the frequencies we collected. For each entry in our frequency map, we extract just the frequency value and push it into our max-heap. The max-heap automatically maintains these frequencies in descending order, ensuring that the highest frequency is always at the top. This ordering is essential for our greedy approach of removing the most frequent elements first.
Step 3: Initialize Tracking Variables
We set up three critical variables:
removed_count: Initialized to 0, tracks how many elements we've removed so far
set_size: Initialized to 0, keeps track of how many unique numbers we've included in our set.
target: Set to arr.size() / 2, represents how many elements we need to remove to achieve our goal.
Step 4: Removal Process
We enter a loop that continues until we've removed enough elements (removed_count >= target). In each iteration:
- Take the top (highest) frequency from the max-heap
- Add this frequency to removed_count
- Remove this frequency from the heap
- Increment set_size by 1.
Step 5: Return Result
After we've removed enough elements to meet or exceed our target, we return set_size. This value represents the minimum number of unique elements needed in our set to remove at least half of the original array's elements.
Dry Run
Input: arr = {3, 3, 3, 3, 5, 5, 5, 2, 2, 7}
Step 1: Build Frequency Map
- Scan array:
3 -> seen 4 times
5 -> seen 3 times
2 -> seen 2 times
7 -> seen 1 time
Frequency map: {3: 4, 5: 3, 2: 2, 7: 1}
Step 2: Push Frequencies into a Max-Heap
Frequencies: [4, 3, 2, 1]
Max-Heap: 4, 3, 2, 1
Step 3: Reduce the Array Size
Initialize: removedCount = 0, setSize = 0, target = 5 (half of array size)
- First removal:
Remove all 4 occurrences of 3
removedCount = 4, setSize = 1
Array now has {5, 5, 5, 2, 2, 7} - Second removal:
Remove all 3 occurrences of 5
removedCount = 7, setSize = 2
Array now has {2, 2, 7}
Output: Minimum set size is 2
Code for All Languages
C++
// Approach: Reduce Array Size to Half
// Language: C++
// This program calculates the minimum set size required to remove
// at least half the elements of the given array. It uses frequency
// counting and a max-heap (priority queue) to remove the most frequent
// elements first, ensuring the reduction is achieved efficiently.
class Solution {
public:
// Function to find the minimum set size using a max-heap
int minSetSize(vector<int>& arr) {
// Step 1: Count the frequency of each element
unordered_map<int, int> frequency;
for (int num : arr) {
frequency[num]++;
}
// Step 2: Push all frequencies into a max-heap
priority_queue<int> max_heap;
for (const auto& entry : frequency) {
max_heap.push(entry.second);
}
// Step 3: Initialize variables to track progress
int removed_count = 0;
int set_size = 0;
int target = arr.size() / 2;
// Step 4: Remove elements with the highest frequency first
while (removed_count < target) {
removed_count += max_heap.top();
max_heap.pop();
set_size++;
}
// Step 5: Return the minimum set size
return set_size;
}
};
Java
// Approach: Reduce Array Size to Half
// Language: Java
// This program calculates the minimum set size required to remove
// at least half the elements of the given array. It uses frequency
// counting and a max-heap (priority queue) to remove the most frequent
// elements first, ensuring the reduction is achieved efficiently.
import java.util.*;
class Solution {
// Function to find the minimum set size using a max-heap
public int minSetSize(int[] arr) {
// Step 1: Count the frequency of each element
Map<Integer, Integer> frequency = new HashMap<>();
for (int num : arr) {
frequency.put(num, frequency.getOrDefault(num, 0) + 1);
}
// Step 2: Push all frequencies into a max-heap
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
maxHeap.addAll(frequency.values());
// Step 3: Initialize variables to track progress
int removedCount = 0;
int setSize = 0;
int target = arr.length / 2;
// Step 4: Remove elements with the highest frequency first
while (removedCount < target) {
removedCount += maxHeap.poll();
setSize++;
}
// Step 5: Return the minimum set size
return setSize;
}
}
Python
# Approach: Reduce Array Size to Half
# Language: Python
from collections import Counter
import heapq
class Solution:
# Function to find the minimum set size using a max-heap
def minSetSize(self, arr):
"""
:type arr: List[int]
:rtype: int
"""
# Count the frequency of each element
frequency = Counter(arr)
# Push all frequencies into a max-heap (negative because heapq is a min-heap by default)
maxHeap = [-freq for freq in frequency.values()]
heapq.heapify(maxHeap)
# Variables to track the number of elements removed and the size of the set
removedCount = 0
setSize = 0
target = len(arr) // 2
# Remove elements with the highest frequency first
while removedCount < target:
# Remove the most frequent element
removedCount -= heapq.heappop(maxHeap)
# Increment the size of the set
setSize += 1
# Return the minimum set size
return setSize
Javascript
// Function to minimize set size using a custom max-heap
function minSetSize(arr) {
// Step 1: Count frequency of each element
const frequency = {};
arr.forEach(num => {
frequency[num] = (frequency[num] || 0) + 1;
});
// Step 2: Insert frequencies into a max-heap
const maxHeap = new MaxHeap();
for (const freq of Object.values(frequency)) {
maxHeap.push(freq);
}
// Step 3: Remove elements with the highest frequency first
let removedCount = 0, setSize = 0;
const target = arr.length / 2;
while (removedCount < target) {
removedCount += maxHeap.pop();
setSize++;
}
// Return the minimum set size
return setSize;
}
// Custom Max-Heap implementation
class MaxHeap {
constructor() {
this.heap = [];
}
// Insert a value into the heap
push(value) {
this.heap.push(value);
this._heapifyUp();
}
// Remove and return the maximum value (top of the heap)
pop() {
if (this.heap.length === 1) return this.heap.pop();
const max = this.heap[0];
this.heap[0] = this.heap.pop();
this._heapifyDown();
return max;
}
// Return the maximum value without removing it
peek() {
return this.heap.length > 0 ? this.heap[0] : null;
}
// Get heap size
size() {
return this.heap.length;
}
_heapifyUp() {
let index = this.heap.length - 1;
while (index > 0) {
let parentIndex = Math.floor((index - 1) / 2);
if (this.heap[parentIndex] >= this.heap[index]) break;
[this.heap[parentIndex], this.heap[index]] = [this.heap[index], this.heap[parentIndex]];
index = parentIndex;
}
}
_heapifyDown() {
let index = 0;
let length = this.heap.length;
while (true) {
let leftChild = 2 * index + 1;
let rightChild = 2 * index + 2;
let largest = index;
if (leftChild < length && this.heap[leftChild] > this.heap[largest]) {
largest = leftChild;
}
if (rightChild < length && this.heap[rightChild] > this.heap[largest]) {
largest = rightChild;
}
if (largest === index) break;
[this.heap[index], this.heap[largest]] = [this.heap[largest], this.heap[index]];
index = largest;
}
}
}
Time Complexity: O(n + k log k)
- Frequency Counting :
— We iterate through the array once to count frequencies.
— For each element, insertion into unordered_map takes O(1) average time.
— Total time for frequency counting: O(n), where n is array size - Sorting/Heap Creation:
— For heap approach: Creating max-heap from k frequencies takes O(k log k).
— For sorting approach: Sorting k frequencies takes O(k log k).
— k represents number of unique elements in array. - Final Selection:
— Iterating through sorted frequencies/heap takes O(k) in worst case
— Each heap operation (pop) takes O(log k)
— Total time for selection: O(k log k) - Overall Time Complexity
The total time complexity is O(n + k log k), combining frequency counting and sorting/heap operations.
Space Complexity: O(n + k)
- Auxiliary Space Complexity: O(k)
— Frequency map stores k key-value pairs: O(k)
— Vector/Heap for frequencies stores k elements: O(k)
— Additional variables (removed_count, set_size, target): O(1)
— Total auxiliary space: O(k) - Total Space Complexity
— Input array takes O(n) space
— Auxiliary data structures take O(k) space
— Since k ≤ n (number of unique elements can’t exceed array size)
— Total space complexity: O(n + k) - Note: In worst case, k = n when all elements are unique. In best case, k = 1 when all elements are same.
Learning Tip
Now we have successfully tackled this problem, let’s try these similar problems.
Given an array nums
and an integer p
, we need to form p
pairs such that the maximum absolute difference among all pairs is minimized. Each index can be used only once. Sorting the array helps, and we can use binary search with a greedy approach to efficiently find the optimal minimum maximum difference.
Given an integer n
, we can add or subtract powers of 2
to make it 0
. The goal is to find the minimum number of such operations, which can be efficiently solved using bit manipulation and greedy selection of closest powers of 2.