Question: How to find the subarray whose sum is closest to zero or to a given value t in O(n log n)

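This is actually problem #10 of chapter 8 of Programming Pearls, 2nd edition. It asks two questions: given an array A[] of integers (positive and non-positive), how can you find a contiguous subarray of A[] whose sum is closest to 0? Or closest to a given value t?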

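I can think of a way to solve the closest-to-0 problem. Compute the prefix sum array S[], where S[i] = A[0] + A[1] + ... + A[i]. Then sort S by element value, keeping the original index alongside each element. To find the subarray sum closest to 0, iterate over the sorted S, take the difference of each pair of adjacent values, and keep the minimum absolute difference.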
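The closest-to-0 idea described above can be sketched in Python as follows. This is only a sketch under a couple of assumptions that are not in the question: the function name `closest_to_zero` is made up, the input is assumed non-empty, and an extra empty prefix (sum 0) is added so that subarrays starting at index 0 are also considered.

```python
def closest_to_zero(a):
    """Return (start, end, total) for a non-empty a[start..end] whose sum is closest to 0."""
    # prefix[k] = (sum of the first k elements, k); k = 0 is the empty prefix
    prefix = [(0, 0)]
    running = 0
    for k, x in enumerate(a, start=1):
        running += x
        prefix.append((running, k))
    prefix.sort()  # by sum, ties broken by position

    best = None
    for (s1, k1), (s2, k2) in zip(prefix, prefix[1:]):
        gap = s2 - s1  # non-negative because prefix is sorted
        if best is None or gap < best[0]:
            lo, hi = min(k1, k2), max(k1, k2)
            # a[lo..hi-1] sums to +gap or -gap, depending on which prefix is longer
            total = gap if k2 > k1 else -gap
            best = (gap, lo, hi - 1, total)
    _, start, end, total = best
    return start, end, total
```

Sorting dominates, so the whole thing is O(n log n). Only adjacent entries of the sorted list need to be compared, because any non-adjacent pair is at least as far apart as some adjacent pair between them.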

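The question is: what is the best way to solve the second problem, closest to a given value t? Can anyone provide code, or at least an algorithm? (If someone has a better solution to the closest-to-zero problem, answers are welcome too.)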


2018-05-05 20:39


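Suppose I have a sorted array whose entries are colored red and black. How do I find the closest red-black pair? How does that solve your problem? - David Eisenstat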
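For what it's worth, the first half of this hint can be sketched in a few lines of Python. In a sorted array, the closest pair of differently colored entries is always adjacent, so one linear scan is enough. The function name and the `(value, color)` tuple layout are assumptions made for illustration. One possible reading of the hint (not stated by the commenter) is to color the prefix sums S[i] red and the shifted values S[j] + t black; note that this alone ignores the order of the original indices.

```python
def closest_red_black(items):
    """items: list of (value, color) sorted by value.

    Returns the closest pair of entries with different colors, or None.
    """
    best = None
    for left, right in zip(items, items[1:]):
        if left[1] != right[1]:
            gap = right[0] - left[0]  # non-negative because items is sorted
            if best is None or gap < best[0]:
                best = (gap, left, right)
    return None if best is None else (best[1], best[2])
```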

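In this context, does "subarray" mean consecutive array elements, or can it leave gaps? - MvG

@MvG: I don't have a copy of Bentley at hand, but I'm pretty sure he means contiguous elements. - Fred Foo

@DavidEisenstat I don't get the hint... the sorted array contains more than 2 distinct values, so how does that help? - Henley Chiu

@DavidEisenstat A more detailed description would be appreciated. - zoujyjs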

Answers:

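To solve this problem in O(n log n), you can build an interval tree yourself, or a balanced binary search tree, or simply take advantage of the STL map.

The following uses the STL map and its lower_bound().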

#include <map>
#include <iostream>
#include <algorithm>
using namespace std;

int A[] = {10,20,30,30,20,10,10,20};

// return (i, j) s.t. A[i] + ... + A[j] is nearest to value c
pair<int, int> nearest_to_c(int c, int n, int A[]) {
    map<int, int> bst;
    bst[0] = -1;
    // barriers
    bst[-int(1e9)] = -2;
    bst[int(1e9)] = n;

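    // ret = smallest |subarray sum - c| found so far; -1 means "none yet"
    // (starting from ret = c would reject every candidate when c = 0)
    int sum = 0, start = 0, end = 0, ret = -1;
    for (int i=0; i<n; ++i) {
            sum += A[i];
            // it->first >= sum-c, and with the minimal value in bst
            map<int, int>::iterator it = bst.lower_bound(sum - c);
            for (int k=0; k<2; ++k) {
                    // second pass: it->first < sum-c, and with the maximal value in bst
                    if (k == 1) --it;
                    // skip the two barrier entries
                    if (it->second < -1 || it->second >= n) continue;
                    int tmp = sum - c - it->first;
                    if (tmp < 0) tmp = -tmp;
                    if (ret < 0 || tmp < ret) {
                            ret = tmp;
                            start = it->second + 1;
                            end = i;
                    }
            }

            bst[sum] = i;
    }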
    return make_pair(start, end);
}

// demo
int main() {
    int c;
    cin >> c;
    pair<int, int> ans = nearest_to_c(c, 8, A);

    cout << ans.first << ' ' << ans.second << endl;
    return 0;
}

2017-07-29 10:14

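This is IMHO the correct solution. It needs more upvotes. Basically it walks the array, keeps a sorted history of prefix sums, and for the current sum finds the candidates in that history closest to sum - t. It is O(N log N) and single pass. - OnurC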
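The single-pass idea described in this comment can be sketched in Python with the standard `bisect` module. Treat it as an illustration rather than a port of the answer: the names `nearest_to_t`, `sums` and `ends` are made up, and because `bisect.insort` shifts list elements, each insertion costs O(n) in the worst case, so this sketch does not itself reach O(n log n). A balanced search tree, such as the `std::map` in the answer, is what provides that bound.

```python
import bisect

def nearest_to_t(a, t):
    """Return (start, end, total) for a non-empty a[start..end] whose sum is closest to t."""
    sums = [0]      # distinct prefix sums seen so far, kept sorted
    ends = {0: -1}  # prefix sum -> index of its last element (-1 = empty prefix)
    running = 0
    best = None
    for i, x in enumerate(a):
        running += x
        want = running - t  # the ideal earlier prefix sum
        pos = bisect.bisect_left(sums, want)
        for p in (pos - 1, pos):  # largest sum < want, smallest sum >= want
            if 0 <= p < len(sums):
                total = running - sums[p]
                if best is None or abs(total - t) < abs(best[2] - t):
                    best = (ends[sums[p]] + 1, i, total)
        if running not in ends:
            bisect.insort(sums, running)
        ends[running] = i  # keep the most recent index for this sum
    return best
```

Only prefix sums that end before the current position are in the history, so the index order is correct by construction and negative elements need no special handling.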

Yes, this is the correct solution. - Jingguo Yao

For c = 0, the demo returns random numbers for me - BlueTrin

Why don't we also consider the candidates closest to (sum + c)? - damluar

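You can adapt your approach. Assume you have an array S of prefix sums, as you wrote, already sorted in increasing order of sum value. The key idea is not only to examine consecutive prefix sums, but to use two pointers that indicate two positions in the array S. Written in (slightly pythonic) pseudocode: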

left = 0                   # Initialize window of length 0 ...
right = 0                  # ... at the beginning of the array
best = float("inf")        # Keep track of best solution so far
while right < len(S):      # Iterate until window reaches the end of the array
  diff = S[right] - S[left]
  if diff < t:             # Window is too small
    if left < right and t - diff < best:  # New best (non-empty) subarray
      best = t - diff
      # remember left and right as well
    right = right + 1      # Make window bigger
  else:                    # Window is too big
    if diff - t < best:    # We have a new best subarray
      best = diff - t
      # remember left and right as well
    left = left + 1        # Make window smaller

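The complexity is bounded by the sorting. The search above takes at most 2n = O(n) iterations of the loop, each with a computation time bounded by a constant. Note that the code above was conceived for positive t.

The code was conceived for positive elements in S, and positive t as well. If any negative integers crop up, you might end up in a situation where the original index of right is smaller than that of left. So you would end up with a subsequence sum of -t. You can check for this condition in the if … < best checks, but if you only suppress such cases there, I believe you might miss some relevant cases. The bottom line is: take this idea and think it through, but you will have to adapt it for negative numbers.

Note: I think this is the same general idea that Boris Strandjev wanted to express in his solution. However, I found that solution somewhat hard to read and harder to understand, so I am offering my own description of it.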

2018-05-06 15:18

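I think this is incorrect: first, as you mentioned, it does not handle negative values. And for all-positive values you do not need to precompute and sort the prefix sums. The positive-values subproblem can be solved with your algorithm, modified to keep a running sum between left and right and compare it against t. - OnurC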
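The positive-only variant mentioned in this comment could look roughly like the Python sketch below. It assumes every element is strictly positive, and the function name is made up. The window sum is maintained directly, so no prefix array or sorting is needed.

```python
def nearest_to_t_positive(a, t):
    """Sliding window over strictly positive a; returns (start, end, total) closest to t."""
    best = None
    left = 0
    window = 0  # sum of a[left..right]
    for right, x in enumerate(a):
        window += x
        while True:
            # record the current window if it beats the best one so far
            if best is None or abs(window - t) < abs(best[2] - t):
                best = (left, right, window)
            # shrink only while the window overshoots and would stay non-empty
            if window > t and left < right:
                window -= a[left]
                left += 1
            else:
                break
    return best
```

Each element enters and leaves the window at most once, so the scan is O(n). A left boundary that was dropped while the window exceeded t can never be useful again, since extending such a window to the right only moves its sum further above t.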

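@OnurC: True, for positive array elements an approach without sorted prefix sums works as well. I believe my approach might be easier to extend so that it also handles negative values. But this is more of a gut feeling; I haven't thought it through yet. In any case, while my code might be unnecessary for the positive case, I don't see it as incorrect. Do you? If so, can you provide an example where it breaks? - MvG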

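Your solution for the 0 case seems fine to me. Here is my solution to the second case:

You again compute the prefix sums and sort them.
You initialize the index start to 0 (the first index in the sorted prefix array) and end to last (the last index of the prefix array).
You iterate start over 0...last, and for each value you find the corresponding end: the last index for which prefix[start] + prefix[end] > t. Once you have found that end, the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1] (the latter only if end > 0).
Most importantly, you do not search for end from scratch for each start: prefix[start] increases as you iterate over the possible values of start, which means that in each iteration you are only interested in values <= the previous value of end.
You can stop iterating when start > end.
You take the best of the values obtained over all start positions.

It is easy to prove that this gives a complexity of O(n log n) for the entire algorithm.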

2018-05-06 09:07

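Since the overall complexity is O(n*log(n)) anyway, you could also use binary search to find end for a specific value of start. The linear algorithm is probably simpler to code, though :) - Niklas B.

Can you explain this part: "Once you have found that end, the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1]"? Say the sorted prefix sums are 1, 2, 50, 100, 1000, 10000, 100000 and t is 2. We start with prefix[0] + prefix[6], which is 1 + 100000 = 100001. The best solution, you are telling me, is this, or 1 + 10000? Isn't the best solution actually 1 + 2? - Henley Chiu

OK, I understand the above, except that I don't think it actually works if the original array has negative numbers. I also think your solution fails if t != 0, because you have to take into account where the 2 prefix sums end in the original array. If t = 100, then 200 - 100 is indeed 100, but 100 - 200 is very far from 100. It does not matter if t = 0, because +n and -n are the same distance from 0. - Henley Chiu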

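As a concrete example, say the original array is: 75, 25, -75, -25, 1. The prefix sum of the first 2 elements is 100, and the prefix sum of all elements is 1. Suppose t = 100.1 and you choose 1 and 100 as the best prefix sum pair. 1 - 100 = -99, which is nowhere near 100, unlike the other candidates. - Henley Chiu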
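A brute-force reference is handy for checking examples like this one. The Python sketch below (quadratic time, the function name is made up) simply tries every contiguous subarray. On the array from the comment with t = 100.1 it picks the first two elements, 75 and 25, whereas the prefix sums 1 and 100 taken in their original order correspond to the elements -75, -25, 1, which sum to -99.

```python
def brute_force_nearest(a, t):
    """Try every non-empty a[start..end]; return (start, end, total) closest to t."""
    best = None
    for start in range(len(a)):
        total = 0
        for end in range(start, len(a)):
            total += a[end]
            if best is None or abs(total - t) < abs(best[2] - t):
                best = (start, end, total)
    return best
```

Calling `brute_force_nearest([75, 25, -75, -25, 1], 100.1)` returns `(0, 1, 100)`.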

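My solution is similar to yours. I keep a HashMap that maps each sorted prefix sum to the index of the range it represents. Then, when comparing 2 prefix sums, you look at the indices first. So you compute PrefixSum[i] - PrefixSum[j], where the prefix sum of i covers a larger range than that of j. - Henley Chiu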

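I stumbled upon this problem by chance. Although it has been a while, I am posting my solution anyway. It is an O(n log n) time, O(n) space algorithm. Here is running Java code. Hope this helps people.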

import java.util.*;

public class FindSubarrayClosestToZero {

    void findSubarrayClosestToZero(int[] A) {
        int curSum = 0;
        List<Pair> list = new ArrayList<Pair>();

        // 1. create prefix array: curSum array
        // (0, -1) is the empty prefix, so subarrays starting at index 0 are covered
        list.add(new Pair(0, -1));
        for(int i = 0; i < A.length; i++) {
            curSum += A[i];
            Pair pair = new Pair(curSum, i);
            list.add(pair);
        }

        // 2. sort the prefix array by value
        Collections.sort(list, valueComparator);

        // printPairList(list);
        System.out.println();


        // 3. compute pair-wise value diff: Triple< diff, i, i+1>
        List<Triple> tList = new ArrayList<Triple>();
        for(int i=0; i < list.size()-1; i++) {
            Pair p1 = list.get(i);
            Pair p2 = list.get(i+1);
            int valueDiff = p2.value - p1.value;

            Triple triple = new Triple(valueDiff, p1.index, p2.index);
            tList.add(triple);
        }

        // printTripleList(tList);
        System.out.println();

        // 4. Sort by min diff
        Collections.sort(tList, valueDiffComparator);
        // printTripleList(tList);

        Triple res = tList.get(0);

        // prefix sums ending at i and j (i < j) differ by A[i+1] + ... + A[j]
        int startIndex = Math.min(res.index1, res.index2) + 1;
        int endIndex = Math.max(res.index1, res.index2);

        System.out.println("\n\nThe subarray whose sum is closest to 0 is: ");
        for(int i= startIndex; i<=endIndex; i++) {
            System.out.print(" " + A[i]);
        }
    }

    class Pair {
        int value;
        int index;

        public Pair(int value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    class Triple {
        int valueDiff;
        int index1;
        int index2;

        public Triple(int valueDiff, int index1, int index2) {
            this.valueDiff = valueDiff;
            this.index1 = index1;
            this.index2 = index2;
        }
    }

    public static Comparator<Pair> valueComparator = new Comparator<Pair>() {
        public int compare(Pair p1, Pair p2) {
            return p1.value - p2.value;
        }
    };      

    public static Comparator<Triple> valueDiffComparator = new Comparator<Triple>() {
        public int compare(Triple t1, Triple t2) {
            return t1.valueDiff - t2.valueDiff;
        }
    };

    void printPairList(List<Pair> list) {
        for(Pair pair : list) {
            System.out.println("<" + pair.value + " : " + pair.index + ">");
        }
    }

    void printTripleList(List<Triple> list) {
        for(Triple t : list) {
            System.out.println("<" + t.valueDiff + " : " + t.index1 + " , " + t.index2 + ">");
        }
    }


    public static void main(String[] args) {
        int A1[] = {8, -3, 2, 1, -4, 10, -5};       // -3, 2, 1
        int A2[] = {-3, 2, 4, -6, -8, 10, 11};      // 2, 4, -6
        int A3[] = {10, -2, -7};                                // 10, -2, -7

        FindSubarrayClosestToZero f = new FindSubarrayClosestToZero();
        f.findSubarrayClosestToZero(A1);
        f.findSubarrayClosestToZero(A2);
        f.findSubarrayClosestToZero(A3);
    }
}

2017-11-30 23:42

Time complexity of the solution: O(N log N)
Space complexity of the solution: O(N)

[Note that this problem cannot be solved in O(N), as some have claimed]

Algorithm:

Compute the cumulative array (here, cum[]) of the given array [line 10]
Sort the cumulative array, including the empty prefix cum[0] = 0 [line 11]
The answer is the minimum of cum[i+1] - cum[i] for all i ∈ [0, n-1] (a[] uses 1-based indexing) [line 12]

C++ code:

#include<bits/stdc++.h>
#define M 1000010
#define REP(i,n) for (int i=1;i<=n;i++) 
using namespace std;
typedef long long ll;
ll a[M],n,cum[M],ans=numeric_limits<ll>::max(); //cum->cumulative array
int main() {
    ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
    cin>>n; REP(i,n) cin>>a[i],cum[i]=cum[i-1]+a[i];
    sort(cum,cum+n+1); //cum[0]=0 (empty prefix) takes part, so subarrays starting at a[1] count
    for (int i=0;i<n;i++) ans=min(ans,cum[i+1]-cum[i]);
    cout<<ans; //min +ve difference from 0 we can get
}

2018-05-17 23:13

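After thinking more about this problem, I found that @frankyym's solution is the right one. I have made some refinements to the original solution; here is my code: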

#include <map>
#include <stdio.h>
#include <stdlib.h>  // abs
#include <algorithm>
#include <limits.h>

using namespace std;

#define IDX_LOW_BOUND -2

// Return [i..j] range of A
pair<int, int> nearest_to_c(int A[], int n, int t) {
  map<int, int> bst;
  // start/end default to an empty range in case n == 0
  int presum, subsum, closest, i, j, start = 0, end = -1;
  bool unset;
  map<int, int>::iterator it;

  bst[0] = -1;
  // Barriers. Assume that no prefix sum is equal to INT_MAX or INT_MIN.
  bst[INT_MIN] = IDX_LOW_BOUND;
  bst[INT_MAX] = n;
  unset = true;
  // This initial value is always overwritten afterwards.
  closest = 0; 
  presum = 0;
  for (i = 0; i < n; ++i) {
    presum += A[i];
    it = bst.lower_bound(presum - t);
    for (j = 0; j < 2; j++) {
      // j == 0: smallest prefix sum >= presum - t; j == 1: its predecessor
      if (j == 1)
        --it;
      if (it->first == INT_MAX || it->first == INT_MIN) 
        continue;
      subsum = presum - it->first;
      if (unset || abs(closest - t) > abs(subsum - t)) {
        closest = subsum;
        start = it->second + 1;
        end = i;
        if (closest - t == 0)
          goto ret;
        unset = false;
      }
    }
    bst[presum] = i;
  }
ret:
  return make_pair(start, end);
}

int main() {
  int A[] = {10, 20, 30, 30, 20, 10, 10, 20};
  int t;
  scanf("%d", &t);
  pair<int, int> ans = nearest_to_c(A, 8, t);
  printf("[%d:%d]\n", ans.first, ans.second);
  return 0;
}

2017-12-24 07:11

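As a side note: I agree with the algorithms provided in the other answers. Another algorithm came to my mind recently.

Make another copy of A[], called B[]. In B[], each element is A[i] - t/n, which means B[0] = A[0] - t/n, B[1] = A[1] - t/n, ..., B[n-1] = A[n-1] - t/n. The second problem is then transformed into the first one: once the subarray of B[] closest to 0 is found, the subarray of A[] closest to t is found at the same time. (It is a bit tricky if t is not divisible by n; the precision has to be chosen appropriately. The runtime is also O(n).)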
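A quick numeric check of what this shift does to subarray sums may help when evaluating the idea. In the Python sketch below (the function name is made up), the sum over B[start..end] equals the sum over A[start..end] minus k * t/n, where k is the subarray length. That equals "sum minus t" only when k = n, that is, for the whole array.

```python
def shifted_subarray_sum(a, t, start, end):
    """Sum of B[start..end], where B[i] = a[i] - t / len(a)."""
    n = len(a)
    return sum(x - t / n for x in a[start:end + 1])
```

For example, with a = [10, 20, 30, 30] and t = 40, the shift is t/n = 10. The first two elements sum to 30 in A and to 10 in B (30 - 2 * 10), while the whole array sums to 90 in A and to 50 in B (90 - 40).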

2017-12-26 23:49
