问题如何编写SQL查询，为特定条件选择不同的对值？

我在解决以下问题时遇到问题：

对于具有特定分数的配对值，如何将它们分组，只返回具有最佳分数的不同配对值？

例如，假设我有一个包含以下行值的表：

(t1,p1,65)
(t1,p2,60)
(t1,p3,20)
(t2,p1,60)
(t2,p2,59)
(t2,p3,15)

前两列表示配对值，第三列表示配对得分。最佳得分为 (t1,p1,65)。由于现在使用了t1和p1，我想将它们排除在进一步分析之外。

下一个最好的分数是 (t2,p2,59)。即使 (t1,p2) 得分为60，我想排除它因为“t1”已被使用。（t2,p1) 也有60分，但由于p1也已经被使用，这对被排除在外。

这导致不同的对分数值：

(t1,p1,65)
(t2,p2,59)

有没有办法只用查询生成这个结果？我试图想办法对结果进行分组和分区，但由于必须根据得分等级对已使用的值进行一些计算，我发现这很难处理。

编辑：

要生成数据：

with t(t, p, score) as (
    (values ('t1','p1',65),
           ('t1','p2',60),
           ('t1','p3',20),
           ('t2','p1',60),
           ('t2','p2',59),
           ('t2','p3',15)
     ))
select t.* from t;

6539

2017-11-01 17:17

起源

这是一个有趣的问题:) - borowis

为什么你有 (t2, p1, 60) 结果，如果你说我们必须排除 p1？ - borowis

您需要使用递归CTE解决此问题。 - Gordon Linoff

你的例子不包括在内 t1 但不是 p1。是对的吗？如果是这样，那么您可能需要澄清几个例子。 - Philip Couling

在玩完这个之后，它似乎比我想象的更难。 Postgres（和大多数数据库）对递归CTE有一定的限制，排除了表达这种情况的自然方式。这个问题有一些让我怀疑它不能用递归CTE处理。它与以下事实有关：有两个不同的步骤：找到剩余的最大分数，然后从考虑中删除节点。 - Gordon Linoff

答案:

使用存储的函数相对简单：

--drop function if exists f();
--drop table if exists t;
create table t(x text,y text, z int);
insert into t values
  ('t1','p1',65),
  ('t1','p2',60),
  ('t1','p3',20),
  ('t2','p1',60),
  ('t2','p2',59),
  ('t2','p3',15)/*,
  ('t3','p1',20),
  ('t3','p2',60),
  ('t3','p3',40)*/;

create function f() returns setof t immutable language plpgsql as $$
declare
  ax text[];
  ay text[];
  r t;
begin
  ax := '{}'; ay := '{}';
  loop
    select * into r
      from t
      where x <> all(ax) and y <> all(ay)
      order by z desc, x, y limit 1;
    exit when not found;
    ax := ax || r.x; ay := ay || r.y;
    return next r;
  end loop;
end $$;

select * from f();
╔════╤════╤════╗
║ x  │ y  │ z  ║
╠════╪════╪════╣
║ t1 │ p1 │ 65 ║
║ t2 │ p2 │ 59 ║
╚════╧════╧════╝

但是，如果取消注释第三组值，结果将会有所不同：

╔════╤════╤════╗
║ x  │ y  │ z  ║
╠════╪════╪════╣
║ t1 │ p1 │ 65 ║
║ t3 │ p2 │ 60 ║
║ t2 │ p3 │ 15 ║
╚════╧════╧════╝

Upd：以及在相同测试数据上使用递归CTE的等效项：

with recursive r as (
  (select x, y, z, array[x] as ax, array[y] as ay from t order by z desc, x, y limit 1)
  union all
  (select t.x, t.y, t.z, r.ax || t.x, r.ay || t.y from t, r 
  where not (t.x = any(r.ax) or t.y = any(r.ay)) 
  order by t.z desc, t.x, t.y limit 1))
select * from r;

2017-11-01 19:25

美丽。谢谢！ - Stephen Tableau

我实际上在一个有几十万行的数据集上尝试了这个，但由于递归，它非常慢。关于如何加快效率的任何想法？ - Stephen Tableau

@StephenTableau如果不了解实际的数据结构，选择性，索引等，很难说什么。尝试创建涵盖的索引 where 和 order by 零件（对于我的例子，它将是 ...on t(x,y) 和 ...on t(z desc)）。是个 Gordon Linoff的解决方案更快？可能这是更有效的解决方案。 - Abelisto

这个问题显然一直困扰着我。以下似乎实现了您的逻辑，将访问值的数组保存在行中：

with recursive t(t, p, score) as (
    (values ('t1','p1',65),
           ('t1','p2',60),
           ('t1','p3',20),
           ('t2','p1',60),
           ('t2','p2',59),
           ('t2','p3',15)
     )),
     cte(t, p, score, cnt, lastt, lastp, ts, ps) as (
        (select t.*, count(*) over ()::int, tt.t, tt.p, ARRAY[tt.t], ARRAY[tt.p]
         from t cross join
              (select t.* from t order by score desc limit 1) tt
        ) 
        union all
        select t, p, score, 
               sum(case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then 1 else 0 end) over ()::int,
               first_value(t) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               first_value(p) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               ts || first_value(t) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last),
               ps || first_value(p) over (order by case when not (ts @> ARRAY[t] or ps @> ARRAY[p]) then score end desc nulls last)
        from cte 
        where cnt > 0
       )
 select *
 from cte
 where lastt = t and lastp = p and cnt > 0;

2017-11-01 19:14

很好用！非常感谢！ - Stephen Tableau

t1被使用，所以你已经排除了（t1，p2），但也使用了p1，你没有排除它。对我来说，它看起来就像是第一列的灌浆。

select t1.c1, t2.c2, t1.s 
  from table1 t2 
  inner join (select c1, max(score) s from table1 group by t1) t1 
    on (t1.s=t2.score and t1.c1=t2.c1);

哪里 table1 是你的桌子的名字和 c1 是第一个， c2 第二和 score 第三栏;

2017-11-01 17:32

问题是你可能有t1，p1和p1，t1元组具有相同的分数，你不应该列出它们2次 - borowis

我的例子结果是一个错误。我更新了它以排除p2。 - Stephen Tableau

@StephenTableau在那种情况下，我的答案是不正确的。 - Kacper

如果第一对值和第二对值是不同的列（例如，X和Y），则可以按X分组并执行MAX（得分）作为聚合函数，以获得以X开头的元组的最大分数。

进一步的步骤取决于您的数据，因为如果每个元组都被反转，您可能仍然会得到不需要的重复项。因此，要排除这样的反转元组，您可以先进行自联接。

2017-11-01 17:30

问题 如何编写SQL查询，为特定条件选择不同的对值？

答案:

热门问题

问题如何编写SQL查询，为特定条件选择不同的对值？