问题 组关闭数字


我有一个包含2列整数的表。第一列表示起始索引,第二列表示结束索引。

START END
1     8
9     13
14    20
20    25
30    42
42    49
60    67

简单到目前为止。我想要做的是将所有记录组合在一起:

START END
1     25
30    49
60    67

记录可以跟在上一个结束索引的相同索引处开始,或者以1的边距开始:

START END
1     10
10    20

START END
1     10
11    20

将导致

START END
1     20

我正在使用SQL Server 2008 R2。

任何帮助都会很棒


2337
2017-08-31 14:27


起源

我认为这是一个有趣的问题,但你真的尝试过这样做吗?您尝试过的查询? - jadarnel27
你可能有任何重叠的对,如 1,8 和 3,15? - Martin Smith
Tx为您的评论马丁..没有重叠对。 Jadarnel27 - 我使用sql游标解决了这个问题,但这个解决方案根本没有效率,我正在寻找更优雅,更好的解决方案。 - Liran Ben Yehuda


答案:


这适用于您的示例,如果它不适用于其他数据,请告诉我

create table #Range 
(
  [Start] INT,
  [End] INT
)

insert into #Range ([Start], [End]) Values (1, 8)
insert into #Range ([Start], [End]) Values (9, 13)
insert into #Range ([Start], [End]) Values (14, 20)
insert into #Range ([Start], [End]) Values (20, 25)
insert into #Range ([Start], [End]) Values (30, 42)
insert into #Range ([Start], [End]) Values (42, 49)
insert into #Range ([Start], [End]) Values (60, 67)



;with RangeTable as
(select
    t1.[Start],
    t1.[End],
    row_number() over (order by t1.[Start]) as [Index]
from
    #Range t1
where t1.Start not in (select 
                      [End] 
               from
                  #Range
                  Union
               select 
                  [End] + 1
               from
                  #Range
               )
)
select 
    t1.[Start],
    case 
   when t2.[Start] is null then
        (select max([End])
                     from #Range)
       else
        (select max([End])
                     from #Range
                     where t2.[Start] > [End])
end as [End]    
from 
    RangeTable t1
left join 
    RangeTable t2
on
    t1.[Index] = t2.[Index]-1 

drop table #Range;

3
2017-08-31 15:45



嗨Aducci,您的解决方案也适用于比示例表中的数据更大的其他数据。 - Liran Ben Yehuda
@Liran Ben Yehuda - 你是否有理由将其标记为答案? - Aducci
Tx为您提供支持。我只是寻找最好的解决方案,我必须先做一些性能测试。 - Liran Ben Yehuda


编辑包括另一个我认为更可靠的版本,并且还适用于重叠范围

CREATE TABLE #data (start_range INT, end_range INT)
INSERT INTO #data VALUES (1,8) 
INSERT INTO #data VALUES (2,15) 
INSERT INTO #data VALUES (9,13)
INSERT INTO #data VALUES (14,20) 
INSERT INTO #data VALUES (13,26) 
INSERT INTO #data VALUES (12,21) 
INSERT INTO #data VALUES (9,25) 
INSERT INTO #data VALUES (20,25) 
INSERT INTO #data VALUES (30,42) 
INSERT INTO #data VALUES (42,49) 
INSERT INTO #data VALUES (60,67)   

;with ranges as
(
SELECT start_range as level
,end_range as end_range
,row_number() OVER (PARTITION BY (SELECT NULL) ORDER BY start_range) as row
FROM #data
UNION ALL
SELECT
level + 1 as level
,end_range as end_range
,row
From ranges 
WHERE level < end_range
)
,ranges2 AS
(
SELECT DISTINCT 
level
FROM ranges
)
,ranges3 AS
(
SELECT 
level
,row_number() OVER (ORDER BY level) - level as grouping_group
from ranges2
)
SELECT 
MIN(level) as start_number
,MAX(level) as end_number
FROM ranges3
GROUP BY grouping_group
ORDER BY start_number ASC

我认为这应该有效 - 虽然可能不会在大型套装上特别有效......

CREATE TABLE #data (start_range INT, end_range INT)
INSERT INTO #data VALUES (1,8)
INSERT INTO #data VALUES (2,15)
INSERT INTO #data VALUES (9,13)
INSERT INTO #data VALUES (14,20)
INSERT INTO #data VALUES (21,25)
INSERT INTO #data VALUES (30,42)
INSERT INTO #data VALUES (42,49)
INSERT INTO #data VALUES (60,67)


;with overlaps as
(
select * 
,end_range - start_range as range
,row_number() OVER (PARTITION BY (SELECT NULL) ORDER BY start_range ASC) as line_number
from #data
)
,overlaps2 AS
(
SELECT
O1.start_range
,O1.end_range
,O1.line_number
,O1.range
,O2.start_range as next_range
,CASE WHEN O2.start_range - O1.end_range < 2 THEN 1 ELSE 0 END as overlap
,O1.line_number - DENSE_RANK() OVER (PARTITION BY (CASE WHEN O2.start_range - O1.end_range < 2 THEN 1 ELSE 0 END) ORDER BY O1.line_number ASC) as overlap_group
FROM overlaps O1
LEFT OUTER JOIN overlaps O2 on O2.line_number = O1.line_number + 1
)
SELECT 
MIN(start_range) as range_start
,MAX(end_range) as range_end
,MAX(end_range) - MIN(start_range) as range_span
FROM overlaps2
GROUP BY overlap_group

4
2017-08-31 14:47



+1在这里测试过并且有效。包括好的东西 CREATE 和 INSERT 声明。 - Adrian Carneiro
嗨Davin,你的第二个解决方案更可靠,因为第一个解决方案效果不佳。实际上,原始表不包含任何重叠。如果你有任何想法如何以更有效的方式重新解决问题,我想知道。 Tx为你的帮助:) - Liran Ben Yehuda
@Liran Ben Yehuda - 在您的原始问题中,您希望1-10,11-20和1-10,10-20的示例给出1-20的范围 - 因此存在重叠,因为第二种情况10出现两次,这是否意味着在您的实际表中每个开始和结束范围值是唯一的? - Dibstar
@Devin - 很抱歉这个混乱。你是对的,我的意思是重叠范围不存在,例如你的例子中的第二行(2,15)(也是[9,25]行值) - Liran Ben Yehuda
@Liran,在这种情况下,我认为这应该仍然相对有效,因为CTE只会填补存在的范围的间隙,然后看看哪些范围在前一行的1之内 - 你有多少数据范围的数据有你的实际表吗? - Dibstar


你可以用一个 数字表 解决这个问题。基本上,您首先扩展范围,然后组合后续项目。

这是一个实现:

WITH data (START, [END]) AS (
  SELECT  1,  8 UNION ALL
  SELECT  9, 13 UNION ALL
  SELECT 14, 20 UNION ALL
  SELECT 20, 25 UNION ALL
  SELECT 30, 42 UNION ALL
  SELECT 42, 49 UNION ALL
  SELECT 60, 67
),
expanded AS (
  SELECT DISTINCT
    N = d.START + v.number
  FROM data d
    INNER JOIN master..spt_values v ON v.number BETWEEN 0 AND d.[END] - d.START
  WHERE v.type = 'P'
),
marked AS (
  SELECT
    N,
    SeqID = N - ROW_NUMBER() OVER (ORDER BY N)
  FROM expanded
)
SELECT
  START = MIN(N),
  [END] = MAX(N)
FROM marked
GROUP BY SeqID

此解决方案使用 master..spt_values 作为数字表,用于扩展初始范围。但如果(全部或部分)这些范围可能超过2048(后续)值,那么您应该定义和使用 你自己 数字表。


3
2017-09-01 17:09