Question: Split keep repeated delimiter

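I am trying to use the stringi package to split on a delimiter (the delimiter may be repeated) while keeping the delimiter. This is similar to a question I asked a while ago, "R split on delimiter (split) keep the delimiter (split)", except that here the delimiter can be repeated. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't figure out how to write the regex so that it splits on the delimiter when it is repeated and also does not leave an empty string at the end of the result.

Base R, stringr, stringi, etc. solutions are all welcome.

The latter problem (the trailing empty string) occurs because I use the greedy * on \\s, but the space isn't guaranteed to be there, so I could only think to leave it in:

MWE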

text.var <- c("I want to split here.But also||Why?",
   "See! Split at end but no empty.",
   "a third string.  It has two sentences"
)

library(stringi)   
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")

# Outcome

## [[1]]
## [1] "I want to split here." "But also|"     "|"          "Why?"                 
## [5] ""                     
## 
## [[2]]
## [1] "See!"       "Split at end but no empty." ""                          
## 
## [[3]]
## [1] "a third string."      "It has two sentences"

# Desired outcome

## [[1]]
## [1] "I want to split here." "But also||"                     "Why?"                                  
## 
## [[2]]
## [1] "See!"         "Split at end but no empty."                         
## 
## [[3]]
## [1] "a third string."      "It has two sentences"

2017-10-22 14:19

Answer:

Using strsplit

 strsplit(text.var, "(?<=[.!|])( +|\\b)", perl=TRUE)
 #[[1]]
 #[1] "I want to split here." "But also||"            "Why?"                 

 #[[2]]
 #[1] "See!"                       "Split at end but no empty."

 #[[3]]
 #[1] "a third string."      "It has two sentences"

Or

 library(stringi)
 stri_split_regex(text.var, "(?<=[.!|])( +|\\b)")
 #[[1]]
 #[1] "I want to split here." "But also||"            "Why?"                 

 #[[2]]
 #[1] "See!"                       "Split at end but no empty."

 #[[3]]
 #[1] "a third string."      "It has two sentences"

2017-10-22 15:39

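Would you mind explaining what *SKIP and *F are, and what role they play in the regex? - Josh O'Brien

@Josh O'Brien Thanks for the comment. Actually, the *SKIP *F is not needed. I was using it earlier while working on the code and did not check it afterwards. - akrun

@akrun Nice rework. - Tyler Rinker

@Tyler Rinker Thanks. Also, the *SKIP *F part did not work with stringi. - akrun

Both approaches work well, but this one is more concise. Thank you, +1 - Tyler Rinker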

Answer:

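Just use a pattern that finds the positions between characters that (1) are preceded by one of ?.!| and (2) are not followed by one of ?.!|. Tack on \\s* to match and eat up any number of consecutive whitespace characters, and you're good to go.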

##                  (look-behind)(look-ahead)(spaces)
strsplit(text.var, "(?<=([?.!|]))(?!([?.!|]))\\s*", perl=TRUE)
# [[1]]
# [1] "I want to split here." "But also||"            "Why?"                 
# 
# [[2]]
# [1] "See!"                       "Split at end but no empty."
# 
# [[3]]
# [1] "a third string."      "It has two sentences"

2017-10-22 17:10

You showed me the error in my regex thinking, which helps a lot with learning. akrun's approach is a bit more concise. +1 - Tyler Rinker
