I am trying to write a function that counts consecutive instances of a pattern. As an example, I would like this string
string<-"A>A>A>B>C>C>C>A>A"
to be turned into
"3 A > 1 B > 3 C > 2 A"
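I have a function that counts the instances of each value in a string, see below. But it does not preserve the ordering I want. Any ideas or pointers?
Thanks,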
Existing function:
fnc_gen_PathName <- function(string) {
  p <- strsplit(as.character(string), ";")
  p1 <- lapply(p, table)
  p2 <- lapply(p1, function(x) {
    sapply(1:length(x), function(i) {
      if(x[i] == 25){
        paste0(x[i], "+ ", names(x)[i])
      } else{
        paste0(x[i], "x ", names(x)[i])
      }
    })
  })
  p3 <- lapply(p2, function(x) paste(x, collapse = "; "))
  p3 <- do.call(rbind, p3)
  return(p3)
}
As @MrFlick commented, you can try combining rle and strsplit:
with(rle(strsplit(string, ">")[[1]]), paste(lengths, values, collapse = " > "))
## [1] "3 A > 1 B > 3 C > 2 A"
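If you need this for a whole vector of strings, the one-liner above can be wrapped in a small helper. This is only a sketch: the name count_runs and its sep argument are made up for illustration, and it assumes the separator is a literal string rather than a regular expression.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread):
# applies rle() to each string after splitting on a literal separator.
count_runs <- function(strings, sep = ">") {
  pieces <- strsplit(as.character(strings), sep, fixed = TRUE)
  vapply(pieces, function(tokens) {
    runs <- rle(tokens)  # lengths and values of consecutive runs
    paste(runs$lengths, runs$values, collapse = paste0(" ", sep, " "))
  }, character(1))
}

count_runs(c("A>A>A>B>C>C>C>A>A", "X>Y>Y"))
## [1] "3 A > 1 B > 3 C > 2 A" "1 X > 2 Y"
```

Unlike table(), which pools all occurrences of a value regardless of position, rle() only merges neighbouring duplicates, so the original order is kept.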
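Here are two dplyr solutions: a regular one and one based on rle. The advantages: you can pass several strings in as a vector, and you build a tidy intermediate dataset before the (ugh) recombination.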
library(dplyr)
library(tidyr)
library(stringi)

strings = "A>A>A>B>C>C>C>A>A"

# Solution 1: regular dplyr, one row per token
# (tibble() replaces the deprecated data_frame())
tibble(string = strings) %>%
  mutate(string_split =
           string %>%
           stri_split_fixed(">")) %>%
  unnest(string_split) %>%
  # ID increases by one whenever a token differs from the previous one
  mutate(ID =
           string_split %>%
           lag %>%
           `!=`(string_split) %>%
           plyr::mapvalues(NA, TRUE) %>%  # first row has no predecessor
           cumsum) %>%
  count(string, ID, string_split) %>%
  group_by(string) %>%
  summarize(new_string =
              paste(n,
                    string_split,
                    collapse = " > "))

# Solution 2: rle applied to each string
tibble(string = strings) %>%
  group_by(string) %>%
  do(.$string %>%
       first %>%
       stri_split_fixed(">") %>%
       first %>%
       rle %>%
       unclass %>%
       as.data.frame) %>%
  summarize(new_string =
              paste(lengths, values, collapse = " > "))
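For anyone puzzled by the ID column in the first pipeline, here is roughly the same idea in base R for a single string. Treat it as a sketch: the variable names (tokens, changed, run_id) are mine, not part of the answer above.

```r
# Base R sketch of the "flag changes, then cumsum" trick (variable names are illustrative).
tokens <- strsplit("A>A>A>B>C>C>C>A>A", ">", fixed = TRUE)[[1]]

# TRUE wherever a token differs from the one before it; the first token always starts a run
changed <- c(TRUE, tokens[-1] != tokens[-length(tokens)])

# Running total of the flags gives one ID per run of identical neighbours
run_id <- cumsum(changed)
## 1 1 1 2 3 3 3 4 4

counts <- tapply(tokens, run_id, length)                 # size of each run
labels <- tapply(tokens, run_id, function(x) x[1])       # value of each run
result <- paste(counts, labels, collapse = " > ")
result
## [1] "3 A > 1 B > 3 C > 2 A"
```

Grouping by run_id instead of by the token itself is what keeps the second run of A separate from the first.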