I have a data frame that looks like this:
X1 X3
1: thrL 190..255
2: thrA 337..2799
3: thrB 2801..3733
4: thrC 3734..5020
5: yaaX 5234..5530
6: yaaA complement(5683..6459)
7: yaaJ complement(6529..7959)
I am struggling to split the X3 column of this data frame into three columns, using .. as the separator. I've tried solutions from similar posts, such as splitstackshape and gsub, but none of them really worked, because they assume the delimiter is not a regex wildcard character like a period.
The output I want is:
X1 X2 X3 X4
1: thrL 190 255 f
2: thrA 337 2799 f
3: thrB 2801 3733 f
4: thrC 3734 5020 f
5: yaaX 5234 5530 f
6: yaaA 5683 6459 r
7: yaaJ 6529 7959 r
This is what I'm trying right now
concat.split.multiple(i, "X3", "\\.\\.")
Any suggestions?
Thanks in advance
1
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
mutate(X4=ifelse(grepl("complement", X3), "r", "f")) %>%
mutate(X3=gsub("[a-z()]", "", X3)) %>%
separate(X3, into=c("X2", "X3"), sep="\\.\\.")
1
Here is a base R solution. Use fixed=T in your strsplit call to split on a literal dot rather than treating the dot as a regex wildcard. You can use (e.g.) grepl to detect the "complement".
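To see why the unescaped pattern fails, here is a minimal base R sketch comparing three ways of passing .. to strsplit. The variable names are illustrative and not from the original post.

```r
x <- "190..255"

# Default mode: the pattern is a regular expression, and "." matches ANY
# character, so ".." consumes every pair of characters as a delimiter.
# Nothing but empty strings is left over.
regex_split <- strsplit(x, "..")[[1]]

# Escaping each dot makes the regex match two literal periods.
escaped_split <- strsplit(x, "\\.\\.")[[1]]

# fixed = TRUE turns off regex interpretation, so ".." is taken literally.
fixed_split <- strsplit(x, "..", fixed = TRUE)[[1]]

print(regex_split)
print(escaped_split)
print(fixed_split)
```

The escaped and fixed variants both return the two numbers as strings. fixed = TRUE is usually the easier one to read when the delimiter contains no pattern at all.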
For example:
# reproducible example
set.seed(1)
mydf <- data.frame(X1=letters[1:7], X3=paste0(sample(100, 7), '..', sample(100, 7)), stringsAsFactors=F)
mydf$X3[6:7] <- paste0('complement(', mydf$X3[6:7], ')')
# X1 X3
# 1 a 27..67
# 2 b 37..63
# 3 c 57..7
# 4 d 89..20
# 5 e 20..17
# 6 f complement(86..66)
# 7 g complement(97..37)
Detecting complement(..):
mydf$X4 <- ifelse(grepl('complement\\(', mydf$X3), 'r', 'f')
Now extracting just the "number..number" bit and splitting:
# extract just "number..number", ignoring all else.
tmp <- gsub('^.*?([0-9]+\\.\\.[0-9]+).*$', '\\1', as.character(mydf$X3))
# split. use fixed=T
tmp <- strsplit(tmp, '..', fixed=T)
# extract the splits, convert to numeric
mydf$X2 <- as.numeric(vapply(tmp, '[[', i=1, 'template'))
mydf$X3 <- as.numeric(vapply(tmp, '[[', i=2, 'template'))
# columns not in order, but you know how to fix that.
# X1 X3 X4 X2
# 1 a 67 f 27
# 2 b 63 f 37
# 3 c 7 f 57
# 4 d 20 f 89
# 5 e 17 f 20
# 6 f 66 r 86
# 7 g 37 r 97
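Putting the pieces together on the data from the question, a compact base R sketch might look like the following. The names df, pattern and result are assumptions for illustration. It uses sub with capture groups instead of strsplit, so it is a variant of the approach above rather than the same code, and it builds the columns in the desired order directly.

```r
# Data copied from the question; 'df' is an assumed name.
df <- data.frame(
  X1 = c("thrL", "thrA", "thrB", "thrC", "yaaX", "yaaA", "yaaJ"),
  X3 = c("190..255", "337..2799", "2801..3733", "3734..5020",
         "5234..5530", "complement(5683..6459)", "complement(6529..7959)"),
  stringsAsFactors = FALSE
)

# Skip any leading non-digits, then capture the numbers on either side
# of the two literal dots; anything after the second number is ignored.
pattern <- "^[^0-9]*([0-9]+)\\.\\.([0-9]+).*$"

result <- data.frame(
  X1 = df$X1,
  X2 = as.numeric(sub(pattern, "\\1", df$X3)),  # start coordinate
  X3 = as.numeric(sub(pattern, "\\2", df$X3)),  # end coordinate
  # fixed = TRUE avoids having to escape the opening parenthesis
  X4 = ifelse(grepl("complement(", df$X3, fixed = TRUE), "r", "f"),
  stringsAsFactors = FALSE
)

print(result)
```

Rows whose X3 value does not match the pattern would be left unchanged by sub and become NA (with a warning) in as.numeric, so it is worth checking for NA values on real annotation data.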