R: Splitting one column into two when the delimiter is '..'


I have a data frame that looks like this:

     X1                     X3
1: thrL               190..255
2: thrA              337..2799
3: thrB             2801..3733
4: thrC             3734..5020
5: yaaX             5234..5530
6: yaaA complement(5683..6459)
7: yaaJ complement(6529..7959)

I am struggling to split the X3 column of this data frame into the columns shown below, using .. as the separator. I've tried solutions from similar posts, such as splitstackshape and gsub, but none of them has really worked, because they assume a delimiter that is not a regex wildcard character like the period.

     X1   X2   X3  X4
1: thrL  190  255   f
2: thrA  337 2799   f
3: thrB 2801 3733   f
4: thrC 3734 5020   f
5: yaaX 5234 5530   f
6: yaaA 5683 6459   r
7: yaaJ 6529 7959   r

This is what I'm trying right now:

concat.split.multiple(i, "X3", "\\.\\.")

Any suggestions?

Thanks in advance

2 solutions

#1


Using dplyr and tidyr:

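library(dplyr)
library(tidyr)
df %>%
   # rows wrapped in complement(...) are "r", all others are "f"
   mutate(X4 = ifelse(grepl("complement", X3), "r", "f")) %>%
   # strip the letters and parentheses, leaving "start..end"
   mutate(X3 = gsub("[a-z()]", "", X3)) %>%
   # split on two literal dots; convert = TRUE turns the pieces into numbers
   separate(X3, into = c("X2", "X3"), sep = "\\.\\.", convert = TRUE)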
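
As an alternative sketch (not part of the original answer), tidyr's extract() can pull both numbers out with a single regular expression containing two capture groups, so the letters and parentheses never need to be stripped first. The small data frame below is re-typed from three rows of the question purely for illustration, and the names df and out are assumptions.

```r
library(tidyr)

# Three rows copied from the question's data, for illustration only
df <- data.frame(
  X1 = c("thrL", "thrA", "yaaA"),
  X3 = c("190..255", "337..2799", "complement(5683..6459)"),
  stringsAsFactors = FALSE
)

# Flag the strand before X3 is overwritten
df$X4 <- ifelse(grepl("complement", df$X3, fixed = TRUE), "r", "f")

# Each parenthesised group in the regex becomes one new column;
# convert = TRUE turns the captured text into integers
out <- tidyr::extract(df, X3, into = c("X2", "X3"),
                      regex = "([0-9]+)\\.\\.([0-9]+)", convert = TRUE)
print(out)
```

The new columns take the place of the old X3, so the result comes out in X1, X2, X3, X4 order without any extra reordering step.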

#2


Here is a base R solution. Pass fixed=TRUE to strsplit so that '..' is matched as two literal dots rather than as a regular expression in which each dot matches any character. You can use grepl, for example, to detect the "complement" wrapper.
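
A quick way to see the difference is to split one value from the question three ways. This is only a minimal sketch; the variable names are made up for the illustration.

```r
x <- "190..255"

# Default: the pattern is a regular expression, so each '.' matches any
# character and the whole string is consumed as delimiters
as_regex <- strsplit(x, "..")[[1]]

# fixed = TRUE: the pattern is the literal two-character string ".."
as_fixed <- strsplit(x, "..", fixed = TRUE)[[1]]

# Escaping the dots gives the same split while staying in regex mode
as_escaped <- strsplit(x, "\\.\\.")[[1]]

print(as_fixed)    # [1] "190" "255"
```

Either fixed = TRUE or the escaped pattern works; fixed = TRUE is usually easier to read when the delimiter contains no pattern logic at all.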

For example:

# reproducible example
set.seed(1)
mydf <- data.frame(X1 = letters[1:7],
                   X3 = paste0(sample(100, 7), '..', sample(100, 7)),
                   stringsAsFactors = FALSE)
# wrap the last two rows in complement(...)
mydf$X3[6:7] <- paste0('complement(', mydf$X3[6:7], ')')

#   X1                 X3
# 1  a             27..67
# 2  b             37..63
# 3  c              57..7
# 4  d             89..20
# 5  e             20..17
# 6  f complement(86..66)
# 7  g complement(97..37)

Detecting complement(..):

mydf$X4 <- ifelse(grepl('complement\\(', mydf$X3), 'r', 'f')

Now extract just the "number..number" part and split it:

# extract just "number..number", ignoring everything else
tmp <- gsub('^.*?([0-9]+\\.\\.[0-9]+).*$', '\\1', as.character(mydf$X3))
# split on the literal string '..' (fixed = TRUE disables regex matching)
tmp <- strsplit(tmp, '..', fixed = TRUE)
# take the first and second piece of each split and convert to numeric
mydf$X2 <- as.numeric(vapply(tmp, `[[`, character(1), 1))
mydf$X3 <- as.numeric(vapply(tmp, `[[`, character(1), 2))
# columns are not in order; reorder with mydf[, c('X1', 'X2', 'X3', 'X4')] if needed
#   X1 X3 X4 X2
# 1  a 67  f 27
# 2  b 63  f 37
# 3  c  7  f 57
# 4  d 20  f 89
# 5  e 17  f 20
# 6  f 66  r 86
# 7  g 37  r 97
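
The steps above can also be folded into one small helper, sketched here under the assumption that every value contains exactly one "number..number" range. The function name split_range and the column names start, end and strand are hypothetical and do not appear in the original question or answers.

```r
# Hypothetical helper: parse "a..b" or "complement(a..b)" strings
split_range <- function(x) {
  # regexec() records the full match plus each capture group;
  # regmatches() turns those positions back into strings
  m <- regmatches(x, regexec("([0-9]+)\\.\\.([0-9]+)", x))
  data.frame(
    start  = as.numeric(vapply(m, `[`, character(1), 2)),
    end    = as.numeric(vapply(m, `[`, character(1), 3)),
    # "r" when the range is wrapped in complement(...), otherwise "f"
    strand = ifelse(grepl("complement(", x, fixed = TRUE), "r", "f"),
    stringsAsFactors = FALSE
  )
}

# Three values copied from the question's X3 column
x3 <- c("190..255", "337..2799", "complement(5683..6459)")
res <- split_range(x3)
print(res)
```

Values that do not match the pattern come back as NA in start and end rather than raising an error, which makes malformed rows easy to spot.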