Incrementing contiguous positive groups in array/Series


Suppose I have a Pandas series of boolean values like so.

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

>>> vals
0     False
1     False
2     False
3      True
4      True
5      True
6      True
7     False
8     False
9      True
10     True
11    False
12     True
13     True
14     True
dtype: bool

I want to turn this boolean Series into a Series in which each contiguous group of True values gets its own number, like so:

0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     0
8     0
9     2
10    2
11    0
12    3
13    3
14    3

How can I do this efficiently?


I have been able to do so manually, looping over the Series at the Python level and incrementing a counter, but this is obviously slow. I'm looking for a vectorized solution. I saw this answer from unutbu concerning splitting on increasing groups in NumPy, and tried to get that to work with a cumsum of some sort, but have been unsuccessful so far.
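For reference, the Python-level loop described above might look roughly like the sketch below. The function name label_groups_loop is hypothetical and not taken from the original post; it is only meant as a slow baseline to check vectorized answers against.

```python
import pandas as pd


def label_groups_loop(vals):
    # Hypothetical baseline: walk the Series once, bumping a counter
    # every time a True value follows a False (or starts the Series).
    labels = []
    count = 0
    prev = False
    for v in vals:
        if v and not prev:
            count += 1
        labels.append(count if v else 0)
        prev = bool(v)
    return pd.Series(labels, index=vals.index)


vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
expected = label_groups_loop(vals)
print(expected.tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```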

3 answers

#1


3  

You can try this:

vals.astype(int).diff().fillna(vals.iloc[0]).eq(1).cumsum().where(vals, 0)

#0     0
#1     0
#2     0
#3     1
#4     1
#5     1
#6     1
#7     0
#8     0
#9     2
#10    2
#11    0
#12    3
#13    3
#14    3
#dtype: int64
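To see why the one-liner works, it may help to split the chain into named steps. This is only a sketch of the same idea: the intermediate variable names are made up for illustration, and fillna is given an explicit int here rather than the raw boolean.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

as_int = vals.astype(int)               # True/False -> 1/0
step = as_int.diff()                    # +1 where a group starts, -1 just after it ends, NaN at position 0
step = step.fillna(int(vals.iloc[0]))   # position 0 starts a group only if it is True
starts = step.eq(1)                     # boolean mask marking the first element of each group
group_id = starts.cumsum()              # running count of groups started so far
result = group_id.where(vals, 0)        # keep the count inside groups, 0 everywhere else

print(result.tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```

The cumsum alone would keep the last group number through the False gaps, which is why the final where(vals, 0) is needed.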

#2


3  

Here's a NumPy approach -

import numpy as np
import pandas as pd

def island_same_label(vals):

    # Get array for faster processing with NumPy tools, ufuncs
    a = vals.values

    # Initialize output array
    out = np.zeros(a.size, dtype=int)

    # Get start indices for each island of 1s. Set those as 1s
    out[np.flatnonzero(a[1:] > a[:-1])+1] = 1

    # In case 1st element was True, we would have missed it earlier, so add that
    out[0] = a[0]

    # Finally cumsum and mask out non-island regions
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))

Timings, with the sample tiled 10000 times as the input -

In [15]: vals=pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

In [16]: vals = pd.Series(np.tile(vals,10000))

In [17]: %timeit Psidom_app(vals) # @Psidom's soln
    ...: %timeit Wen_app(vals) # @Wen's soln
    ...: %timeit island_same_label(vals) # Proposed in this post
    ...: 
100 loops, best of 3: 9.53 ms per loop
100 loops, best of 3: 13.2 ms per loop
1000 loops, best of 3: 959 µs per loop
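Psidom_app and Wen_app are not defined anywhere in the post. The sketch below assumes they simply wrap the two pandas answers in this thread (the diff/cumsum one-liner and the mask-and-cumsum version), and uses the standard timeit module instead of IPython's %timeit so it runs as a plain script. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def Psidom_app(vals):
    # Assumption: wraps the diff/cumsum one-liner from answer #1.
    first = int(vals.iloc[0])
    return vals.astype(int).diff().fillna(first).eq(1).cumsum().where(vals, 0)


def Wen_app(vals):
    # Assumption: wraps answer #3; cast to int first so diff() is numeric.
    v = vals.astype(int)
    m = (v.diff().ne(0) & v.ne(0)).cumsum()
    m[v.eq(0)] = 0
    return m


def island_same_label(vals):
    # NumPy approach from this answer, same logic as above.
    a = vals.values
    out = np.zeros(a.size, dtype=int)
    out[np.flatnonzero(a[1:] > a[:-1]) + 1] = 1
    out[0] = a[0]
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))


sample = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
big = pd.Series(np.tile(sample, 10000))

funcs = (Psidom_app, Wen_app, island_same_label)
results = {f.__name__: f(big).to_numpy() for f in funcs}

for f in funcs:
    secs = timeit.timeit(lambda: f(big), number=10) / 10
    print(f"{f.__name__}: {secs * 1e3:.2f} ms per call")
```

Checking that all three return the same labels before comparing timings is a cheap sanity step.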

#3


1  

m=(vals.diff().ne(0)&vals.ne(0)).cumsum()
m[vals.eq(0)]=0
m
Out[235]: 
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     0
8     0
9     2
10    2
11    0
12    3
13    3
14    3
dtype: int32

Data Input

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])
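Since this answer comes without explanation, here is a sketch of what the two lines appear to do, split into steps on the integer input above. The names changed, nonzero and starts are illustrative only.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])

changed = vals.diff().ne(0)   # True where the value differs from the previous one (the NaN at position 0 also counts)
nonzero = vals.ne(0)          # True inside a group of 1s
starts = changed & nonzero    # True only at the first element of each group
m = starts.cumsum()           # running group counter
m[vals.eq(0)] = 0             # reset positions outside any group

print(m.tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```

Because position 0 always counts as changed, a Series that begins with 1 still has its first group numbered 1.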
