Incrementing contiguous positive groups in array/Series

Original post by Eric Hansen, 2017/09/13. Tags: pandas, python, series, numpy

Suppose I have a pandas Series of boolean values like so:

import pandas as pd
vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

>>> vals
0     False
1     False
2     False
3      True
4      True
5      True
6      True
7     False
8     False
9      True
10     True
11    False
12     True
13     True
14     True
dtype: bool

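I want to turn this boolean Series into a Series where each contiguous group of 1's is numbered in order (the first group becomes 1, the second 2, and so on) and every 0 stays 0, so that the sample above becomes 0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3.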

How can I do this efficiently?


I have been able to do this manually by looping over the Series at the Python level and incrementing a counter, but this is obviously slow. I'm looking for a vectorized solution. I saw this answer from unutbu about splitting on increasing groups in NumPy and tried to get it to work with a cumsum of some sort, but have been unsuccessful so far.
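For reference, the Python-level loop described in the question might look something like the sketch below. The original loop was not posted, so the function name `label_groups_loop` and its details are assumptions made for illustration; it is only meant as a slow but easy-to-check baseline.

```python
import pandas as pd

def label_groups_loop(vals):
    """Label each run of True values 1, 2, 3, ...; False positions get 0."""
    labels = []
    current = 0        # label of the most recently started group
    previous = False   # value seen at the previous position
    for v in vals:
        if v and not previous:
            current += 1   # a new group of True values starts here
        labels.append(current if v else 0)
        previous = bool(v)
    return pd.Series(labels, index=vals.index)

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
print(label_groups_loop(vals).tolist())
```

A baseline like this is handy for checking that any vectorized version gives the same labels.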

3 solutions

#1

You can try this:


vals.astype(int).diff().fillna(vals.iloc[0]).eq(1).cumsum().where(vals, 0)

#0     0
#1     0
#2     0
#3     1
#4     1
#5     1
#6     1
#7     0
#8     0
#9     2
#10    2
#11    0
#12    3
#13    3
#14    3
#dtype: int64
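To see why the one-liner works, it may help to split the chain into named steps. This is only a sketch of the same idea. The intermediate names (`step`, `group_id`, `result`) are introduced here for illustration, and the fill value is wrapped in `int(...)` as a small adaptation to keep the intermediate Series numeric.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

# 1. Difference between neighbours: +1 where False -> True, -1 where True -> False.
step = vals.astype(int).diff()

# 2. The first element has no predecessor, so diff() gives NaN there.
#    Fill it with the first value so a leading True counts as a group start.
#    (int(...) is an adaptation here, not part of the original one-liner.)
step = step.fillna(int(vals.iloc[0]))

# 3. Mark the group starts and count them cumulatively.
group_id = step.eq(1).cumsum()

# 4. Keep the running count only where vals is True; everything else becomes 0.
result = group_id.where(vals, 0)
print(result.tolist())
```

Before the final masking step, `group_id` still carries the last group's number through the False gaps, which is why the `where(vals, 0)` call is needed.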

#2

Here's a NumPy approach -


import numpy as np
import pandas as pd

def island_same_label(vals):

    # Get array for faster processing with NumPy tools, ufuncs
    a = vals.values

    # Initialize output array
    out = np.zeros(a.size, dtype=int)

    # Get start indices for each island of 1s. Set those as 1s
    out[np.flatnonzero(a[1:] > a[:-1])+1] = 1

    # In case 1st element was True, we would have missed it earlier, so add that
    out[0] = a[0]

    # Finally cumsum and mask out non-island regions
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))

Timings, using the sample tiled 10000 times as the input -

In [15]: vals=pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

In [16]: vals = pd.Series(np.tile(vals,10000))

In [17]: %timeit Psidom_app(vals) # @Psidom's soln
    ...: %timeit Wen_app(vals) # @Wen's soln
    ...: %timeit island_same_label(vals) # Proposed in this post
    ...: 
100 loops, best of 3: 9.53 ms per loop
100 loops, best of 3: 13.2 ms per loop
1000 loops, best of 3: 959 µs per loop
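The timing session calls `Psidom_app` and `Wen_app` without showing their definitions. Presumably they are thin wrappers around solutions #1 and #3, and the sketch below is written under that assumption. It does not try to reproduce the timings; it only checks that the three approaches agree on a tiled input. The conversions to `int` inside the wrappers are adaptations made here so the intermediate dtypes stay numeric.

```python
import numpy as np
import pandas as pd

def Psidom_app(vals):
    # Assumed wrapper around solution #1.
    first = int(vals.iloc[0])
    return vals.astype(int).diff().fillna(first).eq(1).cumsum().where(vals, 0)

def Wen_app(vals):
    # Assumed wrapper around solution #3, which was posted for integer input.
    v = vals.astype(int)
    m = (v.diff().ne(0) & v.ne(0)).cumsum()
    m[v.eq(0)] = 0
    return m

def island_same_label(vals):
    # Solution #2, repeated so this block is self-contained.
    a = vals.values
    out = np.zeros(a.size, dtype=int)
    out[np.flatnonzero(a[1:] > a[:-1]) + 1] = 1
    out[0] = a[0]
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))

sample = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
vals = pd.Series(np.tile(sample.to_numpy(), 1000))

r1 = Psidom_app(vals).to_numpy()
r2 = island_same_label(vals).to_numpy()
r3 = Wen_app(vals).to_numpy()
print(np.array_equal(r1, r2) and np.array_equal(r2, r3))
```

With the wrappers in place, the `%timeit` lines from the session above can be rerun in IPython; the numbers will depend on the machine and the library versions.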

#3

m=(vals.diff().ne(0)&vals.ne(0)).cumsum()
m[vals.eq(0)]=0
m
Out[235]: 
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     0
8     0
9     2
10    2
11    0
12    3
13    3
14    3
dtype: int32

Data Input


vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])
