Suppose I have a Pandas series of boolean values like so.
vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
>>> vals
0 False
1 False
2 False
3 True
4 True
5 True
6 True
7 False
8 False
9 True
10 True
11 False
12 True
13 True
14 True
dtype: bool
I want to turn this boolean Series into a Series in which each consecutive group of 1s is numbered in order, with 0 everywhere else, like so:
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 0
8 0
9 2
10 2
11 0
12 3
13 3
14 3
How can I do this efficiently?
I have been able to do so manually, looping over the series at the Python level and incrementing a counter, but this is obviously slow. I'm looking for a vectorized solution. I saw this answer from unutbu about splitting on increasing groups in NumPy and tried to get it to work with a cumsum of some sort, but have been unsuccessful so far.
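For reference, a Python-level loop of the kind described above might look roughly like the sketch below. The function name `label_runs_loop` is made up for illustration and this is not necessarily the code the asker used; it is only meant as a slow but easy-to-check baseline.

```python
import pandas as pd

def label_runs_loop(vals):
    # Hypothetical baseline: walk the Series once, bumping a counter
    # each time a new run of True values begins.
    labels = []
    count = 0
    prev = False
    for v in vals:
        if v and not prev:
            count += 1          # first True after a False starts a new group
        labels.append(count if v else 0)
        prev = v
    return pd.Series(labels, index=vals.index)

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
print(label_runs_loop(vals).tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```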
3
You can try this:
vals.astype(int).diff().fillna(vals.iloc[0]).eq(1).cumsum().where(vals, 0)
#0 0
#1 0
#2 0
#3 1
#4 1
#5 1
#6 1
#7 0
#8 0
#9 2
#10 2
#11 0
#12 3
#13 3
#14 3
#dtype: int64
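To see why the one-liner works, it may help to split the chain into named steps. The sketch below is the same idea written out; the intermediate names (`as_int`, `step`, `starts`, `run_id`) are chosen here for illustration, and `int(vals.iloc[0])` is used in place of the bare `vals.iloc[0]` so that the fill value is a plain integer.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

as_int = vals.astype(int)               # True/False -> 1/0
step = as_int.diff()                    # +1 where a run of True starts, -1 where it ends, NaN at position 0
step = step.fillna(int(vals.iloc[0]))   # a leading True counts as the start of a run
starts = step.eq(1)                     # mask marking the first element of each run
run_id = starts.cumsum()                # number of runs started so far
result = run_id.where(vals, 0)          # keep the count inside runs, 0 elsewhere

print(result.tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```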
3
Here's a NumPy approach -
import numpy as np
import pandas as pd

def island_same_label(vals):
    # Get array for faster processing with NumPy tools, ufuncs
    a = vals.values
    # Initialize output array
    out = np.zeros(a.size, dtype=int)
    # Get start indices for each island of 1s. Set those as 1s
    out[np.flatnonzero(a[1:] > a[:-1]) + 1] = 1
    # In case 1st element was True, we would have missed it earlier, so add that
    out[0] = a[0]
    # Finally cumsum (in place) and mask out non-island regions
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))
Timings, using the sample tiled 10000 times as the input -
In [15]: vals=pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
In [16]: vals = pd.Series(np.tile(vals,10000))
In [17]: %timeit Psidom_app(vals) # @Psidom's soln
...: %timeit Wen_app(vals) # @Wen's soln
...: %timeit island_same_label(vals) # Proposed in this post
...:
100 loops, best of 3: 9.53 ms per loop
100 loops, best of 3: 13.2 ms per loop
1000 loops, best of 3: 959 µs per loop
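The timing session above calls `Psidom_app` and `Wen_app`, which are not defined anywhere in the thread. The sketch below assumes they are thin wrappers around the other two answers (named `psidom_app` and `wen_app` here, and `wen_app` casts to int first because that answer was written for integer input). It checks that all three approaches agree on the tiled input and times them with the standard `timeit` module; actual numbers will depend on the machine and library versions.

```python
import timeit
import numpy as np
import pandas as pd

def psidom_app(vals):
    # Assumed wrapper around the diff/cumsum one-liner
    return vals.astype(int).diff().fillna(int(vals.iloc[0])).eq(1).cumsum().where(vals, 0)

def wen_app(vals):
    # Assumed wrapper around the diff().ne(0) answer; cast to int as in its "Data Input"
    v = vals.astype(int)
    m = (v.diff().ne(0) & v.ne(0)).cumsum()
    m[v.eq(0)] = 0
    return m

def island_same_label(vals):
    a = vals.values
    out = np.zeros(a.size, dtype=int)
    out[np.flatnonzero(a[1:] > a[:-1]) + 1] = 1
    out[0] = a[0]
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))

reps = 10000
sample = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
vals = pd.Series(np.tile(sample, reps))

funcs = [psidom_app, wen_app, island_same_label]
results = {f.__name__: f(vals).to_numpy() for f in funcs}

# All three approaches should produce identical labels
reference = results["island_same_label"]
for name, res in results.items():
    assert np.array_equal(res, reference), name

# Rough timings; number=10 keeps the run short
for f in funcs:
    seconds = timeit.timeit(lambda: f(vals), number=10) / 10
    print(f"{f.__name__}: {seconds * 1e3:.3f} ms per call")
```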
1
m = (vals.diff().ne(0) & vals.ne(0)).cumsum()
m[vals.eq(0)] = 0
m
Out[235]:
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 0
8 0
9 2
10 2
11 0
12 3
13 3
14 3
dtype: int32
Data Input
vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])
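The two-line answer above comes without explanation, so here is a sketch of what each piece appears to do, using the integer input shown under "Data Input". The intermediate names (`changed`, `is_one`, `starts`) are introduced here for illustration only.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])

changed = vals.diff().ne(0)   # True where the value differs from the previous one
                              # (also True at position 0, where diff() is NaN)
is_one = vals.ne(0)           # True inside the runs of 1s
starts = changed & is_one     # True only at the first element of each run of 1s

m = starts.cumsum()           # running count of runs started so far
m[~is_one] = 0                # reset positions outside the runs to 0

print(starts[starts].index.tolist())
# [3, 9, 12]
print(m.tolist())
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 0, 3, 3, 3]
```

Because the NaN produced by `diff()` at position 0 compares as not equal to 0, a series that begins with 1 has its first run counted without any extra handling.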