I have a DataFrame like the following, and I am trying to turn it into a Series:
col1 col2 col3
col1 1.0 0.20 0.70
col2 0.2 1.00 0.01
col3 0.7 0.01 1.00
GOAL:
col1Xcol1 1.0
col1Xcol2 0.2
col1Xcol3 0.7
col2Xcol1 0.2
...
My code so far:
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])
print(pvals2.transpose().join(pvals2, how='outer', lsuffix='_left', rsuffix='_right'))
OUTPUT (from my real data, whose columns are vote, ballot1 and ballot1_x):
vote_left ballot1_left ballot1_x_left vote_right ballot1_right \
vote 0 0.0923 0.0521 0 0.0923
ballot1 0.0923 0 0.8213 0.0923 0
ballot1_x 0.0521 0.8213 0 0.0521 0.8213
ballot1_x_right
vote 0.0521
ballot1 0.8213
ballot1_x 0
First, stack the DataFrame:
st = pvals2.stack()
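To see why the next step works, here is a small sketch (rebuilding the question's pvals2 frame) that prints what stack() returns: a Series whose MultiIndex pairs each row label with each column label, in row-by-row order.

```python
import pandas as pd

# Same symmetric frame as in the question
pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

st = pvals2.stack()
# Each index entry is a (row label, column label) tuple
print(st.index.tolist()[:3])
# [('col1', 'col1'), ('col1', 'col2'), ('col1', 'col3')]
```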
Create a new index by concatenating the two levels of the MultiIndex:
newdex = st.index.get_level_values(0) + 'X' + st.index.get_level_values(1)
Set newdex as the index of the Series:
st = st.set_axis(newdex)  # returns a new Series with newdex as its index
All together:
st = pvals2.stack()
st = st.set_axis(st.index.get_level_values(0) + 'X' + st.index.get_level_values(1))
print(st)
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
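A shorter variant of the same stack approach, offered as a sketch rather than the answerer's code: MultiIndex.map applies a function to every (row, column) tuple, so 'X'.join builds each label directly without pulling out the levels separately.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

st = pvals2.stack()
# ('col1', 'col2') -> 'col1Xcol2' for every entry of the MultiIndex
st.index = st.index.map('X'.join)
print(st)
```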
Consider melt with a column assignment for the new index, then select the value column, since a single pandas DataFrame column is a pandas Series:
Data
from io import StringIO
import pandas as pd
txt = ''' col1 col2 col3
col1 1.0 0.20 0.70
col2 0.2 1.00 0.01
col3 0.7 0.01 1.00'''
df = pd.read_table(StringIO(txt), sep=r"\s+")
Series build
mdf = pd.melt(df.reset_index(), id_vars='index')
mdf['s'] = mdf['index'] + 'X' + mdf['variable']
new_series = mdf.set_index('s').rename_axis(None)['value']
print(new_series)
# col1Xcol1 1.00
# col2Xcol1 0.20
# col3Xcol1 0.70
# col1Xcol2 0.20
# col2Xcol2 1.00
# col3Xcol2 0.01
# col1Xcol3 0.70
# col2Xcol3 0.01
# col3Xcol3 1.00
# Name: value, dtype: float64
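The same melt idea can be applied to the question's pvals2 frame directly. This sketch names the id column with rename_axis and the variable column with var_name instead of relying on the default 'index'/'variable' names; the names 'row' and 'col' are my own choice. The values are passed as a plain array because passing a Series together with index= would make pandas realign them instead of relabelling them.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# 'row' and 'col' are arbitrary names chosen for this sketch
mdf = pvals2.rename_axis('row').reset_index().melt(id_vars='row', var_name='col')
# Labels are rowXcol; melt walks column by column, so the order matches the output above
new_series = pd.Series(mdf['value'].to_numpy(), index=mdf['row'] + 'X' + mdf['col'])
print(new_series)
```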
Using concat and then setting a new index works:
>>> ser = pd.concat([pvals2[col] for col in pvals2.columns])
>>> ser.index = [pvals2[col].name + 'X' + x for col in pvals2.columns
...              for x in pvals2[col].index]
>>> ser
col1Xcol1 1.00
col1Xcol2 0.20
col1Xcol3 0.70
col2Xcol1 0.20
col2Xcol2 1.00
col2Xcol3 0.01
col3Xcol1 0.70
col3Xcol2 0.01
col3Xcol3 1.00
dtype: float64
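For comparison, DataFrame.unstack() on a frame with a flat column index also returns a Series that stacks the columns one after another, with a (column, row) MultiIndex. The sketch below is a swapped-in alternative, not part of the answer above; joining each tuple with 'X' gives the same column-then-row labels as the concat version.

```python
import pandas as pd

pvals2 = pd.DataFrame({'col1': [1, .2, .7],
                       'col2': [.2, 1, .01],
                       'col3': [.7, .01, 1]},
                      index=['col1', 'col2', 'col3'])

# unstack() -> Series indexed by (column, row), column by column
u = pvals2.unstack()
u.index = u.index.map('X'.join)

# The concat version from the answer, for comparison
ser = pd.concat([pvals2[col] for col in pvals2.columns])
ser.index = [col + 'X' + x for col in pvals2.columns for x in pvals2[col].index]

print(u.equals(ser))
```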
The following code:
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])
values = []
ind = []
for row in pvals.index:
    for col in pvals:
        values.append(pvals.at[row, col])  # value of the cell at (row, col)
        ind.append("%sX%s" % (row, col))   # label such as "row1Xcol2"
newpvals = pd.Series(values, ind)
gives:
>>> newpvals
row1Xcol1 1.00
row1Xcol2 0.20
row1Xcol3 0.70
row2Xcol1 0.20
row2Xcol2 1.00
row2Xcol3 0.01
row3Xcol1 0.70
row3Xcol2 0.01
row3Xcol3 1.00
dtype: float64
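The explicit loop can also be written without Python-level iteration over the cells. This is a sketch of a NumPy-based alternative, not the answerer's code: to_numpy().ravel() flattens the frame row by row (C order), which is the same order the nested loop visits the cells, so the labels built from index and columns line up with the values.

```python
import pandas as pd

pvals = pd.DataFrame({'col1': [1, .2, .7],
                      'col2': [.2, 1, .01],
                      'col3': [.7, .01, 1]},
                     index=['row1', 'row2', 'row3'])

# Row-major flattening: row1's cells, then row2's, then row3's
values = pvals.to_numpy().ravel()
labels = [f"{row}X{col}" for row in pvals.index for col in pvals.columns]
newpvals = pd.Series(values, index=labels)
print(newpvals)
```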
Edit: I misread the question, so I changed the result into a Series.