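Question: Pandas update multiple columns at once

I'm trying to update several fields at once. I have two data sources and I'm trying to reconcile them. I know I could do some ugly merging and then drop columns, but I expected the code below to work: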

import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f

#update 
df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']] = df[['col1_v2','col2_v2','col3_v2']]

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2  NaN  NaN  NaN       a       b       d
3  NaN  NaN  NaN       d       e       f

The output I want is:

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

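I think it has something to do with updating/setting on a slice, but I always use .loc to update values, just never more than one column at a time.

I feel like there's an easy way to do this that I'm just missing. Any thoughts or suggestions would be welcome!

Edit to reflect the solution below: thanks for the comment about indexes. However, I have a question about how this relates to Series. If I want to update a single Series in a similar way, I can do this: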

df.loc[df['Col1'].isnull(),['Col1']] = df['col1_v2']

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a  NaN  NaN       a       b       d
3    d  NaN  NaN       d       e       f

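Note that I didn't account for the index here: I filtered down to a 2x1 Series and set it equal to a 4x1 Series, yet it was handled correctly. Thoughts? I'm trying to better understand functionality I've been using for a while, but I guess I don't fully grasp the underlying mechanics/rules.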

2018-05-23 20:47


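The right-hand side of that assignment is a pd.Series, which has no column information. That column information has gone into the name attribute of the pd.Series object. When doing the assignment, it ignores column alignment and just puts the specified Series into the specified columns. Try df.loc[df['Col1'].isnull(),['Col1', 'Col2']] = df['col1_v2'] and see that it just puts that Series into both of the columns now specified. To assign to the right columns from the right columns, you need to reference the columns properly. Otherwise, loop. - piRSquared

I'd also add that if you instead did df.loc[df['Col1'].isnull(),['Col1']] = df[['col1_v2']], with double brackets around 'col1_v2', this tries to push a DataFrame into that column, which puts you in the same situation as before. This further illustrates the difference between assigning with a Series and assigning with a DataFrame. - piRSquared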
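
To illustrate the point about Series assignment, here is a minimal sketch, assuming a recent pandas version. The two-column frame and the names mask and df2 are made up for the illustration. It shows that a Series on the right-hand side is matched to the selected rows by index label, which is why a 4x1 Series can be assigned into a 2x1 selection.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'D', np.nan, np.nan],
                   'col1_v2': [np.nan, np.nan, 'a', 'd']})
mask = df['Col1'].isna()

# The right-hand side has all four rows; only the rows selected by the mask
# (labels 2 and 3) are used, matched by index label rather than position.
df.loc[mask, 'Col1'] = df['col1_v2']
print(df)

# A Series whose labels are listed in a different order still lands on the
# matching labels: 'y' goes to row 2 and 'x' goes to row 3.
df2 = pd.DataFrame({'Col1': ['A', 'D', np.nan, np.nan]})
df2.loc[df2['Col1'].isna(), 'Col1'] = pd.Series(['x', 'y'], index=[3, 2])
print(df2)
```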

Answers:

You want to replace

print(df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']])

  Col1 Col2 Col3
2  NaN  NaN  NaN
3  NaN  NaN  NaN

with:

replace_with_this = df.loc[df['Col1'].isnull(),['col1_v2','col2_v2', 'col3_v2']]
print(replace_with_this)

  col1_v2 col2_v2 col3_v2
2       a       b       d
3       d       e       f

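Seems reasonable. However, when performing the assignment, you need to account for index alignment, and that includes the columns.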
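
As a rough sketch of what that alignment means, the snippet below approximates it with reindex. This is an illustration under that assumption, not a trace of pandas internals, and the names targets, sources, rhs and aligned are introduced only for this example. Looking up the target column labels in the replacement frame finds nothing, so every cell is NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

targets = ['Col1', 'Col2', 'Col3']
sources = ['col1_v2', 'col2_v2', 'col3_v2']
mask = df['Col1'].isna()

rhs = df.loc[mask, sources]

# Approximate label alignment: look up the target labels in rhs.
# None of 'Col1', 'Col2', 'Col3' exist there, so every cell comes back NaN.
aligned = rhs.reindex(columns=targets)
print(aligned)

# Stripping the labels leaves a plain 2x3 array that can be placed by position.
print(rhs.to_numpy())
```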

So, this should work:

df.loc[df['Col1'].isnull(),['Col1','Col2', 'Col3']] = replace_with_this.values

print(df)

  Col1 Col2 Col3 col1_v2 col2_v2 col3_v2
0    A    B    C     NaN     NaN     NaN
1    D    E    F     NaN     NaN     NaN
2    a    b    d       a       b       d
3    d    e    f       d       e       f

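I accounted for the columns by using .values at the end. This strips the column information from the replace_with_this DataFrame and just uses the values in the appropriate positions.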
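
An alternative sketch, assuming a recent pandas version, keeps the labels and makes them match instead of stripping them. The names targets, sources, mask and renamed are illustrative and not from the answer. After the rename, the right-hand side has the same column labels as the selection, so the assignment can align on both rows and columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

targets = ['Col1', 'Col2', 'Col3']
sources = ['col1_v2', 'col2_v2', 'col3_v2']
mask = df['Col1'].isna()

# Relabel the source columns with the target names so the labels line up.
renamed = df[sources].rename(columns=dict(zip(sources, targets)))

# Rows are matched by index label and columns by the new names.
df.loc[mask, targets] = renamed
print(df)
```

Compared with .values, this relies on the labels rather than on the column order of the two lists.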

2018-05-23 21:26

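Thanks - I added an edit above based on your solution; any thoughts on it? - flyingmeatball

In the spirit of "take the hill", I offer the following solution, which produces the requested result.

I realize this isn't exactly what you're after, since I'm not slicing the df (in the reasonable, but non-functional, way you proposed).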

# Does not work when indexing on np.nan, so fill with an arbitrary placeholder value.
df = df.fillna('AAA')

# Mask to determine which rows to update
mask = df['Col1'] == 'AAA'

# Dict mapping each column to update to the column that supplies its values
mp = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}

# Update one column at a time
for k in mp:
    df.loc[mask, k] = df[mp[k]]

# Swap the placeholder values back to np.nan
df = df.replace('AAA', np.nan)

Output:

Col1    Col2    Col3    col1_v2     col2_v2     col3_v2
A       B       C       NaN         NaN         NaN
D       E       F       NaN         NaN         NaN
a       b       d       a           b           d
d       e       f       d           e           f

If I don't replace the NaNs, I get the error below. I'll look into exactly where that error comes from.

ValueError: array is not broadcastable to correct shape
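
For comparison, here is a sketch of the same per-column loop without the placeholder fill, assuming a recent pandas version where a boolean mask from isna() can be used directly in .loc. The name mapping is illustrative. This has not been checked against the pandas version that raised the ValueError above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C', np.nan, np.nan, np.nan],
                   ['D', 'E', 'F', np.nan, np.nan, np.nan],
                   [np.nan, np.nan, np.nan, 'a', 'b', 'd'],
                   [np.nan, np.nan, np.nan, 'd', 'e', 'f']],
                  columns=['Col1', 'Col2', 'Col3', 'col1_v2', 'col2_v2', 'col3_v2'])

# Target column -> column that supplies the replacement values
mapping = {'Col1': 'col1_v2', 'Col2': 'col2_v2', 'Col3': 'col3_v2'}

# Rows to update: those where Col1 is missing
mask = df['Col1'].isna()

# Each assignment has a Series on the right-hand side, so only the row
# index is aligned and no column labels get in the way.
for target, source in mapping.items():
    df.loc[mask, target] = df.loc[mask, source]

print(df)
```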

2018-05-23 21:53
