Jak używać grupy Pandy przez zastosowanie() bez dodawania dodatkowego indeksu

Bardzo często chcę utworzyć nową ramkę DataFrame, łącząc wiele kolumn zgrupowanej DataFrame. Apply() funkcja pozwala mi zrobić, ale wymaga, aby utworzyć niepotrzebne index:Jak używać grupy Pandy przez zastosowanie() bez dodawania dodatkowego indeksu

In [359]: df = pandas.DataFrame({'x': 3 * ['a'] + 2 * ['b'], 'y': np.random.normal(size=5), 'z': np.random.normal(size=5)}) 

In [360]: df 
Out[360]: 
    x   y   z 
0 a 0.201980 -0.470388 
1 a 0.190846 -2.089032 
2 a -1.131010 0.227859 
3 b -0.263865 -1.906575 
4 b -1.335956 -0.722087 

In [361]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
/home/emarkley/work/src/partner_analysis2/main.py in <module>() 
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in apply(self, func, *args, **kwargs) 
    267   applied : type depending on grouped object and function 
    268   """ 
--> 269   return self._python_apply_general(func, *args, **kwargs) 
    270 
    271  def aggregate(self, func, *args, **kwargs): 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/groupby.py in _python_apply_general(self, func, *args, **kwargs) 
    417    group_axes = _get_axes(group) 
    418 
--> 419    res = func(group, *args, **kwargs) 
    420 
    421    if not _is_indexed_like(res, group_axes): 

/home/emarkley/work/src/partner_analysis2/main.py in <lambda>(x) 
----> 1 df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()})) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy) 
    371    mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy) 
    372   elif isinstance(data, dict): 
--> 373    mgr = self._init_dict(data, index, columns, dtype=dtype) 
    374   elif isinstance(data, ma.MaskedArray): 
    375    mask = ma.getmaskarray(data) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype) 
    454   # figure out the index, if necessary 
    455   if index is None: 
--> 456    index = extract_index(data) 
    457   else: 
    458    index = _ensure_index(index) 

/usr/local/lib/python3.2/site-packages/pandas-0.8.2.dev-py3.2-linux-x86_64.egg/pandas/core/frame.py in extract_index(data) 
    4719 
    4720   if not indexes and not raw_lengths: 
-> 4721    raise ValueError('If use all scalar values, must pass index') 
    4722 
    4723   if have_series or have_dicts: 

ValueError: If use all scalar values, must pass index 

In [362]: df.groupby('x').apply(lambda x: pandas.DataFrame({'r': (x.y + x.z).sum()/x.z.sum(), 's': (x.y + x.z ** 2).sum()/x.z.sum()}, index=[0])) 
Out[362]: 
      r   s 
x      
a 0 1.316605 -1.672293 
b 0 1.608606 -0.972593

Czy istnieje jakiś sposób, aby używać apply() lub jakaś inna funkcja, aby uzyskać takie same wyniki bez dodatkowego indeksu zer?

Źródło

2012-09-13 user2303

Ty wytwarzania łączną wartość R i S na grupę, więc należy używać Series tutaj:

In [26]: df.groupby('x').apply(lambda x: 
      Series({'r': (x.y + x.z).sum()/x.z.sum(), 
        's': (x.y + x.z ** 2).sum()/x.z.sum()})) 
Out[26]: 
      r   s 
x      
a -0.338590 -0.916635 
b 66.655533 102.566146

Źródło

2012-09-13 17:41:05

Perfect! Nie wiem, jak to przegapiłem ... – user2303

Dziękuję za to. Nie wiedziałem też, że powinienem zwrócić serię, a także próbowałem zwrócić DataFrame. Te informacje (które powinny zwrócić serię) powinny znajdować się w oficjalnej dokumentacji. – JustAC0der

Jak używać grupy Pandy przez zastosowanie() bez dodawania dodatkowego indeksu

Odpowiedz

Powiązane problemy