Create a pivot table that contains a list of values

Which aggfunc should I use to produce a list with a pivot table? I tried using str, which doesn't work.

Inputs

import pandas as pd

data = {
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5]
}
df = pd.DataFrame(data)
print(df)

# Count the experiments per test point.
pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=len)
print(pivot)

# Attempt to collect the values per test point with str.
pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=str)
print(pivot)

Outputs

   Experiment  Test point
0           1           0
1           2           1
2           3           2
3           4           0
4           5           1

            Experiment
Test point
0                    2
1                    2
2                    1

                                                  Experiment
Test point
0           0    1\n3    4\nName: Experiment, dtype: int64
1           1    2\n4    5\nName: Experiment, dtype: int64
2                   2    3\nName: Experiment, dtype: int64

Desired output

  Experiment 
Test point             
0   1, 4 
1   2, 5 
2   3

Source

2017-10-14 bluprince13

You can use list itself as the function:

>>> pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=lambda x:list(x)) 
      Experiment 
Test point   
0    [1, 4] 
1    [2, 5] 
2     [3]
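
As a rough sketch of why list works where str does not: the aggregation function appears to be called with each group's values as a pandas Series, so str() returns the Series' printed representation while list() collects its elements. The frame below only rebuilds the question's data, and `one_group` is a name introduced here for illustration.

```python
import pandas as pd

# Rebuild the data from the question.
df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# One group's values as a Series (illustrative name, not from the original post).
one_group = df.loc[df['Test point'] == 0, 'Experiment']

as_text = str(one_group)    # the printed repr, including the dtype footer
as_list = list(one_group)   # just the values

pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                       aggfunc=lambda x: list(x))

# Each cell is an ordinary Python list that can be indexed or iterated.
first_cell = pivot.loc[0, 'Experiment']
print(first_cell)
```

Because the cells hold real lists, later code can loop over them directly instead of parsing a string.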

Source

2017-10-25 11:01:37

Use

In [1830]: pd.pivot_table(df, index=['Test point'], values=['Experiment'], 
          aggfunc=lambda x: ', '.join(x.astype(str))) 
Out[1830]: 
      Experiment 
Test point 
0    1, 4 
1    2, 5 
2     3

Or, groupby will do.

In [1831]: df.groupby('Test point').agg({ 
       'Experiment': lambda x: x.astype(str).str.cat(sep=', ')}) 
Out[1831]: 
      Experiment 
Test point 
0    1, 4 
1    2, 5 
2     3

But if you want the values as lists:

In [1861]: df.groupby('Test point').agg({'Experiment': lambda x: x.tolist()}) 
Out[1861]: 
      Experiment 
Test point 
0    [1, 4] 
1    [2, 5] 
2     [3]

x.astype(str).str.cat(sep=', ') is similar to ', '.join(x.astype(str))
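
A quick check of that similarity on a single group's values, as a minimal sketch; the Series `values` is made up to match test point 0 from the question.

```python
import pandas as pd

# Stand-in for the values of one group (test point 0 in the question).
values = pd.Series([1, 4])

# pandas string accessor: concatenate all elements of the Series with a separator.
joined_with_cat = values.astype(str).str.cat(sep=', ')

# Plain Python: str.join over the elements of the Series.
joined_with_join = ', '.join(values.astype(str))

print(joined_with_cat, '|', joined_with_join)
```

Both expressions need the cast to str first, since str.join refuses integers and the .str accessor is only available on string data.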

Source

2017-10-14 10:56:35 Zero

Option 1
str pre-conversion + groupby + apply.

You can convert the column to string beforehand to simplify the groupby call.

(df.assign(Experiment=df.Experiment.astype(str))
   .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment'))

      Experiment 
Test point   
0    1, 4 
1    2, 5 
2     3

Modifying the column in place requires an assignment but is noticeably faster (assign returns a copy and is slower):

df.Experiment = df.Experiment.astype(str) 
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment') 

      Experiment 
Test point   
0    1, 4 
1    2, 5 
2     3

with the downside that the original dataframe is modified as well.
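
The trade-off can be seen in a small sketch. It assumes the question's data; the names `converted` and `work` are introduced here for illustration. assign leaves the caller's column untouched, and an explicit copy is one way to keep the in-place style without altering the original frame.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# assign() returns a new frame; the original column keeps its integer dtype.
converted = df.assign(Experiment=df['Experiment'].astype(str))
original_still_int = is_integer_dtype(df['Experiment'])

# In-place style applied to an explicit copy, so df itself is not modified.
work = df.copy()
work['Experiment'] = work['Experiment'].astype(str)
result = (work.groupby('Test point')['Experiment']
              .apply(', '.join)
              .to_frame('Experiment'))

print(original_still_int)
print(result)
```

Note that the copy has a cost of its own, so this is mainly useful when the original integer column is still needed afterwards.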

Performance

# Zero's groupby solution
%%timeit 
df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')}) 

100 loops, best of 3: 3.72 ms per loop

# Zero's pivot_table solution
%%timeit 
pd.pivot_table(df, index=['Test point'], values=['Experiment'], 
       aggfunc=lambda x: ', '.join(x.astype(str))) 

100 loops, best of 3: 5.17 ms per loop

# proposed in this post 
%%timeit -n 1 
df.Experiment = df.Experiment.astype(str) 
df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment') 

1 loop, best of 3: 2.02 ms per loop

Note that the .assign method is only a few ms slower than this. Larger performance gains should be visible for larger dataframes.
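
To check the claim about larger frames outside IPython, one possible sketch uses the standard timeit module on synthetic data. The row and group counts and the function names are assumptions chosen for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the sizes are arbitrary assumptions, not from the original post.
rng = np.random.default_rng(0)
n_rows, n_groups = 20_000, 100
big = pd.DataFrame({
    'Test point': rng.integers(0, n_groups, size=n_rows),
    'Experiment': np.arange(n_rows),
})

def convert_inside_groups(frame):
    # Cast to str separately inside every group.
    return frame.groupby('Test point').agg(
        {'Experiment': lambda x: ', '.join(x.astype(str))})

def convert_up_front(frame):
    # Cast the whole column once, then join within each group.
    return (frame.assign(Experiment=frame['Experiment'].astype(str))
                 .groupby('Test point')['Experiment']
                 .apply(', '.join)
                 .to_frame('Experiment'))

for func in (convert_inside_groups, convert_up_front):
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f'{func.__name__}: {seconds * 1000:.2f} ms per call')
```

Both functions should produce the same joined strings, so any difference in the printed numbers comes from where the string conversion happens.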

Option 2
groupby + agg:

A similar operation can be done with agg:

(df.assign(Experiment=df.Experiment.astype(str))
   .groupby('Test point').agg({'Experiment': ', '.join}))

      Experiment 
Test point   
0    1, 4 
1    2, 5 
2     3

and the in-place version of this works the same way as above.

# proposed in this post 
%%timeit -n 1 
df.Experiment = df.Experiment.astype(str) 
df.groupby('Test point').agg({'Experiment' : ', '.join}) 

1 loop, best of 3: 2.21 ms per loop

agg should show a speed gain over apply for larger dataframes.
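
For reuse, the agg variant could be wrapped in a small helper. This is only a sketch: `collect_values` and its `sep` parameter are hypothetical names, not part of pandas or of the answers here.

```python
import pandas as pd

def collect_values(frame, key, column, sep=None):
    """Group `frame` by `key` and gather `column` per group (hypothetical helper).

    With sep=None each cell is a list; otherwise the values are
    converted to str once and joined with `sep`.
    """
    if sep is None:
        return frame.groupby(key).agg({column: lambda x: x.tolist()})
    # Convert on a copy (via assign) so the caller's frame is left unchanged.
    as_text = frame.assign(**{column: frame[column].astype(str)})
    return as_text.groupby(key).agg({column: sep.join})

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

lists = collect_values(df, 'Test point', 'Experiment')
joined = collect_values(df, 'Test point', 'Experiment', sep=', ')
print(lists)
print(joined)
```

The helper uses assign rather than in-place assignment, trading a little speed for leaving the input frame untouched.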

Source

2017-10-24 08:34:46
