python - Calculate STD manually using Groupby Pandas DataFrame
I was trying to write a solution that calculates the mean and STD in a different, manual way.
I created the DataFrame as follows:

    import pandas as pd

    a = ["Apple", "Banana", "Cherry", "Apple"]
    b = [3, 4, 7, 3]
    c = [5, 4, 1, 4]
    d = [7, 8, 3, 7]

    df = pd.DataFrame(index=range(4), columns=list("ABCD"))
    df["A"] = a
    df["B"] = b
    df["C"] = c
    df["D"] = d
Then I made a list of the values in A without duplicates. After that I went through each group of items and calculated the mean and STD for it.
    import numpy as np

    l = list(set(df.A))                  # unique values of column A
    df.groupby('A', as_index=False)      # note: the result is not used
    listMean = [0] * len(df.C)
    listSTD = [0] * len(df.C)

    # mean of C for each value of A, written back to every matching row
    for x in l:
        s = np.mean(df[df['A'] == x].C.values)
        z = [index for index, item in enumerate(df['A'].values) if x == item]
        for i in z:
            listMean[i] = s

    # STD of C for each value of A, written back to every matching row
    for x in l:
        s = np.std(df[df['A'] == x].C.values)
        z = [index for index, item in enumerate(df['A'].values) if x == item]
        for i in z:
            listSTD[i] = s

    df['C'] = listMean
    df['E'] = listSTD
    print(df)
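As a side note, the two loops above could probably be condensed with groupby(...).transform, which broadcasts one value per group back onto every row of that group. The following is only a sketch under assumptions: the column names C_mean and C_std are made up for illustration, and np.std is applied to the raw values so that NumPy's default (ddof=0) is used, as in the loop version.

```python
import numpy as np
import pandas as pd

# Same data as in the question (only the columns needed here).
df = pd.DataFrame({
    "A": ["Apple", "Banana", "Cherry", "Apple"],
    "C": [5, 4, 1, 4],
})

grouped = df.groupby("A")["C"]

# Hypothetical column names, chosen for this sketch only.
df["C_mean"] = grouped.transform("mean")
# np.std on the underlying array keeps NumPy's default divisor N (ddof=0).
df["C_std"] = grouped.transform(lambda s: np.std(s.values))

print(df)
```

Writing to new columns instead of overwriting C keeps the original values available for later comparisons.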
I also used describe() to calculate the mean and STD, grouping by "A":

    print(df.groupby('A').describe())
and tested the suggested solution:

    result = df.groupby(['A'], as_index=False).agg({'C': ['mean', 'std'], 'B': 'first', 'D': 'first'})
I have noticed that when I calculate the STD ("E") myself, I get different results. I'm just curious, what did I miss?
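There are two kinds of standard deviation (SD): the population SD and the sample SD.
The population SD is used when the values represent the entire universe of values you are studying. It divides the sum of squared deviations from the mean by N.
The sample SD is used when the values are only a sample from that universe. It divides by N - 1 instead.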
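Written out, the two definitions look roughly as follows. The symbols are chosen here for illustration: x_1, ..., x_N are the values, \bar{x} is their mean, \sigma is the population SD and s is the sample SD.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```

The only difference between the two is the divisor, N versus N - 1.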
np.std calculates the population SD by default, while pandas' Series.std calculates the sample SD by default.

    In [42]: np.std([4,5])
    Out[42]: 0.5

    In [43]: np.std([4,5], ddof=0)
    Out[43]: 0.5

    In [44]: np.std([4,5], ddof=1)
    Out[44]: 0.70710678118654757

    In [45]: x = pd.Series([4,5])

    In [46]: x.std()
    Out[46]: 0.70710678118654757

    In [47]: x.std(ddof=0)
    Out[47]: 0.5
ddof stands for "delta degrees of freedom", and controls the number subtracted from N in the divisor of the SD formula.
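To make the role of ddof concrete, here is a minimal sketch that computes the SD by hand. The helper manual_std is hypothetical (it is not part of NumPy or pandas) and simply divides the sum of squared deviations by N - ddof before taking the square root.

```python
import math

import numpy as np


def manual_std(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = math.fsum(values) / n
    squared_deviations = math.fsum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [4, 5]
population_sd = manual_std(data, ddof=0)  # divides by N
sample_sd = manual_std(data, ddof=1)      # divides by N - 1

# Compare against NumPy with the same ddof values.
print(population_sd, np.std(data))
print(sample_sd, np.std(data, ddof=1))
```

With ddof=0 the helper should agree with np.std's default, and with ddof=1 it should agree with the default of pandas' Series.std.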
In some references, what I called the population SD is named the "uncorrected sample standard deviation", and the sample SD is named the "corrected sample standard deviation".
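Applied to the question, this suggests the mismatch comes from the defaults: the manual loop uses np.std (ddof=0), while groupby's std uses ddof=1. A sketch, assuming the same A and C columns as in the question, that passes ddof explicitly so both approaches agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["Apple", "Banana", "Cherry", "Apple"],
    "C": [5, 4, 1, 4],
})

grouped = df.groupby("A")["C"]
sample_sd = grouped.std()            # pandas default, ddof=1
population_sd = grouped.std(ddof=0)  # same divisor as np.std's default

# The manual calculation from the question, for one group.
manual_apple = np.std(df.loc[df["A"] == "Apple", "C"].values)

print(sample_sd)
print(population_sd)
print(manual_apple)
```

Note that groups with a single row (Banana and Cherry here) give NaN for the sample SD, because the divisor N - 1 is zero, while their population SD is 0.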