To calculate summary statistics for the columns of a Pandas DataFrame, you can use the describe() method. By default it summarizes the numeric columns, reporting the count, mean, standard deviation, minimum, maximum, and quartile values for each one.
Here's an example of how to use the describe() method:
import pandas as pd

# create a sample DataFrame
data = {'column1': [1, 2, 3, 4, 5],
        'column2': [10, 20, 30, 40, 50],
        'column3': [100, 200, 300, 400, 500]}
df = pd.DataFrame(data)

# calculate summary statistics
stats = df.describe()
print(stats)
Output:
        column1    column2     column3
count  5.000000   5.000000    5.000000
mean   3.000000  30.000000  300.000000
std    1.581139  15.811388  158.113883
min    1.000000  10.000000  100.000000
25%    2.000000  20.000000  200.000000
50%    3.000000  30.000000  300.000000
75%    4.000000  40.000000  400.000000
max    5.000000  50.000000  500.000000
As you can see, the describe() method returns a DataFrame with one column per original column and one row per statistic. The count row shows the number of non-missing values in each column, the mean row shows the average value, the std row shows the sample standard deviation, the min and max rows show the minimum and maximum values, and the 25%, 50%, and 75% rows show the quartiles (the 50% row is the median).
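Because the result is an ordinary DataFrame, individual statistics can be looked up by row and column label. The sketch below rebuilds the same sample data, reads two values out of the summary, checks that the std and 25% rows agree with the standalone std() and quantile() methods, and requests different percentiles. The variable names (mean_col2, std_matches, custom, and so on) are illustrative only.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": [10, 20, 30, 40, 50],
    "column3": [100, 200, 300, 400, 500],
})

stats = df.describe()

# Rows are statistics and columns are the original column names,
# so a single value is looked up as stats.loc[row_label, column_label].
mean_col2 = stats.loc["mean", "column2"]
median_col3 = stats.loc["50%", "column3"]
print(mean_col2, median_col3)

# Each row should agree with the corresponding standalone method.
# df.std() computes the sample standard deviation (ddof=1) by default.
std_matches = (stats.loc["std"] - df.std()).abs().max() < 1e-9
q1_matches = (stats.loc["25%"] - df.quantile(0.25)).abs().max() < 1e-9
print(std_matches, q1_matches)

# The percentiles parameter requests percentiles other than the default quartiles.
custom = df.describe(percentiles=[0.1, 0.9])
print(custom)
```

Comparing within a small tolerance rather than with == avoids false mismatches caused by floating-point rounding.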