# How do I create a sum row and sum column in pandas?

Posted on

Problem :

I’m going through the Khan Academy course on Statistics as a bit of a refresher from my college days, and as a way to get me up to speed on pandas & other scientific Python.

I’ve got a table that looks like this from Khan Academy:

``````             | Undergraduate | Graduate | Total
-------------+---------------+----------+------
Straight A's |           240 |       60 |   300
-------------+---------------+----------+------
Not          |         3,760 |      440 | 4,200
-------------+---------------+----------+------
Total        |         4,000 |      500 | 4,500
``````

I would like to recreate this table using pandas. Of course I could create a DataFrame using something like

``````"Graduate": {...},
"Total": {...},
``````

But that seems like a naive approach that would both fall over quickly and just not really be extensible.

I’ve got the non-totals part of the table like this:

``````df = pd.DataFrame(
{
"Undergraduate": {"Straight A's": 240, "Not": 3_760},
"Graduate": {"Straight A's": 60, "Not": 440},
}
)
df
``````

I’ve been looking and found a couple of promising things, like:

``````df['Total'] = df.sum(axis=1)
``````

But I didn’t find anything terribly elegant.

I did find the `crosstab` function that looks like it should do what I want, but it seems like in order to do that I’d have to create a dataframe consisting of 1/0 for all of these values, which seems silly because I’ve already got an aggregate.

I have found some approaches that seem to manually build a new totals row, but it seems like there should be a better way, something like:

``````totals(df, rows=True, columns=True)
``````

or something.

Does this exist in pandas, or do I have to just cobble together my own approach?

Solution :

Or in two steps, using the `.sum()` function as you suggested (which might be a bit more readable as well):

``````import pandas as pd

df = pd.DataFrame( {"Undergraduate": {"Straight A's": 240, "Not": 3_760},"Graduate": {"Straight A's": 60, "Not": 440},})

#Total sum per column:
df.loc['Total',:]= df.sum(axis=0)

#Total sum per row:
df.loc[:,'Total'] = df.sum(axis=1)
``````

Output:

``````              Graduate  Undergraduate  Total
Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
``````

## `append` and `assign`

The point of this answer is to provide an in line and not an in place solution.

### `append`

I use `append` to stack a `Series` or `DataFrame` vertically. It also creates a `copy` so that I can continue to chain.

### `assign`

I use `assign` to add a column. However, the `DataFrame` I’m working on is in the in between nether space. So I use a `lambda` in the `assign` argument which tells `Pandas` to apply it to the calling `DataFrame`.

``````df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
``````

## Fun alternative

Uses `drop` with `errors='ignore'` to get rid of potentially pre-existing `Total` rows and columns.

Also, still in line.

``````def tc(d):
return d.assign(Total=d.drop('Total', errors='ignore', axis=1).sum(1))

df.pipe(tc).T.pipe(tc).T

Not                440           3760   4200
Straight A's        60            240    300
Total              500           4000   4500
``````

From the original data using `crosstab`, if just base on your input, you just need `melt` before `crosstab`

``````s=df.reset_index().melt('index')
pd.crosstab(index=s['index'],columns=s.variable,values=s.value,aggfunc='sum',margins=True)
Out[33]:
index
Not                440           3760  4200
Straight A's        60            240   300
All                500           4000  4500
``````

Toy data

``````df=pd.DataFrame({'c1':[1,2,2,3,4],'c2':[2,2,3,3,3],'c3':[1,2,3,4,5]})
# before `agg`, I think your input is the result after `groupby`
df
Out[37]:
c1  c2  c3
0   1   2   1
1   2   2   2
2   2   3   3
3   3   3   4
4   4   3   5

pd.crosstab(df.c1,df.c2,df.c3,aggfunc='sum',margins
=True)
Out[38]:
c2     2     3  All
c1
1    1.0   NaN    1
2    2.0   3.0    5
3    NaN   4.0    4
4    NaN   5.0    5
All  3.0  12.0   15
``````

The original data is:

``````>>> df = pd.DataFrame(dict(Undergraduate=[240, 3760], Graduate=[60, 440]), index=["Straight A's", "Not"])
>>> df
Out:
Straight A's        60            240
Not                440           3760
``````

You can only use `df.T` to achieve recreating this table:

``````>>> df_new = df.T
>>> df_new
Out:
Straight A's   Not
After computing the `Total` by row and columns:
``````>>> df_new.loc['Total',:]= df_new.sum(axis=0)