Site icon Console Flare Blog

Value_counts and Groupby in Pandas Explained in Easy Steps

value_counts and groupby in Pandas explained with simple data analysis examples

Value_counts and Groupby in Pandas Explained in Easy Steps

Analysts use value_counts and groupby in Pandas to explore a dataset and summarize information fast. This tutorial explains value_counts and groupby in Pandas with simple examples that beginners understand. When you learn value_counts and groupby in Pandas, you get better at summarizing data quickly. Most data analysis tasks depend on value_counts and groupby in Pandas because they help you convert raw data into insights.

value_counts() and groupby() in pandas | How to find meaningful insights | #pandastutorial

Download Sample Dataset: bmw.csv

1. Load the Dataset

You start by importing Pandas and loading the BMW CSV file. The read_csv() function reads the file and creates a DataFrame. Calling head() shows the first five rows, which lets you confirm whether the data loaded correctly.

Python:
import pandas as pd 
df = pd.read_csv('bmw.csv') 
df.head()

Output:

model year price transmission mileage fuelType tax mpg engineSize
2 Series 2018 16250 Manual 10401 Petrol 145 52.3 1.5
5 Series 2014 11200 Automatic 67068 Diesel 125 57.6 2.0
6 Series 2018 27000 Automatic 14827 Petrol 145 42.8 2.0
5 Series 2016 16000 Automatic 62794 Diesel 160 51.4 3.0
1 Series 2017 12750 Automatic 26676 Diesel 145 72.4 1.5

2. Understanding value_counts()

The value_counts() function counts how many times each unique value appears in a column. It sorts the results in descending order, returns a Series, and uses the unique values as the index. This makes it useful when you want to understand category frequency.

However, many beginners skip value_counts even though it saves time.

2.1 Count Transmission Types

Python:
df['transmission'].value_counts()
When this runs, Pandas checks the transmission column, groups identical values together, counts their occurrences, and returns the counts.


Output:

transmission count
Semi-Auto 4666
Automatic 3588
Manual 2527

This shows that semi-automatic transmissions appear the most in the dataset.

In addition, both functions work well when you analyse multiple columns.

2.2 Get Proportions Using normalize=True

Python:
df['transmission'].value_counts(normalize=True)

The normalize=True argument tells Pandas to divide each count by the total number of rows. Instead of raw counts, you receive proportions.

Output

transmission proportion
Semi-Auto 0.432798
Automatic 0.332808
Manual 0.234394

This means about 43 percent of all BMW cars in the dataset are Semi-Auto.

2.3 Convert to Percentage

Python:
df['transmission'].value_counts(normalize=True) * 100

Multiplying by 100 changes the proportions into readable percentages.

Output

transmission percent
Semi-Auto 43.279844
Automatic 33.280772
Manual 23.439384

2.4 Disable Sorting Using sort=False

Python:
df['transmission'].value_counts(sort=False)

The sort=False option tells Pandas to return counts in the order values appear in the column instead of sorting them.

Output

transmission count
Manual 2527
Automatic 3588
Semi-Auto 4666

2.5 Sort by Index

Python:
df['transmission'].value_counts().sort_index()

Here, Pandas counts the values and then sorts them alphabetically by their index. That helps when you want an organized output rather than descending order.

Output

transmission count
Automatic 3588
Manual 2527
Semi-Auto 4666

2.6 Reset the Index

Python:
df['transmission'].value_counts().reset_index()

The reset_index() function converts the Series into a DataFrame. The old index becomes a normal column, which makes the output easier to join with other tables or export.

Output

index transmission
Semi-Auto 4666
Automatic 3588
Manual 2527

3. Count Multiple Columns Using value_counts()

You can count combinations by passing multiple columns. Pandas treats each unique pair as a separate category and counts how often it appears.

Python:
df[['year','transmission']].value_counts(normalize=True)

Pandas groups by both year and transmission together, then counts how many rows fall into each pair.

Output (truncated)

year transmission count
2019 Semi-Auto 2005
2019 Automatic 997
2016 Automatic 736
2017 Semi-Auto 672

This reveals that 2019 had the highest number of semi-automatic cars.

3.1 Get Proportion for Multiple Columns

Python:
df[['year','transmission']].value_counts(normalize=True)

Pandas now divides each pair count by the total number of rows and returns proportions instead of raw frequencies.

Output:

year transmission proportion
2019 Semi-Auto 0.185975
2019 Automatic 0.092478
2016 Automatic 0.068268

3.2 Percentage Format

Python:
df[['year','transmission']].value_counts(normalize=True) * 100

This returns percentage values, making it easier to read and compare the categories.

Output:

year transmission percent
2019 Semi-Auto 18.59
2019 Automatic 9.24

3.3 Sort Index on Both Columns

Python:
mdf = df[['year','transmission']].value_counts(normalize=True) * 100 
mdf.sort_index()

sort_index() sorts based on the tuple index consisting of year and transmission. This gives neatly organized output ordered by both columns.

Output:

year transmission percent
1996 Automatic 0.009276
1997 Manual 0.009276
2020 Semi-Auto 5.509693

3.4 Reset Index and Sort

Python:
mdf = mdf.reset_index() 
mdf.sort_values(['year','transmission'], ascending=[True, False])

After resetting the index, the columns become normal DataFrame columns. Sorting them gives better control over the display order, such as sorting by year ascending and transmission descending.

Output:

year transmission percent
1996 Automatic 0.009276
1997 Manual 0.009276
2020 Automatic 0.704944

4. Understanding groupby()

The groupby() function groups rows based on a column and then performs calculations on each group. You can compute mean, median, sum, max, min, count, and more. Pandas separates the data into groups internally and then applies your chosen operation to each group independently.

4.1 Average Mileage per Transmission

Python:
mdf = df.groupby('transmission').agg(avg_mileage=('mileage','mean')) 
mdf

All rows are grouped by transmission type in this code, which then chooses the mileage column within each group, computes the mean, and outputs the results in a new DataFrame.

Output:

transmission avg_mileage
Automatic 31830.67
Manual 34720.48
Semi-Auto 15631.34

4.2 Reset Index

Python:
mdf.reset_index()

Resetting the index converts the group labels back into normal DataFrame columns.

4.3 Sort by Mileage

Python:
mdf.sort_values('avg_mileage', ascending=False)

Sorting the result arranges the transmission types based on average mileage. This makes comparisons easier.

Manual cars show the highest average mileage among the three.

Output:

transmission avg_mileage
Manual 34720.48
Automatic 31830.67
Semi-Auto 15631.34
Insight:
Manual cars have the highest mileage.

5. Groupby With Multiple Aggregations

A dictionary-style structure within agg() allows you to compute multiple statistics simultaneously. Pandas adds the outcomes to new output columns after applying each operation to the designated column.

For example, groupby helps you split data before applying calculations.

Python:
mdf = df.groupby('transmission').agg( 
      avg_mileage=('mileage','mean'),
      median_mileage=('mileage','median'),
      car_count=('model','count') 
)

Here, Pandas groups by transmission type and calculates three different values. It takes the mean and median of mileage and counts the number of models in each group.

Output:

transmission avg_mileage median_mileage car_count
Manual 34720.48 30000 2527
Automatic 31830.67 26193.5 3588
Semi-Auto 15631.34 9036.5 4666

6. Price and Engine Size Analysis

You can combine many aggregations across multiple columns. Pandas handles each operation one by one inside each transmission group.

Python:
mdf = df.groupby('transmission').agg( 
    model_count=('model','count'), 
    total_price=('price','sum'), 
    avg_price=('price','mean'), 
    median_price=('price','median'), 
    max_price=('price','max'), 
    min_price=('price','min'), 
    avg_engine_size=('engineSize','mean') 
)

Pandas divides the data by transmission first. The total number of cars, the average price, the median price, the highest and lowest prices, the sum of prices, and the average engine size for each transmission group are then determined.

Output:

transmission model_count total_price avg_price max_price min_price avg_engine_size
Automatic 3588 80439458 22419.02 99950 1200 2.25
Manual 2527 36988839 14637.45 46000 1445 1.83
Semi-Auto 4666 127660584 27359.74 123456 5299 2.27

7. Best Fuel Type for Mileage

You can group by fuel type to examine mileage performance across different fuel categories.

Python:
mdf = df.groupby('fuelType').agg( 
    avg_mileage=('mileage','mean'), 
    median_mileage=('mileage','median')
 ).sort_values('avg_mileage', ascending=False)

Fuel types are grouped by Pandas, which then calculates mileage statistics and sorts the results so that the fuel type with the best mileage shows up first.

Output:

fuelType avg_mileage median_mileage
Other 44760.63 44351.5
Diesel 29852.80 24600
Hybrid 23400.53 16900
Electric 19925.33 20321
Petrol 16524.09 9097

8. Multi-Column Groupby

You can group by more than one column at the same time. Pandas creates a separate group for every transmission and fuel type combination.

Python:
mdf = df.groupby(['transmission','fuelType']).agg( 
    avg_price=('price','mean')
)

Pandas reads each unique pair of transmission and fuelType, groups rows accordingly, calculates the average price inside each group, and returns the results in a multi-index DataFrame.

Output:

transmission fuelType avg_price
Automatic Diesel 21216.70
Automatic Petrol 25364.46
Manual Diesel 13722.74
Semi-Auto Petrol 28209.39
Semi-Auto Hybrid 30115.37

9. Multi Aggregation on Price and Mileage

Here, you calculate the average price, total price, and average mileage for each transmission and fuel type combination.

As a result, your summaries become cleaner and easier to compare.

Python:
mdf = df.groupby(['transmission','fuelType']).agg( 
    avg_price=('price','mean'), 
    total_price=('price','sum'), 
    avg_mileage=('mileage','mean'), 
)

Pandas handles each calculation separately but keeps the same group structure. The result gives deep insights into how transmission and fuel type together affect price and mileage.

Finally, you understand how to combine value_counts and groupby for deeper analysis.

Output:

transmission fuelType avg_price total_price avg_mileage
Automatic Diesel 21216.70 52808380 36286.72
Semi-Auto Petrol 28209.39 46601921 11222.51

Final Thought

You looked at how value_counts() facilitates working with multiple columns, formatting results, calculating proportions, and counting values. In addition to applying computations like mean, median, sum, count, and more, you learned how to group rows using the groupby() function. You also saw how multi-aggregation and multi-column grouping produce effective summaries. These features provide you with powerful tools for efficiently and rapidly analyzing complicated datasets.

Internal link:

Aggregate Functions in Pandas: Beginner’s Guide with Examples

Filtering in Pandas: Learn loc, iloc, isin(), and between()

Load Data in Pandas – A Complete Beginner’s Guide to Data Import

Pandas official documentation:

https://pandas.pydata.org/docs/ag

Conclusion:

Data science is transforming finance, from predictive analytics to personalized services. It’s not just a technology upgrade – it’s changing how the industry operates. Those who leverage data science will stay ahead in this data-driven era.

Want to know what else can be done by Data Science?

If you wish to learn more about data science or want to advance your career in the data science field, feel free to join our free workshop on Master’s in Data Science with Power BI, where you will get to know how exactly the data science field works and why companies are ready to pay handsome salaries in this field.

In this workshop, you will get to know each tool and technology from scratch, which will make you skillfully eligible for any data science profile.

To join this workshop, register yourself on ConsoleFlare, and we will call you back.

Thinking, Why Console Flare?

Recently, ConsoleFlare has been recognized as one of the Top 10 Most Promising Data Science Training Institutes of 2023.

Console Flare offers the opportunity to learn Data Science in Hindi, just like how you speak daily.

Console Flare believes in the idea of “What to learn and what not to learn,” and this can be seen in their curriculum structure. They have designed their program based on what you need to learn for data science and nothing else.

Want more reasons?

Register yourself on ConsoleFlare, and we will call you back.
Log in or sign up to view
See posts, photos, and more on Facebook.

value_counts and groupby in Pandas

Console Flare

Exit mobile version