Value_counts and Groupby in Pandas Explained in Easy Steps

Console Flare

2 months ago

value_counts and groupby in Pandas explained with simple data analysis examples

Value_counts and Groupby in Pandas Explained in Easy Steps

Analysts use value_counts and groupby in Pandas to explore a dataset and summarize information fast. This tutorial explains value_counts and groupby in Pandas with simple examples that beginners understand. When you learn value_counts and groupby in Pandas, you get better at summarizing data quickly. Most data analysis tasks depend on value_counts and groupby in Pandas because they help you convert raw data into insights.

The Dataset (bmw.csv)

Download Sample Dataset: bmw.csv

1. Load the Dataset

You start by importing Pandas and loading the BMW CSV file. The read_csv() function reads the file and creates a DataFrame. Calling head() shows the first five rows, which lets you confirm whether the data loaded correctly.

Output:

model	year	price	transmission	mileage	fuelType	tax	mpg	engineSize
2 Series	2018	16250	Manual	10401	Petrol	145	52.3	1.5
5 Series	2014	11200	Automatic	67068	Diesel	125	57.6	2.0
6 Series	2018	27000	Automatic	14827	Petrol	145	42.8	2.0
5 Series	2016	16000	Automatic	62794	Diesel	160	51.4	3.0
1 Series	2017	12750	Automatic	26676	Diesel	145	72.4	1.5

2. Understanding value_counts()

The value_counts() function counts how many times each unique value appears in a column. It sorts the results in descending order, returns a Series, and uses the unique values as the index. This makes it useful when you want to understand category frequency.

However, many beginners skip value_counts even though it saves time.

2.1 Count Transmission Types

Output:

transmission	count
Semi-Auto	4666
Automatic	3588
Manual	2527

This shows that semi-automatic transmissions appear the most in the dataset.

In addition, both functions work well when you analyse multiple columns.

2.2 Get Proportions Using normalize=True

Python:

df['transmission'].value_counts(normalize=True)

The normalize=True argument tells Pandas to divide each count by the total number of rows. Instead of raw counts, you receive proportions.

Output

transmission	proportion
Semi-Auto	0.432798
Automatic	0.332808
Manual	0.234394

This means about 43 percent of all BMW cars in the dataset are Semi-Auto.

2.3 Convert to Percentage

Multiplying by 100 changes the proportions into readable percentages.

Output

transmission	percent
Semi-Auto	43.279844
Automatic	33.280772
Manual	23.439384

2.4 Disable Sorting Using sort=False

The sort=False option tells Pandas to return counts in the order values appear in the column instead of sorting them.

Output

transmission	count
Manual	2527
Automatic	3588
Semi-Auto	4666

2.5 Sort by Index

Here, Pandas counts the values and then sorts them alphabetically by their index. That helps when you want an organized output rather than descending order.

Output

transmission	count
Automatic	3588
Manual	2527
Semi-Auto	4666

2.6 Reset the Index

The reset_index() function converts the Series into a DataFrame. The old index becomes a normal column, which makes the output easier to join with other tables or export.

Output

index	transmission
Semi-Auto	4666
Automatic	3588
Manual	2527

3. Count Multiple Columns Using value_counts()

You can count combinations by passing multiple columns. Pandas treats each unique pair as a separate category and counts how often it appears.

Output (truncated)

year	transmission	count
2019	Semi-Auto	2005
2019	Automatic	997
2016	Automatic	736
2017	Semi-Auto	672

This reveals that 2019 had the highest number of semi-automatic cars.

3.1 Get Proportion for Multiple Columns

Pandas now divides each pair count by the total number of rows and returns proportions instead of raw frequencies.

Output:

year	transmission	proportion
2019	Semi-Auto	0.185975
2019	Automatic	0.092478
2016	Automatic	0.068268

3.2 Percentage Format

This returns percentage values, making it easier to read and compare the categories.

Output:

year	transmission	percent
2019	Semi-Auto	18.59
2019	Automatic	9.24

3.3 Sort Index on Both Columns

sort_index() sorts based on the tuple index consisting of year and transmission. This gives neatly organized output ordered by both columns.

Output:

year	transmission	percent
1996	Automatic	0.009276
1997	Manual	0.009276
2020	Semi-Auto	5.509693

3.4 Reset Index and Sort

After resetting the index, the columns become normal DataFrame columns. Sorting them gives better control over the display order, such as sorting by year ascending and transmission descending.

Output:

year	transmission	percent
1996	Automatic	0.009276
1997	Manual	0.009276
…	…	…
2020	Automatic	0.704944

4. Understanding groupby()

The groupby() function groups rows based on a column and then performs calculations on each group. You can compute mean, median, sum, max, min, count, and more. Pandas separates the data into groups internally and then applies your chosen operation to each group independently.

4.1 Average Mileage per Transmission

All rows are grouped by transmission type in this code, which then chooses the mileage column within each group, computes the mean, and outputs the results in a new DataFrame.

Output:

transmission	avg_mileage
Automatic	31830.67
Manual	34720.48
Semi-Auto	15631.34

4.2 Reset Index

Resetting the index converts the group labels back into normal DataFrame columns.

4.3 Sort by Mileage

Sorting the result arranges the transmission types based on average mileage. This makes comparisons easier.

Manual cars show the highest average mileage among the three.

Output:

transmission	avg_mileage
Manual	34720.48
Automatic	31830.67
Semi-Auto	15631.34

Insight:
Manual cars have the highest mileage.

5. Groupby With Multiple Aggregations

A dictionary-style structure within agg() allows you to compute multiple statistics simultaneously. Pandas adds the outcomes to new output columns after applying each operation to the designated column.

For example, groupby helps you split data before applying calculations.

Here, Pandas groups by transmission type and calculates three different values. It takes the mean and median of mileage and counts the number of models in each group.

Output:

transmission	avg_mileage	median_mileage	car_count
Manual	34720.48	30000	2527
Automatic	31830.67	26193.5	3588
Semi-Auto	15631.34	9036.5	4666

6. Price and Engine Size Analysis

You can combine many aggregations across multiple columns. Pandas handles each operation one by one inside each transmission group.

Pandas divides the data by transmission first. The total number of cars, the average price, the median price, the highest and lowest prices, the sum of prices, and the average engine size for each transmission group are then determined.

Output:

transmission	model_count	total_price	avg_price	max_price	min_price	avg_engine_size
Automatic	3588	80439458	22419.02	99950	1200	2.25
Manual	2527	36988839	14637.45	46000	1445	1.83
Semi-Auto	4666	127660584	27359.74	123456	5299	2.27

7. Best Fuel Type for Mileage

You can group by fuel type to examine mileage performance across different fuel categories.

Fuel types are grouped by Pandas, which then calculates mileage statistics and sorts the results so that the fuel type with the best mileage shows up first.

Output:

fuelType	avg_mileage	median_mileage
Other	44760.63	44351.5
Diesel	29852.80	24600
Hybrid	23400.53	16900
Electric	19925.33	20321
Petrol	16524.09	9097

8. Multi-Column Groupby

You can group by more than one column at the same time. Pandas creates a separate group for every transmission and fuel type combination.

Pandas reads each unique pair of transmission and fuelType, groups rows accordingly, calculates the average price inside each group, and returns the results in a multi-index DataFrame.

Output:

transmission	fuelType	avg_price
Automatic	Diesel	21216.70
Automatic	Petrol	25364.46
Manual	Diesel	13722.74
Semi-Auto	Petrol	28209.39
Semi-Auto	Hybrid	30115.37

9. Multi Aggregation on Price and Mileage

Here, you calculate the average price, total price, and average mileage for each transmission and fuel type combination.

As a result, your summaries become cleaner and easier to compare.

Pandas handles each calculation separately but keeps the same group structure. The result gives deep insights into how transmission and fuel type together affect price and mileage.

Finally, you understand how to combine value_counts and groupby for deeper analysis.

Output:

transmission	fuelType	avg_price	total_price	avg_mileage
Automatic	Diesel	21216.70	52808380	36286.72
Semi-Auto	Petrol	28209.39	46601921	11222.51

Final Thought

You looked at how value_counts() facilitates working with multiple columns, formatting results, calculating proportions, and counting values. In addition to applying computations like mean, median, sum, count, and more, you learned how to group rows using the groupby() function. You also saw how multi-aggregation and multi-column grouping produce effective summaries. These features provide you with powerful tools for efficiently and rapidly analyzing complicated datasets.

Internal link:

Aggregate Functions in Pandas: Beginner’s Guide with Examples

Filtering in Pandas: Learn loc, iloc, isin(), and between()

Load Data in Pandas – A Complete Beginner’s Guide to Data Import

Pandas official documentation:

https://pandas.pydata.org/docs/ag

Conclusion:

Data science keeps changing how teams solve problems. You get clearer insights, stronger decisions, and better results when you use data in your work. Keep building your skills and stay curious about new tools. Your progress depends on how you apply data in real situations.

Want to know what else can be done by Data Science?

If you wish to learn more about data science or want to advance your career in the data science field, feel free to join our free workshop on Master’s in Data Science with Power BI, where you will get to know how exactly the data science field works and why companies are ready to pay handsome salaries in this field.

In this workshop, you will get to know each tool and technology from scratch, which will make you skillfully eligible for any data science profile.

To join this workshop, register yourself on ConsoleFlare, and we will call you back.

Thinking, Why Console Flare?

Recently, ConsoleFlare has been recognized as one of the Top 10 Most Promising Data Science Training Institutes of 2023.

Console Flare offers the opportunity to learn Data Science in Hindi, just like how you speak daily.

Console Flare believes in the idea of “What to learn and what not to learn,” and this can be seen in their curriculum structure. They have designed their program based on what you need to learn for data science and nothing else.

Want more reasons?

Register yourself on ConsoleFlare, and we will call you back.
Log in or sign up to view
See posts, photos, and more on Facebook.

value_counts and groupby in Pandas

Value_counts and Groupby in Pandas Explained in Easy Steps

The Dataset (bmw.csv)

1. Load the Dataset

2. Understanding value_counts()

2.1 Count Transmission Types

2.2 Get Proportions Using normalize=True

2.3 Convert to Percentage

2.4 Disable Sorting Using sort=False

2.5 Sort by Index

2.6 Reset the Index

3. Count Multiple Columns Using value_counts()

3.1 Get Proportion for Multiple Columns

3.2 Percentage Format

3.3 Sort Index on Both Columns

3.4 Reset Index and Sort

4. Understanding groupby()

4.1 Average Mileage per Transmission

4.2 Reset Index

4.3 Sort by Mileage

Insight:Manual cars have the highest mileage.

5. Groupby With Multiple Aggregations

6. Price and Engine Size Analysis

7. Best Fuel Type for Mileage

8. Multi-Column Groupby

9. Multi Aggregation on Price and Mileage

Final Thought

Internal link:

Aggregate Functions in Pandas: Beginner’s Guide with Examples

Pandas official documentation: https://pandas.pydata.org/docs/ag

Conclusion:

Want to know what else can be done by Data Science?

Thinking, Why Console Flare?

Want more reasons?

Insight:
Manual cars have the highest mileage.

Pandas official documentation:

https://pandas.pydata.org/docs/ag