-
The Dataset (bmw.csv)
Download Sample Dataset: bmw.csv
1. Load the Dataset
You start by importing Pandas and loading the BMW CSV file. The read_csv() function reads the file and creates a DataFrame. Calling head() shows the first five rows, which lets you confirm whether the data loaded correctly.
Output:
| model | year | price | transmission | mileage | fuelType | tax | mpg | engineSize |
|---|---|---|---|---|---|---|---|---|
| 2 Series | 2018 | 16250 | Manual | 10401 | Petrol | 145 | 52.3 | 1.5 |
| 5 Series | 2014 | 11200 | Automatic | 67068 | Diesel | 125 | 57.6 | 2.0 |
| 6 Series | 2018 | 27000 | Automatic | 14827 | Petrol | 145 | 42.8 | 2.0 |
| 5 Series | 2016 | 16000 | Automatic | 62794 | Diesel | 160 | 51.4 | 3.0 |
| 1 Series | 2017 | 12750 | Automatic | 26676 | Diesel | 145 | 72.4 | 1.5 |
2. Understanding value_counts()
The value_counts() function counts how many times each unique value appears in a column. It sorts the results in descending order, returns a Series, and uses the unique values as the index. This makes it useful when you want to understand category frequency.
However, many beginners skip value_counts even though it saves time.
2.1 Count Transmission Types
Output:
| transmission | count |
|---|---|
| Semi-Auto | 4666 |
| Automatic | 3588 |
| Manual | 2527 |
This shows that semi-automatic transmissions appear the most in the dataset.
In addition, both functions work well when you analyse multiple columns.
2.2 Get Proportions Using normalize=True
| transmission | proportion |
|---|---|
| Semi-Auto | 0.432798 |
| Automatic | 0.332808 |
| Manual | 0.234394 |
This means about 43 percent of all BMW cars in the dataset are Semi-Auto.
2.3 Convert to Percentage
Multiplying by 100 changes the proportions into readable percentages.
Output
| transmission | percent |
|---|---|
| Semi-Auto | 43.279844 |
| Automatic | 33.280772 |
| Manual | 23.439384 |
2.4 Disable Sorting Using sort=False
The sort=False option tells Pandas to return counts in the order values appear in the column instead of sorting them.
Output
| transmission | count |
|---|---|
| Manual | 2527 |
| Automatic | 3588 |
| Semi-Auto | 4666 |
2.5 Sort by Index
Here, Pandas counts the values and then sorts them alphabetically by their index. That helps when you want an organized output rather than descending order.
Output
| transmission | count |
|---|---|
| Automatic | 3588 |
| Manual | 2527 |
| Semi-Auto | 4666 |
2.6 Reset the Index
The reset_index() function converts the Series into a DataFrame. The old index becomes a normal column, which makes the output easier to join with other tables or export.
Output
| index | transmission |
|---|---|
| Semi-Auto | 4666 |
| Automatic | 3588 |
| Manual | 2527 |
3. Count Multiple Columns Using value_counts()
You can count combinations by passing multiple columns. Pandas treats each unique pair as a separate category and counts how often it appears.
Output (truncated)
| year | transmission | count |
|---|---|---|
| 2019 | Semi-Auto | 2005 |
| 2019 | Automatic | 997 |
| 2016 | Automatic | 736 |
| 2017 | Semi-Auto | 672 |
This reveals that 2019 had the highest number of semi-automatic cars.
3.1 Get Proportion for Multiple Columns
Pandas now divides each pair count by the total number of rows and returns proportions instead of raw frequencies.
Output:
| year | transmission | proportion |
|---|---|---|
| 2019 | Semi-Auto | 0.185975 |
| 2019 | Automatic | 0.092478 |
| 2016 | Automatic | 0.068268 |
3.2 Percentage Format
This returns percentage values, making it easier to read and compare the categories.
Output:
| year | transmission | percent |
|---|---|---|
| 2019 | Semi-Auto | 18.59 |
| 2019 | Automatic | 9.24 |
3.3 Sort Index on Both Columns
sort_index() sorts based on the tuple index consisting of year and transmission. This gives neatly organized output ordered by both columns.
Output:
| year | transmission | percent |
|---|---|---|
| 1996 | Automatic | 0.009276 |
| 1997 | Manual | 0.009276 |
| 2020 | Semi-Auto | 5.509693 |
3.4 Reset Index and Sort
After resetting the index, the columns become normal DataFrame columns. Sorting them gives better control over the display order, such as sorting by year ascending and transmission descending.
Output:
| year | transmission | percent |
|---|---|---|
| 1996 | Automatic | 0.009276 |
| 1997 | Manual | 0.009276 |
| … | … | … |
| 2020 | Automatic | 0.704944 |
4. Understanding groupby()
The groupby() function groups rows based on a column and then performs calculations on each group. You can compute mean, median, sum, max, min, count, and more. Pandas separates the data into groups internally and then applies your chosen operation to each group independently.
4.1 Average Mileage per Transmission
All rows are grouped by transmission type in this code, which then chooses the mileage column within each group, computes the mean, and outputs the results in a new DataFrame.
Output:
| transmission | avg_mileage |
|---|---|
| Automatic | 31830.67 |
| Manual | 34720.48 |
| Semi-Auto | 15631.34 |
4.2 Reset Index
Resetting the index converts the group labels back into normal DataFrame columns.
4.3 Sort by Mileage
Sorting the result arranges the transmission types based on average mileage. This makes comparisons easier.
Manual cars show the highest average mileage among the three.
Output:
| transmission | avg_mileage |
|---|---|
| Manual | 34720.48 |
| Automatic | 31830.67 |
| Semi-Auto | 15631.34 |
Insight:
Manual cars have the highest mileage.
5. Groupby With Multiple Aggregations
A dictionary-style structure within agg() allows you to compute multiple statistics simultaneously. Pandas adds the outcomes to new output columns after applying each operation to the designated column.
For example, groupby helps you split data before applying calculations.
Here, Pandas groups by transmission type and calculates three different values. It takes the mean and median of mileage and counts the number of models in each group.
Output:
| transmission | avg_mileage | median_mileage | car_count |
|---|---|---|---|
| Manual | 34720.48 | 30000 | 2527 |
| Automatic | 31830.67 | 26193.5 | 3588 |
| Semi-Auto | 15631.34 | 9036.5 | 4666 |
6. Price and Engine Size Analysis
You can combine many aggregations across multiple columns. Pandas handles each operation one by one inside each transmission group.
Pandas divides the data by transmission first. The total number of cars, the average price, the median price, the highest and lowest prices, the sum of prices, and the average engine size for each transmission group are then determined.
Output:
| transmission | model_count | total_price | avg_price | max_price | min_price | avg_engine_size |
|---|---|---|---|---|---|---|
| Automatic | 3588 | 80439458 | 22419.02 | 99950 | 1200 | 2.25 |
| Manual | 2527 | 36988839 | 14637.45 | 46000 | 1445 | 1.83 |
| Semi-Auto | 4666 | 127660584 | 27359.74 | 123456 | 5299 | 2.27 |
7. Best Fuel Type for Mileage
You can group by fuel type to examine mileage performance across different fuel categories.
Fuel types are grouped by Pandas, which then calculates mileage statistics and sorts the results so that the fuel type with the best mileage shows up first.
Output:
| fuelType | avg_mileage | median_mileage |
|---|---|---|
| Other | 44760.63 | 44351.5 |
| Diesel | 29852.80 | 24600 |
| Hybrid | 23400.53 | 16900 |
| Electric | 19925.33 | 20321 |
| Petrol | 16524.09 | 9097 |
8. Multi-Column Groupby
You can group by more than one column at the same time. Pandas creates a separate group for every transmission and fuel type combination.
Pandas reads each unique pair of transmission and fuelType, groups rows accordingly, calculates the average price inside each group, and returns the results in a multi-index DataFrame.
Output:
| transmission | fuelType | avg_price |
|---|---|---|
| Automatic | Diesel | 21216.70 |
| Automatic | Petrol | 25364.46 |
| Manual | Diesel | 13722.74 |
| Semi-Auto | Petrol | 28209.39 |
| Semi-Auto | Hybrid | 30115.37 |
9. Multi Aggregation on Price and Mileage
Here, you calculate the average price, total price, and average mileage for each transmission and fuel type combination.
As a result, your summaries become cleaner and easier to compare.
Pandas handles each calculation separately but keeps the same group structure. The result gives deep insights into how transmission and fuel type together affect price and mileage.
Finally, you understand how to combine value_counts and groupby for deeper analysis.
Output:
| transmission | fuelType | avg_price | total_price | avg_mileage |
|---|---|---|---|---|
| Automatic | Diesel | 21216.70 | 52808380 | 36286.72 |
| Semi-Auto | Petrol | 28209.39 | 46601921 | 11222.51 |
Final Thought
You looked at how value_counts() facilitates working with multiple columns, formatting results, calculating proportions, and counting values. In addition to applying computations like mean, median, sum, count, and more, you learned how to group rows using the groupby() function. You also saw how multi-aggregation and multi-column grouping produce effective summaries. These features provide you with powerful tools for efficiently and rapidly analyzing complicated datasets.
Internal link:
Aggregate Functions in Pandas: Beginner’s Guide with Examples
Filtering in Pandas: Learn loc, iloc, isin(), and between()
Load Data in Pandas – A Complete Beginner’s Guide to Data Import
Pandas official documentation:
https://pandas.pydata.org/docs/ag
Conclusion:
Data science is transforming finance, from predictive analytics to personalized services. It’s not just a technology upgrade – it’s changing how the industry operates. Those who leverage data science will stay ahead in this data-driven era.
Want to know what else can be done by Data Science?
If you wish to learn more about data science or want to advance your career in the data science field, feel free to join our free workshop on Master’s in Data Science with Power BI, where you will get to know how exactly the data science field works and why companies are ready to pay handsome salaries in this field.
In this workshop, you will get to know each tool and technology from scratch, which will make you skillfully eligible for any data science profile.
To join this workshop, register yourself on ConsoleFlare, and we will call you back.
Thinking, Why Console Flare?
Recently, ConsoleFlare has been recognized as one of the Top 10 Most Promising Data Science Training Institutes of 2023.
Console Flare offers the opportunity to learn Data Science in Hindi, just like how you speak daily.
Console Flare believes in the idea of “What to learn and what not to learn,” and this can be seen in their curriculum structure. They have designed their program based on what you need to learn for data science and nothing else.
Want more reasons?
Register yourself on ConsoleFlare, and we will call you back.
Log in or sign up to view
See posts, photos, and more on Facebook.
value_counts and groupby in Pandas

