
Filtering in Pandas: Learn loc, iloc, isin(), and between()


Filtering in Pandas is a key part of analyzing data. This approach makes it much easier to find your way around and understand your data by letting you choose specific rows or columns based on certain conditions.

You might need to get certain information from a DataFrame in Python instead of using the whole thing. This is called filtering, and it lets you rapidly discover the rows or columns that are important to you.

Pandas offers two main indexers for this: loc, which selects by label or condition, and iloc, which selects by integer position.


What is Filtering in Pandas?

Filtering is the process of picking out data that meets a certain need.
For instance, finding customers over 30 or sales from a certain group.

Filtering helps you:

Focus on the data that matters

Clean up datasets before analysis

Get faster, clearer results


Setting Up the Dataset

Download Sample Dataset: retail_sales_dataset

We’ll use a sample file called retail_sales_dataset.csv.

Python

import pandas as pd 
df = pd.read_csv('retail_sales_dataset.csv') 
print(df.head())
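If the CSV file is not at hand, a small stand-in frame is enough to try the examples in this post. The sketch below is an assumption rather than the real dataset: it copies only the first six rows and the six columns that appear in this post's outputs.

```python
import pandas as pd

# Stand-in for retail_sales_dataset.csv: six rows copied from the outputs
# shown in this post. The real file has many more rows.
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "Age": [34, 26, 50, 37, 30, 45],
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty", "Beauty"],
    "Quantity": [3, 2, 1, 1, 2, 1],
    "Price per Unit": [50, 500, 30, 500, 50, 30],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column order matters for iloc
print(df.dtypes)         # Age, Quantity and Price per Unit should be numeric
```

Checking the column order up front helps, because iloc refers to columns by position rather than by name.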



1. Filtering with iloc: Selecting by Position

iloc stands for integer location.
It selects rows or columns by their integer position, counting from 0.

Example 1: Get one row

Python

df.iloc[0]

Returns the first row (position 0) as a Series.


Example 2: Get multiple rows in a DataFrame

To get several rows at once, pass a list of positions to iloc.

Python

df.iloc[[0, 3, 5]]

Displays the rows at positions 0, 3, and 5.

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |             50 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  5 | CUST006     | Female |  45 | Beauty           |        1 |             30 |

Example 3: Slice rows

To get a run of consecutive rows, use slice notation, just as you would with a Python list.

Python
df.iloc[2:5]

Returns the rows at positions 2 to 4; the stop position 5 is excluded.

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
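Slicing is one place where iloc and loc behave differently, so a short comparison may help. The sketch below uses a small stand-in frame (an assumption, not the real CSV) with the default 0-based index.

```python
import pandas as pd

# Stand-in frame with the default index 0..5
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Age": [34, 26, 50, 37, 30, 45],
})

by_position = df.iloc[2:5]  # positions 2, 3, 4: the stop value is excluded
by_label = df.loc[2:5]      # labels 2, 3, 4, 5: the stop label is included

print(list(by_position.index))  # [2, 3, 4]
print(list(by_label.index))     # [2, 3, 4, 5]
```

With iloc the slice follows normal Python rules; with loc both ends of a label slice are part of the result.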

Example 4: Get a single value

A single value has two indexes. To get a single value, you need to specify the row and column index.

Python
df.iloc[0, 0]

Returns the value in the first row and first column (here, CUST001).


Example 5: Get specific rows and columns

To pick particular rows and columns from a DataFrame, pass a list of row positions and a list of column positions.

Python

df.iloc[[0, 3, 5], [0, 2, 4]]

Output

|    | Customer ID | Age | Quantity |
| -: | ----------- | --: | -------: |
|  0 | CUST001     |  34 |        3 |
|  3 | CUST004     |  37 |        1 |
|  5 | CUST006     |  45 |        1 |

Example 6: Slice rows and columns

You can slice rows and columns at the same time.

Python
df.iloc[0:3, 2:4]

Displays rows 0–2 and columns 2–3 (Age, Product Category).

Output

|    | Age | Product Category |
| -: | --: | ---------------- |
|  0 |  34 | Beauty           |
|  1 |  26 | Clothing         |
|  2 |  50 | Electronics      |

Example 7: Get a full column

To get a single column, select all rows with a bare colon and give the column's position.

df.iloc[:, 1]
Returns every value in the “Gender” column, which sits at position 1.
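Because iloc counts positions rather than reading index labels, it keeps working after rows have been filtered out. The sketch below illustrates this on a small stand-in frame (an assumption, not the real CSV).

```python
import pandas as pd

# Stand-in frame; only two columns are needed for this illustration
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
})

females = df[df["Gender"] == "Female"]  # keeps the original labels 1 and 5
print(list(females.index))              # [1, 5]

first_by_position = females.iloc[0]     # first row of the subset, whatever its label
print(first_by_position["Customer ID"])  # CUST002

try:
    females.loc[0]                      # no row in the subset carries the label 0
except KeyError:
    print("label 0 is not in the filtered index")
```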

2. Using loc for filtering based on conditions

loc selects rows by label or by a Boolean condition, so use it when you filter on values rather than positions.
You can build the condition from any column.

Syntax:

df.loc[condition]
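It can help to look at the condition on its own: it is a Boolean Series with one True or False per row, and loc keeps the rows marked True. The sketch below uses a small stand-in frame (an assumption, not the real CSV); the threshold of 35 is chosen only for illustration.

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "Age": [34, 26, 50, 37, 30, 45],
})

mask = df["Age"] > 35   # Boolean Series, one value per row
print(mask.tolist())    # [False, False, True, True, False, True]

older = df.loc[mask]                               # rows only
older_ids = df.loc[mask, ["Customer ID", "Age"]]   # rows and chosen columns

print(older_ids)
```

Passing a second argument to loc limits the result to the named columns, which avoids a separate column-selection step.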

Example 1: Gender = Male

df.loc[df['Gender'] == 'Male']

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |             50 |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
|  6 | CUST007     | Male   |  46 | Clothing         |        2 |             25 |

Example 2: Age > 50

df.loc[df['Age'] > 50]

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  8 | CUST009     | Male   |  63 | Electronics      |        2 |            300 |
|  9 | CUST010     | Female |  52 | Clothing         |        4 |             50 |
| 13 | CUST014     | Male   |  64 | Clothing         |        4 |             30 |
| 18 | CUST019     | Female |  62 | Clothing         |        2 |             25 |
| 24 | CUST025     | Female |  64 | Beauty           |        1 |             50 |

Example 3: Product Category != Electronics

df.loc[df['Product Category'] != 'Electronics']

Multiple Conditions in Filtering

Logical operators can be used to combine several conditions:

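| Symbol | Meaning   | Example |
| ------ | --------- | ------- |
| &      | and       | (df['Gender']=='Male') & (df['Age']>50) |
| \|     | or        | (df['Product Category']=='Clothing') \| (df['Product Category']=='Electronics') |
| !=     | not equal | df['Product Category'] != 'Electronics' |
| ~      | not       | ~(df['Gender']=='Male') |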
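The operators above can be tried side by side by naming each condition first. This is a minimal sketch on a stand-in frame (an assumption, not the real CSV); the variable names are illustrative.

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "Age": [34, 26, 50, 37, 30, 45],
})

male = df["Gender"] == "Male"
over_35 = df["Age"] > 35

both = df.loc[male & over_35]    # and: both conditions must hold
either = df.loc[male | over_35]  # or: at least one condition holds
not_male = df.loc[~male]         # not: flips True and False

print(list(both.index))      # [2, 3]
print(list(either.index))    # [0, 2, 3, 4, 5]
print(list(not_male.index))  # [1, 5]
```

Storing each condition in a named variable keeps long filters readable and removes the need for nested parentheses.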

Example 1: Male and Age > 50

df.loc[(df['Gender']=='Male') & (df['Age']>50)]

Output

|     | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| --: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|   8 | CUST009     | Male   |  63 | Electronics      |        2 |            300 |
|  13 | CUST014     | Male   |  64 | Clothing         |        4 |             30 |
|  35 | CUST036     | Male   |  52 | Beauty           |        3 |            300 |
|  47 | CUST048     | Male   |  54 | Electronics      |        3 |            300 |
|  58 | CUST059     | Male   |  62 | Clothing         |        1 |             50 |
| ... | ...         | ...    | ... | ...              |      ... |            ... |
| 969 | CUST970     | Male   |  59 | Electronics      |        4 |            500 |
| 972 | CUST973     | Male   |  60 | Clothing         |        1 |             50 |
| 983 | CUST984     | Male   |  56 | Clothing         |        1 |            500 |
| 995 | CUST996     | Male   |  62 | Clothing         |        1 |             50 |
| 996 | CUST997     | Male   |  52 | Beauty           |        3 |             30 |

Example 2: Age between 20 and 30 (both ends excluded)

df.loc[(df['Age']>20) & (df['Age']<30)]

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
| 10 | CUST011     | Male   |  23 | Clothing         |        2 |             50 |
| 12 | CUST013     | Male   |  22 | Electronics      |        3 |            500 |
| 16 | CUST017     | Female |  27 | Clothing         |        4 |             25 |
| 19 | CUST020     | Male   |  22 | Clothing         |        3 |            300 |

Example 3: Category Clothing or Electronics

df.loc[(df['Product Category']=='Clothing') | (df['Product Category']=='Electronics')]

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  6 | CUST007     | Male   |  46 | Clothing         |        2 |             25 |
|  7 | CUST008     | Male   |  30 | Electronics      |        4 |             25 |

Example 4: Category not Electronics

df.loc[df['Product Category']!='Electronics']

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |             50 |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
|  5 | CUST006     | Female |  45 | Beauty           |        1 |             30 |

Example 5: Exclude rows that are both Male and Electronics

df.loc[~((df['Gender']=='Male') & (df['Product Category']=='Electronics'))]

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |             50 |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
|  5 | CUST006     | Female |  45 | Beauty           |        1 |             30 |
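Negating a combined condition is easy to misread, so the sketch below compares three variants on a small stand-in frame (an assumption, not the real CSV). It shows that negating "Male and Electronics" is the same as "not Male or not Electronics", and differs from "not Male and not Electronics".

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty", "Beauty"],
})

male = df["Gender"] == "Male"
electronics = df["Product Category"] == "Electronics"

not_both = ~(male & electronics)          # drops only rows that are Male AND Electronics
same_as_not_both = ~male | ~electronics   # equivalent form of the line above
neither = ~male & ~electronics            # drops every Male row and every Electronics row

print(list(df.loc[not_both].index))  # [0, 1, 3, 4, 5]
print(list(df.loc[neither].index))   # [1, 5]
print(not_both.equals(same_as_not_both))  # True
```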

Useful Filtering Functions

1. isin() –  Check Multiple Values

df.loc[df['Product Category'].isin(['Electronics', 'Clothing'])]

Output

|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  6 | CUST007     | Male   |  46 | Clothing         |        2 |             25 |
|  7 | CUST008     | Male   |  30 | Electronics      |        4 |             25 |
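As a quick check, isin() should return the same rows as a chain of == comparisons joined with |, and it can be negated with ~ to exclude the listed values. A minimal sketch on a stand-in frame (an assumption, not the real CSV):

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty", "Beauty"],
})

wanted = ["Electronics", "Clothing"]

with_isin = df.loc[df["Product Category"].isin(wanted)]
with_or = df.loc[
    (df["Product Category"] == "Electronics") | (df["Product Category"] == "Clothing")
]
others = df.loc[~df["Product Category"].isin(wanted)]  # everything not in the list

print(with_isin.equals(with_or))  # True
print(list(others.index))         # [0, 4, 5]
```

The isin() form stays short no matter how many values are in the list.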

2. between() – Filter by Range

df.loc[df['Age'].between(20, 30)]

Output

|     | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| --: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|   1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|   4 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
|   7 | CUST008     | Male   |  30 | Electronics      |        4 |             25 |
|  10 | CUST011     | Male   |  23 | Clothing         |        2 |             50 |
|  12 | CUST013     | Male   |  22 | Electronics      |        3 |            500 |
| ... | ...         | ...    | ... | ...              |      ... |            ... |
| 970 | CUST971     | Female |  27 | Electronics      |        4 |             50 |
| 980 | CUST981     | Female |  30 | Electronics      |        2 |             30 |
| 982 | CUST983     | Female |  29 | Clothing         |        1 |            300 |
| 986 | CUST987     | Female |  30 | Clothing         |        3 |            300 |
| 997 | CUST998     | Female |  23 | Beauty           |        4 |             25 |
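Note that between() includes both end values by default, which is why customers aged exactly 30 appear above. The sketch below contrasts that with an exclusive range on a stand-in frame (an assumption, not the real CSV). It also assumes pandas 1.3 or later, where the inclusive parameter accepts the strings "both", "neither", "left", and "right".

```python
import pandas as pd

# Stand-in frame with ages from this post's outputs
df = pd.DataFrame({"Age": [34, 26, 50, 37, 30, 45]})
ages = df["Age"]

inclusive_both = df.loc[ages.between(26, 34)]                   # 26 and 34 are kept
exclusive = df.loc[ages.between(26, 34, inclusive="neither")]   # assumes pandas >= 1.3
manual = df.loc[(ages > 26) & (ages < 34)]                      # same as the exclusive form

print(list(inclusive_both.index))  # [0, 1, 4]
print(list(exclusive.index))       # [4]
print(exclusive.equals(manual))    # True
```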

3. reset_index() – Clean Up After Filtering

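Filtering keeps the original row labels, so the index of the result is no longer sequential. Store the filtered rows in a variable and call reset_index(drop=True) to renumber them from 0; drop=True discards the old labels instead of keeping them as a column.

filtered = df.loc[df['Age'].between(20, 30)]
filtered = filtered.reset_index(drop=True)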

Output

|     | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| --: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|   0 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|   1 | CUST005     | Male   |  30 | Beauty           |        2 |             50 |
|   2 | CUST008     | Male   |  30 | Electronics      |        4 |             25 |
|   3 | CUST011     | Male   |  23 | Clothing         |        2 |             50 |
|   4 | CUST013     | Male   |  22 | Electronics      |        3 |            500 |
| ... | ...         | ...    | ... | ...              |      ... |            ... |
| 226 | CUST971     | Female |  27 | Electronics      |        4 |             50 |
| 227 | CUST981     | Female |  30 | Electronics      |        2 |             30 |
| 228 | CUST983     | Female |  29 | Clothing         |        1 |            300 |
| 229 | CUST987     | Female |  30 | Clothing         |        3 |            300 |
| 230 | CUST998     | Female |  23 | Beauty           |        4 |             25 |
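The effect on the index can be seen directly by printing it before and after the reset. A minimal sketch on a stand-in frame (an assumption, not the real CSV):

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Customer ID": ["CUST001", "CUST002", "CUST003", "CUST004", "CUST005", "CUST006"],
    "Age": [34, 26, 50, 37, 30, 45],
})

young = df.loc[df["Age"].between(20, 30)]
print(list(young.index))        # [1, 4]: original labels survive the filter

young_reset = young.reset_index(drop=True)
print(list(young_reset.index))  # [0, 1]: renumbered from 0

# The original frame is untouched by either step
print(len(df))                  # 6
```

Assigning the result to a new name, rather than using inplace=True, keeps the original DataFrame available for further filters.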

Update records

Let’s say you want to set the price of every Beauty product to 100. You can update values with .loc by passing the row condition and the name of the column to change.

Python

df.loc[df['Product Category'] == 'Beauty', 'Price per Unit'] = 100
df.head()

Output

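|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |            100 |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |            500 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |            100 |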

You can also combine several conditions. Let’s say you want to set the price of Clothing purchases made by male customers to 1,000.

Python

df.loc[(df['Gender'] == 'Male') & (df['Product Category'] == 'Clothing'), 'Price per Unit'] = 1000
df.head()

Output

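|    | Customer ID | Gender | Age | Product Category | Quantity | Price per Unit |
| -: | ----------- | ------ | --: | ---------------- | -------: | -------------: |
|  0 | CUST001     | Male   |  34 | Beauty           |        3 |            100 |
|  1 | CUST002     | Female |  26 | Clothing         |        2 |            500 |
|  2 | CUST003     | Male   |  50 | Electronics      |        1 |             30 |
|  3 | CUST004     | Male   |  37 | Clothing         |        1 |           1000 |
|  4 | CUST005     | Male   |  30 | Beauty           |        2 |            100 |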
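Since an assignment through loc changes the DataFrame in place, it can be safer to count the matching rows first and to work on a copy. The sketch below does both on a stand-in frame (an assumption, not the real CSV); the names updated and is_beauty are illustrative.

```python
import pandas as pd

# Stand-in frame with a few rows from this post's outputs
df = pd.DataFrame({
    "Product Category": ["Beauty", "Clothing", "Electronics", "Clothing", "Beauty", "Beauty"],
    "Price per Unit": [50, 500, 30, 500, 50, 30],
})

updated = df.copy()  # work on a copy so the original stays available
is_beauty = updated["Product Category"] == "Beauty"

print(int(is_beauty.sum()))  # number of rows the update will touch: 3

# Row condition first, column name second, then the new value
updated.loc[is_beauty, "Price per Unit"] = 100

print(updated["Price per Unit"].tolist())  # [100, 500, 30, 500, 100, 100]
print(df["Price per Unit"].tolist())       # [50, 500, 30, 500, 50, 30]
```

Note that the condition matches every Beauty row, including the one priced at 30, so all three rows end up at 100.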

 


Common Mistakes

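Missing parentheses around each condition when combining them
Wrong: df['Age'] > 20 & df['Age'] < 30
Correct: (df['Age'] > 20) & (df['Age'] < 30)

Using and / or instead of & / |

Writing a condition without wrapping it in df.loc[]: the bare condition returns a Boolean Series of True/False values, not the filtered rows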
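The first two mistakes both end in a ValueError, which the sketch below reproduces on a one-column stand-in frame (an assumption, not the real CSV). The errors are caught so the script runs to the end.

```python
import pandas as pd

# Stand-in frame with ages from this post's outputs
df = pd.DataFrame({"Age": [34, 26, 50, 37, 30, 45]})

# Missing parentheses: & binds tighter than > and <, so Python reads this as
# df["Age"] > (20 & df["Age"]) < 30, a chained comparison that needs one True/False.
try:
    df.loc[df["Age"] > 20 & df["Age"] < 30]
    precedence_error = None
except ValueError as exc:
    precedence_error = exc

# Python's `and` also asks each Series for a single True/False value.
try:
    df.loc[(df["Age"] > 20) and (df["Age"] < 30)]
    and_error = None
except ValueError as exc:
    and_error = exc

# Correct form: parentheses around each condition, combined with &
correct = df.loc[(df["Age"] > 20) & (df["Age"] < 30)]

print(type(precedence_error).__name__)  # ValueError
print(type(and_error).__name__)         # ValueError
print(list(correct.index))              # [1]
```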



Summary

Use iloc for position-based selection.

Use loc for filtering based on conditions.

Combine filters with the logical operators &, |, and ~.

Use helper functions such as between() and isin().

Reset the index after filtering when you need sequential row numbers.

Filtering lets you extract the right data quickly. Once you understand how loc and iloc work, you can handle most real-world datasets with ease.
