Python Pandas: How To Iterate Columns In DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In this tutorial, we will look at different ways to iterate over all or specific columns of a DataFrame.
Pandas iterate over columns
A Pandas DataFrame consists of rows and columns, and it can be iterated much like a dictionary. Iterating over a dictionary yields its keys; in the same way, iterating over a DataFrame yields its column labels.
In a Pandas DataFrame, we can iterate in two ways:
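To make the dictionary comparison concrete, here is a minimal sketch. The `scores` data and its column names are hypothetical and are used only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, not part of the tutorial's dataset.
scores = {'name': ['Ann', 'Bob'], 'score': [90, 85]}
df = pd.DataFrame(scores)

# Iterating over a dict yields its keys ...
dict_keys = [key for key in scores]

# ... and iterating over a DataFrame yields its column labels.
frame_labels = [label for label in df]

print(dict_keys)     # ['name', 'score']
print(frame_labels)  # ['name', 'score']
```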
- Iterating over rows
- Iterating over columns
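As a rough sketch of the difference between the two directions, the snippet below walks a small, hypothetical DataFrame once by rows with iterrows() and once by columns with items(). The data and variable names are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to contrast the two directions.
df = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})

# Row-wise: iterrows() yields (index label, row as Series) pairs.
row_labels = [index for index, row in df.iterrows()]

# Column-wise: items() yields (column label, column as Series) pairs.
column_labels = [label for label, column in df.items()]

print(row_labels)     # [0, 1]
print(column_labels)  # ['name', 'score']
```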
Let’s create a DataFrame first.
# app.py
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]

df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])
print(df)
Output
python3 app.py
  First Name Last Name  Total Episodes
0        Joe    Exotic               7
1     Carole    Baskin               7
2     Howard    Baskin               6
3       John    Finlay               6
4   Bhagavan     Antle               6
Iterate columns in Dataframe using column names
DataFrame.columns returns the column labels as an Index object, and iterating over the DataFrame itself yields the same labels.
We can iterate over these column names and, for each name, select the column contents with the [] operator.
See the following code.
# app.py
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]

df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

for column in df:
    # Select column contents by column name using the [] operator
    columnObj = df[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnObj.values)
Output
python3 app.py
Column Name :  First Name
Column Contents :  ['Joe' 'Carole' 'Howard' 'John' 'Bhagavan']
Column Name :  Last Name
Column Contents :  ['Exotic' 'Baskin' 'Baskin' 'Finlay' 'Antle']
Column Name :  Total Episodes
Column Contents :  [7 7 6 6 6]
In the output, each column name is printed first, followed by the contents of that column. The contents come back as a NumPy array (note that there are no commas between the values), not as a Python list.
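If plain Python lists are more convenient than arrays, one option is to call tolist() on each column. The sketch below reuses the tutorial's data; the `column_lists` dictionary is a name introduced here for illustration.

```python
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]
df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

# Map each column name to its contents as a plain Python list.
column_lists = {}
for column in df.columns:
    column_lists[column] = df[column].tolist()

print(column_lists['Total Episodes'])  # [7, 7, 6, 6, 6]
```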
In this approach, you don’t need to use any method to iterate the columns. In the next approach, we will see a function to iterate the columns.
How to Iterate columns using DataFrame.iteritems()
The DataFrame class provides a member function iteritems() that iterates over the columns as (column name, Series) pairs. For each column, it yields a tuple containing the column name and the column contents as a Series.
Note that iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0. DataFrame.items() yields the same pairs and works in current versions.
Let’s apply it to our DataFrame.
# app.py
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]

df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

# items() yields the same (column name, Series) pairs as iteritems(),
# which was removed in pandas 2.0
for (columnName, columnData) in df.items():
    print('Column Name : ', columnName)
    print('Column Values : ', columnData.values)
Output
Column Name :  First Name
Column Values :  ['Joe' 'Carole' 'Howard' 'John' 'Bhagavan']
Column Name :  Last Name
Column Values :  ['Exotic' 'Baskin' 'Baskin' 'Finlay' 'Antle']
Column Name :  Total Episodes
Column Values :  [7 7 6 6 6]
Here, for each column, we get the column name first and then the values of that column as a NumPy array.
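Because each pair carries the whole column as a Series, Series methods can be called inside the loop. As a small sketch, the snippet below counts the distinct values per column with nunique(). The `unique_counts` name is introduced here for illustration.

```python
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]
df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

# Count the distinct values in every column while iterating over
# (column name, Series) pairs.
unique_counts = {}
for column_name, column_data in df.items():
    unique_counts[column_name] = int(column_data.nunique())

print(unique_counts)
# {'First Name': 5, 'Last Name': 4, 'Total Episodes': 2}
```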
How to Iterate specific columns in DataFrame
Suppose we only need to iterate over some of the columns. We can select those columns first by passing a list of column names to the [] operator, and then iterate over the resulting DataFrame.
# app.py
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]

df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

for column in df[['First Name', 'Total Episodes']]:
    # Select column contents by column name using the [] operator
    columnsObj = df[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnsObj.values)
Output
python3 app.py
Column Name :  First Name
Column Contents :  ['Joe' 'Carole' 'Howard' 'John' 'Bhagavan']
Column Name :  Total Episodes
Column Contents :  [7 7 6 6 6]
We have selected two columns, and in the output, we got the two columns with their values.
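Selecting a column name that does not exist raises a KeyError, so when the list of wanted names comes from elsewhere, it can help to filter it first. The following is a minimal sketch of that idea. The `wanted` list and the 'Nickname' column are hypothetical and are not part of the tutorial's data.

```python
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]
df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

# 'Nickname' is a hypothetical column that is absent from df.
wanted = ['First Name', 'Total Episodes', 'Nickname']

# Keep only the requested names that actually exist in the DataFrame.
present = [name for name in wanted if name in df.columns]

for name in present:
    print(name, df[name].tolist())
```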
Iterate columns in dataframe by index using iloc[]
We can also iterate over the columns of the DataFrame by position.
For example, we can iterate over the range from 0 to the number of columns (df.shape[1]) and, for each index, select the column contents using iloc[].
See the following code.
# app.py
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]

df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

for index in range(df.shape[1]):
    print('Column Number : ', index)
    # Select column by index position using iloc[]
    columnsObj = df.iloc[:, index]
    print('Column Contents : ', columnsObj.values)
Output
python3 app.py
Column Number :  0
Column Contents :  ['Joe' 'Carole' 'Howard' 'John' 'Bhagavan']
Column Number :  1
Column Contents :  ['Exotic' 'Baskin' 'Baskin' 'Finlay' 'Antle']
Column Number :  2
Column Contents :  [7 7 6 6 6]
In the above code, we printed the positional index of each column instead of its name, followed by the contents of the column.
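When both the position and the name are needed, one possible approach is to enumerate df.columns and still select each column with iloc[]. This is a sketch under that assumption. The `seen` list is a name introduced here for illustration.

```python
import pandas as pd

tigerking = [
    ('Joe', 'Exotic', 7),
    ('Carole', 'Baskin', 7),
    ('Howard', 'Baskin', 6),
    ('John', 'Finlay', 6),
    ('Bhagavan', 'Antle', 6),
]
df = pd.DataFrame(tigerking, columns=['First Name', 'Last Name', 'Total Episodes'])

seen = []
for position, name in enumerate(df.columns):
    # Select the column by position; the resulting Series keeps the column name.
    column = df.iloc[:, position]
    seen.append((position, column.name))
    print(position, name, column.tolist())
```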
That concludes this Pandas example of iterating over DataFrame columns.