The pandas library is one of the most popular Python packages for data science and machine learning, and with good reason: it offers powerful, expressive, and flexible data structures that make data manipulation and analysis much easier.
Python dictionary
A Python dictionary is a changeable (mutable) collection of key/value pairs, indexed by key. Dictionaries are written with curly braces.
Since Python 3.7, a dictionary preserves the order in which its items were inserted; in earlier versions it was an unordered collection.
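As a quick illustration of keys, values, and mutability, here is a minimal sketch. The variable name `cast_member` and its contents are made up for this example.

```python
# A dictionary maps keys to values and is written with curly braces.
cast_member = {'name': 'Millie', 'age': 15}

# Values are looked up by key rather than by position.
name = cast_member['name']

# Dictionaries are changeable: add a new key or overwrite an existing one.
cast_member['city'] = 'London'
cast_member['age'] = 16

print(cast_member)
```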
Python dataframe
The pandas DataFrame is one of these structures, and it makes numerical computation on tabular data straightforward.
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes: the data is laid out in rows and columns.
Python dictionary to dataframe
To convert a dictionary to a DataFrame in Python, use the pd.DataFrame() constructor. The constructor accepts a data object such as an ndarray or a dictionary.
A pandas DataFrame can be built from the following kinds of data.
- The Pandas Series is a one-dimensional labeled array that holds any data type with axis labels or indexes. An example of a Series object is one column from a DataFrame.
- A NumPy structured or record ndarray.
- A two-dimensional NumPy ndarray.
- Dictionaries of one-dimensional arrays, lists, dictionaries, or Series.
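The input kinds listed above can be sketched as follows. This is only an illustration; all variable names and sample values here are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# From a Series: the Series name becomes the column label.
ages = pd.Series([15, 16, 16], name='age')
df_from_series = pd.DataFrame(ages)

# From a structured ndarray: the field names become the column labels.
records = np.array(
    [('Millie', 15), ('Finn', 16)],
    dtype=[('name', 'U10'), ('age', 'i4')],
)
df_from_records = pd.DataFrame(records)

# From a two-dimensional ndarray: column labels are supplied separately.
grid = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(grid, columns=['a', 'b'])

# From a dictionary of Series: each key becomes a column.
names = pd.Series(['Millie', 'Finn', 'Gaten'])
df_from_dict = pd.DataFrame({'name': names, 'age': ages})

print(df_from_dict)
```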
Syntax of DataFrame constructor
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
If we pass a dictionary as data, its values should be list-like objects such as Series, arrays, or lists.
Let’s initialize the following dictionary.
# app.py

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

We must import the pandas library and convert the dictionary to a DataFrame using the pd.DataFrame() constructor.
See the following code.
# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings)
print(dataFrameObj)
Output
➜ pyt python3 app.py
     name  age       city
0  Millie   15     London
1    Finn   16  Vancouver
2   Gaten   16   New york

When the DataFrame object is initialized with this kind of dictionary, each item (key/value pair) is converted to one column: the key becomes the column name, and the list in the value becomes the column data.
Python dict to DataFrame with custom indexes
We can also pass an index list to the DataFrame constructor to replace the default index list.
# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings, index=['m', 'f', 'g'])
print(dataFrameObj)

We have passed the index parameter with the list ['m', 'f', 'g'].
Output
➜ pyt python3 app.py
     name  age       city
m  Millie   15     London
f    Finn   16  Vancouver
g   Gaten   16   New york
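One practical use of a custom index is label-based lookup. The sketch below assumes the same sample data as above; the names `stranger_things`, `finn_city`, and `first_name` are introduced only for this example.

```python
import pandas as pd

stranger_things = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york'],
}
df = pd.DataFrame(stranger_things, index=['m', 'f', 'g'])

# .loc selects by label, so the custom index values can be used directly.
finn_city = df.loc['f', 'city']

# .iloc still selects by integer position, regardless of the index labels.
first_name = df.iloc[0, 0]

print(finn_city, first_name)
```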
Create DataFrame from Dictionary
The DataFrame constructor accepts a dictionary whose values are list-like objects. But what if the dictionary's values are scalars rather than lists? What output do we get then?
Let’s understand with an example.
# app.py

import pandas as pd

StrangerThings = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

dataFrameObj = pd.DataFrame(StrangerThings)
print(dataFrameObj)
Okay, now run the file.
➜ pyt python3 app.py
Traceback (most recent call last):
  File "app.py", line 9, in <module>
    dataFrameObj = pd.DataFrame(StrangerThings)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 348, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 459, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 7315, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 7352, in extract_index
    raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index

It raises ValueError: If using all scalar values, you must pass an index.
So, the question is how to create a two-column DataFrame from this kind of dictionary, with the keys in one column and the values in the other.
We will create a list of (key, value) tuples from the dictionary and pass it to the DataFrame constructor, which also accepts a list.
See the following code.
# app.py

import pandas as pd

StrangerThings = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

dataFrameObj = pd.DataFrame(
    list(StrangerThings.items()), index=['m', 'f', 'g'])
print(dataFrameObj)
Output
➜ pyt python3 app.py
        0   1
m  millie  15
f    finn  16
g   gaten  16

That means we have created a DataFrame from a dictionary of scalar values.
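The columns in that output are labeled 0 and 1 because no column names were supplied. A possible variation, sketched below, passes the columns parameter to name them. The labels 'name' and 'age' and the variable names are assumptions chosen for this example. The sketch also shows two other ways to handle an all-scalar dictionary: passing a one-element index, as the error message suggests, or building a Series.

```python
import pandas as pd

ages = {'millie': 15, 'finn': 16, 'gaten': 16}

# Name the two columns instead of accepting the default labels 0 and 1.
df = pd.DataFrame(list(ages.items()), columns=['name', 'age'])

# Passing an index satisfies the constructor for all-scalar values;
# the result is a single row with one column per key.
one_row = pd.DataFrame(ages, index=[0])

# A Series keeps the keys as the index and the scalars as the values.
age_series = pd.Series(ages, name='age')

print(df)
```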
Create DataFrame from Dictionary and skip data
Suppose we want to create the DataFrame from the dictionary while skipping some of its items. Let’s see how to do that.

# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings, columns=['name', 'city'])
print(dataFrameObj)

In the above code, we pass the columns parameter, which lists the dictionary keys to keep as columns when converting the dictionary to a DataFrame.
Any key that is not listed in columns is skipped.
Output
➜ pyt python3 app.py
     name       city
0  Millie     London
1    Finn  Vancouver
2   Gaten   New york

We provided a list with only two column names in the columns parameter, so the DataFrame contains only those two columns.
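The columns parameter also controls column order, and a name that is not a key of the dictionary produces a column of missing values rather than an error. The sketch below illustrates both points; the 'country' label is a made-up name that deliberately matches no key.

```python
import pandas as pd

stranger_things = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york'],
}

# 'age' is skipped, 'city' is placed first, and 'country' (not a key in
# the dictionary) becomes a column filled with NaN.
df = pd.DataFrame(stranger_things, columns=['city', 'name', 'country'])

print(df)
```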
Create DataFrame from nested Dictionary
Let’s say we have the following dictionary.
StrangerThings = {
    0: {
        'name': 'Millie',
        'age': 15,
        'city': 'London'
    },
    1: {
        'name': 'Finn',
        'age': 16,
        'city': 'Vancouver'
    },
    2: {
        'name': 'Gaten',
        'age': 16,
        'city': 'New York'
    }
}
Let’s write the code that converts this nested Dictionary to DataFrame.
# app.py

import pandas as pd

StrangerThings = {
    0: {
        'name': 'Millie',
        'age': 15,
        'city': 'London'
    },
    1: {
        'name': 'Finn',
        'age': 16,
        'city': 'Vancouver'
    },
    2: {
        'name': 'Gaten',
        'age': 16,
        'city': 'New York'
    }
}

dataFrameObj = pd.DataFrame(StrangerThings)
dfObj = dataFrameObj.transpose()
print(dfObj)

In the above example, we used the DataFrame() constructor and the transpose() method to convert the nested dict to a pandas DataFrame.
By default, the outer keys become the columns and the inner keys become the row index. The transpose() method swaps rows and columns, so each outer key becomes a row, which makes the data more readable.
Output
➜ pyt python3 app.py
   age       city    name
0   15     London  Millie
1   16  Vancouver    Finn
2   16   New York   Gaten
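An alternative to transposing, sketched here as an option rather than as the approach used above, is DataFrame.from_dict() with orient='index', which treats each outer key as a row label directly. The variable name `nested` is an assumption for this example. Column order in the printed result can differ between pandas versions.

```python
import pandas as pd

nested = {
    0: {'name': 'Millie', 'age': 15, 'city': 'London'},
    1: {'name': 'Finn', 'age': 16, 'city': 'Vancouver'},
    2: {'name': 'Gaten', 'age': 16, 'city': 'New York'},
}

# orient='index' makes the outer keys the row labels and the inner keys
# the column labels, so no transpose step is needed.
df = pd.DataFrame.from_dict(nested, orient='index')

print(df)
```

Because the columns are built directly instead of being transposed from mixed-type rows, the age column should keep an integer dtype here.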
So, we have seen multiple variations of creating the DataFrame from Dictionary.
That’s it.