How to Convert Dictionary to DataFrame in Python

Pandas is a popular Python package for data science and machine learning, and with good reason: it offers powerful, expressive, and flexible data structures that make data manipulation and analysis much easier.

Python dictionary

A Python dictionary is a collection of key-value pairs that is changeable and indexed by key. Dictionaries are written with curly braces. Since Python 3.7, a dictionary also preserves the order in which its items were inserted.
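As a quick illustration of keys and values, here is a minimal sketch. The variable name ages is only an assumption made for this example.

```python
# A small dictionary: each key maps to one value
ages = {'millie': 15, 'finn': 16}

ages['gaten'] = 16    # add a new key/value pair
ages['millie'] = 16   # change the value stored under an existing key

print(ages['finn'])        # look up a value by its key
print(list(ages.keys()))   # keys come back in insertion order on Python 3.7+
```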

Python dataframe

The Pandas DataFrame is one of these structures, and it makes numerical computation on tabular data straightforward. The data in a DataFrame is laid out in rows and columns.

More precisely, a DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Python dictionary to dataframe

To convert a dictionary to a DataFrame in Python, use the pd.DataFrame() constructor. The constructor accepts a data object such as an ndarray or a dictionary.

A Pandas DataFrame can be built from the following kinds of data.

  1. A Pandas Series, which is a one-dimensional labeled array that holds any data type with axis labels or indexes. An example of a Series object is one column from a DataFrame.
  2. A NumPy structured or record ndarray.
  3. A two-dimensional NumPy ndarray.
  4. A dictionary of one-dimensional arrays, lists, dictionaries, or Series.
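The four kinds of input listed above can be sketched as follows. This is only an illustration: the variable names and sample values are assumptions made for the example, not part of the original code.

```python
import numpy as np
import pandas as pd

# 1. From a Series: the named Series becomes a single column
from_series = pd.DataFrame(pd.Series([15, 16, 16], name='age'))

# 2. From a structured ndarray: the field names become column names
records = np.array([('Millie', 15), ('Finn', 16)],
                   dtype=[('name', 'U10'), ('age', 'i4')])
from_records = pd.DataFrame(records)

# 3. From a two-dimensional ndarray: column names are passed explicitly
from_2d = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['a', 'b'])

# 4. From a dictionary of lists: the keys become column names
from_dict = pd.DataFrame({'name': ['Millie', 'Finn'], 'age': [15, 16]})

for df in (from_series, from_records, from_2d, from_dict):
    print(df, end='\n\n')
```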

Syntax of DataFrame constructor

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

If we pass a dictionary as data, its values should be list-like objects such as Series, arrays, or lists.

Let’s initialize the following dictionary.

# app.py

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

We import the pandas library and convert the dictionary to a DataFrame using the pd.DataFrame() constructor.

See the following code.

# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings)
print(dataFrameObj)

Output

➜  pyt python3 app.py
     name  age       city
0  Millie   15     London
1    Finn   16  Vancouver
2   Gaten   16   New york
➜  pyt

When the DataFrame object is initialized with this kind of dictionary, each item (key/value pair) in the dictionary is converted to one column: the key becomes the column name, and the list in the value field becomes the column data.
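The key-to-column mapping can be checked directly. The sketch below reuses the same data under the assumed variable names cast and df.

```python
import pandas as pd

cast = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

df = pd.DataFrame(cast)

# The column names are the dictionary keys
print(list(df.columns))

# Each column holds the list stored under the matching key
print(df['age'].tolist())
```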

Python dict to DataFrame with custom indexes

We can also pass an index list to the DataFrame constructor to replace the default index list.

# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings, index=['m', 'f', 'g'])
print(dataFrameObj)

We have passed the list ['m', 'f', 'g'] as the index parameter.

Output

➜  pyt python3 app.py
     name  age       city
m  Millie   15     London
f    Finn   16  Vancouver
g   Gaten   16   New york
➜  pyt
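One reason to set custom index labels is that rows can then be looked up by label with .loc. A minimal sketch, assuming the variable names cast and df:

```python
import pandas as pd

cast = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

df = pd.DataFrame(cast, index=['m', 'f', 'g'])

# Select the whole row labeled 'f' (returned as a Series)
print(df.loc['f'])

# Select a single cell by row label and column name
print(df.loc['g', 'city'])
```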

Create DataFrame from Dictionary

The DataFrame constructor expects the dictionary values to be list-like. But what if we have a dictionary whose values are not lists? What does the constructor do then?

Let’s understand with an example.

# app.py

import pandas as pd

StrangerThings = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

dataFrameObj = pd.DataFrame(StrangerThings)
print(dataFrameObj)

Okay, now run the file.

➜  pyt python3 app.py
Traceback (most recent call last):
  File "app.py", line 9, in <module>
    dataFrameObj = pd.DataFrame(StrangerThings)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 348, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 459, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 7315, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 7352, in extract_index
    raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
➜  pyt

The constructor raises ValueError: If using all scalar values, you must pass an index
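The error message points at one way out: pass an index. The sketch below shows that option, and a Series built from the same dictionary as a second option. The variable names ages, one_row, and as_series are assumptions made for this example.

```python
import pandas as pd

ages = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

# Option 1: pass a one-element index so the scalars form a single row
one_row = pd.DataFrame(ages, index=[0])
print(one_row)

# Option 2: build a Series, where the keys become the index labels
as_series = pd.Series(ages, name='age')
print(as_series)
```

Both options keep the dictionary keys as labels (column names or index labels) rather than as data in a column of their own.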

So, the question is how to create a two-column DataFrame from this kind of dictionary, with the keys in one column and the values in the other.

We create a list of (key, value) tuples from the dictionary and pass it to the DataFrame constructor, which also accepts a list of tuples.

See the following code.

# app.py

import pandas as pd

StrangerThings = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

dataFrameObj = pd.DataFrame(
    list(StrangerThings.items()), index=['m', 'f', 'g'])
print(dataFrameObj)

Output

➜  pyt python3 app.py
        0   1
m  millie  15
f    finn  16
g   gaten  16
➜  pyt

That means we have created a DataFrame from a dictionary of scalar values. Because no column names were given, the columns are labeled 0 and 1.
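The two columns can be given readable names by also passing the columns parameter. In this sketch the column names 'name' and 'age' are assumptions chosen for illustration.

```python
import pandas as pd

ages = {
    'millie': 15,
    'finn': 16,
    'gaten': 16
}

# Each (key, value) tuple becomes one row; columns names the two fields
df = pd.DataFrame(list(ages.items()), columns=['name', 'age'])
print(df)
```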

Create DataFrame from Dictionary and skip data

Suppose we want to create the DataFrame from the dictionary while skipping some of its items. Let’s see how to do that.

# app.py

import pandas as pd

StrangerThings = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

dataFrameObj = pd.DataFrame(StrangerThings, columns=['name', 'city'])
print(dataFrameObj)

In the above code, we pass the columns parameter, which lists the dictionary keys to keep as columns when converting the dictionary to the DataFrame.

Any key that is not listed in columns is skipped.

Output

➜  pyt python3 app.py
     name       city
0  Millie     London
1    Finn  Vancouver
2   Gaten   New york
➜  pyt

We provided a list with only two column names in the columns parameter, so the DataFrame contains only two columns.
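The columns parameter also controls column order, and a name that is not a key in the dictionary produces a column of missing values instead of an error. A small sketch, where 'country' is a hypothetical column name that does not exist in the dictionary:

```python
import pandas as pd

cast = {
    'name': ['Millie', 'Finn', 'Gaten'],
    'age': [15, 16, 16],
    'city': ['London', 'Vancouver', 'New york']
}

# 'age' is skipped, 'city' comes first, and 'country' has no data behind it
df = pd.DataFrame(cast, columns=['city', 'name', 'country'])
print(df)

# The unmatched column is filled with NaN
print(df['country'].isna().all())
```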

Create DataFrame from nested Dictionary

Let’s say we have the following dictionary.

StrangerThings = {
    0: {
        'name': 'Millie',
        'age': 15,
        'city': 'London'
    },
    1: {
        'name': 'Finn',
        'age': 16,
        'city': 'Vancouver'
    },
    2: {
        'name': 'Gaten',
        'age': 16,
        'city': 'New York'
    }
}

Let’s write the code that converts this nested Dictionary to DataFrame.

# app.py

import pandas as pd

StrangerThings = {
    0: {
        'name': 'Millie',
        'age': 15,
        'city': 'London'
    },
    1: {
        'name': 'Finn',
        'age': 16,
        'city': 'Vancouver'
    },
    2: {
        'name': 'Gaten',
        'age': 16,
        'city': 'New York'
    }
}

dataFrameObj = pd.DataFrame(StrangerThings)
dfObj = dataFrameObj.transpose()
print(dfObj)

In the above example, we used the DataFrame() constructor and the transpose() method to convert the nested dict to a pandas DataFrame.

The constructor turns each outer key into a column and each inner key into a row label. The transpose() method swaps rows and columns, so each outer key becomes a row and the data is more readable.

Output

➜  pyt python3 app.py
  age       city    name
0  15     London  Millie
1  16  Vancouver    Finn
2  16   New York   Gaten
➜  pyt
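As an alternative to transposing, pandas also provides DataFrame.from_dict(), whose orient='index' option treats the outer keys as row labels directly. This is a sketch of that alternative, not the method used above, and the variable names cast and df are assumptions.

```python
import pandas as pd

cast = {
    0: {'name': 'Millie', 'age': 15, 'city': 'London'},
    1: {'name': 'Finn', 'age': 16, 'city': 'Vancouver'},
    2: {'name': 'Gaten', 'age': 16, 'city': 'New York'}
}

# orient='index' makes each outer key a row and each inner key a column
df = pd.DataFrame.from_dict(cast, orient='index')
print(df)
```

The order of the columns in the printed result may differ between pandas versions.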

So, we have seen multiple variations of creating the DataFrame from Dictionary.

That’s it.
