Pandas is a software library written for the Python programming language for data manipulation and analysis. It is designed to make working with relational or labeled data easy and intuitive.
Key Features of Pandas
Here are the key features:
- Pandas provides a fast and efficient DataFrame object with default and customized indexing.
- It loads data from different file formats into in-memory objects.
- It aligns data by label and has integrated handling of missing data.
- It can reshape and pivot data sets.
- It supports label-based slicing, indexing, and subsetting of large datasets.
- Columns can be inserted into or deleted from a data structure.
- It supports data aggregation and transformation.
- It provides high-performance merging and joining of data.
- It includes time series functionality.
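A few of the features listed above can be sketched in a short script. This is only an illustration: the sales and regions tables, their column names, and the price of 100 per unit are hypothetical and made up for the example.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration.
sales = pd.DataFrame(
    {
        "city": ["Pune", "Pune", "Surat", "Surat"],
        "year": [2022, 2023, 2022, 2023],
        "units": [10, 15, 7, 9],
    }
)

# Data alignment and missing data: values are matched by label, and a
# label present on only one side produces NaN, which fillna() replaces.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["x", "y"])
total = (a + b).fillna(0)

# Reshaping: pivot the long table into one row per city, one column per year.
wide = sales.pivot(index="city", columns="year", values="units")

# Label-based slicing with .loc (both end labels are included).
subset = a.loc["x":"y"]

# Inserting and deleting columns.
sales["revenue"] = sales["units"] * 100  # hypothetical price of 100 per unit
sales = sales.drop(columns=["year"])

# Aggregation: total units per city.
per_city = sales.groupby("city")["units"].sum()

# Merging: join with a second table on the shared "city" column.
regions = pd.DataFrame(
    {"city": ["Pune", "Surat"], "state": ["Maharashtra", "Gujarat"]}
)
merged = sales.merge(regions, on="city", how="left")

print(total, wide, subset, per_city, merged, sep="\n\n")
```

Each step works on labels rather than positions, which is the common thread across these features.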
Install Pandas on Mac
Follow this step if you have not previously installed Pandas on your machine.
You can install via PyPI using the following command.
python3 -m pip install --upgrade pandas
To install or switch to a specific version, pin the version number as in the following command.
python3 -m pip install --upgrade pandas==0.23.0
Make sure you install it with the proper permissions, for example by using sudo if you are on Linux or Mac.
The standard Python distribution does not come with the Pandas module, so it has to be installed separately, for example with pip, the popular Python package installer. NumPy, which Pandas depends on, is installed along with it.
If you have installed a software distribution like Anaconda, then Pandas is already installed.
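Whichever route you took, a quick way to confirm that the installation worked is to import the library and print its version. This is a minimal check; the version string you see depends on what is installed on your machine.

```python
import pandas as pd

# If this import succeeds, Pandas is installed for this interpreter.
# The printed version string depends on your environment.
print(pd.__version__)
```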
Pandas Data Structure
Pandas deals with the following two data structures.
DataFrames
DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.
DataFrames come with the Pandas library and are defined as two-dimensional labeled data structures with columns of potentially different types.
Features of DataFrame
- Columns are potentially of different types
- Size-mutable
- Labeled axes (rows and columns)
- Can perform arithmetic operations on rows and columns
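The four properties above can be seen in a small sketch. The column names, row labels, and values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
df = pd.DataFrame(
    {"name": ["A", "B", "C"], "score": [80, 90, 70], "passed": [True, True, False]},
    index=["r1", "r2", "r3"],
)

# Columns can hold different types (text, integers, booleans).
print(df.dtypes)

# Size-mutable: add a column, then remove a row.
df["bonus"] = 5
df = df.drop(index="r3")

# Labeled axes: select a value by row label and column label.
print(df.loc["r2", "score"])  # 90

# Arithmetic on columns and rows.
df["total"] = df["score"] + df["bonus"]          # element-wise, per row
column_sum = df[["score", "bonus"]].sum(axis=0)  # one sum per column
row_sum = df[["score", "bonus"]].sum(axis=1)     # one sum per row
print(column_sum, row_sum, sep="\n\n")
```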
A pandas DataFrame can be created using the following constructor.
pandas.DataFrame(data, index, columns, dtype, copy)
Example
import pandas as pd
# Each inner list is one row: a name and an enrollment number
data = [['Krunal', 21], ['Rushikesh', 22], ['Hardik', 30]]
# Build the DataFrame and label its two columns
df = pd.DataFrame(data, columns=['Name', 'Enrollment Number'])
print(df)
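Output
Printing df shows a table with three rows, the columns Name and Enrollment Number, and a default integer index running from 0 to 2.
In the above example, the data holds a Name and an Enrollment Number for each row, written as a plain Python list of lists (NumPy is not needed here).
Then, we passed that data to the DataFrame constructor and created a tabular data structure.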
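The example above uses only the data and columns parameters. As a rough sketch of the others, the snippet below also sets index and dtype, and builds the same table from a dictionary. The row labels 'a', 'b', 'c' and the numbers table are hypothetical choices for illustration.

```python
import pandas as pd

data = [['Krunal', 21], ['Rushikesh', 22], ['Hardik', 30]]

# index sets the row labels; 'a', 'b', 'c' are hypothetical labels.
df = pd.DataFrame(
    data,
    index=['a', 'b', 'c'],
    columns=['Name', 'Enrollment Number'],
)
print(df.loc['b', 'Name'])  # Rushikesh

# dtype forces a single type for all columns, so it suits all-numeric data.
numbers = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'], dtype=float)
print(numbers.dtypes)

# A dictionary of lists is another common input: keys become column names.
from_dict = pd.DataFrame(
    {'Name': ['Krunal', 'Rushikesh', 'Hardik'], 'Enrollment Number': [21, 22, 30]}
)
print(from_dict)
```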
Series
A Series is a one-dimensional labeled array capable of holding data of any type, such as integers, strings, floats, or Python objects. The axis labels are collectively called the index.
Labels need not be unique but must be a hashable type. The object supports integer and label-based indexing and provides various methods for performing operations involving the index.
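To make the two kinds of indexing concrete, here is a small sketch. The labels and values are hypothetical; .loc selects by label, .iloc selects by integer position, and reindex is one of the index-related methods.

```python
import pandas as pd

# Hypothetical values and labels, made up for this illustration.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.loc["b"])   # label-based lookup: 20
print(s.iloc[0])    # integer (position-based) lookup: 10

# Labels need not be unique: a repeated label selects every match.
dup = pd.Series([1, 2, 3], index=["x", "y", "x"])
print(dup.index.is_unique)  # False
print(dup.loc["x"])         # a Series holding both "x" entries

# An index-related method: reindex reorders by label, and a label that
# does not exist in the original ("z") gets NaN.
reordered = s.reindex(["c", "a", "z"])
print(reordered)
```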
The syntax of Series in Pandas is the following.
pandas.Series(data, index, dtype, copy)
Example
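import pandas as pd
# A plain Python list becomes the values of the Series
data = [1, 2, 3, 4, 5, 6, 7]
# With no index given, the labels default to 0, 1, 2, ...
series = pd.Series(data)
print(series)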
Run the file and see the output.
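The example passes only data, so the labels default to integers starting at 0. As a sketch of the index and dtype parameters, the variant below labels the same values with day names (a hypothetical choice) and stores them as floats.

```python
import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # hypothetical labels

# index supplies one label per value; dtype sets the element type.
s = pd.Series(data, index=days, dtype=float)
print(s["Wed"])  # 3.0

# A dictionary also works as input: its keys become the labels.
from_dict = pd.Series({"x": 1, "y": 2})
print(from_dict)
```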
That’s it.
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.