Jupyter Notebook is a tool “used to develop and present data science projects.” It integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and rich media.
The interactive workflow promotes iterative, rapid development, which has made notebooks an increasingly popular choice in contemporary data science, data analysis, and science at large.
Getting Started With Jupyter Notebook
You can install Jupyter Notebook by installing Anaconda. I am using a MacBook, but the procedure on Windows is almost the same.
Anaconda is one of the most popular Python distributions for data science and machine learning, and it comes pre-loaded with many of the most widely used libraries and tools.
Anaconda lets you hit the ground running in your own fully stocked data science workshop without the hassle of managing many separate installations or worrying about OS-specific dependencies.
Installation of Jupyter Notebook
The installation process is straightforward. After you install Anaconda and open it, you will see the screen below.
This is Anaconda Navigator. The second option is Jupyter Notebook, which we need to launch to work with Python. When you launch it, a terminal opens and Jupyter Notebook starts in your browser at the local URL http://localhost:8888/tree.
Congratulations!! You have installed it successfully.
Creating Your First Notebook
First, you need to select a project folder. I have selected mine in the desktop/code/pyt folder. This project uses Python 3, so select Python 3 when you create the notebook.
Now create a new notebook file, which will have the .ipynb extension.
What is an ipynb File?
Each .ipynb file is a text file that describes your notebook’s contents in JSON format. Each cell and its contents, including image attachments converted into strings of text, are listed there along with some metadata.
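To make the JSON structure concrete, here is a minimal sketch that builds a tiny notebook as a Python dictionary and round-trips it through JSON. The field names follow the version 4 notebook format as I understand it, and the cell contents are made up for illustration, so treat it as a rough picture rather than a full specification.

```python
import json

# A minimal notebook in the version 4 layout; the cell contents are made up.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["19 + 2"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# An .ipynb file is this structure serialized as JSON text.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back gives plain dicts and lists that can be inspected.
loaded = json.loads(ipynb_text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Opening a real .ipynb file in a text editor shows the same kind of structure, usually with more metadata and with any cell outputs stored alongside the code.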
Jupyter Notebook Interface
I have created a Jupyter Notebook file called DataScience.ipynb. It looks like the image below.
In Jupyter Notebook, cells form the body of the notebook. In the screenshot of a new notebook below, the box with a green outline is an empty cell. There are two main cell types that we will cover:
- A code cell contains code to be executed in the kernel and displays its output below.
- A Markdown cell contains text formatted using Markdown and displays its output in place when it is run.
The first cell in a new notebook is always a code cell. Let’s test it out with a simple example. Type the following code inside the cell.
19 + 2
Now click the Run button in the toolbar above or press Ctrl + Enter. The result should look like the one below.
The output, 21, is shown immediately below the cell. This is the beauty of Jupyter Notebook.
Afterward, you can add, remove, or edit cells according to your requirements. Also, don’t forget to insert explanatory text, titles, and subtitles to clarify your code. That is what makes the notebook a real notebook in the end.
Running Jupyter Notebook The Pythonic Way: Pip
If you don’t want to install Anaconda, you can use pip instead. First, ensure you have the latest version of pip.
If you have installed Python, you will typically already have it. If your pip version is old, upgrade it by running the command for your operating system.
# On Windows
python -m pip install -U pip setuptools
# On OS X or Linux
pip install -U pip setuptools
Once you have pip installed on your machine, you can just run the following command.
# Python 2
pip install jupyter
# Python 3
pip3 install jupyter
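If you want to confirm that the installation worked, one option is to ask Python which versions it can see. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later). The helper name installed_version is my own, and the package names checked are an assumption about what the install pulls in.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names are assumptions; a missing package prints None instead of failing.
for name in ("pip", "jupyter", "notebook"):
    print(name, installed_version(name))
```

A value of None for a package suggests the install did not complete in the Python environment you are running.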
Now that you know what you will be working with and have installed it, it’s time to get started!
Run the following command to open up the application.
jupyter notebook
Then you’ll see the application opening in a web browser at the address: http://localhost:8888.
So, we have seen both ways to install Jupyter Notebook.
Data Analysis With Pandas and Jupyter Notebook
Download the dataset for our example. You need to visit the following link.
It contains data on Summer Olympic medallists from 1896 to 2008 and is publicly available.
Now, open that link and save the file as data.csv in the same project folder as the Jupyter Notebook file. Make sure that both are in the same directory.
Next, import that file and skip its first four rows using the following code. The skiprows parameter specifies either the line numbers to skip (0-indexed, as a list) or the number of lines to skip (an int) at the start of the file.
Here, we have written three lines of code to load the data and display the first rows.
import pandas as pd
olympicsData = pd.read_csv('data.csv', skiprows=4)
olympicsData.head()
If you are getting the same data, then perfect: you are on the right track and have successfully imported the data.
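To see the two forms of skiprows side by side, here is a small sketch that reads made-up CSV text from memory instead of data.csv. The junk lines, column names, and values are invented for illustration and are not taken from the Olympics file.

```python
import io

import pandas as pd

# Made-up CSV text: two junk lines, then a header and two data rows.
raw = "junk line 1\njunk line 2\nCity,Sport\nAthens,Aquatics\nParis,Fencing\n"

# int form: skip the first 2 lines of the file.
by_count = pd.read_csv(io.StringIO(raw), skiprows=2)

# list form: skip specific 0-indexed line numbers.
by_index = pd.read_csv(io.StringIO(raw), skiprows=[0, 1])

# The list form can also drop lines that are not at the start,
# here the first data row (line 3) in addition to the junk lines.
without_first_row = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 3])

print(by_count.equals(by_index))
print(without_first_row)
```

The int form is the simpler choice when the unwanted lines are all at the top of the file, as in this tutorial.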
Access the DataFrames in Jupyter Notebook
The next step is to access the DataFrame built from that data. Type the following code inside the notebook cell and press Ctrl + Enter.
olympicsData
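As a result, you will see a truncated view showing the first and last rows of the DataFrame. Older pandas versions show the first 30 and last 30 rows; recent versions show fewer by default.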
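If you want a specific number of rows regardless of the display defaults, head() and tail() let you choose the slice explicitly. The sketch below uses a made-up stand-in DataFrame rather than the Olympics data, so its column names and values are assumptions.

```python
import pandas as pd

# Stand-in for olympicsData: 100 made-up rows, enough for the display to be truncated.
df = pd.DataFrame({"Edition": range(1896, 1996), "Medal": ["Gold"] * 100})

# In a notebook, a cell whose last line is just `df` renders the DataFrame.
# head() and tail() select rows explicitly instead of relying on display settings.
first_rows = df.head(30)
last_rows = df.tail(30)

# The display.max_rows option controls when pandas truncates the rendered output.
print(pd.get_option("display.max_rows"))
print(df.shape, first_rows.shape, last_rows.shape)
```

In a notebook you would typically put first_rows or last_rows on the last line of a cell to render it as a table.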
Access the Series in Notebook
A Series is a one-dimensional array of indexed data. To access a Series from the Olympics data, we pass the column name as an index to the DataFrame. Let’s say we need to see all the sports in the Olympics. Write the following code inside the cell.
See the output below.
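The column lookup described above can be sketched on a small stand-in DataFrame. The column name Sport and the sample values are assumptions for illustration; check the header of your own data.csv for the exact column names.

```python
import pandas as pd

# Stand-in for olympicsData; the column names and values are assumed.
df = pd.DataFrame(
    {
        "Sport": ["Aquatics", "Athletics", "Aquatics", "Fencing"],
        "Medal": ["Gold", "Silver", "Bronze", "Gold"],
    }
)

# Indexing a DataFrame with a single column name returns a Series.
sports = df["Sport"]
print(type(sports).__name__)

# unique() lists each distinct value once, in order of first appearance.
unique_sports = sports.unique().tolist()
print(unique_sports)
```

With the real data, the equivalent lookup would be olympicsData['Sport'], assuming the file has a column with that name.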
So, this is how you can access data from a CSV file and use pandas data structures such as DataFrame and Series to perform operations on that data.
That’s it for this tutorial.