How To Create A Dataframe Python

In this post, you'll learn how to create a Pandas dataframe from lists, including how to work with single lists, multiple lists, and lists of lists! You'll also learn how to create an index and provide column names. These are important skills when working with data coming from different sources, such as web scraping.

The Quick Answer: Use the DataFrame() Class to Create Dataframes
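
As a minimal sketch of the quick answer, the snippet below builds a dataframe from two lists. The sample values are the ones used later in this post, and the variable names are only illustrative:

```python
import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']
ages = [32, 32, 36, 31]

# Pair the lists up row by row, then label the resulting columns
df = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])

print(df)
```

Each of the pieces used here (DataFrame(), zip(), and columns=) is covered in more detail in the sections that follow.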


Let's take a look at what you'll learn!

The Pandas DataFrame() Object – A Quick Overview

The Pandas DataFrame class is described as two-dimensional, size-mutable, potentially heterogeneous tabular data. In plain language, this means:

  • two-dimensional means that it contains rows and columns
  • size-mutable means that its size can change
  • potentially heterogeneous means that it can contain different datatypes
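
These three properties can be checked on a small dataframe. The following is only a sketch with made-up sample values; it uses a dictionary to build the dataframe, which is covered at the end of this post:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Katie', 'Nik'], 'Age': [32, 32]})

# Two-dimensional: shape is (rows, columns)
print(df.shape)

# Size-mutable: adding a column changes the shape of the same object
df['Location'] = ['London', 'Toronto']
print(df.shape)

# Potentially heterogeneous: each column carries its own datatype
print(df.dtypes)
```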

You can create an empty dataframe by simply writing df = pd.DataFrame(), which creates an empty dataframe object. We've covered creating an empty dataframe before, and how to append data to it. But in this tutorial, you won't be creating an empty dataframe.

Instead, you can use the data= parameter, which is positionally the first argument. The data= parameter can contain an ndarray, a dictionary, a list, or a list-like object. Because of these many options, let's see how you can create a Pandas dataframe from lists!
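
To make the options for data= concrete, here is a small sketch that builds the same two-by-two table from a list, a NumPy ndarray, and a dictionary. The values and variable names are illustrative assumptions:

```python
import numpy as np
import pandas as pd

values = [[1, 2], [3, 4]]

from_list = pd.DataFrame(values)                  # passed positionally
from_keyword = pd.DataFrame(data=values)          # passed by keyword, same result
from_array = pd.DataFrame(np.array(values))       # NumPy ndarray
from_dict = pd.DataFrame({0: [1, 3], 1: [2, 4]})  # dictionary of columns

# Positional and keyword forms produce identical dataframes
print(from_list.equals(from_keyword))

# All four hold the same values
print(from_array.values.tolist(), from_dict.values.tolist())
```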

Create a Pandas Dataframe from a Single List

Now that you have an understanding of what the Pandas DataFrame class is, let's take a look at how we can create a Pandas dataframe from a single list.

Recall that the data= parameter is the parameter used to pass in data. Because the data= parameter is the first parameter, we can simply pass in a list without needing to specify the parameter.

Let's take a look at passing in a single list to create a Pandas dataframe:

import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']
df = pd.DataFrame(names)

print(df)

This returns a dataframe that looks like this:

       0
0  Katie
1    Nik
2  James
3   Evan

Specifying Column Names when Creating a Pandas Dataframe

We can see that Pandas has successfully created our dataframe, but that our column has no meaningful name: it is simply labelled 0. Since Pandas doesn't know what to call the column, we need to be more explicit and use the columns= argument. The columns= argument takes a list-like object containing the column headers, in the order in which the columns are created.

Let's re-create our dataframe and specify a column name:

import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']
df = pd.DataFrame(names, columns=['Name'])

print(df)

This now returns a clearly-labelled dataframe that looks like the below:

    Name
0  Katie
1    Nik
2  James
3   Evan
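
Row labels can be set in a similar way with the index= parameter of DataFrame(). The sketch below assumes a hypothetical list of labels called ids, which is not part of the examples in this post:

```python
import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']

# 'ids' is a hypothetical list of row labels, one per name
ids = ['a', 'b', 'c', 'd']
df = pd.DataFrame(names, columns=['Name'], index=ids)

print(df)

# Rows can now be looked up by label instead of by position
print(df.loc['b', 'Name'])
```

Without index=, Pandas falls back to the default 0, 1, 2, ... labels seen in the outputs above.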

In the next section, you'll learn how to create a Pandas dataframe from multiple lists, by using the zip() function.

Create a Pandas Dataframe from Multiple Lists with Zip

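Let's say you have more than a single list and want to pass them in. Unfortunately, simply passing the lists in as separate arguments doesn't work: only the first positional argument is treated as data, while the second and third are read as the index and the column names. Because of this, we need to combine our lists, in order, into a single object.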
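
To see the problem in practice, here is a small sketch of what happens when three lists are passed as separate positional arguments. The exact wording of the error message may differ between Pandas versions:

```python
import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']
ages = [32, 32, 36, 31]
locations = ['London', 'Toronto', 'Atlanta', 'Madrid']

try:
    # Positionally, this means data=names, index=ages, columns=locations
    pd.DataFrame(names, ages, locations)
    failed = False
except ValueError as error:
    # One column of data cannot fill the four column labels requested
    failed = True
    print(f'ValueError: {error}')
```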

The easiest way to do this is to use the built-in zip() function. The function takes two or more iterables, like lists, and pairs up their items position by position, in the same way that a zipper does!
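
As a quick illustration of what zip() produces on its own, before Pandas is involved, consider this sketch with shortened sample lists:

```python
names = ['Katie', 'Nik']
ages = [32, 32]
locations = ['London', 'Toronto']

# zip() returns a lazy iterator; list() collects it into a list of tuples
rows = list(zip(names, ages, locations))

print(rows)
# [('Katie', 32, 'London'), ('Nik', 32, 'Toronto')]
```

Each tuple holds the items that shared a position in the original lists, which is exactly the row-by-row shape a dataframe needs.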

Let's see how this can work by creating a Pandas dataframe from two or more lists:

# Create a Pandas Dataframe from Multiple Lists using zip()
import pandas as pd

names = ['Katie', 'Nik', 'James', 'Evan']
ages = [32, 32, 36, 31]
locations = ['London', 'Toronto', 'Atlanta', 'Madrid']
zipped = list(zip(names, ages, locations))

df = pd.DataFrame(zipped, columns=['Name', 'Age', 'Location'])

print(df)

Let's see what this dataframe looks like by printing it out:

    Name  Age Location
0  Katie   32   London
1    Nik   32  Toronto
2  James   36  Atlanta
3   Evan   31   Madrid

Let's also break down what we've done here:

  1. We created three lists, containing names, ages, and locations, holding our ordered data
  2. We then created a Python zip() object, which paired up the items of names, ages, and locations by position. We then applied the list() function to turn this zip object into a list of tuples
  3. We then passed this list of tuples into our DataFrame() class, along with a list of our column names to create our dataframe.
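
One thing to watch for with this approach is lists of different lengths: zip() stops at the shortest list, so trailing items are silently dropped. The sketch below contrasts that with itertools.zip_longest() from the standard library, which pads the gaps with None instead. The shortened lists are made up for illustration:

```python
from itertools import zip_longest

import pandas as pd

names = ['Katie', 'Nik', 'James']
ages = [32, 32]  # one value short

# zip() stops at the shortest list, so 'James' is dropped
short = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])

# zip_longest() keeps every row and fills the missing age with None
full = pd.DataFrame(list(zip_longest(names, ages)), columns=['Name', 'Age'])

print(len(short), len(full))
print(full)
```

Checking that your lists have the same length before zipping them is usually the simplest safeguard.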

Want to learn more about the zip() function? Check out my in-depth tutorial on zipping two or more lists in Python and pick up some fun tips and tricks along the way!

In the next section, you'll learn how to turn lists of lists into a Pandas dataframe!

Create a Pandas Dataframe from a List of Lists

There may be many times you encounter lists of lists, such as when you're working with web scraping data. Lists of lists are simply lists that contain other lists. They are also often called multi-dimensional lists. For example, a list of lists may look like this:

data = [['Katie', 32, 'London'], ['Nik', 32, 'Toronto']]

Lists of lists behave a little differently, as you're essentially adding data at what appears to be a row level, rather than at a column level, as we've been exploring so far.

Thankfully, Pandas is intelligent enough to figure out how to split each inner list into different columns for you.

Let's look at how we can create a Pandas dataframe from a list of lists:

# Create a Pandas Dataframe from a multi-dimensional list of lists
import pandas as pd

data = [['Katie', 32, 'London'], ['Nik', 32, 'Toronto'], ['James', 36, 'Atlanta'], ['Evan', 31, 'Madrid']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Location'])

print(df)

This returns the formatted dataframe below:

    Name  Age Location
0  Katie   32   London
1    Nik   32  Toronto
2  James   36  Atlanta
3   Evan   31   Madrid
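
Scraped data is not always this tidy, and some inner lists may be shorter than others. As a sketch of how Pandas treats that case, the made-up example below leaves out the location for one row; the gap is filled with a missing value rather than raising an error:

```python
import pandas as pd

# The second inner list has no location
data = [['Katie', 32, 'London'], ['Nik', 32]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Location'])

print(df)

# isna() flags the cell that Pandas had to fill in
print(df['Location'].isna().tolist())
```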

In the next section, you'll learn how to specify datatypes when creating a Pandas dataframe from a list.

Specifying Data Types with Pandas Dataframes from Lists

While Pandas can do a good job of identifying datatypes, specifying datatypes yourself can bring significant performance improvements when loading and maintaining your dataframe. Because of this, it's an important step when you're either noticing data being loaded incorrectly or you want to manage the memory used by your dataframe.

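Let's take a look at how we can do this in Pandas. The dtype= parameter of DataFrame() applies a single datatype to every column, so it can't be used to convert only the age column here: asking for int8 across columns that also hold strings can raise an error. Instead, we'll create the dataframe first and then use the .astype() method to convert the Age column to int8, in order to reduce the memory it uses.

import pandas as pd

data = [['Katie', 32, 'London'], ['Nik', 32, 'Toronto'], ['James', 36, 'Atlanta'], ['Evan', 31, 'Madrid']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Location'])

# Convert only the Age column; the string columns are left as they are
df = df.astype({'Age': 'int8'})

print(df)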

Now that we've specified our Pandas dataframe's datatypes, let's take a look at what it looks like:

    Name  Age Location
0  Katie   32   London
1    Nik   32  Toronto
2  James   36  Atlanta
3   Evan   31   Madrid
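
The printed dataframe looks the same regardless of the integer type, so the effect is easier to see by comparing the column's datatype and memory use. The following is a sketch using the .astype() and .memory_usage() methods; the exact "before" figure depends on your platform's default integer size:

```python
import pandas as pd

data = [['Katie', 32, 'London'], ['Nik', 32, 'Toronto'],
        ['James', 36, 'Atlanta'], ['Evan', 31, 'Madrid']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Location'])

# Memory used by the Age values with the default integer type
before = df['Age'].memory_usage(index=False)

# Convert only the Age column to 8-bit integers
small = df.astype({'Age': 'int8'})
after = small['Age'].memory_usage(index=False)

print(df['Age'].dtype, before)
print(small['Age'].dtype, after)
```

Keep in mind that int8 only holds values from -128 to 127, which is fine for ages but too small for many other columns.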

In the next section, you'll learn how to create a Pandas dataframe from dictionaries with lists.

Want to learn more about Pandas? Check out all of my Pandas tutorials by clicking this link.

Create a Pandas Dataframe from Dictionaries with Lists

In this final section, you'll learn how to work with dictionaries that contain lists in order to produce a Pandas dataframe. This is something you'll often encounter when working with web API data, needing to convert complex dictionaries into simplified Pandas dataframes.

Since Pandas does allow for dictionaries to be passed into the data= parameter, there is little we actually need to do. Let's see just how well Pandas handles dictionaries to create dataframes:

import pandas as pd

dictionary = {
    'Name': ['Katie', 'Nik', 'James', 'Evan'],
    'Age': [32, 32, 36, 31],
    'Location': ['London', 'Toronto', 'Atlanta', 'Madrid']
}
df = pd.DataFrame(dictionary)

print(df)

Let's see what this dataframe looks like now:

    Name  Age Location
0  Katie   32   London
1    Nik   32  Toronto
2  James   36  Atlanta
3   Evan   31   Madrid

Here, we've passed in a dictionary that contains lists as the values. Pandas was even able to infer the column names from the keys of the dictionary!
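
Web APIs often return the opposite shape as well: a list of dictionaries, with one dictionary per record. As a sketch with made-up records, DataFrame() accepts that form too, using the keys as column names and leaving a missing value wherever a record lacks a key:

```python
import pandas as pd

# Hypothetical API-style records; the second one has no 'Location' key
records = [
    {'Name': 'Katie', 'Age': 32, 'Location': 'London'},
    {'Name': 'Nik', 'Age': 32},
]
df = pd.DataFrame(records)

print(df)
```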

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Conclusion

In this post, you learned different ways of creating a Pandas dataframe from lists, including working with a single list, multiple lists with the zip() function, multi-dimensional lists of lists, and how to apply column names and datatypes to your dataframe.

To learn more about the Pandas dataframe object, check out the official Pandas documentation.

Source: https://datagy.io/pandas-dataframe-from-list/

Posted by: rhodescapassicer.blogspot.com
