- Write Python statements that import a module and that call and chain methods
- Use slices, lists, tuples, dictionaries, and list comprehensions
- Use JupyterLab to open an existing Notebook or to start a new Notebook
- Edit and run the cells of a Notebook
- Create and modify the headings in a Notebook
- Use JupyterLab’s Tab completion and tooltip features
- Use JupyterLab’s magic commands to time statements and display the current variables
- Use Pandas to create DataFrames and plots
- Use Seaborn to create plots
- Use the Pandas read methods to import data into a DataFrame
- Identify the data problems in a DataFrame by using the info(), unique(), nunique(), and value_counts() methods
- Add datetime, string, and numeric columns that are derived from other columns to a DataFrame
- Group and aggregate the data in a DataFrame, and pivot the data or create a pivot table from it
- Generate date ranges at different time intervals
- Use scatter plots or heatmaps to identify correlations
- Use Scikit-learn to create and validate a multiple regression model from the variables that you select, and use the model to make predictions
Lesson 1: Introduction to Python for Data Analysis
This chapter starts by introducing you to data analysis with Python and by reviewing the Python coding skills that you’ll need for data analysis. Then, it shows you how to use JupyterLab as your IDE. Last, this chapter introduces you to the case studies for this book because they are a critical part of the learning process.
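The Python skills reviewed in this lesson can be sketched in a few lines. This is only an illustration: the records, names, and values below are hypothetical and are not taken from the case studies.

```python
from statistics import mean  # import a name from a standard-library module

# Hypothetical sample data: a list of (label, value) tuples.
records = [("CA", 39.0), ("TX", 30.0), ("NY", 19.6), ("FL", 22.2)]

# Slice: the first two records.
first_two = records[:2]

# Dictionary built from the list of tuples.
values_by_label = dict(records)

# List comprehension: labels whose value is greater than 20.
large = [label for label, value in records if value > 20]

# Method chaining: strip whitespace, then convert to title case.
title = "  data analysis ".strip().title()

average = mean(values_by_label.values())
print(first_two, large, title, average)
```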
Lesson 2: The Pandas Essentials for Data Analysis
The Pandas module is the primary module for Python data analysis, and it is installed as part of the Anaconda distribution. This module provides methods for getting the data into a DataFrame and for cleaning, preparing, analyzing, and visualizing that data. In this chapter, you’ll learn the Pandas essentials that you’ll use for almost every Python analysis that you undertake.
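As a rough sketch of what those essentials look like, the following builds a small DataFrame, filters it, and summarizes it. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Fresno", "Austin", "Fresno", "Boston"],
    "sales": [120, 340, 95, 210],
})

# Inspect the first rows.
print(df.head())

# Filter the rows with a Boolean condition.
high_sales = df[df["sales"] > 100]

# Summarize: total sales for each city.
total_by_city = df.groupby("city")["sales"].sum()
print(total_by_city)
```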
Lesson 3: The Pandas Essentials for Data Visualization
As you will see, Pandas works okay for getting quick visualizations as you prepare and analyze the data. But it has some quirks that can make it difficult to use. And it doesn’t provide all the features that you need for refining your visualizations so they’re suitable for presentation. To get around those limitations, you need to use a module that’s specifically designed for data visualization, like the Seaborn module that’s presented in the next chapter.
Lesson 4: The Seaborn Essentials for Data Visualization
As chapter 3 pointed out, the Pandas plot() method is okay for creating quick plots as you clean, prepare, and analyze the data. But you’re going to want to use a data visualization library like Seaborn for most of your plots. As you will learn in this chapter, Seaborn not only makes it easier to prepare a wider variety of plots, but it also lets you enhance those plots so they’re suitable for presentation.
Lesson 5: How to Get the Data
This chapter shows you how to get the data that you’re going to analyze. That not only includes finding the data on a website, but also importing it into a Pandas DataFrame. Once the data is in a DataFrame, you can use the methods of the DataFrame to clean, prepare, analyze, and plot the data.
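A minimal sketch of the import step, assuming the data is in CSV format. To keep the example self-contained, an in-memory string stands in for a downloaded file; the columns and values are hypothetical. The read_csv() function also accepts a file path or a URL in place of the buffer.

```python
import io

import pandas as pd

# In-memory text standing in for a downloaded CSV file (hypothetical data).
csv_text = """year,state,count
2019,CA,10
2020,CA,20
2020,OR,5
"""

# Read the CSV data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)
print(df.head())
```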
Lesson 6: How to Clean the Data
Like it or not, most of the data that you work with will have data problems that need to be fixed before you can analyze the data. That’s true whether you get the data from a third-party website or you’re using data from your own company’s databases or spreadsheets. In fact, some estimates say that cleaning the data takes from 20 to 25 percent of the time in a typical analysis project. What’s worse is that if you don’t clean the data, the results of your analysis are likely to be misleading or inaccurate. That’s why this chapter presents the skills that you’ll need for cleaning the various types of data that you’ll be working with.
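The following sketch shows one way to spot and fix a few common problems: inconsistent capitalization, a missing value, and a duplicate row. The data is hypothetical, and the cleaning steps are just one reasonable choice for this particular data.

```python
import pandas as pd

# Hypothetical data with inconsistent case, a duplicate row, and a missing value.
df = pd.DataFrame({
    "state": ["CA", "ca", "OR", "OR", None],
    "acres": [100.0, 250.0, 75.0, 75.0, 40.0],
})

# Identify the problems.
df.info()                           # column types and non-null counts
print(df["state"].unique())         # distinct values, including the missing one
print(df["state"].nunique())        # number of distinct non-missing values
print(df["state"].value_counts())   # frequency of each value

# Fix the problems: drop missing states, normalize case, drop duplicate rows.
cleaned = (
    df.dropna(subset=["state"])
      .assign(state=lambda d: d["state"].str.upper())
      .drop_duplicates()
)
print(cleaned)
```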
Lesson 7: How to Prepare the Data
This chapter shows how to prepare a DataFrame for analysis. To do that, you’ll learn how to add and modify columns; apply functions and lambda expressions; set, unstack, and reset indexes; and combine the data in two or more DataFrames. You’ll also learn how to handle the SettingWithCopyWarning, which you’ve probably encountered from time to time.
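Several of these preparation skills are sketched below under stated assumptions: the two DataFrames, their column names, and the 1.08 multiplier are all hypothetical and exist only to make the example runnable.

```python
import pandas as pd

# Hypothetical data for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "order_date": ["2024-01-05", "2024-02-17", "2024-03-02"],
    "amount": [50.0, 20.0, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "name": ["ann lee", "bo chen"],
})

# Derived datetime, numeric, and string columns.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["month"] = orders["order_date"].dt.month
orders["amount_with_tax"] = orders["amount"].apply(lambda x: round(x * 1.08, 2))
customers["name"] = customers["name"].str.title()

# Combine the two DataFrames, then set and reset the index.
merged = orders.merge(customers, on="customer_id", how="left")
indexed = merged.set_index("order_id")
restored = indexed.reset_index()
print(restored)
```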
Lesson 8: How to Analyze the Data
In the previous chapter, you learned how to prepare the data for analysis. Now, in this chapter, you’ll learn how to analyze the data that you’ve prepared. As you will see, however, there’s a lot of overlap between preparation and analysis. That’s why this chapter presents some of the preparation skills that are closely related to the analytical skills.
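As an illustration of the kind of analysis this lesson covers, the sketch below groups and aggregates a small DataFrame, then reshapes it with pivot() and pivot_table(). The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 15, 20, 30],
})

# Group by region and aggregate with more than one function.
totals = df.groupby("region")["sales"].agg(["sum", "mean"])

# Pivot: one row per region, one column per year.
wide = df.pivot(index="region", columns="year", values="sales")

# Pivot table: like pivot(), but it aggregates when index/column pairs repeat.
table = df.pivot_table(index="region", columns="year", values="sales", aggfunc="sum")

print(totals)
print(wide)
```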
Lesson 9: How to Analyze Time-Series Data
In the previous chapter, you learned the critical skills for analyzing data. Now, this chapter expands on that by showing you how to analyze data that is indexed by dates and times. Although the general analysis techniques stay the same, many of the operations that you do with time-series data are unique.
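A brief sketch of working with a date-indexed Series follows. The values are hypothetical, and the daily and month-start intervals are just two of the frequencies that date_range() supports.

```python
import pandas as pd

# Generate date ranges at two different intervals.
days = pd.date_range("2024-01-01", periods=6, freq="D")           # daily
month_starts = pd.date_range("2024-01-01", periods=3, freq="MS")  # month start

# A hypothetical series indexed by date.
ts = pd.Series([1, 2, 3, 4, 5, 6], index=days)

# Select a span of dates by label (both endpoints are included).
span_total = ts["2024-01-02":"2024-01-03"].sum()

# Resample the daily values into two-day totals.
two_day_totals = ts.resample("2D").sum()
print(two_day_totals)
```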
Lesson 10: How to Make Predictions with a Linear Regression Model
This chapter begins by showing you how to use the Pandas and Seaborn libraries to find correlations between variables. Then, it shows how to use the Scikit-learn library to create a linear regression model and use it to make predictions. Finally, it shows how to use Seaborn to automatically create and plot a linear regression model.
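The first two steps might look something like the sketch below. The data is hypothetical and follows y = 2x + 1 exactly so that the fitted values are easy to check; real data would include noise, and the plotting step with Seaborn is omitted here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data that follows y = 2x + 1 exactly.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = 2 * df["x"] + 1

# Pearson correlation between the two variables.
r = df["x"].corr(df["y"])

# Fit a simple linear regression model with one independent variable.
model = LinearRegression()
model.fit(df[["x"]], df["y"])

# Predict y for a new x value.
prediction = model.predict(pd.DataFrame({"x": [6.0]}))[0]
print(r, model.coef_[0], model.intercept_, prediction)
```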
Lesson 11: How to Make Predictions with a Multiple Regression Model
In the previous chapter, you learned how to make predictions with a simple regression model. In the real world, though, the dependent variable is usually affected by more than one independent variable. That’s why this chapter shows how to create a multiple regression model and use it to make predictions. Because the model accounts for more than one variable, it often leads to more accurate predictions.
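A minimal sketch of a multiple regression workflow with Scikit-learn, assuming two independent variables. The data is synthetic and noise-free (y = 3·x1 − 2·x2 + 5), so the model can recover the coefficients almost exactly; the variable names and the 80/20 split are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data for illustration only.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "x1": rng.uniform(0, 10, 50),
    "x2": rng.uniform(0, 10, 50),
})
y = 3 * X["x1"] - 2 * X["x2"] + 5

# Hold out 20% of the rows to validate the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training data, then score it on the test data.
model = LinearRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)   # R-squared on the held-out rows

# Use the model to make predictions.
predictions = model.predict(X_test)
print(model.coef_, model.intercept_, score)
```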