Exploratory Data Analysis in Pandas | Python Pandas Tutorials

Published 2023-06-06
Take my Full Python Course Here: www.analystbuilder.com/courses/pandas-for-data-ana…

In this series we will be walking through everything you need to know to get started in Pandas! In this video, we learn about Exploratory Data Analysis in Pandas.

Dataset in GitHub:
github.com/AlexTheAnalyst/PandasYouTubeSeries/blob…

Code in GitHub: github.com/AlexTheAnalyst/PandasYouTubeSeries/blob…

Favorite Pandas Course:
Data Analysis with Pandas and Python - bit.ly/3KHMLlu
____________________________________________

SUBSCRIBE!
Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
____________________________________________

RESOURCES:

Coursera Courses:
📖Google Data Analyst Certification: coursera.pxf.io/5bBd62
📖Data Analysis with Python - coursera.pxf.io/BXY3Wy
📖IBM Data Analysis Specialization - coursera.pxf.io/AoYOdR
📖Tableau Data Visualization - coursera.pxf.io/MXYqaN

Udemy Courses:
📖Python for Data Analysis and Visualization- bit.ly/3hhX4LX
📖Statistics for Data Science - bit.ly/37jqDbq
📖SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
📖Tableau A-Z - bit.ly/385lYvN

Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
____________________________________________

BECOME A MEMBER -

Want to support the channel? Consider becoming a member! I do monthly livestreams and you get some awesome emojis to use in chat and comments!

youtube.com/channel/UC7cs8q-gJRlGwj4A8OmCmXg/join
____________________________________________

Websites:
💻Website: AlexTheAnalyst.com
💾GitHub: github.com/AlexTheAnalyst
📱Instagram: @Alex_The_Analyst
____________________________________________

0:00 Intro
1:51 First Look at Data
3:45 Info()
4:40 Describe()
5:47 Counting all Null Values
7:09 Count of Unique Values
8:15 Sorting on Values
10:40 Correlation between Columns
11:53 Heatmap using Seaborn
14:43 Grouping Data
25:02 Visualizing Grouped Data
26:17 Boxplots for Outliers
29:07 Data Types of Columns
30:41 Outro
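
The non-plotting chapters above map onto a handful of pandas calls. The following is a minimal sketch, not the code from the video: the DataFrame is a hypothetical toy stand-in for the video's dataset, and its column names and values are made up for illustration. It assumes pandas 1.5 or later, where corr() accepts numeric_only.

```python
import pandas as pd

# Hypothetical toy data standing in for the video's dataset.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "2022 Population": [100.0, 250.0, 80.0, None, 40.0],
    "Area": [10, 30, 8, 12, 5],
})

# Info(): prints column dtypes and non-null counts.
df.info()

# Describe(): summary statistics for the numeric columns.
summary = df.describe()

# Counting all null values, per column.
null_counts = df.isnull().sum()

# Count of unique values, per column.
unique_counts = df.nunique()

# Sorting on values: largest population first, top three rows.
top = df.sort_values(by="2022 Population", ascending=False).head(3)

# Correlation between columns (text columns are skipped explicitly).
corr = df.corr(numeric_only=True)

# Grouping data: mean of each numeric column per continent.
grouped = df.groupby("Continent").mean(numeric_only=True)

# Data types of columns.
dtypes = df.dtypes

print(null_counts, unique_counts, top, corr, grouped, dtypes, sep="\n\n")
```

The heatmap and boxplot chapters are left out here because they need seaborn and matplotlib in addition to pandas.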

All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for.

All Comments (21)
  • Hello, at minute 24:24, I managed to reverse the range of column names using [5:13][::-1]. The expression [::-1] is used to reverse ranges and it is very useful: df2 = df.groupby('Continent')[df.columns[5:13][::-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False) df2 Thank you very much, Mr. Alex, for these tutorials.
  • @pbp7
    Man, “Oceania” was so funny 😂, tks for the class!
  • @JW-pu1uk
    This is absolutely top tier content. I can't stress this enough to people new, or going into the DA/DS field: you WILL be exploring and cleaning data sets much more than you will be visualizing and building models. Thanks for this, Alex!
  • @satrapech6107
    The correction for df.corr() is (with numpy imported as np): numeric_columns = df.select_dtypes(include=[np.number]); correlation_matrix = numeric_columns.corr(); correlation_matrix
  • @DEDE-ix9lg
    I always enjoy a video from Alex. He makes some of the best videos, while some other channels can be a real headache.
  • @AlastorGarcia
    Thanks Alex! Right now I'm applying to my first DA job and you have no idea how useful your videos have been for me!!
  • Oceania is one of the 7 continents (North America, South America, Europe, Asia, Africa, Oceania, Antarctica). It's basically Australia and the countries (islands) around it. Hope that helps!
  • I just finished all the videos in your bootcamp playlist a few hours ago and I'm excited to do this again.
  • @kartikgupta370
    We can also write this to save time writing all the column names in the list "df2 = df.groupby('Continent')[df.columns[12:4:-1]].mean(numeric_only=True).sort_values(by='2022 Population', ascending=False) "
  • @sj1795
    EXCELLENT SUPERB video!! I can't believe it--I'm 6/7 videos away from the end of your FANTASTIC bootcamp series! Wahoo! I learned a lot in this video. :) As for "ending on a low note", hardly Alex lol All your content is uplifting and rewarding! As always, THANK YOU!
  • Hi Alex, thank you so much for your support for freshers in the field of data analytics.
  • @Inc0gnit030
    I really enjoyed this introduction to Pandas! Keep up the good work!
  • Hello, 100000000 thanks for sharing! For the correlation part at 11 min: df.corr(numeric_only=True) # pass the numeric_only param to avoid the error
  • @MaximKazartsev
    Alex, thank you for this great video and everything you do! To avoid manually ordering the population years, you can wrap df.columns in reversed(). The whole construction looks like df2 = df.groupby('Continent')[list(reversed(df.columns[5:13]))].mean().sort_values(by='2022 Population', ascending=False) And it works )
  • @staquatica1607
    I got some errors (using PyCharm) that I solved by using "numeric_only=True". For instance: df.corr(numeric_only=True) and df.groupby("Continent").mean(numeric_only=True)
  • Hello, Alex! Once again, thanks a lot for all your hard work! At 13:10 I got the error "ValueError: 'box_aspect' and 'fig_aspect' must be positive". I solved it by putting the plt.rcParams line BEFORE the sns.heatmap call. The other problem was that some functions didn't work until I added the parameter numeric_only=True, e.g., df.corr(numeric_only=True) or .mean(numeric_only=True). Hope it can help someone!
  • @toygar8699
    For those who get an error in the heatmap: import matplotlib.pyplot as plt; plt.rcParams['figure.figsize'] = (20, 7); numeric_columns = df.select_dtypes(include=['float']); sns.heatmap(numeric_columns.corr(), annot=True); plt.show()
  • Thank you so much for this. I really enjoyed it and learned a lot of what I had forgotten a few years ago.
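
Two fixes recur in the comments above: passing numeric_only=True to corr() and mean(), and reversing a slice of df.columns instead of typing the year columns by hand. The following is a minimal sketch of both on a hypothetical frame. The column names, values, and the [2:5] slice are made up for illustration; the [5:13] slice in the comments applies to the video's dataset, not to this toy data.

```python
import pandas as pd

# Hypothetical toy data; not the dataset used in the video.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Continent": ["Asia", "Asia", "Europe"],
    "2022 Population": [120, 60, 30],
    "2020 Population": [110, 50, 28],
    "2010 Population": [90, 40, 25],
})

# Fix 1: restrict corr() to numeric columns so text columns
# such as "Country" do not raise an error.
corr = df.corr(numeric_only=True)

# Fix 2: three equivalent ways to reverse a slice of column names.
cols = df.columns[2:5]
by_step = list(cols[::-1])                    # slice with a step of -1
by_reversed = list(reversed(cols))            # built-in reversed()
by_negative_slice = list(df.columns[4:1:-1])  # reversed bounds in one slice

# Group by continent, average the reversed columns, sort by 2022.
df2 = (
    df.groupby("Continent")[by_step]
    .mean(numeric_only=True)
    .sort_values(by="2022 Population", ascending=False)
)
print(df2)
```

All three reversal forms yield the same column order, so the choice between them is a matter of readability.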