Layoffs are currently a threat across the tech world. There is no need to panic: the best response is to keep upgrading our skills and to stay positive and patient.
In this article we analyse a dataset of tech layoffs. The dataset, "Technology Company Layoffs(2022-202", was downloaded from Kaggle. Thanks to Widya Salim for the wonderful work.
Let us start by importing the Python libraries and loading the data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("tech_layoffs.csv")
print(df.head())
........................................
The column "additional_notes" has only a single entry, so we can drop it.
..........................................
df = df.drop("additional_notes",axis = "columns")
...........................................
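Before dropping a column, it can help to confirm how sparse it really is. The sketch below is a minimal illustration on a small made-up frame (the `toy` data and the threshold of one entry are assumptions for the example, not taken from the Kaggle file). It counts the non-missing entries per column and drops the columns that have at most one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real file
toy = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "additional_notes": [np.nan, "Some note", np.nan, np.nan],
})

# Number of non-missing entries in each column
non_missing = toy.notna().sum()
print(non_missing)

# Columns with at most one non-missing entry carry little information
sparse_cols = non_missing[non_missing <= 1].index.tolist()
toy = toy.drop(columns=sparse_cols)
print(toy.columns.tolist())
```

On the real data, the same `notna().sum()` check could be run on `df` before the drop.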
Observe the columns "total_layoffs" and "impacted_workforce_percentage". Both contain the entry "Unclear", which we should replace with a numerical value. One way is to replace it with the column's mean or mode; here we use the mode. Both columns have the object dtype.
To find the mean or mode, a column must contain numerical values. Here even the values other than "Unclear" are stored as objects (strings). We can convert them to integers once they are separated from the "Unclear" rows.
#This code forms a new dataset by removing the rows containing "Unclear"
..........................................
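# .copy() gives an independent frame, so the assignment below does not raise SettingWithCopyWarning
layoff = df[df['total_layoffs'] != "Unclear"].copy()
#Object to int64
layoff["total_layoffs"] = layoff['total_layoffs'].astype("int64")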
...........................................
#calculate mode of "total_layoffs"
...........................................
layoffs_mode = layoff['total_layoffs'].mode()[0]
...........................................
#replace "Unclear" with mode in original dataset.
...........................................
df["total_layoffs"] = df["total_layoffs"].apply(lambda x : layoffs_mode if x == "Unclear" else x)
......................................
#converting object to int64
df["total_layoffs"] = df["total_layoffs"].astype("int64")
..........................................
#Repeat the same steps for the column "impacted_workforce_percentage"
............................................
impacted = df[df['impacted_workforce_percentage'] != "Unclear"].copy()
impacted['impacted_workforce_percentage'] = impacted['impacted_workforce_percentage'].astype("int64")
impacted_mode = impacted['impacted_workforce_percentage'].mode()[0]
df["impacted_workforce_percentage"] = df["impacted_workforce_percentage"].apply(lambda x : impacted_mode if x == "Unclear" else x)
df["impacted_workforce_percentage"] = df["impacted_workforce_percentage"].astype("int64")
..........................................
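The same imputation can also be sketched more compactly with `pd.to_numeric`, which turns entries that cannot be parsed into NaN when `errors="coerce"` is passed. This is an alternative to the filter-and-apply steps above, shown on a hypothetical toy column rather than the real file.

```python
import pandas as pd

# Hypothetical toy column mixing numeric strings with "Unclear"
toy = pd.DataFrame({"total_layoffs": ["100", "Unclear", "250", "100", "Unclear"]})

# Entries that cannot be parsed as numbers, such as "Unclear", become NaN
numeric = pd.to_numeric(toy["total_layoffs"], errors="coerce")

# Mode of the known values (NaN is ignored by default)
mode_value = numeric.mode()[0]

# Fill the gaps with the mode and cast to integers
toy["total_layoffs"] = numeric.fillna(mode_value).astype("int64")
print(toy["total_layoffs"].tolist())
```

Keep in mind that filling many rows with a single value pulls statistics such as the mean towards that value, so results computed afterwards should be read with that in mind.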
#The column "reported_date" has the object dtype. This code converts it to datetime.
.........................................
df['reported_date'] = pd.to_datetime(df['reported_date'].astype("str"))
..........................................
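If some date strings turned out to be malformed, the conversion above would raise an error. A minimal sketch of a more forgiving conversion follows, on made-up values; the `%m/%d/%Y` format is an assumption for the example and may not match the real file.

```python
import pandas as pd

# Hypothetical toy dates; the last entry is deliberately invalid
toy = pd.DataFrame({"reported_date": ["11/15/2022", "1/4/2023", "not a date"]})

# errors="coerce" turns strings that cannot be parsed into NaT instead of raising
toy["reported_date"] = pd.to_datetime(
    toy["reported_date"], format="%m/%d/%Y", errors="coerce"
)

# Count how many entries could not be parsed
n_bad = int(toy["reported_date"].isna().sum())
print("Unparseable dates:", n_bad)

# Once the column is datetime, the .dt accessor exposes parts such as the month
print(toy["reported_date"].dt.month)
```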
The data pre-processing is done. Now it is time to investigate the data. There are many ways to analyse data; here are some basic things to try.
............................................
Calculate the mean of total_layoffs column
.............................................
mean_lay_offs = df['total_layoffs'].mean()
print("Mean of total layoffs :",mean_lay_offs)
............................................
Calculate the median of the impacted_workforce_percentage column
..............................................
impacted_workforce_percentage_median = df['impacted_workforce_percentage'].median()
print("Median of impacted_workforce_percentage :",impacted_workforce_percentage_median)
...........................................
Group the data by industry and calculate the sum of total_layoffs
............................................
layoffs_sum_by_industry = df.groupby("industry")['total_layoffs'].sum()
print("Sum of total layoffs by industry:",layoffs_sum_by_industry)
............................................
Plot the top 10 industries by total layoffs
............................................
layoffs_sum_by_industry.sort_values(ascending = False).head(10).plot(kind = "bar",figsize = (10,5))
plt.title("Sum of Total Layoffs by Industry")
plt.xlabel("Industries")
plt.ylabel("Sum of Total Layoffs")
plt.show()
............................................
Calculate the number of unique values in the 'headquarter_location' column
............................................
unique_location = df['headquarter_location'].nunique()
print("Unique_location :", unique_location)
............................................
Plot the top 10 headquarter locations by number of layoffs
............................................
# value_counts() counts rows (layoff entries) per location, not the number of people laid off
location_counts = df['headquarter_location'].value_counts().sort_values(ascending = False).head(10)
location_counts.plot(kind = "bar",figsize = (10,5))
plt.title("Top 10 number of Layoffs by Headquarter Location")
plt.xlabel("Headquarter Location")
plt.ylabel("Number of Layoffs")
plt.show()
............................................
Calculate the percentage of layoffs for each 'status'
............................................
status_percentage = df['status'].value_counts(normalize = True) * 100
print("Percentage of layoffs by status :",status_percentage)
..........................................
Plot the percentage of layoffs for each 'status'
......................................
status_percentage.plot(kind = "pie",figsize = (5,5),autopct = "%.1f%%")
plt.title("Percentage of Layoffs by Status")
plt.show()
............................................