Wednesday, 8 February 2023

Analyse the "Layoffs"

    Nowadays, layoffs are threatening the tech world. There is nothing to worry about: we have to upgrade ourselves, and be positive and patient.


    In this article we analyse a dataset about layoffs. The dataset "Technology Company Layoffs(2022-202" was downloaded from Kaggle. Thanks to Widya Salim for the wonderful work.


Let us start by importing the Python libraries.


 

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

df = pd.read_csv("tech_layoffs.csv")

print(df.head())

........................................

The column "additional_notes" has only a single entry, so we can drop it.

..........................................

df = df.drop("additional_notes",axis = "columns")
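Instead of spotting such a column by eye, we can count the non-missing values in every column. The following is a minimal sketch on a small made-up DataFrame. The values are illustrative and not taken from the Kaggle file. It assumes that empty notes are read in as NaN, which is what `pd.read_csv` does by default for blank cells.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the layoffs file (made-up values).
toy = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "total_layoffs": ["100", "Unclear", "250", "100"],
    "additional_notes": [np.nan, "one note", np.nan, np.nan],
})

# Number of non-missing entries in each column.
non_null_counts = toy.notna().sum()
print(non_null_counts)

# Columns holding at most one real value carry almost no information.
sparse_cols = non_null_counts[non_null_counts <= 1].index.tolist()
print("Columns to drop:", sparse_cols)

toy = toy.drop(columns=sparse_cols)
print(toy.columns.tolist())
```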

...........................................

 Observe the columns "total_layoffs" and "impacted_workforce_percentage". Both contain "Unclear" as an entry, and we should replace "Unclear" with a numerical value. One way is to replace it with the column's mean or mode. Here we use the mode. Both columns have the object dtype.

    To find the mean or mode, a column must contain numerical values. Here the values other than "Unclear" are also stored as objects (strings). We can convert them to int64 once we have separated them from "Unclear".
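Another way to separate the numbers from "Unclear" is `pd.to_numeric` with `errors="coerce"`. It turns every non-numeric string into NaN, and `mean()` and `mode()` skip NaN by default. This is a sketch on a made-up column and is an alternative to the filtering approach used below, not the post's method.

```python
import pandas as pd

# Toy column mixing numbers stored as strings with "Unclear" (made-up values).
s = pd.Series(["100", "Unclear", "250", "100", "Unclear", "40"])

# errors="coerce" turns anything that is not a number (here "Unclear") into NaN.
numeric = pd.to_numeric(s, errors="coerce")

# mean() and mode() ignore NaN, so they only see the real numbers.
print("mean:", numeric.mean())
print("mode:", numeric.mode()[0])

# Fill the gaps with the mode and store the result as integers.
filled = numeric.fillna(numeric.mode()[0]).astype("int64")
print(filled.tolist())
```

Coercion is convenient, but it is also silent. A malformed value such as "1,000" would become NaN as well, whereas `astype("int64")` in the steps below would raise an error and make the problem visible.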

# This code forms a new DataFrame by removing the rows where "total_layoffs" is "Unclear"

..........................................

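# .copy() makes an independent DataFrame, so the assignment below does not trigger SettingWithCopyWarning
layoff = df[df['total_layoffs'] != "Unclear"].copy()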

#Object to int64

layoff["total_layoffs"] = layoff['total_layoffs'].astype("int64")

...........................................

#calculate mode of "total_layoffs"

...........................................

layoffs_mode = layoff['total_layoffs'].mode()[0]

...........................................

#replace "Unclear" with mode in original dataset.

...........................................

df["total_layoffs"] = df["total_layoffs"].apply(lambda x : layoffs_mode if x == "Unclear" else x) 

......................................

#converting object to int64

df["total_layoffs"] = df["total_layoffs"].astype("int64")

..........................................

# Repeat the same steps for the column "impacted_workforce_percentage"

............................................

impacted = df[df['impacted_workforce_percentage'] != "Unclear"].copy()

impacted['impacted_workforce_percentage'] = impacted['impacted_workforce_percentage'].astype("int64")

impacted_mode = impacted['impacted_workforce_percentage'].mode()[0]

df["impacted_workforce_percentage"] = df["impacted_workforce_percentage"].apply(lambda x : impacted_mode if x == "Unclear" else x)

df["impacted_workforce_percentage"] = df["impacted_workforce_percentage"].astype("int64")
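The same four steps run once for each column: filter out "Unclear", convert to int64, take the mode, then fill and convert. They could be wrapped in a small function. `impute_unclear_with_mode` below is a hypothetical helper written for illustration, demonstrated on made-up data.

```python
import pandas as pd


def impute_unclear_with_mode(frame, column, marker="Unclear"):
    """Hypothetical helper: replace `marker` in `column` with the column's mode.

    Follows the steps above: keep the rows without the marker, convert them
    to int64, take the mode, then fill the marker and convert the whole column.
    """
    known = frame.loc[frame[column] != marker, column].astype("int64")
    mode_value = known.mode()[0]
    result = frame.copy()  # work on a copy so the caller's DataFrame is untouched
    result[column] = (
        result[column]
        .apply(lambda x: mode_value if x == marker else x)
        .astype("int64")
    )
    return result


# Toy data shaped like the two columns cleaned above (made-up values).
toy = pd.DataFrame({
    "total_layoffs": ["100", "Unclear", "250", "100"],
    "impacted_workforce_percentage": ["10", "20", "Unclear", "10"],
})

for col in ["total_layoffs", "impacted_workforce_percentage"]:
    toy = impute_unclear_with_mode(toy, col)

print(toy)
print(toy.dtypes)
```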

..........................................

# The column "reported_date" has the object dtype. This code converts it to datetime.

.........................................

df['reported_date'] = pd.to_datetime(df['reported_date'].astype("str"))
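If some dates might be malformed, `pd.to_datetime` can mark them as missing (NaT) with `errors="coerce"` instead of stopping with an error. The sketch below uses made-up strings. The ISO `YYYY-MM-DD` format is an assumption, because the post does not show the dataset's actual date format.

```python
import pandas as pd

# Made-up date strings, including one value that is not a date.
dates = pd.Series(["2022-11-04", "2023-01-18", "not a date", "2023-01-20"])

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(dates, errors="coerce")
print(parsed)

# Once the column is datetime64, the .dt accessor exposes parts of the date,
# e.g. the number of rows per calendar month (NaT rows are not counted).
monthly = parsed.dt.to_period("M").value_counts().sort_index()
print(monthly)
```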


..........................................


    We have finished the data preprocessing. Now it is time to investigate the data. There are many methods for analysing data, but here are some basic things to try.

............................................



Calculate the mean of the total_layoffs column

.............................................

mean_lay_offs  = df['total_layoffs'].mean()

print("Mean of total layoffs :",mean_lay_offs)


............................................

Calculate the median of the impacted_workforce_percentage column

..............................................

impacted_workforce_percentage_median = df['impacted_workforce_percentage'].median()

print("Median of impacted_workforce_percentage :",impacted_workforce_percentage_median)

...........................................

Group the data by industry and calculate the sum of total_layoffs

............................................

layoffs_sum_by_industry = df.groupby("industry")['total_layoffs'].sum()

print("Sum of total layoffs by industry: ",layoffs_sum_by_industry)
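On a tiny made-up table, the same `groupby(...).sum()` step looks like the example below. `nlargest(n)` is an equivalent shortcut for the `sort_values(ascending = False).head(n)` chain used in the next plot.

```python
import pandas as pd

# Made-up rows: one row per layoff announcement.
toy = pd.DataFrame({
    "industry": ["Retail", "Fintech", "Retail", "Health", "Fintech"],
    "total_layoffs": [100, 50, 300, 20, 70],
})

# Sum the layoffs within each industry (result is indexed by industry name).
by_industry = toy.groupby("industry")["total_layoffs"].sum()
print(by_industry)

# nlargest(n) returns the n biggest values, largest first.
top2 = by_industry.nlargest(2)
print(top2)
```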

............................................

Plot the top 10 industries by total layoffs

............................................

layoffs_sum_by_industry.sort_values(ascending = False).head(10).plot(kind = "bar",figsize = (10,5))

plt.title("Sum of Total Layoffs by Industry")

plt.xlabel("Industries")

plt.ylabel("Sum of Total Layoffs")

plt.show()

............................................

Calculate the number of unique values in the 'headquarter_location' column

............................................

unique_location = df['headquarter_location'].nunique()

print("Unique_location :", unique_location)

 

...........................................

Plot the top 10 headquarter locations by number of layoff events (rows in the dataset)

............................................

location_counts = df['headquarter_location'].value_counts().sort_values(ascending = False).head(10)

location_counts.plot(kind = "bar",figsize = (10,5))

plt.title("Top 10 Headquarter Locations by Number of Layoff Events")

plt.xlabel("Headquarter Location")

plt.ylabel("Number of Layoff Events")

plt.show()
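Note that `value_counts()` counts rows, meaning how many layoff events each location reported. It does not count how many people were laid off. The made-up example below shows how the two rankings can differ. The second one uses the same `groupby` and `sum` pattern as the industry plot.

```python
import pandas as pd

# Made-up rows: three small events in one city, one large event in another.
toy = pd.DataFrame({
    "headquarter_location": ["SF", "SF", "SF", "NYC"],
    "total_layoffs": [10, 20, 30, 1000],
})

# value_counts: number of rows (layoff events) per location.
events = toy["headquarter_location"].value_counts()

# groupby + sum: total number of people laid off per location.
people = (
    toy.groupby("headquarter_location")["total_layoffs"]
    .sum()
    .sort_values(ascending=False)
)

print(events)
print(people)
```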

............................................

Calculate the percentage of layoffs for each 'status'

............................................

status_percentage = df['status'].value_counts(normalize = True) * 100

print("Percentage of layoffs by status :",status_percentage)
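With `normalize = True`, `value_counts` returns each category's share of the rows instead of a raw count. Multiplying by 100 turns the shares into percentages that add up to 100. The sketch below uses made-up status labels, and the real categories in the dataset may differ.

```python
import pandas as pd

# Made-up status values (the dataset's actual categories may differ).
status = pd.Series(["Public", "Private", "Public", "Public"])

counts = status.value_counts()                       # raw number of rows per status
shares = status.value_counts(normalize=True) * 100   # share of rows, in percent

print(counts)
print(shares)
```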

..........................................

Plot the percentage of layoffs for each 'status'

......................................

status_percentage.plot(kind = "pie",figsize = (5,5),autopct = "%.1f%%")

plt.title("Percentage of Layoffs by Status")

plt.show()

............................................
