The dataset consists of 543 entries and 8 columns covering the constituencies, the leading and trailing candidates, their respective parties, the winning margin, and the status of the results. Here is an outline of our analysis steps:
1 – Data Cleaning
- Check for and handle any missing values.
- Ensure data types are appropriate for analysis.
2 – Descriptive Statistics
- Summary statistics for the dataset.
- Distribution of leading and trailing parties.
3 – Visualizations
- Party-wise distribution of seats.
- Geographical distribution of seats (if data includes state information).
- Analysis of winning margins.
- Voter turnout analysis (if included in the data).
4 – Detailed Analysis
- Comparison of leading and trailing parties.
- Margin analysis for close contests.
Let’s start with data cleaning and then move to the descriptive statistics and visualizations.
Step 1: Data Cleaning
We’ll handle any missing values and ensure data types are appropriate.
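# Handle missing values (assign the result back; inplace=True on a column selection is unreliable in recent pandas)
data['Trailing Candidate'] = data['Trailing Candidate'].fillna('Unknown')
data['Trailing Party'] = data['Trailing Party'].fillna('Unknown')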
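# Convert 'Margin' to numeric, removing commas; entries that are not numbers become NaN instead of raising an error
import pandas as pd
data['Margin'] = pd.to_numeric(data['Margin'].str.replace(',', '', regex=False), errors='coerce')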
# Verify the changes
data.info()
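Before filling anything in, it helps to see where the gaps actually are. The sketch below is only an illustration on a small made-up DataFrame (the `sample` frame and its values are assumptions, not the real election data): it counts missing values per column and flags `Margin` entries that would not convert cleanly to a number.

```python
import pandas as pd

# Toy stand-in for the election data; all values are made up for illustration.
sample = pd.DataFrame({
    'Constituency': ['A', 'B', 'C'],
    'Trailing Candidate': ['X', None, 'Z'],
    'Trailing Party': ['P1', None, 'P2'],
    'Margin': ['1,234', '-', '56,789'],
})

# Count missing values per column to see which columns need handling.
missing_counts = sample.isna().sum()
print(missing_counts)

# Flag Margin entries that are not plain digits once commas are removed.
margin_text = sample['Margin'].str.replace(',', '', regex=False)
non_numeric = sample[~margin_text.str.isdigit()]
print(non_numeric)
```

Running the same two checks on `data` shows which columns need a fill value and whether the `Margin` conversion needs special handling.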
Step 2: Descriptive Statistics
We’ll compute summary statistics and analyze the distribution of leading and trailing parties.
# Summary statistics
summary_stats = data.describe()
# Distribution of leading and trailing parties
leading_party_distribution = data['Leading Party'].value_counts()
trailing_party_distribution = data['Trailing Party'].value_counts()
leading_party_distribution, trailing_party_distribution
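Raw counts are easier to compare when shown next to each party's share of all seats. Here is a minimal sketch on made-up data (the `sample` frame and the party labels `P1` to `P3` are hypothetical) that uses `value_counts(normalize=True)` to get percentages alongside the counts.

```python
import pandas as pd

# Hypothetical leading-party column; party labels are placeholders.
sample = pd.DataFrame({
    'Leading Party': ['P1', 'P1', 'P2', 'P1', 'P3', 'P2', 'P1', 'P1']
})

# Absolute number of seats per party.
seat_counts = sample['Leading Party'].value_counts()

# Share of all seats per party, as a percentage rounded to one decimal.
seat_share = (sample['Leading Party'].value_counts(normalize=True) * 100).round(1)

# Combine both views into one table.
summary = pd.DataFrame({'Seats': seat_counts, 'Share (%)': seat_share})
print(summary)
```

The same pattern should work on `data['Leading Party']` and `data['Trailing Party']`.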
Step 3: Visualizations
Party-wise Distribution of Seats
We’ll visualize the distribution of seats by party.
import matplotlib.pyplot as plt
import seaborn as sns
# Plot the distribution of seats by party
plt.figure(figsize=(12, 6))
sns.countplot(y='Leading Party', data=data, order=data['Leading Party'].value_counts().index)
plt.title('Party-wise Distribution of Seats')
plt.xlabel('Number of Seats')
plt.ylabel('Party')
plt.show()
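When many parties hold only one or two seats, the bar chart gets a long tail that is hard to read. One option, sketched below on made-up data, is to group small parties into a single "Others" bar before plotting. The `min_seats` threshold and the `sample` frame are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical leading-party column with a long tail of one-seat parties.
sample = pd.DataFrame({
    'Leading Party': ['P1'] * 5 + ['P2'] * 3 + ['P3', 'P4', 'P5']
})

seat_counts = sample['Leading Party'].value_counts()

# Assumed cutoff: parties below this many seats are grouped together.
min_seats = 2

# Keep the larger parties as they are.
major = seat_counts[seat_counts >= min_seats]

# Sum the seats of every party below the cutoff.
others_total = int(seat_counts[seat_counts < min_seats].sum())

grouped = major.copy()
if others_total > 0:
    grouped['Others'] = others_total

print(grouped)
```

The resulting `grouped` Series can then be plotted as a bar chart in place of the full count plot.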
Analysis of Winning Margins
We’ll analyze the distribution of winning margins.
# Plot the distribution of winning margins
plt.figure(figsize=(12, 6))
sns.histplot(data['Margin'], bins=30, kde=True)
plt.title('Distribution of Winning Margins')
plt.xlabel('Margin')
plt.ylabel('Frequency')
plt.show()
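A histogram shows the shape of the distribution, and a few numbers help pin it down. The sketch below computes quartiles and the extremes on a made-up `margins` Series (the values are hypothetical, not taken from the dataset).

```python
import pandas as pd

# Hypothetical winning margins, already converted to integers.
margins = pd.Series(
    [1200, 48000, 150000, 75000, 300, 220000, 98000, 5600],
    name='Margin',
)

# Quartiles: 25th percentile, median, and 75th percentile.
quantiles = margins.quantile([0.25, 0.5, 0.75])
print(quantiles)

# Narrowest and widest wins.
print('Smallest margin:', margins.min())
print('Largest margin:', margins.max())
```

On the real data, the same calls on `data['Margin']` give a quick sense of how wide a typical win was.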
Step 4: Detailed Analysis
Comparison of Leading and Trailing Parties
We’ll cross-tabulate the leading and trailing parties to count, for each pair, the constituencies where the first party leads and the second is the runner-up.
# Leading vs. Trailing Party Comparison
leading_vs_trailing = data.groupby(['Leading Party', 'Trailing Party']).size().unstack(fill_value=0)
# Plot heatmap of leading vs. trailing party
plt.figure(figsize=(12, 8))
sns.heatmap(leading_vs_trailing, cmap='coolwarm', annot=True, fmt='d')
plt.title('Leading vs. Trailing Party Comparison')
plt.xlabel('Trailing Party')
plt.ylabel('Leading Party')
plt.show()
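The outline also lists margin analysis for close contests. Here is a minimal sketch of how that could look, assuming `Margin` is already numeric. The `close_threshold` of 10,000 votes is an arbitrary cutoff picked for illustration, and the `sample` frame is made up.

```python
import pandas as pd

# Toy stand-in for the cleaned data; values are hypothetical.
sample = pd.DataFrame({
    'Constituency': ['A', 'B', 'C', 'D'],
    'Leading Party': ['P1', 'P2', 'P1', 'P3'],
    'Trailing Party': ['P2', 'P1', 'P3', 'P1'],
    'Margin': [48, 152000, 4300, 9999],
})

# Assumed definition of a "close" contest; adjust to taste.
close_threshold = 10_000

# Keep only the close contests, narrowest first.
close = sample[sample['Margin'] < close_threshold].sort_values('Margin')
print(close[['Constituency', 'Leading Party', 'Trailing Party', 'Margin']])

# How many close contests each party is leading.
close_by_party = close['Leading Party'].value_counts()
print(close_by_party)
```

Swapping `sample` for `data` should give the list of tight races and show which parties lead the most of them.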
Complete Python Code
# Descriptive Statistics (assumes the Step 1 cleaning has already been applied to `data`)
summary_stats = data.describe()
# Distribution of leading and trailing parties
leading_party_distribution = data['Leading Party'].value_counts()
trailing_party_distribution = data['Trailing Party'].value_counts()
# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
# Plot the distribution of seats by party
plt.figure(figsize=(12, 6))
sns.countplot(y='Leading Party', data=data, order=data['Leading Party'].value_counts().index)
plt.title('Party-wise Distribution of Seats')
plt.xlabel('Number of Seats')
plt.ylabel('Party')
plt.show()
# Plot the distribution of winning margins
plt.figure(figsize=(12, 6))
sns.histplot(data['Margin'], bins=30, kde=True)
plt.title('Distribution of Winning Margins')
plt.xlabel('Margin')
plt.ylabel('Frequency')
plt.show()
# Leading vs. Trailing Party Comparison
leading_vs_trailing = data.groupby(['Leading Party', 'Trailing Party']).size().unstack(fill_value=0)
plt.figure(figsize=(12, 8))
sns.heatmap(leading_vs_trailing, cmap='coolwarm', annot=True, fmt='d')
plt.title('Leading vs. Trailing Party Comparison')
plt.xlabel('Trailing Party')
plt.ylabel('Leading Party')
plt.show()
# Display summary statistics and party distributions
summary_stats, leading_party_distribution, trailing_party_distribution