In today's digital age, social media platforms have become a rich source of information and insights. Twitter, in particular, offers a wealth of user-generated content that can provide valuable insights into various topics, trends, and public sentiment. In this blog post, we will explore how to scrape and analyze tweets from specific users and hashtags using Python and various data analysis techniques.
Before we dive into the code, let's ensure that we have the necessary libraries and credentials in place. We will be using the Tweepy library to access the Twitter API, Pandas for data manipulation, and several other libraries for data analysis and visualization.
First, make sure you have the required libraries installed by running the following command:
```python
!pip install tweepy pandas numpy spacy torch seaborn matplotlib nltk gensim transformers scikit-learn wordcloud networkx
```
(Note that the PyPI package name for scikit-learn is `scikit-learn`, not `sklearn`.)
Next, you'll need to set up your Twitter API credentials. If you don't have them yet, you can apply for a developer account on the Twitter Developer Portal (https://developer.twitter.com/). Once you have your consumer key, consumer secret, access token, and access token secret, replace the empty strings in the code with your actual credentials.
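The credential setup might look like the following minimal sketch. The variable names are placeholders for your own keys; `OAuth1UserHandler` is the Tweepy v4 class name (older Tweepy releases call it `OAuthHandler`):

```python
import tweepy

# Replace the empty strings with your own credentials from the
# Twitter Developer Portal.
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""

# OAuth1UserHandler is the Tweepy v4 name; older versions use OAuthHandler.
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret, access_token, access_token_secret
)
api = tweepy.API(auth, wait_on_rate_limit=True)
```

With `wait_on_rate_limit=True`, Tweepy pauses automatically when you hit the API rate limits instead of raising an error mid-scrape.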
The first task we'll tackle is scraping tweets from specific Twitter users. We have a list of target users, including "LloydsBank," "MyMaybank," "ZenithBank," "NatWest_Help," "Nedbank," and "talktoBOI." We'll use the scrape_user_tweets function to fetch tweets from these users; it takes the list of usernames and the number of tweets to scrape as input.

The scrape_user_tweets function uses the Tweepy library to authenticate with the Twitter API and fetches user tweets through the user_timeline endpoint. For each tweet, we extract relevant information such as the tweet text, date, time, and Twitter handle, along with additional details like reply count, retweet count, favorite count, hashtags, user mentions, media URLs, and more. The scraped tweets are stored in a list of dictionaries, where each dictionary represents a tweet. Finally, we convert the list of dictionaries into a Pandas DataFrame for further analysis.
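A minimal sketch of what scrape_user_tweets might look like, assuming an authenticated tweepy.API instance against the v1.1 endpoints. The dictionary keys and error handling here are illustrative, not the post's exact code:

```python
def scrape_user_tweets(api, usernames, count=200):
    """Fetch recent tweets for each username and return a list of dicts.

    `api` is an authenticated tweepy.API instance; the field names below
    are illustrative and can be adapted to your own schema.
    """
    rows = []
    for username in usernames:
        try:
            tweets = api.user_timeline(
                screen_name=username,
                count=count,
                tweet_mode="extended",  # full, untruncated tweet text
            )
        except Exception as exc:  # tweepy.TweepyException in practice
            print(f"Skipping {username}: {exc}")
            continue
        for tweet in tweets:
            entities = tweet.entities
            rows.append({
                "handle": username,
                "text": tweet.full_text,
                "date": tweet.created_at.date(),
                "time": tweet.created_at.time(),
                "retweet_count": tweet.retweet_count,
                "favorite_count": tweet.favorite_count,
                "hashtags": [h["text"] for h in entities.get("hashtags", [])],
                "user_mentions": [m["screen_name"]
                                  for m in entities.get("user_mentions", [])],
                "media_urls": [m["media_url_https"]
                               for m in entities.get("media", [])],
            })
    return rows
```

The returned list converts directly into a DataFrame with `pd.DataFrame(scrape_user_tweets(api, target_users, 200))`.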
Next, let's move on to scraping tweets based on specific hashtags. We have a list of hashtags, including "LloydsBank," "Maybank," "ZenithBank," "Natwest," "Nedbank," and "BankOfIreland." We'll use the scrape_hashtag_tweets function to fetch tweets containing these hashtags; it takes the list of hashtags and the number of tweets to scrape as input.

The scrape_hashtag_tweets function uses the Tweepy library to authenticate with the Twitter API and fetches hashtag tweets through the search_tweets endpoint. As with user tweets, we extract various details from each tweet, including the tweet text, date, time, user name, location, followers count, following count, profile URL, profile image URL, and more. The scraped hashtag tweets are likewise stored in a list of dictionaries and converted into a Pandas DataFrame.
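A sketch of scrape_hashtag_tweets under the same assumptions: an authenticated tweepy.API instance, illustrative field names, and search_tweets as the Tweepy v4 method name for the standard search endpoint (earlier versions call it api.search):

```python
def scrape_hashtag_tweets(api, hashtags, count=100):
    """Fetch recent tweets containing each hashtag; returns a list of dicts.

    `api` is an authenticated tweepy.API instance. The profile_url field is
    constructed from the screen name for illustration.
    """
    rows = []
    for hashtag in hashtags:
        try:
            tweets = api.search_tweets(
                q=f"#{hashtag}",
                count=count,
                tweet_mode="extended",
            )
        except Exception as exc:  # tweepy.TweepyException in practice
            print(f"Skipping #{hashtag}: {exc}")
            continue
        for tweet in tweets:
            user = tweet.user
            rows.append({
                "hashtag": hashtag,
                "text": tweet.full_text,
                "date": tweet.created_at.date(),
                "time": tweet.created_at.time(),
                "user_name": user.screen_name,
                "location": user.location,
                "followers_count": user.followers_count,
                "following_count": user.friends_count,
                "profile_url": f"https://twitter.com/{user.screen_name}",
                "profile_image_url": user.profile_image_url_https,
            })
    return rows
```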
Now that we have scraped the user tweets and hashtag tweets, we can proceed with data preprocessing to clean and prepare the text data for analysis. We have several functions to help us in this process:
- clean_tweet_text: Cleans the tweet text by removing URLs, retweet tags, aliases, emojis, and other unwanted characters, and converts the text to lowercase. It also tokenizes the text, removes stopwords, and lemmatizes the remaining tokens.
- rename_columns: Renames the columns of the user and hashtag tweet DataFrames to more descriptive names.
- preprocess_text: Applies the clean_tweet_text function to the tweet text column of each DataFrame, creating a new column called "clean_text" that holds the preprocessed text.
By applying these preprocessing functions, we can clean the tweet text, remove unnecessary information, and convert the text into a more suitable format for analysis.
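The cleaning step can be sketched with plain regular expressions. To keep this sketch dependency-free, it uses a tiny illustrative stopword set and skips lemmatization; the post's version uses NLTK's full English stopword list and a WordNet lemmatizer in their place:

```python
import re

# Tiny illustrative stopword set; the post uses NLTK's full English list.
_STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for"}

def clean_tweet_text(text, stopwords=_STOPWORDS):
    text = re.sub(r"http\S+|www\.\S+", "", text)   # URLs
    text = re.sub(r"\bRT\b", "", text)             # retweet tags
    text = re.sub(r"@\w+", "", text)               # aliases / user mentions
    text = re.sub(r"[^\x00-\x7F]+", "", text)      # emojis and other non-ASCII
    text = re.sub(r"[^a-zA-Z\s]", "", text)        # remaining unwanted characters
    tokens = text.lower().split()                  # lowercase and tokenize
    tokens = [t for t in tokens if t not in stopwords]
    # A lemmatizer (e.g. NLTK's WordNetLemmatizer) would be applied here.
    return " ".join(tokens)

def preprocess_text(df, text_column="tweet_text"):
    """Add a 'clean_text' column to a pandas DataFrame.

    The column name 'tweet_text' is an assumption; use whatever name
    rename_columns produced.
    """
    df["clean_text"] = df[text_column].apply(clean_tweet_text)
    return df
```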
With the preprocessed user tweet data at our disposal, we can now perform various analyses and gain insights into user behavior and engagement. Here are a few examples:
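One simple engagement analysis is to compare the average and peak retweet and favorite counts per handle. A sketch, assuming the illustrative column names used when scraping ("handle", "retweet_count", "favorite_count"):

```python
import pandas as pd

def engagement_summary(user_df):
    """Mean and max engagement per handle, sorted by average retweets.

    Column names are assumptions; adjust to match your DataFrame.
    """
    return (
        user_df.groupby("handle")[["retweet_count", "favorite_count"]]
        .agg(["mean", "max"])
        .sort_values(("retweet_count", "mean"), ascending=False)
    )
```

The resulting table makes it easy to spot which accounts drive the most interaction per tweet rather than just the most tweets.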
Similar to user tweets, we can perform various analyses on the preprocessed hashtag tweet data. Here are a few examples:
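For instance, a term-frequency count over the preprocessed text surfaces what people discuss around each hashtag. A small sketch that takes the "clean_text" values produced during preprocessing:

```python
from collections import Counter

def top_terms(clean_texts, n=10):
    """Most common tokens across an iterable of preprocessed tweet strings."""
    counts = Counter()
    for text in clean_texts:
        counts.update(text.split())
    return counts.most_common(n)
```

The same token counts feed naturally into a word cloud or a bar chart with the visualization libraries installed earlier.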
In this blog post, we explored how to scrape and analyze user tweets and hashtag tweets using Python. We covered the process of scraping tweets, preprocessing the data, and conducting various analyses to gain insights into user behavior, engagement, and sentiment.
By leveraging social media data, we can uncover trends that inform decision-making, market research, and sentiment analysis. The techniques presented in this blog post are a starting point for exploring Twitter data and can be expanded and customized to suit specific analysis requirements. Harnessed carefully, data analysis turns Twitter's vast stream of user-generated content into actionable insights for a wide range of applications.
Happy analyzing!