Sentiment analysis of social media text that expresses opinions about a product, event, or service is used in applications such as election result prediction and product endorsement. Sarcasm is a form of sentiment in which people use positive words to express negative feelings. In verbal communication, people signal sarcasm through hand gestures, facial expressions, and eye movements. These cues are absent from text, which makes sarcasm detection challenging and has drawn scholarly interest in detecting sarcasm in social media text. Feature extraction is an important component of a sarcasm detection model. Most solutions use GloVe, word2vec, or general-purpose pre-trained models for feature extraction. GloVe and word2vec ignore words that are not in their vocabulary, which leads to information loss; they require larger training datasets to generate accurate vectors; and they ignore contextual information. A general-purpose pre-trained model overcomes these limitations but cannot effectively learn features from social media text because of its informal grammar, abbreviations, and irregular vocabulary. To address this, the BERTweet model (trained on social media text) is applied to generate sentence-level semantic and contextual features. A Bi-GRU processes these features to learn long-distance dependencies in both directions (forward and backward), and a self-attention layer on top of the Bi-GRU removes redundant and irrelevant information. This work presents a hybrid method called B2GRUA that combines the strengths of the BERTweet pre-trained model with a bi-directional gated recurrent unit and attention mechanism (Bi-GRUAM) to classify text as sarcastic or non-sarcastic. The efficacy of the proposed model is evaluated on three benchmark datasets: SemEval 2018 Task 3.A, iSarcasm, and the 2020 shared sarcasm detection task (Twitter data).
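The attention step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes the Bi-GRU has already produced one hidden-state vector per token, and it scores tokens with a simple dot product against a hypothetical `query` vector, which may differ from the attention formulation used in B2GRUA. All values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each token's hidden state by its relevance, then sum.

    hidden_states: list of per-token vectors (stand-ins for Bi-GRU outputs).
    query: scoring vector (a learned parameter in a real model; fixed here).
    Returns (weights, pooled_vector).
    """
    # Dot-product relevance score for every token (an assumed scoring rule).
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum: tokens with low weights contribute little to the result.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, pooled


# Toy example: three tokens with 2-dimensional hidden states (made-up values).
states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
weights, pooled = attention_pool(states, query=[1.0, 1.0])
```

Here the second token receives the largest weight, so the pooled vector is dominated by its hidden state; this is the sense in which attention down-weights less relevant tokens before classification.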
The results show that the proposed model outperformed state-of-the-art models on all datasets (24% better on the iSarcasm dataset and around 2% better on both the 2020 shared sarcasm detection task and the SemEval 2018 Task 3.A dataset). A one-way ANOVA test is applied to validate the results statistically.
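For readers unfamiliar with the statistical check, the following is a minimal sketch of how a one-way ANOVA F-statistic is computed. The grouping is an assumption: each group is taken to hold scores from repeated runs of one model, and the numbers are invented for illustration, not taken from the paper. The function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three hypothetical models, three runs each.
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(scores)
```

A large F value relative to the F distribution with `df_b` and `df_w` degrees of freedom suggests that the group means differ by more than run-to-run noise. In practice, a library routine such as `scipy.stats.f_oneway` also returns the corresponding p-value.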