{"id":915,"date":"2025-05-19T05:29:01","date_gmt":"2025-05-19T05:29:01","guid":{"rendered":"https:\/\/gnikesh.com\/?p=915"},"modified":"2025-06-11T05:29:40","modified_gmt":"2025-06-11T05:29:40","slug":"from-tweets-to-takes-training-an-ai-to-read-the-gun-control-debate","status":"publish","type":"post","link":"https:\/\/gnikesh.com\/index.php\/2025\/05\/19\/from-tweets-to-takes-training-an-ai-to-read-the-gun-control-debate\/","title":{"rendered":"From Tweets to Takes: Training an AI to Read the Gun-Control Debate"},"content":{"rendered":"\n<p>Every time a mass-shooting tragedy dominates the U.S. news cycle, social media lights up with urgent, emotional arguments about firearms. Some people call for outright bans on civilian gun ownership, others argue for tighter regulations, and many seek a middle ground. To understand these conversations at scale, our new study created\u00a0GUNSTANCE, the first publicly available machine-learning dataset focused squarely on the two most hot-button positions in the debate: \u201cbanning guns\u201d and \u201cregulating guns.\u201d<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">From millions of tweets to a clean, labeled corpus<\/h3>\n\n\n\n<p>We began by collecting tweets posted in the eight weeks following seven high-profile shootings that occurred in 2021-2022. After filtering out retweets, links to external news sites, and corporate news accounts, we were left with about 86,000 reactions written by users.<\/p>\n\n\n\n<p>Two targets were defined:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\u201cGuns should be banned.\u201d<\/strong><\/li>\n\n\n\n<li><strong>\u201cGuns should be regulated.\u201d<\/strong><\/li>\n<\/ul>\n\n\n\n<p>Because a single tweet can take a stance on one or both targets, each tweet was compared to both statements. 
Using Amazon Mechanical Turk, annotators assigned 2,700 tweets about banning guns and 2,800 about regulating guns to one of three easy-to-grasp classes\u2014<strong>In-Favor, Against, or Neutral<\/strong>. The remaining 16,000+ unlabeled tweets were deliberately kept in the dataset so that modern semi-supervised techniques could learn from them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why ordinary classifiers were not enough<\/h3>\n\n\n\n<p>Traditional stance-detection systems need thousands of hand-labeled examples. Even then they struggle when a fresh shooting suddenly shifts the tone and vocabulary of the debate. Large language models (LLMs) such as ChatGPT show impressive zero-shot skills, but calling them for every new tweet is expensive and slow.<\/p>\n\n\n\n<p>To get the best of both worlds, we introduced a <strong>hybrid method called AUM-ST + ChatGPT<\/strong>. It starts with AUM-ST, a self-training algorithm that watches how confidently a base neural network (BERTweet, in this case) predicts pseudo-labels for the unlabeled tweets. Whenever AUM-ST finds tweets whose labels it is <em>least<\/em> sure about, it sends only this small, hard subset\u2014roughly five percent of the data\u2014to ChatGPT for a second opinion. Those ChatGPT labels are then folded back into training, and the cycle repeats. 
Because the LLM is consulted sparingly, the approach adds little cost while injecting expert-level knowledge just where it matters most.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How well does it work?<\/h3>\n\n\n\n<p>Across multiple test settings, the hybrid system consistently outperformed every baseline:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>On all events combined, its F1-score reached <strong>66.5%<\/strong> for \u201cbanning guns\u201d and <strong>66.1%<\/strong> for \u201cregulating guns\u201d, beating the best purely supervised model by about eight percentage points and even edging out ChatGPT used by itself.<\/li>\n\n\n\n<li>When the model was trained on six shootings and evaluated on a <strong>new, unseen<\/strong> event such as Buffalo or Uvalde, the hybrid still led the pack, gaining up to ten F1 points over the next-best method.<\/li>\n\n\n\n<li>In the toughest cross-target test\u2014training on \u201cban\u201d but predicting \u201cregulate\u201d, and vice-versa\u2014it either matched or surpassed specialised semi-supervised alternatives.<\/li>\n<\/ul>\n\n\n\n<p>In plain language, the system can follow a brand-new gun-control conversation as it unfolds and reliably tell whether a tweet is supporting, opposing, or sitting on the fence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What this means beyond gun control<\/h3>\n\n\n\n<p>Because <em>GUNSTANCE<\/em> is public and ships with strong baselines, it gives the community a realistic testbed for measuring progress in stance detection under real-time, emotionally charged conditions. 
The hybrid SSL-with-LLM recipe is equally valuable: any domain where labeled data are scarce but unlabeled text pours in\u2014think public-health rumors, climate-policy debates, or product-review sentiment\u2014can reuse the same strategy.<\/p>\n\n\n\n<p>The code and dataset are freely available on <a href=\"https:\/\/github.com\/gnikesh\/gunstance\">GitHub<\/a>, so you can explore the tweets, retrain the models, or adapt the pipeline to your own topic.<\/p>\n\n\n\n<p>Reference:<br>Gyawali, Nikesh \u00b7 Sirbu, Iustin \u00b7 Sosea, Tiberiu \u00b7 Khanal, Sarthak \u00b7 Caragea, Doina \u00b7 Rebedea, Traian \u00b7 Caragea, Cornelia. 2024. <strong>GunStance: Stance Detection for Gun Control and Gun Regulation.<\/strong> <em>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Volume 1: Long Papers<\/em>, pp. 12027-12044. Association for Computational Linguistics.<\/p>\n\n\n\n<p>ACL Anthology link (open access): <a href=\"https:\/\/aclanthology.org\/2024.acl-long.650\">https:\/\/aclanthology.org\/2024.acl-long.650<\/a><\/p>\n\n\n\n<p>**Original manuscript summarized by ChatGPT<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every time a mass-shooting tragedy dominates the U.S. news cycle, social media lights up with urgent, emotional arguments about firearms. Some people call for outright bans on civilian gun ownership, others argue for tighter regulations, and many seek a middle ground. 
To understand these conversations at scale, our new study created\u00a0GUNSTANCE, the first publicly available machine-learning [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7,23],"tags":[19,24],"class_list":["post-915","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-ml_ai","tag-computer","tag-machine-learning"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/posts\/915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/comments?post=915"}],"version-history":[{"count":1,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/posts\/915\/revisions"}],"predecessor-version":[{"id":916,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/posts\/915\/revisions\/916"}],"wp:attachment":[{"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/media?parent=915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/categories?post=915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gnikesh.com\/index.php\/wp-json\/wp\/v2\/tags?post=915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}