{"id":1089,"date":"2023-06-10T20:23:00","date_gmt":"2023-06-11T03:23:00","guid":{"rendered":"https:\/\/www.docbug.com\/blog\/?p=1089"},"modified":"2023-06-10T20:23:00","modified_gmt":"2023-06-11T03:23:00","slug":"party-tricks-are-a-bad-way-to-evaluate-chatgpt","status":"publish","type":"post","link":"https:\/\/www.docbug.com\/blog\/archives\/1089","title":{"rendered":"Party tricks are a bad way to evaluate ChatGPT"},"content":{"rendered":"\n<p>People talk about ChatGPT and similar large language models as being &#8220;trained on the Internet as a whole&#8221;, which makes it seem even more magical when it manages to do something ridiculous like knock out a rap battle between George Carlin and Julius Caesar, or explain quantum physics in haiku. These are essentially party tricks, but they <em>feel<\/em> more impressive than doing something useful like summarizing a document or writing a cover letter because they feel like a good test of generalization. After all, what are the odds that the model happened to be trained on something so random?<\/p>\n\n\n\n<p>Unfortunately, the answer seems to be &#8220;a lot higher than I think&#8221;, and it&#8217;s kind of alarming how often I come up with a &#8220;novel&#8221; task to give to ChatGPT or Stable Diffusion only to find <a href=\"https:\/\/epicrapbattlesofhistory.fandom.com\/wiki\/George_Carlin_vs_Richard_Pryor\">something<\/a> <a href=\"https:\/\/epicrapbattlesofhistory.fandom.com\/wiki\/Shaka_Zulu_vs_Julius_Caesar\">close<\/a> to it with a quick web search. Turns out the Internet is <a href=\"https:\/\/xkcd.com\/305\/\">really vast<\/a> (who knew?), and apparently I&#8217;m also not nearly as creative as I think I am. 
Add to that the fact that, since ChatGPT-3.5, OpenAI has <a href=\"https:\/\/towardsdatascience.com\/how-chatgpt-works-the-models-behind-the-bot-1ce5fca96286\">included<\/a> human-generated answers to commonly asked prompts in its training, and it&#8217;s especially hard to figure out whether what the model is doing is magic or &#8220;just&#8221; clever interpolation.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"512\" src=\"https:\/\/www.docbug.com\/blog\/wp-content\/uploads\/2023\/06\/carlin-and-caesar-rap-battle.png\" alt=\"\" class=\"wp-image-1090\" srcset=\"https:\/\/www.docbug.com\/blog\/wp-content\/uploads\/2023\/06\/carlin-and-caesar-rap-battle.png 512w, https:\/\/www.docbug.com\/blog\/wp-content\/uploads\/2023\/06\/carlin-and-caesar-rap-battle-300x300.png 300w, https:\/\/www.docbug.com\/blog\/wp-content\/uploads\/2023\/06\/carlin-and-caesar-rap-battle-150x150.png 150w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>People talk about ChatGPT and similar large language models as being &#8220;trained on the Internet as a whole&#8221;, which makes it seem even more magical when it manages to do something ridiculous like knock out a rap battle between George Carlin and Julius Caesar, or explain quantum physics in haiku. 
These are essentially party tricks, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[23],"tags":[],"class_list":["post-1089","post","type-post","status-publish","format-standard","hentry","category-machine-learning-ai"],"_links":{"self":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1089","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/comments?post=1089"}],"version-history":[{"count":2,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1089\/revisions"}],"predecessor-version":[{"id":1092,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1089\/revisions\/1092"}],"wp:attachment":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/media?parent=1089"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/categories?post=1089"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/tags?post=1089"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}