LessWrong Curated Podcast

By: LessWrong
  • Summary

  • Audio version of the posts shared in the LessWrong Curated newsletter.
    © 2023 LessWrong Curated Podcast
Episodes
  • "Unifying Bargaining Notions (2/2)" by Diffractor
    Jun 12 2023

    Alright, time for the payoff: unifying everything discussed in the previous post. This post is a lot more mathematically dense, so you might want to digest it in more than one sitting.

     Imaginary Prices, Tradeoffs, and Utilitarianism

    Harsanyi's Utilitarianism Theorem can be summarized as "if a bunch of agents have their own personal utility functions U_i, and you want to aggregate them into a collective utility function U with the property that everyone agreeing that option x is at least as good as option y (i.e., U_i(x) ≥ U_i(y) for all i) implies U(x) ≥ U(y), then that collective utility function must be of the form b + ∑_{i∈I} a_i·U_i for some number b and nonnegative numbers a_i."

    Basically, if you want to aggregate utility functions, the only sane way to do so is to give everyone importance weights, and do a weighted sum of everyone's individual utility functions.
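    That weighted-sum rule can be sketched directly. A minimal illustration with three agents and two options; the utility values and weights here are made up for the example, not taken from the post:

    ```python
    # Harsanyi-style aggregation: collective utility is an affine combination
    # of individual utilities with nonnegative weights a_i and an offset b.
    def aggregate(weights, b, utilities):
        """utilities: list of per-agent utility functions; returns collective U."""
        def U(x):
            return b + sum(a * u(x) for a, u in zip(weights, utilities))
        return U

    # Three agents with illustrative utilities over options "x" and "y".
    u1 = {"x": 3.0, "y": 1.0}
    u2 = {"x": 2.0, "y": 2.0}
    u3 = {"x": 5.0, "y": 4.0}
    utilities = [u.get for u in (u1, u2, u3)]

    U = aggregate(weights=[0.5, 1.0, 0.25], b=7.0, utilities=utilities)

    # Unanimity check: every agent weakly prefers x to y here, so any
    # nonnegative-weight aggregate must agree.
    assert all(u("x") >= u("y") for u in utilities)
    assert U("x") >= U("y")
    ```

    Any choice of nonnegative weights preserves unanimous preferences; the weights just fix how much each agent matters.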

    Closely related to this is a result that says that any point on the Pareto Frontier of a game can be post-hoc interpreted as the result of maximizing a collective utility function. This related result is one where it's very important for the reader to understand the actual proof, because the proof gives you a way of reverse-engineering "how much everyone matters to the social utility function" from the outcome alone.

    First up, draw all the outcomes, and the utilities that both players assign to them, and the convex hull will be the "feasible set" F, since we have access to randomization. Pick some Pareto frontier point (u_1, u_2, ..., u_n) (although the drawn image is for only two players).
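    The reverse-engineering idea can be illustrated on a toy two-player game (the outcomes and weights below are invented for the example): a Pareto-frontier point of the feasible set maximizes some weighted sum a_1·u_1 + a_2·u_2, and changing the weights moves the maximizer along the frontier:

    ```python
    # Toy two-player game: each outcome gives a (u1, u2) utility pair.
    outcomes = [(0.0, 4.0), (2.0, 3.0), (3.0, 1.0), (4.0, 0.0), (1.0, 1.0)]

    def maximizers(a1, a2, pts):
        """Outcomes maximizing the weighted collective utility a1*u1 + a2*u2."""
        best = max(a1 * p[0] + a2 * p[1] for p in pts)
        return [p for p in pts if abs(a1 * p[0] + a2 * p[1] - best) < 1e-9]

    # The Pareto point (2, 3) is rationalized by equal weights: under
    # a1 = a2 = 1 it uniquely maximizes the collective utility.
    assert maximizers(1.0, 1.0, outcomes) == [(2.0, 3.0)]

    # Tilting the weights toward player 1 shifts the chosen point along the
    # frontier, mirroring "how much everyone matters".
    assert maximizers(3.0, 1.0, outcomes) == [(4.0, 0.0)]
    ```

    Reading the weights off a given frontier point, rather than the other way around, is exactly the reverse-engineering the proof licenses.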

    https://www.lesswrong.com/posts/RZNmNwc9SxdKayeQh/unifying-bargaining-notions-2-2

    41 mins
  • "The ants and the grasshopper" by Richard Ngo
    Jun 6 2023

    Inspired by Aesop, Soren Kierkegaard, Robin Hanson, sadoeuphemist and Ben Hoffman.

    One winter a grasshopper, starving and frail, approaches a colony of ants drying out their grain in the sun, to ask for food.

    “Did you not store up food during the summer?” the ants ask.

    “No”, says the grasshopper. “I lost track of time, because I was singing and dancing all summer long.”

    The ants, disgusted, turn away and go back to work.

    https://www.lesswrong.com/posts/GJgudfEvNx8oeyffH/the-ants-and-the-grasshopper

    10 mins
  • "Steering GPT-2-XL by adding an activation vector" by TurnTrout et al.
    May 18 2023

    Summary: We demonstrate a new scalable way of interacting with language models: adding certain activation vectors into forward passes. Essentially, we add together combinations of forward passes in order to get GPT-2 to output the kinds of text we want. We provide a lot of entertaining and successful examples of these "activation additions." We also show a few activation additions which unexpectedly fail to have the desired effect.

    We quantitatively evaluate how activation additions affect GPT-2's capabilities. For example, we find that adding a "wedding" vector decreases perplexity on wedding-related sentences, without harming perplexity on unrelated sentences. Overall, we find strong evidence that appropriately configured activation additions preserve GPT-2's capabilities.

    Our results provide enticing clues about the kinds of programs implemented by language models. For some reason, GPT-2 allows "combination" of its forward passes, even though it was never trained to do so. Furthermore, our results are evidence of linear feature directions, including "anger", "weddings", and "create conspiracy theories." 

    We coin the phrase "activation engineering" to describe techniques which steer models by modifying their activations. As a complement to prompt engineering and finetuning, activation engineering is a low-overhead way to steer models at runtime. Activation additions are nearly as easy as prompting, and they offer an additional way to influence a model’s behaviors and values. We suspect that activation additions can adjust the goals being pursued by a network at inference time.
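    The core mechanic described above is adding a difference of activations from contrasting inputs into a later forward pass. A minimal sketch of that idea on a toy two-layer network — not the authors' GPT-2-XL setup; the model, inputs, and scaling factor here are all illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 2-layer network standing in for a stack of transformer blocks.
    W1 = rng.normal(size=(8, 8))
    W2 = rng.normal(size=(8, 4))

    def hidden(x):
        """Hidden-layer activations (analogous to the residual stream)."""
        return np.tanh(x @ W1)

    def forward(x, steering=None):
        """Run the model; optionally add a steering vector at the hidden layer."""
        h = hidden(x)
        if steering is not None:
            h = h + steering  # activation addition: no weights are changed
        return h @ W2

    # Build a steering vector as the scaled difference of activations for a
    # "positive" and a "negative" input (analogous to contrasting prompts).
    x_pos = rng.normal(size=8)
    x_neg = rng.normal(size=8)
    steer = 4.0 * (hidden(x_pos) - hidden(x_neg))

    x = rng.normal(size=8)
    out_plain = forward(x)
    out_steered = forward(x, steering=steer)
    # The addition shifts the output at inference time, without retraining.
    ```

    In the actual technique the addition happens at a chosen layer of GPT-2's residual stream (e.g. via a forward hook), but the arithmetic is the same: compute activations for two contrasting prompts, take the difference, scale it, and add it into later forward passes.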

    https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector

    1 hr and 43 mins

What listeners say about LessWrong Curated Podcast

Average Customer Ratings
  • Overall: 5 out of 5 stars (1 rating, all 5-star)
  • Performance: 5 out of 5 stars (1 rating, all 5-star)
  • Story: 5 out of 5 stars (1 rating, all 5-star)
