[This fragment is available in an audio version.]

Over on Mastodon, there are many people who enjoy not being in the grip of software like Facebook or Twitter that single-mindedly tries to maximize “engagement”, which means the amount of time you stare at the screen so they can show you ads. These algorithms don’t care what they’re showing you and if it turns out that showing you exclusively stories vilifying or praising Donald Trump (depending) maximizes engagement, then that’s what you’ll see. So the chant over there is “No algorithms on Mastodon!” This chant is wrong, and the discussion around it teaches us that we need clarity on what algorithms are, what moral weight they can carry, and whether they can be avoided. (Spoiler: They can’t.)

This all started when I interjected here, and the longest and most twisted Mastodon thread I have so far seen ensued. Let’s start with my initial remarks:

I disagree. An algorithm is not intrinsically bad. As long as we understand that it represents the interests of whoever paid to have it constructed. I think an algorithm with human values that simply wanted to enrich experience is perfectly possible.

I haven't seen one, probably because nobody has ever had a financial incentive to construct it.

Mastodon would be a good place to try to make one.

In the following discussion I’m going to use Mastodon terminology: “Boost” is a synonym for “Retweet” and “Favorite” for “Like”.

What an algorithm is and why there will always be one · Consider a Mastodon instance that is engaged in creating a feed for, well, you, because you’ve just opened that tab or app. It gets the list of who you follow, probably finds some of those posts already in its memory, and reaches out to other Mastodon instances for the rest. It ends up with a big jumbled unsorted list of what could go in your feed. “Good,” you say, “just sort them in reverse chronological order and we’re done. Don’t talk to me about algorithms!”

Well, first of all, sorting is an algorithm, and not a simple one at that.

But it’s not that simple. Suppose one post from someone you don’t follow has been boosted by 5 people that you do, all at different times. Does it appear just once in your feed (and using which boost timestamp?) or five times? Neither answer is obviously right, but when you make that choice, you’re designing an algorithm.
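
For concreteness, here's a minimal Python sketch of just that one design choice (the types and field names are invented for illustration). This version keeps the earliest boost; keeping the latest instead would keep bumping much-boosted posts back to the top:

```python
from dataclasses import dataclass

@dataclass
class FeedItem:
    post_id: str      # identifies the underlying post
    booster: str      # who boosted it into your feed
    timestamp: float  # when the boost happened

def collapse_boosts(items: list[FeedItem]) -> list[FeedItem]:
    """One post, one appearance. Design choice: keep the EARLIEST
    boost, so a post sits where you would first have seen it;
    keeping the latest would instead bump popular posts upward."""
    chosen: dict[str, FeedItem] = {}
    for item in items:
        prev = chosen.get(item.post_id)
        if prev is None or item.timestamp < prev.timestamp:
            chosen[item.post_id] = item
    # Reverse chronological: newest first.
    return sorted(chosen.values(), key=lambda i: i.timestamp, reverse=True)
```

Flip that one `<` to `>` and you have a different feed, and a different algorithm.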

Some of the posts are replies to each other, and to you. Do they appear right next to each other, or in strict chronological order, mixed up with other posts? If you’re going to thread them together, do you do that in chronological rather than reverse-chronological order? Welcome to the world of algorithm design.
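
Here's one plausible threading policy, again as a hedged sketch: order conversations by their newest post, reverse-chronologically, but read each conversation top-to-bottom in plain chronological order:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    root_id: str      # id of the conversation this post belongs to
    timestamp: float

def thread_feed(posts: list[Post]) -> list[Post]:
    """Group posts by conversation; newest conversation first,
    oldest post first within each conversation."""
    threads: dict[str, list[Post]] = defaultdict(list)
    for p in posts:
        threads[p.root_id].append(p)
    # Conversations ranked by their most recent activity...
    ordered = sorted(threads.values(),
                     key=lambda t: max(p.timestamp for p in t),
                     reverse=True)
    feed: list[Post] = []
    for thread in ordered:
        # ...but each one reads top-to-bottom, like a conversation.
        feed.extend(sorted(thread, key=lambda p: p.timestamp))
    return feed
```

Either of those two sort directions could reasonably go the other way; each combination is a different algorithm.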

I guarantee the algorithm that generates “simple chronological order” isn’t simple at all. Among other things, it has to handle lots of corner cases beyond the ones I just mentioned, ones that nobody is smart enough to think of until they actually start writing the code that presents the posts to the people.

What “engagement” algorithms are like · Engagement maximization, à la Facebook or Twitter, is a special and very interesting case. It begins innocently: let’s include a few posts in your feed from people you don’t follow if those people are super-popular, or are followed by nearly all the people you do follow, or if their posts contain hashtags that you’ve been searching for a lot. Mostly harmless, right?

But then it gets deep. These things involve the application of large-scale Machine Learning (ML) technology. The big operators have billions of data points; they know what appears in people’s posts, and they know how long the people were “engaged”. So, by processing those billions of data points, they produce an “ML Model” which is then exercised to answer one question: “What selection of posts should we feed this person to keep them engaged?”
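
Stripped of the billions of data points, the serving side of such a system reduces to something like the following sketch; the feature extractor and the trained predictor here are stand-ins for machinery that outsiders (and often insiders) can't see into:

```python
from typing import Any, Callable

def rank_for_engagement(
    candidates: list[Any],
    features_of: Callable[[Any], list[float]],
    predicted_engagement: Callable[[list[float]], float],
) -> list[Any]:
    """Score each candidate post with a learned model that predicts
    how long this user will stay engaged if shown it, then serve the
    highest scorers first. All the opacity lives inside
    predicted_engagement, fitted offline on billions of
    (post, dwell-time) examples."""
    return sorted(
        candidates,
        key=lambda post: predicted_engagement(features_of(post)),
        reverse=True,
    )
```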

Modern ML models aren’t simple at all, and are generally not comprehensible by humans. The people who built the model can’t answer questions like “Why is the model feeding a lot of posts by Ben Shapiro to people who seem only mildly conservative?” But they can prove that doing what the model says maximizes “engagement”.

A lot of people are now reacting viscerally, saying “I never want that kind of algorithm in my life.” Well, that kind of algorithm is being used every day to filter spam out of your email. And it’s being used on Twitter to combat Nazis and incels, to the extent that when oppressed groups migrate to Mastodon, they encounter a lot more abusive bigotry. (I think we can fix that.)

[Image: The Jenny Holzer BMW Art Car. The biggest message, “Protect Me From What I Want”, is a little hard to read. Borrowed from BMW Art Cars 15.]

Protect me from what I want · And anyhow, those algorithms are just showing you what you want. Don’t try to deny it; if it wasn’t what you wanted you wouldn’t be doomscrolling so much, would you? These ML models know what you want, and that’s what they show you.

(Jenny Holzer is wonderful. One time many years ago I was walking across Times Square in New York and on a huge otherwise-blank billboard were her words: “Protect me from what I want.” It felt like someone sticking a knife into my brain; that phrase explains so very much.)

In whose interest? · In my Mastodon-post excerpt above, I said that an algorithm “represents the interests of whoever paid to have it constructed”. That’s true and, in the context of a capitalist enterprise, a fairly complete answer. But in the world of software, things can happen outside the control of capitalist enterprises.

Open-source, for example: the algorithms in Linux that make computers useful, in Postgres that reliably store and retrieve data, and in Nginx that efficiently respond to Web traffic were mostly written by people who found the work interesting and had a problem they needed to solve for themselves.

With the advent of the Fediverse generally and Mastodon specifically, for the first time we have a large-scale opportunity to experiment with algorithms that are written for people by people just because they’re cool, or because they produce feeds that the programmer likes for herself, or that her Dad likes, or that she notices cause her kids to be less obsessive about screen time.

So let’s stop saying “No algorithms!” because that’s just wrong, and figure out how to get nice algorithms built, ones that primarily are there to serve humanity’s best interests.

One thing I think we can all agree on is this: We want algorithms that (unlike every commercial social-media algorithm) don’t tell anyone else what we’re watching!

Write your own algorithm? · I’ll be honest: I want the algorithm both to give me, and protect me from, what I want. And I want some control over it. I can think of a couple of ways this could happen:

  1. Someone figures out a “feed algorithm construction kit” that has a bunch of knobs on it you can twist, with labels like “tennis” and “activism” and “Christianity” and “keto diet” and “baroque music” and “surprise” and “pretty” and “outrage” and so on, and you fool with the settings until you get a feed you like.

    I think this is plausible, but very difficult. (There’s a sketch of how such knobs might drive scoring just after this list.)

  2. Mastodon introduces a feature where you can download and install algorithms, which can be posted by anyone. They are given the raw unsorted list of posts from people you follow and use it to produce a coherent feed. You might have to pay for them. They could be free. They could involve elaborate ML, or not. They might sometimes pull in posts from people you don’t follow. They could be open-source, or not.

    I like this idea a lot, although the technology would require careful design. The algorithm would need the ability to store information about what you’ve seen and how you’ve reacted, and some ability to follow the social network around the Fediverse, but simultaneously it would have to be restricted so it could never reveal what it knows about you.
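
To make idea 2 concrete (and, while we're at it, idea 1's knobs), here's a sketch of what the plug-in contract might look like. Every name here is invented for illustration, and a real design would need the privacy sandboxing just described:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    topics: list[str]   # hashtags, roughly
    timestamp: float

class FeedAlgorithm(ABC):
    """Hypothetical contract for an installable algorithm: raw
    unsorted posts in, a coherent feed out. Whatever state it keeps
    about you stays on your instance; nothing can phone home."""
    @abstractmethod
    def build_feed(self, raw_posts: list[Post]) -> list[Post]: ...

class KnobFeed(FeedAlgorithm):
    """Idea 1 as a trivial scorer: each knob is a topic weight;
    posts are scored by the knobs their topics match, with recency
    as the tie-breaker."""
    def __init__(self, knobs: dict[str, float]):
        self.knobs = knobs

    def build_feed(self, raw_posts: list[Post]) -> list[Post]:
        def score(p: Post) -> tuple[float, float]:
            return (sum(self.knobs.get(t, 0.0) for t in p.topics),
                    p.timestamp)
        return sorted(raw_posts, key=score, reverse=True)

# Twist the knobs until you get a feed you like:
my_algorithm = KnobFeed({"tennis": 0.9, "baroque music": 0.7, "outrage": 0.0})
```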

Your friendly local algorithm shop · I’m simultaneously a pretty extreme leftist and convinced that free marketplaces, deployed in our service, can produce useful results. I’d like there to be multiple feed-construction algorithms that are competing for my business, that people talk about at the pub while watching the hockey game or at the toddler drop-in or in Mastodon threads.

I’m pretty sure there are surprises waiting, and some of them are going to be good.



Contributions


From: MacCruiskeen (Nov 28 2022, at 11:08)

The flip side of the question of 'what algorithms are getting used' of course is 'what data is collected to feed the algorithms.' Twitter et al have incentive to collect as much as they can so as to provide ad-targeting to their advertisers. With the result that data management and security become *huge* privacy issues. Mastodon doesn't need to know a lot about me to work. I think one way of looking at the algorithm question is what does it need to know to work, and am I comfortable with that? And of course I don't want to see ads and stuff.


From: Kazinator (Nov 28 2022, at 11:46)

I think you're wobbly with your goalposts regarding algorithm.

What people refer to by "algorithm", in this context, is a selection strategy that suggests content you have not searched for or opted into in any way.

This exists alongside other decisions, like the sorting of timelines and whether you see the same item multiple times via different boosts.


From: Lonk (Nov 28 2022, at 14:29)

Here is a suggestion for a very simple algorithm:

- the algorithm maintains a matrix of user-to-user "trust" values that is initialized with very small starting values (e.g. 0.0001)

- when you like/boost/upvote some content - your "trust" toward other people who liked that same content before you is increased by 1/N, where N is the number of people who liked it before you.

- when you dislike some content, your trust in other people who liked it is reduced by the same amount.

- each time you like some content - you spend a fraction of the trust other people have in you.

The content feed is ranked based on how much you trust people that have liked that content.
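
In rough Python, the update and ranking rules might look like this (all names and constants are illustrative, and the trust-spending rule is applied, simplified, to the earlier likers):

```python
from collections import defaultdict

STARTING_TRUST = 0.0001   # tiny initial user-to-user trust
SPEND_RATE = 0.01         # fraction of incoming trust spent per like

# trust[you][them]: how much you trust them (sparse matrix)
trust: dict[str, dict[str, float]] = defaultdict(
    lambda: defaultdict(lambda: STARTING_TRUST))

def on_like(you: str, earlier_likers: list[str]) -> None:
    """Liking content raises your trust in everyone who liked it
    before you, by 1/N each; it also spends a fraction of the
    trust those people have in you."""
    n = len(earlier_likers)
    for them in earlier_likers:
        trust[you][them] += 1.0 / n
        trust[them][you] *= (1.0 - SPEND_RATE)

def on_dislike(you: str, likers: list[str]) -> None:
    """Disliking content lowers trust in its likers by the same amount."""
    n = len(likers)
    for them in likers:
        trust[you][them] -= 1.0 / n

def rank_feed(you: str, feed: list[tuple[str, list[str]]]) -> list[str]:
    """Rank (content_id, likers) pairs by total trust in the likers."""
    return [cid for cid, likers in sorted(
        feed,
        key=lambda item: sum(trust[you][p] for p in item[1]),
        reverse=True)]
```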

I believe this algorithm has the qualities we are looking for:

- transparent - it's easy to explain why you've seen any piece of content, since it can be traced to the content you liked in the past

- gives the user explicit control over their feed

- fair at allocating your attention - those who get your attention must have proven to be worthy of your attention in the past

If you would like to try such an algorithm in action, check out my web address.


From: Bryan Newbold (Nov 28 2022, at 15:08)

I also love Jenny Holzer and used to wear a blunt "Protect Me From What I Want" shirt around at parties.

Mostly agree with what you are saying here, with two exceptions.

My main disagreement is with: "those algorithms are just showing you what you want. Don’t try to deny it, if it wasn’t what you wanted you wouldn’t be doomscrolling so much, would you?".

This conflates observed behavior and intention/desire. People enact behaviors against their own intentions and desires all the time! The ambiguity and context of the word "want" makes it powerful, poetic, and fascinating (see Holzer), but really muddies the waters here. I believe it is good to provide people with tools with which they can realize the behavior that they individually desire. Many people, myself included, want to customize tools to promote some of their own behaviors ("wants"?) and suppress other behaviors ("wants"?). Especially when it comes to daily information flow. I "want" to read interesting things on the internet. I also "want" to not regret the time I spent doing so afterwards.

A smaller point, which I suspect you would agree with, is that one of the biggest issues with "The" algorithm is that there was one dominant algorithm that *everybody else* was using all the time. The customization of algorithms doesn't just impact the individual reader, it also impacts the content that everybody else is creating and discussing. E.g., because what they are discussing is what they are seeing in their feeds (generated by "The" algorithm), or because they are crafting content to be maximally boosted/promoted/whatever by a specific algorithm.


From: PeterL (Nov 28 2022, at 15:49)

As someone who has worked on email spam (and also trust, because you need to know whether a spam or not-spam report can be trusted), I can assure you that whatever algorithm you come up with will not survive contact with reality. It's a constant arms race between the spammers and bots and the anti-spam/trust people. There are many more "features" (in the machine learning sense) than just looking at the text or the userid. With email, for example, the originating IP (sometimes), data rates that are typically seen from that IP address, whether a 450 response (mailbox temporarily unavailable) results in a later retransmit, etc., etc., etc.

For an algorithm that doesn't care about "engagement", but instead just tries to be helpful, take a look at how good (or bad) gmail is in classifying your email ("important", "social", "promotions", etc.) and consider how well that would work in an adversarial situation.


From: dr2chase (Nov 28 2022, at 17:44)

I have minor amateur experience (I am a compiler hacker who long ago had some mathematical education), and for me, my algorithm worked fantastically; I think that there should be more like it. I am, unfortunately, not in a position to set up an algorithm shop because I am of course employed in tech and not thrilled at the prospect of the paperwork necessary to open-source my hack (plus my code is a badly maintained compost heap).

But here's the informal description of an incredibly crappy algorithm that works. Start with a block list. Notice that some "words" are common, like MAGA and Molon Labe and #2ndAmendment (there are more; these are examples). Collect all the followers of the people on your block list, and filter for those words.

Next collect all your friends and their friends. Be careful of people with a zillion friends (same for zillion followers above). These two sets are your bootstrap.

For the good and bad sets, collect all the words in all the profiles, and score words by frequency in both (this is crappy Bayesian analysis by an amateur). Use those words to derive "scores" for profiles. Start iterating over followers of people that you already block, as you encounter them, and score their profiles. Anyone with a bad-looking score is "bad".
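
In rough Python, that scoring step is nothing fancier than this (names made up, smoothing crude on purpose):

```python
import math
from collections import Counter

def word_scores(good_profiles: list[str],
                bad_profiles: list[str]) -> dict[str, float]:
    """Crappy amateur Bayes: how much more frequent is each word in
    bad profiles than in good ones? Positive scores lean bad."""
    good = Counter(w for p in good_profiles for w in p.lower().split())
    bad = Counter(w for p in bad_profiles for w in p.lower().split())
    n_good = sum(good.values()) + 1   # the +1s are crude smoothing
    n_bad = sum(bad.values()) + 1
    return {w: math.log((bad[w] + 1) / n_bad)
             - math.log((good[w] + 1) / n_good)
            for w in set(good) | set(bad)}

def profile_score(profile: str, scores: dict[str, float]) -> float:
    """Sum the word scores; a big positive total looks 'bad'."""
    return sum(scores.get(w, 0.0) for w in profile.lower().split())
```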

There are a few well-known terrible accounts; just mark those bad and don't sweat the analytical impurity (examples: Scott "Dilbert" Adams, Jordan "Lobsters" Peterson).

Next/also, as the rate limit permits (and the rate limit is a problem), keep track of how many "bad" people each person follows (following at least one "bad" person gets you on the list; scanning of bad people is necessarily incomplete, so of those scanned, count how many they follow) and whether that exceeds a percentage threshold. Anyone in your F/FOAF set is excluded.

So this just iterates: every so often, run it some more to find more people, and if you stumble across some new nasty person in your social media, instead of just blocking, you drop him (it's usually a him) into the algorithmic hopper for processing.

From time to time do a little random sampling of people who have a "bad" profile or who followed too many other "bad" people to be sure that your thresholds are set right. You want your "bad" profiles to look like "what right thinking person would ever follow this jerk?!!!".

The final tweak, which makes it much more effective (to the point that I need to adjust the thresholds on the bad-network score): if you encounter a new "bad" person, iterate over the people they follow, looking for known-bad people whose followers have not recently been scanned. That is, you just found a "new" bad person whose account may be new to this purpose; to find other "new" bad people, assume they are similar in who they follow.

And also, all the people whose badness exceeds a threshold, the algorithm blocks, automatically. That number currently exceeds 2 million on Twitter.

That's the entire crappy algorithm, and it works astonishingly well. It found fascists in languages I don't speak. It found rabid Brexit frothers. There's definite correlation between computed scores and apparent awfulness. One signal that it works pretty well is how often some quote-tweeted reply guy that appears in my feed is someone that I already blocked, versus not.

AND THIS IS A TERRIBLE ALGORITHM. I work with people who do real live machine learning, and compared to that, this is junk. Nonetheless, it works really well.

It was also educational. I am somewhat more inclined to believe that there is "something" going on when a dumbass algorithm pretty happily links up fascists all over the world; i.e., it looks like an organization. Some of the subgroups also network more carefully than the others, as if maybe they didn't want to be rolled up quite so quickly and automatically. I wish I knew enough ML to do image recognition, because boy-oh-boy are there some common patterns in profile photos and banners.

But could algorithms work, and work for us? Heck yeah, I have seen it. I have done it. And I did an awful job.

Note the one thing I had going for me: I only had to please myself. Nonetheless, I am willing to trust other people's judgement, generally, and used friends of my friends to generate a whitelist (another way to tell if your algorithm has gone a little funny in the head is if it hits too many whitelist exceptions). I can pretty easily imagine some way of subscribing to/combining other people's blocklists.


From: Jessica C (Nov 28 2022, at 19:54)

Jenny Holzer's "Protect me from what I want" is frequently referenced in Byung-Chul Han's Psychopolitics. It seems up your alley, value-wise.


From: Russell Beattie (Nov 28 2022, at 21:04)

You're totally correct. I wrote a blog post over a decade ago which analyzed Twitter feeds:

https://www.russellbeattie.com/blog/drinking-from-the-firehose

The short version of the post is that by following just a few dozen people who post a few times a day, it becomes pretty hard to keep up. Once you start following hundreds, it's basically impossible. It's just math: there are only so many minutes in the day, and tweets add up.
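
To put rough, purely illustrative numbers on it:

```python
follows = 300          # accounts you follow
posts_per_day = 5      # each posts a handful of times
seconds_per_post = 10  # a quick skim apiece

posts = follows * posts_per_day
hours = posts * seconds_per_post / 3600
print(f"{posts} posts/day, about {hours:.1f} hours just to skim")
# -> 1500 posts/day, about 4.2 hours just to skim
```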

At the time (at the bottom of the post) I assumed this meant that Twitter's days were numbered. It was obvious to me that most people were just posting into thin air and eventually that would get old.

But what I hadn't foreseen is the impact that engagement algorithms would have on social media. Without them, all the activity is just so much noise - like a busy IRC channel - but with them, users are able to see interesting posts every time they open the app.

Twitter is generally seen as a communications platform where following an account means that you want to be aware of that account's activity. But it's not and hasn't been for a long time. It's a post aggregator, where "following" is simply a way of adding to the pool from which posts may be selected and shown to you. It simply can't work any other way.

Mastodon users are going to discover this basic fact soon enough as the excitement dies down and people realize how tedious and boring it is to wade through posts in chronological order, and how much they miss on a daily basis once they follow enough people, and how little interaction they get as others miss their posts as well.

Again, it's just math. Without an algorithm, it'll all become noise sooner rather than later.


From: Ryan Baker (Nov 28 2022, at 21:20)

I feel "protect me from what I want" must mean something a little different.. "protect me from what will make me crave more" might be more accurate?

They seem close, but in the realm of an ML optimized model, that small difference is a lot bigger than is comfortable.

On a different note, one difference among individuals, is that some may want more unique material, and some may want more common material. One may want deep material, and one may want the simple recent facts.

As far as I know, the only algorithms I've seen even attempt that were music playlist algorithms (a new-material bias was an option in Spotify and Pandora at points in the past but is no longer a thing as far as I know; Spotify replaced it with Discover Weekly).

How else might I favor what I want over what makes me crave more? If there's a rational productivity function, that'd be a good one. Like, if it could tell I was getting smarter, but spending less time to do it, that'd be great.

Might seem outlandish, but people have used ML algorithms to measure things like that, or happiness, which would be another good one. Am I happier after an hour of using this site, or less happy? Though if anyone does do that, do realize you should look at both the short and the long term: it does no good to make me happy after one hour if every day that peak is a little lower than the last, because the overall average is degrading day by day.


From: Michael (Nov 29 2022, at 02:56)

It will be interesting to see how instance owners decide to deal with their funding challenges as the Fediverse continues to grow.

Even if advertising is fought off, owners will surely realise that their funding will be maximised if their users find themselves spending more time there, and so they might still be tempted to use those same engagement algorithms.


From: Alastair (Nov 29 2022, at 11:03)

Thanks, I'm now earwormed with Placebo's *Protège-moi* (apparently inspired by Jenny Holzer).


From: Chris Quenelle (Nov 30 2022, at 22:43)

Thanks for the insight, it helped get me thinking about some related topics.

http://quenelle.org/tech/2022/content-views.html

c.im/@cq

