Clean URL slugs: The Good, the Bad, and the Ugly
A slug is the human-readable part of a URL, usually the last path segment, that identifies a specific page. For example, on my blog, the slug of https://michaelzanggl.com/articles/clean-slugs-the-good-bad-and-ugly/ is "clean-slugs-the-good-bad-and-ugly". This is a clean URL, as opposed to something like "https://michaelzanggl.com/articles/232".
As you can imagine, this has some great benefits: it makes for a human-readable URL and is great for SEO.
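Slugs like this are typically derived from the page title. Here is a minimal Python sketch of such a conversion, assuming a simple rule set: lowercase everything, reduce accented Latin letters to their base letter, and turn every other run of characters into a single hyphen. The `slugify` function is hypothetical and not necessarily how this blog generates its slugs.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated ASCII slug (sketch)."""
    # Split accented letters into base letter + combining mark (é -> e + ´)
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop everything that isn't plain ASCII (combining marks, CJK, emoji, ...)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    # Remove hyphens left over at either end
    return slug.strip("-")


print(slugify("Clean URL slugs: The Good, the Bad, and the Ugly"))
# clean-url-slugs-the-good-the-bad-and-the-ugly
```

Note the trade-off baked into this sketch: a title written entirely in a non-Latin script, such as `東京`, comes out as an empty string.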
I remember that years ago, back in the old YouTube days, people would put random or funny words into the slug to find YouTube profiles with that name and chat or leave comments on their pages. Back when the internet was still about discovery ❤️
The good thing about IDs is that they don't change. The same isn't true for text. Take these two URLs, for example:
- a GitHub repository: https://github.com/MZanggl/promistate
- an article on dev.to: https://dev.to/michi/tailwind-css-for-skeptics-interactive-tailwind-css-tutorial-1h15
Both contain two slugs: the username and the name of the repository or article.
But what happens if you change your username?
Well, GitHub has an entire article about the side effects: https://docs.github.com/en/github/setting-up-and-managing-your-github-user-account/changing-your-github-username
It says that the old username will be made available again for others to use. It goes on to say:
"Web links to your existing repositories will continue to work. ... If the new owner of your old username creates a repository with the same name as your repository, that will override the redirect entry and your redirect will stop working."
This, of course, has the potential to be exploited. If you forget to update even a single link to one of your repos, whether in a blog post, a package.json, or elsewhere, somebody can claim your old username, create a repository with the same name, and serve whatever they want (malicious code included) to everyone following that link, under your name.
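To make the failure mode concrete, here is a toy Python model of the behavior the GitHub docs describe: renaming an account leaves redirect entries behind, but a live repository at the old path takes precedence over them. `RepoRouter` and its methods are hypothetical and only mirror the quoted rule; this is not GitHub's implementation.

```python
from typing import Optional


class RepoRouter:
    """Toy model: renamed paths redirect, but a live repo at the same path wins."""

    def __init__(self) -> None:
        self.repos: set[str] = set()         # live "owner/name" paths
        self.redirects: dict[str, str] = {}  # old path -> new path

    def create(self, path: str) -> None:
        self.repos.add(path)

    def rename_owner(self, old: str, new: str) -> None:
        # Move every repo of the old owner and remember where it went
        for path in [p for p in self.repos if p.split("/", 1)[0] == old]:
            moved = f"{new}/{path.split('/', 1)[1]}"
            self.repos.remove(path)
            self.repos.add(moved)
            self.redirects[path] = moved

    def resolve(self, path: str) -> Optional[str]:
        # A live repository always takes precedence over a redirect entry
        if path in self.repos:
            return path
        return self.redirects.get(path)


router = RepoRouter()
router.create("alice/promistate")
router.rename_owner("alice", "alice-new")
print(router.resolve("alice/promistate"))  # alice-new/promistate

# Someone else claims the freed-up name "alice" and recreates the repo
router.create("alice/promistate")
print(router.resolve("alice/promistate"))  # alice/promistate (no longer yours)
```

The second `create` call silently wins, so every old link pointing at `alice/promistate` now leads to someone else's repository.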
dev.to also allows changing the username or title, but leaves you in the dark about what happens to the old handle and all the links associated with it.
Slugs might work great for English and other languages that use the Roman script, but look what happens when I copy the link of a product page from the Japanese Amazon store:
Now imagine sending such a link in an app like WhatsApp, where you only have a little space. It's the very opposite of human-readable. How ironic.
Wait, but why is this happening in the first place?
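In short, URIs may only contain a limited set of ASCII characters, so anything else, such as Japanese text, has to be percent-encoded. A more detailed explanation of why the browser gives you this encoded version of the link can be found here.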
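Percent-encoding is easy to reproduce. The Python sketch below uses the standard library's `urllib.parse.quote`, which encodes each non-ASCII character as UTF-8 and writes every resulting byte as `%XX`. The product name is a hypothetical example, not taken from the Amazon page.

```python
from urllib.parse import quote, unquote

# Hypothetical Japanese slug ("Tokyo Tower"); any non-ASCII text behaves the same way
slug = "東京タワー"

# Each character becomes its UTF-8 bytes, each byte written as %XX
encoded = quote(slug)
print(encoded)                         # %E6%9D%B1%E4%BA%AC%E3%82%BF%E3%83%AF%E3%83%BC
print(len(slug), "->", len(encoded))   # 5 -> 45

# Decoding restores the original, human-readable text
print(unquote(encoded) == slug)        # True
```

Each of these characters takes three bytes in UTF-8 and therefore nine characters once encoded, which is why such links balloon so quickly.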
Let's confirm this by checking the GET request the Amazon site sends:
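I'm not an expert on this topic, but one reason why URIs, rather than IRIs (Internationalized Resource Identifiers, which allow Unicode), are still used to this day might be exploits like the IDN homograph attack, in which look-alike characters from another script are used to impersonate a domain.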
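To illustrate what the IDN homograph attack exploits, here is a small Python sketch: a Cyrillic "а" (U+0430) looks identical to the Latin "a" in most fonts, yet it is a different character, and the ASCII-only Punycode form used for internationalized domain names makes the difference visible. The spoofed domain is a hypothetical example; this only shows the idea, not how any particular browser decides what to display.

```python
import unicodedata

real = "apple.com"
# Hypothetical look-alike: the first letter is CYRILLIC SMALL LETTER A (U+0430)
spoofed = "\u0430pple.com"

print(real, spoofed)                 # rendered identically in most fonts
print(real == spoofed)               # False
print(unicodedata.name(spoofed[0]))  # CYRILLIC SMALL LETTER A

# Python's built-in "idna" codec (IDNA 2003) converts each non-ASCII label
# to its ASCII-compatible Punycode form with the "xn--" prefix
print(spoofed.encode("idna"))        # b'xn--pple-43d.com'
```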
While slugs do enhance UX, the enhancement is often minor. We already have Open Graph to display a title, description, image, etc. when sharing links. Sites like Twitter use URL shorteners anyway. And links on Reddit, in blog posts, news articles, and so on usually don't show the raw URL but use link text instead.
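For reference, link previews are driven by Open Graph `<meta>` tags in the page's `<head>`, not by the URL. Here is a minimal Python sketch that renders a few common properties (`og:type`, `og:title`, `og:description`, `og:image`, `og:url`); the `open_graph_tags` function and the sample values are hypothetical.

```python
from html import escape


def open_graph_tags(title: str, description: str, image_url: str, page_url: str) -> str:
    """Render Open Graph <meta> tags for a page's <head>."""
    properties = {
        "og:type": "article",
        "og:title": title,
        "og:description": description,
        "og:image": image_url,
        "og:url": page_url,
    }
    # escape() turns quotes and angle brackets in the values into HTML entities
    return "\n".join(
        f'<meta property="{name}" content="{escape(value)}">'
        for name, value in properties.items()
    )


print(open_graph_tags(
    title="東京タワー",
    description="A product page with a short, ID-only URL",
    image_url="https://example.com/images/42.jpg",
    page_url="https://example.com/products/42",
))
```

The preview can show the full Japanese title even though the URL itself contains nothing but an ID.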
Slugs are still great. But it's not as simple as adding them to your website and calling it a day (unless you have total control over the content...).
If you plan to add slugs to your website, make sure to:
- Educate your users about what happens when they change content that is part of a slug
- Only include URL-friendly characters
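One common way to make slug changes harmless is to look pages up by an ID that never changes and treat the slug as decoration: if the slug in a request is missing or outdated, redirect to the current one instead of returning a 404 or keeping a redirect table that someone could hijack. Here is a minimal Python sketch of that idea; the `/articles/<id>/<slug>` scheme, the in-memory `ARTICLES` store, and the `handle` function are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    id: int
    slug: str  # may change whenever the title is edited
    title: str


# Hypothetical in-memory store, keyed by the ID that never changes
ARTICLES = {
    42: Article(42, "clean-slugs-the-good-bad-and-ugly", "Clean URL slugs"),
}

# Hypothetical URL scheme: /articles/<id>/<slug>, where the slug part is optional
ROUTE = re.compile(r"^/articles/(\d+)(?:/([^/]*))?/?$")


def handle(path: str) -> tuple[int, str]:
    """Return (HTTP status, page body or redirect target) for a request path."""
    match = ROUTE.match(path)
    if match is None:
        return 404, "not found"
    article = ARTICLES.get(int(match.group(1)))
    if article is None:
        return 404, "not found"
    # Only the ID is used for the lookup; a stale or missing slug
    # is redirected to the current one, so old links keep working
    if match.group(2) != article.slug:
        return 301, f"/articles/{article.id}/{article.slug}"
    return 200, article.title


print(handle("/articles/42/old-title"))  # (301, '/articles/42/clean-slugs-the-good-bad-and-ugly')
```

Because the lookup never depends on the slug, changing a title or freeing up a name cannot break old links or hand them to somebody else. The price is an opaque ID in every URL.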
I guess the gist of this post is that even when something in tech looks simple, it often comes with a lot of hidden complexity, and it's never as simple as "just doing this one little thing". After all, YouTube's video pages work perfectly fine with nothing but an ID in the URL.