Interactive visualization of technological history.
Researchers have discovered a new way to covertly track Android users. Both Meta and Yandex were using it, but have suddenly stopped now that they have been caught.
The details are interesting and worth reading in full:
>Tracking code that Meta and Russia-based Yandex embed into millions of websites is de-anonymizing visitors by abusing legitimate Internet protocols, causing Chrome and other browsers to surreptitiously send unique identifiers to native apps installed on a device, researchers have discovered. Google says it’s investigating the abuse, which allows Meta and Yandex to convert ephemeral web identifiers into persistent mobile app user identities.
>The covert tracking, implemented in the Meta Pixel and Yandex Metrica trackers, allows Meta and Yandex to bypass core security and privacy protections provided by both the Android operating system and browsers that run on it. Android sandboxing, for instance, isolates processes to prevent them from interacting with the OS and any other app installed on the device, cutting off access to sensitive data or privileged system resources. Defenses such as state partitioning and storage partitioning, which are built into all major browsers, store site cookies and other data associated with a website in containers that are unique to every top-level website domain to ensure they’re off-limits for every other site.
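To make the partitioning idea in that quote concrete, here is a toy sketch in Python. It is not how any browser actually implements it. The class name, the site names, and the `visitor_id` key are all invented for illustration. The only point is that data stored by an embedded tracker is keyed by the top-level site as well as by the tracker's own origin, so the same tracker sees a different container on every site.

```python
from collections import defaultdict


class PartitionedStorage:
    """Toy model of partitioned storage (hypothetical, for illustration only).

    Each container is keyed by (top-level site, embedded origin), so an
    embedded script cannot read on one site what it stored on another.
    """

    def __init__(self):
        self._containers = defaultdict(dict)

    def set(self, top_level_site, embedded_origin, key, value):
        self._containers[(top_level_site, embedded_origin)][key] = value

    def get(self, top_level_site, embedded_origin, key):
        # .get() avoids creating an empty container on a read miss.
        container = self._containers.get((top_level_site, embedded_origin), {})
        return container.get(key)


storage = PartitionedStorage()

# The same (made-up) tracker is embedded on two different top-level sites.
storage.set("news.example", "tracker.example", "visitor_id", "abc123")

same_site = storage.get("news.example", "tracker.example", "visitor_id")
other_site = storage.get("shop.example", "tracker.example", "visitor_id")
```

In this model `same_site` holds the stored identifier and `other_site` is `None`. The technique described in the quote matters because it sidesteps this kind of isolation by handing the identifier to a native app instead of relying on browser storage.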
Washington Post article.
Hovertext:
Fortunately it later finds some ants engaging in a simple form of market exchanges.
The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.
Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.
Drew DeVault puts it more bluntly, saying Please stop externalizing your costs directly into my face:
Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.
And no, a robots.txt file doesn’t help.
If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
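For anyone unsure why the file is so easy to ignore, here is a minimal sketch of the check a well-behaved crawler is supposed to run before each request, using `urllib.robotparser` from Python's standard library. The bot name `ExampleLLMBot`, the paths, and the `example.org` URLs are made up for illustration. Nothing here is taken from any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the bot name and paths are invented for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is shut out of everything.
    print(allowed("ExampleLLMBot", "https://example.org/tunes/1"))
    # Everyone else is only kept out of /private/.
    print(allowed("SomeBrowser", "https://example.org/tunes/1"))
    print(allowed("SomeBrowser", "https://example.org/private/notes"))
```

The whole thing is voluntary. The server only publishes the rules. It is the crawler that has to call something like `allowed()` and then honour the answer, and a crawler that skips the check simply fetches the page anyway.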
Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:
LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.
You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:
There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.
My own experience with The Session bears this out.
Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.
So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.
When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.
The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.
If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.
If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.
A Redditor shared this message with his neighbor in Seattle, Washington. The neighbor is a Tesla-owning jerk.
I don't think it qualifies as terrorism if these are attacks targeted against specific individuals. This cheesing of a Tesla was earned through behavior.
The post Best of the internet: "To My Neighbor Whose Tesla Is Covered in Kraft Singles" appeared first on Boing Boing.