hckrnws
Device-detector: Universal Device Detection library from User Agent
by josephscott
I've got a more limited and performant library I maintain. It frequently comes in first in speed comparisons.
It can only tell you things actually included in the UA string itself, as it's just a parser and not a "knowledge engine".
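To illustrate the "parser, not knowledge engine" distinction, here is a minimal Python sketch. It is not the commenter's library; `parse_user_agent` and both regexes are hypothetical and only pull out text that is literally present in the string.

```python
import re

# Matches "Product/version" tokens such as "Chrome/131.0.0.0".
PRODUCT_RE = re.compile(r"([A-Za-z][A-Za-z0-9_.-]*)/([^\s;()]+)")
# Matches the first parenthesised comment, e.g. "(Windows NT 10.0; Win64; x64)".
COMMENT_RE = re.compile(r"\(([^)]*)\)")


def parse_user_agent(ua: str) -> dict:
    """Hypothetical helper: return only what the UA string itself contains."""
    comment = COMMENT_RE.search(ua)
    platform = [p.strip() for p in comment.group(1).split(";")] if comment else []
    products = PRODUCT_RE.findall(ua)  # list of (name, version) tuples
    return {"platform": platform, "products": products}
```

A pure parser like this cannot report screen size, device model, or anything else that requires a lookup table. That is the trade-off against database-backed detectors.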
I have a bot I wrote to help me with various web tasks that are too tedious manually. I just tested it against this and it says "isbot: false".
edit: looks like it only detects bots that overtly identify themselves as bots, e.g. Googlebot -- it's designed to identify clients, not as some sort of security device
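A rough Python sketch of why this happens, assuming detection is a pattern match on self-declared markers. The pattern below is illustrative only and far smaller than any real list. A scripted client that sends a stock browser UA simply never matches.

```python
import re

# Illustrative markers only; real libraries ship much larger, curated lists.
KNOWN_BOT_RE = re.compile(r"bot\b|crawl|spider", re.IGNORECASE)


def is_known_bot(ua: str) -> bool:
    """Hypothetical check: True only if the UA announces itself as a bot."""
    return bool(KNOWN_BOT_RE.search(ua))


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
disguised = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36")
```

Here `is_known_bot(googlebot)` is true and `is_known_bot(disguised)` is false, regardless of what is actually driving the second client.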
The worst bots normally don't identify themselves as bots.
A really useful flag would be "isAIBot" so you can tell them to f-off. A colleague returned from SRECon and had been asking around about bots from AI companies, and it's getting ridiculous. AI companies are just hammering sites left and right, to the point where some are hitting the limits on their deals with hosting companies and transit providers.
And you can't filter them out, because they're running on AWS, Azure or GCP IPs and aren't identifying themselves properly.
Why allow free and unrestricted connections from hosting companies in the first place?
It has a misleading name. It should just be called a user-agent detector, as one cannot identify devices by looking only at the User-Agent header.
That flag should be called isKnownBot.
It’s all well and good but you really shouldn’t be relying on the user agent string for much more than identifying bots who wish to be known.
It's mostly just used for analytics. Getting a rough idea of how people are accessing your services.
PHP made it to the front page!
Many years ago (pre-smartphone), there was a Java library, written by an Italian chap, that did pretty much the same thing. Don’t remember the name. This appears to use the same approach. I think they had a PHP version, but that was a long time ago. I know it was several megabytes, which was huge, in those days.
It did what it said on the tin, but did so by maintaining a huge list of individual devices and their characteristics. At the time, I chose not to use it (I was developing a [c]WAP server), but it had a number of supporters, and its maintainer was pretty sharp, and quite dedicated.
These days, there’s an order of magnitude more devices, and a much greater variety. Big job.
You're talking about WURFL and Luca Passani. I recall working with it for a project 15 years ago.
Same here... back in 2011 or so. We needed something much more performant than WURFL. My efforts eventually became a feature/product at Akamai known as "Edge Device Characterization" (EDC) using algorithms not dissimilar to how LLMs are trained today.
I can't speak to how good the actual product is today (or even when it launched, but that's a whole 'nother story), but during development it was capable of processing 100K RPS in a footprint of ~30MB RAM with ~98% accuracy compared to WURFL as a baseline.
Wow, I remember WURFL! I used this at my second-ever job, back when mobile was still taking off, and we were trying to create some sort of mobile-server-plugin-thing for a big Java CMS monstrosity thing, as well as running NYC Restaurant Week's mobile site.
Those were not the good ol' days.
BTW: It looks like WURFL is still a thing. He seems to have made a business of it[0].
The key to WURFL was a massive XML file. That was constantly updated with the latest gizmos, and whatnot.
Yup. That's it.
If you're looking for a Ruby implementation based on the same underlying user-agent parsing data, here you go: https://github.com/podigee/device_detector
The README lists all the ports here: https://github.com/matomo-org/device-detector?tab=readme-ov-...
I'm just curious — what could be a potential use case for such things on the backend? For bot detection, it seems quite unreliable. Would it be more suitable for server-side rendered UIs? Or am I missing something?
The most obvious use is to have a server that only delivers content relevant to the end-user device.
This would be very useful in limited-bandwidth scenarios.
Responsive sites aren’t the same. They deliver all the data, but filter it in the UI.
Obviously the formatting differs, but why would you deliver different content to different devices?
Hypothetical example: When I open Twitter in the browser, I see a feed - but I also see a "What's Happening" section, and a "Who To Follow" list of suggestions, as well as what looks to be my inbox, minimized. Plus, the feed itself automatically loads the images that people are tweeting.
If you know a client is likely to be from a place where bandwidth is expensive, you may choose not to load the "What's Happening"/"Who To Follow", or the messages, or possibly even the image URLs (which I'd guess come from the backend with an array of URLs of those images in various sizes & resolutions.)
Hell, you might even load a smaller subset of the feed - 10 items instead of 30.
1) End device has ability to display HiDPI images -> Send big
2) End device does not have ability to display HiDPI images -> Send small
Of course, if you have (1), in a low-bandwidth environment, then you actually want the server to send small, even if the device can handle big, but that can be indicated with a different flag.
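The two cases plus the bandwidth override can be sketched in Python as a small decision function. The function name and both flags are hypothetical. How the server learns each flag is a separate problem.

```python
def choose_image_variant(supports_hidpi: bool, low_bandwidth: bool) -> str:
    """Hypothetical server-side choice between a "big" and a "small" asset."""
    # The bandwidth flag wins: a capable device on a poor link still gets "small".
    if low_bandwidth or not supports_hidpi:
        return "small"
    return "big"
```

Keeping device capability and network condition as separate inputs means the override stays explicit instead of being folded into device detection.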
The `img` tag in HTML already supports that.
Yes. But the same principle applies to things other than images. For example, regular PDFs, vs. optimized ones.
Analytics is the area I have seen this used the most.
Good tool. I wish Google had gone even further with Chrome in reducing the information in the user agent. It seems like user agent is primarily used as a browser fingerprinting signal.
Indeed - it's been years since browsing anonymity was possible (without rigorous opsec and inconvenience).
Believe me, no one identifies users by UA at the moment, it's pointless and even fingerprintv4+ does not do this.
I need a way to detect the screen DPI from the user agent, so I can return higher resolution images only to devices that can use them. I realize detecting that based on user agent may not always be accurate, but surely it could work the vast majority of the time. Does anybody know of a lib that implements that on NodeJS?
Consider using the img tag’s srcset property for this purpose. It has many advantages over what you are suggesting.
https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageEl...
I didn't realize srcset could select for "pixel density", thanks for the tip!
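For reference, a small Python sketch of a server-side helper that emits an `img` tag using `srcset` pixel-density descriptors (`1x`, `2x`, ...). The `name@2x.ext` file naming scheme and the `img_tag` helper are assumptions for illustration. With this markup the browser picks the candidate, so the server never has to guess DPI from the user agent.

```python
from html import escape


def img_tag(base: str, ext: str, densities=(1, 2, 3), alt: str = "") -> str:
    """Hypothetical helper: build an <img> with density-based srcset candidates."""
    # Each candidate is "<url> <density>x", separated by commas.
    candidates = ", ".join(f"{base}@{d}x.{ext} {d}x" for d in densities)
    fallback = f"{base}@1x.{ext}"  # used by browsers without srcset support
    return (f'<img src="{escape(fallback)}" '
            f'srcset="{escape(candidates)}" alt="{escape(alt)}">')
```

For example, `img_tag("photo", "jpg", (1, 2))` yields a tag whose `srcset` is `photo@1x.jpg 1x, photo@2x.jpg 2x`.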
Please consider taking network speed into account. The device can be great, but on a mobile network it may take ages to load everything, depending on the location (e.g. on a train you may not have stable 5G long enough).
This is still a consideration, and one of the reasons that having a customized server delivery is an important capability.
Responsive sites still upload the same data, but show less of it to you.
That said, if there were a way to report network connection speed to the server, it could make the decision to reduce the data load (regardless of end device).
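Partial signals of this kind do exist: some browsers send a `Save-Data: on` request header when the user has enabled a data-saving mode, and Chromium-based browsers can send an `ECT` (effective connection type) client hint if the server opts in. Neither is universal, so the Python sketch below treats them as hints only. The `wants_reduced_data` helper and its threshold are assumptions.

```python
def wants_reduced_data(headers: dict) -> bool:
    """Hypothetical check of request headers for a reduced-payload preference."""
    # Header names are case-insensitive; normalise before looking them up.
    normalized = {k.lower(): str(v).strip().lower() for k, v in headers.items()}
    if normalized.get("save-data") == "on":
        return True
    # ECT reports a coarse class, not a measured speed; treat the slowest as low.
    return normalized.get("ect") in {"slow-2g", "2g"}
```

When neither header is present the function returns false, so clients that send nothing get the default payload.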
What can you say about DPI from a string like "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"? I don't think it's possible from the user agent, but it's a one-liner in JavaScript.
DeviceAtlas has a displayPpi property for this purpose and has a NodeJS API.
Way too expensive for me.
Does something similar exist for Python or Node.js?
If not, I would like to contribute to that as open source.
I made something similar, but with a JavaScript-first approach:
https://github.com/matomo-org/device-detector
List of ports
I needed something similar for Go and tried using the regexes from those projects, but performance wasn't good enough. Sometimes I wish I understood regexes more deeply.
There may be another way to speed this up in Go, such as a prefix tree instead of regexes. Does anyone know of something similar for Go?
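A minimal sketch of the prefix-tree idea, written in Python for brevity. The same structure ports to Go with a map-per-node type. It assumes the regexes can be reduced to literal tokens, which is not true of every pattern in such rule sets, and the tokens and labels below are made up for illustration.

```python
END = "$end"  # sentinel key; cannot collide with single-character keys


class TokenTrie:
    """Hypothetical trie that finds known literal tokens anywhere in a string."""

    def __init__(self):
        self.root = {}

    def add(self, token: str, label: str) -> None:
        node = self.root
        for ch in token.lower():
            node = node.setdefault(ch, {})
        node[END] = label

    def find_all(self, text: str) -> list:
        text = text.lower()
        found = []
        # Try every start position and walk the trie as far as it matches.
        for start in range(len(text)):
            node, i = self.root, start
            while i < len(text) and text[i] in node:
                node = node[text[i]]
                i += 1
                if END in node:
                    found.append(node[END])
        return found


trie = TokenTrie()
trie.add("Chrome/", "Chrome")
trie.add("Firefox/", "Firefox")
trie.add("Windows NT", "Windows")
```

One pass checks all tokens at once, with cost bounded by the input length times the longest token, instead of running each regex separately. An Aho-Corasick automaton would remove the restart at every position, at the price of more setup code.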
The reverse would also be handy. Device-pretender: Universal comprehensive User Agent from pretender library.