curl_impersonate, the idgames api, and burning off your fingerprints

okay look: i am not a professional ban evader or adversarial scraper. i'm just a guy who wants to download some things with curl sometimes. i had recently set up something on top of the idgames api using the most advanced technology available to human beings: curl piped into jq. everything was going great... up until i started getting hassled by cloudflare. putting cloudflare in front of something that's designed for automated access is pretty dubious in the first place, but it's one of those things that's probably more effort to fix than it's worth. same for getting off of cloudflare entirely. anyway, this kind of automated access is exactly the sort of thing that cloudflare tries to tamp down.

likely exacerbating the problem is that i have recently started using librewolf in place of firefox. it's been a really nice improvement, actually - stuff like pocket and all the garbage on the start page gets stripped out of the binary entirely. it also comes with built-in browser fingerprint blocking, which really highlights how heavily fingerprinting gets used on the modern web. taking a spin with the EFF's tester shows how much more sophisticated this stuff has gotten in the decade or so since i last checked. in particular, the most sophisticated fingerprinting measures how things look when you render them in webgl, the exact results of which depend on your exact hardware setup. this is very difficult to evade... unless you turn off webgl entirely, which librewolf does by default. that alone is enough to get bounced by a decent number of websites, or to have to go through a cloudflare captcha to actually get to what you were looking at. cloudflare is frustratingly opaque in this regard and i have no idea how much of that carries over between different clients on the same ip.
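for reference, here's roughly what the curl-and-jq setup boils down to, sketched in python so the pieces are easier to see. the endpoint url and the parameter names (action, query, type, out) are from my memory of the idgames api docs, so treat them as assumptions and check before relying on them. the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode

# assumed endpoint for the idgames api; double-check against the api docs
API_URL = "https://www.doomworld.com/idgames/api/api.php"


def search_url(query, field="title"):
    """build a search url that asks the api for json output."""
    params = {"action": "search", "query": query, "type": field, "out": "json"}
    return API_URL + "?" + urlencode(params)


def parse_response(body):
    """decode a response body as json.

    returns None when the body isn't json at all, which from curl's side
    is what getting a challenge page instead of api output looks like.
    """
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
```

the point of parse_response is that jq just falls over when it's handed html, so it's worth checking for that case explicitly instead of letting the pipeline die with a parse error.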

in any case, i eventually started to get cloudflare challenge pages from the idgames api instead of the actual content i was after. obviously curl can't handle whatever javascript nightmare gets run in your browser when this happens, so i was stuck. enter: curl_impersonate. some kind soul sat down with wireshark and painstakingly combed through actual browser tls handshakes, recompiled curl to use either nss (firefox) or boringssl (chrome) instead of openssl, and took a deep dive through the browsers' source. that work is written up in detail elsewhere, but at the end of the day i don't have to care about any of that. it's just a drop-in replacement for regular curl. and it works! it looks enough like a real browser to fool cloudflare. part of me really wants to lambast people for using this kind of stuff and locking things down to known browsers. but in this day and age, there's far more money on the scraping side than there's ever been - enough to impose real costs on people who are out there running stuff that makes the internet a better place. so i really can't hold all this fingerprinting business against anyone as much as i would like to.
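"drop-in replacement" in practice means swapping the binary name and leaving every other argument alone. a minimal sketch of that, assuming the wrapper scripts are on your PATH: the names curl_ff117 and curl_chrome116 are assumptions that depend on which curl_impersonate release you installed, and pick_curl, build_command and fetch are hypothetical helpers, not part of anything.

```python
import shutil
import subprocess

# wrapper script names vary by release; these two are assumptions
CANDIDATES = ("curl_ff117", "curl_chrome116")


def pick_curl(candidates=CANDIDATES, which=shutil.which):
    """return the first impersonating wrapper found on PATH, else plain curl."""
    for name in candidates:
        if which(name):
            return name
    return "curl"


def build_command(url, binary=None):
    """same arguments as a regular curl call; only the binary changes."""
    return [binary or pick_curl(), "--silent", "--show-error", "--location", url]


def fetch(url):
    """run the command and return the response body as text."""
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout
```

falling back to plain curl keeps the script working on a machine without the wrappers installed, at the cost of possibly getting challenge pages again.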

there's none of that here though. just pure, uncut html and css. at least until the scrapers come by to start causing problems on purpose. actually, looking at my logs, it's mostly exploit probers. there's no php on here! you're wasting your time! presumably this is for owning boxes so they can mine crypto on them. but that's a whole other set of problems.