Four short links: 20 August 2019
Content Moderation, Robust Learning, Archiving Floppies, and xkcd Charting
- Information Operations Directed at Hong Kong (Twitter) — “Today we are adding archives containing complete tweet and user information for the 936 accounts we’ve disclosed to our archive of information operations—the largest of its kind in the industry.” This is a goldmine for researchers, as you can see from Renee DiResta’s notes. Facebook also removed accounts for the same reason but hasn’t shared the data. Google has not taken a position yet, which prompted Alex Stamos to say, “Two of the three relevant companies have made public statements. Neither have realistic prospects in the PRC, the other does. Lots of lessons from this episode, but one might be a reinforcement of how Russia represents ‘easy mode’ for platforms doing state attribution. It’s a lot harder when the actor is financially critical, like the PRC or India.” We’re in interesting times, and research around content moderation is the most interesting thing I’ve seen on the Internet since SaaS. This work cuts to human truths, technical capability, and the limits of openness.
- Robust Learning from Untrusted Sources (Morning Paper) — designed to let you incorporate data from multiple “weakly supervised” (i.e., noisy) data sources. Snorkel replaces labels with probability-weighted labels, and then trains the final classifier using those.
- Imaging Floppies (Jason Scott) — recording the magnetic strength everywhere on the disk, so you archive all the data, not just the data you can read once. “The result of this hardware is that it takes a 140 kilobyte floppy disk (140k) and reads it into a 20 megabyte (20,000 kilobyte) disk image. This means a LOT of the magnetic aspects of the floppy are read in for analysis. […] This doesn’t just dupe the data, but the copy protection, unique track setup, and a bunch of variance around each byte on the floppy to make it easier to work with. The software can then do all sorts of analysis to give us excellent, bootable disk images.” Don’t ever think that archiving is easy, or that its problems are solved.
- Chart.xkcd — a chart library that plots “sketchy,” “cartoony,” or “hand-drawn” styled charts. The world needs more whimsy.
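
For the weak-supervision item above, the idea of swapping hard labels for probability-weighted ones and then training on them can be sketched roughly as follows. This is only an illustration, not Snorkel's API: the function names (`combine_votes`, `train_soft_logistic`, `predict`), the toy data, and the fixed per-source accuracies are all assumptions made up for this example. Snorkel learns how reliable each source is, whereas this sketch simply takes the accuracies as given and treats the sources as independent.

```python
import math

def combine_votes(votes, accuracies):
    """Turn noisy votes (+1, -1, or 0 for abstain) into P(label = 1).

    Assumption: sources err independently and their accuracies are known.
    That is a naive-Bayes simplification, not Snorkel's learned model.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        log_odds += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict(w, b, x):
    """Logistic model: probability that x belongs to the positive class."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train_soft_logistic(xs, soft_labels, lr=0.5, epochs=500):
    """Fit 1-D logistic regression to probabilistic targets.

    The cross-entropy gradient is (prediction - target) * x, and it has
    the same form whether the target is a hard 0/1 or a probability.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, p in zip(xs, soft_labels):
            err = predict(w, b, x) - p
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: one feature, three noisy labeling sources.
accuracies = [0.9, 0.7, 0.6]
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
votes = [
    [-1, -1, -1],
    [-1, -1, 1],
    [-1, 0, -1],
    [1, -1, 1],
    [1, 1, 0],
    [1, 1, 1],
]

soft_labels = [combine_votes(v, accuracies) for v in votes]
w, b = train_soft_logistic(xs, soft_labels)
print("soft labels:", [round(p, 3) for p in soft_labels])
print("P(positive | x = 2.0):", round(predict(w, b, 2.0), 3))
```

The point of the design is that disagreement between sources survives into training: an example where the sources split contributes a softer target than one where they all agree, instead of being forced to a hard 0 or 1.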