Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 22 February 2020
Codeless ML, Gender Tagging, Circular Transformation, and Rat Chat
- Teachable Machine — Google’s tool for training ML models without writing code.
- Google AI No Longer Uses Binary Gender Tags on People (Input Mag) — the change is already in effect. Google credits it to its new AI Principles.
- VP of Something (Matt Webb) — It’s pretty clear to me that in 10 years time, sustainability will have to be a VP role, if not a C-level role, and “circular transformation” (I just made that up; you can have it) will be a phrase for the 2020s, just as “digital transformation” was the business mantra for the 2010s.
- DeepSqueak — Developed by researchers Russell Marx and Kevin Coffey at the University of Washington School of Medicine, the software uses sophisticated deep learning algorithms (hence the name “DeepSqueak”) to automatically pick rodent calls out of raw audio, compare them to calls with similar characteristics, and even look for patterns in the squeaks’ order. Not much is currently known about what all those squeaks mean, but Coffey hopes that once enough biologists compile enough calls, a sort of murine “Rosetta Stone” will emerge.
Four short links: 20 February 2020
Computer Laws, Embedded Systems, Moore's Law, and Crowdsourcing Data
- Hacker Laws — a lot of classics, like Cunningham’s Law: The best way to get the right answer on the internet is not to ask a question—it’s to post the wrong answer. And Kernighan’s Law: Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
- A Practical Guide to Watchdogs for Embedded Systems — a lot of good advice and sample code.
- Not Everyone Thinks Moore’s Law is Over — a legendary microprocessor engineer says, “I’m expecting more transistors every 2-3 years by a number large enough that how you think of computer architecture has to change.” His reasoning is sound as a generalization.
- ADS-B Data Sharing — There are many websites tracking aircraft, and all of them rely on data shared by ADS-B fans. However, the access to aggregated ADS-B worldwide data is limited. The main goal of ADSBHub is to become a ADS-B data sharing center and valuable data source for all enthusiasts and professionals interested in development of ADS-B-related software. A collaborative project with the best data on what’s in the air.
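One pattern often recommended for watchdogs in multi-task firmware is to feed the hardware watchdog only after every monitored task has checked in since the last feed, so that a single hung task still causes a reset. The sketch below simulates that logic in Python; `TaskWatchdog`, `simulate`, the task names, and the tick-based timeout are all hypothetical, and this is not code from the linked guide.

```python
class TaskWatchdog:
    """Simulated watchdog that is fed only when every registered task has checked in."""

    def __init__(self, task_ids, timeout_ticks):
        self.required = set(task_ids)        # tasks that must check in each period
        self.checked_in = set()              # tasks seen since the last feed
        self.timeout_ticks = timeout_ticks   # ticks allowed between feeds
        self.ticks_since_feed = 0
        self.reset_count = 0

    def check_in(self, task_id):
        # Each task calls this from its main loop when it makes progress.
        self.checked_in.add(task_id)
        if self.checked_in >= self.required:
            self._feed()

    def _feed(self):
        # Every task is alive: restart the countdown and clear the check-ins.
        self.ticks_since_feed = 0
        self.checked_in.clear()

    def tick(self):
        # Called from a timer; if no feed arrived in time, count a "reset".
        self.ticks_since_feed += 1
        if self.ticks_since_feed >= self.timeout_ticks:
            self.reset_count += 1
            self._feed()  # a real MCU would reboot here; the simulation just restarts the countdown


def simulate(rounds, hung_task=None):
    """Run two hypothetical tasks for `rounds` ticks; `hung_task` never checks in."""
    wd = TaskWatchdog(["sensor", "comms"], timeout_ticks=3)
    for _ in range(rounds):
        for task in ("sensor", "comms"):
            if task != hung_task:
                wd.check_in(task)
        wd.tick()
    return wd.reset_count
```

With both tasks healthy, `simulate(10)` records no resets; if `"comms"` hangs, the healthy `"sensor"` task alone cannot keep the watchdog fed, which is the point of requiring all check-ins.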
Four short links: 19 February 2020
Devil's Dictionary, Bioinformatics, OpenAI, and AI Research Rankings
- Devil’s Dictionary of Programming — dsl—a domain specific language, where code is written in one language and errors are given in another.
- Rosalind — a platform for learning bioinformatics and programming through problem solving.
- OpenAI Gets Commercial (MIT TR) — deep learning seems to be the most effective direction for AGI research; compute required is doubling every 3.4 months, which consumes cash, so they need to make money, so there’s a lot less publishing and a lot more nervous eyes toward commercial projects that might bring in revenue.
- AI Research Rankings 2019 — China has steadily increased its share of authorship of the top 10% most-cited papers: China’s share was at 26.5% in 2018, not far behind the United States at 29%.
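To put the 3.4-month doubling time in perspective, a short calculation converts a doubling period into a growth factor over any span. The function name is my own, and the 24-month comparison is just one common statement of Moore’s law, used here for contrast.

```python
def growth_factor(months_elapsed, doubling_months=3.4):
    """Multiplicative growth after `months_elapsed` if the quantity doubles every `doubling_months`."""
    return 2 ** (months_elapsed / doubling_months)

# Doubling every 3.4 months compounds to roughly 11.5x in a single year...
ai_compute_per_year = growth_factor(12)
# ...while doubling every 24 months gives only about 1.41x per year.
moores_law_per_year = growth_factor(12, doubling_months=24)
print(f"{ai_compute_per_year:.1f}x vs {moores_law_per_year:.2f}x per year")
```

That gap, an order of magnitude more compute every year, is why the cost side of the story matters so much.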
Four short links: 18 February 2020
Generating Test Cases, Regulating Social Media, Incentivizing Scientists, and Questions for Managers
- Millions of Tiny Databases — a distributed database system, whose developers used TLA+ to provide proof of correctness. And test cases! (via Marc Brooker)
- Charting a Way Forward (Facebook) — Facebook’s discussion document for regulating social media. This paper explores possible regulatory structures for content governance outside the United States and identifies questions that require further discussion. It builds off recent developments on this topic, including legislation proposed or passed into law by governments as well as scholarship that explains the various content governance approaches that have been adopted in the past and may be taken in the future. Its overall goal is to help frame a path forward—taking into consideration the views not only of policymakers and private companies, but also civil society and the people who use Facebook’s platform and services.
- Stagnation and Scientific Incentives — We demonstrate empirically that measures of novelty are correlated with but distinct from measures of scientific impact, which suggests that if also novelty metrics were utilized in scientist evaluation, scientists might pursue more innovative, riskier projects.
- 1-on-1 Meeting Questions — Mega list of 1-on-1 meeting questions compiled from a variety of sources.
Four short links: 17 February 2020
Hackers vs. Coders, Deep Learning, Bluetooth Vulnerabilities, and the Deadlock Empire
- Coders are not Hackers: Hacker Culture is Dead and Coders Killed It (Twitter) — interesting thesis, though I think there’s a non-trivial amount of rhetorical fudging in those claims. I subscribe to Mel Conway’s idea that software is bifurcating into tool-makers and tool-users, or architects and plumbers. The industry in 2020 feels qualitatively very different than it did in 2000.
- Trax — Google Brain’s deep learning library. Trax helps you understand and explore advanced deep learning. We focus on making Trax code clear while pushing advanced models like Reformer to their limits.
- Bluetooth Vulnerabilities — I’m always surprised by how many things Bluetooth is in, especially the things that I wouldn’t have imagined flashing firmware on. (“Buoy device”, “Inhaler”, “Blood Glucose Meter”).
- The Deadlock Empire — Welcome to The Deadlock Empire, commander! The skills you need are your intelligence, cunning, perseverance, and the will to test yourself against the intricacies of multi-threaded programming in the divine language of C#. Each challenge below is a computer program of two or more threads. You take the role of the Scheduler—and a cunning one! Your objective is to exploit flaws in the programs to make them crash or otherwise malfunction.
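The game’s core idea, that an adversarial scheduler can break code which looks correct, can be shown with a lost update on a shared counter. This sketch is in Python rather than C#, and models each “thread” as a generator so the interleaving is chosen explicitly instead of left to a real scheduler; `increment` and `run` are illustrative names, and this is not one of the game’s levels.

```python
def increment(shared, name, log):
    """One thread's non-atomic `counter += 1`, split into a read step and a write step."""
    local = shared["counter"]          # step 1: read the shared value into a local
    log.append(f"{name} read {local}")
    yield                              # the scheduler may switch threads here
    shared["counter"] = local + 1      # step 2: write back the incremented local
    log.append(f"{name} wrote {local + 1}")
    yield


def run(schedule):
    """Step two increment threads in the order given by `schedule`, e.g. "ABAB"."""
    shared = {"counter": 0}
    log = []
    threads = {"A": increment(shared, "A", log), "B": increment(shared, "B", log)}
    for name in schedule:
        next(threads[name])            # advance the chosen thread by one step
    return shared["counter"], log
```

`run("AABB")` leaves the counter at 2, but the schedule `"ABAB"` makes both threads read 0 before either writes, so one increment is lost and the counter ends at 1. Playing the Scheduler means hunting for exactly that kind of interleaving.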
Four short links: 14 February 2020
Advanced Binary Deobfuscation, Data Liberation, Video Conversion, and Tesla Firmware
- ABD — Course materials for Advanced Binary Deobfuscation by NTT Secure Platform Laboratories.
- Building Data Liberation Infrastructure — You can treat this as a tutorial on liberating your data from any service. I’ll be explaining some technical decisions and guidelines on: how to reliably export your data from the cloud (and other silos), locally; how to organize it for easy and fast access; how to keep it up to date without constant maintenance; how to make the infrastructure modular, so other people could use only parts they find necessary and extend it.
- AutoFlip — Google’s AI-powered open source tool for intelligently reframing video between aspect ratios such as landscape, portrait, and square. (via Google AI Blog)
- Reverse-Engineering the Tesla Firmware Upgrade Process — always interesting to see how other people do it, and the open questions about how Tesla’s security is implemented.
Four short links: 13 February 2020
UK Internet Regulation, Julia ML, Innovation Prizes, and Face Generating GANs
- Ofcom To Regulate UK Internet — The regulator will play a key role in enforcing a statutory duty of care to protect users from harmful and illegal terrorist and child abuse content.
- Turing — Julia library for fast probabilistic machine learning (Bayesian inference via probabilistic programming).
- The Effects of Prize Structures on Innovative Performance — We find that a winner-takes-all compensation scheme generates significantly more novel innovation relative to a compensation scheme that offers the same total compensation, but shared across the 10 best innovations. Moreover, we find that the elasticity of creativity with respect to compensation schemes is much larger for teams than individual innovators.
- 4.5 Years of GAN Progress on Face Generation (Ian Goodfellow) — the picture is impressive.
Four short links: 12 February 2020
Engineering Strategy, Many Parameters, Organizing Information, and 3D Printing
- Drafting an Engineering Strategy (Mathias Meyer) — how he crafted his strategy as a new (remote) CTO.
- T-NLG (Microsoft) — a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. If you want me, I’ll be here still boggling at 17B parameters.
- Contextualise — a simple and flexible [open source] tool particularly suited for organizing information-heavy projects and activities consisting of unstructured and widely diverse data and information resources.
- McDonald’s Deep Fryer Oil to 3D-Printing Resin — what this article doesn’t mention is whether it still smells delicious. Another key advantage is biodegradability. The researchers found that burying a 3D-printed object made with their resin in soil lost 20% of its weight in about two weeks. “If you bury it in soil, microbes will start to break it down because, essentially, it’s just fat,” Simpson says.
Four short links: 11 February 2020
Empires, Intelligence, Privacy, and Prisoner's Dilemma
- The Fate of Empires — 1977 text summarizing the history of empires, claiming the stages of the rise and fall of great nations seem to be: The Age of Pioneers (outburst); The Age of Conquests; The Age of Commerce; The Age of Affluence; The Age of Intellect; The Age of Decadence. All history is bunk, of course, and history that neatly buckets diverse experience is doubly bunk, but the application of these stages to large companies is left as an exercise for the reader.
- Applied Thinking for Intelligence Analysis — Australian Air Force guide. Time pressures increase the risks of mistakes, confirmatory thinking, cognitive bias, and seizing on the first piece of relevant information an analyst finds. Time pressures also decrease analysts’ use of what they perceive to be lengthy analytic techniques and self-conscious critical thinking approaches. Applicable to most decision-making, not just intelligence analysis.
- US Agencies Using Phone Location Data for Immigration Enforcement (CNet) — policies can ban the government from gathering data, but if they don’t prevent the government from buying the data or outsourcing the task to people who gather or buy the data, then the data will still be used to do the task.
- Axelrod — Python library for iterated Prisoner’s Dilemma research.
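For readers new to the topic: in the iterated game the same two players meet repeatedly, and a strategy is a function of the history of play. Below is a dependency-free sketch using the conventional payoffs (temptation 5, reward 3, punishment 1, sucker 0); the function names are my own, and it does not use Axelrod’s API.

```python
# Conventional payoffs: (my_move, their_move) -> my score; "C" cooperates, "D" defects.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(my_history, their_history):
    # Cooperate on the first move, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)   # each strategy sees its own history first
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Over 10 rounds, `play(tit_for_tat, always_defect)` gives (9, 14): the defector wins the first round and both then defect. Two tit-for-tat players cooperate throughout and score (30, 30), which is the kind of result tournament libraries like Axelrod let you explore at scale.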
Four short links: 10 February 2020
Digital Dictators, Meaningful Availability, GTM Metrics, and High-Dimensional Interactive Plots
- The Digital Dictators: How Technology Strengthens Autocracy (Foreign Affairs) — Dictatorships harness technology not only to suppress protests but also to stiffen older methods of control.
- Meaningful Availability — Google paper. This paper presents and evaluates, in the context of Google’s G Suite, a novel availability metric: windowed user-uptime. This metric has two main components. First, it directly models user-perceived availability and avoids the bias in commonly used availability metrics. Second, by simultaneously calculating the availability metric over many windows it can readily distinguish between many short periods of unavailability and fewer but longer periods of unavailability.
- GTM Metrics — Tweetstream. There is a dazzling amount of inconsistency in what GTM metrics are presented at board meetings of early stage B2B companies. Here is my hit list of the most important, and why.
- HiPlot: High-dimensional Interactive Plots Made Easy (Facebook AI) — a lightweight interactive visualization tool to help AI researchers discover correlations and patterns in high-dimensional data. It uses parallel plots and other graphical ways to represent information more clearly, and it can be run quickly from a Jupyter Notebook with no setup required.
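As I read the Meaningful Availability paper, user-uptime is the share of user-time in which users were “up” (their requests succeeded), and the windowed version reports, for a given window size, the worst user-uptime seen across all windows of that size. The sketch below assumes the per-user up/down classification has already been summed into per-interval totals; `windowed_uptime` and its inputs are hypothetical, not the paper’s code.

```python
def windowed_uptime(up, total, window):
    """Worst-case uptime ratio over every run of `window` consecutive intervals.

    up[i]    -- user-minutes judged "up" in interval i (summed over users)
    total[i] -- user-minutes with any activity in interval i
    """
    if not 1 <= window <= len(up):
        raise ValueError("window must be between 1 and the number of intervals")
    worst = 1.0  # returned unchanged if no window has any user activity
    for start in range(len(up) - window + 1):
        good = sum(up[start:start + window])
        seen = sum(total[start:start + window])
        if seen:  # skip windows with no user activity
            worst = min(worst, good / seen)
    return worst


total = [10] * 12
scattered = [10, 0, 10, 10, 10, 0, 10, 10, 10, 0, 10, 10]  # three short outages
clustered = [10, 10, 10, 10, 0, 0, 0, 10, 10, 10, 10, 10]  # one long outage
```

Both hypothetical series have 0.75 overall uptime (window of 12), but with a window of 3 the scattered outages never drop below 2/3 while the clustered outage drops to 0.0. Distinguishing those two cases is what the paper says the windowed metric is for.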
Four short links: 7 February 2020
API Security, Automated Video Moderation is Hard, Mining Massive Data Sets, and Tech Worker Compensation
- 31 Days of API Security Tips — Mobile Certificate Pinning? Before you start reverse engineering and patching the client app, check for both iOS and Android clients, and older versions of them. There’s a decent chance that the pinning isn’t enabled in one of them. Save time.
- Pornhub Doesn’t Care (Vice) — Pornhub’s automated detection of banned content is severely flawed, as minor edits are sufficient to present old content as new. This is closely related to the trouble that Facebook and YouTube had identifying edited versions of the Christchurch shooter’s video.
- CS246: Mining Massive Data Sets — Stanford class, with slides and suggested readings.
- Tech Workers’ Compensation — Tech favors the young. For people with more than 15 years of experience, there’s practically no correlation between years of experience and income (corr < 0). After 15 years of experience, you either retire, switch to management, or change career. (via hardmaru on Twitter)
Four short links: 6 February 2020
Identifying Doctored Images, Demonstrating ML, Radioactive Data, and Search All Your Things
- Assembler — Google’s Jigsaw group is releasing a tool to spot faked/doctored images.
- Demonstrating Machine Learning with Starbursts — simple demonstration for kids that illustrates “learning” to perform a task without explicit programming to accomplish that task.
- Using “Radioactive Data” to Detect Whether a Data Set Was Used For Training (Facebook AI) — We call this new verification method “radioactive” data because it is analogous to the use of radioactive markers in medicine: drugs such as barium sulphate allow doctors to see certain conditions more clearly on computerized tomography (CT) scans or other X-ray exams. We introduce unique marks that are harmless and have no impact on the classification accuracy of models, but remain present through the learning process and are detectable with high confidence in a neural network. Our method provides a level of confidence (p-value) that a radioactive data set was used to train a particular model.
- ripgrep-all — ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
Four short links: 5 February 2020
Go to Rust, Underinvestment in Quality, Ad Tech Report, and Mission Creep
- Discord Switching from Go to Rust — memory management proved a deciding factor: Go’s garbage collector versus Rust’s compile-time ownership of memory.
- What Happened with DNC Tech (Rabble) — a lot of background and context, but the crux is: the decision-makers refuse to use free software, alienating the progcoders/ragtag communities. They also refuse to fund projects between cycles to build reusable platforms. Best pithy comment, from William Pietri: underinvestment in quality is still the median software project experience.
- Out of Control — Norwegian Consumer Council report on the ad tech industry. Many actors in the online advertising industry collect information about us from a variety of places, including web browsing, connected devices, and social media. When combined, this data provides a complex picture of individuals, revealing what we do in our daily lives, our secret desires, and our most vulnerable moments. This massive commercial surveillance is systematically at odds with our fundamental rights and can be used to discriminate, manipulate, and exploit us. They’re filing lawsuits against companies, too.
- Morning Report: Smart Streetlights Are Experiencing Mission Creep — it’s my theory that once something is shown to be possible with bits, it will happen because there’s such diversity of motivations in the world that someone wants it to happen, and it’s nigh impossible to prevent things without physical restrictions. Not everyone can put cameras and other sensors into streetlights, but once they’re there, we’ll see every application of that data justified. It’s the Tragic Inevitability of Software, which we’re only beginning to appreciate.
Four short links: 4 February 2020
Tools Course, Rich vs. Famous, Hacker Brain, and Network Data Analysis
- The Missing Semester of Your MIT Education — We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more.
- Reasons Not to Be Famous (Tim Ferriss) — my buddy Rowan Simpson talked about this over a decade ago, and it’s still something most people don’t think about.
- Expert Programmers Have Fine-tuned Cortical Representations of Source Code — This approach enabled us to identify seven brain regions, widely distributed in the frontal, parietal, and temporal cortices, that have a tight relationship with programming expertise. In these brain regions, functional categories of source code could be decoded from brain activity and the decoding accuracies were significantly correlated with individual behavioral performances on source-code categorization. Our results suggest that programming expertise is built up on fine-tuned cortical representations specialized for the domain of programming.
- nfstream — a Python package providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive.
Four short links: 3 February 2020
Regulation, Learning Algorithms, Engineering Beliefs, and Google Maps Hacks
- Standing on the Shoulders of Giants (Ben Evans) — I like his framing of the problems (“tech companies being bad to other companies,” “tech companies being bad to us,” “bad guys using tech”).
- Algo Deck — an open-source collection of +200 algorithmic cards [for Anki]. It helps you prepare for and succeed in your algorithm and data structure interview. The code examples are in Java.
- Things I Believe — these two provoke a lot of thoughts: There are many fundamental discoveries in computer science that are yet to be found. Peak productivity for most software engineers happens closer to two hours of work a day than eight hours.
- Google Maps Hacks — 99 secondhand smartphones are transported in a handcart to generate virtual traffic jam in Google Maps. Through this activity, it is possible to turn a green street red which has an impact in the physical world by navigating cars on another route to avoid being stuck in traffic.
Four short links: 31 January 2020
Thunderbird, Disassembly, Security Keys, and ML Systems Design
- Thunderbird on the Move (ZDNet) — the news is not that interesting, except that it represents signs of life for Thunderbird.
- SourceGen Disassembly — a collection of disassembly projects, mostly Apple II games. The machine-language portions and embedded graphics have been converted to readable form.
- OpenSK — an open-source implementation for security keys written in Rust that supports both FIDO U2F and FIDO2 standards—from Google.
- Machine Learning Systems Design — 27 open-ended questions that test your ability to […] design systems to solve practical problems.
Four short links: 30 January 2020
Open-Domain Chatbot, Coding Style, Game Mod Copyright, and Design for Repair
- Towards a Human-like Open-Domain Chatbot — Google’s paper on making a chatbot that can have a vaguely plausible conversation on any subject. (via Google’s AI Blog)
- fast.ai’s Coding Style — interesting to see how different they are from historic coding standards, but they justify those differences (and aren’t dogmatic about the code they accept).
- All Your Mods Are Belong to Us — if you make a “custom game” (aka mod), then Blizzard owns the copyright. An example of attempting to capture all the value you create.
- Engineering a Repairable World — we need to view repair not only as an entryway to the field, but also as an essential or even ethical element of sustainable design and engineering.
Four short links: 29 January 2020
Speculative Debugging, Planned and Unplanned Work, Legible News, and Ops Lessons
- Reverb — speculative debugging for web applications. […] Reverb has three features that enable a fundamentally more powerful debugging experience. First, Reverb tracks precise value provenance, allowing a developer to quickly identify the reads and writes to JavaScript state that affected a particular variable’s value. Second, Reverb enables speculative bug fix analysis. […] Third, Reverb supports wide-area debugging for applications whose server-side components use event-driven architectures. (via Morning Paper)
- Planned vs. Unplanned Work (John Allspaw) — Planned work comes from experience, and experience comes from unplanned work.
- Legible News — scraped from Wikipedia’s current affairs section, and presented in a wonderfully minimalist fashion.
- Ops Lessons We Learn the Hard Way — Your network team has a way into the network that your security team doesn’t know about. (via BoingBoing)