Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 28 January 2020
TinyML, Probability, 51% Attacks, and Brad Fitzpatrick
- TinyML Book — machine learning for embedded systems, an O’Reilly book by Pete Warden and Daniel Situnayake.
- Useful Probability for Systems Programmers — interesting findings like: If you have a 1/N chance of success, then you’re more likely than not to have succeeded after N tries, but the probability is only about two thirds.
- Cost of 51% Attacks — This is a collection of coins and the theoretical cost of a 51% attack on each network.
- Brad Fitzpatrick Leaving Google — with a concise summary of his amazing track record. What next? TBA. But building something new.
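
The "about two thirds" figure in the probability link above is easy to check. This is a minimal sketch, assuming N independent tries that each succeed with probability 1/N; the function name `p_success_within` is illustrative and not taken from the linked article.

```python
import math


def p_success_within(n: int) -> float:
    """Probability of at least one success in n independent tries,
    each of which succeeds with probability 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# Print the probability for a few values of N.
for n in (2, 10, 100, 1000):
    print(n, round(p_success_within(n), 4))

# As n grows, the probability approaches 1 - 1/e.
limit = 1.0 - 1.0 / math.e
print("limit", round(limit, 4))
```

The probability stays above one half for every N of 2 or more and settles near 1 - 1/e, which is where the "more likely than not, but only about two thirds" observation comes from.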
Four short links: 27 January 2020
Developer Productivity, Compilers Course, Terminal Shooter, and Future of Coding
- The Developer Coefficient (Stripe) — Access to developers is a bigger threat to success than access to capital. […] The average developer spends more than 17 hours a week dealing with maintenance issues, such as debugging and refactoring. In addition, they spend approximately four hours a week on “bad code,” which equates to nearly $85 billion worldwide in opportunity cost lost annually, according to Stripe’s calculations on average developer salary by country.
- Stanford Compilers Course — self-directed MOOC goes away on March 26, so get amongst it while you can.
- Terminal Phase — a space shooter game you can play in your terminal.
- Synthesizing Data-Structure Transformations from Input-Output Examples (Morning Paper) — I believe I’ve linked to the paper before, but I just noticed this interesting point: It is known from prior work that such [functional] languages offer natural advantages in program synthesis. Good to see Adrian (the Morning Paper guy) is interested in the same “future of coding” areas that I am. This promises to be an interesting series of papers he looks at.
Four short links: 24 January 2020
Virus Genomes, Kubernetes Security, Copyright Crisis, and Startup Validation
- China Open Sourcing the Wuhan Coronaviruses Genomes (Twitter) — fast-tracking research.
- kube-scan — Octarine k8s cluster risk assessment tool.
- Copyright is in Crisis (Cory Doctorow) — excellent excoriation of the state of the creative industries, where consolidation and regulation work against the creators and for the middlemen.
- Validating Startup Ideas — Our goal in publishing this is to help other founders think about how to do early validation the way that we do inside the studio.
Four short links: 23 January 2020
Formal Methods, Backends, Binary Representation, and Chat Bridging
- The Business Case for Formal Methods — a short explanation, a list of benefits and case studies, and a demo. Everything’s in TLA+, but the arguments apply equally well to Alloy, B, statecharts, etc. (Via Lobsters)
- Backend Lore — From late 2012 to the present I have been writing backends (server-side code) for web applications. This document summarizes many aspects of how I write these pieces of code.
- float-toy — play with the binary representation of IEEE floats.
- matterbridge — [chat] bridge between mattermost, IRC, gitter, xmpp, slack, discord, telegram, rocket.chat, steam, twitch, ssh-chat, zulip, whatsapp, keybase, matrix, and more with REST API (mattermost not required!)
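
For a feel of what float-toy displays, the fields of an IEEE 754 single-precision value can be pulled apart with the standard library. This is a rough sketch and not float-toy's own code; the helper names `float32_fields` and `float32_bits` are made up for illustration.

```python
import struct
from typing import Tuple


def float32_fields(x: float) -> Tuple[int, int, int]:
    """Round x to IEEE 754 single precision and return
    (sign, biased exponent, fraction) as integers."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored bits, implicit leading 1 for normal values
    return sign, exponent, fraction


def float32_bits(x: float) -> str:
    """Render the three fields as space-separated binary strings."""
    sign, exponent, fraction = float32_fields(x)
    return f"{sign:01b} {exponent:08b} {fraction:023b}"


print(float32_bits(1.0))
print(float32_bits(-0.15625))
```

Here 1.0 has a biased exponent of 127 and an all-zero fraction, while -0.15625 is -1.25 × 2^-3, so its sign bit is set and its biased exponent is 124.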
Four short links: 22 January 2020
Unending Projects, Work/Life Game, Software Characterization, and Team Dynamics
- Elements of Scheduling — notable for several things, but my eye was caught by: finite convergence to completion fell beyond our reach. I know that state.
- Dungeons and Deadlines — a game of work/life balance.
- Microsoft Application Inspector — open source software characterization source code analyzer that helps you understand what a program does by identifying interesting features and characteristics using static analysis and a customizable json-based rules engine.
- Understanding Team Dynamics — We find that highly successful teams are significantly more focused than average teams of the same size, that their members have worked on more diverse sets of projects, and the members of highly successful teams are more likely to be core members or “leads” of other teams.
Four short links: 21 January 2020
Network Visualization, Computational Notebooks, Computing History, and Preserving Privacy
- Cytoscape — an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
- What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities — Our findings suggest that data scientists face numerous pain points throughout the entire workflow—from setting up notebooks to deploying to production—across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.
- Advent of Computing — podcast of computing history.
- Privacy-Preserving Record Linkage — toolbox for deterministic, probabilistic, and privacy-preserving record linkage techniques.
Four short links: 20 January 2020
AR Lenses, Faux Keyboard Noises, Tech Villainy, and Data Tests
- AR Contact Lens — The path ahead is not a short one; contact lenses are considered medical devices and therefore need US Food and Drug Administration (FDA) approval. But the Mojo Lens has been designated as an FDA Breakthrough Device, which will speed things up a little. And clinical studies have begun.
- Bucklespring — This project emulates the sound of my old faithful IBM Model-M space saver bucklespring keyboard while typing on my notebook, mainly for the purpose of annoying the hell out of my coworkers.
- Orange Badge (Tim Bray) — At some point, it’s going to be a real problem being management in a sector that’s widely feared and distrusted. But we in the tech tribe haven’t really internalized much about this yet. This. Silicon Valley failed to die a hero, so has lived long enough to see itself become the villain.
- Great Expectations — Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
Four short links: 17 January 2020
Cursed Filesystem, Many Cats, Speech Processing, and Standard Operating Procedure
- cursedfs — Make a disk image formatted with both ext2 and FAT at once. Silliness!
- cats — Here, placed side-by-side for comparison, are GNU’s implementation of cat, Plan 9’s implementation, Busybox’s implementation, and NetBSD’s implementation, Seventh Edition Unix (1979), Tenth Edition Unix (1989), and 4.3BSD. There’s a lot to learn from the differences!
- wav2letter++ — a fast, open source speech processing toolkit the speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency.
- Work is Work (Coda Hale) — Neither your employee handbook nor your calendar are accurate depictions of how work in the organization is done. Unless your organization is staffed with zombies, members of the organization will constantly be subverting standard operating procedure in order to get actual work done. Even ants improvise. (via Ben Gracewood)
Four short links: 16 January 2020
Zero Trust, Safeguarding Elections, Design Heuristics, and Image/Container Analysis
- Zero Trust Architecture Principles — Ten principles to help you design and deploy a zero trust architecture. They are: know your architecture; create a single strong user identity; create a strong device identity; authenticate everywhere; know the health of your devices and services; focus your monitoring on devices and services; set policies according to the value of services or data; control access to your services and data; don’t trust the network, including the local network; choose services designed for zero trust.
- Ten Things Technology Platforms Can Do To Safeguard The 2020 US Election — (and everyone else’s elections, you bumptious yokels). They’re all good suggestions. Google, Twitter, and Facebook do not share common language or definitions for political ads—the primary social media companies should agree on a common, broad set of definitions for political ads and adopt them across platforms. Seems like “the limits of free speech online” is an issue without a widely agreed success condition, making it unsuited to the competing-and-changing nature of free enterprise, which thrives better in “sell more widgets / make more money” types of clear-cut goals. If there’ll never be a market-led solution, citizens should direct suggestions like this post to their government rather than to the companies themselves.
- Evidence-based Design Heuristics for Idea Generation — Observations go beyond products to consider multiple concepts generated for a given problem.
- Terrier — an image and container analysis tool that can be used to scan images and containers to identify and verify the presence of specific files according to their hashes.
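
The idea behind Terrier's hash check, confirming that specific files are present by comparing content hashes, can be sketched in a few lines. This is not Terrier's implementation or interface: it assumes the image has already been unpacked into a plain directory, uses SHA-256 as an assumed hash, and the names `sha256_of` and `find_by_hash` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Set


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_hash(root: Path, wanted: Set[str]) -> Dict[str, List[Path]]:
    """Map each wanted hash to the files under root whose content matches it.
    A hash that maps to an empty list was not found."""
    found: Dict[str, List[Path]] = {h: [] for h in wanted}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = sha256_of(path)
            if h in found:
                found[h].append(path)
    return found
```

Calling `find_by_hash(Path("unpacked-image"), {"<hex digest>"})` returns a dictionary in which an empty list means the file with that hash is absent from the tree; the directory name here is a placeholder.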
Four short links: 15 January 2020
Sleep Deprivation, Lip Reading, Data Processing, and Rapid Product Development
- Performance Degradation and Restoration During Sleep Deprivation (NCBI) — These results suggest that the brain adapts to chronic sleep restriction. In mild to moderate sleep restriction, this adaptation is sufficient to stabilize performance, although at a reduced level. These adaptive changes are hypothesized to restrict brain operational capacity and to persist for several days after normal sleep duration is restored, delaying recovery. Crunches are dangerous. (via Popular Science)
- Video-Driven Speech Reconstruction — aka: teaching a neural net to read lips.
- pxi — a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
- iPod Timeline (Patrick Collison) — wow, that’s a hell of a timeline.
Four short links: 14 January 2020
Privacy Legislation, Bystander Effect, Computing Education, and Tech Adversaries
- The 2019 Privacy Legislation Bomb Cyclone — I know from experience in edtech that the morass of states’ legislation doesn’t make life easy for startups. Brace for more, affecting everyone. We may have passed the days when you could do something online without needing an expert opinion from a lawyer.
- Cross-National CCTV Footage Shows That Intervention is the Norm in Public Conflicts (PDF) — It is important therefore to recognize a key distinction between the likelihood of individual intervention and the aggregate likelihood that at least someone provides help. Yet, in comparison to the vast number of studies that examine intervention from the perspective of the individual bystander, we know surprisingly little about the situational intervention likelihood—that is, the probability that at least one bystander at the emergency event intervenes. […] Using a unique cross-national video data set from the United Kingdom, the Netherlands, and South Africa (N = 219), we show that in nine of 10 public conflicts, at least one bystander, but typically several, will do something to help. Although the Bystander Effect means an individual may feel less likely to help, there’s a 90% chance that *someone* will help. Not guaranteed to apply in company meetings, however.
- Computing Education: What I Got Wrong — really interesting lessons from the trenches of changing computing education. We are much more likely to integrate CS into mathematics or science teacher programs than to have standalone CS teacher professional development—and even that will require an enormous effort. […] Even if you have classes, you might not get students taking them, or it may just be more of the same kinds of students […] Diverse participation is really hard. I still believe in the value of having students program for learning lots of different things, but I’m no longer convinced that the “hard fun” of Logo is the most useful or productive path for using the power of computing for learning. I am less interested in making things for just a few precocious students, especially if teachers hate it. I believe in making things with teachers.[…] We can try to teach everyone about computational thinking, but that won’t get as far as improving the computing to help everyone’s thinking. Fix the environment, not the people.
- Tech Adversaries vs. Enemies (Alex Stamos) — excellent graduation speech. It is seductive to go along with the expectations of your boss, your colleagues, your shareholders, which you must resist. It can also be seductive to put yourself on a path where you might never be faced with hard decisions that you might regret or where you are free to always criticize without taking any ethical risks on your own.
Four short links: 13 January 2020
Simulated Customer, Symbolic Meets Statistical, Deep Fakes, and Online Radicalization
- Simulated Customer — The site will randomly generate one of 40 different [sales] objections, and give you 20 seconds to answer it.
- From Shallow to Deep Interactions Between Knowledge Representation, Reasoning, and Machine Learning — This paper proposes a tentative and original survey of meeting points between knowledge representation and reasoning (KRR) and machine learning (ML), two areas which have been developing quite separately in the last three decades. […] This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and how they could cooperate.
- NHK Raises the Dead to Mixed Reviews — Enka singer Hibari Misora graced the “Kohaku” stage for the first time in decades to perform a new song. Well, technically, it wasn’t Misora herself—she died in 1989. Rather, it was a life-like hologram performing this fresh tune thanks to Yamaha’s Vocaloid: AI, a piece of technology that can replicate voices. Deepfaked audio and imagery. (via Hacker News)
- Empirical Studies of Online Radicalization: A Review and Discussion — Only 18 studies that met Desmarais et al.’s (2017) stringent systematic review criteria empirically examined the radicalization process. Fewer still, presumably, examined the online radicalization process. Indeed, Hassan et al., (2018) conducted a systematic review specifically focused on the relationship between the impact of extremist online content and violent radicalization. Eleven studies fit their eligibility criteria. […] The emerging evidence base is also pretty clear. Those who are radicalized and/or commit acts of terrorism have generally been exposed to radicalizing content. Exposure to this content leads to affective, emotional, and behavioral change at each stage of the process. Of course, some of these studies have relatively small sample sizes, and are only focused on specific types of terrorists or geographical contexts. The key now is to replicate and build upon this preliminary evidence to give us a sense of not just whether exposure to ideological content in the online environment causes violent extremism, but also how, in what contexts and for whom? Is “exposure” sufficient whether it is in the virtual or physical world? Does it work differently for different people in different contexts?
Four short links: 10 January 2020
Automation UX, Awful AI, Neural Net Guitar Pedal, and Closed Web
- Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity — it’s really interesting to read this and think how they might manifest in, e.g., a chatbot. I remember Jesse Robbins talking about Orion for emergency workers and how they were having to invent this stuff. It’s remarkable how we’ve gone through a chatbot boom and bust cycle without much forward progress in standardizing these things. (via The Morning Paper)
- Awful AI — a curated list to track current scary usages of AI—hoping to raise awareness to its misuses in society.
- A Neural Network Guitar Pedal — This neural network is trained to turn a guitar into a piano in real time. It’s not perfect but it’s still pretty amazing. (via Twitter)
- Web is Now Closed — Samuel Maddock has been trying to create a rival “indie” browser, and has been to each of the EME DRM vendors and has been sent away by all of them. This is appalling.
Four short links: 9 January 2020
Structuring Papers, State of the World 2020, Reading Big Difficult Books, and Storing Forever
- Ten Simple Rules for Structuring Papers — Focus your paper on one central contribution, which you communicate in the title; write for flesh-and-blood human beings who do not know your work; stick to the context-content-conclusion (C-C-C) scheme; optimize your logical flow by avoiding zig-zag and using parallelism; tell a complete story in the abstract; get across why the paper matters in the introduction; communicate the results as a sequence of statements, supported by figures, that connect logically to support the central contribution; discuss how the gap was filled, the limitations of the interpretation, and the relevance to the field; allocate time where it matters: title, abstract, figures, and outlining; get feedback to reduce, reuse, and recycle the story.
- State of the World 2020 — Bruce Sterling and Jon Lebkowsky at The WELL. So in MMXX, we’re in a world situation that claims to be post-global and post-internet, and post world-trade, where everybody wants to take back control, be great again, assure sovereign cyberspace, set tariffs, jail immigrant tots, beat up ethnic minorities, nurture billionaires, ignore science, and reduce education to assure that there are fewer brainy chicks—but in practice, there’s no big difference among the players. They ALL do that. There’s next to no genuine cultural variety. They all use the same hardware, slogans, and techniques.
- A Note on Reading Big, Difficult Books (Brad DeLong) — We have our recommended ten-stage process for reading such big books: 1. Figure out beforehand what the author is trying to accomplish in the book. 2. Orient yourself by becoming the kind of reader the book is directed at—the kind of person with whom the arguments would resonate. 3. Read through the book actively, taking notes. 4. “Steelman” the argument, reworking it so that you find it as convincing and clear as you can possibly make it. 5. Find someone else—usually a roommate—and bore them to death by making them listen to you set out your “steelmanned” version of the argument. 6. Go back over the book again, giving it a sympathetic but not credulous reading. 7. Then you will be in a good position to figure out what the weak points of this strongest-possible argument version might be. 8. Test the major assertions and interpretations against reality: do they actually make sense of and in the context of the world as it truly is? 9. Decide what you think of the whole. 10. Then comes the task of cementing your interpretation, your reading, into your mind so that it becomes part of your intellectual panoply for the future.
- Perkeep — Camlistore gets a new name. A set of open source formats, protocols, and software for modeling, storing, searching, sharing, and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser, or FUSE filesystem.
Four short links: 8 January 2020
Running Unconferences, Media Server, Lyft’s Workflow Tool, and Bandwidth Utilization
- Ten Simple Rules for Organizing an Unconference — academia-targeted, but generally useful, advice for running unconferences.
- Jellyfin — free software media server.
- Flyte — a structured programming and distributed processing platform for highly concurrent, scalable, and maintainable workflows from Lyft. Intro blog post lays out the case, and this blog post describes the differences between Flyte and Apache Airflow.
- bandwhich — terminal-based bandwidth utilization tool.
Four short links: 7 January 2020
Coding Interview Problems, Coder Stratification, Writing a Compiler, and Distributed Execution Framework
- Coding Interview Problems Solved in Go — see also some in Rust, and the best coding interview take ever, by Aphyr. Because thinly veiled excuses to use dynamic programming or graph coloring are the “Hello world” of our Google-aspirational age. (via Hacker News)
- Coding Will Divide Along Class Lines (Mike Loukides) — The programming world will increasingly be split between highly trained professionals and people who don’t have a deep background but have a lot of experience building things. The former group builds tools, frameworks, languages, and platforms; the latter group connects things and builds websites, mobile apps, and the like. This divide will mean different tools and training for each.
- A Compiler Writing Journey — In this GitHub repository, I’m documenting my journey to write a self-compiling compiler for a subset of the C language. I’m also writing out the details so that, if you want to follow along, there will be an explanation of what I did, why, and with some references back to the theory of compilers.
- Ray — a distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. See this introductory post for the rationale.
Four short links: 6 January 2020
OS Forks, WASM OS, Mediating Consent, and Computational Cinematography
- An Excess of Operating Systems (Jean-Louis Gassée) — Fuchsia exists for technical reasons, but Samsung’s, Amazon’s, Huawei’s, etc., are all for business reasons (not wanting to tithe or be tied strategically to Google).
- Redshirt — The redshirt operating system is an experiment to build some kind of operating-system-like environment where executables are all in WASM and are loaded from an IPFS-like decentralized network. […] There exists three core syscalls (send a message, send an answer, wait for a message), and everything else is done by passing messages between processes or between a process and the “kernel.” Programs don’t know who they are sending the message to. One person’s dream is another’s nightmare.
- Mediating Consent (Renee DiResta) — essay on manufacturing consent in the social media age. The path forward requires systems to facilitate mediating, not manufacturing, consent. We need a hybrid form of consensus that is resistant to the institutional corruption of top-down control, and welcomes pluralism, but is also hardened against bottom-up gaming of social infrastructure by malign actors.
- Synopsis — a suite of open source software for computational cinematography—tools that help the creation of visual media. Synopsis is built to help editors, artists, indie film makers, A/V developers, and creators do what they do best—tell stories, make experiences, and build amazing tools.
Four short links: 3 January 2020
Portable Scripts, Training Actors, Cyber Law, and Government Data
- Chesterton’s Shell Script (Pete Warden) — those who forget Perl’s Configure.sh are doomed to recreate it. “Congratulations, you’re not running Eunice!”
- Light (Facebook AI) — a large-scale fantasy text adventure game research platform for training agents that can both talk and act, interacting either with other models or with humans. (Via introducing blog post.)
- International Cyber Law in Practice: Interactive Toolkit — At its heart, it consists of 13 hypothetical scenarios, to which more will be added in the future. Each scenario contains a description of cyber incidents inspired by real-world examples, accompanied by detailed legal analysis. The aim of the analysis is to examine the applicability of international law to the scenarios and the issues they raise.
- UX of Bushfire Maps (Ellen Broad) — classic government map/data problem: each state/agency has its own map, showing its own view of the world, and they don’t even use the same symbols. Which makes life miserable for people who don’t care about the org chart, they just want to learn something their government knows — like whether their house will burn today. (Via Merrin Macleod.)