Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 13 November 2019
Predicting Smells, Diversified Sampling, Card Games, and Universal SQL Interface
- Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules (Google) — a readable walk through of how they model and make sense of smells.
- Diversified Sampling: Mining Large Datasets for Special Cases — TL;DR: the objective is to build a small sample of the data in which special cases are likely to be represented. The strategy is to have each data item emit a list of “features,” and to boost the probability of selecting rare features. A shallow understanding of the data is often enough to design an effective features extraction.
- RLCard — open source reinforcement learning toolkit for card games. Paper. (via SyncedReview)
- usql — Universal command-line interface for SQL databases.
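The diversified sampling strategy described above (each item emits features, and rare features get a selection boost) can be sketched roughly as follows. This is a minimal illustration and not the linked article's algorithm: the `extract_features` helper, the inverse-frequency weighting, and the weighted-reservoir selection step are all assumptions chosen for brevity.

```python
import random
from collections import Counter


def extract_features(item):
    """Hypothetical feature extractor: emits the item's type and a length bucket."""
    return [f"type:{type(item).__name__}", f"len:{min(len(str(item)), 5)}"]


def rarity_weights(items, features=extract_features):
    """Weight each item by the rarest feature it emits.

    Assumed scoring rule: weight = largest inverse frequency among the
    item's features, so a single rare feature is enough to boost it.
    """
    emitted = [features(item) for item in items]
    counts = Counter(f for feats in emitted for f in feats)
    return [max(1.0 / counts[f] for f in feats) for feats in emitted]


def diversified_sample(items, k, features=extract_features, seed=0):
    """Pick k distinct items, favoring those that carry rare features."""
    rng = random.Random(seed)
    weights = rarity_weights(items, features)
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each item draws u ** (1 / weight) and the k largest keys win.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    chosen = sorted(keyed, reverse=True)[:k]
    return [items[i] for _, i in chosen]


data = list(range(10, 100)) + ["hello"]  # 90 look-alike ints and one odd string
print(diversified_sample(data, k=5))
```

With this weighting, the lone string is far more likely to land in the sample than it would be under uniform sampling, which is the point of the technique: a shallow feature extractor is enough to surface the special case.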
Four short links: 12 November 2019
Evil Tech, Deep Tech, Visualization, and Machine Learning Rules
- Uyghur Detecting “Smart” Camera — this should start hard conversations in your workplace about what you build. (via BoingBoing)
- Deep Tech Trends 2019 — a lot of companies doing a lot of things. So many utility Daleks that will…clean your floors, clean your toilet, run your landfill, hold your drill…
- SandDance — By using easy-to-understand views, SandDance helps you find insights about your data, which in turn help you tell stories supported by data, build cases based on evidence, test hypotheses, dig deeper into surface explanations, support decisions for purchases, or relate data into a wider, real-world context. SandDance uses unit visualizations, which apply a one-to-one mapping between rows in your database and marks on the screen. Smooth animated transitions between views help you to maintain context as you interact with your data.
- Rules of Machine Learning (Google) — A simple heuristic can get your product out the door. A complex heuristic is unmaintainable. Once you have data and a basic idea of what you are trying to accomplish, move on to machine learning. As in most software engineering tasks, you will want to be constantly updating your approach, whether it is a heuristic or a machine-learned model, and you will find that the machine-learned model is easier to update and maintain.
Four short links: 11 November 2019
WebAssembly Shell, Network Security, FPGA Bugs, and Inoculating Against Online Misinformation
- WebAssembly.sh — an open source terminal that uses the WebAssembly Package Manager (WAPM) and local files to run server-side Wasm / WASI modules in a shell-like interface. (via Mikeal Rogers)
- Russia’s Suspected Internet Cable Spy Ship Appears Off Americas (Forbes) — assume your network is compromised, as you should always have done. Before enlightenment: encryption. After enlightenment: encryption.
- Finding and Understanding Bugs in FPGA Synthesis Tools — Every synthesis tool was found to introduce discrepancies between the netlist and the design, and all tools except XST crashed when given valid input. Vivado was found to missynthesise 4% of random testcases that it was given.
- Fake News Game Confers Psychological Resistance Against Online Misinformation (Nature) — In the game, players take on the role of a fake news producer and learn to master six documented techniques commonly used in the production of misinformation: polarization, invoking emotions, spreading conspiracy theories, trolling people online, deflecting blame, and impersonating fake accounts. The game draws on an inoculation metaphor, where preemptively exposing, warning, and familiarizing people with the strategies used in the production of fake news helps confer cognitive immunity when exposed to real misinformation. We conducted a large-scale evaluation of the game with N = 15,000 participants in a pre-post gameplay design. We provide initial evidence that people’s ability to spot and resist misinformation improves after gameplay, irrespective of education, age, political ideology, and cognitive style.
Four short links: 8 November 2019
Probabilistic Scripting, Static Analysis, Visual Forensics, and Uninformed Consent
- Probabilistic Scripts for Automating Common-Sense Tasks — In this talk, I’ll introduce a new declarative-programming approach for automating common-sense reasoning tasks: probabilistic scripting. A talk from Strange Loop.
- Infer — Facebook’s open source static analysis tool — if you give Infer some Java or C/C++/Objective-C code, it produces a list of potential bugs.
- VFRAME — Visual Forensics and Metadata Extraction is a computer vision toolkit designed for human rights researchers. It aims to bridge the gap between state-of-the-art artificial intelligence used in the commercial sector and make it accessible and tailored to the needs of human rights researchers and investigative journalists working with large video or image datasets.
- (Un)informed Consent: Studying GDPR Consent Notices in the Field — Most GDPR consent forms come with pre-checked boxes. Cut to section 4.5, “Survey Results,” of this paper for the (not at all) astonishingly low number of people who will click a checkbox to get personalization: 0.4%. (via Arvind Narayanan)
Four short links: 7 November 2019
DNS Wars, Company Culture, Separating Musical Stems, and Program Synthesis
- DNS Wars — But perhaps the position makes more sense if you view this as a major divorce, where the web is separating itself from the internet and wanting to sever all forms of interdependence with the rest of the internet. Why share any of that user data when you can keep it all?
- Readings on Company Culture — 230+ resources you need to learn the ins and outs of company culture.
- Spleeter — source separation library including pretrained models. Extract separate vocals, bass, drums, etc., stems (tracks) from an existing recording. This is not perfect, but it’s impressively ahead of the previous state of the art.
- Understanding the World with Program Synthesis (SIGPLAN) — The most prominent existing application of program synthesis in the natural sciences is in executable biology. Here, one models cellular behavior using stateful, concurrent programs that represent biological entities like proteins and genes. Each “component,” or process, of such a program interacts with neighboring processes and changes state when certain events take place (e.g., when a molecular signal is received). Because of the inherent complexity of concurrency, even a small number of components can collectively describe highly nontrivial system behaviors. Program synthesis allows the automatic extraction of such programmatic models from prior knowledge and data.
Four short links: 6 November 2019
Functional Programming, Fake News in Elections, De-Identification in Video, and Sendmail Lessons Learned
- Things I Wish Someone Had Explained About Functional Programming (James Sinclair) — But it’s not long before things get complicated. We start with a bunch of simple functions. Easy. But the types don’t all line up. And we need to generate some side effects, too. And handle errors. And manage state. And how do you debug this? It gets tricky, quick. Functional programming has idioms and tools to solve all these challenges. But learning them is hard if nobody shows you where to look.
- Electoral Competition with Fake News — We introduce opportunities for political candidates and their media supporters to spread fake news about the policy environment and perhaps about parties’ positions into a familiar model of electoral competition. In the baseline model with full information, the parties’ positions converge to those that maximize aggregate welfare. When parties can broadcast fake news to audiences that disproportionately include their partisans, policy divergence and suboptimal outcomes can result. We study a sequence of models that impose progressively tighter constraints on false reporting and characterize situations that lead to divergence and a polarized electorate.
- Live Face De-Identification in Video — We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination, and expression) fixed. We achieve this by a novel feed-forward encoder-decoder network architecture that is conditioned on the high-level representation of a person’s facial image. The network is global, in the sense that it does not need to be retrained for a given video or for a given identity, and it creates natural looking image sequences with little distortion in time.
- Lessons Learned from Sendmail (Eric Allman) — Sendmail has a bad rap with the kids, but it’s worth listening to how it came about. If you ever invent something that has the market share that Sendmail had, then you’ll have regrets, too. This is a good opportunity to learn from Eric’s.
Four short links: 5 November 2019
Money Laundering, Privacy Unix, Technical Decisions, and Automated Security Playbooks
- “Nearly All” Counter-Strike Microtransactions Are Being Used for Money Laundering (Vice) — I’ve been chewing on this all week. There are a lot of unmet desires in the world, desires that are unmet because society makes them illegal to meet. If your service can be adapted to meet these desires, then it will be. And you’ll have a problem.
- Whonix — a desktop operating system designed for advanced security and privacy. Whonix mitigates the threat of common attack vectors while maintaining usability. Online anonymity is realized via fail-safe, automatic, and desktop-wide use of the Tor network. A heavily reconfigured Debian base is run inside multiple virtual machines, providing a substantial layer of protection from malware and IP address leaks. Commonly used applications are pre-installed and safely pre-configured for immediate use. The user is not jeopardized by installing additional applications or personalizing the desktop. Whonix is under active development and is the only operating system designed to be run inside a VM and paired with Tor.
- Technical Leadership Masterclass: Decisions (Ruth Malan) — a deep dive into different frameworks and considerations when making decisions.
- SOCless: Automated Security Playbooks — Twilio’s open source serverless framework to execute user-defined workflows in response to alerts or scheduled events.
Four short links: 4 November 2019
Disinformation, Ticketing, Run a City Like a Company, and In-Browser Ultrasonic Data Transfer
- Beyond Bots and Trolls: Understanding Disinformation as Collaborative Work — we examine three case studies of online information operations using a sociotechnical lens that draws on CSCW theories and methods to account for the mutual shaping of technology, social structure, and human action. Through this lens, we contribute a more nuanced understanding of these operations (beyond “bots” and “trolls”) and highlight a persistent challenge for researchers, platform designers, and policy makers—distinguishing between orchestrated, explicitly coordinated, information operations and the emergent, organic behaviors of an online crowd.
- Alf.io — open source ticket reservation system. Their draft new web site does a better job explaining what they do and why they’re good. (via Hacker News)
- How to Run a City Like Amazon, and Other Fables — The idea behind the book is to ask what would it be like to live in a city administered using the business model of Amazon (or Apple, IKEA, Pornhub, Spotify, Tinder, Uber, and more), or a city where critical public services are delivered by these companies? (via Rob Kitchin)
- Quiet.js — a javascript binding for libquiet, a library for sending and receiving data via sound card. It can function either via speaker or cable (e.g., 3.5mm). Quiet comes included with a few transmissions profiles which can be selected for the intended use. For speaker transmission, there is a profile which transmits around the 19kHz range, which is essentially imperceptible to the human ear. Quiet uses the Web Audio functionality in order to send and receive sound. Sending data is supported by Chrome, Firefox, Safari, and Edge. Reception is supported by Chrome, Edge, and to some extent Firefox.
Four short links: 1 November 2019
Research Tool, Conversational AI, Nothing to Hide, and Battery Ingredients
- Vortimo — software that organizes information on webpages you’ve visited. It records pages you go to, extracts data from them, and enriches the data that was extracted. It augments the pages in your browser by allowing you to tag objects as well as decorating objects it deems important. It then arranges the data in a UI. Vortimo supports switching between cases/projects seamlessly. You can also generate PDF reports based on the aggregated information and meta information.
- DeepPavlov — An open source conversational AI framework.
- ‘I’ve Got Nothing to Hide’ and Other Misunderstandings of Privacy — 2007 paper. In this essay, I will explore the “nothing to hide” argument and its variants in more depth. Grappling with the “nothing to hide” argument is important because the argument reflects the sentiments of a wide percentage of the population. In popular discourse, the “nothing to hide” argument’s superficial incantations can readily be refuted. But when the argument is made in its strongest form, it is far more formidable.
- Projected Battery Minerals and Metals Global Shortage — eye-watering presentation, estimating the effect of predicted demand upon the supply/availability of key minerals and metals. Contains alarming statements like: What is needed to replace 2016 global fleet would consume 93.3% of Global Ni Resources.
Four short links: 31 October 2019
Property Graphs, Election Tech, Technical Debt, Tunnelling SSH over HTTP Proxies
- Managing Delivery Networks: A Use Case for Graph Databases — very interesting description of architecture of a routing solution. Turns out we didn’t need a graph; we needed a property graph. In short, a property graph is a graph data structure with the addition of properties (key, value pairs) which sit on the edges and vertices of the graph.
- GE2019 Election Tech Handbook — Google Doc keeping track of the various tech projects around the UK’s 2019 general election.
- How to Balance Technical Debt — as no org has beautiful perfect code, every org struggles to balance feature development against paying down technical debt so they can move faster. The insight here is to quantify reliability and set expectations, so you can increase the amount of dev time when reliability falls below expectations.
- Corkscrew — a tool for tunneling SSH through HTTP proxies.
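The property graph mentioned in the delivery-network item above is, at its simplest, a graph whose vertices and edges each carry a dictionary of key/value properties. Here is a minimal sketch; the class name, vertex IDs, and property keys are hypothetical and not taken from the linked article, and it allows only one edge per (source, destination) pair, which real property graph stores typically do not require.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Directed graph whose vertices and edges both carry key/value properties."""
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: dict = field(default_factory=dict)     # (src, dst) -> properties

    def add_vertex(self, vid, **props):
        self.vertices[vid] = dict(props)

    def add_edge(self, src, dst, **props):
        # Both endpoints must already exist so edges never dangle.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[(src, dst)] = dict(props)

    def neighbors(self, vid, **required):
        """Yield vertices reachable from vid over edges matching all required properties."""
        for (src, dst), props in self.edges.items():
            if src == vid and all(props.get(k) == v for k, v in required.items()):
                yield dst


g = PropertyGraph()
g.add_vertex("depot", kind="warehouse")
g.add_vertex("hub", kind="sorting_center")
g.add_vertex("customer", kind="address")
g.add_edge("depot", "hub", mode="truck", minutes=45)
g.add_edge("depot", "customer", mode="bike", minutes=20)
g.add_edge("hub", "customer", mode="truck", minutes=15)

truck_hops = list(g.neighbors("depot", mode="truck"))
print(truck_hops)
```

Because the routing constraints live on the edges as properties, a query such as "where can a truck go from the depot?" becomes a filter over edge properties rather than a separate lookup table.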
Four short links: 30 October 2019
Teaching Unix, Computational Memory, unfork(), and Modern Javascript Features
- xv6 — a teaching operating system developed in the summer of 2006 for MIT’s operating systems course, 6.828: Operating System Engineering. We hope that xv6 will be useful in other courses too. This page collects resources to aid the use of xv6 in other courses, including a commentary on the source code itself. It also includes a book.
- Computational Memory — But a lot more processing could be done in the memory. Consider a system that is meant to be secure. Why not process the data in the memory, where the data is encrypted, rather than having to unencrypt it and transfer the data across the bus where it could be intercepted? Why not perform searches on large amounts of data in the memory, only transferring the likely matches for more in-depth processing?
- unfork() — fork(2) splits one process (really, address space) into two. unfork(2) joins two address spaces into one. Useful for dynamic binary analysis and instrumentation of applications with built-in integrity checks.
- Modern Javascript Features You May Have Missed — binary and octal literals, Number.isNaN(), exponent (power) operator, Array.prototype.includes(), Shared array buffers and atomics, some more of Perl’s useful regex features, Array.prototype.flat() and flatMap(), unbound catches, and string trim methods.
Four short links: 29 October 2019
Remastering Games, Rethinking Encryption, Text Editors, and Digital Wellbeing
- The Pawn, Remastered From Source — this business of remastering old games, updating the audio and video, is nifty.
- Rethinking Encryption (Lawfare) — a former FBI general counsel comes out as pro strong encryption. Cory Doctorow’s summary: [Jim] Baker’s argument is primarily instrumental: he rejects the idea that you can create cryptography that works perfectly when it’s being used to protect good guys, but fails completely when bad guys try to use it. He acknowledges that any effort to ban working cryptography would simply send American criminals to offshore software repositories to get access to working crypto, and that in so doing, it would be much harder for American law enforcement to spy on its adversaries, because the metadata from their encrypted communications would be out of US law enforcement’s reach.
- Text Editing Hates You Too — examples of why it’s so hard to write a text editor, from caret positioning to emoji. It is quite approachable at the start, and gets technical later on. E.g., Windows solves this with its eight (8!) types of locks. Although holding a lock across process boundaries may sound questionable to you, most other platforms try to use imperfect heuristics to fix concurrency issues. Or they just hope race conditions don’t happen. In my experience, prayers are not a very effective concurrency primitive.
- Digital Wellbeing Experiments (Google) — a collection of [open source] ideas and tools that help people find a better balance with technology.
Four short links: 28 October 2019
Viz Widget, 6502 History, Hire Juniors, and Postgres Queries as Flame Graphs
- navio — A visualization widget to understand and navigate your data.
- Team 6502 — great history to this pivotal microprocessor.
- Heroes and Juniors — But at different points of an organization’s life cycle hiring more senior people does not increase velocity; it actually diminishes it. The senior people create more things than the organization can maintain or support. People work longer and longer hours, spend more time on-call, have to maintain expertise in more and more different parts of an architecture that is getting more and more complex. They burn out and the organization loses institutional knowledge.
- pg_flame — visualise Postgres queries as flame graphs.
Four short links: 25 October 2019
A Human Tale, Algorithm Regulation, ASCII Game, Security Research
- A Security Tale: A Timeline — the story of, and consequences of, one chap’s insane workload being the NZ security person for Equifax during the fallout from the breach. Perfect storm of remote management, global demands, outsourced team, outsourced cloud providers…and 18-hour days spent trying to cover your employer’s ass. He talks about the logistic challenges, and then the personal costs. The phrase “giant pit of despair” is appropriate. Look after yourself. A Kawaiicon talk. (via Fobski)
- Opinion of the Data Ethics Commission — proposing a sliding scale of regulation. For algorithmic systems, they propose a five-level model: no special measures for applications with zero or negligible potential for harm; measures such as formal and substantive requirements (e.g., transparency obligations, publication of a risk assessment) or monitoring procedures (e.g., disclosure obligations toward supervisory bodies, ex-post controls, audit procedures) for applications with some potential for harm; additional measures such as ex-ante approval procedures for applications with regular or significant potential for harm; additional measures such as live interface for “always on” oversight by supervisory institutions for applications with serious potential for harm; and complete or partial ban of an algorithmic system for applications with an untenable potential for harm. The diagram is Figure 2 on page 19. (via Haydn Belfield)
- ASCIIdent — open-world sci-fi game with a design completely made by text characters. Commercial game with clever idea.
- A Data-Driven Reflection on 36 Years of Security and Privacy Research — Figure 1 is worth checking out. Interesting how no topic is as prevalent today as formalism or trust were in their heydays. Their research considers things like whether new topics are introduced by new authors or by old authors (most started by existing authors, but some important topics were started by new authors—e.g., crypto protocols, side-channels, data privacy). (via Bruce Schneier)
Four short links: 24 October 2019
Quantum Supremacy, Instrumenting JavaScript, Social Credit Scoring, and Crappy Fonts
- Quantum Supremacy: The Gloves are Off (Scott Aaronson) — Google demonstrated a quantum system that will be exponentially faster, as the number of qubits increases, than a classical system simulating the problem. IBM proposed a way of using the world’s most powerful classical supercomputer (Summit) to brute force a solution faster than the method listed in Google’s paper—but the classical computer solution will still be exponentially slower, as the number of qubits increases linearly. If Google, or someone else, upgraded from 53 to 55 qubits, that would apparently already be enough to exceed Summit’s 250 petabyte storage capacity. At 60 qubits, you’d need 33 Summits. At 70 qubits, enough Summits to fill a city … you get the idea. Amusingly enough, Google made a particular engineering choice purely to extend the gap between quantum and the classical simulations they foresaw (missing IBM’s “just brute force it, bro” solution).
- VisibleV8 — mods for a V8 JavaScript engine that instruments JavaScript and logs a ton of stuff about function calls, property access, etc. See the paper.
- All Carrots and No Sticks: A Case Study on Social Credit Scores in Xiamen and Fuzhou (Berkman-Klein Center) — the most detailed look at the social credit system, and it’s a long way from Black Mirror. The introduction of these city-level scores by city governments marks the entry of the government in the business of scoring citizens; however, implementation so far reveals a very basic attempt with numerous gaps and question marks, but a far cry from the Western media picture of an all-encompassing score enabled by mass surveillance.
- Schmelvetica — the original Smelvetica was a copy of Helvetica that had its kerning messed with, which made the originally very elegant font look … awful. The creator got a take-down notice. So here’s a Python program that’ll similarly abuse any font you give it. I don’t know why I’m attracted to this horror.
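The storage figures in the quantum supremacy item above can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes a brute-force simulation stores all 2^n amplitudes of an n-qubit state at 8 bytes each, and that Summit's 250 petabytes are binary petabytes (2^50 bytes). Both are assumptions for illustration, not details taken from the linked posts.

```python
BYTES_PER_AMPLITUDE = 8      # assumption: one complex number as two 32-bit floats
SUMMIT_BYTES = 250 * 2**50   # assumption: 250 PB read as binary petabytes


def state_vector_bytes(qubits):
    """Bytes needed to hold every amplitude of a full n-qubit state vector."""
    return (2 ** qubits) * BYTES_PER_AMPLITUDE


def summits_needed(qubits):
    """Number of Summit-sized stores required, rounded up (integer math only)."""
    return -(-state_vector_bytes(qubits) // SUMMIT_BYTES)


for n in (53, 55, 60, 70):
    print(n, "qubits:", summits_needed(n), "Summit(s)")
```

Under these assumptions, 53 qubits fit in one Summit, 55 qubits no longer do, and 60 qubits need 33 Summits: each added qubit doubles the memory, which is why a linear increase in qubits produces an exponential cost for the classical simulation.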
Four short links: 23 October 2019
Safe Interfaces, WebAssembly Numbers, Userland Dataflow, and Serverless Data Processing in Go
- How Not to Rewrite it in Rust — A much better alternative is to reuse the original library and just publish a safe interface to it. As this comment on Lobsters says, Rust can add safety guarantees even to C code! When using a C library, you may need to know things such as “if I pass a pointer to the library, who will free it and when?”, “can this be NULL?”, “is this thread-safe?”, “can I call this function more than once?”. In C, these things are in the manual, but Rust can express them in the type system. When writing Rust wrappers, I literally copy prose from the documentation into Rust type system and have the compiler enforce RTFM!
- A Study on the Prevalence of WebAssembly in the Wild — we examine the prevalence of WebAssembly in the Alexa top one million websites and find that as many as one out of 600 sites execute Wasm code. Moreover, we perform several secondary analyses, including an evaluation of code characteristics and the assessment of a Wasm module’s field of application. Based on this, we find that over 50% of all sites using WebAssembly apply it for malicious deeds, such as mining and obfuscation.
- Userland — an integrated dataflow environment for user applications. Very cool idea! Very early and docs not everywhere, but see this video for demos and explanation of philosophy. (via Twitter)
- Bigslice — a system for fast, large-scale, serverless data processing using Go.
Four short links: 22 October 2019
Book Sponsorship, Policy Impact, Stream Processing Platform, and Fixing Hardware in Software
- Open Library Book Sponsorship — you pay and name the book, they digitize it.
- Public Impact Starter Kit — The diagnostic tool helps you improve the impact of a government initiative. Guiding you through an assessment of your policy against the nine elements of the Fundamentals framework, you’ll determine whether the key drivers of policy success are in place. […] The nine-sided Fundamentals Map helps you map your policy against the Fundamentals framework to illustrate areas of strength and areas for improvement. […] The Checklist for Policymakers is a useful tool when you’re developing or reviewing a policy. By checking off each of the nine elements of the Fundamentals framework, you can help maximize the likelihood of your policy being a success. (via Danny Buerkli)
- Mantis — Netflix open sourced their platform to build an ecosystem of real-time stream processing applications. (via Medium)
- Unshaky — discards immediate second keypresses of the type generated by defective Apple butterfly keyboards.
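The idea behind Unshaky, dropping a second press of the same key that arrives implausibly soon after the first, is a form of debouncing. A minimal sketch follows; the 40 ms threshold, the event format, and the function name are illustrative assumptions and not Unshaky's actual implementation.

```python
def drop_double_presses(events, threshold_ms=40):
    """Filter (timestamp_ms, key) press events, discarding chatter.

    A press is dropped when the same key was last seen less than
    threshold_ms earlier. Events must be sorted by timestamp.
    """
    last_seen = {}
    kept = []
    for timestamp_ms, key in events:
        previous = last_seen.get(key)
        # Record every press, kept or not, so a burst of chatter
        # keeps being suppressed until the key goes quiet.
        last_seen[key] = timestamp_ms
        if previous is not None and timestamp_ms - previous < threshold_ms:
            continue
        kept.append((timestamp_ms, key))
    return kept


presses = [(0, "e"), (12, "e"), (90, "e"), (95, "r"), (101, "r"), (300, "e")]
print(drop_double_presses(presses))
# prints [(0, 'e'), (90, 'e'), (95, 'r'), (300, 'e')]
```

The threshold is the main design choice: too low and defective keys still double up, too high and deliberate fast repeats of the same letter get swallowed.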
Four short links: 21 October 2019
Browser History, Competition Programming, Wi-Fi Secrets, and Pascal Returns
- Memex — open source browser extension to full-text search your browsing history and bookmarks.
- Competition Programming and Problem Solving, Fall 2019 — CMU course.
- Pwnagotchi 1.0.0 — I wanted Alpha and Beta to be able to detect each other and exchange with each other very basic information—but how do you communicate anything at all from a computer when: the main and only Wi-Fi interface is in monitor mode and already being used for Wi-Fi scanning, hopping, and frames injection; you have Bluetooth, but you want to keep it free for other uses (tethering, like we’re doing today, or maybe integrating BLE attacks, too, some day); you’re using the USB ports in gadget mode, so you can’t use external USB devices, like another Wi-Fi.
- Turbo Rascal 1.0 — Pascal IDE for C64 games, with a ton of specialized features (memory, level editing, etc.). At the same time, FreePascal has a WebAssembly back end. Pascal’s a fun language for learning compilers on, but not really fully featured for building modern systems. I wonder how this new school of compiler and IDE developers can take their skills to a wider audience.