Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 2 August 2019
Cognitive Biases, Conflict, Language Models, and Programmable Memristor Computer
- The Evolutionary Roots of Human Decision Making (NCBI) — paper showing that we share cognitive biases with other primates. In one study, monkeys had a choice between one experimenter (the gains experimenter) who started by showing the monkey one piece of apple and sometimes added an extra piece of apple, and a second experimenter (the losses experimenter) who started by showing the monkey two pieces of apple and sometimes removed one. Monkeys showed an overwhelming preference for the gains experimenter over the losses experimenter—even though they received the same payoff from both. In this way, capuchins appear to avoid options that are framed as a loss, just as humans do.
- 6 Must Reads for Cutting Through Conflict and Tough Conversations (First Round Capital) — a summary of good (?) advice from books. Some I agree with, but others … having worked for narcissists and bean counters, find a new job. Don’t stay any longer than you have to with those jerks.
- ERNIE — Baidu’s open source continual pre-training framework for language understanding. Baidu says: Integrating both phrase information and named entity information enables the model to obtain better language representation compared to BERT. ERNIE is trained on multi-source data and knowledge collected from encyclopedia articles, news, and forum dialogues, which improves its performance in context-based knowledge reasoning. See also the ERNIE paper.
- First Programmable Memristor Computer (IEEE) — The new chip combines an array of 5,832 memristors with an OpenRISC processor. 486 specially-designed digital-to-analog converters, 162 analog-to-digital converters, and two mixed-signal interfaces act as translators between the memristors’ analog computations and the main processor.
Four short links: 1 August 2019
Software-Defined Analog Circuits, Public Domain, Talk Radio Corpus, and Bad Science
- Software-Defined Analog Circuits — Zrna hardware realizes the analog circuit you specify in software, in real time. Change any circuit parameter on the fly with an API request, at your lab bench or embedded in-application. This is … weird. But cool. Cool and weird.
- Most Pre-1964 US Books are in the Public Domain — and finally, thanks to the work of librarians and archivists, for anything that’s unambiguously a “book”, we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal. (via Evil Mad Scientist)
- RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts — arxiv paper and github.
- A Rough Guide to Spotting Bad Science — some very useful heuristics. Via this considered evaluation of wild claims.
Four short links: 31 July 2019
Provably Correct AI, Porn & Privacy, Math for CS and ML, and Xenophobia Classifier
- ART: Abstraction Refinement-Guided Training for Provably Correct Neural Networks — provably correct neural networks, now there’s an interesting idea …
- Tracking Sex: The Implications of Widespread Sexual Data Leakage and Tracking on Porn Websites — Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample’s domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user.
- Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning — a 1962-page LaTeX book which some wag listed as Math Basics for CS and ML on Hacker News.
- Open-Source Xenophobia Classifier for Tweets — source is a Colab notebook and they make their labeled training data available too.
Four short links: 30 July 2019
Game Translation, Modern Hypercard, Cryptographic Attacks, and Digital Hardware Debugger
- The Near Impossible 20-Year Journey to Translate “Fire Emblem: Thracia 776” (Vice) — an incredible story of translation philosophy, playing out in the context of fan attempts to make an English-language version of a 1999 tactical RPG.
- LiveCode — open-source (GPL) HyperCard-esque app developer, for the modern age. Very nice!
- Cryptographic Attacks: A Guide for the Perplexed (Checkpoint) — various types of cryptographic attacks, with a focus on the attacks’ underlying principles.
- Glasgow — FPGA-based tool for exploring digital interfaces, aimed at embedded developers, reverse engineers, digital archivists, electronics hobbyists, and everyone else who wants to communicate to a wide selection of digital devices with high reliability and minimum hassle. It can be attached to most devices without additional active or passive components, and includes extensive protection from unexpected conditions and operator error.
Four short links: 29 July 2019
Email, End-to-End Encryption, AI Ethics, Reliable Distributed Systems
- Notqmail — Collaborative open source successor to qmail.
- The Encryption Debate is Over—Dead at the Hands of Facebook (Forbes) — Facebook’s model entirely bypasses the encryption debate by globalizing the current practice of compromising devices by building those encryption bypasses directly into the communications clients themselves and deploying what amounts to machine-based wiretaps to billions of users at once.
- Why Ethics Cannot be Replaced by the UDHR — Ethics and the UDHR are on the same page, if we keep it general. But questions about what is the right thing to do or what policy is the right one to implement become challenging only when these dearly held values conflict, necessarily involving trade-offs. When we dive deep, the UDHR is simply unable to guide us on those questions. Solving such challenges is the job of ethical reasoning.
- Operating a Large, Distributed System in a Reliable Way: Practices I Learned (Gergely Orosz) — This post is the collection of the practices I’ve found useful to reliably operate a large system at Uber, while working here. Generalizable beyond Uber.
Four short links: 26 July 2019
Disinformation, Election Meddling, Quantum Supremacy, and International Pineapple Day
- Disinformation’s Spread: Bots, Trolls, and All of Us (Kate Starbird) — a short and on-the-mark summary of misconceptions about disinformation.
- The Unsexy Threat to Our Election Security (Krebs) — surprisingly low-tech threats (SIM stealing, hijacking a Twitter account) that could bugger up elections.
- Quantum Supremacy is Coming (Quanta) — “supremacy” is marketing hype. Quantum computers will still be useless for a while to come. “Supremacy” refers to conquering errors and noise enough to make a system that can use quantum phenomena to do in parallel what classical computers must do in serial—even if it’s only on a toy problem.
- How I Started Pineapple Day (Andrew Lee) — “That’s not a real thing,” James retorted with an eyeroll as he set his bag down and sat down at his desk. “Sure it is” I insisted, and to back my claim up I pulled up Google Calendar and added “International Bring Your Pineapple to Work Day” to our shared company calendar. I set the event to repeat every year on June 27th. Have a great weekend!
Four short links: 25 July 2019
Mutable Web, Re-Identification, Rule-Based Programming, and Risks of Government Hacking
- The Mutable Web — rewriting Twitter’s web styling is hard but not impossible, and makes the author mull on the value of the mutable web. Transparency and introspection are fundamental to the way the web works and obfuscation, intentional or not, can’t really change that.
- Estimating the Success of Re-Identifications in Incomplete Datasets Using Generative Models (Nature) — Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Reminds me of the finding (claim?) that it only takes 8 (12? citation needed) words to uniquely identify a text.
- Picat — a simple, and yet powerful, logic-based multi-paradigm programming language aimed for general-purpose applications. Picat is a rule-based language, in which predicates, functions, and actors are defined with pattern-matching rules. Interesting take on a language, which made more sense after I read this Hacker News comment.
- Security Risks of Government Hacking (Stanford Cyberlaw) — This paper addresses six main ways that government hacking can raise broader computer security risks. These include: Creating a disincentive to disclose vulnerabilities that should be disclosed because other attackers might independently discover them; Cultivating a market for surveillance tools and 0-days; Risking that vulnerabilities exploited by the malware will be identified and used by other attackers, as a result of either law enforcement’s losing control of the hacking tools, or discovery by outsiders of law enforcement’s hacking activity; Creating an incentive to push for less-secure software and standards; and Risking that the malware will affect innocent users.
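The intuition behind the re-identification item above can be sketched in a few lines of Python: the more demographic attributes you combine, the larger the share of records that becomes unique. This is only a counting illustration, not the generative model from the Nature paper; the toy records and the `unique_fraction` helper are hypothetical and made up for this example.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, birth_year, gender). Not real data.
records = [
    ("94110", 1980, "F"),
    ("94110", 1980, "M"),
    ("94110", 1975, "F"),
    ("10001", 1980, "F"),
    ("10001", 1980, "F"),
    ("10001", 1992, "M"),
]


def unique_fraction(rows, columns):
    """Return the share of rows whose values in `columns` occur exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Combine progressively more attributes and watch uniqueness rise.
zip_only = unique_fraction(records, [0])
zip_year = unique_fraction(records, [0, 1])
all_three = unique_fraction(records, [0, 1, 2])

print(f"zip only: {zip_only:.2f}")
print(f"zip + birth year: {zip_year:.2f}")
print(f"zip + birth year + gender: {all_three:.2f}")
```

On this toy data no record is unique by zip code alone, while most are unique once all three attributes are combined. The paper's 15-attribute result is the same effect at population scale, estimated with a statistical model rather than by direct counting.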
Four short links: 24 July 2019
Computer Life, Quantum Hype, Python Antipatterns, and Algorithm Series
- Nils Barricelli (Nautilus) — history of an artificial (computer) life pioneer.
- Quantum Hype (ComputerWorld) — the quantum computing memes are ace, but so is the general discussion of why quantum computing is felt by insiders to be overhyped.
- Python Antipatterns — readable collection of things Not To Do.
- All Hail the Algorithm — Al-Jazeera five-part series exploring the impact of algorithms on our everyday lives.
Four short links: 23 July 2019
Deciphering Linear B, Data Journalism, Innovation Contradictions, Rebuilding Slack
- Applying Deep Learning to Linear B — To compensate for the lack of strong supervision signal, our model design is informed by patterns in language change documented in historical linguistics. […] When applied to the decipherment of Ugaritic, we achieve a 5.5% absolute improvement over state-of-the-art results. We also report the first automatic results in deciphering Linear B, a syllabic language related to ancient Greek, where our model correctly translates 67.3% of cognates.
- Data Science Behind Data Journalism (Chris Knox) — discusses the data analysis that went into a story on vaccination in NZ. A good example of how to use data to do journalism (and not just torture it to say what you want).
- The Hard Truth about Innovative Cultures (HBR) — A tolerance for failure requires an intolerance for incompetence. A willingness to experiment requires rigorous discipline. Psychological safety requires comfort with brutal candor. Collaboration must be balanced with individual accountability. And flatness requires strong leadership. Innovative cultures are paradoxical. Unless the tensions created by this paradox are carefully managed, attempts to create an innovative culture will fail. (via Tim Kong)
- When A Rewrite Isn’t: Rebuilding Desktop Slack — Our plan was to: keep the existing codebase; create a “modern” section of the codebase that would be future-proof and work the way we wanted it to; modernize the implementation of Slack bit by bit, replacing existing code with modern code incrementally; define rules that would enforce a strict interface between existing and modern code so it would be easy to understand their relationship; and continually ship all of the above with the existing app, replacing older modules with modern implementations that suited our new architecture. The final step — and the most important one for our purposes — was to create a modern-only version of Slack that would start out incomplete but gradually work its way toward feature completeness as modules and interfaces were modernized.
Four short links: 22 July 2019
Game Source, Procurement Graph, Data Moats, and Antitrust Regulation
- Game Source Code — Internet Archive has a collection of video game source code. The majority of these titles were originally released as commercial products and the source code was made available to the public at a later time.
- European Public Procurement Knowledge Graph — over 23 million triples (records), covering information about almost 220,000 tenders, built to support competitiveness and accountability by TheyBuyForYou. (via University of Southampton)
- The Empty Promise of Data Moats (Andreessen-Horowitz) — business model wonks reckoned that “data network effects” were a thing, but the benefits seen by companies claiming data network effects seem to be the benefits of simply having a lot of data. And that’s not as defensible as hoped. I liked this essay.
- Why Big Tech Keeps Outsmarting Antitrust Regulators (Tim O’Reilly) — designers of marketplace-platform algorithms and screen layouts can arbitrarily allocate value to whom they choose. The marketplace is designed and controlled by its owners, and that design shapes “who gets what and why”. […] Power over sellers ultimately translates into power over customers as well. When it comes to antitrust, the question of market power must be answered by analyzing the effect of these marketplace designs on both buyers and sellers, and how they change over time. How much of the value goes to the platform, how much to consumers, and how much to suppliers?
Four short links: 19 July 2019
Journal Mining, API Use, Better Conversation, and Apollo 11 Source
- 73 Million Journal Articles for Text Mining (BoingBoing) — The JNU Data Depot is a joint project between rogue archivist Carl Malamud, bioinformatician Andrew Lynn, and a research team from New Delhi’s Jawaharlal Nehru University: together, they have assembled 73 million journal articles from 1847 to the present day and put them into an airgapped repository that they’re offering to noncommercial third parties who want to perform textual analysis on them to “pull out insights without actually reading the text.”
- How Developers Use API Documentation: An Observation Study (ACM) — participants totally mapped to opportunistic (risk-taking, paste-then-adapt, change-without-checking) developers and systematic (start with clean code, read the docs, learn before coding) developers.
- talk — An open-source commenting platform focused on better conversation.
- Apollo 11 — Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.
Four short links: 18 July 2019
Weird Algorithms, Open Syllabi, Conversational AI, and Quantum Computing
- 30 Weird Chess Algorithms (YouTube) — An intricate and lengthy account of several different computer chess topics from my SIGBOVIK 2019 papers. We conduct a tournament of fools with a pile of different weird chess algorithms, ostensibly to quantify how well my other weird program to play color- and piece-blind chess performs. On the way we “learn” about mirrors, arithmetic encoding, perversions of game tree search, spicy oils, and hats.
- Open Syllabus Project — as FastCompany explains, the 6M+ syllabi from courses around the world tell us about changing trends in subjects. Not sure how I feel that four of the textbooks I learned on are still in the top 20 (Cormen, Tanenbaum, Silberschatz, Stallings).
- Plato — Uber open-sourced its flexible platform for developing conversational AI agents. See also their blog post.
- Speediest Quantum Operation Yet (ScienceDaily) — In Professor Michelle Simmons’ approach, quantum bits (or qubits) are made from electrons hosted on phosphorus atoms in silicon.[…] “Atom qubits hold the world record for the longest coherence times of a qubit in silicon with the highest fidelities,” she says. “Using our unique fabrication technologies, we have already demonstrated the ability to read and initialise single electron spins on atom qubits in silicon with very high accuracy. We’ve also demonstrated that our atomic-scale circuitry has the lowest electrical noise of any system yet devised to connect to a semiconductor qubit.” […] A two-qubit gate is the central building block of any quantum computer — and the UNSW team’s version of it is the fastest that’s ever been demonstrated in silicon, completing an operation in 0.8 nanoseconds, which is ~200 times faster than other existing spin-based two-qubit gates.
Four short links: 17 July 2019
Margaret Hamilton, WeChat Censorship, Refactoring, and Ancient Games
- Margaret Hamilton Interview (The Guardian) — I found a job to support our family at the nearby Massachusetts Institute of Technology (MIT). It was in the laboratory of Prof Edward Lorenz, the father of chaos theory, working on a system to predict weather. He was asking for math majors. To take care of our daughter, we hired a babysitter. Here I learned what a computer was and how to write software. Computer science and software engineering were not yet disciplines; instead, programmers learned on the job. Lorenz’s love for software experimentation was contagious, and I caught the bug.
- How WeChat Censors Images in Private Chats (BoingBoing) — WeChat maintains a massive index of the MD5 hashes of every image that Chinese censors have prohibited. When a user sends another user an image that matches one of these hashes, it’s recognized and blocked at the server before it is transmitted to the recipient, with neither the recipient nor the sender being informed that the censorship has taken place. Separately, all images not recognized in the hash database are processed out-of-band.
- The Best Refactoring You’ve Never Heard Of (James Koppel) — lambdas vs data structures. Very interesting talk.
- Machine Learning is About to Revolutionize the Study of Ancient Games (MIT TR) — The team model games as mathematical entities that lend themselves to computational study. This is based on the idea that games are composed of units of information called ludemes, such as a throw of the dice or the distinctively skewed shape of a knight’s move in chess. Ludemes are equivalent to genes in living things or memes as elements of cultural inheritance. They can be transmitted from one game to another, or they may die, never to be seen again. But a key is that they can be combined into bigger edifices that form games themselves.
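The hash-matching step described in the WeChat item above can be sketched in Python. This illustrates exact-match lookup against a set of MD5 digests and is not WeChat's actual code; `BLOCKED_HASHES`, `is_blocked`, and the sample bytes are hypothetical names and data invented for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


# Hypothetical blocklist holding the digest of one "prohibited" image.
banned_image = b"example banned image bytes"
BLOCKED_HASHES = {md5_hex(banned_image)}


def is_blocked(image_bytes: bytes) -> bool:
    """Check whether an image's digest appears in the blocklist."""
    return md5_hex(image_bytes) in BLOCKED_HASHES


# A byte-identical copy matches; a copy with one extra byte does not.
exact_copy_blocked = is_blocked(banned_image)
altered_copy_blocked = is_blocked(banned_image + b"\x00")

print(exact_copy_blocked, altered_copy_blocked)
```

Because changing a single byte changes the digest, an exact-hash index only catches identical files. That limitation may be why the article describes a separate out-of-band path for images that are not in the hash database.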
Four short links: 16 July 2019
Quantum TiqTaqToe, Social Media & Depression, Incidents, and Unity ML
- Introducing a new game: Quantum TiqTaqToe — This experience was essential to the birth of Quantum TiqTaqToe. In my quest to understand Unity and Quantum Games, I set out to implement a “simple” game to get a handle on how all the different game components worked together. Having a game based on quantum mechanics is one thing; making sure it is fun to play requires an entirely different skill set.
- Association of Screen Time and Depression in Adolescence (JAMA) — Time-varying associations between social media, television, and depression were found, which appeared to be more explained by upward social comparison and reinforcing spirals hypotheses than by the displacement hypothesis. (via Slashdot)
- CAST Handbook — How to Learn More from Incidents and Accidents.
- ml-agents — Unity Machine Learning Agents Toolkit, open source.
Four short links: 15 July 2019
Climbing Robot, Programming and Programming Languages, Media Player, and Burnout Shops
- NASA Climbing Robot — a four-limbed robot named LEMUR (Limbed Excursion Mechanical Utility Robot) can scale rock walls, gripping with hundreds of tiny fishhooks in each of its 16 fingers and using artificial intelligence to find its way around obstacles.
- Programming and Programming Languages — a new edition of a book that introduces programming and programming languages at the same time.
- IINA — The modern media player for macOS. Open source, and very good.
- Job Burnout in Professional and Economic Contexts (PDF) — In recent times, we are seeing the development of new ‘burnout shops’ that are not short-term projects, but are long-term models for doing business. A new word in my lexicon, on a subject of interest to me.
Four short links: 12 July 2019
Hosting Hate, Releasing, Government Innovation, and Voice Cloning
- The Dirty Business of Hosting Hate Online (Gizmodo) — an interesting rundown of who is hosting some of the noxious sites on the web.
- Releasing Fast and Slow — Our research shows that: rapid releases are more commonly delayed than their non-rapid counterparts; however, rapid releases have shorter delays; rapid releases can be beneficial in terms of reviewing and user-perceived quality; rapidly released software tends to have a higher code churn, a higher test coverage, and a lower average complexity; challenges in rapid releases are related to managing dependencies and certain code aspects—e.g., design debt.
- Embracing Innovation in Government (OECD) — a global review that explores how governments are innovating and taking steps to make innovation a routine and integrated practice across the globe.
- Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning — We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high-quality speech in multiple languages. Moreover, the model is able to transfer voices across languages—e.g., synthesize fluent Spanish speech using an English speaker’s voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages—e.g. English and Mandarin.
Four short links: 11 July 2019
Museum Copyright, Twitter Apprenticeship, AI Regulation, and Computational Biology
- The Great Wave: What Hokusai’s Masterpiece Tells us About Museums, Copyright, and Online Collections Today — If we consider the customer journey of acquiring a digital image of ‘The Great Wave’ from our fourteen museums, a definite trend emerges — the more open the policy of a museum is, the easier it is to obtain its pictures. Like the other open access institutions in our sample group, The Art Institute of Chicago’s collections website makes the process incredibly simple: clicking once on the download icon triggers the download of a high-resolution image. In contrast, undertaking the same process on the British Museum’s website entails mandatory user registration and the submission of personal data.
- Introducing the Twitter Engineering Apprenticeship Program — Through our new apprenticeship program, participants will go through a one-year rotation program with full-time employment benefits. Upon completion of the program, they will graduate and join one of our engineering teams. The program is aimed at people from under-represented and non-traditional backgrounds, but I’d love to see apprenticeship models more widely in software orgs.
- Regulation of AI (LOC) — This report examines the emerging regulatory and policy landscape surrounding artificial intelligence (AI) in jurisdictions around the world and in the European Union.
- A Primer for Computational Biology — an open textbook from Oregon State.
Four short links: 10 July 2019
Optimisations and Security, 512 Byte Pacman, Cell Security, and Meme AI
- Security Implications Of Compiler Optimizations On Cryptography — A Review — This paper is a literature review of (1) the security complications caused by compiler optimizations, (2) approaches used by developers to mitigate optimization problems, and (3) recent academic efforts towards enabling security engineers to communicate implicit security requirements to the compiler. In addition, we present a short study of six cryptographic libraries and how they approach the issue of ensuring security requirements. With this paper, we highlight the need for software developers and compiler designers to work together in order to design efficient systems for writing secure software.
- Pillman — Pac-Man in 512 bytes, small enough to fit on a boot sector. Impressive feat, and nicely documented.
- Gotta Catch ‘Em All: Understanding How IMSI-Catchers Exploit Cell Networks (EFF) — with this post we hope to make accessible the technical inner workings of CSSs [Cell Site Simulators, the IMSI catchers used by law enforcement and others], or rather, the details of the kind of attacks they might rely on. For example, what are the different kinds of location tracking attacks and how do they actually work? Another example: it’s also widely believed that CSSs are capable of communication interception, but what are the known limits around cell network communication interception and how does that actually work? (via BoingBoing)
- Memelearning — In this post we’ll share how we used TensorFlow’s object detection API to build a custom image annotation service for eyeson. Below you can see an example where Philipp is making the “thinking” pose during a meeting which automatically triggers a GIF reaction. I don’t think automatically triggering is awesome, but certainly having them queued up for you to use would be good.