Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 13 May 2020
Marcus Hutchins, Social Software, Lou Montulli, Hyped Technology
- The Confessions of Marcus Hutchins, the Hacker Who Saved the Internet — Story of the MalwareTech security researcher who foiled WannaCry, only to be arrested by the FBI for having sold malware as a kid. Young Marcus had terrible opsec.
- The Next Social Era is Here — Arguing we’re ready for another boom in social software. First, the pandemic is creating a new topology of psychological and emotional needs. […] Second, the work environment is now open game for new social products. Two reasons for this. First, we see how good communication can be with consumer products and demand the same excellence in our work lives. But second, and newer, is that in the last few months, the distance between our work identities and our home identities have blurred.
- Cookies, Chaos and the Browser: Meet Lou Montulli — An interview with a Web oldbie, the guy who worked on https, cookies, forms, animated GIFs, but who will always have a treasured spot in my heart for the Curses-based text-mode browser Lynx.
- Why we at $FAMOUS_COMPANY Switched to $HYPED_TECHNOLOGY — Hilarious parody of a tech announcement.
Four short links: 12 May 2020
Simulation Framework, Errors, Social Software, Politics of Information
- flecs — a Fast and Lightweight ECS (Entity Component System). An ECS […] is a way to organize code that is mostly used in gaming and simulation projects. ECS code generally performs better than traditional OOP, and is typically easier to reuse. The main differences between ECS and OOP are composition is a first class citizen in ECS, and that data is represented as plain data types rather than encapsulated classes.
- Two Ways to Categorize Errors — two dimensions that are useful for categorizing errors: Exceptional Errors vs. Failures; Internal vs. External Errors. Often the first step to solving a problem is finding the right lens to look at it through.
- Chatting with Glue — An interestingly-presented set of ideas about how we might offer more structural affordances in chat software to assist comprehension. I’m not doing it justice: it’s provocative. How to help people think better with software is a conversation I’m always up for, so this has really hit my buttons.
- The Best Books on the Politics of Information — If we are to understand how politics and markets work at the moment, we need to pay attention to how algorithms work, and how the economy is being remade from the ground up by these new forms of information processing. […] My starting point was ‘Okay, if we started thinking about the core of a curriculum for a course on this topic, what could we include?’ These would be the core books you would want as part of the discussion.
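For readers new to the idea, the flecs entry above describes ECS as composition-first, with data kept in plain types. Here is a minimal sketch of that shape in Python. It is not flecs's API (flecs is a C library); `World`, `spawn`, `query`, and `move_system` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    """Hypothetical minimal ECS store: entity id -> {component type: instance}."""

    def __init__(self):
        self._next_id = 0
        self._components = {}

    def spawn(self, *components):
        # An entity is just an id; its behaviour comes from the components attached.
        entity = self._next_id
        self._next_id += 1
        self._components[entity] = {type(c): c for c in components}
        return entity

    def query(self, *types):
        """Yield (entity, components...) for entities holding every requested type."""
        for entity, comps in self._components.items():
            if all(t in comps for t in types):
                yield (entity, *(comps[t] for t in types))


def move_system(world, dt):
    """A system is a plain function over every entity with Position and Velocity."""
    for _, pos, vel in world.query(Position, Velocity):
        pos.x += vel.dx * dt
        pos.y += vel.dy * dt


world = World()
moving = world.spawn(Position(0.0, 0.0), Velocity(1.0, 2.0))
static = world.spawn(Position(5.0, 5.0))  # no Velocity, so move_system skips it
move_system(world, dt=0.5)
```

Entities gain behaviour by having components attached rather than by inheriting from a class, which is the composition the entry refers to. A real ECS would also store components contiguously for speed, which this sketch does not attempt.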
Four short links: 11 May 2020
Go IRC Server, DeepFake Cartoon Voices, System Programming Book, TDD Data
- Oragono — a modern IRC server written in Go.
- DeepFake Cartoon Voices — Fifteen.ai is a text-to-speech tool that you can use to generate 44.1 kHz voices of various characters. The voices are generated in real time using multiple audio synthesis algorithms and customized deep neural networks trained on very little available data (between 55 seconds and 120 minutes of clean dialogue for each character). This project demonstrates a significant reduction in the amount of audio required to realistically clone voices while retaining their affective prosodies.
- System Programming Book — CS241 “Intro to Systems Programming” textbook that was created in a wiki by University of Illinois students over 5 years.
- Realizing Quality Improvement Through Test Driven Development: Results and Experiences of Four Industrial Teams — The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.
Four short links: 8 May 2020
Machine Learning Math, Nerd Humour, Open Data Analytics, Radar Trends
- Mathematics for Machine Learning — We wrote a book on Mathematics for Machine Learning that motivates people to learn mathematical concepts. The book is not intended to cover advanced machine learning techniques because there are already plenty of books doing this. Instead, we aim to provide the necessary mathematical skills to read those other books.
- Cards Against Containers — nerd cards à la Cards Against Humanity. (But without the swears.)
- OpenSAFELY — a new secure analytics platform for electronic health records in the NHS, created to deliver urgent results during the global COVID-19 emergency. It is now successfully delivering analyses across more than 24 million patients’ full pseudonymised primary care NHS records, with more to follow shortly. All our analytic software is open for security review, scientific review, and re-use. An amazing collaborative piece of work that you can read about in Ben Goldacre’s thread.
- Radar Trends to Watch in May 2020 — Mike Loukides’s roundup of weak signs of the future.
Four short links: 7 May 2020
Reverse Engineering, IBM System/370 on a Pi, Sparklines, and Content Moderation
- Super Bootable 64 — Super Mario 64 shipped before the SDK was finalised, and it had to be compiled with optimisations turned off. This meant the binary was easily reversed to source code, and now the unportable has been ported. This site probably won’t last long, because DMCA, but it’s technically a sweet feat. (via lobsters)
- IBM System/370 on a Raspberry Pi — I have been running a full IBM System/370 Mainframe on a $5 Raspberry Pi Zero for ~5 years. About 7 times faster System/370. Millions of lines of COBOL JCLs running flawless on a battery. Tested an entire bank’s mainframe COBOL on it.
- sparks — A typeface for creating sparklines in text without code.
- Announcing the First Members of the Oversight Board — The Board will review whether content is consistent with Facebook and Instagram’s policies and values, as well as a commitment to upholding freedom of expression within the framework of international norms of human rights. We will make decisions based on these principles, and the impact on users and society, without regard to Facebook’s economic, political or reputational interests. Facebook must implement our decisions, unless implementation could violate the law. Impressive credentials. I’d love to be a fly on the wall for their conversations, because this problem is Hard.
Four short links: 6 May 2020
Open Source Spectrometer, Software Frames, Deleting Data, and Deep Learning for Scientific Discovery
- Raman Spectroscopy — Low Cost, High Performances, 100% Open Source Raman Spectrometer. […] We currently offer the spectrometer in a Starter Edition version designed for teaching Raman spectroscopy and we will soon release a Performance Edition version which achieves a tested 12 cm⁻¹ resolution at low costs. Great to see this getting into the hands of hackers.
- Frames in Software Development — not the Lisp AI frames, but the semantic frames. I always wondered why it isn’t called “product debt” because product took the credit to get a feature faster and must pay back by investing the time to clean up. Technology is the bank that gave credit.
- Phoenix Framework — a web development framework written in Elixir which implements the server-side Model View Controller (MVC) pattern. I’m reminded of ceej’s “Write your own frameworks. You learn a lot. Your framework might solve a problem your ecosystem needs to have solved. By your tenth one, you know enough to write one worth wide adoption. Progress in our industry depends on all of us pushing it forward.”
- Deleting Data Distributed Throughout Your Microservices Architecture — One solution is to think of data deletion not as an event, but as a process. At Twitter, we call this process “erasure” and coordinate data deletion between systems using an erasure pipeline. In this post, we’ll discuss how to set up an erasure pipeline, including data discoverability, access, and processing. We’ll also touch on common problems and how to ensure ongoing maintenance of an erasure pipeline.
- A Survey of Deep Learning for Scientific Discovery — The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models — two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community.
Four short links: 5 May 2020
Leaving Amazon, Observability, Mining Tables, Distributed Ledgers
- Leaving Amazon (Tim Bray) — May 1st was my last day as a VP and Distinguished Engineer at Amazon Web Services, after five years and five months of rewarding fun. I quit in dismay at Amazon firing whistleblowers who were making noise about warehouse employees frightened of Covid-19.
- Observability is a Many-Splendoured Thing (Charity Majors) — if you can’t predict all the questions you’ll need to ask in advance, or if you don’t know what you’re looking for, then you’re in o11y territory.
- Using Neural Networks to Find Answers (Google) — deep learning to figure out how to turn natural language questions into queries over tables of data.
- Redesigning Trust: Blockchain Deployment Toolkit — World Economic Forum report on distributed ledger deployments, with advice. This toolkit provides tools, resources, and know-how to organizations undertaking blockchain projects. It was developed through lessons from and analysis of real projects, to help organizations embed best practices and avoid possible obstacles in deployment of distributed ledger technology.
Four short links: 4 May 2020
OS for Heterogeneous Hardware, Code Generation, Only People Get Patents, and Reasoning to Find Fake News
- Popcorn Linux — exploring how to improve the programmability of emerging heterogeneous hardware, in particular, those with Instruction Set Architecture (ISA)-diverse cores, from node-scale (e.g., Xeon/Xeon-Phi, ARM/x86, CPU/GPU/FPGAs) to rack-scale (e.g., Scale-out processors, Firebox, The Machine), in both native and virtualized settings. Additionally, the project is exploring how to automatically compile/synthesize/execute code on ISA-heterogeneous hardware.
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation — In the second and third example, we can see that the baseline uses the wrong API calls, and sometimes “makes up” APIs on its own (e.g. “random.savefig()”). However, our approach’s outputs, while not perfect, are much more successful at generating correct API calls that actually exist and make sense for the intent. The algorithm developers have made the system guess likely API calls as programmers do.
- US Patent Office Rules that Artificial Intelligence Cannot be a Legal Inventor (Verge) — “Under current law, only natural persons may be named as an inventor in a patent application,” the agency concluded. The ruling text has the arguments.
- Detecting Fake News for the New Coronavirus by Reasoning on the Covid-19 Ontology — interesting to see symbolic AI (reasoning over propositions) being useful here. In the context of the Covid-19 pandemic, many were quick to spread deceptive information. I investigate here how reasoning in Description Logics (DLs) can detect inconsistencies between trusted medical sources and not trusted ones. The not-trusted information comes in natural language (e.g. “Covid-19 affects only the elderly”). To automatically convert into DLs, I used the FRED converter. Reasoning in Description Logics is then performed with the Racer tool.
Four short links: 1 May 2020
ESP8266 Firmware, Deep Learning Anime, Cyber Wargaming, and Deep Learning Music
- Tasmota — Alternative firmware for ESP8266 with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX.
- Selfie 2 Waifu — deep learning constructs an anime character from your photo. Paper for the underlying technique. (via @tkasasagi)
- The Handbook of Cyber Wargames: Wargaming the 21st Century — Cyber wargaming combines two complex fields: wargame design and cyber operations. This handbook is full of examples of such manual games. It includes examples of: Network attack and defence exercises; Committee games; Company and state level games; Example of a Matrix Game; Analysing the cyber security space using Confrontation Analysis; Media Wars: The Battle to Dominate the Information Space; Attack Chain modelling. (via Nick Drage)
- OpenAI Jukebox — deep learning makes actual music in recognisable styles. There’s a clever encoding of audio to make it learnable. It takes approximately 9 hours to fully render one minute of audio through our models. Yow.
Four short links: 30 April 2020
Microservices Problems, GNU Binary Editor, Open-Domain Chatbot, and CopyLeft Conf 2020 Videos
- To Microservices and Back Again: Why Segment Went Back to a Monolith — microservices came with increased operational overhead and problems around code reuse. … If microservices are implemented incorrectly or used as a band-aid without addressing some of the root flaws in your system, you’ll be unable to do new product development because you’re drowning in the complexity.
- GNU poke — interactive editor for binary data. Not limited to editing basic entities such as bits and bytes, it provides a full-fledged procedural, interactive programming language designed to describe data structures and to operate on them. (via Kernel Recipes)
- Blender — Facebook open sourced their open-domain (“can talk about anything!”) chatbot. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements.
- CopyLeft Conf 2020 Videos — the schedule has more info on each talk.
Four short links: 29 April 2020
Game Theory, Disinfo, Ransomware, and Debugging Tales
- podpaperscissors — From the classic “prisoner’s dilemma” to more obscure coördination games, Pod Paper Scissors takes game theory out of the dry textbook and into the real world. … Each episode will feature different kinds of games and situations. Experts in a variety of fields will casually converse with the hosts about how the particular game discussed applies to their work. Some episodes feature original music inspired by the topic at hand. The podcast is hosted by game theorist Ben Klemens and science journalist and composer Liz Landau. (via Ben Klemens)
- Verification Handbook (3ed) — latest guide to investigating disinformation and media manipulation, covering identifying actors, investigating platforms, tracking ads, etc. (via Craig Silverman)
- Ransomware Groups (Microsoft) — analysis of ransomware campaigns yields this report, which includes a great graphic taxonomy of ransomware payloads.
- Bug Stories — great tales of bugs and bug-hunting from the past.
Four short links: 28 April 2020
Learning a Language, Vulnerability Assessment, Distributed Consensus, and AI Playing Football
- Learning a Language — this list of questions facing anyone taking a new language for a test run just burns with truth. (Also: encouraging to see how many of these questions are answered by the Cookbook format)
- OpenVAS — Open Vulnerability Assessment Scanner, aka “what a cheap external security assessment vendor will run and then mail you the report from.”
- Paxos vs Raft: Have we Reached Consensus on Distributed Consensus? — We find that both Paxos and Raft take a very similar approach to distributed consensus, differing only in their approach to leader election. Most notably, Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to be leader provided it then updates its log to ensure it is up-to-date. Raft’s approach is surprisingly efficient given its simplicity as, unlike Paxos, it does not require log entries to be exchanged during leader election. We surmise that much of the understandability of Raft comes from the paper’s clear presentation rather than being fundamental to the underlying algorithm being presented.
- Google Research Football — a novel Reinforcement Learning environment where agents aim to master the world’s most popular sport—football. Modeled after popular football video games, it provides a physics based 3D football simulation where agents control either one or all football players on their team, learn how to pass between them, and manage to overcome their opponent’s defense in order to score goals.
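The Paxos vs Raft entry above turns on Raft only letting servers with up-to-date logs become leaders. As a rough sketch of that check (the function name and argument layout are assumptions for illustration, not taken from the linked paper), a voter compares the candidate's last log entry against its own, first by term and then by index:

```python
def candidate_log_is_up_to_date(candidate_last_term, candidate_last_index,
                                voter_last_term, voter_last_index):
    """Return True if the candidate's log is at least as up-to-date as the voter's.

    Logs are compared by the term of their last entry first; if those terms
    match, the log with the higher last index (the longer log) wins.
    """
    if candidate_last_term != voter_last_term:
        return candidate_last_term > voter_last_term
    return candidate_last_index >= voter_last_index


# A candidate with a newer last term qualifies even though its log is shorter.
newer_term = candidate_log_is_up_to_date(3, 5, 2, 9)

# Same last term but a shorter log: the voter would withhold its vote.
shorter_log = candidate_log_is_up_to_date(3, 4, 3, 5)
```

A full vote decision also checks the candidate's current term and whether the voter has already voted in that term. Those checks are left out here to keep the focus on the log comparison.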
Four short links: 27 April 2020
Distributed Computation, Consistency Maps, WebAssembly on FPGA, and Reliable Information
- Teleforking a Process onto a Different Computer — a working proof of concept (I just don’t replicate tricky things so that I could keep it simple, meaning it’s just a fun tech demo you probably shouldn’t use for anything real) of a telefork() function call that spawns a process on another machine and returns the instance ID.
- Consistency Maps — Jepsen analyses the safety properties of distributed systems–most notably, identifying violations of consistency models. But what are consistency models? What phenomena do they allow? What kind of consistency does a given program really need? In this reference guide, we provide basic definitions, intuitive explanations, and theoretical underpinnings of various consistency models for engineers and academics alike.
- wasmachine — wasmachine is an implementation of the WebAssembly specification in a FPGA. It follows a sequential 6-steps design.
- Expert Twitter Only Goes So Far: Bring Back Blogs (Wired) — we’re surrounded by opinion machines (because opinion is cheap to produce and make inflammatory, it’s a natural fit for engagement-driven businesses), so it’s nice to find knowledgeable people sharing their expertise. I see The Syllabus and newsletter systems like substack as part of the response to this dearth of high-alpha content. More please!
Four short links: 24 April 2020
Remote Playbook, GPU Graphics in Python, NLP Training Costs, and Net Neutrality
- The Suddenly Remote Playbook — I just want to note that if you have to look after kids when you’re supposed to be working, you’re not working from home. Not everyone’s getting a glorious introduction to the delights of working from home.
- taichi — a programming language designed for high-performance computer graphics. It is deeply embedded in Python, and its just-in-time compiler offloads compute-intensive tasks to multi-core CPUs and massively parallel GPUs.
- The Cost of Training NLP Models — We review the cost of training large-scale language models, and the drivers of these costs. The intended audience includes engineers and scientists budgeting their model-training experiments, as well as non-practitioners trying to make sense of the economics of modern-day Natural Language Processing (NLP).
- Killing Net Neutrality Did Not Save the Pandemic Internet — there’s no evidence that European networks have fallen apart during the COVID-19 crisis. Or that any differences in performance have anything to do with deregulation or net neutrality. Netflix’s decision to throttle back its bandwidth usage by 25% was done entirely pro-actively. There was no underlying network data provided by regulators to justify the move. It was just EU regulators being cautious (perhaps overly so). Indeed, similar steps have been taken here in the States. YouTube for example has downgraded video quality to conserve bandwidth. So has game platform Steam, which is slowing some game downloads. You can’t selectively highlight the EU’s efforts on this front then ignore the US ones because it supports your flimsy narrative. Well I guess you can, but you should be laughed at.
Four short links: 23 April 2020
Packet Capture, Improving Instagram with Deep Learning, Data Science, and The Spotify Model
- Moloch — Large scale, open source, indexed packet capture and search.
- 3Dify Instagram Photos — open source toolset for adding a 3d effect to photos on Instagram’s web site. It uses 3d-photo-inpainting running in Colab (free GPU) and Cloud pubsub/storage for communication. A glimpse of the future: we could augment all our apps with deep learning-based services, but we still need to conquer paying for the GPUs and making it easy to use.
- xkcd 2295 — data science in a nutshell.
- Spotify Doesn’t Use “the Spotify Model” and Neither Should You (Jeremiah Lee) — I no longer work at Spotify, so I am sharing my experience to set the record straight. The Spotify squad model failed Spotify and it will fail your company too. EXTREMELY well-written. Full of killer points like Every responsibility a team cedes to increase its focus becomes a new cross-team dependency.
Four short links: 22 April 2020
Product Analytics, Mainframe Stories, Simulation NetLogo, and Database Wisdom
- Posthog — open source product analytics.
- Into the Mainframe (Recurse) — the interviews with two mainframe programmers are a great reminder of how much things have changed. And how they haven’t. For instance, later in my career I kept a weighted punching clown in my office. As programmers, we liked our users, but we also sort of hated them. They would make all these unreasonable requests, give us bad data, stuff like that. So all my staff could come by my office when they were mad at their users and punch the clown to feel better. It was fun. I had two doors in my office, and one time some guy I’d never seen before in my life walked into my office without knocking, punched the clown, and walked out the other door. Never saw him again.
- NetLogo — a multi-agent programmable modeling environment. For simulations/modeling.
- Things I Wished More Developers Knew About Databases (Jaana B. Dogan) — really good points, hard won from experience. You are lucky if 99.999% of the time network is not a problem.
Four short links: 21 April 2020
Making Change, Big Graphs, Conference in Animal Crossing, and Bug Sorting with ML
- It’s Time to Learn (Scott Berkun) — a strong response to Marc Andreessen’s It’s Time to Build. It feels like we are in a disrupted time when anything is possible, and folks are wondering where the levers are to pull.
- pygraphistry — a library to extract, transform, and visually explore big graphs.
- Desert Island Devops — a single-day virtual event, to be livestreamed on twitch.tv/oncallmemaybe on April 30th, 2020. All presentations will take place in the world of Animal Crossing: New Horizons.
- MSFT’s Machine Learning-Powered Bug Sorting — Since 2001 Microsoft has collected 13 million work items and bugs. We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies the critical, high priority security bugs, 97 percent of the time. This is an overview of how we did it. Part of the ongoing augmentation of developers by (ML-powered) software.
Four short links: 20 April 2020
Structured Database, Mainframes, Distributed Platform, and TIL Log
- CastleDB — a structured static database […]. CastleDB looks like any spreadsheet editor, except that each sheet has a data model. […] stores both its data model and the data contained in the rows into an easily readable JSON file. […] allows efficient collaboration on data editing.
- Mainframes Are Having a Moment (IEEE Spectrum) — Although many college and university computer science departments have cut back or dropped mainframe programming curriculum to focus on more modern languages and technologies, faculty and staff at others report an uptick in interest in Cobol and related classes. The increase began well before pandemic-related layoffs inundated state unemployment agency computer systems, causing government officials to put out the call for programmers who know Cobol to step in and help.
- swimOS — a complete, self-contained distributed software platform for building stateful, massively real-time streaming applications. swimOS implements a distributed microkernel, called the Swim Kernel, that is persistent without a database, reactive without a message broker, autonomous without a job manager, and which executes general purpose stateful applications without a separate app server.
- Using a Self-Rewriting README Powered by GitHub Actions to Track TILs (Simon Willison) — writing down what you’ve learned how to do keeps it fresh. I’ve been doing it for years, as have other people — check out this person’s astonishing collection.