Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 18 October 2019
Automated Reasoning, Streamsheets Open Source, Build Management, and Assembler Robots
- The NAI Suite — A prototype for automated reasoning over legal texts, called NAI, is presented. As an input, NAI accepts formalized logical representations of such legal texts that can be created and curated using an integrated annotation interface. The prototype supports automated reasoning over the given text representation and multiple quality assurance procedures.
- Streamsheets — open source release of a tool for making your data immediately understandable and for creating IoT applications visually and interactively—without a single line of code.
- Bazel Hits 1.0 — Build software from Google. Key features of the release: semantic versioning, long-term support, features supported on Android, Angular, Java, and C++.
- Assembler Robots Make Large Structures from Little Pieces — “What’s at the heart of this is a new kind of robotics that we call relative robots,” Gershenfeld says. Historically, he explains, there have been two broad categories of robotics—ones made out of expensive custom components that are carefully optimized for particular applications such as factory assembly, and ones made from inexpensive mass-produced modules with much lower performance. The new robots, however, are an alternative to both. They’re much simpler than the former, while much more capable than the latter, and they have the potential to revolutionize the production of large-scale systems, from airplanes to bridges to entire buildings.
Four short links: 17 October 2019
Compositing Software, Website Vulnerabilities, Logic Puzzles, and FPGA-in-USB
- Natron — open source compositing software for VFX and motion graphics.
- is-website-vulnerable — finds publicly known security vulnerabilities in a website’s front end JavaScript libraries.
- Solving Logic Grid Puzzles With An Algorithm That Imitates Human Behavior — We present in this paper our solver for logic grid puzzles. The approach used by our algorithm mimics the way a human would try to solve the same problem. Every progress made during the solving process is accompanied by a detailed explanation of our program’s reasoning. Since this reasoning is based on the same heuristics that a human would employ, the user can easily follow the given explanation.
- Fomu — a programmable FPGA device that fits inside a USB port. It has four buttons, an RGB LED, and an FPGA that is compatible with a fully open source chain and capable of running a RISC-V core. Fomu comes in a custom plastic enclosure that slots perfectly into a USB Type-A port.
Four short links: 16 October 2019
Scriptural Inference, Being Sherlocked, Hand-styled Charts, and Early-stage Startup Programming Principles
- Searching for Alternative Facts (Francesca Tripodi) — Since Google is seen as a neutral purveyor of information, it becomes a conduit for accessing “unbiased” information. And while this quest for truth may start in good faith, significant risks follow: first, searches meant to question political reality can reinforce existing ideological beliefs; second, services like Google and YouTube can unintentionally expose individuals who consider themselves “mainline conservatives” to “far-right” and “alt-right” content through algorithmic recommendations; and third, bad actors looking to exploit an audience disillusioned with mainstream media can take advantage of such intellectual exploration. See also BoingBoing.
- What to Do When You Get Sherlocked by Apple — as someone who worked at a company that got Sherlocked by Google, these points ring true.
- roughViz — Reusable JavaScript library for creating sketchy/hand-drawn styled charts in the browser.
- Programming Principles for Early-stage Startups — 1. Expect to re-write your code and do not over-architect. 2. Use consistency and agree upon the rules. 3. Solve system problems, not the immediate problem. 4. Keep sprints short and features small. 5. Focus on good database design. 6. Avoid processes that add too much overhead.
Four short links: 15 October 2019
NSA Cybersec, Collaborative Natural Language Understanding, Ugly Language, and Lament for Computer Files
- NSA Cybersecurity Directorate — The command center is staffed 24/7, and teams cycle in every 12 hours to monitor real-time internet activity and cyber threats as they unfold over the world. Its connectivity with global intelligence partners ensures immediate communication over global cyber crises. The article has a lot of “cyber” and uses phrases like “souped-up computers,” but the shape of the NSA’s approach is apparent and interesting, especially: “Neuberger said one of her new directorate’s goals is to provide more actionable threat intelligence at the unclassified level so that partners, customers, and private sector firms can actually reap benefits in real time.” Weird to think of a spook shop as having “customers” beyond, say, the President.
- CerealBar — a two-person collaborative game. We built CerealBar to study natural language understanding in collaborative interactions.
- Tilton — At this point, I should have noticed that this language was going to be inexcusably ugly, but astonishingly, I did not notice at the time. I kept pushing on, inspired by better languages like TRAC and LISP. I determined that this was the wrong approach for dealing with browser incompatibility, but I completed the language anyway. I named it Tilton after Robert Tilton, a television faith healer and speaker of tongues. I believe that Tilton is the ugliest programming language that was not intended to be an ugly programming language.
- Computer Files Are Going Extinct — years ago, websites were made of files; now they are made of dependencies.
Four short links: 14 October 2019
Detecting Manipulated Face Images, Deep Learning Cheat Sheets, Chinese Cybersecurity, and Streaming Dataflow
- FaceForensics++: Learning to Detect Manipulated Facial Images — This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. (GitHub)
- CS 230 — My twin brother Afshine and I created this set of illustrated deep learning cheat sheets covering the content of the CS 230 class, which I TA-ed in Winter 2019 at Stanford. They can (hopefully!) be useful to all future students of this course as well as to anyone else interested in deep learning.
- China’s New Cybersecurity Program: NO Place to Hide — This system will apply to foreign owned companies in China on the same basis as to all Chinese persons, entities, or individuals. No information contained on any server located within China will be exempted from this full coverage program. No communication from or to China will be exempted. There will be no secrets. No VPNs. No private or encrypted messages. No anonymous online accounts. No trade secrets. No confidential data. Any and all data will be available and open to the Chinese government.
- Noria — a new streaming dataflow system designed to act as a fast storage back end for read-heavy web applications. […] It acts like a database, but precomputes and caches relational query results so that reads are blazingly fast. Noria automatically keeps cached results up to date as the underlying data, stored in persistent base tables, change. Noria uses partially stateful dataflow to reduce memory overhead, and supports dynamic, runtime dataflow and query change.
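The idea in the Noria description (precompute a query result, serve reads from the cache, and update the cache as base data changes) can be illustrated with a toy sketch. This is not Noria's API or implementation: the class `VoteCountView`, its method names, and the vote-counting query are all hypothetical, and the sketch ignores partial state, persistence, and runtime query changes.

```python
from collections import defaultdict


class VoteCountView:
    """Toy materialized view for a query like:
    SELECT article_id, COUNT(*) FROM votes GROUP BY article_id
    (hypothetical example, not Noria's interface)."""

    def __init__(self):
        self.base_table = []            # stands in for the persistent base table
        self.counts = defaultdict(int)  # precomputed, cached query result

    def insert_vote(self, user_id, article_id):
        # Write path: store the row, then update the cached result incrementally.
        self.base_table.append((user_id, article_id))
        self.counts[article_id] += 1

    def delete_vote(self, user_id, article_id):
        # Removing a base row also adjusts the cached count.
        self.base_table.remove((user_id, article_id))
        self.counts[article_id] -= 1

    def read_count(self, article_id):
        # Read path: a dictionary lookup, no scan of the base table.
        return self.counts.get(article_id, 0)

    def recompute_count(self, article_id):
        # What a read would cost without the cache: a full scan.
        return sum(1 for _, a in self.base_table if a == article_id)


view = VoteCountView()
view.insert_vote("alice", 1)
view.insert_vote("bob", 1)
view.insert_vote("carol", 2)
view.delete_vote("bob", 1)
print(view.read_count(1), view.recompute_count(1))
```

The trade-off the sketch makes visible is that writes do a little extra work so that reads stay cheap, which suits the read-heavy workloads the description mentions.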
Four short links: 11 October 2019
Resilience Engineering, Ancient Emulators, Long Timespan Design, and Exporting Section 230
- Resilience Engineering Papers — This doc contains notes about people active in resilience engineering, as well as some influential researchers who are no longer with us, organized alphabetically. It also includes people and papers from related fields, such as cognitive systems engineering and naturalistic decision-making.
- Sweet 16 — a metaprocessor or “pseudo microprocessor” implemented in 6502 assembly language. Originally written by Steve Wozniak and used in the Apple II, Sweet 16 can also be ported to other 6502-based systems to provide useful 16-bit functionality. This article includes the source code for Sweet 16, along with a brief history, programming instructions, and notes to help port it. I was amazed at how soon emulators appear in the history of computing—e.g., John Backus’s Speedcode from 1953.
- Thinking, Storytelling, and Designing with Long Timespans — syllabus for class taught by Stuart Candy at the Long Now Foundation. (via Twitter)
- Section 230 Going Into Trade Deals (NYT) — The protections, which stem from a 1990s law, have already been tucked into the administration’s two biggest trade deals—the United States-Mexico-Canada Agreement and a pact with Japan that President Trump signed on Monday. American negotiators have proposed including the language in other prospective deals, including with the European Union, Britain, and members of the World Trade Organization. […] The American rules, codified in Section 230 of the Communications Decency Act, shield online platforms from many lawsuits related to user content and protect them from legal challenges stemming from how they moderate content. Those rules are largely credited with fueling Silicon Valley’s rapid growth. The language in the trade deals echoes those provisions but contains some differences.
Four short links: 10 October 2019
Unix Passwords, Remote Foo, Text Graphics, and AI in AppInventor
- Ken Thompson’s Unix Password — Somewhere around 2014 I found an /etc/passwd file in some dumps of the BSD 3 source tree, containing passwords of all the old timers such as Dennis Ritchie, Ken Thompson, Brian W. Kernighan, Steve Bourne, and Bill Joy. Those passwords are very amenable to modern cracking methods, but Thompson’s was the last to be cracked…
- How to Run a Remote-First Open-Space Un-Conference — neat!
- Libcaca — a graphics library that outputs text instead of pixels so that it can work on older video cards or text terminals.
- MIT’s AppInventor Now Does AI — AI with MIT App Inventor includes tutorial lessons as well as suggestions for student explorations and project work. Each unit also includes supplementary teaching materials: lesson plans, slides, unit outlines, assessments and alignment to the Computer Science Teachers Association (CSTA) K-12 Computing Standards.
Four short links: 9 October 2019
Data Playbook, Global Politics Meets Tech, ML Models, and Lock-free Programming
- IFRC Data Playbook Toolkit — The Data Playbook Beta is a recipe book or exercise book with examples, best practices, how-to’s, session plans, training materials, matrices, scenarios, and resources. The data playbook will provide resources for National Societies to develop their literacy around data, including responsible data use and data protection. The content aims to be visual, remixable, collaborative, useful, and informative.
- The China Cultural Clash — as more companies have a financial interest in China (either partially owned by, or hoping to sell hard into), employees and users are being discouraged from sharing opinions that China disagrees with.
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Adrian Colyer) — Oddly enough given the paper title, the six lessons are never explicitly listed or enumerated in the body of the paper, but they can be inferred from the division into sections. My interpretation of them is as follows: (1) Projects introducing machine learned models deliver strong business value; (2) Model performance is not the same as business performance; (3) Be clear about the problem you’re trying to solve; (4) Prediction serving latency matters; (5) Get early feedback on model quality; (6) Test the business impact of your models using randomized controlled trials (follows from #2).
- Awesome Lock-Free — A collection of resources on wait-free and lock-free programming.
Four short links: 8 October 2019
Visual AI Tools, Software Design, Image Processing Chains, and Music Player Firmware
- Yellowbrick — Open source visual analysis and diagnostic tools to facilitate machine learning model selection.
- Eight Habits of Expert Software Designers: An Illustrated Guide — Experts imagine how a design will work—simulating aspects of the envisioned software and how the different parts of the design support a variety of scenarios. When working with others, experts regularly walk through a design by verbalizing its operation step by step. When alone, they simulate mentally, exercising the design repeatedly over time. This fits with my sense of good programmers as good simulators.
- ImagePlay — A rapid prototyping tool for building and testing image processing algorithms. It comes with a variety of over 70 individual image processors that can be combined into complex process chains. ImagePlay is completely open source and can be built for Windows, Mac, and Linux.
- Rockbox — A free replacement firmware for digital music players. It runs on a wide range of players.
Four short links: 7 October 2019
Screen Addiction, Data Viz, Algorithmic Bias, and Tools for Thought
- Addicted to Screens? That’s Really a You Problem (NY Times) — In “Indistractable,” which was published last month, Mr. Eyal has written a guide to free people from an addiction he argues they never had in the first place. It was all just sloughing off personal responsibility, he figures. So the solution is to reclaim responsibility in myriad small ways. For instance: have your phone on silent so there will be fewer external triggers. Email less and faster. Don’t hang out on Slack. Have only one laptop out during meetings. Introduce social pressure like sitting next to someone who can see your screen. Set “price pacts” with people so you pay them if you get distracted—though be sure to “learn self-compassion before making a price pact.”
- The Perceptual and Cognitive Limits of Multivariate Data Visualization — Almost all data visualizations are multivariate (i.e., they display more than one variable), but there are practical limits to the number of variables that a single graph can display. These limits vary depending on the approach that’s used. Three graphical approaches are currently available for displaying multiple variables: (1) encode each variable using a different visual attribute; (2) encode every variable using the same visual attribute; (3) increase the number of variables using small multiples. In this article, we’ll consider each.
- Any Sufficiently Advanced Neglect is Indistinguishable from Malice: Assumptions and Bias in Algorithmic Systems — A harm created through persistent ignorance, through willful ignorance of harm raised, is not necessarily very different from harm intentionally done.
- How Can We Develop Transformative Tools for Thought? — We believe now is a good time to work hard on this vision again. In this essay, we sketch out a set of ideas we believe can be used to help develop transformative new tools for thought. In the first part of the essay, we describe an experimental prototype system we’ve built, a kind of mnemonic medium intended to augment human memory. This is a snapshot of an ongoing project, detailing both encouraging progress as well as many challenges and opportunities. In the second part of the essay, we broaden the focus. We sketch several other prototype systems, and we address the question: why is it that the technology industry has made comparatively little effort developing this vision of transformative tools for thought?
Four short links: 4 October 2019
Understanding SQL, Pricing, App Configuration, and Internet Jurisdiction
- SQL Queries Don’t Start with SELECT (Julia Evans) — today I learned…
- Vickrey Auctions to Discover Demand Curve — this is gold!
- Hydra — A framework for elegantly configuring complex applications. Python library that handles config, command-line flags, logging, etc.
- EU Wants Global Control of Facebook Content (Verge) — On Thursday, the European Union’s top court ruled that lower court judges could order Facebook to remove illegal comments from its platform, expanding on the power individual countries have to extend content bans across the world. See this thread for more commentary. Aside from substance issues, there is a major process issue: this ruling affects billions of Facebook users, but they were not represented in court. Important legal arguments about their rights were simply not raised or considered.
Four short links: 3 October 2019
Content Moderation, Go ORM, Adversarial Interoperability, and Random Sample Elections
- Why Do Companies With Huge Resources Still Have Terrible Moderation? — an extremely readable explanation of why it’s so damn hard. Hint: AI isn’t it.
- ent — An entity framework for Go. Simple, yet powerful ORM for modeling and querying data. Open source, from Facebook.
- Adversarial Interoperability (Cory Doctorow) — collection of articles on when you create a new product or service that plugs into the existing ones without the permission of the companies that make them.
- Random Sample Elections (David Chaum) — The number of voters sampled can be small, depending on how close the contest, yet give overwhelming confidence. For instance, if the margin is at least 10%, then a thousand votes will likely yield a result that itself, without any assumption about the margin and with only a one-in-a-million chance of error, establishes that a majority are in favor—even with an electorate of millions or billions. This dramatic reduction in the number of voters participating in each election compared to a conventional election today yields a substantially proportionate reduction in cost.
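The arithmetic behind the random-sample claim can be checked with a short binomial calculation. The sketch below is my own, not Chaum's method: it assumes simple random sampling with every sampled voter responding, treats a 50/50 electorate as the worst case for "no majority," and tries two readings of "margin of at least 10%" (60/40 and 55/45), since the excerpt does not say which is meant. All function and variable names are hypothetical.

```python
import math


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def votes_needed(n: int, alpha: float) -> int:
    """Smallest 'yes' count that an evenly split electorate reaches with probability <= alpha."""
    for k in range(n // 2, n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    raise ValueError("no threshold satisfies alpha")


SAMPLE_SIZE = 1000   # "a thousand votes"
ALPHA = 1e-6         # "one-in-a-million chance of error"

threshold = votes_needed(SAMPLE_SIZE, ALPHA)
chance_60_40 = binom_tail(SAMPLE_SIZE, threshold, 0.60)  # assumed reading: 60% in favor
chance_55_45 = binom_tail(SAMPLE_SIZE, threshold, 0.55)  # assumed reading: 55% in favor

print(f"'yes' votes needed out of {SAMPLE_SIZE}: {threshold}")
print(f"chance of reaching that with 60% support: {chance_60_40:.3f}")
print(f"chance of reaching that with 55% support: {chance_55_45:.3f}")
```

Note that the electorate size never enters the calculation, which is the point of the excerpt: the required sample depends on the margin and the desired confidence, not on whether millions or billions are eligible to vote.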
Four short links: 2 October 2019
Data Fallacies, Transparency Reports, Encryption, and Experimental Declarative Programming Language
- Data Fallacies to Avoid — nifty infographic for the beginning torturer of data.
- Transparency Reports Suffering — “The momentum has faded,” says Peter Micek, general counsel with Access Now. The digital rights advocacy group is updating its index of transparency reports, which it last posted in 2016, and this pending revision will document serious stagnation in these disclosures. The worst rollbacks have happened when companies have merged or sold off large parts of their customer base, leaving the people involved doing business with new management that lacks the old management’s commitment to transparency.
- How Long Will Unbreakable Commercial Encryption Last? (Lawfare) — I believe the tech companies are slowly losing the battle over encryption. They’ve been able to bottle up legislation in the United States, where the tech lobby represents a domestic industry producing millions of jobs and trillions in personal wealth. But they have not been strong enough to stop the Justice Department from campaigning for lawful access. And now the department is unabashedly encouraging other countries to keep circling the tech industry, biting off more and more in the form of law enforcement mandates. That’s a lot easier in countries where Silicon Valley is seen as an alien and often hostile force, casually destroying domestic industries and mores.
- Sentient — an interesting experimental language to describe problems (Prolog-like), with SAT solvers under the hood to find solutions.
Four short links: 1 October 2019
Research, Observability, Self-Enumeration, and Probabilistic Programming
- Just Enough Research — a book that comes recommended by Simon Willison.
- Observations on Observability — I think the future of operating software systems at scale will look like process engineering. We will rely on continuous signals and look at software systems as dynamical systems. We will embrace similar techniques for process control and continuous improvement. This is why I do not like the term observability as it is currently used. In operating software systems at scale, we may want to reserve the traditional definition of observability as it relates to controllability.
- History of Self-Enumerating Pangram Tweet — a history of the work that went into making sentences (and now tweets) that enumerate their contents accurately.
- Anatomy of a Probabilistic Programming Framework — I realized that despite knowing a thing or two about Bayesian modeling, I don’t understand how probabilistic programming frameworks are structured, and therefore couldn’t appreciate the sophisticated design work going into PyMC4. So, I trawled through papers, documentation, and source code of various open source probabilistic programming frameworks, and this is what I’ve managed to take away from it.
Four short links: 30 September 2019
CLOUD Act, Ethical Consumption of Bits, TV Tracking, and Long Projects
- Stamos on CLOUD Act — cogent and informative set of tweets (words I never thought I would say) from Alex Stamos, with context for the latest piece of Internet regulation to get alarmist and wrong media coverage.
- Migrating from Cloudflare — This is pretty cool, and it’s why I’ve used Cloudflare for a few years. However, I don’t really like Cloudflare. I don’t like how they protect hate forums, where mass shootings are planned; I don’t like how they have grown to the point where a huge portion of the internet’s total traffic flows through their infrastructure; I don’t like how un-seriously they treat their responsibilities. So, I wanted to move off. More datapoints for the emerging Ethical Consumption of Bits.
- Three Recent Papers on the Tracking in TVs (Arvind Narayanan) — Here’s a doozy: Roku has a “Limit Ad Tracking” option. Turning it on increased the number of tracking servers contacted. It did prevent Roku’s AD ID from being leaked, but a whole bunch of other unique IDs are available. Even Pi-hole wasn’t that effective at limiting tracking. (via Hacker News)
- Strategies for Long Projects (Ben Brostoff) — Relentless, irrational optimism is the only attitude that works.
Four short links: 27 September 2019
Creative Coding, Collective Social Behavior, Programming Language Research, and Social Media Manipulation
- Intro to Creative Coding — this is the repo, also check out p5.js demos and tone.js demos. (via @mattdesl)
- The Dynamics of Collective Social Behavior in a Crowd-Controlled Game — We find that having a fraction of players who do not follow the crowd’s average behavior is key to succeed in the game.
- Recent Programming Language Research (SIGPLAN) — These papers showcase PL [Programming Language] connections to areas as diverse as chemical microfluidics, blockchain smart contracts, and automated debugging.
- 2019 Global Inventory of Organized Social Media Manipulation — Evidence of organized social media manipulation campaigns, which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda is being used as a tool of information control in three distinct ways: to suppress fundamental human rights, discredit political opponents, and drown out dissenting opinions. Again, understand dystopic possible futures for your presently-democratic nation so you design your software to avoid them.
Four short links: 26 September 2019
Censorship, Tiktok, Machine Learning Workspace, and Deepfake Library
- Tiktok and Ethnic Cleansing — as hardline governments rise all around the world, the whole “design policy and tools for the worst case environment” is looking a whole lot more salient. There are worse situations in the world than that of Xinjiang’s Uyghurs, but not many. Tiktok’s shadowbans and content blocks are worth learning more about. Also in content moderation news today: Facebook is not fact-checking political speech.
- How Tiktok Is Changing the World and You’re Missing It — Imagine if you created a new account on a social network, you had zero followers, and you posted a piece of content, and then you went viral. That would be ridiculous right? Right, it would be ridiculous. But, that’s how TikTok works.
- ml-workspace — all-in-one web-based IDE specialized for machine learning and data science.
- Contributing Data to Deepfake Detection (Google) — To make this dataset, over the past year we worked with paid and consenting actors to record hundreds of videos. Using publicly available deepfake generation methods, we then created thousands of deepfakes from these videos. The resulting videos, real and fake, comprise our contribution, which we created to directly support deepfake detection efforts. As part of the FaceForensics benchmark, this dataset is now available, free to the research community, for use in developing synthetic video detection methods.
Four short links: 25 September 2019
Cleaning ImageNet, Thumbnails, Tracking Users, and AR Tabletop Gaming
- Removing Slurs from ImageNet — The first issue is that WordNet contains offensive synsets that are inappropriate to use as image labels. Although during the construction of ImageNet in 2009 the research team removed any synset explicitly denoted as ‘offensive,’ ‘derogatory,’ ‘pejorative,’ or ‘slur’ in its gloss, this filtering was imperfect and still resulted in inclusion of a number of synsets that are offensive or contain offensive synonyms. […] We are in the process of preparing a new version of ImageNet by removing all the synsets identified as ‘unsafe’ and ‘sensitive’ along with their associated images. This will result in the removal of 600,040 images, leaving 577,244 images in the remaining “safe” person synsets. To see unsafe labelling in action, try ImageNet Roulette and compare pictures of men and women, people with different colored skin, etc.
- Thumbor — open-source smart on-demand image cropping, resizing, and filters.
- SocialPath — a Django application for gathering social media intelligence on a specific username. It checks for Twitter, Instagram, Facebook, Reddit, and Stack Overflow. Collected data is sorted according to word frequency, hashtags, timeline, mentions, and similar accounts, and presented as charts with the help of D3.js. This technique allows me to track darknet users who do not use unique nicknames.
- Tilt Five: Holographic Tabletop Gaming (Kickstarter) — an AR gaming project from the remarkable Jeri Ellsworth.