Four Short Links
Nat Torkington’s eclectic collection of curated links.
Four short links: 13 June 2019
Audio GIF, OneMetric, Percolator and Spanner, and Knitting Data
- Audio GIF — An Audio GIF stores audio inside a standards compliant GIF image.
- OneMetric — Understand, share, and improve the numbers that matter to your business.
- Implementing Distributed Transactions the Google Way: Percolator vs. Spanner — Multiple open source ACID-compliant distributed databases have started building such transactions by taking inspiration from research papers published by Google. In this post, we dive deeper into Percolator and Spanner, the two Google systems behind those papers, as well as the open source databases they have inspired.
- Borough Mayor is Knitting to Prove Men Speak Too Much at Meetings (Montreal Gazette) — Sue Montgomery, mayor of the Côte-des-Neiges—Notre-Dame-de-Grâce borough, this week announced she is knitting during council meetings to help her concentrate and to demonstrate that men speak more than women at the meetings. Montgomery knits in red when men speak and in green when women do. So far, she has 15 inches of scarf, 80% of it red. I love physical reification of data.
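The Audio GIF item works because the GIF89a format lets a file carry extra blocks that viewers are allowed to skip. The project's exact encoding isn't described here, so what follows is only a minimal Python sketch. It assumes the audio bytes ride in a standard Application Extension block placed just before the GIF trailer. The identifier `AUDIODAT` and the auth code `1.0` are hypothetical placeholders.

```python
GIF_TRAILER = 0x3B

def app_extension(payload, app_id=b"AUDIODAT", auth_code=b"1.0"):
    """Wrap arbitrary bytes in a GIF89a Application Extension block.

    app_id and auth_code are hypothetical placeholders, not the identifiers
    the Audio GIF project uses.
    """
    if len(app_id) != 8 or len(auth_code) != 3:
        raise ValueError("app_id must be 8 bytes and auth_code 3 bytes")
    # Extension introducer, application extension label, fixed block size of 11.
    out = bytearray([0x21, 0xFF, 0x0B]) + app_id + auth_code
    # Data travels in sub-blocks: one length byte (1-255) followed by the data.
    for i in range(0, len(payload), 255):
        chunk = payload[i:i + 255]
        out.append(len(chunk))
        out += chunk
    out.append(0x00)  # zero-length block terminator
    return bytes(out)

def embed(gif_bytes, payload):
    """Insert the extension immediately before the trailer that ends every GIF."""
    if not gif_bytes or gif_bytes[-1] != GIF_TRAILER:
        raise ValueError("input does not end with the GIF trailer byte 0x3B")
    return gif_bytes[:-1] + app_extension(payload) + bytes([GIF_TRAILER])

def extract(block):
    """Reassemble the payload from a block built by app_extension()."""
    pos = 3 + 11  # skip introducer, label, size byte, then identifier + auth code
    data = bytearray()
    while block[pos] != 0:
        size = block[pos]
        data += block[pos + 1:pos + 1 + size]
        pos += 1 + size
    return bytes(data)
```

A viewer that doesn't recognize the identifier skips the block, so the image still displays normally. Note that `extract()` assumes you already hold the extension block. A real reader would first have to walk the file's blocks to find it.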
Four short links: 12 June 2019
Serverless Microservice Patterns, Organizing Information, Internet Trends, and Fake Videos
- Serverless Microservice Patterns for AWS (Jeremy Daly) — I’ve read a lot of posts that mention serverless microservices, but they often don’t go into much detail. I feel like that can leave people confused and make it harder for them to implement their own solutions. Since I work with serverless microservices all the time, I figured I’d compile a list of design patterns and how to implement them in AWS. I came up with 19 of them; though, I’m sure there are plenty more.
- Fans are Better Than Tech at Organizing Information Online (Wired) — coverage of Archive Of Our Own (AO3), a fanfic archive which is nominated for a Hugo this year. AO3’s trick is that it involves humans by design—around 350 volunteer tag wranglers in 2019, up from 160 people in 2012—who each spend a few hours a week deciding whether new tags should be treated as synonyms or subsets of existing tags, or simply left alone. AO3’s Tag Wrangling Chairs estimate that the group is on track to wrangle over two million never-before-used tags in 2019, up from around 1.5 million in 2018.
- Mary Meeker’s Internet Trends, 2019 Edition — like April Fool’s Day, it’s a landmark in the industry, but fewer people look forward to it with glee these days. The big trends driving growth (Moore’s Law, growth in mobile sales, people connected to the internet) have slowed down. Internet ad spend is still rising, customer acquisition costs are going up, etc. Two eye-watering facts: Americans are spending 6.3 hours a day on digital media, up 7% from the year before, and people are increasingly communicating in images: 50% of Twitter impressions are of posts with media, which is startling for a medium that was originally SMS.
- Testing Facebook’s Fake Video Policy (Vice) — a fake video of Mark Zuckerberg was uploaded to test Facebook’s policy. The company is treating it like the earlier Pelosi video: Instead of deleting the video, the company chose to de-prioritize it, so that it appeared less frequently in users’ feeds, and placed the video alongside third-party fact-checker information.
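The AO3 item describes volunteers deciding whether a new tag is a synonym of an existing tag or a subset of one. Here is a hypothetical illustration, not AO3's actual code or data model. It treats those two decisions as a synonym map plus a parent map, so that searching a canonical tag also finds works filed under its synonyms and subtags. The class and method names are invented for this sketch.

```python
from collections import defaultdict

class TagWrangler:
    """Hypothetical model of tag wrangling: synonym links and subset (parent) links."""

    def __init__(self):
        self.synonym_of = {}            # tag -> tag it was merged into
        self.parent = {}                # canonical tag -> broader canonical tag
        self.works = defaultdict(set)   # tag exactly as the author typed it -> work ids

    def canonical(self, tag):
        while tag in self.synonym_of:
            tag = self.synonym_of[tag]
        return tag

    def make_synonym(self, tag, canonical_tag):
        target = self.canonical(canonical_tag)
        if target != tag:               # never link a tag to itself
            self.synonym_of[tag] = target

    def make_subtag(self, child, parent):
        self.parent[self.canonical(child)] = self.canonical(parent)

    def tag_work(self, work_id, tag):
        self.works[tag].add(work_id)

    def _lineage(self, tag):
        """The tag's canonical form plus every broader tag above it."""
        tag = self.canonical(tag)
        seen = {tag}
        while tag in self.parent and self.parent[tag] not in seen:
            tag = self.parent[tag]
            seen.add(tag)
        return seen

    def search(self, tag):
        """Works filed under the tag, any of its synonyms, or any of its subtags."""
        target = self.canonical(tag)
        return {w for t, ids in self.works.items() if target in self._lineage(t) for w in ids}

w = TagWrangler()
w.tag_work(1, "Sherlock Holmes")
w.tag_work(2, "sherlock")
w.tag_work(3, "Sherlock Holmes & John Watson")
w.tag_work(4, "Doctor Who")
w.make_synonym("sherlock", "Sherlock Holmes")
w.make_subtag("Sherlock Holmes & John Watson", "Sherlock Holmes")
print(sorted(w.search("sherlock")))   # [1, 2, 3]
```

In this sketch, wrangling is just two dictionary writes. The hard part, which the volunteers do, is deciding which of the two writes is the correct one.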
Four short links: 11 June 2019
Premium Firefox, FPGAs for Graph Processing, Decision Framework, and The Online Experience of South Asian Women
- Possible Premium Firefox Coming (ZDNet) — an interesting approach for Firefox, but I’d pay for something as good as Chrome that didn’t have the mixed incentives for developers.
- Graph Processing on FPGAs: Taxonomy, Survey, Challenges — Our survey describes and categorizes existing schemes and explains key ideas. Finally, we discuss research and engineering challenges to outline the future of graph computations on FPGAs.
- Decision Disagreement Framework: How We Encourage Disagreements at Matter — we couldn’t find a framework for handling and supporting disagreements after decisions have been made, especially if you weren’t a part of making that decision. We took inspiration from existing frameworks to create the Decision Disagreement Framework.
- Understanding the Online Safety and Privacy Challenges Faced by South Asian Women — This post, after providing a short background, covers the following topics: Device privacy challenges: This section outlines the privacy challenges faced by South Asian women when using their smartphones; Online safety challenges: Highlights the risks and abuse faced by South Asian women when using online services; Design considerations to promote gender equity: When building products, features that mitigate the risks would help to improve the safety of South Asian women. Ethnographic study that’s super useful for systems designers who aren’t South Asian women.
Four short links: 10 June 2019
Remote Code Development, PWA Builder, Why Platforms Fail, and Designing Rituals
- Visual Studio Code Remote Development May Change Everything (Scott Hanselman) — Visual Studio Code Remote Development allows you to use a container, remote machine, or the Windows Subsystem for Linux (WSL) as a full-featured development environment. It effectively splits VS Code in half and runs the client part on your machine and the “VS Code Server” basically anywhere else. […] As I mentioned, you can run within WSL, containers, or over SSH. It’s early days, but it’s extraordinarily clean. I’m really looking forward to seeing how far and effortless this style of development can go. There’s so much less yak shaving! It effectively removes the whole setup part of your coding experience and you get right to it.
- PWA Universal Builder — scaffolding for Progressive Web Apps with your choice of framework; you get optimizations and presets for free.
- A Study of More Than 250 Platforms Reveals Why Most Fail (HBR) — We grouped the most common mistakes into four categories: (1) mispricing on one side of the market, (2) failure to develop trust with users and partners, (3) prematurely dismissing the competition, and (4) entering too late. As always, the four categories aren’t significant—how do you go broke? You run out of money by failing to keep enough of it, or by never getting enough users to have enough money in the first place. The individual tales are where juicy stories and interesting thoughts form.
- Friday Wins and a Case Study in Ritual Design (Kellan Elliott-McCrea) — A standard piece of software development practice that many teams let lapse, or merely let lapse into being sub-optimal, is “Friday wins,” sometimes called sprint demos or sprint reviews. But you can take what can be a flaccid and repetitive meeting and make it a valuable ritual by grounding it in values.
Four short links: 7 June 2019
Energy of Deep Learning, Open Source Game Clones, Better Batteries, and Video Magic
- Energy and Policy Considerations for Deep Learning in NLP — training a Transformer NLP model with neural architecture search emits an estimated 626,155 lbs of CO2. Compare that with an average car over its whole lifetime, including fuel: 126,000 lbs. (via MIT TR)
- Open Source Game Clones — This site tries to gather open source remakes of great old games in one place.
- A Glass Battery That Keeps Getting Better (IEEE Spectrum) — grunty batteries without the fire would be a great thing, indeed, never mind one that got better. Goodenough and collaborators claimed they’d developed a non-flammable lithium battery (whose electrolyte was based on a glass powder) that had twice the energy density of traditional lithium-ion batteries. They also published a graph that showed an increase in capacity over more than 300 charge-discharge cycles. (This increase, however, pales in comparison to the cell’s at least 23,000-cycle lifespan.)
- Text-Based Editing — We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e., no jump cuts). You edit the script and the software edits the video so the person says what’s in the script. (via Stanford)
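To put the two figures from the deep learning energy item side by side, you only need one division and one unit conversion (1 lb = 0.45359237 kg). The constants below are the estimates quoted in that item.

```python
LB_TO_KG = 0.45359237

# Estimates quoted in the "Energy and Policy Considerations" item, in lbs of CO2.
TRANSFORMER_NAS_LBS = 626_155   # Transformer trained with neural architecture search
CAR_LIFETIME_LBS = 126_000      # average car over one lifetime, including fuel

car_lifetimes = TRANSFORMER_NAS_LBS / CAR_LIFETIME_LBS
metric_tonnes = TRANSFORMER_NAS_LBS * LB_TO_KG / 1000

print(f"{car_lifetimes:.1f} car lifetimes, about {metric_tonnes:.0f} t of CO2")
```

That works out to roughly five car lifetimes, or about 284 metric tonnes of CO2, for a single architecture search.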
Four short links: 6 June 2019
Software Engineering for Machine Learning, Generalizations in Learning, Computer Dance, and Firefighting in Product Development
- Software Engineering for Machine Learning (Microsoft Research) — We collected some best practices from Microsoft teams to address [several essential engineering challenges that organizations may face in creating large-scale AI solutions for the marketplace]. In addition, we have identified three aspects of the AI domain that make it fundamentally different from prior software application domains: 1) discovering, managing, and versioning the data needed for machine learning applications is much more complex and difficult than other types of software engineering, 2) model customization and model reuse require very different skills than are typically found in software teams, and 3) AI components are more difficult to handle as distinct modules than traditional software components—models may be “entangled” in complex ways and experience non-monotonic error behavior.
- Open Long-Tailed Recognition (Berkeley) — A practical system shall be able to classify among a few common and many rare categories, to generalize the concept of a single category from only a few known instances, and to acknowledge novelty upon an instance of a never seen category. We define OLTR as learning from long-tail and open-end distributed data and evaluating the classification accuracy over a balanced test set which includes head, tail, and open classes in a continuous spectrum.
- Hype Cycle: Machine Learning (Vimeo) — dance being changed by computers.
- Past the Tipping Point: The Persistence of Firefighting in Product Development — In this paper, we try to answer three questions: (1) why does firefighting exist, (2) why does firefighting persist, and (3) what can managers do about it? The most important result of our studies is that product development systems have a tipping point. In models of infectious diseases, the tipping point represents the threshold of infectivity and susceptibility beyond which a disease becomes an epidemic. Similarly, in product development systems there exists a threshold for problem-solving activity that, when crossed, causes firefighting to spread rapidly from a few isolated projects to the entire development system. Our analysis also shows that the location of the tipping point, and therefore the susceptibility of the system to the firefighting phenomenon, is determined by resource utilization in steady state.
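The OLTR definition includes an open-set requirement: the system should acknowledge an instance of a never-seen category. That requirement can be illustrated with a much simpler technique than the Berkeley paper's. The sketch below is a nearest-class-mean classifier that answers "unknown" when no class centroid is close enough. It is a generic baseline, not the paper's method. The toy feature vectors and the distance threshold are assumptions you would have to tune on real data.

```python
import math

def class_means(examples):
    """examples: dict mapping label -> list of feature vectors (equal-length lists)."""
    means = {}
    for label, vecs in examples.items():
        dim = len(vecs[0])
        means[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return means

def classify(x, means, threshold):
    """Nearest class mean, or "unknown" if every mean is farther than threshold."""
    best_label, best_dist = None, math.inf
    for label, mu in means.items():
        d = math.dist(x, mu)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else "unknown"

# Two "head" classes with several examples and one "tail" class with a single example.
means = class_means({
    "cat": [[0.0, 0.0], [0.0, 2.0], [0.2, 1.0]],
    "dog": [[10.0, 10.0], [10.0, 12.0]],
    "okapi": [[0.0, 20.0]],
})
print(classify([0.1, 1.5], means, threshold=3.0))     # cat
print(classify([0.5, 19.0], means, threshold=3.0))    # okapi, learned from one example
print(classify([30.0, -30.0], means, threshold=3.0))  # unknown
```

The one-example tail class is handled exactly like the common ones, which is why class means appeal for long-tailed data. The weakness is that a single global threshold rarely suits every class.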
Four short links: 5 June 2019
Open Source, 3D Printer Wear, Multicore TCP, and Super-Resolution Images
- What’s Driving Open Source Software in 2019 — Results from our ranking of proposal phrases show the centrality of data to the open source community: “data” (the No. 5 term) outpacing “code” (the No. 14 term), the rise in AI/ML topics, and in the nascent cloud native paradigm where monitoring and analytics assume critical importance—highlighting the demand for skills in analytics, data acquisition, etc.
- Investigating 3D Printer Nozzle Wear (YouTube) — great video with cross-sections of worn nozzles and discussion of different materials.
- mTCP — high-performance user-level TCP stack for multicore systems. Scaling the performance of short TCP connections is fundamentally challenging due to inefficiencies in the kernel. mTCP addresses these inefficiencies from the ground up—from packet I/O and TCP connection management all the way to the application interface. Open source (modified BSD license).
- Handheld Multi-Frame Super-Resolution — In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. […] Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone.
Four short links: 4 June 2019
Paper vs. Implementation, Recommendations Run Amok, Copyright Law, and Engineering Management
- Everything You Know About word2vec Is Wrong — The original word2vec C implementation does not do what’s explained above, and is drastically different.
- On YouTube’s Digital Playground, an Open Gate for Pedophiles (NYT) — YT’s recommendation algorithm suggested home movies of families’ kids to users who watched other videos of prepubescent, partially clothed children. (via BoingBoing)
- Canada’s Review of Copyright Law (BoingBoing) — sane proposals for fair dealing, safe harbour, TPMs, and lengthening copyright term.
- How to Size and Assess Teams From an Eng Lead at Stripe, Uber, and Digg — In this exclusive interview, Larson digs into two critical components of organization design. Specifically, he shares his system for gauging the size and state of engineering teams—in not only a highly efficient and effective way, but also with a deeply empathetic and ethical approach.
Four short links: 3 June 2019
Differential Privacy, Future Op-Ed, Spectroscopy, and Research Programming Environment
- Differential Privacy in the Census (Science) — Differential privacy, first described in 2006, isn’t a substitute for swapping and other ways to perturb the data. Rather, it allows someone—in this case, the Census Bureau—to measure the likelihood that enough information will “leak” from a public data set to open the door to reconstruction. “Any time you release a statistic, you’re leaking something,” explains Jerry Reiter, a professor of statistics at Duke University in Durham, North Carolina, who has worked on differential privacy as a consultant with the Census Bureau. “The only way to absolutely ensure confidentiality is to release no data. So the question is, how much risk is OK? Differential privacy allows you to put a boundary” on that risk.
- It’s 2059, and the Rich Kids Are Still Winning (NYT) — Ted Chiang’s “Op-Ed From the Future.” (via Slashdot)
- Classification of Household Materials via Spectroscopy — we collected a data set of spectral measurements from two commercially available spectrometers during which a robotic platform interacted with 50 distinct objects, and we show that a residual neural network can accurately analyze these measurements. Due to the low variance in consecutive spectral measurements, our model achieved a material classification accuracy of 97.7% when given only one spectral sample per object.
- Flowsheets — A research prototype programming environment for making programs while seeing the data the program outputs. See the demo video.
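Reiter's point about putting a boundary on risk is usually explained with the textbook Laplace mechanism. The sketch below applies that mechanism to a single count, using the privacy parameter epsilon as the boundary. The Census Bureau's production system is far more elaborate than this, and the sample data here is made up.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng=random):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one person changes
    the true count by at most 1, so Laplace noise with scale 1/epsilon suffices.
    Smaller epsilon means a tighter bound on leakage and a noisier answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(2019)
ages = [23, 35, 41, 67, 72, 18, 55, 80, 30, 64]  # made-up records
for eps in (0.1, 1.0, 10.0):
    print(eps, round(private_count(ages, lambda a: a >= 65, eps, rng), 2))
```

Each released statistic spends some of the privacy budget. This is the formal version of "any time you release a statistic, you’re leaking something."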
Four short links: 31 May 2019
Google Blocking Ad Blocking, Security Checklist, Maturity Model, and Software Engineering
- Google to Restrict Modern Ad Blocking Chrome Extensions to Enterprise Users (9 to 5 Google) — modern ad blockers, like uBlock Origin and Ghostery, use Chrome’s webRequest API to block ads before they’re even downloaded. With the Manifest V3 proposal, Google deprecates the webRequest API’s ability to block a particular request before it’s loaded. As you would expect, power users and extension developers alike criticized Google’s proposal for limiting the user’s ability to browse the web as they see fit. […] “Google’s primary business is incompatible with unimpeded content blocking. Now that Google Chrome product has achieved high market share, the content blocking concerns as stated in its 10K filing are being tackled.” See also Switch to Firefox.
- SaaS CTO Security Checklist — useful security tips, arranged by stage of your startup.
- Proposing a Maturity Model for Digital Services (David Eaves) — an interesting approach: describes the aspects of maturity (political environment, institutional capacity, delivery capability, skills and hiring, user-centered design, cross-government platforms) and then has a rubric for the different aspects of each of them.
- Notes to Myself on Software Engineering (François Chollet) — Technology is never neutral. If your work has any impact on the world, then this impact has a moral direction. The seemingly innocuous technical choices we make in software products modulate the terms of access to technology, its usage incentives, who will benefit, and who will suffer. Technical choices are also ethical choices. Thus, always be deliberate and explicit about the values you want your choices to support. Design for ethics. Bake your values into your creations. Never think, I’m just building the capability; that in itself is neutral. It is not, because the way you build it determines how it will get used. The whole list is great, and resonates strongly with my experience.
Four short links: 30 May 2019
Open Insulin, Sonification of Data, Security UX, and Advanced Data Structures
- Open Insulin — open biohacking group working on (and making progress toward) the open and cheap manufacturing of insulin.
- The Sound of Data — a gentle intro to sonification for historians.
- UX of Security — an interesting talk on the relationship between UX and security. I particularly liked: “the most expensive dialog box in the world costs an Australian bank $750,000,000/year for password resets.” Slides available. (via Jared Spool)
- Advanced Data Structures — MIT course.
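To give a feel for what the sonification primer covers, the most common starting point is parameter mapping: each data value becomes a pitch. Below is a minimal sketch using only Python's standard library. The 220 to 880 Hz range, the quarter-second note length, and the function names are arbitrary choices for illustration, not taken from the linked piece.

```python
import math
import struct
import wave

def values_to_freqs(values, low_hz=220.0, high_hz=880.0):
    """Map each value to a pitch on a log scale between low_hz and high_hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # all-equal data: every note gets low_hz
    return [low_hz * (high_hz / low_hz) ** ((v - lo) / span) for v in values]

def write_tones(path, freqs, note_seconds=0.25, rate=44100):
    """Write one sine tone per frequency to a mono, 16-bit WAV file."""
    frames = bytearray()
    n = int(note_seconds * rate)
    for f in freqs:
        for i in range(n):
            # Half amplitude leaves headroom and avoids clipping.
            sample = int(0.5 * 32767 * math.sin(2 * math.pi * f * i / rate))
            frames += struct.pack("<h", sample)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(frames))

# Example (writes a file): a rising-then-falling series becomes a rising-then-falling melody.
# write_tones("series.wav", values_to_freqs([12.1, 13.4, 15.0, 14.2]))
```

The sketch maps values on a log scale, so equal steps in the data become equal musical intervals; here the full range spans two octaves. Listeners hear that more evenly than a linear mapping in hertz.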
Four short links: 29 May 2019
Robustness Principle, End of Mobile, Beautiful Hack, and Autonomous Radios
- The Harmful Consequences of the Robustness Principle (IETF) — Time and experience shows that negative consequences to interoperability accumulate over time if implementations apply the robustness principle. This problem originates from an assumption implicit in the principle that it is not possible to effect change in a system the size of the internet. That is, the idea that once a protocol specification is published, changes that might require existing implementations to change are not feasible.
- The End of Mobile (Stratechery) — deep dive into numbers on mobile adoption around the world. The end is the kicker, though: I’m not updating my smartphone model anymore. The next fundamental trends in tech, today, are probably machine learning, crypto, and regulation.
- GLS: Goroutine Local Storage — using the call stack to implement local storage for goroutines, against the language’s intentions. A splendidly hacky hack. What are people saying? “Wow, that’s horrifying.” “This is the most terrible thing I have seen in a very long time.” “Where is it getting a context from? Is this serializing all the requests? What the heck is the client being bound to? What are these tags? Why does he need callers? Oh god no. No no no.”
- If DARPA Has Its Way, AI Will Rule the Wireless Spectrum (IEEE) — To tackle spectrum scarcity, I created the Spectrum Collaboration Challenge (SC2) at the U.S. Defense Advanced Research Projects Agency (DARPA), where I am a program manager. […] Teams are designing new radios that use artificial intelligence (AI) to learn how to share spectrum with their competitors, with the ultimate goal of increasing overall data throughput. These teams are vying for nearly $4 million in prizes to be awarded at the SC2 championship this coming October in Los Angeles. Thanks to two years of competition, we have witnessed, for the first time, autonomous radios collectively sharing wireless spectrum to transmit far more data than would be possible by assigning exclusive frequencies to each radio.
Four short links: 28 May 2019
Research Libraries, Disinformation Campaign, Unstructured Text Mining, and Building a PiDP-11
- The Books of College Libraries Are Turning Into Wallpaper (The Atlantic) — University libraries across the country, and around the world, are seeing steady, and in many cases precipitous, declines in the use of the books on their shelves. […] Statistics show that today’s undergraduates have read fewer books before they arrive on campus than in prior decades, and just placing students in an environment with more books is unlikely to turn that around. (The time to acquire the reading bug is much earlier than freshman year.) And while correlation does not equal causation, it is all too conspicuous that we reached Peak Book in universities just before the iPhone came out. Part of this story is undoubtedly about the proliferation of electronic devices that are consuming the attention once devoted to books. The interaction between the book format, information scarcity, and digitalization plays out in research libraries.
- Live Coverage of a Disinformation Operation Against the 2019 EU Parliamentary Elections (F-Secure) — the visualization and research into botnet clusters is interesting.
- Knowledge Extraction from Unstructured Texts — an interesting rundown of approaches and papers.
- PiDP-11 Retro Computer Build (YouTube) — building and operating the PiDP-11.
Four short links: 27 May 2019
Better Figures, Neal Stephenson, Reputation Inflation, and Interactive Code
- Ten Simple Rules for Better Figures — A more accurate definition for scientific visualization would be a graphical interface between people and data. In this short article, we do not pretend to explain everything about this interface. […] Instead we aim to provide a basic set of rules to improve figure design and to explain some of the common pitfalls.
- Neal Stephenson Explains His Vision of the Digital Afterlife — I saw someone recently describe social media in its current state as a doomsday machine, and I think that’s not far off. We’ve turned over our perception of what’s real to algorithmically driven systems that are designed not to have humans in the loop, because if humans are in the loop they’re not scalable and if they’re not scalable they can’t make tons and tons of money. The result is the situation we see today where no one agrees on what factual reality is and everyone is driven in the direction of content that is “more engaging,” which almost always means that it’s more emotional, it’s less factually based, it’s less rational, and kind of destructive from a basic civics standpoint.
- Reputation Inflation — A solution to marketplace information asymmetries is to have trading partners publicly rate each other post-transaction. Many have shown that these ratings are effective; we show that their effectiveness deteriorates over time. The problem is that ratings are prone to inflation, with raters feeling pressure to leave “above average” ratings, which in turn pushes the average higher. This pressure stems from raters’ desire to not harm the rated seller. As the potential to harm is what makes ratings effective, reputation systems, as currently designed, sow the seeds of their own irrelevance. AAAAAAAAA++ article, would read again.
- Dal Segno — interactive code editor (the language is a bit like Scheme) that, when you change a function, rewinds until the last time that function was called. It’s like magic.
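The feedback loop in the Reputation Inflation abstract can be made concrete with a deliberately crude toy model. This model is ours, not the paper's. In it, every rater is reluctant to harm the seller, so each one rates slightly above the current average, capped at the scale's maximum. The hypothetical `nudge` parameter stands in for that social pressure.

```python
def simulate_ratings(n_raters, start=3.0, nudge=0.1, max_rating=5.0):
    """Toy model: each rater gives the current average plus `nudge`, capped at
    max_rating. Returns the running average after each rating."""
    ratings = [start]
    history = [start]
    for _ in range(n_raters):
        avg = sum(ratings) / len(ratings)
        ratings.append(min(max_rating, avg + nudge))
        history.append(sum(ratings) / len(ratings))
    return history

history = simulate_ratings(500)
print(f"after 10 ratings: {history[10]:.2f}, after 500: {history[-1]:.2f}")
```

Each new rating exceeds the running average, so the average can only climb. Once every seller drifts toward the top of the scale, ratings stop distinguishing good sellers from bad ones.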
Four short links: 24 May 2019
Forms by Configuration, GitHub Sponsors, SpaceX’s LEO Internet, and a Gallery of Programmer Interfaces
- ncform — a very nice way to develop forms by generating them from configuration.
- GitHub Sponsors — GitHub now lets people make donations to the developers whose work they use.
- Starlink — SpaceX is developing a low latency, broadband internet system to meet the needs of consumers across the globe. Enabled by a constellation of low Earth orbit satellites, Starlink will provide fast, reliable internet to populations with little or no connectivity, including those in rural communities and places where existing services are too expensive or unreliable.
- Gallery of Programmer Interfaces — These images bear witness to the passionate work of so many people striving to improve programming. So often the cobbler’s children are barefoot.
Four short links: 23 May 2019
Deep Fakes, GPU-Friendly Codec, Retro OS, and Production Readiness
- Few-Shot Adversarial Learning of Realistic Neural Talking Head Models — astonishing work, where you can essentially do deep-fakes from one or two photos. See the YouTube clip for amazing footage of it learning from historical photos and even a painting. (via Dmitry Ulyanov)
- Basis Universal GPU Texture Codec — open source codec for a super-compressed image file format that can be quickly transcoded to something ready for GPUs. See this Hacker News comment for a very readable explanation of why it’s important for game developers.
- Serenity — open source OS for x86 machines that looks like Unix with a Windows 98-style UI.
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction — We present a rubric as a set of 28 actionable tests, and offer a scoring system to measure how ready for production a given machine learning system is. With an implementation in Excel.
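The ML Test Score rubric's arithmetic is simple enough to sketch. Here is our reading of the paper, which you should check before relying on these details. Each of the 28 tests earns 0 points if not done, half a point if run manually with results documented and distributed, and a full point if automated to run repeatedly. Points are summed within each of four sections (data, model, ML infrastructure, monitoring), and the final score is the minimum section score. The status labels and function names below are our own.

```python
SECTIONS = ("data", "model", "infrastructure", "monitoring")
POINTS = {"not_done": 0.0, "manual": 0.5, "automated": 1.0}  # hypothetical status labels

def ml_test_score(results):
    """results: dict section -> list of per-test statuses (keys of POINTS).

    Returns (final_score, per_section_scores); the final score is the
    minimum over sections, so one neglected section caps the whole system.
    """
    per_section = {
        section: sum(POINTS[status] for status in results.get(section, []))
        for section in SECTIONS
    }
    return min(per_section.values()), per_section

score, detail = ml_test_score({
    "data": ["automated"] * 7,
    "model": ["manual"] * 7,
    "infrastructure": ["automated", "manual", "not_done"],
    "monitoring": ["manual", "automated", "automated"],
})
print(score, detail)
```

Taking the minimum rather than the sum is the interesting design choice. Excellent data tests cannot compensate for missing monitoring.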
Four short links: 22 May 2019
Software-Defined Memory, SQL Analyzer, Wolfram Engine, and Victims of Passion
- Software-Defined Memory in Warehouse-Scale Computers (ACM) — when you’re Google, you invent new types of memory. In this case, a cheaper “far memory” that is slower than DRAM but faster than Flash. Of course you do!
- ZetaSQL — Google’s SQL parser and analyzer. Cf. Apache Calcite. (via Hacker News)
- Wolfram Engine — a locally downloadable Wolfram Engine to put computational intelligence into your applications. The Free Wolfram Engine for Developers is available for pre-production software development.
- Love Your Job? Someone May be Taking Advantage of You (Duke) — people see it as more acceptable to make passionate employees do extra, unpaid, and more demeaning work than they did for employees without the same passion. Which goes some way to explaining why I’ve found passion to be strongly correlated with burnout.
Four short links: 21 May 2019
Computational Socioeconomics, AI on Code, AMP, and Social Media’s Effect on Adolescents
- Computational Socioeconomics — In this review, we will make a brief manifesto about a new interdisciplinary research field named Computational Socioeconomics, followed by a detailed introduction about data resources, computational tools, data-driven methods, theoretical models, and novel applications at multiple resolutions—including the quantification of global economic inequality and complexity, the map of regional industrial structure and urban perception, the estimation of individual socioeconomic status and demographic, and the real-time monitoring of emergent events.
- Microsoft Applying AI to Entire Developer Lifecycle — Microsoft looks at three different types of code when gathering data: source code—logic and markup (e.g., structure, logic, declarations, comments, variables), distinct learning from public, org, and personal repositories; metadata—interactions (e.g., pull requests, bugs/tickets, codeflow), telemetry (e.g., diagnostics for your app, profiling, etc.); and adjacent sources—documentation, tutorials, and samples; discussion forums (e.g., StackOverflow, Teams / Slack).
- Report from the AMP Advisory Committee Meeting — We heard, several times, that publishers don’t like AMP. They feel forced to use it because otherwise they don’t get into Google’s news carousel—right at the top of the search results.
- Social Media’s Enduring Effect on Adolescent Life Satisfaction (PNAS) — We found that social media use is not, in and of itself, a strong predictor of life satisfaction across the adolescent population. Instead, social media effects are nuanced, small at best, reciprocal over time, gender specific, and contingent on analytic methods.