Say cheese! Your Flickr photos are being used in face rec datasets 📸

And an easy way to save you from downloading malware with deep learning

Oct 18, 2019

Hi hi,

I sort of hate “A Deep Dive into AI.” I like the nod to deep learning, but this isn’t for me. Was trying to sort out something with AI and Ada Lovelace. Who has something better?

Story of the Week

If you uploaded photos that show any faces to Flickr in the past, your snaps might be in MegaFace, a dataset of 700,000 pictures that researchers scraped from the site. The dataset doesn’t include names, but it does keep an identifying number that makes it easy to trace each photo back to the Flickr account, which The New York Times does in this story. The investigation is a pretty extensive look at a bunch of issues, covering how Illinois residents have more rights than most of us because of a 2008 biometric law, the global impact of datasets like these, and how all the images of children on Flickr are especially useful for training algorithms, which have historically had issues recognizing kiddos.

I have loads of photos on Flickr. One was a public, pre-Instagram account to show off my camera skills (which, if I may say so myself, still holds up!). I created another username to back up some non-curated photos around 2007. I rediscovered that account a few years ago, finding some wildly inappropriate images that I would never upload to any cloud service in 2019. I can’t imagine I’m the only one. I also can’t imagine what the algos think of some of it (or the researcher who had to throw some of these NSFW pictures out). Above is one of the shots of me that I hope made it in and confused MegaFace.

More News

A very dexterous robotic hand can now solve a Rubik’s Cube. More nimbleness means robots can be put to work doing a lot more. Here is a non-technical write-up explaining the feat and its importance.

With a little big sky thinking, you can shoot out 95 percent of a rocket with AI-powered 3D-printers.

AI can make pulling hydrocarbons out of the ground more lucrative. 🙃 🙃 🙃

Patents for AI, how does that work? Answer: we don’t know, and it will probably take a huge lawsuit to sort it out!

AI will narc on Domino’s pizza makers if their pies are “bad” with a blaring warning and other sorts of tattling.

Pinterest wants to open the black-box algorithms that suggest pins and boards to users. Great effort from the social media company that least needs to explain the tech behind its site.

An algo can spot fights between partners before they happen and step in between as a prescient marital counselor.

Amazon wants police departments already using footage from Ring, its line of home surveillance cameras, to consider adding face recognition tech to their repertoire.

Not a novel idea, but a well-written essay about how and why researchers are turning to how babies and toddlers learn to guide the next wave of AI.

How cities and governments are—and aren’t—regulating face recognition tech. Here is a video diving into why you should care that you are likely in a “perpetual lineup.”

Old cellphones and audio algorithms are listening for illegal logging. Stopping it, however, will take more than tech.

PyTorch vs TensorFlow: an analysis and thorough breakdown of which framework is on top.

Prospecting arXiv

Bad actors aren’t really using sophisticated algorithms to break into your computer. They don’t have to innovate since we fall for email phishing attempts and download files we shouldn’t without much encouragement. But machine learning might be able to save us from ourselves a bit. A new paper with the cutesy title “Would a File by Any Other Name Seem as Malicious?” demonstrates how convolutional neural networks can spot potential malware just by a file name.

The researchers point out that current methods, which look for malware by checking the data of every file, are laborious, and that sampling doesn’t always catch the bad files (which normal files outnumber 80 to 1). However, their method of looking at file names, a much simpler computational task, is 99 percent accurate. They list caveats, including that the dataset they used is considered “easy” and that motivated attackers could easily avoid this system with some simple renaming, but suggest that in resource-strapped situations or niche use cases, this check could be an easy layer to add to malware detection.

TL;DR: Don’t click on files with names like Sennepsfabrikkernes0, adobe_epic.dll, or servicess.
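For the curious, here is a rough sketch of what “looking at file names” could mean in practice. This is my own illustration, not the paper’s code: a convolutional network can’t read text directly, so a common first step is to turn each name into a fixed-length list of character ids, which the network’s filters then slide over a few characters at a time. The vocabulary, the 32-character limit, the window width of 3, and the function names encode_filename and windows are all assumptions I made up for the example.

```python
import string

# Assumed vocabulary: printable ASCII characters.
# Id 0 is reserved for padding, id 1 for any character outside the vocabulary.
PAD, UNK = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_filename(name, max_len=32):
    """Map a file name to a fixed-length list of integer character ids.

    Names longer than max_len are truncated; shorter ones are padded with PAD.
    """
    ids = [VOCAB.get(ch, UNK) for ch in name[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def windows(ids, width=3):
    """Return every run of `width` consecutive ids.

    These are the slices a 1D convolutional filter of that width would see.
    """
    return [ids[i:i + width] for i in range(len(ids) - width + 1)]


if __name__ == "__main__":
    for name in ["Sennepsfabrikkernes0", "adobe_epic.dll", "servicess"]:
        encoded = encode_filename(name)
        print(name, len(encoded), len(windows(encoded)))
```

A real detector would feed these id lists into a trained network. The point here is only that the input is tiny compared with scanning a file’s full contents, which is why the check is cheap.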

Datapoint of the week

This week, Berkeley became the fourth U.S. city to ban government use of face rec tech.

Quote

“So automation continues to unfold, piecemeal, at companies of every size and stripe. After each micro-automation event within a company, employees are forced out. Some workers are terminated, some quit. Now imagine this happening tens of thousands—even millions—of times over the course of a decade, at varying intervals and varying times of economic stability.”

“There's an Automation Crisis Underway Right Now, It's Just Mostly Invisible” in Gizmodo

***

Until robot hands are spry enough to work this kink out of my neck,

Jackie

The Machinocene Review
