The 174th Block: Language, under-represented language, and AI
Read to the end for a close encounter of the athletic kind
This week…
Your reading time is about 6 minutes. Let’s start.

I'd like to recommend InfoEpi Lab. Headed by E. Rosalie Li, the newsletter sits at the intersection of public health, national security, information disorder, and countering malign influence. Several stories shared in this edition of The Starting Block come from InfoEpi Lab’s Lab Notes, Issue No. 1.
And now, a selection of top stories on my radar, a few personal recommendations, and the chart of the week.
ICYMI: The Previous Block covered stories from Southeast Asia. CORRECTION NOTICE: None notified. That said, my apologies for the newsletter going out hours ahead of its usual time: instead of scheduling it for later, I clicked post immediately.
Hong Kong’s new public enemy: the Cantonese language
Mary Hui for Quartz:
Hong Kong’s national security police has put opposition politicians behind bars, chased activists into exile and threatened them with bounties, atomized civil society, and decimated the Hong Kong independent media. Now, it has a new target: the Cantonese language.
Gongjyuhok, a Hong Kong advocacy group that promotes the use of Cantonese, announced on Monday (Aug. 28) it is shutting down after national security police last week entered the founder’s former home, where his relatives now live. The group—whose name translates to “Cantonese study”—was founded in 2013 with the mission of “protecting the language rights of Hong Kong people.”
In a statement (link in Chinese), Gongjyuhok founder Andrew Chan said authorities conducted a warrantless search of the home and accused the group of violating Hong Kong’s national security law by publishing a fictional story.
In an email to Quartz, Chan confirmed that the story in question is “Our Time,” by an author named Siu Gaa. It was one of 18 shortlisted entries in a 2020 writing competition hosted by Gongjyuhok and sponsored by the Hong Kong government. Citing legal pressures, Chan took down the story from the Gongjyuhok website, but an archived version (link in Chinese; translation here) is still available.
“The struggle between man and totalitarianism is the struggle between memory and forgetting.” Uff.
Scale AI is on a hiring spree for speakers of under-represented languages
Andrew Deck for Rest of World:
Scale AI, one of Silicon Valley’s most prominent training data companies, is currently hiring for nearly 60 contract writer roles across dozens of languages. Each job listing claims the work is for a project to train “generative artificial intelligence models to become better writers.” The languages include Hausa, Punjabi, Thai, Lithuanian, Persian, Xhosa, Catalan, and Zulu, among many others. Six job postings, under the category “experts,” are looking to hire writers specifically for regional South Asian languages, including Kannada, Gujarati, Urdu, and Telugu.
There are significant pay disparities between the languages, with Western languages commanding as much as 15 times more than those from the Global South. For example, the job posting for German writers pays $21.55 per hour, compared to a posting for an expert in Telugu that offers just $1.43 per hour.
This surprises absolutely no one.
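For anyone who wants to check the "15 times" figure, here is a quick sketch using only the two hourly rates quoted in the excerpt above. The variable names are mine, and the comparison covers just this one pair of postings, not the full set of listings.

```python
# Hourly rates (USD) as quoted in the Rest of World excerpt.
german_rate = 21.55   # posting for German writers
telugu_rate = 1.43    # posting for a Telugu expert

# How many times larger the German rate is than the Telugu rate.
ratio = german_rate / telugu_rate

print(f"German rate is about {ratio:.1f}x the Telugu rate")
```

The ratio comes out at roughly 15, which matches the "as much as 15 times" claim for this pair.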
VG launches true crime series, uses AI to animate re-enactments of actual events

Håvard Kristoffersen Hansen and Martin Frogner on INMA:
These old cases have a very limited video supply. How could we effectively portray a robbery and a cocaine smuggling operation with such limited visual resources?
At first, we started using traditional reconstruction with actors, which are commonly used in many true crime series. We were about to hire an animator, but then we discovered AI animation.
Instead of using traditional animation, we trained a model with reference images. Then we used video we filmed and generated ourselves, which in turn generates the animations that illustrate the definable events in the criminals’ lives. We found different styles we liked, then used the AI-generator programmes Stable Diffusion and Midjourney to create hundreds of images. These went into a dataset that was then used to create a LoRA (LoRA models apply tiny changes to a standard checkpoint; several can be used at the same time to guide the AI to a preferred style) for use in Stable Diffusion.
This meant we could generate images in the specific style with simple prompts (the text describing what you want AI to generate). We experimented with several checkpoints before we settled with one that gave us a good result.
At first, we had hoped to only use text-to-video (a prompt that describes what you want to generate video of). The plan was to create all our AI footage from text prompts, but due to time constraints we opted to use some stock footage and filmed some sequences to help guide the AI to where we wanted it.
Some of the guiding videos were created with text-to-video with the AI generator programme RunwayML’s Gen-2, which allowed us to write “people at airport,” for instance, and the programme created a video sequence that showed that. You can also upload a reference photo so Gen-2 will understand what kind of style you prefer.
Someone explain the fixation on true crime, which seems to have a global reach.
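For the curious, the parenthetical about LoRA in the excerpt above ("tiny changes to a standard checkpoint; several can be used at the same time") can be illustrated with a toy sketch. This is not VG's pipeline and not Stable Diffusion code. It assumes the standard LoRA formulation, where each adapter contributes a scaled low-rank product B·A that is added to a frozen base weight matrix. The helper names `matmul` and `apply_loras` and the tiny 2x2 matrices are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_loras(base, loras):
    """Return base + sum(scale * (B @ A)) over all (B, A, scale) adapters.

    The base weights are copied, not modified, mirroring how a LoRA
    leaves the original checkpoint untouched.
    """
    merged = [row[:] for row in base]
    for b, a, scale in loras:
        delta = matmul(b, a)  # low-rank update, same shape as base
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Hypothetical 2x2 "checkpoint" weights.
base = [[1.0, 0.0],
        [0.0, 1.0]]

# Two hypothetical rank-1 adapters: B is 2x1, A is 1x2, plus a strength.
style_a = ([[1.0], [0.0]], [[0.0, 0.5]], 1.0)
style_b = ([[0.0], [1.0]], [[0.5, 0.0]], 0.5)

merged = apply_loras(base, [style_a, style_b])
print(merged)
```

Each adapter stores only the small B and A factors, which is why a LoRA file is tiny next to the checkpoint it modifies, and why several can be stacked with different strengths.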
What I read, listen to, and watch…
I’m reading The Oversight Board’s decision to overturn Meta’s decision to leave up a video on Facebook in which Cambodian Prime Minister Hun Sen threatens his political opponents with violence.
I’m listening to CBC’s Spark “Butterfly Effect” episode on the audio recorder. A pretty homage.
I’m watching a real diplomat review Netflix’s “The Diplomat.” I think I’ll give the show a go then.
Other curious links:
“With five old phones and some Pew data, the BBC’s Marianna Spring monitors social media from the inside” by Sophie Culpepper for Nieman Lab.
“How folk remedies can fuel misinformation” by Katrine K. Donois and Hassan Vally for The Conversation.
“Americans aren’t sure what’s true in this health misinformation age” by Darius Tahir for Poynter.
“To report fully on climate change, journalists need to integrate Indigenous knowledge into their coverage” by Jennifer Thornhill Verma for Reuters Institute.
“Elon’s Twitter is coming for your biometric data and employment history” by Kyle Barr for Gizmodo.
“Motivos para recuperar el optimismo tecnológico. Razón, aquí” ($) by Jaime Rubio Hancock for El País.
« La salive, entre désir et dégoût » ($) by Maïa Mazaurette for Le Monde.
Chart of the week
In a Reuters exclusive, Katie Paul and Steve Scherer show that Meta’s Canada news ban fails to dent Facebook usage:

And one more thing
Gannett suspends AI-powered local sports reporting because “each formulaic article is riddled with laughably vague statements,” reports Maggie Harrison for Futurism. Several reports feature the phrase “close encounter of the athletic kind.”
Imagine parents scouring the papers for articles about their kids’ games to print or cut out (as proud parents do) and not finding their kids’ names, only box scores written out as full but strangely phrased sentences.