On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
One of the earliest and most influential critiques of large language models. Much of it still reads as current; unfortunately, five years later, the problems it raised remain unsolved.
Most LLM companies still refuse to disclose their environmental impact and energy usage, even in the latest model cards and safety reports (Gemini's included). The paper cites a striking figure:
While the average human is responsible for an estimated 5t of CO2 per year, training a Transformer (big) model [136] with neural architecture search was estimated to emit 284t of CO2.
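To put those two figures side by side, here is a minimal sketch (using only the numbers quoted above, rounded to the nearest ton) of how many person-years of emissions one such training run represents:

```python
# Figures quoted in the passage above (assumptions, not new measurements):
# - average human: ~5 t CO2 per year
# - one Transformer (big) training run with neural architecture search: ~284 t CO2
HUMAN_TONNES_PER_YEAR = 5
NAS_TRAINING_TONNES = 284

# One training run expressed in person-years of emissions
years_equivalent = NAS_TRAINING_TONNES / HUMAN_TONNES_PER_YEAR
print(f"One NAS training run ~= {years_equivalent:.1f} years of one person's emissions")
# prints: One NAS training run ~= 56.8 years of one person's emissions
```

Roughly 57 person-years of emissions for a single training run, which is why the paper treats reporting these costs as a baseline obligation rather than an optional disclosure.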
The paper's proposals for better curation and documentation by LLM companies themselves are interesting but somewhat idealistic. Even newly released models today ship without precise, clear model cards and instructions, and in a fast-moving, competitive AI era that prizes accuracy and latency, voluntary documentation is unlikely to win out.
