Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations

Existing research often posits spurious features as "easier" to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored. Moreover, existing work mainly focuses on the end performance instead of …

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

Large language models have the ability to generate text that mimics patterns in their inputs. We introduce a simple Markov Chain sequence modeling task in order to study how this in-context learning (ICL) capability emerges. In our setting, each …