Why Do Two People with the Same Job Earn Different Wages?
Why the Law of One Price Fails in Labor Markets?
The Law of One Price suggests that in competitive markets, identical goods should command identical prices. By analogy, one might expect that identical workers with same skills, education, and experience should earn identical wages, regardless of where they work.
But the empirical reality is starkly different. Two equally experienced workers with the same education and job title may earn vastly different wages depending on where they work. Why does the law fail in labor markets?
The failure of wage equalization reveals market frictions:
- Incomplete information: Workers do not observe all job offers or wage distributions.
- Limited mobility: Geography, family ties, or institutional constraints can limit movement.
- Firm heterogeneity: Employers differ in productivity, market power, and rent-sharing norms.
- Search and matching frictions: Job-finding is not instantaneous, and not all workers see the same set of firms.
Thus, we arrive at an important question in labor economics:
Are wage differences mainly about “who” the worker is, or “where” they work?
The AKM Framework
In their 1999 paper, High Wage Workers and High Wage Firms, John Abowd, Francis Kramarz, and David Margolis (hereafter AKM) formalized a method to decompose wages into:
- Person effect: what a worker earns across jobs due to their intrinsic attributes
- Firm effect: what a firm pays, regardless of who works there
Before the AKM framework, separating how much of a person’s wage was due to their individual skill versus the firm they worked for was a fundamental and notoriously difficult problem.
The primary obstacle is positive assortative matching, or the strong tendency for high-skilled workers to be employed at high-paying, highly productive firms. This matching is not random; it is the result of a two-sided search.
Workers seek out the best firms: Ambitious and talented individuals actively look for companies that offer higher wages, better benefits, and greater opportunities for growth.
Firms seek out the best workers: The most successful and profitable firms compete to recruit and retain the most skilled and productive talent to maintain their competitive edge.
This non-random sorting creates a major statistical hurdle. If you simply observe that “Firm A” pays higher average wages than “Firm B,” you cannot know the true cause. Is Firm A genuinely more generous or productive (a high firm effect), or has it simply managed to hire a more skilled workforce (a concentration of high person effects)?
With traditional data, such as a cross-sectional survey of workers and firms at a single point in time, these two forces are statistically entangled. Any attempt to measure the firm’s true pay premium would be contaminated by the unobserved skills of the workers it hired. This is a classic case of omitted variable bias, where “innate worker ability” is the omitted variable that biases the estimate of the firm’s effect on wages. Without the ability to track the same worker as they move between different firms, we were largely unable to resolve this puzzle.
Their innovation relied on matched employer-employee longitudinal data, allowing them to track millions of workers as they moved across firms over time.
The AKM Model: Decomposing the Wage Equation
The AKM model decomposes a worker’s wage into individual characteristics, a person fixed effect, a firm fixed effect, and an error term.
The model is specified by the following equation:
\[ y_{it} = \mathbf{x}_{it}'\beta + \theta_i + \psi_{J(i,t)} + \varepsilon_{it} \]
Each term is defined as follows:
- \(y_{it}\): The log wage of worker \(i\) at time \(t\). Taking the logarithm allows us to analyze the proportional changes in wages and has the statistical benefit of making the distribution more closely resemble a normal distribution.
- \(\mathbf{x}_{it}\): A vector of time-varying observable characteristics for worker \(i\) at time \(t\). This typically includes variables like potential experience (often as both experience and experience-squared), and other changing attributes. \(\beta\) is the vector of coefficients representing the returns to these characteristics.
- \(\theta_i\) (Person Fixed Effect): This is a core component of the model. It represents the unobserved, time-invariant characteristics unique to worker \(i\). It captures all the portable value a worker carries with them from job to job, such as innate talent, motivation, personal networks, and negotiation skills.
- \(\psi_{J(i,t)}\) (Firm Fixed Effect): The other key component. It represents the unique wage premium associated with firm \(j\) (where \(j = J(i,t)\)), which employs worker \(i\) at time \(t\). This effect is attributed to the firm’s productivity, pay policies, market power, union presence, and other factors that apply equally to all workers within that firm.
- \(\varepsilon_{it}\): A pure stochastic error term. It captures measurement errors and transitory wage shocks, such as a one-time bonus, that are not explained by the other factors. This error is assumed to be uncorrelated with all other variables in the model (\(E[\varepsilon_{it} | \mathbf{x}, \theta, \psi] = 0\)).
Because this model simultaneously controls for fixed effects at two distinct levels (person and firm), it is known as a Two-Way Fixed Effects Model.
Identification Problem: Skill or Good Workplace?
We have the model, but how can we possibly distinguish \(\theta_i\) from \(\psi_j\)? This is the core identification problem.
Consider a simple scenario. Worker Alice at Firm A earns more than Worker Bob at Firm B. What is the source of this difference?
- The Person Effect Hypothesis: Alice is more skilled and productive than Bob (a higher \(\theta_{Alice}\)).
- The Firm Effect Hypothesis: Firm A is a better-paying company than Firm B (a higher \(\psi_A\)).
If no one ever changes jobs, it is impossible to distinguish between these two effects. We cannot statistically separate whether a worker’s high wage is due to their inherent ability or the generosity of their employer. It’s like having a single equation \(x+y=10\) and trying to solve for both \(x\) and \(y\).
The Solution: Worker Mobility
The genius of AKM was to find the key to identification in worker mobility. By tracking wage changes when workers move between firms, we can begin to isolate the two effects.
- Suppose a worker \(i\) moves from Firm A to Firm B.
- The worker’s person effect, \(\theta_i\), is “portable” and is assumed to remain constant across jobs.
- Therefore, the change in the worker’s wage (\(\Delta y_i\)) primarily reflects the difference in the firm effects (\(\psi_B - \psi_A\)).
\[ \Delta y_i = y_{i,t_2} - y_{i,t_1} = (\theta_i + \psi_B) - (\theta_i + \psi_A) + \dots = (\psi_B - \psi_A) + \dots \]
By observing millions of such moves across different firms, we obtain a vast set of relative firm effect differences, like \((\psi_j - \psi_k)\).
The Connected Set
By piecing together these relative differences like a puzzle, we can eventually align the wage premiums (\(\psi_j\)) of all firms on a common scale. However, this is only possible if worker moves form one large, connected mobility set.
For instance, if Alice moves from Firm A to B, and Charles moves from Firm B to C, we can now link all three firms and compare the relative pay premiums of A, B, and C. As long as these chains of moves are sufficiently widespread throughout the economy, the effects for most firms and workers can be identified. In the French dataset analyzed by AKM, over 88.5% of the labor force was part of this single connected network (p.268).
Estimation Strategy: Solving a Massive Problem
The Computational Challenge
While theoretically elegant, estimating the AKM model presents a formidable computational hurdle. The model requires estimating fixed effects for millions of individuals (\(\theta_i\)) and hundreds of thousands of firms (\(\psi_j\)) simultaneously. This is equivalent to a regression with millions of dummy variables, making standard Ordinary Least Squares (OLS) computationally infeasible due to the immense size of the matrix (\(X'X\)) that needs to be inverted.
The Practical Solution: Step-wise Residual Regression (An Application of the Frisch-Waugh-Lovell Theorem)
To overcome this, AKM employed efficient computational methods that break the problem down into manageable steps, applying the principles of the Frisch-Waugh-Lovell theorem. A common approach is as follows:
Step 1: Purging the Person Effect For each worker \(i\), calculate the mean of all variables over time (e.g., \(\bar{y}_i, \bar{\mathbf{x}}_i, \bar{\psi}_i\)). Then, subtract these person-specific means from the original wage equation. This is known as the within-transformation. \[ y_{it} - \bar{y}_i = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\beta + (\psi_{J(i,t)} - \bar{\psi}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \] This process mathematically eliminates the person fixed effect \(\theta_i\) from the equation, leaving us with the task of estimating the firm effects \(\psi_j\).
Step 2: Estimating the Firm Effect Using the equation from which the person effects have been removed, one can now estimate \(\beta\) and \(\psi_j\). While still computationally intensive, this reduced problem can be solved using various iterative algorithms.
AKM noted that because person effects explain substantially more of the variance in wages than firm effects, controlling for them first improves the statistical precision of the estimates.
Key Findings from AKM (1999) and Their Implications
1. Person Effects Dominate Wage Variation
- The standard deviation of person fixed effects (\(\theta_i\)) is substantially larger (more than twice as large) than that of firm fixed effects (\(\psi_j\)).
- This means that most of the inequality in wages comes from differences in workers’ portable skills and abilities, while differences in firm pay policies are a secondary factor.
Implication: Policies aimed at reducing wage inequality might be more effective if they focus on education and skill-building (enhancing \(\theta_i\)) rather than solely on regulating firm pay gaps.
2. Inter-Industry Wage Gaps Are About Sorting, Not Firms
The reason high-wage industries (e.g., finance, tech) pay more is not because the firms in them are uniquely generous, but because they sort and hire workers who are already highly skilled (have high \(\theta_i\)).
- Approximately 90% of the wage differential between industries is explained by this sorting of workers.
- In other words, workers in high-paying industries would likely earn above-average wages no matter where they worked.
Implication: A policy to subsidize a specific industry may not benefit low-wage workers as much as intended. The benefits could be disproportionately captured by the high-skilled workers the industry attracts.
4. High-Paying Firms Are Genuinely More Productive
Firms with high firm effects (\(\psi_j\)), those that pay workers more than their estimated market value, also tend to be more productive and more profitable.
- This finding provides strong support for efficiency wage theories (where firms pay above-market wages to motivate workers and reduce turnover) and rent-sharing theories (where firms share their excess profits, or “rents,” with their employees).
Implication: Paying high wages is not just a cost; it can be a strategic decision correlated with superior firm performance.
Conclusion: The Legacy of the AKM Model
Unfortunately, some of AKM’s early empirical findings were later shown to be influenced by approximation errors in solving the high-dimensional least squares problem, as highlighted in follow-up studies (Abowd, Creecy, and Kramarz 2002; Abowd, Lengermann, and McKinney 2003).
Nevertheless, by cleanly separating “who you are” from “where you work,” the AKM model revolutionized the empirical study of wage determination. The framework has become a foundational tool in countless subsequent studies.
- Gender Wage Gap Research: Used to decompose the gap into differences in sorting (men and women working in different types of firms) versus differences in pay within the same firm.
- Labor Market Power: Used to measure the extent of rent-sharing and analyze how the bargaining power between firms and workers has evolved.
- Analysis of Trade Shocks: Used to see if workers displaced from import-competing industries are re-employed at firms with lower wage premiums (\(\psi_j\)).
- Inequality Dynamics: Used to explore whether a worker’s wage growth over their career is due to personal skill growth or to moving to better firms.
The AKM model provided a powerful microscope to answer fundamental economic questions with complex labor market data, and it remains one of the most important tools in labor economics today.
More broadly, AKM marked a turning point in labor economics by treating worker and firm fixed effects not as nuisance parameters but as core parameters of interest. This shift led to the widespread adoption of high-dimensional fixed effects regressions in matched employer-employee data. Rather than relying solely on firm observables like size or sector, economists began using large-scale administrative data to identify which firms systematically pay more or less, and then relate these patterns to observables in a second step.
Reference
Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333.