How Stress Affects Performance and Competitiveness across Gender

Since many key career events, such as exams and interviews, involve competition and stress, gender differences in response to these factors could help to explain the labor-market gender gap. In a laboratory experiment, we manipulate psychosocial stress using the Trier Social Stress Test, and confirm that this is effective by measuring salivary cortisol. Subjects perform a real-effort task under both tournament and piece-rate incentives and we elicit willingness to compete. We find that women under heightened stress do worse than women in the control group when compensated with tournament incentives, while there is no treatment difference for performance under piece-rate incentives. For males, stress does not affect output under competition. We also find that stress decreases willingness to compete overall, and for women, this is related to performance. These results help to explain previous findings on gender differences in performance under competition both in and out of the lab.


Introduction
Since students, job candidates and workers are often required to compete against peers in stressful settings, understanding differences in the way that men and women respond to competition under stress is vital in explaining the persistent gender gap in the labor market, and especially the under-representation of women in top positions in business, government, and academia. The events that are most influential on one's career, such as job interviews, university entrance exams and asking for promotion, involve competition, and in high-paying and influential positions both pecuniary compensation and prestige are often based heavily on one's performance relative to others. These career-determining interactions typically take place under heightened psychosocial stress, for example when job candidates or students are required to speak publicly and are judged in front of committees. We study how stress and competition affect men and women differently, by experimentally manipulating exposure to psychosocial stress and subsequently measuring performance in a real-effort task, under both competitive and non-competitive incentive schemes. We also test whether exposure to stress affects willingness to compete, and whether this differs by gender.
Previous research shows that women are less likely to enter competitive situations than men (Gneezy et al., 2003;Vesterlund, 2007, 2011), and this has been highlighted as a potential explanation for the female wage gap. Lower willingness to compete could make women less likely to enter competitive fields than men with similar ability. Experimental measures of willingness to compete can, for example, partially explain female students' choices to enter less prestigious academic tracks (Buser et al., 2014). A substantial part of the gender wage gap is due not to choice of profession, but to quality of employers and advancement within fields (Card et al., 2016;Cardoso et al., 2016). This is plausibly related to lower willingness to compete, which could also make women less likely to ask for promotions or to apply for jobs that have a competitive application process or competitive compensation scheme (Flory et al., 2015).
In addition to preferences for competing, differences in performance under competition for men and women may also explain labor market outcomes. So far, the evidence is mixed. While Niederle and Vesterlund (2007) show that tournament incentives led to higher performance for both men and women, Gneezy et al. (2003) and Gneezy and Rustichini (2004) find that competitive incentives only increase performance among men. Using data from university exams, Ors et al. (2013) find that female students perform comparatively worse when competing against peers, and similarly Jurajda and Munich (2011) find that women do worse than their male counterparts on entrance exams only when applying to more competitive programs.
Recent work shows that stress affects decision making and preferences (Starcke and Brand, 2012;von Dawans et al., 2012;Cahlíková and Cingl, 2017) and given this, reaction to stressors might also shed light on gender differences in competitive behavior. The physiological and psychological aspects of the stress response can differ, depending on the type of stressor and the gender of the individual (Stroud et al., 2002). While "fight or flight" (Cannon, 1932) is the dominant model for understanding how humans (and other animals) respond to a perceived threat, females, who are less physically adapted to fight off foes and less mobile when caring for offspring, may have evolved a tendency to "tend-and-befriend," by leveraging affiliation with social groups to avoid danger in some situations (Taylor et al., 2000;Taylor, 2006). We hypothesize that this difference in stress response between men and women could lead to gender differences in how stress affects performance under competition and willingness to compete; we designed the study to test this conjecture.
Specifically, we examine the effects of psychosocial stress on performance under tournament incentives and willingness to compete. 1 Psychosocial stress is related to psychological and social well-being, for example when an individual faces social pressure or threats to status or self-image, and is arguably more relevant for modern daily life than other forms of stress, such as physical stress, in which individuals face physical discomfort or threats to their survival (Dickerson and Kemeny, 2004).
Our design consists of an economic experiment with 95 male and 95 female university students, and uses a modified version of the Trier Social Stress Test for Groups (TSST-G) to manipulate stress (von Dawans et al., 2011). Subjects were assigned to either the stress or control treatment for the duration of the experiment. We measure heart rate and salivary cortisol, a hormone related to stress, 2 in order to confirm that the stress manipulation was successful for both genders.
Using a laboratory experiment solves two problems that make causal inference of the effect of stress on behavior in naturally occurring competitions difficult. First, it avoids problems of self-selection into competitive and stressful situations. Second, since competitive situations can themselves cause stress (Buckert et al., 2015; Fletcher et al., 2008), it is difficult to isolate the effects of stress and competition from one another in observational data.
1 In particular, we study acute (short-term) stress, which has been shown to have psychological, neurological and behavioral effects distinct from those of long-term stress (McEwen, 2012). We concentrate on the former, which is more relevant for the labor market events that motivate this study.
2 While the physiological effects of stress are complex, cortisol is released into the bloodstream at greater levels after exposure to stressors, and because salivary cortisol is easy to measure, it is the most commonly used biomarker for measuring the physiological stress response (Hellhammer et al., 2009; Everly and Lating, 2013).
The experiment measures the change in performance and willingness to compete under stress using a design based on Niederle and Vesterlund (2007). Subjects were compensated for adding up sets of four, two-digit numbers within a time limit. The payment scheme varied by condition: in the baseline condition, each correctly solved problem was rewarded with a fixed, piece-rate payment. This condition was repeated after the stress/control procedure to reveal the effect of stress on individual performance. Subjects also completed the task under a tournament incentive scheme, in which payoff depends on performance relative to another randomly selected participant. Next, subjects chose a linear combination of the piece-rate and tournament payment schemes, which is our measure of willingness to compete, after Gneezy et al. (2016). Subjects then performed the counting task again and were rewarded according to their choice. Afterwards, subjects made additional competition decisions that allow us to draw more precise conclusions regarding the underlying mechanism through which stress affects willingness to compete. We are able to rule out several channels, including risk aversion and confidence.
Our results show that stress indeed has a gender-specific effect on reaction to competition. Women in the stress treatment perform significantly worse in the tournament than do women in the control group. Interestingly, stress alone does not affect performance, and we find no treatment difference for female subjects' performance in the piece-rate rounds. Rather, the treatment difference in tournament performance is due to an increase in performance for female subjects in the control group relative to that under piece-rate payment, whereas women in the stress treatment actually perform slightly worse in the tournament. In contrast, men's performance under competition is not affected by stress: we find no statistically significant treatment difference in performance in the tournament. We also find that willingness to compete is lower in the stress treatment, overall. For women, this can be explained by differences in tournament performance across treatments. This is consistent with a theory from the psychology literature that predicts lowered executive function with increased arousal, past a certain threshold (Yerkes and Dodson, 1908). Stress could interact with additional sources of strain on executive function that are present in the tournament and to which women might be more sensitive, such as higher stakes (Ariely et al., 2009; Azmat et al., 2016), social comparison (Schram et al., 2015), or stereotype threat (Schmader et al., 2008).
While our paper is the first of which we are aware to study the effect of stress on performance under competition, it contributes to an emerging literature on the causal link between stress and willingness to compete. Goette et al. (2015) induce psychosocial stress, using the TSST-G procedure, and find no average effect on competitiveness. However, they measure only decisions to compete based on the results of past performance; in this study, we replicate this (non)result. Buser et al. (2016) use an experimental game similar to that in our study to measure willingness to compete, but examine the effects of a physical stressor (putting a hand in ice-cold water), which likely has different physical and psychological effects than psychosocial stress (Baum and Grunberg, 1997; Haushofer and Jang, 2015), and may be the reason that, in contrast to our study, they fail to find an impact on willingness to compete. We argue that our procedure produces a type of stress that is more relevant to labor market outcomes, as psychosocial stress is ubiquitous in professional settings, but physical stress is comparatively rare. 3
Our results imply that women may be at a disadvantage when required to compete in stressful settings. This has broader implications for understanding how men and women approach competition, and adds to the discussion on the persistent under-representation of women in prestigious industries, high-paid jobs and leadership positions in politics and business. These careers involve both intense competition and stress. If women know that they do not perform well in these types of environments, they may decide to stay out. When hiring practices involve more competition or stress than the position itself, our results suggest that this could prevent employers from selecting women who are best suited for the job. If managers introduce competitive incentives as a means of boosting employee productivity, this may have the opposite effect for women in the presence of heightened psychosocial stress.

Design
All subjects completed several incentivized tasks which measure performance under piece-rate and tournament incentive schemes and willingness to compete. Our experimental manipulation consists of two treatments applied between-subjects: the stress treatment, in which subjects were exposed to a psychosocial stressor in the form of the TSST-G, and a control treatment. For a timeline of all tasks in the experiment, the stress/control procedures and cortisol measurements, see Figure 1.

A. Experimental tasks
We measure competitiveness using a design based on Niederle and Vesterlund (2007) and Gneezy et al. (2016). Subjects completed a counting activity, twice under a noncompetitive piece-rate scheme, then again under tournament incentives, after which they were asked which combination of these compensation schemes they preferred for the subsequent counting round.
The counting activity consisted of a series of addition problems, requiring subjects to add up four two-digit numbers in each. They had two minutes per task to solve as many problems as they were able to. Subjects familiarized themselves with the counting activity in an unpaid practice round; in following rounds correct results were incentivized according to two compensation schemes.
Under the piece-rate compensation scheme, participants earned CZK 25 (about EUR 1) per correct answer. Performance under the piece-rate scheme serves as a baseline measure of ability and effort. Subjects performed twice under the piece-rate compensation scheme: once before the stress treatment/control procedure (Task 1, Piece rate before treatment) and once after (Task 2, Piece rate under treatment). Comparing the within-subject differences in performance in Task 1 and Task 2 across treatments allows us to measure the effect of the stress treatment on performance, controlling for baseline differences in ability.
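The comparison described above can be sketched as a difference-in-differences of group means. The notation is introduced here for illustration only and is not taken from the original text: $y_{i,t}$ is the number of correct answers of subject $i$ in Task $t$, and $S$ and $C$ are the sets of subjects in the stress and control treatments.

```latex
\hat{\tau}_{\text{piece-rate}}
  = \frac{1}{|S|}\sum_{i \in S}\bigl(y_{i,2} - y_{i,1}\bigr)
  - \frac{1}{|C|}\sum_{i \in C}\bigl(y_{i,2} - y_{i,1}\bigr)
```

Differencing within subject removes each subject's baseline ability, so that what remains is the change attributable to the stress procedure relative to the control procedure.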
In Task 3, Tournament under treatment, correct answers were rewarded according to the tournament compensation scheme: each participant was informed that he or she would be randomly matched with another participant in the room (there were always four males and four females present) and that whoever had more correct answers would receive CZK 50 per correct answer, while the subject with fewer correct answers would receive nothing in that task. In case of a tie, each participant received CZK 25 per correct answer, as in the piece-rate scheme. Comparing performance in Task 3 across treatments and with individuals' performance in Tasks 1 and 2 allows us to assess how competitive incentives affect performance, and whether this changes by treatment and gender.
In Task 4, Choice of compensation scheme for future performance, subjects chose how they would be compensated before completing the counting portion of the task. They did so by splitting 100 points between the tournament and the piece-rate compensation schemes, as in Gneezy et al. (2016). For each point invested in the piece-rate scheme, they earned CZK 0.25 per correct answer. For each point invested in the tournament compensation scheme, they earned CZK 0.5 per correct answer, but only if they had more correct answers in Task 4 than another randomly selected participant, and received nothing for each point invested in the tournament scheme if they had fewer correct answers. In case of a tie, each point invested in the tournament account was rewarded according to the piece-rate scheme (CZK 0.25 per answer). Thus, if subjects invested all points in the piece-rate scheme, they were paid CZK 25 per correct answer, as in Task 1 and Task 2. If all points were invested in the tournament scheme, they received CZK 50 per correct answer if they had more correct answers than a randomly matched partner, CZK 0 if they had fewer and CZK 25 in the event of a tie, as in Task 3. If they invested some points in the tournament scheme and some in the piece-rate scheme, they were paid according to a linear combination of the two compensation schemes. In order to make the decision easily understandable, before making their final choice subjects could experiment with different tournament investments, and the resulting payoffs per correct answer if they won and if they lost were displayed.
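The linear combination of the two schemes can be illustrated with a minimal sketch. The function name and the outcome labels are hypothetical and chosen for illustration; only the per-point rates (CZK 0.25 for the piece rate, and CZK 0.5, 0.25 or 0 for the tournament depending on the outcome) come from the description above.

```python
# Per-point rates in CZK per correct answer, as described for Task 4.
PIECE_RATE_PER_POINT = 0.25
TOURNAMENT_RATE_PER_POINT = {"win": 0.50, "tie": 0.25, "lose": 0.00}


def payoff_per_correct_answer(tournament_points, outcome):
    """Return the CZK earned per correct answer in Task 4.

    tournament_points: points (0 to 100) invested in the tournament scheme;
        the remaining points go to the piece-rate scheme.
    outcome: "win", "tie" or "lose" against the matched participant
        (hypothetical labels used only in this sketch).
    """
    if not 0 <= tournament_points <= 100:
        raise ValueError("tournament_points must be between 0 and 100")
    if outcome not in TOURNAMENT_RATE_PER_POINT:
        raise ValueError("outcome must be 'win', 'tie' or 'lose'")
    piece_rate_points = 100 - tournament_points
    return (PIECE_RATE_PER_POINT * piece_rate_points
            + TOURNAMENT_RATE_PER_POINT[outcome] * tournament_points)


if __name__ == "__main__":
    # Show the payoff per correct answer for a few illustrative investments.
    for points in (0, 40, 100):
        for outcome in ("win", "tie", "lose"):
            print(points, outcome, payoff_per_correct_answer(points, outcome))
```

Investing 0 points reproduces the piece rate of Tasks 1 and 2 regardless of the outcome, and investing all 100 points reproduces the tournament of Task 3; intermediate investments interpolate between the two.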
It is important to note that the choice of compensation scheme in Task 4 cannot be driven by pro-social concerns or beliefs about who self-selects into the tournament, as a subject's performance in Task 4 was compared to the Task 3 performance of another randomly selected subject. This information was highlighted in the instructions, and subjects knew that their decision to enter the tournament did not have payoff consequences for anyone else.
The choice of compensation scheme in Task 4 is our main measure of willingness to compete. To estimate the causal effect of stress, we compare the share of the 100 points invested in the tournament in Task 4 across the stress and control treatments. To determine the underlying mechanism, we implemented two additional tasks, in which subjects competed on past performance. This isolates preferences and beliefs related to performing in a competitive environment (which are relevant in Task 4 but not in Tasks 5-6) from willingness to compete and other beliefs and preferences that are present in all three (Niederle and Vesterlund, 2007).
In Task 5, Choice of compensation scheme for past performance before treatment, subjects again split 100 points between the tournament and piece-rate schemes, but were paid according to performance in Task 1. Subjects were reminded that Task 1 was incentivized with the piece-rate scheme and that it took place in the first room, indicating that it was completed before the stress/control procedure. Additionally, they were reminded of how many problems they correctly solved in Task 1. The decision in Task 5 captures willingness to compete, but, since the decision is made for past performance which occurred outside the stress treatment, preferences for engaging in a competitive activity or (beliefs about) the potential negative effect of stress on performance should not be relevant.
In Task 6, Choice of compensation scheme for past performance under treatment, subjects also split 100 points between the tournament and piece-rate schemes, but were paid according to performance in Task 2. Instructions for Task 6 reminded subjects of their performance in Task 2, that this task took place after the stress/control procedure and that it was incentivized with the piece-rate scheme. Therefore, if stress negatively impacts performance, and thus possibly changes subjective beliefs about relative performance, this should influence the subjects' decisions in both Task 4 and Task 6. However, preferences for engaging in a competitive activity, and (beliefs about) performance in tournaments under stress are only relevant in Task 4.
In Task 7, we measure risk preferences using a design based on Dohmen et al. (2010). Subjects made a series of choices between a lottery, which paid CZK 240 or 0 with 50% probability each, and a safe payment. The safe payment varied across choices, gradually increasing from CZK 0 to CZK 240 in steps of CZK 20.
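The choice list in Task 7 can be written out as a short sketch. The lottery amounts, probability and step size come from the description above; summarizing a subject by the first safe payment they accept (the "switching point") is an assumption made here for illustration and is not stated as the authors' measure. All function names are hypothetical.

```python
# Task 7 parameters as described in the text.
LOTTERY_HIGH_CZK = 240
LOTTERY_LOW_CZK = 0
WIN_PROBABILITY = 0.5


def safe_payments(low=0, high=240, step=20):
    """Safe amounts offered across the choice list, in CZK."""
    return list(range(low, high + 1, step))


def expected_lottery_value():
    """Expected value of the 50/50 lottery, in CZK."""
    return (WIN_PROBABILITY * LOTTERY_HIGH_CZK
            + (1 - WIN_PROBABILITY) * LOTTERY_LOW_CZK)


def risk_neutral_choices():
    """Choices of a hypothetical risk-neutral subject (ties go to the lottery)."""
    ev = expected_lottery_value()
    return ["safe" if amount > ev else "lottery" for amount in safe_payments()]


def switching_point(choices):
    """Smallest safe payment at which the subject takes the safe option.

    Assumed summary statistic for this sketch; returns None if the
    subject never chooses the safe payment.
    """
    for amount, choice in zip(safe_payments(), choices):
        if choice == "safe":
            return amount
    return None


if __name__ == "__main__":
    print(safe_payments())
    print(expected_lottery_value())
    print(switching_point(risk_neutral_choices()))
```

Under these assumptions the list contains 13 choices and the lottery has an expected value of CZK 120, so a subject who switches to the safe payment below that amount would be classified as risk averse.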
To estimate the role of confidence in competitiveness decisions, we asked non-incentivized questions regarding subjects' perceived rank among all eight participants in the given session for Tasks 1-3.
To limit possible hedging, subjects were informed that only two out of the seven tasks (Task 1-Task 7) would be randomly selected for payment at the end of the experiment.
Full experimental instructions for Tasks 1-7 are available in the online appendix.

B. Treatments
We experimentally induced stress in the laboratory, using a modified version of the Trier Social Stress Test for Groups (TSST-G) (Kirschbaum et al., 1993; von Dawans et al., 2011). This procedure was intended to induce psychosocial stress in the stress treatment, with a control procedure designed to similarly prime subjects yet to keep stress levels constant. The TSST-G has been shown to be the most efficient experimental method of inducing stress, as measured by cortisol response (Dickerson and Kemeny, 2004). The stress treatment protocol consisted of two parts: a public speaking task and a mental arithmetic task. The first part took place immediately before Task 2 and the second part immediately before Task 4. This timing ensured that subjects were under sufficient stress throughout the relevant time period (see Figure 1). In order to minimize the time between the TSST-G and decisions, subjects completed the tasks at computers in the same room, immediately after they finished the stress (or control) procedure. The full protocol for the TSST-G is included in the online appendix.
In both parts of the TSST-G, subjects spoke one-by-one in front of a committee of two experimenters, who sat at a table in front of the participants wearing white lab coats.
In order to increase subjects' level of psychosocial stress, the committee did not give any feedback and maintained a neutral facial expression throughout the procedure. The setting of the room is depicted in Appendix Figure A.1. Subjects were separated by dividers and wore headphones with ambient traffic noise during the entire TSST-G procedure, except when speaking to the committee, in order to prevent subjects from hearing others during the stress procedure and potentially developing subjective rankings in ability.
In the public speaking task, subjects were told to imagine a situation in which they had been caught cheating during an important academic examination and that they should defend themselves in front of a disciplinary committee. This scenario required participants to talk extensively about their personal qualities, and they were instructed to do their best. They were interrupted and asked additional questions if they spoke too fluently for too long.
In the second portion of our modified TSST-G procedure, subjects in the stress treatment were again called individually and asked to recite the alphabet backwards in steps of two, starting from a given letter. For example, if given the letter Z, they were required to recite Z, X, V,... Subjects engaged in this activity for a minute and were corrected if a mistake was made.
Our version of the TSST-G changes the standard protocol in several ways. We modified the speaking task to avoid possible priming effects: the original procedure is framed as a job interview, which could have influenced competitiveness and performance in the experiment independently of the stress reaction. In the second portion of the task, subjects were instructed to recite the alphabet rather than counting in intervals. Likewise, this was done to avoid contaminating performance in the experiment, while still allowing us to use the addition of two-digit numbers as the real-effort task, consistent with previous work. 4
The control procedure similarly primed subjects, and involved a similar degree of physical activity, but in a less stressful setting. Subjects were asked to read an article about academic dishonesty, silently for the first fourteen minutes and then aloud for two minutes. In the second part of the procedure, they collectively recited the alphabet out loud for a minute. Two experimenters were again present in the room during the control procedure, but wore casual clothes and behaved naturally. The subjects in the control group also wore headphones with ambient noise and were separated with dividers, to mimic conditions in the stress treatment group.

C. Sample and procedures
The experiment was carried out in 2014-2015, with 24 sessions in total. Subjects were recruited using a standard recruitment database, ORSEE (Greiner, 2004); no details about the nature of the experiment were mentioned in the invitation in order to avoid self-selection based on relevant personal characteristics, such as aversion to stressful or competitive situations. The stress and control treatments were randomized at the session level, for logistical reasons. Each session consisted of eight subjects, four males and four females, and though the gender composition was not directly mentioned (following Niederle and Vesterlund, 2007), it was easily observable: at the end of the experiment, 80% of subjects correctly reported the gender ratio of the session. The final sample is composed of 95 male and 95 female subjects, primarily undergraduate students (82%), majoring mostly in economics, business and related fields (61%). 5 Decisions were made on computers, using the program z-Tree (Fischbacher, 2007). The experiment was conducted in the Czech language and sessions were administered by one experimenter (male), one assistant (female) and two separate committee members for the TSST-G procedure (a male and a female). The average length of the experiment was slightly less than 2 hours and the average payoff was CZK 516 (EUR 20).
The study was approved by the Internal Review Board of the Laboratory of Experimental Economics in Prague, where the experiment took place. We obtained informed consent from all participants, emphasizing that they were free to leave at any time. At the end of the session subjects in the stress treatment were debriefed on the true purpose of the stress procedure.

Results

A. Physiological stress response
To confirm that the TSST-G was successful in producing a stress response, we analyze the salivary cortisol samples taken throughout the experiment. Results are presented in Figure 2. Baseline cortisol was measured in sample 1, before the stress procedure, and samples 2 and 3 were taken afterwards (see Figure 1). Cortisol sample 2 was collected only after the second portion of the TSST-G procedure. However, the cortisol response is typically delayed by 15-20 minutes after initial exposure to the stressor (Kemeny, 2003; Allen et al., 2014), and therefore sample 2 primarily captures the physiological response to the first part of the TSST-G procedure, whereas sample 3 reflects responses to both parts 1 and 2.
While cortisol levels for subjects in the control group actually slightly decrease over the course of the experiment, levels for those in the stress treatment group more than doubled. For men in the stress treatment, cortisol levels in samples 2 and 3 increased by 130 and 113 percent of baseline, respectively (signed-rank test, p = 0.000 for both). For women, there was an increase of 109 percent on average, which remained constant in samples 2 and 3 (signed-rank tests, p = 0.000 for both). While it is difficult to directly compare cortisol levels between men and women due to biological differences, we find no evidence that the TSST-G was relatively more successful for either males or females (p = 0.200 for the percentage increase between samples 1 and 2 and p = 0.407 for the percentage increase between samples 1 and 3). 6
The timing of the cortisol response is difficult to measure precisely. Given this, we analyze heart-rate data to confirm that the elevated cortisol that we observe in the stress treatment is indeed a result of the TSST-G procedure, and that subjects remained under heightened stress during the tasks. During part one of the TSST-G protocol, the heart rate of the stress treatment group increases sharply, and is significantly higher than in the control group (p = 0.000). It stays significantly higher during Task 2 (piece-rate under treatment), Task 3 (tournament under treatment) and during the willingness to compete decision in Task 4 (see Appendix Table A.1).

B. Performance and competitive incentives
We next analyze the effect of stress on performance under competition, and whether this differs by gender. Recall that Tasks 1-4 included a counting activity. Since Task 1 took place before the treatment and was incentivized using the piece-rate scheme, this serves as the baseline for ability and motivation. Performance in Task 2 (piece-rate payment, after treatment) isolates the effect of stress on performance, and Task 3 measures how both stress and competition affect performance. If competitive incentives lead to increased performance, a common assumption, then subjects should be expected to complete more problems in Task 3 than in Tasks 1 and 2.
Results from performance in the counting portions of Tasks 1-4 are presented in the upper panel of Figure 3 and in Table 1. Under the piece-rate incentive scheme in Task 1, there is virtually no difference in the number of correctly solved problems between the treatment and control groups (p = 0.931). This demonstrates that randomization was successful. The same holds for both the male and female sub-samples, independently.
In Task 2 we do not find a statistically significant difference between treatments: subjects in the stress and control treatments correctly answered 6.37 and 6.56 problems, respectively (p = 0.560). As before, this result holds for both the male and female subsamples. This indicates that stress alone does not affect performance in the counting task.
In contrast, for performance in the tournament in Task 3 we see a significant treatment effect, with lower performance among the stress group, who solved only 6.24 problems correctly (sd = 2.98), compared to 7.14 (sd = 2.74) in the control group (p = 0.018). This difference is driven by female subjects: women in the stress treatment correctly solved 5.23 (sd = 2.43) problems on average, compared to 6.60 (sd = 2.08) in the control group (p = 0.003). The corresponding treatment difference for men is less than one third the size, 0.41, and is not statistically significant (p = 0.562).
In Table 2 we confirm this pattern by regressing performance under tournament incentives in Task 3 on a dummy that equals 1 if the subject was assigned to the stress treatment, gender, and baseline performance in Task 1, with standard errors clustered at the session level. 7 We find that the stress treatment lowers performance by 0.84 correctly answered questions on average (p < 0.001). 8 In column 2, we add an interaction term, stress treatment*female, and the results indicate that the effect of the stress treatment is specific to the female sub-sample (p = 0.019). In columns 3-4, we estimate the effects separately for males and females: the stress treatment lowers female subjects' performance by 1.45 questions (p < 0.001), while the coefficient for male subjects does not differ statistically from zero (p = 0.513).
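The specifications described above can be sketched as follows, assuming a linear model. The symbols are illustrative and are not the paper's own notation: $y_{i,3}$ and $y_{i,1}$ denote subject $i$'s correct answers in Tasks 3 and 1, $\text{Stress}_i$ and $\text{Female}_i$ are indicator variables, and $\varepsilon_i$ is an error term that may be correlated within a session.

```latex
\begin{align}
% Column 1: average effect of the stress treatment
y_{i,3} &= \alpha + \beta\,\text{Stress}_i + \gamma\,\text{Female}_i
           + \delta\,y_{i,1} + \varepsilon_i \\
% Column 2: adds the stress-by-gender interaction
y_{i,3} &= \alpha + \beta\,\text{Stress}_i + \gamma\,\text{Female}_i
           + \theta\,(\text{Stress}_i \times \text{Female}_i)
           + \delta\,y_{i,1} + \varepsilon_i
\end{align}
```

Under this reading, $\beta$ in the first equation corresponds to the reported average effect of 0.84 fewer correct answers. In the second equation, $\beta$ is the effect of stress for men and $\beta + \theta$ the effect for women, so $\theta$ captures the gender difference in the response to stress.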
Next, we consider the average difference in the number of problems each subject correctly solved in Tasks 3 and 2. The lower panel of Figure 3 demonstrates that tournament incentives influence performance within individuals, across treatment and gender. Overall, subjects in the control group correctly solved 0.57 more problems under the competitive compensation scheme in Task 3 than under the piece-rate scheme in Task 2 (signed-rank test, p = 0.034). This holds independently for both men and women, who answered 0.42 (signed-rank test, p = 0.042) and 0.73 (signed-rank test, p = 0.007) more questions correctly in Task 3 than in Task 2, respectively.
However, in the stress treatment, only 25.5 percent of female subjects did better in Task 3 than Task 2, compared to 56.3 percent in the control, and 44.7 percent of women in the stress treatment did worse in Task 3 than in Task 2, compared to only 20.8 percent of those in the control. These proportions differ significantly across treatments (Chi-squared test, p = 0.007). On average, women in the stress treatment solved 0.49 fewer problems in Task 3 than in Task 1, which is marginally insignificant (signed-rank test, p = 0.122).
For men, there is no treatment difference in the proportion of subjects who improved or did worse in Task 3 compared to Task 2 (Chi-squared test, p = 0.453), and men in the treatment group solved 0.23 more problems on average under the competitive incentive scheme than they did under piece-rate incentives in Task 2, though this is not statistically different from zero (signed-rank, p = 0.290).
The regression results also show a clear pattern: in column 5 of Table 2, we regress the difference between correctly answered problems in Tasks 3 and 2, which can be interpreted as the effect of the tournament incentive scheme on performance, on stress treatment and gender. The results indicate that the stress treatment diminishes the positive effect of the tournament incentive scheme by 0.70 questions on average (p = 0.008). As before, in columns 6-8 we see that this is driven by the female sub-sample, and that there is no statistically significant effect for men (p = 0.586). 9 Overall, these results indicate that female subjects perform significantly worse in a competitive setting when under increased psychosocial stress. 10 While competition increases performance in the control group, this is not the case for female subjects exposed to increased psychosocial stress. We do not find that stress, on its own, has a negative effect on performance for either gender: neither men nor women in the stress treatment perform significantly worse in Task 2, compared to either their own Task 1 performance or to the Task 2 performance of subjects in the control group. Moreover, we find that both men and women in the control group respond positively to competitive incentives, as they perform significantly better in Task 3 than in Task 2. However, the combination of stress and competition decreases performance for a large portion of female subjects. 
We do not find any such pattern for men, whose performance under tournaments is not significantly affected by the stress treatment. 11
9 In Appendix Table A.3 we confirm that these results are stable with respect to additional controls: they do not depend on whether we control for baseline performance in Task 1 and they hold when we control for baseline cortisol levels, baseline heart-rate levels, baseline mood, personality traits (BFI inventory), for potential problems with understanding, for whether women take oral contraceptives and for the phase of their menstrual cycle (which can affect cortisol levels).
10 Note that we do not arbitrarily divide the sample by gender; the study was designed with the principal aim of studying differences in reaction to stress by gender (project proposal available upon request). Moreover, the competition task includes only 6 experimental outcomes (Performance in Tasks 2-4 and Investment decisions in Tasks 4-6), which indicates that the risk of false positives due to multiple hypothesis testing is low.

C. Willingness to compete
We now turn to investment in the tournament payment scheme in Task 4, which serves as our principal measure of willingness to compete. This decision captures preferences for competitive outcomes, preferences for engaging in a competitive activity, and expectations of one's future performance under competition. Recall that in Task 4 subjects allocated 100 points between a tournament and piece-rate incentive scheme before completing the counting portion of the task. The results from Task 4 are presented in Figure 4 and panel A of Table 3. Overall, subjects allocated slightly less than half of the total amount, 46.68 points, into the tournament incentive scheme. We find that stress does indeed affect competitiveness: subjects in the stress treatment invested 7.72 fewer points in the tournament scheme than those in the control group (p = 0.046).
We confirm this result by regressing the points invested into the tournament scheme in Task 4 on the stress treatment dummy. We control for gender as well as baseline performance in Task 1 (i.e. before the treatment intervention) and cluster standard errors at the session level. As reported in column 1 of Table 4, we find that the stress treatment was associated with investing 7.59 fewer points in the tournament scheme (p = 0.024). Consistent with the literature, we find that gender has a strong influence on choices in Task 4, with women investing 25.27 fewer points in the tournament incentive scheme than men (p < 0.001). This is confirmed by the regression results in column 1 of Table 4, in which we observe that women invested on average 22.06 fewer points, after controlling for treatment and baseline performance (p < 0.001).
The stress treatment has a similar effect on willingness to compete in both men and women. The negative effect of stress on investments in the tournament that we find on average in Task 4 holds separately for both the male and female sub-samples, though the treatment differences are not statistically significant, due to smaller sample sizes (Figure 4 and panel A of Table 3). In column 2 of Table 4, we add an interaction term between the female and stress treatment dummies and observe no statistically significant gender difference (p = 0.926). In columns 3-4, we run regressions separately on the male and female sub-samples and find that the coefficients for the stress treatment are virtually identical, though both coefficients are marginally insignificant: p = 0.123 and p = 0.124 for the male and female sub-samples, respectively.
Since most studies use a binary measure of willingness to compete, we perform a robustness test in which we classify subjects as competitive if they invest at least 50 of the 100 points into the tournament incentive scheme in Task 4 and estimate the effects of the stress treatment and gender using a probit model. Results are similar to those obtained with the linear measure (see Appendix Table A.5). We also confirm that results are robust to including additional controls (see Appendix Table A.6).
The decisions in Tasks 5 and 6 provide further insight into the mechanism behind the treatment effect we find for Task 4, which measures willingness to compete for future performance. In Task 5, subjects decided how much to invest in the tournament payment scheme based on the result of their performance in the counting portion of Task 1 (i.e. under the piece-rate payment scheme and before the stress/control treatment). In contrast to the competition decision in Task 4, we do not find a significant difference between the treatment groups for investment in the tournament in Task 5. On average, subjects in the control group invested 40.19 points versus 41.20 points in the stress treatment group (p = 0.826, see panel B of Table 3). We do not find a statistically significant treatment difference for either men or women.
In Task 6, subjects also decided how much to invest in the tournament for past performance, this time based on the results from Task 2 (piece-rate, after the stress/control treatment). As in Task 5, we do not find a statistically significant difference in willingness to compete between treatments. Subjects in the control group invested 41.14 points into the tournament, while those in the stress treatment invested 39.64 points on average (p = 0.702). Results are presented in panel C of Table 3. 12 Since Task 2 was completed after the stress treatment, changes in performance or perceived relative performance in response to the stressor should affect competitiveness in Tasks 4 and 6 similarly. The lack of treatment difference in Task 6 thus suggests that the difference in competitiveness we see in Task 4 is not caused by a difference in perceived ability as a result of the stress treatment alone. Together, the results from Tasks 5 and 6 indicate that the decrease in competitiveness that we see in the stress treatment in Task 4 is related to completing the task both under stress and in a competitive setting, rather than either element alone.
The gender difference in competitiveness that we observe in both treatments in Task 4 holds in Tasks 5 and 6 as well.

Discussion and Additional Results
In this section, we perform a series of robustness checks and additional analysis, with the goal of determining the mechanisms that drive our results.

A. Physical stress response
To begin, we relate the physiological stress response, measured as a relative increase in the salivary cortisol levels between the first and the second sample, to both willingness to compete in Task 4 and to tournament performance in Task 3. We examine the correlation between cortisol response and each outcome and estimate the average treatment effect on the treated, using the stress treatment as an instrument. Results are robust: the stronger the physiological stress response, the lower the willingness to compete and the worse the tournament performance for women (see Appendix Table A.8 and Table A.9). This is evidence that the stress treatment indeed affects behavior through stress, rather than some other channel.
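With a binary instrument and no covariates, the instrumental-variable estimate reduces to a ratio of two differences in means (the Wald estimator). The sketch below illustrates that logic only. The records are synthetic and purely illustrative, the function name `wald_iv` is hypothetical, and the controls and session-clustered standard errors used in the appendix tables are omitted.

```python
from statistics import mean

# SYNTHETIC, illustrative records (not the study's data):
# (stress_treatment, cortisol_response_pct, task4_investment_points)
records = [
    (1, 120.0, 35), (1, 90.0, 40), (1, 150.0, 30), (1, 100.0, 45),
    (0, -10.0, 50), (0, 5.0, 45), (0, -20.0, 55), (0, 0.0, 40),
]

def wald_iv(rows):
    """Wald estimator: reduced-form difference divided by first-stage difference."""
    treated = [r for r in rows if r[0] == 1]
    control = [r for r in rows if r[0] == 0]
    # First stage: effect of the instrument (treatment) on cortisol response.
    first_stage = mean(r[1] for r in treated) - mean(r[1] for r in control)
    # Reduced form: effect of the instrument on the outcome (investment).
    reduced_form = mean(r[2] for r in treated) - mean(r[2] for r in control)
    return reduced_form / first_stage, first_stage, reduced_form

iv_estimate, first_stage, reduced_form = wald_iv(records)
print(f"first stage = {first_stage:.2f} percentage points")
print(f"reduced form = {reduced_form:.2f} points")
print(f"IV estimate = {iv_estimate:.4f} points per percentage point of cortisol response")
```

A negative ratio in this toy example corresponds to the pattern described above: a stronger cortisol response goes with lower investment in the tournament.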

B. Stress and willingness to compete
Next, we consider potential channels through which stress might lead to lower willingness to compete and, particularly for women, how this is related to performance under tournament incentives.
The first mechanism we consider is a change in preferences under heightened stress. Our design allows us to distinguish between willingness to compete for future performance and willingness to compete for past performance. Since we find a treatment difference only for the decision in Task 4 (future performance) but not in Tasks 5 or 6 (past performance), our results seem to rule out an effect of stress on preferences for competitive outcomes. This result is in line with Goette et al. (2015) who also find no effect of stress on competitiveness for past performance. 13 This is in contrast to gender differences: consistent with findings in Niederle and Vesterlund (2007), we find that women are less competitive across all three investment decisions.
Second, our finding that women in the stress treatment perform worse under tournament incentives in Task 3 suggests one of two closely related underlying effects (or a combination thereof): stress may affect preferences for engaging in competition, which may in turn lower effort, or stress may lower the ability of women under tournament incentives. Regardless, our results suggest that women in the stress treatment have a lower willingness to compete due to weaker performance in the tournament. Even though subjects were unaware of the number of questions that others correctly answered, they observed their own performance under both the piece-rate and tournament compensation schemes, and could have based their investment in the tournament on their relative performance in these rounds. To test this, we conduct a sensitivity analysis, which is reported in Appendix Table A.11. The results strongly suggest that for women stress affects willingness to compete in Task 4 principally by affecting performance, and that after controlling for Task 3 performance the stress treatment adds no explanatory power to the regression model.
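The logic of this sensitivity analysis can be sketched by comparing three regressions: investment on (i) the treatment dummy, (ii) Task 3 performance, and (iii) both. The data below are synthetic and constructed so that stress affects investment only through performance; they are an assumption for illustration, not the study's data, and the helper `ols` is a hypothetical minimal solver without standard errors.

```python
def ols(X, y):
    """Return OLS coefficients by solving (X'X) b = X'y with Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]

# SYNTHETIC data: stress lowers Task 3 performance by one problem on average,
# and investment depends on performance only (no direct effect of stress).
stress = [1, 1, 1, 0, 0, 0]
solved_task3 = [4, 5, 6, 5, 6, 7]
investment = [10 + 5 * s for s in solved_task3]

b_i = ols([[1, t] for t in stress], investment)                               # (i)
b_ii = ols([[1, s] for s in solved_task3], investment)                        # (ii)
b_iii = ols([[1, t, s] for t, s in zip(stress, solved_task3)], investment)    # (iii)

print("(i)   stress coefficient:", round(b_i[1], 6))
print("(ii)  performance coefficient:", round(b_ii[1], 6))
print("(iii) stress, performance:", round(b_iii[1], 6), round(b_iii[2], 6))
```

In this constructed example the stress coefficient is negative in specification (i) and vanishes once performance is included in (iii), which is the pattern the text describes for women. Because Task 3 performance is itself affected by treatment, such a comparison is descriptive rather than causal.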
Alternatively, stress could affect subjective beliefs about performance. To this end, we measured subjective confidence for each round, after the experiment, by eliciting beliefs about rank among the 8 subjects in the session, for Tasks 1-3. Appendix Figure A.2 confirms that beliefs are highly correlated with the performance results: women in the stress treatment have lower confidence in their performance in tournaments compared to women in the control group, but there is no significant difference in confidence in performance under piece-rate incentives, either before or after the stress/control manipulation. Stress has no effect on men's confidence. In Appendix Table A.12, we conduct a similar sensitivity analysis and find that the stress treatment does not additionally explain (subjective) confidence in tournament performance after controlling for actual performance.
Another possibility is that stress influences competitiveness through risk preferences. Cahlíková and Cingl (2017) find that a similar version of the TSST-G leads to higher levels of risk aversion, especially for men. Since the tournament incentive scheme increases subjects' exposure to risk, greater risk aversion might lead to lower willingness to compete. However, a change in risk preferences would also affect willingness to compete for past performance, and we do not observe any effect in Tasks 5-6. Moreover, in our sample, we fail to find any significant relationship between the stress treatment and risk preferences elicited in Task 7. In fact, those in the stress treatment actually had slightly higher certainty equivalents than those in the control group on average, though the difference is not statistically significant (p = 0.334). This is consistent within both the male and female sub-samples (p = 0.501 and p = 0.698, respectively). 14 These results suggest that risk preferences are not a mechanism by which stress affects willingness to compete in our sample.
To summarize, for men, we do not find evidence that lower ex-ante willingness to compete is related to a change in performance, confidence, risk-preferences, or understanding, and by process of elimination we conclude that lower willingness to compete among men is driven by preferences for engaging in competition under stress. For female subjects, we conclude that psychosocial stress lowers willingness to compete because women under stress perform worse under tournament incentives. Based on this, women react by investing less in the tournament incentive scheme when given a choice.

C. Stress and tournament performance
Why do tournament incentives lead to poorer performance for women in the stress treatment? Holding preferences and beliefs stable, under tournament incentives individuals should exert more effort than in the piece-rate scheme, as long as their marginal cost of effort is sufficiently low and they believe that their relative ability is sufficiently high. This seems to be the case for men and women in the control group, who perform better in the tournament. However, there are also several reasons why performance might decrease under competition. We now consider how these factors might be related to stress and gender, and whether they are plausible explanations for our results.
First, tournament incentives might induce additional stress and there could be a threshold level of stress beyond which performance suffers. However, this seems unlikely to explain our findings. While cortisol levels increase sharply after the TSST-G manipulation for subjects in the stress treatment, the cortisol levels of subjects in the control group continue to decrease over the course of the experiment, despite the fact that control subjects also take part in the tournament. It is possible that cortisol would have decreased by even more in the absence of a tournament, but in any case, this suggests that the effect of the tournament on cortisol, and thus on stress, is small. Buser et al. (2016), using a design similar to ours, find that cortisol levels increase on average by 3-5% when subjects perform a task compensated with a piece-rate scheme and by 12-15% when they perform under tournament incentives. In comparison, we find that the TSST-G protocol increases cortisol levels by 109% in women and 130% in men. Additionally, the heart-rate data from our study, which provide a more time-specific measure of stress levels, show that the response of the stress group to the treatment is much stronger than the control group's response to the tournament in Task 3 (Appendix Table A.13). Any such stress threshold would therefore have to be extremely sensitive in order to explain the effect we observe. Moreover, it would have to be specific to women, as men's performance in the tournament is not affected by the stress treatment.
Second, subjects in the stress treatment may have been under increased cognitive load as a result of the TSST-G procedure, which could have affected either their ability to concentrate on the task or their motivation to perform. If this were the case, though, we would also expect to see a difference in Task 2 performance, since this also occurred after the stress treatment. To further rule out this possibility, we run two robustness checks. We had subjects complete a d2 attention test (Brickenkamp and Zillmer, 1998) at the end of the experiment and find no treatment differences. 15 Subjects were also asked to rate their understanding of the experimental instructions in the questionnaire at the end of the session and we find no significant treatment difference (p = 0.291). The treatment difference in Task 3 performance remains significant after dropping all women who reported less than perfect understanding (n = 53, p = 0.016).
Third, the results might be explained by the higher stakes inherent to competition. For example, if a subject was matched with a partner in Task 3 who correctly solved the median number of 6 problems, the difference in payoffs between solving 5 and 7 problems would be CZK 350. In the piece-rate tasks, the same 2-problem difference in performance would only change payoff by CZK 50. This is relevant because large stakes have been shown to have a detrimental effect on performance, causing individuals to "choke" under increased pressure. This is demonstrated by a series of experiments in Ariely et al. (2009), who link results to the Yerkes and Dodson (1908) law, a long-standing principle from the psychology literature according to which arousal increases executive function up to a point, but the relationship is defined by an inverted U-shaped function, and ability declines with increased stimulation after passing a threshold. In other words, performance is worse when stakes are either too high or too low. If psychosocial stress also stimulates subjects, the Yerkes-Dodson law may explain why stress and competition lower performance in combination, but not individually. There is emerging evidence that women might be more sensitive to this effect: Azmat et al. (2016), using data from non-competitive exams, show that increasing the stakes hurts women's performance, but not men's.
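The payoff comparison in the example above can be worked through in a few lines. The per-problem rates are assumptions inferred from the two figures quoted in the text (CZK 50 for a 2-problem difference under piece rate, CZK 350 between solving 5 and 7 against a partner who solved 6), and the treatment of ties is a placeholder, since this passage does not describe it.

```python
PIECE_RATE_CZK = 25       # ASSUMED: implied by a CZK 50 change for 2 problems
TOURNAMENT_RATE_CZK = 50  # ASSUMED: implied by 7 * 50 - 0 = CZK 350

def piece_rate_payoff(solved):
    """Payoff when every correct answer is paid at the piece rate."""
    return PIECE_RATE_CZK * solved

def tournament_payoff(solved, partner_solved):
    """Winner is paid per correct answer; the loser earns nothing.

    Ties are not covered in this passage; they are treated as a loss here
    purely as a placeholder assumption.
    """
    return TOURNAMENT_RATE_CZK * solved if solved > partner_solved else 0

partner = 6  # partner solves the median number of problems
tournament_gap = tournament_payoff(7, partner) - tournament_payoff(5, partner)
piece_rate_gap = piece_rate_payoff(7) - piece_rate_payoff(5)
print(f"tournament: CZK {tournament_gap}, piece rate: CZK {piece_rate_gap}")
```

Under these assumed rates, the same 2-problem swing is worth seven times as much in the tournament as under piece-rate pay, which is the sense in which competition raises the stakes.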
A fourth plausible explanation is that tournament incentives accentuate social comparison, which can be detrimental to performance, as shown by Ashraf et al. (2014) and Schram et al. (2015), who find that the anticipation of being ranked by status leads to worse performance among women, while men actually perform better. Since the counting task in our study might be perceived as a male-dominated activity, it plausibly produces a "stereotype threat" among female subjects, in which they are confronted with perceived negative stereotypes about women and math (Spencer et al., 1999). Even though we do not observe negative effects of tournament incentives in the control group, stereotype threat might further lower executive function already challenged by stress (including working memory, which is crucial for solving the addition problems), resulting in worse performance according to the Yerkes-Dodson law. Schmader et al. (2008) provide a theoretical framework for understanding how stereotype threat (social identity threat) affects performance, and argue that this could be exacerbated by increased cortisol, and "physiological stress response could play a direct role in impairing task performance under stereotype threat" (p. 343). Our results are in line with this hypothesis, which offers an explanation as to how the combination of higher stakes in the tournament and psychosocial stress could lower performance for women, who plausibly face a stereotype threat in the counting task, but not for men.

Conclusion
This article presents new evidence on the effects of stress on performance under competition and individual willingness to compete. We experimentally induce psychosocial stress in the laboratory using a modified TSST-G protocol and find that subjects in the stress treatment group are less competitive, investing less in the tournament compensation scheme than those in the control group. However, this is only true when the willingness to compete decision is made before the competitive task. By examining salivary cortisol, which increases after exposure to the stress treatment, we confirm that the treatment difference in the willingness to compete is driven by stress; the cortisol response is negatively correlated with willingness to compete. In the tasks for which subjects made willingness-to-compete decisions for past performance, we find no treatment effect.
For women, we find that performance under competition is worse in the stress treatment than in the control group. While female subjects in the control group perform significantly better under tournament incentives than under the piece-rate scheme, female subjects in the stress treatment actually do slightly worse. It is the combination of stress and tournament incentives which is detrimental to performance, and for women, this explains the lower willingness to compete we observe in the stress treatment group. We do not find such a link among men, for whom there is no treatment difference in tournament performance. The lower willingness to compete among men in the stress treatment seems to be driven by a link between stress and preferences for engaging in competition.
We propose that the most plausible explanation for decreased performance under stress and competition among women is that the counting task was perceived as a male-dominated activity, and women therefore faced a stereotype threat. Stress, higher stakes in the tournament and stereotype threat are all factors linked to executive function in the Yerkes-Dodson framework, and the combination of all three could impair certain abilities, including working memory, thus affecting performance in the counting task (Schmader et al., 2008).
Our findings help to explain past results regarding the effect of tournament incentives on performance. While some studies have found a positive effect for both genders, others show a positive effect only for men (Niederle and Vesterlund, 2007; Gneezy et al., 2003; Gneezy and Rustichini, 2004). Potentially, the environments differed in the degree of stress involved. Moreover, our results also support the claim made in Niederle and Vesterlund (2010) that gender gaps in math test scores may not necessarily reflect differences in math ability. Especially when test results come from highly competitive and stressful settings, such as university entrance exams, women's performance may fall below their ability.
While a gender-neutral or female-dominated task might produce different results, the labor market settings that motivate this research, such as business or academia, are similarly perceived as stereotypically male. Many competitive situations that affect one's career trajectory, such as exams, job interviews and asking for a promotion, are also stressful. If women perform worse under competition and stress, this will directly affect labor market outcomes, and could dissuade women from entering competitive environments in the first place.
If employers make hiring decisions in stressful, competitive settings, our results suggest that this will lead to inefficient outcomes as women may under-perform. This is especially relevant if the position itself is not particularly stressful or competitive, compared to the hiring process. Moreover, a combination of tournament incentives and increased pressure to perform is often used to boost output in firms, and while this incentive structure may be effective in motivating male workers, our results show that such policies could have unintended consequences when applied to female workers.
This phenomenon could produce path-dependence across sectors: if the initial composition of a particular field is dominated by males or females, the optimal management and hiring strategies might differ with respect to stress and competition. Both hiring and self-selection of workers into the field would then exacerbate gender disparities in response to the dominant practices and norms. Furthermore, in organizations where competition is inherent, as in many firms or universities, identifying and taking measures to reduce stress could be a viable policy for improving performance, as (especially female) workers might be performing below their potential.
Notes: Mean salivary cortisol concentration (in nmol/l) over the course of experiment, by treatment. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. Sample 1 was collected prior to the TSST-G stress/control procedure. Sample 2 was collected after the second part of the TSST-G protocol (the counting task) and Sample 3 was collected at the end of the experiment. For details see Figure 1. Bars indicate mean ± standard error.
Notes: Performance in the experiment, by gender. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. Task 1 is the baseline conducted before the treatment, with piece-rate incentives. Task 2 and Task 3 took place after the first part of the TSST-G, under piece-rate and tournament incentives, respectively. Task 4 took place after the second part of the TSST-G, after the subjects had chosen their preferred incentive scheme for that round.
Notes: Mean decisions regarding willingness to compete, across tasks, treatments, and gender. Panel A presents the competitiveness decision for future performance, while Panels B and C present the two competitiveness decisions for past performance. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. All differences are tested using a Wilcoxon rank-sum test.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is investment in the tournament compensation scheme in Task 4, where the choice was made before the counting portion of the task. 0 indicates all points invested in the piece-rate scheme; 100 indicates all points invested in the tournament compensation scheme. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: Mean beliefs about subject's rank among the 8 subjects in the session, by task and treatment (i.e. 1=highest confidence, 8=lowest confidence). "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. Task 3 was completed under tournament incentives under the stress/control treatment, Task 2 was conducted under piece-rate incentives under treatment and Task 1 was completed under piece-rate incentives before the stress/control manipulation. Confidence questions were non-incentivized and were elicited after Task 6. For details please consult the timeline in Figure 1. Bars indicate mean ± standard error.
Notes: Mean heart rate during the specified tasks. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. Due to equipment failure, data is missing for 11 observations from the stress treatment and control group each. All differences are tested using a Wilcoxon rank-sum test.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is the number of addition problems that were correctly completed within the time limit in the specified task. Both Task 1 and Task 2 were completed under piece-rate incentives, before and after the stress/control manipulation, respectively. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is the number of addition problems that were correctly completed within the time limit in the specified task. Both Task 2 and Task 3 were completed under treatment, under piece-rate and tournament incentives, respectively. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is the number of addition problems that were correctly completed within the time limit in the counting portion of Task 4, before which subjects chose their preferred compensation scheme. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: Probit, marginal effects reported. Standard errors are clustered at the session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is a dummy indicating that the subject in Task 4 invested at least 50 points into the tournament compensation scheme, where the choice occurred before completing the counting portion of Task 4. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is investment in the tournament compensation scheme in Task 4, where the choice was made before completing the counting portion of the task. 0 indicates all points invested in the piece-rate scheme, and 100 indicates all points invested in the tournament compensation scheme. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is the investment in the tournament compensation scheme for past performance. For Task 5 (columns 1-4) competition is based on performance in Task 1 (i.e. before the stress/control procedure), and for Task 6 (columns 5-8) on performance in Task 2 (i.e. after the stress/control procedure). 0 indicates all points invested in the piece-rate scheme; 100 indicates all points invested in the tournament compensation scheme. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Notes: Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is investment in the tournament compensation scheme in Task 4, where the choice occurred before completing the counting portion of the task. 0 indicates all points invested in the piece-rate scheme; 100 indicates all points invested in the tournament compensation scheme. "Cortisol response" is the percentage increase in cortisol between samples 1 and 2. Sample 1 was collected prior to the TSST-G stress/control procedure; Sample 2 was collected after the second part of the TSST-G. For details regarding the timeline see Figure 1. In Columns 2, 4 and 6, a dummy variable "Stress treatment", indicating exposure to the TSST-G stress procedure, is used as an instrument for "Cortisol response". First-stage results are presented in Appendix
Notes: Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is the number of addition problems that were correctly completed within the time limit in the counting portion of Task 3, which was completed after the treatment and under tournament incentives. "Cortisol response" is the percentage increase in cortisol between samples 1 and 2. Sample 1 was collected prior to the TSST-G stress/control procedure; Sample 2 was collected after the second part of the TSST-G. For details regarding the timeline see Figure 1. In Columns 2, 4 and 6, a dummy variable "Stress treatment", indicating exposure to the TSST-G stress procedure, is used as an instrument for "Cortisol response". First-stage results are presented in Appendix Table A.10.
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable, "Cortisol response", is the percentage increase in cortisol between samples 1 and 2. Sample 1 was collected prior to the TSST-G stress/control procedure; Sample 2 was collected after the second part of the TSST-G. For details regarding the timeline see Figure 1. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Here we regress willingness to compete in Task 4 on i) stress treatment, ii) Task 3 performance and iii) both. Since Task 3 performance is endogenous (potentially affected by treatment), one must interpret the coefficients and standard errors with caution. However, comparing coefficients, standard errors and R-squared values across specifications is instructive. While adding Task 3 performance lowers the stress treatment coefficient and increases the R-squared, adding stress treatment affects neither the coefficient for Task
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is investment in the tournament compensation scheme in Task 4, where the choice occurred before completing the counting portion of the task. 0 indicates all points invested in the piece-rate scheme, and 100 indicates all points invested in the tournament compensation scheme. "Solved Task 3" captures the number of correctly answered problems in Task 3, which occurred under tournament incentives under treatment. "Stress treatment" is a dummy variable indicating that the subject was exposed to the TSST-G stress procedure.
Here we regress confidence in Task 3 on i) stress treatment, ii) Task 3 performance and iii) both. Since Task 3 performance is endogenous (potentially affected by treatment), the results should be interpreted with caution. However, comparing specifications shows that while Task 3 performance explains confidence, the stress treatment does not additionally explain confidence.
[Table: dependent variable is confidence in Task 3 (perceived rank among 8 subjects in session); samples: all, males, females.]
Notes: OLS. Standard errors are clustered at a session level. *** p<0.01, ** p<0.05, * p<0.1. The dependent variable is confidence in Task 3, measured as subject's subjective ranking among 8 participants in the session. 1 thus indicates highest confidence and 8 the lowest confidence. "Solved Task 3" captures the number of correctly answered problems in Task 3, which occurred under tournament incentives under treatment. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure.
Notes: Differences between mean heart rates during the specified tasks. "Stress treatment" indicates that the subject was exposed to the TSST-G stress procedure. Due to equipment failure, data is missing for 11 observations from the stress treatment and control group each.