Two coders watched the video and used a custom-made Python program
to measure the searching time for the N − 1th puppet, the time the child searched for an Nth puppet (within the 8-s time window starting when the N − 1th puppet was placed on the tree), and the time of occurrence of the question closing the trial, with respect to this searching window. Twenty-eight of the 324 trials were excluded from analyses, for experimenter error (4), excessive distraction (9), searching for the N − 1th puppet for more than 10 s (9 trials + 1 other trial where searching time for the N − 1th puppet could not be assessed), unclear searching behavior (2), or closing question asked too early (3). Because analyses were meaningful only if a child contributed data both in a trial where the box was expected to be empty and in a trial where the box was expected to contain one puppet, this resulted in the exclusion of 0–9 children from the analyses in each experiment. When searching time was measured, the tape was played at slow speed (1/6), and each coder recorded searching by pressing
keys on separate gamepads. The following behaviors were included in the searching time: (1) reaching inside the box (from the moment the child’s hand entered the box to the moment it exited), (2) looking inside the box (from
the moment the child’s gaze was aligned with the opening of the box to the moment the child looked away), and (3) shaking the box to listen for noise (from the moment the child picked up the box to the moment when the shaking ended or the box was returned to the table). One of the coders was the experimenter, who advanced the tape at appropriate places. The other coder was blind to the condition and to the hypotheses. The program recorded agreement between coders by sampling their judgment (search/no search) every 30 ms. If the agreement was under 90% (16/279 trials), a second measurement was attempted, and the most convergent measurement was kept for analyses. For the final sample of 279 trials, the average agreement between the two coders was 98.8% (97.4% if considering only the trials with non-zero searching time). Analyses were conducted on the mean of the searching times measured by the two coders. Results were analyzed using ANOVAs with one between- or within-subject factor for Condition (if appropriate), and one within-subject factor for Outcome (box expected empty vs.