To refer to this article use this url: http://ctoz.nl/vol84/nr02/a03

# Material and methods

####
*Simulated character sets*

next section
*Reference topologies.* Data was simulated on three topologies of 20 ingroup OTUs and one outgroup OTU and a total of 18 branch length settings (Fig. 5). One of the reference topologies is fully pectinate (*i.e.*, fully asymmetrical), a second is fully symmetrical for the ingroup (balanced) and a third is an equiprobable (randomly generated) tree, intermediate in asymmetry between the two others (Fig. 5A-F). In this paper, branch lengths represent evolutionary time, but given that we simulated characters using Brownian motion (see below), branch lengths also reflect expected character variance (Felsenstein, 1985).

*Reference trees.* Three outgroup branch lengths were used on each of the reference topology, leading to nine ultrametric reference trees (Fig. 5A-C). Nine other branch lengths differing in both ingroup and outgroup were specified on the equiprobable tree with taxa of various geological ages, leading to 9 additional non-ultrametric reference trees (Fig. 5D-G). For six out of these nine trees, all terminal and internal branch lengths of the ingroup were set to one, and the outgroup branch was set at zero, one, three, five, six tenths, or the full tree depth (Fig. 5D). Three out of the nine trees (Fig. 5E-G) were generated by modifying the ingroup branch lengths on the equiprobable tree with the outgroup branch set at zero length (actual ancestor).

*Simulated matrices.* From each of the nine reference trees illustrated in Fig. 5A-C, 100 matrices of 100 characters × 10 states were simulated (*i.e.*, a total of 900 matrices). Similarly, 100 matrices of 100 characters × 10 states were simulated for each of the nine paleontological trees illustrated in Fig. 5 D-G (*i.e.*, a total of 900 paleontological matrices). We thus simulated a total of 1800 matrices, which were produced using Mesquite and the scripts in Supplementary Online Materials 1 (S1) on data that were discretized using Excel spreadsheets (S2 for parsimony; S3 for 3ta) and that were compiled into S4 (which are the matrices in the parsimony format).

*Character coding*. The characters were simulated with continuous Brownian motion in Mesquite (Maddison and Maddison, 2014) to represent data inherently ordered as morphoclines (such as size or shape characters). Simulations were made using this evolutionary model because it is one of the simplest and most widely used in evolutionary biology to study the evolution of continuous phenotypic characters. For example, phylogenetic independent contrasts (Felsenstein, 1985) and squared-change parsimony (Maddison, 1991) assume this model. Characters simulated through Brownian motion are continuous; they were then discretized into 10 equal intervals representing character states, in order to simulate morphoclines following the simple procedure described in Laurin and Germain (2011). Because Brownian motion has no tendency, the resulting distribution is Gaussian; thus, gap coding cannot be used, and the limits between states are arbitrary. The primitive condition is determined by the outgroup criterion. The variable outgroup branch lengths allow us to assess the influence of polarization errors, whereas the variable ingroup branch lengths allow assessment of the impact of geological age of ingroup taxa on tree resolution (presumably by altering support of the clade subtended by the various branches), thus enabling a comparison of paleontological and neontological datasets. Each of the 100 characters (in the 1800 matrices) was coded in three different ways corresponding to unordered parsimony, ordered parsimony (with linear character states) and 3ta.

#####
*Tree searches*

Characters were analysed with the three methods. Thus, 5400 phylogenetic analyses were performed (three analyses for each of the 1800 matrices).

Cladistic analyses were performed with PAUP* 4.0b10 (Swofford, 2003) for both ordered and unordered parsimony. Three-item analyses were also performed on PAUP* 4.0b10 but parsimony matrices were transformed into three-item statements matrices with fractional weighting (Mickevich and Platnick, 1989; Nelson and Ladiges, 1992) using LisBeth 1.3 (Zaragüeta et al., 2012). All analyses were done with a heuristic search of 50 replicates (a number of searches that our preliminary analyses suggested was sufficient to recover all optimal trees for our matrices), using the TBR algorithm. A strict consensus (Sokal and Rohlf, 1981) of all optimal trees was then constructed using PAUP*.

#####
*Tree comparisons*

The optimal tree yielded by the data analyses (or strict consensus tree when several optimal trees were found) were compared with the reference phylogenies, to assess accuracy of the results. We compared the behaviour (resolving power and artefactual resolution) of ordered parsimony, unordered parsimony and three-item analysis (3ta) in phylogenetic inference, as recently done by Grand et al. (2013).

Several methods are available to compare unrooted trees (Robinson and Foulds, 1981; Estabrook et al., 1985), but we prefer using rooted trees because they are the only classificatory structures which can convey unambiguous information about phylogenetic relationship (Rohlf, 1982). An unrooted tree conveys information, but it is ambiguous in the sense that it is compatible with several alternative relationships between taxa. Tree comparisons were performed with the ‘Inter-Tree Retention Index’ (ITRI) proposed by Grand et al. (2013) to measure the degree of congruence between two trees. The ITRI is based on the proportion of relationships (*i.e.* 3ts relationships between three OTUs, irrespective of their relationships with other OTUs) that are common to two trees. An advantage of this method is that the computed tree-to-tree similarities are not symmetrical (*i.e.*, the proportion of relationships is either the proportion for one tree or for the other). It is thus possible to discriminate between resolving power (power to ﬁnd correct relationships) and artefactual resolution (incorrect relationships), by comparing trees obtained from simulated data with the tree used to generate these data. ITRI is equivalent to the retention index (RI) (Farris, 1989; Archie, 1989), calculated as a proportion of 3ts (Kitching, 1998). The ITRI is deﬁned as:

Where X stands for the sum of fractional weighting (FW) of 3ts implied by a character compatible with a given tree; n is the sum of FW of all 3ts obtained from the character. In each pairwise tree comparison, strict consensus trees were used to summarize congruence between results of an individual analysis (on a single matrix of 100 characters). See Grand et al. (2013) for a more detailed explanation. A strict consensus tree was built whenever analysis of a matrix produced more than one tree.

Our results are presented without mention of the number of optimal trees because the resolving power and artefactual resolution are the direct expression of data congruence and of the ratio homology/homoplasy, and that the ITRI expresses it well. Generally, a strict consensus tree of a thousand trees will show lower resolving power and artefactual resolution than a consensus of ten trees, because of data ambivalence. But there is always the possibility that an analysis will result in one false tree versus another analysis with one hundred slightly better trees. Kearney (2002) argued that ‘resolution of relationships is obviously a goal of phylogenetic analysis’, but her results show that phylogenetic performance cannot be measured by the number of trees. To avoid confusion with artefactual resolution, we complete her sentence by stating that it is *correct* resolution which is a goal of phylogenetic analysis. We think it is a necessary adjunction to remove the ambiguity between resolution and number of trees. Correct resolution does not display a simple relationship with the number of optimal trees.

#####
*Testing the results*

Each consensus tree summarizing a phylogenetic analysis is compared with the reference tree using the ITRI to assess resolving power and artefactual resolution yielded by each method, each set of outgroup branch length, and each topology. Resolving power is calculated as the proportion of FW for 3ts of the reference tree that are also present in the consensus tree, and can be understood as the proportion of ‘true information’ the analysis has retrieved. Artefactual resolution is calculated as the proportion of FW for 3ts of the consensus tree obtained from an analysis that is not present in the reference tree, and this represents artefactual resolution yielded by the analysis. We calculated means of ITRI (*i.e.*, we got a mean value for resolving power artefactual resolution on each reference topology). The statistical significance of differences in resolving power and artefactual resolution of the various methods of analysis was tested with the Wilcoxon signed-rank test (for paired samples) to compare means of ITRI, because there was no sample with normal distribution (Shapiro-Wilk test) and homoscedasticity of variances (Fisher test). Differences in topologies and branch lengths were tested using a Mann-Whitney test, and linear regressions were performed to test tendencies were the external/internal branch length ratio varies, as in trees D, E, F and G (Fig. 5). Because many comparisons were made, we use the false discovery rate procedure (Benjamini and Hochberg, 1995) to assess the statistical significance of the differences in performance.