8 Summary
8.1 Overview
This report thoroughly assessed the formal RE model by Beisbart, Betz, and Brun (2021) by numerical investigation. We ran computer simulations for a broad spectrum of model parameters and initial conditions and used four different model variants. In this chapter, we summarize the most important findings with respect to the metrics described in Section 2.3.
Global Optima and Fixed Points
In Chapter 4, we investigated whether fixed points are global optima (GO efficiency) and, conversely, whether global optima are reachable by equilibration processes (GO reachability).
- Overall, GO efficiency is high for semi-globally optimizing models and medium-high for locally optimizing models.
- GO efficiency drops for locally optimizing models with the size of the sentence pool.
- For \(\alpha_A < \alpha_S\), GO efficiency of the
LinearLocalRE
model is as high as of the modelsQuadraticGlobalRE
andLinearGlobalRE
. - GO reachability is low to medium for all models.
- All models except the
QuadraticGlobalRE
model perform worse concerning GO reachability with an increase in the size of the sentence pool. - The
QuadraticGlobalRE
model outperforms all other models on average. - The
LinearLocalRE
model reaches a higher GO efficiency than theQuadraticLocalRE
model, but it is the other way around with respect to GO reachability.
Full RE States
In Chapter 5, we explored whether fixed points and global optima attain full RE states (i.e., global optima for which the theory fully and exclusively accounts for the commitments).
- Overall, the relative share of full RE states among global optima and fixed points is rather low.
- Heatmaps reveal combinations of weights for
GlobalQuadraticRE
,GlobalLinearRE
andLinearLocalRE
, where the relative share of full RE states among the outputs is acceptable. - There is a slight negative trend for the relative shares of full RE states among global optima and fixed points (result perspective) for increasing sentence pool sizes.
- The sentence pool size does not affect the relative share of full RE fixed points (process perspective) of
LinearLocalRE
.
Consistency
In Chapter 6, we assessed different aspects of consistency conduciveness of the model variants.
- The overall relative shares of consistent outputs, inconsistency-eliminating and consistency-preserving cases, as well as consistent unions, are satisfactorily high for all model variants.
- In view of increasing sentence pool sizes,
LinearLocalRE
performs best with respect to all examined aspects of consistency. - There are regions of weight configurations (\(\alpha_{A} > \alpha_{F}\)) that yield desirable behaviour concerning consistency across all model variants.
- A salient “tipping line” in heatmaps of linear model variants marks off regions of weight configurations that yield a fundamentally different behaviour. The analytical results from Appendix A explain these observations.
Extreme Measure Values
In Chapter 7, we investigated whether global optima and fixed points yield extreme values in the normalized measures \(A\), \(F\) and \(S\).
- Overall, there are no surprising observations: Increasing the weight of a specific measure leads to more outputs that maximize the corresponding measure.
8.2 Appendices
The appendices include additional material, which can be used to explain some of the simulation results and which motivates suggestions for further research.
The Tipping Line of Linear Model Variants
In Appendix A, we provide analytical results concerning a “tipping line” in linear model variants that help to explain various observations in the report.
- For \(\alpha_{A} > \alpha_{F}\), global optima of linear model variants always achieve full and exclusive account (\(A(\mathcal{C}, \mathcal{T}) = 1\)).
- For \(\alpha_{F} > \alpha_{A}\), the commitments of global optima of linear model variants are always maximally faithful to the initial commitments (\(F(\mathcal{C}\,\vert\,\mathcal{C}_{0}) = 1\)).
- These results can be generalized to fixed points of the linear model variants.
Note that the “tipping-line behaviour” we observed in the simulation results for the linear model variants concern their performance with respect to the various validation metrics and not which global optima and fixed points are reached. In other words, in each of the two regions (\(\alpha_{A} > \alpha_{F}\) and \(\alpha_{F} > \alpha_{A}\)), global optima and fixed points will generally depend on the \(\alpha\)-weight combinations. Otherwise, we would have observed the tipping-line behaviour in all results for the linear model variants, which we didn’t.
The described restriction of the tipping-line behaviour is essential because, without this restriction, we could formulate a substantive objection against using the linear model variants. If global optima (and fixed points, respectively) would only depend on whether \(\alpha_{A} > \alpha_{F}\) or \(\alpha_{F} > \alpha_{A}\), and, accordingly not change within these regions, the model would fail to represent different decisions as how to balance account and faithfulness in reaching reflective equilibria—at least, the decision would be trivialized into a binary decision. However, the whole idea of using the proposed achievement function with \(\alpha\) weights on a continuous scale is to allow for a fine-grained spectrum of balancing the different desiderata.
Trivial Endpoints
In Appendix B, we analyzed whether the model variants yield “trivial” outputs—that is, global optima or fixed points that consist of singleton theories and commitments.
- Overall, the relative share of trivial global optima and fixed points (result perspective) is very low for the quadratic model variants.
- Linear model variants exhibit substantially more trivial global optima, but the relative shares are still low.
LinearLocalRE
exhibits a substantial share of trivial fixed points from the process perspective but not from the result perspective.- The relative shares of trivial global optima or fixed points tend to decrease with increasing sentence pool sizes.
- In quadratic model variants, the \(\alpha\) weights have only a small impact on the relative shares of trivial endpoints.
Alternative Systematicity Measures
In Appendix C, we motivated alternative systematicity measures in view of shortcomings of the original systematicity measure in Beisbart, Betz, and Brun (2021). We discussed their advantages and disadvantages in terms of various desiderate for such measures (see Table C.1 for an overview).
One sophisiticated systematicity measure (Section C.3.1) is able to satisfy five out of six desiderata, but no proposed measure is able to satisfy all six of them. In view of the only intuitively motivated desiderata and the lack of simulation data, we conclude that these results are preliminary. In particular, they do not prescribe to replace the original measure of systematicity.
8.3 Conclusion
The results we arrived at are insufficient to draw general conclusions about the overall performance of the four analyzed model variants. Neither did we find conclusive evidence to exclude one model as generally inadequate, nor did we identify one model that outperforms the others in all aspects. Instead, each model variant meets some of the validation criteria to a sufficient degree within some ranges of simulation setups. In cases where a model variant performs poorly on average (over the spectrum of simulation setups), the others did as well. In other words, the performance of a model depends crucially on the specifics of the simulation setup (e.g., the chosen dialectical structure, sentence pool size, \(\alpha\) weights and initial commitments) and the evaluation criterion at hand.
This does not mean there are no differences between the model variants. Instead, in a specific context of using the RE model, there might be good reasons to prefer some model variant over the other. This is because the context might fix certain specifics of the simulation setup and provide independent reasons for them. Similarly, the context might give us a more nuanced picture of the relative importance of the different validation criteria. In light of such specifications, the results we presented can be used (possibly in combination with additional analyses) to choose a specific model (or at least exclude some).
For instance, the context might prescribe a limited range of \(\alpha\)-weight combinations. In other words, there might be independent reasons of how to balance account, faithfulness and systematicity. We already saw that a model’s performance is often highly sensitive to the chosen \(\alpha\) weights. Within this region, one might repeat all those dependency analyses we only averaged over all \(\alpha\)-weight configurations (e.g., a model’s performance in dependence of the sentence pool size). Then, it can (and will) still happen that the models perform differently with respect to the different validation criteria (consistency, reaching global optima and full RE states). However, that only means that there is a trade-off between these metrics. In other words, in addition to balancing account, faithfulness and systematicity, there is a balancing of those desiderata that are connected to the used validation criteria.
From this perspective, it is perhaps not that surprising and worrisome that the described results are mixed but in perfect agreement with central ideas about RE.
8.4 Outlook
In many ways, this technical report is but a starting point for future lines of research. In the following, we describe some promising and pressing issues that call for further research.
Note that the current Python implementation of the model is designed to facilitate extending the model (as demonstrated by the three model variants used in this report). Various components of the formal model, for instance, the measures account, faithfulness, and systematicity can be changed with a few lines of code (source).
8.4.1 The Neighborhood Depth and the Search Strategy of Locally Optimizing Model Variants
The local model variants examine available candidate positions for adjustments during RE processes in a small neighborhood of the current position. For this report, the search depth was confined to adjusting one single sentence per adjustment step. A particular shortcoming of such small neighborhood depths is that they may “miss” sensible adjustments that involve arguments with more than one premise.1 In particular, the adjustment of theories might be severely restricted.
It is important to note that considering larger neighborhood depths reintroduces an exponential growth of the search space depending on the size of the sentence pool. One might, therefore, worry that enlarging the neighborhood depth defies the original motivation to use locally optimizing models—namely, providing a model that works computationally feasible with larger sentence pools.
In view of this and additional reasons, it is worthwhile to devise and analyze locally optimizing models that implement other search strategies for finding subsequent epistemic states. For instance, the process might mimic a random walk, or we might allow the model to “backtrack” different branches, enabling them to avoid dead-ends (i.e., mere local optima).
8.4.2 Alternative Systematicity Measures
The measure of systematicity in the original formal model of Beisbart, Betz, and Brun (2021) has a shortcoming, as it does not discriminate between singleton theories on the basis of their scope (for formal details, see Section 7.1).
In Appendix C, we discussed several alternative suggestions to define systematicity and began to analyze them with respect to some intuitive criteria. These preliminary considerations should be complemented with the exploration of simulation results of corresponding model variants.
8.4.3 The Inferential Density of Dialectical Structures
We did not analyze the performance of the model variants in dependence on the inferential density of the randomly generated dialectical structures (for the definition, see Section 2.4.3). One reason for this omission was the worry that the generated 50 dialectical structures per sentence pool hardly correspond to a representative sample of dialectical structures. Accordingly, we did not analyze whether and to what extent model outcomes depend on properties of the dialectical structure other than the sentence pool size. Hence, it may be interesting to treat, for instance, inferential density as an independent variable to gain new insights about the model’s behaviour.
8.4.4 Extrapolation to Larger Sentence Pools
We considered only a confined range of sentence pools with few sentences (\(12\), \(14\), \(16\) and \(18\)). As it stands, the results of this report provide no solid basis to extrapolate our findings to larger sentence pools. Such results are, however, needed since it is pretty clear that applications of the formal RE model to somewhat realistic cases will involve much more sentences.2 It is, in particular, important to know whether and under which conditions locally optimizing model variants can reach global optima since a semi-global optimization is computationally infeasible with larger sentence pools. In these cases, some form of local optimization has to take over. However, the prospects of using locally optimizing models have to be evaluated carefully beforehand. To arrive at better estimates, one would need dedicated ensembles of simulations comprising larger sentence pools that simultaneously allow the calculation of global optima as reference points.
Results that might suggest such a shortcoming of local model variants can be found in Figure 4.4 and Figure 4.5.↩︎
For instance, the reconstruction of Thomson’s famous “The Trolley Problem” (2008) by Rechnitzer (2022) involves 25 (unnegated) sentences. This would amount to the daring task of considering \(3^{25}\) (roughly 850 billion) candidates per commitment adjustment step in an RE process with a semi-globally optimizing model variant.↩︎