In this article, Dominik, the key initiator behind the longsword tournament mode used at the last Sterzhaw, discusses the mode in more detail and explains the logic behind it!
Table of Contents
- The Ruleset used at Sterzhaw 2022
- Discussion
- Outlook
- Sources and Links
- Data
by Dominik Vucak and Gerhild Grabitzer
The Ruleset used at Sterzhaw 2022
The official ruleset by the ÖFHF (association HEMA Austria) already offers a lot of flexibility to organizers, consisting of modules that can be put together individually for any sanctioned tournament.
However, we wanted to change some key features to provide the HEMA Hivemind™ with food for thought on two main subjects, which will be discussed in this article. We felt it most fruitful to change only what we needed, without overcomplicating matters for fencers and referees alike. To ensure a smooth procedure, a “training” tournament for the judges’ assistants (time takers, writers etc.) was held prior to the Sterzhaw itself. It is also worth noting that this piece addresses the “open” tournament category. The beginner’s tournament was held with ÖFHF standard rules.
Our main points:
- In contrast to many (often unarmed) martial arts, armed confrontations are usually not zero-sum games. This affects pressure testing of the skills associated with armed martial arts and is reflected in Sterzhaw’s scoring system.
- Tournaments have many options to enact fairness upon their participants. Fairness is important for a healthy competitive environment and is reflected in Sterzhaw’s tournament organisation.
Scoring
In most HEMA longsword tournaments, fencers earn points by hitting their opponent, which ends the current exchange. Specific rules define what constitutes a “hit”. It can be a thrust, cut, grappling etc. The first fencer to exceed a point threshold wins the bout.
At Sterzhaw 2022, instead of fencers winning or losing individual bouts, each exchange was scored separately, resulting in an (often negative) score that was carried over until the end of the tournament. At the end of the day, the fencer with the highest score, averaged over the number of their exchanges, won the tournament. Each bout consisted of three exchanges with a maximum duration of 30 seconds per exchange.
Event | Points for Alice | Points for Bob |
Alice hits Bob | +1 | -2 |
Bob hits Alice | -2 | +1 |
Double hit | -2 | -2 |
Time Out (30 sec) | -2 | -2 |
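The scoring table above can be sketched in a few lines of Python. This is only an illustration, not the software used at the tournament: the event names (`a_hits_b` etc.), the function `average_scores` and the example bout are made up for this sketch, while the point values are taken from the table.

```python
# Points per exchange as listed in the table above: (points for Alice, points for Bob).
POINTS = {
    "a_hits_b": (+1, -2),
    "b_hits_a": (-2, +1),
    "double_hit": (-2, -2),
    "time_out": (-2, -2),
}

def average_scores(events):
    """Return the per-exchange average score for Alice and Bob."""
    total_a = sum(POINTS[event][0] for event in events)
    total_b = sum(POINTS[event][1] for event in events)
    n = len(events)
    return total_a / n, total_b / n

# One bout of three exchanges with made-up outcomes.
bout = ["a_hits_b", "double_hit", "b_hits_a"]
avg_a, avg_b = average_scores(bout)
print(avg_a, avg_b)
```

Note that every row of the table hands out a negative total to the pair, which is why averaged scores tend to be below zero.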
Tournament Procedure
The fencers were distributed randomly across four pools (each with their own arena). Then the participants engaged in an all-play-all within their respective pools. This was held in parallel across all four arenas, maximizing the number of exchanges in the given timeframe. After each round, a certain number of fencers per pool migrated to another arena, so new pools were formed. This guaranteed that all fencers could fence with a large number of different partners and had overall little down time.
After the pool rounds, which consisted of 648 individual exchanges between 26 fencers, the top 4 fencers were selected for finals and put into a separate finals pool, again all-play-all. When it was time to crown a winner, the points from all the previous pool rounds carried over.
To sum up: the biggest changes from the ruleset most fencers knew at the time were the point deductions for receiving a hit or double hit, the three exchanges instead of a classic match, and the fact that points were carried over throughout the whole tournament and averaged at the end.
Discussion
In the following section, we discuss in more depth what distinguished this tournament from many others held in Austria.
Double-Hit Penalty
Firstly, at Sterzhaw, bilateral hits were not counted separately from clean hits. A bilateral hit is the umbrella term for either a double hit (both fencers hitting each other within the same tempo) or an afterblow (the fencer scoring a hit is hit back by their opponent within a predefined timeframe).
In accordance with ÖFHF rules, at Sterzhaw no afterblow rules were employed, so each hit was either a clean one or a double.
While bilateral hits are not necessarily a bad thing, too many are generally considered an indicator of unskilled, hazardous fencing. Some tournaments combat this with extra rules punishing bilateral hits. Such rules count bilateral hits separately from clean hits and can lead to strange outcomes, e.g. someone winning a match despite having landed only one clean hit.
At Sterzhaw, 10,5% of exchanges resulted in double hits. Considering the overall average of a 27% double hit rate in tournaments which also feature doubles but no afterblows, this is quite a good outcome. However, many other variables besides the tournament mode contribute to the number of double hits. Furthermore, the rate of double hits varies widely between tournaments, ranging from very few to a lot. So it has to be said that the average of 27% has little statistical value. We can state, however, that Sterzhaw 2022 was on the lower end of the spectrum (Sword Stem, 2020).
Non-Zero-Sum Tournament Scoring
Secondly, Sterzhaw II was a non-zero-sum tournament with a continuous scoring system. Since each individual exchange contributed to a fencer’s score, a single bout was neither won nor lost. The participants just walked away with a new point score on their ledger. The alternative is a zero-sum game, where one party gets something (a won match) and the other party loses something of equivalent value. In our scoring, by contrast, the points the two fencers receive for an exchange do not cancel each other out, making it a “non-zero-sum” game.
Therefore, to come out on top, a fencer’s priority is not “being a little bit better than their current opponent” but receiving the fewest hits among all participants. To avoid overly passive fencing, a time limit of 30 seconds was set. This is not to say that a fencing style focusing on protecting oneself is “bad”. However, in this sportive setting we aimed to disincentivise not engaging one’s partner in order to avoid receiving a hit.
To summarize our key elements:
In a zero-sum HEMA tournament, the optimal strategy is usually “hit them more than they hit you”, meaning being hit is only a problem if you don’t hit them back enough. This differs from the usual gold standard of HEMA fencing behaviour “hit and don’t get hit”. This gold standard is what tournaments as tools for developing martial skills should strive to test for, in order to promote excellence.
As soon as we imagine the same exchanges without protective gear, it becomes very obvious why simply hitting them more can’t be the best solution.
Organisation
The following paragraphs explain how fencers were grouped and matches were paired. As mentioned before, Sterzhaw II featured only pool rounds. These pool rounds prioritised the raw number of exchanges against many different opponents to approximate fair play. This also typically negates the benefits of systems that try to produce fair results despite a limited number of competitions, such as seeded pools or a “Swiss-tournament system” (which are difficult to run under time constraints).
Having many different opponents not only reduces the significance of any single opponent you face but also produces a more diverse experience for each participant. The tournament therefore becomes more robust against odd match pairings. For example, being in the same starting pool as Rambo McLongsword, the best fencer to ever walk the earth, won’t decrease your chance of getting into the finals as much, since you fence a higher number of matches against other opponents (who also have to face him). The same is true for a Lemming McSelfdestruct who decreases your score by provoking lots of bilateral hits.
With 216 matches held in the qualification pools, Sterzhaw 2022 did a pretty decent job of approximating a full all-play-all of 325 matches (n*(n-1)/2 with n = 26 competitors), where all competitors would have faced each other, especially considering that not all pools contained the same number of fencers. In our particular case, this led to some participants facing each other twice and not all getting the same number of exchanges.
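As a quick check of these numbers, here is a small Python sketch. The function name `all_play_all_matches` is our own, and the figures (26 fencers, 648 exchanges, three exchanges per match) are the ones given in this article.

```python
def all_play_all_matches(n: int) -> int:
    """Number of matches if each of n fencers meets every other fencer once."""
    return n * (n - 1) // 2

fencers = 26
exchanges_fenced = 648
exchanges_per_match = 3

full = all_play_all_matches(fencers)
held = exchanges_fenced // exchanges_per_match
coverage = held / full
print(f"{held} of {full} matches ({coverage:.0%})")  # 216 of 325 matches (66%)
```

Since some pairs met twice, the share of distinct pairings actually fenced is somewhat lower than this ratio suggests.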
With a non-zero-sum approach, a single-elimination style final is not possible. This is not necessarily a disadvantage, as eliminations, while being time efficient, are generally considered to have a low probability of determining the best competitor. To produce a fairer outcome, we opted for points to be carried over from the pool rounds to the finals. The downside is that this does not lead to as exciting a finale as one with finalists starting from scratch. Since overall results were not published prior to the final pool at Sterzhaw, the difference was not that big from a spectator’s perspective.
Now, why did we opt for three exchanges per match and not one or 20? To test historical fencers, an easy case can be made for having opponents fence exactly one exchange per match, as this tests for the martial skills associated with HEMA. However, in practice, three exchanges give a good ratio of fencing to organisational downtime. Additionally, as fencers ourselves, we know that getting a second (or third) chance at scoring a point helps keep frustration levels low.
That being said, there is much room for debate on how to rank tournament participants in a fair way, especially because there is a stark discrepancy between what is fair in a technical sense (meaning that each rank is determined with high probability) and what feels fair to the affected individual. Generally, the fairest modes of ranking consume the most resources, e.g. time or manpower.
The significance of a ranking (tournament fairness) is made up of two factors, namely “quality of the measurements” (the certainty as to which fencer is more skilful when comparing two) and the “sample size” (the number of such comparisons). In practice, a compromise needs to be reached between these two.
Erring on the side of larger sample sizes leads to tournaments better fulfilling their secondary role of being a practice playground. If resources are too scarce, a tournament mode might not produce a large enough sample size to determine a fencer’s skills at all.
Also in question is: Should each rank get the same significance, or should higher ranks get more attention because they carry more value? With an attempted all-play-all and only a small finale of little overall significance, we opted for the first option. The latter is not only problematic for the development of newcomers, but it can also more easily (earlier in the tournament) lead to matches without meaning for the ranking, which can be a problem for fair play, especially when the match is meaningless for only one of the two fencers. Tournaments as a tool to fairly rank athletes by their skill should strive to determine the contestant ranking with high probability in order to promote excellence.
The scoring system at Sterzhaw was admittedly arbitrary. Since fencers weren’t used to it, we gave -2 points to discourage being hit (and, importantly, to deal with loss aversion) and +1 to encourage engaging the opponent. This was done mainly to make the scoring overly obvious. Mathematically speaking, it doesn’t matter how many points are awarded for each event, as long as a hit is worth more than being hit. We could assign +100 points for a hit and +99 points for being hit, and the ranking of the fencers would not change, because it is not about points, it is about the ratio of the events hit vs. being hit.
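To illustrate why the point values do not affect the ranking, here is a minimal Python sketch. It assumes that every exchange ends either with a clean hit for the fencer or with the fencer being hit (doubles and time-outs count as being hit), so the averaged score is simply got_hit_points + (hit_points - got_hit_points) * survival rate. The function names are made up for this sketch, and the three survival rates are taken from the data table at the end of the article.

```python
def average_points(survival_rate, hit_points, got_hit_points):
    """Average score per exchange for a fencer who lands a clean hit
    in the given fraction of exchanges and is hit in all others."""
    return got_hit_points + (hit_points - got_hit_points) * survival_rate

# Survival rates (as fractions) of three fencers from the data table.
survival = {"Florian": 0.250, "Emily": 0.711, "Quinn": 0.630}

def ranking(hit_points, got_hit_points):
    """Names sorted from best to worst average score under a point scheme."""
    scores = {
        name: average_points(rate, hit_points, got_hit_points)
        for name, rate in survival.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

sterzhaw = ranking(+1, -2)        # the scheme used at Sterzhaw 2022
alternative = ranking(+100, +99)  # the deliberately odd scheme from the text
print(sterzhaw == alternative)
```

As long as a clean hit is worth more than being hit, the average grows with the survival rate, so both schemes put the fencers in the same order.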
We applied this to the Sterzhaw data, did away with any points, calculated the chance of a fencer emerging unscathed from any given exchange and ranked the fencers again by the new metric. As expected, the ranking stayed the same, but the performance metric (dubbed the “Survival Rate” for lack of a better name) is much clearer (see table 2). In the pool rounds, it ranged from 25% to 71%. I think this is a much better ranking metric for the future than an abstract average point value, as it makes performance more intuitive to tally.
A further improvement could be made by lengthening the time per exchange from 30 seconds to e.g. 1 minute. Despite no time-out being recorded, some participants reported rushed fencing behaviour. This may not only lead to unsafe situations; fencers who are unable to probe their opponents may also make less use of complex techniques. Martial arts tournaments usually want neither.
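The Survival Rate can be recomputed from the data table at the end of the article. The sketch below is our reading of the table columns rather than the original spreadsheet: it assumes that a fencer “survives” every exchange in which they do not take a hit (bilateral hits included), and the function name is made up. For the three rows shown, it reproduces the published values.

```python
def survival_rate(exchanges, hits_taken):
    """Percentage of exchanges in which the fencer was not hit."""
    return 100 * (exchanges - hits_taken) / exchanges

# (No. of Exchanges, Hits Taken incl. Bilat.) for three rows of the data table.
rows = {"Emily": (45, 13), "Alice": (54, 31), "Florian": (48, 36)}

for name, (exchanges, hits_taken) in rows.items():
    print(f"{name}: {survival_rate(exchanges, hits_taken):.1f}%")
# Emily: 71.1%
# Alice: 42.6%
# Florian: 25.0%
```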
Overall we are very proud of having tried something entirely new. The tournament was incredibly fast-paced with almost no downtime for boredom and no fencers standing around because they were eliminated early on in the tournament. We also want to point out that no injuries were inflicted during the tournament!
Gerhild (the organizer) wants to thank Dominik for his continued hard work in coming up with the tournament mode, the many explanations and help in training our amazing helpers.
Dominik wants to thank the whole Sterzhaw organizing team because it was a great event! A big acknowledgement goes to SwordSTEM, a blog about applying scientific principles to HEMA. We strongly encourage you to take a look at it. Most of the things discussed in this article and the general shape of the Sterzhaw tournament are taken from there.
Sources and Links:
More on non-zero-sum tournaments and our main sources: https://swordstem.com/2018/05/17/non-zero-sum-concepts-in-tournament-rulesets/ and https://swordstem.com/2019/01/02/fny-rules-are-not-about-single-exchanges/
A deeper dive into this and the importance of significant tournament outcomes: https://swordstem.com/2019/01/02/fny-rules-are-not-about-single-exchanges/ and https://swordstem.com/2018/08/29/does-bracket-seeding-mean-anything/
Loss aversion: https://swordstem.com/2019/02/20/loss-aversion-counting-up-vs-down/
Sign up for our 2023 Sterzhaw: www.sterzhaw.at/sign-up
Review of the Sterzhaw 2022 in general: https://hemagraz.com/2022/12/06/sterzhaw-ii-mehr-klingen-mehr-kernol/
The Data
Here are the anonymised statistics from Sterzhaw. We would like to point out that Sterzhaw 2022 was only a single tournament with 26 participants, which limits any conclusions and insights that can be drawn from this data.
Fencer | Rank | Survival Rate [%] | Avg. Points | No. of Exchanges | Hits Taken (incl. Bilat.) | Bilateral Rate [%] | Bilat. / Hit Taken [%] |
Alice | 15 | 42,6 | -0,72 | 54 | 31 | 18,5 | 32,3 |
Bob | 6 | 56,3 | -0,31 | 48 | 21 | 12,5 | 28,6 |
Claire | 18 | 39,6 | -0,81 | 48 | 29 | 6,3 | 10,3 |
Dominic | 3 | 58,8 | -0,24 | 51 | 21 | 9,8 | 23,8 |
Emily | 1 | 71,1 | 0,13 | 45 | 13 | 8,9 | 30,8 |
Florian | 26 | 25,0 | -1,25 | 48 | 36 | 14,6 | 19,4 |
Giulia | 13 | 43,1 | -0,71 | 51 | 29 | 5,9 | 10,3 |
Harry | 25 | 26,5 | -1,20 | 51 | 37 | 21,6 | 29,7 |
Isabel | 20 | 35,6 | -0,93 | 45 | 29 | 17,8 | 27,6 |
Joseph | 9 | 50,0 | -0,50 | 48 | 24 | 2,1 | 4,2 |
Karen | 22 | 33,3 | -1,00 | 54 | 36 | 11,1 | 16,7 |
Lorence | 5 | 58,3 | -0,25 | 48 | 20 | 6,3 | 15,0 |
Martha | 11 | 45,1 | -0,65 | 51 | 28 | 5,9 | 10,7 |
Nick | 16 | 41,2 | -0,76 | 51 | 30 | 9,8 | 16,7 |
Ophelia | 21 | 35,4 | -0,94 | 48 | 31 | 8,3 | 12,9 |
Patrick | 16 | 41,2 | -0,76 | 51 | 30 | 17,6 | 30,0 |
Quinn | 2 | 63,0 | -0,11 | 54 | 20 | 9,3 | 25,0 |
Rick | 3 | 58,8 | -0,24 | 51 | 21 | 7,8 | 19,0 |
Sandra | 6 | 56,3 | -0,31 | 48 | 21 | 12,5 | 28,6 |
Tim | 24 | 27,1 | -1,19 | 48 | 35 | 14,6 | 20,0 |
Ursula | 13 | 43,1 | -0,71 | 51 | 29 | 7,8 | 13,8 |
Victor | 12 | 44,4 | -0,67 | 54 | 30 | 9,3 | 16,7 |
Whitney | 18 | 39,6 | -0,81 | 48 | 29 | 4,2 | 6,9 |
Xavier | 23 | 27,5 | -1,18 | 51 | 37 | 11,8 | 16,2 |
Yvonne | 8 | 51,0 | -0,47 | 51 | 25 | 11,8 | 24,0 |
Zeke | 10 | 45,8 | -0,63 | 48 | 26 | 6,3 | 11,5 |