Test Guidance

Overall

You can use the template from one of the tests at https://toebes.com/codebusters/ and just replace the questions with your own.

For all questions, it is important to carefully check the encoded text for any inappropriate or potentially offensive words that may be generated by the encoding.

When writing the question text, all cipher or plain decoded text should be indicated by using a bold courier font.

Timed question

This should be an Aristocrat with a hint (see Aristocrats/Patristocrats).

Atbash Cipher {3.e.i} {9.d}

The Atbash is one of the easiest Ciphers for students to encode or decode because the alphabet is fixed. The letter v will always stand for the letter e and vice-versa. There should only be one Atbash Cipher question on a test.
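As a quick way to generate or check an answer key, the fixed Atbash mapping can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation, and the function name atbash is made up for the example.

```python
import string

# Atbash pairs A<->Z, B<->Y, ... so the same table both encodes and decodes.
ATBASH = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase,
    string.ascii_uppercase[::-1] + string.ascii_lowercase[::-1],
)

def atbash(text: str) -> str:
    """Encode or decode text with the Atbash cipher; non-letters are unchanged."""
    return text.translate(ATBASH)
```

Because the mapping is its own inverse, running the function on the cipher text returns the plain text, which makes it handy for double-checking an Encode answer letter by letter.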

Setting Difficulty:

The only factor for difficulty with this question is in the number of characters in the phrase. The choice of letters/words/word length has no impact on the difficulty. In general the question should be between 45 and 80 characters. It is only slightly harder to have the students Encode because it isn't obvious when they have the correct answer. An Encode problem is also slightly harder to grade as it requires carefully checking each letter instead of simply reading the answer.

Points:

An Atbash Cipher Decode should be worth around 100 points (approximately 1.2 points per character plus 2.5 points per unique character in the Plain Text). An Atbash Cipher Encode could be worth 120-150 points depending on the length of the encode string (approximately 1.2 points per character rounded to the nearest 10).

Question Text:

The question should clearly indicate that the text has been encoded using the Atbash Cipher as well as the origin of the phrase or quote. It should not include a hint. Some examples:

Solve this quote from <person> which has been encoded with the Atbash Cipher.

Encode this quote by <person> using the Atbash Cipher.

Caesar Cipher {9.e} {3.e.ii} {3.e.i}

The Caesar Cipher is a fairly simple cipher where once you know a single letter mapping, the remainder of the problem is a simple shift of every other letter by the same amount. There should be no more than one or two Caesar Cipher questions on a test.

Setting Difficulty:

There are two factors which affect the difficulty. First is the length of the cipher, which in general should be approximately 80-90 characters. Having a one- or two-character word in the Plain Text also makes it easier to solve. Second is the Shift of the letters. Shifts of 1, 2, 3 or 13 are the easiest because once the shift is known, it can generally be done without even looking it up in a table. 13 is a special case because it operates similarly to the Atbash in that letters map in pairs. It is only slightly harder to have the students Encode because it isn't obvious when they have the correct answer. An Encode problem is also slightly harder to grade as it requires carefully checking each letter instead of simply reading the answer.
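The effect of the shift, including the pairing behavior of 13, can be seen with a small sketch. It assumes the common convention that encoding moves each letter forward in the alphabet by the shift; the function name caesar is hypothetical and the tool may define the direction differently.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift each letter forward by `shift` places, wrapping from Z back to A."""
    upper = string.ascii_uppercase
    k = shift % 26
    shifted = upper[k:] + upper[:k]
    table = str.maketrans(upper + upper.lower(), shifted + shifted.lower())
    return text.translate(table)
```

Applying a shift of 13 twice returns the original text, which is the sense in which letters map in pairs. For any other shift, decoding uses the negative of the encoding shift.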

Points:

A Caesar Cipher Decode should be worth 100-120 points (approximately 1 point per character in the Plain Text plus 2.5 points per unique character), with 120 points being associated with a Caesar with a larger shift and no short words. A Caesar Cipher Encode should be worth 120-160 points, with the absence of short words, larger shift values and longer phrases moving it toward the top of the range.

Question Text:

The question should clearly indicate the origin of the phrase and that it has been encoded using a Caesar Cipher. It should not indicate the shift amount (for a Decode) or include a hint. For example:

Solve this quote from <person> which has been encoded with the Caesar Cipher.

A quote from <person> has been encoded with the Caesar Cipher. What does it say?

Encode this quote from <person> using a Caesar Cipher with a shift of 9 (e.g. A encodes as J).

Aristocrats/Patristocrats {3.e.iii}

These will take up the bulk of the questions. It is helpful to search for a variety of quotes and phrases to use in order to pick the ones which best meet your needs. A good quote will have around 20 words and about 100-120 characters (including spaces) for Aristocrats and 110-130 characters for Patristocrats. Note that the only difference between an Aristocrat and a Patristocrat is whether you take the spaces out or not, so the same quotes work for both. However, you generally want slightly longer phrases for the Patristocrats to give more patterns for the team to find.

Setting Difficulty:

There are multiple factors which affect the difficulty of an Aristocrat. First and foremost is the match of the phrase to the frequency of distribution of letters relative to the standard distribution of English letters:

ETAONIRSHLDCUPFMWYBGVKQXJZ
13%  9%  8%  7%  6%  4%  3%  2%  1%  -

Table 1 - Frequency of English Letters

Fortunately the tool will calculate the Chi-Square Value to give an indication of how close the phrase is to a standard distribution. The lower the value, the easier the phrase.
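For a rough idea of what such a value measures, here is a minimal sketch of a chi-square comparison against English letter frequencies. The frequency table holds approximate, commonly cited percentages and is an assumption, as is the exact formula; the tool may use different reference values or scaling, so numbers from this sketch should not be expected to match the tool's.

```python
from collections import Counter

# Approximate English letter frequencies in percent (assumed reference values).
ENGLISH_FREQ = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.49, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}

def chi_square(text: str) -> float:
    """Sum of (observed - expected)^2 / expected over the 26 letters."""
    letters = [c for c in text.upper() if c in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return 0.0
    counts = Counter(letters)
    total = 0.0
    for letter, pct in ENGLISH_FREQ.items():
        expected = n * pct / 100
        total += (counts[letter] - expected) ** 2 / expected
    return total
```

A phrase made up mostly of rare letters scores far higher than an ordinary English sentence, which is the behavior the guidance relies on.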

Second, the choice of words matters. A phrase using obscure/archaic language or atypical words will be more difficult than one which uses simple words. The presence of one- and two-letter words helps to make the problem easier.

Third, phrases that start with "it is", have multiple occurrences of "the" or contain the words "these", "there", "little" or "people" tend to be easier. You will also want some samples which have repeated words to use for test questions providing hints. In general, it is good to avoid quotes which are unattributed or by anonymous, so that the author of the quote can serve as an extra hint.

When spelling and grammar errors are introduced, the problem becomes much harder. Likewise, eliminating all the spaces for a Patristocrat makes it the hardest to solve.

In order to make the harder problems solvable, it is good to consider using a K1 or K2 alphabet to provide some additional hints. A K2 alphabet is slightly harder than a K1 alphabet, but both are easier than a random alphabet.

Points:

Aristocrats/Patristocrats have a wide range of scores, but there is some general guidance to follow in setting the score. Each of the types below provides a nominal range to start with. When encoding the sample text, the tool will report a Chi-Square Value. The range of the value provides a general guide to the difficulty of the problem (0-20 = Easy, 20-30 = Medium, 30-40 = Medium Hard, 40-50 = Difficult, >50 = Extremely Difficult). Taking the Chi-Square Value indication as a starting point, also look for short words and pattern words in the phrase as an adjustment to the difficulty up or down. A problem that is perceived to be Easy should not be worth more than 300 points while a problem that is Difficult should not be worth less than 500 points. As a rule of thumb, an Easy problem should start with a base of 200 points while a Difficult one should be 750 points. For this reason it is often good to start with a number of phrases to determine their difficulty and then assign them to the type of Aristocrat/Patristocrat problem. It is also worth noting that a phrase that has a high Chi-Square Value, while it may not be ideal for an Aristocrat/Patristocrat, can easily be used for just about all the other Cipher types including Baconian, Affine, Atbash, Caesar and Vigenère.

A problem can be made less difficult in two ways. Adding a hint that reveals characters makes it easier to solve. With a hint, each letter beyond 3 letters for an Aristocrat (or 5 letters for a Patristocrat) revealed by the hint should drop 50 points from the score. By using a K1 or K2 alphabet, the score can also be dropped by 100 and 75 points respectively.

Eliminating the spaces to make a phrase into a Patristocrat automatically adds 500 points to the difficulty and eliminates any benefit from short words.

For example starting with a phrase:

Is there any knowledge in the world which is so certain that no reasonable man could doubt it?

with a Chi-Square of 11.50, multiple short words (it, is, in) and a hint of three letters (man), this problem would be one of the easiest, with a score of 200. It would also make for a good timed question. By changing it to a Patristocrat, we would add 500 to get a base score of 700. If we encode it with a K1 alphabet we drop 100 points for a score of 600. At this range we should also be providing a hint that it starts as "isthe", which reveals only 5 characters and so leaves this as a 600 point question.
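The adjustments described in this section can be collected into a small helper to keep the arithmetic consistent across questions. This is a sketch of the rules as stated above; the function and parameter names are invented for the example, and the base score still has to come from your own judgment of the phrase.

```python
def adjusted_score(base, patristocrat=False, alphabet=None, hint_letters=0):
    """Apply the Patristocrat, K1/K2 and hint adjustments to a base score."""
    score = base
    if patristocrat:
        score += 500  # removing the spaces adds 500 points
    if alphabet == "K1":
        score -= 100
    elif alphabet == "K2":
        score -= 75
    # The first 3 hint letters (5 for a Patristocrat) cost nothing;
    # each additional revealed letter drops 50 points.
    free_letters = 5 if patristocrat else 3
    score -= 50 * max(0, hint_letters - free_letters)
    return score
```

For the worked example, adjusted_score(200, patristocrat=True, alphabet="K1", hint_letters=5) gives 600, matching the figure above.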

Aristocrat with a hint {3.e.iii.(1)}

Points:

A standard Aristocrat with a hint will be about 200 points. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is an Aristocrat, provide the source of the phrase and a hint of what word(s) may be found in the decoded phrase. For example:

Solve this Aristocrat which is a quote by <person> which has the word MAN in it.

<person> was heard to say the following phrase which starts with: Never.

Aristocrat without a hint {3.e.iii.(2)}

Points:

A standard Aristocrat without a hint will be between 250 and 350 points. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is an Aristocrat and provide the source of the phrase. For example:

Solve this Aristocrat which is a quote by <person> when they <event>.

During <event>, <person> was heard to say the following phrase.

Aristocrat with spelling/grammar errors and a hint {3.e.iii.(3)}

These can be a lot of fun to generate and can be something that the students relate to. In general the thought process here is something that their phone mistyped when texting using voice. Generating these takes a couple of tries to get something that works out well. It is best to start out with phrases that contain homophones (like you/ewe/hue) and then either generate variations by hand or use a homophone generation tool (like http://homophonemachine.allaboutlearningpress.com/ or https://evashort.com/homophone/). Sometimes you can get lucky trying to send it as a voice text message with Siri or Google Voice to get a phrase which has been slightly twisted. You may want to try a couple of times to get something that is appealing. The phones have gotten a lot smarter lately and don't make as many mistakes as they used to. By using a K2 or K1 alphabet, the difficulty of solving it goes down.

Points:

An Aristocrat with spelling/grammar errors and a hint should be worth between 350 and 500 points, based on the difficulty of the phrase, number of letters revealed with the hint and whether or not a K1/K2 alphabet is used. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is an Aristocrat with misspellings or grammar errors and provide the source of the phrase as well as a hint. For example:

Alexa severely misheard a phrase from <person> which has the word THE twice and then encoded it as an Aristocrat. What did it come out as?

When one of <person's> manuscripts was automatically scanned and OCRed before being converted to an Aristocrat, there were quite a few mistakes but the word piece appears twice. What was the final result?

Aristocrat with spelling/grammar errors with no hint {3.e.iii.(4)}

Approach these exactly the same as Aristocrat with spelling/grammar errors and a hint except that you don't provide a hint. This is one of the hardest questions that can be found on the test. In general these should be encoded with a K1 or K2 alphabet.

Points:

An Aristocrat with spelling/grammar errors and no hint should be worth between 450 and 700 points. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is an Aristocrat with misspellings or grammar errors and provide the source of the phrase. For example:

Alexa severely misheard a phrase from <person> and then encoded it as an Aristocrat using a K1 alphabet. What did it come out as?

When one of <person's> manuscripts was automatically scanned and OCRed before being converted to an Aristocrat using a K2 alphabet, there were quite a few mistakes. What was the final result?

Aristocrat K1-K3 Keyword Recovery {3.e.iii.(7)}

Recovering a keyword/phrase from an Aristocrat first requires competitors to decode the Aristocrat, so all of the guidance in the General Aristocrat/Patristocrat guidance applies. Recovering a K1 keyword (keyword in the plain text) is typically very straightforward. Recovering a K2 keyword (keyword in the cipher text) is also very doable once the concept is understood. Recovering a K3 keyword (keyword in both the cipher text and the plain text, with the alphabets shifted) requires a completely different process and merits special consideration (described below) to ensure that a plaintext phrase/quote is chosen which is a suitable candidate for this type of problem.

For K1-K3 keyword/phrase recovery problems, it is helpful to be mindful of the fact that, as is typical with keywords, duplicate letters are removed when the word or phrase is used as a key, and competitors are provided with blanks corresponding to the length of each word in the key phrase to aid them in recovering it. For example, when they recover the characters TOHESAR and are provided with an appropriate number of blanks (2,3,5), they can work out that the key phrase here is TO THE STARS. Therefore, it is good to be especially mindful to avoid using a keyword that would result as indistinguishable from another when reconstructed: for example, the two different five-letter words DOSES and DOSED would both result in a key of DOSE, so either of these would be a poor keyword choice as competitors would not be able to reliably recover which one was used.
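A candidate key phrase can be checked quickly with a sketch like the following, which removes duplicate letters the way the paragraph above describes and reports the blank lengths competitors would be given. The function names are made up for illustration.

```python
def key_letters(phrase: str) -> str:
    """Return the letters of the phrase in order with duplicates removed."""
    seen = []
    for ch in phrase.upper():
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def blank_lengths(phrase: str) -> tuple:
    """Lengths of each word in the key phrase, as shown to competitors."""
    return tuple(len(word) for word in phrase.split())
```

If two plausible keywords of the same length produce the same key_letters result, as DOSES and DOSED do, competitors cannot tell them apart and a different keyword should be chosen.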

The Aristocrat encoder prefers that the plaintext be quotes or phrases that contain at least 19 unique characters. For K1 and K2 keyword recovery, this is not strictly required. For K3 keyword recovery, 19 unique characters is an absolute minimum in selecting an appropriate phrase, as the less of the alphabet is used, the more difficult the problem becomes as it results in a higher number of unknown mappings. Letter frequencies in the plaintext still need to be distributed enough to make it solvable (i.e., if many letters are only used once, this creates its own difficulty factor for solving the Aristocrat before beginning the recovery process). A plaintext with 22 or 23 unique characters is more reliably solvable than one with fewer and is the easiest way to make sure that the problem isn’t inadvertently too difficult without having to get into the nuts and bolts of solving these as described in the next paragraph. It is also recommended to keep the Offset at 1, 2, or 3.

The process of K3 keyword recovery is described in detail here:

To put it simply, to recover a K3 keyword, competitors must first reconstruct a single, 26-character alphabet (which is the reason more unknown mappings make this more difficult). If that is straightforward, the difficulty is reduced; if it requires them to interleave smaller chains, the difficulty is increased. After that, they must perform the decimation step. If the decimation factor is odd, the procedure should be somewhat straightforward; if the decimation factor is even, this requires them to use alternating interleave spacing which makes the decimation more complicated.

Setting Difficulty:

K1 and K2 keyword recovery are significantly easier than K3 keyword recovery. Difficulty for all types is somewhat increased when the keyword contains duplicate letters. For K3 keyword recovery, difficulty is also increased when fewer unique characters are used in the plaintext; when the keyword draws from low-frequency letters; when the reconstruction results in two or more short loops rather than in one long loop; and when the decimation step requires alternating interleave spacing.

Points:

See the General Aristocrat/Patristocrat point guidance for more details. Finding a K1 or K2 keyword/phrase should add 100 points to the total; finding a K3 keyword/phrase should add a minimum of 150 points to the total and could increase based on the factors for difficulty described above.

Question Text:

The question should clearly indicate that they are to find the keyword/key phrase and provide the source of the phrase. For example:

A famous phrase from <person> has been encoded as an Aristocrat using a K1 alphabet. What was the key phrase used to encode it?

<person> was often heard to say the following phrase which has been encoded as an Aristocrat using a K2 alphabet. Please provide the keyword used to encode it.

Patristocrats with a hint {3.e.iii.(5)}

Patristocrats should always be encoded with a K1 or K2 alphabet in order to keep them from being too difficult. It is also good to pick phrases with a very low Chi-Square Value to keep the problem from being too difficult.

Points:

A Patristocrat with a hint should be worth between 450 and 650 points. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is a Patristocrat and provide the source of the phrase as well as a hint that reveals no more than 6 unique characters. For example:

A famous phrase from <person> has been encoded as a Patristocrat using a K1 alphabet. It starts with the letters 'tomyf'. What did they say?

<person> was often heard to say the following phrase which has been encoded as a Patristocrat using a K2 alphabet. The sequence 'you' appears four times in the original text. What did they say?

Patristocrats with no hint {3.e.iii.(6)}

Patristocrats with no hint must be encoded with a K1 or K2 alphabet in order to keep them from being too difficult. It is also good to pick phrases with a very low Chi-Square Value to keep the problem from being too difficult.

Points:

A Patristocrat without a hint should be worth between 500 and 700 points. See the General Aristocrat/Patristocrat point guidance for more details.

Question Text:

The question should clearly indicate that it is a Patristocrat and provide the source of the phrase. For example:

A famous phrase from <person> has been encoded as a Patristocrat using a K1 alphabet. What did they say?

<person> was often heard to say the following phrase which has been encoded as a Patristocrat using a K2 alphabet. What did they say?

Affine Cipher decrypt B{3.e.iv} C{3.e.iii}

There should be a single one of these on the test. It should use a phrase about 25 characters long that doesn't have too many occurrences of the letter A in it, preferably with as large a variety of letters as possible… (e.g. The quick brown fox jumps over the lazy dog isn't actually bad).

Pick a value for a which is coprime with 26 (1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23 or 25). The actual value doesn't matter, but larger ones tend to be slightly harder. If you are generating tests for multiple regions, pick numbers that are near each other. For example, 7, 9 and 11 would be good to have as equivalent a values.

Pick a value for b between 1 and 25 inclusive. Unlike a where the larger values become slightly harder, the value of b can truly be any number and be the same level of difficulty.
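To sanity-check a chosen pair of values, the standard Affine mapping can be sketched as below, with letters numbered A=0 to Z=25, encryption as (a*x + b) mod 26 and decryption using the modular inverse of a. The function names are hypothetical, and pow(a, -1, 26) needs Python 3.8 or later.

```python
from math import gcd

# The values of a that are coprime with 26, matching the list above.
VALID_A = [a for a in range(1, 26) if gcd(a, 26) == 1]

def affine_encrypt(text: str, a: int, b: int) -> str:
    """Encrypt with E(x) = (a*x + b) mod 26; non-letters are unchanged."""
    if gcd(a, 26) != 1:
        raise ValueError("a must be coprime with 26")
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((a * (ord(ch) - 65) + b) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

def affine_decrypt(text: str, a: int, b: int) -> str:
    """Decrypt with D(y) = a_inv * (y - b) mod 26."""
    a_inv = pow(a, -1, 26)  # modular inverse; raises ValueError if none exists
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((a_inv * (ord(ch) - 65 - b)) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)
```

Note that the letter A is numbered 0, so it always encrypts to the letter numbered b regardless of a; this is one way to see why a phrase full of A's gives the students less work.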

Setting Difficulty:

There is very little variability in the difficulty other than the length of the string. Larger values of a are only slightly harder while the value of b has no real impact on the difficulty.

Points:

100.

Question Text:

The question should state to use the Affine Cipher and with the values of a and b for them to use (note that a and b are italicized). For example:

Decrypt the following cipher text which was encoded using the Affine Cipher with a=5 and b=9.

Affine Cipher encrypt {3.e.iii}

There should be at most a single one of these on the test. It should use a phrase about 25 characters long that doesn't have too many occurrences of the letter A in it, preferably with as large a variety of letters as possible… (e.g. The quick brown fox jumps over the lazy dog isn't actually bad).

Pick a value for a which is coprime with 26 (1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23 or 25). The actual value doesn't matter, but larger ones tend to be slightly harder. If you are generating tests for multiple regions, pick numbers that are near each other. For example, 7, 9 and 11 would be good to have as equivalent a values.

Pick a value for b between 1 and 25 inclusive. Unlike a where the larger values become slightly harder, the value of b can truly be any number and be the same level of difficulty.

Setting Difficulty:

There is very little variability in the difficulty other than the length of the string. Larger values of a are only slightly harder while the value of b has no real impact on the difficulty.

Points:

100.

Question Text:

The question should state to use the Affine Cipher and include the phrase to encode along with the values of a and b for them to use (note that a and b are italicized). For example:

Encrypt the common phrase The quick brown fox jumps over the lazy dog using the Affine Cipher with a=5 and b=9.

Vigenère Cipher encrypt/decrypt {3.e.v}

There can be one of each Encrypt and Decrypt on the test. There are no restrictions on the phrase, although try to avoid a phrase with a lot of a's in it. For example, An amazing aardvark allows all answers would be a poor choice because the letter a is trivial to encode. As this question is nominally worth two points per letter, a 50 letter phrase is ideal.

Additionally there needs to be a Key to encode it with. It should be 5 or 6 characters with no repeating letters and avoid the letter a as it causes a letter to map to itself. By setting the Block Size to the same as the length of the Key, the problem is much easier than with the default Block Size of 0 that keeps the original spacing. Setting the Block Size to a size other than the length of the Key increases the difficulty somewhat.
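The roles of the Key and the Block Size can be illustrated with a short sketch. It uses the standard Vigenère rule (add the key letter's position, A=0, to each plain text letter) and a simple regrouping helper; the names vigenere and in_blocks are invented here and the tool's own behavior may differ in details.

```python
def vigenere(text: str, key: str, decode: bool = False) -> str:
    """Encode (or decode) letters with the repeating key; other characters are kept."""
    shifts = [ord(k) - ord("A") for k in key.upper() if k.isalpha()]
    out, i = [], 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            s = shifts[i % len(shifts)]
            if decode:
                s = -s
            out.append(chr((ord(ch) - 65 + s) % 26 + 65))
            i += 1  # the key only advances on letters
        else:
            out.append(ch)
    return "".join(out)

def in_blocks(text: str, size: int) -> str:
    """Regroup letters into blocks of `size`; a size of 0 keeps the original spacing."""
    if size <= 0:
        return text
    letters = "".join(c for c in text if c.isalpha())
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))
```

With a block size equal to the key length, every block lines up with the start of the key, which is why that setting is the easiest for the students. A key letter of A contributes a shift of 0, so the plain text letter passes through unchanged.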

Note that for the Regional/Invitational events, the Key must be given as part of the question.

Setting Difficulty:

The Block Size is the major contributor to the difficulty followed by the length of the phrase.

Points:

100 for a Decode (approximately 2 points per letter), 120 for an Encode. If the Block Size is the same as the Key length subtract 20 points. If the Block Size is non-zero and different from the Key length, add 25 points.

Question Text:

The question should indicate that the Vigenère Cipher is being used (don't forget the accented è), whether they are to Encode or Decode and the Key to use for it. It is generally nice to give the origin of the phrase if it is known. For example:

A phrase by <person> has been encoded using the Vigenère cipher with a code word of SLEPT. What does it say?

Using a keyword of HORSE, encode this famous quote by <person> using the Vigenère cipher.

Porta Cipher encrypt/decrypt {3.e.v}

There can be one of each Encrypt and Decrypt on the test. There are no restrictions on the phrase, although try to avoid a phrase with a lot of a's in it. For example, An amazing aardvark allows all answers would be a poor choice because the letter a is trivial to encode. As this question is nominally worth two points per letter, a 50 letter phrase is ideal.

Additionally there needs to be a Key to encode it with. It should be 5 or 6 characters with no repeating letters and avoid the letter a as it causes a letter to map to itself. By setting the Block Size to the same as the length of the Key, the problem is much easier than with the default Block Size of 0 that keeps the original spacing. Setting the Block Size to a size other than the length of the Key increases the difficulty somewhat.

Note that for the Regional/Invitational events, the Key must be given as part of the question.

Setting Difficulty:

The Block Size is the major contributor to the difficulty followed by the length of the phrase.

Points:

100 for a Decode (approximately 2 points per letter), 120 for an Encode. If the Block Size is the same as the Key length subtract 20 points. If the Block Size is non-zero and different from the Key length, add 25 points.

Question Text:

The question should indicate that the Porta Cipher is being used, whether they are to Encode or Decode and the Key to use for it. It is generally nice to give the origin of the phrase if it is known. For example:

A phrase by <person> has been encoded using the Porta cipher with a code word of SLEPT. What does it say?

Using a keyword of HORSE, encode this famous quote by <person> using the Porta cipher.

Nihilist cipher {3.e.ix}

The Nihilist cipher is a substitution cipher. It uses two keywords, the polybius key and normal keyword, to encode the plaintext. The polybius key is used to generate a Polybius Square, which is a 5x5 table that maps every letter to a number based on its row and column. The normal keyword is used to encode the plaintext, applying each letter in the keyword in sequence in just the same way that the Vigenère cipher works. There should be no more than 2 Nihilist decryption problems on a test.
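The two-keyword process can be sketched as follows. The sketch assumes the usual 25-letter square with J folded into I, rows and columns numbered 1 to 5, and encoding by adding the keyword letter's number to the plain text letter's number; the function names are made up and the tool's handling of edge cases may differ.

```python
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters; J is treated as I (assumption)

def polybius_square(key: str) -> dict:
    """Map each letter to row*10 + column, filling the square with the key first."""
    order = []
    for ch in key.upper().replace("J", "I") + ALPHABET:
        if ch in ALPHABET and ch not in order:
            order.append(ch)
    return {ch: 10 * (i // 5 + 1) + (i % 5 + 1) for i, ch in enumerate(order)}

def nihilist_encode(plaintext: str, polybius_key: str, keyword: str) -> list:
    """Add the repeating keyword's numbers to the plain text letters' numbers."""
    square = polybius_square(polybius_key)
    letters = [c for c in plaintext.upper().replace("J", "I") if c in square]
    keys = [square[c] for c in keyword.upper().replace("J", "I") if c in square]
    return [square[c] + keys[i % len(keys)] for i, c in enumerate(letters)]

def nihilist_decode(numbers: list, polybius_key: str, keyword: str) -> str:
    """Subtract the keyword's numbers and look the results up in the square."""
    square = polybius_square(polybius_key)
    reverse = {value: ch for ch, value in square.items()}
    keys = [square[c] for c in keyword.upper().replace("J", "I") if c in square]
    return "".join(reverse[n - keys[i % len(keys)]] for i, n in enumerate(numbers))
```

Building the square for a candidate polybius key before writing the question makes it easy to see where letters such as Z end up.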

Setting Difficulty:

In general, more room for error makes it more difficult to get the right answer. There is more room for error when the key length and the block size are different. A block size of 0 may provide clues to the encoded words, so reduces difficulty. A longer polybius key makes it more difficult because there is increased chance of creating the polybius square incorrectly. A longer encoding key also means there is more work to do. Putting 'Z' in the polybius key means it will not have a numeric value of 55, which is a common assumption.

Points:

125-150. Longer plain text should be worth more because there is simply more work to do, in addition to the considerations above.

Question Text:

For decryption, the question text should include the polybius key and normal keyword used to encrypt the plain text. Quote length should be at most around 50 characters.

A phrase by <person> was encoded using a Nihilist Substitution Cipher with a Polybius Key of SCIENCE OLYMPIAD and a key of CODE. What does it say?

Nihilist cipher (cryptanalysis) {3.f.v}

The Nihilist cipher is a substitution cipher. It uses two keywords to encode the plaintext. The first is used to construct a Polybius Square which is used to map all of the letters to numbers. The second keyword is used to encode the plaintext, applying each letter in the keyword in sequence in just the same way that the Vigenère cipher works. Doing cryptanalysis, some of the mapping of plain text to cipher text is given and the polybius square and keyword need to be derived from that mapping. This is an iterative process and increases the difficulty quite a bit.

Setting Difficulty:

In addition to the notes above, cryptanalysis makes Nihilist ciphers significantly more difficult due to the iteration and deduction required to solve it.

Points:

225-275. Cryptanalysis adds 100 points or more, depending on the length of the crib.

Question Text:

For cryptanalysis, the question text should include a hint of plaintext and the location of that plaintext.

The following quote by <person> has been encoded with the Nihilist Substitution Cipher using a very common word for the key. The 35th through 39th cipher units (58 93 66 54 45) decode to be HAPPY.

Baconian cipher {3.e.vi}

This is where the test writer can have a lot of fun. The phrase should be approximately 40-60 characters long and ideally should be one for which the ending can't necessarily be guessed once they have decoded the first half; it is important to make sure that they have to work the entire problem. There is little impact on the difficulty.

With the Words Baconian cipher, which is the hardest of the three options, it is important to pick a pattern for the letters mapping to A and B. Obviously the tool's default of ABABAB… is the easiest and it is not recommended to use this for the test. There are lots of patterns which work (for example, the first half of the alphabet is A and the second half is B). But there are some that don't work, such as BABABA…, which eliminates all vowels from being an A. Fortunately the tool tells you when the pattern picked doesn't have any words to match.
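One quick check on a candidate pattern is to see which vowels land on each side. The sketch below assumes a pattern is applied by repeating it across the alphabet A to Z; how the tool then uses the two groups to choose words is not modeled, and the function names are hypothetical.

```python
import string

def letter_classes(pattern: str) -> dict:
    """Assign each letter A-Z to 'A' or 'B' by repeating the pattern."""
    return {
        ch: pattern[i % len(pattern)]
        for i, ch in enumerate(string.ascii_uppercase)
    }

def vowels_by_side(pattern: str) -> dict:
    """List which of the vowels A, E, I, O, U fall on the A side and the B side."""
    classes = letter_classes(pattern)
    return {side: [v for v in "AEIOU" if classes[v] == side] for side in "AB"}
```

With the repeating pattern BA, every vowel falls on the B side, which is the problem described above. Splitting the alphabet in half puts A, E and I on one side and O and U on the other.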

For the Letter for letter and Sequence Baconian ciphers it is possible to have multiple letters stand for A and B, but it is recommended to use symbols (which can be pasted into the fields). Having a variety of symbols which have potential overlap can make it harder. For example, on the 2018 North Carolina State test, the key to the solution was that 'A' was represented by any arrow pointing down and 'B' was represented by any arrow pointing up, so that the students had to figure out whether the direction of the arrow or the line was what indicated the difference.

Setting Difficulty:

There are three factors which affect the difficulty. First is the choice of symbols to represent the Baconian. The harder to distinguish, the more points the question should be worth. Second is how much of a clue is provided. Lastly is the size of the grouping. By picking a Line Width which is not a multiple of 5, it ensures that the students have to carefully manage wrapping a group of 5 across a line boundary.

Points:

200-400. A Letter for letter with a single substitution and a Line Width which is a multiple of 5 should be worth 200 points. A Words Baconian with a complex pattern and a couple letters of clues would be worth 400.

Question Text:

The question should indicate that the Baconian Cipher is being used and the origin of the phrase if known and any clue. For example:

The following symbols encode a phrase by <person> using a Baconian alphabet. What does it say?

The following strange headlines encode a phrase by <person> using a Baconian alphabet. You have been told that it starts out with "SOME". What does it say?

The following odd symbols were found when a tomb was opened, but you recognized it as a prankster who scratched it on the wall using the Baconian alphabet. What does it say?

Spanish Xenocrypt {3.e.vii}

This will be one of the hardest questions on the test, but it is good to have in order to provide a challenge.

Pick a Spanish phrase which primarily consists of words which a second-year Spanish class would cover. Phrases which have both la and las present are good choices, as are phrases which contain y or cognates (Spanish words which are substantially like their English equivalents, such as ciencia, composición and básico). For a good source of cognates see https://www.realfastspanish.com/vocabulary/spanish-cognates. Although it isn't strictly necessary, try to avoid phrases which depend on accented characters. As with the approach for the English Aristocrats, pay attention to the frequency of letters. The tool automatically calculates the Chi-Square Value to verify the match:

EAOSNRILDTUCMPBHQYVGÓÍFJZÁÉÑXÚKWÜ
13%  12%  8%  7%  6%  5%  3%  2%  1%  -

If the encoded string uses both N and Ñ, it is best to re-encode until you don't get them both, to avoid confusion on the part of the teams. Although it is possible to get an encoding that doesn't use Ñ at all, it is perfectly fine to generate a question which has one. Having both N and Ñ increases the difficulty of the problem.

Setting Difficulty:

The presence of cognates greatly reduces the difficulty. It is also expected that a K1 alphabet with an English key should be used. A K2 alphabet can be used, but it doesn't reduce the difficulty as much.

Points:

600-700.

Question Text:

The question should clearly indicate that it is a Spanish Xenocrypt and provide the source of the phrase (if known). It should also indicate the use of a K1 (or K2) alphabet with an English keyword. For example:

Solve this Xenocrypt which is a translation of a quote by <person> into Spanish and has been encoded with a K1 alphabet using an English keyword.

Cryptarithm {3.e.vi}

Setting Difficulty:

.

Points:

150-600

Question Text:

The question should give the cryptarithm to solve and the encoded digits to decode using its solution. For example:

After solving the cryptarithm SEND+MORE=MONEY decode the phrase 9015 10992.
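When writing a question like this it helps to confirm that the cryptarithm has exactly one solution and that the digits decode to the intended phrase. The following is a minimal brute-force sketch, not the method used by the tool; the function names are hypothetical, and it assumes a base 10 addition problem with no leading zeros.

```python
from itertools import permutations

def solve_cryptarithm(addends, total):
    """Return every letter-to-digit mapping that satisfies addends = total."""
    # Net place-value weight of each letter: positive in addends, negative in the total.
    weights = {}
    for word, sign in [(w, 1) for w in addends] + [(total, -1)]:
        for power, ch in enumerate(reversed(word)):
            weights[ch] = weights.get(ch, 0) + sign * 10 ** power
    letters = sorted(weights)
    coeffs = [weights[ch] for ch in letters]
    leading = [letters.index(w[0]) for w in addends + [total]]
    solutions = []
    for digits in permutations(range(10), len(letters)):
        if sum(c * d for c, d in zip(coeffs, digits)) != 0:
            continue
        if any(digits[i] == 0 for i in leading):
            continue  # assumption: no word may start with 0
        solutions.append(dict(zip(letters, digits)))
    return solutions

def decode_digits(mapping, cipher):
    """Replace each digit with its letter; other characters pass through."""
    by_digit = {str(d): ch for ch, d in mapping.items()}
    return "".join(by_digit.get(c, c) for c in cipher)

if __name__ == "__main__":
    found = solve_cryptarithm(["SEND", "MORE"], "MONEY")
    print(len(found), decode_digits(found[0], "9015 10992"))
```

The search tries about 1.8 million digit assignments for an eight-letter problem, so it takes a few seconds to run.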

Generating Cryptarithms

There are several sites that allow for generation of cryptarithms including:

Complete Columnar {3.e.vii}

The Complete Columnar Cipher is a transposition cipher. Plain text is written across a table with a pre-defined number of columns. The column order is mixed up by providing a column ordering string. For example, if the column count is 7 and the column order string is SCIENCE, the columns will be rearranged to be in the order: 2, 6, 4, 7, 3, 5, 1. The column order string gets sorted to CCEEINS and we find the first character C in position 2 of the column order string, the next C in position 6, the first E in position 4, etc. The columns of the plain text table are rearranged using this order and the cipher text is generated by reading down each column, starting with the first column on the left and working across.

Note: Single letter and/or number characters can be used in the column order string. Numbers 0-9 and uppercase characters A-Z are used and lower case letters are automatically converted to upper case. The order is determined by sorting the characters using their ASCII value (numbers come before letters). Duplicate characters are taken in order, left to right. For greater than 10 columns, double digit numbers are not honored. However, letters can be used to avoid duplicate values in the column order string, although it is not necessary.

To put these columns back in the correct order (mentally renumbering them 1-7), take them in numeric order. I.e., the column labeled 1 is in the seventh position, the column labeled 2 is in the first position, the column labeled 3 is in the fifth position, etc. So the transposed table would be rewritten with column ordering: 7, 1, 5, 3, 6, 2, 4 to reveal the original plain text in a seven column table.
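The ordering rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool's implementation: the function names are hypothetical, and the pad character X is an assumption since the text does not say which pad character is used.

```python
def column_order(key):
    """1-based column numbers in read-out order: sort by character, ties left to right."""
    key = key.upper()
    return [i + 1 for i in sorted(range(len(key)), key=lambda i: (key[i], i))]

def inverse_order(order):
    """1-based position in the transposed table where each original column sits."""
    inv = [0] * len(order)
    for pos, col in enumerate(order, start=1):
        inv[col - 1] = pos
    return inv

def encode(plain, key, pad="X"):
    """Write the text across len(key) columns, then read the columns in key order."""
    cols = len(key)
    text = "".join(ch for ch in plain.upper() if ch.isalpha())
    text += pad * (-len(text) % cols)  # assumption: pad with X to fill the grid
    return "".join(text[c - 1::cols] for c in column_order(key))

def decode(cipher, key):
    """Cut the cipher text into equal columns and put them back in numeric order."""
    cols = len(key)
    rows = len(cipher) // cols
    columns = {}
    for n, col in enumerate(column_order(key)):
        columns[col] = cipher[n * rows:(n + 1) * rows]
    return "".join(columns[c][r] for r in range(rows) for c in range(1, cols + 1))

if __name__ == "__main__":
    print(column_order("SCIENCE"))                 # read-out order
    print(inverse_order(column_order("SCIENCE")))  # order used to undo it
```

Python compares characters by code point, so digits sort before uppercase letters, which matches the ASCII rule in the note.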

A crib is given for cryptanalysis to provide a starting point for deducing the column order and decoding the ciphertext.

Setting Difficulty:

Difficulty is increased when the crib is split over two rows; when the crib appears in more than one location in the column block; when there are duplicate letters in the crib; when there are more possible columns (i.e. more factors of the cipher text length) that must be analyzed; when there are no pad characters (i.e. cipher text is a multiple of the number of encode columns).

Points:

150-275. The point value should be higher when more columns are used in the encoding; when the crib is split over two rows; when there is more than one instance of the crib; when there are duplicate letters in the crib; when there are more possible columns that could have been used to encode the cipher text; when no pad characters are used in the cipher text to make it fit in the grid. Also, the longer the cipher text, the more difficult because it just takes more time to analyze.

Question Text:

The question should indicate that they are to use the crib text (given in the question text) to help determine the number of columns and the order of the columns to decode the cipher text. For example:

Decode a quote by Baden-Powell which has been encoded using the Complete Columnar Cipher. You are told that the quote has BADGE somewhere in it.

Choosing a Crib

The crib should contain a sequence of plain text characters which is no shorter than one less than the number of columns used to encode.

Decryption matrix for Hill Cipher {3.e.viii}

For Regional and Invitational competitions, only the Compute Decryption option is used for the Hill Cipher. For this there needs to be a 4 character phrase which corresponds to an invertible matrix (https://en.wikipedia.org/wiki/Invertible_matrix). Fortunately, the tool will tell you if it is not invertible. There is also a list of known valid word keys at https://toebes.com/codebusters/HillKeys.html (which is linked to at the top of the tool) for both the 2x2 and 3x3 encodings. Note that there are other keys that can be used which are not words. A key is more likely to be invertible if you use the odd letters B, D, F, H, L, N, R, T, X and Z, as they are odd and non-prime, but you can mix in some other letters. Just make sure that the keyword is not an inappropriate phrase. A total nonsense phrase is perfectly acceptable, but it helps the style of the test if it looks like a word.

For both 2x2 and 3x3 matrices, the matrix is more likely to be invertible if half to three-quarters of the letters are odd.

Setting Difficulty:

There is little variability in the difficulty although letters at the end of the alphabet generate larger numbers.

Points:

100.

Question Text:

The question should indicate that they are to compute the decryption matrix using the Hill Cipher and provide the key. For example:

Using a key of TEST compute the decryption matrix for a 2x2 Hill with a 26 character alphabet.
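To double-check the answer key for a question like this, the decryption matrix can be computed directly. The sketch below is only an illustration with hypothetical function names; it assumes the usual A=0 through Z=25 numbering (which the list of odd letters implies) and a key written row by row.

```python
from math import gcd

def key_matrix(key):
    """Row-major 2x2 matrix of letter values, assuming A=0 ... Z=25."""
    a, b, c, d = (ord(ch) - ord("A") for ch in key.upper())
    return [[a, b], [c, d]]

def decryption_matrix(key):
    """Inverse of the key matrix mod 26, or None when the key is not invertible."""
    (a, b), (c, d) = key_matrix(key)
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        return None  # determinant shares a factor (2 or 13) with 26
    det_inv = pow(det, -1, 26)  # modular inverse, needs Python 3.8 or later
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]

if __name__ == "__main__":
    print(decryption_matrix("TEST"))  # determinant is 289, which is 3 mod 26
    print(decryption_matrix("BOOK"))  # even determinant, so not invertible
```

For TEST the determinant is 19*19 - 4*18 = 289, which is 3 (mod 26); the inverse of 3 is 9, and multiplying the adjugate by 9 gives the matrix 15 16 / 20 15.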

Details on picking a key

For a 2x2 matrix we can be explicit. Since it is very likely that we will want either the 2nd or 3rd letter to be a vowel, the more useful advice is that one or both of the 2nd and 3rd letters should be vowels, while both the first and last letters come from the set of odd letters listed above. Every vowel has an even value, so this guarantees that the determinant is odd, and then we only need to check that it isn't 0 (mod 13), which means we would succeed with probability 12/13.

For a 3x3 matrix it isn't as easy, but if we ask for about half the letters to be odd (same set as before), then there is a good chance that the determinant will be odd. In fact, assuming random placement of the odd letters, the exact odds for each possible number of odd letters is as follows:

Number of odd letters: odds of an odd determinant
0: 0.000
1: 0.000
2: 0.000
3: 0.071
4: 0.286
5: 0.571
6: 0.429
7: 0.500
8: 0.000
9: 0.000

Basically, we want 5, 6, or 7 odd letters in the mix in order to have a decent shot at being invertible mod 2. If we are lucky, we may get by with 4 odd letters; with only 3 odd letters we need to be extremely lucky; and for any other number of odd letters, it is simply impossible.
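These odds can be reproduced by brute force, since only the parity of each letter matters. The sketch below uses hypothetical function names. It enumerates every placement of k odd letters in a 3x3 grid and counts how many give an odd determinant; it checks invertibility mod 2 only, so a key that passes still has to be checked mod 13.

```python
from itertools import combinations
from math import comb

def det3_mod2(m):
    """Determinant of a 3x3 matrix reduced mod 2."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

def odd_determinant_odds():
    """Map k (number of odd letters) to the share of placements with an odd determinant."""
    odds = {}
    for k in range(10):
        hits = 0
        for cells in combinations(range(9), k):
            flat = [1 if n in cells else 0 for n in range(9)]
            hits += det3_mod2([flat[0:3], flat[3:6], flat[6:9]])
        odds[k] = hits / comb(9, k)
    return odds

if __name__ == "__main__":
    for k, p in odd_determinant_odds().items():
        print(k, f"{p:.3f}")
```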

Morbit Cipher Decode {3.e.viii}

The Morbit Cipher encodes text by first converting the plain text into morse code with the space between characters represented by an × and the space between words represented by two × characters. The resulting morse code is then broken into pairs of characters (adding an × at the end if necessary). The numbers 1-9 are randomly assigned to the unique pairs of characters (●●, ●–, ●×, –●, ––, –×, ×●, ×–, ××). The pairs of morse characters are then replaced with the corresponding number to generate the final cipher string. Decoding is done by reversing the process.
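The encoding steps can be sketched as follows. This is an illustration, not the tool's code: the function names are hypothetical, and the ASCII characters ".", "-" and "x" stand in for ●, – and ×. The digit assignment is passed as a nine-character string whose positions line up with the pair list given above.

```python
CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
         "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..").split()
MORSE = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CODES))
PAIRS = ["..", ".-", ".x", "-.", "--", "-x", "x.", "x-", "xx"]

def to_morse(plain):
    """Morse with 'x' between letters and 'xx' between words."""
    words = ["x".join(MORSE[ch] for ch in w if ch in MORSE) for w in plain.upper().split()]
    return "xx".join(w for w in words if w)

def morbit_encode(plain, digits):
    """digits[i] is the digit that stands for PAIRS[i]."""
    table = dict(zip(PAIRS, digits))
    morse = to_morse(plain)
    if len(morse) % 2:
        morse += "x"  # pad to an even length
    return "".join(table[morse[i:i + 2]] for i in range(0, len(morse), 2))

def morbit_decode(cipher, digits):
    table = dict(zip(digits, PAIRS))
    letters = {code: ch for ch, code in MORSE.items()}
    morse = "".join(table[d] for d in cipher).strip("x")
    return " ".join("".join(letters[c] for c in w.split("x")) for w in morse.split("xx"))

if __name__ == "__main__":
    # First six digits follow the sample hint 2, 4, 6, 9, 1, 7; the last three are made up.
    assignment = "246917358"
    print(morbit_encode("SOME", assignment))
```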

Setting Difficulty:

The difficulty is driven by the number of letters and the variety of morse code combined with which letters are chosen. Looking at the generated solving guide can help guide the question writing. Not telling them which digit corresponds to ×× increases the difficulty.

Points:

100-150.

Question Text:

The question should be around 40 characters.

Solve this quote from <person> which has been encoded using the Morbit Cipher. You are told that 2=●●, 4=●–, 6=●×, 9=–●, 1=––, and 7=–×

Pollux Cipher Decode {3.e.viii}

The Pollux Cipher encodes text by first converting the plain text into morse code with the space between characters represented by an × and the space between words represented by two × characters. Each of the digits (0-9) is randomly assigned to represent one of the morse characters (●, –, ×). Each morse character is then replaced with one of its assigned digits to generate the final cipher string. Decoding is done by reversing the process. Note that because more than one digit may represent the same morse character, the digit chosen at each position can be somewhat random.
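The digit substitution step can be sketched on its own, starting from morse that has already been written out. This is an illustration only, with hypothetical function names; ".", "-" and "x" stand in for the three morse characters, and the assignment of the digits 0, 5, 6 and 8 below is made up for the example.

```python
import random

def pollux_encode(morse, digit_map, seed=None):
    """Replace each morse character with a randomly chosen digit assigned to it."""
    rng = random.Random(seed)
    return "".join(rng.choice(digit_map[sym]) for sym in morse)

def pollux_to_morse(cipher, digit_map):
    """Undo the substitution: each digit maps back to exactly one morse character."""
    by_digit = {d: sym for sym, ds in digit_map.items() for d in ds}
    return "".join(by_digit[d] for d in cipher)

if __name__ == "__main__":
    # 2,3 = dot, 4,7 = dash, 9,1 = separator as in the sample hint; the rest is hypothetical.
    digit_map = {".": "2350", "-": "476", "x": "918"}
    morse = "...x---x--x."  # SOME
    cipher = pollux_encode(morse, digit_map, seed=1)
    print(cipher, pollux_to_morse(cipher, digit_map))
```

Because the encoder picks among several digits at random, the same phrase can produce many different cipher strings, while decoding is always unambiguous.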

Setting Difficulty:

The difficulty is driven by the number of letters and the variety of morse code combined with which letters are chosen. Looking at the generated solving guide can help guide the question writing. Not telling them all the Hint Digits which correspond to × increases the difficulty. Assigning more digits to one morse character, thereby reducing another, also increases the difficulty.

Points:

100-150.

Question Text:

The question should be around 40 characters.

Solve this quote from <person> which has been encoded using the Pollux Cipher. You are told that 2,3=●, 4,7=–, 9,1=×

Fractionated Morse Cipher Cryptanalysis {3.e.viii}

The Fractionated Morse Cipher encodes text by first converting the plain text into morse code with the space between characters represented by an × and the space between words represented by two × characters. The resulting morse code is then broken into sets of three morse characters (●, –, ×), and each set is assigned to a letter using a K1-like replacement table. The sets of morse characters are then replaced with the corresponding letter to generate the final cipher string. Decoding is done by reversing the process.

Setting Difficulty:

The difficulty can depend on how long the crib is (longer is easier) and where the crib text is placed (it may reveal more morse mappings). The difficulty is also related to the letters that are chosen for the keyword -- the difficulty increases when the keyword draws letters from throughout the alphabet, and when it draws from the end of the alphabet. Another factor for difficulty is how frequently the letters that the crib maps to are used throughout the rest of the cipher.

Looking at the autosolver's first attempt will give an idea of what students have to start with, looking at its final attempt will give an approximate idea of what students should have to end with, and looking at its iterations in the middle will help show how much trial and error students might have to do between those two points.

Points:

225-300.

Question Text:

The question should be around 40 characters.

A quote has been encoded using the Fractionated Morse Cipher for you to decode. You are told that the quote has CHAN in it corresponding to the encoded text V J T B P D, meaning B = ●×●; D = ●×–; J = –●×; P = –×–; T = ●●●; V = ×–●.
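The replacement table behind an example like this can be sketched as follows. The function names are hypothetical, ".", "-" and "x" stand in for ●, – and ×, and the table order (dot, dash, separator, cycling fastest on the last character) is an assumption. The keyword TENCH is also a hypothetical choice: it happens to reproduce the six mappings quoted above, but the keyword actually used for that example is not stated.

```python
from itertools import product

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def keyed_alphabet(keyword):
    """Keyword letters first (duplicates dropped), then the rest of the alphabet."""
    seen = []
    for ch in keyword.upper() + ALPHABET:
        if ch in ALPHABET and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def fractionation_table(keyword):
    """Map each three-character morse group to a letter; the 27th group 'xxx' is unused."""
    groups = ["".join(g) for g in product(".-x", repeat=3)][:26]
    return dict(zip(groups, keyed_alphabet(keyword)))

def fractionate(morse, table):
    """Replace each group of three morse characters with its letter."""
    morse += "x" * (-len(morse) % 3)  # assumption: pad the tail with separators
    return "".join(table[morse[i:i + 3]] for i in range(0, len(morse), 3))

if __name__ == "__main__":
    table = fractionation_table("TENCH")
    # Morse for ...C H A N followed by the first dash of the next letter.
    print(fractionate("x-.-.x....x.-x-.x-", table))
```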

Cryptarithm Decode {3.e.viii}

The Cryptarithm ...

Setting Difficulty:

The difficulty is driven by .....

Points:

100-150.

Question Text:

The problem should be in base 10.

Solve this Cryptarithm..

Running Key Cipher [NOT USED THIS YEAR]

Pick a phrase which is approximately 80 characters long. The actual content has little impact on the difficulty. By default there are four encoding texts (Gettysburg address, Declaration of Independence, Constitution of United States of America and MAGNA CARTA (In Latin)) but they can be changed by going to https://toebes.com/codebusters/EditRunningKeys.html

Setting Difficulty:

There is little variability in the difficulty other than the length. Encode is slightly harder than Decode.

Points:

200-250. Approximately 3 points per letter.

Question Text:

The question should indicate that the Running Key Cipher is being used and whether they are to Encode or Decode. If they are to Encode, it should indicate which encoding text to use. For example:

The following quote from <person> has been encoded using a running key cipher against a famous document. What does it say?

Encode what <person> said about <topic> using a running key cipher against the MAGNA CARTA.

Vigenère Cipher Decrypt with a known plaintext {3.f.iii}

There can be one of these on a State/National test. There are no restrictions on the phrase, although try to avoid a phrase with a lot of a's in it. For example, "An amazing aardvark allows all answers" would be a poor choice because the letter A is trivial to decode. As this question is nominally worth four points per letter, a 50 letter phrase is ideal.

Additionally there needs to be a Key to encode it with. It should be 5 or 6 characters with no repeating letters and avoid the letter a as it causes a letter to map to itself. By setting the Block Size to the same as the length of the Key, the problem is much easier than with the default Block Size of 0 that keeps the original spacing. Setting the Block Size to a size other than the length of the Key increases the difficulty somewhat.

For the decoding clue, pick a word in the phrase which is at least 5 characters long, carefully count to determine the position in the encoded phrase as well as the encoded character.

Setting Difficulty:

The length of the phrase is the major factor in the difficulty.

Points:

200-250. If the Block Size is 0, add 25 points. If the Block Size is non-zero and different from the Key length, add 50 points.

Question Text:

The question should indicate that the Vigenère Cipher is being used (don't forget the accented è), the length of the Key and the plain text corresponding to some part of the phrase. It is generally nice to give the origin of the phrase if it is known. For example:

<person> once said this about <topic>. It has been encoded using the Vigenère cipher using a very common five letter word. You have been told that the 17th through the 22nd letters in the code (YMNCHU) actually is the word REASON. What does the message decode to?
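When preparing the decoding clue, it is worth confirming that the crib, its position and the key all agree. The sketch below uses hypothetical function names and assumes that positions count letters only, starting at 1, with the key restarting at the first letter. It recovers the key letters from the sample clue and rotates them back to the start of the key.

```python
def key_letters(cipher, crib):
    """Key letter at each crib position: cipher letter minus plain letter, mod 26."""
    return "".join(chr((ord(c) - ord(p)) % 26 + ord("A"))
                   for c, p in zip(cipher.upper(), crib.upper()))

def align_key(fragment, start, key_len):
    """Place key letters found at 1-based letter position `start` into a key of key_len letters."""
    key = [""] * key_len
    for offset, ch in enumerate(fragment):
        slot = (start - 1 + offset) % key_len
        if key[slot] and key[slot] != ch:
            raise ValueError("crib, position and key length do not agree")
        key[slot] = ch
    return "".join(k or "?" for k in key)

if __name__ == "__main__":
    fragment = key_letters("YMNCHU", "REASON")
    print(fragment, align_key(fragment, 17, 5))
```

A ValueError here usually means the position in the clue was miscounted, which is the most common mistake with this question type.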

Porta Cipher Cryptanalysis with a known plaintext {3.f.iii}

There can be one of these on a State/National test. There are no restrictions on the phrase, although try to avoid a phrase with a lot of a's in it. For example, "An amazing aardvark allows all answers" would be a poor choice because the letter A is trivial to decode. As this question is nominally worth four points per letter, a 50 letter phrase is ideal.

Additionally there needs to be a Key to encode it with. It should be 5 or 6 characters with no repeating letters and avoid the letter a as it causes a letter to map to itself. By setting the Block Size to the same as the length of the Key, the problem is much easier than with the default Block Size of 0 that keeps the original spacing. Setting the Block Size to a size other than the length of the Key increases the difficulty somewhat.

For the decoding clue, pick a word in the phrase which is at least 5 characters long, carefully count to determine the position in the encoded phrase as well as the encoded character.

Setting Difficulty:

The length of the phrase is the major factor in the difficulty.

Points:

200-250. If the Block Size is 0, add 25 points. If the Block Size is non-zero and different from the Key length, add 50 points.

Question Text:

The question should indicate that the Porta Cipher is being used, the length of the Key and the plain text corresponding to some part of the phrase. It is generally nice to give the origin of the phrase if it is known. For example:

<person> once said this about <topic>. It has been encoded using the Porta cipher using a very common five letter word. You have been told that the 17th through the 22nd letters in the code (YMNCHU) actually is the word REASON. What does the message decode to?

RSA Cipher {3.f.iv}

There are several variants of the RSA question. The Safe Combo and Exchange Keys options test knowledge of the RSA algorithm by presenting the RSA components in a randomized order with randomized values. The Quantum Computer and Compute d options test using the extended Euclidean Algorithm to compute the inverse. Decode Year tests using Rapid Modular Exponentiation to compute the mod of a prime raised to a large power. All of these dynamically generate and update the question text with all of the computed values.

For the Safe Combo and Exchange Keys choices, it is good to pick values for the Prime Digits and Safe Combination Digits/Data Digits so that the generated RSA keys and numbers provide lots of similar digits. The Randomize button will regenerate the values. Note that the student names are also randomized, picking from the names which were common around the turn of this century.

For the other options, click on the Randomize button until it generates a problem which has the desired level of difficulty.

Setting Difficulty:

The choice of question type and the size of the digits in the key have the major impact on difficulty.

Points:

Safe Combo: 100 points

Quantum Computer, Compute d: 15 points per step of the extended Euclidean Algorithm (shown by the tool) with a minimum of 120 points.

Decode Year: 15 points per 1 bit and 5 points per 0 bit in the binary representation of d (shown by the tool) with a minimum of 120 points.

Exchange Keys: 120 points
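The two computed point values can be estimated with a short sketch. The function names are hypothetical, and counting one step per division in the extended Euclidean Algorithm is an assumption; the step count shown by the tool is the one to use if the two differ.

```python
def inverse_with_steps(e, phi):
    """Extended Euclidean Algorithm: returns (d, steps) where d*e = 1 (mod phi)."""
    old_r, r = phi, e
    old_t, t = 0, 1
    steps = 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_t, t = t, old_t - q * t
        steps += 1  # assumption: one step per division
    if old_r != 1:
        raise ValueError("e has no inverse modulo phi")
    return old_t % phi, steps

def compute_d_points(steps):
    """15 points per step, with a minimum of 120."""
    return max(120, 15 * steps)

def decode_year_points(d):
    """15 points per 1 bit and 5 points per 0 bit of d, with a minimum of 120."""
    bits = bin(d)[2:]
    return max(120, 15 * bits.count("1") + 5 * bits.count("0"))

if __name__ == "__main__":
    d, steps = inverse_with_steps(17, 3120)
    print(d, steps, compute_d_points(steps), decode_year_points(d))
```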

Question Text:

The question text should be automatically generated, but can be edited. Note that the tool keeps track of all the edits so that if it generates new values, the edits are kept in place.

Hill Cipher Encrypt/Decrypt {3.f.v}

Pick a phrase to encode. As a rule of thumb for a 2x2 matrix, every pair of letters is worth 20 points. Ideally you want an odd length string to force them to use a padding Z. For a 3x3 matrix, every group of three letters is worth 25 points. It is important to pick a phrase which is not a multiple of 3 characters long so that they must add the appropriate number of padding characters.

Pick an encoding key. For a 2x2 it is 4 characters long and for a 3x3 it is 9 characters long. This is probably the hardest part of making the test as the matrix has to be invertible (https://en.wikipedia.org/wiki/Invertible_matrix). Fortunately, the tool will tell you if it is not invertible. There is also a list of known valid keys at https://toebes.com/codebusters/HillKeys.html for both the 2x2 and 3x3 encodings. In general, it is more likely to be invertible if you use the letters B, D, F, H, L, N, R, T, X and Z, as they are odd and non-prime, but you can mix in some other letters. Just make sure that the keyword is not an inappropriate phrase. A total nonsense phrase is perfectly acceptable, but it helps the style of the test if it looks like a word.

Setting Difficulty:

Encrypt is harder than Decrypt and 3x3 is harder than 2x2. The other factor is the number of characters for them to encode.

Points:

160-250. About 16 points per group of 2 for a 2x2, 21 points per group of 3 for a 3x3.

Question Text:

The question should indicate whether they are to Encode or Decode using the Hill Cipher with a 26 character alphabet along with the Key. For example:

Using a key of CARNIVALS encode the string ASTROBIOLOGIST using the Hill Cipher with a 26-character alphabet.

Using a key of LOST decode the string QNFWNQNAFCCT using the Hill Cipher with a 26 character alphabet.
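An encode routine is handy for checking both kinds of question, since a decode question is just an encode run in advance. The sketch below uses a hypothetical function name, assumes A=0 through Z=25 with the key written row by row, and pads with Z as described above.

```python
from math import isqrt

def hill_encode(key, plain, pad="Z"):
    """Multiply each block of letters by the key matrix mod 26."""
    size = isqrt(len(key))
    if size * size != len(key):
        raise ValueError("key must be 4 letters (2x2) or 9 letters (3x3)")
    k = [ord(c) - ord("A") for c in key.upper()]
    text = [ord(c) - ord("A") for c in plain.upper() if "A" <= c <= "Z"]
    text += [ord(pad) - ord("A")] * (-len(text) % size)  # fill the last block
    out = []
    for start in range(0, len(text), size):
        block = text[start:start + size]
        for row in range(size):
            total = sum(k[row * size + col] * block[col] for col in range(size))
            out.append(chr(total % 26 + ord("A")))
    return "".join(out)

if __name__ == "__main__":
    print(hill_encode("LOST", "CHLORINATES"))        # 11 letters, so one Z is added
    print(hill_encode("CARNIVALS", "ASTROBIOLOGIST"))  # 14 letters, padded to 15
```

Encoding CHLORINATES with the key LOST reproduces the cipher text in the decode example, which confirms that its answer ends with a padding Z.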

Affine Cryptanalysis {3.f.vi}

There should be a single one of these on the State/National test. It should use a phrase about 25 characters long that doesn't have too many occurrences of the letter a in it, preferably with as large a variety of letters as possible. (e.g. The quick brown fox jumps over the lazy dog isn't actually bad).

Pick a value for a which is coprime with 26 (1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23 or 25). The actual value doesn't matter, but larger ones tend to be slightly harder. If you are generating tests for multiple regions, pick numbers that are near each other. For example, 7, 9 and 11 would be good choices for equivalent values of a.

Pick a value for b between 1 and 25 inclusive. Unlike a where the larger values become slightly harder, the value of b can truly be any number and be the same level of difficulty.

Setting Difficulty:

There is very little variability in the difficulty other than the length of the string. Larger values of a are only slightly harder while the value of b has no real impact on the difficulty. Picking the two mapping letters so that they are toward the end of the alphabet also increases the difficulty. Having phrases which use the letters G, W, Y, B, V, K, X, J, Q, and Z increases the difficulty slightly.

Points:

140-240. A good rule of thumb is 6 points per character plus 6 points for each of the letters G, W, Y, B, V, K, X, J, Q, and Z that are used.

Question Text:

The question should state that the Affine Cipher is used and indicate what two letters map to. It should not give the values of a or b. For example:

A message from <person> encrypted with the Affine Cipher using an alphabet of 26 characters has been received. You have been told that the first two letters are TH. With that knowledge, what does this message say?
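Before publishing the question, it is worth checking that the two letters given as the hint pin down a single pair of a and b. The sketch below uses hypothetical function names and the A=0 through Z=25 numbering; it simply tries every valid a and b, which is fast since there are only 312 combinations.

```python
from math import gcd

VALID_A = [a for a in range(1, 26) if gcd(a, 26) == 1]

def affine_encode(plain, a, b):
    """Encode letters with (a*x + b) mod 26; other characters pass through."""
    return "".join(chr((a * (ord(ch) - 65) + b) % 26 + 65) if "A" <= ch <= "Z" else ch
                   for ch in plain.upper())

def affine_decode(cipher, a, b):
    a_inv = pow(a, -1, 26)  # modular inverse, needs Python 3.8 or later
    return "".join(chr((a_inv * (ord(ch) - 65 - b)) % 26 + 65) if "A" <= ch <= "Z" else ch
                   for ch in cipher.upper())

def recover_keys(plain_letters, cipher_letters):
    """Every (a, b) consistent with the known plain-to-cipher letter mappings."""
    pairs = [(ord(p) - 65, ord(c) - 65)
             for p, c in zip(plain_letters.upper(), cipher_letters.upper())]
    return [(a, b) for a in VALID_A for b in range(26)
            if all((a * p + b) % 26 == c for p, c in pairs)]

if __name__ == "__main__":
    cipher = affine_encode("THE QUICK BROWN FOX", 9, 4)
    print(cipher, recover_keys("TH", cipher[:2]))
```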

Dancing Men Cipher {9.h}

The Dancing Men Cipher (also known as the Running Men Cipher) is one of the easiest Ciphers for students to decode because the alphabet is fixed. The letter E will always be represented by a person with both arms up in the air and both legs split. There can be a couple of Dancing Men Cipher questions on a test.

Setting Difficulty:

The only factor for difficulty with this question is in the number of characters in the phrase. The choice of letters/words/word length has no impact on the difficulty. In general the question should be approximately 40 characters.

Points:

A Dancing Men Cipher Decode should be worth 80 points (approximately 1 point per character in the Plain Text plus 2 points for each unique character).

Question Text:

The question should clearly indicate that cipher has been encoded using the Dancing Men Cipher as well as the origin of the phrase or quote. It should not include a hint. Some examples:

Solve this quote from <person> which has been encoded with the Dancing Men Cipher.

PigPen Cipher {9.h}

The PigPen/Masonic Cipher is one of the easiest Ciphers for students to decode because the alphabet is fixed. The letter E will always be represented by a square. There can be a couple of PigPen Cipher questions on a test.

Setting Difficulty:

The only factor for difficulty with this question is in the number of characters in the phrase. The choice of letters/words/word length has no impact on the difficulty. In general the question should be approximately 40-50 characters.

Points:

A PigPen Cipher Decode should be worth 60-80 points (approximately 1 point per character in the Plain Text plus 1.5 points for each unique character).

Question Text:

The question should clearly indicate that cipher has been encoded using the PigPen Cipher as well as the origin of the phrase or quote. It should not include a hint. Some examples:

Solve this quote from <person> which has been encoded with the PigPen Cipher.

Tap Code Cipher {9.i}

The Tap Code is a simple cipher to decode because the mapping is fixed. A single dot followed by a single dot will always stand for the letter a. What makes it slightly more challenging is that the students need to memorize the table or just know how to recreate it on the test. There should be one or two Tap Code Cipher questions on a test.

Setting Difficulty:

The two factors for difficulty with this question are the number of characters in the phrase and how far down in the alphabet the letters are. The further in the alphabet, the more taps the students have to count.

Points:

A Tap Code Cipher should be worth 55-75 points. This works out to about 1.5 points per letter or one point for every 3 or 4 taps generated.
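Counting the taps for a candidate phrase makes it easy to apply the rule of thumb above. The sketch below uses hypothetical function names and the standard 5x5 tap code grid, in which K shares a cell with C.

```python
GRID = "ABCDEFGHIJLMNOPQRSTUVWXYZ"  # 5x5 grid read row by row, K is sent as C

def tap_pairs(plain):
    """(row taps, column taps) for each letter; non-letters are skipped."""
    pairs = []
    for ch in plain.upper():
        if ch == "K":
            ch = "C"
        idx = GRID.find(ch)
        if idx >= 0:
            pairs.append((idx // 5 + 1, idx % 5 + 1))
    return pairs

def total_taps(plain):
    return sum(row + col for row, col in tap_pairs(plain))

if __name__ == "__main__":
    phrase = "WATER"
    print(tap_pairs(phrase), total_taps(phrase))
```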

Question Text:

The question should clearly indicate that cipher has been encoded using the Tap Code Cipher. It should not include a hint. Some examples:

Solve this quote from <person> which has been encoded with the Tap Code Cipher.

Rail Fence Cipher

The rail fence cipher encodes messages by 'zig-zagging' the letters along 2 to 6 'rails'. The cipher can be decoded by brute force; trial and error; or procedurally by applying some simple math and understanding spacing of characters in the rail patterns.
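The zig-zag can be sketched by computing which rail each character position falls on. This is an illustration with hypothetical function names; it assumes the zig-zag starts on the top rail and that the text has already had its spaces removed.

```python
def rail_pattern(length, rails):
    """Rail index (0 = top) for each character position in the zig-zag."""
    cycle = 2 * (rails - 1)
    return [min(i % cycle, cycle - i % cycle) for i in range(length)]

def rail_encode(plain, rails):
    """Read off the characters rail by rail, top to bottom."""
    pattern = rail_pattern(len(plain), rails)
    return "".join(ch for r in range(rails) for ch, p in zip(plain, pattern) if p == r)

def rail_decode(cipher, rails):
    """Work out which position each cipher character came from, then put it back."""
    pattern = rail_pattern(len(cipher), rails)
    order = sorted(range(len(cipher)), key=lambda i: (pattern[i], i))
    plain = [""] * len(cipher)
    for ch, i in zip(cipher, order):
        plain[i] = ch
    return "".join(plain)

if __name__ == "__main__":
    cipher = rail_encode("WEAREDISCOVEREDFLEEATONCE", 3)
    print(cipher)
    # Brute force over the allowed range of 2 to 6 rails.
    for rails in range(2, 7):
        print(rails, rail_decode(cipher, rails))
```

The loop at the end shows the brute force approach: with only five possible rail counts, the readable result stands out immediately.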

Setting Difficulty:

The difficulty gets slightly higher with more rails because it requires more diligence when solving.

Points:

100-150. A rule of thumb: 2 rails = 100; 3 rails = 110; 4 rails = 120; 5 rails = 130; 6 rails = 140 points. 150 points for when only the range of rails is given.

Question Text:

The question should be around 75 characters.

Solve this quote from <person> which has been encoded with the rail fence cipher. The message was encoded with <rail count> rails.

Morbit Cipher Cryptanalysis {3.f.vii}

The Morbit Cipher encodes text by first converting the plain text into morse code with the space between characters represented by an × and the space between words represented by two × characters. The resulting morse code is then broken into pairs of characters (adding an × at the end if necessary). The numbers 1-9 are randomly assigned to the unique pairs of characters (●●, ●–, ●×, –●, ––, –×, ×●, ×–, ××). The pairs of morse characters are then replaced with the corresponding number to generate the final cipher string. Decoding is done by reversing the process.

Setting Difficulty:

The difficulty is driven by the number of letters and the variety of morse code combined with which letters are chosen. Looking at the generated solving guide can help guide the question writing. Not giving them Hint Digits which expose the digit corresponding to ×× increases the difficulty.

Points:

150-200.

Question Text:

The question should be around 40 characters.

Solve this quote from <person> which has been encoded using the Morbit Cipher. You are told that the first four letters are SOME.

Pollux Cipher Cryptanalysis {3.f.vii}

The Pollux Cipher encodes text by first converting the plain text into morse code with the space between characters represented by an × and the space between words represented by two × characters. Each of the digits (0-9) is randomly assigned to represent one of the morse characters (●, –, ×). Each morse character is then replaced with one of its assigned digits to generate the final cipher string. Decoding is done by reversing the process. Note that because more than one digit may represent the same morse character, the digit chosen at each position can be somewhat random.

Setting Difficulty:

The difficulty is driven by the number of letters and the variety of morse code combined with which letters are chosen. Looking at the generated solving guide can help guide the question writing. Not giving them Hint Digits which expose all the digits corresponding to × increases the difficulty. Assigning more digits to one morse character, thereby reducing another, also increases the difficulty.

Points:

150-200.

Question Text:

The question should be around 40 characters.

Solve this quote from <person> which has been encoded using the Pollux Cipher. You are told that the first four letters are SOME.