OPPORTUNITIES TO MINIMIZE HARDWARE AND SOFTWARE COSTS FOR IMPLEMENTING BOOLEAN FUNCTIONS IN STREAM CIPHERS

1) V. N. Karazin Kharkiv National University, Svobody sq., 6, Kharkiv, 61022, Ukraine 2) JSC “Institute of Information Technologies”, Bakulin St., 12, Kharkiv, 61166, Ukraine, kuznetsov@karazin.ua, potav@ua.fm, nlfsr01@gmail.com 3) Security Service of Ukraine, Volodymyrska str., 33, Kyiv-1, 01601, Ukraine, mongol_1979@ukr.net 4) Department of Information Protection Administration of the State Service of Special Communication and Information Protection of Ukraine, Solomianska 13 str., Kyiv, 03680, Ukraine, stelnik_i@i.ua, mdv@dsszzi.gov.ua


INTRODUCTION
An analysis of the modern stream cipher schemes SNOW 2.0 [1], Decim [2], KCipher-2 [3], Sosemanuk [4], Grain [5], MICKEY 2.0 [6] and Trivium [7] shows that their main components are iterative bit generators and a complication function that forms each output unit from several combinations of inner state bits.
Iterative bit generators are usually built on a Linear Feedback Shift Register (LFSR). Their main function is to guarantee that the generator's inner state does not repeat over a sufficiently long period of operation and to provide good local statistical properties. Registers that form a de Bruijn sequence meet these requirements. However, such a sequence can also be generated by a Nonlinear Feedback Shift Register (NLFSR). Using NLFSRs as iterative generators retains all the advantages of LFSRs while avoiding the main disadvantage of LFSRs: linearity.
At the same time, building cryptosystems with NLFSRs raises new problems regarding the selection of the nonlinear register. LFSR theory is well studied [8], whereas NLFSRs are much less researched than LFSRs [9]. The first algorithm for building the smallest NLFSR for a given binary sequence was presented by Jansen in 1991 [10,11]. Alternative algorithms were given in [12], [13] and [14].
The way to form maximum-period LFSRs is clear: their feedback functions correspond to primitive polynomials over $F_2$. In general, it is unclear how to construct all NLFSRs with the maximum period. The main method is to search for registers with the required properties. Typically, the nonlinear feedback registers referenced in the literature have a rather complex structure and consist of several structural elements. If an NLFSR has a simple structure, then it is a small-size register.
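The link between primitive polynomials and the maximum period can be checked on a small register. The following is a minimal sketch, not taken from the paper: the function name `lfsr_period`, the tap convention (the new bit is the XOR of the state cells listed in `taps`) and the choice of $L = 4$ are illustrative assumptions.

```python
def lfsr_period(taps, L):
    """Cycle length of a Fibonacci LFSR started from the state 0...01.

    Assumed convention: the state is [s_t, ..., s_{t+L-1}] and the new bit
    s_{t+L} is the XOR of state[i] for i in taps. Cell 0 must be a tap so
    that the state update is invertible and the loop terminates.
    """
    start = [0] * (L - 1) + [1]
    state = start[:]
    steps = 0
    while True:
        feedback = 0
        for i in taps:
            feedback ^= state[i]
        state = state[1:] + [feedback]
        steps += 1
        if state == start:
            return steps


# x^4 + x + 1 is primitive over F_2: s_{t+4} = s_t XOR s_{t+1}
print(lfsr_period((0, 1), 4))
# x^4 + x^2 + 1 = (x^2 + x + 1)^2 is not primitive: s_{t+4} = s_t XOR s_{t+2}
print(lfsr_period((0, 2), 4))
```

With the primitive polynomial the register visits all $2^4 - 1 = 15$ nonzero states before repeating, while the non-primitive one falls into a shorter cycle.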
There are no general methods for designing maximum-period NLFSRs [8], [15]. The construction of a special class of NLFSRs with a maximum period was presented by Mykkeltveit and others in [16]. The work of E. Dubrova [15] contains an example of a Galois shift register of $L = 100$ cells that generates a sequence with a maximum period, but this sequence does not have the de Bruijn property, i.e., some tuples of $L$ bits appear more than once in the sequence.
At the same time, as the size of the register increases, so does its structural complexity. Even if an NLFSR forming a de Bruijn sequence of the required size can be constructed, its structure will be so complex that its implementation in encryption systems as an iterative generator will be unacceptably resource-consuming.
The complexity of the design is directly related to the size and cost of the hardware implementation of the cipher that uses it, and in most cases affects its performance. The lower the structural complexity of a function, the simpler its circuit implementation. This is especially true for so-called lightweight (low-resource) cryptography.
On the other hand, if one searches from the start for an NLFSR with given design features, in particular ease of implementation, the register found may be vulnerable to certain types of attacks. Neutralizing them would require introducing additional units into the algorithm and, as a result, would increase the overall resource consumption of the entire scheme.
Thus, it is reasonable to ask whether the structure of the NLFSR can be optimized between the simplicity of the hardware/software implementation and conformance to specified properties of the generated sequence. Simplicity of implementation is understood here as the number of operations necessary and sufficient to calculate the next state of an iterative generator.
This work presents results that reflect the interrelation between the structural characteristics of the NLFSR (such as the maximum algebraic degree and the number of monomials) and some necessary cryptographic properties of the sequence it forms (autocorrelation and linear complexity).

BOOLEAN FUNCTIONS
Some definitions used further in the paper are given below. The symbol $\oplus$ denotes addition modulo 2 (exclusive disjunction, the XOR operation). It is known that each Boolean function can be uniquely defined by its algebraic normal form (ANF); the ANF is also known as the Zhegalkin polynomial.
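The ANF is an expression of a Boolean function of the form
$$f(x_1, \dots, x_L) = \bigoplus_{I \in P(N)} a_I \prod_{i \in I} x_i, \qquad a_I \in \{0, 1\},$$
where $P(N)$ is the set of all subsets of $N = \{1, \dots, L\}$. There are simple algorithms for calculating the ANF of a given function (see, for example, [17]).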
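One simple way to obtain the ANF coefficients from a truth table is the binary Möbius (butterfly) transform. The sketch below is an illustration under stated assumptions and is not necessarily the algorithm of [17]: the names `anf_coefficients` and `algebraic_degree` are hypothetical, and bit $j$ of a table index is assumed to hold the value of the $j$-th variable.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> ANF coefficients.

    truth_table[i] is f evaluated at the point whose j-th variable equals
    bit j of i. In the result, entry m is the coefficient of the monomial
    formed by the variables whose bits are set in m (m = 0 is the constant).
    """
    coeffs = list(truth_table)
    n = len(coeffs)  # must be a power of two
    step = 1
    while step < n:
        for i in range(n):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


def algebraic_degree(coeffs):
    """Largest number of variables in any monomial with a nonzero coefficient."""
    return max((bin(m).count("1") for m, c in enumerate(coeffs) if c), default=0)


# Two-variable OR has the truth table [0, 1, 1, 1] and the ANF x0 + x1 + x0*x1
or_anf = anf_coefficients([0, 1, 1, 1])
print(or_anf, algebraic_degree(or_anf))
```

The transform is its own inverse over $F_2$, so applying it to the coefficient vector returns the truth table.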
The degree of a monomial (a Boolean monomial) is the number of variables it contains, and the algebraic degree of a function, $\deg f$, is the maximum degree of the monomials in its ANF. A function is called quadratic, cubic, etc., if its algebraic degree is 2, 3, etc., respectively. It is easy to see that a modified de Bruijn sequence can be obtained from a de Bruijn sequence by leaving out one zero from the longest run of zeros, and vice versa by adding one.

DE BRUIJN SEQUENCE
Similar sequences can also be constructed over an alphabet of $k$ elements. For example, 0011 and 002212011 are de Bruijn sequences of order $L = 2$ over the alphabets {0,1} and {0,1,2}, respectively. De Bruijn sequences are used in cryptography because of their good statistical properties: in a statistical sense they are indistinguishable from truly random sequences.
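The two examples can be verified directly from the defining property that every $L$-tuple over the alphabet occurs exactly once in one period of the cyclic sequence. This is a minimal sketch; the function name `is_de_bruijn` is an assumption introduced for illustration.

```python
def is_de_bruijn(seq, k, L):
    """True if seq, read cyclically, contains every L-tuple over {0..k-1} once."""
    n = len(seq)
    if n != k ** L or any(not 0 <= s < k for s in seq):
        return False
    windows = {tuple(seq[(i + j) % n] for j in range(L)) for i in range(n)}
    # n windows, n possible tuples: all distinct means each occurs exactly once
    return len(windows) == n


print(is_de_bruijn([0, 0, 1, 1], k=2, L=2))
print(is_de_bruijn([0, 0, 2, 2, 1, 2, 0, 1, 1], k=3, L=2))
```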
In 1894 C. Flye Sainte-Marie [19], and in 1946 de Bruijn [20], proved that such sequences exist for any natural number $L$ and any alphabet of $k$ elements, and showed that the number of different de Bruijn sequences is
$$N(k, L) = \frac{(k!)^{k^{L-1}}}{k^{L}}. \qquad (2)$$
For comparison, the number of maximum-period LFSRs of $L$ cells over $F_2$ is
$$\frac{\varphi(2^{L} - 1)}{L}, \qquad (3)$$
where $\varphi$ is the Euler function. To assess the obtained results, Table 1 shows the number of different de Bruijn sequences obtained in accordance with expression (2).
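The counts can be reproduced numerically. The sketch below assumes the standard closed form $(k!)^{k^{L-1}} / k^{L}$ for the number of de Bruijn sequences and, as the expression involving the Euler function, the count $\varphi(2^{L} - 1)/L$ of binary primitive polynomials of degree $L$; the function names are hypothetical.

```python
from math import factorial


def de_bruijn_count(k, L):
    """Number of distinct cyclic de Bruijn sequences of order L over k symbols."""
    return factorial(k) ** (k ** (L - 1)) // k ** L


def euler_phi(n):
    """Euler's totient function by trial division (adequate for small n)."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result


def max_period_lfsr_count(L):
    """Number of binary primitive polynomials of degree L."""
    return euler_phi(2 ** L - 1) // L


for L in range(4, 7):
    print(L, de_bruijn_count(2, L), max_period_lfsr_count(L))
```

For $k = 2$ the first count simplifies to $2^{2^{L-1} - L}$, which grows doubly exponentially in $L$, while the number of linear registers grows far more slowly.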
The review [21] gives the most complete insight into the theory of de Bruijn sequences and the history of their use in solving various problems.
NLFSRs that implement Boolean functions forming modified de Bruijn sequences will be called M-NLFSRs. If these functions are linear, the corresponding registers will be called M-LFSRs. Thus, an M-LFSR is a particular case of an M-NLFSR.
The period of the de Bruijn sequence, $T_{dB}$, is determined by the size of the alphabet $k$ (also called the base of the de Bruijn sequence) and by the number of memory cells of the iterative generator, $L$:
$$T_{dB} = k^{L}.$$
Correspondingly, the period of the modified de Bruijn sequence, $T$, is
$$T = k^{L} - 1.$$
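The relation between the two periods can be illustrated for $k = 2$, $L = 4$: an M-LFSR produces a sequence of period $2^{4} - 1 = 15$, and adding one zero to its run of $L - 1$ zeros gives a de Bruijn sequence of period $2^{4} = 16$. This is a sketch under assumptions: the feedback $s_{t+4} = s_t \oplus s_{t+1}$ and the helper names are chosen for illustration only.

```python
def lfsr_sequence(taps, L, length):
    """Output bits of a Fibonacci LFSR started from the state 0...01."""
    state = [0] * (L - 1) + [1]
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for i in taps:
            feedback ^= state[i]
        state = state[1:] + [feedback]
    return out


def extend_to_de_bruijn(seq, L):
    """Insert one zero into the cyclic run of L-1 zeros of a modified sequence."""
    n = len(seq)
    for i in range(n):
        if all(seq[(i + j) % n] == 0 for j in range(L - 1)):
            rotated = seq[i:] + seq[:i]  # rotation that starts with the zero run
            return [0] + rotated
    raise ValueError("no run of L-1 zeros found")


L = 4
modified = lfsr_sequence((0, 1), L, 2 ** L - 1)  # one period, T = 15
full = extend_to_de_bruijn(modified, L)          # one period, T = 16
print("".join(map(str, modified)))
print("".join(map(str, full)))
```

Every 4-bit tuple, including 0000, occurs exactly once in the cyclic 16-bit result.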

MAXIMUM ALGEBRAIC ANF DEGREE OF DE BRUIJN SEQUENCES
The maximum algebraic degree of the ANF for an M-NLFSR with $L$ cells is $L - 2$. This statement was given by Golomb (Theorem 4 in [22]).
The distribution of the number of modified de Bruijn sequences was obtained depending on the order of nonlinearity of the M-NLFSR forming the sequence or, equivalently, on the maximum algebraic degree of the ANF of the forming polynomial. The resulting distribution for $k = 2$ is shown in Table 2. As can be seen, as the algebraic degree of the ANF increases, the number of polynomials that form modified de Bruijn sequences grows exponentially, and so does the probability that a randomly chosen M-NLFSR has a higher algebraic degree.

THE NUMBER OF ANF M-NLFSR MONOMIALS
Consider the number of monomials in the recurrence relation defining the feedback of the NLFSR. For all M-LFSRs this number is even. The distribution of the total number of monomials for $4 \le L \le 6$ was published in [22]. The same work also proves (Theorems 5, 6) that the minimum number of monomials in a polynomial is 2 (reached only by M-LFSRs) and gives a relation for the maximum. The distribution has a Gaussian nature.
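For small registers such distributions can be obtained by exhaustive search. The sketch below enumerates all binary M-NLFSRs with $L = 4$ cells and groups them by algebraic degree and number of monomials. It is an illustration under assumptions, not the authors' procedure: the feedback is taken in the nonsingular form $s_{t+L} = s_t \oplus g(s_{t+1}, \dots, s_{t+L-1})$, and all function names are hypothetical.

```python
from collections import Counter


def anf_coefficients(table):
    """Binary Moebius transform: truth table -> ANF coefficient vector."""
    coeffs = list(table)
    step = 1
    while step < len(coeffs):
        for i in range(len(coeffs)):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


def cycle_length(g_table, L):
    """Length of the state cycle through 0...01 for s_{t+L} = s_t ^ g(...)."""
    start = (0,) * (L - 1) + (1,)
    state, steps = start, 0
    while True:
        idx = sum(bit << j for j, bit in enumerate(state[1:]))
        state = state[1:] + (state[0] ^ g_table[idx],)
        steps += 1
        if state == start:
            return steps


def m_nlfsr_distribution(L):
    """Counter keyed by (algebraic degree, number of monomials)."""
    size = 2 ** (L - 1)
    dist = Counter()
    for code in range(2 ** size):
        g = [(code >> i) & 1 for i in range(size)]
        # g(0,...,0) = 0 keeps the all-zero state a fixed point, so a cycle of
        # length 2^L - 1 then passes through every nonzero state.
        if g[0] or cycle_length(g, L) != 2 ** L - 1:
            continue
        anf = anf_coefficients(g)
        degree = max([1] + [bin(m).count("1") for m, c in enumerate(anf) if c])
        monomials = 1 + sum(anf)  # the term s_t plus the monomials of g
        dist[(degree, monomials)] += 1
    return dist


for (degree, monomials), count in sorted(m_nlfsr_distribution(4).items()):
    print(f"deg f = {degree}, monomials = {monomials}: {count}")
```

The search space has $2^{2^{L-1}}$ candidates, so this brute-force approach is practical only for very small $L$.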
A similar distribution that takes into account $\deg f$ was obtained and is presented in Table 3. Applying the principles of combinatorics, one can count the number of possible feedbacks for an NLFSR of $L$ cells with a given order of nonlinearity.

Number of monomials (first column) and number of M-NLFSRs (remaining columns in the order of the source table; a dash marks an empty cell):

    ...         ...         ...         ...     ...,280
    16            -           -     420,692  19,139,124
    18            -           -     141,894  15,146,272
    20            -           -      23,808   7,057,286
    22            -           -       1,742   1,892,112
    24            -           -          58     274,994
    26            -           -           -      20,294
    28            -           -           -         628
    30            -           -           -           8

As can be seen in Table 4, the number of monomials for the studied M-NLFSRs is confined to a limited range.

In addition, the value of the ACF varies significantly depending on the M-NLFSR selected to form the resulting sequence. From a practical point of view, the less a sequence correlates with its own shifted copy, the less it is exposed to attacks that exploit this vulnerability. It is also important that there is no correlation for any of the shifts.

AUTOCORRELATION FUNCTION
Let us introduce the value $AC_{\max}$, which characterizes the maximum absolute value of the ACF over all shifts $1 \le \tau \le T - 1$. As can be seen, all the studied sequences formed by nonlinear registers are inferior in their ACF characteristics to the sequences formed by linear registers.
The best value for the studied M-NLFSRs is $AC_{\max} = 5/T$; it is achieved at the maximum order of nonlinearity.
The nature of the distribution is approximately the same for any order of nonlinearity.
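The maximum out-of-phase ACF value can be computed directly from one period of a sequence. The following is a minimal sketch under assumptions: bits are mapped to $\pm 1$, the ACF is the periodic one normalized by the period $T$, and the function names are hypothetical; the exact normalization used for the tables in the paper is not stated in the text.

```python
def periodic_acf(bits, shift):
    """Unnormalized periodic autocorrelation: agreements minus disagreements."""
    n = len(bits)
    return sum(1 if bits[i] == bits[(i + shift) % n] else -1 for i in range(n))


def max_out_of_phase_acf(bits):
    """Largest |ACF| over all nonzero shifts, normalized by the period T."""
    n = len(bits)
    return max(abs(periodic_acf(bits, s)) for s in range(1, n)) / n


# One period of the sequence produced by the linear feedback s_{t+4} = s_t ^ s_{t+1}
m_sequence = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
print(max_out_of_phase_acf(m_sequence))  # equals 1/T for a maximum-period LFSR
```

For a sequence formed by a maximum-period linear register every out-of-phase value equals $-1$ before normalization, which is the reference level against which the nonlinear registers are compared.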

LINEAR COMPLEXITY
The linear complexity ($Li$) of a pseudo-random sequence is the length of the shortest linear feedback shift register that generates this periodic sequence, provided that the first $Li$ values of the sequence are the initial state of the register. Linear complexity is one of the main parameters by which such a system is evaluated. Any sequence that can be generated by an automaton (linear or nonlinear) over a finite field has finite linear complexity. Thus, it is possible to construct an algorithm that determines the linear complexity of any sequence regardless of how it is generated, without knowledge of the structure of the circuit that forms it.
The most common method for calculating the linear complexity is the Berlekamp-Massey algorithm, which is described in detail in [23,24]. A large linear complexity of the formed sequence is a necessary (but not sufficient) condition for the practical security of pseudo-random sequence generators.
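A compact binary version of the Berlekamp-Massey algorithm is sketched below. It is a textbook-style illustration rather than the implementation used by the authors; the function name `linear_complexity` is an assumption, and the input should cover at least twice the expected linear complexity (for a periodic sequence, two full periods are sufficient).

```python
def linear_complexity(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR producing bits."""
    n = len(bits)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, m = 0, -1  # current LFSR length, index of the last length change
    for i in range(n):
        # discrepancy between bits[i] and the prediction of the current LFSR
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                m = i
                b = previous
    return length


m_sequence = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
de_bruijn = [0] + m_sequence  # one zero added to the run of three zeros
print(linear_complexity(m_sequence * 2))
print(linear_complexity(de_bruijn * 2))
```

The sequence of the 4-cell linear register has linear complexity 4, whereas the de Bruijn sequence derived from it cannot be produced by any linear register of that length, because it contains the all-zero 4-tuple followed by a one.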

CONCLUSION
The Boolean functions that form de Bruijn sequences are efficient as iterative generators in stream ciphers because of their good local statistical characteristics, the maximum period of the generated sequence, and the simplicity of implementation.
However, the use of nonlinearity in Boolean functions increases the structural complexity of the register.
The majority of M-NLFSRs have a number of monomials equal to about 1/3 of the maximum value. For example, for $L = 32$, most M-NLFSRs will have approximately $10^{9}$ terms, each a product of up to 30 different values, which is unacceptable from the point of view of practical implementation in stream encryption systems. However, as shown in this work, there are M-NLFSRs with a minimum number of feedback coefficients.
All the studied sequences formed by nonlinear registers are inferior in their ACF characteristics to sequences formed by linear registers.
The best value for the studied M-NLFSRs equals $AC_{\max} = 5/T$. Thus, the search for structurally simple NLFSRs forming a de Bruijn sequence, as well as the optimization of their structure to match the necessary cryptographic properties, is a complex task that requires further study.