Program data is stored in the computer's memory (RAM or persistent storage) as a sequence of zeros and ones. At this level there is no difference between strings, numbers, and booleans; everything in memory looks the same. The difference appears only through interpretation. The program knows that a variable holds a string, so it takes the zeros and ones and looks them up in a code table, which maps each number to a character. As a result, the programmer sees a string.
At the very beginning there was one encoding, ASCII, based on the English alphabet. In this encoding one character takes 7 bits, so 128 characters can be encoded in total. 95 of them are printable: uppercase and lowercase letters, digits, punctuation marks, and the space. The other 33 are non-printing characters, the so-called control codes. Most of them are no longer relevant, but some, such as the line feed \n, are still in use. For example, the lowercase letter i corresponds to the binary number 1101001, which is 105 in decimal notation.
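The correspondence between the letter i and the number 105 can be checked directly in Java. The following is a minimal sketch; the class name AsciiDemo and its helper methods are made up for illustration and are not part of the lesson.

```java
public class AsciiDemo {
    // Returns the numeric code of a character.
    // For the first 128 characters this is the ASCII code.
    public static int codeOf(char ch) {
        return ch;
    }

    // Returns the binary form of the character's code, without leading zeros.
    public static String binaryOf(char ch) {
        return Integer.toBinaryString(ch);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('i'));    // 105
        System.out.println(binaryOf('i'));  // 1101001
        System.out.println(codeOf('\n'));   // 10, the line feed control code
        System.out.println((char) 105);     // i
    }
}
```

The cast in the last line goes the other way: the same number, interpreted as a character, becomes the letter again.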
At first everything was fine, but as computers spread, other alphabets had to be supported. Each country solved this problem by creating its own encoding, most of them compatible with ASCII. That is, the first 128 numbers fully matched ASCII, while the remaining 128 were filled with the local alphabet. 128 + 128 = 256, which is 2 to the 8th power. These encodings were single-byte (one byte was required to store one character). Then the gates of hell opened. Opening a file in an editor with the wrong encoding produced mojibake, that is, garbled text: Øèðîêàÿ ýëåêòðèôèêàöèÿ þæíûõ ãóáåðíèé äàñò ìîùíûé òîë÷îê ïîäú¸ìó ñåëüñêîãî õîçÿéñòâà. It arises because the same code corresponds to completely different characters in different encodings, with the exception of the first 128. Therefore text written in English letters always reads correctly, and everything else is a matter of luck. The situation was made worse by the fact that many different encodings were created even for the same alphabet, for example Windows-1251, KOI8-R, CP 866, and ISO 8859-5 for Cyrillic.
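The effect can be reproduced by writing text in one single-byte encoding and reading the same bytes in another. The sketch below takes the first word of the garbled sample above, "Широкая", encodes it as windows-1251, and decodes the bytes as ISO-8859-1, which here stands in for a Western European encoding. The class and method names are hypothetical. It assumes the running JVM provides the windows-1251 charset, which is common but not guaranteed by the Java specification. The Cyrillic word is written with Unicode escapes so that the source file itself stays pure ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the text with one charset and decodes the resulting bytes with another.
    public static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Assumption: this JVM supports windows-1251.
        Charset cyrillic = Charset.forName("windows-1251");
        Charset latin = StandardCharsets.ISO_8859_1;

        // The Cyrillic word from the garbled sample, written with escapes.
        String word = "\u0428\u0438\u0440\u043e\u043a\u0430\u044f";

        // Every byte is above 127, so every letter turns into a different character.
        System.out.println(reinterpret(word, cyrillic, latin));

        // Characters from the first 128 codes survive the round trip unchanged.
        System.out.println(reinterpret("ASCII text", cyrillic, latin));
    }
}
```

On a console that can display these characters, the first line should print "Øèðîêàÿ", matching the start of the sample, while the second line prints "ASCII text" unchanged.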
In the programming languages of that time, all string functions were built on the assumption that one character is one byte. That property, at least, was common to all of these encodings.
Different encodings caused constant problems in the interaction between people and programs. The problem became especially acute with the growth of the Internet. This could not go on indefinitely, and eventually the Unicode standard was created. At the moment it contains more than 100 thousand characters and covers the scripts of virtually all living (and even dead) languages. The Unicode standard is not an encoding and says nothing about how characters should be stored in memory; it only defines the correspondence between a character and a certain number. The specific way of storing Unicode is determined by the corresponding encodings, among which are UTF-8, UTF-16 and some others. In these encodings one byte is no longer always enough to store one character. UTF-8 is smarter about this: it uses one byte for the characters of the English alphabet (and the rest of ASCII), two bytes for many other alphabets such as Cyrillic, and three or four bytes for everything else.
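The difference between a character's Unicode number and the bytes used to store it can be seen by encoding a few characters. This is a small sketch with a made-up class name, Utf8Demo; the characters are written as Unicode escapes so the source stays ASCII-only. It shows that UTF-8 uses between one and four bytes per character, while UTF-16 uses at least two.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Number of bytes needed to store the text in UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to store the text in UTF-16
    // (big-endian variant, which adds no byte order mark).
    public static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Unicode keeps the ASCII numbers: the letter i is still 105.
        System.out.println("i".codePointAt(0));              // 105

        System.out.println(utf8Length("i"));                 // 1, Latin letter
        System.out.println(utf8Length("\u043f"));            // 2, Cyrillic letter pe
        System.out.println(utf8Length("\u20ac"));            // 3, euro sign
        System.out.println(utf8Length("\uD83D\uDE00"));      // 4, an emoji (U+1F600)

        System.out.println(utf16Length("i"));                // 2
        System.out.println(utf16Length("\uD83D\uDE00"));     // 4
    }
}
```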
After many years of popularizing Unicode, a miracle happened, and now the vast majority of software uses UTF-8. This process was painful and had different effects on programming languages.
Implement the invertCase function, which inverts the case of each character in the passed string. You can use methods such as Character.toUpperCase(char ch).
String str = "ПрИвЕт!"; invertCase(str); // пРиВеТ!
This task is for practice and is not related to the topic of the lesson.