Unicode

Program data is stored in the computer’s memory (RAM or persistent storage) as a sequence of zeros and ones. At this level, there is no difference between strings, numbers, or booleans; everything in memory looks the same. The difference appears only through interpretation. The program knows that a variable holds a string, so it takes the zeros and ones and looks them up in a code table, which specifies which number corresponds to which character. As a result, the programmer sees a string.
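To make the idea of interpretation concrete, here is a minimal sketch in which the same two bytes are read once as numbers and once as text. The class and method names (InterpretationDemo, asNumbers, asText) are made up for this illustration, and ASCII is assumed as the code table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InterpretationDemo {
    // Reads the bytes as plain numbers.
    static String asNumbers(byte[] data) {
        return Arrays.toString(data);
    }

    // Reads the same bytes as text by looking each one up in the ASCII table.
    static String asText(byte[] data) {
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The raw zeros and ones; nothing here says "number" or "string".
        byte[] data = {0b01001000, 0b01101001};

        System.out.println(asNumbers(data)); // [72, 105]
        System.out.println(asText(data));    // Hi
    }
}
```

The bytes never change; only the way the program chooses to read them does.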

At the very beginning there was one encoding, ASCII, based on the English alphabet. In this encoding, one character takes 7 bits, so 128 characters are encoded in total. 95 of them are printable: uppercase and lowercase letters, digits, punctuation marks, and the space. The other 33 are non-printable characters, the so-called control codes. Most of those are no longer relevant, but some, such as the line feed \n, are still in use. For example, the lowercase letter i corresponds to the binary number 1101001, which is 105 in decimal notation.
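The mapping between ASCII characters and numbers can be checked directly in Java. This is a small sketch; AsciiDemo, code, and bits are hypothetical names chosen for the example.

```java
class AsciiDemo {
    // A Java char is a number under the hood; widening it to int reveals the code.
    static int code(char ch) {
        return ch;
    }

    // The same code written in binary, without leading zeros.
    static String bits(char ch) {
        return Integer.toBinaryString(ch);
    }

    public static void main(String[] args) {
        System.out.println(code('i'));   // 105
        System.out.println(bits('i'));   // 1101001
        System.out.println(code('\n'));  // 10, the line feed control code
        System.out.println((char) 105);  // i
    }
}
```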

At first everything was fine, but as computers spread, other alphabets were needed. Each country solved this problem by creating its own encoding, and most of these encodings were compatible with ASCII. That is, the first 128 codes fully matched ASCII, while the remaining 128 were filled with the local alphabet. 128 + 128 = 256, which is 2 to the 8th power. These encodings were single-byte: one byte was enough to store one character. Then the gates of hell opened. Opening a file in an editor that assumed a different encoding produced mojibake (garbled text): Øèðîêàÿ ýëåêòðèôèêàöèÿ þæíûõ ãóáåðíèé äàñò ìîùíûé òîë÷îê ïîäú¸ìó ñåëüñêîãî õîçÿéñòâà. It appears because the same code corresponds to completely different characters in different encodings, except for the first 128. As a result, text in English letters is always readable, and everything else is a matter of luck. The situation was made worse by the fact that many different encodings were created even for a single alphabet, for example, for Cyrillic: Windows-1251, KOI8-R, CP866, ISO 8859-5.
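The garbled sample above can be reproduced in a few lines. The sketch below writes a Russian word as Windows-1251 bytes and then reads the same bytes as Windows-1252. MojibakeDemo and reinterpret are hypothetical names, and the code assumes that both charsets are available in your JDK, which Java does not strictly guarantee.

```java
import java.nio.charset.Charset;

class MojibakeDemo {
    // Encodes the text with one charset, then decodes the same bytes with another.
    static String reinterpret(String text, String writtenAs, String readAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        // Cyrillic letters live in the upper 128 codes, which differ between encodings.
        System.out.println(reinterpret("Широкая", "windows-1251", "windows-1252")); // Øèðîêàÿ

        // English letters live in the shared ASCII range, so they survive.
        System.out.println(reinterpret("Hello", "windows-1251", "windows-1252"));   // Hello
    }
}
```

What the console actually prints also depends on the terminal's own encoding, so the returned strings are the reliable part of this example.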

In the programming languages of that time, all string functions were built on the assumption that one character is one byte. This property, at least, was common to all of these encodings.

Different encodings caused constant problems in the interaction between people and programs. The problem became especially acute as the Internet developed. This could not continue indefinitely, and in the end the Unicode standard was created. It now contains more than 100 thousand characters and covers the scripts of practically all living languages, and even some dead ones. The Unicode standard is not an encoding and says nothing about how characters should be stored in memory; it only defines the correspondence between a character and a certain number. The specific way of storing Unicode text is defined by encodings such as UTF-8, UTF-16, and some others. In these encodings, one byte is no longer enough to store every character, so they use more. UTF-8 is the smarter of them: it uses one byte for the characters of the English alphabet (the ASCII range) and from two to four bytes for everything else. Cyrillic letters, for example, take two bytes.
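The difference between a Unicode number and its stored form can be seen by encoding a few characters and counting the bytes. In this sketch the names EncodedLengthDemo, fromCodePoint, utf8Length, and utf16Length are invented for illustration, and the characters are given by their Unicode numbers (U+0069 is i, U+0438 is the Cyrillic и, U+20AC is the euro sign, U+1F600 is an emoji).

```java
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Builds a one-character string from the number Unicode assigns to the character.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of bytes the character occupies in UTF-8.
    static int utf8Length(int codePoint) {
        return fromCodePoint(codePoint).getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16; the BE variant is used so no byte order mark is counted.
    static int utf16Length(int codePoint) {
        return fromCodePoint(codePoint).getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0069, 0x0438, 0x20AC, 0x1F600};
        for (int cp : codePoints) {
            System.out.printf("U+%04X: UTF-8 = %d bytes, UTF-16 = %d bytes%n",
                    cp, utf8Length(cp), utf16Length(cp));
        }
    }
}
```

The same Unicode number takes 1, 2, 3, or 4 bytes in UTF-8 depending on the character, while UTF-16 uses 2 bytes for the first three and 4 bytes for the emoji.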

After many years of popularizing Unicode, a miracle happened, and now the vast majority of software uses UTF-8. The transition was painful and affected programming languages in different ways.

Languages split into two camps. Some built Unicode support into their existing string functions, so the transition had no effect on how programs are written. Among them are Java, Ruby, and JavaScript.
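For Java, this can be illustrated with ordinary string methods applied to a Cyrillic word. The sketch below uses hypothetical helper names (CharactersNotBytesDemo, charCount, utf8ByteCount, upper). Note one assumption: strictly speaking, String.length() counts UTF-16 code units rather than characters, but for Cyrillic text the two numbers coincide.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharactersNotBytesDemo {
    // What the programmer works with: the number of characters.
    static int charCount(String s) {
        return s.length();
    }

    // What ends up in a UTF-8 file: the number of bytes.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Case conversion works on letters of any alphabet, not only English ones.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String word = "Привет";
        System.out.println(charCount(word));     // 6
        System.out.println(utf8ByteCount(word)); // 12
        System.out.println(upper(word));         // ПРИВЕТ
    }
}
```

The string functions operate on characters, and the number of bytes only matters when the text is written to a file or sent over the network.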

Instructions

Implement the invertCase function, which inverts the case of each character in the passed string. You can use the methods Character.isUpperCase(char ch), Character.toUpperCase(char ch) and Character.toLowerCase(char ch).

String str = "ПрИвЕт!";
invertCase(str); // пРиВеТ!

This task is for training and is not related to the topic of the lesson.

