JavaScript: Encoding
At the most basic level, a computer works only with zeros and ones, which make up what is called binary code. Each one or zero is called a bit (from binary digit).
Any data in a computer, for example images, music, and text, is represented simply as a sequence of bits. The familiar numbers from the decimal system can also be represented in binary form.
- 0 → 0
- 1 → 1
- 2 → 10
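The conversion can be checked directly in JavaScript. This is a minimal sketch: the helper names toBinary and fromBinary are made up for this illustration, while Number.prototype.toString(2) and parseInt(..., 2) are standard built-ins.

```javascript
// Hypothetical helpers: convert a decimal number to binary notation and back.
const toBinary = (n) => n.toString(2);
const fromBinary = (bits) => parseInt(bits, 2);

console.log(toBinary(0)); // 0
console.log(toBinary(1)); // 1
console.log(toBinary(2)); // 10
console.log(toBinary(5)); // 101
console.log(fromBinary('101')); // 5
```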
How to encode text?
A computer does not "understand" text. To work with letters and other characters, they also need to be turned into numbers. This is done using encodings, that is, tables in which each character corresponds to a specific number.
The simplest way is to number the letters, starting from 1.
- a → 1
- b → 2
- ...and so on up to z → 26
Now we can represent the word hello as a set of numbers.
h e l l o
↓ ↓ ↓ ↓ ↓
8 5 12 12 15
And good turns into the following sequence.
g o o d
↓ ↓ ↓ ↓
7 15 15 4
The program does not know that this is a word. It simply sees the instruction "display the character with code 8, then the one with code 5, and so on".
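The toy table above can be sketched in a few lines. The names alphabet, encode, and decode are invented for this illustration, and the sketch assumes the input contains only lowercase Latin letters (any other character would get code 0).

```javascript
// Toy encoding from the text: a → 1, b → 2, ..., z → 26.
const alphabet = 'abcdefghijklmnopqrstuvwxyz';

// Turn each letter into its position in the alphabet.
const encode = (word) => [...word].map((ch) => alphabet.indexOf(ch) + 1);

// Turn each code back into the letter at that position.
const decode = (codes) => codes.map((code) => alphabet[code - 1]).join('');

console.log(encode('hello')); // [ 8, 5, 12, 12, 15 ]
console.log(encode('good')); // [ 7, 15, 15, 4 ]
console.log(decode([8, 5, 12, 12, 15])); // hello
```

The same numbers decode back to the word only because both sides agree on the table, which is the whole idea of an encoding.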
ASCII. The first widely used encoding
The first computers worked mostly with the English language. For it, in the 1960s the ASCII table was invented. It includes 128 characters, among them the Latin alphabet, digits, punctuation marks and special characters (@, #, !), and control codes such as the line feed (\n).
This was enough for the first programs, but not for the whole world.
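ASCII codes are easy to inspect from JavaScript. The sketch below uses the standard charCodeAt and String.fromCharCode methods; the variable name asciiCodes is just for illustration.

```javascript
// Look up the numeric code of each character in a short ASCII string.
const asciiCodes = [...'Hi!'].map((ch) => ch.charCodeAt(0));
console.log(asciiCodes); // [ 72, 105, 33 ]

// Go the other way: build a string from numeric codes.
console.log(String.fromCharCode(72, 105, 33)); // Hi!

// Code 10 is the line feed control code, written as \n in source code.
console.log(String.fromCharCode(10) === '\n'); // true
```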
When computers began to be used in other countries, a problem arose. ASCII has no Cyrillic, Chinese characters, Arabic script, accented letters, most currency symbols, and so on.
Each country or company started making its own encoding based on ASCII.
- Microsoft created Windows-1251 for Cyrillic languages such as Russian
- Apple created Mac Roman
- Countries in Eastern Europe, Asia, and the Middle East developed their own variants
All these encodings were incompatible with each other. Code 226 could be the letter â in one encoding (ISO-8859-1), the Cyrillic letter в in another (Windows-1251), and some technical character altogether in a third. This led to real chaos.
What encoding problems looked like
If you see this in a text:
ÐÑивеÑ!
It means that the program incorrectly determined the encoding of the text. It received a sequence of bytes but read them with the wrong table.
This was the norm in the 1990s and 2000s. One program wrote text in Windows-1251, another read it as ISO-8859-1, and the result was garbage.
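This kind of garbage can be reproduced on purpose. The sketch below is Node.js-specific, since it relies on the Buffer class, which browsers do not have. It writes a string as UTF-8 bytes and then reads those bytes back as Latin-1 (ISO-8859-1); the pairing of these two encodings is chosen here only for illustration.

```javascript
// Node.js only: Buffer is not available in browsers.
const original = 'Привет!';

// Write the text as UTF-8: 6 Cyrillic letters × 2 bytes + 1 byte for "!".
const bytes = Buffer.from(original, 'utf8');
console.log(bytes.length); // 13

// Read the same bytes with the wrong table: every byte becomes one character.
const garbled = bytes.toString('latin1');
console.log(garbled.length); // 13
console.log(garbled === original); // false

// The bytes themselves are intact, so reading them with the right table restores the text.
const restored = Buffer.from(garbled, 'latin1').toString('utf8');
console.log(restored === original); // true
```

Nothing is lost as long as the original bytes are kept. The damage becomes permanent only when the garbled text is saved again in yet another encoding.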
Unicode and UTF-8. The end of the mess
To fix everything, in the late 1980s work began on creating the universal Unicode table, and its first version was published in 1991. It aims to contain the characters of all the writing systems of the world, among them the Latin alphabet and Cyrillic, Chinese and Arabic script, mathematical signs, Egyptian hieroglyphs, and even emoji.
Inside Unicode there are several storage formats. The most widespread of them is UTF-8. It compactly encodes English characters but can expand to fit any others.
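The variable width of UTF-8 can be observed with the standard TextEncoder class, which always produces UTF-8. The helper name byteLength is made up for this sketch.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: how many bytes a string occupies in UTF-8.
const byteLength = (text) => encoder.encode(text).length;

console.log(byteLength('a')); // 1 (ASCII characters take one byte)
console.log(byteLength('я')); // 2
console.log(byteLength('€')); // 3
console.log(byteLength('😀')); // 4
```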
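Today UTF-8 is the default standard on the internet, in Linux, databases, and code editors, and JavaScript source files are usually saved in it. JavaScript strings themselves are stored in memory in UTF-16, another Unicode format.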
Why should a programmer know this?
- You will work with text, and encoding errors still happen, especially when reading files, processing data, and interacting with APIs and databases.
- JavaScript uses Unicode for strings by default.
- You need to be able to diagnose problems. For example, if you see "gibberish", it is almost certainly an encoding error.
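As a small illustration of Unicode in JavaScript strings, the sketch below uses the standard codePointAt and String.fromCodePoint methods. The variable names are invented for the example. It also shows that length counts UTF-16 code units, not visible characters, which is a common source of surprises.

```javascript
// Unicode code points of characters outside ASCII.
const cyrillicCode = 'ж'.codePointAt(0);
const emojiCode = '😀'.codePointAt(0);
console.log(cyrillicCode); // 1078
console.log(emojiCode); // 128512

// Build characters back from their code points.
console.log(String.fromCodePoint(1078)); // ж
console.log(String.fromCodePoint(128512)); // 😀

// length counts UTF-16 code units, so this emoji counts as 2.
console.log('ж'.length); // 1
console.log('😀'.length); // 2
```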
Instructions
The program receives the numeric codes of characters and prints them to the screen, which is convenient when a character is hard to type on the keyboard. Find the characters with codes 126, 94, and 37 in the ASCII table and print each on a separate line using the String.fromCharCode() function.
console.log(String.fromCharCode(...));
console.log(String.fromCharCode(...));
console.log(String.fromCharCode(...));
For example, the ? character has code 63:
console.log(String.fromCharCode(63)); // output: ?
Your exercise will be checked with these tests:
// @ts-check
import { expect, test, vi } from 'vitest';
test('hello world', async () => {
  const consoleLogSpy = vi.spyOn(console, 'log').mockImplementation(() => {});
  await import('./index.js');
  const firstArg = consoleLogSpy.mock.calls.join('\n');
  expect(firstArg).toBe('~\n^\n%');
});
