At the deepest level, the computer operates exclusively with the numbers 0 and 1. This is called binary code. The ones and zeros are called bits, a term derived from "binary digit".
The decimal numbers that we use every day are encoded as binary numbers. For example, 5 is written as 101 and 10 as 1010.
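As a small illustration of decimal numbers and their binary form, here is a sketch that relies on PHP's built-in decbin() and bindec() functions. The specific numbers are chosen only for the example.

```php
<?php

// decbin() converts a decimal integer to a string of binary digits.
echo decbin(5), "\n";  // 101
echo decbin(10), "\n"; // 1010

// bindec() goes the other way: a string of binary digits to a decimal number.
echo bindec('1010'), "\n"; // 10
```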
But how does it deal with text? The computer isn't aware of letters, punctuation, or other text characters. All of these characters are encoded as numbers too.
We can take the English alphabet and give each letter a number, starting with one: a is 1, b is 2, and so on up to z, which is 26.
This is the essence of encoding.
Then you can teach the computer to understand this table and translate text into numbers and vice versa:
hello → 8 5 12 12 15
7 15 15 4 → good
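The translation above can be sketched in PHP. The helpers encodeWord() and decodeNumbers() are hypothetical names made up for this illustration, and they assume lowercase Latin letters only. They use the built-in ord() and chr() functions to shift real character codes into the toy table where a is 1.

```php
<?php

// Hypothetical helpers for the toy table a = 1, b = 2, ..., z = 26.
// Assumption: the input contains only lowercase Latin letters.

function encodeWord(string $word): array
{
    $numbers = [];
    foreach (str_split($word) as $letter) {
        // ord('a') is 97, so 'a' maps to 1, 'b' to 2, and so on.
        $numbers[] = ord($letter) - ord('a') + 1;
    }
    return $numbers;
}

function decodeNumbers(array $numbers): string
{
    $word = '';
    foreach ($numbers as $number) {
        // Reverse the shift to get back to the real character code.
        $word .= chr($number + ord('a') - 1);
    }
    return $word;
}

echo implode(' ', encodeWord('hello')), "\n"; // 8 5 12 12 15
echo decodeNumbers([7, 15, 15, 4]), "\n";     // good
```

Real encodings work on the same principle, only with a much larger table that is agreed upon in advance.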
These tables that match letters and numbers are called encodings. Besides letters of the alphabet, encodings include punctuation marks and other useful characters. You've probably encountered encodings such as ASCII or UTF-8.
Different encodings contain different numbers of characters. At first, small tables like ASCII were enough for programmers. But ASCII has only Latin letters, a few simple characters like % and ?, and special control characters like line feed.
As computers became more widespread, different countries needed their own, broader tables. This included Cyrillic letters, Chinese and Japanese characters, Arabic script, additional mathematical and typographic symbols, and later on emojis.
Today, in most cases, one of the Unicode encodings is used: UTF-8. It includes characters from almost all the written languages found in the world. Thanks to this, a letter written in Chinese by a person in China can easily be opened and seen in its original form on a computer in Finland (whether they'll understand it or not is another question).
Programmers have to deal with encodings regularly. The level of Unicode support varies from one programming language to another. Moreover, encodings must be declared when working with databases and files.
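To see why encodings matter in everyday code, here is a minimal sketch of how UTF-8 stores different characters in a different number of bytes. It assumes PHP 7 or newer for the "\u{...}" escape syntax. The variable names are made up for the example.

```php
<?php

// "\u{...}" escapes (PHP 7+) produce the UTF-8 bytes of a Unicode code point.
$latin = 'A';           // U+0041, also part of ASCII
$accented = "\u{00E9}"; // é
$chinese = "\u{4E2D}";  // 中

// strlen() counts bytes, not characters.
echo strlen($latin), "\n";    // 1
echo strlen($accented), "\n"; // 2
echo strlen($chinese), "\n";  // 3

// bin2hex() shows the raw bytes stored for each character.
echo bin2hex($accented), "\n"; // c3a9
echo bin2hex($chinese), "\n";  // e4b8ad
```

Because strlen() works on bytes, the length it reports for non-ASCII text differs from the number of characters a person would count.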
In PHP you can "request" and display any ASCII character. For instance:
<?php
print_r(chr(63));
This prints character 63, the question mark ?. You can print any ASCII character this way.
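The reverse lookup is also possible. As a short sketch, the built-in ord() function returns the code of a character, so it acts as the counterpart of chr(). The letters used here are arbitrary examples.

```php
<?php

// chr() turns a code into a character.
print_r(chr(65)); // A
print_r("\n");

// ord() turns a character back into its code.
print_r(ord('A')); // 65
print_r("\n");

// Codes are ordinary numbers, so arithmetic on them works as expected.
print_r(chr(ord('a') + 1)); // b
print_r("\n");
```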
Use the ASCII code table. In this table, we're interested in the decimal code (dec or decimal), which is used to encode characters.
Using the example above and the table you found, display ~^%.
(Of course, you could "cheat" the tests and just write print_r('~^%'), but that would be no fun at all :)
Your exercise will be checked with these tests:
<?php // phpcs:ignore PSR1.Files.SideEffects

namespace HexletBasics\Strings\Encoding;

use PHPUnit\Framework\TestCase;

\HexletBasics\Functions\runScript();

class Test extends TestCase
{
    public function test()
    {
        $expected = "~^%";
        $this->expectOutputString($expected);
        require 'index.php';
    }
}