In this repository I’m gonna write down a roadmap with the topics, concepts, techniques, and technologies I intend to master over time. My intention is to create a strong mental model of how things correlate and fill all the gaps in my knowledge.
I aim to master the following topics:
The CPU (Central Processing Unit) fetches instructions from memory, decodes them to determine their type and operands, and executes them. That cycle repeats until the program finishes. Each CPU architecture has its own instruction set, so a CPU of one architecture cannot run code compiled for a different one. Accessing memory to fetch instructions or data is more expensive than executing an instruction. For that reason, the CPU has registers inside it to hold important values. The CPU has at least two modes: kernel mode and user mode. When running in kernel mode, the CPU can execute any instruction in its instruction set.
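The fetch-decode-execute cycle can be sketched with a toy interpreter. This is only an illustration: the instruction names (`LOAD`, `ADD`, `HALT`) and the registers `r0` and `r1` are made up for the example and do not correspond to any real instruction set.

```python
# Hypothetical toy program: each instruction is (opcode, operand, operand).
PROGRAM = [
    ("LOAD", "r0", 2),     # r0 = 2
    ("LOAD", "r1", 40),    # r1 = 40
    ("ADD", "r0", "r1"),   # r0 = r0 + r1
    ("HALT", None, None),  # stop the cycle
]


def run(program):
    registers = {"r0": 0, "r1": 0}  # fast storage "inside" the toy CPU
    pc = 0                          # program counter: index of the next instruction
    while True:
        instruction = program[pc]   # fetch
        opcode, a, b = instruction  # decode into type and operands
        pc += 1
        if opcode == "LOAD":        # execute
            registers[a] = b
        elif opcode == "ADD":
            registers[a] += registers[b]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


result = run(PROGRAM)
print(result)
```

A program written for this toy instruction set is meaningless to an interpreter that expects different opcodes, which mirrors why binaries are tied to an architecture.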
CPU clock
Memory is made up of a number of locations and each of them is uniquely identifiable and has the ability to store information. This identifier of each memory location is known as its address. The total number of identifiable memory locations is known as its address space.
My focus is on Unix-like operating systems, more specifically Linux, as they are usually open source and are fundamental building blocks in the server-side space.
The kernel is the most important part of a preemptive, time-sharing operating system. It is responsible for providing an execution environment for user applications and a service layer that mediates communication between user applications and hardware.
Kernel mode and User mode
The CPU has at least two modes: kernel mode and user mode. Kernel mode is a privileged mode in which the operating system can use any instruction of the CPU instruction set architecture (ISA). User applications run in user mode, so when they need to execute a privileged operation, they ask the kernel to execute it on their behalf. The mechanism used to accomplish this is the system call, which is the interface between user applications and the hardware capabilities exposed by the kernel, such as reading a file from disk. Hardware instructions allow switching from one mode to the other, and areas of virtual memory can be marked as part of user space or kernel space.
Preemption
Reentrancy
System Calls
System calls are services offered by the kernel to user applications. Even though they look like ordinary function calls, they are actually triggered by special assembly instructions. They are also called kernel entry points.
To trigger a system call, a special CPU instruction is used:
- `int` on x86
- `syscall` on modern x86-64
- `svc` on ARM

The program, typically through a library function (e.g. libc), triggers a special CPU instruction (like `syscall`), which causes the CPU to switch from user mode to kernel mode. The CPU then begins executing the kernel’s system call handler, which is provided by the operating system. This handler interprets the request made by the program, using parameters (such as the system call number and arguments) that were set up in registers or memory before the instruction was executed.
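As a small sketch of the "library function" step, the snippet below calls libc's `getpid` wrapper directly through `ctypes` and compares it with Python's own `os.getpid()`. It assumes a Unix-like system where `ctypes.CDLL(None)` exposes the C library symbols already loaded in the process; the trap into kernel mode itself happens inside the wrapper and is not visible from Python.

```python
import ctypes
import os

# Assumption: on a Unix-like system, CDLL(None) gives access to the symbols
# already loaded into this process, including the C library.
libc = ctypes.CDLL(None, use_errno=True)

# libc's getpid() is a thin wrapper around the getpid system call.
pid_via_libc = libc.getpid()

# os.getpid() reaches the same system call through the Python runtime.
pid_via_python = os.getpid()

print(pid_via_libc, pid_via_python)
```

Running the script under `strace` is one way to see which system calls are actually issued.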
The Unix-like filesystems are organized as a hierarchical tree. In Unix-like OSes, everything is a file.
A file is an unstructured stream of bytes. Another definition could be: a file is an information container structured as a sequence of bytes. This means that the operating system does not impose any structure on the file. Defining the file's structure is the responsibility of the application that manipulates it.
`open()`, `read()`, `write()`, and `close()` can be used for any type of file.

The VFS (Virtual File System) is a kernel software layer that handles all the system calls related to a standard Unix filesystem. It provides a common interface to several kinds of filesystems.
An i-node (index node) is a data structure that maintains some information about a file, such as:
File descriptors are (generally small) non-negative integers. A file descriptor is used to refer to all types of open files.
When a process is created, it inherits three file descriptors: standard input (`STDIN`), standard output (`STDOUT`), and standard error (`STDERR`).
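A minimal sketch of the file descriptor workflow, using Python's `os` module, whose `os.open`, `os.write`, `os.read`, and `os.close` functions are thin wrappers over the system calls of the same names. The file name `example.txt` and the temporary directory are arbitrary choices for the example.

```python
import os
import tempfile

# Arbitrary scratch location for the example.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# open() returns a file descriptor: a small non-negative integer.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"hello, file\n")
os.close(fd)

# The same descriptor-based calls are used to read the bytes back.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 1024)
os.close(fd)

# By convention, descriptors 0, 1, and 2 are stdin, stdout, and stderr,
# so a newly opened file usually gets the next free number.
print(fd, data)
```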
Files in Unix are protected by assigning each of them a 9-bit binary protection code. The protection code consists of three 3-bit fields: one for the owner, one for other members of the owner’s group, and one for everyone else. Each field has a bit for read access, a bit for write access, and a bit for execute access. These bits are known as the `rwx` bits.
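The three 3-bit fields can be decoded with a few shifts and masks. The helper name `describe_mode` is made up for this sketch; the standard library's `stat.filemode` does something similar for a full mode value.

```python
def describe_mode(mode):
    """Render the low 9 permission bits of `mode` as an rwx string."""
    chars = []
    for shift in (6, 3, 0):  # owner, group, everyone else
        field = (mode >> shift) & 0b111
        chars.append("r" if field & 0b100 else "-")
        chars.append("w" if field & 0b010 else "-")
        chars.append("x" if field & 0b001 else "-")
    return "".join(chars)


print(describe_mode(0o775))
print(describe_mode(0o644))
```

Octal notation is convenient here because each octal digit maps to exactly one 3-bit field.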
For example, suppose that I run the following command in the terminal and I get this result:
$ ls -li
6940089 -rwxrwxr-x 1 lucas lucas 15776 Mar 3 21:52 a.out
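The `a.out` file has the following protection code: `rwxrwxr-x` (the leading dash in the listing represents the type of file), therefore:
- the owner can read, write, and execute the file;
- members of the owner's group can read, write, and execute the file;
- everyone else can read and execute the file, but not write to it.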
`/proc` and `/sys` filesystems

A process is an instance of a program in execution. We can also say that a process is the basic unit by which the kernel allocates resources such as CPU time and memory.
Memory Layout of a Process
The memory layout of a process is divided into parts called segments:
IPC (inter-process communication) mechanisms allow processes to communicate with each other. Linux provides the following IPC mechanisms:
Sockets
Sockets are an IPC mechanism that allows data to be exchanged between processes, either on the same host or on different hosts connected by a network.
Socket Domains:
- `AF_UNIX` allows communication between applications on the same host.
- `AF_INET` allows communication between applications running on hosts connected via IPv4.
- `AF_INET6` allows communication between applications running on hosts connected via IPv6.

Domain | Communication performed | Communication between applications | Address format |
---|---|---|---|
AF_UNIX | within kernel | same host | sockaddr_un |
AF_INET | via IPv4 | hosts connected by IPv4 | sockaddr_in |
AF_INET6 | via IPv6 | hosts connected by IPv6 | sockaddr_in6 |
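A minimal sketch of `AF_UNIX` communication, assuming a Unix-like host. For brevity both ends of the socket live in the same process; in practice each end would typically belong to a different process (for example after a `fork`). The variable names are arbitrary.

```python
import socket

# socketpair() returns two already-connected sockets in the AF_UNIX domain.
parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

parent_end.sendall(b"ping")     # one side sends a request...
request = child_end.recv(1024)  # ...the other side receives it

child_end.sendall(b"pong")      # and answers on the same connection
reply = parent_end.recv(1024)

parent_end.close()
child_end.close()

print(request, reply)
```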
Socket Types
Error Codes
EACCES
EADDRINUSE
EAGAIN
ECONNREFUSED
ECONNRESET
ENOENT
ETIMEDOUT
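Error codes like the ones above surface in Python as the `errno` attribute of `OSError`. The sketch below provokes `ENOENT` by opening a file that does not exist; the file name `missing.txt` is arbitrary.

```python
import errno
import os
import tempfile

# A path inside a fresh, empty temporary directory cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")

try:
    open(missing)
except OSError as exc:
    code = exc.errno                # numeric error code set by the failed call
    name = errno.errorcode[code]    # symbolic name, e.g. "ENOENT"
    message = os.strerror(code)     # human-readable description

print(code, name, message)
```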
`/etc/passwd` file

- `ls`
- `cd`
- `rm`
- `chmod`
- `chown`
- `dpkg -i [package]`
- `which`
- `xargs` - build and execute command lines from the standard input
- `source`
- `ulimit`
- `awk`
- `strace`
- `printenv`
- `grep`
- `umask` - set file mode creation mask
- `lsof`
- `perf`
Network protocols are typically implemented in the kernel space for performance and security reasons.
Application Layer
HTTP - Hypertext Transfer Protocol
DNS - Domain Name System
SSL - Secure Sockets Layer and TLS - Transport Layer Security
SSH (Secure Shell)
Transport Layer
Internet Layer
IP (Internet Protocol)
The IP protocol implements two basic functions: addressing and fragmentation.
Private Address Space
The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets:
range start | range end | cidr prefix |
---|---|---|
10.0.0.0 | 10.255.255.255 | 10/8 prefix |
172.16.0.0 | 172.31.255.255 | 172.16/12 prefix |
192.168.0.0 | 192.168.255.255 | 192.168/16 prefix |
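The table can be checked with Python's `ipaddress` module. This is a small sketch: the helper name `in_private_block` is made up, and it tests only the three blocks listed above (the module's own `is_private` attribute covers additional reserved ranges).

```python
import ipaddress

# The three private blocks from the table, written in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_private_block(address):
    """Return True if `address` falls inside one of the three blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


# The range start and end follow from the prefix length.
for block in PRIVATE_BLOCKS:
    print(block, block[0], block[-1])

print(in_private_block("172.31.255.255"), in_private_block("172.32.0.1"))
```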
`/etc/hosts` file

Latency components
To learn more about databases I am going to use the Postgres documentation, which is very rich. You can go to https://www.postgresql.org/docs/ and select the version you wanna read.
To handle data consistency, Postgres leverages the MVCC model (Multiversion Concurrency Control). In that model, each SQL statement sees a snapshot of the database.
Transactions are a fundamental concept of all database systems. Given a set of operations executed within the context of a transaction, they must be executed atomically. This means that either all operations complete successfully or they are all rolled back. The intermediate states are not visible to concurrent transactions. Postgres treats every SQL statement as running within a transaction, so if you don’t issue a `BEGIN` command, each statement is implicitly wrapped in a `BEGIN` and, in case of success, a `COMMIT`.
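The all-or-nothing behaviour can be sketched as follows. Note the assumptions: this uses SQLite through Python's built-in `sqlite3` module as a stand-in for Postgres (so it runs without a server), and the `accounts` table and the `transfer` helper are hypothetical names invented for the example.

```python
import sqlite3

# Assumption: SQLite in memory stands in for Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to
        # dst made earlier in the same transaction was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits `bob` first and only then fails on `alice`, so the unchanged final balance of `bob` is what shows the rollback.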
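Isolation Levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Phenomena (anomalies): dirty read, nonrepeatable read, phantom read, and serialization anomaly.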
Also read: https://www.postgresql.org/docs/current/transaction-iso.html
Read about deferred, immediate, and exclusive transactions. Read about locks and the types of locks.
Common symbols across languages:
- `;` - semicolon
- `,` - comma
- `{}` - curly braces
- `[]` - square brackets
- `()` - parentheses (round brackets)
- `@` - at
- `.` - dot
- `/` - slash
- `-` - dash
Paradigms
Idiomatic Code
Ecosystem
Dependency Management
Actor Model
Concurrency and Parallelism
Distributed Programming
Regular Expressions
Language Server Protocol
Runtime Data Validation
Turing Completeness
Authentication (AuthN) - the process of verifying “who” a user or system is, typically via credentials like passwords, tokens, or biometrics.
Authorization (AuthZ) - the process of determining “what” an authenticated user or system is allowed to do, based on permissions or policies.
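The difference between the two steps can be sketched in a few lines. Everything here is hypothetical and for illustration only: the in-memory `TOKENS` and `PERMISSIONS` tables, the function names, and the use of HTTP-style status strings.

```python
# Hypothetical in-memory data, for illustration only.
TOKENS = {"token-abc": "alice", "token-xyz": "bob"}
PERMISSIONS = {"alice": {"read", "write"}, "bob": {"read"}}


def authenticate(token):
    """AuthN: map a credential to an identity, or None if it is unknown."""
    return TOKENS.get(token)


def authorize(user, action):
    """AuthZ: check whether an already-identified user may perform the action."""
    return action in PERMISSIONS.get(user, set())


def handle(token, action):
    user = authenticate(token)
    if user is None:
        return "401 unauthenticated"  # we do not know who this is
    if not authorize(user, action):
        return "403 forbidden"        # we know who it is, but it is not allowed
    return f"200 {user} may {action}"


print(handle("token-abc", "write"))
print(handle("token-xyz", "write"))
print(handle("bad-token", "read"))
```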
OAuth2 (RFC 6749) Authorization Framework
OAuth2 defines four roles: resource owner, resource server, client, and authorization server.
- `Ctrl + R` - in the terminal, searches the command history.
- `Ctrl + Alt + -` - in VS Code, goes back to the previous location.
- `Ctrl + D` - select some text and press the shortcut to also select the next identical occurrence. You can hold `Ctrl` and keep pressing `D` to select the following ones.
- `Ctrl + Shift + L` - works similarly to `Ctrl + D`, but it selects all occurrences at once instead of one per key press.
- `Ctrl + C` selection.

- `epoll`: a mechanism for obtaining notification of file I/O events
- `inotify`: a mechanism for monitoring changes in files and directories
- `kqueue`
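I/O event notification can be tried from Python through the `selectors` module, which picks the best mechanism available on the platform (typically `epoll` on Linux and `kqueue` on BSD and macOS). The sketch below watches the read end of a pipe; the pipe and the payload are arbitrary choices for the example.

```python
import os
import selectors

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
read_fd, write_fd = os.pipe()

# Ask to be notified when the read end of the pipe has data available.
sel.register(read_fd, selectors.EVENT_READ)

# Nothing has been written yet, so a non-blocking poll reports no events.
before = sel.select(timeout=0)

os.write(write_fd, b"hello")

# Now the selector reports the descriptor as readable.
events = sel.select(timeout=1)
key, mask = events[0]
data = os.read(key.fd, 1024)

sel.unregister(read_fd)
sel.close()
os.close(read_fd)
os.close(write_fd)

print(type(sel).__name__, before, data)
```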
- `set -e`
- `#!/usr/bin/env node`

(WIP) LLMs
In Introduction to Computer Systems, it says:
To be perfectly precise, it is not really the case that the computer differentiates the absolute absence of voltage (0) from the absolute presence of voltage (1). Actually, the electronic circuits differentiate voltages close to zero from voltages far from zero.