When a message is transmitted across a channel, in Shannon’s terminology, it is beneficial, given the scarcity of bandwidth, to compress the message into the shortest possible code. Compression is only feasible because natural language carries redundancy, which serves to clarify meaning and avoid misunderstanding.
Given a fixed context, agreed upon on both sides of the channel, this redundancy can be eliminated. The context implies a distribution over possible messages, from which an optimal encoding can be derived, as determined by Shannon’s theory of information. Schematically, in a military setting, the context in which much information technology originated, the two options “launch the missile” and “withdraw the troops” can be reduced to a single bit: 0 or 1.
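To make this concrete, here is a minimal sketch in Python (the two-message distribution and its probabilities are illustrative assumptions, not taken from any particular protocol): the Shannon entropy of the distribution implied by the context gives the lower bound on the average code length, and for two equally likely messages it is exactly one bit.

```python
import math

def entropy_bits(distribution):
    """Shannon entropy H(p) = -sum(p * log2(p)): the lower bound on the
    average code length, in bits, for messages drawn from `distribution`."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Hypothetical military context: two possible messages, agreed on by both sides.
context = {"launch the missile": 0.5, "withdraw the troops": 0.5}
print(entropy_bits(context))  # 1.0 -> a single bit (0 or 1) suffices

# A skewed context needs even fewer bits on average:
# the common message would receive the shorter code.
skewed = {"all quiet": 0.9, "launch the missile": 0.1}
print(entropy_bits(skewed))  # ~0.47 bits per message on average
```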
It is critical to note that this is context-dependent. Compression is conditional on who is receiving the message, a dependence that is reified in deterministic encoders and decoders, e.g. H.264 for video transmission.
Nonetheless, redundancy is not completely discarded. The channel along which the encoded message is sent is not necessarily reliable: a noisy channel may accidentally flip a bit of the transmitted message. In the formalized encoding, redundancy is therefore deliberately re-introduced to mitigate this problem. For example, an additional parity bit increases the probability that a corrupted code is detected, avoiding miscommunication.
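The sketch below illustrates the idea with a single even-parity bit (the message bits and the flipped position are arbitrary assumptions chosen for illustration): the receiver can detect any single bit flip, though it cannot correct it, nor detect an even number of flips.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(bits_with_parity):
    """True if the number of 1s, including the parity bit, is even."""
    return sum(bits_with_parity) % 2 == 0

message = [1, 0, 1, 1]         # encoded message
sent = add_parity(message)     # [1, 0, 1, 1, 1]

received = sent.copy()
received[2] ^= 1               # noisy channel flips one bit

print(parity_ok(sent))         # True  -> accepted
print(parity_ok(received))     # False -> corruption detected; request retransmission
```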