Data Compression
June 21, 2021
I took a 
course on 
rhetoric as a 
college freshman.  It was taught by a 
rotund humanities professor who had the unenviable task of teaching this subject to a group of 
scientists and 
engineers whose only goal was to complete a humanities requirement.  As you can imagine, not many 
textbooks were available for such a niche topic, and the thin volume we had was quite uninspiring.  I did, however, glean a few important principles from this course, one of which was the need to express an 
idea clearly, but in as few 
words as possible.
 
Rhetoric, the art of persuasion, is most commonly associated with the Greek orator, Demosthenes (Δημοσθένης, 384-322 BC).
Plutarch wrote that Demosthenes had a speech impediment as a boy, and it's thought that it was one in which he mispronounced r as an l.  This is a condition known as de-rhotacism, named after the Greek letter for r, ρ (rho).  This doesn't seem like that much of a problem, and I'm reminded of the way that news anchor, Tom Brokaw, 
pronounces his l.
It's written that Demosthenes cured himself of this speech impediment by shouting above the noise of ocean waves with pebbles in his mouth.
(Portion of Demosthenes on the Seashore, an 1859 painting by Eugène Delacroix (1798–1863), presently at the National Gallery of Ireland, via Wikimedia Commons)
Such brevity in 
speech is desirable, and this idea is applied 
technically in the various ways that 
language is 
coded into more 
compact forms.  Prior to the development of rapid and accurate 
speech-to-text software, 
symbolic writing methods known as 
shorthand were used to allow people to write as rapidly as others would speak.  One of my 
aunts learned shorthand in 
secretarial school.  A 
mechanical means of recording shorthand known as 
stenotyping can be seen in 
courtroom scenes in older 
movies and 
television shows.
 
It's all Greek to me.
An example of shorthand from page 153 the 1916 book, "Gregg Shorthand. A Light-Line Phonography for the Million", by John Robert Gregg.
The idiom, "It's all Greek to me," can be found in Shakespeare's, The Tragedy of Julius Caesar (1599), in which one of the characters, Casca, describes a speech made in Greek by Cicero, a Roman scholar of Greek literature.  Casca, who doesn't understand Greek, says that "... those that understood him smiled at one another and shook their heads; but, for mine own part, it was Greek to me."
(Portion of a Wikimedia Commons image.)
The first principal method of 
electrical signal compression was the 
telegraph Morse code, 
invented by 
portrait painter, 
Samuel F. B. Morse.[1-2]  Morse was 
inspired to invent the telegraph after the 
death of his 
wife while he was absent from his 
New Haven, Connecticut, 
home while 
painting a 
portrait of the 
Marquis de Lafayette in 
Washington, D.C..  Morse, who had received news via a 
horseback messenger that his wife was 
ill, immediately returned home, but his wife died before he arrived.  He reasoned that if he had received the news earlier, he would have been at her 
bedside, and at that point he devoted himself to the problem of 
telecommunication.
Morse was assisted in the invention of the telegraph by 
Alfred Vail,who also assisted in development of the code.  Since 
certain letters are used more often than others, Morse created his code to make the more common letters easier to transmit. Thus "e" (
frequency of occurrence 12.7%) is a single dot, a "t" (9.1%) is a single dash, while the less common "z" (0.074%) is dash-dash-dot-dot.  The 
Federal Communications Commission abandoned its Morse code requirement for 
amateur radio operator licensing in 2005, and the 
United States Coast Guard stopped monitoring Morse Code 
distress calls in 1993.  An interesting Morse code generating program was created by Frans van Dorsselaer in 1998, and I've included his 
C programming language source code here.
 
Binary tree of Morse code letters and numbers.  Dot-heavy letters are to the left, and dash-heavy letters are to the right.  In 1854, The US Supreme Court ruled (O'Reilly v. Morse) against applicability of the Morse Code apart from the telegraph implementation.  This ruling confirmed the idea that an abstract idea is not patentable, only its implementation.  (Created by the author using Inkscape, and also found at Wikimedia Commons.  Click for larger image.)
Another example of useful data compression is the 
barcode, which is an 
automated method for retrieving 
data from a 
lookup table.  The barcode was 
invented by 
Norman J. Woodland and 
Bernard Silver and 
patented in 1952.[3]  The available 
electronics in 1952 were primitive, and this invention was demonstrated using 
vacuum tubes and a non-
laser light source.  The patent was eventually sold to 
Philco for just $15,000.  Barcodes became useful only after the availability of inexpensive 
computers and laser light sources.  Advances in 
machine vision have made the similar 
QR code possible.
Early 
mainframe computers had limited 
data storage, so data compression became an important topic.  An early method of 
lossless data compression was 
Huffman coding, published in 1952 by 
computer scientist, 
David A. Huffman (1925-1999), when he was a 
doctoral student at 
MIT.[4]  It works much like Morse code by creating a table of symbols for data items that's generated based on their frequency of occurrence.  Instead of the dots and dashes of Morse code, 
binary ones and zeros are used.
At the advent of 
ubiquitous computing and the 
Internet, text was easy to store and send, but 
images were a problem.  The saying that 
a picture is worth a thousand words was replaced by the need to represent even simple images by tens of thousands of 
word
words of computer data.  The solution was data compression using the 
Graphics Interchange Format (GIF).  GIF was based on 
Lempel–Ziv–Welch (LZW) data compression, pioneered by 
Jacob Ziv (b. 1931) and 
Abraham Lempel (b. 1936) in 1977-78, and improved by 
Terry Welch (1939-1988) in 1983.[5-6]  This 
algorithm was deemed significant enough to achieve 
IEEE Milestone status in 2004, an achievement reserved for such important things as the 
invention of the transistor.
The difference between Huffman coding and LZW is that LZW looks for the frequency of occurrence of repetitive 
sequences of data bytes, rather than individual data bytes, and uses these as table entries.  For this reason it is especially effective in image encoding in which such repetition is common, such as for a uniform white background.  The LZW patent (US Patent no. 4,558,302) was filed on June 20, 1983, and it was issued on December 10, 1965.[6]  The importance of this algorithm was such that the patent was 
licensed to more than a hundred companies, the most important of which was the license to 
CompuServe in connection with 
GIF image encoding in 1993.  
Unisys, the patent owner, did not require 
licensing, or payment of fees, for non-commercial GIF usage, including GIFs on the Internet.
 
Some computer nostalgia.  A snippet of FORTRAN source code, fig. 6 of US Patent No. 4,558,302, "High speed data compression and decompression apparatus and method," by Terry A. Welch, December 10, 1985.[6]  (Via Google Patents.)
The 
Institute of Electrical and Electronics Engineers (IEEE) recently announced that Jacob Ziv, the sole member of the LZW team who was trained as an 
electrical engineer, has been awarded the 
2021 IEEE Medal of Honor.[7]  This award, first given in 1917, has been given to such luminaries as 
Edwin Armstrong (1890-1954), the inventor of 
Frequency modulation (1917), and 
Claude Elwood Shannon (1916-2001), the father of 
information theory," (1966).
 
Jacob Ziv in 2015.
Ziv experimented with electronic circuits as a boy, and his education was in electrical engineering.  He was awarded the BSc (1954), Dip-Eng (1955), and MSc (1957), each in electrical engineering from Technion (Israel Institute of Technolog).  His 1962 Sc.D from MIT was for research in message transmission in a noisy channel.[7]
Ziv is presently Technion Distinguished Professor Emeritus in its Faculty of Electrical Engineering,[7]
(Portion of a Wikimedia Commons photograph by Mark Neyman, image courtesy of the Spokesperson unit of the President of Israel.)
As with other awards, the IEEE Medal of Honor isn't given for just a single achievement, but for a life's work in a particular field.  Beyond LZW, Ziv made other compression innovations.  The 
citation for the IEEE medal is for his "... fundamental contributions to information theory and data compression technology, and for distinguished research leadership."[7]  He is also the recipient of another IEEE awards, the 
Claude E. Shannon Award of the 
IEEE Information Theory Society.[7]  Ziv, who is an IEEE 
fellow, has 
published about 100 
peer-reviewed papers, and he is a member of the 
U.S. National Academy of Engineering, and the 
U.S. National Academy of Sciences.[7]
One complaint that Ziv made of his electrical engineering education was its primary focus on 
power systems and not electronics.  As a consequence, he and his 
peers who were working to convert 
telemetry systems from 
vacuum tubes to 
transistors needed to teach themselves electronics.[7]  He also had a complaint about computing early in his career.  He says that he hated these punch card systems, and that's why he didn’t go into 
computer science.[7]  That's also the reason why I never took a computer programming course during my own college days.  I can still envision all those poor students 
trudging around 
campus with huge boxes of punch cards under their arms.
Ziv met Abraham Lempel as a fellow faculty member at Technion.  Ziv says he and Lempel were the 
perfect match to tackle lossless data compression because of their complementary knowledge.[7]  Ziv and Lempel were on simultaneous 
sabbaticals in the US, Ziv at 
Bell Labs, and Lempel at 
Sperry Rand, when their compression work was completed[7]  Bell decided not to patent their algorithm, since 
software patents were not possible at the time, but Sperry decided to patent a 
hardware implementation.[7]  Sperry subsequently patented Terry Welch's LZW algorithm.[6]
References:
-   Samuel F. B. Morse, "Improvement in the Mode Of Communicating Information by Signals by by Application of Electromagnetism," US Patent No. 1,647, June 20, 1840.
 -   Samuel F. B. Morse, "Improvement in Electro-Magnetic Telegraphs," US Reissue Patent No. RE117, June 13, 1848.
 -   Norman J. Woodland and Bernard Silver, "Classifying Apparatus And Method," US Patent No. 2,612,994, October 7, 1952.
 -   D. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, no. 9 (September 9, 1952), pp. 1098-1101, doi:10.1109/JRPROC.1952.273898.
 -   J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Transactions on Information Theory, vol. 23, no. 3 (May 1977), pp. 337-343, doi: 10.1109/TIT.1977.1055714.
 -   Terry A. Welch, "High speed data compression and decompression apparatus and method," US Patent No. 4,558,302, December 10, 1985.
 -   Tekla S. Perry, "From WinZips to Cat GIFs, Jacob Ziv's Algorithms Have Powered Decades of Compression," IEEE Spectrum, April 21, 2021.