Oil 101

Chapter 4

Chemistry of Oil

Petroleum chemistry explained: paraffins, naphthenes, aromatics, olefins, and the molecular structures that determine fuel quality.

The Building Blocks: Carbon and Hydrogen

Crude oil chemistry underpins refinery processes, explains why different crudes and products trade at different values, and determines how petrochemicals are made. This chapter assumes no chemistry background. A molecule is two or more atoms held together by chemical bonds. Water, for example, contains two hydrogen (H) atoms and one oxygen (O) atom, hence the familiar formula H2O.

Crude oil typically consists of 84 to 87 percent carbon, 11 to 14 percent hydrogen, 0 to 6 percent sulfur, and less than 1 percent nitrogen, oxygen, metals, and salts by weight. Carbon and hydrogen combine to form hydrocarbons, the molecules that make oil so valuable. A barrel of crude can contain thousands of distinct hydrocarbons arranged in many different ways. The reason oil matters economically is simple: hydrocarbons release a very large amount of energy when combined with oxygen during combustion.
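The weight percentages above can be turned into an atomic hydrogen-to-carbon ratio, which is a convenient single number for how hydrogen-rich a crude is. The sketch below is illustrative only: the 85 and 13 percent inputs are a hypothetical crude picked from inside the quoted ranges, and the function name is an assumption rather than an industry standard.

```python
# Standard atomic masses in grams per mole.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def hydrogen_to_carbon_ratio(carbon_wt_pct: float, hydrogen_wt_pct: float) -> float:
    """Convert weight percentages of carbon and hydrogen into an atomic H/C ratio."""
    carbon_moles = carbon_wt_pct / ATOMIC_MASS["C"]
    hydrogen_moles = hydrogen_wt_pct / ATOMIC_MASS["H"]
    return hydrogen_moles / carbon_moles


# Hypothetical crude: 85 percent carbon and 13 percent hydrogen by weight.
ratio = hydrogen_to_carbon_ratio(85.0, 13.0)
print(f"Atomic H/C ratio: {ratio:.2f}")
```

For this hypothetical crude the ratio comes out at a little over 1.8 hydrogen atoms per carbon. For comparison, methane (CH4) has a ratio of 4 and benzene (C6H6) a ratio of 1.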

Figure 4-1: A two-dimensional gas chromatograph (GCxGC) connected to a quadrupole time-of-flight mass detector. Instruments like this separate a crude sample into thousands of individual hydrocarbon peaks, producing the molecular fingerprint that underpins every assay. (Source: Sarka Na kopci / Wikimedia Commons (CC BY-SA 4.0))

Valency and bonds

Each carbon atom has a valency of four: it wants to form four bonds with neighbouring atoms. Each hydrogen atom has a valency of one. When a carbon atom cannot find enough hydrogen partners, two carbons will share a double bond (or, rarely, a triple bond) to satisfy that need for four connections. Single bonds are stable. Double and triple bonds hold the two carbons together more tightly overall, but the extra bonds in them open up easily when other molecules attack, so they are more reactive. They are the defining feature of the unsaturated molecules discussed below.

Carbon Count, Boiling Point, and Physical State

Refining separates crude into products based on carbon count, because molecules with fewer carbons boil at lower temperatures. At normal atmospheric pressure and room temperature, hydrocarbons with 1 to 4 carbons are typically gases, those with 5 to 24 carbons are liquids, and those with 25 or more carbons are solids. A heavy crude contains a higher fraction of long-carbon molecules; a light crude is skewed toward short ones.
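The carbon-count rule of thumb in the paragraph above can be written as a small lookup. This is a minimal sketch of the chapter's thresholds only; the function name is hypothetical, and real molecules deviate from the rule depending on branching and ring structure.

```python
def physical_state(carbon_count: int) -> str:
    """Typical state at room temperature and atmospheric pressure,
    using the rule-of-thumb thresholds quoted in the text."""
    if carbon_count < 1:
        raise ValueError("carbon_count must be at least 1")
    if carbon_count <= 4:
        return "gas"
    if carbon_count <= 24:
        return "liquid"
    return "solid"


for carbons in (3, 8, 30):
    print(carbons, physical_state(carbons))
```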

Table 4-1: Crude oil fractions by carbon count

Fraction | Carbons | State | Boiling range | Primary uses
Petroleum gases (methane, ethane, propane, butane) | 1 to 4 | Gas | Below 20 °C | Heating, power, LPG, petrochemical feedstock
Light ends (naphtha, gasoline) | 5 to 11 | Liquid | 70 to 200 °C | Gasoline, solvents, petrochemicals
Middle distillates (kerosene, gas oil) | 11 to 18 | Liquid | 200 to 300 °C | Jet fuel, diesel, heating oil
Heavy gas oil, lube base | 18 to 25 | Liquid | 300 to 400 °C | Lubricants, feed for cracking
Residual fuel, waxes | 20 to 35 | Liquid or solid | 350 to 500 °C | Bunker fuel, candles, paraffin wax
Bitumen, coke | 35 and up | Solid | 500 °C and up | Road paving, roofing, steelmaking fuel

IUPAC naming primer. Modern chemistry names alkanes by a carbon-count prefix plus the suffix “-ane”: meth- (1), eth- (2), prop- (3), but- (4), pent- (5), hex- (6), hept- (7), oct- (8), non- (9), dec- (10). Swap “-ane” for “-ene” to get the corresponding alkene (one double bond), or “-yne” for an alkyne (triple bond). The oil industry still uses older names like paraffin (alkane), olefin (alkene), diolefin (diene), ethylene (ethene), propylene (propene), and butylene (butene). Both systems appear throughout this book.
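The prefix-plus-suffix rule in the primer is mechanical enough to sketch in a few lines. The example below covers only the ten prefixes listed and ignores position numbers and branching, which full IUPAC names require; the function and dictionary names are illustrative assumptions.

```python
# Prefixes and suffixes exactly as listed in the naming primer.
PREFIXES = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent",
            6: "hex", 7: "hept", 8: "oct", 9: "non", 10: "dec"}
SUFFIXES = {"alkane": "ane", "alkene": "ene", "alkyne": "yne"}

# Older industry names mentioned in the primer.
INDUSTRY_NAMES = {"ethene": "ethylene", "propene": "propylene", "butene": "butylene"}


def iupac_name(carbon_count: int, family: str = "alkane") -> str:
    """Build a simple systematic name from a carbon count and a family."""
    if carbon_count not in PREFIXES:
        raise ValueError("only 1 to 10 carbons are covered in this sketch")
    if family not in SUFFIXES:
        raise ValueError("family must be 'alkane', 'alkene' or 'alkyne'")
    if family != "alkane" and carbon_count < 2:
        raise ValueError("a double or triple bond needs at least two carbons")
    return PREFIXES[carbon_count] + SUFFIXES[family]


def industry_name(systematic: str) -> str:
    """Return the older industry name when the primer lists one."""
    return INDUSTRY_NAMES.get(systematic, systematic)


print(iupac_name(8))                            # octane
print(industry_name(iupac_name(2, "alkene")))   # ethylene
```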

The Four Molecular Structures: PONA

Despite thousands of distinct hydrocarbons in crude, each belongs to one of only four structural families. Traders characterise a crude by its PONA ratio: Paraffinic, Olefinic, Naphthenic, Aromatic. A refinery that knows the PONA mix of its feedstock has a good idea of what its product slate will look like before a single molecule touches a distillation tower.

Paraffins (alkanes)

Paraffins follow the general formula CnH2n+2. All bonds are single, so paraffins are saturated and chemically stable. They come in two shapes: straight-chain (called “normal”, written with an n- prefix) and branched (called isomers, written with an iso- or i- prefix). A normal and an iso molecule can share the same chemical formula yet behave very differently, because the arrangement of atoms changes their physical properties.

The canonical example is octane (C8H18). Normal-octane is a straight chain of 8 carbons and has a research octane number below zero (roughly minus 20 on the extrapolated scale), meaning it knocks badly in a spark-ignition engine. Iso-octane (technically 2,2,4-trimethylpentane) is a branched isomer with the same formula, but it resists knock so well that it was chosen as the reference point at octane 100. Branched paraffins dominate high-octane gasoline pools for exactly this reason. Very long-chain paraffins form petroleum waxes at room temperature.
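The general formula CnH2n+2 fixes both the hydrogen count and the molar mass of any paraffin once the carbon count is known. The following sketch (function names are illustrative) also makes the isomer point concrete: n-octane and iso-octane produce the same output, so the formula alone cannot tell them apart.

```python
# Standard atomic masses in grams per mole.
CARBON_MASS = 12.011
HYDROGEN_MASS = 1.008


def paraffin_formula(carbon_count: int) -> str:
    """Molecular formula of the paraffin CnH2n+2, for example 'C8H18'."""
    if carbon_count < 1:
        raise ValueError("carbon_count must be at least 1")
    hydrogens = 2 * carbon_count + 2
    carbon_part = "C" if carbon_count == 1 else f"C{carbon_count}"
    return f"{carbon_part}H{hydrogens}"


def paraffin_molar_mass(carbon_count: int) -> float:
    """Molar mass in grams per mole of the paraffin with the given carbon count."""
    hydrogens = 2 * carbon_count + 2
    return carbon_count * CARBON_MASS + hydrogens * HYDROGEN_MASS


for n in (1, 4, 8):
    print(paraffin_formula(n), round(paraffin_molar_mass(n), 2))
```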

Figure 4-2: Methane (CH4): the simplest hydrocarbon, one carbon bonded to four hydrogens. The primary component of natural gas. (Source: Wikimedia Commons)
Figure 4-3: Ethane (C2H6): two carbons, six hydrogens, all single bonds. A saturated paraffin and a key petrochemical feedstock for ethylene. (Source: Wikimedia Commons)
Figure 4-4: Propane (C3H8): three carbons in a straight chain. Compresses easily to a liquid; together with butane, one of the two main components of LPG. (Source: Wikimedia Commons)
Figure 4-5: n-Butane (C4H10): four carbons in a straight chain. Branching the same four carbons produces iso-butane, an isomer with a different boiling point and reactivity. (Source: Wikimedia Commons)
Figure 4-6: Iso-octane (2,2,4-trimethylpentane, C8H18): the branched isomer chosen as the octane number 100 reference. Same formula as n-octane, very different anti-knock behaviour. (Source: Wikimedia Commons)

Naphthenes (cycloalkanes)

Naphthenes are saturated rings of carbon. A single-ring naphthene follows the formula CnH2n. The workhorse example is cyclohexane, a six-carbon ring that is abundant in most crudes and an important precursor to nylon. Because naphthenes are saturated, they behave much like paraffins: chemically stable, good blending components, and easy for refinery units to handle. Paraffinic and naphthenic crudes are often lumped together as “paraffinic” by traders, against the more reactive aromatic crudes.

Figure 4-7: Cyclohexane (C6H12): a saturated six-carbon ring. A common naphthene in crude and a building block for nylon. (Source: Wikimedia Commons)

Aromatics

Aromatic hydrocarbons are ring structures built around at least one benzene ring: six carbons with three alternating double bonds. Because of those double bonds, aromatics are unsaturated and more reactive than paraffins or naphthenes. That reactivity is exactly what makes them valuable as petrochemical feedstocks; it is also why benzene, a known carcinogen, is tightly capped in gasoline. The key aromatics are benzene, toluene, and the three xylene isomers, grouped together as BTX.

Molecules with two or more fused benzene rings are called Polycyclic Aromatic Hydrocarbons (PAH). Naphthalene, with two fused rings, is the simplest. At the extreme heavy end sit asphaltenes: very large PAH molecules, often with more than 70 carbon atoms, heavy branches, and heteroatoms. Asphaltenes absorb visible light, which is why crude oil is black, and they dominate the residue fraction of heavy sour crudes. They can also clog pipelines in cold weather and produce undesirable shot coke in a coker.

Figure 4-8: Benzene (C6H6): six carbons in a ring with three alternating double bonds. The parent structure of every aromatic hydrocarbon. (Source: Wikimedia Commons (public domain))
Figure 4-9: Naphthalene (C10H8): two fused benzene rings. The simplest polycyclic aromatic hydrocarbon. Asphaltenes, the heaviest fraction of crude, are far larger cousins of this structure. (Source: Wikimedia Commons)

Olefins (alkenes)

Olefins are aliphatic hydrocarbons with at least one carbon-carbon double bond: one in mono-olefins and two in diolefins (dienes). Alkynes, which carry a triple bond, are usually grouped with them as unsaturated aliphatics. Olefins are rarely present in crude oil straight out of the reservoir because they are too reactive to survive geological time. Almost every olefin in the oil system was manufactured at a refinery, typically by thermal or catalytic cracking of paraffins. Ethylene and propylene, the two most important olefins, are the foundation of the global petrochemical and plastics industry.

Figure 4-10: Ethylene (ethene, C2H4): two carbons joined by a double bond. The world's most-produced organic chemical by weight and the starting point for polyethylene. (Source: Wikimedia Commons)
Figure 4-11: Propylene (propene, C3H6): three carbons with one double bond. Feedstock for polypropylene and a long list of other polymers. (Source: Wikimedia Commons)

Typical PONA by Crude Grade

PONA mix shifts systematically with crude quality. A light sweet grade like WTI or Bonny Light is paraffin-rich, which is why it yields so much gasoline-range material with relatively little cracking. A medium sour grade like Arab Light sits in the middle of the spectrum. A heavy sour grade like Maya or Arab Heavy is aromatic-rich and resin-laden, with a long tail of asphaltenes and very little natural naphtha. The table below shows indicative PONA-plus-asphaltene ranges by crude class.

Table 4-2: Indicative PONA composition by crude grade (percent by weight of whole crude)

Structure | Light sweet | Medium | Heavy sour
Paraffins | 45 to 60 | 30 to 45 | 15 to 25
Naphthenes | 25 to 35 | 25 to 35 | 20 to 30
Aromatics | 10 to 20 | 20 to 30 | 30 to 45
Resins and asphaltenes | Below 2 | 3 to 8 | 10 to 20
Olefins (in reservoir) | Trace | Trace | Trace

Ranges vary by field and by how an assay laboratory draws the boundary between heavy aromatics and resins, so treat these as rough guides rather than hard numbers. The shape of the story, however, is robust: paraffin fraction falls and aromatic plus asphaltene fraction rises as you move from light sweet to heavy sour.
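As a rough illustration of how the indicative ranges in Table 4-2 might be used, the sketch below counts how many of a crude's structure percentages fall inside each class's ranges and returns the best match. The scoring rule, the function name, and the sample assay are all hypothetical; real assay work relies on far more than four numbers, and the sweet or sour label depends on sulfur, which PONA does not capture.

```python
# Indicative ranges from Table 4-2, percent by weight of whole crude.
# "Below 2" is encoded as (0, 2).
PONA_RANGES = {
    "light sweet": {"paraffins": (45, 60), "naphthenes": (25, 35),
                    "aromatics": (10, 20), "resins_asphaltenes": (0, 2)},
    "medium": {"paraffins": (30, 45), "naphthenes": (25, 35),
               "aromatics": (20, 30), "resins_asphaltenes": (3, 8)},
    "heavy sour": {"paraffins": (15, 25), "naphthenes": (20, 30),
                   "aromatics": (30, 45), "resins_asphaltenes": (10, 20)},
}


def closest_class(assay: dict) -> str:
    """Return the crude class whose ranges contain the most assay values.

    Ties go to whichever class is listed first in PONA_RANGES.
    """
    def score(ranges: dict) -> int:
        return sum(low <= assay[name] <= high for name, (low, high) in ranges.items())

    return max(PONA_RANGES, key=lambda grade: score(PONA_RANGES[grade]))


# Hypothetical assay, invented for illustration only.
sample = {"paraffins": 20, "naphthenes": 25, "aromatics": 40, "resins_asphaltenes": 15}
print(closest_class(sample))
```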

Saturated vs Unsaturated

A molecule is saturated when every carbon is bonded to four other atoms through single bonds, and unsaturated when one or more double or triple bonds are present. Paraffins and naphthenes are saturated and therefore stable; aromatics and olefins are unsaturated and therefore reactive. Fuels prize stability (you do not want gasoline polymerising in the tank), while petrochemical feedstocks prize reactivity (that is how you build polymers in the first place). The entire economic logic of a refinery, splitting molecules into a stable fuel pool and a reactive chemical pool, follows from this single distinction.
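One way to see how far a hydrocarbon sits from a fully hydrogenated open chain is the hydrogen deficiency index, a standard textbook quantity that is not part of this chapter's own vocabulary. It counts rings and extra bonds together, so a naphthene such as cyclohexane scores 1 even though it is saturated in the sense used here. The sketch below is therefore a screening aid under that caveat, not a replacement for knowing the structure.

```python
def hydrogen_deficiency(carbons: int, hydrogens: int) -> int:
    """Rings plus extra carbon-carbon bonds for a pure hydrocarbon CcHh.

    An open-chain paraffin has 2c + 2 hydrogens; every ring or double bond
    removes two hydrogens, and a triple bond removes four.
    """
    missing = 2 * carbons + 2 - hydrogens
    if missing < 0 or missing % 2 != 0:
        raise ValueError("not a valid hydrocarbon formula")
    return missing // 2


examples = {
    "n-octane (paraffin)": (8, 18),
    "cyclohexane (naphthene)": (6, 12),
    "ethylene (olefin)": (2, 4),
    "benzene (aromatic)": (6, 6),
}
for name, (c, h) in examples.items():
    print(name, hydrogen_deficiency(c, h))
```

Paraffins score 0. Cyclohexane and ethylene both score 1, one for its ring and the other for its double bond. Benzene scores 4, which is one ring plus three double bonds.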

Heteroatoms: Sulfur, Nitrogen, Oxygen, Metals

Not every atom in crude is carbon or hydrogen. The other elements, collectively called heteroatoms, are usually impurities but they drive a surprising amount of refinery economics.

Sulfur appears as hydrogen sulfide (H2S), as mercaptans (thiols), and bound inside thiophene rings. H2S is acutely toxic, corrodes steel, and carries the rotten-egg odour associated with sour crude. Mercaptans smell just as foul, which is why they are the compounds added as odorant to household natural gas. Thiophenes are the hardest to remove because the sulfur is locked inside an aromatic ring.

Nitrogen shows up in pyridines and pyrroles and poisons the catalysts used in cracking and reforming, which is why high-nitrogen crudes carry a discount.

Oxygen appears mainly as naphthenic acids (measured by Total Acid Number, or TAN) and phenols; high-TAN crudes can corrode carbon-steel piping above 220 °C.

Metals, chiefly vanadium and nickel, sit at the centre of porphyrin rings and end up in the residue. Even trace amounts poison FCC catalysts, so heavy sour crudes with high metals are restricted to refineries with cokers that can reject metals into petroleum coke.

Table 4-3: Typical heteroatom ranges in whole crude

Element | Common form | Typical range | Why it matters
Sulfur | H2S, mercaptans, thiophenes | 0.05 to 5 percent by weight | SOx emissions, corrosion, catalyst poisoning
Nitrogen | Pyridines, pyrroles | 0.05 to 0.8 percent by weight | NOx precursor, poisons cracking catalysts
Oxygen | Naphthenic acids, phenols | 0.05 to 1.5 percent by weight | Corrosion above 220 °C (high-TAN crudes)
Vanadium | Vanadyl porphyrins | 1 to 1,200 ppm | Poisons FCC catalyst, ends up in coke
Nickel | Nickel porphyrins | 1 to 150 ppm | Same as vanadium, less severe
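Table 4-3 mixes percent and ppm, so a quick conversion helps put the metals on the same footing as sulfur. The sketch below assumes the ppm figures are by weight, which the table does not state explicitly; the function names are illustrative.

```python
def ppm_to_weight_percent(ppm: float) -> float:
    """Convert parts per million by weight into percent by weight."""
    return ppm / 10_000


def kilograms_per_tonne(ppm: float) -> float:
    """Mass of the element, in kilograms, carried by one tonne of crude."""
    return ppm / 1_000


# Upper end of the vanadium range quoted in Table 4-3.
vanadium_ppm = 1_200
print(ppm_to_weight_percent(vanadium_ppm), "percent by weight")
print(kilograms_per_tonne(vanadium_ppm), "kg of vanadium per tonne of crude")
```

Even at the top of its range, vanadium amounts to about 0.12 percent by weight, far below the upper end of the sulfur range, which underlines how small a quantity is enough to matter for catalyst life.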

Combustion Chemistry

The reason hydrocarbons matter is that their oxidation reaction is strongly exothermic. For methane, the simplest case:

CH4 + 2 O2 → CO2 + 2 H2O + heat

For iso-octane, the reference fuel for gasoline, the balanced reaction is:

C8H18 + 12.5 O2 → 8 CO2 + 9 H2O + heat

From that equation you can derive the stoichiometric air-fuel ratio: the mass of air that exactly consumes a unit mass of fuel with nothing left over. For pure iso-octane that ratio is roughly 15.1 to 1 by weight; for typical pump gasoline, which carries less hydrogen per carbon, it is roughly 14.7 to 1, the famous number engine control units target in closed-loop operation. Real combustion is messier. If the reaction runs cool, or oxygen is short, some carbon exits as carbon monoxide (CO), unburnt hydrocarbons, or soot. Air is mostly nitrogen, so at the peak flame temperatures inside an engine cylinder some of that nitrogen oxidises to NOx. Any sulfur in the fuel oxidises to SOx. NOx and SOx are the two pollutants that drive most downstream environmental regulation. Chapter 15 (Environmental) picks up their story.
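The balanced equations and the air-fuel ratio can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming complete combustion of a pure hydrocarbon CxHy and dry air containing about 23.2 percent oxygen by mass; the function names are illustrative. Working the numbers for pure iso-octane gives about 15.1 to 1, slightly above the 14.7 to 1 usually quoted for pump gasoline, which is a blend with less hydrogen per carbon than pure iso-octane.

```python
# Standard atomic masses in grams per mole.
CARBON_MASS = 12.011
HYDROGEN_MASS = 1.008
OXYGEN_MASS = 15.999

# Approximate mass fraction of oxygen in dry air (assumption stated above).
OXYGEN_MASS_FRACTION_IN_AIR = 0.232


def oxygen_moles_per_mole_fuel(carbons: int, hydrogens: int) -> float:
    """O2 coefficient in CxHy + (x + y/4) O2 -> x CO2 + (y/2) H2O."""
    return carbons + hydrogens / 4


def stoichiometric_air_fuel_ratio(carbons: int, hydrogens: int) -> float:
    """Mass of air needed to burn a unit mass of fuel completely."""
    fuel_mass = carbons * CARBON_MASS + hydrogens * HYDROGEN_MASS
    oxygen_mass = oxygen_moles_per_mole_fuel(carbons, hydrogens) * 2 * OXYGEN_MASS
    air_mass = oxygen_mass / OXYGEN_MASS_FRACTION_IN_AIR
    return air_mass / fuel_mass


print(oxygen_moles_per_mole_fuel(1, 4))    # methane: 2.0
print(oxygen_moles_per_mole_fuel(8, 18))   # iso-octane: 12.5
print(round(stoichiometric_air_fuel_ratio(8, 18), 1))
```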

Octane and Cetane: Measuring Fuel Quality

Spark-ignition engines want a fuel that resists auto-ignition (knock). The octane scale was built by defining n-heptane as 0 and iso-octane as 100, then rating any other fuel against a matching blend of the two. Some molecules beat iso-octane outright and score above 100. Compression-ignition (diesel) engines want the opposite: a fuel that auto-ignites quickly under compression, which is measured by the cetane number. Here n-cetane (n-hexadecane) is defined as 100 and alpha-methylnaphthalene as 0.

Table 4-4: Indicative octane and cetane numbers for reference molecules

MoleculeResearch octaneCetane number
n-Heptane | 0 (reference) | 56
Iso-octane (2,2,4-trimethylpentane) | 100 (reference) | 15
Toluene | 111 | Low (not a diesel fuel)
Benzene | 100 | Low
Methanol | 107 | Very low
Ethanol | 108 | Very low
n-Cetane (n-hexadecane) | Very low | 100 (reference)
alpha-Methylnaphthalene | High | 0 (reference)
Typical finished gasoline | 87 to 95 | Not applicable
Typical finished diesel | Not applicable | 40 to 55

Figure 4-12: Research Octane Number (RON) of Selected Fuels

The octane scale is defined by two reference fuels: n-heptane (RON 0) and iso-octane (RON 100). US pump ratings use the Anti-Knock Index (AKI), which averages RON and MON and is typically 4 to 6 points below RON. Source: ASTM D2699.
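The relationship between RON, MON, and the US pump rating is a simple average, and writing it out shows why AKI sits a few points below RON. The RON and MON values in this sketch are hypothetical, chosen only so that the gap between them is 10 points; the function name is illustrative.

```python
def anti_knock_index(ron: float, mon: float) -> float:
    """Anti-Knock Index: the average of research and motor octane numbers."""
    return (ron + mon) / 2


# Hypothetical gasoline with a 10-point gap between RON and MON.
ron, mon = 95.0, 85.0
aki = anti_knock_index(ron, mon)
print(f"AKI {aki:.0f}, which is {ron - aki:.0f} points below RON")
```

Because AKI is the midpoint, it always sits below RON by half the RON-MON gap. A gap of 8 to 12 points therefore puts AKI 4 to 6 points under RON, consistent with the range quoted above.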

Notice the pattern: aromatics (toluene, benzene) and branched paraffins (iso-octane) score well on octane but poorly on cetane. Straight-chain paraffins (n-heptane, n-cetane) do the opposite. A molecule that is good for a gasoline engine is almost by definition bad for a diesel engine, which is why refineries cannot simply pour one finished stream into both pools.

Cracking and Combining

Basic distillation leaves a refinery at the mercy of whatever carbon-count distribution nature handed it. The real power of modern refining lies in using heat, pressure, and catalysts to rearrange molecules. Thermal cracking heats long paraffins until carbon-carbon bonds snap, producing shorter paraffins plus olefins. In shorthand:

C16H34 → C8H18 + C8H16

A sixteen-carbon paraffin becomes an eight-carbon paraffin (octane) plus an eight-carbon olefin (octene). Catalytic cracking (the FCC unit) does something similar at lower temperatures using a zeolite catalyst that favours branched, higher-octane products. Alkylation runs the process in reverse: isobutane plus a small olefin combine across an acid catalyst to yield a branched octane-range paraffin (alkylate) that is prized in gasoline blending.
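The cracking shorthand can be checked by counting atoms. A paraffin with n carbons has 2n + 2 hydrogens, but two shorter paraffins would need 2n + 4 between them, so one of the fragments has to be an olefin. The sketch below splits a paraffin at a chosen point and confirms the balance; it is an idealised single-step split with illustrative function names, whereas real cracking produces a spread of fragments.

```python
def crack_paraffin(parent_carbons: int, paraffin_carbons: int):
    """Split CnH2n+2 into a shorter paraffin and a mono-olefin.

    Returns two (carbons, hydrogens) tuples: (paraffin, olefin).
    """
    olefin_carbons = parent_carbons - paraffin_carbons
    if paraffin_carbons < 1 or olefin_carbons < 2:
        raise ValueError("need at least 1 paraffin carbon and 2 olefin carbons")

    parent = (parent_carbons, 2 * parent_carbons + 2)
    paraffin = (paraffin_carbons, 2 * paraffin_carbons + 2)  # CnH2n+2
    olefin = (olefin_carbons, 2 * olefin_carbons)            # CnH2n

    # Atom balance: carbons and hydrogens must match on both sides.
    assert parent[0] == paraffin[0] + olefin[0]
    assert parent[1] == paraffin[1] + olefin[1]
    return paraffin, olefin


def as_formula(molecule) -> str:
    """Format a (carbons, hydrogens) tuple as a formula string."""
    carbons, hydrogens = molecule
    return f"C{carbons}H{hydrogens}"


paraffin, olefin = crack_paraffin(16, 8)
print(f"C16H34 -> {as_formula(paraffin)} + {as_formula(olefin)}")
```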

A refinery can lift gasoline yield from roughly 20 percent of a barrel under basic distillation to 55 percent or more by cracking and combining hydrocarbons. This flexibility is what makes a complex refinery so much more valuable than a simple one.

Why Chemistry Drives Pricing

Every concept in this chapter comes back to one point. Crude grades trade at different prices because their molecular mixes yield different product slates. A light sweet crude is paraffin-rich and low in heteroatoms, so a simple refinery can turn it into high-value gasoline and distillate with minimal treatment. A heavy sour crude is aromatic- and asphaltene-rich with meaningful sulfur, nitrogen, and metals, so it needs a coker, a hydrocracker, and extensive hydrotreating before the same products can be made. Those extra units cost capital and hydrogen, and that cost is exactly what shows up in the light-heavy and sweet-sour price differentials discussed in Chapter 17 (Oil Prices). PONA, heteroatoms, and bond chemistry are the foundation for every price spread in the oil complex.

Chapter 7 (Refining) picks up the process units (FCC, hydrocracker, coker, reformer, alkylation) that turn PONA theory into gasoline and diesel. Chapter 10 (Petrochemicals) returns to olefins, aromatics, and BTX as petrochemical feedstocks. Chapter 15 (Environmental) follows NOx, SOx, and CO2 from combustion through to regulation.

The above was updated in 2026. For the full original 2009 chapter, download the 1st edition 2009 PDF.