2. Algebraic root systems

This construction, which is my own, is an alternative axiomatization of the root systems from Lie theory. It is not completely unparalleled; there is a similar, though somewhat more complicated, construction by Dyer (in the paper On rigidity of abstract root systems of Coxeter systems). The results below are standard facts about Coxeter groups and standard facts about root systems, combined in one abstract framework. For more traditional reading, try Combinatorics of Coxeter groups by Björner and Brenti or Reflection Groups and Coxeter Groups by Humphreys.

A pre-root system is defined as a set \(R\) with a binary operation \(\ast:R\times R\to R\) satisfying the following axioms:

Axiom 1: \(x\ast(x\ast y)=y\)

Axiom 2: \(x\ast(y\ast z)=(x\ast y)\ast(x\ast z)\)

Note that this operation is virtually never associative. One might suspect that this leads to pathologies, but with the third axiom below quite the opposite is true.

For convenience in notation we make the following definitions.

Definition 1: Let \(R\) be a pre-root system. If \(x\in R\), define a function \(s_x:R\to R\) by \(s_x(y)=x\ast y\) for all \(y\in R\). \(s_x\) is the reflection corresponding to \(x\). Let \(W(R)\) be the group generated by the reflections in \(R\); \(W(R)\) is called the Weyl group of \(R\). Define also \(-x=x\ast x=s_x(x)\). Then we have the following identities (proofs left to the reader; a sample derivation is given just after the list):
  1. \(s_x^2=1\), the identity element of \(W(R)\), and in particular \(-(-x)=x\).
  2. \(s_x(-y)=-s_x(y)\)
  3. \(s_{-x}(y)=s_x(y)\)
  4. \(s_x(y\ast z)=s_x(y)\ast s_x(z)=s_{s_x(y)}(s_x(z))\)
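For instance, identity 2 is a single application of Axiom 2 with \(z=y\): $$s_x(-y)=x\ast(y\ast y)=(x\ast y)\ast(x\ast y)=-(x\ast y)=-s_x(y)$$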


Definition 2: Let \(B\subseteq R\) be a subset of a pre-root system \(R\) and define \(R^+(B)\) to be the smallest subset of \(R\) satisfying
  1. \(B\subseteq R^+(B)\).
  2. If \(y\in R^+(B)\), \(x\in B\), and \(x\neq y\), then \(x\ast y\in R^+(B)\).

Now we can state the third axiom. A pre-root system \(R\) is called a root system if the following additional axiom holds:

Axiom 3: There exists a subset \(B\subseteq R\) such that for all \(x\in R\) we have \(x\in R^+(B)\) if and only if \(-x\notin R^+(B)\).

A subset \(B\subseteq R\) satisfying Axiom 3 is called a basis of \(R\), the elements of \(B\) are called simple roots, and \(R^+(B)\) is called the set of positive roots. The subset \(-R^+(B)=\{y\in R|-y\in R^+(B)\}\) is called the set of negative roots. The reflections \(s_x\) for \(x\in B\) are called the simple reflections and the set of these will be denoted by \(S(B)\). These definitions depend on the choice of basis, so most of the time we will fix a basis of \(R\) and consider pairs \((R,B)\) where \(R\) is a root system and \(B\) is a basis. It will therefore be convenient to refer to the pair \((R,B)\) as a root system.

The inevitable question is, "Why should I care about this?" Let's construct an example. Let $V$ be the real vector space spanned by the indeterminates $t_1,t_2,\ldots,t_{n+1}$ and let $R_{A}^n$ be the set of all differences $t_i-t_j$ with $i\neq j$. For positive integers $a,b$ with $a\neq b$ define a permutation $s_{a,b}$, the transposition exchanging $a$ and $b$, by $$s_{a,b}(i)=\left\{\begin{array}{ll}b&\mbox{ if }i=a\\a&\mbox{ if }i=b\\i&\mbox{ otherwise}\end{array}\right.$$ Then we define a product $\ast:R_A^n\times R_A^n\to R_A^n$ by $$(t_a-t_b)\ast (t_c-t_d)=t_{s_{a,b}(c)}-t_{s_{a,b}(d)}$$ I leave it to you to prove that $R_A^n$ satisfies Axioms 1 and 2. To convince you that our notation is consistent, note that $$(t_a-t_b)\ast (t_a-t_b)=t_b-t_a=-(t_a-t_b)$$ where the right-hand side of the equation can be interpreted either in a root-system-theoretic sense or in the vector space sense; the two are equivalent.
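If you want to experiment, here is a minimal Python sketch (the encoding and the function name `star` are mine, not standard): a root $t_a-t_b$ is stored as the pair $(a,b)$, and we check Axioms 1 and 2 exhaustively for a small $n$.

```python
from itertools import product

def star(r1, r2):
    # (t_a - t_b) * (t_c - t_d) = t_{s_{a,b}(c)} - t_{s_{a,b}(d)}
    (a, b), (c, d) = r1, r2
    swap = lambda i: b if i == a else a if i == b else i
    return (swap(c), swap(d))

n = 3  # R_A^n lives in indices 1, ..., n+1
R = [(i, j) for i in range(1, n + 2) for j in range(1, n + 2) if i != j]

# Axiom 1: x * (x * y) = y
assert all(star(x, star(x, y)) == y for x, y in product(R, repeat=2))
# Axiom 2: x * (y * z) = (x * y) * (x * z)
assert all(star(x, star(y, z)) == star(star(x, y), star(x, z))
           for x, y, z in product(R, repeat=3))
```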

For Axiom 3 we need to choose a basis. Define $$B_A^n=\{t_i-t_{i+1}|1\leq i\leq n\}$$ Theorem 2.1: $(R_A^n,B_A^n)$ is a root system.

Proof: I claim that $$R^+(B_A^n)=\{t_i-t_j|i < j\}$$ To see that $R^+(B_A^n)$ contains every difference $t_i-t_j$ with $i < j$, we use induction on $j-i$, the case $j-i=1$ holding by definition. Suppose then that $t_a-t_b\in R^+(B_A^n)$ whenever $b-a < k$. If $b-a=k-1$, then $(t_{b}-t_{b+1})\ast (t_a-t_b)=t_a-t_{b+1}\in R^+(B_A^n)$, and $(b+1)-a=k$, so the result follows by induction. To see that $R^+(B_A^n)$ contains nothing else, note that the set $\{t_i-t_j|i < j\}$ contains $B_A^n$ and is closed under the defining operation: if $a < b$ and $c\neq a$ or $c+1\neq b$ (that is, whenever $t_c-t_{c+1}\neq t_a-t_b$), then $(t_c-t_{c+1})\ast (t_a-t_b)=t_{s_{c,c+1}(a)}-t_{s_{c,c+1}(b)}$ satisfies $s_{c,c+1}(a) < s_{c,c+1}(b)$. Since $R^+(B_A^n)$ is the smallest subset containing $B_A^n$ with the property that $x\ast y\in R^+(B_A^n)$ for all $y\in R^+(B_A^n)$ and $x\in B_A^n$ with $x\neq y$, the claimed equality holds. $\square$
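Continuing the sketch above, we can compute $R^+(B_A^n)$ directly from Definition 2 by closing $B_A^n$ under the operation and check that it matches the claim (a sanity check for small $n$, not a proof; `positive_roots` is my own helper):

```python
def positive_roots(R, B):
    # Close B under y -> x * y for x in B, y in the set, x != y (Definition 2).
    pos = set(B)
    changed = True
    while changed:
        changed = False
        for x in B:
            for y in list(pos):
                if x != y and star(x, y) not in pos:
                    pos.add(star(x, y))
                    changed = True
    return pos

B = [(i, i + 1) for i in range(1, n + 1)]
assert positive_roots(R, B) == {(i, j) for (i, j) in R if i < j}
```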

Now, ask yourself the following question: "What is the Weyl group of $R_A^n$ (that is to say $W(R_A^n)$)?" It is the symmetric group $S_{n+1}$! To see this, extend the action of $t_a-t_b$ to the elements $t_i$, that is, $(t_a-t_b)\ast t_i=t_{s_{a,b}(i)}$. Where an element of $W(R_A^n)$ sends any root is completely determined by where it sends the $t_i$, so each element of $W(R_A^n)$ is uniquely determined by the bijection of $\{t_1,\ldots,t_{n+1}\}$ with itself that it induces. This gives us a faithful permutation representation of $W(R_A^n)$. Its image contains all of the transpositions, so it must be the entirety of $S_{n+1}$; a brute-force confirmation for small $n$ is sketched below. The theory below will allow us to prove in short order highly nontrivial results about elements of $S_{n+1}$.
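Again continuing the Python sketch (with its helpers `star`, `R`, and `n`, which are my own names), we can generate the group of permutations of $\{1,\ldots,n+1\}$ induced by the reflections and confirm that it has $(n+1)!$ elements:

```python
from math import factorial

def reflection_perm(root):
    # The permutation of {1, ..., n+1} induced by s_{t_a - t_b}: the transposition (a b).
    a, b = root
    return tuple(b if i == a else a if i == b else i for i in range(1, n + 2))

def generate_group(gens):
    # BFS closure of the generators under composition, starting from the identity.
    identity = tuple(range(1, len(gens[0]) + 1))
    group, frontier = {identity}, {identity}
    while frontier:
        new = {tuple(g[w[i] - 1] for i in range(len(w)))  # the composition g . w
               for g in gens for w in frontier} - group
        group |= new
        frontier = new
    return group

assert len(generate_group([reflection_perm(r) for r in R])) == factorial(n + 1)
```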

Pre-root systems that do not satisfy Axiom 3 are of little interest to us. However, they are far from useless. Structures that satisfy only Axiom 2 are called shelves, and shelves in which each \(s_x\) is a bijection are called racks; Axiom 1 says precisely that each \(s_x\) is an involution, so pre-root systems are involutory racks. (They are generally not quandles, which are racks satisfying the additional idempotence law \(x\ast x=x\); here \(x\ast x=-x\).) Racks and quandles both have applications in knot theory.


Now we turn to the development of the theory of root systems.

Proposition 2.2: Let \((R,B)\) be a root system. Then
  1. For all \(w\in W(R)\) and \(x\in R\) we have \(ws_xw^{-1}=s_{w(x)}\).
  2. For all \(y\in R\) there exist \(w\in W(R)\) and \(x\in B\) such that \(y=w(x)\).
  3. The set of simple reflections \(S(B)\) is a generating set of \(W(R)\).
Proof of 1: Let \(w\in W(R)\) and \(x\in R\). We prove the result by induction on the minimal number \(k\) of reflections such that \(w=s_{x_1}\cdots s_{x_k}\). If \(k=0\) this is clear. If the result holds for \(k-1\), note that \(s_{x_1}w=s_{x_2}\cdots s_{x_k}\). Then $$s_{x_1}ws_x(s_{x_1}w)^{-1}=s_{s_{x_1}w(x)}$$ by the induction hypothesis. Thus $$\begin{align} ws_xw^{-1}(y)=s_{x_1}s_{s_{x_1}w(x)}s_{x_1}(y)&=x_1\ast (s_{x_1}w(x)\ast (x_1\ast y))\\ &=(x_1\ast s_{x_1}w(x))\ast (x_1\ast (x_1\ast y))\\ &=w(x)\ast y=s_{w(x)}(y) \end{align}$$ and the result follows by induction.

Proof of 2: First assume \(y\in R^+(B)\). Set \(B_0=B\), and for \(i>0\) define $$B_i=\{x\ast y|x\in B,x\neq y,y\in B_{i-1}\}$$ We claim that \(B_i\subseteq R^+(B)\) for all \(i\). We prove this by induction, the base case \(B_0\subseteq R^+(B)\) being clear by definition. Suppose \(y\in B_i\) with \(i\geq 1\). Then \(y=x\ast z\) for some \(x\in B\) and \(z\in B_{i-1}\) with \(x\neq z\) (note that \(x\ast y=x\ast(x\ast z)=z\) by Axiom 1). Since \(z\in R^+(B)\) by the induction hypothesis, \(x\in B\), and \(x\neq z\), the definition of \(R^+(B)\) gives \(y=x\ast z\in R^+(B)\), so the result follows by induction.

It follows that $$\bigcup_{i=0}^{\infty}{B_i}\subseteq R^+(B)$$ We also have that if \(y\in\bigcup_{i=0}^{\infty}{B_i}\), \(x\in B\), and \(x\neq y\), then \(x\ast y\in \bigcup_{i=0}^{\infty}{B_i}\). By definition, \(R^+(B)\) is the smallest subset satisfying this property, so the opposite inclusion \(R^+(B)\subseteq\bigcup_{i=0}^{\infty}{B_i}\) holds. Now, for each \(y\in B_i\) there exist \(x\in B\) and \(w\in W(R)\) such that \(w(x)=y\): starting from \(y_i=y\), recursively choose \(x_j\in B\) and \(y_{j-1}\in B_{j-1}\) such that \(y_{j-1}=x_j\ast y_j\) for \(j=i,i-1,\ldots,1\). Then \(y_0\in B_0=B\) and $$y=s_{x_i}s_{x_{i-1}}\cdots s_{x_1}(y_0)$$ so we may take \(w=s_{x_i}s_{x_{i-1}}\cdots s_{x_1}\) and \(x=y_0\), and the result follows for positive roots. For negative roots \(y\), note that \(-y=s_y(y)\) is positive by Axiom 3, so there exist \(w'\in W(R)\) and \(x\in B\) such that \(w'(x)=s_y(y)\); thus \(s_yw'(x)=y\), so we may take \(w=s_yw'\).

Proof of 3: It suffices to show that each reflection \(s_y\) for \(y\in R^+(B)\) can be written as a product of simple reflections since \(s_{-y}=s_y\). We know from the proof of part 2 that there exist simple roots \(x_1,\ldots,x_i\in B\) and \(x\in B\) such that \(s_{x_1}\cdots s_{x_i}(x)=y\). Set \(s_{x_1}\cdots s_{x_i}=w\). Then \(ws_xw^{-1}=s_{w(x)}=s_y\) by part 1, so the result follows. \(\square\)


Definition 3: Let \((R,B)\) be a root system and let \(w\in W(R)\). A sequence of simple reflections \((s_{x_1},\ldots,s_{x_n})\) is called a word for \(w\) if \(w=s_{x_1}\cdots s_{x_n}\). We know by the previous proposition that since $S(B)$ generates $W(R)$, every element of $W(R)$ has at least one word. If \((s_{x_1},\ldots,s_{x_n})\) is a word for \(w\) such that the length \(n\) is as small as possible, then the word is called a reduced word. Define \(\ell(w)\) to be the length of a reduced word for \(w\). Define also the inversion set \(I(w)\) of \(w\) by $$I(w)=\{y\in R^+(B)|w(y)\notin R^+(B)\}$$ A positive root $y\in R^+(B)$ such that $w(y)\notin R^+(B)$ will correspondingly be called an inversion.
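For example, in the root system $(R_A^2,B_A^2)$, take $w=s_{t_2-t_3}s_{t_1-t_2}$, which as a permutation sends $1\mapsto 3$, $2\mapsto 1$, $3\mapsto 2$. Applying $w$ to the three positive roots gives $w(t_1-t_2)=t_3-t_1$ and $w(t_1-t_3)=t_3-t_2$, both negative, while $w(t_2-t_3)=t_1-t_2$ is positive. Hence $I(w)=\{t_1-t_2,t_1-t_3\}$ and $\ell(w)=2=|I(w)|$, foreshadowing Proposition 2.4 below.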


Theorem 2.3: Let \((R,B)\) be a root system and let \(w\in W(R)\). Let $y\in R^+(B)$. Then \(\ell(ws_y) < \ell(w)\) if and only if \(y\in I(w)\). If $y\in I(w)$ and $(s_{x_1},\ldots,s_{x_n})$ is a (possibly unreduced) word for $w$, then there is an index $i$ such that $$(s_{x_1},\ldots,\widehat{s_{x_i}},\ldots,s_{x_n})$$ is a word for $ws_y$, where the caret indicates omission.

Proof: Suppose $y\in I(w)$ and let $(s_{x_1},\ldots,s_{x_n})$ be a word for $w$. Since $y\in R^+(B)$ but $s_{x_1}\cdots s_{x_n}(y)=w(y)\notin R^+(B)$, we may let $i$ be the maximal index such that $s_{x_i}\cdots s_{x_n}(y)\notin R^+(B)$; by maximality, $s_{x_{i+1}}\cdots s_{x_n}(y)\in R^+(B)$ (when $i=n$ this is the hypothesis $y\in R^+(B)$). We then have $x_i\ast s_{x_{i+1}}\cdots s_{x_n}(y)\notin R^+(B)$ with $x_i\in B$, which by the defining property of $R^+(B)$ forces $s_{x_{i+1}}\cdots s_{x_n}(y)=x_i$. Thus $y=s_{x_n}\cdots s_{x_{i+1}}(x_i)$, hence $s_y=s_{x_n}\cdots s_{x_{i+1}}s_{x_i}s_{x_{i+1}}\cdots s_{x_{n}}$ by Proposition 2.2. Thus $(s_{x_1},\ldots,\widehat{s_{x_i}},\ldots,s_{x_n})$ is a word for $ws_y$. If we assume the word we started with is reduced, then $ws_y$ has a word shorter than a reduced word for $w$, so we must have that $\ell(ws_y) < \ell(w)$.

Now note that if $y\in R^+(B)-I(w)$, then $ws_y(y)=w(-y)=-w(y)\notin R^+(B)$, hence $y\in I(ws_y)$. Thus $\ell(w)=\ell((ws_y)s_y) < \ell(ws_y)$, so the result follows. $\square$
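As a quick computational sanity check of the theorem (my own sketch, independent of the earlier snippets): in $W(R_A^3)=S_4$ we can compute $\ell$ by breadth-first search over words in the adjacent transpositions, and verify that $\ell(ws_y)<\ell(w)$ exactly when $y=t_i-t_j$ is an inversion of $w$, i.e. when $w(i)>w(j)$.

```python
def length_by_words(n):
    # ell(w) = distance from the identity in the Cayley graph of S_n
    # with respect to the adjacent transpositions (i.e. minimal word length).
    identity = tuple(range(1, n + 1))
    dist, frontier = {identity: 0}, [identity]
    while frontier:
        nxt = []
        for w in frontier:
            for i in range(n - 1):
                u = list(w)
                u[i], u[i + 1] = u[i + 1], u[i]  # right-multiply by an adjacent transposition
                u = tuple(u)
                if u not in dist:
                    dist[u] = dist[w] + 1
                    nxt.append(u)
        frontier = nxt
    return dist

ell = length_by_words(4)
for w in ell:
    for i in range(1, 4):
        for j in range(i + 1, 5):
            ws_y = list(w)
            ws_y[i - 1], ws_y[j - 1] = ws_y[j - 1], ws_y[i - 1]  # w * s_{t_i - t_j}
            assert (ell[tuple(ws_y)] < ell[w]) == (w[i - 1] > w[j - 1])
```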


The previous theorem is more important than it looks. It implies that $(W(R),S(B))$ is a Coxeter system and $W(R)$ is a Coxeter group. Coxeter groups have extremely nice properties and there is a huge amount of literature on them. We're assuming no prior knowledge in this blog, so we will prove the required results from Coxeter group theory as they come up.


Proposition 2.4: Let $(R,B)$ be a root system and let $w\in W(R)$.
  1. $\ell(w) = |I(w)|$; that is, the length of a reduced word for $w$ is the number of inversions of $w$.
  2. If $x\in I(w)\cap B$, then $\ell(ws_x)=\ell(w)-1$.
Proof: We prove part 1 by induction on $|I(w)|$, the result being true for the identity, which has empty inversion set. Let $(s_{x_1},\ldots,s_{x_n})$ be a reduced word for $w$ with $n\geq 1$. Then $x_n\in I(w)\cap B$ since $\ell(ws_{x_n}) < \ell(w)$. Let $y\in R^+(B)$. We have that $ws_{x_n}(y)\notin R^+(B)$ if and only if $w(x_n\ast y)\notin R^+(B)$. Thus $I(ws_{x_n})=s_{x_n}(I(w)-\{x_n\})$. In particular, $|I(ws_{x_n})|=|I(w)|-1$, so by the induction hypothesis $\ell(ws_{x_n})=|I(w)|-1$. We can append $s_{x_n}$ to any reduced word for $ws_{x_n}$ to obtain a word for $w$, which must be reduced since $\ell(w)>\ell(ws_{x_n})$. Thus $\ell(w)=\ell(ws_{x_n})+1=|I(ws_{x_n})|+1=|I(w)|$, so both results follow. $\square$


As a final note, I promised you nontrivial results about $S_{n+1}$, so here is one that comes for free. Note that the set of simple reflections in $S_{n+1}$ is the set of all $s_{i,i+1}$, that is to say the adjacent transpositions.

Corollary 2.5: If $w\in S_{n+1}$, then the minimal number of adjacent transpositions required to express $w$ (meaning $\ell(w)$) is exactly the number of pairs $i < j$ such that $w(i) > w(j)$.

Proof: We know from the previous proposition that $\ell(w)$ is equal to the number of inversions of $w$. What is an inversion of an element of $S_{n+1}$? It is a root $t_i-t_j$ with $i < j$ such that $t_{w(i)}-t_{w(j)}$ is a negative root, meaning $w(i) > w(j)$. This is exactly what the statement of the corollary claims. $\square$
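Reusing `ell` and the `length_by_words` helper from the sketch after Theorem 2.3 (my own names), the corollary checks out numerically in $S_4$:

```python
def inversions(w):
    # Number of pairs i < j with w(i) > w(j), for w in one-line notation.
    return sum(1 for a in range(len(w)) for b in range(a + 1, len(w)) if w[a] > w[b])

assert all(l == inversions(w) for w, l in ell.items())
```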

1. Introduction

In this blog we will be exploring Schubert calculus without any algebraic geometry or algebraic topology. Our method of investigation is just plain algebra. With just some undergraduate-level abstract algebra under your belt, you can go from zero to the very core of the deep combinatorics of Schubert calculus in minutes. Allow me to demonstrate.

For this introduction we will be focusing only on type \(A\), and the main object of study is the symmetric group \(S_n\). \(S_n\) for our purposes will be the group of all bijective functions from the set \([n]=\{1,2,\ldots,n\}\) to itself. There are natural injective homomorphisms \(S_n\hookrightarrow S_{n+1}\) for all \(n\), obtained by considering \([n]\) as a subset of \([n+1]\); the image of a function under the homomorphism \(S_n\hookrightarrow S_{n+1}\) fixes \(n+1\) and acts on the remaining elements as before. It will be convenient to consider all of these groups to be sitting inside \(S_\infty\), which is the group of bijective functions \(f:\mathbb N\to\mathbb{N}\) such that \(f(i)=i\) for all but finitely many \(i\). Essentially, $$S_\infty=\bigcup_{n=1}^{\infty}{S_n}$$

Let \(F\) be a field of characteristic zero. We turn to the polynomial ring \(F[x_1,x_2,\ldots]\) in countably many indeterminates, which we will for convenience refer to as \(R\). \(S_\infty\) acts on \(R\) by \(F\)-automorphisms (that is, automorphisms that are \(F\)-linear). Specifically, for \(f\in S_\infty\) the action of \(f\) on \(x_i\) is $$f\cdot x_i=x_{f(i)}$$ There is only one way to extend this to an \(F\)-automorphism of \(R\); essentially, \(f\) acts by the substitution \(x_i\mapsto x_{f(i)}\) in the usual sense. This action, while natural, is not commonly used in the literature: it works perfectly well as a left action on the polynomials themselves, meaning \(f\cdot (g\cdot p)=(f\circ g)\cdot p\), but it does not come from a left action on the variables; if you move the action inside the parentheses of the polynomial \(p(x_1,\ldots)\), viewed as a function, the order of multiplication in \(S_\infty\) is reversed. Despite this, it will be the most convenient action for our purposes.
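To illustrate, here is a small sympy sketch (the function name `act` and the sample polynomial are mine) checking the left-action property \(f\cdot(g\cdot p)=(f\circ g)\cdot p\):

```python
import sympy as sp

x = sp.symbols('x1:4')  # x1, x2, x3: enough variables for a small example

def act(f, p):
    # f acts by the simultaneous substitution x_i -> x_{f(i)}.
    return p.subs({x[i]: x[f[i] - 1] for i in range(len(f))}, simultaneous=True)

f, g = (2, 3, 1), (2, 1, 3)                # one-line notation: f(1)=2, f(2)=3, f(3)=1
fg = tuple(f[g[i] - 1] for i in range(3))  # the composition f o g
p = x[0]**2 * x[1]
assert act(f, act(g, p)) == act(fg, p)     # f.(g.p) = (f o g).p
```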

Now we will need to refer to specific elements of \(S_\infty\), namely the adjacent transpositions \(s_i\) defined by $$s_i(j)=\left\{\begin{array}{cc}i+1&\mbox{ if }j=i\\i&\mbox{ if }j=i+1\\j&\mbox{ otherwise}\end{array}\right.$$ The elements \(s_i\) generate \(S_\infty\) as a Coxeter group, which is very important and will be discussed later, but we will not dwell on it now. Now we are interested in defining the divided difference operators \(\partial^i\) for \(i\in\mathbb{N}\). \(\partial^i\) takes elements of \(R\) to elements of \(R\) and is \(F\)-linear, and is defined abstractly as follows: $$\partial^i=\frac1{x_i-x_{i+1}}(1-s_i)$$ If this definition is too abstract for you, that's no problem. It is sufficient to know that \(\partial^i\) acts as follows: $$\partial^i(p)=\frac{p-s_i\cdot p}{x_i-x_{i+1}}$$ That is, we take the polynomial, switch two of the variables, subtract from the original, and divide by the difference of the two variables we swapped. It is not obvious that this always yields a polynomial, and you may want to convince yourself of that before moving on.
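Here is a small sympy sketch of \(\partial^i\) (again, the names are my own); note how the division always cancels, leaving a polynomial:

```python
import sympy as sp

x = sp.symbols('x1:5')  # x1, x2, x3, x4

def divided_difference(i, p):
    # del^i(p) = (p - s_i . p) / (x_i - x_{i+1}), where s_i swaps x_i and x_{i+1}.
    swapped = p.subs({x[i - 1]: x[i], x[i]: x[i - 1]}, simultaneous=True)
    return sp.cancel((p - swapped) / (x[i - 1] - x[i]))

p = x[0]**2 * x[1]
print(divided_difference(1, p))  # x1*x2 -- a polynomial, as promised
```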

Now we define the operators \(\partial^w\) for all \(w\in S_\infty\) built from the \(\partial^i\). Namely, if \(w=s_{i_1}\cdots s_{i_k}\) is a product of adjacent transpositions with \(k\) as small as possible (so the expression is as short as possible), then we define $$\partial^w=\partial^{i_1}\cdots\partial^{i_k}$$ One can show that this does not depend on the choice of shortest expression, so \(\partial^w\) is well defined. If we take such a product where the expression is not as short as possible, we instead get \(0\).
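Continuing the sympy sketch above, we can observe both phenomena in miniature: applying \(\partial^1\) along the unreduced expression \(s_1s_1\) gives \(0\), while the two shortest expressions \(s_1s_2s_1\) and \(s_2s_1s_2\) for the same permutation give the same result on our sample polynomial:

```python
# An unreduced expression gives 0: del^1 del^1 = 0.
assert divided_difference(1, divided_difference(1, p)) == 0

# The two shortest expressions for s_1 s_2 s_1 = s_2 s_1 s_2 agree (on this sample).
lhs = divided_difference(1, divided_difference(2, divided_difference(1, p)))
rhs = divided_difference(2, divided_difference(1, divided_difference(2, p)))
assert sp.simplify(lhs - rhs) == 0
```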

It is not difficult to work out a product rule for \(\partial^i\). You may want to try it yourself, but in any case the formula is $$\partial^i(pq)=\partial^i(p)q+p\partial^i(q)+(x_{i+1}-x_{i})\partial^i(p)\partial^i(q)$$ We can iterate this to obtain a product rule for \(\partial^w\) for arbitrary permutations \(w\). We end up with a formula of the form $$\partial^w(pq)=\sum_{u,v\in S_\infty}{c_{u,v}^w\partial^u(p)\partial^v(q)}$$ where the coefficients \(c_{u,v}^w\) are polynomials depending on \(u\), \(v\), and \(w\).
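Still in the same sketch, the product rule can be checked on sample polynomials:

```python
q = x[1] + x[2]  # q = x2 + x3
lhs = divided_difference(1, p * q)
rhs = (divided_difference(1, p) * q + p * divided_difference(1, q)
       + (x[1] - x[0]) * divided_difference(1, p) * divided_difference(1, q))
assert sp.simplify(lhs - rhs) == 0
```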

You may be surprised to know that by now I have already made good on my promise of bringing you from zero to Schubert calculus in minutes. The coefficients \(c_{u,v}^w\), the equivariant Littlewood-Richardson coefficients, are one of the central objects of study in Schubert calculus, and finding a formula of a certain type for them is a hopelessly difficult unsolved problem. While irrelevant to our geometry-free investigation, it is interesting to note that the polynomials \(c_{u,v}^w\) are the structure constants in the equivariant cohomology rings of complete flag varieties over the complex numbers (rings we will construct combinatorially later), and whenever \(c_{u,v}^w\) is an integer it counts the number of points in transverse triple intersections of Schubert varieties. It is known via geometric proofs (see W. Graham, Positivity in equivariant Schubert calculus, Duke Math. J. 109 (2001), no. 3, 599-614) that \(c_{u,v}^w\) is a polynomial in the differences \(x_{i+1}-x_i\) with nonnegative integer coefficients. This has not yet been paralleled combinatorially except in special cases, but we're working on it.

In the next post we will build our general framework for equivariant Schubert calculus.