
Bei's Study Notes


Machine Learning

Machine Learning Study Note - Introduction
Last updated: 2017-10-10 15:39:14 PDT.

Types of Machine Learning Problems (wiki)

ML tasks can be classified into three broad categories based on the nature of the learning signal or feedback available to a learning system:

  1. Supervised learning
  2. Unsupervised learning
  3. Reinforcement learning

Between supervised and unsupervised learning is semi-supervised learning.

Depending on the output:

  1. Classification
  2. Regression
  3. Clustering
  4. Ranking
  5. Recommendation


  1. Density estimation
  2. Dimensionality reduction


  1. Parametric methods - Methods that assume the data comes from a distribution of a fixed form and aim to estimate the parameters of that distribution.
  2. Nonparametric methods - Methods that do not assume a particular form of distribution. (Or rather, they allow a free-form distribution. "nonparametric really means many parametric.").
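To make the distinction concrete, here is a minimal sketch (my own illustration, not from any reference): a Gaussian fit keeps only two numbers, while a kernel density estimate keeps every sample. The helper names `fit_gaussian` and `kde_pdf`, the bandwidth 0.3 and the synthetic data are all assumptions chosen for the example.

```python
import math
import random
import statistics

def fit_gaussian(samples):
    """Parametric: summarize the data by two parameters (mean, std)."""
    return statistics.fmean(samples), statistics.pstdev(samples)

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde_pdf(x, samples, bandwidth):
    """Nonparametric: the estimate keeps every sample as a 'parameter'."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in samples) / len(samples)

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(500)]  # synthetic data, assumed Gaussian

mu_hat, sigma_hat = fit_gaussian(data)
print(mu_hat, sigma_hat)
print(gaussian_pdf(2.0, mu_hat, sigma_hat), kde_pdf(2.0, data, bandwidth=0.3))
```

The parametric estimate is cheap and compact but only as good as the Gaussian assumption; the kernel estimate makes no such assumption but its cost grows with the number of samples.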

Machine Learning Approaches

  1. Linear Model
  2. Kernel machines
  3. Decision tree
  4. Bayesian Estimation
  5. Hidden Markov Models
  6. Ensemble learning
    1. Random Forest
    2. Gradient Boosting
  7. Graphical models (Bayesian networks)
  8. Clustering
  9. Association rules learning
  10. Artificial neural network
  11. Reinforcement learning
  12. Generalized Additive Model
  13. Inductive logic programming
  14. Feature learning
    1. Sparse dictionary learning
  15. Genetic algorithms
  16. Rule-based machine learning
  17. Learning classifier systems (LCS)

Model Selection

Hypothesis set \mathcal H . Bias - underfitting. Variance - overfitting. Generalization. Triple trade-off.

Cross validation. Test set. Analogy with exams.
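A minimal sketch of how k-fold cross validation splits the data, assuming the samples are simply indexed 0..n-1 and not shuffled. The function name `k_fold_indices` is hypothetical; real libraries offer richer versions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

Each sample is used for testing exactly once, which is the point of the exam analogy: the model is never graded on questions it studied.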

General Relativity

General Relativity 02 - Manifold and tensor fields
Last updated: 2017-04-19 21:55:03 PDT.


(This is from Chapter 2.2 of Wald)


In GR, we discuss spaces that are not exactly flat. If we can split the space into parts, each of which corresponds continuously and one-to-one to a region of \mathbb R^n , then the space is called a manifold. n is called the dimension of the manifold.

A finite-dimensional manifold can normally be embedded in a higher-dimensional Euclidean space, but such an embedding might not be natural. In GR, spacetime does not naturally live in a higher-dimensional space, so an abstract definition of a manifold is necessary.

Definition (Manifold): An n-dimensional, C^\infty , real manifold M is a set together with a collection of subsets \{O_\alpha\} satisfying the following properties:

  1. \{O_\alpha\} is an open cover of M .
  2. For each \alpha , there is a one-to-one, onto, map \psi_\alpha: O_\alpha \to U_\alpha where U_\alpha is an open subset of \mathbb R^n .
  3. For any O_\alpha and O_\beta with O_\alpha \cap O_\beta \neq \emptyset , the map \psi_\beta \circ \psi_\alpha^{-1} takes \psi_\alpha[O_\alpha \cap O_\beta] \subset U_\alpha to \psi_\beta[O_\alpha \cap O_\beta] \subset U_\beta . Both of these sets must be open and the map must be C^\infty .

Each map \psi_\alpha is called a "chart" or a "coordinate system". The definitions of a C^k manifold and of a complex manifold are the same with the natural changes.

Throughout this book, the manifolds involved are all assumed to be Hausdorff and paracompact.

NOTE By this definition, the charts are by no means required to provide "straight" coordinates on \mathbb R^n , by which I mean the coordinate tangent vectors can change throughout each O_\alpha .

Example Euclidean space \mathbb R^n with one chart and map to be identity function.

Example 2-sphere (a 2-dimensional spherical surface embedded in \mathbb R^3 ).
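As a sketch of how the 2-sphere fits the definition, one possible choice of charts (my choice, not necessarily the one used in Wald) is the pair of stereographic projections from the north and south poles. The function names below are hypothetical. The transition map works out to q \mapsto q/|q|^2 , which is C^\infty away from the origin.

```python
import math

def psi_north(p):
    """Chart on S^2 minus the north pole (0, 0, 1)."""
    x, y, z = p
    return (x / (1 - z), y / (1 - z))

def psi_north_inv(q):
    """Inverse of psi_north: maps a point of R^2 back onto the sphere."""
    u, v = q
    r2 = u * u + v * v
    return (2 * u / (r2 + 1), 2 * v / (r2 + 1), (r2 - 1) / (r2 + 1))

def psi_south(p):
    """Chart on S^2 minus the south pole (0, 0, -1)."""
    x, y, z = p
    return (x / (1 + z), y / (1 + z))

def transition(q):
    """psi_south o psi_north^{-1}, defined for q != (0, 0)."""
    return psi_south(psi_north_inv(q))

q = (0.3, -1.2)
p = psi_north_inv(q)
print(sum(c * c for c in p))   # squared norm of the point on the sphere
print(transition(q))           # compare with q / |q|^2
```

Two charts are needed because each projection misses one pole; no single chart covers the whole sphere.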

In SR, each coordinate system is applicable to the whole universe, but in GR, a coordinate system is only usable inside its corresponding open set.

With the mapping to \mathbb R^n , we can easily define differentiability and smoothness of maps between manifolds.

Definition (Diffeomorphism): Let M and M' be manifolds with chart maps \{\psi_\alpha\} and \{\psi'_\beta\} . A map f:M\to M' is said to be C^\infty if, for each \alpha and \beta , the map \psi'_\beta \circ f \circ \psi_\alpha^{-1} is C^\infty as a map between Euclidean spaces. If f is C^\infty , one-to-one and onto, and its inverse is also C^\infty , then it is called a diffeomorphism, and M and M' are said to be diffeomorphic.

NOTE diffeomorphism requires the manifolds to have the same dimension.

Tangent vector in a manifold (without embedding in \mathbb R^n ):

We can define tangent vectors as directional derivatives. In \mathbb R^n , the mapping between vectors and directional derivatives is one-to-one: (v^1,...,v^n) defines the derivative operator \sum_\mu v^\mu(\partial/\partial x^\mu) . Directional derivatives are characterized by linearity and the Leibniz rule.

Let \mathcal F denote the collection of all C^\infty functions from the manifold M into \mathbb R . We define a tangent vector v at a point p \in M to be a map v: \mathcal F \to \mathbb R which (1) is linear and (2) obeys the Leibniz rule: \forall f,g \in \mathcal F, a, b \in \mathbb R,

  1. v(af+bg)=av(f) + bv(g)
  2. v(fg) = f(p)v(g) + g(p)v(f)

NOTE: Be very careful that the second rule evaluates the functions f and g at p .
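The two rules can be checked numerically in \mathbb R^2 , where a tangent vector is an ordinary directional derivative. This is only a finite-difference sketch; the test functions, the point p and the step size h are arbitrary choices of mine.

```python
import math

def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of sum_mu v^mu (d f / d x^mu) at p."""
    forward = [pi + h * vi for pi, vi in zip(p, v)]
    backward = [pi - h * vi for pi, vi in zip(p, v)]
    return (f(forward) - f(backward)) / (2 * h)

f = lambda x: math.sin(x[0]) * x[1]
g = lambda x: x[0] ** 2 + math.exp(x[1])
fg = lambda x: f(x) * g(x)

p = [0.7, -0.4]
v = [1.5, 2.0]

# Leibniz rule: v(fg) = f(p) v(g) + g(p) v(f), with f and g evaluated at p.
lhs = directional_derivative(fg, p, v)
rhs = f(p) * directional_derivative(g, p, v) + g(p) * directional_derivative(f, p, v)
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error.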

Prop: h\in\mathcal F,\ \forall q \in M, h(q) = c \text{ (constant) } \implies v(h) = 0


\begin{align*} p \mapsto h(p)h(p) &= p \mapsto ch(p) \implies hh = ch\\ v(hh) &= 2h(p)v(h) = 2cv(h)\quad,\, \text{whereas}\\ v(ch) &= cv(h)\quad,\text{therefore} \\ cv(h) &= 0 \\ \end{align*}

If c = 0 , then h = 0 \cdot h and v(h) = 0 by linearity; otherwise divide the equation by c to get v(h) = 0 . \square

The tangent vectors at a point p form a vector space V_p under the following addition and scalar multiplication law: \forall a \in \mathbb R, v_1 + av_2 \equiv h \mapsto v_1(h) + av_2(h) .


Theorem 2.2.1: Let M be an n -dimensional manifold. Let p\in M and let V_p denote the tangent space at p , then \mathrm {dim}(V_p) = n .

Proof. Given a chart \psi: O \to U with p \in O , if f \in \mathcal F , then f\circ \psi^{-1}: U \to \mathbb R is C^\infty . For \mu = 1, ..., n define the functional X_{p,\mu}: \mathcal F \to \mathbb R by

\begin{align*} X_{p,\mu}[f] &= \frac{\partial}{\partial x^\mu}\left[f\circ\psi^{-1}\right](\psi(p)) \\ \end{align*}

This means that X_{p,\mu} is the functional that takes the \mu th partial derivative of the function f\circ\psi^{-1} on \mathbb R^n and evaluates it at the point \psi(p) . X_{p,\mu} is clearly a tangent vector, since partial derivatives are linear and obey the product rule. Now we need to prove that V_p = \mathrm{span}\left\{X_{p,\mu}\right\} .

For any function F: \mathbb R^n \to \mathbb R , if F is C^\infty ,

\forall a\in\mathbb R^n, \exists H_{a,\mu}(x\in \mathbb R^n) \in C^\infty, s.t.\\ F(x) = F(a) + \sum^n_{\mu=1}(x^\mu-a^\mu)H_{a,\mu}(x)\quad,

in particular, we have

\begin{align*} H_{a,\mu}(a) &= \lim_{x\to a} H_{a,\mu}(x)\\ &= \frac{\partial}{\partial x^\mu}[F](a)\quad. \end{align*}

Let F= f\circ \psi^{-1} , and a=\psi(p) we have

\begin{align*} f(q) &= f(p) + \sum^n_{\mu=1}(x^\mu \circ \psi(q)-x^\mu \circ \psi(p)) H_{\psi(p),\mu}(\psi(q))\quad.\\ H_{\psi(p),\mu}(\psi(p)) &= \frac{\partial}{\partial x^\mu}[F](\psi(p)) \\ &= \frac{\partial}{\partial x^\mu}[f \circ \psi^{-1}](\psi(p)) \\ &= X_{p,\mu}[f]\quad, \end{align*}

where x^\mu \circ \psi denotes (x\mapsto x^\mu) \circ \psi , that is, picking the \mu th element of the result.

For any v \in V_p , apply v to f , using linearity, the Leibniz rule, and the fact that v of a constant vanishes:

\begin{align*} v[f]&=\cancel{v[f(p)]} + \sum^n_{\mu=1}v[x^\mu \circ \psi-\cancel{x^\mu \circ \psi(p)}]\cdot (H_{\psi(p),\mu}\circ\psi)(p) \\ &\quad + \sum^n_{\mu=1}\cancel{\left(x^\mu \circ \psi(p)-x^\mu \circ \psi(p)\right)} v[H_{\psi(p),\mu}\circ\psi] \\ &= \sum^n_{\mu=1}v[x^\mu \circ \psi]X_{p,\mu}[f]\quad. \end{align*}

This means v[f] is a linear combination of \{X_{p,\mu}[f]\} . \square

The basis \{X_\mu\} is called a coordinate basis, frequently denoted as simply \partial/\partial x^\mu . For each different chart \psi' chosen, there is a different coordinate basis \{X'_\nu\} , and

\begin{align*} X'_{p,\nu}[f] &= \sum^n_{\mu=1} X'_{p,\nu}[x^\mu \circ \psi] X_{p,\mu}[f] \\ &= \sum^n_{\mu=1} \frac{\partial}{\partial x'^\nu}[x^\mu \circ \psi\circ\psi'^{-1}](\psi'(p)) X_{p,\mu}[f] \\ &= \sum^n_{\mu=1} \frac{\partial x^\mu}{\partial x'^\nu}(\psi'(p)) X_{p,\mu}[f]\quad, \\ X_{p,\mu} &= \sum^n_{\nu=1} \frac{\partial x'^\nu}{\partial x^\mu}(\psi(p)) X'_{p,\nu}\quad. \\ \end{align*}

We can also get the vector transformation law from it:

v'^\nu=\sum^n_{\mu=1}v^\mu\frac{\partial x'^\nu}{\partial x^\mu}\quad.
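A small sketch of the transformation law, taking x^\mu = (x, y) and x'^\nu = (r, \theta) on the plane minus the origin as an example of my own choosing. The function names are hypothetical.

```python
import math

def jacobian_polar(x, y):
    """Matrix J[nu][mu] = d x'^nu / d x^mu for x' = (r, theta), x = (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def transform_vector(v, x, y):
    """Apply v'^nu = sum_mu v^mu * d x'^nu / d x^mu at the point (x, y)."""
    J = jacobian_polar(x, y)
    return [sum(J[nu][mu] * v[mu] for mu in range(2)) for nu in range(2)]

# The rotation field v = (-y, x) should become (v^r, v^theta) = (0, 1).
x, y = 1.2, 0.5
v_polar = transform_vector([-y, x], x, y)
print(v_polar)
```

The vector itself does not change; only its components do, because the basis \partial/\partial x^\mu is swapped for \partial/\partial x'^\nu .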

A smooth curve C on a manifold M is a C^\infty map of \mathbb R into M , C: \mathbb R \to M . At each point p \in M lying on C , there is a tangent vector T \in V_p associated with C as follows.

\begin{align*} T[f]&=\frac{d}{dt}[f\circ C] \\ &=\sum_{\mu=1}^n \frac{d x^\mu}{dt}\frac{\partial}{\partial x^\mu}[f\circ \psi^{-1}] \\ &=\sum_{\mu=1}^n\frac {dx^\mu}{dt} X_\mu(f)\quad.\\ \end{align*}

Therefore the components of T are given by

T^\mu = \frac{dx^\mu}{dt}\quad.

If p and q are two points on the manifold, there is in general no natural way to identify a tangent vector in V_p with one in V_q . Another construct (a "connection", or "parallel transport") must be introduced to do so. However, if the curvature is nonzero, the identification of V_p with V_q obtained in this manner will depend on the choice of curve.

A tangent field, v , on a manifold M is an assignment of a tangent vector, v\vert_p \in V_p , to each point p \in M . Despite the fact that the tangent spaces V_p and V_q at different points are different vector spaces, there is a natural notion of what it means for v to vary smoothly from point to point.

A one-parameter group of diffeomorphisms \phi_t is a C^\infty map from \mathbb R \times M \to M such that:

  1. \forall t \in\mathbb R, \phi_t: M \to M is a diffeomorphism, and
  2. \forall s, t \in \mathbb R, \phi_{s+t} = \phi_s \circ \phi_t .

For each fixed point p , t \mapsto \phi_t(p) is a curve, called the orbit of \phi_t , which passes through p at t=0 . Define v|_p to be the tangent vector to this curve at t=0 . Thus we can consider the vector field v to be the generator of such a group.

Conversely, given a vector field v , can we find a family of curves such that through each point p \in M there passes one and only one curve whose tangent vector at p equals v\vert_p ? The answer is yes.

Therefore we have a correspondence between tangent fields and one-parameter groups of diffeomorphisms (in general the curves may exist only for a finite range of t , so the group may be only local). We can thus consider a tangent field to be an infinitesimal mapping of type M \to M :

v|_p = \left.\frac{\mathrm d\phi_t(p)}{\mathrm dt}\right|_{t=0}
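As a concrete sketch, rotations of the plane form a one-parameter group of diffeomorphisms of \mathbb R^2 , and the generator should be the field (-y, x) . The example and the function names `phi` and `generator` are my own; the derivative at t = 0 is estimated by finite differences.

```python
import math

def phi(t, p):
    """Rotation of the plane by angle t: a one-parameter group on R^2."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

def generator(p, h=1e-6):
    """Finite-difference estimate of d/dt phi_t(p) at t = 0."""
    forward, backward = phi(h, p), phi(-h, p)
    return tuple((a - b) / (2 * h) for a, b in zip(forward, backward))

p = (2.0, 1.0)
s, t = 0.4, 1.1
print(phi(s + t, p), phi(s, phi(t, p)))  # group law: the two should agree
print(generator(p))                      # should be close to (-y, x)
```

The orbits here are circles around the origin, and the origin itself is a fixed point where the generator vanishes.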

Given two smooth vector fields v and w , we can define the commutator field [v, w] as follows:

[v,w](f) = v(w(f)) - w(v(f))
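In a coordinate basis the commutator can be expanded explicitly. The following is the standard computation, written with f standing for f\circ\psi^{-1} and with v = \sum_\mu v^\mu \partial/\partial x^\mu , w = \sum_\mu w^\mu \partial/\partial x^\mu ; the second-derivative terms of f cancel because partial derivatives commute.

```latex
\begin{align*}
[v,w](f) &= \sum_{\mu,\nu} v^\nu \frac{\partial}{\partial x^\nu}\left(w^\mu \frac{\partial f}{\partial x^\mu}\right) - \sum_{\mu,\nu} w^\nu \frac{\partial}{\partial x^\nu}\left(v^\mu \frac{\partial f}{\partial x^\mu}\right) \\
&= \sum_{\mu,\nu}\left(v^\nu \frac{\partial w^\mu}{\partial x^\nu} - w^\nu \frac{\partial v^\mu}{\partial x^\nu}\right)\frac{\partial f}{\partial x^\mu}\quad, \\
[v,w]^\mu &= \sum_{\nu=1}^n\left(v^\nu \frac{\partial w^\mu}{\partial x^\nu} - w^\nu \frac{\partial v^\mu}{\partial x^\nu}\right)\quad.
\end{align*}
```

So [v, w] is again a first-order operator, that is, a vector field, even though v(w(f)) on its own is not.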


General Relativity 01
Last updated: 2017-04-19 21:55:03 PDT.


Topological space (definition dump)

Definition (Topological space, open set): A topological space (X,\mathcal J) is a set X with a collection \mathcal J of subsets of X that satisfies:

  1. X, \emptyset \in \mathcal J ;
  2. If O_\alpha \in \mathcal J for all \alpha (an arbitrary, possibly infinite, collection), then \bigcup_\alpha\,O_\alpha \in \mathcal J ;
  3. If n \in \mathbb Z^+, O_1, ... O_n \in \mathcal J , then

    \bigcap_{i=1}^n O_i \in \mathcal J\,.

Sets in \mathcal J are called "open sets".

An example is the standard topology on \mathbb R , whose members are all unions of open intervals in \mathbb R . Thus the name "open sets".
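For a finite set the three axioms can be checked by brute force. A minimal sketch, with a hypothetical helper `is_topology`; it relies on the fact that for a finite collection, closure under pairwise unions and intersections is enough.

```python
from itertools import combinations

def is_topology(X, J):
    """Check the three axioms for a collection J of subsets of a finite set X."""
    X = frozenset(X)
    J = {frozenset(O) for O in J}
    if X not in J or frozenset() not in J:
        return False
    # For a finite collection, closure under pairwise unions and
    # intersections implies closure under all unions and finite intersections.
    for A, B in combinations(J, 2):
        if A | B not in J or A & B not in J:
            return False
    return True

X = {1, 2, 3}
print(is_topology(X, [set(), {1}, {1, 2}, X]))  # a nested chain of sets
print(is_topology(X, [set(), {1}, {2}, X]))     # missing the union {1, 2}
```

The second collection fails only because {1} \cup {2} is absent, which shows the union axiom is a real constraint.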

Definition (Induced topology): If (X,\mathcal J) is a topological space and A is a subset of X , we may make A into a topological space by defining the topology \mathcal F = \{U\mid U=A\cap O, O\in\mathcal J\} ; then (A, \mathcal F) forms a topological space. \mathcal F is called the induced (or relative) topology.

Definition (Product topology): If (X_1,\mathcal J_1) and (X_2,\mathcal J_2) are both topological spaces, the direct product of the two naturally forms a topological space (X_1\times X_2, \mathcal J) , where \mathcal J consists of all unions of sets of the form O_1 \times O_2 with O_1 \in \mathcal J_1 and O_2 \in \mathcal J_2 . \mathcal J is called the product topology.

NOTE This lifts the dimension of the topological space.

Open balls on \mathbb R^n naturally generate a topology: the open sets are the unions of open balls.

Definition (Continuous mapping): If (X,\mathcal J) and (Y,\mathcal K) are topological spaces, a map f:X\to Y is continuous if the inverse image f^{-1}[O] \equiv \{x\in X \mid f(x) \in O\} of every open set O in Y is an open set in X .

Definition (Homeomorphism): If f is continuous, one-to-one, onto, and its inverse is continuous, then f is called a homeomorphism, and the spaces are said to be "homeomorphic".

NOTE Not to be confused with homomorphism and homomorphic.

Definition (Closed set): The complement of an open set is called a "closed set". Sets in a topological space can be open, closed, both, or neither.

Definition (Connected): The topological space is said to be connected if the only subsets that are both open and closed are X and \emptyset . \mathbb R^n is connected.

Definition (Closure): If (X, \mathcal J) is a topological space, \forall A \subseteq X , the closure \overline A is the intersection of all closed sets that contain A .


  1. \overline A is closed;
  2. A \subseteq \overline A ;
  3. \overline A = A \iff A is closed.

NOTE Meaning "to make a set closed". The closure of a set is unique and is necessarily a closed set (its complement is in the topology).

Definition (Interior, Boundary): The interior of A , denoted \mathrm{int}(A) , is defined as the union of all the open sets contained in A . The boundary of A , denoted \dot A (or \partial A ), is defined as the set of elements in \overline A but not in \mathrm{int}(A) .

NOTE alternatively, \partial A \equiv \overline A \cap \overline {X \setminus A} .
NOTE alternatively, \mathrm{int}(A) = X \setminus \overline {X \setminus A} .
NOTE alternatively, \partial A \equiv \{ p \in X \mid \forall O \in \mathcal J.\ p \in O \to (\exists \, a, b \in O.\ a \in A \wedge b \notin A) \} .
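On a finite topological space these definitions can be computed directly. A minimal sketch with hypothetical helpers `closure`, `interior` and `boundary`; the space X = {1, 2, 3} and its topology are an example of my own.

```python
def closure(A, X, J):
    """Intersection of all closed sets (complements of open sets) containing A."""
    X, A = frozenset(X), frozenset(A)
    result = X
    for O in J:
        closed = X - frozenset(O)
        if A <= closed:
            result &= closed
    return result

def interior(A, J):
    """Union of all open sets contained in A."""
    A = frozenset(A)
    result = frozenset()
    for O in J:
        if frozenset(O) <= A:
            result |= frozenset(O)
    return result

def boundary(A, X, J):
    """Elements of the closure that are not in the interior."""
    return closure(A, X, J) - interior(A, J)

X = {1, 2, 3}
J = [set(), {1}, {1, 2}, X]   # a topology on X
A = {2}
print(closure(A, X, J), interior(A, J), boundary(A, X, J))
```

In this small space the set {2} has empty interior, so its boundary is its whole closure, which is quite unlike the intuition from \mathbb R^n .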

Definition (Hausdorff): A topological space is Hausdorff if any two distinct points can be separated by disjoint open sets: for p \neq q there are open sets O_p \ni p and O_q \ni q with O_p \cap O_q = \emptyset .

\mathbb R^n is Hausdorff.


One of the most powerful notions in topology is that of compactness, which is defined as follows.

Definition (Open cover): If (X, \mathcal J) is a topological space and a collection of open sets C=\{O_\alpha\} has \bigcup_\alpha\,O_\alpha = X , then C is said to be an open cover of X , and C "covers" X . Also if Y is a subset of X , and Y \subseteq \bigcup_\alpha\,O_\alpha , then C is said to be an open cover of Y and C "covers" Y . A subcollection of C forms a subcover if it also covers X (or Y ).

Definition (Compact space): If every open cover has a finite subcover, then the topological space is compact.

Alternative definitions of compact space. The following are equivalent:

  1. A topological space X is compact.
  2. Every open cover of X has a finite subcover.
  3. X has a sub-base such that every cover of the space by members of the sub-base has a finite subcover (Alexander's sub-base theorem).
  4. Any collection of closed subsets of X with the finite intersection property has nonempty intersection.
  5. Every net on X has a convergent subnet.
  6. Every filter on X has a convergent refinement.
  7. Every ultrafilter on X converges to at least one point.
  8. Every infinite subset of X has a complete accumulation point.

Definition (Open cover of a set, subcover of a set): Let (X, \mathcal J) be a topological space and A a subset of X . A collection U of open sets is an open cover of A if A \subseteq \bigcup U . A subcollection of U that also covers A is called a subcover of A .

Definition (Compact subset): A is said to be compact if every open cover of A has a finite subcover.

The relationship between compact spaces and compact subsets is given by these two theorems:

  1. A compact subset of a Hausdorff space is closed.
  2. A closed subset of a compact space is compact.

Heine-Borel Theorem. A closed interval [a, b] of \mathbb R is compact.

The open interval (0, 1) is not compact (since the open cover O_\alpha = (1/\alpha, 1) , \alpha = 2, 3, ... , has no finite subcover).
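The reason no finite subcover exists can be seen in a few lines: any finite subfamily of the intervals (1/\alpha, 1) has a largest \alpha , and everything at or below 1/\alpha_{max} is missed. A small sketch with hypothetical helper names:

```python
def covered(x, alphas):
    """True if x lies in some interval (1/alpha, 1) of the chosen subfamily."""
    return any(1 / a < x < 1 for a in alphas)

def uncovered_point(alphas):
    """A point of (0, 1) missed by the finite subfamily indexed by alphas."""
    return 1 / (2 * max(alphas))

subfamily = [2, 5, 17, 100]
x = uncovered_point(subfamily)
print(x, covered(x, subfamily))
```

The same construction works for every finite choice of indices, whereas the full infinite family does cover (0, 1).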

A subset of \mathbb R is compact iff it is closed and bounded.

NOTE An unbounded set can perfectly well be closed. For example, \mathbb Z is obviously unbounded and closed.

Let (X, \mathcal J) and (Y, \mathcal K) be topological spaces. Suppose (X, \mathcal J) is compact and f: X \to Y is continuous. Then f[X] \equiv \{y\in Y \mid y = f(x) \text{ for some } x \in X\} is compact.

NOTE In particular, this transfers compactness through homeomorphisms.

A continuous function from a compact topological space into \mathbb R is bounded and attains its maximum and minimum values.

Tychonoff theorem: Product of compact topological spaces is compact. Given the axiom of choice, the number of such spaces can be infinite.

An application of these is that S^n is compact: the sphere is a closed and bounded subset of \mathbb R^{n+1} , so it is a closed subset of the cube [-1,1]^{n+1} , which is compact by the Heine-Borel and Tychonoff theorems, and a closed subset of a compact space is compact.

Convergence of sequences

To extend the normal definition of sequence convergence, a sequence \{x_n\} of points in a topological space (X, \mathcal J) is said to converge to point x if \forall O \in \mathcal J .( x \in O \to \exists N \in \mathbb Z. \forall n > N. x_n \in O) . x is called the limit of the sequence.

A point y \in X is said to be an accumulation point of \{x_n\} if every open neighborhood of y contains infinitely many points of the sequence.

NOTE The difference between a limit and an accumulation point is that a limit requires every neighborhood to contain all but finitely many terms of \{x_n\} , while an accumulation point only requires infinitely many. For example, the alternating sequence 1, -1, 1, -1, ... has two accumulation points 1 and -1 , but it does not have a limit.
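The alternating sequence can be inspected directly. This sketch only looks at the first 20 terms, so it illustrates the pattern rather than proving anything; the helper name is hypothetical.

```python
def terms_in_neighborhood(seq, center, radius):
    """Indices n for which x_n lies in the open ball (center - radius, center + radius)."""
    return [n for n, x in enumerate(seq) if abs(x - center) < radius]

seq = [(-1) ** n for n in range(20)]        # 1, -1, 1, -1, ...
near_plus = terms_in_neighborhood(seq, 1, 0.5)
near_minus = terms_in_neighborhood(seq, -1, 0.5)
print(near_plus)    # even indices: keeps growing with the sequence length
print(near_minus)   # odd indices: so no tail of the sequence stays near 1
```

Both neighborhoods keep collecting terms forever, but neither ever contains a whole tail of the sequence.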

Definition (First countable): A topological space X is first countable if for every point p in X there is a countable collection of open sets \{O_\alpha\} containing p such that every open neighborhood O of p contains at least one element of \{O_\alpha\} .

Definition (Second countable): A topological space is second countable if there is a countable collection of open sets such that every open set can be written as a union of sets in the collection. Such a collection is called a basis.

NOTE The basis of a linear space is a collection of vectors, s.t. every vector in the space is a linear combination of basis vectors. The basis of a topological space is a collection of open sets, s.t. every open set in the space is a union of basis sets.

NOTE \mathbb R^n is second countable. Open balls with rational radii centered on rational coordinates can form a countable collection of open sets.
NOTE Every second countable space is first countable.

The relationship between compactness and convergence of sequences is expressed by the Bolzano-Weierstrass theorem:

Bolzano-Weierstrass theorem: Let (X, \mathcal J) be a topological space and let A \subset X .

  1. If A is compact, then every infinite sequence \{x_n\} of points in A has an accumulation point lying in A ;
  2. Conversely, if (X, \mathcal J) is second countable and every sequence in A has an accumulation point in A , then A is compact.

Thus, in particular, if (X, \mathcal J) is second countable, A is compact iff every sequence in A has a convergent subsequence whose limit lies in A .


Definition (Neighborhood) Given p in topological space (X, \mathcal J) , a neighborhood V of p is a subset of X that includes an open set U containing p :

V \subseteq X, \exists U \in \mathcal J, U \subseteq V, p \in U

NOTE: V may not be open, but it contains an open set U that contains p .

Definition (Refinement of an open cover): Open cover \{V_\beta\} of X is said to be a refinement of open cover \{O_\alpha\} of X if \forall V_\beta. \exists O_\alpha. V_\beta \subseteq O_\alpha .

NOTE Refinements form a partially ordered set.
NOTE A subcover is always a refinement of an open cover. A refinement of an open cover is not always a subcover.

Definition (Locally finite): \{V_\beta\} is locally finite if each x \in X has an open neighborhood W such that only finitely many V_\beta satisfy W \cap V_\beta \neq \emptyset .

NOTE Compactness requires a finite subcover; paracompactness (defined next) only requires a locally finite refinement. It is a weaker requirement.

Definition (Paracompactness): A space is paracompact if every open cover has a locally finite refinement.

NOTE: Local finiteness is weaker than finiteness of subcovers. Therefore every compact space is paracompact.

NOTE: (wiki) Every metric space is paracompact. A topological space is metrizable if and only if it is a paracompact and locally metrizable Hausdorff space. Paracompactness has little to do with the notion of compactness, but rather more to do with breaking up topological space entities into manageable pieces.

For a manifold M , paracompactness implies:

  1. M admits a Riemannian metric and
  2. M is second countable.

The most important implication is that a paracompact manifold M will have a partition of unity.

Definition (Partition of unity): If (X, \mathcal T) is a topological space, a partition of unity is a set R of continuous functions from X to the unit interval [0, 1] such that for every point x \in X :

  1. there is a neighborhood of x where all but a finite number of the functions in R are 0 , and
  2. the sum of all the functions at x is 1 :

    \sum_{\rho\in R}\rho(x) = 1

This is for the ease of defining integrals on the manifold.
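A simple continuous example on X = \mathbb R (my own choice, not from Wald) is the family of "tent" functions centered at the integers: near any x only two of them are nonzero, and they sum to 1 . The helper names below are hypothetical.

```python
import math

def hat(k, x):
    """Continuous 'tent' function centered at integer k, supported on (k-1, k+1)."""
    return max(0.0, 1.0 - abs(x - k))

def nonzero_members(x):
    """Only the integers floor(x) and floor(x)+1 can have hat(k, x) != 0."""
    k0 = math.floor(x)
    return [k for k in (k0, k0 + 1) if hat(k, x) > 0]

for x in (-2.75, 0.0, 0.4, 3.999):
    total = sum(hat(k, x) for k in nonzero_members(x))
    print(x, nonzero_members(x), total)
```

These tents are only continuous; on a manifold one would use smooth bump functions subordinate to the charts, but the bookkeeping is the same.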


Starting to Learn General Relativity
Last updated: 2017-04-19 21:55:03 PDT.

I was told that usually, a physics student can only master either GR or QFT, but not both at the same time. It will be an interesting experiment if I can start both since I'm not in a rush right now. So this is it. This is the textbook I will use: General Relativity by Robert M. Wald.

General Relativity - UCI OCW

Perfect Fluid in Special Relativity
Last updated: 2016-06-08 11:28:33 PDT.

I was watching an OCW course from UC Irvine on GR. It starts with some discussion of special relativity. The part about perfect fluids caught my attention because it is not obvious how the Euler equation is derived. So I did it here.

It is an interesting topic in astrophysics since most of the objects we study can be considered (general) relativistic perfect fluids. As a first step, the perfect fluid in special relativity is studied.

Start by writing down the stress-energy tensor T^{\alpha\beta} at a point x_0 . First choose the rest frame of the fluid at that point, in which the tensor looks like

\begin{align*} T^{00} &= \rho \\ T^{i0} &= 0 \\ T^{ij} &= p \delta^{ij}\quad. \\ \end{align*}

Then generalize it using Lorentz transformation:

\begin{align*} T^{\alpha'\beta'}&=\Lambda^{\alpha'}{}_\alpha\Lambda^{\beta'}{}_\beta T^{\alpha\beta} \\ \end{align*}

where we consider only the boost

\begin{align*} \Lambda^{0}{}_0 &= \gamma \\ \Lambda^{0}{}_j &= \gamma v_j \\ \Lambda^{i}{}_0 &= \gamma v_i \\ \Lambda^{i}{}_j &= \delta^{i}{}_j + (\gamma-1)v_i v_j / v^2 \\ \end{align*}

where \gamma \equiv 1/\sqrt{1-v^2} is the Lorentz factor (in units with c = 1 ). Then the tensor T^{\alpha'\beta'} in the boosted frame is

\begin{align*}
T^{0'0'} &= \Lambda^{0}{}_{0}\Lambda^{0}{}_{0}T^{00}+ \Lambda^{0}{}_{0}\Lambda^{0}{}_{j}T^{0j}+ \Lambda^{0}{}_{i}\Lambda^{0}{}_{0}T^{i0}+ \Lambda^{0}{}_{i}\Lambda^{0}{}_{j}T^{ij} \\
&= \gamma^2 \rho + \gamma^2 v^2 p \\
T^{0'b'} &= \Lambda^0{}_0\Lambda^{b'}{}_\beta T^{0\beta} + \Lambda^0{}_i\Lambda^{b'}{}_\beta T^{i\beta} \\
&= \gamma\Lambda^{b'}{}_\beta T^{0\beta} + \Lambda^0{}_i \Lambda^{b'}{}_\beta p\delta^{i\beta} \\
&= \gamma\Lambda^{b'}{}_0 T^{00} + p \Lambda^0{}_i \Lambda^{b'}{}_i \\
&= \rho\gamma^2v_{b'} + p (\gamma v_{i})(\delta^{b'}{}_{i}+(\gamma-1) v_{b'}v_{i}/ v^2) \\
&= \rho\gamma^2v_{b'} + p (\gamma v_{b'} + \gamma (\gamma -1 )v_{b'}) \\
&= \rho\gamma^2v_{b'} + p \gamma^2 v_{b'} \\
&= (\rho + p)\gamma^2v_{b'} \\
T^{a'b'} &= \Lambda^{a'}{}_{0}\Lambda^{b'}{}_{0}T^{00}+ \Lambda^{a'}{}_{0}\Lambda^{b'}{}_{j}T^{0j}+ \Lambda^{a'}{}_{i}\Lambda^{b'}{}_{0}T^{i0}+ \Lambda^{a'}{}_{i}\Lambda^{b'}{}_{j}T^{ij} \\
&= \rho\Lambda^{a'}{}_{0}\Lambda^{b'}{}_{0}+ p\Lambda^{a'}{}_{i}\Lambda^{b'}{}_{j}\delta^{ij} \\
&= \rho\gamma^2v_{a'}v_{b'}+ p(\delta^{a'}{}_{i}+(\gamma-1) v_{a'}v_{i}/ v^2)(\delta^{b'}{}_{i}+(\gamma-1) v_{b'}v_{i}/ v^2) \\
&= \rho\gamma^2v_{a'}v_{b'}+ p(\delta^{a'b'}+2(\gamma-1)v_{a'}v_{b'}/v^2 + (\gamma-1)^2v_{a'}v_{b'}/v^2) \\
&= \rho\gamma^2v_{a'}v_{b'}+ p(\delta^{a'b'}+(\gamma^2-1)v_{a'}v_{b'}/v^2) \\
&= \rho\gamma^2v_{a'}v_{b'}+ p(\delta^{a'b'}+\gamma^2v_{a'}v_{b'}) \quad\quad \text{ since } (\gamma^2 -1) / v^2 = \gamma^2 \\
&= (\rho + p)\gamma^2v_{a'}v_{b'}+ p\delta^{a'b'} \quad.\\
\end{align*}

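Combine the equations, using the four-velocity u^\alpha = \gamma(1, \mathbf v) and the metric \eta^{\alpha\beta} = \mathrm{diag}(-1, 1, 1, 1) :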

T^{\alpha\beta} = p \eta ^{\alpha\beta} + (\rho + p)u^\alpha u^\beta\quad.
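The covariant form can be checked numerically against the explicit boost. This is only a sanity-check sketch in plain Python, assuming units with c = 1 , the metric \eta = \mathrm{diag}(-1, 1, 1, 1) and u^\alpha = \gamma(1, \mathbf v) ; the sample values of \rho , p and \mathbf v are arbitrary, and the function names are hypothetical.

```python
import math

def boost(v):
    """Boost matrix Lambda^{alpha'}_alpha for 3-velocity v (units with c = 1)."""
    v2 = sum(c * c for c in v)
    gamma = 1.0 / math.sqrt(1.0 - v2)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0] = gamma
    for i in range(3):
        L[0][i + 1] = L[i + 1][0] = gamma * v[i]
        for j in range(3):
            L[i + 1][j + 1] = (1.0 if i == j else 0.0) + (gamma - 1.0) * v[i] * v[j] / v2
    return L, gamma

def boosted_tensor(rho, p, v):
    """Lambda Lambda T_rest, with T_rest = diag(rho, p, p, p)."""
    L, _ = boost(v)
    T_rest = [rho, p, p, p]
    return [[sum(L[a][c] * L[b][c] * T_rest[c] for c in range(4)) for b in range(4)]
            for a in range(4)]

def covariant_form(rho, p, v):
    """p * eta + (rho + p) * u u, with eta = diag(-1, 1, 1, 1), u = gamma * (1, v)."""
    _, gamma = boost(v)
    u = [gamma] + [gamma * c for c in v]
    eta = [-1.0, 1.0, 1.0, 1.0]
    return [[(p * eta[a] if a == b else 0.0) + (rho + p) * u[a] * u[b] for b in range(4)]
            for a in range(4)]

rho, p, v = 2.0, 0.5, (0.3, -0.2, 0.4)
A = boosted_tensor(rho, p, v)
B = covariant_form(rho, p, v)
max_diff = max(abs(A[a][b] - B[a][b]) for a in range(4) for b in range(4))
print(max_diff)  # largest entrywise difference between the two forms
```

The difference should be at the level of floating-point rounding, which confirms the component-by-component algebra above for this sample velocity.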

To get the equation of motion, we can use the energy-momentum conservation law:

\begin{align*} \partial_\alpha T^{\alpha\beta} &= \partial_\alpha (p \eta ^{\alpha\beta}) + \partial_\alpha \left((\rho + p)u^\alpha u^\beta\right) \\ &= (-\dot p, \nabla p)^\beta + (\dot\rho + \dot p, \nabla \rho + \nabla p)_\alpha u^\alpha u^\beta + (\rho + p) \partial_\alpha(u^\alpha u^\beta) \\ &= 0 \quad. \end{align*}

For \beta=0 , neglecting derivatives of \gamma (so that \partial_\alpha(u^\alpha u^0) \approx \gamma^2 \nabla\cdot\mathbf v ) and using (\gamma^2-1)/\gamma^2 = v^2 ,

\begin{align*} \partial_\alpha T^{\alpha0} &= -\dot p + \gamma^2(\dot\rho + \dot p + (\mathbf v \cdot \nabla) (\rho + p)) + \gamma^2(\rho + p) \nabla \cdot \mathbf v \\ &=(\gamma^2-1)\dot p + \gamma^2(\dot\rho + (\mathbf v \cdot \nabla) (\rho + p)) + \gamma^2(\rho + p) \nabla \cdot \mathbf v \\ &= 0 \\ v^2\dot p + \dot\rho &= -(\mathbf v \cdot \nabla) (\rho + p) - (\rho + p) \nabla \cdot \mathbf v \\ &= -\nabla \cdot ((\rho + p)\mathbf v) \end{align*}