Accelerating math operations.

Nov 8, 2012 at 5:40 AM

Hi, I got this idea to put a BOOL switch inside your Vector#f and Matrix#f objects. The BOOL will be either LOAD or STORE, so you can do this: Vector3f.SIMD_State(LOAD); and it allocates a 16-byte aligned XMVECTOR. You call Vector3f.SIMD_State(LOAD) once, then apply all the other methods to it, like Vector3f.Normalize() and so on, and inside these methods it calls xmmath functions. Then you do Vector3f.SIMD_State(STORE) and it stores the result back into an XMFLOAT3 and deallocates the vector to save memory (or you can just leave it sitting there). I can also add a bool that checks whether to use any SIMD functionality at all or just the linear math it currently has, since for smaller calculations the loading and storing causes too much overhead. I'll give it a try when I can and post it if I have success.
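
The load once, operate, store once flow described above could look roughly like this. It is only a minimal sketch in portable C++: Float3 and Packed4 are hypothetical stand-ins for XMFLOAT3 and XMVECTOR, and Load, Store, Scale, Normalize3 and NormalizeAndScale are made-up names, not part of Hieroglyph or DirectXMath.

```cpp
#include <cmath>

// Hypothetical stand-in for XMFLOAT3: plain, unaligned storage.
struct Float3 { float x, y, z; };

// Hypothetical stand-in for XMVECTOR: four floats, 16-byte aligned.
struct alignas(16) Packed4 { float v[4]; };

// Pay the conversion cost once on the way in...
inline Packed4 Load(const Float3& f) { return Packed4{ { f.x, f.y, f.z, 0.0f } }; }

// ...and once on the way out.
inline Float3 Store(const Packed4& p) { return Float3{ p.v[0], p.v[1], p.v[2] }; }

// Operations work on the packed form only, so they can be chained
// without converting back and forth between calls.
inline Packed4 Scale(const Packed4& p, float s)
{
    Packed4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = p.v[i] * s;
    return r;
}

inline Packed4 Normalize3(const Packed4& p)
{
    const float lenSq = p.v[0] * p.v[0] + p.v[1] * p.v[1] + p.v[2] * p.v[2];
    if (lenSq == 0.0f) return p;   // leave a zero vector untouched
    return Scale(p, 1.0f / std::sqrt(lenSq));
}

// Load once, chain several operations, store once.
inline Float3 NormalizeAndScale(const Float3& in, float s)
{
    Packed4 p = Load(in);
    p = Normalize3(p);
    p = Scale(p, s);
    return Store(p);
}
```

The point of the pattern is that the conversion cost is paid twice per chain of operations rather than twice per operation, which is why it only pays off when several operations run between the load and the store.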

Coordinator
Nov 10, 2012 at 5:57 PM

Hello,

I 100% support and encourage experimentation with the code base.  Several other people have wondered about the lack of SIMD math classes in the engine, but I have kept it that way to minimize the amount of maintenance that has to be done as the various math libraries come and go.  So I would be interested to hear your results, but I think overall there would have to be a massive performance increase to justify overhauling all of the vector/matrix classes...

Looking forward to hearing how you make out!

- Jason

Nov 12, 2012 at 12:16 AM
Edited Nov 12, 2012 at 2:27 AM

I decided to build it with unions. In the end it will take more RAM than your current method, so it's all wrapped in an #if defined _SIMD_MATH_ block, and the user can choose to enable it only for heavy SIMD math applications, like heavy physics. I'm aiming to change the vector and matrix classes to work with it without having to alter any other class in the program, so it all stays clean.

Nov 12, 2012 at 2:33 AM
Edited Nov 12, 2012 at 2:40 AM

Here is a rough example of what I meant (I wrote it in a hurry and didn't try to compile it, so it might be erroneous). I know it could use some more organization, and I should probably change the xyzw[4] array into float x, y, z, w, since the rest of your program uses myVec.x etc. That way the outside program will never care what happens inside it. It is in fact heavier than the normal linear version; however, there is one thing in DirectXMath that really shines: the transform coord stream functions. Then again, you would rarely use those for anything :D



SIMD_MATH.h 

#ifndef _SIMD_MATH_H_
#define _SIMD_MATH_H_

#define _SIMD_MATH_

#include <DirectXMath.h>
#include <Windows.h>

#if !defined(XM_NO_ALIGNMENT)
#define _DECLSPEC_ALIGN_16_   __declspec(align(16))
#else
#define _DECLSPEC_ALIGN_16_
#endif

#if !defined(XMASSERT)
#if defined(_PREFAST_)
#define XMASSERT(Expression) __analysis_assume((Expression))
#elif defined(XMDEBUG)
#define XMASSERT(Expression) ((VOID)((Expression) || (XMAssert(#Expression, __FILE__, __LINE__), 0)))
#else 
#define XMASSERT(Expression) ((VOID)0)
#endif 
#endif 

#define INTERSECTION_NONE 0
#define INTERSECTION_PARTIAL 1
#define INTERSECTION_FULL 2

using namespace DirectX;

enum VECTORTYPEDIMENSIONS
{
	FLOAT2D, 
	FLOAT3D, 
	FLOAT4D,
};

enum MATRIXTYPEDIMENSIONS
{
	FLOAT3X3D,
	FLOAT4X4D
};

enum DATASTATE
{
	LOADED,
	STORED
};

namespace Hieroglyph3
{
#if defined(_SIMD_MATH_)

	struct VectorDataContainer
	{
		union data
		{
			data()
			{
				SecureZeroMemory( &xyzw, sizeof( xyzw ) );
				SecureZeroMemory( &xyzwSIMD, sizeof( xyzwSIMD ) );
			}
			float xyzw[4];
			_DECLSPEC_ALIGN_16_ XMVECTOR xyzwSIMD;
		}data;
		VECTORTYPEDIMENSIONS dataType;
	};

	struct MatrixDataContainer
	{
		union data
		{
			data()
			{
				SecureZeroMemory( &entry2x2, sizeof( entry2x2 ) );
				SecureZeroMemory( &entry3x3, sizeof( entry3x3 ) );
				SecureZeroMemory( &entry4x4, sizeof( entry4x4 ) );
				SecureZeroMemory( &entry4x4SIMD, sizeof( entry4x4SIMD ) );
			}
			float entry2x2[2*2];
			float entry3x3[3*3];
			float entry4x4[4*4];
			_DECLSPEC_ALIGN_16_ XMVECTOR entry4x4SIMD[4];
		}data;
		MATRIXTYPEDIMENSIONS dataType;
	};

#define XY VectorDataContainer dataContainer; DATASTATE state
#define XYZ VectorDataContainer dataContainer; DATASTATE state
#define XYZW VectorDataContainer dataContainer; DATASTATE state
#define M2X2 MatrixDataContainer dataContainer; DATASTATE state
#define M3X3 MatrixDataContainer dataContainer; DATASTATE state
#define M4X4 MatrixDataContainer dataContainer; DATASTATE state

#else

#define XY float x, y
#define XYZ float x, y, z
#define XYZW float x, y, z, w
#define M2X2 float entry2x2[2*2]
#define M3X3 float entry3x3[3*3]
#define M4X4 float entry4x4[4*4]

#endif
};

#endif

 


Vector2f.h

#ifndef _VECTOR_2_F_H_
#define _VECTOR_2_F_H_

#include "SIMD_Math.h"

namespace Hieroglyph3
{
	class Vector2f
	{
	public:
		Vector2f( );
		Vector2f( float x, float y );
		Vector2f( const Vector2f& vector );

		void Clamp( );
		void MakeZero( );
		void Normalize( );
		float Magnitude( );

		Vector2f& operator= ( const Vector2f& vector );

		float operator[] ( int index ) const;
		float& operator[] ( int index );

		bool operator== ( const Vector2f& vector ) const;
		bool operator!= ( const Vector2f& vector ) const;

		Vector2f operator+ ( const Vector2f& vector ) const;
		Vector2f operator- ( const Vector2f& vector ) const;
		Vector2f operator* ( float scalar ) const;
		Vector2f operator/ ( float scalar ) const;
		Vector2f operator- ( ) const;

		Vector2f& operator+= ( const Vector2f& vector );
		Vector2f& operator-= ( const Vector2f& vector );
		Vector2f& operator*= ( float scalar );
		Vector2f& operator/= ( float scalar );

		#if defined _SIMD_MATH_
		void Load();
		void Store();
		#endif
	public:
		XY;
	};
};

#endif

 

 

Vector2f.cpp

#include "PCH.h"
#include "Vector2f.h"

using namespace Hieroglyph3;

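Vector2f::Vector2f( )
{
#if defined _SIMD_MATH_
	// The union constructor zeroes the data; start in the scalar (STORED) state.
	dataContainer.dataType = FLOAT2D;
	state = STORED;
#endif
}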

Vector2f::Vector2f( float x, float y )
{
#if defined _SIMD_MATH_
	dataContainer.data.xyzw[0] = x;
	dataContainer.data.xyzw[1] = y;
	dataContainer.data.xyzw[2] = 1.0f;
	dataContainer.data.xyzw[3] = 1.0f;
	dataContainer.dataType = FLOAT2D;

	state = STORED;
#else
	this->x = x;
	this->y = y;
#endif
}

Vector2f::Vector2f( const Vector2f& vector )
{
#if defined _SIMD_MATH_
	if( vector.state == STORED )
	{
		dataContainer.data.xyzw[0] = vector.dataContainer.data.xyzw[0];
		dataContainer.data.xyzw[1] = vector.dataContainer.data.xyzw[1];
		dataContainer.data.xyzw[2] = 1.0f;
		dataContainer.data.xyzw[3] = 1.0f;
		dataContainer.dataType = FLOAT2D;

		state = STORED;
	}
	else
	{
		dataContainer.data.xyzwSIMD = vector.dataContainer.data.xyzwSIMD;
		dataContainer.dataType = FLOAT2D;

		state = LOADED;
	}
#else
	this->x = vector.x;
	this->y = vector.y;
#endif
}

Vector2f& Vector2f::operator= ( const Vector2f& vector )
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) )
	{
		dataContainer.data.xyzw[0] = vector.dataContainer.data.xyzw[0];
		dataContainer.data.xyzw[1] = vector.dataContainer.data.xyzw[1];

	    return( *this );
	}
	else if(  (state == LOADED) && (vector.state == LOADED) )
	{
		dataContainer.data.xyzwSIMD = vector.dataContainer.data.xyzwSIMD;

	    return( *this );
	}
	else
	{
		return( *this );
	}
#else
	this->x = vector.x;
	this->y = vector.y;

    return( *this );
#endif
}

void Vector2f::MakeZero( )
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		dataContainer.data.xyzw[0] = 0.0f;
		dataContainer.data.xyzw[1] = 0.0f;
	}
	else
	{
		ZeroMemory( &dataContainer.data.xyzwSIMD, sizeof( dataContainer.data.xyzwSIMD ) );
	}
#else
	this->x = 0.0f;
	this->y = 0.0f;
#endif
}

void Vector2f::Normalize( )
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		float inverseMagnitude = ( 1.0f / Magnitude() );

		dataContainer.data.xyzw[0] *= inverseMagnitude;
		dataContainer.data.xyzw[1] *= inverseMagnitude;
	}
	else
	{
		dataContainer.data.xyzwSIMD = XMVector2Normalize( dataContainer.data.xyzwSIMD );
	}
#else
	float inverseMagnitude = ( 1.0f / Magnitude() );

	this->x *= inverseMagnitude;
	this->y *= inverseMagnitude;
#endif
}

float Vector2f::Magnitude( )
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
	float length = 0.0f;

	length += dataContainer.data.xyzw[0] * dataContainer.data.xyzw[0];
	length += dataContainer.data.xyzw[1] * dataContainer.data.xyzw[1];

	return( sqrtf( length ) );
	}
	else
	{
		// XMVector2Length replicates the length into every component; read it from x.
		// (The squared length plus a 16-byte memcpy into a float was wrong and overran the float.)
		return( XMVectorGetX( XMVector2Length( dataContainer.data.xyzwSIMD ) ) );
	}
#else
	float length = 0.0f;

	length += this->x * this->x;
	length += this->y * this->y;

	return( sqrtf( length ) );
#endif
}

void Vector2f::Clamp()
{
#if defined _SIMD_MATH_
	if( state == LOADED )
	{
		// Clamp each component to [0, 1], matching the scalar path below.
		dataContainer.data.xyzwSIMD = XMVectorSaturate( dataContainer.data.xyzwSIMD );
	}
	else
	{
	if ( dataContainer.data.xyzw[0] > 1.0f ) dataContainer.data.xyzw[0] = 1.0f;
	if ( dataContainer.data.xyzw[0] < 0.0f ) dataContainer.data.xyzw[0] = 0.0f;

	if ( dataContainer.data.xyzw[1] > 1.0f ) dataContainer.data.xyzw[1] = 1.0f;
	if ( dataContainer.data.xyzw[1] < 0.0f ) dataContainer.data.xyzw[1] = 0.0f;
	}
#else
	if ( this->x > 1.0f ) this->x = 1.0f;
	if ( this->x < 0.0f ) this->x = 0.0f;

	if ( this->y > 1.0f ) this->y = 1.0f;
	if ( this->y < 0.0f ) this->y = 0.0f;
#endif
}

float Vector2f::operator[] ( int index ) const
{
#if defined _SIMD_MATH_
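	if( state == STORED )
	{
		if ( index == 0 ) return( dataContainer.data.xyzw[0] );
		return( dataContainer.data.xyzw[1] );
	}
	// LOADED: read the component out of the SIMD register.
	return( ( index == 0 ) ? XMVectorGetX( dataContainer.data.xyzwSIMD ) : XMVectorGetY( dataContainer.data.xyzwSIMD ) );
#else
	if ( index == 0 ) return( this->x );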
	return( this->y );
#endif
}

float& Vector2f::operator[] ( int index )
{
#if defined _SIMD_MATH_
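	// xyzw and xyzwSIMD share storage through the union, so the float view is
	// returned in both states (this relies on the compiler allowing union type punning).
	if ( index == 0 ) return( dataContainer.data.xyzw[0] );
	return( dataContainer.data.xyzw[1] );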
#else
	if ( index == 0 ) return( this->x );
	return( this->y );
#endif
}

bool Vector2f::operator== ( const Vector2f& vector ) const
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) )
	{

		if ( ( dataContainer.data.xyzw[0] - vector.dataContainer.data.xyzw[0] ) * ( dataContainer.data.xyzw[0] - vector.dataContainer.data.xyzw[0] ) > 0.01f )
			return false;
		if ( ( dataContainer.data.xyzw[1] - vector.dataContainer.data.xyzw[1] ) * ( dataContainer.data.xyzw[1] - vector.dataContainer.data.xyzw[1] ) > 0.01f )
			return false;

		return( true );
	}
	else if(  (state == LOADED) && (vector.state == LOADED) )
	{
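		// Use the same 0.1 per-component tolerance as the scalar path instead of an exact compare.
		return XMVector2NearEqual( dataContainer.data.xyzwSIMD, vector.dataContainer.data.xyzwSIMD, XMVectorReplicate( 0.1f ) );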
	}
	else
	{
		return( false );
	}
#else
	if ( ( x - vector.x ) * ( x - vector.x ) > 0.01f )
		return false;
	if ( ( y - vector.y ) * ( y - vector.y ) > 0.01f )
		return false;

	return( true );
#endif
}

bool Vector2f::operator!= ( const Vector2f& vector ) const
{
    return( !( *this == vector ) );
}

Vector2f Vector2f::operator+ ( const Vector2f& vector ) const
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) )
	{
		Vector2f sum;
		sum.state = state;

		sum.dataContainer.data.xyzw[0] = dataContainer.data.xyzw[0] + vector.dataContainer.data.xyzw[0];
		sum.dataContainer.data.xyzw[1] = dataContainer.data.xyzw[1] + vector.dataContainer.data.xyzw[1];

		return( sum );
	}
	else if( (state == LOADED) && (vector.state == LOADED) )
	{
		Vector2f sum;
		sum.state = state;

		sum.dataContainer.data.xyzwSIMD = XMVectorAdd( dataContainer.data.xyzwSIMD, vector.dataContainer.data.xyzwSIMD );

		return( sum );
	}
	else
	{
		Vector2f sum;
		sum.state = state;

		return( sum );
	}
#else
	Vector2f sum;

	sum.x = x + vector.x;
	sum.y = y + vector.y;

	return( sum );
#endif
}

Vector2f Vector2f::operator- ( const Vector2f& vector ) const
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) )
	{
		Vector2f difference;
		difference.state = state;

		difference.dataContainer.data.xyzw[0] = dataContainer.data.xyzw[0] - vector.dataContainer.data.xyzw[0];
		difference.dataContainer.data.xyzw[1] = dataContainer.data.xyzw[1] - vector.dataContainer.data.xyzw[1];

		return( difference );
	}
	else if( (state == LOADED) && (vector.state == LOADED) )
	{
		Vector2f difference;
		difference.state = state;

		difference.dataContainer.data.xyzwSIMD = XMVectorSubtract( dataContainer.data.xyzwSIMD, vector.dataContainer.data.xyzwSIMD );

		return( difference );
	}
	else
	{
		Vector2f difference;
		difference.state = state;

		return( difference );
	}
#else
	Vector2f difference;

	difference.x = x - vector.x;
	difference.y = y - vector.y;

	return( difference );
#endif
}

Vector2f Vector2f::operator* ( float scalar ) const
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		Vector2f product;
		product.state = state;

		product.dataContainer.data.xyzw[0] = dataContainer.data.xyzw[0] * scalar;
		product.dataContainer.data.xyzw[1] = dataContainer.data.xyzw[1] * scalar;

		return( product );
	}
	else
	{
	Vector2f product;
	product.state = state;

	// Replicate the scalar into every lane; XMLoadFloat fills only x and zeroes the rest.
	XMVECTOR multiplierVec = XMVectorReplicate( scalar );

	product.dataContainer.data.xyzwSIMD = XMVectorMultiply( dataContainer.data.xyzwSIMD, multiplierVec );

	return ( product );
	}
#else
	Vector2f product;

	product.x = x * scalar;
	product.y = y * scalar;

	return( product );
#endif
}

Vector2f Vector2f::operator/ ( float scalar ) const
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		Vector2f quotient;
		quotient.state = state;

		if( scalar != 0.0f )
		{
			float inverseScalar = 1.0f / scalar;

			quotient.dataContainer.data.xyzw[0] = dataContainer.data.xyzw[0] * inverseScalar;
			quotient.dataContainer.data.xyzw[1] = dataContainer.data.xyzw[1] * inverseScalar;
		}
		else
		{
			quotient.MakeZero();
		}
		return( quotient );
	}
	else
	{
		Vector2f quotient;
		quotient.state = state;

		if( scalar != 0.0f )
		{
			// Replicate the scalar into every lane; XMLoadFloat would leave y, z and w dividing by zero.
			XMVECTOR dividerVec = XMVectorReplicate( scalar );

			quotient.dataContainer.data.xyzwSIMD = XMVectorDivide( dataContainer.data.xyzwSIMD, dividerVec );
		}
		else
		{
			quotient.MakeZero();
		}
		return ( quotient );
	}
#else
	Vector2f quotient;
	if ( scalar != 0.0f )
	{
		float inverseScalar = 1.0f / scalar;
		quotient.x = x * inverseScalar;
		quotient.y = y * inverseScalar;
	}
	else
	{
		quotient.MakeZero();
	}

	return( quotient );
#endif
}

Vector2f Vector2f::operator- ( ) const
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		Vector2f negative;
		negative.state = state;

		negative.dataContainer.data.xyzw[0] = -dataContainer.data.xyzw[0];
		negative.dataContainer.data.xyzw[1] = -dataContainer.data.xyzw[1];
		
		return( negative );
	}
	else
	{
		Vector2f negative;
		negative.state = state;

		negative.dataContainer.data.xyzwSIMD = XMVectorNegate( dataContainer.data.xyzwSIMD );

		return( negative );
	}
#else
	Vector2f negative;

	negative.x = -x;
	negative.y = -y;

	return( negative );
#endif
}

Vector2f& Vector2f::operator+= ( const Vector2f& vector )
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) )
	{
		dataContainer.data.xyzw[0] += vector.dataContainer.data.xyzw[0];
		dataContainer.data.xyzw[1] += vector.dataContainer.data.xyzw[1];

		return( *this );
	}
	else if( (state == LOADED) && (vector.state == LOADED) )
	{
		dataContainer.data.xyzwSIMD = XMVectorAdd( dataContainer.data.xyzwSIMD, vector.dataContainer.data.xyzwSIMD );

		return( *this );
	}
	else
	{
		return( *this );
	}
#else
	x += vector.x;
	y += vector.y;

	return( *this );
#endif
}

Vector2f& Vector2f::operator-= ( const Vector2f& vector )
{
#if defined _SIMD_MATH_
	if( (state == STORED) && (vector.state == STORED) ) 
	{
		dataContainer.data.xyzw[0] -= vector.dataContainer.data.xyzw[0];
		dataContainer.data.xyzw[1] -= vector.dataContainer.data.xyzw[1];

		return( *this );
	}
	else if( (state == LOADED) && (vector.state == LOADED) )
	{
		dataContainer.data.xyzwSIMD = XMVectorSubtract( dataContainer.data.xyzwSIMD, vector.dataContainer.data.xyzwSIMD );

		return( *this );
	}
	else
	{
		return( *this );
	}
#else
	x -= vector.x;
	y -= vector.y;

	return( *this );
#endif
}

Vector2f& Vector2f::operator*= ( float scalar )
{
#if defined _SIMD_MATH_
	if( state == STORED ) 
	{
		dataContainer.data.xyzw[0] *= scalar;
		dataContainer.data.xyzw[1] *= scalar;

		return( *this );
	}
	else
	{
		// Replicate the scalar into every lane; XMLoadFloat fills only x and zeroes the rest.
		XMVECTOR multiplierVec = XMVectorReplicate( scalar );

		dataContainer.data.xyzwSIMD = XMVectorMultiply( dataContainer.data.xyzwSIMD, multiplierVec );

		return( *this );
	}
#else
	x *= scalar;
	y *= scalar;

	return( *this );
#endif
}

Vector2f& Vector2f::operator/= ( float scalar )
{
#if defined _SIMD_MATH_
	if( state == STORED )
	{
		if( scalar != 0.0f )
		{
			float inverseScalar = 1.0f / scalar;

			dataContainer.data.xyzw[0] = dataContainer.data.xyzw[0] * inverseScalar;
			dataContainer.data.xyzw[1] = dataContainer.data.xyzw[1] * inverseScalar;
		}
		else
		{
			MakeZero();
		}
		return( *this );
	}
	else
	{
		if( scalar != 0.0f )
		{
			// Replicate the scalar into every lane; XMLoadFloat would leave y, z and w dividing by zero.
			XMVECTOR dividerVec = XMVectorReplicate( scalar );

			dataContainer.data.xyzwSIMD = XMVectorDivide( dataContainer.data.xyzwSIMD, dividerVec );
		}
		else
		{
			MakeZero();
		}
		return ( *this );
	}
#else
	if ( scalar != 0.0f )
	{
		float inverseScalar = 1.0f / scalar;	
		x *= inverseScalar;
		y *= inverseScalar;
	}
	else
	{
		MakeZero();
	}

	return( *this );
#endif
}

#if defined _SIMD_MATH_
void Vector2f::Load( )
{
	// Load from a named XMFLOAT2; taking the address of a temporary is non-standard.
	XMFLOAT2 source( dataContainer.data.xyzw[0], dataContainer.data.xyzw[1] );
	dataContainer.data.xyzwSIMD = XMLoadFloat2( &source );

	state = LOADED;
}

void Vector2f::Store( )
{
	XMFLOAT2 stored;
	XMStoreFloat2( &stored, dataContainer.data.xyzwSIMD );
	dataContainer.data.xyzw[0] = stored.x;
	dataContainer.data.xyzw[1] = stored.y;

	state = STORED;
}
#endif

Nov 12, 2012 at 4:57 PM

OK, I ran all sorts of tests on 1 GB vector arrays with both the normal math and the SIMD version. In tasks like clamping, the DirectXMath version performed almost 40% slower. I looked into its source: there is a huge number of XMVECTOR operations there, whereas with simple math it's 4 lines of code. For bigger stuff, though, DirectXMath wins. I'm now unsure whether I should go on with converting to full DirectXMath. I really like abstracting it, so the user doesn't have to care whether it's DirectXMath or any other math lib, but then again, currently (until MS decides to change it again) DirectXMath is the magnum opus.
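
For anyone who wants to repeat this kind of comparison, a timing harness could be sketched roughly as follows. This is not the code used for the numbers above; Float2, ClampScalar and TimePassMs are hypothetical names, and the scalar clamp simply mirrors the non-SIMD branch of Vector2f::Clamp().

```cpp
#include <chrono>
#include <vector>

// Hypothetical stand-in for the scalar Vector2f.
struct Float2 { float x, y; };

// Scalar component clamp to [0, 1], the same logic as the non-SIMD Clamp().
inline void ClampScalar(Float2& v)
{
    if (v.x > 1.0f) v.x = 1.0f;
    if (v.x < 0.0f) v.x = 0.0f;
    if (v.y > 1.0f) v.y = 1.0f;
    if (v.y < 0.0f) v.y = 0.0f;
}

// Times one pass of `op` over the whole array and returns milliseconds.
template <typename Op>
double TimePassMs(std::vector<Float2>& data, Op op)
{
    const auto start = std::chrono::steady_clock::now();
    for (Float2& v : data) op(v);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling TimePassMs(data, ClampScalar) and then the same call with a SIMD-backed clamp gives a like-for-like comparison, provided both runs use the same optimized build and the same input data. Results from a single pass will vary with cache state, so several runs are worth averaging.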

Editor
Nov 12, 2012 at 5:27 PM
Edited Nov 12, 2012 at 5:35 PM

I used the DirectXMath library a fair bit and decided I _really_ didn't like the API - i.e. converting scalar code over generally resulted in substantially uglier and longer code. It's also much nastier to debug because you lose visibility into the values stored in the registers. So I definitely understand the urge to wrap it / provide a swappable implementation. But then I had the eventual realization that doing so is probably solving the wrong problem.

The scalar approach is sufficient for many use cases (such as the Camera) and is substantially more convenient. Whereas to really benefit from SIMD typically requires one to make larger scale changes anyway and, more importantly, requires one to think in terms of solving a specific problem: e.g. instead of providing a fast operation in a generic math library, you're building an optimized frustum cull. Given that, why worry about wrapping? These are specialized cases you can solve as one-offs and you're likely not going to want some non-optimal wrapped API there anyway.
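
To make the "solve a specific problem" point concrete, a batched sphere-vs-frustum cull might be laid out roughly like this. It is a sketch only, written in plain C++ so that the data layout is the focus; Plane, SphereBatch and CullSpheres are hypothetical names, and the inner loop is the part one would hand to the compiler's vectorizer, to intrinsics, or to ISPC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Plane in the form n.p + d = 0, with the normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// Structure-of-arrays layout: each component is contiguous, so the inner
// loop walks memory linearly and maps naturally onto SIMD lanes.
struct SphereBatch
{
    std::vector<float> x, y, z, radius;
};

// Sets visible[i] to 1 if sphere i is inside or touching all six planes,
// and to 0 if it lies completely behind any one of them.
inline void CullSpheres(const SphereBatch& s, const Plane (&planes)[6],
                        std::vector<std::uint8_t>& visible)
{
    const std::size_t count = s.x.size();
    visible.assign(count, 1);
    for (const Plane& p : planes)
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            // Signed distance from the sphere center to the plane.
            const float dist = p.nx * s.x[i] + p.ny * s.y[i] + p.nz * s.z[i] + p.d;
            // Branch-free update: fully behind the plane clears the flag.
            visible[i] &= static_cast<std::uint8_t>(dist >= -s.radius[i]);
        }
    }
}
```

Nothing here goes through a generic vector class: the batch owns its own layout, and the generic scalar Vector3f stays as it is for everything else.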

Also, once you're thinking in terms of implementing "subsystems" rather than a generic API via SIMD, it makes a lot of sense to consider the following tool for the job, because it gets you back to a far more pleasant programming experience (it's no longer C/C++, but you can create libraries callable from those languages with no overhead):

http://ispc.github.com/

I've only started playing with it, but it actually works, and besides the obvious productivity benefits, you get to produce e.g. AVX/2 versions "for free".

Just $0.01 from someone admittedly not doing AAA games...

Jason