MindMaker Blueprint Functions Overview
MindMaker has two primary components: an executable learning engine, and a set of blueprint nodes that interface with the learning engine and pass information back and forth to UE via the plugin. This article covers the usage of the MindMaker blueprint assets. There are two default agent classes for use with MindMaker: one for a third-person NPC and the other for a generic Actor object such as a sphere or a cube.
Assets for both use cases are located in the MindMakerStarterContent directory, in the MindMakerAIControlerBP and MindMakerActorBP blueprints respectively. MindMakerAIControlerBP is used for training NPCs, and MindMakerActorBP is used for endowing UE objects with intelligent, self-learning capabilities. In both cases, the MindMaker backend executable is accessed via the LaunchMindMaker node and connects to the blueprints over a SocketIO connection (the SocketIO plugin is included). To use MindMakerAIControlerBP for a given AI character, start with any third-person character mesh, and then, under the Pawn properties of the mesh, set the AI Controller Class to MindMakerAIControlerBP. This makes your mesh controllable by MindMaker. Next we will cover the individual parameters of the LaunchMindMaker blueprint node, which is the main component of the MindMaker plugin.
LaunchMindMaker Blueprint Function
RL Algorithm: This is where you select the RL algorithm used to train the agent. There are ten options in the drop-down menu, each with its own pros and cons. A detailed discussion of the relevant algorithms and their use cases can be found here: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Num Train EP: This is an integer input representing the number of training episodes you want the agent to undertake. The larger the number of training episodes, the more exploration the agent does before transitioning to the strategic behavior it acquires during training. The complexity of the actions the agent is attempting to learn typically determines the number of training episodes required: more complex strategies and behaviors require more training episodes.
Num Eval EP: This is also an integer input; it represents the number of evaluation episodes the agent will undergo after training. These are the episodes in which the agent demonstrates its learned behavior.
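The split between training and evaluation episodes can be illustrated with a minimal sketch. Everything here is hypothetical: the two-action task, the `REWARDS` table and the `run_agent` function are invented for illustration, and the random-exploration scheme is not one of the algorithms MindMaker offers. It only shows the idea that the agent explores during training and then repeats what it learned during evaluation.

```python
import random

# Hypothetical two-action task: action 1 always pays more than action 0.
REWARDS = [0.2, 1.0]


def run_agent(num_train_ep: int, num_eval_ep: int, seed: int = 0) -> list:
    """Explore randomly for num_train_ep episodes, then act greedily for num_eval_ep."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]  # summed reward per action
    counts = [0, 0]      # number of times each action was tried

    # Training: random exploration, recording what each action returned.
    for _ in range(num_train_ep):
        action = rng.randrange(2)
        totals[action] += REWARDS[action]
        counts[action] += 1

    # Average reward seen per action; untried actions default to 0.
    estimates = [t / c if c else 0.0 for t, c in zip(totals, counts)]

    # Evaluation: no more exploration, always take the best-looking action.
    best = max(range(2), key=lambda a: estimates[a])
    return [best for _ in range(num_eval_ep)]


print(run_agent(num_train_ep=50, num_eval_ep=5))
```

With no training episodes at all, the estimates stay at zero and the agent has nothing to base its evaluation behavior on, which is why harder tasks call for a larger Num Train EP.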
Continuous Action Space: This is a Boolean input that determines whether the agent uses a continuous action space. A continuous action space is one in which there are an infinite number of actions the agent can take. For example, if the agent is learning to steer a car and the range of angles over which the steering column can turn is a decimal value between 0 and 180, then there are infinitely many values within that range, such as 0.12 and 145.774454. Decide at the outset whether your agent has an infinite or a finite number of actions available. The action space must be either continuous or discrete; it cannot be both.
Discrete Action Space: This is a Boolean input that determines whether the agent uses a discrete action space. A discrete action space is one in which there are a finite number of actions the agent can take. For example, if the AI can only move right one space or left one space, it has only two actions available and the action space is discrete. Decide which kind of action space the agent will use before using MindMaker, and set these two values accordingly.
Action Space Shape: This defines the lower and upper boundaries of the actions available to the agent. If you are using a discrete action space, then this is simply the total number of actions available to the agent, for instance 2 or 8. If you are using a continuous action space, things are more complicated and you must define the low and high boundaries of the action space separately. The format for doing so is as follows: low= lowboundary, high= highboundary,shape=(1,) Here, lowboundary is a value such as -100.4 and highboundary is a value such as 298.46. All decimal values between these bounds then represent actions available to the agent. If you had an array of such actions, you could change the shape portion to reflect this.
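As a rough illustration of the two formats just described, the following sketch builds the text you would type into the Action Space Shape input. The helper names `discrete_action_space` and `continuous_action_space` are hypothetical and not part of MindMaker; the continuous output copies the spacing of the format quoted above, since the source does not say whether spacing matters.

```python
def discrete_action_space(num_actions: int) -> str:
    """Return the entry for a discrete space: just the number of actions."""
    if num_actions < 1:
        raise ValueError("a discrete action space needs at least one action")
    return str(num_actions)


def continuous_action_space(low: float, high: float, size: int = 1) -> str:
    """Return the entry for a continuous space in the low/high/shape format."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    # size > 1 corresponds to an array of actions sharing the same bounds.
    return f"low= {low}, high= {high},shape=({size},)"


print(discrete_action_space(8))
print(continuous_action_space(-100.4, 298.46))
print(continuous_action_space(0.0, 180.0, size=3))
```

The last call assumes that an array of three actions is expressed by changing the shape portion to (3,), which is one reading of the final sentence above.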
Observation Space Shape: Properly speaking, this input is a Python expression derived from the OpenAI Gym custom environment class; it defines the lower and upper boundaries of the observations available to the agent after it takes an action. The format for doing so is as follows: low=np.array([lowboundary]), high=np.array([highboundary]),dtype=np.float32. Imagine an agent that needs to take three specific actions in a row to receive a reward. Its observation space would need to include those three actions, each represented by a unique observation. The array of observations would therefore have to include three different values, each with its own boundaries. For example, such an observation space might be defined as low=np.array([0,0,0]), high=np.array([100,100,100]),dtype=np.float32 if each of the actions the agent needs to observe is a value between 0 and 100. A rule of thumb is that if a value is part of the reward function for the agent, i.e. the agent's behavior is only rewarded if some condition is met, then the observation space must include a reference to that value. If five conditions must be met for an agent to be rewarded, then each of these five conditions must be part of the agent's observation space.
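A minimal sketch of how such a string could be assembled from one (low, high) pair per observed value is shown below. The `observation_space` helper is hypothetical and only produces text in the format quoted above; it does not import NumPy or talk to MindMaker.

```python
from typing import Sequence, Tuple


def observation_space(bounds: Sequence[Tuple[float, float]]) -> str:
    """Build the Observation Space Shape text from (low, high) pairs."""
    if not bounds:
        raise ValueError("at least one observation is required")
    for low, high in bounds:
        if low > high:
            raise ValueError(f"low {low} is greater than high {high}")
    lows = ",".join(str(low) for low, _ in bounds)
    highs = ",".join(str(high) for _, high in bounds)
    return f"low=np.array([{lows}]), high=np.array([{highs}]),dtype=np.float32"


# Three observed values, each between 0 and 100, as in the example above.
print(observation_space([(0, 100), (0, 100), (0, 100)]))
```

Following the rule of thumb above, an agent whose reward depends on five conditions would pass five pairs, one per condition.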
Load Pre Trained Model: This is a Boolean value that determines whether the agent loads pre-trained behavior that was previously saved. If you set this to true, specify the name of the file in the Save/Load Model Name input box. All models are saved by default to the AppData Roaming directory of the computer, for instance C:\Users\username\AppData\Roaming
Save Model After Training: This is a Boolean value that determines whether the agent saves the behavior it has learned after training. If you set this to true, specify the name of the file in the Save/Load Model Name input box. All models are saved by default to the AppData Roaming directory of the computer, for instance C:\Users\username\AppData\Roaming
Save/Load Model Name: This is a string representing the name of the model you wish to save or load. Files are saved to the AppData Roaming directory of the computer, for instance C:\Users\username\AppData\Roaming
Use Custom Params: This is a Boolean value that determines whether you use the stock version of the algorithm you have selected or modify its parameters. Custom parameters are set via the custom parameters structure variables. If you click on one of them, for instance A2Cparams, you will see all the values that can be set within that structure. A detailed breakdown of the parameters for each algorithm can be found here: https://stable-baselines.readthedocs.io/en/master/
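The constraints scattered through the parameter descriptions above can be collected into one place. The following is a sketch only: `LaunchSettings` is a hypothetical Python mirror of the LaunchMindMaker inputs, written to make the rules explicit, and it is not how the blueprint node itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class LaunchSettings:
    """Hypothetical mirror of the LaunchMindMaker inputs described above."""
    rl_algorithm: str
    num_train_ep: int
    num_eval_ep: int
    continuous_action_space: bool
    discrete_action_space: bool
    action_space_shape: str
    observation_space_shape: str
    load_pre_trained_model: bool = False
    save_model_after_training: bool = False
    save_load_model_name: str = ""
    use_custom_params: bool = False

    def problems(self) -> list:
        """Return human-readable problems; an empty list means no conflicts."""
        found = []
        # The action space must be continuous or discrete, never both or neither.
        if self.continuous_action_space == self.discrete_action_space:
            found.append("set exactly one of continuous/discrete action space")
        if self.num_train_ep < 0 or self.num_eval_ep < 0:
            found.append("episode counts cannot be negative")
        # Loading or saving a model requires a file name.
        needs_name = self.load_pre_trained_model or self.save_model_after_training
        if needs_name and not self.save_load_model_name.strip():
            found.append("a model name is required when loading or saving")
        return found


settings = LaunchSettings(
    rl_algorithm="A2C",
    num_train_ep=1000,
    num_eval_ep=100,
    continuous_action_space=False,
    discrete_action_space=True,
    action_space_shape="8",
    observation_space_shape="low=np.array([0]), high=np.array([100]),dtype=np.float32",
    save_model_after_training=True,
)
print(settings.problems())
```

In this example the only reported problem is the missing model name, because Save Model After Training is true while Save/Load Model Name is empty.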
Other Blueprint Functions
A sample of functions from the example project is presented below to show how information is passed between MindMaker and Unreal Engine. All of the UE assets relevant to the toy problem are contained in the Assets/DeeplearningNPC folder. Of particular importance is the blueprint called AI_Character_Controler_BP. In this blueprint, all of the environment variables are configured for passing to the MindMaker standalone application. These include the following essential functions:
Load Sensory Input function: Imports the objects to which the AI will have access for sensing or manipulating its environment.
Environmental Controls function: Controls the logic for parts of the environment that change, such as switching lights on and off.
Define Action Space function: Encodes all possible agent actions into a single numeric value that can be passed to the MindMaker executable for evaluation by the RL algorithm.
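One common way to encode several independent choices into a single numeric value is a mixed-radix index, sketched below. This is an assumption about how such an encoding could work, not a description of the blueprint's actual logic; the function names and the example of four movement directions plus a two-state light switch are hypothetical.

```python
from typing import List, Sequence


def encode_action(choices: Sequence[int], sizes: Sequence[int]) -> int:
    """Pack one choice per action component into a single integer."""
    if len(choices) != len(sizes):
        raise ValueError("choices and sizes must have the same length")
    index = 0
    for choice, size in zip(choices, sizes):
        if not 0 <= choice < size:
            raise ValueError(f"choice {choice} is outside 0..{size - 1}")
        index = index * size + choice
    return index


def decode_action(index: int, sizes: Sequence[int]) -> List[int]:
    """Recover the per-component choices from the single integer."""
    choices = []
    for size in reversed(sizes):
        index, choice = divmod(index, size)
        choices.append(choice)
    return list(reversed(choices))


# Hypothetical agent: 4 movement directions and a light switch with 2 states,
# giving 4 * 2 = 8 discrete actions in total.
SIZES = (4, 2)
print(encode_action((3, 1), SIZES))  # 3 * 2 + 1 = 7
print(decode_action(7, SIZES))       # [3, 1]
```

With this scheme the Action Space Shape for a discrete agent is the product of the component sizes, 8 in this example, and the value received back from the learning engine can be decoded into its component choices.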
LaunchMindMaker function: This calls the standalone application at the start of play so that it can begin evaluating data from the UE environment. Once initiated, the RL application begins probing the environment with random actions it generates itself, like a blind person searching in the dark for a light. The light is the reward, which is specified in the UE function Check Reward. The function also passes some basic UE environment information to the standalone application, such as the number of actions the agent can take, the total number of episodes to train for, and the number of episodes for which to display the agent's acquired strategy after training. Displaying all of the agent's random training would take far too long.
ReceiveAction function: After the LaunchMindMaker function has begun, the next function to fire is ReceiveAction. This receives the action chosen by the standalone application and performs a number of follow-up procedures with it, such as updating the agent's location in the environment, checking whether the new action satisfies the reward condition, displaying the agent's actions if training is finished, and updating the agent's observations about its environment so that they can be passed back to the standalone application in the next episode.
Make Observations function: The purpose of this is to update the agent's observations about its environment following the action it has just taken. These include, for instance, the agent's location within the environment and any other environmental data that has changed since it last took an action. These are stored in a custom structure variable.
CheckReward function: This specifies the reward condition for the agent in the environment. If this reward condition is met after the agent takes an action, this information is passed to the standalone application in the Send Observations function that follows.
Send Observations function: Takes the new observations made by the agent, as well as any reward information, and passes them to the standalone application. This is how the RL algorithm is able to evaluate whether the action it has just taken was a good one and update its strategy accordingly. After this function fires, one iteration or episode of the game is complete, and the process repeats ad infinitum.
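The order in which these functions fire can be sketched as a plain Python loop. This is a toy simulation under stated assumptions: the environment is a hypothetical line of cells with a reward at the far end, the random choice stands in for the action chosen by the standalone application, and appending to a list stands in for the SocketIO message. None of the function bodies reflect the actual blueprint implementation.

```python
import random

GOAL = 4  # hypothetical reward location on a line of cells 0..4


def receive_action(position: int, action: int) -> int:
    """Apply the chosen action: 1 moves right, anything else moves left."""
    step = 1 if action == 1 else -1
    return max(0, min(GOAL, position + step))  # stay on the line


def make_observations(position: int) -> dict:
    """Collect what the agent can observe after acting."""
    return {"position": position}


def check_reward(observations: dict) -> int:
    """Return 1 when the reward condition is met, otherwise 0."""
    return 1 if observations["position"] == GOAL else 0


def send_observations(observations: dict, reward: int, log: list) -> None:
    """Stand-in for the message to the learning engine; only records it."""
    log.append({"observations": observations, "reward": reward})


def run(steps: int, seed: int = 0) -> list:
    """Run the receive, observe, reward, send cycle a fixed number of times."""
    rng = random.Random(seed)
    position, log = 0, []
    for _ in range(steps):
        action = rng.choice([0, 1])  # stand-in for the engine's chosen action
        position = receive_action(position, action)
        observations = make_observations(position)
        reward = check_reward(observations)
        send_observations(observations, reward, log)
    return log


for message in run(steps=5):
    print(message)
```

Each pass through the loop corresponds to one iteration as described above: an action arrives, the environment is updated, the reward condition is checked, and the observations and reward are sent back.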