Version 1 of this project was presented at the MathWorks Automotive Conference in 2025.
As a Controls Modeling and Analysis Lead Engineer at Nikola, I was responsible for controls logic validation and verification using model-in-the-loop (MIL) and hardware-in-the-loop (HIL) testing. As the MIL effort wound down due to constrained resources, more focus was given to HIL testing as the means of validating controls logic. However, HIL testing, as I inherited it, was inefficient and induced friction in new test creation. I created a framework to alleviate these deficiencies and to use our constrained resources effectively.
Tech Stack #
- Version 1:
  - MATLAB 2024a
- Version 2:
  - Python 3.10
  - Libraries:
    - Data wrangling and analysis: NumPy, Pandas
    - Excel writing: OpenPyXL
    - MATLAB interaction: matlabengine
    - Other: internally built libraries that will be discussed as needed
Background #
A vehicle has multiple electronic control units (ECUs) - or micro-controllers - to control a variety of functions. Each major system gets an ECU: body control, frame control, thermal management, and vehicle control. Some components also have their own dedicated ECUs; for example, the motor control unit. All ECUs communicate with each other, either directly or through a proxy, to keep every vehicle system coordinated and ensure smooth operation in all possible conditions.
Akin to a normal computer, ECUs have multiple inputs and outputs. Just as a keystroke produces a letter on the computer screen, an input to the ECU produces a certain output determined by the control logic uploaded onto it. The sum of the control logic on all the ECUs determines the vehicle’s overall operation. This spans simple behaviors, like cabin lights turning on when a passenger opens a door, and complex, critical ones, like the deployment of specific airbags in a crash. It is, therefore, imperative to test the control logic of all ECUs as thoroughly as possible to ensure the reliability, safety, and efficiency of the vehicle. HIL allows us to do that without getting into an actual vehicle.
Hardware-in-the-loop (HIL) testing physically isolates each ECU from the vehicle to validate its specific control logic. A real-time computer probes the isolated ECU with a myriad of inputs so that its outputs can be monitored and verified against acceptable behavior. But the vehicle is a full system, and an isolated ECU may not behave as it would in an actual vehicle. The other vehicle systems are therefore modeled in simulation to spoof the ECU into believing that it is in an actual vehicle. These vehicle models are uploaded onto the real-time computer so the ECU can maintain its umbilical connection with the “vehicle” via wire harnesses.
Old Process #
Nikola used Speedgoat real-time machines. Speedgoat, the company, was acquired by MathWorks; therefore, all tooling used to run HIL tests was MathWorks based. A toolbox called Simulink Test was the foundation for HIL testing in the old process. Simulink, shipped with MATLAB, was used to create the system and test models.
Main Model #
System models for each ECU were created to ensure that the ECU had all the inputs needed to be in a non-faulted state. A main system model acted as scaffolding for the ECU under test and represented the ECU’s interactions with the vehicle. For example, the main model for a vehicle control module (VCM) under test is shown below.
This model contained communication blocks for CAN, LIN, and General Purpose Input/Output (GPIO), enabling the real-time computer to communicate with the ECU. These blocks also read messages from the ECU. A closed loop was created by applying logic to messages received from the ECU and feeding the results back. The main model was divided into two sections of sub-models: plant and ECU. Components and subsystems like the powertrain, battery, vehicle dynamics, etc. formed the plant sub-models. The vehicle’s other ECUs (those not under test) formed the ECU sub-models. Ancillary models existed to receive test inputs and for general housekeeping.
Test Model #
A test model was created to probe/stimulate the main model. Thus, the test model provided inputs to the main model, which in turn probed the ECU under test via the real-time machine.
flowchart LR;
    A[Test Harness]-->|Provide inputs|B[Main Model];
    B-->|Stimulates via real-time machine|C[ECU under test]
The test model included two special blocks: the test sequence and test assessment blocks. Both blocks are state machines that behave based on internal or external states. The test sequence block included the test steps that determine the ECU inputs for that test, akin to a recipe: for example, at test start, turn the ignition on; once the ignition is on, change gear from Park to Drive; and so on. The test assessment block, on the other hand, is a real-time verifier of the outputs. It had multiple verify directives to validate incoming signal values based on the current test step.
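For illustration, a verify statement inside a test assessment block might look like the following sketch; the signal name, enumeration, and identifier here are illustrative, not the project’s actual names.

% Sketch of a verify directive in a test assessment block
% (signal, enum, and identifier names are illustrative)
verify(VehState == VehState_HVOn, 'HILTests:vehState', ...
    'VehState must report HVOn during this test step');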
Multiple test models were created for each ECU under test. Test models could be opened and studied via the main model, as each test model was dependent on the main model. The VCM had about 40 test models when I inherited the project.
Build and Test #
The embedded nature of the real-time computer requires that each model be converted to C or C++ code, compiled, and built before upload. Simulink and Simulink Test provide a single keyboard shortcut to perform these sequential tasks: Ctrl + B. However, it was inefficient to open each individual test model, press Ctrl + B, wait for the build, and repeat for the remaining models. To simplify this, I created a buildAllTestModels function which used the sltest API provided by the Simulink Test toolbox.
function buildAllTestModels(mainModel)
% Builds all test models attached to the given main model
arguments
    mainModel char = 'Main_MR';
end
testModels = sltest.harness.find(mainModel);
testModelNames = {testModels.name};
cellfun(@(testModelName) buildTestModel(mainModel, testModelName), ...
    testModelNames, ...
    'UniformOutput', false);
end
function buildTestModel(mainModel, testModelName)
% Builds one test model
arguments
    mainModel char = 'Main_MR';
    testModelName char = '';
end
if isempty(testModelName)
    warning('No test model built. Test model name was empty.')
    return;
end
sltest.harness.load(mainModel, testModelName);
slbuild(testModelName);
sltest.harness.close(mainModel, testModelName);
end
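With this, a single call from the MATLAB prompt queued up every build (using the default main model name from the function above):

buildAllTestModels('Main_MR');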
Each model took 18-20 minutes to build. With 40 test models, it took approximately 12 hours to build them all, plus an additional ~4-6 hours to execute the tests sequentially. Those figures assume everything went smoothly, i.e., no errors, no real-time machine startup failures, etc., and that seldom happened. Wrestling with such failures and errors pushed the full regression testing time to 4 days.
Disadvantages #
1. Long Build Times #
No one wants to wait 18-20 minutes to build one test. Certainly, I didn’t. This wait time caused enormous friction in developing new tests; the intangibles weighed more on the validation engineer than the tangibles.
Long wait times were a function of the main model. The VCM, being the brains of the vehicle, contained a giant communication network (CAN, LIN, GPIO) and therefore required 18-20 minutes to build. The depth of subsystems in the VCM main model was also responsible for the long build times. The body and frame control modules (BCM, FCM), which largely dealt with lighting, cabin comfort, etc., had neither a large communication network nor convoluted subsystems; those models built in 5-8 minutes.
2. Difficult Trial and Error #
Creating a new test inherently involves trial and error. An engineer tries out parameter values to incite a certain behavior from the ECU, and repeats until a full-fledged test is written. Given the complexity of the system, theoretical parameter values seldom worked in a real world full of inconsistencies and tolerances. Imagine an 18-20 minute lag between each trial. Even the strongest of minds would lose their train of thought in that time. A weak mind, like mine, gravitated to chatting with someone in person or pulling the magical device from my pocket.
3. Lack of Reusability #
ECUs, in their normal operation, reach various states based on combinations of inputs. For example, the VCM would start in the StandBy state but would reach HVOn if the ignition was turned on and the battery was fault-free. There was a test to validate this behavior, with its own model and test sequence block. However, other tests validated further behavior once the VCM was in HVOn. For these next tests, the same steps from the first test had to be repeated; that is, there was no easy way of reusing the test sequence block from the first test in subsequent ones.
4. Lack of Maintainability #
Because we were no longer DRY (Don’t Repeat Yourself), owing to this lack of stackability, maintaining the tests became a nightmare. For example, if a future release changed the behavior for reaching HVOn by requiring an additional input, all the test models that depended on this behavior would need to be updated. Engineers flat out forgot to update all the tests. Test models which were once useful languished as “old” tests, never to be used again, while effective validation of the control logic suffered.
5. Skewed Message Rates #
Messages from the ECU are transmitted at a variety of rates to ensure effective and uninterrupted communication. Some messages are transmitted at 1 ms intervals, while others at 250 ms, 500 ms, 1 s, and even 10 s. These message rates were not preserved in the old process because of the real-time nature of the assessments: messages were analyzed and verified as they flowed in, at the simulation timestep (generally 0.001 s). This was incorrect validation.
The disadvantage became apparent during an interesting verification. Some messages have counter signals that, at every interval, count up from 0 to 15 in discrete whole-number steps, repeatedly. Due to the interpolation of a 500 ms message to the 1 ms simulation rate, float values started to appear in a signal that only allowed unsigned integers, and tests started failing intermittently.
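A minimal MATLAB sketch (illustrative values, not project code) shows the mechanism: linearly interpolating a 500 ms rolling counter onto a 1 ms grid manufactures fractional counts that an unsigned-integer-only signal should never contain.

% Illustrative sketch: a 0-15 rolling counter transmitted every 500 ms
t500 = (0:0.5:2)'; % message timestamps in seconds
cnt = mod(0:numel(t500)-1, 16)'; % counter values 0, 1, 2, ...
t1 = (0:0.001:2)'; % 1 ms simulation grid
cntInterp = interp1(t500, double(cnt), t1); % fractional values like 0.002 appear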
6. Missing Robustness #
In the previous disadvantages, I highlighted the intermittent nature of some failures. These are the most insidious, in my opinion. A repeatable failure lets you get to a root cause somewhat easily; an intermittent failure keeps you guessing about the exact root cause. If you fix something and the issue reappears, you are back where you started.
The old process seemed to stand on weak wooden flooring, ready to cave. It lacked robustness, not only in how it reported failures but also how some issues surfaced: sneakily.
Version 1 #
I wanted to eliminate these disadvantages. My goal was to have a system that, with a click or a command, executed all the tests, analyzed the logs, and generated a test report without an engineer monitoring anything. An ancillary, but equally important, goal was to do this as expeditiously as possible.
I wouldn’t be able to achieve either goal without eliminating long wait times associated with model building.
slrealtime #
Simulink Test ships with the slrealtime API, which, coincidentally, I learned about when I was presenting at the MathWorks Automotive Conference 2024. We had made a graphical user interface for running HIL simulations while still on the old process, and that GUI used slrealtime generously. If it could be used in a GUI environment, surely it would work standalone.
slrealtime is a quasi ASAM XIL abstraction baked into MATLAB (ASAM XIL is an API standard for communication between test benches and test automation frameworks). It allows us to modify, read, and otherwise monitor the real-time machine’s state. The API lets us connect to the real-time machine using tg = slrealtime('TargetPC1') and perform various operations on the tg object:
tg.load('ModelName'); % Loads model on target
tg.addInstrument(hInst); % Adds instrumentation to read signals
tg.start('StartStimulation', 'on'); % Starts the real-time machine with a model
tg.stop(); % Stops the real-time machine/simulation
tg.getsignal('/path/to/block', 1); % Gets signals using the added instrumentation
tg.setparam('', 'myParam', uint32(42)); % Sets parameters while the simulation is running
slrealtime, therefore, provided an interface to manipulate the main model the way the test sequence block was meant to. In the previous process, we had multiple test models so that we could have different test sequence blocks. With slrealtime, I could eliminate the test sequence blocks, and thus eliminate the need for multiple test models; one test model would be enough. With one model, my wait time was reduced from 12 hours to only 18-20 minutes.
Reusable functions #
A test unit is a complex sequential system. The steps used to probe the ECU to capture behavior are only a small part of the test unit. There are steps preceding and following the “actual” test, like setting up parameters for the test, starting the ECU and the real-time machine, cleaning up after the test, and, most importantly, starting and stopping the logger at precise timestamps.
flowchart TD;
    A[Load model on target]-->B[Set pre-conditions];
    B-->C[Start real-time machine]
    C-->|Start data logger|D[Execute test logic]
    D-->|Stop data logger|E[Clean up]
    E-->F[Stop Target]
The majority of these steps are common to every test unit, except for the test itself, and can be abstracted away into reusable functions that all the tests can utilize. Therefore, I authored multiple functions that handled these repeating processes. The Appendix section shows three such reusable functions.
These functions eliminated two disadvantages: the lack of reusability and the lack of maintainability. The Appendix deep dives into only three functions; however, there were at least forty-one such functions covering a host of scenarios (the file list below gives an overview, and a sketch of one state helper follows it). Most importantly, they contained the logic to get the ECU into a particular state (the getVCMInxxx.m files), something the previous method lacked. The text nature of the functions allowed for easy comparison and version control with tools like git diff, which could not be done with Simulink models. Lastly, if the controls software changed, only one function would need updates for the changes to be reflected in all the tests.
├── changeVehSpd.m
├── closeCanalyzer.m
├── deactivateHVIL_AuxContactors_Overrides.m
├── disableBMSDiagEventStat.m
├── ESCOverrides.m
├── forceHVIL_AuxContactors.m
├── getACCTimeGapMode.m
├── getVCMInACC.m
├── getVCMInCharge.m
├── getVCMInDrive.m
├── getVCMInHVOn.m
├── getVCMInStandby.m
├── initiateACCTests.m
├── initiateDCDCSigs.m
├── initiateIMDStatSigs.m
├── initiateTest.m
├── killECUPowerAndStopTg.m
├── loadModelOnTargetPC.m
├── MBD_Cycle.m
├── openCanalyzer.m
├── overrideBattLinkVolt.m
├── overrideBattStrVolt.m
├── overrideFaultBMSHVIL.m
├── overrideBMSDemClntFlood.m
├── overrideBMSMax_Min_ModTemp.m
├── overrideBMSModTempStr.m
├── overrideBMSPrechrgState.m
├── overrideBMSStrState.m
├── pressCCButton.m
├── readValOnTarget.m
├── restartPowerSupply.m
├── runFaultClearUDSRoutine.m
├── separateBLFDataWithCAN.m
├── setHILMainTestSeq.m
├── setInvTrq.m
├── setBMSDemClntFlood.m
├── setBMSEmergReqFault.m
├── setBMSThermalFaultStat.m
├── setRegenLevel.m
├── setVehSpd.m
├── shiftVCMToPark.m
├── switchOffAndEndSim.m
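To give a flavor of the state helpers, here is a hedged sketch of what a getVCMInHVOn function could look like. This is a reconstruction, not the original file: the parameter names, the readValOnTarget signature, and the HVOn enumeration value are assumptions.

function inHVOn = getVCMInHVOn(targetPC, modelName)
% Sketch: drives the inputs needed for HVOn and reports whether the VCM
% has reached it. Parameter/signal names and the enum value are assumed.
arguments
    targetPC slrealtime.Target;
    modelName char;
end
% Pre-conditions for HVOn: ignition on and a healthy HVIL/contactor loop
targetPC.setparam('', 'CabinCAN_IgnStat_IgnOn', 1);
forceHVIL_AuxContactors(targetPC);
% Read the reported vehicle state from the target (helper assumed)
vehState = readValOnTarget(targetPC, [modelName '/VehState']);
inHVOn = (vehState == 4); % assumed enumeration: 4 = HVOn
end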
Test Function #
The reusable functions created the building blocks for writing effective tests. The test function itself was about bringing these functions together strategically. With only a few lines of code, ECU behavior could be changed and observed.
The example test below starts by calling initiateTest from the reusable functions list. It then verifies that the targetPC (the real-time machine) is running; this check allows an early abort if the hardware misbehaves. Next, the code attempts to transition the ECU to HVOn and waits fifty seconds for the ECU to oblige. The test depends on the ECU reaching HVOn and should abort if the ECU does not get there in time. These two checks allow the caller to retry the test, and most times such issues are resolved by a simple restart. The function then runs the actual test sequence, which in this case sets one parameter to True. Pauses are added to ensure the ECU is in equilibrium after a parameter change and to aid in assessing the output behavior. Finally, killECUPowerAndStopTg shuts down the ECU and the real-time machine. The last line of the function calls the assessment to verify the desired behavior.
function filename = tst_IMDStat_ExcitePulseOff()
modelName = 'test_model';
harnessName = 'tst_IMDStat_ExcitePulseOff';
% Initiate test
[targetPC, filename] = initiateTest(modelName, 1, harnessName);
% Check if the real-time machine started without issues. If not, clean up
if ~targetPC.isRunning
    closeCanalyzer(false);
    return;
end
% Check if the ECU reaches the HVOn state in 50 seconds. If it doesn't,
% something is wrong. Clean up and return
tic;
while ~getVCMInHVOn(targetPC, modelName)
    if toc >= 50
        killECUPowerAndStopTg(targetPC);
        closeCanalyzer(false);
        return;
    end
    pause(1); % poll without a tight busy-wait
end
pause(30);
targetPC.setparam('', 'PowertrainCAN_IMDStat_IMDExcitePulseOff', ...
    TrueFalse_t.TrueFalse_True);
pause(60);
% Switch to park and turn the vehicle off
killECUPowerAndStopTg(targetPC);
closeCanalyzer(false);
% Assessment calls
testDets = assessResults(filename);
end
function testDets = assessResults(filename)
% Rest of the assessment
end
Assessments #
When I first started this project in January 2024, my group did not have all the toolboxes offered by MathWorks (as an enterprise license). Most importantly, I did not have the Vehicle Network Toolbox (VNT), which can read CAN and LIN logs within MATLAB. This was a blessing in disguise, of course; otherwise, Version 2 would never have been developed. More on that later.
The lack of VNT forced me to use logsout to assess results and determine a pass/fail for each test. logsout is the data dump of all the messages/signals that are “instrumented” in loadModelOnTargetPC. Once a test is complete, the real-time machine sends all the data into MATLAB’s base workspace (the global environment). A typical assessResults function would then pick and choose variables from this logsout for its assessments.
An example assessResults function follows. evalin brings the logsout variable from the base workspace into the function scope. The function then finds variables in logsout to get specific signals from them: VehState and VCMFaultLvl. These signals come from the VehState and VCMDiag messages respectively, both part of the Powertrain CAN. The assessment function returns a cell array of column information to be exported to an Excel spreadsheet using writecell.
function testDets = assessResults(faultLvl, filename)
% Assesses results for the fault management test cases based on
% the faultLvl that is provided by the caller
logsout = evalin('base', 'logsOut');
powCANRx = logsout.find('PowertrainCANRx');
vcmFaultLvl = powCANRx.Values.VCMDiag.VCMFaultLvl;
vehState = powCANRx.Values.VehState.VehState;
desFaultLvlTm = vcmFaultLvl.Time(vcmFaultLvl.Data == faultLvl);
testDets = {filename, 'Fault management', 'Req-Num-00000', ...
    ['Set fault level to ' char(faultLvl)]};
... % rest of the assessment
end
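To make the export concrete: a row like the returned testDets can be appended to a spreadsheet with writecell, as in this one-line sketch (the report path is hypothetical).

% Hypothetical report path; appends one result row per call
writecell(testDets, 'Results/FaultMgmt.xlsx', 'WriteMode', 'append');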
With test and assessment functions, another disadvantage was eliminated: difficult trial and error. We could now reference the same test model for all the tests and simply change parameter values to incite a particular behavior from the ECU. Test functions gave the user more control to make changes quickly and verify their impact on the ECU within a span of a few minutes.
However, with logsout, the interpolation/extrapolation issue mentioned before still persisted: messages and their signals were matched to the simulation timestep (0.001 s) regardless of the transmission rate of the message. This disadvantage would not be resolved until Version 2.
Framework #
Test functions created a strong base on which a framework could be built for regression testing. In regression testing, we run the same tests for each software release to find bugs (or regressions) that may have been introduced with new feature development. The goal of the framework was also to assess results and create a report as an Excel spreadsheet, all with one command (or one click).
MATLAB’s unittest came in handy for creating the framework. An object-oriented programming interface, unittest’s functionality is similar to Python’s standard unittest: the ability to set up and tear down for each test or for the entire test class, properties accessible throughout the class, and running individual tests within a large test class.
An example test class follows. It has one property, resultsTable, which is updated with every test’s result. The resultsTable gets a header row in the class setup method and is written to an Excel sheet, using writetable, once all the tests are complete in the class teardown method. Each test method calls a test function. After each test, in the test method teardown, the base workspace is cleared of logsout to free up RAM.
classdef BMSVCMValidation < matlab.unittest.TestCase
    properties
        resultsTable;
    end

    methods(TestClassSetup)
        % Shared setup for the entire test class
        function setUp(tc)
            % Sets up an empty results table with a header row
            tableHeader = {'Filename', 'Fault Response', 'Test Type', ...
                'Drive Scenario', 'Expectation', 'Actual', ...
                'Result'};
            tc.resultsTable = cell2table(cell(0, 7), 'VariableNames', ...
                tableHeader);
        end
    end

    methods(TestClassTeardown)
        % Shared teardown for the entire test class
        function saveExcel(tc)
            dtNow = string(datetime, 'yyyy-MM-dd_hh-mm-ss');
            writetable(tc.resultsTable, ...
                ['Results/BMSVCMValidation_' char(dtNow) '.xlsx']);
        end
    end

    methods(TestMethodSetup)
        % Setup for each test
    end

    methods(TestMethodTeardown)
        % Teardown for each test
        function tearDownEach(tc)
            % Clear logsOut from the base workspace to free up RAM
            evalin('base', 'clear logsOut;');
            Simulink.sdi.clear;
        end
    end

    methods(Test)
        % Test methods
        function tst_HVStrVoltDelta_tc(tc)
            % Tests the HVStrVolt delta for various scenarios and adds
            % the results to a table for Excel output
            voltDiff = 12.1;
            numOfPacks = 1;
            testDets = tst_HVStrVoltDelta(numOfPacks, voltDiff);
            tc.resultsTable = [tc.resultsTable; testDets];
        end
        ... % rest of the tests
    end
end
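With the class in place, an individual pathway could be run on its own using MATLAB’s built-in runtests:

results = runtests('BMSVCMValidation');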
Multiple control pathways #
Similar classes were created for validating each control logic pathway for maximum test coverage.
├── BMSInit
├── CruiseControl
├── Cybersecurity
├── OneOffs
├── BMSVCMValidation
├── TorqueMonitoring
├── VF1_VehicleState
├── VF2_TorqueCommand
├── VF9_Fault_Management
├── VF12_ThermalIndicators
└── VPE_Req
To run the entire regression suite, I created a runRegression function that leveraged MATLAB’s built-in runtests (built on the matlab.unittest framework). Thus, one command, runRegression(), ran all the tests, assessed the results, and generated a report.
function runRegression()
% Runs the entire regression suite with all the logical pathways
logicPaths = { ...
    'BMSInit', ...
    'CruiseControl', ...
    % rest of the classes
    };
% Run tests using MATLAB's built-in runtests
cellfun(@(logicPath) runtests(logicPath), logicPaths, 'UniformOutput', false);
end
Impact (so far) #
The positive impact of this process was enormous. The ability to trial and error quickly eliminated a key source of friction from the old process; I could write as many as ten tests in a day. In the course of two months, I had written 462 tests for the VCM (up from 40), more than a ten-fold increase in test coverage.
Coincidentally, while I was developing this process, Nikola had a critical safety recall of its BEV trucks. In this crisis, improving test coverage was key for all teams so that Nikola could resolve the issue and return trucks to its customers. In the course of the resolution, Nikola decided to change the BEV’s battery supplier, which required controls validation of the VCM with the new battery management system (BMS). My framework was used to generate critical validation results for the expeditious approval of the new battery packs.
Problems #
Fissures in the process started to emerge intermittently (my least favorite word in the dictionary, in this context). Occasionally, regression tests stopped with tracebacks, defeating the purpose of the framework. After deep diving, I noticed that the issue was excessive RAM usage by logsout: the large download of real-time machine variables into the base workspace consumed large amounts of RAM.
The last picture shows MATLAB crashing after approximately fifteen tests. I tried multiple solutions to return RAM to the OS, in isolation, in combination, and at different points throughout my code base. Nothing worked. Invoking Java’s garbage collector didn’t help either.
% Things I tried to release RAM back to the OS
% 1. Clear logsOut after analysis is complete
evalin('base', 'clear logsOut;')
% 2. Clear Simulink cache
Simulink.sdi.clear;
% 3. Invoke Java garbage collection
java.lang.System.gc();
java.lang.Runtime.getRuntime().gc;
Version 2 #
With RAM being the main cause of the MATLAB crashes, I needed an orchestrator to manage MATLAB sessions and take corrective action if something went wrong. As much as I was against another layer of abstraction, it was a good tradeoff for keeping the regression test workflow running. Plus, it allowed me to include fail-safes in the code base.
Python and matlabengine #
Due to Python’s rich library support, I assumed, without actually knowing, that there would be a library to run MATLAB commands from Python. I found matlabengine, a library created by MathWorks for exactly that. More specifically, this library lets the user open background MATLAB instances via Python and run MATLAB commands in those instances while receiving the outputs in Python.
start_matlab_project, shown below, used matlabengine to open MATLAB in the background (with the -nodesktop flag) and start the project in that instance. The instance could then be passed around to callers and stayed active until closed.
A non-trivial project in MATLAB, one with many moving parts like imports or global variables, can be saved as a MATLAB Project (.prj file). The .prj file stores information about which files are part of the project and adds them to the PATH when the project is started, usually by double-clicking the .prj file or calling the built-in openProject function.
import matlab.engine
def start_matlab_project():
"""
Starts MATLAB and the project that is relevant to the folder / directory
we are in
"""
engine = matlab.engine.start_matlab("-nodesktop")
engine.openProject("../..") # type: ignore
return engine
With Python, I could set up robust checks and retry logic for tests that failed due to various factors: MATLAB crashes, real-time machine mishaps, a vehicle state not being achieved, logger failures, etc. One such retry function is shown below. With the MATLAB engine as one of its inputs, it runs the given test and retries (up to three times) if the desired vehicle state is not achieved. All callers of this function had additional logic for when status was False.
from typing import Callable, Tuple, cast

def retry_if_failed(
    matlab_eng, test_func: Callable, des_vehstate: int, *args, **kwargs
) -> Tuple[bool, str]:
    retries = 0
    veh_state_ach = False
    blf_filename: str = ""
    while not veh_state_ach:
        if retries >= 3:
            # Give up after three attempts
            return False, blf_filename
        retries += 1
        try:
            blf_filename = cast(str, test_func(*args, **kwargs))
            veh_state_ach = is_vehstate_achieved(blf_filename, des_vehstate)
        except Exception:
            # Close the logger so the next attempt starts clean
            matlab_eng.closeCanalyzer(False)
    return True, blf_filename
Hand-shake #
As shown before, fifteen tests were MATLAB’s nemesis; with ten tests, however, MATLAB did not crash. Therefore, I devised a way to run ten tests in any MATLAB instance, close it, and continue the remainder of the tests in a new instance.
For any particular logical pathway (discussed here), I had created a Python class to execute and analyze tests, similar to Version 1. Each class had two properties defined in its __init__() method:
class BMSVCMValidation:
def __init__(self):
self._current_mateng = None
self._new_mateng = None
# rest of the class
Before the first test, self._current_mateng would be initiated:
self._current_mateng = start_matlab_project()
self.run_hv_strvolt_delta()
# Rest of the tests
And after ten tests, that instance would be destroyed to start a new instance:
self._current_mateng.quit() # type: ignore
self._current_mateng = start_matlab_project()
# Begin next set of tests
The logic worked great for “standalone” tests. However, numerous regression tests, especially for the BEV truck, were run on each battery pack (nine packs in total). For example, a typical fault injection would be tested on the first battery pack, then the second, and so on until the ninth. Being sequential “batch” tests, these were run in a traditional for loop in MATLAB (Version 1).
% Loop through 9 battery packs and inject a thermal fault in each
for i = 0:1:8
testDets = tst_BMSThermalFault(desVehState, i);
tc.resultsTable = [tc.resultsTable; testDets];
end
...
It was the same in Python, with one change. Because the .prj file for the VCM HIL project took 1-2 minutes to start up, start_matlab_project was called after the seventh test of the current “batch”. This gave the new instance enough time to set up the project while the current “batch” finished. As the next “batch” started, the old instance would be closed and the new instance would take over the tests. This is where self._new_mateng served its purpose.
flowchart TB
    subgraph NM[New MATLAB Instance]
        A1["Test 1"]-->C1["..."]-->D1["Test 7"]-->E1["..."]-->F1["Test 10"]
    end
    subgraph CM[Current MATLAB Instance]
        A["Test 1"]-->C["..."]-->D["Test 7"]-->E["..."]-->F["Test 10"]
    end
    D --Start new after 7th test--> NM
    F --Next test on new--> A1
def run_thermal_fault_stat_in_acc(self):
    for i in range(9):
        blf_filename, result = self._run_thermal_fault_stat(15, i)
        row = [
            blf_filename,
            # Other information of the test in this row
            "Pass" if result else "Fail",
        ]
        write_excel_row(self._results_path, [row])
        if i == 6:  # hand over after the seventh test (i is 0-indexed)
            self._new_mateng = start_matlab_project()
            self._current_mateng.quit()  # type: ignore
            self._current_mateng = self._new_mateng
Assessments #
The lack of the Vehicle Network Toolbox (VNT) had forced me to use logsout for writing assessments. However, messages in logsout were interpolated/extrapolated to the simulation timestep (0.001 s). This was not ideal for validating the many signals that depended on the message rates prescribed in the DBCs.
Thus, I created a library that parses CAN and LIN data from the log files rather than logsout, uses the DBC files to extract the relevant data, and performs the analysis. I talk about that library, in depth, here. The library’s API provided an important function called extract_messages, which extracted messages based on a list. Writing assessment functions in Python was now possible, similar to how they were written in MATLAB, except using venerable libraries like NumPy and Pandas.
retry_if_failed, shown above, calls one such assessment function: is_vehstate_achieved.
from pathlib import Path

def is_vehstate_achieved(blf_filename: str, des_vehstate: int) -> bool:
    # HIL_LOGS and TRE_COMMS_DIR are module-level path constants
    blf_filepath = Path(HIL_LOGS, f"{blf_filename}.BLF")
    dbc_files = [Path(TRE_COMMS_DIR, "/path/to/PowertrainCAN.dbc")]
    messages = ["VehState"]
    data = extract_messages(blf_filepath, dbc_files, messages)
    vehstate = data["VehState"]["VehState"].to_numpy()
    # Pass if the desired vehicle state appears anywhere in the log
    return bool((vehstate == des_vehstate).any())
Impact #
The development of Version 2, while reusing Version 1’s test functions, was a boon to the group. Not only did it catapult HIL test coverage, it also ensured that all the tests ran overnight without human intervention. Even if a few tests failed to run, the rest of the tests continued as intended; that was the most important improvement of Version 2 over Version 1. Engineers could then deep dive into the results the next day and rerun the few missed tests manually.
Another notable impact of Version 2 was the ease of developing libraries that were pip installable from internal servers. The CAN and LIN message extractor was one such library. Other libraries standardized assessments across multiple platforms (BEV and FCEV), making it easy for other engineers to enjoy the same efficiencies in their HIL testing. If control logic changed for a pathway common to the two platforms, a change in one library would reflect it for all platforms.
Lastly, and this is somewhat personal, the project used free and open-source tooling for the lion’s share of its stack. Version 2 used many FOSS libraries such as NumPy, Pandas, cantools, python-can, etc. Moreover, the open-source nature of these libraries, especially the latter two, allowed me to learn how CAN log parsing works. It also allowed me to fork the libraries, add Nikola-specific changes to them, and use them effectively.
Appendix #
- Loading model on target and setting up instrumentation

This function uses three slrealtime functions to load the model on the target, remove all prior instrumentation, and add new instrumentation. It creates a reusable function for the first block of the test-unit flowchart.
flowchart TD;
    A[Load model on target]-->B[Set pre-conditions];
    B-->C[...]
    style A fill:#e879f9,stroke:#e879f9
function targetPC = loadModelOnTargetPC(modelName)
% Loads model on target pc
arguments
modelName char;
end
targetPC = slrealtime('TargetPC1');
removeAllInstruments(targetPC);
load(targetPC, modelName);
hInst = slrealtime.Instrument(modelName);
hInst.addSignal({'test_torquemonitoring/Data Store Read'}, 1); % BattCANRx
% Add instrument object to target object
addInstrument(targetPC, hInst);
end
- Initiate Test

A separate function that calls loadModelOnTargetPC, it performs additional tasks that satisfy the next few steps of the flowchart:
flowchart TD;
    A[...]-->B[Set pre-conditions];
    B-->C[Start real-time machine]
    C-->|Start data logger|D[...]
    style B fill:#e879f9,stroke:#e879f9
    style C fill:#e879f9,stroke:#e879f9
- setting pre-conditions before the ECU is powered on (setRegenLevel, targetPC.setparam)
- starting the real-time computer (start(targetPC, ...))
- starting the logger for recording data (openCanalyzer)
- starting the power supply to the ECU (restartPowerSupply)
- setting more pre-conditions after the ECU is powered on (initiateDCDCSigs, initiateIMDStatSigs)
function [targetPC, filename] = initiateTest(modelName, regenLevel, harnessName, stopTime)
% Initiates the test by starting stimulation and restarting the
% power supply after some time.
arguments
    modelName char;
    regenLevel single;
    harnessName char;
    stopTime double = 3600;
end
% Initiate test
targetPC = loadModelOnTargetPC(modelName);
filename = openCanalyzer(harnessName);
% Start the simulation
start(targetPC, 'StartStimulation', 'on', 'ExportToBaseWorkspace', true, ...
    'ReloadOnStop', false, 'StopTime', stopTime);
pause(5);
setRegenLevel(targetPC, modelName, regenLevel);
targetPC.setparam([modelName '/BrkPedPct'], 'Value', 0);
targetPC.setparam('', 'VehicleDynamicsCAN_StblCtrlStat_VDCFullyOp', ...
    ActvInactvSNA_t.ActvInactvSNA_Actv);
% Start power supply
restartPowerSupply(targetPC);
% Initiate DCDC signals
initiateDCDCSigs(targetPC);
% Initiate IMD stat signals
initiateIMDStatSigs(targetPC);
end
- Clean up and stop real-time

So far, I have covered the steps above the test-execution block; this function takes care of the steps below it. It cleans up overrides lingering on the real-time target after the test (deactivateHVIL_...), shuts off the power supply to the ECU to ensure it is not awake (targetPC.setparam('', 'EndPSParam', 1)), and finally stops the real-time machine (stop(targetPC)).
flowchart TD;
    D[...]
    D-->|Stop data logger|E[Clean up]
    E-->F[Stop Target]
    style E fill:#e879f9,stroke:#e879f9
    style F fill:#e879f9,stroke:#e879f9
function killECUPowerAndStopTg(targetPC)
% Kills ECU power and stops the target computer
arguments
targetPC slrealtime.Target;
end
deactivateHVIL_AuxContactors_Overrides(targetPC);
targetPC.setparam('', 'EndPSParam', 1);
pause(1);
stop(targetPC);
end