Requirements
============

Functional:

- It should be possible to have multiple simultaneous connections to the
  device and to run parallel tests on it. Example: one serial connection
  and one SSH connection.

- It should be possible to interact not only with "high-level" software
  such as the bootloader and the OS, but with bare metal as well.

- Support for different types of images:

  - pre-built image
  - linaro hwpack + rootfs
  - kernel + ramdisk/rootfs
  - tftp
  - nfsroot

- Support for different bootloaders on the same platform. Example: uboot,
  uefi, and a second stage (grub) pipeline.

- It should be possible to choose which device to boot from. This
  impacts both the deployment code and the boot code.

- It must be possible to test advanced multi-boot test cases with
  repetition. In particular, it is necessary to test suspend, wake,
  reboot, kexec and similar operations.

- The dispatcher should be able to provide interactive access to the
  low-level serial console. For some new devices, remote bringup is often
  necessary because developers cannot have a device on their desks. When
  necessary, the dispatcher should interact with the scheduler to put a
  board online or offline.

Non-functional:

- Speed. Avoid as much overhead as possible.

- Security. It should not be necessary to run the dispatcher as root. If
  root privileges are needed for specific operations, a separate helper
  program that can be setuid should perform only those operations.

- Simplicity.

  - Having the master image and the test system on the same device makes
    several actions harder than they need to be. Master images must be
    booted from the network so that the actual storage on the device is
    left entirely to the test system. When possible, deployment to the
    test system should be done by "just" dd'ing an image to the desired
    device.

  - Avoid running commands on the target as much as possible. When an
    operation can be performed on the dispatcher host, it should not be
    performed on the target.

Design
======

The proposed design is based on the Pipes and Filters architectural
pattern, which is reified, for instance, in the UNIX pipes system. The
idea is to have every piece of functionality as self-contained as
possible, and to be able to compose the pieces in sequence to achieve the
desired high-level functionality.
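
As a rough illustration of this composition model, consider the following
minimal sketch in Python. The class and method names here are hypothetical
and are not taken from any existing dispatcher API; the point is only that
small, self-contained steps are chained so that the output of one becomes
the input of the next::

  class Filter:
      """One self-contained piece of functionality."""
      def process(self, data):
          raise NotImplementedError


  class Pipeline:
      """Runs filters in sequence, piping data from one to the next."""
      def __init__(self, filters):
          self.filters = filters

      def run(self, data=None):
          for f in self.filters:
              data = f.process(data)
          return data


  class AddSuffix(Filter):
      def __init__(self, suffix):
          self.suffix = suffix

      def process(self, data):
          return (data or "") + self.suffix


  # Composing small steps into one high-level operation.
  result = Pipeline([AddSuffix("deploy "), AddSuffix("boot "), AddSuffix("test")]).run()
  print(result)  # "deploy boot test"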

Main concepts in the design
---------------------------

- *Device* represents the device under test.

- *Connection* is a data connection between the dispatcher host and the
  device under test. Examples of connections: serial connection, SSH
  connection, adb shell, etc.

- *Action* is an action that has to be performed. An Action can be a
  shell command run on the target, an operation run on the dispatcher
  host, or anything else. Actions should be as constrained as possible
  so that all possible errors can be easily tracked. Where multiple
  operations are required, use an action which contains an internal
  pipeline and add the individual commands as actions within that
  pipeline. (A sketch of these classes follows this list.)

  Actions must be aggregated into a *Pipeline* - the top level object is
  always a pipeline. Pipelines can repeat actions and actions can include
  internal pipelines containing more actions. Actions have parameters which
  are set during the parsing of the YAML submission. Parameter data is
  static within each action and is used to validate the action before any
  pipeline is run. Dynamic data is set in the context which is available
  via the parent pipeline of any action. Actions must be idempotent and
  must raise a RuntimeError exception if the dynamic data is absent or
  unusable. Errors in parameter data must raise a JobError exception.
  Each action will receive a connection as an input parameter and can
  optionally provide a different connection to the action that comes
  after it. Usually, the first action in a pipeline will receive *None*
  as connection, and must provide a connection to the subsequent action.

  See `Connection Management`_ below for other requirements that
  Actions must observe.

- *Image* represents the test system that needs to be deployed to the
  target.

  Each action in a pipeline will be given a chance to insert data into
  the root filesystem of the image, before the pipeline starts to run.

- *Deployment* is a strategy for deploying a given image to a given
  device. Subclasses of Deployment represent the different ways of
  deploying images to devices, which depend on both the type of image
  and on the capabilities of the device.

- *Job*. A Job aggregates a *Device* representing the target device to
  be used, an *Image* to be deployed, and an *Action* to be executed.
  The Action can be, and usually *will* be, a composite action made up
  of several sub-actions.

  The deployment strategy is chosen based on the image and the device.
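
To make the Action/Pipeline relationship and the connection hand-off more
concrete, here is a minimal, hypothetical sketch in Python. None of these
names are taken from the dispatcher code; the parameter validation, the
JobError/RuntimeError split and the connection passing only follow the
rules described above::

  class JobError(Exception):
      """Raised when the parameters supplied in the job submission are bad."""


  class Action(object):
      """One constrained step; parameters are static, context is dynamic."""

      def __init__(self, parameters=None):
          self.parameters = parameters or {}  # set while parsing the YAML submission
          self.pipeline = None                # optional internal pipeline
          self.context = {}                   # dynamic data, shared via the parent pipeline

      def validate(self):
          """Check static parameter data before anything runs."""
          if "name" not in self.parameters:
              raise JobError("action has no 'name' parameter")

      def run(self, connection):
          """Receive a connection (possibly None) and return the connection
          to hand to the next action, which may be a different one."""
          return connection


  class Pipeline(object):
      """Ordered list of actions sharing one context dictionary."""

      def __init__(self, actions):
          self.context = {}
          self.actions = actions
          for action in actions:
              action.context = self.context  # dynamic data lives here

      def validate(self):
          for action in self.actions:
              action.validate()
              if action.pipeline is not None:
                  action.pipeline.validate()

      def run(self, connection=None):
          # The first action usually gets None and must provide a
          # connection for the actions that follow it.
          for action in self.actions:
              connection = action.run(connection)
          return connection


  pipeline = Pipeline([Action({"name": "example"})])
  pipeline.validate()  # JobError on bad parameter data
  pipeline.run()       # the connection is threaded through the actions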

Connection management
---------------------

Connections to devices under test are often unreliable and have been a
major source of problems in automation. Because of this, if a connection
failure (disconnection, serial corruption) occurs during the execution of
a command, that command will be retried. Consequently, every step
performed by a command must be idempotent, i.e. it must do nothing in the
case where it has already been performed and, more importantly, it must
not crash if it has already been performed.
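
The retry behaviour could look roughly like the following sketch. The
names, the failure exception and the retry policy are assumptions for
illustration, not existing dispatcher code::

  import time


  class ConnectionFailure(Exception):
      """Assumed marker for a dropped or corrupted connection."""


  def run_with_retries(action, connection, max_attempts=3, delay=5):
      """Run an idempotent action, retrying only on connection failures."""
      for attempt in range(1, max_attempts + 1):
          try:
              # Safe to call repeatedly because actions are idempotent:
              # a step that already completed must do nothing and not crash.
              return action.run(connection)
          except ConnectionFailure:
              if attempt == max_attempts:
                  raise
              time.sleep(delay)  # give the connection time to recover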

Exceptions
----------

LAVA must be clear about the likely cause of an incomplete test job or a
failed test result. Any one failure must trigger only one exception; e.g.
a JobError which results in a RuntimeError is still a bug in the
dispatcher code, as it should have been caught during the validation
step.

- *JobError*: an error arising from the information supplied as part of
  the TestJob, e.g. an HTTP 404 on a file to be downloaded as part of the
  preparation of the TestJob, or a download which results in a file
  which tar or gzip does not recognise. This exception is used when
  data supplied as the parameters to an Action causes that action
  to fail. Job errors should always be supported by a unit test.

- *InfrastructureError*: exceptions based on an error raised by a
  component of the test which is neither the LAVA dispatcher code nor
  the code being executed on the device under test. This includes
  errors arising from the device (like the arndale SD controller
  issue) and errors arising from the hardware to which the device
  is connected (serial console connection, ethernet switches or
  internet connection beyond the control of the device under test).
  Actions are required to include code to check for likely
  infrastructure errors so that pipelines can retry or fail the
  test, recording whether a retry fixed the infrastructure error.

- *TestError*: exceptions raised when the device under test did not
  behave as expected.

- *RuntimeError*: exceptions arising from dynamic data prepared by
  LAVA Dispatcher and failures of Actions not already handled by
  the code. Runtime errors are bugs in lava-dispatcher code. (It is
  also a bug to use the wrong exception type.) Fixes for runtime
  error bugs should always include a unit test.
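
A minimal sketch of how this classification might look in code. This is
hypothetical: only RuntimeError is a Python built-in, and the other
classes are assumed to be defined by the dispatcher::

  class JobError(Exception):
      """Bad data supplied in the TestJob; should be caught during validation."""


  class InfrastructureError(Exception):
      """Failure of equipment around the device under test, not of LAVA code
      nor of the code running on the device; a candidate for a retry."""


  class TestError(Exception):
      """The device under test did not behave as expected."""


  # The built-in RuntimeError is reserved for bugs in lava-dispatcher
  # itself, e.g. missing or unusable dynamic data.

  def download(url):
      # Hypothetical example: a missing file is the submitter's problem,
      # so it is reported as a JobError, not as a dispatcher bug.
      raise JobError("HTTP 404 while downloading %s" % url)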