DCCL v4
DCCL messages are encoded and decoded using a set of "field codecs" that are responsible for encoding and decoding a given field. The library comes with a default implementation for all the Google Protocol Buffers (Protobuf) types. It is also possible to define custom field codecs to encode one or more fields of a given message (or even the entire message).
The following pseudo-code gives the process of encoding a DCCL message (using the dccl::v4::DefaultMessageCodec). Note this is not precisely how the actual C++ code works, but is rather given to explain the encoded message structure. Keep in mind that DCCL messages are always encoded and decoded from the least significant bit to the most significant bit.
Definitions:
The following DCCL Message "CommandMessage" gives an example DCCL definition of a basic command, and on the left shows the parts of the encoded message. Note that LSB = least significant byte, MSB = most significant byte.
Example of encoding the DCCL "CommandMessage" for a representative set of values. The table gives the unencoded \(x\) and encoded \(x_{enc}\) values using the formulas in Default Field Codec Reference. Below the table is the encoded message in little endian format (both in hexadecimal and binary notation).
You certainly don't need to know how the fields are encoded to use DCCL, but this may be of interest to those looking to optimize their usage of DCCL or to implement custom encoders. First we will casually introduce the default encoders, then you can reference Table 2 below for precise details.
Remember that DCCL messages are always encoded and decoded from the least significant bit to the most significant bit.
Each DCCL message must have a unique numeric ID (defined using (dccl.msg).id = N). To interoperate with other groups, please see http://gobysoft.org/wiki/DcclIdTable. For private work, please use IDs 124-127 (one-byte) and 128-255 (two-byte).
This ID is used in lieu of the DCCL message name on the wire. It is encoded using a one- or two-byte value, allowing for a larger set of values to be used than a single byte would allow, but still preserving a set of one-byte identifiers for frequently sent messages. This is always the first 8 or 16 bits of the message so that the dccl::Codec knows which message to decode.
The first (least significant) bit of the ID determines if it is a one- or two-byte identifier. If the first bit is true, it's a two-byte identifier and the next 15 bits give the actual ID. If the first bit is false, the next 7 bits give the ID.
Two examples:
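As a hedged illustration of the ID rule described above, the following sketch packs an ID into one or two bytes, least significant byte first. The functions `encode_id` and `decode_id` are hypothetical helpers written for this explanation; they are not part of the DCCL API, and the real codec may be structured quite differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the DCCL API): encode a message ID into one or
// two bytes. The least significant bit is the "two-byte" flag.
std::vector<std::uint8_t> encode_id(std::uint32_t id)
{
    assert(id < 32768); // only 15 bits are available in the two-byte form
    if (id < 128)
    {
        // One-byte form: flag bit = 0, followed by 7 bits of ID.
        return {static_cast<std::uint8_t>(id << 1)};
    }
    // Two-byte form: flag bit = 1, followed by 15 bits of ID.
    std::uint32_t v = (id << 1) | 1u;
    return {static_cast<std::uint8_t>(v & 0xFF), static_cast<std::uint8_t>(v >> 8)};
}

// Hypothetical inverse: read the flag bit, then shift it away.
std::uint32_t decode_id(const std::vector<std::uint8_t>& bytes)
{
    assert(!bytes.empty());
    if ((bytes[0] & 1u) == 0)
        return bytes[0] >> 1; // one-byte identifier
    assert(bytes.size() >= 2);
    std::uint32_t v = bytes[0] | (static_cast<std::uint32_t>(bytes[1]) << 8);
    return v >> 1; // two-byte identifier
}
```

Under these assumptions, ID 124 becomes the single byte 0xF8 (124 shifted left by one), and ID 128 becomes the two bytes 0x01 0x01 (the value 257, least significant byte first).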
Numeric values are all encoded essentially the same way. Integers are treated as floating point values with zero precision. Precision is defined as the number of decimal places to preserve (e.g. precision = 3 means round to the nearest thousandth, precision = -1 means round to the nearest ten). Thus, integer fields can also have negative precision, if desired. Fields are bounded by a minimum and maximum allowable value, based on the underlying source of the data.
To encode, the numeric value is rounded to the desired precision, and then multiplied by the appropriate power of ten to make it an integer. Then it is increased or decreased so that zero (0) represents the minimum encodable value. At this point, it is simply an unsigned integer. To encode the optional field's "not set", an additional value (not an additional bit) is reserved. To allow "not set" to be the zero (0) encoded value, all other values are incremented by one.
This default encoder assumes unset fields are rare. If you commonly have unset optional fields, you may want to implement a "presence bit" encoder that uses a separate bit to indicate if a field is set or not. These are two extremes of the more general purpose idea of an entropy encoder, such as the arithmetic encoder. In that case, "not set" is simply another symbol that has a probability mass relative to the actual values to capture the frequency with which fields are set or not set.
For example:
The field takes 18 bits: \(\lceil \hbox{log}_2((10000-(-10000))\cdot 10^1 + 1) \rceil =\lceil 17.61 \rceil = 18 \).
Say we wanted to encode the value 10.56:
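The numeric steps described above (round, scale by a power of ten, offset by the minimum, reserve zero for "not set" on optional fields) can be sketched as follows. This is an illustration only: `NumericField`, `numeric_bit_size`, and `encode_numeric` are hypothetical names rather than DCCL classes, and the bounds (min = -10000, max = 10000, precision = 1) are taken from the bit-size formula above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical description of a bounded numeric field (not a DCCL type).
struct NumericField
{
    double min;
    double max;
    int precision;  // decimal places to keep; may be negative
    bool optional;  // optional fields reserve one extra value for "not set"
};

// Number of bits needed to represent every encodable value of the field.
unsigned numeric_bit_size(const NumericField& f)
{
    double scale = std::pow(10.0, f.precision);
    std::uint64_t range = static_cast<std::uint64_t>(std::llround((f.max - f.min) * scale));
    std::uint64_t count = range + 1 + (f.optional ? 1 : 0);
    unsigned bits = 0;
    while ((std::uint64_t{1} << bits) < count) ++bits; // integer ceil(log2(count))
    return bits;
}

// Encode a set value as an unsigned integer.
std::uint64_t encode_numeric(const NumericField& f, double value)
{
    assert(value >= f.min && value <= f.max);
    double scale = std::pow(10.0, f.precision);
    long long scaled = std::llround(value * scale);     // round to precision
    long long scaled_min = std::llround(f.min * scale); // minimum maps to zero
    std::uint64_t enc = static_cast<std::uint64_t>(scaled - scaled_min);
    return f.optional ? enc + 1 : enc; // zero is reserved for "not set"
}
```

With these assumptions, 10.56 rounds to 10.6, scales to 106, and is offset to 100106 for a required field; an optional field shifts every set value up by one, giving 100107. Both variants fit in 18 bits.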
Enumerations are treated like unsigned integers, where the enumeration keys are given values based on the order they are declared (not the value given in the .proto file).
For example:
In this case (for encoding): AUV is 0, USV is 1, SHIP is 2. After this mapping, the field is encoded exactly like an equivalent integer field (with max = 2, min = 0 in this case).
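The declaration-order mapping can be sketched as a simple index lookup. The function `encode_enum` is a hypothetical helper for illustration, not a DCCL call; it assumes the enumeration keys are supplied in the order they appear in the .proto file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: map an enumeration key to its position in
// declaration order, ignoring the numeric value assigned in the .proto file.
std::size_t encode_enum(const std::vector<std::string>& keys_in_declaration_order,
                        const std::string& key)
{
    auto it = std::find(keys_in_declaration_order.begin(),
                        keys_in_declaration_order.end(), key);
    assert(it != keys_in_declaration_order.end()); // key must be declared
    return static_cast<std::size_t>(
        std::distance(keys_in_declaration_order.begin(), it));
}
```

For the keys AUV, USV, SHIP this yields 0, 1, and 2, which can then be treated as an integer field with min = 0 and max = 2.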
Booleans are simple. If they are required, they are encoded with false = 0, true = 1. If they are optional, they are tribools with "not set" = 0, false = 1, true = 2.
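A minimal sketch of the boolean mapping, using `std::optional<bool>` (C++17) to stand in for a field that may be unset. The name `encode_bool` is hypothetical and is not taken from the DCCL source.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: required booleans use {0, 1};
// optional booleans use {0 = not set, 1 = false, 2 = true}.
unsigned encode_bool(std::optional<bool> value, bool is_optional)
{
    if (!is_optional)
    {
        assert(value.has_value()); // a required field must be set
        return *value ? 1u : 0u;
    }
    if (!value)
        return 0u; // "not set"
    return *value ? 2u : 1u;
}
```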
Strings are given a maximum size in the proto file (max_length). A small integer (minimally sized like a required unsigned int field to encode 0 to max_length) is included first to specify the length of the following string. Then the string is encoded using the basic one-byte character values (ASCII).
For example:
Say we want to encode "HELLO":
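The string layout described above (a minimally sized length prefix followed by one byte per character) might be sketched like this. The helpers `bits_for`, `append_bits`, and `encode_string` are hypothetical, the value max_length = 8 in the note below is an assumption chosen for illustration, and the bit stream is written least significant bit first to match the rest of this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bits needed to represent every integer in 0..max_value.
unsigned bits_for(std::uint64_t max_value)
{
    unsigned bits = 0;
    while ((std::uint64_t{1} << bits) <= max_value) ++bits;
    return bits;
}

// Append the low `nbits` bits of `value`, least significant bit first.
void append_bits(std::vector<bool>& out, std::uint64_t value, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; ++i)
        out.push_back((value >> i) & 1u);
}

// Hypothetical sketch: length prefix first, then 8 bits per character.
std::vector<bool> encode_string(const std::string& s, std::size_t max_length)
{
    assert(s.size() <= max_length);
    std::vector<bool> bits;
    append_bits(bits, s.size(), bits_for(max_length)); // length, 0..max_length
    for (unsigned char c : s)
        append_bits(bits, c, 8); // one-byte character value
    return bits;
}
```

With an assumed max_length of 8, the length prefix takes 4 bits, so "HELLO" occupies 4 + 5 × 8 = 44 bits: the length 5 first, then the character values for H, E, L, L, O.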
Bytes fields are encoded like strings, but with a fixed rather than variable length. A bytes field always takes max_length bytes in the message, and if it is optional, a presence bit is added at the front. If the presence bit is false, the bytes are omitted.
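As a rough sketch of the size rule for a fixed-length bytes field, the encoded size depends only on max_length and on whether an optional field is present. The function name `bytes_field_bits` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: encoded size, in bits, of a fixed-length bytes field.
std::size_t bytes_field_bits(std::size_t max_length, bool is_optional, bool is_set)
{
    if (!is_optional)
    {
        assert(is_set); // a required field is always present
        return 8 * max_length;
    }
    // Optional: one presence bit, then the payload only if the field is set.
    return is_set ? 1 + 8 * max_length : 1;
}
```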
Sub-messages are encoded recursively. In the case of an optional message, a presence bit is added before the message fields. If the presence bit is false (indicating the message is not set), no further bits are used.
See Table 1 in the DCCL Interface Descriptor Language (IDL) for symbol definitions. The formulas below in Table 2 refer to DCCLv3 defaults (i.e. codec_version = 3 which is equivalent to codec = "dccl.default3"). A few things that may make it easier to read this table:
To define your own codecs: