Gem #27: Changing Data Representation (Part 1)
by Robert Dewar, AdaCore
A powerful feature of Ada is the ability to specify the exact layout of data. This is particularly important when an external device or program requires a very specific format. For example:
   type Com_Packet is record
      Key : Boolean;
      Id  : Character;
      Val : Integer range 100 .. 227;
   end record;

   for Com_Packet use record
      Key at 0 range 0 .. 0;
      Id  at 0 range 1 .. 8;
      Val at 0 range 9 .. 15;
   end record;
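which lays out the fields of a record and, in the case of Val, forces a biased representation in which the all-zero bit pattern represents 100 (the range 100 .. 227 has 128 values, which fit in the 7 bits allotted only if stored as an offset from 100). Another example is: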
   type Val is (A, B, C, D, E, F, G, H);
   type Arr is array (1 .. 16) of Val;
   for Arr'Component_Size use 3;
which forces the components to take only 3 bits, crossing byte boundaries as needed. A final example is:
   type Status is (Off, On, Unknown);
   for Status use (Off => 2#001#, On => 2#010#, Unknown => 2#100#);
which assigns specified values to the literals of an enumeration type, instead of the efficient default values of 0, 1, 2.
In all these cases, representation clauses let us match an external specification, which can be very useful. The disadvantage is that such layouts are inefficient: accessing individual components or, in the case of the enumeration type, looping through the values can increase both the space and time requirements of the program code.
One approach that is often effective is to read and write the data in the specified form, but to represent it internally in the normal default layout, which allows efficient access, and to do all internal computations in that more efficient form.
To follow this approach, you will need to convert between the efficient format and the specified format. Ada provides a very convenient method for doing this, as described in RM 13.6 "Change of Representation".
The idea is to use type derivation, where one type has the specified format and the other has the normal default format. For instance, for the array case above, we would write:
   type Val is (A, B, C, D, E, F, G, H);
   type Arr is array (1 .. 16) of Val;

   type External_Arr is new Arr;
   for External_Arr'Component_Size use 3;
Now we read and write the data using the External_Arr type. When we want to convert to the efficient form, Arr, we simply use a type conversion:
   Input_Data  : External_Arr;
   Work_Data   : Arr;
   Output_Data : External_Arr;

   --  (read data into Input_Data)

   --  Now convert to internal form
   Work_Data := Arr (Input_Data);

   --  (computations using efficient Work_Data form)

   --  Convert back to external form
   Output_Data := External_Arr (Work_Data);
Using this approach, the quite complex task of copying all the data of the array from one form to another, with all the necessary masking and shift operations, is completely automatic.
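The fragments above can be assembled into a complete program. The following is only a minimal sketch: the procedure name Change_Rep_Demo, the loop that fills Input_Data (standing in for a real read from a device or file), and the reversal used as the "computation" are all assumptions made for illustration and are not part of the original example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Change_Rep_Demo is
   type Val is (A, B, C, D, E, F, G, H);
   type Arr is array (1 .. 16) of Val;

   --  Derived type that carries the externally mandated packed layout
   type External_Arr is new Arr;
   for External_Arr'Component_Size use 3;

   Input_Data  : External_Arr;
   Work_Data   : Arr;
   Output_Data : External_Arr;
begin
   --  Assumption: stand-in for reading data from the outside world
   for I in Input_Data'Range loop
      Input_Data (I) := Val'Val ((I - 1) mod 8);
   end loop;

   --  Convert to internal form; the unpacking is done by the compiler
   Work_Data := Arr (Input_Data);

   --  Assumption: an arbitrary computation (reverse the array)
   for I in 1 .. 8 loop
      declare
         Tmp : constant Val := Work_Data (I);
      begin
         Work_Data (I)      := Work_Data (17 - I);
         Work_Data (17 - I) := Tmp;
      end;
   end loop;

   --  Convert back to external form; the repacking is also automatic
   Output_Data := External_Arr (Work_Data);

   Put_Line ("First output value: " & Val'Image (Output_Data (1)));
   Put_Line ("External component size:"
             & Integer'Image (External_Arr'Component_Size));
   Put_Line ("Internal component size:"
             & Integer'Image (Arr'Component_Size));
end Change_Rep_Demo;
```

The external component size printed is 3, as specified. The internal component size is whatever the compiler chooses by default, so it is printed rather than assumed.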
Similar code can be used in the record and enumeration type cases. It is even possible to specify two different representations for the two types, and convert from one form to the other, as in:
   type Status_In is (Off, On, Unknown);
   type Status_Out is new Status_In;

   for Status_In use (Off => 2#001#, On => 2#010#, Unknown => 2#100#);
   for Status_Out use (Off => 103, On => 1045, Unknown => 7700);
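As a rough illustration of how these two types might be used, the sketch below converts a value from one representation to the other and prints the underlying codes. The procedure name Enum_Rep_Demo and the objects Received and To_Send are hypothetical. The sketch also assumes a compiler that supports the Enum_Rep attribute (a GNAT-defined attribute, also part of Ada 2022), which is used here only to make the representation values visible.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Enum_Rep_Demo is
   type Status_In is (Off, On, Unknown);
   type Status_Out is new Status_In;

   for Status_In use (Off => 2#001#, On => 2#010#, Unknown => 2#100#);
   for Status_Out use (Off => 103, On => 1045, Unknown => 7700);

   --  Hypothetical objects: a value as received, and the same
   --  logical value converted to the outgoing representation
   Received : constant Status_In  := On;
   To_Send  : constant Status_Out := Status_Out (Received);
begin
   --  The logical value is unchanged by the conversion
   Put_Line ("Value: " & Status_Out'Image (To_Send));

   --  Assumption: Enum_Rep is available (GNAT or Ada 2022)
   Put_Line ("Incoming code:"
             & Integer'Image (Status_In'Enum_Rep (Received)));
   Put_Line ("Outgoing code:"
             & Integer'Image (Status_Out'Enum_Rep (To_Send)));
end Enum_Rep_Demo;
```

With the clauses above, the incoming code for On is 2#010# (that is, 2) and the outgoing code is 1045. The type conversion is the only code needed to map one onto the other.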
There are two restrictions that must be kept in mind when using this feature. First, you have to use a derived type. You can't put representation clauses on subtypes, which means that the conversion must always be explicit. Second, there is a rule, RM 13.1(10), that restricts the placement of interesting representation clauses:
   "For an untagged derived type, no type-related representation items are allowed if the parent type is a by-reference type, or has any user-defined primitive subprograms."
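All the representation clauses that are interesting from the point of view of change of representation are "type-related", so, for example, the following sequence would be illegal, because Arr has a user-defined primitive subprogram (Rearrange) at the point where External_Arr is derived from it: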
   type Val is (A, B, C, D, E, F, G, H);
   type Arr is array (1 .. 16) of Val;

   procedure Rearrange (Arg : in out Arr);

   type External_Arr is new Arr;
   for External_Arr'Component_Size use 3;
Why these restrictions? Well, the answer is a little complex and has to do with efficiency considerations, which we will address in next week's Gem.