Thursday, 26 September 2013

Reading Binary Files into Structures - C++

Hello,

This tutorial aims to give an overview of how to read binary files into structures in C++. These are the methods used for reading LAS files in DASOS [1], open source software for managing full-waveform LiDAR data. I found reading binary files both interesting and challenging, so I decided to write a short tutorial about it.

If you want to read a binary file, you first need to know how the bytes are laid out inside the file. For example, the first 10 bytes may represent a word, the next 4 bytes a float, the next 6 bytes three short ints, and so on. For that reason you also need to know how many bytes each type occupies. If you are not sure, the sizeof(<type>) operator will tell you.
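As a quick check of the sizes on your own machine, something like the sketch below can be used. The function name printTypeSizes is made up for this example; only sizeof(char) == 1 is guaranteed by the standard, while the other sizes depend on the platform and compiler.

```cpp
#include <iostream>

// Hypothetical helper (not part of DASOS): prints the size in bytes of the
// types used in this tutorial. Only sizeof(char) == 1 is guaranteed by the
// C++ standard; the other values depend on the platform and compiler.
void printTypeSizes()
{
   std::cout << "char:      " << sizeof(char) << " byte(s)\n";
   std::cout << "short int: " << sizeof(short int) << " byte(s)\n";
   std::cout << "int:       " << sizeof(int) << " byte(s)\n";
   std::cout << "float:     " << sizeof(float) << " byte(s)\n";
   std::cout << "double:    " << sizeof(double) << " byte(s)\n";
}
```

Calling printTypeSizes() once at the start of a program is enough to confirm that the types you plan to use match the sizes given in the file format description.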

Let's assume we have a file containing a word, a float and three short ints, as in the example above. A struct holding this information can then be defined as follows:

typedef struct myStructure
{
   char word[10];           // 10 bytes                    
   float number;            //  4 bytes
   short int A;             //  2 bytes
   short int B;             //  2 bytes
   short int C;             //  2 bytes
}myStructure;

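The members above add up to 20 bytes, but sizeof(myStructure) is not guaranteed to return 20. The compiler is allowed to insert padding bytes between members (and at the end of the struct) so that each member is properly aligned in memory; on a typical platform where a float is aligned to 4 bytes, the struct above ends up being 24 bytes. While I was writing my code I came across a case where my struct should have been 235 bytes, but sizeof(myStructure) returned 243. To avoid this, use #pragma pack (a compiler extension supported by GCC, Clang and MSVC) to specify how the data should be packed. Otherwise the binary data will not match the structure, since you will be reading 243 bytes from the file into a structure whose members describe only 235, and the results will be wrong.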

#pragma pack(push)
#pragma pack(1)
typedef struct myStructure
{
   char word[10];           // 10 bytes                    
   float number;            //  4 bytes
   short int A;             //  2 bytes
   short int B;             //  2 bytes
   short int C;             //  2 bytes
}myStructure;
#pragma pack(pop)
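To see the effect of packing, the two layouts can be compared side by side. This is only a sketch: the names PaddedRecord, PackedRecord and printLayouts are invented for the comparison, and the padded size you get depends on your compiler and platform.

```cpp
#include <cstddef>   // offsetof
#include <iostream>

// Same members as myStructure, with the compiler's default layout.
// Padding bytes may be inserted to keep the members aligned.
struct PaddedRecord
{
   char word[10];
   float number;
   short int A;
   short int B;
   short int C;
};

// Same members again, but packed to 1-byte boundaries (no padding).
#pragma pack(push)
#pragma pack(1)
struct PackedRecord
{
   char word[10];
   float number;
   short int A;
   short int B;
   short int C;
};
#pragma pack(pop)

// With 1-byte packing the size is exactly the sum of the member sizes.
static_assert(sizeof(PackedRecord) == 10 + sizeof(float) + 3 * sizeof(short int),
              "PackedRecord must not contain padding");

// Hypothetical helper: prints the size of each struct and where the
// float member starts inside it.
void printLayouts()
{
   std::cout << "padded: size " << sizeof(PaddedRecord)
             << ", number at offset " << offsetof(PaddedRecord, number) << "\n";
   std::cout << "packed: size " << sizeof(PackedRecord)
             << ", number at offset " << offsetof(PackedRecord, number) << "\n";
}
```

The static_assert is a cheap safeguard: if the struct ever stops matching the byte count expected from the file format, the code fails to compile instead of silently reading wrong values.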

Once the structure is defined, the next step is to open the file as follows:

std::ifstream file;   // requires #include <fstream>
file.open(filename.c_str(),std::ios::binary);
if(!file.is_open())
{
   std::cerr << "File not found\n";
   exit(EXIT_FAILURE);
}

Then define a variable of type myStructure and read the data into the structure:

myStructure data;
file.read((char *) &data,sizeof(data));

And you are done! The data is now in the structure.
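Putting the steps together, here is a minimal round-trip sketch that writes one record to a file and reads it back. The names Record, writeRecord, readRecord and roundTripDemo are made up for this example and are not taken from DASOS. The reader also checks gcount() so that a file that is too short is reported instead of leaving the structure half filled.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

#pragma pack(push)
#pragma pack(1)
struct Record   // hypothetical name; same members as myStructure
{
   char word[10];
   float number;
   short int A;
   short int B;
   short int C;
};
#pragma pack(pop)

// Writes one record to a binary file. Returns false on failure.
bool writeRecord(const std::string &filename, const Record &rec)
{
   std::ofstream out(filename.c_str(), std::ios::binary);
   if(!out.is_open())
   {
      return false;
   }
   out.write(reinterpret_cast<const char *>(&rec), sizeof(rec));
   out.close();
   return !out.fail();
}

// Reads one record from a binary file.
// Returns true only if all sizeof(Record) bytes were read.
bool readRecord(const std::string &filename, Record &rec)
{
   std::ifstream in(filename.c_str(), std::ios::binary);
   if(!in.is_open())
   {
      return false;
   }
   in.read(reinterpret_cast<char *>(&rec), sizeof(rec));
   return in.gcount() == static_cast<std::streamsize>(sizeof(rec));
}

// Writes a sample record, reads it back and checks that nothing changed.
void roundTripDemo(const std::string &filename)
{
   Record written = {};
   std::strncpy(written.word, "lidar", sizeof(written.word) - 1);
   written.number = 1.5f;
   written.A = 1;
   written.B = 2;
   written.C = 3;

   const bool wrote = writeRecord(filename, written);
   assert(wrote);

   Record loaded = {};
   const bool read = readRecord(filename, loaded);
   assert(read);
   assert(std::memcmp(&written, &loaded, sizeof(Record)) == 0);
}
```

Note that this only works as-is when the file was written on a machine with the same byte order and type sizes as the one reading it.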

There were occasions where I couldn't read the data straight into a structure, because the length of some variables was not known in advance. I solved this by first reading all the data into an array of char (each char is one byte) and then using memcpy to copy the data into structures or arrays.

// sizeOfAllData must be a compile-time constant here; otherwise
// allocate the buffer with new[] or use std::vector<char>
char allData[sizeOfAllData];
// read all the data from the binary file
file.read(allData,sizeOfAllData);
// in this example we want to read ints
int *partOfData = new (std::nothrow) int[numOfInts];
// test whether memory has been allocated for that data
if(partOfData==0) // memory could not be allocated
{
   std::cout << "Allocation of memory failed\n";
   exit(EXIT_FAILURE);
}
// memcpy requires #include <cstring>
memcpy(partOfData,allData,numOfInts*sizeof(int));
// ... use partOfData, then release the memory
delete [] partOfData;
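The same idea can be written with std::vector instead of raw arrays, which removes the manual allocation check and the need to free the memory. This is a variant of the approach above, not the code used in DASOS, and the names extractInts and extractIntsDemo are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `count` ints out of a raw byte buffer,
// starting at byte position `offset`.
// Returns an empty vector if the buffer does not hold enough bytes.
std::vector<int> extractInts(const std::vector<char> &bytes,
                             std::size_t offset, std::size_t count)
{
   std::vector<int> values;
   if(offset > bytes.size() || count > (bytes.size() - offset) / sizeof(int))
   {
      return values;
   }
   values.resize(count);
   if(count > 0)
   {
      std::memcpy(values.data(), bytes.data() + offset, count * sizeof(int));
   }
   return values;
}

// Small self-check: a buffer with two header bytes followed by three ints.
void extractIntsDemo()
{
   const int source[3] = {7, -1, 1024};
   std::vector<char> bytes(2 + sizeof(source), 0);
   std::memcpy(bytes.data() + 2, source, sizeof(source));

   const std::vector<int> values = extractInts(bytes, 2, 3);
   assert(values.size() == 3);
   assert(values[0] == 7 && values[1] == -1 && values[2] == 1024);

   // Asking for more ints than the buffer holds yields an empty vector.
   assert(extractInts(bytes, 2, 4).empty());
}
```

Using memcpy rather than casting the char pointer to an int pointer matters here: the ints may start at any byte position in the buffer, and memcpy copies them correctly even when that position is not aligned for an int.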

Finally, once we have all the information we need, we should close the file:

   file.close();

Please note that most of the code was written from memory, so there may be a few mistakes.

I hope you find this tutorial useful. If you have any comments, corrections or questions please don't hesitate to contact me. =)


More information about the software here:
[1] Miltiadou, M., Grant, M. G., Campbell, N. D., Warren, M., Clewley, D., & Hadjimitsis, D. G. (2019, June). Open source software DASOS: Efficient accumulation, analysis, and visualisation of full-waveform lidar. In Seventh International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2019) (Vol. 11174, p. 111741M). International Society for Optics and Photonics.