mbrtoc16() and mbrtoc32() in C++ with examples

In this tutorial, we will learn about the mbrtoc16() function and mbrtoc32() function in C++. They are both declared in the cuchar header in C++.
We use these functions to work with the char16_t and char32_t datatypes in C++. We will look at the working of these functions with the help of simple examples.

cuchar in C++

As I mentioned earlier, both these functions are declared in the cuchar header. Therefore, to use these functions, our first step is to include cuchar.

Most of us are familiar with the char fundamental datatype, which is commonly used to store ASCII characters. Another character standard in wide use today is Unicode. The Unicode Transformation Format (UTF) refers to the various Unicode character encodings, including UTF-8, UTF-16 and UTF-32.

We use the cuchar header specifically for working with 16-bit and 32-bit characters which are encoded using UTF-16 and UTF-32.

We include cuchar in our code using #include as follows:

#include <cuchar>

Declaration of mbrtoc16() and mbrtoc32()

mbrtoc16() takes the following parameters to convert a multibyte sequence to a 16-bit character. It returns the number of bytes it used to produce that character.

char16_t * pc16 
const char * pmb 
size_t max 
mbstate_t * ps

Now let us look at this in more detail.

  1. pmb points to the multibyte sequence to be converted.
  2. max gives us the maximum number of bytes to read from pmb. MB_CUR_MAX is a macro defined in the cstdlib header. It gives the maximum number of bytes in a multibyte character under the current locale settings. Usually, we pass MB_CUR_MAX as the argument for max.
  3. ps holds the information necessary to maintain state while converting multibyte sequences to characters and vice-versa. If we want the function to use its internal shift state, we can pass a null pointer.
  4. pc16 points to a char16_t object that receives the converted 16-bit character. If pc16 is a null pointer, nothing is stored. The function, however, still returns a valid length.
  5. size_t is an unsigned integer type. It represents the sizes of objects in bytes.

The declaration of mbrtoc32() is almost the same as that of mbrtoc16(). The only difference is that its first parameter is a char32_t * instead of a char16_t *.

Return values of mbrtoc16() and mbrtoc32()

Depending on the arguments, the function can return one of the following values:

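0. This happens if the converted character (stored in pc16) is the null character. It also happens if pmb is a null pointer, because the function then behaves as if it were given an empty string.
(size_t)-3 (i.e. minus 3 cast to size_t). This happens when a character needs more than one char16_t (a surrogate pair). The first call stores the first char16_t and returns the byte length. The next call stores the second char16_t and returns (size_t)-3 without reading any bytes from pmb.
(size_t)-2. This happens if the next max bytes form an incomplete, but so far valid, multibyte character. pc16 does not store a value in this case.
(size_t)-1. This happens if pmb does not point to a valid multibyte sequence. The function sets errno to EILSEQ. Again, pc16 does not store any value.
The length in bytes used to produce the character. This happens in the case of a successful conversion. The value lies between 1 and max.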

The working of mbrtoc32() is similar to that of mbrtoc16(). Whereas mbrtoc16() converts a multibyte sequence to a 16-bit character, mbrtoc32() converts it to a 32-bit character. It returns similar values as well.

Implementation in C++

The following code demonstrates the use of mbrtoc16() and mbrtoc32().

#include <cstdlib>   // MB_CUR_MAX
#include <cuchar>    // mbrtoc16, mbrtoc32
#include <iostream>

using namespace std;

// Errors are reported as (size_t)-1 and (size_t)-2. Since size_t is
// unsigned, a test such as "length < 0" can never be true.
bool conversion_failed(size_t length)
{
    return length == static_cast<size_t>(-1) ||
           length == static_cast<size_t>(-2);
}

int main()
{
    char16_t char_16_bit;
    char32_t char_32_bit;
    char multibyte_seq[] = "M";
    mbstate_t *conversion_state = nullptr; // null pointer: use the internal state
    size_t length;

    // conversion to 16-bit character
    length = mbrtoc16(&char_16_bit, multibyte_seq, MB_CUR_MAX,
                      conversion_state);

    if (conversion_failed(length))
    {
        cout << "Error in conversion\n";
        cout << "Length returned = " << length;

        return 1;
    }

    cout << "16-bit conversion\n";
    cout << "Multibyte Sequence to be converted = " << multibyte_seq
         << "\n";
    cout << "Length = " << length << "\n";
    // cast to an integer type to print the numeric code
    cout << "16-bit character code = "
         << static_cast<unsigned>(char_16_bit) << "\n";
    // the value is ASCII here, so it fits in a plain char
    cout << "Value of 16-bit character = "
         << static_cast<char>(char_16_bit) << "\n";

    // conversion to 32-bit character
    multibyte_seq[0] = '#';
    length = mbrtoc32(&char_32_bit, multibyte_seq, MB_CUR_MAX,
                      conversion_state);

    if (conversion_failed(length))
    {
        cout << "Error in conversion\n";
        cout << "Length returned = " << length;

        return 1;
    }

    cout << "\n32-bit conversion\n";
    cout << "Multibyte Sequence to be converted = " << multibyte_seq
         << "\n";
    cout << "Length = " << length << "\n";
    cout << "32-bit character code = "
         << static_cast<unsigned long>(char_32_bit) << "\n";
    cout << "Value of 32-bit character = "
         << static_cast<char>(char_32_bit) << "\n";

    return 0;
}

Output

16-bit conversion
Multibyte Sequence to be converted = M
Length = 1
16-bit character code = 77
Value of 16-bit character = M

32-bit conversion
Multibyte Sequence to be converted = #
Length = 1
32-bit character code = 35
Value of 32-bit character = #

Conclusion

In this tutorial, we learnt about the char16_t and char32_t datatypes in C++. We learnt about the conversion of multibyte sequences to these datatypes with the help of the mbrtoc16() and mbrtoc32() functions. These functions are declared in the cuchar header.
