I have little to no knowledge of character sets other than standard ANSI, and it has been brought to my attention that the StrConv function in conjunction with byte arrays does not return the correct data when used with some character sets. Most of us are aware that VB6 stores all strings as Unicode (2 bytes per character), but for ordinary ANSI text the high-order byte of each character is zero. Strictly, that only holds for character codes 0 to 255; a few Windows-1252 characters, such as the euro sign, map to higher values. In memory, the word "Test" would be stored as:
08h 00h 00h 00h 54h 00h 65h 00h 73h 00h 74h 00h
The first 4 bytes are the length in bytes, stored as a Long. Intel processors are little-endian, so multi-byte values are stored least significant byte first: the 2 bytes of each character are swapped, and so are the 4 bytes of the Long. Written most significant byte first, the above would be:
00h 00h 00h 08h 00h 54h 00h 65h 00h 73h 00h 74h
The string length is 4 characters, but the data actually occupies 8 bytes, which is what LenB returns.
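To check the layout described above on your own machine, here is a minimal sketch that dumps a string's length prefix and character data in memory order. DumpBSTR is a hypothetical helper name, not part of the code in this post, and it relies on the 4-byte length prefix sitting immediately before the address that StrPtr returns.

```vb
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (hpvDest As Any, hpvSource As Any, ByVal cbCopy As Long)

'Hypothetical helper: returns the length prefix and character data of a
'string as hex, in the order the bytes sit in memory.
Private Function DumpBSTR(strInput As String) As String
    Dim bRaw() As Byte
    Dim I As Long
    Dim sOut As String
    'vbNullString has no buffer at all, so there is nothing to read
    If StrPtr(strInput) = 0 Then Exit Function
    'Room for the 4-byte prefix plus the character data
    ReDim bRaw(LenB(strInput) + 3)
    'Assumption: the length prefix starts 4 bytes before the StrPtr address
    CopyMemory bRaw(0), ByVal StrPtr(strInput) - 4, LenB(strInput) + 4
    For I = 0 To UBound(bRaw)
        'Pad each byte to 2 hex digits
        sOut = sOut & Right$("0" & Hex$(bRaw(I)), 2) & "h "
    Next
    DumpBSTR = RTrim$(sOut)
End Function
```

Calling Debug.Print DumpBSTR("Test") from the same module should reproduce the first byte listing above.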
Detecting whether an outgoing character is a real Unicode character is relatively simple. A zero high-order byte (the second one in memory) indicates that the character is ANSI, and a non-zero one must mean that it is true Unicode.
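The check described above can be sketched without any API calls by copying the string into a byte array and looking at every second byte. HasWideChar is a hypothetical name used only for illustration. Unlike a test on the first character alone, this version scans the whole string.

```vb
Option Explicit

'Hypothetical helper: True if any character in the string has a
'non-zero high-order byte.
Private Function HasWideChar(strInput As String) As Boolean
    Dim bRaw() As Byte
    Dim I As Long
    'An empty string has no characters to examine
    If LenB(strInput) = 0 Then Exit Function
    'Assigning a string to a byte array copies the bytes as stored, low byte first
    bRaw = strInput
    'The odd indexes hold the high-order byte of each character
    For I = 1 To UBound(bRaw) Step 2
        If bRaw(I) <> 0 Then
            HasWideChar = True
            Exit Function
        End If
    Next
End Function
```

For example, HasWideChar("Test") is False, while HasWideChar("Te" & ChrW$(&H20AC) & "t") is True because of the euro sign.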
But detecting whether an incoming byte string is Unicode is a bit more difficult, as all the zero high-order bytes in ANSI have been removed. I came up with the following code, and hopefully someone with access to a different character set can tell me if it will work or not. Unicode byte strings (byte arrays) can be coerced into a string with a plain assignment (the Let statement), and ANSI byte strings can use the StrConv function. If you assign an ANSI byte string directly to a string, it displays as a series of question marks half the length of the actual string. The string doesn't actually contain question marks; each pair of ANSI bytes has been combined into a single character, and certain functions cannot interpret that character because of the non-zero high-order byte. One of those is the Asc function, which is what I have used in the following code, the theory being that if the system can recognize the characters under these conditions, then it must be true Unicode.
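To make the difference between the two conversions concrete, here is a small sketch that feeds the same 4 ANSI bytes through both. MakeTestBytes and DemoByteArrayToString are hypothetical names. What Asc returns for the directly assigned string depends on the system code page, so that is deliberately not shown.

```vb
Option Explicit

'Hypothetical helper: the word "Test" as 4 ANSI bytes
Private Function MakeTestBytes() As Byte()
    Dim b() As Byte
    ReDim b(3)
    b(0) = &H54: b(1) = &H65  '"T" "e"
    b(2) = &H73: b(3) = &H74  '"s" "t"
    MakeTestBytes = b
End Function

Private Sub DemoByteArrayToString()
    Dim bANSI() As Byte
    Dim sRaw As String
    Dim sConv As String

    bANSI = MakeTestBytes()

    'Plain assignment copies the bytes untouched, so each pair becomes one character
    sRaw = bANSI
    Debug.Print Len(sRaw)          '2
    Debug.Print Hex$(AscW(sRaw))   '6554 ("e" is the high byte, "T" the low byte)

    'StrConv widens each ANSI byte to a 2-byte character
    sConv = StrConv(bANSI, vbUnicode)
    Debug.Print Len(sConv)         '4
    Debug.Print sConv              'Test
End Sub
```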
J.A. Coutts
Code:
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (hpvDest As Any, hpvSource As Any, ByVal cbCopy As Long)
Private Function ByteToStr(bArray() As Byte) As String
    Dim sTmp As String
    Let sTmp = bArray
    'If the bytes are not recognizable as Unicode, the system shows a string
    'of "?" half the length of the byte array, and Asc returns 63 ("?").
    'Caveat: Unicode data that really starts with "?" also lands here.
    If Asc(sTmp) = 63 Then 'ANSI
        ByteToStr = StrConv(bArray, vbUnicode)
    Else 'Unicode
        ByteToStr = sTmp
    End If
End Function
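Private Function GetUString(strInput As String) As String
    'Long rather than Integer, so strings over 32767 bytes do not overflow
    Dim N As Long
    Dim lLen As Long
    Dim sTmp As String
    lLen = LenB(strInput)
    sTmp = String$(lLen, Chr$(0))
    'Turn every byte of the stored string into a character of its own
    For N = 0 To lLen - 1
        Mid$(sTmp, N + 1, 1) = Chr$(PeekB(StrPtr(strInput) + N))
    Next N
    GetUString = sTmp
End Function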
Private Function StrToByte(strInput As String) As Byte()
    Dim bArray() As Byte
    'Examine the second byte in memory (high-order byte of the first character).
    'Only the first character is checked.
    If PeekB(StrPtr(strInput) + 1) > 0 Then 'Must be Unicode
        'Arrays are zero-based, so the upper bound is LenB - 1
        ReDim bArray(LenB(strInput) - 1)
        CopyMemory bArray(0), ByVal StrPtr(strInput), LenB(strInput)
        StrToByte = bArray
    Else 'Must be ANSI
        StrToByte = StrConv(strInput, vbFromUnicode)
    End If
End Function
Private Function PeekB(ByVal lpdwData As Long) As Byte
    CopyMemory PeekB, ByVal lpdwData, 1
End Function
Private Sub cmdTestANSI_Click()
    Dim bANSI() As Byte
    Dim sTest As String
    Dim sRestore As String
    Dim I As Long
    sTest = "ANSI"
    bANSI = StrToByte(sTest)
    Debug.Print "String = '" & sTest & "' Len = " & UBound(bANSI) + 1
    Debug.Print "Byte Array = ";
    For I = 0 To UBound(bANSI)
        Debug.Print Hex$(bANSI(I)) & " ";
    Next
    Debug.Print
    Debug.Print "Restore String = " & ByteToStr(bANSI)
    Debug.Print "StrConv Byte Array = " & StrConv(bANSI, vbUnicode)
    Let sRestore = bANSI
    Debug.Print "Assign Byte Array = " & sRestore
End Sub
Private Sub cmdTestUNI_Click()
    Dim bUNI() As Byte
    Dim sTest As String
    Dim sRestore As String
    Dim I As Long
    sTest = "UNICODE"
    'Since we don't have an actual Unicode string, we will create an artificial
    'one by filling in the first NULL character
    bUNI = StrToByte(GetUString(sTest))
    bUNI(1) = 1
    Debug.Print "String = '" & GetUString(sTest) & "' Len = " & UBound(bUNI) + 1
    'Trailing semicolon keeps the hex bytes on the same line as the label
    Debug.Print "Byte Array = ";
    For I = 0 To UBound(bUNI)
        Debug.Print Hex$(bUNI(I)) & " ";
    Next
    Debug.Print
    Debug.Print "Restore String = " & ByteToStr(bUNI)
    Debug.Print "StrConv Byte Array = " & StrConv(bUNI, vbUnicode)
    Let sRestore = bUNI
    Debug.Print "Assign Byte Array = " & sRestore
End Sub