Will this work with Unicode?

I have little to no knowledge of character sets other than standard ANSI, and it has been brought to my attention that the StrConv function, used in conjunction with byte arrays, does not return the correct data with some character sets. Most of us are aware that VB6 stores all strings as Unicode (UTF-16, two bytes per character). For characters in the range 0 to 255, which covers ASCII and most of the Western ANSI code page, the high byte of each character is zero. In memory, the word "Test" would be stored as:
08h 00h 00h 00h 54h 00h 65h 00h 73h 00h 74h 00h
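The first 4 bytes are the length, stored as a Long. Intel x86 processors are little-endian: every multi-byte value is stored least significant byte first. The Long 8 therefore appears as 08h 00h 00h 00h, and each 2-byte character is stored low byte first, so "T" (0054h) appears as 54h 00h. Written most significant byte first, the same data would be: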
00h 00h 00h 08h 00h 54h 00h 65h 00h 73h 00h 74h
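The string length is 4 (what Len returns), but it is actually stored as 8 bytes, which is what LenB returns. The prefix counts bytes, not characters, and the string is followed by a 2-byte null terminator that is not included in the count. StrPtr returns the address of the first character (54h above), not of the length prefix.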
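The layout described above can be reproduced outside VB6. The following Python sketch builds the bytes of a BSTR for "Test": a 4-byte little-endian byte count, the UTF-16LE characters, and a 2-byte null terminator. It also writes the same values most significant byte first, which is the "forward order" shown above. Python is used only because it runs anywhere, and the helper names are hypothetical.

```python
import struct


def bstr_bytes(s: str) -> bytes:
    """Bytes of a BSTR as laid out in memory (hypothetical helper):
    4-byte little-endian byte count, UTF-16LE text, 2-byte null terminator."""
    text = s.encode("utf-16-le")
    return struct.pack("<I", len(text)) + text + b"\x00\x00"


def forward_order(s: str) -> bytes:
    """Same length prefix and characters written most significant byte
    first (big-endian), without the terminator."""
    text = s.encode("utf-16-be")
    return struct.pack(">I", len(text)) + text


def hex_dump(data: bytes) -> str:
    return " ".join(f"{b:02X}h" for b in data)


if __name__ == "__main__":
    layout = bstr_bytes("Test")
    # 08h 00h 00h 00h 54h 00h 65h 00h 73h 00h 74h 00h 00h 00h
    print(hex_dump(layout))
    # 00h 00h 00h 08h 00h 54h 00h 65h 00h 73h 00h 74h
    print(hex_dump(forward_order("Test")))
    # The prefix is what LenB reports; Len is half of it for UTF-16 text
    byte_count = struct.unpack_from("<I", layout)[0]
    print(byte_count, byte_count // 2)  # 8 4
```

The terminator is included here only to show the full memory image. LenB and the prefix both ignore it.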
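Detecting whether an outgoing character is a real Unicode character is relatively simple. A zero high byte (the second byte in memory) means the character is in the range 0 to 255, which I treat as ANSI. A non-zero high byte means it lies outside that range, which I treat as true Unicode. Note that StrToByte below only examines the first character of the string.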
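Here is a minimal Python sketch of that outgoing test. It assumes Windows-1252 as the ANSI code page, and the function names are hypothetical. It reads the high byte of the first UTF-16 code unit, as StrToByte does, and shows two limits of looking at a single character:

- The euro sign, byte 80h in Windows-1252, becomes U+20AC in a VB6 string and so has a non-zero high byte.
- A string that starts with a plain letter passes as ANSI even if later characters do not fit in one byte.

The all-characters check is shown only for comparison and is not part of the original code.

```python
def first_char_high_byte_set(s: str) -> bool:
    """Mirror of the StrToByte test: is the high byte of the first
    UTF-16 code unit (the second byte in memory) non-zero?"""
    units = s.encode("utf-16-le")
    return len(units) >= 2 and units[1] != 0


def every_char_fits_in_one_byte(s: str) -> bool:
    """Comparison check: every code unit has a zero high byte,
    i.e. every character is in the range U+0000..U+00FF."""
    units = s.encode("utf-16-le")
    return all(units[i] == 0 for i in range(1, len(units), 2))


if __name__ == "__main__":
    euro = b"\x80".decode("cp1252")               # ANSI byte 80h in Windows-1252
    print(hex(ord(euro)))                         # 0x20ac: high byte is 20h, not 0
    print(first_char_high_byte_set("Test"))       # False -> treated as ANSI
    print(first_char_high_byte_set(euro + "5"))   # True  -> treated as Unicode
    mixed = "a\u03a9"                             # "a" then Greek capital omega
    print(first_char_high_byte_set(mixed))        # False, first char is "a"
    print(every_char_fits_in_one_byte(mixed))     # False, omega is U+03A9
```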

Detecting whether an incoming byte string is Unicode is more difficult, because the ANSI conversion has removed all the zero high bytes. I came up with the following code, and hopefully someone with access to a different character set can tell me whether it will work.

Unicode byte strings (byte arrays) can be coerced into a string by simple assignment (Let), and ANSI byte strings can be converted with the StrConv function. If you assign an ANSI byte string directly to a string, each pair of bytes is combined into one character. The result is half the length of the actual string and usually displays as a series of question marks. The question marks are not really there; certain functions simply cannot interpret the characters because of the non-zero high byte. One of those is the Asc function, which returns 63 ("?") when it cannot convert the character to the ANSI code page. The code below uses Asc, the theory being that if the system can recognize the first character under these conditions, the data must be true Unicode.

J.A. Coutts
Code:

Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (hpvDest As Any, hpvSource As Any, ByVal cbCopy As Long)

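Private Function ByteToStr(bArray() As Byte) As String
    Dim sTmp As String
    'Reinterpret the raw bytes as a Unicode (UTF-16) string
    Let sTmp = bArray
    'Asc() raises an error on an empty string, so handle that case first
    If LenB(sTmp) = 0 Then Exit Function
    'If the first character is unrecognizable in the system's ANSI code page,
    'Asc returns 63 ("?"), so treat the data as ANSI.
    'Note: a genuine Unicode string that starts with "?" also returns 63.
    If Asc(sTmp) = 63 Then 'ANSI
        ByteToStr = StrConv(bArray, vbUnicode)
    Else 'Unicode
        ByteToStr = sTmp
    End If
End Function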

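'Expand every byte of strInput (including the zero high bytes) into a
'separate character, e.g. "AB" becomes "A" & vbNullChar & "B" & vbNullChar
Private Function GetUString(strInput As String) As String
    Dim N As Long   'Long rather than Integer so strings over 32K bytes don't overflow
    Dim lLen As Long
    Dim sTmp As String
    lLen = LenB(strInput)
    sTmp = String$(lLen, vbNullChar)
    For N = 0 To lLen - 1
        Mid$(sTmp, N + 1, 1) = Chr$(PeekB(StrPtr(strInput) + N))
    Next N
    GetUString = sTmp
End Function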

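Private Function StrToByte(strInput As String) As Byte()
    Dim bArray() As Byte
    'Examine the high byte of the first character (the second byte in memory).
    'Skip the check for empty strings, whose StrPtr may be 0.
    If LenB(strInput) >= 2 Then
        If PeekB(StrPtr(strInput) + 1) > 0 Then 'Treat as Unicode
            'Upper bound is LenB - 1 so the array holds exactly LenB bytes
            ReDim bArray(LenB(strInput) - 1)
            CopyMemory bArray(0), ByVal StrPtr(strInput), LenB(strInput)
            StrToByte = bArray
            Exit Function
        End If
    End If
    'Treat as ANSI
    StrToByte = StrConv(strInput, vbFromUnicode)
End Function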

'Read one byte from the given memory address
Private Function PeekB(ByVal lpdwData As Long) As Byte
    CopyMemory PeekB, ByVal lpdwData, 1
End Function

Private Sub cmdTestANSI_Click()
    Dim bANSI() As Byte
    Dim sTest As String
    Dim sRestore As String
    Dim I As Long
    sTest = "ANSI"
    bANSI = StrToByte(sTest)
    Debug.Print "String = '" & sTest & "' Len = " & UBound(bANSI) + 1
    Debug.Print "Byte Array = ";
    For I = 0 To UBound(bANSI)
        Debug.Print Hex$(bANSI(I)) & " ";
    Next
    Debug.Print
    Debug.Print "Restore String = " & ByteToStr(bANSI)
    Debug.Print "StrConv Byte Array = " & StrConv(bANSI, vbUnicode)
    Let sRestore = bANSI
    Debug.Print "Assign Byte Array = " & sRestore
End Sub

Private Sub cmdTestUNI_Click()
    Dim bUNI() As Byte
    Dim sTest As String
    Dim sRestore As String
    Dim I As Long
    sTest = "UNICODE"
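    'Since we don't have an actual Unicode string, build the UTF-16 bytes of sTest
    '(GetUString exposes the zero high bytes, StrToByte copies them into the array)
    'and then make it artificially Unicode by setting the first high byte to 1
    bUNI = StrToByte(GetUString(sTest))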
    bUNI(1) = 1
    Debug.Print "String = '" & GetUString(sTest) & "' Len = " & UBound(bUNI) + 1
    Debug.Print "Byte Array = ";
    For I = 0 To UBound(bUNI)
        Debug.Print Hex$(bUNI(I)) & " ";
    Next
    Debug.Print
    Debug.Print "Restore String = " & ByteToStr(bUNI)
    Debug.Print "StrConv Byte Array = " & StrConv(bUNI, vbUnicode)
    Let sRestore = bUNI
    Debug.Print "Assign Byte Array = " & sRestore
End Sub
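
The following Python sketch shows what happens to an ANSI byte array when it is assigned straight to a string, as ByteToStr does with Let sTmp = bArray. Python's cp1252 and gbk codecs stand in for the Windows ANSI conversion that Asc performs. That substitution is an assumption: Windows may apply best-fit mappings that Python's codecs do not. Each pair of bytes becomes one UTF-16 character, so "ANSI" turns into two CJK characters. Whether Asc then reports 63 depends on the system code page. The first character has no Windows-1252 equivalent, but it is an ordinary character in code page 936 (Simplified Chinese, GBK).

```python
ansi_bytes = "ANSI".encode("cp1252")      # 41h 4Eh 53h 49h, one byte per char

# Let sTmp = bArray: adjacent byte pairs become one UTF-16LE code unit each
as_string = ansi_bytes.decode("utf-16-le")
print([hex(ord(c)) for c in as_string])   # ['0x4e41', '0x4953']

# Converting the first character back to Windows-1252 fails, so "?" (63)
# is substituted, which is what the Asc check in ByteToStr looks for
print(as_string[0].encode("cp1252", errors="replace"))   # b'?'

# On a GBK (code page 936) system the same character converts to a
# 2-byte DBCS code, so the "?" test would not fire
print(len(as_string[0].encode("gbk")))    # 2

# A genuine question mark also has code 63
print(ord("?"))                           # 63
```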

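To try the question without a second machine, the two VB6 functions can be sketched in Python. This is a port under stated assumptions, not the original code:

- Windows-1252 is assumed as the ANSI code page.
- Python's codecs stand in for StrConv and Asc.
- The names str_to_byte, byte_to_str and ANSI_CODEPAGE are hypothetical.

```python
ANSI_CODEPAGE = "cp1252"   # assumption: a Western European Windows system


def str_to_byte(s: str) -> bytes:
    """Sketch of StrToByte: raw UTF-16LE bytes if the first character's
    high byte is non-zero, otherwise the ANSI conversion (StrConv)."""
    units = s.encode("utf-16-le")
    if len(units) >= 2 and units[1] != 0:
        return units
    return s.encode(ANSI_CODEPAGE, errors="replace")


def byte_to_str(data: bytes) -> str:
    """Sketch of ByteToStr: reinterpret the bytes as UTF-16LE; if the first
    character cannot be converted to the ANSI code page (Asc = 63), decode
    the bytes as ANSI instead."""
    if not data:
        return ""
    candidate = data.decode("utf-16-le", errors="replace")
    if candidate[0].encode(ANSI_CODEPAGE, errors="replace") == b"?":
        return data.decode(ANSI_CODEPAGE, errors="replace")
    return candidate


if __name__ == "__main__":
    for text in ["ANSI", "\u20ac100", "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"]:
        restored = byte_to_str(str_to_byte(text))
        print(repr(text), "->", repr(restored), restored == text)
```

Under these assumptions, the ANSI string and the euro string round-trip, but the Greek string does not. Its first character cannot be converted to Windows-1252, so the Asc-style test classifies genuine Unicode as ANSI. The check therefore appears to recognize Unicode only when the characters also exist in the receiving machine's ANSI code page.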
