FillChar and StringOfChar under Delphi 10.2 for Win64 Release Target

FillChar and StringOfChar under Delphi 10.2 for Win64 Release Target

I have a question about a specific programming problem in Delphi 10.2 Pascal programming language. The StringOfChar and FillChar dont work properly under Win64 Release build on CPUs released before year 2012. Expected result of FillChar is just plain sequence of just repeating 8-bit characters in a given memory buffer. Expected result of StringOfChar is the same, but the result is stored inside a string type. But, in fact, when I compile our applications that worked in Delphi prior to 10.2 by the 10.2 version of Delphi, our applications compiled for Win64 stop working properly on CPUs released before year 2012. The StringOfChar and FillChar dont work properly they return a string of different characters, although in a repeating pattern not just a sequence of the same character as they should. Here is the minimal code enough to demonstrate the issue. Please note that the length of the sequence should be at least 16 characters, and the character should not be nul 0. The code is below: procedure TestStringOfChar; var a: AnsiString; ac: AnsiChar; begin ac : 1; a : StringOfCharac, 43; if a 1111111111111111111111111111111111111111111 then begin raise Exception.CreateANSI StringOfChar Failed; end; end; I know that there are lots of Delphi programmers at StackOverflow. Are you experiencing the same problem If yes, how you resolve it What is the solution By the way, I have contacted the developers of Delphi but they didnt confirm nor deny the issue so far. Im using Embarcadero Delphi 10.2 Version 25.0.26309.314. Update: If your CPU is manufactured in 2012 or later, additionally include the following lines before calling StringOfChar to reproduce the issue: const ERMSBBit 1 shl 9; 0200 begin CPUIDTable7.EBX : CPUIDTable7.EBX and not ERMSBBit; As about the April 2017 RAD Studio 10.2 Hotfix for Toolchain Issues - have tried with it and without it - it didnt help. The issue exists regardless of the Hotfix.

StringOfCharA: AnsiChar, count uses FillChar under the hood. You can use the following code to fix the issue: System.FastSystem A fast drop-in addition to speed up function in system.pas It should compile and run in XE2 and beyond. Alpha version 0.5, fully tested in Win64 c Copyright 2016 J. Bontes This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http:mozilla.orgMPL2.0. FillChar code is an altered version FillCharsse2 SynCommons.pas which is part of Synopse framework by Arnaud Bouchez Changelog 0.5 Initial version: unit FastSystem; interface procedure FillCharvar Dest; Count: NativeInt; Value: ansichar; inline; overload; procedure FillCharvar Dest; Count: NativeInt; Value: Byte; overload; procedure FillMemoryDestination: Pointer; Length: NativeUInt; Fill: Byte; inline; EXTERNALSYM FillMemory procedure ZeroMemoryDestination: Pointer; Length: NativeUInt; inline; EXTERNALSYM ZeroMemory implementation procedure FillCharvar Dest; Count: NativeInt; Value: ansichar; inline; overload; begin FillCharDest, Count, byteValue; end; procedure FillMemoryDestination: Pointer; Length: NativeUInt; Fill: Byte; begin FillCharDestination, Length, Fill; end; procedure ZeroMemoryDestination: Pointer; Length: NativeUInt; inline; begin FillCharDestination, Length, 0; end; This code is 3x faster than System.FillChar on x64. ifdef CPUX64 procedure FillCharvar Dest; Count: NativeInt; Value: Byte; rcx dest rdxcount r8bvalue asm .noframe .align 16 movzx r8,r8b Theres no need to optimize for count 3 mov rax,0101010101010101 mov r9d,edx imul rax,r8 fill rax with value. cmp rdx,59 Use simple code for small blocks. jl Below32 Above32: mov r11,rcx mov r8b,7 code shrink to help alignment. lea r9,rcxrdx r9end of array sub rdx,8 rep mov rcx,rax add rcx,8 and r11,r8 and 7 See if dest is aligned jz tail NotAligned: xor rcx,r11 align dest lea rdx,rdxr11 tail: test r9,r8 and 7 is tail aligned jz alignOK tailwrite: mov r9-8,rax no, we need to do a tail write and r9,r8 and 7 sub rdx,r9 deccount, tailcount alignOK: mov r10,rdx and edx,32168 count the partial iterations of the loop mov r8b,64 code shrink to help alignment. mov r9,rdx jz Initloop64 partialloop: shr r9,1 every instruction is 4 bytes lea r11,rip partial 47 start at the end of the loop sub r11,r9 step back as needed add rcx,rdx add the partial loop count to dest cmp r10,r8 do we need to do more loops jmp r11 do a partial loop Initloop64: shr r10,6 any work left jz done no, return mov rdx,r10 shr r10,19-6 use non-temporal move for 512kb jnz InitFillHuge Doloop64: add rcx,r8 dec edx mov rcx-6400H,rax mov rcx-6408H,rax mov rcx-6410H,rax mov rcx-6418H,rax mov rcx-6420H,rax mov rcx-6428H,rax mov rcx-6430H,rax mov rcx-6438H,rax jnz DoLoop64 done: rep ret db 66,66,0f,1f,44,00,00 nop7 partial: mov rcx-6408H,rax mov rcx-6410H,rax mov rcx-6418H,rax mov rcx-6420H,rax mov rcx-6428H,rax mov rcx-6430H,rax mov rcx-6438H,rax jge Initloop64 are we done with all loops rep ret db 0F,1F,40,00 InitFillHuge: FillHuge: add rcx,r8 dec rdx db 48,0F,C3,41,C0 movnti rcx-6400H,rax db 48,0F,C3,41,C8 movnti rcx-6408H,rax db 48,0F,C3,41,D0 movnti rcx-6410H,rax db 48,0F,C3,41,D8 movnti rcx-6418H,rax db 48,0F,C3,41,E0 movnti rcx-6420H,rax db 48,0F,C3,41,E8 movnti rcx-6428H,rax db 48,0F,C3,41,F0 movnti rcx-6430H,rax db 48,0F,C3,41,F8 movnti rcx-6438H,rax jnz FillHuge donefillhuge:mfence rep ret db 0F,1F,44,00,00 db 0F,1F,40,00 Below32: and r9d,not3 jz SizeIs3 FillTail: sub edx,4 lea r10,rip SmallFill 154 sub r10,r9 jmp r10 SmallFill: rep mov rcx56, eax rep mov rcx52, eax rep mov rcx48, eax rep mov rcx44, eax rep mov rcx40, eax rep mov rcx36, eax rep mov rcx32, eax rep mov rcx28, eax rep mov rcx24, eax rep mov rcx20, eax rep mov rcx16, eax rep mov rcx12, eax rep mov rcx08, eax rep mov rcx04, eax mov rcx,eax Fallthough: mov rcxrdx,eax unaligned write to fix up tail rep ret SizeIs3: shl edx,2 r9 3 r94 lea r10,rip do3 43 sub r10,rdx jmp r10 do3: rep mov rcx2,al do2: mov rcx,ax ret do1: mov rcx,al rep ret do0: rep ret end; endif The easiest way to fix your issue is to Download Mormot and include SynCommon.pas into your project. This will patch System.FillChar to the above code and include a couple of other performance improvements as well. Note that you dont need all of Mormot, just SynCommons by itself.

I took the test case from the FastCode Challenge - http:fastcode.sourceforge.net I have compiled the FillChar testing tool under Win64, and removed all 32-bit versions of FillChar present in the test. I have left 2 versions of 64-bit FillChar: FC_TokyoBugfixAVXEx - the one present in Delphi Tokyo 64-bit, with bugs fixed and AVX registers added. There is branching to detect ERMSB, AVX1 and AVX2 CPU capabilities. This branching happens on each FillChar call. There is no entry point patching or function address mapping. FillChar_J_Bontes - another version of FillChar, the function from System.FastSystem that you have posted here. I didnt test vanilla FillChar from Delphi Tokyo, because it contains a bug described in my initial post, and it improperly handles ERMSB. Kaby Lake - i7-7700K First column is the alignment of the function. Next 4 columns are results of various tests, lower is better. There are 4 tests in total. First test operates with smaller block, second with larger, and so on. Last column is a weighted summary of all tests. The CPU in the first test is Kaby Lake i7-7700K. Ivy Bridge - E5-2603 v2 Here are the results of a second test, on a previous microarchitecture: Xeon E5-2603 v2 Ivy Bridge, release date Sep 10, 2013, frequency 1.8 GHz, L2 Cache 4 256 KB, L3 Cache 10 MB, RAM 4 DDR3-1333. Ivy Bridge - E5-2643 v2 Here are the test results on a third set of hardware: Intel Xeon E5-2643 v2, frequency 3.5 GHz, L2 Cache 4 256 KB, L3 Cache 50 MB, RAM 4 DDR3-1600. Intel Core i9 7900X Here are the test results on a fourth set of hardware: Intel Core i9 7900X, frequency 3.3 GHz turbo frequency up to 4.5 GHz, L2 Cache 10 1024 KB, L3 Cache 13.75 MB, RAM 4 DDR4-2134.

Комментарии

Популярные сообщения из этого блога

Как преобразовать вертикальную запись в горизонтальную?

Skipping acquire of configured file 'contrib/binary-i386/Packages' as repository … doesn't support architecture 'i386'

How to delete a folder in remote Windows from Linux