Prepare for Kylix: The Compiler and RTL
by Danny Thorpe, Delphi R&D
What is Kylix?
Recent articles and Borland/Inprise press announcements have stirred up a lot of questions
lately. Perhaps the question on many Delphi and C++Builder developers' lips is "Just what exactly is Kylix?" We diligently refer them to the
Borland/Inprise Kylix press announcement and the
Kylix Q&A article, but invariably the response is "Yes, yes, that's all very nice, but that's
marketing stuff. Where's the real info? What will the nuts and bolts look like?
What will port, what won't?"
This is the first in a series of Borland community articles intended to brief you, the Delphi
and C++Builder developer, on Kylix technical bits that you need to be aware of to prepare
yourself and your code for possible porting or migration to the Linux universe. This is a
briefing of directions, issues, and solutions currently under evaluation, not a technical
specification cast in stone.
Disclaimer: This article describes features of software products that are in
development and subject to change without notice. Description of such features here is
speculative and does not constitute a binding contract or commitment of service.
C++ coders will probably notice that these first few articles will talk almost exclusively about
Delphi things. C++Builder and Delphi are siamese twins - where one goes, the other usually
follows. Sometimes Delphi leads with new technology features that appear in the next C++Builder
release, sometimes C++Builder leads with features that appear in the next Delphi release. The
Kylix project encompasses both Delphi and C++Builder tools for the Linux platform. Right now,
the plan is that Delphi for the Linux platform will be the first product produced by
the Kylix project. C++Builder will follow after we get Delphi out the door.
This article is required reading for
Kylix Kickstart attendees. (There will be a test!)
The Nuts and Bolts
Ok, enough prep verbiage. Let's get down to business. This article will outline what's new,
what's different, and what's out of the Object Pascal language, compiler/linker, and Run Time
Library (RTL) in Kylix compared to the current Delphi 5 product for Windows. I'm not going to
talk about VCL, or the IDE, or anything else. You'll just have to keep an eye on the Borland
community site headlines to catch articles on those topics later on.
Command Line Tools
What's New?
- DCC as a native Linux executable. Whee!
- All-new built-in assembler, written in portable code. The old built-in assembler was written in TASM. (See next section)
- DCC produces native Linux x86 executables. Ooh-ah. (Ok, so this item should be painfully
obvious but it had to be said or some joker would claim we weren't doing native executables.)
What's Out?
What's Different?
Language Syntax
There won't be a lot of changes to the Object Pascal language syntax. Things that
are commonly mistaken as Windows-isms, such as Delphi's interface and GUID types,
exist just fine in Kylix. A few things that do rely heavily on Windows implementation and have
no equivalent in the Linux OS, such as
Variants and resources, will be reimplemented in Kylix.
What's New?
- Expression evaluation in conditional defines, including access to declared constants:
{$IF Defined(SomeSymbol) and (SomeConstant < 11.0)}
...
{$ELSE}
...
{$ENDIF}
Yes, Virginia, this can be used to check the compiler version with a single $IF expression. We even defined a new conditional symbol, CONDITIONALEXPRESSIONS, so you can hide the new $IF from the old compilers in source code that needs to compile everywhere. Note to self: when vacationing in Australia, leave the laptop in California...
- Pascal Library modules and packages compile to Linux Shared Object (.so) libraries. .so is the Linux equivalent of the Windows .DLL.
- The conditional symbol LINUX is now defined, indicating the source code is being compiled for the Linux platform.
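The $IF and LINUX items above can be combined in a short sketch. The constant MinProtocol and the symbol USE_NEW_PATH are hypothetical names made up for illustration. Two details here are assumptions that go beyond this briefing: the Delphi compilers that later shipped with $IF close the block with $IFEND rather than $ENDIF, and the hidden block avoids an inner $ELSE. The reasoning is that an older compiler skipping the outer $IFDEF knows nothing about $IF, so it would take an inner $ELSE or $ENDIF as belonging to the outer block.

```pascal
program CondExprDemo;

const
  MinProtocol = 3;  // hypothetical constant, used only for this illustration

{$IFDEF CONDITIONALEXPRESSIONS}
  // Only compilers that understand $IF ever evaluate the next directive.
  {$IF Defined(LINUX) and (MinProtocol >= 3)}
    {$DEFINE USE_NEW_PATH}  // hypothetical symbol
  {$IFEND}
{$ENDIF}

begin
{$IFDEF USE_NEW_PATH}
  WriteLn('Linux build, protocol 3 or later');
{$ELSE}
  WriteLn('Fallback build');
{$ENDIF}
end.
```

Turning the $IF result into an ordinary symbol keeps the rest of the source on plain $IFDEF, which every compiler version understands.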
What's Out?
- Variables on absolute addresses. The syntax
var X: Integer absolute $1234; cannot be supported in Position Independent Code and will most likely be thrown out entirely. Using absolute to overlay one variable on top of another variable should not be affected, but it will still earn you some well-deserved ugly looks from your fellow coders.
- The conditional symbol WIN32 is not defined in Kylix.
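For the absolute item above, here is a minimal sketch of the overlay form that should keep working. One variable is laid on top of another, and no fixed address is involved. The variable names are illustrative only.

```pascal
program AbsoluteOverlay;

var
  Value: LongWord;
  // Overlays the four bytes of Value; the linker is free to place
  // Value wherever it likes, so this is position independent.
  Bytes: array[0..3] of Byte absolute Value;
  I: Integer;

begin
  Value := $11223344;
  // The order in which the bytes appear depends on the CPU's byte order.
  for I := 0 to 3 do
    WriteLn('Byte ', I, ' = ', Bytes[I]);
end.
```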
What's Different?
- Stdcall calling convention will be mapped to cdecl. This should have no
tangible effect on Pascal code, but will affect inline assembler code. Win32 STDCALL
has the callee clean up the stack, but in CDECL the caller cleans up. If you have any
stdcall routines implemented in inline assembler that don't exit through the
normal procedure endpoint, or you have inline assembler code that calls stdcall routines,
you'll have some tweaking to do.
- Safecall calling convention will be mapped to cdecl. Safecall will lose all
its special runtime semantics: no function result checking, no raising exceptions, and
when implementing a safecall routine, no trapping of exceptions. Since this drastically
changes the runtime behavior, we'll probably emit a compiler warning whenever your
Kylix code calls or implements a safecall routine. It would be simpler to say Safecall
doesn't exist in Kylix, but that would break too much existing code. Mapping Safecall
to cdecl will allow most existing code to still run correctly; it just won't deal with
exceptions the way the Win32 code does.
Run Time Library
What's New?
- Portable Variant implementation. We've implemented Variant data transport and
coercion in platform independent Object Pascal code. Only the variant data types
listed as OLE Automation compatible on the Windows side have been implemented on the Linux
side. Win32's 12 byte VT_DECIMAL will not be supported.
- WideStrings are now reference counted. In Windows, the Delphi WideString is implemented as an OLE BSTR to maximize data compatibility with OLE and ActiveX APIs. OLE BSTRs / WideStrings are not reference counted like Delphi AnsiStrings, so WideStrings tend to be a bit promiscuous in copying themselves all over the place.
In Linux, there is no WideString compatibility requirement or issue, so we've reimplemented WideStrings to use the same copy-on-write reference count semantics as AnsiStrings. In fact, Kylix WideStrings use many of the same internal RTL support functions as AnsiStrings! How's that for code reuse!
What's Out?
- Units such as ComObj, ComServ, ActiveX, Windows, etc.
- Safecall exceptions
- RaiseLastWin32Error, OleCheck, Win32Check
- ExpandUNCFilename. Linux doesn't support UNC (\\server\directory).
What's Different?
- Filename case sensitivity. Applications that assume the file system is case insensitive
(that is, the application alters the case of user input filenames or doesn't preserve the
case of filenames discovered by FindFirst/FindNext) won't work. Period.
- WideChar is (still) 2 byte Unicode. The Linux widechar type, wchar_t, is actually
4 bytes per character. 4 bytes!!! Ouch! The complete UCS specification calls for 4 bytes
per character to ensure that there is enough room in the character set to adequately represent all known languages and texts, living and dead, and room for future expansion, such as planetary invasion by Vogons. It would be a shame if Earth's character set couldn't represent Vogon poetry in its true native iconographs.
Anyway, nothing in the Linux kernel actually uses 4 byte widechars - the kernel expects
strings (filenames and so forth) to be encoded in UTF-8. Delphi WideChar and WideString will
remain 2 bytes per character Unicode, which just so happens to be a proper subset of the UCS-4
specification. How do you translate Unicode 2 byte chars to UCS 4 byte chars? Add two bytes of
zeros in front.
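A minimal sketch of that widening step follows. The type name TFourByteChar and the function WidenChar are hypothetical, not part of any announced RTL. "Adding zeros in front" amounts to zero-extending the 16 bit value to 32 bits; where the zero bytes physically land in memory depends on the CPU's byte order.

```pascal
program WidenUcs2;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TFourByteChar = LongWord;  // hypothetical name for a 4 byte character cell

// Zero-extend one 2 byte WideChar to a 4 byte value.
function WidenChar(C: WideChar): TFourByteChar;
begin
  Result := TFourByteChar(Ord(C));
end;

var
  S: WideString;
  Wide: array of TFourByteChar;
  I: Integer;
begin
  S := 'Kylix';
  SetLength(Wide, Length(S));
  // WideString is 1-based, the dynamic array is 0-based.
  for I := 1 to Length(S) do
    Wide[I - 1] := WidenChar(S[I]);
  for I := 0 to High(Wide) do
    WriteLn(Wide[I]);
end.
```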
- AnsiStrings encoded as UTF-8. In Windows, AnsiStrings can carry multibyte character sequences, dependent upon the user's locale settings. The multibyte encodings for Japanese, Chinese, Hebrew, Arabic, and other locales are all different and usually incompatible. Linux appears to be standardizing on UTF-8, a multibyte encoding of the 4 bytes per char UCS character standard, as the dominant string data carrier.
UTF-8 has the advantage that it can encode the entire UCS character standard across all
known living languages and text systems, and UTF-8 is very easy to parse (unlike some
Windows mbcs encodings). Linux does also have locales and code page character sets, so
we have some reading to do yet to figure out how they mesh with UTF-8.
At this time we're hopeful that we can use UTF-8 for all AnsiString data everywhere
and make locale charsets and codepages a non-issue.
One side effect of UTF-8, though, is that multibyte character sequences can be
more than 2 bytes long. Most code in Windows (including parts of the Delphi RTL)
assumes that mbcs character sequences are at most 2 bytes in length - a lead byte and
a trail byte. I don't believe this two byte assumption would be a problem for
any Western character sets, but some of the Eastern languages and perhaps
mathematical symbol sets could spike up into the 3 byte UTF-8 range. In the
interest of correctness, existing code that looks like
if p^ in LeadBytes then Inc(p);
should be modified to handle the possibility of one or more trail bytes
following a lead byte. Techniques have yet to be determined.
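One possible approach, offered only as a sketch and not as the technique Kylix will adopt: in UTF-8 every trail byte has the bit pattern 10xxxxxx ($80..$BF), so code can step over the lead byte and then skip any bytes matching that pattern. The helper names NextUtf8Char and Utf8CharCount are hypothetical, and the sketch assumes well-formed, null-terminated input.

```pascal
program Utf8Step;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: advance P past one complete UTF-8 character,
// however many trail bytes it has. Assumes well-formed input.
function NextUtf8Char(P: PAnsiChar): PAnsiChar;
begin
  Result := P;
  if Result^ = #0 then
    Exit;
  Inc(Result);  // step over the lead (or single) byte
  // Trail bytes look like 10xxxxxx; skip all of them.
  while (Ord(Result^) and $C0) = $80 do
    Inc(Result);
end;

// Hypothetical helper: count characters, not bytes.
function Utf8CharCount(P: PAnsiChar): Integer;
begin
  Result := 0;
  while P^ <> #0 do
  begin
    P := NextUtf8Char(P);
    Inc(Result);
  end;
end;

const
  // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), then a terminator.
  Sample: array[0..6] of Byte = ($61, $C3, $A9, $E2, $82, $AC, 0);

begin
  // Six data bytes, three characters.
  WriteLn(Utf8CharCount(PAnsiChar(@Sample[0])));  // prints 3
end.
```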
- Resource string efficiencies. In Windows, resource strings are stored in the
executable file in Unicode format (2 bytes per char). Resource string data is
copied into heap allocated memory as a WideString (OLE BSTR) each time the resource
string is referenced at runtime.
In Linux, Delphi resource strings will be encoded in UTF-8 (1 byte per char,
usually) in the executable file. References to resource strings will resolve to
point directly into the read-only resource section of the executable file mapped
into memory by the program loader. No heap allocations, no data copying. It's just
there.
- File times in Unix format. The file time 32 bit integer in Delphi's FindFirst/FindNext
TSearchRec and returned by functions such as FileAge and FileGetDate is a DOS packed time on
Windows. On Linux, these will return a 32 bit integer in Unix time format. Comparing two such
file time integers on the same platform to determine which file was modified more recently will
still work fine. Code that unpacks the DOS time fields (say, to extract the year) will not work with the
Unix file time integer.
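To see why field unpacking breaks, consider this small sketch. A DOS packed time keeps (year - 1980) in its top 7 bits, while a Unix time is a count of seconds since 1 January 1970. The function name DosYear and the two sample constants are illustrative only.

```pascal
program DosTimeFields;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Extract the year from a DOS packed date/time:
// bits 25..31 hold (year - 1980).
function DosYear(FileTime: LongInt): Integer;
begin
  Result := ((FileTime shr 25) and $7F) + 1980;
end;

const
  // 1 January 2000, 00:00:00 in each format.
  DosStamp  = (20 shl 25) or (1 shl 21) or (1 shl 16);  // year, month, day fields
  UnixStamp = 946684800;                                // seconds since 1970

begin
  WriteLn(DosYear(DosStamp));   // prints 2000
  WriteLn(DosYear(UnixStamp));  // prints 2008: wrong, the bits mean something else
end.
```

Code that only compares two stamps with < or > never looks inside the integer, which is why it survives the change of format.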
- DiskFree, DiskSize. Linux doesn't have drive letters. These functions will probably
be altered or overloaded to accept a path string instead of a drive letter char. Or, these
functions may disappear entirely. To be determined.
- ExtractFileDrive. See above. We'll probably modify this function to always
return an empty string in Kylix.
- Path separator. Linux uses slash '/' to separate directory names in a path, not
backslash '\'. If your code uses the SysUtils utility routines like IsPathDelimiter,
IncludeTrailingBackslash and ExcludeTrailingBackslash and the ExtractFilePath family
of functions that already exist in
Delphi 5, you'll be insulated from the / versus \ platform differences. We'll also introduce
a new constant, PathSeparator, which will contain the appropriate character for the platform.
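As a rough sketch of what "insulated" means in practice, the code below builds a file name using only the SysUtils routines named above and never spells out a delimiter character. JoinPath is a hypothetical helper and the file name is made up.

```pascal
program PathJoin;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: join a directory and a file name without
// hard-coding '/' or '\' anywhere in application code.
function JoinPath(const Dir, FileName: string): string;
begin
  Result := IncludeTrailingBackslash(Dir) + FileName;
end;

var
  FullName: string;
begin
  // ExtractFilePath keeps the trailing delimiter of the program's directory;
  // IncludeTrailingBackslash leaves an existing delimiter alone.
  FullName := JoinPath(ExtractFilePath(ParamStr(0)), 'settings.ini');
  WriteLn(FullName);
  WriteLn(ExtractFileName(FullName));
end.
```

Code written this way has no string literal containing a delimiter, so there is nothing to search and replace when it moves to a different platform.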
Feedback
Obviously, publishing this information is not a one way street. We need feedback
from the Delphi and C++Builder developer communities, as well as from the Linux community
at large.
Just one small request: Don't send me email! There are a lot more of you than
there are of me. You can attach comments to this
article or post comments to the Borland public newsgroups. Responding to comments in a public
forum is a much more effective use of Borland's resources than sending essentially the same
response to several email queries. Email responses only educate one person at a time. Newsgroups
and web posts educate thousands in one fell swoop.
I hope you find this Kylix Compiler and RTL briefing informative and helpful. Now, if
you don't mind, I really need to get back to implementing this stuff!
--Danny Thorpe
Senior Engineer, Delphi R&D
Inprise Corporation