Cheb's Home Page
 
 
 

Chepersy

It's a persistence system, or an ODBMS (object database management system) for FreePascal. Initially it was intended as a basis for a game engine, but its uses are much wider than that.

Russian version >>

Project page at SourceForge >>

Persistence is the ability of data to outlive the execution of the program that created it. Accordingly, the job of a persistence system is to store your entire data structure in a file and to manage its compatibility between different program versions.

 

The last (arguably) stable version is 0.8.2
The latest available version is 0.8.99
The version currently (Nov. 2014) in development is 0.9.00

Changes from 0.8.2:
 - API completely replaced
 - added the feature of walking the object graph
 - added the garbage collector
 - now it is possible to use arrays of records and enumerated arrays
 - changed the error processing paradigm
 - changed the internal architecture


Changelog 0.8.98 -> 0.9.00:
 1. stream format updated, with header md5 now saved before the header
 2. header and scenario caching, based on the md5 checksums. For each distinct checksum, header is loaded and parsed only once per program execution.
 3. support for headerless streams
 4. TManagedObject.Clone()
 5. Fixed a bug that duplicated the field lists in the log.
 6. TManagedObject.Resurrect()
 7. new optional parameter OutputList in CpsLoad()
 8. Scrape made virtual
 9. Added validation of 64-bit fields: an error is raised if they are not aligned to a 64-bit boundary
 10. Added stubs for future support of the "fixed32" type.

TODO list for 0.9.01:
 1. Make sure the alignment prediction and validation work without the forced packing with {$packrecords 4}
 2. Make it possible to register types that are aliases of Double (e.g. TDateTime)
 3. Implement the fixed32 type and its converters.

TODO list for future versions:
 1. As soon as FPC 2.8 is out, implement handling of strings that have a variable code page
 2. As soon as FPC 2.8 is out, make sure {$optimization noorderfields} works as intended

 

User's Manual

Overhead for the programmer

For it to work, you need to give the persistence system a complete list of all your classes and their fields. This is done via a special API; I call this process "registering". To sweeten the pill, the registering mechanism has a powerful validator that explains all your mistakes in detail and in some cases even gives advice. See more in the "Registering" sections.

Overhead for the machine

There's an extra 64-bit field in each class - that's all!

 

This is a manual for version 0.8.99, which is an unfinished 0.9. Some features may not be implemented yet, and there may be discrepancies between this manual and the actual API.


Advantages

- Full forward and limited backward compatibility of your saved data between various program versions.

- Supports circular graphs and cross-links between the class instances.

- Supports partial graph saving by selecting objects using a bit mask.

- Keeps compatibility when you change the field type (integer to extended or ansistring to widestring).

- Rapid saving and loading (over a million class instances per second on a 1.6 GHz CPU).

- Special virtual methods BeforeSaving and AfterLoading that allow you to extend the system's functionality.

- The API allows you to walk the entire object tree: not exactly OQL, but close.

- Customizable garbage collector that can run in either automatic or manually initiated mode.

Downsides

- Barely compatible with multi-threading.

- Slow development by one man.

- No support for platforms other than intel-32. A port to PowerPC is theoretically possible, but the probability of such an event is zilch.

- You need to register all your classes manually (RTTI is too incomplete).

- You can only use classes that are descendants of TManagedObject, which means that if you need a TStringList, you'll have to write one yourself.

- FreePascal only. Support for Turbo Delphi was dropped as of 0.8.99.

- There is a very good chance that Chepersy will turn out to be incompatible with future compiler versions. The system is one big hack.

THE PARADIGM

(The intended way to use it)

- The data/object structure must have one root class

- All of your classes must be descendants of TManagedObject, which supplies the necessary functionality.

- Enumerated types are one of the cornerstones. I exploit the fact that in Pascal, unlike C, the compiler assigns the numeric values to the constants, so the programmer is abstracted from the actual numbers. Arrays indexed by these types ("enumerated arrays", as I call them) and sets based on enumerated types are converted accordingly, however you shuffle the constants in the declaration, remove them, or add new ones. The serialization routines work with the constant names rather than the actual ordinal values.
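To make the name-matching idea concrete, here is a minimal standalone FreePascal sketch; it is not Chepersy's internal code. It assumes the saved data carries the constant names in their old order plus the array values, and it remaps them onto the current declaration with GetEnumValue from the typinfo unit. The enum TStat and the "saved" arrays are hypothetical.

```pascal
program EnumRemapSketch;
{$mode objfpc}{$H+}
// Sketch only: remap an "enumerated array" saved under an old version of an enum
// onto the current declaration by matching constant names. Not Chepersy's code.
uses typinfo;

type
  // Current declaration: constants reordered, 'Mana' added, 'Armor' removed
  TStat = (Mana, Health, Stamina);
  TStatArray = array[TStat] of Integer;

const
  // Hypothetical saved data: names in the OLD declaration order, values in the same order
  OldNames: array[0..2] of string = ('Health', 'Armor', 'Stamina');
  OldValues: array[0..2] of Integer = (100, 7, 50);

procedure RemapByName(const Names: array of string; const Values: array of Integer;
  var Dest: TStatArray);
var
  i, NewOrd: Integer;
begin
  FillChar(Dest, SizeOf(Dest), 0);  // constants unknown to the old data stay zero
  for i := 0 to High(Names) do
  begin
    NewOrd := GetEnumValue(TypeInfo(TStat), Names[i]);  // -1 if the constant was removed
    if NewOrd >= 0 then
      Dest[TStat(NewOrd)] := Values[i];
  end;
end;

var
  a: TStatArray;
  s: TStat;
begin
  RemapByName(OldNames, OldValues, a);
  for s := Low(TStat) to High(TStat) do
    WriteLn(GetEnumName(TypeInfo(TStat), Ord(s)), ' = ', a[s]);
  // Prints: Mana = 0, Health = 100, Stamina = 50 (one per line)
end.
```

Sets work the same way: each set bit is an old ordinal, so it can be looked up by name and re-included under its new ordinal.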

USAGE

(The steps you need to start using Chepersy)

1. Add {$include chepersy_defs.inc} to all your units, above the keyword "Unit". Otherwise your program won't run or, worse, will generate trashed data files, and you'll get crashes instead of backward compatibility.

2. Add the units typinfo and chepersy to your uses clause.

3. Register your types and classes. See more in the sections "Registering your types" and "Registering classes".

4. After your types and classes are registered, you can use

function CpsStore(o: TObject; Target: TStream;
    XorMask: dword = $ffffffff; AndMask: dword = $ffffffff): longbool;
function CpsLoad(Source: TStream): TObject;

Here o is the root class instance of your data structure.

It is your responsibility to create and destroy the streams and to set their position to 0 before loading. Chepersy doesn't require Seek(), so you can use compression streams.

The last two parameters allow you to save your objects selectively, using a bit mask. You can safely omit them if you have never used the CpsMask field. See details in the "Walking the graph" section.

The first call to any of these functions makes further registering impossible.

5. (important) Don't forget: there are *no* constructor calls when your data structure is being loaded! The class instances are created "manually" via a direct call to NewInstance(). Use the AfterLoading() virtual method if your class has any connections to «external» data not included in your data structure (for example, an OpenGL context or texture id).
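The effect of skipping the constructor can be shown with plain FreePascal: NewInstance allocates and zero-fills the object but runs no constructor code, so any external handle must be re-acquired afterwards. This is a standalone sketch; TTextureHolder and its AfterLoading method are hypothetical stand-ins, not TManagedObject.

```pascal
program NoConstructorSketch;
{$mode objfpc}{$H+}
// Sketch: an instance created via NewInstance skips the constructor, so external
// state has to be rebuilt in a hook such as AfterLoading. TTextureHolder is hypothetical.

type
  TTextureHolder = class
    TextureId: LongWord;        // handle into external state (e.g. an OpenGL texture)
    constructor Create;
    procedure AfterLoading; virtual;
  end;

constructor TTextureHolder.Create;
begin
  inherited Create;
  TextureId := 42;              // pretend a real handle was obtained here
end;

procedure TTextureHolder.AfterLoading;
begin
  TextureId := 43;              // re-acquire the handle after loading
end;

var
  o: TTextureHolder;
begin
  o := TTextureHolder(TTextureHolder.NewInstance);  // memory is zeroed, Create is NOT called
  WriteLn('after NewInstance: ', o.TextureId);       // 0
  o.AfterLoading;
  WriteLn('after AfterLoading: ', o.TextureId);      // 43
  o.Free;
end.
```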

REGISTERING YOUR TYPES

(I wish RTTI made it possible to avoid this step. If wishes were fishes...)

To register your types and fields, you need the unit typinfo in your uses clause. Most of the registering procedures take as input the PTypeInfo returned by TypeInfo(YourType). Unfortunately, RTTI is incomplete, and you need to enter many things manually.

Note: an attempt to register the same type twice is silently ignored.

So, meet your best fiend: the procedure RegType();

Integer and real numbers, strings: Everything Pascal has is already registered.

Pointers: Cannot be registered; they are incompatible with the Chepersy paradigm. A class can have fields of such types, but they are always skipped (see the «Registering classes» section).

Classes: See the "Registering your classes" section.

Metaclasses:

Considered already registered for all known classes. Any unknown metaclasses will be downgraded to their known ancestors, but there's no safety checking here, since Chepersy treats them all as the base metaclass, CManagedObject.

Enums:

RegType(TypeInfo(YourType));

All information for these types is available from RTTI. The unit where you declare them should be compiled with {$MINENUMSIZE 4} (already included in chepersy_defs.inc). Your enum can be a subrange type (like 0..20) but its low value must be zero and its high value no higher than 255.
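The constraints above can be checked through RTTI. This standalone sketch reads the low and high values of an ordinal type with GetTypeData from the typinfo unit and confirms the 4-byte enum size under {$MINENUMSIZE 4}. EnumFitsRules and TColor3 are hypothetical; Chepersy's own validator may check more than this.

```pascal
program EnumConstraintSketch;
{$mode objfpc}{$H+}
{$MINENUMSIZE 4}   // the setting chepersy_defs.inc is said to apply
// Sketch: check "low = 0, high <= 255" through RTTI. Not Chepersy's validator.
uses typinfo;

type
  TColor3 = (cRed, cGreen, cBlue);   // hypothetical example enum

function EnumFitsRules(Info: PTypeInfo): Boolean;
var
  Data: PTypeData;
begin
  Data := GetTypeData(Info);
  Result := (Info^.Kind in [tkInteger, tkEnumeration])
    and (Data^.MinValue = 0) and (Data^.MaxValue <= 255);
end;

begin
  WriteLn('SizeOf(TColor3) = ', SizeOf(TColor3));             // 4
  WriteLn('fits rules: ', EnumFitsRules(TypeInfo(TColor3)));  // TRUE
  WriteLn('Integer fits: ', EnumFitsRules(TypeInfo(Integer))); // FALSE: low is not zero
end.
```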

Enumerated arrays (i.e. those indexed with enums):

RegType(TypeInfo(YourType), TypeInfo(BaseType), TypeInfo(IndexEnumType));

RegType(TypeInfo(YourType), '*BaseTypeName', TypeInfo(IndexEnumType));

RegType('*YourTypeName', '*BaseTypeName', TypeInfo(IndexEnumType));

When you change the order of constants in the enum or add/remove some, the serialization routine automatically shuffles the array elements to their new places and fills the new ones with zeros.

Dynamic arrays:

single-dimensional:

RegType(TypeInfo(YourType), TypeInfo(BaseType));

RegType(TypeInfo(YourType), '*BaseTypeName');

multi-dimensional:

RegType(TypeInfo(YourType), N, TypeInfo(BaseType));

RegType(TypeInfo(YourType), N, '*BaseTypeName');

There is no limit on the number of dimensions, but currently only one- and two-dimensional arrays can be resolved if their base type is unknown. Dynamic arrays with three or more dimensions and an unknown base type will make your data file unreadable.

If your multi-dimensional array consists of declared single-dimensional array types, it would be wise to register it as such, to avoid potential compatibility problems in the future.

 

Note for the in-progress v0.8.99: not implemented yet! Low must be zero, your array must be single-dimensional, and size conversion at reading is not supported!

Difference from v0.8.2: Just add an asterisk to your old type name, and everything will stay compatible with your old data files.


Static arrays:

RegType(TypeInfo(YourType), TypeInfo(BaseType), [Low1, High1, ... LowN, HighN]);

Sets:

RegType(TypeInfo(YourType), TypeInfo(BaseEnumeratedType));

You can only use sets based on enums. The limit is 256 elements. The serialization routines automatically reshuffle the set bits when the base enumerated type changes.

 

Difference from v0.8.2, which took an omitted type from the previous field: now the type is taken from the closest *next* field that has a type definition. This way it's closer to the Pascal syntax.

Records:

RegType(TypeInfo(YourType), SizeOf(YourType), []);

Packed records are distinguished from unpacked ones by the size you supply.
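Why the size is enough to tell the two layouts apart: with the same field list, the compiler pads an ordinary record for alignment but not a packed one. This standalone sketch uses two hypothetical records; the sizes in the comments assume FreePascal's default alignment (or {$packrecords 4}) on x86.

```pascal
program RecordSizeSketch;
{$mode objfpc}{$H+}
// Sketch: identical fields, different sizes with and without packing.
// TRec and TPackedRec are hypothetical.

type
  TRec = record
    a: Byte;
    b: LongInt;    // aligned to 4 bytes: 3 padding bytes after a
  end;
  TPackedRec = packed record
    a: Byte;
    b: LongInt;    // follows a directly, no padding
  end;

begin
  WriteLn('SizeOf(TRec) = ', SizeOf(TRec));              // 8
  WriteLn('SizeOf(TPackedRec) = ', SizeOf(TPackedRec));  // 5
end.
```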

The field list format: name1, type1, ... nameN, typeN, where:

Field name is just a string. If it begins with a minus sign, the field is marked as "skipped": it is ignored at saving and filled with zeros at loading.

Field type can be:

 a. TypeInfo(YourType)

 b. String name preceded by an asterisk

 c. the constant CPS_POINTER or the string '*pointer', for pointers and pointer-like fields. These are always skipped.

 d. omitted. The type of the next field is used. The default is dword.

Examples:

RegType(TypeInfo(TMyRecord), sizeof(TMyRecord), ['a', 'b', typeinfo(integer), 'c', typeinfo(byte)]);

RegType('*TMyRecord', sizeof(TMyRecord), ['a', '-b', 'c', '*meine statishch array']);
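The omitted-type rule (item d) and the minus prefix can be illustrated with a small standalone resolver. It is a hypothetical sketch, not Chepersy's parser: TFieldDesc and ResolveTypes are invented names, an empty TypeName stands for an omitted type, and the dword fallback is assumed to apply when no later field has a type.

```pascal
program FieldTypeFillSketch;
{$mode objfpc}{$H+}
// Sketch of the rule: an omitted type is taken from the closest NEXT field that has
// one, falling back to dword. TFieldDesc and ResolveTypes are hypothetical.

type
  TFieldDesc = record
    Name: string;
    TypeName: string;   // '' = omitted in the field list
    Skipped: Boolean;   // name started with '-'
  end;

procedure ResolveTypes(var Fields: array of TFieldDesc);
var
  i: Integer;
  NextType: string;
begin
  NextType := 'dword';                     // default when no later field has a type
  for i := High(Fields) downto 0 do        // walk backwards so the "next" type is known
  begin
    if Fields[i].TypeName = '' then
      Fields[i].TypeName := NextType
    else
      NextType := Fields[i].TypeName;
    Fields[i].Skipped := (Fields[i].Name <> '') and (Fields[i].Name[1] = '-');
    if Fields[i].Skipped then
      Delete(Fields[i].Name, 1, 1);        // keep the name without the minus
  end;
end;

var
  f: array[0..3] of TFieldDesc;
  i: Integer;
begin
  // Roughly mirrors ['a', 'b', typeinfo(integer), '-c', typeinfo(byte), 'd']
  f[0].Name := 'a';  f[0].TypeName := '';
  f[1].Name := 'b';  f[1].TypeName := 'integer';
  f[2].Name := '-c'; f[2].TypeName := 'byte';
  f[3].Name := 'd';  f[3].TypeName := '';
  ResolveTypes(f);
  for i := 0 to High(f) do
    WriteLn(f[i].Name, ': ', f[i].TypeName, ' skipped=', f[i].Skipped);
  // a: integer, b: integer, c: byte (skipped), d: dword
end.
```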

Important note: there is no way to check if your fields are listed in the correct order. Only the overall size check is performed. So be careful. Also, watch out for the {$packrecords ...} directive: it can become your undoing.

REGISTERING CLASSES

(A bit boring thing to do but there's no way around it)

 

Important note: in v0.8.95 I briefly introduced a mechanism for using any classes, not just descendants of TManagedObject.

I removed it in 0.8.96: not only was it awkward and limiting, it also contained an algorithmic black hole. I don't want to deal with that, there is no pressing need for this stuff - so there will be no support for custom classes. Ever.

1. Derive your class from TManagedObject, overriding its virtual method RegisterFields().

This method is responsible for registering your class' fields. In an ideal world, making all your fields published would be sufficient, but alas, RTTI is too weak.

type TMyClass = class (TManagedObject)
  a, b: TBlaBlaBla;
  c: integer;
public
  procedure RegisterFields(); override;
end;

 

Difference from v0.8.2, which took an omitted type from the previous field: now the type is taken from the closest *next* field that has a type definition. This way it's closer to the Pascal syntax.

2. Register your fields

This operation is performed from the RegisterFields method of your class, which is called by RegType()/RegClass() (see below).

First you register your field's types if they aren't registered yet.

Second, you call inherited;

Third, you call ListFields() and feed it the complete list of your class fields. The order must match their declaration order. The list consists of three elements repeated for each field:

1. Name. It doesn't need to match the real field name, you can call it '$@# ,,mah-feeld' - but it should not begin with the asterisk. If it begins with the minus, a skipped field will be registered, with its name excluding the initial minus.

2. Field address.

3. Field type. It may be given as TypeInfo(YourType), '*YourTypeName', CPS_POINTER, '*pointer', CPS_METACLASS or '*metaclass'. The last four are for pointers and pointer-like fields (which are skipped) and for metaclasses (think of TClass). You can omit the type; the next field's type is then used.

procedure TMyClass.RegisterFields();
begin
 RegType(TypeInfo(TBlaBlaBla), ...);
 inherited;
 ListFields([  'a', @a, //you can omit TypeInfo
  'b', @b, TypeInfo(TBlaBlaBla),
  '-c', @c, TypeInfo(integer) //a skipped field
 ]);
end;

3. Call RegClass(TMyClass) or RegType(typeinfo(TMyClass))

This in turn creates a class instance (bypassing the regular constructor call, of course). I was unable to find a workaround: many necessary functions just won't work without a class instance. Then its RegisterFields() method is called.

Important: if, at this moment, your class' ancestors aren't registered yet, they will be auto-registered, recursively, until TManagedObject is reached.

4. The virtual methods BeforeSaving() and AfterLoading()

...allow you to do various tricks and manually resolve the skipped fields (like those where you store system handles or OpenGL texture names).

BeforeSaving() is called during the saving process. Some class instances may be already saved at this point.

AfterLoading() is called after *all* class instances have been loaded. The calls are made in reverse order, which means that the objects referenced by your class's fields are already valid at this moment (i.e. their AfterLoading() has already been called).
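A standalone sketch of why reverse order helps: if every object is created before the objects it refers to (an assumption of this sketch, true for a tree loaded top-down), calling the hooks from the last-created object back to the first processes children before their parents. TNode and the Created list are hypothetical, not Chepersy internals.

```pascal
program ReverseHookSketch;
{$mode objfpc}{$H+}
// Sketch: call AfterLoading in reverse creation order, so a child's hook runs
// before its parent's. TNode is hypothetical.
uses Classes;

type
  TNode = class
    Name: string;
    Child: TNode;
    Ready: Boolean;
    procedure AfterLoading; virtual;
  end;

procedure TNode.AfterLoading;
begin
  if Child <> nil then
    WriteLn(Name, ': child ', Child.Name, ' ready = ', Child.Ready);
  Ready := True;
end;

var
  Created: TList;
  Root, Leaf: TNode;
  i: Integer;
begin
  Created := TList.Create;
  Root := TNode.Create;  Root.Name := 'root';  Created.Add(Root);  // created first
  Leaf := TNode.Create;  Leaf.Name := 'leaf';  Created.Add(Leaf);
  Root.Child := Leaf;
  for i := Created.Count - 1 downto 0 do   // reverse order: leaf, then root
    TNode(Created[i]).AfterLoading;
  // Prints: root: child leaf ready = TRUE
  for i := 0 to Created.Count - 1 do
    TNode(Created[i]).Free;
  Created.Free;
end.
```

Circular references have no such natural order, which is presumably why the manual only promises validity for fields whose objects were loaded later.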

TYPES AUTOCONVERSION

(You realized too late that your type should have been GLfloat instead of Integer...)

It's built in and tested, and it works as intended. All Pascal numeric types are cross-compatible at loading. The same goes for the ansistring/widestring pair.

User extensibility is currently removed.

WALKING THE GRAPH

(Routines allowing you to walk and mark your class graph)

This is what promotes Chepersy from a simple persistence system to the proud ranks of database engines.

Everything here is extremely simple. First, you create your own procedure according to this template:

type TcustomWalkProc = procedure(o: TObject);

Then, you pass it as a procedural variable to this:

function CpsWalkGraph(o: TObject; proc: TcustomWalkProc): boolean;

Your procedure will be called once for each object in the graph whose root is o.

The rest of functionality is in your hands. You can filter the objects by some criteria, or call their methods, or make them wear funny hats.
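For readers wondering how "once for each object" can hold in a circular graph, here is a standalone sketch that uses the same callback signature as TcustomWalkProc. It is not CpsWalkGraph: TNode, its Links array, Visit and WalkGraph are hypothetical, and a visited list (linear search via TList.IndexOf, fine for a sketch) stops the walk from looping.

```pascal
program WalkGraphSketch;
{$mode objfpc}{$H+}
// Sketch of a cycle-safe graph walk with a TcustomWalkProc-style callback.
// TNode, Visit and WalkGraph are hypothetical, not the Chepersy API.
uses Classes;

type
  TCustomWalkProc = procedure(o: TObject);

  TNode = class
    Name: string;
    Links: array of TNode;
  end;

procedure Visit(o: TNode; Seen: TList; proc: TCustomWalkProc);
var
  i: Integer;
begin
  if (o = nil) or (Seen.IndexOf(o) >= 0) then Exit;  // nil or already reported
  Seen.Add(o);
  proc(o);
  for i := 0 to High(o.Links) do
    Visit(o.Links[i], Seen, proc);
end;

function WalkGraph(o: TNode; proc: TCustomWalkProc): Boolean;
var
  Seen: TList;
begin
  Seen := TList.Create;
  try
    Visit(o, Seen, proc);
    Result := True;
  finally
    Seen.Free;
  end;
end;

procedure PrintName(o: TObject);
begin
  WriteLn((o as TNode).Name);
end;

var
  a, b: TNode;
begin
  a := TNode.Create;  a.Name := 'a';
  b := TNode.Create;  b.Name := 'b';
  SetLength(a.Links, 1);  a.Links[0] := b;
  SetLength(b.Links, 1);  b.Links[0] := a;   // cycle: a -> b -> a
  WalkGraph(a, @PrintName);                  // prints a, then b, once each
  a.Free;  b.Free;
end.
```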

Bit masks

Since v0.8.95 there is a new dword field CpsMask in the TManagedObject class. It is used to save objects selectively, giving you 32 independent bit flags that are selected by the new parameters XorMask and AndMask of the CpsStore function. The XorMask specifies which bits to invert before checking, and the AndMask specifies which bits to check. If any of the checked bits is non-zero, the object is written. Otherwise it is not, and the field or array element containing it will be NIL after loading. The size of arrays of objects is not affected; they'll just contain some NILs. You have the AfterLoading() method to take care of that.
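The selection rule described above boils down to one expression. This standalone sketch writes it as a hypothetical helper, ShouldStore (not part of the Chepersy API), and walks through a few mask combinations.

```pascal
program MaskCheckSketch;
{$mode objfpc}{$H+}
// Sketch of the selection rule: invert the XorMask bits, keep the AndMask bits,
// store the object if anything is left. ShouldStore is hypothetical.

function ShouldStore(CpsMask, XorMask, AndMask: DWord): Boolean;
begin
  Result := ((CpsMask xor XorMask) and AndMask) <> 0;
end;

begin
  // Default parameters ($ffffffff, $ffffffff): an object with an all-zero mask is stored
  WriteLn(ShouldStore(0, $ffffffff, $ffffffff));   // TRUE
  // Store only objects with bit 0 set: no inversion, check bit 0
  WriteLn(ShouldStore(%01, 0, %01));               // TRUE
  WriteLn(ShouldStore(%10, 0, %01));               // FALSE
  // Store only objects with bit 0 CLEAR: invert bit 0, then check it
  WriteLn(ShouldStore(%10, %01, %01));             // TRUE
end.
```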

Do not forget: the mask is a part of the object, so it gets stored and loaded with it! But you have the

function CpsMarkupGraph(o: TManagedObject; SetMaskBits, ClearMaskBits: dword): longbool;

that allows you to mark your entire object graph.

Since v0.8.99, bits 30 and 31 are reserved for the garbage collector. Trying to change them via CpsMarkupGraph will raise an assertion error, but still be careful not to change them accidentally.
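A per-object version of the update CpsMarkupGraph applies to a whole graph might look like this standalone sketch. Two points are assumptions rather than facts from this manual: the set bits are applied before the clear bits, and the reserved-bit check raises an ordinary exception (the manual says Chepersy raises an assertion error). UpdateMask and GcBits are hypothetical names.

```pascal
program MarkupSketch;
{$mode objfpc}{$H+}
// Sketch of a per-object mask update. Assumptions: set-then-clear order, and an
// ordinary exception for reserved bits. UpdateMask and GcBits are hypothetical.
uses SysUtils;

const
  GcBits = DWord($C0000000);   // bits 30 and 31, reserved for the garbage collector

function UpdateMask(Mask, SetMaskBits, ClearMaskBits: DWord): DWord;
begin
  if ((SetMaskBits or ClearMaskBits) and GcBits) <> 0 then
    raise Exception.Create('bits 30 and 31 are reserved for the garbage collector');
  Result := (Mask or SetMaskBits) and not ClearMaskBits;  // GC bits in Mask are preserved
end;

begin
  WriteLn(UpdateMask(%0110, %0001, %0100));   // 3 (binary 0011)
end.
```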

GARBAGE COLLECTOR

()

ERROR PROCESSING

(How Chepersy reacts to exceptions and other unexpected things)

Any errors at registering raise an exception. You must catch it yourself and check the Chepersy error log (see below).

Any errors and exceptions at loading, saving and walking the graph are encapsulated. The corresponding functions return False or Nil. Check these three TStringLists for more information:

CpsParserWarnings, //gets filled at loading if there are any classes that need conversion

CpsLog,

CpsError

If you need to display an error message, output the CpsError elements in a backward order to get a well structured explanation.
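Printing a TStringList back to front is a plain downto loop. This standalone sketch uses a local list with placeholder lines in place of CpsError; the idea that the innermost failure is added first is an assumption inferred from the advice above.

```pascal
program ErrorLogSketch;
{$mode objfpc}{$H+}
// Sketch: print a TStringList from the last line to the first, as suggested for
// CpsError. The list and its placeholder lines are hypothetical.
uses Classes;

var
  Errors: TStringList;
  i: Integer;
begin
  Errors := TStringList.Create;
  try
    Errors.Add('line added first');   // presumably the innermost detail
    Errors.Add('line added second');
    Errors.Add('line added last');    // presumably the top-level summary
    for i := Errors.Count - 1 downto 0 do
      WriteLn(Errors[i]);             // last, second, first
  finally
    Errors.Free;
  end;
end.
```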

If possible, a rollback is performed and the loaded class instances are deleted. But there is no guarantee against instability and memory leaks. The initial paradigm was to perform an emergency exit if anything went wrong with Chepersy.

Objects skipped at loading

These are automatically added to the garbage collector graveyard. See the "Garbage collector" section for more details.

PERFORMANCE TIPS

(How to achieve higher flexibility and speed of serialization)

Some types are "accelerated". Whenever possible, consecutive fields and arrays of these types are handled as a solid binary block, which significantly speeds up serialization.

These are: all enumerated types, longint, dword, longbool, int64, qword, single, double.

Any other types require a few procedure calls per field or array element (imagine how wasteful an array of byte is).

Ansistrings and widestrings aren't that bad: their contents are treated as a solid binary block as well.
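The difference between the two paths can be seen with an ordinary TMemoryStream: writing an array of LongInt element by element costs one call per element, while a single Write of the whole block produces exactly the same bytes. This is a standalone illustration of the idea, not Chepersy's serializer.

```pascal
program BlockWriteSketch;
{$mode objfpc}{$H+}
// Sketch: per-element writes vs one block write of an array of LongInt.
// Illustrates the "accelerated types" idea only; not Chepersy's serializer.
uses Classes;

var
  Data: array of LongInt;
  PerElement, Block: TMemoryStream;
  i: Integer;
begin
  SetLength(Data, 1000);
  for i := 0 to High(Data) do
    Data[i] := i;

  PerElement := TMemoryStream.Create;
  Block := TMemoryStream.Create;
  try
    for i := 0 to High(Data) do
      PerElement.Write(Data[i], SizeOf(LongInt));          // 1000 calls
    Block.Write(Data[0], Length(Data) * SizeOf(LongInt));  // 1 call
    WriteLn('sizes: ', PerElement.Size, ' ', Block.Size);  // 4000 4000
    WriteLn('identical: ',
      CompareByte(PerElement.Memory^, Block.Memory^, Length(Data) * SizeOf(LongInt)) = 0);  // TRUE
  finally
    PerElement.Free;
    Block.Free;
  end;
end.
```

The block path only works because the in-memory layout of these types matches the file layout; types needing conversion (such as byte widened on load) fall back to the per-element path.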

COMPATIBILITY

(The known (in)compatible Pascal compilers list)

(Only intel-32!)

Free Pascal 2.2.0: compatible.

Free Pascal 2.2.2rc1: compatible.

Free Pascal 2.2.2: compatible.

Free Pascal 2.4.0: compatible, it's the primary development tool.

Any Delphi versions: NOT compatible and will never be.