Monday, August 8, 2011

Quadratic Bézier Curves Stroker

Introduction

Stroking is a common task of every graphics library. When I started writing Fog-Framework, I used algorithms from antigrain to work with vector geometry. However, the antigrain can only stroke polygons, so quadratic and cubic Bézier curves must be first approximated into the line segments. When a graphics path is stroked, the information about curves is definitely lost, and very acute angles are not stroked properly unless very large value is used for property called approximation-scale (it's called flatness in other graphics libraries).

This inaccuracy can lead into problems. For example to stroke a path twice the curve must be approximated into many line segments. More line segments means more vertices, which means more cpu power to transform them, for example.

The purpose of this article is to demonstrate the work to replace the old polygon stroker with a new curve-to-curve stroker. The actual limitation is that current version (2011-08-08) can stroke quadratic Bézier curves only.

Background and Implementation Notes

The new curve-to-curve stroker is highly inspired by the work of Mike "Pomax" Kamermans, see his excellent resource A primer on Bézier curves, section How to scale curves. Please try the interactive example provided in the linked article. The quadratic Bézier curve scaling is definitely the way to go. This method is precise for "safe" curves, which are defined as curves where the angle between the two lines which define the curve is not acute. I found that this condition is not the best one and I adjusted the angle to 100 degrees.

If the "safe" curve criteria is not met, then the curve must be subdivided into smaller ones. There are several ways how the subdivision criteria are created. The author of Bézier primer used subdivision based on curve extremas (curve is first rotated so extrema is always found, if curve is not degenerate), but I noticed that the offset of commonly used curves is not precise, so I adjusted the mechanism how the curve is subdivided and I added some conditions for subdivision. This means that Fog-Framework generates more curves, but I found it more precise (for me it's perfect).

Results

The results of new curve-to-curve stroker (left) vs the antigrain stroker (right). The number of resulting curve segments generated by new curve-to-curve stroker depends on the curvature of the original curve, the number of line segments generated by antigrain stroker depends on the curve length and the flatness which was set to 0.25 (the default value).

As can be seen, the degenerate cases are handled more precisely by curve-to-curve stroker, because the corners are approximated using curves, instead of lines. I think that the curve-to-curve stroker generates a lot of curves in this case, but the resulting shape looks better than shape approximated by using line segments only.

Conclusion

This is only the beginning of my stroker implementation. I'd like to post another article related to the stroking of cubic Bézier curves after finished. The final implementation will be part of Fog-Framework, so everyone can use it without restrictions.

Friday, February 25, 2011

News

News

Some people are interested about the news in Fog-Framework, because the last commit is more than two months old. It's simple, I'm working on something really new that took me all the free time. I'd like to summarize my goals, my work and the future of Fog-Framework in this post.

First of all, I'd like to highlight my primary goal - make the library usable for consumers! This is really my primary goal and all my work is focused on it. It means that API must be stabilized, conventions must be finished and long-time incomplete features completed. My second goal is to make the library really fast and full-featured. The speed is very important and the incoming changes increase the performance of the library a bit. I think that Fog-Framework is going to be the fastest open-source 2d graphics library available.

The summary of incoming features:

  • Support for 32-bit integer, 32-bit/64-bit floating point geometry (integer geometry is limited and only used for basic shapes like rect/box).
  • Support for high-quality image formats (16-bit per component) including high-quality rasterizer, compositing and image-filtering.
  • Support for 24-bit RGB image format and 16-bit 555/565 image formats.
  • Support for LCD anti-aliasing and rasterization.
  • Increased number of compositing functions for SRC and SRC_OVER operators, reduced number of all other compositing operators (these are not so used).
  • Remove non-premultiplied ARGB format. It always caused problems.

Of course not everything is completed, but I did really progress to support all of these features and to allow creating additions in the future. The number of image formats will probably not grow, because I think that all (A)RGB formats are included. Only exceptions are HSV/CMYK and others, but do we need these?

The summary of extensions which will be implemented this year:

  • Shaders - shader language which can be used as a source pattern generator, pixel compositor and image filter.
  • C/C++ plugins - custom pattern generator and compositor extensions written in C++.
  • JIT compiler - my long-time dream.

It's now really easy to think about shaders and C/C++ extensions, because I can simply use floating-point based mathematics for this purpose. The AHSV conversion will be possible on-the-fly so custom shaders / compositing can bring really cool effects for UI and other graphics.

Summary

I know that these changes are huge. It took me a lot of time to refactor the library, create new API conventions and helper libraries (Fog-Face), but I'm sure that these steps were needed to make library more usable for various consumers and various demands. How many libraries support 64-bit image formats? How many libraries allow to use C++/Shader extensions? This all is going to be available in Fog - optimized to the death, portable and easy to use.

Other thinking

I saw many graphics libraries and it's interesting that Qt and Cairo is using division by 255 with floating-point precision (there is trick to do it fast). I used that division in old code too, but now I changed everything to scale to 256, adding 1 to the component or alpha to make it 0...256. The main reason why the div255 is used is the fact, that image formats are based on BYTEs and only value from 0 to 255 can be stored in it.

But the main problem is that when using fixed-point rasterizer, bilinear filter and all other effects/mathematics, the resulting scale is always from 0 to 256, inclusive. For example AGG rasterizer produces values from 0 to 256, but the 256 is saturated to 255 by sweep_scanline(). So I introduced some new span formats to use 0...256 scale (using WORD) and 0...65536 scale (using DWORD) and rasterizer is now able to create mask using non-saturated values. More memory impact is not visible, because each scanline produced by rasterizer is discarded after use. The positive thing is that blending is a bit faster, because division by 256 is really simple compared to optimized division by 255.

Update: Now only mask is scaled, alpha-blending uses div255.

I'd like to add specialized solid-color rasterizer that will be able to fill image directly without producing a scanline for some pixel formats. This should be faster and it's the upper limit where I can go keeping high-quality AGG rasterizer.

And when the code will be ready? I hope that early. I need to integrate Fog-Framework with one commercial product I'm working on so I really need it to be stable and fast:)

Friday, October 22, 2010

Vector geometry resources

Introduction

Small post about resources I found useful when working on Fog-Framework. Mainly about geometry - transforms, bezier curves, etc...

Code

Papers, Materials

Computer Aided Geometric Design (http://cagd.cs.byu.edu/~557/)

Wiki:

Interactive

More...

If you need more just google for them. If you think that I'm missing something just leave a note;-)

Monday, September 20, 2010

News, Class Naming Schema, Transforms

Introduction

Class naming schema describes how classes are named according to their features. Fog-Framework contains many version of classes describing the same problem. The main difference between them is, however, the data-type used to store class members. The main goal of this post is to introduce new class naming schema.

History

There were used several schemas in history, but it was never consistent in all library modules. For example Point and PointF were initial versions of vertex-based classes. Point was integer version and PointF was double-precision floating point version.

This way was changed when I introduced classes to also work with points based on 32-bit float data type. The reason to add also this version is simple - OpenGL support and improving performance of Fog-Framework running by hardware without double-precision floating point unit.

New scheme was to prefix class using the data-type. So Point was changed to IntPoint and PointF was changed to DoublePoint, introducing new class named FloatPoint. This scheme was better, because you are always looking at the data-type, but it's longer and the prefix doesn't look good.

There are several classes where schema was never applied. For example Argb class describes color using scalar 32-bit number (8-bit per component), but how to name that if I want two versions, floating-point based and integer-based? FloatArgb looks really weird to me so I started thinking about a good replacement.

Class Naming Schema

After some hours of work I decided to use suffix-form. Each class starts with its name and if there are several versions of it the suffix is added. Suffix always starts using Capitalized letter and can be only one character long. For example there are PointI, PointF and PointD classes that represents point now.

Argb class was renamed to ArgbI and ArgbF class was introduced. I added more classes related to color management so there are also AhsvF and AcmykF classes.

List of classes used for color management

  • ArgbI - ARGB color using 8-bit per entity (packed to 32-bit scalar).
  • ArgbF - ARGB color using 32-bit float per entity.
  • AhsvF - AHSV color using 32-bit float per entity.
  • AcmykF - ACMYK color using 32-bit float per entity.
  • Color - Class that can hold ArgbI, ArgbF, AhsvF or AcmykF.

List of geometric classes:

  • PointI - Point (X, Y) using 32-bit integers.
  • PointF - Point (X, Y) using 32-bit single-precision floats.
  • PointD - Point (X, Y) using 64-bit double-precision floats.
  • SizeI - Size (W, H) using 32-bit integers.
  • SizeF - Size (W, H) using 32-bit single-precision floats.
  • SizeD - Size (W, H) using 64-bit double-precision floats.
  • BoxI - Box (X1, Y1, X2, Y2) using 32-bit integers.
  • BoxF - Box (X1, Y1, X2, Y2) using 32-bit single-precision floats.
  • BoxD - Box (X1, Y1, X2, Y2) using 64-bit double-precision floats.
  • RectI - Rect (X, Y, W, H) using 32-bit integers.
  • RectF - Rect (X, Y, W, H) using 32-bit single-precision floats.
  • RectD - Rect (X, Y, W, H) using 64-bit double-precision floats.
  • TransformF - 3x3 matrix using 32-bit single-precision elements.
  • TransformD - 3x3 matrix using 64-bit double-precision elements.
  • PathF - Path using 32-bit single-precision vertices.
  • PathD - Path using 64-bit double-precision vertices.
  • PathBuilderF - Path-builder using 32-bit single-precision vertices.
  • PathBuilderD - Path-builder using 64-bit double-precision vertices.

Goals

The main reason to change the naming schema of classes was the future. Any class can be extended without worrying about syntax from now. For example 64-bit ARGB color can be added in the future (ArgbD), without breaking the API and binary compatibility.

News - Transform

I'd sucessfully implemented new matrix classes called Transform[F|D] which replaced [Float|Double]Matrix. I used similar technique which can be found in Qt/Skia - optimizing matrix transformation / point transformation using a matrix type. The matrix types are identity, translation, scaling, rotation, affine and projection.

As you noticed there is new transform not available before - Projection. The Transform class is 3x3 matrix that allows to produce nice transformations in perspective. This kind of transformation is available nearly in all other libraries

News - Color Management

There is also new Color class that allows to store into it ARGB, AHSV or ACMYK color. The class remembers which color model was used to store a color so it should be suitable to planned PDF/PS support (CMYK enabled output files). This feature will be also used by Fog-Svg library.

Other Plans

Because I added some exciting features (3x3 transform) I need to rewrite some things to match and coöperate with the new API. The first candidate for rewriting is curve flattening code. It's still mainly original code from AntiGrain and it has some disadvantages:

  • No clipping in the algorithm, must be clipped later.
  • There are several papers that describe better algorithms (BezierFlattening).
  • Curve flattening + stroke generator should be done ideally together.
  • Stroking to curves should be possible.

The next big thing is to refactor a Painter and to improve a multithreaded rasterizer (after I removed rasterizer-pool there is extra cost to create/destroy rasterizer per paint command).

Conclusion

Refactorization is good and I'm making it often, but the class naming schema breaks source/binary compatibility so it shouldn't be changed so often. I hope that I found the schema that will be used long time and that will be sufficient for all possible features which can be implemented later.

NOTE: These changes haven't been committed yet.

Tuesday, August 31, 2010

Analytic Rasterizer Improvements

Introduction

This article covers possibilities to improve the analytic rasterizer implemented by AntiGrain (originally by freetype or libart). It's some time ago I posted a question in an antigrain mailing list, but it's probably very specific so nobody responded. It's all about compressing cell structures generated by the analytic rasterizer.

Explanation

Before I start I'd like to explain how antigrain rasterizer works. Please never cite this explanation, because it's not detailed and it can be crippled by my level of English.

Imagine that you need to paint a triangle. Triangle is basic primitive and for each rasterizer it's a minimum. These steps are needed to rasterize it using an analytic method:

  1. Serialize the input coordinates (this is your call to fillTriangle() for example)
  2. Clip incoming vertices
  3. Iterate over edges and create cells
  4. Sort cells
  5. Sweep and composite each scanline

Bunch of cells are generated for each rasterized edge. The count of cells is relative to the length of the edge. Please read this article that covers the details.

For us is important that original cell structure in antigrain looks like this:

struct Cell
{
  int x;
  int y;
  int cover;
  int area;
};

This structure takes 16 bytes for x86/x64/Arm/... platforms. It's needed to sort generated cells by Y and then by X. Antigrain solves this problem using this technique:

  1. Sort first by Y using histogram. This is fastest than qsort and it's about two loops.
  2. Sort by X using bubble sort or qsort (if there are many cells).

You have choice to store pointers into sorted array or to copy generated cells into it. It seems that pointers are better in 32-bit mode and to copy whole structure without Y member in 64-bit one. No proof for this, just my testing before I rewrote the rasterizer.

Improvements

The first improvement I made is structure compression. At this time I have two variants of Cell structure - CellD and CellQ. CellD is 32-bit DWORD and CellQ is 64-bit QWORD. I compressed the structures by the following way:

DWORD (32-bit):

  • X - 13 bits (up to 8192 pixels in horizontal)
  • Cover - 10 bits
  • Area - 9 bits

QWORD (64-bit):

  • X - 32 bits
  • Cover - 16 bits
  • Area - 16 bits

There is stored 'area' member in the original antigrain cell which is calculated by 'cover * weight'. The result can take more than 16 bits so I'm storing the 'weight' instead, making the multiplication later in sweepScanline(). This trick allows to store one cell per 32-bit DWORD. This saves 75% space needed for a cell.

Careful reader should notice that there is no 'y' member in the cells used by Fog. The reason is that the Fog rasterizer don't need it, because cells are immediately stored in a space reserved for each scanline. I'm using single linked lists of small chunks of cells (14-15). The linked list is allocated using 64-byte alignment to not disturb when fetching cells by a multi-threaded painter.

Summary

  • Fog rasterizer is not sorting in Y direction, cells are already sorted.
  • Fog contains two cell structures - CellD and CellQ. CellD is expected to be used nearly by all paint calls.
  • Fog allocates cells per chunks. This means that if only two cells are needed for some scanline 14 or 15 cells are allocated.
  • The cell chunks overhead is about 7%. The cell compression rate is 74% for CellD and 47% for CellQ (including cell chunks overhead).

Conclusion

The Fog performance is better than before and memory overhead per rasterizer was reduced. This is only the beginning, I have another ideas how to improve multi-threaded rasterizer performance and to reduce memory overhead when using it. Currently Fog is using new rasterizer per each shape rendered using multi-threaded painter engine, the ideal solution will be to use one rasterizer per multi-threaded worker.

Another great improvement are rasterizers specialized for known shapes. I committed analytic rasterizer to fill rectangle (not axis aligned, sub-pixels are visible) some time ago. The performance of this rasterizer is really great, it's only about 50% slower than rasterizer used to fill axis-aligned rectangle, which is limited only by memory access speed.

How to say more? I'm making improvements and I hope that my work will be actively used some day. I read about two core Arm processor so maybe high-quality multi-threaded painting engine is future also for embedded devices.

Back in the School

Back in the School

It sounds unbelievable, but I'm back in the education system. After one year of working (I'm still working, this is just bonus) I decided to go back to school and to study. The reason is very simple - Fog-Framework. Yeah I'm going to work on Fog at school time and I really wonder about progress I can do.

I have some plans with Fog, but my biggest effort is to complete Fog-Core and Fog-Graphics. These parts are important to me and are important to make library usable. I started some design and implementation notes in the wiki:

The first thing I'd like to do is to switch into JIT compilation. I started writing Fog with this idea and I think that it's time to demonstrate the power of JIT in 2d graphics. The AsmJit tool is getting better and better and with 1.0 version I feel that it's the time to use it.

The second thing I'd like to do is to redesign painter and complete all unfinished stuff:

  • Painter
  • Font support
  • Image and Path effects

It seems that it's just three points, but it will be hard work and I'm looking forward to it!

Friday, August 6, 2010

Fog - Performance of kiia rasterizer

Introduction

My long-time TODO item was to implement KIIA rasterizer described here. If you download binaries pre-compiled for 32-bit Windows you should notice that the presented rasterizer is about 3 times faster than analytic rasterizer implemented in antigrain. The rasterizer uses classical approach - edges and AET (active edge table). Unlike the analytic rasterizer it needs scanlines to be rasterized from top-to-bottom direction (because of AET).

The biggest problem that stopped me many times from adding implementation into Fog was that there are too many source files, macros and I hadn't time to make basic working implementation. Last three days I was without internet so I tried to port code into Fog-Framework. I created Rasterizer_Kiia.cpp/.h files, implemented even-odd filler for simple clipping and started testing.

Performance

I must really say that performance of KIIA rasterizer was not as expected. I expected at least 50% performance gain, but in many tests the results are nearly equal to Fog analytic rasterizer and in about half tests the results are worse. Only area where KIIA rasterizer performance was better is polygon rasterization test.

Some FogBench results for comparison:

FogBench - Fog-Framework performance suite (version 0.3)

Surface  : 640x480
Quantity : 1000

Processor:  Intel(R) Pentium(R) M processor 1.70GHz
Features1: MMX=yes, MMXExt=yes, 3dNow=no, 3dNowExt=no, SSE=yes
Features2: SSE2=yes, SSE3=no, SSSE3=no
CPU count: 1

Test                    |Size      |KIIA-8x         |Analytic
------------------------+----------+----------------+---------------
FillRect-Argb-Copy      |256x256   |    40.817 [ms] |    40.881 [ms]
FillRect-Argb-Over      |256x256   |   198.905 [ms] |   197.983 [ms]
FillRect-Grad-Copy      |256x256   |   209.977 [ms] |   210.027 [ms]
FillRect-Grad-Over      |256x256   |   647.795 [ms] |   631.185 [ms]
FillRectSubPX-Argb-Copy |256x256   |   189.941 [ms] |   110.987 [ms]
FillRectSubPX-Argb-Over |256x256   |   345.159 [ms] |   268.764 [ms]
FillRectSubPX-Grad-Copy |256x256   |   418.936 [ms] |   336.705 [ms]
FillRectSubPX-Grad-Over |256x256   |   774.359 [ms] |   696.499 [ms]
FillRectAffine-Argb-Copy|256x256   |   206.611 [ms] |   190.822 [ms]
FillRectAffine-Argb-Over|256x256   |   360.999 [ms] |   342.187 [ms]
FillRectAffine-Grad-Copy|256x256   |   431.458 [ms] |   414.006 [ms]
FillRectAffine-Grad-Over|256x256   |   781.907 [ms] |   766.389 [ms]
FillRound-Argb-Copy     |256x256   |   178.653 [ms] |   100.603 [ms]
FillRound-Argb-Over     |256x256   |   328.033 [ms] |   255.203 [ms]
FillRound-Grad-Copy     |256x256   |   394.408 [ms] |   333.215 [ms]
FillRound-Grad-Over     |256x256   |   755.879 [ms] |   691.855 [ms]
FillPolygon-Argb-Copy   |256x256   |   156.355 [ms] |   230.961 [ms]
FillPolygon-Argb-Over   |256x256   |   195.185 [ms] |   268.732 [ms]
FillPolygon-Grad-Copy   |256x256   |   220.645 [ms] |   293.397 [ms]
FillPolygon-Grad-Over   |256x256   |   283.699 [ms] |   358.786 [ms]
Image-Copy              |256x256   |    83.364 [ms] |    82.404 [ms]
Image-Over              |256x256   |   249.658 [ms] |   249.129 [ms]
ImageAffine-Copy        |256x256   |  1838.421 [ms] |  1844.663 [ms]
ImageAffine-Over        |256x256   |  2037.941 [ms] |  2015.771 [ms]

Notes

You should notice that only polygon test is faster when using the KIIA rasterizer. The polygon test consists of rendering 10 vertex polygons. Here the overhead to create edges and AET is minimal. Now look at the round benchmark results. Round shape is always divided into poly-line segments before rendering. Here the KIIA rasterizer should outperform the analytic rasterizer, but it didn't. The reason is that the poly-line segments representing flattened path are very small and there are a lot of them.

The KIIA rasterizer is really better when generating polygon-only shapes. Bézier curves which are divided into several small poly-lines are not so efficient. The reason is that when using analytic rasterizer there is not so much difference between rasterizing a line or a bézier curve - the difference is only mathematics needed to divide a bézier curve into poly-line segments.

Conclusion

My efforts to implement other rasterizers into Fog-Framework are probably over. I'm sure that I can more optimize the analytic one (I really hope I can increase the performance about another 30%) and the pipeline can stay as is. Also I think that there could be a problem in Fog-Framework itself - the rasterization is two-pass process. It's needed to rasterize and sweep-scanlines - adding support for another rasterizer means that I need to sweep result into scanline container first. After you have scanline you can do pixel-level compositing.

So instead of providing more various quality rasterizers I'd like to optimize the current one. The output quality and performance of analytic rasterizer is excellent and optimizing some simple cases can outperform others.