EsoErik

Tuesday, April 10, 2018

 

Converting Siemens NX CL NURBS to Siemens 840D NC

Siemens NX CAM includes a post processor infrastructure that is larger than the universe and no less complex. God only fucking knows what's up with all that TCL code. God only cares. So, tossing the entire Siemens NX CAM post processor infrastructure straight into the dumper, just because we childishly decide we don't like it, on an infantile whim, what are we left with? (*smash*)

Generating an NX CAM toolpath for an operation still shows a toolpath on the screen. We still have a DMU65 UL (Ultrasonic+Lathe) to control, and it has a Siemens 840D SL control. It's all under control. Well, will be soon.

That on-screen toolpath can be listed. Its listing is like:

TOOL PATH/REST_MILLING,TOOL,CAPS32SR120
TLDATA/MILL,0.5000,0.1200,4.0000,0.0000,0.0000
MSYS/0.0000,0.0000,0.0000,1.0000000,0.0000000,0.0000000,0.0000000,1.0000000,0.0000000
$$ centerline data
PAINT/PATH
PAINT/SPEED,10
LOAD/TOOL,245
PAINT/COLOR,186
RAPID
GOTO/-2.1000,1.2004,2.1000,0.0000000,0.0000000,1.0000000
PAINT/COLOR,211
RAPID
GOTO/-2.1000,1.2004,1.4538
PAINT/COLOR,42
FEDRAT/IPM,10.0000
GOTO/-2.1000,1.2004,1.3538
GOTO/-1.8500,1.5072,1.3538
PAINT/COLOR,31
GOTO/-1.7710,1.5994,1.3538
NURBS/
KNOT/1.0000000
CNTRL/-1.6893,1.6898,1.3538
CNTRL/-1.6017,1.7730,1.3538
CNTRL/-1.5072,1.8500,1.3538
PAINT/COLOR,37
GOTO/-1.2004,2.1000,1.3538
GOTO/-1.2004,2.1000,1.4538
PAINT/COLOR,211
RAPID
GOTO/-1.2004,2.1000,2.1000
PAINT/COLOR,186
RAPID
GOTO/-2.1000,0.9335,2.1000
PAINT/COLOR,211
RAPID
GOTO/-2.1000,0.9335,1.4538
PAINT/COLOR,42
GOTO/-2.1000,0.9335,1.3538
GOTO/-1.8500,1.2916,1.3538
PAINT/COLOR,31
GOTO/-1.7711,1.3978,1.3538
NURBS/
KNOT/1.0000000
CNTRL/-1.6346,1.5709,1.3538
CNTRL/-1.4724,1.7238,1.3538
CNTRL/-1.2916,1.8500,1.3538
PAINT/COLOR,37
GOTO/-0.9335,2.1000,1.3538
GOTO/-0.9335,2.1000,1.4538
PAINT/COLOR,211
RAPID
GOTO/-0.9335,2.1000,2.1000
PAINT/COLOR,186
RAPID
GOTO/-2.1000,0.6068,2.1000
PAINT/COLOR,211
RAPID
GOTO/-2.1000,0.6068,1.4538
PAINT/COLOR,42
GOTO/-2.1000,0.6068,1.3538
GOTO/-1.8500,1.0481,1.3538
PAINT/COLOR,31
GOTO/-1.7838,1.1571,1.3538
NURBS/
KNOT/0.0625000,0.3125000,0.4375000,0.5625000,0.8125000,0.9375000,1.0000000
CNTRL/-1.7716,1.1745,1.3538
CNTRL/-1.7126,1.2629,1.3538
CNTRL/-1.6205,1.3801,1.3538
CNTRL/-1.5040,1.5040,1.3538
CNTRL/-1.3801,1.6205,1.3538
CNTRL/-1.2462,1.7258,1.3538
CNTRL/-1.1212,1.8066,1.3538
CNTRL/-1.0662,1.8389,1.3538
CNTRL/-1.0481,1.8500,1.3538
PAINT/COLOR,37
GOTO/-0.6068,2.1000,1.3538
GOTO/-0.6068,2.1000,1.4538
PAINT/COLOR,211
RAPID
GOTO/-0.6068,2.1000,2.1000
PAINT/COLOR,186
RAPID
GOTO/-2.1000,0.1335,2.1000
PAINT/COLOR,211
RAPID
GOTO/-2.1000,0.1335,1.4538
PAINT/COLOR,42
GOTO/-2.1000,0.1335,1.3538
GOTO/-1.8500,0.7501,1.3538
PAINT/COLOR,31
GOTO/-1.8003,0.8625,1.3538
NURBS/
KNOT/0.0416667,0.9583333,1.0000000
CNTRL/-1.7906,0.8805,1.3538
CNTRL/-1.5821,1.3034,1.3538
CNTRL/-1.2036,1.6593,1.3538
CNTRL/-0.7686,1.8414,1.3538
CNTRL/-0.7501,1.8500,1.3538

Great! We really need some clue what 840D SL wants as input, however. Let's reach into the dumperino and run our NX post one last time. It outputs:

N340 T="CAPS32SR120"
N350 M6
N370 TRAFOOF
N420 ;
N430 ;Initial Move
N440 CYCLE832(_camtolerance,0,1)
N450 TRAORI
N460 G54
N470 ;
N480 ORIWKS
N490 ORIAXES
N500 G0 C0.0 A0.0
N510 G0 X-2.1 Y1.2003965 Z2.1 S0 D0 M3
N520 ;Approach Move
N530 Z1.4538064
N540 ;Engage Move
N550 G1 Z1.3538064 M8 F10.
N560 X-1.85 Y1.5072473
N570 ;Cutting
N580 X-1.7709655 Y1.5993672
N590 BSPLINE SD=3 X-1.6892636 Y1.6898064 PL=0.0
N600 X-1.601724 Y1.7730024 PL=1.
N610 X-1.5072473 Y1.85 PL=0.0
N620 ;Retract Move
N630 G1 X-1.2003965 Y2.1
N640 Z1.4538064
N650 ;Departure Move
N660 G0 Z2.1
N670 X-2.1 Y.9335334
N680 ;Approach Move
N690 Z1.4538064
N700 ;Engage Move
N710 G1 Z1.3538064
N720 X-1.85 Y1.2916128
N730 ;Cutting
N740 X-1.7711318 Y1.397804
N750 BSPLINE SD=3 X-1.6346003 Y1.5708928 PL=0.0
N760 X-1.472405 Y1.7238462 PL=1.
N770 X-1.2916128 Y1.85 PL=0.0
N780 ;Retract Move
N790 G1 X-.9335334 Y2.1
N800 Z1.4538064
N810 ;Departure Move
N820 G0 Z2.1
N830 X-2.1 Y.6068384
N840 ;Approach Move
N850 Z1.4538064
N860 ;Engage Move
N870 G1 Z1.3538064
N880 X-1.85 Y1.0481092
N890 ;Cutting
N900 X-1.7838374 Y1.1571332
N910 BSPLINE SD=3 X-1.7715783 Y1.1745064 PL=0.0
N920 X-1.7126259 Y1.2629285 PL=.0625
N930 X-1.6205476 Y1.3801208 PL=.25
N940 X-1.5039659 Y1.5039613 PL=.125
N950 X-1.3800976 Y1.620521 PL=.125
N960 X-1.2462019 Y1.7258162 PL=.25
N970 X-1.1212275 Y1.8066001 PL=.125
N980 X-1.0662174 Y1.838871 PL=.0625
N990 X-1.0481092 Y1.85 PL=0.0
N1000 ;Retract Move
N1010 G1 X-.6068385 Y2.1
N1020 Z1.4538064
N1030 ;Departure Move
N1040 G0 Z2.1
N1050 X-2.1 Y.1334575
N1060 ;Approach Move
N1070 Z1.4538064
N1080 ;Engage Move
N1090 G1 Z1.3538064
N1100 X-1.85 Y.7500681
N1110 ;Cutting
N1120 X-1.8003415 Y.8624805
N1130 BSPLINE SD=3 X-1.7906448 Y.8804501 PL=0.0
N1140 X-1.5821199 Y1.3034044 PL=.041667
N1150 X-1.2035841 Y1.6593227 PL=.916667
N1160 X-.7686003 Y1.8414274 PL=.041667
N1170 X-.7500682 Y1.85 PL=0.0
N1180 ;Retract Move
N1190 G1 X-.1334575 Y2.1
N1200 Z1.4538064
N1210 ;Departure Move
N1220 G0 Z2.1
N1230 X-2.1 Y-.0019108
N1240 ;Approach Move
N1250 Z1.4538064
N1260 ;Engage Move
N1270 G1 Z1.3538064
N1280 X-1.8829413
N1290 X-1.85 Y.2459095
N1300 ;Cutting
N1310 X-1.8300015 Y.36615
N1320 BSPLINE SD=3 X-1.8249937 Y.3858213 PL=0.0
N1330 X-1.7448088 Y.7636433 PL=.026316
N1340 X-1.3503773 Y1.4416025 PL=.473684
N1350 X-.6481162 Y1.7909367 PL=.473684
N1360 X-.2658658 Y1.846287 PL=.026316
N1370 X-.2459095 Y1.85 PL=0.0
N1380 ;Retract Move
N1390 G1 X.0019107 Y1.8829413
N1400 Y2.1
N1410 Z1.4538064
N1420 ;Departure Move
N1430 G0 Z2.1
N1440 X1.2003965
N1450 ;Approach Move
N1460 Z1.4538064
N1470 ;Engage Move
N1480 G1 Z1.3538064

This correspondence is trivial:

GOTO/-2.1000,1.2004,2.1000,0.0000000,0.0000000,1.0000000

N500 G0 C0.0 A0.0
N510 G0 X-2.1 Y1.2003965 Z2.1 S0 D0 M3

NX CAM specifies X,Y,Z,A3,B3,C3, where A3, B3, and C3 are tool orientation unit vector components. The NX post processor outputted A and C axis angles in degrees, but that's shit. Our machine understands unit vector orientation just fine. Fortunately, we don't have to worry any more about that, because this toolpath is prismatic. We'll deal with simultaneous five axis NURBS toolpaths some other time. NX doesn't seem to want to generate such things, anyway.
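To make the two GOTO flavors concrete, here is a minimal sketch of reading one GOTO record from the listing. The names Goto and parse_goto are made up for illustration, and the only assumption is what the sample listing shows: a GOTO carries either three values (position) or six (position plus tool axis unit vector).

```python
from typing import NamedTuple, Optional, Tuple

Vec3 = Tuple[float, float, float]


class Goto(NamedTuple):
    """One GOTO record: a position and, when present, a tool axis unit vector."""
    position: Vec3
    tool_axis: Optional[Vec3]  # None when the record only lists X,Y,Z


def parse_goto(line: str) -> Goto:
    """Parse 'GOTO/x,y,z' or 'GOTO/x,y,z,i,j,k' as seen in the sample listing."""
    keyword, _, payload = line.strip().partition("/")
    if keyword != "GOTO":
        raise ValueError(f"not a GOTO record: {line!r}")
    values = [float(field) for field in payload.split(",")]
    if len(values) == 3:
        return Goto((values[0], values[1], values[2]), None)
    if len(values) == 6:
        return Goto((values[0], values[1], values[2]),
                    (values[3], values[4], values[5]))
    raise ValueError(f"expected 3 or 6 values, got {len(values)}: {line!r}")


if __name__ == "__main__":
    print(parse_goto("GOTO/-2.1000,1.2004,2.1000,0.0000000,0.0000000,1.0000000"))
    print(parse_goto("GOTO/-2.1000,1.2004,1.4538"))
```

For this prismatic toolpath the tool axis is always 0,0,1 where it appears at all, so a converter could check that and otherwise ignore it.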

Moving on:

GOTO/-1.7710,1.5994,1.3538
NURBS/
KNOT/1.0000000
CNTRL/-1.6893,1.6898,1.3538
CNTRL/-1.6017,1.7730,1.3538
CNTRL/-1.5072,1.8500,1.3538

N580 X-1.7709655 Y1.5993672
N590 BSPLINE SD=3 X-1.6892636 Y1.6898064 PL=0.0
N600 X-1.601724 Y1.7730024 PL=1.
N610 X-1.5072473 Y1.85 PL=0.0

"It's a NURBS! Look out, sir! ARRRRGGGG!!!" No worries, I got this for ya. So, we see some shit going on. KNOT/1.0 clearly means something, but I'm not seeing it represented in our NC, unless it's that PL=1 term. Could be. And then there's those PL=0s. And the SD=3. Either these things are implicit, or our listing is woefully incomplete and will not serve our needs. I can believe that SD3 is implicit - sure, we're always working with cubic NURBS.

Unfortunately, we haven't yet made enough ill-gotten assumptions to fully misunderstand the situation. Let's extrapolate more.

NURBS/
KNOT/0.0625000,0.3125000,0.4375000,0.5625000,0.8125000,0.9375000,1.0000000
CNTRL/-1.7716,1.1745,1.3538
CNTRL/-1.7126,1.2629,1.3538
CNTRL/-1.6205,1.3801,1.3538
CNTRL/-1.5040,1.5040,1.3538
CNTRL/-1.3801,1.6205,1.3538
CNTRL/-1.2462,1.7258,1.3538
CNTRL/-1.1212,1.8066,1.3538
CNTRL/-1.0662,1.8389,1.3538
CNTRL/-1.0481,1.8500,1.3538

N910 BSPLINE SD=3 X-1.7715783 Y1.1745064 PL=0.0
N920 X-1.7126259 Y1.2629285 PL=.0625
N930 X-1.6205476 Y1.3801208 PL=.25
N940 X-1.5039659 Y1.5039613 PL=.125
N950 X-1.3800976 Y1.620521 PL=.125
N960 X-1.2462019 Y1.7258162 PL=.25
N970 X-1.1212275 Y1.8066001 PL=.125
N980 X-1.0662174 Y1.838871 PL=.0625
N990 X-1.0481092 Y1.85 PL=0.0

There's definitely a relationship between KNOT and PL values. Our KNOTs are
[0.0625, 0.3125, 0.4375, 0.5625, 0.8125, 0.9375, 1.0] and our PLs are
[0.0, 0.0625, 0.25, 0.125, 0.125, 0.25, 0.125, 0.0625, 0.0]. Let's prepend a 0 and append a 1 to our KNOTs, giving
[0, 0.0625, 0.3125, 0.4375, 0.5625, 0.8125, 0.9375, 1.0, 1]. As a Python list comprehension over that padded list, [KNOTs[n]-KNOTs[n-1] for n in range(1,len(KNOTs))] produces
[0.0625, 0.25, 0.125, 0.125, 0.25, 0.125, 0.0625, 0.0]. These happen to be our PL values, minus the leading PL=0.0 that sits on the BSPLINE line itself.
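Spelled out as a function, the guess looks like the sketch below. The name knots_to_pl is made up, and the rule itself is only what the sample data suggests (pad with 0 and 1, take successive differences, put a 0.0 in front); it is not taken from Siemens documentation.

```python
def knots_to_pl(knots):
    """Map the values of a KNOT/ record to one PL value per control point.

    Assumed rule, inferred from the sample listing only:
    pad the knots with a leading 0 and a trailing 1, take successive
    differences, and prepend the 0.0 that appears on the BSPLINE line.
    """
    padded = [0.0] + list(knots) + [1.0]
    differences = [padded[n] - padded[n - 1] for n in range(1, len(padded))]
    return [0.0] + differences


if __name__ == "__main__":
    pls = knots_to_pl([0.0625, 0.3125, 0.4375, 0.5625, 0.8125, 0.9375, 1.0])
    print([round(pl, 6) for pl in pls])
    # -> [0.0, 0.0625, 0.25, 0.125, 0.125, 0.25, 0.125, 0.0625, 0.0]
```

The same rule reproduces the other blocks in the sample: KNOT/1.0 gives [0.0, 1.0, 0.0], and KNOT/0.0416667,0.9583333,1.0 gives 0.0, .041667, .916667, .041667, 0.0 after rounding. It also yields exactly one PL per CNTRL record, which is a cheap sanity check.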

And that's that. The next step for me is to encode all of these assumptions into a Python script that takes a path listing as input and outputs 840D NC. I will do so here.
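As a starting point, the NURBS part of that translation might look something like this. It is a sketch under the assumptions made above, not the finished script: nurbs_to_bspline and fmt are hypothetical names, block numbers (N590 and friends) are left out, Z is written on every line rather than suppressed when unchanged as the NX post does, and coordinates only carry the four decimals the listing provides.

```python
def fmt(value, places):
    """Fixed-point text with trailing zeros trimmed, e.g. 1.8500 -> '1.85'."""
    text = f"{value:.{places}f}".rstrip("0").rstrip(".")
    return "0" if text in ("", "-0") else text


def nurbs_to_bspline(knot_line, cntrl_lines, degree=3):
    """Turn one KNOT/ record plus its CNTRL/ records into BSPLINE NC lines."""
    knots = [float(v) for v in knot_line.split("/", 1)[1].split(",")]
    points = [[float(v) for v in line.split("/", 1)[1].split(",")]
              for line in cntrl_lines]

    # Assumed KNOT -> PL rule inferred from the sample: pad, difference, prepend 0.
    padded = [0.0] + knots + [1.0]
    pls = [0.0] + [padded[n] - padded[n - 1] for n in range(1, len(padded))]
    if len(pls) != len(points):
        raise ValueError(f"{len(points)} control points but {len(pls)} PL values")

    nc_lines = []
    for index, ((x, y, z), pl) in enumerate(zip(points, pls)):
        # Only the first line of the spline carries the BSPLINE SD= words.
        prefix = f"BSPLINE SD={degree} " if index == 0 else ""
        nc_lines.append(
            f"{prefix}X{fmt(x, 4)} Y{fmt(y, 4)} Z{fmt(z, 4)} PL={fmt(pl, 6)}")
    return nc_lines


if __name__ == "__main__":
    for nc in nurbs_to_bspline("KNOT/1.0000000",
                               ["CNTRL/-1.6893,1.6898,1.3538",
                                "CNTRL/-1.6017,1.7730,1.3538",
                                "CNTRL/-1.5072,1.8500,1.3538"]):
        print(nc)
```

The control point count check is the part worth keeping in any real version: if the KNOT guess is wrong for some toolpath, that is where it will show up first.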

Wednesday, March 8, 2017

 

QML expression binds as C++ operator overloads... without QML

It's nice that QML permits binding expressions, such as field_Y = 2 * slider_X. It's not nice that JavaScript is involved. It's not nice that these binds are uni-directional.

Let's solve both of these dumb JavaScript problems, which originate in the extremely endearing habit of kids rejecting the wisdom of their elders in favor of the scripting flavor of the nanosecond. Good intentions and all that.

C++ operator overloading facilitates creation of any domain specific language you like, such as formal EBNF LL parser and interpreter specification, and these are compiled down to fast code using the three hundred decillion man-hours of PhD CS wizardry invested in whichever optimizing compiler you happen to be using.

Representing data bind expressions is no kind of challenge at all. However, all other people are idiots, leaving it to me to implement this, and I'm rather over-committed. So, we'll see.

Saturday, January 14, 2017

 

The expression is invalid for update in NX. Foo^2: Dimension error

If you jump into Siemens NX, you'll soon run into the following error, prompted by your attempt to include a t squared term:


The expression is invalid for update in NX. yt: Dimension error.

Coercing t^2 into inch units does not seem to work:

The solution is to make a unitless expression, a "constant", that is used in place of t:


Saturday, August 27, 2016

 

Lazily Exposing an std::vector to Python as a Numpy Array with pybind11

Suppose one wishes to expose to Python multi-threaded C++ code that generates output in the form of flat integer or floating point value arrays, and these arrays should appear as Numpy arrays in Python. The two common approaches are (replacing "element" with bool, std::uint16_t, int, float, double, etc):

With pybind11, there is a better way: 1) keep any std::vector<> that may be exposed to Python in an std::shared_ptr<std::vector<>>, 2) expose the concrete std::vector<element> types used with std::shared_ptr<std::vector<>> as the associated "holder" type for each, with an appropriate .def_buffer call, and 3) in response to requests from Python, lazily retrieve (causing instantiation of) the Python object wrapping the vector requested, feed this to Numpy, and cache and return the resulting Numpy array.

This arrangement may sound complicated, but it is, by far, the most natural and flexible of all approaches: without resort to Python reference counting or requirement to acquire the GIL, a vector exposed in this manner is not garbage collected until both the last outstanding Python reference and the last outstanding C++ reference are gone. This is awesome.

Let's break down the rather dense instructions presented above.

1

Keep any std::vector<> that may be exposed to Python in an std::shared_ptr<std::vector<>>.

There's not much to this. struct Foo { std::vector<int> v; }; changes to struct Foo { std::shared_ptr<std::vector<int>> v; };, and any v. changes to v->

2

Expose the concrete std::vector<element> types used with std::shared_ptr<std::vector<>> as the associated "holder" type for each, with an appropriate .def_buffer call.

py::class_<std::vector<std::uint64_t>, std::shared_ptr<std::vector<std::uint64_t>>>(m, "_HistogramBuffer")
    .def_buffer([](std::vector<std::uint64_t>& v) {
        return py::buffer_info(
            v.data(),
            sizeof(std::uint64_t),
            py::format_descriptor<std::uint64_t>::format(),
            1,
            { v.size() },
            { sizeof(std::uint64_t) });
     });

3

In response to requests from Python, lazily retrieve (causing instantiation of) the Python object wrapping the vector requested, feed this to Numpy, and cache and return the resulting Numpy array. The code that does this is StatsBase<T>::get_histogram_py(), reached from Python through the "histogram" property; the rest is provided as minimal context, so that you have some chance of figuring out what I'm talking about :)

template<typename T>
struct StatsBase
{
    static void expose_via_pybind11(py::module& m);

    StatsBase();
    StatsBase(const StatsBase&) = delete;
    StatsBase& operator = (const StatsBase&) = delete;
    virtual ~StatsBase() = default;

    std::tuple<T, T> extrema;
    std::size_t max_bin;

    std::shared_ptr<std::vector<std::uint64_t>> histogram;
    // A numpy array that is a read-only view of histogram. Lazily created in response to get_histogram_py calls.
    std::shared_ptr<py::object> histogram_py;

    py::object& get_histogram_py();
};

template<typename T>
void StatsBase<T>::expose_via_pybind11(py::module& m)
{
    std::string s = std::string("_StatsBase_") + component_type_names[std::type_index(typeid(T))];
    py::class_<StatsBase<T>, std::shared_ptr<StatsBase<T>>>(m, s.c_str())
        .def_readonly("extrema", &StatsBase<T>::extrema)
        .def_readonly("max_bin", &StatsBase<T>::max_bin)
        .def_readonly("histogram_buff", &StatsBase<T>::histogram)
        .def_property_readonly("histogram", [](StatsBase<T>& v){return v.get_histogram_py();});
}

template<typename T>
StatsBase<T>::StatsBase()
  : extrema(0, 0),
    max_bin(0),
    histogram(new std::vector<std::uint64_t>(bin_count<T>(), 0)),
    histogram_py(nullptr)
{
}

template<typename T>
py::object& StatsBase<T>::get_histogram_py()
{
    if(!histogram_py)
    {
        py::object buffer_obj = py::cast(histogram);
        histogram_py.reset(new py::object(PyArray_FromAny(buffer_obj.ptr(), nullptr, 1, 1, 0, nullptr), false), &safe_py_deleter);
    }
    return *histogram_py;
}

StatsBase<T>::get_histogram_py() is a bit complex; let's break it down:

if(!histogram_py)
If the StatsBase<T> instance in question does not already have a non-null histogram_py pointer...

py::object buffer_obj = py::cast(histogram);
Get a Python object wrapping our std::shared_ptr<std::vector<std::uint64_t>> instance. This wrapper will be as we specified to pybind11 and will therefore have a buffer protocol interface understood by Numpy.

histogram_py.reset(new py::object(PyArray_FromAny(buffer_obj.ptr(), nullptr, 1, 1, 0, nullptr), false), &safe_py_deleter);
Use the PyArray_FromAny call to make a Numpy array that is a view of our vector and keep the resulting PyObject* in a pybind11 PyObject* wrapper that will decrement its refcount appropriately when destroyed. Store this in an std::shared_ptr with a GIL-safe deleter in order to avoid crashing in the case where a C++ background thread is the last thing with a reference to a StatsBase instance that has been accessed from a no-longer-extant Python reference.

return *histogram_py;
Return a C++ reference to the py::object representing the Numpy array.
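Seen from the Python side, the pattern above boils down to "one owner, one lazily created read-only view, cached forever". The sketch below imitates that in pure Python so the behavior can be poked at without building anything. It is an analogy only: array.array stands in for the bound std::vector, memoryview stands in for the Numpy array, the class name Stats is made up, and nothing here touches pybind11 or the GIL-safe deleter.

```python
from array import array


class Stats:
    """Pure-Python stand-in for StatsBase<T> (illustrative, not the real binding)."""

    def __init__(self, bin_count):
        # Stand-in for the shared_ptr-held std::vector<std::uint64_t>.
        self._histogram = array("Q", [0] * bin_count)
        # Stand-in for histogram_py: not created until someone asks for it.
        self._histogram_view = None

    @property
    def histogram(self):
        """Create the read-only view on first access, then keep returning it."""
        if self._histogram_view is None:
            # memoryview uses the buffer protocol, so no data is copied.
            self._histogram_view = memoryview(self._histogram).toreadonly()
        return self._histogram_view


if __name__ == "__main__":
    stats = Stats(4)
    view = stats.histogram
    print(view is stats.histogram)   # the cached view is handed out again
    stats._histogram[2] = 5          # the "C++ side" updates a bin
    print(view[2])                   # the view sees it, since it shares memory
```

The point of the analogy is the ownership shape: the view keeps the underlying storage alive, and the storage is shared rather than copied, which is what the shared_ptr holder plus .def_buffer arrangement buys on the C++ side.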


This example is from real world code (it may be necessary to look in the new_ndimage_statistics branch, but I expect to merge this into master within the next few days). Apologies for not making a minimal example. If you'd like one or have any questions, please ask!

Wednesday, August 24, 2016

 

BTRFS Is a God Damned Joke

I tried storing an 8GiB VirtualBox disk image on BTRFS. Well, it copied over successfully, but minutes into 'pacman -Syu', the Linux instance in the VM began reporting copious IDE errors. Suspecting that BTRFS copy-on-write was the issue, I moved the disk image back to a ZFS volume. This took inordinately long - it definitely was a COW issue. BTRFS went absolutely crazy as the VM wrote here, there and everywhere to its virtual disk, requiring a competent copy-on-write implementation - which BTRFS does not have.

That VM, again on ZFS, is again working flawlessly. ZFS is a copy-on-write filesystem and works. BTRFS is a copy-on-write filesystem and does not work.

We're how many years into BTRFS being officially "stable"? And it blows chunks the instant you attempt to, say, modify a file a lot? That doesn't seem right. Perhaps I'm the only one, and I'm doing something wrong? Nope. BTRFS just plain sucks.

The thing I did wrong with BTRFS was using BTRFS. Apparently, I could disable BTRFS's copy-on-write support for my VM disk image files. But, then my VM disk image files would have no FS-level data checksums or snapshot capability. If that's what I wanted, I'd keep my VM images on XFS or EXT4. It's not, and BTRFS is apparently little better than EXT4 with some additional features that don't work, so ZFS it is.

Friday, June 10, 2016

 

blitTextureForWidget: Yeah, Why Don't You?

I made this little test application in order to explore the impact of swap interval upon multiple visible QOpenGLWidget instances belonging to the same process.  It provided yeoman service, facilitating a massive FPS increase in important production code by demonstrating that swap interval 1, while friendly and well intended, really held us back.  Alas, even with this issue beheaded, something is yet rotten in the state of our OpenGL contexts:



Making the plain (non-OpenGL) dock widget floating instead of docked increases FPS by 146%.

QPlatformBackingStore::composeAndFlush(..) is the cause:

void QPlatformBackingStore::composeAndFlush(QWindow *window, const QRegion &region,
                                            const QPoint &offset,
                                            QPlatformTextureList *textures, QOpenGLContext *context,
                                            bool translucentBackground)
{
    if (!qt_window_private(window)->receivedExpose)
        return;

    if (!context->makeCurrent(window)) {
        qWarning("composeAndFlush: makeCurrent() failed");
        return;
    }

    QWindowPrivate::get(window)->lastComposeTime.start();

    QOpenGLFunctions *funcs = context->functions();
    funcs->glViewport(0, 0, window->width() * window->devicePixelRatio(), window->height() * window->devicePixelRatio());
    funcs->glClearColor(0, 0, 0, translucentBackground ? 0 : 1);
    funcs->glClear(GL_COLOR_BUFFER_BIT);

    if (!d_ptr->blitter) {
        d_ptr->blitter = new QOpenGLTextureBlitter;
        d_ptr->blitter->create();
    }

    d_ptr->blitter->bind();

    const QRect deviceWindowRect = deviceRect(QRect(QPoint(), window->size()), window);

    // Textures for renderToTexture widgets.
    for (int i = 0; i < textures->count(); ++i) {
        if (!textures->flags(i).testFlag(QPlatformTextureList::StacksOnTop))
/*1*/       blitTextureForWidget(textures, i, window, deviceWindowRect, d_ptr->blitter, offset);
    }

    // Backingstore texture with the normal widgets.
    GLuint textureId = 0;
    QOpenGLTextureBlitter::Origin origin = QOpenGLTextureBlitter::OriginTopLeft;
    if (QPlatformGraphicsBuffer *graphicsBuffer = this->graphicsBuffer()) {
        if (graphicsBuffer->size() != d_ptr->textureSize) {
            if (d_ptr->textureId)
                funcs->glDeleteTextures(1, &d_ptr->textureId);
            funcs->glGenTextures(1, &d_ptr->textureId);
            funcs->glBindTexture(GL_TEXTURE_2D, d_ptr->textureId);
            QOpenGLContext *ctx = QOpenGLContext::currentContext();
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
                funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);
            }
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

            if (QPlatformGraphicsBufferHelper::lockAndBindToTexture(graphicsBuffer, &d_ptr->needsSwizzle, &d_ptr->premultiplied)) {
                d_ptr->textureSize = graphicsBuffer->size();
            } else {
                d_ptr->textureSize = QSize(0,0);
            }

            graphicsBuffer->unlock();
        } else if (!region.isEmpty()){
            funcs->glBindTexture(GL_TEXTURE_2D, d_ptr->textureId);
/*2*/       QPlatformGraphicsBufferHelper::lockAndBindToTexture(graphicsBuffer, &d_ptr->needsSwizzle,
                                                                &d_ptr->premultiplied);
        }

        if (graphicsBuffer->origin() == QPlatformGraphicsBuffer::OriginBottomLeft)
            origin = QOpenGLTextureBlitter::OriginBottomLeft;
        textureId = d_ptr->textureId;
    } else {
        TextureFlags flags = 0;
        textureId = toTexture(deviceRegion(region, window, offset), &d_ptr->textureSize, &flags);
        d_ptr->needsSwizzle = (flags & TextureSwizzle) != 0;
        d_ptr->premultiplied = (flags & TexturePremultiplied) != 0;
        if (flags & TextureFlip)
            origin = QOpenGLTextureBlitter::OriginBottomLeft;
    }

    funcs->glEnable(GL_BLEND);
    if (d_ptr->premultiplied)
        funcs->glBlendFuncSeparate(GL_ONE, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE);
    else
        funcs->glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE);

    if (textureId) {
        if (d_ptr->needsSwizzle)
            d_ptr->blitter->setSwizzleRB(true);
        // The backingstore is for the entire tlw.
        // In case of native children offset tells the position relative to the tlw.
        const QRect srcRect = toBottomLeftRect(deviceWindowRect.translated(offset), d_ptr->textureSize.height());
        const QMatrix3x3 source = QOpenGLTextureBlitter::sourceTransform(srcRect,
                                                                         d_ptr->textureSize,
                                                                         origin);
        d_ptr->blitter->blit(textureId, QMatrix4x4(), source);
        if (d_ptr->needsSwizzle)
            d_ptr->blitter->setSwizzleRB(false);
    }

    // Textures for renderToTexture widgets that have WA_AlwaysStackOnTop set.
    for (int i = 0; i < textures->count(); ++i) {
        if (textures->flags(i).testFlag(QPlatformTextureList::StacksOnTop))
            blitTextureForWidget(textures, i, window, deviceWindowRect, d_ptr->blitter, offset);
    }

    funcs->glDisable(GL_BLEND);
    d_ptr->blitter->release();

    context->swapBuffers(window);

}

The line marked with /*1*/ is fast to execute.  The line marked with /*2*/ is very slow.

/*1*/ is called for our docked QOpenGLWidgets.  /*2*/ is called for our docked QWidget that does not contain a QOpenGLWidget, but it is not called when that QWidget is made floating rather than docked.

/*2*/ ends up calling QPlatformGraphicsBufferHelper::bindSWToTexture(..):

bool QPlatformGraphicsBufferHelper::bindSWToTexture(const QPlatformGraphicsBuffer *graphicsBuffer,
                                                    bool *swizzleRandB, bool *premultipliedB,
                                                    const QRect &subRect)
{
#ifndef QT_NO_OPENGL
    QOpenGLContext *ctx = QOpenGLContext::currentContext();
    if (!ctx)
        return false;

    if (!(graphicsBuffer->isLocked() & QPlatformGraphicsBuffer::SWReadAccess))
        return false;

    QSize size = graphicsBuffer->size();

    Q_ASSERT(subRect.isEmpty() || QRect(QPoint(0,0), size).contains(subRect));

    GLenum internalFormat = GL_RGBA;
    GLuint pixelType = GL_UNSIGNED_BYTE;

    bool needsConversion = false;
    bool swizzle = false;
    bool premultiplied = false;
    QImage::Format imageformat = QImage::toImageFormat(graphicsBuffer->format());
    QImage image(graphicsBuffer->data(), size.width(), size.height(), graphicsBuffer->bytesPerLine(), imageformat);
    if (graphicsBuffer->bytesPerLine() != (size.width() * 4)) {
        needsConversion = true;
    } else {
        switch (imageformat) {
        case QImage::Format_ARGB32_Premultiplied:
            premultiplied = true;
            // no break
        case QImage::Format_RGB32:
        case QImage::Format_ARGB32:
            swizzle = true;
            break;
        case QImage::Format_RGBA8888_Premultiplied:
            premultiplied = true;
            // no break
        case QImage::Format_RGBX8888:
        case QImage::Format_RGBA8888:
            break;
        case QImage::Format_BGR30:
        case QImage::Format_A2BGR30_Premultiplied:
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                pixelType = GL_UNSIGNED_INT_2_10_10_10_REV;
                internalFormat = GL_RGB10_A2;
                premultiplied = true;
            } else {
                needsConversion = true;
            }
            break;
        case QImage::Format_RGB30:
        case QImage::Format_A2RGB30_Premultiplied:
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                pixelType = GL_UNSIGNED_INT_2_10_10_10_REV;
                internalFormat = GL_RGB10_A2;
                premultiplied = true;
                swizzle = true;
            } else {
                needsConversion = true;
            }
            break;
        default:
            needsConversion = true;
            break;
        }
    }
    if (needsConversion)
        image = image.convertToFormat(QImage::Format_RGBA8888);

    QOpenGLFunctions *funcs = ctx->functions();

    QRect rect = subRect;
    if (rect.isNull() || rect == QRect(QPoint(0,0),size)) {
        funcs->glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, size.width(), size.height(), 0, GL_RGBA, pixelType, image.constBits());
    } else {
#ifndef QT_OPENGL_ES_2
        if (!ctx->isOpenGLES()) {
            funcs->glPixelStorei(GL_UNPACK_ROW_LENGTH, image.width());
            funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x(), rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
                                   image.constScanLine(rect.y()) + rect.x() * 4);
            funcs->glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
        } else
#endif
        {
            // if the rect is wide enough it's cheaper to just
            // extend it instead of doing an image copy
            if (rect.width() >= size.width() / 2) {
                rect.setX(0);
                rect.setWidth(size.width());
            }

            // if the sub-rect is full-width we can pass the image data directly to
            // OpenGL instead of copying, since there's no gap between scanlines

            if (rect.width() == size.width()) {
                funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, 0, rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
                                       image.constScanLine(rect.y()));
            } else {
                funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x(), rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
                                       image.copy(rect).constBits());
            }
        }
    }
    if (swizzleRandB)
        *swizzleRandB = swizzle;
    if (premultipliedB)
        *premultipliedB = premultiplied;

    return true;

#else
    Q_UNUSED(graphicsBuffer)
    Q_UNUSED(swizzleRandB)
    Q_UNUSED(premultipliedB)
    Q_UNUSED(subRect)
    return false;
#endif // QT_NO_OPENGL

}

Those glTexSubImage2D calls are blocking texture uploads executed in the main thread (theoretically, glTexSubImage2D should be non-blocking, but profiling this code makes it very apparent that glTexSubImage2D is blocking).  In a profiler, it is easily seen that the huge FPS hit is mostly the result of synchronization delay; it takes time to marshal data to the GPU, and most of that time is spent waiting for inherently asynchronous things, such as DMA transfers, to most certainly be definitely completed, beyond a shadow of a doubt, triply confirmed, with extra delays just to be super-ultra-incredibly-sure.  If a modern video game were to upload its textures like this, one-at-a-time, in a blocking fashion, you would be lucky to get one frame per minute.

Perhaps we can engage the code path used for QGraphicsProxyWidgets and render QWidgets directly to a pixel buffer?  I don't know if QGraphicsProxyWidget actually does this, but the FPS hit from placing a QWidget updated every frame in a QGraphicsScene with an OpenGL viewport is less severe than the hit from docking a plain QWidget updated every frame alongside OpenGL viewports that are updated every frame.  One way we might try to do this is by simply making the plain QWidget containing the QLabel a QOpenGLWidget.  I think I remember hearing that QWidget children of QOpenGLWidgets are rendered properly, within the QOpenGLWidget viewport.  Perhaps this is the ticket.

[05:34 PM][ehvatum@heavenly:~/multiple_gl_viewport_fps_toy]> git diff
diff --git a/MainWindow.cpp b/MainWindow.cpp
index fd8ccb5..3e9acf2 100644
--- a/MainWindow.cpp
+++ b/MainWindow.cpp
@@ -6,7 +6,7 @@ MainWindow::MainWindow(QWidget *parent)
    m_central_fps_item(m_central_gs->addText("")),
    m_central_gv(new GL_QGraphicsView(0, m_central_gs)),
    m_central_swap_interval("central swapInterval == 0"),
-    m_left_widget(new QWidget()),
+    m_left_widget(new QOpenGLWidget()),
    m_left_dock_widget(new QDockWidget("left widget")),
    m_right_gs(new QGraphicsScene()),
    m_right_fps_item(m_right_gs->addText("")),
diff --git a/MainWindow.h b/MainWindow.h
index 276590c..e14afab 100644
--- a/MainWindow.h
+++ b/MainWindow.h
@@ -23,7 +23,7 @@ protected:
    QGraphicsTextItem* m_central_fps_item;
    GL_QGraphicsView* m_central_gv;
    QAction m_central_swap_interval;
-    QWidget *m_left_widget;
+    QOpenGLWidget *m_left_widget;
    QLabel *m_left_fps_label;
    QDockWidget* m_left_dock_widget;
    QGraphicsScene* m_right_gs;


With these changes, docking the left widget still imposes the same FPS hit and for the same reason: we wait for an enormous texture upload.  Floating the left widget removes the slowdown, unless I resize that floating widget to be the same size as the main window.  Together, all of this leads to an insight: the texture uploaded in order to compose a raster surface and a QOpenGLWidget is always the size of the top-level window ultimately containing the widgets.
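To put rough numbers on that, here is a back-of-the-envelope sketch. The window sizes are hypothetical, the 4 bytes per pixel matches the RGBA8888 path in bindSWToTexture above, and devicePixelRatio is ignored.

```python
def upload_bytes_per_frame(width_px, height_px, bytes_per_pixel=4):
    """Bytes pushed to the GPU if the whole top-level window is re-uploaded."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical sizes: a full-HD main window versus a small floating dock.
    main_window = upload_bytes_per_frame(1920, 1080)
    floating_dock = upload_bytes_per_frame(300, 200)
    print(main_window)                   # 8294400 bytes per composed frame
    print(floating_dock)                 # 240000 bytes per composed frame
    print(main_window / floating_dock)   # 34.56 times as much data
```

So a tiny label docked into a big window costs a big window's worth of blocking upload every frame, while the same label floating on its own costs only its own area.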

So, Qt's raster + QOpenGLWidget composition is completely brain damaged and must be avoided.  However, I still need to have QMainWindows containing a mixture of docked QOpenGLWidgets and docked QWidgets.  The solution is to use QGLWidgets instead - these do not participate in composition.  Doing so brings FPS back to something reasonable.
