Friday, June 10, 2016


blitTextureForWidget: Yeah, Why Don't You?

I made this little test application in order to explore the impact of swap interval upon multiple visible QOpenGLWidget instances belonging to the same process.  It provided yeoman service, facilitating a massive FPS increase in important production code by demonstrating that swap interval 1, while friendly and well intended, really held us back.  Alas, even with this issue beheaded, something is yet rotten in the state of our OpenGL contexts:

Making the plain (non-OpenGL) dock widget floating instead of docked increases FPS by 146%.

QPlatformBackingStore::composeAndFlush(..) is the cause:

void QPlatformBackingStore::composeAndFlush(QWindow *window, const QRegion &region,
                                            const QPoint &offset,
                                            QPlatformTextureList *textures, QOpenGLContext *context,
                                            bool translucentBackground)
    if (!qt_window_private(window)->receivedExpose)

    if (!context->makeCurrent(window)) {
        qWarning("composeAndFlush: makeCurrent() failed");


    QOpenGLFunctions *funcs = context->functions();
    funcs->glViewport(0, 0, window->width() * window->devicePixelRatio(), window->height() * window->devicePixelRatio());
    funcs->glClearColor(0, 0, 0, translucentBackground ? 0 : 1);

    if (!d_ptr->blitter) {
        d_ptr->blitter = new QOpenGLTextureBlitter;


    const QRect deviceWindowRect = deviceRect(QRect(QPoint(), window->size()), window);

    // Textures for renderToTexture widgets.
    for (int i = 0; i < textures->count(); ++i) {
        if (!textures->flags(i).testFlag(QPlatformTextureList::StacksOnTop))
/*1*/       blitTextureForWidget(textures, i, window, deviceWindowRect, d_ptr->blitter, offset);

    // Backingstore texture with the normal widgets.
    GLuint textureId = 0;
    QOpenGLTextureBlitter::Origin origin = QOpenGLTextureBlitter::OriginTopLeft;
    if (QPlatformGraphicsBuffer *graphicsBuffer = this->graphicsBuffer()) {
        if (graphicsBuffer->size() != d_ptr->textureSize) {
            if (d_ptr->textureId)
                funcs->glDeleteTextures(1, &d_ptr->textureId);
            funcs->glGenTextures(1, &d_ptr->textureId);
            funcs->glBindTexture(GL_TEXTURE_2D, d_ptr->textureId);
            QOpenGLContext *ctx = QOpenGLContext::currentContext();
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);
                funcs->glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
            funcs->glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

            if (QPlatformGraphicsBufferHelper::lockAndBindToTexture(graphicsBuffer, &d_ptr->needsSwizzle, &d_ptr->premultiplied)) {
                d_ptr->textureSize = graphicsBuffer->size();
            } else {
                d_ptr->textureSize = QSize(0,0);

        } else if (!region.isEmpty()){
            funcs->glBindTexture(GL_TEXTURE_2D, d_ptr->textureId);
/*2*/       QPlatformGraphicsBufferHelper::lockAndBindToTexture(graphicsBuffer, &d_ptr->needsSwizzle,
        &d_ptr->premultiplied); }

        if (graphicsBuffer->origin() == QPlatformGraphicsBuffer::OriginBottomLeft)
            origin = QOpenGLTextureBlitter::OriginBottomLeft;
        textureId = d_ptr->textureId;
    } else {
        TextureFlags flags = 0;
        textureId = toTexture(deviceRegion(region, window, offset), &d_ptr->textureSize, &flags);
        d_ptr->needsSwizzle = (flags & TextureSwizzle) != 0;
        d_ptr->premultiplied = (flags & TexturePremultiplied) != 0;
        if (flags & TextureFlip)
            origin = QOpenGLTextureBlitter::OriginBottomLeft;

    if (d_ptr->premultiplied)
        funcs->glBlendFuncSeparate(GL_ONE, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE);
        funcs->glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE);

    if (textureId) {
        if (d_ptr->needsSwizzle)
        // The backingstore is for the entire tlw.
        // In case of native children offset tells the position relative to the tlw.
        const QRect srcRect = toBottomLeftRect(deviceWindowRect.translated(offset), d_ptr->textureSize.height());
        const QMatrix3x3 source = QOpenGLTextureBlitter::sourceTransform(srcRect,
        d_ptr->blitter->blit(textureId, QMatrix4x4(), source);
        if (d_ptr->needsSwizzle)

    // Textures for renderToTexture widgets that have WA_AlwaysStackOnTop set.
    for (int i = 0; i < textures->count(); ++i) {
        if (textures->flags(i).testFlag(QPlatformTextureList::StacksOnTop))
            blitTextureForWidget(textures, i, window, deviceWindowRect, d_ptr->blitter, offset);




The line marked with /*1*/ is fast to execute.  The line marked with /*2*/ is very slow.

/*1*/ is called for our docked QOpenGLWidgets.  /*2*/ is called for our docked QWidget that does not contain a QOpenGLWidget, but it is not called when that QWidget is made floating rather than docked.

/*2*/ ends up calling QPlatformGraphicsBufferHelper::bindSWToTexture(..):

bool QPlatformGraphicsBufferHelper::bindSWToTexture(const QPlatformGraphicsBuffer *graphicsBuffer,
                                                    bool *swizzleRandB, bool *premultipliedB,
                                                    const QRect &subRect)
#ifndef QT_NO_OPENGL
    QOpenGLContext *ctx = QOpenGLContext::currentContext();
    if (!ctx)
        return false;

    if (!(graphicsBuffer->isLocked() & QPlatformGraphicsBuffer::SWReadAccess))
        return false;

    QSize size = graphicsBuffer->size();

    Q_ASSERT(subRect.isEmpty() || QRect(QPoint(0,0), size).contains(subRect));

    GLenum internalFormat = GL_RGBA;
    GLuint pixelType = GL_UNSIGNED_BYTE;

    bool needsConversion = false;
    bool swizzle = false;
    bool premultiplied = false;
    QImage::Format imageformat = QImage::toImageFormat(graphicsBuffer->format());
    QImage image(graphicsBuffer->data(), size.width(), size.height(), graphicsBuffer->bytesPerLine(), imageformat);
    if (graphicsBuffer->bytesPerLine() != (size.width() * 4)) {
        needsConversion = true;
    } else {
        switch (imageformat) {
        case QImage::Format_ARGB32_Premultiplied:
            premultiplied = true;
            // no break
        case QImage::Format_RGB32:
        case QImage::Format_ARGB32:
            swizzle = true;
        case QImage::Format_RGBA8888_Premultiplied:
            premultiplied = true;
            // no break
        case QImage::Format_RGBX8888:
        case QImage::Format_RGBA8888:
        case QImage::Format_BGR30:
        case QImage::Format_A2BGR30_Premultiplied:
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                pixelType = GL_UNSIGNED_INT_2_10_10_10_REV;
                internalFormat = GL_RGB10_A2;
                premultiplied = true;
            } else {
                needsConversion = true;
        case QImage::Format_RGB30:
        case QImage::Format_A2RGB30_Premultiplied:
            if (!ctx->isOpenGLES() || ctx->format().majorVersion() >= 3) {
                pixelType = GL_UNSIGNED_INT_2_10_10_10_REV;
                internalFormat = GL_RGB10_A2;
                premultiplied = true;
                swiz5zle = true;
            } else {
                needsConversion = true;
            needsConversion = true;
    if (needsConversion)
        image = image.convertToFormat(QImage::Format_RGBA8888);

    QOpenGLFunctions *funcs = ctx->functions();

    QRect rect = subRect;
    if (rect.isNull() || rect == QRect(QPoint(0,0),size)) {
        funcs->glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, size.width(), size.height(), 0, GL_RGBA, pixelType, image.constBits());
    } else {
#ifndef QT_OPENGL_ES_2
        if (!ctx->isOpenGLES()) {
            funcs->glPixelStorei(GL_UNPACK_ROW_LENGTH, image.width());
            funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x(), rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
                                   image.constScanLine(rect.y()) + rect.x() * 4);
            funcs->glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
        } else
            // if the rect is wide enough it's cheaper to just
            // extend it instead of doing an image copy
            if (rect.width() >= size.width() / 2) {

            // if the sub-rect is full-width we can pass the image data directly to
            // OpenGL instead of copying, since there's no gap between scanlines

            if (rect.width() == size.width()) {
                funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, 0, rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
            } else {
                funcs->glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x(), rect.y(), rect.width(), rect.height(), GL_RGBA, pixelType,
    if (swizzleRandB)
        *swizzleRandB = swizzle;
    if (premultipliedB)
        *premultipliedB = premultiplied;

    return true;

    return false;
#endif // QT_NO_OPENGL


Those glTexSubImage2D calls are blocking texture uploads executed in the main thread (theoretically, glTexSubImage2D should be non-blocking, but profiling this code makes it very apparent that glTexSubImage2D is blocking).  In a profiler, it is easily seen that the huge FPS hit is mostly the result of synchronization delay; it takes time to marshal data to the GPU, and most of that time is spent waiting for inherently asynchronous things, such as DMA transfers, to most certainly be definitely completed, beyond a shadow of a doubt, triply confirmed, with extra delays just to be super-ultra-incredibly-sure.  If a modern video game were to upload its textures like this, one-at-a-time, in a blocking fashion, you would be lucky to get one frame per minute.

Perhaps we can engage the code path used for QGraphicsProxyWidgets and render QWidgets directly to a pixel buffer?  I don't know if QGraphicsProxyWidget actually does this, but the FPS hit from placing a QWidget updated every frame in a QGraphicsScene with an OpenGL viewport is less severe than the hit from docking a plain QWidget updated every frame alongside OpenGL viewports that are updated every frame.  One way we might try to do this is by simply making the plain QWidget containing the QLabel a QOpenGLWidget.  I think I remember hearing that QWidget children of QOpenGLWidgets are rendered properly, within the QOpenGLWidget viewport.  Perhaps this is the ticket.

[05:34 PM][ehvatum@heavenly:~/multiple_gl_viewport_fps_toy]> git diff
diff --git a/MainWindow.cpp b/MainWindow.cpp
index fd8ccb5..3e9acf2 100644
--- a/MainWindow.cpp
+++ b/MainWindow.cpp
@@ -6,7 +6,7 @@ MainWindow::MainWindow(QWidget *parent)
    m_central_gv(new GL_QGraphicsView(0, m_central_gs)),
    m_central_swap_interval("central swapInterval == 0"),
-    m_left_widget(new QWidget()),
+    m_left_widget(new QOpenGLWidget()),
    m_left_dock_widget(new QDockWidget("left widget")),
    m_right_gs(new QGraphicsScene()),
diff --git a/MainWindow.h b/MainWindow.h
index 276590c..e14afab 100644
--- a/MainWindow.h
+++ b/MainWindow.h
@@ -23,7 +23,7 @@ protected:
    QGraphicsTextItem* m_central_fps_item;
    GL_QGraphicsView* m_central_gv;
    QAction m_central_swap_interval;
-    QWidget *m_left_widget;
+    QOpenGLWidget *m_left_widget;
    QLabel *m_left_fps_label;
    QDockWidget* m_left_dock_widget;
    QGraphicsScene* m_right_gs;

With these changes, docking the left widget still imposes the same FPS hit and for the same reason: we wait for an enormous texture upload.  Floating the left widget removes the slowdown, unless I resize that floating widget to be the same size as the main window.  Together, all of this leads to an insight: the texture uploaded in order to compose a raster surface and a QOpenGLWidget is always the size of the top-level window ultimately containing the widgets.

So, Qt's raster + QOpenGLWidget composition is completely brain damaged and must be avoided.  However, I still need to have QMainWindows containing a mixture of docked QOpenGLWidgets and docked QWidgets.  The solution is to use QGLWidgets instead - these do not participate in composition.  Doing so brings FPS back to something reasonable.


July 2009   August 2009   September 2009   October 2009   November 2009   December 2009   January 2010   September 2010   December 2010   January 2011   February 2011   April 2011   June 2011   August 2011   February 2012   June 2012   July 2012   August 2012   October 2012   November 2012   January 2014   April 2014   June 2014   August 2014   September 2014   October 2014   January 2015   March 2015   April 2015   June 2015   November 2015   December 2015   January 2016   June 2016   August 2016   January 2017   March 2017  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]