GStreamer Planet

Vulkan Video is Open: Application showcase !

From dabrain34 by Stéphane Cerveau

Vulkanised 2025
Vulkanised 2025

UK calling

Long time no see this beautiful grey sky, roast beef on sunday and large but full packed pub when there is a football or a rugby game (The rose team has been lucky this year, grrr).

It was a delightful journey in the UK starting with my family visiting London including a lot (yes a lot…) of sightviews in a very short amount of time. But we managed to fit everything in. We saw the changing of the guards, the Thames river tide on a boat, Harry Potter gift shops and the beautiful Arsenal stadium with its legendary pitch, one of the best of England.

It was our last attraction in London and now it was time for my family to go to Standsted back home and me to Cambridge and its legendary university.

To start the journey in Cambridge, first I got some rest on Monday in the hotel to face the hail of information I will get during the conference. This year, Vulkanised took place on Arm’s campus, who kindly hosted the event, providing everything we needed to feel at home and comfortable.

The first day, we started with an introduction from Ralph Potter, the Vulkan Working Group Chair at Khronos, who introduced the new 1.4 release and all the extensions coming along including “Vulkan Video”. Then we could start this conference with my favorite topic, decoding video content with Vulkan Video. And the game was on! There was a presentation every 30 minutes including a neat one from my colleague at Igalia Ricardo Garcia about Device-Generated Commands in Vulkan and a break every 3 presentations. It took a lot of mental energy to keep up with all the topics as each presentation was more interesting than the last. During the break, we had time to relax with good coffee, delicious cookies, and nice conversations.

The first day ended up with a tooling demonstrations from LunarG, helping us all to understand and tame the Vulkan beast. The beast is ours now!

As I was not in the best shape due to a bug I caught on Sunday, I decided to play it safe and went to the hotel just after a nice indian meal. I had to prepare myself for the next day, where I would present “Vulkan Video is Open: Application Showcase”.

Vulkan Video is Open: Application showcase !

First Srinath Kumarapuram from Nvidia gave a presentation about the new extensions made available during 2024 by the Vulkan Video TSG. It started with a brief timeline of the video extensions from the initial h26x decoding to the latest VP9 decode coming this year including the 2024 extensions such as the AV1 codec. Then he presented more specific extensions such as VK_KHR_video_encode_quantization_map, VK_KHR_video_maintenance2 released during 2024 and coming in 2025, VK_KHR_video_encode_intra_refresh. He mentioned that the Vulkan toolbox now completely supports Vulkan Video, including the Validation Layers, Vulkan Profiles, vulkaninfo or GFXReconstruct.

After some deserved applause for a neat presentation, it was my time to be on stage.

During this presentation I focused on the Open source ecosystem around Vulkan Video. Indeed Vulkan Video ships with a sample app which is totally open along with the regular Conformance Test Suite. But that’s not all! Two major frameworks now ship with Vulkan Video support: GStreamer and FFmpeg.

Before this, I started by talking about Mesa, the open graphics library. This library which is totally open provides drivers which support Vulkan Video extensions and allow applications to run Vulkan Video decode or encode. The 3 major chip vendors are now supported. It started in 2022 with RADV, a userspace driver that implements the Vulkan API on most modern AMD GPUs. This driver supports all the vulkan video extensions except the lastest ones such as VK_KHR_video_encode_quantization_map or VK_KHR_video_maintenance2 but this they should be implemented sometime in 2025. Intel GPUs are now supported with the ANV driver, this driver also supports the common video extensions such as h264/5 and AV1 codec. The last driver to gain support was at the end of 2024 where several of the Vulkan Video extensions were introduced to NVK, a Vulkan driver for NVIDIA GPUs. This driver is still experimental but it’s possible to decode H264 and H265 content as well as its proprietary version. This completes the offering of the main GPUs on the market.

Then I moved to the applications including GStreamer, FFmpeg and Vulkan-Video-Samples. In addition to the extensions supported in 2025, we talked mainly about the decode conformance using Fluster. To compare all the implementations, including the driver, the version and the framework, a spreadsheet can be found here. In this spreadsheet we summarize the 3 supported codecs (H264, H265 and AV1) with their associated test suites and compare their implemententations using Vulkan Video (or not, see results for VAAPI with GStreamer). GStreamer, my favorite playground, can now decode H264 and H265 since 1.24 and recently got the support for AV1 but the merge request is still under review. It supports more than 80% of the H264 test vectors for the JVT-AVC_V1 and 85% of the H265 test vectors in JCT-VC-HEVC_V1. FFMpeg is offering better figures passing 90% of the tests. It supports all the avaliable codecs including all of the encoders as well. And finally Vulkan-Video-Samples is the app that you want to use to support all codecs for both encode and decode, but its currently missing support for mesa drivers when it comes to use Fluster decode tests..

Vulkanised on the 3rd day

During the 3rd day, we had interesting talks as well demonstrating the power of Vulkan, from Blender, a free and open-source 3D computer graphics software tool switching progressively to Vulkan, to the implementation of 3D a game engine using Rust, or compute shaders in Astronomy. My other colleague at Igalia, Lucas Fryzek, also had a presentation on Mesa with Lavapipe: a Mesa’s Software Renderer for Vulkan which allows you to have a hardware free implementation of Vulkan and to validate extensions in a simpler way. Finally, we finished this prolific and dense conference with Android and its close collaboration with Vulkan.

If you are interested in 3D graphics, I encourage you to attend future Vulkanised editions, which are full of passionate people. And if you can not attend you can still watch the presentation online.

If you are interested in the Vulkan Video presentation I gave, you can catch up the video here:

Or follow our Igalia live blog post on Vulkan Video:

https://blogs.igalia.com/vjaquez/vulkan-video-status/

As usual, if you would like to learn more about Vulkan, GStreamer or any other open multimedia framework, please feel free to contact us!

whippet lab notebook: untagged mallocs, bis

From wingolog by Andy Wingo

Andy Wingo

Earlier this week I took an inventory of how Guile uses the Boehm-Demers-Weiser (BDW) garbage collector, with the goal of making sure that I had replacements for all uses lined up in Whippet. I categorized the uses into seven broad categories, and I was mostly satisfied that I have replacements for all except the last: I didn’t know what to do with untagged allocations: those that contain arbitrary data, possibly full of pointers to other objects, and which don’t have a header that we can use to inspect on their type.

But now I do! Today’s note is about how we can support untagged allocations of a few different kinds in Whippet’s mostly-marking collector.

inside and outside

Why bother supporting untagged allocations at all? Well, if I had my way, I wouldn’t; I would just slog through Guile and fix all uses to be tagged. There are only a finite number of use sites and I could get to them all in a month or so.

The problem comes for uses of scm_gc_malloc from outside libguile itself, in C extensions and embedding programs. These users are loathe to adapt to any kind of change, and garbage-collection-related changes are the worst. So, somehow, we need to support these users if we are not to break the Guile community.

on intent

The problem with scm_gc_malloc, though, is that it is missing an expression of intent, notably as regards tagging. You can use it to allocate an object that has a tag and thus can be traced precisely, or you can use it to allocate, well, anything else. I think we will have to add an API for the tagged case and assume that anything that goes through scm_gc_malloc is requesting an untagged, conservatively-scanned block of memory. Similarly for scm_gc_malloc_pointerless: you could be allocating a tagged object that happens to not contain pointers, or you could be allocating an untagged array of whatever. A new API is needed there too for pointerless untagged allocations.

on data

Recall that the mostly-marking collector can be built in a number of different ways: it can support conservative and/or precise roots, it can trace the heap precisely or conservatively, it can be generational or not, and the collector can use multiple threads during pauses or not. Consider a basic configuration with precise roots. You can make tagged pointerless allocations just fine: the trace function for that tag is just trivial. You would like to extend the collector with the ability to make untagged pointerless allocations, for raw data. How to do this?

Consider first that when the collector goes to trace an object, it can’t use bits inside the object to discriminate between the tagged and untagged cases. Fortunately though the main space of the mostly-marking collector has one metadata byte for each 16 bytes of payload. Of those 8 bits, 3 are used for the mark (five different states, allowing for future concurrent tracing), two for the precise field-logging write barrier, one to indicate whether the object is pinned or not, and one to indicate the end of the object, so that we can determine object bounds just by scanning the metadata byte array. That leaves 1 bit, and we can use it to indicate untagged pointerless allocations. Hooray!

However there is a wrinkle: when Whippet decides the it should evacuate an object, it tracks the evacuation state in the object itself; the embedder has to provide an implementation of a little state machine, allowing the collector to detect whether an object is forwarded or not, to claim an object for forwarding, to commit a forwarding pointer, and so on. We can’t do that for raw data, because all bit states belong to the object, not the collector or the embedder. So, we have to set the “pinned” bit on the object, indicating that these objects can’t move.

We could in theory manage the forwarding state in the metadata byte, but we don’t have the bits to do that currently; maybe some day. For now, untagged pointerless allocations are pinned.

on slop

You might also want to support untagged allocations that contain pointers to other GC-managed objects. In this case you would want these untagged allocations to be scanned conservatively. We can do this, but if we do, it will pin all objects.

Thing is, conservative stack roots is a kind of a sweet spot in language run-time design. You get to avoid constraining your compiler, you avoid a class of bugs related to rooting, but you can still support compaction of the heap.

How is this, you ask? Well, consider that you can move any object for which we can precisely enumerate the incoming references. This is trivially the case for precise roots and precise tracing. For conservative roots, we don’t know whether a given edge is really an object reference or not, so we have to conservatively avoid moving those objects. But once you are done tracing conservative edges, any live object that hasn’t yet been traced is fair game for evacuation, because none of its predecessors have yet been visited.

But once you add conservatively-traced objects back into the mix, you don’t know when you are done tracing conservative edges; you could always discover another conservatively-traced object later in the trace, so you have to pin everything.

The good news, though, is that we have gained an easier migration path. I can now shove Whippet into Guile and get it running even before I have removed untagged allocations. Once I have done so, I will be able to allow for compaction / evacuation; things only get better from here.

Also as a side benefit, the mostly-marking collector’s heap-conservative configurations are now faster, because we have metadata attached to objects which allows tracing to skip known-pointerless objects. This regains an optimization that BDW has long had via its GC_malloc_atomic, used in Guile since time out of mind.

fin

With support for untagged allocations, I think I am finally ready to start getting Whippet into Guile itself. Happy hacking, and see you on the other side!

whippet lab notebook: on untagged mallocs

From wingolog by Andy Wingo

Andy Wingo

Salutations, populations. Today’s note is more of a work-in-progress than usual; I have been finally starting to look at getting Whippet into Guile, and there are some open questions.

inventory

I started by taking a look at how Guile uses the Boehm-Demers-Weiser collector‘s API, to make sure I had all my bases covered for an eventual switch to something that was not BDW. I think I have a good overview now, and have divided the parts of BDW-GC used by Guile into seven categories.

implicit uses

Firstly there are the ways in which Guile’s run-time and compiler depend on BDW-GC’s behavior, without actually using BDW-GC’s API. By this I mean principally that we assume that any reference to a GC-managed object from any thread’s stack will keep that object alive. The same goes for references originating in global variables, or static data segments more generally. Additionally, we rely on GC objects not to move: references to GC-managed objects in registers or stacks are valid across a GC boundary, even if those references are outside the GC-traced graph: all objects are pinned.

Some of these “uses” are internal to Guile’s implementation itself, and thus amenable to being changed, albeit with some effort. However some escape into the wild via Guile’s API, or, as in this case, as implicit behaviors; these are hard to change or evolve, which is why I am putting my hopes on Whippet’s mostly-marking collector, which allows for conservative roots.

defensive uses

Then there are the uses of BDW-GC’s API, not to accomplish a task, but to protect the mutator from the collector: GC_call_with_alloc_lock, explicitly enabling or disabling GC, calls to sigmask that take BDW-GC’s use of POSIX signals into account, and so on. BDW-GC can stop any thread at any time, between any two instructions; for most users is anodyne, but if ever you use weak references, things start to get really gnarly.

Of course a new collector would have its own constraints, but switching to cooperative instead of pre-emptive safepoints would be a welcome relief from this mess. On the other hand, we will require client code to explicitly mark their threads as inactive during calls in more cases, to ensure that all threads can promptly reach safepoints at all times. Swings and roundabouts?

precise tracing

Did you know that the Boehm collector allows for precise tracing? It does! It’s slow and truly gnarly, but when you need precision, precise tracing nice to have. (This is the GC_new_kind interface.) Guile uses it to mark Scheme stacks, allowing it to avoid treating unboxed locals as roots. When it loads compiled files, Guile also adds some sliced of the mapped files to the root set. These interfaces will need to change a bit in a switch to Whippet but are ultimately internal, so that’s fine.

What is not fine is that Guile allows C users to hook into precise tracing, notably via scm_smob_set_mark. This is not only the wrong interface, not allowing for copying collection, but these functions are just truly gnarly. I don’t know know what to do with them yet; are our external users ready to forgo this interface entirely? We have been working on them over time, but I am not sure.

reachability

Weak references, weak maps of various kinds: the implementation of these in terms of BDW’s API is incredibly gnarly and ultimately unsatisfying. We will be able to replace all of these with ephemerons and tables of ephemerons, which are natively supported by Whippet. The same goes with finalizers.

The same goes for constructs built on top of finalizers, such as guardians; we’ll get to reimplement these on top of nice Whippet-supplied primitives. Whippet allows for resuscitation of finalized objects, so all is good here.

misc

There is a long list of miscellanea: the interfaces to explicitly trigger GC, to get statistics, to control the number of marker threads, to initialize the GC; these will change, but all uses are internal, making it not a terribly big deal.

I should mention one API concern, which is that BDW’s state is all implicit. For example, when you go to allocate, you don’t pass the API a handle which you have obtained for your thread, and which might hold some thread-local freelists; BDW will instead load thread-local variables in its API. That’s not as efficient as it could be and Whippet goes the explicit route, so there is some additional plumbing to do.

Finally I should mention the true miscellaneous BDW-GC function: GC_free. Guile exposes it via an API, scm_gc_free. It was already vestigial and we should just remove it, as it has no sensible semantics or implementation.

allocation

That brings me to what I wanted to write about today, but am going to have to finish tomorrow: the actual allocation routines. BDW-GC provides two, essentially: GC_malloc and GC_malloc_atomic. The difference is that “atomic” allocations don’t refer to other GC-managed objects, and as such are well-suited to raw data. Otherwise you can think of atomic allocations as a pure optimization, given that BDW-GC mostly traces conservatively anyway.

From the perspective of a user of BDW-GC looking to switch away, there are two broad categories of allocations, tagged and untagged.

Tagged objects have attached metadata bits allowing their type to be inspected by the user later on. This is the happy path! We’ll be able to write a gc_trace_object function that takes any object, does a switch on, say, some bits in the first word, dispatching to type-specific tracing code. As long as the object is sufficiently initialized by the time the next safepoint comes around, we’re good, and given cooperative safepoints, the compiler should be able to ensure this invariant.

Then there are untagged allocations. Generally speaking, these are of two kinds: temporary and auxiliary. An example of a temporary allocation would be growable storage used by a C run-time routine, perhaps as an unbounded-sized alternative to alloca. Guile uses these a fair amount, as they compose well with non-local control flow as occurring for example in exception handling.

An auxiliary allocation on the other hand might be a data structure only referred to by the internals of a tagged object, but which itself never escapes to Scheme, so you never need to inquire about its type; it’s convenient to have the lifetimes of these values managed by the GC, and when desired to have the GC automatically trace their contents. Some of these should just be folded into the allocations of the tagged objects themselves, to avoid pointer-chasing. Others are harder to change, notably for mutable objects. And the trouble is that for external users of scm_gc_malloc, I fear that we won’t be able to migrate them over, as we don’t know whether they are making tagged mallocs or not.

what is to be done?

One conventional way to handle untagged allocations is to manage to fit your data into other tagged data structures; V8 does this in many places with instances of FixedArray, for example, and Guile should do more of this. Otherwise, you make new tagged data types. In either case, all auxiliary data should be tagged.

I think there may be an alternative, which would be just to support the equivalent of untagged GC_malloc and GC_malloc_atomic; but for that, I am out of time today, so type at y’all tomorrow. Happy hacking!

scikit-survival 0.24.0 released

From Posts | Sebastian Pölsterl by Sebastian Pölsterl

It’s my pleasure to announce the release of scikit-survival 0.24.0.

A highlight of this release the addition of cumulative_incidence_competing_risks() which implements a non-parameteric estimator of the cumulative incidence function in the presence of competing risks. In addition, the release adds support for scikit-learn 1.6, including the support for missing values for ExtraSurvivalTrees.

Analysis of Competing Risks

In classical survival analysis, the focus is on the time until a specific event occurs. If no event is observed during the study period, the time of the event is considered censored. A common assumption is that censoring is non-informative, meaning that censored subjects have a similar prognosis to those who were not censored.

Competing risks arise when each subject can experience an event due to one of $K$ ($K \geq 2$) mutually exclusive causes, termed competing risks. Thus, the occurrence of one event prevents the occurrence of other events. For example, after a bone marrow transplant, a patient might relapse or die from transplant-related causes (transplant-related mortality). In this case, death from transplant-related mortality precludes relapse.

The bone marrow transplant data from Scrucca et al., Bone Marrow Transplantation (2007) includes data from 35 patients grouped into two cancer types: Acute Lymphoblastic Leukemia (ALL; coded as 0), and Acute Myeloid Leukemia (AML; coded as 1).

from sksurv.datasets import load_bmt
bmt_features, bmt_outcome = load_bmt()
diseases = bmt_features["dis"].cat.rename_categories(
{"0": "ALL", "1": "AML"}
)
diseases.value_counts().to_frame()
dis count
AML 18
ALL 17

During the follow-up period, some patients might experience a relapse of the original leukemia or die while in remission (transplant related death). The outcome is defined similarly to standard time-to-event data, except that the event indicator specifies the type of event, where 0 always indicates censoring.

import pandas as pd
status_labels = {
0: "Censored",
1: "Transplant related mortality",
2: "Relapse",
}
risks = pd.DataFrame.from_records(bmt_outcome).assign(
label=lambda x: x["status"].replace(status_labels)
)
risks["label"].value_counts().to_frame()
label count
Relapse 15
Censored 11
Transplant related mortality 9

The table above shows the number of observations for each status.

Non-parametric Estimator of the Cumulative Incidence Function

If the goal is to estimate the probability of relapse, transplant-related death is a competing risk event. This means that the occurrence of relapse prevents the occurrence of transplant-related death, and vice versa. We aim to estimate curves that illustrate how the likelihood of these events changes over time.

Let’s begin by estimating the probability of relapse using the complement of the Kaplan-Meier estimator. With this approach, we treat deaths as censored observations. One minus the Kaplan-Meier estimator provides an estimate of the probability of relapse before time $t$.

import matplotlib.pyplot as plt
from sksurv.nonparametric import kaplan_meier_estimator
times, km_estimate = kaplan_meier_estimator(
bmt_outcome["status"] == 1, bmt_outcome["ftime"]
)
plt.step(times, 1 - km_estimate, where="post")
plt.xlabel("time $t$")
plt.ylabel("Probability of relapsing before time $t$")
plt.ylim(0, 1)
plt.grid()

However, this approach has a significant drawback: considering death as a censoring event violates the assumption that censoring is non-informative. This is because patients who died from transplant-related mortality have a different prognosis than patients who did not experience any event. Therefore, the estimated probability of relapse is often biased.

The cause-specific cumulative incidence function (CIF) addresses this problem by estimating the cause-specific hazard of each event separately. The cumulative incidence function estimates the probability that the event of interest occurs before time $t$, and that it occurs before any of the competing causes of an event. In the bone marrow transplant dataset, the cumulative incidence function of relapse indicates the probability of relapse before time $t$, given that the patient has not died from other causes before time $t$.

from sksurv.nonparametric import cumulative_incidence_competing_risks
times, cif_estimates = cumulative_incidence_competing_risks(
bmt_outcome["status"], bmt_outcome["ftime"]
)
plt.step(times, cif_estimates[0], where="post", label="Total risk")
for i, cif in enumerate(cif_estimates[1:], start=1):
plt.step(times, cif, where="post", label=status_labels[i])
plt.legend()
plt.xlabel("time $t$")
plt.ylabel("Probability of event before time $t$")
plt.ylim(0, 1)
plt.grid()

The plot shows the estimated probability of experiencing an event at time $t$ for both the individual risks and for the total risk.

Next, we want to to estimate the cumulative incidence curves for the two cancer types — acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) — to examine how the probability of relapse depends on the original disease diagnosis.

_, axs = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)
for j, disease in enumerate(diseases.unique()):
mask = diseases == disease
event = bmt_outcome["status"][mask]
time = bmt_outcome["ftime"][mask]
times, cif_estimates, conf_int = cumulative_incidence_competing_risks(
event,
time,
conf_type="log-log",
)
for i, (cif, ci, ax) in enumerate(
zip(cif_estimates[1:], conf_int[1:], axs[:, j]), start=1
):
ax.step(times, cif, where="post")
ax.fill_between(times, ci[0], ci[1], alpha=0.25, step="post")
ax.set_title(f"{disease}: {status_labels[i]}", size="small")
ax.grid()
for ax in axs[-1, :]:
ax.set_xlabel("time $t$")
for ax in axs[:, 0]:
ax.set_ylim(0, 1)
ax.set_ylabel("Probability of event before time $t$")

The left column shows the estimated cumulative incidence curves (solid lines) for patients diagnosed with ALL, while the right column shows the curves for patients diagnosed with AML, along with their 95% pointwise confidence intervals. The plot indicates that the estimated probability of relapse at $t=40$ days is more than three times higher for patients diagnosed with ALL compared to AML.

If you want to run the examples above yourself, you can execute them interactively in your browser using binder.

GStreamer 1.25.90 (1.26.0 rc1) pre-release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is excited to announce the first release candidate for the upcoming stable 1.26.0 feature release.

This 1.25.90 pre-release is for testing and development purposes in the lead-up to the stable 1.26 series which is now frozen for commits and scheduled for release very soon.

Depending on how things go there might be more release candidates in the next couple of days, but in any case we're aiming to get 1.26.0 out as soon as possible.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

Orc 0.4.41 bug-fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a bug-fix release.

Highlights:

  • orccodemem: Don't modify the process umask, which caused race conditions with other threads
  • Require glibc >= 2.07
  • x86: various SSE and MMX fixes
  • avx: Fix sqrtps encoding causing an illegal instruction crash
  • Hide internal symbols from ABI and do not install internal headers
  • Rename backend to target, including `orc-backend` meson option and `ORC_BACKEND` environment variable
  • Testsuite, tools: Disambiguate OrcProgram naming conventions
  • Build: Fix `_clear_cache` call for Clang and error out on implicit function declarations
  • opcodes: Use MIN instead of CLAMP for known unsigned values to fix compiler warnings
  • ci improvements: Upload the generated .S and .bin and include Windows artifacts
  • Spelling fix in debug log message

Direct tarball download: orc-0.4.41.tar.xz.

tracepoints: gnarly but worth it

From wingolog by Andy Wingo

Andy Wingo

Hey all, quick post today to mention that I added tracing support to the Whippet GC library. If the support library for LTTng is available when Whippet is compiled, Whippet embedders can visualize the GC process. Like this!

Screenshot of perfetto showing a generational PCC trace

Click above for a full-scale screenshot of the Perfetto trace explorer processing the nboyer microbenchmark with the parallel copying collector on a 2.5x heap. Of course no image will have all the information; the nice thing about trace visualizers like is that you can zoom in to sub-microsecond spans to see exactly what is happening, have nice mouseovers and clicky-clickies. Fun times!

on adding tracepoints

Adding tracepoints to a library is not too hard in the end. You need to pull in the lttng-ust library, which has a pkg-config file. You need to declare your tracepoints in one of your header files. Then you have a minimal C file that includes the header, to generate the code needed to emit tracepoints.

Annoyingly, this header file you write needs to be in one of the -I directories; it can’t be just in the the source directory, because lttng includes it seven times (!!) using computed includes (!!!) and because the LTTng file header that does all the computed including isn’t in your directory, GCC won’t find it. It’s pretty ugly. Ugliest part, I would say. But, grit your teeth, because it’s worth it.

Finally you pepper your source with tracepoints, which probably you wrap in some macro so that you don’t have to require LTTng, and so you can switch to other tracepoint libraries, and so on.

using the thing

I wrote up a little guide for Whippet users about how to actually get traces. It’s not as easy as perf record, which I think is an error. Another ugly point. Buck up, though, you are so close to graphs!

By which I mean, so close to having to write a Python script to make graphs! Because LTTng writes its logs in so-called Common Trace Format, which as you might guess is not very common. I have a colleague who swears by it, that for him it is the lowest-overhead system, and indeed in my case it has no measurable overhead when trace data is not being collected, but his group uses custom scripts to convert the CTF data that he collects to... GTKWave (?!?!?!!).

In my case I wanted to use Perfetto’s UI, so I found a script to convert from CTF to the JSON-based tracing format that Chrome profiling used to use. But, it uses an old version of Babeltrace that wasn’t available on my system, so I had to write a new script (!!?!?!?!!), probably the most Python I have written in the last 20 years.

is it worth it?

Yes. God I love blinkenlights. As long as it’s low-maintenance going forward, I am satisfied with the tradeoffs. Even the fact that I had to write a script to process the logs isn’t so bad, because it let me get nice nested events, which most stock tracing tools don’t allow you to do.

I fixed a small performance bug because of it – a worker thread was spinning waiting for a pool to terminate instead of helping out. A win, and one that never would have shown up on a sampling profiler too. I suspect that as I add more tracepoints, more bugs will be found and fixed.

fin

I think the only thing that would be better is if tracepoints were a part of Linux system ABIs – that there would be header files to emit tracepoint metadata in all binaries, that you wouldn’t have to link to any library, and the actual tracing tools would be intermediated by that ABI in such a way that you wouldn’t depend on those tools at build-time or distribution-time. But until then, I will take what I can get. Happy tracing!

whippet at fosdem

From wingolog by Andy Wingo

Andy Wingo

Hey all, the video of my FOSDEM talk on Whippet is up:

Slides here, if that’s your thing.

I ended the talk with some puzzling results around generational collection, which prompted yesterday’s post. I don’t have a firm answer yet. Or rather, perhaps for the splay benchmark, it is to be expected that a generational GC is not great; but there are other benchmarks that also show suboptimal throughput in generational configurations. Surely it is some tuning issue; I’ll be looking into it.

Happy hacking!

GStreamer 1.25.50 unstable development release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another development release in the unstable 1.25 release series.

The unstable 1.25 release series is for testing and development purposes in the lead-up to the stable 1.26 series which is scheduled for release ASAP. Any newly-added API can still change until that point.

This development release is primarily for developers and early adopters.

The plan is to get 1.26 out of the door as quickly as possible, and with this release a feature freeze is now in effect.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

baffled by generational garbage collection

From wingolog by Andy Wingo

Andy Wingo

Usually in this space I like to share interesting things that I find out; you might call it a research-epistle-publish loop. Today, though, I come not with answers, but with questions, or rather one question, but with fractal surface area: what is the value proposition of generational garbage collection?

hypothesis

The conventional wisdom is encapsulated in a 2004 Blackburn, Cheng, and McKinley paper, “Myths and Realities: The Performance Impact of Garbage Collection”, which compares whole-heap mark-sweep and copying collectors to their generational counterparts, using the Jikes RVM as a test harness. (It also examines a generational reference-counting collector, which is an interesting predecessor to the 2022 LXR work by Zhao, Blackburn, and McKinley.)

The paper finds that generational collectors spend less time than their whole-heap counterparts for a given task. This is mainly due to less time spent collecting, because generational collectors avoid tracing/copying work for older objects that mostly stay in the same place in the live object graph.

The paper also notes an improvement for mutator time under generational GC, but only for the generational mark-sweep collector, which it attributes to the locality and allocation speed benefit of bump-pointer allocation in the nursery. However for copying collectors, generational GC tends to slow down the mutator, probably because of the write barrier, but in the end lower collector times still led to lower total times.

So, I expected generational collectors to always exhibit lower wall-clock times than whole-heap collectors.

test workbench

In whippet, I have a garbage collector with an abstract API that specializes at compile-time to the mutator’s object and root-set representation and to the collector’s allocator, write barrier, and other interfaces. I embed it in whiffle, a simple Scheme-to-C compiler that can run some small multi-threaded benchmarks, for example the classic Gabriel benchmarks. We can then test those benchmarks against different collectors, mutator (thread) counts, and heap sizes. I expect that the generational parallel copying collector takes less time than the whole-heap parallel copying collector.

results?

So, I ran some benchmarks. Take the splay-tree benchmark, derived from Octane’s splay.js. I have a port to Scheme, and the results are... not good!

In this graph the “pcc” series is the whole-heap copying collector, and “generational-pcc” is the generational counterpart, with a nursery sized such that after each collection, its size is 2 MB times the number of active mutator threads in the last collector. So, for this test with eight threads, on my 8-core Ryzen 7 7840U laptop, the nursery is 16MB including the copy reserve, which happens to be the same size as the L3 on this CPU. New objects are kept in the nursery one cycle before being promoted to the old generation.

There are also results for “mmc” and “generational-mmc” collectors, which use an Immix-derived algorithm that allows for bump-pointer allocation but which doesn’t require a copy reserve. There, the generational collectors use a sticky mark-bit algorithm, which has very different performance characteristics as promotion is in-place, and the nursery is as large as the available heap size.

The salient point is that at all heap sizes, and for these two very different configurations (mmc and pcc), generational collection takes more time than whole-heap collection. It’s not just the splay benchmark either; I see the same thing for the very different nboyer benchmark. What is the deal?

I am honestly quite perplexed by this state of affairs. I wish I had a narrative to tie this together, but in lieu of that, voici some propositions and observations.

“generational collection is good because bump-pointer allocation”

Sometimes people say that the reason generational collection is good is because you get bump-pointer allocation, which has better locality and allocation speed. This is misattribution: it’s bump-pointer allocators that have these benefits. You can have them in whole-heap copying collectors, or you can have them in whole-heap mark-compact or immix collectors that bump-pointer allocate into the holes. Or, true, you can have them in generational collectors with a copying nursery but a freelist-based mark-sweep allocator. But also you can have generational collectors without bump-pointer allocation, for free-list sticky-mark-bit collectors. To simplify this panorama to “generational collectors have good allocators” is incorrect.

“generational collection lowers pause times”

It’s true, generational GC does lower median pause times:

But because a major collection is usually slightly more work under generational GC than in a whole-heap system, because of e.g. the need to reset remembered sets, the maximum pauses are just as big and even a little bigger:

I am not even sure that it is meaningful to compare median pause times between generational and non-generational collectors, given that the former perform possibly orders of magnitude more collections than the latter.

Doing fewer whole-heap traces is good, though, and in the ideal case, the less frequent major traces under generational collectors allows time for concurrent tracing, which is the true mitigation for long pause times.

is it whiffle?

Could it be that the test harness I am using is in some way unrepresentative? I don’t have more than one test harness for Whippet yet. I will start work on a second Whippet embedder within the next few weeks, so perhaps we will have an answer there. Still, there is ample time spent in GC pauses in these benchmarks, so surely as a GC workload Whiffle has some utility.

One reasons that Whiffle might be unrepresentative is that it is an ahead-of-time compiler, whereas nursery addresses are assigned at run-time. Whippet exposes the necessary information to allow a just-in-time compiler to specialize write barriers, for example the inline check that the field being mutated is not in the nursery, and an AOT compiler can’t encode this as an immediate. But it seems a small detail.

Also, Whiffle doesn’t do much compiler-side work to elide write barriers. Could the cost of write barriers be over-represented in Whiffle, relative to a production language run-time?

Relatedly, Whiffle is just a baseline compiler. It does some partial evaluation but no CFG-level optimization, no contification, no nice closure conversion, no specialization, and so on: is it not representative because it is not an optimizing compiler?

is it something about the nursery size?

How big should the nursery be? I have no idea.

As a thought experiment, consider the case of a 1 kilobyte nursery. It is probably too small to allow the time for objects to die young, so the survival rate at each minor collection would be high. Above a certain survival rate, generational GC is probably a lose, because your program violates the weak generational hypothesis: it introduces a needless copy for all survivors, and a synchronization for each minor GC.

On the other hand, a 1 GB nursery is probably not great either. It is plenty large enough to allow objects to die young, but the number of survivor objects in a space that large is such that pause times would not be very low, which is one of the things you would like in generational GC. Also, you lose out on locality: a significant fraction of the objects you traverse are probably out of cache and might even incur TLB misses.

So there is probably a happy medium somewhere. My instinct is that for a copying nursery, you want to make it about as big as L3 cache, which on my 8-core laptop is 16 megabytes. Systems are different sizes though; in Whippet my current heuristic is to reserve 2 MB of nursery per core that was active in the previous cycle, so if only 4 threads are allocating, you would have a 8 MB nursery. Is this good? I don’t know.

is it something about the benchmarks?

I don’t have a very large set of benchmarks that run on Whiffle, and they might not be representative. I mean, they are microbenchmarks.

One question I had was about heap sizes. If a benchmark’s maximum heap size fits in L3, which is the case for some of them, then probably generational GC is a wash, because whole-heap collection stays in cache. When I am looking at benchmarks that evaluate generational GC, I make sure to choose those that exceed L3 size by a good factor, for example the 8-mutator splay benchmark in which minimum heap size peaks at 300 MB, or the 8-mutator nboyer-5 which peaks at 1.6 GB.

But then, should nursery size scale with total heap size? I don’t know!

Incidentally, the way that I scale these benchmarks to multiple mutators is a bit odd: they are serial benchmarks, and I just run some number of threads at a time, and scale the heap size accordingly, assuming that the minimum size when there are 4 threads is four times the minimum size when there is just one thread. However, multithreaded programs are unreliable, in the sense that there is no heap size under which they fail and above which they succeed; I quote:

"Consider 10 threads each of which has a local object graph that is usually 10 MB but briefly 100MB when calculating: usually when GC happens, total live object size is 10×10MB=100MB, but sometimes as much as 1 GB; there is a minimum heap size for which the program sometimes works, but also a minimum heap size at which it always works."

is it the write barrier?

A generational collector partitions objects into old and new sets, and a minor collection starts by visiting all old-to-new edges, called the “remembered set”. As the program runs, mutations to old objects might introduce new old-to-new edges. To maintain the remembered set in a generational collector, the mutator invokes write barriers: little bits of code that run when you mutate a field in an object. This is overhead relative to non-generational configurations, where the mutator doesn’t have to invoke collector code when it sets fields.

So, could it be that Whippet’s write barriers or remembered set are somehow so inefficient that my tests are unrepresentative of the state of the art?

I used to use card-marking barriers, but I started to suspect they cause too much overhead during minor GC and introduced too much cache contention. I switched to precise field-logging barriers some months back for Whippet’s Immix-derived space, and we use the same kind of barrier in the generational copying (pcc) collector. I think this is state of the art. I need to see if I can find a configuration that allows me to measure the overhead of these barriers, independently of other components of a generational collector.

is it something about the generational mechanism?

A few months ago, my only generational collector used the sticky mark-bit algorithm, which is an unconventional configuration: its nursery is not contiguous, non-moving, and can be as large as the heap. This is part of the reason that I implemented generational support for the parallel copying collector, to have a different and more conventional collector to compare against. But generational collection loses on some of these benchmarks in both places!

is it something about collecting more often?

On one benchmark which repeatedly constructs some trees and then verifies them, I was seeing terrible results for generational GC, which I realized were because of cooperative safepoints: generational GC collects more often, so it requires that all threads reach safepoints more often, and the non-allocating verification phase wasn’t emitting any safepoints. I had to change the compiler to emit safepoints at regular intervals (in my case, on function entry), and it sped up the generational collector by a significant amount.

This is one instance of a general observation, which is that any work that doesn’t depend on survivor size in a GC pause is more expensive with a generational collector, which runs more collections. Synchronization can be a cost. I had one bug in which tracing ephemerons did work proportional to the size of the whole heap, instead of the nursery; I had to specifically add generational support for the way Whippet deals with ephemerons during a collection to reduce this cost.

is it something about collection frequency?

Looking deeper at the data, I have partial answers for the splay benchmark, and they are annoying :)

Splay doesn’t actually allocate all that much garbage. At a 2.5x heap, the stock parallel MMC collector (in-place, sticky mark bit) collects... one time. That’s all. Same for the generational MMC collector, because the first collection is always major. So at 2.5x we would expect the generational collector to be slightly slower. The benchmark is simply not very good – or perhaps the most generous interpretation is that it represents tasks that allocate 40 MB or so of long-lived data and not much garbage on top.

Also at 2.5x heap, the whole-heap copying collector runs 9 times, and the generational copying collector does 293 minor collections and... 9 major collections. We are not reducing the number of major GCs. It means either the nursery is too small, so objects aren’t dying young when they could, or the benchmark itself doesn’t conform to the weak generational hypothesis.

At a 1.5x heap, the copying collector doesn’t have enough space to run. For MMC, the non-generational variant collects 7 times, and generational MMC times out. Timing out indicates a bug, I think. Annoying!

I tend to think that if I get results and there were fewer than, like, 5 major collections for a whole-heap collector, that indicates that the benchmark is probably inapplicable at that heap size, and I should somehow surface these anomalies in my analysis scripts.

collecting more often redux

Doing a similar exercise for nboyer at 2.5x heap with 8 threads (4GB for 1.6GB live data), I see that pcc did 20 major collections, whereas generational pcc lowered that to 8 major collections and 3471 minor collections. Could it be that there are still too many fixed costs associated with synchronizing for global stop-the-world minor collections? I am going to have to add some fine-grained tracing to find out.

conclusion?

I just don’t know! I want to believe that generational collection was an out-and-out win, but I haven’t yet been able to prove it is true.

I do have some homework to do. I need to find a way to test the overhead of my write barrier – probably using the MMC collector and making it only do major collections. I need to fix generational-mmc for splay and a 1.5x heap. And I need to do some fine-grained performance analysis for minor collections in large heaps.

Enough for today. Feedback / reactions very welcome. Thanks for reading and happy hacking!

PipeWire ♥ Sovereign Tech Agency

From Arun Raghavan by Arun Raghavan

In my previous post, I alluded to an exciting development for PipeWire. I’m now thrilled to officially announce that Asymptotic will be undertaking several important tasks for the project, thanks to funding from the Sovereign Tech Fund (now part of the Sovereign Tech Agency).

Some of you might be familiar with the Sovereign Tech Fund from their funding for GNOME, GStreamer and systemd – they have been investing in foundational open source technology, supporting the digital commons in key areas, a mission closely aligned with our own.

We will be tackling three key areas of work.

ASHA hearing aid support

I wrote a bit about our efforts on this front. We have already completed the PipeWire support for single ASHA hearing aids, and are actively working on support for stereo pairs.

Improvements to GStreamer elements

We have been working through the GStreamer+PipeWire todo list, fixing bugs and making it easier to build audio and video streaming pipelines on top of PipeWire. A number of usability improvements have already landed, and more work on this front continues

A Rust-based client library

While we have a pretty functional set of Rust bindings around the C-based libpipewire already, we will be creating a pure Rust implementation of a PipeWire client, and provide that via a C API as well.

There are a number of advantages to this: type and memory safety being foremost, but we can also leverage Rust macros to eliminate a lot of boilerplate (there are community efforts in this direction already that we may be able to build upon).

This is a large undertaking, and this funding will allow us to tackle a big chunk of it – we are excited, and deeply appreciative of the work the Sovereign Tech Agency is doing in supporting critical open source infrastructure.

Watch this space for more updates!

Looking ahead at 2025 and Fedora Workstation and jobs on offer!

From Christian F.K. Schaller by Christian Schaller

Christian Schaller

So a we are a little bit into the new year I hope everybody had a great break and a good start of 2025. Personally I had a blast having gotten the kids an air hockey table as a Yuletide present :). Anyway, wanted to put this blog post together talking about what we are looking at for the new year and to let you all know that we are hiring.

Artificial Intelligence
One big item on our list for the year is looking at ways Fedora Workstation can make use of artificial intelligence. Thanks to IBMs Granite effort we know have an AI engine that is available under proper open source licensing terms and which can be extended for many different usecases. Also the IBM Granite team has an aggressive plan for releasing updated versions of Granite, incorporating new features of special interest to developers, like making Granite a great engine to power IDEs and similar tools. We been brainstorming various ideas in the team for how we can make use of AI to provide improved or new features to users of GNOME and Fedora Workstation. This includes making sure Fedora Workstation users have access to great tools like RamaLama, that we make sure setting up accelerated AI inside Toolbx is simple, that we offer a good Code Assistant based on Granite and that we come up with other cool integration points.

Wayland
The Wayland community had some challenges last year with frustrations boiling over a few times due to new protocol development taking a long time. Some of it was simply the challenge of finding enough people across multiple projects having the time to follow up and help review while other parts are genuine disagreements of what kind of things should be Wayland protocols or not. That said I think that problem has been somewhat resolved with a general understanding now that we have the ‘ext’ namespace for a reason, to allow people to have a space to review and make protocols without an expectation that they will be universally implemented. This allows for protocols of interest only to a subset of the community going into ‘ext’ and thus allowing protocols that might not be of interest to GNOME and KDE for instance to still have a place to live.

The other more practical problem is that of having people available to help review protocols or providing reference implementations. In a space like Wayland where you need multiple people from multiple different projects it can be hard at times to get enough people involved at any given time to move things forward, as different projects have different priorities and of course the developers involved might be busy elsewhere. One thing we have done to try to help out there is to set up a small internal team, lead by Jonas Ådahl, to discuss in-progress Wayland protocols and assign people the responsibility to follow up on those protocols we have an interest in. This has been helpful both as a way for us to develop internal consensus on the best way forward, but also I think our contribution upstream has become more efficient due to this.

All that said I also believe Wayland protocols will fade a bit into the background going forward. We are currently at the last stage of a community ‘ramp up’ on Wayland and thus there is a lot of focus on it, but once we are over that phase we will probably see what we saw with X.org extensions over time, that for the most time new extensions are so niche that 95% of the community don’t pay attention or care. There will always be some new technology creating the need for important new protocols, but those are likely to come along a relatively slow cadence.

High Dynamic Range

HDR support in GNOME Control Center

HDR support in GNOME Control Center

As for concrete Wayland protocols the single biggest thing for us for a long while now has of course been the HDR support for Linux. And it was great to see the HDR protocol get merged just before the holidays. I also want to give a shout out to Xaver Hugl from the KWin project. As we where working to ramp up HDR support in both GNOME Shell and GTK+ we ended up working with Xaver and using Kwin for testing especially the GTK+ implementation. Xaver was very friendly and collaborative and I think HDR support in both GNOME and KDE is more solid thanks to that collaboration, so thank you Xaver!

Talking about concrete progress on HDR support Jonas Adahl submitted merge requests for HDR UI controls for GNOME Control Center. This means you will be able to configure the use of HDR on your system in the next Fedora Workstation release.

PipeWire
I been sharing a lot of cool PipeWire news here in the last couple of years, but things might slow down a little as we go forward just because all the major features are basically working well now. The PulseAudio support is working well and we get very few bug reports now against it. The reports we are getting from the pro-audio community is that PipeWire works just as well or better as JACK for most people in terms of for instance latency, and when we do see issues with pro-audio it tends to be more often caused by driver issues triggered by PipeWire trying to use the device in ways that JACK didn’t. We been resolving those by adding more and more options to hardcode certain options in PipeWire, so that just as with JACK you can force PipeWire to not try things the driver has problems with. Of course fixing the drivers would be the best outcome, but for some of these pro-audio cards they are so niche that it is hard to find developers who wants to work on them or who has hardware to test with.

We are still maturing the video support although even that is getting very solid now. The screen capture support is considered fully mature, but the camera support is still a bit of a work in progress, partially because we are going to a generational change the camera landscape with UVC cameras being supplanted by MIPI cameras. Resolving that generational change isn’t just on PipeWire of course, but it does make the a more volatile landscape to mature something in. Of course an advantage here is that applications using PipeWire can easily switch between V4L2 UVC cameras and libcamera MIPI cameras, thus helping users have a smooth experience through this transition period.
But even with the challenges posed by this we are moving rapidly forward with Firefox PipeWire camera support being on by default in Fedora now, Chrome coming along quickly and OBS Studio having PipeWire support for some time already. And last but not least SDL3 is now out with PipeWire camera support.

MIPI camera support
Hans de Goede, Milan Zamazal and Kate Hsuan keeps working on making sure MIPI cameras work under Linux. MIPI cameras are a step forward in terms of technical capabilities, but at the moment a bit of a step backward in terms of open source as a lot of vendors believe they have ‘secret sauce’ in the MIPI camera stacks. Our works focuses mostly on getting the Intel MIPI stack fully working under Linux with the Lattice MIPI aggregator being the biggest hurdle currently for some laptops. Luckily Alan Stern, the USB kernel maintainer, is looking at this now as he got the hardware himself.

Flatpak
Some major improvements to the Flatpak stack has happened recently with the USB portal merged upstream. The USB portal came out of the Sovereign fund funding for GNOME and it gives us a more secure way to give sandboxed applications access to you USB devcices. In a somewhat related note we are still working on making system daemons installable through Flatpak, with the usecase being applications that has a system daemon to communicate with a specific piece of hardware for example (usually through USB). Christian Hergert got this on his todo list, but we are at the moment waiting for Lennart Poettering to merge some pre-requisite work into systemd that we want to base this on.

Accessibility
We are putting in a lot of effort towards accessibility these days. This includes working on portals and Wayland extensions to help facilitate accessibility, working on the ORCA screen reader and its dependencies to ensure it works great under Wayland. Working on GTK4 to ensure we got top notch accessibility support in the toolkit and more.

GNOME Software
Last year Milan Crha landed the support for signing the NVIDIA driver for use on secure boot. The main feature Milan he is looking at now is getting support for DNF5 into GNOME Software. Doing this will resolve one of the longest standing annoyances we had, which is that the dnf command line and GNOME Software would maintain two separate package caches. Once the DNF5 transition is done that should be a thing of the past and thus less risk of disk space being wasted on an extra set of cached packages.

Firefox
Martin Stransky and Jan Horak has been working hard at making Firefox ready for the future, with a lot of work going into making sure it supports the portals needed to function as a flatpak and by bringing HDR support to Firefox. In fact Martin just got his HDR patches for Firefox merged this week. So with the PipeWire camera support, Flatpak support and HDR support in place, Firefox will be ready for the future.

We are hiring! looking for 2 talented developers to join the Red Hat desktop team
We are hiring! So we got 2 job openings on the Red Hat desktop team! So if you are interested in joining us in pushing the boundaries of desktop linux forward please take a look and apply. For these 2 positions we are open to remote workers across the globe and while the job adds list specific seniorities we are somewhat flexible on that front too for the right candidate. So be sure to check out the two job listings and get your application in! If you ever wanted to work fulltime on GNOME and related technologies this is your chance.

GStreamer 1.24.12 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • d3d12: Fix shaders failing to compile with newer dxc versions
  • decklinkvideosink: Fix handling of caps framerate in auto mode; also a decklinkaudiosink fix
  • devicemonitor: Fix potential crash macOS when a device is unplugged
  • gst-libav: Fix crash in audio encoders like avenc_ac3 if input data has insufficient alignment
  • gst-libav: Fix build against FFmpeg 4.2 as in Ubuntu 20.04
  • gst-editing-services: Fix Python library name fetching on Windows
  • netclientclock: Don't store failed internal clocks in the cache, so applications can re-try later
  • oggdemux: Seeking and duration fixes
  • osxaudiosrc: Fixes for failing init/no output on recent iOS versions
  • qtdemux: Use mvhd transform matrix and support for flipping
  • rtpvp9pay: Fix profile parsing
  • splitmuxsrc: Fix use with decodebin3 which would occasionally fail with an assertion when seeking
  • tsdemux: Fix backwards PTS wraparound detection with ignore-pcr=true
  • video-overlay-composition: Declare the video/size/orientation tags for the meta and implement scale transformations
  • vtdec: Fix seeks occasionally hanging on macOS due to a race condition when draining
  • webrtc: Fix duplicate payload types with RTX and multiple video codecs
  • win32-pluginloader: Make sure not to create any windows when inspecting plugins
  • wpe: Various fixes for re-negotiation, latency reporting, progress messages on startup
  • x264enc: Add missing data to AvcDecoderConfigurationRecord in codec_data for high profile variants
  • cerbero: Support using ccache with cmake if enabled
  • Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.12 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

C idioms; it’s the processor, noob

From Planet – Felipe Contreras by Felipe Contreras

I explain why !p isn't just a nitpick in C: it's idiomatic.

here we go again

From wingolog by Andy Wingo

Andy Wingo

Good evening, fey readers. Tonight, a note on human rights and human wrongs.

I am in my mid-forties, and so I have seen some garbage governments in my time; one of the worst was Trump’s election in 2016. My heart ached in so many ways, but most of all for immigrants in the US. It has always been expedient for a politician to blame problems on those outside the polity, and in recent years it has been open season on immigration: there is always a pundit ready to make immigrants out to be the source of a society’s ills, always a publisher ready to distribute their views, always a news channel ready to invite the pundit to discuss these Honest Questions, and never will they actually talk to an immigrant. It gives me a visceral sense of revulsion, as much now as in 2016.

What to do? All of what was happening in my country was at a distance. And there is this funny thing that you don’t realize inside the US, in which the weight of the US’s cultural influence is such that the concerns of the US are pushed out into the consciousness of the rest of the world and made out to have a kind of singular, sublime importance, outweighing any concern that people in South Africa or Kosovo or Thailand might be having; and even to the point of outweighing what is happening in your own country of residence. I remember the horror of Europeans upon hearing of Trump’s detention camps—and it is right to be horrified!—but the same people do not know what is happening at and within their own borders, the network of prisons and flophouses and misery created by Europe’s own xenophobic project. The cultural weight of the US is such that it can blind the rest of the world into ignorance of the local, and thus to inaction, there at the place where the rest of us actually have power to act.

I could not help immigrants in the US, practically speaking. So I started to help immigrants in France. I joined the local chapter of the Ligue des droits de l’Homme, an organization with a human rights focus but whose main activity was a weekly legal and administrative advice clinic. I had gone through the immigration pipeline and could help others.

It has been interesting work. One thing that you learn quickly is that not everything that the government does is legal. Sometimes this observation takes the form of an administrative decision that didn’t respect the details of a law. Sometimes it’s something related to the hierarchy of norms, for example that a law’s intent was not reflected in the way it was translated to the operational norms used by, say, asylum processing agents. Sometimes it’s that there was no actual higher norm, but that the norms are made by the people who show up, and if it’s only the cops that show up, things get tilted copwards.

A human-rights approach is essentially liberal, and I don’t know if it is enough in these the end days of a liberal rule of law. It is a tool. But for what I see, it is enough for me right now: there is enough to do, I can make enough meaningful progress for people that the gaping hole in my soul opened by the Trumpocalypse has started to close. I found my balm in this kind of work, but there are as many paths as people, and surely yours will be different.

So, friends, here we are in 2025: new liver, same eagles. These are trying times. Care for yourself and for your loved ones. For some of you, that will be as much as you can do; we all have different situations. But for the rest of us, and especially those who are not really victims of marginalization, we can do things, and it will help people, and what’s more, it will make you feel better. Find some comrades, and reach your capacity gradually; you’re no use if you burn out. I don’t know what is on the other side of this, but in the meantime let’s not make it easy for the bastards.

GStreamer 1.25.1 unstable development release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce the first development release in the unstable 1.25 release series.

The unstable 1.25 release series is for testing and development purposes in the lead-up to the stable 1.26 series which is scheduled for release ASAP. Any newly-added API can still change until that point.

This development release is primarily for developers and early adopters.

The plan is to get 1.26 out of the door as quickly as possible.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

an annoying failure mode of copying nurseries

From wingolog by Andy Wingo

Andy Wingo

I just found a funny failure mode in the Whippet garbage collector and thought readers might be amused.

Say you have a semi-space nursery and a semi-space old generation. Both are block-structured. You are allocating live data, say, a long linked list. Allocation fills the nursery, which triggers a minor GC, which decides to keep everything in the nursery another round, because that’s policy: Whippet gives new objects another cycle in which to potentially become unreachable.

This causes a funny situation!

Consider that the first minor GC doesn’t actually free anything. But, like, nothing: it’s impossible to allocate anything in the nursery after collection, so you run another minor GC, which promotes everything, and you’re back to the initial situation, wash rinse repeat. Copying generational GC is strictly a pessimization in this case, with the additional insult that it doesn’t preserve object allocation order.

Consider also that because copying collectors with block-structured heaps are unreliable, any one of your minor GCs might require more blocks after GC than before. Unlike in the case of a major GC in which this essentially indicates out-of-memory, either because of a mutator bug or because the user didn’t give the program enough heap, for minor GC this is just what we expect when allocating a long linked list.

Therefore we either need to allow a minor GC to allocate fresh blocks – very annoying, and we have to give them back at some point to prevent the nursery from growing over time – or we need to maintain some kind of margin, corresponding to the maximum amount of fragmentation. Or, or, we allow evacuation to fail in a minor GC, in which case we fall back to promotion.

Anyway, I am annoyed and amused and I thought others might share in one or the other of these feelings. Good day and happy hacking!

ephemerons vs generations in whippet

From wingolog by Andy Wingo

Andy Wingo

Happy new year, hackfolk! Today, a note about ephemerons. I thought I was done with them, but it seems they are not done with me. The question at hand is, how do we efficiently and correctly implement ephemerons in a generational collector? Whippet‘s answer turns out to be simple but subtle.

on oracles

The deal is, I want to be able to evaluate different collector constructions and configurations, and for that I need a performance oracle: a known point in performance space-time against which to compare the unknowns. For example, I want to know how a sticky mark-bit approach to generational collection does relative to the conventional state of the art. To do that, I need to build a conventional system to compare against! If I manage to do a good job building the conventional evacuating nursery, it will have similar performance characteristics as other nurseries in other unlike systems, and thus I can use it as a point of comparison, even to systems I haven’t personally run myself.

So I am adapting the parallel copying collector I described last July to have generational support: a copying (evacuating) young space and a copying old space. Ideally then I’ll be able to build a collector with a copying young space (nursery) but a mostly-marking nofl old space.

notes on a copying nursery

A copying nursery has different operational characteristics than a sticky-mark-bit nursery, in a few ways. One is that a sticky mark-bit nursery will promote all survivors at each minor collection, leaving the nursery empty when mutators restart. This has the pathology that objects allocated just before a minor GC aren’t given a chance to “die young”: a sticky-mark-bit GC over-promotes.

Contrast that to a copying nursery, which can decide to promote a survivor or leave it in the young generation. In Whippet the current strategy for the parallel-copying nursery I am working on is to keep freshly allocated objects around for another collection, and only promote them if they are live at the next collection. We can do this with a cheap per-block flag, set if the block has any survivors, which is the case if it was allocated into as part of evacuation during minor GC. This gives objects enough time to die young while not imposing much cost in the way of recording per-object ages.

Recall that during a GC, all inbound edges from outside the graph being traced must be part of the root set. For a minor collection where we just trace the nursery, that root set must include all old-to-new edges, which are maintained in a data structure called the remembered set. Whereas for a sticky-mark-bit collector the remembered set will be empty after each minor GC, for a copying collector this may not be the case. An existing old-to-new remembered edge may be unnecessary, because the target object was promoted; we will clear these old-to-old links at some point. (In practice this is done either in bulk during a major GC, or the next time the remembered set is visited during the root-tracing phase of a minor GC.) Or we could have a new-to-new edge which was not in the remembered set before, but now because the source of the edge was promoted, we must adjoin this old-to-new edge to the remembered set.

To preserve the invariant that all edges into the nursery are part of the roots, we have to pay special attention to this latter kind of edge: we could (should?) remove old-to-promoted edges from the remembered set, but we must add promoted-to-survivor edges. The field tracer has to have specific logic that applies to promoted objects during a minor GC to make the necessary remembered set mutations.

other object kinds

In Whippet, “small” objects (less than 8 kilobytes or so) are allocated into block-structed spaces, and large objects have their own space which is managed differently. Notably, large objects are never moved. There is generational support, but it is currently like the sticky-mark-bit approach: any survivor is promoted. Probably we should change this at some point, at least for collectors that don’t eagerly promote all objects during minor collections.

finalizers?

Finalizers keep their target objects alive until the finalizer is run, which effectively makes each finalizer part of the root set. Ideally we would have a separate finalizer table for young and old objects, but currently Whippet just has one table, which we always fully traverse at the end of a collection. This effectively adds the finalizer table to the remembered set. This is too much work—there is no need to visit finalizers for old objects in a minor GC—but it’s not incorrect.

ephemerons

So what about ephemerons? Recall that an ephemeron is an object E×KV in which there is an edge from E to V if and only if both E and K are live. Implementing this conjunction is surprisingly gnarly; you really want to discover live ephemerons while tracing rather than maintaining a global registry as we do with finalizers. Whippet’s algorithm is derived from what SpiderMonkey does, but extended to be parallel.

The question is, how do we implement ephemeron-reachability while also preserving the invariant that all old-to-new edges are part of the remembered set?

For Whippet, the answer turns out to be simple: an ephemeron E is never older than its K or V, by construction, and we never promote E without also promoting (if necessary) K and V. (Ensuring this second property is somewhat delicate.) In this way you never have an old E and a young K or V, so no edge from an ephemeron need ever go into the remembered set. We still need to run the ephemeron tracing algorithm for any ephemerons discovered as part of a minor collection, but we don’t need to fiddle with the remembered set. Phew!

conclusion

As long all promoted objects are older than all survivors, and that all ephemerons are younger than the objects referred to by their key and value edges, Whippet’s parallel ephemeron tracing algorithm will efficiently and correctly trace ephemeron edges in a generational collector. This applies trivially for a sticky-mark-bit collector, which always promotes and has no survivors, but it also holds for a copying nursery that allows for survivors after a minor GC, as long as all survivors are younger than all promoted objects.

Until next time, happy hacking in 2025!

GStreamer 1.24.11 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • playback: Fix SSA/ASS subtitles with embedded fonts
  • decklink: add missing video modes and fix 8K video modes
  • matroskamux: spec compliance fixes for audio-only files
  • onnx: disable onnxruntime telemetry
  • qtdemux: Fix base offset update when doing segment seeks
  • srtpdec: Fix a use-after-free issue
  • (uri)decodebin3: Fix stream change scenarios, possible deadlock on shutdown
  • video: fix missing alpha flag in AV12 format description
  • avcodecmap: Add some more channel position mappings
  • cerbero bootstrap fixes for Windows 11
  • Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.11 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

A Brimful of ASHA

From Arun Raghavan by Arun Raghavan

It’s 2025(!), and I thought I’d kick off the year with a post about some work that we’ve been doing behind the scenes for a while. Grab a cup of $beverage_of_choice, and let’s jump in with some context.

History: Hearing aids and Bluetooth

Various estimates put the number of people with some form of hearing loss at 5% of the population. Hearing aids and cochlear implants are commonly used to help deal with this (I’ll use “hearing aid” or “HA” in this post, but the same ideas apply to both). Historically, these have been standalone devices, with some primitive ways to receive audio remotely (hearing loops and telecoils).

As you might expect, the last couple of decades have seen advances that allow consumer devices (such as phones, tablets, laptops, and TVs) to directly connect to hearing aids over Bluetooth. This can provide significant quality of life improvements – playing audio from a device’s speakers means the sound is first distorted by the speakers, and then by the air between the speaker and the hearing aid. Avoiding those two steps can make a big difference in the quality of sound that reaches the user.

An illustration of the audio path through air vs. wireless audio (having higher fidelity)
Comparison of audio paths

Unfortunately, the previous Bluetooth audio standards (BR/EDR and A2DP – used by most Bluetooth audio devices you’ve come across) were not well-suited for these use-cases, especially from a power-consumption perspective. This meant that HA users would either have to rely on devices using proprietary protocols (usually limited to Apple devices), or have a cumbersome additional dongle with its own battery and charging needs.

Recent Past: Bluetooth LE

The more recent Bluetooth LE specification addresses some of the issues with the previous spec (now known as Bluetooth Classic). It provides a low-power base for devices to communicate with each other, and has been widely adopted in consumer devices.

On top of this, we have the LE Audio standard, which provides audio streaming services over Bluetooth LE for consumer audio devices and HAs. The hearing aid industry has been an active participant in its development, and we should see widespread support over time, I expect.

The base Bluetooth LE specification has been around from 2010, but the LE Audio specification has only been public since 2021/2022. We’re still seeing devices with LE Audio support trickle into the market.

In 2018, Google partnered with a hearing aid manufacturer to announce the ASHA (Audio Streaming for Hearing Aids) protocol, presumably as a stop-gap. The protocol uses Bluetooth LE (but not LE Audio) to support low-power audio streaming to hearing aids, and is publicly available. Several devices have shipped with ASHA support in the last ~6 years.

A brief history of Bluetooth LE and audio

Hot Take: Obsolescence is bad UX

As end-users, we understand the push/pull of technological advancement and obsolescence. As responsible citizens of the world, we also understand the environmental impact of this.

The problem is much worse when we are talking about medical devices. Hearing aids are expensive, and are expected to last a long time. It’s not uncommon for people to use the same device for 5-10 years, or even longer.

In addition to the financial cost, there is also a significant emotional cost to changing devices. There is usually a period of adjustment during which one might be working with an audiologist to tune the device to one’s hearing. Neuroplasticity allows the brain to adapt to the device and extract more meaning over time. Changing devices effectively resets the process.

All this is to say that supporting older devices is a worthy goal in itself, but has an additional set of dimensions in the context of accessibility.

HAs and Linux-based devices

Because of all this history, hearing aid manufacturers have traditionally focused on mobile devices (i.e. Android and iOS). This is changing, with Apple supporting its proprietary MFi (made for iPhone/iPad/iPod) protocol on macOS, and Windows adding support for LE Audio on Windows 11.

This does leave the question of Linux-based devices, which is our primary concern – can users of free software platforms also have an accessible user experience?

A lot of work has gone into adding Bluetooth LE support in the Linux kernel and BlueZ, and more still to add LE Audio support. PipeWire’s Bluetooth module now includes support for LE Audio, and there is continuing effort to flesh this out. Linux users with LE Audio-based hearing aids will be able to take advantage of all this.

However, the ASHA specification was only ever supported on Android devices. This is a bit of a shame, as there are likely a significant number of hearing aids out there with ASHA support, which will hopefully still be around for the next 5+ years. This felt like a gap that we could help fill.

Step 1: A Proof-of-Concept

We started out by looking at the ASHA specification, and the state of Bluetooth LE in the Linux kernel. We spotted some things that the Android stack exposes that BlueZ does not, but it seemed like all the pieces should be there.

Friend-of-Asymptotic, Ravi Chandra Padmala spent some time with us to implement a proof-of-concept. This was a pretty intense journey in itself, as we had to identify some good reference hardware (we found an ASHA implementation on the onsemi RSL10), and clean out the pipes between the kernel and userspace (LE connection-oriented channels, which ASHA relies on, weren’t commonly used at that time).

We did eventually get the proof-of-concept done, and this gave us confidence to move to the next step of integrating this into BlueZ – albeit after a hiatus of paid work. We have to keep the lights on, after all!

Step 2: ASHA in BlueZ

The BlueZ audio plugin implements various audio profiles within the BlueZ daemon – this includes A2DP for Bluetooth Classic, as well as BAP for LE Audio.

We decided to add ASHA support within this plugin. This would allow BlueZ to perform privileged operations and then hand off a file descriptor for the connection-oriented channel, so that any userspace application (such as PipeWire) could actually stream audio to the hearing aid.

I implemented an initial version of the ASHA profile in the BlueZ audio plugin last year, and thanks to Luiz Augusto von Dentz’ guidance and reviews, the plugin has landed upstream.

This has been tested with a single hearing aid, and stereo support is pending. In the process, we also found a small community of folks with deep interest in this subject, and you can join us on #asha on the BlueZ Slack.

Step 3: PipeWire support

To get end-to-end audio streaming working with any application, we need to expose the BlueZ ASHA profile as a playback device on the audio server (i.e., PipeWire). This would make the HAs appear as just another audio output, and we could route any or all system audio to it.

My colleague, Sanchayan Maity, has been working on this for the last few weeks. The code is all more or less in place now, and you can track our progress on the PipeWire MR.

Step 4 and beyond: Testing, stereo support, …

Once we have the basic PipeWire support in place, we will implement stereo support (the spec does not support more than 2 channels), and then we’ll have a bunch of testing and feedback to work with. The goal is to make this a solid and reliable solution for folks on Linux-based devices with hearing aids.

Once that is done, there are a number of UI-related tasks that would be nice to have in order to provide a good user experience. This includes things like combining the left and right HAs to present them as a single device, and access to any tuning parameters.

Getting it done

This project has been on my mind since the ASHA specification was announced, and it has been a long road to get here. We are in the enviable position of being paid to work on challenging problems, and we often contribute our work upstream. However, there are many such projects that would be valuable to society, but don’t necessarily have a clear source of funding.

In this case, we found ourselves in an interesting position – we have the expertise and context around the Linux audio stack to get this done. Our business model allows us the luxury of taking bites out of problems like this, and we’re happy to be able to do so.

However, it helps immensely when we do have funding to take on this work end-to-end – we can focus on the task entirely and get it done faster.

Onward…

I am delighted to announce that we were able to find the financial support to complete the PipeWire work! Once we land basic mono audio support in the MR above, we’ll move on to implementing stereo support in the BlueZ plugin and the PipeWire module. We’ll also be testing with some real-world devices, and we’ll be leaning on our community for more feedback.

This is an exciting development, and I’ll be writing more about it in a follow-up post in a few days. Stay tuned!

Igalia Multimedia achievements (2024)

From Herostratus’ legacy by Víctor Jáquez

As 2024 draws to a close, it’s a perfect time to reflect on the year’s accomplishments done by the Multimedia team in Igalia. In our consideration, there were three major achievements:

  1. WebRTC’s support in WPE/WebKitGTK with GStreamer.
  2. GES maturity improved for real-life use-cases.
  3. Vulkan Video development and support in GStreamer.

WebRTC support in WPE/WebKitGTK with GStreamer #

WPE and WebKitGTK are WebKit ports maintained by Igalia, the former for embedded devices and the latter for applications with a full-featured Web integration.

WebRTC is a web API that allows real-time communication (RTC) directly between web browser and applications. Examples of these real-time communications are video conferencing, cloud gaming, live-streaming, etc.

Some WebKit ports support libwebrtc, an open-source library that implements the WebRTC specification, developed and maintained by Google. WPE and WebKitGTK originally also supports libwebrtc, but we started to use also GstWebRTC, a set of GStreamer plugins and libraries that implement WebRTC, which adapts perfectly to the multimedia implementation in both ports, also in GStreamer.

This year the fruits of this work have been unlocked by enabling Amazon Luna gaming:

https://www.youtube.com/watch?v=lyO7Hqj1jMs

And also enabling a CAD modeling, server-side rendered service, known as Zoo:

https://www.youtube.com/watch?v=CiuYjSCDsUM

WebKit Multimedia #

WebKit made significant improvements in multimedia handling, addressing various issues to enhance stability and playback quality. Key updates include preventing premature play() calls during seeking, fixing memory leaks. The management of track identifiers was also streamlined by transitioning from string-based to integer-based IDs. Additionally, GStreamer-related race conditions were resolved to prevent hangs during playback state transitions. Memory leaks in WebAudio and event listener management were addressed, along with a focus on memory usage optimizations.

The handling of media buffering and seeking was enhanced with buffering hysteresis for smoother playback. Media Source Extensions (MSE) behavior was refined to improve playback accuracy, such as supporting markEndOfStream() before appendBuffer() and simplifying playback checks. Platform-specific issues were also tackled, including AV1 and Opus support for encrypted media and better detection of audio sinks. And other improvements on multimedia performance and efficiency.

GES maturity improved for real-live use-cases #

GStreamer Editing Services (GES) is a set of GStreamer plugins and a library that allow non-linear video editing. For example, GES is what’s behind of Pitivi, the open source video editor application.

Last year, GES was deployed in web-based video editors, where the actual video processing is done server-side. These projects allowed, in great deal, the enhancement and maturity of the library and plugins.

Tella is a browser-based tool that allow to screen record and webcam, without any extra software. Finished the recording, the user can edit, in the browser, the video and publish it.

https://www.youtube.com/watch?v=uSWqWHBRDWE

Sequence is a complete, browser-based, video editor with collaborative features. GES is used in the backend to render the editing operations.

https://www.youtube.com/watch?v=bXNdDIiG9lE

Vulkan Video development and GStreamer support #

The last but not the least, this year we continue our work with the Vulkan Video ecosystem by working the task subgroup (TSG) enabling H.264/H.265 encoding, and AV1 decoding and encoding.

Early this year we delivered a talk in the Vulkanised about our work, which ranges from the Conformance Test Suite (CTS), Mesa, and GStreamer.

https://www.youtube.com/watch?v=z1HcWrmdwzI

Conclusion #

As we wrap up 2024, it’s clear that the year has been one of significant progress, driven by innovation and collaboration. Here’s to continuing the momentum and making 2025 even better!

GStreamer + PipeWire: A Todo List

From Arun Raghavan by Arun Raghavan

I wrote about our time at the GStreamer Conference in October, and one important thing I was able to do is spend some time with all-around great guy George reflecting on where the GStreamer plugins for PipeWire are, and what we need to do to get them to a rock-solid state.

This is a summary of our conversation, in the form of a to-do list of sorts…

Status Quo

Currently, we have two elements: pipewiresrc and pipewiresink. The two plugins work with both audio and video, and instantiate a PipeWire capture and playback stream, respectively. The stream, as with any PipeWire client, appears as a node in the PipeWire.

Buffers are managed in the GStreamer pipeline using bufferpools, and recently Wim re-enabled exposing the stream clock as a GStreamer clock.

There have been a number of issues that have cropped up over time, and we’ve been plugging away at addressing them, but it was worth stepping back and looking at the whole for a bit.

Use Cases

The straightforward uses of these elements might be to represent client streams: pipewiresrc might connect to an audio capture device (like a microphone), or video capture device (like a webcam), and provide the data for downstream elements to consume. Similarly pipewiresink might be used to play audio to the system output (speakers or headphones, perhaps).

Because of the flexibility of the PipeWire API, these elements may also be used to provide a virtual capture or playback device though. So pipewiresrc might provide a virtual audio sink, which applications could connect to to stream audio over the network (like say a WebRTC stream).

Conversely, it is possible to use pipewiresink to provide a virtual capture device – for example, the pipeline might generate a video stream and expose it a virtual camera for other applications to use.

We might even combine the two cases, one might connect to a webcam as a client, apply some custom video processing, and then expose that stream back as a virtual camera source as easily as:

pipewiresrc target-object="MyCamera" ! <some video filters> ! \
  pipewiresink provide=true stream-properties="props,media.class=Video/Source,media.role=Camera"

So we have a minor combinatorial explosion across 3 axes, and all combinations are valid:

  • pipewiresrc vs. pipewiresink
  • audio vs. video
  • stream vs. virtual device

For each of these combinations, we might have different behaviour across the various issues below.

Split ’em up?

Before we look at specific issues, it is worth pointing out that the PipeWire elements are unusual in that they support both audio and video with the same code. This seems like a tantalisingly elegant idea, and it’s quite neat that we are able to get this far with this unified approach.

However, as we examine the specific issues we are seeing, it does seem to emerge that the audio and video paths diverge in several ways. It may be time to consider whether the divergence merits just splitting them up into separate audio and video elements.

Linking

The first issue that comes to mind is how we might want PipeWire or WirePlumber to manage linking the nodes from the GStreamer pipeline with other nodes (devices or streams).

For the playback/capture stream use-cases, we would want the nodes to automatically be connected to a sink/source node when the GStreamer pipeline goes to the PAUSED or PLAYING state, and for that link to be torn down when leaving those states. It might be possible for the link to “move” if, for example, the default playback or capture device changes, though a “move” is really the removal of the current link with a new link following.

For the virtual device use-cases, the pipeline state should likely follow the link state. That is, when a node is connected to our virtual device, we want the GStreamer pipeline to start producing/consuming data, and when disconnected, it should go back to “sleep”, possibly running again later.

The latter is something that a GStreamer application using these plugins might have to manage manually, but simplifying this and supporting this via gst-launch-1.0 for easy command-line use would be nice to have.

There are already the beginnings of support for such usage via the provide property on pipewiresink, but more work is needed for this to make this truly usable.

Bufferpools

Closely related to linking are buffers and bufferpools, as the process of linking nodes is what makes buffers for data exchange available to PipeWire nodes.

While bufferpools are a valuable concept for memory efficiency and avoiding unnecessary memcpy()s, they come with some complexity overhead in managing the pipeline. For one, as the number of buffers in a bufferpool is limited, it is possible to exhaust the set of buffers (with a large queue for example).

There are also some lifecycle complexities that arise from links coming and going, as the corresponding buffers also then go away from under us, something that GStreamer bufferpools are not designed for.

A solution to the first problem might be to avoid using bufferpools for some cases (for example, they might not be very valuable for audio). The solution to the lifecycle problem is a trickier one, and no clear answer is apparent yet, at least with the APIs as they stand.

We might also need to support resizing bufferpools for some cases, and that is not something that is easy to support with how buffer management currently happens in PipeWire (the stream API does not really give us much of a handle on this).

Formats

In order to support the various use-cases, we want to be able to support both a fixed format (if we know what we are providing), or a negotiated format (if we can adapt in the GStreamer pipeline based on what PipeWire has/wants).

There is also a large surface area of formats that PipeWire supports that we need to make sure we support well:

  • There are known issues with some planar video formats being presented correctly from pipewiresrc
  • We do not expose planar audio formats, although both GStreamer and PipeWire support them
  • Support for DSD and passthrough audio (e.g. Dolby/DTS over HDMI) needs to be wired up
  • Support for compressed formats (we added PipeWire support for decode + render on a DSP)

Rate matching

While Wim recently added some rate matching code to pipewiresink, there is work to be done to make sure that if there is skew between the GStreamer pipeline’s data rate and the audio device rate, we can use PipeWire’s rate adaptation features to compensate for such skew. This should work in both pipewiresink and pipewiresrc.

For some background on this topic, check out my talk on clock rate matching from a couple of months ago.

Device provider conflicts

While we are improving the out-of-the-box experience of these elements, unfortunately the PipeWire device provider currently supersedes all others (the GStreamer Device Provider API allows for discovering devices on the system and the elements used to access them).

The higher rank might make sense for video (as system integrators likely want to start preferring PipeWire elements over V4L2), but it can lead to a bad experience for audio (the PulseAudio elements work better today via PipeWire’s PulseAudio emulation layer).

We might temporarily drop the rank of PipeWire elements for audio to avoid autoplugging them while we fix the problems we have.

Probing formats

We create a “probe” stream in pipewiresink while getting ready to play audio, in order to discover what formats the device we would play to supports. This is required to detect supported formats and make decisions about whether to decode in GStreamer, what sample rate and format are preferred, etc.

Unfortunately, that also causes a “false” playback device startup sequence, which might manifest as a click or glitch on some hardware. Having a way to set up a probe that does not actually open the device would be a nice improvement.

Player state

There are a couple of areas where policy actions do not always surface well to the application/UI layer. One instance of this is where a stream is “corked” (maybe because only one media player should be active at a time) – we want to let the player know it has been paused, so it can update its state and let the UI know too. There is limited infrastructure for this already, via GST_MESSAGE_REQUEST_STATE.

Also, more of a session management (i.e. WirePlumber / system integration) thing, we do not really have the concept of a system-wide media player state. This would be useful if we want to exercise policy like “don’t let any media player play while we’re on a call”, and have that state be consistent across UI interactions (i.e. hitting play during a call does not trigger playback, maybe even lets the user know why it’s not working / provides an override).

GStreamer 1.24.10 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes.

It should be safe to update from 1.24.x, and we would recommend you update at your earliest convenience.

Highlighted bugfixes:

  • More than 40 security fixes across a wide range of elements following an audit by the GitHub Security Lab, including the MP4, Matroska, Ogg and WAV demuxers, subtitle parsers, image decoders, audio decoders and the id3v2 tag parser
  • avviddec: Fix regression that could trigger assertions about width/height mismatches
  • appsink and appsrc fixes
  • closed caption handling fixes
  • decodebin3 and urisourcebin fixes
  • glupload: dmabuf: Fix emulated tiled import
  • level: fix LevelMeta values outside of the stated range
  • mpegtsmux, flvmux: fix potential busy looping with high cpu usage in live mode
  • pipeline dot file graph generation improvements
  • qt(6): fix criticals with multiple qml(6)gl{src,sink}
  • rtspsrc: Optionally timestamp RTP packets with their receive times in TCP/HTTP mode to enable clock drift handling
  • splitmuxsrc: reduce number of file descriptors used
  • systemclock: locking order fixes
  • v4l2: fix possible v4l2videodec deadlock on shutdown; 8-bit bayer format fixes
  • x265: Fix build with libx265 version >= 4.1 after masteringDisplayColorVolume API change
  • macOS: fix rendering artifacts in retina displays, plus ptp clock fixes
  • cargo: Default to thin lto for the release profile (for faster builds with lower memory requirements)
  • Various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements
  • Translation updates

See the GStreamer 1.24.10 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Pied 0.3 Released

From Michael Sheldon's Stuff by Michael Sheldon

Michael Sheldon

I’ve just released version 0.3 of Pied, my tool for making advanced text-to-speech voices work easily on Linux.

A screenshot of Pied, showing a list of voices that can be downloaded and used for speech synthesis.

This release adds support for models with multiple sub-voices, providing access to hundreds of new voices. It also adds different qualities of model, fixes previews whilst downloading and fixes audio output detection when using sox.

It’s available in the Snap Store or as a FlatPak

GStreamer Conference 2024

From Arun Raghavan by Arun Raghavan

All of us at Asymptotic are back home from the exciting week at GStreamer Conference 2024 in Montréal, Canada last month. It was great to hang out with the community and see all the great work going on in the GStreamer ecosystem.

Montréal sunsets are 😍

There were some visa-related adventures leading up to the conference, but thanks to the organising team (shoutout to Mark Filion and Tim-Philipp Müller), everything was sorted out in time and Sanchayan and Taruntej were able to make it.

This conference was also special because this year marks the 25th anniversary of the GStreamer project!

Happy birthday to us! 🎉

Talks

We had 4 talks at the conference this year.

GStreamer & QUIC (video)

Sancyahan speaking about GStreamer and QUIC

Sanchayan spoke about his work with the various QUIC elements in GStreamer. We already have the quinnquicsrc and quinquicsink upstream, with a couple of plugins to allow (de)multiplexing of raw streams as well as an implementation or RTP-over-QUIC (RoQ). We’ve also started work on Media-over-QUIC (MoQ) elements.

This has been a fun challenge for us, as we’re looking to build out a general-purpose toolkit for building QUIC application-layer protocols in GStreamer. Watch this space for more updates as we build out more functionality, especially around MoQ.

Clock Rate Matching in GStreamer & PipeWire (video)

Arun speaking about PipeWire delay-locked loops
Photo credit: Francisco

My talk was about an interesting corner of GStreamer, namely clock rate matching. This is a part of live pipelines that is often taken for granted, so I wanted to give folks a peek under the hood.

The idea of doing this talk was was born out of some recent work we did to allow splitting up the graph clock in PipeWire from the PTP clock when sending AES67 streams on the network. I found the contrast between the PipeWire and GStreamer approaches thought-provoking, and wanted to share that with the community.

GStreamer for Real-Time Audio on Windows (video)

Next, Taruntej dove into how we optimised our usage of GStreamer in a real-time audio application on Windows. We had some pretty tight performance requirements for this project, and Taruntej spent a lot of time profiling and tuning the pipeline to meet them. He shared some of the lessons learned and the tools he used to get there.

Simplifying HLS playlist generation in GStreamer (video)

Sanchayan also walked us through the work he’s been doing to simplify HLS (HTTP Live Streaming) multivariant playlist generation. This should be a nice feature to round out GStreamer’s already strong support for generating HLS streams. We are also exploring the possibility of reusing the same code for generating DASH (Dynamic Adaptive Streaming over HTTP) manifests.

Hackfest

As usual, the conference was followed by a two-day hackfest. We worked on a few interesting problems:

  • Sanchayan addressed some feedback on the QUIC muxer elements, and then investigated extending the HLS elements for SCTE-35 marker insertion and DASH support

  • Taruntej worked on improvements to the threadshare elements, specifically to bring some ts-udpsrc element features in line with udpsrc

  • I spent some time reviewing a long-pending merge request to add soft-seeking support to the AWS S3 sink (so that it might be possible to upload seekable MP4s, for example, directly to S3). I also had a very productive conversation with George Kiagiadakis about how we should improve the PipeWire GStreamer elements (more on this soon!)

All in all, it was a great time, and I’m looking forward to the spring hackfest and conference in the the latter part next year!

GStreamer 1.24.9 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and a security fix and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • gst-rtsp-server security fix
  • GstAggregator start time selection and latency query fixes for force-live mode
  • audioconvert: fix dynamic handling of mix matrix, and accept custom upstream event for setting one
  • encodebin: fix parser selection for encoders that support multiple codecs
  • flvmux improvments for pipelines where timestamps don't start at 0
  • glcontext: egl: Unrestrict the support base DRM formats
  • kms: Add IMX-DCSS auto-detection in sink and fix stride with planar formats in allocator
  • macOS main application event loop fixes
  • mpegtsdemux: Handle PTS/DTS wraparound with ignore-pcr=true
  • playbin3, decodebin3, parsebin, urisourcebin: fix races, and improve stability and stream-collection handling
  • rtpmanager: fix early RTCP SR generation for sparse streams like metadata
  • qml6glsrc: Reduce capture delay
  • qtdemux: fix parsing of rotation matrix with 180 degree rotation
  • rtpav1depay: added wait-for-keyframe and request-keyframe properties
  • srt: make work with newer libsrt versions and don't re-connect on authentication failure
  • v4l2 fixes and improvement
  • webrtcsink, webrtcbin and whepsrc fixes
  • cerbero: fix Python 3.13 compatibility, g-i with newer setuptools, bootstrap on Arch Linux; iOS build fixes
  • Ship qroverlay plugin in binary packages
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.9 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

wireless_status kernel sysfs API

From /bɑs ˈtjɛ̃ no ˈse ʁɑ/ (hadess) | News by Bastien Nocera

(I worked on this feature last year, before being moved off desktop related projects, but I never saw it documented anywhere other than in the original commit messages, so here's the opportunity to shine a little light on a feature that could probably see more use)

    The new usb_set_wireless_status() driver API function can be used by drivers of USB devices to export whether the wireless device associated with that USB dongle is turned on or not.

    To quote the commit message:

This will be used by user-space OS components to determine whether the
battery-powered part of the device is wirelessly connected or not,
allowing, for example:
- upower to hide the battery for devices where the device is turned off
  but the receiver plugged in, rather than showing 0%, or other values
  that could be confusing to users
- Pipewire to hide a headset from the list of possible inputs or outputs
  or route audio appropriately if the headset is suddenly turned off, or
  turned on
- libinput to determine whether a keyboard or mouse is present when its
  receiver is plugged in.
This is not an attribute that is meant to replace protocol specific
APIs [...] but solely for wireless devices with
an ad-hoc “lose it and your device is e-waste” receiver dongle.
 

    Currently, the only 2 drivers to use this are the ones for the Logitech G935 headset, and the Steelseries Arctis 1 headset. Adding support for other Logitech headsets would be possible if they export battery information (the protocols are usually well documented), support for more Steelseries headsets should be feasible if the protocol has already been reverse-engineered.

    As far as consumers for this sysfs attribute, I filed a bug against Pipewire (link) to use it to not consider the receiver dongle as good as unplugged if the headset is turned off, which would avoid audio being sent to headsets that won't hear it.

    UPower supports this feature since version 1.90.1 (although it had a bug that makes 1.90.2 the first viable release to include it), and batteries will appear and disappear when the device is turned on/off.

A turned-on headset

GStreamer Conference 2024

From Herostratus’ legacy by Víctor Jáquez

Early this month I spent a couple weeks in Montreal, visiting the city, but mostly to attend the GStreamer Conference and the following hackfest, which happened to be co-located with the XDC Conference. It was my first time in Canada and I utterly enjoyed it. Thanks to all those from the GStreamer community that organized and attended the event.

For now you can replay the whole streams of both days of conference, but soon the split videos for each talk will be published:

GStreamer Conference 2024 - Day 1, Room 1 - October 7, 2024

https://www.youtube.com/watch?v=KLUL1D53VQI


GStreamer Conference 2024 - Day 1, Room 2 - October 7, 2024

https://www.youtube.com/watch?v=DH64D_6gc80


GStreamer Conference 2024 - Day 2, Room 2 - October 8, 2024

https://www.youtube.com/watch?v=jt6KyV757Dk


GStreamer Conference 2024 - Day 2, Room 1 - October 8, 2024

https://www.youtube.com/watch?v=W4Pjtg0DfIo


And a couple pictures of igalians :)

GStreamer Skia plugin
GStreamer Skia plugin

GStreamer VA
GStreamer VA

GstWebRTC
GstWebRTC

GstVulkan
GstVulkan

GStreamer Editing Services
gstreamer editing services

Adding buffering hysteresis to the WebKit GStreamer video player

From GStreamer – Happy coding by Enrique Ocaña González

The <video> element implementation in WebKit does its job by using a multiplatform player that relies on a platform-specific implementation. In the specific case of glib platforms, which base their multimedia on GStreamer, that’s MediaPlayerPrivateGStreamer.

WebKit GStreamer regular playback class diagram

The player private can have 3 buffering modes:

  • On-disk buffering: This is the typical mode on desktop systems, but is frequently disabled on purpose on embedded devices to avoid wearing out their flash storage memories. All the video content is downloaded to disk, and the buffering percentage refers to the total size of the video. A GstDownloader element is present in the pipeline in this case. Buffering level monitoring is done by polling the pipeline every second, using the fillTimerFired() method.
  • In-memory buffering: This is the typical mode on embedded systems and on desktop systems in case of streamed (live) content. The video is downloaded progressively and only the part of it ahead of the current playback time is buffered. A GstQueue2 element is present in the pipeline in this case. Buffering level monitoring is done by listening to GST_MESSAGE_BUFFERING bus messages and using the buffering level stored on them. This is the case that motivates the refactoring described in this blog post, what we actually wanted to correct in Broadcom platforms, and what motivated the addition of hysteresis working on all the platforms.
  • Local files: Files, MediaStream sources and other special origins of video don’t do buffering at all (no GstDownloadBuffering nor GstQueue2 element is present on the pipeline). They work like the on-disk buffering mode in the sense that fillTimerFired() is used, but the reported level is relative, much like in the streaming case. In the initial version of the refactoring I was unaware of this third case, and only realized about it when tests triggered the assert that I added to ensure that the on-disk buffering method was working in GST_BUFFERING_DOWNLOAD mode.

The current implementation (actually, its wpe-2.38 version) was showing some buffering problems on some Broadcom platforms when doing in-memory buffering. The buffering levels monitored by MediaPlayerPrivateGStreamer weren’t accurate because the Nexus multimedia subsystem used on Broadcom platforms was doing its own internal buffering. Data wasn’t being accumulated in the GstQueue2 element of playbin, because BrcmAudFilter/BrcmVidFilter was accepting all the buffers that the queue could provide. Because of that, the player private buffering logic was erratic, leading to many transitions between “buffer completely empty” and “buffer completely full”. This, it turn, caused many transitions between the HaveEnoughData, HaveFutureData and HaveCurrentData readyStates in the player, leading to frequent pauses and unpauses on Broadcom platforms.

So, one of the first thing I tried to solve this issue was to ask the Nexus PlayPump (the subsystem in charge of internal buffering in Nexus) about its internal levels, and add that to the levels reported by GstQueue2. There’s also a GstMultiqueue in the pipeline that can hold a significant amount of buffers, so I also asked it for its level. Still, the buffering level unstability was too high, so I added a moving average implementation to try to smooth it.

All these tweaks only make sense on Broadcom platforms, so they were guarded by ifdefs in a first version of the patch. Later, I migrated those dirty ifdefs to the new quirks abstraction added by Phil. A challenge of this migration was that I needed to store some attributes that were considered part of MediaPlayerPrivateGStreamer before. They still had to be somehow linked to the player private but only accessible by the platform specific code of the quirks. A special HashMap attribute stores those quirks attributes in an opaque way, so that only the specific quirk they belong to knows how to interpret them (using downcasting). I tried to use move semantics when storing the data, but was bitten by object slicing when trying to move instances of the superclass. In the end, moving the responsibility of creating the unique_ptr that stored the concrete subclass to the caller did the trick.

Even with all those changes, undesirable swings in the buffering level kept happening, and when doing a careful analysis of the causes I noticed that the monitoring of the buffering level was being done from different places (in different moments) and sometimes the level was regarded as “enough” and the moment right after, as “insufficient”. This was because the buffering level threshold was one single value. That’s something that a hysteresis mechanism (with low and high watermarks) can solve. So, a logical level change to “full” would only happen when the level goes above the high watermark, and a logical level change to “low” when it goes under the low watermark level.

For the threshold change detection to work, we need to know the previous buffering level. There’s a problem, though: the current code checked the levels from several scattered places, so only one of those places (the first one that detected the threshold crossing at a given moment) would properly react. The other places would miss the detection and operate improperly, because the “previous buffering level value” had been overwritten with the new one when the evaluation had been done before. To solve this, I centralized the detection in a single place “per cycle” (in updateBufferingStatus()), and then used the detection conclusions from updateStates().

So, with all this in mind, I refactored the buffering logic as https://commits.webkit.org/284072@main, so now WebKit GStreamer has a buffering code much more robust than before. The unstabilities observed in Broadcom devices were gone and I could, at last, close Issue 1309.

A body of work of an atypical programmer

From Planet – Felipe Contreras by Felipe Contreras

Examples of how my atypical way of thinking has often been objectively proven right.

preliminary notes on a nofl field-logging barrier

From wingolog by Andy Wingo

Andy Wingo

When you have a generational collector, you aim to trace only the part of the object graph that has been allocated recently. To do so, you need to keep a remembered set: a set of old-to-new edges, used as roots when performing a minor collection. A language run-time maintains this set by adding write barriers: little bits of collector code that run when a mutator writes to a field.

Whippet’s nofl space is a block-structured space that is appropriate for use as an old generation or as part of a sticky-mark-bit generational collector. It used to have a card-marking write barrier; see my article diving into V8’s new write barrier, for more background.

Unfortunately, when running whiffle benchmarks, I was seeing no improvement for generational configurations relative to whole-heap collection. Generational collection was doing fine in my tiny microbenchmarks that are part of Whippet itself, but when translated to larger programs (that aren’t yet proper macrobenchmarks), it was a lose.

I had planned on doing some serious tracing and instrumentation to figure out what was happening, and thereby correct the problem. I still plan on doing this, but instead for this issue I used the old noggin technique instead: just, you know, thinking about the thing, eventually concluding that unconditional card-marking barriers are inappropriate for sticky-mark-bit collectors. As I mentioned in the earlier article:

An unconditional card-marking barrier applies to stores to slots in all objects, not just those in oldspace; a store to a new object will mark a card, but that card may contain old objects which would then be re-scanned. Or consider a store to an old object in a more dense part of oldspace; scanning the card may incur more work than needed. It could also be that Whippet is being too aggressive at re-using blocks for new allocations, where it should be limiting itself to blocks that are very sparsely populated with old objects.

That’s three problems. The second is well-known. But the first and last are specific to sticky-mark-bit collectors, where pages mix old and new objects.

a precise field-logging write barrier

Back in 2019, Steve Blackburn’s paper Design and Analysis of Field-Logging Write Barriers took a look at the state of the art in precise barriers that record not regions of memory that have been updated, but the precise edges (fields) that were written to. He ends up re-using this work later in the 2022 LXR paper (see §3.4), where the write barrier is used for deferred reference counting and a snapshot-at-the-beginning (SATB) barrier for concurrent marking. All in all field-logging seems like an interesting strategy. Relative to card-marking, work during the pause is much less: you have a precise buffer of all fields that were written to, and you just iterate that, instead of iterating objects. Field-logging does impose some mutator cost, but perhaps the payoff is worth it.

To log each old-to-new edge precisely once, you need a bit per field indicating whether the field is logged already. Blackburn’s 2019 write barrier paper used bits in the object header, if the object was small enough, and otherwise bits before the object start. This requires some cooperation between the collector, the compiler, and the run-time that I wasn’t ready to pay for. The 2022 LXR paper was a bit vague on this topic, saying just that it used “a side table”.

In Whippet’s nofl space, we have a side table already, used for a number of purposes:

  1. Mark bits.

  2. Iterability / interior pointers: is there an object at a given address? If so, it will have a recognizable bit pattern.

  3. End of object, to be able to sweep without inspecting the object itself

  4. Pinning, allowing a mutator to prevent an object from being evacuated, for example because a hash code was computed from its address

  5. A hack to allow fully-conservative tracing to identify ephemerons at trace-time; this re-uses the pinning bit, since in practice such configurations never evacuate

  6. Bump-pointer allocation into holes: the mark byte table serves the purpose of Immix’s line mark byte table, but at finer granularity. Because of this though, it is swept lazily rather than eagerly.

  7. Generations. Young objects have a bit set that is cleared when they are promoted.

Well. Why not add another thing? The nofl space’s granule size is two words, so we can use two bits of the byte for field logging bits. If there is a write to a field, a barrier would first check that the object being written to is old, and then check the log bit for the field being written. The old check will be to a byte that is nearby or possibly the same as the one to check the field logging bit. If the bit is unsert, we call out to a slow path to actually record the field.

preliminary results

I disassembled the fast path as compiled by GCC and got something like this on x86-64, in AT&T syntax, for the young-generation test:

mov    %rax,%rdx
and    $0xffffffffffc00000,%rdx
shr    $0x4,%rax
and    $0x3ffff,%eax
or     %rdx,%rax
testb  $0xe,(%rax)

The first five instructions compute the location of the mark byte, from the address of the object (which is known to be in the nofl space). If it has any of the bits in 0xe set, then it’s in the old generation.

Then to test a field logging bit it’s a similar set of instructions. In one of my tests the data type looks like this:

struct Node {
  uintptr_t tag;
  struct Node *left;
  struct Node *right;
  int i, j;
};

Writing the left field will be in the same granule as the object itself, so we can just test the byte we fetched for the logging bit directly with testb against $0x80. For right, we should be able to know it’s in the same slab (aligned 4 MB region) and just add to the previously computed byte address, but the C compiler doesn’t know that right now and so recomputes. This would work better in a JIT. Anyway I think these bit-swizzling operations are just lost in the flow of memory accesses.

For the general case where you don’t statically know the offset of the field in the object, you have to compute which bit in the byte to test:

mov    %r13,%rcx
mov    $0x40,%eax
shr    $0x3,%rcx
and    $0x1,%ecx
shl    %cl,%eax
test   %al,%dil

Is it good? Well, it improves things for my whiffle benchmarks, relative to the card-marking barrier, seeing a 1.05×-1.5× speedup across a range of benchmarks. I suspect the main advantage is in avoiding the “unconditional” part of card marking, where a write to a new object could cause old objects to be added to the remembered set. There are still quite a few whiffle configurations in which the whole-heap collector outperforms the sticky-mark-bit generational collector, though; I hope to understand this a bit more by building a more classic semi-space nursery, and comparing performance to that.

Implementation links: the barrier fast-path, the slow path, and the sequential store buffers. (At some point I need to make it so that allocating edge buffers in the field set causes the nofl space to page out a corresponding amount of memory, so as to be honest when comparing GC performance at a fixed heap size.)

Until next time, onwards and upwards!

GStreamer Conference 2024: Full Schedule, Talk Abstracts and Speakers Biographies now available

From GStreamer News by GStreamer

GStreamer

The GStreamer Conference team is pleased to announce that the full conference schedule including talk abstracts and speaker biographies is now available for this year's lineup of talks and speakers, covering again an exciting range of topics!

The GStreamer Conference 2024 will take place on 7-8 October 2024 in Montréal, Canada, followed by a hackfest.

Details about the conference, hackfest and how to register can be found on the conference website.

This year's topics and speakers:

Lightning Talks:

Many thanks to our amazing sponsors ‒ Platinum sponsors Collabora, Igalia, and Pexip, Gold sponsors Centricular, La Société des Arts Technologiques, Axis Communications, and Genius Sports, and Silver sponsors Laerdal Labs, asymptotic, Cablecast, and Fluendo, without whom the conference would not be possible in this form.

We hope to see you all in Montréal! Don't forget to register as soon as possible if you're planning on joining us, so we can order enough food and drinks!

needed-bits optimizations in guile

From wingolog by Andy Wingo

Andy Wingo

Hey all, I had a fun bug this week and want to share it with you.

numbers and representations

First, though, some background. Guile’s numeric operations are defined over the complex numbers, not over e.g. a finite field of integers. This is generally great when writing an algorithm, because you don’t have to think about how the computer will actually represent the numbers you are working on.

In practice, Guile will represent a small exact integer as a fixnum, which is a machine word with a low-bit tag. If an integer doesn’t fit in a word (minus space for the tag), it is represented as a heap-allocated bignum. But sometimes the compiler can realize that e.g. the operands to a specific bitwise-and operation are within (say) the 64-bit range of unsigned integers, and so therefore we can use unboxed operations instead of the more generic functions that do run-time dispatch on the operand types, and which might perform heap allocation.

Unboxing is important for speed. It’s also tricky: under what circumstances can we do it? In the example above, there is information that flows from defs to uses: the operands of logand are known to be exact integers in a certain range and the operation itself is closed over its domain, so we can unbox.

But there is another case in which we can unbox, in which information flows backwards, from uses to defs: if we see (logand n #xff), we know:

  • the result will be in [0, 255]

  • that n will be an exact integer (or an exception will be thrown)

  • we are only interested in a subset of n‘s bits.

Together, these observations let us transform the more general logand to an unboxed operation, having first truncated n to a u64. And actually, the information can flow from use to def: if we know that n will be an exact integer but don’t know its range, we can transform the potentially heap-allocating computation that produces n to instead truncate its result to the u64 range where it is defined, instead of just truncating at the use; and potentially this information could travel farther up the dominator tree, to inputs of the operation that defines n, their inputs, and so on.

needed-bits: the |0 of scheme

Let’s say we have a numerical operation that produces an exact integer, but we don’t know the range. We could truncate the result to a u64 and use unboxed operations, if and only if only u64 bits are used. So we need to compute, for each variable in a program, what bits are needed from it.

I think this is generally known a needed-bits analysis, though both Google and my textbooks are failing me at the moment; perhaps this is because dynamic languages and flow analysis don’t get so much attention these days. Anyway, the analysis can be local (within a basic block), global (all blocks in a function), or interprocedural (larger than a function). Guile’s is global. Each CPS/SSA variable in the function starts as needing 0 bits. We then compute the fixpoint of visiting each term in the function; if a term causes a variable to flow out of the function, for example via return or call, the variable is recorded as needing all bits, as is also the case if the variable is an operand to some primcall that doesn’t have a specific needed-bits analyser.

Currently, only logand has a needed-bits analyser, and this is because sometimes you want to do modular arithmetic, for example in a hash function. Consider Bon Jenkins’ lookup3 string hash function:

#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}
...

If we transcribe this to Scheme, we get something like:

(define (jenkins-lookup3-hashword2 str)
  (define (u32 x) (logand x #xffffFFFF))
  (define (shl x n) (u32 (ash x n)))
  (define (shr x n) (ash x (- n)))
  (define (rot x n) (logior (shl x n) (shr x (- 32 n))))
  (define (add x y) (u32 (+ x y)))
  (define (sub x y) (u32 (- x y)))
  (define (xor x y) (logxor x y))

  (define (mix a b c)
    (let* ((a (sub a c)) (a (xor a (rot c 4)))  (c (add c b))
           (b (sub b a)) (b (xor b (rot a 6)))  (a (add a c))
           (c (sub c b)) (c (xor c (rot b 8)))  (b (add b a))
           ...)
      ...))

  ...

These u32 calls are like the JavaScript |0 idiom, to tell the compiler that we really just want the low 32 bits of the number, as an integer. Guile’s compiler will propagate that information down to uses of the defined values but also back up the dominator tree, resulting in unboxed arithmetic for all of these operations.

(When writing this, I got all the way here and then realized I had already written quite a bit about this, almost a decade ago ago. Oh well, consider this your lucky day, you get two scoops of prose!)

the bug

All that was just prelude. So I said that needed-bits is a fixed-point flow analysis problem. In this case, I want to compute, for each variable, what bits are needed for its definition. Because of loops, we need to keep iterating until we have found the fixed point. We use a worklist to represent the conts we need to visit.

Visiting a cont may cause the program to require more bits from the variables that cont uses. Consider:

(define-significant-bits-handler
    ((logand/immediate label types out res) param a)
  (let ((sigbits (sigbits-intersect
                   (inferred-sigbits types label a)
                   param
                   (sigbits-ref out res))))
    (intmap-add out a sigbits sigbits-union)))

This is the sigbits (needed-bits) handler for logand when one of its operands (param) is a constant and the other (a) is variable. It adds an entry for a to the analysis out, which is an intmap from variable to a bitmask of needed bits, or #f for all bits. If a already has some computed sigbits, we add to that set via sigbits-union. The interesting point comes in the sigbits-intersect call: the bits that we will need from a are first the bits that we infer a to have, by forward type-and-range analysis; intersected with the bits from the immediate param; intersected with the needed bits from the result value res.

If the intmap-add call is idempotent—i.e., out already contains sigbits for a—then out is returned as-is. So we can check for a fixed-point by comparing out with the resulting analysis, via eq?. If they are not equal, we need to add the cont that defines a to the worklist.

The bug? The bug was that we were not enqueuing the def of a, but rather the predecessors of label. This works when there are no cycles, provided we visit the worklist in post-order; and regardless, it works for many other analyses in Guile where we compute, for each labelled cont (basic block), some set of facts about all other labels or about all other variables. In that case, enqueuing a predecessor on the worklist will cause all nodes up and to including the variable’s definition to be visited, because each step adds more information (relative to the analysis computed on the previous visit). But it doesn’t work for this case, because we aren’t computing a per-label analysis.

The solution was to rewrite that particular fixed-point to enqueue labels that define a variable (possibly multiple defs, because of joins and loop back-edges), instead of just the predecessors of the use.

Et voilà ! If you got this far, bravo. Type at y’all again soon!

GStreamer 1.24.8 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • decodebin3: collection handling fixes
  • encodebin: Fix pad removal (and smart rendering in gst-editing-services)
  • glimagesink: Fix cannot resize viewport when video size changed in caps
  • matroskamux, webmmux: fix firefox compatibility issue with Opus audio streams
  • mpegtsmux: Wait for data on all pads before deciding on a best pad unless timing out
  • splitmuxsink: Override LATENCY query to pretend to downstream that we're not live
  • video: QoS event handling improvements
  • voamrwbenc: fix list of bitrates
  • vtenc: Restart encoding session when certain errors are detected
  • wayland: Fix ABI break in WL context type name
  • webrtcbin: Prevent crash when attempting to set answer on invalid SDP
  • cerbero: ship vp8/vp9 software encoders again, which went missing in 1.24.7; ship transcode plugin
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.8 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Orc 0.4.40 bug-fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a bug-fix release, with some minor follow-ups to the security fixes of the previous release.

Highlights:

  • Security: Minor follow-up fixes for CVE-2024-40897
  • Fix include header use from C++
  • orccodemem: Assorted memory mapping fixes
  • powerpc: fix div255w which still used the inexact substitution
  • powerpc: Disable VSX and ISA 2.07 for Apple targets
  • powerpc: Allow detection of ppc64 in Mac OS
  • x86: work around old GCC versions (pre 9.0) having broken xgetbv implementationsv
  • x86: consider MSYS2/Cygwin as Windows for ABI purposes only
  • x86: handle unnatural and misaligned array pointers
  • x86: Fix non-C11 typedefs
  • x86: try fixing AVX detection again by adding check for XSAVE
  • Some compatibility fixes for Musl
  • meson: Fix detecting XSAVE on older AppleClangv
  • Check return values of malloc() and realloc()

Direct tarball download: orc-0.4.40.tar.xz.

Linux proves Elon Musk is wrong about entrepreneurs

From Planet – Felipe Contreras by Felipe Contreras

I show with a simple example that Elon Musk's claim that only entrepreneurs generate wealth is completely wrong.

GStreamer and WebRTC HTTP signalling

From Arun Raghavan by Arun Raghavan

The WebRTC nerds among us will remember the first thing we learn about WebRTC, which is that it is a specification for peer-to-peer communication of media and data, but it does not specify how signalling is done.

Or put more simply, if you want call someone on the web, WebRTC tells you how you can transfer audio, video and data, but it leaves out the bit about how you make the call itself: how do you locate the person you’re calling, let them know you’d like to call them, and a few following steps before you can see and talk to each other.

WebRTC signalling
WebRTC signalling

While this allows services to provide their own mechanisms to manage how WebRTC calls work, the lack of a standard mechanism means that general-purpose applications need to individually integrate each service that they want to support. For example, GStreamer’s webrtcsrc and webrtcsink elements support various signalling protocols, including Janus Video Rooms, LiveKit, and Amazon Kinesis Video Streams.

However, having a standard way for clients to do signalling would help developers focus on their application and worry less about interoperability with different services.

Standardising Signalling

With this motivation, the IETF WebRTC Ingest Signalling over HTTPS (WISH) workgroup has been working on two specifications:

(author’s note: the puns really do write themselves :))

As the names suggest, the specifications provide a way to perform signalling using HTTP. WHIP gives us a way to send media to a server, to ingest into a WebRTC call or live stream, for example.

Conversely, WHEP gives us a way for a client to use HTTP signalling to consume a WebRTC stream – for example to create a simple web-based consumer of a WebRTC call, or tap into a live streaming pipeline.

WHIP and WHEP
WHIP and WHEP

With this view of the world, WHIP and WHEP can be used both for calling applications, but also as an alternative way to ingest or play back live streams, with lower latency and a near-ubiquitous real-time communication API.

In fact, several services already support this including Dolby Millicast, LiveKit and Cloudflare Stream.

WHIP and WHEP with GStreamer

We know GStreamer already provides developers two ways to work with WebRTC streams:

  • webrtcbin: provides a low-level API, akin to the PeerConnection API that browser-based users of WebRTC will be familiar with

  • webrtcsrc and webrtcsink: provide high-level elements that can respectively produce/consume media from/to a WebRTC endpoint

At Asymptotic, my colleagues Tarun and Sanchayan have been using these building blocks to implement GStreamer elements for both the WHIP and WHEP specifications. You can find these in the GStreamer Rust plugins repository.

Our initial implementations were based on webrtcbin, but have since been moved over to the higher-level APIs to reuse common functionality (such as automatic encoding/decoding and congestion control). Tarun covered our work in a talk at last year’s GStreamer Conference.

Today, we have 4 elements implementing WHIP and WHEP.

Clients

  • whipclientsink: This is a webrtcsink-based implementation of a WHIP client, using which you can send media to a WHIP server. For example, streaming your camera to a WHIP server is as simple as:
gst-launch-1.0 -e \
  v4l2src ! video/x-raw ! queue ! \
  whipclientsink signaller::whip-endpoint="https://my.webrtc/whip/room1"
  • whepclientsrc: This is work in progress and allows us to build player applications to connect to a WHEP server and consume media from it. The goal is to make playing a WHEP stream as simple as:
gst-launch-1.0 -e \
  whepclientsrc signaller:whep-endpoint="https://my.webrtc/whep/room1" ! \
  decodebin ! autovideosink

The client elements fit quite neatly into how we might imagine GStreamer-based clients could work. You could stream arbitrary stored or live media to a WHIP server, and play back any media a WHEP server provides. Both pipelines implicitly benefit from GStreamer’s ability to use hardware-acceleration capabilities of the platform they are running on.

GStreamer WHIP/WHEP clients
GStreamer WHIP/WHEP clients

Servers

  • whipserversrc: Allows us to create a WHIP server to which clients can connect and provide media, each of which will be exposed as GStreamer pads that can be arbitrarily routed and combined as required. We have an example server that can play all the streams being sent to it.

  • whepserversink: Finally we have ongoing work to publish arbitrary streams over WHEP for web-based clients to consume this media.

The two server elements open up a number of interesting possibilities. We can ingest arbitrary media with WHIP, and then decode and process, or forward it, depending on what the application requires. We expect that the server API will grow over time, based on the different kinds of use-cases we wish to support.

GStreamer WHIP/WHEP server
GStreamer WHIP/WHEP server

This is all pretty exciting, as we have all the pieces to create flexible pipelines for routing media between WebRTC-based endpoints without having to worry about service-specific signalling.

If you’re looking for help realising WHIP/WHEP based endpoints, or other media streaming pipelines, don’t hesitate to reach out to us!

GStreamer 1.24.7 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • Fix APE and Musepack audio file and GIF playback with FFmpeg 7.0
  • playbin3: Fix potential deadlock with multiple playbin3s with glimagesink used in parallel
  • qt6: various qmlgl6src and qmlgl6sink fixes and improvements
  • rtspsrc: expose property to force usage of non-compliant setup URLs for RTSP servers where the automatic fallback doesn't work
  • urisourcebin: gapless playback and program switching fixes
  • v4l2: various fixes
  • va: Fix potential deadlock with multiple va elements used in parallel
  • meson: option to disable gst-full for static-library build configurations that do not need this
  • cerbero: libvpx updated to 1.14.1; map 2022Server to Windows11; disable rust variant on Linux if binutils is too old
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.7 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Everyone is using modulo wrong

From Planet – Felipe Contreras by Felipe Contreras

All programmers use the modulo operator, but virtually none have stopped to consider what it actually is, and that's why one of the most powerful features of it doesn't work in most programming languages. This article touches abstract mathematics in order to explain why.

GStreamer Conference 2024: Call for Presentations is open - submit your talk proposal now!

From GStreamer News by GStreamer

GStreamer

About the GStreamer Conference

The GStreamer Conference 2024 GStreamer Conference will take place on Monday-Tuesday 7-8 October 2024 in Montréal, Québec, Canada.

It is a conference for developers, community members, decision-makers, industry partners, researchers, students and anyone else interested in the GStreamer multimedia framework or Open Source and cross-platform multimedia.

There will also be a hackfest just after the conference.

You can find more details about the conference on the GStreamer Conference 2024 web site.

Call for Presentations is open

The call for presentations is now open for talk proposals and lightning talks. Please submit your talk now!

Talks can be on almost anything multimedia related, ranging from talks about applications to challenges in the lower levels in the stack or hardware.

We want to hear from you, what you are working on, what you are using GStreamer for, what difficulties you have encountered and how you solved them, what your plans are, what you like, what you dislike, how to improve things!

Please submit all proposals through the GStreamer Conference 2024 Talk Submission Portal.

GStreamer 1.24.6 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • Fix compatibility with FFmpeg 7.0
  • qmlglsink: Fix failure to display content on recent Android devices
  • adaptivedemux: Fix handling of closed caption streams
  • cuda: Fix runtime compiler loading with old CUDA tookit
  • decodebin3 stream selection handling fixes
  • d3d11compositor, d3d12compositor: Fix transparent background mode with YUV output
  • d3d12converter: Make gamma remap work as intended
  • h264decoder: Update output frame duration for interlaced video when second field frame is discarded
  • macOS audio device provider now listens to audio devices being added/removed at runtime
  • Rust plugins: audioloudnorm, s3hlssink, gtk4paintablesink, livesync and webrtcsink fixes
  • videoaggregator: preserve features in non-alpha caps for subclasses with non-system memory sink caps
  • vtenc: Fix redistribute latency spam
  • v4l2: fixes for complex video formats
  • va: Fix strides when importing DMABUFs, dmabuf handle leaks, and blocklist unmaintained Intel i965 driver for encoding
  • waylandsink: Fix surface cropping for rotated streams
  • webrtcdsp: Enable multi_channel processing to fix handling of stereo streams
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.6 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Orc 0.4.39 bug-fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a minor bug-fix release, and also includes a security fix.

Highlights:

  • Security: Fix error message printing buffer overflow leading to possible code execution in orcc with specific input files (CVE-2024-40897). This only affects developers and CI environments using orcc, not users of liborc.
  • div255w: fix off-by-one error in the implementations
  • x86: only run AVX detection if xgetbv is available
  • x86: fix AVX detection by implementing the check recommended by Intel
  • Only enable JIT compilation on Apple arm64 if running on macOS, fixes crashes on iOS
  • Fix potential crash in emulation mode if logging is enabled
  • Handle undefined TARGET_OS_OSX correctly
  • orconce: Fix typo in GCC __sync-based implementation
  • orconce: Fix usage of __STDC_NO_ATOMICS__
  • Fix build with MSVC 17.10 + C11
  • Support stack unwinding on Windows
  • Major opcode and instruction set code clean-ups and refactoring
  • Refactor allocation and chunk initialization of code regions
  • Fall back to emulation on Linux if JIT support is not available, e.g. because of SELinux sandboxing or noexec mounting)

Direct tarball download: orc-0.4.39.tar.xz.

GStreamer Rust bindings 0.23 / Rust Plugins 0.13 release

From GStreamer News by GStreamer

GStreamer

GStreamer Rust bindingsGStreamer Rust plugins

As usual this release follows the latest gtk-rs 0.20 release and the corresponding API changes.

This release features relatively few changes and mostly contains the addition of some convenience APIs, the addition of bindings for some minor APIs, addition of bindings for new GStreamer 1.26 APIs, and various optimizations.

The new release also brings a lot of bugfixes, most of which were already part of the bugfix releases of the previous release series.

Details can be found in the release notes for gstreamer-rs.

The code and documentation for the bindings is available on the freedesktop.org GitLab

as well as on crates.io.

The new 0.13 version of the GStreamer Rust plugins features many improvements to the existing plugins as well as various new plugins. A majority of the changes were already backported to the 0.12 release series and its bugfix releases, which is part of the GStreamer 1.24 binaries.

A full list of available plugins can be seen in the repository's README.md.

Details for this release can be found in the release notes for gst-plugins-rs.

If you find any bugs, notice any missing features or other issues please report them in GitLab for the bindings or the plugins.

scikit-survival 0.23.0 released

From Posts | Sebastian Pölsterl by Sebastian Pölsterl

I am pleased to announce the release of scikit-survival 0.23.0.

This release adds support for scikit-learn 1.4 and 1.5, which includes missing value support for RandomSurvivalForest. For more details on missing values support, see the section in the release announcement for 0.23.0.

Moreover, this release fixes critical bugs. When fitting SurvivalTree, the sample_weight is now correctly considered when computing the log-rank statistic for each split. This change also affects RandomSurvivalForest and ExtraSurvivalTrees which pass sample_weight to the individual trees in the ensemble. Therefore, the outputs produced by SurvivalTree, RandomSurvivalForest, and ExtraSurvivalTrees will differ from previous releases.

This release fixes a bug in ComponentwiseGradientBoostingSurvivalAnalysis and GradientBoostingSurvivalAnalysis when dropout is used. Previously, dropout was only applied starting with the third iteration, now dropout is applied in the second iteration too.

Finally, this release adds compatibility with numpy 2.0 and drops support for Python 3.8.

Install

scikit-survival is available for Linux, macOS, and Windows and can be installed either

via pip:

pip install scikit-survival

or via conda

 conda install -c conda-forge scikit-survival

GStreamer 1.24.5 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • webrtcsink: Support for AV1 via nvav1enc, av1enc or rav1enc encoders
  • AV1 RTP payloader/depayloader fixes to work correctly with Chrome and Pion WebRTC
  • av1parse, av1dec error handling/robustness improvements
  • av1enc: Handle force-keyunit events properly for WebRTC
  • decodebin3: selection and collection handling improvements
  • hlsdemux2: Various fixes for discontinuities, variant switching, playlist updates
  • qml6glsink: fix RGB format support
  • rtspsrc: more control URL handling fixes
  • v4l2src: Interpret V4L2 report of sync loss as video signal loss
  • d3d12 encoder, memory and videosink fixes
  • vtdec: more robust error handling, fix regression
  • ndi: support for NDI SDK v6
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.5 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Fedora Workstation development update – Artificial Intelligence edition

From Christian F.K. Schaller by Christian Schaller

Christian Schaller

There are times when you feel your making no progress and there are other times when things feel like they are landing in quick succession. Luckily this definitely is the second when a lot of our long term efforts are finally coming over the finish line. As many of you probably know our priorities tend to be driven by a combination of what our RHEL Workstation customers need, what our hardware partners are doing and what is needed for Fedora Workstation to succeed. We also try to be good upstream partners and do patch reviews and participate where we can in working on upstream standards, especially those of course of concern to our RHEL Workstation and Server users. So when all those things align we are at our most productive and that seems to be what is happening now. Everything below is features in flight that will at the latest land in Fedora Workstation 41.

Artificial Intelligence

Granite LLM

IBM Granite LLM models usher in a new era of open source AI.


One of the areas of great importance to Red Hat currently is working on enabling our customers and users to take advantage of the advances in Artificial Intelligence. We do this in a lot of interesting ways like our recently announced work with IBM to release the high quality Granite AI models under terms that make them the most open major vendor AI models according to the Stanford Foundation Model Transparency Index , but not only are we releasing the full LLM source code, we are also creating a project to make modifying and teaching the LLM a lot easier through a project we call Instructlab. Instructlab is enabling almost anyone to quickly download a Granite LLM model and start teaching it specific things relevant to you or your organization. This put you in control of the AI and what it knows and can do as opposed to being demoted to a pure consumer.

And it is not just Granite, we are ensuring other other major AI projects will work with Fedora too, like Meta’s popular Llama LLM. And a big step for that is how Tom Rix has been working on bringing in AMD accelerated support (ROCm) for PyTorch to Fedora. PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. The long term goal is that you should be able to just install PyTorch on Fedora and have it work hardware accelerated with any of the 3 major GPU vendors chipsets.

NVIDIA in Fedora
Fedora Workstation
So the clear market leader at the moment for powering AI workloads in NVIDIA so I am also happy to let you know about two updates we are working on that will make you life better on Fedora when using NVIDIA GPUs, be that for graphics or for compute or Artificial Intelligence. So for the longest time we have had easy install of the NVIDIA driver through GNOME Software in Fedora Workstation, unfortunately this setup never dealt with what is now the default usecase, which is using it with a system that has secure boot enabled. So the driver install was dropped from GNOME Software in our most recent release as the only way for people to get it working was through using mokutils on the command line, but the UI didn’t tell you that. Well we of course realize that sending people back to the command line to get this driver installed is highly unfortunate so Milan Crha has been working together with Alan Day and Jakub Steiner to come up with a streamlined user experience in GNOME Software to let you install the binary NVIDIA driver and provide you with an integrated graphical user interface help to sign the kernel module for use with secure boot. This is a bit different than what we for instance are doing in RHEL, where we are working with NVIDIA to provide pre-signed kernel modules, but that is a lot harder to do in Fedora due to the rapidly updating kernel versions and which most Fedora users appreciate as a big plus. So instead what we are for opting in Fedora is as I said to make it simple for you to self-sign the kernel module for use with secure boot. We are currently looking at when we can make this feature available, but no later than Fedora Workstation 41 for sure.

Toolbx getting top notch NVIDIA integration

Toolbx

Container Toolbx enables developers quick and easy access to their favorite development platforms


Toolbx, our incredible developer focused containers tool, is going from strength to strength these days with the rewrite from the old shell scripts to Go starting to pay dividends. The next major feature that we are closing in on is full NVIDIA driver support with Toolbx. As most of you know Toolbx is our developer container solution which makes it super simple to set up development containers with specific versions of Fedora or RHEL or many other distributions. Debarshi Ray has been working on implementing support for the official NVIDIA container device interface module which should enable us to provide full NVIDIA and CUDA support for Toolbx containers. This should provide reliable NVIDIA driver support going forward and Debarshi is currently testing various AI related container images to ensure they run smoothly on the new setup.

We are also hoping the packaging fixes to subscription manager will land soon as that will make using RHEL containers on Fedora a lot smoother. While this feature basically already works as outlined here we do hope to make it even more streamlined going forward.

Open Source NVIDIA support
Of course being Red Hat we haven’t forgotten about open source here, you probably heard about Nova our new Rust based upstream kernel driver for NVIDIA hardware which will provided optimized support for the hardware supported by NVIDIAs firmware (basically all newer ones) and accelerate Vulkan through the NVK module and provide OpenGL through Zink. That effort is still quite early days, but there is some really cool developments happening around Nova that I am not at liberty to share yet, but I hope to be able to talk about those soon.

High Dynamic Range (HDR)
Jonas Ådahl after completing the remote access work for GNOME under Wayland has moved his focus to help land the HDR support in mutter and GNOME Shell. He recently finished rebasing his HDR patches onto a wip merge request from
Georges Stavracas which ported gnome-shell to using paint nodes,

So the HDR enablement in mutter and GNOME shell is now a set of 3 patches.

With this the work is mostly done, what is left is avoiding over exposure of the cursor, and inhibiting direct scanout.

We also hope to help finalize the upstream Wayland specs soon so that everyone can implement this and know the protocols are stable and final.

DRM leasing – VR Headsets
VR Googles
The most common usecase for DRM leasing is VR headsets, but it is also a useful feature for things like video walls. José Expósito is working on finalizing a patch for it using the Wayland protocol adopted by KDE and others. We where somewhat hesitant to go down this route as we felt a portal would have been a better approach, especially as a lot of our experienced X.org developers are worried that Wayland is in the process of replicating one of the core issues with X through the unmanageable plethora of Wayland protocols that is being pushed. That said, the DRM leasing stuff was not a hill worth dying on here, getting this feature out to our users in a way they could quickly use was more critical, so DRM leasing will land soon through this merge request.

Explicit sync
Another effort that we have put a lot of effort into together with our colleagues at NVIDIA is landing support for what is called explicit sync into the Linux kernel and the graphics drivers.The linux graphics stack was up to this point using something called implicit sync, but the NVIDIA drivers did not work well with that and thus people where experiencing ‘blinking’ applications under Wayland. So we worked with NVIDIA and have landed the basic support in the kernel and in GNOME and thus once the 555 release of the NVIDIA driver is out we hope the ‘blinking’ issues are fully resolved for your display. There has been some online discussion about potential performance gains from this change too, across all graphics drivers, but the reality of this is somewhat uncertain or at least it is still unclear if there will be real world measurable gains from adding explicit sync. I heard knowledgeable people argue both sides with some saying there should be visible performance gains while others say the potential gains will be so specific that unless you write a test to benchmark it explicitly you will not be able to detect a difference. But what is beyond doubt is that this will make using the NVIDIA stack with Wayland a lot better a that is a worthwhile goal in itself. The one item we are still working on is integrating the PipeWire support for explicit sync into our stack, because without it you might have the same flickering issues with PipeWire streams on top of the NVIDIA driver that you have up to now seen on your screen. So for instance if you are using PipeWire for screen capture it might look fine on screen with the fixes already merged, but the captured video has flickering. Wim Taymans landed some initial support in PipeWire already so now Michel Dänzer is working on implementing the needed bits for PipeWire in mutter. At the same time Wim is working on ensuring we have a testing client available to verify the compositor support. Once everything has landed in mutter and we been able to verify that it works with the sample client we will need to add support to client applications interacting with PipeWire, like Firefox, Chrome, OBS Studio and GNOME-remote-desktop.

GStreamer Vulkan Operation API

From Herostratus’ legacy by Víctor Jáquez

Two weeks ago the GStreamer Spring Hackfest took place in Thessaloniki, Greece. I had a great time. I hacked a bit on VA, Vulkan and my toy, planet-rs, but mostly I ate delicious Greek food ☻. A big thanks to our hosts: Vivia, Jordan and Sebastian!

GStreamer Spring Hackfest
2024
First day of the GStreamer Spring Hackfest 2024 - https://floss.social/@gstreamer/112511912596084571

And now, writing this supposed small note, I recalled that I have in my to-do list an item to write a comment about GstVulkanOperation, an addition to GstVulkan API which helps with the synchronization of operations on frames, in order to enable Vulkan Video.

Originally, GstVulkan API didn’t provide almost any synchronization operation, beside fences, and that appeared to be enough for elements, since they do simple Vulkan operations. Nonetheless, as soon as we enabled VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT feature, which reports resource access conflicts due to missing or incorrect synchronization operations between action [*], a sea of hazard operation warnings drowned us [*].

Hazard operations are a sequence of read/write commands in a memory area, such as an image, that might be re-ordered, or racy even.

Why are those hazard operations reported by the Vulkan Validation Layer, if the programmer pushes the commands to execute in queue in order? Why is explicit synchronization required? Because, as the great blog post from Hans-Kristian Arntzen, Yet another blog explaining Vulkan synchronization, (make sure you read it!) states:

[…] all commands in a queue execute out of order. Reordering may happen across command buffers and even vkQueueSubmits

In order to explain how synchronization is done in Vulkan, allow me to yank a couple definitions stated by the specification:

Commands are instructions that are recorded in a device’s queue. There are four types of commands: action, state, synchronization and indirection. Synchronization commands impose ordering constraints on action commands, by introducing explicit execution and memory dependencies.

Operation is an arbitrary amount of commands recorded in a device’s queue.

Since the driver can reorder commands (perhaps for better performance, dunno), we need to send explicit synchronization commands to the device’s queue to enforce a specific sequence of action commands.

Nevertheless, Vulkan doesn’t offer fine-grained dependencies between individual operations. Instead, dependencies are expressed as a relation of two elements, where each element is composed by the intersection of scope and operation. A scope is a concept in the specification that, in practical terms, can be either pipeline stage (for execution dependencies), or both pipeline stage and memory access type (for memory dependencies).

First let’s review execution dependencies through pipeline stages:

Every command submitted to a device’s queue goes through a sequence of steps known as pipeline stages. This sequence of steps is one of the very few implicit ordering guarantees that Vulkan has. Draw calls, copy commands, compute dispatches, all go through certain sequential stages, which amount of stages to cover depends on the specific command and the current command buffer state.

In order to visualize an abstract execution dependency let’s imagine two compute operations and the first must happen before the second.

Operation 1
Sync command
Operation 2
  1. The programmer has to specify the Sync command in terms of two scopes (Scope 1 and Scope 2), in this execution dependency case, two pipeline stages.
  2. The driver generates an intersection between commands in Operation 1 and Scope 1 defined as Scoped operation 1. The intersection contains all the commands in Operation 1 that go through up to the pipeline stage defined in Scope 1. The same is done with Operation 2 and Scope 2 generating Scoped operation 2.
  3. Finally, we got an execution dependency that guarantees that Scoped operation 1 happens before Scoped operation 2.

Now let’s talk about memory dependencies:

First we need to understand the concepts of memory availability and visibility. Their formal definition in Vulkan are a bit hard to grasp since they come from the Vulkan memory model, which is intended to abstract all the ways of how hardware access memory. Perhaps we could say that availability is the operation that assures the existence of the required memory; while visibility is the operation that assures it’s possible to read/write the data in that memory area.

Memory dependencies are limited the Operation 1 that be done before memory availability and Operation 2 that have to be done after its visibility.

But again, there’s no fine-grained way to declare that memory dependency. Instead, there are memory access types, which are functions used by descriptor types, or functions for pipeline stage to access memory, and they are used as access scopes.

All in all, if a synchronization command defining a memory dependency between two operations, it’s composed by the intersection of between each command and a pipeline stage, intersected with the memory access type associated with the memory processed by those commands.

Now that the concepts are more or less explained we could see those concepts expressed in code. The synchronization command for execution and memory dependencies is defined by VkDependencyInfoKHR. And it contains a set of barrier arrays, for memory, buffers and images. Barriers express the relation of dependency between two operations. For example, Image barriers use VkImageMemoryBarrier2 which contain the mask for source pipeline stage (to define Scoped operation 1), and the mask for the destination pipeline stage (to define Scoped operation 2); the mask for source memory access type and the mask for the destination memory access to define access scopes; and also layout transformation declaration.

A Vulkan synchronization example from Vulkan Documentation wiki:

vkCmdDraw(...);

... // First render pass teardown etc.

VkImageMemoryBarrier2KHR imageMemoryBarrier = {
...
.srcStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
.dstStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT_KHR,
.oldLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL
/* .image and .subresourceRange should identify image subresource accessed */};

VkDependencyInfoKHR dependencyInfo = {
...
1, // imageMemoryBarrierCount
&imageMemoryBarrier, // pImageMemoryBarriers
...
}

vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);

... // Second render pass setup etc.

vkCmdDraw(...);

First draw samples a texture in the fragment shader. Second draw writes to that texture as a color attachment.

This is a Write-After-Read (WAR) hazard, which you would usually only need an execution dependency for - meaning you wouldn’t need to supply any memory barriers. In this case you still need a memory barrier to do a layout transition though, but you don’t need any access types in the src access mask. The layout transition itself is considered a write operation though, so you do need the destination access mask to be correct - or there would be a Write-After-Write (WAW) hazard between the layout transition and the color attachment write.

Other explicit synchronization mechanisms, along with barriers, are semaphores and fences. Semaphores are a synchronization primitive that can be used to insert a dependency between operations without notifying the host; while fences are a synchronization primitive that can be used to insert a dependency from a queue to the host. Semaphores and fences are expressed in the VkSubmitInfo2KHR structure.

As a preliminary conclusion, synchronization in Vulkan is hard and a helper API would be very helpful. Inspired by FFmpeg work done by Lynne, I added GstVulkanOperation object helper to GStreamer Vulkan API.

GstVulkanOperation object helper aims to represent an operation in the sense of the Vulkan specification mentioned before. It owns a command buffer as public member where external commands can be pushed to the associated device’s queue.

It has a set of methods:

Internally, GstVulkanOperation contains two arrays:

  1. The array of dependency frames, which are the set of frames, each representing an operation, which will hold dependency relationships with other dependency frames.

    gst_vulkan_operation_add_dependency_frame appends frames to this array.

    When calling gst_vulkan_operation_end the frame’s barrier state for each frame in the array is updated.

    Also, each dependency frame creates a timeline semaphore, which will be signaled when a command, associated with the frame, is executed in the device’s queue.

  2. The array of barriers, which contains a list of synchronization commands. gst_vulkan_operation_add_frame_barrier fills and appends a VkImageMemoryBarrier2KHR associated with a frame, which can be in the array of dependency frames.

Here’s a generic view of video decoding example:

gst_vulkan_operation_begin (priv->exec, ...);

cmd_buf = priv->exec->cmd_buf->cmd;

gst_vulkan_operation_add_dependency_frame (exec, out,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);

/* assume a engine where out frames can be used for DPB frames, */
/* so a barrier for layout transition is required */
gst_vulkan_operation_add_frame_barrier (exec, out,
VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR, NULL);

for (i = 0; i < dpb_size; i++) {
gst_vulkan_operation_add_dependency_frame (exec, dpb_frame,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);
}

barriers = gst_vulkan_operation_retrieve_image_barriers (exec);
vkCmdPipelineBarrier2 (cmd_buf, &(VkDependencyInfo) {
...
.pImageMemoryBarriers = barriers->data,
.imageMemoryBarrierCount = barriers->len,
});
g_array_unref (barriers);

vkCmdBeginVideoCodingKHR (cmd_buf, &decode_start);
vkCmdDecodeVideoKHR (cmd_buf, &decode_info);
vkCmdEndVideoCodingKHR (cmd_buf, &decode_end);

gst_vulkan_operation_end (exec, ...);

Here, just one memory barrier is required for memory layout transition, but semaphores are required to signal when an output frame and its DPB frames are processed, and later, the output frame can be used as a DPB frame. Otherwise, the output frame might not be fully reconstructed with it’s used as DPB for the next output frame, generating only noise.

And that’s all. Thank you.

GStreamer 1.24.4 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • audioconvert: support more than 64 audio channels
  • avvidec: fix dropped frames when doing multi-threaded decoding of I-frame codecs such as DV Video
  • mpegtsmux: Correctly time out in live pipelines, esp. for sparse streams like KLV and DVB subtitles
  • vtdec deadlock fixes on shutdown and format/resolution changes (as might happen with e.g. HLS/DASH)
  • fmp4mux, isomp4mux: Add support for adding AV1 header OBUs into the MP4 headers, and add language from tags
  • gtk4paintablesink improvements: fullscreen mode and gst-play-1.0 support
  • webrtcsink: add support for insecure TLS and improve error handling and VP9 handling
  • vah264enc, vah265enc: timestamp handling fixes; generate IDR frames on force-keyunit-requests, not I frames
  • v4l2codecs: decoder: Reorder caps to prefer `DMA_DRM` ones, fixes issues with playbin3
  • Visualizer plugins fixes
  • Avoid using private APIs on iOS
  • various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.4 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

GStreamer Hackfest 2024

From Herostratus’ legacy by Víctor Jáquez

Last weeks were a bit hectic. First, with a couple friends we biked the southwest of the Netherlands for almost a week. The next week, the last one, I attended the 2024 Display Next Hackfest

This week was Igalia’s Assembly meetings, and next week, along with other colleagues, I’ll be in Thessaloniki for the GStreamer Spring Hackfest

I’m happy to meet again friends from the GStreamer community and talk and move things forward related with Vulkan, VA-API, KMS, video codecs, etc.

GStreamer Conference 2024 announced to take place 7-10 October 2024 in Montréal, Canada

From GStreamer News by GStreamer

GStreamer

The GStreamer project is thrilled to announce that this year's GStreamer Conference will take place on Monday-Tuesday 7-8 October 2024 in Montréal, Québec, Canada, followed by a hackfest.

You can find more details about the conference on the GStreamer Conference 2024 web site.

A call for papers will be sent out in due course.

Registration will open at a later time, in late June / early July.

We will announce those and any further updates on the gstreamer-announce mailing list, the website, on Twitter. and on Mastodon.

Talk slots will be available in varying durations from 20 minutes up to 45 minutes. Whatever you're doing or planning to do with GStreamer, we'd like to hear from you!

We also plan to have sessions with short lightning talks / demos / showcase talks for those who just want to show what they've been working on or do a mini-talk instead of a full-length talk. Lightning talk slots will be allocated on a first-come-first-serve basis, so make sure to reserve your slot if you plan on giving a lightning talk.

A GStreamer hackfest will take place right after the conference, on 9-10 October 2024.

We hope to see you in Montréal!

Please spread the word!

Dissecting GstSegments

From GStreamer – Happy coding by Enrique Ocaña González

During all these years using GStreamer, I’ve been having to deal with GstSegments in many situations. I’ve always have had an intuitive understanding of the meaning of each field, but never had the time to properly write a good reference explanation for myself, ready to be checked at those times when the task at hand stops being so intuitive and nuisances start being important. I used the notes I took during an interesting conversation with Alba and Alicia about those nuisances, during the GStreamer Hackfest in A Coruña, as the seed that evolved into this post.

But what are actually GstSegments? They are the structures that track the values needed to synchronize the playback of a region of interest in a media file.

GstSegments are used to coordinate the translation between Presentation Timestamps (PTS), supplied by the media, and Runtime.

PTS is the timestamp that specifies, in buffer time, when the frame must be displayed on screen. This buffer time concept (called buffer running-time in the docs) refers to the ideal time flow where rate isn’t being had into account.

Decode Timestamp (DTS) is the timestamp that specifies, in buffer time, when the frame must be supplied to the decoder. On decoders supporting P-frames (forward-predicted) and B-frames (bi-directionally predicted), the PTS of the frames reaching the decoder may not be monotonic, but the PTS of the frames reaching the sinks are (the decoder outputs monotonic PTSs).

Runtime (called clock running time in the docs) is the amount of physical time that the pipeline has been playing back. More specifically, the Runtime of a specific frame indicates the physical time that has passed or must pass until that frame is displayed on screen. It starts from zero.

Base time is the point when the Runtime starts with respect to the input timestamp in buffer time (PTS or DTS). It’s the Runtime of the PTS=0.

Start, stop, duration: Those fields are buffer timestamps that specify when the piece of media that is going to be played starts, stops and how long that portion of the media is (the absolute difference between start and stop, and I mean absolute because a segment being played backwards may have a higher start buffer timestamp than what its stop buffer timestamp is).

Position is like the Runtime, but in buffer time. This means that in a video being played back at 2x, Runtime would flow at 1x (it’s physical time after all, and reality goes at 1x pace) and Position would flow at 2x (the video moves twice as fast than physical time).

The Stream Time is the position in the stream. Not exactly the same concept as buffer time. When handling multiple streams, some of them can be offset with respect to each other, not starting to be played from the begining, or even can have loops (eg: repeating the same sound clip from PTS=100 until PTS=200 intefinitely). In this case of repeating, the Stream time would flow from PTS=100 to PTS=200 and then go back again to the start position of the sound clip (PTS=100). There’s a nice graphic in the docs illustrating this, so I won’t repeat it here.

Time is the base of Stream Time. It’s the Stream time of the PTS of the first frame being played. In our previous example of the repeating sound clip, it would be 100.

There are also concepts such as Rate and Applied Rate, but we didn’t get into them during the discussion that motivated this post.

So, for translating between Buffer Time (PTS, DTS) and Runtime, we would apply this formula:

Runtime = BufferTime * ( Rate * AppliedRate ) + BaseTime

And for translating between Buffer Time (PTS, DTS) and Stream Time, we would apply this other formula:

StreamTime = BufferTime * AppliedRate + Time

And that’s it. I hope these notes in the shape of a post serve me as reference in the future. Again, thanks to Alicia, and especially to Alba, for the valuable clarifications during the discussion we had that day in the Igalia office. This post wouldn’t have been possible without them.

GStreamer 1.24.3 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • EXIF image tag parsing security fixes
  • Subtitle handling improvements in parsebin
  • Fix issues with HLS streams that contain VTT subtitles
  • Qt6 QML sink re-render and re-sizing fixes
  • unixfd ipc plugin timestamp and segment handling fixes
  • vah264enc, vah265enc: Do not touch the PTS of the output frame
  • vah264dec and vapostproc fixes and improvements
  • v4l2: multiple fixes and improvements, incl. for mediatek JPEG decoder and v4l2 loopback
  • v4l2: fix hang after seek with some v4l2 decoders
  • Wayland sink fixes
  • ximagesink: fix regression on RPi/aarch64
  • fmp4mux, mp4mux gained FLAC audio support
  • D3D11, D3D12: reliablity improvements and memory leak fixes
  • Media Foundation device provider fixes
  • GTK4 paintable sink improvements including support for directly importing dmabufs with GTK 4.14
  • WebRTC sink/source fixes and improvements
  • AWS s3sink, s3src, s3hlssink now support path-style addressing
  • MPEG-TS demuxer fixes
  • Python bindings fixes
  • various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.3 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GStreamer 1.22.12 old-stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • EXIF image tag parsing security fixes
  • glimagesink, gl/macos: race and reference count fixes
  • GstPlay, dvbsubenc, alphadecodebin, d3dvideosink fixes
  • rtpjitterbuffer extended timestamp handling fixes
  • v4l2: fix regression with tiled formats
  • ximagesink: fix regression on RPi/aarch64
  • Thread-safety fixes
  • Python bindings fixes
  • cerbero build fixes with clang 15 on latest macOS/iOS
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.12 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

PS: GStreamer 1.22 has now been superseded by GStreamer 1.24.

Update from the GNOME board

From Robotic Tendencies by Robert McQueen

Robert McQueen

It’s been around 6 months since the GNOME Foundation was joined by our new Executive Director, Holly Million, and the board and I wanted to update members on the Foundation’s current status and some exciting upcoming changes.

Finances

As you may be aware, the GNOME Foundation has operated at a deficit (nonprofit speak for a loss – ie spending more than we’ve been raising each year) for over three years, essentially running the Foundation on reserves from some substantial donations received 4-5 years ago. The Foundation has a reserves policy which specifies a minimum amount of money we have to keep in our accounts. This is so that if there is a significant interruption to our usual income, we can preserve our core operations while we work on new funding sources. We’ve now “hit the buffers” of this reserves policy, meaning the Board can’t approve any more deficit budgets – to keep spending at the same level we must increase our income.

One of the board’s top priorities in hiring Holly was therefore her experience in communications and fundraising, and building broader and more diverse support for our mission and work. Her goals since joining – as well as building her familiarity with the community and project – have been to set up better financial controls and reporting, develop a strategic plan, and start fundraising. You may have noticed the Foundation being more cautious with spending this year, because Holly prepared a break-even budget for the Board to approve in October, so that we can steady the ship while we prepare and launch our new fundraising initiatives.

Strategy & Fundraising

The biggest prerequisite for fundraising is a clear strategy – we need to explain what we’re doing and why it’s important, and use that to convince people to support our plans. I’m very pleased to report that Holly has been working hard on this and meeting with many stakeholders across the community, and has prepared a detailed and insightful five year strategic plan. The plan defines the areas where the Foundation will prioritise, develop and fund initiatives to support and grow the GNOME project and community. The board has approved a draft version of this plan, and over the coming weeks Holly and the Foundation team will be sharing this plan and running a consultation process to gather feedback input from GNOME foundation and community members.

In parallel, Holly has been working on a fundraising plan to stabilise the Foundation, growing our revenue and ability to deliver on these plans. We will be launching a variety of fundraising activities over the coming months, including a development fund for people to directly support GNOME development, working with professional grant writers and managers to apply for government and private foundation funding opportunities, and building better communications to explain the importance of our work to corporate and individual donors.

Board Development

Another observation that Holly had since joining was that we had, by general nonprofit standards, a very small board of just 7 directors. While we do have some committees which have (very much appreciated!) volunteers from outside the board, our officers are usually appointed from within the board, and many board members end up serving on multiple committees and wearing several hats. It also means the number of perspectives on the board is limited and less representative of the diverse contributors and users that make up the GNOME community.

Holly has been working with the board and the governance committee to reduce how much we ask from individual board members, and improve representation from the community within the Foundation’s governance. Firstly, the board has decided to increase its size from 7 to 9 members, effective from the upcoming elections this May & June, allowing more voices to be heard within the board discussions. After that, we’re going to be working on opening up the board to more participants, creating non-voting officer seats to represent certain regions or interests from across the community, and take part in committees and board meetings. These new non-voting roles are likely to be appointed with some kind of application process, and we’ll share details about these roles and how to be considered for them as we refine our plans over the coming year.

Elections

We’re really excited to develop and share these plans and increase the ways that people can get involved in shaping the Foundation’s strategy and how we raise and spend money to support and grow the GNOME community. This brings me to my final point, which is that we’re in the run up to the annual board elections which take place in the run up to GUADEC. Because of the expansion of the board, and four directors coming to the end of their terms, we’ll be electing 6 seats this election. It’s really important to Holly and the board that we use this opportunity to bring some new voices to the table, leading by example in growing and better representing our community.

Allan wrote in the past about what the board does and what’s expected from directors. As you can see we’re working hard on reducing what we ask from each individual board member by increasing the number of directors, and bringing additional members in to committees and non-voting roles. If you’re interested in seeing more diverse backgrounds and perspectives represented on the board, I would strongly encourage you consider standing for election and reach out to a board member to discuss their experience.

Thanks for reading! Until next time.

Best Wishes,
Rob
President, GNOME Foundation

Update 2024-04-27: It was suggested in the Discourse thread that I clarify the interaction between the break-even budget and the 1M EUR committed by the STF project. This money is received in the form of a contract for services rather than a grant to the Foundation, and must be spent on the development areas agreed during the planning and application process. It’s included within this year’s budget (October 23 – September 24) and is all expected to be spent during this fiscal year, so it doesn’t have an impact on the Foundation’s reserves position. The Foundation retains a small % fee to support its costs in connection with the project, including the new requirement to have our accounts externally audited at the end of the financial year. We are putting this money towards recruitment of an administrative assistant to improve financial and other operational support for the Foundation and community, including the STF project and future development initiatives.

(also posted to GNOME Discourse, please head there if you have any questions or comments)

From WebKit/GStreamer to rust-av, a journey on our stack’s layers

From Base-Art - Philippe Normand by Phil Normand

Phil Normand

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on WebKit side would trigger work …

GStreamer 1.24.2 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • H.264 parsing regression fixes
  • WavPack typefinding improvements
  • Video4linux fixes and improvements
  • Android build and runtime fixes
  • macOS OpenGL memory leak and robustness fixes
  • Qt/QML video sink fixes
  • Windows MSVC binary packages: fix libvpx avx/avx2/avx512 instruction set detection
  • Package new analytics and mse libraries in binary packages
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.2 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

xz backdoor and autotools insanity

From Planet – Felipe Contreras by Felipe Contreras

I argue autotools' convoluted nature is what enabled the xz backdoor in the first place. The truth is nobody needs to use autotools, and I show why.

Fedora Workstation 40 – what are we working on

From Christian F.K. Schaller by Christian Schaller

Christian Schaller
So Fedora Workstation 40 Beta has just come out so I thought I share a bit about some of the things we are working on for Fedora Workstation currently and also major changes coming in from the community.

Flatpak

Flatpaks has been a key part of our strategy for desktop applications for a while now and we are working on a multitude of things to make Flatpaks an even stronger technology going forward. Christian Hergert is working on figuring out how applications that require system daemons will work with Flatpaks, using his own Sysprof project as the proof of concept application. The general idea here is to rely on the work that has happened in SystemD around sysext/confext/portablectl trying to figure out who we can get a system service installed from a Flatpak and the necessary bits wired up properly. The other part of this work, figuring out how to give applications permissions that today is handled with udev rules, that is being worked on by Hubert Figuière based on earlier work by Georges Stavracas on behalf of the GNOME Foundation thanks to the sponsorship from the Sovereign Tech Fund. So hopefully we will get both of these two important issues resolved soon. Kalev Lember is working on polishing up the Flatpak support in Foreman (and Satellite) to ensure there are good tools for managing Flatpaks when you have a fleet of systems you manage, building on the work of Stephan Bergman. Finally Jan Horak and Jan Grulich is working hard on polishing up the experience of using Firefox from a fully sandboxed Flatpak. This work is mainly about working with the upstream community to get some needed portals over the finish line and polish up some UI issues in Firefox, like this one.

Toolbx

Toolbx, our project for handling developer containers, is picking up pace with Debarshi Ray currently working on getting full NVIDIA binary driver support for the containers. One of our main goals for Toolbx atm is making it a great tool for AI development and thus getting the NVIDIA & CUDA support squared of is critical. Debarshi has also spent quite a lot of time cleaning up the Toolbx website, providing easier access to and updating the documentation there. We are also moving to use the new Ptyxis (formerly Prompt) terminal application created by Christian Hergert, in Fedora Workstation 40. This both gives us a great GTK4 terminal, but we also believe we will be able to further integrate Toolbx and Ptyxis going forward, creating an even better user experience.

Nova

So as you probably know, we have been the core maintainers of the Nouveau project for years, keeping this open source upstream NVIDIA GPU driver alive. We plan on keep doing that, but the opportunities offered by the availability of the new GSP firmware for NVIDIA hardware means we should now be able to offer a full featured and performant driver. But co-hosting both the old and the new way of doing things in the same upstream kernel driver has turned out to be counter productive, so we are now looking to split the driver in two. For older pre-GSP NVIDIA hardware we will keep the old Nouveau driver around as is. For GSP based hardware we are launching a new driver called Nova. It is important to note here that Nova is thus not a competitor to Nouveau, but a continuation of it. The idea is that the new driver will be primarily written in Rust, based on work already done in the community, we are also evaluating if some of the existing Nouveau code should be copied into the new driver since we already spent quite a bit of time trying to integrate GSP there. Worst case scenario, if we can’t reuse code, we use the lessons learned from Nouveau with GSP to implement the support in Nova more quickly. Contributing to this effort from our team at Red Hat is Danilo Krummrich, Dave Airlie, Lyude Paul, Abdiel Janulgue and Phillip Stanner.

Explicit Sync and VRR

Another exciting development that has been a priority for us is explicit sync, which is critical for especially the NVidia driver, but which might also provide performance improvements for other GPU architectures going forward. So a big thank you to Michel Dänzer , Olivier Fourdan, Carlos Garnacho; and Nvidia folks, Simon Ser and the rest of community for working on this. This work has just finshed upstream so we will look at backporting it into Fedora Workstaton 40. Another major Fedora Workstation 40 feature is experimental support for Variable Refresh Rate or VRR in GNOME Shell. The feature was mostly developed by community member Dor Askayo, but Jonas Ådahl, Michel Dänzer, Carlos Garnacho and Sebastian Wick have all contributed with code reviews and fixes. In Fedora Workstation 40 you need to enable it using the command

gsettings set org.gnome.mutter experimental-features "['variable-refresh-rate']"

PipeWire

Already covered PipeWire in my post a week ago, but to quickly summarize here too. Using PipeWire for video handling is now finally getting to the stage where it is actually happening, both Firefox and OBS Studio now comes with PipeWire support and hopefully we can also get Chromium and Chrome to start taking a serious look at merging the patches for this soon. Whats more Wim spent time fixing Firewire FFADO bugs, so hopefully for our pro-audio community users this makes their Firewire equipment fully usable and performant with PipeWire. Wim did point out when I spoke to him though that the FFADO drivers had obviously never had any other consumer than JACK, so when he tried to allow for more functionality the drivers quickly broke down, so Wim has limited the featureset of the PipeWire FFADO module to be an exact match of how these drivers where being used by JACK. If the upstream kernel maintainer is able to fix the issues found by Wim then we could look at providing a more full feature set. In Fedora Workstation 40 the de-duplication support for v4l vs libcamera devices should work as soon as we update Wireplumber to the new 0.5 release.

To hear more about PipeWire and the latest developments be sure to check out this interview with Wim Taymans by the good folks over at Destination Linux.

Remote Desktop

Another major feature landing in Fedora Workstation 40 that Jonas Ådahl and Ray Strode has spent a lot of effort on is finalizing the remote desktop support for GNOME on Wayland. So there has been support for remote connections for already logged in sessions already, but with these updates you can do the login remotely too and thus the session do not need to be started already on the remote machine. This work will also enable 3rd party solutions to do remote logins on Wayland systems, so while I am not at liberty to mention names, be on the lookout for more 3rd party Wayland remoting software becoming available this year.

This work is also important to help Anaconda with its Wayland transition as remote graphical install is an important feature there. So what you should see there is Anaconda using GNOME Kiosk mode and the GNOME remote support to handle this going forward and thus enabling Wayland native Anaconda.

HDR

Another feature we been working on for a long time is HDR, or High Dynamic Range. We wanted to do it properly and also needed to work with a wide range of partners in the industry to make this happen. So over the last year we been contributing to improve various standards around color handling and acceleration to prepare the ground, work on and contribute to key libraries needed to for instance gather the needed information from GPUs and screens. Things are coming together now and Jonas Ådahl and Sebastian Wick are now going to focus on getting Mutter HDR capable, once that work is done we are by no means finished, but it should put us close to at least be able to start running some simple usecases (like some fullscreen applications) while we work out the finer points to get great support for running SDR and HDR applications side by side for instance.

PyTorch

We want to make Fedora Workstation a great place to do AI development and testing. First step in that effort is packaging up PyTorch and making sure it can have working hardware acceleration out of the box. Tom Rix has been leading that effort on our end and you will see the first fruits of that labor in Fedora Workstation 40 where PyTorch should work with GPU acceleration on AMD hardware (ROCm) out of the box. We hope and expect to be able to provide the same for NVIDIA and Intel graphics eventually too, but this is definitely a step by step effort.

GStreamer 1.24.1 stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.0.

Highlighted bugfixes:

  • Fix instant-EOS regression in audio sinks in some cases when volume is 0
  • rtspsrc: server compatibility improvements and ONVIF trick mode fixes
  • rtsp-server: fix issues if RTSP media was set to be both shared and reusable
  • (uri)decodebin3 and playbin3 fixes
  • adaptivdemux2/hlsdemux2: Fix issues with failure updating playlists
  • mpeg123 audio decoder fixes
  • v4l2codecs: DMA_DRM caps support for decoders
  • va: various AV1 / H.264 / H.265 video encoder fixes
  • vtdec: fix potential deadlock regression with ProRes playback
  • gst-libav: fixes for video decoder frame handling, interlaced mode detection
  • avenc_aac: support for 7.1 and 16 channel modes
  • glimagesink: Fix the sink not always respecting preferred size on macOS
  • gtk4paintablesink: Fix scaling of texture position
  • webrtc: Allow resolution and framerate changes, and many other improvements
  • webrtc: Add new LiveKit source element
  • Fix usability of binary packages on arm64 iOS
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.1 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GStreamer 1.22.11 old-stable bug fix release

From GStreamer News by GStreamer

GStreamer

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • Fix instant-EOS regression in audio sinks in some cases when volume is 0
  • rtspsrc: server compatibility improvements and ONVIF trick mode fixes
  • libsoup linking improvements on non-Linux platforms
  • va: improvements for intel i965 driver
  • wasapi2: fix choppy audio and respect ringbuffer buffer/latency time
  • rtsp-server file descriptor leak fix
  • uridecodebin3 fixes
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.11 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here: