Flutter Engine
VelloComputeSteps.h
/*
 * Copyright 2023 Google LLC
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

#ifndef skgpu_graphite_compute_VelloComputeSteps_DEFINED
#define skgpu_graphite_compute_VelloComputeSteps_DEFINED

#include "include/core/SkSpan.h"

#include <string_view>

namespace skgpu::graphite {

// This file defines ComputeSteps for all Vello compute stages and their permutations. The
// declaration of each ComputeStep subclass mirrors the name of the pipeline stage as defined in the
// shader metadata.
//
// The compute stages all operate over a shared set of buffer and image resources. The
// `kVelloSlot_*` constant definitions below each uniquely identify a shared resource that must be
// instantiated when assembling the ComputeSteps into a DispatchGroup.
//
// === Monoids and Prefix Sums ===
//
// Vello's GPU algorithms make repeated use of parallel prefix sum techniques. These occur
// frequently in path rasterization (e.g. winding number accumulation across a scanline can be
// thought of as per-pixel prefix sums) but Vello also uses them to calculate buffer offsets for
// associated entries across its variable length encoding streams.
//
// For instance, given a scene that contains Bézier paths, each path gets encoded as a transform,
// a sequence of path tags (verbs), and zero or more 2-D points associated with each
// tag. N paths will often map to N transforms, N + M tags, and N + M + L points (where N > 0, M >
// 0, L >= 0). These entries are stored in separate parallel transform, path tag, and path data
// streams. The correspondence between entries of these independent streams is implicit. To keep
// CPU encoding of these streams fast, the offsets into each buffer for a given "path object" are
// computed dynamically and in parallel on the GPU. Since the offsets for each object build
// additively on offsets that appear before it in the stream, parallel computation of
// offsets can be treated as a dynamic programming problem that maps well to parallel prefix sums
// where each object is a "monoid" (https://en.wikipedia.org/wiki/Monoid) that supports algebraic
// addition/subtraction over data encoded in the path tags themselves.
//
// Once computed, a monoid contains the offsets into the input (and sometimes output) buffers for a
// given object. The parallel prefix sums operation is defined as a monoidal reduce + pre-scan pair.
// (Prefix Sums and Their Applications, Blelloch, G., https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf)
//
// While these concepts are an implementation detail, they are core to the Vello algorithm and are
// reflected in the pipeline names and data slot definitions.
//
// === Full Pipeline ===
//
// The full Vello pipeline stages are as follows and should be dispatched in the following order:
//
// I. Build the path monoid stream:
//      If the input fits within the workgroup size:
//        pathtag_reduce, pathtag_scan_small
//      else
//        pathtag_reduce, pathtag_reduce2, pathtag_scan1, pathtag_scan_large
//
// II. Compute path bounding boxes, convert path segments into cubics:
//      bbox_clear, pathseg
//
// III. Process the draw object stream to build the draw monoids and inputs to the clip stage:
//      draw_reduce, draw_leaf
//
// IV. Compute the bounding boxes for the clip stack from the input stream, if the scene contains
// clips:
//      clip_reduce, clip_leaf
//
// V. Allocate tile and segment buffers for the individual bins and prepare for coarse
// rasterization:
//      binning, tile_alloc, path_coarse
//
// VI. Coarse rasterization:
//      backdrop_dyn, coarse
//
// VII. Fine rasterization:
//      fine
//
// TODO: Document the coverage mask pipeline once it has been re-implemented.

// ***
// Shared buffers that are accessed by various stages.
//
// The render configuration uniform buffer.
constexpr int kVelloSlot_ConfigUniform = 0;

// The scene encoding buffer.
constexpr int kVelloSlot_Scene = 1;

// ***
// Buffers used during the element processing stage. This stage converts the stream of variable
// length path tags, transforms, and brushes into a "path monoid" stream containing buffer offsets
// for the subsequent stages that associate the input streams with individual draw elements. This
// stage performs a parallel prefix sum (reduce + scan) which can be performed in two dispatches if
// the entire input can be processed by a single workgroup per dispatch. Otherwise, the algorithm
// requires two additional dispatches to continue the traversal (this is due to a lack of primitives
// to synchronize execution across workgroups in MSL and WGSL).
//
// Single pass variant pipelines: pathtag_reduce, pathtag_scan_small
// Multi-pass variant pipelines: pathtag_reduce, pathtag_reduce2, pathtag_scan1, pathtag_scan_large
constexpr int kVelloSlot_TagMonoid = 2;

// Single pass variant slots:

// Multi pass variant slots:

// ***
// The second part of element processing flattens path elements (moveTo, lineTo, quadTo, etc) into
// an unordered line soup buffer and computes their bounding boxes. This stage is where strokes get
// expanded to fills and stroke styles get applied. The output is an unordered "line soup" buffer
// and the tight device-space bounding box of each path.
//
// Pipelines: bbox_clear, flatten
constexpr int kVelloSlot_PathBBoxes = 6;
constexpr int kVelloSlot_Lines = 7;

// ***
// The next part prepares the draw object stream (entries in the per-tile command list aka PTCL)
// and additional metadata for the subsequent clipping and binning stages.
//
// Pipelines: draw_reduce, draw_leaf
constexpr int kVelloSlot_DrawMonoid = 9;
constexpr int kVelloSlot_InfoBinData = 10;
constexpr int kVelloSlot_ClipInput = 11;

// ***
// Clipping. The outputs of this stage are the finalized draw monoid and the clip bounding-boxes.
// Clipping involves evaluating the stack monoid: refer to the following paper for the meaning of
// these buffers: https://arxiv.org/pdf/2205.11659.pdf,
// https://en.wikipedia.org/wiki/Bicyclic_semigroup
//
// Pipelines: clip_reduce, clip_leaf
constexpr int kVelloSlot_ClipBicyclic = 12;
constexpr int kVelloSlot_ClipElement = 13;
constexpr int kVelloSlot_ClipBBoxes = 14;

// ***
// Buffers containing bump allocated data, the inputs and outputs to the binning, coarse raster, and
// per-tile segment assembly stages.
//
// Pipelines: binning, tile_alloc, path_count, backdrop, coarse, path_tiling
constexpr int kVelloSlot_DrawBBoxes = 15;
constexpr int kVelloSlot_BumpAlloc = 16;
constexpr int kVelloSlot_BinHeader = 17;

constexpr int kVelloSlot_Path = 18;
constexpr int kVelloSlot_Tile = 19;
constexpr int kVelloSlot_SegmentCounts = 20;
constexpr int kVelloSlot_Segments = 21;
constexpr int kVelloSlot_PTCL = 22;

// ***
// Texture resources used by the fine rasterization stage. The gradient image needs to get populated
// on the CPU with pre-computed gradient ramps. The image atlas is intended to hold pre-uploaded
// images that are composited into the scene.
//
// The output image contains the final render.
constexpr int kVelloSlot_OutputImage = 23;
constexpr int kVelloSlot_GradientImage = 24;
constexpr int kVelloSlot_ImageAtlas = 25;

// ***
// The indirect count buffer is used to issue an indirect dispatch of the path count and path tiling
// stages.
constexpr int kVelloSlot_IndirectCount = 26;

// ***
// The sample mask lookup table used in MSAA modes of the fine rasterization stage.
constexpr int kVelloSlot_MaskLUT = 27;

std::string_view VelloStageName(vello_cpp::ShaderStage);

template <vello_cpp::ShaderStage S>
class VelloStep : public ComputeStep {
public:
    ~VelloStep() override = default;

    }

protected:
                      resources,
                      Flags::kSupportsNativeShader) {}

private:
    // Helper that creates a SkSpan from a universal reference to a container. Generally, creating a
    // SkSpan from an rvalue reference is not safe since the pointer stored in the SkSpan will
    // dangle beyond the constructor expression. In our usage in the constructor above,
    // the lifetime of the temporary TArray should match that of the SkSpan, both of which should
    // live through the constructor call expression.
    //
    // From https://en.cppreference.com/w/cpp/language/reference_initialization#Lifetime_of_a_temporary:
    //
    //     a temporary bound to a reference parameter in a function call exists until the end of the
    //     full expression containing that function call
    //
    template <typename T, typename C>
    static SkSpan<const T> AsSpan(C&& container) {
        return SkSpan(std::data(container), std::size(container));
    }
};

#define VELLO_COMPUTE_STEP(stage)                                                      \
    class Vello##stage##Step final : public VelloStep<vello_cpp::ShaderStage::stage> { \
    public:                                                                            \
        Vello##stage##Step();                                                          \
    };

VELLO_COMPUTE_STEP(PathCountSetup);
VELLO_COMPUTE_STEP(PathTilingSetup);
VELLO_COMPUTE_STEP(PathtagReduce);
VELLO_COMPUTE_STEP(PathtagReduce2);
VELLO_COMPUTE_STEP(PathtagScan1);
VELLO_COMPUTE_STEP(PathtagScanLarge);
VELLO_COMPUTE_STEP(PathtagScanSmall);

#undef VELLO_COMPUTE_STEP

template <vello_cpp::ShaderStage S, SkColorType T> class VelloFineStepBase : public VelloStep<S> {
public:
    // We need to return a texture format for the bound textures.
    std::tuple<SkISize, SkColorType> calculateTextureParameters(
            int index, const ComputeStep::ResourceDesc&) const override {
        SkASSERT(index == 4);
        // TODO: The texture dimensions are unknown here so this method returns 0 for the texture
        // size. In this case this field is unused since VelloRenderer assigns texture resources
        // directly to the DispatchGroupBuilder. The format must still be queried to describe the
        // ComputeStep's binding layout. This method could be improved to enable conditional
        // querying of optional/dynamic parameters.
        return {{}, T};
    }

protected:
    VelloFineStepBase(SkSpan<const ComputeStep::ResourceDesc> resources)
            : VelloStep<S>(resources) {}
};

template <vello_cpp::ShaderStage S, SkColorType T, ::rust::Vec<uint8_t> (*MaskLutBuilder)()>
class VelloFineMsaaStepBase : public VelloFineStepBase<S, T> {
public:
    size_t calculateBufferSize(int resourceIndex, const ComputeStep::ResourceDesc&) const override {
        SkASSERT(resourceIndex == 5);
        return fMaskLut.size();
    }

    void prepareStorageBuffer(int resourceIndex,
                              const ComputeStep::ResourceDesc&,
                              void* buffer,
                              size_t bufferSize) const override {
        SkASSERT(resourceIndex == 5);
        SkASSERT(fMaskLut.size() == bufferSize);
        memcpy(buffer, fMaskLut.data(), fMaskLut.size());
    }

protected:
    VelloFineMsaaStepBase(SkSpan<const ComputeStep::ResourceDesc> resources)
            : VelloFineStepBase<S, T>(resources), fMaskLut(MaskLutBuilder()) {}

private:
    ::rust::Vec<uint8_t> fMaskLut;
};

        : public VelloFineStepBase<vello_cpp::ShaderStage::FineArea, kRGBA_8888_SkColorType> {
public:
};

        : public VelloFineStepBase<vello_cpp::ShaderStage::FineAreaR8, kAlpha_8_SkColorType> {
public:
};

class VelloFineMsaa16Step final : public VelloFineMsaaStepBase<vello_cpp::ShaderStage::FineMsaa16,
                                                               kRGBA_8888_SkColorType,
                                                               vello_cpp::build_mask_lut_16> {
public:
};

        : public VelloFineMsaaStepBase<vello_cpp::ShaderStage::FineMsaa16R8,
                                       kAlpha_8_SkColorType,
                                       vello_cpp::build_mask_lut_16> {
public:
};

class VelloFineMsaa8Step final : public VelloFineMsaaStepBase<vello_cpp::ShaderStage::FineMsaa8,
                                                              kRGBA_8888_SkColorType,
                                                              vello_cpp::build_mask_lut_8> {
public:
};

        : public VelloFineMsaaStepBase<vello_cpp::ShaderStage::FineMsaa8R8,
                                       kAlpha_8_SkColorType,
                                       vello_cpp::build_mask_lut_8> {
public:
};

}  // namespace skgpu::graphite

#endif  // skgpu_graphite_compute_VelloComputeSteps_DEFINED
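
The "Monoids and Prefix Sums" comment in the header describes the prefix sums operation as a monoidal reduce + pre-scan pair. The following is a minimal CPU-side sketch of that idea, not the Vello shaders or any Skia API. `OffsetMonoid`, `combine`, `reduceChunks`, and `exclusiveScan` are hypothetical names introduced only for illustration, and `wgSize` stands in for the workgroup size (assumed to be greater than zero). Each chunk plays the role of one workgroup, and the loops that run sequentially here would run in parallel on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical monoid: how many entries an object consumes from two streams.
// A default-constructed value (all zeros) is the identity element.
struct OffsetMonoid {
    uint32_t transformIx = 0;
    uint32_t pointIx = 0;
};

// Associative combine operation: component-wise addition.
inline OffsetMonoid combine(const OffsetMonoid& a, const OffsetMonoid& b) {
    OffsetMonoid r;
    r.transformIx = a.transformIx + b.transformIx;
    r.pointIx = a.pointIx + b.pointIx;
    return r;
}

// Reduce pass: produce one combined total per workgroup-sized chunk.
inline std::vector<OffsetMonoid> reduceChunks(const std::vector<OffsetMonoid>& in,
                                              size_t wgSize) {
    std::vector<OffsetMonoid> totals;
    for (size_t base = 0; base < in.size(); base += wgSize) {
        const size_t end = std::min(in.size(), base + wgSize);
        OffsetMonoid sum;
        for (size_t i = base; i < end; ++i) {
            sum = combine(sum, in[i]);
        }
        totals.push_back(sum);
    }
    return totals;
}

// Pre-scan (exclusive scan) pass: out[i] combines every element before i.
// Each chunk starts from the combined totals of all chunks that precede it.
inline std::vector<OffsetMonoid> exclusiveScan(const std::vector<OffsetMonoid>& in,
                                               size_t wgSize) {
    const std::vector<OffsetMonoid> totals = reduceChunks(in, wgSize);
    std::vector<OffsetMonoid> out(in.size());
    OffsetMonoid chunkPrefix;
    for (size_t c = 0; c < totals.size(); ++c) {
        const size_t base = c * wgSize;
        const size_t end = std::min(in.size(), base + wgSize);
        OffsetMonoid running = chunkPrefix;
        for (size_t i = base; i < end; ++i) {
            out[i] = running;
            running = combine(running, in[i]);
        }
        chunkPrefix = combine(chunkPrefix, totals[c]);
    }
    return out;
}
```

Calling `exclusiveScan(objects, wgSize)` yields, for each object, the offsets at which its entries begin in each stream. When the whole input fits in a single chunk, the reduce pass yields one total and the scan needs no cross-chunk prefix, which loosely parallels the distinction the header draws between the single pass and multi-pass pathtag variants.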