UniformManager.h
/*
 * Copyright 2021 Google LLC
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

#ifndef skgpu_UniformManager_DEFINED
#define skgpu_UniformManager_DEFINED

#include "include/core/SkM44.h"
#include "include/core/SkRect.h"
#include "include/core/SkSize.h"
#include "include/core/SkSpan.h"
#include "src/base/SkHalf.h"
#include "src/base/SkMathPriv.h"

#include <algorithm>
#include <memory>

namespace skgpu::graphite {

class UniformDataBlock;

/**
 * Layout::kStd140
 * ===============
 *
 * From OpenGL Specification Section 7.6.2.2 "Standard Uniform Block Layout":
 *  1. If the member is a scalar consuming N basic machine units, the base alignment is N.
 *  2. If the member is a two- or four-component vector with components consuming N basic machine
 *     units, the base alignment is 2N or 4N, respectively.
 *  3. If the member is a three-component vector with components consuming N
 *     basic machine units, the base alignment is 4N.
 *  4. If the member is an array of scalars or vectors, the base alignment and array
 *     stride are set to match the base alignment of a single array element, according
 *     to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The
 *     array may have padding at the end; the base offset of the member following
 *     the array is rounded up to the next multiple of the base alignment.
 *  5. If the member is a column-major matrix with C columns and R rows, the
 *     matrix is stored identically to an array of C column vectors with R components each,
 *     according to rule (4).
 *  6. If the member is an array of S column-major matrices with C columns and
 *     R rows, the matrix is stored identically to a row of S × C column vectors
 *     with R components each, according to rule (4).
 *  7. If the member is a row-major matrix with C columns and R rows, the matrix
 *     is stored identically to an array of R row vectors with C components each,
 *     according to rule (4).
 *  8. If the member is an array of S row-major matrices with C columns and R
 *     rows, the matrix is stored identically to a row of S × R row vectors with C
 *     components each, according to rule (4).
 *  9. If the member is a structure, the base alignment of the structure is N, where
 *     N is the largest base alignment value of any of its members, and rounded
 *     up to the base alignment of a vec4. The individual members of this substructure are then
 *     assigned offsets by applying this set of rules recursively,
 *     where the base offset of the first member of the sub-structure is equal to the
 *     aligned offset of the structure. The structure may have padding at the end;
 *     the base offset of the member following the sub-structure is rounded up to
 *     the next multiple of the base alignment of the structure.
 * 10. If the member is an array of S structures, the S elements of the array are laid
 *     out in order, according to rule (9).
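 *
 * As an illustrative (non-normative) example of rules (1)-(4): for a std140 block declared as
 * { float a; float3 b; float c; float d[2]; }, 'a' is placed at offset 0 (align 4), 'b' at offset
 * 16 (align 16, consuming 12 bytes), 'c' packs into the trailing 4 bytes at offset 28, and 'd'
 * starts at offset 32 with its array stride rounded up to 16, so the member list ends at byte 64.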
 *
 * Layout::kStd430
 * ===============
 *
 * When using the std430 storage layout, shader storage blocks will be laid out in buffer storage
 * identically to uniform and shader storage blocks using the std140 layout, except that the base
 * alignment and stride of arrays of scalars and vectors in rule 4 and of structures in rule 9 are
 * not rounded up to a multiple of the base alignment of a vec4.
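 * For example, a float[4] member has an array stride of 16 under std140 (64 bytes total) but a
 * stride of 4 under std430 (16 bytes total); a float3[4] member keeps a 16-byte stride in both,
 * since a vec3's base alignment is already 4N.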
 *
 * NOTE: While not explicitly stated, the layout rules for WebGPU and WGSL are identical to std430
 * for SSBOs and nearly identical to std140 for UBOs. The default mat2x2 type is treated as two
 * float2's (not an array), so its size is 16 and alignment is 8 (vs. a size of 32 and alignment of
 * 16 in std140). When emitting WGSL from SkSL, prepareUniformPolyfillsForInterfaceBlock(), defined
 * in WGSLCodeGenerator, will modify the type declaration to match std140 exactly. This allows the
 * UniformManager and UniformOffsetCalculator to avoid having WebGPU-specific layout rules
 * (whereas SkSL::MemoryLayout has more complete rules).
 *
 * Layout::kMetal
 * ===============
 *
 * SkSL converts its types to the non-packed SIMD vector types in MSL. The size and alignment rules
 * are equivalent to std430 with the exception of half3 and float3. In std430, the size consumed
 * by non-array uniforms of these types is 3N while Metal consumes 4N (which is equal to the
 * alignment of a vec3 in both Layouts).
 *
 * Half vs. Float Uniforms
 * =======================
 *
 * Regardless of the precision when the shader is executed, std140 and std430 layouts consume
 * "half"-based uniforms in full 32-bit precision. Metal consumes "half"-based uniforms expecting
 * them to have already been converted to f16. WebGPU has an extension to support f16 types that
 * behaves the same way, but we do not currently utilize it.
 *
 * The rules for std430 can be easily extended to f16 by applying N = 2 instead of N = 4 for the
 * base primitive alignment.
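 * For example, a lone half4 uniform then occupies 16 bytes under std140/std430 (N = 4) but only
 * 8 bytes under Metal (N = 2).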
 *
 * NOTE: This could also apply to the int vs. short or uint vs. ushort types, but these smaller
 * integer types are not supported on all platforms as uniforms. We disallow short integer uniforms
 * entirely, and if the data savings are required, packing should be implemented manually.
 * Short integer vertex attributes are supported when the vector type lets them pack into 32 bits
 * (e.g. int16x2 or int8x4).
 *
 *
 * Generalized Layout Rules
 * ========================
 *
 * From the Layout descriptions above, the following simpler rules are sufficient:
 *
 * 1. If the base primitive type is "half" and the Layout expects half floats, N = 2; else, N = 4.
 *
 * 2. For arrays of scalars or vectors (with # of components, M = 1,2,3,4):
 *    a. If arrays must be aligned on vec4 boundaries OR M=3, then align and stride = 4*N.
 *    b. Otherwise, the align and stride = M*N.
 *
 *    In both cases, the total size required for the uniform is "array size"*stride.
 *
 * 3. For single scalars or vectors (M = 1,2,3,4), the align is SkNextPow2(M)*N (e.g. N,2N,4N,4N).
 *    a. If M = 3 and the Layout aligns the size with the alignment, the size is 4*N and N
 *       padding bytes must be zero'ed out afterwards.
 *    b. Otherwise, the align and size = M*N.
 *
 * 4. The starting offset to write data is the current offset aligned to the calculated align value.
 *    The current offset is then incremented by the total size of the uniform.
 *
 * For arrays and padded vec3's, the padding is included in the stride and total size, meeting
 * the requirements of the original rule 4 in std140. When a single float3 that is not padded
 * is written, the next offset only advances 12 bytes, allowing a smaller type to pack tightly
 * next to the Z coordinate.
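 *
 * As a worked example of rules 3 and 4: writing a float3 followed by a float under std140 or
 * std430 places the float3 at offset 0 (align 16, size 12) and the float at offset 12; under
 * Metal the float3's size is padded to 16, so the float lands at offset 16 instead.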
 *
 * When N = 4, the CPU and GPU primitives are compatible, regardless of being float, int, or uint.
 * Contiguous ranges between any padding (for alignment or for array stride) can be memcpy'ed.
 * When N = 2, the CPU data is float and the GPU data is f16, so values must be converted one
 * primitive at a time using SkFloatToHalf or skvx::to_half.
 *
 * The UniformManager will zero out any padding bytes (either prepended for starting alignment,
 * or appended for stride alignment). This is so that the final byte array can be hashed for
 * uniform value de-duplication before uploading to the GPU.
 *
 * While SkSL supports non-square matrices, the SkSLType enum and Graphite only expose support for
 * square matrices. Graphite assumes all matrix uniforms are in column-major order. This matches
 * the data layout of SkM44 already, and UniformManager automatically transposes SkMatrix (which
 * stores its data in row-major order) to be column-major. Thus, for layout purposes, a matrix or
 * an array of matrices can be laid out equivalently to an array of the column type with an array
 * count multiplied by the number of columns.
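 * For example, an SkMatrix (3x3) is written as a float3[3] with a 16-byte stride in every Layout
 * (48 bytes total), and an SkM44 is written as a float4[4] (64 bytes total).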
 *
 * Graphite does not embed structs within structs for its UBO or SSBO declarations for paint or
 * RenderSteps. However, when the "uniforms" are defined for use with SSBO random access, the
 * ordered set of uniforms is actually defining a struct instead of just a top-level interface
 * block. As such, once all uniforms are recorded, the size must be rounded up to the maximum
 * alignment encountered for its members to satisfy alignment rules for all Layouts.
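 * For example, if the recorded uniforms end at byte offset 20 but the largest member alignment
 * encountered was 16, the struct's effective size must be rounded up to 32 bytes.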
 *
 * If Graphite starts to define sub-structs, UniformOffsetCalculator can be used recursively.
 */
namespace LayoutRules {
    // The three diverging behaviors across the different Layouts:
    static constexpr bool PadVec3Size(Layout layout) { return layout == Layout::kMetal; }
    static constexpr bool AlignArraysAsVec4(Layout layout) { return layout == Layout::kStd140; }
    static constexpr bool UseFullPrecision(Layout layout) { return layout != Layout::kMetal; }
}

class UniformOffsetCalculator {
public:
    UniformOffsetCalculator(Layout layout, int offset) : fLayout(layout), fOffset(offset) {}

    // NOTE: The returned size represents the offset just past the last consumed byte (if the
    // recorded uniforms are embedded within a struct, this will need to be rounded up to a
    // multiple of requiredAlignment()).
    int size() const { return fOffset; }
    int requiredAlignment() const { return fReqAlignment; }

    // Calculates the correctly aligned offset to accommodate `count` instances of `type` and
    // advances the internal offset. Returns the correctly aligned start offset.
    //
    // After a call to this method, `size()` will return the offset to the end of `count` instances
    // of `type` (while the return value equals the aligned start offset). Subsequent calls will
    // calculate the new start offset starting at `size()`.
    int advanceOffset(SkSLType type, int count = Uniform::kNonArray);

private:
    Layout fLayout       = Layout::kInvalid;
    int    fOffset       = 0;
    int    fReqAlignment = 0;
};
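
// Usage sketch (illustrative only; the offsets shown assume the std140 rules documented above):
//
//   UniformOffsetCalculator calc(Layout::kStd140, /*offset=*/0);
//   int a = calc.advanceOffset(SkSLType::kFloat2);              // a == 0,  size() == 8
//   int b = calc.advanceOffset(SkSLType::kFloat3);              // b == 16, size() == 28
//   int c = calc.advanceOffset(SkSLType::kFloat, /*count=*/4);  // c == 32, size() == 96
//   // calc.requiredAlignment() == 16 (96 is already a multiple of 16)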

class UniformManager {
public:
    UniformManager(Layout layout) { this->resetWithNewLayout(layout); }

    UniformDataBlock finishUniformDataBlock();
    size_t size() const { return fStorage.size(); }

    void resetWithNewLayout(Layout layout);
    void reset() { this->resetWithNewLayout(fLayout); }

    // scalars
    void write(float f)     { this->write<SkSLType::kFloat>(&f); }
    void write(int32_t i)   { this->write<SkSLType::kInt  >(&i); }
    void writeHalf(float f) { this->write<SkSLType::kHalf >(&f); }

    // [i|h]vec4 and arrays thereof (just add overloads as needed)
    void write(const SkPMColor4f& c) { this->write<SkSLType::kFloat4>(c.vec()); }
    void write(const SkRect& r)      { this->write<SkSLType::kFloat4>(r.asScalars()); }
    void write(const SkV4& v)        { this->write<SkSLType::kFloat4>(v.ptr()); }

    void write(const SkIRect& r)     { this->write<SkSLType::kInt4>(&r); }

    void writeHalf(const SkPMColor4f& c) { this->write<SkSLType::kHalf4>(c.vec()); }
    void writeHalf(const SkRect& r)      { this->write<SkSLType::kHalf4>(r.asScalars()); }
    void writeHalf(const SkV4& v)        { this->write<SkSLType::kHalf4>(v.ptr()); }

    void writeArray(SkSpan<const SkV4> v) {
        this->writeArray<SkSLType::kFloat4>(v.data(), v.size());
    }
    void writeArray(SkSpan<const SkPMColor4f> c) {
        this->writeArray<SkSLType::kFloat4>(c.data(), c.size());
    }
    void writeHalfArray(SkSpan<const SkPMColor4f> c) {
        this->writeArray<SkSLType::kHalf4>(c.data(), c.size());
    }

    // [i|h]vec3
    void write(const SkV3& v)     { this->write<SkSLType::kFloat3>(v.ptr()); }
    void write(const SkPoint3& p) { this->write<SkSLType::kFloat3>(&p); }

    void writeHalf(const SkV3& v)     { this->write<SkSLType::kHalf3>(v.ptr()); }
    void writeHalf(const SkPoint3& p) { this->write<SkSLType::kHalf3>(&p); }

    // NOTE: 3-element vectors never pack efficiently in arrays, so avoid using them

    // [i|h]vec2
    void write(const SkV2& v)    { this->write<SkSLType::kFloat2>(v.ptr()); }
    void write(const SkSize& s)  { this->write<SkSLType::kFloat2>(&s); }
    void write(const SkPoint& p) { this->write<SkSLType::kFloat2>(&p); }

    void write(const SkISize& s) { this->write<SkSLType::kInt2>(&s); }

    void writeHalf(const SkV2& v)    { this->write<SkSLType::kHalf2>(v.ptr()); }
    void writeHalf(const SkSize& s)  { this->write<SkSLType::kHalf2>(&s); }
    void writeHalf(const SkPoint& p) { this->write<SkSLType::kHalf2>(&p); }

    // NOTE: 2-element vectors don't pack efficiently in std140, so avoid using them

    // matrices
    void write(const SkM44& m) {
        // All Layouts treat a 4x4 column-major matrix as an array of vec4's, which is exactly how
        // SkM44 already stores its data.
        this->writeArray<SkSLType::kFloat4>(SkMatrixPriv::M44ColMajor(m), 4);
    }

    void writeHalf(const SkM44& m) {
        this->writeArray<SkSLType::kHalf4>(SkMatrixPriv::M44ColMajor(m), 4);
    }

    void write(const SkMatrix& m) {
        // SkMatrix is row-major, so rewrite to column-major. All Layouts treat a 3x3 column-major
        // matrix as an array of vec3's.
        float colMajor[9] = {m[0], m[3], m[6],
                             m[1], m[4], m[7],
                             m[2], m[5], m[8]};
        this->writeArray<SkSLType::kFloat3>(colMajor, 3);
    }
    void writeHalf(const SkMatrix& m) {
        float colMajor[9] = {m[0], m[3], m[6],
                             m[1], m[4], m[7],
                             m[2], m[5], m[8]};
        this->writeArray<SkSLType::kHalf3>(colMajor, 3);
    }

    // NOTE: 2x2 matrices can be manually packed the same or better as a vec4, so prefer that

    // This is a specialized uniform writing entry point intended to deduplicate the paint
    // color. If a more general system is required, the deduping logic can be added to the
    // other write methods (and this specialized method would be removed).
    void writePaintColor(const SkPMColor4f& color) {
        if (fWrotePaintColor) {
            // Validate expected uniforms, but don't write a second copy since the paint color
            // uniform can only ever be declared once in the final SkSL program.
            SkASSERT(this->checkExpected(/*dst=*/nullptr, SkSLType::kFloat4, Uniform::kNonArray));
        } else {
            this->write<SkSLType::kFloat4>(&color);
            fWrotePaintColor = true;
        }
    }

    // Copy from `src` using Uniform array-count semantics.
    void write(const Uniform&, const void* src);

    // Debug-only functions to control uniform expectations.
#ifdef SK_DEBUG
    bool isReset() const;
    void setExpectedUniforms(SkSpan<const Uniform> expected);
    void doneWithExpectedUniforms();
#endif // SK_DEBUG

private:
    // All public write() functions in UniformManager already match scalar/vector SkSLTypes or have
    // explicitly converted matrix SkSLTypes to a writeArray<column type>, so this does not need to
    // check anything beyond half[2,3,4].
    static constexpr bool IsHalfVector(SkSLType type) {
        return type == SkSLType::kHalf  || type == SkSLType::kHalf2 ||
               type == SkSLType::kHalf3 || type == SkSLType::kHalf4;
    }

    // Other than validation, actual layout doesn't care about 'type' and the logic can be
    // based on vector length and whether or not it's half or full precision.
    template <int N, bool Half> void write(const void* src, SkSLType type);
    template <int N, bool Half> void writeArray(const void* src, int count, SkSLType type);

    // Helpers to select dimensionality and convert to full precision if required by the Layout.
    template <SkSLType Type> void write(const void* src) {
        static constexpr int N = SkSLTypeVecLength(Type);
        if (IsHalfVector(Type) && !LayoutRules::UseFullPrecision(fLayout)) {
            this->write<N, /*Half=*/true>(src, Type);
        } else {
            this->write<N, /*Half=*/false>(src, Type);
        }
    }
    template <SkSLType Type> void writeArray(const void* src, int count) {
        static constexpr int N = SkSLTypeVecLength(Type);
        if (IsHalfVector(Type) && !LayoutRules::UseFullPrecision(fLayout)) {
            this->writeArray<N, /*Half=*/true>(src, count, Type);
        } else {
            this->writeArray<N, /*Half=*/false>(src, count, Type);
        }
    }

    // This is marked 'inline' so that it can be defined below with write() and writeArray() and
    // still link correctly.
    inline char* append(int alignment, int size);

    SkTDArray<char> fStorage;

    Layout fLayout;
    int    fReqAlignment = 0;
    // The paint color is treated specially and we only add its uniform once.
    bool   fWrotePaintColor = false;

    // Debug-only verification that UniformOffsetCalculator is consistent and that write() calls
    // match the expected uniform declaration order.
#ifdef SK_DEBUG
    UniformOffsetCalculator fOffsetCalculator; // should match implicit offsets from getWriteDst()
    SkSpan<const Uniform>   fExpectedUniforms;
    int                     fExpectedUniformIndex = 0;

    bool checkExpected(const void* dst, SkSLType, int count);
#endif // SK_DEBUG
};
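
// Usage sketch (illustrative only; `uniforms` is a hypothetical SkSpan<const Uniform> matching the
// declaration order expected by the shader):
//
//   UniformManager mgr(Layout::kStd140);
//   SkDEBUGCODE(mgr.setExpectedUniforms(uniforms);)
//   mgr.write(SkV4{0.f, 0.f, 100.f, 100.f});   // float4 uniform
//   mgr.writeHalf(0.5f);                       // half uniform, stored at full precision in std140
//   SkDEBUGCODE(mgr.doneWithExpectedUniforms();)
//   UniformDataBlock block = mgr.finishUniformDataBlock();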

///////////////////////////////////////////////////////////////////////////////////////////////////
// Definitions

// Shared helper for both write() and writeArray()
template <int N, bool Half>
struct LayoutTraits {
    static_assert(1 <= N && N <= 4);

    static constexpr int kElemSize = Half ? sizeof(SkHalf) : sizeof(float);
    static constexpr int kSize     = N * kElemSize;
    static constexpr int kAlign    = SkNextPow2_portable(N) * kElemSize;

    // Reads kSize bytes from 'src' and copies or converts (float->half) the N values
    // into 'dst'. Does not add any other padding that may depend on usage and Layout.
    static void Copy(const void* src, void* dst) {
        if constexpr (Half) {
            using VecF = skvx::Vec<SkNextPow2_portable(N), float>;
            VecF srcData;
            if constexpr (N == 3) {
                // Load the 3 values into a float4 to take advantage of vectorized conversion.
                // The 4th value will not be copied to dst.
                const float* srcF = static_cast<const float*>(src);
                srcData = VecF{srcF[0], srcF[1], srcF[2], 0.f};
            } else {
                srcData = VecF::Load(src);
            }

            auto dstData = to_half(srcData);
            // NOTE: this is identical to Vec::store() for N=1,2,4 and correctly drops the 4th
            // lane when N=3.
            memcpy(dst, &dstData, kSize);
        } else {
            memcpy(dst, src, kSize);
        }
    }

#ifdef SK_DEBUG
    static void Validate(const void* src, SkSLType type, Layout layout) {
        // Src validation
        SkASSERT(src);
        // All primitives on the CPU side should be 4-byte aligned
        SkASSERT(SkIsAlign4(reinterpret_cast<intptr_t>(src)));

        // Type and layout validation
        SkASSERT(SkSLTypeCanBeUniformValue(type));
        SkASSERT(SkSLTypeVecLength(type) == N); // Matrix types should have been flattened already
        if constexpr (Half) {
            SkASSERT(SkSLTypeIsFloatType(type));
            SkASSERT(!SkSLTypeIsFullPrecisionNumericType(type));
            SkASSERT(!LayoutRules::UseFullPrecision(layout));
        } else {
            SkASSERT(SkSLTypeIsFullPrecisionNumericType(type) ||
                     LayoutRules::UseFullPrecision(layout));
        }
    }
#endif
};

template<int N, bool Half>
void UniformManager::write(const void* src, SkSLType type) {
    using L = LayoutTraits<N, Half>;
    SkDEBUGCODE(L::Validate(src, type, fLayout);)

    // Layouts diverge in how vec3 size is determined for non-array usage
    char* dst = (N == 3 && LayoutRules::PadVec3Size(fLayout))
                        ? this->append(L::kAlign, L::kSize + L::kElemSize)
                        : this->append(L::kAlign, L::kSize);
    SkASSERT(this->checkExpected(dst, type, Uniform::kNonArray));

    L::Copy(src, dst);
    if (N == 3 && LayoutRules::PadVec3Size(fLayout)) {
        memset(dst + L::kSize, 0, L::kElemSize);
    }
}
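
// For example, with N == 3 and Half == false, a write under Layout::kMetal appends 16 bytes
// (12 copied plus 4 zeroed by the memset above), while std140/std430 append only 12 bytes and
// leave the next 4-byte slot available for a smaller uniform.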

template<int N, bool Half>
void UniformManager::writeArray(const void* src, int count, SkSLType type) {
    using L = LayoutTraits<N, Half>;
    static constexpr int kSrcStride = N * 4; // Source data is always in multiples of 4 bytes.

    SkDEBUGCODE(L::Validate(src, type, fLayout);)
    SkASSERT(count > 0);

    if (Half || N == 3 || (N != 4 && LayoutRules::AlignArraysAsVec4(fLayout))) {
        // A non-dense array (N == 3 is always padded to vec4, or the Layout requires it),
        // or we have to perform half conversion, so iterate over each element.
        static constexpr int kStride = Half ? L::kAlign : 4*L::kElemSize;
        SkASSERT(!(Half && LayoutRules::AlignArraysAsVec4(fLayout))); // should be exclusive

        const char* srcBytes = reinterpret_cast<const char*>(src);
        char* dst = this->append(kStride, kStride*count);
        SkASSERT(this->checkExpected(dst, type, count));

        for (int i = 0; i < count; ++i) {
            L::Copy(srcBytes, dst);
            if constexpr (kStride - L::kSize > 0) {
                memset(dst + L::kSize, 0, kStride - L::kSize);
            }

            dst += kStride;
            srcBytes += kSrcStride;
        }
    } else {
        // A dense array with no type conversion, so copy in one go.
        SkASSERT(L::kAlign == L::kSize && kSrcStride == L::kSize);
        char* dst = this->append(L::kAlign, L::kSize*count);
        SkASSERT(this->checkExpected(dst, type, count));

        memcpy(dst, src, L::kSize*count);
    }
}
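
// For example, a half4[8] under Layout::kMetal runs the conversion loop with kStride == 8
// (64 bytes total), a float2[8] under std140 runs the same loop with kStride == 16 and zeroes the
// upper 8 bytes of each element, and a float4[8] is a single 128-byte memcpy under any Layout.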

char* UniformManager::append(int alignment, int size) {
    SkASSERT(size > 0);

    const int offset = fStorage.size();
    const int padding = SkAlignTo(offset, alignment) - offset;

    // These are just asserts, not aborts, because SkSL compilation imposes limits on the size of
    // runtime effect arrays, and internal shaders should not be using excessive lengths.
    SkASSERT(std::numeric_limits<int>::max() - alignment >= offset);
    SkASSERT(std::numeric_limits<int>::max() - size >= padding);

    char* dst = fStorage.append(size + padding);
    if (padding > 0) {
        memset(dst, 0, padding);
        dst += padding;
    }

    fReqAlignment = std::max(fReqAlignment, alignment);
    return dst;
}
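
// For example, if fStorage currently holds 12 bytes, append(/*alignment=*/16, /*size=*/8) first
// adds 4 zeroed padding bytes and then returns a pointer to offset 16, leaving fStorage 24 bytes
// long.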

} // namespace skgpu::graphite

#endif // skgpu_UniformManager_DEFINED