Before v1.15.0: c=10, a=1, r=0
Rule #3: source code has changed, increment r:
r=1
Rule #4: interfaces were removed in vpx_tpl.h, set r=0, increment c:
c=11, r=0
Rule #5: no interfaces have been added
Rule #6: interfaces were removed in vpx_tpl.h, set a=0:
a=0
After release: c=11, a=0, r=0
major = c-a = 11
minor = a = 0
patch = r = 0
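The rule arithmetic above can be traced with a short sketch. The struct and function names are hypothetical, and only rules #3 to #6 as quoted in this message are modeled; the three flags describing what changed are inputs assumed for illustration.

```c
#include <stdbool.h>

/* Hypothetical container for the current:age:revision triple. */
typedef struct {
  int c; /* current */
  int a; /* age */
  int r; /* revision */
} AbiVersionSketch;

/* Applies rules #3..#6 in order, as they are quoted in this message. */
static AbiVersionSketch bump_abi_version(AbiVersionSketch v,
                                         bool source_changed,
                                         bool interfaces_added,
                                         bool interfaces_removed) {
  if (source_changed) v.r += 1; /* Rule #3 */
  if (interfaces_added || interfaces_removed) { /* Rule #4 */
    v.c += 1;
    v.r = 0;
  }
  if (interfaces_added) v.a += 1; /* Rule #5 */
  if (interfaces_removed) v.a = 0; /* Rule #6 */
  return v;
}

static int abi_major(AbiVersionSketch v) { return v.c - v.a; }
static int abi_minor(AbiVersionSketch v) { return v.a; }
static int abi_patch(AbiVersionSketch v) { return v.r; }
```

Starting from c=10, a=1, r=0 with a source change and removed interfaces, this reproduces c=11, a=0, r=0.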
Bug: webm:384672478
Change-Id: I2e70e7e35c64ece32eaf1dc5625640965483f9b9
Possible fix for issue below. It was only disabled
for screen in a previous change, but we force it off
always to check if it clears the issue.
The speed feature disabled is only used for 3 spatial
layers and at least 2 temporal. The impact on speed is
expected to be small, ~2%, so ok to disable for now and
see if it clears the issue.
Bug: 366146260
Change-Id: If7af006425e1e0ef297b9d6466507ea4c90ddb6f
(cherry picked from commit 09b3d5fc5aa48752f95f4c0c37b0bd4ff55c0ba1)
Integer overflow in encode_frame_to_data_rate()
for the update:
lc->total_target_vs_actual += bits_off_for_this_layer
Fix is to use int64_t for total_target_vs_actual.
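A minimal sketch of the widened accumulator, assuming a reduced layer context that holds only the field named above. The struct and function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the layer context. */
typedef struct {
  int64_t total_target_vs_actual; /* widened from int, per the fix */
} LayerCtxSketch;

/* Accumulates the per-frame difference in 64 bits, so repeated large
 * int values no longer overflow the running total. */
static void accumulate_bits_off(LayerCtxSketch *lc,
                                int bits_off_for_this_layer) {
  lc->total_target_vs_actual += bits_off_for_this_layer;
}
```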
Bug: chromium:368114043
Change-Id: I9a01e1a69e26ae748e8ae23d9e1287431510388d
Divide by 3 instead of multiplying by 3 in the comparison of
lrc->avg_frame_bandwidth vs lrc->last_avg_frame_bandwidth,
in the two functions that reset rc.
Small loss in precision, so acceptable.
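The idea can be sketched as follows. The function name and the 3/2 ratio are assumptions for illustration; the message only states that the multiplication by 3 was replaced with a division by 3 on the other side of the comparison. Both inputs are assumed to be non-negative.

```c
#include <stdbool.h>

/* Intended test: avg > (3 * last_avg) / 2.
 * 3 * last_avg can overflow an int when last_avg is near INT_MAX, so the
 * factor of 3 is moved to the left side as a division instead. */
static bool bandwidth_jumped_sketch(int avg, int last_avg) {
  return avg / 3 > last_avg / 2;
}
```

The integer division drops the remainder, so values just above the threshold can compare differently: with avg = 152 and last_avg = 100 the multiplied form is true (152 > 150) while the divided form is false (50 > 50).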
Similar change to:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5698570
Bug: chromium:367892770
Change-Id: Ia9ef09a9f6beba930fedd496407cfa7057e39336
PF_ARM_SVE_INSTRUCTIONS_AVAILABLE and PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
are available in WinSDK 10.0.26100 and recent versions of mingw-w64.
Based on a patch by Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333611.html
Change-Id: I34b2341a559f95aa400e84d709f3eb36da5dbb7b
There's no direct processor feature constant for I8MM alone, but
there is a flag for SVE-I8MM (added in WinSDK 10.0.26100 and
recent versions of mingw-w64). If SVE-I8MM is available, we can
assume that I8MM is available.
While HW supporting these features isn't yet commonly running
Windows, this at least allows detecting and running the I8MM codepaths
in Windows builds in Wine (possibly running in QEMU).
Based on patch from Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333609.html
Change-Id: I77117bee8516924fddcdecccae8bab3cf5beed96
The program requires a minimum of 2 parameters. Previously the tool
would crash if only one input file was given.
Bug: webm:365481206
Change-Id: I875d81b2db4fcc4338061c03b23bb51b0aad58e4
Possible fix for issue below. The speed feature disabled
is only used for 3 spatial layers and at least 2 temporal.
The impact on speed is expected to be small, ~2%, so ok
to disable for now and see if it clears the issue.
Bug: 366146260
Change-Id: I94ab991d583cc2ce758db337abbbb463a65f0767
The wrapped storage must exist for the duration of the vpx_image_t
allocation.
Bug: aomedia:363806063
Change-Id: Ic6b79a56b6c07776222d1767490d873d7408ced0
The default template for https://issues.webmproject.org/ is a public bug
report. Security issues can be reported securely using the 'Security
report' template.
Change-Id: Ic7144a6c7a144772b78852d1415a51a570c79d50
and examples/resize_util.c. These functions were added in:
3cd37dfeb Adds a non-normative resize library to vp9 encoder
but never used meaningfully in the library.
This mirrors the change in libaom:
d10029bb4b Restore function prototype of av1_resize_frame420
except that vp9_resize_frame420() was never exported in the shared
library, so can be deleted along with the rest.
The reasoning for removing examples/resize_util.c is the same: it is not
useful and examples should use the public functions of the libvpx
library.
Change-Id: I386080d3f1a3ef81dfc87fcdf5bbdf459d996f03
Added key frame temporal filtering. Enabled it for VOD encoding
with encoder speed < 2.
Minor improvement in prediction.
Added the restriction of using no more than "arnr_max_frames"
frames for temporal filtering.
Key frame temporal filtering is turned off by default for now. To
enable it, set "--enable-keyframe-filtering=1"
Borg result with "--enable-keyframe-filtering=1"
          avg_psnr  ovr_psnr  ssim    vmaf
hdres2:   -0.762    -0.863    -0.903  -0.680
midres2:  -0.813    -0.753    -0.757  -0.743
lowres2:  -0.492    -0.598    -0.737  -0.881
The impact on the encoder time is minimal.
Change-Id: If6abea3e21efcb96f1978cd9dfaa742c40dc2a56
`#if defined(__GNUC__)` is enough if a specific version isn't being
looked for.
Bug: aomedia:356832974
Change-Id: I3fcbecf9d547c6a2d89d7b5456e83ee08ddc6f5e
Applied 12-tap filter to temporal filter prediction for better
result. Improved the calculation of frames to be used in temporal
filtering.
The overall PSNR gain was -0.511% (lowres), -0.338% (midres), and
-0.288% (hdres).
Encoder time was increased by ~2%, which would be largely reduced
by the following SIMD optimization.
Change-Id: If3ece30f1614beadc99ebf6b4dc3f2d988d3bdb9
Move the saturate_cast_double_to_int() function in
vp8/encoder/firstpass.c to vpx_dsp/vpx_dsp_common.h so that it can be
used in other files.
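For readers unfamiliar with the helper, a saturating double-to-int cast can be sketched like this. This is not the libvpx implementation; the name is hypothetical, and the handling of the lower bound and of NaN are assumptions.

```c
#include <limits.h>

/* Converts d to int, clamping instead of invoking undefined behavior
 * when d is outside the range of int. */
static int saturate_double_to_int_sketch(double d) {
  if (d != d) return 0; /* NaN: assumption, map to 0 */
  if (d >= (double)INT_MAX) return INT_MAX;
  if (d <= (double)INT_MIN) return INT_MIN;
  return (int)d; /* in range: truncates toward zero */
}
```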
Change-Id: I748fea969520542dca68d7a46500d3272f22e16f
to INT_MAX. This matches calc_iframe_target_size() in VP8
(http://crbug.com/1473473). If rc->avg_frame_bandwidth is large even
small kf_boost values will overflow an int.
Change-Id: Iaca5b47fe97793ae70930b3b2c2f42725d2c96fb
This fixes a build error seen in gcc 15:
3b63004 mkvparser/mkvparser.cc: add missing <cstdint> include
Bug: aomedia:357622679
Change-Id: I6c4a1795d189f9993d4f2c5c9f0375912bc58f0c
Rely on the -I or -isystem compiler option to find "gtest/gtest.h". This
makes it easier to build our tests against a copy of gtest outside the
libvpx source tree.
Bug: webm:42330726
Change-Id: I3b189c6345e13b36b236d1eedc6ee091bfa71f48
Fixes a 'Result of operation is garbage or undefined' static analysis
report (seen with clang-14) related to left shifting a negative value.
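The message does not show the changed expression. One common way to avoid left shifting a negative value, shown here as a hypothetical helper, is to multiply by the power of two instead.

```c
#include <assert.h>

/* value << shift is undefined in C when value is negative.
 * Multiplying by (1 << shift) yields the same result for values whose
 * product fits in an int, without shifting a negative operand. */
static int scale_by_pow2_sketch(int value, int shift) {
  assert(shift >= 0 && shift < 31);
  return value * (1 << shift);
}
```

The caller is still responsible for keeping the product within the range of int.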
Bug: b:328632178
Change-Id: I18f0100eca0deac1cac9be0c7e848685d2911fb3
Motion vectors are now clamped in
vp8_find_best_sub_pixel_step_iteratively, vp8_find_best_sub_pixel_step,
vp8_find_best_half_pixel_step, vp8_full_search_sad,
vp8_refining_search_sadx4 and vp8_refining_search_sad_c (the rtcd for
other optimizations are redirects to vp8_refining_search_sadx4).
The difference of valid motion vectors may still go beyond the range of
the MVcount array, however, so additional checks are added to
rd_update_mvcount() and update_mvcount().
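The additional check can be sketched as a bounds test on the counter index. The constants and the index formula are assumptions chosen to match the 2047-entry array in the error report; they are not copied from rdopt.c.

```c
#define MV_MAX_SKETCH 1023
#define MV_VALS_SKETCH (2 * MV_MAX_SKETCH + 1) /* 2047 entries */

/* Counts one motion vector component difference.
 * Returns 1 if the counter was updated, 0 if the index was out of range
 * and the update was skipped. */
static int count_mv_component_sketch(unsigned int counts[MV_VALS_SKETCH],
                                     int mv, int ref_mv) {
  const int idx = MV_MAX_SKETCH + (mv - ref_mv) / 2;
  if (idx < 0 || idx >= MV_VALS_SKETCH) return 0;
  counts[idx]++;
  return 1;
}
```

With these assumed constants, a difference of 32888 maps to index 17467, the out-of-bounds index quoted in the report.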
Note the test source and settings (speed 1 and GOOD quality mode) come
from the issue report; additional coverage is added for realtime. The
realtime path does not trigger the error without the fix, but as it's
similar to the rd path, the same clamp is done to be safe.
Fixes:
vp8/encoder/rdopt.c:1579:5: runtime error: index 17467 out of bounds for
type 'unsigned int[2047]'
Bug: oss-fuzz:69906
Change-Id: Ia8bd087cfe4475ab09ba711ed806fbcbaa72e552
cpi->output_framerate may be as large as 10M. Previously this would
cause kf_boost to be ~20M which would overflow an int when multiplied by
values in kf_boost_qadjustment[].
Fixes:
vp8/encoder/ratectrl.c:340:25: runtime error: signed integer overflow:
19999984 * 220 cannot be represented in type 'int'
Bug: oss-fuzz:69100
Change-Id: I2d77c9d2912412f6265f6a8dc0e6b361b63b8242
The assignment "cpi->output_framerate = cpi->framerate;" after the
vp8_new_framerate() call is not needed, because vp8_new_framerate() sets
cpi->framerate and cpi->output_framerate to the same value.
Change-Id: I4de97b43957142d658e0c08ecfc6628844ce453a
+ fix an additional double -> int overflow warning (chrome's fuzzers do
not have the float-cast-overflow sanitizer enabled)
Bug: chromium:352414650
Change-Id: I634bb421a74236eac434df138ed71dadf197596a
The only real change is in the initialization of frame_window. The (int)
cast is moved to the result of VPXMIN(), so that
cpi->twopass.total_stats.count - cpi->common.current_video_frame is
calculated in double.
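A minimal sketch of why the position of the cast matters. The function and parameter names are hypothetical; the point is that the subtraction and the minimum are evaluated in double, and only the bounded result is converted to int.

```c
#define VPXMIN(x, y) (((x) < (y)) ? (x) : (y))

/* total_count is a double that may be far larger than INT_MAX.
 * Casting it to int before the subtraction would overflow; casting the
 * result of VPXMIN() is safe because that result is at most max_window. */
static int frame_window_sketch(double total_count, unsigned int current_frame,
                               int max_window) {
  return (int)VPXMIN((double)max_window,
                     total_count - (double)current_frame);
}
```

This assumes the difference is not below INT_MIN, which holds when total_count is at least current_frame.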
Change-Id: Ia80f24614af7184b37cfdd99d8a8b1639460f273
rc->avg_frame_bandwidth is capped at INT_MAX. Rather than multiply the
value by 3, divide projected_frame_size by 3 to avoid the overflow.
Without rounding this differs slightly from the original, but loss of
precision is acceptable in this case.
Bug: chromium:348440590
Change-Id: Id5960825c79d7c764d257e9b4bd0a1de751878d8
Replace the VERSION_STRING_NOSP macro by the public API function
vpx_codec_version_str().
Treat vpx_version.h as an absolutely internal header of the libvpx
library.
Change-Id: I86ba8548a62adae91ae7f5caad98169707f3fc64
This change happens in define_gf_group().
Since this part is not critical for ext_ratectrl,
turn off the error reporting for now.
Change-Id: Ie74aa06a116edb8c5d9e7b29cadbd366232fbc1d
The compare_fp_stats() and compare_fp_stats_md5() functions are not used
when CONFIG_REALTIME_ONLY is equal to 1. Define these functions only if
CONFIG_REALTIME_ONLY is 0 to avoid the -Wunused-function warnings.
Change-Id: Iaae208f67708cfaeee5304b0320ebce63c863f96
Allow the TPL group to use up to 3 reference frames from the
previous GOP. This slightly changes the coding stats in the range
of <0.1%.
STATS_CHANGED
Change-Id: Ieb4e948a783bf8ef9ca78717d56ff750f3f795a4
Fix double-to-int cast overflows in vp8 code caused by setting the
target bitrate to the maximum value (2000000).
Tested: Build libvpx with UndefinedBehaviorSanitizer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Note: This is essentially the VP8 version of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191361.
Bug: 349440066
Change-Id: Ia43e1aad8fcab60ace49da960579081c2c3a5445
Fix the following UBSan integer errors in test_decode():
vpxenc.c:1589:57: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
vpxenc.c:1590:58: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
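The message does not quote the offending expressions. One pattern that produces exactly this report is masking an unsigned value with ~15, because ~15 is an int equal to -16 that is then implicitly converted to unsigned. A hypothetical helper that avoids the conversion by using an unsigned mask:

```c
/* Rounds v up to the next multiple of 16.
 * ~15u is already unsigned (0xfffffff0 for a 32-bit unsigned int), so no
 * implicit int -> unsigned conversion of a negative value takes place. */
static unsigned int align16_sketch(unsigned int v) {
  return (v + 15u) & ~15u;
}
```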
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Bug: 349440066
Change-Id: Ice2f0e7176ffec664856559e2c02bd51113c4d74
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --min-gf-interval=4 \
--max-gf-interval=22 --arnr-maxframes=7 --arnr-strength=5 \
--kf-max-dist=9999 --aq-mode=0 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50
This unsigned integer overflow seems to be caused by
g_timebase.num=1000000.
Note: This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191401.
Bug: 349440066
Change-Id: I924fa9c653400764dd7320938b88b4ea40f38172
This patch fixes some additional cases where under extreme conditions
some of the VBR adjustment variables can wrap.
As this happens on a per frame level the extra saturation checks should
not be an issue for performance.
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190521
https://aomedia-review.googlesource.com/c/aom/+/190888
Change-Id: I87c4ecca10f39767002f7d90d0f43b19c7150832
Current code was disallowing scene detection for
speeds >= 8, to avoid any encode_time increase
(see comment in the code).
But we can expect the cost to be small even at speeds 8 and 9,
and that concern on encode_time was from some time ago
before 8 and 9 were further optimized. And this is
needed for content with scene changes (see issue attached).
So allow scene detection now for all RTC speed settings (speed >= 5).
Bug: b/346846607
Change-Id: I678dbb88ff1399ed89b2bf9770ae9427e3044fc4
The last reference to the flag in configure was removed in:
fad70a358 Remove -fno-strict-aliasing flag
The library should be expected to function without this flag; it's built
and tested elsewhere without it.
Bug: webm:570, webm:603
Change-Id: Icf85fd9bd5c9cb0c81d6eecf10fba07807f48b4a
The GNU Assembler was removed in r24. clang's internal assembler works,
but `-c` is necessary to avoid linking.
Bug: webm:1856
Change-Id: I61f80cf78657d3b71d5e73c5b2510575533ca5ea
Move the function into define_gf_group().
define_gf_group() has a lot of settings that might cause
performance drop if skipped.
Imitate define_gf_group_structure()'s behavior, which adds
an extra overlay frame at the end of gf_group whenever
alt_ref is used.
After this change, we can feed the baseline decision through
webmrc and get the same result as baseline.
This CL is tested with city_cif.yuv using ffmpeg
BUG = b/345528565
Change-Id: Ib61f0a0a72251f8662fb4072e0cfd7f456a243b3
Quiets some spurious -Wmaybe-uninitialized warnings with gcc 14.1.0.
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:464:5:
../tools/tiny_ssim.c:37:12: warning: 'v[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:462:5:
../tools/tiny_ssim.c:37:12: warning: 'u[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error',
inlined from 'main' at ../tools/tiny_ssim.c:461:5:
../tools/tiny_ssim.c:61:12: warning: 'y[0]' may be used uninitialized
[-Wmaybe-uninitialized]
61 | if (orig == NULL || recon == NULL) {
To reduce confusion, read_input_file() is changed to return an int as
previously it would only return (size_t)-1/0/1 (and now returns 0/1).
Change-Id: I2344048ecc2bd233891ffcef08002ee98d6d262a
The default behavior changed in:
148d1085f Refactor and extend run-time CPU feature detection on Arm
This fixes build errors with these targets as there is no runtime cpu
detection defined for them.
Change-Id: Ie6b0bae1fc3e244d7dfcc823f60c3e466ccade79
Both VP8 and VP9 internally cap the target bitrate to the smaller of the
uncompressed bitrate and 1000000 kilobits per second.
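The cap described above amounts to taking a minimum of three values. In this sketch the function name and the macro are hypothetical, and the uncompressed bitrate is passed in rather than derived, since the message does not say how the codecs compute it.

```c
#include <stdint.h>

#define MAX_TARGET_KBPS_SKETCH 1000000 /* 1000000 kilobits per second */

/* Returns the requested target bitrate, limited to the smaller of the
 * uncompressed bitrate and the fixed maximum. All values are in kbps. */
static int64_t cap_target_kbps_sketch(int64_t requested_kbps,
                                      int64_t uncompressed_kbps) {
  const int64_t cap = uncompressed_kbps < MAX_TARGET_KBPS_SKETCH
                          ? uncompressed_kbps
                          : MAX_TARGET_KBPS_SKETCH;
  return requested_kbps < cap ? requested_kbps : cap;
}
```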
Change-Id: I4008ce09b5e709e75111800341d015e41eb1da42
This change fixes issues that can occur if the user specifies a very
high target data rate or rate per frame.
It also fixes some issues with overflow of int variables used to hold
bitrate values (rate per second, rate per frame etc).
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190381
https://aomedia-review.googlesource.com/c/aom/+/190462
All the changes were ported to VP9. For VP8, only the new type of
cpi->bytes (equivalent to ppi->total_bytes in libaom) was ported.
Change-Id: I438dd46efd5a134389b893ffae1f8a2381207906
2024-05-21 v1.14.1 "Venetian Duck"
This release includes enhancements and bug fixes.
- Upgrading:
This release is ABI compatible with the previous release.
- Enhancement:
Improved the detection of compiler support for AArch64 extensions,
particularly SVE.
Added vpx_codec_get_global_headers() support for VP9.
- Bug fixes:
Added buffer bounds checks to vpx_writer and vpx_write_bit_buffer.
Fix to GetSegmentationData() crash in aq_mode=0 for RTC rate control.
Fix to alloc for row_base_thresh_freq_fac.
Free row mt memory before freeing cpi->tile_data.
Fix to buffer alloc for vp9_bitstream_worker_data.
Fix to VP8 race issue for multi-thread with pnsr_calc.
Fix to uv width/height in vp9_scale_and_extend_frame_ssse3.
Fix to integer division by zero and overflow in calc_pframe_target_size().
Fix to integer overflow in vpx_img_alloc() & vpx_img_wrap() (CVE-2024-5197).
Fix to UBSan error in vp9_rc_update_framerate().
Fix to UBSan errors in vp8_new_framerate().
Fix to integer overflow in vp8 encodeframe.c.
Handle EINTR from sem_wait().
Change-Id: Ic5e274fdc35c9141591a65e825bf012d2cca3caa
The integer overflow happens
in vp9_calc_iframe_target_size_one_pass_cbr(), when
calculating the target size for L1T3 encoding.
The input target bitrate(kbps) is very large, so it gets set
to INT_MAX (before being multiplied by 1000 to convert to bps),
and avg_frame_bandwidth is then set to (INT_MAX / lc->framerate),
which when multiplied by (16 + kf_boost) can exceed INT_MAX.
Fix is to cast the operands to int64_t and final result to int.
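A sketch of the 64-bit computation, under stated assumptions: the function name is hypothetical, the right shift by 4 (division by 16) is assumed rather than taken from the message, and the explicit clamp before the final cast is added here for safety. Inputs are assumed to be non-negative.

```c
#include <limits.h>
#include <stdint.h>

/* Computes ((16 + kf_boost) * avg_frame_bandwidth) / 16 in 64 bits so the
 * product cannot overflow, then narrows the result back to int. */
static int boosted_target_sketch(int avg_frame_bandwidth, int kf_boost) {
  const int64_t target =
      ((int64_t)(16 + kf_boost) * avg_frame_bandwidth) >> 4;
  return target > INT_MAX ? INT_MAX : (int)target;
}
```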
Bug: chromium:340918567
Change-Id: Ic00094b22c1f12ca988c0cb1fcaed473e1f8ed2b
In multi-threaded scenario, when the bitstream
buffer allocated is insufficient, the main thread
called 'longjmp' without waiting for the completion
of workers. In this patch, 'longjmp' is called by
the main thread after joining other worker threads.
This resolves the assertion failure as reported in
Bug: webm:1847
Bug: webm:1844
Change-Id: I399c76087b65e7b8d9a9fa4f12d784408243d648
Before proceeding with Encode(). This avoids some static analysis
warnings about uninitialized `cfg_` members.
Change-Id: Ib67b278d6706ab1034219e8c1ad9ba0c5b574ba8
In very rare cases (e.g. encoding with very high bit rate), the
allocated token memory isn't enough, which causes a buffer overflow
and then an encoder failure. This is fixed by using the aligned
number of blocks while allocating this buffer.
BUG=b/328803779
Change-Id: I5437cce13398206bf9982d57f35d6f9da17b187f
This is a port of the change in libaom:
https://aomedia-review.googlesource.com/c/aom/+/189761
5ccdc66ab6 cpu.cmake: Do more elaborate test of whether SVE can be compiled
For Windows targets, Clang will successfully compile simpler
SVE functions, but if the function requires backing up and restoring
SVE registers (as part of the AAPCS calling convention), Clang
will fail to generate unwind data for this function, resulting
in an error.
This issue is tracked upstream in Clang in
https://github.com/llvm/llvm-project/issues/80009.
Check whether the compiler can compile such a function, and
disable SVE if it is unable to handle that case.
Change-Id: I8550248abd6a7876bd8ecf6ba66bc70518133566
This mode is used infrequently and is quite slow. This shifts the tests
to nightly to speed up the presubmit.
Change-Id: I3020887e0ca0150d7cbea9cc726649c11f94d56c
Use the utility functions and set gf_group_size in
ext_rc_define_gf_group_structure()
Avoid using gop_decision->update_type to keep the logic simple
for now.
Also simplify the interface.
Change-Id: I78fd5892e6f9731d50d6e5da97598b46c70a1dde
The vpx_ports/msvc.h header provides snprintf() and round() for MSVC
older than Visual Studio 2015 and Visual Studio 2013, respectively.
Since configure now requires vs14 (Visual Studio 2015) or later, it is
safe to remove vpx_ports/msvc.h.
Change-Id: I2fe4c41eaa126f4cf17639c11895f1e464294c76
Replace %ld with %zu for `size_t`. Added in:
fd28f6f3c Add rate_ctrl_log_path
Fixes:
vp9\encoder\vp9_encoder.c(5748,15): warning C4477: 'fprintf' : format
string '%ld' requires an argument of type 'long', but variadic
argument 2 has type 'size_t'
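For reference, the C99 conversion specifier for size_t is %zu. A small, hypothetical helper showing the matching format and argument type:

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a size_t with %zu. Using %ld here would be a type mismatch on
 * platforms where long and size_t differ, such as 64-bit Windows. */
static int format_size_sketch(char *buf, size_t buf_len, size_t value) {
  return snprintf(buf, buf_len, "%zu", value);
}
```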
Change-Id: I36fa9c7a9e14d4a2d9ef51a7f5c55de71bb34518
If img_data is not NULL, img_alloc_helper ignores buf_align, so
vpx_img_wrap can set buf_align to any placeholder value.
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/90362.
Bug: webm:1850
Change-Id: I42bc45aecf822a9314caf23058fe123d0574dc20
Port the changes to aom/src/aom_image.c in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/56643. The changes
related to `border` are not ported.
Bug: webm:1850
Change-Id: Ie81fffe0c84e912da880ffca245ae27cd71cf348
I introduced this bug in commit 2e32276:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5446333
I changed the line
stride_in_bytes = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
to three lines:
s = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
if (s > INT_MAX) goto fail;
stride_in_bytes = (int)s;
But I didn't realize that `s` is used later in the calculation of
alloc_size.
As a quick fix, undo the effect of s * 2 for high bit depths after `s`
has been assigned to stride_in_bytes.
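The quick fix can be sketched in isolation. The struct and function names are hypothetical; `s` starts as the row length in samples, and its value is assumed small enough that `s * 2` does not overflow 64 bits.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  int stride_in_bytes;      /* row stride in bytes */
  uint64_t samples_per_row; /* value of s used later for alloc_size */
} RowLayoutSketch;

/* Returns false if the stride in bytes does not fit in an int. */
static bool row_layout_sketch(uint64_t s, bool high_bitdepth,
                              RowLayoutSketch *out) {
  s = high_bitdepth ? s * 2 : s; /* samples -> bytes */
  if (s > INT_MAX) return false;
  out->stride_in_bytes = (int)s;
  s = high_bitdepth ? s / 2 : s; /* undo s * 2: back to samples */
  out->samples_per_row = s;
  return true;
}
```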
Bug: chromium:332382766
Change-Id: I53fbf405555645ab1d7254d31aadabe4f426be8c
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188962.
stride_align is documented to be the "alignment, in bytes, of each row
in the image (stride)."
Change-Id: I2184b50dc3607611f47719319fa5adb3adcef2fd
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188823.
Impose maximum values on the input parameters so that we can perform
arithmetic operations without worrying about overflows.
Also change the VpxImageTest.VpxImgAllocHugeWidth test to write to the
first and last samples in the first row of the Y plane, so that the test
will crash if there is unsigned integer overflow in the calculation of
stride_in_bytes.
Bug: chromium:332382766
Change-Id: I54cec6c9e26377abaa8a991042ba277ff70afdf3
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188761.
Fix unsigned integer overflows in the calculation of stride_in_bytes in
img_alloc_helper() when d_w is huge.
Change the type of stride_in_bytes from unsigned int to int because it
will be assigned to img->stride[VPX_PLANE_Y], which is of the int type.
Test:
. ../libvpx/tools/set_analyzer_env.sh integer
../libvpx/configure --enable-debug --disable-optimizations
make -j
./test_libvpx --gtest_filter=VpxImageTest.VpxImgAllocHugeWidth
Bug: chromium:332382766
Change-Id: I3b39d78f61c7255e10cbf72ba2f4975425a05a82
The MAX_NUM_THREADS macro is unrelated to the VPxWorkerInterface, so it
doesn't need to be defined in vpx_util/vpx_thread.h.
The VP8 code doesn't seem to depend on MAX_NUM_THREADS, so VP8 can use
64 directly in the range check of its g_threads option. Move the
definition of the MAX_NUM_THREADS macro to vp9/encoder/vp9_ethread.h and
use it in VP9 code only.
Change-Id: Ibf788ca2496c743a2ac0498fefaab8a3c181228d
The `error: use of undeclared identifier 'EBUSY'` in
vpx_util/vpx_pthread.h was found in Mozilla's bug 1886318 [1]. This
patch addresses the issue by adding the `<errno.h>` header to introduce
the `EBUSY` identifier, resolving the problem.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1886318#c1
Change-Id: Ic417dafebf5ab160060dd29f692fa9c40d8db05a
The Google cpp style guide dictates that you should "include what you
use" with respect to symbols. This CL adds vpx_config.h imports to unit
tests that rely on config flags but were otherwise indirectly included.
Change-Id: Ia70a512cebe6c104d2d64afbed3cde8a405c68df
This CL will help run libvpx tests under Chromium against its partition
allocator. The allocator does not support single allocations above
3.998GiB. Because of this, tests related to large video sizes that
Chromium is configured for are expected to fail.
Chromium also supports only the CONFIG_REALTIME_ONLY option, so
some changes are scoped behind this flag.
Change-Id: I80e8743c0619ce502688109ce0be01cb252d5f92
ctx->pending_cx_data is a pointer. It looks nicer to compare
ctx->pending_cx_data with NULL than with 0.
Change-Id: I18815907b3d75551abfc603cb3c5c0297dceed23
cpi_->cyclic_refresh is nullptr if aq_mode is 0, in other words, the
rate controller runs in non adaptive quantization mode. This CL fixes
the crash in GetSegmentationData() in non aq mode.
Bug: b/259487065
Test: video encoding on ChromeOS
Change-Id: I503b30d15c697c8dd1da203b3c7361b91c428e87
VPX_CODEC_CORRUPT_FRAME is a decoder error. It is strange for
vpx_codec_encode() to fail with this error. In set_frame_size(), change
VPX_CODEC_CORRUPT_FRAME to VPX_CODEC_ERROR.
The use of VPX_CODEC_CORRUPT_FRAME was originally added in
commit 1ed56a46b3.
Change-Id: Iee92ed4cfca5061289b278ece2ba475cf98fec06
The current SVE2 approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution for
filter sizes smaller than or equal to 4 to avoid the storing and
re-loading from the intermediate buffer.
This approach is not beneficial when applying an 8tap filter in the
convolution.
Change-Id: Ie090eb79f1cbf182300d9343ae63069396ef3956
These invalid value definitions are necessary to initialize
the gop decision in external RC so libvpx can tell which fields are
populated and which are not.
Bug: b/329483680
Change-Id: I06bbb41fa59d0fb95296aebd0d05a703ec953b81
Coverity somehow thinks the return value of read_tx_mode() is between 0
and 7 (inclusive).
Hopefully this will fix Coverity CID 1584457: Out-of-bounds access in
read_coef_probs().
Change-Id: I49fbddf6fd6861bc9def9dfa91eaaaa4aefe5710
This array will be partially configured and used in later rate
distortion optimization search.
BUG=webm:1846
Change-Id: I83daba341c56767187031edb1c10d4528a4257a3
Add the `size` and `error` members to the vpx_write_bit_buffer struct.
Add the vpx_wb_init() and vpx_wb_has_error() functions.
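How a size and a sticky error flag give bounds checking can be illustrated with a simplified bit writer. This is not the libvpx code; every name below is hypothetical and only mirrors the idea of the new members and functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t *buf;
  size_t size;       /* capacity of buf in bytes */
  size_t bit_offset; /* position of the next bit to write */
  int error;         /* set once a write would pass the end of buf */
} BitBufferSketch;

static void wb_init_sketch(BitBufferSketch *wb, uint8_t *buf, size_t size) {
  wb->buf = buf;
  wb->size = size;
  wb->bit_offset = 0;
  wb->error = 0;
}

static int wb_has_error_sketch(const BitBufferSketch *wb) {
  return wb->error;
}

/* Writes one bit, most significant bit first within each byte. */
static void wb_write_bit_sketch(BitBufferSketch *wb, int bit) {
  const size_t byte = wb->bit_offset / 8;
  const int shift = 7 - (int)(wb->bit_offset % 8);
  if (wb->error) return;
  if (byte >= wb->size) {
    wb->error = 1; /* out of space: record it and write nothing */
    return;
  }
  if (shift == 7) wb->buf[byte] = 0; /* starting a new byte */
  wb->buf[byte] = (uint8_t)(wb->buf[byte] | ((bit & 1) << shift));
  wb->bit_offset++;
}
```

The caller checks the error flag once after all writes instead of checking every write.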
Instances of the vpx_write_bit_buffer struct are only allocated in the
vp9_pack_bitstream() function. So vp9_pack_bitstream() is the only
function outside vpx_dsp/bitwriter_buffer.* that needs updating.
This CL completes the work of adding output buffer bounds checks to
vp9/encoder/vp9_bitstream.c.
Bug: webm:1844
Change-Id: I6b362be572852ee51d96023b35bfb334faada7e1
The issue happens in real-time nonrd pickmode because of the speed
feature sf->adaptive_rd_thresh_row_mt, which is enabled for
speed >= 8, and for speed >= 7 with SVC only.
It occurs when the resolution (sb_rows) changes and
row_base_thresh_freq_fac needs to be re-allocated.
The fix is to add sb_rows to TileDataEnc and check whether
row_base_thresh_freq_fac must be re-allocated.
Bug: b:331108922
Change-Id: I1a1ca94c14f343200c180725e4cb8d91d3c55b83
In the vpx_writer struct, change the buffer_end field to the size field.
Change vpx_stop_encode() to return true on success, false on failure
(output buffer full).
In write_compressed_header(), remove the assertion
assert(header_bc.pos <= 0xffff). The caller (vp9_pack_bitstream()) will
check that condition.
In vp9_pack_bitstream(), the variable "first_part_size" is renamed
"compressed_hdr_size".
Bug: webm:1844
Change-Id: I4ed6ab905a707ad44d875e53036d5a42523a65d0
In vp9_init_tile_data(), call vp9_row_mt_mem_dealloc(cpi) to free the
row mt memory in cpi->tile_data before freeing cpi->tile_data.
Bug: b:331086799, b:331108729
Change-Id: Idc79984ce7e0110e6858139b2ed286492a2e8622
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd SVE horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional SVE2 path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
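The row counts in this message follow from simple arithmetic, sketched here with hypothetical helper names: an 8-tap vertical filter needs 3 rows above and 4 rows below the block, and processing 4 rows at a time rounds that count up.

```c
/* Rows the horizontal pass must produce for a taps-tap vertical pass:
 * block height plus (taps - 1) extra rows, i.e. h + 7 for 8 taps. */
static int first_pass_rows_sketch(int block_height, int taps) {
  return block_height + taps - 1;
}

/* Rows actually computed when the kernel handles 4 rows per iteration. */
static int round_up_to_4_sketch(int rows) {
  return ((rows + 3) / 4) * 4;
}
```

For a block height of 8, the first pass needs 15 rows, while a 4-rows-at-a-time loop computes 16.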
Change-Id: I2f5d39ad737dbd7eccb08dd2b51586c6710119b8
If a local variable "pc" is defined as &cpi->common, replace
"cpi->common." with "pc->".
Also replace a memcpy() call with a struct assignment.
Change-Id: I6f4f12e69d9989beaa6e04c83d93230e7d726278
Declare the dest_size member of the VP9BitstreamWorkerData struct as
size_t instead of int.
Fix the following MSVC warning:
vp9\encoder\vp9_bitstream.c(1031,37): warning C4267: '=':
conversion from 'size_t' to 'int', possible loss of data
Change-Id: Idab5ad5d4bf4d1e4754f011a3073c9a89da29f55
The buffer_end field will allow bounds checking when vpx_writer writes
to the output buffer. This CL sets up the plumbing to pass the output
buffer size from vp9_pack_bitstream() to vpx_start_encode(), which
initializes the vpx_writer struct. vpx_writer doesn't use the output
buffer size in bounds checks yet, but the code in vp9_bitstream.c does.
Bug: webm:1844
Change-Id: I995e469ab453c02d740f54b46e0b08c7f2eb1a2e
This was added in libaom in:
5ddac0aac8 RTCD defs: Remove empty specialize statements once and for all.
https://aomedia-review.googlesource.com/c/aom/+/9062
Change-Id: I9c8fb0c8e4bd4dc9373d8533ab083dff816e7cbe
Set up the plumbing to pass the size of the output buffer `dest` to
vp9_pack_bitstream(). The output buffer is the cx_data buffer in the
encoder_encode() function in vp9/vp9_cx_iface.c, and its size is
cx_data_sz.
In this CL vp9_pack_bitstream() ignores the `dest_size` parameter.
Bug: webm:1844
Change-Id: I53c80280143d409cf16f87c4d6deec3d9338aea3
Avoid calling encode_tiles_buffer_alloc_size() twice by saving its
return value in a local variable.
Change-Id: I3050f9cf7c3520f7edc80abf66620ba233fadad8
The code was using bitstream_worker_data when it had not been
allocated with a large enough size. The existing condition
only re-allocated bitstream_worker_data when the current dest_size
was larger than the current frame_size. But under a resolution change
where frame_size increases beyond the current dest_size,
we need to allow re-allocation to the new size.
The existing condition to re-alloc when dest_size is
larger than frame_size (which is not required) is kept
for now.
Also increase the dest_size to account for image format.
Added tests, for both ROW_MT=0 and 1, that reproduce
the failures in the bugs below.
Note: this issue only affects the REALTIME encoding path.
Bug: b/329088759, b/329674887, b/329179808
Change-Id: Icd65dbc5317120304d803f648d4bd9405710db6f
SVE and SVE2 code paths in libvpx require intrinsics from
arm_neon_sve_bridge.h. SVE is disabled if the compiler does not
support this header. This patch conditionally disables SVE2 in the
same way.
Also gate the check for arm_neon_sve_bridge.h on whether SVE is
enabled in the first place. The check isn't necessary if the user has
explicitly disabled SVE. (Explicitly disabling SVE already disables
SVE2 since the former is a pre-requisite for the latter.)
Change-Id: Ibb21f09e8b2470d1ce5d98b71b101f5b7f7dbcdc
In encoder_encode(), remove the return statement after a
vpx_internal_error() call because setjmp() has been called at that
point.
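Why a return after the error call is dead code can be shown with a small setjmp()/longjmp() sketch. The struct, field, and function names are hypothetical and only imitate the pattern; they are not the libvpx definitions.

```c
#include <setjmp.h>

typedef struct {
  jmp_buf jmp;
  int setjmp_active; /* nonzero while jmp holds a valid context */
  int error_code;
} ErrorCtxSketch;

/* Records the error; when a setjmp() context is active, control transfers
 * there and this function never returns to its caller. */
static void internal_error_sketch(ErrorCtxSketch *e, int code) {
  e->error_code = code;
  if (e->setjmp_active) longjmp(e->jmp, 1);
}

static int encode_sketch(ErrorCtxSketch *e, int fail) {
  if (setjmp(e->jmp)) {
    /* Reached via longjmp() from internal_error_sketch(). */
    e->setjmp_active = 0;
    return e->error_code;
  }
  e->setjmp_active = 1;
  if (fail) internal_error_sketch(e, 7); /* a return here would be dead */
  e->setjmp_active = 0;
  return 0;
}
```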
Change-Id: Ib8ebbfbacb21097ce7f1b4e3bf53004bbe88a42b
in struct VP8RateControlRtcConfig and struct VP9RateControlRtcConfig;
structs default to public access.
Change-Id: Icdc5b44fb4c7297b0cb3c6cde8bec33ea5cee18c
vp8/vp8_ratectrl_rtc.h should come first as it's implemented in this
module. Split the rest of the groups on C/C++/vpx bounds.
Change-Id: If6bbbd8f3adf3766fa36fbc53ae06c9f6f76ebe9
Add SVE2 implementation of vpx_highbd_convolve8_avg_vert function.
Add the corresponding tests as well.
Change-Id: I20ca19e09a1686bb00c0b51bf756ddab0adbc2c0
Add SVE implementation of vpx_highbd_convolve8_avg_horiz function.
Add the corresponding tests as well.
Change-Id: If13793fa653834dfdfeddfee60b80129eea85dd7
Add SVE2 implementation of vpx_highbd_convolve8_vert function. Add
the corresponding tests as well.
Change-Id: I289ac79d4493935217feaa4fd2fa0b8ef9a62972
Add 'sve2' arch options to the configure, build and unit test files -
adding appropriate conditional options where necessary. Arm SIMD
extensions are treated as supersets in libvpx, so disable SVE2 if
SVE is unavailable.
Change-Id: Icdec2aace357e36fba77c77cd8b70da1e5427fce
This was deprecated in 1.9.5 [1]. It is now enabled by default. For
earlier versions of doxygen this will set the value to false, but I
don't believe we were relying on this functionality.
[1]: https://www.doxygen.nl/manual/changelog.html#log_1_9_5
Change-Id: I75f576d35ca86636761cf70fda0dd0ad37f71d71
The sem_* macros do not behave exactly like the POSIX sem_* functions.
Add the vp8_ prefix to the sem_* macro names to make it clear that they
are not the POSIX sem_* functions. Another reason for adding the vp8_
prefix is that we need to wrap sem_wait() (to handle EINTR) on the Unix
platforms that have real sem_wait() function.
Handle EINTR in the Unix (non-Apple) definition of vp8_sem_wait().
Change-Id: I3df02a30f851d41691a55cf7a84aa2ff054bba9c
Based on a clang-tidy warning:
`no header providing "sem_wait" is directly included`
Though this may not clear it entirely, it's the closest that can be
done given the platform-dependent includes and implementation in
vp8/common/threading.h
Change-Id: I19984f820f3f380e58deef40563a2f0c66187748
set --target to the more modern aarch64-android-gcc and remove an
incorrect comment regarding realtime-only.
Change-Id: I5f6c9de9fcd96a60817e37fc6f6505725ddea6b9
When dot-product and SVE support are disabled the hwcap variable is
currently unused. Fix this by wrapping it in an #ifdef matching the
conditions where it is needed.
Change-Id: I1c2e302d861c6c726b314e374f07d4fafe17ffc7
libvpx's check for conditionally defining __builtin_prefetch is broken,
since clang-cl defines __builtin_prefetch on Win ARM64: in addition, it
supports up to 3 arguments, with the latter 2 being optional. This
causes build breaks when paired with other libraries, like Abseil, which
do perform the conditional test correctly.
The real fix here is to define something like VPX_PREFETCH rather than
trying to #define an implementation-reserved name, which is undefined
behavior.
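One possible shape for such a macro, as a sketch only. VPX_PREFETCH is
the name floated above; the detection logic and the sum_with_prefetch()
helper are assumptions for illustration, not the code that landed.

```c
#include <assert.h>
#include <stddef.h>

/* Prefer __has_builtin where the compiler provides it. The checks are
 * nested so that compilers without __has_builtin never evaluate it. */
#if defined(__has_builtin)
#if __has_builtin(__builtin_prefetch)
#define VPX_PREFETCH(addr) __builtin_prefetch(addr)
#endif
#endif

/* Older GCC versions lack __has_builtin but still provide the builtin. */
#if !defined(VPX_PREFETCH) && defined(__GNUC__)
#define VPX_PREFETCH(addr) __builtin_prefetch(addr)
#endif

/* No prefetch available: evaluate the argument and do nothing. */
#ifndef VPX_PREFETCH
#define VPX_PREFETCH(addr) ((void)(addr))
#endif

/* Hypothetical user: sums an array while prefetching 16 elements ahead. */
static int sum_with_prefetch(const int *data, int n) {
  int i, sum = 0;
  assert(data != NULL && n >= 0);
  for (i = 0; i < n; ++i) {
    if (i + 16 < n) VPX_PREFETCH(&data[i + 16]);
    sum += data[i];
  }
  return sum;
}
```

Because the macro lives in the project's own namespace, it cannot
collide with a compiler that already defines __builtin_prefetch.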
Bug: 328105513
Change-Id: Ibe14d9ce34306654bd20e560973f76c3b40036ee
Refactor the transpose_concat_*() helper function used in the Arm Neon
DotProd and I8MM vertical convolution implementations to not use TBL
instructions. Using vzip* to achieve the same outcome (with the same
number of instructions) avoids needing/loading the lookup indices and
also increases performance on little (in-order) Arm Cortex cores.
Change-Id: Iff62a44f8a9bf0ee239d5bb36be8424cab0dbca5
sem_wait() may be interrupted by a signal and fail with EINTR:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
Retry the sem_wait() call if it fails with EINTR.
This finishes the fix started in
https://chromium-review.googlesource.com/c/webm/libvpx/+/5299569. As a
speculative fix, that CL fixed only the sem_wait(&cpi->h_event_end_lpf)
calls responsible for bug chromium:324459561. ClusterFuzz verified the
fix, so this CL extends it to the other sem_wait() calls.
Note that sem_wait() calls like the following do not need this fix,
because the while (1) loop retries the sem_wait() call if it fails:
  while (1) {
    if (vpx_atomic_load_acquire(&cpi->b_multi_threaded) == 0) break;
    if (sem_wait(&cpi->h_event_start_lpf) == 0) {
      ...
    }
  }
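For the calls that are not inside such a loop, the retry can be
sketched as below. sem_wait_retry() is a hypothetical name used for
illustration; it is not the wrapper defined in libvpx.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Waits on a semaphore, retrying when a signal interrupts the call.
 * Returns 0 on success, or -1 with errno set to an error other than
 * EINTR. */
static int sem_wait_retry(sem_t *sem) {
  int ret;
  assert(sem != NULL);
  do {
    ret = sem_wait(sem);
  } while (ret != 0 && errno == EINTR);
  return ret;
}
```

errno is only inspected when sem_wait() reports failure, so a stale
EINTR left over from an earlier call cannot cause a spurious retry.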
Bug: chromium:324459561
Change-Id: I0f0612616eee37fb3da68049e49b3e86927b5e24
We already have some logic in the configure.sh file to selectively
disable code dependent on particular architecture extensions, however we
do not yet have anything to check that the compiler being supplied
recognises and can compile code using these extensions.
This commit adds compiler "-march=..." flag tests to the existing
extension-disable loop so that we now correctly disable extensions that
are not supported by the compiler. For AArch64 this loop also needs to
move below the existing compiler/OS handling to ensure that prefixes
like $CROSS are handled correctly before running compiler tests.
Bug: webm:1841
Change-Id: I936b911c4b0ebf03abc34b7532b2bb4568129f57
Add SVE implementation for vpx_highbd_convolve8_horiz that specialises
for 4-tap filters. This way we avoid a lot of redundant work to
multiply and add zero, given that some of the 8-tap filters are
zero-padded, so they are effectively 4-tap filters.
Change-Id: Ib5e0377f924df1d893e9436f443fcbe7d196ea27
Rename dot_neon_sve_bridge.h to vpx_neon_sve_bridge.h in order to
reflect that other instructions can be implemented in the header
file. In a subsequent patch, the usage of vtbl with Neon-SVE bridge
intrinsics will be added.
Change-Id: I8f71aad2b7fb4932c9554badf041a80aca58c7cf
Remove the 4-tap Neon DotProd path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: I1861dc88a67a759c3e8deb0b471ec447a62063f2
The current SBD Neon DotProd approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 10-25% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic6008836d1a499ee2cd957b9db194fca5671ccb4
Remove the 4-tap Neon i8mm path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: Icddecb7e133656c54aa5e79536b49759715b6fcb
The current SBD Neon i8mm approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 5-40% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic8ec2822681176ef879dcaf8424d8d91c5e8d2df
With either CONFIG_VP8=0 or CONFIG_VP9=0. Fixes a warning about an extra
';' outside of a function due to VP[89]_INSTANTIATE_TEST_SUITE() being
defined to nothing.
Change-Id: I1878d7596e39c5166efbe96450a733efc08665ea
inter/intra_cost in VP9 TPL is calculated with SATD
which should be close enough to be used as inter/intra_pred_err
Bug: b/326262148
Change-Id: Ic0fd08708fcf3640398fc22a1a6bb6f449b2a9b8
Anonymous unions are not supported in C99, they were added in C11:
https://en.cppreference.com/w/c/language/union
Fixes a -Wpedantic warning:
vp9/encoder/vp9_context_tree.h:93:4: warning: ISO C99 doesn’t support
unnamed structs/unions [-Wpedantic]
Change-Id: Ibd29d6deca35d81ea886e80e9f44575c73ecd96d
Fixes a -Wpedantic warning:
vp9/encoder/vp9_rdopt.c:1988:20: warning: invalid use of pointers to
arrays with different qualifiers in ISO C before C2X [-Wpedantic]
Change-Id: I581e21d7e59c0bae0e44056a3b3f049c5a4e7cf2
Add SVE implementation of vpx_highbd_convolve8_horiz function. Add
the corresponding tests as well.
Change-Id: I0b2815831daf203e167ea5289307087ce53ff9da
The new Armv8.0 Neon implementation of 4-tap vertical convolution is
faster than Armv8.4 DotProd and Armv8.6 I8MM implementations. This
patch removes the DotProd and I8MM implementations in favour of using
the Armv8.0 version everywhere.
Change-Id: I126470fd4862d8bb116153e90bb2e4f2f2dba1e4
Refactor Armv8.0 Neon 4-tap convolution functions to operate on 8-bit
types directly, rather than first widening to 16-bit.
2-tap (bilinear) filter values are always positive, but 4-tap filter
values are negative on the outer edges (taps 0 and 3), with taps 1
and 2 having much greater positive values to compensate. To use
instructions that operate on 8-bit types we also need the types to be
unsigned. In the convolution kernel, subtracting the products of taps
0 and 3 from the products of taps 1 and 2 always works since 2-tap
filters are 0-padded.
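A scalar model of the idea, as a sketch only. It assumes 7-bit filter
precision (taps summing to 128) and uses made-up tap values; the real
kernels operate on Neon vectors. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { kFilterBits = 7 }; /* Assumed precision: taps sum to 128. */

/* Round, shift by kFilterBits and clip to the 8-bit pixel range. */
static uint8_t round_and_clip(int32_t sum) {
  sum += 1 << (kFilterBits - 1);
  if (sum < 0) return 0;
  sum >>= kFilterBits;
  return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Reference: signed taps applied to widened samples. */
static uint8_t convolve4_signed(const uint8_t *s, const int16_t *taps) {
  int32_t sum = 0;
  int k;
  /* The equivalence below relies on the outer taps being <= 0. */
  assert(taps[0] <= 0 && taps[3] <= 0);
  for (k = 0; k < 4; ++k) sum += taps[k] * s[k];
  return round_and_clip(sum);
}

/* Unsigned variant: only tap magnitudes are stored, and the products of
 * the outer taps (0 and 3) are subtracted from those of taps 1 and 2. */
static uint8_t convolve4_unsigned(const uint8_t *s,
                                  const uint8_t *abs_taps) {
  const int32_t inner = abs_taps[1] * s[1] + abs_taps[2] * s[2];
  const int32_t outer = abs_taps[0] * s[0] + abs_taps[3] * s[3];
  return round_and_clip(inner - outer);
}
```

For a bilinear filter padded to { 0, a, b, 0 } the subtracted term is
zero, so the same kernel serves both filter sizes.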
Co-authored by: Hari Limaye <hari.limaye@arm.com>
Change-Id: I87b32e2ef8cbd21eebb8cd2642e8826b704905b1
The THREADFN and THREAD_EXIT_SUCCESS macros are used to define the
thread start routines passed to our implementation of pthread_create(),
so it makes sense to define these macros in vpx_util/vpx_pthread.h. This
also allows the VP8 and VP9 code to share the macro definitions.
Replace the THREAD_FUNCTION macro by THREADFN. They have the same
definition.
Change-Id: I79a7476e43652667af6a8da7ad7ce346b1b6b024
This helps prevent name clashes if code e.g. #includes headers from both
libvpx and libaom.
Bug: none
Change-Id: Ifc9e7ac4862dc04a399e7777d2636e1453627970
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I3fd38560bb14d85826e417f40d35f11165ab80da
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I908147ed65a87157009363782399ff398406cdf9
- Initialize gop_decision
- Initialize GF group for a new one
- GF group index for key frame special treatment is not needed any more
when key frame is decided by the RC
Bug: b/323050877
Change-Id: Iaf36ea4f671b833f3ba4c524b9799a3093412dfa
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically, average with the dst block and store the final
output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I84f1cafcfbbfa48b2cfe4b20881da9c4bc3b56ac
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I8ec13fb9edd642fdb927bf5394a3c2a349d22a29
Add a highbd Neon implementation of the horizontal portion of 2D
convolution specialised for executing with 4-tap filters. This new
path is also used when executing with bilinear (2-tap) filters.
Change-Id: I513e35c4f8857bc89e0def5e9402bc31ddd46440
Add a highbd Neon implementation of vertical convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: I30469c7b8e6ccff31d96588a3e4c21b401f1ed09
Add a highbd Neon implementation of horizontal convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: Icabeea295af3e0bbeda755168996668cb960b0de
Filter tap reporting was made more granular recently[1] to enable Arm
Neon optimizations that specialise convolution implementations
according to the filter size. This patch removes an assert that
should have been removed during that change - it no longer serves any
purpose to assert that the filter being used is a no-op filter.
This change is a pre-requisite for some highbd Neon convolution
changes that specialise implementations according to filter size.
(Without this change a convolve-copy test would fail should we
interrogate the size of the filter.)
[1] https://chromium-review.googlesource.com/c/webm/libvpx/+/5063929
Change-Id: I2a71680d27134535e6c0663b1668ba1b150b1a6f
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd Neon horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional Neon path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
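The row arithmetic can be sketched as follows. Both helpers are
hypothetical and only model the counts described above; they are not
taken from the libvpx sources.

```c
#include <assert.h>

/* Rows the horizontal pass must produce so that a vertical filter with
 * `taps` taps has enough input: the block height plus taps - 1. */
static int intermediate_rows(int block_height, int taps) {
  assert(block_height > 0 && taps > 0);
  return block_height + taps - 1;
}

/* Rows actually processed when the loop only works in fixed groups of
 * `rows_per_iter` rows (rounds up to the next multiple). */
static int rows_processed(int rows_needed, int rows_per_iter) {
  assert(rows_needed > 0 && rows_per_iter > 0);
  return (rows_needed + rows_per_iter - 1) / rows_per_iter * rows_per_iter;
}
```

For an 8-row block with 8-tap filters, 15 rows are needed but a
4-rows-at-a-time loop processes 16, which is the wasted row.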
Change-Id: Id6658b4e9e774effc760ff131e188b6907a57676
Call scalar C implementation of 2D convolution immediately if scaling
is required - instead of entering the Neon functions for the
horizontal and vertical passes and then falling back to the scalar
implementation. This has the benefit of being able to allocate a
smaller intermediate buffer.
Change-Id: Icacdd5f3a1401395951b613da1cd6932955bd0f8
There's no reason for these files to be separate, and merging them
will make life easier in subsequent commits adding a horizontal pass
specialised for the first pass of 2D.
Also perform some refactoring for 2D convolution definitions:
- Add a comment deriving the intermediate buffer height.
- Align the intermediate buffers to 32 bytes.
Change-Id: Ib92524396e6f9c58295339de54d08d894ace3bd1
Mostly a cosmetic change:
1) Remove forward declarations.
2) Remove excessive prefetches - some of which were wrong, prefetching
data that had just been loaded.
Change-Id: I17d8accc2abf3a9b2050603f859fce588a1f7178
CONFIG_PROFILE is unused currently. The option can still be selected
because it is in the CMDLINE_SELECT list and interpreted by configure
directly.
Bug: webm:1835
Change-Id: Id9667289113335a10018803f578b255967bd60b1
Move narrowing shift and max value clipping into the 4-pixel-output
kernel. As well as cleaning up the code quite a bit, this also
improves performance by 5-10% as it eliminates the implied top /
bottom register shuffling of the previous approach.
Also clean up the formatting and magic numbers in the 8-pixel-output
kernel.
Change-Id: I77a5e9e317ef4097f187330d4b32973022ba573f
In https://chromium-review.googlesource.com/c/webm/libvpx/+/71356, the
statement
clamp(q, active_best_quality, active_worst_quality);
was added to rc_pick_q_and_bounds_two_pass() (recently renamed
vp9_rc_pick_q_and_bounds_two_pass()).
The result of the clamp() call is not used, so the clamp() call has no
side effect.
Fix Coverity CID 1577645 Useless call:
side_effect_free: Calling
clamp(q, active_best_quality, active_worst_quality) is only useful for
its return value, which is ignored.
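A small sketch of why the bare call is a no-op. The clamp() here is a
local stand-in written for this example, and the two wrapper functions
are hypothetical; the text above does not say whether the fix removes
the call or assigns its result.

```c
#include <assert.h>

/* Stand-in clamp(): returns value limited to [low, high]. It takes its
 * argument by value, so it cannot modify the caller's variable. */
static int clamp(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}

/* The pattern Coverity flags: the result is discarded, so q is
 * returned unchanged. */
static int q_after_bare_call(int q, int best, int worst) {
  clamp(q, best, worst);
  return q;
}

/* The result only matters when it is assigned. */
static int q_after_assigned_call(int q, int best, int worst) {
  q = clamp(q, best, worst);
  return q;
}
```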
Change-Id: I014c3e4caf2bc999fe480000acc4e49e7ad15aaf
Various bits of tidying up to make the code more compact:
- Use appropriate load/store helper functions from mem_neon.h.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: I114cf4d2a34f02acc130558d125d2c191c6c5992
Various bits of tidying up to make the code more compact:
- Use/create appropriate mem_neon.h load/store helper functions.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: Ida7d3c4a3fe084600417f196baa26501c6e2d45a
Initialise result vectors of mem_neon.h helpers with vdup_n_<type>(0)
instead of load-broadcast of the first loaded elements. The former is
more easily optimized by modern compilers.
Change-Id: If967e2bb55523670c3e433dd66d060665e13b4f2
Align the intermediate buffers to 32 bytes and always use a stride of
64, regardless of the actual data block width.
Change-Id: I738eaa711168bc8231d8ac54d9e5e5e87b62e703
Add rdmult to the frame decision as RC can return this information, and
we may want to use it in the future.
Bug: b/323234722
Change-Id: I8ddb7038073d89af1ef84932448b1abaf1937cee
This change was intended to be cosmetic in that it tweaks some
comments, removes forward declarations and moves some constant
declarations into the kernels where they're used. However, it also
adds some performance for 8-tap vertical convolution paths as it
appears removing forward declarations also removes some false loop-
carried dependencies that the compiler wasn't able to figure out.
Change-Id: Ic58658b10fbe8378062920199819359d2df008de
The updated test will validate the QP / frame type / ARF settings by the
rate controller and callbacks, making sure the callbacks are working as
expected.
Removed the old tests that verify the signals from the encoder, which
are not needed any more.
Change-Id: Ida3c484e2ac520f3e81358d7cbf7918abfdaca54
Disable some tests because they rely on vpx_rc_gop_info_t,
which isn't populated when the callback is used for key frames.
This parameter will be deleted / cleaned up in the follow-up.
Bug: b/323050877
Change-Id: If1c0476eac8d324c8d5a460bfc9afdb6d93aacdf
Use uv_crop_(width|height). This fixes an issue with 1 to 2 scaling from
1x1 where the unrounded value would go to zero, resulting in a heap
overflow. This path is only executed when the library is built without
--enable-vp9-highbitdepth.
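A sketch of the rounding issue, assuming the chroma size is derived by
a right shift of the luma size. The function names are hypothetical,
and the rounded form follows the common (dim + ss) >> ss convention
rather than quoting the libvpx code.

```c
#include <assert.h>

/* Chroma dimension without rounding: a 1-pixel luma plane gives 0. */
static int chroma_dim_truncated(int luma_dim, int subsampling) {
  assert(luma_dim > 0 && (subsampling == 0 || subsampling == 1));
  return luma_dim >> subsampling;
}

/* Chroma dimension rounded up: a 1-pixel luma plane gives 1. */
static int chroma_dim_rounded(int luma_dim, int subsampling) {
  assert(luma_dim > 0 && (subsampling == 0 || subsampling == 1));
  return (luma_dim + subsampling) >> subsampling;
}
```

A zero-sized source dimension is what lets a scaler read or write
outside the buffer, so the rounded value is the safe one to use.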
Bug: b:319964497
Change-Id: I9cb6632f864ec54c045608af86aede20657d6253
Simplify the computation of the Armv8.4 DotProd convolution
correction constant. Summing 128 * filter_tap[0,7] is always the same
as 128 * 128 since the filter taps always sum to 128.
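The identity can be checked with a short sketch. The tap values below
are illustrative kernels chosen to sum to 128, and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { kTaps = 8, kRangeOffset = 128 };

/* Illustrative 8-tap kernels; each sums to 128. */
static const int16_t kExampleA[kTaps] = { 0, 1, -5, 126, 8, -3, 1, 0 };
static const int16_t kExampleB[kTaps] = { -3, -1, 32, 64, 38, 1, -3, 0 };

/* Long form: accumulate 128 * tap over all eight taps. */
static int32_t correction_by_summing(const int16_t *filter) {
  int32_t sum = 0, tap_sum = 0;
  int k;
  for (k = 0; k < kTaps; ++k) {
    sum += kRangeOffset * filter[k];
    tap_sum += filter[k];
  }
  assert(tap_sum == 128); /* Precondition for the shortcut below. */
  return sum;
}

/* Short form: the same value for any kernel whose taps sum to 128. */
static int32_t correction_constant(void) {
  return kRangeOffset * 128;
}
```

Since the result no longer depends on the filter, it can be a
compile-time constant instead of being recomputed per call.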
Change-Id: I227ba47ae47bed8304a695a2395bcc85f33c245c
Move the convolution kernels using Armv8.4 dotprod and Armv8.6 i8mm
instructions into the respective .c files. These kernels are only used
in the respective .c files so it isn't useful for them to be declared
in a header.
This change also removes the need for feature-macro guarding - which
wasn't being done correctly for MSVC (since Microsoft's Arm
architecture feature macros are named differently to those defined by
GNU-compliant compilers.)
Bug: webm:1838
Change-Id: I495fca2a982c34978b6c9102f144bb9c45352a9a
Move the Arm Neon dotprod and i8mm 2D convolution functions into the
appropriate vpx_convolve8_neon_[dotprod|i8mm].c file. Only the
Armv7/Armv8.0 Neon files needed to be split in this way to allow
linking against a handwritten assembly implementation of the kernels
for Armv7 builds.
Change-Id: Ifc363556c3961aa78b9e53761537d4816c5b9964
This is one commit after the libwebm-1.0.0.31 tag:
affd7f4 In MakeUID(), call rand() under #ifdef _WIN32
Change-Id: I5979a8cd3b064d4f4f0dbeca9f84f6791e593b47
Call indirect RTCD high bitdepth variance functions (instead of the
Neon functions) in the high bitdepth Neon subpel variance paths so
that faster SVE variance functions can be used on CPUs where SVE is
supported.
Change-Id: I04bdef235afac06f2100df0cbaccfb8caef41ac7
Add SVE implementation of get<w>x<h>var functions for 8-, 10-, 12- bit
depth. Add the corresponding tests as well.
Change-Id: Id4feb8726a3eb0a963e3dd8932ee52374a67da48
Add standard and high bitdepth unit tests for vpx_get<w>x<h>var
functions. Enable these unit tests for the C implementation.
Change-Id: I8716fd6a9718dab3eef218a8a60a1efd4c0e316c
Fix Coverity defects CID 1568604 and CID 1568615 (Uninitialized pointer
field). Since the constructors are private and the Create() factory
methods set the cpi_ pointer field, these two Coverity defects are
harmless.
Define the constructors with "= default" instead of "{}".
Change-Id: Ie6b45fce66c23941a9a5c38ee0bccbc4b7d3a2a2
Add SVE implementation of variance functions for 8-, 10-, 12- bit
depth. Add the corresponding tests as well.
Change-Id: I785d85760ad4346cbfbf0f842784b4945870afee
Observed when built using Visual Studio 2019.
Move 720P image allocation to the heap.
Bug: webm:1831
Change-Id: I4e343af08d2f282618ad1b328a39d7dba5e79654
read_yuv_frame() supports VPX_IMG_FMT_NV12. Port its code to
vpx_img_read() and vpx_img_write().
The code in vp9/simple_encode.cc, including img_read(), doesn't support
VPX_IMG_FMT_NV12. Check before the vpx_img_alloc() calls and abort the
process if the image format is VPX_IMG_FMT_NV12.
Bug: chromium:1510090
Change-Id: Ie77e29c2c9ee7a01e6a59c8ad3cbcc769d9f2d4c
If fmt is VPX_IMG_FMT_NONE, currently img_alloc_helper() allocates a
single plane because VPX_IMG_FMT_NONE (0) is not a planar format (the
VPX_IMG_FMT_PLANAR bit is not set in VPX_IMG_FMT_NONE).
Although this seems correct, the problem is that most of the code in
libvpx assumes planar formats and is likely to dereference a null
pointer when it uses img->planes[1]. Also, VPX_IMG_FMT_NONE isn't really
a valid image format. So it is safer to make img_alloc_helper() fail if
fmt is VPX_IMG_FMT_NONE.
Change-Id: I05b47f4b5eceb631a02384b2cce1c2f6fdca8673
This often falls out of sync with the release and the version is already
contained in CHANGELOG.
Bug: webm:1833
Change-Id: Ieee6ca40249bf6e77037fbec30d87b109ca8fe21
Release v1.14.0 Venetian Duck
2024-01-18 v1.14.0 "Venetian Duck"
This release drops support for old C compilers, such as Visual Studio 2012
and older, that disallow mixing variable declarations and statements (a C99
feature). It adds support for run-time CPU feature detection for Arm
platforms, as well as support for darwin23 (macOS 14).
- Upgrading:
This release is ABI incompatible with the previous release.
Various new features for rate control library for real-time: SVC parallel
encoding, loopfilter level, support for frame dropping, and screen content.
New callback function send_tpl_gop_stats for vp9 external rate control
library, which can be used to transmit TPL stats for a group of pictures. A
public header vpx_tpl.h is added for the definition of TPL stats used in
this callback.
libwebm is upgraded to libwebm-1.0.0.29-9-g1930e3c.
- Enhancement:
Improvements on Neon optimizations: VoD: 12-35% speed up for bitdepth 8,
68%-151% speed up for high bitdepth.
Improvements on AVX2 and SSE optimizations.
Improvements on LSX optimizations for LoongArch.
42-49% speedup on speed 0 VoD encoding.
Android API level predicates.
- Bug fixes:
Fix to missing prototypes from the rtcd header.
Fix to segfault when total size is enlarged but width is smaller.
Fix to the build for arm64ec using MSVC.
Fix to copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic.
Fix to -Wshadow warnings.
Fix to heap overflow in vpx_get4x4sse_cs_neon.
Fix to buffer overrun in highbd Neon subpel variance filters.
Added bitexact encode test script.
Fix to -Wl,-z,defs with Clang's sanitizers.
Fix to decoder stability after error & continued decoding.
Fix to mismatch of VP9 encode with NEON intrinsics with C only version.
Fix to Arm64 MSVC compile vpx_highbd_fdct4x4_neon.
Fix to fragments count before use.
Fix to a case where target bandwidth is 0 for SVC.
Fix mask in vp9_quantize_avx2,highbd_get_max_lane_eob.
Fix to int overflow in vp9_calc_pframe_target_size_one_pass_cbr.
Fix to integer overflow in vp8,ratectrl.c.
Fix to integer overflow in vp9 svc.
Fix to avg_frame_bandwidth overflow.
Fix to per frame qp for temporal layers.
Fix to unsigned integer overflow in sse computation.
Fix to uninitialized mesh feature for BEST mode.
Fix to overflow in highbd temporal_filter.
Fix to unaligned loads w/w==4 in vpx_convolve_copy_neon.
Skip arm64_neon.h workaround w/VS >= 2019.
Fix to c vs avx mismatch of diamond_search_sad().
Fix to c vs intrinsic mismatch of vpx_hadamard_32x32() function.
Fix to a bug in vpx_hadamard_32x32_neon().
Fix to Clang -Wunreachable-code-aggressive warnings.
Fix to a bug in vpx_highbd_hadamard_32x32_neon().
Fix to -Wunreachable-code in mfqe_partition.
Force mode search on 64x64 if no mode is selected.
Fix to ubsan failure caused by left shift of negative.
Fix to integer overflow in calc_pframe_target_size.
Fix to float-cast-overflow in vp8_change_config().
Fix to a null ptr before use.
Conditionally skip using inter frames in speed features.
Remove invalid reference frames.
Disable intra mode search speed features conditionally.
Set nonrd keyframe under dynamic change of deadline for rtc.
Fix to scaled reference offsets.
Set skip_recode=0 in nonrd_pick_sb_modes.
Fix to an edge case when downsizing to one.
Fix to a bug in frame scaling.
Fix to pred buffer stride.
Fix to a bug in simple motion search.
Update frame size in actual encoding.
Change-Id: I9c27fb2b917f9b80ed4bcc5cb3b4f87c56b62c2f
Add SVE implementation of MSE functions for 10-, 12- bit depth. Add
the corresponding tests as well.
An implementation was not added for 8 bit depth as the Neon DotProd
version is faster than the SVE implementation.
Change-Id: I0c5712ba2735a2879a0aa3a9a52980032fddc7a6
Enable Neon Dotprod 8-bit high bitdepth implementation for MSE
function as it is now not called with bit depth 10 or 12.
Bug: webm:1819
Change-Id: I9d1d506401aa0523fba2d8ea4978dc00fdacbb95
Instead of always calling highbd_get_block_variance_fn with bit depth
8 use the macroblock's bit depth.
Bug: webm:1819
Change-Id: Ib4b19703384e897ee9ffeef73a11a8af2d262558
For svc with no inter-layer prediction: reset
the RC and force max_qp on all spatial layers
on scene/slide changes. In the current code it was only
reset on current spatial layer because it was assumed
we can predict off lower spatial layer to avoid
prediction across scene change. But this does not apply
when inter-layer prediction is off on delta frames.
Also reset only up to current temporal layer.
Because of the hierarchical prediction structure
only the lower temporal layers need the RC to be reset.
This helps to reduce excessive frame drops for the
full_superframe_drop mode.
Change-Id: I76925681850b82aa7fff7f9b1c1a0a605cf3cf3b
for VPX_CODEC_USE_PSNR. This clears a clang-tidy warning. vpx_encoder.h
exports vpx_codec.h so it shouldn't be necessary.
Change-Id: I863b6f8689eeef59cd9eadf3cdc177247a0653f8
This can happen in the setting of the frame
target size for delta frames, for non-CBR mode
(end_usage != USAGE_STREAM_FROM_SERVER) and with
temporal layers.
In calc_pframe_target_size(): the percent_high
(factor to adjust the target_size) may end up dividing
bits_off_target by total_byte_count. The total_byte_count
is defined per layer for temporal layers, so it will be zero
for delta frames if the enhancement layer has never been
encoded before.
Since percent_high is capped to over_shoot_pct, the proposed
fix is to apply this cap if total_byte_count is zero.
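A sketch of the guard, under stated assumptions: the function name is
hypothetical, and the ratio 100 * bits_off_target / (8 *
total_byte_count) is an illustrative form of the adjustment factor, not
a quote of calc_pframe_target_size().

```c
#include <assert.h>
#include <stdint.h>

/* Returns the upward adjustment percentage, capped to over_shoot_pct.
 * When no bytes have been produced yet for this layer, the division is
 * skipped and the cap is used directly. */
static int percent_high_sketch(int64_t bits_off_target,
                               int64_t total_byte_count,
                               int over_shoot_pct) {
  int64_t pct;
  assert(over_shoot_pct >= 0 && total_byte_count >= 0);
  if (total_byte_count == 0) return over_shoot_pct;
  pct = (100 * bits_off_target) / (total_byte_count * 8);
  if (pct > over_shoot_pct) pct = over_shoot_pct;
  if (pct < 0) pct = 0;
  return (int)pct;
}
```

Using the cap for the zero case gives the same result the division
would approach as total_byte_count shrinks, without the fault.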
Also this CL fixes a few integer overflow issues in setting
the layer target_bandwidth, the rescale function, and in
setting target_bits_per_mb.
A unit test that triggers this issue was added by Wan-Teh.
Bug: chromium:1514684
Change-Id: I091158e720ece75d7ab9b7c4d18d30a5783102ab
Add header file containing helper functions to make use of SVE
dot-product intrinsics via the Neon-SVE bridge.
Change-Id: I6cd198f8202559672817cbc19f890db35c03d3ff
GCC already does not allow implicit vector type conversions by default,
add -flax-vector-conversions=none to Clang builds to have the same
behavior.
Change-Id: I9d1adb836377077cf48818c80fe71025e2d2bdc7
Added a unit test which triggers the data race in the
bug below, when only C code is forced.
The data race is between the loopfilter and variance
computation from generate_psnr_packet calculation.
Proposed fix is to move the wait for loopfilter thread to
finish up before entering generate_psnr_packet().
Bug: b/266833179.
Change-Id: Id2871c53274be0f404e65601c9a5c98aaead0c72
Equivalent to the change to av1_change_config() in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/182413.
Because we call alloc_compressor_data() only if
cm->mi_alloc_size < new_mi_size, this change won't cause
alloc_compressor_data() to be called unnecessarily, unlike the libaom
bug https://crbug.com/aomedia/3526.
Bug: b:317105128
Change-Id: I8a772a1d5c4766846641a6d541a6d861bf76c60f
The VpxTpl* structs defined in vpx_tpl.h are only used by the external
rate control library. Add a VPX_TPL_ABI_VERSION component to
VPX_EXT_RATECTRL_ABI_VERSION and remove the VPX_TPL_ABI_VERSION
component from VPX_ENCODER_ABI_VERSION.
The current value of VPX_TPL_ABI_VERSION is 2. It is subtracted from
VPX_EXT_RATECTRL_ABI_VERSION and added to VPX_ENCODER_ABI_VERSION so
that the values of those two macros stay the same.
Add a note to explain why VPX_ENCODER_ABI_VERSION has a
VPX_EXT_RATECTRL_ABI_VERSION component.
Change-Id: I680b8522dc04328cd51df6de590fdec75ca88ae8
Commit db83435 introduced support for configuring for *-darwin23-gcc.
However configuring for *-darwin23-gcc does not currently add the
`-arch` flag to CFLAGS/LDFLAGS, so correct this here.
Change-Id: Ieeda1a5039ad40590dfcdcc6ba615a1d1697d54d
Change if to assertion in vp9_extrc_get_encodeframe_decision
Clarify comment for VP9E_ENABLE_EXTERNAL_RC_TPL that
rc_type | VPX_RC_QP must be non zero for this control to work.
Change-Id: I2c54cf7eda1f0f12f4ff7ac929e8e6a1fdd2215d
Performance optimization. get_msb utilizes
the compiler/platform specific most significant bit
operator.
Note: 32 bit unsigned assumed, like all get_msb implementations do.
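A sketch of the two approaches, assuming a GCC-compatible compiler for
the fast path. The function names are hypothetical and the code is not
copied from libvpx; on other compilers the builtin version falls back
to the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Portable version: index of the highest set bit, found by shifting. */
static int get_msb_portable(uint32_t n) {
  int log = 0;
  assert(n != 0); /* Undefined for zero, as with the builtin. */
  while (n >>= 1) ++log;
  return log;
}

/* Builtin version: count leading zeros and flip into a bit index.
 * Assumes unsigned int is 32 bits wide. */
static int get_msb_builtin(uint32_t n) {
  assert(n != 0);
#if defined(__GNUC__)
  return 31 ^ __builtin_clz(n);
#else
  return get_msb_portable(n);
#endif
}
```

The builtin typically compiles to a single instruction, whereas the
loop runs once per bit below the highest set bit.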
Change-Id: Ib013ad24aa0ea845efeb52aacd448b067edf91da
Explain why the encoder init functions cannot call update_error_state().
In vp8/vp8_cx_iface.c, this comment should have been added in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4506609.
Rewrite update_error_state() in vp8/vp8_cx_iface.c to look like the
versions in vp9/vp9_cx_iface.c and av1/av1_cx_iface.c (in libaom).
Change-Id: I3f153d67b8c549ca5ac8ea0cfbcaad4ae705c8e6
After a longjmp() call in vp8e_encode(), call update_error_state() so
that we return the error code and error detail set by the
vpx_internal_error() call.
Change-Id: I1f2428eb1b1f61e46c02604e16a5d44dcf162479
The function convolve8_4_usdot contains a comment relating to the
SDOT implementation of convolve8, which requires addition of a
correction constant to account for range clamp of the input values.
This is not performed in the i8mm USDOT implementation - so remove the
comment.
Also add some const qualifiers to function arguments.
Change-Id: I10aff560d20403897f708ee293bf873be9c35761
Fix the following clang-tidy misc-include-cleaner warnings:
vp9/encoder/vp9_encoder.c:
no header providing "vp9_is_valid_scale" is directly included
no header providing "VPX_CODEC_CORRUPT_FRAME" is directly included
vp9/vp9_cx_iface.c:
no header providing "valid_ref_frame_size" is directly included
Change-Id: I20e846f5b14c42c72aaefec0718b4ae9c7eea44a
Issue explanation:
The unit test calls set_config function twice after encoding the
first frame.
The first call of set_config reduces frame width, but is still within
half of the first frame.
The second call reduces the frame width even more, making it less than
half of the first frame. According to the encoder logic, there are then
no valid reference frames, and this frame should be set as a forced
keyframe. This leads to a null pointer access in scale_factors later.
Solution:
To ensure correct detection of a forced key frame,
we need to update the frame width and height only when the actual
encoding is performed.
Bug: b/311985118
Change-Id: Ie2cd3b760d4a4b399845693d7421c4eb11a12775
This change fixed a bug revealed by b/311294795.
In simple motion search, the reference buffer pointer needs to be
restored after the search. Otherwise, it causes problems while the
reference frame scaling happens. This CL fixes the bug.
Bug: b/311294795
Change-Id: I093722d5888de3cc6a6542de82a6ec9d601f897d
no header providing "CONFIG_VP9_HIGHBITDEPTH" is directly included
no header providing "VPX_BITS_8" is directly included
Change-Id: Ie6d78c79ab462501417f2b451bbe808a1fdce931
Use vpx_sse and vpx_highbd_sse instead of vpx_mse16x16 and
vpx_highbd_8_mse16x16 respectively to compute SSE for PSNR
calculations. This solves an issue whereby vpx_highbd_8_mse16x16
was being used to calculate SSE for 10- and 12-bit input.
This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063
by Jonathan Wright <jonathan.wright@arm.com>.
Bug: webm:1819
Change-Id: I37e3ac72835e67ccb44ac89a4ed16df62c2169a7
vpx/vpx_integer.h is clearly intended as the facade header for the
Standard C Library headers <stddef.h>, <inttypes.h>, and <stdint.h>.
It is reasonable to expect that vpx/vpx_decoder.h and vpx/vpx_encoder.h
should provide the symbols from vpx/vpx_codec.h.
Change-Id: I220797e63b2efc3dd9e2ac197fe2f918bf80d247
This change fixed a corner case bug revealed by b/311394513.
During frame scaling, vpx_highbd_convolve8() and vpx_scaled_2d()
require that both x_step_q4 and y_step_q4 be less than or equal to a
defined value. Otherwise, the encoder needs to call
vp9_scale_and_extend_frame_nonnormative(), which supports arbitrary
scaling.
The fix was made in both the LBD and HBD functions.
Bug: b/311394513
Change-Id: Id0d34e7910ec98859030ef968ac19331488046d4
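The step check described above might look like the sketch below. Several things here are assumptions: the helper names are hypothetical, the step is taken to be the source-to-destination ratio in 1/16-sample units (as the _q4 suffix suggests), and the limit is passed in as a parameter because the message does not state its value.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed definition: scaling step in 1/16-sample units. */
static int example_step_q4(int src_size, int dst_size) {
  assert(src_size > 0 && dst_size > 0);
  return 16 * src_size / dst_size;
}

/* Hypothetical helper: true when either step exceeds the limit, meaning the
 * convolve-based scaler cannot be used and a scaler that supports arbitrary
 * ratios is needed instead. */
static bool example_needs_arbitrary_scaler(int src_w, int src_h, int dst_w,
                                           int dst_h, int max_step_q4) {
  return example_step_q4(src_w, dst_w) > max_step_q4 ||
         example_step_q4(src_h, dst_h) > max_step_q4;
}
```

Both dimensions are checked, so a frame that is within the limit horizontally but not vertically still takes the arbitrary-scaling path.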
Need to set skip_recode properly so that
vp9_encode_block_intra() can work properly when it is
called by block_rd_txfm(). We cannot skip "recode" because
it is still at the rd search stage.
Bug: b/310340241
Change-Id: I7d7600ef72addd341636549c2dad1868ad90e1cb
Define the VPX_DL_REALTIME, VPX_DL_GOOD_QUALITY, and VPX_DL_BEST_QUALITY
macros as unsigned long, because the deadline parameter of
vpx_codec_encode() is of the unsigned long type. This enables C++
templates to deduce the unsigned long type from these macros.
Change-Id: I2173e3bbf5e15c84c11843790df93a497a35ed7d
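The effect of the literal suffix on the deduced type can be illustrated in C11 with _Generic, used here as a stand-in for the C++ template deduction the message mentions. The EXAMPLE_ macros are hypothetical and do not reproduce the library's definitions.

```c
#include <assert.h>

/* Hypothetical macros: the same value with and without the ul suffix. */
#define EXAMPLE_DL_PLAIN 1   /* has type int */
#define EXAMPLE_DL_TYPED 1ul /* has type unsigned long */

/* Evaluates to 1 when the expression has type unsigned long, else 0. */
#define EXAMPLE_IS_UNSIGNED_LONG(x) _Generic((x), unsigned long: 1, default: 0)

/* Compile-time checks of the two cases. */
static_assert(EXAMPLE_IS_UNSIGNED_LONG(EXAMPLE_DL_TYPED),
              "suffixed literal is unsigned long");
static_assert(!EXAMPLE_IS_UNSIGNED_LONG(EXAMPLE_DL_PLAIN),
              "unsuffixed literal is int");
```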
fseeko and ftello are available on Android only from API level 24. Add
the needed guards for these functions.
Suggested by Yifan Yang.
Change-Id: I3a6721d31e1d961ab10b434ea6e92959bd5a70ab
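A guard of this kind could be written roughly as below. This is a sketch for a POSIX system, not the guard added by the change: the example_ wrapper names are hypothetical, and falling back to fseek/ftell below API level 24 is an assumption about what a reasonable fallback looks like.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* ask for fseeko/ftello declarations */
#endif
#include <assert.h>
#include <stdio.h>

#if defined(__ANDROID__) && defined(__ANDROID_API__) && __ANDROID_API__ < 24
/* Assumed fallback: use the long-based functions on older API levels. */
#define example_fseek(f, off, whence) fseek((f), (long)(off), (whence))
#define example_ftell(f) ((long long)ftell(f))
#else
#define example_fseek(f, off, whence) fseeko((f), (off), (whence))
#define example_ftell(f) ((long long)ftello(f))
#endif

/* Hypothetical helper: returns the file size, or -1 on error, and rewinds
 * the stream to the start. */
static long long example_file_size(FILE *f) {
  long long size;
  assert(f != NULL);
  if (example_fseek(f, 0, SEEK_END) != 0) return -1;
  size = example_ftell(f);
  if (example_fseek(f, 0, SEEK_SET) != 0) return -1;
  return size;
}
```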
The changes in this CL show that both the VP8 and VP9 implementations of
the decode function eventually discard the deadline parameter. Change
the code to ignore the deadline parameter in vpx_codec_decode() without
passing it to the decode function, and document that the deadline
parameter is ignored and 0 should be passed.
Change-Id: Ia977e16cdbdf97901207aa2d749887980137c4c0
Since the reference frame is already scaled, do not scale the offsets.
BUG: b/311489136, b/312656387
Change-Id: Ib346242e7ec8c4d3ed26668fa4094271218278ed
Add an Armv8.0 MLA Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic2c3cb307b95964cd0ba86f1c42eece3a8ab7cf4
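For orientation, a scalar version of a 4-tap horizontal convolution might look like the sketch below. It is not the Neon code from this change. It assumes an 8-entry kernel whose outer four taps are zero, 7-bit filter precision (taps summing to 128), and that tap index 3 aligns with the current sample; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t example_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference. src must have 1 readable sample before
 * index 0 and 2 readable samples after index w - 1. Only the middle four
 * taps, filter[2..5], are applied. */
static void example_convolve4_horiz(const uint8_t *src, uint8_t *dst, int w,
                                    const int16_t *filter) {
  int x, k;
  assert(filter[0] == 0 && filter[1] == 0 && filter[6] == 0 && filter[7] == 0);
  for (x = 0; x < w; ++x) {
    int sum = 0;
    for (k = 2; k <= 5; ++k) sum += filter[k] * src[x + k - 3];
    /* Round and drop the assumed 7 bits of filter precision. */
    dst[x] = example_clip_pixel((sum + 64) >> 7);
  }
}
```

Skipping the four zero taps halves the multiply-accumulate work per output sample compared with a full 8-tap loop, which is the saving a specialised path targets.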
Add an Armv8.0 MLA Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I027eaf2d1bb9711c2217cc8aa6b1e379d3e66b26
The deadline parameter of vpx_codec_encode() is of the unsigned long
type. The cpplint runtime/int check and the clang-tidy
google-runtime-int warn about the use of the unsigned long type. Adding
a type alias works around this issue.
Note: vpx_codec_decode() also has a deadline parameter, but it is of the
long type. So unfortunately this type alias cannot simply be named
vpx_codec_deadline_t and the name must suggest it is encoder-specific.
Change-Id: I27b6b25730b620b328422ec3f91e63fdc55b377a
Add an Armv8.6 USDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I455e5a94bdcea1358025bd8e4d4c8c62e373aa5d
Add an Armv8.6 USDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I8f7633d9852ebfe8feb9b4a055715f849cccf297
Add an Armv8.4 SDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I5116d10ddb371ac2cf302ef905d06f2140dc7600
Add an Armv8.4 SDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ib396681b3f7b8b0eeba94381fbe33a06cf7b4a13
Add an Armv8.6 USDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic893b25541e3317c5d5c270c338f868f080aed7c
Add an Armv8.4 SDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I3eb00b5a34f5676b68bda60a2a29be56e3d7d0cd
vpx_get_filter_taps() currently reports either 8-tap or 2-tap.
However, many 8-tap filters are actually 0-padded, resulting in a
lot of redundant work (multiplying by, and adding, 0) when processing
using an 8-tap convolution function. In preparation for adding 2- and
4-tap SIMD implementations for the convolution paths, make the filter
size reporting more granular, stripping any 0 padding. Filter sizes
can now be reported as 2-, 4-, 6- or 8-tap.
Change-Id: I100133aac7173134af34b918c9ad3007d98d6060
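One way to strip the zero padding when reporting the filter size is sketched below. It assumes an 8-entry kernel that is padded symmetrically; the function name is hypothetical and this is not presented as the body of vpx_get_filter_taps().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: effective tap count of an 8-entry kernel, checking
 * the outermost pair of taps first and working inwards. */
static int example_filter_taps(const int16_t *filter) {
  assert(filter != NULL);
  if (filter[0] | filter[7]) return 8;
  if (filter[1] | filter[6]) return 6;
  if (filter[2] | filter[5]) return 4;
  return 2;
}
```

A caller can then dispatch on the returned value to a 2-, 4-, 6- or 8-tap convolution routine instead of always running the 8-tap one.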
Delete redundant transpose/permute code in the Neon dot-product
vertical convolution paths. Variable values were assigned but never
used before subsequent assignment.
Change-Id: I15b29d0c993f56599e0d18ac1d5787e6385d2a3a
2023-11-23 13:58:42 +00:00
303 changed files with 9422 additions and 6539 deletions