Before v1.15.0: c=10, a=1, r=0
Rule #3: source code has changed, increment r:
r=1
Rule #4: interfaces were removed in vpx_tpl.h, set r=0, increment c:
c=11, r=0
Rule #5: no interfaces have been added
Rule #6: interfaces were removed in vpx_tpl.h, set a=0:
a=0
After release: c=11, a=0, r=0
major = c-a = 11
minor = a = 0
patch = r = 0
Bug: webm:384672478
Change-Id: I2e70e7e35c64ece32eaf1dc5625640965483f9b9
Possible fix for issue below. It was only disabled
for screen in a previous change, but we force it off
always to check if it clears the issue.
The speed feature disabled is only used for 3 spatial
layers and at least 2 temporal. The impact on speed is
expected to be small, ~2%, so ok to disable for now and
see if it clears the issue.
Bug: 366146260
Change-Id: If7af006425e1e0ef297b9d6466507ea4c90ddb6f
(cherry picked from commit 09b3d5fc5aa48752f95f4c0c37b0bd4ff55c0ba1)
Integer overflow in encode_frame_to_data_rate()
for the update:
lc->total_target_vs_actual += bits_off_for_this_layer
Fix is to use int64_t for total_target_vs_actual.
Bug: chromium:368114043
Change-Id: I9a01e1a69e26ae748e8ae23d9e1287431510388d
Divide by 3 instead of multiple by 3, in comparison of
lrc->avg_frame_bandwidth vd lrc->last_avg_frame_bandwidth,
in two functions for reset rc.
Small loss in precision, so acceptable.
Similar change to:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5698570
Bug: chromium:367892770
Change-Id: Ia9ef09a9f6beba930fedd496407cfa7057e39336
PF_ARM_SVE_INSTRUCTIONS_AVAILABLE and PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
are available in WinSDK 10.0.26100 and recent versions of mingw-w64.
Based on a patch by Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333611.html
Change-Id: I34b2341a559f95aa400e84d709f3eb36da5dbb7b
There's no direct processor feature constant for I8MM alone, but
there is a flag for SVE-I8MM (added in WinSDK 10.0.26100 and
recent versions of mingw-w64). If SVE-I8MM is available, we can
assume that I8MM is available.
While HW supporting these features isn't yet commonly running
Windows, this at least allows detecting and running the I8MM codepaths
in Windows builds in Wine (possibly running in QEMU).
Based on patch from Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333609.html
Change-Id: I77117bee8516924fddcdecccae8bab3cf5beed96
The program requires a minimum of 2 parameters. Previously the tool
would crash if only one input file was given.
Bug: webm:365481206
Change-Id: I875d81b2db4fcc4338061c03b23bb51b0aad58e4
Possible fix for issue below. The speed feature disabled
is only used for 3 spatial layers and at least 2 temporal.
The impact on speed is expected to be small, ~2%, so ok
to disable for now and see if it clears the issue.
Bug: 366146260
Change-Id: I94ab991d583cc2ce758db337abbbb463a65f0767
The wrapped storage must exist for the duration of the vpx_image_t
allocation.
Bug: aomedia:363806063
Change-Id: Ic6b79a56b6c07776222d1767490d873d7408ced0
The default template for https://issues.webmproject.org/ is a public bug
report. Security issues can be reported securely using the 'Security
report' template.
Change-Id: Ic7144a6c7a144772b78852d1415a51a570c79d50
and examples/resize_util.c. These functions were added in:
3cd37dfeb Adds a non-normative resize library to vp9 encoder
but never used meaningfully in the library.
This mirrors the change in libaom:
d10029bb4b Restore function prototype of av1_resize_frame420
except that vp9_resize_frame420() was never exported in the shared
library, so can be deleted along with the rest.
The reasoning for removing examples/resize_util.c is the same: it is not
useful and examples should use the public functions of the libvpx
library.
Change-Id: I386080d3f1a3ef81dfc87fcdf5bbdf459d996f03
Added key frame temporal filtering. Enabled it for VOD encoding
with encoder speed < 2.
Minor improvement in prediction.
Added the restriction of using no more than "arnr_max_frames"
frames for temporal filtering.
Key frame temporal filtering is turned off by default for now. To
enable it, set "--enable-keyframe-filtering=1"
Borg result with "--enable-keyframe-filtering=1"
avg_psnr: ovr_psnr: ssim: vmaf:
hdres2: -0.762 -0.863 -0.903 -0.680
midres2: -0.813 -0.753 -0.757 -0.743
lowres2: -0.492 -0.598 -0.737 -0.881
The impact on the encoder time is minimal.
Change-Id: If6abea3e21efcb96f1978cd9dfaa742c40dc2a56
`#if defined(__GNUC__)` is enough if a specific version isn't being
looked for.
Bug: aomedia:356832974
Change-Id: I3fcbecf9d547c6a2d89d7b5456e83ee08ddc6f5e
Applied 12-tap filter to temporal filter prediction for better
result. Improved the calculation of frames to be used in temporal
filtering.
The overall PSNR gain was -0.511% (lowres), -0.338% (midres), and
-0.288% (hdres).
Encoder time was increased by ~2%, which would be largely reduced
by the following SIMD optimization.
Change-Id: If3ece30f1614beadc99ebf6b4dc3f2d988d3bdb9
Move the saturate_cast_double_to_int() function in
vp8/encoder/firstpass.c to vpx_dsp/vpx_dsp_common.h so that it can be
used in other files.
Change-Id: I748fea969520542dca68d7a46500d3272f22e16f
to INT_MAX. This matches calc_iframe_target_size() in VP8
(http://crbug.com/1473473). If rc->avg_frame_bandwidth is large even
small kf_boost values will overflow an int.
Change-Id: Iaca5b47fe97793ae70930b3b2c2f42725d2c96fb
This fixes a build error seen in gcc 15:
3b63004 mkvparser/mkvparser.cc: add missing <cstdint> include
Bug: aomedia:357622679
Change-Id: I6c4a1795d189f9993d4f2c5c9f0375912bc58f0c
Rely on the -I or -system compiler option to find "gtest/gtest.h". This
makes it easier to build our tests against a copy of gtest outside the
libvpx source tree.
Bug: webm:42330726
Change-Id: I3b189c6345e13b36b236d1eedc6ee091bfa71f48
Fixes a 'Result of operation is garbage or undefined' static analysis
report (seen with clang-14) related to left shifting a negative value.
Bug: b:328632178
Change-Id: I18f0100eca0deac1cac9be0c7e848685d2911fb3
Motion vectors are now clamped in
vp8_find_best_sub_pixel_step_iteratively, vp8_find_best_sub_pixel_step,
vp8_find_best_half_pixel_step, vp8_full_search_sad,
vp8_refining_search_sadx4 and vp8_refining_search_sad_c (the rtcd for
other optimizations are redirects to vp8_refining_search_sadx4).
The difference of valid motion vectors may still go beyond the range of
the MVcount array, however, so additional checks are added to
rd_update_mvcount() and update_mvcount().
Note the test source and settings (speed 1 and GOOD quality mode) come
from the issue report; additional coverage is added for realtime. The
realtime path does not trigger the error without the fix, but as it's
similar to the rd path, the same clamp is done to be safe.
Fixes:
vp8/encoder/rdopt.c:1579:5: runtime error: index 17467 out of bounds for
type 'unsigned int[2047]'
Bug: oss-fuzz:69906
Change-Id: Ia8bd087cfe4475ab09ba711ed806fbcbaa72e552
cpi->output_framerate may be as large as 10M. Previously this would
cause kf_boost to be ~20M which would overflow an int when multiplied by
values in kf_boost_qadjustment[].
Fixes:
vp8/encoder/ratectrl.c:340:25: runtime error: signed integer overflow:
19999984 * 220 cannot be represented in type 'int'
Bug: oss-fuzz:69100
Change-Id: I2d77c9d2912412f6265f6a8dc0e6b361b63b8242
The assignment "cpi->output_framerate = cpi->framerate;" after the
vp8_new_framerate() call is not needed, because vp8_new_framerate() sets
cpi->framerate and cpi->output_framerate to the same value.
Change-Id: I4de97b43957142d658e0c08ecfc6628844ce453a
+ fix an additional double -> int overflow warning (chrome's fuzzers do
not have the float-cast-overflow sanitizer enabled)
Bug: chromium:352414650
Change-Id: I634bb421a74236eac434df138ed71dadf197596a
The only real change is in the initialization of frame_window. The (int)
cast is moved to the result of VPXMIN(), so that
cpi->twopass.total_stats.count - cpi->common.current_video_frame is
calculated in double.
Change-Id: Ia80f24614af7184b37cfdd99d8a8b1639460f273
rc->avg_frame_bandwidth is capped at INT_MAX. Rather than multiply the
value by 3, divide projected_frame_size by 3 to avoid the overflow.
Without rounding this differs slightly from the original, but loss of
precision is acceptable in this case.
Bug: chromium:348440590
Change-Id: Id5960825c79d7c764d257e9b4bd0a1de751878d8
Replace the VERSION_STRING_NOSP macro by the public API function
vpx_codec_version_str().
Treat vpx_version.h as an absolutely internal header of the libvpx
library.
Change-Id: I86ba8548a62adae91ae7f5caad98169707f3fc64
This change happens in define_gf_group().
Since this part is not critical for ext_ratectrl,
turn off the error reporting for now.
Change-Id: Ie74aa06a116edb8c5d9e7b29cadbd366232fbc1d
The compare_fp_stats() and compare_fp_stats_md5() functions are not used
when CONFIG_REALTIME_ONLY is equal to 1. Define these functions only if
CONFIG_REALTIME_ONLY is 0 to avoid the -Wunused-function warnings.
Change-Id: Iaae208f67708cfaeee5304b0320ebce63c863f96
Allow the TPL group to use up to 3 reference frames from the
previous GOP. This slightly changes the coding stats in the range
of <0.1%.
STATS_CHANGED
Change-Id: Ieb4e948a783bf8ef9ca78717d56ff750f3f795a4
Fix double-to-int cast overflows in vp8 code caused by setting the
target bitrate to the maximum value (2000000).
Tested: Build libvpx with UndefinedBehaviorSanitizer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Note: This is essentially the VP8 version of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191361.
Bug: 349440066
Change-Id: Ia43e1aad8fcab60ace49da960579081c2c3a5445
Fix the following UBSan integer errors in test_decode():
vpxenc.c:1589:57: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
vpxenc.c:1590:58: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Bug: 349440066
Change-Id: Ice2f0e7176ffec664856559e2c02bd51113c4d74
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --min-gf-interval=4 \
--max-gf-interval=22 --arnr-maxframes=7 --arnr-strength=5 \
--kf-max-dist=9999 --aq-mode=0 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50
This unsigned integer overflow seems to be caused by
g_timebase.num=1000000.
Note: This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191401.
Bug: 349440066
Change-Id: I924fa9c653400764dd7320938b88b4ea40f38172
This patch fixes some additional cases where under extreme conditions
some of the VBR adjustment variables can wrap.
As this happens on a per frame level the extra saturation checks should
not be an issue for performance.
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190521https://aomedia-review.googlesource.com/c/aom/+/190888
Change-Id: I87c4ecca10f39767002f7d90d0f43b19c7150832
Current code was disallowing scene detection for
speeds >= 8, to avoid any encode_time increase
(see comment in the code).
But we can expect the cost to be small even at speed 8,9,
and that concern on encode_time was from some time ago
before 8 and 9 were further optimized. And this is
needed for content with scene changes (see issue attached).
So allow scene detection now for all RTC speed settings (speed >= 5).
Bug: b/346846607
Change-Id: I678dbb88ff1399ed89b2bf9770ae9427e3044fc4
The last reference to the flag in configure was removed in:
fad70a358 Remove -fno-strict-aliasing flag
The library should be expected to function without this flag; it's built
and tested elsewhere without it.
Bug: webm:570, webm:603
Change-Id: Icf85fd9bd5c9cb0c81d6eecf10fba07807f48b4a
The GNU Assembler was removed in r24. clang's internal assembler works,
but `-c` is necessary to avoid linking.
Bug: webm:1856
Change-Id: I61f80cf78657d3b71d5e73c5b2510575533ca5ea
Move the function into define_gf_group().
define_gf_group() has a lot of settings that might cause
performance drop if skipped.
Imitate define_gf_group_structure()'s behavior which add
an extra overlay frame at the end of gf_group whenever
alt_ref is used.
After this change, we can feed the baseline decision through
webmrc and get the same result as baseline.
This CL is tested with city_cif.yuv using ffmpeg
BUG = b/345528565
Change-Id: Ib61f0a0a72251f8662fb4072e0cfd7f456a243b3
Quiets some spurious -Wmaybe-uninitialized warnings with gcc 14.1.0.
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:464:5:
../tools/tiny_ssim.c:37:12: warning: 'v[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:462:5:
../tools/tiny_ssim.c:37:12: warning: 'u[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error',
inlined from 'main' at ../tools/tiny_ssim.c:461:5:
../tools/tiny_ssim.c:61:12: warning: 'y[0]' may be used uninitialized
[-Wmaybe-uninitialized]
61 | if (orig == NULL || recon == NULL) {
To reduce confusion, read_input_file() is changed to return an int as
previously it would only return (size_t)-1/0/1 (and now returns 0/1).
Change-Id: I2344048ecc2bd233891ffcef08002ee98d6d262a
The default behavior changed in:
148d1085f Refactor and extend run-time CPU feature detection on Arm
This fixes build errors with these targets as there is no runtime cpu
detection defined for them.
Change-Id: Ie6b0bae1fc3e244d7dfcc823f60c3e466ccade79
Both VP8 and VP9 internally cap the target bitrate to the smaller of the
uncompressed bitrate and 1000000 kilobits per second.
Change-Id: I4008ce09b5e709e75111800341d015e41eb1da42
These change fixes issues that can occur if the user specifies a very
high target data rate or rate per frame.
Fixes some issue with overflow of int variables used to hold bitrate
values (rate per second, rate per frame etc).
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190381https://aomedia-review.googlesource.com/c/aom/+/190462
All the changes were ported to VP9. For VP8, only the new type of
cpi->bytes (equivalent to ppi->total_bytes in libaom) was ported.
Change-Id: I438dd46efd5a134389b893ffae1f8a2381207906
2024-05-21 v1.14.1 "Venetian Duck"
This release includes enhancements and bug fixes.
- Upgrading:
This release is ABI compatible with the previous release.
- Enhancement:
Improved the detection of compiler support for AArch64 extensions,
particularly SVE.
Added vpx_codec_get_global_headers() support for VP9.
- Bug fixes:
Added buffer bounds checks to vpx_writer and vpx_write_bit_buffer.
Fix to GetSegmentationData() crash in aq_mode=0 for RTC rate control.
Fix to alloc for row_base_thresh_freq_fac.
Free row mt memory before freeing cpi->tile_data.
Fix to buffer alloc for vp9_bitstream_worker_data.
Fix to VP8 race issue for multi-thread with pnsr_calc.
Fix to uv width/height in vp9_scale_and_extend_frame_ssse3.
Fix to integer division by zero and overflow in calc_pframe_target_size().
Fix to integer overflow in vpx_img_alloc() & vpx_img_wrap()(CVE-2024-5197).
Fix to UBSan error in vp9_rc_update_framerate().
Fix to UBSan errors in vp8_new_framerate().
Fix to integer overflow in vp8 encodeframe.c.
Handle EINTR from sem_wait().
Change-Id: Ic5e274fdc35c9141591a65e825bf012d2cca3caa
I introduced this bug in commit 2e32276:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5446333
I changed the line
stride_in_bytes = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
to three lines:
s = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
if (s > INT_MAX) goto fail;
stride_in_bytes = (int)s;
But I didn't realize that `s` is used later in the calculation of
alloc_size.
As a quick fix, undo the effect of s * 2 for high bit depths after `s`
has been assigned to stride_in_bytes.
Bug: chromium:332382766
Change-Id: I53fbf405555645ab1d7254d31aadabe4f426be8c
(cherry picked from commit 74c70af016)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188962.
stride_align is documented to be the "alignment, in bytes, of each row
in the image (stride)."
Change-Id: I2184b50dc3607611f47719319fa5adb3adcef2fd
(cherry picked from commit 7d37ffacc6)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188823.
Impose maximum values on the input parameters so that we can perform
arithmetic operations without worrying about overflows.
Also change the VpxImageTest.VpxImgAllocHugeWidth test to write to the
first and last samples in the first row of the Y plane, so that the test
will crash if there is unsigned integer overflow in the calculation of
stride_in_bytes.
Bug: chromium:332382766
Change-Id: I54cec6c9e26377abaa8a991042ba277ff70afdf3
(cherry picked from commit 06af417e79)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188761.
Fix unsigned integer overflows in the calculation of stride_in_bytes in
img_alloc_helper() when d_w is huge.
Change the type of stride_in_bytes from unsigned int to int because it
will be assigned to img->stride[VPX_PLANE_Y], which is of the int type.
Test:
. ../libvpx/tools/set_analyzer_env.sh integer
../libvpx/configure --enable-debug --disable-optimizations
make -j
./test_libvpx --gtest_filter=VpxImageTest.VpxImgAllocHugeWidth
Bug: chromium:332382766
Change-Id: I3b39d78f61c7255e10cbf72ba2f4975425a05a82
(cherry picked from commit 2e32276277)
Ported from test/aom_image_test.cc in libaom commit 04d6253.
Change-Id: I56478d0a5603cfb5b65e644add0918387ff69a00
(cherry picked from commit 3dbab0e664)
If fmt is VPX_IMG_FMT_NONE, currently img_alloc_helper() allocates a
single plane because VPX_IMG_FMT_NONE (0) is not a planar format (the
VPX_IMG_FMT_PLANAR bit is not set in VPX_IMG_FMT_NONE).
Although this seems correct, the problem is that most of the code in
libvpx assumes planar formats and is likely to dereference a null
pointer when it uses img->planes[1]. Also, VPX_IMG_FMT_NONE isn't really
a valid image format. So it is safer to make img_alloc_helper() fail if
fmt is VPX_IMG_FMT_NONE.
Change-Id: I05b47f4b5eceb631a02384b2cce1c2f6fdca8673
(cherry picked from commit d3a946de8c)
The integer overflow happens
in vp9_calc_iframe_target_size_one_pass_cbr(), when
calculating the target size for L1T3 encoding.
The input target bitrate(kbps) is very large, so it gets set
to INT_MAX (before being multiplied by 1000 to convert to bps),
and avg_frame_bandwidth is then set to (INT_MAX / lc->framerate),
which when multipled by (16 + kf_boost) can exceed INT_MAX.
Fix is to cast the operands to int64_t and final result to int.
Bug: chromium:340918567
Change-Id: Ic00094b22c1f12ca988c0cb1fcaed473e1f8ed2b
In multi-threaded scenario, when the bitstream
buffer allocated is insufficient, the main thread
called 'longjmp' without waiting for the completion
of workers. In this patch, 'longjmp' is called by
the main thread after joining other worker threads.
This resolves the assertion failure as reported in
Bug: webm:1847
Bug: webm:1844
Change-Id: I399c76087b65e7b8d9a9fa4f12d784408243d648
(cherry picked from commit 611d9ba0a5)
Add the `size` and `error` members to the vpx_write_bit_buffer struct.
Add the vpx_wb_init() and vpx_wb_has_error() functions.
Instances of the vpx_write_bit_buffer struct are only allocated in the
vp9_pack_bitstream() function. So vp9_pack_bitstream() is the only
function outside vpx_dsp/bitwriter_buffer.* that needs updating.
This CL completes the work of adding output buffer bounds checks to
vp9/encoder/vp9_bitstream.c.
Bug: webm:1844
Change-Id: I6b362be572852ee51d96023b35bfb334faada7e1
(cherry picked from commit d790001fd5)
In the vpx_writer struct, change the buffer_end field to the size field.
Change vpx_stop_encode() to return true on success, false on failure
(output buffer full).
In write_compressed_header(), remove the assertion
assert(header_bc.pos <= 0xffff). The caller (vp9_pack_bitstream()) will
check that condition.
In vp9_pack_bitstream(), the variable "first_part_size" is renamed
"compressed_hdr_size".
Bug: webm:1844
Change-Id: I4ed6ab905a707ad44d875e53036d5a42523a65d0
(cherry picked from commit 73703c188b)
Fixes a static analysis warning:
Value stored to 'data_size' is never read
Bug: webm:1844
Change-Id: Ia27181b1051bb2c3a6bc4a4c2549df8b0525e889
(cherry picked from commit 9f73377821)
The buffer_end field will allow bounds checking when vpx_writer writes
to the output buffer. This CL sets up the plumbing to pass the output
buffer size from vp9_pack_bitstream() to vpx_start_encode(), which
initializes the vpx_writer struct. vpx_writer doesn't use the output
buffer size in bounds checks yet, but the code in vp9_bitstream.c does.
Bug: webm:1844
Change-Id: I995e469ab453c02d740f54b46e0b08c7f2eb1a2e
(cherry picked from commit e387187438)
Set up the plumbing to pass the size of the output buffer `dest` to
vp9_pack_bitstream(). The output buffer is the cx_data buffer in the
encoder_encode() function in vp9/vp9_cx_iface.c, and its size is
cx_data_sz.
In this CL vp9_pack_bitstream() ignores the `dest_size` parameter.
Bug: webm:1844
Change-Id: I53c80280143d409cf16f87c4d6deec3d9338aea3
(cherry picked from commit d48577579b)
In multi-threaded scenario, when the bitstream
buffer allocated is insufficient, the main thread
called 'longjmp' without waiting for the completion
of workers. In this patch, 'longjmp' is called by
the main thread after joining other worker threads.
This resolves the assertion failure as reported in
Bug: webm:1847
Bug: webm:1844
Change-Id: I399c76087b65e7b8d9a9fa4f12d784408243d648
cpi_->cyclic_refresh is nullptr if aq_mode is 0, in other words, the
rate controller runs in non adaptive quantization mode. This CL fixes
the crash in GetSegmentationData() in non aq mode.
Bug: b/259487065
Test: video encoding on ChromeOS
Change-Id: I503b30d15c697c8dd1da203b3c7361b91c428e87
(cherry picked from commit 1d007eafa3)
Issue happens for real-time nonrd pickmode.
Due to speed feature: sf->adaptive_rd_thresh_row_mt,
enabled for speed >= 8, and for speed >= 7 svc only.
Issue occurs where resolution (sb_rows) changes and
row_base_thresh_freq_fact needs to be re-allocated.
Fix is to add sb_rows to TileDataEnc and check for
re-alloc of row_base_thresh_freq_fac.
Bug: b:331108922
Change-Id: I1a1ca94c14f343200c180725e4cb8d91d3c55b83
(cherry picked from commit 3f8f19372b)
In vp9_init_tile_data(), call vp9_row_mt_mem_dealloc(cpi) to free the
row mt memory in cpi->tile_data before freeing cpi->tile_data.
Bug: b:331086799, b:331108729
Change-Id: Idc79984ce7e0110e6858139b2ed286492a2e8622
(cherry picked from commit 34277e53ad)
Before proceeding with Encode(). This avoids some static analysis
warnings about uninitialized `cfg_` members.
Change-Id: Ib67b278d6706ab1034219e8c1ad9ba0c5b574ba8
(cherry picked from commit 108f5128e2)
sem_wait() may be interrupted by a signal and fail with EINTR:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
Retry the sem_wait() call if it fails with EINTR.
This finishes the fix started in
https://chromium-review.googlesource.com/c/webm/libvpx/+/5299569. As a
speculative fix, that CL fixed only the sem_wait(&cpi->h_event_end_lpf)
calls responsible for bug chromium:324459561. ClusterFuzz verified the
fix, so this CL extends it to the other sem_wait() calls.
Note that sem_wait() calls like the following do not need this fix,
because the while (1) loop retries the sem_wait() call if it fails:
while (1) {
if (vpx_atomic_load_acquire(&cpi->b_multi_threaded) == 0) break;
if (sem_wait(&cpi->h_event_start_lpf) == 0) {
...
}
}
Bug: chromium:324459561
Change-Id: I0f0612616eee37fb3da68049e49b3e86927b5e24
(cherry picked from commit d4959f9825)
Before proceeding with Encode(). This avoids some static analysis
warnings about uninitialized `cfg_` members.
Change-Id: Ib67b278d6706ab1034219e8c1ad9ba0c5b574ba8
In very rare cases (e.g. encoding with very high bit rate), the
allocated token memory isn't enough, which causes a buffer overflow
and then an encoder failure. This is fixed by using the aligned
number of blocks while allocating this buffer.
BUG=b/328803779
Change-Id: I5437cce13398206bf9982d57f35d6f9da17b187f
This is a port of the change in libaom:
https://aomedia-review.googlesource.com/c/aom/+/189761
5ccdc66ab6 cpu.cmake: Do more elaborate test of whether SVE can be compiled
For Windows targets, Clang will successfully compile simpler
SVE functions, but if the function requires backing up and restoring
SVE registers (as part of the AAPCS calling convention), Clang
will fail to generate unwind data for this function, resulting
in an error.
This issue is tracked upstream in Clang in
https://github.com/llvm/llvm-project/issues/80009.
Check whether the compiler can compile such a function, and
disable SVE if it is unable to handle that case.
Change-Id: I8550248abd6a7876bd8ecf6ba66bc70518133566
(cherry picked from commit 35f0262c5e)
This is a port of the change in libaom:
https://aomedia-review.googlesource.com/c/aom/+/189761
5ccdc66ab6 cpu.cmake: Do more elaborate test of whether SVE can be compiled
For Windows targets, Clang will successfully compile simpler
SVE functions, but if the function requires backing up and restoring
SVE registers (as part of the AAPCS calling convention), Clang
will fail to generate unwind data for this function, resulting
in an error.
This issue is tracked upstream in Clang in
https://github.com/llvm/llvm-project/issues/80009.
Check whether the compiler can compile such a function, and
disable SVE if it is unable to handle that case.
Change-Id: I8550248abd6a7876bd8ecf6ba66bc70518133566
This mode is used infrequently and is quite slow. This shifts the tests
to nightly to speed up the presubmit.
Change-Id: I3020887e0ca0150d7cbea9cc726649c11f94d56c
Use the utility functions and set gf_group_size in
ext_rc_define_gf_group_structure()
Avoid using gop_decision->update_type to keep the logic simple
for now.
Also simplify the interface.
Change-Id: I78fd5892e6f9731d50d6e5da97598b46c70a1dde
The vpx_ports/msvc.h header provides snprintf() and round() for MSVC
older than Visual Studio 2015 and Visual Studio 2013, respectively.
Since configure now requires vs14 (Visual Studio 2015) or later, it is
safe to remove vpx_ports/msvc.h.
Change-Id: I2fe4c41eaa126f4cf17639c11895f1e464294c76
Replace %ld with %zu for `size_t`. Added in:
fd28f6f3c Add rate_ctrl_log_path
Fixes:
vp9\encoder\vp9_encoder.c(5748,15): warning C4477: 'fprintf' : format
string '%ld' requires an argument of type 'long', but variadic
argument 2 has type 'size_t'
Change-Id: I36fa9c7a9e14d4a2d9ef51a7f5c55de71bb34518
If img_data is not NULL, img_alloc_helper ignores buf_align, so
vpx_img_wrap can set buf_align to any placeholder value.
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/90362.
Bug: webm:1850
Change-Id: I42bc45aecf822a9314caf23058fe123d0574dc20
Port the changes to aom/src/aom_image.c in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/56643. The changes
related to `border` are not ported.
Bug: webm:1850
Change-Id: Ie81fffe0c84e912da880ffca245ae27cd71cf348
I introduced this bug in commit 2e32276:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5446333
I changed the line
stride_in_bytes = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
to three lines:
s = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
if (s > INT_MAX) goto fail;
stride_in_bytes = (int)s;
But I didn't realize that `s` is used later in the calculation of
alloc_size.
As a quick fix, undo the effect of s * 2 for high bit depths after `s`
has been assigned to stride_in_bytes.
Bug: chromium:332382766
Change-Id: I53fbf405555645ab1d7254d31aadabe4f426be8c
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188962.
stride_align is documented to be the "alignment, in bytes, of each row
in the image (stride)."
Change-Id: I2184b50dc3607611f47719319fa5adb3adcef2fd
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188823.
Impose maximum values on the input parameters so that we can perform
arithmetic operations without worrying about overflows.
Also change the VpxImageTest.VpxImgAllocHugeWidth test to write to the
first and last samples in the first row of the Y plane, so that the test
will crash if there is unsigned integer overflow in the calculation of
stride_in_bytes.
Bug: chromium:332382766
Change-Id: I54cec6c9e26377abaa8a991042ba277ff70afdf3
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188761.
Fix unsigned integer overflows in the calculation of stride_in_bytes in
img_alloc_helper() when d_w is huge.
Change the type of stride_in_bytes from unsigned int to int because it
will be assigned to img->stride[VPX_PLANE_Y], which is of the int type.
Test:
. ../libvpx/tools/set_analyzer_env.sh integer
../libvpx/configure --enable-debug --disable-optimizations
make -j
./test_libvpx --gtest_filter=VpxImageTest.VpxImgAllocHugeWidth
Bug: chromium:332382766
Change-Id: I3b39d78f61c7255e10cbf72ba2f4975425a05a82
The MAX_NUM_THREADS macro is unrelated to the VPxWorkerInterface, so it
doesn't need to be defined in vpx_util/vpx_thread.h.
The VP8 code doesn't seem to depend on MAX_NUM_THREADS, so VP8 can use
64 directly in the range check of its g_threads option. Move the
definition of the MAX_NUM_THREADS macro to vp9/encoder/vp9_ethread.h and
use it in VP9 code only.
Change-Id: Ibf788ca2496c743a2ac0498fefaab8a3c181228d
The `error: use of undeclared identifier 'EBUSY'` in
vpx_util/vpx_pthread.h was found in Mozilla's bug 1886318 [1]. This
patch addresses the issue by adding the `<errno.h>` header to introduce
the `EBUSY` identifier, resolving the problem.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1886318#c1
Change-Id: Ic417dafebf5ab160060dd29f692fa9c40d8db05a
The Google cpp style guide dictates that you should "include what you
use" with respect to symbols. This CL adds vpx_config.h imports to unit
tests that rely on config flags but were otherwise indirectly included.
Change-Id: Ia70a512cebe6c104d2d64afbed3cde8a405c68df
This CL will help run libvpx tests under Chromium against its partition
allocator. The allocator does not support single allocations above
3.998GiB. Because of this tests related to large video sizes that
Chromium is configured for are expected to fail.
Chromium also only supports the CONFIG_REALTIME_ONLY option,
some changes are scoped behind this flag.
Change-Id: I80e8743c0619ce502688109ce0be01cb252d5f92
ctx->pending_cx_data is a pointer. It looks nicer to compare
ctx->pending_cx_data with NULL than with 0.
Change-Id: I18815907b3d75551abfc603cb3c5c0297dceed23
cpi_->cyclic_refresh is nullptr if aq_mode is 0, in other words, the
rate controller runs in non adaptive quantization mode. This CL fixes
the crash in GetSegmentationData() in non aq mode.
Bug: b/259487065
Test: video encoding on ChromeOS
Change-Id: I503b30d15c697c8dd1da203b3c7361b91c428e87
VPX_CODEC_CORRUPT_FRAME is a decoder error. It is strange for
vpx_codec_encode() to fail with this error. In set_frame_size(), change
VPX_CODEC_CORRUPT_FRAME to VPX_CODEC_ERROR.
The use of VPX_CODEC_CORRUPT_FRAME was originally added in
commit 1ed56a46b3.
Change-Id: Iee92ed4cfca5061289b278ece2ba475cf98fec06
The current SVE2 approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution for
filter sizes smaller or equal to 4 to avoid the storing and
re-loading from the intermediate buffer.
This approach is not beneficial when applying an 8tap filter in the
convolution.
Change-Id: Ie090eb79f1cbf182300d9343ae63069396ef3956
These invalid value definitions are necessary to initialize
the gop decision in external RC so libvpx can tell which is populated
and which is not
Bug: b/329483680
Change-Id: I06bbb41fa59d0fb95296aebd0d05a703ec953b81
Coverity somehow thinks the return value of read_tx_mode() is between 0
and 7 (inclusive).
Hopefully this will fix Coverity CID 1584457: Out-of-bounds access in
read_coef_probs().
Change-Id: I49fbddf6fd6861bc9def9dfa91eaaaa4aefe5710
This array will be partially configured and used in later rate
distortion optimization search.
BUG=webm:1846
Change-Id: I83daba341c56767187031edb1c10d4528a4257a3
Add the `size` and `error` members to the vpx_write_bit_buffer struct.
Add the vpx_wb_init() and vpx_wb_has_error() functions.
Instances of the vpx_write_bit_buffer struct are only allocated in the
vp9_pack_bitstream() function. So vp9_pack_bitstream() is the only
function outside vpx_dsp/bitwriter_buffer.* that needs updating.
This CL completes the work of adding output buffer bounds checks to
vp9/encoder/vp9_bitstream.c.
Bug: webm:1844
Change-Id: I6b362be572852ee51d96023b35bfb334faada7e1
Issue happens for real-time nonrd pickmode.
Due to speed feature: sf->adaptive_rd_thresh_row_mt,
enabled for speed >= 8, and for speed >= 7 svc only.
Issue occurs where resolution (sb_rows) changes and
row_base_thresh_freq_fact needs to be re-allocated.
Fix is to add sb_rows to TileDataEnc and check for
re-alloc of row_base_thresh_freq_fac.
Bug: b:331108922
Change-Id: I1a1ca94c14f343200c180725e4cb8d91d3c55b83
In the vpx_writer struct, change the buffer_end field to the size field.
Change vpx_stop_encode() to return true on success, false on failure
(output buffer full).
In write_compressed_header(), remove the assertion
assert(header_bc.pos <= 0xffff). The caller (vp9_pack_bitstream()) will
check that condition.
In vp9_pack_bitstream(), the variable "first_part_size" is renamed
"compressed_hdr_size".
Bug: webm:1844
Change-Id: I4ed6ab905a707ad44d875e53036d5a42523a65d0
In vp9_init_tile_data(), call vp9_row_mt_mem_dealloc(cpi) to free the
row mt memory in cpi->tile_data before freeing cpi->tile_data.
Bug: b:331086799, b:331108729
Change-Id: Idc79984ce7e0110e6858139b2ed286492a2e8622
The code was using the bitstream_worker_data when it
wasn't allocated for big enough size. This is because
the existing condition was to only re-alloc the
bitstream_worker_data when current dest_size was larger
than the current frame_size. But under resolution change
where frame_size is increased, beyond the current dest_size,
we need to allow re-alloc to the new size.
The existing condition to re-alloc when dest_size is
larger than frame_size (which is not required) is kept
for now.
Also increase the dest_size to account for image format.
Added tests, for both ROW_MT=0 and 1, that reproduce
the failures in the bugs below.
Note: this issue only affects the REALTIME encoding path.
Bug: b/329088759, b/329674887, b/329179808
Change-Id: Icd65dbc5317120304d803f648d4bd9405710db6f
(cherry picked from commit c29e637283)
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd SVE horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional SVE2 path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
Change-Id: I2f5d39ad737dbd7eccb08dd2b51586c6710119b8
If a local variable "pc" is defined as &cpi->common, replace
"cpi->common." with "pc->".
Also replace a memcpy() call with a struct assignment.
Change-Id: I6f4f12e69d9989beaa6e04c83d93230e7d726278
Declare the dest_size member of the VP9BitstreamWorkerData struct as
size_t instead of int.
Fix the following MSVC warning:
vp9\encoder\vp9_bitstream.c(1031,37): warning C4267: '=':
conversion from 'size_t' to 'int', possible loss of data
Change-Id: Idab5ad5d4bf4d1e4754f011a3073c9a89da29f55
The buffer_end field will allow bounds checking when vpx_writer writes
to the output buffer. This CL sets up the plumbing to pass the output
buffer size from vp9_pack_bitstream() to vpx_start_encode(), which
initializes the vpx_writer struct. vpx_writer doesn't use the output
buffer size in bounds checks yet, but the code in vp9_bitstream.c does.
Bug: webm:1844
Change-Id: I995e469ab453c02d740f54b46e0b08c7f2eb1a2e
This was added in libaom in:
5ddac0aac8 RTCD defs: Remove empty specialize statements once and for all.
https://aomedia-review.googlesource.com/c/aom/+/9062
Change-Id: I9c8fb0c8e4bd4dc9373d8533ab083dff816e7cbe
Set up the plumbing to pass the size of the output buffer `dest` to
vp9_pack_bitstream(). The output buffer is the cx_data buffer in the
encoder_encode() function in vp9/vp9_cx_iface.c, and its size is
cx_data_sz.
In this CL vp9_pack_bitstream() ignores the `dest_size` parameter.
Bug: webm:1844
Change-Id: I53c80280143d409cf16f87c4d6deec3d9338aea3
Avoid calling encode_tiles_buffer_alloc_size() twice by saving its
return value in a local variable.
Change-Id: I3050f9cf7c3520f7edc80abf66620ba233fadad8
The code was using the bitstream_worker_data when it
wasn't allocated for big enough size. This is because
the existing condition was to only re-alloc the
bitstream_worker_data when current dest_size was larger
than the current frame_size. But under resolution change
where frame_size is increased, beyond the current dest_size,
we need to allow re-alloc to the new size.
The existing condition to re-alloc when dest_size is
larger than frame_size (which is not required) is kept
for now.
Also increase the dest_size to account for image format.
Added tests, for both ROW_MT=0 and 1, that reproduce
the failures in the bugs below.
Note: this issue only affects the REALTIME encoding path.
Bug: b/329088759, b/329674887, b/329179808
Change-Id: Icd65dbc5317120304d803f648d4bd9405710db6f
SVE and SVE2 code paths in libvpx require intrinsics from
arm_neon_sve_bridge.h. SVE is disabled if the compiler does not
support this header. This patch conditionally disables SVE2 in the
same way.
Also gate the check for arm_neon_sve_bridge.h on whether SVE is
enabled in the first place. The check isn't necessary if the user has
explicitly disabled SVE. (Explicitly disabling SVE already disables
SVE2 since the former is a pre-requisite for the latter.)
Change-Id: Ibb21f09e8b2470d1ce5d98b71b101f5b7f7dbcdc
In encoder_encode(), remove the return statement after a
vpx_internal_error() call because setjmp() has been called at that
point.
Change-Id: Ib8ebbfbacb21097ce7f1b4e3bf53004bbe88a42b
in struct VP8RateControlRtcConfig and struct VP9RateControlRtcConfig;
structs default to public access.
Change-Id: Icdc5b44fb4c7297b0cb3c6cde8bec33ea5cee18c
vp8/vp8_ratectrl_rtc.h should come first as it's implemented in this
module. Split the rest of the groups on C/C++/vpx bounds.
Change-Id: If6bbbd8f3adf3766fa36fbc53ae06c9f6f76ebe9
Add SVE2 implementation of vpx_highbd_convolve8_avg_vert function.
Add the corresponding tests as well.
Change-Id: I20ca19e09a1686bb00c0b51bf756ddab0adbc2c0
Add SVE implementation of vpx_highbd_convolve8_avg_horiz function.
Add the corresponding tests as well.
Change-Id: If13793fa653834dfdfeddfee60b80129eea85dd7
Add SVE2 implementation of vpx_highbd_convolve8_vert function. Add
the corresponding tests as well.
Change-Id: I289ac79d4493935217feaa4fd2fa0b8ef9a62972
Add 'sve2' arch options to the configure, build and unit test files -
adding appropriate conditional options where necessary. Arm SIMD
extensions are treated as supersets in libvpx, so disable SVE2 if
SVE is unavailable.
Change-Id: Icdec2aace357e36fba77c77cd8b70da1e5427fce
This was deprecated in 1.9.5 [1]. It is now enabled by default. For
earlier versions of doxygen this will set the value to false, but I
don't believe we were relying on this functionality.
[1]: https://www.doxygen.nl/manual/changelog.html#log_1_9_5
Change-Id: I75f576d35ca86636761cf70fda0dd0ad37f71d71
The sem_* macros do not behave exactly like the POSIX sem_* functions.
Add the vp8_ prefix to the sem_* macro names to make it clear that they
are not the POSIX sem_* functions. Another reason for adding the vp8_
prefix is that we need to wrap sem_wait() (to handle EINTR) on the Unix
platforms that have real sem_wait() function.
Handle EINTR in the Unix (non-Apple) definition of vp8_sem_wait().
Change-Id: I3df02a30f851d41691a55cf7a84aa2ff054bba9c
Based on a clang-tidy warning:
`no header providing "sem_wait" is directly included`
Though this may not clear it entirely, it's the closest that can be
done given the platform-dependent includes and implementation in
vp8/common/threading.h
Change-Id: I19984f820f3f380e58deef40563a2f0c66187748
set --target to the more modern aarch64-android-gcc and remove an
incorrect comment regarding realtime-only.
Change-Id: I5f6c9de9fcd96a60817e37fc6f6505725ddea6b9
When dot-product and SVE support are disabled the hwcap variable is
currently unused. Fix this by wrapping it in an #ifdef matching the
conditions where it is needed.
Change-Id: I1c2e302d861c6c726b314e374f07d4fafe17ffc7
libvpx's check for conditionally defining __builtin_prefetch is broken,
since clang-cl defines __builtin_prefetch on Win ARM64: in addition, it
supports up to 3 arguments, with the latter 2 being optional. This
causes build breaks when paired with other libraries, like Abseil, which
do perform the conditional test correctly.
The real fix here is to define something like VPX_PREFETCH rather than
trying to #define an implementation-reserved name, which is undefined
behavior.
Bug: 328105513
Change-Id: Ibe14d9ce34306654bd20e560973f76c3b40036ee
Refactor the transpose_concat_*() helper function used in the Arm Neon
DotProd and I8MM vertical convolution implementations to not use TBL
instructions. Using vzip* to achieve the same outcome (with the same
number of instructions) avoids needing/loading the lookup indices and
also increases performance on little (in-order) Arm Cortex cores.
Change-Id: Iff62a44f8a9bf0ee239d5bb36be8424cab0dbca5
sem_wait() may be interrupted by a signal and fail with EINTR:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
Retry the sem_wait() call if it fails with EINTR.
This finishes the fix started in
https://chromium-review.googlesource.com/c/webm/libvpx/+/5299569. As a
speculative fix, that CL fixed only the sem_wait(&cpi->h_event_end_lpf)
calls responsible for bug chromium:324459561. ClusterFuzz verified the
fix, so this CL extends it to the other sem_wait() calls.
Note that sem_wait() calls like the following do not need this fix,
because the while (1) loop retries the sem_wait() call if it fails:
while (1) {
if (vpx_atomic_load_acquire(&cpi->b_multi_threaded) == 0) break;
if (sem_wait(&cpi->h_event_start_lpf) == 0) {
...
}
}
Bug: chromium:324459561
Change-Id: I0f0612616eee37fb3da68049e49b3e86927b5e24
We already have some logic in the configure.sh file to selectively
disable code dependent on particular architecture extensions, however we
do not yet have anything to check that the compiler being supplied
recognises and can compile code using these extensions.
This commit adds compiler "-march=..." flag tests to the existing
extension-disable loop so that we now correctly disable extensions that
are not supported by the compiler. For AArch64 this loop also needs to
move below the existing compiler/OS handling to ensure that prefixes
like $CROSS are handled correctly before running compiler tests.
Bug: webm:1841
Change-Id: I936b911c4b0ebf03abc34b7532b2bb4568129f57
(cherry picked from commit fa50b26848)
Disable SVE feature if arm_neon_sve_bridge header is not supported
by the compiler.
Change-Id: I3f78be2dd95b37b8d51b9f1fceca1f9701535eca
(cherry picked from commit 6ea3b51ec2)
Added unitest which triggers the data race in the
bug below, when only C code is forced.
The data race is between the loopfilter and variance
computation from generate_psnr_packet calculation.
Proposed fix is to move the wait for loopfilter thread to
finish up before entering generate_psnr_packet().
Bug: b/266833179.
Change-Id: Id2871c53274be0f404e65601c9a5c98aaead0c72
(cherry picked from commit 756b29a776)
We already have some logic in the configure.sh file to selectively
disable code dependent on particular architecture extensions, however we
do not yet have anything to check that the compiler being supplied
recognises and can compile code using these extensions.
This commit adds compiler "-march=..." flag tests to the existing
extension-disable loop so that we now correctly disable extensions that
are not supported by the compiler. For AArch64 this loop also needs to
move below the existing compiler/OS handling to ensure that prefixes
like $CROSS are handled correctly before running compiler tests.
Bug: webm:1841
Change-Id: I936b911c4b0ebf03abc34b7532b2bb4568129f57
Add SVE implementation for vpx_highbd_convolve8_horiz that specialises
for 4-tap filters. This way we avoid a lot of redundant work to
multiply and add zero, given that some of the 8-tap filters are
zero-padded, so they are effectively 4-tap filters.
Change-Id: Ib5e0377f924df1d893e9436f443fcbe7d196ea27
Rename dot_neon_sve_bridge.h to vpx_neon_sve_bridge.h in order to
reflect that other instructions can be implemented in the header
file. In a subsequent patch, the usage of vtbl with Neon-SVE bridge
intrinsics will be added.
Change-Id: I8f71aad2b7fb4932c9554badf041a80aca58c7cf
Remove the 4-tap Neon DotProd path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: I1861dc88a67a759c3e8deb0b471ec447a62063f2
The current SBD Neon DotProd approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 10-25% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic6008836d1a499ee2cd957b9db194fca5671ccb4
Remove the 4-tap Neon i8mm path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: Icddecb7e133656c54aa5e79536b49759715b6fcb
The current SBD Neon i8mm approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 5-40% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic8ec2822681176ef879dcaf8424d8d91c5e8d2df
With either CONFIG_VP8=0 or CONFIG_VP9=0. Fixes a warning about an extra
';' outside of a function due to VP[89]_INSTANTIATE_TEST_SUITE() being
defined to nothing.
Change-Id: I1878d7596e39c5166efbe96450a733efc08665ea
inter/intra_cost in VP9 TPL is calculated with SATD
which should be close enough to be used as inter/intra_pred_err
Bug: b/326262148
Change-Id: Ic0fd08708fcf3640398fc22a1a6bb6f449b2a9b8
Anonymous unions are not supported in C99, they were added in C11:
https://en.cppreference.com/w/c/language/union
Fixes -Wpendantic warning:
vp9/encoder/vp9_context_tree.h:93:4: warning: ISO C99 doesn’t support
unnamed structs/unions [-Wpedantic]
Change-Id: Ibd29d6deca35d81ea886e80e9f44575c73ecd96d
Fixes a -Wpedantic warning:
vp9/encoder/vp9_rdopt.c:1988:20: warning: invalid use of pointers to
arrays with different qualifiers in ISO C before C2X [-Wpedantic]
Change-Id: I581e21d7e59c0bae0e44056a3b3f049c5a4e7cf2
Add SVE implementation of vpx_highbd_convolve8_horiz function. Add
the corresponding tests as well.
Change-Id: I0b2815831daf203e167ea5289307087ce53ff9da
The new Armv8.0 Neon implementation of 4-tap vertical convolution is
faster than Armv8.4 DotProd and Armv8.6 I8MM implementations. This
patch removes the DotProd and I8MM implementations in favour of using
the Armv8.0 version everywhere.
Change-Id: I126470fd4862d8bb116153e90bb2e4f2f2dba1e4
Refactor Armv8.0 Neon 4-tap convolution functions to operate on 8-bit
types directly, rather than first widening to 16-bit.
2-tap (bilinear) filter values are always positive, but 4-tap filter
values are negative on the outer edges (taps 0 and 3), with taps 1
and 2 having much greater positive values to compensate. To use
instructions that operate on 8-bit types we also need the types to be
unsigned. In the convolution kernel, subtracting the products of taps
0 and 3 from the products of taps 1 and 2 always works since 2-tap
filters are 0-padded.
Co-authored by: Hari Limaye <hari.limaye@arm.com>
Change-Id: I87b32e2ef8cbd21eebb8cd2642e8826b704905b1
The THREADFN and THREAD_EXIT_SUCCESS macros are used to define the
thread start routines passed to our implementation of pthread_create(),
so it makes sense to define these macros in vpx_util/vpx_pthread.h. This
also allows the VP8 and VP9 code to share the macro definitions.
Replace the THREAD_FUNCTION macro by THREADFN. They have the same
definition.
Change-Id: I79a7476e43652667af6a8da7ad7ce346b1b6b024
This helps prevent name clashes if code e.g. #includes headers from both
libvpx and libaom.
Bug: none
Change-Id: Ifc9e7ac4862dc04a399e7777d2636e1453627970
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I3fd38560bb14d85826e417f40d35f11165ab80da
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I908147ed65a87157009363782399ff398406cdf9
- Initialize gop_decision
- Initialize GF group for a new one
- GF group index for key frame special treatment is not needed any more
when key frame is decided by the RC
Bug: b/323050877
Change-Id: Iaf36ea4f671b833f3ba4c524b9799a3093412dfa
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically, average with the dst block and store the final
output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I84f1cafcfbbfa48b2cfe4b20881da9c4bc3b56ac
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I8ec13fb9edd642fdb927bf5394a3c2a349d22a29
Add a highbd Neon implementation of the horizontal portion of 2D
convolution specialised for executing with 4-tap filters. This new
path is also used when executing with bilinear (2-tap) filters.
Change-Id: I513e35c4f8857bc89e0def5e9402bc31ddd46440
Add a highbd Neon implementation of vertical convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: I30469c7b8e6ccff31d96588a3e4c21b401f1ed09
Add a highbd Neon implementation of horizontal convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: Icabeea295af3e0bbeda755168996668cb960b0de
Filter tap reporting was made more granular recently[1] to enable Arm
Neon optimizations that specialise convolution implementations
according to the filter size. This patch removes an assert that
should have been removed during that change - it no longer serves any
purpose to assert that the filter being used is a no-op filter.
This change is a pre-requisite for some highbd Neon convolution
changes that specialise implementations according to filter size.
(Without this change a convolve-copy test would fail should we
interrogate the size of the filter.)
[1] https://chromium-review.googlesource.com/c/webm/libvpx/+/5063929
Change-Id: I2a71680d27134535e6c0663b1668ba1b150b1a6f
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd Neon horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional Neon path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
Change-Id: Id6658b4e9e774effc760ff131e188b6907a57676
Call scalar C implementation of 2D convolution immediately if scaling
is required - instead of entering the Neon functions for the
horizontal and vertical passses and then falling back to the scalar
implementation. This has the benefit of being able to allocate a
smaller intermediate buffer.
Change-Id: Icacdd5f3a1401395951b613da1cd6932955bd0f8
There's no reason for these files to be separate, and merging them
will make life easier in subsequent commits adding a horizontal pass
specialised for the first pass of 2D.
Also perform some refactoring for 2D convolution definitions:
- Add a comment deriving the intermediate buffer height.
- Align the intermediate buffers to 32 bytes.
Change-Id: Ib92524396e6f9c58295339de54d08d894ace3bd1
Mostly a cosmetic change:
1) Remove forward declarations.
2) Remove excessive prefetches - some of which were wrong, prefetching
data that had just been loaded.
Change-Id: I17d8accc2abf3a9b2050603f859fce588a1f7178
CONFIG_PROFILE is unused currently. The option can still be selected
because it is in the CMDLINE_SELECT list and interpreted by configure
directly.
Bug: webm:1835
Change-Id: Id9667289113335a10018803f578b255967bd60b1
Move narrowing shift and max value clipping into the 4-pixel-output
kernel. As well as cleaning up the code quite a bit, this also
improves performance by 5-10% as it eliminates the implied top /
bottom register shuffling of the previous approach.
Also clean up the formatting and magic numbers in the 8-pixel-output
kernel.
Change-Id: I77a5e9e317ef4097f187330d4b32973022ba573f
In https://chromium-review.googlesource.com/c/webm/libvpx/+/71356, the
statement
clamp(q, active_best_quality, active_worst_quality);
was added to rc_pick_q_and_bounds_two_pass() (recently renamed
vp9_rc_pick_q_and_bounds_two_pass()).
The result of the clamp() call is not used, so the clamp() call has no
side effect.
Fix Coverity CID 1577645 Useless call:
side_effect_free: Calling
clamp(q, active_best_quality, active_worst_quality) is only useful for
its return value, which is ignored.
Change-Id: I014c3e4caf2bc999fe480000acc4e49e7ad15aaf
Various bits of tidying up to make the code more compact:
- Use appropriate load/store helper functions from mem_neon.h.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: I114cf4d2a34f02acc130558d125d2c191c6c5992
Various bits of tidying up to make the code more compact:
- Use/create appropriate mem_neon.h load/store helper functions.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: Ida7d3c4a3fe084600417f196baa26501c6e2d45a
Initialise result vectors of mem_neon.h helpers with vdup_n_<type>(0)
instead of load-broadcast of the first loaded elements. The former is
more easily optimized by modern compilers.
Change-Id: If967e2bb55523670c3e433dd66d060665e13b4f2
Align the intermediate buffers to 32 bytes and always use a stride of
64, regardless of the actual data block width.
Change-Id: I738eaa711168bc8231d8ac54d9e5e5e87b62e703
Add rdmult to the frame decision as RC can return this information, and
we may want to use it in the future.
Bug: b/323234722
Change-Id: I8ddb7038073d89af1ef84932448b1abaf1937cee
Use uv_crop_(width|height). This fixes an issue with 1 to 2 scaling from
1x1 where the unrounded value would go to zero, resulting in a heap
overflow. This path is only executed when the library is built without
--enable-vp9-highbitdepth.
Bug: b:319964497
Change-Id: I9cb6632f864ec54c045608af86aede20657d6253
(cherry picked from commit 7ad5f4f695)
Observed when built using Visual Studio 2019.
Move 720P image allocation to the heap.
Bug: webm:1831
Change-Id: I4e343af08d2f282618ad1b328a39d7dba5e79654
(cherry picked from commit 43e1c8bf10)
This can happen in the setting of the frame
target size for delta frames, for non-CBR mode
(end_usage != USAGE_STREAM_FROM_SERVER) and with
temporal layers.
In calc_pframe_target_size(): the percent_high
(factor to adjust the target_size) may end up dividing
bits_off_target by total_byte_count. The total_byte_count
is define per layer for temporal layers, so it will be zero
for delta frames if the enhancement layer has never been
encoded before.
Since percent_high is capped to over_shoot_pct, the proposed
fix is to apply this cap if total_byte_count is zero.
Also this CL fixes a few integer overflow issues in setting
the layer target_bandwidth, the recale function, and in
setting target_bits_per_mb.
Unittest is added by Wan-Teh which triggers this issue.
Bug: chromium:1514684
Change-Id: I091158e720ece75d7ab9b7c4d18d30a5783102ab
(cherry picked from commit 43bd567950)
Equivalent to the change to av1_change_config() in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/182413.
Because we call alloc_compressor_data() only if
cm->mi_alloc_size < new_mi_size, this change won't cause
alloc_compressor_data() to be called unnecessarily, unlike the libaom
bug https://crbug.com/aomedia/3526.
Bug: b:317105128
Change-Id: I8a772a1d5c4766846641a6d541a6d861bf76c60f
(cherry picked from commit aef73b22cb)
This change was intended to be cosmetic in that it tweaks some
comments, removes forward declarations and moves some constant
declarations into the kernels where they're used. However, it also
adds some performance for 8-tap vertical convolution paths as it
appears removing forward declarations also removes some false loop-
carried dependencies that the compiler wasn't able to figure out.
Change-Id: Ic58658b10fbe8378062920199819359d2df008de
The updated test will validate the QP / frame type / ARF settings by the
rate controller and callbacks, making sure the callbacks are working as
expected.
Removed the old tests that verify the signals from the encoder, which
are not needed any more.
Change-Id: Ida3c484e2ac520f3e81358d7cbf7918abfdaca54
Disable some tests because they rely on vpx_rc_gop_info_t
which isn't populated when the callback is used for key frame
This parameter will be deleted / cleaned up in the follow-up.
Bug: b/323050877
Change-Id: If1c0476eac8d324c8d5a460bfc9afdb6d93aacdf
Use uv_crop_(width|height). This fixes an issue with 1 to 2 scaling from
1x1 where the unrounded value would go to zero, resulting in a heap
overflow. This path is only executed when the library is built without
--enable-vp9-highbitdepth.
Bug: b:319964497
Change-Id: I9cb6632f864ec54c045608af86aede20657d6253
Simplify the computation of the Armv8.4 DotProd convolution
correction constant. Summing 128 * filter_tap[0,7] is always the same
as 128 * 128 since the filter taps always sum to 128.
Change-Id: I227ba47ae47bed8304a695a2395bcc85f33c245c
Move the convolution kernels using Armv8.4 dotprod and Armv8.6 i8mm
instructions into the respective .c files. These kernels are only used
in the respective .c files so it isn't useful for them to be declared
in a header.
This change also removes the need for feature-macro guarding - which
wasn't being done correctly for MSVC (since Microsoft's Arm
architecture feature macros are named differently to those defined by
GNU-compliant compilers.)
Bug: webm:1838
Change-Id: I495fca2a982c34978b6c9102f144bb9c45352a9a
Move the Arm Neon dotprod and i8mm 2D convolution functions into the
appropriate vpx_convolve8_neon_[dotprod|i8mm].c file. Only the
Armv7/Armv8.0 Neon files needed to be split in this way to allow
linking against a handwritten assembly implementation of the kernels
for Armv7 builds.
Change-Id: Ifc363556c3961aa78b9e53761537d4816c5b9964
This is one commit after the libwebm-1.0.0.31 tag:
affd7f4 In MakeUID(), call rand() under #ifdef _WIN32
Change-Id: I5979a8cd3b064d4f4f0dbeca9f84f6791e593b47
Call indirect RTCD high bitdepth variance functions (instead of the
Neon functions) in the high bitdepth Neon subpel variance paths so
that faster SVE variance functions can be used on CPUs where SVE is
supported.
Change-Id: I04bdef235afac06f2100df0cbaccfb8caef41ac7
Add SVE implementation of get<w>x<h>var functions for 8-, 10-, 12- bit
depth. Add the corresponding tests as well.
Change-Id: Id4feb8726a3eb0a963e3dd8932ee52374a67da48
Add standard and high bitdepth unit tests for vpx_get<w>x<h>var
functions. Enable these unit tests for the C implementation.
Change-Id: I8716fd6a9718dab3eef218a8a60a1efd4c0e316c
Fix Coverity defects CID 1568604 and CID 1568615 (Uninitialized pointer
field). Since the constructors are private and the Create() factory
methods set the cpi_ pointer field, these two Coverity defects are
harmless.
Define the constructors with "= default" instead of "{}".
Change-Id: Ie6b45fce66c23941a9a5c38ee0bccbc4b7d3a2a2
Add SVE implementation of variance functions for 8-, 10-, 12- bit
depth. Add the corresponding tests as well.
Change-Id: I785d85760ad4346cbfbf0f842784b4945870afee
Observed when built using Visual Studio 2019.
Move 720P image allocation to the heap.
Bug: webm:1831
Change-Id: I4e343af08d2f282618ad1b328a39d7dba5e79654
read_yuv_frame() supports VPX_IMG_FMT_NV12. Port its code to
vpx_img_read() and vpx_img_write().
The code in vp9/simple_encode.cc, including img_read(), doesn't support
VPX_IMG_FMT_NV12. Check before the vpx_img_alloc() calls and abort the
process if the image format is VPX_IMG_FMT_NV12.
Bug: chromium:1510090
Change-Id: Ie77e29c2c9ee7a01e6a59c8ad3cbcc769d9f2d4c
If fmt is VPX_IMG_FMT_NONE, currently img_alloc_helper() allocates a
single plane because VPX_IMG_FMT_NONE (0) is not a planar format (the
VPX_IMG_FMT_PLANAR bit is not set in VPX_IMG_FMT_NONE).
Although this seems correct, the problem is that most of the code in
libvpx assumes planar formats and is likely to dereference a null
pointer when it uses img->planes[1]. Also, VPX_IMG_FMT_NONE isn't really
a valid image format. So it is safer to make img_alloc_helper() fail if
fmt is VPX_IMG_FMT_NONE.
Change-Id: I05b47f4b5eceb631a02384b2cce1c2f6fdca8673
This often falls out of sync with the release and the version is already
contained in CHANGELOG.
Bug: webm:1833
Change-Id: Ieee6ca40249bf6e77037fbec30d87b109ca8fe21
Release v1.14.0 Venetian Duck
2024-01-18 v1.14.0 "Venetian Duck"
This release drops support for old C compilers, such as Visual Studio 2012
and older, that disallow mixing variable declarations and statements (a C99
feature). It adds support for run-time CPU feature detection for Arm
platforms, as well as support for darwin23 (macOS 14).
- Upgrading:
This release is ABI incompatible with the previous release.
Various new features for rate control library for real-time: SVC parallel
encoding, loopfilter level, support for frame dropping, and screen content.
New callback function send_tpl_gop_stats for vp9 external rate control
library, which can be used to transmit TPL stats for a group of pictures. A
public header vpx_tpl.h is added for the definition of TPL stats used in
this callback.
libwebm is upgraded to libwebm-1.0.0.29-9-g1930e3c.
- Enhancement:
Improvements on Neon optimizations: VoD: 12-35% speed up for bitdepth 8,
68%-151% speed up for high bitdepth.
Improvements on AVX2 and SSE optimizations.
Improvements on LSX optimizations for LoongArch.
42-49% speedup on speed 0 VoD encoding.
Android API level predicates.
- Bug fixes:
Fix to missing prototypes from the rtcd header.
Fix to segfault when total size is enlarged but width is smaller.
Fix to the build for arm64ec using MSVC.
Fix to copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic.
Fix to -Wshadow warnings.
Fix to heap overflow in vpx_get4x4sse_cs_neon.
Fix to buffer overrun in highbd Neon subpel variance filters.
Added bitexact encode test script.
Fix to -Wl,-z,defs with Clang's sanitizers.
Fix to decoder stability after error & continued decoding.
Fix to mismatch of VP9 encode with NEON intrinsics with C only version.
Fix to Arm64 MSVC compile vpx_highbd_fdct4x4_neon.
Fix to fragments count before use.
Fix to a case where target bandwidth is 0 for SVC.
Fix mask in vp9_quantize_avx2,highbd_get_max_lane_eob.
Fix to int overflow in vp9_calc_pframe_target_size_one_pass_cbr.
Fix to integer overflow in vp8,ratectrl.c.
Fix to interger overflow in vp9 svc.
Fix to avg_frame_bandwidth overflow.
Fix to per frame qp for temporal layers.
Fix to unsigned integer overflow in sse computation.
Fix to uninitialized mesh feature for BEST mode.
Fix to overflow in highbd temporal_filter.
Fix to unaligned loads w/w==4 in vpx_convolve_copy_neon.
Skip arm64_neon.h workaround w/VS >= 2019.
Fix to c vs avx mismatch of diamond_search_sad().
Fix to c vs intrinsic mismatch of vpx_hadamard_32x32() function.
Fix to a bug in vpx_hadamard_32x32_neon().
Fix to Clang -Wunreachable-code-aggressive warnings.
Fix to a bug in vpx_highbd_hadamard_32x32_neon().
Fix to -Wunreachable-code in mfqe_partition.
Force mode search on 64x64 if no mode is selected.
Fix to ubsan failure caused by left shift of negative.
Fix to integer overflow in calc_pframe_target_size.
Fix to float-cast-overflow in vp8_change_config().
Fix to a null ptr before use.
Conditionally skip using inter frames in speed features.
Remove invalid reference frames.
Disable intra mode search speed features conditionally.
Set nonrd keyframe under dynamic change of deadline for rtc.
Fix to scaled reference offsets.
Set skip_recode=0 in nonrd_pick_sb_modes.
Fix to an edge case when downsizing to one.
Fix to a bug in frame scaling.
Fix to pred buffer stride.
Fix to a bug in simple motion search.
Update frame size in actual encoding.
Change-Id: I9c27fb2b917f9b80ed4bcc5cb3b4f87c56b62c2f
Add SVE implementation of MSE functions for 10-, 12- bit depth. Add
the corresponding tests as well.
An implementation was not added for 8 bit depth as the Neon DotProd
version is faster than the SVE implementation.
Change-Id: I0c5712ba2735a2879a0aa3a9a52980032fddc7a6
Enable Neon Dotprod 8-bit high bitdepth implementation for MSE
function as it is now not called with bit depth 10 or 12.
Bug: webm:1819
Change-Id: I9d1d506401aa0523fba2d8ea4978dc00fdacbb95
Instead of always calling highbd_get_block_variance_fn with bit depth
8 use the macroblock's bit depth.
Bug: webm:1819
Change-Id: Ib4b19703384e897ee9ffeef73a11a8af2d262558
For svc with no inter-layer prediction: reset
the RC and force max_qp on all spatial layers
on scene/slide changes. In the current code it was only
reset on current spatial layer because it was assumed
we can predict off lower spatial layer to avoid
prediction across scene change. But this does not apply
when inter-layer prediction is off on delta frames.
Also reset only up to current temporal layer.
Because of the hierarchical prediction structure
only the lower temporal layers need the RC to be reset.
This helps to reduce excessive frame drops for the
full_superframe_drop mode.
Change-Id: I76925681850b82aa7fff7f9b1c1a0a605cf3cf3b
for VPX_CODEC_USE_PSNR. This clears a clang-tidy warning. vpx_encoder.h
exports vpx_codec.h so it shouldn't be necessary.
Change-Id: I863b6f8689eeef59cd9eadf3cdc177247a0653f8
This can happen in the setting of the frame
target size for delta frames, for non-CBR mode
(end_usage != USAGE_STREAM_FROM_SERVER) and with
temporal layers.
In calc_pframe_target_size(): the percent_high
(factor to adjust the target_size) may end up dividing
bits_off_target by total_byte_count. The total_byte_count
is define per layer for temporal layers, so it will be zero
for delta frames if the enhancement layer has never been
encoded before.
Since percent_high is capped to over_shoot_pct, the proposed
fix is to apply this cap if total_byte_count is zero.
Also this CL fixes a few integer overflow issues in setting
the layer target_bandwidth, the recale function, and in
setting target_bits_per_mb.
Unittest is added by Wan-Teh which triggers this issue.
Bug: chromium:1514684
Change-Id: I091158e720ece75d7ab9b7c4d18d30a5783102ab
Add header file containing helper functions to make use of SVE
dot-product intrinsics via the Neon-SVE bridge.
Change-Id: I6cd198f8202559672817cbc19f890db35c03d3ff
GCC already does not allow implicit vector type conversions by default,
add -flax-vector-conversions=none to Clang builds to have the same
behavior.
Change-Id: I9d1adb836377077cf48818c80fe71025e2d2bdc7
Added unitest which triggers the data race in the
bug below, when only C code is forced.
The data race is between the loopfilter and variance
computation from generate_psnr_packet calculation.
Proposed fix is to move the wait for loopfilter thread to
finish up before entering generate_psnr_packet().
Bug: b/266833179.
Change-Id: Id2871c53274be0f404e65601c9a5c98aaead0c72
Equivalent to the change to av1_change_config() in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/182413.
Because we call alloc_compressor_data() only if
cm->mi_alloc_size < new_mi_size, this change won't cause
alloc_compressor_data() to be called unnecessarily, unlike the libaom
bug https://crbug.com/aomedia/3526.
Bug: b:317105128
Change-Id: I8a772a1d5c4766846641a6d541a6d861bf76c60f
The VpxTpl* structs defined in vpx_tpl.h are only used by the external
rate control library. Add a VPX_TPL_ABI_VERSION component to
VPX_EXT_RATECTRL_ABI_VERSION and remove the VPX_TPL_ABI_VERSION
component from VPX_ENCODER_ABI_VERSION.
The current value of VPX_TPL_ABI_VERSION is 2. It is subtracted from
VPX_EXT_RATECTRL_ABI_VERSION and added to VPX_ENCODER_ABI_VERSION so
that the values of those two macros stay the same.
Add a note to explain why VPX_ENCODER_ABI_VERSION has a
VPX_EXT_RATECTRL_ABI_VERSION component.
Change-Id: I680b8522dc04328cd51df6de590fdec75ca88ae8
Commit db83435 introduced support for configuring for *-darwin23-gcc.
However configuring for *-darwin23-gcc does not currently add the
`-arch` flag to CFLAGS/LDFLAGS, so correct this here.
Change-Id: Ieeda1a5039ad40590dfcdcc6ba615a1d1697d54d
Before release:
c-a=8, a=0, r=1 -> c=8, a=0, r=1
After release:
- If the library source code has changed at all since the last
update, then increment revision:
c=8, a=0, r=r+1=2
- If any interfaces have been added, removed, or changed since
the last update, increment current, and set revision to 0:
c=c+1=9, a=0, r=0
- If any interfaces have been added since the last public release,
then increment age:
c=9, a=a+1=1, r=0
- If any interfaces have been removed or changed since the last
public release, then set age to 0:
c=9, a=0, r=0 (VpxTpl* structure changes)
MAJOR=c-a=9
MINOR=a=0
PATCH=r=0
Bug: webm:1833
Change-Id: Id24c9a0ff415a6f625d17b6098cdd0baf27432e3
Change if to assertion in vp9_extrc_get_encodeframe_decision
Clarify comment for VP9E_ENABLE_EXTERNAL_RC_TPL that
rc_type | VPX_RC_QP must be non zero for this control to work.
Change-Id: I2c54cf7eda1f0f12f4ff7ac929e8e6a1fdd2215d
Performance optimization. get_msb utilizes
the compiler/platform specific last significant bit
operator.
Note: 32 bit unsigned assumed, like all get_msb implementations do.
Change-Id: Ib013ad24aa0ea845efeb52aacd448b067edf91da
This is never used.
A callback in external rc func was added and used instead.
Change-Id: Iade6f361072f0c28af98904baf457d2f0e9ca904
(cherry picked from commit 41ced868a6)
Commit db83435 introduced support for configuring for *-darwin23-gcc.
However configuring for *-darwin23-gcc does not currently add the
`-arch` flag to CFLAGS/LDFLAGS, so correct this here.
Change-Id: Ieeda1a5039ad40590dfcdcc6ba615a1d1697d54d
Explain why the encoder init functions cannot call update_error_state().
In vp8/vp8_cx_iface.c, this comment should have been added in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4506609.
Rewrite update_error_state() in vp8/vp8_cx_iface.c to look like the
versions in vp9/vp9_cx_iface.c and av1/av1_cx_iface.c (in libaom).
Change-Id: I3f153d67b8c549ca5ac8ea0cfbcaad4ae705c8e6
After a longjmp() call in vp8e_encode(), call update_error_state() so
that we return the error code and error detail set by the
vpx_internal_error() call.
Change-Id: I1f2428eb1b1f61e46c02604e16a5d44dcf162479
The function convolve8_4_usdot contains a comment relating to the
SDOT implementation of convolve8, which requires addition of a
correction constant to account for range clamp of the input values.
This is not performed in the i8mm USDOT implementation - so remove the
comment.
Also add some const qualifiers to function arguments.
Change-Id: I10aff560d20403897f708ee293bf873be9c35761
Fix the following clang-tidy misc-include-cleaner warnings:
vp9/encoder/vp9_encoder.c:
no header providing "vp9_is_valid_scale" is directly included
no header providing "VPX_CODEC_CORRUPT_FRAME" is directly included
vp9/vp9_cx_iface.c:
no header providing "valid_ref_frame_size" is directly included
Change-Id: I20e846f5b14c42c72aaefec0718b4ae9c7eea44a
Issue explanation:
The unit test calls set_config function twice after encoding the
first frame.
The first call of set_config reduces frame width, but is still within
half of the first frame.
The second call reduces frame width even more, making is less than
half of the first frame, which according to the encoder logic,
there is no valid ref frames, and this frame should be set as a
forced keyframe. This leads to null pointer access in scale_factors
later.
Solution:
To make sure the correct detection of a forced key frame,
we need to update the frame width and height only when the actual
encoding is performed.
Bug: b/311985118
Change-Id: Ie2cd3b760d4a4b399845693d7421c4eb11a12775
(cherry picked from commit 1ed56a46b3)
This change fixed a bug revealed by b/311294795.
In simple motion search, the reference buffer pointer needs to be
restored after the search. Otherwise, it causes problems while the
reference frame scaling happens. This CL fixes the bug.
Bug: b/311294795
Change-Id: I093722d5888de3cc6a6542de82a6ec9d601f897d
(cherry picked from commit 50ed636e49)
Use vpx_sse and vpx_highbd_sse instead of vpx_mse16x16 and
vpx_highbd_8_mse16x16 respectively to compute SSE for PSNR
calculations. This solves an issue whereby vpx_highbd_8_mse16x16
was being used to calculate SSE for 10- and 12-bit input.
This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063
by Jonathan Wright <jonathan.wright@arm.com>.
Bug: webm:1819
Change-Id: I37e3ac72835e67ccb44ac89a4ed16df62c2169a7
(cherry picked from commit 7dfe343199)
Issue explanation:
The unit test calls set_config function twice after encoding the
first frame.
The first call of set_config reduces frame width, but is still within
half of the first frame.
The second call reduces frame width even more, making is less than
half of the first frame, which according to the encoder logic,
there is no valid ref frames, and this frame should be set as a
forced keyframe. This leads to null pointer access in scale_factors
later.
Solution:
To make sure the correct detection of a forced key frame,
we need to update the frame width and height only when the actual
encoding is performed.
Bug: b/311985118
Change-Id: Ie2cd3b760d4a4b399845693d7421c4eb11a12775
This change fixed a bug revealed by b/311294795.
In simple motion search, the reference buffer pointer needs to be
restored after the search. Otherwise, it causes problems while the
reference frame scaling happens. This CL fixes the bug.
Bug: b/311294795
Change-Id: I093722d5888de3cc6a6542de82a6ec9d601f897d
fseeko and ftello are available on Android only from API level 24. Add
the needed guards for these functions.
Suggested by Yifan Yang.
Change-Id: I3a6721d31e1d961ab10b434ea6e92959bd5a70ab
(cherry picked from commit bf07554183)
This change fixed a corner case bug reealed by b/311394513.
During the frame scaling, vpx_highbd_convolve8() and vpx_scaled_2d()
requires both x_step_q4 and y_step_q4 are less than or equal to a
defined value. Otherwise, it needs to call vp9_scale_and_extend_
frame_nonnormative() that supports arbitrary scaling.
The fix was done in LBD and HBD funnctions.
Bug: b/311394513
Change-Id: Id0d34e7910ec98859030ef968ac19331488046d4
(cherry picked from commit 8bf3649d41)
Need to set skip_recode properly so that
vp9_encode_block_intra() can work properly when it is
called by block_rd_txfm(). We can not skip "recode" because
it is still at the rd search stage.
Bug: b/310340241
Change-Id: I7d7600ef72addd341636549c2dad1868ad90e1cb
(cherry picked from commit f10481dc0a)
no header providing "CONFIG_VP9_HIGHBITDEPTH" is directly included
no header providing "VPX_BITS_8" is directly included
Change-Id: Ie6d78c79ab462501417f2b451bbe808a1fdce931
Since the reference frame is already scaled, do not scale the offsets.
BUG: b/311489136, b/312656387
Change-Id: Ib346242e7ec8c4d3ed26668fa4094271218278ed
(cherry picked from commit 845a817c05)
Use vpx_sse and vpx_highbd_sse instead of vpx_mse16x16 and
vpx_highbd_8_mse16x16 respectively to compute SSE for PSNR
calculations. This solves an issue whereby vpx_highbd_8_mse16x16
was being used to calculate SSE for 10- and 12-bit input.
This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063
by Jonathan Wright <jonathan.wright@arm.com>.
Bug: webm:1819
Change-Id: I37e3ac72835e67ccb44ac89a4ed16df62c2169a7
vpx/vpx_integer.h is clearly intended as the facade header for the
Standard C Library headers <stddef.h>, <inttypes.h>, and <stdint.h>.
It is reasonable to expect that vpx/vpx_decoder.h and vpx/vpx_encoder.h
should provide the symbols from vpx/vpx_codec.h.
Change-Id: I220797e63b2efc3dd9e2ac197fe2f918bf80d247
This change fixed a corner case bug reealed by b/311394513.
During the frame scaling, vpx_highbd_convolve8() and vpx_scaled_2d()
requires both x_step_q4 and y_step_q4 are less than or equal to a
defined value. Otherwise, it needs to call vp9_scale_and_extend_
frame_nonnormative() that supports arbitrary scaling.
The fix was done in LBD and HBD funnctions.
Bug: b/311394513
Change-Id: Id0d34e7910ec98859030ef968ac19331488046d4
Need to set skip_recode properly so that
vp9_encode_block_intra() can work properly when it is
called by block_rd_txfm(). We can not skip "recode" because
it is still at the rd search stage.
Bug: b/310340241
Change-Id: I7d7600ef72addd341636549c2dad1868ad90e1cb
Define the VPX_DL_REALTIME, VPX_DL_GOOD_QUALITY, and VPX_DL_BEST_QUALITY
macros as unsigned long, because the deadline parameter of
vpx_codec_encode() is of the unsigned long type. This enables C++
templates to deduce the unsigned long type from these macros.
Change-Id: I2173e3bbf5e15c84c11843790df93a497a35ed7d
fseeko and ftello are available on Android only from API level 24. Add
the needed guards for these functions.
Suggested by Yifan Yang.
Change-Id: I3a6721d31e1d961ab10b434ea6e92959bd5a70ab
The changes in this CL show that both the VP8 and VP9 implementations of
the decode function eventually discard the deadline parameter. Change
the code to ignore the deadline parameter in vpx_codec_decode() without
passing it to the decode function, and document that the deadline
parameter is ignored and 0 should be passed.
Change-Id: Ia977e16cdbdf97901207aa2d749887980137c4c0
Since the reference frame is already scaled, do not scale the offsets.
BUG: b/311489136, b/312656387
Change-Id: Ib346242e7ec8c4d3ed26668fa4094271218278ed
Add an Armv8.0 MLA Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic2c3cb307b95964cd0ba86f1c42eece3a8ab7cf4
Add an Armv8.0 MLA Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I027eaf2d1bb9711c2217cc8aa6b1e379d3e66b26
The deadline parameter of vpx_codec_encode() is of the unsigned long
type. The cpplint runtime/int check and the clang-tidy
google-runtime-int warn about the use of the unsigned long type. Adding
a type alias works around this issue.
Note: vpx_codec_decode() also has a deadline parameter, but it is of the
long type. So unfortuntely this type alias cannot be simply named
vpx_codec_deadline_t and the name must suggest it is encoder-specific.
Change-Id: I27b6b25730b620b328422ec3f91e63fdc55b377a
For realtime mode: if the deadline mode (good/best/realtime)
is changed on the fly (via codec_encode() call), force a
key frame and set the speed feature nonrd_keyframe = 1 to
avoid entering the rd pickmode.
nonrd_pickmode=0/off is the only feature in realtime mode that
involves rd pickmode, so by forcing it on/1 we can cleanly
separate nonrd (realtime) from rd (good/best), so we can
avoid possible issues on this dynamic mode switch, such as in
bug listed below.
Dynamic change of deadline, in particular for realtime mode,
involves a lot of coding/speed feature changes, so best to
also force reset with keyframe.
Added unitest that triggers the issue in the bug.
Bug: b/310663186
Change-Id: Idf8fd7c9ee54b301968184be5481ee9faa06468d
Add an Armv8.6 USDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I455e5a94bdcea1358025bd8e4d4c8c62e373aa5d
Add an Armv8.6 USDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I8f7633d9852ebfe8feb9b4a055715f849cccf297
Add an Armv8.4 SDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I5116d10ddb371ac2cf302ef905d06f2140dc7600
Add an Armv8.4 SDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ib396681b3f7b8b0eeba94381fbe33a06cf7b4a13
Add an Armv8.6 USDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic893b25541e3317c5d5c270c338f868f080aed7c
Add an Armv8.4 SDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I3eb00b5a34f5676b68bda60a2a29be56e3d7d0cd
vpx_get_filter_taps() currently reports either 8-tap or 2-tap.
However, many 8-tap filters are actually 0-padded, resulting in a
lot of redundant work (multiplying by, and adding, 0) when processing
using an 8-tap convolution function. In preparation for adding 2- and
4-tap SIMD implementations for the convolution paths, make the filter
size reporting more granular, stripping any 0 padding. Filter sizes
can now be reported as 2-, 4-, 6- or 8-tap.
Change-Id: I100133aac7173134af34b918c9ad3007d98d6060
Delete redundant transpose/permute code in the Neon dot-product
vertical convolution paths. Variable values were assigned but never
used before subsequent assignment.
Change-Id: I15b29d0c993f56599e0d18ac1d5787e6385d2a3a
The test shows that the comment for kf_max_dist in vpx/vpx_encoder.h
differs from its behavior by one. We should modify the comment to match
the encoding behavior.
Bug: webm:1829
Change-Id: Icdc58b8f6b25353f10ce8ecc481c862bd3fe86df
When all the inter reference frames are invalid, disable the speed
features that bypass intra mode search.
BUG=b/312517065
Change-Id: I246c953fad3be61b9d307da11c752a21a36b90ff
vpx_codec_iface_t is defined as follows:
typedef const struct vpx_codec_iface vpx_codec_iface_t;
Since vpx_codec_iface_t is already a const struct, it is redundant to
add "const" to vpx_codec_iface_t.
Note: I think vpx_codec_iface_t should not have been defined as a const
struct, but it is too late to change that now.
Change-Id: Ifbd3f8a63c1d48e9169ff77fa0b505ea1e65519d
When the reference frame's scaling factor is not in the supported
range, skip using it for motion compensation prediction in the
partition speed features.
BUG=b/312517065
Change-Id: Ie3687186521ad2616be258e80d3e5b16e5f2d5e9
The code is ported from libaom's aom_sse and aom_highbd_sse at
commit 1e20d2da96515524864b21010dbe23809cff2e9b.
The vpx_sse and vpx_highbd_sse functions will be used by vpx_dsp/psnr.c.
Bug: webm:1819
Change-Id: I4fbffa9000ab92755de5387b1ddd4370cb7020f7
is -> if
returns -> computes
in the documentation for ComputeQP().
This is the same as:
9142314c2 ratectrl_rtc.h: fix a few typos
+ remove a duplicate, commented out, version of GetLoopfilterLevel()
Change-Id: I8832e628b63b0b7dac6236631072f36ad55d90e8
Move some internal drop_frame code to separate
function so the external RC can use.
And add new flag setting under VP8E_SET_RTC_EXTERNAL_RATECTRL
to disable vp8_drop_encodedframe_overshoot() for
testing the external RC.
Unittest added for single layer and 3 temporal layers.
Bug: b/280363228
Change-Id: Ibea2f627cc54e7156ff35259a64dd111d42d146c
Older versions of MSVC do not allow declarations after statements in C
files. We don't need to support those versions of MSVC now.
Use -std=gnu99 instead of -std=gnu89.
Change-Id: I76ba962f5a2bca30d6a5b2b05c5786507398ad32
Most are related to include-what-you-use. One is to avoid using the
unsigned long type explicitly (by passing VPX_DL_REALTIME directly to
vpx_codec_encode).
Change-Id: Ieaf3418382ad8516cb4b172f7678893286fcb8cf
Declare the oxcf parameters of vp8_create_compressor() and init_config()
as const. This helps code analysis.
Change-Id: I344ef3e6afc3adced2b2865b7e0057c6d4b1d3c0
Fixes the creation of DT_TEXTREL entries in binaries built with PIE
enabled:
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
This matches the changes made in libaom:
1df26009da aom_configure: only override CONFIG_PIC if not set on cmd line
7235e65746 aom_configure.cmake: detect PIE and set CONFIG_PIC
Change-Id: I0a43e964af2d8eb8c5e7811ce14ad39285eec3a8
- Enable C vs SIMD test for x86 32-bit platform
- Correct a print message in run_tests()
BUG=webm:1800
Change-Id: Ib1ccd3a87a64b5ec6cde524a14d5d1b7e200abfb
Supports single layer and svc. For svc only the
framedrop_mode = FULL_SUPERFRAME_DROP is allowed
for now.
Dropping frames due to overshoot is enabled by the
oxcf->drop_frames_water_mark, which is zero as default.
Note that this CL also allows for drop/skip encoding of
enhancement layers if that layer bitrate is zero.
max_consec_drop is also added, set to INT_MAX as default.
Note that max_consec_drop is only used for svc mode.
It has not been added yet for single layer in libvpx encoder.
Tests added for single layer and svc case.
Change-Id: Ic12f6a0eb3fbf07d8eb8456c46cec27b2e1930d3
Guard hwcap2 feature interrogation on HAVE_NEON_I8MM so that it gets
disabled if neon_i8mm is disabled when configuring the build.
Bug: webm:1825
Change-Id: Ic6ff71f17387b96219591928a583d43560bb7c7a
The intermediate value in the target bandwidth
calculation may exceed integer bounds.
Bug: 308007926
Change-Id: I8288c5820db06a550d88bf91fccc86106996deaa
Signed-off-by: Xiahong Bao <xiahong.bao@nxp.com>
Add 'sve' arch options to the configure, build and unit test files -
adding appropriate conditional options where necessary. Arm SIMD
extensions are treated as supersets in libvpx, so disable SVE if
either Neon DotProd or I8MM are unavailable.
Change-Id: I39dd24f2b209251084d1e28d7ac68099460309bb
- Use smaller frame size that still triggers the overflow
- Do not run encoder as the encoder init also triggers the overflow
Bug: chromium:1492864
Change-Id: I392549abf69f1cfb3754cc847a214513ec9bedc5
Frame size caps the target bitrate internally, so the frame size needs
to be large enough to reproduce the target bitrate overflow in the
fuzzing test.
However the frame size needed exceeds the max buffer allowed on 32bit
system defined by VPX_MAX_ALLOCABLE_MEMORY
Bug: chromium:1492864
Change-Id: Ia3a9a78cd35516373897039a7769b492e29e8450
avg_frame_bandwidth = target_bandwidth / framerate
If target_bandwidth is too big and/or framerate is too small (< 1),
avg_frame_bandwidth could be overflow
Bug: chromium:1492864
Change-Id: I32314da1414b472ae4bf2acdcd81b8a948286146
A speed feature disable_split_mask (set to 63) could cause no mode and
partition to be selected in rd_pick_partition because:
-> thresh_mult_sub8x8 all INT_MAX
-> All modes skipped for sub8x8 blocks
-> found_best_rd is 0 -> break from the loop of 4 sub blocks
-> sum_rdc is INT_MAX -> No rd update -> should_encode_sb is 0
-> Propagating to top of the tree
-> No partition / mode is selected
Bug: b/290499385
Change-Id: Ia655e262f3b32445347ae0aaf1a2d868cea997f3
Port the following libaom CLs to libvpx:
https://aomedia-review.googlesource.com/c/aom/+/178361https://aomedia-review.googlesource.com/c/aom/+/180701https://aomedia-review.googlesource.com/c/aom/+/181821
The tests themselves are not feature-gated in the same way that they are
used in the rest of the codebase since they are not controlled by
rtcd.pl. This means that tests that assume the existence of features not
present on the target can cause SIGILL to be thrown.
This commit extends init_vpx_test.cc to match the behaviour for other
targets and automatically disable testing for features that are not
available on the machine running the tests.
Call arm_cpu_caps() and x86_simd_caps() inside #if !CONFIG_SHARED.
All the SIMD-specialized functions (arm or x86) are internal functions,
so they are not exported from the libvpx shared library. If
CONFIG_SHARED is 1, it is not necessary to call arm_cpu_caps(),
x86_simd_caps(), and append_negative_gtest_filter() either.
Change-Id: I330631816bdb52842020c5aa2a1ad802865cc285
Fix the TODO(https://crbug.com/1486441) comment in vp8/vp8_cx_iface.c.
Make vp8cx_create_encoder_threads() work after it has been called
before. If there are already the exact number of threads it needs to
create, return immediately. Otherwise, shut down the existing threads
(by calling vp8cx_remove_encoder_threads()) and create the required
number of threads.
Call vp8cx_create_encoder_threads() in vp8e_set_config() to respond to
changes in g_threads or g_w (which also affects the number of threads
through cm->mb_cols and cpi->mt_sync_range).
Change-Id: I552eeca5b1f1f5313f59559eb1da396f270a2429
Add the mt_current_mb_col_size field to VP8_COMP to record the size of
the mt_current_mb_col array.
Move the allocation of the mt_current_mb_col array from
vp8_alloc_compressor_data() to vp8_encode_frame(), where the use of
mt_current_mb_col starts. Allocate mt_current_mb_col right before use
if mt_current_mb_col hasn't been allocated or if the current size is
incorrect.
Move the deallocation of the mt_current_mb_col array from
dealloc_compressor_data() to vp8cx_remove_encoder_threads().
Move the TODO(https://crbug.com/1486441) comment from
vp8/encoder/onyx_if.c to vp8/vp8_cx_iface.c.
Change-Id: Ic5a0793278c2cc94876669aaa0dd732412876673
This CL adds an `emms` instruction at the end of the MMX assembly
for the vpx_subtract_block function, to properly clear the register
state. This resolves a mismatch between x86 build and C only build.
BUG=webm:1816
Change-Id: I79d2947da7f587f3558a2ae17df214d2faf59e74
Make vp8cx_create_encoder_threads() undo everything cleanly before
returning an error.
Make vp8cx_remove_encoder_threads() reset pointer fields to NULL after
freeing them, reset encoding_thread_count to 0, and reset b_lpf_running
to 0 (false). This makes it safe to call vp8cx_create_encoder_threads()
after calling vp8cx_remove_encoder_threads().
Change-Id: I586f06ce3d5b1c88ca46884bb4d6667ffc97e440
Fix the following compiler warning when libvpx is configured with
the --disable-multithread option:
vp9/common/vp9_thread_common.c:391:7: warning:
variable 'cur_row' set but not used [-Wunused-but-set-variable]
int cur_row;
^
Change-Id: I53aa279152715083df40990eb7fdcaeb77a66777
vp8cx_create_encoder_threads() caps the thread count at
(cm->mb_cols / cpi->mt_sync_range) - 1. If cfg.g_w is 16, cm->mb_cols is
only 1 (see vp8_alloc_frame_buffers: mb_cols = width >> 4), so we won't
be using multiple threads. To reproduce bug chromium:1486441, the test
just needs to increase cfg.g_h sufficiently.
Bug: chromium:1486441
Change-Id: Ie6b2da2e31cfa1717a481f55eebc8c875db94d87
Use $PWD to get the current directory.
Quote directory pathnames.
Suggested by James Zern.
Bug: webm:1800
Change-Id: I51e922b24da0e89d936370f858eab55d193ebdcb
These functions assume the uint16_t samples are <= 255 (bit depth 8),
but vpx_highbd_8_mse16x16() is called for any bit depth, not just 8.
A better fix is to port the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063 to libvpx, but
that requires porting aom_sse() and aom_highbd_sse() to libvpx, which is
quite involved. So disable vpx_highbd_8_mse16x16_neon_dotprod, etc.
first.
Bug: webm:1819
Change-Id: If495a5dedc58d9981317b9993c9fbb81ff3ab50c
libvpx 1.13.1
2023-09-29 v1.13.1 "Ugly Duckling"
This release contains two security related fixes. One each for VP8 and VP9.
- Upgrading:
This release is ABI compatible with the previous release.
- Bug fixes:
https://crbug.com/1486441 (CVE-2023-5217)
Fix to a crash related to VP9 encoding (#1642)
* tag 'v1.13.1':
update CHANGELOG
update version to 1.13.1
Fix bug with smaller width bigger size
vp9_alloccommon: clear allocation sizes on free
VP8: disallow thread count changes
encode_api_test: add ConfigResizeChangeThreadCount
README: update release version to 1.13.0
Bug: webm:1818
Change-Id: I732e2423f635d4115890f00fd63f9886e31f39a6
use sizeof(var) instead of sizeof(type) and sizeof(*var) instead of
sizeof(var[0]) for consistency in some places.
Change-Id: Ibd9a783cfef5ce1d06131df3831a4093821a502f
SO_VERSION_MAJOR = 8
SO_VERSION_MINOR = 0
SO_VERSION_PATCH = 1
The increase of the patch number corresponds to the revision number in
the libtool text.
3. If the library source code has changed at all since the last update,
then increment revision (‘c:r:a’ becomes ‘c:r+1:a’).
Bug: webm:1818
Change-Id: Ia114368e9fd7a908e7fcf6e4d3142f142770e3f4
Fixed previous patch that clusterfuzz failed on.
Local fuzzing passing overnight.
Bug: webm:1642
Change-Id: If0e08e72abd2e042efe4dcfac21e4cc51afdfdb9
(cherry picked from commit 263682c9a2)
This fixes reallocations (and avoids potential crashes) if any
allocations fails and the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: If5dc96b73c02efc94ec84c25eb50d10ad6b645a6
(cherry picked from commit 02ab555e99)
When the next frame is null and the current frame is an overlay
frame, which is equivalent to there is an active alt ref frame,
we call this an end of sequence.
Change-Id: I49c2cf7a001df98aff8b62ba034317e408274bd4
Currently allocations are done at encoder creation time. Going from
threaded to non-threaded would cause a crash.
Bug: chromium:1486441
Change-Id: Ie301c2a70847dff2f0daae408fbef1e4d42e73d4
(cherry picked from commit 3fbd1dca6a)
Update thread counts and resolution to ensure allocations are updated
correctly. VP8 is disabled to avoid a crash.
Bug: chromium:1486441
Change-Id: Ie89776d9818d27dc351eff298a44c699e850761b
(cherry picked from commit af6dedd715)
Currently allocations are done at encoder creation time. Going from
threaded to non-threaded would cause a crash.
Bug: chromium:1486441
Change-Id: Ie301c2a70847dff2f0daae408fbef1e4d42e73d4
Update thread counts and resolution to ensure allocations are updated
correctly. VP8 is disabled to avoid a crash.
Bug: chromium:1486441
Change-Id: Ie89776d9818d27dc351eff298a44c699e850761b
define_gf_group is called at the last frame of each GOP to get GOP size
for next one, which means it'll also be called at the last GOP of the
sequence, when calling WebM RC will be returned with error since WebM RC
does not have any more GOP to return.
When gop_coding_frames from the encoder is 1, it means it's running out
of firstpass stats, which means end of sequence.
Bug: b/299610956
Change-Id: I30e077a28fe41593ebabbc1dc0c2915a4bcbece3
This cpu detection implementation doesn't do anything MSVC specific,
it just calls the IsProcessorFeaturePresent function. This can be
compiled with mingw compilers just as well.
Change-Id: I55e607a47c8f5b70d9f707ef96b2fa7553f2f79f
The original ref frame index was the index in the GF group; RC expects
the index to be the one for ref frame buffer.
Change-Id: I9a2b0e72b6332023fb2e8da131b557f82db02e39
Arm Neon DotProd implementations of vpx_highbd_8_mse<w>x<h> currently
need to be enabled at compile time since they're guarded by #ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod names in a separate file and wire them up to the build
system and rtcd.pl. Also add new test cases for the new functions.
Change-Id: I26be6fb587258c8fa9fbf03509b7602358a001a8
Enable Arm Neon DotProd implementations of vpx_get_var_sse_sum*
specialty variance functions via run-time feature detection, wiring
up the new *neon_dotprod names to rtcd.pl. Also add new test cases.
Change-Id: I04ac3db87d32ee7f94702b6c0360254e5688f713
Arm Neon DotProd implementations of vpx_variance<w>x<h> currently
need to be enabled at compile time since they're guarded by #ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod names in a separate file and wire them up to the build
system and rtcd.pl. Also add new test cases for the new functions.
Remove the _neon suffix in functions making reference to
vpx_variance<w>x<h>_neon() (e.g. sub-pixel variance) - enabling use
of the appropriate *neon or *neon_dotprod version at run time.
Similar changes for the specialty variance and MSE functions will be
made in a subsequent commit.
Change-Id: I69a0ef0d622ecb2d15bd90b4ace53273a32ed22d
Arm Neon DotProd implementations of vpx_sad*4d currently need to be
enabled at compile time since they're guarded by ifdef feature
macros. Now that run-time feature detection has been enabled for Arm
platforms, expose these implementations with distinct *neon_dotprod
names in separate files and wire them up to the build system and
rtcd.pl. Also add new test cases for the new DotProd functions.
Change-Id: Ie99ee0b03ec488626f52c3f13e4111fe26cc5619
Arm Neon DotProd implementations of vpx_sad* currently need to be
enabled at compile time since they're guarded by ifdef feature
macros. Now that run-time feature detection has been enabled for Arm
platforms, expose these implementations with distinct *neon_dotprod
names in separate files and wire them up to the build system and
rtcd.pl. Also add new test cases for the new DotProd functions.
Change-Id: Ic6906c28240276ba89787eadbc9393a232374f95
Arm Neon DotProd and I8MM implementations of vpx_convolve8* currently
need to be enabled at compile time since they're guarded by ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod/*neon_i8mm names in separate files and wire them up to
the build system and rtcd.pl. Also add new test cases for the new
DotProd and I8MM functions.
Change-Id: I3db3cd62e8596099d9fec7805ca3ee86b2a01c74
1) Overhaul the Arm CPU feature detection code, taking inspiration
from similar recent changes in libaom.
2) Add neon_dotprod and neon_i8mm arch options in the configure,
build and unit test files, adding appropriate conditional options
where necessary.
3) Soft-enable run-time CPU feature detection by default for both 32-
bit and 64-bit Arm platforms.
Change-Id: I3f13317d88324acc5753394351188baa8d18a261
Simplify the parameters and return values of the Neon MSE helper
functions for both standard and high bitdepth - avoiding unused
return values.
Change-Id: I6f9208f9ce890fbe58346d9c7d9d701f28f2f90f
Overflow was happening in two places:
one in set_encoder_config(), where the input
layer_target_bitrates are converted from kbps to bps,
the other in vp9_calc_pframe_target_size_one_pass_vbr(),
where target is scaled by kf_ratio.
vp9_ratectrl.c:2039: runtime error: signed integer overflow:
-137438983 * 25 cannot be represented in type 'int'
Bug: chromium:1475943
Change-Id: I1ab0980862548c8827fae461df9a7a74425209ff
vp9/encoder/vp9_ratectrl.c:2171:23: runtime error: signed integer
overflow: 103079280 * -22 cannot be represented in type 'int'
Bug: chromium:1473268
Change-Id: Ic1de7d48e74d94c2a992e53ec4382b5b44dba7af
in calc_iframe_target_size():
vp8/encoder/ratectrl.c:349:31: runtime error: signed integer overflow:
38 * 343597280 cannot be represented in type 'int'
Bug: chromium:1473473
Change-Id: Ie8f7b147efb27c92314df09837b66f7d97046883
Remove '= {}' (C23 [1]) and use memset to clear a vpx_rc_config_t
instance.
after:
6e2c3b9b3 Add RC mode to vpx external RC interface
Fixes compile with -pedantic and Microsoft's cl compiler.
[1] https://en.cppreference.com/w/c/language/initialization
Change-Id: I2019cdf0c42103cfc80b1e58c68b7596e497007f
Use an array for constant initialization rather than array syntax which
assumes the underlying type is a vector. Fixes compile error with
cl targeting Windows Arm64:
vpx_dsp\arm\fdct4x4_neon.c(55,52): error C2078: too many initializers
No change in assembly with gcc 12.2.0 & clang 14.
Bug: b/277255390
Bug: webm:1810
Fixed: webm:1810
Change-Id: Ia30edcdbb45067dfe865b9958a5eecf1fd9ddfc8
after:
22818907d normalize *const in rtcd
fixes warnings of the form:
vpx_dsp\x86\quantize_avx.c(145): warning C4028: formal parameter 2
different from declaration
Change-Id: I4dc423f11ec4a9171e18bdb6be2fa8dfb65ee61a
Fix a bug in vpx_int_pro_row_neon (increment pointer after peeled
first loop iteration) and re-enable both vpx_int_pro_row/col_neon
paths.
Also fix IntProRowTest to use width_ (instead of 0) as the src_stride
for the input data block. The test's use of 0 for src_stride is the
reason the tests passed with the buggy Neon implementation noted in
the listed bugs. (The old buggy Neon implementation fails the
adjusted unit tests.)
BUG=webm:1800
BUG=webm:1809
Change-Id: I1f4572ee155653a7596fe2c10b5938ea7a3f63ae
Arm SIMD testing was enabled in c vs SIMD bit-exactness test after
arm SIMD mismatch was resolved.
BUG=webm:1800
Change-Id: Id60127313a0955f4a5c8468281fd5a441668fddb
The vpx_int_pro_row/col neon SIMD version caused a mismatch between
neon encoding vs c encoding. Disabled them for now to ensure the
correctness of VP9 encoding on the arm platform. Since these 2
functions were not used much, so this wouldn't affect the overall
encoder speed much.
BUG=webm:1800
BUG=webm:1809
Change-Id: Id1a7d542fc03d4cf9fa1039a49832abf35fb722f
- Include vpx/vpx_ext_ratectrl.h in vp9_ext_ratectrl.c
- Include vpx/internal/vpx_codec_internal.h
- Include <stddef.h> for NULL
Bug: b/294049605
Change-Id: Iedd8b3864da27fde1678bfa6606e6fc5630a7a09
- Use zero initializer instead of memset to avoid including <cstring>
- Include vpx_codec.h for vpx_codec_err_t and error codes
- Include vpx_tpl.h for VpxTplGopStats
Change-Id: Iac5837ce2173bd945bfe8eeb401ff4dfd04fd2e1
This CL adds a shell script to test bit exactness of C and SIMD
VP9 encoder for x86 platform.
As C Vs NEON encoding outputs are not bit-exact (BUG=webm:1809),
ARM tests are currently disabled.
BUG=webm:1800
Change-Id: Iffcc70863e8cf83ccb5bc5be73e8866165697358
apply similar steps as to the other quantize functions to switch to
macroblock_plane and ScanOrder
Change-Id: I486d653326aaf52ffd3beafd2e891ba6a5d57ef3
Pass macroblock_plane and ScanOrder instead of looking up the values
beforehand. Avoids pushing arguments to the stack.
Change-Id: I22df6f645eb1a1d89ba5a4d9bc58acb77af51aa9
Update functions in WRITE_COMPRESSED_STREAM blocks, which are disabled
by default. This caused them to be missed in:
84e6b7ab0 test/*.cc: prefer 'override' to 'virtual'
Change-Id: I0e462263f19c15eb0a30d0c0f4e145062f789489
In file included from ../test/bench.cc:14:
../test/bench.h:17:7: warning: 'AbstractBench' has virtual functions but
non-virtual destructor [-Wnon-virtual-dtor]
class AbstractBench {
Change-Id: Ibbfb949b63c8dff936c7ed4f2d056dea0343377b
With gcc 13.1.1
In function ‘handle_inter_mode’,
inlined from ‘vp9_rd_pick_inter_mode_sb’ at
../vp9/encoder/vp9_rdopt.c:3872:17:
../vp9/encoder/vp9_rdopt.c:3142:8: warning: ‘tmp_rd’ may be used
uninitialized [-Wmaybe-uninitialized]
3142 | rd = tmp_rd + RDCOST(x->rdmult, x->rddiv, rs, 0);
../vp9/encoder/vp9_rdopt.c: In function ‘vp9_rd_pick_inter_mode_sb’:
../vp9/encoder/vp9_rdopt.c:2846:15: note: ‘tmp_rd’ was declared here
2846 | int64_t rd, tmp_rd, best_rd = INT64_MAX;
Change-Id: I8608957cc8bbeb1ae525f3c3dad6fe9785b2a9b4
These were removed in If7a49e920e12f7fca0541190b87e6dae510df05c but
the leftovers can cause a build to fail if the code isn't optimized out.
I just found this out in the Meson port of libvpx for GStreamer.
BUG=webm:1584
Change-Id: I1c953720a2cbec3796200d4ec4020dca0b672bfb
vp9/common/vp9_mfqe.c|240 col 16| warning: code will never be executed
[-Wunreachable-code]
BLOCK_SIZE mfqe_bs, bs_tmp;
^~~~~~~
Change-Id: I566b20d8c294e19bc4b90b57b730f933048e71a5
Based on the change in libaom:
fe36011455 Fix Clang -Wunreachable-code-aggressive warnings
Clang's -Wunreachable-code-aggressive flag enables several warning flags
such as -Wunreachable-code-break and -Wunreachable-code-return. Chrome's
build system enables -Wunreachable-code-aggressive (in
build/config/compiler/BUILD.gn), so it would be good if libvpx could be
compiled without -Wunreachable-code-aggressive warnings.
This requires the VPX_NO_RETURN macro be defined correctly for all the
compilers we support, otherwise some compilers may warn about missing
return statements after a die() or fatal() call (which does not return).
Change-Id: I0c069133af45a7a61759538b6d74c681ea087dcd
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously a non-keyframe could cause a crash if the
decoder failed before fully initializing due to an allocation failure.
The stream info and frame resolution would be 0, skipping an allocation.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I1c17302f4d3a488ba3b4eefe0bf53853dc558bc1
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously the decoder instance would be freed,
causing a crash when attempting to access it with restart_threads=1.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ic084894b776729bb1572f747082cef002f0832a8
This fixes a crash if the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I9867f5fc3d1163026f521a9609d3cbbc00568d1d
This avoids a crash if any of the thread allocations fail and the
application continues to call vpx_codec_decode(). Previously
num_tile_workers would be non-zero, but not equal to num_threads, which
would cause a crash during later thread management.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ie3faf7b36764aebedac0924acb6e4cb7545aec7d
This fixes reallocations (and avoids potential crashes) if any
allocations fails and the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: If5dc96b73c02efc94ec84c25eb50d10ad6b645a6
If any allocations fail in init_decoder() and the application continues
to call vpx_codec_decode() some of the allocations would be orphaned or
the decoder would be left in a partially initialized state.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I44f662526d715ecaeac6180070af40672cd42611
A right shift by 2 is equivalent to two halving operations if there is
no no addition or subtraction between the two halving operations.
Note: Since vhaddq_s16() and vhsubq_s16() have 17-bit intermediate
precision, the Neon code doesn't need to go to int32_t as was done in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4604169.
Change-Id: Ibe0691cde0fd3b94ee7c497845ba459d30d503b0
The corresponding case block is not only for ARM.
Original comment text makes reader confused.
Test: N/A, just comment text changes.
Change-Id: I3154d18d3b3d237c1eecfe07dc7ec237c98194cf
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
This CL resolves the mismatch between C and intrinsic implementation
of vpx_hadamard_32x32 function. The mismatch was due to integer
overflow during the addition operation in the intrinsic functions.
Specifically, the addition in the intrinsic function was performed
at the 16-bit level, while the calculation of a0 + a1 resulted in
a 17-bit value.
This code change addresses the problem by performing
the addition at the 32-bit level (with sign extension) in both SSE2
and AVX2, and then converting the results back to the 16-bit level
after a right shift.
STATS_CHANGED
Change-Id: I576ca64e3b9ebb31d143fcd2da64322790bc5853
impace -> impact
taget -> target
prediciton -> prediction
addtion -> addition
the the -> the
Bug: webm:1803
Change-Id: I759c9d930a037ca69662164fcd6be160ed707d77
Dont -> Don't
setings -> settings
thresold -> thresh
thresold -> threshold
becasue -> because
itterations -> iterations
its a -> it's a
an constant -> a constant
Bug: webm:1803
Change-Id: I1e019393939ed25c59c898c88d4941ec360b026d
In the function vp9_diamond_search_sad_avx(), arranged
the cost vector in a specific order. This ensures that
the motion vector with the least index is selected,
when there exists more than one candidate motion
vector with the minimum cost, thus resolving the
c vs avx mismatch.
STATS_CHANGED
Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
Double the number of accumulator registers to remove the bottleneck.
Also peel the first loop iteration.
Change-Id: I6a90680369f9c33cdfe14ea547ac1569ec3f50de
* changes:
vpx_dsp_common.h,clip_pixel: work around VS2022 Arm64 issue
fdct_partial_neon.c: work around VS2022 Arm64 issue
fdct8x8_test.cc: work around VS2022 Arm64 issue
New file (vpx_tpl.c) in the following CLs will add new APIs dealing with
TPL stats from VP9 encoder.
Change-Id: I5102ef64214cba1ca6ecea9582a19049666c6ca4
This CL refactors the code related to convolve function.
Furthermore, improved the AVX2 intrinsic to compute
convolve vertical for w = 4 case, and convolve horiz for
w = 16 case.
Please note the module level scaling w.r.t C function
(timer based) for existing (AVX2) and new AVX2 intrinsics:
Block Scaling
Size AVX2 AVX2
(existing) (New)
4x4 5.34x 5.91x
4x8 7.10x 7.79x
16x8 23.52x 25.63x
16x16 29.47x 30.22x
16x32 33.42x 33.44x
This is a bit exact change.
Change-Id: If130183bc12faab9ca2bcec0ceeaa8d0af05e413
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all Neon horizontal convolution algorithms process 4 rows
at a time, but this means we end up doing at least 1 row too much
work in the 2D first pass case where we need h + 7, not h + 8 rows of
output.
This patch adds additional dot-product (SDOT and USDOT) Neon paths
that process h + 7 rows of data exactly, saving the work of the
unnecessary extra row. It is impractical to take a similar approach
for the Armv8.0 MLA paths since we have to transpose the data block
both before and after calling the convolution helper functions.
vpx_convolve_neon performance impact: we observe a speedup of ~9% for
smaller (and wider) blocks, and a speedup of 0-3% for larger blocks.
This is to be expected since the proportion of redundant work
decreases as the block height increases.
Change-Id: Ie77ad1848707d2d48bb8851345a469aae9d097e1
This avoids link errors related to the sanitizers:
https://clang.llvm.org/docs/AddressSanitizer.html#usage
"When linking shared libraries, the AddressSanitizer run-time is not
linked, so -Wl,-z,defs may cause link errors ..."
See also:
https://crbug.com/aomedia/3438
Bug: webm:1801
Fixed: webm:1801
Change-Id: Ie212318005a5f7222e5486775175534025306367
1) Use #define constant instead of magic numbers for right shifts.
2) Move saturating narrow into helper functions that return 4-element
result vectors.
3) Use mem_neon.h helpers for load/store sequences in Armv8.0 paths.
4) Tidy up: assert conditions and some longer variable names.
5) Prefer != 0 to > 0 where possible for loop termination conditions.
Change-Id: Idfcac43ca38faf729dca07b8cc8f7f45ad264d24
* changes:
vp8_[cd]x_iface: clear setjmp flag on function exit
vp9_decodeframe,tile_worker_hook: relocate setjmp=1
vp9,encoder_set_config: set setjmp flag after setjmp()
rather than define new targets, add a platform to the arm64 list as they
share the same configuration.
Bug: webm:1788
Change-Id: Iac020280b1103fb12b559f21439aeff26568fba4
x86 and armv7 are skipped for now as the intrinsics will need different
flags than cl.exe (/arch:... -> -m...).
Bug: webm:1788
Change-Id: I8ca8660a8644cdd84c51cb1f75005e371ba8207d
Contains the size of GOP - also the size of the list of TPL stats for
each frame in this GOP.
VpxTplGopStats will be the unit for VP9E_GET_TPL_STATS control to return
TPL stats from the encoder.
Bug: b/273736974
Change-Id: I1682242fc6db4aafcd6314af023aa0d704976585
There were multiple implementations of CHECK_MEM_ERROR across the
library that take different arguments and used in different places.
This CL will unify them and have only one implementation that takes
vpx_internal_error_info.
Change-Id: I2c568639473815bc00b1fc2b72be56e5ccba1a35
* changes:
Overwrite cm->error->detail before freeing
Have vpx_codec_error take const vpx_codec_ctx_t *
Add comments about vpx_codec_enc_init_ver failure
Added AVX2 intrinsic optimization for the following functions
1. vpx_idct16x16_256_add
2. vpx_idct32x32_1024_add
3. vpx_idct32x32_135_add
The module level scaling w.r.t C function (timer based) for
existing (SSE2) and new AVX2 intrinsics:
Scaling
Function Name SSE2 AVX2
vpx_idct32x32_1024_add 3.62x 7.49x
vpx_idct32x32_135_add 4.85x 9.41x
vpx_idct16x16_256_add 4.82x 7.70x
This is a bit-exact change.
Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2
Help detect use after free of the return value of
vpx_codec_error_detail(). If vpx_codec_error_detail() is called after
vpx_codec_encode() fails, the return value may be equal to
cm->error->detail, which is freed when vpx_codec_destroy() is called.
Document the lifetime of the string returned by
vpx_codec_error_detail().
Change-Id: I8089e90a4499b4f3cc5b9cfdbb25d72368faa319
Also have vpx_codec_error_detail take vpx_codec_ctx_t *. Both functions
are getter functions that don't modify the codec context.
Change-Id: I4689022425efbf7b1da5034255ac052fce5e5b4f
Address the questions:
1. If vpx_codec_enc_init_ver() fails, should I still call
vpx_codec_destroy() on the encoder context?
2. Is it safe to call vpx_codec_error_detail() when
vpx_codec_enc_init_ver() failed?
Change-Id: I1b0e090d11dd9f853fe203f4cbb6080c3c7b0506
I realized the calculation of the size of the list of VpxTplBlockStats
is non-trivial. So it's better to add the field for the size.
Bug: b/273736974
Change-Id: Ic1b50597c1f89a8f866b5669ca676407be6dc9d8
This allows AArch64 to be correctly detected when building with Visual
Studio (cl.exe) and fixes a crash in vp9_diamond_search_sad_neon.c.
There are still test failures, however.
Microsoft's compiler doesn't define __ARM_FEATURE_*. To use those paths
we may need to rely on _M_ARM64_EXTENSION.
Bug: webm:1788
Bug: b/277255076
Change-Id: I4d26f5f84dbd0cbcd1cdf0d7d932ebcf109febe5
This will allow identifying Windows Visual Studio targets as aarch64;
the Microsoft compiler does not define __aarch64__.
An alternative would be to define this in the code, checking for
_M_ARM64 or _M_ARM64EC. For now we'll use the existing VPX_ARCH_*
system. For compatibility VPX_ARCH_ARM will continue to be defined to 1
in this case.
Bug: webm:1788
Bug: b/277255076
Change-Id: I12e25710891e86f0c7339ba96884c18ed90ba16f
Get ready for changes to follow:
- Custom reader/writer IO functions
- Codec control to get TPL stats from the encoder
Move the definition of TplFrameStats to public header so applications
can use them directly.
Bug: b/273736974
Change-Id: Ieb0db4560ddd966df1bc01f6a7e179cc97f9bac1
Joint motion search during compound mode eval is optimized by
reducing the number of mv search iterations based on bsize.
The sf 'comp_inter_joint_search_thresh' is renamed as
'comp_inter_joint_search_iter_level' and used to add the logic.
cpu Testset Instr. Cnt BD Rate loss (%)
Red (%) avg. psnr ovr.psnr ssim
0 LOWRES2 5.373 0.0917 0.1088 0.0294
0 MIDRES2 3.395 0.0239 0.0520 0.0783
0 HDRES2 2.291 0.0223 0.0301 0.0053
0 Average 3.686 0.0460 0.0636 0.0377
STATS_CHANGED
Change-Id: I7ee8873ebc8af967382324ae8f5c70c26665d5e6
This is a reland of commit 3c59378e4e
Addressed issues from the previous CL:
- Both recon_error and rate_cost are scaled up
- recon_error and rate_cost are not accumulated across ref frames,
instead they are calculated with the best ref frame picked.
- get_quantize_error() is put where it was, so there is no behavior
change for vp9.
Bug: b/273736974
Original change's description:
> Calculate recrf_dist and recrf_rate
>
> Change-Id: I74e74807436b92d729e2ccaab96149780f1f52d9
Change-Id: I20e1f5543e83b576a074bd4e6b44d99da65f4b56
This reverts commit 3c59378e4e.
Reason for revert:
recon_error and recon_rate is summed by mistake across reference frames, as pointed out by Angie.
It could also cause vp9 behavior changes.
Original change's description:
> Calculate recrf_dist and recrf_rate
>
> Change-Id: I74e74807436b92d729e2ccaab96149780f1f52d9
Change-Id: I6106ce77cb0fe8c12b2bcf070d01513ffa8dc613
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
This allows the testdata target to work environments like cygwin/msys
when a windows style path is used. It may also fix using paths with
spaces, though that's not generally recommended.
Change-Id: Id444c14468b05d589bce49c1f612aa712a3f0c8c
in get_rdmult_delta() and compute_frame_aq_offset().
quiets -Wunused-but-set-variable with clang-17
Change-Id: I726852f3bc42afa80a18475de910040a9436b0bb
Add Neon implementations of high bitdepth downsampling SAD4D
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ib0c2f852e269cbd6cbb8f4dfb54349654abb0adb
Add Neon implementations of standard bitdepth downsampling SAD4D
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ieb77661ea2bbe357529862a5fb54956e34e8d758
Add Neon implementations of high bitdepth downsampling SAD functions
for all block sizes.
Also add corresponding unit tests.
Change-Id: I56ea656e9bb5f8b2aedfdc4637c9ab4e1951b31b
Add Neon implementations of standard bitdepth downsampling SAD
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ibda734c270278d947673ffcc29ef17a2f4970b01
Introduced AVX2 intrinsic to compute FDCT for block size
16x16 case. This is a bit-exact change.
Please check the module level scaling w.r.t C function (timer based)
for existing (SSE2) and new AVX2 intrinsics:
Scaling
SSE2 AVX2
3.88x 5.95x
Change-Id: I02299c3746fcb52d808e2a75d30aa62652c816dc
I believe the following comments are wrongly scoped, possibly left over
from previous changesets. This made me very confused when reading the
test suite Makefile, in order to port it to Meson.
Change-Id: Ice3c7ba50c6909a9c7dfd4001afa1e1ddfa4b5ce
New linear model to calculate loopfilter level from frame qp.
Linear regression was done on qvga, vga, and hd clips.
Bug: b/275304642
Change-Id: I552b312212bb4de21b53b762d139aa9588c64ae2
Added an assert for prune_single_mode_based_on_mv_diff_mode_rate
speed feature. This ensures NEARMV or ZEROMV modes are pruned
only when NEARESTMV and NEWMV modes are not early terminated.
Change-Id: Id8b03eef6d1ef3f16714a9cbfde0c171c0c6fe0b
Pack nz_mask with zero. After the result is permuted this has the effect
of ignoring the upper half of the iscan register which is only loaded
with 128-bits. Depending on the optimization level and the load used the
upper half of the ymm register may contain undefined values which can
produce an incorrect eob. If this is large enough it can cause a crash.
Bug: chromium:1431729
Change-Id: I4ebae9fa39f228bdd29dcc19935f3f07759d75f5
Add a widening 4D reduction function operating on uint16x8_t vectors
and use it to optimize the final reduction in Armv8.0 Neon standard
bitdepth 16xh, 32xh and 64h SAD4D computations.
Also simplify the Armv8.0 Neon version of the sad64xhx4d_neon helper
function since VP9 block sizes are not large enough to require
widening to 32-bit accumulators before the final reduction.
Change-Id: I32b0a283d7688d8cdf21791add9476ed24c66a28
Add a 4D reduction function operating on uint16x8_t vectors and use
it to optimize the final reduction in standard bitdepth 4xh and 8xh
SAD4D computations. Similar 4D reduction optimizations have already
been implemented for all other standard bitdepth block sizes, and all
high bitdepth block sizes.[1]
[1] https://chromium-review.googlesource.com/c/webm/libvpx/+/4224681
Change-Id: I0aa0b6e0f70449776f316879cafc4b830e86ea51
Added AVX2 intrinsic optimization for the following functions
1. vpx_variance8x4
2. vpx_variance8x8
3. vpx_variance8x16
This is a bit-exact change.
Instruction Count
cpu Resolution Reduction(%)
0 LOWRES2 0.698
0 MIDRES2 0.577
0 HDRES2 0.469
0 Average 0.582
Change-Id: Iae8fdf9344fd012cda4955ed140633141d60ba86
The shift instructions have marginally worse performance on some
micro-architectures, and the vget_{low,high} instructions are
unnecessary.
This commit improves performance of the d135 predictors by 1.5% geomean
averaged across a range of compilers and micro-architectures.
Change-Id: Ied4c3eecc12fc973841696459d868ce403ed4e6c
Use sum_neon.h helpers for horizontal reductions in Neon DC predictors,
enabling use of dedicated Neon reduction instructions on AArch64. Some
of the surrounding code is also optimized to remove redundant broadcast
instructions in the dc_store helpers.
Performance is largely unchanged on both the standard as well as the
high bit-depth predictors. The main improvement appears to be the 16x16
standard-bitdepth dc predictor, which improves by 10-15% when
benchmarked on Neoverse N1.
Change-Id: Ibfcc6ecf4b1b2f87ce1e1f63c314d0cc35a0c76f
* changes:
Avoid LD2/ST2 instructions in highbd v predictors in Neon
Avoid interleaving loads/stores in Neon for highbd dc predictor
Avoid LD2/ST2 instructions in vpx_dc_predictor_32x32_neon
For these block sizes there is no need to widen to 32-bits until the
final reduction, so use a single vabaq instead of vabd + vpadalq.
Change-Id: I9c19d620f7bb8b3a6b0bedd37789c03bb628b563
The interleaving load/store instructions (LD2/LD3/LD4 and ST2/ST3/ST4)
are useful if we are dealing with interleaved data (e.g. real/imag
components of complex numbers), but for simply loading or storing larger
quantities of data it is preferable to simply use the normal load/store
instructions.
This patch replaces such occurrences in the two larger block sizes:
vpx_highbd_v_predictor_16x16_neon and vpx_highbd_v_predictor_32x32_neon.
Change-Id: Ie4ffa298a2466ceaf893566fd0aefe3f66f439e4
The interleaving load/store instructions (LD2/LD3/LD4 and ST2/ST3/ST4)
are useful if we are dealing with interleaved data (e.g. real/imag
components of complex numbers), but for simply loading or storing larger
quantities of data it is preferable to simply use two or more of the
normal load/store instructions.
This patch replaces such occurrences in the two larger block sizes:
vpx_highbd_dc_predictor_16x16_neon, vpx_highbd_dc_predictor_32x32_neon,
and related helper functions.
Speedups over the original Neon code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 16x16 | 1.25
Neoverse N1 | LLVM 15 | 32x32 | 1.13
Neoverse N1 | GCC 12 | 16x16 | 1.56
Neoverse N1 | GCC 12 | 32x32 | 1.52
Neoverse V1 | LLVM 15 | 16x16 | 1.63
Neoverse V1 | LLVM 15 | 32x32 | 1.08
Neoverse V1 | GCC 12 | 16x16 | 1.59
Neoverse V1 | GCC 12 | 32x32 | 1.37
Change-Id: If5ec220aba9dd19785454eabb0f3d6affec0cc8b
The LD2 and ST2 instructions are useful if we are dealing with
interleaved data (e.g. real/imag components of complex numbers), but for
simply loading or storing larger quantities of data it is preferable to
simply use two of the normal load/store instructions.
This patch replaces such occurrences in vpx_dc_predictor_32x32_neon and
related functions.
With Clang-15 this speeds up this function by 10-30% depending on the
micro-architecture being benchmarked on. With GCC-12 this speeds up the
function by 40-60% depending on the micro-architecture being benchmarked
on.
Change-Id: I670dc37908aa238f360104efd74d6c2108ecf945
The existing tests duplicate `above_row_[block_size - 1]` after the
first `block_size` elements, which can lead to tests incorrectly passing
due to differing behaviour when calculating the average for the last
elements of the output.
This change adjusts the above array setup to be fully random instead,
allowing us to catch such issues here rather than in other larger tests
like the external MD5 tests.
It doesn't appear that other architectures are fully clean with this
change so restrict it to just Neon for now until they are fixed.
Bug: webm:1797
Change-Id: If83ff1adbf1e8d30f2a92474d7186c65840a5d0b
The existing standard bitdepth implementation doesn't appear to manifest
as a failure in any of the predictor or MD5 tests, but it does rely on
the predictor tests filling the second `bs` elements of the `above`
input array with copies of `above[bs - 1]` in order to match the C
implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
The geomean of performance for the predictor is approximately a 2%
slowdown compared to the previous vectorized implementation. This is
still considerably faster than the unspecialized naive C implementation.
Bug: webm:1797
Change-Id: I8fb00a154288d54b24a72a7ff63c816bdcf3aca3
The existing implementation doesn't appear to manifest as a failure in
any of the predictor or MD5 tests, but it does rely on the predictor
tests filling the second `bs` elements of the `above` input array with
copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
Performance of the predictor is mostly unchanged, except for the 32x32
block size where it appears to have gotten about 40% faster when
compiled with clang-15.
Bug: webm:1797
Change-Id: Iaad58e77c5467307a3c80d6989b7cf2988e09311
The existing implementation doesn't appear to manifest as a failure in
any of the predictor or MD5 tests, but it does rely on the predictor
tests filling the second `bs` elements of the `above` input array with
copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
Performance of the predictor is mostly unchanged, except for the 16x16
block size where it appears to have gotten marginally faster across most
compiler/micro-architecture combinations.
Bug: webm:1797
Change-Id: Iac166d6047316c0382e0f2790ce780fc99674b43
Introduced AVX2 intrinsic to compute convolve vertical for
w = 4 case. This is a bit-exact change.
Instruction Count
cpu Resolution Reduction(%)
0 LOWRES2 0.364
0 MIDRES2 0.236
0 HDRES2 0.162
0 Average 0.254
Change-Id: I413f58aa6333a6f2421d4c10d49dec01e55b2098
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
03f6fdcfca Fix warnings reported by -Wshadow: Part1b: scan_order struct
and variable
Bug: webm:1793
Change-Id: Ide5127886b7fd7778e6d8a983bfba6edda21ff28
Fix comment typos for vpx_codec_destroy() and vpx_codec_enc_init_ver().
Based on the change made in libaom:
https://aomedia.googlesource.com/aom/+/365a968684
365a968684 Fix comment typos (likely copy-and-paste errors)
Change-Id: I39edae835ed0752b569e8e7328d0709c59724ac2
This reverts commit 9c15fb62b3.
Reason for revert:
vpxenc should only use public interface
Original change's description:
> Add codec control to get tpl stats
>
> Add command line flag to vpxenc to export tpl stats
>
> Bug: b/273736974
> Change-Id: I6980096531b0c12fbf7a307fdef4c562d0c29e32
Bug: b/273736974
Change-Id: Ifa8951bb34e5936bbfc33086b22e9fc36d379bc9
Add Neon implementation of vpx_highbd_avg_4x4_c and vpx_highbd_avg_8x8_c
as well as the corresponding tests.
Change-Id: Ib1b06af5206774347690c9c56e194b76aa409c91
Shift the final read from the source by 3 to avoid breaking the
assumption that the 6-tap filter needs only 5 pixels outside of the
macroblock; this matches the sse2 and ssse3 implementations.
It's possible this restriction could be removed if the source buffers
are assumed to be padded.
Bug: webm:1795
Change-Id: I4c791e3a214898a503c78f4cedca154c75cdbaef
Fixed: webm:1795
The code to enable trellis coefficient optimization
is refactored using the sf 'trellis_opt_tx_rd'. This
change facilitates adaptive skipping of trellis
optimization based on block properties.
Change-Id: Ia1ff7cbbe5acf86414410f62655d46c099387847
This is a reland of commit 14fc40040f
Parent change fixed in crrev.com/c/webm/libvpx/+/4305500
Original change's description:
> quantize: use scan_order instead of passing scan/iscan
>
> further reduces the arguments for the 32x32. This will be applied to the base
> version as well.
>
> Change-Id: I25a162b5248b14af53d9e20c6a7fa2a77028a6d1
Change-Id: I2a7654558eaddd68bd09336bf317b297f18559d2
This is a reland of commit 573f5e662b
Alignment issue with tests fixed in crrev.com/c/webm/libvpx/+/4305500
Original change's description:
> quantize: simplify highbd 32x32_b args
>
> Change-Id: I431a41279c4c4193bc70cfe819da6ea7e1d2fba1
Change-Id: Ic868b6f987c99d88672858fedd092fa49c125e19
Change the VP9RateControlRtcConfig constructor to initialize
ss_number_layers (to 1).
Change UpdateRateControl() to return bool so that it can report failure
(due to invalid configuration).
Also change InitRateControl() to return bool to propagate the return
value of UpdateRateControl().
Note: This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/172042.
Change-Id: I90b60353b5f15692dba5d89e7b1a9c81bb2fdd89
The code that sets oxcf->ts_rate_decimator[tl] does not need to be
inside a loop that iterates over sl. Move the code out of the sl loop so
that oxcf->ts_rate_decimator[tl] is set only once.
Change-Id: I22f6c117d200ec38a757b749a8700660d15436c1
Remove the `ts_number_layers` field from VP9RateControlRtcConfig because
the base class VpxRateControlRtcConfig already has that field.
Note: In commit 65a1751e5b,
`ts_number_layers` was moved to the newly created base class
VpxRateControlRtcConfig but was inadvertently left in
VP9RateControlRtcConfig:
https://chromium-review.googlesource.com/c/webm/libvpx/+/3140048,
Change-Id: I98d48e152683ec2e5e62efffb56b7f010c5d0695
Introduced AVX2 intrinsic to compute convolve horizontal for
w = 4 case. This is a bit-exact change.
Instruction Count
cpu Resolution Reduction(%)
0 LOWRES2 0.763
0 MIDRES2 0.466
0 HDRES2 0.317
0 Average 0.516
Change-Id: I124f3f8e994c24461812f4963b113819466db44f
Optimize vpx_minmax_8x8_neon on AArch64 targets by using the UMAXV and
UMINV instructions - computing the maximum and minimum elements in a
Neon vector.
Change-Id: I54c3a3a087d266f6774e6113e5947253df288a64
Optimize Neon implementation of vpx_satd by using ABD and UADALP instead
of ABAL and ABAL2, splitting the accumulator and using a dedicated
helper function to perform the final reduction.
Change-Id: Idcfa49e001b68b1dcd87c13fd9acc317a208cd2a
Both are around 3x faster than original C version. 8-bit gives a
small 0.5% speed increase, whereas highbd gives ~2.5%.
Change-Id: I71d75ddd2757b19aa201e879fd9fa8f3a25431ad
Introduced AVX2 intrinsic to compute convolve vertical for
w = 8 case. This is a bit-exact change.
Instruction Count
cpu Resolution Reduction(%)
0 LOWRES2 1.347
0 MIDRES2 1.046
0 HDRES2 0.805
0 Average 1.066
Change-Id: Idf77fff054beaf2c985b9bf2335591bda47e811f
Function renamed as 'build_inter_pred_model_rd_earlyterm' and
added a comment to explain its behavior.
Change-Id: I804e6273558ba36241232f62cf18ea754b85e369
The high bitdepth Neon code applying the first pass of the bilinear
filter for subpixel variance on blocks of width 4 processed two rows
at a time. This resulted in a source buffer overread, attempting to
produce two rows of padding for the second (vertical) pass of the
bilinear filter.
This patch modifies highbd_var_filter_block2d_bil_w4 and
highbd_avg_pred_var_filter_block2d_bil_w4 such that they only process
a single row per iteration, and only require a single row of padding
for the second pass. This prevents the buffer overread.
Since all block sizes are now processed one row at a time, there is
no need for a "padding" macro parameter - the value is always 1, with
no special case for 4xh blocks. As well as re-enabling the Neon paths
and their associated tests, we remove the now-redundant 'padding'
macro parameter.
Bug: webm:1796
Change-Id: Icd6076b38eb4476139795bb1734ca800c9edf079
vpx_highbd_8_sub_pixel_avg_variance4x4_neon
vpx_highbd_8_sub_pixel_avg_variance4x8_neon
vpx_highbd_10_sub_pixel_avg_variance4x4_neon
vpx_highbd_10_sub_pixel_avg_variance4x8_neon
vpx_highbd_12_sub_pixel_avg_variance4x4_neon
vpx_highbd_12_sub_pixel_avg_variance4x8_neon
all cause heap overflows of the form:
i[ RUN ] NEON/VpxHBDSubpelAvgVarianceTest.Ref/33
=================================================================
==535205==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff95bb0b89 at pc 0x00000116dabc bp 0xffffd09f6430 sp 0xffffd09f6428
READ of size 8 at 0xffff95bb0b89 thread T0
#0 0x116dab8 in load_unaligned_u16q vpx_dsp/arm/mem_neon.h:176:3
#1 0x116dab8 in highbd_var_filter_block2d_bil_w4
vpx_dsp/arm/highbd_subpel_variance_neon.c:49:21
#2 0x116dab8 in vpx_highbd_8_sub_pixel_avg_variance4x4_neon
vpx_dsp/arm/highbd_subpel_variance_neon.c:543:1
...
0xffff95bb0b89 is located 0 bytes to the right of 73-byte region
[0xffff95bb0b40,0xffff95bb0b89)
allocated by thread T0 here:
#0 0x5f18b0 in malloc (test_libvpx+0x5f18b0)
#1 0xce4a40 in vpx_memalign vpx_mem/vpx_mem.c:62:10
#2 0xce4a40 in vpx_malloc vpx_mem/vpx_mem.c:70:40
#3 0xa52238 in (anonymous namespace)::SubpelVarianceTest<unsigned
int (*)(unsigned char const*, int, int, int, unsigned char
const*, int, unsigned int*, unsigned char
const*)>::SetUp()
test/variance_test.cc:586:14
...
This is the same issue as:
e33d4c276 disable vpx_highbd_*_sub_pixel_variance4x{4,8}_neon
They have highbd_var_filter_block2d_bil_w4 in common.
Bug: webm:1796
Change-Id: I3ed70d0ba22e127720542612ea9f6665948eedfc
vpx_highbd_8_sub_pixel_variance4x4_neon
vpx_highbd_8_sub_pixel_variance4x8_neon
vpx_highbd_10_sub_pixel_variance4x4_neon
vpx_highbd_10_sub_pixel_variance4x8_neon
vpx_highbd_12_sub_pixel_variance4x4_neon
vpx_highbd_12_sub_pixel_variance4x8_neon
all cause heap overflows of the form:
[ RUN ] NEON/VpxHBDSubpelVarianceTest.Ref/24
=================================================================
==450528==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8311a571 at pc 0x0000010ca52c bp 0xffffc63e96b0 sp 0xffffc63e96a8
READ of size 8 at 0xffff8311a571 thread T0
#0 0x10ca528 in load_unaligned_u16q vpx_dsp/arm/mem_neon.h:176:3
#1 0x10ca528 in highbd_var_filter_block2d_bil_w4
vpx_dsp/arm/highbd_subpel_variance_neon.c:49:21
#2 0x10ca528 in vpx_highbd_10_sub_pixel_variance4x8_neon
vpx_dsp/arm/highbd_subpel_variance_neon.c:257:1
...
0xffff8311a571 is located 0 bytes to the right of 113-byte region
[0xffff8311a500,0xffff8311a571)
allocated by thread T0 here:
#0 0x5f18b0 in malloc (test_libvpx+0x5f18b0)
#1 0xce4f90 in vpx_memalign vpx_mem/vpx_mem.c:62:10
#2 0xce4f90 in vpx_malloc vpx_mem/vpx_mem.c:70:40
#3 0xa4ad44 in (anonymous namespace)::SubpelVarianceTest<unsigned
int (*)(unsigned char const*, int, int, int, unsigned char
const*, int, unsigned int*)>::SetUp() test/variance_test.cc:586:14
Bug: webm:1796
Change-Id: I39f7f936bae2bcbbe1f803fb10375ec02d1c1277
* changes:
Implement highbd_d207_predictor using Neon
Implement highbd_d153_predictor using Neon
Implement d207_predictor using Neon
Implement d153_predictor using Neon
Implement highbd_d63_predictor using Neon
Introduced AVX2 intrinsic to compute convolve horizontal for
w = 8 case. This is a bit-exact change.
Instruction Count
cpu Resolution Reduction(%)
0 LOWRES2 1.509
0 MIDRES2 1.165
0 HDRES2 0.898
0 Average 1.191
Change-Id: I699c94aa3d7ea74c58f901df906eed0b81b4ee79
horizontal_add_int64x2 was incorrectly returning a uint64_t instead of
an int64_t. This patch fixes that.
Change-Id: Ic6016cf87aebfc6a14f540b784d6648757e12b49
Currently vp9_block_error_fp_neon is only used when
CONFIG_VP9_HIGHBITDEPTH is set to false. This patch optimizes the
implementation and uses tran_low_t instead of int16_t so that the
function can also be used in builds where vp9_highbitdepth is enabled.
Change-Id: Ibab7ec5f74b7652fa2ae5edf328f9ec587088fd3
Use a mem_neon.h helper to do strided 4-byte loads instead of Neon
8-byte loads - where the last 4 bytes are out of bounds.
Re-enable the Neon code path and the tests.
Bug: webm:1794
Change-Id: I69ccff730f4a5cbf585dd6a9aa0f3eb13e150074
Add an additional 32-bit vector accumulator to allow parallel
processing on CPUs that have more than one Neon multiply-accumulate
pipeline. Also use sum_neon.h horizontal-add helpers for reduction.
Change-Id: Ibcb48a738f5dee1430c3ebcd305b5ea8ea344c40
The load of `left[bs]` in the standard bitdepth d117 Neon implementation
triggered an address-sanitizer failure.
The highbd equivalent does not appear to trigger any asan failures when
running the VP9/ExternalFrameBufferMD5Test or
VP9/TestVectorTest.MD5Match tests, but for consistency with the standard
bitdepth implementation we adjust it to avoid the over-read.
Performance is roughly identical, with a 0.8% performance improvement on
average over the previous optimised code.
Change-Id: I05dc4d43f244f4915c0ccc52cc0af999bbacb018
Add Neon implementations of the d117 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.
This re-lands commit 360e9069b6,
previously reverted in commit 394de691a0.
The implementation is mostly identical to the original but with an
adjustment to how data is loaded from the `left` array. In particular
the left array cannot be guaranteed to be larger than the block size, so
the read of e.g. `left[32]` in the `bs=32` case is not valid. This turns
out to be not a problem since the last lane loaded in this case is
unused. I have added comments in the code to explain why this is the
case.
Since we cannot load the last element directly, we instead construct it
from the previous aligned read. This seems to have an inconsistent
affect on performance, improving by up to 10% in some cases and
regressing by up to 10% on others. Either way it is still significantly
faster than the original C code.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 1.88
Neoverse N1 | LLVM 15 | 8x8 | 5.19
Neoverse N1 | LLVM 15 | 16x16 | 9.63
Neoverse N1 | LLVM 15 | 32x32 | 13.85
Neoverse N1 | GCC 12 | 4x4 | 2.04
Neoverse N1 | GCC 12 | 8x8 | 4.62
Neoverse N1 | GCC 12 | 16x16 | 9.79
Neoverse N1 | GCC 12 | 32x32 | 4.69
Neoverse V1 | LLVM 15 | 4x4 | 1.75
Neoverse V1 | LLVM 15 | 8x8 | 6.71
Neoverse V1 | LLVM 15 | 16x16 | 9.62
Neoverse V1 | LLVM 15 | 32x32 | 13.81
Neoverse V1 | GCC 12 | 4x4 | 1.75
Neoverse V1 | GCC 12 | 8x8 | 6.01
Neoverse V1 | GCC 12 | 16x16 | 6.91
Neoverse V1 | GCC 12 | 32x32 | 4.39
Change-Id: Ia0977ff0b0eba2c41c7884b64e7c22ff9bc9549d
Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16
and 32x32 block sizes. Also update tests to add new corresponding cases.
This re-lands commit 7cdf139e3d,
previously reverted in 7478b7e4e4.
Compared to the previous implementation attempt we now correctly match
the behaviour of the C code when handling the final element loaded from
the 'above' input array. In particular:
- The C code for a 4x4 block performs a full average of the last element
rather than duplicating the final element from the input 'above'
array.
- The C code for other block sizes performs a full average for the
stride=0 and stride=1, and otherwise shifts in duplicates of the final
element from the input 'above' array. Notably this shifting for later
strides _replaces_ the final element which we previously performed an
average on (see {d0,d1}_ext in the code).
It is worth noting that this difference is not caught by the existing
VP9HighbdIntraPredTest test cases since the test vector initialisation
contains this loop:
for (int x = block_size; x < 2 * block_size; x++) {
above_row_[x] = above_row_[block_size - 1];
}
Since AVG2(a, a) and AVG3(a, a, a) are simply 'a', such differences in
behaviour for the final element are not observed.
Tested on AArch64 with:
- ./test_libvpx --gtest_filter="*VP9HighbdIntraPredTest*"
- ./test_libvpx --gtest_filter="*VP9/TestVectorTest.MD5Match*"
- ./test_libvpx --gtest_filter="*VP9/ExternalFrameBufferMD5Test*"
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 2.43
Neoverse N1 | LLVM 15 | 8x8 | 3.92
Neoverse N1 | LLVM 15 | 16x16 | 3.19
Neoverse N1 | LLVM 15 | 32x32 | 4.13
Neoverse N1 | GCC 12 | 4x4 | 2.92
Neoverse N1 | GCC 12 | 8x8 | 6.51
Neoverse N1 | GCC 12 | 16x16 | 4.55
Neoverse N1 | GCC 12 | 32x32 | 3.18
Neoverse V1 | LLVM 15 | 4x4 | 1.99
Neoverse V1 | LLVM 15 | 8x8 | 3.65
Neoverse V1 | LLVM 15 | 16x16 | 3.72
Neoverse V1 | LLVM 15 | 32x32 | 3.26
Neoverse V1 | GCC 12 | 4x4 | 2.39
Neoverse V1 | GCC 12 | 8x8 | 4.76
Neoverse V1 | GCC 12 | 16x16 | 3.24
Neoverse V1 | GCC 12 | 32x32 | 2.44
Change-Id: Iefaa774d6a20388b523eaa7f5df6bc5f5cf249e4
Allocate mb_plane_ on the heap to ensure src is aligned.
Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.
n_coeffs is not used at all for this function.
Change-Id: Ib551f7f583977602504d962b72063bc6eda9dda9
This causes various buffer overflows in the tests:
[ RUN ] NEON/SixtapPredictTest.TestWithPresetData/0
=================================================================
==22346==ERROR: AddressSanitizer: global-buffer-overflow on address
0x0000012b4a5b at pc 0x000000df0f60 bp 0xffffcf6e64b0 sp 0xffffcf6e64a8
READ of size 8 at 0x0000012b4a5b thread T0
#0 0xdf0f5c in vp8_sixtap_predict16x16_neon
vp8/common/arm/neon/sixtappredict_neon.c:1507:13
#1 0x8819e4 in (anonymous
namespace)::SixtapPredictTest_TestWithPresetData_Test::TestBody()
test/predict_test.cc:293:3
...
0x0000012b4a5b is located 2 bytes to the right of global variable
'kTestData' defined in '../test/predict_test.cc:237:24' (0x12b48a0) of
size 441
[ RUN ] NEON/SixtapPredictTest.TestWithRandomData/0
=================================================================
==22338==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8b5321fb at pc 0x000000df0f60 bp 0xfffff7e0cf30 sp 0xfffff7e0cf28
READ of size 8 at 0xffff8b5321fb thread T0
#0 0xdf0f5c in vp8_sixtap_predict16x16_neon
vp8/common/arm/neon/sixtappredict_neon.c:1507:13
#1 0x87d4c0 in (anonymous
namespace)::PredictTestBase::TestWithRandomData(void (*)(unsigned
char*, int, int, int, unsigned char*, int))
test/predict_test.cc:170:9
...
0xffff8b5321fb is located 2 bytes to the right of 441-byte region
[0xffff8b532040,0xffff8b5321f9)
allocated by thread T0 here:
#0 0x5fd4f0 in operator new[](unsigned long) (test_libvpx+0x5fd4f0)
#1 0x87c2e0 in (anonymous namespace)::PredictTestBase::SetUp()
test/predict_test.cc:47:12
#2 0x87d074 in non-virtual thunk to (anonymous
namespace)::PredictTestBase::SetUp() test/predict_test.cc
...
Bug: webm:1795
Change-Id: I32213a381eef91547d00f88acf90f1cf2ec2ea75
This function causes a heap overflow in the tests:
[ RUN ] NEON/VpxSseTest.RefSse/0
=================================================================
==876922==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8949d903 at pc 0x000000dd95d4 bp 0xfffffdd7f260 sp 0xfffffdd7f258
READ of size 8 at 0xffff8949d903 thread T0
#0 0xdd95d0 in vpx_get4x4sse_cs_neon
vpx_dsp/arm/variance_neon.c:556:10
#1 0x9d4894 in (anonymous namespace)::MainTestClass<unsigned int
(*)(unsigned char const*, int, unsigned char const*,
int)>::RefTestSse() test/variance_test.cc:531:5
#2 0x9d4894 in (anonymous
namespace)::VpxSseTest_RefSse_Test::TestBody()
test/variance_test.cc:772:30
...
0xffff8949d903 is located 3 bytes to the right of 16-byte region
[0xffff8949d8f0,0xffff8949d900)
allocated by thread T0 here:
#0 0x5fd050 in operator new[](unsigned long) (test_libvpx+0x5fd050)
#1 0x9d3e04 in (anonymous namespace)::MainTestClass<unsigned int
(*)(unsigned char const*, int, unsigned char const*,
int)>::SetUp() test/variance_test.cc:299:12
Bug: webm:1794
Change-Id: I4bc681eb9a436743ef8bfe2a2abae59ce754309c
This reverts commit 360e9069b6.
This causes ASan errors:
[ RUN ] VP9/TestVectorTest.MD5Match/1
=================================================================
==837858==ERROR: AddressSanitizer: stack-buffer-overflow on address
0xffff82ecad40 at pc 0x000000c494d4 bp 0xffffe1695800 sp 0xffffe16957f8
READ of size 16 at 0xffff82ecad40 thread T0
#0 0xc494d0 in vpx_d117_predictor_32x32_neon (test_libvpx+0xc494d0)
#1 0x1040b34 in vp9_predict_intra_block (test_libvpx+0x1040b34)
#2 0xf8feec in decode_block (test_libvpx+0xf8feec)
#3 0xf8f588 in decode_partition (test_libvpx+0xf8f588)
#4 0xf7be5c in vp9_decode_frame (test_libvpx+0xf7be5c)
...
Address 0xffff82ecad40 is located in stack of thread T0 at offset 64 in
frame
#0 0x103fd3c in vp9_predict_intra_block (test_libvpx+0x103fd3c)
This frame has 2 object(s):
[32, 64) 'left_col.i' <== Memory access at offset 64 overflows this
variable
[96, 176) 'above_data.i'
Change-Id: I058213364617dfe1036126c33a3307f8288d9ae0
This reverts commit 5359ae810c.
Reason for revert: Blocks quantize cleanups
Original change's description:
> Allow macroblock_plane to have its own rounding buffer
>
> Add 8 bytes buffer to macroblock_plane to support rounding factor.
>
> Change-Id: I3751689e4449c0caea28d3acf6cd17d7f39508ed
Change-Id: Ia2424d2114207370f0b45350313a5ff8521d25a8
While porting this function to NEON, using SSE4_1 implementation
as base I noticed that both were producing files with different
checksums to the C reference implementation. After investigating
further I found that this saturating pack was the culprit. Doing
the multiplication on the 32-bit values, leads to producing the
correct results with the C implementation.
Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
This reverts commit 848f6e7337.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: Ic12014ab0a78ed3cde02d642509061552cdc8fc9
This reverts commit 573f5e662b.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: Ibf05e6b116c46f6e2c11187b3e3578bbd2d2c227
This reverts commit 14fc40040f.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: I934f9a4c3ce3db33058a65180fa645c8649c3670
This reverts commit 7cdf139e3d.
This causes failures in the VP9/ExternalFrameBufferMD5Test and
VP9/TestVectorTest.MD5Match tests in both armv7 and aarch64 builds.
Change-Id: I7ac4ba0ddc70e7e7860df9f962e6658defe1cdd5
Currently MSE functions just call the variance helpers but don't
actually use the computed sum. This patch adds dedicated helpers to
perform the computation of sse.
Add the corresponding tests as well.
Change-Id: I96a8590e3410e84d77f7187344688e02efe03902
* changes:
Implement highbd_d117_predictor using Neon
Implement highbd_d63_predictor using Neon
Implement d117_predictor using Neon
Implement d63_predictor using Neon
Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.
n_coeffs is not used at all for this function.
Change-Id: I2104fea3fa20c455087e21b347d6abd7ea1f3e1e
Currently only vpx_mse16x16 has a Neon implementation. This patch adds
optimized Armv8.0 and Armv8.4 dot-product paths for all block sizes:
8x8, 8x16, 16x8 and 16x16.
Add the corresponding tests as well.
Change-Id: Ib0357fdcdeb05860385fec89633386e34395e260
1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64
targets. This produces half as many TRN1/2 instructions compared to
the number of MOVs that result from vcombine.
2) Use vpx_vtrnq_[su]64* helpers wherever applicable.
3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.
Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127
Use (void) to indicate an empty parameter list and match the declaration
of vpx_codec_vp[89]_[cd]x. This fixes a cfi sanitizer error.
Change-Id: I190f432eea4d1765afffd84c7458ec44d863f90c
* changes:
Add Neon implementation of high bitdepth 32x32 hadamard transform
Add Neon implementation of high bitdepth 16x16 hadamard transform
Add Neon implementation of high bitdepth 8x8 hadamard transform
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory
Bug: webm:1793
Change-Id: I4df1bbc8d079a3174d75f0d35d54c200ffdbb677
Specialize implementation of high bitdepth variance functions such that
we only widen data processing element types when absolutely necessary.
Change-Id: If4cc3fea7b5ab0821e3129ebd79ff63706a512bf
In joint_motion_search, there are four iterations.
Even iterations search in the first reference frame
and odd iterations search in the second. The last two
iterations use the search result of the first two
iterations as the start point. If the search result does
not change,last two iterations are not necessary and can
be skipped.
Instruction Count
cpu-used Reduction(%)
0 1.411
Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
Added unit test.
Keep track of spatial layer id and frame type in case where spatial
layers are encoded parallel by the hardware encoder.
ComputeQP() / PostEncodeUpdate() doesn't need to be called sequentially
when there is no inter layer prediction.
Bug: b/257368998
Change-Id: I50beaefcfc205d3f9a9d3dbe11fead5bfdc71489
* changes:
Optimize vpx_highbd_comp_avg_pred_neon
Add Neon AvgPredTestHBD test suite
Specialize Neon high bitdepth avg subpel variance by filter value
Specialize Neon high bitdepth subpel variance by filter value
Refactor Neon high bitdepth avg subpel variance functions
Optimize Neon high bitdepth subpel variance functions
Optimize the implementation of vpx_highbd_comp_avg_pred_neon by making
use of the URHADD instruction to compute the average.
Change-Id: Id74a6d9c33e89bc548c3c7ecace59af69051b4a7
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: Id5a2b2d9fac6f878795a6ed9de2bc27d9e62d661
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: I73182c979255f0332a274f2e5907df7f38c9eeb3
Use the same general code style as in the standard bitdepth Neon
implementation - merging the computation of vpx_highbd_comp_avg_pred
with the second pass of the bilinear filter to avoid storing and loading
the block again.
Also move vpx_highbd_comp_avg_pred_neon to its own file (like the
standard bitdepth implementation) since we're no longer using it for
averaging sub-pixel variance.
Change-Id: I2f5916d5b397db44b3247b478ef57046797dae6c
Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.
Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f
Release v1.13.0 Ugly Duckling
2023-01-31 v1.13.0 "Ugly Duckling"
This release includes more Neon and AVX2 optimizations, adds a new codec
control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
numerous bug fixes.
- Upgrading:
This release is ABI incompatible with the previous release.
New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.
GoogleTest is upgraded to v1.12.1.
.clang-format is upgraded to clang-format-11.
VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
feature of using external rate control models for vp9.
- Enhancement:
Numerous improvements on Neon optimizations.
Numerous improvements on AVX2 optimizations.
Additional ARM targets added for Visual Studio.
- Bug fixes:
Fix to calculating internal stats when frame dropped.
Fix to segfault for external resize test in vp9.
Fix to build system with replacing egrep with grep -E.
Fix to a few bugs with external RTC rate control library.
Fix to make SVC work with VBR.
Fix to key frame setting in VP9 external RC.
Fix to -Wimplicit-int (Clang 16).
Fix to VP8 external RC for buffer levels.
Fix to VP8 external RC for dynamic update of layers.
Fix to VP9 auto level.
Fix to off-by-one error of max w/h in validate_config.
Fix to make SVC work for Profile 1.
Bug: webm:1780
Change-Id: I371fc1444ead56f8d7fc510e05582b6415c3ddb1
Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
Change-Id: Idd59dca51387f553f8db27144a2b8f2377c937d3
Add missing 4x4 and 4x8 tests for both high bitdepth sub-pixel variance
and high bitdepth averaging sub-pixel variance.
Change-Id: I042752c5b7ccc14f58075694d0bb1d36f144ad06
Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.
Change-Id: I207f76b3d42aa541809b0672c3b3d86e54d133ff
* changes:
Optimize Neon implementation of high bitdepth SAD4D functions
Optimize Neon implementation of high bitdepth avg SAD functions
Optimize Neon implementation of high bitdepth SAD functions
Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
block once - instead of four times.
Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723
Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I75c5f09948f6bf17200f82e00e7a827a80451108
Optimizations take a similar form to those implemented for standard
bitdepth SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230
At BEST encoding mode, the mesh search range wasn't initialized for
non FC_GRAPHICS_ANIMATION content type, which actually/mistakenly
used speed 0's setting. Fixed it by adding the initialization.
There were 2 ways to fix this. Patchset 1 set to use speed 0's setting
for non FC_GRAPHICS_ANIMATION type. This didn't change BEST mode's
encoding results much, and only a couple of clips' results were changed.
Borg result for BEST mode:
avg_psnr: ovr_psnr: ssim: encoding_spdup:
lowres2: -0.004 -0.003 -0.000 0.030
midres2: -0.006 -0.009 -0.012 0.033
hdres2: 0.002 0.002 0.004 0.015
Patchset 2 set to use BEST's setting for non FC_GRAPHICS_ANIMATION type.
However, the majority of test clips' BDrate got changed up to
~0.5% (gain or loss), and overall it didn't give better performance
than patchset 1. So, we chose to use patchset 1.
Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef
rather than the gcc specific __attribute__((aligned())); fixes build
targeting ARM64 windows.
Bug: webm:1788
Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e
Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
Change-Id: I941d226ef94fd7a633b09fc92165a00ba68a1501
Refactor the Neon implementation of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions as opposed to 8 EXT, MOV pairs. This change removes 8
instructions per call to transpose_s16_8x8(q), transpose_u16_8x8
where the result stays in registers for further processing - rather
than being stored to memory - like in vpx_hadamard_8x8_neon, for
example.
This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426
Change-Id: Icef3e51d40efeca7008e1c4fc701bf39bd319c88
In total this gives about 9% extra performance for both rt/best
profiles.
Furthermore, add transpose_s32 16x16 function
Change-Id: Ib6f368bbb9af7f03c9ce0deba1664cef77632fe2
Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/166962
Change-Id: I7860c852db94a7c9c3d72ae4411316685f3800a4
Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.
This is a backport of this libaom change[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/166961
Change-Id: I9327ff7382a46d50c42a5213a11379b957146372
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162463
Change-Id: Ia818e148f6fd126656e8411d59c184b55dd43094
Use case is for 1 pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicalble to spatial layers, where user may set
the QP per spatial layer.
Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b
This simplifies integration with the Android platform and avoids the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.
Change-Id: I803912146dac788b7f0af27199c7613cabbc9fa0
Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162241
[2] https://aomedia-review.googlesource.com/c/aom/+/162262
Change-Id: Ia4e8fff4d53297511d1a1e43bca8053bf811e551
Failure occurs for 1 pass non-realtime mode at speed 0.
Due to speed feautre rd_ml_partition.var_pruning, which
doesn't check for scaled reference in simple_motion_search().
Bug: webm:1768
Change-Id: Iddcb56033bac042faebb5196eed788317590b23f
Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
Change-Id: Ifbf467681dd53bb1d26e22359885e6edde3c5c72
Add additional AArch64 paths for vpx_convolve8_horiz_neon and
vpx_convolve8_avg_horiz_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
Change-Id: If19f5872c3453458a8cfb7c7d2be82a2c0eab46a
avoids a warning on some platforms:
egrep: warning: egrep is obsolescent; using grep -E
Bug: webm:1786
Change-Id: Ia434297731303aacb0b02cf3dcbfd8e03936485d
Fixed: webm:1786
Define all Neon load/store helper functions in mem_neon.h and use
them consistently in Neon convolution functions.
Change-Id: I57905bc0a3574c77999cf4f4a73442c3420fa2be
The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.
This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.
Change-Id: Ia4aeee8b46fe218658fb8577dc07ff04a9324b3e
This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacement
(bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases
so we must remove uses before updating NumPy.
Change-Id: I9f5dfcbb11fe6534fce358054f210c7653f278c3
Test to verify RC for going down and back up in
spatial layers. Going back up has an issue so added
a TODO.
Make the test more flexible to handle dynamic layers.
Test for dyanmic change in temporal layers to follow.
Change-Id: Ic5542f7b274135277429e116f56ba54e682e96a0
Allow external model to control frame rdmult.
A function is called per frame to get the value of rdmult from
the external model.
The external rdmult will overwrite libvpx's default rdmult unless
a reserved value is selected.
A unit test is added to test when the default rdmult value is set.
Change-Id: I2f17a036c188de66dc00709beef4bf2ed86a919a
SVC test is only in CBR and the frame_flags are
set by the SVC pattern, so we shouldn't undo them
for svc mode.
Change-Id: I5ffa65dd58a7b47f287d124d9e71ba1dc7c5a549
A client of the vp9 rate controller needs to know whether the
segmentation is enabled and the size of delta_q. It is also nicer to
know the size of map. This CL changes the interface to achieve these.
Bug: b:259487065
Test: Build
Change-Id: If05854530f97e1430a7b97788910f277ab673a87
Prior to this CL SVC with VBR mode was broken.
Fixes made here to make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.
Added rate targetting unittest for (2SL, 3TL).
Bug: chromium:1375111
Change-Id: I5a62ffe7fbea29dc5949c88a284768386b1907a9
This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.
Removes a quantize variant and makes subsequent cleanups easier.
Change-Id: Ibe545eccd19370f07ff26c8e151f290c642efd2a
Refactor & optimize FHT functions further, use new butterfly functions
4x4 5% faster, 8x8 & 16x16 10% faster than previous versions.
Highbd 4x4 FHT version 2.27x faster than C version for --rt.
Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94
Add an Arm Neon implementation of vpx_hadamard_32x32 and use it
instead of the scalar C implementation.
Also add test coverage for the new Neon implementation.
Change-Id: Iccc018eec4dbbe629fb0c6f8ad6ea8554e7a0b13
For --best quality, resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of cpu time in
profiling, vs 6.27% for the sum of scalar functions:
vpx_fdct32, vpx_fdct32.constprop.0, vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves encoding time by ~6% compared for highbd
for --best and ~9% for --rt.
Change-Id: I1ce4bbef6e364bbadc76264056aa3f86b1a8edc5
Provide a set of commonly used Butterfly DCT functions for use in
DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
This gave a performance gain ranging from 5% to 15% in 16x16 case.
Also, for 32x32, the loads were rearranged, along with the butterfly
optimizations, this gave 10% gain in 32x32_rd function.
This refactoring was necessary to allow easier porting of highbd
32x32 functions -follows this patchset.
Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3
(src|ref)8_ptr -> (src|ref)_ptr. aligns the names with the rtcd header;
clears some clang-tidy warnings
Change-Id: Id1aa29da8c0fa5860b46ac902f5b2620c0d3ff54
On a dynamic change of temporal layers:
starting/maimum/optimal were being set twice,
causing incorrect large values.
Bug: b/253927937
Change-Id: I204e885cff92530336a9ed9a4363c486c5bf80ae
On change/update of rc_cfg: when number of temporal
layers change call vp8_reset_temporal_layer_change(),
which in turn will call vp8_init_temporal_layer_context()
only for the new layers.
Bug:b/249644737
Change-Id: Ib20d746c7eacd10b78806ca6a5362c750d9ca0b3
Move all butterfly functions to fdct_neon.h
Slightly optimize load/scale/cross functions
in fdct 16x16.
These will be reused in highbd variants.
Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073
In assembly it made sense to iterate using n_coeffs.
In intrinsics it's just as fast to use index and
easier to read.
Change-Id: I403c959709309dad68123d0a3d0efe183874543d
warning: ‘s2[3]’ may be used uninitialized
and
warning: ‘s1[3]’ may be used uninitialized
The warnings exposed unused code.
Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9
Match style for vpx_quantize_b_sse2 and prepare to rewrite
ssse3 version in intrinsics.
Need to evaluate the value of threshold breakout before
going further.
Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
vp9_quantize_fp_sse2 was only tested in non-hbd
configuration. Missed when fixing this for
vpx_quantize_b_sse2.
Change-Id: Ide346e5727d74281c774f605c90d280050e0bf62
All of the assembly adds 1 to iscan to convert from
a 0 based array to the EOB value.
Add 1 to all iscan values and remove the extra
instructions from the assembly.
Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
| ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
39 | __m128i below_context;
Change-Id: I2fc592f121c4e85d0aff1640014c3444f5eb09fd
Allow to handle external q and external max frame size separately.
Rely on libvpx's decision to catch overshoot/undershoot and recode frames.
Previously, when external max frame size is set, we didn't handle
undershoot cases, and now we fall back to libvpx's decision to
recode a frame if overshoot/undershoot is seen.
Change-Id: Ic3eee042cfe104b528c5f2c6c82b98dd5d8fa8ca
fixes -Wclobbered warnings with gcc 12.1.0:
vp8/vp8_dx_iface.c|278 col 16| warning: variable 'w' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
vp8/vp8_dx_iface.c|278 col 19| warning: variable 'h' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
Change-Id: Ib2c606a3450188d7869c066cacaf5615d9746181
missed in
447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
+ fix #if comments
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
Change-Id: I098b96421b7de5928bb3b11612ca1f32e7b6cbc4
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
Change-Id: I164fe121ccefda99050a9b6a99738a2b518520f3
this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project
8671348b81b95fc603505dfc881b45103bee1731)
Change-Id: Ifec252d4f499f23be1cd94aa8516caf6b3fbbc11
Pass the encode frame info to external ml model, with the information
of gop size and whether alt ref is used.
Change-Id: I55be2d3de83d7182c1a1a174e44ead7e19045c9d
This was reported with doxygen 1.9.4.
Also update the comment for CLASS_GRAPH by running "doxygen -u" because
the original comment for CLASS_GRAPH mentions the obsolete tag
'CLASS_DIAGRAMS',
Change-Id: I3bca547201f794d363bd814b7c7f7c9d7088797a
only store the deltas from --style Google in the file and reapply using
Debian clang-format version 11.1.0-6+build1
Bug: b/229626362
Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db
this was added in:
7beafefd1 vp9: Allow for disabling loopfilter per spatial layer
but the test doesn't zero initialize its svc_params_ member.
fixes the use of an uninitialized value, reported by valgrind and
integer sanitizer:
[ RUN ] VP9/RcInterfaceSvcTest.Svc/0
==1064682== Conditional jump or move depends on uninitialised value(s)
==1064682== at 0x1C5624: loopfilter_frame (vp9_encoder.c:3285)
==1064682== by 0x1C9B54: encode_frame_to_data_rate (vp9_encoder.c:5595)
==1064682== by 0x1CA2EE: SvcEncode (vp9_encoder.c:5789)
==1064682== by 0x1CEA01: vp9_get_compressed_data (vp9_encoder.c:7891)
==1064682== by 0x185F0E: encoder_encode (vp9_cx_iface.c:1437)
==1064682== by 0x1503BB: vpx_codec_encode (vpx_encoder.c:208)
vp9/encoder/vp9_svc_layercontext.c:362:26: runtime error: implicit
conversion from type 'int' of value -1 (32-bit, signed) to type
'LOOPFILTER_CONTROL' changed the value to 4294967295 (32-bit, unsigned)
#0 0x558925f45377 in vp9_restore_layer_context vp9/encoder/vp9_svc_layercontext.c:362:26
#1 0x558925ef89fd in vp9_get_compressed_data vp9/encoder/vp9_encoder.c:7781:5
#2 0x558925e3ef3e in encoder_encode vp9/vp9_cx_iface.c:1437:20
Bug: b/229626362
Change-Id: I33d244be7752c68b71efa9c62ca45d6b202ec761
with block sizes < 8x8 previously only the inner loop was aborted. this
could cause propagation of invalid motion vectors to scale_mv().
this quiets integer sanitizer warnings of the form:
vp9/common/vp9_mvref_common.h:239:18: runtime error: implicit conversion
from type 'int' of value 32768 (32-bit, signed) to type 'int16_t' (aka
'short') changed the value to -32768 (16-bit, signed)
Bug: b/229626362
Change-Id: I58b5a425adf21542cbf4cc4dd5ab3cc5ed008264
these shift values off the most significant bit as part of the process;
vp8_regular_quantize_b_sse4_1 is included here for a special case of
mask creation
quiets warnings of the form:
vp8/decoder/dboolhuff.h:81:11: runtime error: left shift of
2373679303235599696 by 3 places cannot be represented in type
'VP8_BD_VALUE' (aka 'unsigned long')
vp8/encoder/bitstream.c:257:18: runtime error: left shift of 2147493041
by 1 places cannot be represented in type 'unsigned int'
vp8/encoder/x86/quantize_sse4.c:114:18: runtime error: left shift of
4294967294 by 1 places cannot be represented in type 'unsigned int'
vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of
4294967295 by 1 places cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: Iabed118b2a094232783e5ad0e586596d874103ca
and use it on MD5Transform(); this behavior is well defined and is only
a warning with -fsanitize=integer, not -fsanitize=undefined.
quiets warnings of the form:
md5_utils.c:163:3: runtime error: left shift of 143704723 by 7 places
cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: I60a384b2c2556f5ce71ad8ebce050329aba0b4e4
this changes from scaling best sse to downscaling base sse in
comparisons.
this quiets an integer sanitizer warning of the form:
vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of
4294967295 by 1 places cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: Iee2920474ba700a46177d4514ba6ef7691958069
make source_variance unsigned; this matches update_thresh_freq_fact()
and the type of the MACROBLOCK member.
quiets integer sanitizer warnings of the form:
vp9/encoder/vp9_pickmode.c:2710:58: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
Bug: b/229626362
Change-Id: I812c6ca914507bf25cad323dea3d91a3a2ea4f1d
flat/flat2 are stored as int8_t as returned by the filter_mask*
functions.
this quiets integer sanitizer warnings of the form:
vpx_dsp/loopfilter.c:197:28: runtime error: implicit conversion from
type 'int8_t' (aka 'signed char') of value -1 (8-bit, signed) to type
'uint8_t' (aka 'unsigned char') changed the value to 255 (8-bit,
unsigned)
Bug: b/229626362
Change-Id: Iacb6ae052d4cb2b6e0ebccbacf59ece9501d3b5f
* changes:
vpx_encoder.h: make flag constants unsigned
vp8,VP8_COMP: normalize segment_encode_breakout type
webmdec,WebmInputContext: make timestamp_ns signed
highbd_quantize_intrin_sse2: quiet int sanitizer warnings
load_unaligned_u32: use an int w/_mm_cvtsi32_si128
variance_sse2.c: add some missing casts
this matches the type for vpx_codec_frame_flags_t and
vpx_codec_er_flags_t and quiets int sanitizer warnings of the form:
implicit conversion from type 'int' of value -9 (32-bit, signed) to type
'unsigned int' changed the value to 4294967287 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Icfc5993250f37cedb300c7032cab28ce4bec1f86
use unsigned int as the API value is of this type; this quiets some
integer sanitizer warnings of the form:
implicit conversion from type 'unsigned int' of value 2147483648
(32-bit, unsigned) to type 'int' changed the value to -2147483648
(32-bit, signed)
Bug: b/229626362
Change-Id: I3d1ca618bf1b3cd57a5dca65a3067f351c1473f8
this matches the type returned from libwebm, which uses -1 as an error;
quiets integer sanitizer warnings of the form:
implicit conversion from type 'long long' of value -1 (64-bit, signed)
to type 'uint64_t' (aka 'unsigned long') changed the value to
18446744073709551615 (64-bit, unsigned)
Bug: b/229626362
Change-Id: Id3966912f802aee3c0f7852225b55f3057c3e76a
add a missing cast in ^ operations; quiets warnings of the form:
implicit conversion from type 'int' of value -1 (32-bit, signed) to type
'unsigned int' changed the value to 4294967295 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I56f74981050b2c9d00bad20e68f1b73ce7454729
this matches the type of the function parameter; quiets integer
sanitizer warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3215646151 (32-bit, unsigned) to type 'int' changed the value to
-1079321145 (32-bit, signed)
Bug: b/229626362
Change-Id: Ia9a5dc5e1f57cbf4f8f8fa457bb674ef43369d37
quiets integer sanitizer warnings of the form:
../vpx_dsp/x86/variance_sse2.c:100:10: runtime error: implicit
conversion from type 'unsigned int' of value 4294966272 (32-bit,
unsigned) to type 'int' changed the value to -1024 (32-bit, signed)
Bug: b/229626362
Change-Id: I150cc0a6a6b85143c3bf96886686fe3a40897db5
with certain optimization flags or sanitizers enabled some code may fail
to vectorize:
third_party/libyuv/source/row_common.cc:3178:7: warning: loop not
vectorized: the optimizer was unable to perform the requested
transformation; the transformation might be disabled or specified as
part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this was observed with integer/undefined sanitizers using clang 11/13
Bug: b/229626362
Change-Id: I01595c641763c4cd4242e02f2cc5cbabfe69d03e
fixes warnings of the form:
../vp9/simple_encode.cc:755:48: warning: empty expression statement has
no effect; remove unnecessary ';' to silence this warning
[-Wextra-semi-stmt]
SET_STRUCT_VALUE(config, oxcf, ret, key_freq);
Bug: b/229626362
Change-Id: I1c9b0ae9927cdd7c31da000633bcb6e2b8242cd4
Up to 4.1x faster than vp9_highbd_quantize_fp_c() for full
calculations.
~1.3% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I8c6466bdbcf1c398b1d8b03cab4165c1d8556b0c
the snapshot of googletest and the test files themselves are targeting
c++11 currently; these warnings are supported by recent versions of
clang
Change-Id: I5d36b3bd4058ba1610f0c8b27cad27aadee85939
Assume the level definition of min_gf_interval is the minimum allowed
gf_interval. We take this level comformant min_gf_interval instead of
+1.
Change-Id: I9c7e62f210c95b356e9716579ee4c19638de8e35
avoid calculating the end timestamp when performing a flush to prevent
an implicit conversion warning when applying a non-zero offset to a 0
pts used in that case:
vp9/vp9_cx_iface.c:1361:50: runtime error: implicit conversion from type
'vpx_codec_pts_t' (aka 'long') of value -15 (64-bit, signed) to type
'unsigned long' changed the value to 18446744073709551601 (64-bit,
unsigned)
Bug: b/229626362
Change-Id: I68ba19b7d6de35cc185707dfb6b43406b7165035
The iteration index is wrong, causing the starting level to be chosen
is "LEVEL_5_2", which is intended for videos of a large resolution.
Change-Id: Id88836981bdcbd7494bd06193d6a433ac75a6d2e
Up to 5.37x faster than vp9_highbd_quantize_fp_c() for full
calculations.
~1.6% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I584fd1f60a3e02f1ded092de98970725fc66c5b8
this fixes runtime errors with clang -fsanitize=integer in x86 builds:
../vp9/encoder/vp9_rdopt.c:3250:17: runtime error: signed integer
overflow: 18 - -2147483648 cannot be represented in type 'int'
../vp9/encoder/vp9_rdopt.c:3277:16: runtime error: signed integer
overflow: 26 - -2147483648 cannot be represented in type 'int'
Bug: b/229626362
Change-Id: Ic9a5063c840b4fce7056f61362234721add056a6
prefer int in most cases
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -809931979 (32-bit, signed)
to type 'uint32_t' (aka 'unsigned int') changed the value to 3485035317
(32-bit, unsigned)
Bug: b/229626362
Change-Id: I0c6604efc188f2660c531eddfc7aa10060637813
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -2 (32-bit, signed) to type
'unsigned int' changed the value to 4294967294 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Id7e13b3d494ccd1a2351db8fab6fdb6a9a771d51
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -1323 (32-bit, signed) to
type 'unsigned int' changed the value to 4294965973 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I7291d9bd5cacea0d88d9f4c4624c096764f4a472
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4294443008 (32-bit, unsigned) to type 'int' changed the value to -524288
(32-bit, signed)
Bug: b/229626362
Change-Id: Ic7c0a2e7b64a1dd6fd5cc64adcd5765318c2a956
unsigned -> int and vice versa
reported by clang -fsanitize=integer
vp8/common/findnearmv.c:108:11: runtime error: implicit conversion from
type 'uint32_t' (aka 'unsigned int') of value 4294443008 (32-bit,
unsigned) to type 'int' changed the value to -524288 (32-bit, signed)
vp8/common/findnearmv.c:110:33: runtime error: implicit conversion from
type 'int' of value -524288 (32-bit, signed) to type 'uint32_t' (aka
'unsigned int') changed the value to 4294443008 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Ic7ce0fd98255ccf9307ac73e9fb6a8189b268214
use vpx_enc_frame_flags_t; this avoids int -> unsigned conversion
warnings; reported w/clang -fsanitize=integer:
test/error_resilience_test.cc:95:9: runtime error: implicit conversion
from type 'int' of value -12845057 (32-bit, signed) to type 'unsigned
long' changed the value to 4282122239 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I0fc1dbe44a258f397cf1a05347d8cb86ee70b1b8
reported under clang-13. null data may be passed as a flush; move
data_end after that check
vp9/vp9_dx_iface.c:337:40: runtime error: applying zero offset to null
pointer
Bug: b/229626362
Change-Id: I845726fd6eb6ac7a776e49272c6477a5ad30ffdf
reported under clang-13; use a while loop in file_read() to force a size
check before attempting to read. buf (aux_buf) may be may be null when
no conversion is necessary.
y4minput.c:29:43: runtime error: applying zero offset to null pointer
Bug: b/229626362
Change-Id: Ia3250d6ff9c325faf48eaa31f4399e20837f8f7b
this clears warnings under clang-13 of the form:
vp9/encoder/x86/highbd_temporal_filter_sse4.c|196 col 63| warning:
parameter 'v_pre' set but not used [-Wunused-but-set-parameter]
this is the high-bitdepth version of:
73b8aade8 temporal_filter_sse4: remove unused function params
Change-Id: I9b2c9bf27c16975e4855df6a2c967da4c8c63a3a
Up to 11.78x faster than vpx_quantize_b_32x32_sse2() for full
calculations.
~1.7% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: Ib759056db94d3487239cb2748ffef1184a89ae18
Up to 3.61x faster than vpx_highbd_quantize_b_sse2() for full
calculations.
~2.3% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I23f88d2a7f96aaa4103778372f4f552207f73cee
Add unit tests for a 4 frame video, which could be considered as a
corner case.
Three different GOP settings are tested and verified as valid.
(1). The first GOP has 3 coding frames, no alt ref.
The second GOP has 1 coding frame, no alt ref.
The numer of coding frames is 4.
Their frame types are: keyframe, inter_frame, inter_frame,
golden_frame.
(2). The first GOP has 4 coding frames, use alt ref.
The second GOP has 1 coding frame, which is the overlay of
the first GOP's alt ref frame.
The numer of coding frames is 5.
Their types are: keyframe, alt_ref, inter_frame, inter_frame,
overlay_frame.
(3). Only one GOP with 4 coding frames, do not use alt ref.
The numer of coding frames is 4.
Their types are: keyframe, inter_frame, inter_frame, inter_frame.
Change-Id: I4079ff5065da79834b363b1e1976f65efed3f91f
in CheckLowFilterOutput(); use std::unique_ptr to avoid spurious memory
leak warning:
test/pp_filter_test.cc|466 col 3| warning: Potential leak of memory
pointed to by 'expected_output' [cplusplus.NewDeleteLeaks]
ASSERT_NE(expected_output, nullptr);
Bug: b/229626362
Change-Id: Ie9e06c9b9442ffa134e514d2aee70841d19c8ecb
in ConfigChangeThreadCount(); initialize cfg as the static analyzer can
assume AlwaysTrue() within EXPECT_NO_FATAL_FAILURE may return false
causing InitCodec() not to be called.
test/encode_api_test.cc|321 col 3| warning: 1st function call argument
is an uninitialized value [core.CallAndMessage]
video.SetSize(cfg.g_w, cfg.g_h);
Bug: b/229626362
Change-Id: I54899ed0a207ca685416bed3a0e9c9644668e163
Up to 1.36x faster than vpx_quantize_b_32x32_avx() for full
calculations. Up to 1.29x faster for VP9_HIGHBITDEPTH builds.
Bug: b/237714063
Change-Id: I97aa6a18d4dc2f3187b76800f91bbba7be447ef1
this quiets a couple static analysis warnings with clang 11:
vpx_dsp/x86/avg_intrin_sse2.c:278:45: warning: Although the value stored
to 'src_diff' is used in the enclosing expression, the value is never
actually read from 'src_diff' [deadcode.DeadStores]
src[7] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
^ ~~~~~~~~~~
vpx_dsp/x86/avg_intrin_avx2.c:307:49: warning: Although the value stored
to 'src_diff' is used in the enclosing expression, the value is never
actually read from 'src_diff' [deadcode.DeadStores]
src[7] = _mm256_loadu_si256((const __m256i *)(src_diff += src_stride));
^ ~~~~~~~~~~
Bug: b/229626362
Change-Id: I4b0201bd39775885df0afc03fa5da70910b9dad6
this quiets a static analysis warning with clang 11:
vpx_dsp/avg.c:353:15: warning: Assigned value is garbage or undefined
[core.uninitialized.Assign]
hbuf[idx] /= norm_factor;
^ ~~~~~~~~~~~
the same fix was applied in libaom:
1ad0889bc aom_int_pro_row_c: add an assert for height
Bug: b/229626362
Change-Id: Ic8a249f866b33b02ec9f378581e51ac104d97169
this avoids a warning with certain versions of gcc; observed with:
mipsisa32r6el-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
Change-Id: I8999f487a79a9d53133816d572054b2423330bcf
This reverts commit 9f1329f8ac
and fixes a dumb mistake in evaluation of vfcmv. Used vdupq_n_s16,
instead of vdupq_n_s32.
Change-Id: Ie236c878c166405c49bc0f93f6d63a6715534a0a
Added datarate unittest for 4:4:4 and 4:2:2 input,
for spatial and temporal layers.
Fix is needed in vp9_set_literal_size():
the sampling_x/y should be passed into update_inital_width(),
othewise sampling_x/y = 1/1 (4:2:0) was forced.
vp9_set_literal_size() is only called by the svc and
on dynamic resize.
Fix issue with the normative optimized scaler:
UV width/height was assumed to be 1/2 of Y, for
the ssse and neon code.
Also fix to assert for the scaled width/height:
in case scaled width/height is odd it should be
incremented by 1 (make it even).
Change-Id: I3a2e40effa53c505f44ef05aaa3132e1b7f57dd5
Release v1.12.0 Torrent Duck
2022-06-17 v1.12.0 "Torrent Duck"
This release adds optimizations for Loongarch, adds support for vp8 in the
real-time rate control library, upgrades GoogleTest to v1.11.0, updates
libwebm to libwebm-1.0.0.28-20-g206d268, and includes numerous bug fixes.
- Upgrading:
This release is ABI compatible with the previous release.
vp8 support in the real-time rate control library.
New codec control VP8E_SET_RTC_EXTERNAL_RATECTRL is added.
Configure support for darwin21 is added.
GoogleTest is upgraded to v1.11.0.
libwebm is updated to libwebm-1.0.0.28-20-g206d268.
- Enhancement:
Numerous improvements on checking memory allocations.
Optimizations for Loongarch.
Code clean-up.
- Bug fixes:
Fix to a crash related to {vp8/vp9}_set_roi_map.
Fix to compiling failure with -Wformat-nonliteral.
Fix to integer overflow with vp9 with high resolution content.
Fix to AddNoiseTest failure with ARMv7.
Fix to libvpx Null-dereference READ in vp8.
Change-Id: I6964e96bccf016f977cc6e83dc0a192d66a19618
min/max_gf_interval is fixed and can be passed from the command line.
It must satisfy the level constraints.
active_min/max_gf_interval might be changing based on
min/max_gf_interval. It is determined per GOP.
Change-Id: If456c691c97a8b4c946859c05cedd39ca7defa9c
replace the check on use_nonrd_pick_mode with an assert. this is only a
start, there are many branches that could be removed that check mode ==
REALTIME, etc. with this configuration.
Bug: webm:1773
Change-Id: I38cf9f83e7c085eb8e87d5cf6db7dc75359b611b
(cherry picked from commit 08b86d7622)
this avoids a crash if cpu-used is not explicitly set as there are some
(unnecessary) checks against use_nonrd_pick_mode which would cause
encoding to be skipped if the old default of 0 were used
Bug: webm:1773
Change-Id: I62fba5fb51d8afa422689b7de3f03e8f7570e50b
Fixed: webm:1773
(cherry picked from commit 95d196fdf4)
A stale codec control was removed, but compatibility was restored.
New codec control was added.
Bump *current* and *age*, and keep *revision* as 0.
Bug: webm:1752
Bug: webm:1757
Change-Id: I76179f129a10c06d897b5c62462808ed9b9c2923
replace the check on use_nonrd_pick_mode with an assert. this is only a
start, there are many branches that could be removed that check mode ==
REALTIME, etc. with this configuration.
Bug: webm:1773
Change-Id: I38cf9f83e7c085eb8e87d5cf6db7dc75359b611b
this avoids a crash if cpu-used is not explicitly set as there are some
(unnecessary) checks against use_nonrd_pick_mode which would cause
encoding to be skipped if the old default of 0 were used
Bug: webm:1773
Change-Id: I62fba5fb51d8afa422689b7de3f03e8f7570e50b
Fixed: webm:1773
This CL breaks the backward compatibility:
1365e7e1a vp9-svc: Remove VP9E_SET_TEMPORAL_LAYERING_MODE
Forcing the value of the next element
Bug: webm:1752
Change-Id: I83c774b3aa6cca25f2f14995590fb20c0a1668d4
(cherry picked from commit 013ec5722c)
This CL breaks the backward compatibility:
1365e7e1a vp9-svc: Remove VP9E_SET_TEMPORAL_LAYERING_MODE
Forcing the value of the next element
Bug: webm:1752
Change-Id: I83c774b3aa6cca25f2f14995590fb20c0a1668d4
Convert the data member EncoderTest::last_pts_ to a local variable in
the EncoderTest::RunLoop() and VP9FrameSizeTestsLarge::RunLoop()
methods. EncoderTest::last_pts_ is only used in these two methods, and
these two methods first set EncoderTest::last_pts_ to 0 before using it.
So EncoderTest::last_pts_ is effectively a local variable in these two
methods.
Note that several subclasses of EncoderTest declare their own last_pts_
data member and use it to calculate the data rate. Apparently their own
last_pts_ data member hides the same-named data member in the base
class. Although this is allowed by C++, this is very confusing.
Change-Id: I55ce1cf8cc62e07333d8a902d65b46343a3d5881
If the external model recommends an invalid q value, we use the
default q selected by libvpx's rate control strategy.
We update the test so that when the external model wants to control
GOP decision, it could get per frame information and just recommend
an invalid q.
Change-Id: I69be4b0ee0800e7ab0706d305242bb87f001b1f7
'gop_index' has already been used in vpx_rc_encodeframe_info_t,
which represents the frame index inside the current
group of picture (gop).
We therefore use 'gop_global_index' to represent the index of
the current gop to avoid duplicate names.
Change-Id: I3eb8987dd878f650649b013e0036e23d0846b5f0
This change let the encoder send first pass stats before gop
decisioins so that external models could make use of it.
Change-Id: Iafc7eddab93aa77ceaf8e1f7663a52b27d94af80
The bit mask allows us to easily add an additional control mode
which both the QP and GOP are controlled by an external model.
Change-Id: I49f676f622a6e70feb2a39dc97a4e5050b7f4760
vp9_change_config may call functions that perform allocations which
expect failures detected by CHECK_MEM_ERROR to not return.
Change-Id: I1dd1eca9c661ed157d51b4a6a77fc9f88236d794
(cherry picked from commit 3997d9bc62)
vp8_change_config may call vp8_alloc_compressor_data which expects
failures detected by CHECK_MEM_ERROR to not return.
Change-Id: Ib7fbf4af904bd9b539402bb61c8f87855eef2ad6
(cherry picked from commit 365eebc147)
this fixes a regression in make 4.2 and still present in 4.3 causing
double colon rules to be serialized which breaks sharding done by the
test and test-no-data-check rules. these targets only define one set of
rules so ordinary rules work unlike clean. install may be another
candidate, but that's left for a follow up.
Change-Id: I9f074eca2ad266eeca6e31aae2e9f31eec8680e0
Tested: make 3.81, 4.1, 4.2, 4.2.1, 4.3
- Return error instead of OK when GOP model is not set.
- Update descriptions for a few variables.
Change-Id: I213f6b7085c487507c3935e7ce615e807f4474cc
the issues fixed in this change are related to implicit conversions
between int / unsigned int:
vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion
from type 'int' of value -9 (32-bit, signed) to type 'unsigned int'
changed the value to 4294967287 (32-bit, unsigned)
vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4290576316 (32-bit, unsigned) to type 'int' changed the value to
-4390980 (32-bit, signed)
vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from
type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka
'unsigned short') changed the value to 64512 (16-bit, unsigned)
unsigned is forced for masks and int is used with intel intrinsics
Bug: webm:1767
Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee
Fixed: webm:1767
vp9_change_config may call functions that perform allocations which
expect failures detected by CHECK_MEM_ERROR to not return.
Change-Id: I1dd1eca9c661ed157d51b4a6a77fc9f88236d794
vp8_change_config may call vp8_alloc_compressor_data which expects
failures detected by CHECK_MEM_ERROR to not return.
Change-Id: Ib7fbf4af904bd9b539402bb61c8f87855eef2ad6
this should match VP8 and use ONE_PASS_TEST_MODES, but currently the
code will produce integer sanitizer warnings and may segfault under
certain conditions
Bug: webm:1767,webm:1768
Change-Id: I6482ff1862f19716fde3d57522591bc61d76a84f
DISABLED_TestExternalResizeSmallerWidthBiggerSize was added for
webm:1642, but never fixed
Bug: webm:1642
Change-Id: I0fa368a44dda550241ea997068c58eaff551233c
- COLS_IN_ALPHA_INDEX
this was unused given ALPHABETICAL_INDEX = NO
- PERL_PATH / MSCGEN_PATH
these were unused
quiets warnings with doxygen 1.9.1:
warning: Tag 'COLS_IN_ALPHA_INDEX' at line 1110 of file 'doxyfile' has
become obsolete.
warning: Tag 'PERL_PATH' at line 1105 of file 'doxyfile' has become
obsolete.
warning: Tag 'MSCGEN_PATH' at line 1126 of file 'doxyfile' has become
obsolete
Change-Id: I6229311afaa3318a3f9bcaf40fafcc5ea71ae271
Add a helper function to call the external rate control model.
The helper function is placed in the function where vp9 determines
GOP decisions.
The helper function passes frame information, including current
frame show index, coding index, etc to the external rate control
model, and then receives GOP decisions.
The received GOP decisions overwrites the default GOP decision, only
when the external rate control model is set to be active via
the codec control.
The decision should satisfy a few constraints, for example, larger
than min_gf_interval; smaller than max_gf_interval. Otherwise,
return error.
Unit tests are added to test the new functionality.
Change-Id: Id129b4e1a91c844ee5c356a7801c862b1130a3d8
This reverts commit 258affdeab.
Reason for revert:
Not bitexact with C version
Original change's description:
> [NEON] Optimize vp9_diamond_search_sad() for NEON
>
> About 50% improvement in comparison to the C function.
> I have followed the AVX version with some simplifications.
>
> Change-Id: I72ddbdb2fbc5ed8a7f0210703fe05523a37db1c9
Change-Id: I5c210b3dfe1f6dec525da857dd8c83946be566fc
rather than tmpfile(). this allows for setting the path with TEST_TMPDIR
and provides a valid default for android.
Change-Id: Iecb26f381b6a6ec97da62cfa0b7200f427440a2f
Simplify architecture support code and remove redundant code
to improve efficiency.
Bug: webm:1755
Change-Id: I03bc251aca115b0379fe19907abd165e0876355b
only lint-hunks.py is tested as part of the presubmit; the rest may
need further changes as they're used.
Bug: b/229626362
Change-Id: I2fd6e96deab8d892d34527e484ea65e3df86d162
Some macros have been changed to "#define do {...} While (0)",
change the rest to "static INLINE ..."
Bug: webm:1755
Change-Id: I445ac0c543f12df38f086b479394b111058367d0
If aq_mode=0 the segmentation feature may still be used
for active_maps, so the condition active_maps.enabled
needs to be added in two places regarding segmentation
logic in encodeframe.c. Otherwise the active_maps would
have no effect.
This also resolves why the assert in bug webm:1762 was
not triggered when aq_mode=0.
Change-Id: Ibd68e9b5c3f81728241a168d3fb3567d6845633d
For segment skip feature: allow for setting the
mi->interp_filter to BILINEAR, if cm->interp_filter
is set BILIENAR. This can happen at speed 9 when the
segment skip feature is used (e.g., active_maps)
Without this fix the assert can be triggered with the
active_map_test.cc for speed 9 included.
Updated the test.
Fixes the assert triggered in the issue:
Bug: webm:1762
Change-Id: I462e0bdd966e4f3cb5b7bc746685916ac8808358
About 50% improvement in comparison to the C function.
I have followed the AVX version with some simplifications.
Change-Id: I72ddbdb2fbc5ed8a7f0210703fe05523a37db1c9
previously vp9_bitstream_worker_data was checked after it was memset();
this change uses CHECK_MEM_ERROR for consistency to ensure the pointer
is checked first
Change-Id: I532d0eb0e746dc6b8d694b616eba693c5c0053ac
around ASM_REGISTER_STATE_CHECK() this helps keep the call ordering
consistent avoiding some code reordering which may affect the registers
being checked
fixes issue with armv7 and multiple versions of gcc:
[ RUN ] C/AddNoiseTest.CheckNoiseAdded/0
test/register_state_check.h:116: Failure
Expected equality of these values:
pre_store_[i]
Which is: 0
post_store[i]
Which is: 4618441417868443648
Bug: webm:1760
Change-Id: Ib8bcefd2c4d263f9fc4d4b4d4ffb853fe89d1152
Fixed: webm:1760
this avoids a desynchronization of mb_rows if an allocation prior to
vp8mt_alloc_temp_buffers() fails and the decoder is then destroyed
Bug: webm:1759
Change-Id: I75457ef9ceb24c8a8fd213c3690e7c1cf0ec425f
when no frames were decoded, for example due to a decoder initialization
failure, an orphan buffer pointer from webm_guess_framerate() via
webm_read_frame() would have been freed during cleanup
Change-Id: I6ea3defdd13dd75427f79c516e207b682391e4fa
previously the returns for alloc_context_buffers_ext() and
vp9_alloc_context_buffers() were ignored which would result in a NULL
access during encoding should they fail
Change-Id: Icd76576f3d5f8d57697adc9ae926a3a5be731327
avoid setting num_internal_frame_buffers until the allocation is
checked, avoiding an invalid access in vp9_free_internal_frame_buffers()
Change-Id: I28a544a2553d62a6b5cb7c45bf10591caa4ebab6
in vp9_free_ref_frame_buffers() and vp9_free_context_buffers(); pool and
free_mi may be NULL due to earlier allocation failures
Change-Id: I3bd26ea29b3aea6c58f33d5b7f5a280eb6250ec7
The release tag is release-1.11.0.
Ref: https://aomedia-review.googlesource.com/c/aom/+/156641
79c98a122 Upgrade GoogleTest to v1.11.0
Note the tree structure differs from libaom, but is left untouched to
avoid breaking test include paths in this commit.
Change-Id: Ia3c6861d45a3befc2decb1da5b1018bcfd38f95a
this matches the call with int_mv::as_int and fixes a warning with
clang-13 -fsanitize=integer:
vp8/decoder/decodemv.c:240:32: runtime error: implicit conversion from
type 'uint32_t' (aka 'unsigned int') of value 4282515456 (32-bit,
unsigned) to type 'int' changed the value to -12451840 (32-bit, signed)
Bug: webm:1759
Change-Id: I7c0aa72baa45421929afac26566e149adc6669d7
fixes some warnings with clang-13 -fsanitize=integer:
vp8/decoder/threading.c:77:27: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
these bitmask constants were missed in:
1676cddaa vp8: fix some implicit signed -> unsigned conv warnings
Bug: webm:1759
Change-Id: I5d894d08fd41e32b91b56a4d91276837b3415ee4
this clears warnings under clang-13 of the form:
../vp9/encoder/x86/temporal_filter_sse4.c:275:39: warning: parameter
'u_pre' set but not used [-Wunused-but-set-parameter]
Change-Id: I21519b5b0b9c21b04b174327415e0e73b56bdfda
this quiets warnings under clang-13 of the form:
../vp9/encoder/vp9_mbgraph.c:222:42: warning: variable 'gld_y_offset'
set but not used [-Wunused-but-set-variable]
Change-Id: I32170b90c07058f780b4e8100ee5217232149db8
this clears a warning under clang-13:
vp8/encoder/firstpass.c:1634:10: warning: variable
'mod_err_per_mb_accumulator' set but not used
[-Wunused-but-set-variable]
Change-Id: I694a99d56724be89090e01c45559237c0fda147a
this can occur if 0 frames are encoded, e.g., due to --skip
see also: https://crbug.com/aomedia/3243
Change-Id: I791d5ad6611dbcb60d790e6b705298328ec48126
This reverts commit 2200039d33.
This causes failures with VP9/EndToEndTestLarge.EndtoEndPSNRTest/*; it
seems the assembly does not match the C code.
Bug: webm:1586
Change-Id: I4c63beebf88d4c12789d681b0d38014510b147fe
This reverts commit 89cfe3835c.
This is a prerequisite for reverting
2200039d33 which causes high bitdepth test
failures
Bug: webm:1586
Change-Id: I28f3b98f3339f3573b1492b88bf733dade133fc0
The only difference between the code is the clamp. For
8 bit it is purely an optimization. The values outside
this range will still saturate.
Change-Id: I2a770b140690d99e151b00957789bd72f7a11e13
The optimized quantize functions were already built to handle
highbd values. The only difference is the clamping. All highbd
functions expand to 32bits when running in highbd mode.
Removes vpx_highbd_quantize_32x32_sse2 as it is slower than the
C version in the worst case.
Bug: webm:1586
Change-Id: I49bf8a6a2041f78450bf43a4f655c67656b0f8d9
Level conformance is standadized in vp9.
If a specific target level is set, the vp9 encoder is required to
produce conformant bitstream with limit on frame size, rate,
min alt-ref distance, etc.
This change makes the SimpleEncode environment take the target level
as an input.
To make existing tests pass, we set the level to 0.
Change-Id: Ia35224f75c2fe50338b5b86a50c84355f5daf6fd
Whether a block is skipped is handled by mi->skip. x->skip_block
is kept exclusively to verify that the quantize functions are not
called for skip blocks.
Finishes the cleanup in 13eed991f
Bug: libvpx:1612
Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
These would compute the sum of absolute differences (sad) for a
group of 3 or 8 references. This was used as part of an exhaustive
search.
vp8 only uses these functions in speed 0 and best quality.
For vp9 this is only used with the --enable-non-greedy-mv
experiment.
This removes the 3- and 8-at-a-time optimized functions and uses
the fall back code which will process 1 or 4 (vpx_sadMxNx4d) at
a time.
For configure --target=x86_64-linux-gcc --enable-realtime-only:
libvpx.a
before: 3002424 after: 2937622 delta: 64802
after 'strip libvpx.a'
before: 2116998 after: 2073090 delta: 43908
Change-Id: I566d06e027c327b3bede68649dd551bba81a848e
Clean up a new build warning with gcc11:
argument 3 of type ‘const uint8_t * const[]’ with
mismatched bound [-Warray-parameter=]
Standardize sad functions with array sizes.
Change-Id: Iea4144e61368f6a8279e2f3ae96c78aff06c8b41
The distance between PROC and END is used to generate .size
information for debugging. When the leading underscore was
removed the pattern used to match the function name broke.
Change-Id: I90bf67d95ecdc2d214606e663773f88d2a2d6b9c
[NEON]
Added vpx_fdct4x4_pass1_neon(),
Added vpx_fdct8x8_pass1_notranspose_neon(),
Added vpx_fdct8x8_pass1_neon() to avoid code duplication
Refactored vpx_fdct4x4_neon() and vpx_dct8x8_neon() to use the above
Rename dct_body to vpx_fdct16x16_body to reuse later
Add transpose_s16_16x16()
I have run make test and all tests/configurations seem to pass.
Profiled using this command on an Ampere Altra VM:
sudo perf record -g ./vpxenc --codec=vp9 --height=1080 --width=1920 \
--fps=25/1 --limit=20 -o output.mkv \
../original_videos_Sports_1080P_Sports_1080P-0063.mkv --debug –rt
Before this optimization:
1.32% 1.32% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.16% 0.16% vpxenc vpxenc [.] vpx_fdct4x4_c
0.79% 0.79% vpxenc vpxenc [.] vpx_fdct8x8_c
0.52% 0.52% vpxenc vpxenc [.] vpx_fdct8x8_neon
1.23% 1.23% vpxenc vpxenc [.] vpx_fdct16x16_c
0.54% 0.54% vpxenc vpxenc [.] vpx_fdct16x16_neon
So, even though a _neon() version exists, the C version was called \
as well. After this patch:
1.42% 1.36% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.87% 0.82% vpxenc vpxenc [.] vpx_fdct8x8_neon
0.74% 0.74% vpxenc vpxenc [.] vpx_fdct16x16_neon
Change-Id: Id4e1dd315c67b4355fe4e5a1b59e181a349f16d0
The gcc assembler was incompatible for a long
time. It is now based on clang and accepts
more modern syntax, although not enough to
remove the script entirely.
Change-Id: I667d29dca005ea02a995c1025c45eb844081f64b
Many of the features in ads2gas are no longer used.
Remove all patterns which are no longer used in
libvpx.
Simplify between the two to minimize differences.
Change-Id: Ia1151eb8b694cbe51845a1374a876cc7b798899c
The control was never implemented, no need to keep this.
temporal_layering_mode is set in the config.
Bug: webm:1753
Change-Id: I9a6eb50e82344605ab62775911783af82ac2d401
Allow intra-only frame in svc to also work
in bypass (flexible-svc) mode.
Added unittest for the flexible svc case.
And fix the gld_fb_idx for (SL0, TL1) in bypass/flexible
mode pattern in the sample encoder: force it to be 0
(same as lst_fb_idx), since the slot is unused on SL0.
Change-Id: Iada9d1b052e470a0d5d25220809ad0c87cd46268
Fix some issues with the test, and add new
test that verifies that we can decode base stream
startinig at middle of sequence where intra-only
frame is inserted.
Change-Id: I398d23927113eb58ef64694feca25e60ce60a5f7
RTC sample encoder vpx_temporal_svc_encoder can take mask files as input
when ROI_MAP is set to 1.
Uses ROI and segmentation of vp9 to skip background encoding when
source_sad is low and the correspond block in previous frame is also
skipped.
Change-Id: I8590e6f9a88cecfa1d7f375d4cc480f0f2af87b6
Original change's description:
> Add vp9 ref frame to flag map function
>
> Change-Id: I371c2346b9e0153c0f8053cab399ce14cd286c56
Change-Id: I04a407ee0ef66c01a0d224b4468e043213f8791f
under Visual Studio:
Warning C4244 '=': conversion from 'int64_t' to 'vpx_prob', possible loss of
data
after:
ea042a676 vp9 encoder: fix integer overflows
'newp' has already been range checked earlier in the loop so the cast won't
have any unexpected results
Change-Id: Ic10877db2c0633d53fffdf8852d5095403c23a02
this is a followup to:
7fbcee49d quiet -Warray-parameter warnings
and conforms to aom in:
06e13e817 quiet -Warray-parameter warnings
the sad functions are more varied in libvpx and will require a separate
pass
Change-Id: I765fd6704df615e836ba0b184ff8266ce926c394
If a reference frame is not referenced, then set the index for that
reference to the first one used/referenced instead of unused slot.
Unused slot means key frame, as key frame resets all slots with itself.
This CL extracts `get_first_ref_frame()` from `reset_fb_idx_unused()`
with a typo fixing, and sets all unused reference frames to first ref in
vp9 uncompressed header.
Bug: webrtc:13442
Change-Id: I99523bc2ceedf27efe376d1113851ff342982181
this removes the burden from callers; the rtcd functions are left with a
mostly redundant (outside of tests) once() as top-level functions should
ensure their constraints are met
Change-Id: I5bdbcfa4671c6a1492cfe9c7d886c361c26caaa9
w/gcc-11
v_these_mv_w is always initialized in this block with _mm_add_epi16();
converting this to a _mm_storeu_si32(tmp) call also works, but
introduces more stack usage
|| ../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function
‘vp9_diamond_search_sad_avx’:
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning:
‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
|| 285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
|| | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note:
‘v_these_mv_w’ declared here
|| 149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
|| | ^~~~~~~~~~~~
Change-Id: I1cd2fcb41030db16f51c94f3a70eb8eb2a526401
w/gcc-11
as noted in
the size of interp_filter_selected[][]'s first dimension varies between
VP9_COMP and VP9BitstreamWorkerData as noted in the latter's definition:
// The size of interp_filter_selected in VP9_COMP is actually
// MAX_REFERENCE_FRAMES x SWITCHABLE. But when encoding tiles, all we ever do
// is increment the very first index (index 0) for the first dimension. Hence
// this is sufficient.
int interp_filter_selected[1][SWITCHABLE];
normalize the function signatures of write_modes*(), etc. to take this
into account.
vp9/encoder/vp9_bitstream.c|948 col 3| warning: ‘write_modes’ accessing
64 bytes in a region of size 16 [-Wstringop-overflow=]
|| 948 | write_modes(cpi, xd, &cpi->tile_data[data->tile_idx].tile_info,
|| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 949 | &data->bit_writer, tile_row, data->tile_idx,
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 950 | &data->max_mv_magnitude, data->interp_filter_selected);
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/vp9_bitstream.c|948 col 3| note: referencing argument 8 of
type ‘int (*)[4]’
vp9/encoder/vp9_bitstream.c|488 col 13| note: in a call to function
‘write_modes’
Change-Id: I0898cd7c3431633c382a0c3a1be2f0a0bea8d0f9
the row-based loop filter is ok (and being used) in this case; since
it's serialized the previous row will always be done
Change-Id: I024a0c78e7488178956cc22a4c4680a00dc6eade
previously row-mt would allocate thread data once, so increasing the
number of threads with a config change would cause a heap overflow.
Bug: chromium:1261415
Bug: chromium:1270689
Change-Id: I3c5ec8444ae91964fa34a19dd780bd2cbb0368bf
cap bitrate to 1000Mbps, change bitsaving budget to int64_t
this make test coverage for 2048x2048 - same as for vp8
Bug: webm:1749
Fixed: webm:1749
Change-Id: Ic58d73cb7529b0826d1f501ad09af8e80f706a6e
Gives 10% faster VP8 encoding in simple tests.
This patch requires testing on wider datasets and encoder
settings to see if this speedup is achieved on most data.
Change-Id: If8e04819623e78fff126c413db66c964c0b4c11a
and rename the table to kCodecIfaces[] to be a little more specific and
avoid shadowing kCodecs[] in SetRoi()
Change-Id: I64905f48d8bf76e812bdba8374b82e3f7654686f
Remove -mmacosx-version-min. The library does not use
any calls which are affected by the platform version.
There is also no version 10.16 as it went from 10.15
to 11 and now to 12.
At some point it may be good to clarify that the bare
-darwin- target is for iOS and the -darwinN- targets
are for macOS.
Change-Id: I2fd5f7cae2637905acf3ab77bfddfbe367abbb68
Fix errors reported by UBSan diagnostics:
1. /vp9/encoder/vp9_pickmode.c:308:29: unsigned integer overflow:
99 - 100 cannot be represented in type 'unsigned int'
2. /vp9/encoder/vp9_pickmode.c:330:27: unsigned integer overflow:
21976 - 21978 cannot be represented in type 'unsigned int'
3. /vp9/encoder/vp9_pickmode.c:468:13: unsigned integer overflow:
18852144 - 18852149 cannot be represented in type 'unsigned int'
(Notice that line numbers might vary a bit because fixes have
been applied incrementally i.e. fix for error #1 affects line
number reported in #2)
Fix by calculating difference instead of wrapping around to
a value near maximum.
Test: Cuttlefish webrtc with VP9 codec
Change-Id: I4f85712028647e915a4e2da31e4b0a266e9e2705
Fix UBSan error reported from aosp Cuttlefish device:
/vp9/encoder/vp9_ratectrl.c:238:33: unsigned integer overflow:
2500000 * 1800 cannot be represented in type 'unsigned int'
...by casting the operand and the result of multiplication
to 64bit integer.
Test: vp9 webrtc streaming with Cuttlefish
Change-Id: Id5bb3d4071a96179caffae0829d3cc4e48c7614b
cap the bitrate to 1000Mbps to avoid many instances of bitrate * 3 / 2
overflowing.
this adds coverage for 2048x2048 in the default test for VP8 with TODOs
for issues at that resolution for VP9 and at max resolution for both.
Bug: b/189602769
Bug: chromium:1264506
Bug: webm:1748
Bug: webm:1749
Bug: webm:1750
Bug: webm:1751
Change-Id: Iedee4dd8d3609c2504271f94d22433dfcd828429
this changes the return to int32_t which matches the type with usage
of this call as input to _mm_cvtsi32_si128(), _mm_set_epi32(), etc.
fixes implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1425f12d4f79155dd5d7af0eb00fbdb9f1940544
this changes the parameter to int32_t which matches the type with usage
of this call using _mm_cvtsi128_si32() as a parameter. quiets an
implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1e9e9ffac5d2996962d29611458311221eca8ea0
with -fsanitize=undefined
test/video_source.h:194:33: runtime error: implicit conversion from type
'int' of value -32 (32-bit, signed) to type 'unsigned int' changed the
value to 4294967264 (32-bit, unsigned)
Change-Id: I92013086d517fecf01c9e4cdfe6737b8ce733a1f
this is similar to the fix for calc_iframe_target_size:
5f345a924 Avoid overflow in calc_iframe_target_size
Bug: chromium:1264506
Change-Id: I2f0e161cf9da59ca0724692d581f1594c8098ebb
the intermediate value in the correction_factor calculation may exceed
integer bounds
Bug: b/189602769
Change-Id: I75726b12f3095663911d78333f3ea26eb6dee21e
and use it to set the format attribute for printf like functions. this
allows the examples to be built with -Wformat-nonliteral without
producing warnings.
Bug: webm:1744
Change-Id: I26b4c41c9a42790053b1ae0e4a678af8f2cd1d82
Fixed: webm:1744
and use it to set the format attribute for the printf like function
vpx_internal_error(). this allows the main library to be built with
-Wformat-nonliteral without producing warnings; the examples will be
handled in a followup.
Bug: webm:1744
Change-Id: Iebc322e24db35d902c5a2b1ed767d2e10e9c91b9
previously ranges were checked with abs() whose behavior is undefined
with INT_MIN. this fixes a crash when the original value is returned and
it later used as and offset into a table.
Bug: webm:1742
Change-Id: I345970b75c46699587a4fbc4a059e59277f4c2c8
previously ranges were checked with abs() whose behavior is undefined
with INT_MIN. this fixes a crash when the original value is returned and
it later used as and offset into a table.
Bug: webm:1742
Change-Id: I345970b75c46699587a4fbc4a059e59277f4c2c8
This allows user to make sure frame will be encoded
when drop_frames is set off (on the fly), no matter
the state of the buffer.
Change-Id: Ia7b39b93fe3721dd586bdbede72c525db87b6890
Condition already existed for screen content mode,
but only when frame-dropper was off. Remove the
frame drop condition.
Change-Id: Ie7357041f5ca05b01e78b4bd3b40da060382591b
Define VPX_NO_RETURN as __declspec(noreturn) for MSVC. See
https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=msvc-160
This requires moving VPX_NO_RETURN before function declarations because
__declspec(noreturn) must be placed there. Fortunately GCC's
__attribute__((noreturn)) can be placed either before or after function
declarations.
Change-Id: Id9bb0077e2a4f16ec2ca9c913dd93673a0e385cf
(cherry picked from commit 8a6fbc0b4e)
Define VPX_NO_RETURN as __declspec(noreturn) for MSVC. See
https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=msvc-160
This requires moving VPX_NO_RETURN before function declarations because
__declspec(noreturn) must be placed there. Fortunately GCC's
__attribute__((noreturn)) can be placed either before or after function
declarations.
Change-Id: Id9bb0077e2a4f16ec2ca9c913dd93673a0e385cf
For 1 layer CBR only.
Support for temporal layers comes later.
Rename the library to libvpxrc
Bug: b/188853141
Change-Id: Ib7f977b64c05b1a0596870cb7f8e6768cb483850
Also use round to cast float to int with more accurate calculation to
avoid error accumulation which causes qp to be different after ~290
frames.
Change-Id: Iff65a8fdc67401814fd253dbf148afe9887df97f
This feature was added to help speed up still images and slideshows.
It didn't work anymore, and thus was disabled. Code cleanup will
follow.
This had negligible impact to regular test sets. Borg test result
on ugc360p set at speed 3.
avg_psnr: ovr_psnr: ssim: speed:
-0.244 -0.278 -0.153 -0.973
Change-Id: If74edabce0c93be1361e645ffd2eec063c2db76b
This will do 3 things:
Turn off low motion computation
Turn off gf update constrain on key frame frequency
turn off content mode for cyclic refresh
Those are used to verify the external ratectrl lib works as expected.
Change-Id: Ic6e61498de82d6b3973e58df246cf5e05f838680
Check for x + w and y + h overflows in vpx_img_set_rect().
Move the declaration of the local variable 'data' to the block it is
used in.
Change-Id: I6bda875e1853c03135ec6ce29015bcc78bb8b7ba
target_bandwidth is int64_t, but layer_target_bitrate[0] is an int. this
is safe in the only place it's set because target_bandwidth defaults to
1000. target_bandwidth is later used to populate the cpi's target, which
is an unsigned int so there may be further fixes/cleanups that can be
done.
Change-Id: I35dbaa2e55a0fca22e0e2680dcac9ea4c6b2815a
The changed product was observed to attempt to multiply 1800 by 2500000,
which overflows unsigned 32 bits. Converting to unsigned 64 bits first
and testing whether the final result fits in 32 bits solves the problem.
BUG=b:179686142
Change-Id: I5d27317bf14b0311b739144c451d8e172db01945
The encoder has a feature to skip transform and quantization based
on model rd analysis. It could happen that the model
based analysis lets the encoder skips transform and quantization, while
a bad prediction occurs, leading to bad reconstructed blocks, which
are intrusive and apparently coding errors.
We add a speed feature to guard the skipping feature.
Due to the risk of bad perceptual quality, we disallow such skipping
by default.
On hdres test set, speed 2, the coding performance difference is 0.025%,
speed difference is 1.2%, which can be considered non significant.
BUG=webm:1729
Change-Id: I48af01ae8dcc7a76c05c695f3f3e68b866c89574
For usage in the external RC. When content_mode = 0,
the cyclic refresh has no dependency on the content
(motion, spatial variance, motion vectors, etc,).
The content_mode = 0, when compared to content_mode = 1,
on rtc set for speed 7: has some regression on some
clips (~3-5%), but overall/average bdrate loss is
about ~1-2%.
Comparing aq_mode=3 with content_mode = 0, vs aq_mode=3:
about ~14% avg/overall bdrate gain, but has ~3-7% regression
on some hard motion clip (e.g.m street).
Change-Id: I93117fabb8f7f89032c15baf1292b201e8c07362
Added a new flag in rate control which turns off gf interval constrain
on key frame frequency for external RC.
It remains on for libvpx.
Change-Id: I18bb0d8247a421193f023619f906d0362b873b31
for rc_interface_test_one_layer_vbr and
rc_interface_test_one_layer_vbr_periodic_key added in:
1f45e7b07 vp9 rc: add vbr to rtc rate control library
Change-Id: I8bfa3698284c8ff289e830f7b8fa1ca42b752563
fixes warnings under visual studio:
vp9\encoder\vp9_ratectrl.c(2012): warning C4028: formal parameter 1
different from declaration
vp9\encoder\vp9_ratectrl.c(2027): warning C4028: formal parameter 1
different from declaration
Change-Id: Ia0740db597fb7a259f90d362b483f58662f9f584
This reduces some regression when external RC
is used, for which avg_frame_low_motion is not
set/updated (=0).
Change-Id: I2408e62bd97592e892cefa0f183357c641aa5eea
this allows the file to be located in LIBVPX_TEST_DATA_PATH similar to
other test sources.
Bug: webm:1731
Change-Id: I51606635d91871e7c179aa8d20d4841b0d60b6ad
Two pass rc parameters are only initialized in the second pass
in vp9 normal two pass encoding.
However, the simple_encode API queries the keyframe group, arf group,
and number of coding frames without going throught the two pass
route.
Since recent libvpx rc changes, parameters in the TWO_PASS
struct have a great influence on the determination of the above
information.
We therefore need to properly init two pass rc parameters in
the simple_encode related environment.
Change-Id: Ie14b86d6e7ebf171b638d2da24a7fdcf5a15c3d9
* changes:
Use 'ptrdiff_t' instead of 'int' for pointer offset parameters
Implement vpx_convolve8_avg_vert_neon using SDOT instruction
Merge transpose and permute in Neon SDOT vertical convolution
A number of the load/store functions in mem_neon.h use type 'int' for
the 'stride' pointer offset parameter. This causes Clang to generate
the following warning every time these functions are called with a
wider type passed in for 'stride':
warning: implicit conversion loses integer precision: 'ptrdiff_t'
(aka 'long') to 'int' [-Wshorten-64-to-32]
This patch changes all such instances of 'int' to 'ptrdiff_t'.
Bug: b/181236880
Change-Id: I2e86b005219e1fbb54f7cf2465e918b7c077f7ee
Add an alternative AArch64 implementation of
vpx_convolve8_avg_vert_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_vert_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: I971c626116155e1384bff4c76fd3420312c7a15b
The original dot-product implementation of vpx_convolve8_vert_neon
used a separate transpose before and after the convolution operation.
This patch merges the first transpose with the TBL permute (necessary
before using SDOT to compute the convolution) to significantly reduce
the amount of data re-arrangement. This new approach also allows for
more effective data re-use between loop iterations.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b
Add an alternative AArch64 implementation of
vpx_convolve8_avg_horiz_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_horiz_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Ib435107c47c485f325248da87ba5618d68b0c8ed
Implement sum of squared difference calculations in vpx_mse16x16_neon
and vpx_get4x4sse_cs_neon using the ABD and UDOT instructions -
instead of widening subtracts followed by a sequence of MLAs.
The existing implementation is retained for use on CPUs that do not
implement the Armv8.4-A UDOT instruction. This commit also updates
the variable names used in the existing implementations to be more
descriptive.
Bug: b/181236880
Change-Id: Id4ad8ea7c808af1ac9bb5f1b63327ab487e4b1c7
Add an alternative AArch64 implementation of vpx_convolve8_vert_neon
for targets that implement the Armv8.4-A SDOT (signed dot product)
instruction.
The existing MLA-based implementation of vpx_convolve8_vert_neon is
retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Iebb8c77aba1d45b553b5112f3d87071fef3076f0
Accelerate Neon variance functions by implementing the sum of squares
calculation using the Armv8.4-A UDOT instruction instead of 4 MLAs.
The previous implementation is retained for use on CPUs that do not
implement the Armv8.4-A dot product instructions.
Bug: b/181236880
Change-Id: I9ab3d52634278b9b6f0011f39390a1195210bc75
Implementing sad16_neon using ABD, UDOT instead of ABAL, ABAL2 saves
a cycle and removes resource contention for a single SIMD pipe on
modern out-of-order Arm CPUs. The UDOT accumulation into 32-bit
elements also allows for a faster reduction at the end of each SAD
function.
The existing implementation is retained for CPUs that do not
implement the Armv8.4-A UDOT instruction, and CPUs executing in
AArch32 mode.
Bug: b/181236880
Change-Id: Ibd0da46e86751d2f808c7b1e424f82b046a1aa6f
Use the AArch64-only ADDV and ADDLV instructions to accelerate
reductions that add across a Neon vector in sum_neon.h. This commit
also refactors the inline functions to return a scalar instead of a
vector - allowing for optimization of the surrounding code at each
call site.
Bug: b/181236880
Change-Id: Ieed2a2dd3c74f8a52957bf404141ffc044bd5d79
quiets an integer sanitizer warning:
vpx/src/vpx_image.c:101:25: runtime error: implicit conversion from
type 'int' of value -2 (32-bit, signed) to type 'unsigned int' changed
the value to 4294967294 (32-bit, unsigned)
Change-Id: Ifeac31cc80811081c1ba10aadaa94dc36cd46efa
Manually unrolling the inner loop is sufficient to stop the compiler
getting confused and emitting inefficient code.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I860768ce0e6c0e0b6286d3fc1b94f0eae95d0a1a
Implement AArch64-only paths for each of the Neon SAD reduction
functions, making use of a wider pairwise addition instruction only
available on AArch64.
This change removes the need for shuffling between high and low
halves of Neon vectors - resulting in a faster reduction that requires
fewer instructions.
Bug: b/181236880
Change-Id: I1c48580b4aec27222538eeab44e38ecc1f2009dc
Add an alternative AArch64 implementation of vpx_convolve8_horiz_neon
for targets that implement the Armv8.4-A SDOT (signed dot product)
instruction.
The existing MLA-based implementation of vpx_convolve8_horiz_neon is
retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Change-Id: I5337286b0f5f2775ad7cdbc0174785ae694363cc
this file uses GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST so it's
safe to enable unconditionally. the filter check fell out of sync with
the code, there's a sse2 and neon implementation for the filter.
Change-Id: I2a3336ccef3fb524ca5d9b8f88279240c9a276aa
Change clamp to an assert so we are warned if changes to input
ranges or defaults in the future lead to an invalid value.
Change-Id: Idb4e0729f477a519bfff3083cdce3891e2fc6faa
* changes:
vpx_convolve_neon: prefer != 0 to > 0 in tests
vpx_convolve_avg_neon: prefer != 0 to > 0 in tests
vpx_convolve_copy_neon: prefer != 0 to > 0 in tests
this produces better assembly code; the horizontal convolve is called
with an adjusted intermediate_height where it may over process some rows
so the checks in those functions remain.
Change-Id: Iebe9842f2a13a4960d9a5addde9489452f5ce33a
Imposed provisional upper and lower limits to each parameter
that can be adjusted in the Vizier ML experiment.
Also in some cases applied secondary limits on on the
range of the final "used" values.
Defaults and limits may well require further tuning after
subsequent rounds of experimentation.
Re-factor get_sr_decay_rate().
Change-Id: I28e804ce3d3710f30cd51a203348e4ab23ef06c0
The overshoot_pct & undershoot_pct attributes for rate control
are expressed as a percentage of the target bitrate, so the range
should be 0-100.
Change-Id: I67af3c8be7ab814c711c2eaf30786f1e2fa4f5a3
Further changes to normalize the Vizier command line parameters.
The intent is that the default behavior for any given parameter
is signaled by the value 1.0 (expressed on the command line as a
rational).
The final values used in the two pass code are obtained by multiplying
the passed in factors by a default values if use_vizier_rc_params is 1.
Where use_vizier_rc_params is 0 the values are explicitly set to
the defaults.
This patch also changes the default value of each parameter to 1.0
even if not set explicitly. This should ensure safe /default behavior
if the user sets use_vizier_rc_params to 1 but does not set all the
the individual parameters.
Change-Id: Ied08b3c22df18f42f446a4cc9363473cad097f69
Add command line options for three rd parameters.
They are controlled by --use_vizier_rc_params, together with
other rc parameters.
If not set from command line, current default values will be used.
Change-Id: Ie1b9a98a50326551cc1d5940c4b637cb01a61aa0
If pass --use-vizier-rc-params=1, the rc parameters are overwittern
by pass in values. It --use-vizier-rc-params=0, the rc parameters
remain the default values.
Change-Id: I7a3e806e0918f49e8970997379a6e99af6bb7cac
this avoids uninitialized values and potential misuse of them which
could lead to a crash should the function fail
this is the same fix that was applied in libaom:
d0cac70b5 Fix a free on invalid ptr when img allocation fails
Bug: webm:1722
Change-Id: If7a8d08c4b010f12e2e1d848613c0fa7328f1f9c
Recently, some function signatures have been changed.
This change fixes compilation error if --enable-rate-ctrl is used.
Change-Id: Ib8e9cb5e181ba1d4a6969883e377f3dd93e9289a
A recent change leads to slight difference of encoding results:
d3aaac367 Change calculation of rd multiplier,
which is caught by Jenkins nightly test.
Adjust the threshold to silence the test failure.
BUG=webm:1725
Change-Id: I7e8b3a26b72c831ae4d88d0fca681b354314739d
Changes the exposed zm_factor parameter.
This patch alters the meaning of the zm_factor
parameter that will be exposed for the Vizier project.
The previous power factor was hard to interpret in terms
of its meaning and effect and has been replaced by a linear factor.
Given that the initial Vizier results suggested a lower zero motion
effect for all formats, the default impact has been reduced.
The patch as it stands gives a modest improvement for PSNR
but is slightly down on some sets for SSIM
(overall psnr, ssim % bdrate change: -ve is better)
lowres -0.111, 0.001
ugc360p -0.282, -0.068
midres2 -0.183, 0.059
hdres2 -0.042, 0.172
Change-Id: Id6566433ceed8470d5fad1f30282daed56de385d
This is similar to the change:
https://chromium-review.googlesource.com/c/webm/libvpx/+/2771081
Which fails libvpx nightly test.
Here we add range check to get rid of the warning of
"divided by zero".
BUG=webm:1723
Change-Id: I7712efe7abd4b11cdb725643d51fd1c0a300d924
Change the way the rd multiplier is adjusted for Q and frame type.
Previously in VP9 the rd multiplier was adjusted based on crude Q bins
and whether the frame was a key frame or inter frame.
The Q bins create some problems as they potentially introduce
discontinuities in the RD curve. For example, rate rising with a
stepwise increase in Q instead of falling. As such, in AV1 they
have been removed.
A further issue was identified when examining the first round of
results from from the Vizier project. Here the multiplier for each Q bin
and each frame type was optimized for a training set, for various video
formats, using single point encodes at the appropriate YT rates.
These initial results appeared to show a trend for increased rd
multiplier at higher Q for key frames. This fits with intuition as in
this encoding context a higher Q indicates that a clip is harder to
encode and frames less well predicted. However, the situation
appeared to reverse for inter frames with higher rd multipliers
chosen at low Q.
My initial suspicion was that this was a result of over fitting, but on
closer analysis I realized that this may be more related to frame type
within the broader inter frame classification. Specifically frames coded
at low Q are predominantly ARF frames, for the mid Q bin there will
likely be a mix of ARF and normal inter frames, and for the high Q bin
the frames will almost exclusively be normal inter frames from difficult
content.
ARF frames are inherently less well predicted than other inter frames
being further apart and not having access to as many prediction modes.
We also know from previous work that ARF frames have a higher
incidence of INTRA coding and may well behave more like key frames
in this context.
This patch replaces the bin based approach with a linear function
that applies a small but smooth Q based adjustment. It also splits
ARF frames and normal inter frames into separate categories.
With this done number of parameters that will be exposed for the
next round of Vizier training is reduced from 7 to 3 (one adjustment
factor each for inter, ARF and key frames)
This patch gives net BDATE gains for our test sets even with the
baseline / default factors as follows: (% BDRATE change in overall
PSNR and SSIM, -ve is better)
LowRes -0.231, -0.050
ugc360p 0.160, -0.315
midres2 -0.348, -1.170
hdres2 -0.407, -0.691
Change-Id: I46dd2fea77b1c2849c122f10fd0df74bbd3fcc7f
These rate control parameters are for the Vizier experiment.
They are defined as rational numbers.
Change-Id: I23f382dd49158db463b75b5ad8a82d8e0d536308
this avoids a warning about differences in size between void* and
unsigned int under msvc:
vp9_ext_ratectrl_test.cc(40,3): warning C4312: 'reinterpret_cast':
conversion from 'const unsigned int' to 'void *' of greater size
Change-Id: I5a412ec785ddcaeff2ec71bb83a6048505400293
Release v1.10.0 Ruddy Duck
2021-03-09 v1.10.0 "Ruddy Duck"
This maintenance release adds support for darwin20 and new codec controls, as
well as numerous bug fixes.
- Upgrading:
New codec control is added to disable loopfilter for VP9.
New encoder control is added to disable feature to increase Q on overshoot
detection for CBR.
Configure support for darwin20 is added.
New codec control is added for VP9 rate control. The control ID of this
interface is VP9E_SET_EXTERNAL_RATE_CONTROL. To make VP9 use a customized
external rate control model, users will have to implement each callback
function in vpx_rc_funcs_t and register them using libvpx API
vpx_codec_control_() with the control ID.
- Enhancement:
Use -std=gnu++11 instead of -std=c++11 for c++ files.
- Bug fixes:
Override assembler with --as option of configure for MSVS.
Fix several compilation issues with gcc 4.8.5.
Fix to resetting rate control for temporal layers.
Fix to the rate control stats of SVC example encoder when number of spatial
layers is 1.
Fix to reusing motion vectors from the base spatial layer in SVC.
2 pass related flags removed from SVC example encoder.
Bug: webm:1712
Change-Id: I4d807da7aee5a4d9d7a7af66b927983622e9cefa
This patch converts the Vizier custom RD multipliers, to factors
that adjust each RD multiplier either side of its default value, where
a factor of 1.0 will give the previous default behavior.
Ultimately I would like to replace the multiple RD multipliers
triggered at different Q thresholds (eg, low, medium, high q)
with a function that adjusts the rd behavior smoothly as Q
changes.
Vizier could then be presented with a single adjustment control
for each of key frame and inter frame rd.
The current behavior is problematic.
Firstly having hard threshold Q values at which rd behavior changes
may cause anomalies in the rate distortion curve, where in some
situations, raising Q, for example, may not cause the expected drop
in rate and rise in distortion, because we have crossed a threshold
where the rate distortion multiplier changes sharply and this alters
the balance of bits spent in the prediction and residual parts of the
signal.
Having a single value that is used for a range of Q index values
(eg 0-64), (65-128) may also cause problems and over-fitting in
the context of the Vizier ML project. This project tries to optimize
the values for each Q range, for various YT formats, but does so
by analyzing the results of single point encodes on a set of clips.
For a given format all the clips are encoded with the same parameters
(target rate etc) so there is likely to be clustering in regards to the
Q values used. For example the training set may give a new value
for the Q range 0-64 but most of the data points used may have Q
close 64.
It will likely require several iterations working with the Vizier team
to get this right. This patch just gives an initial framework for
testing.
Change-Id: Iaa4cd5561b95a202bcae7a1d876c4f40ef444fa2
This patch changes the way prediction decay is calculated.
We expect that frames that are further from an ALT-REF frame (or Golden
Frame) will be less well predicted by that ALT-REF frame. As such it is
desirable that they should contribute less to the boost calculation used
to assign bits to the ALT_REF.
This code looks at the reduction in prediction quality between the last
frame and the second reference frame (usually two frames old). We make
the assumption that we can accumulate this to get a proxy for the likely
loss of prediction quality over multiple frames.
Previously the calculation looked at the absolute difference in the
coded errors. The issue here is that the meaning of a unit difference
is not the same for very complex frames as it is for easy frames.
In this patch we scale the decay value based on how the error difference
compares to the overall frame complexity as represented by the intra
coding error.
This was tuned experimentally to give test results that
were approximately neutral for our various test sets. There was
a slight drop in Overall PSNR but a consistent improvement in
SSIM. This balance may be improved with tuning further as it is
noteworthy that it was much better on the hd_res set.
Results (Overall PSNR, SSIM -ve better) for low_res, ugc360, midres2,
ugc480P and hd_res are as follows:
0.173 -0.688
0.118 -0.153
0.132 -0.239
0.261 -0.405
-0.305 -1.109
As part of this adjustment the contribution of motion amplitude was
removed.
This patch also changes the control mechanism that will be exposed
on the command line for use by the Vizier project. The control is now
a linear factor which defaults to 1.0, where values < 1.0 mean a lower
decay rate and values > 1.0 mean an increased decay rate.
This presents a more easily understandable interface for use in
optimizing the decay behavior for various formats, where it is clear
what a passed in value means relative to the default.
With the new decay mechanism the current values for various formats
are almost certainly wrong and we still need to define sensible upper
and lower bounds for use during future training.
Change-Id: Ib1074bbea97c725cdbf25772ee8ed66831461ce3
Added the experimental max per frame KF boost values derived from
the Vizier experiments.
These are still all off by default.
When enabled I expect these to cause significant regression as they
fluctuate wildly and in a way that makes no sense from format to format.
I suspect these values reflect over fitting perhaps from a subset of
training clips with more frequent mid chunk key frames and or short key
frame groups.
Also fixed incorrect value for gf boost for one format.
Experiment to moderate these values and use different values for first
and subsequent KF groups to follow.
Change-Id: Ibeb4268957f2edacdb4549d74930255a22a2fcc5
Added kf_frame_min_boost field to hold the minimum per frame
boost in key frame boost calculations. Replaces hard wired value.
To be used in conjunction with and tied to the maximum value.
Change-Id: I67a39ecb3f21b5918512a5ccd9a1b214d7971e45
Previous code did not have sensible defaults for larger image formats.
Added defaults for Vizier RD parameters for sizes > 1080P and changed
the first pass parameters for large formats to use the 1080P values.
No supplied value for rd_mult_q_sq_key_high_qp case yet so set to
old hard wired default value.
If the Vizier parameters were enabled the lack of sensible defaults
caused a large regression for 2K clips in one of our test sets.
Change-Id: I306c0cd76eab00d50880c91fadb5842faf6661ff
Further integration of Vizier adjustable parameters,
This patch connects up additional configurable two pass rate control
parameters for the Vizier project. This still needs to be connected up
to a command line interface and at the moment should still be using
default values that match previous behavior.
Do not submit until verified that defaults are all working correctly.
Change-Id: If1241c2dba6759395e6efa349c4659a0c345361d
< 4 isn't meaningful in the first pass; additional analysis will be
done, but thrown out, unnecessarily increasing the runtime.
Change-Id: Ic3de77e3eaa7a8a3371f76f84693e9655c60fdba
2019-05-30 20:16:46 -07:00
626 changed files with 81969 additions and 26706 deletions
Development libraries are available from the [releases](https://github.com/ShiftMediaProject/libvpx/releases) page. These libraries are available for each supported Visual Studio version (2013, 2015 or 2017) with a different download for each version. Each download contains both static and dynamic libraries to choose from in both 32bit and 64bit versions.
Development libraries are available from the [releases](https://github.com/ShiftMediaProject/libvpx/releases) page. These libraries are available for each supported Visual Studio version with a different download for each version. Each download contains both static and dynamic libraries to choose from in both 32bit and 64bit versions.
// Shared library builds don't support whitebox tests that exercise internal
// symbols.
#if CONFIG_VP8
vp8_rtcd();
#endif // CONFIG_VP8
#if CONFIG_VP9
vp9_rtcd();
#endif // CONFIG_VP9
vpx_dsp_rtcd();
vpx_scale_rtcd();
#endif // !CONFIG_SHARED
}
}// namespace libvpx_test
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.