Before v1.15.0: c=10, a=1, r=0
Rule #3: source code has changed, increment r:
r=1
Rule #4: interfaces were removed in vpx_tpl.h, set r=0, increment c:
c=11, r=0
Rule #5: no interfaces have been added
Rule #6: interfaces were removed in vpx_tpl.h, set a=0:
a=0
After release: c=11, a=0, r=0
major = c-a = 11
minor = a = 0
patch = r = 0
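The rule walk-through above can be reproduced with a small sketch. The
struct and function names below are hypothetical and are not part of
libvpx; the rules follow the libtool versioning convention (c = current,
a = age, r = revision).

```c
/* Hypothetical helper types for illustration; not part of libvpx. */
typedef struct {
  int current;  /* c */
  int age;      /* a */
  int revision; /* r */
} LibtoolVersion;

typedef struct {
  int major;
  int minor;
  int patch;
} SoVersion;

/* Applies libtool update rules 3 to 6 for one release. */
static LibtoolVersion libtool_update(LibtoolVersion v, int source_changed,
                                     int ifaces_added, int ifaces_removed) {
  if (source_changed) v.revision += 1; /* rule 3 */
  if (ifaces_added || ifaces_removed) { /* rule 4 */
    v.current += 1;
    v.revision = 0;
  }
  if (ifaces_added) v.age += 1; /* rule 5 */
  if (ifaces_removed) v.age = 0; /* rule 6 */
  return v;
}

/* Maps c/a/r to the shared object version: major = c - a, minor = a,
 * patch = r. */
static SoVersion to_so_version(LibtoolVersion v) {
  SoVersion s;
  s.major = v.current - v.age;
  s.minor = v.age;
  s.patch = v.revision;
  return s;
}
```

Starting from c=10, a=1, r=0 with changed source and removed interfaces,
this yields c=11, a=0, r=0, matching the values listed above.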
Bug: webm:384672478
Change-Id: I2e70e7e35c64ece32eaf1dc5625640965483f9b9
Possible fix for issue below. It was only disabled
for screen in a previous change, but we force it off
always to check if it clears the issue.
The speed feature disabled is only used for 3 spatial
layers and at least 2 temporal layers. The impact on speed is
expected to be small, ~2%, so ok to disable for now and
see if it clears the issue.
Bug: 366146260
Change-Id: If7af006425e1e0ef297b9d6466507ea4c90ddb6f
(cherry picked from commit 09b3d5fc5aa48752f95f4c0c37b0bd4ff55c0ba1)
Integer overflow in encode_frame_to_data_rate()
for the update:
lc->total_target_vs_actual += bits_off_for_this_layer
Fix is to use int64_t for total_target_vs_actual.
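A minimal sketch of the idea, assuming a stand-in struct (LayerRc and
layer_rc_update() are hypothetical names, not the libvpx layer context):
the accumulator is 64-bit, so adding int-sized deltas repeatedly cannot
wrap the way an int accumulator could.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-layer rate control state. */
typedef struct {
  int64_t total_target_vs_actual; /* 64-bit so repeated updates do not wrap */
} LayerRc;

/* Adds one frame's bit difference to the running total. The int operand
 * is converted to int64_t before the addition. */
static void layer_rc_update(LayerRc *lc, int bits_off_for_this_layer) {
  lc->total_target_vs_actual += bits_off_for_this_layer;
}
```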
Bug: chromium:368114043
Change-Id: I9a01e1a69e26ae748e8ae23d9e1287431510388d
Divide by 3 instead of multiply by 3, in comparison of
lrc->avg_frame_bandwidth vs lrc->last_avg_frame_bandwidth,
in two functions for reset rc.
Small loss in precision, so acceptable.
Similar change to:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5698570
Bug: chromium:367892770
Change-Id: Ia9ef09a9f6beba930fedd496407cfa7057e39336
PF_ARM_SVE_INSTRUCTIONS_AVAILABLE and PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
are available in WinSDK 10.0.26100 and recent versions of mingw-w64.
Based on a patch by Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333611.html
Change-Id: I34b2341a559f95aa400e84d709f3eb36da5dbb7b
There's no direct processor feature constant for I8MM alone, but
there is a flag for SVE-I8MM (added in WinSDK 10.0.26100 and
recent versions of mingw-w64). If SVE-I8MM is available, we can
assume that I8MM is available.
While HW supporting these features isn't yet commonly running
Windows, this at least allows detecting and running the I8MM codepaths
in Windows builds in Wine (possibly running in QEMU).
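A hedged sketch of the detection described above. The constant name
PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE is an assumption about how the
SDK spells the SVE-I8MM flag (the text above does not name it), so it is
guarded with #if defined(); DEMO_HAS_I8MM and demo_arm_i8mm_caps() are
hypothetical names. On other platforms, or with older SDK headers, the
function simply reports no features.

```c
#if defined(_WIN32)
#include <windows.h>
#endif

#define DEMO_HAS_I8MM (1 << 0) /* hypothetical capability bit */

/* Returns DEMO_HAS_I8MM when the OS reports SVE-I8MM, else 0. */
static int demo_arm_i8mm_caps(void) {
  int flags = 0;
#if defined(_WIN32) && defined(PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE)
  /* No constant exists for I8MM alone; SVE-I8MM implies I8MM. */
  if (IsProcessorFeaturePresent(PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE)) {
    flags |= DEMO_HAS_I8MM;
  }
#endif
  return flags;
}
```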
Based on patch from Martin Storsjö on ffmpeg-devel:
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-September/333609.html
Change-Id: I77117bee8516924fddcdecccae8bab3cf5beed96
The program requires a minimum of 2 parameters. Previously the tool
would crash if only one input file was given.
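A minimal sketch of such an argument check, assuming the two parameters
are input files; check_args() and the usage text are hypothetical and
not taken from the tool.

```c
#include <stdio.h>

/* Returns 0 when at least two parameters were given, -1 otherwise.
 * argc counts the program name, so two parameters means argc >= 3. */
static int check_args(int argc, char **argv) {
  if (argc < 3) {
    fprintf(stderr, "Usage: %s <input1> <input2>\n",
            argc > 0 ? argv[0] : "tool");
    return -1;
  }
  return 0;
}
```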
Bug: webm:365481206
Change-Id: I875d81b2db4fcc4338061c03b23bb51b0aad58e4
Possible fix for issue below. The speed feature disabled
is only used for 3 spatial layers and at least 2 temporal layers.
The impact on speed is expected to be small, ~2%, so ok
to disable for now and see if it clears the issue.
Bug: 366146260
Change-Id: I94ab991d583cc2ce758db337abbbb463a65f0767
The wrapped storage must exist for the duration of the vpx_image_t
allocation.
Bug: aomedia:363806063
Change-Id: Ic6b79a56b6c07776222d1767490d873d7408ced0
The default template for https://issues.webmproject.org/ is a public bug
report. Security issues can be reported securely using the 'Security
report' template.
Change-Id: Ic7144a6c7a144772b78852d1415a51a570c79d50
and examples/resize_util.c. These functions were added in:
3cd37dfeb Adds a non-normative resize library to vp9 encoder
but never used meaningfully in the library.
This mirrors the change in libaom:
d10029bb4b Restore function prototype of av1_resize_frame420
except that vp9_resize_frame420() was never exported in the shared
library, so can be deleted along with the rest.
The reasoning for removing examples/resize_util.c is the same: it is not
useful and examples should use the public functions of the libvpx
library.
Change-Id: I386080d3f1a3ef81dfc87fcdf5bbdf459d996f03
Added key frame temporal filtering. Enabled it for VOD encoding
with encoder speed < 2.
Minor improvement in prediction.
Added the restriction of using no more than "arnr_max_frames"
frames for temporal filtering.
Key frame temporal filtering is turned off by default for now. To
enable it, set "--enable-keyframe-filtering=1"
Borg result with "--enable-keyframe-filtering=1"
          avg_psnr:  ovr_psnr:  ssim:   vmaf:
hdres2:   -0.762     -0.863     -0.903  -0.680
midres2:  -0.813     -0.753     -0.757  -0.743
lowres2:  -0.492     -0.598     -0.737  -0.881
The impact on the encoder time is minimal.
Change-Id: If6abea3e21efcb96f1978cd9dfaa742c40dc2a56
`#if defined(__GNUC__)` is enough if a specific version isn't being
looked for.
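For illustration only, a sketch contrasting the two kinds of check. The
DEMO_* macros and demo_helper() are hypothetical, and the GCC 5
threshold is an arbitrary example value, not one taken from this change.

```c
/* Enough when any GCC-compatible compiler will do. */
#if defined(__GNUC__)
#define DEMO_UNUSED __attribute__((unused))
#else
#define DEMO_UNUSED
#endif

/* A version comparison is only needed when a feature depends on a
 * particular release (GCC 5 here is an arbitrary example). */
#if defined(__GNUC__) && __GNUC__ >= 5
#define DEMO_NEEDS_GCC5 1
#else
#define DEMO_NEEDS_GCC5 0
#endif

/* Marked unused so that leaving it uncalled does not warn. */
static DEMO_UNUSED int demo_helper(void) { return 42; }
```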
Bug: aomedia:356832974
Change-Id: I3fcbecf9d547c6a2d89d7b5456e83ee08ddc6f5e
Applied 12-tap filter to temporal filter prediction for better
result. Improved the calculation of frames to be used in temporal
filtering.
The overall PSNR gain was -0.511% (lowres), -0.338% (midres), and
-0.288% (hdres).
Encoder time was increased by ~2%, which would be largely reduced
by the following SIMD optimization.
Change-Id: If3ece30f1614beadc99ebf6b4dc3f2d988d3bdb9
Move the saturate_cast_double_to_int() function in
vp8/encoder/firstpass.c to vpx_dsp/vpx_dsp_common.h so that it can be
used in other files.
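A sketch of what a saturating double-to-int conversion can look like.
This is an illustration under a hypothetical name, not the body of
saturate_cast_double_to_int(); in particular, clamping the lower bound
and mapping NaN to 0 are assumptions of this sketch.

```c
#include <limits.h>

/* Converts d to int, clamping instead of invoking undefined behavior
 * when d is outside the range of int. */
static int demo_saturate_double_to_int(double d) {
  if (d != d) return 0; /* NaN: 0 is an arbitrary but defined result */
  if (d >= (double)INT_MAX) return INT_MAX;
  if (d <= (double)INT_MIN) return INT_MIN;
  return (int)d; /* in range: truncates toward zero */
}
```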
Change-Id: I748fea969520542dca68d7a46500d3272f22e16f
to INT_MAX. This matches calc_iframe_target_size() in VP8
(http://crbug.com/1473473). If rc->avg_frame_bandwidth is large even
small kf_boost values will overflow an int.
Change-Id: Iaca5b47fe97793ae70930b3b2c2f42725d2c96fb
This fixes a build error seen in gcc 15:
3b63004 mkvparser/mkvparser.cc: add missing <cstdint> include
Bug: aomedia:357622679
Change-Id: I6c4a1795d189f9993d4f2c5c9f0375912bc58f0c
Rely on the -I or -isystem compiler option to find "gtest/gtest.h". This
makes it easier to build our tests against a copy of gtest outside the
libvpx source tree.
Bug: webm:42330726
Change-Id: I3b189c6345e13b36b236d1eedc6ee091bfa71f48
Fixes a 'Result of operation is garbage or undefined' static analysis
report (seen with clang-14) related to left shifting a negative value.
Bug: b:328632178
Change-Id: I18f0100eca0deac1cac9be0c7e848685d2911fb3
Motion vectors are now clamped in
vp8_find_best_sub_pixel_step_iteratively, vp8_find_best_sub_pixel_step,
vp8_find_best_half_pixel_step, vp8_full_search_sad,
vp8_refining_search_sadx4 and vp8_refining_search_sad_c (the rtcd for
other optimizations are redirects to vp8_refining_search_sadx4).
The difference of valid motion vectors may still go beyond the range of
the MVcount array, however, so additional checks are added to
rd_update_mvcount() and update_mvcount().
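The additional check can be sketched as below. The names and the
half-range of 1023 are assumptions chosen to match the 2047-entry array
in the error message; this is not the code in rd_update_mvcount().

```c
#define DEMO_MV_MAX 1023                   /* assumed half-range */
#define DEMO_MV_VALS (2 * DEMO_MV_MAX + 1) /* 2047 entries */

/* Counts one motion vector difference. Returns 1 if it was counted and
 * 0 if the index would fall outside the array and was skipped. */
static int demo_count_mv_diff(unsigned int counts[DEMO_MV_VALS], int diff) {
  const int idx = DEMO_MV_MAX + diff;
  if (idx < 0 || idx >= DEMO_MV_VALS) return 0;
  counts[idx]++;
  return 1;
}
```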
Note the test source and settings (speed 1 and GOOD quality mode) come
from the issue report; additional coverage is added for realtime. The
realtime path does not trigger the error without the fix, but as it's
similar to the rd path, the same clamp is done to be safe.
Fixes:
vp8/encoder/rdopt.c:1579:5: runtime error: index 17467 out of bounds for
type 'unsigned int[2047]'
Bug: oss-fuzz:69906
Change-Id: Ia8bd087cfe4475ab09ba711ed806fbcbaa72e552
cpi->output_framerate may be as large as 10M. Previously this would
cause kf_boost to be ~20M which would overflow an int when multiplied by
values in kf_boost_qadjustment[].
Fixes:
vp8/encoder/ratectrl.c:340:25: runtime error: signed integer overflow:
19999984 * 220 cannot be represented in type 'int'
Bug: oss-fuzz:69100
Change-Id: I2d77c9d2912412f6265f6a8dc0e6b361b63b8242
The assignment "cpi->output_framerate = cpi->framerate;" after the
vp8_new_framerate() call is not needed, because vp8_new_framerate() sets
cpi->framerate and cpi->output_framerate to the same value.
Change-Id: I4de97b43957142d658e0c08ecfc6628844ce453a
+ fix an additional double -> int overflow warning (chrome's fuzzers do
not have the float-cast-overflow sanitizer enabled)
Bug: chromium:352414650
Change-Id: I634bb421a74236eac434df138ed71dadf197596a
The only real change is in the initialization of frame_window. The (int)
cast is moved to the result of VPXMIN(), so that
cpi->twopass.total_stats.count - cpi->common.current_video_frame is
calculated in double.
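A sketch of why the cast placement matters, using hypothetical names
and assuming the frame count is stored as a double: the subtraction and
the minimum are evaluated in double, and only the bounded result is
converted to int.

```c
#define DEMO_MIN(a, b) (((a) < (b)) ? (a) : (b))

/* total_count is a double; current_frame and max_window are ints.
 * Casting total_count to int first could overflow for huge counts, so
 * the cast is applied to the result of the minimum instead. */
static int demo_frame_window(double total_count, int current_frame,
                             int max_window) {
  return (int)DEMO_MIN((double)max_window, total_count - current_frame);
}
```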
Change-Id: Ia80f24614af7184b37cfdd99d8a8b1639460f273
rc->avg_frame_bandwidth is capped at INT_MAX. Rather than multiply the
value by 3, divide projected_frame_size by 3 to avoid the overflow.
Without rounding this differs slightly from the original, but loss of
precision is acceptable in this case.
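A minimal sketch of the rewritten comparison. The function name and the
direction of the comparison are assumptions for illustration; only the
divide-instead-of-multiply idea comes from the description above.

```c
/* Returns nonzero when projected_frame_size is more than roughly three
 * times avg_frame_bandwidth. Writing it as
 *   projected_frame_size > 3 * avg_frame_bandwidth
 * could overflow when avg_frame_bandwidth is near INT_MAX. */
static int demo_is_large_overshoot(int projected_frame_size,
                                   int avg_frame_bandwidth) {
  return projected_frame_size / 3 > avg_frame_bandwidth;
}
```

Integer division truncates, so a case such as 301 versus 100 now
compares as not larger, where 301 > 3 * 100 would have been true. This
is the small loss of precision mentioned above.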
Bug: chromium:348440590
Change-Id: Id5960825c79d7c764d257e9b4bd0a1de751878d8
Replace the VERSION_STRING_NOSP macro by the public API function
vpx_codec_version_str().
Treat vpx_version.h as an absolutely internal header of the libvpx
library.
Change-Id: I86ba8548a62adae91ae7f5caad98169707f3fc64
This change happens in define_gf_group().
Since this part is not critical for ext_ratectrl,
turn off the error reporting for now.
Change-Id: Ie74aa06a116edb8c5d9e7b29cadbd366232fbc1d
The compare_fp_stats() and compare_fp_stats_md5() functions are not used
when CONFIG_REALTIME_ONLY is equal to 1. Define these functions only if
CONFIG_REALTIME_ONLY is 0 to avoid the -Wunused-function warnings.
Change-Id: Iaae208f67708cfaeee5304b0320ebce63c863f96
Allow the TPL group to use up to 3 reference frames from the
previous GOP. This slightly changes the coding stats in the range
of <0.1%.
STATS_CHANGED
Change-Id: Ieb4e948a783bf8ef9ca78717d56ff750f3f795a4
Fix double-to-int cast overflows in vp8 code caused by setting the
target bitrate to the maximum value (2000000).
Tested: Build libvpx with UndefinedBehaviorSanitizer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Note: This is essentially the VP8 version of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191361.
Bug: 349440066
Change-Id: Ia43e1aad8fcab60ace49da960579081c2c3a5445
Fix the following UBSan integer errors in test_decode():
vpxenc.c:1589:57: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
vpxenc.c:1590:58: runtime error: implicit conversion from type 'int' of
value -16 (32-bit, signed) to type 'unsigned int' changed the value to
4294967280 (32-bit, unsigned)
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --arnr-maxframes=7 \
--arnr-strength=5 --kf-max-dist=9999 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50 --codec=vp8
Bug: 349440066
Change-Id: Ice2f0e7176ffec664856559e2c02bd51113c4d74
Tested: Build libvpx with -fsanitize=integer and then run
./vpxenc husky.yuv -o AV1_husky_2000000_10000000_10000000.webm --good \
--cpu-used=2 -v -t 0 -w 352 -h 288 --fps=10000000/10000000 \
--target-bitrate=2000000 --limit=150 --test-decode=fatal --passes=2 \
--lag-in-frames=25 --min-q=0 --max-q=63 --min-gf-interval=4 \
--max-gf-interval=22 --arnr-maxframes=7 --arnr-strength=5 \
--kf-max-dist=9999 --aq-mode=0 --undershoot-pct=100 \
--overshoot-pct=100 --bias-pct=50
This unsigned integer overflow seems to be caused by
g_timebase.num=1000000.
Note: This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/191401.
Bug: 349440066
Change-Id: I924fa9c653400764dd7320938b88b4ea40f38172
This patch fixes some additional cases where under extreme conditions
some of the VBR adjustment variables can wrap.
As this happens on a per frame level the extra saturation checks should
not be an issue for performance.
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190521
https://aomedia-review.googlesource.com/c/aom/+/190888
Change-Id: I87c4ecca10f39767002f7d90d0f43b19c7150832
Current code was disallowing scene detection for
speeds >= 8, to avoid any encode_time increase
(see comment in the code).
But we can expect the cost to be small even at speeds 8 and 9,
and that concern on encode_time was from some time ago
before 8 and 9 were further optimized. And this is
needed for content with scene changes (see issue attached).
So allow scene detection now for all RTC speed settings (speed >= 5).
Bug: b/346846607
Change-Id: I678dbb88ff1399ed89b2bf9770ae9427e3044fc4
The last reference to the flag in configure was removed in:
fad70a358 Remove -fno-strict-aliasing flag
The library should be expected to function without this flag; it's built
and tested elsewhere without it.
Bug: webm:570, webm:603
Change-Id: Icf85fd9bd5c9cb0c81d6eecf10fba07807f48b4a
The GNU Assembler was removed in r24. clang's internal assembler works,
but `-c` is necessary to avoid linking.
Bug: webm:1856
Change-Id: I61f80cf78657d3b71d5e73c5b2510575533ca5ea
Move the function into define_gf_group().
define_gf_group() has a lot of settings that might cause
performance drop if skipped.
Imitate define_gf_group_structure()'s behavior, which adds
an extra overlay frame at the end of gf_group whenever
alt_ref is used.
After this change, we can feed the baseline decision through
webmrc and get the same result as baseline.
This CL is tested with city_cif.yuv using ffmpeg
BUG = b/345528565
Change-Id: Ib61f0a0a72251f8662fb4072e0cfd7f456a243b3
Quiets some spurious -Wmaybe-uninitialized warnings with gcc 14.1.0.
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:464:5:
../tools/tiny_ssim.c:37:12: warning: 'v[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error16',
inlined from 'main' at ../tools/tiny_ssim.c:462:5:
../tools/tiny_ssim.c:37:12: warning: 'u[0]' may be used uninitialized
[-Wmaybe-uninitialized]
37 | if (orig == NULL || recon == NULL) {
| ^
In function 'calc_plane_error',
inlined from 'main' at ../tools/tiny_ssim.c:461:5:
../tools/tiny_ssim.c:61:12: warning: 'y[0]' may be used uninitialized
[-Wmaybe-uninitialized]
61 | if (orig == NULL || recon == NULL) {
To reduce confusion, read_input_file() is changed to return an int as
previously it would only return (size_t)-1/0/1 (and now returns 0/1).
Change-Id: I2344048ecc2bd233891ffcef08002ee98d6d262a
The default behavior changed in:
148d1085f Refactor and extend run-time CPU feature detection on Arm
This fixes build errors with these targets as there is no runtime cpu
detection defined for them.
Change-Id: Ie6b0bae1fc3e244d7dfcc823f60c3e466ccade79
Both VP8 and VP9 internally cap the target bitrate to the smaller of the
uncompressed bitrate and 1000000 kilobits per second.
Change-Id: I4008ce09b5e709e75111800341d015e41eb1da42
This change fixes issues that can occur if the user specifies a very
high target data rate or rate per frame.
Fixes some issues with overflow of int variables used to hold bitrate
values (rate per second, rate per frame, etc.).
Note: This CL is a port of the following libaom CLs:
https://aomedia-review.googlesource.com/c/aom/+/190381
https://aomedia-review.googlesource.com/c/aom/+/190462
All the changes were ported to VP9. For VP8, only the new type of
cpi->bytes (equivalent to ppi->total_bytes in libaom) was ported.
Change-Id: I438dd46efd5a134389b893ffae1f8a2381207906
2024-05-21 v1.14.1 "Venetian Duck"
This release includes enhancements and bug fixes.
- Upgrading:
This release is ABI compatible with the previous release.
- Enhancement:
Improved the detection of compiler support for AArch64 extensions,
particularly SVE.
Added vpx_codec_get_global_headers() support for VP9.
- Bug fixes:
Added buffer bounds checks to vpx_writer and vpx_write_bit_buffer.
Fix to GetSegmentationData() crash in aq_mode=0 for RTC rate control.
Fix to alloc for row_base_thresh_freq_fac.
Free row mt memory before freeing cpi->tile_data.
Fix to buffer alloc for vp9_bitstream_worker_data.
Fix to VP8 race issue for multi-thread with psnr_calc.
Fix to uv width/height in vp9_scale_and_extend_frame_ssse3.
Fix to integer division by zero and overflow in calc_pframe_target_size().
Fix to integer overflow in vpx_img_alloc() & vpx_img_wrap() (CVE-2024-5197).
Fix to UBSan error in vp9_rc_update_framerate().
Fix to UBSan errors in vp8_new_framerate().
Fix to integer overflow in vp8 encodeframe.c.
Handle EINTR from sem_wait().
Change-Id: Ic5e274fdc35c9141591a65e825bf012d2cca3caa
I introduced this bug in commit 2e32276:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5446333
I changed the line
stride_in_bytes = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
to three lines:
s = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
if (s > INT_MAX) goto fail;
stride_in_bytes = (int)s;
But I didn't realize that `s` is used later in the calculation of
alloc_size.
As a quick fix, undo the effect of s * 2 for high bit depths after `s`
has been assigned to stride_in_bytes.
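The bug and the quick fix can be sketched as follows. The function, its
parameters and the simplified size formula are assumptions for
illustration; the real img_alloc_helper() computes alloc_size
differently.

```c
#include <limits.h>
#include <stdint.h>

/* `s` is the row stride in samples; high bit depth uses 2 bytes per
 * sample. Returns 0 on success, -1 if the byte stride does not fit in
 * an int. */
static int demo_stride_and_size(uint64_t s, int high_bitdepth, uint64_t rows,
                                int *stride_in_bytes, uint64_t *alloc_size) {
  s = high_bitdepth ? s * 2 : s;
  if (s > INT_MAX) return -1;
  *stride_in_bytes = (int)s;
  /* Quick fix: restore `s` to samples. The size computation below
   * applies the bytes-per-sample factor itself, so leaving `s` doubled
   * would count it twice. */
  s = high_bitdepth ? s / 2 : s;
  *alloc_size = rows * s * (high_bitdepth ? 2 : 1);
  return 0;
}
```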
Bug: chromium:332382766
Change-Id: I53fbf405555645ab1d7254d31aadabe4f426be8c
(cherry picked from commit 74c70af016)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188962.
stride_align is documented to be the "alignment, in bytes, of each row
in the image (stride)."
Change-Id: I2184b50dc3607611f47719319fa5adb3adcef2fd
(cherry picked from commit 7d37ffacc6)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188823.
Impose maximum values on the input parameters so that we can perform
arithmetic operations without worrying about overflows.
Also change the VpxImageTest.VpxImgAllocHugeWidth test to write to the
first and last samples in the first row of the Y plane, so that the test
will crash if there is unsigned integer overflow in the calculation of
stride_in_bytes.
Bug: chromium:332382766
Change-Id: I54cec6c9e26377abaa8a991042ba277ff70afdf3
(cherry picked from commit 06af417e79)
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188761.
Fix unsigned integer overflows in the calculation of stride_in_bytes in
img_alloc_helper() when d_w is huge.
Change the type of stride_in_bytes from unsigned int to int because it
will be assigned to img->stride[VPX_PLANE_Y], which is of the int type.
Test:
. ../libvpx/tools/set_analyzer_env.sh integer
../libvpx/configure --enable-debug --disable-optimizations
make -j
./test_libvpx --gtest_filter=VpxImageTest.VpxImgAllocHugeWidth
Bug: chromium:332382766
Change-Id: I3b39d78f61c7255e10cbf72ba2f4975425a05a82
(cherry picked from commit 2e32276277)
Ported from test/aom_image_test.cc in libaom commit 04d6253.
Change-Id: I56478d0a5603cfb5b65e644add0918387ff69a00
(cherry picked from commit 3dbab0e664)
If fmt is VPX_IMG_FMT_NONE, currently img_alloc_helper() allocates a
single plane because VPX_IMG_FMT_NONE (0) is not a planar format (the
VPX_IMG_FMT_PLANAR bit is not set in VPX_IMG_FMT_NONE).
Although this seems correct, the problem is that most of the code in
libvpx assumes planar formats and is likely to dereference a null
pointer when it uses img->planes[1]. Also, VPX_IMG_FMT_NONE isn't really
a valid image format. So it is safer to make img_alloc_helper() fail if
fmt is VPX_IMG_FMT_NONE.
Change-Id: I05b47f4b5eceb631a02384b2cce1c2f6fdca8673
(cherry picked from commit d3a946de8c)
The integer overflow happens
in vp9_calc_iframe_target_size_one_pass_cbr(), when
calculating the target size for L1T3 encoding.
The input target bitrate(kbps) is very large, so it gets set
to INT_MAX (before being multiplied by 1000 to convert to bps),
and avg_frame_bandwidth is then set to (INT_MAX / lc->framerate),
which when multiplied by (16 + kf_boost) can exceed INT_MAX.
Fix is to cast the operands to int64_t and final result to int.
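A sketch of this kind of fix under stated assumptions: the function name
and the >> 4 scaling are illustrative, and the explicit clamp to INT_MAX
before the final cast is an addition of this sketch rather than
something the description above states.

```c
#include <limits.h>
#include <stdint.h>

/* Computes a boosted key frame target in 64-bit arithmetic so that the
 * product cannot overflow an int, then narrows the result. */
static int demo_iframe_target(int avg_frame_bandwidth, int kf_boost) {
  const int64_t target =
      ((int64_t)(16 + kf_boost) * (int64_t)avg_frame_bandwidth) >> 4;
  /* Assumed clamp: keeps the narrowing cast well defined. */
  return target > INT_MAX ? INT_MAX : (int)target;
}
```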
Bug: chromium:340918567
Change-Id: Ic00094b22c1f12ca988c0cb1fcaed473e1f8ed2b
In multi-threaded scenario, when the bitstream
buffer allocated is insufficient, the main thread
called 'longjmp' without waiting for the completion
of workers. In this patch, 'longjmp' is called by
the main thread after joining other worker threads.
This resolves the assertion failure as reported in
Bug: webm:1847
Bug: webm:1844
Change-Id: I399c76087b65e7b8d9a9fa4f12d784408243d648
(cherry picked from commit 611d9ba0a5)
Add the `size` and `error` members to the vpx_write_bit_buffer struct.
Add the vpx_wb_init() and vpx_wb_has_error() functions.
Instances of the vpx_write_bit_buffer struct are only allocated in the
vp9_pack_bitstream() function. So vp9_pack_bitstream() is the only
function outside vpx_dsp/bitwriter_buffer.* that needs updating.
This CL completes the work of adding output buffer bounds checks to
vp9/encoder/vp9_bitstream.c.
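A simplified sketch of a bounds-checked bit writer in the spirit of the
description above. The Demo* names and the implementation are
hypothetical; they are not the vpx_write_bit_buffer code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint8_t *buf;
  size_t size;       /* capacity of buf in bytes */
  size_t bit_offset; /* index of the next bit to write */
  int error;         /* set once a write would go past `size` */
} DemoWriteBitBuffer;

static void demo_wb_init(DemoWriteBitBuffer *wb, uint8_t *buf, size_t size) {
  wb->buf = buf;
  wb->size = size;
  wb->bit_offset = 0;
  wb->error = 0;
}

static int demo_wb_has_error(const DemoWriteBitBuffer *wb) {
  return wb->error;
}

/* Writes one bit, most significant bit first. Once the buffer is full
 * the error flag is set and further writes are ignored. */
static void demo_wb_write_bit(DemoWriteBitBuffer *wb, int bit) {
  const size_t byte = wb->bit_offset / 8;
  const int shift = 7 - (int)(wb->bit_offset % 8);
  if (wb->error) return;
  if (byte >= wb->size) {
    wb->error = 1;
    return;
  }
  if (shift == 7) wb->buf[byte] = 0; /* first bit of a byte clears it */
  wb->buf[byte] |= (uint8_t)((bit & 1) << shift);
  wb->bit_offset++;
}
```

A caller can write unconditionally and check demo_wb_has_error() once at
the end, rather than checking after every bit.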
Bug: webm:1844
Change-Id: I6b362be572852ee51d96023b35bfb334faada7e1
(cherry picked from commit d790001fd5)
In the vpx_writer struct, change the buffer_end field to the size field.
Change vpx_stop_encode() to return true on success, false on failure
(output buffer full).
In write_compressed_header(), remove the assertion
assert(header_bc.pos <= 0xffff). The caller (vp9_pack_bitstream()) will
check that condition.
In vp9_pack_bitstream(), the variable "first_part_size" is renamed
"compressed_hdr_size".
Bug: webm:1844
Change-Id: I4ed6ab905a707ad44d875e53036d5a42523a65d0
(cherry picked from commit 73703c188b)
Fixes a static analysis warning:
Value stored to 'data_size' is never read
Bug: webm:1844
Change-Id: Ia27181b1051bb2c3a6bc4a4c2549df8b0525e889
(cherry picked from commit 9f73377821)
The buffer_end field will allow bounds checking when vpx_writer writes
to the output buffer. This CL sets up the plumbing to pass the output
buffer size from vp9_pack_bitstream() to vpx_start_encode(), which
initializes the vpx_writer struct. vpx_writer doesn't use the output
buffer size in bounds checks yet, but the code in vp9_bitstream.c does.
Bug: webm:1844
Change-Id: I995e469ab453c02d740f54b46e0b08c7f2eb1a2e
(cherry picked from commit e387187438)
Set up the plumbing to pass the size of the output buffer `dest` to
vp9_pack_bitstream(). The output buffer is the cx_data buffer in the
encoder_encode() function in vp9/vp9_cx_iface.c, and its size is
cx_data_sz.
In this CL vp9_pack_bitstream() ignores the `dest_size` parameter.
Bug: webm:1844
Change-Id: I53c80280143d409cf16f87c4d6deec3d9338aea3
(cherry picked from commit d48577579b)
In multi-threaded scenario, when the bitstream
buffer allocated is insufficient, the main thread
called 'longjmp' without waiting for the completion
of workers. In this patch, 'longjmp' is called by
the main thread after joining other worker threads.
This resolves the assertion failure as reported in
Bug: webm:1847
Bug: webm:1844
Change-Id: I399c76087b65e7b8d9a9fa4f12d784408243d648
cpi_->cyclic_refresh is nullptr if aq_mode is 0, in other words, the
rate controller runs in non adaptive quantization mode. This CL fixes
the crash in GetSegmentationData() in non aq mode.
Bug: b/259487065
Test: video encoding on ChromeOS
Change-Id: I503b30d15c697c8dd1da203b3c7361b91c428e87
(cherry picked from commit 1d007eafa3)
Issue happens for real-time nonrd pickmode.
Due to speed feature: sf->adaptive_rd_thresh_row_mt,
enabled for speed >= 8, and for speed >= 7 svc only.
Issue occurs where resolution (sb_rows) changes and
row_base_thresh_freq_fact needs to be re-allocated.
Fix is to add sb_rows to TileDataEnc and check for
re-alloc of row_base_thresh_freq_fac.
Bug: b:331108922
Change-Id: I1a1ca94c14f343200c180725e4cb8d91d3c55b83
(cherry picked from commit 3f8f19372b)
In vp9_init_tile_data(), call vp9_row_mt_mem_dealloc(cpi) to free the
row mt memory in cpi->tile_data before freeing cpi->tile_data.
Bug: b:331086799, b:331108729
Change-Id: Idc79984ce7e0110e6858139b2ed286492a2e8622
(cherry picked from commit 34277e53ad)
Before proceeding with Encode(). This avoids some static analysis
warnings about uninitialized `cfg_` members.
Change-Id: Ib67b278d6706ab1034219e8c1ad9ba0c5b574ba8
(cherry picked from commit 108f5128e2)
sem_wait() may be interrupted by a signal and fail with EINTR:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
Retry the sem_wait() call if it fails with EINTR.
This finishes the fix started in
https://chromium-review.googlesource.com/c/webm/libvpx/+/5299569. As a
speculative fix, that CL fixed only the sem_wait(&cpi->h_event_end_lpf)
calls responsible for bug chromium:324459561. ClusterFuzz verified the
fix, so this CL extends it to the other sem_wait() calls.
Note that sem_wait() calls like the following do not need this fix,
because the while (1) loop retries the sem_wait() call if it fails:
while (1) {
if (vpx_atomic_load_acquire(&cpi->b_multi_threaded) == 0) break;
if (sem_wait(&cpi->h_event_start_lpf) == 0) {
...
}
}
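The retry described above can be sketched as a small wrapper. The
function name is hypothetical; only sem_wait() and EINTR come from
POSIX.

```c
#include <errno.h>
#include <semaphore.h>

/* Waits on `sem`, retrying when the call is interrupted by a signal.
 * Returns 0 on success, or -1 with errno set for any other failure. */
static int demo_sem_wait_retry(sem_t *sem) {
  int ret;
  do {
    ret = sem_wait(sem);
  } while (ret == -1 && errno == EINTR);
  return ret;
}
```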
Bug: chromium:324459561
Change-Id: I0f0612616eee37fb3da68049e49b3e86927b5e24
(cherry picked from commit d4959f9825)
Before proceeding with Encode(). This avoids some static analysis
warnings about uninitialized `cfg_` members.
Change-Id: Ib67b278d6706ab1034219e8c1ad9ba0c5b574ba8
In very rare cases (e.g. encoding with very high bit rate), the
allocated token memory isn't enough, which causes a buffer overflow
and then an encoder failure. This is fixed by using the aligned
number of blocks while allocating this buffer.
BUG=b/328803779
Change-Id: I5437cce13398206bf9982d57f35d6f9da17b187f
This is a port of the change in libaom:
https://aomedia-review.googlesource.com/c/aom/+/189761
5ccdc66ab6 cpu.cmake: Do more elaborate test of whether SVE can be compiled
For Windows targets, Clang will successfully compile simpler
SVE functions, but if the function requires backing up and restoring
SVE registers (as part of the AAPCS calling convention), Clang
will fail to generate unwind data for this function, resulting
in an error.
This issue is tracked upstream in Clang in
https://github.com/llvm/llvm-project/issues/80009.
Check whether the compiler can compile such a function, and
disable SVE if it is unable to handle that case.
Change-Id: I8550248abd6a7876bd8ecf6ba66bc70518133566
(cherry picked from commit 35f0262c5e)
This is a port of the change in libaom:
https://aomedia-review.googlesource.com/c/aom/+/189761
5ccdc66ab6 cpu.cmake: Do more elaborate test of whether SVE can be compiled
For Windows targets, Clang will successfully compile simpler
SVE functions, but if the function requires backing up and restoring
SVE registers (as part of the AAPCS calling convention), Clang
will fail to generate unwind data for this function, resulting
in an error.
This issue is tracked upstream in Clang in
https://github.com/llvm/llvm-project/issues/80009.
Check whether the compiler can compile such a function, and
disable SVE if it is unable to handle that case.
Change-Id: I8550248abd6a7876bd8ecf6ba66bc70518133566
This mode is used infrequently and is quite slow. This shifts the tests
to nightly to speed up the presubmit.
Change-Id: I3020887e0ca0150d7cbea9cc726649c11f94d56c
Use the utility functions and set gf_group_size in
ext_rc_define_gf_group_structure()
Avoid using gop_decision->update_type to keep the logic simple
for now.
Also simplify the interface.
Change-Id: I78fd5892e6f9731d50d6e5da97598b46c70a1dde
The vpx_ports/msvc.h header provides snprintf() and round() for MSVC
older than Visual Studio 2015 and Visual Studio 2013, respectively.
Since configure now requires vs14 (Visual Studio 2015) or later, it is
safe to remove vpx_ports/msvc.h.
Change-Id: I2fe4c41eaa126f4cf17639c11895f1e464294c76
Replace %ld with %zu for `size_t`. Added in:
fd28f6f3c Add rate_ctrl_log_path
Fixes:
vp9\encoder\vp9_encoder.c(5748,15): warning C4477: 'fprintf' : format
string '%ld' requires an argument of type 'long', but variadic
argument 2 has type 'size_t'
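A minimal example of the corrected format specifier, using a
hypothetical helper: %zu is the C99 conversion for size_t, whereas %ld
expects a long, which is 32 bits on 64-bit Windows.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a size_t value into `out`. Returns the number of characters
 * that snprintf() reports (excluding the terminating null). */
static int demo_format_size(char *out, size_t out_size, size_t value) {
  return snprintf(out, out_size, "%zu", value);
}
```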
Change-Id: I36fa9c7a9e14d4a2d9ef51a7f5c55de71bb34518
If img_data is not NULL, img_alloc_helper ignores buf_align, so
vpx_img_wrap can set buf_align to any placeholder value.
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/90362.
Bug: webm:1850
Change-Id: I42bc45aecf822a9314caf23058fe123d0574dc20
Port the changes to aom/src/aom_image.c in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/56643. The changes
related to `border` are not ported.
Bug: webm:1850
Change-Id: Ie81fffe0c84e912da880ffca245ae27cd71cf348
I introduced this bug in commit 2e32276:
https://chromium-review.googlesource.com/c/webm/libvpx/+/5446333
I changed the line
stride_in_bytes = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
to three lines:
s = (fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? s * 2 : s;
if (s > INT_MAX) goto fail;
stride_in_bytes = (int)s;
But I didn't realize that `s` is used later in the calculation of
alloc_size.
As a quick fix, undo the effect of s * 2 for high bit depths after `s`
has been assigned to stride_in_bytes.
Bug: chromium:332382766
Change-Id: I53fbf405555645ab1d7254d31aadabe4f426be8c
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188962.
stride_align is documented to be the "alignment, in bytes, of each row
in the image (stride)."
Change-Id: I2184b50dc3607611f47719319fa5adb3adcef2fd
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188823.
Impose maximum values on the input parameters so that we can perform
arithmetic operations without worrying about overflows.
Also change the VpxImageTest.VpxImgAllocHugeWidth test to write to the
first and last samples in the first row of the Y plane, so that the test
will crash if there is unsigned integer overflow in the calculation of
stride_in_bytes.
Bug: chromium:332382766
Change-Id: I54cec6c9e26377abaa8a991042ba277ff70afdf3
A port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/188761.
Fix unsigned integer overflows in the calculation of stride_in_bytes in
img_alloc_helper() when d_w is huge.
Change the type of stride_in_bytes from unsigned int to int because it
will be assigned to img->stride[VPX_PLANE_Y], which is of the int type.
Test:
. ../libvpx/tools/set_analyzer_env.sh integer
../libvpx/configure --enable-debug --disable-optimizations
make -j
./test_libvpx --gtest_filter=VpxImageTest.VpxImgAllocHugeWidth
Bug: chromium:332382766
Change-Id: I3b39d78f61c7255e10cbf72ba2f4975425a05a82
The MAX_NUM_THREADS macro is unrelated to the VPxWorkerInterface, so it
doesn't need to be defined in vpx_util/vpx_thread.h.
The VP8 code doesn't seem to depend on MAX_NUM_THREADS, so VP8 can use
64 directly in the range check of its g_threads option. Move the
definition of the MAX_NUM_THREADS macro to vp9/encoder/vp9_ethread.h and
use it in VP9 code only.
Change-Id: Ibf788ca2496c743a2ac0498fefaab8a3c181228d
The `error: use of undeclared identifier 'EBUSY'` in
vpx_util/vpx_pthread.h was found in Mozilla's bug 1886318 [1]. This
patch addresses the issue by adding the `<errno.h>` header to introduce
the `EBUSY` identifier, resolving the problem.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1886318#c1
Change-Id: Ic417dafebf5ab160060dd29f692fa9c40d8db05a
The Google C++ style guide dictates that you should "include what you
use" with respect to symbols. This CL adds vpx_config.h includes to unit
tests that rely on config flags but previously got the header only
indirectly.
Change-Id: Ia70a512cebe6c104d2d64afbed3cde8a405c68df
This CL will help run libvpx tests under Chromium against its partition
allocator. The allocator does not support single allocations above
3.998GiB. Because of this, tests related to large video sizes that
Chromium is configured for are expected to fail.
Chromium also only supports the CONFIG_REALTIME_ONLY option, so
some changes are scoped behind this flag.
Change-Id: I80e8743c0619ce502688109ce0be01cb252d5f92
ctx->pending_cx_data is a pointer. It looks nicer to compare
ctx->pending_cx_data with NULL than with 0.
Change-Id: I18815907b3d75551abfc603cb3c5c0297dceed23
cpi_->cyclic_refresh is nullptr if aq_mode is 0, in other words, the
rate controller runs in non adaptive quantization mode. This CL fixes
the crash in GetSegmentationData() in non aq mode.
Bug: b/259487065
Test: video encoding on ChromeOS
Change-Id: I503b30d15c697c8dd1da203b3c7361b91c428e87
VPX_CODEC_CORRUPT_FRAME is a decoder error. It is strange for
vpx_codec_encode() to fail with this error. In set_frame_size(), change
VPX_CODEC_CORRUPT_FRAME to VPX_CODEC_ERROR.
The use of VPX_CODEC_CORRUPT_FRAME was originally added in
commit 1ed56a46b3.
Change-Id: Iee92ed4cfca5061289b278ece2ba475cf98fec06
The current SVE2 approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution for
filter sizes smaller than or equal to 4 to avoid storing to and
re-loading from the intermediate buffer.
This approach is not beneficial when applying an 8-tap filter in the
convolution.
Change-Id: Ie090eb79f1cbf182300d9343ae63069396ef3956
These invalid value definitions are necessary to initialize
the gop decision in external RC so libvpx can tell which fields are
populated and which are not.
Bug: b/329483680
Change-Id: I06bbb41fa59d0fb95296aebd0d05a703ec953b81
Coverity somehow thinks the return value of read_tx_mode() is between 0
and 7 (inclusive).
Hopefully this will fix Coverity CID 1584457: Out-of-bounds access in
read_coef_probs().
Change-Id: I49fbddf6fd6861bc9def9dfa91eaaaa4aefe5710
This array will be partially configured and used in later rate
distortion optimization search.
BUG=webm:1846
Change-Id: I83daba341c56767187031edb1c10d4528a4257a3
Add the `size` and `error` members to the vpx_write_bit_buffer struct.
Add the vpx_wb_init() and vpx_wb_has_error() functions.
Instances of the vpx_write_bit_buffer struct are only allocated in the
vp9_pack_bitstream() function. So vp9_pack_bitstream() is the only
function outside vpx_dsp/bitwriter_buffer.* that needs updating.
This CL completes the work of adding output buffer bounds checks to
vp9/encoder/vp9_bitstream.c.
Bug: webm:1844
Change-Id: I6b362be572852ee51d96023b35bfb334faada7e1
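The bounds-check idea in the vpx_write_bit_buffer change above can be
sketched as follows. This is a minimal illustration, not the libvpx code:
the struct and the example_wb_*() functions are hypothetical, and only the
existence of `size` and `error` members comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer; NOT the libvpx vpx_write_bit_buffer struct. */
struct example_wb {
  uint8_t *buf;      /* output buffer */
  size_t size;       /* capacity of buf in bytes */
  size_t bit_offset; /* position of the next bit to write */
  int error;         /* sticky flag, set once a write would overflow */
};

static void example_wb_init(struct example_wb *wb, uint8_t *buf, size_t size) {
  assert(buf != NULL || size == 0);
  wb->buf = buf;
  wb->size = size;
  wb->bit_offset = 0;
  wb->error = 0;
}

static int example_wb_has_error(const struct example_wb *wb) {
  return wb->error;
}

/* Write one bit, most significant bit first. A write past the end of the
 * buffer is dropped and recorded in the error flag instead. */
static void example_wb_write_bit(struct example_wb *wb, int bit) {
  const size_t byte = wb->bit_offset / 8;
  const int shift = 7 - (int)(wb->bit_offset % 8);
  if (wb->error) return;
  if (byte >= wb->size) {
    wb->error = 1;
    return;
  }
  if (shift == 7) wb->buf[byte] = 0; /* first bit of a fresh byte */
  wb->buf[byte] |= (uint8_t)((bit & 1) << shift);
  wb->bit_offset++;
}
```

With a sticky error flag the caller can write a whole header and check
once at the end, rather than testing every individual write.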
The issue happens in the real-time nonrd pickmode path.
It is due to the speed feature sf->adaptive_rd_thresh_row_mt,
which is enabled for speed >= 8, and for speed >= 7 with SVC only.
The issue occurs when the resolution (sb_rows) changes and
row_base_thresh_freq_fact needs to be re-allocated.
The fix is to add sb_rows to TileDataEnc and check whether
row_base_thresh_freq_fact must be re-allocated.
Bug: b:331108922
Change-Id: I1a1ca94c14f343200c180725e4cb8d91d3c55b83
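A rough sketch of the "remember the row count and re-allocate on change"
fix described above. The struct, the field names and
example_ensure_rows() are hypothetical stand-ins, and the real array
layout in TileDataEnc may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-tile data; NOT the libvpx TileDataEnc struct. */
struct example_tile_data {
  int *row_thresh; /* one entry per superblock row (assumed layout) */
  int sb_rows;     /* row count that row_thresh was allocated for */
};

/* Make sure row_thresh matches sb_rows. Returns 0 on success, -1 if the
 * allocation fails. An unchanged row count keeps the existing buffer. */
static int example_ensure_rows(struct example_tile_data *td, int sb_rows) {
  assert(sb_rows > 0);
  if (td->row_thresh != NULL && td->sb_rows == sb_rows) return 0;
  free(td->row_thresh);
  td->row_thresh = (int *)calloc((size_t)sb_rows, sizeof(*td->row_thresh));
  if (td->row_thresh == NULL) {
    td->sb_rows = 0;
    return -1;
  }
  td->sb_rows = sb_rows;
  return 0;
}
```

Storing the allocated row count next to the pointer is what makes the
stale-size case detectable after a resolution change.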
In the vpx_writer struct, change the buffer_end field to the size field.
Change vpx_stop_encode() to return true on success, false on failure
(output buffer full).
In write_compressed_header(), remove the assertion
assert(header_bc.pos <= 0xffff). The caller (vp9_pack_bitstream()) will
check that condition.
In vp9_pack_bitstream(), the variable "first_part_size" is renamed
"compressed_hdr_size".
Bug: webm:1844
Change-Id: I4ed6ab905a707ad44d875e53036d5a42523a65d0
In vp9_init_tile_data(), call vp9_row_mt_mem_dealloc(cpi) to free the
row mt memory in cpi->tile_data before freeing cpi->tile_data.
Bug: b:331086799, b:331108729
Change-Id: Idc79984ce7e0110e6858139b2ed286492a2e8622
The code was using the bitstream_worker_data when it
wasn't allocated with a big enough size. This is because
the existing condition was to only re-alloc the
bitstream_worker_data when the current dest_size was larger
than the current frame_size. But under a resolution change
where frame_size is increased beyond the current dest_size,
we need to allow re-alloc to the new size.
The existing condition to re-alloc when dest_size is
larger than frame_size (which is not required) is kept
for now.
Also increase the dest_size to account for image format.
Added tests, for both ROW_MT=0 and 1, that reproduce
the failures in the bugs below.
Note: this issue only affects the REALTIME encoding path.
Bug: b/329088759, b/329674887, b/329179808
Change-Id: Icd65dbc5317120304d803f648d4bd9405710db6f
(cherry picked from commit c29e637283)
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd SVE horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional SVE2 path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
Change-Id: I2f5d39ad737dbd7eccb08dd2b51586c6710119b8
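The row arithmetic behind the h + 7 change above, as a small sketch. The
helper names are hypothetical, and splitting the work into groups of 4
plus a tail is an assumption about how an exact h + 7 path could be
structured, not a description of the SVE2 code.

```c
#include <assert.h>

/* Rows the horizontal pass must produce for an 8-tap vertical pass:
 * 3 rows above the block, the block itself, and 4 rows below. */
static int rows_needed(int h) {
  assert(h > 0);
  return h + 7;
}

/* Rows actually processed when the kernel only works in groups of 4. */
static int rows_processed_in_fours(int h) {
  return (rows_needed(h) + 3) / 4 * 4;
}

/* Split the exact row count into full groups of 4 plus a tail of 0-3. */
static void split_rows(int h, int *groups_of_4, int *tail_rows) {
  *groups_of_4 = rows_needed(h) / 4;
  *tail_rows = rows_needed(h) % 4;
}
```

For a block height that is a multiple of 4, the groups-of-4 approach
processes h + 8 rows, while the exact split leaves a 3-row tail.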
If a local variable "pc" is defined as &cpi->common, replace
"cpi->common." with "pc->".
Also replace a memcpy() call with a struct assignment.
Change-Id: I6f4f12e69d9989beaa6e04c83d93230e7d726278
Declare the dest_size member of the VP9BitstreamWorkerData struct as
size_t instead of int.
Fix the following MSVC warning:
vp9\encoder\vp9_bitstream.c(1031,37): warning C4267: '=':
conversion from 'size_t' to 'int', possible loss of data
Change-Id: Idab5ad5d4bf4d1e4754f011a3073c9a89da29f55
The buffer_end field will allow bounds checking when vpx_writer writes
to the output buffer. This CL sets up the plumbing to pass the output
buffer size from vp9_pack_bitstream() to vpx_start_encode(), which
initializes the vpx_writer struct. vpx_writer doesn't use the output
buffer size in bounds checks yet, but the code in vp9_bitstream.c does.
Bug: webm:1844
Change-Id: I995e469ab453c02d740f54b46e0b08c7f2eb1a2e
This was added in libaom in:
5ddac0aac8 RTCD defs: Remove empty specialize statements once and for all.
https://aomedia-review.googlesource.com/c/aom/+/9062
Change-Id: I9c8fb0c8e4bd4dc9373d8533ab083dff816e7cbe
Set up the plumbing to pass the size of the output buffer `dest` to
vp9_pack_bitstream(). The output buffer is the cx_data buffer in the
encoder_encode() function in vp9/vp9_cx_iface.c, and its size is
cx_data_sz.
In this CL vp9_pack_bitstream() ignores the `dest_size` parameter.
Bug: webm:1844
Change-Id: I53c80280143d409cf16f87c4d6deec3d9338aea3
Avoid calling encode_tiles_buffer_alloc_size() twice by saving its
return value in a local variable.
Change-Id: I3050f9cf7c3520f7edc80abf66620ba233fadad8
SVE and SVE2 code paths in libvpx require intrinsics from
arm_neon_sve_bridge.h. SVE is disabled if the compiler does not
support this header. This patch conditionally disables SVE2 in the
same way.
Also gate the check for arm_neon_sve_bridge.h on whether SVE is
enabled in the first place. The check isn't necessary if the user has
explicitly disabled SVE. (Explicitly disabling SVE already disables
SVE2 since the former is a pre-requisite for the latter.)
Change-Id: Ibb21f09e8b2470d1ce5d98b71b101f5b7f7dbcdc
In encoder_encode(), remove the return statement after a
vpx_internal_error() call because setjmp() has been called at that
point.
Change-Id: Ib8ebbfbacb21097ce7f1b4e3bf53004bbe88a42b
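Why a return statement after the error call is dead code once setjmp()
has been armed, as a simplified sketch. The example_error struct and both
functions are hypothetical and only imitate the general pattern; they are
not the libvpx vpx_internal_error() implementation.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical error context. */
struct example_error {
  jmp_buf jmp;
  int setjmp_armed; /* nonzero once setjmp() has been called */
  int code;
};

/* Record the error. If a jump target is armed, control transfers to the
 * setjmp() site and this function never returns to its caller. */
static void example_internal_error(struct example_error *err, int code) {
  err->code = code;
  if (err->setjmp_armed) longjmp(err->jmp, 1);
}

static int example_encode(struct example_error *err, int bad_input) {
  assert(err != NULL);
  err->code = 0;
  if (setjmp(err->jmp)) {
    /* Reached via longjmp(): report the recorded error. */
    err->setjmp_armed = 0;
    return err->code;
  }
  err->setjmp_armed = 1;
  if (bad_input) {
    example_internal_error(err, 42);
    /* Not reached: a return statement here would never execute. */
  }
  err->setjmp_armed = 0;
  return 0;
}
```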
in struct VP8RateControlRtcConfig and struct VP9RateControlRtcConfig;
structs default to public access.
Change-Id: Icdc5b44fb4c7297b0cb3c6cde8bec33ea5cee18c
vp8/vp8_ratectrl_rtc.h should come first as it's implemented in this
module. Split the rest of the groups on C/C++/vpx bounds.
Change-Id: If6bbbd8f3adf3766fa36fbc53ae06c9f6f76ebe9
Add SVE2 implementation of vpx_highbd_convolve8_avg_vert function.
Add the corresponding tests as well.
Change-Id: I20ca19e09a1686bb00c0b51bf756ddab0adbc2c0
Add SVE implementation of vpx_highbd_convolve8_avg_horiz function.
Add the corresponding tests as well.
Change-Id: If13793fa653834dfdfeddfee60b80129eea85dd7
Add SVE2 implementation of vpx_highbd_convolve8_vert function. Add
the corresponding tests as well.
Change-Id: I289ac79d4493935217feaa4fd2fa0b8ef9a62972
Add 'sve2' arch options to the configure, build and unit test files -
adding appropriate conditional options where necessary. Arm SIMD
extensions are treated as supersets in libvpx, so disable SVE2 if
SVE is unavailable.
Change-Id: Icdec2aace357e36fba77c77cd8b70da1e5427fce
This was deprecated in 1.9.5 [1]. It is now enabled by default. For
earlier versions of doxygen this will set the value to false, but I
don't believe we were relying on this functionality.
[1]: https://www.doxygen.nl/manual/changelog.html#log_1_9_5
Change-Id: I75f576d35ca86636761cf70fda0dd0ad37f71d71
The sem_* macros do not behave exactly like the POSIX sem_* functions.
Add the vp8_ prefix to the sem_* macro names to make it clear that they
are not the POSIX sem_* functions. Another reason for adding the vp8_
prefix is that we need to wrap sem_wait() (to handle EINTR) on the Unix
platforms that have real sem_wait() function.
Handle EINTR in the Unix (non-Apple) definition of vp8_sem_wait().
Change-Id: I3df02a30f851d41691a55cf7a84aa2ff054bba9c
Based on a clang-tidy warning:
`no header providing "sem_wait" is directly included`
Though this may not clear it entirely, it's the closest that can be
done given the platform-dependent includes and implementation in
vp8/common/threading.h
Change-Id: I19984f820f3f380e58deef40563a2f0c66187748
set --target to the more modern aarch64-android-gcc and remove an
incorrect comment regarding realtime-only.
Change-Id: I5f6c9de9fcd96a60817e37fc6f6505725ddea6b9
When dot-product and SVE support are disabled the hwcap variable is
currently unused. Fix this by wrapping it in an #ifdef matching the
conditions where it is needed.
Change-Id: I1c2e302d861c6c726b314e374f07d4fafe17ffc7
libvpx's check for conditionally defining __builtin_prefetch is broken:
clang-cl already defines __builtin_prefetch on Win ARM64 and, in
addition, supports up to 3 arguments, with the latter 2 being optional.
This causes build breaks when paired with other libraries, like Abseil,
which do perform the conditional test correctly.
The real fix here is to define something like VPX_PREFETCH rather than
trying to #define an implementation-reserved name, which is undefined
behavior.
Bug: 328105513
Change-Id: Ibe14d9ce34306654bd20e560973f76c3b40036ee
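One possible shape for the VPX_PREFETCH idea suggested above, as a sketch
only. The macro body and the EXAMPLE_HAVE_BUILTIN_PREFETCH helper are
assumptions; the message names VPX_PREFETCH as a suggestion and does not
define it.

```c
#include <assert.h>
#include <stddef.h>

/* Detect the builtin without redefining a reserved identifier. The
 * __has_builtin test is nested so that compilers lacking __has_builtin
 * never have to parse it as a function-like macro. */
#if defined(__has_builtin)
#if __has_builtin(__builtin_prefetch)
#define EXAMPLE_HAVE_BUILTIN_PREFETCH 1
#endif
#elif defined(__GNUC__)
#define EXAMPLE_HAVE_BUILTIN_PREFETCH 1
#endif

#if defined(EXAMPLE_HAVE_BUILTIN_PREFETCH)
#define VPX_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define VPX_PREFETCH(addr) ((void)(addr)) /* no-op fallback */
#endif

/* Sum an array while hinting the next element to the cache. */
static long sum_with_prefetch(const int *data, size_t n) {
  long sum = 0;
  size_t i;
  assert(data != NULL || n == 0);
  for (i = 0; i < n; ++i) {
    if (i + 1 < n) VPX_PREFETCH(&data[i + 1]);
    sum += data[i];
  }
  return sum;
}
```

A project-prefixed macro keeps the library out of the implementation's
reserved namespace, so it cannot collide with a compiler that already
provides the builtin.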
Refactor the transpose_concat_*() helper function used in the Arm Neon
DotProd and I8MM vertical convolution implementations to not use TBL
instructions. Using vzip* to achieve the same outcome (with the same
number of instructions) avoids needing/loading the lookup indices and
also increases performance on little (in-order) Arm Cortex cores.
Change-Id: Iff62a44f8a9bf0ee239d5bb36be8424cab0dbca5
sem_wait() may be interrupted by a signal and fail with EINTR:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
Retry the sem_wait() call if it fails with EINTR.
This finishes the fix started in
https://chromium-review.googlesource.com/c/webm/libvpx/+/5299569. As a
speculative fix, that CL fixed only the sem_wait(&cpi->h_event_end_lpf)
calls responsible for bug chromium:324459561. ClusterFuzz verified the
fix, so this CL extends it to the other sem_wait() calls.
Note that sem_wait() calls like the following do not need this fix,
because the while (1) loop retries the sem_wait() call if it fails:
  while (1) {
    if (vpx_atomic_load_acquire(&cpi->b_multi_threaded) == 0) break;
    if (sem_wait(&cpi->h_event_start_lpf) == 0) {
      ...
    }
  }
Bug: chromium:324459561
Change-Id: I0f0612616eee37fb3da68049e49b3e86927b5e24
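The EINTR retry described above can be sketched with plain POSIX calls.
example_sem_wait() is a hypothetical name; libvpx implements the retry in
its own wrapper, whose exact form is not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Wait on a semaphore, retrying only when a signal interrupted the wait.
 * Any other failure is returned to the caller with errno left intact. */
static int example_sem_wait(sem_t *sem) {
  int ret;
  assert(sem != NULL);
  do {
    ret = sem_wait(sem);
  } while (ret != 0 && errno == EINTR);
  return ret;
}
```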
We already have some logic in the configure.sh file to selectively
disable code dependent on particular architecture extensions, however we
do not yet have anything to check that the compiler being supplied
recognises and can compile code using these extensions.
This commit adds compiler "-march=..." flag tests to the existing
extension-disable loop so that we now correctly disable extensions that
are not supported by the compiler. For AArch64 this loop also needs to
move below the existing compiler/OS handling to ensure that prefixes
like $CROSS are handled correctly before running compiler tests.
Bug: webm:1841
Change-Id: I936b911c4b0ebf03abc34b7532b2bb4568129f57
(cherry picked from commit fa50b26848)
Disable SVE feature if arm_neon_sve_bridge header is not supported
by the compiler.
Change-Id: I3f78be2dd95b37b8d51b9f1fceca1f9701535eca
(cherry picked from commit 6ea3b51ec2)
Added a unit test that triggers the data race in the
bug below when only C code is forced.
The data race is between the loopfilter and the variance
computation in generate_psnr_packet().
The proposed fix is to move the wait for the loopfilter thread
to finish so that it happens before entering generate_psnr_packet().
Bug: b/266833179.
Change-Id: Id2871c53274be0f404e65601c9a5c98aaead0c72
(cherry picked from commit 756b29a776)
Add SVE implementation for vpx_highbd_convolve8_horiz that specialises
for 4-tap filters. This way we avoid a lot of redundant work to
multiply and add zero, given that some of the 8-tap filters are
zero-padded, so they are effectively 4-tap filters.
Change-Id: Ib5e0377f924df1d893e9436f443fcbe7d196ea27
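A scalar sketch of the 4-tap specialisation described above: when the
outer four taps of an 8-tap filter are zero, reading only the middle four
gives the same result. The function names are hypothetical and the
rounding (add 64, shift by 7) assumes taps that sum to 128.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Reference: full 8-tap filter; src points at the pixel under tap 3. */
static uint8_t convolve8_px(const uint8_t *src, const int16_t *f) {
  int sum = 0, k;
  for (k = 0; k < 8; ++k) sum += src[k - 3] * f[k];
  return clip_u8((sum + 64) >> 7);
}

/* True when the 8-tap filter is a zero-padded 4-tap filter. */
static int is_4tap(const int16_t *f) {
  return f[0] == 0 && f[1] == 0 && f[6] == 0 && f[7] == 0;
}

/* Specialised path: only taps 2..5 are read and multiplied. */
static uint8_t convolve4_px(const uint8_t *src, const int16_t *f) {
  int sum = 0, k;
  assert(is_4tap(f));
  for (k = 2; k < 6; ++k) sum += src[k - 3] * f[k];
  return clip_u8((sum + 64) >> 7);
}
```

The specialised path does half the multiplies and touches four fewer
input pixels per output, which is where the saving comes from.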
Rename dot_neon_sve_bridge.h to vpx_neon_sve_bridge.h in order to
reflect that other instructions can be implemented in the header
file. In a subsequent patch, the usage of vtbl with Neon-SVE bridge
intrinsics will be added.
Change-Id: I8f71aad2b7fb4932c9554badf041a80aca58c7cf
Remove the 4-tap Neon DotProd path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: I1861dc88a67a759c3e8deb0b471ec447a62063f2
The current SBD Neon DotProd approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 10-25% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic6008836d1a499ee2cd957b9db194fca5671ccb4
Remove the 4-tap Neon i8mm path for the horizontal pass of 2D
convolution since it has been made redundant by the horizontal-
vertical merged implementation. Also move the 8-tap path closer to
where it is used and call it explicitly rather than the filter-
agnostic wrapper.
Change-Id: Icddecb7e133656c54aa5e79536b49759715b6fcb
The current SBD Neon i8mm approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for 4-tap standard bitdepth 2D
convolution to avoid storing to and re-loading from the intermediate
buffer - giving a 5-40% speedup depending on block size. Merging the
passes for 8-tap filters does not have the same benefit, so keep the
existing implementation.
Change-Id: Ic8ec2822681176ef879dcaf8424d8d91c5e8d2df
With either CONFIG_VP8=0 or CONFIG_VP9=0. Fixes a warning about an extra
';' outside of a function due to VP[89]_INSTANTIATE_TEST_SUITE() being
defined to nothing.
Change-Id: I1878d7596e39c5166efbe96450a733efc08665ea
inter/intra_cost in VP9 TPL is calculated with SATD,
which should be close enough to be used as inter/intra_pred_err.
Bug: b/326262148
Change-Id: Ic0fd08708fcf3640398fc22a1a6bb6f449b2a9b8
Anonymous unions are not supported in C99, they were added in C11:
https://en.cppreference.com/w/c/language/union
Fixes a -Wpedantic warning:
vp9/encoder/vp9_context_tree.h:93:4: warning: ISO C99 doesn’t support
unnamed structs/unions [-Wpedantic]
Change-Id: Ibd29d6deca35d81ea886e80e9f44575c73ecd96d
Fixes a -Wpedantic warning:
vp9/encoder/vp9_rdopt.c:1988:20: warning: invalid use of pointers to
arrays with different qualifiers in ISO C before C2X [-Wpedantic]
Change-Id: I581e21d7e59c0bae0e44056a3b3f049c5a4e7cf2
Add SVE implementation of vpx_highbd_convolve8_horiz function. Add
the corresponding tests as well.
Change-Id: I0b2815831daf203e167ea5289307087ce53ff9da
The new Armv8.0 Neon implementation of 4-tap vertical convolution is
faster than Armv8.4 DotProd and Armv8.6 I8MM implementations. This
patch removes the DotProd and I8MM implementations in favour of using
the Armv8.0 version everywhere.
Change-Id: I126470fd4862d8bb116153e90bb2e4f2f2dba1e4
Refactor Armv8.0 Neon 4-tap convolution functions to operate on 8-bit
types directly, rather than first widening to 16-bit.
2-tap (bilinear) filter values are always positive, but 4-tap filter
values are negative on the outer edges (taps 0 and 3), with taps 1
and 2 having much greater positive values to compensate. To use
instructions that operate on 8-bit types we also need the types to be
unsigned. In the convolution kernel, subtracting the products of taps
0 and 3 from the products of taps 1 and 2 always works since 2-tap
filters are 0-padded.
Co-authored by: Hari Limaye <hari.limaye@arm.com>
Change-Id: I87b32e2ef8cbd21eebb8cd2642e8826b704905b1
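The unsigned-arithmetic trick described above, sketched in scalar C. Both
functions are hypothetical; the sketch assumes the signed filter has
taps 0 and 3 less than or equal to zero and taps 1 and 2 greater than or
equal to zero, so the absolute tap values fit in unsigned 8-bit types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-tap kernel on unsigned 8-bit inputs. abs_f holds the absolute values
 * of the taps; the outer products are subtracted instead of added. */
static uint8_t convolve4_unsigned(const uint8_t *s, const uint8_t *abs_f) {
  int pos, neg, sum;
  assert(s != NULL && abs_f != NULL);
  pos = s[1] * abs_f[1] + s[2] * abs_f[2];
  neg = s[0] * abs_f[0] + s[3] * abs_f[3];
  sum = pos - neg + 64; /* rounding term for the shift by 7 */
  if (sum < 0) sum = 0;
  sum >>= 7;
  return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Signed reference for comparison. */
static uint8_t convolve4_signed(const uint8_t *s, const int16_t *f) {
  int sum = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3] + 64;
  if (sum < 0) sum = 0;
  sum >>= 7;
  return (uint8_t)(sum > 255 ? 255 : sum);
}
```

A zero-padded bilinear filter has outer taps equal to zero, so
subtracting their products is a no-op and the same kernel serves both
cases.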
The THREADFN and THREAD_EXIT_SUCCESS macros are used to define the
thread start routines passed to our implementation of pthread_create(),
so it makes sense to define these macros in vpx_util/vpx_pthread.h. This
also allows the VP8 and VP9 code to share the macro definitions.
Replace the THREAD_FUNCTION macro by THREADFN. They have the same
definition.
Change-Id: I79a7476e43652667af6a8da7ad7ce346b1b6b024
This helps prevent name clashes if code e.g. #includes headers from both
libvpx and libaom.
Bug: none
Change-Id: Ifc9e7ac4862dc04a399e7777d2636e1453627970
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I3fd38560bb14d85826e417f40d35f11165ab80da
Currently we use two rounds of complex right-shift operations to
narrow and pack results from the dot-product convolution kernels.
This patch refactors these sequences to use one "simple" right-shift
and one complex right-shift - reducing the latency by 4 cycles on
modern out-of-order Arm CPUs.
Change-Id: I908147ed65a87157009363782399ff398406cdf9
- Initialize gop_decision
- Initialize GF group for a new one
- GF group index for key frame special treatment is not needed any more
when key frame is decided by the RC
Bug: b/323050877
Change-Id: Iaf36ea4f671b833f3ba4c524b9799a3093412dfa
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically, average with the dst block and store the final
output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I84f1cafcfbbfa48b2cfe4b20881da9c4bc3b56ac
The current Neon approach to 2D convolution is:
1) Filter horizontally, storing to an intermediate buffer.
2) Filter vertically and store the final output.
This patch merges the two phases for high bitdepth 2D convolution to
avoid the storing and re-loading from the intermediate buffer. This
provides a small gain (<5%) for large block sizes but the benefit
increases for small block sizes - as the proportion of compute to
memory access decreases. These effects are amplified further when
considering little (in-order) core performance.
Change-Id: I8ec13fb9edd642fdb927bf5394a3c2a349d22a29
Add a highbd Neon implementation of the horizontal portion of 2D
convolution specialised for executing with 4-tap filters. This new
path is also used when executing with bilinear (2-tap) filters.
Change-Id: I513e35c4f8857bc89e0def5e9402bc31ddd46440
Add a highbd Neon implementation of vertical convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: I30469c7b8e6ccff31d96588a3e4c21b401f1ed09
Add a highbd Neon implementation of horizontal convolution specialised
for executing with 4-tap filters. This new path is also used when
executing with bilinear (2-tap) filters.
Change-Id: Icabeea295af3e0bbeda755168996668cb960b0de
Filter tap reporting was made more granular recently[1] to enable Arm
Neon optimizations that specialise convolution implementations
according to the filter size. This patch removes an assert that
should have been removed during that change - it no longer serves any
purpose to assert that the filter being used is a no-op filter.
This change is a pre-requisite for some highbd Neon convolution
changes that specialise implementations according to filter size.
(Without this change a convolve-copy test would fail should we
interrogate the size of the filter.)
[1] https://chromium-review.googlesource.com/c/webm/libvpx/+/5063929
Change-Id: I2a71680d27134535e6c0663b1668ba1b150b1a6f
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all highbd Neon horizontal convolution algorithms process
4 rows at a time, but this means we end up doing at least 1 row too
much work in the 2D first pass case where we need h + 7, not h + 8
rows of output.
This patch adds an additional Neon path that processes h + 7 rows of
data exactly, saving the work of the unnecessary extra row.
Change-Id: Id6658b4e9e774effc760ff131e188b6907a57676
Call the scalar C implementation of 2D convolution immediately if
scaling is required, instead of entering the Neon functions for the
horizontal and vertical passes and then falling back to the scalar
implementation. This has the benefit of being able to allocate a
smaller intermediate buffer.
Change-Id: Icacdd5f3a1401395951b613da1cd6932955bd0f8
There's no reason for these files to be separate, and merging them
will make life easier in subsequent commits adding a horizontal pass
specialised for the first pass of 2D.
Also perform some refactoring for 2D convolution definitions:
- Add a comment deriving the intermediate buffer height.
- Align the intermediate buffers to 32 bytes.
Change-Id: Ib92524396e6f9c58295339de54d08d894ace3bd1
Mostly a cosmetic change:
1) Remove forward declarations.
2) Remove excessive prefetches - some of which were wrong, prefetching
data that had just been loaded.
Change-Id: I17d8accc2abf3a9b2050603f859fce588a1f7178
CONFIG_PROFILE is unused currently. The option can still be selected
because it is in the CMDLINE_SELECT list and interpreted by configure
directly.
Bug: webm:1835
Change-Id: Id9667289113335a10018803f578b255967bd60b1
Move narrowing shift and max value clipping into the 4-pixel-output
kernel. As well as cleaning up the code quite a bit, this also
improves performance by 5-10% as it eliminates the implied top /
bottom register shuffling of the previous approach.
Also clean up the formatting and magic numbers in the 8-pixel-output
kernel.
Change-Id: I77a5e9e317ef4097f187330d4b32973022ba573f
In https://chromium-review.googlesource.com/c/webm/libvpx/+/71356, the
statement
clamp(q, active_best_quality, active_worst_quality);
was added to rc_pick_q_and_bounds_two_pass() (recently renamed
vp9_rc_pick_q_and_bounds_two_pass()).
The result of the clamp() call is not used, so the call has no
effect.
Fix Coverity CID 1577645 Useless call:
side_effect_free: Calling
clamp(q, active_best_quality, active_worst_quality) is only useful for
its return value, which is ignored.
Change-Id: I014c3e4caf2bc999fe480000acc4e49e7ad15aaf
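A small illustration of the "useless call" defect above. The clamp()
below is a generic stand-in with the usual semantics, and the two pick_q
functions are hypothetical. Whether the patch removes the call or assigns
its result is not stated in this message.

```c
#include <assert.h>

/* Pure function: returns the clamped value, never modifies its inputs. */
static int clamp(int value, int low, int high) {
  assert(low <= high);
  return value < low ? low : (value > high ? high : value);
}

/* The defect: the return value is discarded, so q is unchanged. */
static int pick_q_discarding_result(int q, int best, int worst) {
  clamp(q, best, worst);
  return q;
}

/* What a caller that wants clamping would have to write instead. */
static int pick_q_using_result(int q, int best, int worst) {
  q = clamp(q, best, worst);
  return q;
}
```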
Various bits of tidying up to make the code more compact:
- Use appropriate load/store helper functions from mem_neon.h.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: I114cf4d2a34f02acc130558d125d2c191c6c5992
Various bits of tidying up to make the code more compact:
- Use/create appropriate mem_neon.h load/store helper functions.
- Remove variable forward declarations.
- Use != 0 instead of > 0 in loop termination tests.
- Remove excessive prefetches.
Change-Id: Ida7d3c4a3fe084600417f196baa26501c6e2d45a
Initialise result vectors of mem_neon.h helpers with vdup_n_<type>(0)
instead of load-broadcast of the first loaded elements. The former is
more easily optimized by modern compilers.
Change-Id: If967e2bb55523670c3e433dd66d060665e13b4f2
Align the intermediate buffers to 32 bytes and always use a stride of
64, regardless of the actual data block width.
Change-Id: I738eaa711168bc8231d8ac54d9e5e5e87b62e703
Add rdmult to the frame decision as RC can return this information, and
we may want to use it in the future.
Bug: b/323234722
Change-Id: I8ddb7038073d89af1ef84932448b1abaf1937cee
Use uv_crop_(width|height). This fixes an issue with 1 to 2 scaling from
1x1 where the unrounded value would go to zero, resulting in a heap
overflow. This path is only executed when the library is built without
--enable-vp9-highbitdepth.
Bug: b:319964497
Change-Id: I9cb6632f864ec54c045608af86aede20657d6253
(cherry picked from commit 7ad5f4f695)
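The rounding problem in the uv_crop change above can be shown with two
one-line helpers. Both functions are hypothetical, and the rounded form
is an assumption about how a cropped chroma dimension is typically
derived from the luma dimension.

```c
#include <assert.h>

/* Chroma dimension rounded down: a 1-pixel luma plane yields 0. */
static int chroma_dim_truncated(int luma_dim, int subsampling) {
  return luma_dim >> subsampling;
}

/* Chroma dimension rounded up, so every luma pixel has a chroma sample. */
static int chroma_dim_rounded(int luma_dim, int subsampling) {
  assert(luma_dim > 0 && (subsampling == 0 || subsampling == 1));
  return (luma_dim + subsampling) >> subsampling;
}
```

A zero-sized chroma plane is what turns a 1x1 input into an
out-of-bounds access once the scaler starts reading from it.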
Observed when built using Visual Studio 2019.
Move 720P image allocation to the heap.
Bug: webm:1831
Change-Id: I4e343af08d2f282618ad1b328a39d7dba5e79654
(cherry picked from commit 43e1c8bf10)
This can happen in the setting of the frame
target size for delta frames, for non-CBR mode
(end_usage != USAGE_STREAM_FROM_SERVER) and with
temporal layers.
In calc_pframe_target_size(), the percent_high
(factor to adjust the target_size) may end up dividing
bits_off_target by total_byte_count. The total_byte_count
is defined per layer for temporal layers, so it will be zero
for delta frames if the enhancement layer has never been
encoded before.
Since percent_high is capped to over_shoot_pct, the proposed
fix is to apply this cap if total_byte_count is zero.
This CL also fixes a few integer overflow issues in setting
the layer target_bandwidth, the rescale function, and in
setting target_bits_per_mb.
A unit test added by Wan-Teh triggers this issue.
Bug: chromium:1514684
Change-Id: I091158e720ece75d7ab9b7c4d18d30a5783102ab
(cherry picked from commit 43bd567950)
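The guard described above, sketched as a standalone helper.
example_percent_high() is hypothetical and its x100 ratio is
illustrative, not the encoder's actual formula. Only the idea of applying
the over_shoot_pct cap when total_byte_count is zero comes from the
message.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which to raise the frame target, capped at
 * over_shoot_pct. 64-bit intermediates avoid overflow in the product. */
static int example_percent_high(int64_t bits_off_target,
                                int64_t total_byte_count,
                                int over_shoot_pct) {
  int64_t pct;
  assert(over_shoot_pct >= 0);
  if (total_byte_count <= 0) {
    /* Layer never encoded: skip the division and use the cap directly. */
    return over_shoot_pct;
  }
  pct = bits_off_target * 100 / total_byte_count;
  if (pct > over_shoot_pct) pct = over_shoot_pct;
  if (pct < 0) pct = 0;
  return (int)pct;
}
```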
Equivalent to the change to av1_change_config() in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/182413.
Because we call alloc_compressor_data() only if
cm->mi_alloc_size < new_mi_size, this change won't cause
alloc_compressor_data() to be called unnecessarily, unlike the libaom
bug https://crbug.com/aomedia/3526.
Bug: b:317105128
Change-Id: I8a772a1d5c4766846641a6d541a6d861bf76c60f
(cherry picked from commit aef73b22cb)
This change was intended to be cosmetic in that it tweaks some
comments, removes forward declarations and moves some constant
declarations into the kernels where they're used. However, it also
adds some performance for 8-tap vertical convolution paths as it
appears removing forward declarations also removes some false loop-
carried dependencies that the compiler wasn't able to figure out.
Change-Id: Ic58658b10fbe8378062920199819359d2df008de
The updated test will validate the QP / frame type / ARF settings by the
rate controller and callbacks, making sure the callbacks are working as
expected.
Removed the old tests that verify the signals from the encoder, which
are not needed any more.
Change-Id: Ida3c484e2ac520f3e81358d7cbf7918abfdaca54
Disable some tests because they rely on vpx_rc_gop_info_t,
which isn't populated when the callback is used for key frames.
This parameter will be deleted / cleaned up in a follow-up.
Bug: b/323050877
Change-Id: If1c0476eac8d324c8d5a460bfc9afdb6d93aacdf
Simplify the computation of the Armv8.4 DotProd convolution
correction constant. Summing 128 * filter_tap[0,7] is always the same
as 128 * 128 since the filter taps always sum to 128.
Change-Id: I227ba47ae47bed8304a695a2395bcc85f33c245c
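The identity behind the simplification above, checked in scalar C. Both
helpers are hypothetical names. The only property they rely on is the one
the message states: the eight filter taps sum to 128.

```c
#include <assert.h>
#include <stdint.h>

/* Correction computed the long way: 128 times each tap, summed. */
static int32_t correction_by_sum(const int16_t *taps) {
  int32_t sum = 0;
  int k;
  for (k = 0; k < 8; ++k) sum += 128 * taps[k];
  return sum;
}

/* The same value as a constant, valid only when the taps sum to 128. */
static int32_t correction_constant(const int16_t *taps) {
  int tap_sum = 0, k;
  for (k = 0; k < 8; ++k) tap_sum += taps[k];
  assert(tap_sum == 128);
  return 128 * 128;
}
```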
Move the convolution kernels using Armv8.4 dotprod and Armv8.6 i8mm
instructions into the respective .c files. These kernels are only used
in the respective .c files so it isn't useful for them to be declared
in a header.
This change also removes the need for feature-macro guarding - which
wasn't being done correctly for MSVC (since Microsoft's Arm
architecture feature macros are named differently to those defined by
GNU-compliant compilers.)
Bug: webm:1838
Change-Id: I495fca2a982c34978b6c9102f144bb9c45352a9a
Move the Arm Neon dotprod and i8mm 2D convolution functions into the
appropriate vpx_convolve8_neon_[dotprod|i8mm].c file. Only the
Armv7/Armv8.0 Neon files needed to be split in this way to allow
linking against a handwritten assembly implementation of the kernels
for Armv7 builds.
Change-Id: Ifc363556c3961aa78b9e53761537d4816c5b9964
This is one commit after the libwebm-1.0.0.31 tag:
affd7f4 In MakeUID(), call rand() under #ifdef _WIN32
Change-Id: I5979a8cd3b064d4f4f0dbeca9f84f6791e593b47
Call indirect RTCD high bitdepth variance functions (instead of the
Neon functions) in the high bitdepth Neon subpel variance paths so
that faster SVE variance functions can be used on CPUs where SVE is
supported.
Change-Id: I04bdef235afac06f2100df0cbaccfb8caef41ac7
Add SVE implementation of get<w>x<h>var functions for 8-, 10- and
12-bit depth. Add the corresponding tests as well.
Change-Id: Id4feb8726a3eb0a963e3dd8932ee52374a67da48
Add standard and high bitdepth unit tests for vpx_get<w>x<h>var
functions. Enable these unit tests for the C implementation.
Change-Id: I8716fd6a9718dab3eef218a8a60a1efd4c0e316c
Fix Coverity defects CID 1568604 and CID 1568615 (Uninitialized pointer
field). Since the constructors are private and the Create() factory
methods set the cpi_ pointer field, these two Coverity defects are
harmless.
Define the constructors with "= default" instead of "{}".
Change-Id: Ie6b45fce66c23941a9a5c38ee0bccbc4b7d3a2a2
Add SVE implementation of variance functions for 8-, 10- and 12-bit
depth. Add the corresponding tests as well.
Change-Id: I785d85760ad4346cbfbf0f842784b4945870afee
read_yuv_frame() supports VPX_IMG_FMT_NV12. Port its code to
vpx_img_read() and vpx_img_write().
The code in vp9/simple_encode.cc, including img_read(), doesn't support
VPX_IMG_FMT_NV12. Check before the vpx_img_alloc() calls and abort the
process if the image format is VPX_IMG_FMT_NV12.
Bug: chromium:1510090
Change-Id: Ie77e29c2c9ee7a01e6a59c8ad3cbcc769d9f2d4c
If fmt is VPX_IMG_FMT_NONE, currently img_alloc_helper() allocates a
single plane because VPX_IMG_FMT_NONE (0) is not a planar format (the
VPX_IMG_FMT_PLANAR bit is not set in VPX_IMG_FMT_NONE).
Although this seems correct, the problem is that most of the code in
libvpx assumes planar formats and is likely to dereference a null
pointer when it uses img->planes[1]. Also, VPX_IMG_FMT_NONE isn't really
a valid image format. So it is safer to make img_alloc_helper() fail if
fmt is VPX_IMG_FMT_NONE.
Change-Id: I05b47f4b5eceb631a02384b2cce1c2f6fdca8673
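The failure mode described above can be sketched in isolation. This is a minimal illustration, not the libvpx code: sketch_image, sketch_img_alloc(), and the SKETCH_FMT_* constants are hypothetical stand-ins, and the plane layout is an assumption. The only point carried over from the text is that a "none" format has no planar bit, so rejecting it up front keeps callers from ever seeing a NULL planes[1].

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real format constants. */
enum {
  SKETCH_FMT_NONE = 0,
  SKETCH_FMT_PLANAR = 0x100,
  SKETCH_FMT_I420 = SKETCH_FMT_PLANAR | 2
};

typedef struct {
  int fmt;
  unsigned char *planes[3];
} sketch_image;

/* Returns 0 on success, -1 on failure. */
static int sketch_img_alloc(sketch_image *img, int fmt, size_t w, size_t h) {
  size_t y_size, uv_size;
  unsigned char *buf;
  if (img == NULL || w == 0 || h == 0) return -1;
  /* Not a real image format: fail instead of allocating a single plane. */
  if (fmt == SKETCH_FMT_NONE) return -1;
  y_size = w * h;
  /* Only planar formats get chroma planes (4:2:0 layout assumed here). */
  uv_size = (fmt & SKETCH_FMT_PLANAR) ? ((w + 1) / 2) * ((h + 1) / 2) : 0;
  buf = (unsigned char *)calloc(y_size + 2 * uv_size, 1);
  if (buf == NULL) return -1;
  img->fmt = fmt;
  img->planes[0] = buf;
  img->planes[1] = uv_size ? buf + y_size : NULL;
  img->planes[2] = uv_size ? buf + y_size + uv_size : NULL;
  return 0;
}

static void sketch_img_free(sketch_image *img) {
  free(img->planes[0]);
  img->planes[0] = img->planes[1] = img->planes[2] = NULL;
}
```

With this shape, code that reads planes[1] after a successful allocation of a planar format never dereferences a null pointer.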
This often falls out of sync with the release and the version is already
contained in CHANGELOG.
Bug: webm:1833
Change-Id: Ieee6ca40249bf6e77037fbec30d87b109ca8fe21
Release v1.14.0 Venetian Duck
2024-01-18 v1.14.0 "Venetian Duck"
This release drops support for old C compilers, such as Visual Studio 2012
and older, that disallow mixing variable declarations and statements (a C99
feature). It adds support for run-time CPU feature detection for Arm
platforms, as well as support for darwin23 (macOS 14).
- Upgrading:
This release is ABI incompatible with the previous release.
Various new features for rate control library for real-time: SVC parallel
encoding, loopfilter level, support for frame dropping, and screen content.
New callback function send_tpl_gop_stats for vp9 external rate control
library, which can be used to transmit TPL stats for a group of pictures. A
public header vpx_tpl.h is added for the definition of TPL stats used in
this callback.
libwebm is upgraded to libwebm-1.0.0.29-9-g1930e3c.
- Enhancement:
Improvements on Neon optimizations: VoD: 12-35% speed up for bitdepth 8,
68%-151% speed up for high bitdepth.
Improvements on AVX2 and SSE optimizations.
Improvements on LSX optimizations for LoongArch.
42-49% speedup on speed 0 VoD encoding.
Android API level predicates.
- Bug fixes:
Fix to missing prototypes from the rtcd header.
Fix to segfault when total size is enlarged but width is smaller.
Fix to the build for arm64ec using MSVC.
Fix to copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic.
Fix to -Wshadow warnings.
Fix to heap overflow in vpx_get4x4sse_cs_neon.
Fix to buffer overrun in highbd Neon subpel variance filters.
Added bitexact encode test script.
Fix to -Wl,-z,defs with Clang's sanitizers.
Fix to decoder stability after error & continued decoding.
Fix to mismatch of VP9 encode with NEON intrinsics with C only version.
Fix to Arm64 MSVC compile vpx_highbd_fdct4x4_neon.
Fix to fragments count before use.
Fix to a case where target bandwidth is 0 for SVC.
Fix mask in vp9_quantize_avx2,highbd_get_max_lane_eob.
Fix to int overflow in vp9_calc_pframe_target_size_one_pass_cbr.
Fix to integer overflow in vp8,ratectrl.c.
Fix to integer overflow in vp9 svc.
Fix to avg_frame_bandwidth overflow.
Fix to per frame qp for temporal layers.
Fix to unsigned integer overflow in sse computation.
Fix to uninitialized mesh feature for BEST mode.
Fix to overflow in highbd temporal_filter.
Fix to unaligned loads w/w==4 in vpx_convolve_copy_neon.
Skip arm64_neon.h workaround w/VS >= 2019.
Fix to c vs avx mismatch of diamond_search_sad().
Fix to c vs intrinsic mismatch of vpx_hadamard_32x32() function.
Fix to a bug in vpx_hadamard_32x32_neon().
Fix to Clang -Wunreachable-code-aggressive warnings.
Fix to a bug in vpx_highbd_hadamard_32x32_neon().
Fix to -Wunreachable-code in mfqe_partition.
Force mode search on 64x64 if no mode is selected.
Fix to ubsan failure caused by left shift of negative.
Fix to integer overflow in calc_pframe_target_size.
Fix to float-cast-overflow in vp8_change_config().
Fix to a null ptr before use.
Conditionally skip using inter frames in speed features.
Remove invalid reference frames.
Disable intra mode search speed features conditionally.
Set nonrd keyframe under dynamic change of deadline for rtc.
Fix to scaled reference offsets.
Set skip_recode=0 in nonrd_pick_sb_modes.
Fix to an edge case when downsizing to one.
Fix to a bug in frame scaling.
Fix to pred buffer stride.
Fix to a bug in simple motion search.
Update frame size in actual encoding.
Change-Id: I9c27fb2b917f9b80ed4bcc5cb3b4f87c56b62c2f
Add SVE implementation of MSE functions for 10-, 12- bit depth. Add
the corresponding tests as well.
An implementation was not added for 8 bit depth as the Neon DotProd
version is faster than the SVE implementation.
Change-Id: I0c5712ba2735a2879a0aa3a9a52980032fddc7a6
Enable Neon Dotprod 8-bit high bitdepth implementation for MSE
function as it is now not called with bit depth 10 or 12.
Bug: webm:1819
Change-Id: I9d1d506401aa0523fba2d8ea4978dc00fdacbb95
Instead of always calling highbd_get_block_variance_fn with bit depth
8 use the macroblock's bit depth.
Bug: webm:1819
Change-Id: Ib4b19703384e897ee9ffeef73a11a8af2d262558
For svc with no inter-layer prediction: reset
the RC and force max_qp on all spatial layers
on scene/slide changes. In the current code it was only
reset on current spatial layer because it was assumed
we can predict off lower spatial layer to avoid
prediction across scene change. But this does not apply
when inter-layer prediction is off on delta frames.
Also reset only up to current temporal layer.
Because of the hierarchical prediction structure
only the lower temporal layers need the RC to be reset.
This helps to reduce excessive frame drops for the
full_superframe_drop mode.
Change-Id: I76925681850b82aa7fff7f9b1c1a0a605cf3cf3b
for VPX_CODEC_USE_PSNR. This clears a clang-tidy warning. vpx_encoder.h
exports vpx_codec.h so it shouldn't be necessary.
Change-Id: I863b6f8689eeef59cd9eadf3cdc177247a0653f8
This can happen in the setting of the frame
target size for delta frames, for non-CBR mode
(end_usage != USAGE_STREAM_FROM_SERVER) and with
temporal layers.
In calc_pframe_target_size(): the percent_high
(factor to adjust the target_size) may end up dividing
bits_off_target by total_byte_count. The total_byte_count
is defined per layer for temporal layers, so it will be zero
for delta frames if the enhancement layer has never been
encoded before.
Since percent_high is capped to over_shoot_pct, the proposed
fix is to apply this cap if total_byte_count is zero.
Also this CL fixes a few integer overflow issues in setting
the layer target_bandwidth, the rescale function, and in
setting target_bits_per_mb.
A unit test added by Wan-Teh triggers this issue.
Bug: chromium:1514684
Change-Id: I091158e720ece75d7ab9b7c4d18d30a5783102ab
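A minimal sketch of the guard described above, assuming a simplified form of the computation. sketch_percent_high() is a hypothetical helper, and the exact scaling (100 * bits / (bytes * 8)) is an assumption made for illustration. What comes from the text is only the idea: when total_byte_count is zero, skip the division and use the over_shoot_pct cap directly.

```c
#include <stdint.h>

/* Hypothetical helper: percentage by which the target may be raised. */
static int sketch_percent_high(int64_t bits_off_target,
                               int64_t total_byte_count, int over_shoot_pct) {
  int64_t pct;
  /* A layer that has never been encoded has a zero byte count; apply the
   * cap instead of dividing by zero. */
  if (total_byte_count == 0) return over_shoot_pct;
  pct = (100 * bits_off_target) / (total_byte_count * 8);
  if (pct > over_shoot_pct) pct = over_shoot_pct;
  if (pct < 0) pct = 0;
  return (int)pct;
}
```

Because the result is capped at over_shoot_pct in every other case as well, returning the cap for a zero byte count does not widen the range of values the caller can see.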
Add header file containing helper functions to make use of SVE
dot-product intrinsics via the Neon-SVE bridge.
Change-Id: I6cd198f8202559672817cbc19f890db35c03d3ff
GCC already disallows implicit vector type conversions by default.
Add -flax-vector-conversions=none to Clang builds to get the same
behavior.
Change-Id: I9d1adb836377077cf48818c80fe71025e2d2bdc7
Added a unit test that triggers the data race in the
bug below when only C code is forced.
The data race is between the loopfilter and variance
computation from generate_psnr_packet calculation.
Proposed fix is to move the wait for loopfilter thread to
finish up before entering generate_psnr_packet().
Bug: b/266833179.
Change-Id: Id2871c53274be0f404e65601c9a5c98aaead0c72
Equivalent to the change to av1_change_config() in the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/182413.
Because we call alloc_compressor_data() only if
cm->mi_alloc_size < new_mi_size, this change won't cause
alloc_compressor_data() to be called unnecessarily, unlike the libaom
bug https://crbug.com/aomedia/3526.
Bug: b:317105128
Change-Id: I8a772a1d5c4766846641a6d541a6d861bf76c60f
The VpxTpl* structs defined in vpx_tpl.h are only used by the external
rate control library. Add a VPX_TPL_ABI_VERSION component to
VPX_EXT_RATECTRL_ABI_VERSION and remove the VPX_TPL_ABI_VERSION
component from VPX_ENCODER_ABI_VERSION.
The current value of VPX_TPL_ABI_VERSION is 2. It is subtracted from
VPX_EXT_RATECTRL_ABI_VERSION and added to VPX_ENCODER_ABI_VERSION so
that the values of those two macros stay the same.
Add a note to explain why VPX_ENCODER_ABI_VERSION has a
VPX_EXT_RATECTRL_ABI_VERSION component.
Change-Id: I680b8522dc04328cd51df6de590fdec75ca88ae8
Commit db83435 introduced support for configuring for *-darwin23-gcc.
However configuring for *-darwin23-gcc does not currently add the
`-arch` flag to CFLAGS/LDFLAGS, so correct this here.
Change-Id: Ieeda1a5039ad40590dfcdcc6ba615a1d1697d54d
Before release:
c-a=8, a=0, r=1 -> c=8, a=0, r=1
After release:
- If the library source code has changed at all since the last
update, then increment revision:
c=8, a=0, r=r+1=2
- If any interfaces have been added, removed, or changed since
the last update, increment current, and set revision to 0:
c=c+1=9, a=0, r=0
- If any interfaces have been added since the last public release,
then increment age:
c=9, a=a+1=1, r=0
- If any interfaces have been removed or changed since the last
public release, then set age to 0:
c=9, a=0, r=0 (VpxTpl* structure changes)
MAJOR=c-a=9
MINOR=a=0
PATCH=r=0
Bug: webm:1833
Change-Id: Id24c9a0ff415a6f625d17b6098cdd0baf27432e3
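The update rules quoted above can be written out as a short sketch. The type and function names here are hypothetical; the rules themselves and the MAJOR/MINOR/PATCH mapping are the ones listed in the message.

```c
#include <assert.h>

typedef struct {
  int current;  /* c */
  int revision; /* r */
  int age;      /* a */
} sketch_libtool_version;

/* Applies the four update rules in the order given in the message. */
static sketch_libtool_version sketch_libtool_update(
    sketch_libtool_version v, int source_changed, int ifaces_added,
    int ifaces_removed_or_changed) {
  if (source_changed) v.revision += 1;
  if (ifaces_added || ifaces_removed_or_changed) {
    v.current += 1;
    v.revision = 0;
  }
  if (ifaces_added) v.age += 1;
  if (ifaces_removed_or_changed) v.age = 0;
  return v;
}

/* MAJOR = c - a, MINOR = a, PATCH = r. */
static void sketch_so_version(sketch_libtool_version v, int *major,
                              int *minor, int *patch) {
  assert(v.current >= v.age);
  *major = v.current - v.age;
  *minor = v.age;
  *patch = v.revision;
}
```

Starting from c=8, a=0, r=1 with changed source, added interfaces, and changed VpxTpl* structures, the sketch steps through the same intermediate values as the message and ends at c=9, a=0, r=0.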
Change if to assertion in vp9_extrc_get_encodeframe_decision
Clarify comment for VP9E_ENABLE_EXTERNAL_RC_TPL that
rc_type | VPX_RC_QP must be non zero for this control to work.
Change-Id: I2c54cf7eda1f0f12f4ff7ac929e8e6a1fdd2215d
Performance optimization. get_msb utilizes
the compiler/platform specific last significant bit
operator.
Note: 32 bit unsigned assumed, like all get_msb implementations do.
Change-Id: Ib013ad24aa0ea845efeb52aacd448b067edf91da
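A minimal sketch of the idea, assuming a 32-bit unsigned int as the note above does. sketch_get_msb() is a hypothetical name, and the GCC/Clang branch is one possible choice of compiler-specific operator; other toolchains fall back to a plain loop here.

```c
#include <assert.h>

/* Returns the index (0..31) of the most significant set bit of n. */
static int sketch_get_msb(unsigned int n) {
  assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
  /* __builtin_clz counts leading zeros; 31 ^ clz equals 31 - clz for
   * values in 0..31. */
  return 31 ^ __builtin_clz(n);
#else
  int log = 0;
  while (n >>= 1) ++log;
  return log;
#endif
}
```

The builtin is undefined for zero, which is why the sketch asserts a non-zero input on both paths.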
This is never used.
A callback in external rc func was added and used instead.
Change-Id: Iade6f361072f0c28af98904baf457d2f0e9ca904
(cherry picked from commit 41ced868a6)
Commit db83435 introduced support for configuring for *-darwin23-gcc.
However configuring for *-darwin23-gcc does not currently add the
`-arch` flag to CFLAGS/LDFLAGS, so correct this here.
Change-Id: Ieeda1a5039ad40590dfcdcc6ba615a1d1697d54d
Explain why the encoder init functions cannot call update_error_state().
In vp8/vp8_cx_iface.c, this comment should have been added in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4506609.
Rewrite update_error_state() in vp8/vp8_cx_iface.c to look like the
versions in vp9/vp9_cx_iface.c and av1/av1_cx_iface.c (in libaom).
Change-Id: I3f153d67b8c549ca5ac8ea0cfbcaad4ae705c8e6
After a longjmp() call in vp8e_encode(), call update_error_state() so
that we return the error code and error detail set by the
vpx_internal_error() call.
Change-Id: I1f2428eb1b1f61e46c02604e16a5d44dcf162479
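The control flow described above can be illustrated with a small standalone sketch. All names below (sketch_error_info, sketch_internal_error(), sketch_update_error_state(), sketch_encode()) are hypothetical and only mimic the shape of the pattern: an internal error records a code and detail and then longjmp()s, and the setjmp() return path reports what was recorded.

```c
#include <setjmp.h>
#include <stddef.h>

typedef struct {
  int error_code;
  const char *detail;
  int setjmp_active;
  jmp_buf jmp;
} sketch_error_info;

/* Records the error, then unwinds to the active setjmp() point, if any. */
static void sketch_internal_error(sketch_error_info *info, int code,
                                  const char *detail) {
  info->error_code = code;
  info->detail = detail;
  if (info->setjmp_active) longjmp(info->jmp, 1);
}

/* Copies the recorded error out to the caller-visible state. */
static int sketch_update_error_state(const sketch_error_info *info,
                                     const char **detail_out) {
  *detail_out = info->error_code ? info->detail : NULL;
  return info->error_code;
}

static int sketch_encode(sketch_error_info *info, int fail,
                         const char **detail_out) {
  info->error_code = 0;
  info->detail = NULL;
  info->setjmp_active = 0;
  if (setjmp(info->jmp)) {
    /* Reached via longjmp(): return the recorded code and detail rather
     * than a generic failure. */
    info->setjmp_active = 0;
    return sketch_update_error_state(info, detail_out);
  }
  info->setjmp_active = 1;
  if (fail) sketch_internal_error(info, 2, "simulated allocation failure");
  info->setjmp_active = 0;
  *detail_out = NULL;
  return 0;
}
```

The error code value 2 and the detail string are placeholders chosen for the sketch.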
The function convolve8_4_usdot contains a comment relating to the
SDOT implementation of convolve8, which requires addition of a
correction constant to account for range clamp of the input values.
This is not performed in the i8mm USDOT implementation - so remove the
comment.
Also add some const qualifiers to function arguments.
Change-Id: I10aff560d20403897f708ee293bf873be9c35761
Fix the following clang-tidy misc-include-cleaner warnings:
vp9/encoder/vp9_encoder.c:
no header providing "vp9_is_valid_scale" is directly included
no header providing "VPX_CODEC_CORRUPT_FRAME" is directly included
vp9/vp9_cx_iface.c:
no header providing "valid_ref_frame_size" is directly included
Change-Id: I20e846f5b14c42c72aaefec0718b4ae9c7eea44a
Issue explanation:
The unit test calls set_config function twice after encoding the
first frame.
The first call of set_config reduces frame width, but is still within
half of the first frame.
The second call reduces frame width even more, making it less than
half of the first frame. According to the encoder logic, there are
then no valid ref frames, and this frame should be set as a
forced keyframe. This leads to null pointer access in scale_factors
later.
Solution:
To ensure the correct detection of a forced key frame,
we need to update the frame width and height only when the actual
encoding is performed.
Bug: b/311985118
Change-Id: Ie2cd3b760d4a4b399845693d7421c4eb11a12775
(cherry picked from commit 1ed56a46b3)
This change fixed a bug revealed by b/311294795.
In simple motion search, the reference buffer pointer needs to be
restored after the search. Otherwise, it causes problems while the
reference frame scaling happens. This CL fixes the bug.
Bug: b/311294795
Change-Id: I093722d5888de3cc6a6542de82a6ec9d601f897d
(cherry picked from commit 50ed636e49)
Use vpx_sse and vpx_highbd_sse instead of vpx_mse16x16 and
vpx_highbd_8_mse16x16 respectively to compute SSE for PSNR
calculations. This solves an issue whereby vpx_highbd_8_mse16x16
was being used to calculate SSE for 10- and 12-bit input.
This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063
by Jonathan Wright <jonathan.wright@arm.com>.
Bug: webm:1819
Change-Id: I37e3ac72835e67ccb44ac89a4ed16df62c2169a7
(cherry picked from commit 7dfe343199)
Issue explanation:
The unit test calls set_config function twice after encoding the
first frame.
The first call of set_config reduces frame width, but is still within
half of the first frame.
The second call reduces frame width even more, making it less than
half of the first frame. According to the encoder logic, there are
then no valid ref frames, and this frame should be set as a
forced keyframe. This leads to null pointer access in scale_factors
later.
Solution:
To ensure the correct detection of a forced key frame,
we need to update the frame width and height only when the actual
encoding is performed.
Bug: b/311985118
Change-Id: Ie2cd3b760d4a4b399845693d7421c4eb11a12775
This change fixed a bug revealed by b/311294795.
In simple motion search, the reference buffer pointer needs to be
restored after the search. Otherwise, it causes problems while the
reference frame scaling happens. This CL fixes the bug.
Bug: b/311294795
Change-Id: I093722d5888de3cc6a6542de82a6ec9d601f897d
fseeko and ftello are available on Android only from API level 24. Add
the needed guards for these functions.
Suggested by Yifan Yang.
Change-Id: I3a6721d31e1d961ab10b434ea6e92959bd5a70ab
(cherry picked from commit bf07554183)
This change fixed a corner case bug revealed by b/311394513.
During frame scaling, vpx_highbd_convolve8() and vpx_scaled_2d()
require that both x_step_q4 and y_step_q4 be less than or equal to a
defined value. Otherwise, the code needs to call
vp9_scale_and_extend_frame_nonnormative(), which supports arbitrary
scaling.
The fix was done in the LBD and HBD functions.
Bug: b/311394513
Change-Id: Id0d34e7910ec98859030ef968ac19331488046d4
(cherry picked from commit 8bf3649d41)
Need to set skip_recode properly so that
vp9_encode_block_intra() can work properly when it is
called by block_rd_txfm(). We can not skip "recode" because
it is still at the rd search stage.
Bug: b/310340241
Change-Id: I7d7600ef72addd341636549c2dad1868ad90e1cb
(cherry picked from commit f10481dc0a)
no header providing "CONFIG_VP9_HIGHBITDEPTH" is directly included
no header providing "VPX_BITS_8" is directly included
Change-Id: Ie6d78c79ab462501417f2b451bbe808a1fdce931
Since the reference frame is already scaled, do not scale the offsets.
BUG: b/311489136, b/312656387
Change-Id: Ib346242e7ec8c4d3ed26668fa4094271218278ed
(cherry picked from commit 845a817c05)
Use vpx_sse and vpx_highbd_sse instead of vpx_mse16x16 and
vpx_highbd_8_mse16x16 respectively to compute SSE for PSNR
calculations. This solves an issue whereby vpx_highbd_8_mse16x16
was being used to calculate SSE for 10- and 12-bit input.
This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063
by Jonathan Wright <jonathan.wright@arm.com>.
Bug: webm:1819
Change-Id: I37e3ac72835e67ccb44ac89a4ed16df62c2169a7
vpx/vpx_integer.h is clearly intended as the facade header for the
Standard C Library headers <stddef.h>, <inttypes.h>, and <stdint.h>.
It is reasonable to expect that vpx/vpx_decoder.h and vpx/vpx_encoder.h
should provide the symbols from vpx/vpx_codec.h.
Change-Id: I220797e63b2efc3dd9e2ac197fe2f918bf80d247
This change fixed a corner case bug revealed by b/311394513.
During frame scaling, vpx_highbd_convolve8() and vpx_scaled_2d()
require that both x_step_q4 and y_step_q4 be less than or equal to a
defined value. Otherwise, the code needs to call
vp9_scale_and_extend_frame_nonnormative(), which supports arbitrary
scaling.
The fix was done in the LBD and HBD functions.
Bug: b/311394513
Change-Id: Id0d34e7910ec98859030ef968ac19331488046d4
Need to set skip_recode properly so that
vp9_encode_block_intra() can work properly when it is
called by block_rd_txfm(). We can not skip "recode" because
it is still at the rd search stage.
Bug: b/310340241
Change-Id: I7d7600ef72addd341636549c2dad1868ad90e1cb
Define the VPX_DL_REALTIME, VPX_DL_GOOD_QUALITY, and VPX_DL_BEST_QUALITY
macros as unsigned long, because the deadline parameter of
vpx_codec_encode() is of the unsigned long type. This enables C++
templates to deduce the unsigned long type from these macros.
Change-Id: I2173e3bbf5e15c84c11843790df93a497a35ed7d
fseeko and ftello are available on Android only from API level 24. Add
the needed guards for these functions.
Suggested by Yifan Yang.
Change-Id: I3a6721d31e1d961ab10b434ea6e92959bd5a70ab
The changes in this CL show that both the VP8 and VP9 implementations of
the decode function eventually discard the deadline parameter. Change
the code to ignore the deadline parameter in vpx_codec_decode() without
passing it to the decode function, and document that the deadline
parameter is ignored and 0 should be passed.
Change-Id: Ia977e16cdbdf97901207aa2d749887980137c4c0
Since the reference frame is already scaled, do not scale the offsets.
BUG: b/311489136, b/312656387
Change-Id: Ib346242e7ec8c4d3ed26668fa4094271218278ed
Add an Armv8.0 MLA Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic2c3cb307b95964cd0ba86f1c42eece3a8ab7cf4
Add an Armv8.0 MLA Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I027eaf2d1bb9711c2217cc8aa6b1e379d3e66b26
The deadline parameter of vpx_codec_encode() is of the unsigned long
type. The cpplint runtime/int check and the clang-tidy
google-runtime-int warn about the use of the unsigned long type. Adding
a type alias works around this issue.
Note: vpx_codec_decode() also has a deadline parameter, but it is of the
long type. So unfortunately this type alias cannot be simply named
vpx_codec_deadline_t and the name must suggest it is encoder-specific.
Change-Id: I27b6b25730b620b328422ec3f91e63fdc55b377a
For realtime mode: if the deadline mode (good/best/realtime)
is changed on the fly (via codec_encode() call), force a
key frame and set the speed feature nonrd_keyframe = 1 to
avoid entering the rd pickmode.
nonrd_keyframe=0/off is the only feature in realtime mode that
involves rd pickmode, so by forcing it on/1 we can cleanly
separate nonrd (realtime) from rd (good/best), so we can
avoid possible issues on this dynamic mode switch, such as in
bug listed below.
Dynamic change of deadline, in particular for realtime mode,
involves a lot of coding/speed feature changes, so best to
also force reset with keyframe.
Added a unit test that triggers the issue in the bug.
Bug: b/310663186
Change-Id: Idf8fd7c9ee54b301968184be5481ee9faa06468d
Add an Armv8.6 USDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I455e5a94bdcea1358025bd8e4d4c8c62e373aa5d
Add an Armv8.6 USDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I8f7633d9852ebfe8feb9b4a055715f849cccf297
Add an Armv8.4 SDOT Neon path for the horizontal portion of 2D
convolution, specialised for executing with 4-tap filters (the most
common filter size for settings --good --cpu-used=1.) This new path
is also used when executing with bilinear (2-tap) filters.
Change-Id: I5116d10ddb371ac2cf302ef905d06f2140dc7600
Add an Armv8.4 SDOT Neon implementation of horizontal convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ib396681b3f7b8b0eeba94381fbe33a06cf7b4a13
Add an Armv8.6 USDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: Ic893b25541e3317c5d5c270c338f868f080aed7c
Add an Armv8.4 SDOT Neon implementation of vertical convolution
specialised for executing with 4-tap filters (the most common filter
size for settings --good --cpu-used=1.) This new path is also used
when executing with bilinear (2-tap) filters.
Change-Id: I3eb00b5a34f5676b68bda60a2a29be56e3d7d0cd
vpx_get_filter_taps() currently reports either 8-tap or 2-tap.
However, many 8-tap filters are actually 0-padded, resulting in a
lot of redundant work (multiplying by, and adding, 0) when processing
using an 8-tap convolution function. In preparation for adding 2- and
4-tap SIMD implementations for the convolution paths, make the filter
size reporting more granular, stripping any 0 padding. Filter sizes
can now be reported as 2-, 4-, 6- or 8-tap.
Change-Id: I100133aac7173134af34b918c9ad3007d98d6060
Delete redundant transpose/permute code in the Neon dot-product
vertical convolution paths. Variable values were assigned but never
used before subsequent assignment.
Change-Id: I15b29d0c993f56599e0d18ac1d5787e6385d2a3a
The test shows that the comment for kf_max_dist in vpx/vpx_encoder.h
differs from its behavior by one. We should modify the comment to match
the encoding behavior.
Bug: webm:1829
Change-Id: Icdc58b8f6b25353f10ce8ecc481c862bd3fe86df
When all the inter reference frames are invalid, disable the speed
features that bypass intra mode search.
BUG=b/312517065
Change-Id: I246c953fad3be61b9d307da11c752a21a36b90ff
vpx_codec_iface_t is defined as follows:
typedef const struct vpx_codec_iface vpx_codec_iface_t;
Since vpx_codec_iface_t is already a const struct, it is redundant to
add "const" to vpx_codec_iface_t.
Note: I think vpx_codec_iface_t should not have been defined as a const
struct, but it is too late to change that now.
Change-Id: Ifbd3f8a63c1d48e9169ff77fa0b505ea1e65519d
When the reference frame's scaling factor is not in the supported
range, skip using it for motion compensation prediction in the
partition speed features.
BUG=b/312517065
Change-Id: Ie3687186521ad2616be258e80d3e5b16e5f2d5e9
The code is ported from libaom's aom_sse and aom_highbd_sse at
commit 1e20d2da96515524864b21010dbe23809cff2e9b.
The vpx_sse and vpx_highbd_sse functions will be used by vpx_dsp/psnr.c.
Bug: webm:1819
Change-Id: I4fbffa9000ab92755de5387b1ddd4370cb7020f7
is -> if
returns -> computes
in the documentation for ComputeQP().
This is the same as:
9142314c2 ratectrl_rtc.h: fix a few typos
+ remove a duplicate, commented out, version of GetLoopfilterLevel()
Change-Id: I8832e628b63b0b7dac6236631072f36ad55d90e8
Move some internal drop_frame code to separate
function so the external RC can use it.
And add new flag setting under VP8E_SET_RTC_EXTERNAL_RATECTRL
to disable vp8_drop_encodedframe_overshoot() for
testing the external RC.
Unittest added for single layer and 3 temporal layers.
Bug: b/280363228
Change-Id: Ibea2f627cc54e7156ff35259a64dd111d42d146c
Older versions of MSVC do not allow declarations after statements in C
files. We don't need to support those versions of MSVC now.
Use -std=gnu99 instead of -std=gnu89.
Change-Id: I76ba962f5a2bca30d6a5b2b05c5786507398ad32
Most are related to include-what-you-use. One is to avoid using the
unsigned long type explicitly (by passing VPX_DL_REALTIME directly to
vpx_codec_encode).
Change-Id: Ieaf3418382ad8516cb4b172f7678893286fcb8cf
Declare the oxcf parameters of vp8_create_compressor() and init_config()
as const. This helps code analysis.
Change-Id: I344ef3e6afc3adced2b2865b7e0057c6d4b1d3c0
Fixes the creation of DT_TEXTREL entries in binaries built with PIE
enabled:
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
This matches the changes made in libaom:
1df26009da aom_configure: only override CONFIG_PIC if not set on cmd line
7235e65746 aom_configure.cmake: detect PIE and set CONFIG_PIC
Change-Id: I0a43e964af2d8eb8c5e7811ce14ad39285eec3a8
- Enable C vs SIMD test for x86 32-bit platform
- Correct a print message in run_tests()
BUG=webm:1800
Change-Id: Ib1ccd3a87a64b5ec6cde524a14d5d1b7e200abfb
Supports single layer and svc. For svc only the
framedrop_mode = FULL_SUPERFRAME_DROP is allowed
for now.
Dropping frames due to overshoot is enabled by the
oxcf->drop_frames_water_mark, which is zero as default.
Note that this CL also allows for drop/skip encoding of
enhancement layers if that layer bitrate is zero.
max_consec_drop is also added, set to INT_MAX as default.
Note that max_consec_drop is only used for svc mode.
It has not been added yet for single layer in libvpx encoder.
Tests added for single layer and svc case.
Change-Id: Ic12f6a0eb3fbf07d8eb8456c46cec27b2e1930d3
Guard hwcap2 feature interrogation on HAVE_NEON_I8MM so that it gets
disabled if neon_i8mm is disabled when configuring the build.
Bug: webm:1825
Change-Id: Ic6ff71f17387b96219591928a583d43560bb7c7a
The intermediate value in the target bandwidth
calculation may exceed integer bounds.
Bug: 308007926
Change-Id: I8288c5820db06a550d88bf91fccc86106996deaa
Signed-off-by: Xiahong Bao <xiahong.bao@nxp.com>
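The message does not say how the overflow was fixed. One common approach, shown here purely as an assumption, is to carry the intermediate product in 64 bits and saturate the result. sketch_rescale() is a hypothetical helper, not a function from the change.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Computes val * num / denom without overflowing the intermediate product,
 * saturating the result to the int range. */
static int sketch_rescale(int val, int num, int denom) {
  int64_t r;
  assert(denom != 0);
  r = (int64_t)val * num / denom; /* product of two ints fits in 64 bits */
  if (r > INT_MAX) return INT_MAX;
  if (r < INT_MIN) return INT_MIN;
  return (int)r;
}
```

With a 32-bit int, 2000000000 * 1000 would overflow before the division, while the widened version returns the expected 2000000000 for sketch_rescale(2000000000, 1000, 1000).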
Add 'sve' arch options to the configure, build and unit test files -
adding appropriate conditional options where necessary. Arm SIMD
extensions are treated as supersets in libvpx, so disable SVE if
either Neon DotProd or I8MM are unavailable.
Change-Id: I39dd24f2b209251084d1e28d7ac68099460309bb
- Use smaller frame size that still triggers the overflow
- Do not run encoder as the encoder init also triggers the overflow
Bug: chromium:1492864
Change-Id: I392549abf69f1cfb3754cc847a214513ec9bedc5
Frame size caps the target bitrate internally, so the frame size needs
to be large enough to reproduce the target bitrate overflow in the
fuzzing test.
However the frame size needed exceeds the max buffer allowed on 32bit
system defined by VPX_MAX_ALLOCABLE_MEMORY
Bug: chromium:1492864
Change-Id: Ia3a9a78cd35516373897039a7769b492e29e8450
avg_frame_bandwidth = target_bandwidth / framerate
If target_bandwidth is too big and/or framerate is too small (< 1),
avg_frame_bandwidth could overflow.
Bug: chromium:1492864
Change-Id: I32314da1414b472ae4bf2acdcd81b8a948286146
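A minimal sketch of a saturating version of the division above. sketch_avg_frame_bandwidth() is a hypothetical helper, and clamping to INT_MAX is an assumption about the shape of the fix; the formula itself is the one stated in the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* avg_frame_bandwidth = target_bandwidth / framerate, clamped to int. */
static int sketch_avg_frame_bandwidth(int64_t target_bandwidth,
                                      double framerate) {
  double bits;
  assert(framerate > 0.0);
  bits = (double)target_bandwidth / framerate;
  /* A framerate below 1 makes the quotient larger than the numerator, so
   * the range check has to come before the cast. */
  if (bits >= (double)INT_MAX) return INT_MAX;
  if (bits <= 0.0) return 0;
  return (int)bits;
}
```

For example, 2000000000 bits per second at 0.5 frames per second would need 4000000000 bits per frame, which does not fit in a 32-bit int and is clamped here.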
A speed feature disable_split_mask (set to 63) could cause no mode and
partition to be selected in rd_pick_partition because:
-> thresh_mult_sub8x8 all INT_MAX
-> All modes skipped for sub8x8 blocks
-> found_best_rd is 0 -> break from the loop of 4 sub blocks
-> sum_rdc is INT_MAX -> No rd update -> should_encode_sb is 0
-> Propagating to top of the tree
-> No partition / mode is selected
Bug: b/290499385
Change-Id: Ia655e262f3b32445347ae0aaf1a2d868cea997f3
Port the following libaom CLs to libvpx:
https://aomedia-review.googlesource.com/c/aom/+/178361
https://aomedia-review.googlesource.com/c/aom/+/180701
https://aomedia-review.googlesource.com/c/aom/+/181821
The tests themselves are not feature-gated in the same way that they are
used in the rest of the codebase since they are not controlled by
rtcd.pl. This means that tests that assume the existence of features not
present on the target can cause SIGILL to be thrown.
This commit extends init_vpx_test.cc to match the behaviour for other
targets and automatically disable testing for features that are not
available on the machine running the tests.
Call arm_cpu_caps() and x86_simd_caps() inside #if !CONFIG_SHARED.
All the SIMD-specialized functions (arm or x86) are internal functions,
so they are not exported from the libvpx shared library. If
CONFIG_SHARED is 1, it is not necessary to call arm_cpu_caps(),
x86_simd_caps(), and append_negative_gtest_filter() either.
Change-Id: I330631816bdb52842020c5aa2a1ad802865cc285
Fix the TODO(https://crbug.com/1486441) comment in vp8/vp8_cx_iface.c.
Make vp8cx_create_encoder_threads() work after it has been called
before. If there are already the exact number of threads it needs to
create, return immediately. Otherwise, shut down the existing threads
(by calling vp8cx_remove_encoder_threads()) and create the required
number of threads.
Call vp8cx_create_encoder_threads() in vp8e_set_config() to respond to
changes in g_threads or g_w (which also affects the number of threads
through cm->mb_cols and cpi->mt_sync_range).
Change-Id: I552eeca5b1f1f5313f59559eb1da396f270a2429
Add the mt_current_mb_col_size field to VP8_COMP to record the size of
the mt_current_mb_col array.
Move the allocation of the mt_current_mb_col array from
vp8_alloc_compressor_data() to vp8_encode_frame(), where the use of
mt_current_mb_col starts. Allocate mt_current_mb_col right before use
if mt_current_mb_col hasn't been allocated or if the current size is
incorrect.
Move the deallocation of the mt_current_mb_col array from
dealloc_compressor_data() to vp8cx_remove_encoder_threads().
Move the TODO(https://crbug.com/1486441) comment from
vp8/encoder/onyx_if.c to vp8/vp8_cx_iface.c.
Change-Id: Ic5a0793278c2cc94876669aaa0dd732412876673
This CL adds an `emms` instruction at the end of the MMX assembly
for the vpx_subtract_block function, to properly clear the register
state. This resolves a mismatch between x86 build and C only build.
BUG=webm:1816
Change-Id: I79d2947da7f587f3558a2ae17df214d2faf59e74
Make vp8cx_create_encoder_threads() undo everything cleanly before
returning an error.
Make vp8cx_remove_encoder_threads() reset pointer fields to NULL after
freeing them, reset encoding_thread_count to 0, and reset b_lpf_running
to 0 (false). This makes it safe to call vp8cx_create_encoder_threads()
after calling vp8cx_remove_encoder_threads().
Change-Id: I586f06ce3d5b1c88ca46884bb4d6667ffc97e440
Fix the following compiler warning when libvpx is configured with
the --disable-multithread option:
vp9/common/vp9_thread_common.c:391:7: warning:
variable 'cur_row' set but not used [-Wunused-but-set-variable]
int cur_row;
^
Change-Id: I53aa279152715083df40990eb7fdcaeb77a66777
vp8cx_create_encoder_threads() caps the thread count at
(cm->mb_cols / cpi->mt_sync_range) - 1. If cfg.g_w is 16, cm->mb_cols is
only 1 (see vp8_alloc_frame_buffers: mb_cols = width >> 4), so we won't
be using multiple threads. To reproduce bug chromium:1486441, the test
just needs to increase cfg.g_h sufficiently.
Bug: chromium:1486441
Change-Id: Ie6b2da2e31cfa1717a481f55eebc8c875db94d87
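The arithmetic above can be checked with a small sketch. sketch_encoder_thread_cap() is a hypothetical helper that models only the cap quoted in the message, (mb_cols / mt_sync_range) - 1 with mb_cols = width >> 4, and ignores any other limits the real function may apply.

```c
#include <assert.h>

/* Returns the number of extra encoder threads allowed for a given width. */
static int sketch_encoder_thread_cap(int requested_threads, int width,
                                     int mt_sync_range) {
  int mb_cols, cap;
  assert(mt_sync_range > 0);
  mb_cols = width >> 4; /* 16-pixel macroblock columns */
  cap = mb_cols / mt_sync_range - 1;
  if (cap < 0) cap = 0;
  return requested_threads < cap ? requested_threads : cap;
}
```

For a width of 16 the cap is 0 regardless of the requested count, which matches the observation that such a frame never uses multiple threads. The height does not enter this calculation at all.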
Use $PWD to get the current directory.
Quote directory pathnames.
Suggested by James Zern.
Bug: webm:1800
Change-Id: I51e922b24da0e89d936370f858eab55d193ebdcb
These functions assume the uint16_t samples are <= 255 (bit depth 8),
but vpx_highbd_8_mse16x16() is called for any bit depth, not just 8.
A better fix is to port the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/175063 to libvpx, but
that requires porting aom_sse() and aom_highbd_sse() to libvpx, which is
quite involved. So disable vpx_highbd_8_mse16x16_neon_dotprod, etc.
first.
Bug: webm:1819
Change-Id: If495a5dedc58d9981317b9993c9fbb81ff3ab50c
libvpx 1.13.1
2023-09-29 v1.13.1 "Ugly Duckling"
This release contains two security related fixes. One each for VP8 and VP9.
- Upgrading:
This release is ABI compatible with the previous release.
- Bug fixes:
https://crbug.com/1486441 (CVE-2023-5217)
Fix to a crash related to VP9 encoding (#1642)
* tag 'v1.13.1':
update CHANGELOG
update version to 1.13.1
Fix bug with smaller width bigger size
vp9_alloccommon: clear allocation sizes on free
VP8: disallow thread count changes
encode_api_test: add ConfigResizeChangeThreadCount
README: update release version to 1.13.0
Bug: webm:1818
Change-Id: I732e2423f635d4115890f00fd63f9886e31f39a6
use sizeof(var) instead of sizeof(type) and sizeof(*var) instead of
sizeof(var[0]) for consistency in some places.
Change-Id: Ibd9a783cfef5ce1d06131df3831a4093821a502f
SO_VERSION_MAJOR = 8
SO_VERSION_MINOR = 0
SO_VERSION_PATCH = 1
The increase of the patch number corresponds to the revision number in
the libtool text.
3. If the library source code has changed at all since the last update,
then increment revision (‘c:r:a’ becomes ‘c:r+1:a’).
Bug: webm:1818
Change-Id: Ia114368e9fd7a908e7fcf6e4d3142f142770e3f4
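The relationship between the libtool c:r:a triple and the SO_VERSION_* values can be sketched as below. It assumes the mapping used elsewhere in this log (major = c - a, minor = a, patch = r). The triple 8:1:0 in the usage note is inferred from the 8.0.1 values above rather than stated in the message, and the type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for the three SO_VERSION_* values. */
typedef struct {
  int major;
  int minor;
  int patch;
} SoVersion;

/* Maps libtool's current:revision:age to major.minor.patch, assuming
 * major = c - a, minor = a, patch = r. */
SoVersion so_version_from_libtool(int c, int r, int a) {
  SoVersion v;
  assert(a >= 0 && r >= 0 && c >= a);
  v.major = c - a;
  v.minor = a;
  v.patch = r;
  return v;
}
```

Under that assumption, so_version_from_libtool(8, 1, 0) yields 8.0.1: a source-only change increments r, which increments the patch number.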
Fixed previous patch that clusterfuzz failed on.
Local fuzzing passing overnight.
Bug: webm:1642
Change-Id: If0e08e72abd2e042efe4dcfac21e4cc51afdfdb9
(cherry picked from commit 263682c9a2)
This fixes reallocations (and avoids potential crashes) if any
allocation fails and the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: If5dc96b73c02efc94ec84c25eb50d10ad6b645a6
(cherry picked from commit 02ab555e99)
When the next frame is null and the current frame is an overlay
frame (that is, there is an active alt ref frame), we treat this
as the end of the sequence.
Change-Id: I49c2cf7a001df98aff8b62ba034317e408274bd4
Currently allocations are done at encoder creation time. Going from
threaded to non-threaded would cause a crash.
Bug: chromium:1486441
Change-Id: Ie301c2a70847dff2f0daae408fbef1e4d42e73d4
(cherry picked from commit 3fbd1dca6a)
Update thread counts and resolution to ensure allocations are updated
correctly. VP8 is disabled to avoid a crash.
Bug: chromium:1486441
Change-Id: Ie89776d9818d27dc351eff298a44c699e850761b
(cherry picked from commit af6dedd715)
define_gf_group is called at the last frame of each GOP to get the GOP
size for the next one. This means it is also called at the last GOP of
the sequence, where the call into WebM RC returns an error because
WebM RC does not have any more GOPs to return.
When gop_coding_frames from the encoder is 1, it means it's running out
of firstpass stats, which means end of sequence.
Bug: b/299610956
Change-Id: I30e077a28fe41593ebabbc1dc0c2915a4bcbece3
This cpu detection implementation doesn't do anything MSVC specific,
it just calls the IsProcessorFeaturePresent function. This can be
compiled with mingw compilers just as well.
Change-Id: I55e607a47c8f5b70d9f707ef96b2fa7553f2f79f
The original ref frame index was the index in the GF group; RC expects
the index to be the one for ref frame buffer.
Change-Id: I9a2b0e72b6332023fb2e8da131b557f82db02e39
Arm Neon DotProd implementations of vpx_highbd_8_mse<w>x<h> currently
need to be enabled at compile time since they're guarded by #ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod names in a separate file and wire them up to the build
system and rtcd.pl. Also add new test cases for the new functions.
Change-Id: I26be6fb587258c8fa9fbf03509b7602358a001a8
Enable Arm Neon DotProd implementations of vpx_get_var_sse_sum*
specialty variance functions via run-time feature detection, wiring
up the new *neon_dotprod names to rtcd.pl. Also add new test cases.
Change-Id: I04ac3db87d32ee7f94702b6c0360254e5688f713
Arm Neon DotProd implementations of vpx_variance<w>x<h> currently
need to be enabled at compile time since they're guarded by #ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod names in a separate file and wire them up to the build
system and rtcd.pl. Also add new test cases for the new functions.
Remove the _neon suffix in functions making reference to
vpx_variance<w>x<h>_neon() (e.g. sub-pixel variance) - enabling use
of the appropriate *neon or *neon_dotprod version at run time.
Similar changes for the specialty variance and MSE functions will be
made in a subsequent commit.
Change-Id: I69a0ef0d622ecb2d15bd90b4ace53273a32ed22d
Arm Neon DotProd implementations of vpx_sad*4d currently need to be
enabled at compile time since they're guarded by ifdef feature
macros. Now that run-time feature detection has been enabled for Arm
platforms, expose these implementations with distinct *neon_dotprod
names in separate files and wire them up to the build system and
rtcd.pl. Also add new test cases for the new DotProd functions.
Change-Id: Ie99ee0b03ec488626f52c3f13e4111fe26cc5619
Arm Neon DotProd implementations of vpx_sad* currently need to be
enabled at compile time since they're guarded by ifdef feature
macros. Now that run-time feature detection has been enabled for Arm
platforms, expose these implementations with distinct *neon_dotprod
names in separate files and wire them up to the build system and
rtcd.pl. Also add new test cases for the new DotProd functions.
Change-Id: Ic6906c28240276ba89787eadbc9393a232374f95
Arm Neon DotProd and I8MM implementations of vpx_convolve8* currently
need to be enabled at compile time since they're guarded by ifdef
feature macros. Now that run-time feature detection has been enabled
for Arm platforms, expose these implementations with distinct
*neon_dotprod/*neon_i8mm names in separate files and wire them up to
the build system and rtcd.pl. Also add new test cases for the new
DotProd and I8MM functions.
Change-Id: I3db3cd62e8596099d9fec7805ca3ee86b2a01c74
1) Overhaul the Arm CPU feature detection code, taking inspiration
from similar recent changes in libaom.
2) Add neon_dotprod and neon_i8mm arch options in the configure,
build and unit test files, adding appropriate conditional options
where necessary.
3) Soft-enable run-time CPU feature detection by default for both 32-
bit and 64-bit Arm platforms.
Change-Id: I3f13317d88324acc5753394351188baa8d18a261
Simplify the parameters and return values of the Neon MSE helper
functions for both standard and high bitdepth - avoiding unused
return values.
Change-Id: I6f9208f9ce890fbe58346d9c7d9d701f28f2f90f
Overflow was happening in two places:
one in set_encoder_config(), where the input
layer_target_bitrates are converted from kbps to bps,
the other in vp9_calc_pframe_target_size_one_pass_vbr(),
where target is scaled by kf_ratio.
vp9_ratectrl.c:2039: runtime error: signed integer overflow:
-137438983 * 25 cannot be represented in type 'int'
Bug: chromium:1475943
Change-Id: I1ab0980862548c8827fae461df9a7a74425209ff
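The two overflow sites described above share one pattern: an int product that can exceed the int range. A minimal sketch of one way to avoid it is to widen to int64_t before multiplying and clamp afterwards. The function names are hypothetical, and clamping is an illustrative choice, not necessarily what the libvpx fix does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Converts kbps to bps using a 64-bit intermediate, clamping to INT_MAX. */
int kbps_to_bps_saturated(int kbps) {
  int64_t bps;
  assert(kbps >= 0);
  bps = (int64_t)kbps * 1000;
  return bps > INT_MAX ? INT_MAX : (int)bps;
}

/* Scales a target by a ratio using a 64-bit intermediate, then clamps the
 * result to the int range instead of overflowing. */
int scale_target(int target, int ratio) {
  const int64_t scaled = (int64_t)target * ratio;
  if (scaled > INT_MAX) return INT_MAX;
  if (scaled < INT_MIN) return INT_MIN;
  return (int)scaled;
}
```

With a 32-bit int, the product from the sanitizer report, -137438983 * 25, is about -3.4e9 and clamps to INT_MIN here.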
vp9/encoder/vp9_ratectrl.c:2171:23: runtime error: signed integer
overflow: 103079280 * -22 cannot be represented in type 'int'
Bug: chromium:1473268
Change-Id: Ic1de7d48e74d94c2a992e53ec4382b5b44dba7af
in calc_iframe_target_size():
vp8/encoder/ratectrl.c:349:31: runtime error: signed integer overflow:
38 * 343597280 cannot be represented in type 'int'
Bug: chromium:1473473
Change-Id: Ie8f7b147efb27c92314df09837b66f7d97046883
Remove '= {}' (C23 [1]) and use memset to clear a vpx_rc_config_t
instance.
after:
6e2c3b9b3 Add RC mode to vpx external RC interface
Fixes compile with -pedantic and Microsoft's cl compiler.
[1] https://en.cppreference.com/w/c/language/initialization
Change-Id: I2019cdf0c42103cfc80b1e58c68b7596e497007f
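The replacement of '= {}' with memset can be sketched as follows. ExampleConfig is a hypothetical struct standing in for vpx_rc_config_t, whose real fields are not listed in the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct standing in for vpx_rc_config_t. */
typedef struct {
  int frame_width;
  int frame_height;
  int target_bitrate_kbps;
} ExampleConfig;

/* Clears the struct without the empty-brace initializer, which is only
 * valid C starting with C23. */
void clear_config(ExampleConfig *cfg) {
  assert(cfg != NULL);
  memset(cfg, 0, sizeof(*cfg));
}
```

Using sizeof(*cfg) rather than sizeof(ExampleConfig) keeps the call correct if the pointer's type changes later.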
Use an array for constant initialization rather than array syntax which
assumes the underlying type is a vector. Fixes compile error with
cl targeting Windows Arm64:
vpx_dsp\arm\fdct4x4_neon.c(55,52): error C2078: too many initializers
No change in assembly with gcc 12.2.0 & clang 14.
Bug: b/277255390
Bug: webm:1810
Fixed: webm:1810
Change-Id: Ia30edcdbb45067dfe865b9958a5eecf1fd9ddfc8
after:
22818907d normalize *const in rtcd
fixes warnings of the form:
vpx_dsp\x86\quantize_avx.c(145): warning C4028: formal parameter 2
different from declaration
Change-Id: I4dc423f11ec4a9171e18bdb6be2fa8dfb65ee61a
Fix a bug in vpx_int_pro_row_neon (increment pointer after peeled
first loop iteration) and re-enable both vpx_int_pro_row/col_neon
paths.
Also fix IntProRowTest to use width_ (instead of 0) as the src_stride
for the input data block. The test's use of 0 for src_stride is the
reason the tests passed with the buggy Neon implementation noted in
the listed bugs. (The old buggy Neon implementation fails the
adjusted unit tests.)
BUG=webm:1800
BUG=webm:1809
Change-Id: I1f4572ee155653a7596fe2c10b5938ea7a3f63ae
Arm SIMD testing was enabled in c vs SIMD bit-exactness test after
arm SIMD mismatch was resolved.
BUG=webm:1800
Change-Id: Id60127313a0955f4a5c8468281fd5a441668fddb
The vpx_int_pro_row/col Neon SIMD versions caused a mismatch between
Neon encoding and C encoding. They are disabled for now to ensure the
correctness of VP9 encoding on the Arm platform. Since these 2
functions are not used much, this should not affect the overall
encoder speed much.
BUG=webm:1800
BUG=webm:1809
Change-Id: Id1a7d542fc03d4cf9fa1039a49832abf35fb722f
- Include vpx/vpx_ext_ratectrl.h in vp9_ext_ratectrl.c
- Include vpx/internal/vpx_codec_internal.h
- Include <stddef.h> for NULL
Bug: b/294049605
Change-Id: Iedd8b3864da27fde1678bfa6606e6fc5630a7a09
- Use zero initializer instead of memset to avoid including <cstring>
- Include vpx_codec.h for vpx_codec_err_t and error codes
- Include vpx_tpl.h for VpxTplGopStats
Change-Id: Iac5837ce2173bd945bfe8eeb401ff4dfd04fd2e1
This CL adds a shell script to test bit exactness of the C and SIMD
VP9 encoders on x86 platforms.
As C vs NEON encoding outputs are not bit-exact (BUG=webm:1809),
ARM tests are currently disabled.
BUG=webm:1800
Change-Id: Iffcc70863e8cf83ccb5bc5be73e8866165697358
apply similar steps as to the other quantize functions to switch to
macroblock_plane and ScanOrder
Change-Id: I486d653326aaf52ffd3beafd2e891ba6a5d57ef3
Pass macroblock_plane and ScanOrder instead of looking up the values
beforehand. Avoids pushing arguments to the stack.
Change-Id: I22df6f645eb1a1d89ba5a4d9bc58acb77af51aa9
Update functions in WRITE_COMPRESSED_STREAM blocks, which are disabled
by default. This caused them to be missed in:
84e6b7ab0 test/*.cc: prefer 'override' to 'virtual'
Change-Id: I0e462263f19c15eb0a30d0c0f4e145062f789489
In file included from ../test/bench.cc:14:
../test/bench.h:17:7: warning: 'AbstractBench' has virtual functions but
non-virtual destructor [-Wnon-virtual-dtor]
class AbstractBench {
Change-Id: Ibbfb949b63c8dff936c7ed4f2d056dea0343377b
With gcc 13.1.1
In function ‘handle_inter_mode’,
inlined from ‘vp9_rd_pick_inter_mode_sb’ at
../vp9/encoder/vp9_rdopt.c:3872:17:
../vp9/encoder/vp9_rdopt.c:3142:8: warning: ‘tmp_rd’ may be used
uninitialized [-Wmaybe-uninitialized]
3142 | rd = tmp_rd + RDCOST(x->rdmult, x->rddiv, rs, 0);
../vp9/encoder/vp9_rdopt.c: In function ‘vp9_rd_pick_inter_mode_sb’:
../vp9/encoder/vp9_rdopt.c:2846:15: note: ‘tmp_rd’ was declared here
2846 | int64_t rd, tmp_rd, best_rd = INT64_MAX;
Change-Id: I8608957cc8bbeb1ae525f3c3dad6fe9785b2a9b4
These were removed in If7a49e920e12f7fca0541190b87e6dae510df05c but
the leftovers can cause a build to fail if the code isn't optimized out.
I just found this out in the Meson port of libvpx for GStreamer.
BUG=webm:1584
Change-Id: I1c953720a2cbec3796200d4ec4020dca0b672bfb
vp9/common/vp9_mfqe.c|240 col 16| warning: code will never be executed
[-Wunreachable-code]
BLOCK_SIZE mfqe_bs, bs_tmp;
^~~~~~~
Change-Id: I566b20d8c294e19bc4b90b57b730f933048e71a5
Based on the change in libaom:
fe36011455 Fix Clang -Wunreachable-code-aggressive warnings
Clang's -Wunreachable-code-aggressive flag enables several warning flags
such as -Wunreachable-code-break and -Wunreachable-code-return. Chrome's
build system enables -Wunreachable-code-aggressive (in
build/config/compiler/BUILD.gn), so it would be good if libvpx could be
compiled without -Wunreachable-code-aggressive warnings.
This requires the VPX_NO_RETURN macro be defined correctly for all the
compilers we support, otherwise some compilers may warn about missing
return statements after a die() or fatal() call (which does not return).
Change-Id: I0c069133af45a7a61759538b6d74c681ea087dcd
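The role of a no-return annotation can be sketched as below. EXAMPLE_NO_RETURN, example_die() and checked_divide() are hypothetical names chosen for illustration; the actual VPX_NO_RETURN definition is not shown in the message and may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro in the spirit of VPX_NO_RETURN. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_NO_RETURN __attribute__((noreturn))
#elif defined(_MSC_VER)
#define EXAMPLE_NO_RETURN __declspec(noreturn)
#else
#define EXAMPLE_NO_RETURN
#endif

/* Prints a message and terminates the process; never returns. */
EXAMPLE_NO_RETURN void example_die(const char *msg) {
  assert(msg != NULL);
  fprintf(stderr, "%s\n", msg);
  exit(EXIT_FAILURE);
}

/* No return statement follows example_die(). A compiler that sees the
 * annotation knows control cannot reach the end of the function; a return
 * placed there would be flagged as unreachable. */
int checked_divide(int num, int den) {
  if (den != 0) return num / den;
  example_die("division by zero");
}
```

If the macro expands to nothing for some compiler, that compiler may warn about the missing return in checked_divide(), which is the situation the message describes.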
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously a non-keyframe could cause a crash if the
decoder failed before fully initializing due to an allocation failure.
The stream info and frame resolution would be 0, skipping an allocation.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I1c17302f4d3a488ba3b4eefe0bf53853dc558bc1
This fixes a crash if the application continues to call
vpx_codec_decode(). Previously the decoder instance would be freed,
causing a crash when attempting to access it with restart_threads=1.
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ic084894b776729bb1572f747082cef002f0832a8
This fixes a crash if the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp8 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I9867f5fc3d1163026f521a9609d3cbbc00568d1d
This avoids a crash if any of the thread allocations fail and the
application continues to call vpx_codec_decode(). Previously
num_tile_workers would be non-zero, but not equal to num_threads, which
would cause a crash during later thread management.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: Ie3faf7b36764aebedac0924acb6e4cb7545aec7d
This fixes reallocations (and avoids potential crashes) if any
allocation fails and the application continues to call
vpx_codec_decode().
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: If5dc96b73c02efc94ec84c25eb50d10ad6b645a6
If any allocations fail in init_decoder() and the application continues
to call vpx_codec_decode() some of the allocations would be orphaned or
the decoder would be left in a partially initialized state.
Found with vpx_dec_fuzzer_vp9 & Nallocfuzz
(https://github.com/catenacyber/nallocfuzz).
Bug: webm:1807
Change-Id: I44f662526d715ecaeac6180070af40672cd42611
A right shift by 2 is equivalent to two halving operations if there is
no addition or subtraction between the two halving operations.
Note: Since vhaddq_s16() and vhsubq_s16() have 17-bit intermediate
precision, the Neon code doesn't need to go to int32_t as was done in
https://chromium-review.googlesource.com/c/webm/libvpx/+/4604169.
Change-Id: Ibe0691cde0fd3b94ee7c497845ba459d30d503b0
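The equivalence stated above (two consecutive halvings equal one right shift by 2) can be checked with a scalar model. This sketch uses plain integers rather than Neon intrinsics, assumes two's complement integers, and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* floor(x / 2), written without right-shifting a negative value. */
int32_t floor_half(int32_t x) { return (x - (x & 1)) / 2; }

/* floor(x / 4), the scalar equivalent of an arithmetic right shift by 2. */
int32_t floor_quarter(int32_t x) { return (x - (x & 3)) / 4; }

/* Asserts that halving twice matches a single shift by 2 over [lo, hi]. */
void check_two_halvings(int32_t lo, int32_t hi) {
  int32_t x;
  for (x = lo; x <= hi; ++x) {
    assert(floor_half(floor_half(x)) == floor_quarter(x));
  }
}
```

Calling check_two_halvings(-70000, 70000) covers the full 17-bit range that a sum of two int16_t values can take.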
The corresponding case block is not ARM-specific, and the
original comment text was confusing to readers.
Test: N/A, just comment text changes.
Change-Id: I3154d18d3b3d237c1eecfe07dc7ec237c98194cf
Signed-off-by: Chen Wang <wangchen20@iscas.ac.cn>
This CL resolves the mismatch between C and intrinsic implementation
of vpx_hadamard_32x32 function. The mismatch was due to integer
overflow during the addition operation in the intrinsic functions.
Specifically, the addition in the intrinsic function was performed
at the 16-bit level, while the calculation of a0 + a1 resulted in
a 17-bit value.
This code change addresses the problem by performing
the addition at the 32-bit level (with sign extension) in both SSE2
and AVX2, and then converting the results back to the 16-bit level
after a right shift.
STATS_CHANGED
Change-Id: I576ca64e3b9ebb31d143fcd2da64322790bc5853
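The overflow described above can be illustrated in scalar form: the sum of two 16-bit values needs 17 bits, so it has to be formed at 32 bits before the shift and narrowed afterwards. This is a sketch with hypothetical function names, not the SSE2/AVX2 code. The shift amount is a parameter because the message does not state it.

```c
#include <assert.h>
#include <stdint.h>

/* floor(x / 2^shift) without right-shifting a negative value. */
int32_t floor_shift(int32_t x, int shift) {
  const int32_t d = (int32_t)1 << shift;
  const int32_t q = x / d;
  assert(shift >= 0 && shift < 31);
  return (x % d != 0 && x < 0) ? q - 1 : q;
}

/* Nonzero when a0 + a1 does not fit in 16 bits, i.e. when a 16-bit add
 * would produce a wrong result. */
int sum_overflows_int16(int16_t a0, int16_t a1) {
  const int32_t sum = (int32_t)a0 + (int32_t)a1;
  return sum > INT16_MAX || sum < INT16_MIN;
}

/* Adds at 32 bits, shifts, then narrows back to 16 bits. */
int16_t add_shift_narrow(int16_t a0, int16_t a1, int shift) {
  const int32_t sum = (int32_t)a0 + (int32_t)a1; /* up to 17 bits */
  const int32_t shifted = floor_shift(sum, shift);
  assert(shifted >= INT16_MIN && shifted <= INT16_MAX);
  return (int16_t)shifted;
}
```

For example, 30000 + 30000 overflows int16_t, yet the shifted result fits once the addition is done at 32 bits.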
impace -> impact
taget -> target
prediciton -> prediction
addtion -> addition
the the -> the
Bug: webm:1803
Change-Id: I759c9d930a037ca69662164fcd6be160ed707d77
Dont -> Don't
setings -> settings
thresold -> thresh
thresold -> threshold
becasue -> because
itterations -> iterations
its a -> it's a
an constant -> a constant
Bug: webm:1803
Change-Id: I1e019393939ed25c59c898c88d4941ec360b026d
In the function vp9_diamond_search_sad_avx(), the cost vector
is now arranged in a specific order. This ensures that the
motion vector with the lowest index is selected when more
than one candidate motion vector has the minimum cost,
which resolves the C vs AVX mismatch.
STATS_CHANGED
Change-Id: I4f8864f464f9ea2aae6250db3d8ad91cb08b26e2
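The tie-breaking rule described above (prefer the lowest index among equal minimum costs) looks like this in scalar form. It is a sketch with a hypothetical function name and does not reproduce the AVX implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the minimum cost. The strict '<' comparison keeps
 * the earliest index when several candidates share the minimum. */
size_t lowest_index_of_min(const unsigned int *cost, size_t n) {
  size_t best = 0;
  size_t i;
  assert(cost != NULL && n > 0);
  for (i = 1; i < n; ++i) {
    if (cost[i] < cost[best]) best = i;
  }
  return best;
}
```

A vectorized search has to reproduce this ordering explicitly, since a horizontal minimum on its own does not say which lane the minimum came from.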
Double the number of accumulator registers to remove the bottleneck.
Also peel the first loop iteration.
Change-Id: I6a90680369f9c33cdfe14ea547ac1569ec3f50de
* changes:
vpx_dsp_common.h,clip_pixel: work around VS2022 Arm64 issue
fdct_partial_neon.c: work around VS2022 Arm64 issue
fdct8x8_test.cc: work around VS2022 Arm64 issue
New file (vpx_tpl.c) in the following CLs will add new APIs dealing with
TPL stats from VP9 encoder.
Change-Id: I5102ef64214cba1ca6ecea9582a19049666c6ca4
This CL refactors the code related to the convolve functions.
It also improves the AVX2 intrinsics for the vertical
convolve with w = 4 and the horizontal convolve with
w = 16.
Please note the module level scaling w.r.t C function
(timer based) for existing (AVX2) and new AVX2 intrinsics:
Block size   Scaling, AVX2 (existing)   Scaling, AVX2 (new)
4x4          5.34x                      5.91x
4x8          7.10x                      7.79x
16x8         23.52x                     25.63x
16x16        29.47x                     30.22x
16x32        33.42x                     33.44x
This is a bit exact change.
Change-Id: If130183bc12faab9ca2bcec0ceeaa8d0af05e413
2D 8-tap convolution filtering is performed in two passes -
horizontal and vertical. The horizontal pass must produce enough
input data for the subsequent vertical pass - 3 rows above and 4 rows
below, in addition to the actual block height.
At present, all Neon horizontal convolution algorithms process 4 rows
at a time, but this means we end up doing at least 1 row too much
work in the 2D first pass case where we need h + 7, not h + 8 rows of
output.
This patch adds additional dot-product (SDOT and USDOT) Neon paths
that process h + 7 rows of data exactly, saving the work of the
unnecessary extra row. It is impractical to take a similar approach
for the Armv8.0 MLA paths since we have to transpose the data block
both before and after calling the convolution helper functions.
vpx_convolve_neon performance impact: we observe a speedup of ~9% for
smaller (and wider) blocks, and a speedup of 0-3% for larger blocks.
This is to be expected since the proportion of redundant work
decreases as the block height increases.
Change-Id: Ie77ad1848707d2d48bb8851345a469aae9d097e1
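The row counts discussed above can be worked through with a short sketch. The function names are hypothetical. The numbers follow the message: the first pass needs 3 rows above and 4 rows below the block, and the existing paths process rows in groups of 4.

```c
#include <assert.h>

/* Rows the horizontal pass must produce for a block of height h:
 * 3 above + h + 4 below = h + 7. */
int rows_needed(int h) {
  assert(h > 0);
  return h + 3 + 4;
}

/* Rows actually computed when working in groups of 4: rows_needed()
 * rounded up to the next multiple of 4. */
int rows_processed_in_groups_of_4(int h) {
  return (rows_needed(h) + 3) / 4 * 4;
}
```

For h = 8 this gives 15 rows needed and 16 processed. The one redundant row is a smaller share of the work as h grows, in line with the larger speedup reported for smaller blocks.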
This avoids link errors related to the sanitizers:
https://clang.llvm.org/docs/AddressSanitizer.html#usage
"When linking shared libraries, the AddressSanitizer run-time is not
linked, so -Wl,-z,defs may cause link errors ..."
See also:
https://crbug.com/aomedia/3438
Bug: webm:1801
Fixed: webm:1801
Change-Id: Ie212318005a5f7222e5486775175534025306367
1) Use #define constant instead of magic numbers for right shifts.
2) Move saturating narrow into helper functions that return 4-element
result vectors.
3) Use mem_neon.h helpers for load/store sequences in Armv8.0 paths.
4) Tidy up: assert conditions and some longer variable names.
5) Prefer != 0 to > 0 where possible for loop termination conditions.
Change-Id: Idfcac43ca38faf729dca07b8cc8f7f45ad264d24
* changes:
vp8_[cd]x_iface: clear setjmp flag on function exit
vp9_decodeframe,tile_worker_hook: relocate setjmp=1
vp9,encoder_set_config: set setjmp flag after setjmp()
rather than define new targets, add a platform to the arm64 list as they
share the same configuration.
Bug: webm:1788
Change-Id: Iac020280b1103fb12b559f21439aeff26568fba4
x86 and armv7 are skipped for now as the intrinsics will need different
flags than cl.exe (/arch:... -> -m...).
Bug: webm:1788
Change-Id: I8ca8660a8644cdd84c51cb1f75005e371ba8207d
Contains the size of GOP - also the size of the list of TPL stats for
each frame in this GOP.
VpxTplGopStats will be the unit for VP9E_GET_TPL_STATS control to return
TPL stats from the encoder.
Bug: b/273736974
Change-Id: I1682242fc6db4aafcd6314af023aa0d704976585
There were multiple implementations of CHECK_MEM_ERROR across the
library that take different arguments and are used in different places.
This CL will unify them and have only one implementation that takes
vpx_internal_error_info.
Change-Id: I2c568639473815bc00b1fc2b72be56e5ccba1a35
* changes:
Overwrite cm->error->detail before freeing
Have vpx_codec_error take const vpx_codec_ctx_t *
Add comments about vpx_codec_enc_init_ver failure
Added AVX2 intrinsic optimization for the following functions
1. vpx_idct16x16_256_add
2. vpx_idct32x32_1024_add
3. vpx_idct32x32_135_add
The module level scaling w.r.t C function (timer based) for
existing (SSE2) and new AVX2 intrinsics:
Function name            Scaling, SSE2   Scaling, AVX2
vpx_idct32x32_1024_add   3.62x           7.49x
vpx_idct32x32_135_add    4.85x           9.41x
vpx_idct16x16_256_add    4.82x           7.70x
This is a bit-exact change.
Change-Id: Id9dda933aa1f5093bb6b35ac3b8a41846afca9d2
Help detect use after free of the return value of
vpx_codec_error_detail(). If vpx_codec_error_detail() is called after
vpx_codec_encode() fails, the return value may be equal to
cm->error->detail, which is freed when vpx_codec_destroy() is called.
Document the lifetime of the string returned by
vpx_codec_error_detail().
Change-Id: I8089e90a4499b4f3cc5b9cfdbb25d72368faa319
Also have vpx_codec_error_detail take vpx_codec_ctx_t *. Both functions
are getter functions that don't modify the codec context.
Change-Id: I4689022425efbf7b1da5034255ac052fce5e5b4f
Address the questions:
1. If vpx_codec_enc_init_ver() fails, should I still call
vpx_codec_destroy() on the encoder context?
2. Is it safe to call vpx_codec_error_detail() when
vpx_codec_enc_init_ver() failed?
Change-Id: I1b0e090d11dd9f853fe203f4cbb6080c3c7b0506
I realized the calculation of the size of the list of VpxTplBlockStats
is non-trivial. So it's better to add the field for the size.
Bug: b/273736974
Change-Id: Ic1b50597c1f89a8f866b5669ca676407be6dc9d8
This allows AArch64 to be correctly detected when building with Visual
Studio (cl.exe) and fixes a crash in vp9_diamond_search_sad_neon.c.
There are still test failures, however.
Microsoft's compiler doesn't define __ARM_FEATURE_*. To use those paths
we may need to rely on _M_ARM64_EXTENSION.
Bug: webm:1788
Bug: b/277255076
Change-Id: I4d26f5f84dbd0cbcd1cdf0d7d932ebcf109febe5
This will allow identifying Windows Visual Studio targets as aarch64;
the Microsoft compiler does not define __aarch64__.
An alternative would be to define this in the code, checking for
_M_ARM64 or _M_ARM64EC. For now we'll use the existing VPX_ARCH_*
system. For compatibility VPX_ARCH_ARM will continue to be defined to 1
in this case.
Bug: webm:1788
Bug: b/277255076
Change-Id: I12e25710891e86f0c7339ba96884c18ed90ba16f
Get ready for changes to follow:
- Custom reader/writer IO functions
- Codec control to get TPL stats from the encoder
Move the definition of TplFrameStats to public header so applications
can use them directly.
Bug: b/273736974
Change-Id: Ieb0db4560ddd966df1bc01f6a7e179cc97f9bac1
Joint motion search during compound mode eval is optimized by
reducing the number of mv search iterations based on bsize.
The sf 'comp_inter_joint_search_thresh' is renamed as
'comp_inter_joint_search_iter_level' and used to add the logic.
cpu   Testset   Instr. Cnt   BD Rate loss (%)
                Red (%)      avg. psnr   ovr.psnr   ssim
0     LOWRES2   5.373        0.0917      0.1088     0.0294
0     MIDRES2   3.395        0.0239      0.0520     0.0783
0     HDRES2    2.291        0.0223      0.0301     0.0053
0     Average   3.686        0.0460      0.0636     0.0377
STATS_CHANGED
Change-Id: I7ee8873ebc8af967382324ae8f5c70c26665d5e6
This is a reland of commit 3c59378e4e
Addressed issues from the previous CL:
- Both recon_error and rate_cost are scaled up
- recon_error and rate_cost are not accumulated across ref frames,
instead they are calculated with the best ref frame picked.
- get_quantize_error() is put where it was, so there is no behavior
change for vp9.
Bug: b/273736974
Original change's description:
> Calculate recrf_dist and recrf_rate
>
> Change-Id: I74e74807436b92d729e2ccaab96149780f1f52d9
Change-Id: I20e1f5543e83b576a074bd4e6b44d99da65f4b56
This reverts commit 3c59378e4e.
Reason for revert:
recon_error and recon_rate are summed by mistake across reference frames, as pointed out by Angie.
It could also cause vp9 behavior changes.
Original change's description:
> Calculate recrf_dist and recrf_rate
>
> Change-Id: I74e74807436b92d729e2ccaab96149780f1f52d9
Change-Id: I6106ce77cb0fe8c12b2bcf070d01513ffa8dc613
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
This allows the testdata target to work in environments like cygwin/msys
when a windows style path is used. It may also fix using paths with
spaces, though that's not generally recommended.
Change-Id: Id444c14468b05d589bce49c1f612aa712a3f0c8c
in get_rdmult_delta() and compute_frame_aq_offset().
quiets -Wunused-but-set-variable with clang-17
Change-Id: I726852f3bc42afa80a18475de910040a9436b0bb
Add Neon implementations of high bitdepth downsampling SAD4D
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ib0c2f852e269cbd6cbb8f4dfb54349654abb0adb
Add Neon implementations of standard bitdepth downsampling SAD4D
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ieb77661ea2bbe357529862a5fb54956e34e8d758
Add Neon implementations of high bitdepth downsampling SAD functions
for all block sizes.
Also add corresponding unit tests.
Change-Id: I56ea656e9bb5f8b2aedfdc4637c9ab4e1951b31b
Add Neon implementations of standard bitdepth downsampling SAD
functions for all block sizes.
Also add corresponding unit tests.
Change-Id: Ibda734c270278d947673ffcc29ef17a2f4970b01
Introduced AVX2 intrinsic to compute FDCT for block size
16x16 case. This is a bit-exact change.
Please check the module level scaling w.r.t C function (timer based)
for existing (SSE2) and new AVX2 intrinsics:
Scaling
SSE2 AVX2
3.88x 5.95x
Change-Id: I02299c3746fcb52d808e2a75d30aa62652c816dc
I believe the following comments are wrongly scoped, possibly left over
from previous changesets. This made me very confused when reading the
test suite Makefile, in order to port it to Meson.
Change-Id: Ice3c7ba50c6909a9c7dfd4001afa1e1ddfa4b5ce
New linear model to calculate loopfilter level from frame qp.
Linear regression was done on qvga, vga, and hd clips.
Bug: b/275304642
Change-Id: I552b312212bb4de21b53b762d139aa9588c64ae2
Added an assert for prune_single_mode_based_on_mv_diff_mode_rate
speed feature. This ensures NEARMV or ZEROMV modes are pruned
only when NEARESTMV and NEWMV modes are not early terminated.
Change-Id: Id8b03eef6d1ef3f16714a9cbfde0c171c0c6fe0b
Pack nz_mask with zero. After the result is permuted this has the effect
of ignoring the upper half of the iscan register which is only loaded
with 128-bits. Depending on the optimization level and the load used the
upper half of the ymm register may contain undefined values which can
produce an incorrect eob. If this is large enough it can cause a crash.
Bug: chromium:1431729
Change-Id: I4ebae9fa39f228bdd29dcc19935f3f07759d75f5
Add a widening 4D reduction function operating on uint16x8_t vectors
and use it to optimize the final reduction in Armv8.0 Neon standard
bitdepth 16xh, 32xh and 64xh SAD4D computations.
Also simplify the Armv8.0 Neon version of the sad64xhx4d_neon helper
function since VP9 block sizes are not large enough to require
widening to 32-bit accumulators before the final reduction.
Change-Id: I32b0a283d7688d8cdf21791add9476ed24c66a28
Add a 4D reduction function operating on uint16x8_t vectors and use
it to optimize the final reduction in standard bitdepth 4xh and 8xh
SAD4D computations. Similar 4D reduction optimizations have already
been implemented for all other standard bitdepth block sizes, and all
high bitdepth block sizes.[1]
[1] https://chromium-review.googlesource.com/c/webm/libvpx/+/4224681
Change-Id: I0aa0b6e0f70449776f316879cafc4b830e86ea51
Added AVX2 intrinsic optimization for the following functions
1. vpx_variance8x4
2. vpx_variance8x8
3. vpx_variance8x16
This is a bit-exact change.
cpu   Resolution   Instruction Count Reduction (%)
0     LOWRES2      0.698
0     MIDRES2      0.577
0     HDRES2       0.469
0     Average      0.582
Change-Id: Iae8fdf9344fd012cda4955ed140633141d60ba86
The shift instructions have marginally worse performance on some
micro-architectures, and the vget_{low,high} instructions are
unnecessary.
This commit improves performance of the d135 predictors by 1.5% geomean
averaged across a range of compilers and micro-architectures.
Change-Id: Ied4c3eecc12fc973841696459d868ce403ed4e6c
Use sum_neon.h helpers for horizontal reductions in Neon DC predictors,
enabling use of dedicated Neon reduction instructions on AArch64. Some
of the surrounding code is also optimized to remove redundant broadcast
instructions in the dc_store helpers.
Performance is largely unchanged on both the standard as well as the
high bit-depth predictors. The main improvement appears to be the 16x16
standard-bitdepth dc predictor, which improves by 10-15% when
benchmarked on Neoverse N1.
Change-Id: Ibfcc6ecf4b1b2f87ce1e1f63c314d0cc35a0c76f
* changes:
Avoid LD2/ST2 instructions in highbd v predictors in Neon
Avoid interleaving loads/stores in Neon for highbd dc predictor
Avoid LD2/ST2 instructions in vpx_dc_predictor_32x32_neon
For these block sizes there is no need to widen to 32-bits until the
final reduction, so use a single vabaq instead of vabd + vpadalq.
Change-Id: I9c19d620f7bb8b3a6b0bedd37789c03bb628b563
The interleaving load/store instructions (LD2/LD3/LD4 and ST2/ST3/ST4)
are useful if we are dealing with interleaved data (e.g. real/imag
components of complex numbers), but for simply loading or storing larger
quantities of data it is preferable to simply use the normal load/store
instructions.
This patch replaces such occurrences in the two larger block sizes:
vpx_highbd_v_predictor_16x16_neon and vpx_highbd_v_predictor_32x32_neon.
Change-Id: Ie4ffa298a2466ceaf893566fd0aefe3f66f439e4
The interleaving load/store instructions (LD2/LD3/LD4 and ST2/ST3/ST4)
are useful if we are dealing with interleaved data (e.g. real/imag
components of complex numbers), but for simply loading or storing larger
quantities of data it is preferable to simply use two or more of the
normal load/store instructions.
This patch replaces such occurrences in the two larger block sizes:
vpx_highbd_dc_predictor_16x16_neon, vpx_highbd_dc_predictor_32x32_neon,
and related helper functions.
Speedups over the original Neon code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 16x16 | 1.25
Neoverse N1 | LLVM 15 | 32x32 | 1.13
Neoverse N1 | GCC 12 | 16x16 | 1.56
Neoverse N1 | GCC 12 | 32x32 | 1.52
Neoverse V1 | LLVM 15 | 16x16 | 1.63
Neoverse V1 | LLVM 15 | 32x32 | 1.08
Neoverse V1 | GCC 12 | 16x16 | 1.59
Neoverse V1 | GCC 12 | 32x32 | 1.37
Change-Id: If5ec220aba9dd19785454eabb0f3d6affec0cc8b
The LD2 and ST2 instructions are useful if we are dealing with
interleaved data (e.g. real/imag components of complex numbers), but for
simply loading or storing larger quantities of data it is preferable to
simply use two of the normal load/store instructions.
This patch replaces such occurrences in vpx_dc_predictor_32x32_neon and
related functions.
With Clang-15 this speeds up this function by 10-30% depending on the
micro-architecture being benchmarked on. With GCC-12 this speeds up the
function by 40-60% depending on the micro-architecture being benchmarked
on.
Change-Id: I670dc37908aa238f360104efd74d6c2108ecf945
The existing tests duplicate `above_row_[block_size - 1]` after the
first `block_size` elements, which can lead to tests incorrectly passing
due to differing behaviour when calculating the average for the last
elements of the output.
This change adjusts the above array setup to be fully random instead,
allowing us to catch such issues here rather than in other larger tests
like the external MD5 tests.
It doesn't appear that other architectures are fully clean with this
change so restrict it to just Neon for now until they are fixed.
Bug: webm:1797
Change-Id: If83ff1adbf1e8d30f2a92474d7186c65840a5d0b
The existing standard bitdepth implementation doesn't appear to manifest
as a failure in any of the predictor or MD5 tests, but it does rely on
the predictor tests filling the second `bs` elements of the `above`
input array with copies of `above[bs - 1]` in order to match the C
implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
The geomean of performance for the predictor is approximately a 2%
slowdown compared to the previous vectorized implementation. This is
still considerably faster than the unspecialized naive C implementation.
Bug: webm:1797
Change-Id: I8fb00a154288d54b24a72a7ff63c816bdcf3aca3
The existing implementation doesn't appear to manifest as a failure in
any of the predictor or MD5 tests, but it does rely on the predictor
tests filling the second `bs` elements of the `above` input array with
copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
Performance of the predictor is mostly unchanged, except for the 32x32
block size where it appears to have gotten about 40% faster when
compiled with clang-15.
Bug: webm:1797
Change-Id: Iaad58e77c5467307a3c80d6989b7cf2988e09311
The existing implementation doesn't appear to manifest as a failure in
any of the predictor or MD5 tests, but it does rely on the predictor
tests filling the second `bs` elements of the `above` input array with
copies of `above[bs - 1]` in order to match the C implementation.
This patch adjusts the Neon implementation to correctly match the C
implementation in the case where the elements of the `above` array all
differ.
Performance of the predictor is mostly unchanged, except for the 16x16
block size where it appears to have gotten marginally faster across most
compiler/micro-architecture combinations.
Bug: webm:1797
Change-Id: Iac166d6047316c0382e0f2790ce780fc99674b43
Introduced AVX2 intrinsic to compute convolve vertical for
w = 4 case. This is a bit-exact change.
Instruction Count
cpu | Resolution | Reduction(%)
0 | LOWRES2 | 0.364
0 | MIDRES2 | 0.236
0 | HDRES2 | 0.162
0 | Average | 0.254
Change-Id: I413f58aa6333a6f2421d4c10d49dec01e55b2098
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
03f6fdcfca Fix warnings reported by -Wshadow: Part1b: scan_order struct
and variable
Bug: webm:1793
Change-Id: Ide5127886b7fd7778e6d8a983bfba6edda21ff28
Fix comment typos for vpx_codec_destroy() and vpx_codec_enc_init_ver().
Based on the change made in libaom:
https://aomedia.googlesource.com/aom/+/365a968684
365a968684 Fix comment typos (likely copy-and-paste errors)
Change-Id: I39edae835ed0752b569e8e7328d0709c59724ac2
This reverts commit 9c15fb62b3.
Reason for revert:
vpxenc should only use public interface
Original change's description:
> Add codec control to get tpl stats
>
> Add command line flag to vpxenc to export tpl stats
>
> Bug: b/273736974
> Change-Id: I6980096531b0c12fbf7a307fdef4c562d0c29e32
Bug: b/273736974
Change-Id: Ifa8951bb34e5936bbfc33086b22e9fc36d379bc9
Add Neon implementation of vpx_highbd_avg_4x4_c and vpx_highbd_avg_8x8_c
as well as the corresponding tests.
Change-Id: Ib1b06af5206774347690c9c56e194b76aa409c91
Shift the final read from the source by 3 to avoid breaking the
assumption that the 6-tap filter needs only 5 pixels outside of the
macroblock; this matches the sse2 and ssse3 implementations.
It's possible this restriction could be removed if the source buffers
are assumed to be padded.
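A minimal sketch of the idea in portable C rather than Neon. A 16-pixel-wide
6-tap filter needs 16 + 5 = 21 source bytes per row, so three 8-byte loads at
offsets 0, 8 and 16 would touch 24 bytes. Moving the last load back by 3 keeps
it inside the 21 bytes. The function name and the flat output layout are
hypothetical and are not the libvpx code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { kRowBytes = 16 + 5 }; /* 16 output pixels plus 5 extra for 6 taps */

/* Gathers one 21-byte source row using three 8-byte reads. The last read
 * starts at offset 13 instead of 16, so it ends at byte 20 and never
 * touches bytes 21..23. Its first 3 bytes repeat bytes 13..15. */
static void load_row_21(const uint8_t *src, uint8_t out[kRowBytes]) {
  uint8_t lo[8], mid[8], hi[8];
  assert(src != NULL);
  memcpy(lo, src, 8);      /* bytes 0..7 */
  memcpy(mid, src + 8, 8); /* bytes 8..15 */
  memcpy(hi, src + 13, 8); /* bytes 13..20, shifted back by 3 */
  memcpy(out, lo, 8);
  memcpy(out + 8, mid, 8);
  memcpy(out + 16, hi + 3, 5); /* keep only bytes 16..20 */
}
```

In a vector implementation the overlapping lanes would be discarded or
realigned in registers instead of with a second copy.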
Bug: webm:1795
Change-Id: I4c791e3a214898a503c78f4cedca154c75cdbaef
Fixed: webm:1795
The code to enable trellis coefficient optimization
is refactored using the sf 'trellis_opt_tx_rd'. This
change facilitates adaptive skipping of trellis
optimization based on block properties.
Change-Id: Ia1ff7cbbe5acf86414410f62655d46c099387847
This is a reland of commit 14fc40040f
Parent change fixed in crrev.com/c/webm/libvpx/+/4305500
Original change's description:
> quantize: use scan_order instead of passing scan/iscan
>
> further reduces the arguments for the 32x32. This will be applied to the base
> version as well.
>
> Change-Id: I25a162b5248b14af53d9e20c6a7fa2a77028a6d1
Change-Id: I2a7654558eaddd68bd09336bf317b297f18559d2
This is a reland of commit 573f5e662b
Alignment issue with tests fixed in crrev.com/c/webm/libvpx/+/4305500
Original change's description:
> quantize: simplify highbd 32x32_b args
>
> Change-Id: I431a41279c4c4193bc70cfe819da6ea7e1d2fba1
Change-Id: Ic868b6f987c99d88672858fedd092fa49c125e19
Change the VP9RateControlRtcConfig constructor to initialize
ss_number_layers (to 1).
Change UpdateRateControl() to return bool so that it can report failure
(due to invalid configuration).
Also change InitRateControl() to return bool to propagate the return
value of UpdateRateControl().
Note: This is a port of the libaom CL
https://aomedia-review.googlesource.com/c/aom/+/172042.
Change-Id: I90b60353b5f15692dba5d89e7b1a9c81bb2fdd89
The code that sets oxcf->ts_rate_decimator[tl] does not need to be
inside a loop that iterates over sl. Move the code out of the sl loop so
that oxcf->ts_rate_decimator[tl] is set only once.
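A rough sketch of the loop structure after the change. The struct, the
function name and the placeholder per-layer work are hypothetical; only the
ts_rate_decimator field name comes from the description above.

```c
#include <assert.h>

enum { kMaxTemporalLayers = 5 };

/* Hypothetical stand-in for the relevant part of the encoder config. */
typedef struct {
  int ts_rate_decimator[kMaxTemporalLayers];
} LayerConfigSketch;

/* The decimator depends only on the temporal layer index tl, so it is
 * written in a loop over tl alone. Work indexed by (sl, tl) stays in the
 * nested loop. Returns the number of (sl, tl) pairs visited. */
static int set_layer_config(LayerConfigSketch *oxcf, const int *decimators,
                            int num_sl, int num_tl) {
  int sl, tl, per_layer_updates = 0;
  assert(num_tl <= kMaxTemporalLayers);
  for (tl = 0; tl < num_tl; ++tl) {
    oxcf->ts_rate_decimator[tl] = decimators[tl];
  }
  for (sl = 0; sl < num_sl; ++sl) {
    for (tl = 0; tl < num_tl; ++tl) {
      ++per_layer_updates; /* placeholder for settings indexed by (sl, tl) */
    }
  }
  return per_layer_updates;
}
```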
Change-Id: I22f6c117d200ec38a757b749a8700660d15436c1
Remove the `ts_number_layers` field from VP9RateControlRtcConfig because
the base class VpxRateControlRtcConfig already has that field.
Note: In commit 65a1751e5b,
`ts_number_layers` was moved to the newly created base class
VpxRateControlRtcConfig but was inadvertently left in
VP9RateControlRtcConfig:
https://chromium-review.googlesource.com/c/webm/libvpx/+/3140048
Change-Id: I98d48e152683ec2e5e62efffb56b7f010c5d0695
Introduced AVX2 intrinsic to compute convolve horizontal for
w = 4 case. This is a bit-exact change.
Instruction Count
cpu | Resolution | Reduction(%)
0 | LOWRES2 | 0.763
0 | MIDRES2 | 0.466
0 | HDRES2 | 0.317
0 | Average | 0.516
Change-Id: I124f3f8e994c24461812f4963b113819466db44f
Optimize vpx_minmax_8x8_neon on AArch64 targets by using the UMAXV and
UMINV instructions - computing the maximum and minimum elements in a
Neon vector.
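For reference, a scalar sketch of what the function is assumed to compute:
the minimum and maximum absolute difference over an 8x8 block. The name and
signature here are illustrative. A vector version would keep per-lane running
minima and maxima and then reduce across lanes once at the end, which is the
step the UMINV and UMAXV instructions cover.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: min and max absolute difference over an 8x8 block.
 * p and dp are the row strides of s and d. */
static void minmax_8x8_sketch(const uint8_t *s, int p, const uint8_t *d,
                              int dp, int *min, int *max) {
  int i, j;
  assert(min != NULL && max != NULL);
  *min = 255;
  *max = 0;
  for (i = 0; i < 8; ++i, s += p, d += dp) {
    for (j = 0; j < 8; ++j) {
      const int diff = abs(s[j] - d[j]);
      if (diff < *min) *min = diff;
      if (diff > *max) *max = diff;
    }
  }
}
```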
Change-Id: I54c3a3a087d266f6774e6113e5947253df288a64
Optimize Neon implementation of vpx_satd by using ABD and UADALP instead
of ABAL and ABAL2, splitting the accumulator and using a dedicated
helper function to perform the final reduction.
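A scalar sketch of the accumulator split, assuming satd here means the sum of
absolute coefficient values. The two running sums are independent, so they
can progress in parallel, and the reduction happens once at the end. The name
and the int32_t coefficient type are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute values using two independent accumulators. */
static int satd_sketch(const int32_t *coeff, int length) {
  int acc0 = 0, acc1 = 0, i;
  assert(length % 2 == 0);
  for (i = 0; i < length; i += 2) {
    acc0 += abs(coeff[i]);     /* take |x| first, then accumulate */
    acc1 += abs(coeff[i + 1]); /* second chain, independent of acc0 */
  }
  return acc0 + acc1; /* final reduction */
}
```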
Change-Id: Idcfa49e001b68b1dcd87c13fd9acc317a208cd2a
Both are around 3x faster than the original C version. 8-bit gives a
small 0.5% speed increase, whereas highbd gives ~2.5%.
Change-Id: I71d75ddd2757b19aa201e879fd9fa8f3a25431ad
Introduced AVX2 intrinsic to compute convolve vertical for
w = 8 case. This is a bit-exact change.
Instruction Count
cpu | Resolution | Reduction(%)
0 | LOWRES2 | 1.347
0 | MIDRES2 | 1.046
0 | HDRES2 | 0.805
0 | Average | 1.066
Change-Id: Idf77fff054beaf2c985b9bf2335591bda47e811f
Rename the function to 'build_inter_pred_model_rd_earlyterm' and
add a comment explaining its behavior.
Change-Id: I804e6273558ba36241232f62cf18ea754b85e369
The high bitdepth Neon code applying the first pass of the bilinear
filter for subpixel variance on blocks of width 4 processed two rows
at a time. This resulted in a source buffer overread, attempting to
produce two rows of padding for the second (vertical) pass of the
bilinear filter.
This patch modifies highbd_var_filter_block2d_bil_w4 and
highbd_avg_pred_var_filter_block2d_bil_w4 such that they only process
a single row per iteration, and only require a single row of padding
for the second pass. This prevents the buffer overread.
Since all block sizes are now processed one row at a time, there is
no need for a "padding" macro parameter - the value is always 1, with
no special case for 4xh blocks. As well as re-enabling the Neon paths
and their associated tests, we remove the now-redundant 'padding'
macro parameter.
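A scalar sketch of a one-row-per-iteration first pass for width 4. The pass
writes h + 1 rows, which is an odd count for h = 4 or h = 8, so a loop that
consumes two rows per iteration would touch row h + 2. The name, signature
and the 7-bit filter convention (f0 + f1 == 128) are assumptions for
illustration, not the libvpx helpers named above.

```c
#include <assert.h>
#include <stdint.h>

/* First (horizontal) bilinear pass for a width-4 block, one row per
 * iteration. It writes h + 1 rows because the vertical pass needs one
 * extra row. Each row reads 4 + 1 source pixels. */
static void first_pass_w4(const uint16_t *src, int src_stride, uint16_t *dst,
                          int h, int f0, int f1) {
  int i, j;
  assert(f0 + f1 == 128);
  for (i = 0; i < h + 1; ++i) {
    for (j = 0; j < 4; ++j) {
      dst[j] = (uint16_t)((src[j] * f0 + src[j + 1] * f1 + 64) >> 7);
    }
    src += src_stride;
    dst += 4;
  }
}
```

With a source buffer of exactly (h + 1) rows of 5 pixels, every read stays in
bounds.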
Bug: webm:1796
Change-Id: Icd6076b38eb4476139795bb1734ca800c9edf079
vpx_highbd_8_sub_pixel_avg_variance4x4_neon
vpx_highbd_8_sub_pixel_avg_variance4x8_neon
vpx_highbd_10_sub_pixel_avg_variance4x4_neon
vpx_highbd_10_sub_pixel_avg_variance4x8_neon
vpx_highbd_12_sub_pixel_avg_variance4x4_neon
vpx_highbd_12_sub_pixel_avg_variance4x8_neon
all cause heap overflows of the form:
[ RUN ] NEON/VpxHBDSubpelAvgVarianceTest.Ref/33
=================================================================
==535205==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff95bb0b89 at pc 0x00000116dabc bp 0xffffd09f6430 sp 0xffffd09f6428
READ of size 8 at 0xffff95bb0b89 thread T0
#0 0x116dab8 in load_unaligned_u16q vpx_dsp/arm/mem_neon.h:176:3
#1 0x116dab8 in highbd_var_filter_block2d_bil_w4
vpx_dsp/arm/highbd_subpel_variance_neon.c:49:21
#2 0x116dab8 in vpx_highbd_8_sub_pixel_avg_variance4x4_neon
vpx_dsp/arm/highbd_subpel_variance_neon.c:543:1
...
0xffff95bb0b89 is located 0 bytes to the right of 73-byte region
[0xffff95bb0b40,0xffff95bb0b89)
allocated by thread T0 here:
#0 0x5f18b0 in malloc (test_libvpx+0x5f18b0)
#1 0xce4a40 in vpx_memalign vpx_mem/vpx_mem.c:62:10
#2 0xce4a40 in vpx_malloc vpx_mem/vpx_mem.c:70:40
#3 0xa52238 in (anonymous namespace)::SubpelVarianceTest<unsigned
int (*)(unsigned char const*, int, int, int, unsigned char
const*, int, unsigned int*, unsigned char
const*)>::SetUp()
test/variance_test.cc:586:14
...
This is the same issue as:
e33d4c276 disable vpx_highbd_*_sub_pixel_variance4x{4,8}_neon
They have highbd_var_filter_block2d_bil_w4 in common.
Bug: webm:1796
Change-Id: I3ed70d0ba22e127720542612ea9f6665948eedfc
vpx_highbd_8_sub_pixel_variance4x4_neon
vpx_highbd_8_sub_pixel_variance4x8_neon
vpx_highbd_10_sub_pixel_variance4x4_neon
vpx_highbd_10_sub_pixel_variance4x8_neon
vpx_highbd_12_sub_pixel_variance4x4_neon
vpx_highbd_12_sub_pixel_variance4x8_neon
all cause heap overflows of the form:
[ RUN ] NEON/VpxHBDSubpelVarianceTest.Ref/24
=================================================================
==450528==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8311a571 at pc 0x0000010ca52c bp 0xffffc63e96b0 sp 0xffffc63e96a8
READ of size 8 at 0xffff8311a571 thread T0
#0 0x10ca528 in load_unaligned_u16q vpx_dsp/arm/mem_neon.h:176:3
#1 0x10ca528 in highbd_var_filter_block2d_bil_w4
vpx_dsp/arm/highbd_subpel_variance_neon.c:49:21
#2 0x10ca528 in vpx_highbd_10_sub_pixel_variance4x8_neon
vpx_dsp/arm/highbd_subpel_variance_neon.c:257:1
...
0xffff8311a571 is located 0 bytes to the right of 113-byte region
[0xffff8311a500,0xffff8311a571)
allocated by thread T0 here:
#0 0x5f18b0 in malloc (test_libvpx+0x5f18b0)
#1 0xce4f90 in vpx_memalign vpx_mem/vpx_mem.c:62:10
#2 0xce4f90 in vpx_malloc vpx_mem/vpx_mem.c:70:40
#3 0xa4ad44 in (anonymous namespace)::SubpelVarianceTest<unsigned
int (*)(unsigned char const*, int, int, int, unsigned char
const*, int, unsigned int*)>::SetUp() test/variance_test.cc:586:14
Bug: webm:1796
Change-Id: I39f7f936bae2bcbbe1f803fb10375ec02d1c1277
* changes:
Implement highbd_d207_predictor using Neon
Implement highbd_d153_predictor using Neon
Implement d207_predictor using Neon
Implement d153_predictor using Neon
Implement highbd_d63_predictor using Neon
Introduced AVX2 intrinsic to compute convolve horizontal for
w = 8 case. This is a bit-exact change.
Instruction Count
cpu | Resolution | Reduction(%)
0 | LOWRES2 | 1.509
0 | MIDRES2 | 1.165
0 | HDRES2 | 0.898
0 | Average | 1.191
Change-Id: I699c94aa3d7ea74c58f901df906eed0b81b4ee79
horizontal_add_int64x2 was incorrectly returning a uint64_t instead of
an int64_t. This patch fixes that.
Change-Id: Ic6016cf87aebfc6a14f540b784d6648757e12b49
Currently vp9_block_error_fp_neon is only used when
CONFIG_VP9_HIGHBITDEPTH is set to false. This patch optimizes the
implementation and uses tran_low_t instead of int16_t so that the
function can also be used in builds where vp9_highbitdepth is enabled.
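A scalar sketch of the computation, assuming the function returns the sum of
squared differences between the coefficients and their dequantized values.
The typedef stands in for tran_low_t and is assumed to be 32 bits wide in
high bitdepth builds; the names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t tran_low_sketch_t; /* assumed width in highbd builds */

/* Sum of squared coefficient errors, accumulated in 64 bits so that wide
 * coefficients cannot overflow the total. */
static int64_t block_error_fp_sketch(const tran_low_sketch_t *coeff,
                                     const tran_low_sketch_t *dqcoeff,
                                     int block_size) {
  int64_t error = 0;
  int i;
  assert(block_size > 0);
  for (i = 0; i < block_size; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
  }
  return error;
}
```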
Change-Id: Ibab7ec5f74b7652fa2ae5edf328f9ec587088fd3
Use a mem_neon.h helper to do strided 4-byte loads instead of Neon
8-byte loads - where the last 4 bytes are out of bounds.
Re-enable the Neon code path and the tests.
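A portable sketch of a strided 4-byte load. Two rows of 4 bytes are gathered
into one 8-byte value, so nothing past the fourth byte of either row is read.
The function name is hypothetical; the real helper lives in mem_neon.h and
returns a Neon vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes from buf and 4 bytes from buf + stride into out[0..7]. */
static void load_2x4_bytes(const uint8_t *buf, ptrdiff_t stride,
                           uint8_t out[8]) {
  assert(buf != NULL);
  memcpy(out, buf, 4);              /* row 0, bytes 0..3 only */
  memcpy(out + 4, buf + stride, 4); /* row 1, bytes 0..3 only */
}
```

For a tightly packed 4x4 block, loading the last two rows this way ends
exactly at byte 15 of the 16-byte buffer.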
Bug: webm:1794
Change-Id: I69ccff730f4a5cbf585dd6a9aa0f3eb13e150074
Add an additional 32-bit vector accumulator to allow parallel
processing on CPUs that have more than one Neon multiply-accumulate
pipeline. Also use sum_neon.h horizontal-add helpers for reduction.
Change-Id: Ibcb48a738f5dee1430c3ebcd305b5ea8ea344c40
The load of `left[bs]` in the standard bitdepth d117 Neon implementation
triggered an address-sanitizer failure.
The highbd equivalent does not appear to trigger any asan failures when
running the VP9/ExternalFrameBufferMD5Test or
VP9/TestVectorTest.MD5Match tests, but for consistency with the standard
bitdepth implementation we adjust it to avoid the over-read.
Performance is roughly identical, with a 0.8% performance improvement on
average over the previous optimised code.
Change-Id: I05dc4d43f244f4915c0ccc52cc0af999bbacb018
Add Neon implementations of the d117 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.
This re-lands commit 360e9069b6,
previously reverted in commit 394de691a0.
The implementation is mostly identical to the original but with an
adjustment to how data is loaded from the `left` array. In particular
the left array cannot be guaranteed to be larger than the block size, so
the read of e.g. `left[32]` in the `bs=32` case is not valid. This turns
out to be not a problem since the last lane loaded in this case is
unused. I have added comments in the code to explain why this is the
case.
Since we cannot load the last element directly, we instead construct it
from the previous aligned read. This seems to have an inconsistent
effect on performance, improving by up to 10% in some cases and
regressing by up to 10% on others. Either way it is still significantly
faster than the original C code.
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 1.88
Neoverse N1 | LLVM 15 | 8x8 | 5.19
Neoverse N1 | LLVM 15 | 16x16 | 9.63
Neoverse N1 | LLVM 15 | 32x32 | 13.85
Neoverse N1 | GCC 12 | 4x4 | 2.04
Neoverse N1 | GCC 12 | 8x8 | 4.62
Neoverse N1 | GCC 12 | 16x16 | 9.79
Neoverse N1 | GCC 12 | 32x32 | 4.69
Neoverse V1 | LLVM 15 | 4x4 | 1.75
Neoverse V1 | LLVM 15 | 8x8 | 6.71
Neoverse V1 | LLVM 15 | 16x16 | 9.62
Neoverse V1 | LLVM 15 | 32x32 | 13.81
Neoverse V1 | GCC 12 | 4x4 | 1.75
Neoverse V1 | GCC 12 | 8x8 | 6.01
Neoverse V1 | GCC 12 | 16x16 | 6.91
Neoverse V1 | GCC 12 | 32x32 | 4.39
Change-Id: Ia0977ff0b0eba2c41c7884b64e7c22ff9bc9549d
Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16
and 32x32 block sizes. Also update tests to add new corresponding cases.
This re-lands commit 7cdf139e3d,
previously reverted in 7478b7e4e4.
Compared to the previous implementation attempt we now correctly match
the behaviour of the C code when handling the final element loaded from
the 'above' input array. In particular:
- The C code for a 4x4 block performs a full average of the last element
rather than duplicating the final element from the input 'above'
array.
- The C code for other block sizes performs a full average for the
stride=0 and stride=1, and otherwise shifts in duplicates of the final
element from the input 'above' array. Notably this shifting for later
strides _replaces_ the final element which we previously performed an
average on (see {d0,d1}_ext in the code).
It is worth noting that this difference is not caught by the existing
VP9HighbdIntraPredTest test cases since the test vector initialisation
contains this loop:
for (int x = block_size; x < 2 * block_size; x++) {
above_row_[x] = above_row_[block_size - 1];
}
Since AVG2(a, a) and AVG3(a, a, a) are simply 'a', such differences in
behaviour for the final element are not observed.
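A small sketch of why the padded test input hides the difference. The macros
are assumed to follow the usual rounding form of the C predictors; the helper
name is hypothetical.

```c
#include <assert.h>

#define AVG2(a, b) (((a) + (b) + 1) >> 1)
#define AVG3(a, b, c) (((a) + 2 * (b) + (c) + 2) >> 2)

/* Returns 1 if averaging a run of identical values gives back the same
 * value. When this holds, "average the final element" and "duplicate the
 * final element" produce the same output on a padded row. */
static int padding_hides_final_element(int a) {
  assert(a >= 0);
  return AVG2(a, a) == a && AVG3(a, a, a) == a;
}
```

Once the neighbours differ, for example AVG3(200, 100, 50), the result is 113
rather than a copy of any input, so random test data can expose a wrong final
element.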
Tested on AArch64 with:
- ./test_libvpx --gtest_filter="*VP9HighbdIntraPredTest*"
- ./test_libvpx --gtest_filter="*VP9/TestVectorTest.MD5Match*"
- ./test_libvpx --gtest_filter="*VP9/ExternalFrameBufferMD5Test*"
Speedups over the C code (higher is better):
Microarch. | Compiler | Block | Speedup
Neoverse N1 | LLVM 15 | 4x4 | 2.43
Neoverse N1 | LLVM 15 | 8x8 | 3.92
Neoverse N1 | LLVM 15 | 16x16 | 3.19
Neoverse N1 | LLVM 15 | 32x32 | 4.13
Neoverse N1 | GCC 12 | 4x4 | 2.92
Neoverse N1 | GCC 12 | 8x8 | 6.51
Neoverse N1 | GCC 12 | 16x16 | 4.55
Neoverse N1 | GCC 12 | 32x32 | 3.18
Neoverse V1 | LLVM 15 | 4x4 | 1.99
Neoverse V1 | LLVM 15 | 8x8 | 3.65
Neoverse V1 | LLVM 15 | 16x16 | 3.72
Neoverse V1 | LLVM 15 | 32x32 | 3.26
Neoverse V1 | GCC 12 | 4x4 | 2.39
Neoverse V1 | GCC 12 | 8x8 | 4.76
Neoverse V1 | GCC 12 | 16x16 | 3.24
Neoverse V1 | GCC 12 | 32x32 | 2.44
Change-Id: Iefaa774d6a20388b523eaa7f5df6bc5f5cf249e4
Allocate mb_plane_ on the heap to ensure src is aligned.
Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.
n_coeffs is not used at all for this function.
Change-Id: Ib551f7f583977602504d962b72063bc6eda9dda9
This causes various buffer overflows in the tests:
[ RUN ] NEON/SixtapPredictTest.TestWithPresetData/0
=================================================================
==22346==ERROR: AddressSanitizer: global-buffer-overflow on address
0x0000012b4a5b at pc 0x000000df0f60 bp 0xffffcf6e64b0 sp 0xffffcf6e64a8
READ of size 8 at 0x0000012b4a5b thread T0
#0 0xdf0f5c in vp8_sixtap_predict16x16_neon
vp8/common/arm/neon/sixtappredict_neon.c:1507:13
#1 0x8819e4 in (anonymous
namespace)::SixtapPredictTest_TestWithPresetData_Test::TestBody()
test/predict_test.cc:293:3
...
0x0000012b4a5b is located 2 bytes to the right of global variable
'kTestData' defined in '../test/predict_test.cc:237:24' (0x12b48a0) of
size 441
[ RUN ] NEON/SixtapPredictTest.TestWithRandomData/0
=================================================================
==22338==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8b5321fb at pc 0x000000df0f60 bp 0xfffff7e0cf30 sp 0xfffff7e0cf28
READ of size 8 at 0xffff8b5321fb thread T0
#0 0xdf0f5c in vp8_sixtap_predict16x16_neon
vp8/common/arm/neon/sixtappredict_neon.c:1507:13
#1 0x87d4c0 in (anonymous
namespace)::PredictTestBase::TestWithRandomData(void (*)(unsigned
char*, int, int, int, unsigned char*, int))
test/predict_test.cc:170:9
...
0xffff8b5321fb is located 2 bytes to the right of 441-byte region
[0xffff8b532040,0xffff8b5321f9)
allocated by thread T0 here:
#0 0x5fd4f0 in operator new[](unsigned long) (test_libvpx+0x5fd4f0)
#1 0x87c2e0 in (anonymous namespace)::PredictTestBase::SetUp()
test/predict_test.cc:47:12
#2 0x87d074 in non-virtual thunk to (anonymous
namespace)::PredictTestBase::SetUp() test/predict_test.cc
...
Bug: webm:1795
Change-Id: I32213a381eef91547d00f88acf90f1cf2ec2ea75
This function causes a heap overflow in the tests:
[ RUN ] NEON/VpxSseTest.RefSse/0
=================================================================
==876922==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8949d903 at pc 0x000000dd95d4 bp 0xfffffdd7f260 sp 0xfffffdd7f258
READ of size 8 at 0xffff8949d903 thread T0
#0 0xdd95d0 in vpx_get4x4sse_cs_neon
vpx_dsp/arm/variance_neon.c:556:10
#1 0x9d4894 in (anonymous namespace)::MainTestClass<unsigned int
(*)(unsigned char const*, int, unsigned char const*,
int)>::RefTestSse() test/variance_test.cc:531:5
#2 0x9d4894 in (anonymous
namespace)::VpxSseTest_RefSse_Test::TestBody()
test/variance_test.cc:772:30
...
0xffff8949d903 is located 3 bytes to the right of 16-byte region
[0xffff8949d8f0,0xffff8949d900)
allocated by thread T0 here:
#0 0x5fd050 in operator new[](unsigned long) (test_libvpx+0x5fd050)
#1 0x9d3e04 in (anonymous namespace)::MainTestClass<unsigned int
(*)(unsigned char const*, int, unsigned char const*,
int)>::SetUp() test/variance_test.cc:299:12
Bug: webm:1794
Change-Id: I4bc681eb9a436743ef8bfe2a2abae59ce754309c
This reverts commit 360e9069b6.
This causes ASan errors:
[ RUN ] VP9/TestVectorTest.MD5Match/1
=================================================================
==837858==ERROR: AddressSanitizer: stack-buffer-overflow on address
0xffff82ecad40 at pc 0x000000c494d4 bp 0xffffe1695800 sp 0xffffe16957f8
READ of size 16 at 0xffff82ecad40 thread T0
#0 0xc494d0 in vpx_d117_predictor_32x32_neon (test_libvpx+0xc494d0)
#1 0x1040b34 in vp9_predict_intra_block (test_libvpx+0x1040b34)
#2 0xf8feec in decode_block (test_libvpx+0xf8feec)
#3 0xf8f588 in decode_partition (test_libvpx+0xf8f588)
#4 0xf7be5c in vp9_decode_frame (test_libvpx+0xf7be5c)
...
Address 0xffff82ecad40 is located in stack of thread T0 at offset 64 in
frame
#0 0x103fd3c in vp9_predict_intra_block (test_libvpx+0x103fd3c)
This frame has 2 object(s):
[32, 64) 'left_col.i' <== Memory access at offset 64 overflows this
variable
[96, 176) 'above_data.i'
Change-Id: I058213364617dfe1036126c33a3307f8288d9ae0
This reverts commit 5359ae810c.
Reason for revert: Blocks quantize cleanups
Original change's description:
> Allow macroblock_plane to have its own rounding buffer
>
> Add 8 bytes buffer to macroblock_plane to support rounding factor.
>
> Change-Id: I3751689e4449c0caea28d3acf6cd17d7f39508ed
Change-Id: Ia2424d2114207370f0b45350313a5ff8521d25a8
While porting this function to NEON using the SSE4_1 implementation
as a base, I noticed that both produced files with different
checksums from the C reference implementation. Further investigation
showed that the saturating pack was the culprit. Doing the
multiplication on the 32-bit values instead produces results that
match the C implementation.
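A scalar sketch of the kind of mismatch described, under the assumption that
the affected step multiplies a 32-bit intermediate by a small factor. All
names and operand ranges are hypothetical; the actual function and operands
are not named above.

```c
#include <assert.h>
#include <stdint.h>

/* Clamps a 32-bit value to the int16_t range, like a saturating pack. */
static int16_t saturate_s16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Narrow first, then multiply: values outside the int16_t range are
 * clamped before the product is formed. */
static int32_t mul_after_pack(int32_t q, int32_t factor) {
  return (int32_t)saturate_s16(q) * factor;
}

/* Multiply on the 32-bit value directly, as the C reference is assumed
 * to do. The asserts keep the product well inside the int32_t range. */
static int32_t mul_32bit(int32_t q, int32_t factor) {
  assert(q > -(1 << 20) && q < (1 << 20));
  assert(factor > -1024 && factor < 1024);
  return q * factor;
}
```

The two paths agree while q fits in 16 bits and diverge once it does not, for
example q = 40000 with factor = 2.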
Change-Id: I40c2a36551b2db363a58ea9aa19ef327f2676de3
This reverts commit 848f6e7337.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: Ic12014ab0a78ed3cde02d642509061552cdc8fc9
This reverts commit 573f5e662b.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: Ibf05e6b116c46f6e2c11187b3e3578bbd2d2c227
This reverts commit 14fc40040f.
This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*
Change-Id: I934f9a4c3ce3db33058a65180fa645c8649c3670
This reverts commit 7cdf139e3d.
This causes failures in the VP9/ExternalFrameBufferMD5Test and
VP9/TestVectorTest.MD5Match tests in both armv7 and aarch64 builds.
Change-Id: I7ac4ba0ddc70e7e7860df9f962e6658defe1cdd5
Currently MSE functions just call the variance helpers but don't
actually use the computed sum. This patch adds dedicated helpers to
perform the computation of sse.
Add the corresponding tests as well.
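A scalar sketch of a dedicated sse helper. Variance is typically computed as
sse - sum * sum / (w * h) and so needs both sums, while MSE needs only sse,
so the sum accumulation can be dropped. The name and signature are
illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences between two w x h blocks. No running sum of
 * the differences is kept, since MSE does not use it. */
static uint32_t sse_only(const uint8_t *src, int src_stride,
                         const uint8_t *ref, int ref_stride, int w, int h) {
  uint32_t sse = 0;
  int i, j;
  assert(w > 0 && h > 0);
  for (i = 0; i < h; ++i) {
    for (j = 0; j < w; ++j) {
      const int diff = src[j] - ref[j];
      sse += (uint32_t)(diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sse;
}
```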
Change-Id: I96a8590e3410e84d77f7187344688e02efe03902
* changes:
Implement highbd_d117_predictor using Neon
Implement highbd_d63_predictor using Neon
Implement d117_predictor using Neon
Implement d63_predictor using Neon
Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.
n_coeffs is not used at all for this function.
Change-Id: I2104fea3fa20c455087e21b347d6abd7ea1f3e1e
Currently only vpx_mse16x16 has a Neon implementation. This patch adds
optimized Armv8.0 and Armv8.4 dot-product paths for all block sizes:
8x8, 8x16, 16x8 and 16x16.
Add the corresponding tests as well.
Change-Id: Ib0357fdcdeb05860385fec89633386e34395e260
1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64
targets. This produces half as many TRN1/2 instructions compared to
the number of MOVs that result from vcombine.
2) Use vpx_vtrnq_[su]64* helpers wherever applicable.
3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.
Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127
Use (void) to indicate an empty parameter list and match the declaration
of vpx_codec_vp[89]_[cd]x. This fixes a cfi sanitizer error.
Change-Id: I190f432eea4d1765afffd84c7458ec44d863f90c
* changes:
Add Neon implementation of high bitdepth 32x32 hadamard transform
Add Neon implementation of high bitdepth 16x16 hadamard transform
Add Neon implementation of high bitdepth 8x8 hadamard transform
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory
Bug: webm:1793
Change-Id: I4df1bbc8d079a3174d75f0d35d54c200ffdbb677
Specialize implementation of high bitdepth variance functions such that
we only widen data processing element types when absolutely necessary.
Change-Id: If4cc3fea7b5ab0821e3129ebd79ff63706a512bf
In joint_motion_search, there are four iterations.
Even iterations search in the first reference frame
and odd iterations search in the second. The last two
iterations use the search result of the first two
iterations as the start point. If the search result does
not change, the last two iterations are not necessary and can
be skipped.
Instruction Count
cpu-used | Reduction(%)
0 | 1.411
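A rough sketch of the early exit. The exact condition used in
joint_motion_search is not spelled out above; this version assumes "does not
change" means that neither of the first two iterations moved its motion
vector. The types, the callback signature and the two example callbacks are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int row, col;
} MvSketch;

/* Hypothetical search callback: refines the vector for reference `id`
 * starting from `start`, given the other reference's current vector. */
typedef MvSketch (*SearchFn)(int id, MvSketch start, MvSketch other);

/* Runs up to four alternating iterations; returns how many were run. */
static int joint_search_sketch(SearchFn search, MvSketch mv[2]) {
  int ite, changed = 0;
  assert(search != NULL);
  for (ite = 0; ite < 4; ++ite) {
    const int id = ite % 2; /* even: first reference, odd: second */
    MvSketch best;
    if (ite == 2 && !changed) break; /* first two passes found nothing new */
    best = search(id, mv[id], mv[!id]);
    if (best.row != mv[id].row || best.col != mv[id].col) changed = 1;
    mv[id] = best;
  }
  return ite;
}

/* Example callback that never improves on the start point. */
static MvSketch keep_start(int id, MvSketch start, MvSketch other) {
  (void)id;
  (void)other;
  return start;
}

/* Example callback that moves one row down, once. */
static MvSketch step_once(int id, MvSketch start, MvSketch other) {
  (void)id;
  (void)other;
  if (start.row < 1) ++start.row;
  return start;
}
```

With keep_start the loop stops after two iterations; with step_once all four
run.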
Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
Added unit test.
Keep track of the spatial layer id and frame type for the case where
spatial layers are encoded in parallel by the hardware encoder.
ComputeQP() / PostEncodeUpdate() don't need to be called sequentially
when there is no inter-layer prediction.
Bug: b/257368998
Change-Id: I50beaefcfc205d3f9a9d3dbe11fead5bfdc71489
* changes:
Optimize vpx_highbd_comp_avg_pred_neon
Add Neon AvgPredTestHBD test suite
Specialize Neon high bitdepth avg subpel variance by filter value
Specialize Neon high bitdepth subpel variance by filter value
Refactor Neon high bitdepth avg subpel variance functions
Optimize Neon high bitdepth subpel variance functions
Optimize the implementation of vpx_highbd_comp_avg_pred_neon by making
use of the URHADD instruction to compute the average.
Change-Id: Id74a6d9c33e89bc548c3c7ecace59af69051b4a7
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: Id5a2b2d9fac6f878795a6ed9de2bc27d9e62d661
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Change-Id: I73182c979255f0332a274f2e5907df7f38c9eeb3
Use the same general code style as in the standard bitdepth Neon
implementation - merging the computation of vpx_highbd_comp_avg_pred
with the second pass of the bilinear filter to avoid storing and loading
the block again.
Also move vpx_highbd_comp_avg_pred_neon to its own file (like the
standard bitdepth implementation) since we're no longer using it for
averaging sub-pixel variance.
Change-Id: I2f5916d5b397db44b3247b478ef57046797dae6c
Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.
Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f
Release v1.13.0 Ugly Duckling
2023-01-31 v1.13.0 "Ugly Duckling"
This release includes more Neon and AVX2 optimizations, adds a new codec
control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
numerous bug fixes.
- Upgrading:
This release is ABI incompatible with the previous release.
New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.
GoogleTest is upgraded to v1.12.1.
.clang-format is upgraded to clang-format-11.
VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
feature of using external rate control models for vp9.
- Enhancement:
Numerous improvements on Neon optimizations.
Numerous improvements on AVX2 optimizations.
Additional ARM targets added for Visual Studio.
- Bug fixes:
Fix to calculating internal stats when frame dropped.
Fix to segfault for external resize test in vp9.
Fix to build system with replacing egrep with grep -E.
Fix to a few bugs with external RTC rate control library.
Fix to make SVC work with VBR.
Fix to key frame setting in VP9 external RC.
Fix to -Wimplicit-int (Clang 16).
Fix to VP8 external RC for buffer levels.
Fix to VP8 external RC for dynamic update of layers.
Fix to VP9 auto level.
Fix to off-by-one error of max w/h in validate_config.
Fix to make SVC work for Profile 1.
Bug: webm:1780
Change-Id: I371fc1444ead56f8d7fc510e05582b6415c3ddb1
Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
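A portable sketch of the ordering, with memcpy standing in for vector loads
and stores. Because every load is issued before the first store, the result
does not depend on whether the compiler can prove that src and dst do not
alias. The function name and the 16-byte row width are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies one 16-byte row: both loads first, then both stores. */
static void copy_row16(const uint8_t *src, uint8_t *dst) {
  uint8_t a[8], b[8];
  assert(src != NULL && dst != NULL);
  memcpy(a, src, 8);     /* load 0 */
  memcpy(b, src + 8, 8); /* load 1 */
  memcpy(dst, a, 8);     /* store 0 */
  memcpy(dst + 8, b, 8); /* store 1 */
}
```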
Change-Id: Idd59dca51387f553f8db27144a2b8f2377c937d3
Add missing 4x4 and 4x8 tests for both high bitdepth sub-pixel variance
and high bitdepth averaging sub-pixel variance.
Change-Id: I042752c5b7ccc14f58075694d0bb1d36f144ad06
Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.
Change-Id: I207f76b3d42aa541809b0672c3b3d86e54d133ff
* changes:
Optimize Neon implementation of high bitdepth SAD4D functions
Optimize Neon implementation of high bitdepth avg SAD functions
Optimize Neon implementation of high bitdepth SAD functions
Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
block once - instead of four times.
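A scalar sketch of the last point only: each source sample is read once and
compared against all four references in the same pass. The name and signature
are illustrative and do not model the Neon accumulator layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes four SADs of one w x h source block against four reference
 * blocks that share a stride, reading each source sample once. */
static void sad4d_sketch(const uint8_t *src, int src_stride,
                         const uint8_t *const ref[4], int ref_stride, int w,
                         int h, uint32_t sad[4]) {
  int i, j, k;
  assert(w > 0 && h > 0);
  for (k = 0; k < 4; ++k) sad[k] = 0;
  for (i = 0; i < h; ++i) {
    for (j = 0; j < w; ++j) {
      const int s = src[i * src_stride + j]; /* single source load */
      for (k = 0; k < 4; ++k) {
        sad[k] += (uint32_t)abs(s - ref[k][i * ref_stride + j]);
      }
    }
  }
}
```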
Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723
Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I75c5f09948f6bf17200f82e00e7a827a80451108
Optimizations take a similar form to those implemented for standard
bitdepth SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230
In BEST encoding mode, the mesh search range wasn't initialized for
content types other than FC_GRAPHICS_ANIMATION, so the encoder
mistakenly used speed 0's setting. Fixed by adding the initialization.
There were 2 ways to fix this. Patchset 1 used speed 0's setting
for non-FC_GRAPHICS_ANIMATION content. This changed BEST mode's
encoding results very little; only a couple of clips were affected.
Borg result for BEST mode:
          avg_psnr:  ovr_psnr:  ssim:   encoding_spdup:
lowres2:  -0.004     -0.003     -0.000  0.030
midres2:  -0.006     -0.009     -0.012  0.033
hdres2:    0.002      0.002      0.004  0.015
Patchset 2 set to use BEST's setting for non FC_GRAPHICS_ANIMATION type.
However, the majority of test clips' BDrate got changed up to
~0.5% (gain or loss), and overall it didn't give better performance
than patchset 1. So, we chose to use patchset 1.
Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef
rather than the gcc specific __attribute__((aligned())); fixes build
targeting ARM64 windows.
Bug: webm:1788
Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e
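A portable alignment declaration, as opposed to a bare __attribute__((aligned())), could look roughly like the following. The macro name SKETCH_ALIGNED is made up for this example and is not taken from the change itself; it only shows the general idea of selecting the spelling each compiler accepts.

```c
#include <stdint.h>

/* Hypothetical macro (not from the source): pick the alignment spelling
 * that the compiler in use understands. */
#if defined(_MSC_VER)
#define SKETCH_ALIGNED(n, type, name) __declspec(align(n)) type name
#elif defined(__GNUC__)
#define SKETCH_ALIGNED(n, type, name) type name __attribute__((aligned(n)))
#else
#define SKETCH_ALIGNED(n, type, name) type name /* no alignment guarantee */
#endif

/* A 16-byte aligned table, declared the same way on every toolchain. */
SKETCH_ALIGNED(16, static const int16_t, sketch_table[8]) = { 1, 2, 3, 4,
                                                              5, 6, 7, 8 };

/* Returns 1 when p is a multiple of alignment, 0 otherwise. */
static int sketch_is_aligned(const void *p, uintptr_t alignment) {
  return ((uintptr_t)p % alignment) == 0;
}
```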
Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
Change-Id: I941d226ef94fd7a633b09fc92165a00ba68a1501
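The idea behind an unaligned 4-byte load helper can be sketched in portable C with memcpy. The names below are hypothetical and do not reproduce the helpers in mem_neon.h; the Neon versions additionally place the loaded words into vector lanes.

```c
#include <stdint.h>
#include <string.h>

/* memcpy makes the 4-byte read well defined for any address; compilers
 * typically lower it to a single load instruction. */
static uint32_t load_u32_unaligned_sketch(const uint8_t *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Gather two 4-byte rows that are `stride` bytes apart, with no
 * alignment requirement on buf or stride. */
static void load_2x4_sketch(const uint8_t *buf, int stride, uint32_t out[2]) {
  out[0] = load_u32_unaligned_sketch(buf);
  out[1] = load_u32_unaligned_sketch(buf + stride);
}
```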
Refactor the Neon implementation of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions as opposed to 8 EXT, MOV pairs. This change removes 8
instructions per call to transpose_s16_8x8(q), transpose_u16_8x8
where the result stays in registers for further processing - rather
than being stored to memory - like in vpx_hadamard_8x8_neon, for
example.
This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426
Change-Id: Icef3e51d40efeca7008e1c4fc701bf39bd319c88
In total this gives about 9% extra performance for both rt/best
profiles.
Furthermore, add transpose_s32 16x16 function
Change-Id: Ib6f368bbb9af7f03c9ce0deba1664cef77632fe2
Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/166962
Change-Id: I7860c852db94a7c9c3d72ae4411316685f3800a4
Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.
This is a backport of this libaom change[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/166961
Change-Id: I9327ff7382a46d50c42a5213a11379b957146372
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162463
Change-Id: Ia818e148f6fd126656e8411d59c184b55dd43094
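The specialization described above can be modelled in scalar C. This is a sketch under stated assumptions: the sub-pixel offset is taken to be in eighths of a pixel (0..7) with weights (8 - offset, offset), and all names are hypothetical. The real code operates on Neon vectors and only dispatches this way for blocks of 16x16 and larger.

```c
#include <stdint.h>

/* General bilinear step: weights (8 - offset, offset), rounded. */
static void bilinear_general_sketch(const uint8_t *a, const uint8_t *b,
                                    uint8_t *dst, int n, int offset) {
  int i;
  for (i = 0; i < n; ++i) {
    dst[i] = (uint8_t)(((8 - offset) * a[i] + offset * b[i] + 4) >> 3);
  }
}

/* Dispatch on the filter value: offset 0 is a plain copy, offset 4 is a
 * rounding average, anything else takes the general path. */
static void bilinear_dispatch_sketch(const uint8_t *a, const uint8_t *b,
                                     uint8_t *dst, int n, int offset) {
  int i;
  if (offset == 0) {
    for (i = 0; i < n; ++i) dst[i] = a[i];
  } else if (offset == 4) {
    for (i = 0; i < n; ++i) dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
  } else {
    bilinear_general_sketch(a, b, dst, n, offset);
  }
}
```

Both special cases give the same result as the general formula, so the dispatch changes only the cost, not the output.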
Use case is for 1 pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicable to spatial layers, where the user may set
the QP per spatial layer.
Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b
This simplifies integration with the Android platform and avoids the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.
Change-Id: I803912146dac788b7f0af27199c7613cabbc9fa0
Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
[1] https://aomedia-review.googlesource.com/c/aom/+/162241
[2] https://aomedia-review.googlesource.com/c/aom/+/162262
Change-Id: Ia4e8fff4d53297511d1a1e43bca8053bf811e551
Failure occurs for 1 pass non-realtime mode at speed 0.
It is caused by the speed feature rd_ml_partition.var_pruning, which
doesn't check for a scaled reference in simple_motion_search().
Bug: webm:1768
Change-Id: Iddcb56033bac042faebb5196eed788317590b23f
Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
Change-Id: Ifbf467681dd53bb1d26e22359885e6edde3c5c72
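The offset-and-correct step that the SDOT paths need, and that USDOT avoids, can be modelled in scalar C. This is an illustration only; the function names are hypothetical and the real paths use Neon dot-product intrinsics.

```c
#include <stdint.h>

/* Mixed-sign dot product: unsigned pixels times signed filter taps.
 * This is the arithmetic a USDOT-style instruction performs directly. */
static int32_t dot_mixed_sign_sketch(const uint8_t *px, const int8_t *f,
                                     int n) {
  int32_t acc = 0;
  int i;
  for (i = 0; i < n; ++i) acc += (int32_t)px[i] * f[i];
  return acc;
}

/* Signed-only dot product: shift the pixels into [-128, 128), multiply,
 * then add back 128 * sum(f) to undo the shift. */
static int32_t dot_signed_with_correction_sketch(const uint8_t *px,
                                                 const int8_t *f, int n) {
  int32_t acc = 0, fsum = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int8_t s = (int8_t)(px[i] - 128); /* now in [-128, 127] */
    acc += (int32_t)s * f[i];
    fsum += f[i];
  }
  return acc + 128 * fsum;
}
```

The correction term depends only on the filter, so it can be computed once per filter rather than once per pixel, but it is still extra work that the mixed-sign form does not need.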
Add additional AArch64 paths for vpx_convolve8_horiz_neon and
vpx_convolve8_avg_horiz_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
Change-Id: If19f5872c3453458a8cfb7c7d2be82a2c0eab46a
avoids a warning on some platforms:
egrep: warning: egrep is obsolescent; using grep -E
Bug: webm:1786
Change-Id: Ia434297731303aacb0b02cf3dcbfd8e03936485d
Fixed: webm:1786
Define all Neon load/store helper functions in mem_neon.h and use
them consistently in Neon convolution functions.
Change-Id: I57905bc0a3574c77999cf4f4a73442c3420fa2be
The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.
This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.
Change-Id: Ia4aeee8b46fe218658fb8577dc07ff04a9324b3e
This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacement
(bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases
so we must remove uses before updating NumPy.
Change-Id: I9f5dfcbb11fe6534fce358054f210c7653f278c3
Test to verify RC for going down and back up in
spatial layers. Going back up has an issue so added
a TODO.
Make the test more flexible to handle dynamic layers.
Test for dynamic change in temporal layers to follow.
Change-Id: Ic5542f7b274135277429e116f56ba54e682e96a0
Allow external model to control frame rdmult.
A function is called per frame to get the value of rdmult from
the external model.
The external rdmult will overwrite libvpx's default rdmult unless
a reserved value is selected.
A unit test is added to test when the default rdmult value is set.
Change-Id: I2f17a036c188de66dc00709beef4bf2ed86a919a
SVC test is only in CBR and the frame_flags are
set by the SVC pattern, so we shouldn't undo them
for svc mode.
Change-Id: I5ffa65dd58a7b47f287d124d9e71ba1dc7c5a549
A client of the vp9 rate controller needs to know whether the
segmentation is enabled and the size of delta_q. It is also nicer to
know the size of map. This CL changes the interface to achieve these.
Bug: b:259487065
Test: Build
Change-Id: If05854530f97e1430a7b97788910f277ab673a87
Prior to this CL SVC with VBR mode was broken.
Fixes made here to make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.
Added rate targeting unittest for (2SL, 3TL).
Bug: chromium:1375111
Change-Id: I5a62ffe7fbea29dc5949c88a284768386b1907a9
This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.
Removes a quantize variant and makes subsequent cleanups easier.
Change-Id: Ibe545eccd19370f07ff26c8e151f290c642efd2a
Refactor & optimize FHT functions further, use new butterfly functions.
4x4 is 5% faster, 8x8 & 16x16 are 10% faster than previous versions.
Highbd 4x4 FHT version 2.27x faster than C version for --rt.
Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94
Add an Arm Neon implementation of vpx_hadamard_32x32 and use it
instead of the scalar C implementation.
Also add test coverage for the new Neon implementation.
Change-Id: Iccc018eec4dbbe629fb0c6f8ad6ea8554e7a0b13
For --best quality, resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of cpu time in
profiling, vs 6.27% for the sum of scalar functions:
vpx_fdct32, vpx_fdct32.constprop.0, vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves encoding time for highbd by ~6% for --best
and ~9% for --rt.
Change-Id: I1ce4bbef6e364bbadc76264056aa3f86b1a8edc5
Provide a set of commonly used Butterfly DCT functions for use in
DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
This gave a performance gain ranging from 5% to 15% in the 16x16 case.
Also, for 32x32, the loads were rearranged; along with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follows this patchset.
Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3
(src|ref)8_ptr -> (src|ref)_ptr. aligns the names with the rtcd header;
clears some clang-tidy warnings
Change-Id: Id1aa29da8c0fa5860b46ac902f5b2620c0d3ff54
On a dynamic change of temporal layers:
starting/maximum/optimal were being set twice,
causing incorrect large values.
Bug: b/253927937
Change-Id: I204e885cff92530336a9ed9a4363c486c5bf80ae
On change/update of rc_cfg: when number of temporal
layers change call vp8_reset_temporal_layer_change(),
which in turn will call vp8_init_temporal_layer_context()
only for the new layers.
Bug: b/249644737
Change-Id: Ib20d746c7eacd10b78806ca6a5362c750d9ca0b3
Move all butterfly functions to fdct_neon.h
Slightly optimize load/scale/cross functions
in fdct 16x16.
These will be reused in highbd variants.
Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073
In assembly it made sense to iterate using n_coeffs.
In intrinsics it's just as fast to use index and
easier to read.
Change-Id: I403c959709309dad68123d0a3d0efe183874543d
warning: ‘s2[3]’ may be used uninitialized
and
warning: ‘s1[3]’ may be used uninitialized
The warnings exposed unused code.
Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9
Match style for vpx_quantize_b_sse2 and prepare to rewrite
ssse3 version in intrinsics.
Need to evaluate the value of threshold breakout before
going further.
Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b
vp9_quantize_fp_sse2 was only tested in non-hbd
configuration. Missed when fixing this for
vpx_quantize_b_sse2.
Change-Id: Ide346e5727d74281c774f605c90d280050e0bf62
All of the assembly adds 1 to iscan to convert from
a 0 based array to the EOB value.
Add 1 to all iscan values and remove the extra
instructions from the assembly.
Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
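The effect of storing the scan position plus one can be shown with a scalar model of the end-of-block (EOB) search. The function names and table layout are assumptions made for this sketch; the change itself concerns the assembly implementations.

```c
#include <stdint.h>

/* Before: iscan holds 0-based scan positions, so every nonzero
 * coefficient needs an extra "+ 1" to produce an EOB candidate. */
static int eob_zero_based_sketch(const int16_t *qcoeff, const int16_t *iscan,
                                 int n) {
  int i, eob = 0;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* After: the table already stores position + 1, so the value is used
 * directly and the add disappears from the loop. */
static int eob_one_based_sketch(const int16_t *qcoeff,
                                const int16_t *iscan_plus1, int n) {
  int i, eob = 0;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```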
In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
| ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
39 | __m128i below_context;
Change-Id: I2fc592f121c4e85d0aff1640014c3444f5eb09fd
Allow external q and external max frame size to be handled separately.
Rely on libvpx's decision to catch overshoot/undershoot and recode frames.
Previously, when external max frame size is set, we didn't handle
undershoot cases, and now we fall back to libvpx's decision to
recode a frame if overshoot/undershoot is seen.
Change-Id: Ic3eee042cfe104b528c5f2c6c82b98dd5d8fa8ca
fixes -Wclobbered warnings with gcc 12.1.0:
vp8/vp8_dx_iface.c|278 col 16| warning: variable 'w' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
vp8/vp8_dx_iface.c|278 col 19| warning: variable 'h' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
Change-Id: Ib2c606a3450188d7869c066cacaf5615d9746181
missed in
447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
+ fix #if comments
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
Change-Id: I098b96421b7de5928bb3b11612ca1f32e7b6cbc4
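A definedness-only check of the kind described above might look as follows. SKETCH_HAVE_DOTPROD and sketch_sad_path() are made-up names for this example; only __ARM_FEATURE_DOTPROD comes from ACLE.

```c
/* Only test whether the ACLE macro is defined; its value is not used.
 * SKETCH_HAVE_DOTPROD is a hypothetical name for this example. */
#if defined(__ARM_FEATURE_DOTPROD)
#define SKETCH_HAVE_DOTPROD 1
#else
#define SKETCH_HAVE_DOTPROD 0
#endif /* defined(__ARM_FEATURE_DOTPROD) */

/* Report which code path a build would select. */
static const char *sketch_sad_path(void) {
  return SKETCH_HAVE_DOTPROD ? "neon_dotprod" : "generic";
}
```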
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
Change-Id: I164fe121ccefda99050a9b6a99738a2b518520f3
this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project
8671348b81b95fc603505dfc881b45103bee1731))
Change-Id: Ifec252d4f499f23be1cd94aa8516caf6b3fbbc11
Pass the encode frame info to external ml model, with the information
of gop size and whether alt ref is used.
Change-Id: I55be2d3de83d7182c1a1a174e44ead7e19045c9d
This was reported with doxygen 1.9.4.
Also update the comment for CLASS_GRAPH by running "doxygen -u" because
the original comment for CLASS_GRAPH mentions the obsolete tag
'CLASS_DIAGRAMS'.
Change-Id: I3bca547201f794d363bd814b7c7f7c9d7088797a
only store the deltas from --style Google in the file and reapply using
Debian clang-format version 11.1.0-6+build1
Bug: b/229626362
Change-Id: I3e18a2e7c17a90a48405b3cf1b37ebc652aba0db
this was added in:
7beafefd1 vp9: Allow for disabling loopfilter per spatial layer
but the test doesn't zero initialize its svc_params_ member.
fixes the use of an uninitialized value, reported by valgrind and
integer sanitizer:
[ RUN ] VP9/RcInterfaceSvcTest.Svc/0
==1064682== Conditional jump or move depends on uninitialised value(s)
==1064682== at 0x1C5624: loopfilter_frame (vp9_encoder.c:3285)
==1064682== by 0x1C9B54: encode_frame_to_data_rate (vp9_encoder.c:5595)
==1064682== by 0x1CA2EE: SvcEncode (vp9_encoder.c:5789)
==1064682== by 0x1CEA01: vp9_get_compressed_data (vp9_encoder.c:7891)
==1064682== by 0x185F0E: encoder_encode (vp9_cx_iface.c:1437)
==1064682== by 0x1503BB: vpx_codec_encode (vpx_encoder.c:208)
vp9/encoder/vp9_svc_layercontext.c:362:26: runtime error: implicit
conversion from type 'int' of value -1 (32-bit, signed) to type
'LOOPFILTER_CONTROL' changed the value to 4294967295 (32-bit, unsigned)
#0 0x558925f45377 in vp9_restore_layer_context vp9/encoder/vp9_svc_layercontext.c:362:26
#1 0x558925ef89fd in vp9_get_compressed_data vp9/encoder/vp9_encoder.c:7781:5
#2 0x558925e3ef3e in encoder_encode vp9/vp9_cx_iface.c:1437:20
Bug: b/229626362
Change-Id: I33d244be7752c68b71efa9c62ca45d6b202ec761
with block sizes < 8x8 previously only the inner loop was aborted. this
could cause propagation of invalid motion vectors to scale_mv().
this quiets integer sanitizer warnings of the form:
vp9/common/vp9_mvref_common.h:239:18: runtime error: implicit conversion
from type 'int' of value 32768 (32-bit, signed) to type 'int16_t' (aka
'short') changed the value to -32768 (16-bit, signed)
Bug: b/229626362
Change-Id: I58b5a425adf21542cbf4cc4dd5ab3cc5ed008264
these shift values off the most significant bit as part of the process;
vp8_regular_quantize_b_sse4_1 is included here for a special case of
mask creation
quiets warnings of the form:
vp8/decoder/dboolhuff.h:81:11: runtime error: left shift of
2373679303235599696 by 3 places cannot be represented in type
'VP8_BD_VALUE' (aka 'unsigned long')
vp8/encoder/bitstream.c:257:18: runtime error: left shift of 2147493041
by 1 places cannot be represented in type 'unsigned int'
vp8/encoder/x86/quantize_sse4.c:114:18: runtime error: left shift of
4294967294 by 1 places cannot be represented in type 'unsigned int'
vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of
4294967295 by 1 places cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: Iabed118b2a094232783e5ad0e586596d874103ca
and use it on MD5Transform(); this behavior is well defined and is only
a warning with -fsanitize=integer, not -fsanitize=undefined.
quiets warnings of the form:
md5_utils.c:163:3: runtime error: left shift of 143704723 by 7 places
cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: I60a384b2c2556f5ce71ad8ebce050329aba0b4e4
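The surviving text does not name what is applied to MD5Transform(). One plausible shape, shown here purely as an assumption, is a function attribute that switches off the unsigned-shift check for code where shifting bits off the top is intended, such as a rotate. The macro and function names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical macro: disable clang's unsigned-shift-base check on one
 * function. The shift is well defined for unsigned types; the sanitizer
 * only reports it because bits are shifted out. */
#if defined(__clang__) && defined(__has_attribute)
#if __has_attribute(no_sanitize) && __clang_major__ >= 12
#define SKETCH_NO_UNSIGNED_SHIFT_CHECK \
  __attribute__((no_sanitize("unsigned-shift-base")))
#endif
#endif
#ifndef SKETCH_NO_UNSIGNED_SHIFT_CHECK
#define SKETCH_NO_UNSIGNED_SHIFT_CHECK /* no-op on other compilers */
#endif

/* 32-bit rotate left, valid for n in 1..31. The left shift discards the
 * top n bits on purpose; they are re-inserted by the right shift. */
SKETCH_NO_UNSIGNED_SHIFT_CHECK
static uint32_t rotl32_sketch(uint32_t x, unsigned int n) {
  return (x << n) | (x >> (32 - n));
}
```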
this changes from scaling best sse to downscaling base sse in
comparisons.
this quiets an integer sanitizer warning of the form:
vp9/encoder/vp9_pickmode.c:1632:41: runtime error: left shift of
4294967295 by 1 places cannot be represented in type 'unsigned int'
Bug: b/229626362
Change-Id: Iee2920474ba700a46177d4514ba6ef7691958069
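The exact comparison is not quoted above, so the following is only a sketch of the general pattern, with hypothetical names: instead of doubling one side with a left shift, which loses the top bit for large values, the other side is halved.

```c
#include <stdint.h>

/* Scaling up: best_sse << 1 wraps once best_sse >= 2^31, which is what
 * the sanitizer reports. */
static int within_twice_scaled_up_sketch(uint32_t base_sse,
                                         uint32_t best_sse) {
  return base_sse < (best_sse << 1);
}

/* Scaling down: base_sse >> 1 can never wrap. */
static int within_twice_scaled_down_sketch(uint32_t base_sse,
                                           uint32_t best_sse) {
  return (base_sse >> 1) < best_sse;
}
```

For this strict less-than form the two functions agree whenever the left shift does not wrap; they differ only in the wrapping case, where the scaled-down version gives the mathematically expected answer.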
make source_variance unsigned; this matches update_thresh_freq_fact()
and the type of the MACROBLOCK member.
quiets integer sanitizer warnings of the form:
vp9/encoder/vp9_pickmode.c:2710:58: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
Bug: b/229626362
Change-Id: I812c6ca914507bf25cad323dea3d91a3a2ea4f1d
flat/flat2 are stored as int8_t as returned by the filter_mask*
functions.
this quiets integer sanitizer warnings of the form:
vpx_dsp/loopfilter.c:197:28: runtime error: implicit conversion from
type 'int8_t' (aka 'signed char') of value -1 (8-bit, signed) to type
'uint8_t' (aka 'unsigned char') changed the value to 255 (8-bit,
unsigned)
Bug: b/229626362
Change-Id: Iacb6ae052d4cb2b6e0ebccbacf59ece9501d3b5f
* changes:
vpx_encoder.h: make flag constants unsigned
vp8,VP8_COMP: normalize segment_encode_breakout type
webmdec,WebmInputContext: make timestamp_ns signed
highbd_quantize_intrin_sse2: quiet int sanitizer warnings
load_unaligned_u32: use an int w/_mm_cvtsi32_si128
variance_sse2.c: add some missing casts
this matches the type for vpx_codec_frame_flags_t and
vpx_codec_er_flags_t and quiets int sanitizer warnings of the form:
implicit conversion from type 'int' of value -9 (32-bit, signed) to type
'unsigned int' changed the value to 4294967287 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Icfc5993250f37cedb300c7032cab28ce4bec1f86
use unsigned int as the API value is of this type; this quiets some
integer sanitizer warnings of the form:
implicit conversion from type 'unsigned int' of value 2147483648
(32-bit, unsigned) to type 'int' changed the value to -2147483648
(32-bit, signed)
Bug: b/229626362
Change-Id: I3d1ca618bf1b3cd57a5dca65a3067f351c1473f8
this matches the type returned from libwebm, which uses -1 as an error;
quiets integer sanitizer warnings of the form:
implicit conversion from type 'long long' of value -1 (64-bit, signed)
to type 'uint64_t' (aka 'unsigned long') changed the value to
18446744073709551615 (64-bit, unsigned)
Bug: b/229626362
Change-Id: Id3966912f802aee3c0f7852225b55f3057c3e76a
add a missing cast in ^ operations; quiets warnings of the form:
implicit conversion from type 'int' of value -1 (32-bit, signed) to type
'unsigned int' changed the value to 4294967295 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I56f74981050b2c9d00bad20e68f1b73ce7454729
this matches the type of the function parameter; quiets integer
sanitizer warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
3215646151 (32-bit, unsigned) to type 'int' changed the value to
-1079321145 (32-bit, signed)
Bug: b/229626362
Change-Id: Ia9a5dc5e1f57cbf4f8f8fa457bb674ef43369d37
quiets integer sanitizer warnings of the form:
../vpx_dsp/x86/variance_sse2.c:100:10: runtime error: implicit
conversion from type 'unsigned int' of value 4294966272 (32-bit,
unsigned) to type 'int' changed the value to -1024 (32-bit, signed)
Bug: b/229626362
Change-Id: I150cc0a6a6b85143c3bf96886686fe3a40897db5
with certain optimization flags or sanitizers enabled some code may fail
to vectorize:
third_party/libyuv/source/row_common.cc:3178:7: warning: loop not
vectorized: the optimizer was unable to perform the requested
transformation; the transformation might be disabled or specified as
part of an unsupported transformation ordering
[-Wpass-failed=transform-warning]
this was observed with integer/undefined sanitizers using clang 11/13
Bug: b/229626362
Change-Id: I01595c641763c4cd4242e02f2cc5cbabfe69d03e
fixes warnings of the form:
../vp9/simple_encode.cc:755:48: warning: empty expression statement has
no effect; remove unnecessary ';' to silence this warning
[-Wextra-semi-stmt]
SET_STRUCT_VALUE(config, oxcf, ret, key_freq);
Bug: b/229626362
Change-Id: I1c9b0ae9927cdd7c31da000633bcb6e2b8242cd4
Up to 4.1x faster than vp9_highbd_quantize_fp_c() for full
calculations.
~1.3% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I8c6466bdbcf1c398b1d8b03cab4165c1d8556b0c
the snapshot of googletest and the test files themselves are targeting
c++11 currently; these warnings are supported by recent versions of
clang
Change-Id: I5d36b3bd4058ba1610f0c8b27cad27aadee85939
Assume the level definition of min_gf_interval is the minimum allowed
gf_interval. We take this level conformant min_gf_interval instead of
+1.
Change-Id: I9c7e62f210c95b356e9716579ee4c19638de8e35
avoid calculating the end timestamp when performing a flush to prevent
an implicit conversion warning when applying a non-zero offset to a 0
pts used in that case:
vp9/vp9_cx_iface.c:1361:50: runtime error: implicit conversion from type
'vpx_codec_pts_t' (aka 'long') of value -15 (64-bit, signed) to type
'unsigned long' changed the value to 18446744073709551601 (64-bit,
unsigned)
Bug: b/229626362
Change-Id: I68ba19b7d6de35cc185707dfb6b43406b7165035
The iteration index was wrong, causing "LEVEL_5_2", which is intended
for large-resolution videos, to be chosen as the starting level.
Change-Id: Id88836981bdcbd7494bd06193d6a433ac75a6d2e
Up to 5.37x faster than vp9_highbd_quantize_fp_c() for full
calculations.
~1.6% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I584fd1f60a3e02f1ded092de98970725fc66c5b8
this fixes runtime errors with clang -fsanitize=integer in x86 builds:
../vp9/encoder/vp9_rdopt.c:3250:17: runtime error: signed integer
overflow: 18 - -2147483648 cannot be represented in type 'int'
../vp9/encoder/vp9_rdopt.c:3277:16: runtime error: signed integer
overflow: 26 - -2147483648 cannot be represented in type 'int'
Bug: b/229626362
Change-Id: Ic9a5063c840b4fce7056f61362234721add056a6
prefer int in most cases
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -809931979 (32-bit, signed)
to type 'uint32_t' (aka 'unsigned int') changed the value to 3485035317
(32-bit, unsigned)
Bug: b/229626362
Change-Id: I0c6604efc188f2660c531eddfc7aa10060637813
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -2 (32-bit, signed) to type
'unsigned int' changed the value to 4294967294 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Id7e13b3d494ccd1a2351db8fab6fdb6a9a771d51
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'int' of value -1323 (32-bit, signed) to
type 'unsigned int' changed the value to 4294965973 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I7291d9bd5cacea0d88d9f4c4624c096764f4a472
w/clang -fsanitize=integer fixes warnings of the form:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4294443008 (32-bit, unsigned) to type 'int' changed the value to -524288
(32-bit, signed)
Bug: b/229626362
Change-Id: Ic7c0a2e7b64a1dd6fd5cc64adcd5765318c2a956
unsigned -> int and vice versa
reported by clang -fsanitize=integer
vp8/common/findnearmv.c:108:11: runtime error: implicit conversion from
type 'uint32_t' (aka 'unsigned int') of value 4294443008 (32-bit,
unsigned) to type 'int' changed the value to -524288 (32-bit, signed)
vp8/common/findnearmv.c:110:33: runtime error: implicit conversion from
type 'int' of value -524288 (32-bit, signed) to type 'uint32_t' (aka
'unsigned int') changed the value to 4294443008 (32-bit, unsigned)
Bug: b/229626362
Change-Id: Ic7ce0fd98255ccf9307ac73e9fb6a8189b268214
use vpx_enc_frame_flags_t; this avoids int -> unsigned conversion
warnings; reported w/clang -fsanitize=integer:
test/error_resilience_test.cc:95:9: runtime error: implicit conversion
from type 'int' of value -12845057 (32-bit, signed) to type 'unsigned
long' changed the value to 4282122239 (32-bit, unsigned)
Bug: b/229626362
Change-Id: I0fc1dbe44a258f397cf1a05347d8cb86ee70b1b8
reported under clang-13. null data may be passed as a flush; move
data_end after that check
vp9/vp9_dx_iface.c:337:40: runtime error: applying zero offset to null
pointer
Bug: b/229626362
Change-Id: I845726fd6eb6ac7a776e49272c6477a5ad30ffdf
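The ordering fix can be sketched as follows: the end pointer is derived only after the flush case (null data) has been handled. The function name, signature and return codes are hypothetical and are not those of the decoder interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry point. Returns 1 for a flush (NULL data, zero
 * size), 0 for a usable buffer, -1 for an invalid combination. */
static int decode_sketch(const uint8_t *data, size_t data_sz,
                         const uint8_t **data_end) {
  if (data == NULL && data_sz == 0) {
    *data_end = NULL; /* flush: no pointer arithmetic on NULL */
    return 1;
  }
  if (data == NULL || data_sz == 0) return -1;
  *data_end = data + data_sz; /* safe: data is known to be non-null here */
  return 0;
}
```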
reported under clang-13; use a while loop in file_read() to force a size
check before attempting to read. buf (aux_buf) may be null when
no conversion is necessary.
y4minput.c:29:43: runtime error: applying zero offset to null pointer
Bug: b/229626362
Change-Id: Ia3250d6ff9c325faf48eaa31f4399e20837f8f7b
this clears warnings under clang-13 of the form:
vp9/encoder/x86/highbd_temporal_filter_sse4.c|196 col 63| warning:
parameter 'v_pre' set but not used [-Wunused-but-set-parameter]
this is the high-bitdepth version of:
73b8aade8 temporal_filter_sse4: remove unused function params
Change-Id: I9b2c9bf27c16975e4855df6a2c967da4c8c63a3a
Up to 11.78x faster than vpx_quantize_b_32x32_sse2() for full
calculations.
~1.7% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: Ib759056db94d3487239cb2748ffef1184a89ae18
Up to 3.61x faster than vpx_highbd_quantize_b_sse2() for full
calculations.
~2.3% overall encoder improvement for the test clip used.
Bug: b/237714063
Change-Id: I23f88d2a7f96aaa4103778372f4f552207f73cee
Add unit tests for a 4 frame video, which could be considered as a
corner case.
Three different GOP settings are tested and verified as valid.
(1). The first GOP has 3 coding frames, no alt ref.
The second GOP has 1 coding frame, no alt ref.
The number of coding frames is 4.
Their frame types are: keyframe, inter_frame, inter_frame,
golden_frame.
(2). The first GOP has 4 coding frames, use alt ref.
The second GOP has 1 coding frame, which is the overlay of
the first GOP's alt ref frame.
The number of coding frames is 5.
Their types are: keyframe, alt_ref, inter_frame, inter_frame,
overlay_frame.
(3). Only one GOP with 4 coding frames, do not use alt ref.
The number of coding frames is 4.
Their types are: keyframe, inter_frame, inter_frame, inter_frame.
Change-Id: I4079ff5065da79834b363b1e1976f65efed3f91f
in CheckLowFilterOutput(); use std::unique_ptr to avoid spurious memory
leak warning:
test/pp_filter_test.cc|466 col 3| warning: Potential leak of memory
pointed to by 'expected_output' [cplusplus.NewDeleteLeaks]
ASSERT_NE(expected_output, nullptr);
Bug: b/229626362
Change-Id: Ie9e06c9b9442ffa134e514d2aee70841d19c8ecb
in ConfigChangeThreadCount(); initialize cfg as the static analyzer can
assume AlwaysTrue() within EXPECT_NO_FATAL_FAILURE may return false
causing InitCodec() not to be called.
test/encode_api_test.cc|321 col 3| warning: 1st function call argument
is an uninitialized value [core.CallAndMessage]
video.SetSize(cfg.g_w, cfg.g_h);
Bug: b/229626362
Change-Id: I54899ed0a207ca685416bed3a0e9c9644668e163
Up to 1.36x faster than vpx_quantize_b_32x32_avx() for full
calculations. Up to 1.29x faster for VP9_HIGHBITDEPTH builds.
Bug: b/237714063
Change-Id: I97aa6a18d4dc2f3187b76800f91bbba7be447ef1
this quiets a couple static analysis warnings with clang 11:
vpx_dsp/x86/avg_intrin_sse2.c:278:45: warning: Although the value stored
to 'src_diff' is used in the enclosing expression, the value is never
actually read from 'src_diff' [deadcode.DeadStores]
src[7] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
^ ~~~~~~~~~~
vpx_dsp/x86/avg_intrin_avx2.c:307:49: warning: Although the value stored
to 'src_diff' is used in the enclosing expression, the value is never
actually read from 'src_diff' [deadcode.DeadStores]
src[7] = _mm256_loadu_si256((const __m256i *)(src_diff += src_stride));
^ ~~~~~~~~~~
Bug: b/229626362
Change-Id: I4b0201bd39775885df0afc03fa5da70910b9dad6
this quiets a static analysis warning with clang 11:
vpx_dsp/avg.c:353:15: warning: Assigned value is garbage or undefined
[core.uninitialized.Assign]
hbuf[idx] /= norm_factor;
^ ~~~~~~~~~~~
the same fix was applied in libaom:
1ad0889bc aom_int_pro_row_c: add an assert for height
Bug: b/229626362
Change-Id: Ic8a249f866b33b02ec9f378581e51ac104d97169
this avoids a warning with certain versions of gcc; observed with:
mipsisa32r6el-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
Change-Id: I8999f487a79a9d53133816d572054b2423330bcf
This reverts commit 9f1329f8ac
and fixes a dumb mistake in evaluation of vfcmv. Used vdupq_n_s16,
instead of vdupq_n_s32.
Change-Id: Ie236c878c166405c49bc0f93f6d63a6715534a0a
Added datarate unittest for 4:4:4 and 4:2:2 input,
for spatial and temporal layers.
Fix is needed in vp9_set_literal_size():
the sampling_x/y should be passed into update_inital_width(),
otherwise sampling_x/y = 1/1 (4:2:0) was forced.
vp9_set_literal_size() is only called by the svc and
on dynamic resize.
Fix issue with the normative optimized scaler:
UV width/height was assumed to be 1/2 of Y, for
the ssse and neon code.
Also fix to assert for the scaled width/height:
in case scaled width/height is odd it should be
incremented by 1 (make it even).
Change-Id: I3a2e40effa53c505f44ef05aaa3132e1b7f57dd5
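A minimal sketch of the two size rules mentioned above, assuming the
usual subsampling-shift convention (1 halves a dimension, 0 leaves it
unchanged). The helper names are hypothetical and not taken from libvpx.

```c
/* Hypothetical helpers for illustration only. */

/* Round an odd dimension up to the next even value; even values are
 * returned unchanged. */
static int make_even(int v) { return (v + 1) & ~1; }

/* Chroma plane dimension for a luma dimension and a subsampling shift.
 * 4:2:0 uses shift 1 in both directions, 4:2:2 uses 1 horizontally and
 * 0 vertically, 4:4:4 uses 0 in both. */
static int chroma_dim(int luma_dim, int subsampling) {
  return (luma_dim + subsampling) >> subsampling;
}
```

With 4:4:4 input the chroma planes keep the luma size, which is the case
a fixed "UV is half of Y" assumption gets wrong.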
Release v1.12.0 Torrent Duck
2022-06-17 v1.12.0 "Torrent Duck"
This release adds optimizations for Loongarch, adds support for vp8 in the
real-time rate control library, upgrades GoogleTest to v1.11.0, updates
libwebm to libwebm-1.0.0.28-20-g206d268, and includes numerous bug fixes.
- Upgrading:
This release is ABI compatible with the previous release.
vp8 support in the real-time rate control library.
New codec control VP8E_SET_RTC_EXTERNAL_RATECTRL is added.
Configure support for darwin21 is added.
GoogleTest is upgraded to v1.11.0.
libwebm is updated to libwebm-1.0.0.28-20-g206d268.
- Enhancement:
Numerous improvements on checking memory allocations.
Optimizations for Loongarch.
Code clean-up.
- Bug fixes:
Fix to a crash related to {vp8/vp9}_set_roi_map.
Fix to compiling failure with -Wformat-nonliteral.
Fix to integer overflow with vp9 with high resolution content.
Fix to AddNoiseTest failure with ARMv7.
Fix to libvpx Null-dereference READ in vp8.
Change-Id: I6964e96bccf016f977cc6e83dc0a192d66a19618
min/max_gf_interval is fixed and can be passed from the command line.
It must satisfy the level constraints.
active_min/max_gf_interval might be changing based on
min/max_gf_interval. It is determined per GOP.
Change-Id: If456c691c97a8b4c946859c05cedd39ca7defa9c
replace the check on use_nonrd_pick_mode with an assert. this is only a
start, there are many branches that could be removed that check mode ==
REALTIME, etc. with this configuration.
Bug: webm:1773
Change-Id: I38cf9f83e7c085eb8e87d5cf6db7dc75359b611b
(cherry picked from commit 08b86d7622)
this avoids a crash if cpu-used is not explicitly set as there are some
(unnecessary) checks against use_nonrd_pick_mode which would cause
encoding to be skipped if the old default of 0 were used
Bug: webm:1773
Change-Id: I62fba5fb51d8afa422689b7de3f03e8f7570e50b
Fixed: webm:1773
(cherry picked from commit 95d196fdf4)
A stale codec control was removed, but compatibility was restored.
New codec control was added.
Bump *current* and *age*, and keep *revision* as 0.
Bug: webm:1752
Bug: webm:1757
Change-Id: I76179f129a10c06d897b5c62462808ed9b9c2923
replace the check on use_nonrd_pick_mode with an assert. this is only a
start, there are many branches that could be removed that check mode ==
REALTIME, etc. with this configuration.
Bug: webm:1773
Change-Id: I38cf9f83e7c085eb8e87d5cf6db7dc75359b611b
this avoids a crash if cpu-used is not explicitly set as there are some
(unnecessary) checks against use_nonrd_pick_mode which would cause
encoding to be skipped if the old default of 0 were used
Bug: webm:1773
Change-Id: I62fba5fb51d8afa422689b7de3f03e8f7570e50b
Fixed: webm:1773
This CL breaks the backward compatibility:
1365e7e1a vp9-svc: Remove VP9E_SET_TEMPORAL_LAYERING_MODE
Forcing the value of the next element
Bug: webm:1752
Change-Id: I83c774b3aa6cca25f2f14995590fb20c0a1668d4
(cherry picked from commit 013ec5722c)
This CL breaks the backward compatibility:
1365e7e1a vp9-svc: Remove VP9E_SET_TEMPORAL_LAYERING_MODE
Forcing the value of the next element
Bug: webm:1752
Change-Id: I83c774b3aa6cca25f2f14995590fb20c0a1668d4
Convert the data member EncoderTest::last_pts_ to a local variable in
the EncoderTest::RunLoop() and VP9FrameSizeTestsLarge::RunLoop()
methods. EncoderTest::last_pts_ is only used in these two methods, and
these two methods first set EncoderTest::last_pts_ to 0 before using it.
So EncoderTest::last_pts_ is effectively a local variable in these two
methods.
Note that several subclasses of EncoderTest declare their own last_pts_
data member and use it to calculate the data rate. Apparently their own
last_pts_ data member hides the same-named data member in the base
class. Although this is allowed by C++, this is very confusing.
Change-Id: I55ce1cf8cc62e07333d8a902d65b46343a3d5881
If the external model recommends an invalid q value, we use the
default q selected by libvpx's rate control strategy.
We update the test so that when the external model wants to control
GOP decision, it could get per frame information and just recommend
an invalid q.
Change-Id: I69be4b0ee0800e7ab0706d305242bb87f001b1f7
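The fallback described above can be sketched as follows. The valid range
0..255 and the function name are assumptions made for illustration; the
actual check in libvpx may differ.

```c
/* Assumed valid q index range; hypothetical constants. */
#define SKETCH_MIN_Q 0
#define SKETCH_MAX_Q 255

/* Use the externally recommended q when it is in range, otherwise keep
 * the q chosen by the built-in rate control. */
static int choose_q(int external_q, int default_q) {
  if (external_q < SKETCH_MIN_Q || external_q > SKETCH_MAX_Q) {
    return default_q;
  }
  return external_q;
}
```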
'gop_index' has already been used in vpx_rc_encodeframe_info_t,
which represents the frame index inside the current
group of picture (gop).
We therefore use 'gop_global_index' to represent the index of
the current gop to avoid duplicate names.
Change-Id: I3eb8987dd878f650649b013e0036e23d0846b5f0
This change lets the encoder send first pass stats before gop
decisions so that external models can make use of them.
Change-Id: Iafc7eddab93aa77ceaf8e1f7663a52b27d94af80
The bit mask allows us to easily add an additional control mode
in which both the QP and GOP are controlled by an external model.
Change-Id: I49f676f622a6e70feb2a39dc97a4e5050b7f4760
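A rough sketch of how a bit mask makes the combined mode fall out
naturally. The enum and helper names below are hypothetical and only
illustrate the idea.

```c
/* Hypothetical control-mode flags; each mode owns one bit. */
enum rc_control_mode {
  RC_CONTROL_NONE = 0,
  RC_CONTROL_QP = 1 << 0,
  RC_CONTROL_GOP = 1 << 1,
  /* The combined mode is the OR of the two single-bit modes. */
  RC_CONTROL_GOP_QP = RC_CONTROL_QP | RC_CONTROL_GOP
};

/* Nonzero when the external model controls QP. */
static int controls_qp(int mode) { return (mode & RC_CONTROL_QP) != 0; }

/* Nonzero when the external model controls GOP decisions. */
static int controls_gop(int mode) { return (mode & RC_CONTROL_GOP) != 0; }
```

Call sites that test a single bit keep working unchanged when the
combined mode is selected.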
vp9_change_config may call functions that perform allocations which
expect failures detected by CHECK_MEM_ERROR to not return.
Change-Id: I1dd1eca9c661ed157d51b4a6a77fc9f88236d794
(cherry picked from commit 3997d9bc62)
vp8_change_config may call vp8_alloc_compressor_data which expects
failures detected by CHECK_MEM_ERROR to not return.
Change-Id: Ib7fbf4af904bd9b539402bb61c8f87855eef2ad6
(cherry picked from commit 365eebc147)
this fixes a regression in make 4.2 and still present in 4.3 causing
double colon rules to be serialized which breaks sharding done by the
test and test-no-data-check rules. these targets only define one set of
rules so ordinary rules work unlike clean. install may be another
candidate, but that's left for a follow up.
Change-Id: I9f074eca2ad266eeca6e31aae2e9f31eec8680e0
Tested: make 3.81, 4.1, 4.2, 4.2.1, 4.3
- Return error instead of OK when GOP model is not set.
- Update descriptions for a few variables.
Change-Id: I213f6b7085c487507c3935e7ce615e807f4474cc
the issues fixed in this change are related to implicit conversions
between int / unsigned int:
vp9/encoder/vp9_segmentation.c:42:36: runtime error: implicit conversion
from type 'int' of value -9 (32-bit, signed) to type 'unsigned int'
changed the value to 4294967287 (32-bit, unsigned)
vpx_dsp/x86/sum_squares_sse2.c:36:52: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vpx_dsp/x86/sum_squares_sse2.c:36:67: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
vp9/encoder/x86/vp9_diamond_search_sad_avx.c:81:45: runtime error:
implicit conversion from type 'uint32_t' (aka 'unsigned int') of value
4290576316 (32-bit, unsigned) to type 'int' changed the value to
-4390980 (32-bit, signed)
vp9/encoder/vp9_rdopt.c:3472:31: runtime error: implicit conversion from
type 'int' of value -1024 (32-bit, signed) to type 'uint16_t' (aka
'unsigned short') changed the value to 64512 (16-bit, unsigned)
unsigned is forced for masks and int is used with intel intrinsics
Bug: webm:1767
Change-Id: Icfa4179e13bc98a36ac29586b60d65819d3ce9ee
Fixed: webm:1767
vp9_change_config may call functions that perform allocations which
expect failures detected by CHECK_MEM_ERROR to not return.
Change-Id: I1dd1eca9c661ed157d51b4a6a77fc9f88236d794
vp8_change_config may call vp8_alloc_compressor_data which expects
failures detected by CHECK_MEM_ERROR to not return.
Change-Id: Ib7fbf4af904bd9b539402bb61c8f87855eef2ad6
this should match VP8 and use ONE_PASS_TEST_MODES, but currently the
code will produce integer sanitizer warnings and may segfault under
certain conditions
Bug: webm:1767,webm:1768
Change-Id: I6482ff1862f19716fde3d57522591bc61d76a84f
DISABLED_TestExternalResizeSmallerWidthBiggerSize was added for
webm:1642, but never fixed
Bug: webm:1642
Change-Id: I0fa368a44dda550241ea997068c58eaff551233c
- COLS_IN_ALPHA_INDEX
this was unused given ALPHABETICAL_INDEX = NO
- PERL_PATH / MSCGEN_PATH
these were unused
quiets warnings with doxygen 1.9.1:
warning: Tag 'COLS_IN_ALPHA_INDEX' at line 1110 of file 'doxyfile' has
become obsolete.
warning: Tag 'PERL_PATH' at line 1105 of file 'doxyfile' has become
obsolete.
warning: Tag 'MSCGEN_PATH' at line 1126 of file 'doxyfile' has become
obsolete
Change-Id: I6229311afaa3318a3f9bcaf40fafcc5ea71ae271
Add a helper function to call the external rate control model.
The helper function is placed in the function where vp9 determines
GOP decisions.
The helper function passes frame information, including current
frame show index, coding index, etc to the external rate control
model, and then receives GOP decisions.
The received GOP decisions overwrite the default GOP decision, only
when the external rate control model is set to be active via
the codec control.
The decision should satisfy a few constraints, for example, larger
than min_gf_interval; smaller than max_gf_interval. Otherwise,
return error.
Unit tests are added to test the new functionality.
Change-Id: Id129b4e1a91c844ee5c356a7801c862b1130a3d8
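The constraint check can be sketched like this. The status codes and
function name are hypothetical, and the bounds are treated as inclusive
here, which is an assumption.

```c
/* Hypothetical status codes for the sketch. */
enum gop_status { GOP_OK = 0, GOP_ERROR = -1 };

/* Accept an externally supplied GOP size only when it lies within
 * [min_gf_interval, max_gf_interval]; otherwise report an error. */
static enum gop_status validate_gop_size(int gop_size, int min_gf_interval,
                                         int max_gf_interval) {
  if (gop_size < min_gf_interval || gop_size > max_gf_interval) {
    return GOP_ERROR;
  }
  return GOP_OK;
}
```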
This reverts commit 258affdeab.
Reason for revert:
Not bitexact with C version
Original change's description:
> [NEON] Optimize vp9_diamond_search_sad() for NEON
>
> About 50% improvement in comparison to the C function.
> I have followed the AVX version with some simplifications.
>
> Change-Id: I72ddbdb2fbc5ed8a7f0210703fe05523a37db1c9
Change-Id: I5c210b3dfe1f6dec525da857dd8c83946be566fc
rather than tmpfile(). this allows for setting the path with TEST_TMPDIR
and provides a valid default for android.
Change-Id: Iecb26f381b6a6ec97da62cfa0b7200f427440a2f
Simplify architecture support code and remove redundant code
to improve efficiency.
Bug: webm:1755
Change-Id: I03bc251aca115b0379fe19907abd165e0876355b
only lint-hunks.py is tested as part of the presubmit; the rest may
need further changes as they're used.
Bug: b/229626362
Change-Id: I2fd6e96deab8d892d34527e484ea65e3df86d162
Some macros have been changed to "#define do {...} while (0)",
change the rest to "static INLINE ..."
Bug: webm:1755
Change-Id: I445ac0c543f12df38f086b479394b111058367d0
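For reference, a small sketch of the two forms. The macro and function
names are made up for illustration, and plain C99 "static inline" is used
in place of the project's INLINE macro.

```c
/* Statement-like macro: the do { ... } while (0) wrapper makes the
 * expansion a single statement, so it is safe in an unbraced if/else. */
#define SWAP_INT(a, b) \
  do {                 \
    int tmp_ = (a);    \
    (a) = (b);         \
    (b) = tmp_;        \
  } while (0)

/* Expression-like macro rewritten as an inline function: arguments are
 * evaluated once and are type checked. */
static inline int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Returns hi - lo after ordering the two values. */
static int ordered_span(int lo, int hi) {
  if (lo > hi)
    SWAP_INT(lo, hi);
  else if (lo == hi)
    return 0;
  return hi - lo;
}
```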
If aq_mode=0 the segmentation feature may still be used
for active_maps, so the condition active_maps.enabled
needs to be added in two places regarding segmentation
logic in encodeframe.c. Otherwise the active_maps would
have no effect.
This also resolves why the assert in bug webm:1762 was
not triggered when aq_mode=0.
Change-Id: Ibd68e9b5c3f81728241a168d3fb3567d6845633d
For segment skip feature: allow for setting the
mi->interp_filter to BILINEAR, if cm->interp_filter
is set to BILINEAR. This can happen at speed 9 when the
segment skip feature is used (e.g., active_maps)
Without this fix the assert can be triggered with the
active_map_test.cc for speed 9 included.
Updated the test.
Fixes the assert triggered in the issue:
Bug: webm:1762
Change-Id: I462e0bdd966e4f3cb5b7bc746685916ac8808358
About 50% improvement in comparison to the C function.
I have followed the AVX version with some simplifications.
Change-Id: I72ddbdb2fbc5ed8a7f0210703fe05523a37db1c9
previously vp9_bitstream_worker_data was checked after it was memset();
this change uses CHECK_MEM_ERROR for consistency to ensure the pointer
is checked first
Change-Id: I532d0eb0e746dc6b8d694b616eba693c5c0053ac
around ASM_REGISTER_STATE_CHECK() this helps keep the call ordering
consistent avoiding some code reordering which may affect the registers
being checked
fixes issue with armv7 and multiple versions of gcc:
[ RUN ] C/AddNoiseTest.CheckNoiseAdded/0
test/register_state_check.h:116: Failure
Expected equality of these values:
pre_store_[i]
Which is: 0
post_store[i]
Which is: 4618441417868443648
Bug: webm:1760
Change-Id: Ib8bcefd2c4d263f9fc4d4b4d4ffb853fe89d1152
Fixed: webm:1760
this avoids a desynchronization of mb_rows if an allocation prior to
vp8mt_alloc_temp_buffers() fails and the decoder is then destroyed
Bug: webm:1759
Change-Id: I75457ef9ceb24c8a8fd213c3690e7c1cf0ec425f
when no frames were decoded, for example due to a decoder initialization
failure, an orphan buffer pointer from webm_guess_framerate() via
webm_read_frame() would have been freed during cleanup
Change-Id: I6ea3defdd13dd75427f79c516e207b682391e4fa
previously the returns for alloc_context_buffers_ext() and
vp9_alloc_context_buffers() were ignored which would result in a NULL
access during encoding should they fail
Change-Id: Icd76576f3d5f8d57697adc9ae926a3a5be731327
avoid setting num_internal_frame_buffers until the allocation is
checked, avoiding an invalid access in vp9_free_internal_frame_buffers()
Change-Id: I28a544a2553d62a6b5cb7c45bf10591caa4ebab6
in vp9_free_ref_frame_buffers() and vp9_free_context_buffers(); pool and
free_mi may be NULL due to earlier allocation failures
Change-Id: I3bd26ea29b3aea6c58f33d5b7f5a280eb6250ec7
The release tag is release-1.11.0.
Ref: https://aomedia-review.googlesource.com/c/aom/+/156641
79c98a122 Upgrade GoogleTest to v1.11.0
Note the tree structure differs from libaom, but is left untouched to
avoid breaking test include paths in this commit.
Change-Id: Ia3c6861d45a3befc2decb1da5b1018bcfd38f95a
this matches the call with int_mv::as_int and fixes a warning with
clang-13 -fsanitize=integer:
vp8/decoder/decodemv.c:240:32: runtime error: implicit conversion from
type 'uint32_t' (aka 'unsigned int') of value 4282515456 (32-bit,
unsigned) to type 'int' changed the value to -12451840 (32-bit, signed)
Bug: webm:1759
Change-Id: I7c0aa72baa45421929afac26566e149adc6669d7
fixes some warnings with clang-13 -fsanitize=integer:
vp8/decoder/threading.c:77:27: runtime error: implicit conversion
from type 'unsigned int' of value 4294967295 (32-bit, unsigned) to type
'int' changed the value to -1 (32-bit, signed)
these bitmask constants were missed in:
1676cddaa vp8: fix some implicit signed -> unsigned conv warnings
Bug: webm:1759
Change-Id: I5d894d08fd41e32b91b56a4d91276837b3415ee4
this clears warnings under clang-13 of the form:
../vp9/encoder/x86/temporal_filter_sse4.c:275:39: warning: parameter
'u_pre' set but not used [-Wunused-but-set-parameter]
Change-Id: I21519b5b0b9c21b04b174327415e0e73b56bdfda
this quiets warnings under clang-13 of the form:
../vp9/encoder/vp9_mbgraph.c:222:42: warning: variable 'gld_y_offset'
set but not used [-Wunused-but-set-variable]
Change-Id: I32170b90c07058f780b4e8100ee5217232149db8
this clears a warning under clang-13:
vp8/encoder/firstpass.c:1634:10: warning: variable
'mod_err_per_mb_accumulator' set but not used
[-Wunused-but-set-variable]
Change-Id: I694a99d56724be89090e01c45559237c0fda147a
this can occur if 0 frames are encoded, e.g., due to --skip
see also: https://crbug.com/aomedia/3243
Change-Id: I791d5ad6611dbcb60d790e6b705298328ec48126
This reverts commit 2200039d33.
This causes failures with VP9/EndToEndTestLarge.EndtoEndPSNRTest/*; it
seems the assembly does not match the C code.
Bug: webm:1586
Change-Id: I4c63beebf88d4c12789d681b0d38014510b147fe
This reverts commit 89cfe3835c.
This is a prerequisite for reverting
2200039d33 which causes high bitdepth test
failures
Bug: webm:1586
Change-Id: I28f3b98f3339f3573b1492b88bf733dade133fc0
The only difference between the code is the clamp. For
8 bit it is purely an optimization. The values outside
this range will still saturate.
Change-Id: I2a770b140690d99e151b00957789bd72f7a11e13
The optimized quantize functions were already built to handle
highbd values. The only difference is the clamping. All highbd
functions expand to 32bits when running in highbd mode.
Removes vpx_highbd_quantize_32x32_sse2 as it is slower than the
C version in the worst case.
Bug: webm:1586
Change-Id: I49bf8a6a2041f78450bf43a4f655c67656b0f8d9
Level conformance is standardized in vp9.
If a specific target level is set, the vp9 encoder is required to
produce conformant bitstream with limit on frame size, rate,
min alt-ref distance, etc.
This change makes the SimpleEncode environment take the target level
as an input.
To make existing tests pass, we set the level to 0.
Change-Id: Ia35224f75c2fe50338b5b86a50c84355f5daf6fd
Whether a block is skipped is handled by mi->skip. x->skip_block
is kept exclusively to verify that the quantize functions are not
called for skip blocks.
Finishes the cleanup in 13eed991f
Bug: libvpx:1612
Change-Id: I1598c3b682d3c5e6c57a15fa4cb5df2c65b3a58a
These would compute the sum of absolute differences (sad) for a
group of 3 or 8 references. This was used as part of an exhaustive
search.
vp8 only uses these functions in speed 0 and best quality.
For vp9 this is only used with the --enable-non-greedy-mv
experiment.
This removes the 3- and 8-at-a-time optimized functions and uses
the fall back code which will process 1 or 4 (vpx_sadMxNx4d) at
a time.
For configure --target=x86_64-linux-gcc --enable-realtime-only:
libvpx.a
before: 3002424 after: 2937622 delta: 64802
after 'strip libvpx.a'
before: 2116998 after: 2073090 delta: 43908
Change-Id: I566d06e027c327b3bede68649dd551bba81a848e
Clean up a new build warning with gcc11:
argument 3 of type ‘const uint8_t * const[]’ with
mismatched bound [-Warray-parameter=]
Standardize sad functions with array sizes.
Change-Id: Iea4144e61368f6a8279e2f3ae96c78aff06c8b41
The distance between PROC and END is used to generate .size
information for debugging. When the leading underscore was
removed the pattern used to match the function name broke.
Change-Id: I90bf67d95ecdc2d214606e663773f88d2a2d6b9c
[NEON]
Added vpx_fdct4x4_pass1_neon(),
Added vpx_fdct8x8_pass1_notranspose_neon(),
Added vpx_fdct8x8_pass1_neon() to avoid code duplication
Refactored vpx_fdct4x4_neon() and vpx_fdct8x8_neon() to use the above
Rename dct_body to vpx_fdct16x16_body to reuse later
Add transpose_s16_16x16()
I have run make test and all tests/configurations seem to pass.
Profiled using this command on an Ampere Altra VM:
sudo perf record -g ./vpxenc --codec=vp9 --height=1080 --width=1920 \
--fps=25/1 --limit=20 -o output.mkv \
../original_videos_Sports_1080P_Sports_1080P-0063.mkv --debug --rt
Before this optimization:
1.32% 1.32% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.16% 0.16% vpxenc vpxenc [.] vpx_fdct4x4_c
0.79% 0.79% vpxenc vpxenc [.] vpx_fdct8x8_c
0.52% 0.52% vpxenc vpxenc [.] vpx_fdct8x8_neon
1.23% 1.23% vpxenc vpxenc [.] vpx_fdct16x16_c
0.54% 0.54% vpxenc vpxenc [.] vpx_fdct16x16_neon
So, even though a _neon() version exists, the C version was called \
as well. After this patch:
1.42% 1.36% vpxenc vpxenc [.] vpx_fdct4x4_neon
0.87% 0.82% vpxenc vpxenc [.] vpx_fdct8x8_neon
0.74% 0.74% vpxenc vpxenc [.] vpx_fdct16x16_neon
Change-Id: Id4e1dd315c67b4355fe4e5a1b59e181a349f16d0
The gcc assembler was incompatible for a long
time. It is now based on clang and accepts
more modern syntax, although not enough to
remove the script entirely.
Change-Id: I667d29dca005ea02a995c1025c45eb844081f64b
Many of the features in ads2gas are no longer used.
Remove all patterns which are no longer used in
libvpx.
Simplify between the two to minimize differences.
Change-Id: Ia1151eb8b694cbe51845a1374a876cc7b798899c
The control was never implemented, no need to keep this.
temporal_layering_mode is set in the config.
Bug: webm:1753
Change-Id: I9a6eb50e82344605ab62775911783af82ac2d401
Allow intra-only frame in svc to also work
in bypass (flexible-svc) mode.
Added unittest for the flexible svc case.
And fix the gld_fb_idx for (SL0, TL1) in bypass/flexible
mode pattern in the sample encoder: force it to be 0
(same as lst_fb_idx), since the slot is unused on SL0.
Change-Id: Iada9d1b052e470a0d5d25220809ad0c87cd46268
Fix some issues with the test, and add a new
test that verifies that we can decode the base stream
starting at the middle of a sequence where an intra-only
frame is inserted.
Change-Id: I398d23927113eb58ef64694feca25e60ce60a5f7
RTC sample encoder vpx_temporal_svc_encoder can take mask files as input
when ROI_MAP is set to 1.
Uses ROI and segmentation of vp9 to skip background encoding when
source_sad is low and the corresponding block in the previous frame is also
skipped.
Change-Id: I8590e6f9a88cecfa1d7f375d4cc480f0f2af87b6
Original change's description:
> Add vp9 ref frame to flag map function
>
> Change-Id: I371c2346b9e0153c0f8053cab399ce14cd286c56
Change-Id: I04a407ee0ef66c01a0d224b4468e043213f8791f
under Visual Studio:
Warning C4244 '=': conversion from 'int64_t' to 'vpx_prob', possible loss of
data
after:
ea042a676 vp9 encoder: fix integer overflows
'newp' has already been range checked earlier in the loop so the cast won't
have any unexpected results
Change-Id: Ic10877db2c0633d53fffdf8852d5095403c23a02
this is a followup to:
7fbcee49d quiet -Warray-parameter warnings
and conforms to aom in:
06e13e817 quiet -Warray-parameter warnings
the sad functions are more varied in libvpx and will require a separate
pass
Change-Id: I765fd6704df615e836ba0b184ff8266ce926c394
If a reference frame is not referenced, then set the index for that
reference to the first one used/referenced instead of unused slot.
Unused slot means key frame, as key frame resets all slots with itself.
This CL extracts `get_first_ref_frame()` from `reset_fb_idx_unused()`
with a typo fixing, and sets all unused reference frames to first ref in
vp9 uncompressed header.
Bug: webrtc:13442
Change-Id: I99523bc2ceedf27efe376d1113851ff342982181
this removes the burden from callers; the rtcd functions are left with a
mostly redundant (outside of tests) once() as top-level functions should
ensure their constraints are met
Change-Id: I5bdbcfa4671c6a1492cfe9c7d886c361c26caaa9
w/gcc-11
v_these_mv_w is always initialized in this block with _mm_add_epi16();
converting this to a _mm_storeu_si32(tmp) call also works, but
introduces more stack usage
|| ../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function
‘vp9_diamond_search_sad_avx’:
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning:
‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
|| 285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
|| | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note:
‘v_these_mv_w’ declared here
|| 149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
|| | ^~~~~~~~~~~~
Change-Id: I1cd2fcb41030db16f51c94f3a70eb8eb2a526401
w/gcc-11
as noted in
the size of interp_filter_selected[][]'s first dimension varies between
VP9_COMP and VP9BitstreamWorkerData as noted in the latter's definition:
// The size of interp_filter_selected in VP9_COMP is actually
// MAX_REFERENCE_FRAMES x SWITCHABLE. But when encoding tiles, all we ever do
// is increment the very first index (index 0) for the first dimension. Hence
// this is sufficient.
int interp_filter_selected[1][SWITCHABLE];
normalize the function signatures of write_modes*(), etc. to take this
into account.
vp9/encoder/vp9_bitstream.c|948 col 3| warning: ‘write_modes’ accessing
64 bytes in a region of size 16 [-Wstringop-overflow=]
|| 948 | write_modes(cpi, xd, &cpi->tile_data[data->tile_idx].tile_info,
|| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 949 | &data->bit_writer, tile_row, data->tile_idx,
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 950 | &data->max_mv_magnitude, data->interp_filter_selected);
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/vp9_bitstream.c|948 col 3| note: referencing argument 8 of
type ‘int (*)[4]’
vp9/encoder/vp9_bitstream.c|488 col 13| note: in a call to function
‘write_modes’
Change-Id: I0898cd7c3431633c382a0c3a1be2f0a0bea8d0f9
the row-based loop filter is ok (and being used) in this case; since
it's serialized the previous row will always be done
Change-Id: I024a0c78e7488178956cc22a4c4680a00dc6eade
previously row-mt would allocate thread data once, so increasing the
number of threads with a config change would cause a heap overflow.
Bug: chromium:1261415
Bug: chromium:1270689
Change-Id: I3c5ec8444ae91964fa34a19dd780bd2cbb0368bf
cap bitrate to 1000Mbps, change bitsaving budget to int64_t
this enables test coverage for 2048x2048, the same as for vp8
Bug: webm:1749
Fixed: webm:1749
Change-Id: Ic58d73cb7529b0826d1f501ad09af8e80f706a6e
Gives 10% faster VP8 encoding in simple tests.
This patch requires testing on wider datasets and encoder
settings to see if this speedup is achieved on most data.
Change-Id: If8e04819623e78fff126c413db66c964c0b4c11a
and rename the table to kCodecIfaces[] to be a little more specific and
avoid shadowing kCodecs[] in SetRoi()
Change-Id: I64905f48d8bf76e812bdba8374b82e3f7654686f
Remove -mmacosx-version-min. The library does not use
any calls which are affected by the platform version.
There is also no version 10.16 as it went from 10.15
to 11 and now to 12.
At some point it may be good to clarify that the bare
-darwin- target is for iOS and the -darwinN- targets
are for macOS.
Change-Id: I2fd5f7cae2637905acf3ab77bfddfbe367abbb68
Fix errors reported by UBSan diagnostics:
1. /vp9/encoder/vp9_pickmode.c:308:29: unsigned integer overflow:
99 - 100 cannot be represented in type 'unsigned int'
2. /vp9/encoder/vp9_pickmode.c:330:27: unsigned integer overflow:
21976 - 21978 cannot be represented in type 'unsigned int'
3. /vp9/encoder/vp9_pickmode.c:468:13: unsigned integer overflow:
18852144 - 18852149 cannot be represented in type 'unsigned int'
(Notice that line numbers might vary a bit because fixes have
been applied incrementally i.e. fix for error #1 affects line
number reported in #2)
Fix by calculating difference instead of wrapping around to
a value near maximum.
Test: Cuttlefish webrtc with VP9 codec
Change-Id: I4f85712028647e915a4e2da31e4b0a266e9e2705
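One way to compute such a difference without wrapping, shown as a sketch.
The helper name is hypothetical and the actual fix in vp9_pickmode.c may
be structured differently.

```c
/* Absolute difference of two unsigned values. The smaller operand is
 * always subtracted from the larger one, so the result never wraps
 * around to a value near UINT_MAX. */
static unsigned int abs_diff_u(unsigned int a, unsigned int b) {
  return a > b ? a - b : b - a;
}
```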
Fix UBSan error reported from aosp Cuttlefish device:
/vp9/encoder/vp9_ratectrl.c:238:33: unsigned integer overflow:
2500000 * 1800 cannot be represented in type 'unsigned int'
...by casting the operand and the result of multiplication
to 64bit integer.
Test: vp9 webrtc streaming with Cuttlefish
Change-Id: Id5bb3d4071a96179caffae0829d3cc4e48c7614b
cap the bitrate to 1000Mbps to avoid many instances of bitrate * 3 / 2
overflowing.
this adds coverage for 2048x2048 in the default test for VP8 with TODOs
for issues at that resolution for VP9 and at max resolution for both.
Bug: b/189602769
Bug: chromium:1264506
Bug: webm:1748
Bug: webm:1749
Bug: webm:1750
Bug: webm:1751
Change-Id: Iedee4dd8d3609c2504271f94d22433dfcd828429
this changes the return to int32_t which matches the type with usage
of this call as input to _mm_cvtsi32_si128(), _mm_set_epi32(), etc.
fixes implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1425f12d4f79155dd5d7af0eb00fbdb9f1940544
this changes the parameter to int32_t which matches the type with usage
of this call using _mm_cvtsi128_si32() as a parameter. quiets an
implicit conversion warning with clang-11 -fsanitize=undefined
Change-Id: I1e9e9ffac5d2996962d29611458311221eca8ea0
with -fsanitize=undefined
test/video_source.h:194:33: runtime error: implicit conversion from type
'int' of value -32 (32-bit, signed) to type 'unsigned int' changed the
value to 4294967264 (32-bit, unsigned)
Change-Id: I92013086d517fecf01c9e4cdfe6737b8ce733a1f
this is similar to the fix for calc_iframe_target_size:
5f345a924 Avoid overflow in calc_iframe_target_size
Bug: chromium:1264506
Change-Id: I2f0e161cf9da59ca0724692d581f1594c8098ebb
the intermediate value in the correction_factor calculation may exceed
integer bounds
Bug: b/189602769
Change-Id: I75726b12f3095663911d78333f3ea26eb6dee21e
and use it to set the format attribute for printf like functions. this
allows the examples to be built with -Wformat-nonliteral without
producing warnings.
Bug: webm:1744
Change-Id: I26b4c41c9a42790053b1ae0e4a678af8f2cd1d82
Fixed: webm:1744
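A sketch of a format attribute on a printf-like function. The macro name
PRINTF_LIKE and the function format_message() are hypothetical; only the
GCC/Clang __attribute__((format(printf, ...))) syntax is standard compiler
behavior.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Expands to the GCC/Clang format attribute, or to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_vararg) \
  __attribute__((format(printf, fmt_index, first_vararg)))
#else
#define PRINTF_LIKE(fmt_index, first_vararg)
#endif

/* Argument 3 is the format string, varargs start at argument 4. */
static int format_message(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int format_message(char *buf, size_t size, const char *fmt, ...) {
  va_list ap;
  int n;
  va_start(ap, fmt);
  /* fmt is not a literal here; the attribute on this function tells the
   * compiler that callers are the ones being checked. */
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);
  return n;
}
```

With the attribute in place the compiler checks each call site's format
string against its arguments.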
and use it to set the format attribute for the printf like function
vpx_internal_error(). this allows the main library to be built with
-Wformat-nonliteral without producing warnings; the examples will be
handled in a followup.
Bug: webm:1744
Change-Id: Iebc322e24db35d902c5a2b1ed767d2e10e9c91b9
previously ranges were checked with abs() whose behavior is undefined
with INT_MIN. this fixes a crash when the original value is returned and
is later used as an offset into a table.
Bug: webm:1742
Change-Id: I345970b75c46699587a4fbc4a059e59277f4c2c8
previously ranges were checked with abs() whose behavior is undefined
with INT_MIN. this fixes a crash when the original value is returned and
is later used as an offset into a table.
Bug: webm:1742
Change-Id: I345970b75c46699587a4fbc4a059e59277f4c2c8
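A sketch of a range check that avoids abs(). The helper name is
hypothetical and the limit is assumed to be non-negative.

```c
#include <limits.h>

/* Returns 1 when v lies in [-limit, limit], 0 otherwise. Comparing
 * against both bounds directly is well defined for every int value,
 * including INT_MIN, for which abs() has undefined behavior. */
static int in_symmetric_range(int v, int limit) {
  return v >= -limit && v <= limit;
}
```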
This allows user to make sure frame will be encoded
when drop_frames is set off (on the fly), no matter
the state of the buffer.
Change-Id: Ia7b39b93fe3721dd586bdbede72c525db87b6890
Condition already existed for screen content mode,
but only when frame-dropper was off. Remove the
frame drop condition.
Change-Id: Ie7357041f5ca05b01e78b4bd3b40da060382591b
Define VPX_NO_RETURN as __declspec(noreturn) for MSVC. See
https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=msvc-160
This requires moving VPX_NO_RETURN before function declarations because
__declspec(noreturn) must be placed there. Fortunately GCC's
__attribute__((noreturn)) can be placed either before or after function
declarations.
Change-Id: Id9bb0077e2a4f16ec2ca9c913dd93673a0e385cf
(cherry picked from commit 8a6fbc0b4e)
Define VPX_NO_RETURN as __declspec(noreturn) for MSVC. See
https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=msvc-160
This requires moving VPX_NO_RETURN before function declarations because
__declspec(noreturn) must be placed there. Fortunately GCC's
__attribute__((noreturn)) can be placed either before or after function
declarations.
Change-Id: Id9bb0077e2a4f16ec2ca9c913dd93673a0e385cf
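A sketch of the placement issue. NO_RETURN, fatal_error() and
checked_div() are hypothetical names used for illustration; the macro is
placed before the declaration because that position is accepted by both
compilers.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(_MSC_VER)
#define NO_RETURN __declspec(noreturn)
#elif defined(__GNUC__)
#define NO_RETURN __attribute__((noreturn))
#else
#define NO_RETURN
#endif

/* The attribute precedes the declaration. */
NO_RETURN static void fatal_error(const char *msg);

static void fatal_error(const char *msg) {
  fprintf(stderr, "fatal: %s\n", msg);
  exit(EXIT_FAILURE);
}

/* The compiler knows the error path never falls through to the
 * division below. */
static int checked_div(int num, int den) {
  if (den == 0) fatal_error("division by zero");
  return num / den;
}
```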
For 1 layer CBR only.
Support for temporal layers comes later.
Rename the library to libvpxrc
Bug: b/188853141
Change-Id: Ib7f977b64c05b1a0596870cb7f8e6768cb483850
Also use round to cast float to int with more accurate calculation to
avoid error accumulation which causes qp to be different after ~290
frames.
Change-Id: Iff65a8fdc67401814fd253dbf148afe9887df97f
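To illustrate the difference between a plain cast and rounding, here is a
small sketch. The helper names are hypothetical, and the rounding is done
by hand (round half away from zero) to keep the example free of math
library dependencies; it assumes the result fits in an int.

```c
/* A plain cast drops the fractional part (truncates toward zero). */
static int truncate_to_int(double v) { return (int)v; }

/* Round to nearest, halves away from zero, by offsetting by 0.5 in the
 * direction of the sign before the truncating cast. */
static int round_to_int(double v) {
  return (int)(v >= 0.0 ? v + 0.5 : v - 0.5);
}
```

A truncating cast is always biased toward zero, so repeated conversions
can drift, while rounding keeps the error within half a unit per step.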
This feature was added to help speed up still images and slideshows.
It didn't work anymore, and thus was disabled. Code cleanup will
follow.
This had negligible impact to regular test sets. Borg test result
on ugc360p set at speed 3.
avg_psnr: ovr_psnr: ssim: speed:
-0.244 -0.278 -0.153 -0.973
Change-Id: If74edabce0c93be1361e645ffd2eec063c2db76b
This will do 3 things:
Turn off low motion computation
Turn off gf update constraint on key frame frequency
Turn off content mode for cyclic refresh
Those are used to verify the external ratectrl lib works as expected.
Change-Id: Ic6e61498de82d6b3973e58df246cf5e05f838680
Check for x + w and y + h overflows in vpx_img_set_rect().
Move the declaration of the local variable 'data' to the block it is
used in.
Change-Id: I6bda875e1853c03135ec6ce29015bcc78bb8b7ba
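A sketch of an overflow-safe rectangle check in the spirit of the change
above. rect_fits() is a hypothetical helper, not the code in
vpx_img_set_rect().

```c
#include <limits.h>

/* Returns 1 when the rectangle (x, y, w, h) lies inside an image of
 * size img_w x img_h. The sums x + w and y + h are only formed after
 * checking that they cannot wrap around. */
static int rect_fits(unsigned int x, unsigned int y, unsigned int w,
                     unsigned int h, unsigned int img_w,
                     unsigned int img_h) {
  if (x > UINT_MAX - w || y > UINT_MAX - h) return 0;
  return x + w <= img_w && y + h <= img_h;
}
```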
target_bandwidth is int64_t, but layer_target_bitrate[0] is an int. this
is safe in the only place it's set because target_bandwidth defaults to
1000. target_bandwidth is later used to populate the cpi's target, which
is an unsigned int so there may be further fixes/cleanups that can be
done.
Change-Id: I35dbaa2e55a0fca22e0e2680dcac9ea4c6b2815a
The changed product was observed to attempt to multiply 1800 by 2500000,
which overflows unsigned 32 bits. Converting to unsigned 64 bits first
and testing whether the final result fits in 32 bits solves the problem.
BUG=b:179686142
Change-Id: I5d27317bf14b0311b739144c451d8e172db01945
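The widening described above can be sketched as follows. The helper name
is hypothetical, and clamping to UINT32_MAX when the product does not fit
is an assumption about what to do in that case.

```c
#include <stdint.h>

/* Multiply in 64 bits, then check whether the result fits in 32 bits.
 * Out-of-range products are clamped to UINT32_MAX in this sketch. */
static uint32_t mul_clamp_u32(uint32_t a, uint32_t b) {
  const uint64_t product = (uint64_t)a * b;
  return product > UINT32_MAX ? UINT32_MAX : (uint32_t)product;
}
```

For the observed operands, 1800 * 2500000 is 4500000000, which exceeds
the 32-bit unsigned maximum of 4294967295.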
The encoder has a feature to skip transform and quantization based
on model rd analysis. It could happen that the model
based analysis lets the encoder skip transform and quantization while
a bad prediction occurs, leading to bad reconstructed blocks, which
are visually intrusive and appear to be coding errors.
We add a speed feature to guard the skipping feature.
Due to the risk of bad perceptual quality, we disallow such skipping
by default.
On hdres test set, speed 2, the coding performance difference is 0.025%,
speed difference is 1.2%, which can be considered non significant.
BUG=webm:1729
Change-Id: I48af01ae8dcc7a76c05c695f3f3e68b866c89574
For usage in the external RC. When content_mode = 0,
the cyclic refresh has no dependency on the content
(motion, spatial variance, motion vectors, etc.).
The content_mode = 0, when compared to content_mode = 1,
on rtc set for speed 7: has some regression on some
clips (~3-5%), but overall/average bdrate loss is
about ~1-2%.
Comparing aq_mode=3 with content_mode = 0, vs aq_mode=3:
about ~14% avg/overall bdrate gain, but has ~3-7% regression
on some hard motion clips (e.g., street).
Change-Id: I93117fabb8f7f89032c15baf1292b201e8c07362
Added a new flag in rate control which turns off gf interval constraint
on key frame frequency for external RC.
It remains on for libvpx.
Change-Id: I18bb0d8247a421193f023619f906d0362b873b31
for rc_interface_test_one_layer_vbr and
rc_interface_test_one_layer_vbr_periodic_key added in:
1f45e7b07 vp9 rc: add vbr to rtc rate control library
Change-Id: I8bfa3698284c8ff289e830f7b8fa1ca42b752563
fixes warnings under visual studio:
vp9\encoder\vp9_ratectrl.c(2012): warning C4028: formal parameter 1
different from declaration
vp9\encoder\vp9_ratectrl.c(2027): warning C4028: formal parameter 1
different from declaration
Change-Id: Ia0740db597fb7a259f90d362b483f58662f9f584
This reduces some regression when external RC
is used, for which avg_frame_low_motion is not
set/updated (=0).
Change-Id: I2408e62bd97592e892cefa0f183357c641aa5eea
this allows the file to be located in LIBVPX_TEST_DATA_PATH similar to
other test sources.
Bug: webm:1731
Change-Id: I51606635d91871e7c179aa8d20d4841b0d60b6ad
Two pass rc parameters are only initialized in the second pass
in vp9 normal two pass encoding.
However, the simple_encode API queries the keyframe group, arf group,
and number of coding frames without going through the two pass
route.
Since recent libvpx rc changes, parameters in the TWO_PASS
struct have a great influence on the determination of the above
information.
We therefore need to properly init two pass rc parameters in
the simple_encode related environment.
Change-Id: Ie14b86d6e7ebf171b638d2da24a7fdcf5a15c3d9
* changes:
Use 'ptrdiff_t' instead of 'int' for pointer offset parameters
Implement vpx_convolve8_avg_vert_neon using SDOT instruction
Merge transpose and permute in Neon SDOT vertical convolution
A number of the load/store functions in mem_neon.h use type 'int' for
the 'stride' pointer offset parameter. This causes Clang to generate
the following warning every time these functions are called with a
wider type passed in for 'stride':
warning: implicit conversion loses integer precision: 'ptrdiff_t'
(aka 'long') to 'int' [-Wshorten-64-to-32]
This patch changes all such instances of 'int' to 'ptrdiff_t'.
Bug: b/181236880
Change-Id: I2e86b005219e1fbb54f7cf2465e918b7c077f7ee
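As a small illustration of the 'stride' change above, the sketch below
takes the row stride as ptrdiff_t, so a caller that already holds a
ptrdiff_t does not narrow it to int at the call. The helper is hypothetical
and is not one of the functions in mem_neon.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the first byte of the given row. The
 * multiplication is carried out in ptrdiff_t, so negative strides and
 * strides wider than int both work. */
uint8_t first_byte_of_row(const uint8_t *buf, ptrdiff_t stride, int row) {
  assert(buf != NULL);
  return buf[row * stride];
}
```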
Add an alternative AArch64 implementation of
vpx_convolve8_avg_vert_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_vert_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: I971c626116155e1384bff4c76fd3420312c7a15b
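For readers unfamiliar with SDOT: each 32-bit lane accumulates the dot
product of four signed 8-bit pairs, so an 8-tap filter sum needs two such
operations per output sample. The scalar model below is only a sketch. It
assumes the samples are already in signed 8-bit range and that the taps fit
in 8 bits, and it leaves out rounding, shifting and the averaging step. The
function names and the tap values in the example are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit SDOT lane: acc += a[0..3] . b[0..3]. */
int32_t sdot_lane(int32_t acc, const int8_t a[4], const int8_t b[4]) {
  int i;
  for (i = 0; i < 4; ++i) acc += (int32_t)a[i] * b[i];
  return acc;
}

/* 8-tap filter sum for one output sample as two chained SDOT lanes. */
int32_t convolve8_sample(const int8_t samples[8], const int8_t taps[8]) {
  const int32_t sum = sdot_lane(0, samples, taps);
  return sdot_lane(sum, samples + 4, taps + 4);
}

/* Usage example: taps summing to 128 applied to a flat signal of 10. */
void convolve8_sample_example(void) {
  const int8_t taps[8] = { -1, 6, -19, 78, 78, -19, 6, -1 };
  const int8_t flat[8] = { 10, 10, 10, 10, 10, 10, 10, 10 };
  assert(convolve8_sample(flat, taps) == 1280);
}
```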
The original dot-product implementation of vpx_convolve8_vert_neon
used a separate transpose before and after the convolution operation.
This patch merges the first transpose with the TBL permute (necessary
before using SDOT to compute the convolution) to significantly reduce
the amount of data re-arrangement. This new approach also allows for
more effective data re-use between loop iterations.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b
Add an alternative AArch64 implementation of
vpx_convolve8_avg_horiz_neon for targets that implement the Armv8.4-A
SDOT (signed dot product) instruction.
The existing MLA-based implementation of vpx_convolve8_avg_horiz_neon
is retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Ib435107c47c485f325248da87ba5618d68b0c8ed
Implement sum of squared difference calculations in vpx_mse16x16_neon
and vpx_get4x4sse_cs_neon using the ABD and UDOT instructions -
instead of widening subtracts followed by a sequence of MLAs.
The existing implementation is retained for use on CPUs that do not
implement the Armv8.4-A UDOT instruction. This commit also updates
the variable names used in the existing implementations to be more
descriptive.
Bug: b/181236880
Change-Id: Id4ad8ea7c808af1ac9bb5f1b63327ab487e4b1c7
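The ABD plus UDOT approach described above can be modelled in scalar C:
take the absolute difference of each byte pair, then form the unsigned dot
product of that vector with itself to get the sum of squared differences.
This is a sketch of the idea for four pixels, with hypothetical function
names. It is not the Neon code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit UDOT lane: acc += a[0..3] . b[0..3]. */
uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4]) {
  int i;
  for (i = 0; i < 4; ++i) acc += (uint32_t)a[i] * b[i];
  return acc;
}

/* Sum of squared differences over four pixels. */
uint32_t sse4(const uint8_t src[4], const uint8_t ref[4]) {
  uint8_t abs_diff[4];
  int i;
  for (i = 0; i < 4; ++i) {
    /* ABD step: unsigned absolute difference, which always fits in 8 bits. */
    abs_diff[i] = (uint8_t)(src[i] > ref[i] ? src[i] - ref[i]
                                            : ref[i] - src[i]);
  }
  /* UDOT step: dot product of the differences with themselves. */
  return udot_lane(0, abs_diff, abs_diff);
}

/* Usage example: differences 2, 3, 0, 5 give 4 + 9 + 0 + 25 = 38. */
void sse4_example(void) {
  const uint8_t src[4] = { 10, 20, 30, 40 };
  const uint8_t ref[4] = { 12, 17, 30, 45 };
  assert(sse4(src, ref) == 38);
}
```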
Add an alternative AArch64 implementation of vpx_convolve8_vert_neon
for targets that implement the Armv8.4-A SDOT (signed dot product)
instruction.
The existing MLA-based implementation of vpx_convolve8_vert_neon is
retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Bug: b/181236880
Change-Id: Iebb8c77aba1d45b553b5112f3d87071fef3076f0
Accelerate Neon variance functions by implementing the sum of squares
calculation using the Armv8.4-A UDOT instruction instead of 4 MLAs.
The previous implementation is retained for use on CPUs that do not
implement the Armv8.4-A dot product instructions.
Bug: b/181236880
Change-Id: I9ab3d52634278b9b6f0011f39390a1195210bc75
Implementing sad16_neon using ABD, UDOT instead of ABAL, ABAL2 saves
a cycle and removes resource contention for a single SIMD pipe on
modern out-of-order Arm CPUs. The UDOT accumulation into 32-bit
elements also allows for a faster reduction at the end of each SAD
function.
The existing implementation is retained for CPUs that do not
implement the Armv8.4-A UDOT instruction, and CPUs executing in
AArch32 mode.
Bug: b/181236880
Change-Id: Ibd0da46e86751d2f808c7b1e424f82b046a1aa6f
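A scalar model of the SAD scheme described above, under the assumption
that the UDOT step multiplies the absolute differences by a vector of ones
so that each 32-bit lane accumulates four of them. Function names are
hypothetical; the final four-lane add stands in for the reduction at the
end of each SAD function.

```c
#include <assert.h>
#include <stdint.h>

/* SAD over one row of 16 pixels, modelled as four 32-bit lanes. */
uint32_t sad16_row(const uint8_t src[16], const uint8_t ref[16]) {
  uint32_t lanes[4] = { 0, 0, 0, 0 };
  int i;
  for (i = 0; i < 16; ++i) {
    /* ABD step: unsigned absolute difference of one byte pair. */
    const uint8_t abs_diff =
        (uint8_t)(src[i] > ref[i] ? src[i] - ref[i] : ref[i] - src[i]);
    /* UDOT step (assumed form): multiply by 1 and accumulate into the
     * 32-bit lane that owns this group of four bytes. */
    lanes[i / 4] += (uint32_t)abs_diff * 1u;
  }
  /* Reduction: only four 32-bit values remain to be added. */
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Usage example: sixteen differences of 3 sum to 48. */
void sad16_row_example(void) {
  uint8_t src[16], ref[16];
  int i;
  for (i = 0; i < 16; ++i) {
    src[i] = 5;
    ref[i] = 2;
  }
  assert(sad16_row(src, ref) == 48);
}
```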
Use the AArch64-only ADDV and ADDLV instructions to accelerate
reductions that add across a Neon vector in sum_neon.h. This commit
also refactors the inline functions to return a scalar instead of a
vector - allowing for optimization of the surrounding code at each
call site.
Bug: b/181236880
Change-Id: Ieed2a2dd3c74f8a52957bf404141ffc044bd5d79
quiets an integer sanitizer warning:
vpx/src/vpx_image.c:101:25: runtime error: implicit conversion from
type 'int' of value -2 (32-bit, signed) to type 'unsigned int' changed
the value to 4294967294 (32-bit, unsigned)
Change-Id: Ifeac31cc80811081c1ba10aadaa94dc36cd46efa
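The conversion that the sanitizer reports above is well defined in C: a
negative int converted to unsigned int wraps modulo 2^N. The commit message
does not say how the warning was quieted; the sketch below only shows the
conversion itself written as an explicit cast, which makes the intent
visible. The helper name is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: explicit int -> unsigned int conversion. */
unsigned int to_unsigned_explicit(int v) { return (unsigned int)v; }

/* Usage example: -2 becomes UINT_MAX - 1, i.e. 4294967294 when unsigned
 * int is 32 bits wide, matching the value in the sanitizer message. */
void to_unsigned_explicit_example(void) {
  assert(to_unsigned_explicit(-2) == UINT_MAX - 1u);
}
```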
Manually unrolling the inner loop is sufficient to stop the compiler
getting confused and emitting inefficient code.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Bug: b/181236880
Change-Id: I860768ce0e6c0e0b6286d3fc1b94f0eae95d0a1a
Implement AArch64-only paths for each of the Neon SAD reduction
functions, making use of a wider pairwise addition instruction only
available on AArch64.
This change removes the need for shuffling between high and low
halves of Neon vectors - resulting in a faster reduction that requires
fewer instructions.
Bug: b/181236880
Change-Id: I1c48580b4aec27222538eeab44e38ecc1f2009dc
Add an alternative AArch64 implementation of vpx_convolve8_horiz_neon
for targets that implement the Armv8.4-A SDOT (signed dot product)
instruction.
The existing MLA-based implementation of vpx_convolve8_horiz_neon is
retained and used on target CPUs that do not implement the SDOT
instruction (or CPUs executing in AArch32 mode). The availability of
the SDOT instruction is indicated by the feature macro
__ARM_FEATURE_DOTPROD.
Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>
Change-Id: I5337286b0f5f2775ad7cdbc0174785ae694363cc
this file uses GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST so it's
safe to enable unconditionally. the filter check fell out of sync with
the code, there's a sse2 and neon implementation for the filter.
Change-Id: I2a3336ccef3fb524ca5d9b8f88279240c9a276aa
Change clamp to an assert so we are warned if changes to input
ranges or defaults in the future lead to an invalid value.
Change-Id: Idb4e0729f477a519bfff3083cdce3891e2fc6faa
* changes:
vpx_convolve_neon: prefer != 0 to > 0 in tests
vpx_convolve_avg_neon: prefer != 0 to > 0 in tests
vpx_convolve_copy_neon: prefer != 0 to > 0 in tests
this produces better assembly code; the horizontal convolve is called
with an adjusted intermediate_height where it may over process some rows
so the checks in those functions remain.
Change-Id: Iebe9842f2a13a4960d9a5addde9489452f5ce33a
Imposed provisional upper and lower limits to each parameter
that can be adjusted in the Vizier ML experiment.
Also in some cases applied secondary limits on the
range of the final "used" values.
Defaults and limits may well require further tuning after
subsequent rounds of experimentation.
Re-factor get_sr_decay_rate().
Change-Id: I28e804ce3d3710f30cd51a203348e4ab23ef06c0
The overshoot_pct & undershoot_pct attributes for rate control
are expressed as a percentage of the target bitrate, so the range
should be 0-100.
Change-Id: I67af3c8be7ab814c711c2eaf30786f1e2fa4f5a3
Further changes to normalize the Vizier command line parameters.
The intent is that the default behavior for any given parameter
is signaled by the value 1.0 (expressed on the command line as a
rational).
The final values used in the two pass code are obtained by multiplying
the passed in factors by the default values if use_vizier_rc_params is 1.
Where use_vizier_rc_params is 0 the values are explicitly set to
the defaults.
This patch also changes the default value of each parameter to 1.0
even if not set explicitly. This should ensure safe/default behavior
if the user sets use_vizier_rc_params to 1 but does not set all the
individual parameters.
Change-Id: Ied08b3c22df18f42f446a4cc9363473cad097f69
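The normalization described above can be sketched as follows: each
parameter arrives as a rational factor, a factor of 1.0 reproduces the
default, and the factor is ignored entirely when use_vizier_rc_params is 0.
The struct and function names are hypothetical and the default value used
in the example is made up.

```c
#include <assert.h>

/* Hypothetical rational type: a factor expressed as num / den. */
typedef struct {
  int num;
  int den;
} rational_factor;

/* Returns the value the two pass code would use for one parameter. */
double scaled_param(int use_vizier_rc_params, rational_factor factor,
                    double default_value) {
  if (!use_vizier_rc_params) return default_value; /* explicit default */
  assert(factor.den != 0);
  return default_value * ((double)factor.num / factor.den);
}
```

With a default of 2.0, a factor of 1/1 yields 2.0 and a factor of 3/2
yields 3.0. With use_vizier_rc_params set to 0 the result is 2.0 regardless
of the factor.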
Add command line options for three rd parameters.
They are controlled by --use_vizier_rc_params, together with
other rc parameters.
If not set from command line, current default values will be used.
Change-Id: Ie1b9a98a50326551cc1d5940c4b637cb01a61aa0
If --use-vizier-rc-params=1 is passed, the rc parameters are overwritten
by the passed in values. If --use-vizier-rc-params=0, the rc parameters
remain at their default values.
Change-Id: I7a3e806e0918f49e8970997379a6e99af6bb7cac
this avoids uninitialized values and potential misuse of them which
could lead to a crash should the function fail
this is the same fix that was applied in libaom:
d0cac70b5 Fix a free on invalid ptr when img allocation fails
Bug: webm:1722
Change-Id: If7a8d08c4b010f12e2e1d848613c0fa7328f1f9c
Recently, some function signatures have been changed.
This change fixes compilation error if --enable-rate-ctrl is used.
Change-Id: Ib8e9cb5e181ba1d4a6969883e377f3dd93e9289a
A recent change leads to slight difference of encoding results:
d3aaac367 Change calculation of rd multiplier,
which is caught by Jenkins nightly test.
Adjust the threshold to silence the test failure.
BUG=webm:1725
Change-Id: I7e8b3a26b72c831ae4d88d0fca681b354314739d
Changes the exposed zm_factor parameter.
This patch alters the meaning of the zm_factor
parameter that will be exposed for the Vizier project.
The previous power factor was hard to interpret in terms
of its meaning and effect and has been replaced by a linear factor.
Given that the initial Vizier results suggested a lower zero motion
effect for all formats, the default impact has been reduced.
The patch as it stands gives a modest improvement for PSNR
but is slightly down on some sets for SSIM
(overall psnr, ssim % bdrate change: -ve is better)
lowres -0.111, 0.001
ugc360p -0.282, -0.068
midres2 -0.183, 0.059
hdres2 -0.042, 0.172
Change-Id: Id6566433ceed8470d5fad1f30282daed56de385d
This is similar to the change:
https://chromium-review.googlesource.com/c/webm/libvpx/+/2771081
Which fails libvpx nightly test.
Here we add range check to get rid of the warning of
"divided by zero".
BUG=webm:1723
Change-Id: I7712efe7abd4b11cdb725643d51fd1c0a300d924
Change the way the rd multiplier is adjusted for Q and frame type.
Previously in VP9 the rd multiplier was adjusted based on crude Q bins
and whether the frame was a key frame or inter frame.
The Q bins create some problems as they potentially introduce
discontinuities in the RD curve. For example, rate rising with a
stepwise increase in Q instead of falling. As such, in AV1 they
have been removed.
A further issue was identified when examining the first round of
results from the Vizier project. Here the multiplier for each Q bin
and each frame type was optimized for a training set, for various video
formats, using single point encodes at the appropriate YT rates.
These initial results appeared to show a trend for increased rd
multiplier at higher Q for key frames. This fits with intuition as in
this encoding context a higher Q indicates that a clip is harder to
encode and frames less well predicted. However, the situation
appeared to reverse for inter frames with higher rd multipliers
chosen at low Q.
My initial suspicion was that this was a result of over fitting, but on
closer analysis I realized that this may be more related to frame type
within the broader inter frame classification. Specifically frames coded
at low Q are predominantly ARF frames, for the mid Q bin there will
likely be a mix of ARF and normal inter frames, and for the high Q bin
the frames will almost exclusively be normal inter frames from difficult
content.
ARF frames are inherently less well predicted than other inter frames
being further apart and not having access to as many prediction modes.
We also know from previous work that ARF frames have a higher
incidence of INTRA coding and may well behave more like key frames
in this context.
This patch replaces the bin based approach with a linear function
that applies a small but smooth Q based adjustment. It also splits
ARF frames and normal inter frames into separate categories.
With this done, the number of parameters that will be exposed for the
next round of Vizier training is reduced from 7 to 3 (one adjustment
factor each for inter, ARF and key frames).
This patch gives net BDRATE gains for our test sets even with the
baseline / default factors as follows: (% BDRATE change in overall
PSNR and SSIM, -ve is better)
LowRes -0.231, -0.050
ugc360p 0.160, -0.315
midres2 -0.348, -1.170
hdres2 -0.407, -0.691
Change-Id: I46dd2fea77b1c2849c122f10fd0df74bbd3fcc7f
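To make the discontinuity argument above concrete, the sketch below
compares a bin based multiplier with a linear one. Every constant here
(bin edges, bin values, base, slope, and the direction of the slope) is
invented for illustration and is not a value used by the encoder. The
function names are hypothetical as well.

```c
#include <assert.h>

/* Illustrative only: a multiplier that changes in steps at Q thresholds. */
double rd_factor_binned(int q) {
  assert(q >= 0 && q <= 255);
  if (q < 64) return 4.0;
  if (q < 128) return 4.5;
  return 5.0;
}

/* Illustrative only: a small, smooth linear adjustment across the same
 * Q range, with no jump between neighbouring Q values. */
double rd_factor_linear(int q) {
  assert(q >= 0 && q <= 255);
  return 4.0 + q * (1.0 / 255.0);
}
```

Between q = 63 and q = 64 the binned version jumps by 0.5, while the linear
version moves by less than 0.01.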
These rate control parameters are for the Vizier experiment.
They are defined as rational numbers.
Change-Id: I23f382dd49158db463b75b5ad8a82d8e0d536308
this avoids a warning about differences in size between void* and
unsigned int under msvc:
vp9_ext_ratectrl_test.cc(40,3): warning C4312: 'reinterpret_cast':
conversion from 'const unsigned int' to 'void *' of greater size
Change-Id: I5a412ec785ddcaeff2ec71bb83a6048505400293
Release v1.10.0 Ruddy Duck
2021-03-09 v1.10.0 "Ruddy Duck"
This maintenance release adds support for darwin20 and new codec controls, as
well as numerous bug fixes.
- Upgrading:
New codec control is added to disable loopfilter for VP9.
New encoder control is added to disable feature to increase Q on overshoot
detection for CBR.
Configure support for darwin20 is added.
New codec control is added for VP9 rate control. The control ID of this
interface is VP9E_SET_EXTERNAL_RATE_CONTROL. To make VP9 use a customized
external rate control model, users will have to implement each callback
function in vpx_rc_funcs_t and register them using libvpx API
vpx_codec_control_() with the control ID.
- Enhancement:
Use -std=gnu++11 instead of -std=c++11 for c++ files.
- Bug fixes:
Override assembler with --as option of configure for MSVS.
Fix several compilation issues with gcc 4.8.5.
Fix to resetting rate control for temporal layers.
Fix to the rate control stats of SVC example encoder when number of spatial
layers is 1.
Fix to reusing motion vectors from the base spatial layer in SVC.
2 pass related flags removed from SVC example encoder.
Bug: webm:1712
Change-Id: I4d807da7aee5a4d9d7a7af66b927983622e9cefa
This patch converts the Vizier custom RD multipliers, to factors
that adjust each RD multiplier either side of its default value, where
a factor of 1.0 will give the previous default behavior.
Ultimately I would like to replace the multiple RD multipliers
triggered at different Q thresholds (eg, low, medium, high q)
with a function that adjusts the rd behavior smoothly as Q
changes.
Vizier could then be presented with a single adjustment control
for each of key frame and inter frame rd.
The current behavior is problematic.
Firstly having hard threshold Q values at which rd behavior changes
may cause anomalies in the rate distortion curve, where in some
situations, raising Q, for example, may not cause the expected drop
in rate and rise in distortion, because we have crossed a threshold
where the rate distortion multiplier changes sharply and this alters
the balance of bits spent in the prediction and residual parts of the
signal.
Having a single value that is used for a range of Q index values
(eg 0-64), (65-128) may also cause problems and over-fitting in
the context of the Vizier ML project. This project tries to optimize
the values for each Q range, for various YT formats, but does so
by analyzing the results of single point encodes on a set of clips.
For a given format all the clips are encoded with the same parameters
(target rate etc) so there is likely to be clustering in regards to the
Q values used. For example the training set may give a new value
for the Q range 0-64 but most of the data points used may have Q
close to 64.
It will likely require several iterations working with the Vizier team
to get this right. This patch just gives an initial framework for
testing.
Change-Id: Iaa4cd5561b95a202bcae7a1d876c4f40ef444fa2
This patch changes the way prediction decay is calculated.
We expect that frames that are further from an ALT-REF frame (or Golden
Frame) will be less well predicted by that ALT-REF frame. As such it is
desirable that they should contribute less to the boost calculation used
to assign bits to the ALT_REF.
This code looks at the reduction in prediction quality between the last
frame and the second reference frame (usually two frames old). We make
the assumption that we can accumulate this to get a proxy for the likely
loss of prediction quality over multiple frames.
Previously the calculation looked at the absolute difference in the
coded errors. The issue here is that the meaning of a unit difference
is not the same for very complex frames as it is for easy frames.
In this patch we scale the decay value based on how the error difference
compares to the overall frame complexity as represented by the intra
coding error.
This was tuned experimentally to give test results that
were approximately neutral for our various test sets. There was
a slight drop in Overall PSNR but a consistent improvement in
SSIM. This balance may be improved with further tuning as it is
noteworthy that it was much better on the hd_res set.
Results (Overall PSNR, SSIM -ve better) for low_res, ugc360, midres2,
ugc480P and hd_res are as follows:
0.173 -0.688
0.118 -0.153
0.132 -0.239
0.261 -0.405
-0.305 -1.109
As part of this adjustment the contribution of motion amplitude was
removed.
This patch also changes the control mechanism that will be exposed
on the command line for use by the Vizier project. The control is now
a linear factor which defaults to 1.0, where values < 1.0 mean a lower
decay rate and values > 1.0 mean an increased decay rate.
This presents a more easily understandable interface for use in
optimizing the decay behavior for various formats, where it is clear
what a passed in value means relative to the default.
With the new decay mechanism the current values for various formats
are almost certainly wrong and we still need to define sensible upper
and lower bounds for use during future training.
Change-Id: Ib1074bbea97c725cdbf25772ee8ed66831461ce3
Added the experimental max per frame KF boost values derived from
the Vizier experiments.
These are still all off by default.
When enabled I expect these to cause significant regression as they
fluctuate wildly and in a way that makes no sense from format to format.
I suspect these values reflect over fitting perhaps from a subset of
training clips with more frequent mid chunk key frames and or short key
frame groups.
Also fixed incorrect value for gf boost for one format.
Experiment to moderate these values and use different values for first
and subsequent KF groups to follow.
Change-Id: Ibeb4268957f2edacdb4549d74930255a22a2fcc5
Added kf_frame_min_boost field to hold the minimum per frame
boost in key frame boost calculations. Replaces hard wired value.
To be used in conjunction with and tied to the maximum value.
Change-Id: I67a39ecb3f21b5918512a5ccd9a1b214d7971e45
Previous code did not have sensible defaults for larger image formats.
Added defaults for Vizier RD parameters for sizes > 1080P and changed
the first pass parameters for large formats to use the 1080P values.
No supplied value for rd_mult_q_sq_key_high_qp case yet so set to
old hard wired default value.
If the Vizier parameters were enabled the lack of sensible defaults
caused a large regression for 2K clips in one of our test sets.
Change-Id: I306c0cd76eab00d50880c91fadb5842faf6661ff
Further integration of Vizier adjustable parameters,
This patch connects up additional configurable two pass rate control
parameters for the Vizier project. This still needs to be connected up
to a command line interface and at the moment should still be using
default values that match previous behavior.
Do not submit until verified that defaults are all working correctly.
Change-Id: If1241c2dba6759395e6efa349c4659a0c345361d
_WIN32 is predefined for the Windows platform in MSVC, whereas WIN32 is not, and WIN32 is also not defined in the makefiles.
Change-Id: I8b58e42d891608dbe1e1313dc9629c2be588d9ec
This patch adds fields into the RC data structure for the Vizier.
The added fields allow control of some extra rate control parameters
and rate distortion.
This patch also adds functions to initialize the various parameters
though many are not yet used / wired in and for now all are set to
default values. Ultimately many will be set through new command
line options.
Change-Id: I41591bb627d3837d2104fb363845adedbddf2e02
The flag update_pattern_ was being set to 0
(because it was set before reset) instead of 1.
And the example flexible mode pattern was not setting
non-reference frame on top temporal top spatial.
Change-Id: I8aee56ce13cc4e0d614126592f9d0f691fe527b0
And allow the frame to recode when the frame size is larger
than the input max frame size.
If the max frame size is not specified, let vp9 decide whether
to recode. The recode follows the vp9's current recoding mechanism.
The rate control api will return the new qindex back to the
external model.
Change-Id: I796fbf713ad50a5b413b0e2501583b565ed2343f
Previous parser assumed that the header would not exceed
80 characters. However, with latest FFMPEG changes, the header
of Y4M files can exceed this limit.
New parser can parse an arbitrarily long header, as long as each
tag is 255 characters or fewer.
BUG=aomedia:2876
Change-Id: I9e6e42c50f4e49251dd697eef8036485ad5a1228
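A minimal sketch of tag-at-a-time header parsing as described above: the
header line can be any length because only one tag is buffered at a time,
and the only limit is the 255-character cap per tag. It assumes the header
is already in memory as a string with space-separated tags ending at a
newline. The real parser reads from a file, and the function and callback
names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAG_LEN 255 /* per-tag limit stated in the commit message */

/* Hypothetical callback type, invoked once per tag. */
typedef void (*tag_fn)(const char *tag, void *ctx);

/* Walks one header line and returns the number of tags found, or -1 if a
 * tag is longer than MAX_TAG_LEN. */
int parse_header_tags(const char *header, tag_fn on_tag, void *ctx) {
  char tag[MAX_TAG_LEN + 1];
  const char *p = header;
  int count = 0;
  assert(header != NULL);
  while (*p != '\0' && *p != '\n') {
    size_t len = 0;
    while (*p == ' ') ++p; /* skip separators */
    while (p[len] != '\0' && p[len] != '\n' && p[len] != ' ') ++len;
    if (len == 0) break; /* only trailing spaces were left */
    if (len > MAX_TAG_LEN) return -1;
    memcpy(tag, p, len);
    tag[len] = '\0';
    if (on_tag != NULL) on_tag(tag, ctx);
    ++count;
    p += len;
  }
  return count;
}
```

For the line "YUV4MPEG2 W352 H288 F30:1 C420jpeg" the function reports five
tags.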
Previous parser assumed that the header would not exceed
80 characters. However, with latest FFMPEG changes, the header
of Y4M files can exceed this limit.
New parser can parse up to ~200 characters. Arbitrary parsing in
future commit.
BUG=aomedia:2876
Change-Id: I2ab8a7930cb5b76004e6731321d0ea20ddf333c1
use Values() rather than ValuesIn() with an initializer list as this
version of gcc under CentOS fails to deduce the type:
../third_party/googletest/src/include/gtest/gtest-param-test.h:304:29:
note: template argument deduction/substitution failed:
../test/vp9_end_to_end_test.cc:346:59: note: couldn't deduce template
parameter ‘T’
::testing::ValuesIn({ 6, 7, 8 }));
Bug: webm:1690
Change-Id: I43d9d4777fcd74a4f8fa8bdcd9834cdca5e546ff
use a #define for kDataAlignment as it's used with DECLARE_ALIGNED
(__attribute__((aligned(n)))) and this version under CentOS is more
strict over integer constants:
../vpx_ports/mem.h:18:72: error: requested alignment is not an integer constant
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
Bug: webm:1690
Change-Id: I8d4661ec1c2c1b1522bdc210689715d2302c7e72
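The alignment issue above can be reproduced in a few lines. The sketch
below is written in C and assumes GCC or Clang, since
__attribute__((aligned(n))) is a compiler extension. The macro body is
copied from the error message; DATA_ALIGNMENT and aligned_buffer are
hypothetical names standing in for kDataAlignment and the test's data.

```c
#include <assert.h>
#include <stdint.h>

/* A preprocessor constant is always an integer constant expression, which
 * is what the aligned attribute requires. */
#define DATA_ALIGNMENT 16
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))

DECLARE_ALIGNED(DATA_ALIGNMENT, uint8_t, aligned_buffer[64]);

/* Usage example: the buffer address is a multiple of the alignment. */
void aligned_buffer_example(void) {
  assert(((uintptr_t)aligned_buffer % DATA_ALIGNMENT) == 0);
}
```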
* changes:
Add return to vp9_extrc_update_encodeframe_result
Add status in vp9_extrc_get_encodeframe_decision
Return status in vp9_extrc_send_firstpass_stats
Return status in vp9_extrc_create/init/delete
Seen with arm-linux-gnueabihf-gcc-8 (8.3.0 & 8.4.0)
Without reworking the code or adding an additional branch this warning
cannot be silenced otherwise. The loopfilter is only called when needed
for a block so these output pixels will be set.
BUG=b/176822719
Change-Id: I9cf6e59bd5de901e168867ccbe021d28d0c04933
Some refactoring and cleanup -- do not count the first 9 bytes against
the header limit. Add a unit test.
BUG=aomedia:2876
Change-Id: Id897d565e2917b48460cc77cd082cec4c98b42cb
The error occurs with low resolution when LibvpxVp8Encoder::NumberOfThreads returns 1.
Bug: b:175283098
Change-Id: Icc9387c75f4ac6e4f09f102b3143e83c998c5e38
For SVC: add parameter to the control SET_SVC_PARAMS to
allow for disabling the loopfilter per spatial layer.
Note this svc setting will override the setting via
VP9E_SET_DISABLE_LOOPFILTER (which should only be used
for non-SVC).
Add unittest to handle both SVC (spatial or temporal layers)
and non-SVC (single layer) case.
Change-Id: I4092f01668bae42aac724a6df5b6f6a604337448
Tpl stats is computed at the beginning of encoding the altref
frame. We aggregate tpl stats of all blocks for every frame of
the current group of picture.
After the altref frame is encoded, the tpl stats is passed through
the encode frame result to external environment.
Change-Id: I2284f8cf9c45d35ba02f3ea45f0187edbbf48294
Add a comment to vp9_args to point out that bitdeptharg and
inbitdeptharg do not have a corresponding entry in vp9_arg_ctrl_map and
must be listed at the end of vp9_args.
Change-Id: Ic9834ab72599c067156ca5a315824c7f0760824a
Fix three bugs along the way.
1) Call vp9_extrc_send_firstpass_stats() after vp9_extrc_create()
2) Pass in model pointer in vp9_extrc_create()
3) Free frame_stats buffer in vp9_extrc_delete()
Bug: webm:1707
Change-Id: Ic8bd62c7b4ebd85a7479ae5e4c82d7f6059d782f
VP9E_SET_EXTERNAL_RATE_CONTROL
One can assign an external library using the control flag,
VP9E_SET_EXTERNAL_RATE_CONTROL.
The args alongside the control flag should be of type char**.
args[0]: char* points to the path of rate control library
args[1]: char* points to the config of the rate control library.
Change-Id: Iae47362cdfafa00614bac427884bffcf6944c583
after:
979e27c97 configure: add darwin20 support
make the condition more specific by including the trailing -gcc (-*)
Change-Id: I78f481b6c5ad9137e6b6973198e8671e806ee82c
this release will have arm64 and x86_64 support. in the future it might
be useful to move to mac/iphone targets to help disambiguate
arm64-darwin-gcc and arm64-darwin20-gcc.
Change-Id: I1f8b145303204af316955822f5e8bab51c47f353
libvpx does sched_yield() on Linux. This is highly frowned upon these
days mainly because it is not needed and causes high scheduler overhead.
It is not needed because the kernel will preempt the task while it is
spinning which will imply a yield. On ChromeOS, not yielding has the
following improvements:
1. power_VideoCall test as seen on perf profile:
With yield:
9.40% [kernel] [k] __pi___clean_dcache_area_poc
7.32% [kernel] [k] _raw_spin_unlock_irq <-- kernel scheduler
Without yield:
8.76% [kernel] [k] __pi___clean_dcache_area_poc
2.27% [kernel] [k] _raw_spin_unlock_irq <-- kernel scheduler
As you can see, there is a 5% drop in the scheduler's CPU utilization.
2. power_VideoCall test results:
There is a 3% improvement on max video FPS, from 30 to 31. This
improvement is consistent.
Also note that the sched_yield() manpage itself says it is intended
only for RT tasks. From the manpage: "sched_yield() is intended for use
with real-time scheduling policies (i.e., SCHED_FIFO or SCHED_RR)
and very likely means your application design is broken."
BUG=b/168205004
Change-Id: Idb84ab19e94f6d0c7f9e544e7a407c946d5ced5c
Signed-off-by: Joel Fernandes <joelaf@google.com>
Similar to the change in
https://aomedia-review.googlesource.com/c/aom/+/115162.
This currently is a warning, but the tree should be clean now in the
default x86-64 configuration so we can use it to prevent regressions and
find any remaining issues in other configurations.
BUG=b/159031844
Change-Id: I097537ff018668492d37164fdba5edd241dc5dbe
Add encoder control to disable feature to increase Q
on overshoot detection, for CBR. Default (no usage
of the control) means the feature is internally enabled.
Add the control to the sample encoders, but keep it
disabled as default (set to 0, so feature is on).
Change-Id: Ia2237bc4aaea9770e5080dab20bfff9e3fd09199
Number signs are handled differently in Makefile variable parsing as
compared to bash variable parsing. See this demo:
```
$ cat Makefile
A=foo#bar
B='foo#bar'
C="foo#bar"
D=foo\#bar
E='foo\#bar'
F="foo\#bar"
$(info $(A))
$(info $(B))
$(info $(C))
$(info $(D))
$(info $(E))
$(info $(F))
$ make
foo
'foo
"foo
foo#bar
'foo#bar'
"foo#bar"
make: *** No targets. Stop.
$ make -v
GNU Make 4.2.1
```
In other words, the `#` character is evaluated first when parsing
Makefiles, causing the rest of the line to become a comment. The effect of
this is that paths that contain embedded `#` symbols are not handled
properly in the vpx build system.
To test this change, clone vpx to a directory containing a `#` symbol and
attempt a build. With this change, it worked for me on Fedora 31, however
without the change the build failed.
Change-Id: Iaee6383e2435049b680484cc5cefdea9f2d9df46
Fix to reset RC for temporal layers: the
first_spatial_layer_to_encode is usually/default 0,
so the logic to reset for temporal layers was not
being executed. Use VPXMAX(1, ) to make sure all
temporal layers will be reset (when max-q is used
for overshoot).
Change-Id: Iec669870c865420d01d52eab9425cd6c7714eddc
1. Add a compile check to probe the native ability of the
toolchain to decide whether a feature can be enabled.
2. Add a runtime check to probe CPU supported features.
MSA will be preferred if MSA and MMI are both supported.
3. You can configure and build with the following command:
./configure --cpu=loongson3a && make -j4
Change-Id: I057553216dbc79cfaba9c691d5f4cdab144e1123
1) Use kRefFrameTypeNone in the unit test
2) Reset mv_info in fp_motion_vector_info_init
3) Call fp_motion_vector_info_init() in first_pass_encode()
4) Set mv_info for intra frame.
5) Set mv_info with zero mv as default for inter frame
6) Remove duplicated fp_motion_vector_info in encode_frame_info
Change-Id: I2f7db5cd4cf1f19db039c9ce638d17b832f45b6e
This reduces the average recode times per frame from 2.81 to 2.73
when targeting 15% error for target bitrate per frame.
Change-Id: I58f0be86443643ba23623cb1d522ae41897734a3
This will reduce the avg recode times per frame from
3.19 to 2.81 when targeting 15% error margin for
target bitrate per frame.
Change-Id: I28c9ec09a1b1318c09fe5229ccb7e51b32b9dfb9
Store motion vectors for each 16x16 block found in the first pass
motion search.
Provide an api "ObserveFirstPassMotionVector()" in SimpleEncode
class, similar to "ObserveFirstPassStats()".
Change-Id: Ia86386b7e4aa549f7000e7965c287380bf52e62c
This fixes a lossless encoding bug as reported in the issue tracker.
Coding performance change is neutral.
BUG=webm:1700
Change-Id: I0f034b16b57e917e722709a7e9addef864b83d27
Make sure to initialize the layer context for spatial-svc
which has a single temporal layer.
Change-Id: I026ecec483555658e09d6d8893e56ab62ee6914b
(cherry picked from commit 1e9929390c)
For svc with dynamic resize (only for single_layer_svc mode),
add flag to indicate resized width/height has already been set,
otherwise on the resized/trigger frame (resize_pending=1), the
wrong resolution may be set if oxcf->width/height is different
than layer width/height in single_layer_svc mode.
Change-Id: I24403ee93fc96b830a9bf7c66d763a48762cdcb4
(cherry picked from commit de4aedaec3)
The reset happens on the base spatial layer, before
encoding. But it should be reset on the
first_spatial_layer_to_encode, which may not be 0.
Change-Id: I38ef686b4459ca7433062adbfe32ef2134e1ad60
(cherry picked from commit 769129fb29)
This catches the assert/crash fixed in 5174eb5.
Also fix to only check for dynamic resize in SVC mode
for base temporal layer.
Change-Id: Ie6eb7d233cc43eafb1b78cec4aeb94fb4d7fe11a
(cherry picked from commit 3101666d2a)
Fix the logic to allow denoiser reset on resize for SVC mode,
as dynamic resize is allowed for SVC under single_layer mode.
Change-Id: I7776c68dadff2ccbce9b0b4a7f0d12624c2ccf90
(cherry picked from commit 5174eb5b92)
this matches libaom and provides
GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST
BUG=webm:1695
BUG=b/159031848
Change-Id: Icdaf61481ab2012dd0e517dd1e600045c937c0dd
Fix two bugs reported by clang when enabling msa optimizations:
1. clang does not support the uld instruction.
2. the ulw instruction results in a coredump in the unit test cases.
Change-Id: I171bed11d18b58252cbc8853428c039e2549cb95
similar to the TEST_CASE -> TEST_SUITE changes in:
83769e3d2 update googletest to v1.10.0
BUG=webm:1695
Change-Id: Ib2bdb6bc0e4ed02d61523f8a8315b017b8ad6dad
(cherry picked from commit 6ee3f3649f)
similar to the TEST_CASE -> TEST_SUITE changes in:
83769e3d2 update googletest to v1.10.0
BUG=webm:1695
Change-Id: Ib2bdb6bc0e4ed02d61523f8a8315b017b8ad6dad
1. Adjust variable type to match clang compiler.
Clang is more strict on the type of asm operands, float or double
type variable should use constraint 'f', integer variable should
use constraint 'r'.
2. Fix problem of using r-value in output operands.
clang reports error: 'invalid use of a cast in a inline asm context
requiring an l-value: remove the cast or build with -fheinous-gnu-extensions'.
Change-Id: Iae9e08f55f249059066c391534013e320812463e
last_q is used in resize logic, should
always be last Q selected for previous
frame, encoded or dropped.
Change-Id: Ie9019ccf5a9e3acc8456a2e70cc2aa8d1c90236e
For temporal layers resize is only checked
on the base/TL0 frames. So rc->last_q should be used,
which because rc is in the layer context, rc->last_q
will correspond to the qindex on last TL0 frame.
In the previous code cm->base_qindex was used, which
would correspond to qindex on last encoded frame, which
is not TL0 when temporal_layers > 1.
Change-Id: Iaf86f7156d2d48ae99a1b34ad576d453d490e746
serves as a brief introduction and adds a link to the gerrit
instructions on webmproject.org.
Bug: webm:1669
Change-Id: If1d483eb48e2edcda8c51e66bdd1a86b7c35b986
(cherry picked from commit 220b00dd0d)
serves as a brief introduction and adds a link to the gerrit
instructions on webmproject.org.
Bug: webm:1669
Change-Id: If1d483eb48e2edcda8c51e66bdd1a86b7c35b986
1.'xor,or,and' to 'pxor,por,pand'. In the case of operating FPR,
gcc supports both of them, clang only supports the second type.
2.'dsrl,srl' to 'ssrld,ssrlw'. In the case of operating FPR, gcc
supports both of them, clang only supports the second type.
Change-Id: I93b47348e7c6580d99f57dc11165b4645236533c
For svc with dynamic resize (only for single_layer_svc mode),
add flag to indicate resized width/height has already been set,
otherwise on the resized/trigger frame (resize_pending=1), the
wrong resolution may be set if oxcf->width/height is different
than layer width/height in single_layer_svc mode.
Change-Id: I24403ee93fc96b830a9bf7c66d763a48762cdcb4
This is needed to allow for newmv search in nonrd_pickmode
for resize/scaled frame, and for int_pro_motion_estimation
on resized/scaled frame.
Change-Id: I5e2fdbc4706a10813c1b00f6194e2442f648905a
this moves the framework to c++11 and changes *_TEST_CASE* to
_TEST_SUITE
BUG=webm:1695,webm:1686
Change-Id: I07f2c20850312a9c7e381b38353d2f9f45889cb1
(cherry picked from commit 83769e3d25)
ASSERT's in the function only force a return, not termination. this
fixes a static analyzer issue with using a null decoder object in
following calls.
BUG=webm:1695,webm:1686
Change-Id: I79762df8076d029c5c8fef4d5e06ed655719de62
(cherry picked from commit 0370a43816)
The reset happens on the base spatial layer, before
encoding. But it should be reset on the
first_spatial_layer_to_encode, which may not be 0.
Change-Id: I38ef686b4459ca7433062adbfe32ef2134e1ad60
ASSERT's in the function only force a return, not termination. this
fixes a static analyzer issue with using a null decoder object in
following calls.
BUG=webm:1695
Change-Id: I79762df8076d029c5c8fef4d5e06ed655719de62
Reduce the time before sampling begins (after key)
and reduce averaging window, to make resize act
faster.
Reset RC parameters for temporal layers on resize.
Add per-frame-bandwidth thresholds to force
downsize for extreme case, for HD input.
Change-Id: I8e08580b2216a2e6981502552025370703cd206c
This catches the assert/crash fixed in 5174eb5.
Also fix to only check for dynamic resize in SVC mode
for base temporal layer.
Change-Id: Ie6eb7d233cc43eafb1b78cec4aeb94fb4d7fe11a
Fix the logic to allow denoiser reset on resize for SVC mode,
as dynamic resize is allowed for SVC under single_layer mode.
Change-Id: I7776c68dadff2ccbce9b0b4a7f0d12624c2ccf90
1) Avoid using global variables.
2) Add comments to EncodeConsistencyTest.
3) Check frame_type and show_idx in EncodeConsistencyTest.
Change-Id: I2261a0bd65189beb70432d62c077ef618a2712ab
Let SetExternalGroupOfPicturesMap() modify the gop_map_ to satisfy
the following constraints.
1) Each key frame position should be at the start of a gop.
2) The last gop should not use an alt ref.
Add unit test for SetExternalGroupOfPicturesMap()
Change-Id: Iee9bd238ad0fc5c2ccbf2fbd065a280c854cd718
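The two constraints on the gop map can be sketched in C as below. This is only an illustration under stated assumptions, not the SimpleEncode code: the flag values, the is_key input array, and the choice to carry the alt-ref flag on the entry that starts a gop are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names follow the gop_map description; the values are assumptions. */
enum { kGopMapFlagStart = 1 << 0, kGopMapFlagUseAltRef = 1 << 1 };

/* Adjust gop_map in place so that
 * 1) every key frame position starts a gop, and
 * 2) the last gop does not use an alt ref.
 * is_key[i] is nonzero when frame i is a key frame (hypothetical input). */
static void fix_gop_map(int *gop_map, const int *is_key, size_t n) {
  size_t i;
  size_t last_start = n; /* n means "no gop start seen yet" */
  assert(gop_map != NULL && is_key != NULL);
  for (i = 0; i < n; ++i) {
    if (is_key[i]) gop_map[i] |= kGopMapFlagStart;
    if (gop_map[i] & kGopMapFlagStart) last_start = i;
  }
  if (last_start < n) gop_map[last_start] &= ~kGopMapFlagUseAltRef;
}
```

Passing the key frame positions separately keeps the sketch independent of how the encoder tracks them.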
Rename external_arf_indexes to gop_map
Use kGopMapFlagStart to indicate the start of a gop in the gop_map.
Use kGopMapFlagUseAltRef to indicate whether to use altref in the
gop_map.
Change-Id: I743e3199a24b9ae1abd5acd290da1a1f8660e6ac
Send GOP_COMMAND to vp9 for setting gop decisions on the fly.
GOP_COMMAND has three members.
use: use this command to set gop or use vp9's gop decision.
show_frame_count: number of show frames in this gop.
use_alt_ref: use alt ref frame or not.
Move the logic of processing external_arf_indexes_ from
get_gop_coding_frame_num() to GetGopCommand() and
GetCodingFrameNumFromGopMap().
Change-Id: Ic1942c7a4cf6eecdf3507864577688350c7ef0cf
For flexible svc in simulcast mode: don't allow refresh
of all reference slots on key frame. Which slots to update
should be based on the user flags.
Change-Id: I3597c61ebcdfed2055bbdffec7ce701fad892744
In the vp8_cost_branch function a couple of unsigned int are being
multiplied by integer coefficients and added to later be divided by
256. While the end result most likely fits an unsigned int, the
intermediary result of multiplying and adding sometimes doesn't (I was
able to reproduce it by leaving the encoder running at 60 fps for a
while). To avoid the multiplication overflow (which is undefined
behavior and causes a wrong result anyways) the calculation is
performed using unsigned long long instead and cast to unsigned int
for return.
Bug: b/154172422
Test: run cuttlefish with webrtc enabled for an hour
Change-Id: If7ebbda38b2450a59ed3c99ffbb59dc62431a324
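The widening described for vp8_cost_branch can be sketched as follows. This is a minimal illustration rather than the libvpx source: cost_branch_sketch() and its cost_zero/cost_one parameters are hypothetical stand-ins for the per-branch cost lookups.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in: ct[0] and ct[1] are branch counts, cost_zero and
 * cost_one are the integer coefficients they are multiplied by. */
static unsigned int cost_branch_sketch(const unsigned int ct[2],
                                       unsigned int cost_zero,
                                       unsigned int cost_one) {
  /* Widen before multiplying so the intermediate sum cannot wrap at 32 bits. */
  const unsigned long long total = (unsigned long long)ct[0] * cost_zero +
                                   (unsigned long long)ct[1] * cost_one;
  const unsigned long long scaled = total / 256;
  assert(scaled <= UINT_MAX); /* the final result is expected to fit */
  return (unsigned int)scaled;
}
```

With counts of 20000000 each and coefficients of 255, each product already exceeds 32 bits, while the result after the division fits comfortably.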
When the encoder is run continuously for a few minutes at 60 fps, the
total_target_vs_actual field overflows. Since this field is a signed
integer that's considered undefined behavior in C++, which causes an
abort when used in an android binary (those run with ubsan enabled)
Bug: b/154172422
Test: run cuttlefish with webrtc enabled for an hour
Change-Id: I8f7d9d0884311a6338bdcdec76348b8cc3ce8c69
rather than die_codec(). calling any api functions with an uninitialized
codec context is undefined. this avoids a crash in a call to
vpx_codec_error_detail().
BUG=webm:1688
Change-Id: I4a4feeabc1cafa44c8d2f24587fad79e313dba6d
+ fix some test_rc_interface issues:
add a space before $^ in the vcproj rule to add sources to the target,
one between the -I's, and make the guid unique; fixes build / link
errors.
Change-Id: Ia9c99f6a4482a001d993affbc3b3903c2a4e366a
In vpx_codec_control_(), before we enter the for loop, we have already
checked if ctx->iface->ctrl_maps is null and handle that as an error. So
the for loop can assume ctx->iface->ctrl_maps is not null, which implies
'entry' is not null (both initially and after entry++).
Change-Id: Ieafe464d4111fdb77f0586ecfa1835d1cfd44d94
there was an assumption that function calls would terminate early with
an error given 'set -e' was being used. this is true, but only when the
function is part of a simple command otherwise it won't inherit the
behavior. many of the call sites use 'func || return 1' syntax meaning
the function would continue to completion and return with the status of the
last command executed. this hid errors with e.g., eval statements. inner
calls within the functions are now explicitly tested for failure.
BUG=aomedia:2669
Change-Id: Ie33a5ac4023dcc800bd302cb8cc54c6c3f2282f5
Update a comment on the nonexistent vpx_codec_init() function. Replace it
with vpx_codec_dec_init() and vpx_codec_enc_init().
I missed this comment in the last commit.
Change-Id: I1d3614b3bb3aa4330ac6bd49e4d2e1f4e627b6b0
Currently, in rare cases on big videos (> 5K), best_mv may differ from ref_mv by more than the allowable MV_MAX. Intersect mv_limits with those bound by MV_MAX before diamond search.
We could use vp9_set_mv_search_range, but that seems a bit more constrained than the bug I encountered (e.g., MAX_FULL_PEL_VAL < MV_MAX / 8).
Change-Id: I2c6563c05039d6ee05edf642665faaccf51787d4
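Intersecting the search window with the MV_MAX bound could look roughly like the sketch below. The type and function names are hypothetical, and it assumes the limits, the reference vector, and max_delta are all expressed in the same units, which glosses over the full-pel versus sub-pel distinction mentioned above.

```c
#include <assert.h>

/* Hypothetical search-window type; names are illustrative only. */
typedef struct {
  int col_min, col_max, row_min, row_max;
} MvLimitsSketch;

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Shrink limits so that no candidate inside them differs from
 * (ref_row, ref_col) by more than max_delta in either component. */
static void intersect_with_mv_max(MvLimitsSketch *limits, int ref_row,
                                  int ref_col, int max_delta) {
  assert(max_delta >= 0);
  limits->col_min = max_int(limits->col_min, ref_col - max_delta);
  limits->col_max = min_int(limits->col_max, ref_col + max_delta);
  limits->row_min = max_int(limits->row_min, ref_row - max_delta);
  limits->row_max = min_int(limits->row_max, ref_row + max_delta);
}
```

Because the window is only ever narrowed, any vector the search returns stays within max_delta of the reference.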
Update comments on the nonexistent vpx_codec_init() function. Replace it
with vpx_codec_dec_init() and vpx_codec_enc_init().
based on the change in libaom:
b1b8c68e8 Update comments on nonexistent aom_codec_init
Change-Id: I63d3f6c87706a98f631457b5f6ce51e8b0c5cfb1
Disable checking rectangular partitions in
nonrd_pick_partition, and enable use_source_sad.
~3-4% speedup for HD clip on x86.
bdrate loss of ~0.2% on rtc set.
Change-Id: Ibef8f100f1f623482d47510cb4ec9278ba777d7c
use an extreme bitrate to cover rate control calculations.
this is disabled by default as there is a mix of
-fsanitize=undefined/integer warnings for vp9 and -fsanitize=integer
warnings for vp8.
this is a follow-up to:
5e065cf9d vp8/{ratectrl,onyx_if}: fix some signed integer overflows
5eab093a7 vp9_ratectrl: fix some signed integer overflows
BUG=webm:1685
Change-Id: I24d223e33471217528a79b0088965ba51d0399ba
Enable use_source_sad at speed 5 and use it to
condition min_partition_size in nonrd_select_partition.
Also disable checking rectangular partitions in
nonrd_pick_partition for speed >= 5.
~5-8% speedup for HD clip on x86.
bdrate loss of ~1% on rtc set.
Change-Id: Ia643b34a51191e3929a443de77e271561e7c877d
Define LIBVPX_{ELF,MACHO} to simplify blocks.
Create new globalsym macro and include logic for PRIVATE.
BUG=webm:1679
Change-Id: I303ba1492a2813f685de51155ccef7e4831e1881
Condition on current_video_frame count, as the
avg_frame_qindex needs some time to settle.
Fixes psnr test failures.
Change-Id: I462c45250becb55b72b6ffe2b7087094d6d58a01
For speed >= 8: disable nonrd_keyframe SVC with
spatial_layers > 1. In this case having base
spatial layer key frame with higher quality
(hybrid mode search) is beneficial, without too
much cpu cost (since it's on lowest spatial layer).
Change-Id: Iff7c43aed4e808603d8abdedb6eb5d2c9c8ecb8d
Only affects variance partition at low-resoln,
speed 6,7 real-time mode. At very high Q better to
save bits from the split to 8x8.
bdrate gain ~3% on rtc_derf at very low bitrates
Change-Id: I94ee58e67d5ba6277cbab8f8dd9ea45b035c82b5
Add vp9 RTC rate control without creating encoder,
to allow external codecs to use vp9 rate control.
A new library (libvp9rc.a) will be built. Applications using this
interface must be linked with the library.
BUG=1060775
Change-Id: Ib3e597256725a37d2d104e1e1a1733c469991b03
:private_extern only applies to macho. Match x86inc.asm logic:
%if FORMAT_ELF
global %2:function hidden
%elif FORMAT_MACHO
global %2:private_extern
%else
global %2
%endif
May fix a build issue on windows:
vp8/encoder/x86/block_error_sse2.asm:18: error:
COFF format does not support any special symbol types
BUG=webm:1679
Change-Id: I7e1f4043b064a04752d1cedd030cbe7f5461fe40
* changes:
x86inc.asm: update to 3e5aed95c
x86inc.asm: namespace ARCH_* defines
x86inc.asm: only set visibility for chromium builds
x86inc.asm: do not align .text for aout
x86inc.asm: use .text on march32
x86inc.asm: copy PIC macros from x86_abi_support.asm
x86inc.asm: set PREFIX from libvpx defines
x86inc.asm: pull settings from libvpx
x86inc.asm: update to 3e5aed95
All decoder functions should return the VPX_CODEC_INCAPABLE error code
if the algorithm does not have the requested capability.
Move the definitions of VPX_CODEC_CAP_FRAME_THREADING and
VPX_CODEC_CAP_EXTERNAL_FRAME_BUFFER to the VPX_CODEC_CAP_* section.
Change "PUT_SLICE and PUT_FRAME events are posted" to "put_slice and
put_frame callbacks are invoked".
Also fix some other minor comment errors.
This carries back to libvpx the following libaom CL:
https://aomedia-review.googlesource.com/c/aom/+/108405
Change-Id: If67a271c9abbb3eebc2359719cc7d9f235b690d2
Reapply and update a4b47b89f. This restores the previous version's
behavior avoiding issues with builds that may split sources on
directory boundaries; protected visibility may work in this case.
BUG=webm:1679
Change-Id: I36011727485847dd11f06782bc6beddedc39019c
Reapply a97c83f7a. Only use .text sections for aout and do not specify
an alignment.
BUG=webm:1679
Change-Id: Ibb01b09c205f9e0ecd4bfa0241e3d5e01ae5a55e
Reapply 9679be4bc. The read only sections are getting stripped on some
OS X builds. As a result, random data is used in place of the intended
tables.
BUG=webm:1679
Change-Id: Ifb17acbed73df4b9949a8badae2d9305a3073b83
Reapply 7e065cd57. x86inc.asm always defines PIC for x86_64. We undefine
it for x32.
Incorporate e56f96394 as well to ensure GET_GOT_DEFINED is defined.
BUG=webm:1679
Change-Id: I1535d57bcb4223327ca63b4fd11bffcda1009332
Pull a clean copy in and name it _new. Will apply the libvpx
patches and then move it over.
BUG=webm:1679
Change-Id: I48d3d4ab7911340c0997dd79a0dbadccf5697682
Chromium needs :function hidden and the space between the symbol and the
colon removed, at least for nasm. This matches x86inc.asm.
BUG=webm:1679
Change-Id: Ie47bb75d44d3130791639cbf4e2ebe019e2d686e
Move some code for 1 pass, that is not
directly related to rate control, out of
the postencode.
This avoids the need of extra flag for the
RC interface in:
https://chromium-review.googlesource.com/c/webm/libvpx/+/2118915
Change-Id: I3992ea8255196a762c1174c35dd7dcc9b01d317e
in calculations involving bitrate in encode_frame_to_data_rate() and
vp8_compute_frame_size_bounds()
note this isn't exhaustive, it's just the result of a vpxenc run with:
-w 800 -h 480 --cpu-used=8 --rt --target-bitrate=1400000000
Bug: b/151945689
Change-Id: I3a4f878046fcf80e87482761588c977c283ae917
in calculations involving bitrate in vp9_rc_postencode_update() and
calc_pframe_target_size_one_pass_vbr()
note this isn't exhaustive, it's just the result of a vpxenc run with:
-w 800 -h 480 --cpu-used=8 --rt --target-bitrate=1400000000
Bug: b/151945689
Change-Id: I941a77340fd44b09fc965dd182d7aeab9f1f3da0
Because energy scaling is non-decreasing, we can work on the variance
and scale after the loop. This avoids costly computations (in
particular, log()) within the loop.
We've measured that we spend 0.8% of our total time computing the log.
Change-Id: I302fc0ecd9fd8cf96ee9f31b8673e82de1b2b3e2
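The idea of scaling after the loop can be sketched as below. For a non-decreasing scaling f, the maximum of f(v) over a set equals f applied to the maximum v, so the costly call is needed only once. The integer log2 used here is an assumed stand-in for the real log()-based scaling, chosen to keep the sketch self-contained, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a costly non-decreasing scaling step (floor of log2). */
static int scale_energy_sketch(unsigned int var) {
  int e = 0;
  while (var > 1) {
    var >>= 1;
    ++e;
  }
  return e;
}

/* Largest scaled energy over n variances: track the raw maximum in the
 * loop and apply the scaling once afterwards. */
static int max_energy_sketch(const unsigned int *vars, size_t n) {
  unsigned int max_var = 0;
  size_t i;
  assert(vars != NULL && n > 0);
  for (i = 0; i < n; ++i) {
    if (vars[i] > max_var) max_var = vars[i];
  }
  return scale_energy_sketch(max_var);
}
```

The same argument applies to a minimum: the smallest variance yields the smallest scaled energy.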
* changes:
Correct time_base of ivf header in SimpleEncode
Add detail comments on valid_list in SimpleEncode
Add missing Copyright to python files
Move member functions up in simple_encode.h
quiets -Wunreachable-code-loop-increment, present since:
e57f388bc vpx_codec_enc_config_default: disable 'usage'
as g_usage was never supported for vp8/9 this was always a single
iteration. if additional usages are added in the future similar to av1
this can be restored.
Bug: b/150166387
Change-Id: Ic6f0985829e87694de8b5e0340cffa6c451ed1c2
* changes:
Add unit test for ref_frame_info
Add key frame group info to SimpleEncode
Add ref_frame_info to encode_frame_result
Add init/update_frame_indexes()
Add GetVectorData()
Add count on expected number of resizes,
and use the speed_setting_ for base layer.
Also allow AQ_MODE=3 for the tests with
dynamic layer disabling/enabling.
Change-Id: I03fb0789a2210ba00b8b153941bf79fb774d51bf
Make internal dynamic resize work for SVC mode
when single layer SVC is running (i.e, other layers
are dropped due to 0 bitrate).
Added unittest.
Change-Id: Icf03e1f276d9c4ba2734c87c927f7881c6b0a116
Fix several bugs to make the test pass.
1) Move update_frame_indexes() out of show_frame check.
2) Init coding_indexes[i] to -1 when key frame appears
3) Fix a bug in PostUpdateRefFrameInfo()
Change-Id: Ie7c70a1d460e5b89475a1aef77416fc9a88387e1
We will init and update current_video_frame and
current_frame_coding_index in the functions.
So it's easier to keep track of when the frame indexes are updated.
Change-Id: Id6ba46643f8923348bb4f81c5dd9ace553244057
It's necessary to get data pointer from a vector sometimes.
This function will guarantee that the data pointer is nullptr
if the vector is empty.
Change-Id: I156308bcb193fe404452d3cd3b24b3f80c3c3727
RefFrameInfo contains the coding_indexes and valid_list of
three reference frame types.
Note that I will add unit test in the follow-up CLs.
Change-Id: Ia055df1f8a5537b2bdd02c78991df9bbf48e951a
Add a test to ensure that encoding with the external arfs gets the
same result as long as the arfs are the same as the vp9 baseline.
Change-Id: I92c79001018f4df3bc16e9fc56c733509bebb9dc
When "rate_ctrl" experiment is on, we allow the external arf
passed from outside to determine group of picture size
in define_gf_group().
Change-Id: I0b8c3e1bf3087f21a4e484354168df4967d35bba
Replace golden and altref by past and future in RefFrameType.
So that we don't get confused with FrameType and RefFrameType.
Change-Id: I1be45d49f76c68869fc4bf53ff946fee9ce7eb9d
Make sure frame_type, show_idx and coding_index in GroupOfPicture
match the results in EncodeFrameInfo.
Change-Id: I3b477a03b5efd651c2d174e7146a4cd4f5551604
In the previous version, we assume the number of coding frames is
known.
Although the assumption is true for now with rate_ctrl flag on,
it's more proper to use ObserveGroupOfPicture() to get
the partial info about how many coding frames are in the group,
because we want to keep the flexibility of changing the size of
group of pictures on the fly in the future.
Change-Id: Ibbe6ab49268c468bf1cef8344efd3a3e1eab972a
Add coding_index to EncodeFrameInfo
Add start_coding_index to GroupOfPicture
Add frame_coding_index_ to SimpleEncode
The definition of coding index is as follows.
Each show or no show frame is assigned with a coding index based
on its coding order (starting from zero) in the coding process of
the entire video. The coding index for each frame is unique.
Change-Id: I43e18434a0dff0d1cd6f927a693d6860e4038337
For low resolutions: increase the partition threshold
to split to 8x8 blocks for high Q.
Some improvement in quality for low bitrates at low resoln.
On rtc_derf speed 7: ~1.7 bdrate gain for low bitrates.
Change-Id: I1900c32497b75da4e8b882fedc8f4b440b017480
Set enable_adaptive_subpel_force_stop to 0 as default
for all speeds. It's only enabled for speed >= 9.
Change-Id: I23a1c1cb9765994d2153ef401976c11a07f3fe7f
this avoids leaving the floating point unit in an inconsistent state on
error and breaking subsequent tests on x86
the test clip invalid-bug-148271109.ivf would also result in a sanitizer
error prior to:
vp8,GetSigned: silence unsigned int overflow warning
BUG=b/148271109
Change-Id: Ia254f3892ac1eeec51db5e9d42ea071545db0cd8
in non-conformant fuzzed bitstreams the calculation of br->value may
overflow. this is defined behavior and harmless in that the stream is
already corrupt.
BUG=b/148271109
Change-Id: I3668ada57e0bd68cea86b82917fb03c19ac1283d
fixes -fsanitize=integer warning:
runtime error: implicit conversion from type 'int' of value -1 (32-bit,
signed) to type 'unsigned int' changed the value to 4294967295 (32-bit,
unsigned)
Change-Id: I95d41aade78cea5e4f870a804d3f358c2cf618d7
Pass the motion vector info stored to the encode frame result
through the interface "update_encode_frame_result()".
Change-Id: I589affa0c4c4d0fd4d639edff9068e44a715beff
Add outfile_path to SimpleEncode() with default value NULL.
The encoder will only output bitstream when outfile_path is set.
Change-Id: Ic68e5358ea454358c510bb0ae214f4201cb3db39
this allows calls to use better versions (e.g., avx2) if available. in
most other cases the function pointer will be defined to the sse2
variant if another isn't available. this improves performance at 1080P
by ~2% on a Xeon E5-2690.
Change-Id: Ie9da3a567021f8416651a29b8c9ab9238dc4bdf1
Allocate motion vector information for the frame, and store it
when a superblock (64x64) is encoded.
The unit size of the smallest block is 4x4.
A special requirement by the vp9 spec is that sub 8x8 blocks
of a 8x8 block must have the same reference frame.
There is no such requirement for blocks larger than or equal to 8x8.
Change-Id: Iba17c568c450361e5d059503c6fb7bc458184c31
Init the memory for partition information in "EncodeFrameResult".
And pass the partition information of vp9 encoder to it through
the interface: "update_encode_frame_result()".
Change-Id: Iea049e661da79f54d41da7924b9ef28ff7cfbfa3
The bits_per_mb factor from cyclic refresh does not
need to be conditioned on seg_enabled, cr->apply_cyclic_refresh
is sufficient. This is more correct for the case where
the refresh is turned off/on dynamically.
Small/neutral change in bdrate metrics.
Change-Id: Ifbeda9d3e022e6b61cdefa1482d3075f076d7253
Allocate partition information for the frame, and update it
when a superblock (64x64) is encoded.
The unit size of the smallest block is 4x4.
For each 4x4 block, store the current position (row, column),
the start position (row_start, column_start) of the partition,
and the block width and height of the partition.
Change-Id: I11c16bbca7e89a088715a1200abd23fe2f9ca1d6
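A per-4x4 record of this kind could be laid out as in the sketch below. The field names follow the description above, but the type name, the helper, and the choice to express every value in 4x4 units are assumptions made for illustration.

```c
#include <assert.h>

/* One record per 4x4 unit; all values are in 4x4 units in this sketch. */
typedef struct {
  int row, column;             /* position of this 4x4 unit */
  int row_start, column_start; /* top-left unit of the enclosing partition */
  int width, height;           /* size of the enclosing partition */
} PartitionInfoSketch;

/* Fill every 4x4 unit covered by one partition. info is a frame-wide
 * array with stride units per row. */
static void store_partition(PartitionInfoSketch *info, int stride,
                            int row_start, int column_start, int height,
                            int width) {
  int r, c;
  assert(info != 0 && column_start + width <= stride);
  for (r = row_start; r < row_start + height; ++r) {
    for (c = column_start; c < column_start + width; ++c) {
      PartitionInfoSketch *p = &info[r * stride + c];
      p->row = r;
      p->column = c;
      p->row_start = row_start;
      p->column_start = column_start;
      p->width = width;
      p->height = height;
    }
  }
}
```

Storing the partition origin in each unit lets a reader find the enclosing block from any 4x4 position without a search.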
For screen content: lower the threshold for setting
color sensitivity on scene change.
Reduces artifacts in color slide change content.
Change-Id: Ie9a375dee9b8a546dede8afbd241e0e46f79a7f4
this function is currently only used with range checked timestamp
values, but this documents the function's expectations in case it's used
elsewhere
Change-Id: I9de314fc500a49f34f8a1df3598d64bc5070248e
...instead of blindly derefing NULL.
Found by some additional fuzzing of the vp8/vp9 decoders to be
upstreamed soon.
Change-Id: I2ea08c2d15f689f3fac8cc73622056a82d94ec00
The test didn't verify expected error code with invalid sizes. It
assumed VPX_CODEC_OK.
Added new Encoder class which doesn't run decoding at all. It accepts
expected error code to verify with encoder output.
The encoder behavior was changed in 94a65e8.
BUG=webm:1670
Change-Id: I6324d8f744e6c4aa82aa66913923dc140b07bfc9
The compiler cannot prove that the buffers do not alias, so it has to emit a
reload. On our internal workloads, the reloads are about 1% of the total time
spent decoding frames.
The loop before the change:
movzwl 0x8(%r15), %edx # load ref_frame
addq $0xc, %rax
movw %dx, -0x4(%rax) # store ref_frame
movq 0xc(%r15), %rdx # load mv
movq %rdx, -0xc(%rax) # store mv
cmpq %rax, %rcx
jne -0x1a
The loop after the change:
movw %r9w, 0x8(%rax) # store cached ref_frame
addq $0xc, %rax
movq %r8, -0xc(%rax) # store cached mv
cmpq %rax, %rdx
jne -0x12
Change-Id: Ia1e9634bcabb4d7e06ed60f470bc4cd67f5ab27e
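In C terms, the change amounts to reading the source values into locals before the loop, roughly as sketched below. The struct layout is inferred from the 12-byte stride in the listing and the names are hypothetical, so treat this as an illustration, not the decoder code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 12-byte layout: two 32-bit motion vectors plus two 8-bit
 * reference frame indices (padded). */
typedef struct {
  int32_t mv[2];
  int8_t ref_frame[2];
} MvRefSketch;

/* Copy *src into count consecutive entries of dst. */
static void copy_mv_ref(MvRefSketch *dst, int count, const MvRefSketch *src) {
  /* Read the source once. Without this local copy the compiler must assume
   * a store through dst may have modified *src and reload it every pass. */
  const MvRefSketch cached = *src;
  int i;
  assert(dst != NULL && src != NULL && count >= 0);
  for (i = 0; i < count; ++i) dst[i] = cached;
}
```

Marking the pointers restrict would be another way to give the compiler the same guarantee, at the cost of a promise the caller has to keep.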
This avoids assigning variables which will not be used. A
similar change was made to vpx_dsp/bitreader.c a long time
ago.
Change-Id: Ia5012091b8d85ca9bfefc7735a2aa69c5c2bf516
GetNextEncodeFrameInfo()
Gets encode_frame_info for the next coding frame.
ObserveGroupOfPicture()
Provides the group of pictures that the next coding frame is in.
Change-Id: Idbc437d32c392f25b06efb2d4e1ec01347d678f2
Set frames_since_key to 0 whenever a key frame appears.
Add dependency notes to get_gop_coding_frame_num()
Change-Id: I41ff04bb1c6176e60946b05fe21c72fbb82be62a
always set asm_conversion_cmd as e.g., vpx_config.asm may still be
generated with make when using --enable-external-build
BUG=webm:1535
Change-Id: I120452d4e06580b67119aee8d0a710998ac87a7a
Make sure restore_coding_context() is always called in the end
of encode_with_recode_loop().
Add EncodeConsistencyTest.
Change-Id: I3c8e4c8fcff4e3f7afef9bec469beef2a5fb6eeb
The mutex lf_mutex will now be allocated and destroyed, making it easier
to verify if it has been inited before destruction.
BUG=webm:1662
Change-Id: I8169bea9e117bd615d68b8d02da98aeab570b53f
Similar to __has_feature, __has_attribute needs to be defined
away on unsupported platforms.
BUG=chromium:1020220,chromium:977230
Change-Id: I803fff0fef2b18b535604f3b7f9f8300e45f7ef8
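Defining __has_attribute away typically looks like the sketch below. UNINITIALIZED_SKETCH and sum_first_bytes() are hypothetical names used only to show how such a guard might feed an attribute macro.

```c
#include <assert.h>

/* Compilers without the builtin report every attribute as unsupported. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical macro name: expands to nothing where unsupported. */
#if __has_attribute(uninitialized)
#define UNINITIALIZED_SKETCH __attribute__((uninitialized))
#else
#define UNINITIALIZED_SKETCH
#endif

static int sum_first_bytes(int n) {
  /* Opt this buffer out of -ftrivial-auto-var-init style initialization. */
  unsigned char buf[4096] UNINITIALIZED_SKETCH;
  int i, sum = 0;
  assert(n >= 0 && n <= 4096);
  for (i = 0; i < n; ++i) buf[i] = (unsigned char)i; /* write the used part */
  for (i = 0; i < n; ++i) sum += buf[i];
  return sum;
}
```

Only the prefix that is written is ever read, so skipping the automatic initialization does not change the result.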
Improves encode_time by 10% on FullStackTest.VP9KSVC_3SL_High and other
tests when -ftrivial-auto-var-init= is used.
vp9_pick_inter_mode can be called recursively, so multiple pred_buf are
needed. An alternative to the attribute would be a list of buffers in
ThreadData or TileData.
Bug: 1020220, 977230
Change-Id: I939a468f88c2b5dd2ec235de7564b92bfaa356f5
This helps to improve some benchmarks by 10%, e.g. decode_time
PCFullStackTest.VP9SVC_3SL_Low
Bug: 1020220, 977230
Change-Id: Ic992f1eec369f46a08e19eb33bc3a7c15c1e7c87
Move the break point in encode_with_recode_loop after
save_coding_context() so that restore_coding_context
can work properly.
Change-Id: I58f46928c8cae0ae542fd8343076670fb35681bf
Move vpx_free(buffer_pool) after vp9_remove_compressor()
buffer_pool needs to be freed after cpi because buffer_pool
contains allocated buffers that will be freed in
vp9_remove_compressor()
Change-Id: I8bcedae2858cfe132bde110c8f3f6b55dcbe3f36
Let vp9_get_compressed_data update ENCODE_FRAME_RESULT, a C
version of EncodeFrameResult.
Let the unit test check frame_type and show_idx properly.
Change-Id: Id810c26c826254fd82249f19ab855ea3b440d99c
It contains coding_data_size and coding_data.
The EncodeFrame will allocate a buffer, write the coding data into the
buffer and give the ownership of the buffer to
encode_frame_result->coding_data
Change-Id: I6bd86aede191ade1db4a1f1bba5be601eef97d60
vp9_lookahead_full - Check if lookahead is full
vp9_lookahead_next_show_idx - Return the show_idx
that will be assigned to the next frame pushed by
vp9_lookahead_push()
Keep track of the show_idx of each frame in the queue
Change-Id: If7ec2c7250f52413e6ce00c5b96f026ebf60a403
This avoids unneeded initializations.
extend_and_predict is called from multiple nested loops, allocates a
large buffer on the stack and uses just a portion of it.
-ftrivial-auto-var-init= inserts initializations which are performed on
multiple iterations of the loops, causing a 258.5% regression on
webrtc_perf_tests decode_time/pc_vp9svc_3sl_low_alice-video.
Bug: 1020220, 977230
Change-Id: I7e5bb3c3780adab74dd8b5c8bd2a96bf45e0c231
* changes:
Refactor check_initial_width
Move noise_sensitivity to set_encoder_config
Remove extra function calls in check_initial_width
Move init_ref_frame_bufs to vp9_create_compressor
Remove bits_left update in encoder_encode()
Add vp9_get_encoder_config / vp9_get_frame_info
vp9_get_coding_frame_num()
Make [min/max]_gf_interval static under rate_ctrl
Add rate_ctrl flag
the bitreaders may fill beyond what was written to the buffer as an
optimization. the data isn't used meaningfully, but it may trigger a
msan warning.
BUG=b/140939146
Change-Id: Id03cd203b8ee7ecaf6fdfe3f3c9f2ccfec527129
disabled external_build will return an incorrect result for a value not
explicitly set on the command line; use ! enabled instead.
fixes ios build
Change-Id: I48dda3a06731bc9809c2266880797e1779e4c01c
1) Rename it to update_initial_width() because it's actually
changing the initial_width
2) Move alloc_raw_frame_buffers out of it.
Change-Id: I341bd6743eb0e1217bdf1bdbd7f67c4ea7d76ee2
When configuring with --enable-external-build the .mk files
are not expected to work. This avoids some spurious warnings
when configuring for darwin targets on other platforms.
Fixed: webm:1535
Change-Id: Idac2b397db1b595ba7ea9231c4eb835b6013abdc
These only appear to exist in this repository. Based
on the name they may have been intended to manage
tabs vs spaces.
Change-Id: I2ac1a858f75cb0e5714964cb68e49082c4eb3ca5
The oldest supported Visual Studio version has been vs14
since 539dc7649f.
Clean up scripts and remove dead code.
Change-Id: I6db5b053a55d7656275d3d48e35d672c8ce22067
* changes:
Make gop size independent from kf_zeromotion_pct
Add get_frames_to_next_key()
Rename i by frames_to_key in find_next_key_frame
Remove input_stats when decide frames_to_key
Remove twopass param from test_candidate_kf
Pass first_pass_info/show_idx to test_candidate_kf
Refactor test_candidate_kf()
Decide the key frame directly when auto_key is off
Remove detect_transition_to_still()
Change the interface of find_next_key_frame
Unit Test: VP9/AqSegmentTest, VP9/CpuSpeedTest, AVX2/Loop8Test6Param
implicit conversion from type 'int' of value 59741 (32-bit, signed) to
type 'int16_t' (aka 'short') changed the value to -5795 (16-bit, signed)
BUG=webm:1615
Change-Id: I2e5b688a97c3caa29d4b8a817b95a4986b81a562
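The value change in the sanitizer report can be reproduced with the sketch below, which reduces a 32-bit value modulo 2^16 using only well-defined arithmetic. wrap_to_int16() is a hypothetical helper for illustration, not a libvpx function.

```c
#include <assert.h>
#include <stdint.h>

/* Return the value a 16-bit two's-complement truncation would produce. */
static int16_t wrap_to_int16(int32_t value) {
  const uint32_t low = (uint32_t)value & 0xFFFFu; /* value mod 2^16 */
  const int32_t wrapped =
      low >= 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low;
  assert(wrapped >= INT16_MIN && wrapped <= INT16_MAX);
  return (int16_t)wrapped;
}
```

For the reported value, 59741 - 65536 = -5795, which matches the message.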
const or constexpr should be sufficient for this use but older
versions of gcc fail to expand DECLARE_ALIGNED correctly. Work
around this by using an enum.
Fixed: webm:1660
Change-Id: Ifa4f7585417760f90f9fb28332152019de9f8169
this fixes a segfault when scaling is enabled; in some cases depending
on the ratio offsets may become odd.
vpx_int_pro_row_sse2 was updated previously, though the reason wasn't
listed:
54eda13f8 Apply fast motion search to golden reference frame
BUG=webm:1600
Change-Id: I8d5e105d876d8cf917919da301fce362adffab95
* changes:
Refactor kf_group_err in find_next_key_frame
Simplify the logics in find_next_key_frame
Add get_gop_coding_frame_num()
Localize zero_motion_accumulator
Since the while loop's condition already checks
rc->frames_to_key < cpi->oxcf.key_freq,
it is impossible to have "frames_to_key >= 2 * cpi->oxcf.key_freq"
and "frames_to_key > cpi->oxcf.key_freq".
Hence, this logic is removed.
Change-Id: I9dfc2ba36e1012718c857fc710036e2d30acd3b8
* changes:
Rename num_show_frames by num_coding_frames
Use compute_arf_boost() in define_gf_group()
Localize av_err mean_mod_score in define_gf_group
Move code of deciding gop size into brackets
When Checking for AVX Support, only the CPU's Capabilities and YMM
Register support by the OS were queried. In case of AVX-512, that is
insufficient, and ZMM Register support by the OS needs querying,
otherwise the OS will raise an Illegal Operation Exception if the CPU
is capable of AVX-512 but the OS is not.
Change-Id: I3444b19156d5743841de96cecbdaac19cc3f2b3f
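The additional OS check can be sketched as a test on the XCR0 value returned by XGETBV. To stay portable, the sketch takes XCR0 as a parameter and leaves reading the register out; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* XCR0 state-component bits. */
#define XCR0_SSE (1u << 1)
#define XCR0_AVX (1u << 2)       /* upper halves of the YMM registers */
#define XCR0_OPMASK (1u << 5)    /* k0-k7 mask registers */
#define XCR0_ZMM_HI256 (1u << 6) /* upper halves of ZMM0-ZMM15 */
#define XCR0_HI16_ZMM (1u << 7)  /* ZMM16-ZMM31 */

#define XCR0_AVX_MASK (XCR0_SSE | XCR0_AVX)
#define XCR0_AVX512_MASK \
  (XCR0_AVX_MASK | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

static_assert(XCR0_AVX512_MASK == 0xE6u, "unexpected AVX-512 state mask");

/* The OS saves and restores YMM state. */
static int os_supports_avx(uint64_t xcr0) {
  return (xcr0 & XCR0_AVX_MASK) == XCR0_AVX_MASK;
}

/* The OS also saves and restores opmask and ZMM state. */
static int os_supports_avx512(uint64_t xcr0) {
  return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}
```

A CPU that reports AVX-512 in CPUID should still be treated as lacking it when os_supports_avx512() returns 0.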
The behavior is the same as that of detect_transition_still,
only we avoid using cpi and twopass->stats_in
Change-Id: I07722c817d98d8e4991a0a883235a582db8b5c3c
Its behavior is the same as that of calc_arf_boost()
But, we avoid using cpi and twopass->stats_in
Change-Id: I31cf7889abf43effcca9004a9d55f4b424ce388a
Note the last packet is cumulative first pass stats.
So the number of frames is packet number minus one
Change-Id: I5f617e7eeb63d17204beaaeb6422902ec076caeb
Move the logic for computing
gf_group_err, gf_group_raw_error, gf_group_noise,
gf_group_skip_pct, gf_group_inactive_zone_rows,
gf_group_inter, gf_group_motion
into one for loop.
The behavior stays the same.
Change-Id: Idbc338a88469bf7a2786c831880e8aba8ed4feb5
This is part of the change that aims to replace
stats_in/stats_in_start/stats_in_end with first_pass_info.
Change-Id: Ibcd2a08e57cb749fe68996f33fe3a5e7f92b1758
when the best filter selected is not EIGHTTAP_SMOOTH, and
reuse_inter_pred is 0, pred buffer was not pointing to the right place.
Change-Id: I5b519fedd2d892bf140879faa74b463a161e253b
with vp9-highbitdepth off.
Unit Test: SSE2/Trans16x16DCT , VP9/LevelTest.TestTargetLevel20Large, VP9/CpuSpeedTest
implicit conversion from type 'int32_t' (aka 'int') of value -32851
(32-bit, signed) to type 'tran_low_t' (aka 'short') changed the value to
32685 (16-bit, signed)
BUG=webm:1615
BUG=webm:1647
Change-Id: I9ef064dc9ac734379628565ff6505b0876984123
Unit test: VP8/InvalidFileTest
implicit conversion from type 'int' of value -45844 (32-bit, signed) to
type 'short' changed the value to 19692 (16-bit, signed)
BUG=webm:1615
BUG=webm:1644
Change-Id: Id5d470f706d68e24f7a1e689526c9ecd3a8e8db8
Unit Test: VP9/InvalidFileTest
implicit conversion from type 'int' of value -65536 (32-bit, signed) to
type 'int16_t' (aka 'short') changed the value to 0 (16-bit, signed)
BUG=webm:1615
BUG=webm:1645
Change-Id: I4ce0c6abf8b5bf43ee43e958ad75d9fa28b23eee
From unit test: AVX/VP9QuantizeTest; SSSE3/VP9QuantizeTest ...
implicit conversion from type 'int' of value -139812 (32-bit, signed)
to type 'tran_low_t' (aka 'short') changed the value to -8740 (16-bit,
signed)
BUG=webm:1615
Change-Id: I730946ac6c7a250dcbcfd8a2712c0f1150ddb4fd
From unit test: VP9MultiThreaded/InvalidFileTest
implicit conversion from type 'int' of value 83144 (32-bit, signed) to
type 'tran_low_t' (aka 'short') changed the value to 17608 (16-bit,
signed)
BUG=webm:1615
BUG=webm:1648
Change-Id: I4170494c328596ace66432c8563c55f31745cf76
Fix some speed feature settings for speed 4
in real-time mode.
Use rd pickmode (i.e., nonrd_pick_mode=0), but
use variance partitioning. Allow aq-mode=3 to
work at speed 4 and modify some other speed settings.
This makes it much faster than the current speed 4,
and still better quality than speed 5.
Change-Id: I94ec43ccac022030a75b5a528703be0c37f9a35c
Remove the feature_score related code to simplify the code.
The feature_score is incorporated in get_local_structure and will
be integrated in later.
The current non_greedy_mv performances are
lowres: -0.239% midres: -0.569% hdres: -0.365%
Change-Id: Ida28bb1baff6932f1c28b24d371a35a1546fa7e9
Condition to disallow key frames on spatial
enhancement layers should be based on the
first_spatial_layer_to_encode, which need not be
layer 0.
Change-Id: If6bc67568151c38c9c98290e5838a23b3ab18e8a
Move vp9_alloc_motion_field_info out of init_tpl_buffer, so that
vp9_alloc_motion_field_info will be called even when there is
no alternate reference frame.
This fixes the crash with shields_720p50 at bitrate 2000.
Change-Id: If2877e8d0b8a834556be12d239b7b58ad1fc8c73
nzflag is used as a boolean, it doesn't need to be a sized type, int is
enough (and _mm_movemask_epi8 returns one)
fixes:
vp9_quantize_sse2.c:136:16: implicit conversion from type
'int' of value 65535 (32-bit, signed) to type 'int16_t' (aka 'short')
changed the value to -1 (16-bit, signed)
BUG=webm:1649
Change-Id: I0e3f5278af49d84760f3dfb607f28099cf02f21d
add SVC framedrop mode: Lower spatial layers
are constrained to drop if current spatial layer
needs to drop.
No change in behavior to other existing modes.
Change-Id: I2d37959caf8c4b453b405904831b550367f716ba
Spends 25% less time in dec_find_mv_refs for
grass_1_1280X768_fr30_bd8_sub8X8_l31.webm saving 0.7% overall.
Change-Id: I658bb5d6dd8ac82a568c7823dea3f4947ad7ed73
Replace get_pyramid_mv by vp9_motion_field_mi_get_mv.
The goal is to modularize motion field related operations.
Change-Id: I33084e680567ab106659ba9389cc4b507b893c69
implicit conversion from type 'int' of value 49161 (32-bit, signed) to
type 'int16_t' (aka 'short') changed the value to -16375 (16-bit,
signed)
BUG=webm:1615
Change-Id: I3f18283609ac2ce365202a63ef61a47eb00c155b
implicit conversion from type 'int' of value 65536
(32-bit, signed) to type 'short' changed the value to 0 (16-bit, signed)
BUG=webm:1615
Change-Id: I6a04e57bd3272934de9c75fab60a1620ff6c3636
runtime error: implicit conversion from type 'int' of value
-61240 (32-bit, signed) to type 'int16_t' (aka 'short') changed the
value to 4296 (16-bit, signed)
BUG=webm:1615
Change-Id: I213fc153f0df9ea46737a7fb98d909e670125724
Change-Id: I509cbda24d7d0c8dac75209efa40e24c09a107c5
Exhaust: add exhaust search with neighbor constraint
GroundTruth: be able to import motion field variable
MotionEST: use new function names
Util: be able to set the size of image
Change-Id: I36cfdf4b1f28b8190b3ad2be61c241da1347cfc3
implicit conversion from type 'int' of value 42126 (32-bit, signed)
to type 'tran_low_t' (aka 'short') changed the value to -23410 (16-bit, signed)
BUG=webm:1615
Change-Id: I339c640fce81e9f2dd73ef9c9bee084b6a5638dc
implicit conversion from type 'int' of value -139 (32-bit, signed)
to type 'int8_t' (aka 'signed char') changed the value to 117 (8-bit, signed)
BUG=webm:1615
Change-Id: Ic64959759f4a188087aa24bedbae5f9fa60674ad
implicit conversion from type 'int' of value 32768 (32-bit, signed)
to type 'short' changed the value to -32768 (16-bit, signed)
BUG=webm:1615
Change-Id: I7cdba7f7e550f62fd3ac31574e49b1909b6ab054
Temporarily add motion_compensated_prediction_new() to
decouple non_greedy_mv's motion search from baseline.
We need to decouple non_greedy_mv's full pixel motion search and
sub pixel motion search
Change-Id: I1a0e4a170c19b5b718e9d19b62268b520105a0ef
implicit conversion from type 'unsigned int' of value 256
(32-bit, unsigned) to type 'unsigned char' changed the value to
0 (8-bit, unsigned)
BUG=webm:1615
Change-Id: I2b630bf22cad28b5a7a8a37f6938e6ebe12bc64e
runtime error: implicit conversion from type 'int' of value 65594 (32-bit, signed)
to type 'uint16_t' (aka 'unsigned short') changed the value to 58 (16-bit, unsigned)
BUG=webm:1615
Change-Id: I6046a4a4fc0a108c337153f2c59d5cef5c8dcbd6
In high bitdepth builds, the Neon code would go out of range because of
the use of int16x8_t and vmulq_s16.
The C code always truncates out-of-range values.
Change-Id: I33a968b8d812e3c8477f3a61d84482758a3f8b21
The encoding time difference between non_greedy_mv and baseline
is reduced from 51% to 13%
However, there is also a performance impact.
non_greedy_mv performance:
Before this CL
lowres 0.395% midres 0.716% hdres 0.533%
After this CL
lowres 0.242% midres 0.429% hdres 0.305%
Change-Id: I047d6509df504b264981c0b903c0cc955f45b273
implicit conversion from type 'unsigned int' of value 256 (32-bit, unsigned)
to type 'uint8_t' (aka 'unsigned char') changed the value to 0 (8-bit, unsigned)
BUG=webm:1615
Change-Id: Ia9ac3772021ae492368c650a73846e7d22c8fdfc
implicit conversion from type 'int' of value -1
(32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the
value to 255 (8-bit, unsigned)
BUG=webm:1615
Change-Id: If507e73aea4dccd3914b6470f8d15db3b67300ce
implicit conversion from type 'int' of value -9 (32-bit, signed) to type
'uint8_t' (aka 'unsigned char') changed the value to 247 (8-bit, unsigned)
BUG=webm:1615
Change-Id: Ic2254ef4312f349ee38ec6e12a56b2cd5714b101
Add intra speed feature to force DC only under intra mode
testing when source sad for superblock is not high.
Feature is only enabled at speed >= 8. With this feature
enabled at speed 8 we now allow for H/V intra check as
well for speed 8.
This helps to reduce artifacts for speed 8, by allowing H/V mode
to be checked for blocks when the superblock has high
source sad/content change.
Change-Id: I0495ce96b4cc844e8c625b5183eef180dbaaaa72
Move vp9_nb_mvs_inconsistency to vp9_non_greedy_mv.c
This is to facilitate following SIMD optimizations.
Change-Id: I8eb8f820368928e0c4fb287e557cddf0bd2c763e
Remove the cb_priv, get_fb_cb, release_fb_cb, and int_frame_buffers
fields from the VP9_COMMON struct. They are not being used.
Change-Id: I235194aa8b315cd8ec9405bbba5feb3bee69f7e0
In this case, vp9_nb_mvs_inconsistency doesn't need to check
whether each neighbor mv is valid or not.
non_greedy_mv encoding time is reduced by 1.5%
Change-Id: I3216c98481e777d5e0b917ea20ee39b7ca9c9d23
The behavior of this function is to compute log2 of the mv difference,
i.e. min log2(1 + row_diff * row_diff + col_diff * col_diff)
against the available neighbor mvs.
Since log2 is monotonically increasing, we can compute
min row_diff * row_diff + col_diff * col_diff first
and then apply log2 at the end.
non_greedy_mv encoding time is reduced by 1.5%
Change-Id: I70d40060e2621daec27229f1f6d9fea0286aa04e
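The reordering described above can be sketched as follows. This is only
an illustration: MvSketch and min_sq_mv_diff are hypothetical names, not
the libvpx types or functions, and at least one neighbor is assumed.

```c
#include <limits.h>

/* Hypothetical motion vector type for this sketch. */
typedef struct {
  int row;
  int col;
} MvSketch;

/* Returns min over neighbors of row_diff^2 + col_diff^2.
 * Assumes n >= 1. No logarithm is evaluated inside the loop. */
static int min_sq_mv_diff(MvSketch mv, const MvSketch *nb, int n) {
  int min_sq = INT_MAX;
  int i;
  for (i = 0; i < n; ++i) {
    const int row_diff = mv.row - nb[i].row;
    const int col_diff = mv.col - nb[i].col;
    const int sq = row_diff * row_diff + col_diff * col_diff;
    if (sq < min_sq) min_sq = sq;
  }
  return min_sq;
}
```

The caller would then evaluate log2(1 + min_sq) once, instead of once
per neighbor. Because log2 is monotonically increasing, the result is
the same as taking the minimum of the per-neighbor logarithms.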
* changes:
Use sdx8f in exhaustive_mesh_search_single_step
Sync the behavior of exhaustive_mesh_search
Refactor exhaustive_mesh_search_new
Simplify code in exhaustive_mesh_search_new
This reverts commit affd9921e4.
Reason for revert: Quality regression
(VP9/EndToEndTestLarge.EndtoEndPSNRTest/195 failed)
BUG=webm:1635
Original change's description:
> Set up frame contexts based on frame type
>
> In single layer ARF case, use different frame
> contexts for KF, ARF/GF, LF, OVERLAY update types.
>
> Change-Id: Iebb7f9bb430e483dea1e75fc122b9b67645ce804
Change-Id: I98a4eaa6ec0ae6616ea5ad35d1580501b7422e1b
Add the following two functions:
exhaustive_mesh_search_multi_step
exhaustive_mesh_search_single_step
Change-Id: I02fac56a815b091beab2203afce560d7d29aad44
Fix to avoid color artifacts observed for speed >= 8.
In model_rd_large in non_rd pickmode: always do the
transform skipping test for UV plane.
BUG=b/136198713
Change-Id: Idd91322fb898fe731846d8581b21010096f87680
(cherry picked from commit c33c7ca85f)
Fix to avoid color artifacts observed for speed >= 8.
In model_rd_large in non_rd pickmode: always do the
transform skipping test for UV plane.
BUG=b/136198713
Change-Id: Idd91322fb898fe731846d8581b21010096f87680
As the boosted frames, early in key frame interval,
are used as reference by many subsequent boosted frames,
boosted frames that are closer to the reference key frame
should be allocated with more target bits than the rest.
Similarly, the active best quality should be lower for
boosted frames early in the key interval and vice versa.
Hence, the bits allocation and active best quality are varied
based on their temporal position in the key frame interval.
Change-Id: I1362248560d074b9e209657a23ae73dda0b01d52
from sanitizer run:
runtime error: implicit conversion from type 'unsigned int' of value 256
(32-bit, unsigned) to type 'unsigned char' changed the value to
0 (8-bit, unsigned)
BUG=webm:1615
Change-Id: I9321bbd58a305419bc8669ecd7594adc47e8b116
implicit conversion from type 'int' of value -2 (32-bit, signed) to type
'uint8_t' (aka 'unsigned char') changed the value to 254 (8-bit,
unsigned)
BUG=webm:1615
Change-Id: I9b8f5a9df3211e344e91d67a45d321e7115f5d4a
this doesn't cause any overflow issues after:
11de1b838 Fix timestamp overflow issues
BUG=webm:701,webm:1614
Change-Id: I7e1cbfa4264d1661eb9a5baa2b2111a0899360f2
The functions are
diamond_search_sad_new()
vp9_full_pixel_diamond_new()
vp9_refining_search_sad_new()
Change-Id: Ied6fe98b8a1401c95f0488faf781c5cd5e8e0db6
Used separate frame contexts for non-boosted frames.
Adjusted the frame context index grouping for boosted
frames.
Change-Id: I7f6f83f53d46f66a83a6806c2b568bd833ce940d
and calculation
Add interpolation in the Scene
Delete Color interpolation
Build triangle mesh
Reconstruct the code of depth interpolation
Add new data structure Node for back linking
Change-Id: Ibb1e896a2e3623d4549d628539d81d79827ba684
The previous change to disable some vsx functions did not clear
the test failures. Disable vsx by default until it is investigated
and fixed.
BUG=webm:1522
Change-Id: I8ba2e7261ea3eee5022832da7e4a22bf8daa0996
This reduces non_greedy_mv encoding time by 8.9%
Use linear approximation for value >= 1024
BDRate increases slightly on hdres
lowres: -0.002
midres: 0.007
hdres: 0.057
Change-Id: I55fd5e0bf0ab2206a286e11974f701cc48084be8
- Save the initial user-specified timestamp and rebase all further
timestamps by this value. This makes libvpx internal timestamps
always start from zero, regardless of the user's timestamps.
- Calculate reduced timestamp conversion ratio and use it to convert
user's timestamps to libvpx internal timestamps and back. The effect
of this is that integer overflow due to multiplication doesn't
happen for a much longer time.
BUG=webm:701
Change-Id: Ic6f5eacd9a7c21b95707d31ee2da77dc8ac7dccf
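The two steps above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the libvpx implementation:
all names are hypothetical, and the internal tick rate of 10,000,000
ticks per second is assumed for the example.

```c
#include <stdint.h>

/* Assumed internal tick rate; the name is illustrative. */
#define SKETCH_TICKS_PER_SEC 10000000

/* Conversion ratio (internal ticks per user timebase unit). */
typedef struct {
  int64_t num;
  int64_t den;
} RatioSketch;

static int64_t gcd64(int64_t a, int64_t b) {
  while (b != 0) {
    const int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Reduce TICKS_PER_SEC * tb_num / tb_den to lowest terms so that the
 * multiplication in the conversion uses the smallest possible factor. */
static RatioSketch reduce_ratio(int64_t tb_num, int64_t tb_den) {
  RatioSketch r;
  const int64_t num = tb_num * SKETCH_TICKS_PER_SEC;
  const int64_t g = gcd64(num, tb_den);
  r.num = num / g;
  r.den = tb_den / g;
  return r;
}

/* Rebase by the first user timestamp, then scale to internal ticks. */
static int64_t to_internal(int64_t pts, int64_t pts_offset, RatioSketch r) {
  return (pts - pts_offset) * r.num / r.den;
}

/* Inverse conversion back to the user's timestamps. */
static int64_t to_user(int64_t ticks, int64_t pts_offset, RatioSketch r) {
  return ticks * r.den / r.num + pts_offset;
}
```

With a microsecond timebase (1/1000000) the ratio reduces from
10000000/1000000 to 10/1, so the multiplier applied to each timestamp
is a factor of 1,000,000 smaller than before the reduction.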
fmemopen is not preferred during fuzzing.
Removed all file operations.
Removed need for allocating a different input buffer.
data buffer is appropriately incremented and passed directly to decoder
This will also test input being sent in an unaligned buffer to the library.
Removed read_frame function and did the required parsing inline.
Change-Id: I32829b0149dba9339f2e8bb4c0249a4987a630c7
(a java based language for data visualization)
add MotionField module
reformat the code by using newest clang-format version
add necessary comments
add new functions
move basic settings to setup
Change-Id: I64a6b2daec06037daa9e54c6b8d1eebe58aa6de0
Updated build instructions for vpx_dec_fuzzer to include
-fsanitize=fuzzer-no-link while configuring library
Change-Id: Id158256aa1cfe3d847720e8558cb5998ad4fd777
This patch uses ARF itself as the GOLDEN frame for the
next gf group instead of replacing it with the overlay
frame. By doing so, bits consumed by the overlay frame
will be reduced.
Change-Id: I909ceaa6d501c267d315614075913d45ad426c15
There are no sse functions which use these files. Cleans up spurious
warnings when building with --disable-sse2
Change-Id: I04d84b8b7ecfe6da7d5d4df63840796c7b04c085
81de00c Check there is only one settings per ContentCompression
5623013 Fixes a double free in ContentEncoding
93b2ba0 mkvparser: quiet static analysis warnings
Change-Id: Ieaa562ef2f10075381bd856388e6b29f97ca2746
I made a mistake (used the outdated baseline) in the CL I
submitted earlier this week:
https://chromium-review.googlesource.com/c/webm/libvpx/+/1638854
The corrected results are as follows:
The additional gains/losses on top of tune=ssim are:
Data Set      Overall PSNR   SSIM     MS-SSIM
Lowres        3.490          -3.164   -2.267
Midres        2.245          -2.270   -2.287
HDres         2.562          -1.804   -1.681
Lowres_10bd   3.477          -2.399   -2.689
Midres_10bd   3.467          -1.534   -1.636
The overall gains/losses compared to tune=psnr are:
Data Set      Overall PSNR   SSIM     MS-SSIM
Lowres        6.127          -5.818   -4.783
Midres        4.574          -5.383   -6.242
HDres         4.908          -6.218   -7.106
Lowres_10bd   6.115          -6.212   -7.790
Midres_10bd   6.238          -6.064   -7.249
Change-Id: Iae72482f7b30f200e5021a98c920eed841d0972a
This CL fixes a bug where the best rd cost was sometimes calculated
using an uninitialized rd_div. This CL also includes a small refactoring
of rd_pick_partition().
Speed change: (the smaller the better)
Performance counter stats for './vpxenc park_joy_480p.y4m --limit=50
-o output.webm':
with this CL: 297,086,181,136 instructions:u
without this CL: 299,285,835,104 instructions:u
Quality change: (negative is better)
avg_psnr ovr_psnr ssim
(low_res) 0.007 0.005 -0.002
(mid_res) 0.022 0.028 0.007
(hd_res) -0.008 -0.003 -0.014
Change-Id: I8924d8426364304212bcef3aba13346783e6f1a8
with g++ this avoids:
command line option ‘-Wno-missing-prototypes’ is valid for C/ObjC but
not for C++
the flag is necessary with clang.
BUG=webm:1584
Change-Id: I250c76483302d913999e5f9e0d09ee6449b052df
Test was running at speed 4, which is not used for real-time.
With this change all Datarate tests are now running at
(speed >= 5, 1 pass, real-time mode), which is what they
were intended for.
BUG=webm:1512
Change-Id: I47a721dadd24b73df722c44419df7cfc06c44226
Use a different Lagrangian multiplier scaling factor for each block
size. Blocks smaller than 16x16 share the multiplier of their parent
block.
The additional gains/losses on top of tune=ssim are:
Data Set      Overall PSNR   SSIM     MS-SSIM
Lowres        2.918          -3.691   -2.596
Midres        1.708          -2.656   -2.624
HDres         1.619          -2.496   -2.391
Midres_10bd   1.518          -3.263   -3.561
The overall gains/losses compared to tune=psnr are:
Data Set      Overall PSNR   SSIM     MS-SSIM
Lowres        5.583          -6.208   -4.978
Midres        4.024          -5.610   -6.411
HDres         4.102          -6.614   -7.457
Midres_10bd   4.647          -7.181   -8.614
Change-Id: I0e6c5008488734e979b2dacde9fc2a17f3aa620f
* changes:
Update rdcost using the rd_mult in current block
Use distortion and rate of best_rd as the params
Use distortion and rate recursively in rd_pick_partition()
This CL is a preparation for implementing hierarchical SSIM rdmult scaling.
There is very little impact on metrics and speed:
avg_psnr ovr_psnr ssim
midres 0.009 0.009 0.015
perf stat -e instructions:u ./vpxenc park_joy_480p.y4m --limit=50
with this cl: 317,722,808,461
before: 317,700,108,619
Change-Id: I7b1d1482ac69f7bc87065a93223a0274bcbe8ce3
Also added rd calculation for negative rates and distortions.
This CL is a preparation for implementing hierarchical SSIM rdmult scaling.
Little impact on quality and speed:
avg_psnr ovr_psnr ssim
(mid_res) -0.015 -0.009 -0.018
perf stat -e instructions:u ./vpxenc park_joy_480p.y4m --limit=50
with this cl: 317,700,108,619
before: 317,669,279,763
Change-Id: I01588758b7be2aab32236440ec0e57d7af56e920
This CL is a preparation for implementing hierarchical SSIM rdmult scaling.
There is very little impact on metrics and speed:
avg_psnr ovr_psnr ssim
midres -0.04 0.005 0.012
perf stat -e instructions:u ./vpxenc park_joy_480p.y4m --limit=50
with this cl: 317,669,279,763
before: 317,717,562,045
Change-Id: I6f17864e7b17aad06a04ae4f470f75e975549db9
< 4 isn't meaningful in the first pass; additional analysis will be
done, but thrown out, unnecessarily increasing the runtime.
Change-Id: Ic3de77e3eaa7a8a3371f76f84693e9655c60fdba
this test is only useful for realtime mode testing given the number of
frames and that one-pass vod has never been a primary focus for
development.
BUG=webm:1512
Change-Id: I23208393a5fcc5bcf9b267fab4b0d1aad500918a
The spatial svc implementation has moved outside the library:
commit ed8f189ccc
Refactor: move svc example files to from vpx/ to examples/
BUG=webm:1629
Change-Id: I31c3ae7b20a6bd50615d1d6e48d4f93beca939e6
Keep the overshoot_detection_cbr_rt to the fast mode
(FAST_DETECTION_MAXQ), except for low-resoln at speed 5,
for non-screen content.
The increase in encode time (from using the more accurate
RE_ENCODE_MAXQ) is acceptable for speed 5 at low resoln.
Change-Id: I3089d1505553154ef046056465bc18130f7bd55a
- Fix the number of frames considered in calculation of
twopass active worst quality. For GF only group, frames
considered should be one less than baseline gf interval
accounting for the golden frame.
- Fix in calculation of normal_frames. As baseline gf
interval includes the golden frame, the number of
normal frames should be one less than baseline gf
interval.
Change-Id: Ic752f7d13d23772687e2fa407698766b3fdf5c67
This reverts commit c87ff4a09d.
Reason for revert: causes division by zero
Original change's description:
> Fix calculations in GF only group case
>
> - Fix the number of frames considered in calculation of
> twopass active worst quality. For GF only group, frames
> considered should be one less than baseline gf interval
> accounting for the golden frame.
> - Fix in calculation of normal_frames. As baseline gf
> interval includes the golden frame, the number of
> normal frames should be one less than baseline gf
> interval.
>
> Change-Id: I6c0cd0a39db23586fc390a6fba5d7aebc0dfce08
Change-Id: I522da652587ae7ca4177f6d4bb9f72abcff35637
Apply the minimum frame size clamp for all applicable frames. This
avoids bit-rate undershooting issue as reported in
BUG=b/133260125
Change-Id: I59ec028eee999ad5238602adf96465af7c4f4514
Don't allow the setting of copy_buffer_to_arf when the
application/user sets the refresh/update flags. Add new flag
(ext_refresh_frame_flags_pending) to indicate user sets the flags.
Change-Id: I482098c0f2552b04885132a728629ab3e207f08b
For video mode (non-screen) in CBR real-time mode:
increase the qp thresh to trigger setting to active_worst
on scene changes. Avoid big overshoots in content with
scene changes.
Change-Id: I74721b07b0d7b742cbef468ece70cca7da0f89eb
For higher layer ARF frames, limit active best
quality to the qindex of the lower layer ARF
frame.
Change-Id: I957cbd8ae02313cbc94eda2175e63a26d788459a
The ARF frames in last few gf intervals, would be
used as a reference by fewer ARF frames in the same
kf interval. Also, the ARF frames in the last GF
group would not be used as a reference in future.
Hence the active best quality for these ARF frames
is increased based on their temporal distance from
the next key frame.
Change-Id: Ice7eaa8a25384104b1d9cc021eec588c03053fc2
- Fix the number of frames considered in calculation of
twopass active worst quality. For GF only group, frames
considered should be one less than baseline gf interval
accounting for the golden frame.
- Fix in calculation of normal_frames. As baseline gf
interval includes the golden frame, the number of
normal frames should be one less than baseline gf
interval.
Change-Id: I6c0cd0a39db23586fc390a6fba5d7aebc0dfce08
The section intra rating used for the frames in the
first ARF interval was based on entire key frame
interval. However, for subsequent ARF intervals it was
based on that ARF interval. This discrepancy is fixed.
Change-Id: I3df358861d720e536c9c6f15da1cbd78f2dfffbc
This reverts commit 6d6cc17dc8.
Reason for revert:
This has not been reproduced on hardware. There is a strange
libc bug which may account for the behavior on arm because
the environment qemu is using is somewhat old. See discussion
on the webm bug.
To work around the failures in the nightly test the jenkins
job has been switched to use the hardfloat compiler and qemu
environment. Even though this is the same version, it has
not shown the hanging behavior.
Original change's description:
> disable row mt test
>
> deadlock is being investigated in attached bug.
>
> BUG=webm:1626
>
> Change-Id: Ia6d7020b8b1d274433aa89f36c9ed5b9facc5808
Bug: webm:1626
Change-Id: I104a82696a4c90bfbadfd39407c073adce73af0d
plane block size is used when computing model rd for uv.
However, it iterates thru sub-blocks based on tx size on uv planes
and plane block size could be bigger than that, which leads to reading
beyond tile boundary when the block is on it.
BUG=b/131414589
Change-Id: I362091484b1325b89d2175039323b235a06ebffc
When the perceptual AQ mode is enabled, cap the ARF boost to 2.5x
of the regular frame. This allows more consistent frame quality
across consecutive frames and sufficient bit rate allocation at
frame level for AQ mode.
Change-Id: I10f5e2860a3e4b412efe25cca635405bae293ebf
Note that when using --disable-runtime-cpu-detect the developer
must keep in mind what devices the library will be run on.
BUG=webm:1623
Change-Id: I0359e226bb678f8e5145bb30cd1cefc7e30c6c79
arm builds require too many tweaks to keep up with changes
to the ndk. Recommend ndk-build instead.
Update documentation and drop --sdk-path references. If
--enable-external-build is used instead we do not need the compiler
path.
BUG=webm:1622
Change-Id: Id024345afd7af988321f8f97ebab19c425cb0493
Values of mb_smooth_pct and mb_av_energy have been updated
correctly in vp9_rc_get_second_pass_params for higher layer
ARF frames.
Change-Id: Ic176e393eb8cc5f418235fee9accee84e9809607
Add a macro to exclude VP9 specific assembly files from the build if VP9
is not configured. This would otherwise cause a linking error for VP8
only builds.
BUG=webm:1625
Change-Id: I6d892b7c2837a2574538d18b776fd2b6d706da96
Trap the case where we end up with two short GF only groups just
before a key frame. For example, if the KF is 22 frames away
we are better doing one ARF group of size 16 followed by a GF
only group of 6 than two GF only groups of size 11 (when
min_gf_interval is 12).
Change-Id: Ie598a8a21c6e104cbe381b4792e77fd92d047725
Mask the values to show that we only want to store 1 byte. Switch
to lowercase ff since it's more prevalent in the file.
BUG=webm:1615
Change-Id: Ia8ede79cb3a4a39c868198ae207d606e30cfb1cb
Support the potential frame scaling use case. The operation flow
now allows the codec to allocate the memory buffer only when
perceptual AQ mode is enabled.
Change-Id: I7529e63131276dbe3a29f910d3a227f20dbc94a2
on CONFIG_BITSTREAM_DEBUG. this avoids an object file containing no
symbols which may cause warnings on some platforms.
Change-Id: I02af97d6970de949466c29f50d272733d97ee8d2
clang 7 integer sanitizer warns on unsigned->signed conversions when
the highest bit is 1.
BUG=webm:1615
Change-Id: I6381efaff9233254b40cb78f7bcf87090e0ad353
clang 7 integer sanitizer warns about storing any int16_t value
where the high bit is 1. Treated as an int, such number would
be positive. Treated as an int16_t, it is negative.
BUG=webm:1615
Change-Id: Idf655cd92d26b7c1180910159be3f64164577eca
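The value changes that the sanitizer reports for 16-bit stores can be
reproduced with a small helper. This is an illustrative sketch only;
wrap_to_int16 is a hypothetical name and assumes the usual
two's-complement truncation to the low 16 bits.

```c
#include <stdint.h>

/* Returns the value obtained when v is truncated to 16 bits and the
 * result is read back as a signed two's-complement number. */
static int wrap_to_int16(int v) {
  /* Conversion to unsigned is well defined (modulo 2^N). */
  const unsigned low = (unsigned)v & 0xffffu;
  /* A set high bit (0x8000) makes the 16-bit value negative. */
  return low >= 0x8000u ? (int)low - 0x10000 : (int)low;
}
```

For example, 59741 has bit 15 set, so it reads back as
59741 - 65536 = -5795, and -65536 has all low 16 bits clear, so it
reads back as 0.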
For screen content nonrd_pickmode: reduce
threshold to select 4x4 tx_size, under certain
conditions.
Change-Id: If68c30172272868033f0e3011e53c76b4e7c48b6
For nonrd-pickmode: pass the source variance and the
mode (intra/inter) to select tx_size, for better tuning.
Neutral change for video mode, speed 7.
Some quality improvement for screen content.
Change-Id: I53336f23fa4f14076aa1cdf8036e9af73c43060a
In nonrd_pickmode for intra modes: add tx_size selection
based on Y prediction signal for the bsize.
The tx selection is done in model_rd, same as inter-modes.
Existing code for intra mode was first setting a tx_size based
only on the bsize, and then in some cases in block_yrd
(during the loop over bsize in units of tx_size) the tx_size
may be set again if model_rd is called in block_yrd.
This CL separates out the tx_size setting (based on Y channel
prediction via model_rd), and then block_yrd is called once
for whole bsize. This allows for better tuning of the tx
selection for intra modes in future change.
Adjust threshold in svc datarate test.
Negligible/neutral change in psnr/ssim metrics
for speed 7 and 8, 1 layer and SVC mode.
Change-Id: I33bc8447afdc3785482e13aac5c3636e13c59644
In the distant past this was used to distinguish between
armv5/6/7 targets when building the assembly files. The
project has not supported armv5/6 for a long time.
BUG=webm:1623
Change-Id: Ibec70e6624b651df0fa6f882ab6f201dc73e92e2
For row_mt=1, when mi->skip is set to 1 after parse based on
eobtotal for that partition, dqcoeff and eob need to be restored
as recon_partition doesn't increment these pointers for skip cases
Change-Id: I79711b0c175937aa6da3bba3b3bc053f91a8ce35
Move the setting to just before the inter-mode loop,
as for screen content the value may change due
to reset of segment.
No change in behavior except for screen-content.
Change-Id: I256795b581ceda352e57b88eba2e86aa18b0fdc4
In the calculation of boost for key frames, increase number
of frames to be scanned based on the content nature.
Change-Id: Ia4533966a00055d0bec712e073d82d4bd1dc715a
The frame next to a scene cut frame does not usually have
a high second ref usage. Thus the second ref usage of the
frame next to the scene cut frame is tested against a
threshold for scene cut detection.
With this change scene cut detection is improved for
content where genuine scene cuts were being missed.
Change-Id: I11190d848fa1c1dcd63aab81da799354371e2a30
Reduce the number of group_idx initialization.
Initialize the center to the median of the data group.
Change-Id: Ie16150610480bf54a6b5e2bc048ba1e940bef10f
Add speed feature for real-time to always force
SMOOTH filter for subpel motion. Can be useful in some
cases for noisy content or high motion at low bitrate.
Also some speedup in avoiding the checking of two filters.
Keep it off always for now.
Change-Id: I843d79aaddef75f9c6ded60906cc75c279a6e37a
This CL removes the extra floating math in tune=psnr, I will add
clear_system_state calls in tune=ssim in the next cl.
Change-Id: I7cdd4854b2b8e7e7f872f097c5535f10c80cfe0d
fixes a deadlock with an odd number of threads that go from < number of
tiles to >. the previous calculations were out of sync so going from
e.g., 8 tiles to 2 with 3 threads would result in scheduling only 2
workers, but thread_loop_filter_rows() would expect 3.
BUG=webm:1618
Change-Id: I78c967a8c3c927d929e13c949808a5ef443ebacb
The Wiener variance output has been sorted prior to the clustering,
which allows uniform sampling to be used directly for the initial
center points. It avoids empty cluster situations when the samples
are heavily distributed at the two far ends and leave the middle empty.
Change-Id: I159fbfa6bbb4aafd19411fd005666d144cca30fc
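One way to picture the initialization described above is the sketch
below. It is not the libvpx code: init_centers_uniform is a hypothetical
name, and sampling at the midpoint of each of k equal index ranges is
an assumption made for the example.

```c
/* Picks k initial cluster centers from data that is already sorted in
 * ascending order. Assumes 1 <= k <= n. Center j is the sample at the
 * middle of the j-th of k equally sized index ranges, so every center
 * coincides with an actual sample. */
static void init_centers_uniform(const double *sorted, int n, int k,
                                 double *centers) {
  int j;
  for (j = 0; j < k; ++j) {
    /* (2j + 1) / (2k) of the way through the array; always < n. */
    const int idx = (2 * j + 1) * n / (2 * k);
    centers[j] = sorted[idx];
  }
}
```

Because each initial center is taken from the data itself, every
cluster starts with at least one nearby sample even when the values
pile up at the two ends of the range.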
Implementation with some tuning of the paper:
C. Yeo, H. L. Tan, and Y. H. Tan, "On rate distortion optimization using
SSIM," Circuits and Systems for Video Technology, IEEE Transactions on,
vol. 23, no. 7, pp. 1170-1181, 2013.
Test results:
avg_psnr ssim ms-ssim
lowres 2.516 -2.622 -2.450
midres 2.312 -3.062 -3.882
hdres 2.292 -4.293 -5.246
The encoding time is about the same as the baseline.
Change-Id: Ida2c380ade79b6c15cf12b88bf090069da8765d8
With switching to clang-7.0.1 we got new warnings. With this change the
warnings are back to 0 for all configurations (excluding warnings in
third_party)
BUG=webm:1616
Change-Id: I25ceb592c425394e8f14d333fb5680144f892213
Use chessboard search only for certain speeds/resolns
(speed >= 8) for real-time speed features.
Disable chessboard search for speeds <= 7.
~2.5 gain on rtc set for speed 7.
~1% slowdown.
Change-Id: Ic6898aa475817e128154f691413c73f65306e2a8
Add consistent switchable rate cost, which should be only
when non-integer motion mode is tested.
Neutral/negligible change in metrics.
Also disable the re-evaluation of ZEROMV mode after denoising
feature, as this rate cost fix exposed an existing issue
with this feature.
Change-Id: I9e5479281810a392b9a409e238c564b2def8e546
For 1 pass CBR non-SVC encoding, on golden refresh:
condition lower/boosted active_best_quality setting
only if gf_cbr_boost_pct is set.
Reduces overshoot for hard clips.
Neutral change on rtc metrics.
Change-Id: I10f7e27767a3f80d63958a7e137155f7bc20504b
Apply a bias for film mode against intra coding
(especially DC_PRED) and compound modes if the sub
blocks of the current block have significantly different
variance in the source.
Change-Id: Iac1fc0510141be5c472a0ec57567bab3d2fc4164
The current libvpx encoder interface can potentially rollover an int64_t
value used to calculate the current timestamp. If the timebase was set
to microseconds and first timestamp was 0, then the rollover would
occur in about 10.675 days.
BUG=webm:701
Change-Id: I8d5aab46f8dcf250c1d4d43d5f3d27363c19cd54
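The 10.675 day figure can be checked with a short calculation. The
sketch below assumes that the internal tick rate is 10,000,000 ticks
per second and that the timestamp is multiplied by that rate (times the
timebase numerator) before dividing by the denominator. The names are
hypothetical.

```c
#include <stdint.h>

/* Assumed internal tick rate; the name is illustrative. */
#define SKETCH_TICKS_PER_SEC 10000000

/* Largest timestamp (in timebase units) for which
 * pts * SKETCH_TICKS_PER_SEC * timebase_num still fits in int64_t. */
static int64_t max_pts_before_overflow(int64_t timebase_num) {
  return INT64_MAX / (SKETCH_TICKS_PER_SEC * timebase_num);
}

/* Converts a timestamp in timebase units (num/den seconds) to days. */
static double pts_to_days(int64_t pts, int64_t timebase_num,
                          int64_t timebase_den) {
  const double seconds =
      (double)pts * (double)timebase_num / (double)timebase_den;
  return seconds / 86400.0;
}
```

For a microsecond timebase (1/1000000), max_pts_before_overflow(1) is
922337203685 microseconds, about 922,337 seconds, which pts_to_days
converts to roughly 10.675 days.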
The sharpness mode is enabled for hvc visual quality. Bypass the
skip block check that could potentially force all zero block in
sharpness mode. This resolves the patchy blockiness issue raised
in the 4K SDR HVC encode.
Change-Id: I0538a1b774b80c6b0899c921e80edecd4a440d5c
this was renamed in:
268f10669 Provide information on codec controls
but the corresponding type checked control call was missed.
Change-Id: I151cb42516b10e551b31273327de4ec1bac3c81b
Histogram-based noise estimation algorithm leveraged that low-noise sequences
tend to populate lower-valued histogram bins and high-noise sequences tend to
populate higher-valued histogram bins in a predictable/repeatable manner. The
algorithm compensates for histogram flattening and skewing toward zero as the
scene darkens.
Change-Id: Ia5acb611f0cc6d726280bd5ea5f45d42ff0dc2dd
This allows to use result from scene chage detection to exclude
the current frame from noise estimation analysis if the frame has
scene/ big content change (i.e., high_source_sad flag is set).
The behavior change for noise estimation may be small in practice,
since in the current code, a scene change would have blocks excluded
due to thresh_sum_diff, and the subsequent frames would also be mostly
excluded due to (past) non-zero motion vectors (until the
consec_zeromv > thresh_consec_zeromv is satisfied again).
But it's better to completely exclude the current frame if it's a scene change.
Change-Id: Icd08bab7a8e1b994c7accced89697e0b2d7f50c5
Add threshold multiplier for variance partitioning
as speed feature, and increase it by 2x for speed >= 9
for resoln >= VGA. Also only allow simple_interpol
filter when avg_low_motion is below threshold.
Better tradeoff of speed/quality compared to speed 8.
Change-Id: I6bd29ad3cced470b32d04f60771120531112a5d9
Substantially increase the threshold for applying variance
adjustment in rd_variance_adjustment() for intra modes
only, especially for DC_PRED.
Change-Id: Idb3f0c5aca5ab58c9b79c3e993247719054d79c9
* changes:
Further Adjustments to film mode bias.
Add GF group noise weighting in rd_variance_adjustment()
Adjustment to low variance block bias in rd_variance_adjustment()
with clang-7 this causes additional warnings in x86 intrinsics and
elsewhere. disabling for now to unblock new changes.
BUG=webm:1615
Change-Id: Ide9cacee5547ed432f980f6804e1414f32639121
For film mode add a weighting to the thresholds used
in rd_variance_adjustment() based on noise measured in the
first pass.
Change-Id: I83ca669bb55aa52f1d34f03a2268b79fba890770
Always test thresholds using a scaled block variance value.
Source pixel variance no longer used so delete it as a parameter
to the function
Change-Id: I9e251edac6ebb15da98e40dcfa43333fe8b6ba55
to replace the variance from .dst which is the prediction buffer in
inter mode. Only enable it in tune-content-film mode at the moment.
Change-Id: I647b4a524a0849fda42541887ebc34091f152073
Disable part of a speed feature that blocks all intra modes
except DC_PRED when the source variance is low.
Change-Id: I2956951fd05933a39f7225d4dfe14e019410fee3
cyclic refresh does not work for speeds <= 4, so disable
it for this case. And dynamically disable it when
average_qp is close to MAXQ (only for non-svc), to improve
quality/rate control at very low bitrates.
Change-Id: I447be43aef0fbb80f4a30d81e11658b58744eae5
Unify the transform and quantization process for 4x4 - 16x16
transform block sizes. This doesn't affect the encoding speed
visibly. Remove it to reduce the maintenance load.
Change-Id: Ifbf20bf8554ecf7970a6279a2b783b1c58fac6e4
Find the most common segment index among all 16x16 blocks in a
64x64 block and use that as the 64x64 block level decision.
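The majority vote described above can be sketched as follows. This is an illustration only: the function name, the segment count of 8, and the tie-breaking rule (the lower index wins) are assumptions, not taken from the change itself.

```c
#include <assert.h>

#define MAX_SEGMENTS_ASSUMED 8     /* assumed number of segment indices */
#define BLOCKS_16X16_PER_64X64 16  /* a 64x64 block holds 4x4 16x16 blocks */

/* Returns the segment index that occurs most often among the 16x16 blocks
 * of one 64x64 block. Ties keep the lower index (an assumption). */
int most_common_segment(const int seg_idx[BLOCKS_16X16_PER_64X64]) {
  int count[MAX_SEGMENTS_ASSUMED] = { 0 };
  int best = 0;
  int i;
  for (i = 0; i < BLOCKS_16X16_PER_64X64; ++i) {
    assert(seg_idx[i] >= 0 && seg_idx[i] < MAX_SEGMENTS_ASSUMED);
    ++count[seg_idx[i]];
  }
  for (i = 1; i < MAX_SEGMENTS_ASSUMED; ++i) {
    if (count[i] > count[best]) best = i;
  }
  return best;
}
```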
Change-Id: I67e85869d9fee0fc05450928f1eeaebe511cab6a
Separate the k-means clustering stage and the segmentation parse
stage to save unnecessary steps in a common function.
Change-Id: I60083e1d970e744f9a64112f856892d450f86669
VP8 and VP9 have different padding on buffer stride.
VP8 microblock is 16x16 so the buffer stride needs to be divisible by
16. Thus UV buffer stride is divisible by 8.
VP9 microblock is 8x8 so the buffer stride is only extended to be
divisible by 8. Then UV buffer stride isn't divisible by 8.
Change-Id: I6fa953feb951f2fb2e48f72a623786b85e23822f
Keep loopfilter on, and use half-pel instead of full.
This reduces the big quality gap between speed 8 and 7,
but still keeps speed 8 about 30-40% faster than speed 7.
Tested on screenshare clips with scroll and slide changes.
Change-Id: Id63b44f59655f3e3dc1b49d89291d97e7323081a
The force smooth_filter should only be used
for noisy content, so for now keep it off and
add TODO. Also fix/adjust low-resoln condition
and threshold in cyclic refresh.
Change-Id: I6c456dc9f23daabba20badd65a2f7ee6c5e259c4
On scene/content changes for base layer of screen:
reduce 32x32 split threshold, bias rdcost for flat
blocks if sse_y is non-zero, and avoid early exit on
intra-check.
Reduce artifacts in scroll content.
Change-Id: I144357a61462351173af900e0b8a47dac4aad6ca
Normalize the Wiener variance calculation for stack ranking. Remove
potential dependency on blocks at frame boundary.
Change-Id: I37e8634d714a1c34e99f9f7c4f1bb6ea81d56112
Improve detection of key frames especially in low contrast
and low motion regions.
This patch adds a function to the key frame detection to test
for specific patterns in the intra signal in the first pass stats
that tend to be indicative of a key frame.
This is intended to complement the existing code and finds some
scene cuts that were previously being missed.
Tested on two clips where the existing code was struggling to
identify the key frames this patch improved detection as follows.
Film clip 1: (detected / actual)
Old (2/5) New (5/5)
Film Clip 2
Old 4/11 and one false +, New 7/11 and 1 false +.
Short 4K Film Scene
Old 1/2 New 2/2
In testing so far I have not seen many extra false +'s though
it is likely that there will be some cases and this may need
further tweaking.
On one of our longer form film test reels (~20k frames)
the change picked up around 35 key frames that were
previously missed, mainly in darker scenes. There were a few
extra (or different) false positives caused by bright flashes or
explosions but these were cases where there was little
difference between inter and intra coding.
Awaiting testing on standard sets.
Change-Id: I1ff4a587e0a47667eb93b197f39b79a1130faeca
Abstract the control outside rd_pick_partition function. No need
to switch between x->cb_rdmult and the cpi->rd.RDMULT here.
Change-Id: Ia3104ebe15b5e59a4f29ffe6e8c7d718ecb998a8
Adapt the quantization to provide higher quality at smooth regions
where the Wiener variance is smaller.
Change-Id: Ibfd594d1de2ba34d2440d0aa7991b0fdac057ea5
Force smooth_interpol filter for low resolutions at high Q,
avoid the loopfilter strength reduction for similar condition,
and reduce thresh_motion for cyclic refresh turnoff.
Change-Id: I4e9121d1cdc7d1b04992c741dc4f0cec281592f7
boundary_ls[j] is the upper bound of data centered at ctr_ls[j]
Add vp9_get_group_idx() for computing group_idx
Change-Id: I3b1b488edf8acbfb63c469eeeba15f3e42b0a645
Explicitly compare the block location against tile coordinate to
decide if intra prediction boundary is available. No coding stats
will be changed by this refactoring.
Change-Id: I80b3a131366bb2c5f8ea53a139ed6e9b0b7ddb68
For screen content real-time mode: don't check TM
intra for bsize >= BLOCK_32X32.
Small speedup and avoid some artifacts seen
in scrolling screen content.
Change-Id: I72d7731eeb6ac9ee96e65af522c1a9aabb6dc4ef
Set the lst/gld/alt_fb_idx and refresh flags for
key frames at the start of encoding (in svc_set_params).
This then avoids new code/function in update_references()
and in copy_flags_ref_update().
Change-Id: Id3503c0c628540c20f11a540c118c4ee4cf04848
In case median filter returns 0 value, bypass the Wiener filter
stage. This avoids potential divided by 0 case.
In the same place use a temp variable to take the Wiener filter
output instead of returning to the coeff array.
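The bypass and the temporary variable can be sketched as below. This is a hedged illustration: the function name is hypothetical, and the gain c^2 / (c^2 + n^2) is one common form of a per-coefficient Wiener filter, assumed here rather than confirmed by the change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the squared Wiener-filtered coefficients. median_val stands in for
 * the noise level estimated by the median filter. */
int64_t wiener_variance_sketch(const int16_t *coeff, int count,
                               int median_val) {
  int64_t sum = 0;
  int i;
  assert(coeff != NULL && count >= 0);
  for (i = 0; i < count; ++i) {
    const int64_t c = coeff[i];
    const int64_t sqr = c * c;
    /* Temporary for the filter output; coeff[] itself is left untouched.
     * With a zero noise estimate the filter stage is bypassed, which also
     * avoids 0 / 0 when the coefficient is zero as well. */
    int64_t filtered = c;
    if (median_val != 0) {
      const int64_t noise = (int64_t)median_val * median_val;
      filtered = sqr * c / (sqr + noise);
    }
    sum += filtered * filtered;
  }
  return sum;
}
```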
Change-Id: I45f57c515b4062a0aa1f312eda852462cb655d8e
Normalize the block level Wiener variance based decision according
to the frame level Wiener variance.
Change-Id: Ic2bdf1b322a65661775541dd6c174ba71579461a
This commit introduces a Wiener variance term. For each block in
the source frame, we first estimate its film grain noise level
using median filter in the transform domain. Each transform
coefficient is then processed using Wiener filter to account for
the impact on the energy level due to film grain noise. The result
leads to a second moment of the denoised signal.
Change-Id: Ibce7cb1b0cb8fe1aba807d95289712271d576948
The biggest offender in terms of preventing retention of film
grain in high rate film content is the use of DC-PRED mode.
Some of the directional modes whilst not strictly preserving
grain do better at at least preserving some texture.
This change blocks the early breakout of the rd loop based
on the reference frame giving the best result so far. In practice,
unless DC-PRED was chosen as the best mode so far, the other
directional intra modes would not even be considered.
As the film grain mode also tends to bias against DC-PRED (or
intra in general) this was pretty much blocking all use of directional
intra modes.
The patch also allows for a broader spectrum of DC modes at the
16x16 transform level than previously.
Change-Id: I860b7726ea9f5fcbb3ec1a90edbdd8cade2e8b28
Adds a variable to GF group structure to store a noise
energy metric for the current group that can be used
in things like film grain retention code.
Change-Id: I81b07630d3242f7928110f19a6c1ed4c86125f05
Don't force skip of zero-golden reference when
zero_temp_sad_source = 0, as it may be the
inter-layer reference. And remove the flatness condition
when superblock is static.
Change-Id: I6b4b6eac0f6a2abc862c23d0e5467c7cf61995ef
zero_temp_sad_source is only computed when
compute_source_sad_onepass and sf->use_source_sad are
on, which currently is only for the top layer of the
layered encoding. So qualify the usage of
zero_temp_sad_source on those flags.
This affects the quality/speed of the lower layers of
screen content mode when SVC (quality layers) are used.
Change-Id: I54167265a05a4b918ce015931375aa42d3e75cf5
Force all upper spatial layers to be key frame if the base layer is key.
Mode only works for inter-layer pred=off and non-flexible mode.
Add flag to write out bitstream for each spatial layer in example
encoder.
Change-Id: I5db4543cf8697544ae49464f2157e692640d5256
For nonrd-pickmode in screen content mode: modify logic for
inter and intra mode check for spatially flat blocks.
Condition skip of non-zero/zero inter mode based on
zero_temp_sad_source, and force intra/DC check regardless.
Reduces artifacts in scrolling motion.
Change-Id: Iee75cd19d03296afeb649c5bce628806103769ae
If golden reference is selected as long-term reference,
bias the denoiser filter to use last reference.
Fixes visual artifact.
And reduce the thresh_svc_golden, which was used
to reduce the artifact occurrence.
Change-Id: I08f24160ca11bd8f5f70acaefe989d5f92988132
clang treats -Wmissing-declarations differently than gcc. This
provides similar coverage for clang.
Fix vpx_clear_system_state() warning on 32bit builds:
note: this declaration is not a prototype; add 'void' to make it a
prototype for a zero-parameter function
Change-Id: I5a424bc38d47c0a3dc751d65c1efea5733907785
Compared to speed 8 for low resolutions:
quality loss is ~8-10%, and encoder fps is ~15%
higher on ARM for 1 thread.
Change-Id: I4f12390d2917a5c4045114ef81a05edb2a3b9c96
The SSE4_1 version of temporal filter does not distinguish between bd 10
and bd 12.
Speed up:
Function Level:
| !SS_X | SS_X
!SS_Y | 6.44X | 6.37X
SS_Y | 6.56X | 6.63X
Video Level:
2.5% speed up on basketballpass_240p over 150 frames on speed 1,
bitdepth 10, auto-alt-ref=1
BUG=webm:1591
Change-Id: I49aa2ed4acfe80a8d627038322de66cbe691296e
In variance partition for screen content mode:
force split to 32x32 if source pre-process detects
non-zero temporal sad.
Reduce artifacts in scroll motion content.
Change-Id: Ifbe2b500eb03ae853faa28a045ce4f1185443939
Increase threshold to detect frames with high
num of motion blocks, and fix conditions to detect
horiz & vert scroll and avoid split below 16x16 blocks
in variance partition.
Reduces artifacts in horizontal scroll screenshare testing.
Change-Id: Icf5b87f69971d7331c660fc2727c9246c6cbf8b5
For nonrd-pickmode CBR mode: reduce the skip
golden ref thresholds, to reduce some psnr
regression in some clips, while still effectively
reducing flashing block artifact occurrence.
Change-Id: I468dcf5354411aeb54ac3ef56c6fb73267d93fde
This patch increases the preference for maintaining similar variance
between source and reconstruction and thus helps improve film grain
retention.
The changes are only active when film mode is selected
Change-Id: I3bc082dca678a0f32ec00f30f5d90d0f95ca2381
Change to the default RD multiplier computation in set_segment_rdmult()
The default here is wrong as for modes like AQ 1 setting the rdmult based on the
segment ID for bsize will tend to result in the RD loop favoring partition sizes where
the resulting segment assignment has the lowest Q, as these partition sizes will be
then evaluated with a lower value of rdmult. For a valid rd comparison between
partition sizes within a single SB64 we need to use the same value of rdmult.
This change fixes an observed issue with AQ 1 where almost all the blocks were being
assigned to segment 0.
Change-Id: Ibf87e8ca60bca45b8fee866ac6fd53feae11dab4
Change to init/reset level of the denoiser from
kDenLow to kDenMedium, and the init noise level to kLow.
This affects the denoiser level during the initialization
stage of the noise estimation.
Improves denoising for noisy content during init stage of
noise estimation, with little effect for low noise/clean content.
Change-Id: I247a17b0f01f646fc2e91a4a070ad69bdb788cae
In non-rd pickmode for screen content:
this logic to reset segment should only be for cyclic_refresh
mode on, so add that condition explicitly.
There may be other uses of segments, like ROI, so we
should condition this reset logic on cyclic_refresh,
as it was intended for that mode only.
Change-Id: I954e6cee968fbca35b34286c4a7ca2531c8e9823
For real-time CBR mode: golden reference mode testing is
skipped under certain conditions based on sse of zero-last mode.
This was done for svc mode. Here we add similar condition
for non-svc/1 layer encoding.
Reduces flashing block artifacts that can occur in background
areas with noise.
Change-Id: I93f71ea9507af8c9153fc6c0ba7dcc7a0fa8810d
For CBR mode: clamp the Q to worst/best quality in
adjust_q_cbr().
Under certain conditions, when the worst/best quality is
suddenly changed by a large amount mid-stream, the Q
adjustment from the final Q from adjust_q_cbr may not respect
the worst/best quality limits.
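The clamp amounts to the following. This is a minimal sketch with a hypothetical helper name; it assumes best quality is the lower Q bound and worst quality the upper one.

```c
#include <assert.h>

/* Restricts an adjusted Q to the configured [best_quality, worst_quality]
 * range, where a lower Q means better quality. */
int clamp_q(int q, int best_quality, int worst_quality) {
  assert(best_quality <= worst_quality);
  if (q < best_quality) return best_quality;
  if (q > worst_quality) return worst_quality;
  return q;
}
```

Applying such a clamp to the final adjusted Q keeps it inside the limits even when they were tightened mid-stream.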
Change-Id: I3776129325d89882d422b22e6247d44660dd90ac
When compiled for high bitdepth, SSE can overflow a 32-bit unsigned
integer, so change it to 64 bit. Also fix the uint/int mismatch of sum.
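To see why 32 bits are not enough: a 64x64 block of 12-bit samples can reach an SSE of 4096 * 4095^2 = 68,685,926,400, well above the 32-bit unsigned maximum of 4,294,967,295. The sketch below accumulates in 64 bits; the function name and signature are hypothetical and not the library's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates the sum of differences (signed) and the sum of squared
 * differences (unsigned) in 64-bit variables so neither can wrap for
 * high bitdepth input. */
void highbd_sse_sum_sketch(const uint16_t *src, const uint16_t *ref, size_t n,
                           uint64_t *sse, int64_t *sum) {
  size_t i;
  assert(src != NULL && ref != NULL && sse != NULL && sum != NULL);
  *sse = 0;
  *sum = 0;
  for (i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)src[i] - (int64_t)ref[i];
    *sum += diff;
    *sse += (uint64_t)(diff * diff);
  }
}
```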
BUG=webm:1601
Change-Id: Ib576ed1d5579b0c2b4661058aa64119560b652bf
Lower the frame_motion and consec_zeromv thresholds,
this makes the noise estimation and denoiser have more effect
on noisy clips.
Change-Id: I49cf5d78a04d00fcf8538bee6f3b2980efe6b3b5
~10% speed up with no quality change for speed 8.
7% quality gain for speed 9 with no speed change.
Change-Id: I7eaaa4b82f7b082c9b15aa1d7624765ecc5082e7
The visited flag is not set to 1 after an item is pushed into the heap.
This may cause one item to be pushed into the heap multiple
times, which may cause buffer overflow and memory corruption.
Change-Id: I443f1e5693856bb4066542403f98492d4daec69d
Add last_q[] to layer context, and add limit on
Q change from previous layer/frame. For now put
hard limit of 12 for decrease.
For 1 pass CBR screen content mode.
Change-Id: Ifb972c9b6831440c80b1cb07a054c577ece930ec
Also write it to opsnr.stt when internal stats is enabled.
Removed some redundant code in vpxenc.c and vp9cx_set_ref.c
Change-Id: I3700137fff0be92a23e4ab75713db72da1dc4076
Write height and width of top layer to ivf header in SVC.
vpxdec can't decode it correctly when output is y4m.
Change-Id: I9b2f1d54696611a30e252bdfd182897d191d92b5
Remove the block variance and skip flags from the input features. They
do not seem to reduce the average loss of the model. Also decrease the
number of hidden nodes. The model size is reduced significantly.
Compression quality and speed are both neutral.
Change-Id: Ic62f73c4f4c0a3148285f575747f0423ff568c64
VP9 encoder still inserts key frame periodically when VPX_KF_DISABLED is
set in non SVC for 1-pass CBR.
BUG=webm:1592
Change-Id: Ie99d7c5b95230d739e263a2d87879693c53f620e
Include the sizes of the above and left partition block as additional
features.
This affects speed 0 and 1.
Compression change is almost neutral (about 0.03% on average).
Average encoding speedup is 3~6% depending on QP and resolution.
Change-Id: I8bddfadf6072ae757c124da0819302850d8c6fe7
For drop due to large overshoot feature (in 1 pass CBR):
add additional condition that current prediction error
is larger than that of last encoded frame. This makes the
drop due to sudden overshoot more robust, and improves
rate convergence for steady hard content.
Change-Id: If20027d26b4dcd290e4f788ae8e2760d95b536a5
Instead of calling get_vpx_decoder_by_name(), derive
decoder interface directly.
This will avoid dependency on tools_common and hence any potential
updates needed to build fuzzer, when tools_common uses functions
defined in a different file
With this dependency removed, fuzzer no longer needs to enable examples
when building vpx_dec_fuzzer binaries
Change-Id: I05753edf041b4bc742a6dc06e809a8a2929d379f
For screen-content mode, with aq-mode=3: increase the
qp thresh for disabling the cyclic refresh.
Improves bitrate convergence for content that has been
static for long period.
Change-Id: Ica63a741402923a611ab1b86c0900f75d2d5f941
For non-rd pickmode: include H and V intra mode check for
spatially flat blocks when the sf->short_circuit_flat_blocks
speed feature is set.
Small improvement on screen content tests.
Change-Id: I3391d02cce6a46160be6ccc8a1e33fd8547eb467
When coding a GF only group it makes more sense to scan forward
from the GF to choose the boost level rather than backwards from
the end of the group towards the GF.
In practice we do not often code GF only groups in normal 2 pass
encodes and when we do the video is usually almost static which means
the direction does not matter much. However, a forward scan makes
more sense and is how things used to work before we started using
arfs most of the time.
Change-Id: I64a5a731ff579c8af86d8a6718830d426b16a755
When encoding at high bitrates, integer overflow
occurs in the calculation of bits allocated
for layered ARF frames.
Change-Id: I94ad9eea759367a222235a3b5d1c777578dc6ba9
The mv_cost contains mv_mode cost and mv_diff cost.
The mv_mode cost is inferred from default_inter_mode_probs.
The mv_diff cost is estimated using the log2 function.
Change-Id: I62702bdb5c3fec018e3302765f5dd749fceebc12
This changes the highbd version of temporal filter to use information from
both luma and chroma planes.
Performance:
AVG_PSNR | OVR_PSNR | SSIM
-0.144% | -0.165% | -0.150%
The performance is evaluated on lowres_bd10.
Change-Id: I89d1bd46cd60c26d658b6a53aa63835e90d8e291
For screen content mode: always force intra check
for spatially flat blocks that have moved. Also
adjust/fix condition for forcing check of
zeromv-golden for quality layers.
Reduces artifacts in screensharing tests.
Change-Id: Iafd62fb24a4e05f5b12af663dde2805fdb4c7b36
Modify early breakout condition for non-rd pickmode
for quality layers: when lower layer has lower QP force
test of zeromv on golden (lower layer reference) before
breakout due to skip.
Reduce artifacts, observed in cases of scrolling content.
Change-Id: Id834b1eb024a4c97f0e74d8b7f7a0351459e088f
In the ML based partition search speed feature, use MV result of
previous simple motion search as the starting point for the next one.
Compression change is neutral; encoding speed becomes slightly faster.
Change-Id: Iea554f28f7966fc5b5857e12b06de58e3fa312a6
This reverts commit a4d2f59b69.
Reason for revert: Re-enables SSE4_1 version of apply temporal filter now that the mismatch is fixed in fa540837aa.
Original change's description:
> Revert "Enable SSE4 version of apply temporal filter"
>
> This reverts commit 4f3cd48bfe.
>
> Reason for revert: Found a mismatch with c version
>
> Original change's description:
> > Enable SSE4 version of apply temporal filter
> >
> > Evaluating on 5 midres clips with 4 bitrates over 30 frames on speed 1
> > auto_alt_ref=1, there is a speed up of 1.660%.
> >
> > BUG=webm:1591
> >
> > Change-Id: Idbda58548679e6f7b8fc0d7f6144f7be057ef690
>
> TBR=yunqingwang@google.com,builds@webmproject.org,chiyotsai@google.com
>
> Change-Id: Ibca973576d72d6db4b647a08aef23389d5d6605a
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug: webm:1591
TBR=yunqingwang@google.com,builds@webmproject.org,chiyotsai@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: webm:1591
Change-Id: I26effdbaf4d52e4650c263b6ed9d3d80e505f5cb
This reverts commit 4f3cd48bfe.
Reason for revert: Found a mismatch with c version
Original change's description:
> Enable SSE4 version of apply temporal filter
>
> Evaluating on 5 midres clips with 4 bitrates over 30 frames on speed 1
> auto_alt_ref=1, there is a speed up of 1.660%.
>
> BUG=webm:1591
>
> Change-Id: Idbda58548679e6f7b8fc0d7f6144f7be057ef690
TBR=yunqingwang@google.com,builds@webmproject.org,chiyotsai@google.com
Change-Id: Ibca973576d72d6db4b647a08aef23389d5d6605a
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: webm:1591
Adjustments to the calculation and use of a noise estimate in
the first pass Q estimate and adaptation of temporal filtering.
This change was tested and gave gains for both auto-alt-ref=1
and auto-alt-ref=6 as follows:
Results are Av PSNR, Overall PSNR, SSIM and PSNR-HVS
auto-alt-ref=1
low_res 0.007, -0.042, -0.018, 0.074
mid_res -0.142, -0.239, -0.173, -0.129
hd_res -0.322, -0.405, -0.397, -0.367
NF_2K -0.058, -0.099, -0.201, 0.028
auto-alt-ref=6
low_res -0.058, -0.171, -0.188, -0.027
mid_res -0.149, -0.155, -0.171, -0.137
hd_res -0.252, -0.339, -0.259, -0.297
NF_2K -0.015, -0.068, -0.120, 0.092
In all sets there were some winners and losers but significantly
more winners. The biggest change was Stockholm in the
hd set with an improvement of 5-6%
Change-Id: Ieec71e1c4e3e09b76c288efa7b4d1b00015b3a11
Evaluating on 5 midres clips with 4 bitrates over 30 frames on speed 1
auto_alt_ref=1, there is a speed up of 1.660%.
BUG=webm:1591
Change-Id: Idbda58548679e6f7b8fc0d7f6144f7be057ef690
This adds a preliminary version of vp9_apply_temporal_filter in SSE4.1.
This patch merely adds the function and does not enable it yet.
Speed Up:
| ss_x=1 | ss_x=0 |
ss_y=1 | 19.80X | 19.04X |
ss_y=0 | 21.09X | 20.21X |
BUG=webm:1591
Change-Id: If590f1ccf1d0c6c3b47410541d54f2ce37d8305b
This function computes the rd cost for each mv_mode and returns the
one with minimum rd cost.
eval_mv_mode()
Evaluate the rd cost for a given mv_mode.
Change-Id: Ia1b3ec7e1dd538e443e1bc79f2cab352408cd0a0
This is useful for catching functions which should be static and
instances where the relevant rtcd file was not included.
BUG=webm:1584
Change-Id: Ied395847a664eedce59e8ed5180bd16d059ab0ac
Given an mv_mode, this function will return the corresponding mv.
find_ref_mv()
A helper function finds the nearest and near mvs from the neighbor
blocks.
select_mv_arr[]
An array used for storing selected motion vectors.
Change-Id: Ibeb434007f65b2c6e461360f208d99455e76bcbf
This function sets src and pre buffer of MACROBLOCK
and MACROBLOCKD.
Will add static decorator once this function is called.
Change-Id: I0fb46784dd97839e4d87c9e027fe8c59683e70d8
If WindowsTargetPlatformVersion is not set, the Visual Studio 15 (2017)
toolchain assumes that Windows 8.1 is being targeted. Since ARM64
support is only present and unlocked in Windows SDKs >= Windows 10 1809,
set that SDK as required in the vcxproj files.
Note that this will not be an issue in Visual Studio 16 or greater,
hence the -eq major version check.
https://developercommunity.visualstudio.com/content/problem/128836/windowstargetplatformversion-to-use-the-latest-ava.html
Bug: chromium:893460
Change-Id: Ib069501ad384d91349b1f635722dedd31a4edd97
This commit introduces a new speed feature that determines the
SEARCH_METHOD used by temporal filter when doing 16x16 block on
full_pixel_motion_search. On speed 0, the most exhaustive method MESH is
used. On speed 1 and above, a faster method NSTEP is used.
Performance:
| AVG_PSNR | AVG_SPDUP | AVG_SPDUP:AVG_PSNR
MIDRES | 0.007% | 2.818% | 402:1
HDRES | 0.004% | 4.897% | 1224:1
In the case of midres, there is a small quality gain of -0.021% on
OVR_PSNR.
Performance measurement is done on speed 1 with auto_alt_ref=1.
Quality is measured on full midres set over 60 frames. Speed is measured
on 5 midres clips over 4 bitrates over 30 frames.
STATS_CHANGED
Change-Id: Ic1879d2237f8734529e194767a6cf5e43e20b47b
The current unit tests for temporal filtering only test the single
channel version of temporal filter. Since VP9 currently uses both luma
and chroma channel information for temporal filtering on low bitdepth,
there is no unit test for this scenario.
This commit adds some basic unit tests to facilitate further development
on temporal filtering.
BUG=webm:1591
Change-Id: Id38ceba5305865d7148e9b2bc636acddae54d6c2
For svc with frame dropping in full_superframe_drop or
constrained dropped mode: the buffer level for a given layer
may be capped from increasing too much. This is because that layer
may be dropped even though its buffer is stable (the drop is forced
due to underflow in other layers in full/constrained svc-drop mode).
This capping is needed to prevent decrease in qp over consecutive
frame drops.
The capping already exists and has been used, but this change
introduced an error that prevented its usage:
https://chromium-review.googlesource.com/c/webm/libvpx/+/1330875
The fix here is to cap the bits_off_target as well, since after
the change mentioned above, it's the bits_off_target that is used to
update buffer on next frame (which in turn affects qp for next frame/layer).
Change-Id: Ifdab5d478e91cce20ecec51faa574eed375ee36b
Reduces the number of rows calculated for 2D 4-tap interpolation filter
from h+7 rows to h+3 rows.
Also fixes a bug in the avx2 function for 4-tap filters where the last
row is computed incorrectly.
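The row counts above follow from how a separable 2D filter works: the horizontal pass must produce every row that the N-tap vertical pass will read, which is h + N - 1 rows for an output of height h. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Number of intermediate rows the horizontal pass must produce so that a
 * vertical filter with `taps` taps can generate `h` output rows. */
int intermediate_rows(int h, int taps) {
  assert(h > 0 && taps > 0);
  return h + taps - 1;
}
```

For a 16-row block this gives 23 rows with an 8-tap filter and 19 rows with a 4-tap filter, so the 4-tap path skips four rows of horizontal filtering per block.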
Performance:
| Baseline | Result | Pct Gain |
bitdepth lo| 4.00 fps | 4.02 fps | 0.5% |
bitdepth 10| 1.90 fps | 1.91 fps | 0.5% |
The performance is evaluated on speed 1 on jets.y4m br 500 over 100
frames.
No BDBR loss is observed.
Change-Id: I90b0d4d697319b7bba599f03c5dc01abd85d13b1
After encoding key frame on base spatial layer,
if the overshoot is significant, reset the
avg_frame_qindex[INTER] on base spatial layer for
all temporal layers.
This forces the active_worst_quality to increase
on subsequent frames/layers and reduces frame dropping.
Change-Id: I53a3cd14131d69120e59a649b7ed1bfde3e940ee
Used 20-frame clips obtained from Deb in end-to-end unit tests to improve
the test coverage.
TODO: remove 10-frame clips.
Change-Id: I06ec2d35f5c5c47263d3be61623c80f52fd18ffe
When CONFIG_VP9_HIGHBITDEPTH is enabled,
lowbd modules were called in the hbd path.
This patch fixes the issue.
(cherry picked from commit 797ec1cd66)
BUG=webm:1589
Change-Id: I1caf701514dbf80eb75b953f40b1e7238f265a2c
ybf->buffer_alloc and ybf->buffer_alloc_sz should ideally be kept in
sync. If ybf->buffer_alloc is reset to NULL after being freed, then
ybf->buffer_alloc_sz should be reset to 0.
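The invariant can be sketched as below. The struct and function are hypothetical stand-ins that reuse only the two field names from the text; this is not the library's actual free routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the two fields discussed above. */
typedef struct {
  uint8_t *buffer_alloc;
  size_t buffer_alloc_sz;
} FrameBufferSketch;

/* Frees the allocation and resets pointer and size together, so a later
 * "is the existing buffer large enough?" check cannot trust a stale size. */
void frame_buffer_free_sketch(FrameBufferSketch *ybf) {
  assert(ybf != NULL);
  free(ybf->buffer_alloc);
  ybf->buffer_alloc = NULL;
  ybf->buffer_alloc_sz = 0;
}
```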
Change-Id: I7e7566b563ddf145d0e46050c5b6bd141084f8b3
In test/external_frame_buffer_test.cc, rename CheckXImageFrameBuffer()
to CheckXImageFrameBuffer().
Change-Id: Ifea3910445673be465d7536a69f85f1a2e2bce6e
this resolves some msan errors.
the same change was done in libaom:
5ab58722c Add missing initializations of HBD buffers
Change-Id: I8882af45b95c90ba43bf138c7d305a6c3b99e61c
When CONFIG_VP9_HIGHBITDEPTH is enabled,
lowbd modules were called in the hbd path.
This patch fixes the issue.
Change-Id: I59820180fbed120697b6ef1fc1a02be0d35ac1d5
Instead of creating a new decoder instance when restarting all threads
after they were shut down, re-create threads on the new flag.
BUG=webm:1577
(cherry picked from commit 7be8d2df6c)
Change-Id: I80211d47e8d4beaa361416b58e99dd65d8da39c4
Instead of creating a new decoder instance when restarting all threads
after they were shut down, re-create threads on the new flag.
BUG=webm:1577
Change-Id: I6272ecaa1b586afdaa5ed8d6eab80aff8f5eb673
vp8_norm table has 256 elements while index to it can be higher on
fuzzed data. Typecasting it to unsigned char will ensure valid range and
will trigger proper error later. Also declaring "shift" as unsigned char to
avoid UB sanitizer warning
BUG=b/122373286,b/122373822,b/122371119
Change-Id: I3cef1d07f107f061b1504976a405fa0865afe9f5
Values in [q]coeff1 were not correctly stored. This caused a segfault
in the sse2 libvpx__nightly_optimization jobs.
Broken in:
commit 85032bac38
Author: Johann <johannkoenig@google.com>
Date: Fri Dec 21 00:27:00 2018 +0000
fdct_quant: resolve missing declarations
BUG=webm:1584
Change-Id: I5f5fad34ec5e32023f5b40ff3691125754c11ced
Before this patch, if mi_col_end was odd, then the for loop for 'mb_col'
was looping once LESS than it should have been.
For example, if mi_col_end = 47, then the loop was terminating when
mb_col == 23. However, the correct behavior would be to terminate when
mb_col == 24.
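The off-by-one can be reproduced with the sketch below. It assumes the loop bounds were obtained by halving mi_col_start and mi_col_end, which is an illustration of the rounding problem rather than the exact code that was changed; both function names are hypothetical.

```c
#include <assert.h>

/* Old behavior (assumed form): the end bound is truncated, so an odd
 * mi_col_end loses the last macroblock column. */
int mb_col_count_truncating(int mi_col_start, int mi_col_end) {
  int n = 0;
  int mb_col;
  assert(mi_col_start >= 0 && mi_col_start <= mi_col_end);
  for (mb_col = mi_col_start >> 1; mb_col < (mi_col_end >> 1); ++mb_col) ++n;
  return n;
}

/* Fixed behavior (assumed form): the end bound is rounded up. */
int mb_col_count_rounding_up(int mi_col_start, int mi_col_end) {
  int n = 0;
  int mb_col;
  assert(mi_col_start >= 0 && mi_col_start <= mi_col_end);
  for (mb_col = mi_col_start >> 1; mb_col < ((mi_col_end + 1) >> 1); ++mb_col)
    ++n;
  return n;
}
```

With mi_col_end = 47 the truncating version stops at mb_col == 23 and the rounding version stops at mb_col == 24; for even values the two agree.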
The issue was introduced in:
https://chromium-review.googlesource.com/c/webm/libvpx/+/423279
This can lead to many of the stats being inaccurate, for such videos
(with mi_col_start/end having an odd value).
As an example:
Even for very static content, fp_acc_data->intercount can never reach the
same value as num_mbs. And in turn, pcnt_inter can never reach the value 1
(that is, 100%). This would lead to very static videos NOT being marked
static, and encoded like regular videos.
Note: this is just one possible effect based on observation. Other
issues are also possible based on other stats.
Improvement on some test clips:
-------------------------------
- One test clip saw a gain of -2.580% in VBR mode (and -3.153% in Q
mode). The reason for improvement: a wrongly detected scene cut was
avoided due to corrected value in 'this_frame->pcnt_inter'.
- Some very static clips correctly marked as having 100% zero motion.
This avoided addition of unnecessary alt-refs, thereby reducing the
bitrate.
BDRate (PSNR) on regular sets (VBR mode):
-----------------------------------------
lowres: 0.0
midres: -0.027 (some clips were better/worse, but I double checked that
changes were as expected, given correction in stats calculation).
hdres: 0.0
STATS_CHANGED for the types of videos described above.
Change-Id: Ifbc2c0c0815d23ec4015475680bdf8886f158dcc
Add full_pixel_exhaustive_new() and exhuastive_mesh_search_new().
The two functions are variants from full_pixel_exhaustive() and
exhuastive_mesh_search().
In the new versions, we use mv inconsistency in place of
mv entropy cost.
Change-Id: Icec98e6fae24f2771806a3e78276734624ec0303
The smallest block size of motion field is 4x4, but the mi_unit
size is 8x8, therefore the number of units of motion field is
"mi_rows * mi_cols * 4".
Change-Id: I95292904d757705d39b78af5d0cf2d25f376c642
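For the motion field sizing described in the preceding change, each 8x8 mi unit contains 2x2 blocks of 4x4, so the unit count is four per mi unit. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Number of 4x4 motion field units for a frame of mi_rows x mi_cols
 * 8x8 mode-info units. */
int motion_field_unit_count(int mi_rows, int mi_cols) {
  assert(mi_rows >= 0 && mi_cols >= 0);
  return mi_rows * mi_cols * 4;
}
```

For example, a frame of 135 x 240 mi units would need 129,600 motion field entries.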
Use variable block sizes in temporal filtering. Based on prediction
errors of 32x32 or 16x16 blocks, choose the block size adaptively.
This improves the coding performance, especially for HD resolutions.
Speed 1 borg test result:
         avg_psnr:  ovr_psnr:  ssim:
lowres:  -0.090     -0.075     -0.112
midres:  -0.120     -0.107     -0.168
hdres:   -0.506     -0.512     -0.547
Change-Id: I8f774e29ecb2e0dd372b32b60c32d8fa30c013a8
This reverts commit 02b3ef7fae.
Reason for revert: fails to build under visual studio
Original change's description:
> Add Tile-SB-Row based Multi-threading in Decoder
>
> Add the multi-thread function that decodes a video row by row instead
> of a tile at a time. Create a job queue for queueing all parse and recon jobs.
> Each SB row of a tile is a job.
>
> Performance Improvement:
>
> Platform  Resolution  3 Threads  4 Threads
> ARM       720p        36.81%     18.37%
>           1080p       32.27%     14.76%
>
> ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
>
> Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
BUG=webm:1587
Change-Id: Ia4c8f5128922a205cd9fd83aaef8a2e73764d4a7
This CL allows limiting the memory consumption of the frame buffer pool. As
a result, if compiled with VPX_MAX_ALLOCABLE_MEMORY set, the codec will fail
if the frame resolution requires more memory.
This is backported CL aae2183cb58b60d01b8e4e15269ee9f48dd72908 from
aomedia
Tested:
configure --extra-cflags="-DVPX_MAX_ALLOCABLE_MEMORY=536870912"
make
./test_libvpx
BUG=webm:1579
Change-Id: Ic62213b600a7562917d5a339a344ad8db4b6f481
vpx_asm_stubs.c only references these sse2 functions. Combine the files
similar to the way the ssse3/avx2 files are set up.
Mark the intrinsics as static because they are only used within the
macros here. It is unfortunate that the assembly functions can not be
marked static as well.
BUG=webm:1584
Change-Id: I342687a1046ae6ca46ae58644a7c170440de1dfb
The optimizations were accidentally disabled during the move from vp9
commit c3bdffb0a5
author Johann <johannkoenig@google.com> Fri May 15 18:52:03 2015
Move variance functions to vpx_dsp
subpel functions will be moved in another patch.
BUG=webm:1584
Change-Id: Ia7899ee0cfad13a0e1516b89756552064846e81c
this implementation does not scale well beyond that. this restores the
performance in v1.7.0.
BUG=webm:1574
Change-Id: I8f3464cfe871988fa06ebefe9954811fd002584e
The special case was put in to prevent a lossless test failure, the
issue has been dealt with by a recent fix of skip condition in
lossless mode.
Change-Id: Ia25d2bf6beead2208841b4f012171dffac15f411
most predate 1.4.0; the DBG enums were deprecated in 1.6.1. VPX_KF_FIXED
is left as it's still fairly widely used
BUG=webm:1573
Change-Id: Iacaad28a6fe7251f042a2b45507b00fc5b7a0eac
We only need to shift in the encoder when the input bit depth
does not equal the encoder internal bit depth.
Change-Id: If9af62382ac6824f33dc7dcdd3d3ff7802b92e9a
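A minimal sketch of the condition described above. The helper name input_shift() and its parameters are illustrative only and do not come from the encoder source. It assumes the input bit depth never exceeds the internal bit depth, so the shift is their difference and is zero when they match.

```c
#include <assert.h>

/* Hypothetical helper: left shift to apply to input samples so they match
 * the encoder's internal bit depth. Returns 0 when no shift is needed.
 * Assumption: input_bit_depth <= internal_bit_depth. */
static int input_shift(int input_bit_depth, int internal_bit_depth) {
  assert(input_bit_depth > 0);
  assert(input_bit_depth <= internal_bit_depth);
  return internal_bit_depth - input_bit_depth;
}
```

With 8-bit input and a 10-bit internal depth the shift is 2; with matching depths it is 0 and the shifting step can be skipped.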
Use same step_param for all spatial layers for now.
Some improvement in quality on scrolling for spatial
enhancement layer.
Change-Id: Ic9eed8ba5dd44493e9f5e81f6115df2a25825d16
Add the multi-thread function that decodes a video row by row instead
of a tile at a time. Create a job queue for queueing all parse and recon jobs.
Each SB row of a tile is a job.
Performance Improvement:
Platform  Resolution  3 Threads  4 Threads
ARM       720p        36.81%     18.37%
          1080p       32.27%     14.76%
ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
Move it to deeper stages where all the encoder configurations have
been set. This avoids the encoding failure when the buffer is
allocated before the encoder is fully configured.
Change-Id: I6723966fd2c7c36fbab9a92d1f3bd59c83ed95f0
Remove the "spatial_layer_id == 0" condition in
the speed features for setting the motion search
for screen content.
Change-Id: Ib47aea3af5f3b2e04226694b4126b2ae2f458f13
The breakout speed feature is currently only used by the non-rd
mode search path. Localize it to simplify set_offset() logic.
Change-Id: I27e7519c987a7caac2e4bd6be0ede1b9c8320e55
For non-base spatial layer in screen-content mode:
use nstep but with larger step_param value than sl0,
to avoid increase in encode_time.
Some improvement on scrolling slides content.
Change-Id: Ica918ac01664431d1fabb3c674d857cf6ad87414
Define the rc->high_num_blocks_with_motion, set in the
scene change analysis, to be defined per superframe.
This is used for increasing motion search area on
some (super)frames, e.g., for scrolling.
Also some code cleanup in rt_speed_feature_.
No change in behavior.
Change-Id: I1a5c04b9cd4aef1723ce42f82e981a2ca15c8b9d
Similar issue to 842265.
The pointer in vp8 postproc refers to show_frame_mi which is only
updated on show frame. However, when there is a no-show frame which also
changes the size (thus new frame buffers allocated), show_frame_mi is
not updated with new frame buffer memory.
Change the pointer in postproc to mi which is always updated.
BUG=913246
Change-Id: I5159ba7134a06db472c29a1d84b8d39bb60c7254
Increase the initial frame number threshold
for the mfqe, as using the running average of
last_base_qindex doesn't work well after very
first frame.
Only affects the very first few frames.
Fixes an issue with a test.
Change-Id: Ia249924257b44263e0b9f43cbff473902f08e28c
On scene/slide change detected on TL > 0 frame, only
reset the temporal layer pattern for flexible/bypass mode.
Change-Id: Ib848778addc10ef6981b92839af397833fd4a908
The control has been exposed to the vpxenc input parameter. Remove
the internal hard coded control that disables it at speed 1 and
above settings.
Change-Id: Ib17772cb895f24da5a7d0487e748cc1a9c6740b3
Remove the unused *_DEBUG_* enum values in vpx/vp8.h
This fixes issue with enabling MFQE, which was
caused in 4807f15, where the unused DEBUG flags
were removed from common/ppflags.h but not in vp8.h.
BUG=913246
Change-Id: I47f114ef20adc084cb4883add5ac3ebf58ae9f1d
Undamped adjustment is used for the first frame
of each frame type while updating the rate
correction factors.
Change-Id: I42f80daa123c4cd4e45c18c6960cc7a67e7df7e6
make the parameter constant to match the base class and mark the
function virtual. virtual is used to match the rest of the code base,
but now that c++11 is required all such functions could be changed to
override.
since:
bb3a82ec3 vp9 svc: add test for scaling partition on 1080p crash.
Change-Id: I4717f0116a231ea954b34da9cfec69c462c21699
Put test classes into svc_test namespace.
Make num_nonref_frames_ and mismatched_nframes private, as they're
computed by encoder/decoder hooks which shouldn't be modified outside
the class.
Add accessor to num_nonref_frames_.
Change-Id: I3836a45426796ba6a8c98dd31e21b5aec4b8abf4
make the parameter constant to match the base class and mark the
function virtual. virtual is used to match the rest of the code base,
but now that c++11 is required all such functions could be changed to
override.
Change-Id: I551a05bbd9d05a9eddb653f42eaad68880c88141
Search method and step parameter might be changed in speed settings.
In this case, we should update the search area offset due to the change
of search method.
Change-Id: I51dc584bbf35e998757da326355dd4b8a4d0093f
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr
c++11 has been required so <tuple> is safe to use
Change-Id: I873cb953104b361a8503b5839a3372ce2b99e73c
googletest builds cleanly with -Wextra
Remove comments about webm:1069. The vp8 issue is tracked in webm:1246.
Change-Id: I8bbb01d34503cc9c342f5c3aa78e9476f72b94c2
There was no setjmp in place for vpx_internal_error when there is no
available frame buffer, so ready_for_new_data was not reset to 1.
BUG=webm:1571
Change-Id: I4f8efffb7d6fed3085b1f0229d0d1071a056b6c6
In single_motion_search, we use prev coded nb full mvs to compute
mv inconsistency.
lambda is set to block_area / 4.
This is a draft. Will do experiments to figure out the impact on
coding efficiency and visual quality.
Change-Id: Id10f72b3c7e6085bfbe1a6156b9fd6917843d001
Change the cross over point for switching between per pixel
and per block variance numbers when comparing reconstruction
and source complexity.
This improves the accuracy of the comparison for low variance
regions. For example, recon and source may both have an integer per
pixel variance of 1, but one of these may actually be 1.01 and the
other 1.99.
The reason for using per pixel at all was because this number is already
available for the source block so does not need to be recomputed
here. Changing the threshold from >0 to >100 for using per pixel values
will thus cause a little extra work for some blocks.
With my default runs on derf and nf sets there is a net gain as follows:
(-ve = better; Overall PSNR, SSIM, PSNR-HVS)
derf low res  -0.106, -0.107, -0.093
midres        -0.000, -0.021,  0.001
hd res        -0.198, -0.190, -0.282
nf2k          -0.090, -0.088, -0.077
Change-Id: I53ef514fe1c35ee3f08c64e9b22fc05fc7fe5887
This commit introduces the optimized RDMult values for both key
and non-key frames. For key frames, the commit gets values back
from commit#b13f6154df9c0834d74f7e3d41e41c4208f56d18. For impact
on key frame only encodings, see commit message for that commit.
For inter frames, the values get optimized by running encoding tests
in Q mode with the following range using 150 frames:
2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62
The impact of current set of RDMULT values:
         PSNR     SSIM    PSNR-HVS
lowres:  -0.325%  0.422%  -0.228%
midres:  -0.377%  0.158%  -0.376%
hdres:   -0.309%  0.522%  -0.322%
Test baseline is on commit#35617458
Overall, the values help PSNR based metrics, but hurt SSIM metric
slightly.
Change-Id: I7eba37a6524cb36b8498a1d104d2667781bc2089
since:
77fa51003 Replace deprecated scoped_ptr with unique_ptr
the unit tests require a c++11 capable compiler; future versions of
googletest (1.9.x) will as well, so this change was inevitable if we
wanted to keep the snapshot up to date.
Change-Id: Id5c646bd10fae09e7b705b7d5fad1344f2216282
This will allocate extra frame buffer if long term temporal reference is
used and denoiser is enabled on non-key frame.
Add test.
Change-Id: I0e8d1fdb9a2d697a8eed7fe6206bcb362e69f1c8
In first pass, scaled_low_intra_thresh should not be
compared with motion_error, as scaled_low_intra_thresh
accounts for bit-depth, whereas motion_error does not.
In addition, mv_cost is excluded for comparison.
Change-Id: Id2fa02d364c086876c71ffebb2dd763eaa647e4a
Postencode drop is only checked on base spatial
layers, and if set, the whole superframe is dropped and
the next superframe is encoded at max-q.
Fix here is to make sure all layers are encoded at
max-q on a postencode dropped frame.
Change-Id: I2313d83ee29a382465bcec1085d8c73c37ce26d6
If scene/slide change is detected on current
superframe and max-q set because of high overshoot:
then if the lower/base spatial layer are skipped on
the current superframe, max-q is forced on the
next encoded base/lower spatial layers.
Change-Id: Id61efda86ee545395012e19476d19845e3932678
This fixes an issue where, in very rare error cases, one row of LPF
could be waiting infinitely for its previous row's LPF to complete.
With LPF optimization, the second row's LPF could be triggered before
the first row's LPF. In this case, the second row's LPF will wait for
LPF of n-sync number of SBs of the first row to finish. In error
streams, depending on when the error was detected, the LPF job of the
first row may then never be triggered. This puts the thread doing the
second row's LPF in an infinite wait.
The issue is reproducible once in approximately 500 runs of the clip in
bug 1562.
BUG=webm:1562
Change-Id: I265d7df5ceeff0410334f5b9a4181f895bb54cab
Add functions that will do only parse or only recon. These are
duplicated and modified from decode_partition and decode_block.
Change-Id: I2201e235bf491e823ae63d27b2586bbb43b48929
This is still a work in progress.
nb_full_mvs and lambda are set to zero for now, which means
mv inconsistency penalty is zero while doing the mv search.
Change-Id: I18680413d748fbdb9a33621f92f83e021036a3ab
Avoid passing in tpl_stats because this function will be called in
motion search, where tpl_stats should be fixed at that point.
Make further_steps an internal variable in the function.
Change-Id: Iebe380925eb1891c19e0b78163dab8e6bfafccdb
Fix condition for turning off denoiser due to high
motion: use proper superframe counter and
frames_since_key counter so this condition won't
take effect on key (super)frame.
Change-Id: Ic502bf5ebfa32a921f611a78e8e963eb62b5bc79
Last reference doesn't always exist when SVC layers changed dynamically.
When last is not a reference for current layer, copy block directly on
denoiser.
Change-Id: I9d98c4d6fdcfa25ba707db3333712761b5cf9ab8
Calculate the high bits of dqcoeff and store them appropriately in high
bit depth builds.
Low bit depth builds still do not pass. C truncates the results after
division. X86 only supports packing with saturation at this step.
BUG=webm:1448
Change-Id: Ic80def575136c7ca37edf18d21e26925b475da98
Calculate the high bits of dqcoeff in high bit depth builds and store
them appropriately.
BUG=webm:1448
Change-Id: I61a2f8bfcf2e30765f10a94073c4d58321d2fa24
Add check and reset (turn off) usage of long term
reference if some conditions (layer id of reference vs
current frame) are not met.
Change-Id: Ie3a84e3618f4fc4d5f8da4e67316cfbefb8bae78
This is a reland of 7d777ce613
Previous attempt was reverted due to build issues with older
versions of Visual Studio. We no longer support VS <= 13.
Original change's description:
> third_party/googletest: update to v1.8.1
>
> BUG=webm:1559
>
> Change-Id: I7a0b16c7bf3f97db2d8650a190b93aae7e12a948
Bug: webm:1559
Change-Id: I9cb39988286cc56125879222ef0bd952d61b7c1d
If the scale factor of the golden long term reference
is different from the last reference then disable usage
of long term reference.
This should not happen, but add this as a check against
some possibly incorrect update of the svc configuration.
Change-Id: Ic1062d4384e005007d8c922813fa8ad188d8fa98
When scaling up partition from lower resolution layer L, mi_row and
mi_col from L must be smaller than mi_rows and mi_cols from L.
Before this change, the condition was based on mi_rows from top layer
divided by 2, which is not necessarily equal to the mi_rows from lower
resolution layer.
Added variable in SVC structure to keep track of mi_rows and mi_cols
from each spatial layer.
Re-enable partition scaling for 1080p.
BUG=webm:1578
Change-Id: Icc1c701b095cfe0a92bfecca1ed39dbe21da12b6
If in constrained layer drop mode, avoid setting
skip flag if base layer is dropped, as whole superframe
will be dropped in this case. This avoids an assert trigger
in the svc superframe packing.
Change-Id: I51c953c7fee979790c65c798bac9bd3d805dc66f
In the limited test set, it improves the cq mode compression
performance by 1.9% in PSNR and 6% in SSIM compared to using the
same quantization parameter for all ARFs.
Change-Id: I35c4d7097b5838ab0b92d7f9937520721e3bb84b
Increase to 4 (from 2) on slide/scroll changes,
as there is an issue/failure with the current setting
with offline encode for high resolutions.
Change-Id: I8f06c6bdcd59013ab000d75bd75770c667bf70d2
Reuse existing function for resetting temporal
layer pattern.
And fix to use first spatial layer to encode, and
some refactoring in encode_without_recode_loop().
Change-Id: Ifb22bb9de793ecb8e73f410e125c7c12383da1d2
vpx_realloc_frame_buffer() should validate the |border| parameter
earlier, before it allocates the buffer and preferably before it uses
|border|.
This backports libaom commit 2860b3ae8b764bdfa2b8c7a06df2673e907b993f:
https://aomedia-review.googlesource.com/c/aom/+/74324
Change-Id: Ib9d59d74e27430ccb1e83c6ad5424aff9672c989
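The validate-first ordering described above might look roughly like the sketch below. The function realloc_frame_buffer_sketch() is a hypothetical stand-in, not vpx_realloc_frame_buffer() itself, and the rule that the border must be a non-negative multiple of 32 is an assumption made for illustration. The point is only that the check runs before |border| feeds into any size computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a frame-buffer (re)allocation routine.
 * Returns 0 on success and -1 on invalid parameters. On failure the
 * value pointed to by alloc_size is left untouched. */
static int realloc_frame_buffer_sketch(int width, int height, int border,
                                       size_t *alloc_size) {
  assert(alloc_size != NULL);
  /* Validate border first, before it is used anywhere.
   * Assumption: border must be a non-negative multiple of 32. */
  if (border < 0 || (border & 0x1f)) return -1;
  if (width <= 0 || height <= 0) return -1;

  /* Only now is border used to derive the padded plane size. */
  const size_t padded_width = (size_t)width + 2 * (size_t)border;
  const size_t padded_height = (size_t)height + 2 * (size_t)border;
  *alloc_size = padded_width * padded_height;
  return 0;
}
```

A 64x64 plane with a border of 32 yields a 128x128 padded plane, while a border of 33 is rejected before any size is computed.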
Condition the pre-encode buffer update based on
TS diff on temporal layers = 1 for now, as some
fix is needed for the case where #temporal_layers > 1.
Change-Id: I58163b956db415217e4687a31f8ba110545b09f5
Since the Windows SDK has an ARM32-only arm_neon.h, files including it
during ARM64 Windows builds need to be redirected to arm64_neon.h.
Instead of editing many files to include ARM64-Windows-specific ifdef
logic, this commit introduces an ARM64-Windows-specific version of
arm_neon.h that performs the needed redirection and lands earlier in
the header search path than the ARM32-only arm_neon.h.
Change-Id: Idc63947a238ca1bd0c479d8f4ad68950487947c6
Refactor the code with some changes.
Split update into two parts: move the fillup
(with per-frame-bandwidth) before the encoding, and
keep the leaking part (with encoded_frame_size) after
the encoding (postencode).
For SVC with ref_frame_config usage: allow usage of timestamp
delta for the fillup part of buffer, instead of the (average)
framerate passed in via the duration.
Moving the buffer fillup (+per-frame-bandwidth) part to the
pre-encode causes some difference in performance
(since buffer level affects active_worst/QP and frame-dropping),
but the change is observed to be small.
Made small adjustment to active_worst_quality to compensate.
Adjust some thresholds in datarate tests.
Change-Id: I81a5562367034f318cffd451304bc4a34bf02a1d
Windows builds can use msbuild.exe to build libvpx through a set of
generated Visual Studio project files. This commit adds awareness of
ARM64 Windows to this process by adding ARM64 configurations and
setting msbuild properties to consume the right SDK version.
Change-Id: I1bbc01cbe7be3d53c4e1af6cd96c6e4170aa4915
In order to correctly configure for Windows 10 on ARM, this change adds
a --target value arm64-win64-vs15 to ./configure and adds feature
enable/disable logic for the new platform.
This is merely sufficient for Chromium targeting ARM64 Windows.
Bug: 893460
Change-Id: I46194286f63104bdf6ac57d719fdf1e5d5fa72c8
The tpl model assumes a relatively short stats buffer length. Hence
it is not ready to support GF-only GOP structure where the max
length can go up to 250. Disable tpl model in such setting to avoid
a rare encode failure in GF-only setting.
Change-Id: I3409dbb829a8105478876684ec21a2bd405c33c8
As thread count is now randomized, serial and threaded modes can be
combined to a single binary.
With this change, the thread count takes values between 1 and 64, and
tests both single-thread and multi-thread variants of the decoders.
Change-Id: I6dd2a3aa03bff9c0e2c126843b543d46892be696
Rework the recursive ARF allocation to avoid missing one frame's
type assignment issue in GF only GOP structure. This fixes a rare
encoder failure issue in GF only setting.
Change-Id: I3e41fe36d3cb954de25ffc058a42b2b8be0fcd7a
vpx_dec_fuzzer.cc can be built with clang++ to generate fuzzer binary
Build instructions are part of the file
Change-Id: I19ba0bd49b236e27b27e81a83f6de59f15bdc994
To compute the total budget for a depth layer, exclude the count of
frames that have been allocated the bit budget. This improves the
avg PSNR by 0.15% and overall PSNR by 0.25% for lowres and midres
test sets.
Change-Id: I5115e33e1422dc930179142cd29aeebe97425283
Simplify max value calculation on aarch64 by using vmaxv. Much
faster for 4x4 but diminishing returns as the block size grows.
Only the vp9 quantize has a speed test hooked up. Anticipate
similar results for the other quantize versions.
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
Change-Id: Ic95812b3fdbd4e47b4dcb8ed46c68a9617de38d2
Recognizing that max dc_quant used in rdmult computation is 21387 and
21387 * 21387 * 88 / 24 is still within the range of int32_t, this
commit simplifies the computation with minor cleanups.
Change-Id: I2ac7e8315d103c0bb39b70c312c87c0fda47b4f9
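The bound quoted above can be checked by hand: 21387 * 21387 = 457403769, times 88 is 40251531672, and dividing by 24 gives 1677147153, which is below INT32_MAX (2147483647). Note that the product before the division does not fit in 32 bits, so the sketch below evaluates the expression with 64-bit intermediates. That choice is an assumption of the sketch; the commit message does not say how the simplified code orders the operations. rdmult_from_dc_quant() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the libvpx function: evaluates
 * dc_quant * dc_quant * 88 / 24 using 64-bit intermediates (assumption)
 * so that the product cannot overflow before the division. */
static int64_t rdmult_from_dc_quant(int64_t dc_quant) {
  assert(dc_quant >= 0 && dc_quant <= 21387);
  return dc_quant * dc_quant * 88 / 24;
}
```

For the maximum dc_quant of 21387 the result fits in an int32_t, so it can be narrowed safely after the computation.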
Refactor define_gf_group_structure() to unify the single-layer,
multi-layer ARF, and GF only GOP structure setup.
Change-Id: Iebbe9c3742fc58ae4e77b1072ebecb3ee7bd26b2
Encode the next frame at max q.
For layers: post_encode_drop is only checked on base
spatial layer, and if base is post-encoded-dropped,
then whole superframe is dropped.
Added API to guard postencode dropping. Turned off by default.
Added unittest.
BUG=b/112990050
Change-Id: I42fee279014aca616f7a4d9b582cb2bf5da2f2e7
Increase search area, use NSTEP, and in some cases avoid
bsize below 16x16. This for base spatial layer when many blocks
in the frame have motion (from scene detection analysis).
Improves quality for scrolling motion.
Change-Id: If77b43e738a6c43610d4727a95712667088db564
Address poor key frame detection in some content.
This patch improves on poor key frame / scene cut detection observed
with some test content. The content in question was letter boxed film
style material and also had quite low contrast. For both 1080P and 4K
multiple genuine scene cuts were being missed.
The changes alter the conditions for marking a transition as a "flash" rather
than a scene change. The new code still deals well with genuine flashes as
observed in the "crew" test clip, without falsely flagging some of
the scene cuts in the "film" test clip.
The new film test clip also had some "flash" frames caused by a lightning
effect and in one case a flash occurred right before a scene change. This
caused a misplacement of the key frame but has been addressed by a new
clause that requires the coded error for the next frame after a candidate
key frame to be lower than the current frame.
The patch also changes the way in which neutral blocks (similar intra and
inter error) are handled in the candidate key frame decision in a way which
hopefully handles the letter boxed format better.
During wider testing some film clips still had missed key frames but this
patch does improve things. In the case of the initial test clip the encoder
correctly marks all 3 scene cuts vs 0 before the patch.
Testing with our standard (mainly short single kf) derf and NF test clips
is neutral.
Change-Id: I3b7dcfe7b2fb13fd0816ea46acc3e69c8bc581b3
When ref frame is INTRA_FRAME, pre buffer shouldn't be used.
This CL copies behavior in single thread. That should apply to
multithreading case too.
BUG=webm:1496
Change-Id: Ibe9ab8ea9dc664151fa7ebac529d5fd1a481b4a3
Track the effective maximum layer depth in a given group of
pictures. Keep it in the GF_GROUP data structure.
Change-Id: If777c4e0f4a871c7226a91e3871f445e92f18b24
Make it a standalone operation unit. Refactor to cut off unnecessary
dependency between define_gf_group_structure() and
allocate_gf_group_bits().
Change-Id: I954fd4e96152471a994f2ffd38a72061ab517ddd
Found with clang-tidy. This value is unused in libvpx.
There is an existing test which ensures this is not used:
test/encode_api_test.cc:
EXPECT_EQ(VPX_CODEC_INVALID_PARAM,
vpx_codec_enc_config_default(kCodecs[i], &cfg, 1));
Change-Id: I94bd0663c6652b4267204c02c3921972c854d0b0
This function specifically only aligns the stride and not the base buffer
like vpx_img_alloc does.
BUG=webm:1444
Change-Id: I3092827eeec3c9e16306a3973534d3a362a337e8
Make the maximum layer depth allowed a control parameter in
GF_GROUP. No coding stats would change.
Change-Id: I9d17167da322831e7013d761980e1c16375a161b
For 1 pass cbr encoding mode, with frame-dropping on:
increase the rate correction threshold for drop-overshoot detection,
to better capture cases of large overshoot.
Change-Id: I1153b1b71cf106749dd985074d6bc8f37d163c7e
Always use src/ref and _ptr/_stride suffixes.
Normalize to [xy]_offset and second_pred.
Drop some stray source/recon_strides.
BUG=webm:1444
Change-Id: I32362a50988eb84464ab78686348610ea40e5c80
Can't call internal error from the decoder thread.
Add vpx_internal_error_info to MACROBLOCKD. When a corrupted frame is
detected, the decoder thread returns to its own context and signals
completion of decoding for the current frame.
The main decoding thread will detect the error too and return an error
code to the decoding API call.
Each thread will signal the end of decoding of the frame. The main thread
waits for the signal from all other threads before decoding the next frame.
BUG=875626,webm:1496
Change-Id: Icd05fbc558893a4e7d8532c1e7177e7550283a64
Space the quantization parameter distribution according to the
layer depth for multi-layer ARF coding structure. This allows
lower layers to have relatively smaller quantization parameters
than higher layers. It improves the compression performance
in constant q mode for multi-layer ARF system:
        avg PSNR  overall PSNR  SSIM
lowres  -0.33%    -0.31%        -1.44%
midres  -0.29%    -0.38%        -1.14%
hdres   -0.27%    -0.49%        -1.02%
Change-Id: I9cfe2f27e6c0029c30614970a46de3045840264e
The 16x16 array was changed to aligned. The 8xN and 4x4 functions
use aligned loads/stores on their internal arrays as well.
BUG=webm:1570
Change-Id: I9cfe53d7c8ed76e8854c2688eb9a509b876471d8
Marginally faster. Most importantly it drops a dependency on an
external symbol (vp8_bilinear_filters_x86_8).
Change-Id: Iff022e718720f1f0eeced6201a1ad69a9c9c4f45
Row based multi-thread needs extra memory to store the parsed
co-efficients, partitions and eob. This commit adds memory for the same.
Change-Id: I13fa4a6ada2ec3048bc973e465055b832429388f
Performance:
     | 4X4 | 8X8 |16X16|64X64|
2 DIM|1.491|1.902|1.772|1.479|
 HORZ|1.145|1.521|1.757|1.497|
 VERT|1.176|1.614|1.707|1.467|
Each number in the chart above is 8-tap function time / 4-tap function time.
The framerate tested on jets.y4m for 100 frames on speed 1 increased from 3.72
fps to 3.91 fps (about 5% increase).
Change-Id: Ic0ad275cf32fafeefd0a89811badd8adff2134a0
After the frame quantizer estimate run in tpl model, reset the
actual value assigned to the current coding frame. This would
avoid certain frame update flags being overwritten by different
frame types' update.
Change-Id: Idde2ba1108f1f68747b14149b211f882965c99f0
Used 8-tap interp filter in temporal filtering to achieve more accurate
motion search results. Using 8-tap sharp gave slightly better results than
using 8-tap regular.
Speed 0 borg test showed that
         avg_psnr:  ovr_psnr:  ssim:
hdres:   -0.160     -0.157     -0.173
midres:  -0.083     -0.061     -0.183
lowres:  -0.077     -0.099     -0.204
Speed test didn't see noticeable encoder time changes.
Change-Id: I97dc3c4864b5a5675a6c1e3952799b81eedd7d93
The use of show existing frame requires no further operation on
that coding frame. Bypass the corresponding process.
Change-Id: Ia092027a8a543be0ca54c00b4d51e453039712b8
When the ML_VAR_PARTITION experiment is turned on, replace
REFERENCE_PARTITION with ML_BASED_PARTITION at speed 5.
Coding gains(avg_psnr) compared to baseline:
ytlivehr 1.63%
ytlivelr 0.07%
Tested encoding speed with several clips from ytlivehr and ytlivelr
on linux desktop(rt, vbr, 4 threads). Encoder speed is on average
faster than baseline:
360p: 14% faster
720p: 7% faster
1080p: 1.5% faster
Change-Id: I39b00078176ff516f7306818f33ba2b1ea53dfa1
MAX_ARF_GOP_SIZE accurately reflects the maximum frame operated
per group of pictures. Use that to replace MAX_LAG_BUFFERS in
such use cases.
Change-Id: Id26f9b1b2b0c38f255dee19795356c387d06d033
* changes:
Add do_motion_search
Preserve code of doing mv search in raster order
Variant implementation of changing mv search order
Add feature_score_loc_sort
Init mv_[dist/cost]_sum in init_tpl_stats
Change mv search order according to feature_score
With this change, there will be three versions of the mv search scheme
in the codebase simultaneously.
We will do further experiment to evaluate which version is better
in terms of visual quality and coding performance.
Change-Id: I6bf504b4551316ef10b8a341ab3ba14d0ec977ce
This patch enables rectangular partition search on speed 1 for high
bit depth encoding. The encoding speed loss is reduced thanks to
recently added speed features.
This only affects speed 1 high bit-depth encoding.
Coding gains:
                   avg_psnr  ovr_psnr
lowres_bd10(480p)  1.34%     1.40%
midres_bd10(720p)  1.28%     1.33%
Average speed loss:
      QP=30  QP=40  QP=50  average
480p  2.5%   2.3%   2.6%   2.5%
720p  4.0%   3.9%   3.2%   3.7%
Change-Id: Id9cac4eea0769d94e093c9d170194659b3342d89
If we are using keyframe only coding - either coding a
single frame, or a sequence of keyframes - in the end-usage=q
mode, use the cq_level directly as the quality of each
coded frame, rather than boost them.
Ported from AV1: 563a0d1eb92bdc1e987df071a568d8406c4ffa92
Change-Id: I6dc929b8b4f0aa18e279139077f3a87958c92245
Some repeated code is refactored into inline functions. No performance
degradation is observed. These inline functions can be used for width 8
and width 4.
Change-Id: Ibf08cc9ebd2dd47bd2a6c2bcc1616f9d4c252d4d
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Change-Id: I12e46679f3108616d5b3475319dd38b514c6cb3c
We start mv search from the block with the highest feature score, then
move on to the block's neighbors, in a search order determined by
their feature scores.
We use a max heap to achieve this.
This feature is under the flag USE_PQSORT.
Change-Id: Ie5dc5ea715b0f9a7a594e5080a7cb4f5309f5597
This CL facilitates the upcoming change,
a variant implementation that changes the mv search order according to
feature score.
Change-Id: Ie6024b1a5ec02343aea6aa81fc14f94e2e515d06
Sort the feature_score in descending order.
Do mv search from the block with higher score to the block with
lower score
Change-Id: I47a87cd66ea3e40d8c8fc55a7517ab8aa10fdb94
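Sorting block locations by feature score in descending order could be done along these lines. The FeatureScoreLoc type and the function names are hypothetical and chosen for the sketch; the real structures in the encoder may differ. Note that qsort() is not stable, so blocks with equal scores may come out in any order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: a block position and its feature score. */
typedef struct {
  int mi_row;
  int mi_col;
  int feature_score;
} FeatureScoreLoc;

/* qsort comparator for descending order of feature_score. Comparisons are
 * used instead of subtraction so that large scores cannot overflow. */
static int compare_score_desc(const void *a, const void *b) {
  const FeatureScoreLoc *fa = (const FeatureScoreLoc *)a;
  const FeatureScoreLoc *fb = (const FeatureScoreLoc *)b;
  return (fa->feature_score < fb->feature_score) -
         (fa->feature_score > fb->feature_score);
}

/* Sort so that mv search can visit blocks from highest to lowest score. */
static void sort_by_feature_score(FeatureScoreLoc *locs, size_t count) {
  if (count == 0) return;
  assert(locs != NULL);
  qsort(locs, count, sizeof(*locs), compare_score_desc);
}
```

After sorting, iterating the array from index 0 upward visits the blocks in the order the commit describes.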
The quantization step size should be scaled properly for high bit depth
settings.
This only affects speed 0.
Encoder speed change is almost neutral.
There is a small coding gain of 0.09%.
Change-Id: I96b2bae03a53ce8ccd6428e3a050cfe18e06a024
Refactor to form a systematic reference frame update system for
the temporal dependency model. This prepares to support the multi-
layer ARF system.
Change-Id: Idb90fbe3966695b487c1a0a52f4626b0b6807434
The interp filter tap calculation was not accurate to tell the
difference between 2 taps and 4 taps. This patch fixed the bug, and
resolved Jenkins test failures in mips sub-pel filter optimizations.
BUG=webm:1568
Change-Id: I51eb8adb7ed194ef2ea7dd4aa57aa9870ee38cfc
_beginthread() is not declared in __STRICT_ANSI__ mode.
-----
[CXX] test/quantize_test.cc.o
In file included from ./vp8/common/threading.h:194:0,
from ./vp8/encoder/onyx_int.h:24,
from test/quantize_test.cc:24:
./vpx_util/vpx_thread.h: In function 'int pthread_create(TID*, const void*, void* (*)(void*), void*)':
./vpx_util/vpx_thread.h:259:20: error: '_beginthread' was not declared in this scope
tid = (pthread_t)_beginthread(thread_start, NULL, 1024 * 1024, targ);
^~~~~~~~~~~~
./vpx_util/vpx_thread.h:259:20: note: suggested alternative: 'thread'
tid = (pthread_t)_beginthread(thread_start, NULL, 1024 * 1024, targ);
^~~~~~~~~~~~
thread
-----
Change-Id: I774a071162b3876a7f3253ce7c5749f1b0b45818
For speed 0:
coding loss 0.045%; encoder speedup 6%.
For speed 1(only affects videos smaller than 720p):
coding loss 0.11%; encoder speedup 6.5%.
Change-Id: Ie441c9bad2021503e86fefd2f1fa3e1a42070bec
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
lookup(ref, y * kOutputStride + x)
Which is: 255
lookup(out, y * kOutputStride + x)
Which is: 11
mismatch at (1,0), filters (4,0,1)
This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.
BUG=webm:1568
Change-Id: If64c552b794425687cca4fbed893d8ccb73c89a5
Process the frames in the order of GOP structure definition.
Decouple the dependency on rc->baseline_gf_interval.
Change-Id: I0d42c542aca552975cc8f08b0eb8b22ccf6a9537
Add frame_gop_index to track the frame offset within a group of
pictures. This reworks the GOP frame offset calculation and use
case. The coding stats remain identical.
Change-Id: I94d0957bcc327f6bbeac6e84157635663c36b953
Add an encoder side reference frame buffer pool to store the
reference frames for tpl model. This serves as an intermediate
step to support multi-layer ARF system. The buffer memory size will
be optimized afterwards.
Change-Id: If2d2f095d4911a4996f6c2a0b0a8e3d235ceadb2
Generalize the tpl model framework to support the newly designed
GOP structure system. The existing tpl model assumes single layer
ARF.
This design will separate the tpl model operation for GOPs with
and without ARF. When a GOP has an ARF, the maximum lookahead
offset places an upper limit on the frame buffer needed to build the
tpl model for the entire GOP. When a GOP does not have an ARF, we
would use the temporal model in a different approach.
The first step will focus on GOP with ARF. All the tpl model related
operation will only be triggered by ARF frame generation.
Change-Id: I13ab03a7bc68f5a4f6b03f2cb01c10befe955e73
off_t requires sys/types.h on OS/2
-----
[CC] test/../ivfenc.c.o
In file included from test/.././ivfenc.h:13:0,
from test/../ivfenc.c:11:
test/../././tools_common.h:36:9: error: unknown type name 'off_t'
typedef off_t FileOffset;
^~~~~
make.exe[1]: *** [test/../ivfenc.c.o] Error 1
-----
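A minimal sketch of the kind of fix the error above calls for, assuming the
only missing piece is the header that declares off_t. The helper
advance_offset() is hypothetical and only shows the typedef in use:

```c
#include <stddef.h>
#include <sys/types.h> /* declares off_t; not every platform pulls it in indirectly */

typedef off_t FileOffset;

/* Hypothetical helper (not from libvpx): advance an offset by a byte count. */
static FileOffset advance_offset(FileOffset pos, size_t bytes) {
  return pos + (FileOffset)bytes;
}
```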
Change-Id: Ia09935e5de8573e63185369fc139e3355664afd1
This patch optimized apply_temporal_filter function. The diff^2 for each
pixel in the 16x16 block is calculated once beforehand, so that we don't
calculate it multiple times while evaluating a pixel's neighbors. This
would speed up the function.
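A minimal sketch of the idea, not the actual apply_temporal_filter code. It
assumes a 3x3 neighborhood clipped to the block; the function names and the
BLK constant are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 16 /* block width and height */

/* Step 1: compute each pixel's squared difference exactly once. */
static void compute_sq_diff(const uint8_t *src, const uint8_t *pred,
                            uint32_t *sq_diff) {
  int i;
  assert(src != NULL && pred != NULL && sq_diff != NULL);
  for (i = 0; i < BLK * BLK; ++i) {
    const int d = (int)src[i] - (int)pred[i];
    sq_diff[i] = (uint32_t)(d * d);
  }
}

/* Step 2: sum the cached values over the 3x3 neighborhood of (r, c),
 * skipping positions outside the block. No diff^2 is recomputed here. */
static uint32_t neighborhood_sum(const uint32_t *sq_diff, int r, int c,
                                 int *count) {
  uint32_t sum = 0;
  int dr, dc;
  *count = 0;
  for (dr = -1; dr <= 1; ++dr) {
    for (dc = -1; dc <= 1; ++dc) {
      const int rr = r + dr;
      const int cc = c + dc;
      if (rr < 0 || rr >= BLK || cc < 0 || cc >= BLK) continue;
      sum += sq_diff[rr * BLK + cc];
      ++*count;
    }
  }
  return sum;
}
```

Without the cache, each interior pixel's diff^2 would be evaluated once for
every neighborhood that contains it.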
Change-Id: Ibdb8b041f317fd6df198950e2acf9cfcde26860d
Tested on lowres_bd10(480p) and midres_bd10(720p), average coding
loss is 0.09%; average encoding speedup is 9%.
Only speed 0 is affected.
Change-Id: Ia8d48c1c6d1669745f0e956b172572a37e42f0c7
This CL modified 4-tap interp filter coefficients to be even numbers,
which would help in writing 4-tap filter SIMD optimizations. The coding
performance change was negligible. Speed 1 borg test showed:
         avg_psnr:  ovr_psnr:  ssim:
lowres:  -0.003     -0.012     -0.017
midres:   0.029      0.018      0.043
hdres:    0.024      0.044      0.033
Change-Id: Id7c54bb9a9c1aee19c41bc6f1dc3b9682d158bba
This reverts commit bc066684ca.
Reason for revert: Regression in webrtc perf test
Original change's description:
> vp8: Increase rate threshold for overshoot-drop
>
> Increase the rate threshold for the dropping when
> overshoot is detected during encoding. This helps
> to prevent some unnecessary drops for hard content.
>
> Change-Id: I258bf33883d46347efd44e1e192cb25c444d05fe
TBR=sprang@chromium.org,marpan@google.com,builds@webmproject.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: Ib0e84747430ba6d04e479f9efd86d628b80a1e67
This is for non_greedy_mv experiment only
This is part of the change to the mv search order according to
feature_score.
Change-Id: I432efccd83d448a4a275dffd37921c76c3d84588
The gop index 0 is default as kf / gf. The effective first coding
frame controlled by the current GOP rate allocation is indexed 1.
Call the tpl model build for the current GOP once at index 1
position. This would unify the calling system for single/multi-layer
ARF GOP structure.
Change-Id: I4ce69337e04646098d5513c0aa56b4e0b4483337
1) Let mode_estimation() save the results into tpl_frame directly
2) In tpl_model_store(), replace copies of tpl_stats parameters by
memset()
Change-Id: Ia5978d91cb60cf896bd53d3f27701ef9ae3ba09a
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still used bilinear filter as before.
Speed 1 borg test showed good bit savings.
         avg_psnr:  ovr_psnr:  ssim:
lowres:  -1.125     -1.179     -1.021
midres:  -0.717     -0.710     -0.543
hdres:   -0.357     -0.370     -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partially because of no SIMD version of 4-tap filter.
Change-Id: Ic9b48cdc6a964538c20144108526682d64348301
Added the accurate sub-pel motion search. In this patch, used the 8-tap
filter in sub-pel motion search, and this was enabled at speed 0.
Speed 0 borg test showed that
         avg_psnr:  ovr_psnr:  ssim:
lowres:  -1.363     -1.403     -1.282
midres:  -0.842     -0.849     -0.720
hdres:   -0.480     -0.488     -0.503
Speed test at speed 0 showed ~8% encoder time increase.
Change-Id: I194ca709681ea588f3f6381093940e22d03c4d7b
Take the original loopfilter multi-thread optimization
(dafe064289) along with the fixes for bugs
1558 and 1562.
BUG=webm:1558
BUG=webm:1562
Change-Id: Ibbf6bd13f4ffff0e79184ccfd6b85a49e067a6d8
The last element of the cpi->scaled_ref_idx array was not used, so
reduce the array size by 1.
The corresponding libaom CL is
https://aomedia-review.googlesource.com/c/aom/+/72445.
Change-Id: I9166f0fbe1a7898c8b611b1535fcc74b4f766997
The cm->ref_frame_map and pool->frame_bufs arrays are of different sizes
(REF_FRAMES and FRAME_BUFFERS, respectively), so init_ref_frame_bufs()
cannot iterate over these two arrays using the same for loop.
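A minimal sketch of the resulting shape, with one loop per array. The
constant values, the FrameBufSketch type and the function name are
illustrative assumptions, not the libvpx definitions:

```c
#include <assert.h>
#include <stddef.h>

#define REF_FRAMES 8     /* illustrative value */
#define FRAME_BUFFERS 12 /* illustrative value, deliberately != REF_FRAMES */

typedef struct {
  int ref_count;
} FrameBufSketch;

static void init_ref_frame_bufs_sketch(int *ref_frame_map,
                                       FrameBufSketch *frame_bufs) {
  int i;
  assert(ref_frame_map != NULL && frame_bufs != NULL);
  /* Each array is walked with its own bound, so neither loop can
   * overrun the shorter array or leave part of the longer one untouched. */
  for (i = 0; i < REF_FRAMES; ++i) ref_frame_map[i] = -1;
  for (i = 0; i < FRAME_BUFFERS; ++i) frame_bufs[i].ref_count = 0;
}
```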
Change-Id: Ica5bbd9d0c30ea3d089ad2d4bcf6cd8ae2daea64
It seems that null pointer checks such as the following may make clang
scan-build think pool->frame_bufs may be a null pointer:
buf = (buf_idx != INVALID_IDX) ? &pool->frame_bufs[buf_idx] : NULL;
if (buf != NULL) {
This "misinformation" may make scan-build warn about the ref_cnt_fb()
function's use of its 'bufs' argument (Dereference of null pointer) when
we pass pool->frame_bufs to ref_cnt_fb().
Rewriting the above code as:
if (buf_idx != INVALID_IDX) {
buf = &pool->frame_bufs[buf_idx];
not only is clearer but also avoids confusing scan-build.
Change-Id: Ia64858dbd7ff89f74ba1a4fc9239b0c4413592c8
Make decisions more aggressively to improve encoding speed.
Coding gains(avg-psnr) after this change over baseline:
rtc 1.55% for speed 7; 2.89% for speed 8.
ytlivehr 2.20% for speed 6.
Change-Id: If6ac4a942a5b4708bcc6b0a49bd92fbc4d67c3f8
This patch included changes to facilitate accurate sub-pel motion
search. More patches will follow to turn on accurate sub-pel motion
search.
Change-Id: I224c28c338353fe5c7609372162f79885c54248f
Previously, the prepare_nb_full_mvs might construct nb_full_mv with
wrong mvs (from other ref frame).
The following changes will fix the bug.
1) Let ready in TplDepStats become an int array
2) Add parameter rf_idx
3) Use mv_arr instead of mv to build the nb_full_mv
Change-Id: I199798aec4c6762d54799562e142457cc26ee043
Increase the rate threshold for the dropping when
overshoot is detected during encoding. This helps
to prevent some unnecessary drops for hard content.
Change-Id: I258bf33883d46347efd44e1e192cb25c444d05fe
Report the correct filename in error message.
Explicitly assign floating point value to double type.
Change-Id: I42fd2da6e16b1e3e7ec221d5d562a728a93c0196
This patch tweaks the calculation of the active maximum GF interval
and also the break out clause for the GF interval loop. The changes
force the maximum and where possible the break out value to be odd
which in turn will result in an even length ARF group if ARF coding is
selected (vs GF only coding).
The primary aim was to improve coding with multi layer arf groups.
For the single layer case there are small net gains in 3 out of 4 sets
(low, mid, hd) and a small net drop for the NF2K set.
For multi-layer the gains (opsnr, ssim, psnr-hvs : -ve = better) were:-
Low res: -0.109, -0.038, -0.036
Mid res: -0.204, -0.171, -0.242
Hd res: -0.330, -0.471, -0.496
NF 2k: -0.165, -0.149, -0.157
Change-Id: I245f8561f5d1bd34312a0133c670c2154a0da23f
The end-to-end reconstruction quality is represented only by the
displayable frames. Drop the coding stats from ARF frames.
Change-Id: Ib8241db448611f4b6477f107930eaa273f960e20
Add the ml_var_partition_pruning encoder speed feature that
uses a neural net model to prune partition-none and partition-split
search. The model uses prediction residue variance and quantization
step size as input features.
Encoding speed gain for speed 0(tested over 20 hdres clips):
         QP=30   QP=40
average  17.7%   18.3%
max      24.46%  26.6%
Coding loss:
lowres 0.071%; midres 0.098%; hdres 0.163%
Currently it is enabled for speed 0 low-bit depth only. It needs to be
tuned for other settings.
Change-Id: Ifb7417daa6bb6e7c97bb676269ce54ab0dc7b8c8
As we move to unify the GOP structure layout control, the variable
arf_update_idx and arf_ref_idx are deprecated.
Change-Id: Iadcb9e6033d419d4b2015fe747c23be59a7da787
There is no valid last boosted Q available when estimating the maximum
group length for the first ARF group in a clip, so use a value based on
the current max q.
Change-Id: Ida0b4bfb7ce7433089ad808abed7f59c88527a81
This provides an alternative (still to be tuned for edge cases)
approach to adjusting the gop intra factor when multi-layer coding
is in effect that does not alter single layer coding.
Change-Id: Iba86d65a6e68e86aa031b7e1f0b6a4c55761b1b8
Make partition decisions using machine learning models. The goal is to
achieve better coding quality than the variance-based partitioning
without much encoding speed loss.
To enable this experiment, use --enable-ml-var-partition for config.
When enabled, the variance-based partitioning is replaced by this ML
based partitioning for speed 6 and above in real time mode(except low
resolution or high bit-depth).
Current coding gains(average PSNR):
speed 6 speed 7 speed 8
rtc 2.04% 2.65% 3.90%
ytlivehr 3.11% 4.53% 11.57%
hdres(rtc mode) 5.10%
Further testing and tuning is needed to see if the speed and quality
tradeoff is reasonable.
Change-Id: I0da5a2fbc22c3261832b32920ee36d9b19d417af
This reverts commit 5efde3914f, reversing
changes made to 3a29159372.
This is badly broken and may help somewhat for multi-layer but is hurting
massively in single layer encodes.
I ran this through this morning and while it often helps in SSIM it is badly down
for global PSNR and PSNR-HVS with some clips down by 35-40%. This is in line
with previous experiments where I have found that a bigger boost helps SSIM
but hurts PSNR and PSNR HVS.
I was also working on changes to the I factor that gave some improvements
in single layer though these were based upon the active Q mostly. I also have
looked at a bug for the first group where int_lbq is not properly defined and
will submit an interim patch for this while I look for a better solution.
In the meantime I think we should revert this.
The (Global PSNR, SSIM, PSNR-HVS) for the patch as is in my runs for
single layer vs a couple of days ago seem to be (-ve is better).
Low res 0.346, -1.475, 0.239
mid res 1.581, -1.300, 1.731 (worst result down by 30-40% in psnr)
hdres 0.665, -0.712, 1.043 (worst result down by 17-19% in psnr)
NF2k 0.927, 0.111, 1.3220 (Worst result down by 5-7% in psnr)
Change-Id: I55952b71b8cfc5a84484b3b659c5f8a530f3a755
Add a MID_OVERLAY_UPDATE abstract to support multi-layer
ARF-Overlay frame based approach. When setting the frame update
type to be USE_BUF_FRAME, the encoder will use show_existing_frame
to process the intermediate ARF frames. When setting the frame
update type to be MID_OVERLAY_UPDATE, the intermediate ARF frames
will go through an overlay frame for display.
Change-Id: Ia0c91452c09d39312ac22d855cdf681b7da851c5
In some rare cases, all possible partitions may be skipped during RD
search. The patch makes the encoder do rectangular partition search if
both partition-none and partition-split are not allowed.
Tested on the rtc and ytlivehr testsets with speed 5 and 7, no coding
stats changes were observed.
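A minimal sketch of the fallback rule described above. The flag names and
the function are hypothetical and do not mirror the encoder's actual
variables:

```c
/* Returns nonzero when rectangular partitions should be searched.
 * If neither partition-none nor partition-split is allowed, the
 * rectangular search is forced so that at least one partition is tried. */
static int should_search_rect(int none_allowed, int split_allowed,
                              int rect_allowed) {
  if (!none_allowed && !split_allowed) return 1;
  return rect_allowed;
}
```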
Change-Id: I8b6d8b62b6d2431be8e73317d113311c98f631d5
Increase the total prediction error budget linearly with the
allowed ARF layer depth. This in general improves the compression
performance, but does hit corner cases on a few clips at very
low bit-rate range (corresponding to 26 - 28 dB range). To mitigate
such problem, we temporarily work around this problem by limiting
the first GOP size to be ~8 so as to not drain up the bit resource.
The overall compression performance improvements over the current
multi-layer ARF system in speed 0 are:
        overall PSNR  avg PSNR  SSIM
lowres  -0.47%        -0.13%    -1.51%
midres  -1.30%        -1.16%    -2.80%
hdres   -0.91%        -0.84%    -2.15%
Change-Id: Ia4880ab63e98e15a9db99aea6eabfd3d1da9270d
This was previously brought in with the examples. When building
with --disable-examples and --enable-codecs-srcs, this file
gets lost.
Change-Id: Id8bd67cb78c4f06647f34e85f425dfc701c640c0
The function is called in motion_compensated_prediction when
CONFIG_NON_GREEDY_MV is on.
The parameter lambda is used to adjust the importance of
mv consistency between neighbor blocks.
The lambda value is set to a random value for now, and still needs
to be tuned.
Change-Id: I918eb36a686eaa56b4009058f5f329e90c75870b
The new version of refining search function will take into account
neighbor motion vectors' inconsistency while doing mv search
Change-Id: Iaf535fde04805de3dc7dd9a32f1695bf454e2d63
This new version of diamond search function will take into account
neighbor motion vectors' inconsistency while doing mv search
Change-Id: Icbde9880305cb8aea7937d6ddcef1597bf9be018
When multi-layer ARF is enabled, use the corresponding gfu_boost
factor assigned to each ARF to compute the best_quality_index
adjustment. This on average improves the coding performance by
0.2% for lowres and hdres, 0.4% for ntflx2k. It seems this change
will only affect a small group of clips, e.g., pamphlet, bowing,
mobcal_720p, etc., which tend to gain 4-5%, whereas the remaining
clips keep largely identical coding statistics.
Change-Id: Ie19636a6cf32214aefd73e21ead2aea647ddbca8
in set_rt_speed_feature_framesize_independent().
use_nonrd_pick_mode is already set for speed >= 5, so no need to set again
for speed >= 6.
Change-Id: Idb0a4b36d21e305bd63f19e98a70f615ad76f514
These variables are being fed to sse2 functions, that use aligned
loads.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I796c3483c6f3425d63d9262b02b19da59d536600
With --enable-better-hw-compatibility an access to array element -1
can be observed for VP9/ActiveMapTest.Test/0
../vp9/encoder/vp9_rdopt.c:3938:53: runtime error:
index -1 out of bounds for type 'RefBuffer [3]'
There doesn't seem to be anything that would prevent ref_frame from being 0.
If there is no reference frame it can probably be assumed that it
isn't scaled.
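A minimal sketch of such a guard, assuming the array is indexed by
ref_frame - 1 as the error message suggests. The RefBufferSketch type, its
field and the function name are illustrative:

```c
#include <assert.h>

enum { INTRA_FRAME_SKETCH = 0, MAX_REFS_SKETCH = 3 };

typedef struct {
  int is_scaled;
} RefBufferSketch;

/* Returns whether the reference used by ref_frame is scaled.
 * ref_frame == 0 means there is no reference frame, so the lookup
 * refs[ref_frame - 1] is skipped and the frame is treated as unscaled. */
static int ref_is_scaled(const RefBufferSketch *refs, int ref_frame) {
  assert(ref_frame >= 0 && ref_frame <= MAX_REFS_SKETCH);
  if (ref_frame <= INTRA_FRAME_SKETCH) return 0;
  return refs[ref_frame - 1].is_scaled;
}
```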
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I0a29cd0ffc9a19742e5e72203d5ec5d0a16eac7a
Do one more subpel MV search each round. This improves coding
efficiency slightly:
lowres 0.12%
midres 0.11%
hdres 0.13%
Also renames the control flag for subpel MV search quality.
Encoding speed loss is less than 1%.
This only affects speed 1.
Change-Id: I3aecd25342f2dcacea6c143db494f7db6282cb92
Allow the encoder to fully utilize the decoder's capability to
handle both 1 fwd + 2 bwd case and 2 fwd + 1 bwd case.
Change-Id: I3f984d52552ddb701b80b042d979f8fe09dd3a80
Without calloc valgrind reports usage of uninitialized data in
vpx_get_ssim_metrics.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I9cd38b8031ea3f22c1436894ddaf9e0ccf5a654e
When built with -fsanitize=address,undefined a number of tests,
such as ByteAlignmentTest.SwitchByteAlignment or
ByteAlignmentTest.SwitchByteAlignment produce runtime errors about
unaligned 4-byte loads/stores. While normally not really a problem,
this does technically violate the language and it is easy to fix in
a standard conforming way using memcpy which does not produce
inferior code.
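A minimal sketch of the memcpy approach. The helper names load_u32 and
store_u32 are illustrative and not necessarily those used in the tree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from a possibly unaligned address. Compilers typically
 * lower this fixed-size memcpy to a single load on targets that allow it. */
static uint32_t load_u32(const void *p) {
  uint32_t v;
  assert(p != NULL);
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Write 4 bytes to a possibly unaligned address. */
static void store_u32(void *p, uint32_t v) {
  assert(p != NULL);
  memcpy(p, &v, sizeof(v));
}
```

Compared with casting the pointer to uint32_t * and dereferencing it, this
stays within the language rules for any alignment.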
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: Ie1e97ab25fe874f864df48b473569f00563181ae
Generalize the encoder comp_fixed_ref and comp_var_ref assignments.
Make it fully support 2 fwd + 1 bwd and 1 fwd + 2 bwd settings
that VP9 decoder allows.
Change-Id: Id74da9a66327189a3fdf382d447243003c431131
Drop the check of compound modes where the two reference frames
share the same reference frame sign bias in sub8x8 coding blocks.
Change-Id: I47b45256582b2b5ea1372c9130d8f28cd226a29c
Generalize the comp_refs counts update to support the case where one
has 1 fwd and 2 bwd reference frames too.
Change-Id: I979216a95d45efef51026158f94612bef39d3c6d
When running tests built with
-fsanitize=undefined and --disable-optimizations
the sanitizer will emit errors of the following general form:
runtime error: member call on address 0xxxxxxxxx which does not
point to an object of type 'WithParamInterface'
0xxxxxxxxx: note: object has invalid vptr
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
^~~~~~~~~~~~~~~~~~~~~~~
invalid vptr
This can be traced to calls to WithParamInterface<T>::GetParam before
the object argument has been initialized. Although GetParam only
accesses static data it is a non-static member function. This causes
that call to have undefined behaviour.
The patch makes GetParam a static member function.
upstream pull request:
https://github.com/google/googletest/pull/1830
The alternative - if the pull request is denied - would be to
modify all parameterized tests to have them derive from
::libvpx_test::CodecTestWith*Params as the first base class.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I8e91a4fba5438c9b3e93fa398f789115ab86b521
The feature score is used to indicate whether a block's mv is reliable
or not.
Now we use Harris Corner Detector method to compute the score.
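For reference, a minimal sketch of the textbook Harris corner response
computed from summed gradient products over a block. This is the standard
formulation and is not necessarily the exact score used in this change; the
function name and the constant k are assumptions:

```c
#include <assert.h>

/* Structure tensor M = [sxx sxy; sxy syy], where sxx, syy and sxy are the
 * sums of Ix*Ix, Iy*Iy and Ix*Iy over the block.
 * Harris response: R = det(M) - k * trace(M)^2.
 * R > 0 suggests a corner, R < 0 an edge, R near 0 a flat area. */
static double harris_response(double sxx, double syy, double sxy, double k) {
  const double det = sxx * syy - sxy * sxy;
  const double trace = sxx + syy;
  assert(k >= 0.0);
  return det - k * trace * trace;
}
```

A block with strong gradients in both directions scores high, which matches
the intuition that its motion vector is well constrained.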
Change-Id: Ibbe7a1c1f3391d0bf4b03307eaabb5cc3cfb1360
When the frame size is not a multiple of the mv search bsize,
the fractional part will increment the mv rows/cols by 1
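A minimal sketch of the rounding rule, written as a ceiling division. The
function name is hypothetical:

```c
#include <assert.h>

/* Number of mv search units needed to cover frame_size pixels.
 * Any partial unit at the edge counts as one more row or column. */
static int mv_unit_count(int frame_size, int bsize) {
  assert(frame_size >= 0 && bsize > 0);
  return (frame_size + bsize - 1) / bsize;
}
```

For example, a 64-pixel dimension with bsize 16 yields 4 units, while 65
pixels yields 5.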
Change-Id: I4333a207406610c540059a9356a82084832ca85b
The compound mode can only be run between two reference frames
with different sign bias flags. Skip the search over same sign
bias reference frames in the rate-distortion optimization.
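A minimal sketch of the skip condition, assuming a per-reference sign bias
array indexed by reference frame. The array size and the function name are
illustrative:

```c
#include <assert.h>

#define NUM_REFS_SKETCH 4 /* illustrative: intra + 3 inter references */

/* A compound pair is only worth searching when the two references have
 * different sign bias flags (one forward, one backward). */
static int compound_pair_allowed(const int *sign_bias, int ref_a, int ref_b) {
  assert(ref_a >= 0 && ref_a < NUM_REFS_SKETCH);
  assert(ref_b >= 0 && ref_b < NUM_REFS_SKETCH);
  return sign_bias[ref_a] != sign_bias[ref_b];
}
```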
Change-Id: I4a57feedea880883cf87200de51862beac108310
Enable the encoder to produce compound reference frame writing
that supports both 2 fwd + 1 bwd and 1 fwd + 2 bwd cases.
Change-Id: I63d2141435e2de7d8115d52b974fc41c2e608405
x86inc.asm's cglobal macro is frequently used to declare more
arguments than the function actually has. Normally, this is
done to acquire an alias to a register that would correspond to
that positional function argument if it existed. This is safe
when used in this manner.
In the case fixed here, however, the alias is used to temporarily
store addresses obtained through the GOT in memory. Because those
extra arguments don't actually exist, those stores corrupt the
caller's stack frame.
SSE2/VpxHBDSubpelVarianceTest.Ref is a test that may fail as a
result.
As a simple fix, the space allocated to actual arguments that have
already been loaded into registers is reused.
This avoids having to allocate extra space for local variables.
Also removed duplicate code while at it.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I505281ecaa6be586185fe6a2d34d62bdf40c839f
Properly reflect its functionality that assigns the reference
frame sign bias to all the reference frames.
Change-Id: I7b597feeb06acd4c3a004cd51e4b285357315360
Make it support automatic checking and assigning the reference
frame sign bias for all the reference frames.
Change-Id: Ie82f8f872e742130a652b6d5bc109039ac46ae3b
Update the frame index counting from key frame offset for all
the processed frames at the encoder. This would allow encoder to
automatically decide frame sign bias next.
Change-Id: Ibbdc2a29b7245be27422272e1fb539596eed63d1
This entry will only be effectively used at the encoder side.
Adding it to the RefCntBuffer data structure would help make the
associated logic a lot simpler. Its effect on the decoder side
would be explicitly sent through the bit-stream.
Change-Id: I1660dce9e0bb6e28c3315d5e0df6dc4a9298f71f
Overload the use of arf_src_offset to account the relative frame
offset for all the coding frames within a GOP.
Change-Id: Ia86dede37c6a93d9f23098c15dbd936acefd75dc
use the recommended format [1] of:
<PROJECT>_<PATH>_<FILE>_H_
[1] https://google.github.io/styleguide/cppguide.html#The__define_Guard
"All header files should have #define guards to prevent multiple
inclusion. The format of the symbol name should be
<PROJECT>_<PATH>_<FILE>_H_."
Change-Id: I2e8ab0b32fb23c30fa43cff5fec12d043c0d2037
verify pointers passed to vp9_cyclic_refresh_free() and
vp9_setup_pc_tree() before attempting to free members of the structs.
based on the change in libaom:
ie41de6b5a AV1FrameSizeTests.LargeValidSizes: avoid segfault.
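A minimal sketch of the pattern, using a hypothetical struct rather than the
real CYCLIC_REFRESH layout:

```c
#include <stdlib.h>

typedef struct {
  signed char *map; /* illustrative member owned by the struct */
} CyclicRefreshSketch;

/* Check the struct pointer before touching its members. An earlier
 * allocation failure can leave the caller holding NULL. */
static void cyclic_refresh_free_sketch(CyclicRefreshSketch *cr) {
  if (cr == NULL) return;
  free(cr->map); /* free(NULL) is a no-op, so the member needs no check */
  free(cr);
}
```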
Change-Id: Ib81759923cb442e19f42e6edb4b61171d8799ba6
Always use cpi->multi_layer_arf branch if enable_auto_arf >= 2.
Use enable_auto_arf value to indicate max number of ARF
levels to use in multi-arf case.
Further cleanup of old code will follow in separate patches.
Change-Id: I25cd1e4a119a2d482a15705f5126389054764f9f
With the refactoring of the logic that determines if a frame needs
re-code runs to adapt to the target bit-rate, the variable
first_inter_index is no longer in effective use. Hence remove it.
Change-Id: I045894ad1f8b1e00fa40d5a55d762bad0d31b27d
* changes:
Remove some deprecated FRAME_UPDATE_TYPE elements.
Remove some deprecated constants.
Remove unused rate control data elements
Remove extra_arf_allowed.
Always allocate cpi->common.postproc_state.limits using unscaled width.
With ./configure --enable-pic --enable-decode-perf-tests
--enable-encode-perf-tests --enable-encode-perf-tests
--enable-vp9-highbitdepth --enable-better-hw-compatibility
--enable-internal-stats --enable-postproc --enable-vp9-postproc
--enable-error-concealment --enable-coefficient-range-checking
--enable-postproc-visualizer --enable-multi-res-encodin
--enable-vp9-temporal-denoising --enable-webm-io --enable-libyuv
segfaults tend to occur in VP9/DatarateOnePassCbrSvcSingleBR.* tests.
This is an analogue to issue
https://bugs.chromium.org/p/webm/issues/detail?id=1374
where a buffer allocated using a scaled width is reused after scaling
back to the original size. Unfortunately, in this case the unscaled
width doesn't appear to be known in the immediate context of the
allocation, so the signature of vp9_post_proc_frame needs to be
changed to provide that information in order to provide a similar fix
as in #1374.
Signed-off-by: Matthias Räncker <theonetruecamper@gmx.de>
Change-Id: I6f943aafbb3484ee94c5b38d7fcdd9d53fce3e5f
Removal of some frame types relating to deprecated multi-arf work.
Added a dummy value for the USE_BUF_FRAME frame type in the
declaration of the rd_frame_type_factor[FRAME_UPDATE_TYPES] structure.
Change-Id: I7173f2fe33a53117e1bde6f9621efc1a5951240b
This reverts commit 753fd86e86.
This also has the fix for the DoS reported in bug 1558.
BUG=webm:1558
Change-Id: I65ea84e0c11d6bd40d8cb0587dfe934b3ac11dce
The bit-stream syntax doesn't support lst2/3/bwd reference frame
update. Remove the deprecated function that makes such an assumption.
Change-Id: I306c582c2efc63928e4231adef2ee549076a987c
The bit-stream syntax doesn't support the use of lst2/3 frames.
Remove the update_multi_arf_ref_frames() function that assumes
such functionality.
Change-Id: Id5389285c84fe6c578c52d210aa47ef3cb789f8e
Make direct use of frame type in the available VP9_COMMON structure.
Eliminate the need to map through rf_level to fetch the frame type.
This change doesn't alter the coding stats. It simplifies the
vp9_frame_type_qdelta() function logic and removes unnecessary
reference to rf_level.
Change-Id: I1a7b2f5abcae39aa4a60d08a6011dde38ecf3b58
This function is used in part to decide whether to trigger the recode loop
for the first normal P frame in a GOP. Rework its design logic to
support the GOP with multi-layer ARF. Allow recode when there is
a transition from ARF/OVERLAY/USE_BUF to normal P frame.
The overall coding performance for multi-ARF gets slightly better
(less than 0.1% for show_existing_frame case). Tested on a few
clips, the encoding speed remains similar too. This change primarily
serves to help integration of multi-layer ARF and dual-ARF systems.
Change-Id: Ia44e44526b05029b1546985b3eb649e767d5444f
This information will help with making better mv search decisions.
Add functionality to dump tpl_stats for offline analysis
Change-Id: Ic2ec34368499c9bccb4d1f21a12b66453847fcf2
Use separate frame context index to code frames at different layers.
The maximum index cap is set as 3. This improves the compression
performance of multi-layer ARF by 0.15% across the test sets.
The overall coding gains from multi-layer ARF are
        avg PSNR  overall PSNR  SSIM
lowres  -3.9%     -3.7%         -3.2%
midres  -3.5%     -3.2%         -2.3%
nflx2k  -4.3%     -4.6%         -3.0%
Change-Id: I8a0b345fdd47823c018544a6b4748753faf89dc1
match the const in the header; quiets a visual studio warning.
since:
04b3d49ba vp9-svc: Allow for setting framerate per spatial layer.
Change-Id: I0a216eb8fe1a689fe6822bbfac70f7c98e9b1a70
This patch fixes a rate control bug that can manifest if the recode
loop is activated for all frame types. Specifically things go wrong when the
recode loop is used on an overlay frame that has a rate target of 0 bits.
The patch prevents adjustment of the active worst quality and repeat recode
loops for overlay frames.
The bug showed up during artificial experiments on re-distribution of bits in
ARF groups but does not activate in any current encode profile, as even
best quality does not currently allow recodes for all frames.
Change-Id: I80872093d9ebd3350106230c42c3928e56ecb754
Temporarily fork the auto-alt-ref control meaning. When it is set
to be 1, use single layer ARF as baseline. The value 2 would enable
dual ARF system. Any number above it would trigger automatic multi-
layer ARFs.
We would gradually refactor and integrate dual ARF and multi-layer
ARF systems next, and eventually make auto-alt-ref directly control
the layer depth.
Change-Id: I292d27111ae8a596b97444afecf4b896043e543f
Extend the upper limit from 2 (dual ARFs) to maximum ARF layers.
This would allow --auto-alt-ref to directly control the
ARF layer depth later on.
Change-Id: I6324fe980122e73dc98f81c8d7de1193a1a16e51
Normal frame boost factor is set to be 100 as the baseline for
ARF boost. Replace the hard coded number with a macro.
Change-Id: I81ce30138f7819844e7a2d811de9e1ccbeb85da5
Re-count the factors to decide bit boost factor for the
intermediate layer ARFs. Make the gfu_boost factor assigned to
each ARF adapt to its local factors.
This and the recursive change 5bfe9eb together improve the
multi-layer ARF compression performance:
        avg_psnr  ovr_psnr  ssim
lowres  -0.39%    -0.54%    -1.6%
midres  -0.98%    -1.26%    -2.3%
hdres   -0.95%    -1.13%    -2.3%
Change-Id: I5fec3ea75cae58825787dc88dadc7e8697a041ea
Recursively calculate the rate boost for the ARF frames at the
given layer depth from the remaining available bit resource after
the prior layer ARFs consumption.
Change-Id: I0e31bac4f87b895ca20605dc1307a8fc0d2a516d
Increase the bit allocation for the intermediate layer ARFs. The
current strategy assigns higher offset to the lower layer ARFs.
The needed budget is borrowed from the base layer ARF allocation.
Change-Id: I16b6e9cce4dab8e73e7b097674d1a8504205026e
When multi-layer ARF mode is enabled, increase the encoder buffer
to account for the situation where several ARFs are coded together
in a frame packet.
Change-Id: I4e53095f6b6ac5a3c8d79414411ac39880bf1523
This change is in response to quality issue in b/112953058
The quality regression observed is a result of a bug that manifested
because of a very short key frame group at the start of a chunk.
The group was so short that it was less than the minimum allowed
length of an ARF group, so the initial group was coded as a GF only
group. However, group length was not set correctly and the result
was a frame coded with a target of 0 bits.
This causes two problems:
Firstly one very poor frame that caused the issue to be raised.
Secondly that one frame obviously overshoots its 0 target very heavily
and this has the effect of moving the needle significantly in terms of the
adaptive rate control (specifically the estimate of bits per macro block
used to estimate the active Q range). Consequently there is undershoot
for most of the rest of the chunk and the overall rate ends up much lower
than the target (14Mb/s vs a target of 22Mb/s). (The sharp drop in the
overall rate is also noted in the issue.)
BUG=b/112953058
Change-Id: Ide9cce57acd3dee0f9496b752902e7b4735f2c7f
Keep the ARF and P frame rate allocation distribution. All the
intermediate ARFs are treated same as regular P frames.
Change-Id: I7807b8e6a8f19b6e1b09b9b7d119b3c88ef90b67
The additional constraint imposed on inter-layer
prediction should only be done for non-bypass (fixed)
svc mode.
Change-Id: Ia22cdb7bc21684776c9a13397e177a1e1c3d55a2
This rate control bug in the original patch is not the underlying cause
of the quality regression but simply unmasked a problem which stems
from applying 0 bits to the last frame in a short KF group at the start
of a chunk.
This reverts commit d10b1f2336.
Change-Id: I32c91a24a14d013853bb8e5587aa69600e6a0063
For fixed/non-flexible SVC mode: on non-key spatial
enhancement layers modify constraint on the inter-layer
prediction to include the first_spatial_layer_to_encode.
Change-Id: I6a59174976ad72d555653704dcd3b03c52e31b6f
VP9E_SET_SVC_LAYER_ID sets the first spatial layer to
encode per superframe, so add this parameter to svc encoder.
This is needed, for example, to properly set is_key_frame for
spatial layers when the base spatial layer is skipped.
Change-Id: Ifd4ac77f539197ec021e62f4c624a6cc79d64f43
Add an ML model to predict if rectangular partition search can be skipped
without much coding loss. This model is enabled for speed 0 low bitdepth
only.
Impact on coding performance is minor:
          avg_psnr  ovr_psnr  ssim
lowres    -0.005%   0.005%    0.017%
midres    0.100%    0.114%    0.134%
hdres     0.048%    0.083%    0.074%
jvet480p  0.035%    0.027%    0.044%
jvet720p  0.094%    0.090%    0.174%
Tested encoding speed over 20 midres and hdres clips, average speed
gain is about 8%; maximum speed gain is 23%.
Change-Id: I5d4029dec7134c53ac68ab6cf0c8077dc0b767ed
The show_existing_frame mode still needs to be sent to the decoder.
Account for this as 1 byte. This would make the encoder properly
update its state.
Change-Id: I32a59ccb5d0e02cc6367c1a264b2de72dc1432a7
Linking c++ libraries built with gcc 6 and gcc 7 on arm
generates some warnings because of incompatibilities between those
compilers:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728
libvpx does not generate a c++ library. C++ is only used for examples and tests.
Change-Id: I3d5d5ef3fb66743bff26a833d6641898975e9f71
This reverts commit dafe064289.
Corrupted files may cause the decoder to hang as row progress in the
loopfilter is used to progress each thread.
BUG=webm:1558
Change-Id: I0674ce9af14d3fb7b2da8124e7b600616c8e734a
Always parse --required options. Previously they were only parsed for
x86_64.
Make entries passed in additive if there are existing required flags.
Mark 'neon' as required for armv8/aarch64.
BUG=chromium:876548
Change-Id: I55c6aad4536a9d8423e223e5616f3aa26d6b2941
If a ref frame is masked out, we do not need to do motion search for it.
It makes speed 0 a little faster.
Change-Id: I68f71255b2798b24fd1d5b28ed24a2ef87251413
The previous enc/dec mismatch detection assumes the previously
reconstructed frame would always stay at frame buffer pool index
at 0. It could hence cause certain delay in enc/dec mismatch
detection when the immediate reconstruction frame is not yet
propagated to index 0 in the buffer map pool.
This change always keeps the latest decoded show frame buffer
index and directly gets the reconstructed frame from encoder and
decoder buffer pools to check for mismatch.
Change-Id: If53092cbc42ab78d55af5b83f12a489fc362f3ae
This reverts commit 416b7051d7.
Reason for revert: it causes visual quality drop as described in b/112953058.
Original change's description:
> Prevent double application of min rate in two pass.
>
> The initial allocation of bits in the two pass code to each frame
> should be within the min max limits on the command line. However,
> when forming an ARF group the cost of the ARF is shared by frames
> in that group such that the residual bits for a frame could drop below
> the min value. This change prevents the minimum being re-applied
> after the cost of the ARF has been deducted as this may otherwise
> cause low rate sections to overshoot their target.
>
> Test runs comparing to a baseline run with min and max section pct
> 0-2000% vs one closer to the YT use case (50-150%) suggest that
> this fix not only results in better rate control but also gives a better
> rd outcome.
>
> For example the HD set vs 0-2000% baseline (opsnr, ssim).
> Old code (50-150): +0.751, +1.099
> New code(50-150): +0.241, -0.009
>
> Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
TBR=yaowu@google.com,paulwilkins@google.com,debargha@google.com,builds@webmproject.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: Ic9849e4e0db64e9d92bbb9df9cc923230a15c4df
Make the encoder side handling of prev_frame and last_show_frame
update synchronized with the decoder behavior.
Change-Id: I0f265391cba182d7cc266a1c327fe6b92e24ab17
When the current frame is coded by directly using a reference
frame in buffer, no need to update the prev_mi frame information
for next frame encoding control.
Change-Id: I33fda8e70cdb31eb5b13b63e3dbd6e96ff85154d
Previously we often skipped all compound inter prediction modes,
causing large coding loss. This patch modifies how we set the
ref_frame_skip_mask so that compound modes are considered in RDO.
This affects speed>=1.
Coding gains(overall psnr):
         lowres  midres  hdres   average
speed 1  0.54%   0.43%   0.64%   0.53%
speed 2  0.59%   0.48%   0.60%   0.56%
Tested encoding speed on 10 HD sequences, average speed loss is
5% for speed 1; 2% for speed 2.
Change-Id: Ib8758af7ee7c9812022bd21c5fe61631e2bb8e5c
Allow the encoder to skip temporal filter for intermediate ARFs
that are later used in show_existing_frame mode.
Change-Id: Ieed635bf7672b62f5c287bde43765f80362a345e
The enum USE_BUF_FRAME makes use of show_existing_frame. In
this setting, all the reference frame buffer conditions will stay
unchanged.
Change-Id: I5b7b28488dbd94982f721667128f004e4e6a00d8
Point the current frame buffer towards the existing reference frame.
In the meantime, release the original new_fb pointer.
Change-Id: Ic83a698cac5cdaaabdf61acffb936ec130a84d1c
Skip the loop filtering for frame coding in show_existing_frame
mode. This matches the decoder operation for show_existing_frame
mode.
Change-Id: I96f275cf5384eb5fe8c0404ec4142cf5b580ac16
When the show_existing_frame mode is on, directly point the new
frame pointer towards the existing reference frame buffer entry.
Change-Id: Ic50b25655fe95ea702fb529afacb7701ec17adcb
No need to process through the frame encoding stage when a current
frame is coded using show_existing_frame.
Change-Id: I36c6f04e344326fa6ecc95cd0a4e4fd6f467fdcb
This enum indicates the use of show existing frame, and conducts
no reference frame buffer update.
Change-Id: I8bf3121376640baf24b580ebea58e9ccbdd641da
Determine if an ARF is on the future side by checking if its
offset meets the gop frame length. This unifies the support to
single- and multiple-layer ARF cases.
Change-Id: I5ab26f54311c345a9b574ffca5ff0a8dbcf4c031
Adding LPF within the tileworker hook. This means that LPF will be done
immediately after decode, without waiting for all threads to sync.
Performance Improvement -
Platform  Resolution  2 Threads  4 Threads
x86       720p        7.24%      22.04%
          1080p       5.29%      17.02%
ARM       720p        4.61%       8.75%
          1080p       5.55%      12.03%
x86 Improvement measured on Intel Core i7-6700 CPU @ 2.10GHz set
in performance with turbo mode off
ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
Change-Id: Ifa73c71b40db3fa7fa16f54f4e3aa06d1258caae
Make the bit-stream writer match the decoder behavior, when the
show existing frame feature is used.
Change-Id: Ibc8153f8668da0f9a2ed8af3b42dae91a5ac08c7
Allow the bit-stream writer to support potential use of
show_existing_frame. At this point, cm->show_existing_frame is
always 0.
Change-Id: I64fed1d72db6d4902d56774854ce24fb7a082e0c
Use both luma and chroma components simultaneously to estimate the
non-local mean kernel and build the temporal filter. It improves
the compression performance primarily for chroma components. Tested
in speed 0 and vbr mode, the coding gains are:
        Overall PSNR  SSIM    PSNR_U  PSNR_V
low     -0.10%        -0.12%  -0.48%  -0.49%
mid     -0.13%        -0.16%  -0.58%  -0.88%
720p    -0.31%        -0.24%  -0.75%  -0.72%
hd      -0.09%        -0.10%  -0.59%  -0.79%
nefl2k  -0.30%        -0.13%  -0.53%  -0.50%
Change-Id: I24d39997818322b0d69bd9dbeda02c60cd2b2e1b
Unify the temporal filter operations for the luma and chroma
components. Handle them in a single loop over the pixels in the
processing block.
Change-Id: I9ea1946f3a6fb37da6867aa78140d45cad0facf0
For screen-content with aq-mode = 3: identify spatial
flat superblocks in the setup stage and don't mark them as
candidates for refresh. Spatially flat blocks are already
removed from refresh at a later stage in the encoding (in pick_mode),
but doing this at the setup stage of cyclic refresh (before encoding)
allows refresh to more quickly hit the text areas. Only drawback is
an extra source variance calculation for a set of superblocks on
each frame.
Adjust the refresh rate: lower it to reduce overshoot since
more texture areas are hit faster with this change.
Change-Id: I88fa20e52fdbf1a938ae814f9b48c887f1f909d2
When the feature is enabled and the memory is not available, allocate
it. There was a case where speed feature changed in the middle of stream
but the number of tiles stayed the same, memory was not re-allocated.
Another case is where speed for base layer is different than that of
higher quality layers (same resolution). Removed the speed constraints
forcing base layer using same speed setting.
Thus the memory for adaptive_rd_thresh_row_mt stayed NULL but the
feature was enabled.
Add an end to end test to cover this case.
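The allocate-on-demand behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the RowMtState struct, its fields, and ensure_thresholds() are hypothetical names and not the encoder's actual data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical state holding a lazily allocated per-row threshold buffer. */
typedef struct {
  int feature_enabled; /* nonzero when the speed feature is on */
  int *thresholds;     /* NULL until the buffer is first needed */
  size_t count;        /* number of entries currently allocated */
} RowMtState;

/* Makes sure the buffer exists whenever the feature is enabled, even if the
 * feature was switched on mid-stream without a tile count change.
 * Returns 0 on success and -1 on allocation failure. */
static int ensure_thresholds(RowMtState *s, size_t count) {
  assert(s != NULL && count > 0);
  if (!s->feature_enabled) return 0;
  if (s->thresholds != NULL && s->count == count) return 0;
  free(s->thresholds);
  s->thresholds = calloc(count, sizeof(*s->thresholds));
  if (s->thresholds == NULL) {
    s->count = 0;
    return -1;
  }
  s->count = count;
  return 0;
}
```

Calling the check on every frame, rather than only when the tile layout changes, is what covers the case where the feature is turned on later in the stream.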
Change-Id: I2f1f802ef98a554571b30094d3600b9439228457
The default is set to turn on the temporal dependency model at
speed 0. Use --enable-tpl to control turning it on/off when calling
vpxenc.
Change-Id: I61614cd8100ae57dc01fd46b2a69c5b67287f18a
1: Lower rdmult used in trellis optimization
2: Shut off the end of block optimization that tries end of block
at every sub position if any of the coefficients are > 1.
3: Change the rounding and zbin factor according to sharpness.
4: Disable the skip block check that calculates RD using SSE from
predictor.
Change-Id: I247b61a26fa22f12f8b684e7cd6d4e368de7c3e4
When the group of pictures runs over 24 in length, skip the use of
the temporal dependency model, since the model assumes a maximum of 25
lookahead frames.
Change-Id: I6386dd33bcdaf1229fae978130b4c3b43d071918
Low speeds in good mode are too slow.
Move CBR large tests to non-'Large' ones such that they can run in
Jenkins per commit.
Change-Id: I1da73ca96ee89abcf3566d51ff52f1f2e904a048
Add metrics that are being updated per-frame to
the layer struct, so each layer using the cyclic
refresh has the correct update. This is more consistent
for the rate control and refresh rate.
Some improvement in screen content clips.
Neutral for SVC on rtc set.
Change-Id: I0a9862fb6b6a79e894e2ff30c120dc4aa26fcda5
Add flag to separate two cases of bypass (flexible) SVC mode:
using SET_SVC_REF_FRAME_CONFIG vs. passing in the
frame_flags in the vpx_encode (only used for temporal layers).
This fixes failures in Datarate Temporal layer test,
introduced in commit: a66da31
Change-Id: Ie62f933987c20792d1f963d645e98c1903bdd423
For CBR real-time mode: refactor usage of speed feature to
handle overshoot on slide/scene change. Add 2 modes to indicate
how slide/scene change is processed for re-setting Q/rate control.
Keep the speed setting to 1 for speed >= 5, otherwise set to 0.
Video content and screen content are now handled in similar way,
though with different thresholds.
Some fixes to thresholds and reset: correct the reset of the buffer
level to optimal level for each temporal layer, if scene change
frame will be encoded at max_q.
Also increase the min_thresh for video mode (non-screen content):
this is to avoid scene change detection in cases like large
lighting changes or camera focus changes. The increase in min_thresh
also makes it more robust to a sudden increase in noise level.
Change-Id: I256d350da6e92d2ddc09f100fc06ac147cbc1e49
Before this patch, pred_mv was used only when the
adaptive_motion_search speed feature was on (speed>=1).
This patch enables pred_mv for speed 0 as well.
Coding gains:
        avg_psnr  ovr_psnr  ssim
lowres  -0.31%    -0.32%    -0.38%
midres  -0.37%    -0.41%    -0.42%
hdres   -0.30%    -0.31%    -0.29%
Tested encoding speed over 18 midres sequences with QP=40. The
overall speed loss is about 0.6%.
Change-Id: I8987e9efb5a70d2bf8779fc2a43838009f9bbd8a
Add update_buffer_slot to SVC API to allow for refreshing
any of the 8 reference buffers. Remove frame_flags from
the struct.
Remove svc tests from vp8 build.
BUG=b/112292577
Change-Id: I0551c349d2b311227245a8ed1639cdbbaf5bc5db
For spatial layers: use the correct mi_cols/rows in the
scene detection. The scene detection for spatial layers
is only called once per superframe, but we were using wrong
mi_cols/rows (those for base spatial were being used).
Also increase frame_since_key threshold to account for spatial
layers.
Change-Id: I2731da49684a798c4718693a0468eda7db82d2bd
Previously if the number of tiles decreased within a clip and there were
fewer super block rows than workers the mi_row calculation would cause
rows to be skipped. The num_workers stored is the max allocated amount,
use sb_rows to limit the active ones if the row count is smaller as
additional threads will provide no benefit.
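A minimal sketch of the limit described above, assuming the allocated worker count and the superblock row count are both available as plain integers (the function and parameter names are illustrative, not the decoder's):

```c
#include <assert.h>

/* Returns how many of the allocated workers should be launched.
 * `allocated_workers` is the maximum that was allocated up front and
 * `sb_rows` is the number of superblock rows in the current frame. */
static int active_worker_count(int allocated_workers, int sb_rows) {
  assert(allocated_workers > 0 && sb_rows > 0);
  return sb_rows < allocated_workers ? sb_rows : allocated_workers;
}
```

With 8 allocated workers and 3 superblock rows this launches 3 workers, so no worker is handed a row index past the end of the frame.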
Change-Id: I1750296c8c21082de2594afecc4d6a3929db1f12
Do some extra full pixel search to improve motion vector quality.
Currently it is enabled for speed 1 only; disabled for real time mode.
Coding gain for speed 1:
        avg_psnr  ovr_psnr  ssim
lowres  -0.23%    -0.23%    -0.35%
midres  -0.33%    -0.35%    -0.38%
hdres   -0.28%    -0.29%    -0.28%
Tested encoding time over 10 HD sequences. Overall speed overhead is
1.5% for QP=30; 0.6% for QP=40.
Change-Id: Ic2ea4d78c4979de9d5090c9d7c702944f155f8af
this avoids reading 4 pixels into another block, which may be operated
on by a different thread. quiets a tsan warning.
Change-Id: Id27ad9d61819b0e5de0230647b4b510f7c265a71
Comparing the size values with subtraction requires casting. Sort in
descending order.
(a < b) - (a > b)
If a is greater, this is 0 - 1 = -1
If the values are equal, this is 0 - 0 = 0
If b is greater, this is 1 - 0 = 1
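The comparison above can be written as a qsort comparator. The sketch below assumes the sizes are stored as size_t in a hypothetical SizeEntry record; the actual type being sorted in the patch is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record type; only the size field matters here. */
typedef struct {
  size_t size;
} SizeEntry;

/* qsort comparator that orders entries by size, largest first.
 * (a < b) - (a > b) yields -1, 0 or 1 without subtracting the unsigned
 * values, so no cast is needed and nothing can wrap around. */
static int compare_size_descending(const void *pa, const void *pb) {
  const size_t a = ((const SizeEntry *)pa)->size;
  const size_t b = ((const SizeEntry *)pb)->size;
  return (a < b) - (a > b);
}

/* Sorts `entries` in place in descending order of size. */
static void sort_by_size_descending(SizeEntry *entries, size_t count) {
  assert(entries != NULL || count == 0);
  if (count == 0) return;
  qsort(entries, count, sizeof(*entries), compare_size_descending);
}
```

Because each comparison result is 0 or 1, the difference always fits in an int regardless of how large the compared values are.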
Change-Id: I5c20fd10fbc97c391c6858235c44d25d7db57f0e
"b_width_log2" and "b_height_log2" should be "b_width_log2_lookup" and
"b_height_log2_lookup", respectively.
Change-Id: I3ad49e45007cd9fcf5dd463c7d01e22745939231
For real-time 1 pass mode: overshoot detection and max_Q
reset should only be for screen-content mode.
This fixes some failures in the 1 pass VBR tests, from
the commit: 2fae9991
Change-Id: I70cbe4e6fd83cfe0c7662f13b779551bf4f319cb
For real-time screen-content mode: increase the
qp_thresh for max_Q setting on slide changes.
This will make bitrate spikes less likely on slide changes.
Change-Id: Ie13524a06490214456b1c9c042a864ea0d0750c5
Code cleanup; add some comments.
Also remove a redundant call to vp9_get_mvpred_var() at the end when
method is MESH.
Change-Id: I4b58e7e1c42161642708f8b0342ab3c0ce39ed7d
For real-time screen content mode: for speed >= 6 disable
the re_encode_overshoot feature. This means for speed >= 6
the Q and rate control is reset on slide changes based on
the scene/slide detection and the current Q (and not on a
first pass encoded frame at current Q).
This reduces encode time on slide changes, but may be less
accurate in deciding when to reset/max-out the Q.
Change-Id: Id0fdcafd55bc43bd8b3afee211e524f37c8ddce6
Take partition cost into consideration during rectangular partition
mode search.
Compression change is neutral. Encoding speed can be a little faster
at low quality settings. With QP=55 at speed 0, average speed up over
15 midres sequences is about 2.7%.
Change-Id: I6d423459675b5f1e4e1475dbbf6f67ab970a4832
Append mesh search to the diamond shape search to refine
the full pixel motion estimation for source ARF generation.
It improves the average compression performance.
Speed 0
        avg PSNR  overall PSNR  SSIM
mid     -0.18%    -0.18%        -0.22%
hd      -0.25%    -0.23%        -0.36%
nflx2k  -0.22%    -0.23%        -0.37%
Speed 1
        avg PSNR  overall PSNR  SSIM
mid     -0.10%    -0.08%        -0.11%
hd      -0.25%    -0.27%        -0.38%
nflx2k  -0.20%    -0.20%        -0.34%
The additional encoding time is close to the sample noise
range. For bus_cif at 1000 kbps, the speed 0 encoding time
goes from 83.0 s -> 83.6 s.
Change-Id: I48647f50ec3e8f7ae4550a4bde831f569f46ecf3
For nonrd_pickmode: add clamp/check to make
sure tx_size is not set to lower than 8X8,
for the model_rd_large function (which is only
called for big block sizes).
No change in behavior.
Change-Id: I9c6093068e406ac16cfd6784ba75868906225378
Simplify the pass-in data structure. Use a reference TplDepStats
pointer to replace multiple data sent in.
Change-Id: Ibebced5d7f411d2c4a8a34a9b7eb87453fb78d13
Disable cyclic refresh on slide/scene change frame. It was already
disabled on the re-encode for the slide change, but this change
makes sure it's always disabled on a detected slide change (which
may not be re-encoded at high Q).
Change-Id: I1195c855bca25985d4d41e5b657adf124e901760
The ".syntax unified" directives in a few source files aren't valid
ADS assembly directives, and they break compilation for windows,
since ads2armasm_ms.pl doesn't handle them.
Explicitly add them via ads2gas.pl and ads2gas_apple.pl instead,
and tweak one instruction to be valid unified syntax.
Change-Id: I37f1709f163d11474597161fe02eb433859cb9b8
Use diamond search for full pixel motion estimation to build
the temporal dependency model and the source arf frame. This gives
better full pixel motion estimation accuracy. It improves the
compression performance.
In speed 0,
        avg PSNR  overall PSNR  SSIM
midres  -0.32%    -0.30%        -0.65%
hdres   -0.88%    -0.91%        -1.31%
nflx2k  -0.47%    -0.48%        -0.81%
In speed 1,
        avg PSNR  overall PSNR  SSIM
midres  -0.24%    -0.28%        -0.50%
hdres   -0.82%    -0.83%        -1.18%
nflx2k  -0.58%    -0.60%        -0.89%
The encoding speed change is minor due to the fact that such motion
estimation is triggered once at the beginning of each group of
picture coding.
Change-Id: Ib25c0ff4f7450c85fd7a38d24319bd7ae1b9dac8
The coding performance drops slightly in speed 0:
lowres  0.021%
midres  0.043%
hdres   0.087%
The speedups in speed 0 are observed as follows:
city_cif.y4m  4.5% speedup
pamphlet.y4m  6.9% speedup
Change-Id: I2f6209964ffdf7a93919b79033d8e6f9bc44d824
Force 4x4 transform size under some conditions for real-time
screen-content mode. Improvement on text in some screen clips.
Change-Id: I77cafa23ea1060ef4334dc07eac53189bf80e0ec
Fix multi-thread encoder result test induced by
the prune_ref_frame_for_rect_partitions speed feature.
BUG=webm:1552
Change-Id: Idc3b3759651f76285ffd90059c6a2846c4d91a00
For real-time/nonrd_pickmode: under some conditions
force check of intra modes for flat blocks with motion.
Reduces artifacts for screen-content mode.
Change-Id: If320f41a90982b14c48d91150f59f048a62982b1
For real-time screen content: don't allow early
breakout in nonrd-pickmode on slide change.
Avoid artifacts.
Change-Id: I09c6927a5d85b46ce059ea5954a3719a7362fb99
Profile 1 or 3 bitstreams may require 11 bytes for the header in the
intra-only case.
Additionally add a check on the bit reader's error handler callback to
ensure it's non-NULL before calling to avoid future regressions.
This has existed since at least (pre-1.4.0):
09bf1d61c Changes hdr for profiles > 1 for intraonly frames
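The non-NULL check on the error handler mentioned above might look like the following sketch. The BitReader struct, the ErrorHandler type, and the function names are assumptions made for illustration and do not reproduce the library's bit reader.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type and reader struct for illustration. */
typedef void (*ErrorHandler)(void *user_data);

typedef struct {
  ErrorHandler error_handler; /* may be NULL */
  void *error_handler_data;
} BitReader;

/* Reports a read error through the callback, if one is installed. */
static void report_read_error(const BitReader *r) {
  assert(r != NULL);
  if (r->error_handler != NULL) r->error_handler(r->error_handler_data);
}

/* Example handler: counts how many errors were reported. */
static void count_error(void *user_data) { ++*(int *)user_data; }
```

A reader created without a handler then ignores the report instead of calling through a NULL pointer.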
BUG=webm:1543
Change-Id: I23901e6e3a219170e8ea9efecc42af0be2e5c378
This reverts commit d72cd51d83.
Reason for revert: Doesn't seem to really remove the artifact that was the cause for this change. Reverting for now.
Original change's description:
> vp9: Adjust reset segment for real-time screen-content
>
> For real-time screen content mode when the short_circuit
> flat_blocks feature is enabled: reset segment to 0 for
> coding block if it's flat, regardless of temporal source_sad.
> Reduces some artifacts on flat areas.
>
> Change-Id: I9620e424bedc5a13f87cc4f66af7c0e86043c89c
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I83ee9fd75bfb621a4f3e9afbcc07e7c6ca5c51d6
For real-time screen content mode: when slide change
is detected, for spatially flat blocks (source_variance = 0) on
the re-encoded frame, skip inter modes (so force intra) if
non-zero temporal variance is detected for the coding block.
Add flag to keep track of re-encoded frame at max Q.
Reduces artifacts on slide change.
Change-Id: I28151f412aba6ab8cb03f30087c7ce16d443654b
If CONFIG_SIZE_LIMIT is defined, vpx_realloc_frame_buffer should fail if
width or height is too big.
This carries over commit ebc2714d71a834fc32a19eef0a81f51fbc47db01 of
libaom: https://aomedia-review.googlesource.com/c/aom/+/65521
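As a rough sketch of the size check described above: the function below rejects dimensions beyond a configured limit before any allocation is attempted. The function name and the way the limits are passed in are assumptions; the real check lives inside vpx_realloc_frame_buffer and is compiled in only when CONFIG_SIZE_LIMIT is defined.

```c
#include <assert.h>

/* Returns 0 if a frame of the given dimensions may be allocated and -1 if
 * either dimension is non-positive or exceeds its limit. The limits are
 * passed in here; a real build would take them from its configuration. */
static int check_frame_size(int width, int height, int width_limit,
                            int height_limit) {
  assert(width_limit > 0 && height_limit > 0);
  if (width <= 0 || height <= 0) return -1;
  if (width > width_limit || height > height_limit) return -1;
  return 0;
}
```

Failing early keeps an oversized request from reaching the allocator at all.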
Change-Id: Id7645c5cefbe1847714695d41f506ff30ea985f6
This patch limits the active min Q for normal frames based on the previous
KF/GF/ARF. In a few cases, especially at the end of a clip where there
has been systemic underspend, (as is often the case with slide shows),
this prevents the encoder rapidly dropping Q on normal frames (just to
try and use up bits), such that they end up with a lower Q than the key
frame / GF / ARF off which they key.
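In its simplest form, the limit described above is a lower bound on the active minimum Q. The sketch below assumes the bound is exactly the Q of the last KF/GF/ARF; the patch may derive the bound differently, and both parameter names are illustrative.

```c
#include <assert.h>

/* Returns the active minimum Q for a normal frame, raised if necessary so
 * that it is not below the Q used for the most recent KF/GF/ARF. */
static int limit_active_min_q(int active_min_q, int last_boosted_q) {
  assert(active_min_q >= 0 && last_boosted_q >= 0);
  return active_min_q < last_boosted_q ? last_boosted_q : active_min_q;
}
```

A normal frame that rate control would otherwise drop to Q 10 after a key frame at Q 30 is held at 30 under this rule.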
Change-Id: Ic8def5c0d1e37ca2202e007ec1d13e501c0a91dd
For real-time screen content mode when the short_circuit
flat_blocks feature is enabled: reset segment to 0 for
coding block if it's flat, regardless of temporal source_sad.
Reduces some artifacts on flat areas.
Change-Id: I9620e424bedc5a13f87cc4f66af7c0e86043c89c
Add a speed feature to prune reference frames for rectangular
partitions. Rectangular partition RD search happens after square
partition RD search. With this feature, we keep record of the ref
frames picked by square partitions, and only consider those ref
frames during rect partition RD search.
With this feature on, the computation cost of rect partition RD
search is greatly reduced, so we can afford to skip rect partition
RD search less aggressively.
Overall, both compression and encoding speed are improved. Only
speed 0 is affected.
Coding gains:
          lowres  midres  hdres
ovr psnr  0.00%   -0.36%  -0.37%
avg psnr  0.00%   -0.36%  -0.36%
Tested encoding speed with QP=40 on about 30 sequences.
Speed gains:
          lowres  midres  hdres
average   13.4%   7.1%    6.1%
max       28.0%   12.0%   9.8%
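One way to keep the record of reference frames picked by square partitions is a small bitmask that the rectangular search consults. The sketch below is an assumption about the bookkeeping, with made-up identifiers; it is not the encoder's actual speed feature code.

```c
#include <assert.h>

/* Illustrative reference frame identifiers; the values are assumptions. */
enum { REF_LAST = 0, REF_GOLDEN = 1, REF_ALTREF = 2, REF_COUNT = 3 };

/* Bit i is set when reference frame i was picked by a square partition. */
typedef unsigned int RefMask;

/* Records that a square partition picked `ref`. */
static RefMask record_ref(RefMask mask, int ref) {
  assert(ref >= 0 && ref < REF_COUNT);
  return mask | (1u << ref);
}

/* Returns 1 if the rectangular partition search should evaluate `ref`. */
static int rect_search_allows(RefMask mask, int ref) {
  assert(ref >= 0 && ref < REF_COUNT);
  return (int)((mask >> ref) & 1u);
}
```

If the square partitions only ever picked LAST and ALTREF, the rectangular search skips GOLDEN entirely, which is where the speed gain would come from.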
Change-Id: Id5f36dd2ac75028ae98550d67b0a524aa251b692
Properly scale the distortion metric according to the transfer
function gain of the transform block size.
Change-Id: I8e3539d8936f5db78c1352f902f72ef19fc09ed8
This commit adds a command line argument "--row-mt". Passing "--row-mt=1" will
set the row_mt flag in the decoder context. This flag will be used to
determine whether row-wise multi-threading path is to be taken when the
row-wise multi-threading functions are added.
Change-Id: I35a5393a2720254437daa5e796630709049e0bc2
Apply a fixed maximum boost for static key frame
groups / slide show content (if > 8 frames long).
This ensures sufficient boost on shorter sections
whilst preventing excessive boost on longer sections.
Change-Id: I5b857dab023d674cfd55bced3437f3bce3b4f1cb
Where a KF group is very short but static make sure
it is coded as a single GF group. Previously there was a
bug where such groups could be coded as an arf group
with the arf in the next scene.
Change-Id: I4504ae2b03c4877fcecfa58dd503879aa4eefac4
Set an upper limit on the maximum boost for a static
GF only group such as in slide shows as part of tweaks
to quality / rate trade off.
Change-Id: Ic72575328419cdcf82ad3a20a1d9b947538c25c6
Slight adjustment to rules for defining static groups.
Adjustment of small bias towards 0,0 motion in first pass.
Change-Id: Id1d3753979ad54622f983f4de08472738317ec8e
This patch adds in detection of slide show content and allows
for coding of long GF only groups up to a length of 240 frames rather
than coding a large number of shorter ARF groups that gradually
lower the Q.
In test samples this patch gave rise to a substantial improvement in
overall psnr and a drop in data rate. In some cases the average psnr
fell, however, with the boost and minQ values set as they are.
This is to be expected because average psnr is dominated by the
best frames in the sequence and previously a relatively poor key frame
could be followed by progressively better alt refs. For example a key
frame at q7.5 but subsequent alt refs improving it to lossless.
For slides displayed for several seconds, savings of >= 20% (or
commensurate quality gains) are likely.
This patch allows for long GF groups in static sections before and after
complex transitions (e.g. fades) with one or more normal ARF groups
during the transition. However, it enforces a single "normal" length
GF group after the transition before any extended group is allowed.
The reason for this is that the ARF that spans the transition may not have
a very high quality and hence may not be a good GF for the long static
section that follows.
Change-Id: I66cc404c3b85e87dae9829b49d9d631cbf04e037
Ref frame buffer is corrupted but it's not checked before it's used to
compute the reconstructed previous frame buffer.
BUG=webm:1496
Change-Id: Ief0e85b91b19576632685d17c8176c8d29158028
For screen-content real-time CBR mode: on a detected slide change
that is encoded at max Q (to prevent excessive overshoot), increase
the perc_refresh in the cyclic refresh following the slide change.
Use counter to increase refresh up to some #frames from slide change.
This is an attempt to increase quality ramp-up after slide change without
causing too much excess overshoot.
Change-Id: Ie4ec4361082803a522f4a8794b3bb0178c9cf307
This fixes the build with at least GCC 7.3, where it was previously failing
with:
sum_squares_neon.c: In function 'vpx_sum_squares_2d_i16_neon':
sum_squares_neon.c: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
s2 = vpaddl_u32(s1);
^~
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vpaddl_u32(s1);
^
sum_squares_neon.c: incompatible types when assigning to type 'int64x1_t' from type 'uint64x1_t'
s2 = vadd_u64(vget_low_u64(s1), vget_high_u64(s1));
^
sum_squares_neon.c: incompatible type for argument 1 of 'vget_lane_u64'
return vget_lane_u64(s2, 0);
^~
The generated assembly was verified to remain identical with both GCC and
LLVM.
Bug: chromium:819249
Change-Id: I2778428ee1fee0a674d0d4910347c2a717de21ac
For real-time screen content mode: when scene/slide change
is detected and re-encode is decided, force hybrid_intra
mode search if slide change is big and a lot of intra modes
were used. hybrid_intra mode will use rd-based intra mode
search for small blocks.
Overall better PSNR on clip with slide changes, with similar
encoded frame size. Encode time slightly higher on average with
this change.
Change-Id: I503835253b777b9f98d74e75a52a8000b76c310c
Assign the estimated qp for the overlay frame too. Cap the minimum
quantization parameter to be 1 to avoid lossless coding in the
temporal dependency model setup.
Change-Id: I8acbc7182045dbf3017b6712a119b18407b76ab0
This reverts commit 9c2c234a0b.
Causes multithreading test failures in 32 bit configurations.
BUG=webm:1547
Change-Id: Idb480b206a87b7cd6affbafffde8d8e1b6aee621
Set the multiplier for motion estimation using the estimated frame
quantization parameter in the temporal dependency model.
Change-Id: Ia9a843111c1504d7ae8b12113374831ee79c85b8
Gather the available statistics to estimate the frame level
quantization parameter set in a group of pictures. This will be
called in the tpl model construction. No visible coding stats
change would occur.
Change-Id: Ic412e4afd9a60f1317a5f8eab6a4f6d5e48c4c07
For real-time non-rd pickmode: force check of
intra modes on INTER frames for scene changes.
Reduces artifacts on scene changes.
Change-Id: I5ae80869072db156791ace554c0a470f3785e9c6
Send the gf_group index as argument into the function. This
prepares later re-use of this function in the tpl model.
Change-Id: Id6203105629e687172c651a013d38c207b60ace7
libaom commit 80a5b09337a80093e1e7ae5eb540020a22949805:
dec_free_mi: Reset cm->mi_alloc_size.
libaom commit fb0dd0bb80fc95ef016f1421b105a52fffa32816:
Clear cm->width and cm->height on alloc failure.
libaom commit ccb27264089a8cfa1334391ebbcb6a11b8dff442:
Misc. resize fixes along with the resize test
Note: only the change to enc_free_mi in av1/encoder/encoder.c
is merged.
Change-Id: I602813230d40125e59608fa013085dca3e160c33
Use a valid frame rather than the one from the bug to avoid dealing with
trailing data. The decode would fail on x86 due to read size differences
in the entropy decoder.
The updated file was created from the first frame in:
vp90-2-02-size-08x08.webm
BUG=webm:1539
Change-Id: Ibcc2f6fa435bcf360a40fc9a202a8baba42b24da
Add 32x32 Hadamard transform in C implementation. Replace the
forward 32x32 2D-DCT in tpl model with Hadamard transform. This
would reduce the overhead encoding time due to running tpl model
by ~3x.
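For context on why a Hadamard transform is cheaper than a DCT, the sketch below shows a generic in-place fast Walsh-Hadamard transform in one dimension, which uses only additions and subtractions. It is not the library's 32x32 implementation: the function name is made up, and the output ordering and scaling of the real code may differ.

```c
#include <assert.h>
#include <stddef.h>

/* In-place, unnormalized fast Walsh-Hadamard transform of `n` values,
 * where `n` is a power of two. Only additions and subtractions are used. */
static void hadamard_1d(int *v, size_t n) {
  size_t half, i, j;
  assert(v != NULL && n > 0 && (n & (n - 1)) == 0);
  for (half = 1; half < n; half *= 2) {
    for (i = 0; i < n; i += 2 * half) {
      for (j = i; j < i + half; ++j) {
        const int a = v[j];
        const int b = v[j + half];
        v[j] = a + b;        /* butterfly sum */
        v[j + half] = a - b; /* butterfly difference */
      }
    }
  }
}
```

A two-dimensional transform can be built by running this over every row and then every column of the block.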
Change-Id: I1c743dab786b818d89f14928cc3998d056830aa9
Relax the Lagrangian multiplier adjustment limit from 1/4 to 1/2
fluctuation. This allows the temporal dependency model to take more
effect in changing the rate allocation across blocks.
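One reading of the 1/2 fluctuation limit is that the adjusted multiplier must stay within half of the original value on either side. The sketch below implements that reading; the interpretation, the function name, and the integer rounding are all assumptions rather than the patch's code.

```c
#include <assert.h>

/* Clamps an adjusted multiplier to within +/- 1/2 of the original value,
 * i.e. to the range [orig / 2, orig * 3 / 2]. */
static int clamp_rdmult(int adjusted, int orig) {
  const int lo = orig / 2;
  const int hi = orig * 3 / 2;
  assert(orig > 0);
  if (adjusted < lo) return lo;
  if (adjusted > hi) return hi;
  return adjusted;
}
```

Under the previous 1/4 limit the same function would use orig * 3 / 4 and orig * 5 / 4 as its bounds, leaving the model less room to shift bits between blocks.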
Change-Id: Ida59ad628d35f196a1299d96e21bb684c20b0143
avoids duplicate errors should DecompressedFrameHook fail and a
potential endless loop should dec_iter fail to advance.
Change-Id: Ifb2673d02188a8aad75cda8bb960bb56fe70d218
The factor mc_dep_cost includes the intra_cost addition already. Hence
no need to add it again in the denominator.
Change-Id: I750ae86e1d3019b4a3aebd03dec8db362589619e
Use this flag to indicate the temporal dependency model for the
given frame is properly set up.
Use the pointer address to decide if the tpl_stats_ptr array needs
to be released.
Change-Id: I541fe098f51981010011ae0af2535d8a5762d254
It is already initialized at superblock level, but since
it is computed per coding block, based on some speed features,
better to initialize it in pick_inter.
No change in behavior, as currently the speed features
that enable use of source_variance in pick_inter are fixed
at the frame-level.
Change-Id: Ic787ac2f389ba1bced98716096e7b5cffba856a7
fixes an endless loop caused by successful read return on eof.
since:
00a35aab7 vpx[dec|enc]: Extract IVF support from the apps.
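The general pattern behind this kind of fix is that a frame reader must report end of file as a distinct, non-success result so the caller's loop can stop. The sketch below uses a made-up record layout (a 1-byte length followed by the payload), not the IVF format, and hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads one length-prefixed record from `f` into `buf`.
 * Returns 0 on success and 1 on end of file or a truncated record, so a
 * caller looping while the result is 0 always terminates. */
static int read_record(FILE *f, unsigned char *buf, size_t buf_size,
                       size_t *len) {
  unsigned char header;
  assert(f != NULL && buf != NULL && len != NULL);
  if (fread(&header, 1, 1, f) != 1) return 1; /* EOF or error: stop */
  if (header > buf_size) return 1;            /* record too large */
  if (fread(buf, 1, header, f) != header) return 1; /* truncated payload */
  *len = header;
  return 0;
}

/* Counts the complete records in `f`. */
static int count_records(FILE *f) {
  unsigned char buf[255];
  size_t len;
  int count = 0;
  while (read_record(f, buf, sizeof(buf), &len) == 0) ++count;
  return count;
}
```

Had read_record returned 0 at end of file, count_records would spin forever on an exhausted stream.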
BUG=webm:1539
Change-Id: I64dbb94189ea6a745d53a4bacc033f5f58eafb37
Use case is for layered (SVC) coding to allow higher
resolution layers to continue decoding with temporal references,
while base spatial layer is intra-only frame.
Made encoder changes to real-time path for encoding intra-only
frame. The intra-only frame will be followed by the overlay/copy
frame (with both packed in the same superframe).
Use existing control to enable intra_only frame.
Intra only is only applied to base spatial layer, and only
allowed under fixed/non-flexible SVC mode, and only for
1 < number_spatial_layers < 4.
Added svc datarate unittest for inserting intra_only frame
as sync frame. Added svc end to end tests to check mismatch.
Change-Id: I2f4f0106b2c4f51ce77aa2c1c6823ba83ff2f7a0
Signed-off-by: Marco Paniconi <marpan@google.com>
Delete assert that is not valid in all cases.
This can occur if the last group in a clip is a GF only
group. Here the frame count reflects the nominal
positioning of the "next" GF (were it to exist) one
frame beyond the end of the clip.
Change-Id: I0d36b83de0ab478dab032599ee7df7fff4a35cd5
This adds the following command line options to
vp9_spatial_svc_encoder test app:
--drop-frame=<arg> Temporal resampling threshold (buf %)
--tune-content=<arg> Tune content type default, screen, film
--inter-layer-pred=<arg> 0 - 3: On, Off, Key-frames, Constrained
Change-Id: I653d1924fb6e525edb2d1e84739be0b88e773e1c
For screen-content: use the previous actual number of seg
blocks for the segment weight, used in the rate control
for setting frame-level Q.
Small overall increase in psnr on several screen-content clips.
Change-Id: Id414fb7f1b0ba578d464437d7f9c1783a0cad310
Reset segment to base (segment#0) on spatially flat
stationary blocks (source_variance = 0). Also increase
dc_skip threshold for these blocks.
Reduces artifacts on flat areas in screen content mode.
Change-Id: I7ee0c80d37536db7896fa74a83f75799f1dcf73d
Reset the last_coded_q_map and the sb->index in the cyclic_refresh
on a re-encode for slide change, so the refresh can start again
right after slide change.
Change-Id: I10cbc8354de8f7c2863b4212e6793b58a048b330
Add scene detection flag to choose_partitioning to force split
of 64x64 block partition. This reduces artifacts on slide changes.
Bug:b/110978869
Change-Id: I9cc79a7c03f3aa2edeb28656b09a2177b72d59a8
Adapt the Lagrangian multiplier based on the spatial variance in
the temporal dependency model. The functionality is disabled by
default. To turn on, set enable_tpl_model to 1.
Change-Id: I1b50606d9e2c8eb9c790c49eacc12c00d3d7c211
Use the avg_frame_low_motion to reduce/turnoff this
early exit for higher motion content. Get some quality
back for higher motion clips and keep the same exit
thresh for low motion clips.
Change-Id: I95daf754dc0048b3e935d1a753f7f1101e6ffb77
ARGBToRGB24Row_AVX512VBMI fails to compile on Mac:
row_gcc.cc: instruction requires: AVX-512 VBMI ISA AVX-512 VL ISA
BUG=libyuv:789
Change-Id: Ibd584e8c82e3ce86ec5460b4243f84f5dbdf4c81
This patch relates to motion artifacts as described in Issue 73484098
The aim of this patch is to promote the use of smaller partition
sizes in places where some of the sub blocks have very low
spatial complexity and some have much higher complexity.
The patch can have a small impact on encode speed, but much
less than alternative approaches such as lowering the rd thresholds
that limit the partition search when distortion is low.
The patch also applies a similar sub block strategy for AQ1.
Metrics results for our standard sets over typical YT rates.
(Overall PSNR, SSIM, PSNR HVS) % -ve better.
Low Res  -0.274, -0.303, -0.330
Mid Res   0.001, -0.128, -0.100
Hd Res   -0.236, -0.371, -0.349
N 2K     -0.663, -0.798, -0.708
N 4K     -0.488, -0.588, -0.517
Change-Id: Ice1fc977c1d29fd5e401f9c7c8e8ff7a5f410717
The avg_frame_low_motion metric is only computed on the
top spatial layer, and since it's part of the layer context
struct, it needs to be written to all lower spatial layers for
consistency.
Small/minor change in metrics.
Change-Id: I92a001c37aeb332e613212288b13a2ed9745af88
For SVC: apply the sse_zeromv early exit also to
the case where golden is second temporal reference.
Set the thresh_svc_golden threshold for this case.
This is to reduce the encode time for the case where golden
is second temporal reference for SVC.
Change-Id: I8c0c87dd746579d3c4f5e983c7f9dd0a1e1476e0
vpx_quantize_b:
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 Old VSX time = 8.1 ms, new VSX time = 7.9 ms
vp9_quantize_fp:
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 Old VSX time = 6.5 ms, new VSX time = 6.2 ms
Change-Id: Ic2183e8bd721bb69eaeb4865b542b656255a0870
Low bit depth version only. Passes the Trans32x32Test test suite.
Trans32x32Test Speed Test (POWER9 Model 2.2)
32x32 C time = 212.7 ms (±0.1 ms), VSX time = 82.3 ms (±0.0 ms) [2.6x]
Change-Id: If906ec9b56ce3818cae0cc462c7277284ab29859
Enable ML based partition search breakout on speed 0 when frame
resolution is less than 720p and bitdepth is 8.
Compression performance change is neutral.
Tested encoding speed over 20 480p sequences:
Speed gain(%)  QP=30  QP=40  QP=50  QP=60
max            14.4   18.6   17.8   24.4
average         4.6    9.0    8.0   13.2
Change-Id: Ia0d2947030ac774dc1533eb27ffc57f5b788a6ce
Disable the cyclic refresh for very low average Q.
This reduces encoded bitrate for static slides after the
quality has ramped up well enough (low Q). And as the
cyclic refresh is not needed at low Q in most cases, this
has minimal/no effect on quality on RTC set.
Change-Id: Id6d449aa2351bb6886d72aafb2d406e967ed2789
Fixes to nonrd coding mode for lossless mode: keep
skip_txfm to 0 (no skip) and disable the encoder breakout.
This makes the encoding lossless when that mode is selected
for real-time (nonrd pickmode).
Also disable the cyclic refresh for lossless mode.
Change-Id: I20a11ef6df08accec472d26fabebd14d51f4d337
Fix condition in frame dropper for SVC to handle case
where a spatial layer is skipped, i.e. not encoded (due to 0 bitrate).
Change-Id: I24185178774d73e8bb1c406acc0292422dfbe174
Create tests for sync layer. The purpose of new tests is not to check
bitrate targeting, thus they're put in a new file.
Create a base class for svc tests, which is also inherited by svc datarate
tests, to reduce code redundancy.
Start decoding in the test from the frame of layer sync.
Change-Id: I7226d208279ad785873dffef51e0a8abef23b256
Previous CLs have implemented the construction of the hierarchical
structure at the encoder side. This CL is to define and configure the
corresponding flags that will guide the reference frame update according to
the constructed hierarchical structure.
Change-Id: Iae55f2400f7c7beff41feff9308f87bfc70c7b21
This CL is to hook up the implemented hierarchical structure
construction as well as its corresponding bitrate allocation
functionality with the defining of a GF group.
Currently the hierarchical structure is off by default. Hence this CL
has no impact on coding performance.
Change-Id: I9e1ddfd877559e99072c23970f7fe103b64ed9ee
Fix mingw builds for x86_32 by updating past:
https://chromium.googlesource.com/libyuv/libyuv/+/8fa02df3c0591754958a50
Pick up upstream fixes for clang 5 builds with --disable-optimizations.
Disable libyuv by default when building for msa. We have not been able
to update libyuv because of build issues with mips. This can be
revisited when we update the mips compiler used in Jenkins.
BUG=webm:1509,libyuv:793,webm:1514,webm:1518
Change-Id: Id0b9947cb5e0aa74f2f74746524ab6ff2d48796f
This CL migrates the bit allocation scheme from libaom and combines the
scheme for hierarchical layer with the updated scheme in libvpx that
uses a modified scheme to calculate the target bitrate per frame.
Change-Id: I63593ed528abd4a6a1a8681abf6c9cf06c7a2ee0
These fail to build with clang on 32 bit with
--disable-optimizations
Upstream libyuv has addressed these and we will get updated
versions on the next roll. At the moment, we don't use
libyuv for copying alpha data and so this is a quick fix.
BUG=webm:1514
Change-Id: I0040c3ae048f8d896c2082deeb2e32070a32c453
for q-index between 150 and 200.
Previously the ML based breakout feature is only supported for q-index
larger than 200.
This only affects speed 1 and 2, resolution under 720p, q-index between
150 and 200, low bit-depth.
Compression performance change is neutral.
Encoding speed gain is up to 30% for speed 1;
up to 20% for speed 2.
Results from encoding city_4cif_30fps:
speed 1, QP=38
before: 37.689 dB, 41007b/f, 2.91 fps
after: 37.687 dB, 40998b/f, 3.46 fps
speed 1, QP=48
before: 35.959 dB, 22106b/f, 3.66 fps
after: 35.950 dB, 22118b/f, 4.83 fps
speed 2, QP=38
before: 37.630 dB, 40999b/f, 4.42 fps
after: 37.633 dB, 41063b/f, 4.63 fps
speed 2, QP=48
before: 35.905 dB, 22177b/f, 4.90 fps
after: 35.889 dB, 22145b/f, 5.92 fps
Change-Id: Ibd4a2f4d7093fb248ab94ddd388cbaa8de2c5ef7
Add encoder control to allow application to insert
spatial layer sync frame. The sync frame disables
temporal prediction for that spatial layer.
This is useful for RTC application to have receiver
start decoding a higher spatial layer, without inserting
a key frame on base spatial layer.
If the layer sync is requested on the base spatial layer,
this then forces a key frame; otherwise it only disables
the temporal reference for that spatial layer, allowing
temporal prediction to continue for the other layers.
Although the temporal prediction is disabled and reset
on a layer sync frame, the inter-layer prediction for the
sync frame is enabled on INTER frames. So the meaning of
INTER_LAYER_PRED_OFF_NONKEY is modified to mean disable
inter-layer prediction on non-key and non-sync frames.
Added unittest for inserting layer sync frames.
Bump up ABI version.
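A minimal sketch of the layer sync semantics described above, written in Python for illustration only. The function names and return shapes are hypothetical and are not part of the libvpx API.

```python
def plan_layer_sync(sync_layer, num_spatial_layers):
    """Return (force_key_frame, temporal_pred_enabled_per_layer).

    Hypothetical helper: a sync request on the base spatial layer (0)
    forces a key frame; on any other layer it only disables temporal
    prediction for that layer.
    """
    if sync_layer == 0:
        # Key frame: no layer uses temporal prediction on this superframe.
        return True, [False] * num_spatial_layers
    temporal = [sl != sync_layer for sl in range(num_spatial_layers)]
    return False, temporal


def inter_layer_pred_allowed_off_nonkey(is_key_frame, is_sync_frame):
    """Meaning of INTER_LAYER_PRED_OFF_NONKEY after this change:
    inter-layer prediction is off only on non-key, non-sync frames."""
    return is_key_frame or is_sync_frame
```

With 3 spatial layers, a sync on layer 2 leaves temporal prediction running for layers 0 and 1 and does not force a key frame.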
Change-Id: Id458acc400a77c853551f125c4e7b6d001991f03
Keep denoiser and skin detection disabled since some key functions don't
work with >8 bits source.
Add test for HBD with denoiser and cyclic refresh enabled to make sure
nothing crashes.
BUG=webm:1534
Change-Id: Id61fe1e38ed1768f273870a6bdd5f163aa769fe4
This commit builds up the temporal prediction dependency propagation
within the group of pictures.
Change-Id: Id04cfc0323e6a5c4ac4a570d53e20d1229b3ee11
Compute the coding block partition mode cost as additional rdcost
to the cumulative rate-distortion cost from each coding block. This
changes the coding performance slightly due to the rounding error.
The compression performance change is neutral.
Change-Id: Ibdccae0e79263a0e70af7592a8cb11458d795f8d
Use a linear model to make partition search breakout decisions.
Currently the model is tuned for large quantizers and small resolutions.
So it is only used when q-index is larger than 200 and frame
width/height is smaller than 720. Also it's not yet supported for high
bit depth.
Tested speed 1 and 2 on lowres and midres. Compression performance is
neutral. At low bitrates, encoding speedup is up to 50% for speed 1;
up to 30% for speed 2.
Some sample numbers:
into_tree_480p, speed 1
QP=60 before: 35.228 dB, 3488b/f, 7.78 fps
now: 35.217 dB, 3475b/f, 11.57 fps
QP=50 before: 37.492 dB, 7983b/f, 6.24 fps
now: 37.491 dB, 7974b/f, 7.55 fps
PartyScene_832x480_50, speed 1
QP=60 before: 30.104 dB, 22426b/f, 3.28 fps
now: 30.109 dB, 22410b/f, 4.43 fps
QP=50 before: 33.016 dB, 46984b/f, 2.78 fps
now: 33.018 dB, 46998b/f, 3.35 fps
into_tree_480p, speed 2
QP=60 before: 35.175 dB, 3506b/f, 10.96 fps
now: 35.185 dB, 3510b/f, 13.47 fps
QP=50 before: 37.448 dB, 8016b/f, 9.04 fps
now: 37.459 dB, 8048b/f, 9.81 fps
PartyScene_832x480_50, speed 2
QP=60 before: 30.060 dB, 22537b/f, 4.42 fps
now: 30.061 dB, 22541b/f, 5.38 fps
QP=50 before: 32.923 dB, 47134b/f, 3.85 fps
now: 32.920 dB, 47073b/f, 4.31 fps
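A minimal sketch of a linear breakout decision with the gating described above, in Python for illustration. The weights, bias, threshold and feature list are placeholders, not the trained model, and reading "frame width/height is smaller than 720" as min(width, height) < 720 is an assumption.

```python
def ml_breakout_allowed(q_index, width, height, high_bit_depth):
    """Gating from the commit message: large q-index, small frame,
    low bit depth only. min(width, height) is an assumed reading."""
    return q_index > 200 and min(width, height) < 720 and not high_bit_depth


def linear_breakout(features, weights, bias, threshold=0.0):
    """Hypothetical linear model: break out of the partition search
    when the weighted feature sum reaches the threshold."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return score >= threshold
```

The gate is checked first, so the model is never consulted outside the range it was tuned for.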
Change-Id: I674cba4f027c4c65f7837d5ec9179d6201e6ba86
Support intra prediction mode search to find the best intra mode
cost for temporal dependency model building.
Change-Id: Ie62d6af8d0c9f65dee742876f3af9cdd5e3f1d63
Support the motion compensated prediction search to find the motion
trajectory and hence to build the temporal dependency model.
Change-Id: I861ea85a0d4cc2897cb0dfe2e95378bf7d36209f
clang-6 seems to support it out of box.
E.g. VP9SubtractBlockTest.DISABLED_Speed with the workaround:
[ BENCH ] 4x4 286.5 ms ( ±0.2 ms )
Without:
[ BENCH ] 4x4 215.2 ms ( ±0.9 ms )
Change-Id: I28b3a2cc93c0d72f52f5a48cc06d8ed4ef26913f
Moves the check into a function, check_gcc_avx512_compiles,
that behaves somewhat similarly to check_gcc_machine_options.
Change-Id: I2bef3ddd98e636eef12d9d5e548c43282fac7826
Schedule the frame processing to construct temporal dependency
statistics within a group of pictures. Align the corresponding
reference frames.
Change-Id: I8969f5c335a4a5c2614f4530b636fe13a25a8a98
This CL separates the defining of the GF group structure from the
handling of its bitrate allocation. The encoder performance should stay
unchanged.
Change-Id: Ib77967757702bb4b284034e429d4c41ae86d0838
The model construction would incur 15% slowdown for speed 2. The
speed change on speed 0 is unnoticeable.
The current speed features set up would DISABLE temporal dependency
model for all speed settings.
Change-Id: Ic45dd962f3a54a8f5f0452502dc05e352dc09ca1
The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes.
SADTest Speed Test (POWER8 Model 2.1)
16x8 Old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16 Old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32 Old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16 Old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32 Old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64 Old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32 Old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64 Old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]
Change-Id: I51776f0e428162e78edde8eac47f30ffd2379873
Following are completed in defining GF group structure in firstpass:
1. Remove redundant alt_frame_index;
2. Remove hard coded index value with the variable of frame_index.
Change-Id: I7b56e454559bbf704afc7410ea9832b20ffcd57e
Cast the counter to uint64_t in case it overflows.
The assert was to prevent c[0] * Pfac from overflowing beyond unsigned int
since Pfac could be 2^8. Thus c[0] needs to be smaller than 2^24.
In VP9, the assert was removed and c[0] was cast to uint64_t.
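The arithmetic behind the bound can be checked with a small Python sketch that emulates C unsigned multiplication by masking. The helper names are hypothetical; only the values 2^8 and 2^24 come from the message above.

```python
UINT32_MASK = (1 << 32) - 1
UINT64_MASK = (1 << 64) - 1


def mul_u32(a, b):
    """Multiply as C does with 32-bit unsigned operands (wraps mod 2^32)."""
    return (a * b) & UINT32_MASK


def mul_u64(a, b):
    """Multiply after widening the counter to 64 bits, as the fix does."""
    return ((a & UINT64_MASK) * b) & UINT64_MASK


PFAC_MAX = 1 << 8        # largest Pfac mentioned in the commit message
COUNTER_LIMIT = 1 << 24  # c[0] must stay below this for 32-bit math
```

A counter of exactly 2^24 times 2^8 wraps to 0 in 32 bits, while the widened product keeps the full value 2^32.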
Bug: 805277
Change-Id: Ic46a3c5b4af2f267de4e32c1518b64e8d6e9d856
VSX versions of the SAD functions of width 8.
SADTest Speed Test (POWER8 Model 2.1)
8x4 C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8 C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16 C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
Change-Id: I843f3b34e103b72deeade4a939193d8b53cee460
Speed tests are added for the SADTest test suite. These tests use the
AbstractBench and print the median run time of SAD operations. Speed
tests are disabled by default.
Change-Id: I5d0957248f9b5b307ae2d757d5f8d4761a1dd712
When golden was the inter-layer reference, a block that selected the golden ref
would not be denoised.
But when golden is used as a second temporal reference then we should denoise
blocks that select the golden reference.
This change allows for that.
Change-Id: Ifdea2ac88f6a74f73520fedcd7fec2f32c559ec9
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
32x32 C time = 93.1 ms (±0.4 ms), VSX time = 6.5 ms (±0.2 ms) [14.4x]
Change-Id: I7f1fd0fc987af86baf2b74147a25aee811289112
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Change-Id: Ic09bc786c57cc89bba14624064216b52996075eb
When the second (gf) temporal reference is used in SVC:
the reference is refreshed on base TL superframes, and so
the rc->frames_since_golden counter was also only updated on
base TL frames. But this was disabling the golden reference
from being used as a temporal reference for TL > 0 frames
(since frames_since_golden was 0/not updated on TL > 0 frames).
Fix is to copy the update of rc->frames_since_golden to all
upper temporal layers. This allows TL > 0 frames to test the
golden inter mode.
Gain on RTC set: ~2%, ~8% on desktop_vga clip.
Encode time increase ~5-8% on linux, 3SL-3TL run with 1 thread.
For now keep this off for TL > 0 frames in speed features, so
this change does not change current behavior for speed >= 7.
Change-Id: I405708f3f80039ae47bd64ec53e66f92160acd9e
Terminate early and skip neural net model when linear score is already
high enough, which indicates that we should not skip split and
rectangular partitions.
No changes on compression; encoding speed improves slightly.
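A minimal sketch of the early termination, in Python for illustration. The callables, the threshold and the sign convention of the neural net output are all assumptions; only the control flow (skip the neural net when the linear score is already high) follows the message above.

```python
def should_prune(features, linear_model, neural_net, high_score_thresh):
    """Return (prune, nn_evaluated).

    Hypothetical convention: a high linear score means split and
    rectangular partitions should still be searched, so nothing is
    pruned and the neural net is never run.
    """
    linear_score = linear_model(features)
    if linear_score >= high_score_thresh:
        return False, False
    # Assumed rule for the sketch: a negative net output means prune.
    return neural_net(features) < 0.0, True
```

The saving comes from the first branch: the more expensive model runs only when the cheap score is inconclusive.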
Change-Id: I4e0995090200eb4889344da905d2f7048673af5f
For the feature of using second temporal reference (when
inter-layer is off): move the buffer_idx assignment and
refresh flag settings further down to vp9_rc_get_svc_params(),
since is_key_frame is set there for every frame/layer.
Otherwise it was using the setting from the previous frame/layer.
This makes the refresh more consistent for both layers for
2 spatial layers case.
Small/negligible change in metrics.
Change-Id: I88279243bc27898448e8891dba38143d936cf6d5
~2x speedup or better.
[ RUN ] C/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 365.1 ms ( ±2.2 ms )
[ BENCH ] 8x4 258.5 ms ( ±0.3 ms )
[ BENCH ] 4x8 202.7 ms ( ±0.2 ms )
[ BENCH ] 8x8 162.2 ms ( ±0.5 ms )
[ BENCH ] 16x8 138.8 ms ( ±0.3 ms )
[ BENCH ] 8x16 121.5 ms ( ±0.4 ms )
[ BENCH ] 16x16 110.2 ms ( ±0.5 ms )
[ BENCH ] 32x16 104.8 ms ( ±0.1 ms )
[ BENCH ] 16x32 32.7 ms ( ±0.1 ms )
[ BENCH ] 32x32 30.0 ms ( ±0.0 ms )
[ BENCH ] 64x32 28.7 ms ( ±0.0 ms )
[ BENCH ] 32x64 20.1 ms ( ±0.0 ms )
[ BENCH ] 64x64 19.3 ms ( ±0.0 ms )
[ RUN ] VSX/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 155.3 ms ( ±0.9 ms )
[ BENCH ] 8x4 99.3 ms ( ±0.4 ms )
[ BENCH ] 4x8 77.2 ms ( ±0.1 ms )
[ BENCH ] 8x8 45.7 ms ( ±0.0 ms )
[ BENCH ] 16x8 34.1 ms ( ±0.0 ms )
[ BENCH ] 8x16 29.5 ms ( ±0.0 ms )
[ BENCH ] 16x16 19.9 ms ( ±0.0 ms )
[ BENCH ] 32x16 15.1 ms ( ±0.0 ms )
[ BENCH ] 16x32 16.7 ms ( ±0.0 ms )
[ BENCH ] 32x32 14.1 ms ( ±0.0 ms )
[ BENCH ] 64x32 12.6 ms ( ±0.0 ms )
[ BENCH ] 32x64 12.0 ms ( ±0.0 ms )
[ BENCH ] 64x64 11.2 ms ( ±0.0 ms )
Change-Id: I89ce12b6475871dc9e8fde84d0b6fe5c420c28c7
Some compiler releases allow the -mavx512f arg without actually
implementing support. Test for this situation, and disable avx512
when it is detected by configure.
BUG=webm:1536
Change-Id: I63952153bb4b24aa9f25267ed47a0fe845d61f8b
When inter-layer prediction is disabled on INTER frames, allow
for next highest resolution to have second temporal reference.
Current code allowed for only top/highest spatial layer.
Change-Id: I102137273e3e4d57512a13d95e8ccb9c5b0a7b4b
For mode where second temporal reference is used in SVC: allow
for using/testing this reference (golden ref) in the variance
partition scheme (choose_partitioning).
Small positive gain (~0.25%) on metrics for 3 layer SVC,
negligible change in speed.
Change-Id: I29b8315da530e60db3d6c90faa8fb178d9f2de26
When inter-layer is disabled on INTER frames, this will allow
use of a second (longer term) temporal reference for SVC.
Only enabled on highest resolution spatial layer.
Average gains of ~4% on RTC set, speed decrease of about ~2%.
Change-Id: I3c2d415653c448eb7269c828e120fe8bb2ef3f97
For the case where a second (long term) temporal reference is
used in the SVC: this additional parameter is to make sure the
buffer slot selected for this reference is available for usage,
i.e., it is never used for any of the 3 references set for the
fixed SVC patterns.
And some code cleanup (replace cpi->svc).
No change in behavior.
Change-Id: Icba46edfbbefb94d5ea8e2d5c24cccd85a406ee6
When resize happens and cyclic refresh is not applied on the
current (resized) frame, the sb_index is not reset and then
might be out of boundary on future frames when the
cyclic refresh is applied.
Change-Id: I05282fc4bc2323522d60e019ed0790d69221a2f7
functions: upper camelcase
members: lowercase with trailing '_'
decl order: functions (overrides marked virtual), members
after:
656e8ac61 VSX version of vpx_post_proc_down_and_across_mb_row
766d875b9 VSX version of vpx_mbpost_proc_ip
35e98a70b VSX version of vpx_mbpost_proc_down
b2898a9ad Bench Class For More Robust Speed Tests
Change-Id: Ib257bd607c5c1248d30e619ec9e8a47cc629825b
Allow for second temporal reference for top spatial layer in SVC,
when inter-layer prediction is disabled on INTER frames.
The second temporal reference is labelled as the golden reference
and the update/refresh of this reference buffer is only on base
temporal layer superframes. For now the period of refresh is
fixed at every 20 TL0 superframes.
Average gain is ~4% on RTC set, several clips up
by ~8-12%. Speed loss is about ~2% on mac.
Feature is disabled as default for now.
Change-Id: I2e5db5052c62dbe958a3b14be97d043823b7a529
Fixed some settings in nonrd pick mode to allow for frame-level bilinear
to be set.
On Galaxy S8+ it has 4% speed up on high motion clips. Almost the same
for low motion.
0.17% quality loss on RTC.
Change-Id: I044a7de020183754ba08bb6c96c5a78ba5c7fea2
Add condition of LAST frame to the consec_zeromv and
avg_frame_low_motion metrics. This is needed for SVC as
the golden reference is a spatial reference and should
not be included in the metric computation.
Small/negligible change in metrics on RTC set.
Change-Id: I6ea16298fae566bb288c34cf50d120b509146eee
Low bit depth version only. Passes the VpxPostProcDownAndAcrossMbRowTest
VpxMbPostProcAcrossIpTest Speed Test (POWER8 Model 2.1)
C time = 121.3 ms (±4.0 ms), VSX time = 9.4 ms (±0.3 ms) [12.9x]
Change-Id: I28300779e197ea3855cf30867d17a2805388b447
Add a neural net model that uses the same features as the existing
linear model. Make the pruning decision based on both the linear
and the neural net model. It provides more accurate predictions,
and may improve compression and/or encoding speed.
This only affects speed 0.
Coding gain:
0.37% on midres
0.34% on hdres
0.50% on jvet8b720p
Encoding speed impact(average over locally tested 20 clips from midres
and hdres):
QP=20: down by 2.5%.
QP=30: down by 3.9%.
QP=40: down by 4.5%.
QP=50: up by 5.2%.
Change-Id: I402ec799745ad3b74abf0789fa5e124fe64e704d
The avg_frame_low_motion and consec_zeromv are frame-level
metrics that are updated on every frame. For SVC these should be
updated on top spatial layer (full resolution).
Small/negligible change in metrics.
Change-Id: Ibe14f05be3b82daa9dd60378097ff11a27f1b95e
This is a combination of the following 3 reverts. The changes cause
issues on certain hardware devices. We'll pull them for now to allow for
further investigation.
Revert "Experiment regarding playback problems on Bravia TVs."
This reverts commit 624f8105f5.
Revert "Improved slide show coding"
This reverts commit f4091bc30e.
Revert "Improved coding on slide show content."
This reverts commit 2fa333c2ae.
BUG=b/77492144
Change-Id: Ifba937792d644a9286307262f050216408e8ecf4
For CBR mode with aq-mode=3: reduce delta-q for second
segment and limit how much the frame-level q can decrease
from one frame to the next.
Reduces bitrate spikes in slide/screen content.
Change-Id: Id9ac4b7270f07e09690380755cfbef4aec5c26dc
Low bit depth version only. Passes the VpxMbPostProcAcrossIpTest.
VpxMbPostProcAcrossIpTest Speed Test (POWER8 Model 2.1)
C time = 188.5ms (±0.2ms), VSX time = 65.2ms (±0.1ms) [2.9x]
Change-Id: I1cf72365d94a9d7f1e9323925a87a30e3bd5cfe2
Low bit depth version only. Passes the VpxMbPostProcDownTest.
VpxMbPostProcDownTest Speed Test (POWER8 Model 2.1)
Full calculations:
C time = 195.4 ms, VSX time = 33.7 ms (5.8x)
Change-Id: If1aca7c135de036a1ab7923c0d1e6733bfe27ef7
To make speed testing more robust, the AbstractBench runs the
desired code multiple times and reports the median run time with
mean absolute deviation around the median.
To use the AbstractBench, simply add it as a parent to your test
class, and implement the run() method (with the code you want to
benchmark).
Sample output for VP9QuantizeTest
[ BENCH ] Bypass calculations 4x4 165.8 ms ( ±1.0 ms )
[ BENCH ] Full calculations 4x4 165.8 ms ( ±0.9 ms )
[ BENCH ] Bypass calculations 8x8 129.7 ms ( ±0.9 ms )
[ BENCH ] Full calculations 8x8 130.3 ms ( ±1.4 ms )
[ BENCH ] Bypass calculations 16x16 110.3 ms ( ±1.4 ms )
[ BENCH ] Full calculations 16x16 110.1 ms ( ±0.9 ms )
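The reported statistic (median run time with mean absolute deviation around the median) can be sketched in Python as follows. This only illustrates the statistic and is not the AbstractBench code; the function name is hypothetical.

```python
import statistics


def summarize_runs(times_ms):
    """Return (median, mean absolute deviation around the median)."""
    med = statistics.median(times_ms)
    mad = sum(abs(t - med) for t in times_ms) / len(times_ms)
    return med, mad
```

The median is used so that a single slow outlier run (for example, from a scheduling hiccup) does not shift the reported time.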
Change-Id: I1dd649754cb8c4c621eee2728198ea6a555f38b3
Move frame dropper to after scene detection and noise estimation.
Scene detection and noise estimation operate on source data and
update metrics along sequence, so they should be moved before
the frame dropper.
Also we don't want to drop on scene change, as the scene detection
and (possible) re-encode step will be missed.
Change-Id: I3d9e16d785bd5ace6707db2abce77ddc110bfef4
For the max_consec_drop parameter in svc frame drop:
since passing value 0 in the control would completely
disable the dropper, only allow for values >= 1 to be set.
Change-Id: I6b74ec9cc08a638fa571d6246a021dab9c811d14
The pointer in vp8 postproc refers to show_frame_mi which is only
updated on show frame. However, when there is a no-show frame which also
changes the size (thus new frame buffers allocated), show_frame_mi is
not updated with new frame buffer memory.
Change the pointer in postproc to mi which is always updated.
Bug: 842265
Change-Id: I33874f2112b39f74562cba528432b5f239e6a7bd
For screen content mode: changes to reduce occurrence of
significant QP decrease (from one frame to next),
which can cause large frames (overshoot/delay).
-cap the buffer increase to optimal level for frame drop
mode where full superframe can drop
-reduce the max_adjustment_down due to buffer overflow
-reduce qp threshold to trigger re-encode on large frame
Change-Id: I3e30e4814192b5f728abff3f7359eb64f561b8f0
In vp9_svc_constrain_inter_layer_pred() we disable the
inter_layer prediction if anything but only the previous
spatial layer (from same superframe) is used for inter_layer
prediction. This check and disabling was only allowed when
the control VP9E_SET_SVC_INTER_LAYER_PRED is set to
INTER_LAYER_PRED_ON_CONSTRAINED.
But the control VP9E_SET_SVC_INTER_LAYER_PRED is needed for setting:
INTER_LAYER_PRED_ON/INTER_LAYER_PRED_OFF/INTER_LAYER_PRED_OFF_NONKEY.
So there is a conflict with setting INTER_LAYER_PRED_ON_CONSTRAINED.
Fix for now is to always allow for this disabling check
(disable inter_layer reference if it's not previous spatial layer) as
long as inter_layer prediction is used (i.e., not set to _OFF).
A separate fix if needed may be to invoke another control for setting
INTER_LAYER_PRED_ON_CONSTRAINED.
This was causing an issue with enabling spatial layers on the fly
(say spatial layer 2), where since INTER_LAYER_PRED_ON_CONSTRAINED was
not set (default), the inter_layer prediction was then using a reference
from 2 spatial layers below (spatial layer 0).
Change-Id: Ic6434000665f63aab27c509b5eb7b8fc965827bc
When encoding a given spatial layer and the same spatial layer
on previous superframe was dropped (or disabled due to 0 bitrate),
the lst_fb_idx for current layer is set to the buffer index that
was last updated on TL0 frame (for the same spatial layer).
This condition was to maintain proper temporal prediction pattern
under frame drops, and it should only apply to INTER frames.
But the condition was causing an assert to be triggered on spatial
layers whose base are key frames. Fix is to condition this reset of
lst_fb_idx on the "is_key_frame" flag. Also initialize the
fb_idx_upd_tl0 to -1 and only use it for a given spatial layer
if it's been set.
These issues can happen when superframe drop happens just before
a key frame, or when stream starts with lower layers and dynamically
enabled higher spatial layers.
Added datarate unittest that inserts key frame after superframe drop,
and verified that this fix is needed for test to pass.
Also modified the existing DisableEnable spatial layer test to trigger
the issue of using fb_idx_upd_tl0 when it hasn't been set for a
spatial layer.
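A minimal sketch of the guarded reset described above, in Python for illustration. The function signature is hypothetical; the variable names mirror the ones used in the message, and -1 is the "not set" value it introduces.

```python
FB_IDX_UNSET = -1  # initial value of fb_idx_upd_tl0 per the message


def select_lst_fb_idx(lst_fb_idx, fb_idx_upd_tl0, prev_dropped, is_key_frame):
    """Pick the LAST buffer index for the current spatial layer.

    The reset to the last TL0-updated buffer applies only to INTER
    frames, and only when fb_idx_upd_tl0 has actually been set.
    """
    if prev_dropped and not is_key_frame and fb_idx_upd_tl0 != FB_IDX_UNSET:
        return fb_idx_upd_tl0
    return lst_fb_idx
```

Both guards matter: the key frame check covers a drop just before a key frame, and the unset check covers a spatial layer that was enabled on the fly.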
Change-Id: I059d1135736aca17e1326b9b4a2b16371eb4634e
This patch experimentally reduces the maximum GF interval for
static content such as slide shows.
It does not fully revert the previous slide show patches as this
still allows the codec to code static sections only using GFs
groups rather than ARF groups or a mix of ARF and GF groups.
However, the maximum group length is reduced.
Change-Id: Ia968b608efb9a67d2402b12e979695d58ddc1ad7
Has some effect for SVC on base spatial layers (which only
reference LAST) or on upper spatial layers when inter_layer
prediction is disabled.
Small speedup on Mac of ~1%, for 3 layer SVC with inter-layer
prediction disabled.
Change-Id: I05be5da8843e0d32e9d85f6eb951cf1894e781d8
Keep a lower rate threshold for video case.
Also lower the exiting threshold somewhat for screen-content mode.
Change-Id: I79649a36678d802fd4d4080754fd366e78904214
The compression performance change is +/-0.01% for both speed 0/1.
Locally tested the encoding speed:
ped_1080p 150 frames speed 0
79544 b/f 41.339 dB 503072 ms ->
79566 b/f 41.338 dB 493009 ms.
speed 1
79789 b/f 41.152 dB 104583 ms ->
79770 b/f 41.153 dB 102607 ms
Change-Id: Ief200b613608643e5708cebe979982eb4a84831b
Disable 8x8 blocks for higher resolutions,
reduce mv_thresh for 1/2 subpel motion, and
disable golden reference at superblock level
based on source sad and motion content.
~6% loss in RTC metrics over current speed 9.
Speedup about ~10% for high motion clip on linux.
Change-Id: I7ff8f81ac93ee8a90d5a1f4837c955d000bd75e7
Low bit depth version only. Passes the VP9QuantizeTest.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
Full calculations:
C time = 1456 ms, VSX time = 80 ms (18x)
Change-Id: I1b1d6d03b1aeff63640efbdeb222cab857ddd95e
Enable alt_ref and compound prediction at speed 5.
For 1 pass VBR mode, when lag > 0.
Gain for Live set: ~3% gain on average, several
clips have gains ~5-15%.
Encoder fps decrease ~5-10%, on desktop with 4 threads.
For now enable it only for resolutions <= 1280x720.
Change-Id: I25e3d61a2244a3a01962624052c5adf4837965c7
Set subpel search stop to 2 when motion vector is non zero.
10% speedup on 1 and 2 threads on Samsung Galaxy S8+.
Change-Id: I7323bb913000229cf60a37495bf88bcc51d0ac96
Remove some unused code and add parameter to keep track
of the layer_id of the frame buffer indices last refreshed.
This is useful for verifying constraints on spatial-temporal pattern,
for fixed/non-flexible mode.
Change-Id: I6957bb43157eb31df49dac1b8245facc043e4a49
When the whole superframe is dropped (due to rate control),
don't increment the temporal layer counter.
This is a temporary fix to prevent an issue where temporal
prediction pattern is possibly broken.
Updated svc_datarate tests to handle this case.
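A minimal sketch of the counter behavior, in Python for illustration. The temporal layer pattern used in the example is a hypothetical 3-layer pattern and the function is not libvpx code.

```python
def temporal_layer_sequence(pattern, dropped):
    """Return the temporal layer id per superframe (None when dropped).

    The counter advances only for superframes that are encoded, so a
    full superframe drop does not skip a step of the pattern.
    """
    counter = 0
    ids = []
    for is_dropped in dropped:
        if is_dropped:
            ids.append(None)  # nothing encoded; counter unchanged
            continue
        ids.append(pattern[counter % len(pattern)])
        counter += 1
    return ids
```

For the pattern [0, 2, 1, 2] with the second superframe dropped, the third superframe is coded as temporal layer 2, the step the dropped superframe would have taken.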
Change-Id: Icac44fdc9d0f08a957776c937584db4b2c7927c7
Remove big endian PowerPC 64 from configure, as this build is problematic and
not supported. PowerPC 64 will be limited to little endian (ppc64le).
BUG=webm:1525
BUG=webm:1508
Change-Id: Id6a86d5913192549e03ac8f77879ba7526b752c8
Process 16 coefficients on the first iteration (a full 4x4) and 24 coefficients
on subsequent iterations.
VSX/VP9QuantizeTest.DISABLED_Speed
Before:
4x4 176 ms
8x8 91 ms
16x16 72 ms
After:
4x4 152 ms
8x8 82 ms
16x16 64 ms
Change-Id: I07cb130833504206ccdc5bc12ae5af369364999a
Condition some early exits in nonrd-pickmode on the
motion vector, to make sure we always test (0, 0) for
inter-layer prediction.
Change-Id: Id0e790ecc75ccfb7031d3e8786ccdd13781d81fe
The warnings give an extra line which is confusing sometimes.
E.g.
Warning: Read invalid frame size (308164564) // This is for frame 5
Warning: Failed to decode frame 5: Invalid parameter
Warning: Read invalid frame size (1936229463) // This is for frame 6
Warning: Failed to decode frame 6: Invalid parameter
Warning: Read invalid frame size (2282536257)
Change-Id: I1753fa32079deca5c8b534c6ca9a527cc9e491e9
Change the limit of frame size in ivf reader used by test to make it
consistent with ivf reader used in vpxdec.
Change-Id: I19ab05adf51eca65322e609efdf4d83ad66af847
If the scale factors are 1 (no scaling), set the threshold
for skipping the inter-layer prediction to 0, so we will
more often test this mode.
Improves quality for upper layers for quality layers
in svc mode.
Change-Id: Iaf848d44f6cc153780db861b76517a4cf9672c45
When the previous frame is dropped, for the current
spatial layer make sure the lst_fb_idx corresponds
to the buffer index last updated on the (last) encoded
TL0 frame(for same spatial layer).
This is needed to preserve the temporal prediction pattern
for fixed/non-flexible mode under frame dropping.
Change-Id: Ifc8e257beb025654a81580c4da0a181235724508
This patch improves coding of slide shows with fade or other
complex transitions.
Previously, fades and other complex transitions between static "slides"
were sometimes being incorrectly marked such that they were coded
as a single static slide rather than two slides with a transition.
As the initial key frame for the first slide is not necessarily a good
predictor of the second slide and ARFs were turned off, this led to a
poor visual and metrics outcome in some such cases.
This patch allows for long GF groups in static sections before and after
a complex transition (instead of just with simple slide transitions) with
one or more normal ARF groups during the transition. It also enforces a
single "normal" length GF group after the transition before any extended
group is allowed. The reason for this is that the ARF that spans the
transition may not have a very high quality and hence may not act as a
good GF for the long static section that follows.
Change-Id: Ica1f979e27d8a0625f3cebf7b7cf6d69edccaba9
When eob is 0, pixel domain distortion is more accurate and efficient.
This mainly affects speed >= 2. Speed 0 always uses pixel domain
distortion; speed 1 uses it most of the time.
Compression impact(negative means gain):
         speed 2   speed 3   speed 4
lowres    -0.04%    -0.06%    -0.06%
midres    -0.10%    -0.10%    -0.20%
hdres     -0.01%    -0.03%    -0.06%
Encoding speed is about neutral.
Change-Id: I77b957658deeaad57381fd13afc11bacdec8c08f
When doing both check_header and check_lib, the check_header call
will already enable pthread_h if the header was found. This was
overlooked when the pthread linking check was amended into a header
check and a separate linking check in 9b7d4cce63.
This brings back the same result as the original check in 38dc27cc6.
Change-Id: I0efb38f5780f7c79e2eb2b14290d6094096ea222
The memset is added to better handle frame drops
with the GET_SVC_REF_FRAME_CONFIG control.
There is an issue with some tests in bypass mode,
so condition it on that for now.
Change-Id: I2635037143f14ff62a36be7c22b2b604a0c1efc2
For fixed (non-flexible) SVC mode.
No change in behavior.
Needed for future change to make Intra-only frame work.
Change-Id: I91e18776e7ef27c9c6fcbc8d5f64764d9cc3d9a9
Key frame updates the slots corresponding to the 3 references
last/golden/altref, but for SVC where more reference buffers
may be in use, especially for dynamically switching up/down in layers,
make sure we update all 8 slots on key frame.
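A minimal bookkeeping sketch of the difference, in Python for illustration. Representing the updated slots as a bitmask and the function name are assumptions made for the example; only the counts (3 references, 8 slots) come from the message above.

```python
NUM_REF_SLOTS = 8


def key_frame_slot_mask(lst_idx, gld_idx, alt_idx, is_svc):
    """Return a bitmask of buffer slots marked as updated on a key frame.

    Hypothetical helper: without SVC only the three reference slots are
    marked; with SVC every slot is marked, since any of them may be in
    use when layers are switched up or down.
    """
    if is_svc:
        return (1 << NUM_REF_SLOTS) - 1
    return (1 << lst_idx) | (1 << gld_idx) | (1 << alt_idx)
```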
Change-Id: Ifcca12608f420d5bae32b92794a3afe9b6369f77
This fixes failures on the datarate tests for
temporal layers with frame dropping.
The memset was only added to better handle frame drops
with the GET_SVC_REF_FRAME_CONFIG control from 43c58df3.
So ok to remove it for now.
Change-Id: I256d9ac4278b93fe6f39b94cce2e458a1a5eff69
Add another level (INTER_LAYER_PRED_ON_CONSTRAINED) to the
inter-layer prediction control. This new level enforces the
condition that a given spatial layer S can only do inter-layer
prediction from the previous spatial layer (S - 1) from the same
time/superframe.
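The constraint can be written as a small predicate. This is a Python sketch for illustration; the RefFrame type and function name are hypothetical and do not correspond to libvpx structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefFrame:
    spatial_layer: int
    superframe: int


def inter_layer_ref_allowed(current_layer, current_superframe, ref):
    """True only if ref is spatial layer S - 1 of the same superframe."""
    return (ref.superframe == current_superframe
            and ref.spatial_layer == current_layer - 1)
```

For spatial layer 2, a reference from layer 0 of the same superframe is rejected, as is a reference from layer 1 of an earlier superframe.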
BUG=webm:1526
Change-Id: I0a1ec95b2c220c7b13a9a425d5fb0a8814c23c70
Remove the unneeded vp9_copy_flags_ref_update_idx(cpi),
and initialize the struct parameters needed for the
GET_SVC_REF_FRAME_CONFIG. This init is useful for the case
for spatial layer frame drops.
Change-Id: If89e8349f6246c33720ecbb758d41a932d21e496
If block size is larger than 32x32, search transform size for one level
less than the other blocks.
This mainly affects speed 0 and 1, as speed >= 2 uses largest transform
size(except for keyframes and alt-ref frames).
Compression(negative means gain):
         speed 0   speed 1
lowres   -0.007%     0.00%
midres    0.023%   -0.011%
hdres     0.002%   -0.016%
Encoder speed:
Tested on crowd_run_1080p 30 frames
Fixed QP = 30, speed 0: 582.5s -> 564.6s
speed 1: 75.0s -> 73.3s
Change-Id: I46622efafe0e88d502efa1480a5324ead1d1e8d0
Copy ref frame index in SVC struct after set in encoder.
Rename ext_{lst,gld,alt}_fb_idx to {lst,gld,alt}_fb_idx.
Bump up ABI version.
BUG=webm:1527
Change-Id: I06209040cb83d374030f40b79f0b36b0efe9f97d
check_lib can be a stub that always returns true - make sure to
still use check_headers as before 38dc27cc6.
Change-Id: I5d471de56b16c015a0b686fa6c6caefa35bb89b4
Don't allow for changing the perc_refresh with screen-content
mode, as this helps reduce some overshoot for static content.
Change-Id: Idbe1849e7a14ef18fda20bee6dced809f134b7f7
For CBR mode: modify the qp clamping to allow q to respond
faster to overshoot. Can reduce some spurious overshoot events
observed in screen content coding.
Change-Id: I0b3f54b0d1b4086182f834e557a4121950b176d4
Refactor the scene detection for 1 pass cbr to allow the
scene detection to be checked once per superframe (on the base layer),
using the full resolution sources.
If scene change is detected: check for re-encoding due to
large overshoot for all spatial layers within the superframe.
Add speed feature to control the re-encode step.
Keep the re-encode step on for now.
Small change in nonrd_pickmode to remove the possible skip of golden
reference for SVC, when the high_source_sad is set for the superframe.
Change only affects SVC encoding with screen-content mode enabled.
Change-Id: If4cfb52cb0dd0f0fce1c4214fa8b413f8f803d56
Old vs New
Variance 64x64 time: 1145 ms 797 ms
Variance 64x32 time: 1200 ms 831 ms
Variance 32x32 time: 1228 ms 1135 ms
Variance 32x16 time: 1374 ms 1491 ms
Variance 16x16 time: 1688 ms 1571 ms
sse2 vs avx2
Variance 32x64 time: 1645 ms 957 ms
Variance 16x32 time: 2031 ms 1243 ms
Variance 16x8 time: 3071 ms 2275 ms
Change-Id: I0202a556e4629977d647e219c2e897e1ab6accb2
The setting this_key_frame_forced can lead to large key frame sizes,
not suitable for CBR rate control used for RTC.
Change-Id: Idf6d2bf385d5b1494f4bf783f623b7c202f34e55
Benchmark             Old      New
Variance 64x64 time:  197 ms   143 ms
Variance 64x32 time:  200 ms   146 ms
Variance 32x64 time:  203 ms   140 ms
Variance 32x32 time:  214 ms   152 ms
Variance 32x16 time:  243 ms   153 ms
Variance 16x32 time:  234 ms   197 ms
Variance 16x16 time:  205 ms   205 ms
Variance 16x8 time:   228 ms   222 ms
Variance 8x16 time:   228 ms   232 ms
Variance 8x8 time:    282 ms   240 ms
Variance 8x4 time:    506 ms   341 ms
Variance 4x8 time:    518 ms   415 ms
Variance 4x4 time:    604 ms   628 ms
Observed vp9 encoder speed up when encoding a 720p video.
Change-Id: Iebb98f3b3d8adbc11a733a529d8427ce3d2a5314
This avoids enabling pthreads if only pthreads-w32 is available.
pthreads-w32 provides pthread.h but has a link library with a
different name (libpthreadGC2.a).
Generally, always using win32 threads when on windows would be
sensible.
However, libstdc++ can be configured to use pthreads (winpthreads), and
in these cases, standard C++ headers can pollute the namespace with
pthreads declarations, which break the win32 threads headers that
declare similar symbols - leading us to prefer pthreads on windows
whenever available (see d167a1ae and bug 1132).
Change-Id: Icd668ccdaf3aeabb7fa4e713e040ef3d67546f00
libvpx only emits:
VPX_IMG_FMT_{I420,I422,I440,I444,I42016,I42216,I44016,I44416}
and additionally supports YV12 as input.
interleaved yuv, rgb and alpha formats are unused.
Change-Id: Ie2ab1099e950c6e696f475d46882f5c47a174042
Switch the order of constrained and layer drop mode,
and keep constrained_layer_drop as the default.
Update the svc datarate tests.
Change-Id: I764270f7b4964b87b0cd3da6c2f96a628f212a30
The control is set by log2 of number of threads (such that the number of
tiles is the same as the number of threads).
Thus it should be log2(num_threads) instead of (num_threads >> 1).
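A minimal sketch of the difference, assuming the thread count is a power of two. The helper name floor_log2_threads is hypothetical and is not taken from the libvpx sources; it only illustrates why a right shift by one is not a substitute for log2.

```c
/* Hypothetical helper (not a libvpx function): floor(log2(n)) for n >= 1. */
static int floor_log2_threads(unsigned int num_threads) {
  int result = 0;
  while (num_threads > 1) {
    num_threads >>= 1; /* halve until only the leading bit is left */
    ++result;
  }
  return result;
}
```

For 8 threads the helper returns 3, which corresponds to 2^3 = 8 tile columns, whereas (8 >> 1) is 4 and would request 2^4 = 16 tile columns. The two expressions only agree for 2 and 4 threads.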
Change-Id: I2ccec5557e660048dad3e561534e1c74fc8eec1f
For spatial layers whose base is a key frame, i.e., when
svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame = 1,
allow for hybrid search, similar to what we do on key frames.
For small blocks (<= 8x8) rd-based intra search will be used,
otherwise non-rd pick mode is used.
Feature is controlled by nonrd_keyframe, which is set to 1
for now on non-base spatial layers, so this change has
currently no effect.
Small change only when inter-layer prediction is off, as we now
call vp9_pick_intra_mode instead of vp9_pick_inter_mode on key frame.
But this change is very small/insignificant.
Change-Id: I5372470f720812926ebbe6c4ce68c04336ce0bdd
This reverts commit 5cc8df5bcf.
Reason for revert: We need to do this on all key frames in the stream (not just the first one). Will make another cleaner change for this.
Original change's description:
> vp9-svc: Fix to first superframe when inter_layer is off.
>
> When the application selects the setting INTER_LAYER_PRED_OFF
> each spatial stream should be decodeable separately.
> For this we need to force key frames on all spatial layers
> on the first superframe.
>
> In order to maintain the quality at the beginning of the stream
> the active_worst for spatial layer of the second superframe is set
> to the last_QP of the corresponding spatial layer of the first superframe.
> Also make sure nonrd_keyframe is set for non-base spatial layers.
>
> Change only affects SVC mode with number_spatial_layers > 1 and
> svc->disable_inter_layer_pred == INTER_LAYER_PRED_OFF.
> And only affects first and second frame of sequence.
>
> Change-Id: I8ee9a0873ab1d3a02515774571f719617771ad41
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: If73d9f3932224fc6751e773763adf7e8ee67d17f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
When the application selects the setting INTER_LAYER_PRED_OFF
each spatial stream should be decodeable separately.
For this we need to force key frames on all spatial layers
on the first superframe.
In order to maintain the quality at the beginning of the stream
the active_worst for spatial layer of the second superframe is set
to the last_QP of the corresponding spatial layer of the first superframe.
Also make sure nonrd_keyframe is set for non-base spatial layers.
Change only affects SVC mode with number_spatial_layers > 1 and
svc->disable_inter_layer_pred == INTER_LAYER_PRED_OFF.
And only affects first and second frame of sequence.
Change-Id: I8ee9a0873ab1d3a02515774571f719617771ad41
Cyclic refresh is disabled on key frames, but we did not
disable it for spatial layers whose base is a key frame
(i.e., on a key-superframe).
This fix means generally somewhat lower frame-level QP will be
used for those spatial layers whose base is a key frame,
which will generally mean slightly better quality for the
key-superframes.
Change-Id: Idf090651aa2f5856fb6696c89198a9f6d5d50280
Generating file lists on a non-mac with:
--target=x86-iphonesimulator-gcc --enable-external-build
the lack of xcrun would cause a warning to print:
libvpx/build/make/configure.sh: line 1397: [: : integer expression expected
Change-Id: I4623b6c5b65296bc71986cd042823f4be9427b42
In the SVC encoder LAST ref frame should be the last temporal
reference at the same resolution. This is the case for the default/fixed
patterns, but may not be the case for arbitrary pattern in flexible mode.
Add check that the LAST reference frame has same resolution as the current frame.
If the reference scale for LAST is different from current treat the current
frame as key frame just for the purpose of superblock partitioning.
This avoids potential segfault in vp9_int_pro_motion_estimation() for different
scaled reference.
Change-Id: I4276ff616de46cd4e12c73316f85ae313f170242
Previously we attempted to convert 411 input. Remove support
because malformed 411 input can cause the conversion to crash.
BUG=webm:1386
Change-Id: I3d41465a94867ee7f8eaa43fb76beb41f8fa644b
When writing out stream for spatial layer N,
make sure to include all spatial layers up to N.
Fixes an issue with the streams when frame dropping occurs.
Change-Id: I1e20b7dac6b94dcda751043541dd8a12f7df6d8c
in BasicRateTargetingVBRLagZero and
BasicRateTargetingVBRLagNonZeroFrameParDecOff after:
e0b28ad69 Add extra case to wq_err_divisor()
BUG=webm:1512
Change-Id: Id181613cc191ff2a2281deffe141efb982501edf
As we add more tests to datarate_test.cc, it is growing larger and it is
harder to find a specific test.
Split it to vp8, vp9 and svc ones.
Change-Id: Ie8c302010cf304a95554bee19d87ddc90498d0fb
source tools/set_analyzer_env.sh <sanitizer>
will set the compiler, flag, and sanitizer variables necessary to build
and run a variety of sanitizers.
Change-Id: I5dd2ae947cb337d5ccf2a11e9fe87991bc8ba0c8
Add extra case for 360P and smaller.
This hurts a little in psnr for the derf cif set but helps a little
in terms of average rate accuracy. Most clips come in a little
smaller with this patch.
No impact on larger formats.
Change-Id: I5056246cb53b90f961ff9ea5813937f33778aa4c
For the fixed/default SVC patterns, GOLDEN is the
spatial reference, except on key frames, where LAST
is labeled as the spatial reference.
The current code was assuming GOLDEN is always the
spatial reference for the purpose of selecting the
subpel motion (due to the downsampling filter).
Fix is to make sure flag_svc_subpel is set and used
with spatial_ref, which is labeled as the proper
spatial reference before entering mode check.
Some quality improvement on key frames.
Change-Id: Id236bcd47055b035731cc910ed84449d7e29f50c
In the constrained framedrop mode for svc: modify the buffer check
condition relative to (non-zero) dropmark to include upper spatial layers,
in addition to the current spatial layer.
But keep the single layer check if the buffer goes below zero, since
in this case (buffer underflow) we should force drop of that layer
regardless of upper layers.
Change-Id: Id277f0b4a3ae6275effdd5f5f0c80e3229c17424
To reduce the memcpy() cycles in vp9_rd_pick_inter_mode_sb().
The maximum value of mode_map is (MAX_MODES - 1) = 29.
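A minimal sketch of why the narrower type helps, assuming the map entries were previously wider than one byte (that prior type is an assumption, not stated above). MAX_MODES is taken as 30 from the statement that MAX_MODES - 1 = 29; copy_mode_map is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

#define MAX_MODES 30 /* from the text: (MAX_MODES - 1) = 29 */

/* Hypothetical helper: every index 0..29 fits in a uint8_t, so the whole
 * map can be copied as MAX_MODES bytes. */
static void copy_mode_map(uint8_t *dst, const uint8_t *src) {
  memcpy(dst, src, MAX_MODES * sizeof(*src));
}
```

With uint8_t entries the copy moves 30 bytes; with 4-byte int entries the same map would take 120 bytes per copy.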
Change-Id: I5704bd66838ea0b075f0afb001f5cbebfd3f1602
googletest imports tuple into testing to allow for compatibility across
c++ versions where tuple may be in std::tr1 or std. fixes deprecation
warnings under visual studio 2017
Change-Id: Id78b372d5478b12d8c8f63fd3f2166fec25aa8be
Add verification for constrained svc framedrop mode: check that
if a given spatial layer is dropped, all upper layers must be dropped.
Change-Id: I9b4821b23c95d1d9d0c031a41af19984647ec5dc
Add the logic for the constrained framedrop mode for SVC.
Add test case in datarate unittests.
Also lower target bitrates in the tests to better test
frame dropper.
Change-Id: I8ee1b8cb56d835c233ad1fbe0fc1456cb2e7291f
Add encoder control to set the frame drop thresholds per
spatial layer, and add a frame drop mode: 0 = per-layer drop,
and 1 = constrained drop mode (a drop on a given layer forces
drops to all upper layers).
Default is mode 0 (per-layer dropping).
Implementation for mode 1 will come in subsequent change.
If the control is not used, then the spatial layer frame
drop thresholds (water mark) are all equal and set to the value
given by the encoder config (oxcf->drop_frames_water_mark).
Bump up the ABI version.
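The semantics of the two modes can be sketched as follows. This is an illustration of the rule "a drop on a given layer forces drops to all upper layers", not the encoder's implementation; the function names and the plain int array of per-layer drop decisions are hypothetical.

```c
/* Hypothetical sketch of mode 1 (constrained drop): once a layer is
 * dropped, every spatial layer above it is dropped too. In mode 0
 * (per-layer drop) the array would be left untouched. */
static void apply_constrained_drop(int *drop, int num_layers) {
  int forced = 0;
  for (int sl = 0; sl < num_layers; ++sl) {
    if (forced) {
      drop[sl] = 1;
    } else if (drop[sl]) {
      forced = 1;
    }
  }
}

/* Returns 1 if no encoded layer sits above a dropped one. */
static int is_constrained_pattern(const int *drop, int num_layers) {
  for (int sl = 1; sl < num_layers; ++sl) {
    if (drop[sl - 1] && !drop[sl]) return 0;
  }
  return 1;
}
```

For three layers with per-layer decisions {0, 1, 0}, mode 1 turns the pattern into {0, 1, 1}, while mode 0 would encode the top layer.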
Change-Id: Id038d4181b86fa98b3d44d026f96d5f344d81629
Clears a warning when generating VS project files with older versions of
bash:
declare: -n: invalid option
Change-Id: Id0c0bc17dc5a1599f7d2d73e3cc9259a45540f3f
Even on x86_64, emms has to be called if the x87 state has
been clobbered - the calling code (either within libvpx or
in a caller outside of libvpx) may be using the x87 instructions,
even though use of them isn't all that common on x86_64.
This fixes builds with clang for mingw/x86_64.
Change-Id: I1f6072835590b862bad156f17331ba65c813ddd9
* changes:
configure: Add an arm64-win64-gcc target
test: Check for ARCH_X86_64 in addition to _WIN64
configure: Add an armv7-win32-gcc target
ads2gas: Add a -noelf option
This reverts commit 60a3cb9ad8.
Reason for revert: x87 instruction usage might not be as
clear cut as I would like. At the very least, llvm mingw
builds appear to be having issues with emms.
Original change's description:
> remove fldcw/fstcw from Win64 builds
>
> _MCW_PC (Precision control) is not supported on x64:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2
>
> The x87 FPU is not used on Win64 or ARM so setting the x87 control word
> is not necessary. The SSE/SSE2 and ARM FPUs don't have a precision
> control - the precision is embedded in each instruction - so the need to
> set the control word is also gone.
BUG=webm:1500
Change-Id: I25bcfa96bc9c860f6c7e03315d75fa6fd1d88ec5
_MCW_PC (Precision control) is not supported on x64:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2
The x87 FPU is not used on Win64 or ARM so setting the x87 control word
is not necessary. The SSE/SSE2 and ARM FPUs don't have a precision
control - the precision is embedded in each instruction - so the need to
set the control word is also gone.
BUG=webm:1500
Change-Id: I014513282a7dc320d1cdeaec48249d98a66bf09f
This configuration doesn't require any extra custom settings, since
it only uses neon intrinsics that are handled automatically by the
compiler (no external assembly).
Change-Id: I35415c68f483a430c0672e060a7bbd09a3469512
This builds for windows on arm, with llvm-mingw. The target triplet
is named -gcc since that's how similar existing targets are named,
even though it technically runs clang (via frontends named
"$CROSS-gcc").
Assemble using $CC -c since there's no standalone assembler
available (except perhaps llvm-mc).
Change-Id: I2c9a319730afef73f811bad79f488dcdc244ab0d
adaptive_rd_threshold_mt is set to 1 when speed >= 7 for SVC.
QVGA in SVC uses speed 5 which set adaptive_rd_threshold_mt to 0.
If VGA or HD is dropped for the last super frame, the flag is still 0
when the encoder is destroyed. Thus memory won't be released.
Change the bitrate threshold in datarate test.
Change-Id: I55352cc0b030568d38eb735d99c2fa29058d3690
Compiler -- gcc (Debian 7.3.0-5) 7.3.0
./libvpx/vp9/encoder/vp9_denoiser.c:374:9: assuming signed overflow
does not occur when assuming that (X + c) < X is always false
[-Wstrict-overflow]
for (j = 0; j < xmis; j++) {
Change-Id: Ib7397e718ff717bdabc088fc4c6e1771381fb522
Add VP9E_SET_SVC_INTER_LAYER_PRED to disable inter layer (spatial)
prediction.
0: prediction on
1: prediction off for all frames
2: prediction off for non key frames
Bump up ABI version.
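The three values listed above can be read as the following decision. This is a self-contained sketch; the enum and function names are hypothetical stand-ins and are not the identifiers used in the libvpx headers.

```c
/* Hypothetical names mirroring the three control values listed above. */
typedef enum {
  SKETCH_PRED_ON = 0,        /* prediction on */
  SKETCH_PRED_OFF = 1,       /* prediction off for all frames */
  SKETCH_PRED_OFF_NONKEY = 2 /* prediction off for non key frames */
} sketch_inter_layer_pred;

/* Returns 1 if inter-layer (spatial) prediction may be used. */
static int inter_layer_pred_enabled(sketch_inter_layer_pred mode,
                                    int is_key_frame) {
  switch (mode) {
    case SKETCH_PRED_ON: return 1;
    case SKETCH_PRED_OFF: return 0;
    case SKETCH_PRED_OFF_NONKEY: return is_key_frame ? 1 : 0;
  }
  return 1; /* unknown value: fall back to prediction on (sketch choice) */
}
```

With value 2, inter-layer prediction is still available on key frames and is disabled on all other frames.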
Change-Id: I5ab2a96b47e6bef202290fe726bed5f99bd4951f
SVC frame dropper: modify the logic to allow for individual
spatial layers to drop. This removes the constraint that all
upper spatial layers must drop when a given spatial layer drops.
Add a flag to the pkt to indicate whether a spatial layer is
encoded or dropped. This is needed for applications that enable
this feature (frame dropping for SVC).
For a current spatial layer, if its previous spatial layer is
dropped, then disable certain features for that layer:
inter-layer prediction, base_mv, partition_reuse, copy partition.
Also add the constraint to never drop a spatial layer if its
base layer is a key frame.
Updates to sample encoder (vp9_spatial_svc_encoder) and the
SVC datarate unittests to properly handle frame dropping.
Bump up ABI version.
Change-Id: I7d14ccf67b8d014a7abfce5ba3989fc623e94067
Only target 32bit builds. Visual Studio does not define _mm_empty for
64bit configurations.
Rename emms.asm and remove from 32 bit builds to avoid empty file
warnings.
Don't check register state on 64bit builds.
BUG=webm:1500
This reverts commit 60beb781c1.
Change-Id: I5ac4cf6c67249ff24f7da19792144de20527bfce
avoids potential OOM when allocating 3 buffers for 16383x16383; 3840 is
used as a replacement
this test was missed in:
215bddf32 vpx_scale_test: reduce max size for 32-bit targets
Change-Id: I515adf5999c6ef1724394ccd62d677134bd35e6d
Adjustment to initial active based on image size.
Add extra breakout case for kf boost loop.
Small adjustment to q delta calculation for key frames.
Net % improvements for all standard tests sets (-ve values) measured
using c-bvr mode.
Test set   Overall PSNR   SSIM     PSNR-HVS
Low Res:   -0.223         -0.229   -0.107
Mid Res:   -0.175          0.008   -0.180
High Res:  -0.131          0.106   -0.206
NFlix 2K:  -0.390         -0.271   -0.489
NFlix 4K:  -1.370         -0.825   -1.590
Change-Id: I06a39de43594e1a99bb0cb281af15cdb8058a8ed
If a given spatial layer decides to drop, due to the
buffer/overshoot conditions for that layer, then drop
that current spatial layer and all spatial layers above.
In the current implementation the svc frame counter
(and hence the pattern for the non-flexible SVC case)
are updated on frame drops.
Also add last spatial layer encoded to the pkt.
This is useful for RTC applications that enable
frame dropping for SVC.
Update to the SVC datarate tests:
enabled frame dropper on all SVC datarate tests, and
made a fix to properly set the temporal_layer_id, which
works now even on frame drops.
Change-Id: If828c193f3cb6b1839803fd52fe9fbbda5b5a039
This reverts commit 13d0955b25.
Reason for revert:
this should be investigated further to ensure the memset is really
necessary outside of the static analysis pass.
Original change's description:
> vp9_loopfilter.c: zero lfl_uv
>
> The initialization depends on cm and mi_row which static
> analysis does not approve of.
>
> Clears a static analysis warning:
> warning: The right operand of '+' is a garbage value
> const loop_filter_thresh *lfi = lfthr + *lfl;
>
> Change-Id: I8c863ced2b1e9a7e10103b7281098f20941a6ca2
TBR=johannkoenig@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Icadb6438fbcddba747622f06f2eadebdb333edf6
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Fix a bug when middle and top spatial layer are skip encoded
(disabled) and then re-enabled again, during the sequence.
Issue is that pending_frame_count in the packing may
be incremented on middle layer, even though that layer is skipped
(not encoded and hence zero size). Fix is to add size check.
Modified existing unittest to reproduce the issue.
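The size check described above can be sketched like this. It is an illustration only: count_pending_frames and the array of per-layer sizes are hypothetical, and the real packing code tracks more state than a single counter.

```c
#include <stddef.h>

/* Hypothetical sketch: only layers that produced data (size > 0) count
 * toward the number of frames pending in the superframe. */
static int count_pending_frames(const size_t *layer_sizes, int num_layers) {
  int pending = 0;
  for (int sl = 0; sl < num_layers; ++sl) {
    if (layer_sizes[sl] > 0) ++pending; /* skipped layers have zero size */
  }
  return pending;
}
```

With the middle and top layers skipped, sizes such as {1200, 0, 0} give a count of 1 rather than 2 or 3.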
Change-Id: I86d806a112d468e06b04fbf7c46ae07db9e0ad93
The initialization depends on cm and mi_row which static
analysis does not approve of.
Clears a static analysis warning:
warning: The right operand of '+' is a garbage value
const loop_filter_thresh *lfi = lfthr + *lfl;
Change-Id: I8c863ced2b1e9a7e10103b7281098f20941a6ca2
These values are not consistently set before calling update_best_mode.
In vp9_rdopt.c they are individual values instead of a struct and are
zero'd at declaration.
Clears a static analysis warning:
warning: The right operand of '-' is a garbage value
RDCOST(x->rdmult, x->rddiv, (rd->rate2 - rd->rate_uv - other_cost),
warning: The right operand of '-' is a garbage value
(rd->distortion2 - rd->distortion_uv));
Change-Id: I19895d062e7c0ac67937126ebc5dcb0afd3a2931
The loop appears to set map[i] with the intention of running
the 'j' loop up to that point. However, without zero'ing map[]
first the behavior is unpredictable.
Fixes a static analysis warning:
warning: Branch condition evaluates to a garbage value
for (j = 0; j < 4 && map[j]; ++j) {
Change-Id: Ifa39353d8aa5cc47b467a7d3d8cdd3b5319fd997
These values are set in main() from user input. Ensure
they are cleared out first.
Clears a static analysis warning:
warning: The right operand of '*' is a garbage value
1000.0 * rc->layer_target_bitrate[0] / rc->layer_framerate[0];
Change-Id: I09bd209be5aff31b87597a24d37a9673fa99381b
This reverts commit 118a57045b.
Reason for revert: Fails on Visual Studio builds:
vpxmdd.lib(vpx_ports_emms_mmx.obj) : error LNK2019: unresolved
external symbol _m_empty referenced in function
vpx_clear_system_state
Original change's description:
> use intrinsics for 'emms'
>
> BUG=webm:1500
>
> Change-Id: I3235d8c2abc01dd3a35e14c5cbcfe20283ff8fb2
Change-Id: Ia9c40bc103c57cced83353249c55218eaf2f0b0c
Static analysis does not recognize that output_rc_stat guards
the usage of window_size. Clears this warning:
The right operand of '>' is a garbage value
if (frame_cnt > (unsigned int)rc.window_size) {
set_rate_control_stats sets window_size to 15. Zeroing it
just introduces another static analysis warning.
Change-Id: Ieee7b81a385f986e42189101cfa39279e519b368
This should be taken care of by parse_superframe_index but
the static analysis is not recognizing it because it depends
on 'marker' which is read from the bitstream.
Clears a static analysis warning:
The right operand of '*' is a garbage value
rc.layer_encoding_bitrate[layer] += 8.0 * sizes[sl];
Change-Id: I8ee48a98f907bc7b46869fd27a351f33e2e7de71
Print error messages as they are encountered. This was the default
behavior.
Removes a static analysis warning regarding the use of strncat:
Null pointer argument in call to string length function
As this is the only use of strncat in the library, remove it and the
associated public function.
Change-Id: Id55305c5a4d65f11da88c3a2203ff824200f526f
sl was passed to set_frame_flags_bypass_mode, triggering
an uninitialized variable warning. Inside the function it
is only used as a local variable.
Change-Id: If743626e9e10fd41d135e3b4ad6196dc4dc90172
When an enhancement spatial layer is skipped, we should check
for updating the layer frame counters.
Change-Id: Ib79d0955c62fb465f59ef2f9ac45240ae2614d7b
This causes assert to trigger in choose_partitioning().
This can happen in some cases when enhancement layers
are enabled midway during the stream.
Change-Id: I69c3c8b4b1e3f1c7d8d7294d633ca5ddca148e8b
The largest frame is currently in choose_partitioning:
warning: stack frame size of 44156 bytes in function 'choose_partitioning'
but adding HBD amplifies other things:
warning: stack frame size of 51480 bytes in function 'dec_build_inter_predictors'
Add some padding for sanitizer and variances between compilers.
BUG=webm:1498
Change-Id: I0d94d4f94d25dafafca9d7484881c2ce5f8de371
The file contains sse2 implementations related to various block error
functions. Update the .mk file to include it only when sse2 is
requested.
BUG=webm:1500
Change-Id: I67b766faed425fd7a96db8541b13c69670b65fec
For SVC, if any of the layer scale ratios are not
2x2, then disable the partition_reuse, which assumes
2x2 scaling between layers.
Change-Id: I8b3163de0826052bbb1bfe03554a074c89510558
Set phase_shift = 0 if the scale factors are
above 3/4. Removes artifact for scale factors
close to 1.
phase_shift = 8 is to get an averaging filter
(decimated pixel aligns to 8/16, midway between source pixels),
and only makes sense for scale factors multiples of
2 (1/2, 1/4,...).
Removes artifact for high scaling ratios.
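A minimal sketch of the rule, with the scale factor given as a numerator/denominator pair. The function name is hypothetical, and returning 8 for every ratio at or below 3/4 is an assumption made for this sketch; the text above only states that 8 makes sense for ratios such as 1/2 and 1/4.

```c
/* Hypothetical helper: choose the downsampling filter phase from the
 * scale factor scale_num / scale_den. */
static int downsample_phase_shift(int scale_num, int scale_den) {
  /* scale_num / scale_den > 3/4, written without a division */
  if (4 * scale_num > 3 * scale_den) return 0;
  return 8; /* averaging filter: decimated pixel aligns to 8/16 */
}
```

A 1/2 layer gets phase 8, while a layer scaled by 4/5 gets phase 0.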
Change-Id: Id0a85869d6c6156dda0032c697ded2de78fad6bd
This change is targeted mainly at higher resolutions where typically
the average error per MB is much smaller. Hence this patch replaces
a fixed error per MB factor with a tiered value.
It also adds in a fixed offset value that acts as a minimum return score.
Note also minor fix to debug stats output.
The results are overall beneficial (-ve) on our test sets, most notably for
higher definition formats (see below - overall psnr, ssim, psnr hvs)
Test set   Overall PSNR   SSIM     PSNR-HVS
low res:    0.184         -0.262   -0.166
mid res:    0.094          0.075    0.049
hd res:    -0.752         -0.300   -0.800
NF 2K:     -0.353          1.095   -0.302
NF 4K:     -1.245         -0.578   -1.205
The most notable negative case is pierseaside 2K which appears to be worse by
8-10% (which has a big impact on the overall gain for the NF 2K set). Closer
inspection reveals that the drop does not relate to the key frame boost
per se as in both cases the key frame substantially undershoots its target. Rather
this is a side effect relating to the initial Q range allowed for the key frame and
a poor initial complexity estimate. This will hopefully be improved in a later
patch.
Change-Id: I4773ebe554782f4024c047c3c392c763a3fe843b
Changes in the function size in bytes (in lieu of performance metrics)
Function             Before      After    Diff
vpx_fdct32x32_avx2   29564   ->  28334   -1230
vpx_fdct32x32_sse2   38053   ->  36309   -1744
Change-Id: Ie0b3e6ed7c3f2e9ea45f9d6a1ce1e27d068cee6b
Extended ROI struct suitable for VP9.
ROI input from user is passed into internal struct and applied on every frame
(except key frame).
Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
ref_frame) via the ROI map input.
Made changes to nonrd_pickmode for the ref_frame feature.
Only works for realtime speed >= 5.
AQ_MODE needs to be turned off for ROI to take effect.
Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
for VP9.
Add datarate test.
Bump up ABI version.
BUG=webm:1470
Change-Id: I663b8c89862328646f4cc6119752b66efc5dc9ac
This reverts commit 4e5b4b5848.
Reason for revert: Commit message inaccurate.
Original change's description:
> Add ROI support for VP9.
>
> Extended ROI struct suitable for VP9.
> ROI input from user is passed into internal struct and applied on every frame
> (except key frame).
>
> Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
> ref_frame) via the ROI map input.
> Made changes to nonrd_pickmode for the ref_frame feature.
>
> Only works for realtime speed >= 5.
> AQ_MODE needs to be turned off for ROI to take effect.
>
> Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
> for VP9.
> Add datarate test.
>
> Bump up ABI version.
>
> BUG=webm:1470
>
> Change-Id: I7e0cf6890649adb98a5fda2efb6ae1fa511c7fc9
TBR=yaowu@google.com,jzern@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: I000dbd81e0c67cb8a0dcde4013ee9bf7afb038f0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: webm:1470
This patch adds in detection of slide show key frame groups.
The detection assumes extremely low or 0 motion for all frames
in the key frame group.
If this case is detected the boost level is set to a very high value
and the min Q to a lower value for the key frame itself.
Alt refs and golden frames are disabled to save bits (up to a limiting
maximum interval currently set to 240 frames).
In test samples that I created, this patch gave rise to a substantial
improvement in overall psnr and a drop in data rate. In some cases the
average psnr fell, however, with the boost and minQ values set as they are.
This is to be expected because previously a relatively poor key frame
could be followed by progressively better alt refs. For example a key
frame at q7.5 but subsequent alt refs improving it to lossless. Given that
average psnr tends to be dominated by the best frames, a ramp like this
from q7.5 to lossless may give a better average psnr than, for example,
coding the entire sequence at q2.5. Overall psnr, however, will be much
better in the latter case. The option exists to boost the key frame further
which would ensure much better results for all metrics, but at the expense
of smaller bitrate savings. Given that these samples tend to have very
good quality anyway this seems like a bad trade off.
For slides displayed for several seconds, bitrate savings of >= 20% are likely
and much larger gains are possible in some cases.
Change-Id: Ib4b61e153c55d3f2f561153da13fdb56f397a52b
Extended ROI struct suitable for VP9.
ROI input from user is passed into internal struct and applied on every frame
(except key frame).
Enabled usage of all 4 VP9 segment features (delta_qp, delta_lf, skip,
ref_frame) via the ROI map input.
Made changes to nonrd_pickmode for the ref_frame feature.
Only works for realtime speed >= 5.
AQ_MODE needs to be turned off for ROI to take effect.
Change example in the sample encoder: vpx_temporal_svc_encoder.c to be suitable
for VP9.
Add datarate test.
Bump up ABI version.
BUG=webm:1470
Change-Id: I7e0cf6890649adb98a5fda2efb6ae1fa511c7fc9
This value was originally set in response to requests from the hardware
team before levels were properly defined for VP9.
Even if a level is not specified for an encode, it imposes a maximum
frame size for videos of dimensions <= 1080P. For larger formats the
limit was set at 250 bits per MB.
This patch modifies the limit to be more in line with the requirements
specified for level 4 (max rate for a 4 frame group of 16 Mbits). If a lower
level is specified at encode time and this mandates a smaller maximum frame
size then the level requirement will still take precedence.
Increasing this value allows for some slide shows or very low motion clips
to code a better quality key frame.
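The figures quoted above can be put side by side with a little arithmetic. This is a sketch under stated assumptions: 16 Mbits is taken as 16,000,000 bits, the per-frame figure is simply the group budget divided by 4, and the function names are hypothetical. How the encoder actually combines these limits is not shown here.

```c
#include <stdint.h>

#define OLD_BITS_PER_MB 250        /* per-MB limit quoted in the text */
#define LEVEL4_GROUP_BITS 16000000 /* assumed reading of "16 Mbits" */
#define LEVEL4_GROUP_FRAMES 4

/* Bits allowed by the per-macroblock limit for a given frame size
 * (16x16 macroblocks, dimensions rounded up). */
static int64_t per_mb_limit_bits(int width, int height) {
  const int64_t mb_cols = (width + 15) / 16;
  const int64_t mb_rows = (height + 15) / 16;
  return mb_cols * mb_rows * OLD_BITS_PER_MB;
}

/* Average per-frame budget implied by the level 4 group figure. */
static int64_t level4_avg_frame_bits(void) {
  return LEVEL4_GROUP_BITS / LEVEL4_GROUP_FRAMES;
}
```

At 250 bits per MB, a 1920x1080 frame (8160 macroblocks) works out to 2,040,000 bits, against an average of 4,000,000 bits per frame for a 16 Mbit, 4-frame group.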
Change-Id: Ic08e0e09c8a918077152190c59732b9a1c049787
The stats input pointer, when passed in, already points to the
frame after the golden frame so should not be advanced here.
This fix has a small mostly positive effect on results in our test sets
(tested using corpus vbr settings) and gives a gain of almost 0.5%
in overall psnr (plus slightly smaller gains on other metrics) for the
4K set.
The bug also caused a crash in calculate_group_score() in another
patch which allows coding of slides in a slide show as a single
long KF group without ARFs or GFs.
Change-Id: I57a3a24baf442ce55dbc91fba05e056697c63a6f
When encoding with --enable-multi-res-encoding and 1 layer, with the
target bitrate set to 0, a null pointer is dereferenced. Fix is to check
cpi->oxcf.mr_total_resolutions > 1. Also added NULL pointer check.
This issue causes a crash for the asan build in chromium clusterfuzz.
BUG=805863
Change-Id: I9cd25af631395bc9fede3a12fb68af4021eb15f8
this allows the test to be sharded more efficiently and speeds up the
run when built with slower configs, e.g., asan.
Change-Id: If6d863b76871e3934704a1079bbf17f4886932c7
scaled_temp frame is used as an intermediate buffer for
2 stage down-sampling: two stages of 1/2 down-sampling
for a target of 1/4x1/4. This is used in 3 layer SVC
to avoid duplicate frame downsampling (on middle layer).
As this allocation is only needed/used when the
number_spatial_layers > 2, add this condition to avoid
unneeded allocation for 1 and 2 spatial SVC.
Change-Id: If342466644f685c1ea3ca5344b581793e5136c09
For 3 spatial layers with 1/2 downsampling, the
downsampling filter for the middle layer was not
set for the very first frame, so it was defaulting
to the subsample filter (no averaging/phase = 0).
It's not set due to the two stage scaling that is
done for 1/4 on base layer, during which the intermediate
1/2 result is saved for the middle layer.
Fix for now is to set the default downsampling filter
to Bilinear (averaging/non-zero phase) for all layers on
init (vp9_init_layer_context).
Change-Id: Ic7407810b34c621e7e7420682508d45478bdffcf
Eliminate false positives in previous patch.
The previous patch did a good job of detecting slide transitions but
in discussions a number of situations were identified that might trigger
harmful false positives. This risk seems to be borne out by some testing
on a wider YT set done by yclin@.
This patch adds an additional clause that requires that the best case
inter and intra error for the frame are very similar, meaning it is almost
as easy to code a key frame as an inter frame. This will certainly prevent
the false positive conditions that Jim and I discussed and even if one
does occur it should not be very damaging.
The down side is that this clause may mean that we still miss some
real slide transitions, especially if the images are small and similar. If this
proves to be the case then some further adjustment of the threshold may be
required. However, in the specific problem sample provided we do trap every
transition correctly.
Change-Id: I7e5e79e52dc09bc47917565bf00cc44e5cddd44c
Only affects 2 temporal layer case.
Modified the flags for 2 temporal layers to make
top layer (top spatial, top temporal) a non-reference
frame, consistent with the 3 TL case.
Add mismatch check to the datarate test of changing
svc pattern on the fly, which is test for 2 temporal
layers.
This re-applies the change: 254e2f5501,
that was reverted in: 658eb1d675.
Change-Id: Ib5fd4a7a0312c0c05329ae75baac480af34b4694
This reverts commit 254e2f5501.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> vp9 svc: fix to make top layer frame non-ref
>
> Add mismatch check to the datarate test of changing svc pattern on the
> fly.
>
> Change-Id: I6a878736de44e6a40c077ed6430aabd7fadabdd9
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ibcb600438098f8dc380fe7e1de90cb81fc367468
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
In the process of fixing a ubsan warning:
commit 738b829b8c
Fix incorrect size reading
the inferred check of start < end was removed. This causes fuzzed files
to get a little further and segfault in vp8dx_start_decode.
Change-Id: I316e23058753ba42dbcc46d27eb575f51c8a9e9a
When compiling an app using libvpx in Xcode 9.2, a warning is
thrown in vpx_frame_buffer.h:
"Parameter 'new_size' not found in the function declaration"
Switching it to 'min_size' to match the comment text and the
callback type definition prototype resolves it.
Change-Id: I7a3e4a857c2007c2d0d390e22054d7bc85068aa1
The following changes were not carried back from the release branch:
commit f87a4594fb
Revert "Add frame width & height to frame pkt. Add test."
commit c5dc3373db
work around pic issue with gcc 6
BUG=webm:1490
Change-Id: Id3e15983d5565680c05a0c454544003a615a4d7f
Cherry pick from vp9:
commit 85770264ac
Guard against incorrect size values moving *data past data_end.
Check read length against the difference of the buffers.
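A minimal sketch of such a guard, assuming a cursor pointer and an end-of-buffer pointer. The name read_fits is hypothetical and the body is an illustration of the check described above, not a copy of the decoder's code.

```c
#include <stddef.h>

/* Hypothetical guard: a read of len bytes starting at start is allowed
 * only if it stays within [start, end). Comparing len against the pointer
 * difference avoids computing start + len, which could move past end. */
static int read_fits(const unsigned char *start, size_t len,
                     const unsigned char *end) {
  return end > start && len <= (size_t)(end - start);
}
```

A caller would test read_fits(data, size, data_end) before advancing data by size, and treat a failure as a corrupt frame.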
Change-Id: I5e8679ddd447c4d73deb80be5ec94841a92c5fcd
For SVC, on spatial enhancement layer, intra
search was disabled unless best reference frame
is golden (i.e., spatial/inter-layer prediction),
except for some other conditions (lower layer is key
or golden is not an allowed reference).
Fix is to add the base temporal layer condition,
so intra search will not be force-disabled for base
temporal layer frames.
This improves metrics (-1-2%) for SVC 3 and 2 layer config.
Some small encode time increase is expected, but since the condition
only affects base temporal layers (i.e., every 4 frames
for 3 layers), the increase is small.
Change-Id: I10b824faef99560dfdeeb02ba8bf8e3e1eea6255
In nonrd-pickmode: the golden/spatial reference for inter-layer
prediction may be skipped in the mode testing. Add QP dependency
to reduce the threshold for skipping (i.e., check it more often)
at high QP, if the lower layer was encoded at lower QP relative
to the current layer.
At high QP, a better quality lower resolution is more likely to
provide good spatial (inter-layer) prediction.
avgPSNR/SSIM metrics up by ~1% (all clips positive gain or neutral).
Some decrease in encode time (~1-2%) expected at lower bitrates,
for 3 layer SVC.
Change-Id: I9ee0f62d4b10d4ebd30165d378ecfa4399ae5ef1
warning: Tag `XML_SCHEMA' at line 941 of file `doxyfile' has become obsolete.
warning: Tag `XML_DTD' at line 947 of file `doxyfile' has become obsolete.
Change-Id: I85e39c4fb154569b8d7f68bdf362408983e9bd4f
Remove an adjustment to two cyclic refresh (aq-mode=3)
parameters for SVC. The adjustment was to reduce the
delta-qp on second segment, and reduce the motion threshold.
This was done early on in the SVC encoder development;
in the latest codebase, removing this adjustment yields
some improvements in metrics.
The avgPSNR/SSIM metrics increase on average by ~1%
(most clips show a positive gain), for 3 and 2 layer SVC.
Change-Id: I7a4d5114f16b2a1df383dbe6b3fe02940e29e6cc
vp9 does not support multi-res encoding, the request should not crash.
+ encode_api_test: unconditionally expose multi-res test
vpx_codec_enc_init_multi should fail independent of
CONFIG_MULTI_RES_ENCODING if not for the same reason.
Change-Id: I44fc58ef70ee4e0e482cb6a5736885f4cb2a8517
(cherry picked from commit 004fb91416)
Fix is from the patch in the issue.
Release memory allocated before the early exit.
BUG=webm:1482
Change-Id: I64952af99c58241496e03fa55da09fd129a07c77
(cherry picked from commit 5b6ae020b6)
vp9 does not support multi-res encoding, the request should not crash.
+ encode_api_test: unconditionally expose multi-res test
vpx_codec_enc_init_multi should fail independent of
CONFIG_MULTI_RES_ENCODING if not for the same reason.
Change-Id: I44fc58ef70ee4e0e482cb6a5736885f4cb2a8517
In commit 577d4fa79, int8_t was used to replace char. This will result in a
compilation error, because int8_t is a typedef of signed char, not char.
Change-Id: I5c9837e01b0b58688a7741f5c9a99a76ca887e4a
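The type mismatch above comes from the fact that `char`, `signed char` and `unsigned char` are three distinct types in C, even on platforms where plain `char` is signed. A small C11 sketch makes this visible; the macro and helper names are assumptions for illustration.

```c
#include <string.h>

/* C11 _Generic picks a branch by the exact type of its argument, which makes
 * the distinction between the three character types visible. int8_t is
 * usually a typedef of signed char, so it lands in that branch, not `char`. */
#define CHAR_KIND(x)                  \
  _Generic((x),                       \
      char: "char",                   \
      signed char: "signed char",     \
      unsigned char: "unsigned char", \
      default: "other")

/* Returns 1 if the two kind strings are equal. */
static int same_kind(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}
```

Because the types differ, a `signed char *` cannot be passed where a `char *` is expected without a cast or a diagnostic.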
Remove trailing commas to keep multiple elements on one line.
Add blank lines to prevent comments from being treated as blocks.
clang-format guards for struct with a comment in the middle.
Change-Id: I3bcb8313ae8aaf69179249a13b4087b1272cdbc0
These don't appear to make any sense given their context. The
commit log also does not reveal anything.
Discovered due to spurious clang-format indenting:
https://bugs.llvm.org/show_bug.cgi?id=35930
Change-Id: I732a66056ba4c05e3e132a2f236fe10f7a282900
Allow*OnASingleLine appears to no longer apply to
typedef structs.
Adjust closing parenthesis/opening brace on functions.
Remove trailing commas to keep multiple elements on one line.
Change-Id: I6e535a8ddb15c9b3de8216ce8ddb2a18241af46c
Remove comments above #define statements because they get
indented unnecessarily.
https://bugs.llvm.org/show_bug.cgi?id=35930
Add blank lines to prevent comments from being treated as
blocks.
Change-Id: I04dce21b2a10e13b8dc07411a0019c098f6dd705
For the vp8 simulcast/multi-res-encoder:
Add flags to keep track of the disabling/skipping of
streams for the multi-res-encoder. And if the lower spatial
stream is skipped for a given stream, disable the motion
vector reuse for that stream.
Also remove the condition of forcing same frame type
across all streams.
This fix allows for the skipping/disabling of the base
or middle layer streams.
Change-Id: Idfa94b32b6d2256932f6602cde19579b8e50a8bd
This reverts commit bd1d995cd3.
Remove the feature from the release as it requires additional work.
BUG=webm:1485
Change-Id: I1a01ac2525703af97a456a3eed85718306c0f734
Without this applications cannot use the vpx_codec_control macro
for VP9_SET_SKIP_LOOP_FILTER. The tests only cover the underscored
version vpx_codec_control_().
Change-Id: I3e6c1888307b76636fdc1a8deae70b5c14238163
(cherry picked from commit 373e08f921)
Without this applications cannot use the vpx_codec_control macro
for VP9_SET_SKIP_LOOP_FILTER. The tests only cover the underscored
version vpx_codec_control_().
Change-Id: I3e6c1888307b76636fdc1a8deae70b5c14238163
The in/out (or zoom metrics) in accumulate_frame_motion_stats()
are in effect a % of the blocks that have a motion vector pointing
either towards or away from the center. As such they are already
normalized in terms of image size and the thresholds against which
these are tested should be image size independent.
In practice a zoom either in or out is an indicator for a shorter group
length so the abs value is more important as a breakout clause.
This patch fixes the threshold test. Clips without noticeable zoom show
no effect but some with strong zooms such as "station" show a big
gain (5-10%). Average psnr-hvs gain on hdres set was 0.292%
Change-Id: I4f97a72b0e273e4e844ade15285749c32cd81c1c
(cherry picked from commit 0226ce79e9)
Remove trailing commas to keep multiple elements on one line.
Remove trailing empty lines to keep comments from being indented.
https://bugs.llvm.org/show_bug.cgi?id=35930
Change-Id: I0a66dde95f2a304f13cb85a2e9197afca20051e8
For SVC: if an enhancement layer (spatial_layer > 0)
has 0 bandwidth, skip/drop the encoding of the layer.
This allows the application to dynamically disable
higher layers for SVC.
Add flag to signal the skip encoding, this is needed
to modify the packing of the superframe when the top
layer is skipped/dropped.
Also moved some updates (current_video_frame counter and
the last_avg_frame_bandwidth) to the postencode_update_drop_frame().
Added datarate unittest for dynamically going from 3 to 2
and then back to 3 spatial layers.
Change-Id: Idaccdb4aca25ba1d822ed1b4219f94e2e8640d43
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates some function pointer casts that I missed in my last
CL https://crrev.com/c/780144.
BUG=chromium:776905
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I1c7adbdfffa4fe0b62e993bfb31d06e64b022d66
Adds a breakout threshold to key frame boost loop.
This reduces the boost somewhat in cases where there is a
significant zoom component. In tests most clips show no effect,
but there is a sizable gain for some clips like station.
Change-Id: I8b7a4d57f7ce5f4e3faab3f5688f7e4d61679b9a
This fix improves detection of key frames in slide shows.
In particular it helps if the slides are pictures of varying formats
as in a sample provided by yclin@.
This change does not impact any of the clips in our standard tests,
but for the example slide show test clip it improved global PSNR by
several dB and resolved a serious visual quality issue.
Change-Id: Iaeeeed55dc0bb50aeacd4996ed660ced06374603
Switch from bilinear to eighttap_smooth for frame-level
downsampling at low resolutions (<= 320x240).
avgPSNR/SSIM metrics increase from ~0.5-2% (all clips positive gain),
for 2 and 3 spatial layer SVC, with 3 temporal layers.
Small/negligible increase in encoding time (< 1%).
Change-Id: I758472fc4fddd51d87f13c9d1a1cd4986ef5d41f
The in/out (or zoom metrics) in accumulate_frame_motion_stats()
are in effect a % of the blocks that have a motion vector pointing
either towards or away from the center. As such they are already
normalized in terms of image size and the thresholds against which
these are tested should be image size independent.
In practice a zoom either in or out is an indicator for a shorter group
length so the abs value is more important as a breakout clause.
This patch fixes the threshold test. Clips without noticeable zoom show
no effect but some with strong zooms such as "station" show a big
gain (5-10%). Average psnr-hvs gain on hdres set was 0.292%
Change-Id: I4f97a72b0e273e4e844ade15285749c32cd81c1c
Change the recursive average factor from 15/16 to 3/4
to make the noise estimation respond faster.
Small/negligible change on low noise content, but better
denoising for noisy content.
Also encoder speedup of ~2-3% observed on some noisy clips.
Change-Id: I9dd02fe961ca24b411fe4c2732f814bf1e9a7f9f
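To illustrate why a 3/4 factor responds faster than 15/16, here is a sketch of an integer recursive average. It assumes the factor is the weight kept on the previous value; the function name and parameters are hypothetical and not taken from the encoder.

```c
/* Recursive (exponential) average with integer arithmetic.
 * `shift` selects the weight kept on the previous value:
 *   shift = 4 -> old value weighted 15/16, new estimate 1/16
 *   shift = 2 -> old value weighted 3/4,   new estimate 1/4
 * The function name and parameters are assumptions for illustration. */
static int recursive_average(int prev, int estimate, int shift) {
  const int weight = (1 << shift) - 1;
  return (weight * prev + estimate) >> shift;
}
```

Starting from 0 with a new estimate of 160, one update moves the average to 10 with the 15/16 weighting and to 40 with the 3/4 weighting.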
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: Ie5871f710d9eb39047d8d9f48b907c0633e1f830
INLINE is defined as __forceinline for vs* configs, but is the
normal, compiler-discretion inline for gcc/clang configs. This
makes many functions very large when building for windows targets,
much larger than they are elsewhere.
Use '__inline' as a consistent definition to get consistent function
sizes. The Visual Studio documentation says that 'inline' is
only available in C++ code, but this is probably incorrect, since Visual
Studio 2017 accepts C99 'inline' even when passed /TC. Nevertheless,
this commit uses the recommended '__inline' for consistency.
Thanks to David Major for the diagnosis.
Change-Id: Ib0b31a3afcea77822c84fe3c6cd452add66d825a
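A sketch of what such a definition can look like is shown below. The exact preprocessor conditions used by the build configuration are an assumption here; only the choice of `__inline` for Visual Studio comes from the description above.

```c
/* Sketch of a portable INLINE definition (the exact conditions used in the
 * build configuration are an assumption). MSVC gets `__inline`, which leaves
 * the inlining decision to the compiler, unlike `__forceinline`. */
#if defined(_MSC_VER)
#define INLINE __inline
#else
#define INLINE inline
#endif

/* Example use: a small helper the compiler may or may not inline. */
static INLINE int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}
```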
eob is a pointer to a uint16_t. Previously the code would store 64 bits,
causing a crash or test failure with the right stack layout.
Change-Id: Ibd653baf323db114f2444951b9d8b00c596bf15a
This reverts commit 86842855d3.
SSSE3/VP9QuantizeTest.EOBCheck/1 fails on Mac and the build breaks under
visual studio due to a #if within another macro.
Change-Id: I475095a04aafcc714fade2b24e4df7b682be2cd1
Modify and update the SVC datarate unittests to verify the
rate targeting for each spatial-temporal layer.
The current tests were only verifying the rate targeting
of the full SVC stream, not individual layers.
Also re-enabled a test that was disabled.
This is a stronger verification of the layered rate control
for SVC for 1 pass CBR encoding.
Added PostEncodeFrameHook, needed to get the layer_id and
update the layer buffer level.
Change-Id: I9fd54ad474686b20a6de3250d587e2cec194a56f
This c version uses the shortcuts found in the x86
vp9_quantize_fp functions.
The test was updated to use the correct quant/round range.
Change-Id: I5d19f8af2fddda8e50910249eafb740acb29415b
For a large change in the target avg_frame_bandwidth
(via the update in change_config()), reset the buffer_level
to optimal_level.
This fix prevents possible frame drops where, for example,
the encoder suddenly goes from lower to higher bitrate.
Change-Id: I2f844c41d04c01240e85f574e59d2b9075c7eb6d
The random number generator creates values in [0, range), so add 1 to all
and make hev more realistic by mirroring its calculation of level >> 4,
i.e., [0, 3].
Change-Id: Ic19be5d7ba668deb17c96f143b739116a4b5d21c
Optimize function vp8_mbloop_filter_vertical_edge_mmi and
function vp8_mbloop_filter_horizontal_edge_mmi.
Make full use of memory loading delay slot and reduce unnecessary
instructions.
Change-Id: I61da2c3a44c06044225461f46bf487d83cba6c16
all_builds.py has been more or less replaced by Jenkins.
author_first_release.sh is unused.
ftfy.sh has been obviated by having the whole tree clang-format clean.
Change-Id: I741315ad9042e6e901f07410e93f28371db703b2
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.
Change-Id: I8bab801416e7f4958c28c6d080e3cf785a50f82b
With recent fixes to rate control for SVC the
buffer underrun in the tests does not happen,
so comment and TODO can be removed.
Also, in some of these SVC tests, replace the HD clip
with the corresponding VGA clip, which has > 400 frames.
For the (niklas) HD clip: it has only 60 frames but the
test was running up to 300 frames. Fixed it to 60 frames.
Keep some tests with the HD clip, needed for the 4 thread
and 5 level scaling test.
Change-Id: I0a2356a908e8b2271c7a422eb8b15c0d56eec968
For large dynamic changes in target avg_frame_bandwidth, or
a change in resolution (via the update in change_config()),
reset the under/overshoot flags (rc_1_frame, rc_2_frame)
to prevent constraining the QP for the first few frames
following the change.
For SVC use the spatial stream avg_frame_bandwidth in
reset condition.
For the avg_frame_bandwidth condition, use fairly large
threshold (~50%) for now in reset.
This allows for better/faster QP response if, for example,
application dynamically changes bitrate by large amount.
Change-Id: Ib6e3761732d956949d79c9247e50dba744a535c0
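The ~50% threshold mentioned above can be sketched as a simple predicate. The helper name is hypothetical and the exact comparison used in the encoder may differ; the sketch treats anything above 1.5x or below 0.5x of the previous value as a large change.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the new average frame bandwidth differs
 * from the previous one by more than roughly 50% (above 1.5x or below 0.5x).
 * 64-bit arithmetic keeps 3 * last from overflowing. */
static int large_bandwidth_change(int avg_frame_bandwidth,
                                  int last_avg_frame_bandwidth) {
  const int64_t avg = avg_frame_bandwidth;
  const int64_t last = last_avg_frame_bandwidth;
  return avg > (3 * last) / 2 || avg < last / 2;
}
```

When the predicate is true, the caller would clear the under/overshoot flags so the QP is not constrained for the first frames after the change.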
Denoise 2 spatial layers at most.
Add noise sensitivity level 2 for vp9 such that applications can control
whether to denoise the second highest spatial layer.
Add tests to cover this case.
Change-Id: Ic327d14b29adeba3f0dae547629f43b98d22997f
Immediately following a key frame the trailing second reference
error in the first pass stats will be based on a reference frame from
the prior key frame group and will thus usually be much larger.
This fix eliminates that effect (which typically triggers a short arf
group immediately after a key frame). It also changes the accounting
for the first frame in each new arf group.
This change gives large gains on a couple of clips that contain mid
sequence key frames (e.g. 6% on 1080P tennis). Overall there was
a net gain in PSNR and PSNR-HVS ~(0.05- 0.4%) and mixed results for
SSIM (+/- 0.2%).
Change-Id: I8e00538ac2c0b5c2e7e637903cac329ce5c2a375
Downsampling filter for SVC was set to subsample (phase 0)
for HD -> VGA, and bilinear averaging (phase 8) for VGA -> QVGA.
This change makes it bilinear averaging for HD -> VGA.
Given the recent commit 9f9d4f8, quality is improved with
this change: avgPSNR/SSIM up ~1-3% on HD clips in RTC set.
Speed decrease of ~1% for 3 layer SVC.
Change-Id: If834a320e372b8b922a6bf7cab4227703b1beae6
Move the early exit checks on usable_ref_frame and
skip_ref_find_pref up before the check on flag_svc_subpel.
The code under flag_svc_subpel requires frame_mv to be set
for the golden/spatial reference, which is only set if
neither of those early exits is taken.
No change in behavior.
Change-Id: Id304276c745eeb389ff85fa2dcf510d5976bc413
For nonrd pickmode on a given spatial layer, the spatial
(golden) reference was always only using zeromv for prediction.
In this patch if the downsampling filter used for generating
the lower spatial layer is an averaging filter (nonzero phase),
we allow for subpel motion on the spatial (golden) reference to
compensate for the shift. This is done by forcing the testing of
nonzero motion mode to compensate for spatial downsampling shift.
Improvement for cases where the downsampling is averaging filter.
In the current code this is only done for generating
resolutions <= QVGA.
Improvement for avgPSNR/SSIM on RTC set for speed 7: ~1.2%.
Gain is larger (~2-3%) for VGA clips with 2 spatial layers.
~1% speed slowdown for 3 layer SVC on mac.
Change-Id: I9ec4fa20a38947934fc650594596c25280c3b289
Don't add include files to the archive. Avoids build failures for
Windows such as:
the input file 'libvpx_g.a(x86_abi_support.asm.o)' has no sections
Change-Id: If9c8e70c0ec913b7ad7dd6a08d4fa19011114ad2
nasm does not accept x64
yasm has accepted (and appears to prefer) win64 at least as far back as
1.0.0:
http://yasm.tortall.net/releases/Release1.0.0.html
Change-Id: Ied881b1df0570da256b1bd7e131e7817e47f768f
Set num_inter_modes based on ref_mode_set_svc, which is
smaller set than ref_mode_set (which may use alt-ref).
No change in behavior.
Change-Id: I31169bb09028db230552c6fca0a86959d1ade692
1. Delete unnecessary zero setting process.
2. Optimize the method of calculating SSE in vpx_varianceWxH.
Change-Id: I58890c6a2ed1543379acb48e03e620c144f6515f
Avoids duplicate computation of UV predictor.
Bit-exact when static_threshold is zero.
Small/neutral difference on RTC set with nonzero static_threshold
(since UV predictor won't be skipped with this change).
Small speed gain, ~1-2%, at speed 8.
Change-Id: Iba8d22a307768b391e29d63c9826aac5a4d9c285
this is only meant for testing. along with --enable-experimental
--enable-spatial-svc require VPX_TEST_SPATIAL_SVC to be defined rather
than bumping the encoder ABI.
Change-Id: I7f34d9f60300fa31ccf22e1a4aa619392c391b2e
For 1 pass cbr SVC: GOLDEN is the spatial reference,
better not to check for encoder_breakout on this reference.
Small positive ~0.075% (mostly neutral) gain in avgPSNR/SSIM metrics.
No observed change in encoder speed.
Change-Id: Ib337f16d6771105bf06384c6a23ad047fc690418
For the case when the number of temporal layers > 1,
the buffer levels (starting/optimal_buffer_level,
and maximum_buffer_size) were not scaled properly.
In vp9_update_layer_context_change_config():
when setting the layer-buffer levels, fix is to scale
the layer-target_bandwidth by the target_bandwidth
(which is the full stream bandwidth) instead of the
spatial_layer_target.
This is needed because prior to the call
vp9_update_layer_context_change_config(), set_rc_buffer_sizes()
is called which sets the buffer levels based on target bandwidth
(which is the full bandwidth for the SVC stream).
This fix properly sets the layer-buffer levels based on the
layer-bandwidth, and leads to better rate targeting.
Small/neutral change in avgPSNR/SSIM metrics on RTC set.
Change-Id: Ic0f4f7f3487c37b9a9adb4781ae5edfed7140a57
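The scaling described above amounts to multiplying each full-stream buffer level by the layer's share of the full-stream target bandwidth. A minimal sketch, with a hypothetical helper name and parameters:

```c
#include <stdint.h>

/* Hypothetical helper: scale a full-stream buffer level down to one layer
 * using the layer's share of the full-stream target bandwidth. */
static int64_t scale_layer_buffer_level(int64_t stream_buffer_level,
                                        int layer_target_bandwidth,
                                        int stream_target_bandwidth) {
  if (stream_target_bandwidth <= 0) return 0; /* nothing sensible to scale by */
  const double share =
      (double)layer_target_bandwidth / stream_target_bandwidth;
  return (int64_t)(stream_buffer_level * share);
}
```

Dividing by the spatial layer target instead of the full-stream target would give a larger share, and so a larger buffer level than the layer's bandwidth justifies.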
Control Flow Integrity [1] indirect call checking verifies that function
pointers only call valid functions with a matching type signature. This
change eliminates function pointer casts to make libvpx CFI-safe.
[1] https://www.chromium.org/developers/testing/control-flow-integrity
Change-Id: I7e08522d195a43c88cda06fa20414426c8c4372c
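As a generic illustration of the pattern (not code from libvpx), the sketch below replaces a function pointer cast with a thin wrapper whose signature matches what the caller expects. The function names are made up for the example; `qsort()` is the standard C library function.

```c
#include <stdlib.h>

/* A typed comparison function, written the way most callers would want it. */
static int compare_ints(const int *a, const int *b) {
  return (*a > *b) - (*a < *b);
}

/* CFI-unfriendly: passing (int (*)(const void *, const void *))compare_ints
 * to qsort() calls compare_ints through a pointer of a different type.
 * CFI-friendly: a thin wrapper whose signature matches what qsort() expects. */
static int compare_ints_void(const void *a, const void *b) {
  return compare_ints((const int *)a, (const int *)b);
}

/* Sorts `count` ints in ascending order without any function pointer cast. */
static void sort_ints(int *values, size_t count) {
  qsort(values, count, sizeof(values[0]), compare_ints_void);
}
```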
For reference frames: enable scale partition for
superblocks with low source sad or if bsize on the lower resolution
is at least 32x32.
Keep feature disabled for base temporal layer.
Small regression in avgPSNR/SSIM metrics, ~0.5-1%.
Speedup ~2-3% on mac for SVC (3 spatial/3 temporal layers) at speed 7.
Change-Id: I5987eb7763845b680059128b538bb5188be0cca5
When allow_partition_search_skip is set the two pass code
can optionally skip the partition search in the rd loop if the image
appears static (based on selection of 0,0 motion).
Unfortunately 0,0 motion does not necessarily mean that there are
no meaningful changes or that motion or intra modes will not be selected
in the second pass.
Disabling "allow_partition_search_skip" may hurt the encode speed a little
for a small number of clips but can have a big impact on compression.
The most notable example of this in our test sets is "bridge_close_cif"
where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
psnr-hvs.
Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
Enable partition copy on boundary and scale blocks along the boundary.
Rename copy_partition_svc to scale_partition_svc.
Do not copy if the block crosses the boundary.
Change-Id: I37a04d48f11b15c4ea67facd7631193ec2f62150
Fixes a build issue when relocation is not allowed:
relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
Removal of parameters to and code in calc_frame_boost() that is no
longer required.
No change to results from previous patch.
Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
The decay accumulator clause covers similar ground to the
new clause that tests the accumulated second reference error
so it has been removed to reduce complexity.
Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
Add a clause to the breakout test for alt ref groups that
examines the size of the accumulated second reference
frame error compared to the cost of intra coding.
This clause causes a reduction in the average group length for many
clips. Alongside the change to the group length the minimum
boost is increased.
On balance the results are positive for psnr and psnr-hvs
but negative for ssim/fast ssim for the smaller image formats.
Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
husky (lowres) 6-17%). Most of the negative cases are lower motion
clips. A subsequent patch will hopefully help with those.
Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
Fix/cleanup the conditioning for usage of the reuse-lowres
partition feature.
Replace the non-reference condition with the top temporal
layer, and put this condition in the speed feature.
This prevents doing update_partition_svc() on every
VGA frame, instead it will now only do update for VGA in
the top temporal layer frames.
Also this makes it easier to test/enable this feature
for lower layer temporal frames.
Change-Id: Ia897afbc6fe5c84c5693e310bcaa6a87ce017be5
For a new VP9-only content type, adjust the rate distortion and ARF
filter based on the relative spatial variance of the source and
reconstruction.
In regards to the RD loop the method favors modes where the
reconstruction variance is similar to the source variance. However it
is currently only applied to regions where the source variance is quite
low.
For very low variance blocks it applies a further bias against intra
coding and large prediction block sizes (the latter in particular limit
the usefulness of the loop filter).
The final part of this change is to lower the strength of the ARF
filter for blocks where the source has very low spatial variance, to
encourage some low amplitude texture or noise to pass through
the filter.
This change improves the retention of film grain and fine noise /
texture in spatially flat regions, but as expected causes a significant
drop in PSNR on many clips. This is to be expected because similar
but misaligned noise or texture will give a lower PSNR than a flat
noise free reconstruction. However, it is worth noting that most clips
show a strong gain in FAST SSIM.
The features are enabled on the vpxenc command line by setting
--tune-content=film.
VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
Removed three parameters that are no longer needed in calls
to calc_arf_boost() and associated minor changes.
No impact on encode results.
Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
For choose_partitioning (speed >= 6): avoid computation
of minmax variance for non-reference frames in SVC.
Existing condition only avoided this for speed >= 8.
Combine that existing logic with non-reference condition.
Small speedup (~0.5-1%) for 3 layer SVC,
neutral change on avgPSNR/SSIM metrics.
Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
Release frame buffers for non-ref when the decoder is destroyed.
Enable the non ref test.
BUG=b/68819248
Change-Id: Id87ef3b0a62318f9812e927cd957c05c859047fa
For SVC with 3 spatial layers:
Add feature to copy/upscale partition from middle spatial layer
to the upper/highest resolution, when superblock sad is not high.
Enabled for speed >= 7 and only for non-reference frames.
Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
For a chosen interval "i" the existing arf boost calculation examined frames
+/- (i-1) frames from the current location in the second pass.
This change checks to make sure that the forward search does not extend
beyond the next key frame in the event that the distance to the next key
frame is < (i - 1).
Small metrics gains on all our test sets but these are localized to a few clips
(e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)
Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
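The limit on the forward search described above can be sketched as a simple clamp. The helper name and parameters are hypothetical; the sketch only shows the idea that the look-ahead of (i - 1) frames is cut short when the next key frame is closer.

```c
/* Hypothetical helper: number of frames to look ahead for a chosen interval,
 * limited so that the search never passes the next key frame. */
static int forward_search_frames(int interval, int frames_to_key) {
  int frames = interval - 1;
  if (frames > frames_to_key) frames = frames_to_key;
  return frames < 0 ? 0 : frames;
}
```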
The new test will run a SVC bitstream which has non ref frames.
It checks the number of buffers acquired and released to make sure all
external frame buffers are released.
Add a new test bitstream:
vp90-2-22-svc_1280x720_1.webm
which has 400 frames in total, and 1 spatial layer and 2 temporal layers.
There is one non ref frame every other frame.
Disabled for now. Will be enabled with the fix.
BUG=b/68819248
Change-Id: I0515336fd9809a9e1fceba90e4dce53dabaf53a5
Added command line control of Corpus VBR.
The new corpus vbr mode is a variant of standard
VBR (end-usage=0) where the complexity distribution
mid point is passed in rather than calculated for a specific
clip or chunk.
The new variant is enabled by setting a new command line
parameter --corpus-complexity to a non-zero value. Omitting
this parameter or setting it to 0 will cause the codec to use
standard vbr mode.
The correct value for a given corpus needs to be derived
experimentally using a training set such that the average
rate for the corpus is close to the target value.
For example, using our low res test set with upper and lower
vbr limits of 50%-150% and a corpus complexity value of 650
gives a similar average data rate across the set to using standard
vbr. However, with the corpus mode easier clips will be allocated
fewer bits and harder clips more bits rather than having the same
rate target for all.
Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
For nonrd_pickmode: if early_term is set there should be
no need to include UV in rdcost (when color_sensitivity is set).
Neutral change on RTC and RTC_derf metrics, for speed >= 5.
No change for ytlive metrics.
Very small speed gain (~0.5%) on some clips with strong color content.
Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
For real-time encoding, 1 pass with non-zero lag-in-frames.
Added speed feature to control the feature.
Enabled for speed >=6 for now, under VBR mode.
avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
some clips up by ~3-5%, some clips neutral gain, average gain
across clips is ~1%.
Small/negligible decrease in speed.
Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
Even though frame_size is calculated in uint64_t, it winds up in an int
size value.
This was exposed with the msan test because the memset is called with
(int)frame_size, leading to a segfault.
Change-Id: I7fd930360dca274adb8f3e43e5e6785204808861
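One way to guard against this kind of narrowing is to compute the size in 64 bits and reject values that do not fit in an int before they reach a call such as memset(). The following is a sketch under that assumption; the helper name and parameters are hypothetical and this is not the fix applied in the commit above.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height * bytes_per_sample in 64 bits
 * and reports failure (returns 0) if the result would not fit in an int.
 * On success the size is stored in *out and 1 is returned. */
static int checked_frame_size(uint32_t width, uint32_t height,
                              uint32_t bytes_per_sample, size_t *out) {
  if (out == NULL || bytes_per_sample == 0) return 0;
  /* The product of two 32-bit values always fits in 64 bits. */
  const uint64_t samples = (uint64_t)width * height;
  if (samples > (uint64_t)INT_MAX / bytes_per_sample) return 0;
  *out = (size_t)(samples * bytes_per_sample);
  return 1;
}
```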
Change type of sum_square_error from int64_t to uint32_t.
Change type of sum_error from int64_t to int32_t.
This reduces the stack usage from ~131K to ~87K.
BUG=b/68362457
Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
swap '{' and c-style comments removing a few redundant ones along the
way; covers most leftovers from the clang-tidy run against an
x86_64-linux config.
Change-Id: I67a45596f80a12389faca49c5be440875092a7df
Changed the intrinsics to perform summation similar to the way the assembly does.
The new code diverges from the assembly by preferring unsaturated additions.
Results for haswell
SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%
AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
Set adaptive_row_thresh_mt = 1 at speed >= 7,
for svc when multi-threading is used with row-mt.
This allows the adaptive_rd_thresh feature to be used
in the nonrd-pickmode.
~1-2% speedup for SVC encoding with small quality
loss (< 0.6%) on RTC set.
Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
Disable cyclic refresh if ROI is used and add flag to properly handle
the static_thresh deltas.
Remove the ROI test for cyclic refresh (it's allowed but disabled if ROI
is used).
Add an example in vpx_temporal_svc_encoder.c. Turned off by default.
BUG=webm:1470
Change-Id: Ief9ba1d7f967bc00511b412b491c3f70943bfbda
Note this change will trigger the different C version on SSSE3 and
generate different scaled output.
Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().
Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
Small increase in sad_thresh1, which avoids some false
detection of possible scene changes within lag.
Small improvement in a few clips on ytlive, otherwise neutral change.
Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
Expose the threshold for setting key frame on cut,
and increase it for speed 5.
Also small adjustment to min_thresh.
No change in overall metrics or fps.
Small quality improvement and lower encode time on scene cuts.
Change-Id: I36e06ff3b26b6c29aede39c23fce454525fc9026
Small increase in threshold for the 1 pass VBR datarate tests.
Needed due to commit:
<017257a Adjustment to scene detection and key frame>
Change-Id: I28b3bd7db2192a8cc2bccc3cb0e3b8dbb910ca16
The initial allocation of bits in the two pass code to each frame
should be within the min max limits on the command line. However,
when forming an ARF group the cost of the ARF is shared by frames
in that group such that the residual bits for a frame could drop below
the min value. This change prevents the minimum being re-applied
after the cost of the ARF has been deducted as this may otherwise
cause low rate sections to overshoot their target.
Test runs comparing to a baseline run with min and max section pct
0-2000% vs one closer to the YT use case (50-150%) suggest that
this fix not only results in better rate control but also gives a better
rd outcome.
For example the HD set vs 0-2000% baseline (opsnr, ssim).
Old code (50-150): +0.751, +1.099
New code (50-150): +0.241, -0.009
Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
Let it test extreme inputs and all filter types.
In the future ConvolveTest should test regular 8-bit functions in
high bitdepth mode.
Change-Id: I1042564d1d390589ca203070fe332c6da3315d75
For 1 pass vbr: use higher threshold on avg_sad
and force key frame under scene cut detection if
above the threshold. Allow it for speed >= 6 for now,
since it does not use the full nonrd_pickmode partition
(as in speed 5).
Improves quality somewhat on scene cut frames.
Neutral on overall metrics and fps for speed 6 on
ytlive set.
Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
Change to the bit allocation within a GF/ARF group.
Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
complexity score of the frames in that group but then share bits evenly between
the "normal" frames in that group regardless of the individual frame complexity
scores (with the exception of the middle and last frames).
This patch alters the behavior for the experimental "Corpus VBR" mode such that
the allocation is always based on the individual complexity scores.
Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
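The difference between an even split and a per-frame allocation can be illustrated with a proportional share. This is an illustration only: the function name, the use of floating point scores and the exact rule are assumptions, not the encoder's allocation code.

```c
#include <stdint.h>

/* Illustrative only: share `group_bits` between frames in proportion to each
 * frame's complexity score. Falls back to an even split when all scores are
 * zero. Names and the exact rule are assumptions, not the encoder's code. */
static void allocate_by_complexity(int64_t group_bits, const double *scores,
                                   int frame_count, int64_t *frame_bits) {
  double total = 0.0;
  for (int i = 0; i < frame_count; ++i) total += scores[i];
  for (int i = 0; i < frame_count; ++i) {
    const double share =
        total > 0.0 ? scores[i] / total : 1.0 / frame_count;
    frame_bits[i] = (int64_t)(group_bits * share);
  }
}
```

With scores 1 and 3 and a group budget of 400 bits, the two frames receive 100 and 300 bits, whereas an even split would give each frame 200.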
This patch makes further changes to support an experimental
corpus wide VBR mode that uses a corpus complexity
number as the midpoint of the distribution used to allocate bits
within a clip, rather than some average error score derived from the
clip itself.
At the moment the midpoint number is hard wired for testing and
the mode is enabled or disabled through a #ifdef. Ultimately this
would need to be controlled by command line parameters.
Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
vpx_convolve8_avg works by first running a normal horizontal filter, then a
vertical filter that averages at the end.
The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the
horizontal step.
vpx_convolve8_avg_vert_avx2 is also added, but only uses ssse3 code.
Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
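The averaging at the end of the vertical pass can be sketched as a rounded mean of the pixel already in the destination and the newly filtered value. The helper names are hypothetical and the filtering itself is omitted; only the final averaging step is shown.

```c
#include <stdint.h>

/* Rounded average of the existing destination pixel and the new prediction. */
static uint8_t average_pixel(uint8_t dst, uint8_t pred) {
  return (uint8_t)((dst + pred + 1) >> 1);
}

/* Hypothetical helper: average one row of filtered output into `dst`. */
static void average_row(uint8_t *dst, const uint8_t *pred, int width) {
  for (int x = 0; x < width; ++x) dst[x] = average_pixel(dst[x], pred[x]);
}
```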
This reverts commit 9311ef18b4.
Reason for revert:
Notice small regression in some clips.
Will revisit in another change.
Original change's description:
> Speed >=5 real-time: add TM intra mode for high_source_sad.
>
> Small/neutral change in metrics or speed for ytlive.
> Some improvement in quality on frames with big content change.
>
> Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For 1 pass vbr: increase min_thresh slightly, and also add
condition on golden/arf update for using full nonrd_pick_partition.
Reduces possible false detection for scene cut detection.
Neutral/small change in metrics or speed for speed 5.
Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
Small/neutral change in metrics or speed for ytlive.
Some improvement in quality on frames with big content change.
Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
Lower SAD threshold to select non_rd pickmode partition
at superblock level more often.
Small gain in metrics, small/negligible decrease in speed.
Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
avoid doing the full nonrd_pickmode based partition.
No change in overall metrics or speed.
Reduces encode times on scene cuts by 10-20%.
Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308
For 1 pass vbr mode:
On no-show_frame/ARF: instead of skipping alt_ref_frame
completely in mode testing, allow for checking (0, 0) on alt_ref.
Small gain in metrics, ~0.18%, no change in speed.
Change-Id: I32a3c24faca64ab70dd5091071a0dc301db7dd1e
For 1 pass vbr: when significant content/scene change is detected
(high_source_sad = 1) reduce/turnoff the additional qdelta on the
active_worst_quality. This helps somewhat to reduce the occurrence
of large frame sizes and large encode times.
Allow it only when use_altref_onepass is enabled.
Neutral/no change on metrics.
Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
For speed 6 real-time mode: use adapt_partition
on ARF frame instead of REFERENCE_PARTITION (which is slower).
This requires enabling compute_source_sad_onepass for no-show_frames.
Speedup of ~3-5% on some clips that heavily use ARF,
small loss (~0.2%) in quality on ytlive set.
Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
Speed compared with the version calling vpx_scaled_2d_neon():
~1.7 x in general
~2.8x for BILINEAR filter
BUG=webm:1419
Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
Scale 3x3 block instead of 16x16 block in each loop. Disabled by
default.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3.
Optimization code will be smaller and faster.
2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
For real-time mode. Move the switch to fixed partition
for is_src_frame_alt_ref so all speeds may use it
if use_altref_onepass is set.
Improves metrics by ~2% for ytlive set at speed 4
(where use_altref_onepass is currently used).
Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
Only has effect when sf->use_altref_onepass is enabled,
as in that case scene detection is skipped for non-show frame
and so high_source_sad does not get reset to 0.
No change in metrics or speed.
Change-Id: I421f066d239341449c18826089e1810b9fc5967f
Add stats for past ARF usage, and use it to disable
ARF usage based on some conditions.
Overall improvement on ytlive set, reduces the regression
on the problem clips for this feature.
Only affects when sf->use_altref_onepass is enabled
(currently off by default).
Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
This reverts commit 535b7b915a.
This is actually used in CBR to reset the rate control if high source sad is detected.
Original change's description:
> Remove the speed condition on scene detection in 1 pass code.
>
> Scene detection is used for VBR mode and for screen_content mode.
>
> It was also enabled for CBR mode via the speed condition,
> but currently the analysis in the scene detection is not used
> in CBR mode (similar computations are done locally at superblock level
> when the source_sad feature is enabled).
>
> For 1 pass code.
> No change in behavior. Small speed gain, ~0.5%.
>
> Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Scene detection is used for VBR mode and for screen_content mode.
It was also enabled for CBR mode via the speed condition,
but currently the analysis in the scene detection is not used
in CBR mode (similar computations are done locally at superblock level
when the source_sad feature is enabled).
For 1 pass code.
No change in behavior. Small speed gain, ~0.5%.
Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
'iter' parameter is being checked for NULL in every call to
decoder_get_frame which is quite pointless because it is always
going to be NULL unless the application changed it. The code works
as described only because vp9_get_raw_frame returns -1 on all
subsequent calls after the first.
Change-Id: Ic736b9e8fe36fc1430fc11d6a9b292be02497248
* changes:
Remove the unnecessary cast of (int16_t)cospi_{1...31}_64
Remove the unnecessary upcasts of (int)cospi_{1...31}_64
Change cospi_{1...31}_64 from tran_high_t to tran_coef_t
Add the condition frames_since_golden > 0 to the
early exit check for ARF usage in nonrd_pickmode.
This improves quality of first frame following ARF, where
frames_since_golden = 0.
Small/neutral gain in metrics for speed 6, neutral change in speed.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Change-Id: I82e73e6ff6fc849e5ca5448563cb8a0515fe0cdc
A new bug was introduced in a80bdfd "Change sinpi_{1,2,3,4}_9 from
tran_high_t to int16_t". Reverted the change in this file.
BUG=webm:1450
Failed test C/TransHT.AccuracyCheck/26.
Change-Id: Id001f57aad811803ef7d367d2b2bc008d8499991
Modify simple_block_yrd condition in nonrd_pickmode for SVC:
allow it to be used also on base temporal_layer, only when
spatial_layer > 1 and block size < 32x32.
Speed up of about ~2% for 3 layer SVC, with little/negligible
loss in quality.
Change-Id: I7734bdae51cf51f22b96f6b2b27da20ea1d84344
Fix the setting to frames_till_gf_update_due, and
adjust the limit value.
Only affects when USE_ALTREF_FOR_ONE_PASS is enabled.
Neutral change to metrics and speed for ytlive.
Change-Id: I266d9a00b36221bc8602fa2746d4e8a8f7d4dfae
Only when USE_ALT_REF_ONE_PASS is enabled (off by default).
Force fixed partition to 64x64 when is_src_frame_alt_ref is true,
and don't force early exit for some modes in nonrd_pickmode
for ARF noshow frames.
Small gain ~0.2% on ytlive metrics for speed 6.
Neutral speed difference.
Change-Id: I27eb6622d0453c09a06ccdc3b16368762474d11d
Add datarate test, for both VBR and CBR mode, with the
frame_parallel_decoding mode disabled (and error_resilience off).
Change-Id: I54feec3248a68ecff4bef8d9a31bb1616fab77df
In the new AUTO mode, restrict the minimum alt-ref interval and max column
tiles adaptively based on picture size, while not applying any rate control
constraints.
This mode aims to produce encodings that fit into levels corresponding to
the source picture size, with minimal loss of compression quality. However, the
bitstream is not guaranteed to be level compatible, e.g., the average bitrate
may exceed level limit.
BUG=b/64451920
Change-Id: I02080b169cbbef4ab2e08c0df4697ce894aad83c
Removed inline for GP load-store in case of (__mips_isa_rev >= 6)
Created one define LD_V for vector load and ST_V for vector store
Change-Id: Ifec3570fa18346e39791b0dd622892e5c18bd448
Also add column headings so that the output can still be parsed if the
set of headers changes later.
Change-Id: I4beaf266521e093db4acf5f715b18fdfb7e3d1cd
This reverts commit 8c42237bb2.
Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
Scale 3x3 block instead of 16x16 block in each loop.
Benefits:
1. Reduced number of different phase_scaler from 16 to 3. Optimization code
will be smaller and faster.
2. The maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)
BUG=webm:1419
Change-Id: Ibb9242a629ddb03e1ff93b859bece738255e698c
The intra mode rd penalty was implemented as a rate penalty.
Code was added to scale the penalty according to block size but
this was not done correctly for the SB level or sub 8x8.
The code did a weird double scaling in regard to bit depth that
has been removed. Given that it is a rate penalty the bit depth
should not matter.
This bug fix improves average metrics on our standard test
sets by about 0.1%
Change-Id: I7cf81b66aad0cda389fe234f47beba01c7493b1e
Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.
BUG=webm:1419
Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1
This header doesn't build on g++ v6 as it's a C and not C++ header
(_Atomic is not a keyword in C++11). Since the C and C++ invocations
cannot be guaranteed to point to the same underlying atomic_int
implementation, remove support for them and use compiler intrinsics
instead.
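A minimal sketch of what intrinsic-based atomics can look like, assuming a GCC or Clang toolchain that provides the __atomic builtins. The wrapper type and function names are hypothetical, and the acquire/release orderings are an assumption, not a statement of what this change uses.

```c
#include <assert.h>

/* Hypothetical wrapper type; the volatile int member is an assumption. */
typedef struct {
  volatile int value;
} atomic_int_sketch;

/* Publish a value with release ordering (GCC/Clang __atomic builtin). */
static void atomic_store_release_sketch(atomic_int_sketch *a, int v) {
  assert(a != 0);
  __atomic_store_n(&a->value, v, __ATOMIC_RELEASE);
}

/* Read a value with acquire ordering. */
static int atomic_load_acquire_sketch(const atomic_int_sketch *a) {
  assert(a != 0);
  return __atomic_load_n(&a->value, __ATOMIC_ACQUIRE);
}
```

Because the builtins compile the same way from C and C++ translation units, both sides see one implementation, which is the property the plain C header could not guarantee.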
BUG=webm:1461
Change-Id: Ie1cd6759c258042efc87f51f036b9aa53e4ea9d5
Makes main thread wait for the filter level to be picked to avoid a race
between the LPF thread and update_reference_frames(). This also
re-enables the failing tests under thread_sanitizer where this data race
was detected.
BUG=webm:1460
Change-Id: I7f5797142ea0200394309842ce3e91a480be4fbc
Fixes issue on iPad Pro 10.5 (and probably other places) where threads
are not properly synchronized. On x86 this data race was benign: load
and store instructions are atomic, and they behaved atomically in practice as
the program has not been observed to be miscompiled.
Such guarantees are not made outside x86, and real problems manifested
where libvpx reliably reproduced a broken bitstream for even just the
initial keyframe. This was detected in WebRTC where this device started
using multithreading (as its CPU count is higher than earlier devices,
where the problem did not manifest as single-threading was used in
practice).
This issue was not detected under thread-sanitizer bots as mutexes were
conditionally used under this platform to simulate the protected read
and write semantics that were in practice provided on x86 platforms.
This change also removes several mutexes, so encoder/decoder state is
lighter-weight after this change and we do not need to initialize so
many mutexes (this was done even on non-thread-sanitizer platforms where
they were unused).
Change-Id: If41fcb0d99944f7bbc8ec40877cdc34d672ae72a
Neutral on rtc set for speed 8. Neutral on ytlive for speed 5.
Saves some computation cycles but no speed gain observed on Pixel.
Change-Id: I34c4642cd543aa89c5b9c4bff6b7113577c64c91
This reverts commit df9ce12259.
Reason for revert:
Re-enabled tests still fail tsan in high bitdepth.
Original change's description:
> Re-enable disabled tests under TSan.
>
> These tests point to an already-fixed bug, this should no longer have a
> data race.
>
> BUG=webm:1049
>
> Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8
TBR=jzern@google.com,pbos@chromium.org,builds@webmproject.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: webm:1049
Change-Id: I232f1f7726bf795b301abfb2e07cad6756642e53
Rev d147771 fixed the test failure. So remove the resolution condition
for using source_sad in speed 6.
BUG=webm:1452
Change-Id: I1efba97e1ef5bd4de5f886299f6fcb907187abcd
Enable adapt_partition for vbr mode for speed 6.
This allows the usage of the pickmode-based partition
(used in speed 5), but only selectively for superblocks
with high source sad, otherwise the faster variance based
partition scheme is used.
For speed 6 on ytlive set: avgPSNR/SSIM metrics up by ~0.6%,
several clips up by ~1.5%. Small/negligible decrease in speed.
Change-Id: I12f3efef6b3e059391de330fdbe5a44c2587f1f8
For SVC at speed >= 7: only use the improved mv search
on base spatial layer, if top layer resolution is above 640x360.
~2.3% speedup
Small/negligible loss in avgPSNR metrics on rtc set.
Change-Id: Iaef75a57ebf1c248931bc1aa28d20b7fecac1851
This reverts commit f60d1dcd3d.
Reason for revert:
Failures in AVX/VP9QuantizeTest in nightly tests.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org
Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
For speeds < 7, increase threshold that controls the split
of 16x16->8x8 blocks, for resolutions 720p and higher.
Minor change for speed 5 (since it uses reference partition scheme
which only uses variance partition as first step).
For speed 6: ~0.5% increase in avgPSNR/SSIM metrics on ytlive set.
No change in speed.
Change-Id: I5126580973201538d8ca26a9256b93c4d11d685b
Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.
Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a
Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.
Very close in speed to the assembly.
Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.
Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
Add 1 if negative to get dqcoeff to round towards zero.
10-15% faster than converting to positive before shifting.
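The rounding trick can be sketched in scalar C. This assumes the shift is by one bit (a divide by 2) and that >> on a negative int is an arithmetic shift, which is implementation-defined in C but holds on common compilers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Halve x, rounding toward zero: adding 1 to negative values before the
 * shift cancels the round-toward-negative-infinity of an arithmetic shift. */
static int32_t half_toward_zero(int32_t x) {
  return (x + (x < 0)) >> 1;
}

/* Self-check against C's truncating division over a small range. */
static void check_half_toward_zero(void) {
  int32_t x;
  for (x = -1000; x <= 1000; ++x) assert(half_toward_zero(x) == x / 2);
}
```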
Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.
Both numbers would improve if the division for dqcoeff could be
simplified.
BUG=webm:1426
Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.
For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (few clips up by 2/3%).
No change in speed.
Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798
* changes:
quantize: ignore skip_block in arm
quantize: ignore skip_block in x86
quantize fp: ignore skip_block in arm
quantize fp: ignore skip_block in x86
This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.
Fixes an assert resulting from removing skip_block from the quantize
functions.
BUG=webm:1459
Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde
This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.
Add assert() and comments regarding the usage of skip_block.
Removing the parameter is a fairly involved process so leave it be for
the moment.
Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
Despite abs_coeff being a positive value, all the other implementations
treat it as signed which simplifies restoring the sign.
HBD builds cast qcoeff to avoid a visual studio warning. Match
vp9_quantize.c style of casting the entire expression.
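Why a signed abs_coeff simplifies restoring the sign can be seen in a scalar sketch. It assumes 16-bit coefficients and an arithmetic right shift for negative values. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign mask: 0 for non-negative input, -1 (all ones) for negative input. */
static int16_t sign_mask(int16_t coeff) { return (int16_t)(coeff >> 15); }

/* Magnitude via xor/subtract; kept in a signed type. */
static int16_t magnitude(int16_t coeff) {
  const int16_t s = sign_mask(coeff);
  assert(coeff != INT16_MIN); /* |INT16_MIN| does not fit in int16_t */
  return (int16_t)((coeff ^ s) - s);
}

/* The same xor/subtract pair puts the sign back, with no unsigned detour. */
static int16_t restore_sign(int16_t abs_coeff, int16_t s) {
  return (int16_t)((abs_coeff ^ s) - s);
}
```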
Change-Id: I62b539b8df05364df3d7644311e325288da7c5b5
Having a very low "lag_in_frames" value could cause the encoder to create
incorrect / corrupt ARF groups including displayed frames that update the
ARF buffer and false overlay frames that are coded at low rate but are not
actually overlays of a real ARF frame.
This is linked to a reported unit test "slow down" where the chosen parameters
(lag of 3 frames) gave rise to such "broken" ARF group(s).
See also BUG=webm:1454
Change-Id: If52d0236243ed5552537d1ea9ed3fed8c867232c
Having a very small value for "lag_in_frames" can result in
corrupt arf groups including displayed frames that update
the arf buffer and fake overlay frames that are not in fact
overlays of real arfs but are nevertheless starved of bits.
Leaving lag_in_frames at the default of 25 for these 5 frame two
pass VBR tests should now give rise to a valid ARF coding pattern
as follows:- K(ey), A(rf), N(ormal), N, N, O(verlay).
This change is part of a response to BUG=webm:1454 where broken
arf groups interacted badly with a change that corrects for large rate
misses. However, it may still in some cases increase encode time by
virtue of the fact that the unit test now codes a correct coding pattern
with "hidden" ARF frames.
Change-Id: Ifd0246a4c1d0be247247c754024d7a4ed5f66a6b
Some clips in nightly unit test exhibiting significant encoder slowdown which
appears to bisect to Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a.
The above change allowed for emergency iterations of the recode loop and
adjustment of the Q range if there is a large rate miss.
This patch disables the above adaptation for cases of cpu_speed >= 3 or more
specifically where cpi->sf.recode_loop >= ALLOW_RECODE_KFARFGF.
For speeds >= 3 the code does not currently run a dummy bit pack operation
inside the recode loop. Without this dummy pack operation there is no up to
date estimate of the current frame's size to use as a basis for assessing the
requirement for a recode. In practice it was using the previous frame's size (or 0
for the first frame) which could cause odd behavior.
If we require the emergency rate correction added in Change-Id: I6923.. for
the higher speed settings it will be necessary to enable the dummy pack
which will in turn hurt encode speed.
BUG=webm:1454
Change-Id: I4fb3c6062ca9508325a6f31582f8e80f1a9b126f
Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.
Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
Actual frame size and bitrate is all 0 when using SVC sample encoder
with sl = 1 because the stats are set in parse_superframe_index which
will not calculate properly when sl = 1 since there is no superframe.
Use pkt->data.frame.sz instead when sl = 1.
Change-Id: I93f5e98a4c779e32b007e1564ba5396af9e34ad6
Use input with a narrow range because the filter only applies when the
frames are similar.
Run CompareReferenceRandom more times. Especially before narrowing the
input range, the filter frequently did not apply.
Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38
this test fails with the configuration similar to the assembly prior to:
d52cb5972 quantize: copy ssse3 optimizations to intrinsics
BUG=webm:1458
Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf
Change the denoiser frame buffer management for SVC to more generally
handle the layer patterns in SVC (where last is not always refreshed).
This change is only for SVC with denoising and is bitexact.
Change-Id: Ic2b146a924cdf6e7114609158afa3d4880fe3fae
Testing of 4k videos encoded with a fixed arbitrary chunking interval
uncovered a bug whereby if a chunk ends 1 frame before a real scene cut,
the next chunk may be encoded with two consecutive key frames at the start
with the first being assigned 0 bits.
This fix ensures that where there is a key frame group of length 1 it is
assigned at least 1 frame's worth of bits, not 0.
See also patch Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
which by virtue of allowing fast adaptation of Q made this bug more visible.
BUG=webm:1456
Change-Id: Ic9e016cb66d489b829412052273238975dc6f6ab
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
When adapt_partition_source_sad is enabled (currently only at
speed 6 for resoln <= 360p): use lower subsize (8x8 instead of 16x16)
for nonrd_select_partition on 32X32 blocks.
And force avoiding rectangular partition checks in
nonrd_pick_partition for speed >= 6.
Small increase ~0.5 in metrics for speed 6 on rtc_derf,
no change in speed.
Change-Id: Id751bc8f7573634571b2d6f5e29627cd5cebccae
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
Enable fast adaptation of Q when there is a large overshoot
for the #ifdef AGGRESSIVE_VBR test case.
AGGRESSIVE_VBR is not currently enabled by default.
Change-Id: I7240bb6589795964b6b0b66df4468e4f21504e0f
Originally, for the purpose of keeping a fast first pass, the first-pass
stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but
the difference is so small that it doesn't cause a mismatch between the
final bitstreams. However, if the encoder changes, this minor difference
may cause a mismatch. Thus, this patch always forces the first pass to
be bit exact.
BUG=webm:1453
Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8
For 1 pass CBR mode:
Apply the logic for dropping (and re-adjusting rate control)
due to large overshoot to the case of non-screen content when
drop_frames_allowed is enabled.
For the non-screen content case: add additional condition that
rate correction factor is close to minimum state, and flag to
constrain the frequency of the dropping.
Also handle the case of temporal layers and multi-res encoding.
Add some flags/counters to the layer context for temporal layers.
For multi-res: drop due to overshoot is checked on lowest stream,
and if overshoot is detected we force drops on all upper streams
for that frame.
This feature is to avoid large frame sizes on big content
changes following low content period.
No change in behavior for screen_content_mode = 2.
Change-Id: I797ab236cbbf3b15cad439e9a227fbebced632e6
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
When the superblock partition is based on the nonrd-pickmode,
we need to avoid the denoising. Current condition was based on
the speed level. This change is to make the condition at the
superblock level, as the switch in partitioning may be done at
sb level based on source_sad (e.g., in speed 6).
Change-Id: I12ece4f60b93ed34ee65ff2d6cdce1213c36de04
This reverts commit c9266b8547.
Disable source_sad when resolution > 1080P. The test should
pass now.
BUG=webm:1452
Change-Id: I72dde88e66590ff9e41da5e5dd83f5550a83f082
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
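One way to avoid the undefined shift is to shift the positive magnitude and negate the result. This is a sketch of the general idiom only: the constant name and shift amount are hypothetical, and the source does not say which rewrite was used.

```c
#include <assert.h>

/* Hypothetical constant: shift the magnitude, then negate,
 * instead of writing (-1 << SKETCH_SHIFT). */
#define SKETCH_SHIFT 3
#define SKETCH_MIN_VALUE (-(1 << SKETCH_SHIFT))

/* Run-time form of the same idiom. */
static int neg_pow2(int shift) {
  assert(shift >= 0 && shift < 31);
  return -(1 << shift);
}
```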
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
Move the source_sad feature to speed 6 (from speed 7), and
add speed feature to switch from the variance-based partition
to reference_partition (which uses nonrd-pickmode for bsize selection)
if source_sad is high.
Currently used only for speed 6 for resoln <= 360p.
About 4-5% improvement on 360p in RTC set.
Some speed slowdown, but still ~30% faster than speed 5.
Change-Id: Ib0330ee5fe9fdd2608aed91359a2a339d967491c
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
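The cast order matters because the squared sum can exceed the int range before the shift. A scalar sketch, assuming a 16x16 block of 8-bit pixels (256 samples, so the divide is a shift by 8). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* variance = sse - sum^2 / 256 for a 16x16 block.
 * The square is formed and shifted in 64 bits, then narrowed. */
static uint32_t variance16x16_sketch(uint32_t sse, int sum) {
  const uint32_t mean_sq = (uint32_t)(((int64_t)sum * sum) >> 8);
  assert(mean_sq <= sse); /* holds for consistent sse/sum pairs */
  return sse - mean_sq;
}
```

With every difference at 255, sum is 65280 and sum * sum is 4261478400, which does not fit in a 32-bit int, while the shifted result 16646400 does.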
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
For 8-bit the subtrahend is small enough to fit into uint32_t.
For 10/12-bit apply:
63a37d16f Prevent negative variance
previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.
Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'
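A sketch of one way to avoid the wraparound: convert both operands to a signed type before subtracting. It assumes both values fit in int. The function name is hypothetical and the source does not state which fix was applied.

```c
#include <assert.h>

/* Signed difference of two unsigned quantities. */
static int signed_diff(unsigned int a, unsigned int b) {
  assert(a <= 0x7fffffffu && b <= 0x7fffffffu);
  return (int)a - (int)b;
}
```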
Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only reads <= 8 bytes. Although the extra bytes read are actually never
used, that does not justify reading past the buffer. Fixed by allocating more when
building for VSX. This is reported by asan.
Also note - PPC does have assembly that loads 64-bit content from memory - lxsdx
loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doublewords) from
memory. However, we only have "vec_vsx_ld" builtins that map to lxvd2x, and no
builtins for lxsdx. The only way to access lxsdx is through inline assembly,
which does not fit well in the original paradigm.
Refer:
vsx:
vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
neon:
vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
sse2:
tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm
BUG=b/63112600
Tested:
asan tests passed.
Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.
Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.
If all the optimizations were unified, the structure sizes could be
greatly reduced.
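The 8-element layout described above can be sketched as follows. The names are hypothetical; only the one-dc-plus-seven-ac shape comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define QUANT_LANES 8

/* Fill one 8-lane row: lane 0 holds the DC value and lanes 1..7 repeat
 * the AC value, so a full 128-bit register can be loaded directly. */
static void fill_quant_row(int16_t row[QUANT_LANES], int16_t dc, int16_t ac) {
  int i;
  assert(row != 0);
  row[0] = dc;
  for (i = 1; i < QUANT_LANES; ++i) row[i] = ac;
}
```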
Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.
This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.
Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e
To fix valgrind issues with SVC tests.
SVC encoding uses prune_evenmore, which is causing an uninitialized value.
Will re-enable later when issue is resolved.
Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44
When content_state_sb is set to LowVarHighSumdiff, don't reset
it to VeryHighSad. Visually better on clips with strong lighting changes.
Small/negligible change in RTC metrics and speed.
Change-Id: I20c383e3c4cf8d1149de5f9260449c0b7cf7c6aa
When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.
For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.
Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b
Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.
Simplify buffer usage with Buffer and removing unnecessarily aligned
variables.
eob is a single element and never written using aligned instructions.
BUG=webm:1426
Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35
Reduces memory usage, and speeds up encoding for some difficult clips.
No impact on output or metrics.
Ported from aomedia patch:
https://aomedia-review.googlesource.com/c/14501
Change-Id: I26ec69af8336f9e80da486a1cfbfc89a3596954d
This reintroduces the fix:
https://chromium-review.googlesource.com/c/422807/
which was later reverted here:
https://chromium-review.googlesource.com/c/447843/
BUG=webm:1355
This time behind a compile time flag :
configure --disable-always_adjust_bpm
configure --enable-always_adjust_bpm
This should make side by side testing easier and let users of the
lib pick which way they want to go.
Change-Id: I7d7b37b83015dc001810af84c132cbc1e71ba8d6
For fixed pattern SVC: keep track of denoised last_frame buffer
for base temporal layer, and if alt_ref is updated on middle/upper
temporal layers, force an update to denoised last_frame buffer.
This allows for improved denoising on top temporal layers.
Change-Id: Icbd08566027d4d2eabc024d3b7a0d959d2f8c18b
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
Denoiser is used in real-time mode which does not use alt-ref.
Reduce memory usage when denoiser is enabled.
Change-Id: I54ba3bcaeeb1818bbdf718ef90e97d4897ff793d
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
this has been set to max since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: Ic796fc05abe73a58700ec50e3f8e72d3462898ec
If the content_state for a superblock is set to HighSad,
use that to bias some decisions in variance partition and
nonrd pickmode: use int_pro_motion for sad computation in
choose_partitioning, and set large_block in pickmode based
on the content_state_sb.
Only affects speed >= 7.
Improvement for high motion content.
Small gain (~1%) in RTC metrics.
Speedup of ~5 for high motion clip on android (speed 8, 1 thread).
Change-Id: I5774c4854f012b89c8e969f6129b60988c2ce11c
this has been on by default since:
f5c36a5ce VP9: turn on tile-columns and frame-parallel-mode by default
~v1.4.0
Change-Id: I52017ab0157feaf429dce3d9e1af8a53bb5c1b65
the file was empty after the struct removal. the only remaining use was
within vp9_dx_iface, but the wrapper became unnecessary after the
removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd
This patch attempts to address a bug reported for 4K video.
https://b.corp.google.com/issues/62215394
In this instance a perfect storm of a moderate complexity section
followed by a much easier section where a CGI overlay helped to
suppress film grain noise, followed by a much harder and very grainy
section at the end, cause a massive local rate spike that pushed a chunk
over the upper allowed rate limit.
This patch detects cases where the rate for a frame is much higher than
expected and allows, in this special case, for rapid adjustment of the active
Q range.
For the example chunk in the bug report the target rate was 18Mb/s and the
observed rate was over 37 Mb/s with a surge for the last few frames to over
100Mb/s. This patch brings the overall chunk rate right back down to ~18.2 Mbit/s
and almost completely eliminates the rate spike at the end. (See graphs appended
to bug report)
Also see I108da7ca42f3bc95c5825dd33c9d84583227dac1 which fixes a bug
unearthed during testing of this patch and also has a bearing on high rate
encodes such as 4K.
This patch does have a negative impact on some metrics. Most notably there are
clips in our standard test set where it hurts global psnr (though in many cases it
conversely helps SSIM, FAST SSIM and PSNR-HVS). It is also worth noting that
the clips (and data rates) where there is a big metric impact, are almost all cases
where there is currently a significant overshoot vs the target rate and overall rate
accuracy is greatly improved.
Change-Id: I692311a709ccdb6003e705103de9d05b59bf840a
Local application of:
https://github.com/google/googletest/pull/1066
Suppress unsigned overflow instrumentation in the LCG
The rest of the (covered) codebase is already integer overflow clean.
TESTED=gtest_shuffle_test goes from fail to pass with -fsanitize=integer
Change-Id: I8a6db02a7c274160adb08b7dfd528b87b5b53050
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
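The undefined left shift described above can be avoided by multiplying instead. A minimal sketch, assuming a hypothetical helper name (scale_pow2 is not a libvpx function) and bits < 31:

```c
#include <stdint.h>

/* Hypothetical helper, not libvpx code: scale a possibly negative value by
 * 2^bits without left-shifting a negative operand, which C leaves
 * undefined. Assumes bits < 31 and that the result fits in int32_t. */
static int32_t scale_pow2(int32_t value, unsigned bits) {
  /* Only the non-negative constant 1 is shifted; the signed value is
   * multiplied, which is well defined while the product fits. */
  return value * (int32_t)(1 << bits);
}
```

For a constant operand a compiler would normally fold either form to the same value, which matches the "no change in the generated code" remark.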
Finish the calculations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
testing::Range does not include the end parameter in the set of values.
also adjust the start to 2 as the single threaded case is already
covered in another instantiation
Change-Id: Iae3bf3ed4363dd434eccfa5ad4e3c5e553fbee60
For nonrd_pickmode: add condition for checking
intra mode if the sb content state is VeryHighSad.
Reduces artifacts when there is a sudden change in content.
Metrics on RTC/RTC_derf neutral (small gain).
No speed loss observed.
Change-Id: I07006d28fd2dc06c1d06b07630102b0fece50c40
the last frame_worker_owner, row and col references were removed in:
131bd06e6 remove vp9_dthread.c
BUG=webm:1395
Change-Id: Ia7fb2e8782b12a58d2a2263849d20a8abf06aef6
and the related prototypes in vp9_dthread.h. the last references were
removed in:
09dabc58d VP9_COMMON: rm frame_parallel_decode
vp9_dx_iface.c still uses FrameWorkerData
BUG=webm:1395
Change-Id: Ica8e98ae776fc0105f1fbbed9e0a729808980810
creating a thread associated with the sole worker isn't necessary when
only execute() is being used after the removal of frame_parallel_decode.
BUG=webm:1395
Change-Id: I2255ce72607321e5708bc82a632dc6825d4eff5c
Add a method to acm_random.h to generate ranges of values
Add a way to call that method to buffer.h
Adjust dct_[partial_]test.cc to use it.
Change-Id: I8c23ae9d27612c28f050b0e44c41cb4ad2494086
this field has been 0 since:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
BUG=webm:1395
Change-Id: I15448e9401e15329b54c6878dda033b17be5ec6b
VPX_CODEC_USE_FRAME_THREADING was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the tests in this file have been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
BUG=webm:1395
Change-Id: I2c7a250acb65cf9522cf8a7bb724bb92070e41c6
this was made a no-op in:
01d23109a vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op
and the test hitting this branch has been disabled since:
6ab0870d4 disable VP9MultiThreadedFrameParallel tests
rename the test to VP9MultiThreaded to exercise the tile-based threading
BUG=webm:1395
Change-Id: I35564a75eb5a7d7f7ccb923133b1b07295201f4c
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468
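The _1 variant described above (sum the block, then shift by a size-dependent amount) might be sketched as follows. The function name and the explicit shift parameter are assumptions made for illustration; the per-size shift values used by libvpx are not stated here.

```c
#include <stdint.h>

/* Hypothetical DC-only forward transform: sum a size x size block of
 * residuals and apply a right shift chosen by the caller per block size.
 * Right-shifting a negative sum is implementation-defined in C; mainstream
 * compilers perform an arithmetic shift. */
static int32_t fdct_dc_only(const int16_t *input, int stride, int size,
                            int shift) {
  int32_t sum = 0;
  for (int r = 0; r < size; ++r) {
    for (int c = 0; c < size; ++c) sum += input[r * stride + c];
  }
  return sum >> shift;
}
```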
Existing logic was only affecting resolutions above 720p.
Needs more testing for reducing subpel for speed >= 8.
No change on RTC metrics.
Change-Id: I2f4bf9f25891614aafa9a86aa5a5063a3ccfce4d
This could save some cycles since skin detection is used in multiple
places in vp9.
1~2% speed up on ARM.
Change-Id: I86b731945f85215bbb0976021cd0f2040ff2687c
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
this normalizes these tests with the regular variance ones both in
implementation and test list output
Change-Id: I387aea81456f94b8223b8fb2a28cab94bc1aa9d5
Use the scene detection for CBR mode, and use it to reset the
rate control if large source sad is detected and rate
correction factor/QP is at minimum state.
Avoids large frame sizes after big content change following
low content period.
Only affects CBR mode for 1 pass at speeds 5, 6, 7.
Change-Id: I56dd853478cd5849b32db776e9221e258998d874
Fix misplaced cast that caused an overflow and incorrect rate adaptation
behavior for high data rates. This in particular will have affected 4k encodes
but could also have come into play for some higher rate 1080p cases.
In our standard test sets the quality impact is small though several high rate
clips show improved rate accuracy. This can also impact the number of recode
loop hits and on one problem 4k clip the encode time for speeds 0 and 1 was
reduced by >25%
Change-Id: I108da7ca42f3bc95c5825dd33c9d84583227dac1
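A misplaced cast of the kind described above typically widens the result of a multiplication instead of one of its operands. The sketch below is illustrative only; total_bits and its parameters are hypothetical and do not reproduce the rate control code that was fixed.

```c
#include <stdint.h>

/* Hypothetical rate computation: bits per frame times frame count. */
static int64_t total_bits(int bits_per_frame, int frames) {
  /* Misplaced form: (int64_t)(bits_per_frame * frames) multiplies in int
   * first, so the product can overflow before it is widened.
   * Widening one operand first makes the whole multiply 64-bit. */
  return (int64_t)bits_per_frame * frames;
}
```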
Use it to limit NEWMV early exit in nonrd pickmode
Small change in RTC metrics, has some improvement
for high motion clips.
Change-Id: I1d89fd955e1b3486d5fb07f4472eeeecd553f67f
this is consistent with other threaded tests and ensures gtest_filters
meant to operate on these pick them up
Change-Id: I99ce53720553a22c4b9905a2882273c2be2c031b
and vp8_fast_quantize_b_impl_mmx; this was never enabled in rtcd
an sse2 version exists so there isn't much reason to keep a mmx
implementation around.
Change-Id: I8b3ee7f46ba194ffa0d0a6225a0f299f2a4dea90
use an int to quiet an unsigned rollover warning similar to:
25110f283 Fix an ubsan warning: vp9_quantizer.c
Change-Id: Iedecb79a17249bc18f10c0920f88cf704920f12b
Adjust the threshold for turning off cyclic refresh for high motion,
and avoid testing golden in nonrd pickmode for speed >= 8 if
golden refresh was long ago.
No change/neutral on RTC metrics.
Change-Id: I40959b8d9637f3553e7458bbabd8c6024c2c09c0
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
'in' is used for the reference fdct. 'coeff' is input to the idct being
tested and 'dst[16]' is output
Fixes a segfault on unaligned memory access on x86.
Change-Id: I3691b1380ed49986897dd89a63ce63a80a0e0962
this was deleted in:
98967645a Remove vpx_idct8x8_64_add_ssse3()
but this was merged in:
9e03eedf6 Merge changes Ib26dd515,Ie60dabc3
after:
a92991133 Merge "dct tests: run all possible sizes in one test"
which added a new reference
Change-Id: I8da4a6c80d27b237a378ff15eead1daab89e7e25
Don't override max_gf_interval if it's not specified. It will
be assigned with a default value in vp9_rc_set_gf_interval_range().
BUG=b/62803416
Change-Id: Ide46ce00279ed076865fc54ce98c55a994f0c798
Sample encoder change: reduce max-intra-rate to 1000 and
buf-initial to 600. Parameters affect target size of key frame.
Change-Id: I2be6bc2927f5fa74e19e1efa3fb574d23a503300
Sample encoder change: reduce max-intra-rate to 1500 and
buf-initial to 700. Parameters affect target size of key frame.
Change-Id: I01e238378b63eeef28dfc2178baadffcd3cc7561
Adjust some parameters in sample encoder: vpx_temporal_svc_encoder.
Parameters adjusted to set lower QP for initial key frame,
and allow for larger target size on subsequent key frames.
Change-Id: I092ad968e5b51b9f495dadb6ee96e810663c910e
Modify fdct4x4_test.cc to support all size combinations. This does not
add any new tests and in fact fails a few. There were minimal changes
made to the tests so it's not entirely surprising that some of the
larger 12 bit transforms are failing since it was initially only used
for 4x4.
In follow up patches the tests in fdct8x8_test.cc, dct16x16_test.cc and
dct32x32_test.cc will be evaluated and moved to dct_test.cc.
BUG=webm:1424
Change-Id: I72a23430f457d7fae8c91e706adc0e77c25abc8f
Set the base_mv_aggressive for temporal enhancement layers (TL > 0).
Under the aggressive mode, skip the NEWMV depending on the
SSE of the base_mv. Also reduce the subpel motion to 1/2 under
aggressive mode if base_mv is good.
Speedup ~3% with small/negligible loss in quality on RTC.
Affects speed >= 6.
Change-Id: I89341b279cad6da2a04b76d5e726016191dacdb8
It's almost identical to vpx_idct8x8_64_add_sse2(), except for a small
difference in instruction order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
This was ported from the greedy version in AV1, written by Dake He
(dkhe@google.com).
See:
https://aomedia.googlesource.com/aom/+/master/av1/encoder/encodemb.c#137
Greedy version is disabled by default, but can be picked by setting
USE_GREEDY_OPTIMIZE_B to 1.
To be enabled by default later.
This is both faster and better in terms of compression.
Compression Improvement:
------------------------
lowres: -0.119
midres: -0.064
hdres: -0.405
Speed Improvement:
------------------
(Based on encode time of 3 videos of different difficulties at
3 different target bitrates)
With --cpu-used=0: 0.38% to 5.55% faster
With --cpu-used=1: 0.24% to 2.79% faster
With --cpu-used=2: 0.29% to 1.46% faster
Change-Id: Ia7a23b3b244ad8eb253ac9e43cd03c5e021d2635
* changes:
Update high bitdepth load_input_data() in x86
Clean array_transpose_{4X8,16x16,16x16_2} in x86
Remove array_transpose_8x8() in x86
Convert 8x8 idct x86 macros to inline functions
some build systems have trouble with duplicate basenames.
vpx_dsp/skin_detection.[hc] were added in:
658e85425 Merge skin detection code in vp8/9.
BUG=webm:1438
Change-Id: Ieaa70b40bda409ec23e6d179b47a930ac6243b05
Set subpel prune_evenmore only for non_reference frames,
instead of all TL > 0 frames. Gain some quality back at
cost of small speed loss (~1-2%).
Change only affects SVC encoding at speed >= 7.
Change-Id: I5b9f51e51dccfd7050521a66996176b0415ca3f9
the check for error correction being disabled was overriding the data
length checks. this avoids returning incorrect information (width /
height) for the decoded frame which could result in inconsistent sizes
returned to an application causing it to read beyond the bounds of
the frame allocation.
BUG=webm:1443
BUG=b/62458770
Change-Id: I063459674e01b57c0990cb29372e0eb9a1fbf342
min_gf_interval should be no less than min_altref_distance + 1,
as the encoder may produce bitstream with alt-ref distance being
min_gf_interval - 1.
BUG=b/38450599
Change-Id: Ifb733daa643ebc668d1b23e1ce92db94b66dabe8
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
Keep the 1/4subpel for all frames, use SUBPEL_TREE_PRUNED_EVENMORE
for all temporal enhancement layer frames.
Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a
It is initialized to be { INT_MAX, 0, ... } in ffe0f9b.
No effect on encoders.
Make it consistent with other initializations.
BUG=webm:1440
Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922
visual studio will warn if a 32-bit shift is implicitly converted to 64.
in this case integer storage is enough for the result.
since:
f3a9ae5ba Fix ubsan failure in vp9_mcomp.c.
Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc
For SVC 1 pass non-rd mode:
Force subpel search off for SVC for non-reference frames
under motion threshold.
Add flag to svc context to indicate if the frame is not used
as a reference.
Little/no quality loss, ~2% speedup.
Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9
Speed >=8: for resolutions above CIF, and for low motion content,
set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE.
Small speed gain (~2%) on vga clips,
RTC metrics up by ~2-3% on average.
Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc
x86 requires 16 byte alignment for some vector loads/stores.
arm does not have the same requirement.
The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.
Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c.
Clean up vp8 codes usage in vp9.
BUG=webm:1435
Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
Fix the condition on usage of source_sad for temporal layers.
Fix allows it to be used for the case of 1 temporal layer.
Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partition for non-SVC at speed 7,
but with smaller value of max_copied_frame (=2).
~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.
Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).
But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.
There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.
Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
The sub pixel variance uses a temp buffer which guarantees width ==
stride. Take advantage of this with the 4x and avoid the very costly
lane loads.
Change-Id: Ia0c97eb8c29dc8dfa6e51a29dff9b75b3c6726f1
For aq-mode=3: refactor the condition for turning off
the refresh. Add some adjustments for high motion content.
No/little change in RTC metrics, only affects high motion case.
Change-Id: I7da8eabfb0e61db014be4562806f72ee5ef4a43b
When temporal layers are used, only allow for copy partition
on the top temporal enhancement layer frames.
Change-Id: I5472abdc0f9f6c8dafa75a7a84c615e08ae22af8
Only affects speed 8.
Make changes to copy partition to fix a bug in setting microblock
offset. Avg PSNR shows 0.02% gain on rtc_derf and 0.08% loss on rtc.
Change-Id: I61c3e5914dde645331344388e7437e5638acd4f3
The modified error was a derivative of the "coded_error"
that was used to allocate bits between different frames on the
assumption that the allocation should be linear in terms of this
modified error. I.e. a frame with double the modified error score
should all things being equal get double the number of bits. The
code also included upper and lower caps derived from input
VBR parameters.
This patch improves the initial calculation of the clip mean error
(now called "mean_mod_score" as it is no longer a prediction error)
used as the midpoint for the rate distribution function and normalizes
the output "modified scores" such that 1.0 indicates a frame
in the middle of the distribution. The VBR upper and lower caps are
then applied directly to a frame's normalized score.
This refactoring is intended to make it easier to drop in alternative
distribution functions or to base the rate allocation on a corpus wide
midpoint (rather than the clip mean).
Change-Id: I4fb09de637e93566bfc4e022b2e7d04660817195
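The normalization described above can be pictured as a division by the clip midpoint followed by a clamp. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, mean_mod_score is assumed to be positive, and the actual distribution function is not reproduced.

```c
/* Hypothetical names throughout; not the libvpx implementation. */
static double normalized_mod_score(double frame_score, double mean_mod_score,
                                   double min_cap, double max_cap) {
  /* 1.0 means the frame sits at the midpoint of the distribution. */
  double score = frame_score / mean_mod_score;
  /* Apply the VBR lower and upper caps directly to the normalized score. */
  if (score < min_cap) score = min_cap;
  if (score > max_cap) score = max_cap;
  return score;
}
```

Keeping the midpoint as a plain argument reflects the stated intent of the refactoring: a corpus-wide midpoint could be passed in place of the clip mean without touching the clamp.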
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.
BUG=webm:1422
Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
Advise the compiler that the store is eventually going to a uint8_t
buffer. This helps avoid getting alignment hints which would cause the
memory access to fail.
Originally added as a workaround for clang:
https://bugs.llvm.org//show_bug.cgi?id=24421
Change-Id: Ie9854b777cfb2f4baaee66764f0e51dcb094d51e
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().
Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
Most existing first pass stats are stored in a form normalized to a
macro-block scale. However the error scores for intra / inter etc were
stored as frame level values but mainly used as MB level values.
This change fixes that. Normalized per MB values make comparisons
between different formats easier and in any case this is usually what is
wanted.
Any change in results should be limited to slight differences in rounding.
*** Change after patch 8 +2 requiring new approval.
Final pre-submit testing showed one 4K clip with above expected change.
Investigation showed this was due to a value used to test for ultra low intra
complexity in key frame detection. This was a per frame not per MB value but
also did not scale with frame size. Replacement with a small per MB value
(based on original per frame value and cif frame size) resolved the KF detection
problem.
Also converted kf_group_error_left to a double in line with other error values
to reduce rounding problems in KF group bit allocation
All clips and sets now show nominal (or 0) change as expected.
Change-Id: Ic2d57980398c99ade2b7380e3e6ca6b32186901f
This reverts commit 0d88e15454.
Reason for revert: chromium builds are failing to locate vpx_rv during dlopen()
dlopen failed: cannot locate symbol "vpx_rv" referenced by "libstandalonelibwebviewchromium.so"
Original change's description:
> Add visibility="protected" attribute for global variables referenced in asm files.
>
> During aosp builds with binutils-2.27, we're seeing linker error
> messages of this form:
> libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
> symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
> object
>
> subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
> Other messages refer to symbol references from deblock_sse2.o and
> subpixel_sse2.o, also assembled from asm files.
>
> This change marks such symbols as having "protected" visibility. This
> satisfies the linker as the symbols are not preemptible from outside
> the shared library now, which I think is the original intent anyway.
>
> Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
>
TBR=jzern@google.com,johannkoenig@google.com,rahulchaudhry@chromium.org,builds@webmproject.org
Change-Id: I0c2ea375aa7ef5fda15b9d9e23e654bb315c941b
This reverts commit 3704807805.
Reason for revert: Does not look to be the cause of the test failures.
Original change's description:
> Revert "vp8: Real-time mode: reduce mode_check_freq thresh for speed 10."
>
> This reverts commit 4a7424adba.
>
> Reason for revert: <INSERT REASONING HERE>
> Possibly causing test failures in roll into chromium.
>
> Original change's description:
> > vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
> >
> > Reduces quality regression at speed 10 for real-time mode.
> >
> > Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
> >
>
> TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
> # Not skipping CQ checks because original CL landed > 1 day ago.
>
> Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Change-Id: I13d86a2a68b8aa8c0c7465e6e58cff0e00bc7862
This reverts commit 4a7424adba.
Reason for revert: Possibly causing test failures in roll into chromium.
Original change's description:
> vp8: Real-time mode: reduce mode_check_freq thresh for speed 10.
>
> Reduces quality regression at speed 10 for real-time mode.
>
> Change-Id: I9f624bea9ca262dab32ce9de7d6d91175d6becc8
>
TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I1defcb74e78a5a3bd29b7d1b21a96a79fa26a457
Move the tran_low_t helper functions to a new file. Additional
load/store functions will be added here.
Change-Id: I52bf652c344c585ea2f3e1230886be93f5caefc3
During aosp builds with binutils-2.27, we're seeing linker error
messages of this form:
libvpx.a(subpixel_mmx.o): relocation R_386_GOTOFF against preemptible
symbol vp8_bilinear_filters_x86_8 cannot be used when making a shared
object
subpixel_mmx.o is assembled from "vp8/common/x86/subpixel_mmx.asm".
Other messages refer to symbol references from deblock_sse2.o and
subpixel_sse2.o, also assembled from asm files.
This change marks such symbols as having "protected" visibility. This
satisfies the linker as the symbols are not preemptible from outside
the shared library now, which I think is the original intent anyway.
Change-Id: I2817f7a5f43041533d65ebf41aefd63f8581a452
Increase the partition and acskip thresholds for temporal
enhancement layers.
~1-2% speedup, with negligible loss in quality.
Change-Id: Id527398a05855298ad9ddac10ada972482415627
For SVC 1 pass non-rd pickmode, the interpolation filter for the
upsampling of the golden (spatial) reference was not being explicitly
set and instead was taking whatever value was set in the previous
mode/block (which would be either EIGHTTAP or EIGHTTAP_SMOOTH).
Fix it to the default EIGHTTAP for now, to be updated/selected
adaptively in a later change.
Minor adjustment to rate targeting thresholds in datarate unittests.
Change-Id: I52085048674072c6cfb7163e11e9a2658d773826
A more detailed explanation of the experimentation
leading to this change can be found in:-
https://docs.google.com/a/google.com/document/d/13lsYhxgPyxUHvEess6wg9nikaonIZKY9Ak_Lpafv5Mo/edit?usp=sharing
This change gives gains across all our standard test sets for
overall psnr, ssim, fast ssim and psnr-HVS.
Values expressed as % reduction in bitrate.
Low res set -0.257, -0.192, -0.173, -0.101
Mid res set -0.233, -0.336, -0.367, -0.139
High res set -0.999, -1.039, -1.111, -0.567
NetFlix 2K set -0.734, -0.174, -0.389, -0.820
Netflix 4K set -0.814, -0.485, -0.796, -0.839
Change-Id: Ie981fb3c895c9dfcfc8682640d201a86375db5c8
Speed up for speed 0.
Reduce 10+% of encoding time for hdres in speed 0,
with less than 0.1% PSNR loss.
Compute total difference of previous and current frame context probability
model. If the diff is less than the threshold, skip recoding the frame.
Borg test (positive number means performance loss):
        lowres  midres  hdres
PSNR:   0.030   0.032   0.065
Local speed test: bitrate set at 1200
                blue_sky  pedestrian  rush_hour
Encoding time:  -10.0%    -16.5%      -16.5%
Change-Id: I4e2d200ea3115d48b2c3e890143596b31b8ef9e9
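The recode-skip test described above (total difference between the previous and current frame context probabilities, compared with a threshold) could look roughly like this. The flat probability table, the function name and the threshold are assumptions for illustration, not the encoder's data layout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: return 1 when the summed absolute difference between
 * two probability tables is below thresh, meaning a recode could be
 * skipped, and 0 otherwise. */
static int can_skip_recode(const uint8_t *prev, const uint8_t *cur, size_t n,
                           int thresh) {
  int diff = 0;
  for (size_t i = 0; i < n; ++i) diff += abs((int)prev[i] - (int)cur[i]);
  return diff < thresh;
}
```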
Commit 0178d97 introduced an append situation which could be
confusing. Clean it up a little and add some comments.
Change-Id: I69ad336f805aca7ce9d45515b8cd237423fadbb2
When the noise estimate is forced off due to large motion,
reset the counter and set smaller window for next estimate.
Change-Id: Ifa4ec95396134173a00d48353ad52f1b6a40c217
Add option in SVC to set the filter type and phase for
the frame level downsampling filters.
For 3 spatial layers: set downsampling filter type to bilinear
and set phase to 8, for lowest spatial layer.
Change-Id: Id81f4b1ba93db19c1cd37b6a46d1281a2c61bc43
Makes more sense to call the corresponding partial idct C function
instead of the full idct C function as the reference.
Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6
Base the condition on the resolution of the spatial layer.
And remove restriction on scaling factor.
Change-Id: Iad00177ce364279d85661654bff00ce7f48a672e
Read in a Q register. Works on blocks of 16 and larger.
Improvement of about 20% for 64x64. The smaller blocks are faster, but
don't have quite the same level of improvement. 16x32 is only about 5%
BUG=webm:1422
Change-Id: Ie11a877c7b839e66690a48117a46657b2ac82d4b
For lowest spatial layer, in 3 layer SVC, set the
downsampling filtertype to get averaging filter.
Needed for reducing aliasing on low-res layer,
small increase in overall encoder time.
Change-Id: Ia31460123bd91b72eca49b46dd924b9f226d4563
An intended behavior change disabling exhaustive searches in speed
feature causes VP9/DatarateTestVP9LargeDenoiser.4threads test failure.
Change the threshold to make it pass.
BUG=webm:1429
Change-Id: Ibcbe2314c6b2525799894f5d7204fc8eb4ec2a1e
Adjust thresholds for noise estimation, for resolutions above VGA.
Tends to push cleaner/low noise clips to LowLow state.
No change in RTC metrics.
Change-Id: I739ca6b797d0a60ccd1c6c6a2775269b1f007e5e
Set noise level to kLowLow for high motion low res clips.
Change the normalization in noise metric for low res.
Reduce the initial time-window for all resolutions.
Change-Id: Iaed39dbb50b205cd9c735dc5b84822304fb01987
Add support for everything except block sizes of 4.
Performance is better but numbers will improve again when the variance
optimizations land.
BUG=webm:1423
Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80
When a neon version is available it will be called. This allows
decoupling the variance implementations and has no real downside. For
most configurations, the call will be #define'd to the neon
implementation.
Change-Id: Ibb2afe4e156c5610e89488504d366b3e6d1ba712
When the width is equal to 8, process two rows at a time. This doubles
the speed of 8x4 and improves 8x8 by about 20%.
8x16 was using this technique already, but still improved a little bit
with the rewrite.
Also use this for vpx_get8x8var_neon
BUG=webm:1422
Change-Id: Id602909afcec683665536d11298b7387ac0a1207
Some of the mixed sizes were missing. They can be implemented trivially
using the existing helper function.
When comparing the previous 16x8 and 8x16 implementations, the helper
function is about 10% faster than the 16x8 version. The 8x16 is very
close, but the existing version appears to be faster.
BUG=webm:1422
Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004
Add 31bit pairs before unpacking in x86 block error code
AVX2 code provides a very minor performance improvement.
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
For SVC 1 pass real-time: add condition to skip the
golden (spatial) reference mode in non-rd pickmode.
Condition is to skip golden if the sse of zeromv-last mode
is below threshold. And change order in ref_mode_set_svc
to make sure golden zeromv is tested after last-nearest.
Speedup ~3-4% with little/negligible quality loss.
Change-Id: I6cbe314a93210454ba2997945f714015f1b2fca3
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d22
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
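Approximating a division with a multiply and a shift, as mentioned above, replaces x / d with (x * m) >> s for a precomputed m. The sketch below uses a divisor of 3 purely as an example; the divisor, constant and function name are assumptions and are not taken from the filter code.

```c
#include <stdint.h>

/* Hypothetical helper: approximate x / 3 as (x * m) >> 16 with
 * m = ceil(65536 / 3) = 21846. The result equals x / 3 for
 * 0 <= x < 32768; larger inputs can be off by one. */
static uint32_t div3_approx(uint32_t x) {
  return (x * 21846u) >> 16;
}
```

The multiplier overshoots 65536 / 3 slightly, so the error grows with x; that is why the exact range has to be stated alongside the constant.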
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row based adaptive rd is made usable in CL
454882(https://chromium-review.googlesource.com/c/454882) for REALTIME.
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
Add tentative max cpb size values for levels 5.2 and up. Otherwise
encoding will fail when targeting for these levels.
Change-Id: Ib7e0ba4b9836ea1ac900b6822543812843d48463
107de19698 changes the encoder alt-ref selection behavior. Assuming
min_gf_interval = max_gf_interval = 4, the frame order would be
frm_1 arf_1 frm_2 frm_3 frm_4 frm_5 arf_2 before 107de19698;
frm_1 arf_1 frm_2 frm_3 frm_4 arf_2 frm_5 after 107de19698.
This patch reverts such alt-ref placement change.
Change-Id: I93a4a65036575151286f004d455d4fcea88a1550
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.
Gain some 2-3% speed with little/negligible quality loss.
Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain (at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses (7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't make
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).
QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.
Encoding results are bitexact for speed >= 5.
Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
Re-enable the SVC tests, wrap the non-zero expectation
in GetMismatchFrames around #if CONFIG_VP9_DECODER.
Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
The allow_exhaustive_searches feature improves the encoding quality
of FC_GRAPHICS_ANIMATION content a lot. For non-FC_GRAPHICS_ANIMATION
content, the quality test result is almost neutral. This patch makes
this feature to be used only for FC_GRAPHICS_ANIMATION content.
The motivation of doing that is to make this feature no longer adaptive,
which will be implemented in the following patch.
Change-Id: Ic911df6dd757402b6480789cc247801e99840369
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
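The offset rule above can be illustrated with a small sketch. The macros here are local stand-ins assumed to be plain reinterpreting casts; they are not copied from the libvpx headers, and advance_samples is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-ins for illustration; assumed to be plain casts. */
#define TO_SHORTPTR(x) ((uint16_t *)(void *)(x))
#define TO_BYTEPTR(x) ((uint8_t *)(void *)(x))

/* Advance a byte-pointer alias of a 16-bit buffer by `offset` samples:
 * cast to a short ptr, add the offset, then cast back to a byte ptr. */
static uint8_t *advance_samples(uint8_t *buf8, int offset) {
  return TO_BYTEPTR(TO_SHORTPTR(buf8) + offset);
}
```

Adding the offset on the short pointer moves 2 * offset bytes, which is the doubling the rule asks for when working on the byte pointer directly.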
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.
Keep the phase to 0/off for now.
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
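To illustrate the phase-shift note above, here is a simplified sketch of 1:2 downscaling in one dimension. It uses a 2-tap average rather than the real scaling filter, and downsample2_sketch with its phase_half flag is a hypothetical name for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* 1:2 horizontal downscale of one row.
 * phase_half == 0: zero phase, output i is simply src[2*i] (sub-sampling).
 * phase_half != 0: half-sample phase, output i is the rounded average of
 *                  src[2*i] and src[2*i+1], which reduces aliasing. */
void downsample2_sketch(const uint8_t *src, int src_len, uint8_t *dst,
                        int phase_half) {
  assert(src_len >= 2 && src_len % 2 == 0);
  for (int i = 0; i < src_len / 2; ++i) {
    const int a = src[2 * i];
    const int b = src[2 * i + 1];
    dst[i] = (uint8_t)(phase_half ? (a + b + 1) >> 1 : a);
  }
}
```

With zero phase every second input sample is dropped entirely, which is why the averaging variant is less prone to aliasing.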
Disable the 1 pass CBR SVC tests with temporal_layers > 1.
Issue with the commit 863f860, which will cause encoder/decoder
mismatch due to skipping encoder loopfilter for non-reference frames.
Will re-enable the tests when fixed.
Change-Id: I74918a0045a17976b069c4be63fbeb921974df0d
This condition is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.
Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
Modify the frame flags to update the ARF on top layer,
for the tests:
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayersFrameDropping
This is needed to fix the encode/decoder mismatches caused by 863f860,
and removed in the revert e9b7f98.
Change-Id: I6b9fecfdd17315fc0179e29949338c77636026c0
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
This reverts commit 863f860bfc.
This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
BUG=webm:1408
Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
For low resolutions (<= CIF): use quarter-pixel and simple_block_yrd.
~5% gain on RTC_derf.
~6-7% slowdown on ARM.
Change-Id: I4439ebd1116b9decac04786503f978840b68a60c
Fix to avoid getting stuck at very low Q even
though content is changing, which can happen for --min-q=0.
Fix is to more aggressively increase active_worst_quality
when detecting significant rate_deviation at very low Q.
Change will only affect 1 pass VBR for --min-q < 4, so no
change in ytlive metrics for --min-q >= 4.
Change-Id: I4dd77dd7c08a30a4390da0ff2c8bda6fccfa76d7
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.
~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.
Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.
Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
The MV unit test revealed an integer overflow issue in vp9_mcomp.c.
This was caused if the MV was very large. In mv_err_cost(), when
mv->row = 8184, mv->col = 8184 and ref_mv is 0, mv_cost = 34363
and error_per_bit = 132412, causing the overflow.
BUG=webm:1406
Change-Id: I35f8299f22f9bee39cd9153d7b00d0993838845e
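The overflow in the mv_err_cost() note above can be reproduced with plain arithmetic: 34363 * 132412 = 4550073556, which exceeds INT32_MAX. A minimal sketch of the widening approach follows; mv_err_cost_sketch and its shift parameter are hypothetical and do not reproduce the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in 64 bits, then round and scale down by `shift` bits.
 * Widening one operand before the multiply keeps the product exact. */
int64_t mv_err_cost_sketch(int mv_cost, int error_per_bit, int shift) {
  assert(shift > 0 && shift < 32);
  const int64_t product = (int64_t)mv_cost * error_per_bit;
  return (product + ((int64_t)1 << (shift - 1))) >> shift;
}
```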
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
If the noise estimation is avoided due to large motion,
the last_source for denoising should still be updated.
Change-Id: I67155ea7dbe9ac2785978e64a27bdafd7d57aac0
To reduce refresh on partial super-blocks on boundary,
for noisy input. Reduces some artifacts on noisy input.
Change-Id: I10b5808a296874e08c7f378b3df58466591d8dbe
Move the condition for effectively disabling the denoising
for speed 5 into the vp9_denoiser_denoise().
This is cleaner, and also moving the condition into vp9_denoiser_denoise
will keep the denoiser buffer updated with the current source.
This allows for more consistent behavior if speed is changed midstream.
Change-Id: Ia001f591c56e454bf724c3ae73c024badb183ef8
To prevent the motion vector out of range bug, added a motion vector unit
test in VP9. In the 4k video encoding, always forced to use extreme motion
vectors and also encouraged to use INTER modes. In the decoding, checked if
the motion vector was valid, and also checked the encoder/decoder mismatch.
The tests showed that this unit test could reveal the issue we saw before.
Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4
For 8-bit the subtrahend is small enough to fit into uint32_t.
This is the same that was done for:
c0241664a Resolve -Wshorten-64-to-32 in variance.
For 10/12-bit apply:
63a37d16f Prevent negative variance
Change-Id: Iab35e3f3f269035e17c711bd6cc01272c3137e1d
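A hedged sketch of the two variance forms referred to above. Both function names are hypothetical. The 8-bit form relies on the subtrahend fitting in uint32_t; the second form clamps at zero, assuming that rounding in the 10/12-bit path can make the subtrahend exceed sse.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit: sum*sum is computed in 64 bits, and the quotient is assumed small
 * enough to narrow to uint32_t before subtracting. */
uint32_t variance8_sketch(uint32_t sse, int sum, int count) {
  assert(count > 0);
  return sse - (uint32_t)(((int64_t)sum * sum) / count);
}

/* 10/12-bit: do the subtraction in 64 bits and never return a negative
 * value wrapped around to a huge unsigned one. */
uint32_t variance_highbd_sketch(uint32_t sse, int64_t sum, int count) {
  assert(count > 0);
  const int64_t var = (int64_t)sse - (sum * sum) / count;
  return var >= 0 ? (uint32_t)var : 0;
}
```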
Temporal denoiser runs in non-rd pickmode, so it is only used
for speed >= 5. Regression exists for speed 5, due to use of
reference_partition (which use non-rd pickmode for partitioning).
Avoid denoising for now at speed 5.
Change-Id: I74a74d2e1404d7cfd33dcf4ec06dd2e503256cf0
Base the low_content_frame metric on the motion vectors,
and adjust the logic for preventing golden update.
Small change in behavior: small positive gain (~0.2-1%) on clips
with high activity.
Change-Id: I0b861c8e9666cd82b45cde5ee57ee8a1e5ab453c
Code cleanup: merged two functions that were doing postencode
update for cyclic refresh, remove some unused code and fix comments.
No change in behavior.
Change-Id: I9be0d7e346d34dec29bf4e5bb380a7bf81c8480a
BUG=webm:1397
(yunqingwang)
To verify that this patch wouldn't cause much performance change,
the Borg tests were run. Here was the result:
          avg_psnr  overall_psnr  ssim
hdres:    -0.002    0.006         0.013
midres:   0         0             0
lowres:   0         0             0
Change-Id: Iae395ae7b741e0513cf5bab9dcace110b792a67d
The row mt sync read uses sync_range = 1, and wouldn't work if we want
to use a sync_range that is greater than 1. To make it work, this sync
read code is modified. Pass in col instead of col - 1 to make it
consistent with other row mt code in VP9, and then add 1 in "while"
condition.
Change-Id: I4a0e487190ac5d47b8216368da12d80fec779c1a
Issue/bug happens for denoising with spatial layers, where
the golden (spatial) reference is used in pickmode, but
denoising is only done wrt last (temporal).
Fix is to make sure set_ref_ptrs is set before build predictors
in denoiser.
Change-Id: I793cf441341edf7c4a88b8ab1e1b22b3cb0eb508
Temporary override to condition for disallowing intra-search in SVC,
since golden (spatial) reference is currently suppressed due to
artifact issue.
Change-Id: I28ed7fdddc9fcdbcc0a4175a247a3ecc94c11767
For non-rd variance partition, avoid the chroma check
unless y_sad is below some threshold.
Small decrease in avgPSNR (~0.3) on RTC set.
Small/negligible decrease on RTC_derf.
Change-Id: I7af44235af514058ccf9a4f10bb737da9d720866
Refactor to split the 1 pass source sad computation into scene
detection (currently used for VBR and screen-content mode), and
superblock based source sad computation (used in non-rd CBR mode).
This allows the source sad computation for CBR mode to be
multi-threaded.
No change in compression.
Change-Id: I112f2918613ccbd37c1771d852606d3af18c1388
d207/d63/d45/d117/d135/d153
~9-45% better depending on the predictor on 32-bit ARM, similar range on
x86-64
this matches the non-highbitdepth implementation
BUG=webm:1316
Change-Id: Iddebdf7c58c6f31c47cae04da95c6e5318200e4c
Make the source_sad feature work properly for cases of VBR or
screen_content with SVC.
Added unittest for SVC with screen-content on.
Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b
tolerate a NULL hist being passed as a result of invalid parameters
passed to init_rate_histogram(). this fixes a divide by zero in
init_rate_histogram() with an invalid fps.
BUG=webm:1383
Change-Id: Id203e0f3b18d67a4a09aaf206abcce4708f966ec
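The NULL-tolerant pattern described above might look like the following sketch. Every name here (hist_sketch, hist_init_sketch, and so on) is hypothetical; only the idea of returning NULL for invalid parameters and having callers tolerate it comes from the note.

```c
#include <assert.h>
#include <stdlib.h>

struct hist_sketch {
  int samples; /* number of buckets in the window */
  int count;   /* values recorded so far */
  int *buckets;
};

/* Returns NULL instead of dividing by zero when the frame rate is invalid. */
struct hist_sketch *hist_init_sketch(int fps_num, int fps_den, int window_ms) {
  if (fps_num <= 0 || fps_den <= 0 || window_ms <= 0) return NULL;
  struct hist_sketch *h = calloc(1, sizeof(*h));
  if (!h) return NULL;
  h->samples = (int)((long long)window_ms * fps_num / fps_den / 1000);
  if (h->samples == 0) h->samples = 1;
  h->buckets = calloc((size_t)h->samples, sizeof(*h->buckets));
  if (!h->buckets) {
    free(h);
    return NULL;
  }
  return h;
}

/* Consumers accept NULL and do nothing. */
void hist_update_sketch(struct hist_sketch *h, int value) {
  if (!h) return;
  assert(h->samples > 0);
  h->buckets[h->count++ % h->samples] = value;
}

void hist_destroy_sketch(struct hist_sketch *h) {
  if (!h) return;
  free(h->buckets);
  free(h);
}
```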
Since y_sad is not computed yet (on the early exit due to source_sad),
no need to check for setting color_sensitivity.
Only affects speed >=8. No change in behavior.
Change-Id: I3a6f2d20fed38d8b8ec51b75bcacf9a21f2db916
Allow for simple_block_rd for VGA resoln, and reduce
adaptive_rd_thresh to 1.
On average no loss on RTC set, ~4% speedup on mac.
Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc
Change tests to reflect use. Input sizes will be 8 or 16 (but not
necessarily square).
filter_weight is capped at 2 and filter_strength at 6
Speed test, disabled by default.
Change-Id: Idfde9d6c4b7d93aaf0e641b0f4862c15e2a2af7a
Change it to a row-based array to avoid the slowdown caused by sync.
row-mt on, speed 8, 2 threads: ~4% speedup for VGA on ARM benefited
from adaptive_rd_threshold.
Change-Id: I887e65a53af20a6c4f48d293daaee09dab3512cf
- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
operations.
Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3
Add additional condition to split to 16x16, for resolutions <= 360p,
reduces dragging artifact near moving boundary.
Small/no change on RTC metrics.
Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae
For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance), to turn off GF search in non-rd pickmode.
Only enabled for speed >= 8.
avgPSNR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.
Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7
The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.
After the sum of eobs is calculated correctly, re-enabled the
ml_partition_search_early_termination speed feature.
Re-did the quality/speed test to check the impact of the fix.
1. Borg test BDRATE result:
4k set: PSNR: +0.183%; SSIM: +0.100%;
hdres set: PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;
2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.
The result is in line with the original result.
Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda
Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.
Measured a 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.
No change in behavior.
TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225
Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6
Don't denoise spatial layer frames whose base layer is a key frame.
Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes bad artifact.
Will re-enable when issue is resolved.
Change-Id: I87a6597812330500966458172acfce54af65f70f
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.
Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
Similar issue as Change bc1c18e.
The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.
Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.
Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
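For readers unfamiliar with saturating arithmetic, the following scalar sketch shows the per-lane behavior that a saturating 16-bit add (such as the NEON vqaddq_s16 intrinsic) provides in place of wraparound. The function name is hypothetical and the intrinsics actually used by the patch are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Element-wise saturating add: results are clamped to the int16_t range
 * instead of wrapping on overflow. */
void sat_add16_lanes_sketch(const int16_t *a, const int16_t *b, int16_t *out,
                            int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    const int32_t s = (int32_t)a[i] + b[i];
    if (s > INT16_MAX) {
      out[i] = INT16_MAX;
    } else if (s < INT16_MIN) {
      out[i] = INT16_MIN;
    } else {
      out[i] = (int16_t)s;
    }
  }
}
```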
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.
Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.
The patch was rebased to the top-of-tree.
Borg test BDRATE result:
4k set: PSNR: +0.144%; SSIM: +0.043%;
hdres set: PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;
Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.
Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
Fixes an issue when LAST and golden are not used as references,
in which case it's possible no encoding mode is set (since intra may be
skipped under certain conditions). Fix is to make sure intra is searched
if no inter mode is checked.
Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c
Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.
since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance
BUG=webm:1384
Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.
https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code
BUG=webm:1054
Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
Enable row-mt for SVC for real-time mode, speed >=5.
Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.
For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives ~5% speedup.
Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.
Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes
BUG=chromium:697956
Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
The 2 thresholds(i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch makes this
feature frame-size dependent consistently throughout the code.
Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
Most are cosmetic changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4
Tried the strategy used in 8x8 and 16x16 (whose order of operations is
similar to the C code); though speed gets better with gcc, it's worse
with clang.
Tried to remove store_in_output(), but speed gets worse.
Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST
Add VSX flags and check for -mvsx
Define empty setup_rtcd_internal
Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux
Detect VSX at runtime when enabled
Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.
BUG=webm:1301
Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
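The eob-based dispatch mentioned above can be sketched as a selector. The 34, 135 and 1024 variants are named elsewhere in this log; the DC-only case for eob == 1 and all identifiers below are assumptions made for this example.

```c
#include <assert.h>

typedef enum {
  IDCT32_1_SKETCH,    /* assumed DC-only path */
  IDCT32_34_SKETCH,
  IDCT32_135_SKETCH,
  IDCT32_1024_SKETCH  /* full transform */
} idct32_variant_sketch;

/* Pick the cheapest 32x32 inverse transform that covers all nonzero
 * coefficients, given the end-of-block position. */
idct32_variant_sketch select_idct32_sketch(int eob) {
  assert(eob >= 1 && eob <= 1024);
  if (eob == 1) return IDCT32_1_SKETCH;
  if (eob <= 34) return IDCT32_34_SKETCH;
  if (eob <= 135) return IDCT32_135_SKETCH;
  return IDCT32_1024_SKETCH;
}
```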
Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.
No change in quality.
Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
This may fix the timeout failure of valgrind tests in nightly
since more coverage was added on row-mt.
Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc
clear the entire array on error. the size used previously was equal to
the number of elements.
BUG=webm:1364
Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46
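The size mix-up described above is the classic memset pitfall: passing an element count where a byte count is expected clears only part of the array. A minimal sketch, with a hypothetical function name and element type:

```c
#include <assert.h>
#include <string.h>

/* Clear every element. memset(counts, 0, num_elements) would clear only
 * num_elements bytes, i.e. a fraction of the array when elements are wider
 * than one byte. */
void clear_counts_sketch(int *counts, size_t num_elements) {
  assert(counts != NULL);
  memset(counts, 0, num_elements * sizeof(*counts));
}
```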
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decrease in speed (~1-2% on mac).
Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.
Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value
is changed as part of this move.
Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH.
Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419
This reverts commit d3db846cc5.
This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)
BUG=b/35804225
Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
From commit:
https://chromium-review.googlesource.com/c/441393/
On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.
Small change in RTC metrics and speed.
Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level parallelism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.
Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
Re-organized the encoder threading tests and grouped tests into
4 parts. Added PSNR checking test to make sure the PSNR variation
is within a small range.
BUG=webm:1376
Change-Id: I09edb990236a87a4d2b2b0e1ceaf6c6435a35eff
vp9_highbd_block_error_8bit_c was a very simple wrapper around
vp9_block_error_c. The SSE2 implementation was practically identical to
the non-HBD one. It was missing some minor improvements which only
went into the original version.
In quick speed tests, the AVX implementation showed minimal
improvement over SSE2 when it does not detect overflow. However, when
overflow is detected the function is run a second time. The
OperationCheck test seems to trigger this case and reverses any
speed benefits by running ~60% slower. AVX2 on the other hand is
always 30-40% faster.
Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
Only works for bitdepth = 8 when compiled with high bitdepth flag.
4x speed ups for handling 1:2 down/upsampling.
Validated manually for:
1) Dynamic resize for a single layer encoding
2) SVC encoding with 3 spatial layers
Results are bitexact with the patch and the speed gain (~4x) in the
scaling was verified.
BUG=webm:1371
Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
The reduction showed improvement on RTC when aq-mode=3 is on.
Add that (cyclic refresh enabled) to the condition.
Only affects 1 pass CBR.
Change-Id: I5d0843002d8e31d7c165098a62e7a71146b08664
For speed 8 only.
3% speed up for QVGA and 6.3% for VGA on Nexus 6.
~3% avgPSNR decrease on rtc_derf and 2.9% on rtc.
Disabled for now.
Change-Id: I70133f1f6c804d663d594df437bfe7fdb0030d6a
The output needs to be aligned. Input is read with 'movq' not 'movdqa'
so it is not expected to be aligned.
Change-Id: Ibd48a84c1785917a6a97c3689a05322abba486b4
Increase the variance partition thresholds for superblocks that
have low sum-diff (from source analysis prior to encoding frame).
Use it for now only for speed >= 7 or for denoising on.
Small change on metrics for rtc set: less than ~0.1 avgPSNR decrease
on RTC set, for both speed 7 and 8.
Change-Id: I38325046ebd5f371f51d6e91233d68ff73561af1
Use the simple block_yrd under certain conditions.
The optimization code is completed but the speed is still slower
(~6% on 720p) than the low-bitdepth build.
For now, use the more complex block_yrd under certain conditions
(always use it for speed <= 5, otherwise use it on key frames and for
bsize >= 32x32).
This gives about ~2-3% gain in quality for speed 7 on RTC set
(over high bitdepth build), with about the same encoder fps as the
low bitdepth build.
Change-Id: Ibe92a1945d0bd635f880befb4c815727df62d754
Modified the code to facilitate bit-match tests in first pass
Added unit-tests to test the row based multi-threading behavior for bit-exactness
Change-Id: Ieaf6a8f935bb1075597e0a3b52d9989c8546d7df
This change subtracts out low complexity intra regions that are also low
error in the inter domain, in the calculation of the frame prediction decay.
The rationale here is that low complexity regions (such as sky) do not imply
high prediction decay in the same way as high error intra or neutral blocks.
The effect of this is small in most clips but in a few clips it can be > 10%.
(E.g. In to tree)
Change-Id: If67ac23d17fca14285cad2defa464c61c9ea861c
vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use
of these parameters.
scan is used for C code and iscan is used for SIMD implementations.
Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5
* changes:
quantize_fp_32x32 highbd ssse3: enable existing function
quantize_fp highbd ssse3: use tran_low_t for coeff
quantize_fp highbd sse2: use tran_low_t for coeff
- Replace the corresponding assembly code.
- No user-level speed degradation.
- Unit tests passed.
Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
This was created as part of the quantize_fp_ssse3 change. Both
functions use the same source file with different macro parameters.
Change-Id: I267050a559426a85955d215aa0aaca270439c5ab
The previous implementation confused bit/bytes/elements. It was using
'32' as the multiplier but that was mistakenly adopted because a 32x32
transform embedded the stride.
Change-Id: Ieeb867a332416b9a40580b5e7c9b20088e9e691a
The weight segment needs to only be computed once per frame,
so remove it from the function vp9_cyclic_refresh_rc_bits_per_mb(),
which is called within a loop inside vp9_rc_regulate_q.
Change-Id: Ia0e18b89abb97e42c466d4dbc47700d7f76555db
vp9_compute_qdelta_by_rate has almost 2% overhead in profiling on Nexus 6.
Reduce the calling of that function in speed 8 by estimating the delta-q.
Both rtc and rtc_derf show little/no change in avg psnr/ssim.
Encoding speed is 2~3% faster on Nexus 6.
Change-Id: If25933715783f31104a18a5092ea347b1221b5f5
This small change replaces the frame boost check in the arf group
length break out clause with a test against a prediction decay value.
The boost value is in fact partly dependent on the decay value but
this change means that the per frame boost calculation can be adjusted
without influencing the group length calculation.
The value chosen gives a close match on all the test sets with the previous
code (on average) but it was noted that a lower threshold was slightly better
for 1080P and up and a slightly higher value for small image sizes.
Change-Id: I4d5b9f67d5b17b0d99ea3f796d3d6202fd61ee0c
The function scale_sse_threshold() returns a threshold scaled
if necessary for use with 10 and 12 bit from an 8 bit baseline.
SSE error values would be expected to rise for the 10 and 12
bit cases where there are more bits of precision.
Hence the threshold used for the test should also be scaled up.
Change-Id: I4009c98b6eecd1bf64c3c38aaa56598e0136b03d
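A sketch of the threshold scaling described above, under a stated assumption: since SSE grows with the square of the sample range, each 2 extra bits of depth is taken to multiply the threshold by 16 (a left shift of 4). The shift amounts and the function name are assumptions for illustration, not values read from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Scale an 8-bit-baseline SSE threshold for 10- or 12-bit input.
 * Assumed scaling: shift left by 2 * (bit_depth - 8). */
int64_t scale_sse_threshold_sketch(int bit_depth, int64_t thresh8) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  assert(thresh8 >= 0);
  return thresh8 << (2 * (bit_depth - 8));
}
```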
When eob is less than or equal to 38 for high-bitdepth 16x16 idct,
call this function.
BUG=webm:1301
Change-Id: I09167f89d29c401f9c36710b0fd2d02644052060
(Yunqing Wang)
This patch implements the row-based multi-threading within tiles in
the encoding pass, and substantially speeds up the multi-threaded
encoder in VP9.
Speed tests at speed 1 on the STDHD set (using 4 tiles) show that the
average speedup of the encoding pass (second pass in the 2-pass
encoding) is 7% while using 2 threads, 16% while using 4 threads,
85% while using 8 threads, and 116% while using 16 threads.
Change-Id: I12e41dbc171951958af9e6d098efd6e2c82827de
This matches bitdepth_conversion_sse2.asm and produces substantially
better assembly. The old way had lots of 'movzwl' and 'shl' and storing
back to memory before loading into an xmm register.
Change-Id: Ib33e35354dfd691a4f8b1e39f4dbcbb14cd5302b
Clears up static clang analysis warning regarding divide by zero.
Trying to explain to the compiler how it's impossible to avoid
incrementing num_blocks at least once is difficult.
Change-Id: Ibaae43be572e5cd7a689b440dcd341c17d33443b
Where clang static analysis or gcc -Wmaybe-uninitialized warns of
uninitialized values, assign 0 to ints, MB_MODE_COUNT to
MB_PREDICTION_MODE, and B_MODE_COUNT to B_PREDICTION_MODE.
Assert that the modes have been changed from the invalid value by
the end of the function.
Change-Id: Ib11e1ffb08f0a6fe4b6c6729dc93b83b1c4b6350
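The sentinel-and-assert pattern from the note above, as a minimal sketch. The enum and function are hypothetical stand-ins; only the idea of initializing to the *_COUNT value and asserting that it was replaced comes from the commit.

```c
#include <assert.h>

typedef enum {
  DC_PRED_SKETCH,
  V_PRED_SKETCH,
  H_PRED_SKETCH,
  MODE_COUNT_SKETCH /* one past the last valid mode: used as "invalid" */
} mode_sketch;

/* Return the mode with the lowest cost. `best` starts at the invalid
 * sentinel so the compiler sees an initialized value, and the assert
 * documents that a real mode must have been chosen before returning. */
mode_sketch pick_mode_sketch(const int cost[MODE_COUNT_SKETCH]) {
  mode_sketch best = MODE_COUNT_SKETCH;
  int best_cost = 0;
  for (int m = 0; m < MODE_COUNT_SKETCH; ++m) {
    if (best == MODE_COUNT_SKETCH || cost[m] < best_cost) {
      best = (mode_sketch)m;
      best_cost = cost[m];
    }
  }
  assert(best != MODE_COUNT_SKETCH);
  return best;
}
```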
While the new-mt mode is enabled (namely, allowing the use of row-based
multi-threading in the encoder), several speed features that adaptively
adjust encoding parameters during encoding would cause mismatch
between single-thread encoded bitstream and multi-thread encoded
bitstream. This patch provides a set_control API to disable these
features, so that the bit match bitstream is obtained in the unit
test.
Change-Id: Ie9868bafdfe196296d1dd29e0dca517f6a9a4d60
broken since:
c3f095c8b Merge "Fix to avoid abrupt relaxation of max qindex in recode path"
5f21aba4b Fix to avoid abrupt relaxation of max qindex in recode path
the original change pre-dated the addition of .clang-format
Change-Id: If5e399d9a805bcad9147360b13b36fbc8c560a7c
VBR method that allows a wider Q range for the first normal frame
in each ARF group and then centers the min - max range for the rest of
the arf group on the chosen Q value for that first frame.
This allows for quite rapid adjustment of the active Q range even if the
initial estimate is poor.
In some cases where the ARF frames themselves are tending to
undershoot but the normal frames are overshooting this can still give
net undershoot. This can be corrected by allowing a larger Q delta for
arf frames but is usually a sign that the allocation to the arfs was too
high.
Change-Id: Icec87758925d8f7aeb2dca29aac0ff9496237469
Temporary fix until optimization work for block_yrd is completed.
This essentially reverts back to the state before the change:
https://chromium-review.googlesource.com/c/433821/
Compression loss is about ~5-6% on RTC set.
Speed-up (from using this simple/model-based block_yrd) over the low
bitdepth builds (which uses more complex block_yrd) is ~5% on 720p.
Change-Id: Ie0af9eb0d111e5595f587870c44f08317403b8d8
this prevents a rollover when tv_sec is a long:
signed integer overflow: 2776 * 1000000 cannot be represented in type
'long'
Change-Id: I03dc4476ee122b02e2856dad28358a20cf16a9f8
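The rollover quoted above (2776 * 1000000 in a 32-bit long) is avoided by widening before the multiply. A minimal sketch with a hypothetical helper that takes the seconds and microseconds fields directly:

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed microseconds between two (sec, usec) timestamps. The seconds
 * difference is converted to int64_t first, so the multiply by 1000000
 * happens in 64 bits even where long is 32 bits wide. */
int64_t elapsed_usec_sketch(long start_sec, long start_usec, long end_sec,
                            long end_usec) {
  assert(start_usec >= 0 && start_usec < 1000000);
  assert(end_usec >= 0 && end_usec < 1000000);
  return ((int64_t)end_sec - start_sec) * 1000000 + (end_usec - start_usec);
}
```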
The RunQuantCheck() test on it exposes 16-bit overflow in stage 7 of
pass 2. Change to use saturating add/sub for both
vpx_idct16x16_38_add_neon() and vpx_idct16x16_256_add_neon() for high
bitdepth.
Change-Id: Ibf4c107a887553a52852cc582e28d38a5a5a2712
Add factor to increase variance partition and ac skip thresholds,
under certain conditions (noise level and sum_diff), to increase
denoiser speed.
Change-Id: I7671140ef3598bf5f114a72623d68792bcd7b77b
Threshold for partitioning only affects VGA and lower res.
0.07% quality regression is observed in borg tests on rtc_derf
and 0.2% regression on rtc.
5.6% speed up for low res and 6.8% for VGA on Nexus 6.
Change-Id: If85a2919b48c991de66059c90f32ed06980452be
Fixed the following issue.
..\test\vp9_ethread_test.cc(69): warning C4805: '|=' : unsafe mix of type 'bool' and type 'int' in operation [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
..\test\vp9_ethread_test.cc(69): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning) [C:\src\buildbot\test-libvpx\tests\dveCPjwhBE\.build-x86_64-win64-vs10\test_libvpx.vcxproj]
Change-Id: I37f897cf12a0b7500d2fcbac9e4615f08a83fdb4
Modified the encoding stage to have row level entry points with relevant
initializations and to access the token information at row level
Change-Id: Ife10e55a7c1a420ee906d711caf75002688d9e39
* changes:
hadamard highbd ssse3: use tran_low_t for coeff
hadamard highbd neon: use tran_low_t for coeff
hadamard highbd sse2: use tran_low_t for coeff
Clears up static clang analysis warning regarding a dead store. Only
declare 'c' when it will be used.
Change-Id: I1ac0fc7f94bc44da63938c63cd1efcd6b95e0eb3
This commit resolves the compression performance regression in
real-time encoding setting when high bit-depth mode is enabled.
The current solution temporarily disables the SIMD implementations
of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
The commit makes the coding results bit-wise identical between
regular coding pipeline and high bit-depth at profile 0.
BUG=webm:1365
Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
Clears a clang static analyzer warning where 'cols' is assumed to be
less than 0, preventing the for loop from executing.
The assembly already requires that the size be 8 or 16 (U/V or Y plane)
and cols is a multiple of 8.
Change-Id: Ica4612690ead1638c94cfe56b306e87f8ce644f9
In non-rd pickmode: Allow speed 7 to also use larger block size in
model_rd. Small change in behavior for speed 7.
Change-Id: I8c5523e424308e8f0bc71b3f6324dec42a464cc8
In non-rd pickmode: small change in behavior for speed 6 and 7.
Remove condition on HIGHBITDEPTH flag.
Change-Id: I360a13fcc313d72612fe9b918162ef4bb278cdea
Add Buffer features for:
Setting the buffer to the output of an ACMRandom function.
Copying a buffer.
Comparing two buffers.
Printing two buffers.
Change-Id: Ib53fb602451a3abdcee279ea2b65b51fbc02d3df
Skip denoising for blocks < 16x16, and for block = 16x16
skip denoising for low noise levels and width > 480 for now.
Allow for some speed-up in denoiser.
Change-Id: Ib46cefe4741962d145fa08775defea3a9c928567
(yunqingwang)
1. Rebased the patch. Incorporated recent first pass changes.
2. Turned on the first pass unit test.
Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee
Increase the qp-delta, mainly for low resolutions,
excluding case of very low bitrates.
avgPSNR/SSIM gain of ~3-5% on rtc_derf set.
Small change on rtc set.
Change-Id: Ice03d04bd0340404d1957666ef154fd64fed0606
Affecting only speed 8.
Speed tests on Nexus 6 show 4% faster for QVGA and 2.4% faster for VGA.
Little/negligible quality regression observed on both rtc and rtc_derf sets.
Change-Id: I337f301a2db49a568d18ba7623160f7678399ae1
(Yunqing)
This patch added the missing initialization in temporal filter.
Borg test BDRate results:
PSNR: -0.019%(lowres); -0.013%(hdres);
SSIM: -0.001%(lowres); -0.010%(hdres).
Other q values gave comparable but no better results.
Change-Id: I7ad0c18b39e6f558342688e2fe1e12fdb133ce9b
This currently runs 1000 * 1000 = one *million* times which is quite
unnecessary. It's one of the slowest items in Jenkins and takes over an
hour for each of the larger transforms.
Change-Id: I01653b5e610683e1a2d778ec60cf5065562ab8db
Only for speed >= 7, and affects skipping of intra modes.
Threshold is set low for now, needs to be tuned.
Small/no difference in metrics on rtc clips.
Change-Id: If9bdbd43f08d1f80407cdd2e9e5e96780dcd2424
Added the multi-threaded first pass encoder unit test in VP9. The test is
to check if the new multi-threaded first pass encoder(namely, new-mt = 1)
still generates matching stats. In the unit test, the new-mt mode will be
turned on once the multi-threaded first pass implementation is checked in.
Change-Id: Ic21bb1a55c454f024cfd2b397a4c148cfe638218
For short_circuit set to level 1, skip newmv for 64x64 blocks if the
low temporal variance flag is set. Also modify threshold for 64x64 split
in variance partitioning.
Overall speed-up on noisy clips of 2-4%.
Only affects speed >= 7.
Change-Id: I384b3772007e84de6f8707e480d2ddf1fe1f907d
Avoid quality loss when copying partition of superblock with large motions.
Maximum consecutively copied frames can be set (currently 5).
Change-Id: I11c30575514f02194c0f001444cf4021609e5049
Also set the flag to 1 when exiting early by choosing a 64x64 block,
so that skipping new mv for golden works in these scenarios.
Change the size of prev_segment_id to the number of superblocks
to save memory.
Borg test shows quality regression of 0.012% on average PSNR
and 0.035% on SSIM.
Change-Id: I5014224c8617d439d35c66ece3fed9ae30b31d23
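To give a sense of the array size after the prev_segment_id change above, here is a minimal sketch that counts superblocks in a frame. It assumes the 64x64 VP9 superblock size and one entry per superblock; the function name is hypothetical and the rounding-up of partial superblocks at the frame edge is an assumption about how the count is taken.

```python
SB_SIZE = 64  # VP9 superblock dimension in pixels


def superblock_count(width, height):
    """Number of 64x64 superblocks covering a width x height frame."""
    # Round up so partial superblocks at the right/bottom edges are counted.
    cols = (width + SB_SIZE - 1) // SB_SIZE
    rows = (height + SB_SIZE - 1) // SB_SIZE
    return cols * rows
```

For example, `superblock_count(640, 480)` gives 10 columns by 8 rows, so 80 entries.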
Adds an optional output framestats.csv file that prints comparisons
per-frame instead of averaged over the entire clip. It prints
per-channel and combined metrics for SSIM and PSNR.
Change-Id: Id28dfade27bc5775b59a9d83cfe8b37d1d52b686
The fix relaxes the max qindex based on the data from the previous loop of
coding if the output frame size is greater than the maximum frame size allowed.
Change-Id: Iac1f63ec67559d68766e090a7cbb80b812b2560f
before calling vp9_apply_encoding_flags() which may crash if the
resolution was invalid. This is the same change as:
c0523090b vp8e_encode: check validate_config return
BUG=https://bugzilla.mozilla.org/show_bug.cgi?id=1315288
Change-Id: Icd2aab322422e83d3a778fca6d7789e5000239d7
If enabled will compute source_sad for every superblock on every frame,
prior to encoding. Off by default, only on for speed=8 when
copy_partition is set.
Change-Id: Iab7903180a23dad369135e8234b7f896f20e1231
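The per-superblock source SAD in the commit above can be illustrated with a small sketch. This is not the libvpx code: the function name is hypothetical, frames are plain lists of rows of luma samples, and comparing the current source frame against the previous source frame is an assumption, since the commit message does not state which frame is used as the reference.

```python
def superblock_sads(cur, prev, sb=64):
    """Sum of absolute differences per sb x sb block of two equal-size frames.

    cur, prev: lists of rows of integer samples (assumed same dimensions).
    Returns a 2D list indexed as [superblock_row][superblock_col].
    """
    height = len(cur)
    width = len(cur[0])
    sads = []
    for y0 in range(0, height, sb):
        row = []
        for x0 in range(0, width, sb):
            total = 0
            # Clamp to the frame edge so partial superblocks are handled.
            for y in range(y0, min(y0 + sb, height)):
                for x in range(x0, min(x0 + sb, width)):
                    total += abs(cur[y][x] - prev[y][x])
            row.append(total)
        sads.append(row)
    return sads
```

The `sb` parameter defaults to 64 and is exposed only so the sketch can be exercised on tiny frames.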
Used with --framestats=file.csv. Currently prints raw codec QP (not
internal 0-63 range) and bytes per frame.
Change-Id: Ifbb90129c218dda869eaf5b810bad12a32ebd82d
Avoid many visual artifacts. Compression quality is improved by more
than 1%. Encoding is about 4% faster for QVGA and 6% faster for VGA on
Android.
Change-Id: I4dd0a81429ddf7efdef1e80a191da5fb8de8e8af
Renames SSIM to VpxSSIM as an upscaled weighted SSIM metric, then prints
Y, U and V channels unweighted as well as a weighted but not scaled SSIM
score that's 8/1/1 parts Y/U/V (same as VpxSSIM).
Change-Id: Iff800cc8f145314eeb1a9b4af1e11a25bec095ca
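The 8/1/1 Y/U/V weighting mentioned in the commit above amounts to a weighted average of the three per-channel scores. A minimal sketch follows; the function name is hypothetical, and it covers only the weighted, unscaled score. The additional upscaling applied to VpxSSIM is not specified in the commit message and is not reproduced here.

```python
def weighted_ssim(y, u, v):
    """Combine per-channel SSIM scores with 8/1/1 parts for Y/U/V."""
    # 8 parts luma, 1 part each chroma channel, normalized by the 10 total parts.
    return (8.0 * y + u + v) / 10.0
```

With this weighting, luma dominates: per-channel scores of 0.9, 0.5 and 0.5 combine to roughly 0.82.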
For speed 8, it speeds up the encoding on Android by 6% for QVGA and
7.4% for VGA with the new threshold. Overall PSNR is improved by 0.667
for rtc.
Change-Id: I4a644560b32c0b5b4e9f49ffb953d000413a3732
If enabled denoiser will only denoise the top spatial layer for now.
Added unittest for SVC with denoising.
Change-Id: Ifa373771c4ecfa208615eb163cc38f1c22c6664b
This commit reworks the SSSE3 implementation of the forward 8x8
2D-DCT. It uses a cyclic rotation approach to the temporary xmm
registers. It reduces the average cycles from 158 to 154. The SSE2
version uses 169 cycles.
Change-Id: I1b79b9642aae0ed3fb3cefb5b70246e6de5d5caa
When aq=3 mode is on and the gf_cbr_boost is set: make sure golden frame
is always refreshed, and don't incorporate segment cost in qp setting
on the boosted golden frame.
Better performance on RTC set with gf_cbr_boost on,
for example with gf_cbr_boost=50, gains from ~0.5-3%.
Change-Id: Ie811f5e4d444ff3320bd6e2c1745b2c4c09a8460
Quality improved by 1.866 and 0.386 for two noisy clips (dark720p and
marcooffice720p), respectively.
Change-Id: Ib33a7672ae9ca53da156208f7cd13f07b5543e44
Avoid the qp-clamping on gf/alt frame if gf_cbr_boost_pct is set.
The change only affects CBR mode when gf_cbr_boost_pct is set.
Change-Id: I0655ed4f2b047c8ed1ed33a070c17960ad776704
This was much more amenable to optimization than the across filter.
Speedup of almost 2.5x
BUG=webm:1320
Change-Id: I49acc0f9cb2e7642303df90132cbc938acade4c4
Speed test shows 25% gain on vpx_idct16x16_256_add_neon(),
and the speed of vpx_idct16x16_10_add_neon() tripled.
Change-Id: If8518d9b6a3efab74031297b8d40cd83c4a49541
Fix compile warnings about implicit type conversion for
target=armv7-android-gcc in vpxenc.c.
BUG=webm:1348
Change-Id: I9fbabd843512f2a1a09f4bb934cd091e834eed9c
2016-12-22 14:56:20 -08:00
1214 changed files with 246526 additions and 117218 deletions
Shift Media Project aims to provide native Windows development libraries for libvpx and associated dependencies to support simpler creation and debugging of rich media content directly within Visual Studio. [https://shiftmediaproject.github.io/](https://shiftmediaproject.github.io/)
Development libraries are available from the [releases](https://github.com/ShiftMediaProject/libvpx/releases) page. These libraries are available for each supported Visual Studio version with a different download for each version. Each download contains both static and dynamic libraries to choose from in both 32bit and 64bit versions.
## Code
This repository contains code from the corresponding upstream project with additional modifications to allow it to be compiled with Visual Studio. New custom Visual Studio projects are provided within the 'SMP' sub-directory. Refer to the 'readme' contained within the 'SMP' directory for further details.
## Issues
Any issues related to the ShiftMediaProject specific changes should be sent to the [issues](https://github.com/ShiftMediaProject/libvpx/issues) page for the repository. Any issues related to the upstream project should be sent upstream directly (see the issues information of the upstream repository for more details).
## License
ShiftMediaProject original code is released under [LGPLv2.1](https://www.gnu.org/licenses/lgpl-2.1.html). All code from the upstream repository remains under its original license (see the license information of the upstream repository for more details).
## Copyright
As this repository includes code from upstream project(s) it includes many copyright owners. ShiftMediaProject makes NO claim of copyright on any upstream code. However, all original ShiftMediaProject authored code is copyright ShiftMediaProject. For a complete copyright list please check out the source code to examine license headers. Unless expressly stated otherwise all code submitted to the ShiftMediaProject project (in any form) is licensed under [LGPLv2.1](https://www.gnu.org/licenses/lgpl-2.1.html) and copyright is donated to ShiftMediaProject. If you submit code that is not your own work it is your responsibility to place a header stating the copyright.
## Contributing
Patches related to the ShiftMediaProject specific changes should be sent as pull requests to the main repository. Any changes related to the upstream project should be sent upstream directly (see the contributing information of the upstream repository for more details).