Fix documents

Summary: Follow the instructions in T66611582. Now the only remaining problem is that headers must include copyright.

Reviewed By: alexnikulkov

Differential Revision: D32583915

fbshipit-source-id: 13d390d756825c5e91e7801bf0dc4efec9b8b1f7
Zhengxing Chen
2021-11-23 23:45:00 -08:00
committed by Ruiyang Xu
parent 1c4e5805a4
commit 01cc5e72f8
10 changed files with 121 additions and 21 deletions
+24
@@ -4,6 +4,30 @@ reagent.mab package
Submodules
----------
reagent.mab.mab\_algorithm module
---------------------------------
.. automodule:: reagent.mab.mab_algorithm
:members:
:undoc-members:
:show-inheritance:
reagent.mab.simulation module
-----------------------------
.. automodule:: reagent.mab.simulation
:members:
:undoc-members:
:show-inheritance:
reagent.mab.thompson\_sampling module
-------------------------------------
.. automodule:: reagent.mab.thompson_sampling
:members:
:undoc-members:
:show-inheritance:
reagent.mab.ucb module
----------------------
+8
@@ -100,6 +100,14 @@ reagent.models.fully\_connected\_network module
:undoc-members:
:show-inheritance:
reagent.models.linear\_regression module
----------------------------------------
.. automodule:: reagent.models.linear_regression
:members:
:undoc-members:
:show-inheritance:
reagent.models.mdn\_rnn module
------------------------------
+21
@@ -0,0 +1,21 @@
reagent.training.cb package
===========================
Submodules
----------
reagent.training.cb.linucb\_trainer module
------------------------------------------
.. automodule:: reagent.training.cb.linucb_trainer
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: reagent.training.cb
:members:
:undoc-members:
:show-inheritance:
+1
@@ -7,6 +7,7 @@ Subpackages
.. toctree::
:maxdepth: 4
reagent.training.cb
reagent.training.cfeval
reagent.training.gradient_free
reagent.training.ranking
Executable → Regular
+1 -1
@@ -1,2 +1,2 @@
#!/bin/bash
rm -rf api/* && sphinx-build -b html -E -v . ~/github/HorizonDocs
rm -rf api/* && rm -rf ~/github/HorizonDocs && sphinx-build -b html -E -v . ~/github/HorizonDocs
+1 -1
@@ -22,7 +22,7 @@ sys.path.insert(0, os.path.abspath("../"))
project = "ReAgent"
copyright = "2021, Meta Platforms, Inc."
copyright = "2022, Meta Platforms, Inc"
author = "ReAgent Team"
# The full version, including alpha/beta/rc tags
+59 -16
@@ -11,8 +11,8 @@ ReAgent: Applied Reinforcement Learning Platform
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/master.svg?style=svg
:target: https://circleci.com/gh/facebookresearch/ReAgent/tree/master
.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/main.svg?style=svg
:target: https://circleci.com/gh/facebookresearch/ReAgent/tree/main
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -22,8 +22,9 @@ Overview
ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook.
ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains
workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training,
counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white
paper here: `Platform <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_.
counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent, please read the
`release post <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_
and the `white paper <https://arxiv.org/abs/1811.00260>`_.
The source code is available here: `Source code <https://github.com/facebookresearch/ReAgent>`_.
@@ -32,6 +33,7 @@ The platform was once named "Horizon" but we have adopted the name "ReAgent" rec
Algorithms Supported
~~~~~~~~~~~~~~~~~~~~
Classic Off-Policy algorithms:
* Discrete-Action `DQN <https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf>`_
* Parametric-Action DQN
@@ -39,6 +41,33 @@ Algorithms Supported
* Distributional RL `C51 <https://arxiv.org/abs/1707.06887>`_\ , `QR-DQN <https://arxiv.org/abs/1710.10044>`_
* `Twin Delayed DDPG <https://arxiv.org/abs/1802.09477>`_ (TD3)
* `Soft Actor-Critic <https://arxiv.org/abs/1801.01290>`_ (SAC)
* `Critic Regularized Regression <https://arxiv.org/abs/2006.15134>`_ (CRR)
* `Proximal Policy Optimization Algorithms <https://arxiv.org/abs/1707.06347>`_ (PPO)
RL for recommender systems:
* `Seq2Slate <https://arxiv.org/abs/1810.02019>`_
* `SlateQ <https://arxiv.org/abs/1905.12767>`_
Counterfactual Evaluation:
* `Doubly Robust <https://arxiv.org/abs/1612.01205>`_ (for bandits)
* `Doubly Robust <https://arxiv.org/abs/1511.03722>`_ (for sequential decisions)
* `MAGIC <https://arxiv.org/abs/1604.00923>`_
Multi-Armed and Contextual Bandits:
* `UCB1 <https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf>`_
* `MetricUCB <https://arxiv.org/abs/0809.4882>`_
* `Thompson Sampling <https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf>`_
* `LinUCB <https://arxiv.org/abs/1003.0146>`_
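
As a rough illustration of the simplest entry in the bandit list above, here is a minimal, self-contained UCB1 sketch in plain Python. It does not use the ``reagent.mab`` API. The function names ``ucb1_select`` and ``run_ucb1``, the simulated Bernoulli rewards, and the exploration constant 2 are assumptions made for this example only.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the index of the arm chosen by the UCB1 rule at step t.

    counts[a] is the number of pulls of arm a so far,
    sums[a] is the total reward collected from arm a.
    """
    # Pull every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean plus an exploration bonus that shrinks with more pulls.
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(true_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (hypothetical environment)."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    sums = [0.0] * len(true_means)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling ``run_ucb1([0.1, 0.9], 2000)`` returns the per-arm pull counts; with arms this far apart, most pulls should go to the second arm.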
Others:
* `Cross-Entropy Method <http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf>`_
* `Synthetic Return for Credit Assignment <https://arxiv.org/abs/2102.12425>`_
Installation
~~~~~~~~~~~~~~~~~~~
@@ -46,27 +75,42 @@ Installation
ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found
here: :ref:`installation`.
Usage
~~~~~~~~~~~~
The ReAgent Serving Platform (RASP) tutorial covers serving and training models and is available here: :ref:`rasp_tutorial`.
Tutorial
~~~~~~~~~~~~
ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don't have access to a simulator.
In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time.
Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it,
we rely on counter-factual policy evaluation (CPE), a set of techniques for estimating a policy's performance based on the actions of another policy.
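
To make the idea of CPE concrete, the sketch below shows one of the simplest estimators of this kind, inverse propensity scoring (IPS), for a one-step (bandit) setting. It is not ReAgent's implementation. The function name ``ips_estimate``, the tuple layout of the logged data, and the ``target_prob`` callback are assumptions chosen for illustration.

```python
def ips_estimate(logged, target_prob):
    """Estimate the average reward of a target policy from logged data.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability with which the logging policy
        chose `action` in `context` (must be > 0).
    target_prob: function (context, action) -> probability that the
        target policy would choose `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


# Logging policy picked each of two actions with probability 0.5.
logged_data = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]


def always_action_zero(context, action):
    # Hypothetical target policy that always picks action 0.
    return 1.0 if action == 0 else 0.0


estimate = ips_estimate(logged_data, always_action_zero)
```

In this toy log, action 0 always earned reward 1, so the estimate for the always-pick-0 policy is 1.0. IPS is unbiased only when the logging probabilities are correct and nonzero for every action the target policy can take; the doubly robust estimators listed above are designed to reduce its variance.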
We also have a set of tools to facilitate applying RL in real-world applications:
* Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for batch RL
* Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely
Detailed instructions on how to use ReAgent can be found here: :ref:`usage`.
License
~~~~~~~~~~~~~~
ReAgent is released under a BSD license. Find out more about it here: :ref:`license`.
| ReAgent is released under a BSD license. Find out more about it here: :ref:`license`.
| Terms of Use - `<https://opensource.facebook.com/legal/terms>`_
| Privacy Policy - `<https://opensource.facebook.com/legal/privacy>`_
| Copyright © 2022 Meta Platforms, Inc
Citing
~~~~~~
@article{gauci2018horizon,
title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
journal={arXiv preprint arXiv:1811.00260},
year={2018}
}
Cite our work as:
::
@article{gauci2018horizon,
title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
journal={arXiv preprint arXiv:1811.00260},
year={2018}
}
Table of Contents
~~~~~~~~~~~~~~~~~~~~~
@@ -75,13 +119,12 @@ Table of Contents
:caption: Getting Started
Installation <installation>
Tutorial <rasp_tutorial>
Usage <usage>
RASP (Not Actively Maintained) <rasp_tutorial>
.. toctree::
:caption: Advanced Topics
Distributed Training <distributed>
Continuous Integration <continuous_integration>
.. toctree::
+1 -1
@@ -65,7 +65,7 @@ Now, you can build our preprocessing JAR
mvn -f preprocessing/pom.xml clean package
RASP
RASP (Not Actively Maintained)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RASP (ReAgent Serving Platform) is a decision-serving library. It also has a standalone binary. It depends on libtorch,
+1 -1
@@ -7,7 +7,7 @@ BSD License
For ReAgent software
Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
Copyright (c) 2022-present, Meta Platforms, Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
+4 -1
@@ -10,9 +10,12 @@ batches, we use *off-policy* algorithms. To test a new policy without deploying
*counter-factual policy evaluation (CPE)*\ , a set of techniques for estimating a policy's performance based on the
actions of another policy.
This tutorial is tested in our CircleCI `end-to-end tests <https://github.com/facebookresearch/ReAgent/blob/62661e35b62b06ed161e661b906616a2d389eb3a/.circleci/config.yml#L79-L128>`_.
If anything in this tutorial is out of date, please refer to the latest code.
Quick Start
-----------
We have set up `Click <https://click.palletsprojects.com/en/7.x/>`_ commands to run our RL workflow. The basic usage pattern is
.. code-block::