Fix documents

Summary: Follow the instructions in T66611582. Now the only remaining problem is that headers must include copyright.

Reviewed By: alexnikulkov

Differential Revision: D32583915

fbshipit-source-id: 13d390d756825c5e91e7801bf0dc4efec9b8b1f7
2026-05-17 12:40:39 +00:00 · 2021-11-23 23:45:00 -08:00
parent 1c4e5805a4
commit 01cc5e72f8
10 changed files with 121 additions and 21 deletions
@@ -4,6 +4,30 @@ reagent.mab package
 Submodules
 ----------

+reagent.mab.mab\_algorithm module
+---------------------------------
+
+.. automodule:: reagent.mab.mab_algorithm
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+reagent.mab.simulation module
+-----------------------------
+
+.. automodule:: reagent.mab.simulation
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+reagent.mab.thompson\_sampling module
+-------------------------------------
+
+.. automodule:: reagent.mab.thompson_sampling
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 reagent.mab.ucb module
 ----------------------

@@ -100,6 +100,14 @@ reagent.models.fully\_connected\_network module
   :undoc-members:
   :show-inheritance:

+reagent.models.linear\_regression module
+----------------------------------------
+
+.. automodule:: reagent.models.linear_regression
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 reagent.models.mdn\_rnn module
 ------------------------------

@@ -0,0 +1,21 @@
+reagent.training.cb package
+===========================
+
+Submodules
+----------
+
+reagent.training.cb.linucb\_trainer module
+------------------------------------------
+
+.. automodule:: reagent.training.cb.linucb_trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: reagent.training.cb
+   :members:
+   :undoc-members:
+   :show-inheritance:
@@ -7,6 +7,7 @@ Subpackages
 .. toctree::
   :maxdepth: 4

+   reagent.training.cb
   reagent.training.cfeval
   reagent.training.gradient_free
   reagent.training.ranking
@@ -1,2 +1,2 @@
 #!/bin/bash
-rm -rf api/* && sphinx-build -b html -E -v . ~/github/HorizonDocs
+rm -rf api/* && rm -rf ~/github/HorizonDocs && sphinx-build -b html -E -v . ~/github/HorizonDocs
@@ -22,7 +22,7 @@ sys.path.insert(0, os.path.abspath("../"))


 project = "ReAgent"
-copyright = "2021, Meta Platforms, Inc."
+copyright = "2022, Meta Platforms, Inc"
 author = "ReAgent Team"

 # The full version, including alpha/beta/rc tags
@@ -11,8 +11,8 @@ ReAgent: Applied Reinforcement Learning Platform
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


-.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/master.svg?style=svg
-    :target: https://circleci.com/gh/facebookresearch/ReAgent/tree/master
+.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/main.svg?style=svg
+    :target: https://circleci.com/gh/facebookresearch/ReAgent/tree/main

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------

@@ -22,8 +22,9 @@ Overview
 ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook.
 ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains
 workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training,
-counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white
-paper here: `Platform <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_.
+counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent, please read the
+`release post <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_
+and the `white paper <https://arxiv.org/abs/1811.00260>`_.

 The source code is available here: `Source code <https://github.com/facebookresearch/ReAgent>`_.

@@ -32,6 +33,7 @@ The platform was once named "Horizon" but we have adopted the name "ReAgent" rec
 Algorithms Supported
 ~~~~~~~~~~~~~~~~~~~~

+Classic Off-Policy algorithms:

 * Discrete-Action `DQN <https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf>`_
 * Parametric-Action DQN
@@ -39,6 +41,33 @@ Algorithms Supported
 * Distributional RL `C51 <https://arxiv.org/abs/1707.06887>`_\ , `QR-DQN <https://arxiv.org/abs/1710.10044>`_
 * `Twin Delayed DDPG <https://arxiv.org/abs/1802.09477>`_ (TD3)
 * `Soft Actor-Critic <https://arxiv.org/abs/1801.01290>`_ (SAC)
+* `Critic Regularized Regression <https://arxiv.org/abs/2006.15134>`_ (CRR)
+* `Proximal Policy Optimization Algorithms <https://arxiv.org/abs/1707.06347>`_ (PPO)
+
+RL for recommender systems:
+
+* `Seq2Slate <https://arxiv.org/abs/1810.02019>`_
+* `SlateQ <https://arxiv.org/abs/1905.12767>`_
+
+Counterfactual Evaluation:
+
+* `Doubly Robust <https://arxiv.org/abs/1612.01205>`_ (for bandits)
+* `Doubly Robust <https://arxiv.org/abs/1511.03722>`_ (for sequential decisions)
+* `MAGIC <https://arxiv.org/abs/1604.00923>`_
+
+Multi-Armed and Contextual Bandits:
+
+* `UCB1 <https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf>`_
+* `MetricUCB <https://arxiv.org/abs/0809.4882>`_
+* `Thompson Sampling <https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf>`_
+* `LinUCB <https://arxiv.org/abs/1003.0146>`_
+
+
+Others:
+
+* `Cross-Entropy Method <http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf>`_
+* `Synthetic Return for Credit Assignment <https://arxiv.org/abs/2102.12425>`_
+

 Installation
 ~~~~~~~~~~~~~~~~~~~
@@ -46,27 +75,42 @@ Installation
 ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found
 here: :ref:`installation`.

-Usage
-~~~~~~~~~~~~

-The ReAgent Serving Platform (RASP) tutorial covers serving and training models and is available here: :ref:`rasp_tutorial`.
+Tutorial
+~~~~~~~~~~~~
+ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator.
+In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time.
+Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it,
+we rely on counter-factual policy evaluation (CPE), a set of techniques for estimating the performance of a policy based on the actions of another policy.
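
To make the CPE idea above concrete, here is a minimal sketch of inverse propensity scoring (IPS), one of the simplest CPE estimators. It is not ReAgent's API and not one of the estimators listed in this change (Doubly Robust, MAGIC); the function name ``ips_estimate`` and the logged data are hypothetical and chosen only for illustration. Each logged reward is reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was.

```python
from typing import Sequence


def ips_estimate(
    rewards: Sequence[float],
    logging_probs: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Inverse propensity scoring estimate of a target policy's average reward.

    For each logged sample we need the observed reward, the probability the
    logging policy gave to the logged action, and the probability the target
    policy would give to that same action.
    """
    if not (len(rewards) == len(logging_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if len(rewards) == 0:
        raise ValueError("need at least one logged sample")

    total = 0.0
    for reward, p_log, p_tgt in zip(rewards, logging_probs, target_probs):
        if p_log <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Importance weight: how strongly the target policy favors this action
        # relative to the policy that actually generated the data.
        total += (p_tgt / p_log) * reward
    return total / len(rewards)


# Hypothetical logs: a uniform logging policy over two actions.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
# Hypothetical target policy that always picks the action that was rewarded.
target_probs = [1.0, 0.0, 1.0, 0.0]

print(ips_estimate(rewards, logging_probs, target_probs))
```

When the target policy equals the logging policy, all weights are 1 and the estimate reduces to the plain average of the logged rewards. IPS can have high variance when the logging probabilities are small, which is one reason estimators such as Doubly Robust are often preferred in practice.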
+
+We also have a set of tools to facilitate applying RL in real-world applications:
+
+
+* Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for applying batch RL
+* Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely

 Detailed instructions on how to use ReAgent can be found here: :ref:`usage`.

+
 License
 ~~~~~~~~~~~~~~

-ReAgent is released under a BSD license.  Find out more about it here: :ref:`license`.
+| ReAgent is released under a BSD license.  Find out more about it here: :ref:`license`.
+| Terms of Use - `<https://opensource.facebook.com/legal/terms>`_
+| Privacy Policy - `<https://opensource.facebook.com/legal/privacy>`_
+| Copyright © 2022 Meta Platforms, Inc

 Citing
 ~~~~~~

-@article{gauci2018horizon,
-  title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
-  author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
-  journal={arXiv preprint arXiv:1811.00260},
-  year={2018}
-}
+Cite our work by:
+::
+    @article{gauci2018horizon,
+      title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
+      author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
+      journal={arXiv preprint arXiv:1811.00260},
+      year={2018}
+    }

 Table of Contents
 ~~~~~~~~~~~~~~~~~~~~~
@@ -75,13 +119,12 @@ Table of Contents
    :caption: Getting Started

    Installation <installation>
-    Tutorial <rasp_tutorial>
    Usage <usage>
+    RASP (Not Actively Maintained) <rasp_tutorial>

 .. toctree::
    :caption: Advanced Topics

-    Distributed Training <distributed>
    Continuous Integration <continuous_integration>

 .. toctree::
@@ -65,7 +65,7 @@ Now, you can build our preprocessing JAR

   mvn -f preprocessing/pom.xml clean package

-RASP
+RASP (Not Actively Maintained)
 ^^^^

 RASP (ReAgent Serving Platform) is a decision-serving library. It also has a standalone binary. It depends on libtorch,
@@ -7,7 +7,7 @@ BSD License

 For ReAgent software

-Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
+Copyright (c) 2022-present, Meta Platforms, Inc. All rights reserved.

 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
@@ -10,9 +10,12 @@ batches, we use *off-policy* algorithms.  To test a new policy without deploying
 *counter-factual policy evaluation (CPE)*\ , a set of techniques for estimating a policy based on the
 actions of another policy.

+This tutorial is tested in our CircleCI `end-to-end tests <https://github.com/facebookresearch/ReAgent/blob/62661e35b62b06ed161e661b906616a2d389eb3a/.circleci/config.yml#L79-L128>`_.
+If anything in this tutorial is out of date, please refer to the latest code.
+
+
 Quick Start
 -----------
-
 We have set up `Click <https://click.palletsprojects.com/en/7.x/>`_ commands to run our RL workflow. The basic usage pattern is

 .. code-block::