Mirror of https://github.com/facebookresearch/ReAgent.git (synced 2026-05-17 12:40:39 +00:00)

Fix documents

Summary: Follow the instructions in T66611582. Now the only remaining problem is that headers must include copyright.
Reviewed By: alexnikulkov
Differential Revision: D32583915
fbshipit-source-id: 13d390d756825c5e91e7801bf0dc4efec9b8b1f7

Committed by Ruiyang Xu
Parent: 1c4e5805a4
Commit: 01cc5e72f8
@@ -4,6 +4,30 @@ reagent.mab package
 Submodules
 ----------

+reagent.mab.mab\_algorithm module
+---------------------------------
+
+.. automodule:: reagent.mab.mab_algorithm
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+reagent.mab.simulation module
+-----------------------------
+
+.. automodule:: reagent.mab.simulation
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+reagent.mab.thompson\_sampling module
+-------------------------------------
+
+.. automodule:: reagent.mab.thompson_sampling
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 reagent.mab.ucb module
 ----------------------

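Every stub added in this hunk repeats one pattern:

* the module name as a section title, with underscores backslash-escaped;
* an underline of matching length;
* an `automodule` directive with `:members:`, `:undoc-members:` and `:show-inheritance:`.

The stubs look like `sphinx-apidoc` output. The sketch below does not use Sphinx. It is a hypothetical helper (`automodule_section` is not a Sphinx or ReAgent function) that reproduces the pattern, for instance when adding a module page by hand.

```python
def escape_rst_title(name: str) -> str:
    """Backslash-escape underscores, as in the section titles above."""
    return name.replace("_", r"\_")


def automodule_section(module: str, suffix: str = "module") -> str:
    """Render one RST section that documents `module` with autodoc."""
    title = f"{escape_rst_title(module)} {suffix}"
    lines = [
        title,
        "-" * len(title),  # RST wants the underline at least as long as the title
        "",
        f".. automodule:: {module}",  # the directive argument keeps the raw dotted name
        "   :members:",
        "   :undoc-members:",
        "   :show-inheritance:",
        "",
    ]
    return "\n".join(lines)
```

For example, `print(automodule_section("reagent.mab.simulation"))` emits the same block as the `reagent.mab.simulation` stanza added above.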
@@ -100,6 +100,14 @@ reagent.models.fully\_connected\_network module
    :undoc-members:
    :show-inheritance:

+reagent.models.linear\_regression module
+----------------------------------------
+
+.. automodule:: reagent.models.linear_regression
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 reagent.models.mdn\_rnn module
 ------------------------------

@@ -0,0 +1,21 @@
+reagent.training.cb package
+===========================
+
+Submodules
+----------
+
+reagent.training.cb.linucb\_trainer module
+------------------------------------------
+
+.. automodule:: reagent.training.cb.linucb_trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Module contents
+---------------
+
+.. automodule:: reagent.training.cb
+   :members:
+   :undoc-members:
+   :show-inheritance:
@@ -7,6 +7,7 @@ Subpackages
 .. toctree::
    :maxdepth: 4

+   reagent.training.cb
    reagent.training.cfeval
    reagent.training.gradient_free
    reagent.training.ranking
Executable → Regular
+1 -1
@@ -1,2 +1,2 @@
 #!/bin/bash
-rm -rf api/* && sphinx-build -b html -E -v . ~/github/HorizonDocs
+rm -rf api/* && rm -rf ~/github/HorizonDocs && sphinx-build -b html -E -v . ~/github/HorizonDocs
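The new build line deletes the HTML output directory before rebuilding. Without that step, pages for modules that no longer exist can linger in `~/github/HorizonDocs`, because an ordinary `sphinx-build` run does not remove old output files.

Below is a minimal Python sketch of the same clean-then-build sequence. It assumes Sphinx is installed for the final step only. `clean_dirs`, `sphinx_args` and `build` are hypothetical helper names. `sphinx.cmd.build.build_main` is the entry point behind the `sphinx-build` command.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def clean_dirs(api_dir: Path, out_dir: Path) -> None:
    """Python version of `rm -rf api/* && rm -rf <out_dir>`."""
    if api_dir.is_dir():
        for child in api_dir.iterdir():
            if child.name.startswith("."):
                continue  # the shell glob api/* does not match dotfiles
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    # Like `rm -rf`, succeed quietly if the output directory does not exist.
    shutil.rmtree(out_dir, ignore_errors=True)


def sphinx_args(src: Path, out_dir: Path) -> list[str]:
    """Arguments equivalent to `sphinx-build -b html -E -v <src> <out_dir>`."""
    return ["-b", "html", "-E", "-v", str(src), str(out_dir)]


def build(src: Path, out_dir: Path) -> int:
    """Clean, then run a full HTML build; returns Sphinx's exit status."""
    clean_dirs(src / "api", out_dir)
    # Imported here so the helpers above can be used without Sphinx installed.
    from sphinx.cmd.build import build_main

    return build_main(sphinx_args(src, out_dir))
```

Run from the docs directory, `build(Path("."), Path.home() / "github" / "HorizonDocs")` would correspond to the script. The module deliberately does nothing at import time, since deleting directories on import would be surprising.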
+1 -1
@@ -22,7 +22,7 @@ sys.path.insert(0, os.path.abspath("../"))


 project = "ReAgent"
-copyright = "2021, Meta Platforms, Inc."
+copyright = "2022, Meta Platforms, Inc"
 author = "ReAgent Team"

 # The full version, including alpha/beta/rc tags
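The `conf.py` change above updates a hard-coded year by hand. One alternative, sketched here as an assumption rather than what ReAgent's `conf.py` does, is to compute the year when the docs are built. Sphinx executes `conf.py` as Python and reads `project`, `author` and `copyright` as module-level strings, so any expression that yields a string works.

```python
import datetime

project = "ReAgent"
author = "ReAgent Team"
# Evaluated each time Sphinx loads conf.py, so the year tracks the build date
# instead of needing a manual edit like the 2021 -> 2022 change above.
copyright = f"{datetime.date.today().year}, Meta Platforms, Inc."
```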
+59 -16
@@ -11,8 +11,8 @@ ReAgent: Applied Reinforcement Learning Platform
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


-.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/master.svg?style=svg
-   :target: https://circleci.com/gh/facebookresearch/ReAgent/tree/master
+.. image:: https://circleci.com/gh/facebookresearch/ReAgent/tree/main.svg?style=svg
+   :target: https://circleci.com/gh/facebookresearch/ReAgent/tree/main

 --------------------------------------------------------------------------------------------------------------------------------------------------------------------

@@ -22,8 +22,9 @@ Overview
 ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook.
 ReAgent is built in Python and uses PyTorch for modeling and training and TorchScript for model serving. The platform contains
 workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training,
-counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent see the white
-paper here: `Platform <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_.
+counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent, please read the
+`release post <https://research.fb.com/publications/horizon-facebooks-open-source-applied-reinforcement-learning-platform/>`_
+and `white paper <https://arxiv.org/abs/1811.00260>`_.

 The source code is available here: `Source code <https://github.com/facebookresearch/ReAgent>`_.

@@ -32,6 +33,7 @@ The platform was once named "Horizon" but we have adopted the name "ReAgent" rec
 Algorithms Supported
 ~~~~~~~~~~~~~~~~~~~~

+Classic Off-Policy algorithms:

 * Discrete-Action `DQN <https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf>`_
 * Parametric-Action DQN
@@ -39,6 +41,33 @@ Algorithms Supported
 * Distributional RL `C51 <https://arxiv.org/abs/1707.06887>`_\ , `QR-DQN <https://arxiv.org/abs/1710.10044>`_
 * `Twin Delayed DDPG <https://arxiv.org/abs/1802.09477>`_ (TD3)
 * `Soft Actor-Critic <https://arxiv.org/abs/1801.01290>`_ (SAC)
+* `Critic Regularized Regression <https://arxiv.org/abs/2006.15134>`_ (CRR)
+* `Proximal Policy Optimization Algorithms <https://arxiv.org/abs/1707.06347>`_ (PPO)
+
+RL for recommender systems:
+
+* `Seq2Slate <https://arxiv.org/abs/1810.02019>`_
+* `SlateQ <https://arxiv.org/abs/1905.12767>`_
+
+Counterfactual Evaluation:
+
+* `Doubly Robust <https://arxiv.org/abs/1612.01205>`_ (for bandits)
+* `Doubly Robust <https://arxiv.org/abs/1511.03722>`_ (for sequential decisions)
+* `MAGIC <https://arxiv.org/abs/1604.00923>`_
+
+Multi-Armed and Contextual Bandits:
+
+* `UCB1 <https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf>`_
+* `MetricUCB <https://arxiv.org/abs/0809.4882>`_
+* `Thompson Sampling <https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf>`_
+* `LinUCB <https://arxiv.org/abs/1003.0146>`_
+
+
+Others:
+
+* `Cross-Entropy Method <http://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf>`_
+* `Synthetic Return for Credit Assignment <https://arxiv.org/abs/2102.12425>`_
+

 Installation
 ~~~~~~~~~~~~~~~~~~~
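The bandit algorithms added to the list in this hunk are small enough to illustrate directly. The sketch below is textbook UCB1 run on simulated Bernoulli arms. It first plays each arm once. After that, it picks the arm with the highest empirical mean plus an exploration bonus of sqrt(2 ln t / n_a), where t is the total number of pulls and n_a the pulls of arm a.

This is a standalone illustration, not the interface of ReAgent's `reagent.mab.ucb` module. `ucb1_select` and `run_bernoulli_bandit` are hypothetical names.

```python
import math
import random
from typing import List


def ucb1_select(counts: List[int], total_rewards: List[float]) -> int:
    """Return the arm index chosen by UCB1."""
    # Play every arm once before relying on confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    scores = [
        total_rewards[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bernoulli_bandit(probs: List[float], steps: int, seed: int = 0) -> List[int]:
    """Simulate UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    totals = [0.0] * len(probs)
    for _ in range(steps):
        arm = ucb1_select(counts, totals)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

With success probabilities 0.2 and 0.8, `run_bernoulli_bandit([0.2, 0.8], steps=2000)` should pull the second arm far more often. The weaker arm is revisited only when its bonus, which grows like sqrt(ln t), outweighs the gap in observed means.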
@@ -46,27 +75,42 @@ Installation
 ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found
 here: :ref:`installation`.

-Usage
-~~~~~~~~~~~~

-The ReAgent Serving Platform (RASP) tutorial covers serving and training models and is available here: :ref:`rasp_tutorial`.
+Tutorial
+~~~~~~~~~~~~
+ReAgent is designed for large-scale, distributed recommendation/optimization tasks where we don’t have access to a simulator.
+In this environment, it is typically better to train offline on batches of data, and release new policies slowly over time.
+Because the policy updates slowly and in batches, we use off-policy algorithms. To test a new policy without deploying it,
+we rely on counter-factual policy evaluation (CPE), a set of techniques for estimating how a policy would perform based on the actions of another policy.
+
+We also have a set of tools to facilitate applying RL in real-world applications:
+
+
+* Domain Analysis Tool, which analyzes state/action feature importance and identifies whether the problem is suitable for applying batch RL
+* Behavior Cloning, which clones from the logging policy to bootstrap the learning policy safely

 Detailed instructions on how to use ReAgent can be found here: :ref:`usage`.

+
 License
 ~~~~~~~~~~~~~~

-ReAgent is released under a BSD license. Find out more about it here: :ref:`license`.
+| ReAgent is released under a BSD license. Find out more about it here: :ref:`license`.
+| Terms of Use - `<https://opensource.facebook.com/legal/terms>`_
+| Privacy Policy - `<https://opensource.facebook.com/legal/privacy>`_
+| Copyright © 2022 Meta Platforms, Inc

 Citing
 ~~~~~~

-@article{gauci2018horizon,
-title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
-author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
-journal={arXiv preprint arXiv:1811.00260},
-year={2018}
-}
+Cite our work by:
+::
+    @article{gauci2018horizon,
+      title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
+      author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
+      journal={arXiv preprint arXiv:1811.00260},
+      year={2018}
+    }

 Table of Contents
 ~~~~~~~~~~~~~~~~~~~~~
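The Tutorial text in this hunk describes counterfactual policy evaluation (CPE): judging a new policy from data logged under a different one. The simplest estimator of that kind is inverse propensity scoring (IPS). Each logged reward is reweighted by how much more (or less) likely the target policy is to take the logged action than the logging policy was.

The sketch below shows IPS for one-step (bandit) data. It is an illustration, not ReAgent's CPE code. The Doubly Robust and MAGIC estimators listed above combine this kind of importance weighting with model-based estimates. `ips_estimate` and `LoggedStep` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple

# (context, action, reward, logging policy's probability of that action)
LoggedStep = Tuple[int, int, float, float]


def ips_estimate(
    logs: Sequence[LoggedStep],
    target_prob: Callable[[int, int], float],
) -> float:
    """Inverse propensity scoring estimate of a target policy's mean reward.

    Each reward is weighted by pi_target(a|x) / pi_logging(a|x). The estimate
    is unbiased when the logging policy gives nonzero probability to every
    action the target policy can take.
    """
    if not logs:
        raise ValueError("need at least one logged step")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging propensity must be positive")
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)
```

As a usage example, suppose data were logged by a uniform policy over two actions and only action 1 pays a reward of 1. Evaluating a target policy that always picks action 1 then gives an estimate of 1.0, even though that policy never ran.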
@@ -75,13 +119,12 @@ Table of Contents
    :caption: Getting Started

    Installation <installation>
-   Tutorial <rasp_tutorial>
    Usage <usage>
+   RASP (Not Actively Maintained) <rasp_tutorial>

 .. toctree::
    :caption: Advanced Topics

    Distributed Training <distributed>
    Continuous Integration <continuous_integration>

 .. toctree::
@@ -65,7 +65,7 @@ Now, you can build our preprocessing JAR

 mvn -f preprocessing/pom.xml clean package

-RASP
+RASP (Not Actively Maintained)
 ^^^^

 RASP (ReAgent Serving Platform) is a decision-serving library. It also has a standalone binary. It depends on libtorch,
+1 -1
@@ -7,7 +7,7 @@ BSD License

 For ReAgent software

-Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
+Copyright (c) 2022-present, Meta Platforms, Inc. All rights reserved.

 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
+4 -1
@@ -10,9 +10,12 @@ batches, we use *off-policy* algorithms. To test a new policy without deploying
 *counter-factual policy evaluation (CPE)*\ , a set of techniques for estimating a policy based on the
 actions of another policy.

+This tutorial is tested in our CircleCI `end-to-end tests <https://github.com/facebookresearch/ReAgent/blob/62661e35b62b06ed161e661b906616a2d389eb3a/.circleci/config.yml#L79-L128>`_.
+If anything in this tutorial is out of date, please refer to the latest code.
+

 Quick Start
 -----------

 We have set up `Click <https://click.palletsprojects.com/en/7.x/>`_ commands to run our RL workflow. The basic usage pattern is

 .. code-block::