Initial Commit

2020-11-20 11:23:40 +01:00
commit 5611e5bc01
2914 changed files with 2625178 additions and 0 deletions

@@ -0,0 +1,57 @@
TITANIC DATASET
Converted for use in DELVE by Radford Neal, June 1996.
Originally compiled by Robert Dawson, 1995.
The titanic dataset gives the values of four categorical attributes
for each of the 2201 people on board the Titanic when it struck an
iceberg and sank. The attributes are social class (first class,
second class, third class, crewmember), age (adult or child), sex, and
whether or not the person survived.
The question of interest for this natural dataset is how survival
relates to the other attributes. There is obviously no practical
need to predict survival, so the real interest is in interpretation,
but success at prediction would appear to be closely related to
the discovery of interesting features of the relationship. Note
that there are only sixteen possible combinations of input attributes
for this prediction task, so the interesting behaviour will be that
with small training sets.
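
The count of sixteen follows from the attribute levels: four classes times two ages times two sexes. A minimal sketch that enumerates them; the label strings are illustrative and are not claimed to match the DELVE encoding.

```python
from itertools import product

# Levels as described in the text; the label strings are illustrative only.
CLASSES = ["first", "second", "third", "crew"]
AGES = ["adult", "child"]
SEXES = ["male", "female"]

# Every possible input pattern for the prediction task.
combinations = list(product(CLASSES, AGES, SEXES))

print(len(combinations))  # 4 * 2 * 2 = 16
```

With so few distinct input patterns, a large training set will see every pattern many times, which is why the small-sample behaviour is the interesting part.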
Source from which the data was obtained.
The original source files are titanic.doc and titanic.dat, which were
obtained from the data archive of the on-line Journal of Statistics
Education, whose home page on the Web is at URL
http://www2.ncsu.edu/ncsu/pams/stat/info/jse/homepage.html
Carriage returns at the end of the lines were deleted, as was a line
containing a period at the end of each file. Other than this, the
titanic.doc and titanic.dat files are as obtained from this source.
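
The clean-up described above could be sketched as follows. This is only an illustration: the function name `clean_source` is hypothetical, and it assumes the carriage returns were the `\r` of CRLF line endings and that the final line held a single period and nothing else.

```python
def clean_source(text: str) -> str:
    """Drop carriage returns and a final line holding only a period."""
    lines = text.replace("\r", "").split("\n")
    # Ignore empty strings left over from trailing newlines.
    while lines and lines[-1] == "":
        lines.pop()
    # Assumption: the terminator line is a lone "." (possibly padded).
    if lines and lines[-1].strip() == ".":
        lines.pop()
    return "\n".join(lines) + "\n"


# Made-up input, not taken from the real files.
print(repr(clean_source("1 1 1 1\r\n0 1 1 0\r\n.\r\n")))
```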
The dataset was compiled by Robert J. MacG. Dawson, and discussed by
him in the on-line article 'The "Unusual Episode" Data Revisited',
Journal of Statistics Education, vol. 3, no. 3 (1995), available via
the URL above.
Notes on aspects of the data.
As discussed in the article, the dataset was reconstructed from
sources that were not completely clear, so there are undoubtedly some
errors.
The cases in titanic.dat are clearly in a non-informative order,
grouped by identical attribute patterns. This has been retained for
the DELVE dataset file.
The representation of attributes has been changed to be more mnemonic.
Prior information regarding the significance of social class is
somewhat debatable. In the standard prior, I have considered status
to be an ordinal variable in which crewmembers come after third class
passengers. Perhaps crewmembers should be considered to be outside
this class ordering altogether, but that is not convenient.
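
The ordinal treatment described above can be illustrated with a small sketch. The labels and rank numbers are hypothetical; only the ordering (crew after third class) comes from the text, and the actual DELVE encoding is not shown here.

```python
# Ordering stated in the text: crew members come after third class.
# The labels and integer ranks are illustrative assumptions.
STATUS_ORDER = ["first", "second", "third", "crew"]
STATUS_RANK = {label: rank for rank, label in enumerate(STATUS_ORDER, start=1)}


def comes_before(a: str, b: str) -> bool:
    """True if status `a` precedes status `b` in the assumed ordering."""
    return STATUS_RANK[a] < STATUS_RANK[b]


print(STATUS_RANK)
print(comes_before("third", "crew"))
```

Treating crew as outside the ordering would instead call for a nominal attribute, which is the alternative the note mentions.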

@@ -0,0 +1,68 @@
NAME: Population at Risk and Death Rates for an Unusual Episode
TYPE: Complete record for all of population at risk
SIZE: 2201 observations, 4 variables
DESCRIPTIVE ABSTRACT:
For each person on board the fatal maiden voyage of the ocean liner
Titanic, this dataset records sex, age [adult/child], economic status
[first/second/third class, or crew] and whether or not that person
survived.
SOURCE:
"Report on the Loss of the `Titanic' (S.S.)" (1990), _British Board of
Trade Inquiry Report_ (reprint), Gloucester, UK: Allan Sutton
Publishing.
VARIABLE DESCRIPTIONS:
Column
   1   Class    (0 = crew, 1 = first, 2 = second, 3 = third)
  10   Age      (1 = adult, 0 = child)
  19   Sex      (1 = male, 0 = female)
  28   Survived (1 = yes, 0 = no)
Values are aligned and delimited by blanks. There are no missing
values.
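
A minimal sketch of reading one record in this layout, assuming each line holds the four integer codes separated by blanks. The names `Person` and `parse_line` are hypothetical, and the sample line is made up rather than taken from titanic.dat.

```python
from typing import NamedTuple

# Code tables copied from the variable descriptions above.
CLASS_LABELS = {0: "crew", 1: "first", 2: "second", 3: "third"}
AGE_LABELS = {1: "adult", 0: "child"}
SEX_LABELS = {1: "male", 0: "female"}


class Person(NamedTuple):
    social_class: str
    age: str
    sex: str
    survived: bool


def parse_line(line: str) -> Person:
    """Decode one blank-delimited record into labelled fields."""
    cls, age, sex, survived = (int(field) for field in line.split())
    return Person(CLASS_LABELS[cls], AGE_LABELS[age], SEX_LABELS[sex], survived == 1)


# Made-up record with values starting at columns 1, 10, 19 and 28.
sample = "3        0        0        1"
print(parse_line(sample))
```

Splitting on whitespace rather than slicing fixed columns keeps the sketch tolerant of leading blanks; a stricter reader could slice at the stated column positions instead.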
SPECIAL NOTES:
There is not complete agreement among primary sources as to the exact
numbers on board, rescued, or lost.
STORY BEHIND THE DATA:
The sinking of the Titanic is a famous event, and new books are still
being published about it. Many well-known facts--from the proportions
of first-class passengers to the "women and children first" policy, and
the fact that that policy was not entirely successful in saving the
women and children in the third class--are reflected in the survival
rates for various classes of passenger. These data were originally
collected by the British Board of Trade in their investigation of the
sinking.
PEDAGOGICAL NOTES:
These data make an interesting exercise if given to a class without
their context, which the students must attempt to discover. The
instructor will probably want to answer questions from the class,
"Twenty Questions" style.
There is a similar set of data circulating without any detailed
explanation or compiler's name attached, under the same title, which
omits the crew (and does not agree with any of the primary sources that
I was able to find.) Credit for the original idea goes to the
originator of that exercise: my version is merely an attempt to
provide a more complete context.
Additional information about these data can be found in the "Datasets
and Stories" article "The `Unusual Episode' Data Revisited" in the
_Journal of Statistics Education_ (Dawson 1995). Send the message
send jse/v3n3/datasets.dawson
to the address archive@jse.stat.ncsu.edu
SUBMITTED BY:
Robert J. MacG. Dawson
Department of Mathematics and Computing Science
Saint Mary's University
Halifax, Nova Scotia B3H 3C3
CANADA
rdawson@husky1.stmarys.ca

@@ -0,0 +1,6 @@
The titanic dataset gives the values of four categorical attributes
for each of the 2201 people on board the Titanic when it struck an
iceberg and sank. The attributes are social class (first class,
second class, third class, or crewmember), age (adult or child), sex,
and whether or not the person survived. The question of interest is
considered to be how survival relates to the other attributes.
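
One simple way to look at how survival relates to another attribute is a survival rate per attribute value. The sketch below assumes records are tuples of integer codes in the order (class, age, sex, survived); the function name is hypothetical and the rows are made up, so the printed rates say nothing about the real data.

```python
from collections import defaultdict


def survival_rate_by(records, index):
    """Return {attribute value: fraction survived} for the attribute at `index`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for record in records:
        key = record[index]
        totals[key] += 1
        survivors[key] += record[3]  # survived is coded 1 = yes, 0 = no
    return {key: survivors[key] / totals[key] for key in totals}


# Made-up rows (class, age, sex, survived), NOT taken from the dataset.
toy = [
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (3, 1, 1, 0),
    (3, 0, 0, 1),
    (3, 1, 1, 0),
    (0, 1, 1, 1),
]
rates = survival_rate_by(toy, 0)  # group by class
print(rates)
```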

@@ -0,0 +1,4 @@
1 NLMH ordinal
2 NLMH binary
3 NLMH binary
4 NLMH binary passive=no